Authors
Liwei Chen, Alexander I. Rudnicky
Identifier
DOI:10.1109/icassp49357.2023.10095036
Abstract
While Wav2Vec 2.0 has been proposed for speech recognition (ASR), it can also be used for speech emotion recognition (SER); its performance can be significantly improved using different fine-tuning strategies. Two baseline methods, vanilla fine-tuning (V-FT) and task adaptive pretraining (TAPT), are first presented. We show that V-FT is able to outperform state-of-the-art models on the IEMOCAP dataset. TAPT, an existing NLP fine-tuning strategy, further improves the performance on SER. We also introduce a novel fine-tuning method termed P-TAPT, which modifies the TAPT objective to learn contextualized emotion representations. Experiments show that P-TAPT performs better than TAPT, especially under low-resource settings. Compared to prior work in the literature, our top-line system achieved a 7.4% absolute improvement in unweighted accuracy (UA) over the state-of-the-art performance on IEMOCAP. Our code is publicly available.
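The vanilla fine-tuning (V-FT) baseline described above amounts to placing a small classification head on top of a pretrained speech encoder and updating the whole network on labeled emotion data. The sketch below illustrates that setup in PyTorch; a tiny stand-in module replaces the actual Wav2Vec 2.0 encoder so the example runs offline, and all dimensions, pooling choices, and the 4-class label set are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SERClassifier(nn.Module):
    """Utterance-level emotion classifier: encoder + mean pool + linear head."""

    def __init__(self, encoder: nn.Module, hidden_dim: int, num_classes: int = 4):
        super().__init__()
        self.encoder = encoder            # pretrained encoder, updated during V-FT
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, hidden_dim) frame-level inputs
        hidden = self.encoder(features)   # contextualized frame representations
        pooled = hidden.mean(dim=1)       # mean-pool over the time axis
        return self.head(pooled)          # utterance-level emotion logits

# Stand-in encoder; in practice this would be a pretrained Wav2Vec 2.0 model
# fine-tuned end to end together with the head.
encoder = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
model = SERClassifier(encoder, hidden_dim=64)

x = torch.randn(2, 50, 64)               # 2 utterances, 50 frames each
labels = torch.tensor([0, 2])            # hypothetical emotion class indices
logits = model(x)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                           # gradients reach the encoder: V-FT
```

TAPT and P-TAPT differ from this baseline in the objective used before or during fine-tuning, not in the basic encoder-plus-head structure shown here.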