Computer Science
Benchmark
Artificial Intelligence
Supervised Learning
Machine Learning
Semi-supervised Learning
Speech Recognition
Representation Learning
Natural Language Processing
Artificial Neural Networks
Authors
Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
Source
Journal: Cornell University - arXiv
Date: 2021-01-01
Identifier
DOI: 10.48550/arxiv.2110.07313
Abstract
Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks. We combine the well-known wav2vec 2.0 framework, which has shown success in self-supervised learning for speech tasks, with parameter-efficient conformer architectures. Our self-supervised pre-training can reduce the need for labeled data by two-thirds. On the AudioSet benchmark, we achieve a mean average precision (mAP) score of 0.415, which is a new state-of-the-art on this dataset through audio-only self-supervised learning. Our fine-tuned conformers also surpass or match the performance of previous systems pre-trained in a supervised way on several downstream tasks. We further discuss the important design considerations for both pre-training and fine-tuning.
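The mAP figure reported on AudioSet is the macro-averaged mean average precision for multi-label audio tagging: per-class average precision (mean of the precision values at each rank where a positive clip is retrieved), averaged over all classes. A minimal NumPy sketch of this metric, assuming small illustrative label/score matrices (this is not the paper's evaluation code):

```python
import numpy as np

def average_precision(y_true, y_score):
    """AP for one class: mean of precision@k over the ranks k
    at which a positive example appears."""
    order = np.argsort(-y_score)           # rank clips by descending score
    hits = y_true[order].astype(float)     # 1 wherever a positive is retrieved
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return (precision_at_k * hits).sum() / hits.sum()

def mean_average_precision(Y_true, Y_score):
    """mAP: per-class AP averaged over all classes (macro average)."""
    return np.mean([average_precision(Y_true[:, c], Y_score[:, c])
                    for c in range(Y_true.shape[1])])

# Toy example: 4 clips, 2 classes (multi-label, hypothetical scores).
Y_true = np.array([[1, 0],
                   [0, 1],
                   [1, 1],
                   [0, 0]])
Y_score = np.array([[0.9, 0.6],
                    [0.3, 0.8],
                    [0.6, 0.2],
                    [0.1, 0.4]])
print(round(mean_average_precision(Y_true, Y_score), 3))  # -> 0.875
```

On AudioSet the same computation runs over 527 sound-event classes; a score of 0.415 means the per-class APs average to 0.415.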