Computer science
Hyperparameter
Task (project management)
Convolution (computer science)
Feature (linguistics)
Artificial intelligence
Machine learning
Data mining
Artificial neural network
Linguistics
Philosophy
Management
Economics
Identifier
DOI:10.1016/j.procs.2023.08.156
Abstract
Multi-task learning (MTL), an important branch of machine learning, has been successfully applied to many fields, and its practical effectiveness has been demonstrated. However, soft parameter sharing models, represented by the multi-gate mixture-of-experts (MMOE), still have some disadvantages, including negative transfer, the seesaw phenomenon, and inadequate utilization of shared information. Although existing research has mitigated these issues, it has also introduced new problems, such as high model complexity and difficulty in hyperparameter tuning. To address these issues, we propose a multi-gate mixture-of-experts model based on attention and convolution (AC-MMOE), which incorporates a multi-layer perceptron-based attention module and a column convolution module. AC-MMOE applies attention for feature extraction and convolution to integrate the outputs of the shared substructures, which improves the feature extraction and information fusion abilities of the model without significantly increasing the model training cost. We validate the performance of AC-MMOE on several MTL datasets; the experimental results show that our model achieves better results than other baselines on datasets with different task correlations.
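Since the abstract only describes the architecture at a high level, the following PyTorch snippet is a minimal, hedged reading of the AC-MMOE design: an MLP-based attention module re-weights the input features, standard MMOE experts and per-task gates follow, and a 1D "column" convolution fuses the stacked expert outputs before the gate-weighted combination. All layer shapes, the exact form of the column convolution, and the placement of each module are assumptions for illustration, not the authors' specification.

```python
# A minimal sketch of the AC-MMOE idea from the abstract.
# Layer shapes and the form of the "column convolution" are assumed;
# the paper's actual architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACMMOE(nn.Module):
    def __init__(self, input_dim, expert_dim, num_experts, num_tasks):
        super().__init__()
        # MLP-based attention over input features (assumed form):
        # produces per-feature weights that re-scale the raw input.
        self.attention = nn.Sequential(
            nn.Linear(input_dim, input_dim),
            nn.ReLU(),
            nn.Linear(input_dim, input_dim),
            nn.Sigmoid(),
        )
        # Shared experts, as in standard MMOE.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts)
        )
        # One softmax gate per task, as in MMOE.
        self.gates = nn.ModuleList(
            nn.Linear(input_dim, num_experts) for _ in range(num_tasks)
        )
        # Assumed "column convolution": a 1D convolution across the
        # expert axis that mixes the shared substructures' outputs
        # before the per-task gating.
        self.column_conv = nn.Conv1d(num_experts, num_experts, kernel_size=1)
        # Simple per-task towers.
        self.towers = nn.ModuleList(
            nn.Linear(expert_dim, 1) for _ in range(num_tasks)
        )

    def forward(self, x):
        # Attention-based feature extraction.
        x = x * self.attention(x)
        # Stack expert outputs: (batch, num_experts, expert_dim).
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        # Convolutional fusion across experts.
        expert_out = self.column_conv(expert_out)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = F.softmax(gate(x), dim=-1)              # (batch, num_experts)
            fused = (w.unsqueeze(-1) * expert_out).sum(dim=1)
            outputs.append(tower(fused))
        return outputs
```

Under this reading, the attention module and the 1x1 convolution add only a few small layers on top of MMOE, which is consistent with the claim that feature extraction and information fusion improve without significantly increasing training cost.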