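Transformer
Encoder
Computer science
Decoding methods
Computer engineering
Artificial intelligence
Engineering
Algorithm
Voltage
Electrical engineering
Operating system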
Authors
Xiaocui Zhu,Qunsheng Ruan,Sai Qian,Miaohui Zhang
Identifier
DOI:10.21203/rs.3.rs-4782985/v1
Abstract
In recent years, State Space Models (SSMs) have achieved significant advancements in the field of language modeling. With the advent of Mamba, these models have garnered even greater attention, surpassing Transformers in certain aspects. Despite Mamba’s unique advantages, Transformers remain indispensable due to their complex computational capabilities and proven effectiveness. This paper proposes a novel model that combines the strengths of both Transformers and Mamba. Specifically, our model employs the Transformer’s encoder for encoding and uses Mamba as the decoder for decoding. We introduce a feature fusion technique that integrates the features generated by the encoder with the hidden states produced by the decoder. This approach amalgamates the advantages of both Transformer and Mamba, resulting in enhanced performance. Extensive experiments on various language tasks demonstrate that our proposed model achieves competitive results, consistently outperforming existing benchmarks.
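To make the described data flow concrete, the following is a minimal sketch only, not the authors' implementation. The abstract does not specify the fusion operator, layer sizes, or any training details, so everything here is an assumption for illustration: `self_attention` is a single-head attention with identity projections standing in for the Transformer encoder, `ssm_decode` is a diagonal, non-selective linear state-space recurrence standing in for Mamba, and the fusion is a hypothetical fixed-weight blend (`gate`) of each decoder hidden state with the mean-pooled encoder features.

```python
import math

def self_attention(xs):
    """Single-head self-attention with identity Q/K/V projections.
    A toy stand-in for a Transformer encoder layer (assumption)."""
    d = len(xs[0])
    out = []
    for q in xs:
        # Scaled dot-product scores of this query against every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output row is a convex combination of the input rows.
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out

def ssm_decode(inputs, enc_feats, decay=0.9, gate=0.5):
    """Diagonal linear state-space recurrence with a simple fusion step.
    A simplified, non-selective stand-in for a Mamba decoder (assumption);
    `decay` and `gate` are hypothetical hyperparameters."""
    d = len(inputs[0])
    # Mean-pool the encoder features into one context vector.
    context = [sum(f[j] for f in enc_feats) / len(enc_feats) for j in range(d)]
    h = [0.0] * d
    outputs = []
    for x in inputs:
        # State update: h_t = decay * h_{t-1} + (1 - decay) * x_t
        h = [decay * hj + (1 - decay) * xj for hj, xj in zip(h, x)]
        # Fusion: blend the hidden state with the encoder context.
        outputs.append([(1 - gate) * hj + gate * cj for hj, cj in zip(h, context)])
    return outputs

source = [[1.0, 0.0], [0.0, 1.0]]               # toy source embeddings
target = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]   # toy target embeddings
fused = ssm_decode(target, self_attention(source))
```

In this sketch the fused vector is only emitted as output and is not fed back into the recurrence; whether the paper's fusion affects the decoder state itself is not stated in the abstract.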