Keywords
Computer science, Sentence, Machine translation, Encoder, Artificial intelligence, Transformer, Decoding methods, Natural language processing, Context, Encoding, Context model, Source code, Speech recognition, Algorithm
Authors
Hongfei Xu,Deyi Xiong,Josef van Genabith,Qiuhui Liu
Identifier
DOI:10.24963/ijcai.2020/544
Abstract
Existing Neural Machine Translation (NMT) systems are generally trained on a large amount of sentence-level parallel data, and during prediction sentences are translated independently, ignoring cross-sentence contextual information. This leads to inconsistency between translated sentences. To address this issue, context-aware models have been proposed. However, document-level parallel data constitutes only a small part of the available parallel data, and many approaches build context-aware models on top of a pre-trained, frozen sentence-level translation model in a two-step training manner. The computational cost of these approaches is usually high. In this paper, we propose to make the most of layers pre-trained on sentence-level data for contextual representation learning, reusing representations from the sentence-level Transformer and significantly reducing the cost of incorporating contexts in translation. We find that representations from shallow layers of a pre-trained sentence-level encoder play a vital role in source context encoding, and propose to perform source context encoding upon weighted combinations of the pre-trained encoder layers' outputs. Instead of encoding the source context and the input separately, we propose to iteratively and jointly encode the source input and its contexts, and to generate input-aware context representations with a cross-attention layer and a gating mechanism that resets irrelevant information in context encoding. Our context-aware Transformer model outperforms the recent CADec [Voita et al., 2019c] on the English-Russian subtitle data and is about twice as fast in training and decoding.
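The weighted combination of pre-trained encoder layers and the gated cross-attention described in the abstract can be illustrated with a short PyTorch sketch. This is not the authors' released implementation: the module name LayerWeightedContextEncoder, the single reset-style gate, and the use of torch.nn.MultiheadAttention are assumptions made for illustration only; the paper's actual architecture (number of layers, normalisation, and the iterative joint encoding schedule) may differ.

```python
import torch
import torch.nn as nn

class LayerWeightedContextEncoder(nn.Module):
    """Illustrative sketch (not the authors' code): combine the per-layer
    outputs of a frozen sentence-level encoder with learned softmax weights,
    then fuse the current source input with its context sentences via
    cross-attention followed by a reset-style gate."""

    def __init__(self, num_layers: int, d_model: int, num_heads: int = 8):
        super().__init__()
        # One scalar weight per pre-trained encoder layer (softmax-normalised).
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        # Cross-attention: the source input attends to the context representation.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Gate that can suppress (reset) context features irrelevant to the input.
        self.gate = nn.Linear(2 * d_model, d_model)

    def combine_layers(self, layer_outputs):
        # layer_outputs: list of (batch, seq, d_model) tensors from the frozen encoder.
        weights = torch.softmax(self.layer_logits, dim=0)
        stacked = torch.stack(layer_outputs, dim=0)           # (L, B, S, D)
        return (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # (B, S, D)

    def forward(self, input_layers, context_layers):
        src = self.combine_layers(input_layers)    # current source sentence
        ctx = self.combine_layers(context_layers)  # preceding context sentences
        # Input-aware context: each source position queries the context.
        attended, _ = self.cross_attn(query=src, key=ctx, value=ctx)
        # Gating mechanism resets context information that does not help the input.
        g = torch.sigmoid(self.gate(torch.cat([src, attended], dim=-1)))
        return src + g * attended

# Hypothetical usage with a 6-layer frozen sentence-level encoder (d_model=512):
enc = LayerWeightedContextEncoder(num_layers=6, d_model=512)
inp = [torch.randn(2, 20, 512) for _ in range(6)]   # per-layer outputs, input sentence
ctx = [torch.randn(2, 60, 512) for _ in range(6)]   # per-layer outputs, context sentences
fused = enc(inp, ctx)                                # (2, 20, 512)
```

In this sketch, the softmax over layer_logits lets the model up-weight shallow encoder layers if they prove most useful for source context encoding, mirroring the observation in the abstract, while the sigmoid gate can drive irrelevant context features toward zero.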