Perplexity
Computer science
Language model
Transformer
Artificial intelligence
Hyperparameter
Treebank
Natural language processing
Dependency (UML)
Engineering
Electrical engineering
Voltage
Authors
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Source
Journal: Cornell University - arXiv
Date: 2019-01-01
Citations: 439
Identifiers
DOI: 10.48550/arxiv.1901.02860
Abstract
Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.
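To make the segment-level recurrence idea from the abstract concrete, the sketch below is a rough illustration, not the authors' released Tensorflow/PyTorch code: hidden states from the previous segment are cached and reused, with gradients stopped, as extra attention context for the current segment. Causal masking and the paper's relative positional encoding scheme are omitted for brevity, and names such as MemAttention, mem_len, and seg_len are made up for this example.

import torch
import torch.nn as nn

class MemAttention(nn.Module):
    # Single-head self-attention in which queries come from the current
    # segment while keys/values span [cached memory; current segment].
    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_proj = nn.Linear(d_model, 2 * d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, mem):
        # x:   (batch, seg_len, d_model)  current segment
        # mem: (batch, mem_len, d_model)  states cached from the previous segment
        ctx = torch.cat([mem, x], dim=1)            # extended context
        q = self.q_proj(x)                          # queries: current tokens only
        k, v = self.kv_proj(ctx).chunk(2, dim=-1)   # keys/values: memory + current
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

seg_len, mem_len, d_model = 4, 4, 8
layer = MemAttention(d_model)
mem = torch.zeros(1, mem_len, d_model)              # empty memory for the first segment
for segment in torch.randn(3, 1, seg_len, d_model): # three consecutive segments
    out = layer(segment, mem)
    # Stop gradients into the cache and keep only the most recent mem_len states,
    # so the usable context extends across segment boundaries without
    # backpropagating through earlier segments.
    mem = torch.cat([mem, out.detach()], dim=1)[:, -mem_len:]

Because the cache is detached, training cost per segment stays constant while the effective context at evaluation time can span many previous segments, which is the mechanism the abstract credits for both the longer learned dependency and the large evaluation speedup.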