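Computer science
Artificial intelligence
Dependency (UML)
Transformer
Informatics
Pattern recognition (psychology)
Data mining
Engineering
Electrical engineering
Voltage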
Authors
Wu Lee, Yuliang Shi, Han Yu, Lin Cheng, Xinjun Wang, Zhongmin Yan, Fanyu Kong
Identifier
DOI: 10.1109/tpami.2025.3593657
Abstract
Transformers based on the Self-Attention (SA) mechanism have demonstrated unrivaled superiority in numerous areas. Compared to RNN-based networks, Transformers can learn the temporal dependency representation of an entire sequence in parallel while handling long-range dependencies efficiently. However, the $\mathcal{O}(L^{2})$ computational complexity of the SA mechanism (where $L$ denotes the sequence length) and its high memory usage make Transformer-based models prohibitively expensive to build. To address these challenges, we propose a Transformer-like model, HPformer: Low-Parameter Transformer with Temporal Dependency Hierarchical Propagation. HPformer first chunks the sequence into $K$ segments, where $K = \left\lceil \log{L} \right\rceil + 1$ and $\left\lceil \cdot \right\rceil$ denotes the ceiling operation. It then uses a hierarchical propagation mechanism with $\mathcal{O}(L)$ computational complexity to learn the temporal dependencies between and within the segments, and ultimately generates $K$ vectors that serve as the $Key$ matrices. This reduces the complexity of the SA mechanism from $\mathcal{O}(L^{2})$ to $\mathcal{O}(L\log{L})$. In addition, we share the $Key$ and $Value$ matrices between layers when building HPformer, which reduces memory usage. Extensive experiments on a public health informatics benchmark and the Long-Range Arena (LRA) benchmark demonstrate that HPformer has advantages over Transformer-based models in terms of memory usage and efficiency.
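
The complexity claim in the abstract can be illustrated with a small arithmetic sketch. It is not the authors' implementation: the function names `num_segments` and `attention_score_counts` are hypothetical, and because the abstract does not state the base of the logarithm, base 2 is assumed here. The sketch only counts query-key score entries, comparing $L$ queries against $L$ keys (standard SA) with $L$ queries against $K$ keys.

```python
import math


def num_segments(seq_len: int) -> int:
    """Return K = ceil(log L) + 1 for a sequence of length L.

    Assumption: the logarithm is base 2 (the abstract does not say).
    """
    if seq_len < 1:
        raise ValueError("seq_len must be a positive integer")
    return math.ceil(math.log2(seq_len)) + 1


def attention_score_counts(seq_len: int) -> tuple[int, int]:
    """Count query-key score entries for two attention layouts.

    Returns (full, reduced), where
      full    = L * L  (every position attends to every position)
      reduced = L * K  (every position attends to K key vectors)
    """
    k = num_segments(seq_len)
    return seq_len * seq_len, seq_len * k


if __name__ == "__main__":
    for length in (256, 1024, 4096):
        full, reduced = attention_score_counts(length)
        print(f"L={length}: K={num_segments(length)}, "
              f"L*L={full}, L*K={reduced}")
```

Under the base-2 assumption, a sequence of length 1024 yields $K = 11$, so the score matrix shrinks from $1024 \times 1024$ entries to $1024 \times 11$. This counts only the attention scores; the cost of producing the $K$ key vectors through hierarchical propagation is stated in the abstract as $\mathcal{O}(L)$ and is not modeled here.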