前进 (2024-07-31 11:35):
#paper DOI: https://doi.org/10.48550/arXiv.2006.16236 Katharopoulos A, Vyas A, Pappas N, et al. Transformers are RNNs: Fast autoregressive transformers with linear attention[C]//International Conference on Machine Learning. PMLR, 2020: 5156-5165. This paper proposes a linear Transformer that expresses self-attention as a linear dot-product of kernel feature maps and exploits the associativity of matrix products, reducing the cost of processing long sequences from O(N^2) to O(N) in the sequence length. The authors show that the model matches the performance of the standard Transformer while being up to 4000x faster at autoregressive prediction of very long sequences. The paper also examines the relationship between Transformers and recurrent neural networks (RNNs), showing that with a suitable reformulation a Transformer can run autoregressive prediction as efficiently as an RNN.
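A minimal sketch of the core trick described above, assuming the elu(x) + 1 feature map used in the paper; array names, shapes, and the eps constant are illustrative, not the authors' reference implementation. Computing phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V never materializes the N x N attention matrix, which is where the O(N^2) to O(N) reduction comes from.

import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: positive feature map used in the paper
    return np.maximum(x, 0.0) + np.exp(np.minimum(x, 0.0))

def linear_attention(Q, K, V, eps=1e-6):
    # Non-causal linear attention. Q, K: (N, d_k), V: (N, d_v).
    # (phi(Q) @ phi(K).T) @ V costs O(N^2 d); by associativity,
    # phi(Q) @ (phi(K).T @ V) costs O(N d^2) and avoids the N x N matrix.
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)   # (N, d_k)
    KV = Kf.T @ V                                     # (d_k, d_v)
    Z = Qf @ Kf.sum(axis=0) + eps                     # (N,) per-query normalizer
    return (Qf @ KV) / Z[:, None]                     # (N, d_v)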
arXiv, 2020-06-29T17:55:38Z. DOI: 10.48550/arXiv.2006.16236
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Abstract:
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
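A sketch of the iterative, RNN-like formulation the abstract alludes to, under the same elu(x) + 1 feature map assumption: for causal (autoregressive) attention, the prefix sums over keys and values can be carried as a fixed-size state, so each generated token costs O(1) rather than O(N). The state names S and z and the toy decoding loop are illustrative only.

import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1 (same feature map as the sketch above)
    return np.maximum(x, 0.0) + np.exp(np.minimum(x, 0.0))

def causal_step(q, k, v, S, z, eps=1e-6):
    # One autoregressive step of linear attention, viewed as an RNN update.
    #   q, k: (d_k,), v: (d_v,)
    #   S: (d_k, d_v) running sum of phi(k_j) v_j^T over the prefix
    #   z: (d_k,)     running sum of phi(k_j) over the prefix
    qf, kf = elu_feature_map(q), elu_feature_map(k)
    S = S + np.outer(kf, v)                 # update memory
    z = z + kf                              # update normalizer
    out = (qf @ S) / (qf @ z + eps)         # attention output at this position
    return out, S, z

# Toy decoding loop (learned projections omitted): constant memory, O(1) per token.
d_k, d_v, N = 16, 16, 8
S, z = np.zeros((d_k, d_v)), np.zeros(d_k)
for _ in range(N):
    q, k, v = np.random.randn(d_k), np.random.randn(d_k), np.random.randn(d_v)
    y, S, z = causal_step(q, k, v, S, z)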