Vincent (2025-11-30 21:07):
#paper https://arxiv.org/abs/2104.09864 arXiv. 2021. RoFormer: Enhanced Transformer with Rotary Position Embedding. This paper proposes RoFormer, a Transformer enhanced with Rotary Position Embedding (RoPE). Where conventional Transformers add absolute or relative position vectors to the token representations, RoPE takes a different route: it applies a position-dependent rotation to the query and key vectors, so that self-attention naturally expresses relative position at the dot-product stage. The method is mathematically clean and lightweight to implement, models long-range dependencies better, and remains fully compatible with efficient variants such as linear attention. Experiments show that RoFormer consistently outperforms conventional position-encoding schemes on several long-text tasks, gaining stronger representations with no extra training cost, which points to broad applicability in larger language models and complex sequence tasks.
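To make the "rotation at the dot-product stage" idea concrete, here is a minimal NumPy sketch of the interleaved pairwise rotation described above. This is my own illustration, not the authors' reference code; the function name `rope` is mine, and the default base 10000 follows the paper's frequency choice theta_i = 10000^(-2(i-1)/d).

```python
# Minimal RoPE sketch (illustrative, not the paper's reference implementation).
# Each consecutive (even, odd) pair of coordinates is rotated by an angle
# proportional to the token position, with a different frequency per pair.
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply the rotary transform to a single head vector x at position `pos`."""
    d = x.shape[-1]
    assert d % 2 == 0, "head dimension must be even"
    theta = base ** (-np.arange(0, d, 2) / d)   # theta_i = base^(-2(i-1)/d)
    angles = pos * theta                        # shape (d/2,)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                   # split into 2-D pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # rotate each pair by `angles`
    out[1::2] = x1 * sin + x2 * cos
    return out

# Because every rotation is orthogonal, the query/key dot product depends only
# on the relative offset between the two positions (here 7 - 4 == 3 - 0):
q, k = np.random.randn(64), np.random.randn(64)
print(np.allclose(rope(q, 7) @ rope(k, 4), rope(q, 3) @ k))  # True
```

The final check is the whole point: rotating q and k by their absolute positions leaves an attention score that is a function of the offset alone, which is why the method needs no relative-position table and no extra parameters.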
arXiv, 2021-04-20T09:54:06Z. DOI: 10.48550/arXiv.2104.09864
RoFormer: Enhanced Transformer with Rotary Position Embedding
Abstract:
Position encoding has recently been shown to be effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In this paper, we first investigate various methods to integrate positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in the self-attention formulation. Notably, RoPE enables valuable properties, including the flexibility of sequence length, decaying inter-token dependency with increasing relative distances, and the capability of equipping linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. Furthermore, we provide a theoretical analysis to explain some experimental results. RoFormer is already integrated into Huggingface: \url{https://huggingface.co/docs/transformers/model_doc/roformer}.
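For reference, the property the abstract describes ("encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency") can be written out in one line. This is my restatement under the assumption that R_{\Theta,m} denotes the paper's block-diagonal rotation matrix for position m:

```latex
% Sketch of the key identity behind RoPE (my restatement, notation assumed).
\[
  \langle R_{\Theta,m}\,\mathbf{q},\; R_{\Theta,n}\,\mathbf{k}\rangle
  \;=\; \mathbf{q}^{\top} R_{\Theta,m}^{\top} R_{\Theta,n}\,\mathbf{k}
  \;=\; \mathbf{q}^{\top} R_{\Theta,\,n-m}\,\mathbf{k}
\]
% The attention score depends on positions m and n only through n - m, which
% gives the sequence-length flexibility mentioned in the abstract; with the
% frequencies theta_i = 10000^{-2(i-1)/d}, this interaction also decays as
% the relative distance |n - m| grows.
```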