林海onrush (2022-10-29 13:51):
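#paper, Formal Algorithms for Transformers, url: https://arxiv.org/pdf/2207.09238.pdf. Over the past five-plus years, Transformers have shown remarkable results across many fields. However, descriptions of the Transformer algorithms have mostly relied on diagrams, textual descriptions, or explanations focused on the optimization side, and no paper has given reasonably complete algorithm pseudocode. DeepMind has now published formal algorithm pseudocode; for a detailed walkthrough of the paper, see the PDF below.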
Formal Algorithms for Transformers
Abstract:
This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.