张浩彬 (2024-09-30 17:03):
#paper DOI 10.48550/arXiv.1902.00751 Parameter-Efficient Transfer Learning for NLP. ICML 2019, Google. This paper introduced Adapters and can be regarded as the opening work of the PEFT (parameter-efficient fine-tuning) line of methods. I have recently been organizing the classic PEFT papers on large models for a course, and this one is the most suitable starting point.

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, when there are many downstream tasks, fine-tuning is parameter-inefficient: every task requires an entirely new model. As an alternative, the authors propose transfer with adapter modules. Adapter modules yield a compact and extensible model: they add only a small number of trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate the effectiveness of adapters, the authors transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance while adding only a few parameters per task. On GLUE, they come within 0.4% of the performance of full fine-tuning while adding only 3.6% additional parameters per task; by contrast, fine-tuning trains 100% of the parameters for every task.

The paper points out that earlier domain-adaptation approaches all require training a separate model for each task, and generally fall into two categories: feature-based transfer and fine-tuning. Feature-based transfer uses a pre-trained embedding model to produce features, which are then fed into a task-specific downstream model. A minimal sketch of the adapter idea follows.
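For the course notes, here is a minimal sketch of the adapter idea described above: a small bottleneck (down-projection, nonlinearity, up-projection) wrapped in a residual connection, trained while the pre-trained network stays frozen. The hidden size, bottleneck size, and choice of GELU below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus skip."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()                               # nonlinearity (illustrative choice)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the module close to identity,
        # so the frozen pre-trained representations pass through largely unchanged.
        return x + self.up(self.act(self.down(x)))


# Only the adapter's few parameters are trained per task; the pre-trained
# Transformer weights remain fixed and are shared across all tasks.
hidden_states = torch.randn(2, 16, 768)   # e.g. BERT-base hidden states (hypothetical shapes)
adapter = Adapter(hidden_dim=768)
out = adapter(hidden_states)
print(out.shape)  # torch.Size([2, 16, 768])
```

In the paper such modules are inserted into each Transformer layer; per task one only stores the adapter (and a small number of task-specific) parameters, which is where the roughly 3.6% figure on GLUE comes from.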
arXiv, 2019-02-02T16:29:47Z. DOI: 10.48550/arXiv.1902.00751
Parameter-Efficient Transfer Learning for NLP
Abstract:
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapter's effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.