姗姗来迟 (2023-06-30 13:25):
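#paper Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks https://aclanthology.org/2022.aacl-main.16.pdf Given the challenges and complexity introduced by Dialect Arabic (DA) variation, Transformer-based models such as BERT outperform other models on the DA identification task. Fine-tuning these models, however, requires a large corpus, and collecting many high-quality labeled examples for some Arabic dialect classes is difficult and time-consuming. This paper extends the Transformer-based models ARBERT and MARBERT with Semi-Supervised Generative Adversarial Networks (SS-GAN), so that unlabeled data can be used in a generative adversarial setting. The resulting model produces high-quality embeddings for Arabic dialect examples and helps it generalize better on the downstream classification task.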
ACL Anthology, 2022.
Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks
Abstract:
Given the challenges and complexities introduced while dealing with Dialect Arabic (DA) variations, Transformer based models, e.g., BERT, outperformed other models in dealing with the DA identification task. However, to fine-tune these models, a large corpus is required. Getting a large number of high quality labeled examples for some Dialect Arabic classes is challenging and time-consuming. In this paper, we address the Dialect Arabic Identification task. We extend the transformer-based models, ARBERT and MARBERT, with unlabeled data in a generative adversarial setting using Semi-Supervised Generative Adversarial Networks (SS-GAN). Our model enabled producing high-quality embeddings for the Dialect Arabic examples and aided the model to better generalize for the downstream classification task given few labeled examples. Experimental results showed that our model reached better performance and faster convergence when only a few labeled examples are available.
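To make the SS-GAN idea concrete, here is a minimal sketch of the per-example losses commonly used in semi-supervised GANs, written in plain Python. It is not the paper's implementation. It assumes the discriminator outputs k + 1 logits (k dialect classes plus one extra "fake" class), and the function names `discriminator_loss` and `generator_loss` are hypothetical. Any feature-matching term the generator may use in practice is left out.

```python
import math

EPS = 1e-12  # small constant to keep log() away from zero


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, label=None, is_generated=False):
    """Loss for one example given k + 1 logits (last entry = 'fake' class).

    - generated example: push probability mass onto the 'fake' class
    - real unlabeled example: push mass away from the 'fake' class
    - real labeled example: additionally apply cross-entropy over the
      k real classes (assumed formulation, not taken from the paper)
    """
    p_fake = softmax(logits)[-1]
    if is_generated:
        return -math.log(p_fake + EPS)
    loss = -math.log(1.0 - p_fake + EPS)
    if label is not None:
        real_probs = softmax(logits[:-1])
        loss += -math.log(real_probs[label] + EPS)
    return loss


def generator_loss(logits_on_generated):
    """The generator is rewarded when its output is not classified as fake."""
    p_fake = softmax(logits_on_generated)[-1]
    return -math.log(1.0 - p_fake + EPS)


# Two dialect classes plus the 'fake' class, all logits equal.
uniform = [0.0, 0.0, 0.0]
d_fake = discriminator_loss(uniform, is_generated=True)
d_unlabeled = discriminator_loss(uniform)
d_labeled = discriminator_loss(uniform, label=0)
g = generator_loss(uniform)
```

In this formulation the unlabeled examples contribute only the real-versus-fake term, which is how unlabeled data can shape the encoder even when few labeled examples exist.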