尹志 (2025-01-31 17:05):
#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. A paper on how widely large language models are being used, with a concrete case study of peer review at top AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). The study estimates that 6.5% to 16.9% of the review text at these venues may have been substantially modified by LLMs, and these reviews show some interesting traits: lower reported confidence, submission close to the deadline, and reviewers less likely to respond to author rebuttals. More interesting findings in the paper itself. It also lists the adjectives AI most likes to use, such as "commendable", "meticulous", and "intricate". Sounds exactly like something an AI would write, haha. Looks like reviewers will have to be more accountable to authors from now on.
arXiv, 2024-03-11T21:51:39Z. DOI: 10.48550/arXiv.2403.07183
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Abstract:
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
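The corpus-level estimate described in the abstract can be pictured as fitting the weight of a two-component mixture: each word is assumed to come from a human word distribution with probability 1 - alpha or from an AI word distribution with probability alpha, and alpha is chosen by maximum likelihood. A minimal sketch of that idea, with toy word probabilities that are purely illustrative (not the paper's actual reference distributions or vocabulary):

```python
import math

def estimate_alpha(tokens, p_human, p_ai, grid=1000):
    """Fit the mixture weight alpha by grid-search maximum likelihood:
    maximize sum over tokens of log((1 - alpha) * p_human + alpha * p_ai).
    p_human / p_ai map each token to its probability under the
    human-written and AI-generated reference distributions."""
    best_alpha, best_ll = 0.0, float("-inf")
    for i in range(grid + 1):
        alpha = i / grid
        ll = sum(
            math.log((1 - alpha) * p_human.get(t, 1e-9)
                     + alpha * p_ai.get(t, 1e-9))
            for t in tokens
        )
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

# Toy probabilities: "commendable" is far more likely under the AI
# distribution, "good" more likely under the human one.
p_human = {"commendable": 0.001, "good": 0.05}
p_ai = {"commendable": 0.02, "good": 0.03}

# A corpus with a few AI-flavored words yields an intermediate alpha.
tokens = ["commendable"] * 3 + ["good"] * 7
print(estimate_alpha(tokens, p_human, p_ai))
```

The point of working at the corpus level, as the abstract notes, is that alpha can be estimated reliably even when no single review could be flagged with confidence on its own.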