尹志
(2025-01-31 17:05):
#paper https://doi.org/10.48550/arXiv.2403.07183 Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
A paper on large language model usage in the wild, with a concrete case study of LLM use in peer reviews at top AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). The study estimates that 6.5% to 16.9% of these reviews may have been substantially modified by LLMs. Such reviews share some interesting traits: they report lower confidence, tend to be submitted close to the deadline, and come from reviewers who are less likely to respond to author rebuttals. See the paper for more interesting findings. It also lists the adjectives AI most favors, such as "commendable", "meticulous", and "intricate", which do sound very AI-like, haha. Looks like reviewers will need to be more accountable to authors from now on.
arXiv, 2024-03-11T21:51:39Z.
DOI: 10.48550/arXiv.2403.07183
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Abstract:
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
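The core idea of the paper's corpus-level estimator can be sketched as a two-component mixture: each document is scored under a human-written reference distribution and an AI-generated reference distribution, and the mixture weight alpha (the fraction of LLM-modified text) is fit by maximum likelihood. The sketch below is a toy illustration of that idea, not the authors' implementation; the unigram distributions, the probability floor for unseen tokens, and the grid search are all simplifying assumptions.

```python
import math

def estimate_llm_fraction(docs, p_ai, p_human, grid=1000):
    """Grid-search MLE for alpha in P(x) = alpha*p_ai(x) + (1-alpha)*p_human(x).

    docs: list of documents, each a list of tokens.
    p_ai, p_human: dicts mapping token -> probability under each reference
    corpus (toy unigram stand-ins for the paper's reference distributions).
    Returns the alpha maximizing the corpus log-likelihood.
    """
    floor = 1e-9  # small floor so unseen tokens don't zero out a document
    doc_loglik = []
    for doc in docs:
        la = lh = 0.0
        for tok in doc:
            la += math.log(p_ai.get(tok, floor))
            lh += math.log(p_human.get(tok, floor))
        doc_loglik.append((la, lh))

    best_alpha, best_ll = 0.0, float("-inf")
    for i in range(grid + 1):
        alpha = i / grid
        ll = 0.0
        for la, lh in doc_loglik:
            # log(alpha*exp(la) + (1-alpha)*exp(lh)), computed stably
            m = max(la, lh)
            mix = alpha * math.exp(la - m) + (1 - alpha) * math.exp(lh - m)
            ll += m + math.log(mix + 1e-300)
        if ll > best_ll:
            best_ll, best_alpha = ll, alpha
    return best_alpha
```

On a toy corpus where half the documents favor AI-flavored words ("commendable") and half favor plain human wording, the estimator recovers alpha near 0.5, which is the corpus-level behavior the paper exploits even when no single review can be confidently classified.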