ZĒNG Yíngzhū (Zoo) 曾莹珠
(2026-04-21 09:47):
#paper
Sycophantic AI decreases prosocial intentions and promotes dependence.
Myra Cheng et al.
2026
https://doi.org/10.1126/science.aec8352
The researchers analyzed the degree of sycophancy in 11 mainstream AI models using three datasets: everyday advice-seeking queries; posts from the Reddit forum "Am I the Asshole" where the community agreed the poster was in the wrong; and descriptions of behavior that harms oneself or others. The AI models affirmed users 47-51% more often than humans did.
They then ran three experiments:
Experiment 1: 2 (AI sycophancy: yes vs. no) × 2 (response style: anthropomorphic vs. mechanical).
Experiment 2: 2 (AI sycophancy: yes vs. no) × 2 (perceived source of the response: human vs. AI).
Experiment 3: participants recalled an interpersonal conflict of their own; 2 (AI sycophancy: yes vs. no).
Results: participants in the sycophancy condition were less willing to apologize and less likely to take steps to repair the situation or change their own behavior. The effect was not moderated by the AI's response style or the perceived source of the response, and it remained significant after controlling for scenario and participant demographics.
Participants also trusted the sycophantic AI more, rated the quality of its responses higher, and were more willing to use those sycophantic models again.
My takeaway after reading: as an individual user, you can explicitly ask the AI to point out where you went wrong, give it clear instructions, and push back on its answers.
Another trick is to flip the framing: when describing a problem, cast the other party as yourself and describe it from that angle. For example, when I actually want my manuscript polished so that it gets accepted, I instead tell the AI that I am reviewing someone else's submission and ask it to list the reasons I would reject it (see the sketch below).
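A minimal sketch of that reversed-framing prompt, assuming the OpenAI Python SDK; the model name, file name, and prompt wording are illustrative, not from the paper:

```python
# Minimal sketch: flip the framing so the model critiques instead of flatters.
# Assumes the OpenAI Python SDK; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

manuscript = open("draft.txt", encoding="utf-8").read()  # hypothetical draft file

# Instead of "please polish my draft" (which invites agreement and flattery),
# ask the model to act as a skeptical reviewer of someone else's work.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat model
    messages=[
        {"role": "system",
         "content": "You are a strict peer reviewer deciding whether to reject a submission."},
        {"role": "user",
         "content": "List the reasons you would reject this manuscript, "
                    "ordered by severity, with a concrete fix for each:\n\n" + manuscript},
    ],
)
print(response.choices[0].message.content)
```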
Science, 2026-03-26.
DOI: 10.1126/science.aec8352
Sycophantic AI decreases prosocial intentions and promotes dependence
Myra Cheng,
Cinoo Lee,
Pranav Khadpe,
Sunny Yu,
Dyllan Han,
Dan Jurafsky
Abstract:
Despite rising concerns about sycophancy—excessive agreement or flattery from artificial intelligence (AI) systems—little is known about its prevalence or consequences. We show that sycophancy is widespread and harmful. Across 11 state-of-the-art models, AI affirmed users' actions 49% more often than humans, even when queries involved deception, illegality, or other harms. In three preregistered experiments (N = 2405), even a single interaction with sycophantic AI reduced participants' willingness to take responsibility and repair interpersonal conflicts, while increasing their conviction that they were right. Despite distorting judgment, sycophantic models were trusted and preferred. This creates perverse incentives for sycophancy to persist: The very feature that causes harm also drives engagement. Our findings underscore the need for design, evaluation, and accountability mechanisms to protect user well-being.