Literature Collection and Sharing Platform

刘昊辰 (2025-09-08 15:13):

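#paper Gemini 2.5 Pro Capable of Winning Gold at IMO 2025. The authors build a self-verification pipeline (initial solution, self-improvement, verification and error correction) with carefully designed prompts, and use Google's Gemini 2.5 Pro to solve 5 of the 6 problems of the 2025 International Mathematical Olympiad (IMO 2025). To avoid data contamination, only the newly released IMO 2025 problems are used as the test set. The study also compares solving with hints (such as mathematical induction or analytic geometry) against solving without hints, and finds that hints mainly improve efficiency rather than create new capabilities. It further notes that the model failed on Problem 6 because of an incorrect assumption. The overall conclusion is that a strong LLM combined with a sound strategy can achieve high-level mathematical reasoning, approaching human gold-medal level. Download: https://arxiv.org/pdf/2507.15855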
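
The generate, improve, verify, and correct loop summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `solve_with_verification`, the `generate` / `improve` / `verify` callables, and `max_rounds` are hypothetical names, and the toy functions stand in for actual model calls.

```python
from typing import Callable, Optional, Tuple


def solve_with_verification(
    problem: str,
    generate: Callable[[str], str],
    improve: Callable[[str, str, str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Hypothetical sketch: draft, self-improve, then verify and correct."""
    solution = generate(problem)                # initial solution
    solution = improve(problem, solution, "")   # self-improvement, no report yet
    for attempt in range(max_rounds):
        ok, report = verify(problem, solution)  # verification step
        if ok:
            return solution
        if attempt < max_rounds - 1:
            # Correction step: revise the solution using the verifier's report.
            solution = improve(problem, solution, report)
    return None  # no verified solution within the round budget


# Toy stand-ins (assumptions for illustration only; a real setup would call an LLM).
def toy_generate(problem: str) -> str:
    return "6"  # deliberately wrong first draft


def toy_verify(problem: str, solution: str) -> Tuple[bool, str]:
    a, b = (int(part) for part in problem.split("+"))
    expected = str(a + b)
    if solution == expected:
        return True, ""
    return False, f"expected {expected}"


def toy_improve(problem: str, solution: str, report: str) -> str:
    prefix = "expected "
    if report.startswith(prefix):
        return report[len(prefix):]
    return solution  # nothing to fix without a report


result = solve_with_verification("2+3", toy_generate, toy_improve, toy_verify)
print(result)
```

Passing the model calls in as plain callables keeps the control flow separate from any particular API. The toy verifier leaks the expected answer only so that the loop can be exercised end to end.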

arXiv, 2025-07-21T17:59:49Z. DOI: 10.48550/arXiv.2507.15855

Gemini 2.5 Pro Capable of Winning Gold at IMO 2025


Yichen Huang, Lin F. Yang

Abstract:

The International Mathematical Olympiad (IMO) poses uniquely challenging problems requiring deep insight, creativity, and formal reasoning. While Large Language Models (LLMs) perform well on mathematical benchmarks like AIME, they struggle with Olympiad-level tasks. We use Google's Gemini 2.5 Pro on the newly released IMO 2025 problems, avoiding data contamination. Using a self-verification pipeline with careful prompt design, 5 (out of 6) problems are solved correctly. This result underscores the importance of developing optimal strategies to harness the full potential of powerful LLMs for complex reasoning tasks.


Related Links:

https://doi.org/10.48550/arXiv.2507.15855