刘昊辰 (2025-09-08 15:13):
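#paper Gemini 2.5 Pro Capable of Winning Gold at IMO 2025. The authors built a self-verification pipeline (initial solution generation, self-improvement, and verification with error correction) and refined the prompt design. With this pipeline, Google's Gemini 2.5 Pro correctly solved 5 of the 6 problems from the 2025 International Mathematical Olympiad (IMO 2025). To avoid data contamination, the test set consisted only of the newly released IMO 2025 problems. The study also compared solving with hints (e.g., "use mathematical induction", "use analytic geometry") against solving without them, and found that hints mainly improve efficiency rather than unlocking new capabilities. It further notes that the model failed on Problem 6 because of an incorrect assumption. Overall, the work shows that a strong LLM combined with a sound strategy can reach a high level of mathematical reasoning, close to that of a human gold medalist. Download: https://arxiv.org/pdf/2507.15855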
arXiv, 2025-07-21T17:59:49Z. DOI: 10.48550/arXiv.2507.15855
Gemini 2.5 Pro Capable of Winning Gold at IMO 2025
Abstract:
The International Mathematical Olympiad (IMO) poses uniquely challenging problems requiring deep insight, creativity, and formal reasoning. While Large Language Models (LLMs) perform well on mathematical benchmarks like AIME, they struggle with Olympiad-level tasks. We use Google's Gemini 2.5 Pro on the newly released IMO 2025 problems, avoiding data contamination. Using a self-verification pipeline with careful prompt design, 5 (out of 6) problems are solved correctly. This result underscores the importance of developing optimal strategies to harness the full potential of powerful LLMs for complex reasoning tasks.
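To make the shape of the self-verification pipeline concrete, here is a minimal Python sketch of the loop: draft a solution, self-improve it once, then alternate verification and correction until the draft passes several checks in a row or a round budget runs out. Everything here is an assumption for illustration: `solve_with_self_verification`, the prompt wording, and the `max_rounds` / `required_passes` thresholds are hypothetical, not the paper's actual prompts or settings. In the pipeline described above the verifier is itself a model call; here it is passed in as a separate function so the loop can be run with stubs instead of a real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interfaces: a "model" maps a prompt to a text response, and a
# "verifier" returns None when it finds no issues, otherwise a bug report.
Model = Callable[[str], str]
Verifier = Callable[[str, str], Optional[str]]


@dataclass
class PipelineResult:
    solution: str
    accepted: bool
    history: List[str] = field(default_factory=list)


def solve_with_self_verification(problem: str, model: Model, verify: Verifier,
                                 max_rounds: int = 10,
                                 required_passes: int = 3) -> PipelineResult:
    """Illustrative sketch; thresholds and prompts are assumptions, not the paper's."""
    history: List[str] = []
    # Step 1: initial solution attempt.
    solution = model(f"Solve rigorously:\n{problem}")
    history.append("initial")
    # Step 2: one self-improvement pass over the draft.
    solution = model(f"Review and improve this solution:\n{solution}")
    history.append("improve")

    consecutive_passes = 0
    for _ in range(max_rounds):
        # Step 3: verification; None means the verifier found no issues.
        report = verify(problem, solution)
        if report is None:
            consecutive_passes += 1
            history.append("pass")
            if consecutive_passes >= required_passes:
                return PipelineResult(solution, True, history)
        else:
            # Any failure resets the streak of clean verifications.
            consecutive_passes = 0
            history.append("fix")
            # Step 4: correct the solution using the bug report.
            solution = model(f"Fix these issues:\n{report}\n\nSolution:\n{solution}")
    # Budget exhausted without enough consecutive passes: reject.
    return PipelineResult(solution, False, history)


# Toy stand-ins (hypothetical): the "model" returns "fixed" only when asked to
# fix, and the "verifier" flags any solution that has not been fixed yet.
def toy_model(prompt: str) -> str:
    return "fixed" if prompt.startswith("Fix") else "draft"


def toy_verify(problem: str, solution: str) -> Optional[str]:
    return None if solution == "fixed" else "step 2 is unjustified"


result = solve_with_self_verification("toy problem", toy_model, toy_verify)
print(result.accepted, result.solution, result.history)
# prints: True fixed ['initial', 'improve', 'fix', 'pass', 'pass', 'pass']
```

Resetting the pass counter on any failure means a solution is accepted only after several clean verifications in a row, which is one simple way to guard against a verifier that occasionally lets an error slip through.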