OpenAI Model Scores 35/42 to Earn IMO Gold
OpenAI’s experimental reasoning model solved 5 of 6 problems at the 2025 International Mathematical Olympiad, earning a gold-medal-level score of 35/42 points under strict exam conditions.
Competition Conditions
The model operated under the same conditions as human contestants: two 4.5-hour exam sessions, no tools or internet access, reading the official problem statements, and writing natural-language proofs.
After submission, three former IMO medalists independently graded the model’s proofs, with scores finalized only after unanimous consensus.
Elite Contest Context
The 2025 International Mathematical Olympiad took place on the Sunshine Coast, Australia, from July 15 to 16, 2025. Only 67 out of 630 contestants received gold medals in the 2025 IMO, approximately 10 percent.
Technical Innovation
The IMO, held annually since 1959, has long been referred to as the “Mount Everest” of AI reasoning challenges.
The result breaks new ground in general-purpose reinforcement learning and test-time compute scaling. OpenAI emphasized that “o1 thought for seconds. Deep Research for minutes. This one thinks for hours.” Emerging studies also show that optimal test-time compute scaling can be more effective than scaling model parameters for reasoning tasks.
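To make the idea of test-time compute scaling concrete, here is a minimal, purely illustrative sketch of one common form of it: self-consistency, where a model is sampled many times and the majority answer wins. The `solve_once` function is a hypothetical stand-in for a single noisy model sample, not anything from OpenAI's system; the point is only that spending more inference-time compute (more samples) improves accuracy without touching model parameters.

```python
import random
from collections import Counter

def solve_once(problem, rng):
    # Hypothetical stand-in for one model sample: a noisy "solver" that
    # returns the correct answer only 40% of the time, otherwise a
    # random digit. Real reasoning models are far more structured.
    if rng.random() < 0.4:
        return problem["answer"]
    return rng.randint(0, 9)

def solve_with_votes(problem, n_samples, seed=0):
    # Test-time compute scaling via majority vote (self-consistency):
    # draw n_samples independent attempts and return the most common
    # answer. Extra compute is spent at inference time only.
    rng = random.Random(seed)
    answers = [solve_once(problem, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

problem = {"answer": 7}
print(solve_with_votes(problem, 1))    # a single sample is often wrong
print(solve_with_votes(problem, 101))  # majority over many samples
```

With one sample the solver is right well under half the time, but with 101 samples the correct answer dominates the vote almost surely, which is the basic intuition behind letting a model "think for hours."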
Leadership At OpenAI
Alexander Wei has been a Member of Technical Staff at OpenAI since January 2024. He received his PhD in Computer Science from UC Berkeley in 2023 and helped develop earlier reasoning systems.
Noam Brown leads AI reasoning research at OpenAI and joined the company in June 2023. He previously worked at Meta, where he developed game-playing AI systems, including Libratus and Pluribus.
Release Plans And GPT-5
OpenAI says it does not plan to release anything with this level of math capability for several months. CEO Sam Altman confirmed that GPT-5 is coming in summer 2025, but without this IMO-level reasoning.
Key Takeaways:
- OpenAI’s model matched human exam conditions and solved most IMO problems under strict rules.
- The gold medal result follows DeepMind’s 2024 silver performance, scoring 35/42 vs. 28/42 points.
- Test-time compute scaling proved crucial, allowing the model to “think for hours” on challenging proofs.
- The release of GPT-5 this summer won’t yet include these advanced math capabilities.
