AI Joins the International Mathematical Olympiad — and Wins Gold

The International Mathematical Olympiad (IMO) has long been regarded as the pinnacle of problem-solving contests for high school students worldwide. Each year, top young minds from dozens of countries gather to tackle six challenging, creative, and entirely original math problems. The problems are designed not to test rote memorisation or advanced academic coursework, but pure mathematical creativity and reasoning.

Participants sit the exam over two consecutive days, with three problems to be solved in four and a half hours on each day. Scoring is meticulous: each of the six problems is marked out of 7 points, for a maximum of 42, and the threshold for a Gold medal typically demands solutions to at least five of the six problems. Lower thresholds earn Silver or Bronze.

This year, however, something unprecedented happened — artificial intelligence models entered the arena.

AI’s First Taste of Olympiad Gold

In a move that surprised many, an advanced AI reasoning system successfully competed under the same time constraints as human participants — and secured a Gold medal-level score.

The most striking part? This was not a model trained specifically for mathematics or for the IMO. Instead, it was a general-purpose reasoning AI capable of handling diverse tasks — yet its abilities were strong enough to meet the Olympiad’s highest performance tier.

Only days later, another AI system participated in the contest under official observation, with an even stronger showing: it scored 35 out of 42 points, comfortably in Gold medal territory. Contest graders described its solutions as clear, precise, and remarkably easy to follow.

From Clumsy Mistakes to Flawless Logic

This leap in AI capability marks a dramatic turnaround from the early days of large language models. The first generations of these systems, while impressive in conversational ability, frequently made factual errors or stumbled over basic arithmetic.

These weaknesses were especially damaging in mathematics, where a single misstep in logic or calculation invalidates an entire solution. Even modest problem-solving contests were well beyond the reach of early AI chatbots.

The change began when developers introduced “agent” capabilities — allowing AI to:

  • Perform web searches to gather accurate factual information.
  • Run code in Python interpreters to carry out complex calculations.
  • Check reasoning by running numerical experiments.

This transformed AI from a passive text generator into an active problem-solver capable of verifying its own work. While this made it good enough to handle moderately difficult math challenges, IMO-level problems — which demand deep insight, precision, and originality — remained out of reach.
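To make the "check reasoning by running numerical experiments" idea concrete, here is a minimal sketch of the kind of throwaway check an agent might run in its interpreter before trusting a pattern. The claim being tested and the helper function are illustrative only, not drawn from any particular system.

```python
# Hypothetical sketch of the "run a numerical experiment before trusting a
# claim" step; the claim and helper below are illustrative only.
# Claim under test: "n^2 + n + 41 is prime for every non-negative integer n."

def is_prime(k: int) -> bool:
    """Trial-division primality check, sufficient for small k."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

# Search small cases for a counterexample instead of trusting the pattern.
counterexamples = [n for n in range(100) if not is_prime(n * n + n + 41)]
print(counterexamples[:3])  # [40, 41, 44] -- the claim breaks down at n = 40
```

A human might be tempted by the long run of primes at small n; a quick computational check exposes the failure immediately, which is exactly the kind of self-verification that made agent-style systems more reliable.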

Formal Proof Verification: A New Ally

The next breakthrough came from pairing AI with formal proof systems. These are specialised computer programs designed to read, understand, and verify mathematical proofs with complete logical rigour.

One example is Lean, an open-source proof assistant that has become increasingly important in both mathematics and computer science. When an AI's proposed solution is translated into Lean's formal language, the system can confirm whether it is correct or pinpoint exactly where it fails.
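As a minimal, hypothetical illustration of what this looks like in practice, a Lean 4 file such as the one below compiles only if every proof is complete and correct, so acceptance by the checker is itself the verification:

```lean
-- Lean accepts this file only if every proof is fully justified;
-- a gap or an invalid step is reported as an error at its exact location.

-- A statement proved by direct computation:
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A statement proved by citing a lemma already verified in Lean's library:
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

Real competition proofs are vastly longer, but the principle is the same: once a solution is expressed formally, its correctness no longer depends on anyone's judgment.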

In 2024, a system pairing an AI model with a formal proof assistant reached a Silver medal-level score on IMO problems, a significant achievement, though it required two full days to complete its work, far longer than the competition's time limit.

The Breakthrough: Reasoning Models

The real game-changer has been the emergence of reasoning models — a new generation of AI that operates with something resembling an internal monologue.

When faced with a complex question, these systems:

  1. Explore multiple potential solution paths.
  2. Work through detailed calculations and logical arguments.
  3. Revisit earlier steps to check for errors.
  4. Sometimes abandon an approach entirely and start from scratch.

Only after this iterative process do they produce a final answer — often one that matches or exceeds human expert performance.
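As a rough illustration of that control flow (a toy skeleton, not how any actual reasoning model is built), the Python sketch below stubs out hypothetical propose, check, and revise steps on a trivial task:

```python
# Toy skeleton of the propose / check / revise loop described above.
# propose(), check(), and revise() stand in for model calls; here they are
# stubbed with a trivial task: find an integer whose square ends in 6.
import random

def propose() -> int:
    return random.randint(1, 100)             # explore a candidate solution path

def check(candidate: int) -> bool:
    return (candidate * candidate) % 10 == 6  # verify the candidate's correctness

def revise(candidate: int) -> int:
    return candidate + 1                      # adjust the attempt and try again

def solve(max_restarts: int = 10, max_revisions: int = 20) -> int | None:
    for _ in range(max_restarts):             # sometimes abandon and start over
        candidate = propose()
        for _ in range(max_revisions):        # revisit and refine earlier work
            if check(candidate):
                return candidate              # only then commit to a final answer
            candidate = revise(candidate)
    return None

print(solve())
```

The point is only the shape of the loop: candidates are generated, verified, revised, and sometimes discarded entirely before an answer is committed.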

With additional refinements, these reasoning models have now achieved IMO Gold medal scores under official competition conditions. The significance of this cannot be overstated — it is the first time AI has matched the top tier of one of the most demanding human intellectual competitions without any tailoring to the event.

Why This Matters Beyond the Olympiad

The IMO is not just a contest — it’s a benchmark. Solving its problems requires a combination of creativity, precision, and the ability to navigate unfamiliar territory. These are exactly the traits needed in real-world scientific research.

However, research problems differ in one crucial way: they demand long-term focus. While an Olympiad problem might take hours to solve, a research challenge may require months or years of sustained effort, all while avoiding mistakes that could invalidate the entire project.

By integrating reasoning models with formal proof verification, AI could soon maintain this precision over long timescales — making it a reliable partner in ongoing research.

A Powerful New Collaborator

Already, AI is proving valuable in research environments:

  • Suggesting problem-solving strategies human experts may not have considered.
  • Exploring related problems that could lead to breakthroughs.
  • Rapidly testing multiple solution approaches to filter out unpromising ones.

As these systems improve, researchers envision an era in which AI operates as a true collaborator, capable of brainstorming ideas, verifying proofs, and accelerating the pace of mathematical discovery.

Some experts are calling this the beginning of the “super-scientist” era — where human creativity is combined with AI’s tireless precision and vast computational reach.

Challenges and Questions Ahead

The arrival of AI in top-level competitions also raises important questions:

  • Fairness: Should AI be competing in events designed for humans, or should separate AI competitions be created?
  • Verification: Even with proof systems, ensuring AI outputs are correct remains a challenge in some domains.
  • Ethics: How should credit be assigned for discoveries made with AI assistance?

While these debates continue, the undeniable fact is that AI’s capabilities have leaped forward in ways few anticipated even a few years ago.

A New Benchmark for Human–AI Collaboration

The success of AI at the IMO represents more than just a technical milestone — it’s a symbolic moment in the evolving relationship between humans and machines. For decades, mathematics competitions have been the arena where the best young human minds push the limits of problem-solving. Now, machines have stepped onto that stage — not to replace human thinkers, but to join them.

In the years ahead, we may see joint teams of humans and AI tackling problems neither could solve alone. We may see research timelines compressed from years to months. And we may witness discoveries made possible only because an AI could tirelessly explore countless possibilities without fatigue or distraction.

For now, though, the achievement stands as a testament to how far AI has come. The IMO, once purely a celebration of human ingenuity, is now also a showcase for what happens when human curiosity meets machine intelligence.

The next question is no longer “Can AI match human problem-solving?” — it’s “What will we solve together?”
