Mathematics is undergoing the biggest change in its history

Page of old textured vintage paper with calculation of higher mathematics — Is the age of handwritten math coming to an end?

Laboratory assistant / Alamy

In March 2025, mathematician Daniel Litt bet. Despite the progress of artificial intelligence in many fields, he thought his subject was safe, betting with a colleague that there was only a 25 percent chance that an AI could write a mathematical paper on par with the best human mathematicians by 2030. Just a year later, he believes he was wrong. “I now expect to lose this bet,” he declared on his blog.

Mathematicians have been surprised by the speed of improvements in AI’s ability to solve problems and produce proofs. “A couple of years ago they were basically useless for solving high school math problems, and now they can sometimes solve problems that really come up in the mathematician’s research life,” says Litt, who is at the University of Toronto.

This progress is faster than many had predicted, and mathematicians warn that their profession is undergoing one of the fastest developments the field has ever seen. “We’re running out of places to hide,” wrote Jeremy Avigad of Carnegie Mellon University in Pennsylvania in a recent essay. “We have to face the fact that AI will soon be able to prove theorems better than we can.”

It is not a single event that causes reactions like this, but the steady mathematical progress AIs show. Last year, companies like OpenAI and Google DeepMind achieved gold medal performances at the International Mathematical Olympiad, an elite competition for high school students that many experts had previously written off as beyond the reach of AI tools. In January, mathematicians began to use similar tools to solve long-standing problems posed by the Hungarian mathematician Paul Erdős.

Now, in two separate developments, AI has begun to take on more complex mathematics, solving real-world research problems and helping to automatically verify groundbreaking proofs, which traditionally could take a huge amount of work from teams of mathematicians.

In February, Nikhil Srivastava at the University of California, Berkeley, and his colleagues started the First Proof project in an effort to find a more realistic benchmark for testing AI’s mathematical abilities. The first round of the project consisted of 10 problems that researchers had needed to solve as part of their work, from widely different mathematical fields.

“They were naturally occurring problems in our day-to-day research,” says Srivastava. “They were drawn from a typical difficulty distribution. They weren’t super hard, but they weren’t routine either. It was really a range.”

Evidence of progress

After the problems were made public, solutions started pouring in. Researchers at tech companies, including OpenAI and Google DeepMind, were among those trying to solve the First Proof problems with their own AI models. OpenAI claims it answered half of the problems correctly, according to “expert feedback”, while Google DeepMind scored 6 out of 10, according to mathematicians it consulted for each problem.

“Things have changed so fast,” says Thang Luong of Google DeepMind. “For us, AI has really become a serious collaborator, either to produce serious research work or, in the case of First Proof, it can actually also propose a solution on its own.”

Google’s AI math tool, called Aletheia, uses a computationally intensive version of its Gemini AI chatbot, along with a confirmation algorithm to check for errors in possible solutions. It can then iteratively produce improvements to itself until it arrives at an answer. Google hasn’t disclosed how many iterations Aletheia needed to solve these problems, making it hard to assess how good it is, but mathematicians are still impressed.

Not all problems were unanimously agreed to be solved. With problem 8, for example, which is in a niche area of geometry, only five of the seven experts that Google asked agreed that the proposed solution was correct. Ivan Smith of the University of Cambridge, who was not involved in the Google effort, says AI appears to be taking a sensible approach to this problem and is showing good progress. “If this was a graduate student coming back with their thoughts, that would be encouraging and would build confidence that the result was actually true,” says Smith.

This highlights a problem with AI-generated evidence – checking it is hard work. It would be easy to get into a situation where AI can generate evidence faster than humans can check it. If a theorem is proven by an AI, but no one is around to check it, has it been proven? AI can help here too.

Technology is rapidly improving at translating handwritten natural language proofs, like the problems in First Proof, into a format that can be checked by a computer, a process called formalization.

AI company Math, Inc. recently surprised mathematicians by announcing that its AI tool, called Gauss, had formalized an award-winning proof and confirmed that it was correct. The proof concerned how many spheres can be packed into a room and was the subject of Maryna Viazovska’s 2022 Fields Medal, often called the Nobel Prize in Mathematics.

Efforts to formalize Viazovska’s work began with a small group of mathematicians in late 2024, working separately from mathematics. Inc, who hoped to manually translate the problem into computer code. They first looked at Viazovska’s ball package solution in eight dimensions. While making steady progress, Math, Inc., which had later assisted the researchers, announced that they already had a complete proof, and then a more general version of a result for 24 dimensions.

Bhavik Mehta at Imperial College London and his colleagues had originally outlined a rough plan for how to formalize the work, as well as come up with important mathematical definitions. Without this, AI could not have completed its proofs, says Mehta.

“We had made all the parts, but we hadn’t written the instruction manual explaining how to put them together,” says Chris Birkbeck of the University of East Anglia, UK, also part of the team.

A new style of mathematics

The final proof was about 200,000 lines of code, which constitutes about 10 percent of all existing formalized mathematics. Although it is likely that this code is about 10 times longer than a human would have produced to do the same task, it is still a great achievement, says Johan Commelin of Utrecht University in the Netherlands. “This is a big deal. This is Fields Medal-winning work, and it is automatically formalized.”

Similar efforts should now be possible for a wide range of other fields, says Commelin, which can transform how mathematics is practised. “The future that we’re all thinking about is that we’ll have tools that will automatically formalize new research and mathematical papers, and flag whether it’s wrong or not,” says Commelin. “This will have huge implications for, for example, peer review and referee work.”

Faced with a future where an increasing proportion of mathematics is performed by AI, some mathematicians, such as Avigad, are sounding the alarm about the harmful effects this could have on our ability to practice and come up with new mathematics.

Using machines to solve the kind of problems posed in First Proof can provide concrete evidence, says Anna Marie Bohmann of Vanderbilt University in Tennessee, but we lose the “learning opportunity”, she says. “Struggle to create and formulate new ideas and solve new problems is one of the most important ways in which both students and mathematics professionals consolidate their knowledge.”

Tony Feng, one of the Aletheia team at Google DeepMind, feels the same way and is wary of using the tool himself. “A lot of times I feel like I should do my own homework and go through the process of building my own intuition.”

Even formalizing proof can generate important insights, Mehta says, and he and his colleagues now need to unpack the 200,000-line AI proof to find out what might be useful for other projects.

But mathematicians remain hopeful that there will be a place for them in an increasingly mechanized future. Looking back on history, Commelin says manual calculations were once a big part of being a mathematician, but now they’re done automatically. “I think similar things will happen here, where we radically change what we do, but in 10 or 20 years we will still recognize what we do as mathematics, in a new style.”

Topics: