
Google's latest models prove AI doesn't have to suck at math

Researchers at Google DeepMind claim they've developed a pair of AI models capable of taking home a silver medal in the International Mathematical Olympiad (IMO) – although not within the allotted time limit.

Dubbed AlphaProof and AlphaGeometry 2, these models are designed to overcome one of the bigger hurdles facing popular AI systems today: thanks to limitations in artificial reasoning and training data, they kind of suck at math.

To overcome this, DeepMind developed AlphaProof, which combines a language model with its AlphaZero reinforcement learning algorithm. You may recall AlphaZero is the same reinforcement learning system DeepMind used to master chess, shogi, and Go a few years back.

AlphaProof trains itself to prove mathematical statements in Lean, a functional programming language and proof assistant.

"Formal languages offer the critical advantage that proofs involving mathematical reasoning can be formally verified," the DeepMind team wrote in a recent blog.

This means that AlphaProof can not only provide an answer, but prove that it's correct. This differs from existing natural language models, which will confidently hallucinate a plausible-sounding answer without actually knowing whether it's correct.
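
To make that concrete, here's a minimal hand-written Lean 4 example (ours, not DeepMind's output) of what a formally verified proof looks like. If the proof were wrong, Lean's kernel would reject the file outright:

```lean
-- The kernel checks every step: a wrong or "hallucinated" proof
-- simply fails to compile, so a theorem that type-checks is correct.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```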

That's not to say NLP or LLMs can't be useful. DeepMind fine-tuned a Gemini LLM to translate natural language problem statements into ones the AlphaZero algorithm can interpret and use.
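
As an illustration of what that translation step produces – again a hand-written example, not Gemini's actual output – an informal statement and one possible Lean rendering might look like this:

```lean
-- Informal: "the sum of two even natural numbers is even."
theorem even_add_even (m n : Nat)
    (hm : m % 2 = 0) (hn : n % 2 = 0) : (m + n) % 2 = 0 := by
  rw [Nat.add_mod, hm, hn]
```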

"When presented with a problem, AlphaProof generates candidates and then proves or disproves them by searching other possible proof steps in Lean. Every time a solution is verified it's used to reinforce the model.

The DeepMind team did this for millions of problems in the lead-up to the IMO – in fact, because it's a reinforcement learning model, training continued throughout the competition.
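
DeepMind hasn't published code for this loop, but a toy, self-contained sketch of the idea – generate candidates, check each against a trusted verifier, reinforce on every verified success – might look like the following. Every name here is illustrative, and a stub divisor check stands in for Lean:

```python
# A toy sketch of the verify-and-reinforce loop described above. Nothing
# here is DeepMind's code or API: the "theorem" is just finding a proper
# divisor, and lean_verify() is a stub standing in for Lean's kernel.
import random

def lean_verify(problem: int, candidate: int) -> bool:
    # Stand-in for Lean: formally accept or reject a candidate solution.
    return 1 < candidate < problem and problem % candidate == 0

def solve_and_reinforce(problem: int, weights: dict[int, float],
                        budget: int = 1000) -> int | None:
    pool = list(range(2, problem))
    for _ in range(budget):
        # Sample a candidate, biased toward previously reinforced answers.
        probs = [weights.get(c, 1.0) for c in pool]
        candidate = random.choices(pool, weights=probs)[0]
        if lean_verify(problem, candidate):
            # Every verified solution becomes training signal.
            weights[candidate] = weights.get(candidate, 1.0) + 10.0
            return candidate
    return None

weights: dict[int, float] = {}
for p in (91, 91, 221):  # repeated, related problems benefit from reinforcement
    print(p, "->", solve_and_reinforce(p, weights))
```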

AlphaGeometry 2, on the other hand, functions in much the same way as its predecessor, detailed back in January. It combines a neural language model with a "rule-bound deduction engine" and the two work together to find proofs for problems.

As DeepMind explains it, the language model is used to identify patterns and suggest useful constructs, while the symbolic engine uses formal logic to arrive at solutions. The downside to this approach is that the second step is comparatively slow.
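
DeepMind hasn't released AlphaGeometry 2's deduction engine, but the division of labor can be sketched with a toy forward-chainer plus a stand-in for the model's suggestions. The single rule, the facts, and propose_construction() below are all illustrative assumptions, not AlphaGeometry's actual rule set or API:

```python
# Toy sketch: a rule-bound engine forward-chains over geometric facts,
# and a stand-in "language model" proposes an auxiliary construction
# whenever deduction stalls short of the goal.
Fact = tuple[str, str, str]  # e.g. ("parallel", "AB", "CD")

def forward_chain(facts: set[Fact]) -> set[Fact]:
    # One illustrative rule: parallelism is transitive.
    changed = True
    while changed:
        changed = False
        for (r1, a, b) in list(facts):
            for (r2, b2, c) in list(facts):
                if r1 == r2 == "parallel" and b == b2 and a != c:
                    if ("parallel", a, c) not in facts:
                        facts.add(("parallel", a, c))
                        changed = True
    return facts

def propose_construction(facts: set[Fact]) -> Fact:
    # Stand-in for the Gemini-based model: suggest a construct that
    # might unlock further deduction (hard-coded for this demo).
    return ("parallel", "CD", "EF")

facts: set[Fact] = {("parallel", "AB", "CD")}
goal: Fact = ("parallel", "AB", "EF")
facts = forward_chain(facts)
if goal not in facts:  # deduction stalled: ask the "model" for help
    facts.add(propose_construction(facts))
    facts = forward_chain(facts)
print(goal in facts)   # True
```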

For its second-gen AlphaGeometry model, DeepMind explained that the language model is based on Gemini and was trained on an "order of magnitude" more synthetic data than its predecessor. The symbolic engine has also been sped up considerably – reportedly by "two orders of magnitude."

Putting them to the test

The DeepMind team tasked AlphaProof and AlphaGeometry 2 with solving the six advanced mathematics problems faced by competitors in this year's IMO.

The competition – which dates back to 1959 and sees pre-college mathematicians tackle some of the hardest problems in algebra, combinatorics, geometry, and number theory – has become something of a proving ground for machine learning devs in recent years.

According to DeepMind, the two models solved four of the six problems between them – AlphaProof handled two algebra problems and one number theory problem, while AlphaGeometry 2 cracked this year's geometry problem. The two combinatorics questions, however, went unsolved.

Tallied up, DeepMind's models still did fairly well: with each problem worth seven points, their four complete solutions earned 28 of a possible 42 – the equivalent of a silver medal, and one point shy of the gold threshold.


However, there seems to be plenty of room for improvement. For this competition at least, DeepMind conceded it still had to manually translate the problems given to competitors into a formal mathematical language the models could understand.

The models also failed to solve most of the problems within the allotted time – two 4.5-hour sessions spread across two days. While the models solved one of the problems within minutes, the others took upwards of three days.

The DeepMind researchers are not done yet, of course. They report they are already experimenting with a natural language reasoning system, built on Gemini, that wouldn't require problems to be translated into a formal language and could be combined with other AI systems. That should speed things up a bit. ®

Source: theregister.com
