
OpenAI debuts "o1" AI models, promising PhD-level reasoning in science and math


In a nutshell: OpenAI has unveiled a new series of AI language models called "o1," engineered specifically to improve reasoning on complex problems in science, coding, and mathematics. The company is confident enough in these advances that it has reset the version counter to 1 rather than continuing on from GPT-4o, and it has dropped the GPT branding.

The first model in the series, "o1-preview," is available in both ChatGPT and OpenAI's API. Although it is labeled a preview, the company says regular updates and improvements are planned.

The "o1" models have been trained to enhance their problem-solving approach by spending more time analyzing issues before providing an answer. This method allows the models to experiment with various strategies, identify their own errors, and tackle complex tasks in a more systematic, human-like manner.

The results shared by OpenAI suggest a significant advancement with the new "o1" models. According to the company, these models perform at a level comparable to PhD students on challenging benchmarks in fields such as physics, chemistry, and biology.

For example, the new model correctly solved 83 percent of problems on a qualifying exam for the International Mathematics Olympiad, a notable improvement over the 13 percent solved by GPT-4o.

Of course, AI benchmarks can sometimes be unreliable, so the true performance of the "o1" models will become clearer as more users test them in various scenarios.

Additionally, the new models appear to have overcome a long-standing stumbling block for language models: correctly counting the number of R's in "strawberry," which may finally put the memes to rest. OpenAI also showcased a demo in which the model successfully generated Python code for an arcade game.

OpenAI o1 answers a famously tricky question for large language models. pic.twitter.com/5ZlQIOBWEd

– OpenAI (@OpenAI) September 12, 2024

OpenAI was previously reported to be working on a project codenamed "Strawberry" to develop models capable of tackling complex reasoning tasks. Given that the "o1" series seems to be the result of the Strawberry project, it's amusing to think that the project's name might have been inspired by the "strawberry" test.

In addition to improving reasoning, OpenAI also focused on strengthening defenses against "jailbreaking," a technique used to bypass a model's safety mechanisms. According to the company, "o1-preview" scored 84 out of 100 on one of its most challenging jailbreaking tests, compared with only 22 for GPT-4o.

To make these models more accessible, especially for developers, OpenAI is also releasing a lighter "o1-mini" version designed for coding tasks.

Access to both "o1-mini" and "o1-preview" is now rolling out to paid ChatGPT Plus and Team subscribers. For now, the models must be selected manually and are subject to weekly usage limits, but OpenAI says it is working to expand capacity and to let ChatGPT automatically choose the appropriate model based on the complexity of the prompt.

Source: techspot.com
