
OpenAI’s new o1 large language model can decode scrambled text and ace math exams

OpenAI today launched a new large language model series, o1, that can decode scrambled text, answer science questions with better accuracy than PhD holders and perform other complex tasks.

The LLM series, better known by its code name Strawberry, comprises two models at launch: o1-preview and o1-mini. The former is the more capable of the two, while the latter trades off some response quality for better cost-efficiency. Both models became available today in the paid versions of OpenAI’s ChatGPT chatbot service.

OpenAI says that the o1 series is not a drop-in replacement for the GPT-4o model it debuted in May. The new LLMs currently lack several of the features offered by that model, notably the ability to analyze files uploaded by the user. There are also no integrations that would allow o1 to interact with external applications.

On the other hand, the new LLM series is significantly better at tasks that require reasoning skills.

In one internal test, OpenAI engineers had o1-preview complete a qualifying exam for the U.S. Math Olympiad. Depending on how many candidate answers were sampled and scored, the model’s results ranged from 74% to 93%, a significant improvement over the 12% achieved by GPT-4o. OpenAI says that o1-preview’s best score put it among the top 500 test takers in the U.S.

In another evaluation, the ChatGPT developer had o1-preview tackle the GPQA Diamond benchmark, a collection of complex science questions. The model scored higher than a group of experts with doctorates on a set of physics, biology, and chemistry questions.

The company says one of the contributors to o1’s reasoning prowess is its use of a machine learning approach known as CoT, or chain of thought. The technique allows LLMs to break down a complex task into smaller steps and carry out those steps one by one. In many cases, tackling complex prompts this way can help an LLM improve the accuracy of its responses.
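
As a rough illustration of the chain-of-thought idea, a CoT-style prompt simply pushes a model to spell out intermediate steps before committing to an answer. OpenAI has not published the internal format o1 uses, so the prompt wording and helper function below are purely hypothetical:

```python
# Minimal sketch of the general chain-of-thought idea (not OpenAI's internal
# o1 mechanism, which the company has not published): the model is asked to
# write out intermediate steps before committing to a final answer.

def build_cot_prompt(question: str) -> str:
    # The "step by step" phrasing is the classic CoT prompt pattern; o1
    # performs this kind of reasoning internally rather than via a prompt.
    return (
        f"Question: {question}\n"
        "Work through the problem step by step, numbering each step, "
        "then state the final answer on its own line prefixed with 'Answer:'."
    )

print(build_cot_prompt(
    "A train travels 120 km in 90 minutes. What is its average speed in km/h?"
))
```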

OpenAI refined o1’s CoT mechanism using reinforcement learning. This is a machine learning technique that helps LLMs improve their output quality over time through a kind of trial and error training process. In most reinforcement learning projects, a model is given a set of training tasks and receives positive feedback whenever it solves one of them correctly, which helps it become more accurate.
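
A toy sketch of that feedback loop might look like the following, with a made-up set of arithmetic tasks and a simple score table standing in for a real model; OpenAI's actual training setup for o1 is far larger and has not been disclosed:

```python
import random

# Toy illustration of the trial-and-error loop described above: a "model"
# (here just a table of scores over candidate answers) gets positive feedback
# when it answers a training task correctly, so correct answers become more
# likely over time.

tasks = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]
candidates = ["3", "4", "9"]
scores = {(q, a): 0.0 for q, _ in tasks for a in candidates}

def pick_answer(question: str) -> str:
    # Explore a random answer occasionally, otherwise exploit the best so far.
    if random.random() < 0.2:
        return random.choice(candidates)
    return max(candidates, key=lambda a: scores[(question, a)])

for _ in range(500):  # training episodes
    question, correct = random.choice(tasks)
    answer = pick_answer(question)
    reward = 1.0 if answer == correct else 0.0  # positive feedback on success
    scores[(question, answer)] += 0.1 * (reward - scores[(question, answer)])

print({q: max(candidates, key=lambda a: scores[(q, a)]) for q, _ in tasks})
```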

One of the tasks to which o1’s CoT-powered reasoning features can be applied is decoding scrambled text. During an internal test, OpenAI had o1-preview decipher a scrambled version of the sentence “There are three R’s in Strawberry.” The model successfully completed the task by following a line of reasoning that comprised dozens of steps and required it to change tactics multiple times.

OpenAI says o1’s CoT features also make it safer than earlier models. “We conducted a suite of safety tests and red-teaming before deployment,” the company’s researchers detailed in a blog post today. “We found that chain-of-thought reasoning contributed to capability improvements across our evaluations.”

The o1 series is available not only in ChatGPT but also through OpenAI’s application programming interface, which allows developers to integrate the models into their software. The scaled-down o1-mini model trades some of o1-preview’s accuracy for 80% lower inference pricing. OpenAI says that o1-mini has a smaller knowledge base but is “particularly effective at coding.”
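
For developers, a request to o1-preview through the API looks much like any other chat completion call with OpenAI’s official Python client. The snippet below is a minimal sketch that assumes an OPENAI_API_KEY environment variable and the launch-time restriction to plain user messages:

```python
# Sketch of calling o1-preview through OpenAI's API with the official Python
# client (pip install openai); assumes OPENAI_API_KEY is set in the environment.
# The o1 models initially shipped with restricted parameters, so the request
# keeps to a single user message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many R's are in 'strawberry'?"}],
)

print(response.choices[0].message.content)
```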

Down the line, the company plans to make o1-mini available in the free version of ChatGPT. It also intends to raise the usage limits on o1 in the paid versions of the chatbot service. At launch, customers can send 30 prompts a week to o1-preview and 50 to o1-mini.

Source: siliconangle.com
