pwshub.com

OpenAI introduces new multimodal processing, AI fine-tuning tools at DevDay

OpenAI introduced a set of new developer tools today at its DevDay product event in San Francisco.

The additions are headlined by Realtime API, a cloud service that enables software teams to equip their applications with multimodal processing capabilities. The service powers those capabilities using OpenAI’s artificial intelligence models. On launch, Realtime API supports one use case: creating AI applications that can understand voice commands and read out their responses out loud.

Multimodal processing

Usually, sending a voice command to an OpenAI model for processing involves multiple steps. Developers have to transcribe the audio, feed the transcript into the model and then turn the model’s text-based output into synthetic speech. OpenAI’s new Realtime API makes it possible to stream audio to GPT-4o directly without those intermediary steps.

The company says that the service can not only simplify development but also reduce model latency. As a result, AI applications powered by Realtime API can respond to user instructions more quickly. Moreover, the service includes a feature that allows the applications it powers to automatically perform tasks in external systems.

In the future, OpenAI plans to extend Realtime API to several additional use cases including image and video processing. To make it easier for software teams to adopt the service, the company will also make changes to its development kits. Those changes will simplify the task of integrating Realtime API into workloads built using Python and the Node.js application development framework.

Realtime API is not the only multimodal processing tool that OpenAI detailed at DevDay. It also introduced a similar capability for processing voice input to its existing Chat Completions API. According to OpenAI, the capability is geared toward audio processing use cases that don’t require the low latency offered by Realtime API.

For developers building applications that process images, OpenAI is rolling out a feature called vision fine-tuning. Fine-tuning is the process of supplying a neural network with additional training data to boost the quality of its output. Using the new vision fine-tuning capability, developers can provide ChatGPT-4o with custom image datasets to make it better at computer vision tasks.

A company using GPT-4o to generate website layouts could provide the model with a collection of sample designs. Similarly, organizations that rely on the model to extract data from scanned documents could reduce accuracy issues training it on previously processed files. OpenAI says that a fine-tuning database requires as few as 100 images to improve GPT-4o’s performance. 

Cost-efficient inference

Alongside the new multimodal capabilities, OpenAI today debuted two features designed to lower inference costs for customers. The first addition, Model Distillation, produces savings through an AI method known as knowledge distillation. This method allows developers to replace a large, highly capable model with a smaller one that uses less hardware and consequently costs less.

Given the same prompt, a large neural network is likely to generate a better response than a smaller one. With knowledge distillation, developers can take the larger model’s higher-quality response and feed it into the smaller model. This allows the latter algorithm to match the output quality of its more advanced counterpart using a small fraction of the hardware.

OpenAI’s new model distillation feature is available through an application programming interface. It enables developers to submit prompts to one of the company’s frontier models and then turn the model’s responses in an AI training dataset. That dataset, in turn, can be used to boost the quality of a smaller neural network. 

The other feature OpenAI rolled out today to lower customers’ inference costs is called Prompt Caching. It allows the company’s models to reuse user input in certain situations and thereby avoid repeating calculations that they already completed once before. OpenAI is promising an up to 50% reduction in inference costs as well as better response times.

Photo: Focal Foto/Flickr

Source: siliconangle.com

Related stories
2 days ago - If OpenAI could just monetize all the ink that gets spilled on the company, perhaps it could justify raising such a crazy amount of money this week. And get this: The $6.6 billion round, at a (gulp) $157 billion valuation, the biggest VC...
1 month ago - It’s no surprise that entrepreneurs with a pedigree like Ilya Sutskever’s can raise a billion dollars, as the OpenAI co-founder did this week for his startup, SSI. And he wasn’t alone, as Nvidia and others also invested in two other...
3 weeks ago - This was the week that Apple finally infused artificial intelligence into its new iPhones, Watches and AirPods, though some of features won’t be coming for a bit and overall, the AI stuff seemed a little underwhelming. The medical...
1 month ago - All eyes were on Nvidia’s earnings report this week as a proxy for the artificial intelligence economy, and even for the graphics chip giant, it was too much to live up to. Nvidia earnings disappointed, but really, how could they not?...
1 month ago - Amazon.com Inc. is hiring the three founders of Covariant, a well-funded startup that develops artificial intelligence software for warehouse robots. The company announced the move late Friday. Pieter Abbeel, Peter Chen and Rocky Duan,...
Other stories
8 minutes ago - MELBOURNE (Reuters) -Rio Tinto, has made an approach to buy lithium producer Arcadium Lithium, the two parties said in separate statements on Monday, without revealing any financial details. Rio's approach to Arcadium comes as miners are...
8 minutes ago - Trading in Asia kicks off on Monday with the global macro and market landscape suddenly appearing very different from how it looked on Friday, thanks to a set of U.S. employment figures that not even the most bullish of forecasters...
8 minutes ago - (Bloomberg) -- Oil futures posted their largest gain in more than a year last week. And the frenzy was even bigger in the options market.Most Read from BloombergSingapore Ends 181 Years of Horse Racing to Make Way for HomesFrom Cleveland...
23 minutes ago - During World War II, the U.S. Army Air Forces twice targeted ball bearing factories in Schweinfurt based on the thesis that disrupting manufacturing operations would have an impact on Germany’s ability to produce many forms of war...
23 minutes ago - As networks become more critical to modern business operations, network management software becomes essential to manage growing complexity. Companies seek solutions to simplify network management, particularly as they face the challenge...