
Google’s lightweight Gemini 1.5 Flash-8B hits general availability

Google LLC is making available a new version of its popular Gemini 1.5 Flash artificial intelligence model that is smaller and faster than the original.

It’s called Gemini 1.5 Flash-8B, and it’s much more affordable, at half the price. Gemini 1.5 Flash is the lightweight version of Google’s Gemini family of large language models, optimized for speed and efficiency and designed to be deployed on low-powered devices such as smartphones and sensors.

The company announced Gemini 1.5 Flash at Google I/O 2024 in May, and it was released to some paying customers a few weeks later, before becoming available for free via the Gemini mobile app, albeit with some restrictions on use.

The original model finally hit general availability at the end of June, offering competitive pricing and a 1 million-token context window combined with high-speed processing. At the time of its launch, Google noted that its context window was 60 times larger than that of OpenAI’s GPT-3.5 Turbo and that the model was 40% faster on average.

The original version was designed to offer a very low input token price, making it cost-competitive for developers, and it was adopted by customers such as Uber Technologies Inc., which uses it to power the Eats AI assistant in its Uber Eats food delivery service.

With Gemini 1.5 Flash-8B, Google is introducing one of the most affordable lightweight LLMs on the market, with a 50% lower price and double the rate limits compared with the original 1.5 Flash. It also offers lower latency on small prompts, the company said.

Developers can access Gemini 1.5 Flash-8B for free via the Gemini API and Google AI Studio.
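As a minimal sketch of what such a call might look like, assuming the google-generativeai Python SDK and an API key generated in Google AI Studio (the prompt text below is a placeholder):

```python
# Minimal sketch: calling Gemini 1.5 Flash-8B through the Gemini API.
# Assumes the google-generativeai Python SDK is installed
# (pip install google-generativeai) and that GEMINI_API_KEY holds a key
# created in Google AI Studio.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# "gemini-1.5-flash-8b" is the identifier for the new lightweight model.
model = genai.GenerativeModel("gemini-1.5-flash-8b")

response = model.generate_content("Summarize the key points of this release note: ...")
print(response.text)
```

The same model can also be tried interactively in Google AI Studio without writing any code.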

In a blog post, Gemini API Senior Product Manager Logan Kilpatrick explained that the company has made “considerable progress” in its efforts to improve 1.5 Flash, taking into consideration feedback from developers and “testing the limits” of what’s possible with such lightweight LLMs.

He explained that the company announced an experimental version of Gemini 1.5 Flash-8B last month. It has since been refined further and is now generally available for production use.

According to Kilpatrick, the 8B version can almost match the performance of the original 1.5 Flash model on many key benchmarks, and it has proven especially useful for tasks such as chat, transcription and long-context language translation.

“Our release of best-in-class small models continues to be informed by developer feedback and our own testing of what is possible with these models,” Kilpatrick added. “We see the most potential for this model in tasks ranging from high volume multimodal use cases to long context summarization tasks.”

Kilpatrick added that Gemini 1.5 Flash-8B offers the lowest cost per intelligence of any Gemini model released so far.

The pricing compares well with equivalent models from OpenAI and Anthropic PBC. OpenAI’s cheapest model is still GPT-4o mini, which costs $0.15 per million input tokens, though that drops by 50% for reused prompt prefixes and batched requests. Meanwhile, Anthropic’s most affordable model is Claude 3 Haiku at $0.25 per million input tokens, though the price drops to $0.03 per million for cached tokens.
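For a rough sense of scale, a back-of-the-envelope calculation using only the competitor prices quoted above (Flash-8B’s own per-token rate is not listed here) looks like this:

```python
# Back-of-the-envelope cost comparison using only the per-million-token
# input prices quoted above; Gemini 1.5 Flash-8B's own rate is not listed here.
prices_per_million = {
    "GPT-4o mini": 0.15,                      # $ per 1M input tokens
    "GPT-4o mini (cached/batched)": 0.075,    # 50% discount noted above
    "Claude 3 Haiku": 0.25,
    "Claude 3 Haiku (cached)": 0.03,
}

input_tokens = 100_000_000  # 100M input tokens
for name, price in prices_per_million.items():
    print(f"{name}: ${price * input_tokens / 1_000_000:.2f}")
```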

In addition, Kilpatrick said the company is doubling 1.5 Flash-8B’s rate limits, in an effort to make it more useful for simple, high-volume tasks. As such, developers can now send up to 4,000 requests per minute, he said.
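For high-volume workloads, a simple client-side throttle can keep traffic under that ceiling. The sketch below is illustrative only: the RateLimiter class is hypothetical helper code, not part of any Google SDK, and the 4,000-requests-per-minute figure is taken from the article.

```python
# Illustrative sketch: a sliding-window throttle sized to the
# 4,000 requests-per-minute limit mentioned above.
# The RateLimiter class is hypothetical helper code, not a Google SDK feature.
import time
from collections import deque


class RateLimiter:
    """Block the caller so at most max_calls happen in any period-second window."""

    def __init__(self, max_calls: int = 4000, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.timestamps: deque[float] = deque()

    def wait(self) -> None:
        now = time.monotonic()
        # Discard requests that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.period:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_calls:
            time.sleep(self.period - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())


limiter = RateLimiter()
# Call limiter.wait() before each model.generate_content(...) request.
```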


Source: siliconangle.com
