Google Cloud Run speeds up on-demand AI inference with Nvidia’s L4 GPUs

Google Cloud is giving developers an easier way to get their artificial intelligence applications up and running in the cloud, with the addition of graphics processing unit support on the Google Cloud Run serverless platform.

The company said in a blog post today that it’s adding support for Nvidia’s L4 graphics processing units on Google Cloud Run, in preview and in a limited number of regions, ahead of a wider rollout in the future.

First unveiled in 2019, Google Cloud Run is a fully managed, serverless computing platform that makes it easy for developers to launch applications, websites and online workflows. With Cloud Run, developers simply upload their code as a stateless container into a serverless environment, so there’s no need to worry about infrastructure management.

It differs from other cloud computing platforms because everything is fully managed. Though some developers appreciate the cloud because it provides the ability to fine-tune the way their computing environments are configured, not everyone wants to bother with this.

Cloud Run does all of the heavy lifting for developers, so they don’t have to work out their compute and storage requirements or worry about configuration and provisioning. Its pay-per-use pricing model also eliminates the risk of overprovisioning and paying for more computing resources than developers actually use, and it requires fewer people to get a new application or website up and running.

On-demand AI inference

In a blog post, Google Cloud Serverless Product Manager Sagar Randive said his team realized that Cloud Run’s benefits make it an ideal option for running real-time AI inference applications that serve generative AI models, which is why the company is introducing support for Nvidia’s L4 GPUs.

With support for Nvidia’s GPUs, Cloud Run users can perform on-demand online AI inference using any large language model they want, in a matter of seconds.

“With 24GB of vRAM, you can expect fast token rates for models with up to 9 billion parameters, including Llama 3.1 (8B), Mistral (7B), Gemma 2 (9B),” Randive said. “When your app is not in use, the service automatically scales down to zero so that you are not charged for it.”
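To see roughly why 9 billion parameters is a sensible ceiling for a 24GB card, here is a back-of-the-envelope sketch in Python. It is not taken from Google's post: the function names are hypothetical, the parameter counts are the rounded nominal sizes from the quote, and the estimate covers model weights only, ignoring the KV cache, activations and runtime overhead that a real deployment also needs.

```python
# Rough estimate of weight memory versus the L4's 24GB of VRAM.
# Assumptions (not from the article): nominal rounded parameter counts,
# decimal gigabytes, and weights only (no KV cache or runtime overhead).

L4_VRAM_GB = 24  # figure quoted by Google

# Nominal sizes in billions of parameters, as named in the quote.
MODELS = {"Llama 3.1 (8B)": 8.0, "Mistral (7B)": 7.0, "Gemma 2 (9B)": 9.0}


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: float,
         vram_gb: float = L4_VRAM_GB) -> bool:
    """True if the weights alone are smaller than the available VRAM."""
    return weight_memory_gb(params_billions, bytes_per_param) < vram_gb


if __name__ == "__main__":
    for name, size in MODELS.items():
        for label, width in (("32-bit", 4), ("16-bit", 2)):
            gb = weight_memory_gb(size, width)
            verdict = "fits" if fits(size, width) else "does not fit"
            print(f"{name} at {label}: {gb:.0f} GB of weights, {verdict}")
```

Under these assumptions, a 9-billion-parameter model needs about 18GB for 16-bit weights, which leaves a few gigabytes of headroom on a 24GB card, while 32-bit weights for the same model would not fit.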

The company believes that GPU support makes Cloud Run a more viable option for various AI workloads, including inference tasks with lightweight LLMs such as Gemma 2B, Gemma 7B or Llama-3 8B. In turn, this paves the way for developers to build and launch customized chatbots or AI summarization models that can scale to handle spikes in traffic.

Other use cases include serving customized and fine-tuned generative AI models, such as a scalable and cost-effective image generator that’s tailored for a company’s brand. In addition, the Cloud Run GPUs also support non-AI tasks such as on-demand image recognition, video transcoding, streaming and 3D rendering, Google said.

Nvidia’s L4 GPUs are available in preview on Google Cloud Run now in the us-central1 (Iowa) region, and will launch in europe-west4 (Netherlands) and asia-southeast1 (Singapore) by the end of the year. The service supports a single L4 GPU per instance, and there’s no need to reserve the GPU in advance, Google said.

A handful of customers have already been lucky enough to pilot the new offering, including the cosmetics and beauty products giant L’Oréal S.A., which is using GPUs on Cloud Run to power a number of its real-time inference applications.

“The low cold-start latency is impressive, allowing our models to serve predictions almost instantly, which is critical for time-sensitive customer experiences,” said Thomas Menard, head of AI at L’Oréal. “Cloud Run GPUs maintain consistently minimal serving latency under varying loads, ensuring our generative AI applications are always responsive and dependable.”

Source: siliconangle.com
