Hugging Face puts squeeze on Nvidia's AI microservice play

Hugging Face this week announced HUGS, its answer to Nvidia's Inference Microservices (NIMs), which the AI repo claims will let customers deploy and run LLMs and other models on a much wider variety of hardware.

Like Nvidia's previously announced NIMs, Hugging Face Generative AI Services (HUGS) are essentially just containerized model images that contain everything a user might need to deploy the model. The idea is that rather than having to futz with vLLM or TensorRT LLM to get a large language model running optimally at scale, users can instead spin up a preconfigured container image in Docker or Kubernetes and connect to it via standard OpenAI API calls.
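To make the "standard OpenAI API calls" part concrete, here is a minimal sketch of the kind of request a client would send to a locally running container. The host, port, and `model` value are illustrative assumptions, not values from Hugging Face's documentation; the block only builds and prints the request body, and notes where an HTTP client would come in.

```python
import json

# A HUGS-style container exposes an OpenAI-compatible HTTP endpoint.
# The endpoint URL below is an assumption for illustration.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat completion payload.
payload = {
    "model": "tgi",  # placeholder model name; the served model is fixed by the container
    "messages": [
        {"role": "user", "content": "Summarize what an inference microservice is."}
    ],
    "max_tokens": 128,
}

# In practice you would POST this with any HTTP client, for example:
#   requests.post(ENDPOINT, json=payload).json()
body = json.dumps(payload)
print(body)
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can usually be pointed at the container simply by overriding their base URL.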

HUGS are built around Hugging Face's open source Text Generation Inference (TGI) and Transformers frameworks and libraries, which means they can be deployed on a variety of hardware platforms, including Nvidia and AMD GPUs. Support will eventually be extended to more specialized AI accelerators, such as Amazon's Inferentia and Google's TPUs. Apparently there's no love for Intel Gaudi just yet.

Despite being based on open source technologies, HUGS, like NIMs, aren't free. Deployed in AWS or Google Cloud, they'll run you about $1 an hour per container.

For comparison, Nvidia charges $1 per hour per GPU for NIMs deployed in the cloud, or $4,500 a year per GPU on-prem. For a larger model, say Meta's Llama 3.1 405B, which spans eight GPUs, that works out to roughly $8 an hour for a NIM versus $1 an hour for a single HUGS container, making Hugging Face's offering significantly less expensive to deploy. What's more, support for alternative hardware types means customers won't be limited to Nvidia's hardware ecosystem.

Whether HUGS will be more performant or better optimized than NIMs remains to be seen.

For those looking to deploy HUGS at a smaller scale, Hugging Face will also make the images available on DigitalOcean's cloud platform at no additional cost, but you'll still have to pay for the compute.

DigitalOcean recently announced the availability of GPU-accelerated VMs based on Nvidia's H100 accelerators, which will run you between $2.50 and $6.74 per hour per GPU, depending on whether you opt for a single accelerator or sign a 12-month commitment for eight.

Finally, subscribers shelling out $20 a month per user for Hugging Face's Enterprise Hub will have the option to deploy HUGS on their own infrastructure.
In terms of models, Hugging Face is fairly conservative and focuses on some of the most popular open models, including:

  • Meta's Llama 3.1 8B, 70B, and 405B (FP8)
  • Mistral AI's Mixtral 8x7B, 8x22B, and Mistral 7B
  • Nous Research's Hermes fine-tunes of Meta's three Llama 3.1 models and Mistral AI's Mixtral 8x7B
  • Google's Gemma 2 9B and 27B
  • Alibaba's Qwen 2.5 7B

We expect Hugging Face will quickly expand support to additional models, such as Microsoft's Phi series of LLMs.

But, if paying for what essentially is a bundle of open source software and model files doesn't strike your fancy, nothing stops anyone from building their own containerized models using vLLM, Llama.cpp, TGI, or TensorRT LLM. You can find our hands-on guide on containerizing AI apps here.
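As a sketch of the DIY route, the commands below spin up an OpenAI-compatible serving container with vLLM. The image tag, model name, and port are assumptions for illustration rather than a tested recipe, and a real deployment would also need GPU drivers and, for gated models, a Hugging Face access token.

```shell
# Serve a model behind an OpenAI-compatible API using vLLM's container image.
# Image tag, model, and port are illustrative assumptions.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model meta-llama/Llama-3.1-8B-Instruct

# Once the server is up, it speaks the same OpenAI-style API:
curl http://localhost:8000/v1/models
```

The result is functionally similar to a HUGS or NIM container, minus the vendor's tuning work.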

With that said, what you're really paying for with Hugging Face's HUGS, or Nvidia's NIMs for that matter, is the time and effort spent tuning and optimizing the containers for maximum performance. ®

Source: theregister.com