pwshub.com

Meta debuts slimmed-down Llama models for low-powered devices

Meta Platforms Inc. is striving to make its popular open-source large language models more accessible with the release of “quantized” versions of the Llama 3.2 1B and Llama 3.2 3B models, designed to run on low-powered devices.

The Llama 3.2 1B and 3B models were announced at Meta’s Connect 2024 event last month. They’re the company’s smallest LLMs so far, designed to address the demand to run generative artificial intelligence on-device and in edge deployments.

Now it’s releasing quantized, or lightweight, versions of those models, which come with a reduced memory footprint and support faster on-device inference, with greater accuracy, the company said. It’s all in the pursuit of portability, Meta said, enabling the Llama 3.2 1B and 3B models to be deployed on resource-constrained devices while maintaining their strong performance.

In a blog post today, Meta’s AI research team explained that, because of the limited runtime memory available on mobile devices, it opted to prioritize “short-context applications up to 8K” for the quantized models. Quantization is a technique that shrinks large language models by lowering the numerical precision of their weights, so each weight takes fewer bits to store.
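To make the idea of lowering weight precision concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration only, not Meta’s method: the 8-bit width, the per-tensor scale, the sample weights and the function names `quantize_int8` and `dequantize` are all assumptions chosen for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Illustrative assumption: symmetric, per-tensor, 8-bit quantization.
    """
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; guard against all zeros.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


# Hypothetical weights, made up for the example.
weights = [0.82, -1.27, 0.003, 0.45, -0.61]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer step bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored as a small integer plus one shared scale, which is where the memory saving comes from. The trade-off is the rounding error reported on the last line.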

Meta’s researchers said they used two different methods to quantize the Llama 3.2 1B and 3B models, including a technique known as “Quantization-Aware Training with LoRA adaptors,” or QLoRA, which helps to optimize their performance in low-precision environments.

The QLoRA method helps to prioritize accuracy when quantizing LLMs, but in cases where developers would rather put more emphasis on portability at the expense of performance, a second technique, known as SpinQuant, can be used. Using SpinQuant, Meta said it can determine the best possible combination for compression, so as to ensure the model can be ported to the target device while retaining the best possible performance.

Meta noted that inference using both of the quantization techniques is supported in the Llama Stack reference implementation via PyTorch’s ExecuTorch framework.

In its tests, Meta demonstrated that the quantized Llama 3.2 1B and Llama 3.2 3B models are, on average, 56% smaller than the original formats and deliver a two- to fourfold speedup in inference. The company said tests on OnePlus 12 Android smartphones showed that the models reduced memory usage by an average of 41%, while almost matching the performance of the full-sized versions.

Meta developed the quantized Llama 3.2 1B and Llama 3.2 3B models in collaboration with Qualcomm Inc. and MediaTek Inc. to ensure that they’re optimized to run on those companies’ Arm-based system-on-chip hardware. It added that it used Kleidi AI kernels to optimize the models for mobile central processing units. By enabling the Llama models to run on mobile CPUs, developers will be able to create distinctive AI experiences with greater privacy, with all interactions taking place on the device.

The quantized Llama 3.2 1B and Llama 3.2 3B models can be downloaded from Llama.com and Hugging Face starting today.

Meta’s AI research efforts have been in overdrive this month. The quantized Llama models are the company’s fourth major announcement in just the last three weeks. At the start of the month, the company unveiled a family of Meta Movie Gen models that can be used to create and edit video footage with text-based prompts.

A few days later, it announced a host of new generative AI advertising features for marketers, and late last week it debuted an entirely new model called Spirit LM, for creating expressive AI-generated voices that reflect happiness, sadness, anger, surprise and other emotions.

Source: siliconangle.com
