pwshub.com

Cerebras challenges Nvidia with new AI inference approach

Serving tech enthusiasts for over 25 years.
TechSpot means tech analysis and advice you can trust.

In brief: The AI inference market is expanding rapidly, with OpenAI projected to earn $3.4 billion in revenue this year from its ChatGPT predictions. This growth presents opportunities for new entrants like Cerebras Systems to capture some of this market share. The company's innovative approach could lead to faster, more efficient AI applications, provided its performance claims are validated through independent testing and real-world applications.

Cerebras Systems, traditionally focused on selling AI computers for training neural networks, is pivoting to offer inference services. The company is using its wafer-scale engine (WSE), a computer chip the size of a dinner plate, to integrate Meta's open-source LLaMA 3.1 AI model directly onto the chip – a configuration that significantly reduces costs and power consumption while dramatically increasing processing speeds.

Cerebras' WSE is unique in that it integrates up to 900,000 cores on a single wafer, eliminating the need for external wiring between separate chips. Each core functions as a self-contained unit, combining both computation and memory. This design allows the model weights to be distributed across the cores, enabling faster data access and processing.

"We actually load the model weights onto the wafer, so it's right there, next to the core, which facilitates rapid data processing without the need for data to travel over slower interfaces," Andy Hock, Cerebras' Senior Vice President of Product and Strategy, told Forbes.

The performance of Cerebras' chip is "10 times faster than anything else on the market" for AI inference tasks, with the ability to process 1,800 tokens per second for the 8-billion parameter version of LLaMA 3.1, compared to 260 tokens per second on state-of-the-art GPUs, Andrew Feldman, co-founder and CEO of Cerebras, said in a presentation to the press. This level of performance has been validated by Artificial Analysis, Inc.

Beyond speed, Feldman and chief technologist Sean Lie suggested that faster processing enables more complex and accurate inference tasks. This includes multiple-query tasks and real-time voice responses. Feldman explained that speed allows for "chain-of-thought prompting," leading to more accurate results by encouraging the model to show its work and refine its answers.

In natural language processing, for example, models could generate more accurate and coherent responses, enhancing automated customer service systems by providing more contextually aware interactions. In healthcare, AI models could process larger datasets more quickly, leading to faster diagnoses and personalized treatment plans. In the business sector, real-time analytics and decision-making could be significantly improved, allowing companies to respond to market changes with greater agility and precision.

Despite its promising technology, Cerebras faces challenges in a market dominated by established players like Nvidia. Team Green's dominance is partly due to CUDA, a widely used parallel computing platform that has created a strong ecosystem around its GPUs. Cerebras' WSE requires software adaptation, as it is fundamentally different from traditional GPUs. To ease this transition, Cerebras supports high-level frameworks like PyTorch and offers its own software development kit.

Cerebras is also launching its inference service through an API to its own cloud, making it accessible to organizations without requiring an overhaul of existing infrastructure. The company plans to expand its offerings by soon providing the larger LLaMA 405 billion-parameter model on its WSE, followed by models from Mistral and Cohere. This expansion could further solidify its position in the AI market.

However, Jack Gold, an analyst with J.Gold Associates, cautions that "it's premature to estimate just how superior it will be" until more concrete real-world benchmarks and operations at scale are available.

Source: techspot.com

Related stories
3 weeks ago - Shove 32 of 'em in a box and you've got nearly 24 petaFLOPS of FP8 perf Hot Chips RISC-V champion Tenstorrent offered the closest look yet at its upcoming Blackhole AI accelerators at Hot Chips this week, which they claim can outperform...
3 days ago - Under the skin — Matching refractive indexes lets some wavelengths pass cleanly through the skin. Enlarge / Zihao...
1 week ago - AI infra startup serves up Llama 3.1 405B at 100+ tokens per second Not to be outdone by rival AI systems upstarts, SambaNova has launched inference cloud of its own that it says is ready to serve up Meta’s largest models faster than the...
3 weeks ago - The setback won't stop us from banking billions, CFO insists Nvidia has confirmed earlier reports that its Blackwell generation of GPUs suffered from a design defect that adversely impacted the yields of the hotly anticipated accelerators.…
1 month ago - All a simple misunderstanding, we're told A meeting intended to allay fears that Emirati AI firm G42's deal with Microsoft could result in China accessing advanced US technology was reportedly cancelled by the UAE ambassador to the US.…
Other stories
7 minutes ago - Those anxiously waiting to place a pre-order for a PlayStation 5 Pro might want to hold on a second. Sony just announced that it is releasing a 30th-anniversary edition that harkens back to the days of PlayStation yore. The deck comes in...
7 minutes ago - Football is back, and you can see every snap, tackle and touchdown for a single affordable price with Hulu + Live TV. You also get access to your favorite shows and thousands of on-demand movies from Hulu and Disney+.
7 minutes ago - Whether you're excited for the World Series, the NFL, NHL, NBA, or college sports, Hulu + Live TV gives you comprehensive coverage of all of this fall's biggest leagues and matchups in one place for a single affordable price–plus...
7 minutes ago - Apple released the first public beta of iOS 18.1 on Thursday, less than a week after the tech giant released iOS 18 to the general public. iOS 18...
8 minutes ago - The best solar installation company can make your life easier by helping guide you through the process of permitting and installation. Here are the ones we recommend.