Nvidia fixes Blackwell chip flaw with help from TSMC, mass production back on schedule

What just happened? Nvidia has successfully fixed a design flaw in its latest Blackwell AI chips, according to CEO Jensen Huang. The issue, which caused production delays, has been solved with the assistance of TSMC, Nvidia's long-standing manufacturing partner. In fact, it was TSMC that originally spotted the problem.

Overcoming this issue was crucial for Nvidia, as it aims to maintain its dominant position in the AI chip market. As demand for high-performance AI computing solutions continues to surge, the successful launch of Blackwell will play a pivotal role in providing the necessary hardware.

Huang candidly admitted the company's responsibility for the setback. "We had a design flaw in Blackwell," he said. "It was functional, but the design flaw caused the yield to be low. It was 100 percent Nvidia's fault."

The Blackwell chips, unveiled in March 2024, were originally slated to ship in the second quarter. However, the design flaw led to delays, potentially affecting major customers such as Meta, Google, and Microsoft.

The Blackwell project was unusually complex, Huang said, which may have been a factor in the flaw. "In order to make a Blackwell computer work, seven different types of chips were designed from scratch and had to be ramped into production at the same time."

The technical issue stemmed from the intricate packaging technology used in the Blackwell B100 and B200 GPUs. These chips employ TSMC's CoWoS-L packaging, which utilizes an RDL interposer with local silicon interconnect bridges to achieve data transfer rates of about 10 TB/s. The problem arose from a mismatch in thermal expansion properties between various components, causing system warping and failure.

To address this, Nvidia modified the top metal layers and bumps of the GPU silicon, enhancing production yields. While specific details of the fix remain undisclosed, the company confirmed that new masks were required.

The speed of the resolution is noteworthy. Typically, addressing such issues in the semiconductor industry involves modifying metal layers and creating new steppings, a process that can take around three months. "What TSMC did was to help us recover from that yield difficulty and resume the manufacturing of Blackwell at an incredible pace," Huang said.

With the design flaw now resolved, mass production of the fixed Blackwell GPUs is set to begin in late October. Shipments are expected to start in early 2025, aligning with Nvidia's fiscal year.

Despite the setback, demand for Blackwell chips remains high. Huang had previously described the demand as "insane," with customers eager to be first in line for the new technology.

Google has ordered over 400,000 GB200 chips in a deal exceeding $10 billion. Similarly, Meta has placed a $10 billion order, while Microsoft is set to have 55,000 to 65,000 GB200 GPUs ready for OpenAI by the first quarter of 2025.

Source: techspot.com
