
Transforming AI infrastructure: The role of networking in high-performance computing

AI infrastructure is rapidly transforming industries, especially in networking and data transfer.

Networking is an exciting and necessary part of building high-performance infrastructure for data transfer and machine learning training, and much of that infrastructure is shifting on-premises because of cost and data privacy concerns, according to Raj Yavatkar, chief technology officer of Juniper Networks Inc.

“It is an exciting place to be because everybody is building huge GPU clusters,” he said. “All of those clusters need to be fed data from storage networks. Then once you have data in, you need to have data transferred between these GPUs. So, networking is needed no matter what — very high performance, a high throughput, low latency kind of network. That’s being built. More exciting part is that a lot of the infrastructure is not just being built in hyperscalers, it’s also being built by enterprises on-prem, which is a big shift in the market.”

Yavatkar spoke with theCUBE Research’s John Furrier at the AI Infrastructure Silicon Valley – Executive Series event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the integration of AI into networking, the shift toward AI-native infrastructure and the challenges of scaling networks to support large GPU clusters in enterprise environments.

AI infrastructure drives the shift to on-premises solutions

According to Yavatkar, AI is being used to automate network configurations based on workload, and the Ultra Ethernet Consortium is working to evolve the Ethernet standard to match training and inferencing requirements.

“What’s happening with GPU-based clusters, these things are coming together. The standard Ethernet, we are being able to show it’s not just good enough. It performs as well or better than InfiniBand-based networks,” he said. “That is being recognized by industry now by creating this Ultra Ethernet Consortium, which is a consortium of all the vendors trying to evolve Ethernet standard to add some new capabilities to match the training workloads and inferencing workloads.”

Networking has become intelligence-based, integrating telemetry and machine learning to provide application-aware assurance, automate root cause diagnostics and potentially automate remediation, Yavatkar added.

“You start collecting telemetry from application, from compute, GPU, networking, operating systems, and you start correlating using machine learning model,” he said. “If you do that, now you can start finding out where the problem is because you can find the anomalies, you find the switch buffers are running out of space, you find the packet loss has increased, latency has spiked. Based on that, you can point out where the problem is, and then you can do automated root cause diagnostics.”
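The approach Yavatkar describes can be illustrated with a minimal sketch. It is not Juniper's implementation: it swaps the machine learning model for a simple z-score check against each metric's own recent history, and the metric names, sample values and threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, threshold=3.0):
    """Return {metric: z_score} for metrics whose latest value deviates
    from that metric's own history by more than `threshold` standard
    deviations. A z-score stands in here for a trained ML model."""
    anomalies = {}
    for metric, values in samples.items():
        history, latest = values[:-1], values[-1]
        if len(history) < 2:
            continue  # not enough history to estimate spread
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined
        z = (latest - mean(history)) / sigma
        if abs(z) > threshold:
            anomalies[metric] = z
    return anomalies


def likely_root_cause(anomalies):
    """Pick the metric with the largest deviation as the first place to
    look. Real diagnostics would also correlate across layers."""
    if not anomalies:
        return None
    return max(anomalies, key=lambda m: abs(anomalies[m]))


# Hypothetical telemetry: the last value in each list is the newest sample.
telemetry = {
    "switch_buffer_utilization_pct": [40, 42, 41, 39, 43, 97],
    "packet_loss_pct": [0.1, 0.1, 0.2, 0.1, 0.1, 0.2],
    "latency_ms": [1.0, 1.1, 0.9, 1.0, 1.2, 1.1],
    "gpu_utilization_pct": [90, 91, 89, 92, 90, 91],
}

flagged = find_anomalies(telemetry)
suspect = likely_root_cause(flagged)
```

With these made-up samples, only the switch buffer metric is flagged, which mirrors the "switch buffers are running out of space" case in the quote. A production system would replace the z-score with a learned model and correlate signals from the application, compute, GPU and network layers before naming a cause.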


Photo: SiliconANGLE

Source: siliconangle.com
