Open lakehouse spurs innovation amid AI data demands

Generative artificial intelligence is demanding breakneck innovation from enterprises. It’s highlighting a critical need for cohesive data management and driving a seismic shift in data storage, processing and utilization. It’s also prompting a rethink of the open lakehouse concept pioneered by companies such as Onehouse.

Vinoth Chandar, CEO of Infinilake (aka Onehouse), talks to theCUBE about the concept of the open lakehouse at Supercloud 7 2024.

“We firmly believe an open lakehouse is the way of the future, but there is no Snowflake experience for the lakehouse per se,” said Vinoth Chandar (pictured), chief executive officer of Infinilake Inc. (aka Onehouse). “Onehouse was founded on the premise that we are going to bet on this open lakehouse being the one house that is going to store the data for diverse use cases. And the bet that we are going to be in a world where there’s multiple workloads and we feel like today it’s kind of converged to that point.”

Chandar spoke with theCUBE Research’s John Furrier at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Onehouse is prioritizing open formats, unified data layers and collaborative development to make data management more integrated, efficient and adaptable to the evolving needs of AI-driven enterprises.

The genesis of Onehouse and the open lakehouse model

Onehouse was founded to address the growing complexities in data management and to champion the open lakehouse model. The origin of Onehouse traces back to Uber Technologies Inc., where the team built the world’s first data lakehouse, initially termed a transactional data lake. This pioneering project evolved into Apache Hudi, a technology that underpins the open lakehouse approach, according to Chandar.

“We had only two options: we run every pipeline in a streaming mode, which costs a lot of money,” Chandar said. “It’s not even feasible to do that. Or we make our data processing on the lake smarter and more intelligent. We looked at warehouses and databases and we said if we brought some of that functionality just on top of HDFS and like a YARN compute layer … what we missed was this database abstraction on top of it. So that’s how we conceived the project.”

The ethos at Onehouse is that an open lakehouse offers a scalable and unified solution for diverse data workloads. Unlike traditional data lakes, which often result in ambiguous returns on investment, the lakehouse model promises a more cohesive and efficient data management framework. This evolution reflects the industry’s shift toward integrating data lakes and warehouses, resulting in a versatile system capable of handling various data formats and workloads, according to Chandar.

The need for unified data and the open data layer model

Data unification is foundational to the lakehouse architecture. Gen AI use cases also benefit greatly from a unified data layer rather than fragmented, siloed data sources. Unified data does not imply centralization but rather an integrated system where data from various sources can be accessed and utilized seamlessly, Chandar noted.

“If you look at the lakehouse, the story so far, it’s actually been about structured data,” he said. “So what we’ve done is adapt our warehouse capabilities, which have been more focused on structured data to the lake, but I think [in] the coming years, you will see that the lakehouse technologies [are] focused a lot more on unstructured data in a way that you can store both side by side. You have a single data management framework covering all of this.”

The concept of an open data layer is central to this approach. By adopting open data formats and ensuring interoperability across different data engines, organizations can achieve greater flexibility and scalability. This model aligns with the broader industry trend toward open-source solutions and collaborative development, which are crucial for fostering innovation and adaptability, according to Chandar.

“What’s broadly going on now is these four layers are getting unbundled in a way that … now we are saying, ‘Instead of a proprietary columnar format, stored data and Parquet, use one of these table formats to represent them as tables. And the SQL layer of these warehouses are proprietary engines, and governance layers can recognize these tables,’” he said. “I think that’s where we moved from 2021 to 2024. All [of] the recent news you see around the governance catalogs is essentially the next layer in the stack that is now getting a little bit unpacked.”

Stay tuned for the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event.

Photo: SiliconANGLE

Source: siliconangle.com
