
Metadata management tools and the data platform shift - SiliconANGLE

The source of truth is changing amid the data platform shift. The situation is evolving rapidly, and companies must weigh both open table formats and metadata management tools.

The open table format landscape includes Delta Lake, Iceberg and Apache Hudi. Much has happened in recent months, including Databricks Inc.'s purchase of Tabular Inc., the company founded by the creators of Iceberg, according to Bob Muglia, entrepreneur and builder.

“In a way, right now, Databricks has controlling capability of both the Iceberg and the Delta formats, but this is important to all the other vendors, and we’ll just watch what happens over the coming months,” Muglia said. “I do think that we’ll continue to see coexistence of these two things. In the last year, fortunately, there have been tools that have been developed to allow for both to be used simultaneously.”

Muglia spoke with theCUBE Research’s George Gilbert at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the evolution of data platform standards and the importance of metadata management tools.

Metadata management tools and universal open-source capabilities

Metadata management tools that exist today include XTable, which copies metadata from one table format to another. Fortunately, the table formats all use the same on-disk format for the data itself, according to Muglia.

“It’s really just the metadata we’re talking about. But I do think we’ll see those things converging, and I expect to see an open-source capability coming out, an open-source environment coming out that will be adopted pretty much universally across the vendors,” he said. “That’s what I hope to see anyway.”
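To make the point concrete, here is a minimal Python sketch of the idea that two table formats can describe the same data files with different metadata. It is a toy model, not XTable's implementation and not the real Delta Lake or Iceberg metadata layout: the `DataFile` class, the field names and the file paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataFile:
    """One immutable data file on disk (hypothetical example paths)."""
    path: str
    record_count: int


def to_log_style(files: List[DataFile]) -> List[Dict]:
    """Hypothetical log-style metadata: one 'add' entry per data file."""
    return [{"add": {"path": f.path, "numRecords": f.record_count}} for f in files]


def to_manifest_style(files: List[DataFile]) -> Dict:
    """Hypothetical manifest-style metadata: one snapshot listing every file."""
    return {
        "snapshot": {
            "total_records": sum(f.record_count for f in files),
            "manifest": [
                {"file_path": f.path, "record_count": f.record_count} for f in files
            ],
        }
    }


def paths_from_log(log: List[Dict]) -> List[str]:
    """Collect the data file paths referenced by the log-style metadata."""
    return [entry["add"]["path"] for entry in log]


def paths_from_manifest(meta: Dict) -> List[str]:
    """Collect the data file paths referenced by the manifest-style metadata."""
    return [entry["file_path"] for entry in meta["snapshot"]["manifest"]]


# The data files are written once; only the metadata is produced twice.
files = [
    DataFile("data/part-0000.parquet", 100),
    DataFile("data/part-0001.parquet", 250),
]
log_meta = to_log_style(files)
manifest_meta = to_manifest_style(files)

# Both metadata views point at exactly the same files on disk.
same_files = paths_from_log(log_meta) == paths_from_manifest(manifest_meta)
```

In this sketch, nothing in the `data/` directory is rewritten when the second metadata view is generated, which is the sense in which the difference between formats is "really just the metadata."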

The second development is that catalogs are being built on top of open data lake formats, and the catalog together with the underlying data format becomes one’s source of truth, according to Muglia. These catalogs are in development, but they’re not very compatible with one another.

“Once again, my guess is that’s just early stages of things, and we’ll start to see something emerging that could be compatible and used across multiple vendors, but that’s certainly not where we are at the moment,” he said. “We’re early stages of this transition from where we have proprietary formats to an open format, but the industry hasn’t quite settled on it yet.”

It’s clear that the source of truth isn’t just the data. It has to start with the technical and operational metadata, because the data warehouses and tools that run in the data environment must be able to work with the data in a cohesive and secure way, according to Muglia.

“I think over time it’ll include the higher levels of semantics as well. This is one of those open questions. Nobody really knows how that’s going to develop,” he said. “As you go up the stack and try to do more and more, you may want to have more and more capabilities, which could be an opportunity for vendor differentiation as well. So we’ll see.”

The challenge of unifying technical, operational and semantic data

This raises a question: Can the technical metadata, the operational metadata and the richer semantics be separated from one another? Or, if one wants a coherent source of data, do all three need a single underlying owner?

“I don’t think you need one engineer for it,” Muglia said. “I think you need to have a way of accessing the data coherently across multiple engines, potentially.”

For instance, a knowledge graph database processor would want to work with the same information a SQL database works with, according to Muglia. That means some of the same metadata is required.

“But then there’s a lot more information that one could put in the higher level semantic layer. And in fact, if you look at that, there’s a lot of operations that you want to perform on that data,” he said. “They’re graphs, and they’re complicated graphs, and there are relational operators that can be applied across the graphs.”

Today’s databases and catalogs don’t do that. But change is happening fast.

“You need something different, which I believe is a relational knowledge graph, which we’re starting to see emerge now,” Muglia said.
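As a rough illustration of what applying relational operators across a graph can mean, the Python sketch below stores a tiny metadata graph as a set of (subject, predicate, object) facts and answers a question with selection and join. It is an assumption-laden toy, not a description of any product Muglia mentioned: the table names, predicates and helper functions are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Pair = Tuple[str, str]

# Hypothetical metadata facts; none of these names come from the interview.
facts: Set[Triple] = {
    ("orders", "has_column", "customer_id"),
    ("customer_id", "references", "customers"),
    ("customers", "owned_by", "sales_team"),
}


def select(triples: Set[Triple], predicate: str) -> Set[Pair]:
    """Relational selection: keep (subject, object) pairs for one predicate."""
    return {(s, o) for (s, p, o) in triples if p == predicate}


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """Relational join: compose pairs where left's object equals right's subject."""
    return {(a, d) for (a, b) in left for (c, d) in right if b == c}


# Which tables does each table reach through a column reference?
table_links = join(select(facts, "has_column"), select(facts, "references"))

# Which team owns the tables reached that way?
owners = join(table_links, select(facts, "owned_by"))
```

Each edge type is treated as a relation, so following a path through the graph becomes a chain of joins, the same operator a SQL engine applies to tables.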

Ultimately, companies will need visibility across all of their underlying metadata to get a consistent source of truth, according to Muglia. That point is still a long way off.

“We’re really just beginning to see the emergence of this metadata in this semantic layer as a real thing,” he said.


Source: siliconangle.com
