pwshub.com

Dremio says it has dramatically improved query performance on Iceberg data lakes

Data lakehouse company Dremio Corp. today announced a set of advanced analytics performance capabilities that it says significantly speed query performance on Apache Iceberg tables while reducing the need for user intervention.

The two major new features are Live Reflections and Result Set Caching. Dremio Reflections are a feature of the company’s data lake engine that accelerates query performance by creating optimized, precomputed data representations. They’re similar in concept to materialized views but are more flexible and integrated with Dremio’s architecture. As a result, they enable faster and more interactive querying of large datasets stored in data lakes without data movement or duplication.

Live Reflections ensure that materialized views and aggregations are automatically updated whenever changes are made to the base Iceberg tables. Users can accelerate queries without maintenance overhead because the system recommends the Reflections that provide the best value and system-wide performance.

“It used to be that you had to figure out which Reflections you wanted to create and then manage the refresh cycle,” said Chief Executive Tomer Shiran. “You had to logically figure out what aggregations you needed, how to sort the table and how frequently to refresh. We’ve now solved both of those problems.”

Recommended Reflections essentially monitor activity across the entire data lake and learn what queries are being used most often and how they can be accelerated. Any updates to a table automatically refresh all the downstream Reflections incrementally, even if joins cross multiple tables.
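To illustrate what an incremental refresh means in practice, here is a minimal Python sketch. It is not Dremio's implementation: the `IncrementalSum` class, the append-only list standing in for a table, and the row layout are all assumptions made for illustration. The point it shows is that only rows added since the last refresh are processed, rather than the whole table.

```python
from collections import defaultdict


class IncrementalSum:
    """Toy precomputed aggregation: running total per key (hypothetical)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.rows_applied = 0  # how many source rows are already folded in

    def refresh(self, table_rows):
        # Assumes the source is append-only, so everything past
        # rows_applied is new since the previous refresh.
        new_rows = table_rows[self.rows_applied:]
        for key, amount in new_rows:
            self.totals[key] += amount
        self.rows_applied = len(table_rows)
        return len(new_rows)  # number of rows actually processed


orders = [("east", 10.0), ("west", 5.0)]
agg = IncrementalSum()
first = agg.refresh(orders)    # processes both existing rows
orders.append(("east", 2.5))
second = agg.refresh(orders)   # processes only the newly appended row
```

A real engine also has to handle updates, deletes and joins across tables, which this sketch deliberately leaves out.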

Shiran said Apache Iceberg’s embedded change-tracking features make this possible. “You can note that the version of this table that was used for this query is the same as the version currently being queried,” he said. “I don’t have to worry that something may have changed. I know with certainty that it won’t return a different result than what the user expects.”
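The version check Shiran describes can be sketched in a few lines of Python. Iceberg records a new snapshot for every commit to a table; the rest of this example is assumed for illustration, including the `Table` and `Materialization` classes, the `is_fresh` helper, and the use of a simple counter in place of a real snapshot ID.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    snapshot_id: int = 0  # simplification: a counter instead of a real snapshot ID

    def commit(self):
        # Every write to the table produces a new snapshot.
        self.snapshot_id += 1


@dataclass(frozen=True)
class Materialization:
    source: str
    built_from_snapshot: int


def is_fresh(mat, table):
    # Safe to reuse only if the table has not changed since the build.
    return mat.source == table.name and mat.built_from_snapshot == table.snapshot_id


orders = Table("orders")
mat = Materialization("orders", orders.snapshot_id)
fresh_before = is_fresh(mat, orders)
orders.commit()
fresh_after = is_fresh(mat, orders)
```

Because the comparison is a single equality test on metadata, it can be made without scanning any data.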

Result Set Caching can accelerate query responses up to 28-fold across all data sources by storing frequently accessed query results rather than just the queries, Dremio claimed. “People often query the same data,” Shiran said. “The optimizer takes the query plan, and asks if it can use one of the existing Reflections. The user isn’t aware of it.”

Storing query results instead of queries in the database consumes more storage, but “object storage is cheap,” Shiran said. “Compute is expensive.”
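A result cache of this kind can be sketched as a lookup keyed by the query text and the version of the data it ran against. The Python below is an assumption-laden illustration, not Dremio's design: `ResultCache`, `fake_engine` and the `snapshot_id` argument are hypothetical names, and a real cache would also need eviction and a more careful way of recognizing equivalent queries.

```python
class ResultCache:
    """Toy result set cache keyed by (query text, table snapshot)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes SQL
        self._store = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql, snapshot_id):
        # Including the snapshot in the key means a table change
        # automatically bypasses stale results.
        key = (sql.strip(), snapshot_id)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._run_query(sql)
        self._store[key] = result
        return result


calls = []


def fake_engine(sql):
    # Stand-in for an expensive query engine; records each real execution.
    calls.append(sql)
    return [("row_count", 42)]


cache = ResultCache(fake_engine)
r1 = cache.execute("SELECT count(*) FROM t", snapshot_id=1)  # miss, runs the query
r2 = cache.execute("SELECT count(*) FROM t", snapshot_id=1)  # hit, served from cache
r3 = cache.execute("SELECT count(*) FROM t", snapshot_id=2)  # table changed, miss
```

The trade-off matches Shiran's remark: the cache spends storage on saved results in order to skip repeated compute.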

A new merge-on-read feature speeds Iceberg table writes and ingestion operations by up to 85%. Notification-based auto ingest keeps tables continuously updated with fresh data by monitoring object storage for new files and ingesting them when a notification is received.
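Merge-on-read generally speeds up writes by recording changes next to the existing data and reconciling them only when the table is read. The Python sketch below shows that general idea under simplifying assumptions: `MergeOnReadTable` is a hypothetical class, dictionaries and sets stand in for data files and delete files, and nothing here reflects how Dremio implements the feature.

```python
class MergeOnReadTable:
    """Toy merge-on-read table: writes are cheap, reads do the merging."""

    def __init__(self, rows):
        self.base = dict(rows)  # stands in for immutable base data files
        self.deleted = set()    # stands in for delete files
        self.appended = {}      # stands in for newly written data files

    def update(self, row_id, value):
        # No rewrite of base data: mark the old row deleted, add the new one.
        self.deleted.add(row_id)
        self.appended[row_id] = value

    def delete(self, row_id):
        self.deleted.add(row_id)
        self.appended.pop(row_id, None)

    def scan(self):
        # The merge happens here, at read time.
        merged = {k: v for k, v in self.base.items() if k not in self.deleted}
        merged.update(self.appended)
        return merged


table = MergeOnReadTable({1: "a", 2: "b", 3: "c"})
table.update(2, "B")
table.delete(3)
current = table.scan()
```

The base rows are never rewritten by `update` or `delete`, which is where the write-side saving comes from; the cost moves to `scan`.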

“It’s all incremental and live, unlike in the past when you had to manually schedule an operation,” Shiran said. “Now you just insert the records automatically, and because all the updates are incremental, they’re cheap.”
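Notification-driven ingestion can be pictured as a handler that runs once per new-file event instead of a scheduled batch job. The following Python sketch is an illustration only: `AutoIngest`, `on_notification` and the in-memory `fake_files` store are hypothetical, and the assumption that the same notification may arrive more than once is ours, not a statement about Dremio's service.

```python
class AutoIngest:
    """Toy notification handler that appends each new file exactly once."""

    def __init__(self, read_file):
        self._read_file = read_file  # callable: path -> list of rows
        self.seen = set()            # paths already ingested
        self.table = []              # stands in for the target table

    def on_notification(self, path):
        # Assumed: notifications can be duplicated, so ingestion is idempotent.
        if path in self.seen:
            return 0
        rows = self._read_file(path)
        self.table.extend(rows)  # incremental append, no full reload
        self.seen.add(path)
        return len(rows)


fake_files = {
    "landing/part-1.csv": [("a", 1), ("b", 2)],
    "landing/part-2.csv": [("c", 3)],
}
ingest = AutoIngest(fake_files.__getitem__)
n1 = ingest.on_notification("landing/part-1.csv")
n2 = ingest.on_notification("landing/part-1.csv")  # duplicate event, ignored
n3 = ingest.on_notification("landing/part-2.csv")
```

Each event touches only the rows in the file it announces, which is why incremental updates of this kind stay cheap.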

Photo: SiliconANGLE

Source: siliconangle.com
