Big data vendors embrace Apache Iceberg

Apache Iceberg has secured renewed momentum in the last week after leading vendors in data warehousing and analytics all announced new features around the open source table format.

AWS, Cloudera, Google, and Snowflake came out in support of Apache Iceberg. Iceberg faces off against contenders including Databricks' Delta Lake – also an open source Linux Foundation project – and Apache Hudi. They are all battling to become the standard table format, allowing users to query data with an analytics engine of their choice without moving it.

For example, Google's data warehouse and analytics environment, BigQuery, is previewing BigQuery tables for Apache Iceberg, which it calls a fully managed, Apache Iceberg-compatible storage engine. The Chocolate Factory aims to bring together its data warehouse and data lake technology, BigLake, in a so-called lakehouse architecture.

"BigLake tables are currently read-only; BigQuery customers have to perform data mutations through external query engines and manually orchestrate data management," the vendor explained in a blog post.

"BigQuery tables for Apache Iceberg use the Apache Iceberg format to store data in customer-owned cloud storage buckets while providing a similar customer experience and feature set as BigQuery native tables."

In this way, the new BigQuery tables are also writable from BigQuery through GoogleSQL data manipulation language (DML) and support ingestion from open source engines such as Apache Spark through BigQuery's Write API.

AWS's Redshift is a rival to BigQuery in so-called cloud-native data warehousing. It has introduced secure sharing of data lake tables, which supports open file formats including Parquet, ORC, JSON, and CSV, as well as open table format Apache Iceberg, all stored in Amazon S3.

Cloudera and Snowflake have different histories in the data analytics market. Cloudera started out building data lakes on Apache Hadoop and its distributed file system (HDFS), whereas Snowflake was seen as a leader in separating storage from compute in cloud-based data warehouse systems.

In 2022, both companies backed Apache Iceberg to improve interoperability without moving data.

Last week, Cloudera announced integration with Snowflake by extending its Open Data Lakehouse interoperability, which it said would offer joint customers access to Cloudera's Data Lakehouse via its Apache Iceberg REST Catalog.

In a statement, Abhas Ricky, chief strategy officer of Cloudera, said the move would help customers simplify their data architecture, minimize data pipelines, and reduce the total cost of ownership of their data estate while reducing security risks.

Keen observers will note exceptions to the table format love-in, including Microsoft, provider of Azure, the second-largest cloud infrastructure platform by market share, and a slew of data technologies, including its lakehouse environment Fabric. Microsoft went with Delta Lake, owing to market demand, according to Arun Ulag, corporate vice president of Azure Data. Although Microsoft Fabric provides some support for Iceberg and Hudi by default, Fabric favors Delta and Apache Parquet, the column-oriented data file format.

Databricks, meanwhile, dreams of creating a single standard with the best bits of Iceberg and Delta. While that work progresses, it offers hope of integration via its UniForm product, designed to allow data stored in Delta to be read as if it were Apache Iceberg or Apache Hudi.

Earlier this month, Snowflake principal engineer Russell Spitzer said he hoped the de facto standard would be Iceberg. After recently joining from Apple – where Iceberg is said to be wall-to-wall – he said he was seeing a number of developer groups from vendors and tech firms start to contribute to the Iceberg project. ®

Source: theregister.com