
The State of Apache Iceberg in the Enterprise (2026)

Yossi Reitblat

February 19, 2026

TL;DR

It’s always fascinating to attach numbers to feelings.
Working on Ryft, I’ve had a lot of feelings about how the Apache Iceberg ecosystem is evolving. Today I’m happy to share that I also have some numbers.

In January 2026, we surveyed 252 senior data leaders who operate an Iceberg data lake in production. A few things are clear: Iceberg delivers strong performance, multi-engine flexibility, and a foundation for AI and ML workloads at scale.

But these benefits also come with a new set of challenges.

Most organizations still rely on custom scripts and internal tooling to manage compaction, metadata growth, snapshot lifecycle, retention enforcement, and access controls. As table counts and data volumes grow, operational complexity grows with them.

👉 Download the full report

Why Apache Iceberg Matters in the Enterprise Today

Apache Iceberg is now foundational infrastructure in many enterprise data platforms.

What began as a table format adopted by lakehouse pioneers now defines how enterprises manage large-scale analytical data. Iceberg-managed data supports business-critical workloads, including real-time analytics, customer analytics, and AI and ML pipelines.

Data engineering leaders are standardizing on Iceberg because they need:

A unified data layer for their entire organization
Efficient data access at scale
Multi-engine access across Spark, Trino, Flink, and cloud-native engines

Until now, however, there has been limited data-backed insight into how enterprises actually operate Iceberg-managed data at production scale.

This report addresses that gap. Also, we really love graphs.

About the State of Apache Iceberg in the Enterprise Report

We commissioned an independent research firm to survey 252 senior data leaders actively responsible for Iceberg in production.

Respondents include VPs, directors, platform leads, and engineering managers responsible for their company’s data platform.

The research focuses on real-world production behavior:

Adoption patterns and workload mix
Table counts and data growth trajectories
Data management practices
Governance enforcement approaches
Operational tooling strategies

The survey documents how teams operate production Iceberg environments.

Key findings

1. Iceberg is a core part of the enterprise data platform

Iceberg is no longer positioned as an emerging technology.

Survey respondents report using Iceberg-managed data for large-scale analytics, ML feature stores, customer telemetry, and regulated datasets. Satisfaction levels are high. Most report measurable improvements in query performance and data reliability after migrating from legacy Hive or proprietary warehouse systems.

For many organizations, Iceberg has become the default table format for new analytical workloads.

2. Adoption is strong. Operations are fragmented

Architectural benefits are clear. Operational consistency is not.

Most organizations rely on internally built scripts or manually orchestrated workflows to handle:

Data compaction and optimization
Snapshot expiration
Data retention & lifecycle
Access and governance controls
Disaster recovery

These approaches work at modest scale, but they become fragile as environments expand to thousands of tables and petabyte-scale storage.
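To make "internally built scripts" concrete, here is a minimal sketch of the kind of snapshot-expiration logic teams often write for themselves. It is a hypothetical illustration, not Ryft's tooling and not Iceberg's own implementation: the `Snapshot` type and `snapshots_to_expire` function are invented for this example, and the sketch assumes the newest snapshot is the table's current one (branches, tags, and rollbacks are ignored). Its `max_age` and `retain_last` parameters loosely mirror the `older_than` and `retain_last` arguments of Iceberg's `expire_snapshots` Spark procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified view of an Iceberg snapshot entry.
    snapshot_id: int
    committed_at: datetime


def snapshots_to_expire(snapshots, now, max_age, retain_last):
    """Return IDs of snapshots that a simple age-based policy would expire.

    A snapshot is kept if it is among the `retain_last` most recent snapshots
    or if it was committed within `max_age` of `now`. Assumption: the newest
    snapshot is the current one, so retain_last >= 1 always protects it.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1 to keep the current snapshot")
    cutoff = now - max_age
    newest_first = sorted(snapshots, key=lambda s: s.committed_at, reverse=True)
    # The most recent `retain_last` snapshots are never expired, whatever their age.
    protected = {s.snapshot_id for s in newest_first[:retain_last]}
    # Everything else older than the cutoff is a candidate for expiration.
    return [
        s.snapshot_id
        for s in newest_first
        if s.snapshot_id not in protected and s.committed_at < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2026, 2, 19, tzinfo=timezone.utc)
    history = [Snapshot(i, now - timedelta(days=d)) for i, d in [(5, 0), (4, 2), (3, 6), (2, 10), (1, 30)]]
    print(snapshots_to_expire(history, now, timedelta(days=5), retain_last=1))  # [3, 2, 1]
```

A real script would then hand these IDs to the engine's expiration API and clean up files that are no longer referenced. Each policy like this must also be scheduled, monitored, and kept consistent across engines and thousands of tables, which is where hand-rolled approaches tend to become fragile.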

3. Iceberg usage is accelerating

Most respondents plan to migrate additional datasets to Iceberg in the next 12 months.

Growth drivers include:

AI and ML training pipelines
Product analytics and customer telemetry
Consolidation of legacy warehouse systems
Compliance with GDPR, CCPA, and HIPAA retention requirements

As table counts and data volumes increase, manual operational approaches become harder to sustain.

Operational complexity scales faster than many teams expect.

Download the full report

The complete research report provides a deeper analysis of:

Production scale benchmarks
AI and ML workload patterns
Snapshot lifecycle and retention practices
Governance enforcement approaches
Operational tooling strategies across multi-engine environments

If you are responsible for operating Iceberg-managed data in production, this report will help you benchmark your current state and anticipate the next phase of operational complexity.

👉 Download The State of Apache Iceberg in the Enterprise (2026)
