
TL;DR
It’s always fascinating to attach numbers to feelings.
Working on Ryft, I’ve had a lot of feelings about how the Apache Iceberg ecosystem is evolving. Today I’m happy to share that I also have some numbers.
In January 2026, we surveyed 252 senior data leaders who are operating an Iceberg data lake in production. A few things are clear: Iceberg delivers strong performance, multi-engine flexibility, and a foundation for AI and ML workloads at scale.
But these benefits come with a new set of challenges.
Most organizations still rely on custom scripts and internal tooling to manage compaction, metadata growth, snapshot lifecycle, retention enforcement, and access controls. As table counts and data volumes grow, operational complexity grows with them.
Why Apache Iceberg Matters in the Enterprise Today
Apache Iceberg is now foundational infrastructure in many enterprise data platforms.
What began as a table format adopted by lakehouse pioneers now defines how enterprises manage large-scale analytical data. Iceberg-managed data supports business-critical workloads, real-time analytics, customer analytics, and AI and ML workloads.
Data engineering leaders are standardizing on Iceberg because they need:
- A unified data layer for their entire organization
- Efficient data access at scale
- Multi-engine access across Spark, Trino, Flink, and cloud-native engines
Until now, however, there has been limited data-backed insight into how enterprises actually operate Iceberg-managed data at production scale.
This report addresses that gap. Also, we really love graphs.
About the State of Apache Iceberg in the Enterprise Report
We commissioned an independent research firm to survey 252 senior data leaders actively responsible for Iceberg in production.
Respondents include VPs, directors, platform leads, and engineering managers responsible for their company’s data platform.
The research focuses on real-world production behavior:
- Adoption patterns and workload mix
- Table counts and data growth trajectories
- Data management practices
- Governance enforcement approaches
- Operational tooling strategies
The survey documents how teams operate production Iceberg environments.
Key findings
1. Iceberg is a core part of the enterprise data platform
Iceberg is no longer positioned as an emerging technology.
Survey respondents report using Iceberg-managed data for large-scale analytics, ML feature stores, customer telemetry, and regulated datasets. Satisfaction levels are high. Most report measurable improvements in query performance and data reliability after migrating from legacy Hive or proprietary warehouse systems.
For many organizations, Iceberg has become the default table format for new analytical workloads.
2. Adoption is strong. Operations are fragmented
Architectural benefits are clear. Operational consistency is not.
Most organizations rely on internally built scripts or manually orchestrated workflows to handle:
- Data compaction and optimization
- Snapshot expiration
- Data retention and lifecycle
- Access and governance controls
- Disaster recovery
These approaches work at modest scale, but they become fragile as environments expand to thousands of tables and petabyte-scale storage.
3. Iceberg usage is accelerating
Most respondents plan to migrate additional datasets to Iceberg in the next 12 months.
Growth drivers include:
- AI and ML training pipelines
- Product analytics and customer telemetry
- Consolidation of legacy warehouse systems
- GDPR, CCPA, and HIPAA retention requirements
As table counts and data volumes increase, manual operational approaches become harder to sustain.
Operational complexity scales faster than many teams expect.
Download the full report
The complete research report provides a deeper analysis of:
- Production scale benchmarks
- AI and ML workload patterns
- Snapshot lifecycle and retention practices
- Governance enforcement approaches
- Operational tooling strategies across multi-engine environments
If you are responsible for operating Iceberg-managed data in production, this report will help you benchmark your current state and anticipate the next phase of operational complexity.
👉 Download The State of Apache Iceberg in the Enterprise (2026)