Announcing Ryft Adaptive Optimization

Yossi Reitblat
September 17, 2025

Today, we’re officially introducing Ryft Adaptive Optimization, an always-on, dynamic optimization engine for Apache Iceberg™. Our engine continuously compacts, rewrites, indexes, and reorders data based on how your tables are actually used, delivering up to 5× faster queries, up to 10× storage reduction, and roughly 7× higher compaction efficiency than other engines.

The Challenge with Manual Table Management

Managing Iceberg tables is an extremely important yet manual process requiring a lot of testing and constant tweaking to get right.

Data teams need to constantly ensure that each table is maintained and optimized; otherwise, performance plummets and costs skyrocket.

The challenge is that every table is unique:

  1. Different workload patterns: Batch processing, CDC streams, real-time ingestion, bulk merges, and large deletes all have distinct optimization needs.
  2. Varying data characteristics: Wide tables, highly compressible datasets, JSON columns, and high or low cardinality fields each require different approaches.
  3. Unpredictable access patterns: Query filters, group-bys, and joins constantly evolve, making static optimization strategies obsolete.

This forces teams into endless cycles of monitoring and manual tuning: adjusting compaction schedules, debugging failed maintenance jobs, tweaking compression settings, optimizing file sizes, indexing columns, and reordering data. It's time-consuming and error-prone, it doesn't scale, and it's not a good use of your time.

How Adaptive Optimization Works

Ryft's engine learns the unique characteristics of each table and automatically adapts optimization strategies accordingly. We analyze three key dimensions:

  1. Workload type: Streaming workloads need different compaction algorithms and schedules than batch processing.
  2. Data profiling: We continuously assess data volumes, compression ratios, column counts, and distribution patterns.
  3. Usage analytics: By collecting query patterns from your compute engines, we make informed decisions about how to optimize each table for maximum impact.
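To make the "workload type" dimension concrete, here is a minimal sketch of how a table could be labeled from its commit cadence alone. This is an illustration, not Ryft's actual classifier: the function name `classify_workload` and the 5-minute `streaming_max_gap_s` threshold are assumptions chosen for the example.

```python
from statistics import median


def classify_workload(commit_timestamps, streaming_max_gap_s=300):
    """Label a table 'streaming', 'batch', or 'idle' from commit times.

    commit_timestamps: commit times in epoch seconds (any order).
    streaming_max_gap_s: hypothetical cutoff; a median gap between commits
    at or below this value is treated as streaming.
    """
    if len(commit_timestamps) < 2:
        # Zero or one commit gives no cadence to measure.
        return "idle"
    ts = sorted(commit_timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return "streaming" if median(gaps) <= streaming_max_gap_s else "batch"
```

A real system would presumably combine cadence with other signals (commit size, operation type, delete volume), but even this simple rule separates minute-level micro-batches from daily loads.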

Looking Under The Covers

  • Targeted partition compaction: Only compact partitions that need it, based on file size, age, delete density, and scan frequency.
  • Predictive efficiency scoring: Evaluate optimization plans upfront and skip work that won't meaningfully improve performance or cost.
  • Intelligent delete file management: Rewrite equality and position deletes at the right time and for the right partitions, and clean up orphaned delete files.
  • Streaming-aware tiered compaction: Micro-merges for frequently accessed data, with opportunistic macro-compaction that preserves SLAs.
  • Dynamic resource allocation: Allocate compute resources based on data volume and properties to achieve the best balance of performance and cost.
  • Smart partition prioritization: Partitions are prioritized for optimization based on recency, usage, and potential benefit.
  • Metadata optimization: Reduce query planning overhead by compacting manifests and keeping metadata size under control.
  • Dynamic file sizing: Automatically tune target file sizes based on usage patterns, such as larger files for scan-heavy fact tables and smaller files for point lookups.
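As a rough illustration of how targeted compaction, efficiency scoring, and partition prioritization can fit together, the sketch below scores each partition by expected benefit and skips those below a cutoff. Everything here is hypothetical: `PartitionStats`, `compaction_score`, `plan_compaction`, the 0.6/0.4 weights, and the thresholds are assumptions made for the example, not Ryft's actual model. File age is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    name: str
    file_count: int
    avg_file_size_mb: float
    delete_ratio: float   # fraction of rows masked by delete files, 0..1
    scans_last_day: int   # how often queries touched this partition


def compaction_score(p, target_file_size_mb=512.0):
    """Estimate the benefit of compacting one partition (higher is better)."""
    # 0 when files are already at the target size, approaching 1 as they shrink.
    small_file_pressure = max(0.0, 1.0 - p.avg_file_size_mb / target_file_size_mb)
    if p.file_count < 2:
        # A single file has nothing to be merged with.
        small_file_pressure = 0.0
    benefit = 0.6 * small_file_pressure + 0.4 * min(p.delete_ratio, 1.0)
    # Weight by usage so frequently scanned partitions rank first.
    return benefit * (1 + p.scans_last_day)


def plan_compaction(partitions, min_score=0.5):
    """Return partition names worth compacting, highest score first."""
    scored = [(compaction_score(p), p.name) for p in partitions]
    return [name for score, name in sorted(scored, reverse=True) if score >= min_score]


stats = [
    PartitionStats("hot", file_count=200, avg_file_size_mb=8, delete_ratio=0.1, scans_last_day=50),
    PartitionStats("cold", file_count=4, avg_file_size_mb=500, delete_ratio=0.0, scans_last_day=0),
    PartitionStats("deletes", file_count=1, avg_file_size_mb=100, delete_ratio=0.5, scans_last_day=2),
]
plan = plan_compaction(stats)
```

With these sample numbers, the heavily scanned partition full of small files ranks first, the delete-heavy partition follows, and the cold partition with well-sized files is skipped entirely.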

Real-World Impact

Optimize Only What Matters

For organizations managing hundreds, thousands or tens of thousands of tables, not every table needs constant optimization. Tables that update infrequently don't need constant compaction. Tables created and not updated or rarely used don't need ongoing maintenance.

Ryft’s engine identifies unused tables, optimizes them once, and then skips unnecessary ongoing optimizations, ensuring effective use of precious compute resources and budgets.
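One way to read this policy is as a simple gate in front of the maintenance scheduler. The sketch below is an assumption-laden illustration, not Ryft's logic: `needs_optimization` and the 30-day `idle_after_s` window are hypothetical, and all timestamps are epoch seconds.

```python
def needs_optimization(last_commit_at, last_optimized_at, last_queried_at,
                       now, idle_after_s=30 * 86400):
    """Decide whether a table deserves another maintenance pass."""
    if last_optimized_at is None:
        # Never optimized: do one pass so the table is left in good shape.
        return True
    if last_commit_at <= last_optimized_at:
        # Nothing was written since the last pass, so there is nothing to fix.
        return False
    if last_queried_at is None or now - last_queried_at > idle_after_s:
        # New data exists, but nobody has read the table recently.
        return False
    return True
```

The ordering of the checks matters: a brand-new table is always optimized once, and only tables that are both changing and being read keep consuming compute.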

CDC Replication Made Simple

Change Data Capture workloads present two unique challenges:

  • Many different partitions are frequently updated, making naive "last partition" optimization strategies ineffective.
  • Engineers need to monitor and fix commit conflicts when compaction jobs interfere with ongoing table updates.

Ryft monitors all partitions individually, triggering optimization only where changes occurred, while running specialized jobs to minimize commit conflicts and maintain table health.
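The idea of "triggering optimization only where changes occurred" can be sketched as a per-partition comparison between commit times and the last optimization time. The function name `partitions_needing_work` and the input shapes are assumptions for illustration; in practice this information would come from the table's snapshot and file metadata.

```python
def partitions_needing_work(commits, last_optimized_at):
    """Count unoptimized commits per partition.

    commits: iterable of (commit_time, touched_partitions) pairs.
    last_optimized_at: dict mapping partition -> time of its last optimization.
    Returns a dict of partition -> number of commits since that time.
    """
    pending = {}
    for commit_time, touched in commits:
        for part in touched:
            # Partitions that were never optimized count every commit.
            if commit_time > last_optimized_at.get(part, float("-inf")):
                pending[part] = pending.get(part, 0) + 1
    return pending


cdc_commits = [(10, ["a", "b"]), (20, ["b"]), (30, ["c"])]
pending = partitions_needing_work(cdc_commits, {"a": 15, "b": 5})
```

Here partition "a" is left alone because it was optimized after its only commit, while "b" and "c" are flagged regardless of whether they are the most recent partition. Keeping each job scoped to the flagged partitions also narrows the window in which it can collide with writers.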

Streaming Without the Headaches

Streaming workloads create two persistent problems: endless small files from frequent micro-batches, and commit conflicts between compaction and active writers.

Ryft’s engine detects streaming patterns and adapts its behavior to maintain read performance without interfering with data ingestion, ensuring your streaming pipelines stay healthy and performant.
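For the small-file side of the problem, a common general technique is bin-packing: group small files into batches that each add up to roughly one target-sized file. The sketch below uses first-fit decreasing and is only an illustration of that technique, not a description of Ryft's compaction algorithm. The name `plan_micro_merges` and the 128 MB and 32 MB defaults are assumptions.

```python
def plan_micro_merges(file_sizes_mb, target_mb=128, small_file_mb=32):
    """Group small files into merge batches using first-fit decreasing.

    Files at or above small_file_mb are left untouched. Each returned batch
    is a list of file sizes whose total does not exceed target_mb.
    """
    small = sorted((s for s in file_sizes_mb if s < small_file_mb), reverse=True)
    batches = []
    for size in small:
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
    # Rewriting a single file on its own gains nothing.
    return [batch for batch in batches if len(batch) > 1]


batches = plan_micro_merges([200, 30, 30, 30, 30, 30, 10])
```

Because each batch is small and independent, it can be committed separately, which keeps individual rewrites short and reduces the chance of conflicting with an active writer.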

Seamlessly Integrated with Compliance Rules

For tables requiring GDPR or CCPA compliance, and ongoing retention and tiering, Ryft’s engine coordinates optimization with data retention and compliance cleanup policies. This prevents job collisions and ensures all maintenance tasks are performed efficiently within a unified system.

Proven at Scale

Ryft Adaptive Optimization has been battle-tested in production with several large, multi-petabyte data lakes. The results consistently demonstrate:

  • Up to 5× faster queries on frequently accessed partitions through targeted compaction and sort rewrites.
  • ~7× higher compaction efficiency compared to other engines, delivering more performance per byte rewritten.
  • Up to 10× storage reduction through adaptive compression and comprehensive cleanup of snapshots, small files, metadata, and delete files.

Figure: Bytes processed by compaction for a 500GB streaming table ingested over a day.

Figure: Bytes scanned by queries from a table, before and after enabling Ryft Adaptive Optimization.

Ryft Adaptive Optimization is now available for everyone

Adaptive Optimization is generally available in Ryft for all of our customers. Unlock faster queries, predictable SLAs, and lower compute and storage costs, without the manual overhead. Contact us to enable Adaptive Optimization on your highest-value tables and see the impact in your environment.
