OpenZL: Using Graphs to Compress Smaller and Faster

2026-05-11 · Information Retrieval

Information Retrieval · Databases
AI summary

The authors explain that while recent research has improved lossless compression ratios, these methods are often too slow and resource-heavy for production use. They introduce a new approach called the "graph model" of compression, which represents a compressor as a directed acyclic graph of simple, modular building blocks called codecs. Their system, OpenZL, can quickly produce specialized compressors that work well on specific types of data while remaining easy to develop and maintain. In experiments, OpenZL achieves better compression ratios and speeds than leading general-purpose compressors on a range of real-world datasets, and deployments at Meta have already improved size and/or speed while cutting development time from months to days.

lossless compression, throughput, application-specific compressors, directed acyclic graph, codecs, self-describing wire format, OpenZL, deep learning compressors, data-intensive applications, compression ratio
Authors
Yann Collet, Nick Terrell, W. Felix Handte, Danielle Rozenblit, Victor Zhang, Kevin Zhang, Yaelle Goldschlag, Jennifer Lee, Elliot Gorokhovsky, Yonatan Komornik, Daniel Riegel, Stan Angelov, Nadav Rotem
Abstract
In the last few decades, research techniques have improved lossless compression ratios at the cost of significantly increased processing time. However, these techniques have not gained popularity in industry because production systems require high throughput and low resource utilization. Instead, real-world improvements in compression are increasingly realized by building application-specific compressors, which can exploit knowledge about the structure and semantics of the data being compressed. Application-specific compressor systems outperform even the best generic compressors, but these techniques have severe drawbacks: they are inherently limited in applicability, are hard to develop, and are difficult to maintain and deploy. In this work, we show that these challenges can be overcome with a new compression strategy. We propose the "graph model" of compression, a new theoretical framework for representing compression as a directed acyclic graph of modular codecs. Our system, OpenZL, implements this framework and compresses data into a self-describing wire format, any configuration of which can be decompressed by a universal decoder. OpenZL's design enables rapid development of application-specific compressors with minimal code. Experimental results demonstrate that OpenZL achieves superior compression ratios and speeds compared to state-of-the-art general-purpose compressors on a variety of real-world datasets. Compared to ratio-focused deep-learning compressors, OpenZL is competitive on ratio while being many orders of magnitude faster. Internal deployments at Meta have also shown consistent improvements in size and/or speed, with development timelines reduced from months to days. OpenZL thus represents a significant advance in practical, scalable, and maintainable data compression for modern data-intensive applications.