Ranked MSO-enumeration over compressed words

2026-06-02Data Structures and Algorithms

Data Structures and AlgorithmsDatabasesLogic in Computer Science
AI summary

The authors show that for certain logical queries on strings, you can quickly list answers in a specific order even when the string is stored in a compressed form using a special grammar method. Their method preprocesses the compressed string in linear time and then outputs each answer with a constant delay. This is the first time ranking queries on compressed data have been handled this efficiently. They also extend the result to special string-to-string functions called polyregular functions. Their proofs use a tool called factorization trees, adapted to work with compressed strings.

MSO-queryranked query enumerationgrammar compressionstraight-line programcontext-free grammarpolyregular functionlinear preprocessingconstant delayfactorization treescompressed data
Authors
Markus Lohrey
Abstract
It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a so-called straight-line program, i.e., a context-free grammar that produces exactly one string. Moreover, `ranked' means that the output tuples of the MSO-query are printed in a specific order that has to be MSO-definable. This is the first result for ranked query enumeration on compressed data. A corollary of this result is that for a fixed polyregular function $f$ and a word $w$ that is given by a straight-line program of size $n$, one can list after preprocessing time $\mathcal{O}(n)$ the symbols in $f(w)$ from left to right with constant delay, which generalizes a result of Bojanczyk for the case where $w$ is uncompressed. The proofs for these results are based on factorization trees, which are made accessible to the grammar-compressed setting (a contribution of independent interest).