Decoupling Inference from State Updates in Low-Latency Feature Engines via Probabilistic Thinning
2026-06-15 • Databases
Databases • Machine Learning
AI summary
The authors study how streaming Machine Learning systems update persistent state when events arrive continuously. They propose probabilistic thinning: every event is still scored, but only informative events trigger updates to durable storage, which reduces the number of costly read-modify-write operations. The approach needs no cross-worker coordination and no high-frequency in-memory control plane, relying instead on approximate statistics retrieved from disk-backed key-value stores. In their experiments, up to 90% of events are excluded from the persistence path while downstream utility is preserved and in some cases improved.
Streaming data • Machine Learning pipelines • Probabilistic thinning • State persistence • Read-modify-write operations • Key-value stores • Time-based aggregations • Latency • Storage Input/Output • Serialization overhead
Authors
Augusto Peres, Iker Perez, Pedro Valdeira, Guilherme Jardim, Ana Sofia Gomes, Hugo Ferreira, Pedro Bizarro
Abstract
Streaming data systems increasingly underpin Machine Learning workflows that maintain large numbers of continuously updated aggregations. In production settings, each incoming event typically triggers read-modify-write operations to persistent storage, making high-frequency state updates a dominant source of latency, contention, and operational cost. In this work, we decouple inference from state persistence in streaming Machine Learning pipelines via probabilistic thinning: every event is scored, but durable state updates are selectively triggered by informative events. Unlike approaches that shed input or state, we show that persistence-path control is achievable without a high-frequency in-memory control plane or cross-worker coordination, relying exclusively on approximate statistics retrieved from disk-backed key-value stores. We model the resulting stochastic processes, derive bounds on filtering rates, and prove that common time-based aggregations remain unbiased under variance-aware formulations, preventing systemic error accumulation. We evaluate the approach in a controlled setting that isolates per-event costs, demonstrating substantial reductions in storage Input/Output and serialization overhead. Across experiments, up to 90% of events are excluded from the persistence path while preserving and in some cases improving downstream utility.
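The abstract does not spell out the thinning rule or the variance-aware estimators, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a standard inverse-probability weighting scheme: each event is persisted with some probability p and, when persisted, contributes value / p to a running sum, which keeps the expected aggregate equal to the true sum. The names `thinned_sum` and `keep_prob`, and the `floor` and `scale` parameters, are hypothetical.

```python
import random


def keep_prob(value, floor=0.1, scale=100.0):
    """Hypothetical keep rule: larger-magnitude events are more likely to persist.

    The floor keeps the probability away from zero so the weight 1 / p stays bounded.
    """
    return min(1.0, max(floor, abs(value) / scale))


def thinned_sum(values, prob_fn, rng):
    """Estimate sum(values) while persisting only a random subset of events.

    Returns the estimate and the number of simulated durable writes.
    """
    total = 0.0  # stands in for the durable aggregate
    writes = 0   # counts simulated read-modify-write operations
    for value in values:
        p = prob_fn(value)
        if rng.random() < p:
            # Inverse-probability weight: the expected contribution is value.
            total += value / p
            writes += 1
    return total, writes


if __name__ == "__main__":
    data_rng = random.Random(0)
    events = [data_rng.uniform(0, 100) for _ in range(1000)]
    estimate, writes = thinned_sum(events, keep_prob, random.Random(1))
    print(f"true sum: {sum(events):.1f}")
    print(f"estimate: {estimate:.1f} using {writes} of {len(events)} writes")
```

In this sketch the estimate is unbiased for any keep rule whose probability is strictly positive, but its variance grows as the probabilities shrink, which is the trade-off a variance-aware formulation would have to control.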