Learning Filters with Certainty

2026-06-22 • Artificial Intelligence

Artificial IntelligenceDistributed, Parallel, and Cluster ComputingPerformance

AI summaryⓘ

The authors explain that common data tools called Bloom filters usually tell you if something is probably in a set or definitely not, but they don't show how sure they are about it. They focus on a special kind called Counting Bloom Filters, which keep numbers instead of just yes/no bits. These numbers help estimate how confident the filter is when it says an item is in the set. The authors suggest using this confidence information to improve systems combining Bloom filters with machine learning models.

Bloom filterCounting Bloom Filterhash collisionsmembership testingfalse negativescountersmachine learningdata structurescertainty estimation

Authors

Yuval Banoun, Daniel Sadoc Menasche, Ori Rottenstreich

Abstract

Hash-based data structures such as Bloom filters are widely used in network systems for tasks including caching, anomaly detection, and machine learning pipelines. They typically provide binary indications of whether an element belongs to a set of interest, e.g., the contents of a cache. When uncertainty arises due to hash collisions, a positive indication is returned to avoid false negatives. We argue that the certainty associated with such indications can itself be useful information. This work focuses on Counting Bloom Filters (CBFs), a Bloom-filter variant that maintains counters rather than bits. Besides supporting insertions and deletions, these counters provide additional information that can be used to estimate the certainty of positive membership indications. We show how this certainty signal can be exploited in architectures that combine Bloom Filters with machine learning (ML) models.

View PDFOpen arXiv