Query-Limited Community Recovery in Stochastic Block Models

2026-06-01Information Theory

Information TheoryMachine LearningSocial and Information Networks
AI summary

The authors study how to perfectly find two communities in a network when they have limited and noisy information. They consider a setup where you can ask about neighbors of nodes, but these answers are incomplete and you can only ask a limited number of times. They find that just querying uniformly is not the best strategy and show an adaptive two-step method that uses fewer questions to succeed. When combined with a partial view of the network, adaptive queries focused on unclear parts can recover communities better than uniform queries, proving that smart data collection helps in exact recovery.

Stochastic Block ModelCommunity DetectionExact RecoveryNoisy OracleAdaptive QueryingUniform QueryingGraph SamplingInformation-Theoretic LimitsTwo-Stage StrategySublinear Queries
Authors
Sabyasachi Basu, Manuj Mukherjee, Lutz Oettershagen, Suhas Thejaswi
Abstract
We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.