The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

2026-05-11

Machine Learning
AI summary

The authors investigate how temporally correlated data, which changes slowly over time, can make certain hard sparse learning problems tractable for gradient-based methods. They focus on Boolean k-juntas, which are known to be hard to learn from independent uniform samples. When the samples are instead generated by a lazy random walk on the hypercube, they show that a simple two-layer neural network, trained with a temporal-difference loss that compares changes between consecutive samples, learns efficiently with a number of samples growing roughly linearly in the input dimension. They also show that large-batch gradient methods using standard pointwise loss functions do not gain this benefit from the temporal correlations.

Boolean k-junta, sparse learning, gradient-based methods, lazy random walk, hypercube, two-layer ReLU network, stochastic gradient descent (SGD), temporal-difference loss, sample complexity, convex loss functions
Authors
Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel
Abstract
We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
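To make the setting concrete, here is a minimal, hypothetical sketch (not the authors' code) of the ingredients described in the abstract: samples generated by one common version of a lazy random walk on the hypercube {-1, +1}^d, a k-junta target (parity of k hidden coordinates is used purely as an illustration), and a two-layer ReLU network trained by SGD on a temporal-difference loss that compares target and predicted increments across consecutive samples. All dimensions, widths, and step counts are placeholder values.

```python
# Illustrative sketch only: lazy random walk samples, a k-junta target,
# and SGD on a temporal-difference (increment-matching) loss.
import torch

d, k = 100, 3                      # ambient dimension and junta size (illustrative)
hidden = 256                       # hidden width of the two-layer ReLU network (assumption)
steps, lr = 50_000, 1e-2

torch.manual_seed(0)
junta = torch.randperm(d)[:k]      # hidden set of relevant coordinates

def target(x):
    # Example k-junta: parity of the k relevant coordinates (one canonical choice).
    return x[junta].prod()

def lazy_step(x):
    # Lazy random walk: with probability 1/2 stay put, otherwise flip one
    # uniformly chosen coordinate (one standard definition, assumed here).
    x = x.clone()
    if torch.rand(()) < 0.5:
        i = torch.randint(d, ())
        x[i] = -x[i]
    return x

net = torch.nn.Sequential(torch.nn.Linear(d, hidden), torch.nn.ReLU(),
                          torch.nn.Linear(hidden, 1))
opt = torch.optim.SGD(net.parameters(), lr=lr)

x_prev = torch.randint(0, 2, (d,)).float() * 2 - 1   # uniform starting point
for t in range(steps):
    x_next = lazy_step(x_prev)
    # Temporal-difference loss: match the increment of the target against
    # the increment of the network's prediction across consecutive samples.
    target_inc = target(x_next) - target(x_prev)
    pred_inc = net(x_next) - net(x_prev)
    loss = (pred_inc - target_inc).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    x_prev = x_next
```

Because consecutive samples differ in at most one coordinate, the target increment is nonzero only when a relevant coordinate flips, which is the kind of signal the increment-based loss can exploit; a standard pointwise loss on each sample would not see this difference.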