Mean-Shift PCA by Knockoff Mean
2026-05-25 • Machine Learning
AI summary
The authors show that PCA (a method for simplifying data) can be badly distorted when a small fraction of the data points has a shifted average. They find that deliberately adding an artificial "knockoff" mean shift helps identify and remove the real one. Using Random Matrix Theory, they prove that the important features of the original data stay stable under these shifts. Their method uses only standard PCA steps, which makes it simple to clean data affected by mean shifts.
PCA, mean shift, noise, Random Matrix Theory, eigenvalues, eigenspace, Robust PCA, high-dimensional data, covariance matrix, mixture model
Authors
Mengda Li, Zeng Li, Jianfeng Yao
Abstract
Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noise components from PCA by deliberately introducing a knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in the sample mean: a small fraction of samples from a shifted distribution can cause large deviations in the leading principal components. In high-dimensional regimes, existing Robust PCA approaches cannot handle the mean-shift contamination structure inherent in the mixture model. Using tools from Random Matrix Theory, we prove that the mean-shift spikes are spectrally separable from the stable eigenvalues of the original covariance. Furthermore, the original eigenspace remains asymptotically invariant to the contamination, independent of the mixture weight. Exploiting this spectral stability, we propose a simple two-stage PCA algorithm that adds a knockoff mean shift to identify and remove the mean-shift component using only standard PCA operations.
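The sensitivity described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' two-stage algorithm: it only shows how a shifted subpopulation can take over the leading principal component. It relies on the standard identity that a two-component mixture with common covariance Σ, mixture weight ε, and mean shift μ has population covariance Σ + ε(1 − ε)μμᵀ. The function names (`top_eigvec`, `simulate`), the dimensions, the spike strength, and the shift size are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def top_eigvec(X):
    """Return the leading eigenvector of the sample covariance of X (rows = samples)."""
    Xc = X - X.mean(axis=0)                 # center with the overall sample mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    _, eigvecs = np.linalg.eigh(cov)        # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]


def simulate(n=2000, p=50, eps=0.05, shift_norm=20.0, seed=0):
    """Compare the leading PC of clean data and of mean-shift contaminated data.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Clean data: identity covariance plus one signal spike along axis 0
    # (population variance 11 in that direction).
    scales = np.ones(p)
    scales[0] = np.sqrt(11.0)
    X_clean = rng.standard_normal((n, p)) * scales

    # Contamination: roughly a fraction eps of the rows is shifted by mu along axis 1.
    mu = np.zeros(p)
    mu[1] = shift_norm
    shifted = rng.random(n) < eps
    X_cont = X_clean + shifted[:, None] * mu

    v_clean = top_eigvec(X_clean)
    v_cont = top_eigvec(X_cont)
    return {
        "clean_align_signal": abs(v_clean[0]),  # clean top PC vs. signal axis
        "cont_align_signal": abs(v_cont[0]),    # contaminated top PC vs. signal axis
        "cont_align_shift": abs(v_cont[1]),     # contaminated top PC vs. shift axis
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

With these settings the population variance along the shift direction is 1 + 0.05 · 0.95 · 400 = 20, which exceeds the signal variance of 11, so the leading component of the contaminated data is expected to follow the shift direction rather than the signal, even though only about 5% of the samples are shifted.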