Differential Spectral Damping: Gap-Adaptive Regularization for Ill-Conditioned Kernel Methods

2026-06-22 · Machine Learning
AI summary

The authors address a problem in kernel methods such as LSTSVM: matrix inversion becomes unstable because the eigenvalues of the system matrix decay exponentially, and standard Tikhonov regularization damps every eigenvector equally regardless of how reliable it is. They introduce Differential Spectral Damping (DSD), which adapts the regularization penalty to each eigenvector's local spectral gap, using the Davis-Kahan theorem to judge reliability. In paired tests, DSD improves LSTSVM classification accuracy by 2.6 to 10.4 percentage points across the reported settings, including the GINA and Madelon datasets. On pre-image reconstruction it ties Tikhonov at high noise and slightly underperforms at lower noise. They also characterize the regime where DSD helps ($d \geq 100$, condition number $> 10^3$) and note where simpler methods suffice, giving practical guidance for its use.

Kernel methods, Least-Squares Twin Support Vector Machines (LSTSVM), Matrix inversion, Eigenvalue decay, Tikhonov regularization, Differential Spectral Damping (DSD), Davis-Kahan theorem, Spectral gaps, Regularization, Condition number
Authors
Praveg Vashishtha
Abstract
Kernel methods requiring matrix inversion -- particularly Least-Squares Twin Support Vector Machines (LSTSVM) -- suffer from exponential eigenvalue decay in their system matrices, producing severely ill-conditioned problems where standard Tikhonov regularization applies uniform damping regardless of eigenvector reliability. We propose Differential Spectral Damping (DSD), a regularization formula that adapts its penalty to localized eigengap structure: preserving eigenvectors with large spectral gaps (reliable per Davis-Kahan perturbation theory) while aggressively suppressing those with small gaps (directionally corrupted beyond recovery). We motivate DSD through a principled design procedure grounded in the Davis-Kahan $\sin\Theta$ theorem, systematically deriving the requirements for a reliability-aware damping function and selecting the exponential form for its smoothness, differentiability, and natural saturation properties. Through rigorous paired testing with fairly optimized baselines (including gradient-optimized Tikhonov receiving equal optimization opportunity), we demonstrate that DSD improves LSTSVM classification accuracy by +4.8 percentage points on real-world GINA ($d=970$, Cohen's $d = 4.49$, $p < 0.0001$), +10.4 percentage points at $d=200$, and +2.6 percentage points on Madelon ($d=500$) -- all using only principled spectral initialization while Tikhonov receives grid search. For pre-image reconstruction on manifold data, DSD ties Tikhonov at high perturbation noise ($p=0.99$) but slightly underperforms at lower noise levels; both reduce naive inversion error by $66\times$. We characterize the precise operating regime ($d \geq 100$, condition number $> 10^3$) and document where simpler methods suffice, providing practitioners with clear deployment guidance.
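
To make the contrast between uniform and gap-dependent damping concrete, here is a minimal Python sketch. The abstract does not state the DSD formula, so the penalty below is a hypothetical stand-in chosen only to match the described behaviour (exponential in the local eigengap, saturating at both ends); the names `local_gaps`, `gap_adaptive_penalties`, `spectral_solve` and the parameters `lam_min`, `lam_max`, `tau` are assumptions for illustration, not the paper's notation. The intuition it encodes is the standard Davis-Kahan one: an eigenvector's sensitivity to perturbation scales with the perturbation size divided by its eigengap, so small-gap directions are the least trustworthy.

```python
import numpy as np


def local_gaps(eigvals):
    """Distance from each eigenvalue to its nearest neighbour (input must be sorted)."""
    diffs = np.abs(np.diff(eigvals))
    left = np.concatenate(([np.inf], diffs))   # gap to the previous eigenvalue
    right = np.concatenate((diffs, [np.inf]))  # gap to the next eigenvalue
    return np.minimum(left, right)


def gap_adaptive_penalties(eigvals, lam_min, lam_max, tau):
    """Hypothetical per-eigenvalue penalty (NOT the paper's formula).

    Large gap -> penalty near lam_min (direction is preserved).
    Small gap -> penalty near lam_max (direction is suppressed).
    """
    gaps = local_gaps(eigvals)
    return lam_min + (lam_max - lam_min) * np.exp(-gaps / tau)


def spectral_solve(K, b, penalties):
    """Solve the regularized system mode by mode for symmetric K.

    penalties may be a scalar (uniform Tikhonov) or one value per eigenvalue,
    ordered like the ascending eigenvalues returned by np.linalg.eigh.
    """
    w, V = np.linalg.eigh(K)
    return V @ ((V.T @ b) / (w + penalties))


# Toy system with exponentially decaying eigenvalues (condition number 1e7).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
spectrum = 10.0 ** -np.arange(8)[::-1]   # ascending: 1e-7 ... 1
K = Q @ np.diag(spectrum) @ Q.T
K = (K + K.T) / 2                        # remove round-off asymmetry
b = rng.standard_normal(8)

w = np.linalg.eigvalsh(K)
pen = gap_adaptive_penalties(w, lam_min=1e-6, lam_max=1e-2, tau=1e-2)
x_tik = spectral_solve(K, b, 1e-3)   # uniform damping
x_gap = spectral_solve(K, b, pen)    # gap-dependent damping
```

With a scalar penalty, `spectral_solve` reproduces the ordinary Tikhonov solution of $(K + \lambda I)x = b$; with the per-eigenvalue vector, the well-separated leading direction is left almost undamped while the tightly clustered tail is damped close to `lam_max`. The values of `lam_min`, `lam_max`, and `tau` here are arbitrary and would need tuning in any real use.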