ProbeScale: Probing Analysis to Optimize Neural Scaling Laws for Efficient Small Language Model Inference
2026-06-01 • Computation and Language
Computation and Language • Artificial Intelligence • Machine Learning
AI summary
The authors introduce ProbScale, a method for finding smaller, efficient subnetworks inside pre-trained small language models that still work well for specific tasks. They combine model probing techniques with knowledge of how model performance scales with size to decide which layers are most useful. By selecting these important layers, they create smaller subnetworks that keep almost the same performance as the full models while using far fewer parameters. Tests on popular models such as RoBERTa-Large and T5-Base show that ProbScale can reduce model size by 5 to 10 times while maintaining 95% to 98% of the original task performance.
Small Language Models • Neural Scaling Laws • Model Probing • Subnetwork Selection • Parameter Efficiency • RoBERTa • T5 • Layer Importance • Downstream Tasks • Performance Trade-off
Authors
Sourav Das
Abstract
Small Language Models (SLMs) offer a balance between capability and computational feasibility. Neural scaling laws inform their optimal training, suggesting that they possess rich internal representations that scale with their size. However, deploying even these SLMs can be challenging under strict resource constraints. Language model probing provides methods for analyzing the linguistic knowledge encoded in a model's internals. We propose ProbScale, a framework that unifies insights from scaling laws and probing to identify parameter-efficient subnetworks within pre-trained SLMs. ProbScale utilizes the high-quality representations of well-scaled SLMs and uses task-specific probes to mathematically quantify the relevance of each layer for target downstream capabilities. This allows selecting subnetworks that optimally trade off performance against parameter size. We formulate the subnetwork selection as finding a layer subset maximizing aggregated, task-weighted probe performance under a parameter budget. Experiments on representative SLMs such as RoBERTa-Large and T5-Base demonstrate that ProbScale identifies subnetworks achieving significant parameter reduction, from 5 to 10 times, while maintaining high performance (95% to 98% of the original SLMs) on targeted tasks, outperforming heuristic baselines.
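The selection step described in the abstract (choose a layer subset that maximizes aggregated, task-weighted probe performance under a parameter budget) can be illustrated with a small sketch. The abstract does not specify the solver or the aggregation rule, so everything below is an assumption made for illustration: layer scores are treated as additive, aggregation is a weighted sum, the optimization is solved as a 0/1 knapsack, and the names `layer_scores` and `select_layers` are hypothetical. The probe accuracies and parameter counts are toy values, not results from the paper.

```python
def layer_scores(probe_acc, task_weights):
    """Aggregate per-task probe accuracies into one score per layer.

    Assumption: aggregation is a plain weighted sum over tasks.
    probe_acc maps task name -> list of probe accuracies, one per layer.
    """
    n_layers = len(next(iter(probe_acc.values())))
    return [
        sum(task_weights[task] * probe_acc[task][i] for task in probe_acc)
        for i in range(n_layers)
    ]


def select_layers(scores, params, budget):
    """Pick the layer subset with the highest total score within the budget.

    Assumption: a layer's contribution is additive, so this is a standard
    0/1 knapsack. params and budget are integers in the same unit
    (for example, millions of parameters).
    """
    # best[b] = (total score, chosen layer indices) using at most b units
    best = [(0.0, [])] * (budget + 1)
    for i, (score, cost) in enumerate(zip(scores, params)):
        # Iterate budgets downward so each layer is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + score
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [i])
    total, chosen = best[budget]
    return chosen, total


# Toy inputs (illustrative only): two tasks, four layers.
probe_acc = {
    "ner": [0.2, 0.5, 0.9, 0.6],
    "sentiment": [0.1, 0.4, 0.6, 0.8],
}
task_weights = {"ner": 0.5, "sentiment": 0.5}
params = [2, 3, 4, 3]  # hypothetical per-layer parameter counts
budget = 7             # hypothetical parameter budget, same unit as params

scores = layer_scores(probe_acc, task_weights)
chosen, total = select_layers(scores, params, budget)
print(chosen, round(total, 2))
```

With these toy inputs the sketch keeps layers 2 and 3, the two layers whose probes score highest for the weighted tasks, because together they fit the budget exactly. A real pipeline would also need to handle dependencies between layers, which the additive assumption ignores.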