From Weights to Activations: Is Steering the Next Frontier of Adaptation?

2026-04-15 Computation and Language

AI summary

The authors explain that language models are usually adapted after training either by updating their parameters or by changing their inputs. They focus on a newer method called steering, which modifies the model's internal activations at inference time without changing its parameters. The authors argue that steering is a form of adaptation and compare it with established methods using a set of functional criteria. They highlight that steering enables local, reversible changes in behavior without parameter updates, and that this framing motivates a unified taxonomy of adaptation methods.

language models, post-training adaptation, fine-tuning, parameter-efficient adaptation, prompting, steering, internal activations, inference time, model parameters, adaptation methods
Authors
Simon Ostermann, Daniil Gurgurov, Tanja Baeumel, Michael A. Hedderich, Sebastian Lapuschkin, Wojciech Samek, Vera Schmitt
Abstract
Post-training adaptation of language models is commonly achieved through parameter updates or input-based methods such as fine-tuning, parameter-efficient adaptation, and prompting. In parallel, a growing body of work modifies internal activations at inference time to influence model behavior, an approach known as steering. Despite increasing use, steering is rarely analyzed within the same conceptual framework as established adaptation methods. In this work, we argue that steering should be regarded as a form of model adaptation. We introduce a set of functional criteria for adaptation methods and use them to compare steering approaches with classical alternatives. This analysis positions steering as a distinct adaptation paradigm based on targeted interventions in activation space, enabling local and reversible behavioral change without parameter updates. The resulting framing clarifies how steering relates to existing methods, motivating a unified taxonomy for model adaptation.
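
To make the idea of an activation-space intervention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' method: the two-layer toy network, the class name `TinyModel`, the `steering` and `alpha` arguments, and the choice to add the vector after the ReLU are all hypothetical. It only shows the general pattern the abstract describes: a vector is added to a hidden activation at inference time, the weights stay untouched, and dropping the vector restores the original behavior.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class TinyModel:
    """Hypothetical two-layer network standing in for a language model."""

    def __init__(self, W1: Matrix, W2: Matrix) -> None:
        self.W1 = W1  # frozen weights: never modified by steering
        self.W2 = W2

    def forward(
        self,
        x: Vector,
        steering: Optional[Vector] = None,  # assumed steering vector
        alpha: float = 1.0,                 # assumed steering strength
    ) -> Vector:
        h = relu(matvec(self.W1, x))
        if steering is not None:
            # Intervention in activation space: shift the hidden state.
            h = [hi + alpha * si for hi, si in zip(h, steering)]
        return matvec(self.W2, h)


model = TinyModel(W1=[[1.0, 0.0], [0.0, 1.0]], W2=[[1.0, -1.0]])
x = [1.0, 2.0]

baseline = model.forward(x)                       # no intervention
steered = model.forward(x, steering=[3.0, 0.0])   # shifted hidden state
restored = model.forward(x)                       # intervention removed
```

In this toy setting the steered output differs from the baseline, while `restored` matches `baseline` and `model.W1` and `model.W2` are unchanged, which mirrors the "reversible, no parameter updates" property claimed for steering. In a real framework the same pattern would typically be implemented with a forward hook on a chosen layer rather than an extra argument to `forward`.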