Training Prompt Matters: State-Adaptive Optimization for Robust Fine-Tuning

2026-06-01 • Computation and Language

AI summary

The authors found that the exact wording of training prompts matters more than previously assumed. Prompts that mean the same thing give similar performance on the task being trained, yet they can differ sharply in how much the language model forgets old tasks and how well it generalizes to new ones. Some prompts are consistently better across tasks, and they can be identified from their task loss before training begins. Based on this, the authors created a method called SAPO that adapts the training prompt to the model's current state instead of keeping it fixed. Experiments on diverse benchmarks show that SAPO reduces forgetting and improves generalization compared with state-of-the-art methods.

Large Language Models, Prompt Engineering, Fine-tuning, Catastrophic Forgetting, Generalization, Task Loss, Prompt Optimization, Dynamic Task Formulation, State-Adaptive Learning

Authors

Wenhang Shi, Yiren Chen, Shuqing Bian, Zhe Zhao, Jinhao Dong, Pengfei Hu, Wei Lu, Xiaoyong Du

Abstract

While prompt engineering is instrumental in maximizing the capabilities of Large Language Models (LLMs) during inference, the role of prompts during training remains critically underexplored. Prevailing fine-tuning paradigms typically treat training prompts as mere surface forms, assuming that semantically equivalent instructions yield identical learning outcomes. However, we reveal that this equivalence is deceptive: while paraphrased prompts often lead to comparable in-task performance, they induce drastically different cross-task impacts regarding catastrophic forgetting and generalization. Crucially, these impacts are positively correlated across tasks, indicating the existence of superior prompts that consistently yield better performance. Furthermore, we discover that these superior prompts can be robustly identified by task loss prior to learning. Leveraging these insights, we introduce State-Adaptive Prompt Optimization (SAPO), a lightweight yet effective training strategy that shifts task formulation from a static input to a dynamic, state-adaptive variable. Comprehensive experiments on diverse benchmarks confirm its effectiveness: SAPO significantly mitigates forgetting while improving generalization, achieving substantial performance gains over state-of-the-art methods. These results provide insights into how training prompts shape learning dynamics and offer a practical recipe for robust fine-tuning. Our code is available at https://github.com/Eric8932/SAPO.
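
The idea of choosing a training prompt from the model's current state can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). The abstract says only that superior prompts can be identified by task loss prior to learning; the selection rule used here (pick the paraphrase with the lowest loss), the re-selection interval, and the names `select_prompt`, `state_adaptive_schedule`, and `loss_at_step` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def select_prompt(candidates: Sequence[str],
                  task_loss: Callable[[str], float]) -> str:
    """Return the candidate prompt with the lowest task loss.

    ASSUMPTION: "lowest loss wins" is an illustrative rule; the abstract
    does not state which direction of task loss marks a superior prompt.
    """
    if not candidates:
        raise ValueError("need at least one candidate prompt")
    return min(candidates, key=task_loss)


def state_adaptive_schedule(candidates: Sequence[str],
                            loss_at_step: Callable[[int, str], float],
                            num_steps: int,
                            reselect_every: int) -> List[str]:
    """Return the prompt used at each training step.

    `loss_at_step(step, prompt)` is a hypothetical hook standing in for
    "evaluate the task loss of this prompt under the model's state at
    this step". The prompt is re-selected every `reselect_every` steps,
    so it acts as a state-dependent variable rather than a fixed input.
    """
    if reselect_every < 1:
        raise ValueError("reselect_every must be at least 1")
    schedule: List[str] = []
    current = None
    for step in range(num_steps):
        if step % reselect_every == 0:
            # Re-evaluate every paraphrase against the current model state.
            current = select_prompt(
                candidates, lambda p, s=step: loss_at_step(s, p)
            )
        schedule.append(current)
    return schedule


# Toy stand-in for a model: the loss of prompt "A" falls as training
# proceeds, while the loss of prompt "B" stays flat.
def toy_loss(step: int, prompt: str) -> float:
    return {"A": 1.0 - 0.3 * step, "B": 0.5}[prompt]


schedule = state_adaptive_schedule(["A", "B"], toy_loss,
                                   num_steps=4, reselect_every=2)
```

In this toy run the schedule starts with prompt "B" and switches to "A" at step 2, when the loss of "A" falls below that of "B". In a real fine-tuning loop, `loss_at_step` would be replaced by a forward pass of the model on a small batch formatted with each candidate prompt.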
