AI summary
The authors present AgentPLM, a method that improves protein language models by letting them check their work during sequence generation with tools that predict protein structure and function. They introduce two key ideas: Reasoning-Augmented Decoding, which interleaves sequence generation with calls to these tools, and Contrastive Agent Policy Optimisation, a training method that helps the model learn when tool feedback is worth acting on. AgentPLM was tested on a range of protein design tasks and outperformed previous models, especially in antibody design, showing that it can correct its mistakes on the fly without starting over. Protein generation becomes more accurate because feedback about the protein's physical properties is incorporated as the sequence is created.
Protein language models, Autoregressive generation, Enzyme design, Antibody optimisation, Thermostability, Protein-protein interaction (PPI), Oracle feedback, Contrastive learning, FoldX, AutoDock Vina
Authors
Sahil Rahman, Maxx Richard Rahman
Abstract
Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this by equipping a pre-trained PLM with i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX, AutoDock Vina), and ii) Contrastive Agent Policy Optimisation (CAPO), a trajectory-level extension of direct preference optimisation that trains the policy end-to-end to learn when oracle feedback is informative rather than merely imitating high-fitness sequences. We evaluate AgentPLM on benchmark tasks spanning de novo enzyme design, antibody optimisation, thermostability, PPI interface design, and zero-shot fitness prediction with standardised oracle APIs and controlled sequence-identity splits. AgentPLM achieves state-of-the-art results with a gain in antibody top-10% hit rate over the strongest passive baseline, providing mechanistic evidence of online error correction without explicit backtracking.
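To make the two components concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not say how often RAD queries its tools, how feedback enters the context, or what the exact CAPO objective is. Here `toy_oracle` is a hypothetical stand-in for a tool such as ESMFold, FoldX, or AutoDock Vina, `sample_residue` is a hypothetical toy policy, and `trajectory_dpo_loss` is the standard direct preference optimisation loss applied to summed per-step log-probabilities of a preferred and a dispreferred trajectory, which may differ from CAPO.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
HYDROPHOBIC = set("AILMFVW")


def toy_oracle(seq: str) -> float:
    """Hypothetical oracle: fraction of hydrophobic residues.

    This is NOT a real biophysical score; it only stands in for a tool call.
    """
    if not seq:
        return 0.0
    return sum(ch in HYDROPHOBIC for ch in seq) / len(seq)


def sample_residue(rng: random.Random, feedback: Optional[float]) -> str:
    """Toy policy (assumption): when the latest oracle score is low,
    up-weight hydrophobic residues; otherwise sample uniformly."""
    steer = feedback is not None and feedback < 0.5
    weights = [3.0 if (steer and aa in HYDROPHOBIC) else 1.0 for aa in AMINO_ACIDS]
    return rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]


def rad_decode(
    length: int,
    oracle: Callable[[str], float],
    check_every: int = 5,
    seed: int = 0,
) -> Tuple[str, List[Tuple[int, float]]]:
    """Interleave left-to-right generation with periodic oracle calls.

    The prefix is never rewritten: feedback only conditions later steps,
    mirroring the idea of correction without explicit backtracking.
    """
    rng = random.Random(seed)
    seq: List[str] = []
    trace: List[Tuple[int, float]] = []  # (position, oracle score) per tool call
    feedback: Optional[float] = None
    for pos in range(1, length + 1):
        seq.append(sample_residue(rng, feedback))
        if pos % check_every == 0:
            feedback = oracle("".join(seq))
            trace.append((pos, feedback))
    return "".join(seq), trace


def trajectory_dpo_loss(
    logp_w: List[float],
    ref_logp_w: List[float],
    logp_l: List[float],
    ref_logp_l: List[float],
    beta: float = 0.1,
) -> float:
    """Standard DPO loss on whole trajectories (assumed form, not CAPO itself).

    Inputs are per-step log-probabilities under the policy and a frozen
    reference model for the preferred (w) and dispreferred (l) trajectory.
    """
    margin = beta * (
        (sum(logp_w) - sum(ref_logp_w)) - (sum(logp_l) - sum(ref_logp_l))
    )
    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Calling `rad_decode(20, toy_oracle)` returns a 20-residue sequence together with the oracle score recorded at each checkpoint. In a real system the oracle score would more plausibly be serialised into the model's context than used to reweight sampling directly; the reweighting here only keeps the sketch self-contained.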