ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design
2026-05-11 • Machine Learning
Machine Learning • Artificial Intelligence
AI summary
The authors focus on designing proteins with specific functions by improving how protein language models (PLMs) can be steered toward multiple goals without losing their ability to generate high-quality protein sequences. They introduce ProteinOPD, a method that distills the knowledge of several specialized teacher models into a single student model, avoiding the forgetting of pretrained knowledge. The approach balances different design preferences efficiently and trains faster than previous methods based on reinforcement learning. Experiments show that it works well without sacrificing the quality of protein designs.
protein language model • preference alignment • catastrophic forgetting • on-policy distillation • synthetic biology • drug discovery • multi-objective optimization • reinforcement learning • model distillation • protein design
Authors
Yulin Zhang, He Cao, Zihao Jiang, Chenyi Zi, Zhipeng Zhou, Zijing Liu, Yu Li, Jia Li, Ziqi Gao
Abstract
Designing proteins with desired functions or properties is a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, existing alignment methods often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability and failing to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method known for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that effectively balances multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicting objectives. This bridges the gap for OPD in multi-objective, multi-teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising designability, with an 8x training speedup over RL-based alignment competitors.
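To make the distillation objective concrete, below is a minimal PyTorch-style sketch of token-level on-policy distillation against a normalized geometric consensus of weighted teachers, following the description in the abstract. All names (`consensus_log_probs`, `opd_loss`), tensor shapes, and the use of reverse KL as the mode-seeking divergence are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def consensus_log_probs(teacher_logits, weights):
    """Normalized geometric consensus of weighted teacher distributions.

    teacher_logits: list of [batch, seq_len, vocab] tensors, one per teacher.
    weights: preference weights (assumed to sum to 1), one per teacher.
    """
    # A weighted geometric mean of probabilities is a weighted sum of
    # log-probabilities, renormalized over the vocabulary.
    mixed = sum(w * F.log_softmax(logits, dim=-1)
                for w, logits in zip(weights, teacher_logits))
    return mixed - torch.logsumexp(mixed, dim=-1, keepdim=True)

def opd_loss(student_logits, teacher_logits, weights):
    """Token-level reverse KL, KL(student || consensus), evaluated on
    sequences sampled from the student itself (on-policy). Reverse KL is
    mode-seeking, which is the property credited with mitigating
    catastrophic forgetting."""
    student_lp = F.log_softmax(student_logits, dim=-1)
    target_lp = consensus_log_probs(teacher_logits, weights)
    kl = (student_lp.exp() * (student_lp - target_lp)).sum(dim=-1)
    return kl.mean()

# Hypothetical usage: 3 preference-specific teachers, a batch of 4
# student-sampled sequences of length 128 over a 20-letter amino-acid
# vocabulary (logits here are random placeholders).
teachers = [torch.randn(4, 128, 20) for _ in range(3)]
student = torch.randn(4, 128, 20, requires_grad=True)
loss = opd_loss(student, teachers, weights=[0.5, 0.3, 0.2])
loss.backward()
```

In the full method, `student_logits` would come from re-scoring trajectories sampled from the current student policy, and the weights would encode the relative importance of each preference objective.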