TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

2026-06-22 • Machine Learning

Machine Learning, Cryptography and Security

AI summary

The authors introduce TROPT, an open-source tool that makes it easier to find specific text sequences that guide models toward certain goals. These text sequences are important for tasks like testing model vulnerabilities and understanding how models work. TROPT brings together many optimization methods under one easy-to-use system, allowing users to mix and match models, objectives, and optimizers. The authors show how TROPT helps compare different techniques and apply them to new problems, lowering the difficulty of working with discrete text optimization.

discrete optimization, text-trigger, model red-teaming, LLM jailbreak, model interpretability, optimization algorithms, black-box optimization, white-box optimization, loss functions, corpus poisoning

Authors

Matan Ben-Tov, Mahmood Sharif

Abstract

Discrete text-trigger optimization -- searching for text sequences that, when ingested by a model, steer it toward a specified objective -- underpins model red-teaming (e.g., LLM jailbreaks), as well as auditing and interpretability. However, the current state of discrete optimizers hinders their adoption and progress. First, existing optimizers, when open-sourced at all, are scattered across research codebases tied to specific models, objectives, and problem domains. Second, optimizer variants proliferate, each requiring engineering overhead to use or extend, and remaining hard to compare head-to-head. Together, these raise the bar for adopting optimizers in existing or new domains, and for advancing them via new strategies. We address these gaps with TROPT, the first open-source framework that unifies discrete optimizers' execution and standardizes their development under a single interface. TROPT makes it easy to customize end-to-end optimization recipes by swapping any component -- models, objectives, and optimizers -- extending its reach across domains and new applications. TROPT currently ships with 30+ optimization recipes -- covering applications such as jailbreaking and probing model internals -- built from 15+ optimizers (spanning white-box to black-box access) and 15+ losses, from foundational to state-of-the-art methods. Demonstrating its utility, we leverage TROPT in several studies: (i) controlled, large-scale experiments comparing and enhancing optimization strategies for LLM jailbreaks, revealing potent-yet-underadopted techniques; and (ii) porting optimizers from one domain (e.g., LLM jailbreaks) to new domains (e.g., corpus poisoning against embedding models). In all, TROPT significantly lowers the barrier to adopting and advancing discrete text optimization.
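To make the idea of swappable components concrete, the following is a minimal sketch of a discrete trigger optimizer written against a pluggable loss. It is not TROPT's API: every name here (`Optimizer`, `RandomSearch`, `make_target_loss`, `optimize`) is hypothetical, and the toy loss simply counts mismatches against a fixed target instead of querying a real model. The optimizer is a plain black-box random search, chosen only because it needs loss values and no gradients.

```python
import random
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence

# Hypothetical names throughout; this does NOT reflect TROPT's actual interface.

# A loss maps a trigger (list of tokens) to a score; lower is better.
Loss = Callable[[Sequence[str]], float]


class Optimizer(Protocol):
    """Anything with a compatible `run` method can be plugged in."""

    def run(self, loss: Loss, init: Sequence[str], steps: int) -> list[str]: ...


@dataclass
class RandomSearch:
    """Black-box optimizer: it only queries loss values, never gradients."""

    vocab: Sequence[str]
    seed: int = 0

    def run(self, loss: Loss, init: Sequence[str], steps: int) -> list[str]:
        rng = random.Random(self.seed)
        best = list(init)
        best_loss = loss(best)
        for _ in range(steps):
            # Propose a single-token substitution at a random position.
            cand = list(best)
            cand[rng.randrange(len(cand))] = rng.choice(self.vocab)
            cand_loss = loss(cand)
            # Keep the candidate only if it strictly improves the loss.
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
        return best


def make_target_loss(target: Sequence[str]) -> Loss:
    """Toy objective standing in for a model-based loss:
    the number of positions where the trigger differs from `target`."""

    def loss(trigger: Sequence[str]) -> float:
        return float(sum(a != b for a, b in zip(trigger, target)))

    return loss


def optimize(optimizer: Optimizer, loss: Loss, init: Sequence[str], steps: int = 500) -> list[str]:
    """Run any optimizer against any loss; the two are independent of each other."""
    return optimizer.run(loss, init, steps)


if __name__ == "__main__":
    loss = make_target_loss(["b", "d", "a"])
    trigger = optimize(RandomSearch(vocab=["a", "b", "c", "d"]), loss, ["a", "a", "a"])
    print(trigger, loss(trigger))
```

Under these assumptions, replacing `make_target_loss` with a function that scores a model's response, or replacing `RandomSearch` with a gradient-guided (white-box) search, would leave `optimize` unchanged, which is the kind of component swapping the abstract describes.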
