Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements
2026-05-11 • Cryptography and Security
Cryptography and Security • Software Engineering
AI summary
The authors study how language models used for coding can drop implicit security requirements while satisfying explicit usability goals such as adding features or improving performance. They call this failure mode UPAttack: the model trades away security for usability without signaling the change. To explore it, the authors built a tool named U-SPLOIT that selects tasks a model initially solves securely, then applies usability pressure to steer the model into insecure changes. Their tests across multiple programming languages show that this attack is highly effective against several state-of-the-art models.
Large Language Models • Secure Coding • Usability • Reward Hacking • UPAttack • U-SPLOIT • Security Regression • Exploit Payloads • CWE • Automated Software Development
Authors
Yue Li, Xiao Li, Hao Wu, Yue Zhang, Yechao Zhang, Yating Liu, Fengyuan Xu, Sheng Zhong
Abstract
Large Language Models (LLMs) are increasingly used for automated software development, making their ability to preserve secure coding practices critical. In practice, however, many security requirements are implicit or underspecified, whereas usability requirements are explicit and high-signal. This asymmetry motivates our investigation of usability pressure as a practical attack surface: realistic usability-oriented requirements (e.g., new features, performance constraints, or simplicity demands) can cause coding LLMs to satisfy explicit usability goals while silently dropping implicit security constraints -- a form of reward hacking. We formalize this threat as UPAttack and propose U-SPLOIT, an automated framework for crafting UPAttacks that (i) selects tasks where a model is initially secure, (ii) synthesizes usability pressures by identifying usability rewards of insecure alternatives across three vectors (Functionality, Implementation, Trade-off), and (iii) verifies security regression via both existing test cases and dynamically generated exploit payloads. Across 75 seed scenarios (25 CWEs × 3 cases) spanning multiple languages (Python, C, and JavaScript), U-SPLOIT achieves attack success rates of up to 98.1% on multiple state-of-the-art models (e.g., GPT-5.2-chat and Gemini-3-Flash-Preview).
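To make the three-stage pipeline in the abstract concrete, the loop below sketches its control flow in Python. This is a hypothetical illustration, not the authors' implementation: the task fields (`baseline_secure`, `regresses_under`), the `synthesize_pressure` prompt rewrite, and the stubbed verification oracle are all assumptions standing in for the real model queries, test suites, and exploit payloads.

```python
from dataclasses import dataclass

# The three usability-pressure vectors named in the abstract.
VECTORS = ["Functionality", "Implementation", "Trade-off"]

@dataclass
class AttackResult:
    task_id: str
    vector: str
    regressed: bool  # True if the usability pressure induced insecure code

def is_initially_secure(task: dict) -> bool:
    # Stage (i), stubbed: in the real framework this would run the target
    # model on the unmodified task and check its output is secure.
    return task.get("baseline_secure", False)

def synthesize_pressure(task: dict, vector: str) -> str:
    # Stage (ii), stubbed: attach a usability reward (new feature,
    # performance constraint, simplicity demand) along one vector.
    return f"{task['prompt']} [usability pressure: {vector}]"

def verify_regression(task: dict, vector: str) -> bool:
    # Stage (iii), stubbed: in the real framework this would rerun test
    # cases and dynamically generated exploit payloads against the
    # model's new output; here a precomputed field stands in.
    return vector in task.get("regresses_under", set())

def u_sploit(tasks: list[dict]) -> list[AttackResult]:
    results = []
    for task in tasks:
        if not is_initially_secure(task):
            continue  # only attack tasks the model starts out solving securely
        for vector in VECTORS:
            _pressured_prompt = synthesize_pressure(task, vector)
            results.append(
                AttackResult(task["id"], vector, verify_regression(task, vector))
            )
    return results
```

In this sketch a task that is secure at baseline is probed once per vector, so a single vulnerable task yields three attack attempts; tasks that are already insecure at baseline are filtered out, matching stage (i) of the framework.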