Engineering Robustness into Personal Agents with the AI Workflow Store

2026-05-11
Cryptography and Security

Cryptography and Security, Artificial Intelligence
AI summary

The authors argue that current AI agents work by quickly making plans and acting in real time, and that this rush skips the careful design and testing steps used in traditional software engineering. As a result, AI agents may behave like rough prototypes rather than reliable tools, which matters most in high-stakes situations. They suggest adding thorough engineering processes to create dependable and secure AI systems that can be reused across many users. The authors propose an "AI Workflow Store" of tested, reliable AI routines that agents can call upon, which could be more trustworthy than always generating new plans from scratch. They identify challenges in balancing flexibility with reliability that call for new research beyond the quick, on-the-fly method.

AI agents, on-the-fly synthesis, software engineering, iterative design, adversarial evaluation, staged deployment, workflow, reliability, security, AI Workflow Store
Authors
Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari, Lillian Tsai, Wen Zhang
Abstract
The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts. We argue that this paradigm short-circuits disciplined software engineering (SE) processes, such as iterative design, rigorous testing, adversarial evaluation, and staged deployment, that have delivered the (relatively) reliable and secure systems we use today. By focusing on rapid, real-time synthesis, are AI agents effectively delivering users improvised prototypes rather than systems fit for the high-stakes scenarios in which users may unwittingly apply them? This paper argues for the need to integrate rigorous SE processes into the agentic loop to produce production-grade, hardened, and deterministically constrained agent *workflows* that substantially outperform the potentially brittle and vulnerable results of on-the-fly synthesis. Doing so may require extra compute and time, and if so, we must amortize the cost of rigor through reuse across a broad user community. We envision an *AI Workflow Store* that consists of hardened and reusable workflows that agents can invoke with far greater reliability and security than improvised tool chains. We outline the research challenges of this vision, which stem from a broader flexibility-robustness tension that we argue requires moving beyond the "on-the-fly" paradigm to navigate effectively.