Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

2026-05-25 · Software Engineering

Software Engineering, Artificial Intelligence
AI summary

The authors explain that making AI-powered software isn't just about creating one model or tool, but about managing the entire process continuously as the software runs and changes over time. They designed a system called a meta-engineering harness that breaks down work into clear rules, uses specialized AI agents for different tasks, checks work carefully, and learns from mistakes to get better. This approach was tested in a real setting helping small businesses with their websites and other software needs. The authors share how their system found problems in early tests and improved based on those findings, creating a reliable way to maintain AI-based software over time.

AI-native software, software production, contract verification, meta-engineering, AI agents, failure classification, outer-loop calibration, software maintenance, service-as-software, operational context
Authors
Satadru Sengupta, Tamunokorite Briggs, Ivan Myshakivskyi
Abstract
AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for production environments where software must be continuously produced, verified, deployed, maintained, and adapted across many operational contexts and long time horizons. We present a meta-engineering harness: a software-production architecture that transforms operational and product-feature requirements into explicit contracts, routes work through role-specialized AI agents, performs independent and adversarial verification, and continuously improves itself through structured failure classification and outer-loop calibration. The harness is designed for settings in which software delivery is not a one-time project but an ongoing operating function. In our motivating application, CTO-as-a-service for small service firms, the system manages websites, booking flows, payment systems, back-office workflow automations, and AI-agent interfaces as continuously evolving technical infrastructure rather than one-off deliverables. We describe the layered architecture, including two-pass contract compilation, persistent markdown memory with specialization records, attention-based and independence-based verification, a four-way failure arbiter, and outer-loop calibration. We report results from an early production deployment spanning 17 features over several weeks, including a detailed in-app payments case study that revealed contract incompleteness and verification-boundary issues. These observations directly drove targeted improvements to the harness. The contribution is an implemented, measurable, and extensible verification architecture for making AI-native service-as-a-software production reliable, auditable, and improvable over time.
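To make the contract, verification, and failure-arbiter flow described in the abstract more concrete, the following is a minimal illustrative sketch and not the authors' implementation. All names here (`Contract`, `Clause`, `verify`, `arbitrate`) are hypothetical, and the four `FailureClass` labels are assumptions: the abstract says the arbiter is four-way but does not name its classes. Two of the labels are loosely modeled on the contract-incompleteness and verification-boundary issues the case study mentions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class FailureClass(Enum):
    # Hypothetical labels: the abstract does not name the four classes.
    IMPLEMENTATION_DEFECT = "implementation_defect"
    CONTRACT_INCOMPLETE = "contract_incomplete"
    VERIFICATION_GAP = "verification_gap"
    ENVIRONMENT = "environment"


@dataclass(frozen=True)
class Clause:
    """One checkable requirement in a contract (assumed structure)."""
    name: str
    check: Callable[[Dict[str, object]], bool]


@dataclass(frozen=True)
class Contract:
    """An explicit contract compiled from a feature requirement."""
    feature: str
    clauses: List[Clause]


def verify(contract: Contract, artifact: Dict[str, object]) -> List[str]:
    """Return the names of the clauses that the artifact violates."""
    return [c.name for c in contract.clauses if not c.check(artifact)]


def arbitrate(
    violations: List[str],
    incident_observed: bool = False,
    incident_covered_by_clause: bool = False,
    environment_healthy: bool = True,
) -> Optional[FailureClass]:
    """Assign one of four (assumed) failure classes, or None if all is well."""
    if not environment_healthy:
        return FailureClass.ENVIRONMENT
    if violations:
        # The artifact broke a clause the contract already states.
        return FailureClass.IMPLEMENTATION_DEFECT
    if incident_observed:
        # Every clause passed, yet something went wrong downstream.
        if incident_covered_by_clause:
            return FailureClass.VERIFICATION_GAP
        return FailureClass.CONTRACT_INCOMPLETE
    return None


contract = Contract(
    feature="in_app_payments",
    clauses=[
        Clause("amount_positive", lambda a: a.get("amount_cents", 0) > 0),
        Clause("currency_set", lambda a: bool(a.get("currency"))),
    ],
)
artifact = {"amount_cents": 500}  # currency is missing
violations = verify(contract, artifact)
verdict = arbitrate(violations)
```

In this sketch, a failure that occurs while every clause passes is attributed to the contract or to the verifier rather than to the implementing agent, which is one plausible way structured classification could feed outer-loop calibration.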