Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization

2026-05-11
Software Engineering · Multiagent Systems
AI summary

The authors studied two ways of translating legacy COBOL programs to Python with AI models. One approach lets the model control the execution steps itself (agentic orchestration); the other follows a fixed, predefined process (deterministic orchestration). They found that the fixed process is just as accurate, more reliable across repeated runs, and much cheaper to execute. This suggests that in tasks with clear validation checks, following a strict plan can outperform letting the model decide every step. Their work helps show how to make software modernization more stable and cost-effective.

COBOL · Python · Large Language Models · Software Modernization · Execution Orchestration · Deterministic Execution · Agentic Workflows · Functional Correctness · Robustness · Token Consumption
Authors
Naing Oo Lwin, Rajesh Kumar
Abstract
Modernizing legacy COBOL systems remains difficult due to scarce expertise, large and long-lived codebases, and strict correctness requirements. Recent large language model (LLM)-based modernization systems increasingly rely on agentic workflows in which the model controls multi-step tool execution. However, it remains unclear whether delegating execution control to the LLM improves correctness, robustness, or efficiency in structured software engineering workflows. We present a controlled empirical study of deterministic and LLM-controlled orchestration for COBOL-to-Python modernization. Using a unified experimental framework, we hold the language models, prompts, tools, configurations, and source programs constant while varying only the execution control strategy. This isolates orchestration as the sole experimental variable. We evaluate both approaches using functional correctness, robustness across repeated stochastic runs, and computational efficiency. Across multiple models, deterministic orchestration achieves comparable computational accuracy to LLM-controlled orchestration while improving worst-case robustness and reducing performance variability across runs. Deterministic execution also reduces token consumption by up to 3.5x, leading to substantially lower operational cost. These results suggest that, in structured modernization workflows with explicit validation stages, fixed execution policies provide more stable and cost-efficient behavior than fully agentic orchestration without reducing translation quality.
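To make the contrast concrete, here is a minimal illustrative sketch (not the authors' implementation) of the two orchestration styles. The stub functions `translate`, `validate`, `repair`, and `plan_next_action` are hypothetical stand-ins for real LLM and tool invocations; in the deterministic variant the pipeline fixes the step order, while in the agentic variant a planner (in practice, the LLM) chooses the next action each turn.

```python
# Illustrative sketch of deterministic vs. LLM-controlled orchestration.
# All four stubs below are assumptions standing in for real LLM/tool calls.

def translate(cobol_src: str) -> str:
    """Stub LLM call: produce a Python translation of COBOL source."""
    return f"# python translation of: {cobol_src!r}"

def validate(python_src: str) -> bool:
    """Stub validation stage, e.g. running functional equivalence tests."""
    return python_src.startswith("#")

def repair(python_src: str) -> str:
    """Stub repair step applied when validation fails."""
    return "# repaired\n" + python_src

def deterministic_orchestration(cobol_src: str, max_repairs: int = 2) -> str:
    # Fixed execution policy: translate -> validate -> (repair, re-validate).
    # The model never decides which step runs next; the pipeline does,
    # so the step sequence and token cost are the same on every run.
    candidate = translate(cobol_src)
    for _ in range(max_repairs):
        if validate(candidate):
            break
        candidate = repair(candidate)
    return candidate

def plan_next_action(state: dict) -> str:
    """Stub planner (assumption): real agentic systems ask the LLM here."""
    if state["candidate"] is None:
        return "translate"
    return "done" if validate(state["candidate"]) else "repair"

def agentic_orchestration(cobol_src: str, max_steps: int = 6) -> str:
    # LLM-controlled policy: a planner picks the next tool each turn.
    # Because the real choice comes from a stochastic model, the step
    # sequence (and hence cost) can vary across runs.
    state = {"source": cobol_src, "candidate": None}
    for _ in range(max_steps):
        action = plan_next_action(state)
        if action == "translate":
            state["candidate"] = translate(state["source"])
        elif action == "repair":
            state["candidate"] = repair(state["candidate"])
        else:  # "done"
            break
    return state["candidate"]
```

Under this toy setup both policies reach the same validated output; the paper's point is that the deterministic one does so with bounded, repeatable cost, whereas the agentic planner's choices (and token usage) vary run to run.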