ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents
2026-04-09 • Multiagent Systems
Multiagent Systems, Computation and Language, Software Engineering
AI summary
The authors looked at how different types of information help language models (LMs) improve automated software engineering tasks. They created a method called Oracle-SWE to separate and measure how each piece of information, like test results or code context, affects the success of these models. By also testing information provided by strong LMs, they aimed to understand what helps agents perform better in real coding scenarios. Their work helps focus future research on the most useful information signals for autonomous coding systems.
language models, automated software engineering, agentic workflows, oracle information, software benchmarks, test signals, execution context, API usage
Authors
Kenan Li, Qirui Jin, Liao Zhu, Xiaosong Huang, Yijia Wu, Yikai Zhang, Xin Zhang, Zijian Jin, Yufan Huang, Elsie Nallipogu, Chaoyun Zhang, Yu Kang, Saravan Rajmohan, Qingwei Lin, Wenke Lee, Dongmei Zhang
Abstract
Recent advances in language model (LM) agents have significantly improved automated software engineering (SWE). Prior work has proposed various agentic workflows and training strategies and has analyzed failure modes of agentic systems on SWE tasks, focusing on several contextual information signals: Reproduction Test, Regression Test, Edit Location, Execution Context, and API Usage. However, the individual contribution of each signal to overall success remains underexplored, particularly the ideal contribution each signal would make if the intermediate information were obtained perfectly. To address this gap, we introduce Oracle-SWE, a unified method to isolate and extract oracle information signals from SWE benchmarks and quantify the impact of each signal on agent performance. To further validate the observed pattern, we evaluate the performance gain of signals extracted by strong LMs when provided to a base agent, approximating real-world task-resolution settings. These evaluations aim to guide research prioritization for autonomous coding systems.
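As a rough illustration of what "quantifying the impact of each signal" could look like, the sketch below compares a base agent's resolve rate with its resolve rate when a single oracle signal is supplied. This is not the authors' implementation: the function names, the signal keys, and the idea of scoring by the difference in resolve rate are assumptions made for illustration, and no real benchmark results are implied.

```python
from typing import Dict, List


def resolve_rate(outcomes: List[bool]) -> float:
    """Fraction of benchmark tasks resolved (True = task resolved)."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def signal_gains(
    baseline: List[bool],
    with_signal: Dict[str, List[bool]],
) -> Dict[str, float]:
    """Gain in resolve rate when one signal is added to the base agent.

    `baseline` holds per-task outcomes for the base agent alone.
    `with_signal` maps a (hypothetical) signal name to per-task outcomes
    for the same tasks when only that signal is provided.
    """
    base = resolve_rate(baseline)
    gains = {}
    for name, outcomes in with_signal.items():
        if len(outcomes) != len(baseline):
            raise ValueError(f"task count mismatch for signal {name!r}")
        gains[name] = resolve_rate(outcomes) - base
    return gains
```

Given per-task outcomes from such runs, sorting the returned dictionary by value would rank the signals by how much each one helps in isolation, which is the kind of comparison the abstract describes.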