Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables
2026-05-11 • Software Engineering
Software Engineering • Computation and Language
AI summary
The authors studied how structural choices in configuration files affect how well AI coding agents follow the instructions inside them. They manipulated file size, instruction placement, file organization, and conflicting information in adjacent files, and found no detectable impact on the agents' compliance. Instead, they observed that the more functions an agent generates within a session, the less likely it is to follow the instructions, though the decline is not strictly monotonic. These results held across two codebases and multiple models. Overall, the authors conclude that task type and within-session progress matter more for agent behavior than file structure.
configuration files • coding agents • compliance • Bayesian analysis • mixed-effects models • factorial study • TypeScript • function generation • AI code models
Authors
Damon McMillan
Abstract
Frontier coding agents read configuration files (CLAUDE.md, AGENTS.md, Cursor Rules) at session start and are expected to follow the conventions inside them. Practitioners assume that structural choices (file size, instruction position, file architecture, contradictions in adjacent files) measurably affect adherence. We report a systematic factorial study of these choices using four manipulated variables, measuring compliance with a trivial target annotation across 1,650 Claude Code CLI sessions (16,050 function-level observations) on two TypeScript codebases, three frontier models (primarily Sonnet 4.6, with Opus 4.6 as a CLI-matched cross-model check and Opus 4.7 reported descriptively under a CLI-version confound), and five coding tasks. We use mixed-effects models with a Bayesian companion. None of the four structural variables or three two-way interactions produces a detectable contrast after multiple-testing correction. Size and conflict nulls are supported by affirmative-null Bayes factors (BF10 between 0.05 and 0.10); position and architecture nulls are failures to reject without Bayes-factor support. The largest effect we measured is within-session: each additional function the agent generates is associated with approximately 5.6% lower odds of compliance per step (OR = 0.944) within the session-length range we tested, though the relationship is non-monotonic rather than a constant per-step effect. This reproduces on a second TypeScript codebase and on Opus 4.6 at matched configuration; it was identified during analysis rather than pre-specified. Within the conditions tested, file-structure variables did not produce detectable contrasts; compliance varies systematically between coding tasks and across each session's sequence of generated functions.
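The abstract's headline number can be unpacked with simple arithmetic: an odds ratio of 0.944 per generated function means each additional function multiplies the odds of compliance by 0.944, i.e. a 5.6% reduction in odds per step. The sketch below is purely illustrative; the baseline odds value is a made-up assumption (not a figure from the paper), and the authors note the real relationship is non-monotonic rather than a constant per-step effect, so this constant-OR projection is a simplification.

```python
# Illustrative sketch: what a constant per-step odds ratio of 0.944 would
# imply for compliance over a session. Baseline odds are hypothetical.
OR_PER_STEP = 0.944  # reported per-function odds ratio (abstract)

# Per-step odds reduction implied by the OR
pct_drop = (1 - OR_PER_STEP) * 100
print(f"per-step odds reduction: {pct_drop:.1f}%")  # → 5.6%

# Project compliance odds and probability across a session, assuming a
# (made-up) baseline odds of 4:1 in favor of compliance at step 0.
baseline_odds = 4.0
for step in (0, 5, 10):
    odds = baseline_odds * OR_PER_STEP ** step
    prob = odds / (1 + odds)  # convert odds back to probability
    print(f"step {step:2d}: odds={odds:.2f}, P(comply)={prob:.2f}")
```

Under these assumed numbers, compliance probability would drift from 0.80 at the first function to roughly 0.69 by the tenth, which is the kind of within-session drift the paper reports, though its actual trajectory is non-monotonic.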