The Illusion of Agentic Complexity in README.md Generation: Evaluating Single-Agent vs. Multi-Agent RAG Systems

2026-06-29 · Software Engineering

AI summary

The authors studied how different system designs affect the generation of README files for GitHub projects using Large Language Models (LLMs). They compared a simple Single-Agent system, a Multi-Agent System (MAS), and a variant guided by developers' plans against an existing baseline, LARCH. They found that the Single-Agent system runs about twice as fast and uses 86% fewer tokens, but lacks some of the structural formatting quality that the MAS provides. Adding lightweight developer-guided plans improved overall documentation quality the most. The results reveal a trade-off between speed, resource use, and structural quality in automated documentation.

Large Language Models, Multi-Agent Systems, Retrieval-Augmented Generation, README generation, Software documentation, Task decomposition, Autonomous planning, Developer-guided planning, Token consumption, Lexical quality
Authors
Abu Saleh, Tesfay Welegebreal Tesfay, Phuong T. Nguyen, Juri Di Rocco, Muhammad Umar Zeshan, Davide Di Ruscio
Abstract
Large Language Models (LLMs) are increasingly used to automate software engineering tasks, including code completion, code summarization, testing, and the generation of repository-level documentation. While Multi-Agent Systems (MAS) are often adopted for such tasks on the premise that task decomposition improves performance, the impact of architectural complexity on practical efficiency remains under-examined. This study empirically evaluates Retrieval-Augmented Generation (RAG)-based architectures for generating README files for GitHub repositories. We conducted a systematic comparison of a Single-Agent pipeline, a specialized MAS, and a developer-guided planning (DevPlan) variant, benchmarked against LARCH (a state-of-the-art baseline) and the original ground truth. Results indicate a critical architectural trade-off: the Single-Agent pipeline achieves lexical quality comparable to MAS while reducing token consumption by 86% and running twice as fast. In contrast, manual taxonomy analysis shows that MAS achieves high structural consistency (98%), resolving formatting issues observed in single-agent approaches. Autonomous planning is identified as the primary pipeline bottleneck; incorporating lightweight developer-guided plans produces the highest overall documentation quality, surpassing all other analyzed configurations.
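To make the architectural contrast concrete, here is a minimal, hypothetical Python sketch. It is not the authors' implementation. StubLLM, retrieve, single_agent_readme, multi_agent_readme, the section names, and the word-count token proxy are all illustrative assumptions. The sketch only shows why a decomposed pipeline issues more model calls than a single-agent one, and where a developer-supplied plan replaces the autonomous planning step. The counts it prints do not reproduce the paper's measurements.

```python
"""Hypothetical sketch contrasting single-agent and multi-agent RAG README
pipelines. Not the authors' implementation: every name here is illustrative."""
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubLLM:
    """Hypothetical stand-in for an LLM client. Counts calls and approximates
    tokens as whitespace-separated words (a crude proxy, not a tokenizer)."""
    tokens_used: int = 0
    calls: int = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        n_prompt = len(prompt.split())
        output = f"[generated from {n_prompt} prompt words]"
        self.tokens_used += n_prompt + len(output.split())
        return output


def retrieve(repo_files: Dict[str, str], query: str, k: int = 2) -> List[str]:
    """Toy retriever: rank files by word overlap with the query, keep the top k."""
    q = set(query.lower().split())
    ranked = sorted(
        repo_files.items(),
        key=lambda kv: -len(q & set(kv[1].lower().split())),
    )
    return [text for _, text in ranked[:k]]


def single_agent_readme(llm: StubLLM, repo_files: Dict[str, str]) -> str:
    """One retrieval step and one generation call for the whole README."""
    context = "\n".join(retrieve(repo_files, "install usage api"))
    return llm.complete(f"Write a README.md for this repository.\n{context}")


def multi_agent_readme(
    llm: StubLLM, repo_files: Dict[str, str], plan: Optional[List[str]] = None
) -> str:
    """Planner -> per-section writers -> formatter. Passing `plan` mimics a
    developer-guided variant by skipping the autonomous planner call."""
    if plan is None:
        # Planner agent; parsing its output is omitted, so a fixed
        # hypothetical section list stands in for the parsed plan.
        llm.complete("List README sections for:\n" + "\n".join(repo_files.values()))
        plan = ["Overview", "Installation", "Usage"]
    sections = []
    for section in plan:
        # Each writer agent retrieves its own context, adding calls and tokens.
        context = "\n".join(retrieve(repo_files, section))
        sections.append(llm.complete(f"Write the '{section}' section.\n{context}"))
    # Formatter agent enforces consistent Markdown structure across sections.
    return llm.complete("Format as consistent Markdown:\n" + "\n".join(sections))


if __name__ == "__main__":
    repo = {
        "setup.py": "install requires numpy",
        "main.py": "usage example api call",
        "LICENSE": "MIT license",
    }
    single, mas, devplan = StubLLM(), StubLLM(), StubLLM()
    single_agent_readme(single, repo)
    multi_agent_readme(mas, repo)
    multi_agent_readme(devplan, repo, plan=["Installation", "Usage"])
    for name, llm in [("single", single), ("mas", mas), ("devplan", devplan)]:
        print(name, llm.calls, llm.tokens_used)
```

With this toy repository, the single-agent pipeline makes one call. The multi-agent pipeline makes five: a planner, three section writers, and a formatter. The developer-guided run makes three, because the supplied two-section plan replaces the planner. The extra calls are the structural source of the cost gap. The dedicated formatter stage illustrates one way a decomposed design can enforce consistent structure.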