Automated Generation of High-Quality Bug Reports for Android Applications
2026-04-01 • Software Engineering
AI summary
The authors created a tool called BugScribe to improve bug reports for mobile apps, which are often unclear or missing important details. Given specific information about an app's screens and user actions, BugScribe automatically writes clearer and more complete descriptions of what went wrong (Observed Behavior), what was expected (Expected Behavior), and how to reproduce the bug (Steps to Reproduce). In an evaluation on 48 real bug reports from 26 Android apps, it produced higher-quality components than the original reports and recent LLM-based baselines. The tool aims to help developers fix bugs faster by providing more accurate and useful reports.
bug report, mobile application, Observed Behavior, Expected Behavior, Steps to Reproduce, LLM (Large Language Model), GUI interaction, Android app, bug tracking, software testing
Authors
Antu Saha, Atish Kumar Dipongkor, Sam Bennett, Kevin Moran, Andrian Marcus, Oscar Chaparro
Abstract
Most defects in mobile applications are visually observable on the device screen. To track these defects, users, testers, and developers must manually submit bug reports, especially in the absence of crashes. However, these reports are frequently ambiguous or inaccurate, often omitting essential components such as the Observed Behavior (OB), Expected Behavior (EB), or Steps to Reproduce (S2Rs). Low-quality reports hinder developers' ability to understand and reproduce defects, delaying resolution and leading to incorrect fixes or unresolved defects. In this paper, we posit that providing specific app-related information (e.g., GUI interactions or the specific screens where bugs appear) to large language models (LLMs) as key points of context can help them automatically generate clear, detailed, and accurate OB, EB, and S2Rs. We built and evaluated a novel approach, BugScribe, that generates bug reports in this way. To support the evaluation, we introduce a unified quality framework that defines correctness and completeness dimensions for OB, EB, and S2Rs. Using 48 bug reports from 26 Android apps, we show that BugScribe produces higher-quality and more accurate components than the original reports and outperforms recent LLM-based baselines. We envision that BugScribe can serve as a practical assistant for testers and developers by enhancing incomplete bug reports with reliable and accurate OB, EB, and S2Rs, thereby streamlining bug resolution and improving mobile app quality.