Ekka: Automated Diagnosis of Silent Errors in LLM Inference
2026-06-03 • Distributed, Parallel, and Cluster Computing
Distributed, Parallel, and Cluster Computing • Artificial Intelligence • Software Engineering
AI summary
The authors study how software for running large language models (LLMs) can contain hidden bugs that raise no explicit error but degrade output quality. They frame diagnosis as a differential debugging task: compare the buggy system against a correct reference implementation and locate the differences that cause the problem. Their tool, Ekka, automatically aligns and compares intermediate execution states to find the root causes of such silent errors. On a benchmark of real-world silent errors, Ekka reached 80% pass@1 and 88% pass@5 diagnosis accuracy, and it also diagnosed 4 new bugs that developers confirmed.
Large Language Models • Silent Errors • Differential Debugging • Software Stack • Output Quality • Benchmark • Reference Implementation • Diagnosis System • Serving Framework • Execution States
Authors
Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci
Abstract
LLM serving frameworks are evolving quickly, with a complex software stack and a vast number of optimizations. This rapid development can introduce silent errors, in which output quality degrades without any explicit error signal. Diagnosing silent errors is notoriously difficult due to the substantial semantic gap between the high-level symptoms and the low-level root causes. We observe that diagnosis of silent errors can be effectively framed as a differential debugging problem by leveraging the existence of semantically correct reference implementations. We propose Ekka, an automated diagnosis system that identifies root causes by systematically aligning and comparing intermediate execution states between a target and a reference framework. We construct a benchmark of real-world silent errors from popular serving frameworks, on which Ekka achieves 80% pass@1 and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems. Ekka also diagnoses 4 new silent errors in serving frameworks, all of which have been confirmed by the developers.
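The differential debugging idea in the abstract can be illustrated with a minimal sketch. This is not Ekka's implementation: the trace format, the name-based alignment, the function names (`states_match`, `first_divergence`), and the tolerances are all assumptions made for illustration. The sketch walks a target trace in execution order, pairs each intermediate state with the reference state of the same name, and reports the first one that differs beyond a tolerance.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical trace format: ordered (state name, flattened values) pairs,
# e.g. recorded after each layer of a forward pass.
Trace = List[Tuple[str, Sequence[float]]]


def states_match(a: Sequence[float], b: Sequence[float],
                 rtol: float = 1e-3, atol: float = 1e-6) -> bool:
    """Return True if two flattened states agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rtol, abs_tol=atol)
               for x, y in zip(a, b))


def first_divergence(target: Trace, reference: Trace,
                     rtol: float = 1e-3, atol: float = 1e-6) -> Optional[str]:
    """Name of the earliest target state that disagrees with the reference.

    Alignment here is by identical state name (an assumption); states with
    no counterpart in the reference are skipped rather than flagged.
    """
    ref_by_name = dict(reference)
    for name, values in target:
        if name not in ref_by_name:
            continue  # no aligned reference state to compare against
        if not states_match(values, ref_by_name[name], rtol, atol):
            return name
    return None


# Made-up example values: the target diverges at the first attention block,
# and the error then propagates to the later state.
reference = [("embed", [0.1, 0.2]), ("attn_0", [0.5, -0.3]), ("mlp_0", [1.0, 2.0])]
target = [("embed", [0.1, 0.2]), ("attn_0", [0.5, -0.9]), ("mlp_0", [1.4, 2.2])]

print(first_divergence(target, reference))  # attn_0
```

Reporting the earliest divergence rather than every mismatch matters because an error in one state usually contaminates everything computed after it. Real frameworks rarely share state names or tensor layouts, so the alignment step is where most of the difficulty lies; the name lookup above stands in for it.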