Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference
2026-06-01 • Distributed, Parallel, and Cluster Computing
Distributed, Parallel, and Cluster Computing • Artificial Intelligence
AI summary
The authors studied how mistakes caused by soft errors can affect the performance of large language models (LLMs) when they are used in scientific computing tasks. They created a tool called LLMFI to purposely inject errors and see how these mistakes spread across different models and tasks like reasoning and coding. Their experiments revealed common patterns in how errors impact LLM outputs and suggested four software-only ways to make LLMs more reliable. This work helps improve our understanding of LLM robustness and guides future efforts to detect and fix errors.
Large language models • Soft errors • Fault injection • High-performance computing • Model inference • Error propagation • Reliability • Code generation • Multilingual tasks • Scientific computing
Authors
Yafan Huang, Sheng Di, Guanpeng Li
Abstract
Large language models (LLMs) are increasingly integrated into high-performance computing (HPC) workflows, accelerating scientific discovery through diverse perspectives such as code generation and domain-specific decision-making. Yet, how soft errors propagate and affect LLM inference remains largely unexplored. To bridge this gap, we present a comprehensive study on error propagation in LLM inference, enabled by our proposed LLMFI, a configurable and deterministic fault-injection framework. Using LLMFI, we systematically inject faults across three open-weighted LLMs and thirteen representative tasks, covering reasoning, multilingual, mathematical, and coding domains. In addition, we conduct fine-grained case studies that reveal critical vulnerability patterns. Overall, our study yields 17 takeaways that advance the understanding of error propagation in LLM inference and introduces four low-overhead directions to improve reliability through software-only modification, offering practical guidance for future error detection and mitigation.
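To make the idea of a deterministic fault injection concrete, here is a minimal sketch. It is not LLMFI's interface, which the abstract does not describe. It assumes a single-bit-flip fault model on float32 values and a seed-driven choice of fault site; the names `flip_bit` and `inject_fault` are hypothetical.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the float32 encoding of value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly the selected bit, then reinterpret back as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def inject_fault(values: list[float], seed: int) -> tuple[int, int, list[float]]:
    """Corrupt one element of values at a fault site chosen from the seed.

    Hypothetical helper: the same seed always selects the same element and
    bit, which is what makes an injection campaign reproducible.
    """
    rng = random.Random(seed)  # private generator, no global state
    index = rng.randrange(len(values))
    bit = rng.randrange(32)
    corrupted = list(values)  # leave the caller's data untouched
    corrupted[index] = flip_bit(values[index], bit)
    return index, bit, corrupted
```

The bit position determines how severe the corruption is, which is one plausible reading of the title's claim that not all errors are equal: `flip_bit(1.0, 0)` changes the value by about 1.2e-7, `flip_bit(1.0, 31)` returns `-1.0`, and `flip_bit(1.0, 30)` returns `inf`.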