CelerLog: Fast Log Parsing via Dynamic Routing

2026-05-25 · Software Engineering

AI summary

The authors address the challenge of turning messy computer logs into neat, organized data. They noticed that most logs are repetitive and simple, so their method, called CelerLog, handles these quickly with fast statistical tools. Only the complicated logs are sent to a slower but more capable language model for deeper understanding. In experiments on 14 public datasets, this approach made parsing much faster and cheaper while remaining accurate.

log parsing, semantic inference, LLM (Large Language Model), statistical analysis, dynamic routing, log analysis, Drain parser, token consumption, computational efficiency
Authors
Shiwen Shan, Yintong Huo, Minxing Wang, Zhiying Wu, Yuxin Su, Zibin Zheng
Abstract
Log parsing, which transforms raw log messages into structured formats, is a fundamental step in automated log analysis. Existing syntax-based parsers struggle with complex logs because they lack semantic reasoning ability. Emerging LLM-powered semantic parsers achieve high accuracy but suffer from prohibitive latency and token costs because they apply semantic inference to every log. Our key observation is that not all logs require complex semantic understanding: the vast majority of logs exhibit repetitive patterns that can be extracted via straightforward statistical analysis. Driven by this insight, we propose CelerLog, a fast and effective log parser. CelerLog introduces a dynamic routing mechanism that classifies logs into dense and sparse groups. Logs with strong statistical patterns (dense groups) are processed by an efficient statistical processor, whereas sparse groups lacking such patterns are routed to an LLM for semantic inference. This hybrid strategy avoids unnecessary LLM invocations. Extensive experiments on 14 public datasets show that CelerLog achieves leading performance over state-of-the-art baselines, running 7.9x to 18.6x faster than LLM-based methods and up to 1.5x faster than Drain. Additionally, it reduces costs by decreasing token consumption by 80.2% to 94.1% and LLM invocations by 86.4% to 90.9%.
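
The dense/sparse routing idea can be illustrated with a minimal sketch. The abstract does not say how CelerLog forms groups or decides density, so everything here is an assumption made for illustration: the grouping key (token count plus first token), the `dense_threshold` value, the position-wise template extraction, and the `llm_parse` callable that stands in for a real LLM call. It is not the authors' implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

WILDCARD = "<*>"


def group_key(log: str) -> Tuple[int, str]:
    """Hypothetical grouping key: token count plus first token."""
    tokens = log.split()
    return (len(tokens), tokens[0] if tokens else "")


def statistical_template(logs: List[str]) -> str:
    """Keep tokens that are constant at a position; mask varying ones."""
    # All logs in a group share a token count, so zip aligns positions.
    columns = zip(*(log.split() for log in logs))
    return " ".join(
        col[0] if len(set(col)) == 1 else WILDCARD for col in columns
    )


def route(
    logs: List[str],
    llm_parse: Callable[[str], str],
    dense_threshold: int = 3,  # assumed cut-off, not from the paper
) -> Tuple[Dict[str, str], int]:
    """Return a log-to-template mapping and the number of LLM calls made."""
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for log in logs:
        groups[group_key(log)].append(log)

    templates: Dict[str, str] = {}
    llm_calls = 0
    for members in groups.values():
        if len(members) >= dense_threshold:
            # Dense group: one cheap statistical pass covers every member.
            template = statistical_template(members)
            for log in members:
                templates[log] = template
        else:
            # Sparse group: fall back to semantic inference per log.
            for log in members:
                templates[log] = llm_parse(log)
                llm_calls += 1
    return templates, llm_calls
```

With three `Connected to <ip>` lines and one unrelated line, this sketch would resolve the three repetitive lines statistically and call `llm_parse` only once. That is the cost-saving effect the abstract describes, although the real system's grouping and density criteria may differ substantially.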