UModel: An Agent-Ready Observability Data Modeling Method at Scale

2026-06-03 · Software Engineering

AI summary

The authors created UModel, a new way to organize and connect different types of observability data to help automatically find the root cause of failures in networked systems. Unlike existing frameworks, which leave telemetry fragmented across data silos with incompatible schemas, UModel uses a unified object-based model to link telemetry, entities, and expert knowledge in a semantic graph. They also developed U-SPL, a pipeline-based query interface that lets LLM-based agents explore these connections. Re-modeling the "AIOps 2025 Challenge" dataset with UModel improved the precision of root cause localization by 8%. UModel has been deployed at Alibaba Cloud for more than one year, serving tens of thousands of users with sub-second query latency.

Root Cause Analysis (RCA), observability, ontological framework, semantic graph, telemetry, AIOps, large language models (LLM), networked systems, data modeling, query interface
Authors
Changhua Pei, Zheyuan Li, Zexin Wang, Hang Cui, Xiaohui Nie, Qi Zhou, Fang Situ, Cheng Zhang, Xin Zhang, Xidao Wen, Gaogang Xie, Jingjing Li, Dan Pei
Abstract
When networked system failures occur, automatically performing Root Cause Analysis (RCA) using observability data is critical for ensuring networked system reliability. Recently, LLM-based agents have shown promise for automating this diagnosis process through advanced reasoning and autonomous exploration. However, existing observability frameworks remain archaic, characterized by fragmented data silos, incompatible schemas, and insufficient semantic metadata, preventing agents from establishing the complex relationships required for effective RCA. To address these challenges, we present UModel, a unified ontological framework that shifts observability from data-centric to object-centric modeling. UModel constructs a virtual ontological layer where heterogeneous telemetry, entities, and expert knowledge are standardized as objects and interconnected via semantic graphs. In addition, we introduce U-SPL, a pipeline-based query interface that enables agents to autonomously explore system topologies and correlate multimodal data. By re-modeling the "AIOps 2025 Challenge" dataset using UModel, the precision of root cause localization improved by 8%, demonstrating that enhanced data organization can significantly increase the accuracy of downstream tasks. UModel provides a scalable modeling framework that, in its deployment at Alibaba Cloud for more than one year, has served tens of thousands of users, sustained millions of operations per second, and delivered sub-second query latency.
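
To make the object-centric idea concrete, the following is a minimal illustrative sketch in Python, not UModel or U-SPL themselves. The abstract does not give the actual schema or query syntax, so every name here (`Obj`, `Graph`, `Pipeline`, the `calls`, `runs_on`, and `has_telemetry` relations, and the sample services) is a hypothetical stand-in. It shows entities and telemetry stored as objects in one graph, and a pipeline-style traversal that walks from a failing service to the telemetry of the infrastructure beneath its dependency.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """One object in the (hypothetical) ontological layer."""
    id: str
    kind: str  # assumed labels: "entity", "metric_set", "log_set"


@dataclass
class Graph:
    """Objects plus typed, directed relations between them."""
    objects: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src_id, relation, dst_id)

    def add(self, obj):
        self.objects[obj.id] = obj
        return obj

    def link(self, src_id, relation, dst_id):
        self.edges.append((src_id, relation, dst_id))

    def neighbors(self, obj_id, relation):
        return [self.objects[d] for s, r, d in self.edges
                if s == obj_id and r == relation]


class Pipeline:
    """Chainable traversal: each stage maps a set of objects to a new set."""

    def __init__(self, graph, ids):
        self.graph = graph
        self.ids = list(ids)

    def follow(self, relation):
        # Move along one relation type, de-duplicating while keeping order.
        nxt = []
        for obj_id in self.ids:
            for obj in self.graph.neighbors(obj_id, relation):
                if obj.id not in nxt:
                    nxt.append(obj.id)
        return Pipeline(self.graph, nxt)

    def where(self, predicate):
        # Keep only the objects that satisfy the predicate.
        kept = [i for i in self.ids if predicate(self.graph.objects[i])]
        return Pipeline(self.graph, kept)

    def collect(self):
        return [self.graph.objects[i] for i in self.ids]


# Hypothetical topology: checkout -> payment -> pod-7, with telemetry on pod-7.
g = Graph()
for obj_id, kind in [("checkout", "entity"), ("payment", "entity"),
                     ("pod-7", "entity"), ("pod-7.cpu", "metric_set"),
                     ("pod-7.logs", "log_set")]:
    g.add(Obj(obj_id, kind))
g.link("checkout", "calls", "payment")
g.link("payment", "runs_on", "pod-7")
g.link("pod-7", "has_telemetry", "pod-7.cpu")
g.link("pod-7", "has_telemetry", "pod-7.logs")

# Start at the failing service and walk to the metrics of the downstream pod.
result = (Pipeline(g, ["checkout"])
          .follow("calls")
          .follow("runs_on")
          .follow("has_telemetry")
          .where(lambda o: o.kind == "metric_set")
          .collect())
print([o.id for o in result])
```

In this sketch the agent never needs to know which storage system holds the metrics; it only follows named relations between objects. That is the property the abstract attributes to the object-centric layer, although the production system's data model and query language may differ substantially from this toy version.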