Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

2026-06-17 · Multiagent Systems

Multiagent Systems, Artificial Intelligence, Databases
AI summary

The authors describe Data Intelligence Agents (DIA), a system of three agents designed to streamline how data owners, engineers, and analysts work together to understand and use company data. Instead of just emitting text, DIA's agents generate, run, check, and repair concrete data artifacts automatically, and they surface those artifacts so domain experts can review them. The authors focus mainly on the Query Generator agent, which writes database queries, and show that it performs as well as or better than the best published results on seven SQL benchmarks spanning four task categories and four SQL dialects. They attribute this to an architecture grounded in execution that reuses past experience through a shared memory and is adapted to each task only through natural-language instructions. The system is already deployed in production for enterprise customers.

Data Intelligence Agents, Autonomous Coding Agents, SQL benchmarks, Query Generator, Data schema, Enterprise data, Execution validation, Shared memory, Natural-language instructions
Authors
Anoushka Vyas, Aarushi Dhanuka, Sina Khoshfetrat Pakazad, Henrik Ohlsson
Abstract
Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterprise data. We present Data Intelligence Agents (DIA), a system of three agents (Data Interpreter, Schema Creator, and Query Generator) that compresses this workflow by treating autonomous coding agents (ACAs) as a first-class abstraction: rather than emitting text, the agents generate, execute, validate, and repair concrete artifacts, draw on a shared memory for experience reuse, and surface each for review by domain experts. DIA is deployed in production for enterprise customers. We study the Query Generator in depth and evaluate it in fully autonomous mode across seven SQL benchmarks spanning four task categories and four dialects. It matches or surpasses the best published results on all seven, demonstrating that an architecture grounded in execution, built on ACAs and a shared memory, generalizes across the data intelligence workload with adaptation confined to natural-language instructions.
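The generate, execute, validate, and repair cycle with shared-memory reuse that the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: `SharedMemory`, `Attempt`, `execute`, `solve`, and `toy_propose` are hypothetical names, the coding agent is replaced by a hard-coded stub, SQLite stands in for the enterprise database, and "validation" is assumed to mean only that the query executes without error.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SharedMemory:
    """Hypothetical experience store: task description -> previously validated SQL."""
    solved: dict = field(default_factory=dict)

    def recall(self, task: str) -> Optional[str]:
        return self.solved.get(task)

    def remember(self, task: str, sql: str) -> None:
        self.solved[task] = sql


@dataclass
class Attempt:
    sql: str
    error: Optional[str]  # None when the query executed successfully
    rows: list


def execute(conn: sqlite3.Connection, sql: str) -> Attempt:
    """Run the candidate query and capture either its rows or the database error."""
    try:
        return Attempt(sql, None, conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return Attempt(sql, str(exc), [])


def solve(task: str,
          conn: sqlite3.Connection,
          propose: Callable[[str, Optional[Attempt]], str],
          memory: SharedMemory,
          max_rounds: int = 3) -> Attempt:
    """Generate a query, execute it, and repair it using execution feedback."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    cached = memory.recall(task)
    previous: Optional[Attempt] = None
    for _ in range(max_rounds):
        # First round: reuse a remembered query if one exists; otherwise ask the
        # (stubbed) agent, passing the failed attempt so it can repair it.
        if previous is None and cached is not None:
            sql = cached
        else:
            sql = propose(task, previous)
        attempt = execute(conn, sql)
        if attempt.error is None:      # assumed validation: executes cleanly
            memory.remember(task, sql)
            return attempt
        previous = attempt
    return previous                    # last failed attempt, kept for review


def toy_propose(task: str, previous: Optional[Attempt]) -> str:
    """Stub agent: the first draft misspells a column, the repair fixes it."""
    if previous is None:
        return "SELECT nme FROM customers ORDER BY id"
    return "SELECT name FROM customers ORDER BY id"


# Toy database standing in for enterprise data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)])

memory = SharedMemory()
result = solve("list customer names", conn, toy_propose, memory)
print(result.sql, result.rows)
```

In this sketch the first draft fails at execution, the error is handed back to the stub, and the repaired query is stored so that a later call with the same task description would start from it. A real system would presumably use a richer validation step and a retrieval-based memory; both are simplified here by assumption.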