Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous Systems

2026-06-03 • Machine Learning


AI summary

The authors built a pipeline that generates terabyte-scale, ground-truth-labeled datasets for studying how multiple agents and sensors work together in complex environments. The tool uses the CARLA driving simulator and the AVstack framework to generate data from ground vehicles, drones, and infrastructure sensors. The data can be customized for different sensor setups and conditions, helping researchers train and test perception and collaboration in autonomous systems. Representative perception and fusion studies show how the data supports application-specific training and multi-agent collaboration.

multi-agent systems, sensor fusion, autonomous vehicles, data simulation, CARLA simulator, perception, ground truth labels, multi-sensor data, data generation pipeline, collaborative autonomy

Authors

R. Spencer Hallyburton, David Hunt, Miroslav Pajic

Abstract

Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.
