Sparse Sensor Placement in Multi-Agent Reinforcement Learning Control of Rayleigh-Bénard Convection

2026-06-29 • Multiagent Systems

AI summary

The authors study how to control Rayleigh-Bénard convection, a buoyancy-driven flow in a fluid layer heated from below, with multi-agent reinforcement learning while using far fewer sensors. They first train dense expert policies on full windowed observations, then distill sparse apprentice policies by supervised learning with a grouped regularization that prunes unneeded inputs at the encoder. The sparse apprentices retain control behavior comparable to the experts. The approach holds for both fixed and varying initial conditions, and training from the learned minimal sensor sets cuts the per-agent observation size from 360 to 12, which reduces data throughput and suits realistic hardware constraints.

Rayleigh-Bénard convection, multi-agent reinforcement learning, sensor placement, grouped regularization, sparse control, transformer policies, proximal policy optimization, supervised learning, observation pruning

Authors

Jan Stenner, Hans Harder, Sebastian Peitz

Abstract

This paper studies sparse sensor placement for control of Rayleigh-Bénard convection with multi-agent reinforcement learning. We train dense expert policies with windowed observations and distill sparse apprentice policies by supervised learning with grouped regularization on encoder input weights. The framework combines ordered non-convex grouped regularization and iterative reweighted grouped regularization, and uses a grouping construction that enforces consistent pruning across overlapping observation windows. Experiments with fixed and varying initial conditions show that Multi-Agent Transformer policies train more stably than proximal policy optimization baselines, while sparse apprentices retain control behavior comparable to dense experts. Sparsity results are strong for the proposed grouped methods across settings, including maximal sparsity in all fixed-initial-condition setting variants and maximal or near-maximal sparsity in varying-initial-condition setting variants. As an additional proof of concept, training from learned minimal sensor sets reduces per-agent observation size from 360 to 12 and preserves the overall training trend in simulation while reducing data throughput. The results provide both an interpretable basis for identifying control-relevant spatial regions and state components, and a practical pathway toward sensor-efficient control under realistic hardware constraints.
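
To make the idea of grouped regularization on encoder input weights concrete, the following is a minimal sketch and not the authors' implementation. It swaps in a plain group-lasso proximal step with an optional reweighting, rather than the ordered non-convex and iterative reweighted variants named in the abstract. It assumes the encoder's first layer is a matrix `W` of shape `(hidden_units, n_inputs)` in which each column collects all weights reading one observation entry. The function names, `lam`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def group_soft_threshold(W, lam, group_weights=None):
    """Proximal step of a (weighted) group-lasso penalty on the columns of W.

    Assumption: column j of W holds every encoder weight that reads input j,
    so shrinking a whole column to zero removes that input from the policy.
    """
    norms = np.linalg.norm(W, axis=0)          # one Euclidean norm per input
    if group_weights is None:
        group_weights = np.ones_like(norms)
    thresh = lam * group_weights               # per-group shrinkage amount
    # Columns with norm <= thresh are set exactly to zero; others shrink.
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


def reweight(W, eps=1e-3):
    """Reweighting heuristic: small groups receive a larger penalty next round."""
    return 1.0 / (np.linalg.norm(W, axis=0) + eps)


def active_inputs(W, tol=0.0):
    """Indices of inputs whose weight column has not been pruned."""
    return np.flatnonzero(np.linalg.norm(W, axis=0) > tol)


# Toy example: three inputs, only the first carries large weights.
W_demo = np.array([[3.0, 0.1, 0.0],
                   [4.0, 0.0, 0.2]])
W_pruned = group_soft_threshold(W_demo, lam=0.5)
print(active_inputs(W_pruned))  # prints [0]
```

In a distillation loop, a step like this would typically alternate with gradient updates on the supervised imitation loss, and the surviving column indices would define the reduced sensor set. Tying together the columns that correspond to the same physical sensor in overlapping windows, as the abstract describes, would require a group index that this sketch does not model.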
