Towards Autonomous Railway Operations: A Semi-Hierarchical Deep Reinforcement Learning Approach to the Vehicle Rescheduling Problem

2026-05-11 • Artificial Intelligence

AI summary

The authors tackle the problem of managing disruptions in busy railway networks, which is hard because there are many trains and limited tracks. They propose a new way to use reinforcement learning (RL) by splitting the problem into two parts: deciding when to dispatch trains and choosing their routes. This separation lets each policy specialise in its own kind of decision, so the system learns better and coordinates trains more effectively than heuristic baselines and monolithic RL. Tests on the Flatland-RL simulator, with 7 to 80 trains, showed that the method nearly doubles the number of trains reaching their destinations while keeping deadlock rates below 5%, even when the network is very crowded.

Vehicle Routing and Scheduling Problem, Reinforcement Learning, Railway Traffic Management, Multi-Agent Coordination, Dispatching, Routing, Flatland-RL Simulator, Deadlock, Heuristic Methods, Semi-Hierarchical Formulation

Authors

Alberto Castagna, Stefan Zahlner, Adrian Egli, Christian Eichenberger, Daniel Boos, Manuel Meyer, Anton Fuxjager

Abstract

Managing disruptions in railway traffic is a major challenge. Rising traffic density and infrastructure limits increase complexity, making the Vehicle Routing and Scheduling Problem (VRSP) difficult to solve reliably and in real time. While Operational Research (OR) methods are widely used, most dispatching still relies on human expertise due to the problem's exponential combinatorial complexity. Reinforcement Learning (RL) has gained attention for its potential in multi-agent coordination, but existing RL approaches often underperform OR methods and struggle to scale in dense rail networks. This paper addresses this gap from a machine learning perspective by introducing a semi-hierarchical RL formulation tailored to operational railway constraints. The method separates dispatching from routing through dedicated action and observation spaces, enabling policies to specialise in distinct decision scopes and addressing the imbalance between rare dispatch decisions and frequent routing updates. The approach is evaluated on the Flatland-RL simulator across five difficulty levels and 50 random seeds, with 7 to 80 trains. Results show substantially improved coordination, resource utilisation, and robustness compared with heuristic baselines and monolithic RL, nearly doubling the number of trains reaching their destinations, while keeping deadlock rates below 5% and adaptively sequencing, delaying, or cancelling trains under heavy congestion.
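
The separation of dispatching from routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Flatland-RL API: the names `DispatchAction`, `RoutingAction`, `Train`, and `select_actions` are hypothetical, the action sets are assumed for illustration, and each policy receives the whole `Train` object in place of the dedicated observation spaces the paper mentions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Union


class DispatchAction(Enum):
    """Assumed action set for trains that have not yet entered the network."""
    HOLD = 0
    DISPATCH = 1
    CANCEL = 2


class RoutingAction(Enum):
    """Assumed action set for trains already moving on the network."""
    STOP = 0
    LEFT = 1
    FORWARD = 2
    RIGHT = 3


@dataclass
class Train:
    """Hypothetical, heavily simplified train state."""
    train_id: int
    on_network: bool = False
    cancelled: bool = False


Action = Union[DispatchAction, RoutingAction]


def select_actions(
    trains: List[Train],
    dispatch_policy: Callable[[Train], DispatchAction],
    routing_policy: Callable[[Train], RoutingAction],
) -> Dict[int, Action]:
    """Query the policy that matches each train's decision scope."""
    actions: Dict[int, Action] = {}
    for train in trains:
        if train.cancelled:
            continue  # cancelled trains need no further decisions
        if train.on_network:
            actions[train.train_id] = routing_policy(train)
        else:
            actions[train.train_id] = dispatch_policy(train)
    return actions


if __name__ == "__main__":
    fleet = [Train(0), Train(1, on_network=True), Train(2, cancelled=True)]
    chosen = select_actions(
        fleet,
        dispatch_policy=lambda t: DispatchAction.HOLD,
        routing_policy=lambda t: RoutingAction.FORWARD,
    )
    print(chosen)
```

Keeping two small action sets, rather than one combined set, means the dispatch policy is only queried for waiting trains and the routing policy only for moving ones, which is one plausible reading of how the formulation addresses the imbalance between rare dispatch decisions and frequent routing updates.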
