Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning

2026-06-02 • Machine Learning

Machine LearningArtificial Intelligence

AI summaryⓘ

The authors focus on improving metro system expansions by using a simple form of reinforcement learning instead of complex deep learning methods. They show that their approach works well because the problem size is small enough, making it faster, easier to understand, and less costly in terms of computation and environment. They also include fairness in their design goals and test their method on real cities like Xi'an and Amsterdam. Their results are competitive with deep learning but use far fewer resources.

Metro Network Expansion ProblemTransport Network Design ProblemReinforcement LearningNon-Markovian Rewards Decision ProcessTabular RLDeep Reinforcement LearningSocial EquityCombinatorial OptimizationTravel DemandReward Function

Authors

Dimitris Michailidis, Sennay Ghebreab, Fernando P. Santos

Abstract

We tackle the Metro Network Expansion Problem (MNEP), a subset of the Transport Network Design Problem (TNDP), which focuses on expanding metro systems to satisfy travel demand. Traditional methods rely on exact and heuristic approaches that require expert-defined constraints to reduce the search space. Recently, deep reinforcement learning (Deep RL) has emerged due to its effectiveness in complex sequential decision-making processes-it remains, however, computationally expensive, environmentally costly, and requires additional engineering to interpret. We show that MNEP problems are small enough to not require Deep RL methods. Reformulating the MNEP as a Non-Markovian Rewards Decision Process (NMRDP), we use tabular RL to achieve similar performance with significantly fewer training episodes, additionally offering greater interpretability. Additionally, we incorporate social equity criteria into the reward functions, focusing on efficiency and fairness, highlighting the versatility of our method. Evaluated in real-world settings-Xi'an and Amsterdam-our method reduces total episodes by a factor of 18 and total carbon emissions by a factor of 12 on average, while remaining competitive with Deep RL. This approach offers a replicable, modular, interpretable, and resource-efficient solution with potential applications to other combinatorial optimization problems.

View PDFOpen arXiv