Phase-Aware Guidance Injection for Recurrent MAPPO in Assembly-Line Disruption Recovery

2026-06-15
Artificial Intelligence

AI summary

The authors aim to help assembly lines recover faster from disruptions such as machine faults, absent workers, and emergency orders. They propose a framework that injects external guidance (rule-based, replay-based, or from an online large language model) into a trained scheduling policy at decision time, biasing its action choices without redesigning the policy itself. Guidance is applied only during abnormal and recovery phases. In their experiments, high-quality rule guidance gives the largest gains, replay-based guidance degrades smoothly when it is only partly available, and online LLM guidance yields intermediate improvements. The goal is to shorten abnormal recovery time and preserve on-time delivery during disruptions.

Assembly line scheduling, Disruption recovery, RMAPPO, Decision-time guidance, Logit-level action bias, Reinforcement learning, Large language models, On-time delivery
Authors
Xin Huang, Yongcai Wang, Fengyi Zhang, Zhikun Tao, Yunjun Han, Naiqi Wu
Abstract
Disruption recovery in industrial assembly lines requires timely decisions under machine faults, worker absence, and emergency orders. Existing methods either rely on rigid handcrafted recovery logic or learn adaptive policies that do not readily exploit heterogeneous external recovery knowledge at decision time to reduce abnormal recovery time (ART) and preserve on-time delivery (OTD). To address this gap, we propose a phase-aware guidance injection framework that augments a trained recurrent MAPPO (RMAPPO) scheduling policy through logit-level action bias during evaluation. The framework provides a unified decision-time interface for rule-based, replay-based, and online LLM-based guidance, while activating intervention only during abnormal and recovery phases. Experiments on a custom AssemblyLineEnv show that high-quality rule guidance yields the strongest gains, replay-based guidance degrades smoothly under imperfect availability, and online LLM guidance still provides useful intermediate improvements. These results show that decision-time guidance injection can exploit heterogeneous recovery hints without redesigning the actor.
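
The idea of a phase-gated, logit-level action bias can be illustrated with a minimal sketch. The abstract does not specify the exact form of the bias, so the code below assumes the simplest one: a constant added to the logit of a single suggested action, applied only in abnormal and recovery phases. The names `Phase`, `inject_guidance`, `suggested_action`, and `strength` are hypothetical and are not taken from the paper.

```python
import math
from enum import Enum
from typing import List, Optional, Sequence


class Phase(Enum):
    """Hypothetical phase labels; the paper's own phase detector is not shown."""
    NORMAL = 0
    ABNORMAL = 1
    RECOVERY = 2


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inject_guidance(
    logits: Sequence[float],
    phase: Phase,
    suggested_action: Optional[int] = None,
    strength: float = 2.0,
) -> List[float]:
    """Return a copy of the actor's logits with an additive guidance bias.

    Assumption: guidance arrives as one suggested action index, regardless of
    whether it came from rules, replay, or an LLM. The bias is skipped in the
    normal phase or when no suggestion is available, so the trained policy
    acts unmodified in those cases.
    """
    biased = list(logits)
    if phase is Phase.NORMAL or suggested_action is None:
        return biased
    biased[suggested_action] += strength
    return biased


if __name__ == "__main__":
    actor_logits = [1.0, 0.5, 0.2]  # illustrative values only
    for phase in Phase:
        probs = softmax(inject_guidance(actor_logits, phase, suggested_action=2))
        print(phase.name, [round(p, 3) for p in probs])
```

Because the bias is added to the logits rather than replacing the action, the trained policy can still override a poor suggestion when its own preference is strong enough, and a missing suggestion simply leaves the policy unchanged. This is one plausible reading of why imperfect guidance could degrade gracefully in such a scheme.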