EES-CND: Collaborative Neural Decision-Making for Drift-Aware Fault-Tolerant Edge-Cloud Service Placement
2026-06-01 • Distributed, Parallel, and Cluster Computing
AI summary
The authors studied how to keep computer services running smoothly on a mix of edge devices and cloud servers, any of which can fail. They created a method called EES-CND in which multiple small neural networks work together to decide where to move services when failures occur. The approach also learns and adapts over time to cope with changing conditions. In simulations, the method recovered services faster and more reliably than existing techniques.
edge-cloud computing, service placement, fault tolerance, neural networks, evolution strategy, adaptive models, service-level objectives, performance drift, redeployment strategies
Authors
Mohammadsadeq Garshasbi Herabad, Javid Taheri, Bestoun S. Ahmed, Calin Curescu
Abstract
The edge-cloud paradigm improves service delivery by orchestrating resources across edge nodes and cloud data centres. These environments consist of heterogeneous, interconnected computing nodes that cooperate to deliver continuous services. However, their scale and complexity increase their vulnerability to failures arising from hardware malfunctions, software defects, and dynamic operating conditions. Such failures can disrupt system configurations and service execution, leading to reduced reliability, performance degradation, and violations of service-level objectives. Maintaining uninterrupted service execution therefore requires adaptive service placement strategies across edge-cloud resources. This study introduces EES-CND (Enhanced Evolution Strategy for Collaborative Neural Decision-making), a fault-tolerant service placement approach for edge-cloud environments. The method employs collaborative decision-making, in which multiple lightweight neural networks jointly infer redeployment strategies during failure events. To cope with system dynamics and mitigate performance drift, the models are updated online using an enhanced evolution strategy. Extensive simulations show that EES-CND effectively handles performance drift and significantly outperforms existing methods in service recovery time, response time, and reliability, achieving a 44.8% reduction in fault-tolerance cost compared to standalone models.
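The abstract names two mechanisms without giving their details: several lightweight networks jointly choosing a redeployment target, and online model updates via an evolution strategy. The following is not EES-CND; it is a minimal sketch under assumptions of our own. Each member network (hypothetical `TinyNet`) is a one-hidden-layer scorer over a made-up per-node feature vector, "collaboration" is plain score averaging (hypothetical `collaborative_choice`), and the online update is a standard antithetic-sampling evolution strategy (hypothetical `es_step`), not the paper's enhanced variant. All names, features, and hyperparameters are illustrative.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


class TinyNet:
    """Hypothetical lightweight scorer: one tanh hidden layer, scalar output.

    Maps a candidate node's feature vector to a placement score (higher is better).
    """

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.n_in, self.n_hidden = n_in, n_hidden
        # Flat parameter vector: W1 (n_hidden*n_in), b1 (n_hidden), w2 (n_hidden), b2 (1).
        size = n_hidden * n_in + 2 * n_hidden + 1
        self.params: List[float] = [rng.gauss(0.0, 0.5) for _ in range(size)]

    def score(self, x: Sequence[float], params: Optional[Sequence[float]] = None) -> float:
        # Evaluate with the stored parameters, or with a perturbed copy (used by ES).
        p = self.params if params is None else params
        n_in, h = self.n_in, self.n_hidden
        w1 = p[: h * n_in]
        b1 = p[h * n_in : h * n_in + h]
        w2 = p[h * n_in + h : h * n_in + 2 * h]
        b2 = p[-1]
        hidden = [
            math.tanh(sum(w1[j * n_in + i] * x[i] for i in range(n_in)) + b1[j])
            for j in range(h)
        ]
        return sum(w2[j] * hidden[j] for j in range(h)) + b2


def collaborative_choice(
    nets: Sequence[TinyNet], node_features: Sequence[Sequence[float]], healthy: Sequence[int]
) -> int:
    """Average every member's score per healthy node and return the best node index."""

    def mean_score(k: int) -> float:
        return sum(net.score(node_features[k]) for net in nets) / len(nets)

    return max(healthy, key=mean_score)


def es_step(
    net: TinyNet,
    cost_fn: Callable[[Sequence[float]], float],
    rng: random.Random,
    sigma: float = 0.1,
    lr: float = 0.02,
    pairs: int = 8,
) -> None:
    """One plain ES step with antithetic Gaussian perturbations.

    Estimates the gradient of cost_fn from paired evaluations at theta +/- sigma*eps
    and moves the parameters a small step downhill; no backpropagation is needed.
    """
    theta = net.params
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = cost_fn(plus) - cost_fn(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e / (2.0 * sigma * pairs)
    net.params = [t - lr * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    nets = [TinyNet(n_in=3, n_hidden=4, rng=rng) for _ in range(3)]
    # Hypothetical per-node features: [CPU headroom, 1 - normalised latency, 1 - failure rate].
    nodes = [[0.6, 0.8, 0.9], [0.9, 0.3, 0.7], [0.5, 0.9, 0.2], [0.4, 0.6, 0.95]]
    healthy = [0, 1, 3]  # node 2 has just failed
    print("redeploy to node", collaborative_choice(nets, nodes, healthy))

    # Simulated drift: member 0 should now give node 0 a score of 3.0.
    target, x = 3.0, nodes[0]

    def cost(p: Sequence[float]) -> float:
        return (nets[0].score(x, p) - target) ** 2

    before = cost(nets[0].params)
    for _ in range(200):
        es_step(nets[0], cost, rng)
    print(f"drift cost {before:.3f} -> {cost(nets[0].params):.3f}")
```

Averaging is only one way to combine members; weighted voting, or a majority vote over each member's top-ranked node, would also fit a loose reading of "jointly infer". A plain evolution strategy needs only cost evaluations rather than gradients, which makes it usable when the cost comes from a simulator or from live measurements.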