CANS: Accelerating Multiuser Collaborative Edge Inference via Cooperative Autodidactic NeuroSurgeon

2026-06-08 · Machine Learning

Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
AI summary

The authors study how multiple mobile devices can work with a shared edge server to run deep learning models faster despite changing network and device conditions. Their system, Cooperative Autodidactic NeuroSurgeon (CANS), lets devices share feedback during online inference and learn where best to split each model between the device and the server. To make this learning more efficient, they also develop an algorithm, FedLinUCB-DW, that groups devices of the same type and warm-starts exploration from offline inference experience. Their tests show that CANS achieves lower inference latency than other methods; in prototype experiments on two edge devices, it cuts average inference latency by up to 50% relative to a non-cooperative baseline. This could mean quicker AI responses on phones and other resource-constrained devices.

Mobile edge computing; Deep neural network inference; Model partitioning; Wireless networks; Device heterogeneity; Federated learning; Online learning; Inference latency; Regret bound; Edge server
Authors
Zheshun Wu, Ziyang Zhang, Changyao Lin, Zenglin Xu, Jie Liu
Abstract
Recently, mobile edge computing (MEC)-enabled collaborative deep neural network (DNN) inference has emerged as a promising approach for delivering intelligent services to resource-constrained mobile devices. A representative scenario is multi-user collaborative edge inference, where distinct devices independently partition their DNN models and offload backend computation to a common edge server over wireless networks. However, determining the optimal DNN partition for each device is challenging because system conditions, including fluctuating wireless links and diverse device capabilities, are unknown and time-varying. To address this problem, we propose Cooperative Autodidactic NeuroSurgeon (CANS), a collaborative edge inference framework that enables devices to adaptively learn optimal DNN partitions by sharing informative feedback during online inference. To handle device heterogeneity and better leverage offline inference experience, we integrate a novel FedLinUCB-DW algorithm that groups devices of the same type and warm-starts online exploration using local offline early-exit inference experience. Furthermore, we provide theoretical guarantees for FedLinUCB-DW by deriving an upper bound on its regret. We also validate our method in both a simulated environment and a hardware prototype system. Empirical evaluations demonstrate that CANS achieves lower inference latency than state-of-the-art baselines. In particular, in prototype experiments on two edge devices, CANS reduces average inference latency by up to 50% compared to the non-cooperative baseline.
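The abstract does not spell out FedLinUCB-DW, so the following is only a minimal sketch of the general idea, using a plain LinUCB-style learner rather than the authors' algorithm. It assumes latency is roughly linear in three hypothetical per-partition features (device-side compute, transmitted data, server-side compute), that devices of the same type update one shared learner, and that a warm start amounts to replaying offline measurements. The class name `LinUCBPartitioner`, the feature values, and the cost constants are all illustrative and not taken from the paper.

```python
import math


class LinUCBPartitioner:
    """LinUCB-style learner that models latency as theta . x (an assumption)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration weight
        # Store A^{-1} directly, where A = ridge * I + sum of x x^T.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of latency * x

    def _matvec(self, v):
        # Multiply the stored A^{-1} by vector v.
        return [sum(row[j] * v[j] for j in range(self.dim)) for row in self.A_inv]

    def update(self, x, latency):
        """Fold one (features, observed latency) pair into the shared statistics."""
        Ax = self._matvec(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, Ax))
        # Sherman-Morrison rank-one update of A^{-1}.
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[i] += latency * x[i]

    def theta(self):
        """Ridge-regression estimate of the per-unit costs."""
        return self._matvec(self.b)

    def select(self, contexts):
        """Return the partition index with the lowest optimistic latency."""
        theta = self.theta()
        best_arm, best_score = 0, math.inf
        for arm, x in enumerate(contexts):
            mean = sum(t * xi for t, xi in zip(theta, x))
            Ax = self._matvec(x)
            quad = sum(xi * ai for xi, ai in zip(x, Ax))
            width = self.alpha * math.sqrt(max(quad, 0.0))
            score = mean - width  # lower confidence bound: we minimize latency
            if score < best_score:
                best_arm, best_score = arm, score
        return best_arm


# Hypothetical features per partition point:
# [device-side compute, transmitted data, server-side compute], arbitrary units.
CONTEXTS = [
    [0.0, 4.0, 1.0],  # offload everything
    [0.3, 2.0, 0.7],
    [0.6, 0.5, 0.4],
    [1.0, 0.0, 0.0],  # run everything locally
]


def simulated_latency(x, costs=(2.0, 0.5, 0.2)):
    """Stand-in for a measured latency; the costs are made-up, noise-free values."""
    return sum(c * xi for c, xi in zip(costs, x))


if __name__ == "__main__":
    # One learner per device type; devices of that type all update it.
    shared = {"phone": LinUCBPartitioner(dim=3, alpha=0.5, ridge=1e-3)}
    learner = shared["phone"]
    # Warm start: replay offline measurements before going online.
    for x in CONTEXTS:
        learner.update(x, simulated_latency(x))
    # Online rounds: two same-type devices take turns and share feedback.
    for round_idx in range(20):
        device = ("phone-A", "phone-B")[round_idx % 2]
        arm = learner.select(CONTEXTS)
        learner.update(CONTEXTS[arm], simulated_latency(CONTEXTS[arm]))
        print(device, "chose partition", arm)
```

Keeping a single learner per device type is one simple way to let same-type devices pool their feedback; a real deployment would more likely exchange summary statistics through the edge server and would have to cope with noisy, time-varying measurements.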