The Art of Mixology: Mixup-based Obfuscation for Privacy-Preserving Split Learning in Large Language Models

2026-06-15 | Computation and Language

AI summary

The authors propose MIXGUARD, a split learning framework for fine-tuning large language models in which the user keeps raw data local and offloads the computation-intensive layers to a server. The method combines mixup-based obfuscation at the token and representation levels with adaptive gradient perturbation, so that useful learning signals are preserved while the server cannot recover private data. On four classification tasks and four text generation tasks across multiple model families, MIXGUARD keeps utility comparable to non-split training and offers stronger protection against data reconstruction attacks than existing split learning defenses. It also remains robust when attackers adapt to the defense.

Split learning, Large Language Models (LLMs), Privacy-preserving, Data reconstruction attacks, Mixup, Token-level obfuscation, Gradient perturbation, Fine-tuning, Model utility, Adaptive attacks
Authors
Chen Chen, Xiang Gao, Xianshun Wang, Chengran Li, Shengyu Xia, Xueluan Gong, Linru Zhang, Qian Wang, Kwok-Yan Lam
Abstract
Split learning provides a practical paradigm for resource-constrained users to train Large Language Models (LLMs) by offloading computation-intensive layers to a server while keeping raw data local. However, existing privacy-preserving split learning methods still face a difficult trade-off among utility, privacy, efficiency, and stability. Specifically, these methods often suffer from substantial utility degradation, remain vulnerable to advanced data reconstruction attacks, incur prohibitive computational and communication overhead, or exhibit unstable performance across different tasks. In this paper, we propose MIXGUARD, a novel mixup-based privacy-preserving split learning framework for LLMs. MIXGUARD introduces token-level obfuscation, representation-level obfuscation, and adaptive gradient perturbation mechanisms, which operate jointly to preserve useful learning signals while preventing privacy leakage to the server. Technically, MIXGUARD first constructs a lightweight calibration model on a public dataset to refine the approximated target representation, and then applies this model during privacy-preserving fine-tuning on private data. We conduct extensive experiments on four classification tasks and four text generation tasks across multiple LLM families, model sizes, architectures, and fine-tuning strategies. The results show that MIXGUARD preserves model utility comparable to non-split training baselines, consistently achieves stronger privacy protection than existing split learning defense methods against state-of-the-art data reconstruction attacks, and remains robust under adaptive attack settings.
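The abstract does not say how the token-level mixing is carried out. As a rough illustration of the general idea only, the sketch below applies standard mixup (a convex combination with a Beta-distributed coefficient) to token embeddings on the client before anything is sent to the server. It is not MIXGUARD's actual mechanism: the function names, the `decoy_pool` of public embeddings, the `alpha` value, and the choice to keep the private token dominant are all assumptions made for this example.

```python
import random


def mixup(private_vec, decoy_vec, lam):
    """Standard mixup: lam * private + (1 - lam) * decoy, element-wise."""
    if len(private_vec) != len(decoy_vec):
        raise ValueError("vectors must have the same dimension")
    return [lam * p + (1.0 - lam) * d for p, d in zip(private_vec, decoy_vec)]


def obfuscate_sequence(token_embeddings, decoy_pool, alpha=0.4, rng=None):
    """Mix each private token embedding with a randomly chosen decoy.

    Hypothetical client-side step: returns the mixed embeddings (what a
    client might send to the server) and the per-token coefficients
    (which would stay on the client).
    """
    rng = rng or random.Random()
    mixed, lams = [], []
    for emb in token_embeddings:
        lam = rng.betavariate(alpha, alpha)  # lam ~ Beta(alpha, alpha)
        lam = max(lam, 1.0 - lam)            # assumption: private token dominates
        decoy = rng.choice(decoy_pool)       # assumption: decoys come from public data
        mixed.append(mixup(emb, decoy, lam))
        lams.append(lam)
    return mixed, lams


if __name__ == "__main__":
    private = [[1.0, 2.0], [3.0, 4.0]]       # toy 2-dimensional embeddings
    decoys = [[0.0, 0.0], [5.0, 5.0]]
    sent, kept = obfuscate_sequence(private, decoys, rng=random.Random(0))
    print(sent, kept)
```

Forcing the coefficient to at least 0.5 is one simple way to trade privacy for utility in such a scheme: larger values keep more of the original signal but hide less. How MIXGUARD balances this, and how its representation-level obfuscation and gradient perturbation work, is not specified in the abstract.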