DataGuard: Guaranteeing Private Training in Systolic-array Based Accelerators

2026-06-15 · Hardware Architecture

AI summary

The authors explain that federated learning keeps your data on your device while a model is trained, and that differential privacy adds noise to protect information about individuals. However, current deployments trust a third-party app to implement these privacy protections correctly, which risks leaking data. To address this, the authors propose DataGuard, a hardware mechanism that ensures only privacy-safe data leaves the device, without having to trust the app. In their evaluation, DataGuard adds very little hardware area (under 0.01%) and slows performance only slightly (under 0.3%).

Differential Privacy, Federated Learning, Privacy Budget, Machine Learning, Gradient Clipping, Noise Addition, Hardware Security, Data Leakage, Privacy Preservation, Model Training
Authors
Pawan Kumar Sanjaya, Christina Giannoula, Nikhil Shreekumar, Ian Colbert, Alec Dewulf, Mehdi Saeedi, Ihab Amer, Gabor Sines, Nandita Vijaykumar
Abstract
Differential privacy (DP) and federated learning (FL) have emerged as important privacy-preserving approaches for training machine learning (ML) models on sensitive data. FL ensures that raw sensitive data does not leave the users' devices by training the model locally on each device. DP limits how much the trained model can reveal about any individual by clipping the gradients and adding noise to them before updating the model. It also provides a formalism for bounding the privacy loss during training by a privacy budget determined a priori by the owner of the sensitive data. However, real-world deployments of FL algorithms typically assume that a third-party FL application can be trusted to implement DP algorithms correctly. As a result, the third-party application is given full access to the sensitive data. In this work, we propose DataGuard, a hardware-based mechanism that guarantees that the only data that can leave the device is the result of computation that meets DP requirements. DataGuard can thus ensure that the privacy budget defined by the data owner is not exceeded during FL training, without the need to trust a third-party application. We evaluate DataGuard in simulations of four accelerators running various ML models and demonstrate area overheads of less than 0.01% and performance slowdowns of less than 0.3%.
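To make the clipping, noise, and budget steps concrete, here is a minimal software-only sketch in Python. It clips each per-example gradient to an L2 norm bound, sums the clipped gradients, adds Gaussian noise scaled to that bound, and averages, as in DP-SGD. A gate then withholds any update that would push the cumulative privacy loss past a budget fixed in advance. The names `clip_gradient`, `dp_average_gradient`, and `PrivacyBudgetGate`, the caller-supplied per-step epsilon, and the use of basic sequential composition (epsilons simply add up, a valid but loose bound) are all illustrative assumptions. This is not DataGuard's mechanism, which enforces the guarantee in accelerator hardware rather than in application code.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads: Sequence[Sequence[float]],
                        clip_norm: float,
                        noise_multiplier: float,
                        rng: random.Random) -> List[float]:
    """DP-SGD-style step: clip each example, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    # Clipping bounds each example's contribution to clip_norm, so the noise
    # standard deviation is scaled by that same bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


class PrivacyBudgetGate:
    """Hypothetical gate that withholds updates once the budget is spent.

    Assumes basic sequential composition (per-step epsilons add up), an upper
    bound that is looser than the accountants used in practice.
    """

    def __init__(self, epsilon_budget: float):
        self.epsilon_budget = epsilon_budget
        self.epsilon_spent = 0.0

    def release(self, update: List[float], step_epsilon: float) -> List[float]:
        if self.epsilon_spent + step_epsilon > self.epsilon_budget:
            raise PermissionError("privacy budget exhausted; update withheld")
        self.epsilon_spent += step_epsilon
        return update


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
    gate = PrivacyBudgetGate(epsilon_budget=1.0)
    for step in range(3):
        update = dp_average_gradient(grads, clip_norm=1.0,
                                     noise_multiplier=1.1, rng=rng)
        try:
            # step_epsilon is an assumed input, not derived from the noise here.
            gate.release(update, step_epsilon=0.5)
            print(f"step {step}: released")
        except PermissionError as err:
            print(f"step {step}: {err}")
```

With a budget of 1.0 and an assumed cost of 0.5 per step, the first two updates are released and the third is withheld. The point of the sketch is the division of responsibility. If the gate lives in untrusted application code, nothing stops the application from skipping it, which is the trust gap the abstract says DataGuard closes in hardware.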