SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

2026-06-03 • Cryptography and Security

Cryptography and Security, Artificial Intelligence

AI summary

The authors created SharedRequest, a way to keep prompts private when people use large language models like ChatGPT. Instead of hiding each prompt separately, they mix real prompts with noisy ones and group similar instructions together, which helps keep information safe while saving time and cost. This method works with any language model without needing to change how the model is built. Their tests showed it keeps responses useful and reduces the cost of running many queries at once.

large language models, privacy-preserving inference, batch processing, differential privacy, prompt privacy, model-agnostic, noisy data, query cost, ChatGPT

Authors

Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du, Wei Liu, Yan Pang

Abstract

With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting user prompt privacy has become an increasingly critical issue. Existing privacy-preserving inference methods sacrifice either utility or efficiency, and often require model-specific modifications that limit their compatibility. In this paper, we propose SharedRequest, a model-agnostic framework for privacy-preserving LLM inference that reformulates privacy protection at the batch level rather than the individual-prompt level. The key idea is to obscure sensitive information by mixing original prompts with noisy variants, while grouping semantically equivalent instructions to amortize the inference cost over a large batch of queries with minimal impact on LLM response quality. This design is independent of the LLM architecture and requires neither access to model parameters nor architectural modification. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility than prior differential privacy baselines, and that its shared-prompt mechanism reduces query cost by up to $5\times$ relative to non-batched inference.
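
The batch-level idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names (`normalize_instruction`, `make_decoys`, `build_shared_requests`), the string normalization that stands in for grouping semantically equivalent instructions, and the random token replacement that stands in for noisy variants. In particular, the noise here is not a calibrated differential privacy mechanism and carries no formal guarantee.

```python
import random
from collections import defaultdict


def normalize_instruction(instruction: str) -> str:
    """Hypothetical stand-in for semantic grouping: lowercase, collapse
    whitespace, and drop trailing punctuation."""
    return " ".join(instruction.lower().split()).rstrip(".:!?")


def make_decoys(text, vocab, n_decoys, rng, replace_prob=0.5):
    """Build noisy variants by swapping each token for a random vocabulary
    word with probability replace_prob (illustrative noise only)."""
    decoys = []
    for _ in range(n_decoys):
        tokens = [
            rng.choice(vocab) if rng.random() < replace_prob else tok
            for tok in text.split()
        ]
        decoys.append(" ".join(tokens))
    return decoys


def build_shared_requests(queries, vocab, n_decoys=2, seed=0):
    """Group (user_id, instruction, text) queries by normalized instruction,
    mix each real text with decoys, and emit one request per group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for user_id, instruction, text in queries:
        key = normalize_instruction(instruction)
        groups[key].append((user_id, text))
        for decoy in make_decoys(text, vocab, n_decoys, rng):
            groups[key].append((None, decoy))  # None marks a decoy

    requests = []
    for key, items in groups.items():
        rng.shuffle(items)  # hide which slots hold real inputs
        body = "\n".join(f"[{i}] {text}" for i, (_, text) in enumerate(items, 1))
        # Slot -> user map stays on the client side; it is never sent out.
        routing = {i: uid for i, (uid, _) in enumerate(items, 1) if uid is not None}
        requests.append({
            "instruction": key,
            "prompt": f"{key}\n\n{body}",
            "routing": routing,
            "n_items": len(items),
        })
    return requests


if __name__ == "__main__":
    demo = [
        ("u1", "Summarize:", "patient has diabetes"),
        ("u2", "summarize", "meeting moved to friday"),
        ("u3", "Translate to French.", "good morning"),
    ]
    for req in build_shared_requests(demo, ["alpha", "beta", "gamma"]):
        print(req["prompt"], "\n---")
```

In this sketch the instruction text is sent once per group instead of once per query, which is the kind of amortization the abstract attributes to the shared-prompt mechanism. The client uses the private `routing` map to return each numbered response to the right user and to discard the responses to decoys.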
