AI summary
The authors study how data about the same person, spread across k separate databases (silos) that each apply differential privacy, can still be combined to identify that person. They introduce a privacy notion called cross-silo person-level differential privacy (XSP-DP) to model this setting. Their results show a sharp threshold in the number of silos, k* = Theta(log n / epsilon^2) for population size n and per-silo privacy parameter epsilon: well below it any estimator fails, and well above it a maximum-likelihood attack succeeds. They also show that outputs which are individually uninformative about the target can leak information once combined, and that when silos apply binary randomized response without coordinating, de-anonymization is inevitable past the threshold, so cross-silo coordination is necessary. This gives a theoretical baseline for inference risk across multiple locally private data sources.
differential privacy, data silo, cross-silo person-level privacy, local differential privacy, randomized response, mutual information, de-anonymization, phase transition, Fano's inequality, maximum-likelihood estimation
Abstract
When a person's records appear in k independent data silos, each protected by (epsilon, delta)-differential privacy, standard composition yields a valid (k*epsilon, k*delta)-DP guarantee for the joint output. This worst-case bound, however, does not answer the concrete inference question: at what k can an adversary actually identify a target person? This paper develops the information-theoretic framework needed to answer that question. We introduce cross-silo person-level DP (XSP-DP), a Pufferfish-style privacy notion whose adjacency relation captures all records of a single person across all silos simultaneously, and verify that the standard basic composition bound carries over to this adjacency model. Within this framework we prove that de-anonymization undergoes a phase transition at k* = Theta(log n / epsilon^2), where n is the population size and epsilon is the per-silo randomized-response (RR) parameter: a Fano lower bound shows any estimator fails for k << k*, while a matching maximum-likelihood upper bound shows the attack succeeds for k >> k*. An explicit XOR + randomized-response construction demonstrates information synergy: each silo's output is individually uninformative about the target, yet the joint mutual information is strictly positive. For non-coordinated binary randomized-response mechanisms, we prove that de-anonymization is inevitable once k exceeds the threshold, establishing that cross-silo coordination is necessary. These results provide a baseline threat model and Theta-level threshold for cross-silo inference attacks under local DP.
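The synergy claim can be illustrated with a small exact calculation. The sketch below is a hypothetical two-silo toy model, not necessarily the paper's construction: it assumes a uniform secret bit s, silo 1 holding a uniform share x1, silo 2 holding x2 = s XOR x1, and each silo releasing its share through binary randomized response with flip probability 1 / (e^epsilon + 1). The function names are illustrative. Under those assumptions it computes I(S; Y1), I(S; Y2), and I(S; Y1, Y2) by enumeration.

```python
import math
from itertools import product


def flip_prob(epsilon):
    """Flip probability of binary randomized response at privacy level epsilon."""
    return 1.0 / (math.exp(epsilon) + 1.0)


def joint_distribution(epsilon):
    """Exact pmf of (s, y1, y2) in the assumed two-silo XOR toy model."""
    q = flip_prob(epsilon)
    pmf = {}
    # Enumerate the secret s, the share x1, and the two RR noise bits n1, n2.
    for s, x1, n1, n2 in product((0, 1), repeat=4):
        x2 = s ^ x1                # silo 2 holds the complementary XOR share
        y1, y2 = x1 ^ n1, x2 ^ n2  # each silo flips its bit with probability q
        p = 0.25 * (q if n1 else 1 - q) * (q if n2 else 1 - q)
        key = (s, y1, y2)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf


def mutual_information(pmf, observed):
    """I(S; Y_observed) in bits; `observed` indexes into the pair (y1, y2)."""
    joint, ps, py = {}, {}, {}
    for key, p in pmf.items():
        s, ys = key[0], key[1:]
        y = tuple(ys[i] for i in observed)
        joint[(s, y)] = joint.get((s, y), 0.0) + p
        ps[s] = ps.get(s, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (ps[s] * py[y]))
        for (s, y), p in joint.items()
        if p > 0
    )


if __name__ == "__main__":
    pmf = joint_distribution(epsilon=1.0)
    print("I(S; Y1)     =", mutual_information(pmf, (0,)))
    print("I(S; Y2)     =", mutual_information(pmf, (1,)))
    print("I(S; Y1, Y2) =", mutual_information(pmf, (0, 1)))
```

In this toy model each share is uniform and independent of s on its own, so the single-silo mutual informations are zero up to floating-point rounding, while Y1 XOR Y2 equals s passed through a binary symmetric channel with crossover r = 2q(1 - q), so the joint mutual information is 1 - h(r) bits, which is positive for any epsilon > 0.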