Fine-Tuning Over Architectural Complexity: Broad-Coverage PII Detection on PIIBench with DeBERTa

2026-05-25 | Computation and Language

Computation and Language, Artificial Intelligence
AI summary

The authors studied how well different methods detect personal information across diverse types of text. They tested three DeBERTa-based approaches on a corrected multi-source dataset covering 82 types of personal data. Surprisingly, the simplest approach, fine-tuning the model directly, worked better overall than more complex approaches that added source conditioning or a multi-phase training curriculum. Their results suggest that varied training data and a straightforward training setup matter more than elaborate model designs for finding personal information in different texts. The directly fine-tuned model scored higher on 54 of the 82 data types and clearly outperformed the eight published systems it was compared against.

Personally Identifiable Information (PII), DeBERTa, Token Classification, Fine-tuning, Cross-entropy loss, Entity Recognition, Multi-source Dataset, F1 Score, Hierarchical Model, Curriculum Learning
Authors
Pritesh Jha
Abstract
Personally identifiable information (PII) detection systems are frequently trained within narrow source or domain boundaries, limiting coverage when deployed on heterogeneous text. We study model fine-tuning on a corrected multi-source PIIBench preparation spanning 82 retained entity types across ten source datasets. We evaluate three DeBERTa-based approaches: direct token classification fine-tuning, a source-conditioned hierarchical model (SC+H), and a three-phase curriculum extension (SC+H+Curr). Against eight published comparator systems on a reproducible 5,000-record held-out subset (test_5k), direct fine-tuned DeBERTa achieves F1 0.6476, while SC+H and the curriculum variant achieve 0.5899 and 0.2772 respectively; the strongest published comparator reaches only 0.1723. Because validation initially favoured SC+H, we perform a final streamed evaluation on the complete 100,002-record held-out split. Direct fine-tuning remains superior, achieving F1 0.6455 versus 0.5894 for SC+H. Entity-level analysis shows that direct fine-tuning wins 54 of 82 fine entity types and all ten coarse groups by support-weighted entity F1, while SC+H retains localised advantages on 28 types. The results indicate that diverse task-specific training data and a simple weighted cross-entropy objective contribute more to broad-coverage PII detection than the tested architectural and curriculum complexity.
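
The coarse-group comparison in the abstract relies on support-weighted entity F1. The abstract does not spell out the exact computation, so the following is only a minimal sketch under a common reading: each fine entity type's F1 is weighted by its number of gold entities (its support). The function names, the `contact_group` example, and all counts are hypothetical illustrations, not values or code from the paper.

```python
from typing import Dict, Tuple

# Assumed input format: per-type (true_positives, false_positives, false_negatives).
Counts = Tuple[int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Entity-level F1 for one type; 0.0 when the type has no gold or predicted entities."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def support_weighted_f1(counts: Dict[str, Counts]) -> float:
    """Average per-type F1, weighting each type by its gold support (tp + fn)."""
    total_support = sum(tp + fn for tp, _, fn in counts.values())
    if total_support == 0:
        return 0.0
    weighted = sum(
        (tp + fn) * f1_from_counts(tp, fp, fn) for tp, fp, fn in counts.values()
    )
    return weighted / total_support


# Made-up counts for a hypothetical coarse group with two fine types.
contact_group = {
    "EMAIL": (90, 10, 10),  # support 100
    "PHONE": (10, 10, 30),  # support 40
}
score = support_weighted_f1(contact_group)
```

Under this weighting, frequent types dominate a group's score, so a model can win a coarse group while still losing some of its rarer fine types. That is consistent with SC+H keeping localised advantages on 28 types even though direct fine-tuning wins all ten groups.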