Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

2026-06-29 · Computation and Language

Computation and Language, Artificial Intelligence
AI summary

The authors studied how language models mix up grammatical gender (like masculine or feminine noun endings) with social gender bias in languages such as Spanish. They focused on separating these two issues in advanced language models that understand context, not just single words. To do this, they created special datasets using both controlled sentences and real Wikipedia texts featuring inanimate nouns. They also developed new methods and metrics to better measure and reduce the overlap between grammatical gender and social bias. Their findings show that certain methods, like using unweighted controlled contexts and a centroid estimator, work best to isolate grammatical gender from social meanings.

contextual language models, grammatical gender, social semantic bias, gender debiasing, static word embeddings, contextual embeddings, Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), centroid estimator, gender direction
Authors
Huanping Xiao, Yingji Li
Abstract
Contextual language models conflate grammatical gender and social semantic bias in gendered languages such as Spanish. Existing gender debiasing approaches operate only on static word embeddings, leaving contextual representations unexplored for this two-dimensional gender disentanglement. To address this issue, we make the first attempt to disentangle grammatical gender from semantic contamination in contextual embeddings. We construct both controlled templates and natural Wikipedia contexts to build balanced datasets of inanimate nouns, and design a framework equipped with centroid, Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) gender direction estimators as well as contamination-aware weighting strategies. A set of dual-objective evaluation metrics is proposed to balance the suppression of grammatical gender leakage on inanimate nouns and the preservation of semantic gender distinctions for occupation terms. The results reveal that unweighted controlled contexts yield the purest grammatical gender direction, and the centroid estimator achieves better performance than discriminative baselines.
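
As a rough illustration of what a centroid gender direction estimator might look like, the sketch below takes the difference between the mean embeddings of masculine and feminine nouns and normalizes it. Everything here is an assumption made for illustration: the function names `centroid_direction` and `remove_direction` are hypothetical, the data are synthetic Gaussian vectors rather than real contextual embeddings, and removing the direction by orthogonal projection is a common debiasing step that the abstract does not confirm as the authors' procedure.

```python
import numpy as np


def centroid_direction(masc, fem):
    """Unit vector pointing from the feminine centroid to the masculine centroid."""
    d = masc.mean(axis=0) - fem.mean(axis=0)
    return d / np.linalg.norm(d)


def remove_direction(X, d):
    """Subtract from each row of X its component along the unit vector d."""
    return X - np.outer(X @ d, d)


# Synthetic stand-in for embeddings of inanimate nouns (assumption, not real data):
# the two groups differ only by a shift along the first coordinate axis.
rng = np.random.default_rng(0)
dim = 16
true_axis = np.zeros(dim)
true_axis[0] = 1.0
masc = rng.normal(size=(200, dim)) + 2.0 * true_axis
fem = rng.normal(size=(200, dim)) - 2.0 * true_axis

d = centroid_direction(masc, fem)
cleaned = remove_direction(np.vstack([masc, fem]), d)

# Largest remaining component along d after projection (should be numerically zero).
residual = np.abs(cleaned @ d).max()
```

The SVM and LDA estimators named in the abstract would replace `centroid_direction` with the normal vector of a fitted linear classifier; the projection step would stay the same under this sketch's assumptions.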