AI summary
The authors studied how vision-language models handle gender bias when shown gender-ambiguous images, such as people in full work gear or seen from behind. They found that even when these models internally encode a female association, they often output male, including for strongly female-stereotyped occupations. To understand this, the authors created a new metric (LALS) to look inside the models and see how gender associations change across layers. They discovered that male signals get stronger toward the output, while female signals peak mid-network and are suppressed before generation; visual cues such as clothing color also shift these internal associations. This shows a hidden bias inside the models that does not always match their final answers.
vision-language models, gender bias, latent association, LALS metric, token activations, internal representations, layer-wise analysis, occupation stereotypes, zero-shot evaluation, color ablation
Authors
Arnau Marin-Llobet, Simon Henniger, Mahzarin R. Banaji
Abstract
Alignment teaches vision-language models (VLMs) to avoid expressing demographic biases, and when gender is clearly visible they largely succeed. Far less is known about ambiguous inputs (a worker in full gear, a figure seen from behind), cases that are common in practice yet rarely studied. We find that minimal prompting pressure exposes occupation-gender defaults on ambiguous input images, with models collapsing to male even for strongly female-stereotyped occupations. But do these outputs reflect what models actually encode internally? We introduce LALS (Latent Association Leaning Score), a zero-shot metric that projects visual-token activations into the model's text-embedding space to measure concept associations per token and layer. Across 15 occupations, over 800 gender-ambiguous images, and four VLMs, internal representations and outputs are systematically decoupled: models often encode a female association internally yet output male. Layer-wise analysis reveals an asymmetric filter: male signal amplifies end-to-end, while female signal peaks mid-network and is suppressed before generation. A color ablation shows that culturally loaded visual cues such as clothing color further modulate these internal associations.
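The abstract does not give the LALS formula, so the following is only a minimal sketch of what a per-token, per-layer leaning score could look like, not the authors' definition. It assumes the visual-token activations have already been mapped into the text-embedding space (the projection step itself is omitted), that association is measured with cosine similarity, and that the score is the mean similarity to female concept words minus the mean similarity to male concept words. The names `cosine`, `leaning_scores`, `female_emb`, and `male_emb` are hypothetical, and the toy vectors are made up for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def leaning_scores(visual_acts, female_emb, male_emb):
    """Hypothetical leaning score per layer and visual token.

    visual_acts: (layers, tokens, dim) activations, ASSUMED to already be
                 expressed in the text-embedding space.
    female_emb, male_emb: (n_words, dim) text embeddings of concept words.
    Returns an array of shape (layers, tokens):
    values > 0 lean female, values < 0 lean male.
    """
    layers, tokens, dim = visual_acts.shape
    flat = visual_acts.reshape(-1, dim)
    female_sim = cosine(flat, female_emb).mean(axis=1)
    male_sim = cosine(flat, male_emb).mean(axis=1)
    return (female_sim - male_sim).reshape(layers, tokens)

# Toy example: one "female" direction and one "male" direction in 4 dims.
female_emb = np.array([[1.0, 0.0, 0.0, 0.0]])
male_emb = np.array([[0.0, 1.0, 0.0, 0.0]])

# Two layers, two visual tokens each (illustrative values only).
acts = np.array([
    [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]],  # layer 0
    [[0.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],  # layer 1
])

scores = leaning_scores(acts, female_emb, male_emb)
per_layer = scores.mean(axis=1)  # average leaning across tokens, per layer
```

Averaging `scores` over tokens, as in `per_layer`, gives one value per layer, which is the kind of layer-wise profile in which a mid-network female peak and a late male amplification would be visible.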