AuAu: A Benchmark for Auditing Authoritarian Alignment in Large Language Models

2026-06-15 | Computation and Language

Computation and Language, Artificial Intelligence, Machine Learning
AI summary

The authors created AuAu, a benchmark that checks whether large language models (LLMs) exhibit or promote authoritarian attitudes. It combines three types of tests: psychology questionnaires, scenario-based behavior checks, and realistic user prompts. They tested 17 models from China, the EU, Russia, and the USA. All of the models gave substantial rates of authoritarian responses on the psychology questions, but these rates dropped on more realistic tasks. The authors also showed that an authoritarian system prompt easily pushed 15 of the 17 models toward more authoritarian responses. They stress the importance of regularly auditing LLMs to reduce unwanted authoritarian outputs.

large language models, authoritarianism, benchmark, psychometrics, contextual vignettes, system prompts, behavior evaluation, model auditing, response generation
Authors
Andreas Einwiller, Max Klabunde, Florian Lemmerich
Abstract
The worldwide surge of authoritarianism, combined with the increasingly central role of large language models (LLMs) in users' everyday lives, raises the question of to what extent specific models exhibit or promote authoritarian attitudes and characteristics. We introduce AuAu, a comprehensive benchmark that assesses the risk of LLMs generating responses with authoritarian tendencies. The benchmark combines three evaluation approaches: (i) psychometric questions from an extensive pool of 15 human-validated instruments; (ii) contextual behavior vignettes probing intended actions in concrete situations; and (iii) responses to realistic user prompts. Unlike prior work, AuAu evaluates not only general closeness to authoritarianism but also the established sub-concepts Authoritarian Aggression, Authoritarian Submission, and Conventionalism. Evaluating 17 models from China, the EU, Russia, and the USA, we find that all tested models exhibit substantial authoritarian response rates under the psychometric evaluation, though these rates drop significantly in increasingly realistic downstream tasks. We further find that an authoritarian system prompt easily manipulates 15 out of 17 models into promoting increased authoritarianism. Our results underscore the need for continued, systematic auditing of LLM-based AI systems to detect and ultimately mitigate undesired authoritarian tendencies in generated output. Our code and data are available at: https://github.com/andreaseinwiller/AuAu
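The abstract reports results as authoritarian response rates, broken down by sub-concept, and compares models with and without an authoritarian system prompt. The following is a minimal sketch of that kind of aggregation. It assumes that each model response has already been labeled as authoritarian or not. `ScoredResponse`, `response_rates`, `overall_rate`, and `manipulated_models` are hypothetical names, not the API of the AuAu repository, and the benchmark's actual scoring of psychometric items, vignettes, and user prompts may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

# Sub-concepts named in the abstract.
SUBCONCEPTS = ("Authoritarian Aggression", "Authoritarian Submission", "Conventionalism")


@dataclass(frozen=True)
class ScoredResponse:
    # Hypothetical record: one model's response to one benchmark item.
    model: str
    subconcept: str       # must be one of SUBCONCEPTS
    authoritarian: bool   # assumed binary label, produced upstream


def response_rates(responses):
    """Share of authoritarian responses per (model, sub-concept) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [authoritarian count, total]
    for r in responses:
        if r.subconcept not in SUBCONCEPTS:
            raise ValueError(f"unknown sub-concept: {r.subconcept}")
        c = counts[(r.model, r.subconcept)]
        c[0] += int(r.authoritarian)
        c[1] += 1
    return {key: a / n for key, (a, n) in counts.items()}


def overall_rate(responses, model):
    """Share of authoritarian responses for one model across all items."""
    rs = [r for r in responses if r.model == model]
    return sum(r.authoritarian for r in rs) / len(rs)


def manipulated_models(baseline, prompted, margin=0.0):
    """Models whose overall rate rises by more than `margin` under the
    authoritarian system prompt. Assumes both lists cover the same models."""
    models = {r.model for r in baseline}
    return sorted(
        m for m in models
        if overall_rate(prompted, m) > overall_rate(baseline, m) + margin
    )


# Toy data (invented for illustration, not benchmark results).
baseline = [
    ScoredResponse("model-a", "Conventionalism", True),
    ScoredResponse("model-a", "Conventionalism", False),
    ScoredResponse("model-b", "Authoritarian Submission", False),
]
prompted = [
    ScoredResponse("model-a", "Conventionalism", True),
    ScoredResponse("model-a", "Conventionalism", True),
    ScoredResponse("model-b", "Authoritarian Submission", False),
]

print(response_rates(baseline))
print(manipulated_models(baseline, prompted))
```

In this toy data, `model-a` moves from a 0.5 to a 1.0 authoritarian rate under the prompt and is flagged, while `model-b` stays at 0.0 and is not. The `margin` parameter is an assumed knob for requiring a minimum increase. A real audit would more likely use a statistical test than a fixed threshold.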