Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions

2026-05-11

Computation and Language · Artificial Intelligence
AI summary

The authors study how large language models (LLMs) often claim to follow certain values yet act differently, a discrepancy known as the 'value-action gap.' They identify a failure mode called 'Pseudo-Deliberation,' in which models appear to reason thoughtfully about their values but do not behave accordingly. To analyze this systematically, they built VALDI, a framework with thousands of human-centered scenarios, tasks, and metrics for measuring how well models adhere to their stated values in dialogue. They find that both proprietary and open-source models consistently misalign their words and actions, and propose VIVALDI, a multi-agent system that checks and repairs value alignment during response generation.

large language models · value-action gap · pseudo-deliberation · model alignment · VALDI framework · value adherence metrics · multi-agent system · VIVALDI · behavioral alignment · dialogue generation
Authors
Sushrita Rakshit, Hanwen Zhang, Hua Shen
Abstract
Large language models (LLMs) are often evaluated on their stated values, yet these values do not reliably translate into their actions, a discrepancy termed the "value-action gap." In this work, we argue that this gap persists even under explicit reasoning, revealing a deeper failure mode we call "Pseudo-Deliberation": the appearance of principled reasoning without corresponding behavioral alignment. To study this systematically, we introduce VALDI, a framework for measuring alignment between stated values and generated dialogue. VALDI includes 4,941 human-centered scenarios across five domains; three tasks that elicit value articulation, reasoning, and action; and five metrics for quantifying value adherence. Across both proprietary and open-source LLMs, we observe consistent misalignment between expressed values and downstream dialogues. To investigate intervention strategies, we propose VIVALDI, a multi-agent value auditor that intervenes at different stages of generation.
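
To make the intervention idea concrete, below is a minimal Python sketch of a generate/audit/revise loop in the spirit of a multi-agent value auditor. The `ChatFn` interface, agent prompts, and revision policy are illustrative assumptions for exposition, not the paper's actual VIVALDI implementation or metrics.

```python
# Minimal sketch of a generate -> audit -> revise loop, assuming a
# VIVALDI-style auditor that intervenes after draft generation.
# All names, prompts, and the ChatFn interface are illustrative.

from dataclasses import dataclass
from typing import Callable

# Any chat backend: (system_prompt, user_prompt) -> model reply.
ChatFn = Callable[[str, str], str]


@dataclass
class Audit:
    adheres: bool   # did the draft act on the stated value?
    critique: str   # auditor feedback fed back into revision


def audit_draft(chat: ChatFn, value: str, draft: str) -> Audit:
    # Auditor agent: judge whether the draft *acts on* the value,
    # rather than merely restating it (pseudo-deliberation).
    verdict = chat(
        "You are a value auditor. Reply 'PASS' or 'FAIL: <reason>'.",
        f"Stated value: {value}\nDraft response: {draft}",
    )
    return Audit(verdict.startswith("PASS"), verdict)


def generate_with_audit(chat: ChatFn, scenario: str, value: str,
                        max_revisions: int = 2) -> str:
    # Generator agent drafts a reply; the auditor checks adherence;
    # failing drafts are revised with the critique in context.
    draft = chat("Respond helpfully to the scenario.", scenario)
    for _ in range(max_revisions):
        report = audit_draft(chat, value, draft)
        if report.adheres:
            break
        draft = chat(
            "Revise the draft so it acts on the stated value.",
            f"Scenario: {scenario}\nValue: {value}\n"
            f"Critique: {report.critique}\nDraft: {draft}",
        )
    return draft


if __name__ == "__main__":
    # Canned backend so the loop runs end to end without an API key.
    replies = iter(["I can't share details.", "FAIL: evasive, not honest",
                    "Here is the full picture...", "PASS"])
    demo_chat: ChatFn = lambda system, user: next(replies)
    print(generate_with_audit(demo_chat, "User asks about a known defect.",
                              "honesty"))
```

In this sketch the audit happens once per revision round, after a full draft exists; an auditor that intervenes at other stages of generation (e.g., during planning or mid-decoding) would hook the same check into earlier steps of the pipeline.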