Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions

2026-05-11

Computation and Language · Artificial Intelligence
AI summary

The authors study how large language models (LLMs) often claim to follow certain values yet act differently, a discrepancy known as the 'value-action gap.' They identify a failure mode called 'Pseudo-Deliberation,' in which models appear to reason thoughtfully about their values but do not behave accordingly. To analyze this systematically, they built VALDI, a framework with thousands of human-centered scenarios, tasks, and metrics for measuring how well models adhere to their stated values in dialogue. They find that both proprietary and open-source models consistently misalign their words and actions, and propose VIVALDI, a multi-agent system that checks and repairs value alignment during response generation.

large language models · value-action gap · pseudo-deliberation · model alignment · VALDI framework · value adherence metrics · multi-agent system · VIVALDI · behavioral alignment · dialogue generation
Authors
Sushrita Rakshit, Hanwen Zhang, Hua Shen
Abstract
Large language models (LLMs) are often evaluated on their stated values, yet these values do not reliably translate into their actions, a discrepancy termed the "value-action gap." In this work, we argue that this gap persists even under explicit reasoning, revealing a deeper failure mode we call "Pseudo-Deliberation": the appearance of principled reasoning without corresponding behavioral alignment. To study this systematically, we introduce VALDI, a framework for measuring alignment between stated values and generated dialogue. VALDI includes 4,941 human-centered scenarios across five domains; three tasks that elicit value articulation, reasoning, and action; and five metrics for quantifying value adherence. Across both proprietary and open-source LLMs, we observe consistent misalignment between expressed values and downstream dialogues. To investigate intervention strategies, we propose VIVALDI, a multi-agent value auditor that intervenes at different stages of generation.
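
To make the intervention idea concrete, below is a minimal Python sketch of a generate/audit/revise loop in the spirit of a multi-agent value auditor. The `ChatFn` interface, agent prompts, and revision policy are illustrative assumptions for exposition, not the paper's actual VIVALDI implementation or metrics.

```python
# Minimal sketch of a generate -> audit -> revise loop, assuming a
# VIVALDI-style auditor that intervenes after draft generation.
# All names, prompts, and the ChatFn interface are illustrative.

from dataclasses import dataclass
from typing import Callable

# Any chat backend: (system_prompt, user_prompt) -> model reply.
ChatFn = Callable[[str, str], str]


@dataclass
class Audit:
    adheres: bool   # did the draft act on the stated value?
    critique: str   # auditor feedback fed back into revision


def audit_draft(chat: ChatFn, value: str, draft: str) -> Audit:
    # Auditor agent: judge whether the draft *acts on* the value,
    # rather than merely restating it (pseudo-deliberation).
    verdict = chat(
        "You are a value auditor. Reply 'PASS' or 'FAIL: <reason>'.",
        f"Stated value: {value}\nDraft response: {draft}",
    )
    return Audit(verdict.startswith("PASS"), verdict)


def generate_with_audit(chat: ChatFn, scenario: str, value: str,
                        max_revisions: int = 2) -> str:
    # Generator agent drafts a reply; the auditor checks adherence;
    # failing drafts are revised with the critique in context.
    draft = chat("Respond helpfully to the scenario.", scenario)
    for _ in range(max_revisions):
        report = audit_draft(chat, value, draft)
        if report.adheres:
            break
        draft = chat(
            "Revise the draft so it acts on the stated value.",
            f"Scenario: {scenario}\nValue: {value}\n"
            f"Critique: {report.critique}\nDraft: {draft}",
        )
    return draft


if __name__ == "__main__":
    # Canned backend so the loop runs end to end without an API key.
    replies = iter(["I can't share details.", "FAIL: evasive, not honest",
                    "Here is the full picture...", "PASS"])
    demo_chat: ChatFn = lambda system, user: next(replies)
    print(generate_with_audit(demo_chat, "User asks about a known defect.",
                              "honesty"))
```

In this sketch the audit happens once per revision round, after a full draft exists; an auditor that intervenes at other stages of generation (e.g., during planning or mid-decoding) would hook the same check into earlier steps of the pipeline.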