The Topology of Ill-Posed Questions: Persistent Homology for Detection and Steering in LLMs

2026-06-22 | Artificial Intelligence

AI summary

The authors explore how ambiguous, underspecified, or contradictory questions can be detected and better handled by large language models (LLMs). They look inside the model, treating the hidden states of the prompt tokens at each layer as a cloud of points and using a mathematical tool called persistent homology to capture its geometric features. They condense each layer's features into three simple summary numbers, combine them across layers to represent the question, and use this representation to steer the model toward asking for clarification or declining to answer ill-posed questions. Across three open-weight models, the method improves both the recognition of ill-posed questions and the rate of acceptable responses on several benchmarks. This suggests that analyzing the shape of internal representations can help models deal with confusing inputs.

Large Language Models, Ill-posed Questions, Persistent Homology, Transformer Layers, Hidden States, Topology, Activation Steering, AmbigQA, SituatedQA
Authors
Guangyu Jiang, Sizhe Tang, Mahdi Imani, Tian Lan
Abstract
Ill-posed questions, including ambiguous, underspecified, or contradictory queries, may admit no valid answer or multiple plausible answers, posing a challenge for large language models (LLMs). Existing approaches largely analyze ill-posedness through model outputs and often focus on specific subclasses. We investigate whether diverse sources of ill-posedness can be represented within a unified topology of LLM internal states and whether this structure can be used to steer response behavior. We model the contextual hidden states of prompt tokens at each transformer layer as a point cloud and characterize its geometry using finite zero-dimensional persistent homology. Each layer is summarized by three compact descriptors: mean finite lifetime, normalized lifetime entropy, and largest-lifetime concentration. Concatenating these descriptors across layers yields a topology representation of the question. We further introduce topology-conditioned activation steering, which retrieves topologically similar examples and constructs query-specific activation interventions that encourage source-aware clarification or abstention. Across three open-weight LLMs, topology features consistently outperform prompt-based and pooled-hidden-state baselines for ill-posedness classification, improving average accuracy from \(67.4\%\) to \(78.9\%\) on AmbigQA, from \(79.9\%\) to \(88.5\%\) on SituatedQA, and from \(57.6\%\) to \(69.6\%\) on CLAMBER 9-way classification. Topology-conditioned steering increases the average total acceptable response rate from \(61.4\%\) to \(70.6\%\) and grounded acceptable responses from \(11.9\%\) to \(16.4\%\). These results show that persistent homology provides both an interpretable representation of ill-posedness and an effective mechanism for targeted response steering.
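
The per-layer descriptors named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the filtration is Vietoris-Rips with Euclidean distance, so every component is born at scale 0 and the finite zero-dimensional lifetimes equal the edge lengths of a minimum spanning tree; the normalized entropy is the Shannon entropy of the lifetime distribution divided by the log of the number of lifetimes; and the concentration is the largest lifetime divided by the sum of all lifetimes. The abstract does not give these formulas, and the function names `h0_lifetimes`, `layer_descriptors`, and `topology_representation` are hypothetical.

```python
import math


def h0_lifetimes(points):
    """Finite H0 lifetimes of a point cloud (assumed Vietoris-Rips, Euclidean).

    With all births at 0, the finite lifetimes are the minimum spanning tree
    edge lengths, computed here with Prim's algorithm.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # best[j] = shortest distance from point j to the current tree
    best = [math.dist(points[0], p) for p in points]
    lifetimes = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        lifetimes.append(best[u])  # scale at which this component merges
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(points[u], points[j]))
    return lifetimes


def layer_descriptors(points):
    """Return (mean lifetime, normalized entropy, largest-lifetime concentration).

    The exact definitions are assumptions made for illustration.
    """
    lifetimes = h0_lifetimes(points)
    total = sum(lifetimes)
    if not lifetimes or total == 0.0:
        return (0.0, 0.0, 0.0)
    probs = [l / total for l in lifetimes if l > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    k = len(lifetimes)
    norm_entropy = entropy / math.log(k) if k > 1 else 0.0
    return (total / k, norm_entropy, max(lifetimes) / total)


def topology_representation(layers):
    """Concatenate the three descriptors of each layer's point cloud."""
    features = []
    for points in layers:
        features.extend(layer_descriptors(points))
    return features


if __name__ == "__main__":
    # Toy stand-in for hidden states: two "layers", each a small 2-D point cloud.
    toy_layers = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
        [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)],
    ]
    print(topology_representation(toy_layers))
```

In practice each point would be a token's hidden-state vector at a given layer rather than a 2-D toy coordinate. Some persistent homology libraries report Rips scales as radii rather than diameters, which halves every lifetime; that changes the mean lifetime but leaves the entropy and concentration, as defined here, unchanged.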