On the Limits of Prompt-Conditioned Language Models as General-Purpose Learners

2026-06-22 • Machine Learning


AI summary

The authors explain that large language models (LLMs) can't solve every task just by using language prompts because language is limited in how much information it can convey. They show with a game theory model that when a task is too complex to fully describe with language, the LLM will inevitably make mistakes, no matter how much data or computing power it has. They also find that safety and alignment rules can prevent the model from producing some correct answers, causing additional unavoidable errors. Overall, the authors argue that LLMs are not universal solvers through prompting alone and suggest that adding other types of inputs like images or memory could help overcome these limits.

Large Language Models, Prompting, Capacity-limited Communication, Cheap-talk Game, PAC-Bayes Bounds, Task Inference, Alignment Constraints, Expressivity Floor, Objective Misalignment, Generalization Limits

Authors

David Mguni, Julian Ma, Jun Wang

Abstract

Large Language Models (LLMs) are frequently portrayed as general-purpose solvers capable of solving arbitrary tasks. We argue that this view overlooks a fundamental constraint: language is a compressed and capacity-limited interface for conveying task information. Modelling User-System interaction as a bilevel cheap-talk game, we analyse how latent tasks are encoded into prompts and reinterpreted under alignment and safety constraints. We introduce a conceptual decomposition separating task inference from execution and derive PAC-Bayes bounds that distinguish finite-sample estimation error from irreducible structural limitations. Our first main result establishes an expressivity floor: language acts as a capacity-limited communication channel, and whenever the informational complexity of a task family exceeds the capacity of that channel, distinct tasks become unavoidably indistinguishable to the Solver, inducing a strictly positive error floor that cannot be eliminated by additional data, optimisation, or model scaling alone. We then establish an objective-misalignment floor: when alignment constraints restrict the admissible output set, the User-ideal distribution may lie outside the feasible class, inducing an irreducible distortion. Together, these results yield a formal negative conclusion: prompt-conditioned LLMs are not universal problem solvers through prompting alone, as there exist task families for which correct behaviour is provably unattainable even in the infinite-data regime. More broadly, our analysis shows that the limits of prompt-based generalisation arise from information-constrained communication and alignment-constrained objectives. This suggests that interfaces beyond natural language, including multimodal observations and external memory, may reduce these inherent limitations by increasing the task-relevant information available to the System.
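The two floors described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's theorem or its PAC-Bayes bound; it rests on simplifying assumptions that do not appear in the source: tasks are drawn uniformly, each task needs its own distinct answer, prompts are a finite set of messages with deterministic encoding and decoding, and distortion is measured by total variation distance. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expressivity_floor(num_tasks, num_prompts):
    """Pigeonhole bound: with num_prompts distinct messages the solver can be
    right on at most num_prompts of num_tasks equally likely tasks."""
    if num_tasks <= 0 or num_prompts <= 0:
        raise ValueError("counts must be positive")
    return max(Fraction(0), 1 - Fraction(num_prompts, num_tasks))


def best_error_by_search(num_tasks, num_prompts):
    """Brute-force the best deterministic encoder/decoder pair (tiny cases only)."""
    tasks = range(num_tasks)
    best_correct = 0
    # encoder[t] is the prompt sent for task t; decoder[m] is the task guessed
    # on receiving prompt m.
    for encoder in product(range(num_prompts), repeat=num_tasks):
        for decoder in product(tasks, repeat=num_prompts):
            correct = sum(1 for t in tasks if decoder[encoder[t]] == t)
            best_correct = max(best_correct, correct)
    return 1 - Fraction(best_correct, num_tasks)


def misalignment_floor(ideal, allowed):
    """Probability mass the ideal distribution places on disallowed outputs.
    Any distribution supported on `allowed` is at least this far away in
    total variation distance."""
    return sum((p for out, p in ideal.items() if out not in allowed), Fraction(0))


def total_variation(p, q):
    """Total variation distance between two finite distributions."""
    support = set(p) | set(q)
    gaps = (abs(p.get(x, Fraction(0)) - q.get(x, Fraction(0))) for x in support)
    return sum(gaps, Fraction(0)) / 2


def renormalise(ideal, allowed):
    """Restrict the ideal distribution to allowed outputs and rescale to sum to 1."""
    mass = sum((p for out, p in ideal.items() if out in allowed), Fraction(0))
    return {out: p / mass for out, p in ideal.items() if out in allowed}


if __name__ == "__main__":
    # Four tasks, two prompts: the exhaustive search matches the pigeonhole bound.
    print(expressivity_floor(4, 2), best_error_by_search(4, 2))

    ideal = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    allowed = {"a", "b"}
    feasible = renormalise(ideal, allowed)
    print(misalignment_floor(ideal, allowed), total_variation(ideal, feasible))
```

In this toy setting neither gap depends on how much data or compute the solver has: the first is fixed by the ratio of prompts to tasks, and the second by the mass the ideal distribution places on outputs the constraint forbids.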
