Mitigating Bias in Locally Constrained Decoding via Tractable Proposals
2026-06-01 • Computation and Language
Computation and Language
AI summaryⓘ
The authors study how to make language models generate outputs that follow specific rules, like valid JSON structures. They point out that previous methods that enforce these rules step-by-step can cause problems by limiting the choices too much. Their new method uses a mathematical approach called sequential Monte Carlo sampling combined with special structures called finite automata, which can run efficiently on GPUs. This approach helps the model better follow the rules and produces correct outputs faster with fewer attempts. They tested their method on tasks like generating function calls and SQL queries and found it works better than older methods.
Large Language ModelsJSON SchemaLocally Constrained DecodingSequential Monte CarloFinite AutomataGPU AccelerationHidden Markov ModelsProbabilistic ModelingFunction CallingSQL Generation
Authors
Meihua Dang, Linxin Song, Honghua Zhang, Jieyu Zhao, Guy Van den Broeck, Stefano Ermon
Abstract
Generations from large language models often fail to conform to desired constraints such as JSON schema. Existing locally constrained decoding (LCD) approaches enforce constraints by myopically masking out next tokens, resulting in biased sampling and degradation in performance. Recent work uses sequential Monte Carlo (SMC) methods to mitigate such biases, but designing effective proposal distributions or potential functions remains a key challenge. In this work, we propose a generic approach to construct proposals and potentials for SMC sampling from $p_{\mathrm{lm}}( \cdot \mid \mathrm{constraint})$. First, we show that constraints specified as finite automata can be tensorized for efficient execution on GPUs, which we use to construct globally constrained decoding (GCD) proposals. In addition, leveraging the fact that tensorized finite automata share the same circuit structure as hidden Markov models, we circuit-multiply them to obtain the probabilistic GCD (P-GCD) proposals encoding both logical and probabilistic information about the target distributions. We evaluate (P-)GCD on the tasks of function calling, keyword-based generation, and SQL generation. Experiments show that under the same SMC sampling setup, compared to LCD proposals, (P-)GCD converges faster to the target distribution with significantly fewer particles.