From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing

2026-06-08, Computation and Language

Computation and Language; Artificial Intelligence; Computational Engineering, Finance, and Science
AI summary

The authors study a failure in which AI models follow a general rule but silently drop its exceptions, causing errors on edge cases. They built NormBench, a benchmark of laws and policies in Chinese and English, to test whether models can identify which clause overrides which. Their representation, Span-Grounded Deontic Trees, ties each logical branch to the source text and helps models recover exceptions more accurately, though the gains appear mainly in cases where an exception actually applies. Overall, the work targets how AI systems parse legal rules that contain nested exceptions.

Silent Scope Omission, defeasible scope parsing, Span-Grounded Deontic Trees, legal NLP, rule-following agents, exceptions in rules, control flow in legal texts, NormBench, large language models, auditability
Authors
Jian Chen, Siyuan Li, Chucheng Wan, Zixuan Yuan
Abstract
Rule-following agents tasked with executing policies and regulations often fail via Silent Scope Omission (SSO): a model applies a general rule but silently drops nested exceptions or counter-exceptions, producing outputs that appear compliant yet break on important edge cases. Although such failures are often framed as an agentic-systems problem, the underlying bottleneck is statutory and policy understanding, a capability typically studied in legal NLP. However, most existing legal NLP benchmarks emphasize end-task outcomes, which can overlook the structural omissions that cause SSO. To diagnose and mitigate SSO, we introduce NormBench, a benchmark of 2,290 provisions spanning Chinese (laws and local policies), English (U.S. tax law, GDPR, and corporate policies), and cross-lingual settings, designed for defeasible scope parsing: identifying precisely which clause overrides which. NormBench uses Span-Grounded Deontic Trees (SG-DT), a compiler-style intermediate representation that anchors every logical branch to source spans and requires explicit exclusion guards, enabling deterministic compilation and audit. Evaluations of frontier LLMs reveal two recurring pathologies: (1) Recursion Decay, where performance drops sharply as defeater depth increases, and (2) an Auditability Trap, where models retrieve relevant spans but fail to assemble correct control flow. Using SG-DT as a constrained intermediate output improves whole-tree fidelity and defeater recovery, and downstream experiments show that its utility is mechanism-specific: gains concentrate on exception-active, SSO-prone cases, while aggregate accuracy can be mixed when the added structure is unnecessary or parser fidelity is low.
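To make the idea of a span-anchored tree with explicit exclusion guards concrete, the following is a minimal Python sketch. It is not the authors' SG-DT schema: the `Node` fields, the effect labels, the `evaluate` and `compile_branches` helpers, and the sample provision are all hypothetical and chosen only for illustration. It assumes that each defeater overrides its parent whenever the defeater's condition holds, and that the first matching sibling wins.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# Made-up provision for illustration; not taken from NormBench.
SOURCE = (
    "Employees must submit receipts for every expense, "
    "unless the expense is under $25, "
    "except when the expense is a gift."
)


def span_of(phrase: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of phrase in SOURCE."""
    start = SOURCE.index(phrase)
    return (start, start + len(phrase))


@dataclass
class Node:
    span: Tuple[int, int]   # source span this branch is anchored to
    condition: str          # hypothetical fact name looked up at evaluation time
    effect: str             # hypothetical label, e.g. "OBLIGATED" or "EXEMPT"
    defeaters: List["Node"] = field(default_factory=list)


def evaluate(node: Node, facts: Dict[str, bool]) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (effect, span) of the deepest applicable branch, or None."""
    if not facts.get(node.condition, False):
        return None
    for defeater in node.defeaters:
        result = evaluate(defeater, facts)
        if result is not None:
            return result   # an active defeater overrides this node
    return (node.effect, node.span)


def compile_branches(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str, Tuple[int, int]]]:
    """Flatten the tree into (guard, effect, span) branches with explicit exclusions."""
    here = path + (node.condition,)
    exclusions = tuple(f"not {d.condition}" for d in node.defeaters)
    yield (" and ".join(here + exclusions), node.effect, node.span)
    earlier: Tuple[str, ...] = ()
    for defeater in node.defeaters:
        # Later siblings only fire when earlier siblings did not.
        yield from compile_branches(defeater, here + earlier)
        earlier += (f"not {defeater.condition}",)


tree = Node(
    span_of("Employees must submit receipts for every expense"),
    "is_expense",
    "OBLIGATED",
    [
        Node(
            span_of("unless the expense is under $25"),
            "under_25",
            "EXEMPT",
            [Node(span_of("except when the expense is a gift"), "is_gift", "OBLIGATED")],
        )
    ],
)

if __name__ == "__main__":
    for guard, effect, (start, end) in compile_branches(tree):
        print(f"if {guard}: {effect}  # {SOURCE[start:end]!r}")
```

In this sketch, Silent Scope Omission would correspond to emitting only the root branch without its `not under_25` guard. Because every branch carries the offsets of the clause it came from, an auditor can check each guard against the source text.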