Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models
2026-05-29 • Computation and Language
AI summary
The authors propose Semantic Triplet Restoration (STR), a method that helps models understand tables by turning each cell into a simple fact made of a row path, a column path, and the cell content. They also create TripletQL, a lightweight router that picks the most helpful rendering or subset of these facts for each question. Their approach performs as well as or better than existing methods that feed tables to models as HTML, while using fewer input tokens; the gains are largest for smaller models and longer tables. This suggests that making table semantics explicit gives language models clearer input that is cheaper to process.
table question answering, semantic relations, hierarchical headers, HTML serialization, language models, Semantic Triplet Restoration, TripletQL, token efficiency, query-aware routing
Authors
Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang
Abstract
Table question answering requires models to recover semantic relations encoded implicitly by two-dimensional layout, merged cells, and hierarchical headers. Current pipelines typically use HTML or Markdown as intermediate table representations, but these layout-oriented serializations introduce markup overhead and require large language models to infer header-cell alignments from row and column spans. We propose Semantic Triplet Restoration (STR), a protocol that rewrites each cell as an atomic fact <item path, feature path, value>, where the item path specifies the row-wise entity, the feature path specifies the hierarchical attribute, and the value contains the cell content. We also present TripletQL, a lightweight query-aware router that uses STR to select an appropriate rendering or filtered subset of triplets for each question. Across four Chinese and English table-QA benchmarks, STR matches or improves upon HTML-based baselines while reducing input tokens. The relative benefit grows for smaller language models and longer table contexts, suggesting that explicit semantic representations are especially useful under constrained inference budgets. Code and data are available at https://github.com/Phoenix-ni/STR.git .
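To make the triplet format concrete, the following is a minimal sketch of how a table with a two-level header might be rewritten as <item path, feature path, value> facts. It is an illustration under stated assumptions, not the authors' implementation: the function names (fill_spans, restore_triplets, render, select_triplets), the use of None to mark a header cell covered by a column span, the " / " path separator, and the example table are all hypothetical. The select_triplets function is a naive keyword-overlap filter that only stands in for the idea of query-aware selection; it does not reproduce how TripletQL routes questions.

```python
from typing import List, Optional, Tuple

# (item path, feature path, value); this layout is an assumption of the sketch.
Triplet = Tuple[Tuple[str, ...], Tuple[str, ...], str]


def fill_spans(header_row: List[Optional[str]]) -> List[Optional[str]]:
    """Copy a header label rightwards over cells it spans (None = spanned cell).

    Only column spans are handled; row spans are out of scope for this sketch.
    """
    filled, last = [], None
    for cell in header_row:
        if cell is not None:
            last = cell
        filled.append(last)
    return filled


def restore_triplets(
    header_rows: List[List[Optional[str]]],
    body_rows: List[List[str]],
    n_item_cols: int = 1,
) -> List[Triplet]:
    """Rewrite every body cell as one (item path, feature path, value) fact."""
    headers = [fill_spans(row) for row in header_rows]
    triplets: List[Triplet] = []
    for row in body_rows:
        item_path = tuple(row[:n_item_cols])  # leading columns name the entity
        for col in range(n_item_cols, len(row)):
            # Walk the header levels top-down to build the attribute path.
            feature_path = tuple(h[col] for h in headers if h[col] is not None)
            triplets.append((item_path, feature_path, row[col]))
    return triplets


def render(triplet: Triplet) -> str:
    """Serialize one triplet as a single line of text."""
    item, feature, value = triplet
    return f"<{' / '.join(item)}, {' / '.join(feature)}, {value}>"


def select_triplets(triplets: List[Triplet], question: str) -> List[Triplet]:
    """Naive stand-in for query-aware filtering: keep the triplets whose path
    labels overlap most with the question (case-insensitive substring match)."""
    q = question.lower()

    def score(t: Triplet) -> int:
        item, feature, _ = t
        return sum(label.lower() in q for label in item + feature)

    best = max((score(t) for t in triplets), default=0)
    return [t for t in triplets if best > 0 and score(t) == best]


# Hypothetical table: "Revenue" spans two columns (Q1, Q2); "Profit" spans one.
header_rows = [
    [None, "Revenue", None, "Profit"],
    [None, "Q1", "Q2", "Q1"],
]
body_rows = [
    ["Acme", "10", "12", "3"],
    ["Globex", "7", "9", "2"],
]

triplets = restore_triplets(header_rows, body_rows)
for t in triplets:
    print(render(t))

print(select_triplets(triplets, "What was Acme's revenue in Q2?"))
```

In this sketch each output line carries its full row and column context, so a model reading it does not have to resolve header spans itself, which is the alignment burden the abstract attributes to HTML and Markdown serializations. Whether such a rendering actually uses fewer tokens than HTML depends on the table and the tokenizer, and the sketch makes no claim about that.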