StrucTab: A Structured Optimization Framework for Table Parsing

2026-06-29 • Computer Vision and Pattern Recognition


AI summary

The authors address the challenge of turning images of tables into structured, machine-readable data. They note that existing vision-language models are usually supervised only on the final output, which skips intermediate reasoning steps, and that reinforcement learning for these models suffers from unstable or ambiguous rewards. Their model, StrucTab, breaks the task into subtasks such as row-column counting and merged-cell analysis, then unifies them through sequential reasoning. They also introduce a training framework, Uni-TabRL, that gives clearer feedback by splitting the reward into validity, structure, and content components. Their experiments report state-of-the-art results on the public benchmarks they evaluate and significant improvements on TableVerse-5K, a new large and challenging table benchmark.

table parsing, vision-language models, reinforcement learning, structural supervision, reward decomposition, row-column counting, merged-cell analysis, sequential reasoning, benchmark dataset, machine-readable tables

Authors

Gengluo Li, Shangpin Peng, Chengquan Zhang, Binghong Wu, Hao Feng, Weinong Wang, Pengyuan Lyu, Huawen Shen, Xingyu Wan, Zhuotao Tian, Han Hu, Can Ma, Yu Zhou

Abstract

Table parsing aims to convert table images into structured, machine-readable representations, a task requiring the joint perception of complex spatial layouts and textual content. While recent vision-language models (VLMs) enable end-to-end parsing, they typically rely on direct supervision of the final output, thereby bypassing the explicit intermediate reasoning that is crucial for understanding complex table structures. Furthermore, attempts to optimize these models using reinforcement learning (RL) are often hindered by unstable or ambiguous reward designs, limiting potential performance gains. To address these limitations, we propose StrucTab, a table parsing model learned through intermediate structural supervision and reward decomposition. At the modeling level, by decomposing the parsing process into human-inspired subtasks, such as row-column counting and merged-cell analysis, StrucTab progressively unifies them through a sequential reasoning strategy. At the optimization level, we introduce Uni-TabRL, a unified RL framework that leverages decomposed rewards (validity, structure, and content) to provide stable and informative optimization signals. Finally, at the evaluation level, we present TableVerse-5K, a large-scale, challenging benchmark encompassing diverse, real-world table scenarios. Extensive experiments demonstrate the state-of-the-art performance of StrucTab across all evaluated public benchmarks and significant improvements on TableVerse-5K, validating the effectiveness of explicit structural modeling and decomposed reward optimization. Code and benchmark are publicly available at https://github.com/VirtualLUOUCAS/StrucTab.
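To make the idea of decomposed rewards concrete, the following is a minimal sketch, not the authors' Uni-TabRL implementation. It assumes the model outputs an HTML table and scores it against a reference on the three components named in the abstract: validity, structure, and content. The class `TableGrid`, the functions `parse`, `shape`, `merged_cells`, and `decomposed_reward`, the scoring rules, and the weights are all hypothetical choices made for illustration; the abstract does not specify how each reward is computed.

```python
from collections import Counter
from html.parser import HTMLParser

TABLE_TAGS = ("table", "tr", "td", "th")


class TableGrid(HTMLParser):
    """Collect table cells as rows of (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cells per <tr>
        self.open_tags = []   # stack for a simple well-formedness check
        self.balanced = True
        self._cell = None     # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag not in TABLE_TAGS:
            return
        self.open_tags.append(tag)
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag not in TABLE_TAGS:
            return
        if not self.open_tags or self.open_tags.pop() != tag:
            self.balanced = False
            return
        if tag in ("td", "th") and self._cell is not None and self.rows:
            text, rowspan, colspan = self._cell
            self.rows[-1].append((text.strip(), rowspan, colspan))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data


def parse(html):
    """Return the parsed rows, or None if the table markup is not well formed."""
    parser = TableGrid()
    try:
        parser.feed(html)
        parser.close()
    except ValueError:  # e.g. a non-numeric rowspan/colspan
        return None
    if not parser.balanced or parser.open_tags or not parser.rows:
        return None
    return parser.rows


def shape(rows):
    """Row count and column count (widest row by colspan; a simplification
    that ignores cells carried down from earlier rows by rowspan)."""
    return len(rows), max(sum(cell[2] for cell in row) for row in rows)


def merged_cells(rows):
    """Positions and spans of every cell that spans more than one row or column."""
    return [(i, j, c[1], c[2]) for i, row in enumerate(rows)
            for j, c in enumerate(row) if c[1] > 1 or c[2] > 1]


def decomposed_reward(pred_html, gold_html, weights=(0.2, 0.4, 0.4)):
    """Score a prediction on validity, structure, and content (assumed rules)."""
    gold, pred = parse(gold_html), parse(pred_html)
    if gold is None:
        raise ValueError("reference table is not well formed")
    if pred is None:  # invalid output earns nothing on any component
        return {"validity": 0.0, "structure": 0.0, "content": 0.0, "total": 0.0}
    (g_rows, g_cols), (p_rows, p_cols) = shape(gold), shape(pred)
    checks = [g_rows == p_rows, g_cols == p_cols,
              merged_cells(gold) == merged_cells(pred)]
    structure = sum(checks) / len(checks)
    g_text = Counter(c[0] for row in gold for c in row)
    p_text = Counter(c[0] for row in pred for c in row)
    denom = max(sum(g_text.values()), sum(p_text.values())) or 1
    content = sum((g_text & p_text).values()) / denom
    w_valid, w_struct, w_content = weights
    total = w_valid * 1.0 + w_struct * structure + w_content * content
    return {"validity": 1.0, "structure": structure, "content": content, "total": total}
```

Keeping the three components separate, rather than returning only a single scalar, means a prediction with correct cell text but a missed merged cell is penalized on structure alone, which is the kind of more informative signal the abstract attributes to reward decomposition.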
