Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework
2026-06-29 • Human-Computer Interaction
Human-Computer Interaction • Artificial Intelligence
AI summary
The authors look at how well multimodal large language models can take images of charts and turn them back into data tables, especially when the charts have no data labels. They found that current models can work out the table's layout but struggle to get the exact numbers right. To improve this, they train the model in steps that mimic how people read charts. Their approach boosts numerical accuracy, and a user study shows it works well when people and the model extract data together.
chart data extraction, multimodal large language models, data tables, mixed-initiative systems, numerical accuracy, benchmark, progressive learning, human-centered design
Authors
Yuchen He, Peizhi Ying, Liqi Cheng, Kuilin Peng, Yuan Tian, Dazhen Deng, Yingcai Wu
Abstract
Chart data extraction, which reverse-engineers data tables from chart images, is essential for reproducibility, analysis, retrieval, and redesign. Existing interactive tools are reliable but tedious, and mixed-initiative systems, while more efficient, lack generalizability. Recent multimodal large language models (MLLMs) offer a unified interface for chart interpretation, yet their ability to extract accurate data tables, especially without visible labels, remains unclear. We build a benchmark featuring diverse real-world charts without data labels to evaluate this capability. Results show that, while current MLLMs reliably reconstruct table structures, they struggle with precise value recovery. To address this, we revisit chart data extraction from a human-centered perspective and argue that extraction should follow a progressive learning process similar to how people read charts. Our training framework substantially improves numerical accuracy, achieving state-of-the-art performance with a 7B-parameter model. A user study further shows that our model effectively supports mixed-initiative workflows for reliable chart data extraction.
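The abstract separates recovering a table's structure from recovering its values. As a rough illustration of why the second step is harder when a chart has no data labels, the Python sketch below shows the kind of axis calibration that interactive digitizing tools typically ask users to perform: fix two known ticks on an axis, then map each mark's pixel position to a value by linear interpolation. The names `AxisCalibration`, `structure_matches`, and `mean_relative_error`, the linear-axis assumption, the pixel positions, and the error metric are all hypothetical choices for illustration. They are not the authors' benchmark metric or training framework.

```python
"""Hypothetical sketch: reading bar heights off an unlabeled chart and scoring them.

None of these names or metrics come from the paper; they only illustrate the
difference between recovering table structure and recovering exact values.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Pixel-to-value mapping fitted from two known axis ticks (assumes a linear, non-log axis)."""
    pixel_a: float
    value_a: float
    pixel_b: float
    value_b: float

    def to_value(self, pixel: float) -> float:
        if self.pixel_a == self.pixel_b:
            raise ValueError("reference ticks must sit at different pixel positions")
        # Linear interpolation (or extrapolation) between the two reference ticks.
        return self.value_a + (pixel - self.pixel_a) * (self.value_b - self.value_a) / (
            self.pixel_b - self.pixel_a
        )


def structure_matches(pred: dict[str, float], truth: dict[str, float]) -> bool:
    """Structure check: same set of row labels, ignoring the values."""
    return set(pred) == set(truth)


def mean_relative_error(pred: dict[str, float], truth: dict[str, float]) -> float:
    """Mean |pred - truth| / |truth| over ground-truth rows; a missing row counts as error 1.0."""
    errors = []
    for label, true_value in truth.items():
        if label not in pred:
            errors.append(1.0)
        elif true_value == 0:
            errors.append(abs(pred[label]))  # fall back to absolute error when truth is 0
        else:
            errors.append(abs(pred[label] - true_value) / abs(true_value))
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    # y-axis: pixel row 400 is the value 0, pixel row 100 is the value 60 (image y grows downward).
    y_axis = AxisCalibration(pixel_a=400, value_a=0, pixel_b=100, value_b=60)
    bar_tops = {"2021": 250, "2022": 160}  # hypothetical pixel rows of the bar tops
    extracted = {label: y_axis.to_value(px) for label, px in bar_tops.items()}
    ground_truth = {"2021": 30.0, "2022": 50.0}
    print(extracted)
    print(structure_matches(extracted, ground_truth))
    print(round(mean_relative_error(extracted, ground_truth), 3))
```

Running it prints `{'2021': 30.0, '2022': 48.0}`, `True`, and `0.02`: the table structure is fully correct, yet one value is off. On this hypothetical axis each pixel corresponds to 0.2 units (60 units over 300 pixels), so small localization errors translate directly into numerical error. This is why a model can reproduce a table's layout perfectly while still recovering values imprecisely.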