Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations

2026-04-10 · Artificial Intelligence

AI summary

The authors focus on teaching AI to find exact math formulas that describe physical fields just by looking at pictures of them. They worked on two-dimensional linear fields in steady state, meaning fields that do not change over time. To do this, they created a method called ViSA-R2 that mimics how physicists think: recognize the structural pattern, guess a formula family, work out the parameters, and check that the result is consistent. They also built a synthetic test set called ViSA-Bench, which covers 30 scenarios with known correct answers for evaluating AI models. In their tests, ViSA-R2, built on an 8B open-weight Qwen3-VL model, outperformed the open-source and closed-source models it was compared against.

Visual-to-symbolic inference, Analytical solutions, Linear steady-state fields, SymPy expressions, Chain-of-thought reasoning, ViSA-R2, ViSA-Bench, Physics-informed AI, Vision-language models, Pattern recognition
Authors
Pengze Li, Jiaquan Zhang, Yunbo Long, Xinping Liu, Zhou Wenjie, Encheng Su, Zihang Zeng, Jiaqi Liu, Jiyao Liu, Junchi Yu, Lihao Liu, Philip Torr, Shixiang Tang, Aoran Wang, Xi Chen
Abstract
Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored capability for AI-assisted scientific reasoning. We study visual-to-symbolic analytical solution inference (ViSA) for two-dimensional linear steady-state fields: given field visualizations (and first-order derivatives) plus minimal auxiliary metadata, the model must output a single executable SymPy expression with fully instantiated numeric constants. We introduce ViSA-R2 and align it with a self-verifying, solution-centric chain-of-thought pipeline that follows a physicist-like pathway: structural pattern recognition → solution-family (ansatz) hypothesis → parameter derivation → consistency verification. We also release ViSA-Bench, a VLM-ready synthetic benchmark covering 30 linear steady-state scenarios with verifiable analytical/symbolic annotations, and evaluate predictions by numerical accuracy, expression-structure similarity, and character-level accuracy. Using an 8B open-weight Qwen3-VL backbone, ViSA-R2 outperforms strong open-source baselines and the evaluated closed-source frontier VLMs under a standardized protocol.
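
To make the task concrete, here is a minimal sketch of what the ansatz, parameter derivation, and consistency verification steps, followed by the three kinds of evaluation, might look like for one field. Everything specific in it is an assumption for illustration and is not taken from the paper: the Laplace-equation example, the least-squares amplitude fit, and the metric definitions (relative L2 error, Jaccard overlap of expression node types, and a `difflib` character ratio) are stand-ins, and ViSA-Bench may define its scenarios and scores differently.

```python
import difflib

import numpy as np
import sympy as sp

x, y = sp.symbols("x y")

# Hypothetical reference annotation: a solution of Laplace's equation,
# stored as an executable SymPy expression string with numeric constants.
reference = sp.sympify("2.0*sin(pi*x)*sinh(pi*y)")

# Grid samples stand in for the pixel values of a field visualization.
xs, ys = np.meshgrid(np.linspace(0.0, 1.0, 41), np.linspace(0.0, 1.0, 41))
observed = sp.lambdify((x, y), reference, "numpy")(xs, ys)

# Ansatz hypothesis (assumed): the separable family A*sin(pi*x)*sinh(pi*y).
basis = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y)
basis_vals = sp.lambdify((x, y), basis, "numpy")(xs, ys)

# Parameter derivation: closed-form least squares for the amplitude A.
amplitude = float(np.sum(basis_vals * observed) / np.sum(basis_vals**2))
candidate = round(amplitude, 6) * basis

# Consistency verification: the Laplacian of the candidate should vanish.
residual = sp.diff(candidate, x, 2) + sp.diff(candidate, y, 2)
residual_vals = np.asarray(
    sp.lambdify((x, y), residual, "numpy")(xs, ys), dtype=float
)
max_residual = float(np.max(np.abs(residual_vals)))


def relative_l2_error(pred, ref):
    """Numerical accuracy (assumed metric): relative L2 error on the grid."""
    f_pred = sp.lambdify((x, y), pred, "numpy")(xs, ys)
    f_ref = sp.lambdify((x, y), ref, "numpy")(xs, ys)
    return float(np.linalg.norm(f_pred - f_ref) / np.linalg.norm(f_ref))


def structure_similarity(pred, ref):
    """Structure similarity (assumed metric): Jaccard overlap of node types."""
    ops_pred = {type(node).__name__ for node in sp.preorder_traversal(pred)}
    ops_ref = {type(node).__name__ for node in sp.preorder_traversal(ref)}
    return len(ops_pred & ops_ref) / len(ops_pred | ops_ref)


def character_similarity(pred, ref):
    """Character-level score (assumed metric): difflib ratio of the strings."""
    return difflib.SequenceMatcher(None, str(pred), str(ref)).ratio()


print("candidate:", candidate)
print("max PDE residual:", max_residual)
print("relative L2 error:", relative_l2_error(candidate, reference))
print("structure similarity:", structure_similarity(candidate, reference))
print("character similarity:", character_similarity(candidate, reference))
```

In this toy setting the amplitude is the only free parameter, so the fit is a single projection onto the basis function. A realistic pipeline would also have to choose the solution family and infer frequencies, offsets, and boundary data from the image and metadata.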