GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation

2026-05-25 · Computation and Language

AI summary

The authors address the problem of generating precise diagrams in SVG format with language models, which often fail because of small structural mistakes. They introduce GeoSVG-RL, a system that uses reinforcement learning to improve diagram layouts by checking and optimizing the geometric correctness of the output. The model first plans the layout and then writes the SVG code. The code is rendered in a browser and scored on several factors, such as whether arrows attach at the right points, whether labels stay inside their boxes, and whether the drawing fits on the canvas. This approach yields more accurate and reliable diagrams than previous techniques. Overall, the authors demonstrate a way to make computer-generated technical drawings more usable and professional.
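The geometric checks described here can be made concrete with a small sketch. It assumes the verifier reports each rendered element as an axis-aligned bounding box (as a browser's `getBBox()` would) and tests three of the properties named in the abstract: canvas fitting, text containment, and anchor placement on a node's outline. The `Box` type, the function names, the padding, and the 1-unit tolerance are all hypothetical illustrations, not the authors' verifier.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in SVG user units (hypothetical representation)."""
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (inner.x >= outer.x and inner.y >= outer.y
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def fits_canvas(element: Box, canvas: Box) -> bool:
    """Canvas fitting: the element must not drift beyond the canvas."""
    return contains(canvas, element)


def text_in_box(label: Box, node: Box, padding: float = 0.0) -> bool:
    """Text containment: the label must sit inside its node, minus padding."""
    inner = Box(node.x + padding, node.y + padding,
                node.width - 2 * padding, node.height - 2 * padding)
    return contains(inner, label)


def on_border(point: Tuple[float, float], box: Box, tol: float = 1.0) -> bool:
    """Anchor placement: an arrow endpoint must lie within `tol` of the outline."""
    px, py = point
    # Distance from the point to the filled rectangle (0 if inside or on it).
    dx = max(box.x - px, 0.0, px - box.right)
    dy = max(box.y - py, 0.0, py - box.bottom)
    outside = (dx * dx + dy * dy) ** 0.5
    if outside > 0:
        return outside <= tol
    # Point is inside or on the rectangle: distance to the nearest edge.
    inside = min(px - box.x, box.right - px, py - box.y, box.bottom - py)
    return inside <= tol


canvas = Box(0, 0, 400, 300)
node = Box(50, 50, 120, 60)
label = Box(60, 70, 80, 20)
print(fits_canvas(node, canvas))         # True
print(text_in_box(label, node, 4.0))     # True
print(on_border((170.0, 80.0), node))    # True: endpoint on the right edge
print(on_border((110.0, 80.0), node))    # False: endpoint in the box interior
```

Boolean checks like these are easy to aggregate into per-diagram rates (for example, the fraction of labels that pass `text_in_box`), which is one plausible way a text-in-box or arrow-anchor score could be derived.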

SVG, reinforcement learning, layout planning, policy optimization, graph connectivity, rendering validity, text containment, structured output, vector graphics, Group Relative Policy Optimization
Authors
Sifan Li, Yujun Cai, Hongkai Chen, Yiwei Wang
Abstract
Generating structured, editable diagrams remains a significant challenge for contemporary large language models, despite their proficiency in general-purpose vector code generation. The primary difficulty is the structural fragility of the output: minor errors such as misaligned connector endpoints, text labels overlapping borders, or layouts drifting beyond the canvas boundaries render the resulting SVG files unusable in professional applications. To address these issues, we introduce GeoSVG-RL, a reinforcement learning framework for layout-constrained text-to-SVG generation. Unlike standard training objectives that only maximize token-level likelihood, our approach optimizes the policy against explicit, executable geometric feedback. The model first produces a structured layout plan that serves as a geometric contract for the subsequent SVG code. The code is then rendered by a browser-backed verifier, which computes fine-grained rewards along six critical dimensions: rendering validity, canvas fitting, precise anchor placement, text containment, graph consistency, and code cleanliness. We refine the model with Group Relative Policy Optimization (GRPO), sampling multiple candidates per prompt and updating the policy according to each candidate's quality relative to the others in its group. Starting from a supervised warm start on synthetic data, GeoSVG-RL achieves substantial gains in structural reliability, particularly in arrow-anchor accuracy and text-in-box rates. Quantitative evaluations show that our method consistently outperforms current state-of-the-art systems in local geometric precision and in the preservation of graph connectivity, offering a robust path toward automated yet reliable technical illustration.
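The group-relative update can be illustrated with a minimal sketch. Each sampled SVG receives a scalar reward combined from the six verifier scores, and its advantage is that reward standardized against the other candidates for the same prompt, as in the usual GRPO formulation. The equal default weights, the [0, 1] score range, the dimension keys, and the function names are assumptions; the abstract does not say how the six dimensions are combined.

```python
import statistics
from typing import Dict, List, Optional

# The six reward dimensions listed in the abstract (key names are hypothetical).
DIMENSIONS = (
    "rendering_validity",
    "canvas_fitting",
    "anchor_placement",
    "text_containment",
    "graph_consistency",
    "code_cleanliness",
)


def composite_reward(scores: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-dimension scores in [0, 1]; equal weights by default."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward standardized against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled candidates scored by a (hypothetical) verifier.
perfect = {d: 1.0 for d in DIMENSIONS}
label_overflow = dict(perfect, text_containment=0.0)
off_canvas = dict(label_overflow, canvas_fitting=0.0)
broken = {d: 0.0 for d in DIMENSIONS}  # failed to render: all checks score 0 (assumption)

rewards = [composite_reward(s) for s in (perfect, label_overflow, off_canvas, broken)]
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])     # [1.0, 0.833, 0.667, 0.0]
print([round(a, 3) for a in advantages])  # above-average candidates get positive values
```

Standardizing within the group means no learned value function is needed as a baseline. It also means that a prompt whose candidates all score the same yields zero advantage and contributes no update, so the signal comes entirely from differences between sibling samples.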