GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation

2026-05-25 • Computation and Language


AI summary

The authors address the problem of generating precise diagrams in SVG format using language models, which often fail due to small structural mistakes. They created GeoSVG-RL, a system that uses reinforcement learning to improve diagram layouts by checking and optimizing the geometric correctness of the output. Their method involves planning the layout first, then verifying it in a browser to measure quality across several factors like proper alignment and fitting on the canvas. This approach leads to more accurate and reliable diagrams compared to previous techniques. Overall, the authors demonstrate a way to make computer-generated technical drawings more usable and professional.

SVG, reinforcement learning, layout planning, policy optimization, graph connectivity, rendering validity, text containment, structured output, vector graphics, Group Relative Policy Optimization

Authors

Sifan Li, Yujun Cai, Hongkai Chen, Yiwei Wang

Abstract

Generating structured, editable diagrams remains a significant challenge for contemporary large language models, despite their proficiency in general-purpose vector code generation. The primary difficulty lies in the structural fragility of the output; minor errors such as misaligned connector endpoints, text labels overlapping borders, or complex layouts drifting beyond the canvas boundaries render the resulting SVG files functionally unusable for professional applications. To address these issues, we introduce GeoSVG-RL, a specialized reinforcement learning framework designed for layout-constrained text-to-SVG generation. Unlike standard training objectives that rely solely on maximizing token-level likelihood, our approach optimizes the policy against explicit, executable geometric feedback. The model first produces a structured layout plan that serves as a geometric contract for the subsequent generation of the SVG code. This code is then rendered through a browser-backed verifier, enabling the calculation of fine-grained rewards across six critical dimensions: rendering validity, canvas fitting, precise anchor placement, text containment, graph consistency, and code cleanliness. We utilize Group Relative Policy Optimization (GRPO) to refine the model, sampling multiple candidates per prompt to facilitate updates based on relative quality. Starting from a supervised warm-start phase on synthetic data, GeoSVG-RL achieves substantial gains in structural reliability, particularly in arrow-anchor accuracy and text-in-box rates. Quantitative evaluations demonstrate that our method consistently outperforms current state-of-the-art systems in local geometric precision and the preservation of graph connectivity, providing a robust pathway toward automated yet reliable technical illustration.
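
The reward aggregation and group-relative update described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward weights, the score ranges, or the exact advantage formula, so everything below is an assumption: the names `GeometryScores`, `total_reward`, and `group_relative_advantages` are hypothetical, the weights default to uniform, each score is assumed to lie in [0, 1], and the advantage uses the commonly cited GRPO normalization (reward minus group mean, divided by group standard deviation).

```python
from dataclasses import dataclass, fields
from statistics import mean, pstdev


@dataclass(frozen=True)
class GeometryScores:
    """Hypothetical container: one score in [0, 1] per reward dimension
    named in the abstract. The real verifier output format is not given."""
    rendering_validity: float
    canvas_fitting: float
    anchor_placement: float
    text_containment: float
    graph_consistency: float
    code_cleanliness: float


def total_reward(scores, weights=None):
    """Combine the six scores into one scalar reward.

    Uniform weights are an assumption made for illustration only.
    """
    names = [f.name for f in fields(scores)]
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    return sum(weights[name] * getattr(scores, name) for name in names)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of candidates for the same prompt.

    Each candidate's advantage is its reward minus the group mean, divided
    by the group's (population) standard deviation. eps avoids division by
    zero when every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, one would score every SVG candidate sampled for a prompt, pass the resulting list of `total_reward` values to `group_relative_advantages`, and weight each candidate's policy-gradient term by its advantage. Candidates that beat the group average receive a positive advantage, those below it a negative one, and a group with identical rewards contributes no learning signal.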
