AI summary
The authors study how to turn natural language descriptions into exact geometric constructions that satisfy many constraints at once. This is hard because a single badly violated constraint can swamp the learning signal from all the others. They created PyGeoX, a geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a set of 300 test problems with per-constraint verifiable rewards. They identified a failure mode called Outlier Gradient Masking, where, under rewards that aggregate residuals through a single norm, one large constraint violation hides the feedback from every other constraint. To fix this, they designed Saturating Additive Rewards (SAR), which split the reward into bounded per-constraint terms so that partial progress stays visible even when some constraints fail badly. Compared with MSE-based rewards, SAR improves the hard-tier solving rate by 2.3×.
Large Language Models, Geometric Constraints, Differentiable Loss, Outlier Gradient Masking, Saturating Additive Rewards, Geometric Synthesis, Natural Language Processing, PyGeoX, Benchmarking, Multi-Constraint Optimization
Authors
Rafael Cabral, Pang Zixi, Ziyi Shou, Shen Xin
Abstract
Large Language Models frequently hallucinate in precision-critical domains such as technical diagramming and mechanical design, where outputs must satisfy strict geometric constraints. We study open-ended geometric synthesis from natural language: translating free-form descriptions into precise constructions whose entities must simultaneously satisfy dozens of interacting constraints. To make this tractable, we release PyGeoX, a programmable geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a stratified suite of 300 problems with per-constraint verifiable rewards. Using PyGeoX as a verifier, we identify a failure mode we call Outlier Gradient Masking: under global-norm rewards (any scheme that aggregates residuals through a single norm, for example, $\exp(-\mathrm{MSE})$), a single outlier constraint can nullify the learning signal across all others. To address this, we propose Saturating Additive Rewards (SAR), which decompose the reward into bounded per-constraint terms, preserving partial progress and ensuring consistent gradients even under severe violations. Against MSE-based rewards, the natural baseline for geometry solvers, SAR improves the hard-tier solving rate by $2.3\times$, and the resulting 8B model is competitive with much larger frontier systems on this benchmark. We release the engine, benchmark, and data at https://github.com/Huawei-AI4Math/PyGeoX.
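The masking effect described above can be illustrated with a small numeric sketch. The abstract does not give the exact SAR formula, so the per-constraint term used here, $\exp(-r_i^2)$ averaged over constraints, is only an assumed stand-in for "bounded per-constraint terms". The function names and residual values are hypothetical and are not part of the PyGeoX API.

```python
import math

def global_norm_reward(residuals):
    """exp(-MSE): all residuals are aggregated through a single norm."""
    mse = sum(r * r for r in residuals) / len(residuals)
    return math.exp(-mse)

def additive_saturating_reward(residuals):
    """Mean of bounded per-constraint terms exp(-r^2), each in (0, 1].

    Assumed illustrative form; the paper's SAR definition may differ.
    """
    return sum(math.exp(-r * r) for r in residuals) / len(residuals)

# Ten constraints: nine mildly violated, one severe outlier.
outlier = 10.0
before = [0.5] * 9 + [outlier]   # nine residuals of 0.5
after = [0.0] * 9 + [outlier]    # the nine are fixed, the outlier remains

# Reward gained by fixing nine of the ten constraints.
global_gain = global_norm_reward(after) - global_norm_reward(before)
additive_gain = additive_saturating_reward(after) - additive_saturating_reward(before)

print(f"exp(-MSE) gain:           {global_gain:.2e}")
print(f"additive saturating gain: {additive_gain:.3f}")
```

In this toy setting the outlier dominates the mean squared error, so fixing nine of ten constraints barely changes the $\exp(-\mathrm{MSE})$ reward. The additive form credits each fixed constraint independently, because the outlier's term saturates near zero instead of scaling down the others.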