GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair
2026-05-01 • Software Engineering
Software Engineering, Artificial Intelligence
AI summary
The authors introduce GeoContra, a system that checks and repairs Python code generated by AI models so that it respects the core geographic rules of spatial analysis, such as coordinate reference systems, topology, and units. GeoContra describes each task with a detailed contract stating what the code must do. It then verifies the generated code against that contract through static, runtime, and semantic checks. When it finds violations, it feeds them back to the model in a bounded loop that attempts to repair the code automatically. In tests on more than 7,000 real geospatial tasks, GeoContra substantially improved the spatial correctness of AI-generated programs, which makes the resulting analyses more reliable and geographically valid.
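The generate-check-repair cycle summarized above can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not GeoContra's implementation. `repair_loop`, `RepairResult`, `toy_generate`, and `toy_check` are hypothetical: the two toy callables stand in for an LLM call and for the contract checks. The repair budget `max_rounds` is also an assumed parameter.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RepairResult:
    code: str              # last program produced
    violations: List[str]  # violations still present in that program (empty if it passed)
    rounds: int            # number of repair attempts made
    passed: bool

def repair_loop(
    task: str,
    generate: Callable[[str, List[str]], str],  # hypothetical: (task, feedback) -> program text
    check: Callable[[str], List[str]],          # hypothetical: program text -> list of violations
    max_rounds: int = 3,
) -> RepairResult:
    """Generate a program, check it, and feed violations back until it passes or the budget is spent."""
    code = generate(task, [])                   # first attempt, no feedback yet
    violations = check(code)
    rounds = 0
    while violations and rounds < max_rounds:
        code = generate(task, violations)       # repair attempt conditioned on the violations
        violations = check(code)
        rounds += 1
    return RepairResult(code, violations, rounds, not violations)

def toy_generate(task: str, feedback: List[str]) -> str:
    # Hypothetical stand-in for an LLM: it only reprojects once told about the CRS problem.
    if any(v.startswith("CRS") for v in feedback):
        return "gdf = gdf.to_crs(26986)\nresult = gdf.area"
    return "result = gdf.area"

def toy_check(code: str) -> List[str]:
    # Hypothetical contract rule: areas must be computed in a projected CRS.
    return [] if "to_crs" in code else ["CRS: area computed in geographic coordinates"]

result = repair_loop("parcel areas", toy_generate, toy_check, max_rounds=2)
print(result.passed, result.rounds)  # True 1
```

Bounding the loop keeps cost predictable. A program that still violates the contract after the budget is spent is reported as failing rather than silently accepted.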
GIScience, spatial analysis, LLM, geospatial contract, coordinate reference system (CRS), topology, runtime validation, static rule inspection
Authors
Yinhao Xiao, Rongbo Xiao, Yihan Zhang
Abstract
Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS systems generate fluent scripts but rarely enforce these geographic rules at scale. We present GeoContra, a verification and repair framework for LLM-driven Python GIS workflows. It represents each task as an executable geospatial contract that specifies the natural-language question, schemas, CRS metadata, expected outputs, spatial predicates, topology, metrics, required operations, and forbidden shortcuts. Generated programs undergo static rule inspection, runtime validation, and semantic verification, and any violations are fed back into a bounded repair loop. We evaluate GeoContra on 7,079 real geospatial tasks spanning 15 Boston-area zones and 9 task families, with 11 open-source models (600 runs each). GeoContra improves spatial correctness on closed models from 47.6% to 77.5% for DeepSeek-V4 and from 57.7% to 81.5% for Kimi-K2.5, and across the 11 open models, average correctness rises by 26.6%. GeoContra thus turns fluent code production into verifiable spatial analysis, catching negative travel times, CRS and field-schema violations, missing predicates, and brittle output casts that would otherwise yield executable but geographically invalid results.
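To make the contract idea concrete, the sketch below encodes a small subset of contract fields: target CRS, output schema, required operations, forbidden shortcuts, and value ranges. It implements two of the three checking stages. Static rule inspection uses Python's standard `ast` module, and runtime validation checks the produced records. All names (`GeoContract`, `static_inspect`, `runtime_validate`) and the example task are hypothetical, and treating `astype` as a forbidden "brittle cast" is an illustrative assumption. GeoContra's actual contract schema is not reproduced here, and semantic verification against expected outputs is omitted.

```python
import ast
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class GeoContract:
    # Hypothetical, simplified contract; field names are illustrative, not GeoContra's schema.
    question: str
    expected_crs: str                     # CRS the output must be expressed in
    output_fields: Set[str]               # columns every output record must carry
    required_ops: Set[str] = field(default_factory=set)    # calls that must appear in the code
    forbidden_ops: Set[str] = field(default_factory=set)   # shortcuts that must not appear
    nonnegative_fields: Set[str] = field(default_factory=set)

def static_inspect(code: str, c: GeoContract) -> List[str]:
    """Static rule inspection: collect called function/method names without executing the code."""
    called = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Attribute):   # e.g. gdf.to_crs(...)
                called.add(fn.attr)
            elif isinstance(fn, ast.Name):      # e.g. round(...)
                called.add(fn.id)
    issues = [f"missing required operation: {op}" for op in sorted(c.required_ops - called)]
    issues += [f"forbidden shortcut used: {op}" for op in sorted(c.forbidden_ops & called)]
    return issues

def runtime_validate(records: List[Dict], crs: str, c: GeoContract) -> List[str]:
    """Runtime validation: check the executed program's output for CRS, schema and value ranges."""
    issues = []
    if crs != c.expected_crs:
        issues.append(f"CRS mismatch: got {crs}, expected {c.expected_crs}")
    for i, rec in enumerate(records):
        missing = c.output_fields - set(rec)
        if missing:
            issues.append(f"record {i}: missing fields {sorted(missing)}")
        for name in sorted(c.nonnegative_fields & set(rec)):
            if rec[name] < 0:                   # e.g. a negative travel time
                issues.append(f"record {i}: negative {name} = {rec[name]}")
    return issues

contract = GeoContract(
    question="Travel time (minutes) from each school to its nearest clinic",
    expected_crs="EPSG:26986",                  # NAD83 / Massachusetts Mainland
    output_fields={"school_id", "travel_time_min"},
    required_ops={"to_crs", "sjoin_nearest"},
    forbidden_ops={"astype"},                   # assumption: treat output casts as a shortcut
    nonnegative_fields={"travel_time_min"},
)

# The generated program is only parsed, never executed, during static inspection.
generated = (
    "schools = schools.to_crs(26986)\n"
    "out = schools.sjoin_nearest(clinics)\n"
    "out['travel_time_min'] = out['travel_time_min'].astype(int)\n"
)
print(static_inspect(generated, contract))
# ['forbidden shortcut used: astype']

records = [{"school_id": 1, "travel_time_min": 7.5},
           {"school_id": 2, "travel_time_min": -3.0}]
print(runtime_validate(records, "EPSG:4326", contract))
# ['CRS mismatch: got EPSG:4326, expected EPSG:26986', 'record 1: negative travel_time_min = -3.0']
```

Each returned message is the kind of violation that a repair loop could feed back to the model. Static inspection is cheap and catches missing or forbidden operations before anything runs. Runtime validation catches problems, such as negative travel times, that only appear in the produced output.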