Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction

2026-06-15 • Machine Learning

Machine Learning • Computer Vision and Pattern Recognition

AI summary

The authors developed a new method called SpTGNN to better predict how much organic carbon is in topsoil, which matters for farming and land management. Their approach uses a graph neural network that treats soil samples as nodes connected by three kinds of relationships: spatial proximity, spectral similarity and elevation. They combine Sentinel-1 and Sentinel-2 satellite imagery, elevation data and environmental covariates, fuse these inputs with a mixture-of-experts module, and estimate how uncertain each prediction is. They evaluate on global, African and European versions of a soil carbon dataset, and on the Africa test split their ensemble outperforms a tabular XGBoost baseline. Ablation experiments confirm that each part of the design contributes to accuracy, and the uncertainty estimates reach low calibration error after calibration.
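SpTGNN links samples through three edge types (spatial proximity, spectral similarity, elevation), but the page does not say how each edge set is built. The sketch below is a minimal illustration under assumed rules: each relation connects a node to its `k` nearest neighbours under a relation-specific score (haversine distance for location, cosine similarity of spectral vectors, absolute elevation difference). The function names, `k`, and these neighbour rules are hypothetical, not the authors' construction.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Edge = Tuple[int, int]  # (source, target): messages flow source -> target


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_edges(n: int, score: Callable[[int, int], float], k: int,
              higher_is_closer: bool) -> List[Edge]:
    """Connect every node to its k best-scoring neighbours (no self-loops)."""
    edges: List[Edge] = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: score(i, j), reverse=higher_is_closer)
        edges.extend((j, i) for j in others[:k])
    return edges


def build_hetero_graph(coords, spectra, elevation, k: int = 2) -> Dict[str, List[Edge]]:
    """Hypothetical three-relation graph: one edge list per relation type."""
    n = len(coords)
    return {
        "spatial": knn_edges(n, lambda i, j: haversine_km(coords[i], coords[j]), k, False),
        "spectral": knn_edges(n, lambda i, j: cosine_sim(spectra[i], spectra[j]), k, True),
        "elevation": knn_edges(n, lambda i, j: abs(elevation[i] - elevation[j]), k, False),
    }


if __name__ == "__main__":
    coords = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1)]  # (lat, lon)
    spectra = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0]]     # toy band values
    elevation = [100.0, 900.0, 110.0, 905.0]                       # metres
    graph = build_hetero_graph(coords, spectra, elevation, k=1)
    for relation, edges in graph.items():
        print(relation, edges)
```

Keeping one edge list per relation is what lets a relational attention layer learn separate weights for each edge type. The brute-force neighbour search is O(n²); at tens of thousands of samples a spatial index such as a k-d tree or ball tree would normally replace it.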

soil organic carbon, graph neural network, heterogeneous graph, spatio-temporal data, Sentinel-1, Sentinel-2, Mixture-of-Experts, uncertainty quantification, Moran's I, deep ensembles

Authors

Daniele Mos, Felipe Drummond, Anton Bossenbroek, Soufiane el Khinifri

Abstract

Topsoil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land-use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal information, and grid-based architectures ignore the irregular spatial structure of field measurements. We introduce SpTGNN, a multi-modal spatio-temporal graph neural network that addresses both. SpTGNN represents soil measurements as nodes in a heterogeneous graph with three edge types (spatial proximity, spectral similarity, elevation) and applies relational graph attention to learn separate patterns per relation. A fine-tuned TerraMind encoder extracts node features from Sentinel-2, Sentinel-1 and DEM signals; these are combined with per-sample environmental covariates and learned positional and temporal embeddings. A sparse Mixture-of-Experts module fuses the four streams via top-$k$ routing. Uncertainty is captured by pairing heteroscedastic regression (aleatoric) with deep ensembles (epistemic), and a Moran's $I$ penalty regularizes spatial autocorrelation. We evaluate on a global SOC corpus split into three regional instances ($\sim$49k samples globally, Africa $\sim$26k, Europe $\sim$14k). Our 5-member deep ensemble reports $R^2=0.762$, RMSE $=3.51\pm0.48$ g/kg and MAPE $=22.9\%$ on the Africa test split, improving over a tabular XGBoost baseline; the best single checkpoint reaches a validation $R^2$ of $0.864$. Ablations confirm that the heterogeneous graph, MoE fusion and fine-tuned backbone each contribute substantively, and the ensemble UQ stack achieves a post-calibration expected calibration error (ECE) of $0.031$ (hybrid) and $0.026$ ($\beta$-NLL). To our knowledge, this is the first framework to unify foundation-model feature extraction, heterogeneous graph attention and decomposed uncertainty quantification for SOC estimation.
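The abstract says a sparse Mixture-of-Experts module fuses four streams (which we read as the TerraMind features, environmental covariates, and positional and temporal embeddings) with top-$k$ routing, but it gives no gating details. The sketch below shows generic top-$k$ gating under stated assumptions: the streams are concatenated, a linear gate scores every expert, only the $k$ highest-scoring experts are evaluated, and their outputs are mixed with a softmax renormalized over those $k$ scores. `moe_forward`, `gate_w` and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Dense matrix-vector product: one gate logit per row (expert)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def top_k_gates(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k largest logits and softmax-normalize over those k only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(logits[i] - m) for i in top]
    z = sum(weights)
    return [(i, w / z) for i, w in zip(top, weights)]


def moe_forward(streams: Sequence[Sequence[float]], gate_w: Sequence[Sequence[float]],
                experts: Sequence[Expert], k: int = 2
                ) -> Tuple[Optional[Vector], List[Tuple[int, float]]]:
    """Sparse MoE: only the top-k experts run; others contribute nothing."""
    x = [v for stream in streams for v in stream]  # assumed: concatenate the streams
    gates = top_k_gates(matvec(gate_w, x), k)
    out: Optional[Vector] = None
    for idx, g in gates:
        y = experts[idx](x)
        out = [g * v for v in y] if out is None else [o + g * v for o, v in zip(out, y)]
    return out, gates


if __name__ == "__main__":
    streams = [[1.0], [0.0], [0.0], [0.0]]  # four toy one-dimensional streams
    gate_w = [[2.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
    experts = [lambda x, c=c: [c * sum(x)] for c in (10.0, 20.0, 30.0)]
    fused, gates = moe_forward(streams, gate_w, experts, k=2)
    print(gates, fused)
```

Sparsity here means the third expert is never called for this input, so compute scales with $k$ rather than with the total number of experts. Practical MoE layers usually add a load-balancing term so that routing does not collapse onto a few experts; whether SpTGNN uses one is not stated.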
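The abstract pairs heteroscedastic regression (aleatoric uncertainty) with a 5-member deep ensemble (epistemic uncertainty). A common way to combine them, assumed here, treats the ensemble as a uniform mixture of Gaussians: the predictive mean is the average member mean, the aleatoric variance is the average predicted variance, and the epistemic variance is the spread of the member means, so total variance = aleatoric + epistemic by the law of total variance. The sketch also shows the Gaussian negative log-likelihood and the $\beta$-NLL weighting, in which each sample's NLL is scaled by its predicted variance raised to $\beta$. The function names are hypothetical and the paper's $\beta$ value is not given.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def decompose(members: Sequence[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """members: (mean, variance) predicted by each ensemble member for one sample.

    Returns (predictive mean, aleatoric variance, epistemic variance, total variance).
    """
    mus = [mu for mu, _ in members]
    mean = fmean(mus)
    aleatoric = fmean(var for _, var in members)      # average predicted noise level
    epistemic = fmean((mu - mean) ** 2 for mu in mus)  # disagreement between members
    return mean, aleatoric, epistemic, aleatoric + epistemic


def gaussian_nll(y: float, mu: float, var: float) -> float:
    """Negative log-likelihood of y under N(mu, var), constant term included."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)


def beta_nll(y: float, mu: float, var: float, beta: float = 0.5) -> float:
    """beta-NLL value: var**beta * NLL. During training the var**beta factor is
    treated as a constant (stop-gradient); beta = 0 recovers the plain NLL."""
    return (var ** beta) * gaussian_nll(y, mu, var)


if __name__ == "__main__":
    members = [(3.0, 0.5), (5.0, 0.5), (4.0, 0.2)]  # toy SOC predictions, g/kg and (g/kg)^2
    mean, alea, epi, total = decompose(members)
    print(f"mean={mean:.3f} aleatoric={alea:.3f} epistemic={epi:.3f} total={total:.3f}")
```

Reporting the two variances separately is what makes the uncertainty "decomposed": a large epistemic term flags inputs where the members disagree (for example, regions underrepresented in training), while a large aleatoric term flags inherently noisy measurements.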
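Moran's $I$ measures spatial autocorrelation: with deviations $z_i = x_i - \bar{x}$, spatial weights $w_{ij}$ and $W = \sum_{ij} w_{ij}$, it is $I = \frac{N}{W}\,\frac{\sum_{ij} w_{ij} z_i z_j}{\sum_i z_i^2}$. Positive values mean neighbouring values are similar, negative values mean they alternate, and values near $-1/(N-1)$ indicate no spatial pattern. The abstract does not say what the penalty is applied to or its exact form; the sketch assumes it is computed on the model's residuals over spatial-proximity edges and takes the form $\lambda\,|I|$. `morans_i`, `moran_penalty` and `lam` are hypothetical.

```python
from typing import Sequence, Tuple

Edge = Tuple[int, int]


def morans_i(values: Sequence[float], edges: Sequence[Edge],
             weights: Sequence[float] = ()) -> float:
    """Global Moran's I for values on a graph given as directed (i, j) edges.

    Each edge contributes w_ij * z_i * z_j; unweighted edges default to w_ij = 1.
    """
    n = len(values)
    w = list(weights) if weights else [1.0] * len(edges)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denom = sum(zi * zi for zi in z)
    total_w = sum(w)
    if denom == 0.0 or total_w == 0.0:
        return 0.0  # constant field or no edges: I is undefined, treat as no autocorrelation
    num = sum(wij * z[i] * z[j] for (i, j), wij in zip(edges, w))
    return (n / total_w) * (num / denom)


def moran_penalty(residuals: Sequence[float], edges: Sequence[Edge], lam: float = 0.1) -> float:
    """Hypothetical regularizer: lam * |I| of the residuals (form assumed, not from the paper)."""
    return lam * abs(morans_i(residuals, edges))


if __name__ == "__main__":
    chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]  # symmetric path graph 0-1-2-3
    print(morans_i([1.0, 2.0, 3.0, 4.0], chain))    # smooth trend along the path: positive I
    print(morans_i([1.0, -1.0, 1.0, -1.0], chain))  # alternating values: negative I
```

In a training loop the same expression would be written with tensor operations so the penalty is differentiable; this version only shows the arithmetic. Penalizing $|I|$ of the residuals pushes the model to leave no spatially clustered errors behind.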
