Developing a foundation model for high-resolution remote sensing data of the Netherlands
2026-05-11 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition; Artificial Intelligence
AI summary
The authors created a new computer model that learns from 1.2 million detailed satellite pictures of the Netherlands. Their model combines two types of AI methods to understand both small details and big landscape patterns. It also looks at images taken over time, which helps the model better understand changes in nature and land use. The model works well on tasks in the Netherlands and performs competitively on global tests, even though it uses less data and is smaller than many current models. The authors have shared their code and model online for others to use.
foundation model, satellite imagery, Convolutional Neural Network, Vision Transformer, temporal data, land-cover, remote sensing, representation learning, benchmark datasets, generalization
Authors
Paul Vermeeren, Heysem Kaya
Abstract
We develop a foundation model using 1.2m high-resolution satellite images of the Netherlands. By combining a Convolutional Neural Network and a Vision Transformer, the model captures both high-frequency landscape features, such as fine textures, edges, and small objects, and low-frequency features, such as large terrain structures, elevation patterns, and land-cover distributions. Using temporal data as input, the model learns from broader contextual information across time, which allows it to exploit temporal dependencies such as persistent topographic features, land-cover changes, and seasonal dynamics. These additional constraints reduce feature ambiguity, improve representation learning, and enable better generalization with fewer labeled samples. The foundation model is evaluated on multiple downstream tasks, ranging from use cases within the Netherlands to global benchmark datasets. On the vegetation monitoring dataset of the Netherlands, incorporating temporal information yields clear performance improvements over relying on a single time point. Despite its smaller size and a pretraining set limited to the Netherlands, the model achieves results on global benchmarks that are competitive with state-of-the-art models. These results indicate that the model can learn rich, generalizable representations from limited data while using only a fraction of the parameters of larger state-of-the-art remote sensing models. To maximize reproducibility and reuse, we have made the scripts and the model available on GitHub.
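The abstract does not detail the architecture, so the following is only a minimal NumPy sketch, under our own assumptions, of one common way to combine a convolutional stage with transformer-style attention over a time series of images. A small convolution extracts local (high-frequency) features for each acquisition date, the feature maps are cut into patch tokens tagged with spatial and temporal embeddings, and a single self-attention layer mixes tokens across space and time to capture longer-range (low-frequency) structure. All function names (`conv2d_same`, `patchify`, `self_attention`, `encode_time_series`, `init_params`), shapes, and the random untrained weights are hypothetical; this is not the authors' model and only illustrates the data flow.

```python
import numpy as np


def conv2d_same(x, kernels):
    """Naive 'same'-padded 2D cross-correlation.
    x: (C_in, H, W); kernels: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            # Shifted window aligned with kernel offset (i, j).
            window = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], window)
    return out


def patchify(feat, p):
    """Split (C, H, W) into non-overlapping p x p patches -> (N, C*p*p)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(c_in=4, c_conv=8, k=3, patch=4, dim=32, n_patches=16, max_t=4, seed=0):
    """Random (untrained) weights; sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "conv": rng.normal(0, s, (c_conv, c_in, k, k)),
        "embed": rng.normal(0, s, (c_conv * patch * patch, dim)),
        "pos_emb": rng.normal(0, s, (n_patches, dim)),   # spatial position
        "time_emb": rng.normal(0, s, (max_t, dim)),      # acquisition index
        "wq": rng.normal(0, s, (dim, dim)),
        "wk": rng.normal(0, s, (dim, dim)),
        "wv": rng.normal(0, s, (dim, dim)),
    }


def encode_time_series(frames, params, patch=4):
    """frames: (T, C, H, W) co-registered images of the same tile.
    Returns one pooled embedding vector of size dim."""
    tokens = []
    for t, frame in enumerate(frames):
        local = np.maximum(conv2d_same(frame, params["conv"]), 0.0)  # conv + ReLU
        tok = patchify(local, patch) @ params["embed"]                # (N, dim)
        tok = tok + params["pos_emb"] + params["time_emb"][t]
        tokens.append(tok)
    seq = np.concatenate(tokens, axis=0)                              # (T*N, dim)
    seq = seq + self_attention(seq, params["wq"], params["wk"], params["wv"])  # residual
    return seq.mean(axis=0)                                           # mean-pool


if __name__ == "__main__":
    frames = np.random.default_rng(1).random((3, 4, 16, 16))  # 3 dates, 4 bands
    z = encode_time_series(frames, init_params())
    print(z.shape)
```

Because tokens from all dates share one attention layer, features that persist across acquisitions (such as terrain) and features that change (such as vegetation phenology) can both influence the pooled embedding; adding more dates lengthens the token sequence without changing the output size.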