One country, multiple portraits: representativeness in GPS-based mobility data is source-specific and spatially dependent

2026-06-22 · Computers and Society

AI summary

The authors studied how well anonymised GPS data from mobile phones represent the population of Mexico. They compared two data sources, Facebook and a multi-app aggregator (Veraset), against the 2020 Mexican Population Census across 2,478 municipalities and found that each source is biased differently. Facebook's coverage is higher and more evenly distributed, whereas the multi-app data concentrate users in larger, wealthier and more digitally connected places. Using explainable machine learning, the authors show that digital access and material resources drive bias in the multi-app data, while demographic and population structure drive it for Facebook. Bias is also spatially clustered: neighbouring municipalities show similar levels of over- or under-coverage. The findings provide a basis for adjustments that make mobile phone data more representative in population studies.

GPS-based mobile phone data, population distribution, coverage bias, Facebook data, multi-app aggregator, Mexican Population Census, machine learning, spatial dependence, digital access, population structure
Authors
Carmen Cabrera, Francisco Rowe, Miguel González-Leonardo, Juan Ignacio Vilchis-García, Elisa Omodei, Maribel Hernández-Rosales
Abstract
Anonymised GPS-based mobile phone data are increasingly used to estimate population distribution and human mobility, supporting applications across disaster response, public health, urban planning and migration research. Yet whether these data fairly represent the populations they describe, particularly outside high-income countries, remains poorly understood. We quantify coverage bias for 2,478 municipalities in Mexico by comparing population estimates from a single-platform source (Facebook) and a multi-app aggregator (Veraset) against the 2020 Mexican Population Census. We find that the magnitude and spatial distribution of coverage bias differ substantially across sources. Facebook provides higher and more evenly distributed coverage, whereas the multi-app data concentrate users in larger, wealthier and more digitally connected places. Coverage bias is also spatially structured, with neighbouring municipalities showing similar levels of over- or under-coverage. Using explainable machine learning, we show that digital access and material resources are the dominant drivers of bias for the multi-app data, while demographic and population structure dominate for Facebook. Explicitly modelling spatial dependence improves the performance of statistical models for explaining bias and reveals that an appreciable share of spatial variation remains unexplained by observed covariates. These findings show that coverage bias is source-specific and spatially dependent, and provide a foundation for adjustments that improve the representativeness of mobile phone data in unequal, data-scarce settings.
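To make the two core quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact bias formula or the spatial statistic used, so the sketch assumes (a) coverage is the ratio of observed users to census population per municipality and (b) spatial structure is summarised with a global Moran's I over a binary contiguity matrix. The function names, the toy numbers and the four-municipality adjacency are all hypothetical.

```python
import numpy as np


def coverage(users, census_pop):
    """Share of each municipality's census population seen in a data source.

    Assumed definition: users / census population. The paper may define
    coverage bias differently.
    """
    users = np.asarray(users, dtype=float)
    census_pop = np.asarray(census_pop, dtype=float)
    return users / census_pop


def morans_i(values, weights):
    """Global Moran's I of `values` under a spatial weights matrix.

    Values well above the expectation under no spatial pattern, -1/(n-1),
    indicate that neighbours tend to have similar values.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)


# Hypothetical toy data: four municipalities arranged in a line (0-1-2-3).
census_pop = [1000, 2000, 4000, 8000]
users = [10, 40, 160, 640]
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

cov = coverage(users, census_pop)   # 0.01, 0.02, 0.04, 0.08
i_stat = morans_i(cov, adjacency)   # positive here: similar coverage among neighbours
```

In this toy example coverage rises steadily along the line of municipalities, so neighbours resemble each other and Moran's I is positive. With real data the weights matrix would come from municipal boundaries, and significance would typically be assessed with a permutation test, which this sketch omits.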