AI summary
The authors studied how the temporal resolution of smart meter readings, from 15-minute intervals up to weekly totals, affects the ability to infer household traits such as dwelling size or swimming pool usage. Prediction accuracy drops as the data gets coarser, but it stays about the same within two ranges: between 15 minutes and 1 hour, and between 1 day and 7 days. This suggests that less detailed data can be collected within each range without losing usefulness. They also found that simpler, interpretable features remain competitive with features learned by a CNN autoencoder, and that the XGBoost classifier consistently outperforms the alternatives. Some traits, like dwelling size, can be inferred even from coarse data, while others, like swimming pool usage, need fine-grained timing information. These findings help balance the privacy and usefulness of smart meter data.
Smart meter data, Temporal resolution, Socio-demographic inference, Load profiles, Feature extraction, XGBoost, CNN autoencoder, Data granularity, Privacy-utility trade-off, Time series analysis
Authors
Dejan Radovanovic, Maximilian Schirl, Andreas Unterweger, Günther Eibl
Abstract
Smart meter data can reveal sensitive socio-demographic characteristics of households, raising privacy concerns. While this risk has been demonstrated at fixed granularities, the role of temporal resolution in shaping inference performance remains insufficiently explored. This paper addresses this gap by analyzing how load profiles with granularities from 15 minutes to 7 days affect the predictability of eight socio-demographic attributes in a dataset of 1,589 households over one year. We introduce an evaluation framework where classifiers are trained on year-round data but tested on arbitrary weeks, forcing generalization across seasonal and weekly variations. Our results show three main findings. First, while coarsening granularity reduces predictive accuracy, two plateaus emerge: performance is stable between 15 minutes and 1 hour, and again between 1 and 7 days. This reveals opportunities for data minimization without sacrificing utility. Second, interpretable handcrafted and tsfresh features remain competitive with CNN-based autoencoder embeddings, while XGBoost consistently outperforms alternative classifiers. Third, feature importance analysis highlights differences between static and dynamic attributes: dwelling size can be inferred even from coarse data, whereas swimming pool usage requires fine-grained temporal signals. Overall, our study provides new insights into the privacy-utility trade-off in smart metering, showing how temporal resolution, feature extraction, and classifier choice jointly influence socio-demographic inference.
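To make the notion of coarsening granularity concrete, the following is a minimal sketch of how a 15-minute load profile could be aggregated to the coarser resolutions named in the abstract. It is not the authors' pipeline: the function `coarsen`, the `GRANULARITIES` mapping, and the synthetic week of readings are all hypothetical, and the sketch assumes each reading is the energy consumed in its interval (kWh), so that coarser intervals are obtained by summing.

```python
from typing import Dict, List


def coarsen(readings_kwh: List[float], factor: int) -> List[float]:
    """Sum consecutive interval energy readings into blocks of `factor` readings.

    Assumes each reading is energy per interval (kWh); for power readings,
    an average would be used instead of a sum.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_full = len(readings_kwh) // factor  # a trailing incomplete block is dropped
    return [
        sum(readings_kwh[i * factor:(i + 1) * factor])
        for i in range(n_full)
    ]


# Hypothetical granularities, expressed as multiples of the 15-minute base interval.
GRANULARITIES: Dict[str, int] = {"15min": 1, "1h": 4, "1d": 96, "7d": 672}

# One week of synthetic 15-minute readings (7 * 96 = 672 values of 0.25 kWh each).
week = [0.25] * 672

# The same week represented at each granularity.
profiles = {name: coarsen(week, factor) for name, factor in GRANULARITIES.items()}

for name, profile in profiles.items():
    print(name, len(profile))
```

Under these assumptions, one week shrinks from 672 values at 15-minute resolution to a single value at 7-day resolution, which illustrates how much temporal detail is removed before any features are extracted or a classifier is trained.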