Summary: Definition of 'Good' for Synthetic Smart Meter Data

July 16, 2024

Authors

Centre for Net Zero, Massachusetts Institute of Technology, University of Oxford, Georgia Institute of Technology

Overview

As energy systems electrify and use more variable renewable generation, accurately profiling and actively managing demand is key. Granular demand data is therefore highly valuable to researchers, grid operators and innovators. However, it is often not accessible due to privacy concerns. AI-generated synthetic data can help overcome this and democratise access, but we need to ensure its quality so that it can be used with confidence.

Co-authored by Centre for Net Zero and academics from Massachusetts Institute of Technology, the University of Oxford and Georgia Institute of Technology, this paper proposes a common evaluation framework to benchmark algorithms which generate synthetic smart meter data, drawing inspiration from work already done in areas like health and finance. It applies three tests to synthetic data - fidelity, utility and privacy - to consider whether it meets privacy requirements whilst still being sufficiently accurate for its intended purpose.

Key findings

① There is a trade-off between privacy and fidelity - and, by extension, utility
We recommend a suite of fidelity and utility metrics to account for the unique characteristics of smart meter data. For privacy, the level of protection is based on risk appetite, which depends on the specific use case.

② Differential privacy alone is not enough to guarantee the privacy of smart meter datasets
‘Standard’ attack methods are unsuitable for validating privacy, as they leave a risk of leaks from outliers. We propose improved methods of validating privacy techniques: inject implausible outliers into the training data, then launch privacy attacks on those data points.

③ While increasing dataset size can produce synthetic data of higher fidelity and utility, it does not reduce privacy risk
The need for privacy protections with tailored privacy attacks remains.
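
The outlier-injection idea in finding ② can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names (make_canary, toy_generator, nearest_distance, canary_attack) are hypothetical, the "generator" is a deliberately leaky stand-in that resamples training rows with added noise, and the attack is a simple nearest-neighbour distance check. The point is only to show the shape of the test: plant an implausible profile in the training data, generate synthetic data with and without it, and see whether the synthetic output gives the planted profile away.

```python
import numpy as np


def make_canary(n_readings, rng, spike_kwh=50.0, n_spikes=3):
    """Build an implausible daily profile: zero load with a few extreme spikes."""
    profile = np.zeros(n_readings)
    spikes = rng.choice(n_readings, size=n_spikes, replace=False)
    profile[spikes] = spike_kwh  # far beyond a typical half-hourly household reading
    return profile


def toy_generator(train, n_samples, noise_scale, rng):
    """Hypothetical stand-in for a generative model (NOT differentially private).

    It resamples training rows and adds Gaussian noise, so it memorises by design.
    """
    idx = rng.integers(0, len(train), size=n_samples)
    noise = rng.normal(0.0, noise_scale, size=(n_samples, train.shape[1]))
    return train[idx] + noise


def nearest_distance(target, synthetic):
    """Euclidean distance from the target profile to its closest synthetic profile."""
    return float(np.min(np.linalg.norm(synthetic - target, axis=1)))


def canary_attack(train, canary, n_samples, noise_scale, rng):
    """Return (distance when the canary was in training, distance when it was not)."""
    train_with_canary = np.vstack([train, canary])
    synthetic_in = toy_generator(train_with_canary, n_samples, noise_scale, rng)
    synthetic_out = toy_generator(train, n_samples, noise_scale, rng)
    return nearest_distance(canary, synthetic_in), nearest_distance(canary, synthetic_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy "real" data: 200 households, 48 half-hourly readings each.
    train = rng.gamma(shape=2.0, scale=0.2, size=(200, 48))
    canary = make_canary(48, rng)
    d_in, d_out = canary_attack(train, canary, n_samples=2000, noise_scale=0.1, rng=rng)
    print(f"nearest distance with canary in training:    {d_in:.2f}")
    print(f"nearest distance with canary out of training: {d_out:.2f}")
```

A large gap between the two distances means an attacker could tell that the outlier was in the training set. In practice the toy generator would be replaced by the model under evaluation, and the distance check by whichever membership inference attack is being validated.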

Next steps

① Future work should consider a wider set of use cases. Synthetic data generation models could be trained or fine-tuned on objectives that explicitly capture the utility of the generated data for downstream tasks.

② More research, using relevant data, is needed on attribute inference attacks to quantify the risk of sensitive information (such as income, religion or gender) being revealed by synthetic smart meter data.

③ As synthetic smart meter data becomes more widely available, governments and regulators need to consider how it can be used while addressing privacy and other ethical considerations, either through regulation or appropriate licensing.

④ The energy industry will also need to consider safeguards to limit the possibility of synthetic smart meter data being misused, similar to existing regulations and laws governing the financial services industry in the UK and the US.

Full paper