OpenSynth: new data release

September 05, 2024

Following the release of Centre for Net Zero’s source code for our generative AI model, Faraday, via the OpenSynth project, we are pleased to announce that the model’s synthetic outputs are also now openly accessible via OpenSynth on Zenodo.

OpenSynth is an open data community, originated by CNZ and hosted under The Linux Foundation (LF Energy). It empowers holders of raw smart meter data around the world to generate and share synthetic data, and community members to generate, improve and share algorithms.

The dataset we’ve released via OpenSynth contains 10 million synthetic load profiles generated by Faraday, which was trained on over 300 million smart meter readings from 20,000 Octopus Energy UK households sampled between 2021 and 2022. The profiles are conditioned on labels such as:

Property types: house, flat, terraced, detached, semi-detached etc
Energy performance certificate (EPC) rating: A/B/C, D/E, F/G etc
Low Carbon Technology (LCT) ownership: heat pumps, electric vehicles, solar PV etc
Seasonality: day of the week and month of the year

For more information about Faraday, please refer to the workshop paper that CNZ presented at ICLR 2024, one of the world’s leading machine learning conferences. Whilst work is still underway to build out the backend infrastructure to host the datasets, CNZ has now released the data on Zenodo for public access. The dataset is licensed under Creative Commons Attribution 4.0.

Features of CNZ’s Faraday dataset

The fidelity of Faraday’s output can be seen from the plots above, which use t-distributed stochastic neighbour embedding (t-SNE). t-SNE is a popular technique for reducing the dimensionality of data for visualisation. In Fig 1, we reduced each daily load profile from 48 dimensions (48 half-hourly smart meter readings) to 2 dimensions so that it can be shown on a scatter plot.
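For readers who want to try a similar projection on their own data, here is a minimal sketch using scikit-learn's TSNE. It is not CNZ's plotting code: the stand-in profiles, the make_profiles helper, and the perplexity and seed values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)


def make_profiles(n, rng):
    """Hypothetical stand-in data: n daily profiles of 48 half-hourly readings (kWh)."""
    periods = np.arange(48)
    # Gaussian bump centred on period 36, i.e. around 18:00 (assumed evening peak).
    evening_peak = np.exp(-0.5 * ((periods - 36) / 4.0) ** 2)
    base = 0.1 + 0.5 * evening_peak
    # Multiplicative noise so every household-day differs slightly.
    return base * rng.lognormal(mean=0.0, sigma=0.3, size=(n, 48))


# Placeholders for "real" and "synthetic" profiles; swap in your own arrays.
real = make_profiles(200, rng)
synthetic = make_profiles(200, rng)

# Embed both sets together so they share one 2-D coordinate system.
stacked = np.vstack([real, synthetic])
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(stacked)

real_2d = embedding[:200]
synthetic_2d = embedding[200:]
```

Plotting real_2d and synthetic_2d on the same scatter plot in two colours gives a visual check of the kind described above: if the synthetic data is faithful, the two point clouds should overlap heavily.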

This figure demonstrates that Faraday’s output is highly faithful to the data it was trained on. The plots below show further evidence of this. At the population level, synthetic and real data have similar mean and quantile values (peak consumption). At the individual household level, synthetic profiles closely resemble those of real households.
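A population-level comparison of this kind can be sketched as follows. This is an illustrative check, not the evaluation used for Faraday: the function names are hypothetical, and the 95th percentile is an assumed proxy for peak consumption, since the post does not say which quantiles were compared.

```python
import numpy as np


def summarise(profiles):
    """Per-settlement-period statistics for an (n_profiles, 48) array."""
    profiles = np.asarray(profiles, dtype=float)
    return {
        "mean": profiles.mean(axis=0),
        "median": np.quantile(profiles, 0.5, axis=0),
        # Assumption: the 95th percentile stands in for "peak consumption".
        "p95": np.quantile(profiles, 0.95, axis=0),
    }


def max_abs_gap(real, synthetic):
    """Largest absolute gap between real and synthetic data for each statistic."""
    real_stats = summarise(real)
    synth_stats = summarise(synthetic)
    return {
        name: float(np.max(np.abs(real_stats[name] - synth_stats[name])))
        for name in real_stats
    }
```

Small gaps across all 48 periods would suggest that the synthetic population matches the real one in both typical and high-consumption behaviour.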

As mentioned above, Faraday’s output can also be conditioned on variables such as a household’s property type, energy rating and LCT ownership. The comparison below shows Faraday’s output for households with and without LCTs:

Households with LCTs have median consumption peaking around settlement periods 0-3 (00:00 hrs to 02:00 hrs), as they take advantage of cheap overnight energy prices to charge their electric vehicles.
Households without LCTs have a more traditional load profile, where energy consumption is highest during the evening peak when people return from work.
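The mapping between settlement periods and clock times used above can be sketched as below. The function name is hypothetical, and the zero-based numbering is an assumption inferred from the post's statement that periods 0-3 cover 00:00 to 02:00.

```python
from datetime import time


def settlement_period_window(period):
    """Return (start, end) clock times for a zero-based half-hourly period."""
    if not 0 <= period <= 47:
        raise ValueError("period must be between 0 and 47")
    start_minutes = period * 30
    # The last period of the day ends at midnight, so wrap around.
    end_minutes = (start_minutes + 30) % (24 * 60)
    return (
        time(start_minutes // 60, start_minutes % 60),
        time(end_minutes // 60, end_minutes % 60),
    )
```

Under this convention, settlement_period_window(0) starts at 00:00 and settlement_period_window(3) ends at 02:00, which matches the overnight charging window described for households with LCTs.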

What’s next?

Access to smart meter data is essential to rapid and successful energy transitions, yet access to demand data is highly restricted as a result of privacy protections. Rather than joining industry calls to unlock raw smart meter data through existing mechanisms, by challenging current data regulations and smart meter legislation, OpenSynth is advancing synthetic data as the fastest way to achieve widespread, global access to smart meter datasets.

To date, Faraday has been used by more than 100 alpha testers globally, from universities and industrial partners to governments and regulators. We’re pleased to now release a snapshot of synthetic data produced by Faraday for public access.

For more information on the OpenSynth project, please visit our GitHub repository and subscribe to our mailing list here. This will keep you informed of future data and algorithm updates, and more.