Case study  /  Forecasting

Forecasting energy demand to 2.7%

I tested a dozen forecasting models on 3.5 million records of London electricity use, then combined the best into an ensemble that predicted daily demand to within roughly 2.7 percent. That is accurate enough for a grid operator to get supply ready before a peak.

Role: Analyst, team of three
Context: Graduate project, Notre Dame Mendoza
Data: 3.5M+ records, 5,567 households
Stack: R, ARIMA, neural nets, ensembling

Why this matters, in plain terms

A power grid cannot store much electricity. It has to know how much people will use tomorrow and have it ready, or the lights go out. London has had real blackouts when demand spiked in the cold. The question was simple: can we predict daily electricity use accurately enough to prepare for the peak? If yes, an operator can switch on extra supply before the cold snap, not after it.

01 The challenge

"Accurate enough" is the whole game. A forecast that is roughly right but ten percent off is useless when you are deciding how much backup power to hold in reserve. We needed the error down to low single digits.

02 The data

We joined two public datasets: daily weather for London, and the electricity use of 5,567 households over more than two years, about 3.5 million readings in all. The very first step was not modeling. It was checking whether the past actually tells you anything about the future here. It did.

For the analysts

Tested both series for a random walk; neither was one, so the history carried real signal. Switched from daily sums to daily averages once we saw the panel was a rolling, opt-in sample, because a daily sum moves with the number of households reporting that day. Confirmed strong annual seasonality before model selection.
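A minimal sketch of why the switch from sums to averages matters, written in Python for illustration (the study itself used R). The tuple layout, the function name, and the toy readings are assumptions made for this example, not the project's data or code.

```python
from collections import defaultdict

def daily_sum_and_mean(readings):
    """Return {day: (total_kwh, mean_kwh_per_reporting_household)}.

    `readings` is an iterable of (day, household_id, kwh) tuples,
    a hypothetical schema chosen for this sketch.
    """
    totals = defaultdict(float)
    households = defaultdict(set)
    for day, household_id, kwh in readings:
        totals[day] += kwh
        households[day].add(household_id)
    return {
        day: (totals[day], totals[day] / len(households[day]))
        for day in sorted(totals)
    }

# Toy data (made up): household C joins the panel on day 2.
readings = [
    (1, "A", 10.0), (1, "B", 12.0),
    (2, "A", 10.0), (2, "B", 12.0), (2, "C", 11.0),
]
summary = daily_sum_and_mean(readings)
for day, (total, mean) in summary.items():
    print(day, total, mean)
```

In the toy data the daily sum jumps by half when a third household joins, while the per-household mean stays flat. That is the distortion a rolling panel introduces into sums.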

03 What I tested

Instead of betting on one model, I ran a bake-off and then combined the winners. Think of it like asking a dozen forecasters the same question and then blending the most reliable answers into one.

For the analysts

Baselines (seasonal naive, trailing moving average), classical time series (seasonal ARIMA, ARIMAX, Holt-Winters), regression with trend and seasonality, and a neural network tuned by randomized grid search (125 lookback steps, seven hidden nodes). All scored on the same hold-out with RMSE, MAE, and MAPE.
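The three scores and the two baselines can be sketched in a few lines. This is a Python illustration under stated assumptions, not the study's R code: the season length, the averaging window, and the choice to make the moving-average forecast flat over the horizon are all picked for the example, and the numbers are made up.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the last full season of the history."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

def trailing_moving_average(history, horizon, window):
    """Flat forecast at the mean of the last `window` observations."""
    level = sum(history[-window:]) / window
    return [level] * horizon

# Toy series with a season of length 3 (illustrative only).
history = [10.0, 12.0, 14.0, 10.0, 12.0, 14.0]
holdout = [10.0, 12.0, 16.0]

naive_forecast = seasonal_naive(history, horizon=3, season_length=3)
ma_forecast = trailing_moving_average(history, horizon=3, window=3)

for name, forecast in [("seasonal naive", naive_forecast), ("moving average", ma_forecast)]:
    print(name, rmse(holdout, forecast), mae(holdout, forecast), mape(holdout, forecast))
```

Scoring every candidate against the same hold-out with the same functions is what makes the ranking comparable.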

04 The result

No single method won. The combined model (the ensemble) beat every one of them, predicting daily demand to within about 2.7 percent on average. The margin over the runner-up, the trailing moving average, was narrow: 2.66 against 2.69 percent.
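For the analysts

A sketch of how blending can beat its members. The write-up does not say how the winners were combined, so the equal-weight average below is an assumption, and the forecasts are made-up numbers. Python is used for illustration; the study's models were built in R.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def equal_weight_ensemble(forecasts):
    """Average several forecasts point by point (assumed combination rule)."""
    return [sum(values) / len(values) for values in zip(*forecasts)]

# Toy hold-out: two models that err in opposite directions on each day.
actual = [100.0, 100.0]
model_a = [104.0, 98.0]
model_b = [98.0, 104.0]

blend = equal_weight_ensemble([model_a, model_b])

print("model A:", mape(actual, model_a))
print("model B:", mape(actual, model_b))
print("blend:  ", mape(actual, blend))
```

When the members' errors partly cancel, the average lands closer to the truth than either member. When they all err in the same direction, blending helps much less.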

Every model tested, ranked by error

Average error (MAPE), in percent. Shorter bar is better. Green is the winner.

Ensemble             2.66
Moving average       2.69
ARIMAX               6.03
Holt-Winters         7.96
Seasonal ARIMA       8.31
Seasonal naive       8.35
Linear regression    9.00
Neural network      10.89

What this shows: the combined model made the smallest mistakes, roughly a third of the error of the standard seasonal models (2.66 percent against about 8 percent for Holt-Winters, seasonal ARIMA, and seasonal naive). In plain terms, it was the most trustworthy forecaster in the room.

The forecast against what actually happened

Daily demand across the study period. The forecast line stays close to actual demand, peaking each winter and dipping each summer.

Illustrative of the seasonal pattern the models captured. Exact daily values come from the study's R models.

What this shows: demand rises every winter and falls every summer, and the forecast (green) sits almost on top of what actually happened (gray). That closeness is the whole point.

The bottom line

A few percentage points of forecast error are the difference between a grid that holds and a grid that fails.

05 Did the weather help?

With a strong forecast already in hand, I tested whether adding yesterday's weather (temperature, sunshine, and rain) could sharpen it further. Snow and cloud cover were dropped because they added little.
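For the analysts

A sketch of what "yesterday's weather" means as a model input: each day's demand is paired with the previous day's readings, and a model is fit on those lagged columns. The dictionary keys, function names, and numbers are assumptions for this Python illustration. The one-variable least-squares fit stands in for the study's R models, which used several weather variables.

```python
def lag_weather(days, lag=1):
    """Pair each day's demand with the weather from `lag` days earlier.

    `days` is a date-ordered list of dicts with the hypothetical keys
    'demand', 'temp', 'sun', and 'rain'.
    """
    rows = []
    for i in range(lag, len(days)):
        earlier = days[i - lag]
        rows.append({
            "demand": days[i]["demand"],
            "temp_lag": earlier["temp"],
            "sun_lag": earlier["sun"],
            "rain_lag": earlier["rain"],
        })
    return rows

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Toy data (made up): demand falls as the previous day's temperature rises.
days = [
    {"demand": 18.0, "temp": 2.0, "sun": 1.0, "rain": 0.0},
    {"demand": 19.0, "temp": 6.0, "sun": 3.0, "rain": 2.0},
    {"demand": 17.0, "temp": 10.0, "sun": 5.0, "rain": 0.0},
    {"demand": 15.0, "temp": 4.0, "sun": 2.0, "rain": 1.0},
]

rows = lag_weather(days)
slope, intercept = fit_line(
    [row["temp_lag"] for row in rows],
    [row["demand"] for row in rows],
)
print(len(rows), slope, intercept)
```

The first day drops out because it has no "yesterday", so a lag of one costs one row of training data.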

Weather-driven models, ranked by error

Average error (MAPE), in percent. Shorter is better.

Lagged ensemble             4.26
Lagged linear regression    4.42
Lagged GAM                  4.87
Lagged neural network       6.82

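What this shows: the same lesson held. Blending beats betting on one: the combined weather model came out ahead of every single weather model on its own. Even so, its 4.26 percent did not improve on the 2.66 percent of the ensemble built from demand history.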

06 So what

Demand at this scale is predictable to within a few percent. That gives a grid operator a real, numbers-backed basis to hold the right amount of backup power before winter peaks instead of guessing, and it protects the people downstream from shortages. The deliverable was never the model. It was a decision a planner could make with confidence.

07 The honest caveat

The data showed a gentle downward drift that almost certainly reflects the study's household sample shrinking over time, not a real fall in demand. I flagged it rather than letting it flatter the result. A production version would correct for that before trusting the trend. Saying so is part of the job.
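For the analysts

One way a production version might start on that correction is to measure how much of the panel is reporting each day, and to treat low-coverage days with suspicion before reading any trend from them. This is a Python sketch under assumptions: the function names, the 0.8 threshold, and the household counts are all made up for illustration, and it diagnoses the problem without fixing it.

```python
def panel_coverage(households_per_day):
    """Share of the peak panel size that reported on each day."""
    peak = max(households_per_day.values())
    return {day: count / peak for day, count in households_per_day.items()}

def low_coverage_days(households_per_day, threshold=0.8):
    """Days on which coverage fell below `threshold` (an assumed cutoff)."""
    coverage = panel_coverage(households_per_day)
    return [day for day in sorted(coverage) if coverage[day] < threshold]

# Toy counts (made up): the panel shrinks over three days.
households_per_day = {1: 5000, 2: 4500, 3: 3500}

coverage = panel_coverage(households_per_day)
flagged = low_coverage_days(households_per_day)
print(coverage, flagged)
```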