The statistics of energy prices, part 3
Energy arbitrage, TBx, and the value of long-duration storage
Hello everyone. This is it! It’s the final installment of this three-part series on energy price statistics. I hope you’ve enjoyed the story so far.
Today, we will analyze energy price distributions through the lens of energy storage — a technology of special interest to me. We will develop a statistical estimate for energy storage arbitrage revenue based on energy price mean, volatility, and storage duration, and briefly observe what this means for the economics of long-duration storage.
In case you missed it
In Part 1, we looked at electricity market basics as well as how to characterize energy price distributions. In Part 2, we applied these energy price distributions to problems like estimating power plant generation and profits. Check them out here!
Energy arbitrage basics
Let’s start with the basics. Energy storage assets provide value to the grid by shifting energy from when it is needed less to when it is needed more. In other words, they should charge when energy prices are low, and discharge when prices are high. This is energy arbitrage in a nutshell, and it’s how batteries make money today.
Real-world complications
Energy arbitrage is conceptually easy, but operationally difficult. Real-world energy storage assets — lithium-ion batteries and pumped storage hydro, mostly — have real-world constraints like duration, round-trip efficiency, depth of discharge limits, capacity degradation, and imperfect price foresight. Calculating energy storage arbitrage revenue — even just on historical price data, let alone forecasted prices — requires an optimized battery dispatch model that can accommodate these constraints, and, ideally, co-optimize between multiple energy products like day-ahead energy, real-time energy, and ancillary services.
Simplifying assumptions
Instead of running a computationally intensive battery dispatch model, we can get a quick estimate of storage arbitrage revenue by doing what’s called a TBx calculation.
TBx stands for "top-bottom x", where x is the storage duration in hours. For a 4-hour storage asset, this is called a TB4.[1] Calculating it means summing the prices in the 4 highest-priced hours of the day, then subtracting the sum of the prices in the 4 lowest-priced hours.[2]
Developing the math
Mathematically, TBx can be expressed as:
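Written out with the symbols defined just below, and assuming x/T is a whole number of price intervals (an assumption of this sketch), one consistent form is:

```latex
\[
TB(x) = T \left( \sum_{i=k-x/T+1}^{k} P_i \;-\; \sum_{i=1}^{x/T} P_i \right)
\]
```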
where P is the day-ahead or real-time electricity price in $/MWh sorted in ascending order, x is storage duration, k is the number of price points, T is the price interval in hours (usually 1 hour for day-ahead, 1/12 hours for real-time), and TB(x) is the result in $/MW.
We can visualize this with the day-ahead energy price curve introduced in Part 1.
For November 26, 2024 in ERCOT North, TB4 = $172/MW.
This assumes one charge/discharge cycle per day, and ignores real-world operational constraints. Actual storage revenue is typically 60-80% of TBx.
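As a minimal sketch of the TBx calculation described above, here is what it might look like in Python. The function name `tbx` and the hourly prices are hypothetical, made up purely for illustration (they are not ERCOT data), and the sketch keeps the same simplifications: one cycle per day, perfect foresight, no efficiency losses.

```python
def tbx(prices, duration_hours, interval_hours=1.0):
    """Top-bottom spread in $/MW for one day of prices given in $/MWh."""
    # Number of price intervals spent charging (and, separately, discharging)
    n = round(duration_hours / interval_hours)
    if n <= 0 or 2 * n > len(prices):
        raise ValueError("charge and discharge windows must both fit in the price series")
    ordered = sorted(prices)      # ascending, as in the definition of P
    top = sum(ordered[-n:])       # the n highest-priced intervals
    bottom = sum(ordered[:n])     # the n lowest-priced intervals
    return interval_hours * (top - bottom)


# Hypothetical hourly day-ahead prices in $/MWh (illustrative only, not market data)
prices = [22, 20, 19, 18, 18, 20, 27, 35, 30, 24, 21, 19,
          18, 18, 20, 24, 31, 48, 72, 65, 44, 33, 27, 24]

tb4 = tbx(prices, duration_hours=4)
print(f"TB4 = ${tb4:.0f}/MW")  # → TB4 = $157/MW
```

For 5-minute real-time prices, the same function can be called with `interval_hours=1/12`.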
Discrete to continuous
How does storage revenue vary with duration and energy price volatility? Well, we can apply our TBx calculation to continuous probability density functions to investigate this relationship, and develop an analytical solution for calculating TBx, similar to the equations we developed in Part 2.
Instead of a discrete top and bottom x hour calculation, we can think about our TBx as taking the top and bottom x/24 fractions of a continuous energy price distribution. For a TB4, the cutoffs would be roughly the 17th percentile (4/24 ≈ 0.17) and the 83rd percentile (1 - 4/24 ≈ 0.83).
This model works, but before we continue, we must understand our assumptions. TBx assumes a daily arbitrage, but our day-ahead energy price distributions represent prices across an entire year. This presents some complications, because a storage asset cycling once per day cannot possibly arbitrage between prices in January and July. To account for this, we’ll make the simplifying assumption that this lognormal price distribution also applies to each day within a given year, and develop an expression for the daily average TBx.
The derivation
All we need to do is calculate the areas under the curve as illustrated. Mathematically, this is equal to:
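In symbols, one way to write these two areas is shown here. The leading factor of 24 hours per day is an assumption of this sketch, included so that the result comes out in $/MW-day:

```latex
\[
TB(x) = 24 \left( \int_{P_t}^{\infty} P \, p(P) \, dP \;-\; \int_{0}^{P_b} P \, p(P) \, dP \right)
\]
```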
where P is the day-ahead or real-time energy price, Pt is the energy price defining the top x/24 prices, Pb is the energy price defining the bottom x/24 prices, and p(P) is our energy price distribution, where we’ll assume a lognormal energy price distribution.
First, we must solve for our top and bottom energy prices.
Solving for Pb and Pt as a function of x, we arrive at the following.
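Assuming lognormal parameters μ and σ, and writing Φ for the standard normal cumulative distribution function (notation chosen here for illustration), the percentile conditions and their solutions can be written as:

```latex
\[
\int_{0}^{P_b} p(P)\,dP = \frac{x}{24},
\qquad
\int_{P_t}^{\infty} p(P)\,dP = \frac{x}{24}
\]
\[
P_b = \exp\!\left(\mu + \sigma\,\Phi^{-1}\!\left(\tfrac{x}{24}\right)\right),
\qquad
P_t = \exp\!\left(\mu - \sigma\,\Phi^{-1}\!\left(\tfrac{x}{24}\right)\right)
\]
```

The minus sign in the second solution follows from the symmetry of the normal distribution: the inverse CDF evaluated at 1 - x/24 is the negative of its value at x/24.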
Now, we can plug these solutions into our TBx integral and solve, yielding the result.
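Carrying the integration through with the standard lognormal partial-expectation identity gives a closed form along these lines. It holds under the assumptions of this sketch: lognormal prices with parameters μ and σ, a 24 hours per day scaling so the result is in $/MW-day, and 0 < x ≤ 12. Here Φ is the standard normal cumulative distribution function.

```latex
\[
TB(x) = 24\, e^{\mu + \sigma^{2}/2}
\left[
\Phi\!\left(\Phi^{-1}\!\left(\tfrac{x}{24}\right) + \sigma\right)
-
\Phi\!\left(\Phi^{-1}\!\left(\tfrac{x}{24}\right) - \sigma\right)
\right]
\]
```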
where Φ⁻¹ is the probit function, i.e. the inverse of the standard normal cumulative distribution function, available in Excel as NORM.S.INV(x).
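If you'd rather work in Python than Excel, the standard library's `statistics.NormalDist` provides both the normal CDF and its inverse. The sketch below is one possible implementation of a lognormal TBx under the assumptions used in this post (lognormal prices, one cycle per day, result scaled by 24 hours to $/MW-day). The function names are hypothetical, and `tbx_numeric` is just a brute-force cross-check that averages evenly spaced quantiles; it is not the numerical calculation on market data discussed in the next section.

```python
from math import exp
from statistics import NormalDist

_N = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is the probit function


def tbx_lognormal(x, mu, sigma):
    """Closed-form daily TBx in $/MW-day for lognormal prices, 0 < x <= 12 hours."""
    if not 0 < x <= 12:
        raise ValueError("x must be in (0, 12] hours")
    z = _N.inv_cdf(x / 24)                 # probit of the bottom price fraction
    mean_price = exp(mu + sigma**2 / 2)    # mean of the lognormal distribution
    return 24 * mean_price * (_N.cdf(z + sigma) - _N.cdf(z - sigma))


def tbx_numeric(x, mu, sigma, n=24_000):
    """Cross-check: top minus bottom x/24 of n evenly spaced lognormal quantiles."""
    quantiles = [exp(mu + sigma * _N.inv_cdf((i + 0.5) / n)) for i in range(n)]
    m = round(n * x / 24)                  # number of quantiles in each tail
    return 24 * (sum(quantiles[-m:]) - sum(quantiles[:m])) / n


# Lognormal fit parameters quoted in the footnotes
mu, sigma = 3.01, 0.44
print(tbx_lognormal(4, mu, sigma), tbx_numeric(4, mu, sigma))
```

The two printed values should agree closely, which only confirms the algebra, not the quality of the lognormal assumption.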
Theory vs. reality
How does our TBx analytical solution compare with numerical calculations?
Well, I’m not going to sugarcoat it. It’s pretty terrible. Our analytical and numerical curves are nearly two orders of magnitude apart. This is because our lognormal distribution assumption and corresponding fitted parameters[3] severely underestimate volatility, as previously mentioned in Part 1 and Part 2. In particular, we are underfitting high energy prices, which are the prices that have the biggest effect on TBx.
Still, our solution can be helpful for fitting real data. Here’s what it looks like when we fit μ and σ to our numerical TBx data.
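For anyone who wants to try a fit like this themselves, here is a minimal sketch using only the Python standard library. It is not necessarily the fitting method used for the chart: it assumes the closed-form lognormal TBx with a 24 hours per day scaling, and it swaps in a simple grid search over σ with a closed-form least-squares scale at each step. The function names are hypothetical, and the "observed" values are synthetic, generated from known parameters to show that the fit recovers them.

```python
from math import exp, log
from statistics import NormalDist

_N = NormalDist()


def shape(x, sigma):
    """TBx per unit of mean price: 24 * [Phi(z + sigma) - Phi(z - sigma)]."""
    z = _N.inv_cdf(x / 24)
    return 24 * (_N.cdf(z + sigma) - _N.cdf(z - sigma))


def fit_mu_sigma(durations, tbx_values, sigma_grid=None):
    """Least-squares fit of (mu, sigma) to TBx values via a grid search over sigma."""
    sigma_grid = sigma_grid or [0.01 * k for k in range(1, 501)]  # 0.01 to 5.00
    best = None
    for sigma in sigma_grid:
        g = [shape(x, sigma) for x in durations]
        # For a fixed sigma, the best scale exp(mu + sigma^2 / 2) is a
        # one-parameter linear least-squares problem with a closed-form answer.
        scale = sum(y * gi for y, gi in zip(tbx_values, g)) / sum(gi * gi for gi in g)
        sse = sum((y - scale * gi) ** 2 for y, gi in zip(tbx_values, g))
        if best is None or sse < best[0]:
            best = (sse, sigma, scale)
    _, sigma, scale = best
    return log(scale) - sigma**2 / 2, sigma


# Synthetic "observed" TBx values generated from known parameters (not market data)
true_mu, true_sigma = 3.0, 1.2
durations = [1, 2, 4, 6, 8, 12]
observed = [exp(true_mu + true_sigma**2 / 2) * shape(x, true_sigma) for x in durations]

mu_hat, sigma_hat = fit_mu_sigma(durations, observed)
print(mu_hat, sigma_hat)
```

With real TBx data you would replace `observed` with your own daily-average TBx values by duration, and a finer grid or a proper optimizer would tighten the estimate of σ.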
It’s remarkably accurate!
Our fit parameters now carry less physical meaning, but I hope you agree that this is still useful for developing an understanding of how storage assets generate revenue.
Long-duration storage and diminishing marginal returns
Finally, our equation also provides some insight into the value of long-duration storage and the concept of diminishing marginal returns.
Specifically, we can analyze how storage revenue (TBx) varies with duration (x) by taking the derivative with respect to x.
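As a sketch of that derivative, assume the lognormal closed form TB(x) = 24 e^{μ+σ²/2} [Φ(z+σ) - Φ(z-σ)] with z = Φ⁻¹(x/24), where Φ is the standard normal CDF and the 24 hours per day scaling is an assumption made here. Applying the chain rule, the normal densities cancel neatly and the marginal revenue collapses to the spread between the two threshold prices:

```latex
\[
\frac{d\,TB(x)}{dx}
= P_t(x) - P_b(x)
= e^{\mu}\left(e^{-\sigma z} - e^{\sigma z}\right),
\qquad
z = \Phi^{-1}\!\left(\tfrac{x}{24}\right)
\]
```

At x = 12 hours, z = 0, so the two thresholds meet at the median price and the marginal revenue is zero.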
Marginal revenue is reported in $/MWh-day, because this is the additional $/MW-day you get from adding one more hour of storage duration. As duration increases, revenue (TBx) increases as well, which makes sense. But the rate of increase slows as duration grows, ultimately reaching $0/MWh-day at x = 12 hours. These diminishing marginal returns are inherent to the shape of our probability distribution.
The difference between our high and low prices decreases as we increase our duration. Once we reach 12 hours, we run out of room to arbitrage any further, explaining the diminishing returns to $0/MWh-day.[4] Neat!
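To see the diminishing returns numerically, here is a small sketch under the same assumptions as before: lognormal prices, one cycle per day, and a 24 hours per day scaling. The function names are hypothetical. It evaluates the marginal revenue as the spread between the top and bottom threshold prices, which can be cross-checked against a finite difference of the closed-form TBx.

```python
from math import exp
from statistics import NormalDist

_N = NormalDist()


def tbx_lognormal(x, mu, sigma):
    """Closed-form daily TBx in $/MW-day for lognormal prices, 0 < x <= 12 hours."""
    z = _N.inv_cdf(x / 24)
    return 24 * exp(mu + sigma**2 / 2) * (_N.cdf(z + sigma) - _N.cdf(z - sigma))


def marginal_revenue(x, mu, sigma):
    """dTB/dx in $/MWh-day: top threshold price minus bottom threshold price."""
    z = _N.inv_cdf(x / 24)
    p_top = exp(mu - sigma * z)      # price at the (1 - x/24) quantile
    p_bottom = exp(mu + sigma * z)   # price at the x/24 quantile
    return p_top - p_bottom


# Lognormal fit parameters quoted in the footnotes
mu, sigma = 3.01, 0.44
for x in (1, 2, 4, 8, 12):
    print(x, round(marginal_revenue(x, mu, sigma), 2))
```

The printed values shrink as x grows and reach zero at 12 hours, where the top and bottom thresholds both equal the median price.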
If you’re interested in reading more about diminishing marginal returns in energy and capacity markets and what this means for the future of long-duration storage, I recommend you check out NREL’s 2023 report on the subject. You could also subscribe to our research at Wood Mackenzie :)
Closing thoughts
Well, that’s all, folks! I hope you’ve enjoyed this three-part series on energy price statistics! Besides energy storage, we’ve covered so much.
Together, we’ve looked at:
Electricity markets 101 and why energy prices are inherently volatile
How to characterize energy markets using probability distributions
How to incorporate energy price statistics into economic calculations like power plant dispatch, revenues, profit, and energy storage arbitrage
We’ve also learned that, while the lognormal distribution assumption is helpful in understanding the basics of energy price statistics and power plant modeling, there are severe limitations that come with it, namely the inability to capture negative prices and an insufficiently long tail that fails to capture the most lucrative generation periods.
Future work could probably build upon these basic lognormal distribution assumptions and determine more sophisticated distributions that better fit these energy price distributions, which are evolving over time as renewable penetration increases.
And last but not least, I’d like to extend my gratitude and say that the support has just been amazing. I’m glad my thoughts seem to have resonated with those in the energy & data science communities, and I look forward to continuing the conversation in the future.
Thank you all for reading.
[1] I’m surprised there’s no Wikipedia article on this yet! But you can find these calculations in financial derivatives like the CAISO SP15 day-ahead TB4 I linked.
[2] Ignoring units, I hope you appreciate that this calculation doesn’t have anything to do with energy — this is purely a statistical metric that can be applied to any distribution, energy or otherwise!
[3] A bit of an aside I’m going to include for thoroughness: The lognormal fit parameters of μ = 3.01 and σ = 0.44 come from a least-squares fit on the observed day-ahead energy price probability distribution for ERCOT Houston in 2024. The actual values of this dataset, in log scale, are μ = 3.34 and σ = 4.09. If we use these values as our inputs for our TBx formula, we’d severely overestimate revenues. This suggests the lognormal fit cannot accommodate the full nature of this price distribution.
[4] Assuming one cycle per day.