April 14, 2025

Imagine you’re an electricity grid operator dispatching resources to meet demand. How would you know the resources performed as expected? With generation resources, it’s simple: you measure the electricity they inject into the grid. But it’s more complicated with demand response (DR) resources, which reduce load rather than generate power. You can measure a customer’s actual load during a DR event, but you also need to know what their “baseline” load would have been if the event hadn’t been called.

How can you determine what a customer’s load would have been? For most of the DR industry’s history, the answer was to compare that customer to itself, using recent days’ load profiles as a baseline. This worked well when DR resources were mostly limited to large industrial users with consistent load profiles who were dispatched in emergency conditions. However, it begins to break down with the new generation of distributed energy resources (DERs) that are being rapidly deployed across the country.
The Beef With Backwards-Looking Baselines
Distributed batteries (oftentimes paired with solar), EVs, and smart thermostats don’t need to limit their participation to emergency events. However, their ability to dispatch more frequently is hindered by current DR performance measurement methods.
If a DR resource dispatches every day, there is no historic “baseline” to compare its performance against, even though it is taking deliberate, daily action to participate. This can even create a perverse incentive for customers to leave their technology idle on some days simply to show that they can reduce load on others.
These distortions can be significant. In California, DR providers select five of the last 45 days before a DR event (ten days for commercial and industrial, or C&I, customers), then use that customer’s average load over the selected days as the baseline. If a technology reduces load every day, each of those days already reflects the reduction, so the baseline is artificially low and the event-day reduction measured against it shrinks accordingly. In practical terms, this puts daily energy savings and DR baselines at odds with each other and creates an incentive not to use a technology to reduce load every day, leaving resources that are otherwise ready to perform under-utilized.
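The lookback mechanism can be sketched in a few lines of Python to show how daily use erodes the baseline. This is a minimal illustration, not California's actual tariff logic: the selection rule (take the highest-load non-event days in the window), the single event-hour load value per day, the function names, and the 5.0 kW / 2.0 kW figures are all hypothetical. It compares a customer who discharges a battery every day with one who leaves the battery idle except during events.

```python
from statistics import mean


def lookback_baseline(history_kw, event_days, n_select=5, lookback=45):
    """Average load over n_select days picked from the last `lookback` days.

    history_kw: one load value (kW) per past day, oldest first, all for the
    same hour as the event. event_days: indices into history_kw that were
    DR event days. ASSUMPTION: the n_select highest-load non-event days are
    selected; real program rules differ in their details.
    """
    start = max(0, len(history_kw) - lookback)
    eligible = [history_kw[i] for i in range(start, len(history_kw)) if i not in event_days]
    if len(eligible) < n_select:
        raise ValueError("too few non-event days to build a baseline")
    return mean(sorted(eligible, reverse=True)[:n_select])


def measured_reduction(baseline_kw, event_load_kw):
    # Performance credited to the resource: baseline minus metered event load.
    return baseline_kw - event_load_kw


UNDERLYING_KW = 5.0  # hypothetical load with no battery discharge
BATTERY_KW = 2.0     # hypothetical battery discharge during the event hour

# Customer A discharges the battery every day for bill savings.
history_a = [UNDERLYING_KW - BATTERY_KW] * 45
# Customer B leaves the battery idle except during DR events.
history_b = [UNDERLYING_KW] * 45
event_load = UNDERLYING_KW - BATTERY_KW  # both discharge during the event

for name, history in [("A (daily use)", history_a), ("B (idle)", history_b)]:
    baseline = lookback_baseline(history, event_days=set())
    print(name, baseline, measured_reduction(baseline, event_load))

# If every recent day was a dispatch day, no eligible days remain at all.
try:
    lookback_baseline(history_a, event_days=set(range(45)))
except ValueError as err:
    print("no baseline:", err)
```

Customer A is credited with no reduction even though its battery delivered the same 2.0 kW during the event as customer B's. Flagging every day as an event day leaves nothing to average, so the function raises instead of returning a baseline.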

The Inaccessible Benefits of Control Group Baselines
Another option is for DR providers to measure participants’ performance via comparisons to similar customers. This is commonly called a “control group” baseline methodology, because it uses a group of non-participating customers to predict what a DR customer’s load would have been without a DR event. In theory, this approach could establish a baseline for a DR resource regardless of how often that particular resource was dispatched in the preceding days.
Advances in data science make it possible to create finely matched control groups for DR participants based on climate zone, technology type, and average load outside of DR events. When a DR event is called, grid operators would set up a control group for that day as well, establishing a clear, highly accurate counterfactual for DR participants’ load. A 2017 CAISO report on baseline accuracy concluded that “control groups consistently outperformed day and weather matching baselines.”
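A minimal sketch of the matching step, assuming a simple nearest-neighbour rule: filter non-participants to the same climate zone and technology type, keep the k whose average non-event load is closest to the participant's, and average their load during the event window. The `Meter` record, its field names, k=3, and the sample values are hypothetical; production control-group designs may use more sophisticated statistical matching.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Meter:
    meter_id: str
    climate_zone: str
    technology: str       # e.g. "thermostat" or "battery"
    avg_load_kw: float    # average load outside DR events
    event_load_kw: float  # metered load during the event window


def control_group_baseline(participant, non_participants, k=3):
    """Estimate a participant's counterfactual event load from matched non-participants.

    ASSUMPTION: match on identical climate zone and technology type, then take
    the k non-participants with the closest average non-event load.
    """
    pool = [
        m for m in non_participants
        if m.climate_zone == participant.climate_zone
        and m.technology == participant.technology
    ]
    if len(pool) < k:
        raise ValueError("not enough comparable non-participants")
    matched = sorted(pool, key=lambda m: abs(m.avg_load_kw - participant.avg_load_kw))[:k]
    return mean(m.event_load_kw for m in matched)


participant = Meter("p1", "CZ10", "thermostat", avg_load_kw=2.0, event_load_kw=1.2)
non_participants = [
    Meter("n1", "CZ10", "thermostat", 1.9, 3.0),
    Meter("n2", "CZ10", "thermostat", 2.1, 3.2),
    Meter("n3", "CZ10", "thermostat", 2.05, 3.1),
    Meter("n4", "CZ10", "thermostat", 4.0, 5.0),  # much larger load: not among the k closest
    Meter("n5", "CZ3", "thermostat", 2.0, 9.9),   # different climate zone: filtered out
]
baseline = control_group_baseline(participant, non_participants)
reduction = baseline - participant.event_load_kw
print(f"baseline {baseline:.2f} kW, reduction {reduction:.2f} kW")
```

Note that the event-day loads of the matched non-participants must be available promptly for this to work at settlement time, which is exactly the data-access problem described next.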
Despite this, almost no DR providers use control group methodologies, because they require access to large-scale non-participant data, which is difficult to obtain. The California Energy Commission (CEC) has access to anonymized customer data and could provide it for this purpose, but that data arrives months after the meter is read, too late for timely performance measurement or market settlement. Unless this time lag is reduced significantly, control group methodologies will remain difficult to use in practice.
Prescriptive Baselines: The Promising Middle Ground
Prescriptive baselines are exactly what the name implies: baselines that are “prescribed” by DR program operators based on a pre-established analysis of what a particular customer’s load would be. Their benefit is that they significantly reduce the complexity of performance calculations. The downside is that they are often rudimentary in design, making them blunt tools that sacrifice accuracy for the sake of simplicity.
However, by building prescriptive baselines with some of the more nuanced techniques developed for control group baselines, we can split the difference between the two options and overcome the shortcomings of each.
Grid operators evaluating DR performance in wholesale markets could pre-construct baselines by collecting historic meter data from non-participating customers. During DR events, participating customers would have their event load compared to the historic load of similar customers under similar conditions. For example, a smart thermostat’s response to a DR dispatch on a 100-degree day could be measured against the typical load of similarly sized customers at the same temperature over the past few years.
In other words, grid operators would use a control group baseline methodology but set up those control groups in advance. This approach avoids the need for day-of control groups while allowing performance to be measured regardless of dispatch frequency. More importantly, it opens up a greater role for centralized data repositories: because the baselines are built from historic data, a lag of several months between meter reads and data delivery no longer stands in the way of settlement.
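The pre-built lookup can be sketched as a table keyed by customer size class and temperature bin, computed once from historic non-participant data and then consulted at settlement time. The size classes, the 5-degree bins, the plain mean, the function names, and the sample records are all hypothetical choices for illustration; an actual program would define its own segmentation and statistics.

```python
from collections import defaultdict
from statistics import mean


def temp_bin(temp_f, width=5):
    # Map a temperature to the lower edge of its bin, e.g. 102 F -> 100 F.
    return int(temp_f // width) * width


def build_prescriptive_table(history):
    """Pre-compute baselines from historic non-participant meter data.

    history: iterable of (size_class, temp_f, load_kw) records, one per
    customer-hour. Returns {(size_class, temp_bin): mean load in kW}.
    ASSUMPTION: a simple mean per bucket; real programs may model this differently.
    """
    buckets = defaultdict(list)
    for size_class, temp_f, load_kw in history:
        buckets[(size_class, temp_bin(temp_f))].append(load_kw)
    return {key: mean(loads) for key, loads in buckets.items()}


def event_performance(table, size_class, temp_f, event_load_kw):
    # Baseline comes from the pre-built table; no recent-day history or
    # day-of control group is needed, so dispatch frequency does not matter.
    baseline = table[(size_class, temp_bin(temp_f))]
    return baseline - event_load_kw


# Built once, ahead of time, from (possibly months-old) non-participant data.
table = build_prescriptive_table([
    ("medium", 100, 4.0),
    ("medium", 102, 4.4),
    ("medium", 101, 4.2),
    ("medium", 85, 2.0),
    ("large", 100, 7.0),
])

# A medium-sized thermostat customer dispatched on a 101-degree day.
print(event_performance(table, "medium", 101, event_load_kw=2.5))
```

Because the table depends only on historic non-participant data, it can be rebuilt whenever new data arrives; a delay in that data postpones the next refresh rather than blocking settlement.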

Over the past few years, new DR programs have used a version of this approach for specific technology types. For example, the CEC’s Demand Side Grid Support (DSGS) program uses a prescriptive baseline for batteries, which the CEC updates regularly as new data becomes available. In Massachusetts, the Clean Peak Standard (CPS) program takes a similar approach for EVs, based on historic data on average EV demand in the state. These programs simplify performance measurement and have gained traction with DR providers.
Towards a Better Baseline Paradigm
This approach may require some adjustment in thinking for grid operators, who are used to knowing exactly how everything on the grid is performing at all times. However, this level of granularity is not necessary for DER-based DR resources, whose grid impact is only significant in the aggregate. Prescriptive baselines built on average customer load are appropriate for these resources, which are themselves made up of hundreds or thousands of individual meters: over- and under-estimates for individual customers tend to offset each other across the portfolio.
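A back-of-the-envelope way to see why aggregate accuracy is what matters, under simplifying assumptions that are not drawn from any specific program: suppose each of N meters has a baseline error e_i that is independent of the others, with mean zero and standard deviation σ, and let μ be the average per-meter load.

```latex
\begin{aligned}
E_{\text{agg}} &= \sum_{i=1}^{N} e_i,
  \qquad \mathbb{E}[e_i] = 0,\quad \operatorname{Var}(e_i) = \sigma^2 \\
\operatorname{SD}(E_{\text{agg}}) &= \sqrt{\textstyle\sum_{i=1}^{N} \sigma^2} = \sigma\sqrt{N} \\
\frac{\operatorname{SD}(E_{\text{agg}})}{N\mu} &= \frac{\sigma}{\mu\sqrt{N}}
\end{aligned}
```

For example, a 30% per-meter error (σ/μ = 0.3) shrinks to about 0.3/√1000 ≈ 0.95% across 1,000 meters. Errors that are correlated across customers, such as a systematic bias on unusually hot days, do not cancel this way, so the baseline still needs to be unbiased on average for the population it covers.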
State agencies like the CEC and the Massachusetts Clean Energy Center (which implements the CPS program) are already showing how this can be done for specific devices. These types of organizations are well-positioned to take the next steps in developing and testing similar baselines for a broader set of resources. Grid operators could then adopt these baselines, or construct their own, to evaluate DR participation in wholesale markets, allowing them to scale up DR and call on it to support grid reliability more often than current systems allow.