
Building Qualifications Test Reveals Wide Applicability of CalTRACK Method for Portfolio Analysis

3/29/2018

Week Eight CalTRACK Update
Today we had an exciting working group meeting focused on Building Qualifications with test results and recommendations.  We will be stepping into hourly methods in the upcoming week.  
3/29 CalTRACK Working Group Meeting Recording
Daily and Billing Period Methods Specifications:
The final comments on daily and billing period methods have been received. The new, finalized specifications are currently being updated; you can track their finalization on GitHub Issue #82. Participants can review the final specifications, and the issues that fed into them, on this site prior to publication.
Building Qualifications:
The CalTRACK methods are equipped to normalize for weather’s effect on a building’s energy consumption. The methods become unreliable when a building’s energy consumption patterns are instead correlated with non-weather variables. For example, irrigation pumps are used on an agricultural cycle rather than in response to temperature changes. It is reasonable to expect higher energy consumption during the growing season and an analyst may suggest re-specifying the model to control for these seasonal variations. Without modification, existing billing and daily CalTRACK models cannot accommodate these nuanced cases. For this reason, buildings that are not well-specified by the CalTRACK model should be identified and removed from portfolios because they have significant effects on portfolio uncertainty.  This issue can be tracked on GitHub Issue #71.
Building Qualification Metric:
To evaluate a building's qualification status, a metric and a threshold should be defined. After empirical testing, the recommended metric for CalTRACK 2.0 is CV(RMSE); a reference definition follows the justification list below.

Justification:
  • The properties of the CV(RMSE) metric penalize buildings with outlier energy use. Because such buildings have a significant effect on portfolio uncertainty, it is recommended that they be eliminated from portfolios, and CV(RMSE) helps flag them for removal.
  • CV(RMSE) is not sensitive to individual usage values that are close to zero. This makes it more robust than MAPE and similar metrics.
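For reference, CV(RMSE) is commonly defined as the root-mean-square error of the model's predictions normalized by the mean of the observed usage (conventions differ on the degrees of freedom; ASHRAE Guideline 14 uses n − p in the denominator):

$$\mathrm{CV(RMSE)} = \frac{100}{\bar{y}}\sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n - p}}$$

where $y_i$ is the observed usage, $\hat{y}_i$ the model prediction, $\bar{y}$ the mean observed usage, $n$ the number of observations, and $p$ the number of model parameters.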
Building Type:
Building type can be a useful identifier for aggregators to determine a building's suitability for a portfolio. Figure 1 shows the relationship between energy consumption and CV(RMSE) by building type. The size of each dot corresponds to the number of meters for that building type.
[Figure 1: CV(RMSE) versus energy consumption by building type]
The CalTRACK methods will be most effective for buildings in region A, which have relatively low energy consumption and low CV(RMSE). Buildings in region B are high energy consumers. These buildings often have a single meter tracking consumption for multiple sub-buildings with mixed uses, which makes it difficult to quantify the effect of an energy-efficiency intervention on overall consumption. These buildings will likely require custom M&V and not qualify for CalTRACK. The buildings in region C have high CV(RMSE), likely due to correlation in energy usage that is not specified in the model, such as seasonality. These buildings should not qualify for CalTRACK.
CV(RMSE) Threshold for Building Qualification:
Due to differences in model quality, portfolio size, and building type between aggregator datasets, it is difficult to establish a universally applicable building-level CV(RMSE) cut-off. The graphs below visualize the relationship between building-level CV(RMSE), portfolio uncertainty, and building attrition. The various graphs show these relationships with different building types and portfolio sizes. While analyzing the graphs, consider that procurers and aggregators tend to be more concerned with portfolio-level uncertainty and building attrition than the building-level CV(RMSE), especially for pay-for-performance programs and Non-Wires Alternatives procurements.
[Figure: building-level CV(RMSE) threshold versus portfolio uncertainty and building attrition, by building type and portfolio size]
The results show that a strict building-level threshold of 25% CV(RMSE) results in low portfolio uncertainty but significant building attrition, and that the results vary with portfolio size and building type. From an aggregator's perspective, it may be preferable to adopt a less restrictive building-level CV(RMSE) threshold and focus on minimizing building attrition subject to a strict portfolio uncertainty threshold.
Recommendations:
We are recommending that the specific building eligibility requirements be generally left to the procurer, who can set the requirements that align best with their goals for a procurement, provided these are specified clearly upfront. CalTRACK can provide general guidelines as follows.
  • For use cases where confidence in portfolio-level performance is required (e.g. aggregator-driven pay-for-performance, non-wires alternatives (NWA) procurements), we recommend using a permissive building-level CV(RMSE) threshold (100% is recommended) while requiring that a portfolio-level metric be respected (e.g. weighted mean CV(RMSE) or portfolio fractional savings uncertainty). The portfolio-level threshold is a policy decision and may differ depending on the use case (e.g. an NWA procurement may require less than 15% uncertainty, while a regular pay-for-performance program may require 25% to align with ASHRAE Guideline 14). A sketch of this two-level screen follows the list.
  • For use cases where confidence in individual building results is required (e.g. customer-facing performance based incentives), ASHRAE Guideline 14 thresholds may be used.
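To make this two-level screen concrete, here is a minimal Python sketch (the function and column names are illustrative assumptions, not part of the CalTRACK specification): it drops only clear building-level outliers and then checks a consumption-weighted portfolio-level CV(RMSE) threshold.

```python
import pandas as pd

def screen_portfolio(buildings: pd.DataFrame,
                     building_cvrmse_max=1.00,    # permissive 100% building-level cut-off
                     portfolio_cvrmse_max=0.25):  # policy-dependent portfolio threshold
    """Apply a two-level qualification screen.

    `buildings` is assumed to have one row per building with columns
    'cvrmse' (as a fraction) and 'annual_kwh' (used as the weight).
    """
    # Building-level screen: remove only clear outliers.
    qualified = buildings[buildings["cvrmse"] <= building_cvrmse_max].copy()
    attrition = 1 - len(qualified) / len(buildings)

    # Portfolio-level screen: consumption-weighted mean CV(RMSE).
    weighted_cvrmse = ((qualified["cvrmse"] * qualified["annual_kwh"]).sum()
                       / qualified["annual_kwh"].sum())
    passes = weighted_cvrmse <= portfolio_cvrmse_max
    return qualified, attrition, weighted_cvrmse, passes
```

A procurer could tighten `portfolio_cvrmse_max` (for example to 0.15 for an NWA procurement) without touching the permissive building-level cut-off.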
Other Reading Cited in the Working Group Meeting today:
Normalized Metered Energy Consumption Draft Guidance CPUC
ASHRAE Guideline 14
Homework:
  1. Comments on Recommendations for Building Qualifications
  2. Suggestions and recommendations for hourly models
  3. Revisiting other requirements in terms of hourly models
  4. Tests for hourly models
The next CalTRACK working group meeting will be April 12, 2018.

Building Qualification Method Discussions Continue

3/26/2018

Week Seven CalTRACK Update
A quick update for this week and a reminder of the working group meeting:
Thursday, March 29th at 12:00 (PST) in which we will cover:
  1. Final discussion of building qualification methods
  2. Introductory discussion on Hourly Methods
During week seven of CalTRACK, consideration of building qualification methods continued.  This week’s working group meeting will conclude the discussion of building qualification methods and launch the testing period. Comments or test results should be added on GitHub issues early next week to ensure they can be considered before proposals are finalized.

Homework:
  1. Contribute final tests and comments for Building Qualifications on GitHub
  2. Analyze and respond to test results and comments on GitHub
  3. Attend the bi-weekly meeting on Thursday March 29th at 12:00 (PST)


Considering Metrics for Building Qualifications

3/19/2018

Week Six CalTRACK Update
Week six focused primarily on the building qualification discussions, which will continue to be the focus of testing and experimentation this week. This follows an exciting working group meeting on March 15, 2018, linked below.
Recording: March 15, 2018 Working Group Meeting
Review of properties of intercept-only models in PRISM:
As we analyze building qualifications, it is useful to review the properties of PRISM intercept-only models to ensure they are properly treated. Here are a few characteristics of intercept-only models:
Properties:
  • Intercept-only models imply no significant effect of HDD or CDD on energy consumption was detected. Generally, this means that weather did not have a significant effect on the site’s energy consumption
  • In intercept-only models, predicted energy savings are the difference between the current year's energy consumption and the previous year's consumption (see the sketch after this list)
  • Significant temperature-related energy savings are not expected at sites with intercept-only models
Weaknesses:
  • These models are susceptible to poor savings estimates if the previous year was atypical. For example, if a resident did not live in their house for a majority of the previous year, then it may not be a good predictor of energy consumption in the current year
  • Intercept-only models impose an average energy consumption over the entire year. This yearly average may be inappropriate when estimating more granular fluctuations, such as daily or hourly energy consumption
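As a minimal illustration of these properties, an intercept-only baseline predicts the baseline-period mean for every day, so reported savings reduce to the gap between that counterfactual and metered consumption. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

def intercept_only_savings(baseline_daily_kwh, reporting_daily_kwh):
    """Savings implied by an intercept-only baseline model.

    The model's single parameter is the baseline-period mean daily usage;
    the counterfactual is that mean applied to every reporting-period day.
    """
    baseline_mean = np.mean(baseline_daily_kwh)            # intercept
    predicted = baseline_mean * len(reporting_daily_kwh)   # counterfactual consumption
    actual = np.sum(reporting_daily_kwh)
    return predicted - actual                              # positive values indicate savings
```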

Description of Each Proposed Metric: 
During the upcoming week, we will use empirical testing to establish the preferred metric and threshold to determine a building's suitability for CalTRACK methods. The two proposed metrics are described below:
Coefficient of Variation Root-Mean-Square-Error (CVRMSE)
The CVRMSE is calculated by:
  1. Measuring the distance between each predicted value and the corresponding actual value
  2. Squaring each of these distances
  3. Averaging all of the squared distances from (2)
  4. Taking the square root of that average
  5. Dividing by the mean of the actual values, which is the "coefficient of variation" normalization
Because the distances are squared in the CVRMSE before they are averaged, outliers can have a large effect on this metric. In the context of pay-for-performance, we are uncertain whether it is advantageous to choose a metric that is sensitive to outliers. We look forward to seeing test results on this issue.
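A minimal NumPy sketch of these steps (a degrees-of-freedom correction, as used in ASHRAE Guideline 14, is omitted here for brevity):

```python
import numpy as np

def cvrmse(actual, predicted):
    """Coefficient of variation of the RMSE, as a fraction of mean usage."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - actual) ** 2))  # steps 1-4
    return rmse / np.mean(actual)                        # step 5: normalize by mean usage
```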
Mean Absolute Percent Error (MAPE)
The MAPE is calculated by:
  1. Subtracting each observation's actual value from its predicted value
  2. Dividing that difference by the observation's actual value and taking the absolute value
  3. Averaging these absolute ratios from (2) across all observations
  4. Multiplying by 100 to express the result as a percentage
The MAPE is another appealing metric. It is worth noting that the MAPE calculation is undefined if any observation's actual value is zero, because this would require dividing by zero.
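A matching sketch for MAPE; the explicit guard reflects the divide-by-zero weakness noted above:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percent error; undefined when any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual usage value is zero")
    return np.mean(np.abs((predicted - actual) / actual)) * 100
```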

Other Reference Materials on Baseline Models that inform the discussion:
One key question tackled in the Granderson et al. study cited below was: “How can buildings be pre-screened to identify those that are highly model predictable and those that are not, in order to identify estimates of building energy savings that have small errors/uncertainty?”
  • Granderson, J., Price, P., Jump, D., Addy, N., Sohn, M. 2015. Automated Measurement and Verification: Performance of Public Domain Whole-Building Electric Baseline Models. Applied Energy 144:106-133.
In the Southern California Edison study, buildings were sorted into four categories to identify the applicability of the analytical method.
  • Southern California Edison with FirstFuel. February 2016. Energy Efficiency Impact Study for the Preferred Resources Pilot
Suggestions on Testing These Metrics
Remember, our goal for testing is to establish our preferred metric and threshold for building qualification. When testing the CVRMSE and MAPE metrics, we have some suggestions to yield the most informative results:
  1. Test residential, commercial, and industrial buildings separately. This provides information on CalTRACK's performance across different building types
  2. Test intercept-only model performance. This will inform model usage decisions in the future
Non-Routine Adjustments:
Some discussion arose regarding the possibility of making non-routine adjustments for sites that are outliers. CalTRACK 1.0 addressed this issue by stipulating specific criteria for accepting a non-routine adjustment: if savings exceeded +/-50%, either party could appeal to remove the project from the portfolio. Other specific considerations may be related to program eligibility, such as a house that adds solar panels during a performance period. At a general level, CalTRACK methods shy away from stipulating methods for non-routine adjustments, as these tend to demand substantial additional effort and may require additional data, which would run contrary to the premise of using CalTRACK methods in the first place. As CalTRACK 1.0 testing demonstrated, for aggregators of residential projects, larger sample sizes diminish the effect of these outliers.
Participant Homework:
  1. Review the Issues page and plans for testing on building qualifications
  2. Conduct your own tests on relevant questions
  3. Analyze test results as they emerge and comment

Metrics & Model Selection for Building Qualification

3/12/2018

Week Five CalTRACK Update
We look forward to another working group meeting Thursday March 15th.  Here are some updates from the activity last week.  
 Building Qualification Summary:
During the past week, we started discussions to establish an empirical method to determine a building’s suitability for CalTRACK methods. Listed below are two topics related to building qualifications that require testing:
  1. Preferred Metric: The two main metrics being discussed to evaluate a building's suitability for CalTRACK methods are the Coefficient of Variation of the Root-Mean-Square Error (CVRMSE) and the Mean Absolute Percent Error (MAPE). The ideal metric will effectively determine a building's suitability for CalTRACK methods, while also being convenient for other stakeholders in pay-for-performance programs.
  2. Model Selection: Because buildings that are unqualified for CalTRACK methods tend to be fit with intercept-only models, it is proposed that the quality of intercept-only models be evaluated based on the variance of mean usage for daily or billing period data instead of the R-squared criterion.
Some additional areas that require further analysis before our discussion of building qualifications closes:
  1. How do our metrics and methods align with pay-for-performance programs? Do they reflect performance risk and uncertainty?
  2. What should we do with disqualified buildings? What is the best way to accommodate them in pay-for-performance programs?
The discussion of building qualifications is scheduled to close on March 14th, so remember to add proposals, comments, and test results as soon as possible.
Homework for Participants:
  1. Attend the standing, bi-weekly meeting on Thursday, March 15th at 12:00 PST
  2. Review Github and provide feedback on any relevant issues. This will be the last week for suggesting building qualification test plans.

Results from Daily & Billing Period Methods Testing

3/5/2018

Week Four CalTRACK Update
During week 4, we received some interesting results from tests on daily and billing period methods. In this week’s blog post, we analyze the test results and determine their effect on our proposed daily and billing period methods. Additionally, we will introduce the new topic of building qualifications. ​
(Participant Homework can be found at the bottom of this post)
​Test Results:
Weather Station Mapping: 
Because we do not have access to weather data at the location of each site, the best approach for estimating a site’s temperature is to use data from nearby, high-quality weather stations. The most intuitive way to “map” which primary and backup weather stations to use for a site is to simply choose weather stations with the shortest distance to the site. Some argue that this simple method fails to account for the unequal distribution of weather patterns over space. For example, imagine a mountain home is technically closer to a weather station in the desert valley than to another weather station in the mountains. We might expect the house’s weather data to be better approximated by the mountain weather station than the desert valley weather station, despite it being further away. To account for this phenomenon, another proposal is to use pre-defined climate zones throughout states to choose the closest primary and backup weather stations that are within that site’s climate zone.

Two proposals for mapping a site’s  weather stations:
  • Method 1: Choosing the primary and backup weather stations that are closest to the site
  • Method 2: Choosing the primary and backup weather stations that are closest to the site and within the same climate zone

To empirically inform our decision, we ran a simulation for each mapping method in which we used the actual weather stations as units, instead of the sites, and compared each method's accuracy. Our results show that both proposed methods provide very similar results, with Method 1 and Method 2 producing a perfect match in 53% and 56% of cases, respectively. These results indicate that there is no significant accuracy reduction from choosing the simpler Method 1.
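A minimal sketch of the two mapping rules (the station table, its column names, and the distance helper are illustrative assumptions; in practice this logic is handled by weather libraries such as EEweather):

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def rank_stations(site_lat, site_lon, site_zone, stations: pd.DataFrame,
                  require_same_zone=False):
    """Rank candidate weather stations for a site by distance.

    `stations` is assumed to have columns 'lat', 'lon', and 'climate_zone'.
    Method 1: require_same_zone=False (closest stations overall).
    Method 2: require_same_zone=True (closest stations within the site's zone).
    The first row is the primary station, the second row the backup.
    """
    candidates = stations
    if require_same_zone:
        candidates = stations[stations["climate_zone"] == site_zone]
    distance = haversine_km(site_lat, site_lon,
                            candidates["lat"].values, candidates["lon"].values)
    return candidates.assign(distance_km=distance).sort_values("distance_km")
```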


The Importance of Exact Weather Data:
The purpose of our weather station mapping methods is to ensure that each site has the best possible estimate of its "true" weather values. Because there is uncertainty in the estimation of each site's weather values, a natural question follows: "how important is the accuracy of weather data when predicting energy savings?" To answer this question, we ran an empirical experiment that provides some insight.

The Experiment:
  • We took data from 1 electricity meter and 1 gas meter
  • We ran a model using data from climate zone weather station mapping (Method 2)
  • With the same meter data, we ran the same model but used weather data from a set of 2 weather stations for all 50 states in the USA. There is significant weather diversity in the USA, so these results indicate the effect of inaccurate weather on model prediction.
  • We analyzed the results. In the graphs, there is a dot for each model. This includes a dot for each of the 50 states and one dot for climate zone mapping. ​​
[Figure: model results (one dot per model) using weather data from each of the 50 states and from climate-zone mapping]
Results: 
  • Although there are some moderate increases in data error and reduction in model fit that result from adding very inaccurate weather data, our results show that the predicted energy savings are remarkably robust to changes in weather data. This indicates that the accuracy of weather data does not have a significant effect on annual energy savings predictions, even in extreme cases.
  • It would be very useful to see this hypothesis tested with more data

Maximum Baseline Period Length:
There have been discussions about defining a maximum baseline period because excessively long baseline periods may absorb unnecessary variation that could obscure our model predictions. To determine the effect of longer baseline periods, we calculated baselines of 12, 15, 18, 21, and 24 months. The graph below shows that normalized annual consumption (NAC) can be unstable as we increase the baseline period.
[Figure: normalized annual consumption (NAC) for baseline periods of 12, 15, 18, 21, and 24 months]

Recommendation:
  • We recommend using a 12-month baseline, as it is most indicative of the period immediately prior to the intervention (a minimal sketch of this truncation follows this list).
  • It would be reassuring to see these findings confirmed by others in different datasets
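A minimal pandas sketch of limiting the baseline to the 12 months immediately preceding the intervention (the date-indexed daily usage series is an illustrative assumption):

```python
import pandas as pd

def twelve_month_baseline(daily_usage: pd.Series, intervention_date: str) -> pd.Series:
    """Keep only the 12 months of usage immediately preceding the intervention.

    `daily_usage` is assumed to be indexed by calendar date.
    """
    end = pd.Timestamp(intervention_date)
    start = end - pd.DateOffset(months=12)
    return daily_usage.loc[(daily_usage.index >= start) & (daily_usage.index < end)]
```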


Degree Day Balance Points:
A proposed new method for CalTRACK 2.0 is to use variable balance points instead of fixed balance points on the HDD and CDD variables. In the figure below, we can see that buildings tend to cluster at the limits of balance point degree ranges, which implies that some results may be constrained by small search grids. When the degree range is expanded, the results display a distribution that is closer to Gaussian.
[Figures: distribution of fitted balance points for narrow versus expanded search ranges, and the corresponding model fit]
Although expanding the search grid may uncover a balance point that yields a higher R-squared, the figure on the right shows that this has only a nominal impact on model fit. Regardless, variable balance points are advised because they provide better balance point estimates, which have more interpretive value.

Recommended Allowable Ranges (a sketch of the grid search follows):
HDD balance point: 40-80 °F
CDD balance point: 50-90 °F
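A minimal sketch of a variable balance-point search over the recommended ranges, keeping the candidate with the highest adjusted R-squared (statsmodels; the column names and one-degree grid step are illustrative assumptions, not the CalTRACK-specified grid):

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_balance_points(daily: pd.DataFrame,
                       hdd_range=range(40, 81), cdd_range=range(50, 91)):
    """Grid-search HDD/CDD balance points for a daily usage model.

    `daily` is assumed to have columns 'usage' (kWh/day) and 'temp'
    (mean daily temperature in degrees F).
    """
    best = None
    for hdd_bp, cdd_bp in itertools.product(hdd_range, cdd_range):
        X = pd.DataFrame({
            "hdd": np.maximum(hdd_bp - daily["temp"], 0),  # heating degree days per day
            "cdd": np.maximum(daily["temp"] - cdd_bp, 0),  # cooling degree days per day
        })
        fit = sm.OLS(daily["usage"], sm.add_constant(X)).fit()
        if best is None or fit.rsquared_adj > best[0]:
            best = (fit.rsquared_adj, hdd_bp, cdd_bp, fit)
    return best  # (adjusted R-squared, HDD balance point, CDD balance point, fitted model)
```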


Model Selection Criteria
In CalTRACK 1.0, model selection criteria involved:
  1. Filtering out estimators with insignificant p-values (greater than 0.1) and
  2. Choosing the model based on the adjusted R-squared
In CalTRACK 2.0, we intend to eliminate the p-value screen and select models strictly on the adjusted R-squared. We suggest this change because the p-value screen does not improve model fit, we lose valuable information on estimators when we drop them due to high p-values, and the screen eliminates many weather-sensitive model fits.
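Under the proposed rule, selection among candidate model structures reduces to a single comparison. A minimal sketch, assuming each candidate (e.g. intercept-only, HDD-only, CDD-only, HDD+CDD) has already been fit with statsmodels:

```python
def select_model(candidates: dict):
    """Pick the candidate fit with the highest adjusted R-squared.

    `candidates` maps a label (e.g. 'intercept_only', 'hdd_cdd') to a fitted
    statsmodels results object; no p-value screen is applied.
    """
    return max(candidates.items(), key=lambda item: item[1].rsquared_adj)
```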

Handling Billing Data:
Modeling with billing data was underspecified in CalTRACK 1.0. We are proposing to include explicit instructions for modeling billing data with billing periods of different lengths, using weighted least squares regression.
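One plausible way to implement this (a sketch under assumed column names, not the final specification) is to regress average daily usage for each billing period on average daily degree days, weighting each period by its length in days:

```python
import pandas as pd
import statsmodels.api as sm

def fit_billing_model(billing: pd.DataFrame):
    """Weighted least squares over billing periods of unequal length.

    `billing` is assumed to have one row per billing period with columns
    'usage', 'hdd', and 'cdd' (period totals) and 'n_days' (period length).
    """
    per_day = pd.DataFrame({
        "usage": billing["usage"] / billing["n_days"],
        "hdd": billing["hdd"] / billing["n_days"],
        "cdd": billing["cdd"] / billing["n_days"],
    })
    X = sm.add_constant(per_day[["hdd", "cdd"]])
    # Longer billing periods carry proportionally more weight.
    return sm.WLS(per_day["usage"], X, weights=billing["n_days"]).fit()
```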

​New Topics:
Building Qualification:
In this coming week, we will begin our examination of building qualification screening criteria. The CalTRACK methods were initially tested using residential buildings and are currently mainly used to quantify energy savings for residential units and small commercial buildings. The limits of the CalTRACK methods for measuring savings in commercial or industrial buildings (where weather is likely to be a poorer predictor of energy consumption) have been subject to less scrutiny. Our goal is to create empirical tests that determine which energy usage patterns qualify a building for CalTRACK methods and exclude buildings that would be better estimated with different methods. We are looking forward to your input on potential methods and tests to define buildings that qualify for CalTRACK.

Some questions that need to be addressed (and we welcome additional questions):
  • When should we accept intercept only (non-weather based) models? What's a good metric to assess an intercept-only model fit?
  • How do our metrics and methods align with pay-for-performance programs? Do they reflect performance risk and uncertainty? Are they convenient for implementers and aggregators?
  • What should we do with disqualified buildings? Are they eliminated from participating in pay-for-performance programs or is there another way to accommodate them?
  • What metrics and thresholds are useful for assessing baseline model fit in a pay-for-performance context?​

We are looking forward to your input on building qualification in this coming week. There will be a lot to discuss on GitHub Issues. We would like to test proposed building qualification methods empirically before making decisions, so it is important to make method and testing suggestions as soon as possible.

HOMEWORK:
  1. This is the last chance to make suggestions on proposed updates to daily and billing period methods in "FINAL CALL FOR COMMENTS: Monthly and Daily Methods Updates"
    • Post comments or alternative results on GitHub before the issue closes
    • If you have a comment on the final specifications after the issue is closed, address this in a new issue
  2. Post questions, comments, research, or testing ideas on Building Qualifications in GitHub Issues.
