Demystifying Steam: Interactive Map

We found that age is the strongest indicator of whether a building is steam heated: those constructed after 1968 are significantly less likely to use steam. A building’s number of stories, use type, and boiler type are also useful characteristics for predicting heating system type. Explore the results of our analysis in the interactive map below. 

Predicted Steam Heat in Small and Medium Multifamily Properties 

Each block represents an individual zip code, and the color reflects the percentage of steam heated properties within the area. Zip codes with fewer than 10 small and medium multifamily properties appear in gray due to insufficient data.
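
The per-zip-code shading described above could be computed roughly as follows. This is a minimal sketch, not the code behind the map: the function name `steam_share_by_zip`, the input layout, and the sample records are all hypothetical. Only the 10-property cutoff comes from the text.

```python
from collections import defaultdict

MIN_PROPERTIES = 10  # cutoff stated in the text; below this a zip code is shown in gray


def steam_share_by_zip(predictions):
    """Aggregate per-property predictions into a percentage per zip code.

    predictions: iterable of (zip_code, is_steam) pairs, one per property
    (a hypothetical layout chosen for this sketch).
    Returns {zip_code: percent steam heated}, with None for zip codes that
    have fewer than MIN_PROPERTIES properties.
    """
    counts = defaultdict(lambda: [0, 0])  # zip -> [steam count, total count]
    for zip_code, is_steam in predictions:
        counts[zip_code][0] += int(is_steam)
        counts[zip_code][1] += 1
    return {
        z: (100.0 * steam / total if total >= MIN_PROPERTIES else None)
        for z, (steam, total) in counts.items()
    }


# Made-up records for illustration only.
sample = [("10001", True)] * 9 + [("10001", False)] * 3 + [("10002", True)] * 4
shares = steam_share_by_zip(sample)
```

Returning None instead of a percentage keeps "insufficient data" distinct from "0 percent steam", which is what the gray shading is meant to convey.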

The map is best viewed in Google Chrome. If you experience any issues, try clearing your browser cache or viewing the map on CARTO.

Predictive Modeling

Predictive modeling uses data to forecast outcomes by finding correlations between variables. Patterns in labelled data can inform predictions on unlabelled data. Predictive modeling is widely used in many fields, yet its potential for better analyses in the building industry remains largely untapped.


Our model was developed to predict a binary outcome for each building (steam heat or non-steam heat). It is based on heating system type classifications from 2,500 NYC Local Law 87 audits of multifamily buildings of 50,000 square feet or more.

The model used known building characteristics from the NYC tax database, such as year built and building floor area, to identify trends and predict outcomes. We used a random forest model, which combines many decision trees, each built from a random sample of the data.1 We chose this algorithm because it is less prone to the overfitting that can occur with simpler algorithms and with unevenly split sample data.2
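
To make the idea of "many randomly sampled decision trees" concrete, here is a minimal random forest sketch in plain Python. It is an illustration under stated assumptions, not the model used in this analysis: the feature pair (year built, stories), the toy records, and every function name are hypothetical, and the toy classes are deliberately separable on both features, which real building data would not be.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_features, rng, depth=0, max_depth=4):
    """Grow one tree; a leaf is a label string, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Each node only considers a random subset of the features.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    branches = []
    for keep_left in (True, False):
        idx = [i for i, r in enumerate(rows) if (r[f] <= t) == keep_left]
        branches.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   n_features, rng, depth + 1, max_depth))
    return (f, t, branches[0], branches[1])


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def build_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Made-up (year_built, stories) records for illustration only.
rows = [(1921, 8), (1928, 6), (1935, 12), (1947, 7), (1955, 9), (1963, 6),
        (1971, 4), (1979, 5), (1986, 3), (1994, 4), (2002, 5), (2011, 3)]
labels = ["steam"] * 6 + ["non-steam"] * 6

forest = build_forest(rows, labels)
old_tall = predict_forest(forest, (1915, 10))
new_short = predict_forest(forest, (2005, 3))
```

Because every tree sees a different bootstrap sample and a different random subset of features at each split, no single quirk of the training data dominates the vote. That averaging is what makes the ensemble less prone to overfitting than one deep tree.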

The result is a useful tool for predicting the prevalence of steam heat in NYC’s unaudited multifamily buildings between 5,000 and 50,000 square feet. Accuracy was measured and verified in several ways (test error rate = 0.12, out-of-bag error rate = 0.11); in other words, the model misclassified about 11 to 12 percent of the buildings it was checked against.3
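
The two error measures quoted above can be illustrated with a small bagged ensemble. This is a sketch under assumptions, not the evaluation code used for the report: each learner is a one-split "stump" on year built, standing in for a full decision tree, and the records and function names are hypothetical. The test error scores rows held out before training. The out-of-bag error scores each training row using only the learners whose bootstrap sample did not contain that row.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Pick the year threshold with the fewest training errors.

    sample: list of (year_built, label) pairs. The stump assumes older
    buildings are the steam-heated ones, a simplification for this sketch.
    """
    best = None
    for t in sorted({year for year, _ in sample}):
        errors = sum((label == "steam") != (year <= t) for year, label in sample)
        if best is None or errors < best[0]:
            best = (errors, t)
    return best[1]


def predict(threshold, year):
    return "steam" if year <= threshold else "non-steam"


def error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def vote(year, thresholds):
    """Majority vote of the given stumps for one building."""
    return Counter(predict(t, year) for t in thresholds).most_common(1)[0][0]


def bagged_errors(train, test, n_learners=50, seed=0):
    """Return (test error rate, out-of-bag error rate) for a bagged ensemble."""
    rng = random.Random(seed)
    n = len(train)
    learners = []  # (threshold, set of training indices the learner saw)
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        learners.append((fit_stump([train[i] for i in idx]), set(idx)))

    # Test error: every learner votes on rows held out before training.
    all_thresholds = [t for t, _ in learners]
    test_err = error_rate([y for _, y in test],
                          [vote(x, all_thresholds) for x, _ in test])

    # Out-of-bag error: a training row is scored only by learners that never saw it.
    truth, preds = [], []
    for i, (x, y) in enumerate(train):
        unseen = [t for t, seen in learners if i not in seen]
        if unseen:
            truth.append(y)
            preds.append(vote(x, unseen))
    return test_err, error_rate(truth, preds)


# Made-up (year_built, label) records for illustration only.
train = [(1925, "steam"), (1931, "steam"), (1940, "steam"), (1948, "steam"),
         (1955, "steam"), (1962, "non-steam"), (1966, "steam"), (1970, "non-steam"),
         (1975, "steam"), (1982, "non-steam"), (1990, "non-steam"),
         (1998, "non-steam"), (2004, "non-steam"), (2010, "non-steam")]
test = [(1935, "steam"), (1960, "steam"), (1985, "non-steam"), (2001, "non-steam")]

test_err, oob_err = bagged_errors(train, test)
```

When the two rates land close together, as the reported 0.12 and 0.11 do, it suggests the out-of-bag estimate is a reasonable proxy for performance on unseen buildings.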

Download the Report
Learn More

1. Learn more about random forests here.
2. Learn more about overfitting and fitting models here.
3. Learn more about out-of-bag error here.