Surrogate Modeling for Building Design: Energy and Cost Prediction Compared to Simulation-Based Methods

Bibliographic Details
Main Authors: Navid Shirzadi, Dominic Lau, Meli Stylianou
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Buildings
Online Access: https://www.mdpi.com/2075-5309/15/13/2361
Description
Summary: Designing energy-efficient buildings is essential for reducing global energy consumption and carbon emissions. However, traditional physics-based simulation models require substantial computational resources, detailed input data, and domain expertise. To address these limitations, this study investigates three machine learning-based surrogate models: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP). The models are trained on a synthetic dataset of 2000 EnergyPlus-simulated building design scenarios to predict both energy use intensity (EUI) and cost for midrise apartment buildings in the Toronto area. All three models exhibit strong predictive performance, with R² values exceeding 0.9 for both EUI and cost. XGBoost achieves the best cost prediction on the testing dataset, with a root mean squared error (RMSE) of 5.13 CAD/m², while MLP achieves the best EUI prediction, with a testing RMSE of 0.002 GJ/m². In terms of computational efficiency, the surrogate models significantly outperform a physics-based simulation model: MLP runs approximately 340 times faster, and XGBoost and RF each achieve speedups of over 200 times. The study also examines the effect of training dataset size on model performance, identifying a point of diminishing returns beyond which further increases in data size yield minimal accuracy gains but substantially longer training times. To enhance model interpretability, SHapley Additive exPlanations (SHAP) analysis is used to quantify feature importance, revealing how different model types prioritize design parameters. A parametric design configuration analysis further evaluates the models' sensitivity to changes in building envelope features.
Overall, the findings demonstrate that machine learning-based surrogate models can serve as fast, accurate, and interpretable alternatives to traditional simulation methods, supporting efficient decision-making during early-stage building design.
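The summary reports surrogate accuracy as R² and RMSE against simulation results. As a minimal sketch of how those two metrics are conventionally computed, the snippet below uses only the Python standard library. The function names and the sample EUI values are hypothetical and purely illustrative; they are not taken from the article or its dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical EUI values in GJ/m^2 (made up for illustration only):
# "simulated" stands in for physics-based simulation output,
# "predicted" stands in for a surrogate model's output.
eui_simulated = [0.50, 0.62, 0.71, 0.55, 0.80]
eui_predicted = [0.51, 0.60, 0.72, 0.56, 0.78]

print(f"RMSE: {rmse(eui_simulated, eui_predicted):.4f} GJ/m^2")
print(f"R^2:  {r_squared(eui_simulated, eui_predicted):.4f}")
```

RMSE carries the units of the predicted quantity, which is why the summary states it in CAD/m² for cost and GJ/m² for EUI, whereas R² is dimensionless and can be compared across both targets.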
ISSN: 2075-5309