CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking
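Main Authors: | Hansi Hettiarachchi, Amna Dridi, Mohamed Medhat Gaber, Pouyan Parsafard, Nicoleta Bocaneala, Katja Breitenfelder, Gonçal Costa, Maria Hedblom, Mihaela Juganaru-Mathieu, Thamer Mecharnia, Sumee Park, He Tan, Abdel-Rahman H. Tawil, Edlira Vakaj |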
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-024-04320-x |
author | Hansi Hettiarachchi, Amna Dridi, Mohamed Medhat Gaber, Pouyan Parsafard, Nicoleta Bocaneala, Katja Breitenfelder, Gonçal Costa, Maria Hedblom, Mihaela Juganaru-Mathieu, Thamer Mecharnia, Sumee Park, He Tan, Abdel-Rahman H. Tawil, Edlira Vakaj |
---|---|
collection | DOAJ |
description | Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC. |
format | Article |
id | doaj-art-063edf6c49c549eda095c72edc4510bd |
institution | Kabale University |
issn | 2052-4463 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
series | Scientific Data |
affiliation | Hansi Hettiarachchi: Faculty of Science and Technology, Lancaster University; Amna Dridi, Mohamed Medhat Gaber, Pouyan Parsafard, Nicoleta Bocaneala, Abdel-Rahman H. Tawil, Edlira Vakaj: Faculty of Computing, Engineering and Built Environment, Birmingham City University; Katja Breitenfelder, Sumee Park: Fraunhofer Institute for Building Physics IBP, Department Indoor Climate and Climatic Impacts; Gonçal Costa: Human Environment Research (HER), La Salle, Ramon Llull University; Maria Hedblom, He Tan: Department of Computing, School of Engineering, Jönköping University, Box 1026; Mihaela Juganaru-Mathieu: Mines Saint-Etienne, Institut Henri Fayol, Département ISI; Thamer Mecharnia: Université de Lorraine, CNRS, LORIA |
title | CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking |
url | https://doi.org/10.1038/s41597-024-04320-x |