CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking

Bibliographic Details
Main Authors: Hansi Hettiarachchi, Amna Dridi, Mohamed Medhat Gaber, Pouyan Parsafard, Nicoleta Bocaneala, Katja Breitenfelder, Gonçal Costa, Maria Hedblom, Mihaela Juganaru-Mathieu, Thamer Mecharnia, Sumee Park, He Tan, Abdel-Rahman H. Tawil, Edlira Vakaj
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Data
Online Access:https://doi.org/10.1038/s41597-024-04320-x
collection DOAJ
description Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector requires automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only self-contained sentences, which express complete rules without needing additional context, were considered, as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables the application of recent approaches, such as deep neural networks and large language models, to ACC.
institution Kabale University
issn 2052-4463
affiliation Hansi Hettiarachchi: Faculty of Science and Technology, Lancaster University
Amna Dridi: Faculty of Computing, Engineering and Built Environment, Birmingham City University
Mohamed Medhat Gaber: Faculty of Computing, Engineering and Built Environment, Birmingham City University
Pouyan Parsafard: Faculty of Computing, Engineering and Built Environment, Birmingham City University
Nicoleta Bocaneala: Faculty of Computing, Engineering and Built Environment, Birmingham City University
Katja Breitenfelder: Fraunhofer Institute for Building Physics IBP, Department Indoor Climate and Climatic Impacts
Gonçal Costa: Human Environment Research (HER), La Salle, Ramon Llull University
Maria Hedblom: Department of Computing, School of Engineering, Jönköping University, Box 1026
Mihaela Juganaru-Mathieu: Mines Saint-Etienne, Institut Henri Fayol, Département ISI
Thamer Mecharnia: Université de Lorraine, CNRS, LORIA
Sumee Park: Fraunhofer Institute for Building Physics IBP, Department Indoor Climate and Climatic Impacts
He Tan: Department of Computing, School of Engineering, Jönköping University, Box 1026
Abdel-Rahman H. Tawil: Faculty of Computing, Engineering and Built Environment, Birmingham City University
Edlira Vakaj: Faculty of Computing, Engineering and Built Environment, Birmingham City University
citation Scientific Data, vol. 12, no. 1, pp. 1-14 (2025-01-01). https://doi.org/10.1038/s41597-024-04320-x