Leveraging Word Embeddings and Transformers to Extract Semantics from Building Regulations Text

Okonkwo, Odinakachukwu and Dridi, Amna and Vakaj, Edlira (2023) Leveraging Word Embeddings and Transformers to Extract Semantics from Building Regulations Text. In: 11th Linked Data in Architecture and Construction Workshop, 15th - 16th June 2023, Matera, Italy.

LDAC2023_paper_7205.pdf - Published Version
Available under License Creative Commons Attribution.

In recent years, interest in knowledge extraction in the architecture, engineering and construction (AEC) domain has grown dramatically. Alongside advances in the AEC domain, massive amounts of data are collected from sensors, project management software, drones and 3D scanning. However, construction regulatory knowledge has been maintained primarily in the form of unstructured text. Natural Language Processing (NLP) has recently been introduced to the construction industry to extract underlying knowledge from unstructured data. For instance, NLP can be used to extract key information from construction contracts and specifications, identify potential risks, and automate compliance checking. It is considered impractical for construction engineers and stakeholders to author formal, accurate, and structured building regulatory rules. However, previous efforts to extract knowledge from unstructured text in the AEC domain have mainly focused on basic concepts and hierarchies for ontology engineering using traditional NLP techniques, rather than examining in depth the nature of the NLP techniques used and their ability to capture semantics from building regulations text. In this context, this paper focuses on developing a semantic-based testing approach that studies the performance of modern NLP techniques, namely word embeddings and transformers, in extracting semantic regularities from building regulatory text. Specifically, this paper studies the ability of word2vec, BERT, and Sentence BERT (SBERT) to extract semantic regularities from the British building regulations at both the word and sentence levels. The UK building regulations code has been used as the dataset. The ground truth of semantic regularities has been manually curated from the well-established Brick Ontology to test how well the proposed NLP techniques capture semantic regularities from the building regulatory text.
Both quantitative and qualitative analyses have been performed. The results show that modern NLP techniques can reliably capture semantic regularities from building regulations text at both the word and sentence levels, with accuracy reaching 80% at the word level and 100% at the sentence level.

Item Type: Conference or Workshop Item (Paper)
Dates: 1 June 2023 (Accepted); 16 June 2023 (Published Online)
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Gemma Tonks
Date Deposited: 16 Feb 2024 14:18
Last Modified: 16 Feb 2024 14:18
URI: https://www.open-access.bcu.ac.uk/id/eprint/15260
