Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base

Bibliographic Details
Main Authors: Haiyuan Wang, Deli Zhang, Jianmin Li, Zelong Feng, Feng Zhang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects: large language model; prompt engineering; retrieval-augmented generation; conditional entropy
Online Access: https://www.mdpi.com/2076-3417/15/6/3134
_version_ 1849341399474896896
author Haiyuan Wang
Deli Zhang
Jianmin Li
Zelong Feng
Feng Zhang
collection DOAJ
description Construction engineering relies on a large, continually evolving body of technical standards and specifications (e.g., the GB/T and ISO series) that govern the entire lifecycle of design, construction, and operation and maintenance. These standards are revised continually to keep pace with technological innovation, and engineers need specialized knowledge bases to help them understand and track the revisions. Advances in large language models (LLMs) and Retrieval-Augmented Generation (RAG) provide strong technical support for building such domain-specific knowledge bases. This study developed and tested a scheme for constructing a vertical-domain knowledge base on a RAG architecture with LLMs. The scheme has three components: entropy-optimized dynamic text segmentation (EDTS), vector correlation-based chunk ranking, and iterative optimization of prompt engineering. EDTS is used to keep the information in each chunk clear and predictable within a limited chunk length; 10 relevant chunks are then selected to form the prompt passed to the LLM, enabling efficient retrieval of vertical-domain knowledge. Experimental validation with Qwen-series LLMs on a test set of 101 expert-verified questions from Chinese construction industry standards shows an overall test accuracy of 76%. Comparative experiments across model scales (1.5B, 3B, 7B, 14B, 32B, and 72B) quantify the relationship between model size, answer accuracy, and execution time, providing guidance on the tradeoff between computational resources and accuracy in engineering practice.
format Article
id doaj-art-ea5d08fcec394d97bc9a35536dee5d0f
institution Kabale University
issn 2076-3417
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-ea5d08fcec394d97bc9a35536dee5d0f
2025-08-20T03:43:37Z
eng
MDPI AG
Applied Sciences, 15(6), 3134
2076-3417
2025-03-01
10.3390/app15063134
Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base
Haiyuan Wang (CABR Testing Center Co., Ltd., Beijing 100013, China)
Deli Zhang (CABR Testing Center Co., Ltd., Beijing 100013, China)
Jianmin Li (CABR Testing Center Co., Ltd., Beijing 100013, China)
Zelong Feng (CABR Testing Center Co., Ltd., Beijing 100013, China)
Feng Zhang (CABR Testing Center Co., Ltd., Beijing 100013, China)
https://www.mdpi.com/2076-3417/15/6/3134
large language model
prompt engineering
retrieval-augmented generation
conditional entropy
title Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base
topic large language model
prompt engineering
retrieval-augmented generation
conditional entropy
url https://www.mdpi.com/2076-3417/15/6/3134
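
The description field mentions vector correlation-based chunk ranking followed by selecting 10 chunks to form the prompt. The record does not say how the correlation is computed, so the following is only a minimal sketch that assumes cosine similarity over precomputed embedding vectors. The function names (`cosine`, `top_k_chunks`, `build_prompt`) and the prompt layout are hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 10,  # the description states that 10 chunks are selected
) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k highest-scoring ones."""
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, ranked: Sequence[Tuple[float, str]]) -> str:
    """Join the selected chunks into a context block (hypothetical prompt layout)."""
    context = "\n\n".join(text for _, text in ranked)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In practice the vectors would come from an embedding model and the chunks from the segmentation step. Both are outside this sketch, which only illustrates the rank-then-select step under the cosine-similarity assumption.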