Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base
| Main Authors: | Haiyuan Wang, Deli Zhang, Jianmin Li, Zelong Feng, Feng Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Applied Sciences |
| Subjects: | large language model; prompt engineering; retrieval-augmented generation; conditional entropy |
| Online Access: | https://www.mdpi.com/2076-3417/15/6/3134 |
| _version_ | 1849341399474896896 |
|---|---|
| author | Haiyuan Wang, Deli Zhang, Jianmin Li, Zelong Feng, Feng Zhang |
| collection | DOAJ |
| description | In the field of construction engineering, an extensive and dynamically evolving body of technical standards and specifications (e.g., GB/T and ISO series) permeates the entire lifecycle of design, construction, and operation–maintenance. These standards require continuous version iteration to adapt to technological innovations. Engineers require specialized knowledge bases to assist in understanding and updating these standards. The advancement of large language models (LLMs) and Retrieval-Augmented Generation (RAG) technologies provides robust technical support for constructing domain-specific knowledge bases. This study developed and tested a vertical domain knowledge base construction scheme based on RAG architecture and LLMs, comprising three critical components: entropy-optimized dynamic text segmentation (EDTS), vector correlation-based chunk ranking, and iterative optimization of prompt engineering. This study employs an EDTS method to ensure information clarity and predictability within limited chunk lengths, followed by selecting 10 relevant chunks to form prompts for input into LLMs, thereby enabling efficient retrieval of vertical domain knowledge. Experimental validation using Qwen-series LLMs with a test set of 101 expert-verified questions drawn from Chinese construction industry standards demonstrates that the overall test accuracy reaches 76%. The comparative experiments across model scales (1.5B, 3B, 7B, 14B, 32B, and 72B) quantitatively reveal the relationship between model size, answer accuracy, and execution time, providing decision-making guidance for computational resource-accuracy tradeoffs in engineering practice. |
| format | Article |
| id | doaj-art-ea5d08fcec394d97bc9a35536dee5d0f |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-ea5d08fcec394d97bc9a35536dee5d0f · 2025-08-20T03:43:37Z · eng · MDPI AG · Applied Sciences · ISSN 2076-3417 · 2025-03-01 · vol. 15, no. 6, art. 3134 · doi:10.3390/app15063134 · Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base · Haiyuan Wang, Deli Zhang, Jianmin Li, Zelong Feng, Feng Zhang (all: CABR Testing Center Co., Ltd., Beijing 100013, China) · https://www.mdpi.com/2076-3417/15/6/3134 · large language model; prompt engineering; retrieval-augmented generation; conditional entropy |
| title | Entropy-Optimized Dynamic Text Segmentation and RAG-Enhanced LLMs for Construction Engineering Knowledge Base |
| topic | large language model; prompt engineering; retrieval-augmented generation; conditional entropy |
| url | https://www.mdpi.com/2076-3417/15/6/3134 |
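
The description names two retrieval steps. First, chunks are ranked by vector correlation with the query. Second, the 10 most relevant chunks are assembled into the prompt. The record does not specify the embedding model, the correlation measure, or the prompt template. The Python sketch below therefore makes several assumptions. It uses cosine similarity over precomputed embedding vectors, and it uses a hypothetical prompt template. `rank_chunks` and `build_prompt` are illustrative names, not the authors' code. The sketch does not implement EDTS itself.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 10,  # the description reports selecting 10 chunks
) -> List[Tuple[float, str]]:
    """Hypothetical helper: return up to k (score, chunk) pairs, highest score first.

    Assumption: cosine similarity stands in for the paper's "vector correlation".
    """
    scored = [(cosine_similarity(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)]
    # sort() is stable even with reverse=True, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, ranked: Sequence[Tuple[float, str]]) -> str:
    """Hypothetical template: numbered context chunks followed by the question."""
    context = "\n".join(f"[{i}] {chunk}" for i, (_, chunk) in enumerate(ranked, start=1))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}\nAnswer:"
```

In a full pipeline, the chunks would come from the segmentation step and the vectors from an embedding model. Both are outside this sketch. Keeping ranking and prompt assembly as separate functions lets the chunk count `k` be varied independently of the template.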