A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports

Abstract Railway operational equipment is crucial for ensuring the safe, smooth, and efficient operation of trains. Comprehensive analysis and mining of historical railway operational equipment failure (ROEF) reports are of significant importance for improving railway safety. Currently, significant...

Bibliographic Details
Main Authors: Xiaorui Yang, Honghui Li, Yi Xu, Nahao Shen, Ruiyi He
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Subjects: Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph
Online Access: https://doi.org/10.1038/s41598-025-11622-6
_version_ 1849333072191815680
author Xiaorui Yang
Honghui Li
Yi Xu
Nahao Shen
Ruiyi He
collection DOAJ
description Abstract Railway operational equipment is crucial to the safe, smooth, and efficient operation of trains, and comprehensive analysis and mining of historical railway operational equipment failure (ROEF) reports is important for improving railway safety. At present, limitations in text mining technologies make comprehensive analysis of ROEF reports difficult. To address this, the study applies advanced text mining techniques to these reports. First, real historical failure reports provided by a Chinese railway bureau serve as the data source; the data are preprocessed and an ROEF corpus is constructed according to the relevant standard. Second, a named entity recognition (NER) model is built on this corpus. The model combines bidirectional encoder representations from transformers (BERT), a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF), with an additional entity attention layer for deeper extraction of entity features, and it is used to classify specific entities in the unstructured text of the failure reports. Finally, a knowledge graph (KG) is built in the Neo4j database to store and visualize the extracted ROEF-related entities and relationships. The results indicate that constructing the topological relationships of the ROEF network enables analysis and visualization of potential relationships among historical failure factors, laying a foundation for failure prediction and improved railway safety, and addressing a gap in the mining and analysis of ROEF reports.
format Article
id doaj-art-6ec79c05896842958bcec7f132cb629b
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-6ec79c05896842958bcec7f132cb629b | 2025-08-20T03:46:00Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2025-07-01 | 151121 | 10.1038/s41598-025-11622-6 | A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports | Xiaorui Yang (0), Honghui Li (1), Yi Xu (2), Nahao Shen (3), Ruiyi He (4) | School of Computer Science and Technology, Beijing Jiaotong University (listed once per author, five times in total) | Abstract (same text as the description field) | https://doi.org/10.1038/s41598-025-11622-6 | Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph
title A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports
topic Text mining
Railway operational equipment failure
BERT
BiLSTM
CRF
Knowledge graph
url https://doi.org/10.1038/s41598-025-11622-6
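
The abstract describes a pipeline in which an NER model labels entities in failure-report text and the extracted entities and relationships are stored in Neo4j. The sketch below illustrates only the last two steps under stated assumptions: it assumes the model emits BIO-style tags (the record does not say which tagging scheme is used), and the labels EQUIPMENT and FAULT, the relation HAS_FAULT, and the helper names decode_bio and to_cypher are hypothetical. It builds a Cypher MERGE statement as a string and does not connect to a database; it is not the authors' implementation.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity type)


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Entity]:
    """Collapse per-token BIO tags into (text, type) entity spans."""
    entities: List[Entity] = []
    current: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any open one first.
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open entity.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes the open entity.
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((sep.join(current), current_type))
    return entities


def to_cypher(head: Entity, relation: str, tail: Entity) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher MERGE statement plus parameters for one triple.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated; they must come from a trusted, fixed set.
    """
    (head_text, head_type), (tail_text, tail_type) = head, tail
    query = (
        f"MERGE (h:{head_type} {{name: $head}}) "
        f"MERGE (t:{tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head_text, "tail": tail_text}


# Hypothetical example sentence and tags, for illustration only.
tokens = ["track", "circuit", "reported", "red", "band", "fault"]
tags = ["B-EQUIPMENT", "I-EQUIPMENT", "O", "B-FAULT", "I-FAULT", "I-FAULT"]

entities = decode_bio(tokens, tags)
query, params = to_cypher(entities[0], "HAS_FAULT", entities[1])
print(entities)
print(query, params)
```

MERGE is used rather than CREATE so that an entity mentioned in several reports maps to a single node. For Chinese text tokenized per character, passing sep="" would join the characters without spaces.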