A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports
Abstract Railway operational equipment is crucial for ensuring the safe, smooth, and efficient operation of trains. Comprehensive analysis and mining of historical railway operational equipment failure (ROEF) reports are of significant importance for improving railway safety. Currently, significant...
| Main Authors: | Xiaorui Yang, Honghui Li, Yi Xu, Nahao Shen, Ruiyi He |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph |
| Online Access: | https://doi.org/10.1038/s41598-025-11622-6 |
| _version_ | 1849333072191815680 |
|---|---|
| author | Xiaorui Yang; Honghui Li; Yi Xu; Nahao Shen; Ruiyi He |
| author_facet | Xiaorui Yang; Honghui Li; Yi Xu; Nahao Shen; Ruiyi He |
| author_sort | Xiaorui Yang |
| collection | DOAJ |
| description | Abstract Railway operational equipment is crucial for ensuring the safe, smooth, and efficient operation of trains. Comprehensive analysis and mining of historical railway operational equipment failure (ROEF) reports are of significant importance for improving railway safety. Currently, significant challenges in comprehensively analyzing ROEF reports arise due to limitations in text mining technologies. To address this concern, this study leverages advanced text mining techniques to thoroughly analyze these reports. Firstly, real historical failure report data provided by a Chinese railway bureau is used as the data source. The data is preprocessed and an ROEF corpus is constructed according to the related standard. Secondly, based on this corpus, text mining techniques are introduced to build an innovative named entity recognition (NER) model. This model combines bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM) networks, and conditional random fields (CRF), with an additional entity attention layer to deeply extract entity features. This network architecture is used to classify specific entities in the unstructured data of failure reports. Finally, a knowledge graph (KG) is constructed using the Neo4j database to store and visualize the extracted ROEF-related entities and relationships. The results indicate that by constructing the topological relationships of the ROEF network, this study enables the analysis and visualization of potential relationships of historical failure factors, laying a foundation for predicting failures and enhancing railway safety, while also filling the current gap in the mining and analysis of ROEF reports. |
| format | Article |
| id | doaj-art-6ec79c05896842958bcec7f132cb629b |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-6ec79c05896842958bcec7f132cb629b; 2025-08-20T03:46:00Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 15; 1; 1; 21; 10.1038/s41598-025-11622-6; A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports; Xiaorui Yang, Honghui Li, Yi Xu, Nahao Shen, Ruiyi He (all: School of Computer Science and Technology, Beijing Jiaotong University); https://doi.org/10.1038/s41598-025-11622-6; Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph |
| spellingShingle | Xiaorui Yang; Honghui Li; Yi Xu; Nahao Shen; Ruiyi He; A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports; Scientific Reports; Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph |
| title | A text mining-based approach for comprehensive understanding of Chinese railway operational equipment failure reports |
| title_sort | text mining based approach for comprehensive understanding of chinese railway operational equipment failure reports |
| topic | Text mining; Railway operational equipment failure; BERT; BiLSTM; CRF; Knowledge graph |
| url | https://doi.org/10.1038/s41598-025-11622-6 |
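
The abstract describes a sequence-labeling NER model whose predicted entities are then stored in a knowledge graph. As a rough illustration of the step between those two stages, the sketch below collapses per-token BIO tags into entity spans. It is not taken from the article: the BIO scheme, the `EQUIPMENT` and `FAULT` labels, the function name `bio_to_spans`, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity_text, entity_type) pairs.

    Assumes a BIO tagging scheme; the article does not state which scheme it uses.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_entity() -> None:
        # Emit the entity collected so far, if any, and reset the state.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            spans.append((sep.join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            close_entity()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # closes whatever is open and is itself discarded.
            close_entity()
    close_entity()
    return spans


# Hypothetical example sentence and labels, not data from the article.
tokens = ["track", "circuit", "reported", "red", "band", "fault"]
tags = ["B-EQUIPMENT", "I-EQUIPMENT", "O", "B-FAULT", "I-FAULT", "I-FAULT"]
entities = bio_to_spans(tokens, tags)
```

For character-level Chinese text, passing `sep=""` would join the characters without spaces. Each resulting pair could then become a node in a graph database such as Neo4j, with the entity type as the node label.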