Automated Root Cause Analysis of Network Failures in IP/MPLS Network Using Machine Learning and Case-Based Reasoning

Managing IP/MPLS networks requires advanced tools due to their inherent complexity. Problems such as chain failures can be particularly challenging to resolve, as a single issue may impact multiple devices. This study introduces an integrated system aimed at improving the management of IP/MPLS netwo...

Bibliographic Details
Main Authors: Tikumporn Wankvar, Apichon Witayangkurn
Format: Article
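Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Case-based reasoning; event messages; log analysis network management; root cause analysis; stream processing
Online Access: https://ieeexplore.ieee.org/document/11053841/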
author Tikumporn Wankvar
Apichon Witayangkurn
collection DOAJ
description Managing IP/MPLS networks requires advanced tools due to their inherent complexity. Problems such as chain failures can be particularly challenging to resolve, as a single issue may impact multiple devices. This study introduces an integrated system aimed at improving the management of IP/MPLS networks by automatically identifying the root causes of network failures—particularly within large-scale environments. The proposed system features a dual-layered architecture comprising a Log Analysis Layer and an Operation and Maintenance Layer. The Log Analysis Layer enhances message uniformity by standardizing event logs through template generation, which replaces variable elements with wildcards. The Operation and Maintenance Layer includes components such as an event analysis service, node chain lookup, and node test service. These modules work together to identify affected devices, collect diagnostic metrics, and filter out critical events. A supervised learning model is employed to classify event messages, trained on a dataset of over seven million entries. The use of Term Frequency-Inverse Document Frequency for feature extraction improves classification accuracy by emphasizing distinctive terms over commonly occurring ones. Among the models evaluated, the SVM algorithm achieved the highest performance, with an F1-score of 0.969. The system integrates Apache Kafka as a high-throughput message broker to enable real-time processing of SNMP Traps and Syslog data. Additionally, a case-based fault identification service automates fault analysis and provides actionable insights via an interactive dashboard and a notification system that delivers alerts through modern messaging platforms. Experimental results demonstrate significant improvements in network resilience, including reduced reliance on manual troubleshooting, enhanced decision-making accuracy, and faster fault recovery times.
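The description names two text-processing steps: template generation that replaces variable elements with wildcards, and TF-IDF weighting that favors distinctive terms. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the regular expressions, the `<*>` wildcard token, the sample messages, and the function names `to_template` and `tfidf` are all assumptions made for illustration, and the IDF used is the unsmoothed textbook form log(N / df). The SVM classification stage is left out.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for variable fields (assumed, not taken from the paper).
# Order matters: IP addresses and interface names are replaced before bare numbers.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b[A-Za-z]+\d+(?:/\d+)+\b"),    # interface names such as ge0/0/1
    re.compile(r"\b\d+\b"),                      # remaining bare numbers
]


def to_template(message: str) -> str:
    """Replace variable elements of a log message with the wildcard <*>."""
    for pattern in VARIABLE_PATTERNS:
        message = pattern.sub("<*>", message)
    return message


def tfidf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    A term that appears in every document therefore gets weight 0.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample messages, used only to exercise the two functions.
messages = [
    "Interface ge0/0/1 on 10.1.2.3 changed state to down",
    "Interface ge0/0/2 on 10.1.2.4 changed state to up",
    "LDP session 10.0.0.7 down after 30 seconds",
]
templates = [to_template(m) for m in messages]
weights = tfidf(templates)

for template in templates:
    print(template)
```

After templating, the first two messages differ only in their final word, so the distinguishing term ("down" or "up") is what separates them. The wildcard token occurs in every template, so its weight is zero under this IDF, which is the "distinctive over common" effect the description refers to. Library implementations often use smoothed IDF variants, so their exact weights would differ from this sketch.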
format Article
id doaj-art-5d97597f54e046ad8f0c7db7dfe8229b
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5d97597f54e046ad8f0c7db7dfe8229b; 2025-08-20T03:29:35Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 111542; 111554; 10.1109/ACCESS.2025.3583817; 11053841
Automated Root Cause Analysis of Network Failures in IP/MPLS Network Using Machine Learning and Case-Based Reasoning
Tikumporn Wankvar (0); Apichon Witayangkurn (1); https://orcid.org/0000-0003-1454-1820
School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Khlong Nueng, Pathum Thani, Thailand
School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Khlong Nueng, Pathum Thani, Thailand
title Automated Root Cause Analysis of Network Failures in IP/MPLS Network Using Machine Learning and Case-Based Reasoning
topic Case-based reasoning
event messages
log analysis network management
root cause analysis
stream processing
url https://ieeexplore.ieee.org/document/11053841/