Systems monitoring platform integrating artificial intelligence for incident response in servers
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | Spanish |
| Published: | Universidad Nacional de San Martín, 2025-07-01 |
| Series: | Revista Científica de Sistemas e Informática |
| Subjects: | |
| Online Access: | https://revistas.unsm.edu.pe/index.php/rcsi/article/view/811 |
| Summary: | The growing complexity of IT management, together with the need to monitor critical infrastructure metrics (CPU usage, memory, storage, and service logs), detect failures, and respond quickly to alerts, calls for advanced technologies that enable comprehensive monitoring and efficient response. This work developed a server monitoring system that sends alerts via Telegram and integrates artificial intelligence to propose immediate solutions to server incidents, using Grafana and Prometheus for metric collection and Grafana Loki for log management. The OpenAI API was incorporated to analyze the logs and enrich the alerts with a detailed diagnosis. Across 311 tests, the system delivered incident notifications in an average of 1.02 seconds, and the GPT model completed its analysis in an average of 2.17 seconds, which made it possible to identify root causes and generate timely recommendations for resolution. The authors conclude that integrating artificial intelligence with proactive monitoring improves incident management, and they suggest future applications in IoT environments to enrich monitoring. |
| ISSN: | 2709-992X |
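
The summary describes a flow in which an alert is enriched with an AI-generated diagnosis of the related logs before being sent to Telegram. The following is a minimal Python sketch of that flow and is not the authors' implementation. The names `Alert`, `build_prompt`, and `enrich_and_notify` are hypothetical, as is the prompt wording. The language-model call and the Telegram delivery are passed in as plain callables (`diagnose` and `send`), so the sketch makes no assumptions about either API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    name: str       # alert rule name, e.g. "HighCPUUsage"
    instance: str   # server that raised the alert
    summary: str    # short human-readable description


def build_prompt(alert: Alert, log_lines: List[str], max_lines: int = 20) -> str:
    """Combine the alert with the most recent log lines into one prompt."""
    recent = log_lines[-max_lines:]  # keep only the tail of the log
    return (
        f"Alert {alert.name} fired on {alert.instance}: {alert.summary}\n"
        "Recent logs:\n" + "\n".join(recent) + "\n"
        "Identify the likely root cause and suggest a fix."
    )


def enrich_and_notify(
    alert: Alert,
    log_lines: List[str],
    diagnose: Callable[[str], str],  # assumed wrapper around a language-model call
    send: Callable[[str], None],     # assumed wrapper around a chat notification
) -> str:
    """Ask for a diagnosis, append it to the alert text, and send the result."""
    diagnosis = diagnose(build_prompt(alert, log_lines))
    message = (
        f"[{alert.name}] {alert.instance}\n"
        f"{alert.summary}\n\n"
        f"Diagnosis:\n{diagnosis}"
    )
    send(message)
    return message
```

In a real deployment, `diagnose` would wrap a request to the language model and `send` would wrap a message to the Telegram bot. Keeping both as injected callables lets the formatting logic be exercised without network access.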