Preprocessing Approach for Power Transformer Maintenance Data Mining Based on k-Nearest Neighbor Completion and Principal Component Analysis

Bibliographic Details
Main Authors: Moïse Manyol, Samuel Eke, Alphonse J. M. Massoma, Alain Biboum, Ruben Mouangue
Format: Article
Language: English
Published: Wiley, 2022-01-01
Series: International Transactions on Electrical Energy Systems
Online Access: http://dx.doi.org/10.1155/2022/8546588
ISSN: 2050-7038
Affiliations: Energy, Materials, Modeling, and Methods Research Laboratory (LE3M) (Moïse Manyol, Samuel Eke, Alphonse J. M. Massoma, Ruben Mouangue); Mechanical and Industrial Engineering Department (Alain Biboum)

Description: The accuracy of a knowledge extraction algorithm applied to a large database depends on the quality of the data preprocessing and on the methods used. The massive amounts of data collected every day are putting storage capacity at a premium. In practice, many databases contain attributes with outliers, redundant values, and, above all, missing values. Missing data and outliers are ubiquitous, and imputation techniques help mitigate their influence. To address this problem, as well as that of data size, this paper proposes a preprocessing approach that uses k-nearest neighbor (KNN) completion to impute missing data and principal component analysis (PCA) to handle redundant data, thereby reducing the data size while producing a high-quality sample after imputation of missing and outlier values. The approach is rigorously compared with two others. Dissolved gas data from Rio Tinto Alcan's transformer T0001 were imputed by KNN with k = 5. For the six imputed gases, the average percentage error was about 2%, compared with 17.5% after mean imputation and 23.65% after multiple imputation. For data compression, two principal axes were retained based on the elbow rule and the Kaiser threshold.
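The description summarizes a two-stage pipeline: KNN completion (k = 5) of missing values, benchmarked against mean imputation by average percentage error, followed by PCA in which the number of retained axes is chosen with the Kaiser threshold and the elbow rule. The Python/NumPy sketch below illustrates those steps on synthetic data; it is not the authors' implementation, and the T0001 dissolved gas data are not reproduced. Assumptions made here: donors are the fully observed rows, distances are Euclidean on column-standardized observed features, the Kaiser threshold is read as "eigenvalue of the correlation matrix greater than 1", and the helper names (`knn_impute`, `mean_impute`, `average_percentage_error`, `pca_kaiser`) are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch under stated assumptions, not the paper's code.


def mean_impute(X):
    """Baseline: replace each NaN with its column mean."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def knn_impute(X, k=5):
    """Fill each NaN with the mean of that feature over the k nearest donors.

    Assumption: donors are the fully observed rows, and distance is Euclidean
    over the features observed in the incomplete row, computed on
    column-standardized values so large-valued gases do not dominate.
    """
    X = np.asarray(X, dtype=float)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0
    Z = (X - np.nanmean(X, axis=0)) / sd          # NaNs stay NaN
    complete = ~np.isnan(X).any(axis=1)
    if complete.sum() < k:
        raise ValueError("need at least k fully observed rows as donors")
    donors_raw, donors_z = X[complete], Z[complete]
    out = X.copy()
    for i in np.flatnonzero(~complete):
        obs = ~np.isnan(X[i])                     # features present in row i
        d = np.sqrt(((donors_z[:, obs] - Z[i, obs]) ** 2).sum(axis=1))
        nearest = np.argsort(d)[:k]               # indices of the k closest donors
        out[i, ~obs] = donors_raw[nearest][:, ~obs].mean(axis=0)
    return out


def average_percentage_error(X_true, X_imp, mask):
    """Mean of |true - imputed| / |true| over the masked cells, in percent."""
    t, e = X_true[mask], X_imp[mask]
    return 100.0 * np.mean(np.abs(t - e) / np.abs(t))


def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_keep = max(1, int(np.sum(eigvals > 1.0)))   # Kaiser criterion
    return eigvals, Z @ eigvecs[:, :n_keep]


# Synthetic stand-in for six gas concentrations driven by two latent factors.
rng = np.random.default_rng(0)
n, p = 200, 6
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X_true = 200 + 10 * latent @ loadings + rng.normal(size=(n, p))

mask = rng.random((n, p)) < 0.1                   # hide about 10% of the cells
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_knn = knn_impute(X_obs, k=5)
X_mean = mean_impute(X_obs)
print("KNN APE  (%):", average_percentage_error(X_true, X_knn, mask))
print("mean APE (%):", average_percentage_error(X_true, X_mean, mask))

eigvals, scores = pca_kaiser(X_knn)
print("explained variance ratio:", np.round(eigvals / eigvals.sum(), 3))
print("retained components:", scores.shape[1])
```

Because the elbow rule is a visual judgment on the scree plot of the explained-variance ratios, only the Kaiser criterion is applied automatically here, and multiple imputation is not sketched. The printed errors come from synthetic data and say nothing about the figures reported in the description. As a library alternative, scikit-learn's `sklearn.impute.KNNImputer(n_neighbors=5)` performs KNN imputation with a `nan_euclidean` distance and does not restrict donors to fully observed rows, so its results can differ from this sketch.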