Preprocessing Approach for Power Transformer Maintenance Data Mining Based on k-Nearest Neighbor Completion and Principal Component Analysis

The accuracy of a knowledge extraction algorithm in a large database depends on the quality of the data preprocessing and the methods used. The massive amounts of data that we collect every day are putting storage capacity at a premium. In reality, many databases are characterized by attributes with...

Bibliographic Details
Main Authors:	Moïse Manyol, Samuel Eke, Alphonse J. M. Massoma, Alain Biboum, Ruben Mouangue
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	International Transactions on Electrical Energy Systems
Online Access:	http://dx.doi.org/10.1155/2022/8546588

_version_	1832563110050791424
author	Moïse Manyol Samuel Eke Alphonse J. M. Massoma Alain Biboum Ruben Mouangue
collection	DOAJ
description	The accuracy of a knowledge extraction algorithm in a large database depends on the quality of the data preprocessing and the methods used. The massive amounts of data that we collect every day are putting storage capacity at a premium. In reality, many databases are characterized by attributes with outliers, redundant values, and, more often still, missing values. Missing data and outliers are ubiquitous in our databases, and imputation techniques help mitigate their influence. To solve this problem, as well as the problem of data size, this paper proposes a data preprocessing approach based on k-nearest neighbor (KNN) completion for imputation of missing data and principal component analysis (PCA) for processing redundant data, thus reducing the data size by generating a significant quality sample after imputation of missing and outlier data. A rigorous comparison is made between our approach and two others. The dissolved gas data from Rio Tinto Alcan’s transformer T0001 were imputed by KNN with k = 5. For the 6 imputed gases, the average percentage error is about 2% with KNN, compared with 17.5% after average imputation and 23.65% after multiple imputation. For data compression, 2 axes were selected based on the elbow rule and the Kaiser threshold.
format	Article
id	doaj-art-a8a922bb34c54f509c79f42b349c6c0e
institution	Kabale University
issn	2050-7038
language	English
publishDate	2022-01-01
publisher	Wiley
record_format	Article
series	International Transactions on Electrical Energy Systems
spelling	doaj-art-a8a922bb34c54f509c79f42b349c6c0e; 2025-02-03T01:21:04Z; eng; Wiley; International Transactions on Electrical Energy Systems; 2050-7038; 2022-01-01; 2022; 10.1155/2022/8546588; Moïse Manyol (Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)); Samuel Eke (Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)); Alphonse J. M. Massoma (Energy, Materials, Modeling, and Methods Research Laboratory (LE3M)); Alain Biboum (Mechanical and Industrial Engineering Department); Ruben Mouangue (Energy, Materials, Modeling, and Methods Research Laboratory (LE3M))
title	Preprocessing Approach for Power Transformer Maintenance Data Mining Based on k-Nearest Neighbor Completion and Principal Component Analysis
url	http://dx.doi.org/10.1155/2022/8546588
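The description in this record names k-nearest neighbor completion as the imputation step. The following is a minimal sketch of that general technique, not the authors' implementation: the function names `knn_impute` and `mean_percentage_error`, the distance definition (root mean squared difference over the columns both rows have observed), and the sample values are all assumptions made for illustration. The error formula is likewise one common definition and may differ from the one used in the article.

```python
import math


def knn_impute(rows, k=5):
    """Return a copy of rows where each None is replaced by the mean of that
    column over the k nearest rows that have the column observed.
    Hypothetical helper; not taken from the article."""

    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        return math.sqrt(sum(shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with column j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors if d != math.inf][:k]
            if nearest:  # leave the gap if no usable neighbor exists
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


def mean_percentage_error(true_values, imputed_values):
    """Average of |true - imputed| / |true|, in percent (assumed definition)."""
    errors = [abs(t - p) / abs(t) for t, p in zip(true_values, imputed_values)]
    return 100 * sum(errors) / len(errors)


# Made-up two-column sample (not data from the article).
sample = [
    [10.0, 100.0],
    [12.0, 120.0],
    [11.0, None],   # value to impute
    [50.0, 500.0],
]
filled = knn_impute(sample, k=2)
print(filled[2][1])  # 110.0, the mean of the two nearest rows' values
```

In practice the columns would usually be standardized before computing distances, since gas concentrations on different scales would otherwise dominate the neighbor search.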

Similar Items