Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data

Transductive graph-based semisupervised learning methods usually build an undirected graph utilizing both labeled and unlabeled samples as vertices. Those methods propagate label information of labeled samples to neighbors through their edges in order to get the predicted labels of unlabeled samples...

Bibliographic Details
Main Authors: Fengqi Li, Chuang Yu, Nanhai Yang, Feng Xia, Guangming Li, Fatemeh Kaveh-Yazdy
Format: Article
Language: English
Published: Wiley 2013-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2013/875450
collection DOAJ
description Transductive graph-based semisupervised learning methods usually build an undirected graph with both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to their neighbors along the edges to predict the labels of unlabeled samples. Most popular semisupervised learning approaches are sensitive to the initial label distribution, which is a problem for imbalanced labeled datasets: the class boundary is severely skewed by the majority classes in imbalanced classification. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the subsequent learning methods. Experiments on UCI datasets and the MNIST handwritten digits dataset show that the proposed approach outperforms other existing state-of-the-art methods.
format Article
id doaj-art-bcf204ac33d146a88c6f06da72a98345
institution DOAJ
issn 1537-744X
language English
publishDate 2013-01-01
publisher Wiley
record_format Article
series The Scientific World Journal
spelling doaj-art-bcf204ac33d146a88c6f06da72a98345; 2025-08-20T03:23:19Z; eng; Wiley; The Scientific World Journal; 1537-744X; 2013-01-01; 2013; 10.1155/2013/875450; 875450; Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data; Fengqi Li, Chuang Yu, Nanhai Yang, Feng Xia, Guangming Li, Fatemeh Kaveh-Yazdy (all: School of Software, Dalian University of Technology, Dalian 116620, China); http://dx.doi.org/10.1155/2013/875450
title Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data
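
The description in this record says only that a few unlabeled samples are iteratively selected and added to the minority classes until the labeled set is balanced. The sketch below is one possible reading of that idea, not the authors' published algorithm. The selection rule (take the unlabeled sample closest to any current member of the minority class), the function name balance_by_nearest_unlabeled, and the UNLABELED marker are all assumptions made for illustration.

```python
import math

UNLABELED = -1  # assumed marker for samples without a label


def balance_by_nearest_unlabeled(points, labels):
    """Return a copy of `labels` in which minority classes have been grown.

    Assumed rule (not taken from the paper): one at a time, the unlabeled
    sample nearest to any current member of a minority class is added to
    that class, until every class is as large as the largest labeled class
    or no unlabeled samples remain. At least one sample must be labeled.
    """
    labels = list(labels)
    classes = sorted({c for c in labels if c != UNLABELED})
    target = max(labels.count(c) for c in classes)  # size of the majority class

    for c in classes:
        while labels.count(c) < target:
            pool = [i for i, lab in enumerate(labels) if lab == UNLABELED]
            if not pool:
                return labels  # nothing left to borrow from
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Unlabeled sample with the smallest distance to class c.
            nearest = min(
                pool,
                key=lambda i: min(math.dist(points[i], points[j]) for j in members),
            )
            labels[nearest] = c
    return labels
```

For example, with three labeled samples of class 0, one labeled sample of class 1, and several unlabeled points, the function assigns the two unlabeled points closest to the class 1 sample to class 1 and leaves the rest unlabeled. The balanced label list can then be passed to whatever graph-based propagation method is used afterwards.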