An Ant Colony Optimization Based Feature Selection for Web Page Classification

The increased popularity of the web has led to a huge amount of information being added to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engine performance. Web pages have a large number of features such as...

Bibliographic Details
Main Authors: Esra Saraç, Selma Ayşe Özel
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/649260
collection DOAJ
description The increased popularity of the web has led to a huge amount of information being added to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engine performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text content, that should be considered during automated classification. The aim of this study is to reduce the number of features used in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments and showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
id doaj-art-e30930b5a00040f6ac12216a3eb320cd
institution Kabale University
issn 2356-6140
1537-744X
spelling doaj-art-e30930b5a00040f6ac12216a3eb320cd | 2025-02-03T06:01:20Z | eng | Wiley | The Scientific World Journal | 2356-6140 | 1537-744X | 2014-01-01 | 2014 | 10.1155/2014/649260 | 649260 | An Ant Colony Optimization Based Feature Selection for Web Page Classification | Esra Saraç (0) | Selma Ayşe Özel (1) | Department of Computer Engineering, Çukurova University, Balcali, Sarıçam, 01330 Adana, Turkey (both authors) | http://dx.doi.org/10.1155/2014/649260
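
Illustrative sketch (not part of the catalog record): the description above outlines a pipeline in which an ACO algorithm selects a subset of features and a standard classifier is then applied. The Python sketch below shows one way such a wrapper could look. It is not the authors' algorithm: the pheromone update rule, the parameters (n_ants, n_iters, rho), the use of leave-one-out 1-nearest-neighbor accuracy as the fitness function, and the toy dataset are all assumptions made for illustration.

```python
import random


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that
    only looks at the feature indices in `features` (assumed fitness)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def aco_select(X, y, n_select, n_ants=10, n_iters=20, rho=0.2, seed=0):
    """Hypothetical ACO-style search for a subset of `n_select` features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pheromone = [1.0] * n_features  # one pheromone value per feature
    best_subset, best_score = None, -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant picks distinct features with probability
            # proportional to the current pheromone levels.
            candidates = list(range(n_features))
            subset = []
            for _ in range(n_select):
                weights = [pheromone[f] for f in candidates]
                chosen = rng.choices(candidates, weights=weights, k=1)[0]
                candidates.remove(chosen)
                subset.append(chosen)
            score = loo_accuracy(X, y, subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate all pheromone, then reinforce the best subset so far.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return best_subset, best_score


# Toy data (assumption): features 0 and 1 separate the two classes,
# features 2-5 are uniform noise.
data_rng = random.Random(42)
X, y = [], []
for i in range(20):
    label = i % 2
    informative = [label * 10 + data_rng.random() for _ in range(2)]
    noise = [data_rng.random() for _ in range(4)]
    X.append(informative + noise)
    y.append(label)

subset, score = aco_select(X, y, n_select=2)
print("selected features:", subset, "LOO accuracy:", score)
```

In a fuller experiment, the selected feature indices would be used to project the dataset before training a classifier such as C4.5, naive Bayes, or k-nearest neighbor, and the result would be compared against filters such as information gain or chi-square.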