Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Bibliographic Details
Main Authors:	Muhammad Javed Iqbal, Ibrahima Faye, Brahim Belhaouari Samir, Abas Md Said
Format:	Article
Language:	English
Published:	Wiley 2014-01-01
Series:	The Scientific World Journal
Online Access:	http://dx.doi.org/10.1155/2014/173869
ISSN:	2356-6140, 1537-744X
Collection:	DOAJ

Author Affiliations
Muhammad Javed Iqbal:	Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
Ibrahima Faye:	Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
Brahim Belhaouari Samir:	College of Sciences, Alfaisal University, P.O. Box 50927, Riyadh 11533, Saudi Arabia
Abas Md Said:	Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia

Abstract
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data and to develop and analyze computational tools that enhance our understanding of these data. The size of the data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. Classifying protein sequences into existing superfamilies helps predict the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory because of the very large number of features produced by various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance metrics such as accuracy, sensitivity, specificity, recall, and F-measure.

Similar Items