Gene expression data classification: some distance-based methods

Microarray data are a classic example of high-throughput data, characterized by more features (genes) than sample points (gene expression levels). A number of classification techniques have been proposed in the literature. Many of these methods are either computationally expensive or perform sub-opt...

Bibliographic Details
Main Author: Olusola Samuel Makinde
Format: Article
Language: English
Published: Elsevier 2019-08-01
Series: Kuwait Journal of Science
Online Access: https://journalskuwait.org/kjs/index.php/KJS/article/view/5191
collection DOAJ
description Microarray data are a classic example of high-throughput data, characterized by more features (genes) than sample points (gene expression levels). A number of classification techniques have been proposed in the literature. Many of these methods are either computationally expensive or perform sub-optimally. In this paper, some distance functions are considered, and classification rules based on them are formulated. The distance functions include the average distance measure, the distance to the component-wise median, and the distance to the mean. These methods are computationally simple and are expected to perform well for gene expression data. We also define a probabilistic approach to classification rules based on two of the distance measures. A gene selection technique based on shrunken centroids regularized discriminant analysis was applied to small round blue cell tissue, colon cancer, lymphoma, prostate cancer and leukaemia data before applying the classification rules. Three simulation studies were performed to mimic gene expression data. The performance of the distance-based classification methods was compared with that of some known classification methods from the literature. The distance-based methods were also applied to gene expression data. Their performance is competitive with that of some existing classification methods, and they are computationally simple and cheap to run.
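The abstract names three distance functions but this record does not give their formulas. The following Python sketch is therefore only an illustration, not the paper's implementation: it assumes Euclidean distance, a rule that assigns a new sample to the class with the smallest distance, and toy Gaussian data. The function names `average_distance`, `distance_to_median`, `distance_to_mean` and `classify` are hypothetical.

```python
import numpy as np


def average_distance(x, samples):
    """Mean Euclidean distance from x to each training sample of one class (assumed form)."""
    return float(np.mean(np.linalg.norm(samples - x, axis=1)))


def distance_to_median(x, samples):
    """Euclidean distance from x to the component-wise median of one class (assumed form)."""
    return float(np.linalg.norm(x - np.median(samples, axis=0)))


def distance_to_mean(x, samples):
    """Euclidean distance from x to the mean vector of one class (assumed form)."""
    return float(np.linalg.norm(x - np.mean(samples, axis=0)))


def classify(x, classes, distance):
    """Assign x to the class label whose training samples are closest under `distance`."""
    return min(classes, key=lambda label: distance(x, classes[label]))


# Toy data, not real expression data: 5 samples per class and 50 features,
# mimicking the "more features than samples" setting.
rng = np.random.default_rng(0)
classes = {
    "A": rng.normal(0.0, 1.0, size=(5, 50)),
    "B": rng.normal(3.0, 1.0, size=(5, 50)),
}
x_new = np.full(50, 3.0)  # a new sample sitting at the centre of class B

predictions = {
    rule.__name__: classify(x_new, classes, rule)
    for rule in (average_distance, distance_to_median, distance_to_mean)
}
print(predictions)
```

Each rule needs only one pass over the training samples of each class, which is consistent with the abstract's description of these methods as computationally simple.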
format Article
id doaj-art-4cf25b2de6ca428fac961fcc20aaafa1
institution OA Journals
issn 2307-4108, 2307-4116
spelling Olusola Samuel Makinde, The Federal University of Technology, P.M.B. 704 Akure, Nigeria. Gene expression data classification: some distance-based methods. Kuwait Journal of Science 46(3), 2019-08-01. Elsevier. ISSN 2307-4108, 2307-4116. Record updated 2025-08-20T02:30:31Z. https://journalskuwait.org/kjs/index.php/KJS/article/view/5191