Gene expression data classification: some distance-based methods

Bibliographic Details
Main Author: Olusola Samuel Makinde
Format: Article
Language:English
Published: Elsevier 2019-08-01
Series:Kuwait Journal of Science
Online Access:https://journalskuwait.org/kjs/index.php/KJS/article/view/5191
Description
Summary:Microarray datasets are a classic example of high-throughput data, characterized by more features (genes) than sample points (gene expression levels). A number of classification techniques have been proposed in the literature, but many of them are either computationally expensive or perform sub-optimally. In this paper, several distance functions are considered, and classification rules based on them are formulated. The distance functions are the average distance measure, the distance to the component-wise median, and the distance to the mean. These methods are computationally simple and are expected to perform well on gene expression data. We also define a probabilistic approach to classification rules based on two of the distance measures. A gene selection technique based on shrunken centroids regularized discriminant analysis was applied to small round blue cell tissue, colon cancer, lymphoma, prostate cancer and leukaemia data before the classification rules were applied. Three simulation studies were performed to mimic gene expression data, and the performance of the distance-based classification methods was compared with that of some known classification methods from the literature. The distance-based methods were also applied to the gene expression data. Their performance is competitive with that of some existing classification methods, and they are computationally simple and inexpensive.
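The three distance functions named in the summary can be illustrated as nearest-class rules. The following is a minimal sketch, not the paper's implementation: it assumes Euclidean distance (the summary does not state the metric), it assigns each new sample to the class with the smallest distance, and the function names fit_centers, predict_by_center and predict_by_average_distance are hypothetical. The probabilistic variant and the gene selection step are not shown.

```python
import numpy as np


def fit_centers(X, y, center="mean"):
    """Return {label: center vector}, using the class mean or the
    component-wise (per-gene) median of the training rows."""
    agg = np.mean if center == "mean" else np.median
    return {label: agg(X[y == label], axis=0) for label in np.unique(y)}


def predict_by_center(X_new, centers):
    """Assign each row of X_new to the class whose center is nearest
    (Euclidean distance is an assumption of this sketch)."""
    labels = list(centers)
    # dist has shape (n_new, n_classes)
    dist = np.stack(
        [np.linalg.norm(X_new - centers[label], axis=1) for label in labels],
        axis=1,
    )
    return np.array(labels)[dist.argmin(axis=1)]


def predict_by_average_distance(X_new, X, y):
    """Assign each row of X_new to the class whose training samples are,
    on average, closest to it."""
    labels = np.unique(y)
    dist = np.stack(
        [
            # pairwise distances (n_new, n_class_samples), averaged per new row
            np.linalg.norm(
                X_new[:, None, :] - X[y == label][None, :, :], axis=2
            ).mean(axis=1)
            for label in labels
        ],
        axis=1,
    )
    return labels[dist.argmin(axis=1)]
```

Each rule needs only class-wise means, medians or pairwise distances, so the cost grows linearly with the number of genes. That is consistent with the summary's claim of low computational cost, although the sketch makes no claim about accuracy.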
ISSN:2307-4108
2307-4116