Penalized Logistic Regression Models for Phenotype Prediction Based on Single Nucleotide Polymorphisms

Bibliographic Details
Main Authors: Seyedeh Rezwan Hosseini, Farnaz Ghassemi, Mohammad Hasan Moradi
Format: Article
Language: English
Published: Amirkabir University of Technology 2021-06-01
Series: AUT Journal of Electrical Engineering
Online Access: https://eej.aut.ac.ir/article_4217_555bf807a3f2ddc515d0565c28ac6e80.pdf
Description: Most studies of phenotype differences, including some diseases, examine specific positions in the genome called single nucleotide polymorphisms (SNPs). Some SNPs act alone and others act by interacting with other SNPs, and both kinds can play an important role in a phenotype or a specific disease. Various models, including regression models, have been designed and implemented to predict these diseases. In this paper, three penalized logistic regression models, Ridge, Lasso and Elastic Net (EN), are used to predict the risk of a specific disease while overcoming the limitations of classical logistic regression on high-dimensional SNP datasets. The models are applied to 10,000 samples from the SNP dataset of the OWKIN-Inserm Institute, which contains 18,124 SNPs. Of the three, the Lasso model with the minimizing lambda achieves higher accuracy (73.73%) and AUC (83.54%) than the other two. The model is also less complex, since it eliminates as many weakly related features as possible and keeps only the most informative ones. Additionally, the better results obtained with Lasso suggest that multicollinearity between the variables is either absent or low enough to be neglected.
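The relationship between the three penalties named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which software or solver they used, so the sketch substitutes plain proximal gradient descent in pure Python, and every name in it (`fit_penalized_logistic`, `make_synthetic_snps`, `lam`, `alpha`) is hypothetical. The data are synthetic toy genotypes, not the OWKIN-Inserm dataset. The penalty is assumed to be lam * (alpha * |w|_1 + 0.5 * (1 - alpha) * |w|_2^2), so alpha = 0 gives Ridge, alpha = 1 gives Lasso, and values in between give Elastic Net.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink towards zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict_proba(x, w, b):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_penalized_logistic(X, y, lam, alpha, step=0.5, n_iter=300):
    """Proximal gradient descent for elastic-net penalized logistic regression.

    Assumed penalty: lam * (alpha * ||w||_1 + 0.5 * (1 - alpha) * ||w||_2^2).
    The intercept b is not penalized. The default step size is an assumption
    that suits the small, centered toy data generated below.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term) ...
            z = w[j] - step * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            # ... followed by the proximal step for the L1 term.
            w[j] = soft_threshold(z, step * lam * alpha)
    return w, b


def make_synthetic_snps(n, p, seed=0):
    # Hypothetical toy data: centered genotypes in {-1, 0, 1}; only the first
    # three SNPs influence disease risk.
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) + rng.randint(0, 1) - 1 for _ in range(p)]
         for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * (xi[0] + xi[1] - xi[2])) else 0
         for xi in X]
    return X, y


def accuracy(X, y, w, b):
    hits = sum((predict_proba(xi, w, b) >= 0.5) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(y)


if __name__ == "__main__":
    X, y = make_synthetic_snps(n=150, p=15)
    for name, alpha in [("Ridge", 0.0), ("Elastic Net", 0.5), ("Lasso", 1.0)]:
        w, b = fit_penalized_logistic(X, y, lam=0.05, alpha=alpha)
        nonzero = sum(1 for wj in w if wj != 0.0)
        print(f"{name}: {nonzero} non-zero coefficients, "
              f"training accuracy {accuracy(X, y, w, b):.2f}")
```

The soft-thresholding step is what lets the L1 term set coefficients exactly to zero, which is the feature-elimination behaviour the description attributes to Lasso. In practice lam would be chosen by cross-validation on held-out data rather than fixed as it is here.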
ISSN: 2588-2910, 2588-2929
Volume: 53
Issue: 1
Pages: 41-46
DOI: 10.22060/eej.2020.18965.5375
Author Affiliations:
Seyedeh Rezwan Hosseini: Biomedical Engineering, M.Sc. student, Amirkabir University of Technology, Tehran, Iran
Farnaz Ghassemi: Biomedical Engineering Department, Amirkabir University of Technology
Mohammad Hasan Moradi: Biomedical Engineering Department, Amirkabir University of Technology
Keywords: complex diseases prediction; genotype-phenotype associations; SNP; regression; penalized logistic regression