Penalized Logistic Regression Models for Phenotype Prediction Based on Single Nucleotide Polymorphisms
Most studies of phenotype differences, including some diseases, examine specific positions in the genome called single nucleotide polymorphisms (SNPs). Some SNPs, alone or through interaction with others, play an important role in a phenotype or specific disease. Various mo...
| Main Authors: | Seyedeh Rezwan Hosseini, Farnaz Ghassemi, Mohammad Hasan Moradi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Amirkabir University of Technology, 2021-06-01 |
| Series: | AUT Journal of Electrical Engineering |
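| Subjects: | complex diseases prediction; genotype-phenotype associations; SNP; regression; penalized logistic regression |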
| Online Access: | https://eej.aut.ac.ir/article_4217_555bf807a3f2ddc515d0565c28ac6e80.pdf |
| _version_ | 1849431163781775360 |
|---|---|
| author | Seyedeh Rezwan Hosseini; Farnaz Ghassemi; Mohammad Hasan Moradi |
| collection | DOAJ |
| description | Most studies of phenotype differences, including some diseases, examine specific positions in the genome called single nucleotide polymorphisms (SNPs). Some SNPs, alone or through interaction with others, play an important role in a phenotype or specific disease. Various models, including regression models, have been designed and implemented to predict these diseases. In this paper, three penalized logistic models, Ridge, Lasso and Elastic Net (EN), are used to predict the risk of a specific disease while overcoming the limitations of classic logistic regression on high-dimensional SNP datasets. The models are applied to 10,000 samples from the SNP dataset of the OWKIN-Inserm Institute, which contains 18,124 SNPs. Among the three, the Lasso model with the minimizing lambda achieves the highest accuracy (73.73%) and AUC (83.54%). The model is also less complex, since it eliminates weakly related features as far as possible and keeps only the most informative ones. Additionally, the better results obtained with Lasso indicate that multicollinearity between the variables is either absent or low enough to be neglected. |
| format | Article |
| id | doaj-art-0f6323b9cc40480a9cf6c39b81ec65fd |
| institution | Kabale University |
| issn | 2588-2910, 2588-2929 |
| language | English |
| publishDate | 2021-06-01 |
| publisher | Amirkabir University of Technology |
| record_format | Article |
| series | AUT Journal of Electrical Engineering |
| spelling | doaj-art-0f6323b9cc40480a9cf6c39b81ec65fd; 2025-08-20T03:27:43Z; eng; Amirkabir University of Technology; AUT Journal of Electrical Engineering; 2588-2910; 2588-2929; 2021-06-01; 53; 1; 41; 46; 10.22060/eej.2020.18965.5375; 4217; Penalized Logistic Regression Models for Phenotype Prediction Based on Single Nucleotide Polymorphisms; Seyedeh Rezwan Hosseini (0); Farnaz Ghassemi (1); Mohammad Hasan Moradi (2); (0) Biomedical Engineering, M.Sc. student, Amirkabir University of Technology, Tehran, Iran; (1) Amirkabir University of Technology, Biomedical Engineering Department; (2) Amirkabir University of Technology, Biomedical Engineering Department; Most studies of phenotype differences, including some diseases, examine specific positions in the genome called single nucleotide polymorphisms (SNPs). Some SNPs, alone or through interaction with others, play an important role in a phenotype or specific disease. Various models, including regression models, have been designed and implemented to predict these diseases. In this paper, three penalized logistic models, Ridge, Lasso and Elastic Net (EN), are used to predict the risk of a specific disease while overcoming the limitations of classic logistic regression on high-dimensional SNP datasets. The models are applied to 10,000 samples from the SNP dataset of the OWKIN-Inserm Institute, which contains 18,124 SNPs. Among the three, the Lasso model with the minimizing lambda achieves the highest accuracy (73.73%) and AUC (83.54%). The model is also less complex, since it eliminates weakly related features as far as possible and keeps only the most informative ones. Additionally, the better results obtained with Lasso indicate that multicollinearity between the variables is either absent or low enough to be neglected.; https://eej.aut.ac.ir/article_4217_555bf807a3f2ddc515d0565c28ac6e80.pdf; complex diseases prediction; genotype-phenotype associations; SNP; regression; penalized logistic regression |
| title | Penalized Logistic Regression Models for Phenotype Prediction Based on Single Nucleotide Polymorphisms |
| topic | complex diseases prediction; genotype-phenotype associations; SNP; regression; penalized logistic regression |
| url | https://eej.aut.ac.ir/article_4217_555bf807a3f2ddc515d0565c28ac6e80.pdf |
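
The description above compares Ridge, Lasso and Elastic Net penalties for logistic regression on SNP data. As a rough illustration of how the three penalties differ, the following is a minimal sketch, not the authors' implementation (the record does not say which software they used). It assumes genotypes coded as minor-allele counts 0/1/2 and then centered, uses purely synthetic toy data, and fits the model by proximal gradient descent. The function names, the mixing parameter `alpha`, and all numeric settings are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_penalized_logistic(X, y, lam, alpha, step=1.0, iters=300):
    """Minimize mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    alpha = 0 gives Ridge, alpha = 1 gives Lasso, values in between give
    Elastic Net. The intercept b is not penalized. The default step size
    assumes centered features of modest scale (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # followed by the proximal step for the L1 term.
            v = w[j] - step * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(v, step * lam * alpha)
    return w, b


# Synthetic toy data (hypothetical): 300 samples, 8 SNPs, only the first
# two SNPs influence the simulated disease status.
random.seed(0)
n, p = 300, 8
raw = [[random.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
means = [sum(col) / n for col in zip(*raw)]
X = [[x - m for x, m in zip(row, means)] for row in raw]
y = [1 if random.random() < sigmoid(1.5 * x[0] - 1.5 * x[1]) else 0 for x in X]

ridge_w, _ = fit_penalized_logistic(X, y, lam=0.1, alpha=0.0)
lasso_w, _ = fit_penalized_logistic(X, y, lam=0.05, alpha=1.0)

print("Ridge coefficients set exactly to zero:", sum(wj == 0.0 for wj in ridge_w))
print("Lasso coefficients set exactly to zero:", sum(wj == 0.0 for wj in lasso_w))
```

The soft-thresholding step is what lets the L1 penalty set coefficients exactly to zero, which matches the description's remark that Lasso keeps only the most informative features. The Ridge penalty only shrinks coefficients, so it typically leaves all of them nonzero.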