PERFORMANCE COMPARISON OF DECISION TREE AND LOGISTIC REGRESSION METHODS FOR CLASSIFICATION OF SNP GENETIC DATA

Bibliographic Details
Main Authors: Adi Setiawan, Febi Setivani, Tundjung Mahatma
Format: Article
Language: English
Published: Universitas Pattimura 2024-03-01
Series: Barekeng
Online Access: https://ojs3.unpatti.ac.id/index.php/barekeng/article/view/10450
Collection: DOAJ
Description: This study compares the classification accuracy of the decision tree and logistic regression methods on SNP (Single Nucleotide Polymorphism) genetic data. A decision tree is a data mining classification technique that represents a very large data sample as a smaller set of rules, while logistic regression models the effect of independent variables on a dichotomous dependent variable. Both methods were implemented and analyzed in R. The data are the "asthma" data set from the R package "SNPassoc". The Asthma data has 57 features: Country, Gender, Age, BMI, Smoke, Case control, and the SNP genetic codes. The two methods were compared on the accuracy values they obtained. The proportion of test data was varied over 40%, 30%, 20%, and 10%, and each setting was simulated 1000 times to obtain a more reliable accuracy value. At test proportions of 40%, 30%, 20%, and 10%, the decision tree method obtained accuracies of 0.5793, 0.5777, 0.5745, and 0.5526, respectively, while logistic regression obtained 0.7696, 0.7729, 0.7763, and 0.7788. Thus, in this case, logistic regression is better than the decision tree method at classifying the Asthma data.
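The evaluation protocol in the description (several test proportions, each repeated many times, accuracies averaged) can be sketched as follows. This is only an illustrative sketch, not the authors' R code: it uses Python's standard library, synthetic data in place of the SNPassoc "asthma" data, and two hypothetical placeholder classifiers (`fit_majority`, `fit_threshold`) standing in for the decision tree and logistic regression models. All function names here are assumptions made for the example.

```python
import random
from statistics import mean


def holdout_accuracy(X, y, fit, test_fraction, rng):
    """Accuracy on one random train/test split."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = max(1, round(test_fraction * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    predict = fit([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(X[i]) == y[i] for i in test)
    return hits / n_test


def repeated_holdout(X, y, fit, test_fraction, n_repeats=1000, seed=0):
    """Mean accuracy over n_repeats independent random splits."""
    rng = random.Random(seed)
    return mean(
        holdout_accuracy(X, y, fit, test_fraction, rng) for _ in range(n_repeats)
    )


def fit_majority(X_train, y_train):
    """Placeholder classifier: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_threshold(X_train, y_train):
    """Placeholder classifier: threshold feature 0 midway between class means."""
    x0 = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    x1 = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    if not x0 or not x1:  # one class missing from the training split
        return fit_majority(X_train, y_train)
    m0, m1 = mean(x0), mean(x1)
    threshold = (m0 + m1) / 2
    if m1 >= m0:
        return lambda x: int(x[0] > threshold)
    return lambda x: int(x[0] <= threshold)


def make_synthetic(n=200, seed=1):
    """Synthetic binary data (assumption): one feature shifted by the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
classifiers = {"majority": fit_majority, "threshold": fit_threshold}
results = {}
for frac in (0.4, 0.3, 0.2, 0.1):  # test proportions from the description
    results[frac] = {
        name: repeated_holdout(X, y, fit, frac, n_repeats=200)
        for name, fit in classifiers.items()
    }
    print(frac, results[frac])
```

Any real comparison would replace the two placeholder `fit_*` functions with actual decision tree and logistic regression fits; the splitting and averaging loop would stay the same. The sketch uses 200 repetitions to keep it quick, whereas the description reports 1000.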
Institution: Kabale University
ISSN: 1978-7227, 2615-3017
Volume: 18
Issue: 1
Pages: 0403-0412
DOI: 10.30598/barekengvol18iss1pp0403-0412
Author Affiliations:
Adi Setiawan: Department of Data Science, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
Febi Setivani: Department of Mathematics, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
Tundjung Mahatma: Department of Mathematics, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
Topics: accuracy, decision tree, logistic regression