Software complex for simulation modelling of single nucleotide genetic polymorphism sites

Objectives. High-throughput sequencing methods have recently become widely used in the fundamental and applied research of various human diseases. Sequencing of functionally significant regions of the human genome enables the simultaneous identification of multiple genetic polymorphism sites that ha...

Bibliographic Details
Main Authors:	M. M. Yatskou, D. D. Sarnatski, V. V. Skakun, V. V. Grinev
Format:	Article
Language:	Russian
Published:	National Academy of Sciences of Belarus, the United Institute of Informatics Problems 2025-07-01
Series:	Informatika
Subjects:	single nucleotide genetic polymorphism; software complex; simulation modelling; machine learning; data mining; R package; web application
Online Access:	https://inf.grid.by/jour/article/view/1355

author	M. M. Yatskou, D. D. Sarnatski, V. V. Skakun, V. V. Grinev
collection	DOAJ
description	Objectives. High-throughput sequencing methods have recently become widely used in fundamental and applied research on various human diseases. Sequencing of functionally significant regions of the human genome enables the simultaneous identification of multiple genetic polymorphism sites that have diagnostic and/or prognostic significance for human genetic diseases. One of the key goals in this area is to develop efficient software tools for processing genomic data and identifying single nucleotide polymorphism sites using computer modelling and big data analysis methods. Methods. A software complex has been developed for simulation modelling and identification of single nucleotide polymorphism sites using machine learning methods. The approach to simulation modelling and analysis of single nucleotide polymorphism sites in DNA molecules is based on beta or normal distributions, whose parameters are determined from the available experimental data, and on machine learning models that are trained on simulated data and used to accurately identify single nucleotide polymorphism sites. The software complex includes an R package, a web application, and auxiliary computational tools for processing experimental genomic sequencing data. Results. The performance of the developed software complex was tested on sets of simulated and experimental data from human cell genomic sequencing. A comparative analysis of the most effective algorithms for identifying single nucleotide polymorphism sites was performed. The best results were obtained for machine learning models. Conclusion. The use of the software complex increases the accuracy of identifying genetic polymorphism sites during the analysis of big genomic sequencing data. The software can be used for modelling synthetic data, based on experimental data or independently, for comprehensive testing and selection of the best algorithms for identifying single nucleotide polymorphisms, as well as for generative data modelling used in training identification algorithms based on machine learning methods.
id	doaj-art-90fd5c069f8f41bc9fd116de897f7936
institution	Kabale University
issn	1816-0301
spelling	doaj-art-90fd5c069f8f41bc9fd116de897f7936; 2025-08-20T04:00:40Z; rus; National Academy of Sciences of Belarus, the United Institute of Informatics Problems; Informatika; ISSN 1816-0301; 2025-07-01; vol. 22, no. 2, pp. 81-94; DOI 10.37661/1816-0301-2025-22-2-81-94; M. M. Yatskou, D. D. Sarnatski, V. V. Skakun, V. V. Grinev (all Belarusian State University); https://inf.grid.by/jour/article/view/1355
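
The abstract states that simulated polymorphism sites are drawn from beta or normal distributions whose parameters are determined from experimental data. The following is only a minimal Python sketch of that general idea, not the authors' R package or its API. It assumes that the simulated quantity is the alternative-allele fraction at a site, that the beta parameters are fitted by the method of moments, and that the two input samples are made-up placeholder values; the function names `fit_beta_moments` and `simulate_sites` are hypothetical.

```python
import random
from statistics import mean, variance


def fit_beta_moments(values):
    """Method-of-moments estimate of beta(a, b) from values in (0, 1).

    Assumption for illustration only: the article does not say how the
    distribution parameters are estimated.
    """
    m = mean(values)
    v = variance(values)
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("sample is not compatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def simulate_sites(n_sites, snp_rate, snp_params, noise_params, seed=0):
    """Return a list of (allele_fraction, is_snp) pairs.

    Each site is an SNP with probability snp_rate; its alternative-allele
    fraction is drawn from the beta distribution of the matching class.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        is_snp = rng.random() < snp_rate
        a, b = snp_params if is_snp else noise_params
        sites.append((rng.betavariate(a, b), is_snp))
    return sites


# Placeholder "experimental" allele fractions (made up, not from the article).
observed_snp = [0.46, 0.52, 0.49, 0.55, 0.43, 0.51, 0.48, 0.53]
observed_noise = [0.01, 0.02, 0.005, 0.03, 0.015, 0.01, 0.02, 0.008]

snp_params = fit_beta_moments(observed_snp)
noise_params = fit_beta_moments(observed_noise)
sites = simulate_sites(1000, 0.1, snp_params, noise_params, seed=42)
```

The labelled pairs in `sites` could then serve as training data for a classifier, which is the role the abstract assigns to the simulated data; the choice of classifier is left open here because the record does not name one.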

Similar Items