Software complex for simulation modelling of single nucleotide genetic polymorphism sites

Objectives. High-throughput sequencing methods have recently become widely used in the fundamental and applied research of various human diseases. Sequencing of functionally significant regions of the human genome enables the simultaneous identification of multiple genetic polymorphism sites that ha...

Bibliographic Details
Main Authors: M. M. Yatskou, D. D. Sarnatski, V. V. Skakun, V. V. Grinev
Format: Article
Language: Russian
Published: National Academy of Sciences of Belarus, the United Institute of Informatics Problems 2025-07-01
Series: Informatika
Subjects:
Online Access: https://inf.grid.by/jour/article/view/1355
author M. M. Yatskou
D. D. Sarnatski
V. V. Skakun
V. V. Grinev
collection DOAJ
description Objectives. High-throughput sequencing methods have recently become widely used in fundamental and applied research on human diseases. Sequencing functionally significant regions of the human genome allows multiple genetic polymorphism sites with diagnostic and/or prognostic significance for human genetic diseases to be identified simultaneously. A key goal in this area is to develop efficient software tools that process genomic data and identify single nucleotide polymorphism sites using computer modelling and big data analysis methods.
Methods. A software complex has been developed for simulation modelling and identification of single nucleotide polymorphism sites using machine learning methods. Simulation modelling of single nucleotide polymorphism sites in DNA molecules is based on beta or normal distributions whose parameters are determined from the available experimental data; machine learning models are trained on the simulated data and then used to identify single nucleotide polymorphism sites. The software complex includes an R package, a web application, and auxiliary computational tools for processing experimental genomic sequencing data.
Results. The software complex was tested on simulated and experimental data sets from genomic sequencing of human cells. The most effective algorithms for identifying single nucleotide polymorphism sites were compared, and machine learning models gave the best results.
Conclusion. The software complex increases the accuracy of identifying genetic polymorphism sites in the analysis of big genomic sequencing data. It can be used to model synthetic data, either based on experimental data or independently, for comprehensive testing and selection of the best algorithms for identifying single nucleotide polymorphisms, and for generative data modelling to train identification algorithms based on machine learning methods.
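The Methods part of the abstract describes fitting beta (or normal) distributions to experimental data, simulating from them, and training identification models on the simulated data. The record gives no implementation details, so the following is only a minimal Python sketch of that general idea, not the authors' R package. Everything in it is an assumption made for illustration: the function names, the choice of alternative-allele fraction as the modelled quantity, the method-of-moments fit, the sample values, and the one-feature threshold rule that stands in for the unspecified machine learning models.

```python
import random
import statistics


def fit_beta_moments(values):
    """Method-of-moments estimates (alpha, beta) for a beta distribution."""
    m = statistics.fmean(values)
    v = statistics.variance(values)  # sample variance
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample moments do not fit a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def simulate_fractions(alpha, beta, n, rng):
    """Draw n simulated alternative-allele fractions from Beta(alpha, beta)."""
    return [rng.betavariate(alpha, beta) for _ in range(n)]


def learn_threshold(non_snp, snp):
    """Pick the cut-off with the fewest errors on labelled simulated data.

    A one-feature threshold rule is only a stand-in for the (unspecified)
    machine learning models mentioned in the abstract.
    """
    best_t, best_err = 0.0, float("inf")
    for t in sorted(non_snp + snp):
        # Sites with a fraction >= t are called SNPs.
        err = sum(x >= t for x in non_snp) + sum(x < t for x in snp)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical alternative-allele fractions, NOT data from the article.
    observed_snp = [0.42, 0.55, 0.47, 0.51, 0.60, 0.38, 0.49]
    observed_noise = [0.01, 0.03, 0.02, 0.005, 0.04, 0.015, 0.02]
    snp_params = fit_beta_moments(observed_snp)
    noise_params = fit_beta_moments(observed_noise)
    sim_snp = simulate_fractions(*snp_params, 500, rng)
    sim_noise = simulate_fractions(*noise_params, 500, rng)
    threshold = learn_threshold(sim_noise, sim_snp)
    print("fitted SNP parameters:", snp_params)
    print("learned threshold:", threshold)
```

Fitting on a small experimental sample and then drawing a much larger simulated set is what makes the simulated data usable for training and for comparing identification algorithms; a real pipeline would presumably use more features per site than a single fraction.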
format Article
id doaj-art-90fd5c069f8f41bc9fd116de897f7936
institution Kabale University
issn 1816-0301
language Russian
publishDate 2025-07-01
publisher National Academy of Sciences of Belarus, the United Institute of Informatics Problems
record_format Article
series Informatika
doi 10.37661/1816-0301-2025-22-2-81-94
volume 22, issue 2, pages 81-94
affiliation Belarusian State University (all four authors)
title Software complex for simulation modelling of single nucleotide genetic polymorphism sites
topic single nucleotide genetic polymorphism
software complex
simulation modelling
machine learning
data mining
R package
web application
url https://inf.grid.by/jour/article/view/1355