Software analysis of scientific texts: comparative study of distributed computing frameworks

Efficient analysis of scientific texts is increasingly needed as the volume of published information grows. This study compares popular distributed computing frameworks for scientific text processing. It is based on an extensive review...

Bibliographic Details
Main Authors: Serik Altynbek, Gabit Shuitenov, Madi Muratbekov, Alibek Barlybayev
Format: Article
Language: English
Published: National Aerospace University «Kharkiv Aviation Institute» 2025-05-01
Series: Радіоелектронні і комп'ютерні системи
Subjects: text analysis, apache flink, apache spark, apache hadoop, machine learning, big data
Online Access: http://nti.khai.edu/ojs/index.php/reks/article/view/2987
collection DOAJ
description Efficient analysis of scientific texts is increasingly needed as the volume of published information grows. This study compares popular distributed computing frameworks for scientific text processing. It is based on an extensive review of the scientific literature, which systematizes the key features of Apache Flink, Apache Spark, and Apache Hadoop, with a focus on their application to scientific text analysis. The review examines the architecture of each framework and highlights strengths such as high performance, scalability, and flexible data processing, as well as limitations such as resource requirements and customization complexity. The comparative analysis shows that Apache Flink and Apache Spark achieve high performance and scalability by computing in memory, which increases processing speed and efficiency. Both support batch and streaming data processing and provide exactly-once processing guarantees. Apache Hadoop, by contrast, has lower performance because it relies mainly on disk-based processing. Apache Flink and Apache Spark also support several programming languages, including Java, Scala, and Python, which gives developers flexibility. The results give researchers and engineers the information needed to choose the most appropriate framework for the specific needs and objectives of their research. The practical significance of the study lies in identifying the most suitable tools for analyzing scientific texts, which can make data processing more efficient and accelerate scientific research in various fields.
id doaj-art-e518882782c44e63a97fadbf874ead17
institution DOAJ
issn 1814-4225
2663-2012
spelling doaj-art-e518882782c44e63a97fadbf874ead17 2025-08-20T03:08:51Z eng
2025210.32620/reks.2025.2.072592
Serik Altynbek (Kazakh University of Technology and Business, Astana)
Gabit Shuitenov (Esil University, Astana)
Madi Muratbekov (L.N. Gumilyov Eurasian National University, Astana)
Alibek Barlybayev (L.N. Gumilyov Eurasian National University, Astana)
topic text analysis
apache flink
apache spark
apache hadoop
machine learning
big data