Software analysis of scientific texts: comparative study of distributed computing frameworks

This study is motivated by the need to analyze scientific texts efficiently as the volume of information grows. It aims to compare popular distributed computing frameworks for scientific text processing. This study conducted an extensive analysis...

Bibliographic Details
Main Authors:	Serik Altynbek, Gabit Shuitenov, Madi Muratbekov, Alibek Barlybayev
Format:	Article
Language:	English
Published:	National Aerospace University «Kharkiv Aviation Institute» 2025-05-01
Series:	Радіоелектронні і комп'ютерні системи
Subjects:	text analysis apache flink apache spark apache hadoop machine learning big data
Online Access:	http://nti.khai.edu/ojs/index.php/reks/article/view/2987

author	Serik Altynbek Gabit Shuitenov Madi Muratbekov Alibek Barlybayev
collection	DOAJ
description	This study is motivated by the need to analyze scientific texts efficiently as the volume of information grows. It aims to compare popular distributed computing frameworks for scientific text processing. An extensive analysis of the scientific literature systematized the key features of the distributed frameworks Apache Flink, Apache Spark, and Apache Hadoop, with an in-depth focus on their application to scientific text analysis. The results describe the architectural features of each framework and highlight their strengths, such as high performance, scalability, and flexibility in data processing. Limitations, such as resource requirements and customization complexity, were also identified. The comparative analysis showed that Apache Flink and Apache Spark achieve high performance and scalability by performing in-memory computation, which increases processing speed and efficiency. Both support batch and streaming data processing and provide exactly-once processing guarantees. Apache Hadoop, by contrast, has lower performance because it relies mainly on disk-based data processing. Apache Flink and Apache Spark also support several programming languages, such as Java, Scala, and Python, which gives developers flexibility. The results thus give researchers and engineers the information needed to choose the most appropriate framework for the specific needs and objectives of their research. The practical significance of the study lies in identifying the best tools for analyzing scientific texts, which can contribute to more efficient data processing and accelerate scientific research in various fields.
format	Article
id	doaj-art-e518882782c44e63a97fadbf874ead17
institution	DOAJ
issn	1814-4225 2663-2012
language	English
publishDate	2025-05-01
publisher	National Aerospace University «Kharkiv Aviation Institute»
record_format	Article
series	Радіоелектронні і комп'ютерні системи
spelling	doaj-art-e518882782c44e63a97fadbf874ead17 | 2025-08-20T03:08:51Z | eng | National Aerospace University «Kharkiv Aviation Institute» | Радіоелектронні і комп'ютерні системи | 1814-4225 | 2663-2012 | 2025-05-01 | 2025210.32620/reks.2025.2.072592 | Software analysis of scientific texts: comparative study of distributed computing frameworks | Serik Altynbek (Kazakh University of Technology and Business, Astana) | Gabit Shuitenov (Esil University, Astana) | Madi Muratbekov (L.N. Gumilyov Eurasian National University, Astana) | Alibek Barlybayev (L.N. Gumilyov Eurasian National University, Astana) | http://nti.khai.edu/ojs/index.php/reks/article/view/2987 | text analysis | apache flink | apache spark | apache hadoop | machine learning | big data
title	Software analysis of scientific texts: comparative study of distributed computing frameworks
topic	text analysis apache flink apache spark apache hadoop machine learning big data
url	http://nti.khai.edu/ojs/index.php/reks/article/view/2987

Similar Items