Software analysis of scientific texts: comparative study of distributed computing frameworks

Bibliographic Details
Main Authors:	Serik Altynbek, Gabit Shuitenov, Madi Muratbekov, Alibek Barlybayev
Format:	Article
Language:	English
Published:	National Aerospace University «Kharkiv Aviation Institute» 2025-05-01
Series:	Радіоелектронні і комп'ютерні системи (Radioelectronic and Computer Systems)
Subjects:	text analysis; Apache Flink; Apache Spark; Apache Hadoop; machine learning; big data
Online Access:	http://nti.khai.edu/ojs/index.php/reks/article/view/2987

Description
Summary:	This study is motivated by the need to analyze scientific texts efficiently as the volume of published information grows. It compares popular distributed computing frameworks for scientific text processing. An extensive review of the scientific literature systematizes the key features of Apache Flink, Apache Spark, and Apache Hadoop, with a focus on their application to scientific text analysis. The review examines the architecture of each framework and highlights strengths such as high performance, scalability, and flexibility in data processing, as well as limitations such as resource requirements and customization complexity. The comparative analysis shows that Apache Flink and Apache Spark achieve high performance and scalability by computing in memory, which increases processing speed and efficiency. Both support batch and streaming data processing and provide exactly-once processing guarantees. Apache Hadoop, by contrast, has lower performance because it relies mainly on disk-based processing. Apache Flink and Apache Spark also support several programming languages, including Java, Scala, and Python, which gives developers flexibility. The results give researchers and engineers the information they need to choose the framework best suited to the needs and objectives of their research. The practical significance of the study lies in identifying the most suitable tools for analyzing scientific texts, which can make data processing more efficient and accelerate scientific research in various fields.
ISSN:	1814-4225 2663-2012

Similar Items