Software analysis of scientific texts: comparative study of distributed computing frameworks
The relevance of this study is related to the need for efficient analysis of scientific texts in the context of the growing amount of information. This study aims to conduct a study of popular distributed computing frameworks for scientific text processing. This study conducted an extensive analysis...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
National Aerospace University «Kharkiv Aviation Institute»
2025-05-01
|
| Series: | Радіоелектронні і комп'ютерні системи |
| Subjects: | |
| Online Access: | http://nti.khai.edu/ojs/index.php/reks/article/view/2987 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The relevance of this study is related to the need for efficient analysis of scientific texts in the context of the growing amount of information. This study aims to conduct a study of popular distributed computing frameworks for scientific text processing. This study conducted an extensive analysis of the scientific literature, which has systematized the key features of distributed frameworks, such as Apache Flink, Apache Spark, and Apache Hadoop, with an in-depth focus on their application in the field of scientific text analysis. The results obtained from this study allowed delving into the architectural features of each of the studied frameworks, highlighting their strengths, such as high performance, scalability, and flexibility in data processing. Limitations such as resource requirements and customization complexity were also identified. The comparative analysis revealed the following: Apache Flink and Apache Spark have high performance and scalability by performing in-memory computation to increase processing speed and efficiency. They support both batch and streaming data processing and guarantee processing “exactly once”. Conversely, Apache Hadoop has lower performance, mainly using disc-based data processing. Importantly, Apache Flink and Apache Spark support several programming languages, such as Java, Scala, and Python, providing developers with flexibility. Thus, the results of the study provide comprehensive information for researchers and engineers, helping them to choose the most appropriate framework based on their research’s specific needs and objectives. The practical significance of this study is to provide information on the best tools for analyzing scientific texts, which can contribute to more efficient data processing and accelerate scientific research in various fields. |
|---|---|
| ISSN: | 1814-4225 2663-2012 |