Version [1.0]- [SAMbA-RaP is music to scientists’ ears: Adding provenance support to spark-based scientific workflows]
| Main Authors: | Thaylon Guedes, Marta Mattoso, Marcos Bedo, Daniel de Oliveira |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-12-01 |
| Series: | SoftwareX |
| Subjects: | Provenance, Scientific workflows, DISC systems, Domain data |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352711024002978 |
| author | Thaylon Guedes, Marta Mattoso, Marcos Bedo, Daniel de Oliveira |
|---|---|
| collection | DOAJ |
| description | While researchers benefit from Apache Spark for executing scientific workflows at scale, they often lack provenance support due to the framework’s design limitations. This paper presents SAMbA-RaP, a provenance extension for Apache Spark. It focuses on (i) executing external, black-box applications with intensive I/O operations within the workflow while leveraging Spark’s in-memory data structures; (ii) extracting domain-specific data from in-memory data structures; and (iii) implementing data versioning and capturing the provenance graph of a workflow execution. SAMbA-RaP also provides real-time reports via a web interface, enabling scientists to explore dataflow transformations and content evolution as they run workflows. |
| format | Article |
| id | doaj-art-d5b2f94f3c03413db1ee37f6fe42b1ed |
| institution | OA Journals |
| issn | 2352-7110 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Elsevier |
| series | SoftwareX |
| doi | 10.1016/j.softx.2024.101927 |
| volume | 28 |
| article_number | 101927 |
| affiliations | Thaylon Guedes: Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil; Marta Mattoso: Federal University of Rio de Janeiro, P.O Box 68501, Rio de Janeiro/RJ, Brazil; Marcos Bedo (corresponding author): Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil; Daniel de Oliveira: Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil |
| title | Version [1.0]- [SAMbA-RaP is music to scientists’ ears: Adding provenance support to spark-based scientific workflows] |
| topic | Provenance, Scientific workflows, DISC systems, Domain data |
| url | http://www.sciencedirect.com/science/article/pii/S2352711024002978 |