Version [1.0]- [SAMbA-RaP is music to scientists’ ears: Adding provenance support to spark-based scientific workflows]

Bibliographic Details
Main Authors: Thaylon Guedes, Marta Mattoso, Marcos Bedo, Daniel de Oliveira
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: SoftwareX
Subjects: Provenance; Scientific workflows; DISC systems; Domain data
Online Access: http://www.sciencedirect.com/science/article/pii/S2352711024002978
collection DOAJ
description While researchers benefit from Apache Spark for executing scientific workflows at scale, they often lack provenance support due to the framework’s design limitations. This paper presents SAMbA-RaP, a provenance extension for Apache Spark. It focuses on: (i) Executing external, black-box applications with intensive I/O operations within the workflow while leveraging Spark’s in-memory data structures, (ii) Extracting domain-specific data from in-memory data structures and (iii) Implementing data versioning and capturing the provenance graph in a workflow execution. SAMbA-RaP also provides real-time reports via a web interface, enabling scientists to explore dataflow transformations and content evolution as they run workflows.
id doaj-art-d5b2f94f3c03413db1ee37f6fe42b1ed
institution OA Journals
issn 2352-7110
spelling doaj-art-d5b2f94f3c03413db1ee37f6fe42b1ed
2025-08-20T01:54:15Z
eng
Elsevier
SoftwareX
2352-7110
2024-12-01
28
101927
10.1016/j.softx.2024.101927
Version [1.0]- [SAMbA-RaP is music to scientists’ ears: Adding provenance support to spark-based scientific workflows]
Thaylon Guedes (0): Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil
Marta Mattoso (1): Federal University of Rio de Janeiro, P.O Box 68501, Rio de Janeiro/RJ, Brazil
Marcos Bedo (2): Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil; Corresponding author.
Daniel de Oliveira (3): Fluminense Federal University, G. Milton Tavares de Souza, Av., S/N, Niterói/RJ, Brazil
http://www.sciencedirect.com/science/article/pii/S2352711024002978
Provenance
Scientific workflows
DISC systems
Domain data
topic Provenance
Scientific workflows
DISC systems
Domain data