Real‐world evidence in the cloud: Tutorial on developing an end‐to‐end data and analytics pipeline using Amazon Web Services resources

Abstract In the rapidly evolving landscape of healthcare and drug development, the ability to efficiently collect, process, and analyze large volumes of real‐world data (RWD) is critical for advancing drug development. This article provides a blueprint for establishing an end‐to‐end data and analyti...

Bibliographic Details
Main Authors: Wes Anderson, Roopal Bhatnagar, Keith Scollick, Marco Schito, Ramona Walls, Jagdeep T. Podichetty
Format: Article
Language: English
Published: Wiley 2024-12-01
Series: Clinical and Translational Science
Online Access: https://doi.org/10.1111/cts.70078
collection DOAJ
description Abstract In the rapidly evolving landscape of healthcare and drug development, the ability to efficiently collect, process, and analyze large volumes of real‐world data (RWD) is critical for advancing drug development. This article provides a blueprint for establishing an end‐to‐end data and analytics pipeline in a cloud‐based environment. The pipeline presented here comprises four major components (data ingestion, transformation, visualization, and analytics), each supported by a suite of Amazon Web Services (AWS) tools. The pipeline is exemplified through the CURE ID platform, a collaborative tool designed to capture and analyze real‐world, off‐label treatment administrations. By using services such as AWS Lambda, Amazon Relational Database Service (RDS), Amazon QuickSight, and Amazon SageMaker, the pipeline facilitates the ingestion of diverse data sources, the transformation of raw data into structured formats, the creation of interactive dashboards for data visualization, and the application of advanced machine learning models for data analytics. The described architecture not only supports the needs of the CURE ID platform, but also offers a scalable and adaptable framework that can be applied across various domains to enhance data‐driven decision making beyond drug repurposing.
id doaj-art-535156f338d44a00ae28c2d8fde68a4c
institution OA Journals
issn 1752-8054
1752-8062
spelling doaj-art-535156f338d44a00ae28c2d8fde68a4c; 2025-08-20T02:00:03Z; eng; Wiley; Clinical and Translational Science; 1752-8054; 1752-8062; 2024-12-01; volume 17; issue 12; pages n/a; 10.1111/cts.70078
affiliation Critical Path Institute, Tucson, Arizona, USA (all six authors)