rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets.

The Clinical Practice Research Datalink (CPRD) is a large and widely used resource of electronic health records from the UK, linking primary care data to hospital data, death registration data, cancer registry data, deprivation data and mental health services data. Extraction and management of CPRD...

Bibliographic Details
Main Authors: Alexander Pate, Rosa Parisi, Evangelos Kontopantelis, Matthew Sperrin
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0327229
collection DOAJ
description The Clinical Practice Research Datalink (CPRD) is a large and widely used resource of electronic health records from the UK, linking primary care data to hospital data, death registration data, cancer registry data, deprivation data and mental health services data. Extraction and management of CPRD data is a computationally demanding process and requires a significant amount of work, in particular when using R. The rcprd package simplifies the process of extracting and processing CPRD data in order to build datasets ready for statistical analysis. Raw CPRD data is provided in thousands of .txt files, making querying this data cumbersome and inefficient. rcprd saves the relevant information into an SQLite database stored on the hard drive which can then be queried efficiently to extract required information about individuals. rcprd follows a four-stage process: 1) Definition of a cohort, 2) Read in medical/prescription data and save into an SQLite database, 3) Query this SQLite database for specific codes and tests to create variables for each individual in the cohort, 4) Combine extracted variables into a dataset ready for statistical analysis. Functions are available to extract common variable types (e.g., history of a condition, or time until an event occurs, relative to an index date), and more general functions for database queries, allowing users to define their own variables for extraction. The entire process can be done from within R, with no knowledge of SQL required. This manuscript showcases the functionality of rcprd by running through an example using simulated CPRD Aurum data. rcprd will reduce the duplication of time and effort among those using CPRD data for research, allowing more time to be focused on other aspects of research projects.
format Article
id doaj-art-af18d1e9014e4eb294376b54fe9db082
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-af18d1e9014e4eb294376b54fe9db082; 2025-08-24T05:31:07Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 8; e0327229; 10.1371/journal.pone.0327229; rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets.; Alexander Pate; Rosa Parisi; Evangelos Kontopantelis; Matthew Sperrin; https://doi.org/10.1371/journal.pone.0327229
title rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets.
url https://doi.org/10.1371/journal.pone.0327229
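
The four-stage process described in the abstract can be illustrated with a minimal, language-neutral sketch. The code below is written in Python with the standard `sqlite3` module and is not the rcprd API: the function names (`load_files`, `history_of`, `build_dataset`, `demo`), the table name `observation`, the column names `patid`, `medcodeid` and `obsdate`, the code value `"100"`, and the ISO date format are all assumptions made for illustration, and the data are invented toy records. It only shows the general idea of filtering many tab-delimited text files down to a cohort, storing them in an on-disk SQLite database, and querying that database for a "history of a condition" variable relative to an index date.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_files(conn, paths, cohort_ids):
    """Stage 2: append cohort patients' rows from each text file to one SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation "
        "(patid TEXT, medcodeid TEXT, obsdate TEXT)"
    )
    for path in paths:
        with open(path, newline="") as fh:
            rows = [
                (r["patid"], r["medcodeid"], r["obsdate"])
                for r in csv.DictReader(fh, delimiter="\t")
                if r["patid"] in cohort_ids  # keep only the cohort defined in stage 1
            ]
        conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    conn.commit()


def history_of(conn, codes, index_date):
    """Stage 3: patients with any of `codes` recorded on or before `index_date`."""
    marks = ",".join("?" * len(codes))
    sql = (
        "SELECT DISTINCT patid FROM observation "
        f"WHERE medcodeid IN ({marks}) AND obsdate <= ?"
    )
    return {row[0] for row in conn.execute(sql, (*codes, index_date))}


def build_dataset(conn, cohort_ids, codes, index_date):
    """Stage 4: one row per cohort member with a 0/1 history indicator."""
    cases = history_of(conn, codes, index_date)
    return [{"patid": p, "history": int(p in cases)} for p in sorted(cohort_ids)]


def demo():
    cohort = {"1", "2", "3"}  # Stage 1: hypothetical cohort of patient ids
    chunks = [  # toy stand-ins for the many raw .txt files
        "patid\tmedcodeid\tobsdate\n1\t100\t2019-05-01\n4\t100\t2018-01-01\n",
        "patid\tmedcodeid\tobsdate\n2\t100\t2021-03-01\n3\t200\t2017-07-15\n",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for i, text in enumerate(chunks):
            path = Path(tmp) / f"observation_{i}.txt"
            path.write_text(text)
            files.append(path)
        conn = sqlite3.connect(Path(tmp) / "cprd.sqlite")  # database lives on disk
        try:
            load_files(conn, files, cohort)
            return build_dataset(conn, cohort, ["100"], "2020-01-01")
        finally:
            conn.close()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Dates are stored as ISO `YYYY-MM-DD` strings so that plain string comparison orders them correctly; raw files in another date format would need converting before this query is valid. Filtering to the cohort at load time keeps the database small, which is the main reason a single SQLite file is easier to query than the original set of text files.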