rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets.
The Clinical Practice Research Datalink (CPRD) is a large and widely used resource of electronic health records from the UK, linking primary care data to hospital data, death registration data, cancer registry data, deprivation data and mental health services data. Extraction and management of CPRD...
| Main Authors: | Alexander Pate, Rosa Parisi, Evangelos Kontopantelis, Matthew Sperrin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE, Vol. 20, No. 8, e0327229 |
| Online Access: | https://doi.org/10.1371/journal.pone.0327229 |
| Field | Value |
|---|---|
| author | Alexander Pate, Rosa Parisi, Evangelos Kontopantelis, Matthew Sperrin |
| collection | DOAJ |
| description | The Clinical Practice Research Datalink (CPRD) is a large and widely used resource of electronic health records from the UK, linking primary care data to hospital data, death registration data, cancer registry data, deprivation data and mental health services data. Extraction and management of CPRD data is a computationally demanding process and requires a significant amount of work, in particular when using R. The rcprd package simplifies the process of extracting and processing CPRD data in order to build datasets ready for statistical analysis. Raw CPRD data is provided in thousands of .txt files, making querying this data cumbersome and inefficient. rcprd saves the relevant information into an SQLite database stored on the hard drive, which can then be queried efficiently to extract the required information about individuals. rcprd follows a four-stage process: 1) define a cohort; 2) read in medical and prescription data and save it into an SQLite database; 3) query this SQLite database for specific codes and tests to create variables for each individual in the cohort; 4) combine the extracted variables into a dataset ready for statistical analysis. Functions are available to extract common variable types (e.g., history of a condition, or time until an event occurs, relative to an index date), as well as more general functions for database queries, allowing users to define their own variables for extraction. The entire process can be done from within R, with no knowledge of SQL required. This manuscript showcases the functionality of rcprd by running through an example using simulated CPRD Aurum data. rcprd will reduce the duplication of time and effort among those using CPRD data for research, allowing more time to be focused on other aspects of research projects. |
| format | Article |
| id | doaj-art-af18d1e9014e4eb294376b54fe9db082 |
| institution | Kabale University |
| issn | 1932-6203 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS ONE |
| title | rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets. |
| url | https://doi.org/10.1371/journal.pone.0327229 |
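The description outlines a pipeline: load many raw .txt extracts into one on-disk SQLite database, query it for code lists relative to each patient's index date, then merge the resulting variables into one analysis-ready dataset. The sketch below illustrates that general pattern in Python with the standard-library `sqlite3` and `csv` modules. It is not rcprd's R interface. The function names (`load_txt_files`, `history_of`, `time_until`, `build_dataset`, `demo`), the simplified three-column tab-delimited layout (`patid`, `obsdate`, `code`), ISO-formatted dates, the "on or before index date" history window and the censoring rule are all hypothetical assumptions chosen for illustration.

```python
import csv
import sqlite3
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical illustration of a files -> SQLite -> variables -> dataset pipeline.
# Not the rcprd API; the column layout and ISO dates (YYYY-MM-DD) are assumptions.


def load_txt_files(conn, table, paths):
    """Append many tab-delimited .txt files into one SQLite table."""
    # Table names are trusted internal strings here, so f-string interpolation is acceptable.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (patid TEXT, obsdate TEXT, code TEXT)")
    for path in paths:
        with open(path, newline="") as fh:
            rows = [(r["patid"], r["obsdate"], r["code"])
                    for r in csv.DictReader(fh, delimiter="\t")]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # Index on (patid, code) so per-patient code lookups avoid full table scans.
    conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{table} ON {table} (patid, code)")
    conn.commit()


def history_of(conn, table, cohort, codes):
    """1 if any listed code is recorded on or before the index date (assumed window), else 0."""
    marks = ",".join("?" * len(codes))  # e.g. "?,?" for two codes
    sql = (f"SELECT 1 FROM {table} WHERE patid = ? AND code IN ({marks}) "
           "AND obsdate <= ? LIMIT 1")
    return {pid: int(conn.execute(sql, (pid, *codes, idx)).fetchone() is not None)
            for pid, idx in cohort.items()}


def time_until(conn, table, cohort, codes, end_dates):
    """(days from index date to first listed code after it, event flag), censored at end date."""
    marks = ",".join("?" * len(codes))
    sql = (f"SELECT MIN(obsdate) FROM {table} WHERE patid = ? AND code IN ({marks}) "
           "AND obsdate > ?")
    out = {}
    for pid, idx in cohort.items():
        (first,) = conn.execute(sql, (pid, *codes, idx)).fetchone()
        end = end_dates[pid]
        event = first is not None and first <= end
        stop = first if event else end  # ISO strings compare correctly as text
        out[pid] = ((date.fromisoformat(stop) - date.fromisoformat(idx)).days, int(event))
    return out


def build_dataset(cohort, **variables):
    """Merge per-patient variables into one row (dict) per cohort member."""
    return [{"patid": pid, "index_date": idx,
             **{name: values[pid] for name, values in variables.items()}}
            for pid, idx in cohort.items()]


def demo():
    """Run the pipeline on tiny made-up data standing in for raw extract files."""
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for i, lines in enumerate([["1\t2010-03-01\tHTN", "2\t2016-05-10\tCVD"],
                                   ["1\t2018-07-15\tCVD", "3\t2012-01-20\tHTN"]]):
            path = Path(tmp) / f"obs_{i}.txt"
            path.write_text("patid\tobsdate\tcode\n" + "\n".join(lines) + "\n")
            files.append(path)
        conn = sqlite3.connect(":memory:")  # a file path would persist the database on disk
        load_txt_files(conn, "observation", files)
    cohort = {"1": "2015-01-01", "2": "2015-01-01", "3": "2015-01-01"}
    end_dates = {pid: "2020-01-01" for pid in cohort}
    rows = build_dataset(
        cohort,
        hx_htn=history_of(conn, "observation", cohort, ["HTN"]),
        cvd_time=time_until(conn, "observation", cohort, ["CVD"], end_dates),
    )
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Loading once and indexing by patient and code is what makes the repeated per-variable queries cheap; in a real project the connection would point at a database file on disk rather than `":memory:"`, so the slow loading step runs only once.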