Automated self-service cohort selection for large-scale population sciences and observational research: The California Teachers Study Researcher Platform.

Bibliographic Details
Main Authors: James V Lacey, Emma S Spielfogel, Jennifer L Benbow, Kristen E Savage, Kai Lin, Cheryl A M Anderson, Jessica Clague-DeHart, Christine N Duffy, Maria Elena Martinez, Hannah Lui Park, Caroline A Thompson, Sophia S Wang, Sandeep Chandra
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0296611
collection DOAJ
description Objective: Cohort selection is ubiquitous and essential, but manual and ad hoc approaches are time-consuming, labor-intensive, and difficult to scale. We sought to automate cohort selection by building self-service tools that enable researchers to independently generate datasets for population sciences research.
Materials and methods: The California Teachers Study (CTS) is a prospective observational study of 133,477 women who have been followed continuously since 1995. The CTS includes extensive survey-based and real-world data from cancer, hospitalization, and mortality linkages. We curated data from our data warehouse into a column-oriented database and developed a researcher-facing web application that guides researchers through the project lifecycle; captures researchers' inputs; and automatically generates custom, analysis-ready data, code, dictionaries, and documentation.
Results: Researchers can register, access data, and propose projects on the CTS Researcher Platform via our CTS website. The Platform supports cohort and cross-sectional study designs for cancer, mortality, and any other ICD-based phenotypes or endpoints. User-friendly prompts and menus capture analytic design, inclusion/exclusion criteria, endpoint definitions, censoring rules, and covariate selection. Researchers anywhere can query, choose, and review their options, then automatically and quickly receive custom data, analytic scripts, and documentation for their research projects. Research teams can review, revise, and update their choices at any time.
Discussion: We replaced inefficient traditional cohort-selection processes with an integrated self-service approach that simplifies and improves cohort selection for all stakeholders. Compared with manual methods, our solution is faster and more scalable, user-friendly, and collaborative. Other studies could reconfigure our individual database, project-tracking, website, and data-delivery components for their own specific needs, or they could use other widely available solutions (e.g., alternative database or project-tracking tools) to enable similarly automated cohort selection in their own settings. Our comprehensive and flexible framework could be adopted to improve cohort selection in other population sciences and observational research settings.
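The kind of rule-driven selection the abstract describes (inclusion/exclusion criteria, an endpoint definition, and censoring rules) can be illustrated with a minimal sketch. This is not the CTS Researcher Platform's code: the `Participant` fields, the `select_cohort` function, the minimum-age and prior-cancer criteria, and the tie-breaking order are all hypothetical choices made for illustration, and the sample records are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Participant:
    """Hypothetical participant record; field names are illustrative only."""
    pid: int
    baseline: date                 # date follow-up starts
    age_at_baseline: int
    prior_cancer: bool             # example exclusion criterion
    endpoint_date: Optional[date]  # first qualifying event, if any
    death_date: Optional[date]


# When two exit reasons fall on the same date, prefer the event (assumed rule).
_PRIORITY = {"event": 0, "censored_death": 1, "censored_admin": 2}


def select_cohort(people: List[Participant], min_age: int,
                  admin_end: date) -> List[Dict]:
    """Apply inclusion/exclusion rules, then assign exit date and status."""
    cohort = []
    for p in people:
        # Inclusion/exclusion criteria.
        if p.age_at_baseline < min_age or p.prior_cancer:
            continue
        # Endpoint on or before baseline is treated as prevalent and excluded.
        if p.endpoint_date is not None and p.endpoint_date <= p.baseline:
            continue
        # Censoring: follow-up ends at the earliest of event, death,
        # or the administrative end of follow-up.
        exits = [(admin_end, "censored_admin")]
        if p.death_date is not None:
            exits.append((p.death_date, "censored_death"))
        if p.endpoint_date is not None:
            exits.append((p.endpoint_date, "event"))
        exit_date, status = min(exits, key=lambda e: (e[0], _PRIORITY[e[1]]))
        cohort.append({
            "pid": p.pid,
            "status": status,
            "exit_date": exit_date,
            "followup_days": (exit_date - p.baseline).days,
        })
    return cohort


# Invented example records, not CTS data.
start = date(1995, 10, 1)
people = [
    Participant(1, start, 50, False, date(2000, 10, 1), None),
    Participant(2, start, 45, True, None, None),             # excluded
    Participant(3, start, 60, False, None, date(2010, 3, 1)),
    Participant(4, start, 30, False, None, None),            # too young
    Participant(5, start, 55, False, None, None),
]
cohort = select_cohort(people, min_age=40, admin_end=date(2020, 12, 31))
```

Keeping the criteria as function parameters mirrors the idea of capturing a researcher's choices once and regenerating the dataset whenever those choices are revised.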
id doaj-art-61cb5e8471db4a8e99470c42e263d408
institution DOAJ
issn 1932-6203
spelling PLoS ONE, vol. 20, no. 5, e0296611 (2025-01-01). https://doi.org/10.1371/journal.pone.0296611