Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations
Objective: With the increasing energy surrounding the development of artificial intelligence and machine learning (AI/ML) models, the use of the same external validation dataset by various developers allows for a direct comparison of model performance. Through our High Throughput Truthing project, w...
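| Main Authors: | Victor Garcia, Emma Gardecki, Stephanie Jou, Xiaoxian Li, Kenneth R. Shroyer, Joel Saltz, Balazs Acs, Katherine Elfer, Jochen Lennerz, Roberto Salgado, Brandon D. Gallas |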
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-01-01 |
| Series: | Journal of Pathology Informatics |
| Subjects: | Data, Sampling, Prioritization, Validation |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2153353924000506 |
| _version_ | 1850239103513133056 |
|---|---|
| author | Victor Garcia Emma Gardecki Stephanie Jou Xiaoxian Li Kenneth R. Shroyer Joel Saltz Balazs Acs Katherine Elfer Jochen Lennerz Roberto Salgado Brandon D. Gallas |
| author_sort | Victor Garcia |
| collection | DOAJ |
| description | Objective: With growing interest in the development of artificial intelligence and machine learning (AI/ML) models, having multiple developers use the same external validation dataset allows a direct comparison of model performance. Through our High Throughput Truthing project, we are creating a validation dataset for AI/ML models trained to assess stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative breast cancer (TNBC). Materials and methods: We obtained clinical metadata for hematoxylin and eosin-stained glass slides and the corresponding scanned whole slide images (WSIs) of TNBC core biopsies from two US academic medical centers. We selected regions of interest (ROIs) from the WSIs to target regions with various tissue morphologies and sTILs densities. Given the selected ROIs, we implemented a hierarchical rank-sort method for case prioritization. Results: We received 122 glass slides and clinical metadata on 105 unique patients with TNBC. All cases received were from female patients, and the mean age was 63.44 years. Of all cases, 60% were White patients and 38.1% were Black or African American. After case prioritization, the skewness of the sTILs density distribution improved from 0.60 to 0.46 (closer to symmetric), with a corresponding increase in the entropy of the sTILs density bins from 1.20 to 1.24. We retained cases with less prevalent metadata elements. Conclusion: This method allows us to prioritize underrepresented subgroups based on important clinical factors. In this manuscript, we discuss how we sourced the clinical metadata, selected ROIs, and developed our approach to prioritizing cases for inclusion in our pivotal study. |
| format | Article |
| id | doaj-art-baf1d9bc0d3a47e7bcfef00f42ec18a9 |
| institution | OA Journals |
| issn | 2153-3539 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Journal of Pathology Informatics |
| spelling | doaj-art-baf1d9bc0d3a47e7bcfef00f42ec18a9 2025-08-20T02:01:15Z eng Elsevier Journal of Pathology Informatics 2153-3539 2025-01-01 16 100411 10.1016/j.jpi.2024.100411 Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations. Authors: Victor Garcia [0], Emma Gardecki [1], Stephanie Jou [2], Xiaoxian Li [3], Kenneth R. Shroyer [4], Joel Saltz [5], Balazs Acs [6], Katherine Elfer [7], Jochen Lennerz [8], Roberto Salgado [9], Brandon D. Gallas [10]. Affiliations: [0] U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America; Corresponding author at: 10903 New Hampshire Ave, Silver Spring, MD 20903, USA. [1] U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America. [2] Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States of America. [3] Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States of America. [4] Department of Pathology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, United States of America. [5] Department of Pathology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, United States of America. [6] Department of Oncology and Pathology, Cancer Centre Karolinska (CCK), Karolinska Institutet, Stockholm, Sweden; Department of Clinical Pathology and Cancer Diagnostics, Karolinska University Hospital, Stockholm, Sweden. [7] U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America; Division of Cancer Prevention, National Cancer Institute, National Institute of Health, Shady Grove, MD, United States of America. [8] BostonGene, Waltham, MA, USA. [9] Division of Research, Peter Mac Callum Cancer Centre, Melbourne, Australia; Department of Pathology, ZAS Hospitals, Antwerp, Belgium. [10] U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America. http://www.sciencedirect.com/science/article/pii/S2153353924000506 Data Sampling Prioritization Validation |
| title | Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations |
| title_sort | prioritizing cases from a multi institutional cohort for a dataset of pathologist annotations |
| topic | Data Sampling Prioritization Validation |
| url | http://www.sciencedirect.com/science/article/pii/S2153353924000506 |
| work_keys_str_mv | AT victorgarcia prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT emmagardecki prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT stephaniejou prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT xiaoxianli prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT kennethrshroyer prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT joelsaltz prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT balazsacs prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT katherineelfer prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT jochenlennerz prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT robertosalgado prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations AT brandondgallas prioritizingcasesfromamultiinstitutionalcohortforadatasetofpathologistannotations |
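
The description field reports two summary statistics for the sTILs density distribution: skewness and the entropy of the density bins. The record does not say which skewness estimator, bin edges, or logarithm base the authors used, so the following is only a minimal sketch under stated assumptions: the moment-based skewness coefficient, Shannon entropy with the natural logarithm, and hypothetical density values and bin edges chosen purely for illustration. It is not the authors' implementation and will not reproduce the published figures.

```python
import bisect
import math
from collections import Counter


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


def bin_entropy(values, edges):
    """Shannon entropy (natural log, assumed) of the bin occupancy.

    `edges` are sorted interior bin boundaries; a value equal to an
    edge is placed in the bin to the right of that edge.
    """
    counts = Counter(bisect.bisect_right(edges, v) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sTILs densities (percent) and bin edges, for illustration only.
    densities = [0, 5, 5, 10, 10, 20, 30, 40, 60, 90]
    edges = [10, 40]  # three bins: below 10, 10 to below 40, 40 and above
    print("skewness:", round(skewness(densities), 3))
    print("entropy:", round(bin_entropy(densities, edges), 3))
```

Under these assumptions, a skewness closer to zero indicates a more symmetric distribution, and a higher entropy indicates cases spread more evenly across the bins, which matches the direction of change the description reports after case prioritization.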