Prioritizing cases from a multi-institutional cohort for a dataset of pathologist annotations

Bibliographic Details
Main Authors: Victor Garcia, Emma Gardecki, Stephanie Jou, Xiaoxian Li, Kenneth R. Shroyer, Joel Saltz, Balazs Acs, Katherine Elfer, Jochen Lennerz, Roberto Salgado, Brandon D. Gallas
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Journal of Pathology Informatics
Subjects: Data, Sampling, Prioritization, Validation
Online Access: http://www.sciencedirect.com/science/article/pii/S2153353924000506
Collection: DOAJ

Abstract

Objective: With growing interest in the development of artificial intelligence and machine learning (AI/ML) models, having various developers use the same external validation dataset allows a direct comparison of model performance. Through our High Throughput Truthing project, we are creating a validation dataset for AI/ML models trained to assess stromal tumor-infiltrating lymphocytes (sTILs) in triple-negative breast cancer (TNBC).

Materials and methods: We obtained clinical metadata for hematoxylin and eosin-stained glass slides and the corresponding scanned whole slide images (WSIs) of TNBC core biopsies from two US academic medical centers. We selected regions of interest (ROIs) from the WSIs to capture a range of tissue morphologies and sTILs densities. Given the selected ROIs, we implemented a hierarchical rank-sort method for case prioritization.

Results: We received 122 glass slides and clinical metadata on 105 unique patients with TNBC. All patients were female, and the mean age was 63.44 years. Of all cases, 60% were White patients and 38.1% were Black or African American patients. After case prioritization, the skewness of the sTILs density distribution improved from 0.60 to 0.46, with a corresponding increase in the entropy of the sTILs density bins from 1.20 to 1.24. Cases with less prevalent metadata elements were retained.

Conclusion: This method allows us to prioritize underrepresented subgroups based on important clinical factors. In this manuscript, we discuss how we sourced the clinical metadata, selected ROIs, and developed our approach to prioritizing cases for inclusion in our pivotal study.
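
The abstract reports the skewness of the sTILs density distribution and the entropy of the sTILs density bins but does not state how either was computed. The sketch below shows one common way to compute both, assuming the moment coefficient of skewness (g1, biased form) and Shannon entropy with the natural logarithm over bin-occupancy proportions. The function names, bin edges, and density values are hypothetical illustrations, not the study's definitions, bins, or data. Under these definitions, a skewness closer to zero means a more symmetric distribution, and a higher entropy means cases are spread more evenly across bins, which is the direction of change the abstract reports after prioritization.

```python
import bisect
import math
from collections import Counter
from typing import Sequence


def moment_skewness(values: Sequence[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (biased, population form).

    Assumption: the study's exact skewness estimator is not given in the abstract.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def binned_entropy(values: Sequence[float], edges: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the proportion of values in each bin.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    Assumption: log base and bin edges are illustrative, not the study's.
    """
    counts = Counter()
    for v in values:
        if v < edges[0] or v > edges[-1]:
            raise ValueError(f"value {v} is outside the bin range")
        # bisect_right finds the bin; clamp so v == edges[-1] lands in the last bin
        idx = min(bisect.bisect_right(edges, v) - 1, len(edges) - 2)
        counts[idx] += 1
    total = sum(counts.values())
    # Empty bins contribute nothing, so only occupied bins are summed
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sTILs densities (% of stromal area), for illustration only
    densities = [1, 2, 3, 5, 5, 8, 10, 15, 20, 30, 45, 60, 80]
    edges = [0, 10, 40, 100]  # hypothetical low / intermediate / high bins
    print(round(moment_skewness(densities), 2))
    print(round(binned_entropy(densities, edges), 2))
```

Both numbers depend on the estimator, the bin edges, and the logarithm base, so this sketch will not reproduce the reported 0.60/0.46 or 1.20/1.24 without the study's actual choices; it only illustrates what the two metrics measure when comparing a cohort before and after prioritization.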

ISSN: 2153-3539
Volume: 16, article 100411
DOI: 10.1016/j.jpi.2024.100411

Author affiliations:
Victor Garcia: U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America. Corresponding author at: 10903 New Hampshire Ave, Silver Spring, MD 20903, USA.
Emma Gardecki: U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America
Stephanie Jou: Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States of America
Xiaoxian Li: Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA, United States of America
Kenneth R. Shroyer: Department of Pathology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, United States of America
Joel Saltz: Department of Pathology, Renaissance School of Medicine, Stony Brook University, Stony Brook, NY, United States of America
Balazs Acs: Department of Oncology and Pathology, Cancer Centre Karolinska (CCK), Karolinska Institutet, Stockholm, Sweden; Department of Clinical Pathology and Cancer Diagnostics, Karolinska University Hospital, Stockholm, Sweden
Katherine Elfer: U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America; Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Shady Grove, MD, United States of America
Jochen Lennerz: BostonGene, Waltham, MA, USA
Roberto Salgado: Division of Research, Peter MacCallum Cancer Centre, Melbourne, Australia; Department of Pathology, ZAS Hospitals, Antwerp, Belgium
Brandon D. Gallas: U.S. Food and Drug Administration, Center for Devices and Radiological Health, Office of Science and Engineering Laboratories, Division of Imaging, Diagnostics, and Software Reliability, Silver Spring, MD, United States of America