Do comprehensive deep learning algorithms suffer from hidden stratification? A retrospective study on pneumothorax detection in chest radiography

Bibliographic Details
Main Authors:	Luke Oakden-Rayner, Catherine M Jones, John Lambert, Jarrel Seah, Cyril Tang, Quinlan D Buchlak, Michael Robert Milne, Xavier Holt, Hassan Ahmad, Nazanin Esmaili, Peter Brotchie
Format:	Article
Language:	English
Published:	BMJ Publishing Group 2021-12-01
Series:	BMJ Open
Online Access:	https://bmjopen.bmj.com/content/11/12/e053024.full
ISSN:	2044-6055
DOI:	10.1136/bmjopen-2021-053024
Citation:	BMJ Open 2021;11(12):e053024
Affiliations:	Australian Institute for Machine Learning, The University of Adelaide, Adelaide, South Australia, Australia; Annalise-AI (annalise.ai), Sydney, New South Wales, Australia; School of Medicine, The University of Notre Dame Australia, Sydney Campus, Darlinghurst, New South Wales, Australia
Collection:	DOAJ

Objectives: To evaluate the ability of a commercially available comprehensive chest radiography deep convolutional neural network (DCNN) to detect simple and tension pneumothorax, stratified by the following subgroups: the presence of an intercostal drain; rib, clavicular, scapular or humeral fractures or rib resections; subcutaneous emphysema; and erect versus non-erect positioning. The hypothesis was that performance would not differ significantly in any of these subgroups compared with the overall test dataset.
Design: A retrospective case–control study was undertaken.
Setting: Community radiology clinics and hospitals in Australia and the USA.
Participants: A test dataset of 2557 chest radiography studies was ground-truthed by three subspecialty thoracic radiologists for the presence of simple or tension pneumothorax and for each subgroup other than positioning. Radiograph positioning was derived from radiographer annotations on the images.
Outcome measures: DCNN performance for detecting simple and tension pneumothorax was evaluated over the entire test set, and within each subgroup, using the area under the receiver operating characteristic curve (AUC). A difference in AUC of more than 0.05 was considered clinically significant.
Results: Compared with the overall test set, the DCNN's performance for detecting simple and tension pneumothorax was statistically non-inferior in all subgroups. The DCNN had an AUC of 0.981 (0.976–0.986) for detecting simple pneumothorax and 0.997 (0.995–0.999) for detecting tension pneumothorax.
Conclusions: Hidden stratification has significant implications for potential failures of deep learning when applied in clinical practice. This study demonstrated that a comprehensively trained DCNN can be resilient to hidden stratification in several clinically meaningful subgroups for pneumothorax detection.
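The abstract's outcome measures compare AUC within each subgroup (for example, studies with an intercostal drain) against the overall test set, treating an AUC difference of more than 0.05 as clinically significant. The abstract does not state which statistical test was used, so the Python sketch below is only an illustration under stated assumptions, not the authors' method: AUC is computed as the Mann-Whitney statistic, and a subgroup is called non-inferior when the lower bound of a paired percentile bootstrap of AUC(subgroup) minus AUC(overall) stays above -0.05. The helper names `auc` and `subgroup_noninferiority`, the bootstrap settings and the synthetic data are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive case outscores a randomly chosen negative case (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def subgroup_noninferiority(
    scores: Sequence[float],
    labels: Sequence[int],
    in_subgroup: Sequence[bool],
    margin: float = 0.05,  # AUC difference treated as clinically significant
    n_boot: int = 200,     # assumed number of bootstrap replicates
    alpha: float = 0.025,  # lower bound of a two-sided 95% percentile interval
    seed: int = 0,
) -> Tuple[float, float, bool]:
    """Hypothetical helper. Returns (delta, lower_bound, non_inferior), where
    delta = AUC(subgroup) - AUC(overall) on the full data and the subgroup is
    judged non-inferior when the bootstrap lower bound exceeds -margin."""

    def delta(idx: List[int]) -> float:
        overall = auc([scores[i] for i in idx], [labels[i] for i in idx])
        sub = [i for i in idx if in_subgroup[i]]
        subgroup = auc([scores[i] for i in sub], [labels[i] for i in sub])
        return subgroup - overall

    n = len(scores)
    point = delta(list(range(n)))
    rng = random.Random(seed)
    boot: List[float] = []
    for _ in range(n_boot):
        # Resample whole studies, so each replicate's subgroup is a subset of
        # its overall sample and the two AUCs stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            boot.append(delta(idx))
        except ValueError:
            continue  # replicate lacked a positive or negative case; skip it
    if not boot:
        raise ValueError("no usable bootstrap replicates")
    boot.sort()
    lower = boot[int(alpha * len(boot))]
    return point, lower, lower > -margin


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
    # Synthetic scores: positives tend to score higher than negatives.
    scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
    # Synthetic subgroup flag standing in for e.g. "intercostal drain present".
    in_subgroup = [rng.random() < 0.4 for _ in labels]
    d, lower, ok = subgroup_noninferiority(scores, labels, in_subgroup)
    print(f"delta={d:.3f} lower={lower:.3f} non_inferior={ok}")
```

Resampling whole studies, rather than the subgroup on its own, keeps the comparison paired, because every subgroup study is also part of the overall test set. A DeLong-type test for correlated ROC curves would be a common alternative to the bootstrap.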
