Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples

Abstract: Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. Successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization...

Bibliographic Details
Main Authors: Sai Spandana Chintapalli, Rongguang Wang, Zhijian Yang, Vasiliki Tassopoulou, Fanyang Yu, Vishnu Bashyam, Guray Erus, Pratik Chaudhari, Haochang Shou, Christos Davatzikos
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-04157-4
author Sai Spandana Chintapalli
Rongguang Wang
Zhijian Yang
Vasiliki Tassopoulou
Fanyang Yu
Vishnu Bashyam
Guray Erus
Pratik Chaudhari
Haochang Shou
Christos Davatzikos
author_sort Sai Spandana Chintapalli
collection DOAJ
description Abstract: Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. Successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization. To help overcome such limitations in the context of brain MRI, we present GenMIND: a collection of generative models of normative regional volumetric features derived from structural brain imaging. GenMIND models are trained on real brain imaging regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Leveraging GenMIND, we produce and offer 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), alongside the model’s capability to generate unlimited data. Experimental results indicate that samples generated from GenMIND align well with the distributions observed in real data. Most importantly, the generated normative data significantly enhances the accuracy of downstream machine learning models on tasks such as disease classification. The dataset and the generative models are publicly available.
format Article
id doaj-art-74fba8976325481aa6d8e108dfb3a498
institution OA Journals
issn 2052-4463
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj-art-74fba8976325481aa6d8e108dfb3a498
2025-08-20T02:20:45Z
eng
Nature Portfolio
Scientific Data
2052-4463
2024-12-01
111110
10.1038/s41597-024-04157-4
Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples
Sai Spandana Chintapalli (0), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Rongguang Wang (1), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Zhijian Yang (2), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Vasiliki Tassopoulou (3), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Fanyang Yu (4), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Vishnu Bashyam (5), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Guray Erus (6), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Pratik Chaudhari (7), Department of Electrical and Systems Engineering, University of Pennsylvania
Haochang Shou (8), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Christos Davatzikos (9), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania
Abstract: Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. Successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization. To help overcome such limitations in the context of brain MRI, we present GenMIND: a collection of generative models of normative regional volumetric features derived from structural brain imaging. GenMIND models are trained on real brain imaging regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Leveraging GenMIND, we produce and offer 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), alongside the model’s capability to generate unlimited data. Experimental results indicate that samples generated from GenMIND align well with the distributions observed in real data. Most importantly, the generated normative data significantly enhances the accuracy of downstream machine learning models on tasks such as disease classification. The dataset and the generative models are publicly available.
https://doi.org/10.1038/s41597-024-04157-4
title Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples
url https://doi.org/10.1038/s41597-024-04157-4