Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples
Abstract: Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. Successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization...
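| Main Authors: | Sai Spandana Chintapalli, Rongguang Wang, Zhijian Yang, Vasiliki Tassopoulou, Fanyang Yu, Vishnu Bashyam, Guray Erus, Pratik Chaudhari, Haochang Shou, Christos Davatzikos |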
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-12-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-024-04157-4 |
| _version_ | 1850169257484091392 |
|---|---|
| author | Sai Spandana Chintapalli; Rongguang Wang; Zhijian Yang; Vasiliki Tassopoulou; Fanyang Yu; Vishnu Bashyam; Guray Erus; Pratik Chaudhari; Haochang Shou; Christos Davatzikos |
| author_sort | Sai Spandana Chintapalli |
| collection | DOAJ |
| description | Abstract: Availability of large and diverse medical datasets is often challenged by privacy and data sharing restrictions. Successful application of machine learning techniques for disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization. To help overcome such limitations in the context of brain MRI, we present GenMIND: a collection of generative models of normative regional volumetric features derived from structural brain imaging. GenMIND models are trained on real brain imaging regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Leveraging GenMIND, we produce and offer 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), alongside the model’s capability to generate unlimited data. Experimental results indicate that samples generated from GenMIND align well with the distributions observed in real data. Most importantly, the generated normative data significantly enhances the accuracy of downstream machine learning models on tasks such as disease classification. The dataset and the generative models are publicly available. |
| format | Article |
| id | doaj-art-74fba8976325481aa6d8e108dfb3a498 |
| institution | OA Journals |
| issn | 2052-4463 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-74fba8976325481aa6d8e108dfb3a498; 2025-08-20T02:20:45Z; eng; Nature Portfolio; Scientific Data; ISSN 2052-4463; 2024-12-01; vol. 11, no. 1, pp. 1-10; doi:10.1038/s41597-024-04157-4. Affiliations: Pratik Chaudhari, Department of Electrical and Systems Engineering, University of Pennsylvania; all other authors, Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania. |
| title | Generative models of MRI-derived neuroimaging features and associated dataset of 18,000 samples |
| title_sort | generative models of mri derived neuroimaging features and associated dataset of 18 000 samples |
| url | https://doi.org/10.1038/s41597-024-04157-4 |