Enhancing neuromolecular imaging classification in low-data regimes with generative machine learning: A case study in HDAC PET/MR imaging of alcohol use disorder

Bibliographic Details
Main Authors: Tyler N. Meyer, Olga Andreeva, Roger D. Weiss, Wei Ding, Iris Shen, Changning Wang, Ping Chen, Tewodros Mulugeta Dagnew
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: Neuroscience Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S2772528625000408
Description
Summary: Introduction: Positron Emission Tomography (PET) is a vital modality for investigating brain-related disorders. However, data scarcity, especially for novel molecular targets such as neuroepigenetic enzymes, combined with difficult-to-recruit patient populations, limits the development of machine learning (ML) models. Our primary objective is to enhance single-subject classification of neuromolecular imaging data and facilitate biomarker discovery. We demonstrate our approach using histone deacetylase (HDAC) PET/MR imaging in Alcohol Use Disorder (AUD). Methods: We propose the Catalysis Training pipeline, a framework that augments real imaging data with high-quality synthetic data generated by a Wasserstein Conditional Generative Adversarial Network (WCGAN). Using [11C]Martinostat PET/MR imaging, we extracted 1-D standardized uptake value ratio (SUVR) tabular features representing HDAC enzyme expression density across eight cingulate subregions. These features were used to train and test ML classifiers, including Support Vector Machine (SVM), XGBoost, and Random Forest, under leave-one-out cross-validation. Results: Integrating synthetic data into the training process improved classification accuracy significantly: by 26 percentage points for XGBoost and Random Forest (from 59% to 85%) and by 18 percentage points for SVM (from 70% to 88%). Synthetic samples improved model generalizability. Key hemispheric and subregional cingulate HDAC patterns were also identified as potential biomarkers. Conclusion: Our results demonstrate that generative AI can help overcome data scarcity in low-data-regime neuroimaging applications. Catalysis Training provides a scalable strategy to enhance ML-driven biomarker discovery and disease classification, especially for rare or difficult-to-study disorders like AUD. Clinically, cingulate HDAC expression measured by [11C]Martinostat PET/MR shows promise as an objective biomarker for AUD, complementing DSM-based diagnosis and informing novel treatment strategies.
ISSN: 2772-5286
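
The evaluation protocol named in the Methods (leave-one-out cross-validation with synthetic samples added to the training data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names are hypothetical, the toy cohort is random numbers with eight features per subject, a nearest-centroid rule stands in for the SVM, XGBoost, and Random Forest classifiers, and a per-class Gaussian sampler stands in for the WCGAN. The one point it is meant to show is that synthetic samples are generated only from the training fold, so the held-out subject never influences the generator.

```python
import random
import statistics


def synthesize(rows, n_new, rng):
    """Stand-in generator (NOT a WCGAN): samples each feature from a
    Gaussian fitted to `rows`. A trained conditional generator would be
    sampled here instead."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[rng.gauss(m, s) for m, s in zip(means, sds)] for _ in range(n_new)]


def centroid(rows):
    """Feature-wise mean of a list of feature vectors."""
    return [statistics.fmean(c) for c in zip(*rows)]


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )


def loocv_accuracy(X, y, n_synth_per_class=0, seed=0):
    """Leave-one-out accuracy; synthetic rows are drawn from the training
    fold only, then pooled with the real training rows of the same class."""
    rng = random.Random(seed)
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for label in sorted({lab for _, lab in train}):
            real = [x for x, lab in train if lab == label]
            synth = synthesize(real, n_synth_per_class, rng) if n_synth_per_class else []
            centroids[label] = centroid(real + synth)
        correct += predict(centroids, X[i]) == y[i]
    return correct / len(X)


def toy_cohort(n_per_class=6, n_features=8, seed=1):
    """Hypothetical two-class cohort; the values are arbitrary, not SUVR data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, 1.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 0.05) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_cohort()
    print("real only:", loocv_accuracy(X, y))
    print("real + synthetic:", loocv_accuracy(X, y, n_synth_per_class=20))
```

Generating the synthetic rows inside the loop, rather than once from the full cohort, is the design choice that keeps the held-out subject out of the augmentation step. Whether the article's pipeline trains its generator per fold in this way is not stated in the summary.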