An explainable dataset linking facial phenotypes and genes to rare genetic diseases

Abstract Distinctive facial phenotypes serve as crucial diagnostic markers for many rare genetic diseases. Although AI-driven image recognition achieves high diagnostic accuracy, it often fails to explain its predictions. In this study, we present the Facial phenotype-Gene-Disease Dataset (FGDD), an...

Bibliographic Details
Main Authors: Jie Song, Mengqiao He, Shumin Ren, Bairong Shen
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04922-z
collection DOAJ
description Abstract Distinctive facial phenotypes serve as crucial diagnostic markers for many rare genetic diseases. Although AI-driven image recognition achieves high diagnostic accuracy, it often fails to explain its predictions. In this study, we present the Facial phenotype-Gene-Disease Dataset (FGDD), an explainable dataset collected from 509 research publications. It contains 1,147 data records encompassing 197 disease-causing genes, 437 facial phenotypes, and 211 disease entities, with 689 records having disease labels. Each data record represents a patient group and includes demographic information, variation information, and phenotype information. Baseline and explainability validations conducted on FGDD confirmed the dataset’s effectiveness. FGDD supports the training of diagnostic models for rare genetic diseases while delivering explainable results, and provides a foundation for exploring intricate connections between genes, diseases, and facial phenotypes.
id doaj-art-f91c1ceeef9a495985c3fea34ebeacae
institution OA Journals
issn 2052-4463
Author affiliation (all four authors): Department of Ophthalmology and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University