An explainable dataset linking facial phenotypes and genes to rare genetic diseases

Bibliographic Details
Main Authors: Jie Song, Mengqiao He, Shumin Ren, Bairong Shen
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Data
Online Access:https://doi.org/10.1038/s41597-025-04922-z
Description
Summary: Distinctive facial phenotypes serve as crucial diagnostic markers for many rare genetic diseases. Although AI-driven image recognition achieves high diagnostic accuracy, it often fails to explain its predictions. In this study, we present the Facial phenotype-Gene-Disease Dataset (FGDD), an explainable dataset collected from 509 research publications. It contains 1,147 data records encompassing 197 disease-causing genes, 437 facial phenotypes, and 211 disease entities, with 689 records having disease labels. Each data record represents a patient group and includes demographic information, variation information, and phenotype information. Baseline and explainability validations conducted on FGDD confirmed the dataset’s effectiveness. FGDD supports the training of diagnostic models for rare genetic diseases while delivering explainable results, and provides a foundation for exploring intricate connections between genes, diseases, and facial phenotypes.
ISSN: 2052-4463