GeNetFormer: Transformer-Based Framework for Gene Expression Prediction in Breast Cancer

Bibliographic Details
Main Authors: Oumeima Thaalbi, Moulay A. Akhloufi
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: AI
Online Access: https://www.mdpi.com/2673-2688/6/3/43
Description
Summary: Background: Histopathological images are often used to diagnose breast cancer and have shown high accuracy in classifying cancer subtypes. Prediction of gene expression from whole-slide images and spatial transcriptomics data is important for cancer treatment in general and breast cancer in particular, and it remains a challenge across numerous studies. Method: In this study, we present a deep learning framework called GeNetFormer. We evaluated eight advanced transformer models, including EfficientFormer, FasterViT, BEiT v2, and Swin Transformer v2, and tested their performance in predicting gene expression using the STNet dataset. This dataset contains 68 H&E-stained histology images and transcriptomics data from different types of breast cancer. We followed a detailed process to prepare the data, including filtering genes and spots, normalizing stain colors, and creating smaller image patches for training. The models were trained to predict the expression of 250 genes using different image sizes and loss functions. GeNetFormer achieved its best performance when integrating EfficientFormer with the MSELoss function and a resolution of 256 × 256. Results: It predicted nine out of the top ten genes with a higher Pearson Correlation Coefficient (PCC) than the retrained ST-Net method. For the cancer biomarker genes DDX5 and XBP1, the PCC values were 0.7450 and 0.7203, respectively, compared with 0.6713 and 0.7320 for ST-Net, so GeNetFormer was ahead on DDX5 while ST-Net was marginally ahead on XBP1. Our method also gave slightly better predictions for other genes such as FASN (0.7018 vs. 0.6968) and ERBB2 (0.6241 vs. 0.6211). Conclusions: Our results show that GeNetFormer provides improvements over other models such as ST-Net and that transformer architectures are capable of analyzing spatial transcriptomics data to advance cancer research.
ISSN: 2673-2688
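
Note on the evaluation metric: the Summary compares models by the Pearson Correlation Coefficient (PCC) computed separately for each gene. The following is a minimal sketch of such a per-gene comparison, not the authors' evaluation code. The function names, the gene labels, and the toy numbers are all assumptions made for illustration; the record does not describe how spots were pooled or cross-validated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / math.sqrt(var_x * var_y)


def per_gene_pcc(measured, predicted, genes):
    """Return {gene: PCC} across spots.

    measured, predicted: lists of rows, one row per spot, with columns
    in the same order as `genes` (hypothetical layout for this sketch).
    """
    scores = {}
    for j, gene in enumerate(genes):
        scores[gene] = pearson(
            [row[j] for row in measured],
            [row[j] for row in predicted],
        )
    return scores


# Toy data: 4 spots x 2 placeholder genes (not values from the article).
genes = ["GENE_A", "GENE_B"]
measured = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
predicted = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]

scores = per_gene_pcc(measured, predicted, genes)
# Rank genes from best to worst predicted, as in a "top genes" comparison.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Running the same function on the predictions of two models and comparing the resulting dictionaries gene by gene gives the kind of side-by-side PCC comparison the Summary reports.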