Transfer learning for accelerated failure time model with microarray data

Abstract Background In microarray prognostic studies, researchers aim to identify genes associated with disease progression. However, due to the rarity of certain diseases and the cost of sample collection, researchers often face the challenge of limited sample size, which may prevent accurate estim...

Bibliographic Details
Main Authors: Yan-Bo Pei, Zheng-Yang Yu, Jun-Shan Shen
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Bioinformatics
Subjects:
Online Access: https://doi.org/10.1186/s12859-025-06056-w
collection DOAJ
description Abstract. Background: In microarray prognostic studies, researchers aim to identify genes associated with disease progression. However, because certain diseases are rare and sample collection is costly, sample sizes are often small, which may prevent accurate estimation and risk assessment. This calls for methods that leverage information from external data (i.e., source cohorts) to improve gene selection and risk assessment based on the current sample (i.e., the target cohort). Method: We propose a transfer learning method for the accelerated failure time (AFT) model that enhances the fit on the target cohort by adaptively borrowing information from the source cohorts. We use a leave-one-out cross-validation procedure to evaluate the relative stability of the selected genes and the overall predictive power. Conclusion: In simulation studies, the transfer learning method for the AFT model correctly identifies a small number of genes, and its estimation error is smaller than that obtained without the source cohorts. Furthermore, compared with directly combining the target and source cohorts in a single AFT model, the proposed method shows satisfactory accuracy and robustness to heterogeneity across cohorts. We analyze the GSE88770 and GSE25055 data using the proposed method. The selected genes are relatively stable, and the proposed method gives satisfactory overall risk prediction.
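The abstract names the AFT model and the record lists "Weighted least squares" as a subject, but neither specifies the estimator. As a rough illustration only, the sketch below assumes Stute's Kaplan-Meier weighted least squares, a common way to fit the AFT model log T = b0 + x'b + error under right censoring. The function names `stute_weights` and `weighted_ls_aft` are hypothetical, and the sketch covers a single low-dimensional cohort: it omits gene selection and the transfer step that borrows from source cohorts.

```python
import numpy as np


def stute_weights(time, event):
    """Kaplan-Meier (Stute) weights; censored observations receive weight 0."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    # Sort by observed time; at tied times, events come before censored rows.
    order = np.lexsort((1 - event, time))
    d = event[order].astype(float)
    ranks = np.arange(1, n + 1)
    # factor_j = ((n - j) / (n - j + 1)) ** d_j for the j-th sorted observation.
    factors = ((n - ranks) / (n - ranks + 1.0)) ** d
    # Product of factors over j < i (empty product is 1 for i = 1).
    surv_before = np.concatenate(([1.0], np.cumprod(factors[:-1])))
    w_sorted = d / (n - ranks + 1.0) * surv_before
    # Map the weights back to the original row order.
    w = np.empty(n)
    w[order] = w_sorted
    return w


def weighted_ls_aft(X, time, event):
    """Weighted least squares fit of log(time) on [1, X]; returns (b0, b)."""
    X = np.asarray(X, dtype=float)
    y = np.log(np.asarray(time, dtype=float))
    w = stute_weights(time, event)
    design = np.column_stack([np.ones(len(y)), X])
    # Scaling rows by sqrt(w) turns weighted LS into ordinary LS.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta
```

With no censoring every weight equals 1/n, so the fit reduces to ordinary least squares on the log survival times. With more genes than samples, as is typical for microarray data, a penalized version of this criterion would be needed instead.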
id doaj-art-8342e72822b844ca80ee13df0c71e34d
institution Kabale University
issn 1471-2105
spelling doaj-art-8342e72822b844ca80ee13df0c71e34d; 2025-08-20T03:41:43Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-03-01; 261119; 10.1186/s12859-025-06056-w; Transfer learning for accelerated failure time model with microarray data; Yan-Bo Pei, Zheng-Yang Yu, Jun-Shan Shen (all: School of Statistics, Capital University of Economics and Business)
topic Survival analysis
Auxiliary studies
Gene expression data
Weighted least squares
Transfer learning