Integrating Copula-Based Random Forest and Deep Learning Approaches for Analyzing Heterogeneous Treatment Effects in Survival Analysis

Bibliographic Details
Main Author: Jong-Min Kim
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Mathematics
Subjects: causal inference; deep learning; copula; survival analysis
Online Access: https://www.mdpi.com/2227-7390/13/10/1659
collection DOAJ
description This paper presents deep learning models, specifically Long Short-Term Memory (LSTM) networks and a hybrid Convolutional Neural Network–LSTM (CNN-LSTM), alongside a Copula-Based Random Forest (CBRF) model to estimate Heterogeneous Treatment Effects (HTEs) in survival analysis. The proposed method is designed to capture non-linear relationships and temporal dependencies in clinical and genomic data, with a particular focus on how treatment effects vary by race as a moderating factor. Using breast cancer data from the TCGA-BRCA dataset, which includes both clinical variables and gene expression profiles, we restrict the analysis to two racial groups: Black or African American and White. Dimensionality reduction is performed using Principal Component Analysis (PCA). We compare the CNN-LSTM, LSTM, and CBRF models under three weighting strategies (no weights, Horvitz–Thompson (HT) weights, and Inverse Probability of Treatment Weighting (IPTW)) for predicting treatment effects. Model performance is evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the concordance statistic (C-statistic), together with the estimated Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE) by race. The CNN-LSTM model consistently outperforms the others, achieving the lowest prediction errors and the highest discrimination, particularly under IPTW. Among the weighting strategies, IPTW yields the most substantial improvements in model performance and bias reduction. Race-specific treatment effects also vary notably: CNN-LSTM estimates a slightly higher CATE for Black individuals under IPTW. Overall, CNN-LSTM with IPTW is recommended for robust and equitable causal inference, especially in racially stratified settings.
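For readers unfamiliar with the weighting strategies named in the description, the following is a minimal sketch of textbook inverse-probability weights and two weighted ATE estimators (an unnormalized Horvitz–Thompson form and a normalized form). It is not taken from the article: the function names, the toy data, and the assumption that propensity scores are already known are all hypothetical, and the article's exact definitions of its HT and IPTW weights may differ.

```python
import numpy as np

def iptw_weights(treatment, propensity):
    """Textbook IPTW weights: 1/e for treated units, 1/(1-e) for controls."""
    t = np.asarray(treatment, dtype=float)
    e = np.asarray(propensity, dtype=float)
    return t / e + (1.0 - t) / (1.0 - e)

def ate_horvitz_thompson(outcome, treatment, propensity):
    """Unnormalized (Horvitz-Thompson style) estimate of the ATE."""
    y = np.asarray(outcome, dtype=float)
    t = np.asarray(treatment, dtype=float)
    e = np.asarray(propensity, dtype=float)
    return float(np.mean(t * y / e - (1.0 - t) * y / (1.0 - e)))

def ate_normalized(outcome, treatment, propensity):
    """Normalized estimate: difference of weighted group means."""
    y = np.asarray(outcome, dtype=float)
    t = np.asarray(treatment, dtype=float)
    w = iptw_weights(t, propensity)
    treated_mean = np.sum(w * t * y) / np.sum(w * t)
    control_mean = np.sum(w * (1.0 - t) * y) / np.sum(w * (1.0 - t))
    return float(treated_mean - control_mean)

# Hypothetical toy data: four units, known propensity of 0.5 for each.
y = np.array([5.0, 7.0, 3.0, 4.0])
t = np.array([1, 1, 0, 0])
e = np.array([0.5, 0.5, 0.5, 0.5])

print(iptw_weights(t, e))
print(ate_horvitz_thompson(y, t, e))
print(ate_normalized(y, t, e))
```

A race-specific CATE could be sketched the same way by applying either estimator to the subset of units in each group; in practice the propensity scores would themselves be estimated from covariates rather than assumed.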
id doaj-art-c59ca09674804d88b6b3e141e0fecf54
institution OA Journals
issn 2227-7390
spelling doaj-art-c59ca09674804d88b6b3e141e0fecf54; 2025-08-20T01:56:19Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2025-05-01; vol. 13, no. 10, art. 1659; doi:10.3390/math13101659; Integrating Copula-Based Random Forest and Deep Learning Approaches for Analyzing Heterogeneous Treatment Effects in Survival Analysis; Jong-Min Kim (Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA); https://www.mdpi.com/2227-7390/13/10/1659; causal inference; deep learning; copula; survival analysis
topic causal inference
deep learning
copula
survival analysis