Integrating Copula-Based Random Forest and Deep Learning Approaches for Analyzing Heterogeneous Treatment Effects in Survival Analysis

Bibliographic Details
Main Author: Jong-Min Kim
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/10/1659
Description
Summary: This paper integrates deep learning models, specifically Long Short-Term Memory (LSTM) networks and a hybrid Convolutional Neural Network–LSTM (CNN-LSTM), with a Copula-Based Random Forest (CBRF) model to estimate Heterogeneous Treatment Effects (HTEs) in survival analysis. The proposed method is designed to capture non-linear relationships and temporal dependencies in clinical and genomic data, with a particular focus on how treatment effects vary by race as a moderating factor. Using breast cancer data from the TCGA-BRCA dataset, which includes both clinical variables and gene expression profiles, we filter the data to two racial groups: Black or African American and White. Dimensionality reduction is performed using Principal Component Analysis (PCA). We compare the CNN-LSTM, LSTM, and CBRF models under three weighting strategies for predicting treatment effects: no weights, Horvitz–Thompson (HT) weights, and Inverse Probability of Treatment Weighting (IPTW). The models are evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Concordance statistic (C-statistic), together with estimates of the Average Treatment Effect (ATE) and the Conditional Average Treatment Effect (CATE) by race. The CNN-LSTM model consistently outperforms the others, achieving the lowest prediction errors and the highest discrimination, particularly under IPTW. Among the weighting strategies, IPTW yields the most substantial improvements in model performance and bias reduction. Importantly, race-specific treatment effects exhibit notable variation: CNN-LSTM estimates a slightly higher CATE for Black individuals under IPTW. Overall, CNN-LSTM with IPTW is recommended for robust and equitable causal inference, especially in racially stratified settings.
ISSN: 2227-7390
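
The summary above refers to IPTW and to ATE and CATE by race without showing how such quantities are computed. The sketch below is one minimal illustration, not the article's implementation: it assumes propensity scores are already available, uses a normalized weighted difference in mean outcomes, and ignores censoring, which a real survival analysis would have to handle. The function names, the toy data, and the group labels "A" and "B" are hypothetical.

```python
import numpy as np


def iptw_weights(treated, propensity):
    """IPTW weights: 1/e for treated units, 1/(1 - e) for control units."""
    treated = np.asarray(treated, dtype=float)
    # Clip propensities away from 0 and 1 to avoid division by zero.
    e = np.clip(np.asarray(propensity, dtype=float), 1e-6, 1 - 1e-6)
    return treated / e + (1.0 - treated) / (1.0 - e)


def weighted_ate(outcome, treated, weights):
    """Normalized weighted difference in mean outcome (treated minus control)."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated).astype(bool)
    weights = np.asarray(weights, dtype=float)
    mean_treated = np.average(outcome[treated], weights=weights[treated])
    mean_control = np.average(outcome[~treated], weights=weights[~treated])
    return mean_treated - mean_control


def cate_by_group(outcome, treated, weights, group):
    """Weighted treatment effect computed separately within each group."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated)
    weights = np.asarray(weights, dtype=float)
    group = np.asarray(group)
    return {
        str(g): weighted_ate(outcome[group == g], treated[group == g], weights[group == g])
        for g in np.unique(group)
    }


# Hypothetical toy data (not from TCGA-BRCA): outcomes, treatment flags,
# assumed propensity scores, and a two-level grouping variable.
outcome = [10, 12, 8, 9, 15, 11, 7, 6]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
propensity = [0.5] * 8
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

weights = iptw_weights(treated, propensity)
ate = weighted_ate(outcome, treated, weights)
cate = cate_by_group(outcome, treated, weights, group)
print(ate, cate)
```

With constant propensity scores the weights are uniform, so the estimates reduce to plain differences in group means; the weighting only changes the result when propensities differ across units.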