Application of causal forests to randomised controlled trial data to identify heterogeneous treatment effects: a case study

Bibliographic Details
Main Authors: Eleanor Van Vogt, Anthony C. Gordon, Karla Diaz-Ordaz, Suzie Cro
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Medical Research Methodology
Online Access:https://doi.org/10.1186/s12874-025-02489-2
Abstract

Background
Classical approaches to subgroup analysis in randomised controlled trials (RCTs) to identify heterogeneous treatment effects (HTEs) involve testing the interaction between treatment and each pre-specified potential treatment effect modifier. However, individually significant interactions do not always yield clinically actionable subgroups, particularly for continuous covariates. Non-parametric causal machine learning approaches are flexible alternatives for estimating HTEs across many potential treatment effect modifiers in a single analysis.

Methods
We conducted a secondary analysis of the VANISH RCT, which compared early use of vasopressin with norepinephrine on renal failure-free survival at 28 days in patients with septic shock. We used classical (separate tests for interaction with Bonferroni correction), data-adaptive (hierarchical lasso regression) and non-parametric causal machine learning (causal forest) methods to analyse HTEs for the primary outcome of being alive at 28 days. Causal forests comprise honest causal trees, which use sample splitting so that the tree splits and the treatment effects are estimated on separate subsamples. The modal initial (root) splits of the causal forest were extracted, and the mean split value was used as a threshold to partition the population into subgroups with different treatment effects.

Results
All three methods found evidence of HTE by serum potassium level: univariable logistic regression, OR 0.435 (95% CI [0.270, 0.683], p = 0.0004); hierarchical lasso logistic regression, standardised OR 0.604 (95% CI [0.259, 0.701]), λ = 0.0049. The hierarchical lasso retained the interactions between treatment and serum potassium, sodium level, minimum temperature, platelet count and presence of ischemic heart disease. The causal forest found some evidence of HTE (p = 0.124). Among the extracted root splits, the modal splitting variable was serum potassium (mean threshold 4.68 mmol/L). When the patient population was divided into subgroups at this mean root threshold, the risk differences in being alive at 28 days were 0.069 (95% CI [-0.032, 0.169]) for serum potassium ≤ 4.68 mmol/L and -0.257 (95% CI [-0.368, -0.146]) for serum potassium > 4.68 mmol/L.

Conclusions
The causal forest agreed with the data-adaptive and classical methods of subgroup analysis in identifying HTE by serum potassium. Whilst classical and data-adaptive methods may identify sources of HTE, they do not immediately suggest clinically actionable subgroup splits. Extracting root splits from causal forests is a novel approach to obtaining data-derived subgroups that warrants further investigation.
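The honest splitting idea in the Methods (one subsample chooses where to split, a separate subsample estimates the treatment effects) can be illustrated with a single-covariate, single-split "stump". This is a minimal sketch under stated assumptions, not the causal forest implementation used in the study: the 50/50 split, the simplified splitting criterion (maximise n_left * n_right * (rd_left - rd_right)^2, with no variance penalty), the synthetic data and all function names are hypothetical.

```python
import random


def risk_diff(rows):
    """Treated-minus-control difference in proportions for rows of (x, t, y).

    Returns None if either arm is empty.
    """
    y1 = [y for _, t, y in rows if t == 1]
    y0 = [y for _, t, y in rows if t == 0]
    if not y1 or not y0:
        return None
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def choose_threshold(rows, min_leaf=20):
    """Pick the cut c on x (left: x <= c) that most separates the subgroup effects,
    using the simplified criterion n_left * n_right * (rd_left - rd_right)**2.
    No variance penalty is applied (an assumption of this sketch)."""
    cuts = sorted({x for x, _, _ in rows})[:-1]  # drop the max so the right side is non-empty
    best_cut, best_score = None, float("-inf")
    for c in cuts:
        left = [r for r in rows if r[0] <= c]
        right = [r for r in rows if r[0] > c]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        rd_l, rd_r = risk_diff(left), risk_diff(right)
        if rd_l is None or rd_r is None:
            continue
        score = len(left) * len(right) * (rd_l - rd_r) ** 2
        if score > best_score:
            best_cut, best_score = c, score
    return best_cut


def honest_stump(rows, seed=0, min_leaf=20):
    """Honest one-split tree: half of the rows choose the cut, the other half
    estimate the subgroup effects. Returns (cut, rd_left, rd_right) or None."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    split_half, est_half = shuffled[:half], shuffled[half:]
    cut = choose_threshold(split_half, min_leaf)
    if cut is None:
        return None
    left = [r for r in est_half if r[0] <= cut]
    right = [r for r in est_half if r[0] > cut]
    return cut, risk_diff(left), risk_diff(right)


# Hypothetical synthetic data: control patients always survive (y = 1); treated
# patients survive only when x <= 5, so the true effect is 0 for x <= 5 and -1 above.
rows = []
for x in range(1, 11):
    for i in range(40):
        t = i % 2
        y = 1 if (t == 0 or x <= 5) else 0
        rows.append((x, t, y))

print(honest_stump(rows))
```

Because the cut is never evaluated on the rows used to estimate the subgroup effects, those estimates are not inflated by the search over cut points; a full causal forest repeats this idea over many subsampled trees, covariates and deeper splits.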
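The root-split extraction step described in the abstract (take the most common root-splitting variable across the trees of the forest, average its thresholds, then estimate the risk difference in each resulting subgroup) can be sketched as follows. This is a hedged illustration, not the authors' code: the input format (one `(variable, threshold)` pair per tree), the treated-minus-control sign convention, the Wald confidence interval and the function names are all assumptions; the study's exact interval method is not stated in this record.

```python
import math
from collections import Counter
from statistics import NormalDist, mean


def modal_root_split(root_splits):
    """Return (modal variable, mean threshold on that variable, number of trees).

    root_splits: hypothetical list of (variable_name, threshold) pairs,
    one per tree, e.g. as extracted from a fitted causal forest.
    """
    counts = Counter(var for var, _ in root_splits)
    modal_var, n_trees = counts.most_common(1)[0]
    threshold = mean(thr for var, thr in root_splits if var == modal_var)
    return modal_var, threshold, n_trees


def risk_difference(y, t, level=0.95):
    """Risk difference (treated minus control) with a Wald confidence interval.

    y: 0/1 outcomes (1 = alive at day 28); t: 0/1 treatment indicators.
    Both arms must be non-empty.
    """
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    p1, p0 = sum(y1) / len(y1), sum(y0) / len(y0)
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / len(y1) + p0 * (1 - p0) / len(y0))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return rd, (rd - z * se, rd + z * se)


def subgroup_risk_differences(x, y, t, threshold):
    """Split on x <= threshold vs x > threshold and estimate the RD in each subgroup."""
    low = [i for i, xi in enumerate(x) if xi <= threshold]
    high = [i for i, xi in enumerate(x) if xi > threshold]
    return {
        "<=": risk_difference([y[i] for i in low], [t[i] for i in low]),
        ">": risk_difference([y[i] for i in high], [t[i] for i in high]),
    }


# Usage with made-up inputs (not trial data).
splits = [("potassium", 4.6), ("potassium", 4.8), ("sodium", 138.0), ("potassium", 4.7)]
var, thr, n_trees = modal_root_split(splits)

x = [4.0] * 20 + [5.0] * 20                        # covariate values
t = ([1] * 10 + [0] * 10) * 2                      # treatment indicators
y = ([1] * 6 + [0] * 4) + ([1] * 5 + [0] * 5) \
    + ([1] * 2 + [0] * 8) + ([1] * 6 + [0] * 4)    # 0/1 outcomes
print(var, thr, n_trees, subgroup_risk_differences(x, y, t, thr))
```

Averaging the thresholds only over trees whose root split uses the modal variable keeps the cut on that variable's scale; mixing thresholds from different covariates would be meaningless.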

ISSN: 1471-2288
Affiliations: Eleanor Van Vogt, Anthony C. Gordon and Suzie Cro (Imperial College London); Karla Diaz-Ordaz (University College London)
Keywords: Causal machine learning; Subgroup analysis; Critical care; Treatment effect heterogeneity; Causal inference; Randomised controlled trial