Propensity Score Matching: should we use it in designing observational studies?
Main Author:
Format: Article
Language: English
Published: BMC, 2025-01-01
Series: BMC Medical Research Methodology
Subjects:
Online Access: https://doi.org/10.1186/s12874-025-02481-w
Summary: Abstract

Background: Propensity Score Matching (PSM) is a widely embraced method in comparative effectiveness research. PSM crafts matched datasets from observational data that mimic some attributes of randomized designs. In a valid PSM design where all baseline confounders are measured and matched, the confounders are balanced, allowing the treatment status to be considered as if it were randomly assigned. Nevertheless, recent research has unveiled a different facet of PSM, termed "the PSM paradox": as PSM approaches exact matching by progressively pruning matched sets in order of decreasing propensity score distance, it can paradoxically lead to greater covariate imbalance, heightened model dependence, and increased bias, contrary to its intended purpose.

Methods: We used analytic formulas, simulation, and the literature to demonstrate that this paradox stems from the misuse of metrics for assessing chance imbalance and bias.

Results: First, matched pairs typically exhibit different covariate values despite having identical propensity scores. This disparity, however, is a "chance" difference and averages to zero over a large number of matched pairs. Common distance metrics cannot capture this "chance" nature of covariate imbalance; instead, they reflect the increasing variability of chance imbalance as units are pruned and the sample size diminishes. Second, the statistical bias reported in earlier work was determined from the largest estimate among numerous fitted models, a procedure motivated by researchers' uncertainty over the correct model. This cherry-picking ignores the most significant benefit of a matching design: reduced model dependence, owing to its robustness against model misspecification bias.

Conclusions: We conclude that the PSM paradox is not a legitimate concern and should not stop researchers from using PSM designs.
ISSN: 1471-2288
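The abstract's first result, that matched pairs can share an identical propensity score while their covariate values differ only by chance, can be illustrated with a minimal simulation. The data-generating process below is a hypothetical assumption for illustration (the `x1`, `x2` covariates and the score depending on `x1 + x2` are not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 50_000

# Hypothetical setup: the logit of the propensity score depends only on
# x1 + x2, and each matched pair shares an identical score by construction.
logit_ps = rng.normal(size=n_pairs)
x1_treated = rng.normal(size=n_pairs)
x1_control = rng.normal(size=n_pairs)
x2_treated = logit_ps - x1_treated   # forces identical scores within a pair
x2_control = logit_ps - x1_control

# Within any single pair the covariate x1 typically differs ...
pair_diff = x1_treated - x1_control
print(f"mean |within-pair x1 difference|: {np.abs(pair_diff).mean():.3f}")

# ... but the difference is pure chance and averages out across pairs.
print(f"mean within-pair x1 difference:   {pair_diff.mean():.4f}")
```

The per-pair absolute difference stays well above zero, while the signed difference shrinks toward zero as pairs accumulate, which is the "chance imbalance" distinction the Results section draws.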