Model-X knockoffs in the replication crisis era: Reducing false discoveries and researcher bias in social science research

Bibliographic Details
Main Authors: Jing Zhou, Sebastian Scherr
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Social Sciences and Humanities Open
Online Access: http://www.sciencedirect.com/science/article/pii/S259029112500107X
Description
Summary: The present study addresses problems that data-driven social science faces when there is too much or too little data. In particular, an abundance of data, or a (sudden) lack of it, makes it challenging to identify the most important predictors in a sea of noise using the most parsimonious and reproducible model possible. In this article, we present the model-X knockoff method, which was introduced by Candès et al. (2018) for reducing the false identification of significant effects due to flexibility-ambiguity issues, to a broader audience, particularly within the social sciences and humanities. Our goal is to provide an accessible starting point and, ideally, to spark interest among researchers in these fields in exploring how model-X knockoffs can enhance their work. The findings from a performance contrast simulation indicate that model-X knockoffs select fewer variables as relevant than other automatic variable-selection methods, resulting in fewer mistakes. The simulation findings also demonstrate that model-X knockoffs are stable and less sensitive than other procedures to even small changes in the dataset, making them a viable way to reduce researcher degrees of freedom and increase the reproducibility of scientific findings. An additional real-data example demonstrates the operational utility of the simulation.
ISSN: 2590-2911
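
Illustration: The summary notes that model-X knockoffs select fewer variables and make fewer mistakes. A minimal sketch of the final selection step of a knockoff filter may help show why. It is not code from the article. It assumes that a feature statistic W_j has already been computed for each variable (a large positive W_j suggests the real variable beats its knockoff copy), and it applies the data-dependent threshold rule from the knockoff literature. The function names, the example values of W, and the target levels q are hypothetical.

```python
def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    offset=1 is the conservative "knockoff+" rule; offset=0 is more liberal.
    Returns infinity when no threshold qualifies (nothing gets selected).
    """
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Large negative statistics stand in for the number of false positives.
        est_false = offset + sum(1 for w in W if w <= -t)
        n_selected = max(1, sum(1 for w in W if w >= t))
        if est_false / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(W, q=0.1, offset=1):
    """Indices of the variables whose statistic reaches the threshold."""
    t = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical feature statistics for ten candidate predictors.
W = [4.1, 3.2, -0.3, 2.7, 0.4, -0.2, 1.9, 0.1, 3.5, 2.2]

lenient = knockoff_select(W, q=0.25)  # tolerate more false discoveries
strict = knockoff_select(W, q=0.10)   # tolerate fewer false discoveries
print(lenient, strict)
```

With a stricter target level q, the rule may return no variables at all rather than risk false discoveries, which matches the conservative behaviour the summary describes.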