Large language models for causal hypothesis generation in science

Towards the goal of understanding the causal structure underlying complex systems, such as the Earth, the climate, or the brain, integrating large language models (LLMs) with data-driven and domain-expertise-driven approaches has the potential to become a game-changer, especially in data and expertise...

Bibliographic Details
Main Authors:	Kai-Hendrik Cohrs, Emiliano Diaz, Vasileios Sitokonstantinou, Gherardo Varando, Gustau Camps-Valls
Format:	Article
Language:	English
Published:	IOP Publishing 2025-01-01
Series:	Machine Learning: Science and Technology
Subjects:	causality; large language models; hypothesis generation; science; causal discovery
Online Access:	https://doi.org/10.1088/2632-2153/ada47f

_version_	1832576049971462144
author	Kai-Hendrik Cohrs, Emiliano Diaz, Vasileios Sitokonstantinou, Gherardo Varando, Gustau Camps-Valls
author_sort	Kai-Hendrik Cohrs
collection	DOAJ
description	Towards the goal of understanding the causal structure underlying complex systems, such as the Earth, the climate, or the brain, integrating large language models (LLMs) with data-driven and domain-expertise-driven approaches has the potential to become a game-changer, especially in data- and expertise-limited scenarios. Debates persist around LLMs’ causal reasoning capacities. However, rather than engaging in philosophical debates, we propose integrating LLMs into a scientific framework for causal hypothesis generation alongside expert knowledge and data. Our goals include formalizing LLMs as probabilistic imperfect experts, developing adaptive methods for causal hypothesis generation, and establishing universal benchmarks for comprehensive comparisons. Specifically, we introduce a spectrum of integration methods for experts, LLMs, and data-driven approaches. We review existing approaches for causal hypothesis generation and classify them within this spectrum. As an example, our hybrid (LLM + data) causal discovery algorithm illustrates ways for deeper integration. Characterizing imperfect experts along dimensions such as (1) reliability, (2) consistency, (3) uncertainty, and (4) content vs. reasoning is emphasized for developing adaptable methods. Lastly, we stress the importance of model-agnostic benchmarks.
format	Article
id	doaj-art-6605f3b5eace4ea9802d52c10afa0e72
institution	Kabale University
issn	2632-2153
language	English
publishDate	2025-01-01
publisher	IOP Publishing
record_format	Article
series	Machine Learning: Science and Technology
spelling	doaj-art-6605f3b5eace4ea9802d52c10afa0e72; 2025-01-31T13:28:55Z; eng; IOP Publishing; Machine Learning: Science and Technology; 2632-2153; 2025-01-01; volume 6, issue 1, article 013001; 10.1088/2632-2153/ada47f; Large language models for causal hypothesis generation in science; Kai-Hendrik Cohrs (https://orcid.org/0000-0002-2286-7487); Emiliano Diaz (https://orcid.org/0000-0001-8410-6635); Vasileios Sitokonstantinou (https://orcid.org/0000-0001-5506-2872); Gherardo Varando (https://orcid.org/0000-0002-6708-1103); Gustau Camps-Valls (https://orcid.org/0000-0003-1683-2138); all authors: Image Processing Laboratory (IPL), Universitat de València, València, Spain; https://doi.org/10.1088/2632-2153/ada47f; causality; large language models; hypothesis generation; science; causal discovery
title	Large language models for causal hypothesis generation in science
topic	causality; large language models; hypothesis generation; science; causal discovery
url	https://doi.org/10.1088/2632-2153/ada47f

Similar Items