SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering.

Bibliographic Details
Main Authors: Mario Grassi, Barbara Tarantino
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0317283
author Mario Grassi
Barbara Tarantino
collection DOAJ
description A Directed Acyclic Graph (DAG) offers a simple way to define causal structures among a set of nodes: causal links are represented by arrows between variables, pointing from cause to effect. Recently, industry and academia have paid close attention to DAG structure learning from observable data, and many techniques have been proposed to address the problem. We provide a two-step approach, named SEMdag(), that can be used to quickly learn high-dimensional linear SEMs. It is included in the R package SEMgraph and employs a two-stage order-based search using either prior knowledge (Knowledge-based, KB) or a data-driven method (Bottom-up, BU), under the assumption of a linear SEM with equal-variance error terms. We evaluated our framework's ability to find plausible DAGs against six well-known causal discovery techniques (ARGES, GES, PC, LiNGAM, CAM, NOTEARS). We conducted a series of experiments on observed expression (RNA-seq) data, using a pair of training and testing datasets for four distinct diseases: Amyotrophic Lateral Sclerosis (ALS), Breast cancer (BRCA), Coronavirus disease (COVID-19) and ST-elevation myocardial infarction (STEMI). The results show that the SEMdag() procedure can recover a graph structure with good disease prediction performance, as evaluated by a conventional supervised learning algorithm (RF): when the initial graph is sparse, the BU approach may be a better choice than the KB one; when the graph is denser, both BU and KB report high performance, with the highest score for the KB approach based on topological layers. Besides its superior disease predictive performance compared to previous research, SEMdag() offers the user the flexibility to define distinct structure learning algorithms and can handle high-dimensional problems with a lower computational load. The SEMdag() function is implemented in the R package SEMgraph, available at https://CRAN.R-project.org/package=SEMgraph.
format Article
id doaj-art-7cb1c6dfdbe141aca466db2eeaf64078
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-7cb1c6dfdbe141aca466db2eeaf64078 | 2025-01-17T05:31:34Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 1 | e0317283 | 10.1371/journal.pone.0317283 | SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering. | Mario Grassi | Barbara Tarantino | https://doi.org/10.1371/journal.pone.0317283
title SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering.
url https://doi.org/10.1371/journal.pone.0317283