BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature

Bibliographic Details
Main Authors: Henning Schäfer, Ahmad Idrissi-Yaghir, Kamyar Arzideh, Hendrik Damm, Tabea M.G. Pakull, Cynthia S. Schmidt, Mikel Bahn, Georg Lodde, Elisabeth Livingstone, Dirk Schadendorf, Felix Nensa, Peter A. Horn, Christoph M. Friedrich
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series:Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037024003386
Collection: DOAJ
Description:
Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.
Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.
Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.
Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
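The Methods paragraph says extracted concepts are weighted and re-ranked using Kullback-Leibler divergence, but the abstract gives no formula. The sketch below assumes a common pointwise KL weighting, p_fg(c) · log(p_fg(c) / p_bg(c)), which compares a condition-specific concept distribution against a background corpus. The function names, the `smoothing` parameter and the concept counts are hypothetical, and the "local frequency balancing" step is not modeled.

```python
import math
from collections import Counter
from typing import Dict, List


def kl_term_weights(foreground: Counter, background: Counter,
                    smoothing: float = 1.0) -> Dict[str, float]:
    """Score each foreground concept by its pointwise KL contribution.

    weight(c) = p_fg(c) * log(p_fg(c) / p_bg(c))

    foreground: concept counts from the condition-specific publication set.
    background: concept counts from a general reference corpus.
    Additive (Laplace-style) smoothing keeps both probabilities non-zero.
    """
    vocab = set(foreground) | set(background)
    fg_total = sum(foreground.values()) + smoothing * len(vocab)
    bg_total = sum(background.values()) + smoothing * len(vocab)
    weights = {}
    for concept in foreground:
        p_fg = (foreground[concept] + smoothing) / fg_total
        # Counter returns 0 for concepts missing from the background corpus.
        p_bg = (background[concept] + smoothing) / bg_total
        weights[concept] = p_fg * math.log(p_fg / p_bg)
    return weights


def rerank(foreground: Counter, background: Counter, top_k: int = 10) -> List[str]:
    """Return the top_k foreground concepts by descending KL weight."""
    weights = kl_term_weights(foreground, background)
    return sorted(weights, key=weights.get, reverse=True)[:top_k]


# Hypothetical concept counts, for illustration only.
foreground_counts = Counter({"melanoma": 40, "neoplasm": 30, "patient": 25})
background_counts = Counter({"melanoma": 5, "neoplasm": 60, "patient": 200})
print(rerank(foreground_counts, background_counts, top_k=2))  # ['melanoma', 'neoplasm']
```

Concepts that are common everywhere (here "patient") get negative weights and fall out of the ranking, while concepts over-represented in the condition-specific set rise to the top.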
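The Methods paragraph also says the re-ranked concepts are integrated into hierarchical KGs with relationships drawn from terminologies such as SNOMED CT and NCIt. One plausible reading, sketched here under that assumption, is to keep only the selected concepts and connect each one to its nearest selected ancestor along is-a links, climbing past intermediate concepts that were not selected. The `is_a` map, the function name and the concept labels are toy stand-ins, not terminology content or BioKGrapher code.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_hierarchy(concepts: Set[str],
                    is_a: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Connect each selected concept to its nearest selected ancestors.

    concepts: concepts retained after re-ranking.
    is_a: child -> parents map; a toy stand-in for is-a relations
          taken from a terminology such as SNOMED CT or NCIt.
    Returns a parent -> sorted children map restricted to `concepts`.
    """
    children = defaultdict(set)
    for concept in concepts:
        stack = list(is_a.get(concept, []))
        seen = set()
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue  # guards against cycles and shared ancestors
            seen.add(parent)
            if parent in concepts:
                # Nearest retained ancestor on this path: add an edge, stop here.
                children[parent].add(concept)
            else:
                # Unselected intermediate concept: keep climbing past it.
                stack.extend(is_a.get(parent, []))
    return {parent: sorted(kids) for parent, kids in children.items()}


# Toy is-a links, for illustration only.
toy_is_a = {
    "cutaneous melanoma": ["melanoma"],
    "melanoma": ["melanocytic neoplasm"],
    "melanocytic neoplasm": ["neoplasm"],
    "nivolumab": ["antineoplastic agent"],
}
selected = {"cutaneous melanoma", "melanoma", "neoplasm", "nivolumab"}
print(sorted(build_hierarchy(selected, toy_is_a).items()))
# [('melanoma', ['cutaneous melanoma']), ('neoplasm', ['melanoma'])]
```

Here "nivolumab" stays unattached because none of its ancestors were selected; non-hierarchical relations such as drug-disease links would need a separate step that this sketch leaves out.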
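The Results paragraph reports micro F1-Scores for multi-label classification. A minimal sketch of the standard micro-averaged F1 follows; the function name and labels are hypothetical, and this is not the evaluation code used in the study.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification.

    gold, predicted: one label set per document, aligned by position.
    True positives, false positives and false negatives are pooled over
    all documents and labels before precision and recall are computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for two documents, for illustration only.
gold = [{"melanoma", "immunotherapy"}, {"lymphoma"}]
predicted = [{"melanoma"}, {"lymphoma", "immunotherapy"}]
print(round(micro_f1(gold, predicted), 3))  # 0.667
```

On this 0 to 1 scale, the reported gain of 0.89 percentage points corresponds to 0.0089. Calling the function on a single pair of concept sets gives a plain set-overlap F1, which is one possible way to score alignment against a guideline, though the abstract does not state how the guideline F1-Scores were computed.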
ISSN: 2001-0370
Volume: 24
Pages: 639-660
DOI: 10.1016/j.csbj.2024.10.017
Affiliations:
Henning Schäfer: Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany
Ahmad Idrissi-Yaghir: Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Kamyar Arzideh: Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany
Hendrik Damm: Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Tabea M.G. Pakull: Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany
Cynthia S. Schmidt: Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany
Mikel Bahn: Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany
Georg Lodde: Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Elisabeth Livingstone: Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Dirk Schadendorf: Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Felix Nensa: Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany; Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Peter A. Horn: Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
Christoph M. Friedrich: Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Corresponding author at: Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany.
Keywords: Knowledge graph; Named entity recognition; Entity linking; Clinical guidelines; Software