BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature
Background The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the n...
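| Main Authors: | Henning Schäfer, Ahmad Idrissi-Yaghir, Kamyar Arzideh, Hendrik Damm, Tabea M.G. Pakull, Cynthia S. Schmidt, Mikel Bahn, Georg Lodde, Elisabeth Livingstone, Dirk Schadendorf, Felix Nensa, Peter A. Horn, Christoph M. Friedrich |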
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-12-01 |
| Series: | Computational and Structural Biotechnology Journal |
| Subjects: | Knowledge graph; Named entity recognition; Entity linking; Clinical guidelines; Software |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037024003386 |
| _version_ | 1850250575394897920 |
|---|---|
| author | Henning Schäfer; Ahmad Idrissi-Yaghir; Kamyar Arzideh; Hendrik Damm; Tabea M.G. Pakull; Cynthia S. Schmidt; Mikel Bahn; Georg Lodde; Elisabeth Livingstone; Dirk Schadendorf; Felix Nensa; Peter A. Horn; Christoph M. Friedrich |
| author_sort | Henning Schäfer |
| collection | DOAJ |
| description | Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs. Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models. Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab. Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing. |
| format | Article |
| id | doaj-art-bb5ff474e9dc4345913fc469fb7b07e5 |
| institution | OA Journals |
| issn | 2001-0370 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Computational and Structural Biotechnology Journal |
| spelling | doaj-art-bb5ff474e9dc4345913fc469fb7b07e5 2025-08-20T01:58:09Z eng Elsevier Computational and Structural Biotechnology Journal 2001-0370 2024-12-01 24 639 660 10.1016/j.csbj.2024.10.017 BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature. Henning Schäfer (0), Ahmad Idrissi-Yaghir (1), Kamyar Arzideh (2), Hendrik Damm (3), Tabea M.G. Pakull (4), Cynthia S. Schmidt (5), Mikel Bahn (6), Georg Lodde (7), Elisabeth Livingstone (8), Dirk Schadendorf (9), Felix Nensa (10), Peter A. Horn (11), Christoph M. Friedrich (12). (0) Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany. (1) Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (2) Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany. (3) Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (4) Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany. (5) Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany. (6) Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany. (7) Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (8) Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (9) Department of Dermatology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (10) Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany; Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (11) Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. (12) Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany; Corresponding author at: Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, 44227, Germany. Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs. Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models. Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab. Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing. http://www.sciencedirect.com/science/article/pii/S2001037024003386 Knowledge graph; Named entity recognition; Entity linking; Clinical guidelines; Software |
| spellingShingle | Henning Schäfer Ahmad Idrissi-Yaghir Kamyar Arzideh Hendrik Damm Tabea M.G. Pakull Cynthia S. Schmidt Mikel Bahn Georg Lodde Elisabeth Livingstone Dirk Schadendorf Felix Nensa Peter A. Horn Christoph M. Friedrich BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature Computational and Structural Biotechnology Journal Knowledge graph Named entity recognition Entity linking Clinical guidelines Software |
| title | BioKGrapher: Initial evaluation of automated knowledge graph construction from biomedical literature |
| title_sort | biokgrapher initial evaluation of automated knowledge graph construction from biomedical literature |
| topic | Knowledge graph; Named entity recognition; Entity linking; Clinical guidelines; Software |
| url | http://www.sciencedirect.com/science/article/pii/S2001037024003386 |
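
The description in this record states that extracted concepts are weighted and re-ranked using Kullback-Leibler divergence, but it does not give the formula. The following is a minimal sketch of one common way to do this, not the authors' implementation: each concept is scored by its pointwise contribution to KL(domain || background), so concepts that are over-represented in a condition-specific corpus rank first. The function names, the add-one smoothing, and the toy concept labels are all assumptions made for illustration; the "local frequency balancing" step mentioned in the description is not modeled.

```python
import math
from collections import Counter


def kl_concept_weights(domain_concepts, background_concepts, smoothing=1.0):
    """Score concepts by their pointwise contribution to KL(domain || background).

    Both arguments are lists of concept labels, one entry per mention.
    Additive smoothing (an assumption of this sketch) avoids division by zero
    for concepts that occur in only one of the two corpora.
    """
    domain = Counter(domain_concepts)
    background = Counter(background_concepts)
    vocab = set(domain) | set(background)

    domain_total = sum(domain.values()) + smoothing * len(vocab)
    background_total = sum(background.values()) + smoothing * len(vocab)

    weights = {}
    for concept in vocab:
        p = (domain[concept] + smoothing) / domain_total
        q = (background[concept] + smoothing) / background_total
        # Positive when the concept is over-represented in the domain corpus.
        weights[concept] = p * math.log(p / q)
    return weights


def rank_concepts(weights):
    """Return concepts sorted by descending weight, ties broken alphabetically."""
    return sorted(weights, key=lambda c: (-weights[c], c))


# Toy mention lists with hypothetical labels (not real UMLS identifiers).
domain_mentions = ["melanoma"] * 6 + ["nivolumab"] * 3 + ["patient"] * 3
background_mentions = ["patient"] * 8 + ["melanoma"] + ["nivolumab"]

weights = kl_concept_weights(domain_mentions, background_mentions)
ranking = rank_concepts(weights)
```

In this toy input the generic label `patient` is more frequent in the background corpus than in the domain corpus, so it receives a negative weight and ranks last, while the condition-specific labels rank ahead of it. The sum of all weights equals the KL divergence between the two smoothed distributions and is therefore non-negative.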