Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease

Abstract Background In Literature-based Discovery (LBD), Swanson’s original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a s...
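
The ABC model named in the abstract can be illustrated with a minimal sketch. It assumes knowledge is held as unordered co-occurrence pairs. The function name `abc_candidates` and the entity labels are hypothetical and are not taken from the article or from any LBD system.

```python
from collections import defaultdict


def abc_candidates(pairs):
    """Return {(a, c): [bridging b terms]} for A-C pairs with no direct link.

    `pairs` is an iterable of unordered (x, y) co-occurrence pairs.
    """
    neighbors = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)

    candidates = defaultdict(set)
    for b, linked in neighbors.items():
        for a in linked:
            for c in linked:
                # Keep each unordered pair once, and only if A and C
                # are not already directly connected.
                if a < c and c not in neighbors[a]:
                    candidates[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in candidates.items()}


# Hypothetical toy statements: A co-occurs with B, and B co-occurs with C.
known_pairs = [("A", "B"), ("B", "C")]
print(abc_candidates(known_pairs))
```

The sketch only chains pairwise links, which is the kind of representation whose limitations the article examines.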

Bibliographic Details
Main Authors: Yiyuan Pu, Daniel Beck, Karin Verspoor
Format: Article
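Language: English
Published: BMC 2025-03-01
Series: Journal of Biomedical Semantics
Subjects: Knowledge representation; Literature-based Discovery; Knowledge graph; Swanson’s ABC model; Link prediction; Alzheimer’s Disease
Online Access: https://doi.org/10.1186/s13326-025-00328-3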
_version_ 1849391932213559296
author Yiyuan Pu
Daniel Beck
Karin Verspoor
collection DOAJ
description Abstract
Background: In Literature-based Discovery (LBD), Swanson’s original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a simple entity-based knowledge graph with co-occurrences and/or semantic triples as basic building blocks. However, our analysis of a knowledge graph constructed for a recent LBD system reveals limitations arising from such pairwise representations, which further negatively impact knowledge inference. Using LBD as the context and motivation in this work, we explore limitations of using pairwise relationships only as knowledge representation in knowledge graphs, and we identify impacts of these limitations on knowledge inference. We argue that enhanced knowledge representation is beneficial for biological knowledge representation in general, as well as for both the quality and the specificity of hypotheses proposed with LBD.
Results: Based on a systematic analysis of one co-occurrence-based LBD system focusing on Alzheimer’s Disease, we identify 7 types of limitations arising from the exclusive use of pairwise relationships in a standard knowledge graph (including the need to capture more than two entities interacting together in a single event) and 3 types of negative impacts on knowledge inferred with the graph: Experimentally infeasible hypotheses, Literature-inconsistent hypotheses, and Oversimplified hypotheses explanations. We also present an indicative distribution of different types of relationships. Pairwise relationships are an essential component in representation frameworks for knowledge discovery. However, only 20% of discoveries are perfectly represented with pairwise relationships alone. 73% require a combination of pairwise relationships and nested relationships. The remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs.
Conclusion: We argue that the standard entity pair-based knowledge graph, while essential for representing basic binary relations, results in important limitations for comprehensive biological knowledge representation and impacts downstream tasks such as proposing meaningful discoveries in LBD. These limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing for nested entities. The use of more sophisticated knowledge representation will benefit biological fields with more expressive knowledge graphs. Downstream tasks, such as LBD, can benefit from richer representations as well, allowing for generation of implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful.
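
The contrast the abstract draws between pairwise relationships, nested relationships, and hypergraphs can be made concrete with a small sketch. Everything here is an illustrative assumption: the `Relation` class, the helper names, and the placeholder entities are hypothetical and do not come from the article. The sketch shows that one three-entity event and three separate two-entity events collapse to the same set of pairs once flattened, and that a nested relation can take another relation as an argument.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Union


@dataclass(frozen=True)
class Relation:
    """A relation whose arguments may be entities or other relations."""
    predicate: str
    subject: Union[str, "Relation"]
    obj: Union[str, "Relation"]


def pairwise_projection(events):
    """Flatten events (sets of entities) into unordered entity pairs."""
    return {frozenset(p) for e in events for p in combinations(sorted(e), 2)}


def entities(node):
    """Collect entity names mentioned anywhere inside a nested relation."""
    if isinstance(node, str):
        return {node}
    return entities(node.subject) | entities(node.obj)


# One hyperedge: three placeholder entities interacting in a single event.
joint_event = [frozenset({"entity_1", "entity_2", "entity_3"})]
# Three independent pairwise events over the same placeholder entities.
separate_events = [
    frozenset({"entity_1", "entity_2"}),
    frozenset({"entity_2", "entity_3"}),
    frozenset({"entity_1", "entity_3"}),
]

# After flattening to pairs, the two situations are indistinguishable.
same_pairs = pairwise_projection(joint_event) == pairwise_projection(separate_events)
print(same_pairs)

# A nested relation: the object of "inhibits" is itself a relation.
nested = Relation("inhibits", "entity_3", Relation("binds", "entity_1", "entity_2"))
print(sorted(entities(nested)))
```

A pair-only graph would store just the flattened pairs, so the distinction between the joint event and the separate events, and the inner structure of the nested relation, would both be lost.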
format Article
id doaj-art-4cb744777e3942eeab6aabdd498988c5
institution Kabale University
issn 2041-1480
language English
publishDate 2025-03-01
publisher BMC
record_format Article
series Journal of Biomedical Semantics
spelling doaj-art-4cb744777e3942eeab6aabdd498988c5
2025-08-20T03:40:53Z
eng
BMC
Journal of Biomedical Semantics
2041-1480
2025-03-01
16
1
1
22
10.1186/s13326-025-00328-3
Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease
Yiyuan Pu 0
Daniel Beck 1
Karin Verspoor 2
School of Computing and Information Systems, The University of Melbourne
School of Computing Technologies, RMIT University
School of Computing and Information Systems, The University of Melbourne
title Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease
topic Knowledge representation
Literature-based Discovery
Knowledge graph
Swanson’s ABC model
Link prediction
Alzheimer’s Disease
url https://doi.org/10.1186/s13326-025-00328-3