Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease
Abstract. Background: In Literature-based Discovery (LBD), Swanson’s original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a s...
| Main Authors: | Yiyuan Pu, Daniel Beck, Karin Verspoor |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-03-01 |
| Series: | Journal of Biomedical Semantics |
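| Subjects: | Knowledge representation; Literature-based Discovery; Knowledge graph; Swanson’s ABC model; Link prediction; Alzheimer’s Disease |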
| Online Access: | https://doi.org/10.1186/s13326-025-00328-3 |
| _version_ | 1849391932213559296 |
|---|---|
| author | Yiyuan Pu, Daniel Beck, Karin Verspoor |
| author_sort | Yiyuan Pu |
| collection | DOAJ |
| description | Abstract. Background: In Literature-based Discovery (LBD), Swanson’s original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a simple entity-based knowledge graph with co-occurrences and/or semantic triples as basic building blocks. However, our analysis of a knowledge graph constructed for a recent LBD system reveals limitations arising from such pairwise representations, which further negatively impact knowledge inference. Using LBD as the context and motivation in this work, we explore limitations of using pairwise relationships only as knowledge representation in knowledge graphs, and we identify impacts of these limitations on knowledge inference. We argue that enhanced knowledge representation is beneficial for biological knowledge representation in general, as well as for both the quality and the specificity of hypotheses proposed with LBD. Results: Based on a systematic analysis of one co-occurrence-based LBD system focusing on Alzheimer’s Disease, we identify 7 types of limitations arising from the exclusive use of pairwise relationships in a standard knowledge graph (including the need to capture more than two entities interacting together in a single event) and 3 types of negative impacts on knowledge inferred with the graph: Experimentally infeasible hypotheses, Literature-inconsistent hypotheses, and Oversimplified hypotheses explanations. We also present an indicative distribution of different types of relationships. Pairwise relationships are an essential component in representation frameworks for knowledge discovery. However, only 20% of discoveries are perfectly represented with pairwise relationships alone. 73% require a combination of pairwise relationships and nested relationships. The remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs. Conclusion: We argue that the standard entity pair-based knowledge graph, while essential for representing basic binary relations, results in important limitations for comprehensive biological knowledge representation and impacts downstream tasks such as proposing meaningful discoveries in LBD. These limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing for nested entities. The use of more sophisticated knowledge representation will benefit biological fields with more expressive knowledge graphs. Downstream tasks, such as LBD, can benefit from richer representations as well, allowing for generation of implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful. |
| format | Article |
| id | doaj-art-4cb744777e3942eeab6aabdd498988c5 |
| institution | Kabale University |
| issn | 2041-1480 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Biomedical Semantics |
| spelling | doaj-art-4cb744777e3942eeab6aabdd498988c5; 2025-08-20T03:40:53Z; eng; BMC; Journal of Biomedical Semantics; 2041-1480; 2025-03-01; 16; 1; 1; 22; 10.1186/s13326-025-00328-3; Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease; Yiyuan Pu (0), Daniel Beck (1), Karin Verspoor (2); (0) School of Computing and Information Systems, The University of Melbourne; (1) School of Computing Technologies, RMIT University; (2) School of Computing and Information Systems, The University of Melbourne; https://doi.org/10.1186/s13326-025-00328-3; Knowledge representation; Literature-based Discovery; Knowledge graph; Swanson’s ABC model; Link prediction; Alzheimer’s Disease |
| title | Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer’s disease |
| topic | Knowledge representation; Literature-based Discovery; Knowledge graph; Swanson’s ABC model; Link prediction; Alzheimer’s Disease |
| url | https://doi.org/10.1186/s13326-025-00328-3 |
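
The abstract contrasts pairwise knowledge graphs with richer representations that keep multi-entity events intact. The following is a minimal Python sketch of that contrast, not the system analysed in the article: the entity names (`A`, `B`, `C`, `X`, `Y`, `Z`) are placeholders, and the helpers `pairwise_graph` and `abc_candidates` are hypothetical names introduced here for illustration. It shows a Swanson-style ABC inference over a co-occurrence graph, and then shows that flattening one three-entity event into pairs yields the same graph as three separate binary statements, so the collective interaction can no longer be recovered.

```python
from collections import defaultdict
from itertools import combinations

# Toy statements (hypothetical, for illustration only).
# Each frozenset is one event that mentions the listed entities together.
events = [
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"X", "Y", "Z"}),  # three entities interacting in a single event
]


def pairwise_graph(events):
    """Flatten every event into undirected pairwise (co-occurrence) edges."""
    adj = defaultdict(set)
    for ev in events:
        for u, v in combinations(sorted(ev), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def abc_candidates(adj, a):
    """ABC-style inference: map each C not directly linked to A to its bridging B terms."""
    out = defaultdict(set)
    for b in adj[a]:
        for c in adj[b]:
            if c != a and c not in adj[a]:
                out[c].add(b)
    return dict(out)


adj = pairwise_graph(events)
hypotheses = abc_candidates(adj, "A")  # A-B and B-C suggest a candidate A-C link

# Pairwise flattening cannot tell one three-way event from three binary ones.
three_binary = [frozenset({"X", "Y"}), frozenset({"Y", "Z"}), frozenset({"X", "Z"})]
same_after_flattening = pairwise_graph([events[2]]) == pairwise_graph(three_binary)

# Keeping the event as a hyperedge preserves the distinction.
hyperedges = [ev for ev in events if len(ev) > 2]
```

Storing events as sets and deriving pairwise edges only when needed is one simple way to keep both views available; a production knowledge graph would also need typed relations and nested entities, which this sketch leaves out.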