Osteosarcoma knowledge graph question answering system: deep learning-based knowledge graph and large language model fusion

Objective: Osteosarcoma is a prevalent primary malignant bone tumor in children and adolescents, accounting for approximately 5 % of childhood malignancies. Because of its rarity and biological complexity, treatment breakthroughs for osteosarcoma have been limited. To advance research in this field,...

Bibliographic Details
Main Authors: Lulu Zhang, Weisong Zhao, Zhiwei Cheng, Yafei Jiang, Kai Tian, Jia Shi, Zhenyu Jiang, Yingqi Hua
Format: Article
Language: English
Published: Elsevier 2025-05-01
Series: Intelligent Medicine
Subjects: Osteosarcoma, Knowledge graph, Large language model, Text mining
Online Access: http://www.sciencedirect.com/science/article/pii/S2667102625000269
author Lulu Zhang
Weisong Zhao
Zhiwei Cheng
Yafei Jiang
Kai Tian
Jia Shi
Zhenyu Jiang
Yingqi Hua
collection DOAJ
description Objective: Osteosarcoma is a prevalent primary malignant bone tumor in children and adolescents, accounting for approximately 5 % of childhood malignancies. Because of its rarity and biological complexity, treatment breakthroughs for osteosarcoma have been limited. To advance research in this field, we aimed to construct the first comprehensive osteosarcoma knowledge graph (OSKG) from the PubMed database. Methods: A systematic search of PubMed (2003–2023) using the keyword “osteosarcoma” yielded 25,415 abstracts. Using BioBERT, pretrained on biomedical corpora and fine-tuned with osteosarcoma-specific manual annotations, we identified 16 entity types and 17 biological relationships. The extracted entities and relationships were assembled into the OSKG, a deep learning-based knowledge base for exploring osteosarcoma pathogenesis and molecular mechanisms. We then developed a specialized knowledge graph question answering (KGQA) system powered by ChatGLM3. This system employs natural language processing and incorporates the OSKG to ensure response quality and accuracy. Results: The pretrained BioBERT model averaged > 92 % accuracy in entity and relationship training. Evaluation on 100 gold-standard question-answer pairs showed that the final question-answering system outperformed other large language models in accuracy and robustness. Conclusion: The system is designed to provide accurate answers to disease-related queries, facilitating knowledge acquisition and reasoning in medical research and clinical practice. This project offers a robust tool for osteosarcoma research and promotes the deep integration of knowledge graphs and artificial intelligence technologies in the medical field.
format Article
id doaj-art-3aa50d87b7b04f36bc66716d20d8e810
institution OA Journals
issn 2667-1026
language English
publishDate 2025-05-01
publisher Elsevier
record_format Article
series Intelligent Medicine
spelling doaj-art-3aa50d87b7b04f36bc66716d20d8e810
2025-08-20T02:34:33Z
eng
Elsevier
Intelligent Medicine
2667-1026
2025-05-01
Volume 5, Issue 2, pp. 99-110
10.1016/j.imed.2024.12.001
Osteosarcoma knowledge graph question answering system: deep learning-based knowledge graph and large language model fusion
Lulu Zhang: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Weisong Zhao: Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Zhiwei Cheng: Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Yafei Jiang: Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Kai Tian: Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Jia Shi: Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
Zhenyu Jiang: School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Yingqi Hua (corresponding author): Department of Orthopedic Oncology, Shanghai Bone Tumor Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China
http://www.sciencedirect.com/science/article/pii/S2667102625000269
Osteosarcoma
Knowledge graph
Large language model
Text mining
title Osteosarcoma knowledge graph question answering system: deep learning-based knowledge graph and large language model fusion
topic Osteosarcoma
Knowledge graph
Large language model
Text mining
url http://www.sciencedirect.com/science/article/pii/S2667102625000269