Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs
Abstract Background The RIKEN BRC develops and maintains the RIKEN BioResource MetaDatabase to help users explore appropriate target bioresources for their experiments and prepare precise and high-quality data infrastructures. The Swiss Institute of Bioinformatics develops two databases across multi...
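| Main Authors: | Tatsuya Kushida, Tarcisio Mendes de Farias, Ana C. Sima, Christophe Dessimoz, Hirokazu Chiba, Frederic B. Bastian, Hiroshi Masuya |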
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | BMC Medical Informatics and Decision Making |
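| Subjects: | Database integration; Gene-disease association; Gene expression; Knowledge graph; Model organism; Ontology |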
| Online Access: | https://doi.org/10.1186/s12911-025-03013-8 |
| _version_ | 1850273452906250240 |
|---|---|
| author | Tatsuya Kushida Tarcisio Mendes de Farias Ana C. Sima Christophe Dessimoz Hirokazu Chiba Frederic B. Bastian Hiroshi Masuya |
| collection | DOAJ |
| description | Abstract Background The RIKEN BRC develops and maintains the RIKEN BioResource MetaDatabase to help users explore appropriate target bioresources for their experiments and prepare precise, high-quality data infrastructures. The Swiss Institute of Bioinformatics (SIB) develops two multi-species databases for the study of gene expression and orthology: Bgee and Orthologous MAtrix (OMA). Methods This study combines the RIKEN BioResource data with Resource Description Framework (RDF) datasets from Bgee (a gene expression database), OMA (an orthology database), DisGeNET (human gene-disease associations), Mouse Genome Informatics (MGI), UniProt, and four disease ontologies in the RIKEN BioResource MetaDatabase. Our aim is to evaluate distributed SPARQL query performance when exploring, across these interoperable datasets, which model organisms are most appropriate for specific medical science research applications. More precisely, in our biomedical use cases, we investigate disease-related genes and the anatomical parts where these genes are expressed, and subsequently identify appropriate bioresource candidates available for specific disease research applications. Results We illustrate the above through two use cases, targeting Alzheimer’s disease and melanoma. We identified 14 Alzheimer’s disease-related genes that were expressed in the prefrontal cortex (e.g., APP and APOE) and 55 RIKEN bioresources, genetically modified mice related to these genes, that are predicted to be relevant to Alzheimer’s disease research. Furthermore, by executing a transitive search over the Uberon terms with the Property Paths function, we identified 14 melanoma-related genes (e.g., HRAS and PTEN) and 12 anatomical parts in which these genes were expressed, such as the “skin of limb”. Finally, we compared the performance of the federated SPARQL query via the remote Bgee SPARQL endpoint with the performance of a centralized SPARQL query using the Bgee dataset as part of the RIKEN BioResource MetaDatabase. Conclusions We confirmed that query performance degraded under the federated approach. We reduced this degradation for federated queries from the BioResource MetaDatabase to the SIB by refining the transferred data through a subquery and by enhancing the server specifications, thereby optimizing the triple store query evaluation. |
| format | Article |
| id | doaj-art-cfee2c8ffa1e4b5db6df712db840b986 |
| institution | OA Journals |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| spelling | doaj-art-cfee2c8ffa1e4b5db6df712db840b986; 2025-08-20T01:51:28Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-05-01; 25S1111; 10.1186/s12911-025-03013-8; author affiliations, in author order: Tatsuya Kushida (BioResource Research Center, RIKEN), Tarcisio Mendes de Farias (SIB Swiss Institute of Bioinformatics), Ana C. Sima (SIB Swiss Institute of Bioinformatics), Christophe Dessimoz (SIB Swiss Institute of Bioinformatics), Hirokazu Chiba (Database Center for Life Science, DS, ROIS), Frederic B. Bastian (SIB Swiss Institute of Bioinformatics), Hiroshi Masuya (BioResource Research Center, RIKEN) |
| title | Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs |
| topic | Database integration Gene-disease association Gene expression Knowledge graph Model organism Ontology |
| url | https://doi.org/10.1186/s12911-025-03013-8 |
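
The abstract mentions three SPARQL 1.1 features: a federated query to a remote endpoint, a subquery that refines the data before it is transferred, and Property Paths for a transitive search. The following is a minimal sketch of how such a query might be assembled, not the authors' actual query. The `ex:` predicates, the function name `build_federated_query`, and all URLs in the usage note are hypothetical placeholders; only the `SERVICE` keyword, the nested `SELECT`, and the `rdfs:subClassOf*` path syntax come from the SPARQL 1.1 standard.

```python
from string import Template

# Hypothetical vocabulary: the ex: predicates are placeholders for
# illustration, not the actual Bgee, DisGeNET, or RIKEN schema terms.
QUERY_TEMPLATE = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/vocab#>

SELECT DISTINCT ?gene ?anatomy
WHERE {
  # Subquery: narrow the local gene bindings before the remote part.
  {
    SELECT DISTINCT ?gene
    WHERE {
      ?gene ex:associatedWithDisease ?disease .
      ?disease rdfs:label "$disease_label" .
    }
    LIMIT $max_genes
  }
  # Federated part, evaluated at the remote endpoint.
  SERVICE <$service_url> {
    ?gene ex:expressedIn ?anatomy .
    # Property path: zero or more subClassOf steps (transitive search).
    ?anatomy rdfs:subClassOf* <$anatomy_root> .
  }
}
""")


def build_federated_query(service_url, disease_label, anatomy_root, max_genes=100):
    """Return a federated SPARQL query string (no network access is made)."""
    if max_genes <= 0:
        raise ValueError("max_genes must be positive")
    # Escape backslashes and double quotes so the label stays a valid literal.
    safe_label = disease_label.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.substitute(
        service_url=service_url,
        disease_label=safe_label,
        anatomy_root=anatomy_root,
        max_genes=max_genes,
    )
```

For example, `build_federated_query("https://example.org/sparql", "melanoma", "http://example.org/anatomy/skin", max_genes=10)` returns a query whose subquery is limited to 10 genes. How much the subquery actually reduces the transferred data depends on how the triple store evaluates `SERVICE` patterns.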