Graph Representation Learning for the Prediction of Medication Usage in the UK Biobank Based on Pharmacogenetic Variants

Bibliographic Details
Main Authors: Bill Qi, Yannis J. Trakadis
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Bioengineering
Subjects: pharmacogenetics; machine learning; graph representation learning; graph convolutional network
Online Access:https://www.mdpi.com/2306-5354/12/6/595
collection DOAJ
description Ineffective treatment and side effects are associated with high burdens for the patient and society. We investigated the application of graph representation learning (GRL) for predicting medication usage based on individual genetic data in the United Kingdom Biobank (UKBB). A graph convolutional network (GCN) was used to integrate interconnected biomedical entities in the form of a knowledge graph as part of a machine learning (ML) prediction model. Data from the Pharmacogenomics Knowledgebase (PharmGKB) was used to construct a biomedical knowledge graph. Individual genetic data (n = 485,754) from the UKBB was obtained and preprocessed to match pharmacogenetic variants in PharmGKB. Self-reported medication usage labels were obtained from UKBB data field 20003. We hypothesize that pharmacogenetic variants can predict the impact of medications on individuals. We assume that an individual using a medication on a regular basis experiences a net benefit (vs. side effects) from the medication. ML models were trained to predict medication usage for 264 medications. The GCN model significantly outperformed both a baseline logistic regression model (p-value: 1.53 × 10⁻⁹) and a deep neural network model (p-value: 8.68 × 10⁻⁸). The GCN model also significantly outperformed a GCN model trained using a random graph (GCN-random; p-value: 5.44 × 10⁻⁹). A consistent trend was observed in which medications with larger sample sizes showed better performance, and for several medications, a high relative rank of the medication (among multiple medications) was associated with greater than 2-fold higher odds of usage of that medication. In conclusion, a graph-based ML approach could be useful in advancing precision medicine by prioritizing medications that a patient may need based on their genetic data.
However, further research is needed to improve the quality and quantity of genetic data and to validate our approach using more reliable medication labels.
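The record does not describe the GCN architecture. As a minimal sketch, assuming the widely used Kipf and Welling (2017) propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), the NumPy code below shows how one individual's genotype dosages placed on variant nodes could propagate through a knowledge graph to drug nodes, and how a random control graph with the same number of edges might be built for a GCN-random style comparison. The toy graph, node roles, untrained weights, sigmoid readout and helper names (normalize_adjacency, gcn_layer, random_graph_like) are all hypothetical and are not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization of Kipf & Welling (2017)."""
    a_hat = adj + np.eye(adj.shape[0])               # self-loops keep each node's own signal
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees include the self-loop
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return np.maximum(0.0, normalize_adjacency(adj) @ h @ w)

def random_graph_like(adj: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical control: same node and edge counts, edges placed uniformly at random."""
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)             # every possible undirected edge
    n_edges = int(np.triu(adj, k=1).sum())           # edge count of the real graph
    pick = rng.choice(rows.size, size=n_edges, replace=False)
    rand = np.zeros_like(adj)
    rand[rows[pick], cols[pick]] = 1.0
    return rand + rand.T                             # keep the graph undirected

# Hypothetical toy knowledge graph: nodes 0-1 variants, 2 gene, 3-4 drugs.
edges = [(0, 2), (1, 2), (2, 3), (1, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# One individual's input: genotype dosage (0/1/2) on variant nodes, 0 elsewhere.
h0 = np.array([[2.0], [1.0], [0.0], [0.0], [0.0]])

rng = np.random.default_rng(0)
w1 = rng.normal(size=(1, 4))                         # untrained weights, illustration only
w2 = rng.normal(size=(4, 4))
w_out = rng.normal(size=4)

# Two layers so variant signal can reach drugs two hops away (variant -> gene -> drug).
h2 = gcn_layer(adj, gcn_layer(adj, h0, w1), w2)      # node embeddings, shape (5, 4)
drug_scores = 1.0 / (1.0 + np.exp(-(h2[[3, 4]] @ w_out)))  # per-drug usage probability
ranking = np.argsort(-drug_scores)                   # relative rank among the two drugs

# Same model and weights run on a random graph with the same edge count.
a_rand = random_graph_like(adj, rng)
h_rand = gcn_layer(a_rand, gcn_layer(a_rand, h0, w1), w2)
```

With a single layer, drug node 3 (linked only to the gene) would receive no variant signal at all, which is why the sketch stacks two layers. Sorting the per-drug scores mirrors the notion of a medication's relative rank, although how the authors actually computed rank is not stated in this record.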
id doaj-art-aec2102f99974fe9b4be5455a533bc2a
institution Kabale University
issn 2306-5354
Citation: Bill Qi, Yannis J. Trakadis. Graph Representation Learning for the Prediction of Medication Usage in the UK Biobank Based on Pharmacogenetic Variants. Bioengineering 2025, 12(6), 595. DOI: 10.3390/bioengineering12060595
Affiliations: Department of Human Genetics, McGill University, Montreal, QC H3A 0C7, Canada (both authors)