Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants.

Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The us...

Bibliographic Details
Main Authors: Bushra Haque, David Cheerie, Amy Pan, Meredith Curtis, Thomas Nalpathamkalam, Jimmy Nguyen, Celine Salhab, Bhooma Thiruvahindrapuram, Jade Zhang, Madeline Couse, Taila Hartley, Michelle M Morrow, E Magda Price, Susan Walker, David Malkin, Frederick P Roth, Gregory Costain
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS Genetics
Online Access: https://doi.org/10.1371/journal.pgen.1011540
collection DOAJ
description Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid in the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including presence/absence and classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict variant classifications of germline missense variants in ClinVar using Cancer Hotspots data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part from searching public and private genome-wide sequencing datasets from ~1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2-33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the logistic regression and random forest models achieved high area under the precision-recall curve values (0.847 and 0.829, respectively) and high area under the receiver operating characteristic curve values (0.821 and 0.774, respectively). With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders.
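The arithmetic behind the reported ClinVar proportions, and the general form of an odds ratio with a confidence interval, can be illustrated with a minimal sketch. The three classification counts come from the description above. The function names, the choice of a Wald (Woolf) interval, and the 2x2 counts in the demonstration are assumptions made for illustration; the record does not state how the authors computed their interval, and it does not give the counts for the comparison group.

```python
import math

# Counts reported in the description: ClinVar classifications of the 691
# germline variants corresponding to Cancer Hotspots missense mutations.
clinvar_counts = {
    "likely pathogenic/pathogenic": 426,
    "uncertain significance": 261,
    "likely benign/benign": 4,
}


def percentages(counts):
    """Return each category's share of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    The Wald interval is an assumption here, not the authors' stated method.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Reproduces the 61.6%, 37.8%, and 0.6% figures from the description.
    print(percentages(clinvar_counts))

    # Hypothetical 2x2 counts, chosen only to show the calculation.
    print(odds_ratio_with_ci(a=10, b=5, c=20, d=40))
```

In this layout, "exposed" would correspond to germline variants that match a cancer hotspot mutation and "outcome" to a likely pathogenic/pathogenic classification in ClinVar. Reproducing the reported odds ratio of 28.3 would require the comparison-group counts from the article itself.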
format Article
id doaj-art-9c554e515294402182af80e03d47c5fc
institution Kabale University
issn 1553-7390
1553-7404
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj-art-9c554e515294402182af80e03d47c5fc | 2025-08-20T03:52:03Z | eng | Public Library of Science (PLoS) | PLoS Genetics | 1553-7390 | 1553-7404 | 2025-01-01 | 21(1): e1011540 | 10.1371/journal.pgen.1011540 | https://doi.org/10.1371/journal.pgen.1011540