Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants.
Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The us...
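| Main Authors: | Bushra Haque, David Cheerie, Amy Pan, Meredith Curtis, Thomas Nalpathamkalam, Jimmy Nguyen, Celine Salhab, Bhooma Thiruvahindrapuram, Jade Zhang, Madeline Couse, Taila Hartley, Michelle M Morrow, E Magda Price, Susan Walker, David Malkin, Frederick P Roth, Gregory Costain |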
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS Genetics |
| Online Access: | https://doi.org/10.1371/journal.pgen.1011540 |
| _version_ | 1849315755893456896 |
|---|---|
| author | Bushra Haque David Cheerie Amy Pan Meredith Curtis Thomas Nalpathamkalam Jimmy Nguyen Celine Salhab Bhooma Thiruvahindrapuram Jade Zhang Madeline Couse Taila Hartley Michelle M Morrow E Magda Price Susan Walker David Malkin Frederick P Roth Gregory Costain |
| author_sort | Bushra Haque |
| collection | DOAJ |
| description | Innovative and easy-to-implement strategies are needed to improve the pathogenicity assessment of rare germline missense variants. Somatic cancer driver mutations identified through large-scale tumor sequencing studies often impact genes that are also associated with rare Mendelian disorders. The use of cancer mutation data to aid in the interpretation of germline missense variants, regardless of whether the gene is associated with a hereditary cancer predisposition syndrome or a non-cancer-related developmental disorder, has not been systematically assessed. We extracted putative cancer driver missense mutations from the Cancer Hotspots database and annotated them as germline variants, including presence/absence and classification in ClinVar. We trained two supervised learning models (logistic regression and random forest) to predict variant classifications of germline missense variants in ClinVar using Cancer Hotspots data (training dataset). The performance of each model was evaluated with an independent test dataset generated in part from searching public and private genome-wide sequencing datasets from ~1.5 million individuals. Of the 2,447 cancer mutations, 691 corresponding germline variants had been previously classified in ClinVar: 426 (61.6%) as likely pathogenic/pathogenic, 261 (37.8%) as uncertain significance, and 4 (0.6%) as likely benign/benign. The odds ratio for a likely pathogenic/pathogenic classification in ClinVar was 28.3 (95% confidence interval: 24.2-33.1, p < 0.001), compared with all other germline missense variants in the same 216 genes. Both supervised learning models showed high correlation with pathogenicity assessments in the training dataset. When applied to the test dataset, the logistic regression and random forest models achieved high area under the precision-recall curve values (0.847 and 0.829, respectively) and high area under the receiver operating characteristic curve values (0.821 and 0.774, respectively). With the use of cancer and germline datasets and supervised learning techniques, our study shows that cancer mutation data can be leveraged to improve the interpretation of germline missense variation potentially causing rare Mendelian disorders. |
| format | Article |
| id | doaj-art-9c554e515294402182af80e03d47c5fc |
| institution | Kabale University |
| issn | 1553-7390 1553-7404 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Public Library of Science (PLoS) |
| record_format | Article |
| series | PLoS Genetics |
| spelling | doaj-art-9c554e515294402182af80e03d47c5fc; 2025-08-20T03:52:03Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2025-01-01; 21; 1; e1011540; 10.1371/journal.pgen.1011540; Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants.; Bushra Haque, David Cheerie, Amy Pan, Meredith Curtis, Thomas Nalpathamkalam, Jimmy Nguyen, Celine Salhab, Bhooma Thiruvahindrapuram, Jade Zhang, Madeline Couse, Taila Hartley, Michelle M Morrow, E Magda Price, Susan Walker, David Malkin, Frederick P Roth, Gregory Costain; https://doi.org/10.1371/journal.pgen.1011540 |
| title | Leveraging cancer mutation data to inform the pathogenicity classification of germline missense variants. |
| url | https://doi.org/10.1371/journal.pgen.1011540 |
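
The abstract in this record reports three kinds of summary statistics: an odds ratio with a 95% confidence interval, and areas under the receiver operating characteristic and precision-recall curves. The record does not say how the authors computed them, so the following is only a minimal sketch of standard textbook estimators: a Wald (Woolf) interval on the log odds ratio, a pairwise-comparison AUROC, and average precision as one common estimate of the area under the precision-recall curve. All function names are hypothetical. In the usage lines, only the counts 426 and 265 (691 minus 426) come from the abstract; the comparison-group counts, labels, and scores are made-up placeholders, so the printed values are not the study's results.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a, b: group of interest with / without the outcome
    c, d: comparison group with / without the outcome
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Items are ranked by descending score; tied scores keep input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


if __name__ == "__main__":
    # 426 and 265 are from the abstract (691 classified variants, 426 P/LP).
    # 1000 and 20000 are HYPOTHETICAL comparison counts, not study data.
    print(odds_ratio_ci(426, 265, 1000, 20000))

    # HYPOTHETICAL labels (1 = pathogenic) and model scores.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(auroc(toy_labels, toy_scores))
    print(average_precision(toy_labels, toy_scores))
```

The precision-recall metric is the more informative of the two when pathogenic variants are a minority of the test set, because it ignores true negatives; the AUROC is insensitive to class balance.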