Improved breast cancer risk prediction using chromosomal-scale length variation

Abstract Introduction Early diagnosis of breast cancer leads to higher long-term survival rates. The development of a germline genetic test, or polygenic risk score, to identify women at high risk of breast cancer holds the potential to reduce cancer deaths. However, current tests based on SNPs do n...

Bibliographic Details
Main Authors: Yasaman Fatapour, James P. Brody
Format: Article
Language: English
Published: BMC 2025-06-01
Series: Human Genomics
Online Access: https://doi.org/10.1186/s40246-025-00776-z
collection DOAJ
description Abstract

Introduction: Early diagnosis of breast cancer leads to higher long-term survival rates. The development of a germline genetic test, or polygenic risk score, to identify women at high risk of breast cancer holds the potential to reduce cancer deaths. However, current tests based on SNPs do not perform much better than predictions based on family history and perform significantly worse in populations with non-European ancestry. We have developed an alternative method to characterize a genome, called chromosomal-scale length variation, which can be applied to polygenic risk scores.

Objective: The objective of this paper is to characterize a breast cancer genetic risk score based on chromosomal-scale length variation using the NIH All of Us dataset in different self-identified racial groups when trained on different populations.

Methods: We used the NIH All of Us dataset to compile a dataset with 4,533 women who have been diagnosed with breast cancer (including 440 who self-identified as Black) and 44,518 women who have not. We acquired, through All of Us, genetic information for each of these women. We computed a set of 88 values for each woman in the dataset, representing the chromosomal-scale length variation parameters. These numbers are average log R ratios for four different segments from each of the 22 autosomes. We used machine learning algorithms to find a model that best differentiates the women with breast cancer from the women without breast cancer based on the set of 88 numbers that characterize each woman’s germline genome.

Results: The best model had an AUC of 0.70 (95% CI, 0.67–0.73) in the All of Us population. Women who scored in the top quintile by this model were nine times more likely to have breast cancer when compared to women who scored in the lowest quintile.

Conclusion: In conclusion, we found that this method of computing genetic risk scores for breast cancer is a substantial improvement over SNP-based polygenic risk scores. In addition, we compared models trained on populations of only White women and only Black women. We found that the models trained only on White women performed better than models trained only on Black women when tested on only White women. We did not see a significant difference between the two models when tested on only Black women.
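
The feature construction and the quintile comparison described in the abstract can be illustrated with a short Python sketch. It is not the authors' code. The function names, the input layout (one chromosome number, position, and log R ratio per array probe), and the choice to split each autosome into four segments with equal probe counts are all assumptions made for illustration; the record does not say how segments are defined, nor whether "nine times more likely" refers to a prevalence ratio or an odds ratio. The sketch uses a prevalence ratio.

```python
import numpy as np

N_AUTOSOMES = 22
SEGMENTS_PER_CHROMOSOME = 4  # 22 * 4 = 88 values per person


def csv_features(chrom, position, log_r_ratio):
    """Return 88 mean log R ratios for one person.

    Assumption: each autosome is split into four segments holding
    (nearly) equal numbers of probes, ordered by genomic position.
    """
    chrom = np.asarray(chrom)
    position = np.asarray(position)
    log_r_ratio = np.asarray(log_r_ratio, dtype=float)
    features = np.full(N_AUTOSOMES * SEGMENTS_PER_CHROMOSOME, np.nan)
    for c in range(1, N_AUTOSOMES + 1):
        mask = chrom == c
        if not mask.any():
            continue  # leave NaN for chromosomes with no probes
        order = np.argsort(position[mask], kind="stable")
        values = log_r_ratio[mask][order]
        segments = np.array_split(values, SEGMENTS_PER_CHROMOSOME)
        for s, segment in enumerate(segments):
            if segment.size:
                index = (c - 1) * SEGMENTS_PER_CHROMOSOME + s
                features[index] = np.nanmean(segment)
    return features


def top_vs_bottom_quintile_ratio(scores, has_cancer):
    """Ratio of case prevalence in the top score quintile to the bottom one.

    Assumption: a prevalence ratio; undefined if the bottom quintile
    contains no cases.
    """
    scores = np.asarray(scores, dtype=float)
    has_cancer = np.asarray(has_cancer, dtype=bool)
    low_cut, high_cut = np.quantile(scores, [0.2, 0.8])
    bottom = has_cancer[scores <= low_cut]
    top = has_cancer[scores >= high_cut]
    return top.mean() / bottom.mean()


# Synthetic example: 8 probes on each of 22 autosomes.
chrom = np.repeat(np.arange(1, 23), 8)
position = np.tile(np.arange(8), 22)
lrr = np.arange(176, dtype=float)
features = csv_features(chrom, position, lrr)

# Synthetic risk scores and case labels for ten people.
ratio = top_vs_bottom_quintile_ratio(
    np.arange(10), [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
)
```

In a full pipeline, the 88-value vectors for all participants would form the input matrix of a classifier, and its held-out scores would feed the quintile comparison.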
format Article
id doaj-art-7675f4f2c1ac4cebb81da4b8e57b9aca
institution DOAJ
issn 1479-7364
language English
publishDate 2025-06-01
publisher BMC
record_format Article
series Human Genomics
spelling doaj-art-7675f4f2c1ac4cebb81da4b8e57b9aca 2025-08-20T03:21:03Z eng BMC Human Genomics 1479-7364 2025-06-01 191110 10.1186/s40246-025-00776-z Improved breast cancer risk prediction using chromosomal-scale length variation Yasaman Fatapour (Department of Biomedical Engineering, University of California, Irvine) James P. Brody (Department of Biomedical Engineering, University of California, Irvine) https://doi.org/10.1186/s40246-025-00776-z
title Improved breast cancer risk prediction using chromosomal-scale length variation
url https://doi.org/10.1186/s40246-025-00776-z