Nilanjan Chatterjee’s research while affiliated with Johns Hopkins Bloomberg School of Public Health and other places
Biobanks have become pivotal in genetic research, particularly through genome-wide association studies (GWAS), driving transformative insights into the genetic basis of complex diseases and traits through the integration of genetic data with phenotypic, environmental, family history, and behavioral information. This review explores the distinct design and utility of different biobanks, highlighting their unique contributions to genetic research. We further discuss the utility of, and methodological advances in, combining data from disease-specific studies or consortia with those of biobanks, focusing especially on summary-statistics-based meta-analysis. We then review the additional advantages that biobanks offer genetic studies: representing population differences, calibrating polygenic scores, assessing pleiotropy, and improving post-GWAS in silico analyses. Advances in sequencing technologies, particularly whole-exome and whole-genome sequencing, have further enabled the discovery of rare variants at biobank scale. Among recent developments, the integration of large-scale multi-omics data, especially proteomics and metabolomics, within biobanks provides deeper insights into disease mechanisms and regulatory pathways. Despite challenges such as ascertainment strategies and phenotypic misclassification, biobanks continue to evolve, driving methodological innovation and enabling precision medicine. We highlight the contributions of biobanks to genetic research and their growing integration with multi-omics, and finally discuss their future potential for advancing healthcare and therapeutic development.
Background
Several breast cancer (BC) risk prediction models are used in clinical practice to identify women eligible for enhanced screening or prevention strategies. While these models have been independently validated in specific and often separate contexts, their performance has not been systematically evaluated and compared across a wide range of populations or age ranges.
Methods
We collected individual-level baseline questionnaire data and incident cancer diagnoses from 14 cohorts participating in the Breast Cancer Risk Prediction Project (BCRPP), representing the United States (N=12), Canada (N=1) and Australia (N=1). After harmonizing data across cohorts, five-year absolute risk estimates for invasive breast cancer were derived using four models: BCRAT, iCARE-Lit, Tyrer-Cuzick (all estimating risk in women aged ≥20-75 years) and the Black Women’s Health Study (BWHS) calculator (estimating risk in Black women aged ≥30-70 years). Using the iCARE-calibrate function in R we estimated the area under the curve (AUC) and 95% confidence intervals (CI) within each cohort, using absolute risk designations to incorporate age. Expected to observed (E/O) absolute risks were estimated on average and within expected absolute risk deciles for each cohort. To align with its intended use in clinical practice, calibration of the BWHS calculator (and comparison to other models) was performed within the subset of Black women pooled from all cohorts.
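The two performance measures named above can be illustrated with a minimal Python sketch. It is not the iCARE implementation used in the study: it assumes every woman has complete five-year follow-up (no censoring), and the function names and the rank-based binning are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def expected_observed_ratio(pred_risk: List[float], outcome: List[int]) -> float:
    """E/O ratio: sum of predicted absolute risks divided by observed case count."""
    observed = sum(outcome)
    if observed == 0:
        raise ValueError("E/O is undefined when no cases are observed")
    return sum(pred_risk) / observed


def eo_by_risk_bin(pred_risk: List[float], outcome: List[int],
                   n_bins: int = 10) -> List[Tuple[float, int, float]]:
    """Return (expected, observed, E/O) per bin of predicted risk.

    Bins are equal-sized groups after sorting by predicted risk
    (deciles when n_bins=10); E/O is NaN for a bin with no cases.
    """
    order = sorted(range(len(pred_risk)), key=lambda i: pred_risk[i])
    n = len(order)
    rows = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        expected = sum(pred_risk[i] for i in idx)
        observed = sum(outcome[i] for i in idx)
        ratio = expected / observed if observed else float("nan")
        rows.append((expected, observed, ratio))
    return rows


def auc(pred_risk: List[float], outcome: List[int]) -> float:
    """Probability that a random case has higher predicted risk than a random
    non-case (ties count one half). Quadratic time; fine for a small example."""
    cases = [p for p, y in zip(pred_risk, outcome) if y == 1]
    controls = [p for p, y in zip(pred_risk, outcome) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

An E/O ratio above 1 indicates that the model predicts more cases than were observed (overestimation), which is how the calibration findings below should be read.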
Results
A total of 1,041,708 women, enrolled in studies between 1976 and 2015, were included. Within five years of baseline cohort entry, 116,113 invasive breast cancer cases were diagnosed. Mean age at baseline ranged from 34 to 62 years. Within cohorts, discrimination was similar across models but varied substantially across cohorts. Age-incorporated AUCs ranged from 0.55 to 0.77, with most cohort-specific AUCs under 0.65 and higher AUCs observed among younger cohorts. Calibration, measured by E/O ratios, differed substantially by both cohort and risk prediction model. For the majority of cohort-model combinations, risk was overestimated in the upper risk deciles. In general, E/O ratios were more similar for BCRAT and Tyrer-Cuzick models compared to iCARE-Lit, which tended to overestimate risk more on average. In the subset of Black women aged ≥30-70 years (N=84,594; N=824 invasive cases in 5 years), age-incorporated AUCs ranged from 0.61 to 0.64. While risk was overestimated in the upper risk deciles for all models, overestimation was 13-118% lower for the BWHS calculator.
Conclusion
The discrimination and calibration of existing risk prediction models varied across studies. Future model development including additional risk factors (e.g., genetic and mammographic information) should leverage diverse training data and flexible models to ensure risk estimates perform well across different regions, countries, and ethnicities.
Citation Format
Kristen D. Brantley, Thomas U. Ahearn, Emily Norton, Julie Palmer, Gary Zirpoli, Matt Barnett, Marian L. Neuhouser, Lauren Teras, James Hodge, Thomas E. Rohan, Roger Milne, A. Heather Eliassen, Hongyan Huang, Yu Chen, Katie M. O'Brien, Cari Kitahara, Garnet Anderson, I-Min Lee, Nilanjan Chatterjee, Montserrat Garcia-Closas, Peter Kraft, Breast Cancer Risk Prediction Project. Performance of common general-population breast cancer risk prediction models in 14 cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3594.
Background
Plasma proteins, reflecting both exogenous and endogenous factors, could serve as the basis for risk stratification to guide decision-making for primary and secondary prevention of cancer in adults. Published studies have investigated the prospective risk of cancers and other chronic diseases associated with Olink-based proteomic data available in the UK Biobank Study. There is a need for further investigations based on alternative measurement platforms, longer follow-up data, and diverse populations.
Aim
Identify proteins measured by SomaScan assay® 5K that are associated with risk of solid cancers in the Atherosclerosis Risk in Communities (ARIC) study, a prospective cohort of middle-aged and older Black and White men and women.
Methods
The study was based on a cohort of 9,495 individuals in whom 590, 416, 22, 97, 271, 88, and 136 incident cancers of the prostate, lung, liver, kidney, colorectum, pancreas, and bladder, respectively, were observed over a maximum follow-up period of 25.9 years. Log2-transformed relative fluorescence units of 4,955 SOMAmers measured in plasma collected at Visit 2 were adjusted in a linear regression model including proteomic PEER factors, study sites, and ten genetic principal components (PCs). Corrected protein quantifications based on rank-inverse normalization of the residuals were used in the analysis. Incident cancers were ascertained primarily by cancer registry linkage. The Cox regression model and a time-dependent coefficient Cox model for age at Visit 2 were applied to estimate associations between individual proteins and the risk of each solid cancer, adjusting for potential environmental risk factors, race, genetic PCs, and PEER factors. Sensitivity analyses examined associations with extended lag time (diagnosis more than 5 years after Visit 2) and a minimally adjusted Cox model for cancer risk.
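The rank-inverse normalization step can be sketched as follows. This is a minimal illustration, assuming the regression residuals are already available as a list of numbers; the function name, the averaging of tied ranks, and the 0.5 offset are assumptions, since the abstract does not state which variant was used.

```python
from statistics import NormalDist
from typing import List


def rank_inverse_normal(residuals: List[float], offset: float = 0.5) -> List[float]:
    """Map values to standard-normal quantiles of their ranks.

    Ranks run from 1 to n (ties share the average rank); each rank r is
    converted to the quantile Phi^{-1}((r - offset) / n). The offset of 0.5
    is an assumed convention, not one taken from the abstract.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < n and residuals[order[end + 1]] == residuals[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / n) for r in ranks]
```

The transform preserves the ordering of the residuals while forcing an approximately standard normal distribution, so per-protein hazard ratios are comparable across SOMAmers.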
Results
We identified 32 proteins (19 positively, 13 inversely) associated with liver cancer risk and 5 proteins (4 positively, 1 inversely) associated with lung cancer risk in both main and sensitivity analyses at a false discovery rate (FDR) <0.05. Two known proteins (MMP7 and HAVCR1) were positively associated with kidney cancer, and two (ACP3 and KLK3) were identified for prostate cancer. We further identified GSTM1 as associated with bladder cancer risk [HR (95% CI): 0.68 (0.56, 0.82)] and found evidence of enrichment of signals for proteomic associations with colorectal cancer at the nominal threshold P < 0.05 (371 proteins for colorectal cancer, 401 proteins for colon cancer only, and 425 proteins for rectal cancer).
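The abstract reports an FDR threshold without naming the procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment, a common choice for this kind of per-protein screen; the function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    The adjusted value for the p-value with rank r (1 = smallest) among m tests
    is the minimum of p * m / k over all ranks k >= r, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this assumption, a protein would be reported when its adjusted p-value is below 0.05, for example `[p < 0.05 for p in benjamini_hochberg(raw_pvalues)]`, where `raw_pvalues` is a hypothetical list of per-protein Cox model p-values.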
Conclusion
We discovered multiple circulating proteins associated with lung and liver cancer risks, and confirmed previously established circulating proteins for kidney and prostate cancer risks. Additional evaluations in independent and large cohort studies and laboratory testing will be necessary to validate our results. Support: NHGRI, NHLBI, NCI, NPCR.
Citation Format
Ziqiao Wang, Vernon A. Burk, Nilanjan Chatterjee, Elizabeth Platz. Discovering plasma proteins associated with common solid cancer incidence in ARIC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2336.
Breast cancer risk assessment tools are widely used in clinical practice to guide decisions regarding screening timing and modality, lifestyle interventions, genetic testing, preventive therapy, and risk-reducing surgery. Although a number of tools designed for the general population are used in practice, they face a series of challenges including: (i) modest discriminatory ability due to lack of a unified model that flexibly incorporates a comprehensive set of risk factors; (ii) inability to produce subtype-specific risk estimates, relevant for tailoring screening and/or preventive strategies (e.g. prophylactic endocrine therapy that is effective only for hormone receptor positive tumors); (iii) lack of data from diverse populations to build models that work well for all women; and (iv) scarcity of opportunities for comparative validation of different models. The Breast Cancer Risk Prediction Project (BCRPP), a collaboration currently including 20 prospective cohorts from the National Cancer Institute’s Cohort Consortium, is addressing these challenges by harmonizing, sharing, and analyzing data on over 1.8 million women, over 115,000 of whom have developed breast cancer. The BCRPP is developing a comprehensive tool that will predict breast cancer risk, overall and by subtypes, across major racial and ethnic groups in the United States. This tool incorporates known and emerging risk factors (e.g. reproductive history, anthropometry, smoking, alcohol intake, medication use, mammography, family history, and measured genotypes). In parallel to model building and validation efforts, BCRPP investigators are developing privacy-preserving web and mobile interfaces to deliver the risk tools to patients and clinicians. Here we present descriptive data on participating cohorts (e.g. dates of enrollment, ages at enrollment, length of follow up, numbers of cases by subtype), summarize missing data patterns and our approach to model development and validation that accounts for heterogeneity in missingness patterns across cohorts, and describe data governance procedures and data sharing infrastructure (including pooled and federated analyses, to account for policy limitations on sharing individual data). Descriptive information and procedures for requesting and accessing data are available via the BCRPP data platform (https://epidataplatforms.cancer.gov/bcrpp/). Although originally assembled for risk model development, these data on a large and diverse sample of women with extensive harmonized risk factors and covariates can be a resource to address other outstanding questions in breast cancer epidemiology across time, the lifecourse, and diverse populations.
Citation Format
Peter Kraft, Laura Beane-Freeman, Julie Palmer, Ellen O'Meara, Jeanine Genkinger, James Lacey, Thomas Rohan, Lauren Teras, Marian L. Neuhouser, Kala Visvanathan, Celine Vachon, Roger Milne, Christopher Haiman, Yu Chen, Heather Eliassen, Cari Kitahara, Katie O'Brien, Emily White, Garnet Anderson, I-Min Lee, Archie Campbell, Renee Fortner, Ylva Lagerros, Sven Sandin, Mia Gaudet, Montserrat Garcia-Closas, Nilanjan Chatterjee. The Breast Cancer Risk Prediction Project: a resource of 1.8 million women with diverse backgrounds from 20 prospective cohorts to improve breast cancer risk prediction and advance breast cancer epidemiology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3583.
Introduction
Breast cancer genome-wide association studies (GWAS) have identified over 200 susceptibility loci, many replicated in diverse populations. However, cross-ancestry evaluation of breast cancer genetic architecture remains limited. We examined breast cancer genetic architecture using GWAS summary results from European (EUR; cases (ca) = 188,474, controls (co) = 96,201), East Asian (EAS; ca = 20,393, co = 86,329), African American (AA; ca = 9,235, co = 10,184), and US Hispanic/Latina and Latin American (H/L; ca = 2,396, co =7,468) studies.
Methods
GWAS results were derived from the Breast Cancer Association Consortium, the African Ancestry Breast Cancer Genetics Consortium (AABCG), Biobank Japan, and a meta-analysis of five studies of H/L women with breast cancer. Linkage disequilibrium (LD) scores were generated for EUR, EAS, and AMR using the 1000 Genomes Project, while AA LD scores were derived from a subset of AABCG controls. Heritability was estimated for each ancestry group using LD score regression. Genetic correlations between populations were assessed via Popcorn. Polygenicity was analyzed using GENESIS; however, due to the limited sample size in the H/L studies, these estimates were restricted to AA, EAS, and EUR. Enrichment of heritability across regulatory elements was evaluated using stratified LD score regression across all populations.
Results
The logit-scale heritability and corresponding standard error (SE) were 0.466 (0.066) for EAS, 0.501 (0.050) for EUR, 0.588 (0.360) for H/L, and 0.614 (0.095) for AA. The estimated number of independent susceptibility loci was 4,446 for EAS, 5,235 for EUR, and 8,308 for AA. Using a clumping and thresholding approach, an optimal set of common variants were projected to explain 38.6% (EUR), 39.4% (EAS), and 26.2% (AA) of genetic variance for samples of 100,000 cases and 100,000 controls with AUCs from corresponding estimated polygenic risk scores (PRSs) of 0.621 (EUR), 0.634 (EAS), and 0.611 (AA). Genetic correlations were strongest between EUR and EAS (ρ = 0.79, SE = 0.08) and EUR and H/L (ρ = 0.68, SE = 0.21), and weakest between AA and EAS (ρ = 0.42, SE = 0.14) and AA and H/L (ρ = 0.26, SE = 0.24). Among 73 genomic features, we found significant (P < 0.05/73) enrichment in heritability in EUR for ‘ancient promoter’ regions, transcription factor binding sites, H3K4me3, ‘super-enhancers’, H3K4me1, H3K27ac; in EAS for ‘super-enhancers’; and in AA for H3K27ac. No genomic features were significantly enriched in H/L, likely due to limited power.
Conclusion
These findings suggest a shared breast cancer genetic architecture across diverse populations, as well as the potential for a similar level of breast cancer risk stratification by PRS in these populations. Expanding GWAS in underrepresented populations is essential to improve genetic risk predictions and foster equitable cancer prevention.
Citation Format
James L. Li, Maria Zanti, Jacob Williams, Om Jahagirdar, Guochong Jia, Qiang Hu, Jean-Tristan Brandenburg, Li Yan, Weang-Kee Ho, Jingmei Li, José P. Miranda, Devika Godbole, Julie-Alexia Dias, Leila Dorling, Wenlong C. Chen, Nicholas Boddicker, Ying Wang, Alicia Martin, Martin J. Zhang, Yan Zhang, Joe Dennis, Esther M. John, Gabriela Torres-Mejia, Larry Kushi, Jeffrey Weitzel, Susan L. Neuhausen, Luis Carvajal-Carmona, Christopher Haiman, Elad Ziv, Laura Fejerman, Wei Zheng, Dezheng Huo, Douglas Easton, Nilanjan Chatterjee, Peter Kraft, Montserrat Garcia-Closas, Wendy Wong, Kyriaki Michailidou, Qianqian Zhu, Diptavo Dutta, Thomas U. Ahearn, Haoyu Zhang. Genetic architecture of breast cancer across diverse populations: Assessing heritability, genetic correlation, and polygenicity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 2282.
High-throughput proteomic profiling in prospective cohorts has the potential to uncover biomarkers of cancer risk and biology of cancer development. We investigated prospective associations of up to 7,335 aptamers with overall and site-specific colorectal cancer in a case-cohort study within the European Prospective Investigation into Cancer and Nutrition (EPIC) and then evaluated associations in the Atherosclerosis Risk in Communities (ARIC) study.
Hazard ratios and 95% confidence intervals were estimated using Prentice-weighted Cox proportional hazards models (stratified by recruitment center, sex, and 5-year age group and adjusted for fasting status, day of blood draw, body mass index, smoking status, daily alcohol consumption, physical activity level, and highest level of education) in 977 incident colorectal (colon, 658; rectum, 319) cancer cases and 5,057 non-cases under a case-cohort design. The identified associations were then evaluated in ARIC in a cohort design including 271 colorectal (colon, 235; rectum, 36) cancers in 9,495 individuals using multivariate Cox proportional hazards models.
Overall, 37 aptamers were associated (false discovery rate [FDR] p-value < 0.05) with colorectal cancer in EPIC; data on 27 of these were available in ARIC, 7 of which had consistent directions of association in both studies and met a nominal p-value (0.05) threshold in ARIC (positive associations: ACAA1, ASL, GDF15, IQCF1, TFF3; inverse associations: CLEC3B, COMP). Associations did not change after 5-year follow-up exclusion. In site-specific analyses in EPIC, 66 and 6 aptamers were specifically associated with colon and rectal cancer, respectively, at FDR < 0.05. Four of these aptamers were nominally (P < 0.05) associated with colon cancer in ARIC (inverse association: CLEC3B; positive association: ACAA1, GDF15, TFF3) while a further 29 showed associations directionally consistent with those in EPIC.
These large-scale proteomic analyses identified a set of proteins associated with colorectal cancer risk across two independent cohorts. These proteins reflect pathways involved in cell migration and adhesion, colorectal mucosal integrity and growth and differentiation, among others, and may represent novel biomarkers of colorectal cancer risk.
Citation Format
Matthew A. Lee, Vivian Viallon, David C. Muller, Ziqiao Wang, Paula Jakszyn, Pietro Ferrari, Giovanna Masala, Domenico Pali, Salvatore Panico, Carlotta Sacerdote, Karl Smith-Byrne, Ruth C. Travis, Rosario Tumino, P. Martijn Kolijn, Roel Vermeulen, Monique Verschuren, Nicholas Wareham, Elizabeth A. Platz, Nilanjan Chatterjee, Elio Riboli, Marc J. Gunter. Proteomic analysis of colorectal cancer risk across two prospective cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 1890.
Background
Prostate cancer (PCa) is the most prevalent cancer among men in Europe, yet its aetiology is poorly understood. Proteins contribute to the development of carcinogenesis and are also the target of most pharmacological interventions. The examination of associations between plasma proteins and PCa may enhance understanding of the aetiology of the PCa.
Aim
Identify proteins associated with PCa risk within a large European prospective cohort, with emphasis on associations with aggressive subtypes of the disease.
Methods
We conducted a case-cohort study in the European Prospective Investigation into Cancer and Nutrition (EPIC) study, with data for a sub-cohort of 1,573 individuals and for 982 incident PCa cases. The SomaLogic® 7K panel was used to measure 7,363 aptamers (representing 6,412 unique proteins) in plasma samples drawn at recruitment. PCa diagnosis was ascertained by linkage to cancer and death registry data, and was further stratified into aggressive subtypes based on histological grade, tumour stage and mortality information. The Prentice-weighted Cox regression model was applied to estimate associations between individual proteins and PCa risk, with adjustment for body mass index, smoking, alcohol intake and education level. Subgroup analyses examined associations with high-grade, advanced-stage, aggressive and extended lag-time (diagnosis more than 15 years after blood draw) PCa risk. We sought external replication for our significant findings in an independent multi-ancestry cohort, the Atherosclerosis Risk in Communities (ARIC) study (SomaLogic® 5K).
Results
Following a median follow-up period of 16.2 years, the sub-cohort developed 79 cases of PCa. ACP3, FLT4, and KLK3 [HRs (95% CI): 1.20 (1.10, 1.31), 1.26 (1.13, 1.40) and 2.28 (1.96, 2.65), respectively] were associated with overall PCa risk in EPIC at FDR (false discovery rate) <0.05. In the subgroup analyses, 12 proteins were associated with high-grade, 9 with advanced stage, and 7 with aggressive PCa risk. Among these, ANKRD1 was associated with approximately a 25% higher risk of all three of these clinically relevant subtypes per SD increase. Twelve proteins were associated with PCa risk diagnosed more than 15 years after blood draw, including ANXA2, OSCAR, and SAT2. Two well-established biomarkers of PCa (ACP3 and KLK3) were replicated in ARIC, together with two novel proteins (ANXA2 and APOC2).
Conclusion
Multiple novel circulating proteins were associated with PCa risk, particularly with risk for aggressive subtypes. While some proteins may serve as early biomarkers of PCa, those with evidence from long-lag time highlight underlying aetiological mechanisms in the development of PCa.
Citation Format
Zhe Huang, Mahboubeh Parsaeian, Vivian Viallon, Ziqiao Wang, Keren Papier, Trishna Desai, Stephanie Chan, Antonio Agudo, Carlotta Sacerdote, David C. Muller, Domenico Pali, Giovanna Masala, Nicholas Wareham, Raul Zamora-Ros, Roel C. Vermeulen, Rosario Tumino, Ian G. Mills, Nilanjan Chatterjee, Elizabeth A. Platz, Pietro Ferrari, Marc Gunter, Elio Riboli, Tim J. Key, Joshua R. Atkins, Karl Smith-Byrne, Ruth C. Travis, The EPIC SomaLogic Working Group. Proteomic risk factors for prostate cancer: A case-cohort study in EPIC [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3599.
Genome-wide association studies (GWAS) of oral cancers (OC) to date have focused predominantly on European Ancestry (EA) populations. India faces an excess burden of OC, but the most common site of occurrence is the cancer of the buccal mucosa, which is relatively rare in EA populations. We conducted a GWAS of buccal mucosa cancer (BMC) comprising 2,160 BMC cases and 2,325 controls from different geographical locations in India. Single-SNP association tests detected one novel locus (6q27) and one novel signal within the known OC risk locus 5p13.33, at the genome-wide significance level (P-value < 5×10⁻⁸). We additionally conducted a GWAS of 397 BMC cases and 439 controls from Taiwan and performed multi-ancestry GWAS meta-analysis of OC on 5,255 cases and 8,748 controls across EA, Indian and Taiwanese populations. We identified a novel risk locus harbouring the tumour suppressor gene NOTCH1 through a gene-level analysis of the multi-ancestry GWAS data. Pathway analysis suggested that PD-1 signaling and Interferon Gamma signaling may be important in the etiology of BMC. Within data from the Indian BMC GWAS, we further identified statistically significant evidence of both multiplicative interactions (P-value=0.026) indicating stronger polygenic risk of BMC among individuals with history of chewing tobacco compared to those without. Our study provides insights into the etiologies of BMC in India, highlighting both its similarities and differences with other types of oral cavity cancers, as well as the interactions between polygenic gene score and tobacco chewing.
Introduction
The number of assays on proteomic platforms has grown rapidly. The leading platforms, SomaScan and Olink, have strengths and limitations. Comparisons of precision on the latest platforms—SomaScan 11k and Olink Explore HT—have not yet been established.
Methods
Among 102 participants in the Atherosclerosis Risk in Communities Study (mean age 74 years, 53% women, 47% Black), we used split plasma samples to measure platform precision. CV and Spearman correlations were calculated for each assay. Cross-platform agreement was assessed for overlapping proteins.
Results
SomaScan 11k demonstrated a median correlation of 0.85 for the 10 778 assays and a median CV of 6.8%, similar precision to earlier versions. The 5420 assays on Olink Explore HT exhibited a median correlation of 0.65 and median CV of 35.7%, which was higher than observed in its predecessors (e.g., 19.8% for Olink Explore 3072). Precision of Olink assays was inversely correlated with the percentage of samples above the limit of detection (LOD) (r = −0.77). Upon replacing Olink values below the LOD with values half the LOD, the median correlation for Olink assays measured in duplicate increased to 0.79; the median CV decreased to 13.3%. The distribution of between-platform correlations for the 4443 overlapping proteins had peaks at r approximately 0 and at r approximately 0.8. One-tenth of the protein pairs had cross-platform correlations r ≥ 0.8.
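The precision metrics reported above can be sketched in a few lines of Python. This is an illustration, not the study's code: the root-mean-square definition of the split-sample CV is one common convention and is assumed here, the CV formula presumes positive measurements on a linear scale, and all function names are hypothetical.

```python
import math
from typing import List


def half_lod_substitute(values: List[float], lod: float) -> List[float]:
    """Replace every value below the limit of detection with half the LOD."""
    return [v if v >= lod else lod / 2 for v in values]


def duplicate_cv(first: List[float], second: List[float]) -> float:
    """Coefficient of variation (%) from split-sample pairs.

    For each pair the SD is |a - b| / sqrt(2); the assay CV is the
    root mean square of the per-pair SD / mean ratios (assumed convention).
    """
    squared = []
    for a, b in zip(first, second):
        pair_mean = (a + b) / 2
        pair_sd = abs(a - b) / math.sqrt(2)
        squared.append((pair_sd / pair_mean) ** 2)
    return 100 * math.sqrt(sum(squared) / len(squared))


def _average_ranks(values: List[float]) -> List[float]:
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Applying `half_lod_substitute` to both halves of each split sample before calling `duplicate_cv` and `spearman` mirrors the sensitivity analysis described above for values below the LOD.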
Conclusions
Precision of these 2 proteomics platforms in human plasma has diverged as the coverage has increased. These results highlight the need for careful consideration in platform selection based on specific research requirements.
Importance
Most breast cancers in Africa are diagnosed at advanced stages. Improved risk prediction tools to optimize screening and earlier diagnosis are urgently needed.
Objective
To build a comprehensive breast cancer risk estimation model by integrating a polygenic risk score (PRS), pathogenic variants (PVs) in high- or moderate-penetrance genes, and a questionnaire-based risk calculator.
Design, Setting, and Participants
This multicenter case-control study initially enrolled women in Nigeria in 1998 and expanded to Cameroon and Uganda in 2011; enrollment ended in 2018. Women with breast cancer (hereafter cases) were enrolled through hospital oncology units, whereas women without breast cancer (hereafter controls) were recruited from other outpatient clinics and the community. Participants whose genetic data were used in PRS development were excluded from the development of the comprehensive breast cancer risk estimation model. Analyses were performed from September 2023 to January 2025.
Exposures
Lifetime absolute risk estimation models that integrated a PRS only (previously developed using data from women of African ancestry and European ancestry), PRS plus PVs in high- or moderate-penetrance genes (BRCA1, BRCA2, PALB2, ATM, CHEK2, TP53, BARD1, RAD51C, and RAD51D), epidemiologic risk factors only (ascertained from NBCS questionnaires), and a combined model containing these 3 components.
Main Outcomes and Measures
Lifetime absolute risk of breast cancer was estimated, accounting for an association between family history and genetic factors. Participants’ lifetime estimated absolute risk was categorized by the following risk thresholds: lower than 3%, 3%, 5%, and 10% or higher.
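The categorization step can be sketched as a simple threshold lookup. The abstract lists the cut points only briefly, so the interval boundaries below (lower than 3%, 3% to under 5%, 5% to under 10%, and 10% or higher) are an assumed reading, and the names are hypothetical.

```python
from bisect import bisect_right

# Assumed cut points on the lifetime absolute risk scale (as proportions).
CUTPOINTS = [0.03, 0.05, 0.10]
LABELS = ["<3%", "3% to <5%", "5% to <10%", ">=10%"]


def risk_category(lifetime_risk: float) -> str:
    """Assign a lifetime absolute risk (0 to 1) to one of four categories.

    A risk exactly equal to a cut point falls in the higher category,
    so 0.10 is classified as '>=10%'.
    """
    if not 0.0 <= lifetime_risk <= 1.0:
        raise ValueError("lifetime_risk must be a proportion between 0 and 1")
    return LABELS[bisect_right(CUTPOINTS, lifetime_risk)]
```

Counting the cases whose category is `">=10%"` under each model gives the kind of high-risk proportions compared in the Results.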
Results
A total of 1686 women, of whom 996 were cases (mean [SD] age at enrollment, 49.5 [12.2] years) and 690 were controls (mean [SD] age at enrollment, 41.5 [13.8] years), were included in the main analyses. The age-adjusted area under the receiver operating characteristic curve (AUROC) was 0.579 (95% CI, 0.549-0.610) for the PRS only model and 0.609 (95% CI, 0.579-0.638) for the PRS plus PV model. In the combined model containing both genetic and nongenetic risk factors, age-adjusted AUROC increased to 0.723 (95% CI, 0.698-0.748). Using a threshold of 10% or higher lifetime absolute risk, the combined model classified 12.0% of cases (120) as high risk compared with 3.7% of cases (37) using the epidemiologic factors only model and 5.0% of cases (50) using the PRS plus PV model.
Conclusions and Relevance
In this case-control study, a breast cancer risk estimation model was developed that combines genetic and nongenetic factors and refines a previous model that includes epidemiologic risk factors. Further development and validation of this model are necessary to advance breast cancer risk assessment in sub-Saharan Africa.
Citations (58)
... Gene-environment interactions have been investigated in several complex diseases, including BC, to elucidate how a polygenic risk score (PRS) may modulate or be modulated by environmental exposures [35][36][37][38][39][40]. This interplay is particularly complex in admixed populations, whose unique genetic architecture and phenotypic variability may influence the magnitude and direction of gene-environment effects. ...
... By summing the risk contributions of numerous SNPs identified in GWASs, PRS provide a quantitative estimate of genetic predisposition to specific diseases. Risk prediction models based on PRS have been applied to various cancers, including lung cancer, breast cancer, colorectal cancer, glioma, and neuroblastoma (22)(23)(24)(25)(26)(27). However, the only PRS for neuroblastomas was established from UK Biobank data (27). ...
... The NIH-funded Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium was formed to assess and improve polygenic risk estimates for a broad range of health and disease outcomes with global impacts, with a focus on addressing inequitable performance in diverse populations. 1,2 Comprising seven multiinstitutional Study Sites, a Coordinating Center (CC), NIH program staff, and other affiliates and partner programs, the PRIMED Consortium was tasked with identifying and aggregating available geographically, genetically, and ancestrally diverse datasets to improve risk prediction by polygenic risk score (PRS) development. To bridge the performance gap of PRS in diverse groups, the PRIMED Consortium aims to incorporate new and existing genome-wide association study (GWAS) results and leverage methodologic and computational advances by integrating extant genotype and phenotype datasets. ...
... Comparisons with other recent studies, such as those employing high-throughput proteomics, reveal a consistent identification of key proteins involved in PCa progression but also highlight the need for further validation in diverse populations and across different stages of the disease. These comparisons underscore the potential of BPR as robust biomarkers that could enhance current diagnostic and prognostic tools in clinical practice [5,12,48]. ...
... This diverse network of regulatory elements allows for context-specific transcriptional regulation in specific tissues, at specific developmental time points, or in response to environmental stimuli (41). Under this framework, a distinct set of variants, each associated with different allergic diseases, converge on a small number of common, shared genes, giving rise to the pleiotropism suggested by GWAS (42,43). ...
... More than 10,000 proteins have been identified in blood 10,11 , with the majority entering via the lymphatic system. These proteins originate from interstitial spaces and include locally secreted proteins as well as cellular and extracellular matrix debris. ...
... We also identified several gene sets involved in glucose metabolism including glucuronidation, uronic acid metabolic process and KEGG pathway pentose and glucuronate interconversions [76,77]. Top GO and KEGG pathways enriched for our gene-region-specific DMRs included the UGT1A region (a complex of multiple alternatively spliced genes), which was recently found to influence risk of glycaemic biomarkers (glycated albumin and fructosamine) [78]. A retinol metabolism KEGG pathway was also enriched for gene-region-specific DMRs. ...
... Recently, many investigators have utilized PRS x E analysis to study gene-environment interactions for a wide range of traits, including lung cancer 5, diabetes 6, ADHD 7, and cardiovascular disease 8. Compared to single-variant GxE analysis, PRS x E analysis may provide increased power because it focuses on known disease-related variants and it integrates the signals across those variants into a potentially more informative single measure of genetic susceptibility 9. Detecting a PRS x E interaction will allow us to answer questions such as: Does the effect of a particular exposure on disease risk vary depending on overall genetic ...
... Recently, new PRS methods have been introduced that leverage multiple discovery GWAS from diverse ancestries to enhance PRS accuracy and generalizability across populations. 18,19 In this study, we used GWAS summary statistics from a meta-analysis of continental African individuals to develop single-ancestry PRSs for SBP, DBP, PP, and hypertension using various PRS methods. GWAS summary statistics from multiple ancestries were incorporated to develop multi-ancestry PRSs. ...
... Clustering uncovers inherent patterns by grouping data based on similarity metrics or latent spaces. Tools such as DeepHisCoM and MUSSEL reveal subpopulations or regulatory elements linked to diseases [74]. Methods such as RegBase apply clustering to annotate and predict non-coding regulatory variants with high precision. ...