Giovanni Parmigiani

Dana-Farber Cancer Institute | DFCI · Department of Data Sciences

PhD


575 Publications
58,827 Reads
40,334 Citations
Introduction
Giovanni Parmigiani creates statistical tools. Some are general, while others are specific to understanding cancer data. For example, he is interested in addressing the challenges of cross-study replication of predictions by constructing predictors that learn replicability from multiple studies. In cancer research, he has a long-term interest in helping families who are particularly susceptible to inherited cancer understand their risk and make informed decisions. He uses Bayesian modeling and machine learning concepts to predict who is at risk of carrying genetic variants and to integrate literature-based and other information about the effects of mutations.
Additional affiliations
August 2009 - present: Dana-Farber Cancer Institute, Professor (Full)
August 2009 - present: Harvard University, Professor (Full)
August 1999 - August 2009: Johns Hopkins University, Faculty Member


Publications (575)
Preprint
Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relatio...
Article
10597 Background: Lynch syndrome (LS) is the most common cause of hereditary colorectal cancer (CRC) with an increased CRC lifetime risk of 70-80%. LS affects 1:250 individuals and is caused by pathogenic variants in the mismatch repair (MMR) genes. Statistical prediction models such as MMRpro and PREMM5 are widely used to identify LS carriers. How...
Article
Risk evaluation to identify individuals who are at greater risk of cancer as a result of heritable pathogenic variants is a valuable component of individualized clinical management. Using principles of Mendelian genetics, Bayesian probability theory, and variant‐specific knowledge, Mendelian models derive the probability of carrying a pathogenic va...
Article
In the Men’s Lifestyle Validation Study (2011-2013), we examined the validity and relative validity of a physical activity questionnaire (PAQ), web-based 24-hour recalls (ACT24) and accelerometers by multiple comparison methods. Over one year, 609 men completed two PAQs, two 7-day accelerometer measures, one doubly labeled water (DLW)-physical acti...
Preprint
In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicabil...
Article
Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1,448 primary prostate can...
Article
(1) Background: The purpose of this study is to compare the performance of four breast cancer risk prediction models by race, molecular subtype, family history of breast cancer, age, and BMI. (2) Methods: Using a cohort of women aged 40–84 without prior history of breast cancer who underwent screening mammography from 2006 to 2015, we generated bre...
Article
Importance: Patients with cancer are at increased risk for severe COVID-19, but it is unknown whether SARS-CoV-2 vaccination is effective for them. Objective: To determine the association between SARS-CoV-2 vaccination and SARS-CoV-2 infections among a population of Veterans Affairs (VA) patients with cancer. Design, Setting, and Participants: Retro...
Article
Immunoglobulin M (IgM) multiple myeloma (MM) is a rare disease subgroup. Its differentiation from other IgM-producing gammopathies such as Waldenström macroglobulinemia (WM) has not been well characterized but is essential for proper risk assessment and treatment. In this study, we investigated genomic and transcriptomic characteristics of IgM-MM s...
Article
Introduction: Stromal cells in the bone marrow microenvironment maintain a complex bidirectional relationship with the malignant plasma cells and have been implicated in the growth and survival of multiple myeloma (MM) cells along with the development of drug resistance and disease progression. We hypothesized that the perpetual induction of gene e...
Article
Introduction: Coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, is particularly serious in patients with multiple myeloma (MM), with estimated mortality of over 30% in several studies. In the general population, SARS-CoV-2 vaccination has been demonstrated to be an effective approach to preventing infection. However, patients with...
Article
On average, 1% of monoclonal gammopathy of undetermined significance (MGUS) and 10% of smoldering multiple myeloma (SMM) cases progress to symptomatic MM every year within the first five years of diagnosis. The probability of progression decreases significantly for SMM patients after the first 5 years. However, a distinct subset of SMM patients progress wit...
Article
Identifying families with an underlying inherited cancer predisposition is a major goal of cancer prevention efforts. Mendelian risk models have been developed to better predict the risk associated with a pathogenic variant of developing breast/ovarian cancer (with BRCAPRO) and the risk of developing pancreatic cancer (PANCPRO). Given that pathogen...
Article
Background: On average, 10% of SMM patients progress to symptomatic MM per year within the first 5 years of diagnosis. However, a subset of SMM patients is re-classified as high-risk on the basis of risk markers that identify risk of progression within 2 years. Although recent studies have evaluated high-risk SMM, the genomic background of SMM pa...
Preprint
It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets and applying standard statistical learning methods can result in poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied...
Preprint
Risk evaluation to identify individuals who are at greater risk of cancer as a result of heritable pathogenic variants is a valuable component of individualized clinical management. Using principles of Mendelian genetics, Bayesian probability theory, and variant-specific knowledge, Mendelian models derive the probability of carrying a pathogenic va...
Preprint
Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1,448 primary prostate can...
Conference Paper
Introduction: Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. It is unknown to what extent TMAs are affected by batch effects, i.e., measurement error in biomarker levels between batches (slides from TMAs), what impact batch effects have on scientific inference from TMAs, and how to correct for batch effects in...
Conference Paper
Introduction: Identifying families with an underlying inherited cancer predisposition is a major goal of cancer prevention efforts. Mendelian risk models have been developed to better predict the risk associated with a pathogenic variant of developing breast/ovarian cancer (with BRCAPRO), and the risk of developing pancreatic cancer (PANCPRO). Gi...
Preprint
Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but...
Article
3058 Background: Circulating cell-free DNA (cfDNA) is largely nucleosomal in origin with typical fragment lengths of 167 base-pairs reflecting the length of DNA wrapped around the histone and H1 linker. Given the nucleosomal origin of cfDNA, we have previously used low coverage whole genome sequencing to evaluate DNA fragmentation profiles to sensi...
Preprint
Adapting machine learning algorithms to better handle the presence of natural clustering or batch effects within training datasets is imperative across a wide variety of biological applications. This article considers the effect of ensembling Random Forest learners trained on clusters within a single dataset with heterogeneity in the distribution o...
Article
Since the beginning of the coronavirus disease-2019 (COVID-19) pandemic in 2020, there has been a tremendous accumulation of data capturing different statistics including the number of tests, confirmed cases and deaths. This data wealth offers a great opportunity for researchers to model the effect of certain variables on COVID-19 morbidity and mor...
Preprint
Improving existing widely adopted prediction models is often a more efficient and robust way towards progress than training new models from scratch. Existing models may (a) incorporate complex mechanistic knowledge, (b) leverage proprietary information, and (c) have surmounted barriers to adoption. Compared to model training, model improvement and...
Article
The COVID-19 mortality rate is higher in the elderly and in those with pre-existing chronic medical conditions. The elderly also suffer from increased morbidity and mortality from seasonal influenza infections; thus, an annual influenza vaccination is recommended for them. In this study, we explore a possible county-level association between influe...
Article
Background: Men engaged in high physical activity have lower risks of advanced and fatal prostate cancer. Mechanisms underlying this association are not well understood but may include systemic and tumor-specific effects. We investigated potential mechanisms linking physical activity and gene expression in prostate tissue from men with prostate ca...
Article
Motivation: Genomic data are often produced in batches due to practical restrictions, which may lead to unwanted variation in data caused by discrepancies across batches. Such “batch effects” often have a negative impact on downstream biological analysis and need careful consideration. In practice, batch effects are usually addressed by specifically d...
Article
Multiple myeloma (MM) is a proliferation of terminally differentiated plasma cells (PC) producing monoclonal immunoglobulins (Ig), most commonly IgG and IgA (50% and 25% respectively), and less frequently, light-chain only disease, non-secretory, and IgD. IgM-MM is a rare entity (<0.5%), and its differentiation from common IgM producing PC disorder...
Article
Commercialized multigene panel testing brings unprecedented opportunities to understand germline genetic contributions to hereditary cancers. Most genetic testing companies classify the pathogenicity of variants as pathogenic, benign, or variants of unknown significance (VUSs). The unknown pathogenicity of VUSs poses serious challenges to clinical...
Preprint
Identifying individuals who are at high risk of cancer due to inherited germline mutations is critical for effective implementation of personalized prevention strategies. Most existing models to identify these individuals focus on specific syndromes by including family and personal history for a small number of cancers. Recent evidence from multi-g...
Article
Germline mutations in many genes have been shown to increase the risk of developing cancer. This risk can vary across families who carry mutations in the same gene due to differences in the specific variants, gene-gene interactions, other susceptibility mutations, environmental factors, and behavioral factors. We develop an analytic tool to explore...
Article
Estimating the prevalence of rare germline genetic mutations in the general population is of interest as it can inform genetic counseling and risk management. Most studies that estimate the prevalence of mutations are performed in high‐risk populations, and each study is designed with differing inclusion criteria, resulting in ascertained populatio...
Article
Purpose: Identifying cancers with high PI3K pathway activity is critical for treatment selection and eligibility into clinical trials of PI3K inhibitors. Assessments of tumor signaling pathway activity need to consider intratumoral heterogeneity and multiple regulatory nodes. Methods: We established a novel, mechanistically informed approach to...
Article
The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome these challenges. Many existing methods for batch effe...
Article
Multiple studies have identified transcriptome subtypes of high-grade serous ovarian carcinoma (HGSOC), but their interpretation and translation are complicated by tumor evolution and polyclonality accompanied by extensive accumulation of somatic aberrations, varying cell type admixtures, and different tissues of origin. In this study, we examined...
Preprint
Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventative interventions. Numerous breast cancer risk prediction models have been developed, but they often give predictions with conflicting clinical implications. Integrating information from different models may improve the accuracy of risk predicti...
Preprint
Analyzing multiple studies allows leveraging data from a range of sources and populations, but until recently, there have been limited methodologies to approach the joint unsupervised analysis of multiple high-dimensional studies. A recent method, Bayesian Multi-Study Factor Analysis (BMSFA), identifies latent factors common to all studies, as well...
Preprint
Jointly using data from multiple similar sources for the training of prediction models is increasingly becoming an important task in many fields of science. In this paper, we propose a framework for generalist and specialist predictions that leverages multiple datasets, with potential heterogeneity in the relationships between predictors and outcome...
Article
PURPOSE: Multiple myeloma (MM) is accompanied by heterogeneous somatic alterations. The overall goal of this study was to describe the genomic landscape of myeloma using deep whole-genome sequencing (WGS) and develop a model that identifies patients with long survival. METHODS: We analyzed deep WGS data from 183 newly diagnosed patients with MM trea...
Conference Paper
In 2019 there will be 22,500 new cases of ovarian cancer, resulting in approximately 14,000 deaths. Ovarian cancer has the highest case fatality rate of any gynecologic cancer. Despite significant improvement in the surgical and chemotherapeutic management of ovarian cancer patients, the overall survival has not changed in 30 years. However, t...
Preprint
The COVID-19 mortality rate is higher in the elderly and in those with preexisting chronic medical conditions. The elderly also suffer from increased morbidity and mortality from seasonal influenza infection, and thus annual influenza vaccination is recommended for them. In this study, we explore a possible area-level association between influenza vacc...
Preprint
We investigate the power of censoring techniques, first developed for learning fair representations, to address domain generalization. We examine adversarial censoring techniques for learning invariant representations from multiple "studies" (or domains), where each study is drawn according to a distribution on domains. The mapping is u...
Article
There are numerous statistical models used to identify individuals at high risk of cancer due to inherited mutations. Mendelian models predict future risk of cancer by using family history with estimated cancer penetrances (age‐ and sex‐specific risk of cancer given the genotype of the mutations) and mutation prevalences. However, there is often re...
Article
1520 Background: It is critical for oncologists to be aware of unbiased and interpretable cancer risks (i.e., penetrance) in carriers with germline pathogenic variants in cancer susceptibility genes. However, relevant literature is large and varies significantly in study design, patient ascertainment, and types of risk estimates reported. This hete...
Article
This work extends the Receiver Operating Characteristic (ROC) curve to the situation where some cases, falling in an intermediate “indeterminacy zone” of the predictor, are not classified. It addresses two challenges: definition of sensitivity and specificity bounds for this case; and summarization of the large number of possibilities arising from diff...
Article
Purpose: The classification of germline variants may differ between labs and change over time. We apply a variant harmonization tool, Ask2Me VarHarmonizer, to map variants to ClinVar and identify discordant variant classifications in a large multipractice variant dataset. Methods: A total of 7496 variants sequenced between 1996 and 2019 were collected...
Article
Background: Use of risk-reducing salpingo-oophorectomy (RRSO) substantially reduces the risk of ovarian and breast cancer for women who carry a BRCA1/2 mutation. It is important to adjust for RRSO use in the estimation of BRCA1/2 penetrance of breast and ovarian cancer. Methods: We searched PubMed for penetrance estimates of breast and ovarian c...
Article
Background: Lynch syndrome, the most common colorectal cancer (CRC) syndrome, is caused by germline pathogenic variants in mismatch repair (MMR) genes. Precise estimates of age-specific risks are crucial for sound counseling of individuals managing a genetic predisposition to cancer, but published risk estimates vary. The objective of this work is to provide gene-, sex-...
Article
The positive relationship between airborne fine particulate matter (PM2.5) and cardiovascular disease (CVD) is established. Little is known about effect size heterogeneity across distinct CVD outcomes. We conducted a multi-outcome case-crossover study of Medicare beneficiaries aged > 65 years residing in the mainland USA from 2000 through 2012. Th...
Article
Purpose: While various studies have highlighted the prognostic significance of pathological complete response (pCR) after neoadjuvant chemotherapy (NAT), the impact of additional adjuvant therapy after pCR is not known. Experimental design: PubMed was searched for studies with NAT for breast cancer and individual patient-level data was extracted...
Preprint
The benefit of integrating batches of genomic data to increase statistical power in differential expression is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data. Many existing methods for batch effect...
Article
Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner,...
Article
Background: Recent efforts to improve outcomes for high-grade serous ovarian cancer, a leading cause of cancer death in women, have focused on identifying molecular subtypes and prognostic gene signatures, but existing subtypes have poor cross-study robustness. We tested the contribution of cell admixture in published ovarian cancer molecular subt...
Preprint
Prediction settings with multiple studies have become increasingly common. Ensembling models trained on individual studies has been shown to improve replicability in new studies. Motivated by a groundbreaking new technology in human neuroscience, we introduce two generalizations of multi-study ensemble predictions. First, while existing methods wei...
Preprint
PURPOSE: The popularity of germline genetic panel testing has led to a vast accumulation of variant-level data. Variant names are not always consistent across laboratories and not easily mappable to public variant databases such as ClinVar. A tool that can automate the process of variant harmonization and mapping is needed to help clinicians ensur...
Article
Multiple myeloma (MM) is a malignancy of the plasma cell in which clonal plasma cells infiltrate the bone marrow. Although increasingly effective treatment has significantly improved management of the disease, most MM patients ultimately relapse. Previous studies suggest that miRNA expression patterns may function as biomarkers for diagnosis, subty...
Article
Multiple myeloma (MM) is a plasma cell malignancy with a number of recent therapeutic options that have improved outcomes, with median survival now stretching beyond 8 years. There has been an intense search to identify genomic and laboratory correlates of outcome for high-risk patients. However, a subgroup of patients have a long survival but genomic...
Article
Somatic alterations, including single nucleotide variants (SNVs), copy number alterations (CNAs), and structural variants (SVs), are an important component and hallmark of multiple myeloma (MM), with certain alterations having clinical implications. Recently, the chronological order of such events was explained in detailed studies; however, previous...
Article
Alternative splicing (AS) is a critical post-transcriptional event that affects a number of cellular processes. Aberrant splicing of some genes has been reported in multiple myeloma (MM). However, to date, a whole-transcriptome-wide AS study has not been performed. We used deep RNA-sequencing data from 16 normal plasma cells (NPC) and 360 newly-d...