Sohrab P. Shah’s research while affiliated with Memorial Sloan Kettering Cancer Center and other places
The immune composition of solid tumors is typically inferred from biomarkers, such as histologic and molecular classifications, somatic mutational burden, and PD-L1 expression. However, the extent to which these biomarkers predict the immune landscape in gastric adenocarcinoma—an aggressive cancer often linked to chronic inflammation—remains poorly understood. We leveraged high-dimensional spectral cytometry to generate a comprehensive single-cell immune landscape of tumors, normal tissue, and lymph nodes from patients in the Western Hemisphere with gastric adenocarcinoma. The immune composition of gastric tumors could not be predicted by traditional metrics such as tumor histology, molecular classification, mutational burden, or PD-L1 expression via IHC. Instead, our findings revealed that innate immune surveillance within tumors could be anticipated by the immune profile of the normal gastric mucosa. Additionally, distinct T-cell states in the lymph nodes were linked to the accumulation of activated and memory-like CD8+ tumor-infiltrating lymphocytes (TILs). Unbiased re-classification of patients based on tumor-specific immune infiltrate generated four distinct subtypes with varying immune compositions. Tumors with a T-cell-dominant immune subtype, which spanned TCGA molecular subtypes, were exclusively associated with superior responses to immunotherapy. Parallel analysis of metastatic gastric cancer patients treated with immune checkpoint blockade showed that patients who responded to immunotherapy had a pre-treatment tumor composition that corresponded to a T-cell-dominant immune subtype from our analysis. Taken together, this work identifies key host-specific factors associated with intratumoral immune composition in gastric cancer and offers an immunological classification system that can effectively identify patients likely to benefit from immune-based therapies.
The prevalence and nature of somatic copy number alterations (CNAs) in breast epithelium and their role in tumor initiation and evolution remain poorly understood. Using single-cell DNA sequencing (49,238 cells) of epithelium from BRCA1 and BRCA2 carriers or wild-type individuals, we identified recurrent CNAs (for example, 1q-gain and 7q, 10q, 16q and 22q-loss) that are present in a rare population of cells across almost all samples (n = 28). In BRCA1/BRCA2 carriers, these occur before loss of heterozygosity (LOH) of wild-type alleles. These CNAs, common in malignant tumors, are enriched in luminal cells but absent in basal myoepithelial cells. Allele-specific analysis of prevalent CNAs reveals that they arose by independent mutational events, consistent with convergent evolution. BRCA1/BRCA2 carriers contained a small percentage of cells with extreme aneuploidy, featuring loss of TP53, BRCA1/BRCA2 LOH and multiple breast cancer-associated CNAs. Our findings suggest that CNAs arising in normal luminal breast epithelium are precursors to clonally expanded tumor genomes.
The digitization of health records and growing availability of tumour DNA sequencing provide an opportunity to study the determinants of cancer outcomes with unprecedented richness. Patient data are often stored in unstructured text and siloed datasets. Here we combine natural language processing annotations [1,2] with structured medication, patient-reported demographic, tumour registry and tumour genomic data from 24,950 patients at Memorial Sloan Kettering Cancer Center to generate a clinicogenomic, harmonized oncologic real-world dataset (MSK-CHORD). MSK-CHORD includes data for non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers and enables discovery of clinicogenomic relationships not apparent in smaller datasets. Leveraging MSK-CHORD to train machine learning models to predict overall survival, we find that models including features derived from natural language processing, such as sites of disease, outperform those based on genomic data or stage alone as tested by cross-validation and an external, multi-institution dataset. By annotating 705,241 radiology reports, MSK-CHORD also uncovers predictors of metastasis to specific organ sites, including a relationship between SETD2 mutation and lower metastatic potential in immunotherapy-treated lung adenocarcinoma corroborated in independent datasets. We demonstrate the feasibility of automated annotation from unstructured notes and its utility in predicting patient outcomes. The resulting data are provided as a public resource for real-world oncologic research.
Allogeneic hematopoietic cell transplantation harnesses donor T cell alloreactivity against leukemic blasts and is curative in a subset of patients with AML. Aside from transplant, however, T cell-based immunotherapies have been unsuccessful in AML, and in patients with AML there is evidence for impaired endogenous immune responses. Despite these observations, mechanisms underpinning ineffective anti-leukemic T cell immunity are not fully known.
Here we hypothesized that leukemic blasts drive impaired T cell immunity, leading to distinct T cell compositions during different disease states. Using multi-modal approaches to study T cell phenotype and T cell receptor (TCR) repertoire across AML disease states, we identified dominant clonally expanded terminal effector memory CD45RA+ (TEMRA) CD8 T cells in the marrow of AML patients with active disease, along with abundant immunosuppressive CD4 T regulatory cells (Tregs). CD8 TEMRA clones persisted over time in patients with persistent AML and exhibited numerous interactions with malignant blasts, suggestive of ongoing immune modulation by antigen-producing leukemic cells. A subset of these CD8 effectors (expressing CX3CR1 and other NK-like markers) exerted anti-tumor cytotoxic activity ex vivo but were suppressed by interactions with marrow Tregs. Consistent with this, Treg depletion rescued CD8 effector activity and promoted AML eradication.
To study leukemic blasts and T cell immunity in the AML tumor microenvironment (TME), we performed integrative analysis of protein (CITE-seq and 31-color spectral flow), transcript, and TCRs in individual lymphoid and myeloid cells from longitudinal marrows. We applied this to 179 patient samples from 91 subjects (71 AML, 20 controls), sequencing >670K cells total and >190K T cells. We found that patients harbor a highly abundant CD8 TEMRA population at AML diagnosis that persisted over time in patients who did not achieve remission. These cells express clonally expanded TCRs, a subset of which were marked by CX3CR1, TIGIT (but not PD-1 or TIM3) and attributes of cytotoxicity including granzyme B, perforin, and NK-like markers.
To interrogate the function of expanded effector CD8 T cells, we investigated marrow T cells in an unirradiated syngeneic AML mouse model (C1498 cells into C57B6 mice). As AML accumulated in the marrow, TIGIT+ effector CD8s increased in frequency, as in human AML. A subset of these effectors expressed CX3CR1, which we hypothesized might have tumor-killing capability given their cytotoxic profile in patient data. Indeed, in vitro functional analysis revealed increased killing of endogenous tumor by marrow CD8 effectors, including CX3CR1+ effectors, compared to naïve CD8s harvested 18-20 days post-tumor injection. Importantly, in patients with AML, we harnessed the TCR CDR3 sequence as a barcode to track phenotypes of CD8 clonotypes over time and found that certain CX3CR1+ TEMRAs transition to a CX3CR1- state in ongoing disease, suggesting loss of cytotoxicity.
Analyses of cell-cell interactions from CITE-seq suggested altered myeloid-T cell interactions in AML compared to control and remission samples, including increased signaling between myeloid cells, Tregs, and memory CD8s in patients with active AML. AML blasts also had increased expression of T cell inhibitory molecules, including TIGIT ligands, CD244, and VISTA, compared to healthy myeloid cells. Notably, in addition to expanded CD8 TEMRAs, we found increased Tregs in marrow from patients with AML and C1498-engrafted mice. Tregs in both human and mouse AML expressed high levels of TIGIT, CD39, ICOS, and CCR4 but were not clonally expanded. Ex vivo, AML marrow Tregs suppressed CD8 effector function. Importantly, depletion of Tregs in vivo through transgenic FoxP3 diphtheria toxin receptor mice prolonged host survival, promoted tumor clearance, and led to an increase in marrow CX3CR1+ effector CD8 T cells.
These data demonstrate the active immunologic landscape of the AML bone marrow in both patient samples and mouse models of the disease. We find that although CD8 T cells have the potential for anti-leukemic immunity, their efficacy is impaired by leukemic blasts and suppressive Tregs. These studies suggest Treg-targeting interventions as a therapeutic avenue to overcome the immunosuppressive TME in AML and nominate a host of potentially targetable T cell and blast cell surface proteins that restrain T cell anti-tumor immunity in AML.
Dysregulated DNA replication is a cause and a consequence of aneuploidy in cancer, yet the interplay between copy number alterations (CNAs), replication timing (RT) and cell cycle dynamics remains understudied in aneuploid tumors. We developed a probabilistic method, PERT, for simultaneous inference of cell-specific replication and copy number states from single-cell whole genome sequencing (scWGS) data. We used PERT to investigate clone-specific RT and proliferation dynamics in >50,000 cells obtained from aneuploid and clonally heterogeneous cell lines, xenografts and primary cancers. We observed bidirectional relationships between RT and CNAs, with CNAs affecting X-inactivation producing the largest RT shifts. Additionally, we found that clone-specific S-phase enrichment positively correlated with ground-truth proliferation rates in genomically stable but not unstable cells. Together, these results demonstrate robust computational identification of S-phase cells from scWGS data, and highlight the importance of RT and cell cycle properties in studying the genomic evolution of aneuploid tumors.
Drug resistance is the major cause of therapeutic failure in high-grade serous ovarian cancer (HGSOC). Yet, the mechanisms by which tumors evolve to drug-resistant states remain largely unknown. To address this, we aimed to exploit clone-specific genomic structural variations by combining scaled single-cell whole genome sequencing with longitudinally collected cell-free DNA (cfDNA), enabling clonal tracking before, during and after treatment. We developed a cfDNA hybrid capture, deep sequencing approach based on leveraging clone-specific structural variants as endogenous barcodes, with orders of magnitude lower error rates than single nucleotide variants in ctDNA (circulating tumor DNA) detection, demonstrated on 19 patients at baseline. We then applied this to monitor and model clonal evolution over several years in ten HGSOC patients treated with systemic therapy from diagnosis through recurrence. We found drug resistance to be polyclonal in most cases, but frequently dominated by a single high-fitness and expanding clone, reducing clonal diversity in the relapsed disease state in most patients. Drug-resistant clones frequently displayed notable genomic features, including high-level amplifications of oncogenes such as CCNE1, RAB25, NOTCH3, and ERBB2. Using a population genetics Wright-Fisher model, we found evolutionary trajectories of these features were consistent with drug-induced positive selection. In select cases, these alterations impacted selection of secondary lines of therapy with positive patient outcomes. For cases with matched single-cell RNA sequencing data, pre-existing and genomically encoded phenotypic states such as upregulation of EMT and VEGF were linked to drug resistance. Together, our findings indicate that drug-resistant states in HGSOC pre-exist at diagnosis and lead to dramatic clonal expansions that alter clonal composition at the time of relapse.
We suggest that combining tumor single cell sequencing with cfDNA enables clonal tracking in patients and harbors potential for evolution-informed adaptive treatment decisions.
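As an illustration of the kind of population genetics reasoning the abstract above refers to, the following is a minimal sketch of a two-clone Wright-Fisher model with selection. It is not the authors' model: the function names, the selection coefficient `s`, the population size, and the haploid two-clone setup are all assumptions chosen for illustration. A clone at frequency `p` with relative fitness `1 + s` is first reweighted by selection, then the next generation is drawn by binomial resampling to represent drift.

```python
import random


def expected_next_freq(p, s):
    """Frequency of the fitter clone after selection alone (no drift).

    p: current clone frequency; s: assumed selection coefficient.
    """
    return p * (1 + s) / (1 + p * s)


def simulate_wright_fisher(p0, s, pop_size, generations, seed=0):
    """Simulate clone frequency over time under selection plus drift.

    All parameters are hypothetical illustration values, not fitted ones.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = expected_next_freq(p, s)
        # Binomial resampling of pop_size cells models genetic drift.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = count / pop_size
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # A clone starting at 10% with a fitness advantage under treatment.
    print(simulate_wright_fisher(p0=0.1, s=0.5, pop_size=500, generations=20))
```

With `s > 0` the clone tends to expand toward fixation, while with `s = 0` the frequency changes only by drift. Comparing observed clone trajectories against the neutral case is one simple way to think about whether a trajectory is consistent with positive selection.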
Cancer-associated venous thromboembolism (VTE) is a major source of oncologic cost, morbidity and mortality. Identifying high-risk patients for prophylactic anticoagulation is challenging and adds to clinician burden. Circulating tumor DNA (ctDNA) sequencing assays (‘liquid biopsies’) are widely implemented, but their utility for VTE prognostication is unknown. Here we analyzed three plasma sequencing cohorts: a pan-cancer discovery cohort of 4,141 patients with non-small cell lung cancer (NSCLC) or breast, pancreatic and other cancers; a prospective validation cohort consisting of 1,426 patients with the same cancer types; and an international generalizability cohort of 463 patients with advanced NSCLC. ctDNA detection was associated with VTE independent of clinical and radiographic features. A machine learning model trained on liquid biopsy data outperformed previous risk scores (discovery, validation and generalizability c-indices 0.74, 0.73 and 0.67, respectively, versus 0.57, 0.61 and 0.54 for the Khorana score). In real-world data, anticoagulation was associated with lower VTE rates if ctDNA was detected (n = 2,522, adjusted hazard ratio (HR) = 0.50, 95% confidence interval (CI): 0.30–0.81); ctDNA⁻ patients (n = 1,619) did not benefit from anticoagulation (adjusted HR = 0.89, 95% CI: 0.40–2.0). These results provide preliminary evidence that liquid biopsies may improve VTE risk stratification in addition to clinical parameters. Interventional, randomized prospective studies are needed to confirm the clinical utility of liquid biopsies for guiding anticoagulation in patients with cancer.
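The c-indices quoted in the abstract above measure how well a risk score ranks patients by time to event. The sketch below shows a simplified Harrell-style concordance index for right-censored data. It is an illustrative assumption, not the study's evaluation code: the function name and the toy inputs are hypothetical, and pairs with tied event times are ignored for brevity.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C for right-censored data.

    times:  follow-up time per patient
    events: 1 if the event (e.g. VTE) was observed, 0 if censored
    risks:  model risk score, higher meaning earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: the highest-risk patient has the earliest event.
    print(concordance_index([1, 2, 3], [1, 1, 1], [3, 2, 1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (for example 0.74 versus 0.57 for the Khorana score in the discovery cohort) are compared.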
Background
The encoding of cell intrinsic drug resistance states in breast cancer reflects the contributions of genomic and non-genomic variations and requires accurate estimation of clonal fitness from co-measurement of transcriptomic and genomic data. Somatic copy number (CN) variation is the dominant mutational mechanism leading to transcriptional variation and notably contributes to platinum chemotherapy resistance cell states. Here, we deploy time series measurements of triple negative breast cancer (TNBC) single-cell transcriptomes, along with co-measured single-cell CN fitness, identifying genomic and transcriptomic mechanisms in drug-associated transcriptional cell states.
Results
We present scRNA-seq data (53,641 filtered cells) from serial passaging TNBC patient-derived xenograft (PDX) experiments spanning 2.5 years, matched with genomic single-cell CN data from the same samples. Our findings reveal distinct clonal responses within TNBC tumors exposed to platinum. Clones with high drug fitness undergo clonal sweeps and show subtle transcriptional reversion, while those with weak fitness exhibit dynamic transcription upon drug withdrawal. Pathway analysis highlights convergence on epithelial-mesenchymal transition and cytokine signaling, associated with resistance. Furthermore, pseudotime analysis demonstrates hysteresis in transcriptional reversion, indicating generation of new intermediate transcriptional states upon platinum exposure.
Conclusions
Within a polyclonal tumor, clones with strong genotype-associated fitness under platinum remained fixed, minimizing transcriptional reversion upon drug withdrawal. Conversely, clones with weaker fitness display non-genomic transcriptional plasticity. This suggests CN-associated and CN-independent transcriptional states could both contribute to platinum resistance. The dominance of genomic or non-genomic mechanisms within polyclonal tumors has implications for drug sensitivity, restoration, and re-treatment strategies.
Whole-genome doubling (WGD) is a critical driver of tumor development and is linked to drug resistance and metastasis in solid malignancies. Here, we demonstrate that WGD is an ongoing mutational process in tumor evolution in cancers with TP53 loss. Using single-cell whole-genome sequencing, we measured and modeled how WGD events are distributed across cellular populations within tumors and associated WGD dynamics with properties of genome diversification and phenotypic consequences of innate immunity. We studied WGD evolution in 65 high-grade serous ovarian cancer (HGSOC) tissue samples from 40 patients, yielding 29,481 tumor cell genomes. We found near-ubiquitous evidence of WGD as an ongoing mutational process promoting cell-cell diversity, high rates of chromosomal missegregation, and consequent micronucleation. Using a novel mutation-based WGD timing method, doubleTime, we delineated specific modes by which WGD can drive tumor evolution: (i) unitary evolutionary origin followed by significant diversification, (ii) independent WGD events on a pre-existing background of copy number diversity, and (iii) evolutionarily late clonal expansions of WGD populations. Additionally, through integrated single-cell RNA sequencing and high-resolution immunofluorescence microscopy, we found that inflammatory signaling and the positive association between chromosomal instability and cGAS-STING pathway activation are restricted to tumors that remain predominantly diploid. This contrasted with predominantly WGD tumors, which exhibited significant quiescent and immunosuppressive phenotypic states. Together, these findings establish WGD as an evolutionarily 'active' mutational process in late stage ovarian cancer and link consequent genomic states with altered innate immune responses and immunosuppressive phenotypes.
Citations (49)
... With traditional genomics analyses, studies of tumor evolution using whole genome sequencing (WGS) have established that chromosomal instability and somatic copy number alterations play pivotal roles in the development and progression of cancer [10][11][12][13]. A well-understood mechanism by which this occurs is through the sequential accumulation of genetic alterations in genes such as tumor suppressors and oncogenes [14][15][16][17][18]. ...
... Although these studies were mainly conducted on hematopoietic malignancies, they support a "context"-dependent role of SETD2 in cancer that needs to be better addressed and expanded. Interestingly, very recent studies have shown association of SETD2 mutations in lung cancer with specific genomic alterations affecting genes such as BRAF or EGFR [112]. SETD2 mutation was further associated with longer immunotherapy response, pointing to SETD2 as a promising biomarker of immunotherapy response [112]. ...
... Previous attempts to quantify the link between the genome and transcriptome [31,32] relied on bulk sequencing, which obscures tumor heterogeneity and the contribution of non-cancerous cells in the microenvironment. While paired genomic and transcriptional profiling at the single-cell level is technically possible [33][34][35][36], such data are difficult to generate at scale and depth, and datasets coupling these methods remain scarce [37,38]. Conversely, scRNA-seq analysis pipelines for clustering single-cell transcriptomics and identifying differentially-expressed genes are also limited in the scope of biological insights as they merely focus on phenotypic plasticity and do not elucidate driving factors such as gene dosage, clonal identities, and lineages. ...
... Specific coagulation factors or genetic testing may be needed for specific causes. With the deepening of research, more and more new biomarkers (such as circulating tumor DNA [104], microsomes [105], and tissue factor-positive particles [106]) have been proposed to assess hypercoagulability, but the clinical application of these markers is still under study. In terms of treatment strategies and clinical management, anticoagulant therapy is the main intervention for hypercoagulability states. ...
... The success of these treatments is partly attributed to a deep understanding of the underlying immunobiology of B-cell lineage lymphomas. [5][6][7][8] Although there is growing interest in applying similar therapeutic strategies to TCLs, progress has been limited. This is due in part to the rarity and the biological complexity of these disorders, but the primary obstacle remains our inadequate understanding of the immune microenvironment. ...
... 29,55,56 In contrast, elevated levels of mtDNA (or its proportion) are closely linked to tumours. 57 Moreover, in mtDNA depletion syndromes, rare defects in nuclear genes that regulate mtDNA lead to mtDNA-CN deficiencies, resulting in brain developmental disorders. 58 Beyond these rare monogenic syndromes, the role of common genetic variations in regulating mtDNA-CN remains a vibrant area of research. ...
... Recent studies in proteogenomics have revealed that certain patterns of protein expression and modifications are linked to patient survival rates and outcomes in cases of High-Grade Serous Ovarian Carcinoma (HGSOC) 10,12,13 . Additionally, Chowdhury et al. 9 described a signature of 64 proteins that can predict with high specificity which patients might develop resistance to initial platinum therapy. ...
... Importantly, multiple instance learning allows ABMIL models to learn from specimen-level labels, not requiring exhaustive pixel-level annotations, which are time-consuming and costly to obtain 15 . This feature makes ABMIL models particularly well-suited for tasks such as cancer detection 16,17 , diagnosis [18][19][20][21] , identification of primary cancer origin 22 , grading 17,23,24 , genomic aberration detection [25][26][27][28] , molecular phenotyping [29][30][31] , treatment response prediction [32][33][34] , and prognostication 33, 35-37 . However, the widespread adoption of ABMIL models in clinical settings is hindered by challenges in model interpretability and trustworthiness 9,10,38,39 . ...
... Moreover, the two distinct TME subtypes were significantly predictive of OS and PFS in the patients treated with first-line immune checkpoint inhibitors, going beyond TIL estimates and PD-L1 scores. While numerous deep learning studies have emerged for predicting ICI responses in NSCLC from H&E images, they are primarily focused on refining PD-L1 quantification [56][57][58] . In contrast to previous studies, our approach aims to offer a more comprehensive overview of the tumor microenvironment by predicting the TME cell type and molecular composition. ...