Eric S. Lander’s research while affiliated with Broad Institute of MIT and Harvard and other places
The time required to conduct clinical trials limits the rate at which we can evaluate and deliver new treatment options to patients with cancer. New approaches to increase trial efficiency while maintaining rigor would benefit patients, especially in oncology, in which adjuvant trials hold promise for intercepting metastatic disease, but typically require large numbers of patients and many years to complete. We envision a standing platform: an infrastructure to support ongoing identification and trial enrolment of patients with cancer with early molecular evidence of disease (MED) after curative-intent therapy for early-stage cancer, based on the presence of circulating tumour DNA. MED strongly predicts subsequent recurrence, with the vast majority of patients showing radiographic evidence of disease within 18 months. Such a platform would allow efficient testing of many treatments, from small exploratory studies to larger pivotal trials. Trials enrolling patients with MED but without radiographic evidence of disease have the potential to advance drug evaluation because they can be smaller (given high probability of recurrence) and faster (given short time to recurrence) than conventional adjuvant trials. Circulating tumour DNA may also provide a valuable early biomarker of treatment effect, which would allow small signal-finding trials. In this Perspective, we discuss how such a platform could be established.
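The claim that MED-selected trials can be smaller follows from how event-driven trials are sized: statistical power depends on the number of recurrences observed, so required enrollment scales roughly with 1 / (probability of recurrence). A minimal sketch using Schoenfeld's approximation for a two-arm, 1:1 randomized log-rank comparison; the function names, hazard ratio and recurrence probabilities are illustrative assumptions, not figures from this Perspective.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.8):
    """Schoenfeld approximation: recurrences needed for a two-sided log-rank test, 1:1 allocation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided significance level
    z_beta = z(power)
    # 1:1 randomization gives an allocation term p * (1 - p) = 0.25, hence the factor of 4
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

def required_patients(hazard_ratio, recurrence_probability, alpha=0.05, power=0.8):
    """Enrollment needed if each patient recurs during follow-up with the given probability."""
    return math.ceil(required_events(hazard_ratio, alpha, power) / recurrence_probability)
```

For a hypothetical hazard ratio of 0.7, about 247 recurrences are needed. If 90% of enrolled patients recur during follow-up, that is about 275 patients; if only 25% recur, it is about 990.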
We present a method for detecting evidence of natural selection in ancient DNA time-series data that leverages an opportunity not exploited by previous scans: testing for a consistent trend in allele frequency change over time. Applying it to 8,433 West Eurasians who lived over the past 14,000 years and 6,510 contemporary people, we find an order of magnitude more genome-wide significant signals than previous studies: 347 independent loci with >99% probability of selection. Previous work showed that classic hard sweeps driving advantageous mutations to fixation have been rare over the broad span of human evolution, but in the last ten millennia, many hundreds of alleles have been affected by strong directional selection. Discoveries include an increase from ∼0% to ∼20% in 4,000 years for the major risk factor for celiac disease at HLA-DQB1; a rise from ∼0% to ∼8% in 6,000 years of blood type B; and fluctuating selection at the TYK2 tuberculosis risk allele, which rose from ∼2% to ∼9% between ∼5,500 and ∼3,000 years ago before dropping to ∼3%. We identify instances of coordinated selection on alleles affecting the same trait, with the polygenic score today predictive of body fat percentage decreasing by around a standard deviation over ten millennia, consistent with the “Thrifty Gene” hypothesis that a genetic predisposition to store energy during food scarcity became disadvantageous after farming. We also identify selection for combinations of alleles that are today associated with lighter skin color, lower risk for schizophrenia and bipolar disorder, slower health decline, and increased measures related to cognitive performance (scores on intelligence tests, household income, and years of schooling). These traits are measured in modern industrialized societies, so what phenotypes were adaptive in the past is unclear. We estimate selection coefficients at 9.9 million variants, enabling study of how Darwinian forces couple to allelic effects and shape the genetic architecture of complex traits.
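One simple way to test for a consistent trend in allele frequency over time is a binomial regression of allele counts on sample date with a logit link: under weak additive (genic) selection and no drift, the log-odds of the allele frequency change by roughly s per generation. The sketch below fits that two-parameter model by Newton-Raphson. It is not the authors' method; real analyses must also account for genetic drift, population structure and shifts in ancestry over time, all of which this ignores. The name `fit_logit_trend` is hypothetical.

```python
import math

def fit_logit_trend(times, derived, totals, iters=100, tol=1e-12):
    """Fit logit(p_t) = a + s * t to allele counts by binomial maximum likelihood.

    times:   sample times in generations, increasing toward the present (assumed units)
    derived: derived-allele copies observed at each time
    totals:  allele copies sampled at each time
    Returns (a, s): intercept at t = 0 and slope per generation (approx. selection coefficient).
    """
    t0 = sum(times) / len(times)
    tc = [t - t0 for t in times]  # center time to keep Newton steps well conditioned
    a, s = 0.0, 0.0
    for _ in range(iters):
        g_a = g_s = h_aa = h_as = h_ss = 0.0
        for t, k, n in zip(tc, derived, totals):
            p = 1.0 / (1.0 + math.exp(-(a + s * t)))
            resid = k - n * p          # contribution to the score (gradient)
            w = n * p * (1.0 - p)      # contribution to the Fisher information
            g_a += resid
            g_s += resid * t
            h_aa += w
            h_as += w * t
            h_ss += w * t * t
        det = h_aa * h_ss - h_as ** 2
        # Newton step: solve the 2x2 system [[h_aa, h_as], [h_as, h_ss]] @ delta = g
        da = (h_ss * g_a - h_as * g_s) / det
        ds = (h_aa * g_s - h_as * g_a) / det
        a += da
        s += ds
        if abs(da) < tol and abs(ds) < tol:
            break
    return a - s * t0, s  # convert the centered intercept back to t = 0
```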
Most phenotype-associated genetic variants map to non-coding regulatory regions of the human genome. Moreover, variants associated with blood cell phenotypes are enriched in regulatory regions active during hematopoiesis. To systematically explore the nature of these regions, we developed a highly efficient strategy, Perturb-multiome, that makes it possible to simultaneously profile both chromatin accessibility and gene expression in single cells with CRISPR-mediated perturbation of a range of master transcription factors (TFs). This approach allowed us to examine the connection between TFs, accessible regions, and gene expression across the genome throughout hematopoietic differentiation. We discovered that variants within the TF-sensitive accessible chromatin regions, while representing less than 0.3% of the genome, show a ~100-fold enrichment in heritability across certain blood cell phenotypes; this enrichment is strikingly higher than for other accessible chromatin regions. Our approach facilitates large-scale mechanistic understanding of phenotype-associated genetic variants by connecting key cis-regulatory elements and their target genes within gene regulatory networks.
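Heritability enrichment, as used in partitioned-heritability methods such as stratified LD score regression, is the share of SNP-heritability attributed to an annotation divided by the share of SNPs (or genome) that the annotation covers. A minimal sketch; the function name and inputs are hypothetical.

```python
def heritability_enrichment(h2_fraction, annotation_fraction):
    """Fold enrichment of SNP-heritability in an annotation.

    h2_fraction:         share of SNP-heritability attributed to the annotation (0-1)
    annotation_fraction: share of SNPs (or genome) the annotation covers (0-1)
    """
    if not 0.0 < annotation_fraction <= 1.0:
        raise ValueError("annotation_fraction must be in (0, 1]")
    return h2_fraction / annotation_fraction
```

For example, an annotation covering 0.3% of the genome that accounted for 30% of a trait's SNP-heritability would be about 100-fold enriched. These numbers are illustrative and are not taken from the study.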
Enhancers are key drivers of gene regulation thought to act via 3D physical interactions with the promoters of their target genes. However, genome-wide depletions of architectural proteins such as cohesin result in only limited changes in gene expression, despite a loss of contact domains and loops. Consequently, the role of cohesin and 3D contacts in enhancer function remains debated. Here, we developed CRISPRi of regulatory elements upon degron operation (CRUDO), a novel approach to measure how changes in contact frequency impact enhancer effects on target genes by perturbing enhancers with CRISPRi and measuring gene expression in the presence or absence of cohesin. We systematically perturbed all 1,039 candidate enhancers near five cohesin-dependent genes and identified 34 enhancer-gene regulatory interactions. Of 26 regulatory interactions with sufficient statistical power to evaluate cohesin dependence, 18 show cohesin-dependent effects. A decrease in enhancer-promoter contact frequency upon removal of cohesin is frequently accompanied by a decrease in the regulatory effect of the enhancer on gene expression, consistent with a contact-based model for enhancer function. However, changes in contact frequency and regulatory effects on gene expression vary as a function of distance, with distal enhancers (e.g., >50 kb) experiencing much larger changes than proximal ones (e.g., <50 kb). Because most enhancers are located close to their target genes, these observations can explain how only a small subset of genes, those with strong distal enhancers, are sensitive to cohesin. Together, our results illuminate how 3D contacts, influenced by both cohesin and genomic distance, tune enhancer effects on gene expression.
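One published contact-based model is the Activity-by-Contact (ABC) score, in which an enhancer's predicted share of a gene's regulation is its activity times its contact frequency with the promoter, normalized over all candidate enhancers of that gene. The sketch below is an ABC-style illustration, not the CRUDO analysis. It approximates contact with a power law in genomic distance; the exponent `gamma`, the distance floor and the function name are illustrative assumptions rather than fitted values.

```python
def abc_scores(activities, distances_bp, gamma=1.0, min_distance_bp=5_000):
    """ABC-style share of one gene's regulation attributed to each candidate enhancer.

    activities:   enhancer activity estimates (arbitrary units)
    distances_bp: enhancer-to-promoter distances in base pairs
    Contact is approximated as distance ** -gamma, with distances floored at
    min_distance_bp so very close elements do not get unbounded contact.
    """
    contacts = [max(d, min_distance_bp) ** -gamma for d in distances_bp]
    products = [a * c for a, c in zip(activities, contacts)]
    total = sum(products)
    return [p / total for p in products]
```

Under this kind of model, anything that lowers a distal enhancer's contact frequency (as cohesin removal is reported to do for distal elements) lowers that enhancer's predicted share of the gene's regulation.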
Results from genome-wide association studies (GWAS) enable inferences about the balance of evolutionary forces maintaining genetic variation underlying common diseases and other genetically complex traits. Natural selection is a major force shaping variation, and understanding it is necessary to explain the genetic architecture and prevalence of heritable diseases. Here, we analyze data for 27 traits, including anthropometric traits, metabolic traits, and binary diseases (both early-onset and post-reproductive). We develop an inference framework to test existing population genetics models based on the joint distribution of allelic effect sizes and frequencies of trait-associated variants. A majority of traits have GWAS results that are inconsistent with neutral evolution or long-term directional selection (selection against a trait or against disease risk). Instead, we find that most traits show consistency with stabilizing selection, which acts to preserve an intermediate trait value or disease risk. Our observations also suggest that selection may reflect pleiotropy, with selection on each variant arising from its associations with multiple selected traits.
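Stabilizing selection is often modeled with a Gaussian fitness function centered on an optimal trait value. The sketch below (function names and the width parameter `vs` are illustrative assumptions, not the paper's inference framework) shows two properties that distinguish it from directional selection: the fitness cost is the same whichever direction an allele moves the trait, and for small effects the cost grows roughly with the square of the effect size.

```python
import math

def gaussian_fitness(trait, optimum=0.0, vs=1.0):
    """Relative fitness under Gaussian stabilizing selection around `optimum`; `vs` sets the width."""
    return math.exp(-((trait - optimum) ** 2) / (2.0 * vs))

def fitness_cost(effect, vs=1.0):
    """Fitness lost by an individual displaced `effect` trait units from the optimum."""
    return 1.0 - gaussian_fitness(effect, vs=vs)
```

With `vs=100`, doubling a displacement from 0.1 to 0.2 trait units multiplies the cost by almost exactly 4, while a displacement of -0.1 costs exactly as much as +0.1.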
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. Most of these variants have individually weak effects and lie in non-coding gene-regulatory elements, where we lack a complete understanding of how single-nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell types to measure the activity of 221,412 statistically fine-mapped trait-associated variants. We show that MPRA can discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
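MPRA activity is commonly summarized as the log ratio of RNA to DNA barcode counts for each tested sequence, and a variant's allelic effect as the difference in that ratio between the alternate and reference alleles. A minimal sketch under those assumptions; the function names and pseudocount are hypothetical, and the replicate-level statistics a real analysis needs to call regulatory variants are omitted.

```python
import math

def log2_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Activity of one tested sequence: log2 of pooled RNA over pooled DNA barcode counts."""
    return math.log2((sum(rna_counts) + pseudocount) / (sum(dna_counts) + pseudocount))

def allelic_skew(ref_rna, ref_dna, alt_rna, alt_dna, pseudocount=1.0):
    """log2 activity of the alternate allele minus that of the reference allele."""
    return (log2_activity(alt_rna, alt_dna, pseudocount)
            - log2_activity(ref_rna, ref_dna, pseudocount))
```

Because the skew is a difference of log ratios, global library-size factors shared by both alleles cancel (apart from the pseudocount).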
Pancreatic ductal adenocarcinoma (PDAC) is a lethal disease in part because tumor cells exist in distinct transcriptional states (e.g., basal/mesenchymal vs. classical/epithelial) with unique phenotypic properties that contribute to tumor growth and treatment resistance. Two major mechanisms have been proposed for treatment evasion: (1) the intrinsic resistance of an existing state to a therapy regimen and (2) the plasticity of therapy-sensitive states, which can adopt more resistant states. The relative contribution of these mechanisms to treatment resistance remains poorly understood. Historically, plasticity in both human patients and mouse models has been measured using one of three principles: (1) observing a redistribution of cell states in tissue across timepoints or conditions; (2) identifying cells that have genomic, epigenetic or proteomic features of more than one state (mixed states); and (3) cloning single cells and observing the cell states adopted by their clonal progeny. While these approaches are observationally consistent with plasticity, they either fail to definitively prove that plasticity exists, are limited in measuring plasticity outside native tissues, or cannot quantify the role of plasticity in treatment resistance.
Among the best-described forms of plasticity in human development and cancer is epithelial-mesenchymal plasticity (EMP), which includes epithelial-to-mesenchymal transition (EMT) and mesenchymal-to-epithelial transition (MET). To better understand and quantify the role of EMP in driving treatment resistance in human PDAC, we developed single-cell multiomic, functional genomic and computational methods and applied them to patient-derived models and clinical biopsies. We first profiled twelve patient-derived PDAC cell lines by single-cell RNA-seq (scSeq) and learned convergent epithelial and mesenchymal gene programs that were consistent with programs observed in patient samples. We next performed lineage tracing experiments in three PDAC cell lines using an expressed lentiviral barcoding system (ClonMapper). By performing scSeq on these barcoded lines at weekly timepoints over four weeks, we proved the presence of EMP by showing that a single cell can produce progeny in both epithelial and mesenchymal states.
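The lineage-tracing argument reduces to a simple count: if cells sharing one barcode are found in both epithelial and mesenchymal states, their common ancestor must have given rise to progeny in more than one state. A minimal sketch, assuming each cell has already been assigned a barcode and a state label; the function name and labels are hypothetical.

```python
from collections import defaultdict

def mixed_state_fraction(cells, min_cells=2):
    """Fraction of barcoded clones whose cells occupy more than one state.

    cells: iterable of (barcode, state) pairs, e.g. ("BC001", "epithelial").
    Clones with fewer than `min_cells` cells are skipped, since a single cell
    cannot show two states.
    """
    states = defaultdict(set)
    sizes = defaultdict(int)
    for barcode, state in cells:
        states[barcode].add(state)
        sizes[barcode] += 1
    eligible = [bc for bc in states if sizes[bc] >= min_cells]
    if not eligible:
        return 0.0
    mixed = sum(1 for bc in eligible if len(states[bc]) > 1)
    return mixed / len(eligible)
```

Barcode collisions (two founder cells receiving the same barcode) and cell doublets would inflate this fraction, so in practice they need to be filtered out first.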
We next developed a generative probabilistic model of our lineage tracing data. This demonstrated that clones (cells sharing a barcode) had different transition matrices (different EMT and MET rates), suggesting that each clone has a distinct level of plasticity. Having established this, we focused on identifying genes that might explain the differing plasticity properties of clones. Using elastic net regression, we identified 50 transcription factors (TFs) whose expression significantly explained the propensity for EMP over time across clonal populations. Among these were several known EMP TFs (Zeb1 and Gata6), understudied TFs (Elf3, Sox2, Sox4, Klf3, Klf5 and Atf4) and novel TFs (Meis2, Meis3, FoxA1 and the interferon regulatory factors Irf6, Irf7 and Irf9). Using single-cell multiomics (paired scSeq and single-cell ATAC-seq) on our barcoded population, we found that 9 of the 50 predicted TFs, including Elf3, had differential accessibility between clones with different plasticity properties, suggesting a role for epigenetic regulation of these TFs in facilitating EMP. Importantly, we leveraged our multiomic data to infer gene regulatory networks influenced by these TFs and found an enrichment of binding motifs for these TFs in enhancer regions of genes in epithelial and mesenchymal programs.
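A per-clone transition matrix can be illustrated with the simplest case, a two-state Markov chain with a per-step EMT probability and a per-step MET probability. This is a simplification of the generative model described above (it ignores differences in proliferation and death between states), and the function names and rates are hypothetical. It shows one consequence of clone-specific rates: two clones can share the same equilibrium composition, MET / (EMT + MET), yet approach it at very different speeds.

```python
def propagate(p_epi, emt, met, steps):
    """Epithelial fraction of a clone after `steps` rounds of a two-state Markov chain.

    emt: per-step probability that an epithelial cell becomes mesenchymal
    met: per-step probability that a mesenchymal cell becomes epithelial
    """
    for _ in range(steps):
        p_epi = p_epi * (1 - emt) + (1 - p_epi) * met
    return p_epi

def stationary_epithelial_fraction(emt, met):
    """Long-run epithelial fraction; reached from any start when 0 < emt + met < 2."""
    return met / (emt + met)
```

Starting from an all-epithelial clone, a fast-switching clone (both rates 0.1) is about 70% epithelial after four steps, while a slow-switching clone (both rates 0.01) is still about 96% epithelial, although both eventually settle at 50%.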
To study the role of these predicted TFs in modulating EMP, we developed a CRISPRi system that enabled gene perturbations alongside lineage tracing. We performed a CRISPRi perturb-seq experiment (CRISPR perturbation with scSeq readouts) targeting the 50 predicted TFs above and 10 control genes, and collected 1.5 million single-cell transcriptomic profiles; we also performed two CRISPR knockout (KO) perturb-seq experiments. We used negative binomial regression to estimate the effect sizes of guide RNAs on all genes. Sixty percent of our guides had significant perturbation effects on their target gene. We subsequently identified Klf3 as an important regulator of PDAC proliferation independent of cell state. Importantly, knockdown (KD) of several factors, such as Grhl2 and FoxA1, biased cells towards mesenchymal states, whereas KD of others, such as Batf2, Snai1, Rel, Zeb1, Nr2f1 and Sox4, led to a bias towards epithelial states. Interestingly, KD of several TFs altered the transition properties of cells by decreasing rates of both EMT and MET across barcodes without biasing the overall clonal distribution towards a single cell state. This suggests a role for these TFs in enabling plasticity and facilitating state transitions.
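For a single binary covariate (guide-containing cells vs. non-targeting controls) and a known dispersion, the maximum-likelihood negative binomial effect estimate reduces to the log ratio of the two group means, with a Wald standard error that depends on the dispersion. The sketch below uses that shortcut. It assumes a known dispersion `theta` and no per-cell size-factor offsets, both simplifications relative to a full regression that estimates dispersion and adjusts for sequencing depth and other covariates; the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean

def nb_log_fold_change(guide_counts, control_counts, theta):
    """Natural-log fold change of a target gene in guide cells vs. controls, with a Wald p-value.

    Assumes a negative binomial model with known dispersion `theta`
    (variance = mu + mu**2 / theta) and no per-cell size-factor offsets.
    """
    mu_g, mu_c = fmean(guide_counts), fmean(control_counts)
    if mu_g <= 0 or mu_c <= 0:
        raise ValueError("both groups need a positive mean count")
    # The MLE of each group mean is the sample mean, so the effect is their log ratio
    lfc = math.log(mu_g / mu_c)
    # Variance of log(mean) per group is (1/mu + 1/theta) / n (inverse Fisher information)
    se = math.sqrt((1 / mu_g + 1 / theta) / len(guide_counts)
                   + (1 / mu_c + 1 / theta) / len(control_counts))
    z = lfc / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lfc, p_value
```

A strong knockdown (target counts averaging 1 in 50 guide cells vs. 10 in 50 control cells, with `theta=10`) gives a natural-log fold change of about -2.3 and a vanishingly small p-value.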
To study the effect of plasticity on treatment resistance, we treated four barcoded cell lines with the first-line chemotherapy combination FOLFIRINOX (5-fluorouracil, oxaliplatin and SN-38, the active metabolite of irinotecan) or with targeted therapy, and performed scSeq, yielding over 600,000 single-cell transcriptomic profiles. After FOLFIRINOX treatment, we found an enrichment of barcodes biased towards mesenchymal states, consistent with selection against epithelial cells, and also of barcodes with the highest inferred state transition rates. With targeted therapies, we found selective depletion of mesenchymal states. We next treated our CRISPR-perturbed cell lines and found significantly more restricted barcode diversity overall in cells containing guide RNAs targeting plasticity factors than in non-targeting controls, suggesting a role for plasticity in facilitating resistance.
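The abstract does not state which diversity measure was used. One common choice for clonal (barcode) diversity is the effective number of barcodes: the exponential of the Shannon entropy of barcode frequencies, also called the Hill number of order 1. It equals the number of barcodes when all are equally abundant and falls toward 1 as a few clones dominate, as would happen if treatment selected a small set of resistant lineages. A minimal sketch; the function name is hypothetical.

```python
import math
from collections import Counter

def effective_barcode_number(barcodes):
    """Exponential of the Shannon entropy of barcode frequencies (Hill number of order 1)."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)
```

For example, 97 cells carrying one barcode plus one cell each from three others gives an effective number of about 1.2, even though four barcodes are present.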
To validate the role of these proposed plasticity factors in human patients, we collected paired biopsy samples from 23 patients with metastatic PDAC enrolled in a phase 2 clinical trial of radiation therapy and dual checkpoint blockade (NCT03104439). We performed scSeq on these samples and used a supervised Bayesian matrix factorization approach (Spectra) to learn epithelial and mesenchymal gene programs within tumor cells. We then classified cells as epithelial, mesenchymal or intermediate using a Gaussian mixture model on gene expression features. We found that intermediate states were enriched for expression of our proposed plasticity factors and, importantly, that high expression of these factors in baseline samples correlated with a redistribution of states in follow-up biopsies.
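Classifying cells into epithelial, intermediate and mesenchymal states with a Gaussian mixture model can be sketched in one dimension with expectation-maximization. The study fit its model on gene expression features; the single per-cell score used here (for example, a mesenchymal program score minus an epithelial program score) and the function name are hypothetical simplifications.

```python
import math

def fit_gmm_1d(x, init_means, iters=200):
    """Expectation-maximization for a one-dimensional Gaussian mixture.

    x:          per-cell scores (hypothetical, e.g. mesenchymal minus epithelial score)
    init_means: one starting mean per state, e.g. [epithelial, intermediate, mesenchymal]
    Returns (weights, means, variances, hard labels).
    """
    n, k = len(x), len(init_means)
    mu = list(init_means)
    grand = sum(x) / n
    var = [sum((v - grand) ** 2 for v in x) / n] * k  # start all components at the overall variance
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each cell (log-sum-exp for stability)
        resp = []
        for v in x:
            logs = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (v - mu[j]) ** 2 / (2 * var[j]) for j in range(k)]
            top = max(logs)
            ex = [math.exp(l - top) for l in logs]
            total = sum(ex)
            resp.append([e / total for e in ex])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = max(nj / n, 1e-12)
            mu[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (v - mu[j]) ** 2 for r, v in zip(resp, x)) / nj, 1e-6)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return w, mu, var, labels
```

Hard labels are taken as the most probable component; cells with split responsibilities are the natural candidates for an intermediate call.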
Our efforts define a robust experimental and quantitative framework for studying tumor cell plasticity in patient-derived model systems, with validation in human patient samples using single-cell and spatial transcriptomics. Collectively, we nominate several regulators that alter the propensity for EMP in PDAC, suggesting a paradigm in which perturbations could be used to homogenize tumor populations towards treatment-sensitive phenotypes for combination therapy.
Citation Format: Arnav Mehta, Lynn Bi, Deepika Yeramosu, Michael Bogaev, Martin Jankowiak, Abigail Collins, Aziz Al'Khafaji, Milan Parikh, Mehrtash Babadi, Kyle Evans, Alex Bloemendal, Russell Kunnes, Marc Schwartz, Glen Munson, Elisa Donnard, Thouis R. Jones, Ben Z. Stanger, Jay Shendure, Jonathan Weissman, David T. Ting, Andrew Aguirre, Nir Hacohen, Dana Pe'er, Eric S. Lander. Dissecting and quantifying pancreatic cancer plasticity using single-cell multiomics, lineage tracing and functional genomics reveals novel mediators of therapy resistance [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr NG08.
Epithelial-mesenchymal transition (EMT) is a hallmark of pancreatic ductal adenocarcinoma (PDAC) invasion. More generally, cellular plasticity is a major contributor to both tumor progression and therapy resistance. The major goals of our study are to 1) directly and quantitatively measure plasticity in PDAC cell lines, and 2) identify candidate genes that could alter the plasticity of cancer cells. By understanding the regulation of tumor cell plasticity, we may find ways to perturb cells and prevent their transition towards treatment-resistant cell states. A strict definition of plasticity in cancer has yet to be established, however, and one is essential for understanding cell state regulation and for identifying vulnerabilities in plastic cells. We argue that to prove plasticity occurs within a population of cells, one must either 1) demonstrate that a new cell state has emerged that was not previously present, or 2) trace the lineage of a cell to show that it can adopt multiple cell states. In our study, we profile 12 patient-derived PDAC cell lines to find convergent EMT programs and perform lineage tracing on these cell lines using a modified CROP-seq vector, called ClonMapper, to quantitatively measure plasticity. Towards this goal, we perform single-cell RNA-seq (scRNA) and single-cell multiomic measurements of 1000 uniquely barcoded cells expanded over a time course of 2, 3 and 4 weeks. We refer to cells that share the same lineage barcode (and therefore originate from the same initial cell) as families. We found that ~50% of families contain both epithelial and mesenchymal cells, which proves that plasticity exists, because it shows that a parental cell of unknown state can give rise to progeny in multiple states. We then develop mathematical models to show that families have different levels of plasticity, defined by the rates of transition between distinct cell states.
We next leverage our barcode-resolution time course data to identify factors at early time points that may influence plasticity at later time points. We show there is differential accessibility of several of these factors across families with different plasticity properties, thus proposing a role for epigenetic regulation of EMT. Importantly, we find that specific factors drive gene regulatory networks important for epithelial and mesenchymal programs, and therefore propose candidates for the modulation of EMT for therapeutic benefit.
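The companion abstract above reports using elastic net regression to relate TF expression to the propensity for EMP across clones. The sketch below is a generic coordinate-descent elastic net, not the authors' code; its objective follows the common parameterization with `alpha` and `l1_ratio`, (1/2n)·||y − Xb||² + alpha·(l1_ratio·||b||₁ + (1 − l1_ratio)/2·||b||²), and it fits no intercept, so features and response are assumed centered. Rows of `X` could hold per-family TF expression at an early time point and `y` a later plasticity measure, but that pairing and the function names are illustrative assumptions.

```python
def soft_threshold(z, gamma):
    """Lasso shrinkage operator: move z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, iters=500):
    """Cyclic coordinate descent for the elastic net (no intercept; centered data assumed).

    X: list of rows (samples) of feature values; y: list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            # Correlation of feature j with the partial residual (its own contribution added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (col_ms[j] + alpha * (1.0 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

The L1 part of the penalty sets weak predictors exactly to zero, which is what makes the method usable for nominating a short list of candidate regulators; the L2 part tends to keep groups of correlated predictors together rather than picking one arbitrarily.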
Citation Format: Deepika Yeramosu, Lynn Bi, Abigail Collins, Mike Bogaev, Martin Jankowiak, Aziz Al’Khafaji, Milan Parikh, Mehrtash Babadi, Alex Bloemendal, Surya Nagaraja, Ray Jones, Jay Shendure, David T. Ting, Andrew Aguirre, Nir Hacohen, Dana Pe’er, Eric S. Lander, Arnav Mehta. The transcriptional and epigenetic regulation of epithelial-mesenchymal plasticity in patient-derived pancreatic cancer cell lines [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6940.
Infection with Lassa virus (LASV) can cause Lassa fever, a haemorrhagic illness with an estimated fatality rate of 29.7%, but causes no or mild symptoms in many individuals. Here, to investigate whether human genetic variation underlies the heterogeneity of LASV infection, we carried out genome-wide association studies (GWAS) as well as seroprevalence surveys, human leukocyte antigen typing and high-throughput variant functional characterization assays. We analysed Lassa fever susceptibility and fatal outcomes in 533 cases of Lassa fever and 1,986 population controls recruited over a 7 year period in Nigeria and Sierra Leone. We detected genome-wide significant variant associations with Lassa fever fatal outcomes near GRM7 and LIF in the Nigerian cohort. We also show that a haplotype bearing signatures of positive selection and overlapping LARGE1, a required LASV entry factor, is associated with decreased risk of Lassa fever in the Nigerian cohort but not in the Sierra Leone cohort. Overall, we identified variants and genes that may impact the risk of severe Lassa fever, demonstrating how GWAS can provide insight into viral pathogenesis.
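At its simplest, a single-variant case-control association test compares allele counts between cases and controls in a 2 × 2 table. A minimal sketch of the allelic chi-square test (1 degree of freedom); GWAS typically use regression models that adjust for covariates such as ancestry principal components, and p < 5e-8 is the conventional genome-wide significance threshold. The function name and counts are hypothetical.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square on the 2x2 table of allele counts (cases vs. controls)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

An alternate-allele frequency of 30% in 100 case alleles vs. 70% in 100 control alleles gives chi-square = 32 and p ≈ 1.5e-8.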
... TAD boundaries are evolutionarily conserved and associated with complex genetic traits. This feature was described by Sandoval-Velasco et al. [53], who studied a 52,000-year-old female woolly mammoth preserved in Siberian permafrost, whose cells retained a degree of TAD organization. ...
... The number of variants identified is now so large that a complete picture of the genetic architecture of many complex traits is starting to emerge. Recent publications illustrate all known variants for a trait on a spectrum of allele frequency versus effect size, producing a characteristic "trumpet" shape [9,10] that reflects the influence of natural selection in shaping the genetic architecture of human traits [4,11] . Stabilising selection, a key driver of genetic variation [4,12] , reduces fitness in individuals whose trait values deviate from an optimal mean, which consequently limits the frequency of large-effect alleles. ...
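The "trumpet" shape can be understood from the variance a variant contributes to a trait: for an additive biallelic variant with allele frequency p and per-allele effect beta (in trait standard deviations), it is 2p(1 − p)·beta², and GWAS power depends roughly on sample size times this quantity. Holding a detection threshold fixed, rarer variants must therefore have larger effects to be discovered. A minimal sketch; the function names and the threshold are hypothetical stand-ins for a full power calculation.

```python
import math

def variance_explained(beta, freq):
    """Trait variance from an additive biallelic variant: 2 p (1 - p) beta^2."""
    return 2.0 * freq * (1.0 - freq) * beta ** 2

def min_detectable_beta(freq, variance_threshold):
    """Smallest |beta| whose variance explained reaches a detection threshold at frequency p."""
    return math.sqrt(variance_threshold / (2.0 * freq * (1.0 - freq)))
```

At any fixed threshold, the minimum detectable effect at p = 0.01 is about 5 times larger than at p = 0.5, which traces out the widening mouth of the trumpet at low frequency.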
... Genome-wide association studies (GWAS) have significantly advanced the identification of disease-associated variants 1-3 , but fine-mapping these associations to pinpoint causal variants remains challenging due to extensive linkage disequilibrium between variants [4][5][6][7][8] . Functional genomics data encompassing various biochemical assays 9-12 , functional characterization experiments [13][14][15][16] , and sequence-based computational models [17][18][19][20][21][22] provide a wealth of information complementary to GWAS regarding variant function. Developing strategies to combine this functional information with GWAS to enhance the identification of disease-causal variants is of utmost importance, mirroring integrative approaches that prioritize disease genes 23 and variants for shared common and Mendelian disease risk 24 . ...
... Linking genes to gene programs: To uncover sets of co-expressed genes that may work together in similar biological pathways, we learned gene programs ("programs") by applying consensus non-negative matrix factorization (cNMF) 60 to the transcriptome data and annotating TFs with correlated motifs in the ATAC data (Tables S4-6; see Methods). cNMF identifies sets of genes that are co-expressed across single cells, and has been shown to learn diverse types of biologically coherent pathways, including metabolic pathways, developmental trajectories, and cell identity programs 60,61 . We applied cNMF to groups of major cell types, and identified 253 total gene programs representing a wide array of biological processes (Fig. 1L,M). ...
... During the 2015 Lassa fever outbreak in Nigeria, the group sequenced Lassa virus (LASV) to investigate the outbreak's origins 19 . Happi and colleagues developed CRISPR-based rapid diagnostic tests (RDTs) for Ebola virus and Lassa virus 20 and conducted a genome-wide association study (GWAS) to identify host genetic signatures associated with fatal Lassa fever outcomes 21 . The group conducted clinical metagenomic surveillance to characterize the causes of febrile illness locally 22 , which led to the discovery of novel rhabdoviruses circulating in West Africa 23 and the identification of a novel clade of yellow fever virus during a 2018 outbreak in Nigeria 24 . ...
... This framework predicts base pair-resolution chromatin accessibility data (pseudobulked per cell type) from DNA sequence, identifies TF motif instances that drive accessibility, and predicts the effects of noncoding variants on chromatin accessibility. We have shown that this approach performs well at identifying TF binding sites 58 and predicting effects of CRISPR edits on gene expression 59 . Per cell type, we identified 19-39 important TF motifs and 340,000-760,000 high-confidence TF motif instances (Table S15, Table S16), with 516 consensus motifs combined across cell types (Fig. 1I). ...
... diabetes is a leading cause of morbidity and mortality worldwide and affects more than 415 million people (1). Type 2 diabetes mellitus (T2DM) is characterized by impaired insulin secretion and insulin resistance in adipose, liver and muscle tissues (2). ...
... Transcriptome-wide association study (TWAS) analysis was used to evaluate associations between gene expression levels and traits, revealing potential functional effects 52 . Using the TWAS method, we combined GWAS summary statistics with eQTL data to analyze the relationship between gene expression levels and traits. ...
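In the FUSION formulation of TWAS, a gene's association statistic combines eQTL-derived SNP weights w with GWAS z-scores z, accounting for linkage disequilibrium Σ among the SNPs: z_TWAS = w'z / sqrt(w'Σw). The cited work may have used a different TWAS implementation; this is a minimal sketch of that statistic with a hypothetical function name and plain Python lists.

```python
import math

def twas_z(weights, gwas_z, ld):
    """TWAS z-score for one gene: w'z / sqrt(w' LD w).

    weights: eQTL-derived SNP weights predicting the gene's expression
    gwas_z:  GWAS z-scores at the same SNPs, in the same order
    ld:      SNP-by-SNP LD (correlation) matrix as a list of lists
    """
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    m = len(weights)
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)
```

With two SNPs in perfect LD, the denominator grows and the statistic is not double-counted, which is the point of including Σ.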
... Since sample Yak52.3K had a non-finite radiocarbon age, 24 its age was estimated following a Bayesian molecular dating method based on the full mitochondrial genome. 74 Together with eight previously published mammoth genomes, 21,23,24 this dataset spans the last 50 thousand years of the Siberian mammoth's existence, including its isolation on Wrangel Island, and contains one of the youngest mammoth samples known to date (Wra4.3K) (Table S1). ...