Broad Institute of MIT and Harvard
  • Cambridge, United States
Recent publications
Serum lipid levels, which are influenced by both genetic and environmental factors, are key determinants of cardiometabolic health. Improving our understanding of their underlying biological mechanisms can have important public health and therapeutic implications. Although psychosocial factors, including depression, anxiety, and perceived social support, are associated with serum lipid levels, it is unknown if they modify the effect of genetic loci that influence lipids. We conducted a genome-wide gene-by-psychosocial factor interaction (G×Psy) study in up to 133,157 individuals to evaluate if G×Psy influences serum lipid levels. We conducted a two-stage meta-analysis of G×Psy using both a one-degree-of-freedom (1df) interaction test and a joint 2df test of the main and interaction effects. In Stage 1, we performed G×Psy analyses on up to 77,413 individuals and promising associations (P < 10⁻⁵) were evaluated in up to 55,744 independent samples in Stage 2. Significant findings (P < 5 × 10⁻⁸) were identified based on meta-analyses of the two stages. There were 10,230 variants from 120 loci significantly associated with serum lipids. We identified novel associations for variants in four loci using the 1df test of interaction, and five additional loci using the 2df joint test that were independent of known lipid loci. Of these 9 loci, 7 could not have been detected without modeling the interaction as there was no evidence of association in a standard GWAS model. The genetic diversity of included samples was key in identifying these novel loci: four of the lead variants displayed very low frequency in European ancestry populations. Functional annotation highlighted promising loci for further experimental follow-up, particularly rs73597733 (MACROD2), rs59808825 (GRAMD1B), and rs11702544 (RRP1B).
Notably, one of the genes in identified loci (RRP1B) was found to be a target of the approved drug Atenolol suggesting potential for drug repurposing. Overall, our findings suggest that taking interaction between genetic variants and psychosocial factors into account and including genetically diverse populations can lead to novel discoveries for serum lipids.
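The 1df interaction test and the 2df joint test mentioned in this abstract can be illustrated with a minimal sketch. It assumes simple Wald tests computed from one variant's estimated main effect, interaction effect, and their covariance; the function name and arguments are hypothetical and are not taken from the study, which may have used different estimators.

```python
import math

def interaction_tests(beta_g, beta_gxe, var_g, var_gxe, cov):
    """Hypothetical helper: Wald p-values for the 1df and 2df tests.

    beta_g / beta_gxe: estimated main and interaction effects (assumed inputs).
    var_g, var_gxe, cov: entries of their 2x2 covariance matrix.
    """
    # 1df test: interaction effect only, W1 ~ chi-square with 1 df.
    w1 = beta_gxe ** 2 / var_gxe
    p_1df = math.erfc(math.sqrt(w1 / 2.0))  # upper tail of chi-square(1)

    # 2df joint test: b' V^{-1} b with b = (beta_g, beta_gxe),
    # W2 ~ chi-square with 2 df.
    det = var_g * var_gxe - cov ** 2
    w2 = (beta_g ** 2 * var_gxe
          - 2.0 * beta_g * beta_gxe * cov
          + beta_gxe ** 2 * var_g) / det
    p_2df = math.exp(-w2 / 2.0)  # upper tail of chi-square(2)
    return p_1df, p_2df

# Illustrative values only, not data from the study.
p_1df, p_2df = interaction_tests(0.0, 1.96, 1.0, 1.0, 0.0)
```

The joint test can flag a variant whose main and interaction effects are each modest, which is one reason the two tests can identify different loci.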
After acute lesions in the central nervous system (CNS), the interaction of microglia, astrocytes, and infiltrating immune cells determines whether the lesions resolve or become chronic. However, this CNS-intrinsic cross-talk is poorly characterized. Analyzing cerebrospinal fluid (CSF) samples of Multiple Sclerosis (MS) patients as well as CNS samples of female mice with experimental autoimmune encephalomyelitis (EAE), the animal model of MS, we identify microglia-derived TGFα as a key factor driving recovery. Through mechanistic in vitro studies, in vivo treatment paradigms, scRNA sequencing, CRISPR-Cas9 genetic perturbation models and MRI in the EAE model, we show that together with other glial and non-glial cells, microglia secrete TGFα in a highly regulated temporospatial manner in EAE. Here, TGFα contributes to recovery by decreasing infiltrating T cells, pro-inflammatory myeloid cells, oligodendrocyte loss, demyelination, axonal damage and neuron loss even at late disease stages. In a therapeutic approach in EAE, blood-brain barrier penetrating intranasal application of TGFα attenuates pro-inflammatory signaling in astrocytes and CNS infiltrating immune cells while promoting neuronal survival and lesion resolution. Together, microglia-derived TGFα is an important mediator of glial-immune crosstalk, highlighting its therapeutic potential in resolving acute CNS inflammation.
Randomized controlled trials (RCTs) remain the gold standard for evaluating medical interventions, yet ethical, practical and financial constraints often necessitate reliance on observational data and trial emulations. This study explores how integrating genetic data can enhance both emulated and traditional trial designs. Using FinnGen (n = 425,483), we emulated four major cardiometabolic RCTs and showed how reduced differences in polygenic scores (PGS) between trial arms track improvement in study design. Simulation studies reveal that PGS alone cannot fully adjust for unmeasured confounding. Instead, Mendelian randomization analyses can be used to detect likely confounders. Finally, trial emulations provide a platform to assess and refine PGS implementation for genetic enrichment strategies. By comparing associations of PGS with trial outcomes in the general population and emulated trial cohorts, we highlight the need to validate prognostic enrichment approaches in trial-relevant populations. These results highlight the growing potential of incorporating genetic information to optimize clinical trial design.
Efficient gene delivery vectors are crucial for respiratory and lung disease therapies. We report that AAV.CPP.16, an engineered adeno-associated virus (AAV) variant derived from AAV9, efficiently transduces airway and lung cells in mice and non-human primates via intranasal administration. AAV.CPP.16 outperforms AAV6 and AAV9, two wild-type AAVs with demonstrated tropism for respiratory tissues, and efficiently targets key respiratory cell types. It supports gene supplementation and editing therapies in two clinically relevant mouse models of respiratory and lung diseases. A single intranasal dose of AAV.CPP.16 expressing a dual-target, vascular endothelial growth factor (VEGF)/transforming growth factor (TGF)-β1-neutralizing protein protected lungs from idiopathic pulmonary fibrosis, while a similar application of AAV.CPP.16 carrying an “all-in-one” CRISPR-Cas13d system inhibited transcription of the SARS-CoV-2-derived RNA-dependent RNA polymerase (Rdrp) gene. Our findings highlight AAV.CPP.16 as a promising vector for respiratory and lung gene therapy.
BACKGROUND Clinical factors discriminate incident atrial fibrillation (AF) risk with moderate accuracy, with only modest improvement after incorporation of polygenic risk scores. Whether emerging large-scale proteomic profiling can augment AF risk estimation is unknown. METHODS In the UK Biobank cohort, we derived and validated a machine learning model to predict incident AF risk using serum proteins (Pro-AF). We compared Pro-AF to a validated clinical risk score (Cohorts for Aging and Genomic Epidemiology-Atrial Fibrillation) and an AF polygenic risk score. Models were evaluated in a multiply resampled test set from nested cross-validation (internal test set), and a sample of UK Biobank participants separate from model development (hold-out test set). Metrics included discrimination of 5-year incident AF using time-dependent area under the receiver operating characteristic curve and net reclassification. RESULTS Trained in 32 631 UK Biobank participants, Pro-AF predicts incident AF using 121 protein levels (out of 2911 protein analytes). When assessed in the internal test set comprising 30 632 individuals (mean age 57±8 years, 54% women, 2045 AF events) and hold-out test set comprising 13 998 individuals (mean age 57±8 years, 54% women, 870 AF events), discrimination of 5-year incident AF was highest using Pro-AF (area under the receiver operating characteristic curve internal: 0.761 [95% CI, 0.745–0.780], hold-out: 0.763 [0.734–0.784]), followed by Cohorts for Aging and Genomic Epidemiology-Atrial Fibrillation (0.719 [0.700–0.737]; 0.702 [0.668–0.730]) and the polygenic risk score (0.686 [0.668–0.702]; 0.682 [0.660–0.710]). AF risk estimates were well-calibrated, and the addition of Pro-AF led to substantial continuous net reclassification improvement over Cohorts for Aging and Genomic Epidemiology-Atrial Fibrillation (eg, internal test set 0.410 [0.330–0.492]). 
A simplified Pro-AF including only the 5 most influential proteins (NT-proBNP, EDA2R [ectodysplasin A2 receptor], NPPB [B-type natriuretic peptide], BCAN [brevican core protein], and GDF15 [growth/differentiation factor 15]), retained favorable discriminative value (area under the receiver operating characteristic curve internal: 0.750 [0.733–0.768]; hold-out: 0.759 [0.732–0.790]). CONCLUSIONS A machine learning-based protein score discriminates 5-year incident AF risk favorably compared with clinical and genetic risk factors. Large-scale proteomic analysis may assist in the prioritization of individuals at risk for AF for screening and related preventive interventions.
Biomolecular condensates organize numerous subcellular processes and have been implicated in diseases, including neurodegeneration and cancer. Protein sequences intrinsically encode their propensity to form condensates, but specific sequence features that regulate this behavior have not been systematically explored at scale. Here, we develop CondenSeq, a high-throughput pooled imaging with in situ sequencing approach to measure propensities of thousands of protein sequences to form nuclear condensates. Leveraging the large scale of these experiments, we evaluated the impacts of dozens of sequence features across a wide range of sequence contexts, identifying several features with highly consistent, context-independent effects and others with less-consistent effects. We also identified multiple classes of condensates and discovered distinct sequence properties that drive their formation. Our results provide a systematic overview of the relationships between protein sequences and nuclear condensate formation and establish a general approach for further dissecting these relationships at scale.
DNA double-strand breaks (DSB) are among the most deleterious forms of DNA damage and, if unresolved, result in DNA mutations and chromosomal aberrations that can cause disease, including cancer. Repair of DSBs by homologous recombination requires extensive nucleolytic digestion of DNA ends in a process known as DNA-end resection. In recent years, progress has been made in understanding how this process is initiated, but the later stages of this process—long-range DNA-end resection—are not well understood. Many questions remain in terms of how the DNA helicases and endonucleases that catalyse this process are regulated, a key step to avoiding spurious activity in the absence of breaks. The importance of DNA-end resection in human disease is highlighted by several human genetic syndromes that are caused by mutations or deficiencies in key proteins involved in this process. Here, using high-throughput microscopy coupled with a cDNA ‘chromORFeome’ library, we identified ZNF280A as an uncharacterized chromatin factor that is recruited to breaks and essential for DNA DSB repair. Lack of ZNF280A drives genomic instability and substantial sensitivity to DNA-damaging agents. Mechanistically, we demonstrate that ZNF280A promotes long-range DNA-end resection by facilitating the recruitment of the BLM–DNA2 helicase–nuclease complex to DNA DSB sites, enhancing efficiency of the enzymatic activity of this complex at DNA damage sites. ZNF280A is therefore essential for DNA-end resection and DNA repair by homologous recombination. Importantly, ZNF280A is hemizygously deleted in a human genetic condition, 22q11.2 distal deletion syndrome. Features of this condition include congenital heart disease, microcephaly, immune deficiency, developmental delay and cognitive deficits—features that are associated with other human syndromes caused by defects in genes involved in DNA repair. 
Remarkably, cells from individuals with a 22q11.2 distal deletion have defects in DNA-end resection and homologous recombination, resulting in increased incidence of genomic instability. These phenotypes are rescued by reintroduction of ZNF280A, providing evidence of defective DNA repair as a potential mechanistic explanation for several clinical features associated with this human condition.
Motivation Recent advances in single-cell multimodal omics technologies enable the exploration of cellular systems at unprecedented resolution, leading to the rapid generation of multimodal datasets that require sophisticated integration methods. Diagonal integration has emerged as a flexible solution for integrating heterogeneous single-cell data without relying on shared cells or features. However, the absence of anchoring elements introduces the risk of artificial integrations, where cells across modalities are incorrectly aligned due to ambiguous mapping. Results To address this challenge, we propose SONATA, a novel diagnostic method designed to detect potential artificial integrations resulting from ambiguous mappings in diagonal data integration. SONATA identifies ambiguous alignments by quantifying cell-cell ambiguity within the data manifold, ensuring that biologically meaningful integrations are distinguished from spurious ones. It is worth noting that SONATA is not designed to replace any existing pipelines for diagonal data integration; instead, SONATA works simply as an add-on to an existing pipeline for achieving more reliable integration. Through a comprehensive evaluation on both simulated and real multimodal single-cell datasets, we observe that artificial integrations in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. We demonstrate SONATA’s ability to safeguard against misleading integrations and provide actionable insights into potential integration failures across mainstream methods. Our approach offers a robust framework for ensuring the reliability and interpretability of multimodal single-cell data integration. Availability and Implementation The source code is available at (https://github.com/batmen-lab/SONATA). Supplementary information Supplementary data are available at Bioinformatics
Preclinical ex vivo models capable of probing patient-specific tumor-immune interactions are particularly attractive candidates for interrogating mechanisms of resistance, developing predictors of response as well as assessing next-generation immunotherapeutics. By maintaining features of a patient's own tumor microenvironment, such patient-derived ex vivo models are poised to meaningfully contribute to the functional assessment of individual tumors to provide a tailored approach to treatment. Among contemporary ex vivo models, patient-derived organotypic tumor spheroids (PDOTS) have emerged as a promising microfluidic-based platform that is well positioned to become a useful tool for precision medicine efforts. The advantages and limitations of PDOTS and related state-of-the-art patient-derived tumor models, as well as ongoing challenges facing the clinical implementation of patient-derived ex vivo tumor models, are reviewed.
Methods that analyze single-cell paired RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) multiome data have shown promise in linking regulatory elements to genes. However, existing methods exhibit low concordance and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on expression quantitative trait locus (eQTL) data to assign a probabilistic score to each candidate single-nucleotide polymorphism–gene link. pgBoost attained higher enrichment than existing methods for evaluation sets derived from eQTL, activity-by-contact, CRISPR and genome-wide association study (GWAS) data. We further determined that restricting pgBoost to features from a focal cell type improved power to identify links relevant to that cell type. We highlight several examples in which pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods. In conclusion, a non-linear combination of linking strategies improves power to identify target genes underlying GWAS associations.
Exercise’s protective effects in Alzheimer’s disease (AD) are well recognized, but cell-specific contributions to this phenomenon remain unclear. Here we used single-nucleus RNA sequencing (snRNA-seq) to dissect the response to exercise (free-wheel running) in the neurogenic stem-cell niche of the hippocampal dentate gyrus in male APP/PS1 transgenic AD model mice. Transcriptomic responses to exercise were distinct between wild-type and AD mice, and most prominent in immature neurons. Exercise restored the transcriptional profiles of a proportion of AD-dysregulated genes in a cell type-specific manner. We identified a neurovascular-associated astrocyte subpopulation, the abundance of which was reduced in AD, whereas its gene expression signature was induced with exercise. Exercise also enhanced the gene expression profile of disease-associated microglia. Oligodendrocyte progenitor cells were the cell type with the highest proportion of dysregulated genes recovered by exercise. Last, we validated our key findings in a human AD snRNA-seq dataset. Together, these data present a comprehensive resource for understanding the molecular mediators of neuroprotection by exercise in AD.
Defining viral proteomes is crucial to understanding viral life cycles and immune recognition, but the landscape of translated regions remains unknown for most viruses. We have developed massively parallel ribosome profiling (MPRP) to determine open reading frames (ORFs) across tens of thousands of designed oligonucleotides. MPRP identified 4208 unannotated ORFs in 679 human-associated viral genomes. We found viral peptides originating from detected noncanonical ORFs presented on class-I human leukocyte antigen in infected cells and hundreds of upstream ORFs that likely modulate translation initiation of viral proteins. The discovery of viral ORFs across a wide range of viral families, including highly pathogenic viruses, expands the repertoire of vaccine targets and reveals potential cis-regulatory sequences.
Avibactam (AVI) is a diazabicyclooctane (DBO) β-lactamase inhibitor used clinically in combination with ceftazidime. At concentrations higher than those typically achieved in vivo, it also has broad-spectrum direct antibacterial activity against Enterobacterales strains, including metallo-β-lactamase-producing isolates, mediated by inhibition of penicillin-binding protein 2 (PBP2). This activity has some mechanistic similarities to that of more potent novel DBOs (zidebactam and nacubactam) in late clinical development. We found that resistance to AVI emerged readily, with a mutation frequency of 2 × 10⁻⁶ to 8 × 10⁻⁵. Whole-genome sequencing of resistant isolates revealed a heterogeneous mutational target that permitted bacterial survival and replication despite PBP2 inhibition, in line with prior studies of PBP2-targeting drugs. While such mutations are believed to act by upregulating the bacterial stringent response, we found a similarly high mutation frequency in bacteria deficient in components of the stringent response, although we observed a different set of mutations in these strains. Although avibactam-resistant strains had increased lag time, suggesting a fitness cost that might render them less problematic in clinical infections, there was no statistically significant difference in growth rates between susceptible and resistant strains. The finding of rapid emergence of resistance to avibactam as the result of a large and complex mutational target adds to our understanding of resistance to PBP2-targeting drugs and has potential implications for novel DBOs with potent direct antibacterial activity, which are being developed with the goal of expanding cell wall-active treatment options for multidrug-resistant gram-negative infections. IMPORTANCE Avibactam (AVI) is the first in a class of novel β-lactamase inhibitor antibiotics called diazabicyclooctanes (DBOs).
In addition to its ability to inhibit bacterial β-lactamase enzymes that can destroy β-lactam antibiotics, we found that AVI had direct antibacterial activity, at concentrations higher than those used clinically, against even highly multidrug-resistant bacteria. This activity is the result of inhibition of the bacterial enzyme penicillin-binding protein 2 (PBP2). Resistance to other drugs that inhibit PBP2 occurs through mutations that involve upregulation of the bacterial “stringent response” to stress. We found that bacteria developed resistance to AVI at a high rate, as a result of mutations in stringent response genes. We also found that bacteria with impairments in the stringent response could still develop resistance to AVI through different mutations. Our findings indicate the importance of studying how resistance will emerge to newer, more potent DBOs in development and early clinical use.
Cyanide is one of the oldest known poisons in human history. In the 1980s, seminal work began to elucidate the broad cellular mechanisms of cyanide toxicity beyond its canonical inhibition of cytochrome c oxidase. In the 1990s, endogenous metabolites were shown to sequester cyanide, and these became promising avenues for the development of a cyanide antidote. However, an FDA‐approved metabolite‐based cyanide antidote did not come to fruition. More recently, in the past 10 years, advances in mass spectrometry‐based metabolomics profiling, subcellular drug targeting, and genome editing have brought fresh perspectives to the concept of a metabolism‐based cyanide antidote. Here, we review the mechanisms of cyanide toxicity with a focus on intermediary metabolism. We discuss the current state of our knowledge and gaps in our understanding of the metabolic mechanisms that contribute to cyanide poisoning, in addition to highlighting recent findings that break new ground in the field. We present the theory of redirecting intermediary metabolism to counteract cyanide poisoning: while cyanide shifts metabolism from oxidative phosphorylation to glycolysis, the metabolome encompasses hundreds of pathways; thus, potential therapeutic opportunities may reside in activating metabolism into other pathways. Potential approaches to targeting metabolism as a therapeutic intervention for cyanide poisoning will also be discussed. These targets represent an opportunity for a significant paradigm shift from current FDA‐approved treatments, which chelate the chemical toxicant but do not reverse the broad spectrum of cellular and metabolic damage caused by cyanide, to a treatment that may improve the long‐term effects of cyanide poisoning.
Clonal hematopoiesis of indeterminate potential (CHIP) is associated with increased mortality and malignancy risk, yet the determinants of clonal expansion remain poorly understood. We performed sequencing at >4,000x depth of coverage for CHIP mutations in 6,976 postmenopausal women from the Women's Health Initiative at two timepoints: the WHI baseline exam and approximately 16 years later at the Long Life Study (LLS) visit. Among 3,685 CH mutations detected at baseline (VAF ≥ 0.5%), 24% were not detected at LLS, 26% were micro-CH at LLS (0.5% ≤ VAF < 2%), and 50% were CHIP (VAF ≥ 2%). We confirmed that clonal expansion is highly dependent on initial clone size and CHIP driver gene, with SF3B1 and JAK2 mutations exhibiting the fastest growth rate. We identified germline variants in TERT, IL6R, TCL1A, and MSI2 that modulate clonal expansion rate. Measured baseline leukocyte telomere length showed differential effects on incident CHIP risk, with shorter baseline leukocyte telomere length predisposing to incident PPM1D mutations and longer baseline leukocyte telomere length favoring incident DNMT3A mutations. We discovered that the IL6R missense variant p.Asp358Ala specifically impairs TET2 clonal expansion, supported by direct measurements of soluble interleukin-6 receptor and interleukin-6. Faster clonal growth rate was associated with increased risk of cytopenia, leukemia, and all-cause mortality. Notably, CHIP clonal expansion rate mediated 34.4% and 43.7% of the Clonal Hematopoiesis Risk Score's predictive value for leukemia and all-cause mortality, respectively. These findings reveal key biological determinants of CHIP progression and suggest that incorporating growth rate measurements could enhance risk stratification.
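The variant allele fraction (VAF) thresholds quoted in this abstract can be written out as a small sketch. The function name and the returned labels are hypothetical; only the 0.5% and 2% cut-offs come from the abstract, and VAF is assumed to be given as a fraction (0.02 = 2%).

```python
def classify_clone(vaf):
    """Hypothetical helper: bin a clone by VAF using the abstract's cut-offs."""
    if vaf < 0.005:   # below the 0.5% detection threshold
        return "not detected"
    if vaf < 0.02:    # 0.5% <= VAF < 2%
        return "micro-CH"
    return "CHIP"     # VAF >= 2%

# Illustrative values only, not data from the study.
labels = [classify_clone(v) for v in (0.001, 0.005, 0.019, 0.02)]
```

Applying the same binning at both timepoints is one way to describe how a clone moved between categories over the follow-up period.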
Developing bioelectronics capable of stably tracking brain-wide, single-cell, millisecond-resolved neural activity in the developing brain is critical for advancing neuroscience and understanding neurodevelopmental disorders. During development, the three-dimensional structure of the vertebrate brain arises from a two-dimensional neural plate [1,2]. These large morphological changes have previously posed a challenge for implantable bioelectronics to reliably track neural activity throughout brain development [3–9]. Here we introduce a tissue-level-soft, submicrometre-thick mesh microelectrode array that integrates into the embryonic neural plate by leveraging the tissue’s natural two-dimensional-to-three-dimensional reconfiguration. As organogenesis progresses, the mesh deforms, stretches and distributes throughout the brain, seamlessly integrating with neural tissue. Immunostaining, gene expression analysis and behavioural testing confirm no adverse effects on brain development or function. This embedded electrode array enables long-term, stable mapping of how single-neuron activity and population dynamics emerge and evolve during brain development. In axolotl models, it not only records neural electrical activity during regeneration but also modulates the process through electrical stimulation.
Background Congenital myopathies are a group of neuromuscular disorders that typically present at birth or early childhood with hypotonia and non-progressive or slowly progressive muscle weakness. They are classically subclassified by characteristic structural changes and histopathological findings in skeletal muscle. Variants in over 40 genes have been described to date in patients with various forms of congenital myopathy with overlapping phenotypic and histological features, which poses a challenge for laboratories and clinicians in interpreting genetic findings. Objective The purpose of this study was to evaluate the evidence supporting each gene-disease relationship and provide an expert-reviewed classification for the clinical validity of genes involved in congenital myopathies. Methods The ClinGen Neurological Disorders Clinical Domain Working Group assembled the Congenital Myopathies Gene Curation Expert Panel (CongenMyopathy-GCEP), a group of clinicians and geneticists with expertise in congenital myopathies tasked to perform evidence-based curation of 50 gene-disease relationships using the ClinGen semiquantitative framework to assign clinical validity. Results Our curation effort resulted in 35 (70%) Definitive, eight (16%) Moderate, six (12%) Limited, and one (2%) Disputed disease relationship classifications. The summary of each curation is made publicly available on the ClinGen website. Conclusions Expert-reviewed assignment of gene-disease relationships by the CongenMyopathy-GCEP facilitates accurate molecular diagnoses for congenital myopathies and can allow genetic testing to focus on genes with a validated role in disease.
990 members
Dirk Gevers
  • Genome Sequencing and Analysis Program
Shamsudheen Karuthedath Vellarikkal
  • Program in Medical and Population Genetics
Aviad Tsherniak
  • Cancer Program
Information
Address
Cambridge, United States
Head of institution
Eric Lander