Han Xu

Dana-Farber Cancer Institute, Boston, Massachusetts, United States

Publications (10) · 152.94 Total impact

  • ABSTRACT: Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
    The American Journal of Human Genetics 07/2015; 97(1):139-152. DOI:10.1016/j.ajhg.2015.05.016 · 10.99 Impact Factor
  • ABSTRACT: The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens employing CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features, including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent datasets, the model achieved significant results in both positive and negative selection conditions, and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies. Published by Cold Spring Harbor Laboratory Press.
    Genome Research 06/2015; DOI:10.1101/gr.191452.115 · 13.85 Impact Factor
  • ABSTRACT: We propose the Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK demonstrates better performance compared with existing methods, identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. Using public datasets, MAGeCK identified novel essential genes and pathways, including EGFR in vemurafenib-treated A375 cells harboring a BRAF mutation. MAGeCK also detected cell type-specific essential genes, including BCR and ABL1, in KBM7 cells bearing a BCR-ABL fusion, and IGF1R in HL-60 cells, which depend on the insulin signaling pathway for proliferation.
    Genome Biology 12/2014; 15(12):554. DOI:10.1186/s13059-014-0554-4 · 10.47 Impact Factor
  • ABSTRACT: Regulatory and coding variants are known to be enriched with associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h_g^2) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h_g^2 from imputed SNPs (5.1× enrichment; p = 3.7 × 10^-17) and 38% (SE = 4%) of h_g^2 from genotyped SNPs (1.6× enrichment; p = 1.0 × 10^-4). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h_g^2 despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
    The American Journal of Human Genetics 11/2014; 95(5):535-52. DOI:10.1016/j.ajhg.2014.10.004 · 10.99 Impact Factor
  • ABSTRACT: We propose a statistical algorithm, MethylPurify, that uses regions with bisulfite reads showing discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. From patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone, and revealed DMRs potentially missed by tumor-to-normal comparison due to tumor heterogeneity.
    Genome Biology 08/2014; 15(8):419. DOI:10.1186/s13059-014-0419-x · 10.47 Impact Factor
  • ABSTRACT: Sequencing of DNase I hypersensitive sites (DNase-seq) is a powerful technique for identifying cis-regulatory elements across the genome. We studied the key experimental parameters to optimize performance of DNase-seq. Sequencing short fragments of 50-100 base pairs (bp) that accumulate in long internucleosome linker regions was more efficient for identifying transcription factor binding sites than sequencing longer fragments. We also assessed the potential of DNase-seq to predict transcription factor occupancy via generation of nucleotide-resolution transcription factor footprints. In modeling the sequence-specific DNase I cutting bias, we found a strong effect that varied over more than two orders of magnitude. This indicates that the nucleotide-resolution cleavage patterns at many transcription factor binding sites are derived from intrinsic DNase I cleavage bias rather than from specific protein-DNA interactions. In contrast, quantitative comparison of DNase I hypersensitivity between states can predict transcription factor occupancy associated with particular biological perturbations.
    Nature Methods 12/2013; 11(1). DOI:10.1038/nmeth.2762 · 25.95 Impact Factor
  • ABSTRACT: If trait-associated variants alter regulatory regions, then they should fall within chromatin marks in relevant cell types. However, it is unclear which of the many marks are most useful in defining cell types associated with disease and fine mapping variants. We hypothesized that informative marks are phenotypically cell type specific; that is, SNPs associated with the same trait likely overlap marks in the same cell type. We examined 15 chromatin marks and found that those highlighting active gene regulation were phenotypically cell type specific. Trimethylation of histone H3 at lysine 4 (H3K4me3) was the most phenotypically cell type specific (P < 1 × 10^-6), driven by colocalization of variants and marks rather than gene proximity (P < 0.001). H3K4me3 peaks overlapped with 37 SNPs for plasma low-density lipoprotein concentration in the liver (P < 7 × 10^-5), 31 SNPs for rheumatoid arthritis within CD4+ regulatory T cells (P = 1 × 10^-4), 67 SNPs for type 2 diabetes in pancreatic islet cells (P = 0.003) and the liver (P = 0.003), and 14 SNPs for neuropsychiatric disease in neuronal tissues (P = 0.007). We show how cell type-specific H3K4me3 peaks can inform the fine mapping of associated SNPs to identify causal variation.
    Nature Genetics 12/2012; 45(2). DOI:10.1038/ng.2504 · 29.65 Impact Factor
  • ABSTRACT: Epigenetic regulators represent a promising new class of therapeutic targets for cancer. Enhancer of zeste homolog 2 (EZH2), a subunit of Polycomb repressive complex 2 (PRC2), silences gene expression via its histone methyltransferase activity. We found that the oncogenic function of EZH2 in cells of castration-resistant prostate cancer is independent of its role as a transcriptional repressor. Instead, it involves the ability of EZH2 to act as a coactivator for critical transcription factors including the androgen receptor. This functional switch is dependent on phosphorylation of EZH2 and requires an intact methyltransferase domain. Hence, targeting the non-PRC2 function of EZH2 may have therapeutic efficacy for treating metastatic, hormone-refractory prostate cancer.
    Science 12/2012; 338(6113):1465-9. DOI:10.1126/science.1227604 · 31.48 Impact Factor
  • ABSTRACT: Histone modifications play important roles in regulating eukaryotic gene expression and have been used to model expression levels. Here, we present a regression model to systematically infer mRNA stability by comparing transcriptome profiles with ChIP-seq of H3K4me3, H3K27me3 and H3K36me3. The results from multiple human and mouse cell lines show that the inferred unstable mRNAs have significantly longer 3' untranslated regions (UTRs) and more microRNA binding sites within the 3' UTR than the inferred stable mRNAs. Regression residuals derived from RNA-seq, but not from GRO-seq, are highly correlated with the half-lives measured by pulse-labeling experiments, supporting the rationale of our inference. Whereas the functions enriched in the inferred stable and unstable mRNAs are consistent with those from pulse-labeling experiments, we found the unstable mRNAs have higher cell-type specificity under functional constraint. We conclude that the systematic use of histone modifications can differentiate non-expressed mRNAs from unstable mRNAs, and distinguish stable mRNAs from highly expressed ones. In summary, we present the first computational model of mRNA stability inference that compares transcriptome and epigenome profiles, and provides an alternative strategy for directing experimental measurements.
    Nucleic Acids Research 04/2012; 40(14):6414-23. DOI:10.1093/nar/gks304 · 9.11 Impact Factor

Publication Stats

218 Citations
152.94 Total Impact Points

Institutions

  • 2013–2015
    • Dana-Farber Cancer Institute
      • Department of Biostatistics and Computational Biology
      Boston, Massachusetts, United States
  • 2012–2014
    • Harvard University
      Cambridge, Massachusetts, United States
    • Broad Institute of MIT and Harvard
      • Program in Medical and Population Genetics
      Cambridge, Massachusetts, United States
    • Tongji University
      Shanghai, Shanghai Shi, China