Hongzhe Li

University of Pennsylvania, Philadelphia, Pennsylvania, United States

Publications (124) · 776.26 Total impact

  • T. Tony Cai · Hongzhe Li · Weidong Liu · Jichun Xie

    Preview · Article · Jan 2016 · Statistica Sinica
  • Jichun Xie · T. Tony Cai · Hongzhe Li

    No preview · Article · Jan 2016 · Statistics and Its Interface
  • ABSTRACT: Abnormal composition of intestinal bacteria, termed "dysbiosis," is characteristic of Crohn's disease. Disease treatments include dietary changes and immunosuppressive anti-TNFα antibodies as well as ancillary antibiotic therapy, but their effects on microbiota composition are undetermined. Using shotgun metagenomic sequencing, we analyzed fecal samples from a prospective cohort of pediatric Crohn's disease patients starting therapy with enteral nutrition or anti-TNFα antibodies and revealed the full complement and dynamics of bacteria, fungi, archaea, and viruses during treatment. Bacterial community membership was associated independently with intestinal inflammation, antibiotic use, and therapy. Antibiotic exposure was associated with increased dysbiosis, whereas dysbiosis decreased with reduced intestinal inflammation. Fungal proportions increased with disease and antibiotic use. Dietary therapy had independent and rapid effects on microbiota composition distinct from other stressor-induced changes and effectively reduced inflammation. These findings reveal that dysbiosis results from independent effects of inflammation, diet, and antibiotics and shed light on Crohn's disease treatments.
    No preview · Article · Oct 2015 · Cell Host & Microbe
  • Li Chen · Han Liu · Jean-Pierre A Kocher · Hongzhe Li · Jun Chen
    ABSTRACT: One central theme of modern high-throughput genomic data analysis is to identify relevant genomic features and to build a predictive model based on the selected features for tasks such as personalized medicine. Correlating the large number of ‘omics’ features with a certain phenotype is particularly challenging due to small sample size (n) and high dimensionality (p). To address this small n, large p problem, various forms of sparse regression models have been proposed by exploiting the sparsity assumption. Among these, the network-constrained sparse regression model is of particular interest due to its ability to utilize the prior graph/network structure in the omics data. Despite its potential usefulness for omics data analysis, no efficient R implementation is publicly available. Here we present an R software package ‘glmgraph’ that implements graph-constrained regularization for both sparse linear regression and sparse logistic regression. We implement both the L1 penalty and the minimax concave penalty for variable selection and the Laplacian penalty for coefficient smoothing. An efficient coordinate descent algorithm is used to solve the optimization problem. We demonstrate the use of the package by applying it to a human microbiome dataset, where phylogeny structure among bacterial taxa is available. Availability and implementation: ‘glmgraph’ is implemented in R and C++ Armadillo and is publicly available on CRAN. Contact: chen.jun2{at}mayo.edu or hongzhe{at}upenn.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    No preview · Article · Aug 2015 · Bioinformatics
  • ABSTRACT: Genome-wide association studies (GWASs) have identified hundreds of susceptibility genes, including shared associations across clinically distinct autoimmune diseases. We performed an inverse χ² meta-analysis across ten pediatric-age-of-onset autoimmune diseases (pAIDs) in a case-control study including more than 6,035 cases and 10,718 shared population-based controls. We identified 27 genome-wide significant loci associated with one or more pAIDs, mapping to in silico-replicated autoimmune-associated genes (including IL2RA) and new candidate loci with established immunoregulatory functions such as ADGRL2, TENM3, ANKRD30A, ADCY7 and CD40LG. The pAID-associated single-nucleotide polymorphisms (SNPs) were functionally enriched for deoxyribonuclease (DNase)-hypersensitivity sites, expression quantitative trait loci (eQTLs), microRNA (miRNA)-binding sites and coding variants. We also identified biologically correlated, pAID-associated candidate gene sets on the basis of immune cell expression profiling and found evidence of genetic sharing. Network and protein-interaction analyses demonstrated converging roles for the signaling pathways of type 1, 2 and 17 helper T cells (TH1, TH2 and TH17), JAK-STAT, interferon and interleukin in multiple autoimmune diseases.
    Full-text · Article · Aug 2015 · Nature Medicine
  • Jessie Jeng · Qian Wu · Hongzhe Li
    ABSTRACT: Copy number variants (CNVs), ranging in size from about one kilobase to several megabases, are DNA alterations of a genome that result in the cell having less or more than two copies of segments of the DNA. Such CNVs have been shown to be associated with many complex phenotypes, ranging from diseases to gene expressions. Novel methods have been developed for identifying CNVs both at the individual and at the population level. However, methods for testing CNV association are limited. Most available methods employ a two-step approach, where CNVs carried by the samples are identified first and then tested for association. However, the results of such tests depend on the threshold used for CNV identification and also the number of CNVs to be tested. We developed a method, CNVtest, to directly identify the trait-associated CNVs without the need of identifying sample-specific CNVs. We show that CNVtest asymptotically controls the type I error rate and identifies true trait-associated CNVs with a high probability. We demonstrate the methods using simulations and an application to identify the CNVs that are associated with population differentiation. © 2015 S. Karger AG, Basel.
    Full-text · Article · Jul 2015 · Human Heredity
  • ABSTRACT: Therapeutic targets in pediatric Crohn's disease include symptoms, quality of life (QOL), and mucosal healing. Although partial enteral nutrition (PEN), exclusive enteral nutrition (EEN), and anti-tumor necrosis factor alpha (anti-TNF) therapy all improve symptoms, the comparative effectiveness of these approaches to improve QOL and achieve mucosal healing has not been assessed prospectively. In a prospective study of children initiating PEN, EEN, or anti-TNF therapy for Crohn's disease, we compared clinical outcomes using the Pediatric Crohn's Disease Activity Index (PCDAI), QOL (IMPACT score), and mucosal healing as estimated by fecal calprotectin (FCP). PCDAI, IMPACT, FCP, and diet (prompted 24-h recall) were measured at baseline and after 8 weeks of therapy. We enrolled 90 children with active Crohn's disease (PCDAI, 33.7 ± 13.7; and FCP, 976 ± 754), of whom 52 were treated with anti-TNF, 22 with EEN, and 16 with PEN plus ad lib diet. Clinical response (PCDAI reduction ≥15 or final PCDAI ≤10) was achieved by 64% on PEN, 88% EEN, and 84% anti-TNF (test for trend P = 0.08). FCP ≤250 μg/g was achieved with PEN in 14%, EEN 45%, and anti-TNF 62% (test for trend P = 0.001). Improvement in overall QOL was not statistically significantly different between the 3 groups (P = 0.86). However, QOL improvement was the greatest with EEN in the body image (P = 0.03) domain and with anti-TNF in the emotional domain (P = 0.04). Although PEN improved clinical symptoms, EEN and anti-TNF were more effective for decreasing mucosal inflammation and improving specific aspects of QOL.
    No preview · Article · May 2015 · Inflammatory Bowel Diseases
  • Hongzhe Li
    ABSTRACT: The human microbiome is the totality of all microbes in and on the human body, and its importance in health and disease has been increasingly recognized. High-throughput sequencing technologies have recently enabled scientists to obtain an unbiased quantification of all microbes constituting the microbiome. Often, a single sample can produce hundreds of millions of short sequencing reads. However, unique characteristics of the data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges. Important issues include normalization and quantification of relative taxa, bacterial genes, and metabolic abundances; incorporation of phylogenetic information into analysis of metagenomics data; and multivariate analysis of high-dimensional compositional data. We review existing methods, point out their limitations, and outline future research directions.
    No preview · Article · Apr 2015
  • Hokeun Sun · Hongzhe Li
    ABSTRACT: Many different biological processes are represented by network graphs such as regulatory networks, metabolic pathways, and protein-protein interaction networks. Since genes that are linked on the networks usually have biologically similar functions, the linked genes form molecular modules to affect the clinical phenotypes/outcomes. Similarly, in large-scale genetic association studies, many SNPs are in high linkage disequilibrium (LD), which can also be summarized as a LD graph. In order to incorporate the graph information into regression analysis with high dimensional genomic data as predictors, we introduce a Bayesian approach for graph-constrained estimation (Bayesian GRACE) and regularization, which controls the amount of regularization for sparsity and smoothness of the regression coefficients. The Bayesian estimation with their posterior distributions can provide credible intervals for the estimates of the regression coefficients along with standard errors. The deviance information criterion (DIC) is applied for model assessment and tuning parameter selection. The performance of the proposed Bayesian approach is evaluated through simulation studies and is compared with Bayesian Lasso and Bayesian Elastic-net procedures. We demonstrate our method in an analysis of data from a case-control genome-wide association study of neuroblastoma using a weighted LD graph.
    No preview · Article · Mar 2015
  • ABSTRACT: Motivation: The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. Results: We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω²). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. Availability and implementation: http://github.com/brendankelly/micropower. Contact: brendank@mail.med.upenn.edu or hongzhe@upenn.edu.
    Preview · Article · Mar 2015 · Bioinformatics
  • ABSTRACT: HSV is a large double-stranded DNA virus, capable of causing a variety of diseases from the common cold sore to devastating encephalitis. Although DNA within the HSV virion does not contain any histone protein, within 1 h of infecting a cell and entering its nucleus the viral genome acquires some histone protein (nucleosomes). During lytic infection, partial micrococcal nuclease (MNase) digestion does not give the classic ladder band pattern seen on digestion of cell DNA or latent viral DNA. However, complete digestion does give a mono-nucleosome band, strongly suggesting that there are some nucleosomes present on the viral genome during lytic infection, but that they are not evenly positioned with a 200 bp repeat pattern, as in cell DNA. Where then are the nucleosomes positioned? Here we perform HSV-1 genome-wide nucleosome mapping at a time when viral replication is in full swing (6 h post-infection), using a microarray consisting of 50mer oligonucleotides covering the whole viral genome (152 kb). Arrays were probed with MNase-protected fragments of DNA from infected cells. Cells were not treated with crosslinking agents, thus we are only mapping tightly bound nucleosomes. The data show that nucleosome deposition is not random. The distribution of signal on the arrays suggests that nucleosomes are located at preferred positions on the genome, and that there are some positions that are not occupied (nucleosome-free regions, NFR, or nucleosome-depleted regions, NDR), or are occupied at a frequency below our limit of detection in the population of genomes. Occupancy of only a fraction of the possible sites may explain the lack of a typical MNase partial digestion band ladder pattern for HSV DNA during lytic infection. On average, DNA encoding Immediate Early (IE), Early (E) and Late (L) genes appears to have a similar density of nucleosomes.
    Full-text · Article · Feb 2015 · PLoS ONE
  • ABSTRACT: Purpose: Chemokines are implicated in T cell trafficking. We mapped the chemokine landscape in advanced stage ovarian cancer and characterized the expression of cognate receptors in autologous DC-vaccine primed T cells in the context of cell-based immunotherapy. Experimental design: The expression of all known human chemokines in patients with primary ovarian cancer was analyzed on two independent microarray datasets and validated on tissue microarray. Peripheral blood T cells from five HLA-A2 patients with recurrent ovarian cancer, who previously received autologous tumor DC vaccine, underwent CD3/CD28 costimulation and expansion ex vivo. Tumor-specific T cells were identified by HER2/neu pentamer staining and were evaluated for the expression and functionality of chemokine receptors important for homing to ovarian cancer. Results: The chemokine landscape of ovarian cancer is heterogeneous with high expression of known lymphocyte-recruiting chemokines (CCL2, CCL4 and CCL5) in tumors with intraepithelial T cells, whereas CXCL10, CXCL12 and CXCL16 are expressed quasi-universally, including in tumors lacking tumor infiltrating T cells. DC-vaccine primed T cells were found to express the cognate receptors for the above chemokines. Ex vivo CD3/CD28 costimulation and expansion of vaccine-primed T cells upregulated CXCR3 and CXCR4, and enhanced their migration toward universally expressed chemokines in ovarian cancer. Conclusions: DC-primed tumor-specific T cells are armed with the appropriate receptors to migrate towards universal ovarian cancer chemokines, and these receptors are further upregulated by ex vivo CD3/CD28 costimulation, which renders T cells more fit for migrating towards these chemokines. Copyright © 2015, American Association for Cancer Research.
    No preview · Article · Feb 2015 · Clinical Cancer Research
  • Qian Wu · Kyoung-Jae Won · Hongzhe Li
    ABSTRACT: Chromatin immunoprecipitation sequencing (ChIP-seq) is a powerful method for analyzing protein interactions with DNA. It can be applied to identify the binding sites of transcription factors (TFs) and the genomic landscape of histone modification marks (HMs). Previous research has largely focused on developing peak-calling procedures to detect the binding sites for TFs. However, these procedures may fail when applied to ChIP-seq data of HMs, which have diffuse signals and multiple local peaks. In addition, it is important to identify genes with differential histone enrichment regions between two experimental conditions, such as different cellular states or different time points. Parametric methods based on Poisson/negative binomial distribution have been proposed to address this differential enrichment problem and most of these methods require biological replications. However, many ChIP-seq data sets have few or even no replicates. We propose a nonparametric method to identify the genes with differential histone enrichment regions even without replicates. Our method is based on nonparametric hypothesis testing and kernel smoothing in order to capture the spatial differences in histone-enriched profiles. We demonstrate the method using ChIP-seq data on a comparative epigenomic profiling of adipogenesis of murine adipose stromal cells and the Encyclopedia of DNA Elements (ENCODE) ChIP-seq data. Our method identifies many genes with differential H3K27ac histone enrichment profiles at gene promoter regions between proliferating preadipocytes and mature adipocytes in murine 3T3-L1 cells. The test statistics also correlate well with the gene expression changes and are predictive of them, indicating that the identified differentially enriched regions are indeed biologically meaningful.
    Full-text · Article · Jan 2015 · Cancer Informatics
  • Sihai Dave Zhao · T. Tony Cai · Hongzhe Li
    ABSTRACT: This paper studies the problem of detecting dependence between two mixture distributions, motivated by questions arising from statistical genomics. The fundamental limits of detecting weak positive dependence are derived and an oracle test statistic is proposed. It is shown that for mixture distributions whose components are stochastically ordered, the oracle test statistic is asymptotically optimal. Connections are drawn between dependency detection and signal detection, where the goal of the latter is to detect the presence of non-null components in a single mixture distribution. It is shown that the oracle test for dependency can also be used as a signal detection procedure in the two-sample setting, and there can achieve detection even when detection using each sample separately is provably impossible. A nonparametric data-adaptive test statistic is then proposed, and its closed-form asymptotic distribution under the null hypothesis of independence is established. Simulations show that the adaptive procedure performs as well as the oracle test statistic, and that both can be more powerful than existing methods. In an application to the analysis of the shared genetic basis of psychiatric disorders, the adaptive test is able to detect genetic relationships not detected by other procedures.
    Preview · Article · Dec 2014
  • ABSTRACT: The consumption of an agrarian diet is associated with a reduced risk for many diseases associated with a 'Westernised' lifestyle. Studies suggest that diet affects the gut microbiota, which subsequently influences the metabolome, thereby connecting diet, microbiota and health. However, the degree to which diet influences the composition of the gut microbiota is controversial. Murine models and studies comparing the gut microbiota in humans residing in agrarian versus Western societies suggest that the influence is large. To separate global environmental influences from dietary influences, we characterised the gut microbiota and the host metabolome of individuals consuming an agrarian diet in Western society. Using 16S rRNA-tagged sequencing as well as plasma and urinary metabolomic platforms, we compared measures of dietary intake, gut microbiota composition and the plasma metabolome between healthy human vegans and omnivores, sampled in an urban USA environment. Plasma metabolome of vegans differed markedly from omnivores but the gut microbiota was surprisingly similar. Unlike prior studies of individuals living in agrarian societies, higher consumption of fermentable substrate in vegans was not associated with higher levels of faecal short chain fatty acids, a finding confirmed in a 10-day controlled feeding experiment. Similarly, the proportion of vegans capable of producing equol, a soy-based gut microbiota metabolite, was lower than that reported in Asian societies despite the high consumption of soy-based products. Evidently, residence in globally distinct societies helps determine the composition of the gut microbiota that, in turn, influences the production of diet-dependent gut microbial metabolites.
    No preview · Article · Nov 2014 · Gut
  • Wei Lin · Pixu Shi · Rui Feng · Hongzhe Li
    ABSTRACT: Motivated by research problems arising in the analysis of gut microbiome and metagenomic data, we consider variable selection and estimation in high-dimensional regression with compositional covariates. We propose an ℓ1 regularization method for the linear log-contrast model that respects the unique features of compositional data. We formulate the proposed procedure as a constrained convex optimization problem and introduce a coordinate descent method of multipliers for efficient computation. In the high-dimensional setting where the dimensionality grows at most exponentially with the sample size, model selection consistency and ℓ∞ bounds for the resulting estimator are established under conditions that are mild and interpretable for compositional data. The numerical performance of our method is evaluated via simulation studies and its usefulness is illustrated by an application to a microbiome study relating human body mass index to gut microbiome composition.
    No preview · Article · Nov 2014 · Biometrika
  • ABSTRACT: In a genome-wide survey on somatic copy-number alterations (SCNAs) of long noncoding RNA (lncRNA) in 2,394 tumor specimens from 12 cancer types, we found that about 21.8% of lncRNA genes were located in regions with focal SCNAs. By integrating bioinformatics analyses of lncRNA SCNAs and expression with functional screening assays, we identified an oncogene, focally amplified lncRNA on chromosome 1 (FAL1), whose copy number and expression are correlated with outcomes in ovarian cancer. FAL1 associates with the epigenetic repressor BMI1 and regulates its stability in order to modulate the transcription of a number of genes including CDKN1A. The oncogenic activity of FAL1 is partially attributable to its repression of p21. FAL1-specific siRNAs significantly inhibit tumor growth in vivo.
    Full-text · Article · Sep 2014 · Cancer Cell
  • ABSTRACT: Background & aims: The gut microbiota is a complex and densely populated community in a dynamic environment determined by host physiology. We investigated how intestinal oxygen levels affect the composition of the fecal and mucosally adherent microbiota. Methods: We used the phosphorescence quenching method and a specially designed intraluminal oxygen probe to dynamically quantify gut luminal oxygen levels in mice. 16S ribosomal RNA gene sequencing was used to characterize the microbiota in intestines of mice exposed to hyperbaric oxygen, human rectal biopsy and mucosal swab samples, and paired human stool samples. Results: Average Po2 values in the lumen of the cecum were extremely low (<1 mm Hg). In altering oxygenation of mouse intestines, we observed that oxygen diffused from intestinal tissue and established a radial gradient that extended from the tissue interface into the lumen. Increasing tissue oxygenation with hyperbaric oxygen altered the composition of the gut microbiota in mice. In human beings, 16S ribosomal RNA gene analyses showed an increased proportion of oxygen-tolerant organisms of the Proteobacteria and Actinobacteria phyla associated with rectal mucosa, compared with feces. A consortium of asaccharolytic bacteria of the Firmicutes and Bacteroidetes phyla, which primarily metabolize peptones and amino acids, was associated primarily with mucus. This could be owing to the presence of proteinaceous substrates provided by mucus and the shedding of the intestinal epithelium. Conclusions: In an analysis of intestinal microbiota of mice and human beings, we observed a radial gradient of microbes linked to the distribution of oxygen and nutrients provided by host tissue.
    Full-text · Article · Jul 2014 · Gastroenterology
  • Shiyuan He · Jianxin Yin · Hongzhe Li · Xing Wang
    ABSTRACT: Multi-way tensor data are prevalent in many scientific areas such as genomics and biomedical imaging. We consider a K-way tensor-normal distribution, where the precision matrix for each way has a graphical interpretation. We develop an ℓ1 penalized maximum likelihood estimation and an efficient coordinate descent-based algorithm for model selection and estimation in such tensor normal graphical models. When the dimensions of the tensor are fixed, we derive the asymptotic distributions and oracle property for the proposed estimates of the precision matrices. When the dimensions diverge as the sample size goes to infinity, we present the rates of convergence of the estimates and sparsistency results. Simulation results demonstrate that the proposed estimation procedure can lead to better estimates of the precision matrices and better identifications of the graph structures defined by the precision matrices than the standard Gaussian graphical models. We illustrate the methods with an analysis of yeast gene expression data measured over different time points and under different experimental conditions.
    No preview · Article · Jul 2014 · Journal of Multivariate Analysis
  • Sihai D Zhao · T Tony Cai · Hongzhe Li
    ABSTRACT: Integrative genomics offers a promising approach to more powerful genetic association studies. The hope is that combining outcome and genotype data with other types of genomic information can lead to more powerful SNP detection. We present a new association test based on a statistical model that explicitly assumes that genetic variations affect the outcome through perturbing gene expression levels. It is shown analytically that the proposed approach can have more power to detect SNPs that are associated with the outcome through transcriptional regulation, compared to tests using the outcome and genotype data alone, and simulations show that our method is relatively robust to misspecification. We also provide a strategy for applying our approach to high-dimensional genomic data. We use this strategy to identify a potentially new association between a SNP and a yeast cell's response to the natural product tomatidine, which standard association analysis did not detect.
    No preview · Article · Jun 2014 · Biometrics

Publication Stats

5k Citations
776.26 Total Impact Points

Institutions

  • 2008-2015
    • University of Pennsylvania
      • Department of Biostatistics and Epidemiology
      • Center for Clinical Epidemiology and Biostatistics
      Philadelphia, Pennsylvania, United States
  • 2006-2015
    • William Penn University
      Philadelphia, Pennsylvania, United States
  • 2013
    • Peking University
      Beijing, China
    • Renmin University of China
      Beijing, China
  • 2011
    • University of Rochester
      • Department of Biostatistics and Computational Biology
      Rochester, New York, United States
  • 2009
    • The Children's Hospital of Philadelphia
      Philadelphia, Pennsylvania, United States
  • 2000-2005
    • University of California, Davis
      • Area of Chemical Biology
      • School of Medicine
      Davis, California, United States
  • 2004
    • Davis School District
      Davis, California, United States