Wei Shi

University of Melbourne, Melbourne, Victoria, Australia

Publications (13) · 73.37 Total impact

  • ABSTRACT: Normal linear modeling methods are developed for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation, and then enters these into a limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
    Genome Biology 02/2014; 15(2):R29. · 10.30 Impact Factor
  • ABSTRACT: Despite major advances in neuroscience, a comprehensive understanding of the structural and functional components of the adult brain compartments remains to be fully elucidated at a quantitative molecular level. Indeed, over half of the soluble- and membrane-annotated proteins are currently unmapped within online digital brain atlases. In this study, two complementary approaches were used to assess the unique repertoire of proteins enriched within select regions of the adult mouse central nervous system (CNS), including the brain stem, cerebellum and remaining brain hemispheres. Of the 1200 proteins visualized by 2D-difference image gel electrophoresis (DIGE), approximately 150 (including cytosolic and membrane proteins) were found to exhibit statistically significant changes in relative abundance thus representing putative region-specific brain markers. In addition using a high precision 18O-labeling strategy for the quantitative LC-MS/MS mapping of membrane proteins isolated from myelin-enriched fractions we have identified over 1000 proteins that have yet to be described in any other mammalian myelin proteome. A comparison of our myelin proteome was made to an existing transcriptome database containing mRNA abundance profiles during oligodendrocyte differentiation and has confirmed statistically significant abundance changes for ∼500 of these newly-mapped proteins thus revealing new roles in oligodendrocyte and myelin biology. These data offer a resource for the neuroscience community studying the molecular basis for specialized neuronal activities in the CNS and myelin-related disorders. The mass spectrometry proteomics data associated with this manuscript have been deposited to the ProteomeXchange Consortium with the dataset identifier PXD000327.
    Proteomics 11/2013; · 4.43 Impact Factor
  • ABSTRACT: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages. Contact: shi@wehi.edu.au.
    Bioinformatics 11/2013; · 5.47 Impact Factor
  • ABSTRACT: Reads generated by next-generation sequencing technologies often need to be assigned to genomic features such as genes after they have been successfully aligned. This process is often called read summarization. Read summarization is required by a number of downstream analyses such as gene expression analysis and histone modification analysis. Current read summarization programs suffer from very high computational cost, including large memory consumption and long running times. Here we present featureCounts, a lightweight read summarization program. featureCounts was found to be >15 times faster and to use much less memory than popular methods. It assigns as many reads/fragments as other methods, or slightly more, and it is more powerful in summarizing paired-end read data. It also supports parallel read summarization via multi-threading, enabling even more efficient summarization.
    05/2013;
  • ABSTRACT: Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
    Nucleic Acids Research 04/2013; · 8.28 Impact Factor
  • ABSTRACT: To define genetic lesions driving leukemia, we targeted cre-dependent Sleeping Beauty (SB) transposon mutagenesis to the blood-forming system using a hematopoietic-selective vav 1 oncogene (vav1) promoter. Leukemias of diverse lineages ensued, most commonly lymphoid leukemia and erythroleukemia. The inclusion of a transgenic allele of Janus kinase 2 (JAK2)V617F resulted in acceleration of transposon-driven disease and strong selection for erythroleukemic pathology with transformation of bipotential erythro-megakaryocytic cells. The genes encoding the E-twenty-six (ETS) transcription factors Ets related gene (Erg) and Ets1 were the most common sites for transposon insertion in SB-induced JAK2V617F-positive erythroleukemias, present in 87.5% and 65%, respectively, of independent leukemias examined. The role of activated Erg was validated by reproducing erythroleukemic pathology in mice transplanted with fetal liver cells expressing translocated in liposarcoma (TLS)-ERG, an activated form of ERG found in human leukemia. Via application of SB mutagenesis to TLS-ERG-induced erythroid transformation, we identified multiple loci as likely collaborators with activation of Erg. Jak2 was identified as a common transposon insertion site in TLS-ERG-induced disease, strongly validating the cooperation between JAK2V617F and transposon insertion at the Erg locus in the JAK2V617F-positive leukemias. Moreover, loci expressing other regulators of signal transduction pathways were conspicuous among the common transposon insertion sites in TLS-ERG-driven leukemia, suggesting that a key mechanism in erythroleukemia may be the collaboration of lesions disturbing erythroid maturation, most notably in genes of the ETS family, with mutations that reduce dependence on exogenous signals.
    Proceedings of the National Academy of Sciences 03/2013; · 9.74 Impact Factor
  • ABSTRACT: The mammary epithelium is a dynamic, highly hormone-responsive tissue. To explore chromatin modifications underlying its lineage specification and hormone responsiveness, we determined genome-wide histone methylation profiles of mammary epithelial subpopulations in different states. The marked differences in H3K27 trimethylation between subpopulations in the adult gland suggest that epithelial cell-fate decisions are orchestrated by polycomb-complex-mediated repression. Remarkably, the mammary epigenome underwent highly specific changes in different hormonal contexts, with a profound change being observed in the global H3K27me3 map of luminal cells during pregnancy. We therefore examined the role of the key H3K27 methyltransferase Ezh2 in mammary physiology. Its expression and phosphorylation coincided with H3K27me3 modifications and peaked during pregnancy, driven in part by progesterone. Targeted deletion of Ezh2 impaired alveologenesis during pregnancy, preventing lactation, and drastically reduced stem/progenitor cell numbers. Taken together, these findings reveal that Ezh2 couples hormonal stimuli to epigenetic changes that underpin progenitor activity, lineage specificity, and alveolar expansion in the mammary gland.
    Cell Reports 01/2013;
  • ABSTRACT: Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. The key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments will be covered.
    PLoS Computational Biology 12/2011; 7(12):e1002276. · 4.87 Impact Factor
  • ABSTRACT: The transcription factor Gata-3 is a definitive marker of luminal breast cancers and a key regulator of mammary morphogenesis. Here we have explored a role for Gata-3 in tumor initiation and the underlying cellular mechanisms using a mouse model of "luminal-like" cancer. Loss of a single Gata-3 allele markedly accelerated tumor progression in mice carrying the mouse mammary tumor virus promoter-driven polyomavirus middle T antigen (MMTV-PyMT mice), while overexpression of Gata-3 curtailed tumorigenesis. Through the identification of two distinct luminal progenitor cells in the mammary gland, we demonstrate that Gata-3 haplo-insufficiency increases the tumor-initiating capacity of these progenitors but not the stem cell-enriched population. Overexpression of a conditional Gata-3 transgene in the PyMT model promoted cellular differentiation and led to reduced tumor-initiating capacity as well as diminished angiogenesis. Transcript profiling studies identified caspase-14 as a novel downstream target of Gata-3, in keeping with its roles in differentiation and tumorigenesis. A strong association was evident between GATA-3 and caspase-14 expression in preinvasive ductal carcinoma in situ samples, where GATA-3 also displayed prognostic significance. Overall, these studies identify GATA-3 as an important regulator of tumor initiation through its ability to promote the differentiation of committed luminal progenitor cells.
    Molecular and Cellular Biology 09/2011; 31(22):4609-22. · 6.06 Impact Factor
  • ABSTRACT: 1. Aliskiren is a renin inhibitor with an IC50 of 0.6 nmol/L for human renin, 4.5 nmol/L for mouse renin and 80 nmol/L for rat renin. 2. In the present study, we compared the effects of aliskiren (10 mg/kg per day), the angiotensin-converting enzyme inhibitor perindopril (0.2 mg/kg per day) and their combination on angiotensin and bradykinin peptides in female heterozygous (mRen-2)27 rats, transgenic for the mouse renin gene. 3. All three treatments produced similar reductions in systolic blood pressure, heart weight and plasma aldosterone levels and reduced angiotensin II levels in lung, but only perindopril and the combination reduced angiotensin II levels in kidney of (mRen-2)27 rats. In contrast, aliskiren and the combination, but not perindopril alone, increased cardiac bradykinin levels. Aliskiren increased immunostaining for tissue kallikrein in the heart and reduced cardiac fibrosis. 4. We investigated the mechanism underlying the increase in bradykinin levels following aliskiren treatment in Sprague-Dawley rats, in which aliskiren has a lower potency for renin inhibition. Aliskiren (10 mg/kg per day) reduced renal angiotensin levels within 24 h, but treatment for > 24 h was required to increase cardiac bradykinin levels. Moreover, 3 mg/kg per day aliskiren increased cardiac bradykinin levels, but did not reduce renal angiotensin levels. Aliskiren did not potentiate the hypotensive effects of bradykinin; however, it increased tissue kallikrein, but not plasma kallikrein, mRNA levels in the heart. 5. These data demonstrate that the aliskiren-induced increase in cardiac bradykinin levels is independent of renin inhibition and changes in bradykinin metabolism, but is associated with increased tissue kallikrein gene expression.
    Clinical and Experimental Pharmacology and Physiology 07/2011; 38(9):623-31. · 2.16 Impact Factor
  • ABSTRACT: Three surface molecules of mouse CD8+ dendritic cells (DCs), also found on the equivalent human DC subpopulation, were compared as targets for Ab-mediated delivery of Ags, a developing strategy for vaccination. For the production of cytotoxic T cells, DEC-205 and Clec9A, but not Clec12A, were effective targets, although only in the presence of adjuvants. For Ab production, however, Clec9A excelled as a target, even in the absence of adjuvant. Potent humoral immunity was a result of the highly specific expression of Clec9A on DCs, which allowed longer residence of targeting Abs in the bloodstream, prolonged DC Ag presentation, and extended CD4 T cell proliferation, all of which drove highly efficient development of follicular helper T cells. Because Clec9A shows a similar expression pattern on human DCs, it has particular promise as a target for vaccines of human application.
    The Journal of Immunology 06/2011; 187(2):842-50. · 5.52 Impact Factor
  • ABSTRACT: Five strategies for pre-processing intensities from Illumina expression BeadChips are assessed from the point of view of precision and bias. The strategies include a popular variance stabilizing transformation and model-based background corrections that either use or ignore the control probes. Four calibration data sets are used to evaluate precision, bias and false discovery rate (FDR). The original algorithms are shown to have operating characteristics that are not easily comparable. Some tend to minimize noise while others minimize bias. Each original algorithm is shown to have an innate intensity offset, by which unlogged intensities are bounded away from zero, and the size of this offset determines its position on the noise-bias spectrum. By adding extra offsets, a continuum of related algorithms with different noise-bias trade-offs is generated, allowing direct comparison of the performance of the strategies on equivalent terms. Adding a positive offset is shown to decrease the FDR of each original algorithm. The potential of each strategy to generate an algorithm with an optimal noise-bias trade-off is explored by finding the offset that minimizes its FDR. The use of control probes as part of the background correction and normalization strategy is shown to achieve the lowest FDR for a given bias.
    Nucleic Acids Research 10/2010; 38(22):e204. · 8.28 Impact Factor
  • ABSTRACT: A fundamental question in microarray analysis is the estimation of the number of expressed probes in different RNA samples. Negative control probes available in the latest microarray platforms, such as Illumina whole genome expression BeadChips, provide a unique opportunity to estimate the number of expressed probes without setting a threshold. A novel algorithm was proposed in this study to estimate the number of expressed probes in an RNA sample by utilizing these negative controls to measure background noise. The performance of the algorithm was demonstrated by comparing different generations of Illumina BeadChips, comparing the set of probes targeting well-characterized RefSeq NM transcripts with other probes on the array and comparing pure samples with heterogenous samples. Furthermore, hematopoietic stem cells were found to have a larger transcriptome than progenitor cells. Aire knockout medullary thymic epithelial cells were shown to have significantly less expressed probes than matched wild-type cells.
    Nucleic Acids Research 04/2010; 38(7):2168-76. · 8.28 Impact Factor