Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells.

Statistics, University of California, Los Angeles, CA 90095, USA.
BMC Genomics (Impact Factor: 4.04). 08/2009; 10:327. DOI: 10.1186/1471-2164-10-327
Source: PubMed

ABSTRACT Recent work has revealed that a core group of transcription factors (TFs) regulates the key characteristics of embryonic stem (ES) cells: pluripotency and self-renewal. Current efforts focus on identifying genes that play important roles in maintaining pluripotency and self-renewal in ES cells and aim to understand the interactions among these genes. To that end, we investigated the use of unsigned and signed network analysis to identify pluripotency and differentiation related genes.
We show that signed networks provide a better systems level understanding of the regulatory mechanisms of ES cells than unsigned networks, using two independent murine ES cell expression data sets. Specifically, using signed weighted gene co-expression network analysis (WGCNA), we found a pluripotency module and a differentiation module, which are not identified in unsigned networks. We confirmed the importance of these modules by incorporating genome-wide TF binding data for key ES cell regulators. Interestingly, we find that the pluripotency module is enriched with genes related to DNA damage repair and mitochondrial function in addition to transcriptional regulation. Using a connectivity measure of module membership, we not only identify known regulators of ES cells but also show that Mrpl15, Msh6, Nrf1, Nup133, Ppif, Rbpj, Sh3gl2, and Zfp39, among other genes, have important roles in maintaining ES cell pluripotency and self-renewal. We also report highly significant relationships between module membership and epigenetic modifications (histone modifications and promoter CpG methylation status), which are known to play a role in controlling gene expression during ES cell self-renewal and differentiation.
Our systems biologic re-analysis of gene expression, transcription factor binding, epigenetic and gene ontology data provides a novel integrative view of ES cell biology.
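The distinction between unsigned and signed networks drawn in the abstract can be illustrated with a minimal Python sketch. It assumes the standard WGCNA definitions: unsigned adjacency |cor|^beta, signed adjacency ((1 + cor)/2)^beta, and module membership as the correlation of a gene with the module eigengene. The function names, the toy data, and the soft-threshold powers (6 and 12, commonly cited WGCNA defaults) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def adjacency(expr, beta, signed):
    """Soft-thresholded co-expression adjacency; expr is samples x genes."""
    cor = np.corrcoef(expr, rowvar=False)
    if signed:
        # Signed: strong negative correlations map to adjacency near 0.
        return ((1.0 + cor) / 2.0) ** beta
    # Unsigned: the sign of the correlation is discarded.
    return np.abs(cor) ** beta

def module_membership(expr, module_idx):
    """Correlation of every gene with the eigengene of the given module."""
    sub = expr[:, module_idx]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0)
    # Module eigengene: first left singular vector of the standardized module.
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # Orient the eigengene so it tracks the module's average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    n_genes = expr.shape[1]
    return np.array([np.corrcoef(expr[:, g], eigengene)[0, 1]
                     for g in range(n_genes)])

# Hypothetical toy data: gene 1 is perfectly anti-correlated with gene 0,
# gene 2 is strongly positively correlated with gene 0.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
expr = np.column_stack([base, -base, np.array([1.1, 1.9, 3.2, 3.8, 5.0])])

unsigned_adj = adjacency(expr, beta=6, signed=False)   # beta is an assumed default
signed_adj = adjacency(expr, beta=12, signed=True)     # beta is an assumed default
kme = module_membership(expr, [0, 2])
```

In this toy case the anti-correlated pair is strongly connected in the unsigned network but essentially disconnected in the signed one, which is the kind of separation that lets a signed analysis keep oppositely regulated genes in different modules.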

  • ABSTRACT: We propose for the first time to divide histone proteolysis into “histone degradation” and the epigenetically connoted “histone clipping”. Our initial observation is that these two classes are very hard to distinguish both experimentally and biologically, because they can both be mediated by the same enzymes. Since the first report decades ago, proteolysis has been found in a broad spectrum of eukaryotic organisms. However, authors often do not clearly distinguish or determine whether degradation or clipping was studied. Given the importance of histone modifications in epigenetic regulation, we further elaborate on the different ways in which histone proteolysis could play a role in epigenetics. Finally, unanticipated histone proteolysis has probably left a mark on many past studies of histones. In conclusion, we emphasize the significance of reviving the study of histone proteolysis from both a biological and an experimental perspective. Also watch the Video Abstract.
    BioEssays 10/2014; DOI:10.1002/bies.201400118 · 4.84 Impact Factor
  • ABSTRACT: We develop an iterative subsampling approach to improve the computational efficiency of our previous work on solution path clustering (SPC). The SPC method achieves clustering by concave regularization on the pairwise distances between cluster centers. This clustering method has the important capability to recognize noise and to provide a short path of clustering solutions; however, it is not sufficiently fast for big datasets. Thus, we propose a method that iterates between clustering a small subsample of the full data and sequentially assigning the other data points to attain orders of magnitude of computational savings. The new method preserves the ability to isolate noise, includes a solution selection mechanism that ultimately provides one clustering solution with an estimated number of clusters, and is shown to be able to extract small tight clusters from noisy data. The method's relatively minor losses in accuracy are demonstrated through simulation studies, and its ability to handle large datasets is illustrated through applications to gene expression datasets.
  • ABSTRACT: Differential coexpression analysis usually requires the definition of 'distance' or 'similarity' between measured datasets. Until now, the most common choice has been the Pearson correlation coefficient. However, the Pearson correlation coefficient is sensitive to outliers. Biweight midcorrelation is considered a good alternative to Pearson correlation because it is more robust to outliers. In this paper, we propose using biweight midcorrelation to measure 'similarity' between gene expression profiles, and provide a new approach for gene differential coexpression analysis. First, we calculate the biweight midcorrelation coefficients between all gene pairs. Then, we filter out non-informative correlation pairs using the 'half-thresholding' strategy and calculate the differential coexpression value of each gene. The experimental results on simulated data show that the new approach performed better than three previously published differential coexpression analysis (DCEA) methods. Moreover, applying maximum clique analysis to a gene subset comprising genes identified by our approach and previously reported T2D-related genes yields many additional discoveries.
    BMC Bioinformatics 12/2014; 15 Suppl 15:S3. DOI:10.1186/1471-2105-15-S15-S3 · 2.67 Impact Factor
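The robustness argument for biweight midcorrelation in the last abstract can be made concrete with a short Python sketch. It assumes the usual textbook definition (median-centered values, Tukey biweight weights with scaling constant 9 times the median absolute deviation); the function names and toy data are hypothetical, and the 'half-thresholding' and differential coexpression steps are not reproduced because the abstract does not specify them.

```python
import numpy as np

def _biweight_normalized(x):
    """Median-centered, biweight-weighted, unit-norm version of x."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # The weights are undefined when the MAD is zero; how to handle
        # this case is left to the caller in this sketch.
        raise ValueError("median absolute deviation is zero")
    u = (x - med) / (9.0 * mad)
    # Tukey biweight: points with |u| >= 1 (far outliers) get weight 0.
    w = (1.0 - u ** 2) ** 2 * (np.abs(u) < 1.0)
    a = (x - med) * w
    return a / np.sqrt(np.sum(a ** 2))

def bicor(x, y):
    """Biweight midcorrelation between two equal-length profiles."""
    return float(np.sum(_biweight_normalized(x) * _biweight_normalized(y)))

# Hypothetical toy profiles: y is an exact linear function of x, and
# y_outlier replaces the last value with an extreme measurement.
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
y_outlier = y.copy()
y_outlier[-1] = 1000.0

clean_bicor = bicor(x, y)
robust = bicor(x, y_outlier)
pearson = float(np.corrcoef(x, y_outlier)[0, 1])
```

On these toy profiles the single extreme value receives zero weight, so the biweight midcorrelation stays much closer to the clean value than the Pearson coefficient does.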

