Peter Langfelder

CSU Mentor, Long Beach, CA, USA

Publications (26) · 95.37 Total impact

  • Article: Cluster and propensity based approximation of a network.
    ABSTRACT: BACKGROUND: The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. RESULTS: Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). CONCLUSIONS: The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust.
    BMC Systems Biology 03/2013; 7(1):21. · 3.15 Impact Factor
  • Article: Random generalized linear model: a highly accurate and interpretable ensemble predictor.
    Lin Song, Peter Langfelder, Steve Horvath
    ABSTRACT: BACKGROUND: Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. However, forward feature selection tends to overfit the data and leads to low predictive accuracy. Therefore, it remains an important research goal to combine the advantages of ensemble predictors (high accuracy) with the advantages of forward regression modeling (interpretability). To address this goal several articles have explored GLM based ensemble predictors. Since limited evaluations suggested that these ensemble predictors were less accurate than alternative predictors, they have found little attention in the literature. RESULTS: Comprehensive evaluations involving hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations are used to give GLM based ensemble predictors a new and careful look. A novel bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection) often outperforms a host of alternative prediction methods including random forests and penalized regression models (ridge regression, elastic net, lasso). This random generalized linear model (RGLM) predictor provides variable importance measures that can be used to define a "thinned" ensemble predictor (involving few features) that retains excellent predictive accuracy. CONCLUSION: RGLM is a state of the art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). These methods are implemented in the freely available R software package randomGLM.
    BMC Bioinformatics 01/2013; 14(1):5. · 2.75 Impact Factor
  • Article: When is hub gene selection better than standard meta-analysis?
    Peter Langfelder, Paul S Mischel, Steve Horvath
    ABSTRACT: Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) Finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network based screening, and meta analysis.
    PLoS ONE 01/2013; 8(4):e61505. · 4.09 Impact Factor
  • Article: Comparison of co-expression measures: mutual information, correlation, and model based indices.
    Lin Song, Peter Langfelder, Steve Horvath
    ABSTRACT: BACKGROUND: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). RESULTS: We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables. CONCLUSIONS: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
    BMC Bioinformatics 12/2012; 13(1):328. · 2.75 Impact Factor
  • Article: Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects.
    ABSTRACT: BACKGROUND: The predominant model for regulation of gene expression through DNA methylation is an inverse association in which increased methylation results in decreased gene expression levels. However, recent studies suggest that the relationship between genetic variation, DNA methylation and expression is more complex. RESULTS: Systems genetic approaches for examining relationships between gene expression and methylation array data were used to find both negative and positive associations between these levels. A weighted correlation network analysis reveals that i) both transcriptome and methylome are organized in modules, ii) co-expression modules are generally not preserved in the methylation data and vice-versa, and iii) highly significant correlations exist between co-expression and co-methylation modules, suggesting the existence of factors that affect expression and methylation of different modules (i.e., trans effects at the level of modules). We observed that methylation probes associated with expression in cis were more likely to be located outside CpG islands, whereas specificity for CpG island shores was present when methylation, associated with expression, was under local genetic control. A structural equation model based analysis found strong support in particular for a traditional causal model in which gene expression is regulated by genetic variation via DNA methylation instead of gene expression affecting DNA methylation levels. CONCLUSIONS: Our results provide new insights into the complex mechanisms between genetic markers, epigenetic mechanisms and gene expression. We find strong support for the classical model of genetic variants regulating methylation, which in turn regulates gene expression. Moreover we show that, although the methylation and expression modules differ, they are highly correlated.
    BMC Genomics 11/2012; 13(1):636. · 4.07 Impact Factor
  • Article: Aging effects on DNA methylation modules in human brain and blood tissue.
    ABSTRACT: BACKGROUND: Several recent studies reported aging effects on DNA methylation levels of individual CpG dinucleotides. But it is not yet known whether aging related consensus modules, in the form of clusters of correlated CpG markers, can be found that are present in multiple human tissues. Such a module could facilitate the understanding of aging effects on multiple tissues. RESULTS: We therefore employed weighted correlation network analysis of 2,442 Illumina DNA methylation arrays from brain and blood tissues, which enabled the identification of an age-related co-methylation module. Module preservation analysis confirmed that this module can also be found in diverse independent data sets. Biological evaluation showed that module membership is associated with Polycomb group target occupancy counts, CpG island status and autosomal chromosome location. Functional enrichment analysis revealed that the aging related consensus module comprises genes that are involved in nervous system development, neuron differentiation and neurogenesis, and that it contains promoter CpGs of genes known to be down-regulated in early Alzheimer's disease. A comparison with a standard, non-module based meta-analysis revealed that selecting CpGs based on module membership leads to significantly increased gene ontology enrichment, thus demonstrating that studying aging effects via consensus network analysis enhances the biological insights gained. CONCLUSIONS: Overall, our analysis revealed a robustly defined age-related co-methylation module that is present in multiple human tissues, including blood and brain. We conclude that blood is a promising surrogate for brain tissue when studying the effects of age on DNA methylation profiles.
    Genome biology 10/2012; 13(10):R97. · 6.63 Impact Factor
  • Article: Network methods for describing sample relationships in genomic datasets: application to Huntington's disease.
    Michael C Oldham, Peter Langfelder, Steve Horvath
    ABSTRACT: Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington's disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
    BMC Systems Biology 06/2012; 6:63. · 3.15 Impact Factor
  • Article: Fast R Functions for Robust Correlations and Hierarchical Clustering.
    Peter Langfelder, Steve Horvath
    ABSTRACT: Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing data. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA. The hierarchical clustering algorithm implemented in R function hclust is an order n³ (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm which in practice achieves order approximately n², leading to substantial time savings when clustering large data sets.
    Journal of statistical software 03/2012; 46(11). · 4.01 Impact Factor
  • Article: Strategies for aggregating gene expression data: the collapseRows R function.
    ABSTRACT: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently few collapsing functions are developed and widely applied. We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.
    BMC Bioinformatics 08/2011; 12:322. · 2.75 Impact Factor
  • Article: A systems genetic analysis of high density lipoprotein metabolism and network preservation across mouse models
    ABSTRACT: We report a systems genetic analysis of high density lipoprotein (HDL) levels in an F2 intercross between inbred strains CAST/EiJ and C57BL/6J. We previously showed that there are dramatic differences in HDL metabolism in a cross between these strains, and we now report co-expression network analysis of HDL that integrates global expression data from liver and adipose with relevant metabolic traits. Using data from a total of 293 F2 intercross mice, we constructed weighted gene co-expression networks and identified modules (subnetworks) associated with HDL and clinical traits. These were examined for genes implicated in HDL levels based on large human genome-wide association studies (GWAS) and examined with respect to conservation between tissue and sexes in a total of 9 data sets. We identify genes that are consistently ranked high by association with HDL across the 9 data sets. We focus in particular on two genes, Wfdc2 and Hdac3, that are located in close proximity to HDL QTL peaks where causal testing indicates that they may affect HDL. Our results provide a rich resource for studies of complex metabolic interactions involving HDL. This article is part of a Special Issue entitled Advances in High Density Lipoprotein Formation and Metabolism: A Tribute to John F. Oram (1945–2010). Highlights: ► We investigate genetic factors affecting HDL in a CASTxB6 F2 mouse cross. ► Network analysis identifies gene co-expression modules associated with HDL. ► Studies across independent data sets confirm robustness of identified modules. ► Using meta-analysis techniques we identify genes consistently associated with HDL. ► Causal testing implicates Wfdc2 and Hdac3 as novel genes affecting HDL levels.
    Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids 07/2011; 1821(3):435-447. · 5.27 Impact Factor
  • Article: Gene networks associated with conditional fear in mice identified using a systems genetics approach.
    ABSTRACT: Our understanding of the genetic basis of learning and memory remains shrouded in mystery. To explore the genetic networks governing the biology of conditional fear, we used a systems genetics approach to analyze a hybrid mouse diversity panel (HMDP) with high mapping resolution. A total of 27 behavioral quantitative trait loci were mapped with a false discovery rate of 5%. By integrating fear phenotypes, transcript profiling data from hippocampus and striatum and also genotype information, two gene co-expression networks correlated with context-dependent immobility were identified. We prioritized the key markers and genes in these pathways using intramodular connectivity measures and structural equation modeling. Highly connected genes in the context fear modules included Psmd6, Ube2a and Usp33, suggesting an important role for ubiquitination in learning and memory. In addition, we surveyed the architecture of brain transcript regulation and demonstrated preservation of gene co-expression modules in hippocampus and striatum, while also highlighting important differences. Rps15a, Kif3a, Stard7, 6330503K22RIK, and Plvap were among the individual genes whose transcript abundance was strongly associated with fear phenotypes. Application of our multi-faceted mapping strategy permits an increasingly detailed characterization of the genetic networks underlying behavior.
    BMC Systems Biology 03/2011; 5:43. · 3.15 Impact Factor
  • Article: Gene coexpression network topology of cardiac development, hypertrophy, and failure.
    ABSTRACT: Network analysis techniques allow a more accurate reflection of underlying systems biology to be realized than traditional unidimensional molecular biology approaches. Using gene coexpression network analysis, we define the gene expression network topology of cardiac hypertrophy and failure and the extent of recapitulation of fetal gene expression programs in failing and hypertrophied adult myocardium. We assembled all myocardial transcript data in the Gene Expression Omnibus (n=1617). Because hierarchical analysis revealed species had primacy over disease clustering, we focused this analysis on the most complete (murine) dataset (n=478). Using gene coexpression network analysis, we derived functional modules, regulatory mediators, and higher-order topological relationships between genes and identified 50 gene coexpression modules in developing myocardium that were not present in normal adult tissue. We found that known gene expression markers of myocardial adaptation were members of upregulated modules but not hub genes. We identified ZIC2 as a novel transcription factor associated with coexpression modules common to developing and failing myocardium. Of 50 fetal gene coexpression modules, 3 (6%) were reproduced in hypertrophied myocardium and 7 (14%) were reproduced in failing myocardium. One fetal module was common to both failing and hypertrophied myocardium. Network modeling allows systems analysis of cardiovascular development and disease. Although we did not find evidence for a global coordinated program of fetal gene expression in adult myocardial adaptation, our analysis revealed specific gene expression modules active during both development and disease and specific candidates for their regulation.
    Circulation Cardiovascular Genetics 02/2011; 4(1):26-35. · 6.11 Impact Factor
  • Article: Is my network module preserved and reproducible?
    Peter Langfelder, Rui Luo, Michael C Oldham, Steve Horvath
    ABSTRACT: In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation.
    PLoS Computational Biology 01/2011; 7(1):e1001057. · 5.22 Impact Factor
  • Chapter: Review of Weighted Gene Coexpression Network Analysis
    ABSTRACT: We survey key concepts of weighted gene coexpression network analysis (WGCNA), also known as weighted correlation network analysis, and related data analysis strategies. We describe the construction of a weighted gene coexpression network from gene expression data, identification of network modules and integration of external data such as gene ontology information and clinical phenotype data. We review Differential Weighted Gene Coexpression Network Analysis (DWGCNA), a method for comparing and contrasting networks constructed from qualitatively different groups of samples. DWGCNA provides a means for measuring not only differential expression but also differential connectivity. Further, we show how to incorporate genetic marker data with expression data via Integrated Weighted Gene Coexpression Network Analysis (IWGCNA). Lastly, we describe R software implementing WGCNA methods.
    12/2010: pages 369-388
  • Article: Is human blood a good surrogate for brain tissue in transcriptional studies?
    ABSTRACT: Since human brain tissue is often unavailable for transcriptional profiling studies, blood expression data is frequently used as a substitute. The underlying hypothesis in such studies is that genes expressed in brain tissue leave a transcriptional footprint in blood. We tested this hypothesis by relating three human brain expression data sets (from cortex, cerebellum and caudate nucleus) to two large human blood expression data sets (comprised of 1463 individuals). We found mean expression levels were weakly correlated between the brain and blood data (r range: [0.24,0.32]). Further, we tested whether co-expression relationships were preserved between the three brain regions and blood. Only a handful of brain co-expression modules showed strong evidence of preservation and these modules could be combined into a single large blood module. We also identified highly connected intramodular "hub" genes inside preserved modules. These preserved intramodular hub genes had the following properties: first, their expression levels tended to be significantly more heritable than those from non-preserved intramodular hub genes (p < 10⁻⁹⁰); second, they had highly significant positive correlations with the following cluster of differentiation genes: CD58, CD47, CD48, CD53 and CD164; third, a significant number of them were known to be involved in infection mechanisms, post-transcriptional and post-translational modification and other basic processes. Overall, we find transcriptome organization is poorly preserved between brain and blood. However, the subset of preserved co-expression relationships characterized here may aid future efforts to identify blood biomarkers for neurological and neuropsychiatric diseases when brain tissue samples are unavailable.
    BMC Genomics 10/2010; 11:589. · 4.07 Impact Factor
  • Article: Detecting network modules in fMRI time series: a weighted network analysis approach.
    ABSTRACT: Many network analyses of fMRI data begin by defining a set of regions, extracting the mean signal from each region and then analyzing the correlations between regions. One essential question that has not been addressed in the literature is how to best define the network neighborhoods over which a signal is combined for network analyses. Here we present a novel unsupervised method for the identification of tightly interconnected voxels, or modules, from fMRI data. This approach, weighted voxel coactivation network analysis (WVCNA), is based on a method that was originally developed to find modules of genes in gene networks. This approach differs from many of the standard network approaches in fMRI in that connections between voxels are described by a continuous measure, whereas typically voxels are considered to be either connected or not connected depending on whether the correlation between the two voxels survives a hard threshold value. Additionally, instead of simply using pairwise correlations to describe the connection between two voxels, WVCNA relies on a measure of topological overlap, which not only compares how correlated two voxels are but also the degree to which the pair of voxels is highly correlated with the same other voxels. We demonstrate the use of WVCNA to parcellate the brain into a set of modules that are reliably detected across data within the same subject and across subjects. In addition we compare WVCNA to ICA and show that the WVCNA modules have some of the same structure as the ICA components, but tend to be more spatially focused. We also demonstrate the use of some of the WVCNA network metrics for assessing a voxel's membership to a module and also how that voxel relates to other modules. Last, we illustrate how WVCNA modules can be used in a network analysis to find connections between regions of the brain and show that it produces reasonable results.
    NeuroImage 10/2010; 52(4):1465-76. · 5.89 Impact Factor
  • Article: Weighted gene coexpression network analysis: state of the art.
    ABSTRACT: Weighted gene coexpression network analysis (WGCNA) has been applied to many important studies since its introduction in 2005. WGCNA can be used as a data exploratory tool or as a gene screening method; WGCNA can also be used as a tool to generate testable hypotheses for validation in independent data sets. In this article, we review key concepts of WGCNA and some of its applications in gene expression analysis of oncology, brain function, and protein interaction data.
    Journal of Biopharmaceutical Statistics 03/2010; 20(2):281-300. · 1.34 Impact Factor
  • Conference Proceeding: Comparing Robust Methods for Defining Directed Networks and Their Hub Nodes.
    Peter Langfelder, Tova Fuller, Steve Horvath
    International Conference on Bioinformatics & Computational Biology, BIOCOMP 2010, July 12-15, 2010, Las Vegas, Nevada, USA, 2 Volumes; 01/2010
  • Article: Weighted gene co-expression network analysis of the peripheral blood from Amyotrophic Lateral Sclerosis patients.
    ABSTRACT: Amyotrophic Lateral Sclerosis (ALS) is a lethal disorder characterized by progressive degeneration of motor neurons in the brain and spinal cord. Diagnosis is mainly based on clinical symptoms, and there is currently no therapy to stop the disease or slow its progression. Since access to spinal cord tissue is not possible at disease onset, we investigated changes in gene expression profiles in whole blood of ALS patients. Our transcriptional study showed dramatic changes in blood of ALS patients; 2,300 probes (9.4%) showed significant differential expression in a discovery dataset consisting of 30 ALS patients and 30 healthy controls. Weighted gene co-expression network analysis (WGCNA) was used to find disease-related networks (modules) and disease related hub genes. Two large co-expression modules were found to be associated with ALS. Our findings were replicated in a second (30 patients and 30 controls) and third dataset (63 patients and 63 controls), thereby demonstrating a highly significant and consistent association of two large co-expression modules with ALS disease status. Ingenuity Pathway Analysis of the ALS related module genes implicates enrichment of functional categories related to genetic disorders, neurodegeneration of the nervous system and inflammatory disease. The ALS related modules contain a number of candidate genes possibly involved in pathogenesis of ALS. This first large-scale blood gene expression study in ALS observed distinct patterns between cases and controls which may provide opportunities for biomarker development as well as new insights into the molecular mechanisms of the disease.
    BMC Genomics 09/2009; 10:405. · 4.07 Impact Factor
  • Article: WGCNA: an R package for weighted correlation network analysis.
    Peter Langfelder, Steve Horvath
    ABSTRACT: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
    BMC Bioinformatics 01/2009; 9:559. · 2.75 Impact Factor

Institutions

  • 2012
    • CSU Mentor
      Long Beach, CA, USA
    • University of California, San Francisco
      • Department of Neurology
      San Francisco, CA, USA
  • 2007–2010
    • University of California, Los Angeles
      • Department of Human Genetics
      Los Angeles, CA, USA
  • 2009
    • Universitair Medisch Centrum Utrecht
      • Department of Neurology
      Utrecht, Provincie Utrecht, Netherlands