
NEAT: an efficient Network Enrichment Analysis Test


Mirko Signorelli (1, 2), Veronica Vinciotti (3), Ernst C. Wit (1)
(1) Johann Bernoulli Institute, University of Groningen, Netherlands
(2) Department of Statistical Sciences, University of Padova, Italy
(3) Department of Mathematics, Brunel University London, United Kingdom
E-mail for correspondence: m.signorelli@rug.nl
REFERENCE ARTICLE:
After inclusion in the IWSM Conference Proceedings, an extended version of this article was published in BMC Bioinformatics as
Signorelli, M., Vinciotti, V., Wit, E. C. (2016). NEAT: an efficient network enrichment analysis test. BMC Bioinformatics, 17:352. DOI: 10.1186/s12859-016-1203-6.
Please refer to the article in BMC Bioinformatics for citation purposes, as well as for a broader overview of NEAT.
Abstract: Network enrichment analysis (NEA) integrates gene enrichment analysis with information on dependences between genes. Existing tests for NEA rely on normality assumptions, can deal only with undirected networks, and are computationally slow. We propose NEAT, an alternative test based on the hypergeometric distribution. NEAT can also be applied to directed and mixed networks, and it is faster and more powerful than existing NEA tests.
Keywords: networks; enrichment analysis; gene expression.
1 Introduction
When the first data on gene expression became available, they were analysed by considering each gene separately. However, researchers soon realized that genes act in a concerted manner, and that cellular processes are often the result of complex interactions between different genes and molecules. Nowadays, sets of genes that are responsible for many cellular functions have been identified, and are collected in publicly available databases (such as GO and KEGG). These sets of genes, whose function is already known, can be used to characterize and interpret ("enrich") the results of new experiments. This characterization is typically done by means of gene enrichment analysis (GEA) tests, which compare gene expression levels between two conditions (experimental and control) and detect functional sets of genes that are activated or repressed in the experimental condition. The power of GEA tests is often low, mostly because they consider only the overlap between sets of genes, and ignore the associations and dependences that exist between genes.

This paper was published as part of the proceedings of the 31st International Workshop on Statistical Modelling, INSA Rennes, 4–8 July 2016. The copyright remains with the author(s). Permission to reproduce or extract any parts of this abstract should be requested from the author(s).
Recently, Alexeyenko et al. (2012) and McCormack et al. (2013) have proposed to integrate GEA with information on dependences between genes by making use of gene networks. The idea is that "enrichment" between two sets of genes $A$ and $B$ can be assessed by comparing the number of links connecting nodes in $A$ and $B$, $n_{AB}$, with a reference distribution that assumes that no relation exists between the two sets. Their tests rely on a normal approximation to the reference distribution (which is discrete), require the computation of many network permutations (which can be highly time consuming), and are restricted to the analysis of undirected networks.
In the sequel we propose NEAT, an alternative Network Enrichment Analysis Test based on the hypergeometric distribution. The assumption that, in the absence of enrichment, $N_{AB}$ follows a hypergeometric distribution arises quite naturally, and enables us to avoid normal approximations and network permutations. We develop NEAT not only for undirected, but also for directed and mixed networks, thus providing a common framework for the analysis of different types of networks.
2 Methods
A graph is a pair $G = (V, E)$, which consists of a set of nodes $V$ connected by a set of directed or undirected edges $E \subseteq V \times V$. In gene regulatory networks each gene is represented as a node of the graph, and an edge between two nodes is drawn to signify dependence between the corresponding genes. In the inferred network, we expect that individual links may be slightly unstable and noisy. However, we do expect the inferred links to carry a signal of the relationships between functional gene sets. So, if there is a functional relationship (i.e., enrichment) between the functions described by the sets $A \subseteq V$ and $B \subseteq V$, then we expect the number of links between the two groups to be larger (or smaller) than expected by chance.
2.1 Directed and mixed networks
In directed networks, we assess the presence of enrichment from $A$ to $B$ by considering the number of arrows $n_{AB}$ going from genes in $A$ to genes belonging to $B$. The observed $n_{AB}$ can be thought of as a realization of the random variable $N_{AB}$, with expected value $\mu_{AB}$. We compare $\mu_{AB}$ with the number of arrows $\mu_0$ that we would expect to observe from $A$ to $B$ by chance, and test $H_0: \mu_{AB} = \mu_0$ versus $H_1: \mu_{AB} \neq \mu_0$. We say that there is enrichment from $A$ to $B$ if $\mu_{AB}$ is significantly different from $\mu_0$.
We use the hypergeometric distribution to model the null distribution of $N_{AB}$. The hypergeometric distribution models the number of successes in a random sample drawn without replacement: in our case, let us mark arrows that reach genes in $B$ as "successful", and the remaining ones as "unsuccessful". If there is no relation between $A$ and $B$, we can view the arrows that go out from genes in $A$ as a random sample without replacement from the population of arrows present in the graph, and $n_{AB}$ as the number of successes in that sample. Thus, the distribution of $N_{AB}$ when $H_0$ is true is
$$N_{AB} \sim \mathrm{hypergeom}(n = o_A,\ K = i_B,\ N = i_V), \qquad (1)$$
where the sample size $o_A$ is the outdegree of $A$, the number of successes in the population $i_B$ is the indegree of $B$, and the population size $i_V$ is the total indegree of the network. So, we expect $\mu_0 = o_A i_B / i_V$ to increase as the outdegree of $A$, or the indegree of $B$, increases. A toy example that explains the rationale behind NEAT is presented in Figure 1.
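The quantities in (1) can be read directly off an edge list. The following is a minimal Python sketch, not the neat package's implementation: it assumes the network is stored as a list of (tail, head) pairs, and the function name `directed_null_params` and the small example network are hypothetical.

```python
def directed_null_params(
    edges: list[tuple[int, int]], a: set[int], b: set[int]
) -> dict[str, float]:
    """Observed n_AB and the hypergeometric null parameters of eq. (1).

    Hypothetical helper: `edges` holds arrows as (tail, head) pairs.
    """
    o_a = sum(1 for tail, _ in edges if tail in a)  # outdegree of A: arrows leaving A
    i_b = sum(1 for _, head in edges if head in b)  # indegree of B: arrows entering B
    i_v = len(edges)  # total indegree: every arrow enters exactly one node
    n_ab = sum(1 for tail, head in edges if tail in a and head in b)
    mu0 = o_a * i_b / i_v  # mean of hypergeom(n = o_A, K = i_B, N = i_V)
    return {"n_AB": n_ab, "o_A": o_a, "i_B": i_b, "i_V": i_v, "mu0": mu0}
```

For the hypothetical network `[(1, 2), (1, 3), (2, 3), (3, 4), (4, 1), (4, 5), (2, 5)]` with $A = \{1, 4\}$ and $B = \{3, 5\}$, this gives $n_{AB} = 2$, $o_A = 4$, $i_B = 4$, $i_V = 7$ and $\mu_0 = 16/7 \approx 2.29$.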
Bearing in mind that, for a discrete test statistic $T$, the usual formula for p-values $p_1 = 2 \min[P_0(T \le t),\ P_0(T \ge t)]$ can exceed 1, we compute the p-value using
$$p = 2 \min[P_0(N_{AB} > n_{AB}),\ P_0(N_{AB} < n_{AB})] + P_0(N_{AB} = n_{AB}), \qquad (2)$$
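which equals $p_1 - P_0(T = t)$, i.e. it differs from $p_1$ by the term $P_0(T = t)$. Because $P_0(N_{AB} > n_{AB}) + P_0(N_{AB} < n_{AB}) + P_0(N_{AB} = n_{AB}) = 1$, the p-value in (2) can never exceed 1. A p-value close to 0 can be regarded as evidence of enrichment, because it entails that $n_{AB}$ is significantly larger or smaller than we would expect under $H_0$. For a given type I error $\alpha$, one can then conclude that there is enrichment from $A$ to $B$ if $p < \alpha$.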
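Equation (2) needs only the hypergeometric probability mass function, so it can be evaluated exactly without permutations. The sketch below is a minimal pure-Python illustration using `math.comb`; it is not the code of the neat package, and the function names `hypergeom_pmf` and `neat_pvalue` are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, n: int, K: int, N: int) -> float:
    """P(X = k) for X ~ hypergeom(sample size n, successes K, population N)."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0  # k lies outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def neat_pvalue(n_ab: int, n: int, K: int, N: int) -> float:
    """Two-sided p-value of eq. (2): 2 min[P0(X > n_ab), P0(X < n_ab)] + P0(X = n_ab)."""
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    p_less = sum(hypergeom_pmf(k, n, K, N) for k in support if k < n_ab)
    p_greater = sum(hypergeom_pmf(k, n, K, N) for k in support if k > n_ab)
    return 2 * min(p_less, p_greater) + hypergeom_pmf(n_ab, n, K, N)
```

With the values of the toy example in Figure 1 ($n_{AB} = 2$, $o_A = 5$, $i_B = 4$, $i_V = 15$), `neat_pvalue(2, 5, 4, 15)` returns $1452/3003 \approx 0.4835$, matching the reported $p = 0.48$.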
A mixed network is a network where both directed and undirected edges are present. A mixed network can be regarded as a directed network in which every undirected edge $v - w$ stands for two directed arrows, $v \to w$ and $w \to v$. NEAT adopts this convention for the analysis of mixed networks.
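The convention above amounts to a simple preprocessing step. This is a minimal sketch assuming that directed arrows and undirected links are kept in two separate lists of node pairs; the function name `mixed_to_directed` and the gene labels are hypothetical, and no deduplication is performed.

```python
def mixed_to_directed(
    arrows: list[tuple[str, str]], links: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Expand a mixed network into a directed one.

    Directed arrows are kept as they are; each undirected link v - w
    is replaced by the two arrows v -> w and w -> v.
    """
    directed = list(arrows)
    for v, w in links:
        directed.append((v, w))
        directed.append((w, v))
    return directed
```

For example, `mixed_to_directed([("g1", "g2")], [("g2", "g3")])` yields `[("g1", "g2"), ("g2", "g3"), ("g3", "g2")]`, after which the directed test of Section 2.1 applies unchanged.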
2.2 Undirected networks
When dealing with undirected networks, the presence of enrichment between $A$ and $B$ depends on the number of links $n_{AB}$ that connect genes in $A$ to genes in $B$. Here, there is no distinction between the indegree and the outdegree of a node, and it only makes sense to consider the degree of a node: thus, assumption (1) needs to be modified accordingly. Define the total degree of a set as the sum of the degrees of the nodes that belong to it: then, the null distribution is
$$N_{AB} \sim \mathrm{hypergeom}(n = d_A,\ K = d_B,\ N = d_V),$$
where $d_A$, $d_B$ and $d_V$ are the total degrees of the sets $A$, $B$ and $V$.
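For undirected networks the same computation uses total degrees instead of in- and outdegrees. The sketch below is a hypothetical helper, not the neat implementation; it assumes links are stored as unordered node pairs and that each link is counted once in $n_{AB}$ even when $A$ and $B$ overlap (this section does not specify how overlapping sets are handled).

```python
from collections import Counter


def undirected_null_params(
    links: list[tuple[int, int]], a: set[int], b: set[int]
) -> dict[str, float]:
    """Observed n_AB and the hypergeometric null parameters for undirected networks."""
    degree: Counter = Counter()
    for v, w in links:  # each link adds 1 to the degree of both endpoints
        degree[v] += 1
        degree[w] += 1
    d_a = sum(degree[v] for v in a)  # total degree of A (missing nodes count as 0)
    d_b = sum(degree[v] for v in b)  # total degree of B
    d_v = sum(degree.values())  # total degree of V = 2 * number of links
    n_ab = sum(1 for v, w in links if (v in a and w in b) or (v in b and w in a))
    mu0 = d_a * d_b / d_v  # mean of hypergeom(n = d_A, K = d_B, N = d_V)
    return {"n_AB": n_ab, "d_A": d_a, "d_B": d_b, "d_V": d_v, "mu0": mu0}
```

For the hypothetical links `[(1, 2), (2, 3), (3, 4), (1, 3)]` with $A = \{1\}$ and $B = \{3, 4\}$, this gives $n_{AB} = 1$, $d_A = 2$, $d_B = 4$, $d_V = 8$ and $\mu_0 = 1$.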
[Figure 1, panels (A) and (B): nodes 1 to 8]
FIGURE 1. A directed network with 8 nodes (A) and its bipartite representation (B). Suppose that one wants to know whether there is enrichment from set $A = \{1, 4\}$ to set $B = \{3, 5, 7\}$. There are 5 arrows going out from $A$, and 2 of them reach $B$. The whole network consists of 15 arrows, of which 4 reach $B$. Thus, $n_{AB} = 2$, $o_A = 5$, $i_B = 4$ and $i_V = 15$. The idea behind NEAT is that, if the 5 arrows going out from $A$ are a random sample (without replacement) from the population of 15 arrows present in the network, then the proportion of arrows from $A$ that reach $B$ should be close to the proportion of arrows reaching $B$ in the whole network. In this case, arrows going out from $A$ seem to reach $B$ more frequently (2 of 5, i.e. 40%) than arrows in the network as a whole do (4 of 15, about 27%). However, the test yields $p = 0.48$: the observed $n_{AB} = 2$ does not provide enough evidence to reject the null hypothesis that there is no enrichment from $A$ to $B$.
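As a hand check of the numbers in the caption of Figure 1, the null distribution (1) with $n = o_A = 5$, $K = i_B = 4$ and $N = i_V = 15$ can be tabulated exactly (here $N_{AB}$ takes values 0 to 4), and plugging it into (2) reproduces the reported p-value:

$$
\begin{aligned}
P_0(N_{AB} = k) &= \binom{4}{k}\binom{11}{5-k} \Big/ \binom{15}{5}, \qquad \binom{15}{5} = 3003,\\
3003 \cdot P_0(N_{AB} = k) &= 462,\ 1320,\ 990,\ 220,\ 11 \qquad \text{for } k = 0, 1, 2, 3, 4,\\
\mu_0 &= o_A i_B / i_V = 5 \cdot 4 / 15 \approx 1.33,\\
P_0(N_{AB} < 2) &= 1782/3003 \approx 0.593, \quad P_0(N_{AB} > 2) = 231/3003 \approx 0.077, \quad P_0(N_{AB} = 2) = 990/3003 \approx 0.330,\\
p &= 2 \cdot 231/3003 + 990/3003 = 1452/3003 \approx 0.484.
\end{aligned}
$$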
2.3 Software
NEAT is implemented in the R package neat, which is available on CRAN (Signorelli et al., 2016). neat allows the user to specify the network in different formats, and it includes a set of data and examples.
3 Simulations
We compare the performance of NEAT with the NEA test of Alexeyenko et al. (2012) and with the LP, LA, LA+S and NP tests of McCormack et al. (2013) by means of two simulations. We simulate two undirected random networks with 1000 nodes, whose degree distributions are a power law in simulation S1 and a mixture of Poisson distributions in simulation S2. We test enrichment between 50 sets of nodes, with cardinality ranging from 50 to 100 nodes. We modify the original networks to introduce enrichment between 100 pairs of these sets, by either increasing or reducing $n_{AB}$ by a proportion drawn uniformly between 10% and 50%. The results (see Table 1) show that, at the 5% level, the Kolmogorov-Smirnov test does not reject uniformity of the p-values in either simulation for NEAT and LA, and in one simulation for LA+S (S1) and NP (S2). For NEA and LP, instead, uniformity is rejected in both simulations. In both S1 and S2, NEAT has the highest discriminatory capacity (AUC) and is by far the fastest method, roughly 20 to 3500 times faster than the alternative tests.
TABLE 1. Results of simulations S1 and S2. Abbreviations: pKS denotes the p-value of the Kolmogorov-Smirnov test for $H_0: X \sim U(0, 1)$; AUC stands for "area under the ROC curve". Time is expressed in seconds.

          Simulation S1              Simulation S2
Test      pKS     AUC     Time       pKS     AUC     Time
NEAT      0.399   0.920      0.6     0.343   0.925      0.7
NEA       0.001   0.918   2125.4     0.024   0.912   2151.5
LP        0       0.908     28.6     0       0.904     44.7
LA        0.255   0.897     14.4     0.111   0.908     18.0
LA+S      0.409   0.913     21.8     0.024   0.910     27.6
NP        0.037   0.884     12.9     0.323   0.908     15.8
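Table 1 summarizes uniformity of the p-values through the p-value of a Kolmogorov-Smirnov test. As a hedged illustration of what that test measures, the sketch below computes only the KS distance $D$ between the empirical CDF of a set of p-values and the $U(0, 1)$ CDF; converting $D$ into pKS requires the KS null distribution, which statistics libraries provide (for example `scipy.stats.kstest`). The function name `ks_uniform_statistic` is hypothetical and this is not the code used to produce Table 1.

```python
def ks_uniform_statistic(pvalues: list[float]) -> float:
    """Kolmogorov-Smirnov distance D between the empirical CDF of `pvalues` and U(0, 1)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    xs = sorted(pvalues)
    m = len(xs)
    # D+ : largest amount by which the empirical CDF exceeds the uniform CDF
    d_plus = max((i + 1) / m - x for i, x in enumerate(xs))
    # D- : largest amount by which the uniform CDF exceeds the empirical CDF
    d_minus = max(x - i / m for i, x in enumerate(xs))
    return max(d_plus, d_minus)
```

For instance, `ks_uniform_statistic([0.1, 0.4, 0.7])` returns 0.3 (up to floating-point rounding): the largest gap occurs just after the last p-value, where the empirical CDF jumps to 1 while the uniform CDF is 0.7.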
4 Data analysis
After analysing gene expression patterns of the yeast Saccharomyces cerevisiae in response to different stressful stimuli, Gasch et al. (2000) inferred the existence of two sets of genes, collectively called the Environmental Stress Response (ESR), that constitute a coordinated, initial reaction to the emergence of any hostile condition in the cell. The original study made use of a GEA test to characterize the two sets. Here, we incorporate into the analysis known associations between genes, as represented in the YeastNet network (Kim et al., 2013). For lack of space, we do not show here the lists of enrichments detected by NEAT for the two ESR sets; however, these lists can be retrieved by running the example in the help page ?yeast of the R package neat (Signorelli et al., 2016). In short, NEAT detects most of the enrichments that were found in the original study for the two ESR sets; in addition, it unveils some further enrichments related to molecular transport and amino-acid biosynthesis for the set of induced ESR genes, which would be overlooked if functional couplings between genes were ignored.
5 Conclusion
Traditional gene enrichment analysis assesses enrichment between gene sets solely on the basis of the extent of their overlap. Network enrichment analysis is a powerful extension of traditional GEA tests, which makes use of genetic networks to integrate enrichment analyses with information on the associations and dependences that exist between genes.
We have developed NEAT, a test for network enrichment analysis that aims to overcome some limitations of the resampling-based tests of Alexeyenko et al. (2012) and McCormack et al. (2013). First of all, we believe that a normal approximation does not do justice to the discrete nature of $N_{AB}$. We have shown that this approximation can be avoided if one models $N_{AB}$ with the hypergeometric distribution. In addition, existing NEA tests require the computation of many network permutations: this operation can be highly time consuming, slowing down computations considerably. NEAT, instead, fully specifies the null distribution of $N_{AB}$ without resorting to permutations, thus speeding up the computation of the test. A further drawback of existing resampling-based tests is that they have been implemented only for undirected networks: we address this problem by proposing two different parametrizations for NEAT that take into account the different nature of directed and undirected edges.
The test is implemented in the R package neat, which is freely available on CRAN (Signorelli et al., 2016). Our simulations show that NEAT behaves well under the null hypothesis, and that it is more powerful and faster than existing NEA tests. Application to the Environmental Stress Response data shows that NEAT can detect most of the enrichments that were found with GEA methods, and unveils further enrichments that would be overlooked if dependences between genes were ignored. We believe that NEAT could constitute a flexible and computationally efficient test for network enrichment analysis. Potential applications of NEAT extend beyond gene regulatory networks, and include social networks, brain networks and other situations where one attempts to understand the relation between groups of vertices in a network.
References
Alexeyenko, A., Lee, W., Pernemalm, M., Guegan, J., Dessen, P. et al. (2012). Network enrichment analysis: extension of gene-set enrichment analysis to gene networks. BMC Bioinformatics, 13:226.
Gasch, A. P., Spellman, P. T., Kao, C. M., Carmel-Harel, O., Eisen, M. B. et al. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell, 11(12), 4241-4257.
Kim, H., Shin, J., Kim, E., Kim, H., Hwang, S. et al. (2013). YeastNet v3: a public database of data-specific and integrated functional gene networks for Saccharomyces cerevisiae. Nucleic Acids Research, 1-13.
McCormack, T., Frings, O., Alexeyenko, A., Sonnhammer, E. L. (2013). Statistical assessment of crosstalk enrichment between gene groups in biological networks. PLoS One, 8(1):e54945.
Signorelli, M., Vinciotti, V., Wit, E. C. (2016). NEAT: efficient Network Enrichment Analysis Test. https://cran.r-project.org/package=neat.