Identification and analysis of evolutionarily cohesive
functional modules in protein networks
Mónica Campillos, Christian von Mering, Lars Juhl Jensen, and Peer Bork1
The European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
The increasing number of sequenced genomes makes it possible to infer the evolutionary history of functional
modules, i.e., groups of proteins that contribute jointly to the same cellular function in a given species. Here we
identify and analyze those prokaryotic functional modules whose composition remains largely unchanged during
evolution, and study their properties. Such “cohesive” modules have a large number of internal functional
connections, are encoded by genes that tend to lie in close proximity in prokaryotic genomes, and correspond to physical
complexes or complex functional systems like the flagellar apparatus. Cohesive modules are enriched in processes
such as energy and amino acid metabolism, cell motility, and intracellular trafficking and secretion. By grouping
genes into modules we achieve a more precise estimate of their age and find that the young modules are often
horizontally transferred between species and are enriched in functions involved in interactions with the environment,
implying that they play an important role in the adaptation of species to new environments.
[Supplemental material is available online at www.genome.org.]
Functional modules, groups of proteins that work together for
the same cellular function, have been described in a variety of
networks, e.g., as enzymatic pathways in metabolic networks
(Ravasz et al. 2002), as groups of interconnected proteins in pro-
tein interaction networks (Ravasz et al. 2002; Rives and Galitski
2003), or as closely linked clusters in predicted in silico protein
association networks (Snel et al. 2002b; von Mering et al. 2003a).
Functionally linked proteins have been shown to evolve together
(Pellegrini et al. 1999; Ettema et al. 2001, 2003) and proteins with
similar phylogenetic distributions are often components of the
same pathway (Huynen and Bork 1998; Marcotte et al. 1999;
Pellegrini et al. 1999, 2001; Wu et al. 2003). Although algorithms
that identify sets of genes with similar phylogenetic distributions
are able to reconstruct many known pathways (Pellegrini et al.
1999; Date and Marcotte 2003; Wu et al. 2003), a recent study of
the evolutionary modularity of several types of modules indicates
that they show only limited conservation during evolution (Snel
and Huynen 2004).
Nevertheless, some prokaryotic modules do show evolution-
ary cohesion (i.e., their components are frequently gained, trans-
ferred, or lost together); these are often conserved at the operon
level and frequently encode biosynthetic pathways (Snel and
Huynen 2004). Several hypotheses have been put forward to ex-
plain these observations, such as the notion of “selfish operons”
(Lawrence 1997), although only a few operons are stable over
very long evolutionary time scales (Itoh et al. 1999; Lathe III et al.
2000). There seem to be differences in the evolution of cohesive
modules, as some prokaryotic metabolic pathways show a broad
phylogenetic conservation (Peregrin-Alvarez et al. 2003), while
others are more restricted to specific groups of bacteria (Martin et
al. 2003). Some modules are more cohesive than others: Operons
coding for physical complexes such as ribosomal proteins, pro-
ton ATPases, and ABC-type membrane transporters show conser-
vation even at the level of gene order (Mushegian and Koonin
1996; Siefert et al. 1997; Dandekar et al. 1998; Wolf et al. 2001).
However, others have been subjected to more dynamic evolu-
tion, with frequent losses and gains of genes in different phylo-
genetic lineages (Tanaka et al. 2005).
The lack of a quantitative measure of cohesiveness has so far
prevented comparative analysis of the evolutionary properties of
functional modules. Here, we identify modules that appear to be
cohesive during evolution and perform a parsimony analysis to
determine when and why these modules appeared. We first study
topological and functional properties of these modules; we then
classify them as ancestral, intermediate, or young according
to their inferred first appearance during evolution, and study
functional characteristics of these age classes. Finally, we analyze
the horizontal transfer of cohesive modules in extant species and
the role of this transfer in the adaptation of species to environ-
Results and Discussion
Quantifying the evolutionary cohesiveness of functional modules
Defining functional modules
We have previously identified functional modules by clustering
neighbors in protein interaction networks (von Mering et al.
2003a). We used interaction networks that cover multiple organ-
isms at once, with each network node corresponding to an or-
thologous group of proteins (hereafter OG, see Methods). Edges
between the nodes represent functional associations, derived by
combining a variety of different protein interaction data includ-
ing experimentally verified interactions, predicted interactions
based on gene context methods such as gene neighborhood and
fusion, as well as interactions derived from text-mining analysis
(von Mering et al. 2005). As is generally the case for prokaryotes,
the biggest contributors of association information are chromo-
somal neighborhood and text-mining, but the other association
types contribute as well, sometimes even forming the majority of
the interactions in a module (Table S4).
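As an illustration of how the contribution of each association type to a module could be tallied, the following minimal Python sketch counts intra-module edges by evidence type. The function names, the OG identifiers, and the simplification that each edge carries a single evidence type are assumptions made for this example; they are not taken from the study.

```python
from collections import Counter

def evidence_breakdown(edges, module):
    """Count edges whose two endpoints both lie inside the module, by evidence type.

    edges  : iterable of (og_a, og_b, evidence_type) tuples (hypothetical format)
    module : iterable of OG identifiers belonging to one functional module
    """
    members = set(module)
    return Counter(ev for a, b, ev in edges if a in members and b in members)

def dominant_evidence(edges, module):
    """Return (evidence_type, fraction_of_intra_module_edges), or None if no edges."""
    counts = evidence_breakdown(edges, module)
    if not counts:
        return None
    evidence, n = counts.most_common(1)[0]
    return evidence, n / sum(counts.values())

# Toy data with made-up OG identifiers.
toy_edges = [
    ("OG_A", "OG_B", "neighborhood"),
    ("OG_B", "OG_C", "neighborhood"),
    ("OG_A", "OG_C", "textmining"),
    ("OG_C", "OG_D", "fusion"),  # OG_D is outside the module, so this edge is ignored
]
toy_module = ["OG_A", "OG_B", "OG_C"]
print(dominant_evidence(toy_edges, toy_module))
```

In this toy case, chromosomal neighborhood accounts for two of the three intra-module edges. In a real network an edge may be supported by several evidence types at once, which this sketch does not model.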
Functional modules derived this way have a high coverage
16:374–382 ©2006 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/06; www.genome.org
and accuracy when benchmarked against manually curated Esch-
erichia coli metabolic pathways (von Mering et al. 2003a) and
they cover a broad range of cellular functions. As they are com-
prehensive and unsupervised (i.e., largely objective), they form a
good basis for a systematic analysis of the evolution of prokaryotic
functional modules. Applying the concept to 102 prokaryotic species
with completely sequenced genomes, we identified a total of 1161
functional modules containing 3812 of the 9912 prokaryotic OGs
included in these species.
Tracing the evolutionary history of modules
We inferred, for each functional module, the most plausible evo-
lutionary history through parsimony analysis: In a phylogenetic
tree containing 110 species (86 bacteria, 16 archaea, and eight
eukaryotes), the presence or absence of each module component
was inferred for all ancestral nodes in the tree (Fig. 1A). The
evolutionary events that were modeled for this parsimony analy-
sis were (1) gene birth, (2) gene loss, and (3) gene acquisition. We
assigned relative costs to each type of event (see Methods) and
computed for each module which evolutionary scenario incurred
the lowest overall cost (using dynamic programming to screen
scenarios capable of explaining the present-day distribution of
the module). In the cost function, we assumed that multiple
events happening at the same time incurred a somewhat lower
cost than multiple independent events (as long as they were of
the same type; see Methods). Our approach rests on two implicit
assumptions: Proteins known to interact today are likely to have
also interacted in the past, and a certain degree of evolutionary
dependence between functional partners exists (hence
the lower cost for events happening simultaneously). The latter
assumption is not essential: All results reported below are also
observed when full evolutionary independence is assumed (Fig.
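The parsimony step described above can be illustrated with a minimal Sankoff-style dynamic programming sketch for a single gene on a rooted tree. This is a simplified stand-in rather than the procedure used in the study. The cost values are hypothetical (the actual costs are given in Methods), gene birth and gene acquisition are collapsed into a single "gain" cost, presence at the root is charged as one gain, and the discount for simultaneous events across module members is not modeled.

```python
from math import inf

# Hypothetical event costs, chosen only for illustration.
GAIN_COST = 2.0  # absent -> present (gene birth or acquisition, collapsed here)
LOSS_COST = 1.0  # present -> absent (gene loss)

def transition_cost(parent_state, child_state):
    """Cost of changing from the parent state to the child state along one branch."""
    if parent_state == child_state:
        return 0.0
    return GAIN_COST if child_state == 1 else LOSS_COST

def sankoff(tree, node, leaf_states):
    """Return [min cost if node is absent, min cost if node is present].

    tree        : dict mapping each internal node to a list of its children
    leaf_states : dict mapping each leaf (species) to 0 (absent) or 1 (present)
    """
    children = tree.get(node, [])
    if not children:
        observed = leaf_states[node]
        # A leaf can only take its observed state.
        return [0.0 if state == observed else inf for state in (0, 1)]
    costs = [0.0, 0.0]
    for child in children:
        child_costs = sankoff(tree, child, leaf_states)
        for state in (0, 1):
            costs[state] += min(
                transition_cost(state, child_state) + child_costs[child_state]
                for child_state in (0, 1)
            )
    return costs

def min_cost(tree, root, leaf_states):
    """Lowest total cost; presence at the root is charged as one gain (an assumption)."""
    absent, present = sankoff(tree, root, leaf_states)
    return min(absent, present + GAIN_COST)

# Toy tree: the gene is present in species A and B and absent in C and D.
toy_tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
toy_leaves = {"A": 1, "B": 1, "C": 0, "D": 0}
print(min_cost(toy_tree, "root", toy_leaves))
```

With these toy costs, the cheapest scenario is a single gain on the branch leading to the common ancestor of A and B. Extending the sketch to whole modules would require a cost function that rewards events of the same type occurring at the same node.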
Scoring and defining evolutionarily cohesive modules
Given the most parsimonious scenario for the evolution of the
genes in a module, we then asked to what extent the module was
“cohesive,” i.e., whether events involving one of the proteins
had an influence on other proteins in the module (more than
would be expected for random modules). We assessed how
many events were “joined” (i.e., proteins lost or acquired to-
gether, at the same node in the tree). Together with the cost
function, the “fraction of joined events” provides a measure de-
scribing the evolutionary history of each module. We compared
both measures to values derived from a conservative randomiza-
tion of modules (see Methods) and derived a single P-value
for each module. We chose this approach (i.e., combining parsi-
mony analysis with Monte Carlo P-value computation) because
it explicitly models evolutionary events against the backdrop
of the known species phylogeny, while at the same time it
provides a quantitative measure of cohesiveness that can be
used for ranking and comparing modules. At a cutoff of P < 0.01,
we found 472 of the 1161 functional modules to be cohesive,
in agreement with previous qualitative estimates of the
modularity of functional modules (Snel and Huynen 2004)
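The scoring idea can be sketched as follows: compute the fraction of joined events for a module, then compare it with values obtained from randomly assembled modules to obtain an empirical (Monte Carlo) P-value. This is a minimal sketch under stated assumptions. The event representation, the function names, and the uniform random sampling of genes are hypothetical. The study uses a conservative randomization described in Methods and combines two measures into a single P-value, neither of which is reproduced here.

```python
import random
from collections import Counter

def fraction_joined(events_by_gene):
    """Fraction of events shared by at least two module genes.

    events_by_gene : dict mapping gene -> set of (tree_node, event_type) pairs
    An event counts as "joined" when another gene in the module has the same
    event type at the same tree node.
    """
    counts = Counter(ev for events in events_by_gene.values() for ev in events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    joined = sum(n for n in counts.values() if n >= 2)
    return joined / total

def empirical_p_value(observed, null_values):
    """One-sided Monte Carlo P-value with the common +1 correction."""
    hits = sum(1 for value in null_values if value >= observed)
    return (hits + 1) / (len(null_values) + 1)

def random_module_null(all_events, module_size, n_draws, seed=0):
    """Null distribution from uniformly sampled gene sets (a simplifying assumption)."""
    rng = random.Random(seed)
    genes = sorted(all_events)
    null = []
    for _ in range(n_draws):
        sample = rng.sample(genes, module_size)
        null.append(fraction_joined({g: all_events[g] for g in sample}))
    return null

# Toy module: g1 and g2 are lost together at node n1; the other events are solitary.
toy_module = {
    "g1": {("n1", "loss"), ("n2", "gain")},
    "g2": {("n1", "loss")},
    "g3": {("n3", "loss")},
}
print(fraction_joined(toy_module))
```

Here two of the four events are joined. A larger number of random draws gives a finer-grained P-value, since the smallest attainable value is 1 / (n_draws + 1).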
Figure 1. Quantification of evolutionary cohesiveness. (A) A simplified example of the ancestral states of a cohesive functional module (a) and a
noncohesive module (b). The presence of a gene in a species or ancestral state is indicated by a black square. (B) The two evolutionary parameters are
plotted for prokaryotic functional modules and random modules. The “normalized total cost” and the “fraction of joined events” for the cohesive (a)
and noncohesive (b) modules of Figure 1A are indicated. (C) Distribution of total modules and evolutionarily cohesive modules (P < 10^-2), by module
support for the primacy of RNA. J. Mol. Evol. 45: 467–472.
Smejkal, C.W., Vallaeys, T., Seymour, F.A., Burton, S.K., and
Lappin-Scott, H.M. 2001. Characterization of (R/S)-mecoprop
[2-(2-methyl-4-chlorophenoxy) propionic acid]-degrading Alcaligenes
sp. CS1 and Ralstonia sp. CS2 isolated from agricultural soils. Environ.
Microbiol. 3: 288–293.
Snel, B. and Huynen, M.A. 2004. Quantifying modularity in the
evolution of biomolecular systems. Genome Res. 14: 391–397.
Snel, B., Bork, P., and Huynen, M.A. 1999. Genome phylogeny based on
gene content. Nat. Genet. 21: 108–110.
———. 2002a. Genomes in flux: The evolution of archaeal and
proteobacterial gene content. Genome Res. 12: 17–25.
———. 2002b. The identification of functional modules from the
genomic association of genes. Proc. Natl. Acad. Sci. 99: 5890–
Tanaka, T., Tateno, Y., and Gojobori, T. 2005. Evolution of vitamin B6
(pyridoxine) metabolism by gain and loss of genes. Mol. Biol. Evol.
Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A.,
Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova,
N.D., and Koonin, E.V. 2001. The COG database: New developments
in phylogenetic classification of proteins from complete genomes.
Nucleic Acids Res. 29: 22–28.
van der Meer, J.R., Werlen, C., Nishino, S.F., and Spain, J.C. 1998.
Evolution of a pathway for chlorobenzene metabolism leads to
natural attenuation in contaminated groundwater. Appl. Environ.
Microbiol. 64: 4185–4193.
van Nimwegen, E. 2003. Scaling laws in the functional content of
genomes. Trends Genet. 19: 479–484.
Veitia, R.A. 2002. Exploring the etiology of haploinsufficiency. Bioessays
von Mering, C., Zdobnov, E.M., Tsoka, S., Ciccarelli, F.D., Pereira-Leal,
J.B., Ouzounis, C.A., and Bork, P. 2003a. Genome evolution reveals
biochemical networks and functional modules. Proc. Natl. Acad. Sci.
von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel,
B. 2003b. STRING: A database of predicted functional associations
between proteins. Nucleic Acids Res. 31: 258–261.
von Mering, C., Jensen, L.J., Snel, B., Hooper, S.D., Krupp, M.,
Foglierini, M., Jouffre, N., Huynen, M.A., and Bork, P. 2005. STRING:
Known and predicted protein–protein associations, integrated and
transferred across organisms. Nucleic Acids Res. 33: D433–D437.
Wolf, Y.I., Rogozin, I.B., Kondrashov, A.S., and Koonin, E.V. 2001.
Genome alignment, evolution of prokaryotic genome organization,
and prediction of gene function using genomic context. Genome Res.
Wu, J., Kasif, S., and DeLisi, C. 2003. Identification of functional links
between genes using phylogenetic profiles. Bioinformatics
Yang, J., Lusk, R., and Li, W.H. 2003. Organismal complexity, protein
complexity, and gene duplicability. Proc. Natl. Acad. Sci.
Received June 24, 2005; accepted in revised form November 30, 2005.