Recurring cluster and operon assembly for Phenylacetate degradation genes.

Fergal Martin, James McInerney

Journal Article: BMC Evolutionary Biology (impact factor: 4.29). 03/2009; 9(1):36. DOI: 10.1186/1471-2148-9-36

BMC Evolutionary Biology (BioMed Central)
Research article, Open Access
Recurring cluster and operon assembly for Phenylacetate
degradation genes
Fergal J Martin and James O McInerney*
Address: Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
Email: Fergal J Martin - fergalmartin@gmail.com; James O McInerney* - james.o.mcinerney@nuim.ie
* Corresponding author
Abstract
Background: A large number of theories have been advanced to explain why genes involved in
the same biochemical processes are often co-located in genomes. Most of these theories have been
dismissed because empirical data do not match the expectations of the models. In this work we
test the hypothesis that cluster formation is most likely due to a selective pressure to gradually co-
localise protein products and that operon formation is not an inevitable conclusion of the process.
Results: We have selected an exemplar well-characterised biochemical pathway, the
phenylacetate degradation pathway, and we show that its complex history is only compatible with
a model where a selective advantage accrues from moving genes closer together. This selective
pressure is likely to be reasonably weak and only twice in our dataset of 102 genomes do we see
independent formation of a complete cluster containing all the catabolic genes in the pathway.
Additionally, de novo clustering of genes clearly occurs repeatedly, even though recombination
should result in the random dispersal of such genes in their respective genomes. Interspecies gene
transfer has frequently replaced in situ copies of genes resulting in clusters that have similar content
but very different evolutionary histories.
Conclusion: Our model for cluster formation in prokaryotes, therefore, consists of a two-stage
selection process. The first stage is selection to move genes closer together, either because of
macromolecular crowding, chromatin relaxation or transcriptional regulation pressure. This
proximity opportunity sets up a separate selection for co-transcription.
Published: 10 February 2009
Received: 6 August 2008
Accepted: 10 February 2009
BMC Evolutionary Biology 2009, 9:36 doi:10.1186/1471-2148-9-36
This article is available from: http://www.biomedcentral.com/1471-2148/9/36
© 2009 Martin and McInerney; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
The aerobic degradation of phenylacetic acid in E. coli K12 occurs via a series of five reactions, involving eleven catabolic paa genes [1], two of which are distant paralogs, with the rest showing no sequence homology (figure 1). The first step of the pathway is catalysed by the product of the paaK gene, a CoA ligase that converts phenylacetate into phenylacetyl-CoA. The second step involves a ring-oxygenase complex formed from the gene products of paaABCDE. This heteromer converts phenylacetyl-CoA into 2'-OH-phenylacetyl-CoA. The third step, where 2'-OH-phenylacetyl-CoA is converted to 3-hydroxyadipyl-CoA, is jointly catalysed by paaJ, paaG and paaZ. The fourth step sees the conversion of 3-hydroxyadipyl-CoA by paaF and paaH to β-ketoadipyl-CoA. The final step is catalysed by paaJ, which converts β-ketoadipyl-CoA to succinyl-CoA, thereby connecting phenylacetate degradation with the TCA cycle [1]. In addition to these 11 catabolic genes, E. coli K12 has 3 other paa genes: two of them (paaX and paaY) regulate the pathway, while the third (paaI) has an unknown function. Other E. coli strains, such as E. coli O157 and E. coli O73, do not have homologs of all 11 catabolic genes; no homologs of paaA, paaB, paaC, paaD, paaE, paaG or paaK were found in either of these two genomes. However, previous studies have identified homologs of paa genes in other bacteria, such as Pseudomonas putida U [2]. In addition to these 14 genes found in E. coli K12, a further three genes associated with the pathway were examined in this study: paaL and paaM, coding for a phenylacetic acid transporter protein and a phenylacetic acid specific porin respectively, and tetR, a transcription factor.
The genes involved in phenylacetate degradation in E. coli K12 and P. putida U are located in clusters [2,3]. In this study we define a gene cluster as a set of functionally related genes located in close physical proximity in a genome. The term operon refers to a set of genes under common regulatory control that are transcribed into a single mRNA and are all co-directional in orientation on the chromosome. An operon, therefore, is a more structured instance of a cluster: all operons are by definition also clusters, but not all clusters are operons. A gene cluster can consist entirely of independently transcribed genes, of multiple operons, or of a combination of both. Clusters and operons are observed in both prokaryotes and eukaryotes; however, operon processing in eukaryotes involves mRNA splicing and differs from the system in prokaryotes [4,5].
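As a minimal sketch of how these definitions separate, the Python below checks only the one operon criterion that can be read directly from gene coordinates: co-directional orientation. The `Gene` record, its fields and the coordinates are hypothetical, and a co-directional cluster is labelled only an "operon candidate", because common regulatory control and a single mRNA transcript cannot be established from positions alone.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record: name, start coordinate, strand (+1 or -1)."""
    name: str
    start: int
    strand: int


def is_codirectional(genes: Sequence[Gene]) -> bool:
    """True if every gene lies on the same strand.

    Under the definition above this is necessary, but not sufficient, for the
    genes to form a single operon.
    """
    return len({g.strand for g in genes}) == 1


def classify_cluster(genes: Sequence[Gene]) -> str:
    """Label an already-identified cluster by what its orientation allows."""
    if len(genes) < 2:
        raise ValueError("a cluster needs at least two genes")
    if is_codirectional(genes):
        return "operon candidate"
    # Mixed orientation rules out a single operon, although subsets of the
    # cluster may still form separate operons.
    return "cluster only"


# Hypothetical coordinates, for illustration only.
same_strand = [Gene("paaX", 1000, +1), Gene("paaY", 1900, +1)]
mixed = [Gene("paaZ", 5000, -1), Gene("paaA", 7200, +1)]
print(classify_cluster(same_strand))  # operon candidate
print(classify_cluster(mixed))        # cluster only
```

The check mirrors the asymmetry in the text: every operon passes it, but passing it does not make a cluster an operon.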
Clustering of genes involved in the same metabolic pathway is a widespread phenomenon [6-10], and the polycistronic operon is a paradigm of prokaryotic genomic biology [11]. However, the process of operon formation remains poorly understood, and the precise link between clustering and operon formation has never been fully explained, though several models exist.
The simplest model is the Natal Model, in which clusters form via tandem gene duplications [12]. However, many operons contain genes that are not homologous but have some kind of functional link, so as a general mechanism of operon formation the Natal Model is inadequate.
The Fisher Model postulates that clustering genes into operons is beneficial because random recombination events will separate co-adapted genes less often if they are clustered together. This model has recently been criticised because of observations of in situ orthologous replacement of operon genes [13,14], which suggest that the primary reason for operon formation is unlikely to be the preservation of co-adapted alleles.
The Co-regulation Model [15] states that operons are formed in order to facilitate the production of gene products in equal measure. This theory, however, accounts only for operon maintenance: for an operon to form spontaneously, rare, highly specific recombination events must occur. Nevertheless, it has recently been asserted that operon formation is driven by co-regulation [16]. This assertion rests largely on the more complex regulatory regions associated with operons in some gamma-proteobacteria, compared with genes that are not in operons. However, that study focused only on operons and not on the broader issue of cluster formation.
Figure 1. Schematic for the degradation of phenylacetate, including genes involved and the cluster and operon structures in E. coli K12 and P. putida KT2440.
The Selfish Operon Model (SOM) suggests that operons in prokaryotes are in some respects like viruses or transposons, and that their formation facilitates their horizontal gene transfer (HGT) [12]. The formation of an operon is therefore of no direct benefit to the organism, but it enhances the fitness of the gene cluster itself. An extension of the SOM posits that if HGT is indeed the main reason for operon formation, non-essential genes should be more likely than essential genes to be in operons or clusters [12]. However, Pál and Hurst [17] have provided evidence that essential genes are more likely than non-essential genes to be found in operons and clusters, which presents a significant problem for the SOM.
Lastly, it has recently been proposed that gene clustering is due to the relative difficulty of protein movement through the cellular matrix [18]. This model, known as the Protein Immobility Model (PIM), suggests that because transcription and translation are coupled in prokaryotes, the resulting physical proximity of enzymes minimizes the steady-state level of reaction intermediates, thereby saving energy and reducing the amount of protein that needs to be produced. The PIM has been supported by computer simulation but has not been tested using empirical data. An observation that indirectly supports the PIM is the study by Elowitz et al. [19], which showed that protein diffusion is slower through the cytoplasm than through water, is adversely affected by the size of the protein, and is also reduced when expression levels are higher.
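The PIM argument turns on how long a molecule takes to travel a given distance. For free three-dimensional Brownian motion the mean squared displacement is 6Dt, so the typical time to cover a distance r scales as r²/(6D). The sketch below uses this textbook relation to show why both a smaller diffusion coefficient D (slower cytoplasmic diffusion) and a greater separation lengthen transit times. The D values and distances are hypothetical placeholders, not figures from Elowitz et al. [19], and crowded cytoplasm makes the relation only a first approximation.

```python
def mean_diffusion_time(distance_um: float, diffusion_um2_per_s: float) -> float:
    """Typical time (s) to diffuse a distance r in 3D, from <r^2> = 6 D t.

    Assumes free Brownian motion; crowding enters only through a smaller
    effective diffusion coefficient D.
    """
    if distance_um < 0 or diffusion_um2_per_s <= 0:
        raise ValueError("distance must be >= 0 and D must be > 0")
    return distance_um ** 2 / (6.0 * diffusion_um2_per_s)


# Hypothetical values chosen only to illustrate the scaling.
D_FAST = 60.0  # um^2/s, illustrative "water-like" coefficient
D_SLOW = 6.0   # um^2/s, illustrative "cytoplasm-like" coefficient (10x slower)

for r in (0.1, 1.0):
    print(f"r={r} um: fast D -> {mean_diffusion_time(r, D_FAST):.2e} s, "
          f"slow D -> {mean_diffusion_time(r, D_SLOW):.2e} s")
```

A tenfold drop in D lengthens the transit time tenfold, while a tenfold increase in distance lengthens it a hundredfold, which is the kind of penalty that enzyme proximity would reduce under the PIM.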
Because paa genes show a patchy phylogenetic distribution and previously observed paa clusters have diverse structures that appear to be independent of the species phylogeny, we felt that this pathway was important to study from an evolutionary standpoint. Indeed, phenylacetate degradation has previously been identified as a potential model for understanding the evolution of metabolic pathways [20]. Examining the gene content of previously studied paa clusters associates a total of 17 genes with the pathway, including catabolic genes, regulatory genes, a transporter and an exporter. In this study we identify new paa gene clusters and examine the structure and distribution of paa gene clusters with respect to their evolution and their implications for models of both cluster and operon formation.
Methods
Homolog identification
We implemented an iterative strategy for locating homologs of all 17 genes encoding proteins involved in the degradation of phenylacetate. Initially, the genomes of taxa containing known paa gene clusters, previously reported in the literature, were downloaded from GenBank [21]. The list of genomes with known paa gene clusters used in the initial search can be found in the supplementary information http://bioinf.nuim.ie/supplementary/clusters/. We used a BLAST-based [22] similarity search strategy in which we extracted all the known paa genes from these initial genomes and used them to find homologs in other completed bacterial genomes. These additional bacterial genomes were downloaded from GenBank, bringing the total number of genomes in the dataset to 102.
We generated alignments using ClustalW 1.81 [23] for genes for which we found multiple homologs. The exceptions were paaL and paaM, which were found only in P. putida KT2440. This gave a total of 15 initial alignments. These alignments were then used as input for PSI-BLAST with the default parameters [22], searching the larger dataset of 102 bacterial genomes. This gave us a comprehensive list of homologs.
We wrote a number of PERL scripts (available on request) to cross-reference the result files generated from the PSI-BLAST searches and to identify clusters of genes from this pathway that were co-located on their respective genomes. If two genes found in the PSI-BLAST result files came from the same genome and had no more than five intervening genes between them, they were considered to be an initial linked pair. All initial linked pairs were identified and then manually merged if they overlapped. In this way, clusters of various sizes were identified.
Construction of phylogenetic trees
Each of the 15 gene families was used to build a phylogenetic tree. The amino acid sequences of all homologs were extracted from their genome files, and each family was aligned using Muscle v3.5 [24] with all settings at their default values. Model selection was performed on the alignments using ModelGenerator [25], and maximum likelihood phylogenetic trees were constructed under the selected models using Phyml v3.0 [26]. Confidence in phylogenetic hypotheses was assessed using the bootstrap resampling approach [27].
Visualisation of clusters on phylogenetic trees and operon identification
For each gene family, we wished to visualise both the relationships among members of the family and their cluster context simultaneously. Each gene cluster was visualised by extracting the necessary genomic location information for the cluster from the corresponding GenBank file. This was carried out automatically using PERL scripts. Once this information was parsed from the GenBank file, the corresponding cluster was drawn using the PostScript language (Adobe Systems, San José, California). If, for instance, a cluster contained the genes paaA and paaB, then this cluster would appear on the paaA tree at the phylogenetic position of the paaA gene and on the paaB tree at the phylogenetic position of the paaB gene. Operons were identified using the MicrobesOnline database http://www.microbesonline.org. Known operons were cross-checked against the predictions to assess the quality of the predictions in the database.
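The cluster-linking rule described under Homolog identification (two hits on the same genome with no more than five intervening genes form a linked pair, and overlapping pairs are merged) can be sketched as follows. This is an illustrative Python reimplementation, not the authors' PERL scripts: it assumes hits have already been mapped to gene-order indices on a single linear replicon, ignores wrap-around on circular chromosomes, and automates the merge step that the authors performed manually.

```python
from typing import List

MAX_INTERVENING = 5  # from the Methods: at most five intervening genes


def find_clusters(hit_positions: List[int],
                  max_intervening: int = MAX_INTERVENING) -> List[List[int]]:
    """Group homolog hits on one replicon into clusters.

    hit_positions are gene-order indices (0, 1, 2, ... along the replicon) of
    genes with a PSI-BLAST hit. On a linear gene order, merging all overlapping
    linked pairs is equivalent to chaining consecutive sorted hits whose gap is
    small enough; hits that link to nothing are not reported.
    """
    positions = sorted(set(hit_positions))
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in positions:
        # Number of genes strictly between the previous hit and this one.
        if current and pos - current[-1] - 1 <= max_intervening:
            current.append(pos)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [pos]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical hit positions on one genome.
print(find_clusters([3, 5, 11, 40, 47, 48, 100]))  # [[3, 5, 11], [47, 48]]
```

In the example, hits 40 and 47 are separated by six genes and so do not link, while 5 and 11 (five intervening genes) do.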
Results
In order to test whether a cluster has been independently assembled more than once, we can examine the phylogenetic trees of both cluster and non-cluster homologs. If a cluster has originated once and has never been subsequently perturbed, then for every gene in the cluster the corresponding phylogenetic tree will include a clade containing all the species in which the cluster is present. Given the prevalence of HGT [28], this clade does not have to correspond to any recognized phylogenetic group; the only relationships of importance are the relationships of the genes.
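The single-origin expectation above can be checked mechanically on each gene tree. The sketch below is not the authors' procedure: it represents a tree as nested Python tuples and interprets "a clade containing all the species in which the cluster is present" strictly, as a node whose descendant leaves are exactly the cluster-borne gene copies. Leaf labels such as `Ecoli_paaA` are hypothetical. Because maximum likelihood trees from PhyML are unrooted, `is_split` also accepts the complementary set under an arbitrary root.

```python
from typing import FrozenSet, Iterable, Tuple, Union

# A rooted tree as nested tuples; leaves are strings (hypothetical gene-copy labels).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf labels below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_clade(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if some node's descendant leaves are exactly `taxa` (strict monophyly)."""
    target = frozenset(taxa)
    if leaf_set(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, target) for child in tree)


def is_split(tree: Tree, taxa: Iterable[str]) -> bool:
    """Unrooted version: accept `taxa` or its complement as a clade."""
    target = frozenset(taxa)
    return is_clade(tree, target) or is_clade(tree, leaf_set(tree) - target)


tree = (("Ecoli_paaA", "Pputida_paaA"), ("Rsp_paaA", ("Nfar_paaA", "Ceff_paaA")))
print(is_clade(tree, {"Ecoli_paaA", "Pputida_paaA"}))  # True
print(is_clade(tree, {"Ecoli_paaA", "Rsp_paaA"}))      # False
```

Under this reading, a cluster that fails the check on any one of its gene trees cannot be explained by a single unperturbed origin.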
Variation in cluster and operon content and context
Table 1 summarises all 1,311 homologs identified via the PSI-BLAST searches in terms of the frequency with which they were found in a paa gene cluster and, if found in a cluster, how often they were in an operon. Figure 2 contains a complete list of all observed unique operons in our dataset. Together, these data detail the considerable variation across genes in their tendency to be found in a cluster or operon.
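A per-gene summary of the kind reported in Table 1 amounts to a tally over homolog records. The sketch below assumes a hypothetical record of (gene family, in cluster, in operon) for each PSI-BLAST homolog. The example records are constructed only to reproduce the paaK counts stated in the text (37 homologs, 21 in a cluster, 19 of those in an operon); they are not the authors' data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-homolog record: (gene family, found in a cluster?, found in an operon?)
Record = Tuple[str, bool, bool]


def tally(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Count, per gene family, all homologs, those in a cluster, and those in an operon."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total": 0, "in_cluster": 0, "in_operon": 0}
    )
    for family, in_cluster, in_operon in records:
        if in_operon and not in_cluster:
            # Every operon is also a cluster, so this record would be inconsistent.
            raise ValueError(f"{family}: operon member outside any cluster")
        row = counts[family]
        row["total"] += 1
        row["in_cluster"] += int(in_cluster)
        row["in_operon"] += int(in_operon)
    return dict(counts)


# 19 in an operon, 2 more in a cluster but not an operon, 16 outside clusters.
paaK_records = (
    [("paaK", True, True)] * 19
    + [("paaK", True, False)] * 2
    + [("paaK", False, False)] * 16
)
row = tally(paaK_records)["paaK"]
print(row)  # {'total': 37, 'in_cluster': 21, 'in_operon': 19}
print(f"in cluster: {row['in_cluster']}/{row['total']}, "
      f"in operon when clustered: {row['in_operon']}/{row['in_cluster']}")
```

The consistency check encodes the definition given in the Background: a gene cannot be in an operon with other paa genes without also being in a cluster.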
In the cases of paaA, B, C and D, the genes were always found in an operon and therefore, necessarily, always in a cluster. paaE was found with other paa genes in 19 out of 23 instances. paaI was always found in a cluster (19 occasions) and, the majority of times (16 out of 19), in an operon. Similarly, paaX and tetR were found relatively rarely (13 and 14 times respectively); they were usually found in clusters (11 out of 13 for paaX, 12 out of 14 for tetR), and in 7 instances each they were in operons. paaG was found 23 times, 17 times in a cluster, and on 16 of those 17 occasions it was in an operon. paaK was found 37 times; in slightly more than 50% of the instances (21 of 37) it was in a cluster, and in the majority of those cases (19 of 21) it was in an operon. The remaining five genes, paaF, paaH, paaJ, paaY and paaZ, are more widely distributed, and the majority of these homologs are not found in clusters or operons. The gene least likely to be found in an operon is paaY, which is found in an operon in only 3 out of 84 instances. Interestingly, apart from paaA, B, C, D and E, where being in a cluster automatically means being in an operon, most other genes are found in an operon the majority of the times they are found in a cluster. The exception is paaZ: in 14 out of the 22 instances in which it is in a cluster, it is not in an operon.
Figure 2 shows the set of unique operons involving two or more paa genes found in all identified clusters. The most striking aspect of this analysis is the sheer diversity in size, gene content and gene order among the operons. A total of 33 different operons were identified, ranging in size from 2 to 11 genes. Of the 33 unique operons, only two display identical gene content: paaXY and paaYX. This diversity is not surprising from a mathematical standpoint, given that 17 genes were examined in the study: even for operons consisting of only 2 genes, there are 289 (17 × 17) possible ordered arrangements. Aside from the paaABCDE operon, which is clearly under strong selection (all 25 clusters form operons), no particular operon composition or configuration is dominant. This result seems to indicate that operon formation (apart from paaABCDE) is not dependent on the composition of the genes that are present. Operons seem to form simply when members of the pathway are present, and no single operon composition or order is obligatory.
Analysis of gene clusters containing all 11 catabolic paa genes
In order to establish how operons and clusters grow, we chose to focus on the largest clusters. We wished to analyse whether, for large clusters, there was selection to keep co-adapted alleles together. We identified five clusters in the dataset that were almost complete and were present in genomes that were not thought to be each other's closest relatives, as judged using phylogenetic supertree methods based on completed genomes [29]. These were the clusters found in E. coli, P. putida, Rhodococcus sp., Nocardia farcinica and Corynebacterium efficiens. The evolutionary history of these clusters was examined in detail; phylogenetic trees and additional data are available as supplementary information http://bioinf.nuim.ie/supplementary/clusters/.
Figure 1 shows the operon structures observed in E. coli and P. putida. In E. coli K12, all fourteen genes involved in the pathway are clustered together, and the cluster is broken into three operons [3]: paaABCDEFGHIJK are present in one operon, paaXY in another, and the paaZ gene is transcribed by itself.
Superficially, the cluster in P. putida is highly similar to the cluster in E. coli K12, with simple rearrangements of the order of blocks of genes accounting for the majority of the observed differences (see figure 1). In P. putida the gene cluster is arranged in five operons [3], with paaABCDE in one operon and paaFGHIJK in a second, whereas both are merged in E. coli. paaLM and an unrelated gene are in another operon, paaYX is in an operon (the order is reversed in E. coli), and paaZ is transcribed by itself within the cluster. The difference in gene content between the two clusters is the presence of paaL, a phenylacetic acid transporter, and paaM, a phenylacetic acid specific porin, along with an additional gene not known to be involved in phenylacetate degradation. paaL and paaM are present only in P. putida and in none of the other genomes studied.
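The comparison of the E. coli K12 and P. putida clusters can be made explicit by encoding the operon structures described above as lists and comparing gene content and shared-operon gene pairs. This is an illustrative sketch: "IG" is a placeholder for the P. putida gene not known to be involved in phenylacetate degradation, and its position within the paaLM operon is an assumption (it does not affect the results, which use sets).

```python
from itertools import combinations
from typing import FrozenSet, List, Set

# Operon structures as described in the text.
ECOLI_K12: List[List[str]] = [
    ["paaA", "paaB", "paaC", "paaD", "paaE", "paaF",
     "paaG", "paaH", "paaI", "paaJ", "paaK"],
    ["paaX", "paaY"],
    ["paaZ"],
]
P_PUTIDA: List[List[str]] = [
    ["paaA", "paaB", "paaC", "paaD", "paaE"],
    ["paaF", "paaG", "paaH", "paaI", "paaJ", "paaK"],
    ["paaL", "paaM", "IG"],  # "IG": placeholder for the unrelated gene
    ["paaY", "paaX"],
    ["paaZ"],
]


def gene_content(operons: List[List[str]]) -> Set[str]:
    """All genes in the cluster, ignoring operon boundaries and order."""
    return {gene for operon in operons for gene in operon}


def shared_operon_pairs(operons: List[List[str]]) -> Set[FrozenSet[str]]:
    """Unordered gene pairs that sit in the same operon."""
    return {frozenset(pair) for operon in operons for pair in combinations(operon, 2)}


only_putida = gene_content(P_PUTIDA) - gene_content(ECOLI_K12)
only_ecoli = gene_content(ECOLI_K12) - gene_content(P_PUTIDA)
split_in_putida = shared_operon_pairs(ECOLI_K12) - shared_operon_pairs(P_PUTIDA)

print(sorted(only_putida))   # ['IG', 'paaL', 'paaM']
print(sorted(only_ecoli))    # []
print(len(split_in_putida))  # 30
```

The 30 pairs are exactly those linking the paaABCDE and paaFGHIJK blocks (5 × 6), which share one operon in E. coli K12 but sit in separate operons in P. putida; the reversed paaXY/paaYX order does not register because the pairs are unordered.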
Figure 2. An exhaustive list of all observed operons in the dataset of 102 genomes examined. Each arrow represents a gene, with the name of the gene being given in the legend. I.G. refers to an intervening gene, which is a gene in the cluster that is not involved in the degradation of phenylacetate.
The phylogenetic analyses of the genes in these two clusters reveal a much greater degree of difference. We examined the phylogenetic trees for all genes in these clusters, expecting that the individual genes would be each other's closest relatives or at least reasonably closely related. We found that for the paaA, C, D, F, G, I, J, K and X genes the E. coli and P. putida copies did indeed group closely on a phylogenetic tree. By contrast, for paaB, E, H, Y and Z we see support for the separation of the two E. coli sequences from the P. putida sequences on their respective phylogenetic trees. This result indicates that orthologous gene displacement has replaced a considerable number of genes in these clusters since the clusters separated from their common ancestor. Given the compositional similar-