JOURNAL OF BACTERIOLOGY, Nov. 2010, p. 5788–5798
Copyright © 2010, American Society for Microbiology. All Rights Reserved.
Vol. 192, No. 21
Unexpected Abundance of Coenzyme F420-Dependent Enzymes in
Mycobacterium tuberculosis and Other Actinobacteria▿†
Jeremy D. Selengut‡ and Daniel H. Haft‡*
J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, Maryland 20850
Received 13 April 2010/Accepted 23 July 2010
Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses
of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis
demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F420,
which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative
genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F420
biosynthesis nominated many actinobacterial proteins as candidate F420-dependent enzymes. Three such
families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5′-phosphate oxidase
(PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be
limited to F420-producing species. The LLM and PPOX families were observed in F420-producing species as
well as species lacking F420 but were particularly numerous in many actinobacterial species, including M.
tuberculosis. Partitioning the LLM and PPOX families based on an organism’s ability to make F420 allowed the
application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to
identify F420-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding
sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F420-dependent
enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in
these families. While prevalent in mycobacteria, markers of F420 biosynthesis appeared to be absent from the
normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F420 for its
redox reactions. This dependence and the cofactor’s rarity may make F420-related proteins promising drug
targets.
Mycobacterium tuberculosis, the causative agent of tubercu-
losis (TB), is an actinobacterium that presents a number of
clinical challenges. For example, due to the high frequency of
drug-resistant mutants, TB antibiotic regimens require long
courses of treatment and a combination of three or more
separate drugs (37). Long courses of combination therapy con-
tribute to noncompliance, which in turn has led to an increase
in the occurrence of multiple-drug-resistant (MDR) and ex-
tensively drug-resistant (XDR) tuberculosis (39). There is a
clear need for additional tuberculosis-specific drugs that, in
combination with the current pharmacopeia, can shorten the
course of treatment and increase its effectiveness.
Biological features that are present in mycobacteria but rare
or absent in other organisms are useful targets for treating TB.
For example, mycobacteria have “mycolic” fatty acids present
in their cell walls that distinguish them from all other bacteria.
Four major anti-TB drugs (isoniazid, cycloserine, ethambutol,
and ethionamide) are known to target enzymes involved in the
biosynthesis of the mycobacterial cell wall, and others, such as
pyrazinamide, caprazamycin, and caprolactams, may do so as
well (34, 43).
Similarly, the enzyme cofactor F420 (Fig. 1), a deazaflavin
analog of flavin mononucleotide (FMN), is absent from humans
but distributed sporadically and sparsely among prokaryotes
and observed universally in the mycobacteria (including
being encoded by the reduced genome of Mycobacterium
leprae). It has been suggested that the reduced F420 (F420H2)
produced by the action of the F420-dependent glucose-6-phosphate
dehydrogenase (4) under aerobic conditions may protect
mycobacterial cells from macrophage-generated NO2 (31).
Moreover, Rv3547 from M. tuberculosis uses reduced F420 in
the activation of the NO2-containing antitubercular drug candidate
PA-824 (40). Overall, F420 may confer an advantage to
mycobacteria in anaerobic environments because it has a lower
redox potential than NADP (5).
The sporadic phylogenetic distribution of F420 provides an
opportunity for the application of comparative genomic meth-
ods. We introduced partial phylogenetic profiling (PPP) to
efficiently discover protein families codistributed with such pat-
terns of biological traits (21). Unlike earlier profiling methods,
PPP does not require the prior accurate determination of pro-
tein families for success. This method is well suited to the
identification of F420-dependent enzyme families, which may
have distributions only partially spanning the entire profile.
PPP analysis is further augmented by SIMBAL (sites inferred
by metabolic background assertion labeling) (36). This technique
can pinpoint sites discriminating F420 binding from FMN
binding and subsequently identify additional correlated genes
that are undetectable by PPP.
Here we demonstrate how comparative genomics, namely,
profiling, can strongly associate sets of genes in a particular
genome of interest with a biologically important trait, generating
numerous experimentally testable hypotheses. This analysis
has indicated a pervasive and presumably important feature
of M. tuberculosis and its lifestyle. The lack of F420-based
reactions in humans or their associated gut flora and their
prevalence in M. tuberculosis may provide another drug target.
* Corresponding author. Mailing address: J. Craig Venter Institute,
9704 Medical Center Drive, Rockville, MD 20850. Phone: (301) 795-
7952. Fax: (301) 795-7060. E-mail: email@example.com.
† Supplemental material for this article may be found at http://jb
‡ J.D.S. and D.H.H. contributed equally to this work.
▿ Published ahead of print on 30 July 2010.
MATERIALS AND METHODS
Data sets. A total of 1,451 bacterial and archaeal complete and draft genomes
were downloaded from the NCBI on 1 June 2009. An all-versus-all BLAST
calculation was performed on the protein coding sequences from all of these
genomes, and the E value results were stored as a flat file.
Construction of a coenzyme F420 utilization phylogenetic profile. We applied
hidden Markov models (HMMs) for coenzyme F420 biosynthetic genes (Table 1),
using the HMMER 2.0 package (biosequence analysis using profile hidden
Markov models [http://hmmer.janelia.org]), to the genomic data set (described
above) and identified those genomes where genes for at least four of five F420
biosynthesis components were detected. Genomes containing these F420 biosynthetic
genes were placed in the positive branch of the profile. Those containing
no detectable components were placed in the negative branch. Organisms with
intermediate content were assigned as follows. A set of cyanobacterial genomes
encoded only the Fo synthase subunits (CofGH) and were placed in the negative
set, since they are known to make only Fo, which is used as a distinct cofactor
(14). A separate set of genomes contained CofCDE genes but lacked the Fo
synthase subunit gene (e.g., Xylanimonas cellulosilytica DSM 15894). PPP anal-
yses (data not shown, but essentially the same as the results shown in Table S2
in the supplemental material) indicated that these genomes encode the same
families of F420-dependent enzymes as those with the complete biosynthesis
pathway. This suggests that either a separate, nonhomologous Fo synthase or an
Fo importer system exists in these organisms. Despite failing to identify candidates
for these potential functions, these genomes were included in the positive
branch of the profile. A final group of genomes contained only close homologs
of the terminal enzyme CofE (e.g., Jonesia denitrificans DSM 20603). As described
above, these genomes carried genes for putative F420 binding enzymes
(sometimes adjacent to cofE, as in sequence EEN39210.1 from Cellulomonas
flavigena DSM 20109) as well as an ABC-type transporter gene often found
adjacent to the cofE gene (in J. denitrificans sequences EEJ12725.1, -6.1, and
-7.1), and therefore a candidate Fo transporter gene. Accordingly, these genomes
were also counted among the F420-positive set.
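The assignment rules described above can be summarized in a short sketch. This is an illustration only, not the authors' code: the function name `classify_genome`, the component labels, and the decision to return `None` for combinations the text does not describe are all assumptions. A fused FbiC protein is assumed to be reported by the caller as both CofG and CofH.

```python
# Hypothetical sketch of the profile-assignment rules; names are illustrative.
COMPONENTS = frozenset({"CofC", "CofD", "CofE", "CofG", "CofH"})
FO_SYNTHASE = frozenset({"CofG", "CofH"})


def classify_genome(detected):
    """Return 1 (F420-positive), 0 (negative), or None (case not described)."""
    found = COMPONENTS & set(detected)
    if len(found) >= 4:
        return 1  # at least four of five biosynthesis components
    if not found:
        return 0  # no detectable components
    if found <= FO_SYNTHASE:
        return 0  # Fo synthase only (the cyanobacterial case)
    if {"CofC", "CofD", "CofE"} <= found:
        return 1  # CofCDE without a detectable Fo synthase
    if found == {"CofE"}:
        return 1  # CofE only, with a candidate Fo transporter
    return None  # assumption: the text gives no rule for other combinations
```

For example, `classify_genome(["CofG", "CofH"])` returns 0, while `classify_genome(["CofC", "CofD", "CofE"])` returns 1.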
PPP. Phylogenetic profiling (PP) methods allow the discovery of biological
features (usually protein families) codistributed with traits observed in defined
sets of organisms. PP methods often are limited by a dependence on protein
families of a fixed size, whether preconstructed or generated at run time by use
of static parameters. Such preconstructed families are often too large or too
small for the comparison at hand. PPP differs from other profiling methods in its
independence from precalculated protein clusters. In PPP, clusters of increasing
size are generated on the fly for each query protein by selecting increasingly
permissive sequence similarity cutoffs. These clusters are then compared with the
reference profile under study; PPP returns both an optimized score and the
correspondingly optimized protein cluster. The statistical procedures utilized by
PPP may be tuned to enable the identification of proteins that have distributions
that are strict subsets of a reference profile, as F420-dependent enzymes are
expected to be when compared with an F420 utilization profile.
Here we identified F420-utilizing enzymes in mycobacteria based on the observed
pattern (profile) of the F420 biosynthesis trait. A phylogenetic profile was
determined from the above results, with genomes in the positive branch repre-
sented by 1’s and those in the negative branch represented by 0’s. Profiling was
carried out as previously reported (21). For each gene in the genome of interest,
a ranked list of all BLAST matches was prepared. Processing from the strongest
hit downwards, the genomic source of each hit was examined. The genome of
interest itself and all but the first hit to other genomes were ignored. Addition-
ally, a taxonomic filter was applied to limit sample bias caused by a superabun-
dance of genomes from certain species in the data set: once a hit from a certain
species was found, all other hits to that species were ignored. As each new
genome was identified (whether marked as a “1” or a “0” in the profile), the
likelihood, based on the proportion (P) of 1’s in the profile (or a manually set
value), that the total observed number of 1’s thus far might have occurred by
chance was calculated using the binomial equation. As discussed above, P was set
to 0.3 for this study, to accentuate families whose members form strict subsets of
the positive branch. As the list was processed, the lowest likelihood (most
significant) score achieved was recorded for each gene in the genome. After all
genes in the genome were processed, those having achieved the most significant
overall scores were reported in rank order.
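The scoring loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the names `binomial_tail` and `ppp_score` and the input structures are hypothetical, and the "binomial equation" is assumed here to mean the upper-tail probability of seeing at least the observed number of 1's.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); assumed form of the chance likelihood."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ppp_score(ranked_hits, profile, species_of, query_genome, p=0.3):
    """Walk BLAST hits from strongest to weakest and keep the best score.

    ranked_hits: genome IDs of the hits, strongest first (hypothetical input).
    profile: genome ID -> 1 (positive branch) or 0 (negative branch).
    species_of: genome ID -> species name, used for the taxonomic filter.
    Returns (lowest likelihood, number of genomes counted at that point).
    """
    seen_species = set()
    n = k = 0
    best_score, best_depth = 1.0, 0
    for genome in ranked_hits:
        if genome == query_genome:
            continue  # ignore hits to the genome of interest itself
        species = species_of[genome]
        if species in seen_species:
            continue  # only the first hit per species is counted
        seen_species.add(species)
        n += 1
        k += profile[genome]
        score = binomial_tail(k, n, p)
        if score < best_score:
            best_score, best_depth = score, n
    return best_score, best_depth
```

The depth returned alongside the score corresponds to the size of the optimized cluster, in the sense that it marks how far down the hit list the best score was reached.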
Protein family HMM building. Sequences suggested by PPP to be linked to
F420 metabolism were considered for construction of defining protein families
based on full-length multiple sequence alignments. Those proteins not directly
associated with known steps in F420 biosynthesis belonged mostly to the luciferase-like
monooxygenase (LLM), pyridoxamine 5′-phosphate oxidase (PPOX),
and deazaflavin-dependent nitroreductase (DDN) domain families, each of
which is known to contain members binding at least one type of flavonoid
cofactor. Sets of candidate orthologs occurring in multiple F420-biosynthesizing
species, as identified by bidirectional best-hit relationships and essentially full-
length homology, were aligned first by Muscle (13), manually inspected, and then
trimmed and realigned as necessary to produce seed alignments for HMM
construction. Criteria such as the proper alignment of known conserved motifs,
stability of alignments to realignment after trimming or after the addition or
removal of sequences, and the absence of especially long branch lengths in
computed phylogenetic trees were assessed manually to judge alignment quality
and to select the most accurately constructed alignments. If Muscle alignments
were deemed suspect, alignment using Clustal W (24) was attempted. Deep
separation of clades in neighbor-joining (NJ) trees, differences in domain archi-
tecture, and sharp drops in sequence similarity scores were taken as indicators of
splits between distinct subfamilies to guide HMM construction. Cutoff scores
were set for the resulting HMMs to select only full-length homologs from F420-synthesizing
species. Families with no more than one hit per genome were
designated putative equivalogs, that is, were hypothesized to share a specific
function even though that function is unknown. HMMs identifying multiple
proteins per genome were designated instead as subfamily models. All models
described in this work were included in TIGRFAMs (35), release 9.0 (http://www
FIG. 1. Flavonoid cofactor structures. (A) FMN. (B) Coenzyme F420. Note that coenzyme F420 typically contains 5 to 7 side chain glutamate
residues in mycobacterial species (3).
TABLE 1. Components of the coenzyme F420 biosynthetic pathway
and their TIGRFAMs HMMs
VOL. 192, 2010 COENZYME F420-DEPENDENT ENZYMES IN MYCOBACTERIA 5789
SIMBAL. SIMBAL can be used to gain insight into the molecular mechanisms
underlying the associations between proteins and traits that are discovered by
PPP. In this study, families that include both FMN-binding and F420-binding
members may illuminate molecular details of cofactor binding sites and provide
convenient classifiers for the identification of F420-dependent enzymes. True and
false training sets were constructed by partitioning members of the LLM (Pfam
accession no. PF00296) and PPOX (Pfam accession no. PF01243) families based
on their genomes of origin, using the same profile as that for PPP. This method
generates a noisy true set containing a population of false-positive results, i.e.,
FMN-binding LLM or PPOX proteins present in F420-producing organisms
(since FMN is universal). In the case of the LLM study, a cleaner true set was
generated by collecting all members of the presumptive F420-specific families
modeled by the HMMs in Table 2.
For each query protein, subsequences of the indicated lengths were generated
by scanning appropriately sized windows over the entire sequence. SIMBAL was
carried out as previously published (36). Each subsequence was used as a BLAST
query versus a combined database of the true and false training sets. In a manner
analogous to that for PPP, subsequences were scored based on the preponder-
ance of hits to the true partition, using the binomial equation. SIMBAL scores
are reported as log likelihood values. Longer subsequences may include multiple
short regions of correlation and thus will tend to have significant scores. Locally
important subsequence regions will stand out above this background and may
even outscore the full-length sequence. This increasing background can be re-
moved, and the localization of the SIMBAL signal accentuated, by dividing the
SIMBAL score by the window length.
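The window scan and length normalization described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the BLAST step is left out (the counts of hits are passed in directly), and the score is assumed to be the negative base-10 logarithm of a binomial upper-tail probability, so that larger values are more significant. The source states only that scores are log likelihood values.

```python
from math import comb, log10


def windows(sequence, length):
    """Yield (start, subsequence) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


def simbal_score(n_true, n_hits, p_true):
    """Assumed score: -log10 of P(X >= n_true) for X ~ Binomial(n_hits, p_true).

    n_true: hits falling in the true partition; n_hits: hits examined;
    p_true: fraction of the combined database in the true partition.
    """
    tail = sum(
        comb(n_hits, i) * p_true**i * (1 - p_true) ** (n_hits - i)
        for i in range(n_true, n_hits + 1)
    )
    return -log10(tail)


def normalized_score(score, window_length):
    """Divide by window length to remove the length-dependent background."""
    return score / window_length
```

Dividing by the window length is what lets a short, strongly correlated region stand out against longer windows that accumulate score simply by spanning several weakly correlated regions.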
Determination of the set of prokaryotic coenzyme F420 utilizers.
In order to apply profiling methods to the study of F420
in mycobacteria, we constructed a profile of F420 utilization
over all bacterial genomes. The most straightforward marker
of F420 utilization is the biosynthesis of F420 itself. The pathway
for F420 biosynthesis has been elucidated and proceeds from
two compounds, 5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione
and 4-hydroxyphenylpyruvate. These compounds
are intermediates in the biosyntheses of FMN and tyrosine,
respectively, and are condensed by Fo synthase: two subunits,
designated CofG and CofH, are often observed, but these are
fused into one protein (FbiC) in mycobacteria (19). The mature
cofactor is subsequently produced by two enzymes, CofD
(FbiA) and CofE (FbiB). These attach a phospho-L-lactate
group and a variable number of glutamate residues (generally
five in the case of actinobacteria). The activated precursor of
the phospho-L-lactate group, lactyl-2-diphospho-5′-guanosine
(LPPG), is made by the CofC protein.
To accurately detect these genes in genomes, we utilized
HMMs. Equivalogs are protein families with conserved molec-
ular function since their last common ancestor (20). Equivalog
HMMs have been built for each F420 biosynthetic enzyme and
are included within the TIGRFAMs library (35). These in turn
have been combined to form a Genome Property (35) for F420
biosynthesis which can be used to conveniently identify the
presence of this set of genes in any prokaryotic genome (Table 1).
We applied these HMMs to a set of all 1,451 bacterial and
archaeal genomes available (at the time of this work) from the
NCBI in order to determine which contained the essential
components of the F420 biosynthesis trait. Details of this procedure
are presented in Materials and Methods. We identified
11% of all species in our sample as F420 producers (including
about 50% of all actinobacterial species [see Table S1 in the
supplemental material]).
Identification of candidate F420-associated proteins by phylogenetic
profiling. Using PPP on this F420 producer profile in
its default mode, the program provided results optimized to
identify proteins present in all F420-producing species by setting
a probability value equal to the 11% of genomes in the
positive branch of the profile. Here we were interested in
families of F420-dependent enzymes which may be present only
in a subset of the positive branch but only very rarely, if at all,
in the negative branch. This makes sense for particular F420-dependent
enzymes, whose functions might not be required in
every F420-producing organism. By forcing the probability variable
to have an arbitrary value of >11%, which had the effect
of penalizing hits to the negative branch more severely, higher
scores were obtained for families strictly limited to subsets of
the positive branch of the profile, even if those families are far
from universal across those genomes.
TABLE 2. Phylogenetic distribution analysis of subfamilies within the PF00296 LLM family
M. tuberculosis member(s)
(no. found in M. smegmatis)
Member(s) identified by PPP
(no. found in M. smegmatis)
Likely F420-dependent families
Ac, Ch, Pr
Ac, Ch, Pr, Ar
Ac, Ch, Pr
Ac, Ch, Pr, Ba
Rv0791c, Rv0940c, Rv0953c, Rv2161c, Rv3079c (13)
Rv2161c, Rv3079c (8)
Others Rv0132c, Rv2951c, Rv1936, Rv3618 (21) Rv0132c, Rv2951c (7)
Likely FMN-dependent families
a Ac, Actinobacteria; Ch, Chloroflexi; Pr, Proteobacteria; Ba, Bacteroides; Ar, Archaea.
5790 SELENGUT AND HAFT J. BACTERIOL.
We first applied the PPP algorithm, using the above profile
and a P value of 30%, to the genome of Mycobacterium smeg-
matis MC2 155. Aside from the biosynthetic genes for CofC
(MSMEG_2393), CofD (fbiA; MSMEG_1830), CofE (fbiB;
MSMEG_1829), and the fusion protein CofGH (fbiC;
MSMEG_5126), 62 of the top 63 hits are members of only
three homology families (see Table S1 in the supplemental
material), each of which is known to include flavin cofactor
binding proteins. Primary among these, with 44 prominent hits
and the widest species distribution, was the LLM (Pfam model
PF00296) family. (This model was recently updated to version
13 in release 24.0 of Pfam, correcting serious deficiencies of
sensitivity; it requires version 3.0 of HMMER to run, a version
of which is available for download from http://hmmer.janelia.org/.)
Several LLM family proteins, particularly those found in
archaea, are known to be F420-dependent enzymes (1, 2, 4),
and one has been characterized as requiring FAD (8), but
most, including luciferase itself, utilize FMN (6, 22, 26, 41).
Second most prominent among the PPP results, with nine hits,
was the DDN family (represented by TIGR00026 and InterPro
accession no. IPR004378). The only characterized member of
this family is F420 dependent (40). Third, with five prominent
members, was the PPOX family (represented by Pfam model
PF01243). There are no known F420-dependent PPOX family
members, although several FMN-dependent enzymes are
known: PPOX (also called PdxH), an FMN binding protein
of unknown function (23), and PhzG, an enzyme involved
in the biosynthesis of phenazine (28). Tellingly, a crystal struc-
ture of the M. tuberculosis Rv1155 protein, a member of the
PPOX family, was noted to have a much altered flavin binding
site, consistent with its apparent lack of FMN binding; it was
hypothesized that a novel binding capability evolved in this and
related enzymes (7).
The DDN family is observed exclusively in F420-producing
species. The other two families showed an excess in the number
of genes found per genome for F420 producers among
actinobacteria (Fig. 2) and in general. These data suggest that
in these families and in these genomes, a notable expansion has
occurred. F420, with its lower reduction potential, gives access
to an increased range of chemical transformations; presum-
ably, the large number of family members in some genomes
indicates a diverse group of available reactions. These three
families account for 32 genes in M. tuberculosis and 123 genes
in M. smegmatis. Although PPP was able to identify certain
members of the LLM and PPOX families as F420 correlated, it
remains to be determined how many and which of the many
uncharacterized members of these families are F420 binding.
LLM family. Although PPP is clearly a very efficient method
for determining candidates for association with a profile, it can
both fail to identify all such related proteins and indicate oth-
ers that are correlated for indirect or fortuitous reasons. PPP
relies on the ordering (not the strength) of BLAST hits and
generally ignores the relatedness of species in the BLAST list
to the query genome. We attempted to explore the preponderance
of LLM homologs among the top hits by PPP for the
F420 biosynthesis profile by producing estimated phylogenetic
trees. Sequence diversity within the LLM family is so great that
it is not clear that a tree produced from its multiple sequence
alignment is sufficiently trustworthy to estimate a molecular
phylogenetic tree of the entire family. Nevertheless, such a tree
shows numerous clades of sequences limited to F420-producing
species (see Fig. S1 in the supplemental material). This tree
also shows nearly all of the F420-limited clades to be descen-
dant from a single ancestor (with possibly two instances of
reversion), but despite the parsimony and appeal of this inter-
pretation, we do not believe the tree in and of itself represents
strong evidence in support of that model. Consequently, we
constructed alignments from smaller sets of sequences, includ-
ing each of the LLM proteins identified in M. smegmatis and
M. tuberculosis, particularly those identified by PPP. Sets for
alignment were obtained by virtue of pairwise BLAST homol-
ogy to these Mycobacterium LLM proteins. In this way, we
identified 13 subfamilies (clades) that are each distributed
more widely than the genus Mycobacterium and are observed
only in F420-producing organisms. Each of these has been
modeled separately by an HMM included within the
TIGRFAMs library (Table 2) (35). Together, these models
identified 48 F420-dependent members of the total of 86 LLM
family genes in M. smegmatis and 13 of 17 LLM family genes in
M. tuberculosis. Included in this set of families is one,
TIGR03554, encompassing the characterized F420-dependent
glucose-6-phosphate dehydrogenase (29, 30). (Note that this
clade is in turn part of a larger “subfamily” [of LLM proteins]
that is modeled by TIGR03557. It includes a number of other
clades, all but one of which are limited to F420 producers. This
more broadly distributed clade, modeled by TIGR03885, appears
to have reverted to FMN binding or otherwise changed
its cofactor specificity and is not analyzed further here.)
FIG. 2. Average numbers of putative F420/FMN-binding protein
family genes in actinobacterial species. The presence of F420 biosynthesis
components is correlated with large expansions of these
Where the patterns made by these families over all F420-
producing organisms are sporadic or punctate in nature, they
are indicative of either lateral gene transfer or widespread
gene loss. For instance, TIGR03559 represents an LLM
enzyme of unknown function that is found in 90% of all
F420-producing actinobacterial species, once in the marine
gammaproteobacterium HTCC2143, and twice in the alpha-
proteobacterium Phenylobacterium zucineum HLK1. Either
these genes (and the associated F420 biosynthesis genes) were
selectively retained while being lost from the vast majority of
lineages derived from the last common ancestor of the acti-
nobacteria and proteobacteria, or they were laterally trans-
ferred to the two proteobacterial species from some other
F420-producing (likely actinobacterial) strain. Indeed, in P.
zucineum, these two LLM family genes encode the only can-
didate F420-dependent enzymes, and they are observed in an
operon with the F420 biosynthesis genes. It is clear that in the
rare instance where families contain members with such clear
evidence of en bloc lateral gene transfer of both enzyme and
F420 biosynthetic machinery genes, the connection to F420 can
be regarded as particularly strong.
At the other end of the spectrum are LLM family members
that are not identified by PPP and that belong to clades including
organisms (both within and outside the Actinobacteria)
that do not produce F420. It is reasonable to suppose that these
proteins are not F420 related. Several of these clades contain
members that have been characterized as FMN-utilizing en-
zymes, such as bacterial luciferase itself (16) and the FMN-
dependent nitrilotriacetate monooxygenase (44). An addi-
tional two models have been built to represent clades with
members in the Actinobacteria that include many non-F420-
utilizing species and identify 19 additional M. smegmatis LLM
genes (Table 2). Interestingly, no genes of this type are ob-
served in M. tuberculosis, perhaps indicating a greater special-
ization toward F420-dependent enzymes in M. tuberculosis than
in M. smegmatis.
In the middle ground are the genes of most interest for the
purposes of antimycobacterial drug design, i.e., those whose
phylogenetic distribution is limited to Mycobacterium or only a
few closely related species, including Mycobacterium spp. For
such genes, PPP may give a strong score if the closest relatives
are F420 dependent, even if that relationship is distant and even
if they had undergone a switch to utilization of a different
cofactor. Alternatively, the degree of sequence divergence may
be so great as to obscure the PPP detection of correlation for
an authentic F420-dependent gene. There are 19 such genes in
M. smegmatis and 5 in M. tuberculosis (Table 2).
We previously observed that sequence similarity correlations
may be localized to discrete short sequence regions corre-
sponding to functional sites related to the physical nature of a
profile and developed a method, SIMBAL, to identify such
subsequences (36). In the current case, we can reasonably
expect to identify sequence motifs correlated with the binding
of the F420 cofactor in particular that are distinct from motifs
for the binding of FMN or other flavonoids. In order to apply
the SIMBAL method, the LLM family was partitioned such
that all genes from non-F420-utilizing genomes were placed in
the negative branch and all members of the F420 producer-restricted
families (Table 2) were in the positive branch of the
partition. To add additional rigor and to remove family-specific
signals, when SIMBAL was applied to genes from each of
these families, that particular family was removed from the
positive branch set.
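The partition with its family hold-out, as described above, can be sketched as follows. This is a hypothetical illustration: the function name, the record layout, and the choice to leave unassigned proteins from F420-producing genomes out of both sets are assumptions rather than details given in the text.

```python
def build_partition(proteins, query_family):
    """Build SIMBAL training sets for a query from one restricted family.

    proteins: iterable of (protein_id, family, genome_makes_f420), where
    family is the label of an F420 producer-restricted family or None.
    Returns (positive_ids, negative_ids).
    """
    positive, negative = [], []
    for protein_id, family, genome_makes_f420 in proteins:
        if not genome_makes_f420:
            negative.append(protein_id)  # gene from a non-F420-utilizing genome
        elif family is not None and family != query_family:
            positive.append(protein_id)  # restricted family, query's family held out
        # Assumption: anything else is excluded from both sets.
    return positive, negative
```

Holding out the query's own family means that any signal that remains must come from features shared across the F420-restricted families, not from simple within-family similarity.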
The F420-dependent glucose-6-phosphate dehydrogenase
(FGD1) from M. tuberculosis has been crystallized with F420
bound, and its structure has been reported (4). SIMBAL ap-
plied to the sequence of this enzyme yielded a result that is
typical of genes in these F420-restricted families (Fig. 3). Each
of the subsequences corresponding to the hot spots in this plot
either includes residues in the structure which make direct
contact with the cofactor or is directly adjacent to the short
resolved part of the polyglutamate side chain in an extended
surface cleft that is a likely binding site for the full-length
polyglutamate (Fig. 4A). Twenty of 24 residues that make up
the cofactor binding pocket in this structure were identified by
SIMBAL, and 19 of these were centrally located in those sub-
sequence regions. The four residues that were missed were
located at the cofactor-substrate interface. The putative poly-
glutamate binding cleft is lined by a number of lysine and
arginine residues and lacks negatively charged residues, as
would be expected of a region binding a polyanion (Fig. 4B).
This structure also includes a molecule of citrate bound in the
substrate binding pocket. It is notable that 13 of 15 residues
making up the substrate binding pocket are outside the regions
identified by SIMBAL. Very similar results were obtained
when SIMBAL was applied to the archaeal F420-dependent
LLM enzymes Mer and Adf (1, 2).
Four genes in M. tuberculosis, Rv0132c, Rv1936, Rv2951c,
and Rv3618 (and 18 more in M. smegmatis), were not covered
by our models because their only close relatives are restricted
to the mycobacteria (Table 2). Running SIMBAL on these may
help to resolve whether they are F420 dependent. Rv0132c and
Rv2951c showed patterns consistent with other putative F420-
dependent LLM enzymes, Rv3618 clearly did not, and Rv1936
showed a mixed result, including a very strong third peak, but
with all other peaks either weak or missing (Fig. 5). In all, we
concluded that M. tuberculosis contains 14 separate F420-dependent
members of the LLM family, and possibly only two
FMN-dependent members. M. smegmatis, in contrast, has 19
sequences not covered by the F420 models, and only 5 of these
show strong SIMBAL results (data not shown).
PPOX family. PPP identified five members of the PPOX
family in M. smegmatis (see Table S2 in the supplemental
material) and four in M. tuberculosis as likely F420-dependent
enzymes. Sixteen and four additional family members (Table
3) were present in the M. smegmatis and M. tuberculosis ge-
nomes, respectively, including the known FMN-dependent
protein PdxH (also known as PPOX itself). Like the case with
the LLM family, we identified five clades within the larger
PPOX family that are restricted to F420-producing organisms.
These are represented by the TIGR03618, TIGR03666,
TIGR03667, TIGR03668, and TIGR04023 models. Aside from
the PdxH gene itself, all eight of the M. tuberculosis PPOX
genes were found within these clades and therefore encode
putative F420-dependent enzymes. M. smegmatis carries seven
PPOX genes that fall outside these clades and might be F420 or
The product of the M. tuberculosis gene Rv1155 (which was
identified by PPP as an F420-correlated gene) has had its crystal
structure solved (44). Although the authors identified Rv1155
(by homology) as a pyridoxamine 5′-phosphate oxidase and
reported structures with FMN and PLP bound (separately),
these two ligands appear to bind at the same site, which is
FIG. 3. SIMBAL analysis of the M. tuberculosis FGD1 gene (Rv0407) versus a partition of the LLM family (PF00296) based on the ability (positive
branch) of source genomes to produce cofactor F420. Closely related homologs of the TIGR03557 family were removed from the positive training set to
accentuate features common to all F420-dependent LLM family sequences. (Top left) Raw SIMBAL data (log likelihood scores). (Top right) Normalized
data are represented as SIMBAL scores divided by the sequence window length in order to identify prominent localized regions (colored circles).
(Bottom) Sequence of Rv0407. High-scoring subsequence regions are indicated in colors corresponding to the circles in the top right panel. Underlined
residues make contacts (within 3.5 Å) with the F420 cofactor in the crystal structure under Protein Data Bank (PDB) accession no. 3B4Y (4), starred residues
make contacts with the citrate molecule bound in the putative substrate cavity, and dotted residues contribute to a positively charged patch in an extended
surface cleft adjacent to the end of the resolved part of the polyglutamate cofactor side chain (see Fig. 4B).
FIG. 4. (A) SIMBAL-identified residues making up the F420-binding surface of M. tuberculosis FGD1 (PDB accession no. 3B4Y). Peak 1, SDH,
is in contact with the carboxylate oxygens of the deazaflavin terminal ring (cyan). Peak 2, SVLT, includes the nonproline cis-peptide bond between serine
and valine (2) and comprises the “bulge” behind the deazaflavin central ring (red). Peak 3, GTGE, is in contact with the phospholactate component of
the side chain (yellow). Peak 4, FKER, is in contact with the single glutamate resolved by the crystal structure and forms a long adjacent surface cleft
(blue). Peak 5, AAGGPAV, contacts the deazaflavin hydroxyl (obscured), the side chain phospholactate, and the carboxylate of the resolved side chain
glutamate and also forms the putative polyglutamate binding cleft (green). (B) A patch of positively charged residues (blue) lines the poly-Glu binding
cleft and is surrounded by a more distant ring of negatively charged residues (red). F420 is indicated as a stick model (green = carbon, red = oxygen,
blue = nitrogen, and orange = phosphorus). Molecular models were visualized with MacPyMOL (http://pymol.org/).
inconsistent with the catalytic mechanism of PPOX enzymes
(12). Furthermore, a concentration of 5 mM was required in
order to achieve FMN binding, which is inconsistent with the
observed micromolar affinity of PdxH for its FMN cofactor (9).
The crystal structure of Escherichia coli PdxH, however, does
show FMN bound in roughly the same position, confirming at
least the location of the Rv1155 flavonoid cofactor binding site,
if not the identity of the cofactor. Based on the analysis described
below, we suggest that F420 may bind Rv1155 and that
this could be examined by fairly routine bench work.
We applied SIMBAL to the sequence of Rv1155, partition-
ing the PPOX family based on the ability of the source ge-
nomes to produce F420. Two very prominent peaks were ob-
served (Fig. 6), corresponding to the cleft where FMN (and
PLP) binds in the crystal structures (Fig. 7A). The SIMBAL-
identified region is clearly larger than the FMN molecule and
appears to be consistent with the binding of a cofactor, such as
F420, with a much longer side chain (Fig. 1). Indeed, the struc-
ture of FMN-dependent E. coli PdxH has a very different shape
in this region, with a closed-off cofactor binding cleft well
matched to the size of FMN's side chain and inconsistent with
the extended polyglutamate found in F420 (Fig. 7B). When the
sequence in E. coli PdxH homologous to SIMBAL peak 1 was
examined in the structure, it was found that it contains an
arginine (R67) involved in the coordination of the terminal
phosphate of the FMN side chain. This arginine is invariant in
TABLE 3. Evidence for F420 association in PPOX family genes from Mycobacterium smegmatis and M. tuberculosisᵉ
Gene or sequence    SIMBAL peak 1    SIMBAL peak 2
M. smegmatis genes
Negative control sequencesᵈ
Mycobacterial PdxH (MSMEG_5675; Rv2607)
Escherichia coli PdxH
PdxH model TIGR00558 consensus
Non-F420-producing species PPOX family consensus
ᵃ Genes identified among the top 63 PPP hits versus an F420 producer profile (see Table S1 in the supplemental material).
ᵇ TIGRFAMs HMMs built to represent clades of PPOX genes consisting only of genes from F420-producing organisms.
ᶜ Twelve-mer sequences corresponding to the centers of the two prominent SIMBAL peaks. Residues in bold are observed primarily in the highest-scoring sequences,
and underlined residues are those typical of low-scoring and known FMN-binding sequences.
ᵈ These sequences are presumed to correspond to non-F420-binding enzymes. PdxH is a characterized FMN-binding enzyme. The non-F420-binding PPOX family
consensus was constructed from a multiple sequence alignment of all PPOX family proteins from non-F420-producing species in our data set. Conserved residues are
in uppercase; consensus residues are in lowercase.
ᵉ Items in bold represent items in evidence of F420 binding for the respective gene.
FIG. 5. SIMBAL analysis of four M. tuberculosis LLM family proteins
not found by the positive branch models versus a partition of the LLM
(PF00296) family based on the TIGRFAMs-modeled clades of F420-pro-
ducing organisms (Table 2) (positive branch) and all members from non-
F420-producing organisms (negative branch). Window-length-normalized
SIMBAL data are plotted as the maximum scores observed over a range
of subsequence window lengths, in essence tracing the highest contour
across a triangle plot like the one shown in Fig. 3 (top right).
the family of PdxH enzymes described by the TIGR00558
model. This residue is instead a serine in the Rv1155 protein,
a change consistent with the requirements for binding the F420
cofactor, with its increased size and decreased charge around
the corresponding phosphate group. Additionally, the subse-
quence corresponding to peak 1 of Rv1155 (TIKHDGRPQ
LSN) contains two residues (underlined) which make up the
surface of the cleft proximal to the end of the bound FMN
molecule. Relative to PdxH, these represent a shift to a more
positively charged environment (D→K and Y→Q), consistent
with binding of F420’s negatively charged polyglutamate side chain.
Application of SIMBAL to M. smegmatis and M. tuberculosis
PPOX proteins (Table 3) showed a consistent pattern with
respect to the presence of a prominent peak 1 consisting of a
shift away from the arginine in the phosphate binding pocket
and a shift toward positively charged and amide residues in the
putative polyglutamate binding cleft. The presence of a dom-
inant peak 2 appears limited to those members of the
TIGR03618 family containing sequences similar to NL-
RRDPR, which are distinctly different from the corresponding
PdxH subsequence, QIENNPR. Despite the weakness of peak
2 for many of the tested sequences that had a strong peak 1,
there was a clear trend toward an increase in positively charged
residues (Table 3), again consistent with the binding of poly-
glutamate. One of the M. tuberculosis genes (Rv1875) yielded
equivocal results (despite being a member of the F420 producer-limited
TIGR03618 family), but six proteins could be judged
very likely F420 binders by SIMBAL. For M. smegmatis, 14
genes fall into the likely F420-dependent category, while there
appear to be at least five other FMN-dependent enzymes in
addition to PdxH.
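The charge shifts described above can be checked with a simple residue count. The following Python sketch is illustrative only and was not part of the analysis: it assumes a crude net-charge estimate (K and R counted as +1, D and E as -1, histidine ignored), uses a hypothetical function name, and assumes that the hyphen in the printed TIGR03618 motif is a line-break artifact, so that the motif reads NLRRDPR.

```python
# Crude net-charge estimate for short peptide windows.
# Assumption: K/R = +1, D/E = -1, all other residues (including His) = 0.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(peptide: str) -> int:
    """Return (#K + #R) - (#D + #E) for a peptide in one-letter code."""
    peptide = peptide.upper()
    positives = sum(residue in POSITIVE for residue in peptide)
    negatives = sum(residue in NEGATIVE for residue in peptide)
    return positives - negatives


# Subsequences quoted in the text (motif spelling is an assumption, see above).
peptides = {
    "Rv1155 peak 1": "TIKHDGRPQLSN",
    "TIGR03618-like peak 2 motif": "NLRRDPR",
    "PdxH peak 2 motif": "QIENNPR",
}

for name, sequence in peptides.items():
    print(f"{name}: {sequence} net charge {net_charge(sequence):+d}")
```

Under this simplification, the TIGR03618-like peak 2 motif carries a higher net positive charge than the corresponding PdxH subsequence, in line with the trend noted for Table 3.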
Amount of F420-dependent metabolism in normal human
gut flora. Actinobacteria include many species that produce
F420, many of which, such as Mycobacteria and Frankia species
(see Table S2 in the supplemental material), have large num-
bers of apparent F420-dependent enzymes. According to recent
metagenomic studies (32, 42), the human gut harbors a large
number of actinobacterial lineages. In the profiling studies
presented here, we utilized the CofE gene as a marker of F420
biosynthesis, even when it was found in the absence of other
known F420 genes, a decision based on evidence from the genes
surrounding the orphan CofE gene in those few genomes (see
Materials and Methods). Even if incorrect, the inclusion of
those genomes (see Table S2 in the supplemental material),
which were numerically insignificant, would have had little
effect on the profiling results. Postanalysis, we can now look
back at those genomes and confirm that all also carry putative
F420-dependent genes encoding members of the LLM, PPOX,
and/or DDN families, as detected by the HMMs we built (Ta-
bles 2 and 3). For instance, Jonesia denitrificans DSM 20603
carries 8 LLM family genes, 4 of which were identified by
FIG. 6. SIMBAL analysis of the M. tuberculosis PPOX family gene
Rv2991, using an F420 biosynthesis-based partition, indicates two
strongly correlated regions. Window-length-normalized SIMBAL data
are plotted as the maximum scores observed over a range of subse-
quence window lengths.
FIG. 7. (A) Crystal structure of the (dimeric) M. tuberculosis PPOX family protein Rv1155, with (monomeric) FMN bound (blue), showing the
locations of SIMBAL peaks 1 and 2 (yellow and orange) (see Fig. 6 and Table 3). FMN binds only weakly to Rv1155, which is a likely F420-binding
enzyme. Extending downwards from the short FMN side chain is an extended cleft which appears complementary to the much longer F420
polyglutamate side chain. (B) The FMN-dependent E. coli PPOX family enzyme PdxH is shown with the homologous regions colored. PdxH binds
FMN roughly 1,000 times tighter than Rv1155 and contains a pocket into which the FMN side chain fits snugly, while no extended cleft is apparent.
Molecular models were visualized with MacPyMOL (http://pymol.org/).
HMMs for putative F420-dependent gene families, and
SIMBAL indicated that an additional 2 genes are likely F420
dependent. This is in contrast to Arthrobacter chlorophenolicus
A6, which lacks the CofE gene and any other F420 biosynthesis
genes and carries 15 LLM family genes, none of which were
hits with the F420-dependent HMMs. Thus, we feel justified in
regarding the CofE gene as a perfect marker, with a 1:1 correspondence
to the F420 biosynthesis trait.
Searches of the CofE HMM (TIGR01916) versus shotgun
sequencing reads from human gut microbiome samples total-
ing 1,149 Mb of sequence (42) identified only one read, from
Methanobrevibacter smithii, an archaeal organism. In contrast,
searches using an HMM (TIGR00468) for the PheS gene,
which is present in single copy in all bacterial and archaeal
genomes, yielded hits at an 860-fold higher rate. A similar
calculation for a metagenomic sample from the Global Ocean
Survey (33) yielded a 1:50 ratio of F420 producers to nonproducers,
suggesting that in the human gut flora, F420 producers
are rare relative to those in the environment. Similarly,
searches of draft genomes for 30 of the most abundant organ-
isms in the human gut microbiome (32) yielded no hits to the
CofE model (see Table S3 in the supplemental material).
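The marker-gene comparison above can be made concrete with a short calculation. The following Python sketch is illustrative only and is not the procedure used to obtain the reported numbers: it assumes that hits to the single-copy PheS gene approximate the total number of genomes sampled, takes the reported 860-fold difference and 1:50 ratio as its only inputs, and uses hypothetical function names.

```python
def producer_fraction_from_fold(fold_difference: float) -> float:
    """Fraction of genomes carrying the marker when a single-copy reference
    gene (here PheS) is hit `fold_difference` times more often than the marker."""
    if fold_difference <= 0:
        raise ValueError("fold_difference must be positive")
    return 1.0 / fold_difference


def producer_fraction_from_ratio(producers: float, nonproducers: float) -> float:
    """Fraction of producers given a producers:nonproducers ratio."""
    total = producers + nonproducers
    if total <= 0:
        raise ValueError("ratio terms must sum to a positive number")
    return producers / total


# Values taken from the text; reading them as genome fractions is an assumption.
gut_fraction = producer_fraction_from_fold(860)       # CofE versus PheS, gut reads
ocean_fraction = producer_fraction_from_ratio(1, 50)  # 1:50 in the ocean sample
enrichment = ocean_fraction / gut_fraction

print(f"gut: {gut_fraction:.5f}, ocean: {ocean_fraction:.5f}, "
      f"ocean/gut: {enrichment:.1f}-fold")
```

Because the gut estimate rests on a single CofE read, the resulting fold difference should be read as an order-of-magnitude indication rather than a precise value.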
DISCUSSION
We have demonstrated, through the application of a series
of comparative genomic methods, the presence of significant
numbers of putative coenzyme F420-dependent enzymes
in Mycobacterium tuberculosis and other related
mycobacteria. In M. tuberculosis, this likely amounts to 28
different enzymes, including 14 from the LLM family, 7 from
the PPOX family, and 7 from the DDN family. Prior to this
work, it was not appreciated that the PPOX family includes
F420-dependent members, and the extent to which the LLM
family, especially in the Actinobacteria, is dominated by
them was not known. As its name suggests, the DDN family
was known to include the deazaflavin-dependent nitroreductase
Rv3547 (active on an antimycobacterial prodrug);
we have shown that this family is observed solely in F420-producing species.
The prominent expansion of these families in many acti-
nobacterial lineages suggests a degree of importance to their
lifestyle. We suggest that there may be an advantage in
targeting the distinctive biosynthesis of the F420 cofactor as
a means of developing new antimycobacterial drugs for combination
therapy. Coenzyme F420 is not utilized by human
cells, and a preliminary search of available human gut met-
agenomic data suggested that F420-producing members of
the “normal” flora are rare, implying that such an approach
would specifically target mycobacterial cells.
It was recently shown that mutations in an F420 biosynthesis
gene result in hypersensitivity of M. tuberculosis to
acidified nitrite, a model of macrophage-induced reactive
nitrogen intermediates (10), and it is speculated that the
purpose of FGD may be to provide reduced F420 to react
with these oxidants, protecting M. tuberculosis from macro-
phages (31). Little else is currently known about the biolog-
ical role of these enzymes in the Actinobacteria. One excep-
tion is the LLM family protein Rv2951c, which has been
shown to be a key ketoreductase in the biosynthesis of the
mycobacterial diacyl phthiocerol virulence factors (27, 38).
We speculate that some prior attempts to clone and express
genes from these families may have failed to produce active
products due to the lack of F420 production in E. coli. There
have been reports, for instance, of “colorless” proteins iso-
lated when colored FMN binding proteins were expected
(7). Expression of these proteins in F420-producing backgrounds
such as M. smegmatis or provision with F420 during
purification may yield improved results.
The methods applied here began with the generation of a
phylogenetic profile for the biosynthesis of cofactor F420
over a large set of prokaryotic genomes. PPP led in short
order to the identification of the three dominant families of
F420-dependent enzymes in actinobacteria. Since two of
these families were distributed in both F420 producers and
nonproducers and also included known FMN-dependent
members, further dissection of the families into F420 producer
clades was carried out using tree-building methods.
Finally, structural insight into F420 binding and detailed
classification were achieved by the application of SIMBAL
(aided by available crystal structures).
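The partitioning step in this workflow can be sketched in a few lines. The following Python example is a minimal illustration and not the pipeline used in this study: it assumes that each family member is already labeled with its source genome and that the set of genomes carrying the CofE marker is known; the function name and the protein identifiers ending in "_example" are hypothetical placeholders.

```python
def partition_by_marker(family_members, marker_genomes):
    """Split (protein_id, genome_id) pairs into a positive set (genome carries
    the biosynthesis marker) and a negative set (genome lacks it)."""
    positive, negative = [], []
    for protein_id, genome_id in family_members:
        if genome_id in marker_genomes:
            positive.append(protein_id)
        else:
            negative.append(protein_id)
    return positive, negative


# Genomes treated as F420 producers because they carry CofE (per the text).
cofe_genomes = {"M. tuberculosis", "J. denitrificans"}

# Toy LLM family membership; only Rv0407 is an identifier used in the text.
llm_family = [
    ("Rv0407", "M. tuberculosis"),
    ("Jden_LLM_example", "J. denitrificans"),
    ("Achl_LLM_example", "A. chlorophenolicus"),
    ("Ecoli_LLM_example", "E. coli"),
]

positive_set, negative_set = partition_by_marker(llm_family, cofe_genomes)
print("positive training set:", positive_set)
print("negative training set:", negative_set)
```

The labels come from the genome, not from any experimental evidence about the individual protein, which is why known FMN-dependent members can still appear in the positive set and must be separated by further analysis.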
The SIMBAL method exploits an algorithm related to
phylogenetic profiling in order to mine statistical signals
from large collections of genome sequence data. By collect-
ing members of a protein family for which the bound cofac-
tor is variable and partitioning the family according to the
cofactor biosynthesis properties of each species of origin,
SIMBAL is able to bypass the requirement for a training set
based on actual experimental data and to substitute a much
larger and more informative data set based on computed
The proteins studied here are enzymes that have, presum-
ably, at least two different types of specificity: one for the
substrate and one for the cofactor. Because the training sets
were constructed based on cofactor biosynthesis rather than
substrate availability, it became possible to discover se-
quence regions that tend to “predict” binding to one par-
ticular cofactor. We used SIMBAL previously to probe
transporter substrate specificity, and also substrate specific-
ity, for a family of protein modification methylases (36), but
this is our first use of it to probe enzymatic cofactor binding
specificity and to develop evidence to make high-confidence
Mapping the short sequences that earned SIMBAL’s best
scores to solved protein crystal structures demonstrated that
the method found both known F420 binding sites in the LLM
family and flavonoid binding (presumably F420 binding) sites
in the PPOX family. Close examination of SIMBAL hot
spots for F420 binding proteins with solved structures
showed that these hot spots represent not only sequences
that bind the cofactor but also sites representing key differences
that distinguish F420 from FMN.
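The figure legends describe two post-processing steps applied to SIMBAL scores: division by the window length, and tracing the maximum over a range of window lengths. The following Python sketch shows both steps under stated assumptions and is not the SIMBAL implementation: raw scores are assumed to be available as a mapping from (window center, window length) to a score, the numerical values are invented for illustration, and the function names are hypothetical.

```python
def normalize_by_length(raw_scores):
    """Divide each raw score by its window length.

    raw_scores maps (center, length) -> raw score."""
    return {
        (center, length): score / length
        for (center, length), score in raw_scores.items()
    }


def max_contour(normalized_scores):
    """For each window center, keep the highest normalized score observed
    over all window lengths (the 'highest contour' of a triangle plot)."""
    best = {}
    for (center, _length), value in normalized_scores.items():
        if center not in best or value > best[center]:
            best[center] = value
    return dict(sorted(best.items()))


# Invented raw scores for two window centers and two window lengths.
raw = {
    (10, 5): 20.0,
    (10, 10): 30.0,
    (11, 5): 5.0,
    (11, 10): 25.0,
}

normalized = normalize_by_length(raw)
contour = max_contour(normalized)
print(contour)
```

Normalizing by length favors short windows with concentrated signal, which is what allows localized regions such as the peaks in the PPOX and LLM families to stand out.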
PPP generated the strong hypothesis that high-scoring
proteins bind F420, but any number of alternative explana-
tions are possible, e.g., a single F420-dependent pathway
makes a novel substrate available, and PPP is identifying
collections of enzymes that use that substrate. The strong
confirmation by SIMBAL that PPP identified true F420 binding
proteins led to the novel, and experimentally testable,
finding that the actual numbers of F420-dependent enzymes
in the major human pathogen Mycobacterium tuberculosis
and numerous other actinobacteria are quite large. Interestingly,
the closely related nonpathogen M. smegmatis not only
displays a greater number of genes for each of the F420-utilizing
protein families but also has a much larger proportion of
genes that appear to be FMN dependent. One possible
interpretation is that M. tuberculosis has committed, to a
much higher degree, to an F420-dominated lifestyle for its
ACKNOWLEDGMENTS
We thank Laura Sheahan and Chuck Merryman for their expert
editorial advice in the preparation of the manuscript.
This work was supported by NIH/NHGRI grant R01-HG004881 and
NSF grant DBI-0445826.
REFERENCES
1. Aufhammer, S. W., E. Warkentin, H. Berk, S. Shima, R. K. Thauer, and U.
Ermler. 2004. Coenzyme binding in F420-dependent secondary alcohol de-
hydrogenase, a member of the bacterial luciferase family. Structure 12:361–
2. Aufhammer, S. W., E. Warkentin, U. Ermler, C. H. Hagemeier, R. K.
Thauer, and S. Shima. 2005. Crystal structure of methylenetetrahydrometh-
anopterin reductase (Mer) in complex with coenzyme F420: architecture of
the F420/FMN binding site of enzymes within the nonprolyl cis-peptide
containing bacterial luciferase family. Protein Sci. 14:1840–1849.
3. Bair, T. B., D. W. Isabelle, and L. Daniels. 2001. Structures of coenzyme
F(420) in Mycobacterium species. Arch. Microbiol. 176:37–43.
4. Bashiri, G., C. J. Squire, N. J. Moreland, and E. N. Baker. 2008. Crystal
structures of F420-dependent glucose-6-phosphate dehydrogenase FGD1
involved in the activation of the anti-tuberculosis drug candidate PA-824
reveal the basis of coenzyme and substrate binding. J. Biol. Chem. 283:
5. Boshoff, H. I., and C. E. Barry III. 2005. Tuberculosis—metabolism and
respiration in the absence of growth. Nat. Rev. Microbiol. 3:70–80.
6. Campbell, Z. T., A. Weichsel, W. R. Montfort, and T. O. Baldwin. 2009.
Crystal structure of the bacterial luciferase/flavin complex provides insight
into the function of the beta subunit. Biochemistry 48:6085–6094.
7. Canaan, S., G. Sulzenbacher, V. Roig-Zamboni, L. Scappuccini-Calvo, F.
Frassinetti, D. Maurin, C. Cambillau, and Y. Bourne. 2005. Crystal structure
of the conserved hypothetical protein Rv1155 from Mycobacterium tuber-
culosis. FEBS Lett. 579:215–221.
8. Chaiyen, P., C. Suadee, and P. Wilairat. 2001. A novel two-protein compo-
nent flavoprotein hydroxylase. Eur. J. Biochem. 268:5550–5561.
9. Churchich, J. E. 1984. Brain pyridoxine-5-phosphate oxidase. A dimeric
enzyme containing one FMN site. Eur. J. Biochem. 138:327–332.
10. Darwin, K. H., S. Ehrt, J. C. Gutierrez-Ramos, N. Weich, and C. F. Nathan.
2003. The proteasome of Mycobacterium tuberculosis is required for resis-
tance to nitric oxide. Science 302:1963–1966.
11. Di Salvo, M., E. Yang, G. Zhao, M. E. Winkler, and V. Schirch. 1998.
Expression, purification, and characterization of recombinant Escherichia
coli pyridoxine 5′-phosphate oxidase. Protein Expr. Purif. 13:349–356.
12. di Salvo, M. L., M. K. Safo, F. N. Musayev, F. Bossa, and V. Schirch. 2003.
Structure and mechanism of Escherichia coli pyridoxine 5′-phosphate oxidase.
Biochim. Biophys. Acta 1647:76–82.
13. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with
reduced time and space complexity. BMC Bioinformatics 5:113.
14. Eker, A. P., P. Kooiman, J. K. Hessels, and A. Yasui. 1990. DNA photore-
activating enzyme from the cyanobacterium Anacystis nidulans. J. Biol.
15. Forouhar, F., M. Abashidze, H. Xu, L. L. Grochowski, J. Seetharaman, M.
Hussain, A. Kuzin, Y. Chen, W. Zhou, R. Xiao, T. B. Acton, G. T. Monte-
lione, A. Galinier, R. H. White, and L. Tong. 2008. Molecular insights into
the biosynthesis of the F420 coenzyme. J. Biol. Chem. 283:11832–11840.
16. Gerlo, E., and E. Schram. 1971. Bioluminescence assay of reduced pyridine
and flavine nucleotides with bacterial luciferase. Arch. Int. Physiol. Biochim.
17. Graham, D. E., H. Xu, and R. H. White. 2003. Identification of the 7,8-
didemethyl-8-hydroxy-5-deazariboflavin synthase required for coenzyme
F(420) biosynthesis. Arch. Microbiol. 180:455–464.
18. Grochowski, L. L., H. Xu, and R. H. White. 2008. Identification and char-
acterization of the 2-phospho-L-lactate guanylyltransferase involved in coen-
zyme F420 biosynthesis. Biochemistry 47:3033–3037.
19. Guerra-Lopez, D., L. Daniels, and M. Rawat. 2007. Mycobacterium smeg-
matis mc2 155 fbiC and MSMEG_2392 are involved in triphenylmethane dye
decolorization and coenzyme F420 biosynthesis. Microbiology 153:2724–
20. Haft, D. H., B. J. Loftus, D. L. Richardson, F. Yang, J. A. Eisen, I. T.
Paulsen, and O. White. 2001. TIGRFAMs: a protein family resource for the
functional identification of proteins. Nucleic Acids Res. 29:41–43.
21. Haft, D. H., I. T. Paulsen, N. Ward, and J. D. Selengut. 2006. Exopolysac-
charide-associated protein sorting in environmental organisms: the PEP-
CTERM/EpsH system. Application of a novel phylogenetic profiling heuris-
tic. BMC Biol. 4:29.
22. Kertesz, M. A., K. Schmidt-Larbig, and T. Wuest. 1999. A novel reduced
flavin mononucleotide-dependent methanesulfonate sulfonatase encoded by
the sulfur-regulated msu operon of Pseudomonas aeruginosa. J. Bacteriol.
23. Kitamura, M., S. Kojima, K. Ogasawara, T. Nakaya, T. Sagara, K. Niki, K.
Miura, H. Akutsu, and I. Kumagai. 1994. Novel FMN-binding protein from
Desulfovibrio vulgaris (Miyazaki F). Cloning and expression of its gene in
Escherichia coli. J. Biol. Chem. 269:5566–5573.
24. Larkin, M. A., G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan,
H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thomp-
son, T. J. Gibson, and D. G. Higgins. 2007. Clustal W and Clustal X version
2.0. Bioinformatics 23:2947–2948.
25. Li, H., M. Graupner, H. Xu, and R. H. White. 2003. CofE catalyzes the
addition of two glutamates to F420-0 in F420 coenzyme biosynthesis in
Methanococcus jannaschii. Biochemistry 42:9771–9778.
26. Moore, S. A., and M. N. James. 1995. Structural refinement of the nonfluo-
rescent flavoprotein from Photobacterium leiognathi at 1.60 A resolution. J.
Mol. Biol. 249:195–214.
27. Onwueme, K. C., C. J. Vos, J. Zurita, C. E. Soll, and L. E. Quadri. 2005.
Identification of phthiodiolone ketoreductase, an enzyme required for pro-
duction of mycobacterial diacyl phthiocerol virulence factors. J. Bacteriol.
28. Parsons, J. F., K. Calabrese, E. Eisenstein, and J. E. Ladner. 2004. Structure
of the phenazine biosynthesis enzyme PhzG. Acta Crystallogr. D Biol. Crys-
29. Purwantini, E., and L. Daniels. 1998. Molecular analysis of the gene encod-
ing F420-dependent glucose-6-phosphate dehydrogenase from Mycobacte-
rium smegmatis. J. Bacteriol. 180:2212–2219.
30. Purwantini, E., T. P. Gillis, and L. Daniels. 1997. Presence of F420-depen-
dent glucose-6-phosphate dehydrogenase in Mycobacterium and Nocardia
species, but absence from Streptomyces and Corynebacterium species and
methanogenic Archaea. FEMS Microbiol. Lett. 146:129–134.
31. Purwantini, E., and B. Mukhopadhyay. 2009. Conversion of NO2 to NO by
reduced coenzyme F420 protects mycobacteria from nitrosative damage.
Proc. Natl. Acad. Sci. U. S. A. 106:6333–6338.
32. Qin, J., R. Li, J. Raes, M. Arumugam, K. S. Burgdorf, C. Manichanh, T.
Nielsen, N. Pons, F. Levenez, T. Yamada, D. R. Mende, J. Li, J. Xu, S. Li, D.
Li, J. Cao, B. Wang, H. Liang, H. Zheng, Y. Xie, J. Tap, P. Lepage, M.
Bertalan, J. M. Batto, T. Hansen, D. Le Paslier, A. Linneberg, H. B. Nielsen,
E. Pelletier, P. Renault, T. Sicheritz-Ponten, K. Turner, H. Zhu, C. Yu, M.
Jian, Y. Zhou, Y. Li, X. Zhang, N. Qin, H. Yang, J. Wang, S. Brunak, J. Dore,
F. Guarner, K. Kristiansen, O. Pedersen, J. Parkhill, J. Weissenbach, P.
Bork, and S. D. Ehrlich. 2010. A human gut microbial gene catalogue
established by metagenomic sequencing. Nature 464:59–65.
33. Rusch, D. B., A. L. Halpern, G. Sutton, K. B. Heidelberg, S. Williamson, S.
Yooseph, D. Wu, J. A. Eisen, J. M. Hoffman, K. Remington, K. Beeson, B.
Tran, H. Smith, H. Baden-Tillson, C. Stewart, J. Thorpe, J. Freeman, C.
Andrews-Pfannkoch, J. E. Venter, K. Li, S. Kravitz, J. F. Heidelberg, T.
Utterback, Y. H. Rogers, L. I. Falcon, V. Souza, G. Bonilla-Rosso, L. E.
Eguiarte, D. M. Karl, S. Sathyendranath, T. Platt, E. Bermingham, V.
Gallardo, G. Tamayo-Castillo, M. R. Ferrari, R. L. Strausberg, K. Nealson,
R. Friedman, M. Frazier, and J. C. Venter. 2007. The Sorcerer II global
ocean sampling expedition: northwest Atlantic through eastern tropical Pa-
cific. PLoS Biol. 5:e77.
34. Schroeder, E. K., N. de Souza, D. S. Santos, J. S. Blanchard, and L. A. Basso.
2002. Drugs that inhibit mycolic acid biosynthesis in Mycobacterium tuber-
culosis. Curr. Pharm. Biotechnol. 3:197–225.
35. Selengut, J. D., D. H. Haft, T. Davidsen, A. Ganapathy, M. Gwinn-Giglio,
W. C. Nelson, A. R. Richter, and O. White. 2007. TIGRFAMs and Genome
Properties: tools for the assignment of molecular function and biological
process in prokaryotic genomes. Nucleic Acids Res. 35:D260–D264.
36. Selengut, J. D., D. B. Rusch, and D. H. Haft. 2010. Sites inferred by meta-
bolic background assertion labeling (SIMBAL): adapting the partial phylo-
genetic profiling algorithm to scan sequences for signatures that predict
protein function. BMC Bioinformatics 11:52.
37. Shi, R., N. Itagaki, and I. Sugawara. 2007. Overview of anti-tuberculosis
(TB) drugs and their resistance mechanisms. Mini Rev. Med. Chem. 7:1177–
38. Simeone, R., P. Constant, W. Malaga, C. Guilhot, M. Daffe, and C. Chalut.
2007. Molecular dissection of the biosynthetic relationship between phthiocerol
and phthiodiolone dimycocerosates and their critical role in the virulence and
permeability of Mycobacterium tuberculosis. FEBS J. 274:1957–1969.
39. Singh, J. A., R. Upshur, and N. Padayatchi. 2007. XDR-TB in South Africa:
no time for denial or complacency. PLoS Med. 4:e50.
40. Singh, R., U. Manjunatha, H. I. Boshoff, Y. H. Ha, P. Niyomrattanakit, R.
Ledwidge, C. S. Dowd, I. Y. Lee, P. Kim, L. Zhang, S. Kang, T. H. Keller, J.
Jiricek, and C. E. Barry III. 2008. PA-824 kills nonreplicating Mycobacterium
tuberculosis by intracellular NO release. Science 322:1392–1395.
41. Thibaut, D., N. Ratet, D. Bisch, D. Faucher, L. Debussche, and F. Blanche.
1995. Purification of the two-enzyme system catalyzing the oxidation of the
D-proline residue of pristinamycin IIB during the last step of pristinamycin
IIA biosynthesis. J. Bacteriol. 177:5199–5205.
42. Turnbaugh, P. J., M. Hamady, T. Yatsunenko, B. L. Cantarel, A. Duncan,
R. E. Ley, M. L. Sogin, W. J. Jones, B. A. Roe, J. P. Affourtit, M. Egholm, B.
Henrissat, A. C. Heath, R. Knight, and J. I. Gordon. 2009. A core gut
microbiome in obese and lean twins. Nature 457:480–484.
43. Winn, M., R. J. Goss, K. Kimura, and T. D. Bugg. 2010. Antimicrobial nucle-
oside antibiotics targeting cell wall assembly: recent advances in structure-func-
tion studies and nucleoside biosynthesis. Nat. Prod. Rep. 27:279–304.
44. Xu, Y., M. W. Mortimer, T. S. Fisher, M. L. Kahn, F. J. Brockman, and L. Xun.
1997. Cloning, sequencing, and analysis of a gene cluster from Chelatobacter
heintzii ATCC 29600 encoding nitrilotriacetate monooxygenase and NADH:
flavin mononucleotide oxidoreductase. J. Bacteriol. 179:1112–1116.