Identification of full genes and proteins of MCM9, a novel,
vertebrate-specific member of the MCM2–8 protein family
Malik Lutzmann, Domenico Maiorano, Marcel Méchali⁎
Institute of Human Genetics, CNRS, 141, rue de la Cardonille, 34396 Montpellier Cedex 05, France
Received 5 June 2005; accepted 1 July 2005
Available online 14 October 2005
MCM2–7 proteins are conserved replication factors functioning as DNA helicases during DNA synthesis. MCM8 is another member of this
family, which appears to be specific for higher eukaryotes, as it is absent in worms and yeast. Here we report the complete identification of a novel
member of this family, the MCM9 protein. Like MCM8, MCM9 is only present in the genome of higher eukaryotes. This protein contains an
MCM8-like ATP binding and hydrolysis motif implicated in helicase activity. Strikingly, MCM9 also contains a unique carboxy-terminal domain
which has only weak homology to MCM2–7 and MCM8 but is conserved within MCM9 homologs. We also show that the very recently reported
human MCM9 protein (HsMCM9), which resembles a truncated MCM-like protein missing a part of the MCM2–7 signature domain, is an
incomplete form of the full length HsMCM9 described here. Searching the human genome with either the newly identified human MCM9 or other
MCM protein sequences, we did not detect additional members of this DNA helicase family and suggest that it consists of eight
members falling into two groups, one constituted by the MCM2–7 complex and the other by MCM8 and MCM9, which are present only
in higher eukaryotes.
© 2005 Elsevier B.V. All rights reserved.
Keywords: DNA replication; Helicase; ATPase; MCM2–7; Replication origins; Cell cycle; Xenopus
1. Introduction
The minichromosome maintenance family (MCM2–7)
comprises a group of six structurally related proteins required to
These proteins function in a complex very likely as a DNA
helicase in promoting the opening of the DNA double helix at
replication origins. ATP binding (Walker A) and hydrolysis
(Walker B) motifs are present in all MCM2–7 members,
embedded in a region which is highly conserved in this protein
family, also known as the MCM2–7 signature domain (Koonin,
1993). Recently, this protein family has expanded by the
identification of a novel member, the MCM8 protein (Gozuacik
et al., 2003; Johnson et al., 2003; Maiorano et al., 2005). Unlike
MCM2–7, which are widely conserved in eukaryotes, MCM8 is
present only in higher multicellular organisms, being absent in
worms and yeast.
Very recently, another new member of the MCM protein
family, the MCM9 protein, has been identified in humans
(Yoshida, 2005). Intriguingly, the predicted human protein
(HsMCM9) has been reported to be a rather short homolog of
MCM proteins (391 aa against an average of 800 aa in MCM2–
7 and MCM8 proteins) and did not contain an ATP hydrolysis
motif which is essential for DNA helicase activity. Alignment of
HsMCM9 with the other MCM proteins suggested to us that
HsMCM9 might be a truncated MCM-like protein missing the
carboxy-terminal half of the MCM2–7 signature domain.
Within this truncated domain of HsMCM9, the ATP binding site
was present but the ATP hydrolysis motif was absent. Hence, it
was unclear whether HsMCM9 is a true MCM protein, which
might have a distinct, yet unknown function in DNA synthesis,
or whether HsMCM9 is functionally unrelated to MCM
proteins and contains only one part of the MCM2–7 signature.
Gene 362 (2005) 51–56
Abbreviations: MCM, minichromosome maintenance; Hs, Homo sapiens;
Xl, Xenopus laevis; Mm, Mus musculus; Gg, Gallus gallus; aa, amino acid;
EST, expressed sequence tag; ATP, adenosine triphosphate; BLAST, basic local
alignment search tool; MCMDC1, minichromosome maintenance domain
containing 1; ASF1, nucleosome assembly factor 1; ORF, open reading frame.
⁎ Corresponding author. Tel.: +33 499619917; fax: +33 499619920.
E-mail address: email@example.com (M. Méchali).
0378-1119/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
Screening the public EST databases, we have identified a
homolog of the HsMCM9 protein in Xenopus laevis. Unlike the
reported human MCM9, the Xenopus MCM9 (XlMCM9) is
much longer and contains all the features of MCM proteins, in
particular the entire MCM2–7 signature domain, made of both
B motifs (whereas MCM2–7 possess a deviant Walker A motif).
By careful screening of the genome of other vertebrates and
mammals in silico, we have now identified conserved homologs
of the entire MCM9 protein also in chicken, mouse and human,
whose primary structure closely resembles that of the Xenopus
MCM9 protein. Our findings indicate that MCM9 is a canonical
MCM protein also in humans, more closely related to human MCM8
than to human MCM2–7, and that the previously reported
HsMCM9 protein represents only a truncated part of the entire protein.
2. Materials and methods
2.1. Identification of MCM9 homologs
To identify homologs of MCM9, database searches were
performed using the program BLAST (http://ncbi.nlm.nih.gov).
Either EST databases or genomic databases for specific
organisms were searched. In addition, ab initio proteins were
generated to identify hypothetical proteins by BLAST with
the GNOMON routine (http://www.ncbi.nlm.nih.gov/genome/
the best self-consistent set of transcripts and protein alignments
in a certain genomic region. The program calculates splice sites
are as close as max. 50 bp having different frames. Since such
short introns are extremely rare, in these cases GNOMON
introduces frame shifts in the sequence to combine multiple
exons, which allows consistent transcripts to be created even from
genomic regions whose sequence contains errors. Proteins
predicted by GNOMON were then confirmed by identification
GEO Blast (http://www.ncbi.nlm.nih.gov/projects/geo/).
2.2. Special search for short regions of high homology
To identify smaller regions of homology and to identify
EST sequences within a database, MEGABLAST was also
used (Zhang et al., 2000) which is especially suited for the
identification of shorter, but highly similar sequences in a
given genome database. MEGABLAST was designed to optimize
the alignment of sequences which differ only slightly due
to sequencing errors (http://www.ncbi.nlm.nih.gov/BLAST/
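The idea of looking for short but highly similar stretches can be illustrated with a toy exact-word seed search. This is only a sketch of the seeding concept, not the MEGABLAST algorithm itself: the function names are hypothetical, the sequences are invented, and the default word size of 28 is an assumption taken from the usual MEGABLAST default rather than from this paper.

```python
from collections import defaultdict


def index_words(subject: str, word_size: int) -> dict:
    """Map every word of length word_size in subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - word_size + 1):
        index[subject[i:i + word_size]].append(i)
    return index


def exact_seeds(query: str, subject: str, word_size: int = 28) -> list:
    """Return (query_pos, subject_pos) pairs (0-based) where a word matches exactly."""
    index = index_words(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        # .get() avoids creating empty entries for words absent from the subject
        for s in index.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


# Toy nucleotide sequences, invented for illustration (not real ESTs).
subject = "TTGACCGATTACAGGCATCGATCGGA"
query = "GATTACAGGCATCG"
print(exact_seeds(query, subject, word_size=8))
```

Runs of seeds lying on the same diagonal (constant difference between subject and query position) mark a contiguous exact match, which a real aligner would then extend.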
2.3. Protein alignment
Protein alignments were performed using the program
ALIGN (Pearson et al., 1997) or CLUSTALW (Higgins et al.,
1994), available on the server of the Institute of Human
Genetics (IGH), Montpellier or the EMBL-EBI server. Identification
of protein domains and motif searches were performed
either using InterProScan, available on the EMBL-EBI server,
scanning the InterPro database of protein families, domains
and functional sites (Mulder et al., 2005) or MotifScan, using
the Hits-database from the Swiss Institute of Bioinformatics
3. Results and discussion
3.1. Identification of a Xenopus homolog of reported HsMCM9
To identify a Xenopus homolog of the recently described
human MCM9 protein (Yoshida, 2005), we performed a search
using the BLAST program with the HsMCM9 protein sequence
as a query against the Expressed Sequence Tags (EST) Xenopus
database. We thereby identified the cDNA clone
IMAGE6637819 (accession number BC070720), coding for a
protein of 1143 amino acids derived from an mRNA expressed in
Xenopus eggs. Sequence alignment with XlMCM proteins
shows that the first 835 aa of XlMCM9 share 25.6% identity
with full length XlMCM8 (835 aa) while the identity with
XlMCM2–7 proteins is on average 10.5%. These results
strongly indicate that XlMCM9 is a distinct member of the
MCM family in Xenopus (Fig. 1A). XlMCM9 shows a strong
identity (73.8%) in its first amino-terminal 391 aa with the
reported HsMCM9 protein (391 aa, Yoshida, 2005). However,
unlike the reported HsMCM9, the XlMCM9 protein contains a
much longer carboxy-terminal extension which shows in its first
part a high homology to the other MCM proteins. Within this
region, XlMCM9 contains an intact MCM2–7 family signature
domain (aa 303–aa 606) harboring Walker A and B motifs.
Interestingly, the Walker A motif of XlMCM9 (GxxGxGKS/T,
aa 354–360) is a canonical consensus site like the one found in
MCM8 proteins, but different from that found in MCM2–7
proteins, which is a deviant consensus site (GxxGxA/KS). We
conclude that this protein is the Xenopus homolog of
HsMCM9. Importantly, the XlMCM9 protein is
larger than other MCM proteins. This is essentially due
to a C-terminal extension after the MCM homology region,
which does not share a clear homology to other MCM proteins
and seems to be a unique feature of this protein (Fig. 1A).
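A consensus such as the canonical Walker A site quoted above (GxxGxGKS/T, with x standing for any residue) maps directly onto a regular expression. The sketch below is a minimal illustration under that reading of the consensus; the helper name and the toy sequence are assumptions, and it is not the motif-search procedure used in this study (InterProScan and MotifScan were used for that).

```python
import re

# Canonical Walker A consensus as written in the text: GxxGxGKS/T
# ("x" = any residue, "S/T" = serine or threonine).
WALKER_A = re.compile(r"G..G.GK[ST]")


def find_walker_a(protein: str) -> list:
    """Return (1-based start, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]


# Invented toy sequence, not a real MCM9 fragment.
toy = "MAELLVGDPGTGKSQLLRAVAK"
print(find_walker_a(toy))
```

A stretch lacking the second or third glycine of the consensus would not be reported by this pattern, which is the sense in which a deviant site differs from the canonical one.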
3.2. Identification of MCM9 homologs in the genome of other
vertebrates and mammals
Given that XlMCM9 is much longer than
reported for HsMCM9, we investigated whether this was a
special feature of the Xenopus protein or whether a longer MCM9
homolog could also be identified in other organisms.
Therefore, we performed databank searches using BLAST with
the XlMCM9 protein sequence against databases of several
organisms. A record (XM_419764) in the chicken genomic
database was found, derived from an annotated sequence
(NW_060336, located on chromosome 3 between 61.196 and
61.290 kbp). Within this region, GNOMON predicts an mRNA
coding for a 1169 aa long protein, which is supported
by multiple ESTs (e.g. BU378776, BU478046,
BU271359). This hypothetical chicken MCM9 (GgMCM9)
shares 54.1% identity with XlMCM9. Like XlMCM9, the
putative chicken protein consists of two main parts: an N-
terminal part which is highly conserved (aa 1–626 share 81.2%
identity with XlMCM9) and a C-terminal region which is much
less conserved, both among MCM proteins and with respect to
other MCM9 homologs (Fig. 1B).
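Percent identity values like those quoted above are derived from a pairwise alignment. The following sketch shows one common way to compute such a figure from two already aligned sequences. It assumes that identity is counted over columns in which neither sequence has a gap; the text does not state which denominator ALIGN or CLUSTALW reports, so the numbers in the paper may follow a different convention. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue ("-" marks a gap).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Invented toy alignment: 6 gap-free columns, 5 of them identical.
print(round(percent_identity("MKT-LLG", "MRTALLG"), 1))
```

Restricting the calculation to a sub-range of columns gives region-specific values, in the way the N-terminal and C-terminal parts are compared separately in the text.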
Next, searching the mouse genome database, we found two
entries (BAB31238.1 and NP_954598 on chromosome 10
between 53.544 and 53.679 kbp), both coding for unnamed
protein products. The predicted proteins showed 87% identity
with the N-terminus of GgMCM9, and 47.9% identity with the
carboxy-terminus of GgMCM9, respectively. Searching the
mouse EST database with these sequences, a number of partially
overlapping expressed sequences could be identified (e.g.
BY720667, CB244669, CX2225903) and the entire predicted
protein could be reassembled in silico, resulting in a 1291 aa long
hypothetical protein possessing over 60% identity with the
GgMCM9 protein and 47% with XlMCM9. MmMCM9 shares the
general organization of a highly conserved N-terminus and a
MCM9 proteins and other MCM family members (Fig. 1B). In
addition, the first 150 aa of this protein are not present in the
other MCM9 proteins. Importantly, its first 386 aa are 100%
identical to the reported 386-aa mouse MCM9
(Yoshida, 2005). These findings suggest that XlMCM9-like
mammals. Therefore, we re-investigated the human databases
using the full XlMCM9 sequence as a query to search for a
complete human MCM9 protein. First, homologs of XlMCM9
were searched in the human genome with BLAST. Two
overlapping sequence entries on chromosome 6 were found
(NT_025741 and NT_086697) revealing the highest alignment
significance. The identified human sequences were coding for
amino acid stretches, which were highly similar to XlMCM9
over the entire length of the protein, strongly suggesting that a
Next, using the GNOMON routine within BLAST (which
corrects artificial frame shifts, see Section 2.1) to generate
ab initio proteins, an HsMCM9 was found at exactly the same
position on chromosome 6, highly similar to XlMCM9. The
N-terminus of this hypothetical protein was 100% identical
to the first 385 aa of the reported 391 aa long HsMCM9
(Yoshida, 2005). Subsequently, multiple partially overlapping
Fig. 1. The MCM9 protein is a novel member of the MCM2–8 protein family with a unique C-terminal domain. (A) Cartoon showing the alignment between the
previously reported human MCM9 protein (HsMCM9; Yoshida, 2005) and the Xenopus MCM9 protein (XlMCM9), shown as bars. The MCM2–8 signature domain is
shown in grey. The ATP binding (Walker A) and hydrolysis (Walker B) motifs are indicated. Numbers indicate amino acids. (B) Cartoon showing an alignment
between MCM9 homologs in different organisms. Bars represent Xenopus (XlMCM9), chicken (GgMCM9), mouse (MmMCM9) and human (HsMCM9) proteins.
Numbers indicate amino acids.
Fig. 2. Alignment of MCM9-like proteins in different organisms. (A) Alignment of the conserved N-terminal half of MCM9 proteins obtained by ClustalW. The
Walker A and B motifs are underlined. Stars indicate identity, while similar amino acids are indicated by a single or double dot. (B) Alignment of MCM9 with
MCM2–8 proteins from Xenopus within the central region of MCM2–8 proteins and the N-terminal of MCM9. Alignment was performed as in (A). Walker A and B
motifs are underlined. (C) Phylogram of human MCM2–7, MCM8 and MCM9. The phylogram was calculated with ClustalW.
EST sequences corresponding to the HsMCM9 region were also
identified (e.g. CV030253, CX756843 (which contains the full
Walker A and B motifs), DR008069), demonstrating that an
mRNA encoding the predicted protein, including an intact Walker B
motif and an elongated C-terminus, is indeed transcribed.
Finally, we searched the human genome by BLAST with
the HsMCM9 protein generated by GNOMON. Over 30
BLAST hits on chromosome 6 were found, covering nearly
all the HsMCM9 sequence generated by GNOMON, giving
direct EST evidence from aa 1 to aa 1060. Some hits were
located at the locus previously annotated as MCMDC1 (for
MCM domain containing 1) at position 6q22.31, corresponding
to the 7 exons of HsMCM9 previously described
(Yoshida, 2005). In addition, more hits were identified
further downstream of the MCMDC1 locus and beyond the
ASF1 gene, which is located in an intron of HsMCM9 and
transcribed in the opposite direction, corresponding to 6 more
exons of the HsMCM9 gene. Finally, on the map of the human
chromosome 6 (http://www.ncbi.nlm.nih.gov/genome/guide/
HsMCM9 gene with the corresponding protein is also annotated
as the entry hmm17631 in the GNOMON model in Map viewer,
as a member of the MCM2/3/5 family. Thus, this new HsMCM9
gene consists of 13 exons, giving rise to an mRNA of 4789
nucleotides containing 1366 nucleotides of untranslated 3′
sequences. The corresponding HsMCM9 protein consists of
1141 aa, thus having a similar length to the identified proteins in
this sequence and considered as the end of the protein
corresponds to the end of exon seven.
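The figures reported for the new HsMCM9 gene (7 previously described exons plus 6 further exons, an mRNA of 4789 nucleotides with 1366 nucleotides of 3′ untranslated sequence, and a 1141 aa protein) can be cross-checked with simple arithmetic. The sketch below only tests whether these numbers are mutually compatible; the function name is hypothetical, and the text does not say whether the stop codon or any 5′ untranslated sequence is included in these counts, so the result should be read with that caveat.

```python
def check_gene_model(exons_known: int, exons_new: int,
                     mrna_nt: int, utr3_nt: int, protein_aa: int) -> dict:
    """Cross-check exon count and coding capacity from the reported figures."""
    remaining_nt = mrna_nt - utr3_nt           # mRNA length without the 3' UTR
    codons, leftover = divmod(remaining_nt, 3)  # whole codons and leftover nucleotides
    return {
        "total_exons": exons_known + exons_new,
        "remaining_nt": remaining_nt,
        "codons": codons,
        "leftover_nt": leftover,
        "matches_protein_length": codons == protein_aa and leftover == 0,
    }


# Values as reported in the text.
summary = check_gene_model(exons_known=7, exons_new=6,
                           mrna_nt=4789, utr3_nt=1366, protein_aa=1141)
for key, value in summary.items():
    print(f"{key}: {value}")
```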
The full-length HsMCM9 identified here shares 55.0%
identity with XlMCM9 and 63.8% identity with the MmMCM9
all newly identified members of the MCM9 family (Xenopus,
chicken, mouse and human) are similar in length and highly
3.3. Characterization and classification of the new MCM9
The most striking feature of the MCM9 proteins in
different organisms is their highly conserved N-terminus (aa
1–650), which contains all classical features of MCM2–7
and MCM8 proteins, including Zn finger-like domains, the
Walker A and B motifs as well as a full MCM2–7 family
domain (Figs. 1A and 2A). Only the mouse protein appears
to contain an additional 150 aa at its N-terminus. However,
MCM9 shares a much higher homology to MCM8 than to
the MCM2–7 proteins (Fig. 2A and C) and it is only
present in vertebrates. Thus, MCM8 and MCM9 seem to
represent a distinct sub-family of MCM DNA helicases,
perhaps to fulfill special needs that arose with the more
complex biology and development of multicellular organisms,
especially in vertebrates. In contrast, the C-terminal half of
all identified MCM9 proteins (aa 650 to the end), is less
conserved (Fig. 2B), unique and not present in other MCM
proteins, although a weak homology to human MCM8 exists.
No obvious protein signatures or motifs could be identified
with a significant score within this part. However, the C-
terminus contains several short, nevertheless highly conserved
stretches. The elongated C-terminus of this newly identified
MCM9 protein might not be directly involved in helicase
activity, but in binding to other factors.
The newly identified MCM9 protein seems to be generally
present in vertebrates (e.g. also in dog, within the contig
NW_139836; cow, XP_584574; and zebrafish, within the
contig CAAK01001524.1), whereas in D. melanogaster, C.
elegans and yeast there appears to be no MCM9 homolog. The
previously identified HsMCM9 protein, which is shorter in size
than MCM proteins, was annotated as MCMDC1 in the public
database (NM_153255), suggesting that it may be
functionally unrelated to MCM proteins while sharing
some homology with them, in particular in one part of the
MCM2–7 signature. Our findings clarify this issue by
establishing that MCM9 is a canonical MCM protein, more
closely related to MCM8 than to the six MCM2–7 proteins and whose
motifs and sequences are conserved in vertebrates and
mammals, including humans.
Acknowledgements
M.L. is supported by a Liebig-Stipendium of the Fonds der
Chemischen Industrie VCI and D.M. is supported by INSERM.
This work was supported by the CNRS, the Human Frontier
Science Program Organization, the Ligue contre le Cancer and
l’Association pour la Recherche contre le Cancer (ARC).
References
Gozuacik, D., et al., 2003. Nucleic Acids Res. 31, 570–579.
Higgins, D., Thompson, J., Gibson, T., 1994. Nucleic Acids Res. 22, 4673–4680.
Johnson, E.M., Kinoshita, Y., Daniel, D.C., 2003. Nucleic Acids Res. 31,
Kearsey, S.E., Labib, K., 1998. Biochim. Biophys. Acta 1398, 113–136.
Koonin, E.V., 1993. Nucleic Acids Res. 21, 2541–2547.
Maiorano, D., Cuvier, O., Danis, E., Mechali, M., 2005. Cell 120, 315–328.
Mulder, N.J., et al., 2005. Nucleic Acids Res. 33, D201–D205.
Pearson, W.R., Wood, T., Zhang, Z., Miller, W., 1997. Genomics 46, 24–36.
Yoshida, K., 2005. Biochem. Biophys. Res. Commun. 331, 669–674.
Zhang, Z., Schwartz, S., Wagner, L., Miller, W., 2000. J. Comput. Biol. 7,