Atypical structure and phylogenomic evolution of the new eutherian oocyte- and embryo-expressed KHDC1/DPPA5/ECAT1/OOEP gene family.
ABSTRACT Several recent studies have shown that genes specifically expressed by the oocyte are subject to rapid evolution, in particular via gene duplication mechanisms. In the present work, we have focused our attention on a family of genes, specific to eutherian mammals, that are located in unstable genomic regions. We have identified two genes specifically expressed in the mouse oocyte: Khdc1a (KH homology domain containing 1a, also named Ndg1 for Nur 77 downstream gene 1, a target gene of the Nur77 orphan receptor), and another gene structurally related to Khdc1a that we have renamed Khdc1b. In this paper, we show that Khdc1a and Khdc1b belong to a family of several members including the so-called developmental pluripotency A5 (Dppa5) genes, the cat/dog oocyte expressed protein (cat OOEP and dog OOEP) genes, and the ES cell-associated transcript 1 (Ecat1) genes. These genes encode structurally related proteins that are characterized by an atypical RNA-binding KH domain and are specifically expressed in oocytes and/or embryonic stem cells. They are absent in fish, bird, and marsupial genomes and thus seem to have first appeared in eutherian mammals, in which they have evolved rapidly. They are located in a single syntenic region in all mammalian genomes studied, except in rodents, in which a synteny rupture due to a paracentric inversion has separated this gene family into two genomic regions and seems to be associated with increased instability in these regions. Overall, we have identified and characterized a novel family of oocyte and/or embryonic stem cell-specific genes encoding proteins that share an atypical KH RNA-binding domain and that have evolved rapidly since their emergence in eutherian mammalian genomes.
Alice Pierre a, Mathieu Gautier b, Isabelle Callebaut c, Martine Bontoux a, Eric Jeanpierre a,
Pierre Pontarotti d, Philippe Monget a,⁎
a Physiologie de la Reproduction et des Comportements, UMR 6175 INRA–CNRS–Université F. Rabelais de Tours Haras Nationaux, 37380 Nouzilly, France
b Laboratoire de Génétique Biochimique et de Cytogénétique, Domaine de Vilvert, INRA, 78352 Jouy-en-Josas, France
c Département de Biologie Structurale, Institut de Minéralogie et de Physique des Milieux Condensés, UMR CNRS 7590, Université Pierre et Marie Curie–Paris 6, Université Denis Diderot–Paris 7, IPGP, 140 Rue de Lourmel, 75015 Paris, France
d EA 3781 Evolution Biologique, Université d’Aix Marseille I, 13331 Marseille Cedex 3, France
Received 12 April 2007; accepted 12 June 2007
Available online 3 October 2007
© 2007 Elsevier Inc. All rights reserved.
Keywords: Oocyte; Atypical phylogenetic evolution; Dppa5; ECAT1; Eutherian mammals
During the past decade, a number of studies have identified
several genes specifically expressed by the oocyte that play a
role in oogenesis, folliculogenesis, or early embryonic development.
In a previous work, we used and validated an in silico
subtraction methodology to identify oocyte-specific genes in the
mouse [1,2]. Using the digital differential display software, we
and others have identified more than 100 genes specifically
expressed in the mouse oocyte. Some of these genes are organized
in clusters in the mouse genome and seem to have
evolved particularly rapidly, via recent gene duplications
(oogenesin, Nalp9, Obox, …) [1–4].
The evolution of genes depends, at least in part, on their
specificity of expression and on their biological function. In
particular, genes that exhibit a strict specificity of expression
evolve faster than ubiquitously expressed genes [5,6]. Genes that
play a role in reproduction processes are also known to evolve
[…] also depends on their location in the genome. For example,
oocyte-specific gene families organized in clusters in the mouse
genome are significantly closer to the telomeres than isolated
genes. Furthermore, it has also been shown that segmental
[…] as driving forces for evolutionary rearrangements.
In the present work, we were interested in studying two
genes predicted in silico as specifically expressed in the oocyte,
KH homology domain containing 1a (Khdc1a), also named Nur
77 downstream gene 1 (Ndg1), and another gene structurally
related to Khdc1a that we have renamed Khdc1b. We have
focused our study on these two genes because they seem to have
[…] of the corresponding proteins sharing a particularly high degree
of structural homology with developmental pluripotency A5
(Dppa5), another factor specifically expressed by embryonic
stem (ES) cells. In this paper, we show that Khdc1a and Khdc1b
belong to a new eutherian gene family with several members,
including the so-called developmental pluripotency A5 (Dppa5)
genes, the cat/dog oocyte expressed protein
(cat OOEP and dog OOEP) genes, and the ES cell-associated
transcript 1 (Ecat1) genes. These genes encode structurally
related proteins that are characterized by an atypical RNA-binding
KH domain and are specifically expressed in oocytes
and/or embryonic stem cells. Interestingly, they appear to have
emerged initially in eutherian mammalian genomes and they
have undergone rapid evolution. Moreover, in rodents, they are
located in an unstable chromosomal region.
Genomics 90 (2007) 583–594
⁎Corresponding author. Fax: +33 2 47 42 77 43.
E-mail address: firstname.lastname@example.org (P. Monget).
Identification of oocyte-specific Khdc1 genes in the mouse
Using the in silico digital differential display methodology as
previously described (http://www.ncbi.nlm.nih.gov/UniGene/info/ddd.html),
we have identified two structurally related
genes, Khdc1a and another gene that we named Khdc1b (see
below for the annotation), both predicted to be exclusively
expressed in oocytes.
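The idea of an in silico subtraction can be illustrated with a minimal sketch. It assumes a table of EST counts per gene and per tissue library, and keeps genes whose ESTs are found only in the oocyte library. The function name, the threshold, and the toy counts are hypothetical; this is a simple count-based filter for illustration, not the statistic actually used by the digital differential display tool.

```python
from typing import Dict, List


def oocyte_specific_candidates(
    est_counts: Dict[str, Dict[str, int]],
    target: str = "oocyte",
    min_target_ests: int = 2,
) -> List[str]:
    """Return genes with ESTs in the target library and none elsewhere.

    est_counts maps gene name -> {library name -> EST count}.
    min_target_ests is an arbitrary illustrative threshold.
    """
    candidates = []
    for gene, counts in est_counts.items():
        in_target = counts.get(target, 0)
        # Sum ESTs observed in every library other than the target one.
        elsewhere = sum(n for lib, n in counts.items() if lib != target)
        if in_target >= min_target_ests and elsewhere == 0:
            candidates.append(gene)
    return sorted(candidates)


# Toy counts (hypothetical, not real UniGene data).
toy_counts = {
    "geneA": {"oocyte": 5, "liver": 0},
    "geneB": {"oocyte": 3, "brain": 2},
    "geneC": {"oocyte": 1},
}
print(oocyte_specific_candidates(toy_counts))
```

A real analysis would also need to account for library size and sampling noise, which is why predictions of this kind are then checked experimentally.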
Fig. 1. mRNA expression of Khdc1a and Khdc1b in the mouse ovary. (A) Expression analysis of Khdc1a and Khdc1b mRNAs by Southern blot analysis of the RT-PCR
products. As a control, actin was amplified in all tissues by RT-PCR. (B) Localization of Khdc1a and Khdc1b mRNAs by in situ hybridization, using 35S-labeled RNA
[…] bars, 100 μm.
The two deduced protein sequences of Khdc1a and Khdc1b
share 86% identity over the first 123 amino acids. The Khdc1a
protein sequence is longer than that of Khdc1b, with 166
amino acids, instead of 126. A BLAST analysis showed that, in
the mouse genome, a third gene structurally related to Khdc1a
and Khdc1b exists, which we have named Khdc1c. The
proteins encoded by the three genes share 86 to 98% sequence
identity (Fig. 2A).
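As an illustration of how a percent identity figure such as those quoted above can be obtained from a pairwise alignment, here is a minimal sketch. It assumes the two sequences are already aligned and counts identical residues over the columns where neither sequence has a gap. Other denominators (alignment length, shorter sequence length) are also in common use, and the source does not say which one was applied. The toy sequences are hypothetical and are not Khdc1 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned sequences (hypothetical): 8 ungapped columns, 6 identical.
toy_a = "MKTW-LPEQ"
toy_b = "MKSWALPDQ"
print(percent_identity(toy_a, toy_b))  # 75.0
```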
First, we verified their oocyte-specific expression. Southern
blot analysis of RT-PCR products showed that, in mouse,
Khdc1a and Khdc1b mRNAs are exclusively expressed in the
ovary (Fig. 1A) and in situ hybridization confirmed that they
are expressed in oocytes (Fig. 1B).
Transcripts are clearly detected in secondary follicles but are
also present in antral follicles (Fig. 1B). No clear difference of
expression was observed between healthy and atretic follicles or
between ovaries recovered at the diestrus, proestrus, or estrus
stages (data not shown).
Interestingly, in silico analyses predicted that Khdc1c should
also be specifically expressed in the oocyte
(http://www.ncbi.nlm.nih.gov/UniGene/; data not shown).
Phylogenetic analysis of Khdc1/Dppa5/Ooep genes
[…] three Khdc1 protein sequences with those of Dppa5 and Ecat1
from different mammals specific to embryonic stem cells and
early embryos (Dppa5; Ecat1), as well as with those of
[…] to be specifically expressed in cat and dog oocytes, respectively
(W. He et al., 2006, direct submission to GenBank). It is
noteworthy that the Ecat1 gene is absent in rodents. The protein
Dppa5 shares approximately 35% sequence identity with Ooep
over 100 amino acids, while both Dppa5 and Ooep share only
approximately 21% sequence identity over 97 amino acids with
Khdc1a and Khdc1b (Fig. 2A). The region of similarity
encompasses the N-terminal parts of the proteins, which include
a KH-like domain (see below). The C-terminal extensions of the
proteins are variable and cannot be aligned between Khdc1,
Dppa5, and Ooep homolog (cat/dog) sequences.
Phylogenetic analysis revealed three monophyletic
groups that contain Ooep/Ecat1, Dppa5, and Khdc1 genes,
respectively, suggesting that they arose from a duplication of an
ancestral gene before the eutherian mammalian radiation
(Fig. 3). Thus it is likely that the mammalian ancestor genome
contained one Dppa5 gene, one Khdc1 gene, and one Ooep/Ecat1 gene
which, after duplication, gave rise to both Ooep and Ecat1
genes. After this event it seems that the Ecat1 and Ooep genes
have remained as a single copy in mammalian genomes since
only one ortholog is found in different mammalian species
[…] occurred in the case of the Dppa5 and Khdc1 genes (Figs. 3D
and 3E). Indeed, duplication of the Khdc1 gene occurred in the
mouse after divergence from the rat, as well as in the primate
lineages before the chimpanzee/human/Macaca split (Fig. 3E).
Fig. 2. (A) Multiple alignment of the murine sequences of Khdc1/Dppa5/Ooep family and relationship to type I KH domains. On top is shown the multiple alignment
of the Khdc1/Dppa5/Ooep sequences, showing positions that are conserved between the three groups of sequences. The Ecat1 gene is absent from the mouse genome.
Shading was performed according to residue properties and degree of conservation within the alignment. Sequence identities are in white on a black background.
Positions where hydrophobicity is conserved are shaded in dark gray, whereas other conserved positions are shaded in light gray. Dashes indicate gap positions.
Consensus abbreviations are shown next to the alignment (h, hydrophobic residues (VILFMYWACTS); p, polar residues (STEDNQKRHC); s, small residues
(AGSVT); l, loop-forming residues (PGDNS)). The C-terminal regions of the sequences are not shown, since they are highly variable and cannot be aligned. GI
numbers are Khdc1a (34304011), Khdc1b (38049357), Khdc1c (76573872), Dppa5 (27228963), Dppa5-homolog1 (94386247), Dppa5-homolog2 (94386249), Ooep
homolog (cat/dog) (21312688). The Khdc1/Dppa5/Ooep sequences were aligned with those of different type I KH-fold domains for which the three-dimensional
structures have been solved: Nova-2 (PDB 1ec6), FBP (PDB 1j4w), hnRNP K (PDB 1khm), vigilin (PDB 1vih). Conserved positions of the KH core match mainly those
observed in the Khdc1/Dppa5/Ooep family. The positions of regular secondary structures, as experimentally observed, are indicated below the alignment (consensus
positions for regular secondary structures between the four sequences are shown with black boxes, whereas white boxes indicate the N- or C-terminal extensions of
regular secondary structures specific for some sequences). Stars indicate the positions of amino acids that have been shown in one or several structures to be involved in
RNA recognition. (B) Model of the three-dimensional structure of the type I KH domain of the Khdc1a protein (right view), based on the experimental structure of
the KH3 domain of the Nova-2 protein, in complex with a hairpin RNA (PDB 1ec6; left view).
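The consensus abbreviations defined in the caption of Fig. 2 (h, p, s, l) can be applied to an alignment column by column. The following is a minimal sketch using the residue sets given in the caption. The order in which classes are tested, the use of "." for unconserved or gapped columns, and the toy alignment are assumptions made for illustration; the caption does not specify how overlapping classes (for example S and T, which belong to several sets) were resolved.

```python
from typing import List

# Residue classes as listed in the caption of Fig. 2.
RESIDUE_CLASSES = {
    "h": set("VILFMYWACTS"),  # hydrophobic
    "p": set("STEDNQKRHC"),   # polar
    "s": set("AGSVT"),        # small
    "l": set("PGDNS"),        # loop-forming
}


def column_consensus(column: str, gap: str = "-") -> str:
    """Consensus symbol for one alignment column."""
    residues = {r.upper() for r in column}
    if gap in residues:
        return "."  # assumption: gapped columns get no consensus
    if len(residues) == 1:
        return next(iter(residues))  # strictly conserved residue
    # Assumption: the first class (in dict order) containing all residues wins.
    for symbol, members in RESIDUE_CLASSES.items():
        if residues <= members:
            return symbol
    return "."


def alignment_consensus(rows: List[str]) -> str:
    """Consensus line for a list of equal-length aligned sequences."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    return "".join(column_consensus("".join(col)) for col in zip(*rows))


# Toy alignment (hypothetical sequences).
print(alignment_consensus(["MVKG-", "MIEAP", "MLDSP"]))  # Mhps.
```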
Another duplication of Khdc1 occurred in Macaca after the
divergence from other primates. In the case of the Dppa5 gene,
duplication is restricted to the mouse lineage. It is also likely that
the Ecat1 gene was lost from the genome of rodents after the
divergence from other mammals (Fig. 3C). Curiously, if the
mouse is chosen as outgroup, the phylogenetic tree of Khdc1
shows an inversion in the topology of the mouse genes, the rat
Khdc1 gene forming a monophyletic group with primate
genes (Fig. 3E). This could be due to the particularly high rate of
evolution of these genes in the mouse lineage.
Identification and organization of Khdc1/Dppa5/Ecat1/Ooep
genes in mammalian genomes
Fig. 3. Phylogenetic analysis of the Khdc1/Dppa5/Ecat1/Ooep genes. (A) Overall phylogenetic tree: three monophyletic groups that contain Ooep/Ecat1, Dppa5,
and Khdc1 genes, respectively. Note that the Ooep/Ecat1 ancestor further duplicated to give rise to Ooep and Ecat1 genes. (B) Ooep: The Ooep phylogenetic tree is
similar to that of Ecat1. (C) Ecat1: Ecat1 genes remained as a single copy in mammalian genomes except in rodents, from which these genes have probably been lost.
(D) Dppa5: Duplication of Dppa5 genes occurred only in the mouse; Dppa5 is present only as a single copy in all other mammals. (E) Khdc1 subtree: Khdc1 probably
was duplicated in the mouse after the divergence from the rat and in primates before the chimpanzee/human/Macaca split. A further duplication occurred in Macaca after
the divergence from other primates. Note the inversion of topology of mouse genes, probably due to the high rate of evolution of these genes. Bootstrap values are
reported for each npl method. *The bootstrap value was under 50%.
To establish a relationship between the phylogenetic evolution
of the Khdc1/Dppa5/Ecat1/Ooep family and their
positions in the genome, we have identified and located all
the genes of this family in the available sequenced mammalian
genomes. Analysis of the human genome showed that the two
human KHDC1 genes, as well as the DPPA5, ECAT1, and
OOEP homolog (cat/dog) genes, are localized on human
chromosome 6q13, in a locus flanked by the genes KCNQ5
(potassium voltage-gated channel/subfamily Q/member 5) and
DDX43 (DEAD-box polypeptide 43) (Fig. 4 and Supplemental
Data Fig. 4A). A similar organization is found in the
chimpanzee, macaque, dog, and bovine genomes, the syntenic
regions being located on chromosomes 6, 4, 12, and 9,
respectively (Fig. 4 and Supplemental Figs. 4B–4E). One
pseudogene of KHDC1 was detected in the human and
chimpanzee genomes (Supplemental Figs. 4A and 4B) and
also in the dog, close to Khdc1 (Supplemental Fig. 4D).
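The notion of a gene family sitting in a single locus flanked by two marker genes (here KCNQ5 and DDX43) can be expressed as a simple check on gene coordinates. The sketch below is illustrative only: the function name, the chromosome labels, and all coordinates are hypothetical and do not reflect real genome annotations. It returns true when both flanking genes and every family member lie on the same chromosome, with the family members located between the flanks.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome label, start coordinate)


def is_single_flanked_locus(
    genes: Dict[str, Position],
    family: List[str],
    left_flank: str,
    right_flank: str,
) -> bool:
    """True if all family genes lie between the two flanks on one chromosome."""
    chrom_l, pos_l = genes[left_flank]
    chrom_r, pos_r = genes[right_flank]
    if chrom_l != chrom_r:
        return False  # flanks on different chromosomes: synteny is broken
    lo, hi = sorted((pos_l, pos_r))
    return all(
        genes[g][0] == chrom_l and lo < genes[g][1] < hi for g in family
    )


family = ["KHDC1", "DPPA5", "OOEP"]

# Hypothetical genome in which the locus is conserved.
conserved = {
    "KCNQ5": ("chrA", 100), "KHDC1": ("chrA", 200),
    "DPPA5": ("chrA", 300), "OOEP": ("chrA", 350), "DDX43": ("chrA", 400),
}
# Hypothetical genome in which the family is split over two chromosomes.
split = {
    "KCNQ5": ("chrA", 100), "KHDC1": ("chrA", 200),
    "DPPA5": ("chrB", 300), "OOEP": ("chrB", 350), "DDX43": ("chrB", 400),
}
print(is_single_flanked_locus(conserved, family, "KCNQ5", "DDX43"))  # True
print(is_single_flanked_locus(split, family, "KCNQ5", "DDX43"))      # False
```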
Interestingly, a break is observed in the syntenic genomic
region in mouse and rat (Fig. 4 and Supplemental Figs. 4F and
4G). In mouse, the locus containing the Khdc1a, Khdc1b, and
Khdc1c genes is close to the Kcnq5 gene, on chromosome
1A4, and the locus containing the Dppa5/Ooep homolog genes
is on chromosome 9E1, near Ddx43 (Supplemental Fig. 4F).
[…] those of other mammals. Moreover, in the mouse, three
glutathione S-transferase (Gst) genes are present in this region,
as well as the so-called oocyte maturation genes, Omt2a and
Omt2b, known to be specifically expressed in the oocyte.
In rat, the Khdc1 and Dppa5 genes occupy two loci, on
chromosomes 9q13 and 8q31, respectively (Supplemental Fig. 4G). On
chromosome 9q13, only one Khdc1 gene (and a pseudogene) is
present, close to two Kcnq5 genes and a Kcnq5 pseudogene. On
chromosome 8q31, two Dppa5/Ooep genes are present, as well
as three Gst genes and one Omt gene, near a predicted Ddx43