Molecular Biology and Evolution

Published by Oxford University Press

Online ISSN: 1537-1719


Print ISSN: 0737-4038


Table 1 β-1,3-G Genes with Greater than a 3-Fold Change in Expression Level Following Fungal Pathogen Treatment
FIG. 1.-Protein domain architectures observed in the Arabidopsis β-1,3-G gene family. The 5 architectural classes are based on the presence/absence of an N-terminal sequence (NTS), a cellulose binding module (CBM43), and hydrophobic C-terminal sequence (CTS), in addition to the core glycosyl hydrolase family 17 domain (GH-17).
FIG. 2.-(a) Majority-rule consensus tree generated by Bayesian inference of phylogeny. Bayesian posterior probabilities (/1), maximum likelihood support (/100) and NJ bootstrap values (/1000) are indicated above the clades where the clade is present in the respective tree. (b) Presence/absence of conserved introns and protein domain architecture mapped onto the phylogenetic tree. The locations of introns (labeled I1-I9) are shown in the multiple sequence alignment (fig. SF1, Supplementary Material online). Black boxes indicate presence of introns, white indicates absence, and gray boxes indicate introns that are located in the C-terminal domain but do not align well with I8. (c) Protein domain architectural class as defined in figure 1. 
FIG. 4.-Phylogenetic reconstruction of ancestral expression states using parsimony. Colored boxes are shown at the terminal branches of genes included in the expression clustering. Genes with absent expression profiles (no box) are associated with null states in the reconstruction. The colors corresponding to each expression group are shown in the legend, and multicolored branches are associated with ambiguous (multiple possible) states. PR-glucanases identified through fungal stress response expression analysis are marked with an X. 
Functional Divergence in the Arabidopsis β-1,3-Glucanase Gene Family Inferred by Phylogenetic Reconstruction of Expression States

May 2007





Barbara A Moffatt




Plant beta-1,3-glucanases (beta-1,3-Gs) (E.C. comprise large, highly complex gene families involved in pathogen defense as well as a wide range of normal developmental processes. In spite of previous phylogenetic analyses that classify beta-1,3-Gs by sequence relatedness, the functional evolution of beta-1,3-Gs remains unclear. Here, expression and phylogenetic analyses have been integrated in order to investigate patterns of functional divergence in the Arabidopsis beta-1,3-G gene family. Fifty beta-1,3-G genes were grouped into expression classes through clustering of microarray data, and functions were inferred based on knowledge of coexpressed genes and existing literature. The resulting expression classes were mapped as discrete states onto a phylogenetic tree and parsimony reconstruction of ancestral expression states was performed, providing a model of expression divergence. Results showed a highly nonrandom distribution of developmental expression states in the phylogeny (P = 0.0002) indicating a significant degree of coupling between sequence and developmental expression divergence. A weaker, yet significant level of coupling was found using stress response data, but not using hormone-response or pathogen-response data. According to the model of developmental expression divergence, the ancestral function was most likely involved in cell division and/or cell wall remodeling. The associated expression state is widely distributed in the phylogeny, is retained by over 25% of gene family members, and is consistent with the known functions of beta-1,3-Gs in distantly related species and gene families. 
Consistent with previous hypotheses, pathogenesis-related (PR) beta-1,3-Gs appear to have evolved from ancestral developmentally regulated beta-1,3-Gs, acquiring PR function through a number of evolutionary events: divergence from the ancestral expression state, acquisition of pathogen/stress-responsive expression patterns, and loss of the C-terminal region including the glycosylphosphatidylinositol (GPI)-anchoring site, thus allowing for extracellular secretion.
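The parsimony reconstruction of ancestral expression states described in this abstract can be illustrated with a small sketch. The code below uses Fitch parsimony, one standard method for unordered discrete states; the abstract does not say which parsimony variant or software the authors used, so this choice is an assumption. The tree, gene names, and expression states are hypothetical and serve only to show how states at the tips determine candidate states at internal nodes and a minimum number of state changes.

```python
def fitch(node, tip_states):
    """Bottom-up Fitch pass on a binary tree written as nested tuples.

    node: a tip name (str) or a pair (left_subtree, right_subtree).
    tip_states: dict mapping each tip name to its discrete state.
    Returns (candidate_states, changes) for the subtree rooted at node.
    """
    if isinstance(node, str):
        return {tip_states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: keep both possibilities and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-gene tree and expression classes (illustration only).
tree = ((("geneA", "geneB"), "geneC"), ("geneD", "geneE"))
expression_class = {
    "geneA": "cell_division",
    "geneB": "cell_division",
    "geneC": "stress",
    "geneD": "cell_division",
    "geneE": "cell_division",
}

root_states, n_changes = fitch(tree, expression_class)
print(root_states, n_changes)
```

A node whose candidate set holds more than one state corresponds to the ambiguous (multiple possible) states that the FIG. 4 caption describes as multicolored branches.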

Table 1 Primers Used to Amplify and Sequence Primate Lewis-like Sequences
FIG. 3.-Amino acid alignment of Lewis fucosyltransferases. The nine new fucosyltransferases (bold-type characters) are compared with the human, chimpanzee, and bovine Lewis enzymes. Numberings refer to position in the alignment, above the sequences, and to the length of each sequence, at the end of the line. Dashes indicate residues shared with human FUT3, dots show missing positions, question marks represent nonsequenced characters, and crosses point out residues that cannot be aligned. The FUT sq2 sequence was made translatable (frame 2) by deletion of one nucleotide at position 349 in addition to the 17-bp gap. The α3- and α3/4-fucosyltransferase conserved motifs I and II (Oriol et al. 1999), the acceptor-binding motif (abm) (Dupuy et al. 1999), and the new motif III are shaded in gray for all sequences. The Trp/Arg residue of the acceptor-binding motif involved in acceptor substrate specificity is indicated in bold type. The two conserved Asn-linked glycosylation sites are shown by arrows, and stars correspond to conserved Cys involved in human FUT3 folding (Holmes et al. 2000). The putative transmembrane domain is underlined. Shaded characters indicate selectively conserved residues characteristic of the three Lewis orthology classes defined by Hominoid sequences (FUT3: white characters on gray background; FUT5: black characters on gray background; FUT6: white characters on black background). Nomenclature: bos, Bos taurus; ch, chimpanzee; eu, Eulemur fulvus; hu, human; rh, rhesus macaque; sq, squirrel monkey; va, Varecia variegata.
FIG. 5.-Western blot analysis of primate Lewis fucosyltransferases expressed in COS-7 cells. Lane 1, negative control, 50 μg of proteins extracted from COS-7 cells transfected with pcDNAI/Amp; lane 2, human FUT3 (MW 42.1 kDa); lane 3, human FUT5 (MW 43.0 kDa); lane 4, human FUT6 (MW 41.8 kDa); lane 5, rhesus monkey FUT3 (MW 43.1 kDa); lane 6, rhesus monkey FUT5 (MW 43.0 kDa); lane 7, rhesus monkey FUT3 (MW 43.3 kDa); lane 8, squirrel monkey FUT sq1 (MW 43.0 kDa). The molecular weights are deduced from amino acid sequences. Lewis-like fucosyltransferases were labeled by anti-human FUT3 antibodies. Detection was performed with a secondary antibody, a pig anti-rabbit IgG conjugated to horseradish peroxidase.
α1,4-Fucosyltransferase Activity: A Significant Function in the Primate Lineage has Appeared Twice Independently

July 2002



In the animal kingdom the enzymes that catalyze the formation of α1,4 fucosylated–glycoconjugates are known only in apes (chimpanzee) and humans. They are encoded by FUT3 and FUT5 genes, two members of the Lewis FUT5-FUT3-FUT6 gene cluster, which had originated by duplications of an α3 ancestor gene. In order to explore more precisely the emergence of the α1,4 fucosylation, new Lewis-like fucosyltransferase genes were studied in species belonging to the three main primate groups. Two Lewis-like genes were found in brown and ruffed lemurs (prosimians) as well as in squirrel monkey (New World monkey). In the latter, one gene encodes an enzyme which transfers fucose only in α1,3 linkage, whereas the other is a pseudogene. Three genes homologous to chimpanzee and human Lewis genes were identified in rhesus macaque (Old World monkey), and only one encodes an α3/4-fucosyltransferase. The ability of new primate enzymes to transfer fucose in α1,3 or α1,3/4 linkage confirms that the amino acid R or W in the acceptor-binding motif “HH(R/W)(D/E)” is required for the type 1/type 2 acceptor specificity. Expression of rhesus macaque genes proved that fucose transfer in α1,4 linkage is not restricted to the hominoid family and may be extended to other Old World monkeys. Moreover, the presence of only one enzyme supporting the α1,4 fucosylation in rhesus macaque versus two enzymes in hominoids suggests that this function occurred twice independently during primate evolution.

A Gene Duplication/Loss Event in the Ribulose-1,5-Bisphosphate-Carboxylase/Oxygenase (Rubisco) Small Subunit Gene Family among Accessions of Arabidopsis thaliana

May 2011



Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase; EC, the most abundant protein in nature, catalyzes the assimilation of CO2 (worldwide about 10^11 t each year) by carboxylation of ribulose-1,5-bisphosphate. It is a hexadecamer consisting of eight large and eight small subunits. Although the Rubisco large subunit (rbcL) is encoded by a single gene on the multicopy chloroplast genome, the Rubisco small subunits (rbcS) are encoded by a family of nuclear genes. In Arabidopsis thaliana, the rbcS gene family comprises four members, that is, rbcS-1a, rbcS-1b, rbcS-2b, and rbcS-3b. We sequenced all Rubisco genes in 26 worldwide distributed A. thaliana accessions. In three of these accessions, we detected a gene duplication/loss event, where rbcS-1b was lost and substituted by a duplicate of rbcS-2b (called rbcS-2b*). By screening 74 additional accessions using a specific polymerase chain reaction assay, we detected five additional accessions with this duplication/loss event. In summary, we found the gene duplication/loss in 8 of 100 A. thaliana accessions, namely, Bch, Bu, Bur, Cvi, Fei, Lm, Sha, and Sorbo. We also sequenced an approximately 1-kb promoter region for all Rubisco genes. This analysis revealed that the gene duplication/loss event was associated with promoter alterations (two insertions of 450 and 850 bp, one deletion of 730 bp) in rbcS-2b and a promoter deletion (2.3 kb) in rbcS-2b* in all eight affected accessions. The substitution of rbcS-1b by a duplicate of rbcS-2b (i.e., rbcS-2b*) might be caused by gene conversion. All four Rubisco genes evolve under purifying selection, as expected for central genes of the highly conserved photosystem of green plants. We inferred a single positively selected site, a tyrosine to aspartic acid substitution at position 72 in rbcS-1b. Exactly the same substitution compromises carboxylase activity in the cyanobacterium Anacystis nidulans. In A. thaliana, this substitution is associated with an inferred recombination. Functional implications of the substitution remain to be evaluated.

Characteristics of a conserved 1,579-bp highly repetitive component in the killer whale, Orcinus orca

October 1985



A tandemly organized, highly repetitive DNA component of the killer whale was sequenced. The length of the repeat was 1,579 bp. This unit, which characterizes all delphinids, shows stringent hybridization homology with a 1,740-bp repeat that is characteristic of all other cetacean families. The 1,579-bp component comprises approximately 15% of the killer-whale genome, in which it is repeated 4-5 × 10^5 times. Computer analysis of the sequence showed no linear repetition within the component. This indicates that the 1,579-bp unit has not evolved by amplification of shorter repeats. Several inverted repeats of substantial length were found in the 1,579-bp unit. The most conspicuous of these was a 72-bp sequence that deviated from matching in only three positions. The 72-bp sequence occurs within an open reading frame 330 bp in length. Transcriptional activity was registered in the cloned repeat in a cell-free system. The length of the transcript was approximately 340 nucleotides. The chromosomal localization of the 1,579-bp repeat was determined by in situ hybridization. The repeat was present in eight of 21 autosomal pairs and was found in almost all C-band-positive (constitutive heterochromatin) regions of the karyotype.
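The repeat length, copy number, and genome fraction quoted in this abstract can be cross-checked with a short calculation. The sketch below multiplies the unit length by the copy-number range and then divides by the stated 15% fraction to get the genome size those figures would imply. The implied genome size is a back-calculation made here for illustration; it is not a value reported in the abstract.

```python
REPEAT_LENGTH_BP = 1_579          # unit length from the abstract
COPY_NUMBER_RANGE = (4e5, 5e5)    # "4-5 x 10^5 times" from the abstract
GENOME_FRACTION = 0.15            # "approximately 15%" from the abstract


def repeat_content_bp(unit_length_bp, copies):
    """Total base pairs occupied by the tandem repeat."""
    return unit_length_bp * copies


def implied_genome_size_bp(repeat_bp, fraction):
    """Genome size implied if repeat_bp makes up the given fraction."""
    return repeat_bp / fraction


for copies in COPY_NUMBER_RANGE:
    total = repeat_content_bp(REPEAT_LENGTH_BP, copies)
    genome = implied_genome_size_bp(total, GENOME_FRACTION)
    print(f"{copies:.0e} copies: {total:.2e} bp of repeat, "
          f"implied genome {genome:.2e} bp")
```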

FIG. 1. Cytological map of chromosomes X, 2, and 3 of Drosophila melanogaster showing cytological divisions for euchromatin (X, 2, and 3) and heterochromatin (X#, 2#, and 3#) (adapted from Gatti and Pimpinelli 1992; Lohe et al. 1993). Divisions in gray (print version) or red (online version) mark the locations of 1.688 repeats described in the literature; stars show 1.688 arrays studied in figures 2 and 3.
FIG. 2. ML trees of 1.688 repeats from chromosomes 2, 3, and X under the TIM1+G, TPM2uf+G, and TVM+I+G models, respectively. Heterochromatic repeats (dark triangles) are mostly grouped and lie away from euchromatic repeats (colours and other symbols). Chromosomes 2 and 3: circles and squares represent repeats from the left and right arm, respectively. Chromosome X: different symbols with same colour represent repeats from different arrays but same cytological division. Numbers refer to locations of arrays (see fig. 1). The colour version of this figure is available online.
FIG. 3. ML tree of all 1.688 repeats analyzed under the TVM+I+G model. Het, heterochromatin and Eu, euchromatin. The colour version of this figure is available online.
FIG. 4. Proportion of 1.688 arrays that falls into the three genomic landscape classes defined in the present work. Numbers within brackets correspond to the number of analyzed arrays from each chromosome. 
Kuhn GCS, Küttler H, Moreira-Filho O, Heslop-Harrison JS. The 1.688 repetitive DNA of Drosophila: concerted evolution at different genomic scales and association with genes. Mol Biol Evol 29: 7-11

June 2011



Concerted evolution leading to homogenization of tandemly repeated DNA arrays is widespread and important for genome evolution. We investigated the range and nature of the process at chromosomal and array levels using the 1.688 tandem repeats of Drosophila melanogaster where large arrays are present in the heterochromatin of chromosomes 2, 3, and X, and short arrays are found in the euchromatin of the same chromosomes. Analysis of 326 euchromatic and heterochromatic repeats from 52 arrays showed that the homogenization of 1.688 repeats occurred differentially for distinct genomic regions, from euchromatin to heterochromatin and from local arrays to chromosomes. We further found that most euchromatic arrays are either close to, or are within introns of, genes. The short size of euchromatic arrays (one to five repeats) could be selectively constrained by their role as gene regulators, a situation similar to the so-called “tuning knobs.”

BEAUti GUI for importing data and specifying the evolutionary model.
Simultaneous phylogenetic and phenotypic trait reconstruction of Darwin's finches. Plotted are the maximum clade credibility tree and posterior estimate of the trait correlation matrix. We annotate the tree with estimates of selected posterior clade support values and the one significant nucleotide substitution local clock (in red) and the branches scale in expected substitutions per site. We depict correlation coefficients through their bivariate ellipse sizes, where more highly correlated phenotypes return narrower ellipses.
(a) Representative gene tree of mitochondrial DNA fragment from 16 Darwin's finches of four species (Geospiza fortis, G. magnirostris, Camarhynchus parvulus, and Certhidea olivacea). Nodes that have posterior clade probabilities of greater than 0.5 are labeled with their posterior clade probability. (b) The two most probable species trees (solid line represents most probable species tree; dashed line is second most probable). (c) Gene tree embedded in a point estimate of the species tree, including divergence times and effective population sizes. The x axis is divergence time in units of substitutions per site and the y axis is proportional to effective population size.
Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29, 1969-1973

February 2012



Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at and

Table 1 Recombination Lines, Recombination Breakpoint Number, Genotype Coverage, and Proportion of the Parental Allele Ore in the Subset of RI Lines 
FIG. 1.-Flow chart of the statistical approach taken to detect genes under cis- and trans-regulatory effects. All equations and symbols are fully described in Materials and Methods.
FIG. 2.-Absolute differences of transcript abundance between genotypes. Box plot of the difference of mean (log transcript intensity) for the significant contrasts at FDR correction of 0.2; for cis, trans, and cis- and trans-regulatory effects. Rectangle bars represent the first and third quartiles.
FIG. 3.-Venn diagram result of all analysis of variance models after an FDR correction at 0.2, comparing the infinitesimal models versus the contrast models. Circles indicate the number of probes (and genes in parentheses) with significant cis, trans, or cis and trans effects. Light gray: infinitesimal models, dark gray: contrast models.
Genissel A, McIntyre LM, Wayne ML, Nuzhdin SV. Cis and trans regulatory effects contribute to natural variation in transcriptome of Drosophila melanogaster. Mol Biol Evol 25: 101-110

February 2008



The dissection of intraspecific variation in transcriptome is a central theme of many recent quantitative genomic analyses. Transcript level variation has been attributed to factors at the gene itself (cis) and elsewhere in the genome (trans). Previous analyses of Drosophila intraspecific transcriptome variation pointed toward a larger contribution of trans factors. However, data from other genera, and from interspecific comparisons within Drosophila, are more consistent with a major role for cis factors. We investigated the relative amount of cis and trans variation in Drosophila melanogaster, using whole-genome expression from an oligonucleotide microarray in the 2 extensively studied genotypes Ore and 2b3, and 6 recombinant inbred (RI) lines derived from these parents. We examined 2 types of models to decompose cis and trans contributions to genetic variation in transcript level: 1) an infinitesimal model assuming that the transcription variation is highly polygenic and due to many small effects and 2) contrast models assuming that a few large effects contribute to the transcriptional variation. We explicitly fitted cis-by-trans interactions and extended our analyses to consider regulation of alternatively spliced transcripts. We estimated that approximately 10% of the transcriptome was differentially regulated among the lines. We were able to identify cis and trans effects that contribute to this differential regulation for 1,340 genes. Our analyses revealed numerous cis effects (90%) but much fewer trans effects, perhaps due to reduced power of detection for trans effects. In addition, we identified 15 genes that have alternative splice variants differentially regulated in cis.

Hardison, R. & Miller, W. Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol. 10, 73-102

January 1993



The determination of long segments of DNA sequences encompassing the beta- and alpha-globin gene clusters has provided an unprecedented database for analysis of genome evolution and regulation of gene clusters. A newly developed computer tool kit generates local alignments between such long sequences in a space-efficient manner, helps the user analyze the alignments effectively, and finds consistently aligning blocks of sequences in multiple pairwise comparisons. Such sequence analyses among the beta-like globin gene clusters of human, galago, rabbit, and mouse have revealed the general patterns of evolution of this gene cluster. Alignments in the flanking regions are very useful in assigning orthologous relationships. Investigation of such matches between the mouse and human beta-like globin gene clusters has led to a reassessment of some orthologous assignments in mouse and to a revision of the proposed pathway for evolution of this gene cluster. In general, the interspersed repetitive elements have inserted independently, presumably via a retrotransposition mechanism, in the different mammalian lineages. However, some examples of ancient L1 repeats are found, including one between the epsilon- and gamma-globin genes that appears to have been in the ancestral eutherian gene cluster. Prominent matching sequences are found in a long region 5' to the epsilon-globin gene, the locus control region (LCR) that is a positive regulator of the entire gene cluster. Three-way alignments among the human, goat, and rabbit sequences can extend for ≥3 kb in part of the LCR (DNase hypersensitive site 3), indicating that the cis-acting components of this complex regulatory region cover a long segment of DNA. In contrast to the beta-like globin gene clusters, the alpha-like globin gene clusters of many mammals occur in very G+C-rich isochores and contain prominent CpG islands.
The regions between the alpha-like globin genes are evolving faster than the intergenic regions of the beta-like globin gene clusters. The contrasts between the two gene clusters can be attributed to differences in DNA metabolism in the isochore. The proximal control elements of the rabbit alpha-globin gene are located both 5' to and within the gene. All of this region is part of a prominent CpG island that may be acting as an extended, enhancer-independent promoter. One can hypothesize that the analogue to the LCR in the alpha-globin gene cluster may interface with the distinctive alpha-globin promoter in ways different from the interaction between the beta LCR and the promoters of beta-like globin genes.(ABSTRACT TRUNCATED AT 400 WORDS)

Table 1. Summary Information for the 30 NPCL Amplified in 19 Salamander Taxa.
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BP ML ), BI posterior probability (PP BI ), CAT posterior probability (PP CAT ), and MP-EST bootstrap (BP MP-EST ), from left to right. The filled squares represent BP ML > 95, PP BAY = 1.0, PP CAT = 1.0, and BP MP-EST > 95. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis. 
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support values. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25. 
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. 
A Versatile and Highly Efficient Toolkit Including 102 Nuclear Markers for Vertebrate Phylogenomics, Tested by Resolving the Higher Level Relationships of the Caudata

July 2013



Resolving difficult nodes for any part of the vertebrate tree of life often requires analyzing a large number of loci. Developing molecular markers that are workable for the groups of interest is often a bottleneck in phylogenetic research. Here, based on a nested PCR strategy, we present a universal toolkit including 102 NPCL (nuclear protein-coding locus) markers for vertebrate phylogenomics. The 102 NPCL markers have a broad range of evolutionary rates, which makes them useful for a wide range of time depths. The new NPCL toolkit has three important advantages compared to all previously developed NPCL sets: (i) the kit is universally applicable across vertebrates, with a PCR success rate of 94.6% in 16 widely divergent tested vertebrate species; (ii) more than 90% of PCR reactions produce strong and single bands of the expected sizes that can be directly sequenced; and (iii) all cleanup PCR reactions can be sequenced with only two specific universal primers. To test its actual phylogenetic utility, 30 NPCLs from this toolkit were used to address the higher-level relationships of living salamanders. Of the 639 target PCR reactions performed on 19 salamanders and several outgroup species, 632 (98.9%) were successful, and 602 (94.1%) were directly sequenced. Concatenation and species-tree analyses on this 30-locus dataset produced a fully resolved phylogeny and showed that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branches first within the salamander tree, followed by Sirenidae. Our experimental tests and our demonstration for a particular case show that our NPCL toolkit is a highly reliable, fast, and cost-effective approach for vertebrate phylogenomic studies and thus has the potential to accelerate the completion of many parts of the vertebrate tree of life.

Human-Specific Amino Acid Changes Found in 103 Protein-Coding Genes

June 2004



We humans have many characteristics that are different from those of the great apes. These human-specific characters must have arisen through mutations accumulated in the genome of our direct ancestor after the divergence of the last common ancestor with chimpanzee. Gene trees of human and great apes are necessary for extracting these human-specific genetic changes. We conducted a systematic analysis of 103 protein-coding genes for human, chimpanzee, gorilla, and orangutan. Nucleotide sequences for 18 genes were newly determined for this study, and those for the remaining genes were retrieved from the DDBJ/EMBL/GenBank database. The total number of amino acid changes in the human lineage was 147 for 26,199 codons (0.56%). The total number of amino acid changes in the human genome was, thus, estimated to be about 80,000. We applied the acceleration index test and Fisher's synonymous/nonsynonymous exact test for each gene tree to detect any human-specific enhancement of amino acid changes compared with ape branches. Six and two genes were shown to have significantly higher nonsynonymous changes at the human lineage from the acceleration index and exact tests, respectively. We also compared the distribution of the differences of the nonsynonymous substitutions on the human lineage and those on the great ape lineage. Two genes were more conserved in the ape lineage, whereas one gene was more conserved in the human lineage. These results suggest that a small proportion of protein-coding genes started to evolve differently in the human lineage after it diverged from the ape lineage.
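The per-codon rate quoted in this abstract, and the scale of the genome-wide extrapolation, can be reproduced with a few lines of arithmetic. The sketch below recomputes 147 changes over 26,199 codons and then back-calculates how many codons genome-wide the estimate of about 80,000 changes would correspond to at that rate. The implied codon total is an inference made here for illustration; the abstract does not state which total the authors used.

```python
HUMAN_LINEAGE_CHANGES = 147      # amino acid changes, from the abstract
CODONS_SURVEYED = 26_199         # codons across the 103 genes, from the abstract
GENOME_WIDE_ESTIMATE = 80_000    # "about 80,000", from the abstract


def changes_per_codon(changes, codons):
    """Fraction of surveyed codons with a human-lineage amino acid change."""
    return changes / codons


rate = changes_per_codon(HUMAN_LINEAGE_CHANGES, CODONS_SURVEYED)

# Back-calculation (assumption): codon count implied by the extrapolation.
implied_codons = GENOME_WIDE_ESTIMATE / rate

print(f"rate per codon: {rate:.2%}")
print(f"implied genome-wide codons: {implied_codons:,.0f}")
```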

Table 1 Primers Used in this Study 
Macey JR, Larson A, Ananjeva NB, Fang ZL, Papenfuss TJ. Two novel gene orders and the role of light-strand replication in rearrangement of the vertebrate mitochondrial genome. Mol Biol Evol 14: 91-104

February 1997



Two novel mitochondrial gene arrangements are identified in an agamid lizard and a ranid frog. Statistical tests incorporating phylogeny indicate a link between novel vertebrate mitochondrial gene orders and movement of the origin of light-strand replication. A mechanism involving errors in light-strand replication and tandem duplication of genes is proposed for rearrangement of vertebrate mitochondrial genes. A second mechanism involving small direct repeats also is identified. These mechanisms implicate gene order as a reliable phylogenetic character. Shifts in gene order define major lineages without evidence of parallelism or reversal. The loss of the origin of light-strand replication from its typical vertebrate position evolves in parallel and, therefore, is a less reliable phylogenetic character. Gene junctions also evolve in parallel. Sequencing across multigenic regions, in particular transfer RNA genes, should be a major focus of future systematic studies to locate novel gene orders and to provide a better understanding of the evolution of the vertebrate mitochondrial genome.

Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol 13: 93-104

February 1996



The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This "Hidden Markov Model" method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using beta-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.
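The recursive summation over all rate assignments described in this abstract can be sketched as a forward recursion. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the per-site likelihoods for each rate category (`site_liks`) are taken as already computed on the tree, and the Markov chain is assumed to copy the neighbouring site's category with probability `lam` and otherwise draw a fresh category from the stationary frequencies `freqs`. With that assumed transition form, each site costs time proportional to the number of rate categories. A brute-force enumeration is included only to check the recursion on tiny inputs.

```python
from itertools import product


def hmm_likelihood(site_liks, freqs, lam):
    """Likelihood summed over all rate assignments (forward recursion).

    site_liks[s][r]: likelihood of site s given rate category r (assumed given).
    freqs[r]: stationary probability of category r.
    lam: probability a site keeps its neighbour's category (assumed model).
    """
    k = len(freqs)
    forward = [freqs[r] * site_liks[0][r] for r in range(k)]
    for liks in site_liks[1:]:
        total = sum(forward)
        # Either stay in category r, or redraw r from the stationary frequencies.
        forward = [
            liks[r] * (lam * forward[r] + (1.0 - lam) * freqs[r] * total)
            for r in range(k)
        ]
    return sum(forward)


def brute_force_likelihood(site_liks, freqs, lam):
    """Same quantity by enumerating every assignment (tiny inputs only)."""
    k = len(freqs)
    total = 0.0
    for assignment in product(range(k), repeat=len(site_liks)):
        prob = freqs[assignment[0]] * site_liks[0][assignment[0]]
        for s in range(1, len(site_liks)):
            prev, cur = assignment[s - 1], assignment[s]
            trans = (1.0 - lam) * freqs[cur] + (lam if prev == cur else 0.0)
            prob *= trans * site_liks[s][cur]
        total += prob
    return total


# Hypothetical per-site likelihoods for four sites and three rate categories.
example_liks = [
    [0.10, 0.20, 0.05],
    [0.30, 0.10, 0.20],
    [0.05, 0.40, 0.10],
    [0.20, 0.20, 0.30],
]
example_freqs = [0.5, 0.3, 0.2]

print(hmm_likelihood(example_liks, example_freqs, 0.7))
```

For real alignments the running values shrink quickly, so a practical version would rescale `forward` at each site or work with logarithms.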

Savtchenko ES, Freedberg IM, Choi IY, Blumenberg M. Inactivation of human keratin genes: the spectrum of mutations in the sequence of an acidic keratin pseudogene. Mol Biol Evol 5: 97-108

February 1988



Keratins are cytoskeletal proteins encoded by a multigene family. We have identified the first human keratin pseudogene and determined its complete nucleotide sequence. Sequence comparisons indicate that the pseudogene arose from a very recent duplication of the 50-kd keratin (K14) gene. The coding and the intron sequences of the two genes are 95% and 93% identical, respectively. Although the sequence of the regulatory region in the pseudogene is virtually identical to that in the 50-kd functional gene, several deleterious mutations have been identified in the pseudogene. There are three frameshifts in the coding regions, one of which is a perfect 8-bp duplication. A single-base-pair deletion in the first exon and a single-base-pair insertion in the penultimate exon also result in frameshifts. The three remaining deleterious mutations interfere with the mRNA processing signals: two alter the intron/exon boundaries, and the third disrupts the polyadenylation signal. These mutations clearly identify the sequence as a human keratin pseudogene.

Clark, A. G. and Lanigan, C. M. S. Prospects for estimating nucleotide divergence with RAPDs. Mol Biol Evol, 10: 1096-1111

October 1993



The technique of random amplification of polymorphic DNA (RAPD), which is simply polymerase chain reaction (PCR) amplification of genomic DNA by a single short oligonucleotide primer, produces complex patterns of anonymous polymorphic DNA fragments. The information provided by these banding patterns has proved to be of great utility for mapping and for verification of identity of bacterial strains. Here we consider whether the degree of similarity of the banding patterns can be used to estimate nucleotide diversity and nucleotide divergence. With haploid data, fragments generated by RAPD-PCR can be treated in a fashion very similar to that for restriction-fragment data. Amplification of diploid samples, on the other hand, requires consideration of the fact that presence of a band is dominant to absence of the band. After describing a method for estimating nucleotide divergence on the basis of diploid samples, we summarize the restrictions and criteria that must be met when RAPD data are used for estimating population genetic parameters.

Table 1 Rep and Rep-Like Genes and Proteins and the Organisms in Which They Were Identified 
Values from BlastP Searches Indicating the Significance of the Similarities Among the Rep-Like Genes and the Rep Genes of Circoviruses and the pLS1 Family Plasmids 
Gibbs MJ, Smeianov VV, Steele JL, Upcroft P, Efimov BA. Two families of Rep-like genes that probably originated by interspecies recombination are represented in viral, plasmid, bacterial, and parasitic protozoan genomes. Mol Biol Evol 23: 1097-1100

July 2006



Two families of genes related to, and including, rolling circle replication initiator protein (Rep) genes were defined by sequence similarity and by evidence of intergene family recombination. The Rep genes of circoviruses were the best characterized members of the “RecRep1 family.” Other members of the RecRep1 family were Rep-like genes found in the genomes of the Canarypox virus, Entamoeba histolytica, and Giardia duodenalis and in a plasmid, p4M, from the Gram-positive bacterium, Bifidobacterium pseudocatenulatum. The “RecRep2 family” comprised some previously identified Rep-like genes from plasmids of phytoplasmas and similar Rep-like genes from the genomes of Lactobacillus acidophilus, Lactococcus lactis, and Phytoplasma asteris. Both RecRep1 and RecRep2 proteins have a nucleotide-binding domain significantly similar to the helicases (2C proteins) of picorna-like viruses. On the N-terminal side of the nucleotide binding domain, RecRep1 proteins have a domain significantly similar to one found in nanovirus Reps, whereas RecRep2 proteins have a domain significantly similar to one in the Reps of pLS1 plasmids. We speculate that RecRep genes have been transferred from viruses or plasmids to parasitic protozoan and bacterial genomes and that Rep proteins were themselves involved in the original recombination events that generated the ancestral RecRep genes.

Eyre-Walker, A. DNA mismatch repair and synonymous codon evolution in mammals. Mol. Biol. Evol. 11, 88-98

February 1994



It has been suggested that the differences in synonymous codon use between mammalian genes within a genome are due to differences in the efficiency of DNA mismatch repair. This hypothesis was tested by developing a model of mismatch repair, which was used to predict the expected relationship between the rate of substitution and G+C content at silent sites. It was found that the silent-substitution rate should decline with increasing G+C content over most of the G+C-content range, if it is assumed that mismatch repair is G+C biased, an assumption which is supported by data. This prediction was then tested on a set of 58 primate and artiodactyl genes. There was no evidence of a direct decline in substitution rate with increasing G+C content, for either twofold- or fourfold-degenerate sites. It was therefore concluded that variation in the efficiency of mismatch repair is not responsible for the differences in synonymous codon use between mammalian genes. In support of this conclusion, analysis of the model also showed that the parameter range over which mismatch repair can explain the differences in synonymous codon use between genes is very small.

Interrogating 11 Fast-Evolving Genes for Signatures of Recent Positive Selection in Worldwide Human Populations

August 2009



Different signatures of natural selection persist over varying time scales in our genome, revealing possible episodes of adaptive evolution during human history. Here, we identify genes showing signatures of ancestral positive selection in the human lineage and investigate whether some of those genes have been evolving adaptively in extant human populations. Specifically, we compared more than 11,000 human genes with their orthologs in chimpanzee, mouse, rat, and dog and applied a branch-site likelihood method to test for positive selection on the human lineage. Among the significant cases, a robust set of 11 genes was then further explored for signatures of recent positive selection using single nucleotide polymorphism (SNP) data. We genotyped 223 SNPs in 39 worldwide populations from the HGDP-CEPH diversity panel and supplemented this information with available genotypes for up to 4,814 SNPs distributed along 2 Mb centered on each gene. After exploring the allele frequency spectrum, population differentiation and the maintenance of long unbroken haplotypes, we found signals of recent adaptive phenomena in only one of the 11 candidate gene regions. However, the signal of recent selection in this region may come from a different, neighboring gene (CD5) rather than from the candidate gene itself (VPS37C). For this set of positively selected genes in the human lineage, we find no indication that these genes maintained their rapid evolutionary pace among human populations. Based on these data, it therefore appears that adaptation for human-specific and for population-specific traits may have involved different genes.

Evolution of Siglec-11 and Siglec-16 Genes in Hominins

March 2012



We previously reported a human-specific gene conversion of SIGLEC11 by an adjacent paralogous pseudogene (SIGLEC16P), generating a uniquely human form of the Siglec-11 protein, which is expressed in the human brain. Here, we show that Siglec-11 is expressed exclusively in microglia in all human brains studied, a finding of potential relevance to brain evolution, as microglia modulate neuronal survival, and Siglec-11 recruits SHP-1, a tyrosine phosphatase that modulates microglial biology. Following the recent finding of a functional SIGLEC16 allele in human populations, further analysis of the human SIGLEC11 and SIGLEC16/P sequences revealed an unusual series of gene conversion events between the two loci. Two tandem and likely simultaneous gene conversions occurred from SIGLEC16P to SIGLEC11, with a potentially deleterious intervening short segment happening to be excluded. One of the conversion events also changed the 5' untranslated sequence, altering predicted transcription factor binding sites. Both of the gene conversions have been dated to ~1-1.2 Ma, after the emergence of the genus Homo, but prior to the emergence of the common ancestor of Denisovans and modern humans about 800,000 years ago, thus suggesting involvement in later stages of hominin brain evolution. In keeping with this, recombinant soluble Siglec-11 binds ligands in the human brain. We also address a more recent, second-round gene conversion from SIGLEC11 to SIGLEC16, with the latter showing an allele frequency of ~0.1-0.3 in a worldwide population study. Initial pseudogenization of SIGLEC16 was estimated to occur at least 3 Ma, which thus preceded the gene conversion of SIGLEC11 by SIGLEC16P. As gene conversion usually disrupts the converted gene, the fact that ORFs of hSIGLEC11 and hSIGLEC16 have been maintained after an unusual series of very complex gene conversion events suggests that these events may have been subject to hominin-specific selection forces.

Roy MS, Geffen E, Smith D, Ostrander EA, Wayne RK. Patterns of differentiation and hybridization in North American wolflike canids, revealed by analysis of microsatellite loci. Mol Biol Evol 11: 553-570

August 1994



Genetic divergence and gene flow among closely related populations are difficult to measure because mutation rates of most nuclear loci are so low that new mutations have not had sufficient time to appear and become fixed. Microsatellite loci are repeat arrays of simple sequences that have high mutation rates and are abundant in the eukaryotic genome. Large population samples can be screened for variation by using the polymerase chain reaction and polyacrylamide gel electrophoresis to separate alleles. We analyzed 10 microsatellite loci to quantify genetic differentiation and hybridization in three species of North American wolflike canids. We expected to find a pattern of genetic differentiation by distance to exist among wolflike canid populations, because of the finite dispersal distances of individuals. Moreover, we predicted that, because wolflike canids are highly mobile, hybrid zones may be more extensive and show substantial changes in allele frequency, relative to nonhybridizing populations. We demonstrate that wolves and coyotes do not show a pattern of genetic differentiation by distance. Genetic subdivision in coyotes, as measured by theta and Gst, is not significantly different from zero, reflecting persistent gene flow among newly established populations. However, gray wolves show significant subdivision that may be either due to drift in past Ice Age refugia populations or a result of other causes. Finally, in areas where gray wolves and coyotes hybridize, allele frequencies of gray wolves are affected, but those of coyotes are not. Past hybridization between the two species in the south-central United States may account for the origin of the red wolf.

FIG. 1. Graphical representation of synteny between the orthologous and paralogous 11 and 12 contigs in the RefSeq, Oryza glaberrima, and Oryza brachyantha. Coordinates are indicated in kilobases. The segments for the RefSeq correspond to 1.42–2.51 Mb on chromosome 11 and 1.34–2.54 Mb on chromosome 12. Lines represent sequence similarity comparison by BlastN, with blue lines representing inverted matches. The minimum score and size of matches are 300 and 300 bp, respectively. The CDS composition of each contig is shown, with a color code indicating their presence/absence on the six homologous chromosomes.  
FIG. 2. Evolutionary scheme of the 11-12 duplicated block in the Oryza genus, as a function of conversion events in the FF and AA lineages. A = Ancestor of AA lineage, B = Oryza brachyantha, G = Oryza glaberrima, S = Oryza sativa. Conversion is inferred based on topological incongruency with the topology 0. *Only one example of topology 1M is shown as we group several trees in this class: The first have only one orthologous pair, S11-G11 or S12-G12, clustered in a terminal branch, whereas the two remaining genes form intermediate branches between this cluster and the O. brachyantha node. The second have only one paralogous pair, S11-S12 or G11-G12, clustered in a terminal branch, whereas the two remaining genes form intermediate branches between this cluster and the O. brachyantha node. This topology is ambiguous as it could reveal 1) too weak divergence of the four AA genes to resolve their phylogenetic relationships, 2) the strong divergence of one of these genes blurring their true relationships, and 3) conversion in one of the AA lineages after their divergence.
FIG. 4. Frequency distribution of BI nucleotide distances between paralogous 500 bp fragments of the whole contig alignment. The inset histograms show distance distributions in converted zone 1 only.  
Long-Range and Targeted Ectopic Recombination between the Two Homeologous Chromosomes 11 and 12 in Oryza Species

May 2011



Whole genome duplication (WGD) and subsequent evolution of gene pairs have been shown to have shaped the present day genomes of most, if not all, plants and to have played an essential role in the evolution of many eukaryotic genomes. Analysis of the rice (Oryza sativa ssp. japonica) genome sequence suggested an ancestral WGD ∼50–70 Ma common to all cereals and a segmental duplication between chromosomes 11 and 12 as recently as 5 Ma. More recent studies based on coding sequences have demonstrated that gene conversion is responsible for the high sequence conservation which suggested such a recent duplication. We previously showed that gene conversion has been a recurrent process throughout the Oryza genus and in closely related species and that orthologous duplicated regions are also highly conserved in other cereal genomes. We have extended these studies to compare megabase regions of genomic (coding and noncoding) sequences between two cultivated (O. sativa, Oryza glaberrima) and one wild (Oryza brachyantha) rice species using a novel approach of topological incongruency. The high levels of intraspecies conservation of both gene and nongene sequences, particularly in O. brachyantha, indicate long-range conversion events less than 4 Ma in all three species. These observations demonstrate megabase-scale conversion initiated within a highly rearranged region located at ∼2.1 Mb from the chromosome termini and emphasize the importance of gene conversion in cereal genome evolution.

Table 1 Specimens Used in the Investigation of Sry Evolution 
Lundrigan BL, Tucker PK. Tracing paternal ancestry in mice, using the Y-linked, sex-determining locus, Sry. Mol Biol Evol 11: 483-492

May 1994



The molecular evolution of mammalian Y-linked DNA sequences is of special interest because of their unique mode of inheritance: most Y-linked sequences are clonally inherited from father to son. Here we investigate the use of Y-linked sequences for phylogenetic inference. We describe a comparative analysis of a 515-bp region from the male sex-determining locus, Sry, in 22 murine rodents (subfamily Murinae, family Muridae), including representatives from nine species of Mus, and from two additional murine genera, Mastomys and Hylomyscus. Percent sequence divergence was < 0.01% for comparisons between populations within a species and was 0.19%-8.16% for comparisons between species. Our phylogenetic analysis of 12 murine taxa resulted in a single most-parsimonious tree that is highly concordant with phylogenies based on mitochondrial DNA and allozymes. A total evidence tree based on the combined data from Sry, mitochondrial DNA, and allozymes supports (1) the monophyly of the subgenus Mus, (2) its division into a Palearctic group (M. musculus, M. domesticus, M. spicilegus, M. macedonicus, and M. spretus) and an Oriental group (M. cookii, M. cervicolor, and M. caroli), and (3) sister-group relationships between M. spicilegus and M. macedonicus and between M. cookii and M. cervicolor. We argue that Y-chromosome DNA sequences represent a valuable new source of characters for phylogenetic inference.

Xu L, Chen H, Hu XH, Zhang RM, Zhang Z, Luo ZW. Average gene length is highly conserved in prokaryotes and eukaryotes and diverges only between the two kingdoms. Mol Biol Evol 23: 1107-1108

July 2006



The average length of genes in a eukaryote is larger than in a prokaryote, implying that evolution of complexity is related to change of gene lengths. Here, we show that although the average lengths of genes in prokaryotes and eukaryotes are much different, the average lengths of genes are highly conserved within either of the two kingdoms. This suggests that natural selection has clearly set a strong limitation on gene elongation within the kingdom. Furthermore, the average gene size adds another distinct characteristic for the discrimination between the two kingdoms of organisms.

FIG. 1.—Genome structures of K113 and K115 are shown. (A) The proviruses are flanked by a short duplicated host DNA also known as preintegration site. LTRs line the viral genomes both at 5' and 3' ends. The primer positions of the flanking primers (specific to human DNA sequences flanking the proviral insertions) and HERV-K–specific primer (annealing to HERV-K insertions between the 5' LTR and gag gene) are also included. (B) LTR of K113 and K115 indicating loci for various regulatory regions. An additional 17 HERV-K sequences (supplementary table 1, Supplementary Material online) were used to validate the coordinates of the known regulatory regions within the 5' LTR.
FIG. 2.—ML phylogeny of HERV-K 5' LTR sequences including K113 and K115 haplotypes. Taxon names of all reference sequences include GenBank accession numbers. K113 and K115 sequences are highlighted in red and blue, respectively, and the number of times each haplotype was observed in this study is listed in taxon labels. Scale bar represents 1% genetic distance.
Table 3 SNP in Various Positions of 5' LTR of K113 and K115
Table 4 Haplotypes and Haplotype Frequencies of K113 and K115 Based on Variations in the 5' LTR
Cross-Sectional Dating of Novel Haplotypes of HERV-K 113 and HERV-K 115 Indicate These Proviruses Originated in Africa before Homo sapiens

September 2009



The human genome contains human endogenous retroviruses (HERV), of which HERV-K113 and HERV-K115 are the only known full-length proviruses that are insertionally polymorphic. Although a handful of previously published papers have documented their prevalence in the global population, to date there has been no report on their prevalence in the United States population. Here, we studied the geographic distribution of K113 and K115 among 156 HIV-1+ subjects from the United States, including African Americans, Hispanics, and Caucasians. In the individuals studied, we found higher insertion frequencies of K113 (21%) and K115 (35%) in African Americans compared with Caucasians (K113 9% and K115 6%) within the United States. We also report the presence of three single nucleotide polymorphism sites in the K113 5' long terminal repeats (LTRs) and four in the K115 5' LTR that together constituted four haplotypes for K113 and five haplotypes for K115. HERV insertion times can be estimated from the sequence differences between the 5' and 3' LTR of each insertion, but this dating method cannot be used with HERV-K115. We developed a method to estimate insertion times by applying coalescent inference to 5' LTR sequences within our study population and validated this approach using an independent estimate derived from the genetic distance between K113 5' and 3' LTR sequences. Using our method, we estimated the insertion dates of K113 and K115 to be a minimum of 800,000 and 1.1 million years ago, respectively. Both these insertion dates predate the emergence of anatomically modern Homo sapiens.
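The LTR-based dating mentioned in this abstract rests on the fact that the 5' and 3' LTRs of a provirus are identical at the time of insertion and then diverge independently. A minimal sketch of that general idea follows; it is not the coalescent method developed in the study. The function names, the toy sequences, and the substitution rate are hypothetical placeholders, and the raw proportion of differing sites is used without any correction for multiple substitutions.

```python
def ltr_divergence(ltr5, ltr3):
    """Proportion of differing sites between aligned 5' and 3' LTR sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def insertion_age_years(divergence, rate_per_site_per_year):
    """Age estimate T = d / (2 * r).

    The factor of 2 reflects that both LTRs accumulate substitutions
    independently after insertion.
    """
    return divergence / (2.0 * rate_per_site_per_year)


# Toy aligned sequences and an assumed rate, for illustration only.
d = ltr_divergence("ACGTACGTAC", "ACGTACGTAT")
print(d, insertion_age_years(d, rate_per_site_per_year=5e-9))
```

The estimate is only as good as the assumed substitution rate and the assumption that the two LTRs evolved independently, for example without gene conversion between them.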

Table 1 Fish-Specific Paralogons in the Fugu Genome 
Table 2 Paralogons in the Real Draft Fugu Genome Sequence and on 1,000 Simulations of the Draft Genome 
Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol 21: 1146-1151

July 2004



With about 24,000 extant species, teleosts are the largest group of vertebrates. They constitute more than 99% of the ray-finned fishes (Actinopterygii) that diverged from the lobe-finned fish lineage (Sarcopterygii) about 450 MYA. Although the role of genome duplication in the evolution of vertebrates is now established, its role in structuring the teleost genomes has been controversial. At least two hypotheses have been proposed: a whole-genome duplication in an ancient ray-finned fish and independent gene duplications in different lineages. These hypotheses are, however, based on small data sets and lack adequate statistical and phylogenetic support. In this study, we have made a systematic comparison of the draft genome sequences of Fugu and humans to identify paralogous chromosomal regions ("paralogons") in the Fugu that arose in the ray-finned fish lineage ("fish-specific"). We identified duplicate genes in the Fugu by phylogenetic analyses of the Fugu, human, and invertebrate sequences. Our analyses provide evidence for 425 fish-specific duplicate genes in the Fugu and show that at least 6.6% of the genome is represented by fish-specific paralogons. We estimated the ages of Fugu duplicate genes and paralogons using the molecular clock. Remarkably, the ages of duplicate genes and paralogons are clustered, with a peak around 350 MYA. These data strongly suggest a whole-genome duplication event early during the evolution of ray-finned fishes, probably before the origin of teleosts.
