
Masatoshi Nei
- Pennsylvania State University
About
324
Publications
116,052
Reads
364,111
Citations
Publications (324)
The reliability of a phylogenetic tree obtained from empirical data is usually measured by the bootstrap probability (Pb) of interior branches of the tree. If the bootstrap probability is high for most branches, the tree is considered to be reliable. If some interior branches show relatively low bootstrap probabilities, we are not sure that the inf...
At the present time it is often stated that the maximum likelihood (ML) or the Bayesian method of phylogenetic construction is more accurate than the neighbor joining (NJ) method. Our computer simulations, however, have shown that the converse is true if we use p distance in the NJ procedure and the criterion of obtaining the true tree (Pc expresse...
More than 3 years have passed since Walter Fitch died on March 10, 2011. However, my memory of Walter is still fresh and vivid. From the time when he and I started this journal Molecular Biology and Evolution (MBE) in 1983, we were in close contact for about 15 years. He and I also started the Society for Molecular Biology and Evolution (SMBE) in 199...
POPTREE software, including the command line (POPTREE) and the Windows (POPTREE2) versions, is available to perform evolutionary analyses of allele frequency data, computing distance measures for constructing population trees and average heterozygosity (H) (measure of genetic diversity within populations) and GST (measure of genetic differentiation...
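As a rough illustration of the two summary statistics named in this entry, the sketch below computes heterozygosity and GST for a single locus from allele frequencies. It assumes the textbook definitions (H = 1 − Σx², GST = (HT − HS)/HT), equal weighting of populations, and no sample-size bias correction. The function names are hypothetical and this is not POPTREE's own implementation.

```python
def heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(x * x for x in freqs)


def gst(pop_freqs):
    """G_ST for one locus, given one allele-frequency list per population.

    Assumes equal population weights and no small-sample correction
    (simplifications made for this sketch).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: average within-population heterozygosity.
    h_s = sum(heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = heterozygosity(mean_freqs)
    if h_t == 0:
        return 0.0  # a monomorphic locus carries no differentiation signal
    return (h_t - h_s) / h_t


# Two populations fixed for different alleles are completely differentiated.
fixed_differently = gst([[1.0, 0.0], [0.0, 1.0]])
# Two populations with identical frequencies are not differentiated at all.
identical = gst([[0.5, 0.5], [0.5, 0.5]])
```

In practice, estimates are averaged over many loci and corrected for sample size, which this sketch omits.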
Sex-lethal (Sxl) functions as the switch gene for sex-determination in Drosophila melanogaster by engaging a regulatory cascade. Thus far the origin and evolution of both the regulatory system and SXL protein's sex-determination function have remained largely unknown. In this study, we explore systematically the Sxl homologs in a wide range of inse...
It is well known that the selection coefficient of a mutant allele varies from generation to generation, and the effect of this factor on genetic variation has been studied by many theoreticians. However, no consensus has been reached. One group of investigators believes that fluctuating selection has an effect of enhancing genetic variation, where...
MicroRNAs (miRNAs) are among the most important regulatory elements of gene expression in animals and plants. However, their origin and evolutionary dynamics have not been studied systematically. In this paper, we identified putative miRNA genes in 11 plant species using the bioinformatic technique and examined their evolutionary changes. Our homol...
MicroRNAs (miRs) are noncoding RNAs that regulate gene expression at the post-transcriptional level. In animals, the target sites of a miR are generally located in the 3' untranslated regions (UTRs) of messenger RNAs. However, how the target sites change during evolution is largely unknown. MiR-iab-4 and miR-iab-4as are known to regulate the expres...
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for m...
One of the most important problems in evolutionary biology is to understand how new species are generated in nature. In the past, it was difficult to study this problem because our lifetime is too short to observe the entire process of speciation. In recent years, however, molecular and genomic techniques have been developed for identifying and stu...
The neutral theory of molecular evolution has been widely accepted and is the guiding principle for studying evolutionary genomics and the molecular basis of phenotypic evolution. Recent data on genomic evolution are generally consistent with the neutral theory. However, many recently published papers claim the detection of positive Darwinian selec...
In PNAS, Briscoe et al. (1) discovered that Heliconius butterflies have two duplicate copies (UVRh1 and UVRh2) of the UV-sensitive gene, and they statistically inferred that the UVRh2 gene was subjected to positive selection in the evolutionary lineage immediately after the gene duplication. Using the branch-site (BS) method of Bayesian statistical...
MicroRNAs (miRs) regulate gene expression at the posttranscriptional level. To obtain some insights into the origins and evolutionary patterns of miR genes, we have identified miR genes in the genomes of 12 Drosophila species by bioinformatics approaches and examined their evolutionary changes. The results showed that the extant and ancestral Droso...
Currently, there is a demand in many fields of research for software with an easily accessible interface to analyze polymorphism data such as microsatellite DNA and single nucleotide polymorphism. In this article, we announce POPTREE2, a computer program package that can perform evolutionary analyses of allele frequency d...
All jawed vertebrates produce immunoglobulins (IGs) as a defense mechanism against pathogens. Typically, IGs are composed of two identical heavy chains (IGH) and two identical light chains (IGL). Most tetrapod species encode more than one isotype of light chains. Chicken is the only representative of birds for which genomic information is currently...
It is unfortunate that Yang et al. (1) misrepresent the contents and conclusions of our recent study (2). Our study was motivated by the recent publication of papers reporting the detection of positive selection in many genes of the human lineage by using the branch-site method (BSM) and others, where the ratio (ω) of nonsynonymous to synonymous nu...
We have studied the genomic structure and evolutionary pattern of immunoglobulin kappa deleting element (KDE) and three kappa enhancers (KE5', KE3'P, and KE3'D) in eleven mammalian genomic sequences. Our results show that the relative positions and the genomic organization of the KDE and the kappa enhancers are conserved in all mammals studied and...
Natural selection operating in protein-coding genes is often studied by examining the ratio (omega) of the rates of nonsynonymous to synonymous nucleotide substitution. The branch-site method (BSM) based on a likelihood ratio test is one of such tests to detect positive selection for a predetermined branch of a phylogenetic tree. However, because t...
In recent years, copy number variation (CNV) of DNA segments has become a hot topic in the study of genetic variation, and a large number of CNVs have been uncovered in human populations. The CNVs involving the smallest units of DNA segments are microsatellite DNAs, and the evolutionary change of microsatellite DNAs is believed to occur mostly by th...
F-box proteins are substrate-recognition components of the Skp1-Rbx1-Cul1-F-box protein (SCF) ubiquitin ligases. In plants, F-box genes form one of the largest multigene superfamilies and control many important biological functions. However, it is unclear how and why plants have acquired a large number of F-box genes. Here we identified 692, 337, a...
Mob proteins from distantly related eukaryotic species share very high sequence similarity and they are characteristic of a conserved Mob domain with around 180 amino-acid residues in length. However, the evolutionary relationship of mob family genes has not been extensively investigated. Through a phylogenetic approach, we have conducted a compr...
Chemosensory receptors are essential for the survival of organisms that range from bacteria to mammals. Recent studies have shown that the numbers of functional chemosensory receptor genes and pseudogenes vary enormously among the genomes of different animal species. Although much of the variation can be explained by the adaptation of organisms to...
Peer Reviewed http://deepblue.lib.umich.edu/bitstream/2027.42/62801/1/456317a.pdf
In eukaryotes, the assembly and elongation of unbranched actin filaments is controlled by formins, which are long, multidomain proteins. These proteins are important for dynamic cellular processes such as determination of cell shape, cell division, and cellular interaction. Yet, no comprehensive study has been done about the origins and evolution o...
The phylogenetic relationships of Ig light chain (IGL) genes are difficult to resolve, because these genes are short and evolve relatively fast. Here, we classify the IGL sequences from 12 tetrapod species into three distinct groups (κ, λ, and σ isotypes) using conserved amino acid residues, recombination signal sequences, and genomic organization...
All bilaterian animals share a general genetic framework that controls the formation of their body structures, although their forms are highly diversified. The Hox genes that encode transcription factors play a central role in this framework. All Hox proteins contain a highly conserved homeodomain encoded by the homeobox motif, but the other region...
A complete list of identified UCRs based on pairwise comparisons among mammalian Hox genes.
Multiple alignments of the nucleotide sequences of UCRs
A list of UCRs of Hox genes that are used for concatenated multiple sequence alignment and phylogenetic tree construction. The nucleotide positions of each UCR are listed in the right column. The name of each UCR is denoted in parentheses.
Conserved noncoding sequences flanking the UCRs. A. The conservation of noncoding sequences flanking the first exons of HoxC4, HoxC5 and HoxC6 from UCSC Human Genome Browser. The position of the coding regions is highlighted by a red bar. Transcription direction is indicated by an arrow. B. The accumulations of nucleotide mutations in the conserved regio...
The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statisti...
Recent studies about the structural variation of genomic sequences have shown that there is a large amount of copy number variations (CNVs) of genes within species. Analyzing Redon et al.'s (2006) crude data on copy number variable regions (CNVRs), we previously showed that CNVs are particularly high for chemosensory receptor genes in human populat...
Microsatellite DNA loci or short tandem repeats (STRs) are abundant in eukaryotic genomes and are often used for constructing phylogenetic trees of closely related populations or species. These phylogenetic trees are usually constructed by using some genetic distance measure based on allele frequency data, and there are many distance measures that...
Immunoglobulin heavy chains are polypeptides encoded by four genes: variable (IGHV), joining (IGHJ), diversity (IGHD), and constant (IGHC) region genes. The number of IGHV genes varies from species to species. To understand the evolution of the IGHV multigene family, we identified and analyzed the IGHV sequences from 16 vertebrate species. The resu...
The number of sensory receptor genes varies extensively among different mammalian species. This variation is believed to be caused partly by physiological requirements of animals and partly by genomic drift due to random duplication and deletion of genes. If the contribution of genomic drift is substantial, each species should contain a significant...
We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web-databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary...
Names of nodes and branches for (A) Table S1 and (B) Table S2.
(0.22 MB PDF)
Estimated numbers of genes in the ancestral species and those of gene gains and losses for the Euarchontoglires tree and various bootstrap condensed trees.
(0.03 MB PDF)
Estimated numbers of genes in the ancestral species and those of gene gains and losses for the mouse-outside tree and various bootstrap condensed trees.
(0.03 MB PDF)
(A) A neighbor-joining (NJ) phylogenetic tree for 265 functional OR genes in platypuses and 1,188 genes in opossums. Purple and blue lines represent branches for platypuses and opossums, respectively. Bootstrap values obtained from 500 replications are shown for the branches determining Class I clade and 34 Class II clades. The scale bar indicates...
Estimation of the numbers of genes in the ancestral species and those of gene gains and losses by the reconciled tree method. See Protocol S1. (A) A species tree. (B) A gene tree. (C) A gene tree for estimating the number of genes α in (A). A diamond represents the divergence between marsupials and placentals. A dashed line indicates a gene loss. (...
Names of functional OR genes belonging to each clade.
(0.47 MB DOC)
Flowchart for the identification of functional OR genes and OR pseudogenes. See Materials and Methods and Protocol S1 for details.
(0.30 MB PDF)
Amino acid sequences of OR genes from six mammalian species. “Oran”, “Modo”, “Bota”, “Cafa”, “Rano”, and “Mamu” represent platypus, opossum, cow, dog, rat, and macaque OR genes, respectively. A gene name with “P” or “T” indicates a pseudogene or a truncated gene, respectively. An asterisk and a slash in an amino acid sequence represent a stop codo...
Odor perception in mammals is mediated by a large multigene family of olfactory receptor (OR) genes. The number of OR genes varies extensively among different species of mammals, and most species have a substantial number of pseudogenes. To gain some insight into the evolutionary dynamics of mammalian OR genes, we identified the entire set of OR ge...
Recent studies of developmental biology have shown that the genes controlling phenotypic characters expressed in the early stage of development are highly conserved and that recent evolutionary changes have occurred primarily in the characters expressed in later stages of development. Even the genes controlling the latter characters are generally c...
Olfactory receptor (OR) genes are of vital importance for animals to find food, identify mates, and avoid dangers. In mammals, the number of OR genes is large and varies extensively among different orders, whereas, in insects, the extent of interspecific variation appears to be small, although only a few species have been studied. To understand the...
To understand the evolutionary process of the DNA mismatch repair system, we conducted systematic phylogenetic analysis of its key components, the bacterial MutS and MutL genes and their eukaryotic homologs. Based on genome-wide homolog searches, we identified three new MutS subfamilies (MutS3-5) in addition to the previously studied MutS1 and MutS...
The bacterial recA gene and its eukaryotic homolog RAD51 are important for DNA repair, homologous recombination, and genome stability. Members of the recA/RAD51 family have functions that have differentiated during evolution. However, the evolutionary history and relationships of these members remain unclear. Homolog searches in prokaryotes and eu...
Evolutionary distance refers to the number of nucleotide substitutions per site between two homologous DNA sequences or the number of amino acid substitutions per site between two homologous protein sequences.
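As a hedged illustration of this definition, the sketch below estimates the number of nucleotide substitutions per site between two aligned DNA sequences. It first computes the p distance (the proportion of differing sites), then applies the Jukes-Cantor correction d = −(3/4) ln(1 − 4p/3), chosen here only as the simplest standard correction for multiple substitutions at the same site. The function names and the rule of skipping non-ACGT sites are assumptions of this example, not taken from the entry.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Skip gaps and ambiguous bases (a convention assumed for this sketch).
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from a p distance."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten aligned sites gives p = 0.1.
p = p_distance("ACGTACGTAC", "ACGTACGTTC")
d = jukes_cantor_distance(p)  # slightly larger than p
```

The corrected distance exceeds p because some sites may have changed more than once. For closely related sequences the two values are nearly equal.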
The natural killer (NK) receptor gene complex (NKC) encodes a large number of C-type lectin-like receptors, which are expressed on NK and other immune-related cells. These receptors play an important role in regulating NK-cell cytolytic activity, protecting cells against virus infection and tumorigenesis. To understand the evolutionary history of t...
The numbers of functional olfactory receptor (OR) genes in humans and mice are about 400 and 1,000 respectively. In both humans and mice, these genes exist as genomic clusters and are scattered over almost all chromosomes. The difference in the number of genes between the two species is apparently caused by massive inactivation of OR genes in the h...
Charles Darwin proposed that evolution occurs primarily by natural selection, but this view has been controversial from the beginning. Two of the major opposing views have been mutationism and neutralism. Early molecular studies suggested that most amino acid substitutions in proteins are neutral or nearly neutral and the functional change of prote...
It has been known that the conservation or diversity of homeobox genes is responsible for the similarity and variability of some of the morphological or physiological characters among different organisms. To gain some insights into the evolutionary pattern of homeobox genes in bilateral animals, we studied the change of the numbers of these genes d...
Olfaction, which is an important physiological function for the survival of mammals, is controlled by a large multigene family of olfactory receptor (OR) genes. Fishes also have this gene family, but the number of genes is known to be substantially smaller than in mammals. To understand the evolutionary dynamics of OR genes, we conducted a phylogen...
In mammals many natural killer (NK) cell receptors, encoded by the leukocyte receptor complex (LRC), regulate the cytotoxic activity of NK cells and provide protection against virus-infected and tumor cells. To investigate the origin of the Ig-like domains encoded by the LRC genes, a subset of C2-type Ig-like domain sequences was compiled from mamm...
In mammals, the cell surface receptors encoded by the leukocyte receptor complex (LRC) regulate the activity of T lymphocytes and B lymphocytes, as well as that of natural killer cells, and thus provide protection against pathogens and parasites. The chicken genome encodes many Ig-like receptors that are homologous to the LRC receptors. The chicken...
The gene family of killer cell immunoglobulin-like receptors (KIRs) in primates provides the first line of defense against virus infection and tumor transformation. Interacting with MHC class I molecules, KIRs can regulate the cytotoxic activity of natural killer (NK) cells and distinguish the tumor and virus infected cells from normal body cells....
The chimpanzee is our closest living relative. The morphological differences between the two species are so large that there is no problem in distinguishing between them. However, the nucleotide difference between the two species is surprisingly small. The early genome comparison by DNA hybridization techniques suggested a nucleotide difference of...
The numbers of functional olfactory receptor (OR) genes are quite variable among mammalian species. Previously we have reported that humans have 388 functional OR genes and 414 pseudogenes, while mice have 1037 functional genes and 354 pseudogenes. These observations suggest either that humans lost many functional OR genes after the human-mouse div...
Until around 1990, most multigene families were thought to be subject to concerted evolution, in which all member genes of a family evolve as a unit in concert. However, phylogenetic analysis of MHC and other immune system genes showed a quite different evolutionary pattern, and a new model called birth-and-death evolution was proposed. In this mod...
Olfactory receptor (OR) genes form the largest multigene family in mammalian genomes. Humans have approximately 800 OR genes, but >50% of them are pseudogenes. By contrast, mice have approximately 1400 OR genes and pseudogenes are approximately 25%. To understand the evolutionary processes that shaped the difference of OR gene families between huma...
A simple statistical method for predicting the functional differentiation of duplicate genes was developed. This method is based on the premise that the extent of functional differentiation between duplicate genes is reflected in the difference in evolutionary rate because the functional change of genes is often caused by relaxation or intensificat...
Ly49 genes regulate the cytotoxic activity of natural killer (NK) cells in rodents and provide important protection against virus-infected or tumor cells. About 15 Ly49 genes have been identified in mice, but only a few genes have been reported to date in rats. Here we studied all Ly49 genes in the entire rat genome sequence and identified 17 putat...
Current efforts to reconstruct the tree of life and histories of multigene families demand the inference of phylogenies consisting of thousands of gene sequences. However, for such large data sets even a moderate exploration of the tree space needed to identify the optimal tree is virtually impossible. For these cases the neighbor-joining (NJ) meth...
With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shapi...
Sexually induced gene 1 (Sig1) in the centric diatom Thalassiosira weissflogii is considered to encode a gamete recognition protein. Sorhannus (2003) analyzed nucleotide sequences of Sig1 using parsimony analysis and the maximum-likelihood (ML)-based Bayesian method for inferring positive selection at single amino acid sites and reported that posit...
We have identified the Hsp70 gene superfamily of the nematode Caenorhabditis briggsae and investigated the evolution of these genes in comparison with Hsp70 genes from C. elegans, Drosophila, and yeast. The Hsp70 genes are classified into three monophyletic groups according to their subcellular localization, namely, cytoplasm (CYT), endoplasmic ret...
Plant MADS-box genes form a large gene family for transcription factors and are involved in various aspects of developmental processes, including flower development. They are known to be subject to birth-and-death evolution, but the detailed features of this mode of evolution remain unclear. To have a deeper insight into the evolutionary pattern of...
Olfactory receptor (OR) genes form the largest known multigene family in the human genome. To obtain some insight into their evolutionary history, we have identified the complete set of OR genes and their chromosomal locations from the latest human genome sequences. We detected 388 potentially functional genes that have intact ORFs and 414 apparent...
MADS-box genes in plants control various aspects of development and reproductive processes including flower formation. To obtain some insight into the roles of these genes in morphological evolution, we investigated the origin and diversification of floral MADS-box genes by conducting molecular evolutionary genetics analyses. Our results suggest th...
We know that genes with related functions tend to cluster together in genomes but the cause of this pattern is less clear. Now, Csaba Pal and Laurence Hurst, in a study published in Nature Genetics, show that in yeast such genes not only tend to occur together but also tend to be inherited together, indicating that natural selection may favour func...
The major histocompatibility complex (MHC) is a multigene family that mediates the host immune response by helping T lymphocytes to recognize and respond to foreign antigens. The high degree of polymorphism and a quick turnover of the genetic loci make the evolution of MHC genes an intriguing subject of study. To understand the evolutionary pattern...
Although the phylogenetic relationships of major lineages of primate species are relatively well established, the times of divergence of these lineages as estimated by molecular data are still controversial. This controversy has been generated in part because different authors have used different types of molecular data, different statistical metho...
Murphy and colleagues reported that the mammalian phylogeny was resolved by Bayesian phylogenetics. However, the DNA sequences they used had many alignment gaps and undetermined nucleotide sites. We therefore reanalyzed their data by minimizing unshared nucleotide sites and retaining as many species as possible (13 species). In constructing phyloge...
Bayesian phylogenetics has recently been proposed as a powerful method for inferring molecular phylogenies, and it has been reported that the mammalian and some plant phylogenies were resolved by using this method. The statistical confidence of interior branches as judged by posterior probabilities in Bayesian analysis is generally higher than that...
Inferring positive selection at single amino acid sites is of biological and medical importance. Parsimony-based and likelihood-based methods have been developed for this purpose, but the reliabilities of these methods are not well understood. Because the evolutionary models assumed in these methods are only rough approximations to reality, it is d...
Endosymbionts, which are widely observed in nature, have undergone reductive genome evolution because of their long-term intracellular lifestyle. Here we compared the complete genome sequences of two different endosymbionts, Buchnera and a protist mitochondrion, with their close relatives to study the evolutionary rates of functional genes in endos...
Histones are small basic proteins encoded by a multigene family and are responsible for the nucleosomal organization of chromatin in eukaryotes. Because of the high degree of protein sequence conservation, it is generally believed that histone genes are subject to concerted evolution. However, purifying selection can also generate a high degree of...
Statistical methods for estimating divergence times by using multiprotein gamma distances are discussed. When a large number of proteins are used, even a small degree of deviation from the molecular clock hypothesis can be detected. In this case, one may use the stem-lineage method for estimating divergence times. However, the estimates obtained by...
Influenza A, B, and C viruses are the etiological agents of influenza. Hemagglutinin (HA) is the major envelope glycoprotein of influenza A and B viruses, and hemagglutinin-esterase (HE) in influenza C viruses is a protein homologous to HA. Because influenza A virus pandemics in humans appear to occur when new subtypes of HA genes are introduced fr...
A typical immunoglobulin (Ig) molecule is composed of four polypeptide chains: two identical heavy (H) chains and two identical light (L) chains. This tetrameric structure is conserved in almost all jawed vertebrate species. However, it has been discovered that camels and llamas (family: Camelidae) possess a type of dimeric Ig that consists of two...
In some species, histone gene clusters consist of tandem arrays of each type of histone gene, whereas in other species the genes may be clustered but not arranged in tandem. In certain species, however, histone genes are found scattered across several different chromosomes. This study examines the evolution of histone 3 (H3) genes that are not arra...
The reliabilities of parsimony-based and likelihood-based methods for inferring positive selection at single amino acid sites were studied using the nucleotide sequences of human leukocyte antigen (HLA) genes, in which positive selection is known to be operating at the antigen recognition site. The results indicate that the inference by parsimony...
We have developed a new software package, Molecular Evolutionary Genetics Analysis version 2 (MEGA2), for exploring and analyzing aligned DNA or protein sequences from an evolutionary perspective. MEGA2 vastly extends the capabilities of MEGA version 1 by: (1) facilitating analyses of large datasets; (2) enabling creation and analyses...
ADAPTSITE is a program package for detecting natural selection at single amino acid sites, using a multiple alignment of protein-coding sequences for a given phylogenetic tree. The program infers ancestral codons at all interior nodes, and computes the total numbers of synonymous (c(S)) and nonsynonymous (c(N)) substitutions as well as...
The diversity of T-cell receptors is generated primarily by the variable-region gene families, each of which is composed of a large number of member genes. The entire genomic sequence of the variable region (VB) of the T-cell receptor beta chain from humans and mice has become available. To understand the evolutionary dynamics of the VB gene famil...
When many protein sequences are available for estimating the time of divergence between two species, it is customary to estimate the time for each protein separately and then use the average for all proteins as the final estimate. However, it can be shown that this estimate generally has an upward bias, and that an unbiased estimate is obtained by...