-
Toyoyuki Takada,
Toshinobu Ebata, Hideki Noguchi,
Thomas Keane,
David Adams,
Takanori Narita,
Tadasu Shin-I,
Hironori Fujisawa,
Atsushi Toyoda,
Kuniya Abe,
Yuichi Obata,
Yoshiyuki Sakaki,
Kazuo Moriwaki,
Asao Fujiyama,
Yuji Kohara,
Toshihiko Shiroishi
ABSTRACT: Commonly used classical inbred mouse strains have mosaic genomes with sequences from different subspecific origins. Their genomes are derived predominantly from the Western European subspecies Mus musculus domesticus, with the remaining sequences derived mostly from the Japanese subspecies M. m. molossinus. However, it remains unknown how this intersubspecific genome introgression occurred during the establishment of classical inbred strains. In this study, we resequenced the genomes of two M. m. molossinus-derived inbred strains, MSM/Ms and JF1/Ms. MSM/Ms originated from Japanese wild mice, whereas the ancestors of JF1/Ms were originally found in Europe and then transferred to Japan. We compared the characteristics of these sequences with those of the C57BL/6J reference sequence and the recent datasets from the resequencing of 17 inbred strains in the Mouse Genome Project (MGP), and the results unequivocally show that genome introgression from M. m. molossinus into M. m. domesticus provided the primary framework for the mosaic genomes of classical inbred strains. Furthermore, the genomes of C57BL/6J and other classical inbred strains have long consecutive segments with extremely high similarity (>99.998%) to the JF1/Ms strain. In the early 20th century, Japanese waltzing mice with a morphological phenotype resembling that of JF1/Ms mice were often crossed with European fancy mice for early studies of "Mendelism," which suggests that the ancestor of the extant JF1/Ms strain provided the origin of the M. m. molossinus genome in classical inbred strains and contributed substantially to their intersubspecific genome diversity.
Genome Research 04/2013; · 13.61 Impact Factor
-
Hideto Takami, Hideki Noguchi,
Yoshihiro Takaki,
Ikuo Uchiyama,
Atsushi Toyoda,
Shinro Nishi,
Gab-Joo Chee,
Wataru Arai,
Takuro Nunoura,
Takehiko Itoh,
Masahira Hattori,
Ken Takai
ABSTRACT: A nearly complete genome sequence of Candidatus 'Acetothermum autotrophicum', a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. 'A. autotrophicum' is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO₂ fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acetyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. 'A. autotrophicum' is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. 'A. autotrophicum' support the view that the first bacterial and archaeal lineages were H₂-dependent acetogens and methanogens living in hydrothermal environments.
PLoS ONE 01/2012; 7(1):e30559. · 4.09 Impact Factor
-
Hideki Noguchi
05/2011: pages 433-439; ISBN: 9781118010518
-
ABSTRACT: We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239, isolated from the koji used to prepare Japanese soy sauce. We used 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other sequenced Aspergilli. Assembly of 454 reads generated a 39.5-Mb non-redundant sequence possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. Whereas A. oryzae possesses three copies of the α-amylase gene, A. sojae NBRC4239 possesses only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed, including an 18 508-bp deletion containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, which explains why A. sojae does not produce CPA. The A. sojae NBRC4239 genome data will be useful for characterizing functional features of the koji moulds used in Japanese industries.
DNA Research 01/2011; 18(3):165-76. · 5.16 Impact Factor
-
Yoshiyuki Sakuraba,
Toru Kimura,
Hiroshi Masuya, Hideki Noguchi,
Hideki Sezutsu,
K Ryo Takahasi,
Atsushi Toyoda,
Ryutaro Fukumura,
Takuya Murata,
Yoshiyuki Sakaki,
Masayuki Yamamura,
Shigeharu Wakana,
Tetsuo Noda,
Toshihiko Shiroishi,
Yoichi Gondo
ABSTRACT: Comparative sequence analyses have identified highly conserved genomic DNA sequences, including noncoding sequences, between humans and other species. By performing whole-genome comparisons of human and mouse, we have identified 611 conserved noncoding sequences longer than 500 bp, with more than 95% identity between the species. These long conserved noncoding sequences (LCNS) include 473 new sequences that do not overlap with previously reported ultraconserved elements (UCE), which are defined as aligned sequences longer than 200 bp with 100% identity in human, mouse, and rat. The LCNS were distributed throughout the genome except for the Y chromosome and often occurred in clusters within regions with a low density of coding genes. Many of the LCNS were also highly conserved in other mammals, chickens, frogs, and fish; however, we were unable to find orthologous sequences in the genomes of invertebrate species. In order to examine whether these conserved sequences are functionally important or merely mutational cold spots, we directly measured the frequencies of ENU-induced germline mutations in the LCNS of the mouse. By screening about 40.7 Mb, we found 35 mutations, including mutations at nucleotides that were conserved between human and fish. The mutation frequencies were equivalent to those found in other genomic regions, including coding sequences and introns, suggesting that the LCNS are not mutational cold spots at all. Taken together, these results suggest that mutations occur with equal frequency in LCNS but are eliminated by natural selection during the course of evolution.
Mammalian Genome 12/2008; 19(10-12):703-12. · 2.89 Impact Factor
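The LCNS abstract reports raw counts rather than a mutation frequency. As a back-of-the-envelope illustration (our own arithmetic, not a figure stated by the authors), dividing the roughly 40.7 Mb screened by the 35 mutations found gives the approximate spacing of detected ENU-induced mutations; the variable names are illustrative only.

```python
# Counts taken from the LCNS abstract above.
screened_bp = 40.7e6   # "about 40.7 Mb" screened (approximate figure)
mutations_found = 35   # ENU-induced germline mutations detected

# Average number of screened bases per detected mutation.
bp_per_mutation = screened_bp / mutations_found

print(f"about 1 mutation per {bp_per_mutation / 1e6:.2f} Mb")
```

This prints "about 1 mutation per 1.16 Mb"; because the 40.7 Mb total is itself approximate, the result should be read only as a rough order of magnitude.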
-
ABSTRACT: Recent advances in DNA sequencers are accelerating genome sequencing, especially of microbes, and complete and draft genomes from various species have been sequenced in rapid succession. Here, we present a comprehensive gene prediction tool, MetaGeneAnnotator (MGA), which precisely predicts all kinds of prokaryotic genes from a single anonymous genomic sequence or a set of them, across a variety of lengths. MGA integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model built from the input sequences. As a result, MGA sensitively detects not only typical genes but also atypical genes, such as horizontally transferred and prophage genes, in a prokaryotic genome. In this paper, we also propose a novel approach for analyzing the ribosomal binding site (RBS), which enables us to detect species-specific RBS patterns. MGA incorporates an RBS model based on this approach and precisely predicts the translation starts of genes. MGA also improves prediction accuracy for short sequences by using the adapted RBS models (96% sensitivity and 93% specificity for 700 bp fragments). These features of MGA expedite a wide range of microbial genome studies, such as genome annotation and metagenome analysis.
DNA Research 11/2008; 15(6):387-96. · 5.16 Impact Factor
-
Ken Kurokawa,
Takehiko Itoh,
Tomomi Kuwahara,
Kenshiro Oshima,
Hidehiro Toh,
Atsushi Toyoda,
Hideto Takami,
Hidetoshi Morita,
Vineet K Sharma,
Tulika P Srivastava,
Todd D Taylor, Hideki Noguchi,
Hiroshi Mori,
Yoshitoshi Ogura,
Dusko S Ehrlich,
Kikuji Itoh,
Toshihisa Takagi,
Yoshiyuki Sakaki,
Tetsuya Hayashi,
Masahira Hattori
ABSTRACT: Numerous microbes inhabit the human intestine, many of which are uncharacterized or uncultivable. They form a complex microbial community that deeply affects human physiology. To identify the genomic features common to all human gut microbiomes as well as those variable among them, we performed a large-scale comparative metagenomic analysis of fecal samples from 13 healthy individuals of various ages, including unweaned infants. We found that, while the gut microbiota of unweaned infants were simple and showed high inter-individual variation in taxonomic and gene composition, those of adults and weaned children were more complex but showed high functional uniformity regardless of age or sex. In searching for genes over-represented in gut microbiomes, we identified 237 gene families commonly enriched in adult-type and 136 families in infant-type microbiomes, with a small overlap. An analysis of their predicted functions revealed various strategies employed by each type of microbiota to adapt to its intestinal environment, suggesting that these gene sets encode the core functions of adult- and infant-type gut microbiota. By analysing the orphan genes, we identified 647 new gene families that are exclusively present in human intestinal microbiomes. In addition, we discovered a conjugative transposon family explosively amplified in human gut microbiomes, which strongly suggests that the intestine is a 'hot spot' for horizontal gene transfer between microbes.
DNA Research 09/2007; 14(4):169-81. · 5.16 Impact Factor
-
Nobuyoshi Sugaya,
Kazuyoshi Ikeda,
Toshiyuki Tashiro,
Shizu Takeda,
Jun Otomo,
Yoshiko Ishida,
Akiko Shiratori,
Atsushi Toyoda, Hideki Noguchi,
Tadayuki Takeda,
Satoru Kuhara,
Yoshiyuki Sakaki,
Takao Iwayanagi
ABSTRACT: Protein-protein interactions (PPIs) are challenging but attractive targets for small chemical drugs. Comprehensive PPI data, called the 'interactome', have emerged for several organisms, including human, owing to the recent development of high-throughput screening (HTS) technologies. Individual PPIs have been targeted by small drug-like chemicals (SDCs); however, interactome data have not been fully utilized for exploring drug targets because of the lack of a comprehensive methodology for utilizing these data. Here we propose an integrative in silico approach for discovering candidates for drug-targetable PPIs in interactome data.
Our novel in silico screening system comprises three independent assessment procedures: i) detection of protein domains responsible for PPIs, ii) finding SDC-binding pockets on protein surfaces, and iii) evaluating similarities in the assignment of Gene Ontology (GO) terms between specific partner proteins. We discovered six candidates for drug-targetable PPIs by applying our in silico approach to original human PPI data composed of 770 binary interactions produced by our HTS yeast two-hybrid (HTS-Y2H) assays. Among them, we further examined two candidates, RXRA/NRIP1 and CDK2/CDKN1A, with respect to their biological roles, PPI network around each candidate, and tertiary structures of the interacting domains.
An integrative in silico approach for discovering candidates for drug-targetable PPIs was applied to original human PPI data. The system excludes false-positive interactions and selects reliable PPIs as drug targets. Its effectiveness was demonstrated by the discovery of six promising candidate target PPIs. Inhibition or stabilization of the two interactions examined in detail may have potential therapeutic effects against human diseases.
BMC Pharmacology 02/2007; 7:10.
-
Nature Genetics 07/2006; 38(8):854-855.
-
Jun Aruga,
Akiko Kamiya,
Hirokazu Takahashi,
Takahiko J Fujimi,
Yuri Shimizu,
Keiko Ohkawa,
Shigenobu Yazawa,
Yoshihiko Umesono, Hideki Noguchi,
Takashi Shimizu,
Naruya Saitou,
Katsuhiko Mikoshiba,
Yoshiyuki Sakaki,
Kiyokazu Agata,
Atsushi Toyoda
ABSTRACT: We compared Zic homologues from a wide range of animals. Striking conservation was found in the zinc finger domains, in which an exon-intron boundary has been kept in all bilateralians but not in cnidarians, suggesting that all of the bilateralian Zic genes are derived from a single gene in a bilateralian ancestor. Additional conserved amino acid sequences, ZOC and ZF-NC, were also found. Combined analysis of the zinc finger, ZOC, and ZF-NC revealed the presence of two classes of Zic, based on the degree of protein structure conservation. The "conserved" class includes Zic proteins from the Arthropoda, Mollusca, Annelida, Echinodermata, and Chordata (vertebrates and cephalochordates), whereas the "diverged" class contains those from the Platyhelminthes, Cnidaria, Nematoda, and Chordata (urochordates). The result indicates that the ancestral bilateralian Zic protein had already acquired an entire set of conserved domains, but that these were lost or diverged in the platyhelminthes, nematodes, and urochordates.
Genomics 07/2006; 87(6):783-92. · 3.02 Impact Factor
-
Todd D Taylor, Hideki Noguchi,
Yasushi Totoki,
Atsushi Toyoda,
Yoko Kuroki,
Ken Dewar,
Christine Lloyd,
Takehiko Itoh,
Tadayuki Takeda,
Dae-Won Kim, [......],
Xiaoping Yang,
Andrew R Zimmer,
Michael C Zody,
Bruce W Birren,
Chad Nusbaum,
Asao Fujiyama,
Masahira Hattori,
Jane Rogers,
Eric S Lander,
Yoshiyuki Sakaki
ABSTRACT: Chromosome 11, although average in size, is one of the most gene- and disease-rich chromosomes in the human genome. Initial gene annotation indicates an average gene density of 11.6 genes per megabase, including 1,524 protein-coding genes, some of which were identified using novel methods, and 765 pseudogenes. One-quarter of the protein-coding genes shows overlap with other genes. Of the 856 olfactory receptor genes in the human genome, more than 40% are located in 28 single- and multi-gene clusters along this chromosome. Out of the 171 disorders currently attributed to the chromosome, 86 remain for which the underlying molecular basis is not yet known, including several mendelian traits, cancer and susceptibility loci. The high-quality data presented here--nearly 134.5 million base pairs representing 99.8% coverage of the euchromatic sequence--provide scientists with a solid foundation for understanding the genetic basis of these disorders and other biological phenomena.
Nature 04/2006; 440(7083):497-500. · 36.28 Impact Factor
-
Yoko Kuroki,
Atsushi Toyoda, Hideki Noguchi,
Todd D Taylor,
Takehiko Itoh,
Dae-Soo Kim,
Dae-Won Kim,
Sang-Haeng Choi,
Il-Chul Kim,
Han Ho Choi,
Yong Sung Kim,
Yoko Satta,
Naruya Saitou,
Tomoyuki Yamada,
Shinichi Morishita,
Masahira Hattori,
Yoshiyuki Sakaki,
Hong-Seog Park,
Asao Fujiyama
ABSTRACT: The mammalian Y chromosome has unique characteristics compared with the autosomes or X chromosomes. Here we report the finished sequence of the chimpanzee Y chromosome (PTRY), including 271 kb of the Y-specific pseudoautosomal region 1 and 12.7 Mb of the male-specific region of the Y chromosome. Greater sequence divergence between the human Y chromosome (HSAY) and PTRY (1.78%) than between their respective whole genomes (1.23%) confirmed the accelerated evolutionary rate of the Y chromosome. Each of the 19 PTRY protein-coding genes analyzed had at least one nonsynonymous substitution, and 11 genes had higher nonsynonymous substitution rates than synonymous ones, suggesting relaxation of selective constraint, positive selection or both. We also identified lineage-specific changes, including deletion of a 200-kb fragment from the pericentromeric region of HSAY, expansion of young Alu families in HSAY and accumulation of young L1 elements and long terminal repeat retrotransposons in PTRY. Reconstruction of the common ancestral Y chromosome reflects the dynamic changes in our genomes in the 5-6 million years since speciation.
Nature Genetics 03/2006; 38(2):158-67. · 35.53 Impact Factor
-
ABSTRACT: Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which utilizes di-codon frequencies estimated from the GC content of a given sequence, along with various other measures. MetaGene can predict a whole range of prokaryotic genes from anonymous genomic sequences only a few hundred bases long, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.
Nucleic Acids Research 02/2006; 34(19):5623-30. · 8.03 Impact Factor
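Gene-finding benchmarks such as the MetaGene evaluation above commonly define sensitivity as TP/(TP+FN) and specificity as TP/(TP+FP), where TP, FN, and FP count correctly predicted, missed, and falsely predicted genes. That convention is assumed here rather than taken from the abstract, and the function names and counts below are hypothetical, chosen only to show the arithmetic.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of annotated genes that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Fraction of predicted genes that match annotations: TP / (TP + FP).

    In this assumed gene-finding convention, "specificity" corresponds to
    what other fields call precision.
    """
    return tp / (tp + fp)


# Hypothetical counts for a batch of fragments (illustrative only).
tp, fn, fp = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / 100
print(f"specificity = {specificity(tp, fp):.2f}")  # 90 / 120
```

Under this convention, over-predicting genes lowers specificity without affecting sensitivity, which is why the two figures are usually reported together.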
-
Chad Nusbaum,
Michael C. Zody,
Mark L. Borowsky,
Michael Kamal,
Chinnappa D. Kodira,
Todd D. Taylor,
Charles A. Whittaker,
Jean L. Chang,
Christina A. Cuomo,
Ken Dewar, [......],
Atsushi Toyoda,
Hester M. Wain,
Sarah K. Young,
Qiandong Zeng,
Andrew R. Zimmer,
Asao Fujiyama,
Masahira Hattori,
Bruce W. Birren,
Yoshiyuki Sakaki,
Eric S. Lander
Nature 11/2005; 438(7068):696-696. · 36.28 Impact Factor
-
Yoshiyuki Sakuraba,
Hideki Sezutsu,
K Ryo Takahasi,
Keiko Tsuchihashi,
Rie Ichikawa,
Naomi Fujimoto,
Satoko Kaneko,
Yuji Nakai,
Masashi Uchiyama,
Noriko Goda, [......],
Hideki Kaneda,
Hiroshi Masuya,
Osamu Minowa, Hideki Noguchi,
Atsushi Toyoda,
Yoshiyuki Sakaki,
Shigeharu Wakana,
Tetsuo Noda,
Toshihiko Shiroishi,
Yoichi Gondo
ABSTRACT: Large-scale mouse mutagenesis with ENU has provided forward-genetic resources for functional genomics. The frozen sperm archive of ENU-mutagenized generation-1 (G1) mice could also provide a "mutant mouse library" that allows us to conduct reverse genetics on any particular target gene. We have archived frozen sperm as well as genomic DNA from 9224 G1 mice. By genome-wide screening of 63 target loci covering a total of 197 Mbp of the mouse genome, we directly identified 148 ENU-induced mutations. The mutation sites were primarily identified by temperature gradient capillary electrophoresis followed by direct sequencing. Molecular characterization revealed that all the identified mutations were point mutations and were mostly independent events, except for a few cases of redundant mutations. The base-substitution spectra in this study differed from those of phenotype-based mutagenesis. ENU-based gene-driven mutagenesis in the mouse is now feasible and practical.
Biochemical and Biophysical Research Communications 11/2005; 336(2):609-16. · 2.48 Impact Factor
-
Chad Nusbaum,
Michael C Zody,
Mark L Borowsky,
Michael Kamal,
Chinnappa D Kodira,
Todd D Taylor,
Charles A Whittaker,
Jean L Chang,
Christina A Cuomo,
Ken Dewar, [......],
Atsushi Toyoda,
Hester M Wain,
Sarah K Young,
Qiandong Zeng,
Andrew R Zimmer,
Asao Fujiyama,
Masahira Hattori,
Bruce W Birren,
Yoshiyuki Sakaki,
Eric S Lander
ABSTRACT: Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.
Nature 10/2005; 437(7058):551-5. · 36.28 Impact Factor
-
Kuniya Abe, Hideki Noguchi,
Keiko Tagawa,
Misako Yuzuriha,
Atsushi Toyoda,
Toshio Kojima,
Kiyoshi Ezawa,
Naruya Saitou,
Masahira Hattori,
Yoshiyuki Sakaki,
Kazuo Moriwaki,
Toshihiko Shiroishi
ABSTRACT: MSM/Ms is an inbred strain derived from the Japanese wild mouse, Mus musculus molossinus. It is believed that subspecies molossinus has contributed substantially to the genome constitution of common laboratory strains of mice, although the majority of their genome is derived from the west European M. m. domesticus. Information on the molossinus genome is thus essential not only for genetic studies involving molossinus but also for characterization of common laboratory strains. Here, we report the construction of an arrayed bacterial artificial chromosome (BAC) library from male MSM/Ms genomic DNA, covering approximately 1x genome equivalent. Both ends of 176,256 BAC clone inserts were sequenced, and 62,988 BAC-end sequence (BES) pairs were mapped onto the C57BL/6J genome (NCBI mouse Build 30), covering 2,228,164 kbp or 89% of the total genome. Taking advantage of the BES map data, we established a computer-based clone screening system. Comparison of the MSM/Ms and C57BL/6J sequences revealed 489,200 candidate single nucleotide polymorphisms (SNPs) in 51,137,941 bp sequenced. The overall nucleotide substitution rate was as high as 0.0096. The distribution of SNPs along the C57BL/6J genome was not uniform: The majority of the genome showed a high SNP rate, and only 5.2% of the genome showed an extremely low SNP rate (percentage identity = 0.9997); these sequences are likely derived from the molossinus genome.
Genome Research 01/2005; 14(12):2439-47. · 13.61 Impact Factor
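The MSM/Ms abstract gives both the candidate SNP count and the number of bases compared, so the reported overall substitution rate can be checked directly. The sketch below assumes the rate is simply candidate SNPs divided by compared bases (an assumption, though the result agrees with the stated 0.0096); the variable names are illustrative.

```python
# Figures reported in the MSM/Ms BAC-end abstract above.
candidate_snps = 489_200
bases_compared = 51_137_941

# Substitution rate taken as candidate SNPs per compared base (assumed definition).
rate = candidate_snps / bases_compared

print(f"substitution rate = {rate:.4f}")
print(f"about one SNP every {1 / rate:.0f} bp")
```

This prints a rate of 0.0096, matching the abstract, and roughly one candidate SNP every 105 bp between MSM/Ms and C57BL/6J.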
-
ABSTRACT: Comprehensive knowledge of the gene content of human chromosome 21 (HSA21) is essential for understanding the etiology of Down syndrome (DS). Here we report the largest comparison of finished mouse and human sequence to date for a 1.35-Mb region of mouse chromosome 16 (MMU16) that corresponds to human chromosome 21q22.2. This includes a portion of the commonly described "DS critical region," thought to contain a gene or genes whose dosage imbalance contributes to a number of phenotypes associated with DS. We used comparative sequence analysis to construct a DNA feature map of this region that includes all known genes, plus 144 conserved sequences ≥100 bp long that show ≥80% identity between mouse and human but do not match known exons. Twenty of these have matches to expressed sequence tag and cDNA databases, indicating that they may be transcribed sequences from chromosome 21. Eight putative CpG islands are found at conserved positions. Models for two human genes, DSCR4 and DSCR8, are not supported by conserved sequence, and close examination indicates that low-level transcripts from these loci are unlikely to encode proteins. Gene prediction programs give different results when used to analyze the well-conserved regions between mouse and human sequences. Our findings have implications for evolution and for modeling the genetic basis of DS in mice.
Genome Research 10/2002; 12(9):1323-32. · 13.61 Impact Factor
-
ABSTRACT: We introduce here a novel index that precisely identifies protein-coding regions from cross-species genome alignments. The index is closely related to the frame recovery observed in coding-sequence alignments: if insertions or deletions of nucleotides cause frame shifts in coding regions, other indels that restore the reading frame will often be observed in the vicinity. In contrast, such frame recoveries are not observed in other conserved regions. We prepared two gene models: one that finds genes using sequence similarity and intrinsic gene measures (basic model), and another that finds genes using the frame recovery index in addition to sequence similarity and intrinsic gene measures (frame recovery model). We evaluated the prediction accuracies of the two models, and our benchmark test revealed that the frame recovery model significantly improved prediction accuracy compared with the basic model.
Genome informatics. International Conference on Genome Informatics 01/2002; 13:183-91.