Osamu Ogasawara
Research interests
-
Genomics, Computational Biology, Gene Expression, Bioinformatic Software, Gene Regulation, Next Generation Sequencing
Publications
-
7.48 Impact points
The DNA Data Bank of Japan launches a new resource, the DDBJ Omics Archive of functional genomics experiments.
Nucleic acids research. 11/2011; 40(Database issue):D38-42.
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
-
7.48 Impact points
DDBJ progress report.
Nucleic acids research. 11/2010; 39(Database issue):D22-7.
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3,637,446 entries/2,272,231,889 bases between July 2009 and June 2010. A highlight of the released data was archive datasets from next-generation sequencing reads of Japanese rice cultivar, Koshihikari submitted by the National Institute of Agrobiological Sciences. In this period, we started a new archive for quantitative genomics data, the DDBJ Omics aRchive (DOR). The DOR stores quantitative data both from the microarray and high-throughput new sequencing platforms. Moreover, we improved the content of the DDBJ patent sequence, released a new submission tool of the DDBJ Sequence Read Archive (DRA) which archives massive raw sequencing reads, and enhanced a cloud computing-based analytical system from sequencing reads, the DDBJ Read Annotation Pipeline. In this article, we describe these new functions of the DDBJ databases and support tools.
-
7.48 Impact points
DDBJ launches a new archive database with analytical tools for next-generation sequence data.
Nucleic acids research. 10/2009;
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has collected and released 1 701 110 entries/1 116 138 614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. Comprehensive visualization of a DDBJ release data was attempted by using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the 'DDBJ Read Archive' (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the 'DDBJ Read Annotation Pipeline' was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly and high-level analysis of structural and functional annotations. These new services will aid users' research and provide easier access to DDBJ databases.
-
4.41 Impact points
On theoretical models of gene expression evolution with random genetic drift and natural selection.
PloS one. 01/2009; 4(11):e7943.
BACKGROUND: The relative contributions of natural selection and random genetic drift are a major source of debate in the study of gene expression evolution, which is hypothesized to serve as a bridge from molecular to phenotypic evolution. It has been suggested that the conflict between views is caused by the lack of a definite model of the neutral hypothesis, which can describe the long-run behavior of evolutionary change in mRNA abundance. Therefore previous studies have used inadequate analogies with the neutral prediction of other phenomena, such as amino acid or nucleotide sequence evolution, as the null hypothesis of their statistical inference. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we introduced two novel theoretical models, one based on neutral drift and the other assuming natural selection, by focusing on a common property of the distribution of mRNA abundance among a variety of eukaryotic cells, which reflects the result of long-term evolution. Our results demonstrated that (1) our models can reproduce two independently found phenomena simultaneously: the time development of gene expression divergence and Zipf's law of the transcriptome; (2) cytological constraints can be explicitly formulated to describe long-term evolution; (3) the model assuming that natural selection optimized relative mRNA abundance was more consistent with previously published observations than the model of optimized absolute mRNA abundances. CONCLUSIONS/SIGNIFICANCE: The models introduced in this study give a formulation of evolutionary change in the mRNA abundance of each gene as a stochastic process, on the basis of previously published observations.
This model provides a foundation for interpreting observed data in studies of gene expression evolution, including identifying an adequate time scale for discriminating the effect of natural selection from that of random genetic drift of selectively neutral variations.
-
7.48 Impact points
DDBJ with new system and face.
Nucleic acids research. 02/2008; 36(Database issue):D22-4.
DDBJ (http://www.ddbj.nig.ac.jp) collected and released 1 880 115 entries or 1 134 086 245 bases in the period from July 2006 to June 2007. The released data contains the high-throughput cDNAs of cricket and high-quality draft genome of medaka among others. Our computer system has been upgraded since March 2007. Another new aspect is an efficient data retrieval tool that has recently been equipped and served at DDBJ. It is called All-round Retrieval for Sequence and Annotation, which enables the user to search for keywords also in the Feature/Qualifier of the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/). We will also replace our home page with a more efficient one by the end of 2007.
-
7.48 Impact points
BodyMap-Xs: anatomical breakdown of 17 million animal ESTs for cross-species comparison of gene expression.
Nucleic acids research. 02/2006; 34(Database issue):D628-31.
BodyMap-Xs (http://bodymap.jp) is a database for cross-species gene expression comparison. It was created by the anatomical breakdown of 17 million animal expressed sequence tag (EST) records in DDBJ using a sorting program tailored for this purpose. In BodyMap-Xs, users are allowed to compare the expression patterns of orthologous and paralogous genes in a coherent manner. This will provide valuable insights for the evolutionary study of gene expression and identification of a responsive motif for a particular expression pattern. In addition, starting from a concise overview of the taxonomical and anatomical breakdown of all animal ESTs, users can navigate to obtain gene expression ranking of a particular tissue in a particular animal. This method may lead to the understanding of the similarities and differences between the homologous tissues across animal species. BodyMap-Xs will be automatically updated in synchronization with the major update in DDBJ, which occurs periodically.
-
7.48 Impact points
The Human Anatomic Gene Expression Library (H-ANGEL), the H-Inv integrative display of human gene expression across disparate technologies and platforms.
Nucleic acids research. 02/2005; 33(Database issue):D567-72.
The Human Anatomic Gene Expression Library (H-ANGEL) is a resource for information concerning the anatomical distribution and expression of human gene transcripts. The tool contains protein expression data from multiple platforms that has been associated with both manually annotated full-length cDNAs from H-InvDB and RefSeq sequences. Of the H-Inv predicted genes, 18 897 have associated expression data generated by at least one platform. H-ANGEL utilizes categorized mRNA expression data from both publicly available and proprietary sources. It incorporates data generated by three types of methods from seven different platforms. The data are provided to the user in the form of a web-based viewer with numerous query options. H-ANGEL is updated with each new release of cDNA and genome sequence build. In future editions, we will incorporate the capability for expression data updates from existing and new platforms. H-ANGEL is accessible at http://www.jbirc.aist.go.jp/hinv/h-angel/.
-
12.92 Impact points
Integrative annotation of 21,037 human genes validated by full-length cDNA clones.
PLoS biology. 07/2004; 2(6):e162.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs.
The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
-
Indexing anatomical concepts to OMIM Clinical Synopsis using the UMLS Metathesaurus.
In silico biology. 02/2004; 4(1):31-54.
As a first step toward the quantitative comparison of clinical features of diseases, we indexed the text descriptions in the Clinical Synopsis section of the Online Mendelian Inheritance in Man (OMIM) with concepts for the body parts, organs, and tissues contained in the Metathesaurus of the Unified Medical Language System (UMLS). We also indexed the text with the diseases and disorders having links to body parts specified in the thesaurus. The vocabulary size was approximately 177,540 representations for 81,435 concepts, and 2,161 concepts were indexed to 3,779 OMIM entries. The indexed concepts included 134 concepts for the noun forms of anatomical concepts and 985 indexed concepts for diseases and disorders that were linked to 132 and 408 anatomical concepts, respectively. We report herein that the retrieval of OMIM entries for diseases affecting specific organs can be made more comprehensive through the anatomical concepts indexed to the Clinical Synopsis or linked to the indexed concepts, as compared to simply matching organ names to the Clinical Synopsis text. The recall and precision of identifying relevant body parts in the Clinical Synopsis were calculated as 78% and 92.5%, respectively, based on random sampling. The examination of the unidentified body parts due to lack of indexed diseases and disorders showed that although most of the concepts for diseases and disorders were contained in the Metathesaurus, their relations to body parts were not. The indexing result proved the effectiveness of the Metathesaurus as a resource for the identification of concepts indicating body parts, diseases, and disorders.
-
Integration of Diverse Knowledge and Data
08/2003;
After the accomplishment of human draft sequence, more and more efforts are being made in the mapping of the data-driven patterns to background knowledge, hoping to efficiently produce hypotheses out of the flood of data. Here we propose a framework of biomedical data and knowledge that has a high adaptability to the automated data interpretation. Then, we show that biomedical databases with heterogeneous scopes and structures can be converted to the format, and possible roles of ontology of biomedical objects combined with natural language processing techniques. Lastly, we present applications of formatted biomedical knowledge to scientific discovery.
-
2.73 Impact points
Expression of cytochrome P450 cholesterol side chain cleavage and 3beta-hydroxysteroid dehydrogenase during embryogenesis in chicken adrenal glands and gonads.
General and comparative endocrinology. 05/2000; 118(1):96-104.
Expression of cytochrome P450 cholesterol side chain cleavage (P450scc) and 3beta-hydroxysteroid dehydrogenase (3beta-HSD) mRNAs was examined in chicken embryonic adrenal glands and gonads between days 4 and 12 of incubation. In situ hybridization analysis showed that 3beta-HSD mRNA appeared on day 5 of incubation in the adrenal glands and on day 6 in the gonads, while P450scc mRNA was expressed on day 7 in both the adrenal glands and the gonads. Cells expressing both enzyme mRNAs were distributed in the steroidogenic tissues of the adrenal glands and in the medullary cords of the gonads. From days 9 to 11 of incubation, P450scc mRNA expression was not found in the majority of both the adrenal glands and the gonads, but was detected again in both on day 12, although 3beta-HSD mRNA was constitutively expressed during this period. Changes in the expression pattern of P450scc mRNA are paralleled by changes in the plasma corticosterone level reported previously. Therefore, it is suggested that P450scc is essential to embryogenesis.
-
2.73 Impact points
Highly heterologous region in the N-terminal extracellular domain of reptilian follitropin receptors.
General and comparative endocrinology. 01/1997; 104(3):374-81.
The primary structure of the N-terminal extracellular region of the follitropin receptor (FSH-R), which is thought to be responsible for hormone binding specificity, was determined in three reptilian species (tortoise, gecko, and lizard). Remarkably low sequence homologies were detected in the C-terminal part of the extracellular domain. This region was estimated to be a part of exon 10, which is the last exon of the FSH-R gene. In this region, not only were low homologies detected among the three reptilian species, but also specific deletions and/or insertions were found. In particular, large deletions were detected in squamate (gecko and lizard) FSH-Rs. Phylogenetic analysis indicated that these large deletions occurred recently, i.e., after the Triassic period. In another region characterized, sequence homologies were high, with tortoise-rat homology 78.4%, gecko-rat 64.7%, and lizard-rat 69.1%. In this highly conserved region, however, some reptile-specific alterations were detected, such as the loss of a cysteine residue in putative exon 7 and the existence of potential N-linked glycosylation sites in putative exon 9.
-
1.71 Impact points
Zipf's law and human transcriptomes: an explanation with an evolutionary model.
Comptes rendus biologies. 326(10-11):1097-101.
Detailed analysis of human gene expression data reveals several patterns of relationship between transcript frequency and abundance rank. In muscle and liver, organs composed primarily of a homogeneous population of differentiated cells, they obey Zipf's law. In cell lines, epithelial tissue and compiled transcriptome data, only high-rankers deviate from it. We propose an evolutionary process model during which expression level changes stochastically proportionally to its intensity, providing a novel interpretation of transcriptome data and of evolutionary constraints on gene expression.
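The two ingredients named in this abstract, a rank-frequency relationship and expression changes proportional to current intensity, can be illustrated with a minimal Python sketch. This is not the published model: the function names, the log-normal step, and every parameter value (`n_genes`, `n_steps`, `sigma`, `seed`) are assumptions made for illustration, and an unconstrained multiplicative walk like this one is not expected to reproduce Zipf's law on its own.

```python
import math
import random


def rank_frequency_slope(abundances):
    """Least-squares slope of log(frequency) against log(abundance rank).

    A slope close to -1 corresponds to Zipf's law (frequency ~ 1 / rank).
    """
    positive = [a for a in abundances if a > 0]
    total = sum(positive)
    # Rank 1 is the most abundant transcript.
    freqs = sorted((a / total for a in positive), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def proportional_drift(n_genes=1000, n_steps=200, sigma=0.1, seed=0):
    """Hypothetical illustration: each step rescales every expression level
    by a random factor, so the size of a change is proportional to the
    current level. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    levels = [1.0] * n_genes
    for _ in range(n_steps):
        levels = [x * math.exp(rng.gauss(0.0, sigma)) for x in levels]
    return levels


if __name__ == "__main__":
    ideal = [1.0 / rank for rank in range(1, 101)]
    print("slope for an exact 1/rank profile:", rank_frequency_slope(ideal))
    print("slope after unconstrained drift:", rank_frequency_slope(proportional_drift()))
```

Comparing the two printed slopes gives a quick check of how far a simulated or measured abundance profile is from the 1/rank form; any constraints that the published model places on the process would have to be added to `proportional_drift`.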