About
71 Publications
13,122 Reads
1,401 Citations
Publications (71)
The Bioinformation and DDBJ (DNA Data Bank of Japan) Center (DDBJ Center; https://www.ddbj.nig.ac.jp) operates archival databases that collect nucleotide sequences together with study and sample information, and distributes them without access restrictions to advance life science research as a member of the International Nucleotide Sequence Database Collaboratio...
The Bioinformation and DDBJ Center (DDBJ Center, https://www.ddbj.nig.ac.jp) provides databases that capture, preserve and disseminate diverse biological data to support research in the life sciences. This center collects nucleotide sequences with annotations, raw sequencing data, and alignment information from high-throughput sequencing platforms,...
Studies in human genetics deal with a plethora of human genome sequencing data, both generated from specimens and available in the public domain. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing downstream data are essential. This review aims to gu...
Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One sol...
The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC) in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Inst...
Background
Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Along with software packaged in containers, the standardized workflow descriptors Common Workflow Language (CWL) enable data to be easily analyzed on multiple compu...
We have fully integrated public chromatin immunoprecipitation sequencing (ChIP-seq) and DNase-seq data (n > 70,000) derived from six representative model organisms (human, mouse, rat, fruit fly, nematode, and budding yeast), and have devised a data-mining platform, designated ChIP-Atlas (http://chip-atlas.org). ChIP-Atlas is able to show a...
The Genomic Expression Archive (GEA) for functional genomics data from microarray and high-throughput sequencing experiments has been established at the DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp), which is a member of the International Nucleotide Sequence Database Collaboration (INSDC) with the US National Center for Biotechn...
Noncoding regions of the human genome possess enhancer activity and harbor risk loci for heritable diseases. Whereas the binding profiles of multiple transcription factors (TFs) have been investigated, integrative analysis with the large body of public data available so as to provide an overview of the function of such noncoding regions has remaine...
Aim:
Bioinformatics analysis of Illumina Infinium Human DNA methylation BeadArray data is essential but remains a difficult task for many experimental researchers. We aimed to develop a browser-accessible bioinformatics tool for analyzing BeadArray data.
Materials & methods:
The tool was established as an analytical pipeline using R, P...
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US Nati...
Gene expression data are exponentially accumulating; thus, the functional annotation of such sequence data from metadata is urgently required. However, life scientists have difficulty utilizing the available data due to its sheer magnitude and complicated access. We have developed a web tool for browsing reference gene expression pattern of mammali...
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for B...
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the fra...
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Cent...
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. This database content is shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleo...
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) maintains a primary nucleotide sequence database and provides analytical resources for biological information to researchers. This database content is exchanged with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the fram...
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchang...
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) provides a nucleotide sequence archive database and accompanying database tools for sequence submission, entry retrieval and annotation analysis. The DDBJ collected and released 3 637 446 entries/2 272 231 889 bases between July 2009 and June 2010. A highlight of the released data was arc...
Time development of hypothetical mRNA abundance generated by Monte Carlo simulations of the previous model (L = 0.0). Other model parameters were: M = 20,000, N = 300,000. The line shows y = 0.1/x.
(0.37 MB PDF)
Time development of hypothetical mRNA abundance generated by Monte Carlo simulations of the refined neutral model (L = 1.0). Other model parameters were: M = 20,000, N = 300,000. The line shows y = 0.1/x.
(0.39 MB PDF)
The relative contributions of natural selection and random genetic drift are a major source of debate in the study of gene expression evolution, which is hypothesized to serve as a bridge from molecular to phenotypic evolution. It has been suggested that the conflict between views is caused by the lack of a definite model of the neutral hypothesis,...
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has collected and released 1 701 110 entries/1 116 138 614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mou...
DDBJ (http://www.ddbj.nig.ac.jp) collected and released 1 880 115 entries or 1 134 086 245 bases in the period from July 2006 to June 2007. The released data contains the high-throughput cDNAs of cricket and high-quality draft genome of medaka among others. Our computer system has been upgraded since March 2007. Another new aspect is an efficient d...
BodyMap-Xs (http://bodymap.jp) is a database for cross-species gene expression comparison. It was created by the anatomical breakdown of 17 million animal expressed sequence tag (EST) records in DDBJ using a sorting program tailored for this purpose. In BodyMap-Xs, users are allowed to compare the expression patterns of orthologous and paralogous g...
The Human Anatomic Gene Expression Library (H-ANGEL) is a resource for information concerning the anatomical distribution and expression of human gene transcripts. The tool contains protein expression data from multiple platforms that has been associated with both manually annotated full-length cDNAs from H-InvDB and RefSeq sequences. Of the H-Inv...
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so,...
List of Library Origins of H-Inv cDNAs (182 Libraries)
The dataset consists of 41,118 H-Inv cDNAs that were cloned from cDNA libraries derived from 182 varieties of cell and tissue.
(33 KB XLS).
List of H-Inv Proteins with Potential EC Numbers (1,892 H-Inv Proteins)
The allotted EC numbers are based on the corresponding DNA databank records, UniProt/Swiss-Prot and TrEMBL records that show sequence similarity to the proteins, and InterPro records that the proteins hit.
(247 KB XLS).
Gene Structure
(A) Gene structure of the cDNAs.
(B) The frequencies and varieties of repetitive sequences found in the cDNAs. A list of the 20,899 loci representing cDNAs that RepeatMasker showed contained repetitive elements.
(C) The positions (5′ UTR, ORF, and 3′ UTR) of repetitive sequences in the protein-coding cDNAs. A total of 1,863 cDNAs con...
List of Polymorphic Microsatellites Inferred by Comparisons between the H-Inv cDNAs and Genomic Sequences
(56 KB XLS).
Size Distribution of Predicted ORFs
The size distribution of all H-Inv proteins among the five similarity categories.
(24 KB PDF).
Numbers of Representative H-Inv cDNAs That Are Homologous to Proteins in Each Taxonomic Group
Two thresholds (E < 10^-5, white bars, and E < 10^-10, black bars) were employed. The “animal” group does not include mammalian species. The “eukaryote” group represents eukaryotic species other than animals, fungi, and plants.
(9 KB PDF).
Prediction of ORFs
(A) Schematic diagram for the prediction of ORFs. This diagram illustrates the ORF prediction method used on all H-Inv cDNAs. The method was based upon the alignment of similarity searches using FASTY and BLASTX. Gene prediction was carried out using GeneMark. Prior to the prediction of ORFs, we judged if a sequence had any frame...
Scheme of Prediction for Functional Annotation
(A) Schematic diagram for determining a representative transcript for each locus. The procedure of computational autoannotation is illustrated. Prior to the human curation of the representative transcript of each H-Inv cluster, we performed computational autoannotation.
(B) Schematic diagram for functi...
A Functional Classification of H-Inv Protein Families That Have Homologs in Each Taxonomic Group
H-Inv protein families were identified by clustering H-Inv proteins using the single-linkage clustering method. Then, the number of homologs for each H-Inv protein family was calculated. Mammalian species are excluded from the “animal” group. “eukaryote...
H-Inv Annotation Viewers
(A) G-integra: A genome mapping viewer.
(B) SOUP Locus annotation viewer.
(C) SOUP cDNA annotation viewer.
(D) SMO Viewer: The similarity, motif, and ORF information viewer.
(2,022 KB PDF).
The InterPro IDs Identified in H-Inv Proteins
The top 40 InterPro IDs identified in H-Inv proteins and proteins from other species are listed for all types (A) and for each type of family, domain, and repeat (B–D). Analyses were conducted by InterPro ver. 3.1. Nonredundant proteome datasets of other species were obtained from the following sites: f...
Features of Category II Proteins
A total of 4,104 H-Inv proteins were classified as Category II based on sequence similarity to functionally validated proteins. The table and figure show source species of proteins in public databases to which the Category II proteins were similar.
(9 KB PDF).
H-Inv KEGG Analysis Results (Images of KEGG Pathways)
The images illustrate the metabolic pathways of KEGG database based on the EC number assignments to H-Inv proteins.
(47 KB PDF).
A Sample View of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/)
A FLcDNA (BC003551) is shown with its detailed annotations, e.g., gene structure, functional annotation, ORF predictions, protein structure prediction by GTOP, etc. The H-InvDB has links to other internal databases (red boxes) such as a genome map viewer (G-integr...
CAI and Codon Usage
(A) CAI was measured for all H-Inv proteins. CAI is a measure of biased patterns for synonymous codon usage (http://biobase.dk/embossdocs/cai.html).
(B) Codon usage in predicted ORFs of H-Inv proteins. Total tri-nucleotide frequencies (forward strand) for the sequences of each species are shown. Nonredundant proteome datasets fo...
Tissue Library Origins of H-Inv Proteins
The results of classification into five similarity categories for each of ten tissue classes.
(A) Numbers of H-Inv proteins.
(B) Histogram.
(10 KB PDF).
List of Newly Assigned Human Enzymes (32 H-Inv Proteins)
All these 32 H-Inv proteins were newly assigned enzyme numbers with the support of the KEGG pathway. These enzyme assignments were previously unrepresented in Homo sapiens.
(33 KB PDF).
Basic Statistics for UTR Sequences Analyzed
(8 KB PDF).
GO Term Assignment to H-Inv Proteins
(A) Molecular function.
(B) Cellular component.
(C) Biological process.
(74 KB PDF).
A Functional Classification of Representative H-Inv cDNAs That Have Homologs in Other Species
(See also Figure 6.)
(9 KB PDF).
UTR Replacements in Primates and Rodents
One hundred and forty-seven UTR replacements distributed among different species were detected.
(9 KB PDF).
List of the Databases and Software Used in the H-Inv Project
(31 KB PDF).
A Detailed Functional Annotation Based on Protein Modules
(25 KB PDF).
As a first step toward the quantitative comparison of clinical features of diseases, we indexed the text descriptions in the Clinical Synopsis section of the Online Mendelian Inheritance in Man (OMIM) with concepts for the body parts, organs, and tissues contained in the Metathesaurus of the Unified Medical Language System (UMLS). We also indexed t...
Detailed analysis of human gene expression data reveals several patterns of relationship between transcript frequency and abundance rank. In muscle and liver, organs composed primarily of a homogeneous population of differentiated cells, they obey Zipf's law. In cell lines, epithelial tissue and compiled transcriptome data, only high-rankers deviat...
After the accomplishment of human draft sequence, more and more efforts are being made in the mapping of the data-driven patterns to background knowledge, hoping to efficiently produce hypotheses out of the flood of data. Here we propose a framework of biomedical data and knowledge that has a high adaptability to the automated data interpretation...
Expression of cytochrome P450 cholesterol side chain cleavage (P450scc) and 3beta-hydroxysteroid dehydrogenase (3beta-HSD) mRNAs was examined in chicken embryonic adrenal glands and gonads between days 4 and 12 of incubation. In situ hybridization analysis showed that 3beta-HSD mRNA appeared on day 5 of incubation in the adrenal glands and on day 6...
The primary structure of the N-terminal extracellular region of the follitropin receptor (FSH-R), which is thought to be responsible for hormone binding specificity, was determined in three reptilian species (tortoise, gecko, and lizard). Remarkably low sequence homologies were detected in the C-terminal part of the extracellular domain. This regio...
We have developed an internet-accessible database, TissueDB, which provides a hierarchy of names and synonyms for adult human tissues. There are two goals for TissueDB. The first is to provide a framework within which to store data concerning gene expression, tissue sources of cultured cell lines, and other spatially organized data. The second goal...