DDBJ launches a new archive database with analytical tools for next-generation sequence data

Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization for Information and Systems, Yata, Mishima 411-8510, Japan.
Nucleic Acids Research (Impact Factor: 9.11). 10/2009; 38(Database issue):D33-8. DOI: 10.1093/nar/gkp847
Source: PubMed


The DNA Data Bank of Japan (DDBJ) has collected and released 1,701,110 entries/1,116,138,614 bases between July 2008 and June 2009. A few highlighted data releases from DDBJ were the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. Comprehensive visualization of DDBJ release data was attempted by using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the 'DDBJ Read Archive' (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the 'DDBJ Read Annotation Pipeline' was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly and high-level analysis of structural and functional annotations. These new services will aid users' research and provide easier access to DDBJ databases.

  • Source
    • "Deep sequencing was performed on various tissues and sequencing data were obtained. We transformed the data from SRA format to FASTQ format, as previously described (15) using the SRA Toolkit (16). Sequence alignments were then carried out with Bowtie alignments (17), allowing as many as 2 mismatches. "
    ABSTRACT: In this study, we investigated differentially expressed microRNAs (miRNAs or miRs) and their functions in metastatic melanoma using next-generation sequencing technology. The GSE36236 data set was downloaded from the Gene Expression Omnibus (GEO) database and 4 primary cutaneous melanoma samples (used as controls) and 3 metastatic melanoma samples were selected from 31 samples for further analysis. Firstly, the differentially expressed miRNAs were screened by limma package in R language. Secondly, the target genes of the miRNAs were retrieved with TargetScanHuman 6.2, and the interactions among these genes were identified by String and an interaction network was established. Finally, functional and pathway analyses were performed for the genes in the network using Expression Analysis Systematic Explorer (EASE). A total of 4 differentially expressed miRNAs (hsa-miR-146, hsa-miR-27, hsa-miR-877 and hsa-miR-186) were obtained between the metastatic melanoma and primary cutaneous melanoma samples. We predicted 101 high-confidence target genes of hsa-miR-27 and obtained a network with 41 interactions. Finally, functional and pathway analyses revealed that the genes in the network were significantly enriched at the transcriptional level. Differentially expressed miRNAs were identified in the metastatic melanoma compared with the primary cutaneous melanoma samples and the target genes of hsa-miR-27 were found to be significantly enriched at the transcriptional level. The results presented in our study may prove helpful in the diagnosis and treatment of metastatic melanoma.
    International Journal of Molecular Medicine 02/2014; 33(5). DOI:10.3892/ijmm.2014.1668 · 2.09 Impact Factor
  • Source
    • "DNA sequencing on the Illumina platform (Harris et al. 2008) was performed using a HiSeq2000 system (Illumina), and that on the 454 platform (Margulies et al. 2005) was performed using a Genome Sequencer FLX system (Roche). Raw sequence data are available in the DDBJ Sequence Read Archive (DRA) (Kaminuma et al. 2010) (study ID DRP000312) and in the Sequence Read Archive (SRA) of NCBI (Leinonen et al. 2011) (accession Nos. DRX000454, DRX000455 and DRX000482). "
    ABSTRACT: Tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. The genome sequencing of the tomato cultivar 'Heinz 1706' was recently completed. To accelerate the progress of tomato genomics studies, systematic bioresources, such as mutagenized lines and full-length cDNA libraries, have been established for the cultivar 'Micro-Tom'. However, these resources cannot be utilized to their full potential without the completion of the genome sequencing of 'Micro-Tom'. We undertook the genome sequencing of 'Micro-Tom' and here report the identification of single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) between 'Micro-Tom' and 'Heinz 1706'. The analysis demonstrated the presence of 1.23 million SNPs and 0.19 million indels found in common between the two cultivars. The density of SNPs and indels was high in chromosomes 02, 05, and 11, but was low in chromosomes 06, 08 and 10. Three known mutations of 'Micro-Tom' were localized on chromosomal regions where the density of SNPs and indels was low, which was consistent with the fact that these mutations were relatively new and introgressed into 'Micro-Tom' during the breeding of this cultivar. We also report SNP analysis for two 'Micro-Tom' varieties that have been maintained independently in Japan and France, both of which have served as standard lines for 'Micro-Tom' mutant collections. Approximately 28,000 SNPs were identified between these two 'Micro-Tom' lines. These results provide high-resolution DNA polymorphic information on 'Micro-Tom' and represent a valuable contribution to the 'Micro-Tom'-based genomics resources.
    Plant and Cell Physiology 02/2014; 55(2):445-54. DOI:10.1093/pcp/pct181 · 4.93 Impact Factor
  • Source
    • "Entire read counts mapped onto the bovine genome (Btau_4.0), allowing multiple hits within 10 times, were 172,435,337, 142,294,526, and 139,083,864 for days 17, 20, and 22, respectively [19], and primary sequencing data were deposited to the DDBJ Sequence Read Archive (accession number DRA000549) [25]. Matching locations were subsequently used to generate counts for identified IFNTs and Ensemble-provided gene annotations (Bos_taurus. "
    ABSTRACT: Interferon tau (IFNT), produced by the mononuclear trophectoderm, signals the process of maternal recognition of pregnancy in ruminants. However, its expression in vivo and its transcriptional regulation are not yet well characterized. Objectives of this study were to determine conceptus IFNT gene isoforms expressed in the bovine uterus and to identify differences in promoter sequences of IFNT genes that differ in their expression. RNA-seq data analysis of bovine conceptuses on days 17, 20, and 22 (day 0 = day of estrus) detected the expression of two IFNT transcripts, IFNT1 and IFNTc1, which were indeed classified into the IFNT gene clade. RNA-seq and quantitative RT-PCR analyses also revealed that the expression levels of both IFNT mRNAs were highest on day 17, and then decreased on days 20 and 22. Bovine ear-derived fibroblast (EF) cells, a model system commonly used for bovine IFNT gene transcription study in this laboratory, were cotransfected with luciferase reporter constructs carrying upstream (positions -637 to +51) regions of IFNT1 or IFNTc1 gene and various transcription factor expression plasmids including CDX2, AP-1 (Jun) and ETS2. CDX2, either alone or with the other transcription factors, markedly increased luciferase activity. The upstream regions of IFNT1 and IFNTc1 loci were then serially deleted or point-mutated at potential CDX-, AP-1-, and ETS-binding sites. Compared to the wild-type constructs, deletion or mutation at CDX2 or ETS2 binding sites similarly reduced the luciferase activities of IFNT1- or IFNTc1-promoter constructs. However, with the AP-1 site mutated construct, IFNT1- and IFNTc1-reporters behaved differently. These results suggest that two forms of bovine conceptus IFNT genes are expressed in utero and their transcriptional regulations differ.
    PLoS ONE 11/2013; 8(11):e80427. DOI:10.1371/journal.pone.0080427 · 3.23 Impact Factor