
Yasukazu Nakamura
- Doctor of Science (PhD)
- Professor at National Institute of Genetics
About
270
Publications
78,097
Reads
26,672
Citations
Introduction
Current institution
Additional affiliations
January 2009 - present
Publications (270)
The liverwort Marchantia polymorpha is a key model organism for understanding land plant evolution, development, and gene regulation. To support the growing demand for high-quality genomic resources, we present MarpolBase, a comprehensive and integrated genome database that hosts newly assembled, high-accuracy reference genomes for both the male Ta...
Background
Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Despite advances in genome analysis techniques, challenges pers...
Nicotiana benthamiana has long served as a crucial plant material extensively used in plant physiology research, particularly in the field of plant pathology, because of its high susceptibility to plant viruses. Additionally, it serves as a production platform to test vaccines and other valuable substances. Among its approximately 3.1 Gb genome, 5...
The Bioinformation and DNA Data Bank of Japan Center (DDBJ Center, https://www.ddbj.nig.ac.jp) provides public databases that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), the DDBJ Center accepts and distributes nucleotide sequence data ranging from raw r...
Motivation
Accurate taxonomic assignments of genomic data are crucial across various biological databases. With a rapid increase in submitted genomes in recent years, ensuring precise classification is important to maintain database integrity. Mislabeled genomes can confuse researchers, hinder analyses, and produce false results. Therefore, there i...
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample...
Although chemotherapy using CHOP-based protocol induces remission in most cases of canine multicentric high-grade B-cell lymphoma (mhBCL), some cases develop early relapse during the first induction protocol. In this study, we examined the gene expression profiles of canine mhBCL before chemotherapy and investigated their associations with early re...
Although current long-read sequencing technologies produce long reads that facilitate genome assembly, they have high sequencing error rates. While various assemblers with different perspectives have been developed, no systematic evaluation of assemblers with long reads for diploid genomes with varying heterozygosity has been perf...
Nicotiana benthamiana has long served as a crucial plant material extensively used in plant physiology research, particularly in the field of plant pathology, because of its high susceptibility to plant viruses. Additionally, it serves as a production platform to test vaccines and other valuable substances. Among its approximately 3.1 Gb genome, 57...
Background
Plant genome information is fundamental to plant research and development. Along with the increase in the number of published plant genomes, there is a need for an efficient system to retrieve various kinds of genome-related information from many plant species across plant kingdoms. Various plant databases have been developed, but no pub...
Objectives:
Autosomal dominant polycystic kidney disease (ADPKD) is a common inherited disease in cats. In most cases, the responsible abnormality is a nonsense single nucleotide polymorphism in exon 29 of the PKD1 gene (chrE3:g.42858112C>A, the conventional PKD1 variant). The aim of this study was to conduct a large-scale epidemiological study of...
In plants, variations in seed size and number are outcomes of different reproductive strategies. Both traits are often environmentally influenced, suggesting that a mechanism exists to coordinate these phenotypes in response to available maternal resources. Yet, how maternal resources are sensed and influence seed size and number are largely unknow...
Autosomal dominant polycystic kidney disease (ADPKD) is a common inherited disease in cats. In most cases, the responsible abnormality is a nonsense single nucleotide polymorphism in exon 29 of the PKD1 gene (chrE3:g.42858112C>A, the conventional PKD1 variant). Epidemiological studies on feline ADPKD caused by the conventional PKD1 variant have bee...
Nicotiana benthamiana is widely used as a model plant for dicotyledonous angiosperms. In fact, the strains used in research are highly susceptible to a wide range of viruses. Accordingly, these strains are used to study plant pathology and plant-microbe interactions. In terms of plant-plant interactions, N. benthamiana is one of the plants that exhibi...
Simple Summary
We identified the neuropeptides and their genomic loci on the draft genome sequences of Gryllus bimaculatus. These annotations were additionally assigned to the draft genome annotation. This addition to the draft genome annotation improved the convenience of research by consolidating the knowledge of neuropeptides, such as the sequen...
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) maintains database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), our primary mission is to collect and distribute nucleotide sequence data, as well as t...
Perilla frutescens (Lamiaceae) is an important herbal plant with hundreds of bioactive chemicals, among which perillaldehyde and rosmarinic acid are the two major bioactive compounds in the plant. The leaves of red perilla are used as traditional Kampo medicine or food ingredients. However, the medicinal and nutritional uses of this plant could be...
The liverwort Marchantia polymorpha is equipped with a wide range of molecular and genetic tools and resources that have led to its wide use to explore the evo-devo aspects of land plants. Although its diverse transcriptome data are rapidly accumulating, there is no extensive yet user-friendly tool to exploit such a compilation of data and to summa...
Secondary loss of photosynthesis is observed across almost all plastid-bearing branches of the eukaryotic tree of life. However, genome-based insights into the transition from a phototroph into a secondary heterotroph have so far only been revealed for parasitic species. Free-living organisms can yield unique insights into the evolutionary conseque...
Background
OryzaGenome (http://viewer.shigen.info/oryzagenome21detail/index.xhtml), a feature within Oryzabase (https://shigen.nig.ac.jp/rice/oryzabase/), is a genomic database for wild Oryza species that provides comparative and evolutionary genomics approaches for the rice research community.
Results
Here we release OryzaGenome2.1, the first maj...
Sex determination is a central process for sexual reproduction and is often regulated by a sex determinant encoded on a sex chromosome. Rules that govern the evolution of sex chromosomes via specialization and degeneration following the evolution of a sex determinant have been well studied in diploid organisms. However, distinct predictions apply t...
Cyanobacteria are a diverse group of Gram-negative prokaryotes that perform oxygenic photosynthesis. Cyanobacteria have been used for research on photosynthesis and have attracted attention as a platform for biomaterial/biofuel production. Cyanobacteria are also present in almost all habitats on Earth and have extensive impacts on global ecosystems...
Planktothrix species are distributed worldwide, and these prevalent cyanobacteria occasionally form potentially devastating toxic blooms. Given the ecological and taxonomic importance of Planktothrix agardhii as a bloom species, we set out to determine the complete genome sequence of the type strain Planktothrix agardhii NIES-204. Remarkably, we fo...
The domestic cat (Felis catus) is one of the most popular companion animals in the world. Comprehensive genomic resources will aid the development and application of veterinary medicine, including improving feline health and, in particular, enabling precision medicine, which is promising in human applications. However, currently available cat genome a...
Two Gram-stain-positive, rod-shaped, non-motile, non-spore-forming, catalase-negative bacteria, designated strains SG162T and NK01, were isolated from Japanese rice grain silage and total mixed ration silage, respectively. They were initially identified as Lactobacillus buchneri based on the 16S rRNA gene sequence similarities. However, the two str...
Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One sol...
Clostridium diolis shares high similarity based on 16S rRNA gene sequences and fatty acid composition with Clostridium beijerinckii. In this study, the taxonomic status of C. diolis was clarified using genomic and phenotypic approaches. High similarity was detected among C. diolis DSM 15410T, C. beijerinckii DSM 791T and NCTC 13035T, showing averag...
Genome packaging by nucleosomes is a hallmark of eukaryotes. Histones and the pathways that deposit, remove, and read histone modifications are deeply conserved. Yet, we lack information regarding chromatin landscapes in extant representatives of ancestors of the main groups of eukaryotes, and our knowledge of the evolution of chromatin-related pro...
Genome packaging by nucleosomes is a hallmark of eukaryotes. Histones and the pathways that deposit, remove, and read histone modifications are deeply conserved. Yet, we lack information regarding chromatin landscapes in extant representatives of ancestors of the main groups of eukaryotes and our knowledge of the evolution of chromatin related proc...
DDBJ Fast Annotation and Submission Tool (DFAST) is a genome annotation pipeline for prokaryotes, which also assists data submission to the public sequence database. It is available both as a web service and as a stand-alone tool that runs on local machines. DFAST can annotate a typical-sized bacterial genome within 5 min. The default annotation wo...
A taxonomic study of a Gram-stain-positive, rod-shaped, non-motile, non-spore-forming, catalase-negative bacterium, strain YK43T, isolated from spent mushroom substrates stored in Nagano, Japan was performed. Growth was detected at 15–45 °C, pH 5.0–8.5, and 0–10 % (w/v) NaCl. The genomic DNA G+C content of strain YK43T was 43.6 mol%. The predominan...
The taxonomic status of Paenibacillus thermophilus was analyzed using genomic and phenotypic approaches. The results of RNA polymerase beta subunit gene sequence comparisons indicated that two type strains of P. thermophilus (DSM 24746T and JCM 17693T) and Paenibacillus macerans ATCC 8244T shared 100 % sequence similarity. By whole-genome sequence...
Oryza officinalis is an accessible alien donor for genetic improvement of rice. Comparison across a representative panel of Oryza species showed that the wild O. officinalis and cultivated O. sativa ssp. japonica have similar cold tolerance potentials. The possibility that either distinct or similar genetic mechanisms are involved in the low temper...
Three strains, JCM 5343T, JCM 5344 and JCM 1130, currently identified as Lactobacillus gasseri, were investigated using a polyphasic taxonomic approach. Although these strains shared high 16S rRNA gene sequence similarities with L. gasseri ATCC 33323T (99.9 %), they formed a clade clearly distinct from ATCC 33323T based on whole-genome relatedness....
We report here the whole-genome sequence of Nostoc cycadae strain WK-1, which was isolated from cyanobacterial colonies growing in the coralloid roots of the gymnosperm Cycas revoluta. It can provide valuable resources to study the mutualistic relationships and the syntrophic metabolisms between the cyanobacterial symbiont and the host plant, C. r...
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair librar...
Novel genomics-based approaches such as genome-wide association studies (GWAS) and genomic selection (GS) are expected to be useful in fruit tree breeding, which requires much time from the cross to the release of a cultivar because of the long generation time. In this study, a citrus parental population (111 varieties) and a breeding population (6...
We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, w...
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US Nati...
The evolution of land flora transformed the terrestrial environment. Land plants evolved from an ancestral charophycean alga from which they inherited developmental, biochemical, and cell biological attributes. Additional biochemical and physiological adaptations to land, and a life cycle with an alternation between multicellular haploid and diploi...
The International Nucleotide Sequence Database Collaboration (INSDC) has maintained a primary sequence database that collects experimentally-determined nucleotide sequence data directly from researchers. Now data deposition to the INSDC is mandatory for research publication at most of the scientific journals. However, the procedure to deposit data...
Members of the cyanobacterial genus Synechococcus are abundant in marine environments. To better understand the genomic diversity of marine Synechococcus spp., we determined the complete genome sequence of a coastal cyanobacterium, Synechococcus sp. NIES-970. The genome had a size of 3.1 Mb, consisting of one chromosome and four plasmids.
Genome annotation is a fundamental process in sequence analysis, through which biological knowledge is generated from sequenced genomic data. Good annotation not only enhances our own downstream analyses but also promotes subsequent research by others because it can propagate through public sequence databases. In this article, we will show ho...
Whole-genome sequencing was performed for Lactobacillus parakefiri JCM 8573T to confirm its hitherto controversial taxonomic position. Here, we report its first reliable reference genome. Genome-wide metrics, such as average nucleotide identity and digital DNA-DNA hybridization, and phylogenomic analysis based on multiple genes supported its taxono...
With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distr...
Sample number of registered SRA and DNApod by study type.
Data as of April 2016. The sample number of the registered SRA was searched using ENA. “Library strategy” is explained on the DDBJ SRA website (http://trace.ddbj.nig.ac.jp/dra/submission_e.html).
(DOCX)
Data quantity of each sample.
Data quantity is described as the depth after the removal of multiple-hit reads on the genome. The depth of a reference genome is <5-fold in 87% of the DNApod genotypic data.
(TIF)
Heterogeneous base-quality raw sequence reads in SRAs.
SRAs contain data of various quality values among NGS datasets from individual projects. To detect DNA polymorphisms with uniform reliability, DNApod performs pre-processing to filter out low quality values and detects DNA polymorphisms by using a uniform threshold.
(TIF)
Overview of the Galaxy virtual machine.
The high-level analysis is configured in the Galaxy platform, which is implemented in the virtual machine image. The virtual machine image of the high-level analysis is launched by the Oracle VirtualBox on the user’s personal computer. The respective tools in high-level analysis are encapsulated in the Docker...
Read loss per read length caused by elimination of multiple-hit reads.
Maize exhibits a more profound effect resulting from read loss than do rice and sorghum after the elimination of multiple-hit reads. This predicted that a large-scale syntenic block of maize would cause comparatively higher multiple-hit reads.
(TIF)
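The pre-processing idea in the caption on heterogeneous base-quality reads (filter out low quality values, then apply one uniform threshold to every dataset) can be illustrated with a minimal sketch. It is not DNApod's actual pipeline: the Phred+33 encoding, the mean-quality criterion, the cutoff of 20, and the function names `phred33_scores`, `passes_quality`, and `filter_reads` are all assumptions made for illustration.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality(quality_string, min_mean_quality=20.0):
    """Return True if the mean base quality meets the threshold (hypothetical criterion)."""
    scores = phred33_scores(quality_string)
    if not scores:
        return False  # an empty read carries no usable bases
    return sum(scores) / len(scores) >= min_mean_quality


def filter_reads(records, min_mean_quality=20.0):
    """Keep (name, sequence, quality) records that pass the same threshold for every dataset."""
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if passes_quality(qual, min_mean_quality)
    ]


# Toy reads from projects of differing quality (illustrative values only).
records = [
    ("r1", "ACGT", "IIII"),  # quality 40 per base
    ("r2", "ACGT", "!!!!"),  # quality 0 per base
    ("r3", "ACGT", "5555"),  # quality 20 per base
]
kept = filter_reads(records)
```

Applying the same cutoff to every project is what makes downstream polymorphism calls comparable across datasets; the specific cutoff here is only a placeholder.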
Oligoflexus tunisiensis Shr3T is the first strain described in the newest (eighth) class Oligoflexia of the phylum Proteobacteria. This strain was isolated from the 0.2-μm filtrate of a suspension of sand gravels collected in the Sahara Desert in the Republic of Tunisia. The genome of O. tunisiensis Shr3T comprises 6,406 protein-coding and 57 R...
Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various ci...
The first ever cyanobacterial genome sequence was determined two decades ago, and CyanoBase (http://genome.microbedb.jp/cyanobase), the first database for cyanobacteria, was simultaneously developed to allow this genomic information to be used more efficiently. Since then, CyanoBase has constantly been extended and has received several updates. Here,...
Genome finishing remains laborious work that includes various validation processes requiring both wet and dry knowledge and consideration, although long-read sequencers such as PacBio RSII have largely lightened the burden. We here introduce a procedure of post-assembly validation in which draft contigs are circularized into co...
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for B...
Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been an everlasting problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed cur...
The large-scale genotyping assay is a prerequisite for modern genetic analysis, and single-nucleotide polymorphism (SNP) markers that enable high-throughput genotyping are widely used for genome-wide association studies (GWAS) and genomic selection (GS). However, SNP markers randomly selected from limited genome data often fail in genotyping certai...
Aurantimicrobium minutum type strain KNC(T) is a planktonic ultramicrobacterium isolated from river water in western Japan. Strain KNC(T) has an extremely small, streamlined genome of 1,622,386 bp comprising 1,575 protein-coding sequences. The genome annotation suggests that strain KNC(T) has an actinorhodopsin-based photometabolism.
Long-read sequencing represented by Pacific Biosciences’ single-molecule real-time (SMRT) technology has been widely used for microbial genomes. We overview an analysis procedure of Lactobacillus hokkaidonensis LOOC260T genome using the so-called “PacBio” data. We describe (i) the characteristics of PacBio data, (ii) genome assembly using the HGAP...
Cyanobacterial genus Leptolyngbya comprises genetically diverse species, but the availability of their complete genome information is limited. Here, we isolated Leptolyngbya sp. strain NIES-3755 from soil at the Toyohashi University of Technology, Japan. We determined the complete genome sequence of the NIES-3755 strain, which is composed of one ch...
Genome assembly is a major task of NGS analyses. For this purpose, many assemblers based on the de Bruijn graph have been developed. In the framework, each node represents a series of overlapping k-mers (k nucleotides) and contigs can be obtained as paths solved from the k-mer graph. The most significant parameter is therefore k. We overview conven...
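To make the role of k concrete, here is a minimal sketch of the k-mer graph idea described in this abstract. It is a toy illustration, not any published assembler: it assumes error-free reads, ignores reverse complements and coverage, and the names `build_kmer_graph` and `walk_unambiguous` are hypothetical. Nodes are k-mers, and an edge links two k-mers that are adjacent in a read (overlapping by k-1 nucleotides).

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for current, following in zip(kmers, kmers[1:]):
            graph[current].add(following)
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one unvisited successor."""
    contig = start
    node = start
    visited = {start}
    while True:
        successors = graph.get(node, set())
        if len(successors) != 1:
            break  # dead end or branch: stop rather than guess a path
        (next_node,) = successors
        if next_node in visited:
            break  # cycle (for example a repeat): stop to avoid looping forever
        contig += next_node[-1]  # consecutive k-mers overlap by k-1 bases
        visited.add(next_node)
        node = next_node
    return contig


reads = ["ATGGCG", "GGCGTA"]  # two overlapping toy reads
graph = build_kmer_graph(reads, k=3)
contig = walk_unambiguous(graph, "ATG")
```

With larger k, repeats shorter than k no longer create branches, but reads must overlap by at least k-1 bases to be connected, which is why the choice of k dominates the outcome.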
Cyanobacterial phytochrome-class photosensors are recently emerging optogenetic tools, but the availability of thermoresistant photosensors is limited. We isolated Fischerella sp. strain NIES-3754 from a hot spring at Suwa Shrine, Suwa, Nagano, Japan. We determined the complete genome sequence of the NIES-3754 strain, which is composed of one chromosome and t...
While Marchantia polymorpha has been utilized as a model system to investigate fundamental biological questions for almost two centuries, there is renewed interest in M. polymorpha as a model genetic organism in the genomics era. Here we outline community guidelines for M. polymorpha gene and transgene nomenclature, and we anticipate that thes...
Lactobacillus hokkaidonensis is an obligate heterofermentative lactic acid bacterium, which was isolated from Timothy grass silage in Hokkaido, a subarctic region of Japan. This bacterium is expected to be useful as a silage starter culture in cold regions because of its remarkable psychrotolerance; it can grow at temperatures as low as 4°C. To eluc...
To explore the diverse photoreceptors of cyanobacteria, we isolated Nostoc sp. strain NIES-3756 from soil at Mimomi-Park, Chiba, Japan, and determined its complete genome sequence. The genome consists of one chromosome and two plasmids (total 6,987,571 bp containing no gaps). The NIES-3756 strain carries 7 phytochrome and 12 cyanobacteriochrome gene...
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the fra...
The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of NGS technology, a flood of Oryza species reference genomes and genomic variation information has become available...
We release a high-resolution map of genomic transformation-competent artificial chromosome (TAC) clones extending over all Arabidopsis thaliana (Arabidopsis) chromosomes. The Arabidopsis genomic TAC clones have been valuable genetic tools. Previously, we constructed an Arabidopsis genomic TAC library, which consists of more than ten thousand TAC cl...
Bifidobacterium longum 105-A shows high transformation efficiency and allows for the generation of gene knockout mutants through homologous recombination. Here, we report the complete genome sequence of strain 105-A. Genes encoding at least four putative restriction-modification systems were found in this genome, which might contribute to its trans...
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Cent...
Although cyanobacteria are photoautotrophs, they have heterotrophic metabolism that enables them to survive in their natural habitat. However, cyanobacterial species that grow heterotrophically in the dark are rare. It remains largely unknown how cyanobacteria regulate heterotrophic activity. The cyanobacterium Leptolyngbya boryana grows heterotrop...
We report the 1.86-Mb draft genome and annotation of Lactobacillus oryzae SG293T isolated from fermented rice grains. This genome information may provide further insights into the mechanisms underlying the fermentation of rice grains.
In this study, the genes expressed in response to low pH stress were identified in the unicellular cyanobacterium Synechocystis sp. PCC 6803 using DNA microarrays. The expression of slr0967 and sll0939 constantly increased throughout 4-h acid stress conditions. Overexpression of these two genes under the control of the trc promoter induced the cell...
Weissella oryzae was originally isolated from fermented rice grains. Here we report the draft genome sequence of the type strain of W. oryzae. This first report on the genomic sequence of this species may help identify the mechanisms underlying bacterial adaptation to the ecological niche of fermented rice grains.
Microbial genome sequence submissions to the International Nucleotide Sequence Database Collaboration (INSDC) have been annotated with organism names that include the strain identifier. Each of these strain-level names has been assigned a unique 'taxid' in the NCBI Taxonomy Database. With the significant growth in genome sequencing, it is not possi...
The colonization of land by plants was a key event in the evolution of life. Here we report the draft genome sequence of the filamentous terrestrial alga Klebsormidium flaccidum (Division Charophyta, Order Klebsormidiales) to elucidate the early transition step from aquatic algae to land plants. Comparison of the genome sequence with that of other...
In forward genetics, identification of mutations is a time-consuming and laborious process. Modern whole-genome sequencing, coupled with bioinformatics analysis, has enabled fast and cost-effective mutation identification. However, for many experimental researchers, bioinformatics analysis is still a difficult aspect of whole-genome sequencing. To...