Masao Nagasaki

Tohoku University, Sendai-shi, Miyagi, Japan

Publications (151) · 409.12 Total impact


  • No preview · Article · Jan 2016 · British Journal of Haematology
  • Source
    ABSTRACT: Mitochondrial disorders have the highest incidence among congenital metabolic disorders characterized by biochemical respiratory chain complex deficiencies. It occurs at a rate of 1 in 5,000 births, and has phenotypic and genetic heterogeneity. Mutations in about 1,500 nuclear encoded mitochondrial proteins may cause mitochondrial dysfunction of energy production and mitochondrial disorders. More than 250 genes that cause mitochondrial disorders have been reported to date. However exact genetic diagnosis for patients still remained largely unknown. To reveal this heterogeneity, we performed comprehensive genomic analyses for 142 patients with childhood-onset mitochondrial respiratory chain complex deficiencies. The approach includes whole mtDNA and exome analyses using high-throughput sequencing, and chromosomal aberration analyses using high-density oligonucleotide arrays. We identified 37 novel mutations in known mitochondrial disease genes and 3 mitochondria-related genes (MRPS23, QRSL1, and PNPLA4) as novel causative genes. We also identified 2 genes known to cause monogenic diseases (MECP2 and TNNI3) and 3 chromosomal aberrations (6q24.3-q25.1, 17p12, and 22q11.21) as causes in this cohort. Our approaches enhance the ability to identify pathogenic gene mutations in patients with biochemically defined mitochondrial respiratory chain complex deficiencies in clinical settings. They also underscore clinical and genetic heterogeneity and will improve patient care of this complex disorder.
    Full-text · Article · Jan 2016 · PLoS Genetics
  • Source
    ABSTRACT: Background: RNA-sequencing (RNA-Seq) has become a popular tool for transcriptome profiling in mammals. However, accurate estimation of allele-specific expression (ASE) based on alignments of reads to the reference genome is challenging, because it contains only one allele on a mosaic haploid genome. Even with the information of diploid genome sequences, precise alignment of reads to the correct allele is difficult because of the high-similarity between the corresponding allele sequences. Results: We propose a Bayesian approach to estimate ASE from RNA-Seq data with diploid genome sequences. In the statistical framework, the haploid choice is modeled as a hidden variable and estimated simultaneously with isoform expression levels by variational Bayesian inference. Through the simulation data analysis, we demonstrate the effectiveness of the proposed approach in terms of identifying ASE compared to the existing approach. We also show that our approach enables better quantification of isoform expression levels compared to the existing methods, TIGAR2, RSEM and Cufflinks. In the real data analysis of the human reference lymphoblastoid cell line GM12878, some autosomal genes were identified as ASE genes, and skewed paternal X-chromosome inactivation in GM12878 was identified. Conclusions: The proposed method, called ASE-TIGAR, enables accurate estimation of gene expression from RNA-Seq data in an allele-specific manner. Our results show the effectiveness of utilizing personal genomic information for accurate estimation of ASE. An implementation of our method is available at http://nagasakilab.csml.org/ase-tigar.
    Preview · Article · Jan 2016 · BMC Genomics
  • Source
    ABSTRACT: The integrative Japanese Genome Variation Database (iJGVD; http://ijgvd.megabank.tohoku.ac.jp/) provides genomic variation data detected by whole-genome sequencing (WGS) of Japanese individuals. Specifically, the database contains variants detected by WGS of 1,070 individuals who participated in a genome cohort study of the Tohoku Medical Megabank Project. In the first release, iJGVD includes >4,300,000 autosomal single nucleotide variants (SNVs) whose minor allele frequencies are >5.0%.
    Preview · Article · Nov 2015
  • Source
    ABSTRACT: The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify through this high-coverage sequencing (32.4 × on average), 21.2 million, including 12 million novel, single-nucleotide variants (SNVs) at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which covers 99.0% SNVs of minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies.
    Full-text · Article · Aug 2015 · Nature Communications
  • Source
    ABSTRACT: Given the advent of massively parallel DNA sequencing, human microbiome is analyzed comprehensively by metagenomic approaches. However, the inter- and intra-individual variability and stability of the human microbiome remain poorly characterized, particularly at the intra-day level. This issue is of crucial importance for studies examining the effects of microbiome on human health. Here, we focused on bacteriome of oral plaques, for which repeated, time-controlled sampling is feasible. Eighty-one supragingival plaque subjects were collected from healthy individuals, examining multiple sites within the mouth at three time points (forenoon, evening, and night) over the course of 3 days. Bacterial composition was estimated by 16S rRNA sequencing and species-level profiling, resulting in identification of a total of 162 known bacterial species. We found that species compositions and their relative abundances were similar within individuals, and not between sampling time or tooth type. This suggests that species-level oral bacterial composition differs significantly between individuals, although the number of subjects is limited and the intra-individual variation also occurs. The majority of detected bacterial species (98.2%; 159/162), however, did not fluctuate over the course of the day, implying a largely stable oral microbiome on an intra-day time scale. In fact, the stability of this data set enabled us to estimate potential interactions between rare bacteria, with 40 co-occurrences supported by the existing literature. In summary, the present study provides a valuable basis for studies of the human microbiome, with significant implications in terms of biological and clinical outcomes.
    Preview · Article · Jun 2015 · PLoS ONE
  • Source
    ABSTRACT: The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659 253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF) > 5%), the genomic coverage of the Japonica array (r² > 0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5% < MAF ≤ 5%) of the Japonica array reached 67.2%, which is higher than those of the existing arrays. In addition, we confirmed the high-quality genotyping performance of the Japonica array using the 288 samples in 1KJPN; the average call rate was 99.7% and the average concordance rate with the genotypes obtained from a high-throughput sequencer was 99.7%. As demonstrated in this study, the creation of custom-made SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputations. Journal of Human Genetics advance online publication, 25 June 2015; doi:10.1038/jhg.2015.68.
    Preview · Article · Jun 2015 · Journal of Human Genetics
  • Source
    ABSTRACT: BRCA1-associated protein 1 (BAP1) is a deubiquitinating enzyme that is involved in the regulation of cell growth. Recently, many somatic and germline mutations of BAP1 have been reported in a broad spectrum of tumors. In this study, we identified a novel somatic non-synonymous BAP1 mutation, a phenylalanine-to-isoleucine substitution at codon 170 (F170I), in one of 49 patients with esophageal squamous cell carcinoma (ESC). Multiplex ligation-dependent probe amplification (MLPA) of the BAP1 gene in this ESC tumor disclosed monoallelic deletion (LOH), suggesting BAP1 alterations on both alleles in this tumor. The deubiquitinase activity and the auto-deubiquitinase activity of F170I-mutant BAP1 were markedly suppressed compared with wild-type BAP1. In addition, wild-type BAP1 mostly localizes to the nucleus, whereas the F170I mutant preferentially localized in the cytoplasm. Microarray analysis revealed that expression of the F170I mutant drastically altered gene expression profiles compared with expressed wild-type BAP1. Gene-ontology analyses indicated that the F170I mutation altered the expression of genes involved in oncogenic pathways. We found that one candidate, TCEAL7, previously reported as a putative tumor suppressor gene, was significantly induced by wild-type BAP1 as compared to F170I mutant BAP1. Furthermore, we found that the level of BAP1 expression in the nucleus was reduced in 44% of ESCs examined by immunohistochemistry (IHC). Because the nuclear localization of BAP1 is important for its tumor suppressor function, BAP1 may be functionally inactivated in a substantial portion of ESCs. Taken together, BAP1 is likely to function as a tumor suppressor in at least a part of ESC. This article is protected by copyright. All rights reserved.
    Full-text · Article · Jun 2015 · Cancer Science
  • Source

    Full-text · Conference Paper · Jun 2015
  • Source
    ABSTRACT: Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimizes read alignments to HLA allele sequences and abundance of reads on HLA alleles by variational Bayesian inference. We show the effectiveness of the proposed method over other methods through the analysis of predicting HLA types for HLA class I (HLA-A, -B and -C) and class II (HLA-DQA1,-DQB1 and -DRB1) loci from the simulation data of various depth of coverage, and real sequencing data of human trio samples. HLA-VBSeq is an efficient and accurate HLA typing method using high-throughput sequencing data without the need of primer design for HLA loci. Moreover, it does not assume any prior knowledge about HLA allele frequencies, and hence HLA-VBSeq is broadly applicable to human samples obtained from a genetically diverse population.
    Full-text · Article · Feb 2015 · BMC Genomics
  • Source
    ABSTRACT: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes are proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number dataset, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. Our proposed approach enables detailed analysis of copy number variations, such as association study between copy unit alleles and phenotypes or biological features including human diseases.
    Full-text · Article · Feb 2015 · BMC Bioinformatics
  • Source
    ABSTRACT: High-throughput RNA sequencing (RNA-Seq) enables quantification and identification of transcripts at single-base resolution. Recently, longer sequence reads become available thanks to the development of new types of sequencing technologies as well as improvements in chemical reagents for the Next Generation Sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g. > 250 bp). We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed and variable length RNA-Seq data. Our method models substitution, deletion, and insertion errors of sequencers based on gapped-alignments of reads to the reference cDNA sequences so that sensitive read-aligners such as Bowtie2 and BWA-MEM are effectively incorporated in our pipeline. Also, a heuristic algorithm is implemented in variational Bayesian inference for faster computation. We apply TIGAR2 to both simulation data and real data of human samples and evaluate performance of transcript quantification with TIGAR2 in comparison to existing methods. TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method performs better than existing methods for the fixed-length reads (100 bp, 250 bp, 500 bp, and 1000 bp of both single-end and paired-end) and variable-length reads, especially for reads longer than 250 bp.
    Full-text · Article · Dec 2014 · BMC Genomics

  • No preview · Article · Dec 2014 · Genes & Genetic Systems
  • Source
    ABSTRACT: Comprehensive understanding of gene regulatory networks (GRNs) is a major challenge in the field of systems biology. Currently, there are two main approaches in GRN analysis using time-course observation data, namely an ordinary differential equation (ODE)-based approach and a statistical model-based approach. The ODE-based approach can generate complex dynamics of GRNs according to biologically validated nonlinear models. However, it cannot be applied to ten or more genes to simultaneously estimate system dynamics and regulatory relationships due to the computational difficulties. The statistical model-based approach uses highly abstract models to simply describe biological systems and to infer relationships among several hundreds of genes from the data. However, the high abstraction generates false regulations that are not permitted biologically. Thus, when dealing with several tens of genes of which the relationships are partially known, a method that can infer regulatory relationships based on a model with low abstraction and that can emulate the dynamics of ODE-based models while incorporating prior knowledge is urgently required. To accomplish this, we propose a method for inference of GRNs using a state space representation of a vector auto-regressive (VAR) model with L1 regularization. This method can estimate the dynamic behavior of genes based on linear time-series modeling constructed from an ODE-based model and can infer the regulatory structure among several tens of genes maximizing prediction ability for the observational data. Furthermore, the method is capable of incorporating various types of existing biological knowledge, e.g., drug kinetics and literature-recorded pathways. The effectiveness of the proposed method is shown through a comparison of simulation studies with several previous methods. 
    For an application example, we evaluated mRNA expression profiles over time upon corticosteroid stimulation in rats, thus incorporating corticosteroid kinetics/dynamics, literature-recorded pathways and transcription factor (TF) information.
    Full-text · Article · Aug 2014 · PLoS ONE
  • ABSTRACT: Library quantitation is a critical step to obtain high data output on Illumina HiSeq sequencers. Here, we introduce a library quantitation method that uses the Illumina MiSeq sequencer, designated as quantitative MiSeq (qMiSeq). In this procedure, 96 dual-index libraries including control samples are denatured, pooled in equal volumes, and sequenced by MiSeq. We found that the relative concentration of each library can be determined based on the observed index ratio and can be used to determine the HiSeq run condition for each library. Thus, qMiSeq provides an efficient way to quantitate a large number of libraries at a time.
    No preview · Article · Aug 2014 · Analytical Biochemistry
  • Source
    ABSTRACT: Background: Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variations calls simultaneously. Results: Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%. Conclusions: Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.
    Full-text · Article · Aug 2014 · BMC Genomics
  • Source
    ABSTRACT: Background: Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. Results: We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. Conclusion: The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
    Full-text · Article · Aug 2014 · BMC Genomics
  • ABSTRACT: Haplotype phasing is essential for identifying disease-causing variants with phase-dependent interactions as well as for the coalescent-based inference of demographic history. One approach for estimating haplotypes is to use phase-informative reads, which span multiple heterozygous variant positions. Although the quality of estimated variants is crucial in haplotype phasing, accurate variant calling is still challenging due to errors in sequencing and read mapping. Since some of these errors can be corrected by considering haplotype phasing, simultaneous estimation of variants and haplotypes is important. Thus, we propose a statistically unified approach for variant calling and haplotype phasing named HapMonster, where haplotype phasing information is used for improving the accuracy of variant calling and the improved variant calls are used for more accurate haplotype phasing. From the comparison with other existing methods on simulation and real sequencing data, we confirm the effectiveness of HapMonster in both variant calling and haplotype phasing.
    No preview · Chapter · Jul 2014
  • Source
    ABSTRACT: Recently, several biological simulation models of, e.g., gene regulatory networks and metabolic pathways, have been constructed based on existing knowledge of biomolecular reactions, e.g., DNA-protein and protein-protein interactions. However, since these do not always contain all necessary molecules and reactions, their simulation results can be inconsistent with observational data. Therefore, improvements in such simulation models are urgently required. A previously reported method created multiple candidate simulation models by partially modifying existing models. However, this approach was computationally costly and could not handle a large number of candidates that are required to find models whose simulation results are highly consistent with the data. In order to overcome the problem, we focused on the fact that the qualitative dynamics of simulation models are highly similar if they share a certain amount of regulatory structures. This indicates that better fitting candidates tend to share the basic regulatory structure of the best fitting candidate, which can best predict the data among candidates. Thus, instead of evaluating all candidates, we propose an efficient explorative method that can selectively and sequentially evaluate candidates based on the similarity of their regulatory structures. Furthermore, in estimating the parameter values of a candidate, e.g., synthesis and degradation rates of mRNA, for the data, those of the previously evaluated candidates can be utilized. The method is applied here to the pharmacogenomic pathways for corticosteroids in rats, using time-series microarray expression data. In the performance test, we succeeded in obtaining more than 80% of consistent solutions within 15% of the computational time as compared to the comprehensive evaluation. Then, we applied this approach to 142 literature-recorded simulation models of corticosteroid-induced genes, and consequently selected 134 newly constructed better models. The method described here was found to be capable of efficiently exploring candidate simulation models and obtaining better models within a short span of time. Furthermore, the results suggest that there may be room for improvement in literature-recorded pathways and that they can be systematically updated using biological observational data.
    Full-text · Article · Jun 2014 · Bio Systems
  • Chen Li · Masao Nagasaki · Emi Ikeda · Yayoi Sekiya · Satoru Miyano
    ABSTRACT: CSML and SBML are XML-based model definition standards which are developed with the aim of creating exchange formats for modeling, visualizing and simulating biological pathways. In this article we report a release of a format convertor for quantitative pathway models, namely CSML2SBML. It translates models encoded by CSML into SBML without loss of structural and kinetic information. The simulation and parameter estimation of the resulting SBML model can be carried out with compliant tool CellDesigner for further analysis. The convertor is based on the standards CSML version 3.0 and SBML Level 2 Version 4. In our experiments, 11 out of 15 pathway models in CSML model repository and 228 models in Macrophage Pathway Knowledgebase (MACPAK) are successfully converted to SBML models. The consistency of the resulting model is validated by libSBML Consistency Check of CellDesigner. Furthermore, the converted SBML model assigned with the kinetic parameters translated from CSML model can reproduce the same dynamics with CellDesigner as CSML one running on Cell Illustrator. CSML2SBML, along with its instructions and examples for use are available at http://csml2sbml.csml.org.
    No preview · Article · May 2014 · Bio Systems

Publication Stats

2k Citations
409.12 Total Impact Points

Institutions

  • 2012-2016
    • Tohoku University
      • Tohoku Medical Megabank Organization
      Sendai-shi, Miyagi, Japan
  • 1998-2013
    • The University of Tokyo
      • Institute of Medical Science
      • Center for Human Genome
      • Department of Information Science
      Hakusan, Tokyo, Japan
  • 2010
    • RIKEN
      • Computational Climate Science Research Team
      Wako, Saitama, Japan
  • 2000-2003
    • Yamaguchi University
      • Graduate School of Science and Engineering
      • Faculty of Science
      Yamaguchi, Yamaguchi, Japan