Yosuke Tanigawa

Massachusetts Institute of Technology | MIT · Computer Science and Artificial Intelligence Laboratory

Doctor of Philosophy

About

Publications: 64
Reads: 8,188
Citations: 601
Introduction
Yosuke Tanigawa is a Postdoctoral Associate at the MIT Computational Biology Lab (PI: Dr. Manolis Kellis). My research interests span various topics in population genomics, with a specific focus on the analysis of large-scale data. Personal website: https://yosuketanigawa.com/

Publications (64)
Article
Full-text available
Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics deriv...
Article
Full-text available
Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection ag...
Article
Full-text available
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) pro...
Article
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expressio...
Article
Full-text available
We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotypi...
Preprint
Full-text available
Regular physical exercise has long been recognized to reverse the effects of diet-induced obesity, but the molecular mechanisms mediating these multi-tissue beneficial effects remain uncharacterized. Here, we address this challenge by studying the opposing effects of exercise training and high-fat diet at single-cell, deconvolution and tissue-level...
Article
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by...
Article
Full-text available
Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating p...
Article
Full-text available
Background: Hypertriglyceridemia has emerged as a critical coronary artery disease (CAD) risk factor. Rare loss-of-function (LoF) variants in apolipoprotein C-III have been reported to reduce triglycerides (TG) and are cardioprotective in American Indians and Europeans. However, there is a lack of data in other Europeans and non-Europeans. Also, whe...
Preprint
Full-text available
We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotypi...
Preprint
The SARS-CoV-2 pandemic has differentially impacted populations of varied race, ethnicity and socioeconomic status. Admixture mapping and local ancestry inference represent powerful tools to examine genetic risk within multi-ancestry genomes independent of these confounding social constructs. Here, we leverage a pandemic tracking strategy in which...
Preprint
Full-text available
Rare-variant aggregate analysis from exome and whole genome sequencing data typically summarizes with a single statistic the signal for a gene or the unit that is being aggregated. However, when doing so, the effect profile within the unit may not be easily characterized across one or multiple phenotypes. Here, we present an approach we call Multip...
Article
Motivation: Large-scale and high-dimensional genome sequencing data poses computational challenges. General purpose optimization tools are usually not optimal in terms of computational and memory performance for genetic data. Results: We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on milli...
Preprint
Full-text available
We develop two efficient solvers for optimization problems arising from large-scale regularized regressions on millions of genetic variants sequenced from hundreds of thousands of individuals. These genetic variants are encoded by the values in the set {0, 1, 2, NA}. We take advantage of this fact and use two bits to represent each entry in a genet...
Article
Motivation: The prediction performance of Cox proportional hazard model suffers when there are only few uncensored events in the training data. Results: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is appl...
Article
Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk b...
Article
Full-text available
Background - The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area...
Preprint
The current genome-wide association studies (GWASs) do not yet capture sufficient diversity in terms of populations and scope of phenotypes. To address an essential need to expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype GWASs (disease endpoints, biomarkers, and medication usage) in BioBank Japan...
Article
Full-text available
The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been sh...
Article
Full-text available
Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets...
Article
We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do...
Article
Sex differences have been shown in laboratory biomarkers; however, the extent to which this is due to genetics is unknown. In this study, we infer sex-specific genetic parameters (heritability and genetic correlation) across 33 quantitative biomarker traits in 181,064 females and 156,135 males from the UK Biobank study. We apply a Bayesian Mixture...
Preprint
Full-text available
Genetics plays a key role in drug response, affecting efficacy and toxicity. Pharmacogenomics aims to understand how genetic variation influences drug response and develop clinical guidelines to aid clinicians in personalized treatment decisions informed by genetics. Although pharmacogenomics has not been broadly adopted into clinical practice, gen...
Preprint
Full-text available
During COVID19 and other viral pandemics, rapid generation of host and pathogen genomic data is critical to tracking infection and informing therapies. There is an urgent need for efficient approaches to this data generation at scale. We have developed a scalable, high throughput approach to generate high fidelity low pass whole genome and HLA sequ...
Preprint
Full-text available
We propose a Sparse-Group regularized Cox regression method to analyze large-scale, ultrahigh-dimensional, and multi-response survival data efficiently. Our method has three key components: 1. A Sparse-Group penalty that encourages the coefficients to have small and overlapping support; 2. A variable screening procedure that minimizes the frequency...
Preprint
Full-text available
The human leukocyte antigen (HLA) region of the genome is one of the most disease-associated regions of the human genome, yet even well-studied alleles in the HLA region have unknown impact on disease. Here, we study the effect of 156 HLA alleles on 677 binary phenotypes for 337,138 individuals in the UK Biobank. We perform single-allele associatio...
Preprint
Full-text available
In high-dimensional regression problems, often a relatively small subset of the features are relevant for predicting the outcome, and methods that impose sparsity on the solution are popular. When multiple correlated outcomes are available (multitask), reduced rank regression is an effective way to borrow strength and capture latent structures that...
Preprint
Full-text available
G protein-coupled receptors (GPCRs) drive an array of critical physiological functions and are an important class of drug targets, though a map of which GPCR genetic variants are associated with phenotypic variation is lacking. We performed a phenome-wide association analysis for 269 common protein-altering variants in 156 GPCRs and 275 phenotypes,...
Preprint
The aortic valve is an important determinant of cardiovascular physiology and anatomic location of common human diseases. From a sample of 26,142 European-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. A genome-wide association study of aortic val...
Article
Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertai...
Preprint
Full-text available
The global pandemic of COVID-19 accounts for more than 14,000 deaths worldwide. However, little is known about the host genetics interaction with infection and COVID-19 progression. To better understand the role of host genetics, we review the current literature, aggregate readily available genetic resources, and provide some updated analysis relev...
Preprint
Full-text available
We develop a scalable and highly efficient algorithm to fit a Cox proportional hazard model by maximizing the L¹-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian et al. (2019). The output of our algorithm is the full Lasso path, the parameter estimates at all predefined...
Preprint
Full-text available
Sex differences have been shown in laboratory biomarkers; however, the extent to which this is due to genetics is unknown. In this study, we infer sex-specific genetic parameters (heritability and genetic correlation) across 33 quantitative biomarker traits in 181,064 females and 156,135 males from the UK Biobank study. We apply a Bayesian mixture...
Preprint
Full-text available
Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While traditional models of genetic risk like polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce...
Preprint
Full-text available
In the fall of 2018, news broke about a researcher from China who had used CRISPR gene editing to cause human babies to have a deletion in the CCR5 chemokine receptor, making them resistant to HIV infection. One of the numerous ethical concerns about this study is that the deletion may have other effects. Subsequently, Nature Medicine published a B...
Preprint
Full-text available
We present WhichTF, a novel computational method to identify dominant transcription factors (TFs) from chromatin accessibility measurements. To rank TFs, WhichTF integrates high-confidence genome-wide computational prediction of TF binding sites based on evolutionary sequence conservation, putative gene-regulatory models, and ontology-based gene an...
Preprint
Full-text available
Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertai...
Preprint
Full-text available
Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data in UK Biobank and FinnGen to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through protein-altering variant association analy...
Preprint
Clinical laboratory tests are a critical component of the continuum of care and provide a means for rapid diagnosis and monitoring of chronic disease. In this study, we systematically evaluated the genetic basis of 38 blood and urine laboratory tests measured in 358,072 participants in the UK Biobank and identified 1,857 independent loci associated...
Preprint
Full-text available
The UK Biobank (Bycroft et al., 2018) is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with GWAS, have already been shown to gre...
Article
Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given most single nucleotide polymorphisms (SNPs) fall into the non-coding region of th...
Article
Full-text available
Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool...
Preprint
Full-text available
To characterize latent components of genetic associations, we applied truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identified key components of genetic ass...
Article
Full-text available
Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Bio...
Preprint
Full-text available
Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool...
Preprint
Full-text available
Whole genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery and inference tha...
Preprint
Full-text available
Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets...
Article
Recent studies have shown that environmental DNA is found almost everywhere. Flower petal surfaces are an attractive tissue to use for investigation of the dispersal of environmental DNA in nature as they are isolated from the external environment until the bud opens and only then can the petal surface accumulate environmental DNA. Here, we perform...
Preprint
Full-text available
Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. We characterized the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank...
Preprint
Full-text available
Recent studies have shown that environmental DNA is found almost everywhere. Flower petal surfaces are an attractive tissue to use for investigation of the dispersal of environmental DNA in nature as they are isolated from the external environment until the bud opens and only then can the petal surface accumulate environmental DNA. Here, we perform...

Projects (2)
Project
We aim to identify genetic variants that modulate the risk of diseases. Such variants may provide therapeutic insights into potential drug targets.
Project
To efficiently analyze large-scale (wide [p] and tall [n]) datasets in human genomic studies, we develop new statistical and computational methods. We specifically focus on (i) statistical models for joint analysis of multiple traits and (ii) sparse multivariate models for genetic risk prediction.