Farhad Hormozdiari

University of California, Los Angeles | UCLA


Publications: 138
Reads: 31,462
Citations: 14,735

Publications

Article
Full-text available
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants an...
Preprint
Full-text available
Electronic health record (EHR) and biobank datasets contain multiple high-dimensional clinical data (HDCD) modalities (e.g., ECG, Photoplethysmography (PPG), and MRI) for each individual. Access to multimodal HDCD provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological syst...
Chapter
Full-text available
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual’s health status. In this paper, we take a step towards creating mult...
Preprint
Full-text available
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating mult...
Preprint
Full-text available
High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discoverin...
Preprint
Full-text available
Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of air...
Article
Full-text available
Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-report...
Preprint
Full-text available
Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify new genetic signals. Here we train a deep convolutional neural network on noisy self-re...
Article
Full-text available
Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different...
Article
Full-text available
Genome-wide association studies (GWASs) examine the association between genotype and phenotype while adjusting for a set of covariates. Although the covariates may have non-linear or interactive effects, due to the challenge of specifying the model, GWAS often neglect such terms. Here we introduce DeepNull, a method that identifies and adjusts for...
Article
Full-text available
Machine learning (ML) systems often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification in ML pipelines as a key reason for these failures. An ML pipeline is the full procedure followed to train and validate a predictor. Such a pipeline is underspecified when it can return many distinct pr...
Article
Full-text available
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility...
Preprint
Full-text available
Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is u...
Article
Full-text available
Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic par...
Preprint
Full-text available
Genome-wide association studies (GWAS) are among the workhorses of statistical genetics, having detected thousands of variants associated with complex traits and diseases. A typical GWAS examines the association between genotypes and the phenotype of interest while adjusting for a set of covariates. While covariates potentially have non-linear effe...
Article
Full-text available
In standard genome-wide association studies (GWAS), the standard association test is underpowered to detect associations between loci with multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenot...
Article
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v...
Article
Full-text available
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across...
Article
Full-text available
Fine-mapping aims to identify causal variants impacting complex traits. We propose PolyFun, a computationally scalable framework to improve fine-mapping accuracy by leveraging functional annotations across the entire genome—not just genome-wide-significant loci—to specify prior probabilities for fine-mapping methods such as SuSiE or FINEMAP. In sim...
Article
Full-text available
Background: Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the v8 release also includes up to 15% of individuals of non-European ancestry. Assessing a...
Article
Full-text available
Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and trait...
Article
Full-text available
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We compr...
Chapter
Genome-Wide Association Studies (GWAS) have successfully identified numerous genetic variants associated with a variety of complex traits in humans. However, most of these associated variants are not causal, and are simply in Linkage Disequilibrium (LD) with the true causal variants. This problem is addressed by statistical “fine mapping” methods,...
Preprint
Full-text available
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibili...
Article
Full-text available
Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. Previous studies have found enrichment of expression quantitative trait loci (eQTLs) in disease risk loci, indicating that identifying causal variants for gene expression is important for elucidating the genetic basis of not only ge...
Preprint
Full-text available
Background: Population structure among study subjects may confound genetic association studies, and lack of proper correction can lead to spurious findings. The Genotype-Tissue Expression (GTEx) project largely contains individuals of European ancestry, but the final release (v8) also includes up to 15% of individuals of non-European ancestry. Asses...
Article
Full-text available
Background: Neurodevelopmental disorders (NDDs) such as autism spectrum disorder, intellectual disability, developmental disability, and epilepsy are characterized by abnormal brain development that may affect cognition, learning, behavior, and motor skills. High co-occurrence (comorbidity) of NDDs indicates a shared, underlying biological mechani...
Preprint
Full-text available
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human traits and diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genetic loci discovered by Genome-Wide Ass...
Preprint
Full-text available
Fine-mapping aims to identify causal variants impacting complex traits. Several recent methods improve fine-mapping accuracy by prioritizing variants in enriched functional annotations. However, these methods can only use information at genome-wide significant loci (and/or a small number of functional annotations), severely limiting the benefit of...
Article
Regulatory variation plays a major role in complex disease, and cell-type-specific binding of transcription factors (TF) is critical to gene regulation. However, assessing the contribution of genetic variation in TF binding sites to disease heritability is challenging, as binding is often cell-type-specific and annotations from directly measure...
Preprint
Full-text available
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues, and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the v8 data, based on 17,382 RNA-sequencing samples from 54 tissues of 948 post-mortem donors. We comprehensi...
Preprint
Full-text available
Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant a...
Article
Full-text available
Transposable elements (TE) comprise roughly half of the human genome. Though initially derided as junk DNA, they have been widely hypothesized to contribute to the evolution of gene regulation. However, the contribution of TE to the genetic architecture of diseases remains unknown. Here, we analyze data from 41 independent diseases and complex trai...
Article
Complex traits and common diseases are extremely polygenic, their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and negative selectio...
Article
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological...
Article
Regulatory elements, e.g., enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated herita...
Preprint
It is widely known that regulatory variation plays a major role in complex disease and that cell-type-specific binding of transcription factors (TF) is critical to gene regulation, but genomic annotations from directly measured TF binding information are not currently available for most cell-type-TF pairs. Here, we construct cell-type-specific TF b...
Preprint
Full-text available
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 independent diseases and complex traits (average N = 323K) to identify enriched annotations. First, we constructed ann...
Article
Full-text available
Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable...
Article
Genotype imputation has been widely utilized for two reasons in the analysis of genome-wide association studies (GWAS). One reason is to increase the power for association studies when causal single nucleotide polymorphisms are not collected in the GWAS. The second reason is to aid the interpretation of a GWAS result by predicting the association s...
Preprint
Full-text available
Regulatory elements, e.g. enhancers and promoters, have been widely reported to be enriched for disease and complex trait heritability. We investigated how this enrichment varies with the age of the underlying genome sequence, the conservation of regulatory function across species, and the target gene of the regulatory element. We estimated heritab...
Preprint
Full-text available
Complex traits and common disease are highly polygenic: thousands of common variants are causal, and their effect sizes are almost always small. Polygenicity could be explained by negative selection, which constrains common-variant effect sizes and may reshape their distribution across the genome. We refer to this phenomenon as flattening, as gene...
Article
Full-text available
There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We...
Article
Full-text available
We apply integrative approaches to expression quantitative trait loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies. Applying po...
Preprint
Full-text available
Biological interpretation of GWAS data frequently involves analyzing unsigned genomic annotations comprising SNPs involved in a biological process and assessing enrichment for disease signal. However, it is often possible to generate signed annotations quantifying whether each SNP allele promotes or hinders a biological process, e.g., binding of a...
Preprint
Many disease risk loci identified in genome-wide association studies are present in non-coding regions of the genome. It is hypothesized that these variants affect complex traits by acting as expression quantitative trait loci (eQTLs) that influence expression of nearby genes. This indicates that many causal variants for complex traits are likely t...
Preprint
Genome-wide association studies (GWAS) have discovered thousands of variants involved in common human diseases. In these studies, frequencies of genetic variants are compared between a cohort of individuals with a disease (cases) and a cohort of healthy individuals (controls). Any variant that has a significantly different frequency between the two...
Preprint
There is increasing evidence that many GWAS risk loci are molecular QTL for gene expression (eQTL), histone modification (hQTL), splicing (sQTL), and/or DNA methylation (meQTL). Here, we introduce a new set of functional annotations based on causal posterior probabilities (CPP) of fine-mapped molecular cis-QTL, using data from the GTEx and BLUEPRIN...
Article
Full-text available
African genomics and skin color. Skin color varies among human populations and is thought to be under selection, with light skin maximizing vitamin D production at higher latitudes and dark skin providing UV protection in equatorial zones. To identify the genes that give rise to the palette of human skin tones, Crawford et al. applied genome-wide an...
Article
Full-text available
Genetic effects on gene expression across human tissues. The GTEx (Genotype-Tissue Expression) Consortium has established a reference catalogue and