# Ali Shojaie's research while affiliated with Trinity Washington University and other places

**What is this page?**

This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.

It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.

If you're a ResearchGate member, you can follow this page to keep up with this author's work.

If you are this author, and you don't want us to display this page anymore, please let us know.

## Publications (163)

Background:
Pulmonary arterial hypertension (PAH) is a complex disease characterized by progressive right ventricular (RV) failure leading to significant morbidity and mortality. Investigating metabolic features and pathways associated with RV dilation, mortality, and measures of disease severity can provide insight into molecular mechanisms, iden...

Microglia‐mediated neuroinflammation contributes to disease progression in Alzheimer’s Disease (AD). Microglia demonstrate heterogeneous states with proposed beneficial, harmful, and disease‐specific subtypes. Defining the spectrum of microglia phenotypes is crucially important to the design of neuroinflammation‐modulating therapies. We performed s...

Our prior work shows that azinphos-methyl pesticide exposure is associated with altered oral microbiomes in exposed farmworkers. Here we extend this analysis to show the same association pattern is also evident in their children. Oral buccal swab samples were analyzed at two time points, the apple thinning season in spring-summer 2005 for 78 childr...

Regime shifts in high-dimensional time series arise naturally in many applications, from neuroimaging to finance. This problem has received considerable attention in low-dimensional settings, with both Bayesian and frequentist methods used extensively for parameter estimation. The EM algorithm is a particularly popular strategy for parameter estima...

Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis (MESA) which included several hundred individuals with whole...

We propose a novel inference procedure for linear combinations of high-dimensional regression coefficients in generalized estimating equations, which have been widely used for correlated data analysis for decades. Our estimator, obtained via constructing a system of projected estimating equations, is shown to be asymptotically normally distributed...

Microglia contribute to Alzheimer’s Disease (AD) progression and are candidate therapeutic targets for disease modulation. Single cell transcriptomics demonstrate microglia adopt multiple phenotypes. However, identifying the human AD microglia profile was limited by small numbers. We employed fluorescence activated nuclei sorting prior to single-nu...

Background
In the last decade, genomic studies have identified and replicated thousands of genetic associations with measures of health and disease and contributed to the understanding of the etiology of a variety of health conditions. Proteins are key biomarkers in clinical medicine and often drug-therapy targets. Like genomics, proteomics can adv...

OBJECTIVE
Differences in type 2 diabetes phenotype by age are described, but it is not known whether these differences are seen in a more uniformly defined adult population at a common early stage of care. We sought to characterize age-related clinical and metabolic characteristics of adults with type 2 diabetes on metformin monotherapy, prior to t...

[This corrects the article DOI: 10.1371/journal.pgen.1008835.].

As aberrant network-level functional connectivity underlies a variety of neural disorders, the ability to induce targeted functional reorganization would be a profound development towards therapies for neural disorders. Brain stimulation has been shown to induce large-scale network-wide functional connectivity changes (FCC), but the mapping from st...

Introduced more than a half-century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this framework for inferring causal relationships among time series has remained the topic of continuous de...

The PC and FCI algorithms are popular constraint-based methods for learning the structure of directed acyclic graphs (DAGs) in the absence and presence of latent and selection variables, respectively. These algorithms (and their order-independent variants, PC-stable and FCI-stable) have been shown to be consistent for learning sparse high-dimension...

Islet autoimmunity may contribute to β-cell dysfunction in type 2 diabetes (T2D). Its prevalence and clinical significance have not been rigorously determined. In this ancillary study to the Glycemia Reduction Approaches in Diabetes-A Comparative Effectiveness (GRADE) Study, we investigated the prevalence of cellular and humoral islet autoimmunity...

Thanks to technological advances leading to near-continuous time observations, emerging multivariate point process data offer new opportunities for causal discovery. However, a key obstacle in achieving this goal is that many relevant processes may not be observed in practice. Naïve estimation approaches that ignore these hidden variables can gener...

Microglia‐mediated neuroinflammation is hypothesized to contribute to disease progression in neurodegenerative diseases such as Alzheimer’s Disease (AD). Microglia subtypes are complex, with beneficial and harmful phenotypes. Understanding the gene expression networks which define the spectrum of microglia phenotypes is critical to identifying spec...

Microglia-mediated neuroinflammation is hypothesized to contribute to disease progression in neurodegenerative diseases such as Alzheimer's Disease (AD). Microglia demonstrate heterogeneous states in health and disease, with proposed beneficial, harmful, and disease specific subtypes. Defining the spectrum of microglia phenotypes is an important st...

Linear mixed models are widely used in ecological and biological applications, especially in genetic studies. Reliable estimation of variance components is crucial for using linear mixed models. However, standard methods, such as the restricted maximum likelihood (REML), are computationally inefficient in large samples and may be unstable with smal...

Background
Differential correlation networks are increasingly used to delineate changes in interactions among biomolecules. They characterize differences between omics networks under two different conditions, and can be used to delineate mechanisms of disease initiation and progression.
Results
We present a new R package, CorDiffViz, that facilita...

Mendelian randomization (MR) studies carried out among patients with a particular health condition should establish that the genetic instrument influences the exposure in that subgroup; however, this is normally investigated in the general population. Here, we investigated whether the genetic associations of four cis-acting C-reactive protein (CRP) varia...

Modern high-dimensional point process data, especially those from neuroscience experiments, often involve observations from multiple conditions and/or experiments. Networks of interactions corresponding to these conditions are expected to share many edges, but also exhibit unique, condition-specific ones. However, the degree of similarity among the...

Thanks to technological advances leading to near-continuous time observations, emerging multivariate point process data offer new opportunities for causal discovery. However, a key obstacle in achieving this goal is that many relevant processes may not be observed in practice. Naive estimation approaches that ignore these hidden variables can gener...

Differential Granger causality, that is understanding how Granger causal relations differ between two related time series, is of interest in many scientific applications. Modeling each time series by a vector autoregressive (VAR) model, we propose a new method to directly learn the difference between the corresponding transition matrices in high di...

Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end we propose...

We consider the problem of learning causal structures in sparse high-dimensional settings that may be subject to the presence of (potentially many) unmeasured confounders, as well as selection bias. Based on the structure found in common families of large random networks and examining the representation of local structures in linear structural equa...

Consumption of dietary lignans has been associated with reduced risk of chronic diseases, although the underlying mechanisms are unclear. We sought to determine if urinary excretion of ENL, the predominant microbial metabolite of dietary lignans, was associated with plasma protein abundance using a cross-sectional design based on data and plasma co...

Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariat...

The PC and FCI algorithms are popular constraint-based methods for learning the structure of directed acyclic graphs (DAGs) in the absence and presence of latent and selection variables, respectively. These algorithms (and their order-independent variants, PC-stable and FCI-stable) have been shown to be consistent for learning sparse high-dimension...

Existing software tools for topology-based pathway enrichment analysis are either computationally inefficient, have undesirable statistical power, or require expert knowledge to leverage the methods’ capabilities. To address these limitations, we have overhauled NetGSA, an existing topology-based method, to provide a computationally-efficient user-...

In causal graphical models based on directed acyclic graphs (DAGs), directed paths represent causal pathways between the corresponding variables. The variable at the beginning of such a path is referred to as an ancestor of the variable at the end of the path. Ancestral relations between variables play an important role in causal modeling. In exist...

As aberrant network-level functional connectivity underlies a variety of neural disorders, the ability to induce targeted functional reorganization would be a profound development towards therapies for neural disorders. Brain stimulation has been shown to alter large-scale network-wide functional connectivity, but the mapping from stimulation to th...

It is often of interest to make inference on an unknown function that is a local parameter of the data-generating mechanism, such as a density or regression function. Such estimands can typically only be estimated at a slower-than-parametric rate in nonparametric and semiparametric models, and performing calibrated inference can be challenging. In...

Originally developed for imputing missing entries in low rank, or approximately low rank matrices, matrix completion has proven widely effective in many problems where there is no reason to assume low-dimensional linear structure in the underlying matrix, as would be imposed by rank constraints. In this manuscript, we build some theoretical intuiti...

Introduced more than a half century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this notion for inferring causal relationships among time series has remained the topic of continuous debat...

Networks are increasingly used to capture the multitude of interaction mechanisms among microbes. Due to the dynamic nature of microbial communities and their interactions, and given the paucity of well-curated databases for microbial interactions, these networks are often inferred from microbiome abundance data. While a number of procedures have b...

This paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose the generalized matrix decomposition regression (GMDR) to efficiently leverage any auxiliary information on row and column structures. The GMDR extends the principal component regression (PCR) to two-way struc...

While most classical approaches to Granger causality detection assume linear dynamics, many interactions in applied domains, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured...

The cover image is based on the Advance Review Differential network analysis: A statistical perspective by Ali Shojaie, https://doi.org/10.1002/wics.1508.

Abstract
Linear mixed models are widely used in ecological and biological applications, especially in genetic studies. Reliable estimation of variance components is crucial for using linear mixed models. However, standard methods, such as the restricted maximum likelihood (REML), are computationally inefficient and may be unstable with small sampl...

Estimation of density functions supported on general domains arises when the data are naturally restricted to a proper subset of the real space. This problem is complicated by typically intractable normalizing constants. Score matching provides a powerful tool for estimating densities with such intractable normalizing constants but as originally pr...

Background
Dietary patterns low in glycemic load are associated with reduced risk of cardiometabolic diseases. Improvements in serum lipid concentrations may play a role in these observed associations.
Objective
We investigated how dietary patterns differing in glycemic load affect clinical lipid panel measures and plasma lipidomics profiles.
Met...

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programm...

This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Ou...

Qualitative interactions occur when a treatment effect or measure of association varies in sign by sub-population. Of particular interest in many biomedical settings are absence/presence qualitative interactions, which occur when an effect is present in one sub-population but absent in another. Absence/presence interactions arise in emerging applic...

Differences between genetic networks corresponding to disease conditions may delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on exogenous variables, or covariates. As a result, these approaches may detect spurious differential connections, which are induced by...

Estimation of density functions supported on general domains arises when the data is naturally restricted to a proper subset of the real space. This problem is complicated by typically intractable normalizing constants. Score matching provides a powerful tool for estimating densities with such intractable normalizing constants, but as originally pr...

Fueled in part by recent applications in neuroscience, the multivariate Hawkes process has become a popular tool for modeling the network of interactions among high-dimensional point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has primar...

In most organisms, dietary restriction (DR) increases lifespan. However, several studies have found that genotypes within the same species vary widely in how they respond to DR. To explore the mechanisms underlying this variation, we exposed 178 inbred Drosophila melanogaster lines to a DR or ad libitum (AL) diet, and measured a panel of 105 metabo...

We hypothesized that islet autoimmunity, hitherto considered the pathophysiologic basis of type 1 diabetes (T1D) and latent autoimmune diabetes of adults (LADA), could contribute to β-cell dysfunction in patients with type 2 diabetes (T2D). To evaluate this question, the Glycemia Reduction Approaches in Diabetes: A Comparative Effectiveness (GRADE)...

Bayesian Networks (BNs) represent conditional probability relations among a set of random variables (nodes) in the form of a directed acyclic graph (DAG), and have found diverse applications in knowledge discovery. We study the problem of learning the sparse DAG structure of a BN from continuous observational data. The central problem can be modele...

Modern RNA sequencing technologies provide gene expression measurements from single cells that promise refined insights on regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often sho...

Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time, and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of com...

Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time, and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of com...

Biplots that simultaneously display the sample clustering and the important taxa have gained popularity in the exploratory analysis of human microbiome data. Traditional biplots, assuming Euclidean distances between samples, are not appropriate for microbiome data, when non-Euclidean distances are used to characterize dissimilarities among microbia...

This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have $n$ repeated measurements for $M$ subjects. We consider a scenario where the number of fixed effects $p$ is large (and may be larger than $M$), but the number of random effects $q$ is...

Background:
Pathway enrichment analysis is extensively used in the analysis of Omics data for gaining biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input expression levels of the bio...

Exploratory analysis of human microbiome data is often based on dimension-reduced graphical displays derived from similarities based on non-Euclidean distances, such as UniFrac or Bray-Curtis. However, a display of this type, often referred to as the principal coordinate analysis (PCoA) plot, does not reveal which taxa are related to the observed c...

Background
Pathway enrichment analysis is extensively used in the analysis of Omics data for gaining biological insights into the functional roles of pre-defined subsets of genes, proteins and metabolites. A large number of methods have been proposed in the literature for this task. The vast majority of these methods use as input expression levels...

Identifying differences in networks has become a canonical problem in many biological applications. Here, we focus on testing whether two Gaussian graphical models are the same. Existing methods try to accomplish this goal by either directly comparing their estimated structures, or testing the null hypothesis that the partial correlation matrices a...

Objectives:
Dietary patterns high in fiber from sources including whole grains, legumes, fruits, vegetables, nuts and seeds, are associated with lower risk of chronic disease, such as cardiovascular disease and cancer. We investigated how plasma lipidomics profiles differed between a diet high in whole grains (WG) versus a diet high in refined gra...

Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programm...

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvärinen (2005) avoids direct calculation of t...

We present a novel approach for nonparametric regression using wavelet basis functions. Our proposal, $\texttt{waveMesh}$, can be applied to non-equispaced data with sample size not necessarily a power of 2. We develop an efficient proximal gradient descent algorithm for computing the estimator and establish adaptive minimax convergence rates. The...

We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We pr...

An optimal and flexible multiple hypotheses testing procedure is constructed for dependent data based on Bayesian techniques, aiming at handling two challenges, namely dependence structure and non-null distribution specification. Ignoring dependence among hypotheses tests may lead to loss of efficiency and bias in decision. Misspecification in the...

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyvärinen [2005] avoids direct calculation of...

We consider the task of estimating a high-dimensional directed acyclic graph, given observations from a linear structural equation model with arbitrary noise distribution. By exploiting properties of common random graphs, we develop a new algorithm that requires conditioning only on small sets of variables. The proposed algorithm, which is essentia...

Background
Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope.
Methods
In this study we presen...

A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. In contrast, the score matching method of Hyvärinen (2005) avoids direct c...

While most classical approaches to Granger causality detection assume linear dynamics, many interactions in applied domains, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured...

Objective:
The effects of diets high in refined grains on biliary and colonic bile acids have been investigated extensively. However, the effects of diets high in whole versus refined grains on circulating bile acids, which can influence glucose homeostasis and inflammation through activation of farnesoid X receptor (FXR) and G protein-coupled bil...

Toxicology plays a key role in public and environmental health. Traditionally, animals exposed to toxicants were used in toxicity tests. Fueled by rapid technological advances, the majority of animal tests have recently been replaced by tests based on high-throughput data. In this regard, toxicogenomics helps discover the relationship between toxic...

While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the...

We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated...

Assuming stationarity is unrealistic in many time series applications. A more realistic alternative is to allow for piecewise stationarity, where the model is allowed to change at given time points. We propose a three-stage procedure for consistent estimation of both structural change points and parameters of high-dimensional piecewise vector autor...

Enterolignans, products of gut bacterial metabolism of plant lignans, have been associated with reduced risk of chronic diseases, but their association with other plasma metabolites is unknown. We examined plasma metabolite profiles according to urinary enterolignan excretion in a cross-sectional analysis using data from a randomized crossover, con...

Assuming stationarity is unrealistic in many time series applications. A more realistic alternative is to allow for piecewise stationarity, where the model is allowed to change at given time points. In this article, the problem of detecting the change points in a high-dimensional piecewise vector autoregressive model (VAR) is considered. Reformulat...

The Hawkes process is a class of point processes whose future depends on its own history. Previous theoretical work on the Hawkes process is limited to the case of a mutually-exciting process, in which a past event can only increase the occurrence of future events. However, in neuronal networks and other real-world applications, inhibitory relation...

MicroRNAs (miRNAs) are short non-coding RNAs which target mRNAs by binding to them and regulating their expression. Involvement of miRNAs has been discovered in many diseases, so it is fruitful to investigate miRNAs and their targets to develop new therapeutic approaches by designing anti-miRNA oligonucleotides. There are various computational method...

We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a conve...

Long-term use of aspirin is associated with lower risk of colorectal cancer and other cancers; however, the mechanism of chemopreventive effect of aspirin is not fully understood. Animal studies suggest that COX-2, NFκB signaling and Wnt/β-catenin pathways may play a role, but no clinical trials have systematically evaluated the biological response...