Shujie Ma

University of California, Riverside | UCR · Department of Statistics

Publications: 42
Reads: 3,896
Citations: 675
Citations since 2017: 513 (19 research items)
[Chart: yearly counts, 2017 to 2023; vertical axis 0 to 120]

Publications (42)
Preprint
We consider a sparse deep ReLU network (SDRN) estimator obtained from empirical risk minimization with a Lipschitz loss function in the presence of a large number of features. Our framework can be applied to a variety of regression and classification problems. The unknown target function to estimate is assumed to be in a Korobov space. Functions in...
Preprint
Uncovering the heterogeneity in the disease progression of Alzheimer's is a key factor to disease understanding and treatment development, so that interventions can be tailored to target the subgroups that will benefit most from the treatment, which is an important goal of precision medicine. However, in practice, one top methodological challenge h...
Article
Full-text available
With a large number of baseline covariates, we propose a new semi-parametric modeling strategy for heterogeneous treatment effect estimation and individualized treatment selection, which are two major goals in personalized medicine. We achieve the first goal through estimating a covariate-specific treatment effect (CSTE) curve modeled as an unknown...
Article
We consider a semiparametric quantile factor panel model that allows observed stock-specific characteristics to affect stock returns in a nonlinear time-varying way, extending Connor, Hagmann, and Linton (2012) to the quantile restriction case. We propose a sieve-based estimation methodology that is easy to implement. We provide tools for inference...
Preprint
Full-text available
This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-s...
Article
Understanding treatment heterogeneity is essential to the development of precision medicine, which seeks to tailor medical treatments to subgroups of patients with similar characteristics. One of the challenges of achieving this goal is that we usually do not have a priori knowledge of the grouping information of patients with respect to treatment...
Article
Mann–Whitney‐type causal effects are generally applicable to outcome variables with a natural ordering, have been recommended for clinical trials because of their clinical relevance and interpretability and are particularly useful in analysing an ordinal composite outcome that combines an original primary outcome with death and possibly treatment d...
Article
Full-text available
A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be importan...
Preprint
In this paper, we propose a new semi-parametric modeling strategy for heterogeneous treatment effect estimation and individualized treatment selection, which are two major goals in personalized medicine, with a large number of baseline covariates. We achieve the first goal through estimating a covariate-specific treatment effect (CSTE) curve modele...
Article
Full-text available
Mann–Whitney-type causal effects are clinically relevant, easy to interpret, and readily applicable to a wide range of study settings. This article considers estimation of such effects when the outcome variable is a survival time subject to right censoring. We derive and discuss several methods: an outcome regression method based on a regression mo...
Article
Clinical trials are widely considered the gold standard for treatment evaluation, and they can be highly expensive in terms of time and money. The efficiency of clinical trials can be improved by incorporating information from baseline covariates that are related to clinical outcomes. This can be done by modifying an unadjusted treatment effect est...
Preprint
Full-text available
We propose to estimate the number of communities in degree-corrected stochastic block models based on a pseudo likelihood ratio. For estimation, we consider spectral clustering together with a binary segmentation method. This approach guarantees an upper bound for the pseudo likelihood ratio statistic when the model is over-fitted. We also derive i...
Article
In this paper we study the estimation of a large dimensional factor model when the factor loadings exhibit an unknown number of changes over time. We propose a novel three-step procedure to detect the breaks if any and then identify their locations. In the first step, we divide the whole time span into subintervals and fit a conventional factor mod...
Article
For conditional time-varying factor models with high dimensional assets, this article proposes a high dimensional alpha (HDA) test to assess whether there exist abnormal returns on securities (or portfolios) over the theoretical expected returns. To employ this test effectively, a constant coefficient test is also introduced. It examines the validi...
Article
Full-text available
We propose an estimation methodology for a semiparametric quantile factor panel model. We provide tools for inference that are robust to the existence of moments and to the form of weak cross-sectional dependence in the idiosyncratic error term. We apply our method to daily stock return data.
Article
Motivated by an HIV example, we consider how to compare and combine treatment selection markers, which are essential to the notion of precision medicine. The current literature on precision medicine is focused on evaluating and optimizing treatment regimes, which can be obtained by dichotomizing treatment selection markers. In practice, treatment d...
Article
Motivated by the study of gene and environment interactions, we consider a multivariate response varying-coefficient model with a large number of covariates. The need to nonparametrically estimate a large number of coefficient functions given relatively limited data poses a big challenge for fitting such a model. To overcome the challenge, we dev...
Article
While popular, single index models and additive models have potential limitations, a fact that leads us to propose SiAM, a novel hybrid combination of these two models. We first address model identifiability under general assumptions. The result is of independent interest. We then develop an estimation procedure by using splines to approximate unkn...
Article
There are many applications in which several response variables are predicted with a common set of predictors. To take into account the possible correlations among the responses, estimators with restricted rank were introduced. However, existing methods for performing reduced-rank regression are often based on the least squares procedure, which is adve...
Article
We consider a problem motivated by issues in nutritional epidemiology, across diseases and populations. In this area, it is becoming increasingly common for diseases to be modeled by a single diet score, such as the Healthy Eating Index, the Mediterranean Diet Score, etc. For each disease and for each population, a partially linear single-index mod...
Article
We use a metagenomic approach and network analysis to investigate the relationships between phenotypes across taxa under different environmental conditions. The network structure of taxa can be affected by the disease-associated environmental conditions. In addition, taxa abundance is differentiated under conditions. Therefore, knowing how the corr...
Article
Single index models offer greater flexibility in data analysis than linear models but retain some of the desirable properties such as the interpretability of the coefficients. We consider a pseudo-profile likelihood approach to estimation and testing for single-index quantile regression models. We establish the asymptotic normality of the index coe...
Article
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our pr...
Article
To study the relationship between serum antibody neutralization activity (determined by IC50) and the B cell immune response, we face two challenges: (i) IC50 values cannot be observed when they fall below the detection limit, and (ii) the number of factors is larger than the number of observations. To address these two challenges, we propose a Tobi...
Article
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variab...
Article
We consider the problem of estimating a relationship nonparametrically using regression splines when there exist both continuous and categorical predictors. We combine the global properties of regression splines with the local properties of categorical kernel functions to handle the presence of categorical predictors rather than resorting to sample...
Article
In this paper, we propose a flexible generalized semiparametric model for repeated measurements by combining generalized partially linear single-index models with varying coefficient models. The proposed model is a useful analytic tool to explore dynamic patterns which naturally exist in longitudinal data and also study possible nonlinear relations...
Article
We propose a functional single-index model (FSiM) to study the link between a scalar response variable and multiple functional predictors, in which the mean of the response is related to the linear predictors via an unknown link function. The FSiM serves as a good tool for dimension reduction in regression with multiple predictors and it is more fl...
Article
In this article, we study the estimation of partially linear single-index models (PLSiM) with repeated measurements. Specifically, we approximate the nonparametric function by a polynomial spline, and then employ the quadratic inference function (QIF) together with the profile principle to derive the QIF-based estimators for the linear coefficients....
Article
It is commonly accepted that gene and environment (G×E) interactions play a pivotal role in determining the risk of human diseases. In conventional parametric models such as linear models and generalized linear models which are applied frequently to study statistical interactions, effects of covariates are decomposed into main effects and interact...
Article
A plug-in selector for the number of interior knots (NIKs) is proposed for polynomial spline estimation in nonparametric regression. The existence and properties of the optimal NIKs for spline regression are established by minimising the weighted mean integrated squared error. We obtain plug-in formulae for the optimal NIKs based on the theoretical res...
Article
Studying model checking problems for partially linear single-index models, we propose a variant of the integrated conditional moment test using a linear projection weighting function, which gains dimension reduction and makes the proposed method act as if there exists only one covariate even in the presence of multiple dimensional regressors. We de...
Article
In genetic studies, not only can the number of predictors obtained from microarray measurements be extremely large, there can also be multiple response variables. Motivated by such a situation, we consider semiparametric dimension reduction methods in sparse multivariate regression models. Previous studies on joint variable and rank selection have...
Article
We consider the problem of estimating a relationship using semiparametric additive regression splines when there exist both continuous and categorical regressors, some of which are irrelevant but this is not known a priori. We show that choosing the spline degree, number of subintervals, and bandwidths via cross-validation can automatically remove...
Article
We propose a two-step estimating procedure for generalized additive partially linear models with clustered data using estimating equations. Our proposed method applies to the case that the number of observations per cluster is allowed to increase with the number of independent subjects. We establish oracle properties for the two-step estimator of e...
Article
Full-text available
We consider the problem of simultaneous variable selection and estimation in additive, partially linear models for longitudinal/clustered data. We propose an estimation procedure via polynomial splines to estimate the nonparametric components and apply proper penalty functions to achieve sparsity in the linear part. Under reasonable conditions, we...
Article
Full-text available
Functional data analysis has received considerable recent attention and a number of successful applications have been reported. In this paper, asymptotically simultaneous confidence bands are obtained for the mean function of the functional regression model, using piecewise constant spline estimation. Simulation experiments corroborate the asympto...
Article
Full-text available
Motivation: The genetic basis of complex traits often involves the function of multiple genetic factors, their interactions and the interaction between the genetic and environmental factors. Gene-environment (G×E) interaction is considered pivotal in determining trait variations and susceptibility of many genetic disorders such as neurodegenerativ...
Article
Full-text available
In a random-design nonparametric regression model, procedures for detecting jumps in the regression function via constant and linear spline estimation methods are proposed based on the maximal differences of the spline estimators among neighbouring knots, the limiting distributions of which are obtained when the regression function is smooth. Simula...
Article
A spline-backfitted kernel smoothing method is proposed for the partially linear additive model. Under assumptions of stationarity and geometric mixing, the proposed function and parameter estimators are oracally efficient and fast to compute. Such superior properties are achieved by applying to the data spline smoothing and kernel smoothing consecutiv...
Conference Paper
The genetic basis of complex traits often involves the function of multiple genetic factors, their interactions and the interaction between the genetic and environmental factors. Gene–environment (G×E) interaction is considered pivotal in determining trait variations and susceptibility of many genetic disorders such as neurodegenerative diseases or...

Projects (4)
Archived project
estimation of density, distribution and regression functions
Archived project
detrending, heteroscedasticity, model selection, nonlinearity, prediction of time series
Archived project
efficient and fast estimation and testing for generalized additive models