Wenxin Jiang
  • statistics
  • Northwestern University

About

98
Publications
8,569
Reads
1,895
Citations
Current institution
Northwestern University

Publications (98)
Preprint
Full-text available
Compositional data are quantitative descriptions of the parts of some whole, conveying relative information, which are ubiquitous in many fields. There has been a spate of interest in association networks for such data in biological and medical research, for example, microbial interaction networks. In this paper, we propose a novel method, the exte...
Preprint
We provide analytic formulas for the standard error and confidence intervals for the F measures, based on a property of asymptotic normality in the large sample limit. The formula can be applied for sample size planning in order to achieve accurate enough estimation of these F measures.
Preprint
To deal with the growing challenge from high dimensional data, we propose a conditional variable screening method for linear models named as conditional screening via ordinary least squares projection (COLP). COLP can take advantage of prior knowledge concerning certain active predictors by eliminating the adverse impacts from their coefficients in...
Preprint
Full-text available
Word embeddings capture syntactic and semantic information about words. Definition modeling aims to make the semantic content in each embedding explicit, by outputting a natural language definition based on the embedding. However, existing definition models are limited in their ability to generate accurate definitions for different senses of the sa...
Article
Several statistical issues associated with health care costs, such as heteroscedasticity and severe skewness, make it challenging to estimate or predict medical costs. When the interest is modeling the mean cost, it is desirable to make no assumption on the density function or higher order moments. Another challenge in developing cost prediction mo...
Article
When model uncertainty is handled by Bayesian model averaging (BMA) or Bayesian model selection (BMS), the posterior distribution possesses a desirable “oracle property” for parametric inference, if for large enough data it is nearly as good as the oracle posterior, obtained by assuming unrealistically that the true model is known and only the true...
Chapter
Network community detection is an important area of research. In this work, we propose a novel nonparametric probabilistic model for this task. We conduct random walks on the network and apply the Hierarchical Dirichlet Process topic model on the random walk data to explore the community structure of the network. Our work is among the very few ende...
Article
We study a classification problem of multiple sclerosis (MS) lesions in three dimensional brain magnetic resonance (MR) images. Segmentation of MS lesions is essential for MS diagnosis, assessment of disease progression and evaluation of treatment efficacy. Accurate identification of MS lesions in MR images is challenging due to variability in lesi...
Preprint
The iterative version of the sure independence screening algorithm (ISIS) has been widely employed in various scientific fields since Fan and Lv [2008] proposed it. Despite the outstanding performance of ISIS in extensive applications, its sure screening property has not been theoretically verified during the past decade. To fill this gap, we adapt...
Article
We study a partially identified linear contextual effects model for ecological inference, and we describe how to perform inference on the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. We derive various bounds for this non-identified parameter of the contextual effect...
Preprint
Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for EI and describe how to estimate the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. This may be regar...
Article
In this paper, we study computation of the range of posterior expectations that arise from robust Bayesian statistics. We compute supremum and infimum of the posterior expectations, when allowing uncertainty for the choice of the likelihood function, or uncertainty for the choice of the prior distribution. In the standard approach of sensitivity an...
Article
Full-text available
Background: Metabolic syndrome has become a major public health challenge worldwide. The association between metabolic syndrome and DNA methylation is of great research interest. Results: We constructed a binomial model to investigate the association between a metabolic syndrome index and DNA methylation in the Normative Aging Study. We applied the It...
Article
Full-text available
An asymmetric correlation measure commonly used in social economics, called the Gini correlation, is defined between a numerical response and a rank. We generalize the definition of this correlation so that it can be applied to data mining. The new definition, called the generalized Gini correlation, is found to include special cases that are equiv...
Article
We establish the limiting distribution (in total variation) of the quasi posteriors based on moment conditions, which only partially identify the parameters of interest. Some examples are discussed.
Article
Community detection has been an active research area for decades. Among all probabilistic models, Stochastic Block Model has been the most popular one. This paper introduces a novel probabilistic model: RW-HDP, based on random walks and Hierarchical Dirichlet Process, for community extraction. In RW-HDP, random walks conducted in a social network a...
Article
Recent advances in human neuroimaging have shown that it is possible to accurately decode how the brain perceives information based only on non-invasive functional magnetic resonance imaging measurements of brain activity. Two commonly used statistical approaches, namely, univariate analysis and multivariate pattern analysis often lead to distinct...
Article
Statistical inference based on moment conditions and estimating equations is of substantial interest when it is difficult to specify a full probabilistic model. We propose a Bayesian flavored model selection framework based on (quasi-)posterior probabilities from the Bayesian Generalized Method of Moments (BGMM), which allows us to incorporate two...
Article
Full-text available
We establish a fundamental relation between three different topics: Bayesian model selection, model averaging, and oracle performance. The relatively basic property of model selection consistency is shown to be equivalent to a seemingly more advanced distributional result, the oracle property. The result is very simple and general. There is no rest...
Article
A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. In this paper, we study how to construct confidence intervals of the LIFT measures. We point out the subtlety of this task...
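The three LIFT measures named in the abstract above (response rate, lift, and percentage of captured response) can be illustrated with a minimal Python sketch of their point estimates on a validation set. The function name `lift_measures`, its parameters, and the "top fraction of cases ranked by score" convention are assumptions made here for illustration, not the authors' definitions, and the sketch does not attempt the confidence-interval construction the paper studies.

```python
import math
from typing import Dict, Sequence


def lift_measures(
    scores: Sequence[float],
    responses: Sequence[int],
    top_fraction: float,
) -> Dict[str, float]:
    """Point estimates of three LIFT measures on validation data.

    scores       -- hypothetical scoring-rule outputs (higher = more likely to respond)
    responses    -- observed 0/1 responses, aligned with scores
    top_fraction -- fraction of cases targeted, in (0, 1]
    """
    n = len(scores)
    if n == 0 or n != len(responses):
        raise ValueError("scores and responses must be non-empty and equal in length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must lie in (0, 1]")

    # Number of targeted cases; at least one case is always selected.
    k = max(1, int(round(top_fraction * n)))

    # Rank cases by score, highest first, and keep the responses of the top k.
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    top_responses = [r for _, r in ranked[:k]]

    hits = sum(top_responses)
    total = sum(responses)

    response_rate = hits / k            # response rate among targeted cases
    overall_rate = total / n            # response rate in the whole validation set
    lift = response_rate / overall_rate if overall_rate > 0 else math.nan
    captured = hits / total if total > 0 else math.nan  # share of all responders captured

    return {"response_rate": response_rate, "lift": lift, "captured_response": captured}
```

Calling `lift_measures(scores, responses, 0.2)` returns the three estimates for the top 20% of scored cases; because all three are ratios of counts from the same validation sample, their sampling variability is the subject of the interval methods the abstract refers to.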
Article
The Gibbs posterior is a useful tool for risk minimization, which adopts a Bayesian framework and can incorporate convenient computational algorithms such as Markov chain Monte Carlo. We derive risk bounds for the Gibbs posterior using some general nonasymptotic inequalities, which can be used to derive nearly optimal convergence rates and select m...
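For orientation, a commonly used form of the Gibbs posterior is sketched below. The notation is an assumption introduced here and may differ from the paper's: $\pi$ is a prior on the parameter $\theta$, $R_n$ is the empirical risk built from a loss $\ell$ and data $z_1, \dots, z_n$, and $\lambda > 0$ is a tuning ("inverse temperature") constant.

```latex
\[
  \pi_n(d\theta)
  \;=\;
  \frac{\exp\{-\lambda\, n\, R_n(\theta)\}\,\pi(d\theta)}
       {\int \exp\{-\lambda\, n\, R_n(\theta')\}\,\pi(d\theta')},
  \qquad
  R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; z_i).
\]
```

Because this distribution is defined through a risk function rather than a likelihood, it can be sampled with standard Markov chain Monte Carlo methods without specifying a full probability model.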
Article
Full-text available
We derive some simple relations that demonstrate how the posterior convergence rate is related to two driving factors: a “penalized divergence” of the prior, which measures the ability of the prior distribution to propose a nonnegligible set of working models to approximate the true model and a “norm complexity” of the prior, which measures the com...
Technical Report
Full-text available
A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. The LIFT measures are related to the ROC (Receiver Operator Characteristic), but there exist some important differences. In...
Article
Full-text available
An important practice in statistics is to use robust likelihood-free methods, such as the estimating equations, which only require assumptions on the moments instead of specifying the full probabilistic model. We propose a Bayesian flavored model selection approach for such likelihood-free methods, based on (quasi-)posterior probabilities from the...
Article
We propose Bayesian model selection based on composite datasets, which can be constructed from various subsample estimates. The method remains consistent without fully specifying a probability model, and is useful for dependent data, when asymptotic variance of the parameter estimator is difficult to estimate.
Article
Full-text available
We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form ψ(α + x^T β) are mixed. Here ψ(·) is the inverse link function. Suppose the true response y follows an exponential family regression model with mean function belonging to a clas...
Article
In this paper, we summarize some recent results in Li et al. (2012), which can be used to extend an important PAC-Bayesian approach, namely the Gibbs posterior, to study the nonadditive ranking risk. The methodology is based on assumption-free risk bounds and nonasymptotic oracle inequalities, which leads to nearly optimal convergence rates and opt...
Article
Full-text available
In this letter, we consider a mixture-of-experts structure where m experts are mixed, with each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator in terms of how fast the Hellinger distance of the estimated density converges to the true density, when the sample size n...
Article
Hoeffding’s inequality provides a probability bound for the deviation between the average of n independent bounded random variables and its mean. This paper introduces two inequalities that extend Hoeffding’s inequality to panel data, which consists of several mutually independent sequences of dependent data with strong mixing or with a depende...
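For reference, the classical inequality that the abstract above takes as its starting point can be stated as follows. The symbols are introduced here for illustration: $X_1, \dots, X_n$ are independent with $a_i \le X_i \le b_i$, and $\bar{X}_n$ is their average. This is the standard independent-data version only, not the panel-data extensions the paper introduces.

```latex
\[
  P\!\left( \left| \bar{X}_n - E\,\bar{X}_n \right| \ge t \right)
  \;\le\;
  2 \exp\!\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
  \qquad t > 0 .
\]
```

When every variable lies in $[0, 1]$ the bound reduces to $2\exp(-2 n t^2)$, which shows the exponential decay in the sample size $n$.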
Preprint
Full-text available
In mixtures-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts, or is it better to combine many simple experts? In th...
Chapter
In this paper we illustrate the so-called “indirect” method of inference, originally developed from the econometric literature, with analyses of three biological data sets involving longitudinal or repeated events data. This method is often more convenient computationally than maximum likelihood estimation when handling such model complexities as r...
Article
Full-text available
This letter considers Bayesian binary classification where data are assumed to consist of multiple time series (panel data) with binary class labels (binary choice). The observed data can be represented as {y_it, x_it}, t = 1, …, T, i = 1, …, n. Here y_it ∈ {0, 1} represents binary choices, and x_it represents the exogenous variables. We consider prediction of...
Article
Full-text available
This paper addresses the estimation of the nonparametric conditional moment restricted model that involves an infinite-dimensional parameter $g_0$. We estimate it in a quasi-Bayesian way, based on the limited information likelihood, and investigate the impact of three types of priors on the posterior consistency: (i) truncated prior (priors support...
Article
We consider an approximate posterior approach to making joint probabilistic inference on the action and the associated risk in data mining. The posterior probability is based on a profile empirical likelihood, which imposes a moment restriction relating the action to the resulting risk, but does not otherwise require a probability model for the und...
Article
Full-text available
This paper considers the problem of predicting binary choices by selecting from a possibly large set of candidate explanatory variables, which can include both exogenous variables and lagged dependent variables. We consider risk minimization with the risk function being the predictive classification error. We study the convergence rates of empirica...
Article
Jiang and Tanner (2008) consider a method of classification using the Gibbs posterior which is directly constructed from the empirical classification errors. They propose an algorithm to sample from the Gibbs posterior which utilizes a smoothed approximation of the empirical classification error, via a Gibbs sampler with augmented latent variables....
Article
Full-text available
This paper presents a study of the large-sample behavior of the posterior distribution of a structural parameter which is partially identified by moment inequalities. The posterior density is derived based on the limited information likelihood. The posterior distribution converges to zero exponentially fast on any $\delta$-contraction outside the i...
Article
DNA methylation is an important epigenetic phenomenon that is associated with a variety of diseases, particularly cancers. Recent development of high throughput sequencing technology has enabled researchers to investigate the methylation rate at a single nucleotide resolution for any given sample. Testing for methylation rate equality or difference...
Article
Full-text available
The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's...
Article
Full-text available
In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function...
Article
Full-text available
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number $K$ of explanatory variables $(x_1,...,x_K)$ is possibly much larger than the sample size $n$. For generalized linear models, if most of the $x_j$'s have very small effects on the response $y$, we show that it is possible to use Bayes...
Preprint
A nonparametric and locally adaptive Bayesian estimator is proposed for estimating a binary regression. Flexibility is obtained by modeling the binary regression as a mixture of probit regressions with the argument of each probit regression having a thin plate spline prior with its own smoothing parameter and with the mixture weights depending on t...
Article
A nonparametric and locally adaptive Bayesian estimator is proposed for estimating a binary regression. Flexibility is obtained by modeling the binary regression as a mixture of probit regressions with the argument of each probit regression having a thin plate spline prior with its own smoothing parameter and with the mixture weights depending on t...
Article
Full-text available
Food webs aim to provide a thorough representation of the trophic interactions found in an ecosystem. The complexity of empirical food webs, however, is leading many ecologists to focus dynamic ecosystem studies on smaller microcosm or mesocosm studies based upon community modules, which comprise three to five species and the interactions likely to...
Article
As a generalization of the accelerated failure time models, we consider parametric models of lifetime Y, where the conditional mean E(Y|X;beta) can depend nonlinearly on the covariates X and some parameters beta. The error distribution can be heteroscedastic and dependent on X. With observed data subject to right censoring, we propose regression an...
Article
Full-text available
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classificat...
Conference Paper
Full-text available
We report that mixtures of m multinomial logistic regression can be used to approximate a class of 'smooth' probability models for multiclass responses. With bounded second derivatives of log-odds, the approximation rate is O(m^{-2/s}) in Hellinger distance or O(m^{-4/s}) in Kullback-Leibler divergence. Here s = dim(x) is the dimension of the...
Article
Full-text available
This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a mixtures-of-experts setup, a flexible model is formed to model the relationship between a binary (yes-no) response y and a vector of predictors x. Bayesian inference...
Article
Full-text available
This article presents an exposition and synthesis of the theory and some applications of the so-called indirect method of inference. These ideas have been exploited in the field of econometrics, but less so in other fields such as biostatistics and epidemiology. In the indirect method, statistical inference is based on an intermediate statistic, wh...
Article
Full-text available
This letter is a comprehensive account of some recent findings about AdaBoost in the presence of noisy data when approached from the perspective of statistical theory. We start from the basic assumption of weak hypotheses used in AdaBoost and study its validity and implications on generalization error. We recommend studying the generalization error...
Article
Full-text available
Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Include...
Article
Full-text available
We address the problem of model comparison and model mixing in time series using the approach known as Hierarchical Mixtures-of-Experts. Our methodology allows for comparisons of arbitrary models, not restricted to a particular class or parametric form. Additionally, the approach is flexible enough to incorporate exogenous information that can be...
Article
Full-text available
In this paper we describe the so-called indirect method of inference, originally developed from the econometric literature, and apply it to survival analyses of two data sets with repeated events. This method is often more convenient computationally than maximum likelihood estimation when handling such model complexities as random effects and measu...
Article
We consider model selection based on estimators that are asymptotically normal. Such a method can be applied to the context of estimating equations, since a complete specification of the probability model or likelihood function is not required. We construct a cost function for the models in consideration, and show that the minimizer of the cost fun...
Article
Full-text available
Previous researchers developed new learning architectures for sequential data by extending conventional hidden Markov models through the use of distributed state representations. Although exact inference and parameter estimation in these architectures is computationally intractable, Ghahramani and Jordan (1997) showed that approximate inference and...
Article
Full-text available
A Bayesian approach is presented for spatially adaptive nonparametric regression where the regression function is modelled as a mixture of splines. Each component spline in the mixture has associated with it a smoothing parameter which is defined over a local region of the covariate space. These local regions overlap such that individual data points...
Article
A Bayesian approach is presented for model selection in nonparametric regression with Gaussian errors and in binary nonparametric regression. A smoothness prior is assumed for each component of the model and the posterior probabilities of the candidate models are approximated using the Bayesian information criterion. We study the model selection me...
Article
We consider local likelihood or local estimating equations, in which a multivariate function Θ() is estimated but a derived function λ() of Θ() is of interest. In many applications, when most naturally formulated the derived function is a non-linear function of Θ(). In trying to understand whether the derived non-linear function is constant or line...
Article
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R×C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2×2 case to the R×C case. As in the 2×2 case, the inferential procedure employs Mar...
Article
Full-text available
This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include [J99] and [J00a,b,c,d], which is a set of (mutually overlapping) papers concerning the assumption of weak hypotheses, behavior of generalization error in the large time...
Article
Full-text available
The most basic property of the boosting algorithm is its ability to reduce the training error, subject to the critical assumption that the base learners generate weak hypotheses that are better than random guessing. We exploit analogies between regression and classification to give a characterization on what base learners generate weak hypotheses,...
Article
Full-text available
We consider the AdaBoost algorithm using the piecewise constant base hypotheses on the predictor space [0, 1]. The boosted solutions are not unique, and one exact solution, after a sufficiently large number of rounds, is shown to generate the nearest neighbor rule. An asymptotic result for the prediction error is provided for piecewise Lipschitz signals...
Article
Full-text available
Introduction. Some recent experimental results [e.g., Friedman, Hastie and Tibshirani (1999); Grove and Schuurmans (1998); Mason et al. (1998)] and theoretical examples [Jiang (1999)] suggest that the AdaBoost algorithm [e.g., Schapire (1999); Freund and Schapire (1997)] can overfit in the limit of (very) large time (or the number of rounds of AdaB...
Article
Full-text available
In this paper we present examples where 'boosting forever' leads to suboptimal predictions; while some regularization method, on the other hand, can achieve asymptotic optimality, at least in theory. We conjecture that this can be true in more general situations, and for some other regularization methods as well. Therefore the emerging literature on...
Article
Full-text available
When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, or are able to beat a random guesser by a certain amount of difference. It has been an open question how much this difference can be, whether it will eventually disappear in the boosting p...
Article
Full-text available
This paper tries to investigate these two major questions. The first question is answered by intuition gained from an analogy of boosting in least squares regression, where we will see that the weak learner assumption does not always hold, and that it does hold for a large class of base learners. For this purpose we introduce a geometric concept ca...
Conference Paper
Full-text available
One basic property of the boosting algorithm is its ability to reduce the training error, subject to the critical assumption that the base learners generate 'weak' (or more appropriately, 'weakly accurate') hypotheses that are better than random guessing. We exploit analogies between regression and classification to give a characterization on wha...
Article
this paper that this is not the case for local estimation procedures, where the components in a given parameterization are to be modeled by local polynomials. We show that the bias that arises from the standard parameterization in local estimation procedures has a form that is more complex than is usually the case. In particular, the bias does not...
Article
Full-text available
The mixtures-of-experts (ME) methodology provides a tool of classification when experts of logistic regression models or Bernoulli models are mixed according to a set of local weights. We show that the Vapnik-Chervonenkis dimension of the ME architecture is bounded below by the number of experts m and above by O(m⁴s²), where s is the dimension of...
Article
Full-text available
In the class of hierarchical mixtures-of-experts (HME) models, “experts” in the exponential family with generalized linear mean functions of the form ψ(α + x^T β) are mixed, according to a set of local weights called the “gating functions” depending on the predictor x. Here ψ(·) is the inverse link function. We provide regularity conditions...
Article
Full-text available
In this paper, we adapt a mixture model originally developed for regression models with independent data for the more general case of correlated outcome data, which includes longitudinal data as a special case. The estimation is performed by a generalisation of the EM algorithm which we call the Expectation-Solution (ES) algorithm. In this ES algorith...
Article
Full-text available
This paper intends to focus on a commonly-used class of non-linear models, namely the log-linear models. Thall and Vail (1990) proposed a log-linear model for repeated counts in which two types of random effects were introduced: one is time-specific and the other is subject-specific. However it can be easily verified that their model still virtuall...
Article
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R×C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2×2 case to the R×C case. As in the 2×2 case, the inferent...
Article
In the context of regression models with random effects, repeated responses are traditionally assumed to be mutually independent conditional on the random effects. In order to assess the validity of such an assumption and its impact on parameter inference, we propose an estimating equation methodology where both random effects and within-subject c...
Article
In mixtures-of-experts (ME) models, "experts" of generalized linear models are combined, according to a set of local weights called the "gating function". The invariant transformations of the ME probability density functions include the permutations of the expert labels and the translations of the parameters in the gating functions. Under certain c...
Article
In this article we consider the design aspects of group sequential trials with recurrent study endpoints, where the subjects are from a heterogeneous population. The usual procedures of sequential analysis based on the “independent increments” property are no longer valid due to the heterogeneity of the study subjects, as pointed out by Cook and La...
Article
Full-text available
Statistical methodology is presented for the analysis of multiple events with random effects and measurement error. We model multiple events in a general space using a random measure, and define point process regression models with residual random effects. Our approach to parameter estimation and significance testing is to start with a simple naive...
Article
Full-text available
In this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are wi...
Article
Full-text available
Statistical methodology is presented for the regression analysis of multiple events in the presence of random effects and measurement error. Omitted covariates are modeled as random effects. Our approach to parameter estimation and significance testing is to start with a naive model of semi-parametric Poisson process regression, and then to adjust...
Article
Full-text available
This paper investigates group sequential procedures for recurrent events data, allowing frailty (see Oakes, 1992), or the random heterogeneity of event frequencies among different subjects (see Lawless, 1987; Turnbull, Jiang, and Clark, 1997). Recurrent events data consist of individuals each being able to develop a number of events over time. Exam...
Article
Full-text available
We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form ψ(α + x^T β) are mixed. Here ψ(·) is the inverse link function. Suppose the true response y follows an exponential family regression model with mean function belonging to a class...
Article
Full-text available
We investigate a class of hierarchical mixtures-of-experts (HME) models where generalized linear models with nonlinear mean functions of the form ψ(α + x^T β) are mixed. Here ψ(·) is the inverse link function. It is shown that mixtures of such mean functions can approximate a class of smooth functions of the form ψ(h(x)), where h(·) ∈ W^∞_{2;k} (a Sobole...
Article
Statistical methodology is presented for the statistical analysis of non-linear measurement error models. Our approach is to provide adjustments for the usual maximum likelihood estimators, their standard errors and associated significance tests in order to account for the presence of measurement error in some of the covariates. We illustrate the t...
Article
Holomorphic Fueter functions of the position quaternion form a subgroup of Euclidean space‐time diffeomorphisms. An O(4) covariant treatment of such mappings is presented with the quaternionic argument x being replaced by either x or xp involving self‐dual and anti‐self‐dual structures and p denoting an arbitrary Euclidean time direction. An infini...
Article
It is shown that the structures of the c-number parts of the Schwinger terms for the current algebra and stress tensor algebra are universal for conformal field theories in higher than two dimensions. New results on the structure of these Schwinger terms are presented. Some related general conclusions will be obtained from these results.
Article
In this paper a simple derivation of the anomalous Ward identity in the A_0^a = 0 gauge of Yang–Mills theory is presented. The same method is also applied to the gravitational case and the anomalous Ward identities in both the quasiconformal gauge and the spatial gauge are obtained. The relation between the Schwinger terms and the consistent anomalies...
Article
In this paper the operator extensions for infinite algebras of residual gauge transformations and space-time transformations in physical space-time are discussed on general grounds. Possible general forms of the extensions are presented and their realizations in quantum field theories are discussed. The meaning of nontriviality for the extensions w...
