
Wenxin Jiang
- statistics
- Northwestern University
98
Publications
8,569
Reads
1,895
Citations
Publications (98)
Compositional data are quantitative descriptions of the parts of some whole, conveying relative information, which are ubiquitous in many fields. There has been a spate of interest in association networks for such data in biological and medical research, for example, microbial interaction networks. In this paper, we propose a novel method, the exte...
We provide analytic formulas for the standard error and confidence intervals for the F measures, based on a property of asymptotic normality in the large sample limit. The formula can be applied for sample size planning in order to achieve accurate enough estimation of these F measures.
To deal with the growing challenge from high dimensional data, we propose a conditional variable screening method for linear models named conditional screening via ordinary least squares projection (COLP). COLP can take advantage of prior knowledge concerning certain active predictors by eliminating the adverse impacts from their coefficients in...
Word embeddings capture syntactic and semantic information about words. Definition modeling aims to make the semantic content in each embedding explicit, by outputting a natural language definition based on the embedding. However, existing definition models are limited in their ability to generate accurate definitions for different senses of the sa...
Several statistical issues associated with health care costs, such as heteroscedasticity and severe skewness, make it challenging to estimate or predict medical costs. When the interest is modeling the mean cost, it is desirable to make no assumption on the density function or higher order moments. Another challenge in developing cost prediction mo...
When model uncertainty is handled by Bayesian model averaging (BMA) or Bayesian model selection (BMS), the posterior distribution possesses a desirable “oracle property” for parametric inference, if for large enough data it is nearly as good as the oracle posterior, obtained by assuming unrealistically that the true model is known and only the true...
Network community detection is an important area of research. In this work, we propose a novel nonparametric probabilistic model for this task. We conduct random walks on the network and apply the Hierarchical Dirichlet Process topic model on the random walk data to explore the community structure of the network. Our work is among the very few ende...
We study a classification problem of multiple sclerosis (MS) lesions in three dimensional brain magnetic resonance (MR) images. Segmentation of MS lesions is essential for MS diagnosis, assessment of disease progression and evaluation of treatment efficacy. Accurate identification of MS lesions in MR images is challenging due to variability in lesi...
The iterative version of the sure independence screening algorithm (ISIS) has been widely employed in various scientific fields since Fan and Lv [2008] proposed it. Despite the outstanding performance of ISIS in extensive applications, its sure screening property has not been theoretically verified during the past decade. To fill this gap, we adapt...
We study a partially identified linear contextual effects model for ecological inference, and we describe how to perform inference on the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. We derive various bounds for this non-identified parameter of the contextual effect...
Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for EI and describe how to estimate the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. This may be regar...
In this paper, we study computation of the range of posterior expectations that arise from robust Bayesian statistics. We compute supremum and infimum of the posterior expectations, when allowing uncertainty for the choice of the likelihood function, or uncertainty for the choice of the prior distribution. In the standard approach of sensitivity an...
Background: Metabolic syndrome has become a major public health challenge worldwide. The association between metabolic syndrome and DNA methylation is of great research interest. Results: We constructed a binomial model to investigate the association between a metabolic syndrome index and DNA methylation in the Normative Aging Study. We applied the It...
An asymmetric correlation measure commonly used in social economics, called the Gini correlation, is defined between a numerical response and a rank. We generalize the definition of this correlation so that it can be applied to data mining. The new definition, called the generalized Gini correlation, is found to include special cases that are equiv...
We establish the limiting distribution (in total variation) of the quasi posteriors based on moment conditions, which only partially identify the parameters of interest. Some examples are discussed.
Community detection has been an active research area for decades. Among all probabilistic models, Stochastic Block Model has been the most popular one. This paper introduces a novel probabilistic model: RW-HDP, based on random walks and Hierarchical Dirichlet Process, for community extraction. In RW-HDP, random walks conducted in a social network a...
Recent advances in human neuroimaging have shown that it is possible to accurately decode how the brain perceives information based only on non-invasive functional magnetic resonance imaging measurements of brain activity. Two commonly used statistical approaches, namely, univariate analysis and multivariate pattern analysis often lead to distinct...
Statistical inference based on moment conditions and estimating equations is of substantial interest when it is difficult to specify a full probabilistic model. We propose a Bayesian flavored model selection framework based on (quasi-)posterior probabilities from the Bayesian Generalized Method of Moments (BGMM), which allows us to incorporate two...
We establish a fundamental relation between three different topics: Bayesian
model selection, model averaging, and oracle performance. The relatively basic
property of model selection consistency is shown to be equivalent to a
seemingly more advanced distributional result, the oracle property. The result
is very simple and general. There is no rest...
A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. In this paper, we study how to construct confidence intervals of the LIFT measures. We point out the subtlety of this task...
The Gibbs posterior is a useful tool for risk minimization, which adopts a Bayesian framework and can incorporate convenient computational algorithms such as Markov chain Monte Carlo. We derive risk bounds for the Gibbs posterior using some general nonasymptotic inequalities, which can be used to derive nearly optimal convergence rates and select m...
We derive some simple relations that demonstrate how the posterior convergence rate is related to two driving factors: a “penalized divergence” of the prior, which measures the ability of the prior distribution to propose a nonnegligible set of working models to approximate the true model and a “norm complexity” of the prior, which measures the com...
A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. The LIFT measures are related to the ROC (Receiver Operator Characteristic), but there exist some important differences. In...
An important practice in statistics is to use robust likelihood-free methods,
such as the estimating equations, which only require assumptions on the moments
instead of specifying the full probabilistic model. We propose a Bayesian
flavored model selection approach for such likelihood-free methods, based on
(quasi-)posterior probabilities from the...
We propose Bayesian model selection based on composite datasets, which can be constructed from various subsample estimates. The method remains consistent without fully specifying a probability model, and is useful for dependent data, when asymptotic variance of the parameter estimator is difficult to estimate.
We investigate a class of hierarchical mixtures-of-experts (HME) models where
exponential family regression models with generalized linear mean functions of
the form ψ(α + x^Tβ) are mixed. Here ψ(·) is the inverse link function.
Suppose the true response y follows an exponential family regression model with
mean function belonging to a clas...
In this paper, we summarize some recent results in Li et al. (2012), which can be used to extend an important PAC-Bayesian approach, namely the Gibbs posterior, to study the nonadditive ranking risk. The methodology is based on assumption-free risk bounds and nonasymptotic oracle inequalities, which leads to nearly optimal convergence rates and opt...
In this letter, we consider a mixture-of-experts structure where m experts are mixed, with each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator in terms of how fast the Hellinger distance of the estimated density converges to the true density, when the sample size n...
Hoeffding’s inequality provides a probability bound for the deviation between the average of n independent bounded random variables and its mean. This paper introduces two inequalities that extend Hoeffding’s inequality to panel data, which consists of several mutually independent sequences of dependent data with strong mixing or with a depende...
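For reference, the classical inequality that this abstract extends can be stated as below. This is the standard textbook form for independent bounded variables, not the panel-data extension from the paper; the symbols $X_i$, $a_i$, $b_i$, and $t$ are notation chosen here for illustration.

```latex
% Hoeffding's inequality (classical, independent case).
% Assumption: X_1, ..., X_n are independent with a_i <= X_i <= b_i almost surely.
\[
  \Pr\!\left( \left| \bar{X}_n - \mathbb{E}\,\bar{X}_n \right| \ge t \right)
  \;\le\; 2 \exp\!\left( - \frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right),
  \qquad
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad t > 0 .
\]
```

When every variable lies in the same interval $[a, b]$, the bound simplifies to $2\exp\{-2nt^2/(b-a)^2\}$.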
In mixtures-of-experts (ME) model, where a number of submodels (experts) are
combined, there have been two longstanding problems: (i) how many experts
should be chosen, given the size of the training data? (ii) given the total
number of parameters, is it better to use a few very complex experts, or is it
better to combine many simple experts? In th...
In this paper we illustrate the so-called “indirect” method of inference, originally developed from the econometric literature, with analyses of three biological data sets involving longitudinal or repeated events data. This method is often more convenient computationally than maximum likelihood estimation when handling such model complexities as r...
This letter considers Bayesian binary classification where data are assumed to consist of multiple time series (panel data) with binary class labels (binary choice). The observed data can be represented as $\{y_{it}, x_{it}\}_{t=1}^{T}$, $i = 1, \dots, n$. Here $y_{it} \in \{0, 1\}$ represents binary choices, and $x_{it}$ represents the exogenous variables. We consider prediction of...
This paper addresses the estimation of the nonparametric conditional moment
restricted model that involves an infinite-dimensional parameter $g_0$. We
estimate it in a quasi-Bayesian way, based on the limited information
likelihood, and investigate the impact of three types of priors on the
posterior consistency: (i) truncated prior (priors support...
We consider an approximate posterior approach to making joint probabilistic inference on the action and the associated risk in data mining. The posterior probability is based on a profile empirical likelihood, which imposes a moment restriction relating the action to the resulting risk, but does not otherwise require a probability model for the und...
This paper considers the problem of predicting binary choices by selecting from a possibly large set of candidate explanatory variables, which can include both exogenous variables and lagged dependent variables. We consider risk minimization with the risk function being the predictive classification error. We study the convergence rates of empirica...
Jiang and Tanner (2008) consider a method of classification using the Gibbs posterior which is directly constructed from the empirical classification errors. They propose an algorithm to sample from the Gibbs posterior which utilizes a smoothed approximation of the empirical classification error, via a Gibbs sampler with augmented latent variables....
This paper presents a study of the large-sample behavior of the posterior distribution of a structural parameter which is partially identified by moment inequalities. The posterior density is derived based on the limited information likelihood. The posterior distribution converges to zero exponentially fast on any $\delta$-contraction outside the i...
DNA methylation is an important epigenetic phenomenon that is associated with a variety of diseases, particularly cancers. Recent development of high throughput sequencing technology has enabled researchers to investigate the methylation rate at a single nucleotide resolution for any given sample. Testing for methylation rate equality or difference...
The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of the empirical risks. Classical probability bounds using Hoeffding's inequality cannot accommodate more general situations with unbounded loss and dependent data. The current paper introduces an inequality that extends Hoeffding's...
In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function...
Bayesian variable selection has gained much empirical success recently in a variety of applications when the number $K$ of explanatory variables $(x_1,...,x_K)$ is possibly much larger than the sample size $n$. For generalized linear models, if most of the $x_j$'s have very small effects on the response $y$, we show that it is possible to use Bayes...
A nonparametric and locally adaptive Bayesian estimator is proposed for estimating a binary regression. Flexibility is obtained by modeling the binary regression as a mixture of probit regressions with the argument of each probit regression having a thin plate spline prior with its own smoothing parameter and with the mixture weights depending on t...
Food webs aim to provide a thorough representation of the trophic interactions found in an ecosystem. The complexity of empirical food webs, however, is leading many ecologists to focus dynamic ecosystem studies on smaller microcosm or mesocosm studies based upon community modules, which comprise three to five species and the interactions likely to...
As a generalization of the accelerated failure time models, we consider parametric models of lifetime Y, where the conditional mean E(Y|X;beta) can depend nonlinearly on the covariates X and some parameters beta. The error distribution can be heteroscedastic and dependent on X. With observed data subject to right censoring, we propose regression an...
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classificat...
We report that mixtures of m multinomial logistic regression can be used to approximate a class of 'smooth' probability models for multiclass responses. With bounded second derivatives of log-odds, the approximation rate is O(m^{-2/s}) in Hellinger distance or O(m^{-4/s}) in Kullback-Leibler divergence. Here s = dim(x) is the dimension of the...
This is a theoretical study of the consistency properties of Bayesian inference using mixtures of logistic regression models. When standard logistic regression models are combined in a mixtures-of-experts setup, a flexible model is formed to model the relationship between a binary (yes-no) response y and a vector of predictors x. Bayesian inference...
This article presents an exposition and synthesis of the theory and some applications of the so-called indirect method of inference. These ideas have been exploited in the field of econometrics, but less so in other fields such as biostatistics and epidemiology. In the indirect method, statistical inference is based on an intermediate statistic, wh...
This letter is a comprehensive account of some recent findings about AdaBoost in the presence of noisy data when approached from the perspective of statistical theory. We start from the basic assumption of weak hypotheses used in AdaBoost and study its validity and implications on generalization error. We recommend studying the generalization error...
Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and N. Vayatis; and "Statistical behavior and consistency of classification methods based on convex risk minimization" [ibid., 56-85] by T. Zhang. Include...
We address the problem of model comparison and model mixing in time series using the approach known as Hierarchical Mixtures-of-Experts. Our methodology allows for comparisons of arbitrary models, not restricted to a particular class or parametric form. Additionally, the approach is flexible enough to incorporate exogenous information that can be...
In this paper we describe the so-called indirect method of inference, originally developed from the econometric literature, and apply it to survival analyses of two data sets with repeated events. This method is often more convenient computationally than maximum likelihood estimation when handling such model complexities as random effects and measu...
We consider model selection based on estimators that are asymptotically normal. Such a method can be applied to the context of estimating equations, since a complete specification of the probability model or likelihood function is not required. We construct a cost function for the models in consideration, and show that the minimizer of the cost fun...
Previous researchers developed new learning architectures for sequential data by extending conventional hidden Markov models through the use of distributed state representations. Although exact inference and parameter estimation in these architectures is computationally intractable, Ghahramani and Jordan (1997) showed that approximate inference and...
A Bayesian approach is presented for spatially adaptive nonparametric regression where the regression function is modelled as a mixture of splines. Each component spline in the mixture has associated with it a smoothing parameter which is defined over a local region of the covariate space. These local regions overlap such that individual data points...
A Bayesian approach is presented for model selection in nonparametric regression with Gaussian errors and in binary nonparametric regression. A smoothness prior is assumed for each component of the model and the posterior probabilities of the candidate models are approximated using the Bayesian information criterion. We study the model selection me...
We consider local likelihood or local estimating equations, in which a multivariate function Θ() is estimated but a derived function λ() of Θ() is of interest. In many applications, when most naturally formulated the derived function is a non-linear function of Θ(). In trying to understand whether the derived non-linear function is constant or line...
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R×C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2×2 case to the R×C case. As in the 2×2 case, the inferential procedure employs Mar...
This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include [J99] and [J00a,b,c,d], which is a set of (mutually overlapping) papers concerning the assumption of weak hypotheses, behavior of generalization error in the large time...
The most basic property of the boosting algorithm is its ability to reduce the training error, subject to the critical assumption that the base learners generate weak hypotheses that are better than random guessing. We exploit analogies between regression and classification to give a characterization on what base learners generate weak hypotheses,...
We consider the AdaBoost algorithm using the piecewise constant base hypotheses on the predictor space [0, 1]. The boosted solutions are not unique, and one exact solution, after a sufficiently large number of rounds, is shown to generate the nearest neighbor rule. An asymptotic result for the prediction error is provided for piecewise Lipschitz signals...
Introduction. Some recent experimental results [e.g., Friedman, Hastie and Tibshirani (1999); Grove and Schuurmans (1998); Mason et al. (1998)] and theoretical examples [Jiang (1999)] suggest that the AdaBoost algorithm [e.g., Schapire (1999); Freund and Schapire (1997)] can overfit in the limit of (very) large time (or the number of rounds of AdaB...
In this paper we present examples where 'boosting forever' leads to suboptimal predictions, while some regularization method, on the other hand, can achieve asymptotic optimality, at least in theory. We conjecture that this can be true in more general situations, and for some other regularization methods as well. Therefore the emerging literature on...
When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, or are able to beat a random guesser by a certain amount of difference. It has been an open question how much this difference can be, whether it will eventually disappear in the boosting p...
This paper tries to investigate these two major questions. The first question is answered by intuition gained from an analogy of boosting in least squares regression, where we will see that the weak learner assumption does not always hold, and that it does hold for a large class of base learners. For this purpose we introduce a geometric concept ca...
One basic property of the boosting algorithm is its ability to reduce the training error, subject to the critical assumption that the base learners generate 'weak' (or more appropriately, 'weakly accurate') hypotheses that are better than random guessing. We exploit analogies between regression and classification to give a characterization on wha...
this paper that this is not the case for local estimation procedures, where the components in a given parameterization are to be modeled by local polynomials. We show that the bias that arises from the standard parameterization in local estimation procedures has a form that is more complex than is usually the case. In particular, the bias does not...
The mixtures-of-experts (ME) methodology provides a tool of classification when experts of logistic regression models or Bernoulli models are mixed according to a set of local weights. We show that the Vapnik-Chervonenkis dimension of the ME architecture is bounded below by the number of experts m and above by O(m⁴s²), where s is the dimension of...
In the class of hierarchical mixtures-of-experts (HME) models,
“experts” in the exponential family with generalized linear
mean functions of the form ψ(α+x^Tβ) are mixed,
according to a set of local weights called the “gating
functions” depending on the predictor x. Here ψ(·) is
the inverse link function. We provide regularity conditions...
In this paper, we adapt a mixture model originally developed for regression models with independent data for the more general case of correlated outcome data, which includes longitudinal data as a special case. The estimation is performed by a generalisation of the EM algorithm which we call the Expectation-Solution (ES) algorithm. In this ES algorith...
This paper focuses on a commonly used class of non-linear models, namely the log-linear models. Thall and Vail (1990) proposed a log-linear model for repeated counts in which two types of random effects were introduced: one is time-specific and the other is subject-specific. However it can be easily verified that their model still virtuall...
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R×C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2×2 case to the R×C case. As in the 2×2 case, the inferent...
In the context of regression models with random effects, repeated responses are traditionally assumed to be mutually independent conditional on the random effects. In order to assess the validity of such an assumption and its impact on parameter inference, we propose an estimating equation methodology where both random effects and within-subject c...
In mixtures-of-experts (ME) models, "experts" of generalized linear models are combined, according to a set of local weights called the "gating function". The invariant transformations of the ME probability density functions include the permutations of the expert labels and the translations of the parameters in the gating functions. Under certain c...
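The two invariances named in this abstract can be checked numerically with a minimal sketch. It assumes logistic-regression experts and a softmax gating function, a common ME choice that is not necessarily the exact setup of the paper. The helper names (`softmax`, `me_prob`) and all parameter values are made up for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of real scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def me_prob(x, gates, experts):
    """P(y = 1 | x) for a mixture of logistic-regression experts.

    gates:   list of (v0, v) gating parameters, one pair per expert
    experts: list of (a, b) logistic parameters, one pair per expert
    (hypothetical parameterization, assumed for this sketch)
    """
    weights = softmax([v0 + dot(v, x) for v0, v in gates])
    means = [sigmoid(a + dot(b, x)) for a, b in experts]
    return sum(w * p for w, p in zip(weights, means))

# Illustrative (made-up) inputs and parameters for three experts.
x = [0.5, -1.2]
gates = [(0.1, [1.0, -0.5]), (-0.3, [0.2, 0.8]), (0.0, [-1.0, 0.4])]
experts = [(0.2, [1.5, 0.0]), (-1.0, [0.3, 0.7]), (0.5, [-0.6, 1.1])]

p = me_prob(x, gates, experts)

# Invariance 1: relabel the experts (permute gates and experts together).
perm = [2, 0, 1]
p_perm = me_prob(x, [gates[i] for i in perm], [experts[i] for i in perm])

# Invariance 2: translate every gating parameter by the same (c0, c).
c0, c = 0.7, [-2.0, 3.0]
shifted = [(v0 + c0, [vj + cj for vj, cj in zip(v, c)]) for v0, v in gates]
p_shift = me_prob(x, shifted, experts)

print(p, p_perm, p_shift)
```

The translation leaves the output unchanged because it adds the same constant to every gating score, and softmax weights depend only on differences between scores.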
In this article we consider the design aspects of group sequential trials with recurrent study endpoints, where the subjects are from a heterogeneous population. The usual procedures of sequential analysis based on the “independent increments” property are no longer valid due to the heterogeneity of the study subjects, as pointed out by Cook and La...
Statistical methodology is presented for the analysis of multiple events with random effects and measurement error. We model multiple events in a general space using a random measure, and define point process regression models with residual random effects. Our approach to parameter estimation and significance testing is to start with a simple naive...
In this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are wi...
Statistical methodology is presented for the regression analysis of multiple events in the presence of random effects and measurement error. Omitted covariates are modeled as random effects. Our approach to parameter estimation and significance testing is to start with a naive model of semi-parametric Poisson process regression, and then to adjust...
This paper investigates group sequential procedures for recurrent events data, allowing frailty (see Oakes, 1992), or the random heterogeneity of event frequencies among different subjects (see Lawless, 1987; Turnbull, Jiang, and Clark, 1997). Recurrent events data consist of individuals each being able to develop a number of events over time. Exam...
We investigate a class of hierarchical mixtures-of-experts (HME) models where exponential family regression models with generalized linear mean functions of the form ψ(α + x^Tβ) are mixed. Here ψ(·) is the inverse link function. Suppose the true response y follows an exponential family regression model with mean function belonging to a class...
We investigate a class of hierarchical mixtures-of-experts (HME) models where generalized linear models with nonlinear mean functions of the form ψ(α + x^Tβ) are mixed. Here ψ(·) is the inverse link function. It is shown that mixtures of such mean functions can approximate a class of smooth functions of the form ψ(h(x)), where h(·) ∈ W^∞_{2;k} (a Sobole...
Statistical methodology is presented for the statistical analysis of non-linear measurement error models. Our approach is to provide adjustments for the usual maximum likelihood estimators, their standard errors and associated significance tests in order to account for the presence of measurement error in some of the covariates. We illustrate the t...
Holomorphic Fueter functions of the position quaternion form a subgroup of Euclidean space‐time diffeomorphisms. An O(4) covariant treatment of such mappings is presented with the quaternionic argument x being replaced by either x or xp involving self‐dual and anti‐self‐dual structures and p denoting an arbitrary Euclidean time direction. An infini...
It is shown that the structures of the c-number parts of the Schwinger terms for the current algebra and stress tensor algebra are universal for conformal field theories in higher than two dimensions. New results on the structure of these Schwinger terms are presented. Some related general conclusions will be obtained from these results.
In this paper a simple derivation of the anomalous Ward identity in the $A^a_0 = 0$ gauge of Yang–Mills theory is presented. The same method is also applied to the gravitational case and the anomalous Ward identities in both the quasiconformal gauge and the spatial gauge are obtained. The relation between the Schwinger terms and the consistent anomalies...
In this paper the operator extensions for infinite algebras of residual gauge transformations and space-time transformations in physical space-time are discussed on general grounds. Possible general forms of the extensions are presented and their realizations in quantum field theories are discussed. The meaning of nontriviality for the extensions w...
Thesis (Ph.D.)--Cornell University, August, 1996. Includes bibliographical references (leaves 183-186).