Publications (41) · 39.26 total impact

Article: Confidence Bands for Distribution Functions: A New Look at the Law of the Iterated Logarithm
ABSTRACT: We present a general law of the iterated logarithm for stochastic processes on the open unit interval having subexponential tails in a locally uniform fashion. It applies to standard Brownian bridge but also to suitably standardized empirical distribution functions. This leads to new goodness-of-fit tests and confidence bands which refine the procedures of Berk and Jones (1979) and Owen (1995). Roughly speaking, the high power and accuracy of the latter procedures in the tail regions of distributions are essentially preserved while gaining considerably in the central region.
02/2014
ABSTRACT: We consider nonparametric maximum likelihood estimation of a log-concave density in case of interval- or right-censored or binned data. Theoretical properties are studied and an algorithm is proposed for the approximate computation of the estimator.
Electronic Journal of Statistics 11/2013; 8. · 0.79 Impact Factor
ABSTRACT: In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive (point) estimation, the construction of adaptive confidence regions is severely limited (cf. Li in Ann Stat 17:1001–1008, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence regions for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral χ²-distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.
Probability Theory and Related Fields 01/2013; 155(3-4). · 1.39 Impact Factor
ABSTRACT: We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. We investigate the moderately ill-posed setting, where the Fourier transform of the error density in the deconvolution model is of polynomial decay. For multiscale testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation-based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task although the minimax rates for pointwise estimation are very slow.
The Annals of Statistics 07/2011; 41(3). · 2.53 Impact Factor
ABSTRACT: Let $P$ be a probability distribution on $q$-dimensional space. The so-called Diaconis-Freedman effect means that for a fixed dimension $d \ll q$, most $d$-dimensional projections of $P$ look like a scale mixture of spherically symmetric Gaussian distributions. The present paper provides necessary and sufficient conditions for this phenomenon in a suitable asymptotic framework with increasing dimension $q$. It turns out that the conditions formulated by Diaconis and Freedman (1984) are not only sufficient but necessary as well. Moreover, letting $\hat{P}$ be the empirical distribution of $n$ independent random vectors with distribution $P$, we investigate the behavior of the empirical process $\sqrt{n}(\hat{P} - P)$ under random projections, conditional on $\hat{P}$.
07/2011
ABSTRACT: This paper introduces and analyzes a stochastic search method for parameter estimation in linear regression models in the spirit of Beran and Millar (1987). The idea is to generate a random finite subset of a parameter space which will automatically contain points which are very close to an unknown true parameter. The motivation for this procedure comes from recent work of Duembgen, Samworth and Schuhmacher (2011) on regression models with log-concave error distributions.
06/2011
ABSTRACT: We derive multiscale statistics for deconvolution in order to detect qualitative features of the unknown density. An important example covered within this framework is to test for local monotonicity on all scales simultaneously. The errors in the deconvolution model are restricted to a certain class of distributions that includes Laplace, Gamma and Exponential random variables. Our approach relies on inversion formulas for deconvolution operators. For multiscale testing, we consider a calibration motivated by the modulus of continuity of Brownian motion. We investigate the performance of our results from both the theoretical and simulation-based point of view. A major consequence of our work is that the detection of qualitative features of a density in a deconvolution problem is a doable task although the minimax rates for pointwise estimation are very slow.
01/2011
ABSTRACT: We review various inequalities for Mills' ratio $(1 - \Phi)/\phi$, where $\phi$ and $\Phi$ denote the standard Gaussian density and distribution function, respectively. Elementary considerations involving finite continued fractions lead to a general approximation scheme which implies and refines several known bounds.
12/2010
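Among the known bounds such a scheme refines are Gordon's classical inequalities x/(x² + 1) ≤ (1 − Φ(x))/φ(x) ≤ 1/x for x > 0; a quick numerical check of these bounds using only the standard library (my own illustration, not code from the paper):

```python
import math

def mills_ratio(x: float) -> float:
    """Mills' ratio (1 - Phi(x)) / phi(x) for the standard normal,
    computed via the complementary error function."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))                   # 1 - Phi(x)
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # phi(x)
    return tail / density

# Gordon's bounds: x/(x^2 + 1) <= Mills' ratio <= 1/x for x > 0.
for x in [0.5, 1.0, 2.0, 4.0]:
    r = mills_ratio(x)
    assert x / (x * x + 1.0) <= r <= 1.0 / x
```

Both bounds tighten as x grows, which is why refinements matter most in the central region.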
ABSTRACT: We study the approximation of arbitrary distributions $P$ on $d$-dimensional space by distributions with log-concave density. Approximation means minimizing a Kullback-Leibler-type functional. We show that such an approximation exists if and only if $P$ has finite first moments and is not supported by some hyperplane. Furthermore we show that this approximation depends continuously on $P$ with respect to Mallows distance $D_1(\cdot,\cdot)$. This result implies consistency of the maximum likelihood estimator of a log-concave density under fairly general conditions. It also allows us to prove existence and consistency of estimators in regression models with a response $Y = \mu(X) + \epsilon$, where $X$ and $\epsilon$ are independent, $\mu(\cdot)$ belongs to a certain class of regression functions while $\epsilon$ is a random error with log-concave density and mean zero.
The Annals of Statistics 02/2010; · 2.53 Impact Factor
Article: Nemirovski's Inequalities Revisited
ABSTRACT: Moment inequalities for sums of independent random vectors are an important tool for statistical research. Nemirovski and coworkers (1983, 2000) derived one particular type of such inequalities: for certain Banach spaces $(\mathbb{B}, \|\cdot\|)$ there exists a constant $K = K(\mathbb{B}, \|\cdot\|)$ such that for arbitrary independent and centered random vectors $X_1, X_2, \ldots, X_n \in \mathbb{B}$, their sum $S_n$ satisfies the inequality $\mathrm{E}\|S_n\|^2 \le K \sum_{i=1}^n \mathrm{E}\|X_i\|^2$. We present and compare three different approaches to obtain such inequalities: Nemirovski's results are based on deterministic inequalities for norms. Another possible vehicle are type and cotype inequalities, a tool from probability theory on Banach spaces. Finally, we use a truncation argument plus Bernstein's inequality to obtain another version of the moment inequality above. Interestingly, all three approaches have their own merits.
The American Mathematical Monthly 01/2010; 117(2):138-160. · 0.29 Impact Factor
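For Euclidean (more generally, Hilbert) norms the cross terms $\mathrm{E}\langle X_i, X_j\rangle$ vanish by independence and centering, so the inequality holds with $K = 1$ and is in fact an equality. A small exact-enumeration check of this special case (my own illustration; the four-point distribution below is an assumption chosen for the example):

```python
from itertools import product

# Centered distribution on R^2: four unit vectors, each with probability 1/4,
# so E||X_i||^2 = 1 for each summand.
support = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
n = 3  # number of i.i.d. summands

def sq_norm(v):
    return sum(c * c for c in v)

rhs = n * 1.0  # K * sum_i E||X_i||^2 with K = 1

# E||S_n||^2 by exact enumeration over all 4^n equally likely outcomes.
lhs = 0.0
for outcome in product(support, repeat=n):
    s = [sum(coords) for coords in zip(*outcome)]
    lhs += sq_norm(s) / (4 ** n)

assert abs(lhs - rhs) < 1e-12  # equality in the Euclidean case
```

The interesting constants $K > 1$ arise for non-Hilbertian norms such as the supremum norm, where no such cancellation is available.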
ABSTRACT: This note proves Hellinger consistency for the nonparametric maximum likelihood estimator of a log-concave probability density on the real line.
Statistics & Probability Letters 01/2010; 80(5-6):376-380. · 0.53 Impact Factor
ABSTRACT: In this paper we show that the family P_d of probability distributions on R^d with log-concave densities satisfies a strong continuity condition. In particular, it turns out that weak convergence within this family entails (i) convergence in total variation distance, (ii) convergence of arbitrary moments, and (iii) pointwise convergence of Laplace transforms. Hence the nonparametric model P_d has similar properties as parametric models such as, for instance, the family of all d-variate Gaussian distributions.
07/2009
ABSTRACT: The computation of robust regression estimates often relies on minimization of a convex functional on a convex set. In this paper we discuss a general technique, closely related to majorization-minimization algorithms, for computing the minimizers of a large class of convex functionals iteratively. Our approach is based on a quadratic approximation of the functional to be minimized and includes the iteratively reweighted least squares algorithm as a special case. We prove convergence on convex function spaces for general coercive and convex functionals F and derive geometric convergence in certain unconstrained settings. The algorithm is applied to TV-penalized quantile regression and compared with a step-size-corrected Newton-Raphson algorithm. It is found that the iteratively reweighted least squares algorithm typically performs significantly better in the first steps, whereas the Newton-type method outpaces the former only after many iterations. Finally, in the setting of bivariate regression with unimodality constraints we illustrate how this algorithm allows one to utilize highly efficient algorithms for special quadratic programs in more complex settings.
01/2009
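The quadratic-approximation idea can be seen in miniature in the one-dimensional least-absolute-deviations problem, whose minimizer is a sample median. The sketch below (my own toy illustration, not the algorithm of the paper; `eps` is a safeguard against division by zero) replaces |r| at each step by a quadratic majorizer, i.e. performs iteratively reweighted least squares:

```python
def irls_median(xs, iterations=100, eps=1e-9):
    """Minimize sum_i |x_i - m| by iteratively reweighted least squares:
    each step solves a weighted least-squares problem with weights
    w_i = 1 / max(|x_i - m|, eps), the standard quadratic majorizer
    of the absolute-value loss at the current iterate."""
    m = sum(xs) / len(xs)  # start at the mean
    for _ in range(iterations):
        w = [1.0 / max(abs(x - m), eps) for x in xs]
        m = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return m

data = [1.0, 2.0, 3.0, 4.0, 100.0]
m = irls_median(data)
# The L1 minimizer is the sample median 3.0; the outlier 100 barely moves it.
assert abs(m - 3.0) < 1e-3
```

The fast initial progress of this scheme, followed by slower refinement, mirrors the comparison with the Newton-type method described in the abstract.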
Article: Invariant coordinate selection
ABSTRACT: A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based on the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant coordinate system for the multivariate data. Consequently, we view this method as a method for "invariant coordinate selection". By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant coordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant coordinates corresponds to Fisher's linear discriminant subspace, even though the class identifications of the data points are unknown. Some illustrative examples are given.
Journal of the Royal Statistical Society Series B (Statistical Methodology) 01/2009; 71(3):549-592. · 4.81 Impact Factor
ABSTRACT: In this paper we describe active set type algorithms for minimization of a smooth function under general order constraints, an important case being functions on the set of bimonotone r-by-s matrices. These algorithms can be used, for instance, to estimate a bimonotone regression function via least squares or (a smooth approximation of) least absolute deviations. Another application is shrinkage estimation in image denoising or, more generally, regression problems with two ordinal factors after representing the data in a suitable basis which is indexed by pairs (i,j) in {1,...,r} x {1,...,s}. Various numerical examples illustrate our methods.
Statistics and Computing 09/2008; · 1.98 Impact Factor
ABSTRACT: In this note we provide explicit expressions and expansions for a special function which appears in nonparametric estimation of log-densities. This function returns the integral of a log-linear function on a simplex of arbitrary dimension. In particular it is used in the R package "LogConcDEAD" by Cule et al. (2007).
08/2008
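In the one-dimensional case (the unit interval as a simplex) the function in question has the elementary closed form ∫₀¹ exp((1 − t)a + tb) dt = (e^b − e^a)/(b − a). A sketch checking this base case against a Riemann sum (the function name is my own, not the package's):

```python
import math

def loglin_integral_1d(a: float, b: float) -> float:
    """Integral of exp((1-t)*a + t*b) over t in [0, 1]: the one-dimensional
    case of integrating a log-linear function over a simplex."""
    if abs(b - a) < 1e-12:          # limiting case b -> a
        return math.exp(a)
    return (math.exp(b) - math.exp(a)) / (b - a)

# Compare with a midpoint Riemann sum.
a, b, n = -0.3, 1.7, 20000
riemann = sum(math.exp((1 - (k + 0.5) / n) * a + ((k + 0.5) / n) * b)
              for k in range(n)) / n
assert abs(loglin_integral_1d(a, b) - riemann) < 1e-6
```

The numerical care taken in the note matters precisely because the closed forms in higher dimensions involve differences of exponentials that cancel badly when the arguments are close, as the `b -> a` branch above already hints.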
ABSTRACT: Suppose that we observe independent random pairs (X_1,Y_1), (X_2,Y_2), ..., (X_n,Y_n). Our goal is to estimate regression functions such as the conditional mean or beta-quantile of Y given X, where 0 < beta < 1. In order to achieve this we minimize criteria such as, for instance, the sum of rho(f(X_i) - Y_i) over all i plus lambda * TV(f) among all candidate functions. Here rho(.) is some convex function depending on the particular regression function we have in mind, TV(f) stands for the total variation of f, and lambda > 0 is some tuning parameter. This framework is extended further to include binary or Poisson regression, and to include localized total variation penalties. The latter are needed to construct estimators adapting to inhomogeneous smoothness of f. For the general framework we develop non-iterative algorithms for the solution of the minimization problems which are closely related to the taut string algorithm (cf. Davies and Kovac, 2001). Further we establish a connection between the present setting and monotone regression, extending previous work by Mammen and van de Geer (1997).
Electronic Journal of Statistics 04/2008; · 0.79 Impact Factor
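To make the criterion concrete: with rho(u) = u² and f piecewise constant on the observation grid, the objective becomes sum_i (f_i - y_i)² + lambda * sum_i |f_{i+1} - f_i|. The sketch below minimizes a smoothed surrogate of this objective by plain gradient descent; it is my own illustration of the criterion, not the non-iterative taut-string-type algorithm developed in the paper:

```python
import math

def tv_smooth_fit(y, lam=1.0, eps=1e-2, steps=2000, lr=0.02):
    """Gradient descent on sum_i (f_i - y_i)^2 + lam * sum_i sqrt((f_{i+1} - f_i)^2 + eps),
    a smooth surrogate for the total-variation-penalized least-squares criterion."""
    f = list(y)
    for _ in range(steps):
        g = [2.0 * (fi - yi) for fi, yi in zip(f, y)]
        for i in range(len(f) - 1):
            d = f[i + 1] - f[i]
            s = d / math.sqrt(d * d + eps)  # derivative of the smoothed |d|
            g[i] -= lam * s
            g[i + 1] += lam * s
        f = [fi - lr * gi for fi, gi in zip(f, g)]
    return f

y = [0.0, 0.1, -0.1, 5.0, 4.9, 5.1]  # a noisy step function
f = tv_smooth_fit(y)
# The penalty flattens each level while preserving the big jump.
assert max(f[:3]) - min(f[:3]) < 0.2
assert max(f[3:]) - min(f[3:]) < 0.2
assert f[3] - f[2] > 4.0
```

The exact (non-smoothed) minimizer here would be piecewise constant with the jump intact, which is the qualitative behavior the taut-string-type algorithms compute directly and non-iteratively.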
ABSTRACT: In the setting of high-dimensional linear models with Gaussian noise, we investigate the possibility of confidence statements connected to model selection. Although there exist numerous procedures for adaptive point estimation, the construction of adaptive confidence regions is severely limited (cf. Li, 1989). The present paper sheds new light on this gap. We develop exact and adaptive confidence sets for the best approximating model in terms of risk. One of our constructions is based on a multiscale procedure and a particular coupling argument. Utilizing exponential inequalities for noncentral chi-squared distributions, we show that the risk and quadratic loss of all models within our confidence region are uniformly bounded by the minimal risk times a factor close to one.
02/2008
Article: P-values for classification
ABSTRACT: Let $(X,Y)$ be a random variable consisting of an observed feature vector $X \in \mathcal{X}$ and an unobserved class label $Y \in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$ we propose to construct for each $\theta = 1,2,\ldots,L$ a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y = \theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
Electronic Journal of Statistics 02/2008; · 0.79 Impact Factor
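One generic way to obtain such p-values (a conformal-style rank construction in the spirit of the paper, though not necessarily its exact procedure): under the null hypothesis Y = theta, the score of the new point is exchangeable with the scores of the class-theta training points, so its rank among them yields a valid p-value. A minimal sketch, with a distance-to-class-mean score as an assumed stand-in for any classifier's score:

```python
def classification_p_value(x, training, theta):
    """p-value for the null hypothesis Y = theta: rank of the new point's
    nonconformity score (squared distance to the class-theta mean) among
    the class-theta training scores."""
    cls = [xi for xi, yi in training if yi == theta]
    mean = [sum(c) / len(cls) for c in zip(*cls)]

    def score(p):
        return sum((a - b) ** 2 for a, b in zip(p, mean))

    s_new = score(x)
    ge = sum(1 for xi in cls if score(xi) >= s_new)
    return (1 + ge) / (len(cls) + 1)

training = [((0.0, 0.0), 1), ((0.2, -0.1), 1), ((-0.1, 0.1), 1),
            ((5.0, 5.0), 2), ((5.1, 4.9), 2), ((4.9, 5.2), 2)]
x = (0.1, 0.0)
p1 = classification_p_value(x, training, 1)
p2 = classification_p_value(x, training, 2)
assert 0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0
assert p1 > p2  # x is far more plausible as class 1
```

Reporting the whole vector (p1, ..., pL) and keeping every theta with p-value above a level alpha turns the point prediction into the prediction region described in the abstract.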
ABSTRACT: We study nonparametric maximum likelihood estimation of a log-concave probability density and its distribution and hazard function. Some general properties of these estimators are derived from two characterizations. It is shown that the rate of convergence with respect to supremum norm on a compact interval for the density and hazard rate estimator is at least (log(n)/n)^{1/3} and typically (log(n)/n)^{2/5}, whereas the difference between the empirical and estimated distribution function vanishes with rate o_p(n^{-1/2}) under certain regularity assumptions.
Bernoulli 10/2007; · 0.94 Impact Factor
Publication Stats
354 Citations
39.26 Total Impact Points
Institutions

2003–2013

Universität Bern
 Institute of Mathematical Statistics and Actuarial Science
Bern, Switzerland
University of Rostock
 Institut für Anatomie
Rostock, Mecklenburg-Vorpommern, Germany


2011

Technische Universität Dresden
Dresden, Saxony, Germany


2009

University of Tampere
Tampere, Province of Western Finland, Finland


2008

University of Cambridge
Cambridge, England, United Kingdom


2007

Stanford University
Palo Alto, California, United States


2006

Lomonosov Moscow State University
Moskva, Moscow, Russia


1998

Universität zu Lübeck
Lübeck, Schleswig-Holstein, Germany


1994–1998

Universität Heidelberg
 Institute of Applied Mathematics
Heidelberg, Baden-Württemberg, Germany


1993

University of California, Berkeley
 Department of Statistics
Berkeley, CA, United States
