Regression Level Set Estimation Via Cost-Sensitive Classification

Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI
IEEE Transactions on Signal Processing (Impact Factor: 3.2). 07/2007; DOI: 10.1109/TSP.2007.893758
Source: DBLP

ABSTRACT Regression level set estimation is an important yet understudied learning task. It lies somewhere between regression function estimation and traditional binary classification, and in many cases is a more appropriate setting for questions posed in these more common frameworks. This note explains how estimating the level set of a regression function from training examples can be reduced to cost-sensitive classification. We discuss the theoretical and algorithmic benefits of this learning reduction, demonstrate several desirable properties of the associated risk, and report experimental results for histograms, support vector machines, and nearest neighbor rules on synthetic and real data.
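The reduction described in the abstract can be sketched as follows: given regression samples (x, y) and a level γ, label each point by which side of γ its response falls on and weight it by its distance to the level, then feed the weighted labels to any cost-sensitive classifier. This is a minimal illustrative sketch, not the paper's exact algorithm; the function names, the choice of a cost-weighted k-NN vote, and the toy data are all assumptions.

```python
import numpy as np

def reduce_to_cost_sensitive(y, gamma):
    """Turn regression targets into cost-sensitive classification data.
    Label: which side of the level gamma; cost: distance to the level."""
    labels = (y > gamma).astype(int)
    costs = np.abs(y - gamma)
    return labels, costs

def knn_level_set_predict(X_train, labels, costs, X_query, k=5):
    """Cost-weighted k-NN vote: a query point is placed in the estimated
    level set when the cost-weighted mass of its neighbors above the
    level exceeds the mass below it."""
    preds = []
    for x in X_query:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        above = costs[idx][labels[idx] == 1].sum()
        below = costs[idx][labels[idx] == 0].sum()
        preds.append(1 if above > below else 0)
    return np.array(preds)

# Toy 1-D example: estimate {x : r(x) > 0.5} for r(x) = x on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = X[:, 0] + rng.normal(0, 0.1, size=200)   # noisy regression samples
labels, costs = reduce_to_cost_sensitive(y, gamma=0.5)
est = knn_level_set_predict(X, labels, costs, np.array([[0.2], [0.8]]), k=15)
```

Weighting votes by |y − γ| means samples far from the level, whose side is least ambiguous, dominate the decision; points near the level contribute little, which is the intuition behind casting the problem as cost-sensitive rather than plain binary classification.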

  • ABSTRACT: We consider the problem of estimating the region on which a non-parametric regression function is at its baseline level in two dimensions. The baseline level typically corresponds to the minimum/maximum of the function and estimating such regions or their complements is pertinent to several problems arising in edge estimation, environmental statistics, fMRI and related fields. We assume the baseline region to be convex and estimate it via fitting a `stump' function to approximate $p$-values obtained from tests for deviation of the regression function from its baseline level. The estimates, obtained using an algorithm originally developed for constructing convex contours of a density, are studied in two different sampling settings, one where several responses can be obtained at a number of different covariate-levels (dose-response) and the other involving limited number of response values per covariate (standard regression). The shape of the baseline region and the smoothness of the regression function at its boundary play a critical role in determining the rate of convergence of our estimate: for a regression function which is `p-regular' at the boundary of the convex baseline region, our estimate converges at a rate $N^{2/(4p+3)}$ in the dose-response setting, $N$ being the total budget, and its analogue in the standard regression setting converges at a rate of $N^{1/(2p+2)}$. Extensions to non-convex baseline regions are explored as well.
  • ABSTRACT: Let (X, Y) be a random pair taking values in ℝ^d × J, where J ⊂ ℝ is assumed to be bounded. We propose a plug-in estimator of the level sets of the regression function r of Y on X, using a kernel estimator of r. We consider an error criterion defined by the volume of the symmetric difference between the true and estimated level sets. We establish the consistency of our estimator and obtain a rate of convergence equivalent to the one obtained by Cadre (2006) for density function level sets.
    Journal of the Korean Statistical Society 09/2013; 42(3):301–311. · 0.47 Impact Factor
  • ABSTRACT: This paper discusses a universal approach to the construction of confidence regions for level sets {h(x) ≥ 0} ⊂ ℝ^q of a function h of interest. The proposed construction is based on a plug-in estimate of the level sets using an appropriate estimate ĥ_n of h. The approach provides finite sample upper and lower confidence limits. This leads to generic conditions under which the constructed confidence regions achieve a prescribed coverage level asymptotically. The construction requires an estimate of the quantiles of the distribution of sup_{Δ_n} |ĥ_n(x) − h(x)| for appropriate sets Δ_n ⊂ ℝ^q. In contrast to related work from the literature, the existence of a weak limit for an appropriately normalized process {ĥ_n(x), x ∈ D} is not required. This adds significantly to the challenge of deriving asymptotic results for the corresponding coverage level. Our approach is exemplified in the case of a density level set utilizing a kernel density estimator and a bootstrap procedure.
    Journal of Multivariate Analysis 11/2013; 122. · 0.94 Impact Factor
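The plug-in strategy described in the citations above can be sketched in one dimension: estimate the regression function with a Nadaraya-Watson kernel smoother, then threshold the estimate at the desired level. This is a sketch under stated assumptions, not a reproduction of either paper's procedure; the helper names, the Gaussian kernel, and the fixed bandwidth are illustrative choices.

```python
import numpy as np

def nw_regression(X_train, y_train, x, h=0.1):
    """Nadaraya-Watson estimate of r(x) with a Gaussian kernel of
    bandwidth h: a locally weighted average of the responses."""
    w = np.exp(-0.5 * ((X_train - x) / h) ** 2)
    return (w * y_train).sum() / w.sum()

def plug_in_level_set(X_train, y_train, grid, level, h=0.1):
    """Plug-in estimate of {x : r(x) >= level}, evaluated on a grid:
    threshold the kernel estimate r_hat at the given level."""
    r_hat = np.array([nw_regression(X_train, y_train, g, h) for g in grid])
    return grid[r_hat >= level]

# Toy 1-D example: r(x) = x on [0, 1], so the true level set
# {x : r(x) >= 0.5} is the interval [0.5, 1].
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 300)
y = X + rng.normal(0, 0.05, 300)
grid = np.linspace(0, 1, 101)
est_set = plug_in_level_set(X, y, grid, level=0.5, h=0.05)
```

The error criterion used in the second citation, the volume of the symmetric difference, would here be the total length of grid cells where the estimated set and [0.5, 1] disagree.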

