Conference Paper

Multi-task classification with infinite local experts.

Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC
DOI: 10.1109/ICASSP.2009.4959897 Conference: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, 19-24 April 2009, Taipei, Taiwan
Source: IEEE Xplore

ABSTRACT We propose a multi-task learning (MTL) framework for non-linear classification, based on an infinite set of local experts in feature space. The usage of local experts enables sharing at the expert-level, encouraging the borrowing of information even if tasks are similar only in subregions of feature space. A kernel stick-breaking process (KSBP) prior is imposed on the underlying distribution of class labels, so that the number of experts is inferred in the posterior and thus model selection issues are avoided. The MTL is implemented by imposing a Dirichlet process (DP) prior on a layer above the task-dependent KSBPs.

0 Bookmarks
 · 
93 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: In most machine learning approaches, it is usually assumed that data are complete. When data are partially missing due to various reasons, for example, the failure of a subset of sensors, image corruption or inadequate medical measurements, many learning methods designed for complete data cannot be directly applied. In this dissertation we treat two kinds of problems with incomplete data using non-parametric Bayesian approaches: classification with incomplete features and analysis of low-rank matrices with missing entries.Incomplete data in classification problems are handled by assuming input features to be generated from a mixture-of-experts model, with each individual expert (classifier) defined by a local Gaussian in feature space. With a linear classifier associated with each Gaussian component, nonlinear classification boundaries are achievable without the introduction of kernels. Within the proposed model, the number of components is theoretically ``infinite'' as defined by a Dirichlet process construction, with the actual number of mixture components (experts) needed inferred based upon the data under test. With a higher-level DP we further extend the classifier for analysis of multiple related tasks (multi-task learning), where model components may be shared across tasks. Available data could be augmented by this way of information transfer even when tasks are only similar in some local regions of feature space, which is particularly critical for cases with scarce incomplete training samples from each task. The proposed algorithms are implemented using efficient variational Bayesian inference and robust performance is demonstrated on synthetic data, benchmark data sets, and real data with natural missing values.Another scenario of interest is to complete a data matrix with entries missing. The recovery of missing matrix entries is not possible without additional assumptions on the matrix under test, and here we employ the common assumption that the matrix is low-rank. Unlike methods with a preset fixed rank, we propose a non-parametric Bayesian alternative based on the singular value decomposition (SVD), where missing entries are handled naturally, and the number of underlying factors is imposed to be small and inferred in the light of observed entries. Although we assume missing at random, the proposed model is generalized to incorporate auxiliary information including missingness features. We also make a first attempt in the matrix-completion community to acquire new entries actively. By introducing a probit link function, we are able to handle counting matrices with the decomposed low-rank matrices latent. The basic model and its extensions are validated onsynthetic data, a movie-rating benchmark and a new data set presented for the first time. Dissertation
    01/2010;

Full-text (3 Sources)

Download
35 Downloads
Available from
May 19, 2014