Conference Paper

Multi-task classification with infinite local experts

Dept. of Electr. & Comput. Eng., Duke Univ., Durham, NC
DOI: 10.1109/ICASSP.2009.4959897 Conference: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, 19-24 April 2009, Taipei, Taiwan
Source: IEEE Xplore


We propose a multi-task learning (MTL) framework for non-linear classification, based on an infinite set of local experts in feature space. The use of local experts enables sharing at the expert level, encouraging the borrowing of information even when tasks are similar only in subregions of feature space. A kernel stick-breaking process (KSBP) prior is imposed on the underlying distribution of class labels, so that the number of experts is inferred in the posterior and model-selection issues are avoided. MTL is implemented by imposing a Dirichlet process (DP) prior on a layer above the task-dependent KSBPs.
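The stick-breaking construction behind such priors can be sketched as follows. This is an illustrative truncated plain stick-breaking draw, not the paper's KSBP: the KSBP additionally modulates the breaks with a kernel around expert locations in feature space, which is not reproduced here.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    v_k ~ Beta(1, alpha); pi_k = v_k * prod_{j<k} (1 - v_j).
    Smaller alpha concentrates mass on fewer components (experts).
    """
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to exactly 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=25, rng=rng)
```

In the nonparametric limit the truncation is removed and the number of experts actually used is inferred from the data, which is how model selection is avoided.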



Available from: David B Dunson
  • Source
    ABSTRACT: In most machine learning approaches, it is usually assumed that data are complete. When data are partially missing for various reasons, for example the failure of a subset of sensors, image corruption, or inadequate medical measurements, many learning methods designed for complete data cannot be directly applied. In this dissertation we treat two kinds of problems with incomplete data using non-parametric Bayesian approaches: classification with incomplete features, and analysis of low-rank matrices with missing entries.
    Incomplete data in classification problems are handled by assuming the input features are generated from a mixture-of-experts model, with each individual expert (classifier) defined by a local Gaussian in feature space. With a linear classifier associated with each Gaussian component, nonlinear classification boundaries are achievable without the introduction of kernels. Within the proposed model, the number of components is theoretically "infinite", as defined by a Dirichlet process construction, with the actual number of mixture components (experts) needed inferred from the data under test. With a higher-level DP we further extend the classifier to the analysis of multiple related tasks (multi-task learning), where model components may be shared across tasks. This form of information transfer augments the available data even when tasks are similar only in some local regions of feature space, which is particularly critical when the training samples for each task are scarce and incomplete. The proposed algorithms are implemented using efficient variational Bayesian inference, and robust performance is demonstrated on synthetic data, benchmark data sets, and real data with natural missing values.
    Another scenario of interest is completing a data matrix with missing entries. The recovery of missing matrix entries is not possible without additional assumptions about the matrix under test; here we employ the common assumption that the matrix is low-rank. Unlike methods with a preset fixed rank, we propose a non-parametric Bayesian alternative based on the singular value decomposition (SVD), in which missing entries are handled naturally and the number of underlying factors is constrained to be small and inferred in the light of the observed entries. Although we assume entries are missing at random, the proposed model is generalized to incorporate auxiliary information, including missingness features. We also make a first attempt within the matrix-completion community to acquire new entries actively. By introducing a probit link function, we are able to handle count matrices with the decomposed low-rank matrices treated as latent. The basic model and its extensions are validated on synthetic data, a movie-rating benchmark, and a new data set presented for the first time.
    Dissertation
    Preview · Article · Jan 2010
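The local-expert classifier described in the dissertation abstract above can be sketched roughly as follows. This is a hypothetical numpy illustration of the prediction step only, not the dissertation's variational Bayesian implementation: Gaussian components gate the experts, and each expert is a logistic-linear classifier. All variable names are assumptions for the sketch.

```python
import numpy as np

def predict_moe(x, means, covs, weights, w_lin, b_lin):
    """Soft-gated prediction from a mixture of linear experts.

    Each expert k has a local Gaussian (means[k], covs[k]) in feature space
    and a linear classifier (w_lin[k], b_lin[k]).  The gate is the posterior
    responsibility of each Gaussian component at x; the shared (2*pi)^(d/2)
    normalizer cancels across components and is dropped.
    """
    K, d = means.shape
    log_resp = np.empty(K)
    for k in range(K):
        diff = x - means[k]
        cov_inv = np.linalg.inv(covs[k])
        logdet = np.linalg.slogdet(covs[k])[1]
        log_resp[k] = np.log(weights[k]) - 0.5 * (diff @ cov_inv @ diff) - 0.5 * logdet
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    # each expert outputs P(y = 1 | x) through a logistic link
    p_expert = 1.0 / (1.0 + np.exp(-(w_lin @ x + b_lin)))
    return float(resp @ p_expert)

# two experts with opposite linear rules, localized in different regions
means = np.array([[0.0, 0.0], [10.0, 10.0]])
covs = np.array([np.eye(2), np.eye(2)])
weights = np.array([0.5, 0.5])
w_lin = np.array([[5.0, 0.0], [-5.0, 0.0]])
b_lin = np.zeros(2)
p_near_first = predict_moe(np.array([1.0, 0.0]), means, covs, weights, w_lin, b_lin)
p_near_second = predict_moe(np.array([9.0, 10.0]), means, covs, weights, w_lin, b_lin)
```

Because the gate is local, each linear expert only has to be right in its own region, which is how globally nonlinear boundaries arise without kernels.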
  •
    ABSTRACT: Clustering is a fundamental topic in pattern recognition and machine learning research. Traditional clustering methods deal with a single clustering task on a single data set. In many real applications, however, multiple similar clustering tasks arise simultaneously, e.g., clustering the clients of different shopping websites, where data on different subjects are collected for each task. These tasks are cross-domain but closely related. It has been shown that the individual performance of each clustering task can be improved by appropriately exploiting the underlying relation. In this paper we propose a new approach that performs multiple related clustering tasks simultaneously through domain adaptation. A shared subspace is learned through domain adaptation, in which the gap between the tasks' distributions is reduced, and shared knowledge is transferred across all tasks by exploiting the strengthened relation in the learned subspace. The objective is then to find the best clustering in both the original and learned spaces. An alternating optimization method is introduced and its convergence is theoretically guaranteed. Experiments on both synthetic and real data sets demonstrate the effectiveness of the proposed approach.
    No preview · Article · Jan 2012 · Pattern Recognition
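The shared-subspace idea from the clustering abstract above can be sketched crudely. This is an illustrative stand-in under strong assumptions: here the "domain adaptation" step is simply an SVD of the pooled, mean-centred tasks, followed by k-means per task in the shared subspace; the paper's actual objective and alternating optimization are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def shared_subspace(tasks, dim):
    """Learn a shared low-dimensional subspace by pooling all tasks.

    Stand-in for the domain-adaptation step: the subspace is the span of
    the top right-singular vectors of the stacked, mean-centred task data.
    """
    pooled = np.vstack([X - X.mean(axis=0) for X in tasks])
    _, _, vt = np.linalg.svd(pooled, full_matrices=False)
    return vt[:dim].T  # (features, dim) projection matrix

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm, returning a label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def multitask_cluster(tasks, k, dim):
    """Cluster each task in the subspace shared across all tasks."""
    P = shared_subspace(tasks, dim)
    return [kmeans(X @ P, k) for X in tasks]

# two related tasks, each a pair of well-separated blobs in 5 dimensions
rng = np.random.default_rng(1)
offset = np.array([10.0, 0.0, 0.0, 0.0, 0.0])
task1 = np.vstack([rng.normal(0, 0.2, (10, 5)), rng.normal(0, 0.2, (10, 5)) + offset])
task2 = np.vstack([rng.normal(0, 0.2, (10, 5)), rng.normal(0, 0.2, (10, 5)) + offset])
all_labels = multitask_cluster([task1, task2], k=2, dim=2)
```

Projecting every task through the same subspace is what lets structure found in one task inform the clustering of another.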