Conference Paper

Learning message-passing inference machines for structured prediction.

Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
DOI: 10.1109/CVPR.2011.5995724 Conference: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011
Source: IEEE Xplore

ABSTRACT Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node over a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
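
The idea of treating inference as a sequence of trained predictors can be illustrated with a small sketch. It is not the authors' implementation: the chain graph, the synthetic features, the softmax classifier used as each stage's predictor, and all function names (`train_machine`, `run_machine`, and so on) are assumptions made for illustration. Each stage receives a node's own features plus the average label distribution its neighbors received from the previous stage, and is trained to output the true label.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, steps=300, lr=0.5):
    """Multinomial logistic regression by gradient descent (stand-in predictor)."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        W -= lr * X.T @ (softmax(X @ W) - Y) / len(X)
    return W

def neighbor_beliefs(beliefs, edges):
    """Average the current label distributions of each node's neighbors."""
    agg = np.zeros_like(beliefs)
    deg = np.zeros(len(beliefs))
    for i, j in edges:
        agg[i] += beliefs[j]
        agg[j] += beliefs[i]
        deg[i] += 1
        deg[j] += 1
    return agg / np.maximum(deg, 1)[:, None]

def stage_input(feats, beliefs, edges):
    """Node features, incoming 'messages' from neighbors, and a bias column."""
    bias = np.ones((len(feats), 1))
    return np.hstack([feats, neighbor_beliefs(beliefs, edges), bias])

def train_machine(feats, labels, edges, n_classes, n_stages=3):
    """Train one predictor per stage; each stage sees the previous stage's output."""
    beliefs = np.full((len(feats), n_classes), 1.0 / n_classes)  # uniform start
    stages = []
    for _ in range(n_stages):
        X = stage_input(feats, beliefs, edges)
        W = fit_softmax(X, labels, n_classes)
        stages.append(W)
        beliefs = softmax(X @ W)
    return stages

def run_machine(stages, feats, edges, n_classes):
    """Apply the trained stages in order to label the nodes of a graph."""
    beliefs = np.full((len(feats), n_classes), 1.0 / n_classes)
    for W in stages:
        beliefs = softmax(stage_input(feats, beliefs, edges) @ W)
    return beliefs

# Hypothetical toy data: a 40-node chain whose two halves carry different labels.
rng = np.random.default_rng(0)
n_nodes, n_classes = 40, 2
labels = (np.arange(n_nodes) >= n_nodes // 2).astype(int)
feats = labels[:, None] + rng.normal(scale=0.8, size=(n_nodes, 1))
edges = [(i, i + 1) for i in range(n_nodes - 1)]

stages = train_machine(feats, labels, edges, n_classes)
beliefs = run_machine(stages, feats, edges, n_classes)
print("predicted labels:", beliefs.argmax(axis=1))
```

In this sketch each stage is fitted on predictions made for the same data the previous stage was trained on, which can make later stages overconfident in their inputs; a more careful version would generate those intermediate predictions on held-out data.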

  • Source
    ABSTRACT: We propose 'filter forests' (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context. We show that FF compares favorably to both Markov random field based and recently proposed regression forest based approaches for labeling problems in terms of efficiency and accuracy. In particular, we demonstrate how FF can be used to learn optimal denoising filters for natural images as well as for other tasks such as depth image refinement and 1D signal magnitude estimation. Numerous experiments and quantitative comparisons show that FFs achieve accuracy on par with or superior to recent state-of-the-art techniques, while being several orders of magnitude faster.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 06/2014
  • Source
    ABSTRACT: This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
  • Source
    ABSTRACT: The availability of cheap and effective depth sensors has resulted in recent advances in human pose estimation and tracking. Detailed estimation of hand pose, however, remains a challenge since fingers are often occluded and may occupy only a few pixels. Moreover, labelled data is difficult to obtain. We propose a deep learning-based approach for hand pose estimation, targeting gesture recognition, that requires very little labelled data. It leverages both unlabelled data and synthetic data from renderings. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabelled real-world samples significantly improves results compared to a purely supervised setting.
    Asian Conference on Computer Vision (ACCV), Singapore; 11/2014
