Conference Paper

Learning message-passing inference machines for structured prediction

Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
DOI: 10.1109/CVPR.2011.5995724
Conference: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011
Source: IEEE Xplore

ABSTRACT Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node over a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
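
The idea of treating inference as a sequence of predictors, each producing a label distribution at every node from that node's features and its neighbors' current predictions, can be illustrated with a minimal sketch. This is not the authors' implementation. The names `run_inference_machine` and `make_toy_predictor`, the hand-set context weights, the synchronous update schedule, and the use of averaged neighbor distributions as context are all assumptions made for illustration. Training the predictors from labeled graphs is omitted entirely.

```python
import math
from typing import Callable, Dict, List, Sequence

Dist = List[float]  # a label distribution at one node
Predictor = Callable[[Sequence[float], Dist], Dist]


def make_toy_predictor(context_weight: float) -> Predictor:
    """Hypothetical stand-in for a trained predictor (weight is hand-set)."""
    def predict(node_scores: Sequence[float], context: Dist) -> Dist:
        # Combine the node's own per-label scores with the neighbor context,
        # then normalize with a numerically stable softmax.
        scores = [s + context_weight * c for s, c in zip(node_scores, context)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
    return predict


def run_inference_machine(
    features: Dict[str, Sequence[float]],
    neighbors: Dict[str, List[str]],
    predictors: Sequence[Predictor],
    num_labels: int,
) -> Dict[str, Dist]:
    """Apply one predictor per pass; each pass re-predicts every node."""
    uniform = [1.0 / num_labels] * num_labels
    beliefs = {v: list(uniform) for v in features}
    for predict in predictors:
        updated = {}
        for v, node_scores in features.items():
            nbrs = neighbors.get(v, [])
            if nbrs:
                # Assumed "message": the average of the neighbors' beliefs
                # from the previous pass (synchronous update).
                context = [
                    sum(beliefs[u][k] for u in nbrs) / len(nbrs)
                    for k in range(num_labels)
                ]
            else:
                context = list(uniform)
            updated[v] = predict(node_scores, context)
        beliefs = updated
    return beliefs


# Toy 3-node chain a - b - c with two labels. Node b has weak evidence for
# label 0, while its neighbors have strong evidence for label 1.
features = {"a": [0.0, 2.0], "b": [0.1, 0.0], "c": [0.0, 2.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
predictors = [make_toy_predictor(0.0), make_toy_predictor(2.0)]

beliefs = run_inference_machine(features, neighbors, predictors, num_labels=2)
print({v: [round(p, 3) for p in dist] for v, dist in beliefs.items()})
```

In this toy setup the first pass ignores context and the second pass uses it, so node b's prediction moves toward the label favored by its neighbors. In the approach the abstract describes, each pass would instead use a predictor trained on labeled graphs.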

13 Reads
  • Source
    • "To our knowledge no such model has been successfully used for the problem of detecting and localizing body part positions of humans in images. Recently, Ross et al. [26] use a message-passing inspired procedure for structured prediction on computer vision tasks, such as 3D point cloud classification and 3D surface estimation from single images. In contrast to this work, we formulate our message-parsing inspired network in a way that is more amenable to back-propagation and so can be implemented in existing neural networks. "
    ABSTRACT: This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
  • Source
    • "Decision trees have also been used for reducing the computational complexity of inference methods. Shapovalov et al. [22] recently showed how the inference machines framework proposed in [21] can be used in conjunction with decision forests to efficiently assign class labels to 3D points. Our work differs from these methods in that it does not require explicit inference, it is non-iterative and involves a simple (yet spatially-varying) convolution operation. "
    ABSTRACT: We propose 'filter forests' (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context. We show that FF compares favorably to both Markov random field based and recently proposed regression forest based approaches for labeling problems in terms of efficiency and accuracy. In particular, we demonstrate how FF can be used to learn optimal denoising filters for natural images as well as for other tasks such as depth image refinement, and 1D signal magnitude estimation. Numerous experiments and quantitative comparisons show that FFs achieve accuracy at par or superior to recent state of the art techniques, while being several orders of magnitude faster.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 06/2014
  • Source
    • "Typical approaches include randomly selecting elements to update, iterating over the structure in a fixed ordering, or simultaneously updating all predictions at all iterations. As shown by Ross et al. [20], this iterative decoding approach is equivalent to message passing approaches used to solve graphical models, where each update encodes a single set of messages passed to one node in the graphical model. "
    ABSTRACT: Structured prediction plays a central role in machine learning applications from computational biology to computer vision. These models require significantly more computation than unstructured models, and, in many applications, algorithms may need to make predictions within a computational budget or in an anytime fashion. In this work we propose an anytime technique for learning structured prediction that, at training time, incorporates both structural elements and feature computation trade-offs that affect test-time inference. We apply our technique to the challenging problem of scene understanding in computer vision and demonstrate efficient and anytime predictions that gradually improve towards state-of-the-art classification performance as the allotted time increases.