Conference Paper

Learning message-passing inference machines for structured prediction

Stéphane Ross, Daniel Munoz, Martial Hebert, and J. Andrew Bagnell
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
DOI: 10.1109/CVPR.2011.5995724 Conference: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011
Source: IEEE Xplore


Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure itself as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node of a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
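
As a reading aid, the sketch below illustrates the inference-machine idea from the abstract: each message-passing round is replaced by a trained predictor that maps a node's own features, concatenated with an aggregate of its neighbours' current label beliefs, to a new label distribution, and the same fixed sequence of predictors is then applied to new graphs. The data layout, the logistic-regression predictor, and the mean aggregation of neighbour beliefs are illustrative assumptions, not the paper's exact training procedure.

```python
# Illustrative sketch only: a sequence of per-round classifiers that
# mimics synchronous message passing. Data layout and classifier choice
# are assumptions, not the paper's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_inference_machine(graphs, labels, n_classes, n_rounds=3):
    """graphs: list of (node_features, adjacency) pairs, with
    node_features of shape (n_nodes, d) and adjacency of shape
    (n_nodes, n_nodes); labels: list of integer arrays (n_nodes,).
    Assumes every class appears in the training data."""
    predictors = []
    # Round 0: every node starts from a uniform label belief.
    beliefs = [np.full((f.shape[0], n_classes), 1.0 / n_classes)
               for f, _ in graphs]
    for _ in range(n_rounds):
        inputs, targets = [], []
        for (feats, adj), lab, bel in zip(graphs, labels, beliefs):
            # "Message" into each node: mean belief of its neighbours.
            deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
            msg = (adj @ bel) / deg
            inputs.append(np.hstack([feats, msg]))
            targets.append(lab)
        clf = LogisticRegression(max_iter=500)
        clf.fit(np.vstack(inputs), np.concatenate(targets))
        predictors.append(clf)
        # Beliefs for the next round come from the predictor just trained.
        beliefs = [clf.predict_proba(x) for x in inputs]
    return predictors

def predict(predictors, feats, adj, n_classes):
    # Test-time inference is just a forward pass through the same
    # fixed sequence of predictors.
    bel = np.full((feats.shape[0], n_classes), 1.0 / n_classes)
    for clf in predictors:
        deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
        bel = clf.predict_proba(np.hstack([feats, (adj @ bel) / deg]))
    return bel.argmax(axis=1)
```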

Cited in:
    • "Indeed, we observe empirically that our model is superior to (Zheng et al., 2015). Besides the approach of (Zheng et al., 2015) and (Schwing & Urtasun, 2015), there are many other works that consider the idea of backpropagation with a so-called unrolled CRF-inference scheme, such as (Domke, 2013; Kiefel & Gehler, 2014; Barbu, 2009; Ross et al., 2011; Stoyanov et al., 2011; Tompson et al., 2014; Liu et al., 2015). These inference steps mostly correspond to message passing operations of e.g. "
    ABSTRACT: Deep models such as Convolutional Neural Networks (CNNs) are omnipresent in computer vision, as are structured models such as Conditional Random Fields (CRFs). Combining them brings many advantages, foremost the ability to explicitly model dependencies between output variables (via the CRF) while exploiting the representational power of CNNs. In this work we present a CRF model whose factors depend on CNNs. Our main contribution is a joint, maximum-likelihood-based learning procedure for all model parameters. Previous work either concentrated on training in pieces or on joint learning of restricted model families, such as Gaussian CRFs or CRFs with only a few variables. We empirically observe that our model is superior to prior art for scenarios where repulsive factors are necessary. In particular, we demonstrate this for Kinect-based body part labeling.
    • "In [2], the fields of experts [27] MRF model was discriminatively trained for image denoising by unfolding a fixed number of gradient descent inference steps. In [26], message-passing inference machines were trained for structured prediction tasks by considering the belief propagation-based inference of a discrete graphical model as a sequence of predictors. In [13], a feed-forward sparse code predictor was trained by unfolding a coordinate descent based sparse coding inference algorithm. "
    ABSTRACT: We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative denoising methods that train a separate model for each noise level, the proposed deep network explicitly models the input noise variance and hence is capable of handling a range of noise levels. Our deep network, which we refer to as the deep GCRF network, consists of two sub-networks: (i) a parameter generation network that generates the pairwise potential parameters based on the noisy input image, and (ii) an inference network whose layers perform the computations involved in an iterative GCRF inference procedure. We train the entire deep GCRF network (both parameter generation and inference networks) discriminatively in an end-to-end fashion by maximizing the peak signal-to-noise ratio measure. Experiments on the Berkeley segmentation and PASCAL VOC datasets show that the proposed deep GCRF network outperforms state-of-the-art image denoising approaches for several noise levels.
    (A minimal sketch of this unrolled-inference idea appears after this citation list.)
    • "To our knowledge no such model has been successfully used for the problem of detecting and localizing body part positions of humans in images. Recently, Ross et al. [26] use a message-passing inspired procedure for structured prediction on computer vision tasks, such as 3D point cloud classification and 3D surface estimation from single images. In contrast to this work, we formulate our message-parsing inspired network in a way that is more amenable to back-propagation and so can be implemented in existing neural networks. "
    ABSTRACT: This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
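
Several of the quoted passages above describe "unrolling" an iterative inference procedure into a fixed number of feed-forward steps and training through them (the unrolled CRF-inference schemes in the first entry, and the inference network of the deep GCRF model in the second). The toy sketch below, whose inputs are placeholders of our own rather than any of these papers' code, unrolls a few gradient-descent steps on a Gaussian CRF energy E(x) = 0.5 x^T A x - b^T x; treating each step as a layer keeps the computation differentiable in A and b, which is what makes end-to-end training through inference possible.

```python
# Toy sketch of unrolled inference (illustrative only). Each loop
# iteration plays the role of one network "layer"; in a deep GCRF
# network, A and b would be produced by a learned parameter network.
import numpy as np

def unrolled_gcrf_inference(A, b, n_steps=5, eta=0.1):
    """Approximately minimise E(x) = 0.5 * x^T A x - b^T x with a
    fixed number of gradient steps (the exact minimiser is solve(A, b))."""
    x = np.zeros_like(b)
    for _ in range(n_steps):
        x = x - eta * (A @ x - b)  # gradient of the quadratic energy
    return x
```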
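
The third entry combines ConvNet part detectors with an MRF-style spatial model over body joints. The fragment below is only a rough illustration of that coupling: each part's unary heatmap is reweighted by neighbouring parts' heatmaps convolved with pairwise offset kernels. The part graph, kernel contents, and function names are hypothetical placeholders, not Tompson et al.'s actual (jointly trained) spatial model.

```python
# Rough illustration only: reweight each part's ConvNet heatmap by its
# neighbours' heatmaps pushed through assumed pairwise spatial kernels.
import numpy as np
from scipy.signal import convolve2d

def spatial_refine(heatmaps, pairwise_kernels, neighbours):
    """heatmaps: {part: 2D array of unary scores from the ConvNet};
    pairwise_kernels: {(part, nbr): 2D kernel encoding where `part`
    tends to lie relative to `nbr`};
    neighbours: {part: list of neighbouring part names}."""
    refined = {}
    for part, unary in heatmaps.items():
        log_belief = np.log(unary + 1e-6)
        for nbr in neighbours[part]:
            support = convolve2d(heatmaps[nbr],
                                 pairwise_kernels[(part, nbr)], mode="same")
            log_belief += np.log(support + 1e-6)
        refined[part] = np.exp(log_belief)
    return refined
```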