Conference Paper

Learning message-passing inference machines for structured prediction

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
DOI: 10.1109/CVPR.2011.5995724 Conference: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011
Source: IEEE Xplore

ABSTRACT

Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node over a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
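To make the abstract's idea concrete, the sketch below treats inference as a fixed number of rounds in which a trained predictor re-estimates each node's label distribution from its local features and the averaged beliefs of its neighbors, with one predictor trained per round on labeled graphs. It is a minimal illustration only: the helper names, the neighbor-averaging context, and the use of scikit-learn's LogisticRegression are assumptions, not the authors' implementation.

    # Minimal sketch of a message-passing inference machine (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def neighbor_context(beliefs, neighbors, n_labels):
        # Average the current label beliefs of a node's neighbors (uniform if isolated).
        if not neighbors:
            return np.full(n_labels, 1.0 / n_labels)
        return np.mean([beliefs[j] for j in neighbors], axis=0)

    def run_inference(predictors, features, adjacency, n_labels):
        # Apply one predictor per round, sequentially re-estimating every node's label distribution.
        beliefs = np.full((features.shape[0], n_labels), 1.0 / n_labels)
        for clf in predictors:
            for i in range(features.shape[0]):
                x = np.concatenate([features[i], neighbor_context(beliefs, adjacency[i], n_labels)])
                beliefs[i] = clf.predict_proba(x[None, :])[0]
        return beliefs

    def train_inference_machine(graphs, n_labels, n_rounds=3):
        # graphs: list of (node features, adjacency as neighbor lists, node labels); assumes every
        # label occurs in training so predict_proba returns n_labels columns.
        predictors = []
        for _ in range(n_rounds):
            X, y = [], []
            for feats, adj, labels in graphs:
                beliefs = run_inference(predictors, feats, adj, n_labels)  # beliefs after earlier rounds
                for i in range(feats.shape[0]):
                    X.append(np.concatenate([feats[i], neighbor_context(beliefs, adj[i], n_labels)]))
                    y.append(labels[i])
            predictors.append(LogisticRegression(max_iter=500).fit(np.asarray(X), np.asarray(y)))
        return predictors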

Full-text preview available from: cmu.edu
  • Source
    • "Some early works that advocated gradient backpropagation through graphical model inference for parameter optimization include [35] [8] [22] and [17]. Our work differentiates from the above works since, to our knowledge, we are the first to propose and conduct a thorough experimental investigation of higher order potentials that are based on detection outputs and superpixel segmentation , in a CRF which is learned end-to-end in a deep network. "
    ABSTRACT: We tackle the problem of semantic segmentation using deep learning techniques. Most semantic segmentation systems include a Conditional Random Field (CRF) model to produce a structured output that is consistent with visual features of the image. With recent advances in deep learning, it is becoming increasingly common to perform CRF inference within a deep neural network to facilitate joint learning of the CRF with a pixel-wise Convolutional Neural Network (CNN) classifier. While basic CRFs use only unary and pairwise potentials, it has been shown that the addition of higher order potentials defined on cliques with more than two nodes can result in a better segmentation outcome. In this paper, we show that two types of higher order potential, namely, object detection based potentials and superpixel based potentials, can be included in a CRF embedded within a deep network. We design these higher order potentials to allow inference with the efficient and differentiable mean-field algorithm, making it possible to implement our CRF model as a stack of layers in a deep network. As a result, all parameters of our richer CRF model can be jointly learned with a CNN classifier during the end-to-end training of the entire network. We find significant improvement in the results with the introduction of these trainable higher order potentials.
    Full-text · Article · Nov 2015
  • Source
    • "Indeed, we observe empirically that our model is superior to (Zheng et al., 2015). Besides the approach of (Zheng et al., 2015) and (Schwing & Urtasun, 2015), there are many other works that consider the idea of backpropagation with a so-called unrolled CRF-inference scheme, such as (Domke, 2013; Kiefel & Gehler, 2014; Barbu, 2009; Ross et al., 2011; Stoyanov et al., 2011; Tompson et al., 2014; Liu et al., 2015). These inference steps mostly correspond to message passing operations of e.g. "
    ABSTRACT: Deep models, such as Convolutional Neural Networks (CNNs), are omnipresent in computer vision, as are structured models, such as Conditional Random Fields (CRFs). Combining them brings many advantages, foremost the ability to explicitly model the dependencies between output variables (CRFs) while exploiting the representational power of CNNs. In this work we present a CRF model where factors depend on CNNs. Our main contribution is a joint, maximum likelihood-based, learning procedure for all model parameters. Previous work either concentrated on training in pieces or on joint learning of restricted model families, such as Gaussian CRFs or CRFs with only a few variables. We empirically observe that our model is superior to prior art for scenarios where repulsive factors are necessary. In particular, we demonstrate this for Kinect-based body part labeling.
    Full-text · Article · Nov 2015
  • Source
    • "In [2], the fields of experts [27] MRF model was discriminatively trained for image denoising by unfolding a fixed number of gradient descent inference steps. In [26], message-passing inference machines were trained for structured prediction tasks by considering the belief propagation-based inference of a discrete graphical model as a sequence of predictors. In [13], a feed-forward sparse code predictor was trained by unfolding a coordinate descent based sparse coding inference algorithm. "
    ABSTRACT: We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative denoising methods that train a separate model for each noise level, the proposed deep network explicitly models the input noise variance and hence is capable of handling a range of noise levels. Our deep network, which we refer to as deep GCRF network, consists of two sub-networks: (i) a parameter generation network that generates the pairwise potential parameters based on the noisy input image, and (ii) an inference network whose layers perform the computations involved in an iterative GCRF inference procedure. We train the entire deep GCRF network (both parameter generation and inference networks) discriminatively in an end-to-end fashion by maximizing the peak signal-to-noise ratio measure. Experiments on the Berkeley segmentation and PASCAL VOC datasets show that the proposed deep GCRF network outperforms state-of-the-art image denoising approaches for several noise levels.
    Full-text · Article · Nov 2015
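Several of the citing excerpts above build on the related idea of unrolling a fixed number of CRF inference steps (mean-field or message passing) into differentiable layers, so that all model parameters can be trained end-to-end by backpropagation. The toy module below sketches that pattern for mean-field updates of a pairwise model; the class, tensor shapes, and learnable compatibility matrix are illustrative assumptions and do not reproduce the architecture of any cited paper.

    # Toy unrolled mean-field inference written as differentiable layers (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UnrolledMeanField(nn.Module):
        def __init__(self, n_labels, n_iters=5):
            super().__init__()
            self.n_iters = n_iters
            # Learnable label-compatibility matrix playing the role of the pairwise potential.
            self.compat = nn.Parameter(torch.eye(n_labels))

        def forward(self, unary, adjacency):
            # unary: (n_nodes, n_labels) scores from, e.g., a CNN; adjacency: (n_nodes, n_nodes) edge weights.
            q = F.softmax(unary, dim=-1)                          # initial marginals
            for _ in range(self.n_iters):
                msg = adjacency @ q                               # aggregate neighbor marginals
                q = F.softmax(unary - msg @ self.compat, dim=-1)  # one mean-field update
            return q

    # Because every operation above is differentiable, gradients from a loss on q flow back
    # into both the compatibility matrix and whatever network produced the unary scores.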