Conference Paper
Learning message-passing inference machines for structured prediction
Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
DOI: 10.1109/CVPR.2011.5995724 Conference: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011 Source: IEEE Xplore
ABSTRACT
Nearly every structured prediction problem in computer vision requires approximate inference due to the large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node of a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
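The "sequence of predictors" view from the abstract can be sketched in a few lines: every node repeatedly re-predicts its label distribution from its local features plus an aggregate of its neighbours' current beliefs. The toy predictor, feature dimensions, weight values, and mean aggregation below are illustrative assumptions, not the paper's actual learned predictors.

```python
import numpy as np

def toy_predictor(x, msg):
    # Hypothetical per-node predictor: softmax of a linear score over the
    # node's 2-D features concatenated with the 2-D aggregated neighbour
    # message. A trained inference machine would learn this mapping.
    W = np.array([[ 2.0, -2.0],
                  [-2.0,  2.0],
                  [ 1.0, -1.0],
                  [-1.0,  1.0]])          # (4, 2): 2 feature dims + 2 message dims
    z = np.concatenate([x, msg]) @ W
    e = np.exp(z - z.max())
    return e / e.sum()

def inference_machine(features, neighbours, predictor, n_labels=2, n_rounds=3):
    # Message-passing as a composition of predictors: on each round,
    # re-predict every node's label distribution from its local features
    # plus the mean of its neighbours' current beliefs.
    n = len(features)
    beliefs = np.full((n, n_labels), 1.0 / n_labels)   # uniform start
    for _ in range(n_rounds):
        updated = np.empty_like(beliefs)
        for i in range(n):
            nbrs = neighbours[i]
            msg = beliefs[nbrs].mean(axis=0) if nbrs else beliefs[i]
            updated[i] = predictor(features[i], msg)
        beliefs = updated
    return beliefs
```

At training time the predictors would be fit so that, given the beliefs produced by earlier rounds, each round's output matches the ground-truth labeling; here the predictor is fixed purely for illustration.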
Full-text preview available from: cmu.edu
 "Some early works that advocated gradient backpropagation through graphical model inference for parameter optimization include [35], [8], [22] and [17]. Our work differs from the above works since, to our knowledge, we are the first to propose and conduct a thorough experimental investigation of higher-order potentials that are based on detection outputs and superpixel segmentation, in a CRF which is learned end-to-end in a deep network."
ABSTRACT: We tackle the problem of semantic segmentation using deep learning techniques. Most semantic segmentation systems include a Conditional Random Field (CRF) model to produce a structured output that is consistent with visual features of the image. With recent advances in deep learning, it is becoming increasingly common to perform CRF inference within a deep neural network to facilitate joint learning of the CRF with a pixel-wise Convolutional Neural Network (CNN) classifier. While basic CRFs use only unary and pairwise potentials, it has been shown that the addition of higher-order potentials defined on cliques with more than two nodes can result in a better segmentation outcome. In this paper, we show that two types of higher-order potential, namely object detection based potentials and superpixel based potentials, can be included in a CRF embedded within a deep network. We design these higher-order potentials to allow inference with the efficient and differentiable mean-field algorithm, making it possible to implement our CRF model as a stack of layers in a deep network. As a result, all parameters of our richer CRF model can be jointly learned with a CNN classifier during end-to-end training of the entire network. We find significant improvement in the results with the introduction of these trainable higher-order potentials.
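The mean-field-as-layers idea in this abstract can be illustrated with a minimal NumPy sketch: each mean-field iteration becomes one differentiable "layer" of message aggregation, a label-compatibility transform, and a softmax renormalisation. The dense (all-pairs, unweighted) message approximation and the fixed compatibility matrix are simplifying assumptions for illustration, not the paper's model.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def meanfield_layers(unary, compat, n_iters=5):
    # unary:  (N, L) logits from a pixel-wise classifier (e.g. a CNN)
    # compat: (L, L) label-compatibility matrix; in the end-to-end setting
    #         this would be learned, here it is fixed for illustration
    q = softmax(unary)                           # initialise with unary beliefs
    for _ in range(n_iters):                     # each iteration = one "layer"
        msg = q.sum(axis=0, keepdims=True) - q   # messages from all other nodes
        q = softmax(unary - msg @ compat)        # compatibility transform + renormalise
    return q
```

Because every step is composed of matrix products and softmaxes, gradients flow through the whole stack, which is what lets the CRF parameters train jointly with the CNN.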
 "Indeed, we observe empirically that our model is superior to (Zheng et al., 2015). Besides the approach of (Zheng et al., 2015) and (Schwing & Urtasun, 2015), there are many other works that consider the idea of backpropagation with a so-called unrolled CRF-inference scheme, such as (Domke, 2013; Kiefel & Gehler, 2014; Barbu, 2009; Ross et al., 2011; Stoyanov et al., 2011; Tompson et al., 2014; Liu et al., 2015). These inference steps mostly correspond to message passing operations of e.g. "
ABSTRACT: Deep models, such as Convolutional Neural Networks (CNNs), are omnipresent in computer vision, as are structured models, such as Conditional Random Fields (CRFs). Combining them brings many advantages, foremost the ability to explicitly model the dependencies between output variables (CRFs) while exploiting the power of CNNs. In this work we present a CRF model whose factors are dependent on CNNs. Our main contribution is a joint, maximum-likelihood-based learning procedure for all model parameters. Previous work either concentrated on training-in-pieces, or on joint learning of restricted model families, such as Gaussian CRFs or CRFs with only a few variables. We empirically observe that our model is superior to prior art for scenarios where repulsive factors are necessary. In particular, we demonstrate this for Kinect-based body part labeling.
 "In [2], the fields of experts [27] MRF model was discriminatively trained for image denoising by unfolding a fixed number of gradient descent inference steps. In [26], message-passing inference machines were trained for structured prediction tasks by considering the belief-propagation-based inference of a discrete graphical model as a sequence of predictors. In [13], a feed-forward sparse code predictor was trained by unfolding a coordinate-descent-based sparse coding inference algorithm."
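The unfolding trick this excerpt describes can be sketched with plain ISTA for sparse coding: a fixed number of gradient-plus-shrinkage steps becomes a feed-forward pass whose per-step matrices could, in principle, be trained (as in the LISTA line of work referenced as [13]). Keeping the matrices tied to the dictionary D, as below, is an illustrative simplification.

```python
import numpy as np

def soft_threshold(v, t):
    # Element-wise shrinkage operator used by ISTA.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def unrolled_ista(D, y, n_steps=20, lam=0.1):
    # Unfold ISTA for  min_z 0.5*||D z - y||^2 + lam*||z||_1  into a fixed
    # number of feed-forward steps. In a trained unfolding the per-step
    # matrices would be learned; here they stay tied to D.
    L = np.linalg.eigvalsh(D.T @ D).max()    # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_steps):
        z = soft_threshold(z - (D.T @ (D @ z - y)) / L, lam / L)
    return z
```

With the step count fixed, the whole computation is a differentiable function of its matrices, which is exactly what makes discriminative "unrolled inference" training possible.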
ABSTRACT: We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative denoising methods that train a separate model for each noise level, the proposed deep network explicitly models the input noise variance and is hence capable of handling a range of noise levels. Our deep network, which we refer to as the deep GCRF network, consists of two sub-networks: (i) a parameter generation network that generates the pairwise potential parameters based on the noisy input image, and (ii) an inference network whose layers perform the computations involved in an iterative GCRF inference procedure. We train the entire deep GCRF network (both parameter generation and inference networks) discriminatively in an end-to-end fashion by maximizing the peak signal-to-noise ratio measure. Experiments on the Berkeley segmentation and PASCAL VOC datasets show that the proposed deep GCRF network outperforms state-of-the-art image denoising approaches for several noise levels.
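The inference sub-network's job can be grounded in the underlying math: MAP inference in a Gaussian CRF is a linear solve. The closed-form sketch below assumes a hypothetical quadratic energy with a given pairwise precision matrix W; in the actual deep GCRF network the solve is approximated by unrolled iterative layers and W comes from the parameter-generation sub-network.

```python
import numpy as np

def gcrf_denoise(y, W, sigma2):
    # MAP estimate for the (assumed) Gaussian CRF energy
    #   E(x) = ||x - y||^2 / (2 * sigma2) + 0.5 * x^T W x,
    # whose minimiser satisfies (I/sigma2 + W) x = y/sigma2.
    # y: noisy signal (flattened), sigma2: noise variance,
    # W: symmetric positive semi-definite pairwise precision matrix.
    n = y.size
    A = np.eye(n) / sigma2 + W
    return np.linalg.solve(A, y / sigma2)
```

Note how the noise variance enters the solve explicitly, which is what lets a single model cover a range of noise levels: larger sigma2 trusts the data term less and the pairwise smoothing more.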