C. Rother

University College London, London, ENG, United Kingdom

Publications (22) · 10.63 Total impact

  • Article: Fast Cost-Volume Filtering for Visual Correspondence and Beyond.
    ABSTRACT: Many computer vision tasks can be formulated as labeling problems. The desired solution is often a spatially smooth labeling where label transitions are aligned with color edges of the input image. We show that such solutions can be efficiently achieved by smoothing the label costs with a very fast edge-preserving filter. In this paper we propose a generic and simple framework comprising three steps: (i) Constructing a cost volume; (ii) Fast cost volume filtering; and (iii) Winner-Takes-All label selection. Our main contribution is to show that with such a simple framework state-of-the-art results can be achieved for several computer vision applications. In particular, we achieve (i) disparity maps in real-time, whose quality exceeds those of all other fast (local) approaches on the Middlebury stereo benchmark, and (ii) optical flow fields which contain very fine structures as well as large displacements. To demonstrate robustness, the few parameters of our framework are set to nearly identical values for both applications. Also, competitive results for interactive image segmentation are presented. With this work, we hope to inspire other researchers to leverage this framework to other application areas.
    IEEE Transactions on Software Engineering 07/2012; · 1.98 Impact Factor
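
The three steps named in the abstract above (cost volume, filtering, Winner-Takes-All) can be sketched for stereo as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain box filter stands in for the fast edge-preserving filter, the matching cost is a simple absolute intensity difference, and the names `box_filter` and `disparity_wta` are hypothetical.

```python
import numpy as np


def box_filter(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window, using an integral image.

    Assumption: this non-edge-preserving filter is only a stand-in for the
    edge-preserving filter the abstract refers to.
    """
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading zero row and column.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def disparity_wta(left, right, max_disp, radius=1):
    """Return an integer disparity map for two grayscale images."""
    left = left.astype(float)
    right = right.astype(float)
    h, w = left.shape

    # Step (i): cost volume, one slice per candidate disparity label.
    cost = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        shifted = right.copy()
        if d > 0:
            shifted[:, d:] = right[:, :w - d]
            shifted[:, :d] = right[:, :1]  # replicate the border column
        cost[d] = np.abs(left - shifted)

    # Step (ii): filter each cost slice independently.
    filtered = np.stack([box_filter(c, radius) for c in cost])

    # Step (iii): Winner-Takes-All, pick the cheapest label per pixel.
    return filtered.argmin(axis=0)
```

Swapping `box_filter` for an edge-preserving filter guided by the input image would be the natural next step toward the behaviour the abstract describes; the rest of the pipeline would stay the same.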
  • Conference Proceeding: Object cosegmentation
    ABSTRACT: Cosegmentation is typically defined as the task of jointly segmenting “something similar” in a given set of images. Existing methods are too generic and so far have not demonstrated competitive results for any specific task. In this paper we overcome this limitation by adding two new aspects to cosegmentation: (1) the “something” has to be an object, and (2) the “similarity” measure is learned. In this way, we are able to achieve excellent results on the recently introduced iCoseg dataset, which contains small sets of images of either the same object instance or similar objects of the same class. The challenge of this dataset lies in the extreme changes in viewpoint, lighting, and object deformations within each set. We are able to considerably outperform several competitors. To achieve this performance, we borrow recent ideas from object recognition: the use of powerful features extracted from a pool of candidate object-like segmentations. We believe that our work will be beneficial to several application areas, such as image retrieval.
    Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on; 07/2011
  • Article: Fusion Moves for Markov Random Field Optimization
    ABSTRACT: The efficient application of graph cuts to Markov Random Fields (MRFs) with multiple discrete or continuous labels remains an open question. In this paper, we demonstrate one possible way of achieving this by using graph cuts to combine pairs of suboptimal labelings or solutions. We call this combination process the fusion move. By employing recently developed graph-cut-based algorithms (so-called QPBO-graph cut), the fusion move can efficiently combine two proposal labelings in a theoretically sound way, which is in practice often globally optimal. We demonstrate that fusion moves generalize many previous graph-cut approaches, which allows them to be used as building blocks within a broader variety of optimization schemes than were considered before. In particular, we propose new optimization schemes for computer vision MRFs with applications to image restoration, stereo, and optical flow, among others. Within these schemes the fusion moves are used 1) for the parallelization of MRF optimization into several threads, 2) for fast MRF optimization by combining cheap-to-compute solutions, and 3) for the optimization of highly nonconvex continuous-labeled MRFs with 2D labels. Our final example is a nonvision MRF concerned with cartographic label placement, where fusion moves can be used to improve the performance of a standard inference method (loopy belief propagation).
    IEEE Transactions on Pattern Analysis and Machine Intelligence 09/2010; · 4.91 Impact Factor
  • Conference Proceeding: Joint optimization of segmentation and appearance models
    ABSTRACT: Many interactive image segmentation approaches use an objective function which includes appearance models as an unknown variable. Since the resulting optimization problem is NP-hard the segmentation and appearance are typically optimized separately, in an EM-style fashion. One contribution of this paper is to express the objective function purely in terms of the unknown segmentation, using higher-order cliques. This formulation reveals an interesting bias of the model towards balanced segmentations. Furthermore, it enables us to develop a new dual decomposition optimization procedure, which provides additionally a lower bound. Hence, we are able to improve on existing optimizers, and verify that for a considerable number of real world examples we even achieve global optimality. This is important since we are able, for the first time, to analyze the deficiencies of the model. Another contribution is to establish a property of a particular dual decomposition approach which involves convex functions depending on foreground area. As a consequence, we show that the optimal decomposition for our problem can be computed efficiently via a parametric maxflow algorithm.
    Computer Vision, 2009 IEEE 12th International Conference on; 11/2009
  • Conference Proceeding: A global perspective on MAP inference for low-level vision
    ABSTRACT: In recent years the Markov Random Field (MRF) has become the de facto probabilistic model for low-level vision applications. However, in a maximum a posteriori (MAP) framework, MRFs inherently encourage delta function marginal statistics. By contrast, many low-level vision problems have heavy tailed marginal statistics, making the MRF model unsuitable. In this paper we introduce a more general Marginal Probability Field (MPF), of which the MRF is a special, linear case, and show that convex energy MPFs can be used to encourage arbitrary marginal statistics. We introduce a flexible, extensible framework for effectively optimizing the resulting NP-hard MAP problem, based around dual-decomposition and a modified min-cost flow algorithm, and which achieves global optimality in some instances. We use a range of applications, including image denoising and texture synthesis, to demonstrate the benefits of this class of MPF over MRFs.
    Computer Vision, 2009 IEEE 12th International Conference on; 11/2009
  • Conference Proceeding: Image segmentation with a bounding box prior
    ABSTRACT: User-provided object bounding box is a simple and popular interaction paradigm considered by many existing interactive image segmentation frameworks. However, these frameworks tend to exploit the provided bounding box merely to exclude its exterior from consideration and sometimes to initialize the energy minimization. In this paper, we discuss how the bounding box can be further used to impose a powerful topological prior, which prevents the solution from excessive shrinking and ensures that the user-provided box bounds the segmentation in a sufficiently tight way. The prior is expressed using hard constraints incorporated into the global energy minimization framework leading to an NP-hard integer program. We then investigate the possible optimization strategies including linear relaxation as well as a new graph cut algorithm called pinpointing. The latter can be used either as a rounding method for the fractional LP solution, which is provably better than thresholding-based rounding, or as a fast standalone heuristic. We evaluate the proposed algorithms on a publicly available dataset, and demonstrate the practical benefits of the new prior both qualitatively and quantitatively.
    Computer Vision, 2009 IEEE 12th International Conference on; 11/2009
  • Conference Proceeding: New appearance models for natural image matting
    D. Singaraju, C. Rother, C. Rhemann
    ABSTRACT: Image matting is the task of estimating a fore- and background layer from a single image. To solve this ill-posed problem, an accurate modeling of the scene's appearance is necessary. Existing methods that provide a closed-form solution to this problem assume that the colors of the foreground and background layers are locally linear. In this paper, we show that such models can overfit when the colors of the two layers are locally constant. We derive new closed-form expressions in such cases, and show that our models are more compact than existing ones. In particular, the null space of our cost function is a subset of the null space constructed by existing approaches. We discuss the bias towards specific solutions for each formulation. Experiments on synthetic and real data confirm that our compact models estimate alpha mattes more accurately than existing techniques, without the need for additional user interaction.
    Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on; 07/2009
  • Conference Proceeding: Graph cut based image segmentation with connectivity priors
    ABSTRACT: Graph cut is a popular technique for interactive image segmentation. However, it has certain shortcomings. In particular, graph cut has problems with segmenting thin elongated objects due to the “shrinking bias”. To overcome this problem, we propose to impose an additional connectivity prior, which is a very natural assumption about objects. We formulate several versions of the connectivity constraint and show that the corresponding optimization problems are all NP-hard. For some of these versions we propose two optimization algorithms: (i) a practical heuristic technique which we call DijkstraGC, and (ii) a slow method based on problem decomposition which provides a lower bound on the problem. We use the second technique to verify that for some practical examples DijkstraGC is able to find the global minimum.
    Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on; 07/2008
  • Conference Proceeding: High resolution matting via interactive trimap segmentation
    ABSTRACT: We present a new approach to the matting problem which splits the task into two steps: interactive trimap extraction followed by trimap-based alpha matting. By doing so we gain considerably in terms of speed and quality and are able to deal with high resolution images. This paper has three contributions: (i) a new trimap segmentation method using parametric max-flow; (ii) an alpha matting technique for high resolution images with a new gradient preserving prior on alpha; (iii) a database of 27 ground truth alpha mattes of still objects, which is considerably larger than previous databases and also of higher quality. The database is used to train our system and to validate that both our trimap extraction and our matting method improve on state-of-the-art techniques.
    Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on; 07/2008
  • Conference Proceeding: Applications of parametric maxflow in computer vision
    V. Kolmogorov, Y. Boykov, C. Rother
    ABSTRACT: The maximum flow algorithm for minimizing energy functions of binary variables has become a standard tool in computer vision. In many cases, unary costs of the energy depend linearly on a parameter lambda. In this paper we study vision applications for which it is important to solve the maxflow problem for different lambdas. An example is a weighting between data and regularization terms in image segmentation or stereo: it is desirable to vary it both during training (to learn lambda from ground truth data) and testing (to select the best lambda using high-knowledge constraints, e.g. user input). We review algorithmic aspects of this parametric maximum flow problem previously unknown in vision, such as the ability to compute all breakpoints of lambda and corresponding optimal configurations in finite time. These results allow, in particular, the minimization of the ratio of some geometric functionals, such as flux of a vector field over length (or area). Previously, such functionals were tackled with shortest path techniques applicable only in 2D. We give theoretical improvements for "PDE cuts" [5]. We present experimental results for image segmentation, 3D reconstruction, and the cosegmentation problem.
    Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on; 11/2007
  • Conference Proceeding: LogCut - Efficient Graph Cut Optimization for Markov Random Fields
    V. Lempitsky, C. Rother, A. Blake
    ABSTRACT: Markov Random Fields (MRFs) are ubiquitous in low-level computer vision. In this paper, we propose a new approach to the optimization of multi-labeled MRFs. Similarly to α-expansion it is based on iterative application of binary graph cut. However, the number of binary graph cuts required to compute a labelling grows only logarithmically with the size of label space, instead of linearly. We demonstrate that for applications such as optical flow, image restoration, and high resolution stereo, this gives an order of magnitude speed-up, for comparable energies. Iterations are performed by "fusion" of solutions, done with QPBO which is related to graph cut but can deal with non-submodularity. At convergence, the method achieves optima on a par with the best competitors, and sometimes even exceeds them.
    Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on; 11/2007
  • Chapter: Fusion of Stereo, Colour and Contrast
    ABSTRACT: Stereo vision has numerous applications in robotics, graphics, inspection and other areas. A prime application, one which has driven work on stereo in our laboratory, is teleconferencing in which the use of a stereo webcam already makes possible various transformations of the video stream. These include digital camera control, insertion of virtual objects, background substitution, and eye-gaze correction [9, 8].
    05/2007; pages 295-304
  • Article: Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming
    ABSTRACT: A new algorithm is proposed for efficient stereo and novel view synthesis. Given the video streams acquired by two synchronized cameras the proposed algorithm synthesises images from a virtual camera in arbitrary position near the physical cameras. The new technique is based on an improved, dynamic-programming, stereo algorithm for efficient novel view generation. The two main contributions of this paper are: (i) a new four state matching graph for dense stereo dynamic programming, that supports accurate occlusion labelling; (ii) a compact geometric derivation for novel view synthesis by direct projection of the minimum cost surface. Furthermore, the paper presents an algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker); and a cost aggregation algorithm that acts directly in the three-dimensional matching cost space. The proposed algorithm has been designed to work with input images with large disparity range, a common practical situation. The enhanced occlusion handling capabilities of the new dynamic programming algorithm are evaluated against those of the most powerful state-of-the-art dynamic programming and graph-cut techniques. Four-state DP is also evaluated against the disparity-based Middlebury error metrics and its performance found to be amongst the best of the efficient algorithms. A number of examples demonstrate the robustness of four-state DP to artefacts in stereo video streams. This includes demonstrations of cyclopean view synthesis in extended conversational sequences, synthesis from a freely translating virtual camera and, finally, basic 3D scene editing.
    International Journal of Computer Vision 12/2006; 71(1):89-110. · 3.74 Impact Factor
  • Conference Proceeding: Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs
    ABSTRACT: We introduce the term cosegmentation which denotes the task of segmenting simultaneously the common parts of an image pair. A generative model for cosegmentation is presented. Inference in the model leads to minimizing an energy with an MRF term encoding spatial coherency and a global constraint which attempts to match the appearance histograms of the common parts. This energy has not been proposed previously and its optimization is challenging and NP-hard. For this problem a novel optimization scheme which we call trust region graph cuts is presented. We demonstrate that this framework has the potential to improve a wide range of research: Object driven image retrieval, video tracking and segmentation, and interactive image editing. The power of the framework lies in its generality, the common part can be a rigid/non-rigid object (or scene), observed from different viewpoints or even similar objects of the same class.
    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on; 07/2006
  • Conference Proceeding: Bi-layer segmentation of binocular stereo video
    ABSTRACT: This paper describes two algorithms capable of real-time segmentation of foreground from background layers in stereo video sequences. Automatic separation of layers from colour/contrast or from stereo alone is known to be error-prone. Here, colour, contrast and stereo matching information are fused to infer layers accurately and efficiently. The first algorithm, layered dynamic programming (LDP), solves stereo in an extended 6-state space that represents both foreground/background layers and occluded regions. The stereo-match likelihood is then fused with a contrast-sensitive colour model that is learned on the fly, and stereo disparities are obtained by dynamic programming. The second algorithm, layered graph cut (LGC), does not directly solve stereo. Instead the stereo match likelihood is marginalised over foreground and background hypotheses, and fused with a contrast-sensitive colour model like the one used in LDP. Segmentation is solved efficiently by ternary graph cut. Both algorithms are evaluated with respect to ground truth data and found to have similar performance, substantially better than stereo or colour/contrast alone. However, their characteristics with respect to computational efficiency are rather different. The algorithms are demonstrated in the application of background substitution and shown to give good quality composite video output.
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on; 07/2005
  • Conference Proceeding: Digital tapestry [automatic image synthesis]
    ABSTRACT: This paper addresses the novel problem of automatically synthesizing an output image from a large collection of different input images. The synthesized image, called a digital tapestry, can be viewed as a visual summary or a virtual 'thumbnail' of all the images in the input collection. The problem of creating the tapestry is cast as a multi-class labeling problem such that each region in the tapestry is constructed from input image blocks that are salient and such that neighboring blocks satisfy spatial compatibility. This is formulated using a Markov random field and optimized via the graph cut based expansion move algorithm. The standard expansion move algorithm can only handle energies with metric terms, while our energy contains non-metric (soft and hard) constraints. Therefore we propose two novel contributions. First, we extend the expansion move algorithm for energy functions with non-metric hard constraints. Secondly, we modify it for functions with "almost" metric soft terms, and show that it gives good results in practice. The proposed framework was tested on several consumer photograph collections, and the results are presented.
    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on; 07/2005
  • Article: Bayesian Color Constancy Revisited
    ABSTRACT: Computational color constancy is the task of estimating the true reflectances of visible surfaces in an image. In this paper we follow a line of research that assumes uniform illumination of a scene, and that the principal step in estimating reflectances is the estimation of the scene illuminant. We review recent approaches to illuminant estimation, firstly those based on formulae for normalisation of the reflectance distribution in an image — so-called grey-world algorithms, and those based on a Bayesian formulation of image formation. In evaluating these previous approaches we introduce a new tool in the form of a database of 568 high-quality, indoor and outdoor images, accurately labelled with illuminant, and preserved in their raw form, free of correction or normalisation. This has enabled us to establish several properties experimentally. Firstly automatic selection of grey-world algorithms according to image properties is not nearly so effective as has been thought. Secondly, it is shown that Bayesian illuminant estimation is significantly improved by the improved accuracy of priors for illuminant and reflectance that are obtained from the new dataset.
    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 1-8 (2008).
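
For readers unfamiliar with the grey-world family mentioned in the abstract above, its simplest member can be sketched as below. It assumes that the average reflectance in a scene is achromatic, so the mean image colour serves as the illuminant estimate. This is a generic textbook illustration, not the procedure evaluated in the paper; the function names and the diagonal correction step are assumptions made for the example.

```python
import numpy as np


def grey_world_illuminant(image):
    """Estimate the illuminant colour of an H x W x 3 linear RGB image.

    Grey-world assumption: the scene's mean reflectance is grey, so the
    per-channel image mean is proportional to the illuminant.
    """
    mean_rgb = image.reshape(-1, 3).mean(axis=0)
    return mean_rgb / np.linalg.norm(mean_rgb)


def apply_diagonal_correction(image, illuminant):
    """Divide out the illuminant per channel (a diagonal correction).

    Gains are scaled so that a neutral illuminant leaves the image unchanged.
    """
    gains = illuminant.mean() / illuminant
    return image * gains
```

After correction, an image that satisfies the grey-world assumption has equal channel means.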
  • Article: A stereo approach that handles the matting problem via image warping
    ABSTRACT: We propose an algorithm that simultaneously extracts disparities and alpha matting information given a stereo image pair. Our method divides the reference image into a set of overlapping, partially transparent color segments. Each segment pixel is assigned an alpha value and color. The disparity inside the segment is modeled via a plane. The goodness of alphas, colors and disparity planes is measured by a new energy function. Its basic idea is to use the three parameters for generating artificial views representing the left and right images. If alphas, colors and disparity planes are correct, these artificial images should be very similar to the real ones. For generating the artificial right view, we warp all pixels of the left into the geometry of the right image using the disparity planes. We introduce the assumption of constant solidity in order to correctly model how pixels' alpha values are affected by the warping operation. Experimental results on the Middlebury set show that our algorithm gives good results in comparison to the state-of-the-art in stereo matching.
    2012 IEEE Conference on Computer Vision and Pattern Recognition.
  • Article: Learning an interactive segmentation system
    ABSTRACT: Many successful applications of computer vision to image or video manipulation are interactive by nature. However, parameters of such systems are often trained neglecting the user. Traditionally, interactive systems have been treated in the same manner as their fully automatic counterparts. Their performance is evaluated by computing the accuracy of their solutions under some fixed set of user interactions. This paper proposes a new evaluation and learning method which brings the user in the loop. It is based on the use of an active robot user -- a simulated model of a human user. We show how this approach can be used to evaluate and learn parameters of state-of-the-art interactive segmentation systems. We also show how simulated user models can be integrated into the popular max-margin method for parameter learning and propose an algorithm to solve the resulting optimisation problem.
    Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP 2010), 274-281 (2010).
  • Article: Bayesian methods in graphics
    ABSTRACT: Within this talk I shall describe some of the work I have been involved with at Microsoft Research Cambridge on Bayesian methods. In particular I will cover the application of Bayesian methods to certain problems relating to the field of Computer Graphics. Bayesian methods provide a rational way of making inference about problems; including learning parameters that are so often set by hand, and the incorporation of prior knowledge. The particular problems I shall address are (a) image cut out, (b) new view synthesis, (c) motion capture of articulated objects (e.g. hands) for driving avatars, (d) general 3D reconstruction. In all of these problems I hope to demonstrate that the Bayesian approach leads to new insights and methodologies that improve on existing methods.