Filter Forests for Learning Data-Dependent Convolutional Kernels

Conference Paper · June 2014 · 246 Reads
DOI: 10.1109/CVPR.2014.221
Conference: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We propose 'filter forests' (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context. We show that FF compares favorably to both Markov random field based and recently proposed regression forest based approaches for labeling problems in terms of efficiency and accuracy. In particular, we demonstrate how FF can be used to learn optimal denoising filters for natural images as well as for other tasks such as depth image refinement and 1D signal magnitude estimation. Numerous experiments and quantitative comparisons show that FFs achieve accuracy on par with or superior to recent state-of-the-art techniques, while being several orders of magnitude faster.
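The abstract describes learning a filtering kernel per data point, conditioned on the observation's local context. The core idea can be sketched with a toy one-level "filter tree": route each pixel by a context feature (here, local patch variance) and apply the kernel stored at that leaf. The split threshold and the two kernels below are hand-set illustrative stand-ins for quantities a real filter forest would learn from training data; this is not the paper's trained model.

```python
import numpy as np

def make_patches(img, k=3):
    """Extract one k x k patch per pixel from an edge-padded image."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    patches = np.empty((h, w, k, k))
    for i in range(h):
        for j in range(w):
            patches[i, j] = padded[i:i + k, j:j + k]
    return patches

def filter_tree_denoise(img, split_thresh, kernels, k=3):
    """Route each pixel by patch variance (the split function) and
    convolve it with the kernel stored at the chosen leaf."""
    patches = make_patches(img, k)
    variance = patches.reshape(*img.shape, -1).var(axis=-1)
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            leaf = 0 if variance[i, j] < split_thresh else 1
            out[i, j] = (patches[i, j] * kernels[leaf]).sum()
    return out

# Hand-set leaf kernels: strong smoothing for flat regions,
# near-identity for high-variance (edge) regions.
smooth = np.full((3, 3), 1.0 / 9.0)
identity = np.zeros((3, 3)); identity[1, 1] = 1.0
noisy = np.random.default_rng(0).normal(0.5, 0.1, (8, 8))
restored = filter_tree_denoise(noisy, split_thresh=0.02,
                               kernels=[smooth, identity])
```

A real filter forest replaces the single hand-set split with many learned split functions per tree and fits the leaf kernels by regularized least squares on training pairs.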


    • "While it is quite easy to get a huge database of color image examples, e.g. from the web, there exists no equivalent source for depth data. One workaround [9, 23] is to densely reconstruct a 3D scene with KinectFusion [19] and facilitate these reconstructions as ground-truth. However, this also introduces artifacts in the training data, such as smoothed edges and the loss of fine details. "
    ABSTRACT: In this paper we present a novel method to increase the spatial resolution of depth images. We combine a deep fully convolutional network with a non-local variational method in a deep primal-dual network. The joint network computes a noise-free, high-resolution estimate from a noisy, low-resolution input depth map. Additionally, a high-resolution intensity image is used to guide the reconstruction in the network. By unrolling the optimization steps of a first-order primal-dual algorithm and formulating it as a network, we can train our joint method end-to-end. This not only enables us to learn the weights of the fully convolutional network, but also to optimize all parameters of the variational method and its optimization procedure. The training of such a deep network requires a large dataset for supervision. Therefore, we generate high-quality depth maps and corresponding color images with a physically based renderer. In an exhaustive evaluation we show that our method outperforms the state-of-the-art on multiple benchmarks.
    Article · Jul 2016
    • "The computational architecture of GPC is similar to decision forests [10]. Decision trees have been widely used in various fields of computer vision, such as pose estimation [28], image denoising [14], image classification [5], object detection [21], depth estimation [13, 15], etc. However, unlike all these applications, our method does not require classification or regression labels. "
    ABSTRACT: This paper proposes a novel extremely efficient, fully-parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches or their geometric or photometric transformed versions fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other, i.e., fall in the same empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state-of-the-art methods that are much more complex.
    Conference Paper · Jun 2016
    • "Random forests [6] is a well-known decision tree based classifier ensemble. It has been widely used in many computer vision problems recently, such as image denoising [8], edge detection [7], image [37] and body parts [39] classification, pose estimation [16], object tracking [40], etc. In our work, the random forest has been strategically designed to decompose the multi-modal inter-camera transformation into simple and independent uni-modal transforms. "
    Conference Paper · Mar 2016
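The Global Patch Collider abstract above matches points by finding pairs that uniquely collide in the leaves of learned trees. The collision rule itself is simple and can be sketched as follows; the split functions and 2-D "patch descriptors" below are hand-set stand-ins for the learned trees and patch features of the actual method.

```python
import numpy as np

def patch_hash(feat, tree):
    """Traverse a toy tree given as a list of (dimension, threshold)
    split functions; the sequence of binary decisions is the leaf code."""
    code = 0
    for dim, thresh in tree:
        code = (code << 1) | int(feat[dim] > thresh)
    return code

def unique_collisions(feats_a, feats_b, forest):
    """Return (i, j) pairs where point i of image A and point j of
    image B are the only two points landing in some tree's leaf."""
    matches = set()
    for tree in forest:
        leaves = {}
        for i, p in enumerate(feats_a):
            leaves.setdefault(patch_hash(p, tree), ([], []))[0].append(i)
        for j, p in enumerate(feats_b):
            leaves.setdefault(patch_hash(p, tree), ([], []))[1].append(j)
        for a_pts, b_pts in leaves.values():
            if len(a_pts) == 1 and len(b_pts) == 1:
                matches.add((a_pts[0], b_pts[0]))
    return matches

# Hand-set tree; four distinct descriptors, identical in both "images",
# so each point occupies its own leaf and every collision is unique.
tree = [(0, 0.5), (1, 0.5)]
feats = np.array([[0.1, 0.9], [0.9, 0.1], [0.2, 0.2], [0.8, 0.8]])
matches = unique_collisions(feats, feats, [tree])
```

Because hashing each point is independent of all others, the per-point tree traversal parallelizes trivially, which is the source of the efficiency claim in the abstract.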