Conference Paper

Binary patterns for shape description in RGB-D object registration

... In this paper, we tackle the challenge of defining an efficient local descriptor and a repeatable keypoint detector. The former is addressed by generalizing the Shape Binary Patterns (SBP) that were first proposed in [37]. The original design of the SBP descriptor was oriented to segmented objects. ...
... The underlying problem in this approach is the need to find the N points in a specific position on the sphere of radius R. In general, the inherent occlusions of RGB-D images (and 3D models) would render this task impossible. To overcome this problem, in [37] we proposed the Shape Binary Pattern (SBP), a binary descriptor that encodes the presence of points in the local neighborhood of p_c. That version included an automatic radius calculation for the local neighborhood of p_c that worked for the specific task of object registration, when the object had been previously segmented. ...
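As a rough illustration of the occupancy idea described in these excerpts, the following sketch (our own simplification, not the authors' SBP implementation) divides the sphere of radius R around a center point p_c into angular and radial cells and sets one bit per cell whenever at least one neighbor falls inside it. The bin counts and the radius are hypothetical parameters.

```python
import numpy as np

def sbp_like_descriptor(cloud, p_c, radius=0.05,
                        n_azimuth=8, n_elevation=4, n_radial=2):
    """Binary occupancy pattern around p_c (illustrative sketch, not the
    original SBP): one bit per (azimuth, elevation, radial) cell, set to 1
    if any neighbor of p_c falls inside that cell."""
    d = cloud - p_c                                # offsets to the center point
    dist = np.linalg.norm(d, axis=1)
    mask = (dist > 1e-9) & (dist <= radius)        # neighbors inside the sphere
    d, dist = d[mask], dist[mask]

    azimuth = np.arctan2(d[:, 1], d[:, 0])                       # (-pi, pi]
    elevation = np.arcsin(np.clip(d[:, 2] / dist, -1.0, 1.0))    # [-pi/2, pi/2]

    a = np.minimum((azimuth + np.pi) / (2 * np.pi) * n_azimuth, n_azimuth - 1).astype(int)
    e = np.minimum((elevation + np.pi / 2) / np.pi * n_elevation, n_elevation - 1).astype(int)
    r = np.minimum(dist / radius * n_radial, n_radial - 1).astype(int)

    bits = np.zeros(n_azimuth * n_elevation * n_radial, dtype=np.uint8)
    bits[(a * n_elevation + e) * n_radial + r] = 1  # occupancy -> bit set
    return bits

# toy usage: random cloud, descriptor at its first point
cloud = np.random.rand(500, 3) * 0.2
print(sbp_like_descriptor(cloud, cloud[0]))
```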
Article
Full-text available
Many of the research problems in robot vision involve the detection of keypoints, areas with salient information in the input images, and the generation of local descriptors, which encode relevant information for such keypoints. Computer vision solutions have recently relied on Deep Learning techniques, which make extensive use of the computational capabilities available. In autonomous robots, these capabilities are usually limited and, consequently, images cannot be processed adequately. For this reason, some robot vision tasks still benefit from a more classic approach based on keypoint detectors and local descriptors. In 2D images, the use of binary representations for visual tasks has shown that, with lower computational requirements, they can obtain a performance comparable to classic real-valued techniques. However, these achievements have not been fully translated to 3D images, where research is mainly focused on real-valued approaches. Thus, in this paper, we propose a keypoint detector and local descriptor based on 3D binary patterns. The experiments demonstrate that our proposal is competitive against state-of-the-art techniques, while its processing can be performed more efficiently.
Conference Paper
Detection of keypoints in an image is a crucial step in most registration and recognition tasks. The information encoded in RGB-D images can be redundant and, usually, only specific areas in the image are useful for the classification process. The process of identifying those relevant areas is known as keypoint detection. The use of keypoints can facilitate the following stages of the image processing pipeline by reducing the search space. To properly represent an image by means of a set of keypoints, properties like repeatability and distinctiveness have to be fulfilled. In this work, we propose a keypoint detection technique based on the Shape Binary Pattern (SBP) descriptor that can be computed from RGB-D images. Next, we rely on this method to identify the most discriminative patterns, which are used to detect the most relevant keypoints. Experiments on a well-known benchmark for 3D keypoint detection have been performed to assess our proposal.
Conference Paper
Full-text available
In this paper, we present a novel and original framework for computing Local Binary Pattern (LBP)-like patterns on a triangular mesh manifold. This framework, dubbed mesh-LBP, can be adapted to all the LBP variants employed in 2D image analysis. As such, it allows extending the related techniques to mesh surfaces. First, we describe the foundations, the construction and the features of the mesh-LBP. In the experiments, we first show evidence of the presence of the "uniformity" aspect in the mesh-LBP patterns. Then, we show the mesh-LBP repeatability across different instances of the same objects, and report the application of mesh-LBP to the problem of 3D texture classification in comparison to standard 3D surface descriptors.
Conference Paper
Full-text available
Several visual feature extraction algorithms have recently appeared in the literature, with the goal of reducing the computational complexity of state-of-the-art solutions (e.g., SIFT and SURF). Therefore, it is necessary to evaluate the performance of these emerging visual descriptors in terms of processing time, repeatability and matching accuracy, and whether they can obtain competitive performance in applications such as image retrieval. This paper aims to provide an up-to-date, detailed, clear, and complete evaluation of local feature detectors and descriptors, focusing on methods that were designed with complexity constraints, providing a much needed reference for researchers in this field. Our results demonstrate that recent feature extraction algorithms, e.g., BRISK and ORB, offer competitive performance at much lower complexity and can be efficiently used in low-power devices.
Conference Paper
Full-text available
Local feature detectors and descriptors are widely used in many computer vision applications and various methods have been proposed during the past decade. There have been a number of evaluations focused on various aspects of local features, matching accuracy in particular; however, there have been no comparisons considering the accuracy and speed trade-offs of recent extractors such as BRIEF, BRISK, ORB, MRRID, MROGH and LIOP. This paper provides a performance evaluation of recent feature detectors and compares their matching precision and speed in a randomized kd-trees setup, as well as an evaluation of binary descriptors with efficient computation of the Hamming distance.
Conference Paper
Full-text available
Neuroimaging has shown great potential for the computer-aided diagnosis (CAD) of both Alzheimer's disease (AD) and Mild Cognitive Impairment (MCI). However, the texture of such images has been little explored. In this paper, we explore the discriminative power of the Local Binary Patterns (LBPs) texture descriptor to diagnose AD and MCI from 3D brain images. For this purpose, we propose a novel extension of LBPs to full 3D data that, unlike previous approaches, makes no approximations to the underlying concepts of uniformity and rotation invariance. Experimental results obtained using FDG-PET images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) showed that the new feature was able to improve the system's performance when compared to the raw Voxel Intensities (VI) and to the standard 2D LBP version applied to axial cuts of the PET volume.
Conference Paper
Full-text available
A large number of vision applications rely on matching keypoints across images. The last decade featured an arms-race towards faster and more robust keypoints and association algorithms: Scale Invariant Feature Transform (SIFT), Speed-up Robust Feature (SURF), and more recently Binary Robust Invariant Scalable Keypoints (BRISK) to name a few. These days, the deployment of vision algorithms on smart phones and embedded devices with low memory and computation complexity has even upped the ante: the goal is to make descriptors faster to compute, more compact while remaining robust to scale, rotation and noise. To best address the current requirements, we propose a novel keypoint descriptor inspired by the human visual system and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. Our experiments show that FREAKs are in general faster to compute with lower memory load and also more robust than SIFT, SURF or BRISK. They are thus competitive alternatives to existing keypoints in particular for embedded applications.
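The retinal sampling idea can be sketched as follows: sampling points lie on concentric rings whose radius, and hence receptive-field (smoothing) size, grows with eccentricity, and the descriptor is later formed from pairwise comparisons of the smoothed intensities. The ring counts, radii and smoothing factors below are illustrative, not the constants of the published FREAK pattern.

```python
import numpy as np

def retina_like_pattern(n_rings=7, points_per_ring=6, r_min=2.0, growth=1.4):
    """Sampling points on concentric rings; radius and receptive-field size
    grow with eccentricity (illustrative values, not FREAK's constants)."""
    pts, sigmas = [(0.0, 0.0)], [1.0]              # central point, small smoothing
    for i in range(n_rings):
        r = r_min * growth ** i
        offset = (np.pi / points_per_ring) * (i % 2)   # stagger alternate rings
        for k in range(points_per_ring):
            a = 2 * np.pi * k / points_per_ring + offset
            pts.append((r * np.cos(a), r * np.sin(a)))
            sigmas.append(0.5 * r)                 # coarser smoothing further out
    return np.array(pts), np.array(sigmas)

pts, sigmas = retina_like_pattern()
print(pts.shape, sigmas.min(), sigmas.max())       # 43 sampling points in this toy setup
```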
Article
Full-text available
In this article, we describe efficient methods for tackling everyday mobile manipulation tasks that require object pick-up. In order to achieve real-time performance in complex environments, we focus our approach on fast yet robust solutions. For 3D perception of objects on planar surfaces, we develop scene segmentation methods that process depth images in real-time at high frame rates. We efficiently plan feasible, collision-free grasps for the segmented objects directly from the perceived point clouds to achieve fast execution times. We evaluate our approaches quantitatively in lab experiments and also report on the successful integration of our methods in public demonstrations at RoboCup@Home competitions in 2011 and 2012.
Conference Paper
Full-text available
This paper presents a new approach for recognition of 3D objects that are represented as 3D point clouds. We introduce a new 3D shape descriptor called Intrinsic Shape Signature (ISS) to characterize a local/semi-local region of a point cloud. An intrinsic shape signature uses a view-independent representation of the 3D shape to match shape patches from different views directly, and a view-dependent transform encoding the viewing geometry to facilitate fast pose estimation. In addition, we present a highly efficient indexing scheme for the high dimensional ISS shape descriptors, allowing for fast and accurate search of large model databases. We evaluate the performance of the proposed algorithm on a very challenging task of recognizing different vehicle types using a database of 72 models in the presence of sensor noise, obscuration and scene clutter.
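The detection side of ISS is commonly summarized as an eigenvalue-ratio test on the scatter matrix of each point's neighborhood; the sketch below follows that common formulation (the radius and thresholds are illustrative, not taken from the paper).

```python
import numpy as np

def iss_saliency(cloud, idx, radius=0.05, gamma21=0.975, gamma32=0.975):
    """Eigenvalue-ratio test used by ISS-style detectors: keep a point if
    lambda2/lambda1 < gamma21 and lambda3/lambda2 < gamma32, and use lambda3
    as its saliency (sketch; radius and thresholds are illustrative)."""
    p = cloud[idx]
    d = cloud - p
    nb = cloud[np.linalg.norm(d, axis=1) <= radius]
    if len(nb) < 5:
        return None                                    # too few neighbors
    cov = np.cov((nb - nb.mean(axis=0)).T)             # 3x3 scatter matrix
    l3, l2, l1 = np.sort(np.linalg.eigvalsh(cov))      # ascending -> l1 >= l2 >= l3
    if l2 / (l1 + 1e-12) < gamma21 and l3 / (l2 + 1e-12) < gamma32:
        return l3                                      # saliency of a candidate keypoint
    return None

cloud = np.random.rand(2000, 3) * 0.2
print(iss_saliency(cloud, 0))
```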
Conference Paper
Full-text available
Our purpose is to extend the Local Binary Pattern method to three dimensions and compare it with the two-dimensional model for three-dimensional texture analysis. To compare these two methods, we made classification experiments using three databases of three-dimensional texture images having different properties. The first database is a set of three-dimensional images without any distortion or transformation, the second contains additional Gaussian noise. The last one contains similar textures as the first one but with random rotations about the x, y and z axes. For each of these databases, the three-dimensional Local Binary Pattern method outperforms the two-dimensional approach, which has more difficulty providing correct classifications.
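A minimal way to extend LBP from 2D pixels to 3D voxels, in the spirit of this comparison, is to threshold the 26-neighborhood of each voxel against its center value; the sketch below does exactly that and deliberately ignores uniformity and rotation invariance.

```python
import numpy as np
from itertools import product

def lbp3d_code(volume, x, y, z):
    """26-neighborhood 3D LBP code of voxel (x, y, z): one bit per neighbor,
    set when the neighbor is >= the center value (simplified sketch)."""
    center = volume[x, y, z]
    code, bit = 0, 0
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        if dx == dy == dz == 0:
            continue                                  # skip the center voxel
        if volume[x + dx, y + dy, z + dz] >= center:
            code |= 1 << bit
        bit += 1
    return code                                       # integer in [0, 2**26)

vol = np.random.rand(8, 8, 8)
print(lbp3d_code(vol, 4, 4, 4))
```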
Conference Paper
Full-text available
This paper deals with local 3D descriptors for surface matching. First, we categorize existing methods into two classes: Signatures and Histograms. Then, by discussion and experiments alike, we point out the key issues of uniqueness and repeatability of the local reference frame. Based on these observations, we formulate a novel comprehensive proposal for surface representation, which encompasses a new unique and repeatable local reference frame as well as a new 3D descriptor. The latter lies at the intersection between Signatures and Histograms, so as to possibly achieve a better balance between descriptiveness and robustness. Experiments on publicly available datasets as well as on range scans obtained with Spacetime Stereo provide a thorough validation of our proposal.
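The repeatable local reference frame discussed here is typically built from an eigendecomposition of a distance-weighted covariance of the neighborhood, with eigenvector signs disambiguated by the majority of neighbor projections. The sketch below follows that recipe in simplified form and is not the exact construction proposed in the paper.

```python
import numpy as np

def local_reference_frame(neighbors, p, radius):
    """Simplified repeatable LRF: distance-weighted covariance around p,
    eigenvectors ordered by decreasing eigenvalue, signs flipped so that
    most neighbors project positively on the x and z axes (sketch only;
    assumes the neighbors were gathered within `radius` of p)."""
    d = neighbors - p
    w = np.clip(radius - np.linalg.norm(d, axis=1), 0.0, None)  # closer points weigh more
    cov = (d * w[:, None]).T @ d / max(w.sum(), 1e-12)
    evals, evecs = np.linalg.eigh(cov)             # ascending eigenvalues
    x_axis, z_axis = evecs[:, 2], evecs[:, 0]      # largest / smallest spread
    if np.sum(d @ x_axis >= 0) < len(d) / 2:       # sign disambiguation
        x_axis = -x_axis
    if np.sum(d @ z_axis >= 0) < len(d) / 2:
        z_axis = -z_axis
    y_axis = np.cross(z_axis, x_axis)              # complete a right-handed frame
    return np.stack([x_axis, y_axis, z_axis])

nb = np.random.rand(200, 3) * 0.04
print(local_reference_frame(nb, nb.mean(axis=0), radius=0.05))
```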
Conference Paper
Full-text available
Motivated by the increasing availability of 3D sensors capable of delivering both shape and texture information, this paper presents a novel descriptor for feature matching in 3D data enriched with texture. The proposed approach stems from the theory of a recently proposed descriptor for 3D data which relies on shape only, and represents its generalization to the case of multiple cues associated with a 3D mesh. The proposed descriptor, dubbed CSHOT, is demonstrated to notably improve the accuracy of feature matching in challenging object recognition scenarios characterized by the presence of clutter and occlusions.
Conference Paper
Full-text available
Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.
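ORB obtains rotation invariance by assigning each keypoint an orientation from the intensity centroid of its patch and steering the BRIEF tests by that angle; a minimal version of the orientation step, on a square patch of illustrative size, could look like this.

```python
import numpy as np

def intensity_centroid_angle(patch):
    """Orientation of a square patch as the angle of its intensity centroid:
    theta = atan2(m01, m10), with moments taken about the patch center."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - (w - 1) / 2.0                   # coordinates relative to the center
    ys = ys - (h - 1) / 2.0
    m10 = np.sum(xs * patch)
    m01 = np.sum(ys * patch)
    return np.arctan2(m01, m10)               # angle used to steer the binary tests

patch = np.random.rand(31, 31)                # 31x31 is just an illustrative size
print(np.degrees(intensity_centroid_angle(patch)))
```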
Conference Paper
Full-text available
Effective and efficient generation of keypoints from an image is a well-studied problem in the literature and forms the basis of numerous Computer Vision applications. Established leaders in the field are the SIFT and SURF algorithms which exhibit great performance under a variety of image transformations, with SURF in particular considered as the most computationally efficient amongst the high-performance methods to date. In this paper we propose BRISK, a novel method for keypoint detection, description and matching. A comprehensive evaluation on benchmark datasets reveals BRISK's adaptive, high quality performance on par with state-of-the-art algorithms, albeit at a dramatically lower computational cost (an order of magnitude faster than SURF in cases). The key to speed lies in the application of a novel scale-space FAST-based detector in combination with the assembly of a bit-string descriptor from intensity comparisons retrieved by dedicated sampling of each keypoint neighborhood.
Conference Paper
Full-text available
Recognizing and manipulating objects is an important task for mobile robots performing useful services in everyday environments. In this paper, we develop a system that enables a robot to grasp an object and to move it in front of its depth camera so as to build a 3D surface model of the object. We derive an information gain based variant of the next best view algorithm in order to determine how the manipulator should move the object in front of the camera. By considering occlusions caused by the robot manipulator, our technique also determines when and how the robot should re-grasp the object in order to build a complete model.
Conference Paper
Full-text available
Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technologies capable of providing high quality synchronized videos of both color and depth, the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and the potential for mass adoption, this technology represents an opportunity to dramatically increase robotic object recognition, manipulation, navigation, and interaction capabilities. In this paper, we introduce a large-scale, hierarchical multi-view object dataset collected using an RGB-D camera. The dataset contains 300 objects organized into 51 categories and has been made publicly available to the research community so as to enable rapid progress based on this promising technology. This paper describes the dataset collection procedure and introduces techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves quality of results.
Conference Paper
Full-text available
In this paper we address the topic of feature extraction in 3D point cloud data for object recognition and pose identification. We present a novel interest keypoint extraction method that operates on range images generated from arbitrary 3D point clouds, which explicitly considers the borders of the objects identified by transitions from foreground to background. We furthermore present a feature descriptor that takes the same information into account. We have implemented our approach and present rigorous experiments in which we analyze the individual components with respect to their repeatability and matching capabilities and evaluate the usefulness for point feature based object detection methods.
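The border handling described above hinges on detecting foreground-to-background transitions in the range image. A heavily reduced stand-in for that step is to flag pixels whose depth differs from a 4-neighbor by more than a threshold; the sketch below is our own simplification, not the published border criterion.

```python
import numpy as np

def depth_borders(range_image, jump=0.10):
    """Mark pixels where the depth jumps to a 4-neighbor by more than `jump`
    (in the same units as the range image), a crude stand-in for
    foreground/background border extraction."""
    d = range_image
    border = np.zeros_like(d, dtype=bool)
    border[:, 1:]  |= np.abs(d[:, 1:] - d[:, :-1]) > jump   # left neighbor
    border[:, :-1] |= np.abs(d[:, :-1] - d[:, 1:]) > jump   # right neighbor
    border[1:, :]  |= np.abs(d[1:, :] - d[:-1, :]) > jump   # top neighbor
    border[:-1, :] |= np.abs(d[:-1, :] - d[1:, :]) > jump   # bottom neighbor
    return border

depth = np.random.rand(48, 64) + 1.0
print(depth_borders(depth).sum(), "border pixels in this toy depth map")
```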
Conference Paper
Full-text available
With the advent of new, low-cost 3D sensing hardware such as the Kinect, and continued efforts in advanced point cloud processing, 3D perception gains more and more importance in robotics, as well as other fields. In this paper we present one of our most recent initiatives in the areas of point cloud perception: PCL (Point Cloud Library - http://pointclouds.org). PCL presents an advanced and extensive approach to the subject of 3D perception, and it is meant to provide support for all the common 3D building blocks that applications need. The library contains state-of-the-art algorithms for: filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation. PCL is supported by an international community of robotics and perception researchers. We provide a brief walkthrough of PCL including its algorithmic capabilities and implementation strategies.
Conference Paper
Full-text available
Consumer depth cameras, such as the Microsoft Kinect, are capable of providing frames of dense depth values at real time. One fundamental question in utilizing depth cameras is how to best extract features from depth frames. Motivated by local descriptors on images, in particular kernel descriptors, we develop a set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework. Through extensive experiments on object recognition, we show that (1) our local features capture different aspects of cues from a depth frame/view that complement one another; (2) our kernel features significantly outperform traditional 3D features (e.g. Spin images); and (3) we significantly improve the capabilities of depth and RGB-D (color+depth) recognition, achieving 10–15% improvement in accuracy over the state of the art.
Article
Full-text available
We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either.
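The two ingredients highlighted here, simple intensity-difference tests and Hamming-distance matching, are easy to sketch: sample fixed point pairs in a patch, build the bit string, and compare two strings with XOR plus popcount. The pair layout below is random and purely illustrative, not the distribution studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
PAIRS = rng.integers(-12, 13, size=(256, 4))   # 256 random test pairs in a 25x25 patch

def brief_like(patch, pairs=PAIRS):
    """Bit i is 1 iff patch[p_i] < patch[q_i] (BRIEF-style intensity tests)."""
    c = np.array(patch.shape) // 2
    p = patch[c[0] + pairs[:, 0], c[1] + pairs[:, 1]]
    q = patch[c[0] + pairs[:, 2], c[1] + pairs[:, 3]]
    return np.packbits(p < q)                  # 256 bits -> 32 bytes

def hamming(a, b):
    """Hamming distance between two packed bit strings via XOR + popcount."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

p1, p2 = np.random.rand(25, 25), np.random.rand(25, 25)
print(hamming(brief_like(p1), brief_like(p2)))
```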
Article
This paper presents a local 3D descriptor for surface matching dubbed SHOT. Our proposal stems from a taxonomy of existing methods which highlights two major approaches, referred to as Signatures and Histograms, inherently emphasizing descriptiveness and robustness respectively. We formulate a comprehensive proposal which encompasses a repeatable local reference frame as well as a 3D descriptor, the latter featuring a hybrid structure between Signatures and Histograms so as to aim at a more favorable balance between descriptive power and robustness. A quite peculiar trait of our method concerns the seamless integration of multiple cues within the descriptor to improve distinctiveness, which is particularly relevant nowadays due to the increasing availability of affordable RGB-D sensors which can gather both depth and color information. A thorough experimental evaluation based on datasets acquired with different types of sensors, including a novel RGB-D dataset, vouches that SHOT outperforms state-of-the-art local descriptors in experiments addressing descriptor matching for object recognition, 3D reconstruction and shape retrieval.
Conference Paper
There has been growing interest in the use of binary-valued features, such as BRIEF, ORB, and BRISK for efficient local feature matching. These binary features have several advantages over vector-based features as they can be faster to compute, more compact to store, and more efficient to compare. Although it is fast to compute the Hamming distance between pairs of binary features, particularly on modern architectures, it can still be too slow to use linear search in the case of large datasets. For vector-based features, such as SIFT and SURF, the solution has been to use approximate nearest-neighbor search, but these existing algorithms are not suitable for binary features. In this paper we introduce a new algorithm for approximate matching of binary features, based on priority search of multiple hierarchical clustering trees. We compare this to existing alternatives, and show that it performs well for large datasets, both in terms of speed and memory efficiency.
Conference Paper
"Local Binary Patterns" (LBP) [7] have been established as a standard feature based method for 2D image analysis. LBP have been successfully applied to a wide range of different applications from texture analysis [7] to face recognition [8]. Various extensions to the basic LBP algorithms were published in recent years, including rotation invariant and computationally efficient "uniform binary patterns" (fuLBP) - a comprehensive overview can be found in [7]. In this paper, we extend the original LBP from 2D images to 3D volume data. We also generalize the rotation ...
Conference Paper
In this paper, we present a novel method for the fast computation of rotational invariant "uniform local binary patterns" (uLBP) for texture analysis on 3D volume data. We introduce an alternative computation method for uLBPs in frequency space, which allows us to effectively approximate uniform patterns in 2D and 3D in order to avoid the enormous computational complexity of a "naive" 3D LBP implementation.
Article
In view-based 3D object retrieval and recognition, each object is described by multiple views and a central problem is how to estimate the distance between two objects. Most conventional methods integrate the distances of view pairs across two objects as an estimation of their distance. In this paper, we propose a discriminative probabilistic object modeling approach. It builds probabilistic models for each object based on the distribution of its views, and the distance between two objects is defined as the upper bound of the KL divergence of the corresponding probabilistic models. 3D object retrieval and recognition is accomplished based on the distance measures. We first learn models for each object by adaptation from a set of global models with a maximum likelihood principle. A further adaptation step is then performed to enhance the discriminative ability of the models. We conduct experiments on the ETH 3D object dataset, the National Taiwan University 3D model dataset, and the Princeton Shape Benchmark. We compare our approach with different methods, and experimental results demonstrate the superiority of our approach.
Article
Many modern data analysis methods involve computing a matrix singular value decomposition (SVD) or eigenvalue decomposition (EVD). Principal component analysis is the time-honored example, but more recent applications include latent semantic indexing (LSI), hypertext induced topic selection (HITS), clustering, classification, etc. Though the SVD and EVD are well established and can be computed via state-of-the-art algorithms, it is not commonly mentioned that there is an intrinsic sign indeterminacy that can significantly impact the conclusions and interpretations drawn from their results. Here we provide a solution to the sign ambiguity problem and show how it leads to more sensible solutions. Copyright © 2008 John Wiley & Sons, Ltd.
Conference Paper
In our recent work [1], [2], we proposed Point Feature Histograms (PFH) as robust multi-dimensional features which describe the local geometry around a point p for 3D point cloud datasets. In this paper, we modify their mathematical expressions and perform a rigorous analysis on their robustness and complexity for the problem of 3D registration for overlapping point cloud views. More concretely, we present several optimizations that reduce their computation times drastically by either caching previously computed values or by revising their theoretical formulations. The latter results in a new type of local features, called Fast Point Feature Histograms (FPFH), which retain most of the discriminative power of the PFH. Moreover, we propose an algorithm for the online computation of FPFH features for realtime applications. To validate our results we demonstrate their efficiency for 3D registration and propose a new sample consensus based method for bringing two datasets into the convergence basin of a local non-linear optimizer: SAC-IA (SAmple Consensus Initial Alignment).
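The geometric core of PFH/FPFH is the angular triplet computed from a Darboux frame for each pair of oriented points; the sketch below reproduces that pair feature as it is commonly written up (it assumes unit normals and distinct points, and is not the authors' optimized code).

```python
import numpy as np

def pair_feature(p_s, n_s, p_t, n_t):
    """Point-pair feature (alpha, phi, theta, d) built from the Darboux frame
    u = n_s, v = u x (p_t - p_s)/d, w = u x v, as commonly described for
    PFH/FPFH (sketch; assumes unit normals and p_s != p_t)."""
    dp = p_t - p_s
    d = np.linalg.norm(dp)
    u = n_s
    v = np.cross(u, dp / d)
    v /= np.linalg.norm(v)
    w = np.cross(u, v)
    alpha = np.dot(v, n_t)
    phi = np.dot(u, dp / d)
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta, d

p_s, p_t = np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.02, 0.0])
n_s = np.array([0.0, 0.0, 1.0])
n_t = np.array([0.0, 0.1, 0.995])
n_t /= np.linalg.norm(n_t)
print(pair_feature(p_s, n_s, p_t, n_t))
```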
Conference Paper
For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with only minimal guidance on selecting an algorithm and its parameters for any given problem. In this paper, we describe a system that answers the question, "What is the fastest approximate nearest-neighbor algorithm for my data?" Our system will take any given dataset and desired degree of precision and use these to automatically determine the best algorithm and parameter values. We also describe a new algorithm that applies priority search on hierarchical k-means trees, which we have found to provide the best known performance on many datasets. After testing a range of alternatives, we have found that multiple randomized k-d trees provide the best performance for other datasets. We are releasing public domain code that implements these approaches. This library provides about one order of magnitude improvement in query time over the best previously available software and provides fully automated parameter selection.
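In practice, the approximate nearest-neighbor search described here is exposed, for instance, through OpenCV's FLANN bindings; the snippet below shows a typical kd-tree setup with Lowe's ratio test, using hypothetical image filenames. It illustrates usage only and does not reproduce the parameter auto-tuning experiments of the paper.

```python
import cv2

# Hypothetical input images; any two grayscale views of the same scene will do.
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN kd-tree index (algorithm=1 selects the kd-tree index in OpenCV's bindings).
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test to keep only distinctive correspondences.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(len(good), "putative matches")
```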
Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently-used databases which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images. We use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. We measure human scene classification performance on the SUN database and compare this with computational methods. Additionally, we study a finer-grained scene representation to detect scenes embedded inside of larger scenes.
Article
Environment models serve as important resources for an autonomous robot by providing it with the necessary task-relevant information about its habitat. Their use enables robots to perform their tasks more reliably, flexibly, and efficiently. As autonomous robotic platforms get more sophisticated manipulation capabilities, they also need more expressive and comprehensive environment models: for manipulation purposes their models have to include the objects present in the world, together with their position, form, and other aspects, as well as an interpretation of these objects with respect to the robot tasks. The dissertation presented in this article (Rusu, PhD thesis, 2009) proposes Semantic 3D Object Models as a novel representation of the robot’s operating environment that satisfies these requirements and shows how these models can be automatically acquired from dense 3D range data.
Article
This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed.
Article
Dynamic texture (DT) is an extension of texture to the temporal domain. Description and recognition of DTs have attracted growing attention. In this paper, a novel approach for recognizing DTs is proposed and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), which are an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences of the local binary patterns on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events such as facial expressions in which local information and its spatial locations should also be taken into account. In experiments with two DT databases, DynTex and Massachusetts Institute of Technology (MIT), both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. The advantages of our approach include local processing, robustness to monotonic gray-scale changes, and simple computation.
Conference Paper
Accurate localization of representative points of a face is crucial to many face analysis and synthesis problems. Active shape model (ASM) is a powerful statistical tool for face alignment. However, it suffers from variations of pose, illumination and expressions. In this paper, we analyze the mechanism of the active shape model and realize that the ability of normal profiles to describe the local appearance pattern is very limited. For efficient appearance pattern representation, the local binary pattern is used and extended to describe the local patterns of facial key points. For the purpose of retaining the spatial information, sub-images of key points are divided into several regions, which are combined to define the extended local binary pattern (ELBP) histogram. Then we propose an improved ASM framework, ELBP-ASM, in which local appearance patterns of key points are modelled using the extended local binary pattern. Experimental results demonstrate that ELBP-ASM achieves more accurate results compared with the original ASM.
Article
This paper presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.
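The "uniform" patterns mentioned here are the codes with at most two circular 0/1 transitions; the standard rotation-invariant mapping (often written LBP^riu2) sends such a code to its number of set bits and every other code to P + 1. A compact sketch of that mapping:

```python
def lbp_riu2(code, p=8):
    """Rotation-invariant 'uniform' LBP mapping (riu2): a code with at most
    two circular 0/1 transitions maps to its number of set bits, everything
    else maps to p + 1."""
    bits = [(code >> i) & 1 for i in range(p)]
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    return sum(bits) if transitions <= 2 else p + 1

# 0b00011100 is uniform (2 transitions) -> 3; 0b01010101 is not -> 9
print(lbp_riu2(0b00011100), lbp_riu2(0b01010101))
```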
Article
We present a 3D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion. Recognition is based on matching surfaces by matching points using the spin image representation. The spin image is a data level shape descriptor that is used to match surfaces represented as surface meshes. We present a compression scheme for spin images that results in efficient multiple object recognition which we verify with results showing the simultaneous recognition of multiple objects from a library of 20 models. Furthermore, we demonstrate the robust performance of recognition in the presence of clutter and occlusion through analysis of recognition trials on 100 scenes.
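A spin image for an oriented point (p, n) accumulates neighbors in a 2D histogram of (alpha, beta), where alpha is the distance from the axis through p along n and beta the signed height along n; the sketch below builds such a histogram with illustrative bin settings and omits the bilinear interpolation used in the original formulation.

```python
import numpy as np

def spin_image(cloud, p, n, bin_size=0.01, image_width=16):
    """Spin-image histogram for the oriented point (p, n):
    alpha = distance from the axis through p along n,
    beta  = signed height along n (sketch with illustrative bin settings)."""
    d = cloud - p
    beta = d @ n                                              # height along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    i = ((image_width / 2) - beta / bin_size).astype(int)     # row: height
    j = (alpha / bin_size).astype(int)                        # column: radial distance
    keep = (i >= 0) & (i < image_width) & (j >= 0) & (j < image_width)
    img = np.zeros((image_width, image_width))
    np.add.at(img, (i[keep], j[keep]), 1)                     # bilinear weighting omitted
    return img

cloud = np.random.rand(1000, 3) * 0.1
n = np.array([0.0, 0.0, 1.0])
print(spin_image(cloud, cloud[0], n).sum())
```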