Article

Accurate three-dimensional pose recognition from monocular images using template matched filtering

Abstract

An accurate algorithm for three-dimensional (3-D) pose recognition of a rigid object is presented. The algorithm is based on adaptive template matched filtering and local search optimization. When a scene image is captured, a bank of correlation filters is constructed to find the best correspondence between the current view of the target in the scene and a target image synthesized by means of computer graphics. The synthetic image is created using a known 3-D model of the target and an iterative procedure based on local search. Computer simulation results obtained with the proposed algorithm in synthetic and real-life scenes are presented and discussed in terms of accuracy of pose recognition in the presence of noise, cluttered background, and occlusion. Experimental results show that our proposal presents high accuracy for 3-D pose estimation using monocular images.
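The matching step described in the abstract can be illustrated with a minimal sketch. It assumes a plain zero-mean, unit-norm matched filter evaluated with the FFT; the adaptive filter design, the rendering of templates from the 3-D model, and the local search over pose parameters are not reproduced here. The names `correlate_fft` and `best_match` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def correlate_fft(scene, template):
    """Circular cross-correlation of a scene with a zero-padded template."""
    padded = np.zeros(scene.shape, dtype=float)
    h, w = template.shape
    padded[:h, :w] = template
    # Correlation theorem: corr = IFFT( FFT(scene) * conj(FFT(template)) )
    spectrum = np.fft.fft2(scene) * np.conj(np.fft.fft2(padded))
    return np.real(np.fft.ifft2(spectrum))

def best_match(scene, templates):
    """Return (index, (row, col), score) of the template with the highest peak.

    Each template stands for one candidate pose (assumption: the templates
    were rendered beforehand, one per pose configuration).
    """
    scene0 = scene - scene.mean()
    best = None
    for idx, template in enumerate(templates):
        t0 = template - template.mean()
        norm = np.linalg.norm(t0)
        if norm == 0.0:
            continue  # a flat template carries no shape information
        plane = correlate_fft(scene0, t0 / norm)
        loc = np.unravel_index(np.argmax(plane), plane.shape)
        score = float(plane[loc])
        if best is None or score > best[2]:
            best = (idx, (int(loc[0]), int(loc[1])), score)
    return best
```

The index of the winning template identifies the pose hypothesis, and the peak location gives the estimated position of the target in the image plane. A local search would then render new templates around the winning pose and repeat the comparison.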

... To solve the pose estimation problem, the correlation filtering technique can be extended into the three-dimensional search problem. This can be done using a bank of filters, in which each filter is designed with information of the target in a single pose configuration [33]. Here, the goal is to find the best filter that gives the highest matching value in the correlation process. ...
... The object of interest is observed by a monocular camera which captures a frame from the real-scene world. The captured frame consists of a projection mapping of the 3D scene into an image plane [33], given by ℝ³ → ℝ². The input image f(x) is represented by the following signal model [19]: ...
... In this process, we employed correlation filtering to measure the correspondence between the input scene from the observation stage and the target reference. We take advantage of the accuracy response of correlation filters on challenging image conditions like noise, background clutter, and occlusions [33]. The processing of a bank of correlation filters can be described in three detailed subsystems: template generation, correlation filtering, and an evaluation stage. ...
Article
Full-text available
An accurate method based on evolutionary correlation filtering to solve pose estimation of highly occluded targets is presented. The proposed method performs multiple correlation operations between an input scene and a bank of filters designed in frequency-domain. Each filter is computed with statistical parameters of a real-world scene and a template that contains information of the target in a single pose parameter configuration. A vast set of templates is generated from multiple views of a three-dimensional model of the target, which are created synthetically with computer graphics. An evolutionary approach in the bank of filter construction for optimizing the pose estimation parameters is implemented. The evolutionary computation technique based on a pseudo-bacterial genetic algorithm yields high estimation accuracy finding the best filter that produces the highest matching score. The proposed evolutionary correlation filtering yields good convergence of the bank of filter optimization, which produces a reduction of the number of computational operations. Experimental results demonstrate the robustness of the proposed method in terms of detection performance and pose estimation of highly occluded targets compared with state-of-the-art methods.
... Template matching based on correlation filters can be used to solve 3D pose estimation of rigid objects [13]. Moreover, the pose estimation problem can be modeled as a search problem, in which the goal is to find the reference target view that gives the best match with the actual view of the target in the scene [14]. ...
... The design and construction of correlation filters is a high-dimensional complex problem, thus finding an optimal solution will require a very large search space. This strategy presented a narrow exploration space using a local search algorithm [13]. In this strategy it was assumed that the object would appear with smooth pose transitions through the video frames. ...
... The additive noise denoted by (x) is given by a zero-mean Gaussian distribution process. Moreover, Γ in (1) is a transformation matrix that involves the appearance modifications of the target [13] related to scaling S and rotation R (with , , orientation parameters). Hence, Γ = S R is related to the space Π ∈ R 3 . ...
Article
Full-text available
In this paper, we propose an evolutionary correlation filtering approach for solving pose estimation in noncontinuous video sequences. The proposed algorithm computes the linear correlation between the input scene containing a target in an unknown environment and a bank of matched filters constructed from multiple views of the target and estimates of statistical parameters of the scene. An evolutionary approach for finding the optimal filter that produces the highest matching score in the correlator is implemented. The parameters of the filter bank evolve through generations to refine the quality of pose estimation. The obtained results demonstrate the robustness of the proposed algorithm in challenging image conditions such as noise, cluttered background, abrupt pose changes, and motion blur. The performance of the proposed algorithm yields high accuracy in terms of objective metrics for pose estimation in noncontinuous video sequences.
... The panoramic high-precision deformation measurements of the object surface and 3D reconstruction have received great attention with the development of production and manufacturing [1][2][3]. The related technology based on machine vision has the characteristics of non-contact and high precision, making it widely used in 3D full-field measurement [4][5][6][7]. Although vision measurement technology has been widely used in various fields, the limited field of view of cameras and the calibration of multicamera are the main obstacles restricting the development of panoramic measurement. ...
Article
Full-text available
Panoramic dynamic and static measurements of objects in the application of vision measurement are difficult due to the constraints of a camera field of view and multicamera calibration technology. This paper proposes a universal global calibration method for ring multicamera systems based on rotating target and multi-view vision technology. This method uses a rotating target to establish the relationship between ring multi-camera arrays, retrieves the coordinates of the target corners from the fields of view of different cameras, and combines them with the rotation angle to complete the coordinate unification of the system. The unification of coordinates is unaffected by the overlapping fields of view between cameras, and the number of cameras can be configured arbitrarily. The calibration accuracy, validity, and precision of the proposed method are verified through reprojection error, dynamic tensile test, and 3D reconstruction.
... The depth information of the circle can be recovered incompletely; finally, the pose of the target can be estimated by single projection [13]. For breaking through the above-mentioned restriction of methods based on cooperative targets, the object's own features can be used to calculate pose [14]. Leng et al. proposed a rigid object pose-estimation method based on an object's contour and non-Euclidean multifeature distance map, which solves the pose estimation problem and the feature projection correspondence problem simultaneously [15]. ...
Article
Full-text available
A novel algorithm for dynamic pose estimation with a monocular visual sensor is proposed in this paper. The sensor is principally composed of two 1D turntables, one collimated laser, and one industrial camera. In particular, the proposed algorithm is suitable for uncooperative targets. By analyzing the motion of a laser beam based on quaternions, the functional detection algorithm is derived from the position information of multiple scanning points. Furthermore, depth recovery based on a nonparametric model is a key step in the pose calculation; it does not require the calibration parameters of the industrial camera and therefore avoids the influence of camera distortion and calibration error. After establishing a test platform, simulations and experiments for pose estimation are carried out. The experimental results show that the maximum error is 0.98° at a range of 500 mm, which indicates that the proposed algorithm is accurate and effective.
... Another approach based on template matching filters has been proposed to estimate the 3D pose of an object: a set of synthetic images of a 3D model of the object is generated as reference templates, and a high matching score is obtained when the input and reference images are very similar. Given a known 3D model of the target, this approach estimates its location and orientation parameters by maximizing the frequency response between the input and the current reference images [9,10]. The input image is processed globally instead of processing only local features, and the approach yields high accuracy of 3D pose estimation in comparison with existing approaches based on segmentation in a challenging environment. ...
Article
Full-text available
Many objects in the real world have circular features. It is difficult to obtain a 2D-3D pose estimate from circular features in challenging scenarios. This paper proposes a method that incorporates an elliptic shape prior for object pose estimation using a level set method. The relationship between the projection of the circular feature of a 3D object and the corresponding signed distance function is analyzed to yield a 2D elliptic shape prior. The method employs the combination of the grayscale histogram, the edge intensities, and the smoothness distribution as the main image feature descriptors that define the image statistical measure model. The elliptic shape prior combined with the image statistical measure energy model drives the elliptic shape contour to the projection of the circular feature of the 3D object with the current pose into the image plane. These steps effectively reduce the impact of challenging scenarios on the pose estimation results. In addition, the method utilizes particle filters that take into account the motion dynamics of the object among scene frames, which provides a robust method for object 2D-3D pose estimation using circular features in a challenging environment. Various numerical experiments are presented to show the performance and advantages of the proposed method.
... The above-mentioned approaches are widely used in human face recognition. [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24] In the proposed paper, first we convert a 3D scan to the "canonical form". Since real 3D scans contain holes and boundary noise, we eliminate them. ...
... One is to calculate the pose by searching for the corresponding relation of some feature projection from three to two dimensions [Bai and Junkins (2016); Conway and Daniele (2016); Zhang, Liu and Jiang (2015); Xia, Xu and Xiong (2012)]. The other is based on a model, which uses 3D model retrieval technology to estimate the pose, which avoids establishing complex projection relations, but needs to build a large and complex model database [Labibian, Alikhani and Pourtakdoust (2017); Picos, Diaz-Ramirez, Kober et al. (2016); Tang, Wen, Ma et al. (2011); Shan, Ji and Zhou (2009)]. The accuracy of the match is limited by the accuracy of the established model base. ...
Article
With the development of adaptive optics and post restore processing techniques, large aperture ground-based telescopes can obtain high-resolution images (HRIs) of targets. The pose of the space target can be estimated from HRIs by several methods. As the target features obtained from the image are unstable, it is difficult to use existing methods for pose estimation. In this paper a method based on real-time target model matching to estimate the pose of space targets is proposed. First, the physically-constrained iterative deconvolution algorithm is used to obtain HRIs of the space target. Second, according to the 3D model, the ephemeris data, the observation time of the target, and the optical parameters of the telescope, the simulated observation image of the target in orbit is rendered by a scene simulation program. Finally, the target model searches through yaw, pitch, and roll until the correlation between the simulated observation image and the actual observation image shows an optimal match. The simulation results show that the proposed pose estimation method can converge to the local optimal value with an estimation error of about 1.6349°.
... Building dense 3D maps of environments is an important task for mobile robotics, with applications in navigation, manipulation, semantic mapping, face recognition, and telepresence. [9][10][11][12][13] Other applications are also found in augmented reality applications, surveillance systems, medical applications, etc. ...
... 1. Searching of corresponding points (pairs) in two clouds; 2. Minimizing the error metric (variational subproblem of the ICP). [3][4][5][6][7][8][9][10][11][12] ...
... The principle of PCA is as follows: three eigenvectors corresponding to three maximum eigenvalues are selected as the axis of a new coordinate system. Point cloud data are converted to the new coordinate system by transforming coordinate, so as to realize position calibration [4,12,15]. Position calibration consists of the following steps: ...
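The PCA step quoted above can be sketched as follows. This is a minimal illustration that assumes the point cloud is an N×3 NumPy array and that centering on the centroid precedes the change of axes; the function name `pca_align` is hypothetical and is not taken from the cited work.

```python
import numpy as np

def pca_align(points):
    """Express a point cloud in the coordinate system of its principal axes.

    Returns the transformed points and the 3x3 matrix whose columns are the
    eigenvectors of the covariance matrix, ordered by decreasing eigenvalue.
    """
    centered = points - points.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    axes = eigvecs[:, order]
    return centered @ axes, axes
```

After the transform, the covariance of the returned points is diagonal, so the cloud is aligned with the new axes regardless of its original orientation. The sign of each axis remains ambiguous, which a full calibration procedure would need to resolve by an additional convention.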
Article
Full-text available
Aiming to realize rapid and efficient three-dimensional (3D) identification of substation equipment, this article proposes a new method in which the 3D identification of substation equipment is based on K-nearest neighbor (KNN) classification of subspace feature vector. First of all, the article uses octree encoding to reduce and denoise the point cloud data obtained by a 3D laser scanner. Secondly, position calibration and size standardization are used for the point cloud after pretreatment. Then, the normalized point cloud is divided into a number of cubes with same size. The cosine of the angle between the positive direction of
... A surface is generated from the reconstructed 3D face model. 6. Template matching is computed with a bank of several GMF filters [5]. Then, the cross correlation between input scene f(x, y) and the current filter H(µ, ν) is computed by ...
Poster
Full-text available
Recent computer vision applications require the modeling and synthesis of realistic human faces. 3D face digitization is a challenging task due to non-uniform illumination, image distortions, and noise. Fringe-projection is a reliable method for 3D digitization of human faces. This work presents the usefulness of the fringe-projection method in the field of face recognition.
... We proposed a correlation filtering approach to solve the location estimation of a target which is embedded in a 3D scene with an unknown pose. Correlation filtering has been proven to give good accuracy in detection performance [10]. In this approach an appropriate filter design is needed. ...
Conference Paper
Full-text available
In this paper we solve three-dimensional object detection using a correlation filtering approach. The input data comes from a point cloud scene digitalization given by a digital scanner. The detection system employs a filtering design based on local statistical parameters of the input scene. The position of the target is estimated with a maximization function of the output correlation between the input scene and the designed filter. The proposed algorithm yields good accuracy in terms of location error and detection performance of the correlation filter.
Article
Full-text available
An efficient algorithm for registration of two non-rigid objects based on geometrical transformation of the template object to target object is proposed. The transformation is considered as warping of the template onto the target. To choose the most suitable transformation from all possible warps, a registration algorithm should satisfy deformation constraints referred to as regularization of non-rigid objects. In this work, we use variational functionals for affine transformations. With the help of computer simulation, the proposed method for searching the optimal geometrical transformation is compared with that of common algorithms.
Article
Full-text available
Basically, the ICP algorithm steps are as follows: for each point in the first set, match the closest point in the second set; estimate mapping parameters using the RMS cost function; transform points using estimated parameters; perform multiple iterations (re-matching points, and so on). In this paper a preliminary data-thinning algorithm is proposed. It can help to speed up all ICP algorithm stages by reducing the amount of input data. The proposed algorithm is based on the human perception of object geometry. A human often analyzes edges and corners of objects and does not pay attention to the inner parts of polygons when comparing two objects and looking for similar parts. The algorithm described in this article begins with a search for planes in the point cloud. Next, the intersections of the found planes are computed in order to extract object edges. Finally, the intersections of the edges give the object corners. All points that do not belong to the edges and corners are then removed from the point cloud. In real objects, polygons most often occupy a large part of the object, so the proposed algorithm removes a large number of insignificant points.
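The ICP loop summarized in this abstract can be sketched in a few lines. The sketch assumes a rigid (rotation plus translation) mapping estimated with the SVD-based least-squares solution and a brute-force nearest-neighbor search; the data-thinning step proposed in the paper is not included, and the function names are hypothetical.

```python
import numpy as np

def nearest_points(src, dst):
    """For each point in src, return the closest point in dst (brute force)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    return dst[d2.argmin(axis=1)]

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that R @ src_i + t ~ dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(H.shape[0])
    # Guard against a reflection being returned instead of a rotation.
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def icp(src, dst, iterations=20):
    """Align src to dst; return the moved points and the final RMS distance."""
    current = np.array(src, dtype=float)
    for _ in range(iterations):
        matched = nearest_points(current, dst)           # 1. match closest points
        R, t = best_rigid_transform(current, matched)    # 2. estimate parameters
        current = current @ R.T + t                      # 3. transform points
    residual = current - nearest_points(current, dst)
    rms = float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))
    return current, rms
```

The brute-force search costs O(N·M) per iteration, which is why reducing the number of input points, as the abstract proposes, speeds up every stage. A fixed iteration count is used here for simplicity; a practical implementation would typically stop once the RMS change falls below a tolerance.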
Chapter
An efficient algorithm for path generation in autonomous mobile robots using a visual recognition approach is presented. The proposal includes image filtering techniques by employing an inspecting camera to sense a cluttered environment. Template matching filters are used to detect several environment elements, such as obstacles, feasible terrain, the target location, and the mobile robot. The proposed algorithm includes the parallel evolutionary artificial potential field to perform the path planning for autonomous navigation of the mobile robot. Our problem to be solved for autonomous navigation is to safely take a mobile robot from the starting point to the target point employing the path with the shortest distance and which also contains the safest route. To find the path that satisfies this condition, the proposed algorithm chooses the best candidate solution from a vast number of different paths calculated concurrently. For achieving efficient autonomous navigation, the proposal employs a parallel computation approach for the evolutionary artificial potential field algorithm for path generation and optimization. Experimental results yield accuracy in environment recognition in terms of quantitative metrics. The proposed algorithm demonstrates efficiency in path generation and optimization.
Chapter
Full-text available
Fuzzy set proposed by Zadeh states that belongingness of an element in a set is a matter of degree unlike classical set where membership is a matter of affirmation or denial. Fuzzy set theory provides more natural representation for real world problems. Intuitionistic fuzzy set (IFS) is the generalization of fuzzy set, proposed by Atanassov, in 1986 (Fuzzy Sets Syst 20(1):87–96, 1986 [1]). It assigns two values called membership degree and a non-membership degree respectively. Later Florentin Smarandache introduced an additional parameter for neutrality which generalise Intuitionistic Fuzzy Set as Neutrosophic Fuzzy Set (NFS). The speciality lies in the 3D Neutrosophic space where each logical statement is evaluated with 3 components namely truth, falsity and indeterminacy. IFS and NFS revolve around these divisions of degree of belongingness to their component structure and so generate different variations. In this chapter we discuss the properties of these two variants of fuzzy set based on their different extension, propositional calculus, predicate calculus, degree of dependence of each component, geometric representation and various application areas of both the sets.
Article
Full-text available
In this paper, we estimate the accuracy of 3D object reconstruction using multiple Kinect sensors. First, we discuss the calibration of multiple Kinect sensors, and provide an analysis of the accuracy and resolution of the depth data. Next, the precision of coordinate mapping between sensors data for registration of depth and color images is evaluated. We test a proposed system for 3D object reconstruction with four Kinect V2 sensors and present reconstruction accuracy results. Experiments and computer simulation are carried out using Matlab and Kinect V2. © 2019, Institution of Russian Academy of Sciences. All rights reserved.
Article
Full-text available
This research presents an algorithm for three-dimensional (3-D) pose tracking of a rigid object by processing sequences of monocular images. The pose trajectory of the object is estimated by performing linear correlation between the current scene and a filter bank constructed from different views of a 3-D model of the target, which are created synthetically with computer graphics. The pose tracking is guided by particle filters that dynamically adapt the filter bank by taking into account the kinematics of the target in the scene. Experimental results obtained with the proposed algorithm in processing synthetic and real images are presented and discussed. These results show that the proposed algorithm achieves a higher accuracy of pose tracking in terms of objective metrics, in comparison with that of existing similar algorithms.
Chapter
Object recognition is a widely studied problem in computer vision. Template matching with correlation filters is one of the most accurate strategies for target recognition. However, it is computationally expensive, particularly when there is no restriction in the pose of the object of interest and an exhaustive search is implemented. This work proposes the use of a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) for post-processing template matched filters. The proposed strategy searches for the best template matching guided by the discrimination capability of a correlation-based filter, considering a vast set of filters. CMA-ES is used to find the best match and determine the correct pose or orientation parameters of a target object. The proposed method demonstrates that CMA-ES is effective for multidimensional problems in a huge search space, which makes it a suitable candidate for target recognition in unconstrained applications. Experimental results show high efficiency in terms of the number of function evaluations and locating the correct pose parameters based on the DC measure.
Conference Paper
A reliable approach for object segmentation based on template-matching filters is proposed. The system employs an adaptive strategy for the generation of space-variant filters which take into account several versions of the target and local statistical properties of the input scene. Moreover, the proposed method considers the geometric modifications of the target while it is moving through a video sequence. The detection accuracy of the matched filter provides the location of the target of interest. The estimated location coordinates are used to compute the support area covered by the target using the watershed segmentation technique. In each frame, the filter adapts according to the geometrical changes of the target in order to estimate its current support region. Experimental tests carried out in a video sequence show that the proposed system yields very good performance in terms of detection accuracy and object segmentation efficiency in real-life scenes.
Conference Paper
Full-text available
A visual approach in environment recognition for robot navigation is proposed. This work includes a template matching filtering technique to detect obstacles and feasible paths using a single camera to sense a cluttered environment. In this problem statement, a robot can move from the start to the goal by choosing a single path between multiple possible ways. In order to generate an efficient and safe path for mobile robot navigation, the proposal employs a pseudo-bacterial potential field algorithm to derive optimal potential field functions using evolutionary computation. Simulation results are evaluated in synthetic and real scenes in terms of accuracy of environment recognition and efficiency of path planning computation. -- See more at: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10395/103950N/Visual-environment-recognition-for-robot-path-planning-using-template-matched/10.1117/12.2273596.full?SSO=1
Conference Paper
In this work, a correlation-based algorithm consisting of a set of adaptive filters for recognition of occluded objects in still and dynamic scenes in the presence of additive noise is proposed. The designed algorithm is adaptive to the input scene, which may contain different fragments of the target, false objects, and background to be rejected. The algorithm output consists of high correlation peaks corresponding to pieces of the target in scenes. The proposed algorithm uses a bank of composite optimum filters. The performance of the proposed algorithm for recognition of partially occluded objects is compared with that of common algorithms in terms of objective metrics.
Conference Paper
The problem of 3D pose recognition of a rigid object is difficult to solve because the pose in a 3D space can vary with multiple degrees of freedom. In this work, we propose an accurate method for 3D pose estimation based on template matched filtering. The proposed method utilizes a bank of space-variant filters which take into account different pose states of the target and local statistical properties of the input scene. The state parameters of location coordinates, orientation angles, and scaling parameters of the target are estimated with high accuracy in the input scene. Experimental tests are performed for real and synthetic scenes. The proposed system yields good performance for 3D pose recognition in terms of detection efficiency, location and orientation errors.
Conference Paper
Full-text available
Computer vision is an important task in robotics applications. This work proposes an approach for autonomous mobile robot navigation using the integration of the template-matching filters for obstacle detection and the evolutionary artificial potential field method for path planning. The recognition system employs a digital camera to sense the environment of a mobile robot. The captured scene is processed by a bank of space variant filters in order to find the obstacles and a feasible area for the robot navigation. The path planning employs evolutionary artificial potential fields to derive optimal potential field functions using evolutionary computation. Simulation results to validate the analysis and implementation are provided; they were specifically made to show the effectiveness and the efficiency of the proposal. http://dx.doi.org/10.1117/12.2237412.
Article
Full-text available
We derive an optimum filter function to detect a target degraded by multiplicative noise and additive overlapping noise and placed in background noise. The filter is designed to maximize the metric peak-to-output energy, which is the ratio of the expected value squared of the output peak at the target position to the expected value of the average output energy. The optimum filter provides improved discrimination as well as robustness to input noise. One advantage of the filter described here over the homomorphic filters is that the additional preprocess on the input image, that is, the input logarithmic operation, is not required for reducing the effects of multiplicative noise. The performance of the filter is examined in terms of discrimination against background noise and robustness to multiplicative and additive input noise. Both multiplicative amplitude noise and multiplicative complex noise are considered.
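The peak-to-output-energy metric defined in this abstract is a ratio of expected values. As a rough illustration only, the sketch below computes a single-realization estimate from one correlation plane, replacing both expectations with the observed values; that substitution is an assumption of this example, and the function name is hypothetical.

```python
import numpy as np

def peak_to_output_energy(plane, target_pos=None):
    """Sample estimate of POE: |peak|^2 divided by the mean output energy.

    If target_pos is not given, the location of the largest magnitude in the
    correlation plane is used as the peak position.
    """
    plane = np.asarray(plane)
    if target_pos is None:
        target_pos = np.unravel_index(np.argmax(np.abs(plane)), plane.shape)
    peak_energy = np.abs(plane[tuple(target_pos)]) ** 2
    mean_energy = np.mean(np.abs(plane) ** 2)
    return float(peak_energy / mean_energy)
```

A sharp, isolated correlation peak gives a large ratio, while a plane with energy spread across many sidelobes gives a value close to one. Averaging the numerator and denominator separately over many noise realizations would bring the estimate closer to the definition in the abstract.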
Article
Full-text available
Accuracy of target position estimation, defined as the variance of location errors, is evaluated when a noisy target is embedded on a nonoverlapping background. It is shown, with some assumptions, that the generalized matched filter minimizes this variance. We also investigate the performance of various correlation filters in terms of location accuracy. Computer simulations are made to compare the results obtained with the generalized matched filter with those of other filters.
Article
Full-text available
Two types of filter are proposed to detect a noisy target embedded in nonoverlapping background noise by optimization of two proposed criteria that are used in the assessment of filter design and performance. Criterion 1 is defined as the ratio of the square of the expected value of the correlation-peak amplitude to the expected value of the output-signal energy. Criterion 2 is defined as the ratio of the square of the expected value of the correlation-peak amplitude to the average output-signal variance. It is shown that, for the nonoverlapping target and scene noise models, the target window and the scene noise window affect the filter functions significantly. Computer-simulation tests of the generalized optimum filter for various kinds of noisy input image are provided to investigate filter performance in terms of peak-to-output-energy ratio, discrimination against undesired objects, and tolerance to target distortion (for example, target rotation and scaling). We compare the results with those of other filters to verify the performance of the optimum filters.
Conference Paper
Full-text available
This paper introduces a technique for region-based pose tracking without the need to explicitly compute contours. We assume a surface model of a rigid object and at least one calibrated camera view. The goal is to find the pose parameters that optimally fit the model surface to the contour of the object seen in the image. In contrast to conventional contour-based techniques, which acquire the contour to be extracted explicitly from the image, our approach optimizes an energy directly defined on the pose parameters. We show experimental results for rather challenging scenes observed with a monocular and a stereo camera system.
Article
Full-text available
The pose estimation from visual sensors is widely practiced nowadays. The pose vector is estimated by means of homographies and projection geometry. The integration of visual and inertial measurements is getting more attractive due to its robustness and flexibility. The cooperation of visual with inertial sensors for finding a robot's pose bears many advantages, as it exploits their complementary attributes. Most of the visual pose estimation systems identify a geometrically known planar target to extract the pose vector. In this paper, the pose is estimated from a set of colored markers arranged in a known geometry, fused with the measurements of an inertial unit. The utilization of an extended Kalman filter (EKF) compensates the error and fuses the two heterogeneous measurements. The novelty of the proposed system is the use of low-cost colored post-it markers, along with the capability of handling different frames of reference, as the camera and the inertial unit are mounted on different mobile subsystems of a sophisticated volant robotic platform. The proposed system is computationally inexpensive, operates in real time, and exhibits high accuracy.
Article
In this work, we present a nonrigid approach to jointly solving the tasks of 2D-3D pose estimation and 2D image segmentation. In general, most frameworks that couple both pose estimation and segmentation assume that one has exact knowledge of the 3D object. However, under nonideal conditions, this assumption may be violated if only a general class to which a given shape belongs is given (e.g., cars, boats, or planes). Thus, we propose to solve the 2D-3D pose estimation and 2D image segmentation via nonlinear manifold learning of 3D embedded shapes for a general class of objects or deformations for which one may not be able to associate a skeleton model. Thus, the novelty of our method is threefold: first, we present and derive a gradient flow for the task of nonrigid pose estimation and segmentation. Second, due to the possible nonlinear structures of one's training set, we evolve the pre-image obtained through kernel PCA for the task of shape analysis. Third, we show that the derivation for shape weights is general. This allows us to use various kernels, as well as other statistical learning methodologies, with only minimal changes needing to be made to the overall shape evolution scheme. In contrast with other techniques, we approach the nonrigid problem, which is an infinite-dimensional task, with a finite-dimensional optimization scheme. More importantly, we do not explicitly need to know the interaction between various shapes such as that needed for skeleton models as this is done implicitly through shape learning. We provide experimental results on several challenging pose estimation and segmentation scenarios.
Article
In this work, we present an approach to jointly segment a rigid object in a two-dimensional (2D) image and estimate its three-dimensional (3D) pose, using the knowledge of a 3D model. We naturally couple the two processes together into a shape optimization problem and minimize a unique energy functional through a variational approach. Our methodology differs from the standard monocular 3D pose estimation algorithms since it does not rely on local image features. Instead, we use global image statistics to drive the pose estimation process. This confers a satisfying level of robustness to noise and initialization for our algorithm and bypasses the need to establish correspondences between image and object features. Moreover, our methodology possesses the typical qualities of region-based active contour techniques with shape priors, such as robustness to occlusions or missing information, without the need to evolve an infinite dimensional curve. Another novelty of the proposed contribution is to use a unique 3D model surface of the object, instead of learning a large collection of 2D shapes to accommodate the diverse aspects that a 3D object can take when imaged by a camera. Experimental results on both synthetic and real images are provided, which highlight the robust performance of the technique in challenging tracking and segmentation applications.
Article
Several performance criteria are described to enable a fair comparison among the various correlation filter designs: signal-to-noise ratio, peak sharpness, peak location, light efficiency, discriminability, and distortion invariance. The trade-offs resulting between some of these criteria are illustrated with the help of a new family of filters called fractional power filters (FPFs). The classical matched filter, phase-only filter (POF), and inverse filter are special cases of FPFs. Using examples, we show that the POF appears to provide a good compromise between noise tolerance and peak sharpness.
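As a rough illustration of how one parameter can span the three classical filters named above, the following Python sketch builds a frequency response of the form |S|^p · exp(−jφ), where S is the Fourier transform of the reference image and φ its phase. This form, the function names, the `eps` regularizer, and the toy scene are assumptions made for the example, not details taken from the abstract. Under this assumption, p = 1 gives the classical matched filter (for white noise), p = 0 the phase-only filter, and p = −1 the inverse filter.

```python
import numpy as np

def fractional_power_filter(reference, p, eps=1e-8):
    """Frequency response |S|^p * exp(-j*phase(S)) for a reference image.

    Assumed form for illustration: p=1 matched, p=0 phase-only, p=-1 inverse.
    """
    S = np.fft.fft2(reference)
    magnitude = np.abs(S)
    phase = np.angle(S)
    # eps (hypothetical regularizer) avoids division by zero when p < 0.
    return (magnitude + eps) ** p * np.exp(-1j * phase)

def correlate(scene, H):
    """Apply the filter in the frequency domain and return output intensity."""
    return np.abs(np.fft.ifft2(np.fft.fft2(scene) * H)) ** 2

# Toy scene: an 8x8 random target placed at row 10, column 5.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
scene = np.zeros((32, 32))
scene[10:18, 5:13] = target

# Reference image: the same target at the origin, zero-padded to scene size.
reference = np.zeros_like(scene)
reference[:8, :8] = target

# Locate the correlation peak for each special case of the filter family.
peaks = {}
for name, p in [("matched", 1.0), ("phase-only", 0.0), ("inverse", -1.0)]:
    plane = correlate(scene, fractional_power_filter(reference, p))
    peaks[name] = np.unravel_index(np.argmax(plane), plane.shape)
```

In this noise-free toy case all three filters place the peak at the target position. The trade-offs discussed in the abstract (noise tolerance versus peak sharpness) only become visible once noise or clutter is added to `scene`.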
Article
Correlation filtering methods are becoming increasingly popular for image recognition and location. The recent introduction of optimal tradeoff circular harmonic function filters allowed the user to specify the response of a correlation filter to in-plane rotation distortion. In this paper we introduce a new correlation filter design that can provide a user-specified response to in-plane scale distortion. The design is based on the Mellin radial harmonic (MRH) transform and incorporates multiple harmonics into the correlation filter for improved discrimination capability. Additionally, the filter design minimizes the average correlation energy in order to achieve sharp correlation peaks, and thus we refer to these filters as minimum average correlation energy Mellin radial harmonic (MACE-MRH) filters. We present underlying theory, a MACE-MRH filter design method, and numerical simulation results.
Article
Estimation of camera pose from an image of n points or lines with known correspondence is a thoroughly studied problem in computer vision. Most solutions are iterative and depend on nonlinear optimization of some geometric constraint, either on the world coordinates or on the projections to the image plane. For real-time applications, we are interested in linear or closed-form solutions free of initialization. We present a general framework which allows for a novel set of linear solutions to the pose estimation problem for both n points and n lines. We then analyze the sensitivity of our solutions to image noise and show that the sensitivity analysis can be used as a conservative predictor of error for our algorithms. We present a number of simulations which compare our results to two other recent linear algorithms, as well as to iterative approaches. We conclude with tests on real imagery in an augmented reality setup.
Article
Determining the rigid transformation relating 2D images to known 3D geometry is a classical problem in photogrammetry and computer vision. Heretofore, the best methods for solving the problem have relied on iterative optimization methods which cannot be proven to converge and/or which do not effectively account for the orthonormal structure of rotation matrices. We show that the pose estimation problem can be formulated as that of minimizing an error metric based on collinearity in object (as opposed to image) space. Using object space collinearity error, we derive an iterative algorithm which directly computes orthogonal rotation matrices and which is globally convergent. Experimentally, we show that the method is computationally efficient, that it is no less accurate than the best currently employed optimization methods, and that it outperforms all tested methods in robustness to outliers.
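A minimal sketch of an object-space collinearity error may help make the idea concrete. It assumes normalized image coordinates (u, v, 1) and measures how far each transformed 3D point R·p + t lies from the line of sight through its observed image point. The function names and the synthetic data are hypothetical, and the sketch only evaluates the error for a given pose; it does not implement the iterative algorithm from the cited work.

```python
import numpy as np

def line_of_sight_projector(v):
    """Matrix that projects a 3D point onto the viewing ray through v."""
    v = np.asarray(v, dtype=float).reshape(3, 1)
    return (v @ v.T) / (v.T @ v)

def object_space_error(R, t, points_3d, image_points):
    """Sum of squared distances from R p + t to each point's line of sight."""
    total = 0.0
    for p, v in zip(points_3d, image_points):
        q = R @ p + t                      # point in camera coordinates
        V = line_of_sight_projector(v)     # projector onto the viewing ray
        residual = (np.eye(3) - V) @ q     # component orthogonal to the ray
        total += float(residual @ residual)
    return total

# Synthetic check with an assumed ground-truth pose.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 5.0])

rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, size=(6, 3))

# Ideal (noise-free) normalized image points (u, v, 1).
camera_points = points_3d @ R_true.T + t_true
image_points = camera_points / camera_points[:, 2:3]

err_true = object_space_error(R_true, t_true, points_3d, image_points)
err_shifted = object_space_error(
    R_true, t_true + np.array([0.5, 0.0, 0.0]), points_3d, image_points
)
```

With noise-free data the error vanishes (up to rounding) at the true pose and grows when the translation is perturbed, which is the property a pose estimator minimizing this metric relies on.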
Article
Despite great progress achieved in 3-D pose tracking during the past years, occlusions and self-occlusions are still an open issue. This is particularly true in silhouette-based tracking where even visible parts cannot be tracked as long as they do not affect the object silhouette. Multiple cameras or motion priors can overcome this problem. However, multiple cameras or appropriate training data are not always readily available. We suggest handling occlusions and self-occlusions by tracking multiple objects and object parts simultaneously, where each part is described by its own region. This allows occlusions to be dealt with explicitly, which includes self-occlusions between different parts of the same object as well as occlusions between different objects. Our tracking approach estimates the pose of 3-D models by minimising the 2-D projection error through minimisation of an energy function depending on the pose parameters. Therefore, we model different image regions by probability density functions. The results we present for
Article
An accurate method for tracking the position and orientation of a moving target in nonuniformly illuminated and noisy scenes is proposed. The approach employs a bank of space-variant correlation filters which adapt their parameters according to the local statistics of the observed scene in each frame. When a scene frame is captured, a fragment of interest is extracted from the frame around the predicted coordinates of the target location. The fragment is first preprocessed to correct the illumination. Afterwards, the location and orientation of the target are estimated from the corrected fragment with the help of the filter bank. The performance of the proposed system in terms of tracking accuracy is tested in nonuniformly illuminated and noisy scene sequences. The obtained results are discussed and compared with those of similar state-of-the-art techniques for target tracking in terms of objective metrics.
Article
A real-time system for classification and tracking of multiple moving objects is proposed. The system employs a bank of composite correlation filters with complex constraints implemented in parallel on a graphics processing unit. When a scene frame is captured, the system splits the frame into several fragments on the basis of a kinematic-model prediction of the targets' locations. The fragments are processed with a bank of adaptive filters. The filters are synthesized with the help of an iterative algorithm, which optimizes discrimination capability for each target. Using complex constraints in the filter design, multiple objects in the input frame can be detected and classified by analyzing the intensity and phase distributions on the output complex correlation plane for each fragment. The performance of the proposed system in terms of tracking accuracy, classification efficiency, and processing time is tested and discussed with synthetic and real input-scene sequences. The results are compared with those of common techniques based on correlation filtering.
Article
In this paper, we address the problem of 2D-3D pose estimation. Specifically, we propose an approach to jointly track a rigid object in a 2D image sequence and to estimate its pose (position and orientation) in 3D space. We revisit a joint 2D segmentation/3D pose estimation technique, and then extend the framework by incorporating a particle filter to robustly track the object in a challenging environment, and by developing an occlusion detection and handling scheme to continuously track the object in the presence of occlusions. In particular, we focus on partial occlusions that prevent the tracker from extracting exact region properties of the object, which play a pivotal role for region-based tracking methods in maintaining the track. To this end, a dynamical choice of how to invoke the objective functional is performed online based on the degree of dependencies between predictions and measurements of the system in accordance with the degree of occlusion and the variation of the object's pose. This scheme provides the robustness to deal with occlusions of an obstacle with different statistical properties from that of the object of interest. Experimental results demonstrate the practical applicability and robustness of the proposed method in several challenging scenarios.
Article
The quality of computer generated images of three-dimensional scenes depends on the shading technique used to paint the objects on the cathode-ray tube screen. The shading algorithm itself depends in part on the method for modeling the object, which also determines the hidden surface algorithm. The various methods of object modeling, shading, and hidden surface removal are thus strongly interconnected. Several shading techniques corresponding to different methods of object modeling and the related hidden surface algorithms are presented here. Human visual perception and the fundamental laws of optics are considered in the development of a shading rule that provides better quality and increased realism in generated images.
Conference Paper
We present the design of correlation filters for detection of a target in a noisy input scene when the object of interest is given in a noisy reference image. The target signal, shape and location in the reference image are assumed to be unknown. Two signal models are considered for the input scene: additive and nonoverlapping. The design of the filters consists of automated estimation of needed parameters from a noisy reference image and maximization of the peak-to-output energy ratio criterion. Two filter variants are proposed. The matching error metric is used to determine the regions of the parameter space where each filter variant performs better. Computer simulation results obtained with the proposed filters are presented and evaluated in terms of discrimination capability, location errors, and tolerance to input noise.
Article
Two-dimensional (2-D) face recognition (FR) is of interest in many verification (1:1 matching) and identification (1:N matching) applications because of its nonintrusive nature and because digital cameras are becoming ubiquitous. However, the performance of 2-D FR systems can be degraded by natural factors such as expressions, illuminations, pose, and aging. Several FR algorithms have been proposed to deal with the resulting appearance variability. However, most of these methods employ features derived in the image or the space domain whereas there are benefits to working in the spatial frequency domain (i.e., the 2-D Fourier transforms of the images). These benefits include shift-invariance, graceful degradation, and closed-form solutions. We discuss the use of spatial frequency domain methods (also known as correlation filters or correlation pattern recognition) for FR and illustrate the advantages. However, correlation filters can be computationally demanding due to the need for computing 2-D Fourier transforms and may not match well for large-scale FR problems such as in the Face Recognition Grand Challenge (FRGC) phase-II experiments that require the computation of millions of similarity metrics. We will discuss a new method [called the class-dependence feature analysis (CFA)] that reduces the computational complexity of correlation pattern recognition and show the results of applying CFA to the FRGC phase-II data.
G. Turk and M. Levoy, "Stanford bunny 3D digital model," 1994, http://graphics.stanford.edu/data/3Dscanrep (8 January 2015).
M. Newell, "Utah teapot 3D digital model," 1975, http://graphics.cs.williams.edu/data/meshes.xml (20 May 2015).
Victor H. Diaz-Ramirez obtained his MS degree in electronics engineering from Instituto Tecnológico de Mexicali in 2003 and his PhD in computer science from Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Mexico, in 2007. He is now a professor at Instituto Politécnico Nacional, Mexico. His research interests include signal and image processing, pattern recognition, and opto-digital correlators.
Vitaly Kober obtained his MS degree in applied mathematics from Air-Space University of Samara, Russia, in 1984 and his PhD and Doctor of Sciences in image processing from the Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia, in 1992 and 2004, respectively. Currently, he is a professor in CICESE. His research interests include signal and image processing and pattern recognition.
Antonio S. Montemayor received his MS degree in applied physics at Universidad Autónoma de Madrid in 1999 and his PhD from Universidad Rey Juan Carlos in 2006. He is currently an associate professor at Universidad Rey Juan Carlos and main leader of the CAPO research group. His research interests include soft computing, computer vision, GPU computing, image and video processing, and real-time implementations.
Juan J. Pantrigo is currently an associate professor at Universidad Rey Juan Carlos and member of CAPO research group in the Department of Computer Science. He received his MS degree in fundamental physics from Universidad de Extremadura in 1998 and his PhD from Universidad Rey Juan Carlos in 2005. From 1998 to 2002, he was working in the Biomechanics Lab at Universidad de Extremadura. His research interests include high-dimensional space-state tracking problems, computer vision, metaheuristic optimization, and hybrid approaches. Optical Engineering 063102-11