Article

A Method for Registration of 3-D Shapes

Authors: Paul J. Besl, Neil D. McKay

Abstract

This paper describes a general purpose, representation independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and experience shows that the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. For example, a given 'model' shape and a sensed 'data' shape that represents a major portion of the model shape can be registered in minutes by testing one initial translation and a relatively small set of rotations to allow for the given level of model complexity. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model prior to shape inspection. The described method is also useful for deciding fundamental issues such as the congruence (shape equivalence) of different geometric representations as well as for estimating the motion between point sets where the correspondences are not known. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
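The loop the abstract describes — match each data point to its closest model point, solve for the best rigid motion in closed form, and repeat until the mean-square error stops decreasing — can be sketched as follows. This is a minimal point-to-point variant in NumPy; in practice a k-d tree would replace the brute-force closest-point search.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rotation/translation (Kabsch) mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(data, model, iters=100, tol=1e-12):
    """Align 'data' to 'model' by iterating closest-point matching and Kabsch."""
    data = data.copy()
    prev_err = np.inf
    for _ in range(iters):
        # brute-force closest points (a k-d tree would be used in practice)
        d2 = ((data[:, None, :] - model[None, :, :]) ** 2).sum(-1)
        matches = model[d2.argmin(axis=1)]
        R, t = best_rigid_transform(data, matches)
        data = data @ R.T + t
        err = ((data - matches) ** 2).mean()
        if prev_err - err < tol:    # monotone convergence: stop at a local minimum
            break
        prev_err = err
    return data
```

As the abstract notes, this converges only to the nearest local minimum, which is why a good initial translation/rotation (or a set of initial registrations to test) matters.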


... Since we evaluate on synthetic scenes without a meaningful physical scale, we report the absolute numbers without any physical unit. The baseline methods either assume a fixed camera [58] or predict the camera [53,43] and, in both cases, we apply ICP [3] to rigidly align their meshes with the ground truth in order to compare to our method, which assumes camera motion is known. ...
...

Step   Bones  Faces  Hypotheses  Epochs
r1     21     1280   16          20
r2     26     1600   1           10
r3     31     1920   1           10
r4     31     2240   1           10
r5     36     2560   1           10
final  36     2880   1           10

We run the method of Yu et al. [58] in the manner shown in their code. We empirically explored a set of values and found those of Table 7 to perform best when comparing results after rigid alignment with ICP [3] to the ground truth. ...
Preprint
Capturing general deforming scenes is crucial for many computer graphics and vision applications, and it is especially challenging when only a monocular RGB video of the scene is available. Competing methods assume dense point tracks, 3D templates, large-scale training datasets, or only capture small-scale deformations. In contrast to those, our method, Ub4D, makes none of these assumptions while outperforming the previous state of the art in challenging scenarios. Our technique includes two components that are new in the context of non-rigid 3D reconstruction: 1) a coordinate-based and implicit neural representation for non-rigid scenes, which enables an unbiased reconstruction of dynamic scenes, and 2) a novel dynamic scene flow loss, which enables the reconstruction of larger deformations. Results on our new dataset, which will be made publicly available, demonstrate the clear improvement over the state of the art in terms of surface reconstruction accuracy and robustness to large deformations. Visit the project page https://4dqv.mpi-inf.mpg.de/Ub4D/.
... When matching, the pose of the template with the highest similarity found by querying in the established template set is used as the rough pose. Then the pose refinement is performed by the iterative closest point (ICP) algorithm (Besl and McKay 1992). The mentioned methods have advantages in detection speed. ...
... Thus, adopting both RGB and depth data to improve the performance of pose estimation becomes a popular topic (Li et al. 2018a;Wang et al. 2019;Xiang et al. 2018;Kehl et al. 2017). Xiang et al. (2018) and Kehl et al. (2017) apply RGB images to predict a rough 6D pose, which is then refined by the ICP (Besl and McKay 1992) algorithm based on depth maps. However, the refinement stage is time-consuming, and the accuracy is strongly influenced by the initial rough pose. ...
... After obtaining the estimated rough pose, we perform pose refinement to acquire a finer pose. Since the ICP (Besl and McKay 1992) is too time-consuming to meet the real-time requirement, we adopt the end-to-end iterative refinement module as shown in Fig. 6. The point cloud is transformed using the rough pose. ...
Article
Full-text available
Precise 6DoF (6D) object pose estimation is an essential topic for many intelligent applications, for example, robot grasping, virtual reality, and autonomous driving. Lacking depth information, traditional pose estimators using only RGB cameras consistently predict biased 3D rotation and translation matrices. With the wide use of RGB-D cameras, we can directly capture both the depth of the object relative to the camera and the corresponding RGB image. Most existing methods concatenate these two data sources directly, which does not make full use of their complementary relationship. Therefore, we propose an efficient RGB-D fusion network for 6D pose estimation, called EFN6D, to exploit the 2D–3D features more thoroughly. Instead of directly using the original single-channel depth map, we encode the depth information into a normal map and point cloud data. To effectively fuse the surface texture features and the geometric contour features of the object, we feed the RGB images and the normal map into two ResNets. Besides, PSP modules and skip connections are used between the two ResNets, which not only enhances the cross-modal fusion performance of the network but also enhances the network's capability in handling objects at different scales. Finally, the fused features obtained from these two ResNets and the point cloud features are densely fused point by point to further strengthen the fusion of 2D and 3D information at a per-pixel level. Experiments on the LINEMOD and YCB-Video datasets show that our EFN6D outperforms state-of-the-art methods by a large margin.
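The depth-encoding step mentioned above starts from the standard pinhole back-projection of a depth map into a camera-frame point cloud. The sketch below shows only that generic construction, not EFN6D's actual encoder; the intrinsics fx, fy, cx, cy are placeholder values.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (metres) into camera-frame 3-D points using
    the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                        # drop invalid (zero-depth) pixels
```

A normal map can then be estimated from this point cloud (e.g. from local cross products of neighbouring points), which is the other encoding the paper feeds to its second ResNet.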
... We use as baseline ICP [19], DCP [20], DeepGMR [21], PointNetLK [12], and the most recent state-of-the-art PointNetLK-Revisited [14] for synthetic registration and indoor registration. We also test it on an outdoor dataset which is not applicable to deep feature-based registration [14]. ...
... We also test it on an outdoor dataset which is not applicable to deep feature-based registration [14]. Thus we use the benchmark from outdoor registration paper [28] with the baseline ICP [19], G-ICP [29], 3DFeat-Net [6], DeepVCP [30], and DeepCLR [28]. ...
... As the compared deep-learning-based methods are trained on sequences 00-07, we run tests on sequences 08-10. The compared methods are ICP Point2Point* [19], ICP Point2Plane* [19], G-ICP* [29], 3DFeat-Net* [6], DeepVCP* [30], DeepCLR* [28], and ours (* values taken from [28]). ...
Preprint
Full-text available
In recent years, implicit functions have drawn attention in the field of 3D reconstruction and have successfully been applied with Deep Learning. However, for incremental reconstruction, implicit function-based registrations have rarely been explored. Inspired by the high precision of deep learning global feature registration, we propose to combine this with distance fields. We generalize the algorithm to a non-Deep-Learning setting while retaining the accuracy. Our algorithm is more accurate than conventional models and, without any training, achieves competitive performance and faster speed compared to Deep-Learning-based registration models. The implementation is available on GitHub for the research community.
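To make the distance-field idea concrete — this is not the paper's algorithm, only a toy under simplifying assumptions (a known analytic unit-sphere SDF and translation-only motion) — registration against a distance field can be posed as gradient descent on the mean squared SDF value of the transformed points:

```python
import numpy as np

def sdf_sphere(p):
    """Signed distance from points p (N,3) to the unit sphere at the origin."""
    return np.linalg.norm(p, axis=-1) - 1.0

def register_to_sdf(points, lr=0.1, iters=300):
    """Find the translation t minimising E(t) = mean_i sdf(p_i + t)^2 by
    gradient descent; for the sphere, the SDF gradient at p is p / |p|."""
    t = np.zeros(3)
    for _ in range(iters):
        p = points + t
        d = sdf_sphere(p)
        n = p / np.maximum(np.linalg.norm(p, axis=1, keepdims=True), 1e-12)
        t -= lr * 2.0 * (d[:, None] * n).mean(axis=0)   # dE/dt = 2 * mean(d * grad)
    return t
```

No correspondences are needed: the field itself supplies the residual and its gradient, which is the appeal of distance-field registration over closest-point matching.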
... 3. We compare NSM with several point cloud registration methods and one part assembly baseline approach. 4. Experimental results and analysis support our design choices and demonstrate the robustness of NSM when presented with realistically noisy observations. ...
... Pose estimation. Existing pose estimation methods predict poses for known objects by aligning a provided model with an observation [4,56]. Learning-based approaches predict poses for novel objects as bounding box corners [27] or semantic keypoints [46,53] or mappings to a normalized coordinate space [48]. ...
... Point cloud registration. If we had access to additional information, our problem would reduce to point cloud registration [4,5,58]. Specifically, if we had a segmentation of the interface of each piece (the subset of its surface that contacts the other piece in the assembled pose), computing the assembled pose would reduce to aligning the paired interfaces. ...
Preprint
Learning to autonomously assemble shapes is a crucial skill for many robotic applications. While the majority of existing part assembly methods focus on correctly posing semantic parts to recreate a whole object, we interpret assembly more literally: as mating geometric parts together to achieve a snug fit. By focusing on shape alignment rather than semantic cues, we can achieve across-category generalization. In this paper, we introduce a novel task, pairwise 3D geometric shape mating, and propose Neural Shape Mating (NSM) to tackle this problem. Given the point clouds of two object parts of an unknown category, NSM learns to reason about the fit of the two parts and predict a pair of 3D poses that tightly mate them together. We couple the training of NSM with an implicit shape reconstruction task to make NSM more robust to imperfect point cloud observations. To train NSM, we present a self-supervised data collection pipeline that generates pairwise shape mating data with ground truth by randomly cutting an object mesh into two parts, resulting in a dataset that consists of 200K shape mating pairs from numerous object meshes with diverse cut types. We train NSM on the collected dataset and compare it with several point cloud registration methods and one part assembly baseline. Extensive experimental results and ablation studies under various settings demonstrate the effectiveness of the proposed algorithm. Additional material is available at: https://neural-shape-mating.github.io/
... As an alternative representation to achieve sensor tracking and mapping, the seminal work of KinectFusion [4] demonstrates the performance of camera motion estimation and dense reconstruction in real-time. To represent the geometry, KinectFusion uses the truncated signed distance function (TSDF) [18] and a coarse-to-fine ICP algorithm [19] to perform alignment. TSDF, originally proposed in computer graphics literature, has become a widely used representation to exploit in odometry and mapping approaches. ...
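The TSDF fusion mentioned above can be sketched as a weighted running average per voxel, in the Curless–Levoy style that KinectFusion builds on. This is a generic sketch, not KinectFusion's implementation: the truncation distance and weight cap are assumed values, and the ray-casting/projection step that produces the observed signed distances is omitted.

```python
import numpy as np

def fuse_tsdf(tsdf, weights, sdf_obs, trunc=0.1, max_w=50.0):
    """One weighted-average TSDF update: clamp the observed signed distance
    to [-trunc, trunc] (normalised to [-1, 1]) and blend it into the
    running per-voxel average."""
    obs = np.clip(sdf_obs / trunc, -1.0, 1.0)   # truncated, normalised SDF
    valid = sdf_obs > -trunc                    # ignore voxels far behind the surface
    w_new = np.where(valid, 1.0, 0.0)
    tsdf = (tsdf * weights + obs * w_new) / np.maximum(weights + w_new, 1e-9)
    weights = np.minimum(weights + w_new, max_w)
    return tsdf, weights
```

The surface is then the zero level set of the fused field, and each new frame is aligned to it (coarse-to-fine ICP in KinectFusion) before being fused in turn.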
... Moreover, assuming that the choice of covariance function respects (19) already, we estimate v(x) and ∇v and its variance by simply using the GPIS regression equations (13) and (14) with target measurements set to y_i = 1 at locations x_i on the surface (instead of d and ∇d). Intuitively, this implies that the target value of the measurements in terms of the EDF is log(1) = 0 on the surface boundary. ...
... Intuitively, this implies that the target value of the measurements in terms of the EDF is log(1) = 0 on the surface boundary. Similarly, this enforces the boundary condition in (19). ...
Preprint
Full-text available
Whereas dedicated scene representations are required for each different task in conventional robotic systems, this paper demonstrates that a unified representation can be used directly for multiple key tasks. We propose the Log-Gaussian Process Implicit Surface for Mapping, Odometry and Planning (Log-GPIS-MOP): a probabilistic framework for surface reconstruction, localisation and navigation based on a unified representation. Our framework applies a logarithmic transformation to a Gaussian Process Implicit Surface (GPIS) formulation to recover a global representation that accurately captures the Euclidean distance field with gradients and, at the same time, the implicit surface. By directly estimating the distance field and its gradient through Log-GPIS inference, the proposed incremental odometry technique computes the optimal alignment of an incoming frame, and fuses it globally to produce a map. Concurrently, an optimisation-based planner computes a safe collision-free path using the same Log-GPIS surface representation. We validate the proposed framework on simulated and real datasets in 2D and 3D and benchmark against the state-of-the-art approaches. Our experiments show that Log-GPIS-MOP produces competitive results in sequential odometry, surface mapping and obstacle avoidance.
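The logarithmic trick can be illustrated in one dimension. This is a toy sketch, not the paper's full framework; the exponential (Matern 1/2) kernel and the value of λ are assumptions. Regressing occupancy targets y = 1 at surface points yields a posterior mean v(x) that approximates exp(-λ d(x)), so the Euclidean distance is recovered as d(x) ≈ -log v(x) / λ.

```python
import numpy as np

def log_gpis_distance(surface_pts, query_pts, lam=10.0):
    """Toy 1-D Log-GPIS: GP regression of occupancy targets y = 1 at surface
    points with kernel k(x,x') = exp(-lam*|x-x'|); the posterior mean
    approximates v(x) = exp(-lam*d(x)), hence d(x) ~= -log(v)/lam."""
    X = np.asarray(surface_pts, float)[:, None]
    Q = np.asarray(query_pts, float)[:, None]
    K = np.exp(-lam * np.abs(X - X.T))              # train-train covariance
    Ks = np.exp(-lam * np.abs(Q - X.T))             # query-train covariance
    v = Ks @ np.linalg.solve(K + 1e-8 * np.eye(len(X)), np.ones(len(X)))
    return -np.log(np.maximum(v, 1e-12)) / lam      # recover distance
```

With a single surface point the recovery is exact; with several points the posterior mean slightly overshoots near equidistant queries, so the distance is a close approximation rather than exact.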
... Shape registration is a classic problem in pointcloud processing for which several methods exist [2,3,6,8,12,15], with the iterative closest point (ICP) algorithm and its variants [2,8,9] being the most widely used. The ICP algorithm progressively matches the target shape to a reference shape by iteratively estimating a global rotation, translation and/or scaling. ...
... and the third barycentric coordinate as α3 = 1 − α1 − α2. Random points on the surface are then generated using the convex combination of corresponding vertices as in Eq. (2). In order to achieve a constant point density δS, up to δS·A(f) points must be generated for a facet f with area A(f). ...
Chapter
Labelling pointclouds with the nearest facet of triangular meshes is a required step for a number of operations involving pointclouds and meshes, such as pointcloud registration, model parameters optimization, error estimation and pointcloud selection, among others. In this paper, we describe a simple method for labelling pointclouds with nearest facet that is based on an objective function that resolves the ambiguity around shared edges and vertices. We provide explicit formulas for computing the barycentric coordinates of projected points on selected facets, which are efficiently used to evaluate the point-facet distance. The method was tested with simulated and real pointclouds generated through standard photogrammetric procedures.
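The sampling scheme described in the excerpt — draw barycentric coordinates, take α3 = 1 − α1 − α2, and emit about δS·A(f) points per facet of area A(f) — can be sketched as follows. The fold-back trick for uniform barycentric sampling is a standard device, not necessarily the chapter's exact procedure.

```python
import numpy as np

def sample_mesh(vertices, faces, density):
    """Draw ~density * area(f) uniform random points on each triangular facet
    via barycentric coordinates a1, a2, a3 = 1 - a1 - a2."""
    rng = np.random.default_rng(0)
    pts = []
    for f in faces:
        v0, v1, v2 = vertices[f]
        area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0))
        n = max(int(round(density * area)), 1)      # area-proportional count
        a1, a2 = rng.random(n), rng.random(n)
        flip = a1 + a2 > 1.0                        # fold outside points back in
        a1[flip], a2[flip] = 1.0 - a1[flip], 1.0 - a2[flip]
        a3 = 1.0 - a1 - a2
        pts.append(a1[:, None] * v0 + a2[:, None] * v1 + a3[:, None] * v2)
    return np.vstack(pts)
```

Scaling the per-facet count by A(f) is what keeps the point density constant across facets of different sizes.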
... In rigid registration, two point clouds to be aligned only differ by a rotation and/or a translation. The iterative closest point (ICP) algorithm [4] and its variants are perhaps the most widely used solutions for this class of problems. In non-rigid registration, portions from a source/target point cloud can be deformed independently from others, such as in articulated or organic figures. ...
... ICP [4] is one of the most popular approaches for rigid registration, and several improvements have been proposed to it, including optimizations for estimating correspondences (Tr-ICP [14]), and modeling using maximum likelihood estimation (EM-ICP [19]). Probabilistic approaches like CPD [29], FilterReg [16], and Branch-and-Bound-based methods, such as Go-ICP [51], have shown improved results. ...
Article
Full-text available
We present 3D point-cloud registration techniques suited for scenarios where robustness to outliers and missing regions is necessary, besides being applicable to both rigid and non-rigid configurations. Our techniques exploit advantages from deep learning models for dense point matching and from recent advances in probabilistic modeling of point-cloud registration. Such a combination produces context awareness and resilience to outliers and missing information. We demonstrate their effectiveness by comparing them to state-of-the-art methods and showing that ours achieve superior results on existing and proposed datasets.
... For matching failure cases, the registration result is not used in this evaluation. The state-of-the-art geometric approaches ICP [58], G-ICP [59], AA-ICP [60], NDT-P2D [61] and CPD [62] and the DNN-based methods 3DFeat-Net [24], DeepVCP [44], D3Feat [26] and StickyPillars [25] are used for the comparison. The results presented in [25] are adopted, and we extend the table by running the experiments on LinK3D. ...
... Comparison with state-of-the-art methods. In this evaluation, we compare our results with the ICP algorithms [58][59], CLS [63] and LOAM [64], which are widely considered baselines for point-cloud-based odometry estimation. Furthermore, we also validate against the DNN-based LO-Net [65], which combines point cloud registration with subsequent geometry-based mapping. ...
Preprint
Feature extraction and matching are basic parts of many computer vision tasks, such as 2D or 3D object detection, recognition, and registration. As we all know, 2D feature extraction and matching have already achieved great success. Unfortunately, in the field of 3D, current methods fail to support the extensive application of 3D LiDAR sensors in vision tasks, due to their poor descriptiveness and inefficiency. To address this limitation, we propose a novel 3D feature representation method: Linear Keypoints representation for 3D LiDAR point cloud, called LinK3D. The novelty of LinK3D lies in that it fully considers the characteristics (such as the sparsity and complexity of scenarios) of LiDAR point clouds, and represents the current keypoint with its robust neighbor keypoints, which provide a strong constraint on its description. The proposed LinK3D has been evaluated on two public datasets (i.e., KITTI, Steven VLP16), and the experimental results show that our method greatly outperforms the state of the art in matching performance. More importantly, LinK3D shows excellent real-time performance (relative to the 10 Hz frequency of LiDAR). LinK3D takes only an average of 32 milliseconds to extract features from the point cloud collected by a 64-ray laser beam, and merely about 8 milliseconds to match two LiDAR scans when executed on a notebook with an Intel Core i7 @2.2 GHz processor. Moreover, our method can be widely extended to a variety of 3D vision applications. In this paper, we have applied LinK3D to 3D registration, LiDAR odometry and place recognition tasks, and achieved competitive results compared with the state-of-the-art methods.
... The resulting 3D point clouds belonging to the ground are further injected into the local alignment process, which is applied to each pair of 3D patch matches. The fine alignment procedure includes pose estimation based on the Iterative Closest Point (ICP) algorithm (Besl and McKay, 1992), barycenter compensation and outlier rejection. The final output of the algorithm is a set of homologous 3D patches and their associated 3D rigid transformations. ...
... Let us denote by Pnew,G and Pold,G the 3D point clouds representing the ground for the recent and the historical dataset, respectively. In order to refine the 3D global patch alignment, each 3D patch match is injected into the local alignment procedure, which includes rigid pose estimation via the ICP (Besl and McKay, 1992) algorithm, barycenter compensation and outlier rejection. ...
Article
Full-text available
Automatic georeferencing for historical-to-nowadays aerial images represents the main ingredient for supplying territory evolution analysis and environmental monitoring. Existing georeferencing methods based on feature extraction and matching have reported successful results for multi-epoch aerial images acquired in structured and man-made environments. While improving the state of the art of the multi-epoch georeferencing problem, such frameworks present several limitations when applied to unstructured scenes, such as natural feature-less environments characterized by homogeneous or texture-less areas. This is mainly due to the lack of structured areas, which often results in sparse and ambiguous feature matches, introducing inconsistencies during the pose estimation process. This paper addresses the automatic georeferencing problem for historical aerial images acquired in unstructured natural environments. The research work presented in this paper introduces a feature-less algorithm designed to perform historical-to-nowadays image matching for pose estimation in a fully automatic fashion. The proposed algorithm operates in two stages: (i) 2D patch extraction and matching and (ii) 3D patch-based local alignment. The final output is a set of 3D patch matches and the 3D rigid transformation relating each pair of homologous patches. The obtained 3D point matches are designed to be injected into traditional multi-view pose optimisation engines. Experimental results on real datasets acquired over the Fabas area in France demonstrate the effectiveness of the proposed method. Our findings illustrate that the proposed georeferencing technique provides accurate results in the presence of long periods of time separating historical from nowadays aerial images (up to a 48-year time span).
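The "barycenter compensation and outlier rejection" steps named in the excerpts are simple enough to sketch. This is a generic reading of those terms, not necessarily the authors' exact implementation; the median-based threshold factor k is an assumption.

```python
import numpy as np

def barycenter_compensate(src, dst):
    """Remove residual translation bias by aligning the barycenters (centroids)."""
    return src + (dst.mean(axis=0) - src.mean(axis=0))

def reject_outliers(src, dst, k=2.0):
    """Drop matched pairs whose residual exceeds k times the median residual."""
    r = np.linalg.norm(src - dst, axis=1)
    keep = r <= k * np.median(r)
    return src[keep], dst[keep]
```

In a pipeline like the one described, these steps would bracket the ICP pose estimate: compensate the barycenters first, then reject pairs that disagree with the fitted rigid motion.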
... The baseline is inspired by the state-of-the-art indoor extreme pose estimation method [64] and can be seen as a variant for humans. Specifically, we utilize the latest EFT-Net [41] to reconstruct 3D human models and align them with ICP [9]. To avoid local minima, we first register the shapes based on their canonical coordinates. ...
... when the pose error threshold is low, but deep optimization [9,44] surpasses it when the threshold increases. This is expected since matching-based approaches can produce accurate estimation when classic correspondences are available, yet fail catastrophically when the viewpoints are very different. ...
Preprint
Full-text available
Recovering the spatial layout of the cameras and the geometry of the scene from extreme-view images is a longstanding challenge in computer vision. Prevailing 3D reconstruction algorithms often adopt the image matching paradigm and presume that a portion of the scene is co-visible across images, yielding poor performance when there is little overlap among inputs. In contrast, humans can associate visible parts in one image to the corresponding invisible components in another image via prior knowledge of the shapes. Inspired by this fact, we present a novel concept called virtual correspondences (VCs). VCs are a pair of pixels from two images whose camera rays intersect in 3D. Similar to classic correspondences, VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views. Therefore VCs can be established and exploited even if images do not overlap. We introduce a method to find virtual correspondences based on humans in the scene. We showcase how VCs can be seamlessly integrated with classic bundle adjustment to recover camera poses across extreme views. Experiments show that our method significantly outperforms state-of-the-art camera pose estimation methods in challenging scenarios and is comparable in the traditional densely captured setup. Our approach also unleashes the potential of multiple downstream tasks such as scene reconstruction from multi-view stereo and novel view synthesis in extreme-view scenarios.
... coincident, parallelism, coaxiality, contact) are left to the CAD modeler in charge of the updates. The use of such an algorithm is particularly interesting in the present case as the energy function to be minimized, and the geometric constraints to be satisfied, are not defined by equations but by means of black boxes combining calls to several procedures of the CAD modeler [25,26]. Other metaheuristics have been tested but have demonstrated a lower efficiency than SA. ...
... Here, updates can appear at the level of the sketches, parts and assemblies. Then, each geometry is translated and rotated using an ICP algorithm [26] that finds the best-fit rigid body transformation between the geometry and its associated PC κi(t). Indeed, adding six additional parameters to control the position and orientation of each geometry taking part in the optimization process would clearly reduce the performance of the SA algorithm; it is therefore more efficient to manage these issues in a standalone ICP step. ...
Thesis
Even if many Reverse Engineering techniques exist to reconstruct real objects in 3D, very few are able to deal directly and efficiently with the reconstruction of the editable CAD models of assemblies of mechanical parts that can be used in the stages of Product Development Processes (PDP). In the absence of suitable segmentation tools, these approaches struggle to identify in the reconstructed model the different parts that make up the assembly. The thesis aims to develop a new Reverse Engineering technique for the reconstruction of editable CAD models of assemblies of mechanical parts. The proposed method uses Simulated Annealing-based fitting, and the optimization process leverages a two-level filtering technique able to capture and manage the boundaries of the geometries inside the overall point cloud in order to allow for local fitting and interface detection. The originality lies in the exploitation of multimodal data (e.g. point clouds, a database of CAD models, and stored best configurations for the fitting process). The thesis presents a two-stage modular approach. The first step is to collect within the multimodal data a set of characteristics that contribute to the Simulated Annealing-based Reverse Engineering technique and to the identification of interfaces and parts. In the second step, this information is merged with the help of transformation operators working in a common space. The method integrates sensitivity analysis to characterize the impact of variations in the parameters of a CAD model on the evolution of the deviation between the CAD model itself and the point cloud to be fitted. The evaluation of the proposed approach is performed using both real scanned point clouds and as-scanned virtually generated point clouds which incorporate several artifacts that could appear with a real scanner.
Results cover several Industry 4.0 related application scenarios, ranging from the global fitting of a single part to the update of a complete Digital Mock-Up embedding assembly constraints. The proposed approach demonstrates a good capacity to help maintain the coherence between a product/system and its digital twin.
... Early networks are trained on the synthetic dataset [13]; this is because the ground truth of optical flow in the real world is difficult to obtain, which increases the difficulty of supervision and estimation. As a result, methods for unsupervised learning on real-world data have attracted much attention [20], [41], [21], [22], [23], [24]. ...
... Quantitative results are shown in Table I. We implemented the unsupervised scene flow training based only on monocular images, and the scene flow network trained with our unsupervised method performs well on the evaluation criteria compared to the traditional methods Iterative Closest Point (ICP) [41] and Fast Global Registration (FGR) [50]. ...
Preprint
Scene flow represents the motion of points in 3D space, the counterpart of optical flow, which represents the motion of pixels in the 2D image. However, it is difficult to obtain the ground truth of scene flow in real scenes, and recent studies rely on synthetic data for training. Therefore, how to train a scene flow network with unsupervised methods based on real-world data is of crucial significance. A novel unsupervised learning method for scene flow is proposed in this paper, which utilizes the images of two consecutive frames taken by a monocular camera, without the ground truth of scene flow, for training. Our method realizes the goal of training a scene flow network with real-world data, which bridges the gap between training data and test data and broadens the scope of available data for training. Unsupervised learning of scene flow in this paper mainly consists of two parts: (i) depth estimation and camera pose estimation, and (ii) scene flow estimation based on four different loss functions. Depth estimation and camera pose estimation obtain the depth maps and camera pose between two consecutive frames, which provide further information for the subsequent scene flow estimation. After that, we use a depth consistency loss, dynamic-static consistency loss, Chamfer loss, and Laplacian regularization loss to carry out unsupervised training of the scene flow network. To our knowledge, this is the first paper that realizes the unsupervised learning of 3D scene flow from a monocular camera. The experimental results on KITTI show that our method for unsupervised learning of scene flow achieves strong performance compared to the traditional methods Iterative Closest Point (ICP) and Fast Global Registration (FGR). The source code is available at: https://github.com/IRMVLab/3DUnMonoFlow.
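Of the four losses listed above, the Chamfer loss is the most self-contained. A common symmetric formulation is shown below; several normalisation variants exist, so treat this exact form as an assumption rather than the paper's definition.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3):
    mean nearest-neighbour squared distance from a to b, plus from b to a."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

Used as a loss, it penalises a predicted (warped) point cloud for straying from the target cloud without requiring point-to-point correspondences, which is what makes it suitable for unsupervised training.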
... to a new one q_i(k), which may be calculated using odometry sensors or the movement model. Next, the observation of the environment is used to verify the robot pose using a specific algorithm, for instance, iterative closest point (ICP) [4]. This allows evaluating a more probable position q_j(k) with an error e_ij(q_i, q_j) between q_i and q_j. ...
... Algorithm sensitivity is assessed for the same parameters as in the simulation study described in Sect. 4. Nominal values for the fastSLAM algorithm are set as follows: n = 1000, E_B = 0.2, D²_cr,1 = 100, P(L_n) = 10⁻⁴ and P(a_n) = P(b_n) = 10⁻³. ...
Article
Full-text available
The task of simultaneous localization and mapping (SLAM) allows a mobile robot to localize itself in an unknown environment while building a map of the surrounding landscape. It is frequently used for the navigation of autonomous mobile robots. Despite the fact that plenty of solutions are available, real applications encounter various application constraints. The present work addresses the modified fastSLAM application using a custom lidar as the detection sensor. Available localization solutions are assessed, their constraints are identified and corresponding solutions are proposed. The algorithm is implemented, tested using simulations and finally applied to an existing mobile platform. Its validation considers various practical aspects of its operation in a real-time environment.
... θ.y and θ.z are defined analogously to θ.x. Based on O_init, we use ICP [62] to achieve a more accurate O. Finally, the registration result is obtained. The most significant advantage of our shape registration is that the method does not require any feature analysis such as normal matching, 3D keypoint extraction, or different shape operators. ...
... As mentioned before, we provide a shape registration application based on our resampling method and this improves the efficiency of transform matrix computation in the registration. We compare different classical shape registration methods to show the improvement, including ICP [62], FPFH [34], Go-ICP [77], PointNetLK [28], and PCRNet [29]. The ICP and FPFH are implemented by the PCL library. ...
Article
Full-text available
With the rapid development of 3D scanning technology, 3D point cloud based research and applications are becoming more popular. However, major difficulties still exist that affect the performance of point cloud utilization. Such difficulties include the lack of local adjacency information, non-uniform point density, and control of point numbers. In this paper, we propose a two-step intrinsic and isotropic (I&I) resampling framework to address these three major difficulties. The efficient intrinsic control provides geodesic measurement for a point cloud to improve local region detection and avoids redundant geodesic calculation. Then the geometrically-optimized resampling uses a geometric update process to optimize a point cloud into an isotropic or adaptively-isotropic one. The point cloud density can be adjusted to be globally uniform (isotropic) or locally uniform with geometric features kept (adaptively isotropic). The point cloud number can be controlled based on application requirements or user specification. Experiments show that our point cloud resampling framework achieves outstanding performance in different applications: point cloud simplification, mesh reconstruction and shape registration. We provide the implementation codes of our resampling method at https://github.com/vvvwo/II-resampling.
... To improve the co-registration accuracy between point cloud epochs for surface change analysis, we align each epoch in the time series to the first point cloud (2021-07-28 12:00) as the global null epoch. Alignment is performed by deriving a rigid transformation matrix using an iterative closest point (ICP; Besl and McKay, 1992) method on stable surfaces within the scene. The stable surfaces are manually selected as centroids with specified radii (between 0.75 and 2.5 m). ...
Article
Full-text available
Automatic extraction of surface activity from near-continuous 3D time series is essential for geographic monitoring of natural scenes. Recent change analysis methods leverage the temporal domain to improve the detection in time and the spatial delineation of surface changes, which occur with highly variable spatial and temporal properties. 4D objects-by-change (4D-OBCs) are specifically designed to extract individual surface activities which may occur in the same area, both consecutively or simultaneously. In this paper, we investigate how the extraction of 4D-OBCs can improve by considering uncertainties associated to change magnitudes using Kalman filtering of surface change time series. Based on the change rate contained in the Kalman state vector, the method automatically detects timespans of accumulation and erosion processes. This renders change detection independent from a globally fixed minimum detectable change value. Considering uncertainties associated to change allows detecting and classifying more occurrences of relevant surface activity, depending on the change rate and magnitude. We compare the Kalman-based seed detection to a regression-based method using a three-month tri-hourly terrestrial laser scanning time series (763 epochs) acquired of mass movements at a high-mountain slope in Austria. The Kalman-based method successfully identifies all relevant changes at the example location for the extraction of 4D-OBCs, without requiring the definition of a global minimum change magnitude. In the future, we will further investigate which kind of change detection method is best suited for which types of surface activity.
... Point-based registration approaches are most generally applicable to local features, where the loss function minimizes the distance between pairs of points or between points and a model. Examples of point cloud registration algorithms [60] include iterative algorithms such as the Iterative Closest Point (ICP) algorithm [61], where points contribute uniformly to a solution, or the Coherent Point Drift (CPD) algorithm [24], where points contribute according to a probabilistic weighting. 3D SIFT keypoint correspondences have been used to achieve point-based registration in the context of image-guided neurosurgery [8], including non-rigid image registration via thin plate splines [62], finite element methods [63] and the CPD algorithm [64]; however, these have made use of keypoint locations and not the orientation and scale properties as we propose. ...
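The probabilistic weighting that distinguishes CPD from ICP's uniform contribution can be illustrated with Gaussian soft-assignment weights. This is a generic sketch of soft correspondence, not the cited paper's code; all names and values are illustrative:

```python
import numpy as np

def soft_weights(X, Y, sigma=1.0):
    """w[i, j]: relative probability that data point X[i] corresponds to Y[j]."""
    # Pairwise squared distances, shape (len(X), len(Y))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)  # each row is a distribution over Y

X = np.array([[0.0, 0.0], [2.0, 0.0]])
Y = np.array([[0.1, 0.0], [2.2, 0.0]])
W = soft_weights(X, Y, sigma=0.5)
print(W.argmax(axis=1))  # → [0 1]: each point favors its nearest counterpart
```

Unlike ICP's hard assignment, every model point receives some weight, which makes the subsequent transform estimate more robust to noise and outliers.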
Preprint
Full-text available
This paper proposes to extend local image features in 3D to include invariance to discrete symmetry including inversion of spatial axes and image contrast. A binary feature sign $s \in \{-1,+1\}$ is defined as the sign of the Laplacian operator $\nabla^2$, and used to obtain a descriptor that is invariant to image sign inversion $s \rightarrow -s$ and 3D parity transforms $(x,y,z)\rightarrow(-x,-y,-z)$, i.e. SP-invariant or SP-symmetric. SP-symmetry applies to arbitrary scalar image fields $I: R^3 \rightarrow R^1$ mapping 3D coordinates $(x,y,z) \in R^3$ to scalar intensity $I(x,y,z) \in R^1$, generalizing the well-known charge conjugation and parity symmetry (CP-symmetry) applying to elementary charged particles. Feature orientation is modeled as a set of discrete states corresponding to potential axis reflections, independently of image contrast inversion. Two primary axis vectors are derived from image observations and potentially subject to reflection, and a third axis is an axial vector defined by the right-hand rule. Augmenting local feature properties with sign in addition to standard (location, scale, orientation) geometry leads to descriptors that are invariant to coordinate reflections and intensity contrast inversion. Feature properties are factored in to probabilistic point-based registration as symmetric kernels, based on a model of binary feature correspondence. Experiments using the well-known coherent point drift (CPD) algorithm demonstrate that SIFT-CPD kernels achieve the most accurate and rapid registration of the human brain and CT chest, including multiple MRI modalities of differing intensity contrast, and abnormal local variations such as tumors or occlusions. SIFT-CPD image registration is invariant to global scaling, rotation and translation and image intensity inversions of the input data.
... In [33] a patch collaborative approach is presented that applies the image denoising procedure of [34] to the point set. First, patches of the point set that are similar in geometry are aligned in the same coordinate system using ICP [35], forming a collaborative patch. These patches are thinned using the Laplace-Beltrami operator (LBO) in the spectral domain. ...
Article
A point cloud smoothing algorithm is presented which is based on the mesh filtering procedure of Taubin. This is accomplished by defining a robust one-ring neighborhood for each vertex of the point cloud based on the elliptic Gabriel graph and by incorporating non-uniform Gaussian weights during the smoothing process. The proposed method is robust to noise and very simple to implement. It is able to produce high quality smoothed point clouds that avoid the shrinkage, point clustering and edge over-smoothing problems, without the use of normal information, which is computationally unstable on noisy point sets. Through an extensive comparison with state-of-the-art smoothing methods, the advantages of the proposed method in both free-form and CAD-oriented models are presented. A framework for efficient GPU implementation is also provided. The proposed method is able to process more than 15 million points in about 20 seconds, making it very suitable for point sets produced by modern 3D scanners.
... The reconstructed mesh was roughly aligned to the original mesh of the phantom using an affine transformation based on landmarks. Then, the registration to the original mesh was refined by applying the iterative closest point (ICP) algorithm [35]. The accuracy of each node of the reconstructed mesh was computed as the Euclidean distance between that node and the closest node in the original mesh. ...
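The node-wise accuracy metric described in this excerpt (Euclidean distance from each reconstructed node to the closest node of the original mesh) is straightforward to compute. A brute-force NumPy sketch with illustrative data; for large meshes a KD-tree would replace the pairwise distance matrix:

```python
import numpy as np

def nearest_node_error(reconstructed, original):
    """For each reconstructed node, the distance to the closest original node."""
    # Pairwise squared distances, shape (n_reconstructed, n_original)
    d2 = ((reconstructed[:, None, :] - original[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1))

rec = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.0]])
orig = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(nearest_node_error(rec, orig))  # distances: 0.1 and 0.0
```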
Article
Full-text available
Functional near infrared spectroscopy and electroencephalography are non-invasive techniques that rely on sensors placed over the scalp. The spatial localization of the measured brain activity requires the precise individuation of sensor positions and, when individual anatomical information is not available, the accurate registration of these sensor positions to a head atlas. Both these issues could be successfully addressed using a photogrammetry-based method. In this study we demonstrate that sensor positions can be accurately detected from a video recorded with a smartphone, with a median localization error of 0.7 mm, comparable to, if not lower than, that of conventional approaches. Furthermore, we demonstrate that the additional information of the shape of the participant's head can be further exploited to improve the registration of the sensor positions to a head atlas, reducing the median sensor localization error by 31% compared to the standard registration approach.
... ICP algorithms [2,9] have been widely used in the industry for point cloud registration and model matching. A large number of variants [19,20,22] have been proposed since the advent of this algorithm for better accuracy and faster convergence. ...
Preprint
Full-text available
Tensegrity robots, which are composed of rigid compressive elements (rods) and flexible tensile elements (e.g., cables), have a variety of advantages, including flexibility, light weight, and resistance to mechanical impact. Nevertheless, the hybrid soft-rigid nature of these robots also complicates the ability to localize and track their state. This work aims to address what has been recognized as a grand challenge in this domain, i.e., the pose tracking of tensegrity robots through a markerless, vision-based method, as well as novel, onboard sensors that can measure the length of the robot's cables. In particular, an iterative optimization process is proposed to estimate the 6-DoF poses of each rigid element of a tensegrity robot from an RGB-D video as well as endcap distance measurements from the cable sensors. To ensure the pose estimates of rigid elements are physically feasible, i.e., they are not resulting in collisions between rods or with the environment, physical constraints are introduced during the optimization. Real-world experiments are performed with a 3-bar tensegrity robot, which performs locomotion gaits. Given ground truth data from a motion capture system, the proposed method achieves less than 1 cm translation error and 3 degrees rotation error, which significantly outperforms alternatives. At the same time, the approach can provide pose estimates throughout the robot's motion, while motion capture often fails due to occlusions.
... The ULS, DIM and TLS point clouds of both epochs were aligned to a georeferenced reference TLS epoch of 2019 (cf. Zahs et al., 2022) using an iterative closest point algorithm (ICP; Besl and McKay, 1992) on stable surfaces outside the rock glacier. We assess the alignment accuracy (Table 1) between point clouds of all epochs by calculating the standard deviation of M3C2 distances on stable rock walls distributed around the rock glacier (Zahs et al., 2022). ...
Article
Full-text available
Point clouds derived from UAV-borne laser scanning and UAV-borne photogrammetry provide new opportunities for 3D topographic monitoring in geographic research. The airborne acquisition strategy overcomes common challenges of ground-based techniques, such as limited spatial coverage or heterogeneous measurement distribution, and allows flexible repeated acquisitions at high temporal and spatial resolution. While UAV-borne 3D sensing techniques are expected to thereby enhance geographic monitoring, their specific potential for methods and algorithms of 3D change analysis is yet to be investigated. In our study, we assess point clouds originating from UAV-borne photogrammetry using dense image matching (DIM) and UAV-borne laser scanning (ULS) as input for 3D topographic change analysis at an active rock glacier in the Austrian Alps. We analyse surface change by using ULS and DIM point clouds of 2019 and 2021 as input for two state-of-the-art methods for pairwise surface change analysis: (1) The Multiscale Model to Model Cloud Comparison (M3C2) algorithm and (2) a recent M3C2-based approach (CD-PB M3C2) using plane correspondences to reduce the uncertainty of quantified change. We evaluate ULS-based and DIM-based change analysis regarding their performance in (1) achieving high spatial coverage of derived changes, (2) accurately quantifying magnitudes and uncertainty of change, and (3) detecting significant change (change magnitudes > associated uncertainty). As reference we use change quantified between two terrestrial laser scanning (TLS) surveys undertaken simultaneously with the ULS and DIM data acquisitions. Our study shows the improved spatial coverage of M3C2 achieved with point clouds acquired with UAVs (+ 60% of core points used for change analysis). 
For CD-PB M3C2, ULS and DIM point clouds enabled a spatially more uniform distribution of plane pairs used for change quantification and a slightly higher spatial coverage (+6% – +7% of core points used for change analysis) compared to the TLS reference. Magnitudes of M3C2 change were closer to the TLS reference for ULS-ULS (mean difference: 0.04 m; std. dev.: 0.05 m) compared to ULS-DIM (mean difference: 0.12 m; std. dev.: 0.08 m). Similar results were obtained for CD-PB M3C2 using ULS-ULS (mean difference: 0.02 m; std. dev.: 0.01 m) and ULS-DIM (mean difference: 0.06 m; std. dev.: 0.01 m). Moreover, magnitudes of change were above the associated uncertainty in 82% – 89% (M3C2) and 89% – 90% (CD-PB M3C2) of the area of change analysis. Our findings demonstrate the potential of ULS and DIM point clouds as input for accurate 3D topographic change analysis for the study at hand and can support the design and setup of 3D/4D Earth observation systems for rock glaciers and natural scenes with complex topography, such as landslides or debris covered glaciers.
... Given several sets of points in different coordinate systems, registration aims to align all sets of points into a common coordinate system [9]. Ref. [10] proposes the iterative closest point (ICP) for registration by iteratively estimating point correspondences and performing least-squares optimization. The authors of [7] propose PointNet for raw point cloud classification and segmentation and provide some insights into dealing with raw point clouds. ...
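The least-squares step of the correspondence/optimization loop mentioned above has a closed-form solution via the SVD (the Kabsch/Umeyama procedure). A minimal sketch, assuming correspondences are already fixed; all names are illustrative:

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares R, t such that R @ P[i] + t ≈ Q[i] (correspondences given)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)                # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

# Recover a known rotation about z plus a translation
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
P = np.random.default_rng(0).normal(size=(10, 3))
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = best_rigid_transform(P, Q)
print(np.allclose(R, R_true), np.allclose(t, [1.0, -2.0, 0.5]))  # → True True
```

ICP alternates this closed-form fit with a nearest-neighbor correspondence search, which is why it only needs "a procedure to find the closest point", as the abstract of the registered paper puts it.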
Article
Full-text available
Atrial fibrillation (AF) is a common cardiac arrhythmia and affects one to two percent of the population. In this work, we leverage the three-dimensional atrial endocardial unipolar/bipolar voltage map to predict the AF type and the recurrence of AF within 1 year. This problem is challenging for two reasons: (1) the unipolar/bipolar voltages are collected at different locations on the endocardium and the shapes of the endocardium vary widely across patients, so the unipolar/bipolar voltage maps need to be aligned to a common coordinate system; (2) the collected dataset size is very limited. To address these issues, we exploit a pretrained 3D point cloud registration approach and fine-tune it on left atrial voltage maps to learn the geometric features and align all voltage maps into a common coordinate system. After alignment, we feed the unipolar/bipolar voltages from the registered points into a multilayer perceptron (MLP) classifier to predict whether patients have paroxysmal or persistent AF, and the risk of recurrence of AF within 1 year for patients in sinus rhythm. The experiment shows our method classifies the type and recurrence of AF effectively.
... When adding a new pose to the graph, the sensor scans the environment in a range of 30 meters, and provides a point-cloud of it. This point-cloud is then matched to scans taken in previous poses using ICP matching (Besl and McKay 1992). If a match is found, a loop-closure factor (constraint) is added between these poses. ...
Article
Full-text available
In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate an analogous yet simplified decision problem that can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied to the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of the solution. This work is both fundamental and practical and holds numerous possible extensions.
... The term weighted by µ₂ promotes orthogonality of the map by penalizing the off-diagonal entries of C⊤C. Finally, each entry of d lies in {0, 1}; the entries equal to 1 indicate which singular values of C are expected to be non-zero. A refinement, similar to the iterative closest point algorithm [3] in the space of the coefficients, is then applied to the matrix C. As a final step, the spectral refinement approach of ZoomOut [34] is applied to the computed C. Given a map of size 50 × 50 as input, we apply ZoomOut to its 37 × 37 sub-matrix and get back a refined matrix C_ZM of size 50 × 50. ...
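The orthogonality term described in this excerpt penalizes the off-diagonal entries of C⊤C, which vanish exactly when the columns of C are orthogonal. A small NumPy illustration; the function name and test matrices are mine, not from the cited paper:

```python
import numpy as np

def off_diag_penalty(C):
    """Sum of squared off-diagonal entries of C.T @ C; zero iff C has orthogonal columns."""
    G = C.T @ C
    return ((G - np.diag(np.diag(G))) ** 2).sum()

# An orthogonal map incurs (numerically) zero penalty; a perturbed one does not
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(4, 4)))
print(off_diag_penalty(Q) < 1e-12, off_diag_penalty(Q + 0.1) > 0)  # → True True
```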
Preprint
Full-text available
With the rise and advent of graph learning techniques, graph data has become ubiquitous. However, while several efforts are being devoted to the design of new convolutional architectures, pooling or positional encoding schemes, less effort is being spent on problems involving maps between (possibly very large) graphs, such as signal transfer, graph isomorphism and subgraph correspondence. With this paper, we anticipate the need for a convenient framework to deal with such problems, and focus in particular on the challenging subgraph alignment scenario. We claim that, first and foremost, the representation of a map plays a central role on how these problems should be modeled. Taking the hint from recent work in geometry processing, we propose the adoption of a spectral representation for maps that is compact, easy to compute, robust to topological changes, easy to plug into existing pipelines, and is especially effective for subgraph alignment problems. We report for the first time a surprising phenomenon where the partiality arising in the subgraph alignment task is manifested as a special structure of the map coefficients, even in the absence of exact subgraph isomorphism, and which is consistently observed over different families of graphs up to several thousand nodes.
... In this section, I describe the Iterative Closest Point (ICP) algorithm, first developed by [Besl and McKay, 1992; Chen and Medioni, 1992; Zhang, 1994], which was used to register ILM segmentations. ICP minimizes the difference between two point clouds: one of the clouds is kept fixed, while the other one is transformed. ...
Thesis
Glaucoma is the leading cause of irreversible blindness worldwide. It is a progressive optic neuropathy in which retinal ganglion cell (RGC) axon loss, probably as a consequence of damage at the optic disc, causes a loss of vision, predominantly affecting the mid-peripheral visual field (VF). Glaucoma results in a decrease in vision-related quality of life and, therefore, early detection and evaluation of disease progression rates are crucial in order to assess the risk of functional impairment and to establish sound treatment strategies. The aim of my research is to improve glaucoma diagnosis by enhancing state-of-the-art analyses of glaucoma clinical trial outcomes using advanced analytical methods. This knowledge would also help better design and analyse clinical trials, providing evidence for re-evaluating existing medications, facilitating diagnosis and suggesting novel disease management. To facilitate this objective, the thesis provides the following contributions: (i) I developed deep learning-based super-resolution (SR) techniques for optical coherence tomography (OCT) image enhancement and demonstrated that using super-resolved images improves the statistical power of clinical trials, (ii) I developed a deep learning algorithm for segmentation of retinal OCT images, showing that the methodology consistently produces more accurate segmentations than state-of-the-art networks, (iii) I developed a deep learning framework for refining the relationship between structural and functional measurements and demonstrated that the mapping is significantly improved over previous techniques, (iv) I developed a probabilistic method and demonstrated that glaucomatous disc haemorrhages are influenced by a possible systemic factor that makes both eyes bleed simultaneously.
(v) I recalculated VF slopes, using the retinal nerve fiber layer thickness (RNFLT) from the super-resolved OCT as a Bayesian prior, and demonstrated that use of VF rates with the Bayesian prior as the outcome measure leads to a reduction in the sample size required to distinguish treatment arms in a clinical trial.
... • No-Learned. Point-to-point Iterative Closest Point (ICP) [3] implemented in Open3D [41], Coherent Point Drift (CPD) [29] and its Bayesian formulation BCPD [16], ZoomOut [25], Sinkhorn optimal transport method implemented in Geomloss [13] and Keops [12], point-to-point Nonrigid ICP (NICP) [30], coordinate-MLP based approaches including Neural Scene Flow Prior (NSFP) [19] and Nerfies [32]. ...
Preprint
Non-rigid point cloud registration is a key component in many computer vision and computer graphics applications. The high complexity of the unknown non-rigid motion makes this task a challenging problem. In this paper, we break down this problem via hierarchical motion decomposition. Our method, called Neural Deformation Pyramid (NDP), represents non-rigid motion using a pyramid architecture. Each pyramid level, denoted by a Multi-Layer Perceptron (MLP), takes as input a sinusoidally encoded 3D point and outputs its motion increments from the previous level. The sinusoidal function starts with a low input frequency and gradually increases when the pyramid level goes down. This allows a multi-level rigid-to-nonrigid motion decomposition and also speeds up the solving by 50 times compared to the existing MLP-based approach. Our method achieves advanced partial-to-partial non-rigid point cloud registration results on the 4DMatch/4DLoMatch benchmark under both no-learned and supervised settings.
... When a stereo camera rig is used [94], a valid spatial transformation between the two pairs of matching images is computed through the widely used iterative closest point (ICP) algorithm for matching 3D geometry [269]. Given an initial transformation, ICP iteratively determines the transformation between two point clouds that minimizes their points' error. ...
Article
Full-text available
Where am I? This is one of the most critical questions that any intelligent system should answer to decide whether it navigates to a previously visited area. This problem has long been acknowledged for its challenging nature in simultaneous localization and mapping (SLAM), wherein the robot needs to correctly associate the incoming sensory data to the database allowing consistent map generation. The significant advances in computer vision achieved over the last 20 years, the increased computational power, and the growing demand for long-term exploration contributed to efficiently performing such a complex task with inexpensive perception sensors. In this article, visual loop closure detection, which formulates a solution based solely on appearance input data, is surveyed. We start by briefly introducing place recognition and SLAM concepts in robotics. Then, we describe a loop closure detection system’s structure, covering an extensive collection of topics, including the feature extraction, the environment representation, the decision-making step, and the evaluation process. We conclude by discussing open and new research challenges, particularly concerning the robustness in dynamic environments, the computational complexity, and scalability in long-term operations. The article aims to serve as a tutorial and a position paper for newcomers to visual loop closure detection.
... The task of point cloud registration is to find the spatial transformation between two point clouds. Traditional methods are mainly based on the iterative closest point (ICP) [3,47] and its variants [16,36,40,41,48,55,67]. Recent works learn deep neural networks for point cloud registration and can be divided into two streams. ...
Preprint
Full-text available
3D motion estimation including scene flow and point cloud registration has drawn increasing interest. Inspired by 2D flow estimation, recent methods employ deep neural networks to construct the cost volume for estimating accurate 3D flow. However, these methods are limited by the fact that it is difficult to define a search window on point clouds because of the irregular data structure. In this paper, we avoid this irregularity by a simple yet effective method. We decompose the problem into two interlaced stages, where the 3D flows are optimized point-wise at the first stage and then globally regularized in a recurrent network at the second stage. Therefore, the recurrent network only receives the regular point-wise information as the input. In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task. For 3D scene flow estimation, we make comparisons on the widely used FlyingThings3D and KITTI datasets. For point cloud registration, we follow previous works and evaluate data pairs with large pose changes and partial overlap from ModelNet40. The results show that our method outperforms the previous method and achieves a new state-of-the-art performance on both 3D scene flow estimation and point cloud registration, which demonstrates the superiority of the proposed zero-order method on irregular point cloud data.
... Accordingly, when sequential point clouds come in, odometry could be calculated by accumulating relative poses according to time sequence. A representative one is Iterative Closest Point (ICP) [34], and it has made an impact on subsequent studies. Unfortunately, the ICP-variants set point pairs by using a greedy, exhaustive nearest neighbor (NN) search for every iteration, so they are only applicable when two point clouds are close enough or nearly overlapped [35]. ...
Preprint
Full-text available
Numerous researchers have conducted studies to achieve fast and robust ground-optimized LiDAR odometry methods for terrestrial mobile platforms. In particular, ground-optimized LiDAR odometry usually employs ground segmentation as a preprocessing method. This is because most of the points in a 3D point cloud captured by a 3D LiDAR sensor on a terrestrial platform are from the ground. However, the effect of the performance of ground segmentation on LiDAR odometry is still not closely examined. In this paper, a robust ground-optimized LiDAR odometry framework is proposed to facilitate the study to check the effect of ground segmentation on LiDAR SLAM based on the state-of-the-art (SOTA) method. By using our proposed odometry framework, it is easy and straightforward to test whether ground segmentation algorithms help extract well-described features and thus improve SLAM performance. In addition, by leveraging the SOTA ground segmentation method called Patchwork, which shows robust ground segmentation even in complex and uneven urban environments with little performance perturbation, a novel ground-optimized LiDAR odometry is proposed, called PaGO-LOAM. The methods were tested using the KITTI odometry dataset. PaGO-LOAM shows robust and accurate performance compared with the baseline method. Our code is available at https://github.com/url-kaist/AlterGround-LeGO-LOAM.
... Despite its 30th anniversary, the most widely known approach for aligning two point clouds is still the ICP algorithm, which was developed independently by Besl and McKay [21] and by Chen and Medioni [22] in 1992. It aims to find the rigid transformation that aligns one point cloud ("data") with a second one ("model") by iteratively searching for pairs of closest points between both clouds and refining the initially guessed transformation by minimizing the distance over all pairs. ...
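The loop summarized in this excerpt, alternating a closest-point search with a re-optimization of the guessed transformation, can be sketched as a minimal point-to-point ICP. This is a toy illustration under the assumption of a clean, fully overlapping "data" copy of the "model"; all names are illustrative, and real implementations use KD-trees and convergence checks:

```python
import numpy as np

def icp(data, model, iters=10):
    """Minimal point-to-point ICP: find R, t so that R @ data_i + t ≈ model."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = data @ R.T + t
        # 1. Pair each moved data point with its closest model point
        nn = ((moved[:, None] - model[None]) ** 2).sum(-1).argmin(1)
        # 2. Re-estimate the rigid transform in closed form (SVD / Kabsch)
        P, Q = data, model[nn]
        cp, cq = P.mean(0), Q.mean(0)
        U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = cq - R @ cp
    return R, t

# Model: a 3x3x3 grid; data: the same grid slightly rotated and shifted
model = np.array(np.meshgrid([-1.0, 0.0, 1.0],
                             [-1.0, 0.0, 1.0],
                             [-1.0, 0.0, 1.0])).reshape(3, -1).T
c, s = np.cos(0.05), np.sin(0.05)
R_s = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
data = model @ R_s.T + np.array([0.05, -0.02, 0.03])
R, t = icp(data, model)
print(np.allclose(data @ R.T + t, model))  # → True: the copy is registered back
```

Because the perturbation is small, the very first nearest-neighbor assignment is already correct, which mirrors the paper's observation that convergence is rapid in the first few iterations given an adequate initial registration.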
Preprint
Both robot and hand-eye calibration have been objects of research for decades. While current approaches manage to precisely and robustly identify the parameters of a robot's kinematic model, they still rely on external devices, such as calibration objects, markers and/or external sensors. Instead of trying to fit the recorded measurements to a model of a known object, this paper treats robot calibration as an offline SLAM problem, where scanning poses are linked to a fixed point in space by a moving kinematic chain. As such, the presented framework allows robot calibration using nothing but an arbitrary eye-in-hand depth sensor, thus enabling fully autonomous self-calibration without any external tools. My new approach utilizes a modified version of the Iterative Closest Point algorithm to run bundle adjustment on multiple 3D recordings, estimating the optimal parameters of the kinematic model. A detailed evaluation of the system is shown on a real robot with various attached 3D sensors. The presented results show that the system reaches precision comparable to a dedicated external tracking system at a fraction of its cost.
... Regarding the choice of correspondences, iterative closest point (ICP) [8], [9] follows the very simple yet effective approach of assigning the closest point at each iteration. This is known as hard-assignment. ...
... Existing solutions can be categorized into two main methods. Methods that iteratively minimize the distance between nearest-neighboring points given a good initialization, e.g., iterative closest point (ICP) [14] and its variants, are popular approaches. Other methods randomly sample corresponding points through robust descriptor matching, e.g., using random sample consensus (RANSAC) [15]. ...
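The RANSAC alternative mentioned here, sampling minimal sets of putative correspondences and keeping the transform with the most inliers, can be sketched with a closed-form rigid fit on 3-point samples. Thresholds, seeds, and names are illustrative assumptions, not from the cited paper:

```python
import numpy as np

def rigid_fit(P, Q):
    # Closed-form least-squares rigid transform (Kabsch), mapping P onto Q
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_register(P, Q, trials=200, thresh=0.05, seed=3):
    """Register P to Q from putative (possibly wrong) correspondences P[i] <-> Q[i]."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, 0
    for _ in range(trials):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal sample
        R, t = rigid_fit(P[idx], Q[idx])
        inliers = (np.linalg.norm(P @ R.T + t - Q, axis=1) < thresh).sum()
        if inliers > best_inliers:
            best, best_inliers = (R, t), inliers
    return best, best_inliers

rng = np.random.default_rng(4)
P = rng.normal(size=(40, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90° about z
Q = P @ R_true.T + np.array([0.5, 0.0, -0.3])
Q[:12] = rng.normal(size=(12, 3))        # corrupt 30% of the matches
(R, t), n_in = ransac_register(P, Q)
print(n_in, "inliers of 40 putative matches")
```

In contrast to ICP, this needs no initialization, only a set of putative matches from descriptor matching; the consensus step discards the corrupted ones.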
Preprint
Full-text available
Point cloud registration is a fundamental task in many applications such as localization, mapping, tracking, and reconstruction. Successful registration relies on extracting robust and discriminative geometric features. Existing learning-based methods require high computing capacity for processing a large number of raw points at the same time. Although these approaches achieve convincing results, they are difficult to apply in real-world situations due to high computational costs. In this paper, we introduce a framework that efficiently and economically extracts dense features using a graph attention network for point cloud matching and registration (DFGAT). The detector of the DFGAT is responsible for finding highly reliable key points in large raw data sets. The descriptor of the DFGAT takes these key points combined with their neighbors to extract invariant density features in preparation for the matching. The graph attention network uses an attention mechanism that enriches the relationships between point clouds. Finally, we consider this as an optimal transport problem and use the Sinkhorn algorithm to find positive and negative matches. We perform thorough tests on the KITTI dataset and evaluate the effectiveness of this approach. The results show that this method, with its efficient and compact keypoint selection and description, achieves the best matching performance metrics and reaches the highest registration success ratio of 99.88% in comparison with other state-of-the-art approaches.
... The average 3D head mesh with 56804 vertices generated from the Basel Face Model (BFM) (Paysan et al., 2009; Gerig et al., 2018) was used for this study, and its nose tip was selected as the origin of a three-dimensional coordinate system, as shown in Fig. 4(a1)-(c1). The head poses of the scans are corrected using the rigid iterative closest point (ICP) algorithm (Besl and McKay, 1992) with the head template. ...
Article
3D parametric head models and 3D anthropometric analyses are essential in designing head-related products. Existing 3D parametric head modeling only analyzes sex-based dimension and shape variances but ignores age-related effects. Thus, we developed detailed 3D statistical models of adults' heads based on 1496 3D head scans of Chinese individuals aged 18 to 84, including a global model and four subclass models. An age threshold identification method has been presented to cluster the 3D heads into four groups based on their anthropometric features. The total explained variance of the first five principal components (PCs) in all head models was more than 76%. The relationships between the PCs, the head areas, and the specific dimensions of head regions have also been identified in this study. These models can be reliably used for fit assessment, mass customization, and virtual fittings for head-related products.
... Then the target registration error (TRE) is computed, using 159 targets whose positions are unknown to us. Before we begin the elastic registration process, we use the standard Iterative Closest Point method [2] to perform a rigid alignment. Then we set a fixed boundary condition on a small zone of the posterior face to enforce the uniqueness of the solution to the direct elastic problem. ...
Preprint
The nonrigid alignment between a pre-operative biomechanical model and an intra-operative observation is a critical step in tracking the motion of a soft organ in augmented surgery. While many elastic registration procedures introduce artificial forces into the direct physical model to drive the registration, we propose in this paper a method to reconstruct the surface loading that actually generated the observed deformation. The registration problem is formulated as an optimal control problem where the unknown is the surface force distribution applied to the organ, and the resulting deformation is computed using a hyperelastic model. Advantages of this approach include greater control over the set of admissible force distributions, in particular the opportunity to choose where forces should apply, thus promoting physically consistent displacement fields. The optimization problem is solved using a standard adjoint method. We present registration results with experimental phantom data showing that our procedure is competitive in terms of accuracy. In an example application, we estimate the forces applied by a surgical tool on the organ. Such an estimation is relevant in the context of robotic surgery systems, where robotic arms usually do not allow force measurements, and providing force feedback remains a challenge.
... Generally, LiDAR odometry is treated as a point-set registration problem in the literature. Given the input point clouds perceived by LiDAR at two consecutive timestamps, the Iterative Closest Point (ICP) algorithm [16] is the typical solution to align them, updating the relative transformation iteratively until convergence. LOAM [1] is the most successful LiDAR odometry approach; it selects feature points by computing the roughness of scattered points on each scan line. ...
Preprint
Solid-state LiDARs are more compact and cheaper than conventional mechanical multi-line spinning LiDARs and have recently become increasingly popular in autonomous driving. However, these new LiDAR sensors pose several challenges, including severe motion distortions, a small field of view and sparse point clouds, which hinder them from being widely used in LiDAR odometry. To tackle these problems, we present an effective continuous-time LiDAR odometry (ECTLO) method for Risley prism-based LiDARs with non-repetitive scanning patterns. To account for the noisy data, a filter-based point-to-plane Gaussian Mixture Model is used for robust registration. Moreover, a LiDAR-only continuous-time motion model is employed to relieve the inevitable distortions. To facilitate the implicit data association in parallel, we maintain all map points within a single range image. Extensive experiments have been conducted on various testbeds using solid-state LiDARs with different scanning patterns, whose promising results demonstrate the efficacy of our proposed approach.
... Let Q = {q_c}, c = 1, …, C, be a set of selected parts (where C is their total number), and let the image I_t be the target pose to be reconstructed. A straightforward method for reconstruction is fitting the optimal translation and rotation transformation for each part based on the predicted correspondences, using the Procrustes orthogonal analysis employed in ICP methods [5]. Such a method is computationally efficient, but is sensitive to noise in the predicted correspondences, and cannot capture non-rigid deformations that may exist in the input poses. ...
Preprint
Rigged puppets are one of the most prevalent representations used to create 2D character animations. Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. Our method is trained to infer articulated parts, e.g. head, torso and limbs, that can be re-assembled to best reconstruct the given poses. Our results demonstrate significantly better performance than alternatives, both qualitatively and quantitatively. Our project page https://zhan-xu.github.io/parts/ includes our code and data.
... Also, if the moving direction varies, we will obtain results with varying directions. In both cases we can rotate and translate the 3D data to match the result from the previous frame(s) by utilizing accurate registration algorithms such as iterative closest point (ICP) registration [3]. ...
Preprint
In this paper, we describe a method to capture nearly entirely spherical (360 degree) depth information using two adjacent frames from a single spherical video with motion parallax. After illustrating a spherical depth information retrieval using two spherical cameras, we demonstrate monocular spherical stereo by using stabilized first-person video footage. Experiments demonstrated that the depth information was retrieved on up to 97% of the entire sphere in solid angle. At a speed of 30 km/h, we were able to estimate the depth of an object located over 30 m from the camera. We also reconstructed the 3D structures (point cloud) using the obtained depth data and confirmed the structures can be clearly observed. We can apply this method to 3D structure retrieval of surrounding environments such as 1) previsualization, location hunting/planning of a film, 2) real scene/computer graphics synthesis and 3) motion capture. Thanks to its simplicity, this method can be applied to various videos. As there is no pre-condition other than to be a 360 video with motion parallax, we can use any 360 videos including those on the Internet to reconstruct the surrounding environments. The cameras can be lightweight enough to be mounted on a drone. We also demonstrated such applications.
... The fine registration process in this work uses the iterative closest point (ICP) algorithm [35], a point-to-point registration algorithm based on Euclidean distances between the points in the two point clouds ( Figure 2). First, the algorithm finds the nearest neighbor point correspondences between the point clouds using kd-tree space partitioning to reduce the search complexity [53]. ...
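The kd-tree-accelerated nearest-neighbor step described above can be sketched with SciPy (assumed available here); the target and source clouds are made-up toy data:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical clouds. Building a k-d tree over the target cloud turns
# each nearest-neighbour search from a linear scan into a roughly
# logarithmic-time query, which is the speed-up the text refers to.
target = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
source = np.array([[0.05, 0.0, 0.0], [0.9, 0.05, 0.0]])

tree = cKDTree(target)
dist, idx = tree.query(source)   # closest target point for each source point
```

The `(dist, idx)` pairs are exactly the point-to-point correspondences a subsequent rigid-fit step consumes.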
Article
Recent improvements in remote sensing technologies have shown that techniques such as photogrammetry and laser scanning can resolve geometric details at the millimeter scale. This is significant because it has expanded the range of structural health monitoring scenarios where these techniques can be used. In this work, we explore how 3D geometric measurements extracted from photogrammetric point clouds can be used to evaluate the performance of a highway bridge during a static load test. Various point cloud registration and deformation tracking algorithms are explored. Included is an introduction to a novel deformation tracking algorithm that uses the interpolation technique of kriging as the basis for measuring the geometric changes. The challenging nature of 3D point cloud data means that statistical methods must be employed to adequately evaluate the deformation field of the bridge. The results demonstrate a pathway from the collection of digital photographs to a mechanical analysis with results that capture the bridge deformation within one standard deviation of the mean reported value. These results are promising given that the midspan bridge deformation for the load test is only a few millimeters. Ultimately, the approaches evaluated in this work yielded errors on the order of 1 mm or less for ground truth deflections as small as 3.5 mm. Future work for this method will investigate using these results for updating finite element models.
... Therefore, research on LiDAR odometry is crucial. In one stream of hybrid methods, networks were trained for feature point culling and regression of relative poses. The other stream, represented by DMLO [32] and 3D3L [36], extracts and matches feature points using deep learning on the spherical projection images of LiDAR frames and then performs ICP [28] on the correspondences to obtain the solution by SVD (Singular Value Decomposition) or nonlinear optimization. These methods share the common flaw of the end-to-end methods in taking the spherical projected image as input, and it is not necessary to compute the transformation with network modules. ...
Article
An accurate ego-motion estimation solution is vital for autonomous vehicles. LiDAR is widely adopted in self-driving systems to obtain depth information directly and to eliminate the influence of changing illumination in the environment. In LiDAR odometry, the lack of descriptions of feature points as well as the failure of the assumption of uniform motion may cause mismatches or dilution of precision in navigation. In this study, a method to perform LiDAR odometry utilizing a bird's eye view of LiDAR data combined with deep learning-based feature points is proposed. Orthographic projection is applied to generate a bird's eye view image of the 3D point cloud. Thereafter, an R2D2 neural network is employed to extract keypoints and compute their descriptors. Based on those keypoints and descriptors, a two-step matching and pose estimation is designed to keep these feature points tracked over a long distance with a lower mismatch ratio than the conventional strategy. In the experiments, the evaluation of the proposed algorithm on the KITTI training dataset demonstrates that the proposed LiDAR odometry can provide more accurate trajectories than a handcrafted feature-based SLAM (Simultaneous Localization and Mapping) algorithm. In detail, a comparison with handcrafted descriptors is presented, and the difference between the RANSAC (Random Sample Consensus) algorithm and the two-step pose estimation is also demonstrated experimentally. In addition, data collected by a Velodyne VLP-16 are evaluated by the proposed solution. The low-drift positioning RMSE (Root Mean Square Error) of 4.70 m over approximately 5 km of mileage indicates that the proposed algorithm generalizes to low-resolution LiDAR.
... It iteratively estimates the transformation that minimizes the distance between two point sets of objects or images. ICP-based registrations have been applied to map 3D shapes [26,27] and surfaces [28]. ICP registrations are prominent in non-rigid 3D-2D registrations of vessel structures and arteries for their flexibility under variable degrees of deformation [29][30][31]. ...
Article
Spine surgeries are vulnerable to wrong-level surgeries and postoperative complications because of the spine's complex structure. Unavailability of 3D intraoperative imaging devices, low-contrast intraoperative X-ray images, variable clinical and patient conditions, manual analyses, lack of skilled technicians, and human errors increase the chances of wrong-site or wrong-level surgeries. State-of-the-art work refers to 3D-2D image registration systems and other medical image processing techniques to address the complications associated with spine surgeries. Intensity-based 3D-2D image registration systems have been widely practiced across various clinical applications. However, these frameworks are limited to specific clinical conditions such as anatomy, dimension of image correspondence, and imaging modalities. Moreover, there are certain prerequisites for these frameworks to function in clinical application, such as dataset requirements, speed of computation, requirement of high-end system configuration, limited capture range, and multiple local maxima. A simple and effective registration framework was designed with the study objective of vertebral level identification and pose estimation from intraoperative fluoroscopic images by combining intensity-based and iterative closest point (ICP)-based 3D-2D registration. A hierarchical multi-stage registration framework was designed that comprises coarse and finer registration. The coarse registration was performed in two stages, i.e., intensity similarity-based spatial localization and source-to-detector localization based on the intervertebral distance correspondence between vertebral centroids in projected and intraoperative X-ray images. Finally, to speed up target localization in the intraoperative application, a rigid ICP-based finer registration was performed based on 3D-2D vertebral centroid correspondence.
The mean projection distance error (mPDE) measurement, the visual similarity between the projection image at the finer registration point and the intraoperative X-ray image, and surgeons' feedback were used for quality assurance of the designed registration framework. The average mPDE after peak signal-to-noise ratio (PSNR)-based coarse registration was 20.41 mm. After the coarse registration in the spatial region and the source-to-detector direction, the average mPDE reduced to 12.18 mm. On finer ICP-based registration, the mean mPDE was finally reduced to 0.36 mm. The approximate mean times required for the coarse registration, finer registration, and DRR image generation at the final registration point were 10 s, 15 s, and 1.5 min, respectively. The designed registration framework can act as a supporting tool for vertebral level localization and pose estimation in an intraoperative environment. The framework was designed with the future perspective of intraoperative target localization and pose estimation irrespective of the target anatomy.
... ICP (Iterative Closest Point) [6] between the corresponding point clouds to calculate an accurate relative pose. Although this work has good loop closure-detection performance on many datasets, its defects are also obvious and fatal: ...
Preprint
Place recognition technology endows a SLAM algorithm with the ability to eliminate accumulated errors and to relocalize itself. Existing methods for point cloud-based place recognition often leverage the matching of global descriptors which are lidar-centric. These methods have two major defects: place recognition cannot be performed when the distance between the two point clouds is large, and only the rotation angle can be calculated, without the offset in the X and Y directions. To solve these two problems, we propose a novel global descriptor built around the Main Object; in this way, descriptors are no longer dependent on the observation position. We analyze in theory how this method solves the above two problems, and conduct extensive experiments on KITTI and in some extreme scenarios, which show that our method has obvious advantages over traditional methods.
Preprint
Pose registration is critical in vision and robotics. This paper focuses on the challenging task of initialization-free pose registration up to 7DoF for homogeneous and heterogeneous measurements. While recent learning-based methods show promise using differentiable solvers, they either rely on heuristically defined correspondences or are prone to local minima. We present a differentiable phase correlation (DPC) solver that is globally convergent and correspondence-free. When combined with simple feature extraction networks, our general framework DPCN++ allows for versatile pose registration with arbitrary initialization. Specifically, the feature extraction networks first learn dense feature grids from a pair of homogeneous/heterogeneous measurements. These feature grids are then transformed into a translation- and scale-invariant spectrum representation based on the Fourier transform and spherical radial aggregation, decoupling translation and scale from rotation. Next, the rotation, scale, and translation are independently and efficiently estimated in the spectrum step-by-step using the DPC solver. The entire pipeline is differentiable and trained end-to-end. We evaluate DPCN++ on a wide range of registration tasks taking different input modalities, including 2D bird's-eye view images, 3D object and scene measurements, and medical images. Experimental results demonstrate that DPCN++ outperforms both classical and learning-based baselines, especially on partially observed and heterogeneous measurements.
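The core phase-correlation idea the abstract builds on — translation recovered from the phase of the cross-power spectrum, with no point correspondences — can be illustrated in the classic 1D case. The signal is synthetic; the paper's DPC solver additionally operates on learned feature grids and spherical spectra:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Recover the circular shift s (b = roll(a, s)) from the phase of
    the cross-power spectrum: the inverse FFT of the normalised spectrum
    peaks exactly at the shift, with no correspondences needed."""
    cross = np.conj(np.fft.fft(a)) * np.fft.fft(b)
    cross /= np.abs(cross) + 1e-12      # keep phase only
    return int(np.argmax(np.fft.ifft(cross).real))

rng = np.random.default_rng(0)
a = rng.normal(size=64)
b = np.roll(a, 5)                       # ground-truth shift of 5 samples
shift = phase_correlation_shift(a, b)
```

Because only the spectral phase is kept, the estimate is insensitive to the signals' amplitudes, which is what makes the scheme attractive for heterogeneous inputs.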
Article
Real-time performance and global consistency are extremely important in Simultaneous Localization and Mapping (SLAM) problems. Classic lidar-based SLAM systems often consist of front-end odometry and back-end pose optimization. However, due to expensive computation, it is often difficult to achieve loop-closure detection without compromising the real-time performance of the odometry. We propose a SLAM system in which scan-to-submap-based local lidar odometry and global pose optimization based on submap construction, as well as loop-closure detection, are designed to be separate from each other. In our work, extracted edge and surface feature points are inserted into two consecutive feature submaps and added to the pose graph prepared for loop-closure detection and global pose optimization. In addition, a submap is added to the pose graph for global data association when it is marked as in a finished state. In particular, a method to filter out false loops is proposed to accelerate the construction of constraints in the pose graph. The proposed method is evaluated on public datasets and achieves competitive performance, with a pose estimation frequency over 15 Hz in local lidar odometry and low drift in global consistency.
Article
Artificial glacier melt reduction is gaining increasing attention because of rapid glacier retreats and the projected acceleration of future mass losses. However, quantifying the effect of artificial melt reduction on glaciers in China has not been currently reported. Therefore, the case of Urumqi Glacier No.1 (eastern Tien Shan, China) is used to conduct a scientific evaluation of glacier cover efficiency for melt reduction between 24 June and 28 August 2021. By combining two high-resolution digital elevation models derived from terrestrial laser scanning and unmanned aerial vehicles, albedo, and meteorological data, glacier ablation mitigation under three different cover materials was assessed. The results revealed that up to 32% of mass loss was preserved in the protected areas compared with that of the unprotected areas. In contrast to the unprotected glacier surface, the nanofiber material reduced the glacier melt by up to 56%, which was significantly higher than that achieved by geotextiles (29%). This outcome could be attributed to the albedo of the materials and local climate factors. The nanofiber material showed higher albedo than the two geotextiles, dirty snow, clean ice, and dirty ice. Although clean snow had a higher albedo than the other materials, its impact on slowing glacier melt was minor due to the lower snowfall and relatively high air temperature after snowfall in the study area. This indicates that the efficiencies of nanofiber material and geotextiles can be beneficial in high-mountain areas. In general, the results of our study demonstrate that the high potential of glacier cover can help mitigate issues related to regions of higher glacier melt or lacking water resources, as well as tourist attractions.
Chapter
Due to the growing focus on minimally invasive surgery, there is increasing interest in intraoperative software support. For example, augmented reality can be used to provide additional information. Accurate registration is required for effective support. In this work, we present a manual registration method that aims at mimicking natural manipulation of 3D objects using tracked surgical instruments. This method is compared to a point-based registration method in a simulated laparoscopic environment. Both registration methods serve as an initial alignment step prior to surface-based registration refinement. For the evaluation, we conducted a user study with 12 participants. The registration methods were compared in terms of registration accuracy, registration duration, and subjective usability feedback. No significant differences could be found with respect to the previously mentioned criteria between the manual and the point-based registration methods. Thus, the manual registration did not outperform the reference method. However, we found that our method offers qualitative advantages, which may make it more suitable for some application scenarios. Furthermore we identified possible approaches for improvement, which should be investigated in the future to strengthen possible advantages of our registration method.
Article
The hydraulic thrust system that emphasizes grouped controlling of the shield machine is used for driving forward and pose adjustment. To control and regulate the pose of the shield machine precisely, it is of great significance to solve the total thrust and grouped thrusts (thrusts for different cylinder groups) reasonably. This study aims to develop a novel thrust model to calculate the total thrust and grouped thrust. Firstly, a soil-machine interaction model is developed to predict the loads acting on the shield machine during tunneling. Then an inverse kinematic analysis is performed to describe the velocity Jacobian and soil displacement induced by pose adjustment. Based on the kinematic mechanism of the shield machine under soil resistance, a thrust model is proposed to determine the total thrust and grouped thrusts employing static mechanics. In this model, both the soil properties and the shield parameters, such as operating parameters, pose parameters and geometric parameters are considered. The proposed thrust model is verified through a case application and the results indicate that the model could be used for assessing the total thrust and grouped thrusts during shield tunneling, which provides a theoretical basis for pose adjustment and automatic trajectory tracking control of the shield machine.
Article
Non-rigid registration computes an alignment between a source surface and a target surface in a non-rigid manner. In the past decade, with the advances in 3D sensing technologies that can measure time-varying surfaces, non-rigid registration has been applied to the acquisition of deformable shapes and has a wide range of applications. This survey presents a comprehensive review of non-rigid registration methods for 3D shapes, focusing on techniques related to dynamic shape acquisition and reconstruction. In particular, we review different approaches for representing the deformation field, and the methods for computing the desired deformation. Both optimization-based and learning-based methods are covered. We also review benchmarks and datasets for evaluating non-rigid registration methods, and discuss potential future research directions.
Article
This paper proposes MpIC, an on-manifold derivation of the probabilistic Iterative Correspondence (pIC) algorithm, which is a stochastic version of the original Iterative Closest Point. It is developed in the context of autonomous underwater karst exploration based on acoustic sonars. First, a derivation of pIC based on the Lie group structure of SE(3) is developed. The closed-form expression of the covariance modeling the estimated rigid transformation is also provided. In the second part, its application to 3D scan matching between acoustic sonar measurements is proposed. It is a prolongation of previous work on elevation angle estimation from wide-beam acoustic sonar. While the pIC approach proposed is intended to be a key component in a Simultaneous Localization and Mapping framework, this paper focuses on assessing its viability on a unitary basis. As ground truth data in karst aquifers are difficult to obtain, quantitative experiments are carried out in a simulated karst environment and show improvement compared to the previous state-of-the-art approach. The algorithm is also evaluated on a real underwater cave dataset, demonstrating its practical applicability.
Preprint
The National Football League and Amazon Web Services teamed up to develop the best sports injury surveillance and mitigation program via a Kaggle competition, through which the NFL wants to assign specific players to each helmet and thereby accurately identify each player's "exposures" throughout a football play. We implement computer vision-based ML algorithms capable of assigning detected helmet impacts to the correct players via tracking information. Our paper explains the approach to automatically track player helmets and their collisions. This also allows reviewing previous plays and exploring trends in exposure over time.
Article
Aesthetic shapes are usually actualized as 3D objects represented by free-form surfaces. The main components used to achieve aesthetic surfaces are 2D and 3D curves, which are the most basic elements for determining the shapes and silhouettes of industrial products. Bézier, B-Spline and NURBS are types of flexible curves developed for various design intents. These curves, however, produce complex curvature functions that may undermine the formulation of shape aesthetics. A viable solution to this problem is to formulate aesthetic curves and surfaces from well-defined curvatures to improve aesthetic design quality. This paper advocates formalizing aesthetic curve and surface theories to fill the gap mentioned above, which has existed since the 1970s. It begins by reviewing fair curves and surfaces. It then extensively discusses the technicalities of Log-Aesthetic (LA) curves and surfaces and touches on industrial design applications. These emerging LA curves have high potential to be used as standards to generate, evaluate and reshape aesthetic curves and surfaces, thus revolutionizing efficiency in developing curve and shape aesthetics.
Article
The problem of recognizing and locating rigid objects in 3-D space is important for applications of robotics and navigation. We analyze the task requirements in terms of what information needs to be represented, how to represent it, what kind of paradigms can be used to process it, and how to implement the paradigms. We describe shape surfaces by curves and patches, which we represent by linear primitives, such as points, lines, and planes. Next we describe algorithms to construct this representation from range data. We then propose the paradigm of recognizing objects while locating them. We analyze the basic constraint of rigidity that can be exploited, which we implement as a prediction and verification scheme that makes efficient use of the representation. Results are presented for data obtained from a laser range finder, but both the shape representation and the matching algorithm are general and can be used for other types of data, such as ultrasound, stereo, and tactile.
Article
Finding the relationship between two coordinate systems using pairs of measurements of the coordinates of a number of points in both systems is a classic photogrammetric task. It finds applications in stereophotogrammetry and in robotics. I present here a closed-form solution to the least-squares problem for three or more points. Currently various empirical, graphical, and numerical iterative methods are in use. Derivation of the solution is simplified by use of unit quaternions to represent rotation. I emphasize a symmetry property that a solution to this problem ought to possess. The best translational offset is the difference between the centroid of the coordinates in one system and the rotated and scaled centroid of the coordinates in the other system. The best scale is equal to the ratio of the root-mean-square deviations of the coordinates in the two systems from their respective centroids. These exact results are to be preferred to approximate methods based on measurements of a few selected points. The unit quaternion representing the best rotation is the eigenvector associated with the most positive eigenvalue of a symmetric 4×4 matrix. The elements of this matrix are combinations of sums of products of corresponding coordinates of the points.
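The closed-form recipe this abstract describes — the best rotation's unit quaternion is the eigenvector of the most positive eigenvalue of a symmetric 4×4 matrix built from the centred cross-covariance — can be sketched as follows. Rotation plus translation only (unit scale), with made-up toy points:

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def horn_rotation(p, q):
    """Best least-squares rotation for q_i ~ R p_i + T: the unit quaternion
    is the eigenvector of the most positive eigenvalue of a symmetric 4x4
    matrix built from the centred cross-covariance S."""
    S = (p - p.mean(axis=0)).T @ (q - q.mean(axis=0))
    A = S - S.T
    delta = np.array([A[1, 2], A[2, 0], A[0, 1]])
    N = np.empty((4, 4))
    N[0, 0] = np.trace(S)
    N[0, 1:] = delta
    N[1:, 0] = delta
    N[1:, 1:] = S + S.T - np.trace(S) * np.eye(3)
    eigvals, eigvecs = np.linalg.eigh(N)   # N is symmetric, so eigh suffices
    return quat_to_rot(eigvecs[:, eigvals.argmax()])

# Toy check: rotate hypothetical points 90 degrees about z, then translate.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]])
q = p @ Rz.T + np.array([1.0, 2.0, 3.0])
R = horn_rotation(p, q)
t = q.mean(axis=0) - R @ p.mean(axis=0)   # best translation from centroids
```

The sign ambiguity of the eigenvector is harmless: a quaternion and its negation give the same rotation matrix.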
Article
Two point sets {p_i} and {p'_i}, i = 1, 2, …, N, are related by p'_i = R p_i + T + N_i, where R is a rotation matrix, T a translation vector, and N_i a noise vector. Given {p_i} and {p'_i}, we present an algorithm for finding the least-squares solution of R and T, which is based on the singular value decomposition (SVD) of a 3 × 3 matrix. This new algorithm is compared to two earlier algorithms with respect to computer time requirements.
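A sketch of the SVD-based least-squares solution this abstract describes, in the noise-free case (N_i = 0) and with a determinant guard against the reflection solution; the point sets are made up:

```python
import numpy as np

def fit_rigid(p, q):
    """Least-squares R, T for q_i ~ R p_i + T via SVD of the 3x3
    cross-covariance of the centred point sets."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    U, _, Vt = np.linalg.svd((p - pc).T @ (q - qc))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, qc - R @ pc

# Toy check with zero noise: recover a known motion exactly.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
q = p @ Rz.T + np.array([0.5, -0.25, 2.0])
R, T = fit_rigid(p, q)
```

This is the same least-squares problem the quaternion method solves; the SVD route trades the 4×4 eigenproblem for a 3×3 decomposition.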
Article
This article describes a method to generate 3D-object recognition algorithms from a geometrical model for bin-picking tasks. Given a 3D solid model of an object, we first generate apparent shapes of the object under various viewer directions. Those apparent shapes are then classified into groups (representative attitudes) based on dominant visible faces and other features. Based on the grouping, recognition algorithms are generated in the form of an interpretation tree. The interpretation tree consists of two parts: the first part classifies a target region in an image into one of the shape groups, and the second part determines the precise attitude of the object within that group. We have developed a set of rules to determine which features are appropriate to use, and in what order, to generate an efficient and reliable interpretation tree. Features used in the interpretation tree include the inertia of a region, relationships to neighboring regions, positions and orientations of edges, and extended Gaussian images. This method has been applied to a task of bin-picking objects that include both planar and cylindrical surfaces. As sensory data, we have used surface orientations from photometric stereo, depth from binocular stereo using oriented-region matching, and edges from an intensity image.
Article
This paper presents a comparative study and survey of model-based object-recognition algorithms for robot vision. The goal of these algorithms is to recognize the identity, position, and orientation of randomly oriented industrial parts. In one form this is commonly referred to as the "bin-picking" problem, in which the parts to be recognized are presented in a jumbled bin. The paper is organized according to 2-D, 2½-D, and 3-D object representations, which are used as the basis for the recognition algorithms. Three central issues common to each category, namely, feature extraction, modeling, and matching, are examined in detail. An evaluation and comparison of existing industrial part-recognition systems and algorithms is given, providing insights for progress toward future robot vision systems.
Article
Full-text available
Categories and shape prototypes are considered for a class of object recognition problems where rigid and detailed object models are not available or do not apply. We propose a modeling system for generic objects to recognize different objects from the same category with only one generic model. We base our design of the modeling system upon current psychological theories of categorization and human visual perception. The representation consists of a prototype represented by parts and their configuration. Parts are modeled by superquadric volumetric primitives which can be combined via Boolean operations to form objects. Variations between objects within a category are described by changes in structure and shape deformations of prototypical parts. Recovery of deformed superquadric models from sparse 3-D points is developed and some results are shown.
Article
Full-text available
This is a primer on extended Gaussian images. Extended Gaussian images are useful for representing the shapes of surfaces. They can be computed easily from: 1. needle maps obtained using photometric stereo; or 2. depth maps generated by ranging devices or binocular stereo. Importantly, they can also be determined simply from geometric models of the objects. Extended Gaussian images can be of use in at least two of the tasks facing a machine vision system: 1. recognition, and 2. determining the attitude in space of an object. Here, the extended Gaussian image is defined and some of its properties discussed. An elaboration for nonconvex objects is presented and several examples are shown.
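The core of the EGI idea can be sketched in a few lines (a generic illustration, not code from the primer): area-weight each unit surface normal into an orientation bin on the sphere. The grid resolution and the cube example below are arbitrary choices for demonstration.

```python
import numpy as np

def egi_histogram(normals, areas, n=4):
    """Approximate an extended Gaussian image: bin unit surface normals
    into an n x 2n (polar, azimuth) grid on the sphere, weighted by face area."""
    normals = np.asarray(normals, dtype=float)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    theta = np.arccos(np.clip(normals[:, 2], -1.0, 1.0))           # polar angle in [0, pi]
    phi = np.mod(np.arctan2(normals[:, 1], normals[:, 0]), 2 * np.pi)
    ti = np.minimum((theta / np.pi * n).astype(int), n - 1)
    pj = np.minimum((phi / (2 * np.pi) * 2 * n).astype(int), 2 * n - 1)
    hist = np.zeros((n, 2 * n))
    for t, p, a in zip(ti, pj, areas):
        hist[t, p] += a
    return hist

# A unit cube has six faces of area 1 with normals along +/- x, y, z,
# so its EGI concentrates all surface area in six cells.
cube_normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
h = egi_histogram(cube_normals, [1.0] * 6)
```

For a convex object this orientation histogram determines the shape up to translation, which is what makes the EGI usable for attitude determination.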
Book
This book is based on the author's experience with calculations involving polynomial splines. It presents those parts of the theory which are especially useful in calculations and stresses the representation of splines as linear combinations of B-splines. After two chapters summarizing polynomial approximation, a rigorous discussion of elementary spline theory is given involving linear, cubic and parabolic splines. The computational handling of piecewise polynomial functions (of one variable) of arbitrary order is the subject of chapters VII and VIII, while chapters IX, X, and XI are devoted to B-splines. The distance from splines with fixed and with variable knots is discussed in chapter XII. The remaining five chapters concern specific approximation methods, interpolation, smoothing and least-squares approximation, the solution of an ordinary differential equation by collocation, curve fitting, and surface fitting. The present text version differs from the original in several respects. The book is now typeset (in plain TeX), the Fortran programs now make use of Fortran 77 features. The figures have been redrawn with the aid of Matlab, various errors have been corrected, and many more formal statements have been provided with proofs. Further, all formal statements and equations have been numbered by the same numbering system, to make it easier to find any particular item. A major change has occurred in Chapters IX-XI where the B-spline theory is now developed directly from the recurrence relations without recourse to divided differences. This has brought in knot insertion as a powerful tool for providing simple proofs concerning the shape-preserving properties of the B-spline series.
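The recurrence-based development the book emphasizes can be illustrated with a minimal Cox-de Boor evaluation of B-spline basis functions (a generic sketch, not code from the book; the uniform knot vector below is an arbitrary example):

```python
def bspline_basis(i, p, u, knots):
    """Cox-de Boor recurrence: value of the i-th B-spline basis function
    of degree p at parameter u, over the given knot vector."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] != knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) \
               * bspline_basis(i, p - 1, u, knots)
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = (knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, p - 1, u, knots)
    return left + right

# On a uniform knot vector, the degree-2 basis functions are nonnegative
# and form a partition of unity inside the valid parameter range.
knots = list(range(8))                       # 0, 1, ..., 7
vals = [bspline_basis(i, 2, 3.5, knots) for i in range(len(knots) - 3)]
total = sum(vals)
```

The recurrence needs no divided differences, which is exactly the simplification the revised chapters IX-XI exploit.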
Article
A general purpose computer vision system must be capable of recognizing three-dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object recognition problem, discusses basic concepts associated with this problem, and reviews the relevant literature. Because range images (or depth maps) are often used as sensor input instead of intensity images, techniques for obtaining, processing, and characterizing range data are also surveyed.
Article
Finding a representation of a three-dimensional shape which is both concise and complete has important applications in computer vision. Every surface has a number of distinct typical views. The authors construct a model based on these views and the transitions between them. The result is a labeled graph on the viewing sphere. The label for each face is itself a graph that qualitatively represents the outline of the object when seen from any point on the interior of that face. For a generic smooth surface, the singularities that occur in typical views, as well as all transitions between those views, can be described by a known finite list of possibilities. The models have been worked out for a representative selection of examples of generic and nongeneric surfaces, including a sphere with a bump, a rotationally symmetric and twisted torus, and a surface with a furrow.
Article
Current three-dimensional vision algorithms can generate depth maps or vector maps from images, but few algorithms extract high-level information from these depth maps. This paper identifies one algorithm that determines an object's orientation by matching object models to depth map data. The object models are constructed by mapping surface orientation data onto spheres. This process is based on a mathematical theorem that can be applied only to convex objects, but some extensions for nonconvex objects are presented. The paper shows that a global approach can be used successfully in cases where objects do not touch one another. Another important result illustrates the size of the space of rotations. It shows that even when 6,000 rotations are almost uniformly distributed for matching, errors of 17 degrees are still possible.
Article
In this paper we analyze the problem of matching partially obscured, noise-corrupted images of composite scenes in two and three dimensions. We describe efficient methods for smoothing the noisy data and for matching portions of the observed object boundaries (or of characteristic curves lying on bounding surfaces of 3-D objects) to prestored models. We also report initial experiments showing the efficacy of this procedure.
Book
1 Introduction.- 1.1 The Input.- 1.2 Issues in Shape Description.- 1.2.1 Criteria for shape description.- 1.2.2 Choosing segmented surface descriptions.- 1.3 Issues of Recognition.- 1.3.1 Description of models.- 1.3.2 Matching primitives and algorithms.- 1.4 Questions for the Research.- 1.5 The Contribution of the Research.- 1.6 Organization of the Book.- 2 Survey of Previous Work.- 2.1 Survey of Shape Descriptions.- 2.1.1 Volume descriptions.- 2.1.2 Curve/line descriptions.- 2.1.3 Surface descriptions.- 2.1.4 Summary.- 2.2 Survey of Recognition Systems.- 2.2.1 3DPO.- 2.2.2 Nevatia and Binford.- 2.2.3 ACRONYM.- 2.2.4 Extended Gaussian Image (EGI).- 2.2.5 Oshima and Shirai.- 2.2.6 Grimson and Lozano-Perez.- 2.2.7 Faugeras and Hebert.- 2.2.8 Bhanu.- 2.2.9 Ikeuchi.- 2.2.10 Summary.- 3 Surface Segmentation and Description.- 3.1 Curvature Properties and Surface Discontinuities.- 3.2 Detecting Surface Features.- 3.2.1 Method 1: using directional curvatures and scale-space tracking.- 3.2.2 Method 2: using principal curvatures at a single scale.- 3.2.3 Method 3: using anisotropic filtering.- 3.3 Space Grouping.- 3.4 Spatial Linking.- 3.5 Segmentation into Surface Patches.- 3.6 Surface Fitting.- 3.7 Object Inference.- 3.7.1 Labeling boundaries.- 3.7.2 Occlusion and connectivity.- 3.7.3 Inferring and describing objects.- 3.8 Representing Objects by Attributed Graphs.- 3.8.1 Node attributes.- 3.8.2 Link attributes.- 4 Object Recognition.- 4.1 Representation of Models.- 4.2 Overview of the Matching Process.- 4.3 Module 1: Screener.- 4.4 Module 2: Graph Matcher.- 4.4.1 Compatibility between nodes of the model view and scene graph.- 4.4.2 Compatibility between two pairs of matching nodes.- 4.4.3 Computing the geometric transform.- 4.4.4 Modifications based on the geometric transform.- 4.4.5 Measuring the goodness of a match.- 4.5 Module 3: Analyzer.- 4.5.1 Splitting objects.- 4.5.2 Merging objects.- 4.6 Summary.- 5 Experimental Results.- 5.1 The Models.- 5.2 A Detailed Case Study.- 5.2.1 Search nodes expanded in recognition.- 5.3 Results for Other Scenes.- 5.4 Parallel Versus Sequential Search.- 5.5 Unknown Objects.- 5.6 Occlusion.- 6 Discussion and Conclusion.- 6.1 Discussion.- 6.1.1 Problems of segmentation.- 6.1.2 Problems of approximation.- 6.2 Contribution.- 6.3 Future Research.- 6.3.1 From surface to volume.- 6.3.2 Applications.- A Directional Curvatures.- B Surface Curvature.- C Approximation by Quadric Surfaces.
Article
The problem of recognizing what objects are where in the workspace of a robot can be cast as one of searching for a consistent matching between sensory data elements and equivalent model elements. In principle, this search space is enormous and to contain the potential combinatorial explosion, constraints between the data and model elements are needed. We derive a set of constraints for sparse sensory data that are applicable to a wide variety of sensors and examine their completeness and exhaustiveness. We then derive general theoretical bounds on the number of interpretations expected to be consistent with the data under the effects of local constraints. These bounds are applicable to many types of local constraints, other than the specific examples used here. For the case of sparse, noisy three-dimensional sensory data, explicit values for the bounds are computed and are shown to be consistent with empirical results obtained earlier in (Grimson and Lozano-Perez 1984). The results are used to demonstrate the graceful degradation of the recognition technique in the presence of noise in the data, and to predict the number of data points needed in general to uniquely determine the object being sensed.
Article
Views of objects composed of smooth surface patches whose intersections form smooth space curves are classified. These views may be described algebraically by mappings and diagrams of mappings from the line to the plane (for the crease) or from the plane to the plane (for the apparent contour). It is possible to derive a finite catalogue of generic views and their transitions, so that every view is either (up to smooth coordinate changes in source and target) equivalent to one of those in the catalogue or is of sufficiently high codimension.
The general paradigm of pose clustering is discussed and compared to other techniques applicable to the problem of object detection. Pose clustering is also called hypothesis accumulation and generalized Hough transform and is characterized by a “parallel” accumulation of low level evidence followed by a maxima or clustering step which selects pose hypotheses with strong support from the set of evidence. Examples are given showing the use of pose clustering in both 2D and 3D problems. Experiments show that the positional accuracy of points placed in the data space by a model pose obtained via clustering is comparable to the positional accuracy of the sensed data from which pose candidates are computed. A specific sensing system is described which yields an accuracy of a few millimeters. Complexity of the pose clustering approach relative to alternative approaches is discussed with reference to conventional computers and massively parallel computers. It is conjectured that the pose clustering approach can produce superior results in real time on a massively parallel machine.
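A minimal translation-only sketch of the voting idea (hypothetical 2-D points, not the sensing system described above): every model/scene pairing votes for a translation bin, and the most supported bin becomes the pose hypothesis.

```python
import numpy as np

def cluster_translation(model_pts, scene_pts, bin_size=0.5):
    """Generalized-Hough-style pose clustering restricted to 2-D translation:
    accumulate one vote per (model point, scene point) pairing and return
    the most supported translation bin."""
    votes = {}
    for m in model_pts:
        for s in scene_pts:
            key = tuple(np.round((s - m) / bin_size).astype(int))
            votes[key] = votes.get(key, 0) + 1
    best = max(votes, key=votes.get)
    return np.array(best) * bin_size

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
scene = model + np.array([2.0, -1.0])     # scene is the model shifted by (2, -1)
t_hat = cluster_translation(model, scene)
```

Correct pairings all vote for the same bin while spurious pairings scatter, which is why the accumulation step is robust and trivially parallel.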
This note shows that shape data alone, without absolute size, are highly effective in constraining the size of the search space of matches to stored 3D object models. The shape constraints developed are applied to sparse and error-prone measurements of surface orientations and scaled depths (that is, depths scaled by a constant but unknown factor) synthesized from polyhedral models which themselves have six degrees of positional freedom with respect to the sensor. The matching paradigm used is that of Grimson and Lozano-Pérez in which feasible interpretations of the data are obtained by requiring geometric consistency between metrics made on pairs of data and their associated matched pair of model faces and then tested by geometrical transformation. For a sufficiently interesting view of an object it is found that the number of feasible interpretations can be reduced to a handful if the surface normals are known to within a cone with half angle of about 10°.
Article
A technique is presented for modelling with piecewise algebraic surface patches. Each surface patch is defined within a tetrahedral lattice of control points. The shape of the surface is modified by adjusting the weights of the control points. This scheme makes it possible to piece together two algebraic surface patches with any degree of cross-boundary derivative continuity. The application of piecewise algebraic surface patches to solid modelling is discussed.
Conference Paper
In this paper, we present a system that recognizes objects in a jumble, verifies them, and then determines some essential configurational information, such as which ones are on top. The approach is to use three-dimensional models of the objects to find them in range data. The matching strategy starts with a distinctive edge feature, such as the edge at the end of a cylindrical part, and then "grows" a match by adding compatible features one at a time. (The order of features to be considered is predetermined by an interactive, off-line, feature-selection process.) Once a sufficient number of compatible features has been detected to allow a hypothesis to be formed, the verification procedure evaluates it by comparing the measured range data with data predicted according to the hypothesis. When all the objects in the scene have been hypothesized and verified in this manner, a configuration-understanding procedure determines which objects are on top of others by analyzing the patterns of range data predicted from all the hypotheses. We also present experimental results of the system's performance in recognizing and locating castings in a bin.
Article
A representation technique for visible three-dimensional object surfaces is presented which uses regions that are homogeneous in certain intrinsic surface properties. First, smooth patches are fitted to the object surfaces; principal curvatures are then computed and surface points classified accordingly. Such a representation scheme has applications in various image processing tasks such as graphics display and recognition of objects. An algorithm is presented for computing object descriptions. The algorithm divides the range data array into windows and fits approximating surfaces to those windows that do not contain discontinuities in range. The algorithm is not restricted to polyhedral objects nor is it committed to a particular type of approximating surface. It uses tension splines which make the fitting patches locally adaptable to the shape of object surfaces. Maximal regions are then formed by coalescing patches with similar intrinsic curvature-based properties. Regions on the surface of the object can be subsequently organized into a labelled graph, where each node represents a region and is assigned a label depicting the type of region and containing the set of feature values computed for that region.
Article
The topic of model-building for 3-D objects is examined. Most 3-D object recognition systems construct models either manually or by training. Neither approach has been very satisfactory, particularly in designing object recognition systems which can handle a large number of objects. Recent interest in integrating mechanical CAD systems and vision systems has led to a third type of model building for vision: adaptation of preexisting CAD models of objects for recognition. If a solid model of an object to be recognized is already available in a manufacturing database, then it should be possible to infer automatically a model appropriate for vision tasks from the manufacturing model. Such a system has been developed. It uses 3-D object descriptions created on a commercial CAD system and expressed in both the industry-standard IGES form and a polyhedral approximation and performs geometric inferencing to obtain a relational graph representation of the object which can be stored in a database of models for object recognition. Relational graph models contain both view-independent information extracted from the IGES description and view-dependent information (patch areas) extracted from synthetic views of the object. It is argued that such a system is needed to efficiently create a large database (more than 100 objects) of 3-D models to evaluate matching strategies.
Book
This is an introduction to “polynomial continuation,” which is used to compute the solutions to systems of polynomial equations. The book shows how to solve practical problems but maintains an elementary mathematical perspective. The first two chapters illustrate most of the important concepts and numerical processes, using only high-school mathematics and some simple computer programs. Since 1987, when the book was first published, the field has advanced through many developments, noted below. Still, I have been gratified that students continue to take the trouble to let me know that they have found this book a useful starting point. The concrete, empirical, and conversational style of this book arose from my experiences as a mathematician at General Motors working with engineers. They sometimes were not convinced by proofs, but their mechanical intuition responded well to numerical experiments that demonstrated the “feel” of the concepts. I learned at this time the paradox that a verbal explanation can defeat its purpose, if made to a tactile person, no matter how correct the explanation, no matter how skilled the person. Consequently, the language in this book stays as basic as possible for as long as possible: subscripts are not used until the end of Chapter 2, the only “spaces” referenced are Euclidean, and no concept or computer code is more general than needed. Yet, the mathematical facts are complete and proven, not necessarily when they are introduced, but eventually. For this, I have relied on a few results from differential topology, avoiding more abstract mathematics. Carrying out the numerical exercises using the Fortran code provided is important to the learning experience of the book.
Article
This paper discusses how local measurements of three-dimensional positions and surface normals (recorded by a set of tactile sensors, or by three-dimensional range sensors), may be used to identify and locate objects, from among a set of known objects. The objects are modeled as polyhedra having up to six degrees of freedom relative to the sensors. We show that inconsistent hypotheses about pairings between sensed points and object surfaces can be discarded efficiently by using local constraints on: distances between faces, angles between face normals, and angles (relative to the surface normals) of vectors between sensed points. We show by simulation and by mathematical bounds that the number of hypotheses consistent with these constraints is small. We also show how to recover the position and orientation of the object from the sense data. The algorithm's performance on data obtained from a triangulation range sensor is illustrated.
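The pruning idea can be sketched for point features alone (pairwise-distance consistency only; the model coordinates and transform below are invented for illustration, and the actual method also exploits surface-normal constraints):

```python
import numpy as np

def feasible_interpretations(data, model, tol=1e-6):
    """Depth-first search over assignments data[i] -> model[a[i]],
    pruning any pairing whose data/model pairwise distances disagree."""
    results = []

    def extend(assign):
        i = len(assign)
        if i == len(data):
            results.append(tuple(assign))
            return
        for j in range(len(model)):
            # local constraint: distances to all previously assigned points must match
            if all(abs(np.linalg.norm(data[i] - data[k])
                       - np.linalg.norm(model[j] - model[assign[k]])) < tol
                   for k in range(i)):
                extend(assign + [j])

    extend([])
    return results

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [4.0, 3.0]])
c, s = np.cos(0.6), np.sin(0.6)
R = np.array([[c, -s], [s, c]])
data = model @ R.T + np.array([0.5, -1.0])   # rigidly moved copy of the model
interps = feasible_interpretations(data, model)
```

Because the model's pairwise distances here are all distinct, only the correct assignment survives the search, matching the paper's observation that local constraints leave few consistent hypotheses.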
Article
A solution T of the least-squares problem AT = B + E, given A and B, so that trace(E′E) is minimized and T′T = I, is presented. It is compared with a less general solution of the same problem which was given by Green [5]. The present solution, in contrast to Green's, is applicable to matrices A and B which are of less than full column rank. Some technical suggestions for the numerical computation of T and an illustrative example are given.
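In modern terms this is the orthogonal Procrustes problem, solvable in closed form with an SVD; a minimal sketch (the example matrices are invented, and this does not reproduce the paper's treatment of rank-deficient cases):

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return orthogonal T minimizing ||A T - B||_F, i.e. the
    least-squares problem A T = B + E subject to T'T = I."""
    U, _, Vt = np.linalg.svd(A.T @ B)   # maximize trace(T' A' B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
c, s = np.cos(0.8), np.sin(0.8)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # a known rotation
T = orthogonal_procrustes(A, A @ R)                           # should recover R
```

This closed-form solution is the same machinery used in each ICP iteration to compute the optimal rigid alignment of matched point pairs.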
Conference Paper
An efficient and reliable algorithm for computing the Euclidean distance between a pair of convex sets in R<sup>m</sup> is described. Extensive numerical experience with a broad family of polytopes in R<sup>3</sup> shows that the computational cost is approximately linear in the total number of vertices specifying the two polytopes. The algorithm has special features which make its application in a variety of robotics problems attractive. These are discussed and an example of collision detection is given.
Conference Paper
A methodology for computing the distance between smooth objects in three-dimensional space is presented. The convex polytope, which is the basic solid modeling tool of prior developments, is replaced by a general convex set. This permits the direct treatment of objects with curved surfaces, eliminating the errors caused by polytope approximations. The computational procedure is a simple extension of the efficient distance algorithm described by E.G. Gilbert et al. (1988). While the convergence of the algorithm is not finite, it is fast and an effective stopping condition is available. Procedures for treating a rich family of smooth objects are given. Extensive numerical experiments support the claimed efficiency
Article
An algorithm for computing the Euclidean distance between a pair of convex sets in R<sup>m</sup> is described. Extensive numerical experience with a broad family of polytopes in R<sup>3</sup> shows that the computational cost is approximately linear in the total number of vertices specifying the two polytopes. The algorithm has special features which make its application in a variety of robotics problems attractive. These features are discussed and an example of collision detection is given.
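The flavor of such support-function-based distance algorithms can be conveyed by a much simpler (and slower) Frank-Wolfe sketch over the two vertex sets. This is not the authors' algorithm, only an illustration that the per-iteration subproblem reduces to selecting an extreme vertex; the square polytopes below are made-up test data.

```python
import numpy as np

def polytope_distance(P, Q, iters=500):
    """Approximate the Euclidean distance between conv(P) and conv(Q)
    (rows are vertices) by Frank-Wolfe on ||x - y||^2: each iteration
    needs only the vertex extremizing a linear function (a support query)."""
    x, y = P[0].astype(float), Q[0].astype(float)
    for k in range(iters):
        g = x - y                        # gradient w.r.t. x (and -g w.r.t. y)
        s = P[np.argmin(P @ g)]          # support vertex of conv(P)
        t = Q[np.argmax(Q @ g)]          # support vertex of conv(Q)
        step = 2.0 / (k + 2.0)
        x = x + step * (s - x)
        y = y + step * (t - y)
    return float(np.linalg.norm(x - y))

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
d = polytope_distance(square, square + np.array([3.0, 0.0]))  # true distance is 2
```

The support query is the reason the cost scales roughly linearly with the number of vertices, as the abstract reports for the authors' far more refined procedure.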
Article
The Initial Graphics Exchange Specification (IGES) is considered as a method for communication of data on computer aided design and manufacture (CAD/CAM) between different systems. Data format for product design and manufacturing information created and stored in a CAD/CAM system in computer-readable form, the IGES, is in the public domain and is designed to be independent of all CAD/CAM systems. The benefits of this common format are indicated, its standardization status is discussed, and the development of translators to and from IGES format is described.
Article
A scheme is developed to register range images in an environment where distinctive features are scarce. Another issue addressed here is conflicting situations that may arise from pair-wise registration of multiple overlapping range images whether or not they contain distinctive features. There can be two causes for this: (a) error in individual registrations or (b) compression or bending in range images. The authors develop a scheme for resolving such conflicts for the case where range images share a common reference surface, i.e. when the transformation matrix between two overlapping images involves only three components: two translations and one rotation. The authors implemented this scheme to map the floor of the ocean, where the range data is obtained by a multibeam echo-sounder system installed aboard a sailing ship producing multiple overlapping range images. The system developed is the first automated system for correctly registered mapping of the ocean floor; it is efficient and robust
Article
An evidence-based recognition technique is defined that identifies 3-D objects by looking for their notable features. This technique makes use of an evidence rule base, which is a set of salient or evidence conditions with corresponding evidence weights for various objects in the database. A measure of similarity between the set of observed features and the set of evidence conditions for a given object in the database is used to determine the identity of an object in the scene or reject the object(s) in the scene as unknown. This procedure has polynomial time complexity and correctly identifies a variety of objects in both synthetic and real range images. A technique for automatically deriving the evidence rule base from training views of objects is shown to generate evidence conditions that successfully identify new views of those objects
Article
A computer vision system is presented for shape synthesis and recognition of three-dimensional objects using an attributed hypergraph representation. The vision system is capable of: (1) constructing an attributed hypergraph representation (AHR) based on the information extracted from an image with range data; (2) synthesizing several AHRs obtained from various views of an object to form a complete AHR of the object; and (3) recognizing any view of an object by finding the graph monomorphism between the AHR of that view and the complete AHR of a prototype object. This system is implemented on a Grinnell imaging system driven by a VAX 11/750 running VMS.
Article
In this paper, we present a system for object representation and recognition from dense range maps. The system addresses three problems, namely (i) object representation from single view dense range map, (ii) integrating data or descriptions from multiple views for model construction, and (iii) matching descriptions from a single unknown view to models. Although the main goal of this paper is to develop an algorithm for solving the problem (iii) stated above, to give a complete overview of our system, we will briefly outline our solution techniques with the aid of examples, for problems (i) and (ii) as well. The objects and models are represented by regions that are a collection of surface patches homogeneous in certain intrinsic surface properties. The recognition scheme is based on matching object surface descriptions with model surface descriptions. The recognition task includes both locating the overall object and identifying its features. Location is achieved by finding a geometrical registration function that correctly superimposes an arbitrary instance of the known model and the model. A localization technique is presented which requires that correspondence be established exactly, between one point on the object surface and one on the model surface. Once the single point correspondence is specified, closed-form solutions are given for determining the attitude of the unknown view of the object in 3 space with respect to the model.
The authors present 3D-POLY, a working system for recognizing objects in the presence of occlusion and against cluttered backgrounds. The time complexity of this system is only O(n<sup>2</sup>) for single-object recognition, where n is the number of features on the object. The organisation of the feature data for the models is based on a data structure called the feature sphere. Efficient constant-time algorithms for assigning a feature to its proper place on a feature sphere and for extracting the neighbors of a given feature from the feature sphere representation are presented. For hypothesis generation, local feature sets are used. The combination of the feature sphere idea for streamlining verification and the local feature sets for hypothesis generation results in a system whose time complexity has a low-order polynomial bound.
Article
Geometric matching algorithms and geometric representations are examined for point sets, curves, surfaces, volumes, and their space-time trajectories. The author focuses mainly on general-purpose techniques. Geometric matching benefits greatly from taking advantage of symmetries and special features, e.g. dihedral edges and circles or any three reliably identifiable feature points. Volume matching and representation techniques are also presented. Texture and fractal representations are briefly described
Article
A CAD-based approach is proposed for building representations and models that can be used in diverse applications involving three-dimensional (3-D) object recognition and manipulation. There are two main steps in the approach. First, the object's geometry is designed, using a CAD system, or its CAD model is extracted from the existing database if it has already been modeled. The representations are developed from the CAD model, and features are constructed, possibly by combining multiple representations that are crucial in 3-D object recognition and manipulation. The Alpha 1 modeling system is used. It utilizes spline-based boundary representation. Six CAD-based representations are presented: surface points and normals, surface curvatures, generalized sweep, polyhedral, extended Gaussian image, and object decomposition and hierarchical representation. This approach allows vision models to be automatically generated from an existing CAD database.
Polygonizing implicit surfaces. Xerox PARC
  • J Blumenthal
Blumenthal, J. 1988. Polygonizing implicit surfaces. Xerox PARC Technical Report EDL-88-4. (Appeared later in CAGD.)
Pose estimation from corresponding point data. Machine Vision for Inspection and Measurement
  • R M Haralick
  • H Joo
  • C Lee
  • X Zhuang
  • V G Vaidya
  • M B Kim
Haralick, R.M., Joo, H., Lee, C., Zhuang, X., Vaidya, V.G., Kim, M.B. 1989. Pose estimation from corresponding point data. Machine Vision for Inspection and Measurement (H. Freeman, Ed.). Academic Press.
Rigid body motion from range image sequences. (Appeared in Computer Vision, Graphics, and Image Processing.)
  • B K P Horn
  • J G And Harris
Horn, B.K.P. and Harris, J.G. 1989. Rigid body motion from range image sequences. (Appeared in Computer Vision, Graphics, and Image Processing.)
Inexact matching of 3D surfaces. VSSP-TR-3-90
  • S Z Li
Li, S.Z., 1990. Inexact matching of 3D surfaces. VSSP-TR-3-90. University of Surrey, England.
Measurement, orientation determination, and recognition of surface shapes in range images
  • P Liang