Truncated Signed Distance Function:
Experiments on Voxel Size
Diana Werner, Ayoub Al-Hamadi and Philipp Werner
University of Magdeburg, Germany
{Diana.Werner;Ayoub.Al-Hamadi}@ovgu.de
Abstract. Real-time 3D reconstruction is a hot topic in current research. Several popular approaches are based on the truncated signed distance function (TSDF), a volumetric scene representation that allows for integration of multiple depth images taken from different viewpoints. Aiming at a deeper understanding of TSDF we discuss its parameters, conduct experiments on the influence of voxel size on reconstruction accuracy and derive practical recommendations.
Keywords: TSDF, KinectFusion, 3D reconstruction
1 Introduction
Accurate 3D reconstruction in real time has many applications in entertainment, virtual reality, augmented reality and robotics. The introduction of Microsoft's low-cost RGB-D camera Kinect [5] has made 3D sensing available to everyone, and it has boosted research and commercial activities in 3D reconstruction and its applications. Several popular and widely used approaches such as KinectFusion [4, 6], Kintinuous [11, 10], the open source implementation of KinectFusion in the Point Cloud Library (PCL) [12] or KinFu Large Scale [1] are based on the truncated signed distance function (TSDF).
TSDF is a volumetric representation of a scene for integrating depth images that has several benefits, e.g. time and space efficiency, representation of uncertainty and incremental updating [2]. It is furthermore well suited for data-parallel algorithms, i.e. for implementation on GPUs. The attained speed-up facilitates real-time processing at high frame rates.
There have been some investigations on hole filling to generate more natural looking reconstructions, which is partly done automatically by the TSDF method [2] and can be done in an energy-conserving way [7]. However, to our knowledge there has been no closer look at scenarios with multiple objects and at other important questions concerning object size and resolution. For instance: Up to which point is a true hole reconstructed as a hole? With respect to camera position and direction: Is the reconstruction influenced by distance or angle? Are there problems at the very left or right border of an object seen from a specific direction? In which way is the reconstruction influenced by the voxel size used in the world grid?
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-11755-3_40
Fig. 1. 2D TSDF example. (a) Solid object (green), camera with field of view, optical axis and ray (blue), and TSDF grid (unseen voxels are white, for others see color bar). The signed distance value of voxel x is determined by the depth of the corresponding surface point p and the voxel's camera distance cam_z(x). (b) 1D TSDF sampled along the ray through p with t = 1000 mm. The object surface is at the zero crossing.
In this paper we analyze the influence of distance, object size and angle to the camera viewing direction. We examine these questions in combination with several world-grid voxel sizes.
This paper is structured as follows. Section 2 describes the TSDF in detail. It is shown how this function is generated from a depth map after sensing the environment and how one can compute the 3D reconstruction given the TSDF. Section 3 describes parameters and algorithmic options to consider for improving results. In Section 4 we describe our experiments and discuss our results. Section 5 concludes this paper.
2 TSDF
The signed distance function (SDF) was proposed to reconstruct a 3D model from multiple range images [2]. A d-dimensional environment is represented in a d-dimensional grid of equally sized voxels. The position of a voxel x is defined by its center. For each voxel there are two relevant values. First, sdf_i(x), which is the signed distance between the voxel center and the nearest object surface in the direction of the current measurement. In front of an object (in free space) the values are defined to be positive. Behind the surface (inside the object) distances are negative. Second, there is a weight w_i(x) for each voxel to assess the uncertainty of the corresponding sdf_i(x). The subscript i denotes the i-th observation. Fig. 1a and the following equation define sdf_i(x) precisely.

sdf_i(x) = depth_i(pic(x)) - cam_z(x)    (1)

pic(x) is the projection of the voxel center x onto the depth image. So depth_i(pic(x)) is the measured depth between the camera and the nearest
object surface point p on the viewing ray crossing x. Accordingly, cam_z(x) is the distance between the voxel and the camera along the optical axis. Consequently, sdf_i(x) is a distance along the optical axis as well.
In [4, 6] the SDF has been truncated at ±t. This is beneficial because large distances are not relevant for surface reconstruction, and a restriction of the value range can be utilized to reduce the memory footprint. The truncated variant of sdf_i(x) is denoted by tsdf_i(x).

tsdf_i(x) = max(-1, min(1, sdf_i(x) / t))    (2)

In Fig. 1a, tsdf_i(x) of the voxel grid is encoded by color. Fig. 1b shows the TSDF sampled along a viewing ray.
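For illustration, the following minimal Python sketch evaluates Eqs. (1) and (2) for a single voxel. The pinhole projection, the intrinsic matrix K and all names are our own assumptions, not part of a particular implementation.

```python
import numpy as np

def tsdf_of_voxel(voxel_center, depth_image, K, t):
    """Illustrative sketch of Eqs. (1) and (2) for one voxel, assuming a pinhole
    camera with intrinsics K and a voxel center given in camera coordinates
    (z axis = optical axis, all lengths in mm)."""
    # project the voxel center onto the depth image, i.e. pic(x)
    u = K @ np.asarray(voxel_center, dtype=float)
    px, py = int(round(u[0] / u[2])), int(round(u[1] / u[2]))

    depth = depth_image[py, px]          # depth_i(pic(x)), along the optical axis
    cam_z = voxel_center[2]              # cam_z(x), voxel distance along the optical axis

    sdf = depth - cam_z                  # Eq. (1)
    return max(-1.0, min(1.0, sdf / t))  # Eq. (2), truncation to [-1, 1]
```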
As mentioned above, multiple observations can be combined in one TSDF to integrate information from different viewpoints, to improve accuracy or to add missing patches of the surface. This is done by weighted summation, usually through iterative updates of the TSDF. TSDF_i(x) denotes the integration of all observations tsdf_j(x) with 1 ≤ j ≤ i. W_i(x) assesses the uncertainty of TSDF_i(x). A new observation is integrated by applying the following update step for all voxels x in the grid. The grid is initialized with TSDF_0(x) = 0 and W_0(x) = 0.

TSDF_i(x) = ( W_{i-1}(x) · TSDF_{i-1}(x) + w_i(x) · tsdf_i(x) ) / ( W_{i-1}(x) + w_i(x) )    (3)

W_i(x) = W_{i-1}(x) + w_i(x)    (4)

Most approaches set the uncertainty weight to w_i(x) = 1 for all updated voxels and to w_i(x) = 0 for all voxels outside the camera's field of view [4, 6, 12, 9]. This simply averages the measured TSDF observations over time.
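As a minimal sketch of Eqs. (3) and (4) applied to whole grids (array names and the masking by w_i > 0 are our own assumptions):

```python
import numpy as np

def integrate(TSDF, W, tsdf_i, w_i):
    """Weighted running average of Eqs. (3) and (4). TSDF and W are grids
    initialized to zero; tsdf_i and w_i hold the current observation, with
    w_i = 0 for voxels outside the camera's field of view."""
    seen = w_i > 0  # update only voxels observed in measurement i
    TSDF[seen] = (W[seen] * TSDF[seen] + w_i[seen] * tsdf_i[seen]) \
                 / (W[seen] + w_i[seen])          # Eq. (3)
    W[seen] += w_i[seen]                          # Eq. (4)
    return TSDF, W
```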
For surface reconstruction one can think of the TSDF as a level set: to find the object surface one looks for the zero level. This is usually done by ray casting from a given camera viewpoint. For each considered ray the TSDF is read step by step until the sign switches. The surrounding TSDF values are then interpolated to estimate the refined position of the zero crossing along the ray, which is returned as an object surface point.
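A minimal sketch of this zero-crossing search along one ray follows; it assumes the TSDF can be sampled at arbitrary points (e.g. by the interpolation discussed in Sect. 3), and the function names are ours.

```python
def find_surface_point(ray_origin, ray_dir, sample_tsdf, step, max_range):
    """March along a ray and linearly interpolate the zero crossing of the
    TSDF between the last positive and the first negative sample."""
    s_prev, d_prev = None, 0.0
    d = 0.0
    while d < max_range:
        s = sample_tsdf(ray_origin + d * ray_dir)
        if s_prev is not None and s_prev > 0.0 and s < 0.0:
            # refine the crossing by linear interpolation between d_prev and d
            d_zero = d_prev + step * s_prev / (s_prev - s)
            return ray_origin + d_zero * ray_dir
        s_prev, d_prev = s, d
        d += step
    return None  # no sign switch found, i.e. no surface along this ray
```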
3 Parameters and Algorithmic Options
The TSDF representation requires the selection of several parameters.
Grid volume size determines the dimensions of the TSDF grid, i.e. the maximum dimensions of the scene to reconstruct in a physical unit like mm. In practice it is bounded by the available random access memory on the GPU. However, several previous works suggested overcoming this limitation by shifting the TSDF grid when the camera's field of view leaves the grid [11, 10, 1, 8]. For a constant memory footprint, the grid dimensions increase with the voxel size, however at the cost of reconstruction accuracy.
Voxel size v is a crucial parameter as it influences memory requirements and surface reconstruction accuracy. If the dimensions of a 3D grid are fixed, doubling the voxel size reduces the number of voxels to one eighth, with the same reduction in memory footprint (see the short calculation below). Further, it reduces the computational cost for updating the TSDF and for ray tracing. Conversely, an increased voxel size makes it possible to increase the scene volume without needing more memory or increasing computational cost. However, an increase in voxel size comes along with a decrease in the level of representable detail, i.e. with lowered reconstruction accuracy. So it is worth thinking about the optimal voxel size for a particular application. In Sect. 4 we conduct experiments to assess the influence of this parameter on the accuracy.
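As a back-of-the-envelope illustration of this trade-off (a cubic grid and two bytes per voxel for the distance value are assumed; a per-voxel weight would add to this):

```python
def tsdf_memory_bytes(volume_mm, voxel_mm, bytes_per_voxel=2):
    """Memory footprint of a cubic TSDF grid. Doubling voxel_mm reduces the
    number of voxels, and hence the memory, to one eighth."""
    n = int(volume_mm / voxel_mm)        # voxels per dimension
    return n ** 3 * bytes_per_voxel

# e.g. a 4096 mm cube: 8 mm voxels need 256 MiB, 16 mm voxels only 32 MiB
assert tsdf_memory_bytes(4096, 8) == 8 * tsdf_memory_bytes(4096, 16)
```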
Distance representation and truncation distance t, i.e. the coding of the distance values TSDF_i(x), is crucial for the reconstruction accuracy. Intuitively, there should be as many quantization steps as possible to minimize the information loss caused by rounding and to maintain the accuracy of TSDF_i(x). This is especially important near the zero level, as those values of TSDF_i(x) are used for surface estimation. So floating point is appropriate. In terms of memory footprint it is beneficial to use a two-byte integer per voxel as in the implementation of PCL [12]. However, here the selection of t influences the reconstruction accuracy. A two-byte integer has 65,536 quantization steps to represent a distance, i.e. integer values between -32,768 and 32,767. With a fixed-point number coding and a given truncation distance t, signed distances in the range ±t are mapped to ±32,767. So signed distances are quantized in steps of t/32,767, i.e. the quantization error is proportional to t, and a lower t should be better. E.g. with t = 1,000 mm the coding will round each distance to multiples of about 0.03 mm, whereas with t = 10 mm it will round to multiples of about 0.0003 mm (see the coding sketch below). On the other hand, t should be larger than the length of the voxel diagonal (√d · v) and the level of noise. A detailed analysis of this parameter is out of scope, but will be addressed in future work.
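A minimal sketch of such a fixed-point coding follows; the scaling by 32,767 matches the description above, while the function names are our own and not PCL's API.

```python
import numpy as np

def encode_distance(sdf_mm, t):
    """Map a signed distance in [-t, t] mm to a two-byte integer in [-32767, 32767]."""
    return np.int16(np.clip(sdf_mm / t, -1.0, 1.0) * 32767)

def decode_distance(code, t):
    """Recover the distance in mm; the quantization step is t / 32767."""
    return float(code) / 32767.0 * t

# with t = 1000 mm the quantization step is roughly 0.03 mm
assert abs(decode_distance(encode_distance(0.5, 1000.0), 1000.0) - 0.5) <= 1000.0 / 32767.0
```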
Besides these parameters, there are some algorithmic options for variation.
TSDF update, i.e. the integration of multiple observations in one TSDF, can be accomplished in different ways. Above, we introduced the classical equally weighted sum update. Recently, [3] proposed to select w_i(x) individually for each voxel based on the uncertainty of the measurement. The authors model w_i(x) dependent on the corresponding depth value, as the depth estimation provided by the Kinect sensor is more accurate at close range; a generic illustration of such a weight is sketched below. They further propose two modified update methods and evaluate their benefit for the model reconstruction accuracy. Their variants of the TSDF update, which consider characteristics of the sensing hardware, outperform the classical update step in the presented experiments.
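As an illustration only, a generic depth-dependent weight might look like the following; the concrete function and its constants are placeholders of ours and not the model proposed in [3].

```python
def distance_aware_weight(depth_mm, d_min=500.0, d_max=4000.0):
    """Illustrative placeholder: give close-range measurements (which are more
    accurate for the Kinect) full weight and let the weight decay linearly to a
    small value at the maximum range. Not the actual weighting of [3]."""
    if depth_mm <= d_min:
        return 1.0
    if depth_mm >= d_max:
        return 0.1
    return 1.0 - 0.9 * (depth_mm - d_min) / (d_max - d_min)
```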
Surface reconstruction is done via ray tracing. One starts at a point on the ray as near as possible to the camera but inside the TSDF volume. From this point one proceeds with an individually chosen step size to the next ray point, and so
Fig. 2. Depth experiment. (a) Planar object and its variation: angle α, camera distance cam_z from 800 mm to 3100 mm in steps of 0.1 mm. (b) Absolute depth error |e_d| in mm over voxel size v: mean with standard deviation (red), maximum and minimum (blue), t = 255 mm.
on. One has to decide whether to take the value tsdf_i(x) of the voxel x related to a ray point or to interpolate a TSDF value from the surrounding voxels, and one has to choose an interpolation method; in PCL [12] and in other works like [4] trilinear interpolation is used (see the sketch below). After a zero crossing from positive to negative is detected between two ray points, a surface point can be computed via the chosen interpolation. One can also decide whether to stop the ray tracing after computing one surface point, in order to reconstruct only the surface points seen from the camera.
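A minimal sketch of trilinear interpolation of the TSDF at an arbitrary metric point follows. The grid indexing convention is our own assumption; note that unseen voxels are not treated specially here, which is exactly the border issue discussed in Sect. 4.1.

```python
import numpy as np

def interpolate_tsdf(grid, point, voxel_size):
    """Trilinearly interpolate a TSDF grid (indexed [ix, iy, iz], voxel centers
    at (i + 0.5) * voxel_size) at a metric point inside the grid."""
    g = np.asarray(point, dtype=float) / voxel_size - 0.5  # continuous grid coords
    i0 = np.floor(g).astype(int)                           # lower corner voxel
    f = g - i0                                             # fractional part in [0, 1)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((1 - f[0]) if dx == 0 else f[0]) * \
                    ((1 - f[1]) if dy == 0 else f[1]) * \
                    ((1 - f[2]) if dz == 0 else f[2])
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value
```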
4 Experiments and Discussion
In this section we conduct several experiments aiming at a deeper understanding of TSDF, i.e. at the effects of spatial discretization in a grid on the reconstruction accuracy. To focus on these effects, we created synthetic depth maps which do not suffer from noise or other measurement errors. Another advantage of synthetic data is the availability of perfect ground truth. For the purpose of simple illustration, the experiments are conducted with a 2D TSDF grid, i.e. we have a 1D depth map and a 2D surface. All experiments demonstrate the result after a single depth measurement. Integration of multiple depth maps is out of scope and will be addressed in future work. We decompose the reconstruction accuracy into two components and investigate each in a dedicated experiment.
4.1 Depth Error
In this experiment we calculate the depth error e_d for each pixel of a synthetic depth map and the corresponding depth reconstructed from the TSDF. The synthetic depth maps contain a planar object surface which is perpendicular to the optical axis (see Fig. 2a). We placed the object 800 mm in front of the camera position and
Fig. 3. Angular depth error. (a) Map of e_d across the TSDF volume for a voxel size v of 64 mm. (b) Mean error |e_d| in mm along rays over the angle α for voxel sizes v = 4, 16 and 64 mm.
moved it from this position in steps of 0.1 mm up to a distance of 3100 mm. The camera is placed as in Fig. 1a, with the optical axis passing through the middle of the TSDF grid and aligned with one of its axes. The size of the world grid is fixed to 4096 mm in width and height in all experiments.
Fig. 2b shows the mean, standard deviation, minimum and maximum of the absolute depth error across the tested field of view for several voxel sizes v. It is apparent that the absolute depth error increases with the voxel size. However, the mean error increases more slowly than the voxel size: whereas the mean error is 0.04 mm for v = 2 mm (1.9 %), it is 0.97 mm for v = 128 mm (0.7 %). There is a similar effect for the maximum and minimum error.
Further, looking at the spatial distribution of the error, one can observe that the most severe errors occur at the border of the field of view (see Fig. 3a), whereas the object distance seems to have no influence on the reconstruction accuracy. To investigate the influence of the angle between the grid axis and the ray, we calculate the mean of the absolute error along each viewing ray. The results are given in Fig. 3b. Here it is apparent that the error is at least two orders of magnitude larger at the borders of the field of view, and increasingly so the larger v gets. The high errors at the borders are artefacts caused by the trilinear interpolation. The PCL implementation of KinectFusion that we used for our experiments [12] does not pay attention to the fact that some TSDF voxels x involved in the interpolation may be unseen. For these voxels tsdf_i(x) = 0 from initialization; there has been no sensing for these voxels, so one cannot expect a surface there, which would be true for seen voxels x with a TSDF value of 0. With increasing voxel size this error becomes more apparent, as more reconstructed 3D points are affected by this problem, because the influence of the interpolation extends one voxel size.
Fig. 4. Lateral depth error. (a) Average length of the wrongly reconstructed (fp), wrongly not reconstructed (fn) and correctly reconstructed (cor) object parts in percent of the true object length, over the ratio l/v. (b) Average depth error e_d in mm over the ratio of true object length l and voxel size v. Voxel size v is 64 mm.
4.2 Lateral Error
With this experiment we investigate how the ratio of the voxel size v and the length l of a planar object, located perpendicular to the optical camera axis, influences the reconstruction accuracy. The objects were moved within a camera distance from 1280 mm to 1536 mm, which corresponds to 4 voxels of 64 mm size. We chose step sizes of 8 mm in vertical and horizontal direction. The object bounds lay inside the viewing area with a distance of 128 mm from the border. For all objects of the same length generated for this experiment we calculated mean values and looked at the depth error e_d and at the lengths fp, fn and cor, which are the lengths of the wrongly reconstructed, wrongly not reconstructed and correctly reconstructed parts of the object in % of the true object length.
In Fig. 4 one can see that all of these lengths, together with e_d, are clearly influenced by the ratio of l and v. It is also obvious that the objects are on average reconstructed too small.
5 Conclusion
In this paper we gave a detailed look at the TSDF and its parametric and algorithmic options. We showed that for PCL's implementation the depth errors are of the same magnitude for voxel sizes of 4 to 64 mm. The errors are two orders of magnitude larger at the border of the viewing field due to interpolation effects. For the lateral error there is a strong relation between the ratio of object length and voxel size and the reconstruction accuracy, with the object being too small on average. The negative impact of an increase in voxel size is lower for the depth error than for the lateral error.
Acknowledgement
This work was supported by Transregional Collaborative Research Centre
SFB/TRR 62 (“Companion-Technology for Cognitive Technical Systems”) funded
by the German Research Foundation (DFG).
References
1. Bondarev, E., Heredia, F., Favier, R., Ma, L., de With, P.H.: On photo-realistic 3D reconstruction of large-scale and arbitrary-shaped environments. In: Consumer Communications and Networking Conference (CCNC), 2013 IEEE. pp. 621–624. IEEE (2013)
2. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. pp. 303–312. SIGGRAPH '96, ACM, New York, NY, USA (1996)
3. Hemmat, H.J., Bondarev, E., de With, P.H.N.: Exploring distance-aware weighting strategies for accurate reconstruction of voxel-based 3D synthetic models. In: MultiMedia Modeling. Lecture Notes in Computer Science, vol. 8325, pp. 412–423. Springer (2014)
4. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. pp. 559–568. ACM (2011)
5. Microsoft: Kinect (2014), http://www.xbox.com/en-us/kinect/
6. Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality. pp. 127–136. ISMAR '11, IEEE Computer Society, Washington, DC, USA (2011)
7. Paulsen, R.R., Bærentzen, J.A., Larsen, R.: Regularisation of 3D signed distance fields. In: SCIA. Lecture Notes in Computer Science, vol. 5575, pp. 513–519. Springer (2009)
8. Roth, H., Vona, M.: Moving volume KinectFusion. In: Proceedings of the British Machine Vision Conference. pp. 112.1–112.11. BMVA Press (2012)
9. Whelan, T., Johannsson, H., Kaess, M., Leonard, J.J., McDonald, J.: Robust tracking for real-time dense RGB-D mapping with Kintinuous. Technical Report 031, MIT-CSAIL (2012), http://hdl.handle.net/1721.1/73167
10. Whelan, T., Johannsson, H., Kaess, M., Leonard, J.J., McDonald, J.: Robust real-time visual odometry for dense RGB-D mapping. In: 2013 IEEE International Conference on Robotics and Automation (ICRA). pp. 5724–5731. IEEE (2013)
11. Whelan, T., Kaess, M., Fallon, M., Johannsson, H., Leonard, J., McDonald, J.: Kintinuous: Spatially extended KinectFusion. Technical Report 020, MIT-CSAIL (2012), http://hdl.handle.net/1721.1/71756
12. Willow Garage and other contributors: Open source implementation of KinectFusion in PCL 1.7.1 (2014), http://www.pointclouds.org/downloads/