Truncated Signed Distance Function:
Experiments on Voxel Size
Diana Werner, Ayoub Al-Hamadi and Philipp Werner
University of Magdeburg, Germany
{Diana.Werner;Ayoub.Al-Hamadi}@ovgu.de
Abstract. Real-time 3D reconstruction is a hot topic in current research. Several popular approaches are based on the truncated signed distance function (TSDF), a volumetric scene representation that allows for integration of multiple depth images taken from different viewpoints. Aiming at a deeper understanding of TSDF we discuss its parameters, conduct experiments on the influence of voxel size on reconstruction accuracy and derive practical recommendations.
Keywords: TSDF, KinectFusion, 3D reconstruction
1 Introduction
Accurate 3D reconstruction in real time has many applications in entertainment, virtual reality, augmented reality and robotics. The introduction of Microsoft's low-cost RGB-D camera Kinect [5] has made 3D sensing available to everyone, and it has boosted research and commercial activities in 3D reconstruction and its applications. Several popular and widely used approaches such as KinectFusion [4, 6], Kintinuous [11, 10], the open source implementation of KinectFusion in the Point Cloud Library (PCL) [12] or KinFu Large Scale [1] are based on the truncated signed distance function (TSDF).
TSDF is a volumetric representation of a scene for integrating depth images that has several benefits, e.g. time and space efficiency, representation of uncertainty and incremental updating [2]. It is furthermore well suited for data-parallel algorithms, i.e. for implementation on GPUs. The attained speed-up facilitates real-time processing at high frame rates.
There have been some investigations on hole filling to generate more natural looking reconstructions, which is partly done automatically by the TSDF method [2] and can be done in an energy-conserving way [7]. However, to our knowledge there has been no closer look at scenarios with multiple objects and at other important questions concerning object size and resolution. For instance: Up to which point is a true hole reconstructed as a hole? With respect to camera position and direction: Is the reconstruction influenced by distance or angle? Are there problems at the very left or right border of an object seen from a specific direction? In which way is the reconstruction influenced by the voxel size used in the world grid?
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-11755-3_40
Fig. 1. 2D TSDF example. (a) Solid object (green), camera with field of view, optical axis and ray (blue), and TSDF grid (unseen voxels are white, for others see color bar). The signed distance value of voxel x is determined by the depth of the corresponding surface point p and the voxel's camera distance cam_z(x). (b) 1D TSDF sampled along the ray through p with t = 1000 mm. The object surface is at the zero crossing.
In this paper we analyze the influence of distance, object size and angle to the camera viewing direction. We examine these questions in combination with several world-grid voxel sizes.
This paper is structured as follows. Section 2 describes the TSDF in detail. It is shown how this function is generated from a depth map after sensing the environment and how one can compute the 3D reconstruction given the TSDF. Section 3 describes parameters and algorithmic options to consider for improving results. In Section 4 we describe our experiments and discuss our results. Section 5 concludes this paper.
2 TSDF
The signed distance function (SDF) was proposed to reconstruct a 3D model from multiple range images [2]. A d-dimensional environment is represented in a d-dimensional grid of equally sized voxels. The position of a voxel x is defined by its center. For each voxel there are two relevant values. First, sdf_i(x), which is the signed distance between the voxel center and the nearest object surface in the direction of the current measurement. In front of an object (in free space) the values are defined to be positive. Behind the surface (inside the object) distances are negative. Second, there is a weight w_i(x) for each voxel to assess the uncertainty of the corresponding sdf_i(x). The subscript i denotes the i-th observation. Fig. 1a and the following equation define sdf_i(x) precisely.

sdf_i(x) = depth_i(pic(x)) - cam_z(x)    (1)

pic(x) is the projection of the voxel center x onto the depth image. So depth_i(pic(x)) is the measured depth between the camera and the nearest
object surface point p on the viewing ray crossing x. Accordingly, cam_z(x) is the distance between the voxel and the camera along the optical axis. Consequently, sdf_i(x) is a distance along the optical axis as well.
In [4, 6] the SDF has been truncated at ±t. This is beneficial because large distances are not relevant for surface reconstruction, and a restriction of the value range can be utilized to reduce the memory footprint. The truncated variant of sdf_i(x) is denoted by tsdf_i(x).

tsdf_i(x) = max(-1, min(1, sdf_i(x) / t))    (2)

In Fig. 1a, tsdf_i(x) of the voxel grid is encoded by color. Fig. 1b shows the TSDF sampled along a viewing ray.
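For illustration, the following minimal Python sketch evaluates Eqs. (1) and (2) for a single voxel. The pinhole projection, the intrinsic matrix K and all names are our own assumptions, not part of a particular implementation.

```python
import numpy as np

def tsdf_of_voxel(voxel_center, depth_image, K, t):
    """Illustrative sketch of Eqs. (1) and (2) for one voxel, assuming a pinhole
    camera with intrinsics K and a voxel center given in camera coordinates
    (z axis = optical axis, all lengths in mm)."""
    # project the voxel center onto the depth image, i.e. pic(x)
    u = K @ np.asarray(voxel_center, dtype=float)
    px, py = int(round(u[0] / u[2])), int(round(u[1] / u[2]))

    depth = depth_image[py, px]          # depth_i(pic(x)), along the optical axis
    cam_z = voxel_center[2]              # cam_z(x), voxel distance along the optical axis

    sdf = depth - cam_z                  # Eq. (1)
    return max(-1.0, min(1.0, sdf / t))  # Eq. (2), truncation to [-1, 1]
```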
As mentioned above, multiple observations can be combined in one TSDF to integrate information from different viewpoints, to improve accuracy or to add missing patches of the surface. This is done by weighted summation, usually through iterative updates of the TSDF. TSDF_i(x) denotes the integration of all observations tsdf_j(x) with 1 ≤ j ≤ i. W_i(x) assesses the uncertainty of TSDF_i(x). A new observation is integrated by applying the following update step for all voxels x in the grid. The grid is initialized with TSDF_0(x) = 0 and W_0(x) = 0.

TSDF_i(x) = ( W_{i-1}(x) · TSDF_{i-1}(x) + w_i(x) · tsdf_i(x) ) / ( W_{i-1}(x) + w_i(x) )    (3)

W_i(x) = W_{i-1}(x) + w_i(x)    (4)

Most approaches set the uncertainty weight to w_i(x) = 1 for all updated voxels and to w_i(x) = 0 for all voxels outside the camera's field of view [4, 6, 12, 9]. This simply averages the measured TSDF observations over time.
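As a minimal sketch of Eqs. (3) and (4) applied to whole grids (array names and the masking by w_i > 0 are our own assumptions):

```python
import numpy as np

def integrate(TSDF, W, tsdf_i, w_i):
    """Weighted running average of Eqs. (3) and (4). TSDF and W are grids
    initialized to zero; tsdf_i and w_i hold the current observation, with
    w_i = 0 for voxels outside the camera's field of view."""
    seen = w_i > 0  # update only voxels observed in measurement i
    TSDF[seen] = (W[seen] * TSDF[seen] + w_i[seen] * tsdf_i[seen]) \
                 / (W[seen] + w_i[seen])          # Eq. (3)
    W[seen] += w_i[seen]                          # Eq. (4)
    return TSDF, W
```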
For surface reconstruction one can think of the TSDF as a level set: to find the object surface one looks for the zero level. This is usually done by ray casting from a given camera viewpoint. For each considered ray the TSDF is read step by step until the sign switches. The surrounding TSDF values are then interpolated to estimate the refined position of the zero crossing along the ray, which is returned as an object surface point.
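A minimal sketch of this zero-crossing search along one ray follows; it assumes the TSDF can be sampled at arbitrary points (e.g. by the interpolation discussed in Sect. 3), and the function names are ours.

```python
def find_surface_point(ray_origin, ray_dir, sample_tsdf, step, max_range):
    """March along a ray and linearly interpolate the zero crossing of the
    TSDF between the last positive and the first negative sample."""
    s_prev, d_prev = None, 0.0
    d = 0.0
    while d < max_range:
        s = sample_tsdf(ray_origin + d * ray_dir)
        if s_prev is not None and s_prev > 0.0 and s < 0.0:
            # refine the crossing by linear interpolation between d_prev and d
            d_zero = d_prev + step * s_prev / (s_prev - s)
            return ray_origin + d_zero * ray_dir
        s_prev, d_prev = s, d
        d += step
    return None  # no sign switch found, i.e. no surface along this ray
```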
3 Parameters and Algorithmic Options
The TSDF representation requires the selection of several parameters.
Grid volume size determines the dimensions of the TSDF grid, i.e. the maximum dimensions of the scene to reconstruct in a physical unit like mm. In practice it is bounded by the available random access memory on the GPU. However, several previous works suggested overcoming this limitation by shifting the TSDF grid when the camera's field of view leaves the grid [11, 10, 1, 8]. For a constant memory footprint, the grid dimensions increase with the voxel size, however at the cost of reconstruction accuracy.
Voxel size v is a crucial parameter as it influences memory requirements and surface reconstruction accuracy. If the dimensions of a 3D grid are fixed, doubling the voxel size reduces the number of voxels to one eighth, with the same reduction in memory footprint (see the short calculation below). Further, it reduces the computational cost for updating the TSDF and for ray tracing. Conversely, an increased voxel size makes it possible to increase the scene volume without needing more memory or increasing computational cost. However, an increase in voxel size comes along with a decrease in the level of representable detail, i.e. with lowered reconstruction accuracy. So it is worth thinking about the optimal voxel size for a particular application. In Sect. 4 we conduct experiments to assess the influence of this parameter on the accuracy.
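As a back-of-the-envelope illustration of this trade-off (a cubic grid and two bytes per voxel for the distance value are assumed; a per-voxel weight would add to this):

```python
def tsdf_memory_bytes(volume_mm, voxel_mm, bytes_per_voxel=2):
    """Memory footprint of a cubic TSDF grid. Doubling voxel_mm reduces the
    number of voxels, and hence the memory, to one eighth."""
    n = int(volume_mm / voxel_mm)        # voxels per dimension
    return n ** 3 * bytes_per_voxel

# e.g. a 4096 mm cube: 8 mm voxels need 256 MiB, 16 mm voxels only 32 MiB
assert tsdf_memory_bytes(4096, 8) == 8 * tsdf_memory_bytes(4096, 16)
```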
Distance representation and truncation distance t, i.e. the coding of the distance values TSDF_i(x), is crucial for the reconstruction accuracy. Intuitively, there should be as many quantization steps as possible to minimize the information loss caused by rounding and to maintain the accuracy of TSDF_i(x). This is especially important near the zero level, as those values of TSDF_i(x) are used for surface estimation. So floating point is appropriate. In terms of memory footprint it is beneficial to use a two-byte integer per voxel as in the implementation of PCL [12]. However, here the selection of t influences the reconstruction accuracy. A two-byte integer has 65,536 quantization steps to represent a distance, i.e. integer values between -32,768 and 32,767. With a fixed-point number coding and a given truncation distance t, signed distances in the range ±t are mapped to ±32,767. So signed distances are quantized in steps of t/32,767, i.e. the quantization error is proportional to t, and a lower t should be better. E.g. with t = 1,000 mm the coding will round each distance to multiples of about 0.03 mm, whereas with t = 10 mm it will round to multiples of about 0.0003 mm (see the coding sketch below). On the other hand, t should be larger than the length of the voxel diagonal (√d · v) and the level of noise. A detailed analysis of this parameter is out of scope, but will be addressed in future work.
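A minimal sketch of such a fixed-point coding follows; the scaling by 32,767 matches the description above, while the function names are our own and not PCL's API.

```python
import numpy as np

def encode_distance(sdf_mm, t):
    """Map a signed distance in [-t, t] mm to a two-byte integer in [-32767, 32767]."""
    return np.int16(np.clip(sdf_mm / t, -1.0, 1.0) * 32767)

def decode_distance(code, t):
    """Recover the distance in mm; the quantization step is t / 32767."""
    return float(code) / 32767.0 * t

# with t = 1000 mm the quantization step is roughly 0.03 mm
assert abs(decode_distance(encode_distance(0.5, 1000.0), 1000.0) - 0.5) <= 1000.0 / 32767.0
```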
Besides these parameters, there are some algorithmic options for variation.
TSDF update, i.e. the integration of multiple observations in one TSDF, can be accomplished in different ways. Above, we introduced the classical equally weighted sum update. Recently, [3] proposed to select w_i(x) individually for each voxel based on the uncertainty of the measurement. The authors model w_i(x) dependent on the corresponding depth value, as the depth estimation provided by the Kinect sensor is more accurate at close range; a generic illustration of such a weight is sketched below. They further propose two modified update methods and evaluate their benefit for the model reconstruction accuracy. Their variants of the TSDF update, which consider characteristics of the sensing hardware, outperform the classical update step in the presented experiments.
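As an illustration only, a generic depth-dependent weight might look like the following; the concrete function and its constants are placeholders of ours and not the model proposed in [3].

```python
def distance_aware_weight(depth_mm, d_min=500.0, d_max=4000.0):
    """Illustrative placeholder: give close-range measurements (which are more
    accurate for the Kinect) full weight and let the weight decay linearly to a
    small value at the maximum range. Not the actual weighting of [3]."""
    if depth_mm <= d_min:
        return 1.0
    if depth_mm >= d_max:
        return 0.1
    return 1.0 - 0.9 * (depth_mm - d_min) / (d_max - d_min)
```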
Surface reconstruction is done via ray tracing. One starts at a point on the ray as near as possible to the camera but inside the TSDF volume. From this point one proceeds with an individually chosen step size to the next ray point, and so
Fig. 2. Depth experiment. (a) Planar object and its variation: angle α, camera distance cam_z from 800 mm to 3100 mm in steps of 0.1 mm. (b) Absolute depth error |e_d| in mm over voxel size v: mean with standard deviation (red), maximum and minimum (blue), t = 255 mm.
on. One has to decide whether to take the value tsdf_i(x) of the voxel x related to a ray point or to interpolate a TSDF value from the surrounding voxels, and one has to choose an interpolation method; in PCL [12] and in other works like [4] trilinear interpolation is used (see the sketch below). After a zero crossing from positive to negative is detected between two ray points, a surface point can be computed via the chosen interpolation. One can also decide whether to stop the ray tracing after computing one surface point, in order to reconstruct only the surface points seen from the camera.
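A minimal sketch of trilinear interpolation of the TSDF at an arbitrary metric point follows. The grid indexing convention is our own assumption; note that unseen voxels are not treated specially here, which is exactly the border issue discussed in Sect. 4.1.

```python
import numpy as np

def interpolate_tsdf(grid, point, voxel_size):
    """Trilinearly interpolate a TSDF grid (indexed [ix, iy, iz], voxel centers
    at (i + 0.5) * voxel_size) at a metric point inside the grid."""
    g = np.asarray(point, dtype=float) / voxel_size - 0.5  # continuous grid coords
    i0 = np.floor(g).astype(int)                           # lower corner voxel
    f = g - i0                                             # fractional part in [0, 1)
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((1 - f[0]) if dx == 0 else f[0]) * \
                    ((1 - f[1]) if dy == 0 else f[1]) * \
                    ((1 - f[2]) if dz == 0 else f[2])
                value += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value
```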
4 Experiments and Discussion
In this section we conduct several experiments aiming at a deeper understanding of TSDF, i.e. at the effects of spatial discretization in a grid on the reconstruction accuracy. To focus on these effects, we created synthetic depth maps which do not suffer from noise or other measurement errors. Another advantage of synthetic data is the availability of perfect ground truth. For the purpose of simple illustration, the experiments are conducted with a 2D TSDF grid, i.e. we have a 1D depth map and a 2D surface. All experiments demonstrate the result after a single depth measurement. Integration of multiple depth maps is out of scope and will be addressed in future work. We decompose the reconstruction accuracy into two components and investigate each in a dedicated experiment.
4.1 Depth Error
In this experiment we calculate the depth error e_d for each pixel of a synthetic depth map and the corresponding depth reconstructed from the TSDF. The synthetic depth maps contain a planar object surface which is perpendicular to the optical axis (see Fig. 2a). We placed the object 800 mm in front of the camera position and
Fig. 3. Angular depth error. (a) Map of e_d across the TSDF volume for a voxel size v of 64 mm. (b) Mean error |e_d| in mm along rays over the angle α for voxel sizes v = 4, 16 and 64 mm.
moved it from this position in steps of 0.1 mm up to a distance of 3100 mm. The camera is placed as in Fig. 1a, with the optical axis passing through the middle of the TSDF grid and aligned with one of its axes. The size of the world grid is fixed to 4096 mm in width and height in all experiments.
Fig. 2b shows the mean, standard deviation, minimum and maximum of the absolute depth error across the tested field of view for several voxel sizes v. It is apparent that the absolute depth error increases with the voxel size. However, the mean error increases more slowly than the voxel size: whereas the mean error is 0.04 mm for v = 2 mm (1.9 %), it is 0.97 mm for v = 128 mm (0.7 %). There is a similar effect for the maximum and minimum error.
Further, looking at the spatial distribution of the error, one can observe that the most severe errors occur at the border of the field of view (see Fig. 3a), whereas the object distance seems to have no influence on the reconstruction accuracy. To investigate the influence of the angle between the grid axis and the ray, we calculate the mean of the absolute error along each viewing ray. The results are given in Fig. 3b. Here it is apparent that the error is at least two orders of magnitude larger at the borders of the field of view, and increasingly so the larger v gets. The high errors at the borders are artefacts caused by the trilinear interpolation. The PCL implementation of KinectFusion that we used for our experiments [12] does not pay attention to the fact that some TSDF voxels x involved in the interpolation may be unseen. For these voxels tsdf_i(x) = 0 from initialization; there has been no sensing for these voxels, so one cannot expect a surface there, which would be true for seen voxels x with a TSDF value of 0. With increasing voxel size this error becomes more apparent, as more reconstructed 3D points are affected by this problem, because the influence of the interpolation extends one voxel size.
Fig. 4. Lateral depth error. (a) Average length of the wrongly reconstructed (fp), wrongly not reconstructed (fn) and correctly reconstructed (cor) object parts in percent of the true object length, over the ratio l/v. (b) Average depth error e_d in mm over the ratio of true object length l and voxel size v. Voxel size v is 64 mm.
4.2 Lateral Error
With this experiment we investigate how the ratio of the voxel size v and the length l of a planar object, located perpendicular to the optical camera axis, influences the reconstruction accuracy. The objects were moved within a camera distance from 1280 mm to 1536 mm, which corresponds to 4 voxels of 64 mm size. We chose step sizes of 8 mm in vertical and horizontal direction. The object bounds lay inside the viewing area with a distance of 128 mm from the border. For all objects of the same length generated for this experiment we calculated mean values and looked at the depth error e_d and at the lengths fp, fn and cor, which are the lengths of the wrongly reconstructed, wrongly not reconstructed and correctly reconstructed parts of the object in % of the true object length.
In Fig. 4 one can see that all of these lengths, together with e_d, are clearly influenced by the ratio of l and v. It is also obvious that the objects are on average reconstructed too small.
5 Conclusion
In this paper we gave a detailed look at the TSDF and its parametric and algorithmic options. We showed that for PCL's implementation the depth errors are of the same magnitude for voxel sizes of 4 to 64 mm. The errors are two orders of magnitude larger at the border of the viewing field due to interpolation effects. For the lateral error there is a strong relation between the ratio of object length and voxel size and the reconstruction accuracy, with the object being too small on average. The negative impact of an increase in voxel size is lower for the depth error than for the lateral error.
Acknowledgement
This work was supported by Transregional Collaborative Research Centre
SFB/TRR 62 (“Companion-Technology for Cognitive Technical Systems”) funded
by the German Research Foundation (DFG).
References
1. Bondarev, E., Heredia, F., Favier, R., Ma, L., de With, P.H.: On photo-realistic 3D reconstruction of large-scale and arbitrary-shaped environments. In: Consumer Communications and Networking Conference (CCNC), 2013 IEEE. pp. 621–624. IEEE (2013)
2. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. pp. 303–312. SIGGRAPH '96, ACM, New York, NY, USA (1996)
3. Hemmat, H.J., Bondarev, E., de With, P.H.N.: Exploring distance-aware weighting strategies for accurate reconstruction of voxel-based 3D synthetic models. In: MultiMedia Modeling. Lecture Notes in Computer Science, vol. 8325, pp. 412–423. Springer (2014)
4. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. pp. 559–568. ACM (2011)
5. Microsoft: Kinect (2014), http://www.xbox.com/en-us/kinect/
6. Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., Fitzgibbon, A.: KinectFusion: real-time dense surface mapping and tracking. In: Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality. pp. 127–136. ISMAR '11, IEEE Computer Society, Washington, DC, USA (2011)
7. Paulsen, R.R., Bærentzen, J.A., Larsen, R.: Regularisation of 3D signed distance fields. In: SCIA. Lecture Notes in Computer Science, vol. 5575, pp. 513–519. Springer (2009)
8. Roth, H., Vona, M.: Moving volume KinectFusion. In: Proceedings of the British Machine Vision Conference. pp. 112.1–112.11. BMVA Press (2012)
9. Whelan, T., Johannsson, H., Kaess, M., Leonard, J.J., McDonald, J.: Robust tracking for real-time dense RGB-D mapping with Kintinuous. Technical Report 031, MIT-CSAIL (2012), http://hdl.handle.net/1721.1/73167
10. Whelan, T., Johannsson, H., Kaess, M., Leonard, J.J., McDonald, J.: Robust real-time visual odometry for dense RGB-D mapping. In: 2013 IEEE International Conference on Robotics and Automation (ICRA). pp. 5724–5731. IEEE (2013)
11. Whelan, T., Kaess, M., Fallon, M., Johannsson, H., Leonard, J., McDonald, J.: Kintinuous: Spatially extended KinectFusion. Technical Report 020, MIT-CSAIL (2012), http://hdl.handle.net/1721.1/71756
12. Willow Garage and other contributors: Open source implementation of KinectFusion in PCL 1.7.1 (2014), http://www.pointclouds.org/downloads/