Hole Filling for Kinect v2 Depth Images
Wanbin Song1, Anh Vu Le1, Seokmin Yun1, Seung-Won Jung2 and
Chee Sun Won1
1 Dongguk University-Seoul, Dept. of Electronics and Electrical Engineering
2 Dongguk University-Seoul, Dept. of Multimedia Engineering
wbsong@dongguk.edu (W.S.), levuanh.hut@gmail.com (A.V.L.), smyun@dongguk.edu
(S.Y.), swjung83@dongguk.edu (S.-W. J.), cswon@dongguk.edu (C.S.W.)
Abstract. Microsoft has recently released the Kinect for Windows v2 sensor (Kinect v2) as an improved version of its original Kinect (Kinect v1). The depth and color resolutions of the Kinect v2 are approximately double those of the Kinect v1. Also, the depth sensing mechanism has been changed from structured-light IR sensing to time-of-flight (ToF) technology. Accordingly, depth artifacts of the Kinect v1 such as random holes have been alleviated significantly, but holes still exist around object boundaries in the Kinect v2. In this paper, we exploit the edge information of the color image to fill the holes in the depth image of the Kinect v2.
Keywords: Kinect v2; hole-filling; depth map; registration
1 Introduction
Recently, a new version of the Kinect has been released by Microsoft with improved
specifications. Both the color and depth images of the Kinect v2 are improved not only
in resolution but also in field of view (FOV). Specifically, the color image of the Kinect v2
has Full HD (FHD) resolution with a horizontal FOV of 84.1° and a vertical FOV of 53.8°.
The depth image, with a resolution of 512×424, has a horizontal FOV of 70.6° and a
vertical FOV of 60°. The working range has been extended from 0.8-3.5 m to 0.5-8 m [1]. Another
major change in the Kinect v2 is the method of depth sensing. The comparative
specifications of the Kinect v1 and the Kinect v2 are summarized in Table 1 [2].
Table 1. Specification comparisons between Kinect v1 and Kinect v2.

                             Kinect v1                   Kinect v2
  Depth measurement          Structured light coding     ToF (Time of Flight)
  Resolution (Color)         640×480 pixels              1920×1080 pixels
  Resolution (Depth)         320×240 pixels              512×424 pixels
  FOV (Color)                62° × 48.6°                 84.1° × 53.8°
  FOV (Depth)                57.5° × 43.5°               70.6° × 60°
  Maximum skeletal tracking  2                           6
  Working range              0.8 m - 3.5 m               0.5 m - 8 m
The first version of the Kinect uses a structured light pattern for depth measurement.
However, this light-coding method causes large holes near object boundaries and serious
interference errors when multiple Kinects are used. In comparison with the Kinect v1,
different depth artifacts are observed in the Kinect v2 due to its ToF (Time of Flight)
approach to depth measurement. For example, as shown in Fig. 1, the random blobs of holes
of the Kinect v1 are reduced in the Kinect v2, but holes still exist near object boundaries.
Specifically, the large occlusion holes in Fig. 1(a) are reduced to thin hole-lines along
object boundaries, as in Fig. 1(b). However, as shown in the four corners of Fig. 1(b), a
new type of hole appears in the Kinect v2. Finally, we note that more objects are visible
in Fig. 1(b) thanks to the expanded FOV of the Kinect v2.
Fig. 1. Depth image comparison between Kinect v1 and v2: (a) Depth image of Kinect v1 (holes
in white); (b) Depth image of Kinect v2 (holes in black).
The goal of this paper is to fill the holes in the Kinect v2 depth image. Among the various
kinds of holes in the Kinect v2, we focus on the holes along object boundaries. Our approach
to filling these boundary holes is to exploit the actual edge position and edge direction
information from the color image and to apply the directional joint bilateral filtering
method [3] to the depth image. As a prerequisite of our approach, the color and depth
images captured from the Kinect v2 should be precisely registered.
This paper is organized as follows. The preprocessing of our work is described in Section
2, and the hole-filling algorithm is presented in Section 3. We provide experimental results
in Section 4. Finally, the paper is concluded in Section 5.
2 Color and Depth Registration
Before we apply the directional joint bilateral filter for hole-filling [3], the color and
depth images from the Kinect v2 should be registered. The registration can be done by
calibrating the two sensors, i.e., by estimating the transformation matrix between the color
and depth images. One could use the IR image provided by the Kinect v2 instead of the depth
image for registration. However, the IR intensity image cannot be used directly for several
reasons, including saturation and the intensity fall-off phenomenon [4].
Since the resolution and FOV of the color and depth images are different, the coverage of
the two images is also different, and the depth sensor scans only a part of the color image.
Therefore, the color image needs to be cropped before the calibration. Microsoft provides a
new Kinect for Windows Software Development Kit (Kinect for Windows SDK) with the Kinect v2,
which enables developers to create applications using the Kinect sensor. The SDK offers many
functions to handle the Kinect, including a registration class named CoordinateMapper,
available in Visual Studio C#. Using this class, we can crop the color image so that its
coverage fits that of the depth image (see Fig. 2(c)). However, as shown in the superimposed
depth and cropped color image in Fig. 2(d), the registration is not accurate enough, and
misalignments between the depth boundaries and the color boundaries remain visible.
Therefore, we need a calibration matrix between the two images before the hole-filling.
Fig. 2. Color image resizing with the factory-provided function: (a) Original color image
(1920x1080); (b) Original depth image (512x424); (c) Resized color image (512x424); (d)
Superimposed edge map of the color and depth images (white lines represent the edges of the
depth image).
For the calibration, we need a projective transformation that registers the color image to
the depth image coordinates. Since the holes in the depth image have zero values, it is
difficult to interpolate the depth values if we warp the depth image to the color image
coordinates; we therefore keep the depth grid fixed and align the color image to it. The
projective matrix that converts the depth image coordinate (x, y) into the color image
coordinate (X, Y) is given as follows [5]:
\[
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
=
\begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.
\qquad (1)
\]
Equation (1) contains eight unknown parameters, and it can be rewritten as the following
two equations:
\[
X = a_1 x + a_2 y + a_3 - a_7 x X - a_8 y X, \qquad (2)
\]
\[
Y = a_4 x + a_5 y + a_6 - a_7 x Y - a_8 y Y. \qquad (3)
\]
At least four corresponding pairs are needed to solve these equations for the eight unknown
coefficients. If more than four pairs of points are used, the system can be solved by a
least-squares method such as the pseudo-inverse [6]. We used quadrilateral boards to manually
find the correspondence pairs between the color and depth images, as in Fig. 3(a) and (b). By
substituting the coordinates of the corresponding pairs into Equations (2) and (3) and solving
the resulting linear simultaneous equations, the eight unknown coefficients can be determined.
Then, using the complete transformation matrix, the color image can be aligned with the depth
image. Comparing Fig. 2(d) with Fig. 3(d), the matching accuracy achieved by our projective
transformation matrix is greatly improved.
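To make this step concrete, the following Python/NumPy sketch assembles the two rows
contributed by each correspondence pair according to Equations (2) and (3) and solves for
a1-a8 with the pseudo-inverse; the function name and argument layout are our own illustration
rather than the original implementation.

```python
import numpy as np

def estimate_projective_matrix(depth_pts, color_pts):
    """Estimate the parameters a1..a8 of Eq. (1) from >= 4 point pairs.

    depth_pts, color_pts: (N, 2) arrays of (x, y) and (X, Y) coordinates.
    Returns the 3x3 matrix of Eq. (1) with the last entry fixed to 1.
    """
    depth_pts = np.asarray(depth_pts, dtype=np.float64)
    color_pts = np.asarray(color_pts, dtype=np.float64)
    A, b = [], []
    for (x, y), (X, Y) in zip(depth_pts, color_pts):
        # Row from Eq. (2): X = a1*x + a2*y + a3 - a7*x*X - a8*y*X
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X])
        b.append(X)
        # Row from Eq. (3): Y = a4*x + a5*y + a6 - a7*x*Y - a8*y*Y
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y])
        b.append(Y)
    # Least-squares solution via the pseudo-inverse [6]
    a = np.linalg.pinv(np.asarray(A)) @ np.asarray(b)
    return np.array([[a[0], a[1], a[2]],
                     [a[3], a[4], a[5]],
                     [a[6], a[7], 1.0]])
```

With the manually selected correspondence pairs, a fit of this kind yields a transformation
matrix such as the one reported in Equation (4) of Section 4.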
Fig. 3. Calibration by projective transformation (black dotted lines represent correspondence
pairs): (a) Resized color image (512x424); (b) Original depth image (512x424); (c) Edge map
after registration (white lines represent the edges of the depth image).
3 Edge-based hole-filling method
After the registration, we can superimpose the color image on the depth image and exploit
the actual locations of the edges in the color image to determine the direction of the
hole-filling. That is, following the directional joint bilateral filter for hole-filling [3],
every pixel in the depth image is classified into four groups depending on the co-existence
of edges and holes, namely non-hole/non-edge, non-hole/edge, hole/non-edge, and hole/edge.
This classification can easily be done with the aligned color and depth images. Noise in the
non-hole regions of the depth map is removed by the joint trilateral filter (JTF) [7], and
the blurring artifacts at object boundaries are removed by the directional joint bilateral
filter (DJBF). Also, the holes in the non-edge regions are filled by the partial directional
joint bilateral filter (PDJBF), and those in the edge regions are filled by the DJBF [3].
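As a rough illustration of the classification step only (not the JTF/DJBF/PDJBF filters of
[3]), the Python sketch below labels each depth pixel from a hole mask (zero depth) and a
color edge map; the use of a Canny edge detector and all variable names are our assumptions.

```python
import numpy as np
import cv2  # assumed available; used here only for Canny edge detection

NON_HOLE_NON_EDGE, NON_HOLE_EDGE, HOLE_NON_EDGE, HOLE_EDGE = 0, 1, 2, 3

def classify_pixels(depth, color, canny_thresholds=(50, 150)):
    """Label every depth pixel by the co-existence of a hole and a color edge.

    depth: (H, W) array registered to the color image, where 0 marks a hole.
    color: (H, W, 3) uint8 color image aligned to the depth grid.
    """
    hole = (depth == 0)
    gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    edge = cv2.Canny(gray, *canny_thresholds) > 0
    labels = np.full(depth.shape, NON_HOLE_NON_EDGE, dtype=np.uint8)
    labels[~hole & edge] = NON_HOLE_EDGE   # boundary pixels: deblur with DJBF
    labels[hole & ~edge] = HOLE_NON_EDGE   # holes away from edges: fill with PDJBF
    labels[hole & edge] = HOLE_EDGE        # holes on edges: fill with DJBF
    return labels
```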
4 Experimental results
In this section, we present the results of our hole-filling method. All test images in
this paper were captured with our Kinect v2.
Fig. 4. Experimental results: (a) Resized color image (512x424); (b) Original depth image
(512x424); (c) Hole-filled depth image; (d) Resized color image (512x424); (e) Original depth
image (512x424); (f) Hole-filled depth image.
Experimental results are shown in Figs. 4 and 5. Fig. 4(a) and 4(d) are resized color
images, and Fig. 4(b) and 4(e) are original depth images. By applying our hole-filling
algorithm, the holes at the region boundaries are filled as shown in Fig. 4(c) and 4(f).
Fig. 5 shows zoomed depth images at an object boundary and a corner. As one can see in the
figures, the holes at the region boundary and the corner are filled without blurring
artifacts.
Fig. 5. Visual Comparison with zoomed images: (a) Holes at the object boundary; (b) Hole-filled
image; (c) Holes at a corner; (d) Hole-filled image.
The projective transformation matrix T, obtained as described in Section 2 using 12
correspondence points, is
\[
T =
\begin{bmatrix}
 0.9964 & -0.0033 & -0.00001 \\
-0.0197 &  0.9879 & -0.00002 \\
-8.0255 &  1.5348 &  1
\end{bmatrix}.
\qquad (4)
\]
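Once T is available, the cropped color image can be resampled onto the depth grid. A minimal
sketch using OpenCV is given below; it assumes T follows the column-vector convention of
Equation (1), i.e., it maps a depth coordinate (x, y) to a color coordinate (X, Y), and the
function and variable names are our own.

```python
import cv2
import numpy as np

def register_color_to_depth(resized_color, T, depth_size=(512, 424)):
    """Resample the cropped/resized color image onto the 512x424 depth grid.

    T: 3x3 matrix of Eq. (1) mapping depth (x, y) to color (X, Y).
    With WARP_INVERSE_MAP, warpPerspective evaluates dst(x, y) = src(T(x, y)),
    so each depth pixel looks up its corresponding color sample.
    """
    return cv2.warpPerspective(
        resized_color, np.asarray(T, dtype=np.float64), depth_size,
        flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```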
5 Conclusions
In this paper, we have alleviated the imperfections of the Kinect v2 by filling the holes
in its depth image. Before applying the hole-filling method, we calibrated the Kinect color
and depth images to correct the misalignment between the two images. Using an adaptive
directional filter kernel with an adaptive filter range gives better hole-filling results
and makes it possible to obtain high-quality depth images from the Kinect v2.
Acknowledgments. This research was supported by the MSIP (Ministry of Science, ICT and
Future Planning), Korea, under the ITRC (Information Technology Research Center) support
program (NIPA-2014-H0301-14-4007) supervised by the NIPA (National IT Industry Promotion
Agency).
References
1. Microsoft. Kinect for Windows, http://www.microsoft.com/en-us/kinectforwindows
2. El-laithy, R. A., Huang, J., Yeh, M.: Study on the use of Microsoft Kinect for robotics
applications. In: Position, Location and Navigation Symposium (PLANS), pp. 1280--1288.
IEEE (2012)
3. Le, A. V., Jung, S. W., Won, C. S.: Directional joint bilateral filter for depth images.
Sensors, 14(7), 11362--11378 (2014)
4. Jung, S. W., Choi, O.: Color image enhancement using depth and intensity measurements
of a time-of-flight depth camera. Optical Engineering, 52(10), 103104 (2013)
5. Rothwell, C. A., Forsyth, D. A., Zisserman, A., Mundy, J. L.: Extracting projective
structure from single perspective views of 3D point sets. In: Proceedings of the Fourth
International Conference on Computer Vision, pp. 573--582. IEEE (1993)
6. Golub, G., Kahan, W.: Calculating the singular values and pseudo-inverse of a matrix.
Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis,
2(2), 205--224 (1965)
7. Jung, S. W.: Enhancement of image and depth map using adaptive joint trilateral filter.
IEEE Transactions on Circuits and Systems for Video Technology, 23(2), 258--269 (2013)