Hole Filling for Kinect v2 Depth Images
Wanbin Song1, Anh Vu Le1, Seokmin Yun1, Seung-Won Jung2, and Chee Sun Won1
1 Dongguk University-Seoul, Dept. of Electronics and Electrical Engineering
2 Dongguk University-Seoul, Dept. of Multimedia Engineering
firstname.lastname@example.org (W.S.), email@example.com (A.V.L.), firstname.lastname@example.org
(S.Y.), email@example.com (S.-W. J.), firstname.lastname@example.org (C.S.W.)
Abstract. Microsoft has recently released the Kinect for Windows v2 sensor (Kinect v2) as an improved version of its original Kinect (Kinect v1). The depth and color resolutions of the Kinect v2 are approximately double those of the Kinect v1. Also, the depth sensing mechanism has been changed from structured-light IR sensing to time-of-flight (ToF) technology. Accordingly, the depth artifacts of the Kinect v1, such as random holes, have been alleviated significantly, but holes still exist around object boundaries in the Kinect v2. In this paper, we exploit the edge information from the color image to fill the holes in the depth image of the Kinect v2.
Keywords: Kinect v2; hole-filling; depth map; registration
1 Introduction

Recently, a new version of the Kinect has been released by Microsoft with improved specifications [1]. Both the color and depth images of the Kinect v2 are improved not only in resolution but also in FOV (field of view). Specifically, the color image of the Kinect v2 has FHD (Full HD) resolution with a horizontal FOV of 84.1° and a vertical FOV of 53.8°. The depth image, with a resolution of 512×424, has a horizontal FOV of 70.6° and a vertical FOV of 60°. The working range has been extended from 0.8-3.5 m to 0.5-8 m. Another major change in the Kinect v2 is the method of depth sensing. The comparative specifications of the Kinect v1 and Kinect v2 are summarized in Table 1.
Table 1. Specification comparisons between Kinect v1 and Kinect v2.

                           Kinect v1                  Kinect v2
Depth measurement          Structured light coding    ToF (Time of Flight)
Resolution (Color)         640×480 (pixels)           1920×1080 (pixels)
Resolution (Depth)         320×240 (pixels)           512×424 (pixels)
FOV (Color)                62°×48.6°                  84.1°×53.8°
FOV (Depth)                57.5°×43.5°                70.6°×60°
Maximum skeletal tracking  2                          6
Working range              0.8 m~3.5 m                0.5 m~8 m
The first version of the Kinect uses a structured light pattern for depth measurements.
However, this light coding method of depth measurement causes large holes near the
object boundaries and serious interference errors for multiple Kinects. In comparison
with the Kinect v1, different depth artifacts are observed in the Kinect v2 due to the
ToF (Time of Flight) approach of depth measurement. For example, as shown in Fig. 1, the random blobs of holes of the Kinect v1 are reduced in the Kinect v2, but holes still exist near the object boundaries. Specifically, the large occlusion holes in Fig. 1(a) are reduced to thin hole-lines on the object boundaries as in Fig. 1(b). However, as shown in the four corners of Fig. 1(b), a new type of hole appears in the Kinect v2. Finally, we note that more objects are visible in Fig. 1(b) thanks to the expanded FOV of the Kinect v2.
Fig. 1. Depth image comparison between Kinect v1 and v2: (a) Depth image of Kinect v1 (holes in white); (b) Depth image of Kinect v2 (holes in black).
The goal of this paper is to fill the holes in the Kinect v2 depth image. Among the various kinds of holes in the Kinect v2, we focus on those along object boundaries. Our approach is to exploit the actual edge position and edge direction information from the color image and to apply the directional joint bilateral filtering method [3] to the depth image. As a prerequisite, the color and depth images captured from the Kinect v2 must be precisely registered.
This paper is organized as follows. The preprocessing step of our work is described in Section 2, and the hole-filling algorithm is presented in Section 3. Experimental results are provided in Section 4, and Section 5 concludes the paper.
2 Color and Depth Registration
Before we apply the directional joint bilateral filter for hole-filling [3], the color and depth images from the Kinect v2 must be registered. The registration can be done by calibrating the two sensors with the transformation matrix between the color and depth images. One could use the IR image provided by the Kinect v2 instead of the depth image for registration. However, the IR intensity image cannot be used directly for several reasons, including saturation and intensity fall-off [4].
Since the resolution and FOV of the color and depth images differ, their coverage also differs, and the depth sensor scans only a part of the color image. Therefore, the color image must be cropped before calibration. Microsoft provides a new Kinect for Windows Software Development Kit (Kinect for Windows SDK) with the Kinect v2 that enables developers to create applications using the Kinect sensor [1]. Among its many functions for handling the Kinect, there is a coordinate mapping class named CoordinateMapper, available through the C# API. Using this function, we can crop the color image so that its coverage fits that of the depth image (see Fig. 2(c)).
However, as shown in the superimposed depth and the cropped color image in Fig. 2(d),
the registration is not good enough and we can see misalignments between the depth
boundary and the color boundary. Therefore, we need a calibration matrix between the
two images before the hole-filling.
Fig. 2. Color image resizing with the factory-provided function: (a) Original color image (1920×1080); (b) Original depth image (512×424); (c) Resized color image (512×424); (d) Superimposed edge map of the color and depth images (white lines represent the edges of the depth image).
For the calibration, we need a projective transformation that converts color image coordinates to depth image coordinates. Since the holes in the depth image have zero values, interpolation is difficult if we convert the depth image coordinates to the color image coordinates. The projective transformation that converts the depth image coordinate (x, y) into the color image coordinate (X, Y) is given as follows [5]:

X = (a1·x + a2·y + a3) / (a7·x + a8·y + 1),
Y = (a4·x + a5·y + a6) / (a7·x + a8·y + 1). (1)

Equation (1) has eight unknown parameters, and it can be rewritten as the linear equations

X = a1·x + a2·y + a3 - a7·x·X - a8·y·X, (2)
Y = a4·x + a5·y + a6 - a7·x·Y - a8·y·Y. (3)
At least four corresponding point pairs are needed to solve the equations for the eight unknown coefficients, and if more than four pairs are used, the system can be solved with a least-squares method such as the pseudo-inverse [6]. We used quadrilateral boards to find the correspondence pairs between the color and depth images manually, as in Fig. 3(a) and (b). By substituting the coordinates of the corresponding pairs into Equations (2) and (3) and solving the resulting linear simultaneous equations, the eight unknown coefficients can be determined. Then, using the completed transformation matrix, the color image can be aligned with the depth image. Comparing Fig. 2(d) with Fig. 3(c), the matching accuracy of our projective transformation has improved considerably.
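The estimation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation, assuming NumPy: each correspondence pair contributes one row of Equation (2) and one row of Equation (3) to an overdetermined linear system, which is solved in the least-squares sense.

```python
import numpy as np

def fit_projective(src, dst):
    """Estimate the parameters a1..a8 of a projective transform from
    >= 4 point correspondences, using Eqs. (2)-(3) and least squares."""
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        # Eq. (2): X = a1*x + a2*y + a3 - a7*x*X - a8*y*X
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X]); b.append(X)
        # Eq. (3): Y = a4*x + a5*y + a6 - a7*x*Y - a8*y*Y
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y]); b.append(Y)
    params, *_ = np.linalg.lstsq(np.asarray(A, float),
                                 np.asarray(b, float), rcond=None)
    a1, a2, a3, a4, a5, a6, a7, a8 = params
    return np.array([[a1, a2, a3], [a4, a5, a6], [a7, a8, 1.0]])

def apply_projective(T, pt):
    """Map a point with the homogeneous form of Eq. (1)."""
    v = T @ np.array([pt[0], pt[1], 1.0])
    return v[0] / v[2], v[1] / v[2]
```

With exact correspondences the least-squares solution recovers the transform exactly; with manually picked pairs, as in the paper, the extra equations average out the picking error.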
Fig. 3. Calibration by projective transformation (black dotted lines represent correspondence pairs): (a) Resized color image (512×424); (b) Original depth image (512×424); (c) Edge map after registration (white lines represent the edges of the depth image).
3 Edge-Based Hole-Filling Method
After the registration, we can superimpose the color image on the depth image and exploit the actual locations of the edges in the color image to determine the direction of the hole-filling. Specifically, following the directional joint bilateral filter for hole-filling [3], every pixel in the depth image is classified into four groups depending on the co-existence of edge and hole: non-hole/non-edge, non-hole/edge, hole/non-edge, and hole/edge. This classification can be done easily with the aligned color and depth images. Noise in the non-hole depth pixels is removed by the JTF (Joint Trilateral Filter) [7], and blurring artifacts at object boundaries are removed by the DJBF (Directional Joint Bilateral Filter). Also, the holes in the non-edge regions are filled by the PDJBF (Partial Directional Joint Bilateral Filter), and those in the edge regions are filled by the DJBF [3].
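To illustrate the joint bilateral idea behind this pipeline, the sketch below fills hole pixels with a plain joint bilateral filter guided by the registered color image. This is a simplification, not the authors' DJBF/PDJBF, which additionally adapt the kernel shape to the color edge direction; the parameter values `sigma_s`, `sigma_r`, and the zero-depth-as-hole convention are assumptions here.

```python
import numpy as np

def joint_bilateral_fill(depth, guide, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Fill zero-valued (hole) depth pixels: spatial weights come from
    pixel distance, range weights from the registered color (guide)
    image, so the filling does not average across color edges."""
    H, W = depth.shape
    out = depth.astype(float).copy()
    ys, xs = np.where(depth == 0)  # holes are stored as zero depth
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)]
    for y, x in zip(ys, xs):
        num = den = 0.0
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and depth[ny, nx] > 0:
                ws = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                dg = float(guide[ny, nx]) - float(guide[y, x])
                wr = np.exp(-(dg * dg) / (2.0 * sigma_r ** 2))
                num += ws * wr * depth[ny, nx]
                den += ws * wr
        if den > 0:
            out[y, x] = num / den
    return out
```

Because the range weight collapses across color edges, a hole on the near side of a boundary is filled almost entirely from near-side depths, which is what keeps the filled boundary sharp instead of blurred.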
4 Experimental results
In this section, we present the results of our hole-filling method. All test images in this paper were captured with our Kinect v2.
Fig. 4. Experimental results: (a) Resized color image (512×424); (b) Original depth image (512×424); (c) Hole-filled depth image; (d) Resized color image (512×424); (e) Original depth image (512×424); (f) Hole-filled depth image.
Experimental results are shown in Figs. 4 and 5. Figs. 4(a) and 4(d) are the resized color images, and Figs. 4(b) and 4(e) are the original depth images. By applying our hole-filling algorithm, the holes at the region boundaries are filled as in Figs. 4(c) and 4(f). Fig. 5 shows zoomed depth images at an object boundary and at a corner. As one can see in the figures, the holes at the region boundary and the corner are filled without blurring artifacts.
Fig. 5. Visual Comparison with zoomed images: (a) Holes at the object boundary; (b) Hole-filled
image; (c) Holes at a corner; (d) Hole-filled image.
The projective transformation matrix T, obtained as in Section 2 with 12 correspondence points, is

T = [  0.9964  -0.0033  -0.00001
      -0.0197   0.9879  -0.00002
      -8.0255   1.5348   1.00000 ]. (4)
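As a quick sanity check, the matrix in Equation (4) can be applied to a depth pixel. The printed layout suggests the row-vector convention [X' Y' w] = [x y 1]·T (the transpose of the column form in Equation (1)); that reading is an assumption here, chosen because it yields a near-identity mapping, as expected for two images already resized to the same 512×424 frame.

```python
import numpy as np

# Matrix from Eq. (4); row-vector convention [X' Y' w] = [x y 1] @ T assumed.
T = np.array([[ 0.9964, -0.0033, -0.00001],
              [-0.0197,  0.9879, -0.00002],
              [-8.0255,  1.5348,  1.0    ]])

def depth_to_color(x, y):
    """Map a depth-image coordinate (x, y) to a color-image coordinate."""
    Xp, Yp, w = np.array([x, y, 1.0]) @ T
    return Xp / w, Yp / w
```

For example, the depth pixel (100, 100) maps to roughly (89.9, 100.3): mostly a small horizontal shift, consistent with the misalignment visible in Fig. 2(d).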
5 Conclusion

In this paper, we have alleviated the imperfections of the Kinect v2 by filling the holes in its depth image. Before applying the hole-filling method, we calibrated the Kinect color and depth images to correct their misalignment. Using an adaptive directional filter kernel with an adaptive filter range gives better hole-filling results and makes it possible to obtain a clear, high-quality depth image from the Kinect v2.
Acknowledgments. This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-4007) supervised by the NIPA (National IT Industry Promotion Agency).
References

1. Microsoft: Kinect for Windows, http://www.microsoft.com/en-us/kinectforwindows
2. El-laithy, R. A., Huang, J., Yeh, M.: Study on the use of Microsoft Kinect for robotics applications. In: Position Location and Navigation Symposium (PLANS), pp. 1280--1288. IEEE
3. Le, A. V., Jung, S. W., Won, C. S.: Directional Joint Bilateral Filter for Depth Images.
Sensors, 14(7), 11362--11378 (2014)
4. Jung, S. W., Choi, O.: Color image enhancement using depth and intensity measurements of a time-of-flight depth camera. Optical Engineering, 52(10), 103104 (2013)
5. Rothwell, C. A., Forsyth, D. A., Zisserman, A., Mundy, J. L.: Extracting projective structure
from single perspective views of 3D point sets. In: Computer Vision, 1993. Proceedings,
Fourth International Conference on, pp. 573--582. IEEE (1993)
6. Golub, G., Kahan, W.: Calculating the singular values and pseudo-inverse of a matrix.
Journal of the Society for Industrial & Applied Mathematics, Series B: Numerical Analysis,
2(2), 205--224 (1965)
7. Jung, S. W.: Enhancement of image and depth map using adaptive joint trilateral filter.
Circuits and Systems for Video Technology, IEEE Transactions on, 23(2), 258--269 (2013)