
Ground Plane Detection Using an RGB-D Sensor

Doğan Kırcalı, F. Boray Tek

Abstract Ground plane detection is essential for the successful navigation of vision-based mobile robots. We introduce a very simple but robust ground plane detection method based on depth information obtained using an RGB-Depth sensor. We present two different variations of the method: the simplest works robustly for setups where the sensor pitch angle is fixed and there is no roll, whereas a second version can handle changes in pitch and roll angles. The comparative experiments show that our approach performs better than the vertical disparity approach. It produces acceptable and useful ground plane-obstacle segmentations for many difficult scenes, which include many obstacles, different floor surfaces, stairs, and narrow corridors.

1 Introduction

Ground plane detection and obstacle detection are essential tasks for determining passable regions for autonomous navigation. To detect the ground plane in a scene, the most common approach is to utilize depth information (i.e., a depth map). Various methods and sensors have been used to compute the depth map of the scene. The recent introduction of RGB-D (Red-Green-Blue-Depth) sensors has allowed affordable and easy computation of depth maps. Microsoft Kinect is a pioneer of such sensors; it was initially marketed as a peripheral input device for computer games. It integrates an infrared (IR) projector, an RGB camera, a monochrome IR camera, a tilt motor, and a microphone array. The device can be used to obtain a 640x480-pixel depth map and an RGB video stream at a rate of 30 fps.

Kinect uses an IR laser projector to cast a structured light pattern onto the scene. Simultaneously, an image of the scene is acquired by a monochrome CMOS camera.

Robotics and Autonomous Vehicles Laboratory, Computer Engineering Department, Işık University, 34980, Şile, İstanbul, Turkey
e-mail: (dogan, boray)@isikun.edu.tr
http://ravlab.isikun.edu.tr


The disparities between the expected and the observed patterns are used to estimate a depth value for each pixel. Kinect works quite well for indoor environments. However, the depth reading is not reliable for regions farther than about 4 meters; at the boundaries of objects, because of shadowing; on reflective or IR-absorbing surfaces; and in places illuminated directly by sunlight, which causes IR interference. Accuracy under different conditions was studied in [1, 2, 3].

Regardless of the method or the device used to obtain depth information, there are several works that approach the ground plane detection problem based on the relationship between a pixel's position and its disparity [4, 5, 6, 7, 8, 9].

Li et al. show that the vertical position (y) of a ground plane pixel is linearly related to its disparity D(y), so that one can seek a linear equation D(y) = K1 + K2*y, where K1 and K2 are constants determined by the sensor's intrinsic parameters, height, and tilt angle. However, the ground plane can also be estimated directly in image coordinates using the disparity-based plane equation D(x,y) = ax + by + c, without determining the mentioned parameters. A least squares estimation of the ground plane can be performed offline (i.e., by pre-calibration) if a ground-plane-only depth image of the scene is available [5]. Another common approach is to use the RANSAC algorithm, which allows fitting of the ground plane even if the image includes other planes [10, 11, 4]. Since RANSAC is used to estimate linear planes, the ground plane is assumed to be the dominant plane in the image.
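As a concrete illustration of this family of approaches, a least-squares fit of the disparity plane D(x,y) = ax + by + c can be sketched as follows (a minimal Python/NumPy sketch of the idea, not code from the cited works; the function name and the use of zero disparity as an invalid-pixel marker are our assumptions):

import numpy as np

def fit_disparity_plane(disparity, valid=None):
    # Least-squares fit of D(x, y) = a*x + b*y + c over valid pixels.
    h, w = disparity.shape
    ys, xs = np.mgrid[0:h, 0:w]
    if valid is None:
        valid = disparity > 0                        # zero disparity = no reading
    A = np.column_stack([xs[valid], ys[valid], np.ones(valid.sum())])
    (a, b, c), *_ = np.linalg.lstsq(A, disparity[valid], rcond=None)
    return a, b, c

A RANSAC variant would repeatedly fit this model to random pixel subsets and keep the fit with the largest consensus set, which is what makes it tolerant of images that contain planes other than the ground.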

There are other works on segmenting the scene into relevant planes [12, 11]. The work of Holz et al. clusters surface normals to segment planes and is reported to be accurate at close range [11].

In [7], row histograms of the disparity image are used to model the ground plane. In the image formed from the row histograms (named the V-disparity image), the ground plane appears as a diagonal line. This line, which is detected by the Hough Transform, was used as the ground plane model.

In this paper, we present a novel and simple algorithm to detect the ground plane without the assumption that it is the largest region. Our method is based on the fact that if a pixel belongs to the ground plane, its depth value must lie on a rationally increasing curve of its vertical position. However, the degree of this rational function is not fixed, for reasons which we explain later. Nevertheless, it can easily be estimated by an exponential curve fit, which can then be used as a ground plane model. The pixels that are consistent with the model are detected as ground plane, whereas the others are marked as obstacles. While this is our base model, suited to a fixed-viewing-angle scenario, we also provide an extension of it for dynamic environments where the sensor viewing angle changes from frame to frame. Moreover, we note the relation of our approach to the V-disparity approach [7], which relies on the linear increase of disparity and the fitting of a line to model the ground plane. Thus, we provide experiments that test and compare both approaches on the same data.

This paper is organized as follows: In Section 2 we present the proposed method. Section 3 presents the results of the experiments. Our conclusion and future work are presented in Section 4.


2 Method

2.1 Detection for fixed pitch

In a common scenario, the sensor views the ground plane at an angle (i.e., a pitch angle). The sensor's pitch angle (Figure 1(a)) causes more pixels to be allocated to the closer parts of the scene than to the farther parts, so that linear distance from the sensor is projected onto the depth map as a rational function. This is demonstrated by an example of the intensity-coded depth map obtained from Kinect (Figure 1(c)). Any column of the depth image shows that the depth value increases not linearly but exponentially from bottom to top (i.e., right to left in Figure 1(d)).

In this section we assume that the sensor is fixed and its roll angle is zero (Figure 1(b)). Under these conditions, a "ground plane only" depth image will have all columns equal to each other, and these columns can be approximated by an exponential function.

Thus, we can fit a curve to any vertical line of the depth map. We found that a good fit is possible with a sum of two exponential functions in the following form:

f(x) = a·e^(bx) + c·e^(dx)    (1)

where f(x) is the pixel's depth value and x is its vertical location (i.e., row index) in the image. The coefficients (a, b, c, d) depend on the intrinsic parameters, the pitch angle, and the height of the sensor.

These coefficients are estimated by a least squares fitting method. It is then possible to reconstruct a curve, which we call the reference ground plane curve (CR). In order to detect ground plane pixels in a new depth map, the columns of the new depth map (CU) are compared to CR. Any value under CR represents an object (or any protrusion), whereas values above the reference curve represent drop-offs or holes (e.g., intrusions, downstairs, the edge of a table) in the scene. Hence we compare the absolute difference against a pre-defined threshold value T and mark a pixel as ground plane if the difference is less than T.

For the comparison, depth values that are zero are ignored, as they indicate sensor reading errors. The experiments concerning this part are presented in Section 3.
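A minimal sketch of this fixed-pitch scheme is given below (in Python/SciPy rather than the authors' Matlab implementation; the initial guess, the column index, and the threshold value are illustrative assumptions):

import numpy as np
from scipy.optimize import curve_fit

def two_exp(x, a, b, c, d):
    # Sum-of-two-exponentials ground plane model of Eq. (1).
    return a * np.exp(b * x) + c * np.exp(d * x)

def fit_reference_curve(ground_only_depth, col=320):
    # Fit Eq. (1) to one column of a "ground plane only" calibration depth map.
    column = ground_only_depth[:, col].astype(float)
    rows = np.arange(column.size, dtype=float)
    valid = column > 0                                   # zero depth = sensor error
    p0 = (6000.0, -0.01, 1000.0, -0.001)                 # rough initial guess (assumed)
    coeffs, _ = curve_fit(two_exp, rows[valid], column[valid], p0=p0, maxfev=20000)
    return coeffs

def classify_ground(depth, coeffs, T=150.0):
    # Pixels whose depth is within T of the reference curve C_R are ground plane.
    rows = np.arange(depth.shape[0], dtype=float)
    reference = two_exp(rows, *coeffs)[:, None]          # same curve for every column
    diff = np.abs(depth.astype(float) - reference)
    return (diff < T) & (depth > 0)                      # ignore zero (error) readings

Pixels rejected by the threshold correspond either to obstacles (values under the reference curve) or to drop-offs and holes (values above it).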

2.2 Detection for changing pitch and roll

The fixed pitch angle scheme explained above is quite robust. However, it is not suitable for scenarios where the pitch and roll angles of the sensor change. Mobile robots generally exhibit such movements on the sensor platform. Pitch and roll movements can be compensated by using additional gyroscopic stabilization [13]. Here, however, we propose a computational solution. In this approach we do not calculate a reference ground curve from a reference pre-calibration image, but instead estimate it each time from the particular input frame.


[Figure 1: panels (a)-(f). Panels (c)-(e) plot depth value against vertical index; legend entries include "one column", "estimated curve", and curves for lower, default, and higher pitch angles.]

Fig. 1 (a) Roll & pitch axes, (b) sensor view: pitch causes linearly spaced points to be mapped as an exponentially increasing function, (c) an example depth map image, (d) one column (y=517) of the depth map and its fitted curve representing the ground plane, (e) ground plane curves for different pitch angles, (f) depth map in three dimensions showing the drop-offs caused by the objects.

A higher pitch angle (sensor almost parallel to the ground) increases the slope of the ground plane curve, whereas a non-zero roll angle (horizontal angular change) of the sensor forms different ground plane curves along the columns of the depth map (Figure 1(e)): at one end the depth map exhibits curves of higher pitch angles, while towards the other end it has curves of lower pitch angles. These variations complicate the use of a single reference curve for that frame.

To overcome roll angle effects, our approach rotates the depth map to make it orthogonal to the ground plane. If the sensor is orthogonal to the ground plane, it is expected to produce equal or very similar depth values along every horizontal line (i.e., row). This similarity can be captured simply by computing a histogram of the row values, such that a higher histogram peak value indicates more similar values along a row. Let h_r denote the histogram of the r-th row of a depth image D of R rows, and let D_θ denote the depth image rotated by angle θ. The best rotation angle is

θ* = argmax_θ Σ_{r=1..R} max_i h_r(i, D_θ)    (2)

Thus, for each angle value θ in a predefined set, the depth map is rotated by θ and the histogram h_r is computed for every row r. The angle θ that gives the maximum total peak histogram value (summed over rows) is then selected as the best rotation angle. This angle is used to rotate the depth map prior to the ground plane curve estimation. After the roll effect is removed, the pitch compensation curve estimation scheme can start.
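A sketch of this search over the angle set is given below (Python; the rotation routine and the histogram bin count are our choices and are not specified in the text, while the angle set matches the one used in Section 3):

import numpy as np
from scipy.ndimage import rotate

def estimate_roll_angle(depth, angles=range(-30, 31, 2), bins=64):
    # Eq. (2): pick the rotation that maximizes the summed per-row histogram peaks.
    best_angle, best_score = 0, -np.inf
    for theta in angles:
        d_rot = rotate(depth, theta, reshape=False, order=0)   # nearest-neighbour rotation
        score = 0
        for row in d_rot:
            valid = row[row > 0]                               # ignore missing readings
            if valid.size:
                hist, _ = np.histogram(valid, bins=bins)
                score += hist.max()                            # this row's peak value
        if score > best_score:
            best_score, best_angle = score, theta
    return best_angle

# The depth map is then rotated by the estimated angle before curve fitting:
# depth_corrected = rotate(depth, estimate_roll_angle(depth), reshape=False, order=0)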


As explained, changes in pitch angle create different projections and different curves (Figure 1(e)). Moreover, since the scene may contain obstacles, we must define a new approach for ground plane curve estimation.

In a scene that contains both the ground plane and objects, as in Figure 1(f), the maximum value along a particular row of the depth map must be due to the ground plane, unless an object covers the whole row. This is because objects are closer to the sensor than the ground plane surface that they occlude. Therefore, if the maximum value across each row (r) of the depth map (D) is taken, which we name the depth envelope (E), it can be used to estimate the reference ground plane curve (CR) for this particular depth frame:

E(r) = max_i D(c_i, r)    (3)

The estimation is again performed by fitting the exponential curve of Eq. (1). Prior to the curve fitting we apply median filtering to smooth the depth envelope. Moreover, depth values must increase exponentially from the bottom of the scene to the top; however, when the scene ends with a wall or a group of obstacles, this is reflected as a plateau in the depth envelope. Hence the envelope (E) is scanned from right to left, and the values after the highest peak are excluded from the fit, as they cannot be part of the ground plane.
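In code, the depth-envelope construction and the pruning before the fit might look as follows (a Python sketch; the median filter width and the initial guess are assumed values, and two_exp is the Eq. (1) model from the earlier sketch):

import numpy as np
from scipy.signal import medfilt
from scipy.optimize import curve_fit

def two_exp(x, a, b, c, d):
    return a * np.exp(b * x) + c * np.exp(d * x)       # ground plane model, Eq. (1)

def fit_envelope_curve(depth, kernel=9):
    # Eq. (3): E(r) is the row-wise maximum of the depth map.
    envelope = depth.max(axis=1).astype(float)
    envelope = medfilt(envelope, kernel_size=kernel)   # smooth the envelope
    # Rows above the highest peak (walls, drop-offs) cannot belong to the ground
    # plane, so only rows from the peak down to the bottom of the image are kept.
    peak = int(envelope.argmax())
    rows = np.arange(envelope.size, dtype=float)
    keep = (rows >= peak) & (envelope > 0)
    p0 = (6000.0, -0.01, 1000.0, -0.001)               # assumed initial guess
    coeffs, _ = curve_fit(two_exp, rows[keep], envelope[keep], p0=p0, maxfev=20000)
    return coeffs

The per-frame reference curve obtained this way is then used in the same thresholding step as in the fixed-pitch case.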

There are two conditions that adversely affect the ground plane curve fit. First, when one or more objects cover an entire row, this produces a plateau in the profile of the depth map. However, if the rows of the "entire-row-covering object or group" do not form the highest plateau in the image, the ground plane curve continues beyond it and the object does not affect the curve estimation.

Second, drop-offs exhibit higher depth values than the ground plane and cause sudden increases (hills) in the depth envelope. If a hill is present in the depth envelope, the estimated curve will have a higher fitting error.

After estimating the ground plane reference curve coefficients for the frame, every column is compared with the reference curve as in Section 2.1. The pixels are classified as ground plane and non-ground plane by comparing against a threshold T. The value of T was determined by overall accuracy.

3 Experiments

We ran our algorithm on four different multi-frame data sets that were not used in the development phase. The dimensions of the depth map and RGB images are 640x480. Two of these datasets (dataset-1 and dataset-2) were manually labeled to provide ground truth and were used to plot ROC (Receiver Operating Characteristic) curves, whereas the other two were examined manually (visually). Dataset-1 and dataset-2 are composed of 300 frames captured on a mobile robot platform moving on the laboratory floor among obstacles. Dataset-3 was created with the same platform; however, the pitch and roll angles change excessively. Dataset-4 includes 12 individual frames acquired from difficult scenes such as narrow corridors, wall-only scenes, etc.

We compare three different versions of our approach: A1, fixed pitch; A2, pitch compensated; A3, pitch and roll compensated. There is only one free parameter for A1 and A2, the threshold T, which is estimated by ROC analysis, whereas the third, roll-compensating algorithm additionally requires a predefined angle set to search for the best rotation angle: {-30°, -28°, ..., +30°}. Least squares fits were performed with the Matlab curve fitting function with default parameters. However, we excluded depth values equal to zero or above 5000, due to inaccurate sensor readings. Additionally, as explained previously, for algorithms A2 and A3 the indices positioned to the left of the maximum of the column depth values must be excluded from the fits, since they do not represent the ground plane. Finally, note that A1 requires a one-time pre-calibration to estimate the coefficients of the reference ground plane curve, whereas A2 and A3 estimate the coefficients separately for each new frame.

Moreover, we compare the results with the V-disp method [7]. We note that V-disp was originally developed for stereo depth calculation, where disparity is available before depth. To implement the V-disp method on the Kinect depth stream, we calculated disparity from the depth map (i.e., 1/D), calculated row histograms to form the V-disp image, and then ran the Hough transform to estimate the ground plane line. We had to constrain the Hough line search to the [-60°, -30°] range to obtain relevant results.
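A rough sketch of this V-disparity construction (not the exact implementation used in the experiments) is given below in Python; the bin count is an assumption, and the constrained Hough line fit is only indicated by the closing comment:

import numpy as np

def v_disparity(depth, bins=128):
    # Row-wise histograms of disparity (1/depth) form the V-disparity image.
    disparity = np.zeros(depth.shape, dtype=float)
    valid = depth > 0
    disparity[valid] = 1.0 / depth[valid]
    d_max = disparity.max()
    v_disp = np.zeros((depth.shape[0], bins))
    for r, row in enumerate(disparity):
        vals = row[row > 0]
        if vals.size:
            v_disp[r], _ = np.histogram(vals, bins=bins, range=(0.0, d_max))
    return v_disp

# The ground plane appears as a diagonal line in v_disp; in our comparison it was
# extracted with a Hough transform whose line angle was restricted to [-60°, -30°].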

Since the A2 and A3 algorithms are the same except for the roll compensation, we examine and compare the results of A2 against A1 and V-disp, and compare A3 only against A2 to show the effect of the roll compensation scheme.

Figures 2(a) and 2(b) show ROC curves and overall accuracies plotted for our fixed and pitch-compensated algorithms (A1 and A2) and the V-disp method on dataset-2. It can be seen that our pitch-compensated algorithm is superior to V-disp, which in turn is better than our fixed algorithm.

When we select the best-accuracy thresholds and run our algorithms on dataset-2, we can plot accuracy against frame number (Figure 2(c)). In addition, we record the curve fitting error for the pitch-compensated algorithm (A2). It can be seen that both methods are quite stable, the exception being the frames with high curve fitting error for A2. It is also easy to spot these frames on live data sequences.

Besides the multi-frame datasets, we include here some example single input-output pairs (Figure 3). The ground plane is marked in black and obstacles in white to ease viewing. In Figure 3(a), we observe a cluttered scene. Note that its depth map contains sensor reading errors because of the lighting and reflective patches (Figure 3(b)). The output of A2 is shown in the right column (Figure 3(c)). It can be seen that the algorithm is quite successful in the regions where depth readings exist. Although it is possible to reduce the spurious noisy detections, we show here the raw outputs.

Figures 3(d), 3(e), and 3(f) show another difficult scene, where the robot with the sensor is positioned in front of stairs. Due to the reflective marble floor, the sensor produces many zeros on the close ground plane. In addition, we observe many zeros on distant walls. However, the output is quite successful in the sense that the close ground floor and the edge of the stairs are correctly identified.


[Figure 2: plots of true vs. false ground detection rate on dataset-2 (a), total accuracy vs. threshold index (b), and per-frame accuracy together with RMSE of fit (c).]

Fig. 2 a) ROC curves comparing V-disp and our fixed and pitch-compensated algorithms (A1, A2), b) average accuracy over 300 frames vs. thresholds, c) accuracy and curve fit error of A2 for individual frames.

While dataset-1 and dataset-2 are similar, dataset-3 contains excessive roll changes, which were used to test roll compensation (A2 vs. A3). The outputs show that roll compensation is able to detect and correct rotations. Figure 3(g) shows one of the frames from dataset-3, where the sensor is rolled by almost 20°. Figures 3(h) and 3(i) show the respective outputs of A2 and A3. It can be seen that roll compensation provides a significant advantage if the sensor can roll.

Finally, Figures 3(j) and 3(k) show output pairs (overlaid on RGB) for A2 and V-disp. It can be seen that both methods can detect ground planes in scenes where the ground plane is not the largest or dominant plane. Both methods' thresholds are fixed at the values that produce the highest respective overall accuracies on datasets 1 and 2. Note that V-disp marked more non-passable regions as ground plane.

When the frames are buffered beforehand and processed offline, our pitch-compensated algorithm A2 processed 83 fps while running on a computer with a Pentium i5 480m processor using Matlab 2011a.

Additional experimental results and datasets can be found on our web site (http://ravlab.isikun.edu.tr).


Fig. 3 Experimental results from different scenes. RGB, depth map, and pitch-compensated method output (white pixels represent objects whereas black pixels represent ground plane): (a,b,c) lab environment with many objects and reflections; (d,e,f) stairs; (g,h,i) respective outputs of the pitch-compensated (A2) and pitch-and-roll-compensated (A3) methods on an image where the sensor was positioned with a roll angle; (j,k) comparison of the pitch-compensated (left) and V-disp (right) methods in a narrow corridor.

4 Conclusion

We have presented a novel and robust ground plane detection algorithm that uses depth information obtained from an RGB-D sensor. Our approach includes two different methods: the first is simple but quite robust for fixed-pitch, no-roll scenarios, whereas the second is more suitable for dynamic environments. Both algorithms are based on an exponential curve fit to model the ground plane, whose depth values vary as a rational function of the image row. We compared our method to the popular V-disp method [7], which detects a ground plane model line with the Hough transform and relies on linearly increasing disparity values.

We have shown that the proposed method is better than V-disp and produces acceptable and useful ground plane-obstacle segmentations for many difficult scenes, which included many obstacles, different surfaces, stairs, and narrow corridors.

Our method can produce erroneous detections, especially when the curve fitting is not successful. However, these situations are easy to detect by checking the RMS error of the fit, which has been shown to be highly correlated with the accuracy of the segmentation. Our future work will include an iterative refinement procedure for curve fitting on the frames that are detected to produce high RMS fitting errors.

A further point concerns non-planar ground surfaces, for which a few other studies in the literature have devised strategies [7, 6]. We assume here a planar ground model, which will probably cause problems if the floor has bumps or significant inclination or declination [7]. Our future work will also focus on these aspects.

References

1. J. Stowers, M. Hayes, and A. Bainbridge-Smith. Altitude control of a quadrotor helicopter using depth map from Microsoft Kinect sensor. In Mechatronics (ICM), 2011 IEEE International Conference on, pages 358-362, April 2011.
2. Caroline Rougier, Edouard Auvinet, Jacqueline Rousseau, Max Mignotte, and Jean Meunier. Fall detection from depth map video sequences. In ICOST'11, pages 121-128, Berlin, Heidelberg, 2011.
3. Kourosh Khoshelham and Sander Oude Elberink. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12(2), 2012.
4. F. Li, J. M. Brady, I. Reid, and H. Hu. Parallel image processing for object tracking using disparity information. In Second Asian Conference on Computer Vision (ACCV '95), pages 762-766.
5. Stephen Se and Michael Brady. Ground plane estimation, error analysis and applications. Robotics and Autonomous Systems, 39(2):59-71, 2002.
6. Qian Yu, Helder Araújo, and Hong Wang. A stereovision method for obstacle detection and tracking in non-flat urban environments. Autonomous Robots, 19(2):141-157, September 2005.
7. R. Labayrade, D. Aubert, and J. P. Tarel. Real time obstacle detection in stereovision on non flat road geometry through "v-disparity" representation. In Intelligent Vehicle Symposium, 2002, IEEE, volume 2, pages 646-651, June 2002.
8. Camillo J. Taylor and Anthony Cowley. Parsing indoor scenes using RGB-D imagery. In Robotics: Science and Systems, July 2012.
9. K. Gong and R. Green. Ground-plane detection using stereo depth values for wheelchair guidance. In Image and Vision Computing New Zealand, 2009 (IVCNZ '09), pages 97-101.
10. C. Zheng and R. Green. Feature recognition and obstacle detection for drive assistance in indoor environments. In Image and Vision Computing New Zealand, 2011 (IVCNZ '11).
11. Dirk Holz, Stefan Holzer, Radu Bogdan Rusu, and Sven Behnke. Real-time plane segmentation using RGB-D cameras. In Proceedings of the 15th RoboCup International Symposium, volume 7416, pages 307-317, Istanbul, Turkey, July 2011. Springer.
12. Can Erdogan, Manohar Paluri, and Frank Dellaert. Planar segmentation of RGBD images using fast linear fitting and Markov chain Monte Carlo. In CRV'12, pages 32-39, 2012.
13. Luke Wang, Russel Vanderhout, and Tim Shi. Computer vision detection of negative obstacles with the Microsoft Kinect. University of British Columbia, Engineering Projects Project Lab, ENPH 459, Project Conclusion Reports, 2012.