Ground Plane Detection Using an RGB-D Sensor
Doğan Kırcalı and F. Boray Tek
Robotics and Autonomous Vehicles Laboratory, Computer Engineering Department, Işık University, 34980, Şile, İstanbul, Turkey
e-mail: (dogan, boray)@isikun.edu.tr
http://ravlab.isikun.edu.tr
Abstract Ground plane detection is essential for successful navigation of vision-based mobile robots. We introduce a very simple but robust ground plane detection method based on depth information obtained using an RGB-Depth sensor. We present two different variations of the method: the simplest is robust for setups where the sensor pitch angle is fixed and there is no roll, whereas a second version can handle changes in pitch and roll angles. The comparative experiments show that our approach performs better than the vertical disparity approach and produces acceptable and useful ground plane-obstacle segmentations for many difficult scenes, which include many obstacles, different floor surfaces, stairs, and narrow corridors.
1 Introduction
Ground plane detection and obstacle detection are essential tasks in determining passable regions for autonomous navigation. The most common approach to detecting the ground plane in a scene is to utilize depth information (i.e., a depth map). Various methods and sensors have been used to compute the depth map of a scene. The recent introduction of RGB-D (Red-Green-Blue-Depth) sensors has made the computation of depth maps affordable and easy. The Microsoft Kinect is a pioneer of such sensors; it was initially marketed as a peripheral input device for computer games. It integrates an infrared (IR) projector, an RGB camera, a monochrome IR camera, a tilt motor, and a microphone array. The device provides a 640x480 pixel depth map and an RGB video stream at a rate of 30 fps.

The Kinect uses an IR laser projector to cast a structured light pattern onto the scene. Simultaneously, an image of the scene is acquired by a monochrome CMOS camera.
The disparities between the expected and the observed patterns are used to estimate a depth value for each pixel. The Kinect works quite well in indoor environments. However, the depth readings are not reliable for regions farther than about 4 meters, at object boundaries (because of shadowing), on reflective or IR-absorbing surfaces, and in places illuminated directly by sunlight, which causes IR interference. Its accuracy under different conditions was studied in [1, 2, 3].
Regardless of the method or device used to obtain depth information, several works approach the ground plane detection problem based on the relationship between a pixel's position and its disparity [4, 5, 6, 7, 8, 9].

Li et al. show that the vertical position y of a ground plane pixel is linearly related to its disparity D(y), so that one can seek a linear equation D(y) = K1 + K2 y, where K1 and K2 are constants determined by the sensor's intrinsic parameters, height, and tilt angle. However, the ground plane can also be estimated directly in image coordinates using the disparity-based plane equation D(x,y) = ax + by + c, without determining the aforementioned parameters. A least squares estimate of the ground plane can be computed offline (i.e., by pre-calibration) if a ground-plane-only depth image of the scene is available [5]. Another common approach is to use the RANSAC algorithm, which allows fitting of the ground plane even if the image includes other planes [10, 11, 4]. Since RANSAC is used to estimate linear planes, the ground plane is assumed to be the dominant plane in the image.
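For reference, the pre-calibration baseline of [5] amounts to an ordinary least squares problem. The following minimal NumPy sketch, with hypothetical variable names chosen for this example, stacks the valid pixel coordinates and solves for (a, b, c); it is an illustration of the general idea, not the authors' implementation.

```python
import numpy as np

def fit_disparity_plane(disparity):
    """Least squares fit of D(x, y) = a*x + b*y + c to a ground-plane-only
    disparity image (pre-calibration approach as in [5])."""
    ys, xs = np.nonzero(disparity > 0)                 # use only valid readings
    d = disparity[ys, xs].astype(float)
    A = np.column_stack([xs, ys, np.ones_like(xs)])    # design matrix [x, y, 1]
    (a, b, c), *_ = np.linalg.lstsq(A, d, rcond=None)  # solve the normal equations
    return a, b, c
```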
Other works segment the scene into relevant planes [12, 11]. The work of Holz et al. clusters surface normals to segment planes and is reported to be accurate at close ranges [11].

In [7], row histograms of the disparity image are used to model the ground plane. In the image formed from these row histograms (named the V-disparity image), the ground plane appears as a diagonal line. This line, detected by a Hough transform, is used as the ground plane model.
In this paper, we present a novel and simple algorithm to detect the ground plane without assuming that it is the largest region in the image. Our method is based on the fact that if a pixel belongs to the ground plane, its depth value must lie on a rationally increasing curve determined by its vertical position. However, the degree of this rational function is not fixed, for reasons we explain later. Nevertheless, it can easily be estimated by an exponential curve fit, which can then be used as a ground plane model. Pixels that are consistent with the model are detected as ground plane, whereas the others are marked as obstacles. While this base model is suitable for a fixed viewing angle scenario, we also provide an extension for dynamic environments where the sensor viewing angle changes from frame to frame. Moreover, we note the relation of our approach to the V-disparity approach [7], which relies on the linear increase of disparity and the fitting of a line to model the ground plane. We therefore provide experiments that test and compare both approaches on the same data.

This paper is organized as follows: Section 2 presents the proposed method, Section 3 presents the results of the experiments, and Section 4 presents our conclusions and future work.
2 Method
2.1 Detection for fixed pitch
In a common scenario, the sensor views the ground plane at an angle (i.e., the pitch angle, Figure 1(a)). The pitch angle causes more pixels to be allocated to the closer parts of the scene than to the farther parts, so that linear distance from the sensor is projected onto the depth map as a rational function. This is demonstrated by an example of an intensity-coded depth map obtained from the Kinect (Figure 1(c)). Any column of the depth image shows that the depth value increases not linearly but exponentially from bottom to top (i.e., right to left in Figure 1(d)).

In this section we assume that the sensor is fixed and its roll angle is zero (Figure 1(b)). In this case a "ground plane only" depth image has all columns equal to each other, and these columns can be estimated by an exponential function. Thus, we can fit a curve to any vertical line of the depth map. We found that a good fit is possible with a sum of two exponential functions of the form

$$f(x) = ae^{bx} + ce^{dx}, \qquad (1)$$

where f(x) is the pixel's depth value and x is its vertical location (i.e., row index) in the image. The coefficients (a, b, c, d) depend on the intrinsic parameters, the pitch angle, and the height of the sensor.

These coefficients are estimated by least squares fitting. It is then possible to reconstruct a curve, which we call the reference ground plane curve (C_R). To detect ground plane pixels in a new depth map, each column of the new depth map (C_U) is compared to C_R. Any value below C_R represents an object (or any protrusion), whereas values above the reference curve represent drop-offs or holes (e.g., intrusions, downstairs, the edge of a table) in the scene. Hence we compare the absolute difference against a pre-defined threshold value T and mark a pixel as ground plane if the difference is less than T. Depth values that are zero are ignored in this comparison, as they indicate sensor reading errors. The experiments concerning this part are presented in Section 3.
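To make the fixed-pitch procedure concrete, the following minimal Python/NumPy sketch fits Eq. (1) to one column of a ground-plane-only calibration depth image and thresholds the difference between a new depth map and the reference curve. The authors used Matlab's curve fitting tools; the initial guess, the threshold value, and the function names below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def ground_curve(x, a, b, c, d):
    """Sum-of-two-exponentials ground plane model, Eq. (1)."""
    return a * np.exp(b * x) + c * np.exp(d * x)

def fit_reference_curve(calib_column):
    """Fit the reference ground plane curve C_R to one column of a
    'ground plane only' calibration depth image."""
    rows = np.arange(len(calib_column), dtype=float)
    valid = calib_column > 0                            # zero depth = sensor error
    p0 = (1000.0, -0.01, 1000.0, -0.001)                # rough, assumed initial guess
    params, _ = curve_fit(ground_curve, rows[valid],
                          calib_column[valid].astype(float),
                          p0=p0, maxfev=10000)
    return ground_curve(rows, *params)                  # C_R, one value per row

def segment_ground(depth_map, c_r, t=150.0):
    """Mark pixels whose depth is within T of the reference curve as ground."""
    diff = np.abs(depth_map.astype(float) - c_r[:, None])   # compare every column to C_R
    return (diff < t) & (depth_map > 0)                  # True = ground plane, zeros ignored
```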
2.2 Detection for changing pitch and roll
The fixed pitch angle scheme explained above is quite robust. However, it is not suitable for scenarios where the pitch and roll angles of the sensor change. Mobile robots generally induce such movements on the sensor platform. Pitch and roll movements can be compensated for by additional gyroscopic stabilization [13]; here, however, we propose a computational solution. In this approach we do not calculate a reference ground plane curve from a pre-calibration image, but instead estimate it each time from the particular input frame.
Fig. 1 (a) Roll and pitch axes; (b) the sensor's pitch causes linearly spaced points to be mapped as an exponentially increasing function; (c) an example depth map image; (d) one column (y = 517) of the depth map and its fitted curve representing the ground plane; (e) ground plane curves for different pitch angles; (f) the depth map in three dimensions, showing the drop-offs caused by the objects.
A higher pitch angle (sensor almost parallel to the ground) increases the slope of the ground plane curve, whereas a non-zero roll angle (horizontal angular change) of the sensor produces different ground plane curves along the columns of the depth map (Figure 1(e)): at one end the depth map exhibits curves of higher pitch angles, while towards the other end it has curves of lower pitch angles. These variations prevent the use of a single reference curve for the frame.

To overcome roll angle effects, our approach rotates the depth map so that it is orthogonal to the ground plane. If the sensor is orthogonal to the ground plane, it is expected to produce equal or very similar depth values along every horizontal line (i.e., row). This similarity can be captured simply by computing a histogram of the row values, such that a higher histogram peak indicates more similar values along a row. Let h_r denote the histogram of the r-th row of a depth image D with R rows, and let D_θ denote the depth image rotated by angle θ. The best rotation angle is then

$$\hat{\theta} = \arg\max_{\theta} \sum_{r=1}^{R} \max_{i} h_r(i, D_{\theta}). \qquad (2)$$

Thus, for each angle θ in a predefined set, the depth map is rotated by θ and the histogram h_r is computed for every row r. The angle θ that gives the maximum total peak histogram value (summed over rows) is selected as the best rotation angle. This angle is used to rotate the depth map prior to ground plane curve estimation. After the roll effect is removed, the pitch-compensating curve estimation scheme can start.
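A compact sketch of this search is given below, under the assumption that scipy.ndimage.rotate is an acceptable stand-in for the rotation routine; the angle set and histogram bin count are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_roll_angle(depth_map, angles=range(-30, 31, 2), bins=64):
    """Estimate the roll angle as the rotation maximizing the sum of
    per-row histogram peaks, Eq. (2)."""
    best_angle, best_score = 0, -np.inf
    for theta in angles:
        rotated = rotate(depth_map, theta, reshape=False, order=0)  # D_theta
        score = 0.0
        for row in rotated:
            valid = row[row > 0]                 # skip invalid (zero) readings
            if valid.size == 0:
                continue
            hist, _ = np.histogram(valid, bins=bins)
            score += hist.max()                  # peak value of this row's histogram
        if score > best_score:
            best_score, best_angle = score, theta
    return best_angle

# Usage sketch: roll-correct a frame before the curve estimation step.
# depth_corrected = rotate(depth, estimate_roll_angle(depth), reshape=False, order=0)
```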
As explained, changes in pitch angle create different projections and different curves (Figure 1(e)). Moreover, since the scene may contain obstacles, we must define a new approach for the ground plane curve estimation.

In a scene that contains both the ground plane and objects, as in Figure 1(f), the maximum value along a particular row of the depth map must be due to the ground plane, unless an object covers the whole row. This is because objects are closer to the sensor than the ground plane surface that they occlude. Therefore, taking the maximum value across each row r of the depth map D yields what we call the depth envelope E, which can be used to estimate the reference ground plane curve (C_R) for this particular depth frame:

$$E(r) = \max_{i} D(c_i, r). \qquad (3)$$

The estimation is again performed by fitting the exponential curve of Eq. (1). Prior to the curve fitting we apply median filtering to smooth the depth envelope. Moreover, depth values must increase exponentially from the bottom of the scene to the top; when the scene ends with a wall or a group of obstacles, this is reflected as a plateau in the depth envelope. Hence the envelope E is scanned from right to left, and the values after the highest peak are excluded from the fit, as they cannot be part of the ground plane.
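A brief NumPy sketch of the envelope extraction described above might read as follows; the median filter width is an assumed value, and the row ordering (image bottom at the last row index) follows the usual image convention.

```python
import numpy as np
from scipy.signal import medfilt

def depth_envelope(depth_map, kernel=9):
    """Per-row maximum depth, Eq. (3), smoothed with a median filter."""
    env = depth_map.max(axis=1).astype(float)   # E(r) = max over the columns of row r
    return medfilt(env, kernel_size=kernel)

def fit_rows(envelope):
    """Rows usable for the curve fit: everything from the highest peak of the
    envelope down to the bottom of the image; rows beyond the peak are excluded."""
    peak = int(np.argmax(envelope))             # highest peak / plateau of E
    return np.arange(peak, len(envelope))       # keep rows from the peak downwards
```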
Two conditions adversely affect the ground plane curve fit. First, when one or more objects cover an entire row, this produces a plateau in the profile of the depth map. However, if the rows of the "entire-row-covering object or group" do not form the highest plateau in the image, the ground plane continues beyond it and the object does not affect the curve estimation. Second, drop-offs exhibit higher depth values than the ground plane and cause sudden increases (hills) in the depth envelope; if a hill is present in the depth envelope, the estimated curve is produced with a higher fitting error.

After estimating the ground plane reference curve coefficients for the frame, every column is compared with the reference curve as in Section 2.1. Pixels are classified as ground plane or non-ground plane by comparing against a threshold T, whose value was determined by overall accuracy.
3 Experiments
We ran our algorithms on four different multi-frame datasets that were not used in the development phase. The dimensions of the depth maps and RGB images are 640x480. Two of these datasets (dataset-1 and dataset-2) were manually labeled to provide ground truth and were used to plot ROC (Receiver Operating Characteristic) curves, whereas the other two were examined manually (visually). Dataset-1 and dataset-2 are composed of 300 frames captured on a mobile robot platform moving on the laboratory floor among obstacles. Dataset-3 was created with the same platform; however, the pitch and roll angles change excessively. Dataset-4 includes 12 individual frames acquired from difficult scenes such as narrow corridors, wall-only scenes, etc.
We compare three different versions of our approach: A1, fixed pitch; A2, pitch compensated; A3, pitch and roll compensated. A1 and A2 have only one free parameter, the threshold T, which is estimated by ROC analysis, whereas the third, roll-compensating algorithm additionally requires a predefined angle set over which to search for the best rotation angle: {−30, −28, ..., +30}. The least squares fit was performed with the Matlab curve fitting function using default parameters. However, we excluded depth values equal to zero or above 5000, which correspond to inaccurate sensor readings. Additionally, as explained previously, for algorithms A2 and A3 the indices positioned to the left of the maximum of the column depth value must be excluded from the fit, since they do not represent the ground plane. Finally, note that A1 requires a one-time pre-calibration to estimate the coefficients of the reference ground plane curve, whereas A2 and A3 estimate the coefficients separately for each new frame.
Moreover, we compare the results with the V-disp method [7]. We note that V-disp was originally developed for stereo depth computation, where disparity is available before depth. To apply the V-disp method to the Kinect depth stream, we computed disparity from the depth map (i.e., 1/D), computed row histograms to form the V-disp image, and then ran a Hough transform to estimate the ground plane line. We had to constrain the Hough line search to the [60, 30] range to obtain relevant results.
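The sketch below illustrates this V-disparity baseline as we read it from [7] and the description above; the disparity scaling, the number of histogram bins, the binarization threshold, and the Hough parameters are all assumptions made for the example, not values reported by the authors.

```python
import numpy as np
import cv2

def v_disparity_ground_line(depth_map, bins=128):
    """Build a V-disparity image (one disparity histogram per row) and return
    the dominant Hough line, which models the ground plane as in [7]."""
    disparity = np.zeros_like(depth_map, dtype=float)
    valid = depth_map > 0
    disparity[valid] = 1.0 / depth_map[valid]            # disparity ~ 1 / depth
    d_max = disparity.max() if disparity.max() > 0 else 1.0
    quantized = np.clip((disparity / d_max * (bins - 1)).astype(int), 0, bins - 1)

    v_disp = np.zeros((depth_map.shape[0], bins), dtype=np.float32)
    for r in range(depth_map.shape[0]):                  # row-wise disparity histogram
        row = quantized[r][valid[r]]
        if row.size:
            v_disp[r] = np.bincount(row, minlength=bins)

    # Scale to 8 bits, binarize, and search for the strongest line.
    v_disp_u8 = cv2.convertScaleAbs(v_disp, alpha=255.0 / max(v_disp.max(), 1.0))
    _, binary = cv2.threshold(v_disp_u8, 20, 255, cv2.THRESH_BINARY)
    lines = cv2.HoughLines(binary, 1, np.pi / 180, threshold=60)
    return lines[0] if lines is not None else None       # (rho, theta) of the best line
```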
Since algorithms A2 and A3 are identical except for the roll compensation, we examine and compare the results of A2 against A1 and V-disp, and compare A3 only against A2 to show the effect of the roll compensation scheme.
Figures 2(a) and 2(b) show the ROC curves and overall accuracies of our fixed and pitch compensated algorithms (A1 and A2) and the V-disp method on dataset-2. It can be seen that our pitch compensated algorithm is superior to V-disp, which in turn is better than our fixed algorithm.

Selecting the thresholds at the best accuracy points and running the algorithms on dataset-2 gives the per-frame accuracies shown in Figure 2(c). In addition, we record the curve fitting error of the pitch compensated algorithm (A2). Both methods are quite stable, the exception being the frames with high curve fitting error for A2; such frames are also easy to spot in live data sequences.
Besides the multi-frame datasets, we include some example single input-output pairs (Figure 3), where the ground plane is marked in black and obstacles in white for ease of viewing. Figure 3(a) shows a cluttered scene whose depth map contains sensor reading errors caused by the lighting and reflective patches (Figure 3(b)). The output of A2 is shown in the right column (Figure 3(c)). The algorithm is quite successful in the regions where depth readings exist. Although it is possible to reduce the spurious noisy detections, we show the raw outputs here.

Figures 3(d), 3(e), and 3(f) show another difficult scene in which the robot with the sensor is positioned in front of stairs. Due to the reflective marble floor, the sensor produces many zeros on the close ground plane, and we also observe many zeros on the distant walls. Nevertheless, the output is quite successful in the sense that the close ground floor and the edge of the stairs are correctly identified.
Fig. 2 (a) ROC curves comparing V-disp and our fixed and pitch compensated algorithms (A1, A2) on dataset-2; (b) average accuracy over 300 frames vs. threshold; (c) accuracy and curve fit error of A2 for individual frames.
Although dataset-1 and dataset-2 are similar, dataset-3 contains excessive roll changes and was used to test the roll compensation (A2 vs. A3). The outputs show that the roll compensation is able to detect and correct rotations. Figure 3(g) shows one of the frames from dataset-3, where the sensor is rolled by almost 20 degrees; Figures 3(h) and 3(i) show the respective outputs of A2 and A3. It can be seen that roll compensation provides a significant advantage if the sensor can roll.

Finally, Figures 3(j) and 3(k) show output pairs (overlaid on RGB) for A2 and V-disp. Both methods can detect the ground plane in scenes where it is not the largest or dominant plane. The thresholds of both methods are fixed at the values producing the highest respective overall accuracies on datasets 1 and 2. Note that V-disp marked more non-passable regions as ground plane.

When frames are buffered beforehand and processed offline, our pitch compensated algorithm A2 processes 83 fps on a computer with a Pentium i5 480m processor running Matlab 2011a.
Additional experimental results and datasets can be found on our web site: http://ravlab.isikun.edu.tr
Fig. 3 Experimental results from different scenes: RGB image, depth map, and pitch compensated method output (white pixels represent objects, black pixels the ground plane). (a,b,c) Lab environment with many objects and reflections; (d,e,f) stairs; (g,h,i) respective outputs of the pitch compensated (A2) and pitch-and-roll compensated (A3) methods on an image where the sensor was positioned with a roll angle; (j,k) comparison of the pitch compensated (left) and V-disp (right) methods in a narrow corridor.
4 Conclusion
We have presented a novel and robust ground plane detection algorithm that uses depth information obtained from an RGB-D sensor. Our approach includes two different methods: the first is simple but quite robust for fixed-pitch, no-roll scenarios, whereas the second is more suitable for dynamic environments. Both algorithms are based on an exponential curve fit to model the ground plane, whose depth values follow a rational curve along the image rows. We compared our method to the popular V-disp method [7], which detects a ground plane model line by a Hough transform and relies on linearly increasing disparity values.

We have shown that the proposed method is better than V-disp and produces acceptable and useful ground plane-obstacle segmentations for many difficult scenes, which include many obstacles, different surfaces, stairs, and narrow corridors.

Our method can produce erroneous detections, especially when the curve fitting is not successful. However, these situations are easy to detect by checking the RMS error of the fit, which has been shown to be highly correlated with the accuracy of the segmentation. Our future work will include an iterative refinement procedure for curve fitting on frames detected to have high RMS fitting errors.

A further point concerns non-planar ground surfaces, for which a few other studies in the literature have devised strategies [7, 6]. We assume here a planar ground model, which will probably cause problems if the floor has bumps or significant inclination or declination [7]. Our future work will also focus on these aspects.
References
1. J. Stowers, M. Hayes, and A. Bainbridge-Smith. Altitude control of a quadrotor helicopter using depth map from Microsoft Kinect sensor. In Mechatronics (ICM), 2011 IEEE International Conference on, pages 358–362, April 2011.
2. Caroline Rougier, Edouard Auvinet, Jacqueline Rousseau, Max Mignotte, and Jean Meunier. Fall detection from depth map video sequences. In ICOST'11, pages 121–128, Berlin, Heidelberg, 2011.
3. Kourosh Khoshelham and Sander Oude Elberink. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12(2), 2012.
4. F. Li, J. M. Brady, I. Reid, and H. Hu. Parallel image processing for object tracking using disparity information. In Second Asian Conference on Computer Vision (ACCV '95), pages 762–766.
5. Stephen Se and Michael Brady. Ground plane estimation, error analysis and applications. Robotics and Autonomous Systems, 39(2):59–71, 2002.
6. Qian Yu, Helder Araújo, and Hong Wang. A stereovision method for obstacle detection and tracking in non-flat urban environments. Autonomous Robots, 19(2):141–157, September 2005.
7. R. Labayrade, D. Aubert, and J. P. Tarel. Real time obstacle detection in stereovision on non flat road geometry through "v-disparity" representation. In Intelligent Vehicle Symposium, 2002, IEEE, volume 2, pages 646–651, June 2002.
8. Camillo J. Taylor and Anthony Cowley. Parsing indoor scenes using RGB-D imagery. In Robotics: Science and Systems, July 2012.
9. K. Gong and R. Green. Ground-plane detection using stereo depth values for wheelchair guidance. In Image and Vision Computing New Zealand (IVCNZ '09), pages 97–101, 2009.
10. C. Zheng and R. Green. Feature recognition and obstacle detection for drive assistance in indoor environments. In Image and Vision Computing New Zealand (IVCNZ '11), 2011.
11. Dirk Holz, Stefan Holzer, Radu Bogdan Rusu, and Sven Behnke. Real-time plane segmentation using RGB-D cameras. In Proceedings of the 15th RoboCup International Symposium, volume 7416, pages 307–317, Istanbul, Turkey, July 2011. Springer.
12. Can Erdogan, Manohar Paluri, and Frank Dellaert. Planar segmentation of RGBD images using fast linear fitting and Markov chain Monte Carlo. In CRV'12, pages 32–39, 2012.
13. Luke Wang, Russel Vanderhout, and Tim Shi. Computer vision detection of negative obstacles with the Microsoft Kinect. University of British Columbia, Engineering Projects Project Lab, ENPH 459, Project Conclusion Reports, 2012.