
Robust road detection from a single image using road shape prior


Zhen He, Tao Wu, Zhipeng Xiao, Hangen He
College of Mechatronics and Automation
National University of Defense Technology
Changsha, Hunan, P. R. China
ABSTRACT

Many road detection algorithms require pre-learned information, which may be unreliable as the road scene is usually unpredictable. Single-image-based (i.e., without any pre-learned information) road detection techniques can be adopted to overcome this problem, but their robustness needs improvement. To achieve robust road detection from a single image, this paper proposes a general road shape prior that enforces the detected region to be road-shaped by encoding the prior into a graph-cut segmentation framework, where the training data is automatically generated from a predicted road region of the current image. By iteratively performing the graph-cut segmentation, an accurate road region is obtained. Quantitative and qualitative experiments on the challenging SUN Database validate the robustness and efficiency of our method. We believe that the road shape prior can also be used to yield improvements for many other road detection algorithms.
Index Terms: Road detection, shape prior, graph cuts
1. INTRODUCTION

Vision-based road detection is of high relevance for autonomous driving and pedestrian detection. Detecting the road from an on-board road image is very challenging due to the diversity of road shapes (e.g., straight, left/right curve, and winding), road materials (e.g., concrete, soil, and asphalt), and backgrounds (e.g., street, forest, and mountain), and the noise induced by varying illumination, different weather, object occlusion, etc., as shown in Fig. 1. With information learned off-line from specific road scenes, many proposed segmentation algorithms [1, 2, 3, 4] perform well for the same road scenes but poorly for others. However, off-line learning requires manually labeled training data, which is time-consuming, and the road scene is usually unknown, especially on moving vehicles. Hence, road detection from a single image (i.e., without any pre-learned information) is needed to overcome these problems, even though it is more challenging.
There have been some works on single-image-based road detection. In [5], Kong et al. use a vanishing-point-constrained edge detection technique to acquire straight road boundaries. This method may be inapplicable when there is no legible vanishing point (e.g., on an ascent), or when the road boundaries are curved. To deal with complex road shapes, Alvarez et al. [6] use a likelihood-based region-growing method to label pixels. It is essentially a local method and not robust to noise, which may cause overgrowth or undergrowth. More recently, in [7], general information learned off-line from other road scenes is combined with information learned on-line from the current image to enhance detection robustness. However, in its on-line learning procedure, the training data is taken from a fixed region (i.e., the central-bottom of the image), so the road patterns may not be well learned.
Fig. 1. Examples of different roads with: (a) different shapes, (b) different materials, (c) different backgrounds, (d) different illumination, (e) different weather, and (f) object occlusion.
Aiming to utilize prior knowledge to improve the robustness of single-image-based road detection, we propose a general road shape prior that enforces the detected road region to be road-shaped by encoding this prior into a graph-cut segmentation framework, where the labels are automatically generated from the current image to model the road. The final detected road region is acquired by iteratively performing the graph cuts: the road region detected in the previous iteration becomes the current predicted region, which is used for prior encoding and label generation, and is initialized as a semicircle at the central-bottom of the image.
The road shape prior is based on the observation that in on-board road images, the varied road regions share some shape constraints that make them look like roads, and thus the detected road regions should also satisfy these constraints. Unlike previous methods using limited templates that can hardly cover varied road shapes and require matching, our approach benefits from a general road shape prior that is more natural and can be encoded in a unified graph-cut framework; unlike previous on-line learning methods that generate labels from the current image in one shot, it progressively generates a better batch of labels automatically via iterative graph cuts and thus builds a better model.
An outline of the paper follows. In Sec. 2 we demonstrate how the basic graph-cut segmentation can be used for road detection, in Sec. 3 we explain how to incorporate the road shape prior into the graph-cut segmentation framework, and in Sec. 4 we detail the iterative graph cuts. Experimental results and discussion are presented in Sec. 5, and conclusions are given in Sec. 6.
978-1-4799-2341-0/13/$31.00 ©2013 IEEE (ICIP 2013)

2. GRAPH-CUT SEGMENTATION FOR ROAD DETECTION

Road detection can be formulated as a binary labeling problem that assigns a label (road or background) to each pixel. A popular approach to such problems is the efficient graph-cut segmentation [8].
Let y_i ∈ {0 (background), 1 (road)} be the label of pixel i of the image, P be the set of all pixels, and N be the set of all neighboring pixel pairs. Then y = {y_i | i ∈ P} is the collection of all label assignments and defines a segmentation. The basic graph cut optimizes y by minimizing the energy function:

E(y) = Σ_{i∈P} D_i(y_i) + λ Σ_{(i,j)∈N} V_{i,j}(y_i, y_j)    (1)

where D_i is the data term penalizing assignments that do not fit pixel i, and V_{i,j} is the smoothness term penalizing different assignments for neighboring pixels i and j. Let I_i be the feature vector of pixel i. Then D_i is typically defined as the negative log-likelihood of a label y_i being assigned to pixel i and takes the form:

D_i(1) = −ln Pr(I_i | R),    D_i(0) = −ln Pr(I_i | B)    (2)

where Pr(I_i | R) is the road model and Pr(I_i | B) is the background model. The smoothness term V_{i,j} follows the conventional form:

V_{i,j}(y_i, y_j) = [y_i ≠ y_j] · exp(−‖I_i − I_j‖² / (2β)) / dist(i, j)    (3)

where [·] is 1 if y_i ≠ y_j and 0 otherwise, β denotes the mean squared difference between feature vectors over all neighboring pixels, and dist(i, j) is the Euclidean distance between pixels i and j. The parameter λ > 0 in Eq. (1) weights the importance between D_i and V_{i,j}.
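To make Eqs. (1)–(3) concrete, the following sketch evaluates the energy of a labeling on a 1-D strip of pixels with scalar features. The function names, toy likelihoods, and 1-D neighborhood are our own illustrative assumptions, not the paper's implementation (which operates on 2-D images and minimizes the energy via max-flow [8]).

```python
import math

def data_term(yi, pr_road, pr_bg):
    # Eq. (2): negative log-likelihood of assigning label yi to the pixel
    return -math.log(pr_road) if yi == 1 else -math.log(pr_bg)

def smooth_term(yi, yj, fi, fj, beta, dist=1.0):
    # Eq. (3): contrast-sensitive penalty, nonzero only for differing labels
    if yi == yj:
        return 0.0
    return math.exp(-((fi - fj) ** 2) / (2.0 * beta)) / dist

def total_energy(labels, feats, pr_road, pr_bg, lam=1.0):
    # Eq. (1): data terms over all pixels plus weighted smoothness terms
    # over neighboring pairs; beta is the mean squared feature difference.
    n = len(labels)
    beta = max(sum((feats[i] - feats[i + 1]) ** 2 for i in range(n - 1))
               / max(n - 1, 1), 1e-9)
    energy = sum(data_term(labels[i], pr_road[i], pr_bg[i]) for i in range(n))
    energy += lam * sum(smooth_term(labels[i], labels[i + 1],
                                    feats[i], feats[i + 1], beta)
                        for i in range(n - 1))
    return energy
```

A labeling that agrees with the per-pixel likelihoods receives a lower energy than its flipped counterpart, which is what the minimization exploits.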
Compared with other local methods, the basic graph cut mainly benefits from V_{i,j}, which encodes neighboring consistency, and from its efficient global optimization. However, it is still a bottom-up approach and unable to capture the rich properties of the road region. Thus, we propose a general road shape prior and incorporate it into the basic graph-cut segmentation framework, as shown in Sec. 3.
3. ROAD SHAPE PRIOR

As there is a wide variety of road shapes in on-board road images (see Fig. 1), it is hard to describe them with limited templates. However, as shown in Fig. 2, for a segmented image we can easily tell whether the foreground (white region) looks like a road. Hence, there may be some shape constraints that make a region look like a road, which we call the road shape prior.
Fig. 2. Examples of some segmented images. Obviously, the white regions in (a), (d), (f), and (h) look more like roads than those in (b), (c), (e), and (g).
Here we use the shrinking constraint and consistency constraint
to describe such a prior. The former is based on the observation that
along the road axis (i.e. the centerline of the road region), the road
region is perspectively shrinking from the near to the distant, while
the latter suggests that the region between both sides of the road
should consistently belong to the road, and vice versa.
Consider Fig. 3, and let l be the centerline of the white region R, with pixel i an arbitrary point. Horizontally translate l to l′ and draw a horizontal line h, both passing through the center of i. Let p be an arbitrary pixel on l′ below i, and q an arbitrary pixel on h between l and l′ (or on l′). The shrinking constraint requires that ∀i, i ∈ R ⇒ p ∈ R, while the consistency constraint requires that ∀i, (i ∈ R) ∧ (i ∉ l) ⇒ q ∈ R. Thus, in Fig. 3, the shrinking constraint is satisfied in (a) and (b), the consistency constraint is satisfied in (a) and (c), whereas (d) satisfies neither constraint.
Our criterion is that a region is road-shaped if and only if it
satisfies both of the constraints. This criterion is consistent with our
Fig. 3. Illustration of the shrinking constraint and the consistency constraint. In (a) and (b), ∀i, i ∈ R ⇒ p ∈ R, so the shrinking constraint is satisfied; in (a) and (c), ∀i, (i ∈ R) ∧ (i ∉ l) ⇒ q ∈ R, so the consistency constraint is satisfied. In (d), ∃i, (i ∈ R) ∧ (p ∉ R), and ∃i, (i ∈ R) ∧ (i ∉ l) ∧ (q ∉ R), so neither constraint is satisfied.
Table 1. Comparison between our criterion and the intuition. The road-shaped regions in Fig. 2 are also road-like, and vice versa.

Images in Fig. 2    (a) (b) (c) (d) (e) (f) (g) (h)
Shrinking const.     ✓   ✓   ✗   ✓   ✗   ✓   ✗   ✓
Consistency const.   ✓   ✗   ✓   ✓   ✓   ✓   ✗   ✓
Road-shaped          ✓   ✗   ✗   ✓   ✗   ✓   ✗   ✓
Road-like            ✓   ✗   ✗   ✓   ✗   ✓   ✗   ✓
intuition (e.g., see Table 1). To improve the detection robustness, we enforce the detected road region to be road-shaped by encoding both constraints into the graph-cut framework. Inspired by [9], we define a shrinking constraint term S_{i,p} and a consistency constraint term C_{i,q} as follows:

S_{i,p}(y_i, y_p) = { ∞, if y_i = 1 and y_p = 0;  0, otherwise }    (4)

C_{i,q}(y_i, y_q) = { ∞, if y_i = 1 and y_q = 0;  0, otherwise }

Note that S_{i,p} is defined on pixel pairs (i, p), while C_{i,q} is defined on pixel pairs (i, q) (see Fig. 3). Both terms penalize assignments that violate the corresponding constraint with infinite cost. In fact, it suffices to place S_{i,p} and C_{i,q} only between neighboring pixels. For S_{i,p}, if every pixel pair (i, p) satisfies the shrinking constraint, then of course every neighboring (i, p) also satisfies it; and if some pixel pair does not satisfy the shrinking constraint, then there must be at least one neighboring (i, p) that does not satisfy it. The same holds for C_{i,q}.
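To make the two constraints concrete, here is a small checker for the simplified case of a vertical road axis, where the shifted axis l′ through pixel i is just i's image column. The function name and grid encoding are our own; the paper's general case of an arbitrary centerline (handled via the 8-neighborhood) is not covered by this sketch.

```python
def is_road_shaped(mask, axis_col):
    """Check the shrinking and consistency constraints on a binary mask.

    mask[r][c] is truthy for road pixels; row 0 is the most distant row.
    Assumes a vertical road axis at column `axis_col` (a simplification
    of the paper's arbitrary centerline).
    """
    rows, cols = len(mask), len(mask[0])
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Shrinking: the neighbor below i on the shifted axis l'
            # (here, the same column) must also be road.
            if r + 1 < rows and not mask[r + 1][c]:
                return False
            # Consistency: the neighbor between i and the axis on the
            # horizontal line h through i must also be road.
            if c != axis_col:
                step = 1 if c < axis_col else -1
                if not mask[r][c + step]:
                    return False
    return True
```

As argued in the text, checking only neighboring pairs is sufficient: any violating pair implies a violating neighboring pair.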
We use a standard 8-neighborhood system (see Fig. 4(a)) for generality. For every pixel i that is not on l, it is easy to find a neighboring pixel q whose center is exactly passed through by h. For a pixel i, however, l′ may not pass exactly through the center of any of i's neighbors. In this case, the neighboring pixel p is chosen as the one lying closest to l′ (see Fig. 4(b)).
Fig. 4. Illustration of choosing a pixel p neighboring i in a standard 8-neighborhood system. (a) A standard 8-neighborhood system, where 8 neighbors are connected to each pixel i. (b) A neighboring pixel of i (i.e., p4) is chosen because it lies closest to l′ among all possible neighbors (i.e., p1–p5, which are not above i).
With the constraint terms S_{i,p} and C_{i,q}, E(y) becomes:

E(y) = Σ_{i∈P} D_i(y_i) + λ Σ_{(i,j)∈N} V_{i,j}(y_i, y_j) + Σ_{(i,p)} S_{i,p}(y_i, y_p) + Σ_{(i,q)} C_{i,q}(y_i, y_q)    (5)
As the road region is less changeable than the background and the road features are more self-similar, we only build the road model Pr(I_i | R), and redefine the data term D_i in Eq. (5) as:

D_i(y_i) = { 1 − y_i, if Pr(I_i | R) ≥ γ;  y_i, otherwise }    (6)

where the threshold γ = γ0 · max_i Pr(I_i | R), and γ0 ∈ [0, 1] is a fixed value. D_i penalizes assignments that do not fit the data. In Eq. (5), V_{i,j}(y_i, y_j) takes the same form as Eq. (3), and we fix the weighting parameter λ to 1. According to [10], E is submodular since every term in Eq. (5) is submodular. As E is also a quadratic pseudo-Boolean function, it can be exactly minimized via graph cuts.
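Eq. (6) transcribes directly into code. The sketch below (variable and function names are ours) returns the pair (D_i(0), D_i(1)) for each pixel given its road likelihood:

```python
def data_terms(road_likelihood, gamma0=0.1):
    """Per-pixel data terms (D_i(0), D_i(1)) following Eq. (6).

    road_likelihood: list of Pr(I_i | R) values.
    gamma0: fixed fraction of the maximum likelihood (0.1 matches the
    paper's default setting).
    """
    gamma = gamma0 * max(road_likelihood)
    terms = []
    for p in road_likelihood:
        if p >= gamma:
            terms.append((1.0, 0.0))  # road label (y_i = 1) costs nothing
        else:
            terms.append((0.0, 1.0))  # background label (y_i = 0) costs nothing
    return terms
```

Because each pair contributes a unary cost of at most 1, the relative weight of the shape-prior terms (infinite cost) always dominates, as Eq. (5) requires.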
Note that both the shrinking and consistency constraints depend
on the road axis, which is determined by the road region, and the
road labels should also be generated from the road region. However,
the road region is unknown before detection. Thus, as detailed in
Sec. 4, we first initialize a predicted road region, and then iteratively
use the graph cuts to update the predicted region until it converges.
4. ITERATIVE GRAPH CUTS

Our iterative graph cuts algorithm is illustrated in Fig. 5. For an arbitrary test road image (a), we compute the illuminant invariant (b), ranging from 0 to 255, as its feature, as described in [6].
As the real road region is unknown, we first initialize a road region Rp (see (d), a semicircle at the central-bottom of the image). To acquire a reliable labeling region L (see (c)), we leave a margin of ½(√S_Rp − √(S_Rp/2)) (S_R denotes the area of R) to the background region Bp. Then, a road model (f) is built with the training data uniformly labeled on L, and the likelihood image (see (e), which is normalized) of (b) can be acquired using (f). Next, we obtain the data terms (see (i)) according to Eq. (6); the smaller the energy, the more likely the road class. The smoothness terms (see (h)), meanwhile, are computed directly from (b) using Eq. (3).
On the other hand, we get the road axis from (d) and extend it to the image border (see (g)). Then the road shape prior, i.e., the shrinking constraint (j) and the consistency constraint (k), can be added.
Finally, we minimize E(y) in Eq. (5) via graph cuts. If the segmented road region Rd (see (n)) converges, i.e., Rd satisfies ε(Rd) < ε0, where ε(·) is the convergence error and ε0 is a constant, the final detected result (m) is obtained. Otherwise, Rd becomes Rp for the next cycle. Here we set ε0 to 10⁻³, and define ε(Rd) = (S(Rd ∪ Rp) − S(Rd ∩ Rp)) / S(Rp ∪ Bp).
An example of the iterative graph cuts is shown in Fig. 6. Through iteration, a better batch of road labels is progressively generated, so a better road model can be built. Moreover, the iteration favors our road shape prior, which benefits robust road detection.
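The convergence test above amounts to the normalized symmetric difference between consecutive road masks, since S(Rd ∪ Rp) − S(Rd ∩ Rp) counts exactly the pixels on which the two regions disagree and S(Rp ∪ Bp) is the whole image. A minimal sketch (the mask encoding and names are ours):

```python
def convergence_error(r_detected, r_predicted):
    """Convergence error from Sec. 4: the area of the symmetric
    difference between the detected and predicted road regions,
    normalized by the whole image area S(Rp ∪ Bp).

    Both masks are flat lists of 0/1 values over the same image.
    """
    sym_diff = sum(1 for a, b in zip(r_detected, r_predicted) if a != b)
    return sym_diff / len(r_detected)

# The iteration stops once convergence_error(...) < 1e-3,
# the paper's choice of eps_0.
```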
5. EXPERIMENTS

To validate our method, three different experiments are conducted on the SUN Database [11], where 500 of the 1868 still on-board road images are randomly selected as the test set and manually labeled as ground truth (no training set is needed in our algorithm). All images are resized to 200 × 200 for testing. The parameter γ0 of Eq. (6) is fixed to 0.1 by default. We use the gco-v3.0 software [12, 13, 10] to implement the graph-cut optimization.

Fig. 5. Illustration of our iterative graph cuts for road detection.

Fig. 6. An example of the iterative graph cuts when γ0 (see Eq. (6)) is set to 0.1. (a) A test image with its illuminant invariant and smoothness energy, both computed in one shot, and the ground truth. (b) An illustration of the iteration. A better road model (column 2) is gradually built using the labeled data taken from L (column 1). As the predicted region approaches the real road region, the road axis also becomes accurate (column 4), and a better road shape prior can be enforced on the likelihood image (column 3) to get a better result (columns 5, 6). The detected road region Rd converges in the 4th iteration, where ε(Rd) = 6.750 × 10⁻⁴ < 10⁻³.
Evaluating the performance of iteration. First, we fix the number of iterations to 4 for efficiency, as 95.9% of the segmented results converge after 4 iterations (see Table 2). Then, the iteration performance is evaluated by 4 types of pixel-wise measures (i.e., recall (TP/(TP+FN)), precision (TP/(TP+FP)), F-measure (2/(1/recall + 1/precision)), and quality (TP/(TP+FP+FN)), denoted by RC, PC, F, and Q, respectively) and by the running time on a 2 GHz computer with 4.0 GB RAM. As shown in Table 3, each of the four measures reaches its highest value after 4 iterations, which takes only 98 ms of running time and thus achieves real-time detection. Some qualitative results can be seen in Fig. 7.

Table 2. The ratio of converged segmented results (η) with increasing iterations. η reaches 95.9% after the 4th iteration.

Iter.   0     1     2     3     4     5     6
η (%)   0.0  10.3  56.2  79.4  95.9  97.9  99.0

Table 3. Quantitative results of the iteration performance.

Iter.  RC (%)  PC (%)  F (%)  Q (%)  Time
0      87.9    85.6    83.8   74.3   35 ms
1      90.4    86.6    86.2   77.9   55 ms
2      90.8    86.8    86.7   78.5   72 ms
3      91.0    86.9    86.8   78.6   86 ms
4      91.2    86.9    86.9   78.8   98 ms
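The four pixel-wise measures are straightforward to compute from a predicted mask and its ground truth. A short sketch (the function name is ours; the formulas are those given above):

```python
def pixel_metrics(pred, gt):
    """Pixel-wise recall (RC), precision (PC), F-measure (F), and
    quality (Q), as defined in Sec. 5. Masks are flat 0/1 lists.
    """
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    rc = tp / (tp + fn)
    pc = tp / (tp + fp)
    f = 2 / (1 / rc + 1 / pc)   # harmonic mean of recall and precision
    q = tp / (tp + fp + fn)     # also known as the Jaccard index
    return rc, pc, f, q
```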
Comparing with the basic graph-cut segmentation. To show that our road shape prior helps improve the performance of basic graph-cut segmentation, we compare our method with BGCv1 (basic graph cuts with only data terms) and BGCv2 (basic graph cuts with both data and smoothness terms). Quantitative results are provided using ROC curves, which represent the trade-off between the true positive rate (TP/(TP+FN)) and the false positive rate (FP/(FP+TN)). The data terms for the basic graph cuts are obtained via Eq. (6), using training data taken from a fixed region (the same as our initialized labeling region). By varying γ0 from 0 to 1, we obtain the ROC curves and the corresponding AUC (area under the curve) (see Fig. 8). Our method clearly outperforms BGCv1 and BGCv2, with the highest AUC (95.4%). Qualitative results in Fig. 9 show that the detected road regions are enforced to be road-shaped by our method.
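The ROC sweep can be illustrated on raw likelihoods. This sketch (names ours; it assumes the ground truth contains both road and background pixels, so the rates are well defined) thresholds the likelihood image at a series of values, analogous to varying γ0:

```python
def roc_points(likelihood, gt, thresholds):
    """Sweep a likelihood threshold and collect (FPR, TPR) pairs,
    analogous to varying gamma_0 when building the ROC curves.
    """
    points = []
    for t in thresholds:
        pred = [1 if p >= t else 0 for p in likelihood]
        tp = sum(1 for p, g in zip(pred, gt) if p and g)
        fp = sum(1 for p, g in zip(pred, gt) if p and not g)
        fn = sum(1 for p, g in zip(pred, gt) if not p and g)
        tn = sum(1 for p, g in zip(pred, gt) if not p and not g)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```

A perfectly discriminative likelihood yields the point (0, 1), i.e., the curve touching the top-left corner and an AUC of 1.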
Comparing with state-of-the-art algorithms. We compare our method with two state-of-the-art single-image-based road detection algorithms: the vanishing point (VP) based method [5] and the likelihood-based region growing (LRG) method [6]. Quantitative results (see Table 4) show that our method outperforms both VP and LRG in terms of RC, PC, F, and Q. Qualitative results in Fig. 9 show that VP fails when there is no legible vanishing point (rows 5, 6) or the road boundaries are curved (row 2), and LRG fails due to overgrowth (rows 1, 3, 5) or undergrowth (rows 2, 4). Our method performs best on the first five instances, benefiting from global optimization under a general road shape constraint.
Discussion. While our method enforces a shape prior on the road segmentation task, this can also degrade performance: for example, a fork road may not satisfy our road shape prior (see, e.g., row 6 of Fig. 9). Besides, our method fails when the illuminant-invariant feature lacks discriminability (e.g., in the last instance of Fig. 9, the illuminant-invariant feature of the road and its surroundings is similar). Thus, engineering better features and exploring more useful road priors (e.g., a more general shape prior that can handle more complex road shapes such as forks or crossings) are interesting directions for future work.
6. CONCLUSIONS

In this paper, we proposed a general road shape prior that enforces the detected region to be road-shaped by encoding it into a graph-cut framework, where the training data is automatically generated from a predicted road region of the current image. An accurate road region is obtained by iteratively performing the graph cuts. Experimental results validate the robustness and efficiency of our method.
Acknowledgment. The presented research work is supported by the National Natural Science Foundation of China (Grant No. 90820302).
Table 4. Quantitative results of our method compared with the state-of-the-art single-image road detection algorithms VP [5] and LRG [6].

       RC (%)       PC (%)       F (%)        Q (%)
VP     75.9±28.7    78.2±22.8    72.8±23.8    61.5±22.9
LRG    89.1±8.8     82.7±20.6    83.3±13.1    73.3±16.6
Ours   91.2±8.6     86.9±19.3    86.9±13.3    78.8±16.9
Fig. 7. Qualitative results of the iteration performance.
Fig. 8. ROC curves and the corresponding AUC of our method compared with the basic graph-cut methods BGCv1 and BGCv2.
Fig. 9. Comparisons of our method with the basic graph-cut methods (BGCv1, BGCv2) and two state-of-the-art methods (VP, LRG). Each row shows a different instance. White and black regions in columns 2–7 denote roads and backgrounds, respectively. Column 1: test images. Columns 2–6: detection results obtained by VP [5], BGCv1, BGCv2, LRG [6], and our method, respectively. Column 7: manually labeled ground truth. Note that the same feature (i.e., the illuminant invariant) is used in the last four methods.
7. REFERENCES

[1] J.M. Alvarez, T. Gevers, and A.M. López, “Vision-based road detection using road models,” in Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009.
[2] Q. Huang, M. Han, B. Wu, and S. Ioffe, “A hierarchical conditional random field model for labeling and segmenting images of street scenes,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 1953–
[3] P. Sturgess, L. Ladicky, N. Crook, and P.H.S. Torr, “Scalable cascade inference for semantic image segmentation,” in BMVC, 2012.
[4] G. Floros and B. Leibe, “Joint 2d-3d temporally consistent
semantic segmentation of street scenes,” in Computer Vision
and Pattern Recognition (CVPR), 2012 IEEE Conference on.
IEEE, 2012, pp. 2823–2830.
[5] H. Kong, J.Y. Audibert, and J. Ponce, “Vanishing point detection for road detection,” in Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on. IEEE, 2009, pp. 96–103.
[6] J.M. Alvarez and A.M. Lopez, “Road detection based on illuminant invariance,” Intelligent Transportation Systems, IEEE Transactions on, vol. 12, no. 1, pp. 184–193, 2011.
[7] J.M. Alvarez, T. Gevers, Y. LeCun, and A.M. Lopez, “Road
scene segmentation from a single image,” in Proceedings of
the 12th European Conference on Computer Vision. Springer,
[8] Y. Boykov and M.P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on. IEEE, 2001, vol. 1, pp. 105–112.
[9] O. Veksler, “Star shape prior for graph-cut image segmentation,” in Proceedings of the 10th European Conference on Computer Vision: Part III. Springer, 2008, pp. 454–467.
[10] V. Kolmogorov and R. Zabin, “What energy functions can be
minimized via graph cuts?,” Pattern Analysis and Machine
Intelligence, IEEE Transactions on, vol. 26, no. 2, pp. 147–
159, 2004.
[11] J. Xiao, J. Hays, K.A. Ehinger, A. Oliva, and A. Torralba, “Sun
database: Large-scale scene recognition from abbey to zoo,” in
Computer vision and pattern recognition (CVPR), 2010 IEEE
conference on. IEEE, 2010, pp. 3485–3492.
[12] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy
minimization via graph cuts,” Pattern Analysis and Machine
Intelligence, IEEE Transactions on, vol. 23, no. 11, pp. 1222–
1239, 2001.
[13] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, pp. 1124–1137, 2004.
... Prior knowledge regarding the road shape has also been used to improve the performance of road detection, e.g. [10]. Such methods may face problems when the road region in the input frame is significantly different than the models in the learned road shape database. ...
Conference Paper
In this article, we propose a new online method for road detection which uses as input video captured by a single video camera. Our method consists of two stages. In the first stage we build a statistical road model using training data and in the second we detect the road area in new video frames. The road model is based on video segmentation with evolving GMMs. After the initial detection of the road area, the result is improved with post-processing, which caters to inaccuracies in the detected road region caused by shadows, illuminations, and unusual road shapes. Experimental results for the established, publicly available CamVid dataset show that the proposed method achieves high accuracy in road detection.
... Road detection using sensors plays an important role in the field of unmanned driving system [1]. However, road detection is a challenging problem because of the following factors: (i) diverse road environments, such as straight roads, curved roads, concrete structured roads, brick roads and unstructured roads which could not be determined by detecting lane features [2,3]; (ii) changes in illumination which could cause differences in the colour or intensity of lane lines in structured roads, and the dividing line between sunshine and shadow that may sometimes be considered as lane feature points [4]; (iii) obstacles such as pedestrians and vehicles which would block the lane markings [4]; (iv) no versatile and low-cost methods that could be used in all environments at present [3]; (v) poor adaptability of most existing methods for road detection when applied in actual diversified scenarios, such as the DARPA Grand Challenge [5]. ...
Full-text available
In order to realise autonomous navigation of unmanned platforms in urban or off‐road environments, it is crucial to study accurate, versatile and real‐time road detection methods. This study proposes an adaptive road detection method that combines lane lines and obstacle boundaries, which can be applied to a variety of driving environments. Combining multi‐channel threshold processing, it is robust to lane feature detection under various complex situations. Obstacle information extracted from the grid image constructed by 3D LIDAR point cloud is used for lane feature selection to avoid interference from pedestrians and vehicles. The proposed method makes use of adaptive sliding window for feature selection, and piecewise least squares method for road line fitting. Experimental results on dataset and in real‐world environments show that the proposed method can overcome illumination changes, shadow occlusion, pedestrian, vehicle interference and so on in a variety of scenes. The proposed method has good enough efficiency, robustness and real‐time performance.
... The problem of scene classification has been well handled by various researchers to classify scenes like disaster damage images, real estate images, road detection, vehicle detection, sports images etc. [4][5] [6][7] [9]. To classify such scene images, different types of low level (color, texture etc.) and high level (region based, semantic based etc.) features have been used. ...
... For the monocular case, various monocular features have been exploited as cues for ground manifold estimation such as colour [17], [18], intensity [19], shape [20], boundary [21], or vanishing points [22], [23]. For the stereo-vision case, the ground plane is typically estimated by using normal vectors in disparity space [11]. ...
Conference Paper
Full-text available
Stixel-based segmentation is specifically designed towards obstacle detection which combines road surface estimation in traffic scenes, stixel calculations, and stixel clustering. Stixels are defined by observed height above road surface. Road surfaces (ground manifolds) are represented by using an occupancy grid map. Stixel-based segmentation may improve the accuracy of real-time obstacle detection, especially if adaptive to changes in ground manifolds (e.g. with respect to non-planar road geometry). In this paper, we propose the use of a polynomial curve fitting algorithm based on the v-disparity space for ground manifold estimation. This is beneficial for two reasons. First, the coordinate space has inherently finite boundaries, which is useful when working with probability densities. Second, it leads to reduced computation time. We combine height segmentation and improved ground manifold algorithms together for stixel extraction. Our experimental results show a significant improvement in the accuracy of the ground manifold detection (an 8% improvement) compared to occupancy-grid mapping methods.
Road extraction is an important part of the intelligent vehicle systems for automatic driving, navigation, and traffic warning. For the complicated road scene, we present a road detection method based on illumination invariant image and quadratic estimation. The algorithm firstly extracts the illumination invariant image, and a priori triangular road region is used as the color sample to analyze the illumination invariant image and obtain the probability maps. Next, based on the histogram analysis, the combined probability map is significantly resettled, and the road region is estimated for the first time. Then gradient images of the illumination invariant image and the probability map are extracted, and the gradient image is analyzed by the estimated road region. Finally, the effective road boundary is extracted, and the more accurate road region is obtained. The experimental results show that our method can adapt to the road image in a variety of environments. Compared with other algorithms, our algorithm is more stable, and the computational efficiency is improved obviously.
Full-text available
This paper highlights the role of ground manifold modeling for stixel calculations; stixels are medium-level data representations used for the development of computer vision modules for self-driving cars. By using single-disparity maps and simplifying ground manifold models, calculated stixels may suffer from noise, inconsistency, and false-detection rates for obstacles, especially in challenging datasets. Stixel calculations can be improved with respect to accuracy and robustness by using more adaptive ground manifold approximations. A comparative study of stixel results, obtained for different ground-manifold models (e.g., plane-fitting, line-fitting in v-disparities or polynomial approximation, and graph cut), defines the main part of this paper. This paper also considers the use of trinocular stereo vision and shows that this provides options to enhance stixel results, compared with the binocular recording. Comprehensive experiments are performed on two publicly available challenging datasets. We also use a novel way for comparing calculated stixels with ground truth. We compare depth information, as given by extracted stixels, with ground-truth depth, provided by depth measurements using a highly accurate LiDAR range sensor (as available in one of the public datasets). We evaluate the accuracy of four different ground-manifold methods. The experimental results also include quantitative evaluations of the tradeoff between accuracy and run time. As a result, the proposed trinocular recording together with graph-cut estimation of ground manifolds appears to be a recommended way, also considering challenging weather and lighting conditions.
Purpose This paper aims to propose a robust and efficient method for vanishing point detection in unstructured road scenes. Design/methodology/approach The proposed method includes two main stages: drivable region estimation and vanishing point detection. In drivable region estimation stage, the road image is segmented into a set of patches; then the drivable region is estimated by the patch-wise manifold ranking. In vanishing point detection stage, the LSD method is used to extract the straight lines; then a series of principles are proposed to remove the noise lines. Finally, the vanishing point is detected by a novel voting strategy. Findings The proposed method is validated on various unstructured road images collected from the real world. It is more robust and more efficient than the state-of-the-art method and the other three recent methods. Experimental results demonstrate that the detected vanishing point is practical for vision-sensor-based navigation in complex unstructured road scenes. Originality/value This paper proposes a patch-wise manifold ranking method to estimate the drivable region that contains most of the informative clues for vanishing point detection. Based on the removal of the noise lines through a series of principles, a novel voting strategy is proposed to detect the vanishing point.
Semantic image segmentation is a problem of simultaneous segmentation and recognition of an input image into regions and their associated categorical labels, such as person, car or cow. A popular way to achieve this goal is to assign a label to every pixel in the input image and impose simple structural constraints on the output label space. Efficient approximation algorithms for solving this labelling problem, such as α-expansion, have, at best, linear runtime complexity with respect to the number of labels, making them practical only when working in a specific domain that has few classes of interest. However, when working in a more general setting where the number of classes could easily reach tens of thousands, sub-linear complexity is desired. In this paper we propose meeting this requirement by performing cascaded inference that wraps around the α-expansion algorithm. The cascade both divides the large label set into smaller, more manageable ones by way of a hierarchy, and dynamically subdivides the image into smaller and smaller regions during inference. We test our method on the SUN09 dataset with 107 accurately hand-labelled classes.
Road scene segmentation is important in computer vision for applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides relevant contextual information to improve their understanding. In this paper, we use a convolutional neural network based algorithm to learn features from noisy labels to recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion to obtain maximal uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images. From quantitative and qualitative experiments conducted on publicly available datasets, it is concluded that convolutional neural networks are suitable for learning the 3D scene layout from noisy labels, providing a relative improvement of 7% compared to the baseline. Furthermore, combining color planes provides a statistical description of road areas that exhibits maximal uniformity and yields a relative improvement of 8% compared to the baseline. Finally, the improvement is even larger when acquired and current information from a single image are combined.
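Read loosely, "learned color plane fusion" amounts to choosing convex weights over the R, G and B planes so that a known road sample becomes maximally uniform. The sketch below grid-searches the weights on synthetic data; the paper's actual learning procedure is not given here and may well differ:

```python
import numpy as np

def best_fusion(img, road_mask, steps=10):
    """Grid-search convex weights over the R, G, B planes and keep the
    combination whose fused plane has minimal variance inside the road mask."""
    best_w, best_var = None, np.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = np.array([i, j, steps - i - j]) / steps  # weights sum to 1
            fused = img @ w                              # H x W fused plane
            var = fused[road_mask].var()
            if var < best_var:
                best_w, best_var = w, var
    return best_w, best_var

# Toy image: the road region is uniform in the red plane, noisy in green/blue.
rng = np.random.default_rng(0)
img = rng.random((40, 40, 3))
img[20:, :, 0] = 0.5                        # red plane constant on the "road"
mask = np.zeros((40, 40), bool)
mask[20:, :] = True                         # known road sample (bottom half)
w, var = best_fusion(img, mask)
```

The search correctly concentrates all weight on the plane that is uniform inside the road sample; a closed-form minimum-variance solution would serve equally well here.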
In this paper we propose a novel Conditional Random Field (CRF) formulation for the semantic scene labeling problem which is able to enforce temporal consistency between consecutive video frames and take advantage of the 3D scene geometry to improve segmentation quality. The main contribution of this work lies in the novel use of a 3D scene reconstruction as a means to temporally couple the individual image segmentations, allowing information flow from 3D geometry to the 2D image space. As our results show, the proposed framework outperforms state-of-the-art methods and opens a new perspective towards a tighter interplay of 2D and 3D information in the scene understanding problem.
Simultaneously segmenting and labeling images is a fundamental problem in Computer Vision. In this paper, we introduce a hierarchical CRF model to deal with the problem of labeling images of street scenes by several distinctive object classes. In addition to learning a CRF model from all the labeled images, we group images into clusters of similar images and learn a CRF model from each cluster separately. When labeling a new image, we pick the closest cluster and use the associated CRF model to label this image. Experimental results show that this hierarchical image labeling method is comparable to, and in many cases superior to, previous methods on benchmark data sets. In addition to segmentation and labeling results, we also show how to apply the image labeling result to rerank Google similar images.
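The model-selection step here, pick the closest cluster and then apply its CRF, reduces to a nearest-centroid lookup over a global image feature. The feature vectors and centroids below are placeholders, not the paper's actual representation:

```python
import numpy as np

def pick_model(feature, centroids):
    """Return the index of the closest cluster centroid; the CRF trained
    on that cluster would then be used to label the query image."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    return int(np.argmin(dists))

# Three hypothetical cluster centroids in a toy 2-D feature space.
centroids = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
idx = pick_model(np.array([9.0, 9.5]), centroids)
```

In practice the feature would be a global scene descriptor (e.g., a GIST-like vector) rather than a 2-D toy point.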
In this paper we describe a new technique for general purpose interactive segmentation of N-dimensional images. The user marks certain pixels as "object" or "background" to provide hard constraints for segmentation. Additional soft constraints incorporate both boundary and region information. Graph cuts are used to find the globally optimal segmentation of the N-dimensional image. The obtained solution gives the best balance of boundary and region properties among all segmentations satisfying the constraints. The topology of our segmentation is unrestricted and both "object" and "background" segments may consist of several isolated parts. Some experimental results are presented in the context of photo/video editing and medical image segmentation. We also demonstrate an interesting Gestalt example. A fast implementation of our segmentation method is possible via a new max-flow algorithm in (2).
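A toy instance of this seeded min-cut formulation, on a 1-D "image" with one pixel marked "object" and one marked "background" via effectively infinite terminal capacities. Edmonds-Karp stands in for the specialized max-flow algorithm the authors cite, and the region/boundary weights are illustrative, not the paper's energy:

```python
import math
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow; returns the source side of the minimum cut."""
    n = len(cap)
    flow = [[0.0] * n for _ in range(n)]

    def bfs():
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-9:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:
            # No augmenting path left: residual-reachable nodes = source side.
            return {v for v in range(n) if parent[v] != -1}
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push  # residual bookkeeping

pixels = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]  # 1-D "image": dark, then bright
n = len(pixels)
S, T = n, n + 1                            # terminal (source/sink) nodes
INF, lam = 1e6, 2.0
cap = [[0.0] * (n + 2) for _ in range(n + 2)]
for i, p in enumerate(pixels):
    cap[S][i] = INF if i == n - 1 else p       # "object" seed on last pixel
    cap[i][T] = INF if i == 0 else 1.0 - p     # "background" seed on first
for i in range(n - 1):
    # Boundary term: neighbouring pixels with similar intensity cost more to cut.
    w = lam * math.exp(-((pixels[i] - pixels[i + 1]) ** 2) / 0.01)
    cap[i][i + 1] = cap[i + 1][i] = w
source_side = max_flow_min_cut(cap, S, T)
labels = [1 if i in source_side else 0 for i in range(n)]  # 1 = object
```

The cut lands on the large intensity jump between the third and fourth pixels, exactly where the boundary term is cheapest.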
Vision-based road detection is very challenging since the road lies in an outdoor scenario imaged from a mobile platform. In this paper, a new top-down road detection algorithm is proposed. The method is based on scene (road) classification, which provides the probability that an image contains a certain type of road geometry (straight, left/right curve, etc.). During the training of the classifier, a road probability map is also learned for each road geometry. Then, the proper pixel-based method is selected and fused to provide an improved road detection approach. From experiments it is concluded that the proposed method outperforms state-of-the-art algorithms in a frame-by-frame context.
Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently-used databases which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images. We use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. We measure human scene classification performance on the SUN database and compare this with computational methods. Additionally, we study a finer-grained scene representation to detect scenes embedded inside of larger scenes.
Given a single image of an arbitrary road that may not be well-paved, have clearly delineated edges, or have some a priori known color or texture distribution, is it possible for a computer to find this road? This paper addresses this question by decomposing the road detection process into two steps: the estimation of the vanishing point associated with the main (straight) part of the road, followed by the segmentation of the corresponding road area based on the detected vanishing point. The main technical contributions of the proposed approach are a novel adaptive soft voting scheme based on variable-sized voting regions using confidence-weighted Gabor filters, which compute the dominant texture orientation at each pixel, and a new vanishing-point-constrained edge detection technique for detecting road boundaries. The proposed method has been implemented, and experiments with 1003 general road images demonstrate that it is both computationally efficient and effective at detecting road regions in challenging conditions.
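The per-pixel dominant texture orientation can be estimated in the spirit described: filter with a bank of oriented Gabor kernels and keep the orientation of the strongest response. The sketch below scores a single synthetic patch against four orientations; the kernel parameters are arbitrary and the paper's confidence weighting and soft voting are omitted:

```python
import numpy as np

def gabor_kernel(theta, size=15, freq=0.2, sigma=3.0):
    """Real (even) Gabor kernel whose carrier oscillates along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)
    return g - g.mean()   # zero mean, so flat regions give no response

def dominant_orientation(patch, thetas):
    """Orientation whose Gabor filter gives the strongest magnitude response."""
    responses = [abs((patch * gabor_kernel(t, size=patch.shape[0])).sum())
                 for t in thetas]
    return thetas[int(np.argmax(responses))]

# Synthetic texture: stripes whose intensity varies along the 45-degree axis.
size = 15
y, x = np.mgrid[:size, :size]
patch = np.cos(2 * np.pi * 0.2 * (x + y) / np.sqrt(2))
thetas = np.deg2rad([0.0, 45.0, 90.0, 135.0])
best = dominant_orientation(patch, thetas)
```

A full detector would evaluate many more orientations per pixel and let each pixel cast a confidence-weighted vote for vanishing point candidates along its dominant direction.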