ROBUST ROAD DETECTION FROM A SINGLE IMAGE USING ROAD SHAPE PRIOR
Zhen He, Tao Wu, Zhipeng Xiao, Hangen He
College of Mechatronics and Automation
National University of Defense Technology
Changsha, Hunan, P. R. China
ABSTRACT
Many road detection algorithms require pre-learned information, which may be unreliable because the road scene is usually unpredictable. Single-image-based (i.e., without any pre-learned information) road detection techniques can be adopted to overcome this problem, but their robustness needs improvement.
To achieve robust road detection from a single image, this paper proposes a general road shape prior that enforces the detected region to be road-shaped by encoding the prior into a graph-cut segmentation framework, where the training data is automatically generated from a predicted road region of the current image. By iteratively performing the graph-cut segmentation, an accurate road region is obtained.
Quantitative and qualitative experiments on the challenging
SUN Database validate the robustness and efficiency of our method.
We believe that the road shape prior can also be used to yield
improvements for many other road detection algorithms.
Index Terms—Road detection, shape prior, graph cuts
1. INTRODUCTION
Vision-based road detection is highly relevant to autonomous driving and pedestrian detection. Detecting the road in an on-board road image is very challenging due to the diversity of road shapes (e.g., straight, left/right curve, and winding), road materials (e.g., concrete, soil, and asphalt), and backgrounds (e.g., street, forest, and mountain), as well as the noise induced by varying illumination, different weather, object occlusion, etc., as shown in Fig. 1. With information learned off-line from specific road scenes, many proposed segmentation algorithms [1, 2, 3, 4] perform well for the same road scenes but poorly for others. However, off-line learning requires manually labeled training data, which is time-consuming, and the road scene is usually unknown, especially on moving vehicles. Hence, road detection from a single image (i.e., without any pre-learned information) is needed to overcome these problems, though it is more challenging.
There has been some work on single-image-based road detection. In [5], Kong et al. use a vanishing-point-constrained edge detection technique to acquire straight road boundaries. This method may be inapplicable when there is no legible vanishing point (e.g., on an ascent) or when the road boundaries are curved. To deal with complex road shapes, Alvarez et al. [6] use a likelihood-based region-growing method to label pixels. This is in fact a local method and not robust to noise, which may cause overgrowth or undergrowth. More recently, in [7], general information learned off-line from other road scenes is combined with information learned on-line from the current image to enhance detection robustness. However, in the on-line learning procedure, the training data is taken from a fixed region (the central bottom of the image), so the road patterns may not be well learned.
Fig. 1. Examples of different roads with: (a) different shapes, (b) different materials, (c) different backgrounds, (d) different illumination, (e) different weather, and (f) object occlusion.
Aiming to utilize prior knowledge to improve the robustness of single-image-based road detection, we propose a general road shape prior that enforces the detected road region to be road-shaped, encoding this prior into a graph-cut segmentation framework in which the labels are automatically generated from the current image to model the road. The final road region is acquired by iteratively performing graph cuts: the region detected in each iteration becomes the predicted region of the next, which is used for prior encoding and label generation; the predicted region is initialized as a semicircle at the central bottom of the image.
The road shape prior is based on the observation that in on-board road images, the varied road regions satisfy certain shape constraints that make them look like roads, and the detected road regions should also satisfy these constraints. Unlike previous methods that use a limited set of templates, which can hardly cover the variety of road shapes and require matching, our approach benefits from a general road shape prior that is more natural and can be encoded in a unified graph-cut framework. Unlike previous on-line learning methods that generate labels from the current image in one shot, ours progressively generates better batches of labels automatically through iterative graph cuts and thus builds a better model.
An outline of the paper follows. In Sec. 2 we demonstrate how basic graph-cut segmentation can be used for road detection, in Sec. 3 we explain how to incorporate the road shape prior into the graph-cut segmentation framework, and in Sec. 4 we detail the iterative graph cuts. Experimental results and discussion are presented in Sec. 5, and conclusions are given in Sec. 6.
2. ROAD DETECTION USING BASIC GRAPH-CUT
SEGMENTATION
Road detection can be formulated as a binary labeling problem that assigns a label (road or background) to each pixel. A popular approach to such problems is the efficient graph-cut segmentation [8].
Let $y_i \in \{0~(\text{background}), 1~(\text{road})\}$ be the label of pixel $i$ of the image, $\mathcal{P}$ be the set of all pixels, and $\mathcal{N}$ be the set of all neighboring pixel pairs. Then $y = \{y_i \mid i \in \mathcal{P}\}$ is the collection of all label assignments and defines a segmentation.
978-1-4799-2341-0/13/$31.00 ©2013 IEEE, ICIP 2013
The basic graph cut optimizes $y$ by minimizing the energy function:
$$E(y) = \sum_{i \in \mathcal{P}} D_i(y_i) + \lambda \sum_{(i,j) \in \mathcal{N}} V_{i,j}(y_i, y_j) \quad (1)$$
where $D_i$ is the data term penalizing assignments that do not fit pixel $i$, and $V_{i,j}$ is the smoothness term penalizing different assignments for neighboring pixels $i$ and $j$. Let $I_i$ be the feature vector of pixel $i$. Then $D_i$ is typically defined as the negative log-likelihood of a label $y_i$ being assigned to pixel $i$ and takes the form:
$$D_i(1) = -\ln \Pr(I_i \mid R), \qquad D_i(0) = -\ln \Pr(I_i \mid B) \quad (2)$$
where $\Pr(I_i \mid R)$ is the road model and $\Pr(I_i \mid B)$ is the background model. The smoothness term $V_{i,j}$ follows the conventional form:
$$V_{i,j}(y_i, y_j) = [y_i \neq y_j]\, \exp\!\left(-\frac{\|I_i - I_j\|^2}{2\beta}\right) \frac{1}{\mathrm{dist}(i,j)} \quad (3)$$
where $[\cdot]$ is $1$ if $y_i \neq y_j$ and $0$ otherwise, $\beta$ denotes the mean squared difference between feature vectors over all neighboring pixels, and $\mathrm{dist}(i,j)$ is the Euclidean distance between pixels $i$ and $j$. The parameter $\lambda > 0$ in Eq. (1) weights the relative importance of $D_i$ and $V_{i,j}$.
Compared with other local methods, the basic graph cut mainly benefits from $V_{i,j}$, which encodes neighboring consistency, and from its efficient global optimization. However, it is still a bottom-up approach and unable to capture the rich properties of the road region. Thus, we propose a general road shape prior and incorporate it into the basic graph-cut segmentation framework, as shown in Sec. 3.
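To make the terms of Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch; the function names and toy evaluation are our own illustration, and the paper minimizes the energy with a graph-cut solver rather than evaluating it directly:

```python
import numpy as np

def data_terms(I, road_model, bg_model):
    """Eq. (2): negative log-likelihood data terms.
    I: (H, W) feature image; road_model/bg_model: callables
    returning Pr(I_i | R) and Pr(I_i | B) per pixel."""
    D1 = -np.log(np.clip(road_model(I), 1e-12, None))  # cost of y_i = 1 (road)
    D0 = -np.log(np.clip(bg_model(I), 1e-12, None))    # cost of y_i = 0 (background)
    return D0, D1

def smoothness(Ii, Ij, beta, dist):
    """Eq. (3): penalty paid only when neighbours i, j get different labels."""
    return np.exp(-np.sum((Ii - Ij) ** 2) / (2.0 * beta)) / dist

def energy(y, D0, D1, edges, V, lam=1.0):
    """Eq. (1): total energy of a labelling y; edges lists neighbouring
    flat-index pairs and V their precomputed smoothness weights."""
    data = np.where(y == 1, D1, D0).sum()
    smooth = sum(v for (i, j), v in zip(edges, V) if y.flat[i] != y.flat[j])
    return data + lam * smooth
```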
3. INCORPORATING THE ROAD SHAPE PRIOR
As there is a wide variety of road shapes in on-board road images (see Fig. 1), it is hard to describe them with a limited set of templates. However, as shown in Fig. 2, for a segmented image we can easily tell whether the foreground (white region) looks like a road. Hence, there may be some shape constraints that make a region look like a road, which we call the road shape prior.
Fig. 2. Examples of some segmented images. Obviously, the white
regions in (a), (d), (f), and (h) look more like roads than those in (b),
(c), (e), and (g).
Here we use a shrinking constraint and a consistency constraint to describe this prior. The former is based on the observation that along the road axis (i.e., the centerline of the road region), the road region shrinks perspectively from near to far, while the latter suggests that the region between the two sides of the road should consistently belong to the road, and vice versa.
Consider Fig. 3. Let $l$ be the centerline of the white region $R$, and let pixel $i$ be an arbitrary point. Horizontally translate $l$ to $l'$ and draw a horizontal line $h$, both of which pass through the center of $i$. Let $p$ be an arbitrary pixel on $l'$ below $i$, and $q$ an arbitrary pixel on $h$ between $l'$ and $l$ (or on $l$). The shrinking constraint implies that $\forall i,\ i \in R \Rightarrow p \in R$, while the consistency constraint implies that $\forall i,\ (i \in R) \wedge (i \notin l) \Rightarrow q \in R$. Thus, in Fig. 3, the shrinking constraint is satisfied in (a) and (b), the consistency constraint is satisfied in (a) and (c), whereas (d) satisfies neither constraint.
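The two constraints can be checked mechanically on a binary mask. The sketch below is our own, under the simplifying assumption that the road axis $l$ is a single vertical column `c`; in the paper the axis follows the detected region and may be slanted or curved:

```python
import numpy as np

def is_road_shaped(mask, c):
    """Check the shrinking and consistency constraints of Sec. 3 for a
    boolean mask, assuming the road axis l is the vertical column c
    (a simplification: the real axis need not be vertical)."""
    H, W = mask.shape
    for r in range(H):
        for col in range(W):
            if not mask[r, col]:
                continue
            # shrinking: every pixel below i on the vertical line l' is road
            if not mask[r:, col].all():
                return False
            # consistency: every pixel between l' and l on the row of i is road
            lo, hi = min(col, c), max(col, c)
            if not mask[r, lo:hi + 1].all():
                return False
    return True
```

For example, a region that widens monotonically toward the bottom around column `c` passes both checks, while punching a hole in it violates the shrinking constraint.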
Our criterion is that a region is road-shaped if and only if it
satisfies both of the constraints. This criterion is consistent with our
Fig. 3. Illustration of the shrinking constraint and the consistency constraint. In (a) and (b), $\forall i,\ i \in R \Rightarrow p \in R$, so the shrinking constraint is satisfied; in (a) and (c), $\forall i,\ (i \in R) \wedge (i \notin l) \Rightarrow q \in R$, so the consistency constraint is satisfied. In (d), $\exists i,\ (i \in R) \wedge (p \notin R)$ and $\exists i,\ (i \in R) \wedge (i \notin l) \wedge (q \notin R)$, so neither constraint is satisfied.
Table 1. Comparison between our criterion and intuition. The road-shaped regions in Fig. 2 are also road-like, and vice versa.

Images in Fig. 2   | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h)
Shrinking const.   |  ✓  |  ✓  |  ✗  |  ✓  |  ✗  |  ✓  |  ✗  |  ✓
Consistency const. |  ✓  |  ✗  |  ✓  |  ✓  |  ✓  |  ✓  |  ✗  |  ✓
Road-shaped        |  ✓  |  ✗  |  ✗  |  ✓  |  ✗  |  ✓  |  ✗  |  ✓
Road-like          |  ✓  |  ✗  |  ✗  |  ✓  |  ✗  |  ✓  |  ✗  |  ✓
intuition (e.g., see Table 1). To improve detection robustness, we enforce the detected road region to be road-shaped by encoding both constraints into the graph-cut framework. Inspired by [9], we define a shrinking constraint term $S_{i,p}$ and a consistency constraint term $C_{i,q}$ as follows:
$$S_{i,p}(y_i, y_p) = \begin{cases} \infty, & \text{if } y_i = 1 \text{ and } y_p = 0 \\ 0, & \text{otherwise} \end{cases} \qquad C_{i,q}(y_i, y_q) = \begin{cases} \infty, & \text{if } y_i = 1 \text{ and } y_q = 0 \\ 0, & \text{otherwise} \end{cases} \quad (4)$$
Note that $S_{i,p}$ is defined on pixel pairs $(i,p)$, while $C_{i,q}$ is defined on pixel pairs $(i,q)$ (see Fig. 3). Both terms penalize assignments that violate the corresponding constraint with infinite cost. In fact, it suffices to place $S_{i,p}$ and $C_{i,q}$ only between neighboring pixels: if every pixel pair $(i,p)$ satisfies the shrinking constraint, then every neighboring $(i,p)$ also satisfies it, and if some pixel pair violates the constraint, then at least one neighboring $(i,p)$ violates it. The same holds for $C_{i,q}$.
We use a standard 8-neighborhood system (see Fig. 4(a)) for generality. For every pixel $i$ not on $l$, it is easy to find a neighboring pixel $q$ whose center is exactly passed through by $h$. However, for a pixel $i$, $l'$ may not pass exactly through the center of any of its neighbors. In this case, the neighboring pixel $p$ lying closest to $l'$ is chosen (see Fig. 4(b)).
Fig. 4. Illustration of choosing a pixel $p$ neighboring $i$ from a standard 8-neighborhood system. (a) A standard 8-neighborhood system, where 8 neighbors are connected to each pixel $i$. (b) A neighboring pixel of $i$ (i.e., $p_4$) is chosen because it lies closest to $l'$ among all possible neighbors (i.e., $p_1 \sim p_5$, those not above $i$).
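The neighbor-selection rule of Fig. 4(b) can be sketched as follows; the point/direction parameterization of $l'$ and the helper name are our own illustration:

```python
def pick_p(i, axis_point, axis_dir):
    """Among the 8-neighbours of i = (row, col) that are not above i
    (the p1..p5 of Fig. 4), return the one whose centre lies closest
    to the line l' through axis_point with direction axis_dir =
    (drow, dcol)."""
    r, c = i
    dr, dc = axis_dir
    norm = (dr * dr + dc * dc) ** 0.5
    candidates = [(r, c - 1), (r, c + 1),
                  (r + 1, c - 1), (r + 1, c), (r + 1, c + 1)]

    def dist(p):
        # perpendicular distance from p to l' via the 2-D cross product
        vr, vc = p[0] - axis_point[0], p[1] - axis_point[1]
        return abs(vr * dc - vc * dr) / norm

    return min(candidates, key=dist)
```

When $l'$ runs straight down through $i$, the pixel directly below is chosen; when it runs diagonally, the diagonal neighbor is.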
With the constraint terms $S_{i,p}$ and $C_{i,q}$, $E(y)$ becomes:
$$E(y) = \sum_{i \in \mathcal{P}} D_i(y_i) + \lambda \sum_{(i,j) \in \mathcal{N}} V_{i,j}(y_i, y_j) + \sum_{(i,p) \in \mathcal{N}} S_{i,p}(y_i, y_p) + \sum_{(i,q) \in \mathcal{N}} C_{i,q}(y_i, y_q) \quad (5)$$
As the road region is less variable than the background and the road features are more self-similar, we build only the road model $\Pr(I_i \mid R)$ and redefine the data term $D_i$ in Eq. (5) as:
$$D_i(y_i) = \begin{cases} 1 - y_i, & \text{if } \Pr(I_i \mid R) \geq \gamma \\ y_i, & \text{otherwise} \end{cases} \quad (6)$$
where the threshold $\gamma = \gamma_0 \max \Pr(I_i \mid R)$ and $\gamma_0 \in [0,1]$ is a fixed value. $D_i$ penalizes assignments that do not fit the data. In Eq. (5), $V_{i,j}(y_i, y_j)$ takes the same form as Eq. (3), and we fix the weighting parameter $\lambda$ to 1. According to [10], $E$ is submodular since every term in Eq. (5) is submodular. As $E$ is also a quadratic pseudo-Boolean function, it can be exactly minimized via graph cuts.
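A minimal sketch of the thresholded data terms of Eq. (6); the function name is our own, and `lik` stands for the likelihood image $\Pr(I_i \mid R)$:

```python
import numpy as np

def data_terms_eq6(lik, gamma0=0.1):
    """Eq. (6): hard data terms from the road likelihood image.
    lik: (H, W) array of Pr(I_i | R); the threshold is
    gamma = gamma0 * max(lik), with gamma0 fixed to 0.1 in the paper."""
    road = lik >= gamma0 * lik.max()
    D1 = np.where(road, 0.0, 1.0)  # D_i(1): cheap to label likely pixels road
    D0 = np.where(road, 1.0, 0.0)  # D_i(0): costly to label them background
    return D0, D1
```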
Note that both the shrinking and consistency constraints depend
on the road axis, which is determined by the road region, and the
road labels should also be generated from the road region. However,
the road region is unknown before detection. Thus, as detailed in
Sec. 4, we first initialize a predicted road region, and then iteratively
use the graph cuts to update the predicted region until it converges.
4. ROAD DETECTION BY ITERATIVE GRAPH CUTS
Our iterative graph-cuts algorithm is illustrated in Fig. 5. For an arbitrary test road image (a), we compute the illuminant invariant (b), ranging from 0 to 255, as its feature, as described in [6].
As the real road region is unknown, we first initialize a road region $R_p$ (see (d), a semicircle at the central bottom of the image). To acquire a reliable labeling region $L$ (see (c)), we leave a margin of $\frac{1}{2}\left(\sqrt{S_{R_p}} - \sqrt{S_{R_p}/2}\right)$ to the background region $B_p$, where $S_R$ denotes the area of region $R$. Then, a road model (f) is built with the training data uniformly labeled on $L$, and the (normalized) likelihood image (e) of (b) is acquired using (f). Next, we obtain the data terms (i) according to Eq. (6); the smaller the energy, the more likely the road class. The smoothness terms (h) are computed directly from (b) using Eq. (3).
On the other hand, we extract the road axis from (d) and extend it to the image border (see (g)). Then the road shape prior, i.e., the shrinking constraint (j) and the consistency constraint (k), can be added.
Finally, we minimize $E(y)$ in Eq. (5) via graph cuts. If the segmented road region $R_d$ (see (n)) converges, i.e., $R_d$ satisfies $\varepsilon(R_d) < \varepsilon_0$, where $\varepsilon(\cdot)$ is the convergence error and $\varepsilon_0$ is a constant, the final detected result (m) is obtained. Otherwise, $R_d$ becomes the $R_p$ of the next cycle. Here we set $\varepsilon_0$ to $10^{-3}$ and define $\varepsilon(R_d) = (S_{R_d \cup R_p} - S_{R_d \cap R_p}) / S_{R_p \cup B_p}$.
An example of the iterative graph cuts is shown in Fig. 6. Through iteration, a better batch of road labels is progressively generated, so a better road model can be built. Moreover, the iteration favors our road shape prior, which benefits robust road detection.
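The outer loop of the algorithm, with the convergence test $\varepsilon(R_d) < \varepsilon_0$ defined above, can be sketched as follows; `segment` is a placeholder for one full graph-cut pass (label generation, prior encoding, and minimization of Eq. (5)), which we do not implement here:

```python
import numpy as np

def iterate_graph_cuts(image, segment, init_region, eps0=1e-3, max_iter=4):
    """Sec. 4's prediction/convergence loop. segment(image, Rp) stands in
    for one graph-cut pass; Rp and the returned Rd are boolean masks."""
    Rp = init_region
    n_pix = Rp.size  # |Rp ∪ Bp|: predicted road plus background is the whole image
    for _ in range(max_iter):
        Rd = segment(image, Rp)
        # eps(Rd) = (area of union - area of intersection) / total area
        eps = (np.logical_or(Rd, Rp).sum() - np.logical_and(Rd, Rp).sum()) / n_pix
        if eps < eps0:
            return Rd
        Rp = Rd  # the detected region becomes the next prediction
    return Rp
```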
5. EXPERIMENTAL RESULTS AND DISCUSSION
To validate our method, three different experiments are conducted on the SUN Database1 [11], in which 500 of the 1868 still on-board road images are randomly selected as the test set and manually
1http://groups.csail.mit.edu/vision/SUN
Fig. 5. Illustration of our iterative graph cuts for road detection.
Fig. 6. An example of the iterative graph cuts when $\gamma_0$ (see Eq. (6)) is set to 0.1. (a) A test image with its illuminant invariant and smoothness energy, both computed in one shot, and the ground truth. (b) An illustration of the iteration. A better road model (column 2) is gradually built using the labeled data taken from $L$ (column 1). As the predicted region approaches the real road region, the road axis also becomes accurate (column 4), and a better road shape prior can be enforced on the likelihood image (column 3) to get a better result (columns 5, 6). The detected road region $R_d$ converges in the 4th iteration, where $\varepsilon(R_d) = 6.750 \times 10^{-4} < 10^{-3}$.
labeled as ground truth (no training set is needed in our algorithm). All the images are resized to 200×200 for testing. The parameter $\gamma_0$ of Eq. (6) is fixed to 0.1 by default. We use the gco-v3.0 software2 [12, 13, 10] to implement the graph-cut optimization.
Evaluating the performance of iteration. First, we fix the number of iterations to 4 for efficiency, as 95.9% of the segmented results converge after 4 iterations (see Table 2). Then, the iteration performance is evaluated by four pixel-wise measures, i.e., recall ($TP/(TP+FN)$), precision ($TP/(TP+FP)$), F-measure ($2/(1/\text{recall} + 1/\text{precision})$), and quality ($TP/(TP+FP+FN)$), denoted by $RC$, $PC$, $F$, and $Q$, respectively, and by the running time on a 2 GHz computer with 4.0 GB RAM. As shown in Table 3, each of the four measures reaches its highest value after 4 iterations,
2http://vision.csd.uwo.ca/code/gco-v3.0.zip
Table 2. The ratio of converged segmented results ($\eta$) as iterations increase. $\eta$ reaches 95.9% after the 4th iteration.

Iter. | 0   | 1    | 2    | 3    | 4    | 5    | 6
η (%) | 0.0 | 10.3 | 56.2 | 79.4 | 95.9 | 97.9 | 99.0
Table 3. Quantitative results of the iteration performance.

Iter. | RC (%) | PC (%) | F (%) | Q (%) | Time
0     | 87.9   | 85.6   | 83.8  | 74.3  | 35 ms
1     | 90.4   | 86.6   | 86.2  | 77.9  | 55 ms
2     | 90.8   | 86.8   | 86.7  | 78.5  | 72 ms
3     | 91.0   | 86.9   | 86.8  | 78.6  | 86 ms
4     | 91.2   | 86.9   | 86.9  | 78.8  | 98 ms
which take only 98 ms of running time, so real-time detection can be achieved. Some qualitative results are shown in Fig. 7.
Comparing with the basic graph-cut segmentation. To show that our road shape prior improves the performance of basic graph-cut segmentation, we compare our method with BGCv1 (basic graph cuts with only data terms) and BGCv2 (basic graph cuts with both data and smoothness terms). Quantitative results are provided using ROC curves, which represent the trade-off between the true positive rate ($TP/(TP+FN)$) and the false positive rate ($FP/(FP+TN)$). The data terms for the basic graph cuts are obtained by Eq. (6) using training data taken from a fixed region (the same as our initialized labeling region). By varying $\gamma_0$ from 0 to 1, we obtain the ROC curves and the corresponding AUC (area under the curve) (see Fig. 8). Our method clearly outperforms BGCv1 and BGCv2, with the highest AUC (95.4%). Qualitative results in Fig. 9 show that the road regions detected by our method are enforced to be road-shaped.
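The ROC sweep can be sketched as below; note that this simplified version thresholds the likelihood image directly via Eq. (6) rather than running the full graph cut for each $\gamma_0$, as the paper does:

```python
import numpy as np

def roc_points(lik, gt, gammas):
    """Sweep gamma_0 of Eq. (6) and record (FPR, TPR) pairs, tracing an
    ROC curve. lik: road likelihood image; gt: ground-truth road mask."""
    pts = []
    for g0 in gammas:
        pred = lik >= g0 * lik.max()          # Eq. (6) threshold
        TP = (pred & gt).sum(); FN = (~pred & gt).sum()
        FP = (pred & ~gt).sum(); TN = (~pred & ~gt).sum()
        fpr = FP / max(FP + TN, 1)            # false positive rate
        tpr = TP / max(TP + FN, 1)            # true positive rate
        pts.append((fpr, tpr))
    return pts
```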
Comparing with state-of-the-art algorithms. We compare
our method with two state-of-the-art single image based road detec-
tion algorithms: the vanishing point (VP) based method [5] and the
likelihood-based region growing (LRG) method [6]. Quantitative results (see Table 4) show that our method outperforms both VP and LRG in terms of $RC$, $PC$, $F$, and $Q$. Qualitative results in Fig. 9 show that VP fails when there is no legible vanishing point (rows 5, 6) or the road boundaries are curved (row 2), and LRG fails due to overgrowth (rows 1, 3, 5) or undergrowth (rows 2, 4). Our method performs best on the first five instances, benefiting from global optimization under a general road shape constraint.
Discussion. While our method enforces the shape prior on the road segmentation task, doing so can also degrade performance in some cases; e.g., a fork road may not satisfy our road shape prior (see, e.g., row 6 of Fig. 9). Besides, our method fails when the illuminant-invariant feature lacks discriminability (e.g., in the last instance of Fig. 9, the illuminant-invariant features of the road and its surroundings are similar). Thus, engineering better features and exploring more useful road priors (e.g., a more general shape prior that handles complex road shapes such as forks or crossings) are interesting directions for future work.
6. CONCLUSIONS
In this paper, we proposed a general road shape prior that enforces the detected region to be road-shaped by encoding it into a graph-cut framework, where the training data is automatically generated from a predicted road region of the current image. An accurate road region is obtained by iteratively performing the graph cuts. Experimental results validate the robustness and efficiency of our method.
Acknowledgment. The presented research work is supported by the National Natural Science Foundation of China (Grant No. 90820302).
Table 4. Quantitative results of our method compared with the state-of-the-art single-image road detection algorithms VP [5] and LRG [6].

     | RC (%)    | PC (%)    | F (%)     | Q (%)
VP   | 75.9±28.7 | 78.2±22.8 | 72.8±23.8 | 61.5±22.9
LRG  | 89.1±8.8  | 82.7±20.6 | 83.3±13.1 | 73.3±16.6
Ours | 91.2±8.6  | 86.9±19.3 | 86.9±13.3 | 78.8±16.9
Fig. 7. Qualitative results of the iteration performance.
Fig. 8. ROC curves and the corresponding AUC of our method com-
pared with basic graph-cut methods BGCv1 and BGCv2.
Fig. 9. Comparison of our method with the basic graph-cut methods (BGCv1, BGCv2) and two state-of-the-art methods (VP, LRG). Each row shows a different instance. White and black regions in columns 2∼7 denote road and background, respectively. Column 1: test images. Columns 2∼6: detection results obtained by VP [5], BGCv1, BGCv2, LRG [6], and our method, respectively. Column 7: manually labeled ground truth. Note that the same feature (i.e., the illuminant invariant) is used in the last four methods.
7. REFERENCES
[1] J.M. Alvarez, T. Gevers, and A.M. López, "Vision-based road detection using road models," in Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009, pp. 2073–2076.
[2] Q. Huang, M. Han, B. Wu, and S. Ioffe, "A hierarchical conditional random field model for labeling and segmenting images of street scenes," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 1953–1960.
[3] P. Sturgess, L. Ladicky, N. Crook, and P.H.S. Torr, "Scalable cascade inference for semantic image segmentation," in BMVC, 2012.
[4] G. Floros and B. Leibe, "Joint 2d-3d temporally consistent semantic segmentation of street scenes," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2823–2830.
[5] H. Kong, J.Y. Audibert, and J. Ponce, "Vanishing point detection for road detection," in Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on. IEEE, 2009, pp. 96–103.
[6] J.M. Alvarez and A.M. Lopez, "Road detection based on illuminant invariance," Intelligent Transportation Systems, IEEE Transactions on, vol. 12, no. 1, pp. 184–193, 2011.
[7] J.M. Alvarez, T. Gevers, Y. LeCun, and A.M. Lopez, "Road scene segmentation from a single image," in Proceedings of the 12th European Conference on Computer Vision. Springer, 2012.
[8] Y. Boykov and M.P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images," in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on. IEEE, 2001, vol. 1, pp. 105–112.
[9] O. Veksler, "Star shape prior for graph-cut image segmentation," in Proceedings of the 10th European Conference on Computer Vision: Part III. Springer, 2008, pp. 454–467.
[10] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 2, pp. 147–159, 2004.
[11] J. Xiao, J. Hays, K.A. Ehinger, A. Oliva, and A. Torralba, "SUN database: Large-scale scene recognition from abbey to zoo," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3485–3492.
[12] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, no. 11, pp. 1222–1239, 2001.
[13] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, pp. 1124–1137, 2004.