Heading Direction Estimation Using Deep Learning
with Automatic Large-scale Data Acquisition
Rodrigo F. Berriel, Lucas Tabelini Torres, Vinicius B. Cardoso, Rânik Guidolini,
Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos
Departamento de Informática
Universidade Federal do Espírito Santo
Vitória, Brazil
Email: rfberriel@inf.ufes.br
Abstract—Advanced Driver Assistance Systems (ADAS) have
experienced major advances in the past few years. The main
objective of ADAS includes keeping the vehicle in the correct
road direction, and avoiding collision with other vehicles or
obstacles around. In this paper, we address the problem of
estimating the heading direction that keeps the vehicle aligned
with the road direction. This information can be used in precise
localization, road and lane keeping, lane departure warning, and
others. To enable this approach, a large-scale database (+1 million
images) was automatically acquired and annotated using publicly
available platforms such as the Google Street View API and
OpenStreetMap. After the acquisition of the database, a CNN
model was trained to predict how much the heading direction
of a car should change in order to align it to the road 4 meters
ahead. To assess the performance of the model, experiments were
performed using images from two different sources: a hidden
test set from Google Street View (GSV) images and two datasets
from our autonomous car (IARA). The model achieved a low
mean absolute error of 2.359° and 2.524° for the GSV and IARA
datasets, respectively; performing consistently across the different
datasets. It is worth noting that the images from the IARA dataset
are very different (camera, FOV, brightness, etc.) from the ones
of the GSV dataset, which shows the robustness of the model. In
conclusion, the model was trained effortlessly (using automatic
processes) and showed promising results in real-world databases
working in real-time (more than 75 frames per second).
Index Terms—Deep Learning; Heading Estimation; Convolutional Neural Networks.
I. INTRODUCTION
Advanced Driver Assistance Systems (ADAS) have experienced
major advances in the past few years. These systems
increase traffic safety and help drivers by monitoring the
surroundings and warning or acting to avoid accidents.
Three of the most common ADAS are stability control,
velocity control, and lane keeping. These technologies are also
part of systems that will be applied in fully autonomous cars.
The main objective of ADAS includes keeping the vehicle
in the correct road direction, and avoiding collision with other
vehicles or obstacles around. For this purpose, it is important
to estimate the current localization of the car in relation
to the lane or road. A common representation of the robot
Scholarships of Productivity on Research (grants 311120/2016-4 and
311504/2017-5) supported by Conselho Nacional de Desenvolvimento
Científico e Tecnológico (CNPq, Brazil); a scholarship supported by
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES,
Brazil); and Vale/FAPES (grant 75537958/16).
localization is the position and orientation, e.g., a 2D location
(x, y) and its heading direction (θ) in relation to the world
representation.
In this work, we address the problem of estimating the heading
direction that keeps the vehicle aligned with the road direction. This
information is essential for ADAS as well as for autonomous
car navigation. Moreover, this information can be used in
precise localization, road and lane keeping, driver snooze
warning [1], lane change detection [2], lane departure warning
[3], and others.
Several approaches can be applied to heading direction
estimation. The main ones are based on Global Navigation
Satellite Systems (GNSS), Inertial Measurement Units
(IMU) [4], localization using sensors such as Light Detection and
Ranging (LiDAR) with pre-computed maps [5], [6],
optical flow [7], and lane marking detection [8]. Nevertheless,
some of these approaches have drawbacks:
GNSS-based approaches suffer from satellite unavailability,
which can lead to position errors in urban environments with
many buildings, trees, or inside tunnels; IMU sensors are
noisy; and LiDAR sensors are still expensive. By contrast,
cameras are low-cost sensors and can be used in GNSS-denied
environments such as cities. Yet, most camera-based
approaches rely on hand-crafted feature extraction (e.g.,
SIFT [9], SURF [10], HOG [11], among others) and classical
machine learning methods. These methods have been shown to
be less generalizable and may fail in complex unstructured
environments such as those commonly found in real-world car navigation
and other driving-related tasks [12].
On the other hand, with the advances in Graphics Processing
Units (GPUs) and the availability of large-scale annotated
databases, Deep Neural Network (DNN) approaches have proven
better in several tasks, such as image recognition [13], object
detection [14], road segmentation [15], facial expression
recognition [16], [17], time-series forecasting [18], and many
others. DNNs learn which features must be considered instead
of relying on hand-crafted features, enabling end-to-end approaches, as
shown in [19]. The authors present a DNN that learns a
complex task, such as predicting car steering commands, using only the
front-view raw image as input in real-world situations (i.e.,
luminosity variations, usual traffic, on-road and off-road).
A drawback of DNNs is the need for large amounts of
Fig. 1: Overview of the proposed system. Regions of interest (ROIs) are given to OpenStreetMap (OSM), which returns a
list of points (blue circles) for each street in the ROIs. A linear interpolation is applied to generate new samples every 2
meters (gray circles). All the points are used to request images of the streets using the Google Street View API. After that, an
automatic annotation process is used to generate the training dataset for our model (a ConvNet). The model predicts the ∆θ.
(annotated) data to enable the learning and generalization
processes. Furthermore, the processes of acquisition and an-
notation of large datasets are expensive and time-consuming.
As an alternative, automatic acquisition and annotation of large
databases via crowdsourcing and online platforms have proven
to provide good enough data to train DNNs [20], [21].
We propose an end-to-end approach (Figure 1) that uses a
DNN to estimate the heading direction angle in relation to
the road, using only a forward-looking front-view camera raw
image as input. The estimated heading direction represents
how much the current heading must change so that the
vehicle keeps itself aligned to the road 4 meters ahead. The
proposed system, inspired by [22], leverages publicly available
platforms (such as Google Street View and OpenStreetMap)
to automatically acquire and annotate a large-scale database.
Then, this database is used to train a CNN model end-to-end
to perform the task of interest.
The proposed approach was evaluated in an intra-database
test (GSV) with street-view images from several regions not
used in training, and in a cross-database test (IARA), using an
urban road dataset from an autonomous car. The experiments
showed that our model can estimate the heading direction needed to
keep the car aligned with the road using only raw images
as input, with a mean absolute error (MAE) of 2.359° in
the GSV experiment. In the IARA experiments, the model
achieved an MAE of 2.524° on average (2.358° and 2.756° on
the first and second routes, respectively).
II. RELATED WORKS
There are several solutions proposed for lane detection that
could be used for heading direction estimation, and the most
common ones rely only on images. In [3], Berriel et al. propose
a system that works on a temporal sequence of images and,
for each image, one at a time, applies an inverse perspective
mapping, estimating the lane based on a combination of methods
(Hough lines with a Kalman filter and a spline with a particle
filter). The final lane is represented as a cubic spline. Jung et
al. [23] propose an aligned spatiotemporal image generated by
accumulating the pixels on a scanline along the time axis and
aligning consecutive scanlines. In that image, the trajectory of
the lane points appears smooth and forms a straight line. Lee
et al. [24] built a dataset with 20,000 images with labeled lanes
and trained a Convolutional Neural Network (CNN) to detect
and track the lane. Lane detection methods could be used to
compute the heading direction, but they generally depend on
labeled datasets (for image processing adjustments or CNN
training) that are expensive to generate.
In [21], Brahmbhatt and Hays collected a large-scale dataset
of street-view images organized in a graph where nodes are
connected by roads. They applied an A* algorithm to generate
the labels and trained a CNN to reach a destination by deciding
which direction to take at intersections. This method also uses
an automatic data acquisition and labeling system, but generating
the dataset labels is more complex because of the
A* algorithm used in the process. Another issue is that their
CNN finds the path instead of the heading direction.
Bojarski et al. [19] trained a CNN to map a single front
image to steering commands. Their system learned to drive
on roads with or without lane markings, with traffic, unpaved
roads, parking lots and highways. The data was acquired using
their autonomous car. For each image the steering wheel angle,
recorded while a human was driving, was used as the image
label. In [25], Gupta et al. implemented a similar approach, but
only for indoor robots. This approach is the most
similar to the one presented in this work, but their CNN predicts
actuation commands for the robot instead of the heading
direction. Another issue is that their approach depends on a
robot to generate the dataset, which is sometimes difficult
and expensive to obtain.
In the set of solutions that rely on LiDAR data, Veronese
et al. [26] employ particle filter localization by matching
the online 2D projection of a LiDAR point cloud with an offline
2D projection. Using this technique, they are able to compute
the car orientation in the world, but they cannot determine the road direction.
Hernández et al. [27] estimate the heading angle by identifying
lane marks on the road surface using the reflection of a laser
system. They employed DBSCAN, but could not
achieve good precision. The biggest disadvantages of working
with LiDARs are their prohibitive cost and the additional
drivers they generally require.
III. HEADING DIRECTION ESTIMATION SYSTEM
The proposed system is two-fold: i) automatic data
acquisition and annotation of a large-scale database; and ii)
training and testing a Convolutional Neural Network (CNN)
to estimate the heading direction. The system receives as input
a single image and predicts the difference (∆θ) between the current
car orientation (θ) and the “ideal” car orientation 4 meters
ahead (θ+4) in the road. The automatic data acquisition and
annotation exploits online mapping platforms (such as Google
Street View and OpenStreetMap) to acquire road definition
points and imagery of a set of regions. The annotated large-scale
database is then fed into a CNN model that estimates
the required ∆θ to keep the vehicle aligned to the road. An
overview of the proposed system is presented in Figure 1.
A. Automatic Data Acquisition and Annotation
In the past few years, CNNs have become very popular to
solve different kinds of problems, but a major drawback is that
they need large amounts of labeled data which may be difficult
and expensive to obtain. To overcome this issue, inspired by
[22], publicly available platforms are exploited to automati-
cally acquire and annotate a large-scale database. Firstly, the
data acquisition uses OpenStreetMap via the Overpass API to
obtain road definition points from a set of regions. Secondly,
images from the streets are acquired using Google Street View
Image APIs. Lastly, an automatic labeling process is applied
to generate the dataset.
OpenStreetMap, accessed via the Overpass API, receives an input
region and outputs the list of roads within that region, where
a road is represented by a list of sequential points (latitude and
longitude). To cope with an API limitation, only rectangular
regions with sides smaller than 1/4 of a degree (roughly 27.75 km)
were used. The list of points returned by the API is not evenly
distributed along the roads. Therefore, each road (sequence of
points) is upsampled (using linear interpolation) to enforce a
2-meter distance between two consecutive points. Points
outside the region of interest are discarded.
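To make this step more concrete, the sketch below shows one way to query road geometries through the Overpass API and resample them every 2 meters. It is a minimal illustration, not the authors' original code: the use of the `requests` library, the `highway` tag filter, and the equirectangular distance approximation are assumptions of this sketch.

```python
import math
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public Overpass endpoint

def fetch_roads(south, west, north, east):
    """Return a list of roads, each a list of (lat, lon) points, inside a bounding box."""
    query = f"""
    [out:json];
    way["highway"]({south},{west},{north},{east});
    out geom;
    """
    response = requests.get(OVERPASS_URL, params={"data": query})
    response.raise_for_status()
    roads = []
    for way in response.json()["elements"]:
        pts = [(node["lat"], node["lon"]) for node in way.get("geometry", [])]
        if len(pts) >= 2:
            roads.append(pts)
    return roads

def resample_road(points, spacing_m=2.0):
    """Linearly interpolate a road so consecutive points are roughly spacing_m apart."""
    resampled = [points[0]]
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        # Rough equirectangular approximation of the segment length in meters.
        dy = (lat2 - lat1) * 111_320.0
        dx = (lon2 - lon1) * 111_320.0 * math.cos(math.radians((lat1 + lat2) / 2))
        dist = math.hypot(dx, dy)
        steps = max(int(dist // spacing_m), 1)
        for i in range(1, steps + 1):
            t = i / steps
            resampled.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
    return resampled
```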
In order to acquire the images, three steps are performed:
i) for each point on the road, the location of the closest image
is retrieved by requesting it from the Google Street View Image
Metadata API; then, ii) for each point on the road, the angle
(θ, based on the geodetic north) between its predecessor (2 m
behind) and its successor (2 m ahead) is computed, assuming it
represents the direction of the road at that point; finally, iii) an
image is requested from the Google Street View Image API
using the location acquired in the first step and the angle
(heading) computed in the second step. As the database needs
to have images associated with different angles, a random
noise α ∼ U(−25°, 25°) is added to θ when requesting the
image in the third step. Therefore, the final heading angle of
a given image is equal to θ + α. In addition, to increase the
robustness of the model with respect to vertical oscillations, the
image is requested with a random pitch γ ∼ U(−10°, 10°).
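The three acquisition steps can be sketched as follows, assuming the publicly documented Street View Static API and its metadata endpoint; `API_KEY` is a placeholder, and the exact request parameters used by the authors are not stated in the paper. The `bearing_deg` helper computes the standard initial bearing between two geodetic points.

```python
import math
import random
import requests

METADATA_URL = "https://maps.googleapis.com/maps/api/streetview/metadata"
IMAGE_URL = "https://maps.googleapis.com/maps/api/streetview"
API_KEY = "YOUR_API_KEY"  # placeholder

def bearing_deg(p1, p2):
    """Initial bearing from p1 to p2 (lat, lon in degrees), clockwise from geodetic north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def acquire_sample(prev_pt, point, next_pt):
    """Fetch the GSV image closest to `point`, looking roughly along the road."""
    # Step i: location of the closest available panorama.
    meta = requests.get(METADATA_URL, params={
        "location": f"{point[0]},{point[1]}", "key": API_KEY}).json()
    if meta.get("status") != "OK":
        return None
    # Step ii: road direction at this point (predecessor to successor).
    theta = bearing_deg(prev_pt, next_pt)
    # Step iii: request the image with random heading noise and pitch.
    alpha = random.uniform(-25.0, 25.0)
    gamma = random.uniform(-10.0, 10.0)
    image = requests.get(IMAGE_URL, params={
        "size": "640x480",
        "location": f"{meta['location']['lat']},{meta['location']['lng']}",
        "heading": (theta + alpha) % 360.0,
        "pitch": gamma,
        "key": API_KEY,
    })
    return image.content, theta, alpha
```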
An example of the angle calculation process can be seen
in Figure 2, where the gray points are the road definition
points (labeled A–E) and the blue circle represents an image position.
Fig. 2: Angle calculation. The gray circles A, B, C, D, and E are
road definition points spaced 2 meters apart. The blue circle
represents an image location and B is its closest point. The angle
used to acquire a forward-looking image is denoted θ, derived from
the vector AC (upward dashed arrows represent the geodetic north).
The “ideal” angle 4 meters ahead is denoted θ+4, derived from the
vector CE. The difference between θ and θ+4 gives ∆θ, i.e., how
much the heading direction should change to keep the car aligned
4 meters ahead.
For example, point B is the closest one to an image
(blue circle), and the previous and next points 2 m away (A
and C, respectively) form the vector AC. In this context, θ is
given by the orientation of AC with respect to the geodetic north.
To generate a dataset for the task of interest, all images must
be annotated, i.e., every image must be associated with its
corresponding ∆θ (not the θ itself). Therefore, in possession
of the vector AC, the same process is performed for the point
4 m ahead of the point of interest (point B, in this example),
which is point D. Using the vector CE, the “ideal” angle
4 meters ahead (θ+4) can be calculated, as shown in Figure 2.
As a result, the ∆θ associated with the image represented by
the blue circle is defined as ∆θ = θ+4 − θ − α, where
α is the random noise added when requesting the image. This
process is performed for every image in the dataset.
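A hedged sketch of this labeling rule, ∆θ = θ+4 − θ − α, is given below. It reuses the `bearing_deg` helper from the previous sketch, and the indexing assumes the road was resampled every 2 meters, so the point 4 m ahead is two samples away.

```python
def wrap_angle(deg):
    """Wrap an angle difference to the (-180, 180] degree range."""
    return (deg + 180.0) % 360.0 - 180.0

def delta_theta_label(road, idx, alpha):
    """Compute the training label for the image taken near road[idx].

    road  -- list of (lat, lon) points resampled every 2 m
    idx   -- index of the closest road point (point B in Fig. 2)
    alpha -- heading noise added when the image was requested
    """
    theta = bearing_deg(road[idx - 1], road[idx + 1])      # direction at B (vector AC)
    theta_4m = bearing_deg(road[idx + 1], road[idx + 3])   # direction at D, 4 m ahead (vector CE)
    return wrap_angle(theta_4m - theta - alpha)
```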
B. Deep Neural Network for Heading Estimation
In possession of an annotated dataset, i.e., a set of images and
the corresponding difference in orientation (∆θ) the car must
achieve 4 m ahead, it is feasible to train a model. In this work, a
Convolutional Neural Network (CNN) was chosen. During
training, the CNN receives an image as input and predicts the
∆θ. The architecture of the CNN was inspired by AlexNet
[13], with some modifications: using fewer neurons in
the fully-connected layers; using Batch Normalization instead
of Local Response Normalization; replacing the output layer
with a single neuron with a linear activation function to perform
the regression; and changing the input size to 128×128 pixels.
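A sketch of such an architecture in tf.keras is shown below. The paper only states the modifications relative to AlexNet, so the filter counts, kernel sizes, and fully-connected widths here are illustrative assumptions rather than the authors' exact configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(128, 128, 1)):
    """AlexNet-inspired regressor; layer widths are illustrative guesses."""
    model = keras.Sequential([
        layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=input_shape),
        layers.BatchNormalization(),              # replaces AlexNet's Local Response Norm
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),     # fewer neurons than AlexNet's 4096
        layers.Dense(512, activation="relu"),
        layers.Dense(1, activation="linear"),     # single output: predicted delta-theta
    ])
    return model
```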
The dataset was split into 3 sets: train, validation and test.
None of the regions used in the test set were seen during
training. Details of the datasets are presented in Section IV.
During training, all images were downsampled to the input
size of the model: 128 × 128 pixels. In addition, the images
were converted to grayscale for two main reasons: i) color is
assumed not to be important for this task, and ii) performance
is a must (fewer channels mean less computation) for this
kind of application. The weights were randomly initialized
using [28]. The objective of the training procedure was to
minimize the Mean Squared Error (MSE). For that, the Adam
optimizer [29] (with β1 = 0.9 and β2 = 0.999) was used with an
initial learning rate of 10⁻³, decaying by a factor of 10 every time
the validation loss stopped improving for 3 consecutive epochs.
Moreover, the model was trained until the validation loss
reached a plateau for 10 consecutive epochs. After training,
the network is ready for inference and experiments.
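This training configuration could be expressed in modern tf.keras roughly as follows. The original work used Keras with TensorFlow 1.4, so this is a present-day approximation; `train_ds` and `val_ds` are assumed dataset objects, and the epoch cap is arbitrary since early stopping decides when to stop.

```python
from tensorflow import keras

model = build_model()  # from the previous sketch; Keras' default Glorot init matches [28]
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
    loss="mse",
)

callbacks = [
    # Divide the learning rate by 10 when the validation loss stalls for 3 epochs.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
    # Stop once the validation loss has not improved for 10 consecutive epochs.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]

# train_ds / val_ds are assumed to yield (grayscale 128x128 image, delta-theta) pairs.
model.fit(train_ds, validation_data=val_ds, epochs=200, callbacks=callbacks)
```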
IV. EXPERIMENTAL METHODOLOGY
In order to validate the proposed system, two experiments
were carried out. Each experiment was evaluated on a database
that was collected automatically using the proposed system
(Section III-A) and is described below. Moreover, the metric
used in the experiments is presented. Finally, the experiments
and setup are described in detail.
A. Datasets
To validate the proposed system, the experiments were
performed using two datasets: Google Street View (GSV)
and IARA. The GSV dataset was used to train the model
responsible for the prediction and to validate the performance
of the system. The IARA dataset was used exclusively for
performance evaluation. Both datasets are described below.
a) Google Street View (GSV) dataset: The GSV dataset
was generated automatically using the system described in
Section III-A. Therefore, the dataset contains images and the
corresponding ∆θ that will guide the vehicle to the heading
direction of the road. The images were collected from several
cities in Brazil, acquiring images from every single road within
a region of interest. These paths include samples of highways,
paved roads with and without lane markings, and unpaved
roads in both urban and rural areas; different road widths,
numbers of lanes, lane markings, and curvatures; streets with
strong shadows; and many other scenarios. Some samples can
be seen in Figure 3.
The dataset has a total of 1,035,524 images of 640 × 480
pixels and was split into train, validation, and test sets. The
train set has 782,123 images of nine different regions in five
different states of Brazil (Manaus-AM, Recife-PE, Rio de
Janeiro-RJ, Salvador-BA, São Paulo-SP, Piracicaba-SP, and
Teresina-PI). Even though a region is named after a city,
some of them include parts of neighboring cities as well. The
validation set has 45,534 images of one region (Natal-RN).
The test set has 207,867 images from three different cities in
three different states (Maceió-AL, Porto Alegre-RS, and Vitória-ES).
The region of Vitória-ES was specifically chosen to be
in the test set because of the experiments using the IARA dataset,
which was recorded there.
Fig. 3: Sample images of the GSV dataset.
b) IARA dataset: Another dataset was acquired using the
Intelligent Autonomous Robotic Automobile (IARA), shown
in Figure 4, which is developed by the High Performance
Computing Laboratory (LCAD, Laboratório de Computação
de Alto Desempenho in Portuguese). IARA's software is composed
of many modules, and the five main ones are: the Mapper
[30] computes a map of the environment around IARA; the
Localizer [26], [31] estimates IARA's state (position {x, y}
and orientation {θ}) relative to the origin of the map; the
Motion Planner [32] computes a trajectory from the current
IARA's state to the next goal state; the Obstacle Avoider
[33] verifies and eventually changes the current trajectory in
case it becomes necessary to avoid a collision; and the Controller
[34] converts the control commands of the trajectories into
acceleration, brake, and steering efforts to actuate on IARA's
hardware.
IARA's Localizer module computes the car orientation relative
to the map, and the map orientation is aligned to the world,
therefore directly providing the car orientation (θ), computed
at approximately 20 Hz (i.e., every 50 ms). The difference in
orientation the car must achieve 4 m ahead (∆θ) is obtained in
the same way as for the training dataset: the difference
between the orientation (θ) of the closest point of an image
and the orientation (θ+4) of the point 4 m ahead. The only
difference, though, is that in this case the orientations are
computed by IARA's Localizer, directly from sensor data.
After performing this process on every image, the dataset is
automatically annotated.
The IARA dataset was built to evaluate the robustness
of the model to different images from a different camera,
with different field-of-view, and other factors. To generate the
dataset, two routes were covered (both routes are in the same
city, Vitória-ES, and can be seen in Figure 7). The first route
has 6.2 km in an urban environment that crosses avenues with
single and multiple lanes and neighborhood areas with narrow
streets. The second route is the ring road of the Universidade
Federal do Espírito Santo (UFES) main campus, with 3.5 km.
This route has sharp and wide curves, varying road widths,
and variations in road pavement, such as cobblestones and asphalt,
most of it without lane markings.
To automatically annotate this dataset, a human driver had
to drive IARA through each route twice. In the first lap (i.e.,
ground-truth lap), the car is carefully driven trying to keep the
vehicle aligned to the road direction. This first lap is used as
Fig. 4: Intelligent Autonomous Robotic Automobile (IARA).
Fig. 5: Ground-truth calculation in the IARA dataset. The
green and pink dotted lines are the ground-truth and maneuver
laps, respectively. The circles A, B, C, D, and E are road
definition points spaced 2 meters apart.
The ground truth (∆θ) for a point (e.g., Bma) is given by the θ
of Bma and the θ+4 of Dgt, where Dgt is the point 4 m ahead of
the closest point to Bma in the ground-truth lap (i.e., Bgt).
ground truth for indicating the road direction. In the second
lap (i.e., maneuver lap), in order to generate more variability,
the driver intentionally performs maneuvers like sudden lane
changes and abrupt turns. These maneuvers cause the heading
direction of the car to differ from the heading direction of the
road, therefore creating a rich dataset. Finally, to compute the
∆θ for each image in the maneuver lap, the ground-truth and
the maneuver laps had to be synchronized. Since the maneuver
lap does not necessarily point in the direction of the lane, it
cannot be reliably used to calculate the ground-truth angle
4 m ahead (θ+4). Therefore, the laps are synchronized so that the
corresponding θ+4 can be calculated for every position in the
maneuver lap (see Figure 5). The ground-truth calculation is
as follows. For each image of the maneuver lap, the closest
pose of the car (Bma, point B in the maneuver lap) is used as
the location of the image. Then, the θ for the Bma position is
calculated using the information of the maneuver lap, following
the procedure illustrated in Figure 2. The corresponding
θ+4 is calculated using the ground-truth lap. Therefore, the
point Bgt in the ground-truth lap (i.e., the closest point to
Bma) is used to find the point Dgt that is 4 m ahead in
the ground-truth lap. The θ+4 of Dgt is calculated using the
information of the ground-truth lap, following the procedure
illustrated in Figure 2. Finally, ∆θ is computed using the θ of
Bma and the θ+4 of Dgt.
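A sketch of this ground-truth computation is given below, assuming both laps are available as lists of (x, y) positions resampled every 2 meters in the map frame; the resampling and the angle convention are assumptions of the sketch, not details given in the paper.

```python
import math

def heading_from_neighbors(lap, i):
    """Road-direction angle at lap[i], from the points ~2 m behind and ahead (cf. Fig. 2).

    Angles are measured in the map frame (counterclockwise from the x-axis); using the
    same convention for both laps keeps the difference internally consistent.
    """
    (x0, y0), (x1, y1) = lap[i - 1][:2], lap[i + 1][:2]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def closest_index(lap, x, y):
    """Index of the lap point closest to (x, y)."""
    return min(range(len(lap)), key=lambda i: (lap[i][0] - x) ** 2 + (lap[i][1] - y) ** 2)

def iara_delta_theta(maneuver_lap, gt_lap, img_idx):
    """Label an image of the maneuver lap with the delta-theta needed 4 m ahead."""
    # theta at B_ma, computed from the maneuver lap itself.
    theta = heading_from_neighbors(maneuver_lap, img_idx)
    # B_gt: closest ground-truth point; D_gt: the point 4 m (2 samples) ahead of it.
    b_gt = closest_index(gt_lap, *maneuver_lap[img_idx][:2])
    theta_4m = heading_from_neighbors(gt_lap, b_gt + 2)
    return wrap_angle(theta_4m - theta)   # wrap_angle from the earlier labeling sketch
```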
Some samples of the IARA dataset can be seen in
Figure 6. The images were collected in daylight on a weekday
with usual traffic. The first route has a total of 10,040
images; the second route has a total of 7,182 images. Images
from both routes were recorded by a ZED stereo camera installed
behind the rearview mirror of IARA. Images of the IARA
Fig. 6: Sample images of the IARA dataset (first route).
Fig. 7: Aerial view of the first (a) and second (b) routes of the
IARA dataset.
dataset were recorded at approximately 15 frames per second
at 1920 × 1080 pixels each. As the images from the GSV
dataset have a 4:3 aspect ratio, a centralized crop of 1440 × 1080
pixels was performed on the original images from the ZED
camera to achieve the same ratio. All three datasets (GSV and
the two routes of the IARA dataset) are very different from
one another (see Figure 3, Figure 6, and Figure 8): differences
in brightness, contrast, position, field of view, and others. All
these factors make this dataset more challenging.
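The cropping and resizing described above can be sketched with OpenCV as follows; the BGR channel order and the use of `cv2` are assumptions of this sketch.

```python
import cv2

def preprocess_iara_frame(frame_bgr):
    """Center-crop a 1920x1080 ZED frame to 4:3, convert to grayscale, resize to 128x128."""
    h, w = frame_bgr.shape[:2]          # expected 1080, 1920
    crop_w = int(h * 4 / 3)             # 1440 for a 1080-pixel-high frame
    x0 = (w - crop_w) // 2
    cropped = frame_bgr[:, x0:x0 + crop_w]
    gray = cv2.cvtColor(cropped, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (128, 128))
```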
B. Metrics
The metric for evaluation was the Mean Absolute Error
(MAE), as in Equation 1:

MAE = (1/n) Σ_{i=1}^{n} |θ̂_i − θ_i|    (1)

where n is the number of samples, θ̂_i is the prediction of the
model, and θ_i is the correct heading angle for the i-th sample.
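For reference, Equation 1 corresponds directly to the following NumPy computation (a trivial sketch):

```python
import numpy as np

def mean_absolute_error(predictions, targets):
    """MAE in degrees, as in Equation 1."""
    predictions, targets = np.asarray(predictions), np.asarray(targets)
    return float(np.mean(np.abs(predictions - targets)))
```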
C. Experiments
In order to evaluate the proposed system, two experiments
were performed. The experiments are differentiated by the
evaluation protocol: i) intra-database, where the model is
evaluated on part of the same database used during training
(train and test regions are mutually exclusive); and ii) cross-database,
where the model is trained on one database and
evaluated on a different one. In both experiments, the model
was trained only with the train set of the GSV dataset.
Fig. 8: Sample images of the IARA dataset (second route).
In the intra-database experiment, the trained model was used
to predict the heading direction in the test set of the GSV
dataset. Then, the heading direction estimated by the model
is compared with the ground truth of the GSV dataset test set
automatically acquired. In the cross-database experiment, the same
model used in the intra-database experiment is evaluated on the
datasets from IARA. This aims to evaluate the resulting model
in a real application, given that the model is expected to perform
(or, at least, to be robust) in a different and unpredictable
new setting: camera height, field of view, brightness, contrast,
etc.
D. Setup
The experiments were carried out in an Intel Core i7-4770
3.40 GHz with 16GB of RAM, and a Titan Xp GPU with
12GB of memory. The machine was running Linux Ubuntu
16.04 with NVIDIA CUDA 9.0 and cuDNN 7.0. The training
and inference steps were performed using Keras [35] with
TensorFlow 1.4 [36] as the backend. The IARA dataset (both
routes, with more than 17,000 images in total), the pre-trained
model, and scripts to use on your own data are publicly
available¹.
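Loading the released pre-trained model for inference could look like the sketch below; the file name is a placeholder (use the file distributed in the repository), and any input normalization used during training would have to be replicated exactly, which the paper does not state.

```python
import numpy as np
from tensorflow import keras

# "heading_model.h5" is a placeholder name for the released pre-trained model file.
model = keras.models.load_model("heading_model.h5")

def predict_delta_theta(gray_128):
    """Predict delta-theta (degrees) for one preprocessed 128x128 grayscale frame."""
    x = gray_128.astype(np.float32)[np.newaxis, :, :, np.newaxis]
    return float(model.predict(x, verbose=0)[0, 0])
```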
V. EXPERIMENTAL RESULTS
The first experiments were performed by both training and
testing on the GSV dataset. Training and test sets were region-
wise mutually exclusive, i.e., none of the regions used for
training were in the test sets. The results are reported in terms
of MAE. The MAE for the GSV dataset was equal to 2.359°.
As can be seen in Figure 9a, which depicts the frequency of
the errors, most of the errors are close to zero.
Besides the frequency of the errors, it can be seen in
Figure 9b that the MAE is stable across the range of heading angles to
be predicted, except for the angles closer to zero, where it is even lower. This
indicates that the model is more reliable closer to zero, where
most of the driving activity usually happens.
In the second experiment, the same model was evaluated
using the IARA datasets. None of the regions of the IARA datasets
were used to train the model. As can be seen in Figure 10a,
1https://github.com/rodrigoberriel/predict-future-heading
the frequency of the errors on the first route is also concentrated around
zero. Furthermore, despite the differences between these images and those
of the GSV dataset, the MAE also tends to be smaller near
zero (see Figure 10b). The mean absolute error on the first
route was 2.358°.
The distribution of the errors on the second route (see
Figure 11a) is on par with the results of both the GSV dataset
and the first route of the IARA dataset. This fact indicates
that the model is robust to the several factors that vary between
these datasets. In Figure 11b, it can be seen that the model
struggled to predict ∆θ greater than 15°. However, despite the
fact that most of the second route has no asphalt and no lane
markings on the street (only cobblestones), the mean absolute error was
2.756°. On average, the model achieved an MAE of 2.524° in
the IARA experiments.
Another important fact that is worth noting is that, unlike
[19], we neither balanced the training set with road curves nor
excluded lane changes and turns from one road to another.
Both these changes could improve the performance of the
model even further. Even though there are other works related
to ours, fair comparisons are difficult for many reasons: i) the
variables of interest are different; ii) oftentimes, the databases
are private (i.e., we cannot run our experiments on their data);
and iii) neither the models nor a way to reproduce their work
are provided (i.e., we cannot run their models on our data). To
enable future fair comparisons, we released our database (first
and second routes of the IARA dataset) and the pre-trained
models.
Qualitative results on the IARA datasets can be seen in the
videos of the first² and second³ routes. As can be seen in
the videos, the images are very different, and there are many lane
changes, turns from one road to another, and intersections. In
addition, the dataset was recorded with the usual traffic of a
weekday. A top view is also depicted in the videos to facilitate
the quality assessment. Moreover, the model can predict at
more than 75 frames per second using the GPU, i.e., the system
is suitable for real-time applications.
2https://youtu.be/htl eJP5oMo
3https://youtu.be/XXckftkJLGc
Fig. 9: Results for the GSV dataset: a) presents the frequency of the errors and b) the MAE per true ∆θ.
Fig. 10: Results on the first route of the IARA dataset: a) presents the frequency of the errors and b) the MAE per true ∆θ.
VI. CONCLUSION
In this work, we presented a model that is able to predict
how much the heading angle of a vehicle must change to
be aligned with the road 4 meters ahead based on a single
image. For that, a large-scale database (+1 million images)
was automatically acquired and annotated using images from
publicly available platforms (e.g., Google Street View, Open-
StreetMap). To evaluate our model, two protocols were used:
intra-database and cross-database. In the intra-database exper-
iment, the model was evaluated in regions never seen during
training, i.e., mutually exclusive, and it was able to achieve a
mean absolute error (MAE) of 2.359°. In the cross-database
experiments, the same model used in the intra-database experiment
was evaluated on two datasets acquired using an autonomous car.
These datasets were automatically annotated using IARA's
systems. In this setting, the model was able to achieve an MAE
of 2.524° (2.358° and 2.756° on the first and second routes,
respectively). The results are promising and indicate that the
model is very robust, especially when the car is almost aligned
to the road. Also, qualitative results show that the predictions
of the model are fairly stable and it works in real-world traffic
situations.
For future work, the analysis of the results will be further
extended in order to investigate some of the biases that can
be observed in the cross-database evaluation. In addition,
normalizations must be explored even further in order to
reduce the difference between the GSV and the other databases
in an automatic way, without penalizing (and perhaps improving)
the performance.
ACKNOWLEDGMENT
We gratefully acknowledge the support of NVIDIA Corpo-
ration with the donation of the GPUs used for this research.
Cloud computing resources were provided by a Microsoft
Azure for Research award.
REFERENCES
[1] W. Zhang, Y. L. Murphey, T. Wang, and Q. Xu, “Driver yawning
detection based on deep convolutional neural learning and robust
nose tracking,” in International Joint Conference on Neural Networks
(IJCNN), 2015, pp. 1–8.
[2] X. Wang, Y. L. Murphey, and D. S. Kochhar, “MTS-DeepNet for
lane change prediction,” in International Joint Conference on Neural
Networks (IJCNN), 2016, pp. 4571–4578.
[3] R. F. Berriel, E. de Aguiar, A. F. de Souza, and T. Oliveira-Santos,
“Ego-Lane Analysis System (ELAS): Dataset and Algorithms,” Image
and Vision Computing, vol. 68, pp. 64–75, 2017.
[4] K. Gade, “The Seven Ways to Find Heading,” Journal of Navigation,
vol. 69, no. 05, pp. 955–970, 2016.
[5] J. Levinson, M. Montemerlo, and S. Thrun, “Map-Based Precision
Vehicle Localization in Urban Environments,” in Robotics: Science and
Systems III, vol. 4, 2007.
Fig. 11: Results on the second route of the IARA dataset: a) presents the frequency of the errors and b) the MAE per true ∆θ.
[6] L. de Paula Veronese, J. Guivant, F. A. A. Cheein, T. Oliveira-Santos,
F. Mutz, E. de Aguiar, C. Badue, and A. F. D. Souza, “A light-weight
yet accurate localization system for autonomous cars in large-scale
and complex environments,” in International Conference on Intelligent
Transportation Systems (ITSC), 2016, pp. 520–525.
[7] M. J. Cree, J. A. Perrone, G. Anthonys, A. C. Garnett, and H. Gouk,
“Estimating heading direction from monocular video sequences using
biologically-based sensors,” in International Conference on Image and
Vision Computing New Zealand (IVCNZ), 2016, pp. 1–6.
[8] R. F. Berriel, E. de Aguiar, V. V. de Souza Filho, and T. Oliveira-Santos,
“A Particle Filter-Based Lane Marker Tracking Approach Using a Cubic
Spline Model,” in SIBGRAPI Conference on Graphics, Patterns and
Images, 2015, pp. 149–156.
[9] D. G. Lowe, “Object recognition from local scale-invariant features,” in
International Conference on Computer Vision (ICCV), 1999, pp. 1150–
1157.
[10] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust
Features,” in European Conference on Computer Vision (ECCV), 2006,
pp. 404–417.
[11] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection,” in Conference on Computer Vision and Pattern Recognition
(CVPR), 2005, pp. 886–893.
L. Tai and M. Liu, “Deep-learning in Mobile Robotics - from Perception
to Control Systems: A Survey on Why and Why not,” arXiv preprint
arXiv:1612.07139, 2016.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification
with Deep Convolutional Neural Networks,” in Advances in Neural
Information Processing Systems (NIPS), 2012, pp. 1097–1105.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only
Look Once: Unified, Real-Time Object Detection,” in Conference on
Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[15] Z. Chen and Z. Chen, “RBNet: A Deep Neural Network for Unified
Road and Road Boundary Detection,” in International Conference on
Neural Information Processing, 2017, pp. 677–687.
[16] A. T. Lopes, E. de Aguiar, A. F. De Souza, and T. Oliveira-Santos,
“Facial expression recognition with convolutional neural networks: cop-
ing with few data and the training sample order,” Pattern Recognition,
vol. 61, pp. 610–628, 2017.
[17] M. V. Zavarez, R. F. Berriel, and T. Oliveira-Santos, “Cross-Database Fa-
cial Expression Recognition Based on Fine-Tuned Deep Convolutional
Network,” in SIBGRAPI Conference on Graphics, Patterns and Images,
2017, pp. 405–412.
[18] R. F. Berriel, A. T. Lopes, A. Rodrigues, F. M. Varejão, and T. Oliveira-
Santos, “Monthly Energy Consumption Forecast: A Deep Learning Ap-
proach,” in International Joint Conference on Neural Networks (IJCNN),
2017, pp. 4283–4290.
[19] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp,
P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to
End Learning for Self-Driving Cars,” arXiv preprint arXiv:1604.07316,
2016.
[20] R. F. Berriel, A. T. Lopes, A. F. de Souza, and T. Oliveira-Santos, “Deep
Learning-Based Large-Scale Automatic Satellite Crosswalk Classifica-
tion,” Geoscience and Remote Sensing Letters, vol. 14, no. 9, pp. 1513–
1517, 2017.
[21] S. Brahmbhatt and J. Hays, “DeepNav: Learning to Navigate Large
Cities,” in Conference on Computer Vision and Pattern Recognition
(CVPR), 2017, pp. 3087–3096.
[22] R. F. Berriel, F. S. Rossi, A. F. de Souza, and T. Oliveira-Santos,
“Automatic large-scale data acquisition via crowdsourcing for crosswalk
classification: A deep learning approach,” Computers & Graphics,
vol. 68, pp. 32–42, 2017.
[23] S. Jung, J. Youn, and S. Sull, “Efficient lane detection based on spa-
tiotemporal images,” Transactions on Intelligent Transportation Systems,
vol. 17, no. 1, pp. 289–295, 2016.
[24] S. Lee, I. S. Kweon, J. Kim, J. S. Yoon, S. Shin, O. Bailo, N. Kim, T.-
H. Lee, H. S. Hong, and S.-H. Han, “VPGNet: Vanishing Point Guided
Network for Lane and Road Marking Detection and Recognition,” in
International Conference on Computer Vision (ICCV), 2017, pp. 1965–
1973.
[25] S. Gupta, J. Davidson, S. Levine, R. Sukthankar, and J. Malik, “Cogni-
tive Mapping and Planning for Visual Navigation,” in Conference on
Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7272–
7281.
[26] L. de Paula Veronese, J. Guivant, F. A. A. Cheein, T. Oliveira-Santos,
F. Mutz, E. de Aguiar, C. Badue, and A. F. De Souza, “A light-weight
yet accurate localization system for autonomous cars in large-scale
and complex environments,” in International Conference on Intelligent
Transportation Systems (ITSC), 2016, pp. 520–525.
[27] D. C. Hernandez, A. Filonenko, D. Seo, and K.-H. Jo, “Laser scanner
based heading angle and distance estimation,” in International Confer-
ence on Industrial Technology (ICIT), 2015, pp. 1718–1722.
[28] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep
feedforward neural networks,” in International Conference on Artificial
Intelligence and Statistics, 2010, pp. 249–256.
[29] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
in International Conference on Learning Representations (ICLR), 2015.
[30] F. Mutz, L. P. Veronese, T. Oliveira-Santos, E. de Aguiar, F. A. A.
Cheein, and A. F. De Souza, “Large-scale mapping in complex field
scenarios using an autonomous car,” Expert Systems with Applications,
vol. 46, pp. 439–462, 2016.
[31] L. de Paula Veronese, E. de Aguiar, R. C. Nascimento, J. Guivant,
F. A. A. Cheein, A. F. De Souza, and T. Oliveira-Santos, “Re-emission
and satellite aerial maps applied to vehicle localization on urban environ-
ments,” in International Conference on Intelligent Robots and Systems
(IROS), 2015, pp. 4285–4290.
[32] V. Cardoso, J. Oliveira, T. Teixeira, C. Badue, F. Mutz, T. Oliveira-
Santos, L. Veronese, and A. F. De Souza, “A Model-Predictive Motion
Planner for the IARA autonomous car,” in International Conference on
Robotics and Automation (ICRA), 2017, pp. 225–230.
[33] R. Guidolini, C. Badue, M. Berger, L. de Paula Veronese, and A. F.
De Souza, “A simple yet effective obstacle avoider for the IARA au-
tonomous car,” in International Conference on Intelligent Transportation
Systems (ITSC), 2016, pp. 1914–1919.
[34] R. Guidolini, A. F. De Souza, F. Mutz, and C. Badue, “Neural-based
model predictive control for tackling steering delays of autonomous
cars,” in International Joint Conference on Neural Networks (IJCNN),
2017, pp. 4324–4331.
[35] F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.
[36] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S.
Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow,
A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser,
M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray,
C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar,
P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals,
P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng,
“TensorFlow: Large-scale machine learning on heterogeneous systems,”
2015, software available from tensorflow.org. [Online]. Available:
https://www.tensorflow.org/