
Collecting and Processing a Self-Driving Dataset in the UPB Campus

Andrei Cristian Nica
Mihai Tr˘
Alexandru Andrei Rotaru
Constantin Andreescu
Alexandru Sorici
AIMAS, Faculty of Automatic Control and Computers
University Politehnica Bucharest
Bucharest, Romania
Adina Magda Florea
Vlad Bacue
Prime Motors Industries
Bucharest, Romania
Abstract—Although there is a diversity of publicly available
datasets for autonomous driving, from small-scale to larger
collections with thousands of miles of driving, we consider that
the process of collecting and processing them is often overlooked
in the literature. From a data-driven perspective, the quality of a
dataset has proven as important as its quantity, especially when
evaluating self-driving technologies where safety is crucial. In this
paper, we provide a guideline going through all the steps from
configuring the hardware setup to obtaining a clean dataset. We
describe the data collection scenario design, the hardware and
software employed in the process, the challenges that must be
considered, and the data filtering and validation stages. This work stems
from our experience in collecting the UPB campus driving dataset
released together with this work. It is our belief that having a
clean and efficient process of collecting a small but meaningful
dataset has the potential to improve benchmarking autonomous
driving solutions, capturing local environment particularities.
Index Terms—autonomous driving, dataset collection, machine learning
The popularity of autonomous driving has led to an increase
in vision-based dataset releases in the last few years [7, 17,
18, 24]. Commercial interest has also spiked, with companies
involved in autonomous driving collecting vast amounts of
driving data. Unofficial statements place Uber at slightly over
1 billion miles and Waymo at 10 million miles collected, with
other companies not lagging far behind. Data collection
has been at the forefront of the effort to make the main algorithmic
components involved in self-driving decisions live up to the
required safety standards. As data becomes more abundant, the
general belief is that the self-driving behavior becomes less
error-prone. However, most of the collected data consists of
well constrained and structured environments, like highway
driving in sunny weather. Thus, the challenge of ensuring
that autonomous vehicles behave predictably in complex,
cluttered environments remains unresolved, and the available
margin is small: even spurious errors might
have tragic consequences. Evaluating systems against these
environments [14, 21] indicates that such errors indeed occur
with current technologies. Considering that most vehicles tend
to be driven in largely the same geographical area, we believe
that collecting a dataset from the target environment has the
potential to increase performance and reduce errors.
In this paper we present our view on the process and results
of collecting a dataset for a specific location, in our case the
UPB campus. This work is part of a larger project intended
to develop computer vision methods for a wide array of
assisted driving tasks. Algorithmic modules will be proposed
for autonomous driving together with a prototype system based
on an electric converted car from Prime Motors Industry.
The data collection sub-project is split into two stages. The
first stage involves less expensive commodity hardware, requiring
a less complex setup and faster collection procedures.
Although data is gathered at a faster pace and with less time
required to repair any software or hardware faults, the less
accurate setup means more time is spent during calibration,
data pre-processing, cleaning and validation stages. Therefore,
stage two of data collection involves more accurate sensors
and additional data modalities: a LiDAR, 5 monocular
cameras, a frontal radar and a dedicated IMU+GPS module.
This second method requires more effort to setup but usually
yields more reliable data. Our motivation for this two stage
approach is based on our plans of performing initial tests
on a low-cost setup, mainly targeting RGB sensors, and then
comparing performance versus a more complicated setup. This
paper describes the results of the first stage of data collection.
In terms of vision-based algorithms, we intend to use the
collected dataset in order to evaluate end-to-end steering
models based on deep neural networks, similar to [25]. We
intend to leverage large scale datasets in a semi-supervised
manner in order to successfully transfer knowledge to our local environment.
Our goal for the dataset is to collect particularities of
the local scene. We are interested in varying environmental
conditions, like time of day, and in capturing different behaviors
by using multiple drivers. One aspect we have taken into
consideration is to record and then annotate examples of "bad
driving", in which the driver is instructed to deviate from
optimal trajectories. This produces recordings with situations
where the car veers slightly off the road, where it wrongfully
departs the lane or the car is driven into a dead end.
Another interesting aspect we introduced in our dataset was
the use of a three front facing camera setup. These commodity
cameras were mounted on top of the car in diverse setups
in which we varied their position or viewing angle, allowing
us to explore data augmentation techniques. Also notable is
that our driving sessions started and ended in the same
location, which was carefully chosen to contain multiple
visual cues at various ranges. This has facilitated the
calibration process and allowed for faster data validation.
Our contributions are as follows:
- we provide an overview of a dataset collection pipeline and process, with various challenges and our proposed technical solutions for them;
- we collected, processed, validated and released a public dataset in the UPB campus;
- we open-source and describe solutions and tools for these tasks.
In Section II we describe other related datasets and how
they relate to our work. Section III describes the hardware
setup and the software tools and scripts used to collect the
dataset. We present solutions and ideas for cleaning the data
in Section IV. The characteristics of the dataset are discussed
in Section V, while in Section VI we present our conclusions
and ideas for future work.
Being one of the first large datasets released, KITTI [11]
quickly became one of the most widely used datasets for
autonomous driving benchmarks. It provides diverse traffic
conditions in different illuminations of the environment. The
size varies depending on the desired vision task, summing up
at around 300 gigabytes. The data provided contains videos,
images, LiDAR and GPS in different traffic conditions.
Since then there have been various works and investments in
this direction, fuelled by advancements in the autonomous
driving industry. Recent research [12] presents a well documented
overview of publicly open datasets for autonomous
driving, covering 45 released datasets. Recent works have
managed to advance the standards for this topic both in size
and in quality.
Two large-scale and complex datasets that we believe are
worth mentioning are NuScenes [18] and the ApolloScape [15]
dataset. NuScenes provides information from a comprehensive
autonomous vehicle sensor suite and it contains around 1k
videos with three-dimensional bounding boxes and behavior
labels. ApolloScape also contains rich labeling and claims
to make available at least 15 times more data than state-of-the-art
datasets for multiple autonomous driving related
tasks such as per-pixel labeling, instance segmentation, 3D
car instance labeling and many more. They proposed labeling
tools and algorithms for each task as the cost of manually
processing is prohibitive.
The Berkeley Deep Drive (BDD) [25] dataset, which inspired
us in using their proposed data format (given the similar
setup), has two main characteristics, namely its large
size and the diversity of conditions. BDD contains 100K
videos and provides urban, rural, highway traffic in different
illumination conditions. Each raw video is around 40
seconds long and is annotated with weather tags, time of day
and scene context (residential, tunnel, etc.). This diversity
matches the characteristics needed for scaling machine learning
algorithms. However, the quality of some of the videos, in
terms of their relevance to driving is somewhat degraded (e.g.
stationary cars, unintended changes in the position of the
recording smartphone).
One of the few research works similar to ours, due
to its aim to discuss in more detail a methodology of
collecting data at large scale, was presented in [9]. They proposed a
full pipeline with all the stages for building an autonomous
driving dataset: off-loading, cleaning, synchronizing and ex-
tracting knowledge from data. Each stage is described with
its challenges and proposed technical solutions. Not only do
they formalize all the tasks, but they also present the hardware
components necessary to semi-automate a continuous data
collection pipeline.
As discussed in Section III we have used commodity
hardware and produced a quality dataset based on the campus
university road layout. However, as discussed in this paper,
the amount of work required to obtain a dataset is high and
all public research attempting to better explain and facilitate
this process is very important, especially to the open research community.
In the process of designing a self-driving dataset, the first
step is to determine the hardware setup which is going to be
used. As stated previously, one of the goals of the project is
to implement a self-driving prototype on an electric car. In
our case, we used an electric powered Dacia Logan (version
2), converted from a petrol internal combustion engine. In our
experience, using an electric vehicle for autonomous driving
is sometimes easier in terms of control due to its usually
advanced Battery Management System (BMS). Commercially
available BMSs usually expose useful information and
alerts regarding the charging state, acceleration and braking
(i.e. regenerative braking). Although communication with the
BMS is easy, recovering the state of other on-vehicle sensors
(steering angle, wheel speed, etc.) requires decoding messages
from the vehicle CAN bus.
From the point of view of installed sensors, we designed a
setup containing 3 cameras (Logitech C930e), a CAN to
USB adapter (Orion BMS CANdapter) and a smartphone
(Samsung S8). All were connected to an onboard laptop
which served the role of recording the sessions.
The 3 cameras captured video at 1080p with their
bitstream copied to the output, without re-encoding, resulting
in 30 FPS in MJPEG YUVJ422P format. Notably, the frame
rate drops to about 10 FPS in low light conditions. Recording
TABLE I
Source data stream frequencies and low-frequency drop conditions
Source | Mean   | Std    | Threshold | Conditions
Camera | 30 Hz  | -      | 10 Hz     | low light
Phone  | 60 Hz  | 5 Hz   | 4 stdev   | every 41 min
- GPS  | 1 Hz   | -      | -         | -
Steer  | 100 Hz | 7.4 Hz | 4 stdev   | every 45 min
Speed  | 25 Hz  | 2 Hz   | 4 stdev   | every 47 min
using the H.264 format proved unreliable due to latency and
instability at 1080p resolution. We report the data
logging frequency measurements in Table I, together with how
often outliers occur in the system pipeline. Even in this small
amount of recording time, we noticed a gap in raw CAN data for
distances of almost 12 m while the car was driving at a speed
of about 40 km/h.
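The bitstream-copy recording described above can be sketched as an FFmpeg invocation built in Python; the device path, output file name and exact flag values below are illustrative assumptions rather than our actual recording script:

```python
# Sketch of a raw MJPEG capture command for a Linux webcam; `-c:v copy`
# stores the camera's native MJPEG stream without re-encoding, avoiding
# the latency that made H.264 capture unreliable at 1080p in our setup.
# Device path and output name are illustrative, not our real configuration.

def mjpeg_copy_command(device="/dev/video0", out="session_cam0.mkv",
                       size="1920x1080", fps=30):
    """Return an ffmpeg argument list that copies the MJPEG bitstream to disk."""
    return [
        "ffmpeg",
        "-f", "v4l2",                # Linux webcam input
        "-input_format", "mjpeg",    # request the camera's MJPEG stream
        "-video_size", size,
        "-framerate", str(fps),
        "-i", device,
        "-c:v", "copy",              # bitstream copy, no transcoding
        out,
    ]

cmd = mjpeg_copy_command()
```

The command can then be launched with `subprocess.Popen(cmd)` from the recording thread.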
The cameras were mounted front facing on top of the
car, just above the windscreen, using commodity suction cup
camera mounts. The central camera was positioned along the
longitudinal axis of the car. The two lateral cameras were
placed either 40 or 56 cm away from the central
camera, with their viewing angle either 0° or 30° from the
longitudinal axis. The smartphone was placed horizontally
on the center of the dashboard. Figure 1 exemplifies the
positioning of the cameras and of the smartphone.
Fig. 1. Camera and smartphone positioning setups. In some setups the
cameras were translated away from the center of the car, and in other setups
the camera viewing angle was rotated outwards.
Most of the software tools for recording, cleaning, synchro-
nizing, processing and visualizing the data are implemented
as Python scripts, with some of them running on the onboard
laptop. The following list briefly describes the most important
scripts that we used for the dataset collection and preparation:
- Recording server - Launches the recording server based on a configuration file which specifies data sources and formats. It starts threads for each data source and displays the log screen for them, with recording starting only after all desired sensor sources are validated. Incoming data is saved raw, so as to avoid the possibility of losing data due to encoding lag, and each data item is assigned a timestamp.
- Phone sensor collection app - A Unity application deployed on the Android smartphone which gathers the required data from the embedded sensors and passes them to the server via a websocket connection. Unity was used for ease of cross-platform deployment of the app.
- Camera recording script - Offers a configurable FFmpeg recording thread and tools for visually assisting the user during camera installation.
- CAN recording script - Raw CAN messages are recorded using the default Linux candump tool, which can also be used for packet analysis. The message decoding process is configured by a DBC format file which provides the conversion scheme.
- Session processing script - Makes use of the recorded session configuration and guides the process of validating, syncing and post-processing all gathered sensor data.
- Session replay and statistics script - Used after the session data has been processed for viewing several statistics, such as those mentioned in Table I and Figure 6. The script also visually replays all important dataset features while easily annotating important timestamps.
- Path preview script - Previews the future car path and several distance landmarks in the camera frame, based on the camera properties (intrinsic and extrinsic), the car specifications (wheelbase and steering ratio) and the steering angle.
- Export script - The final step in exporting all session data in a homogeneous format, described in Section V. It uses the previously annotated labels for export and partitions the data into fixed temporal length splits of 36 seconds. In this stage, unwanted sections are disregarded (e.g. long stationary sequences) and other configurable transformations are applied (e.g. video compression).
- Autonomy estimation script - Estimates an autonomy measure, as proposed in [2], based on the number of interventions necessary when the simulated vehicle departs from the recorded trajectory. It takes advantage of the three camera setup and visually simulates deviations from the recorded orientation and position of the car.
For each script exposing a recording function we have also
provided tools for raw data visualization, preprocessing and
for metrics computation.
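As an illustration of the autonomy measure estimated above, the metric from [2] charges a fixed time penalty for every human intervention; the sketch below assumes the 6 s penalty used in the original paper:

```python
def autonomy(num_interventions, elapsed_seconds, penalty_seconds=6.0):
    """Autonomy metric from Bojarski et al. [2]: each (simulated)
    intervention is charged a fixed time penalty; the result is the
    fraction of time the vehicle drove itself."""
    return 1.0 - (num_interventions * penalty_seconds) / elapsed_seconds
```

For example, 2 interventions over a 10-minute drive yield an autonomy of 0.98.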
Another possible solution for recording all the data that
we considered is using the ROS framework. This
allows taking advantage of a system already designed for
communication involving heterogeneous data streams and all
the tools already implemented for data playback. Systems
such as the Apollo Framework 3.0 and other datasets [19]
used this framework for this use case. The first reason for
us choosing a different solution was the resulting high
dataset size. We estimated that storing one single raw 1080p video
stream at 30 FPS would have reached more than 500 GB. This
would most likely result in low frequency because of storage
and bandwidth limitations. The second reason was the known
latency and throughput problems of ROS based systems, also
indicated in [13], which became one of the major reasons for
the development of ROS 2.
Calibrating the sensors is an important step in ensuring that
reliable data is recorded. For a vision-based dataset like the one
we collected, camera calibration before a recording session is
a necessary step.
For computing the camera intrinsic matrix, we resorted
to the ROS solution for monocular cameras, using the
calibration node that outputs the camera and distortion
coefficient matrices for the three cameras used in
the data collection process.
We have tested two different methods for obtaining the cam-
era extrinsic matrix. The first method consists in measuring
the distances from the camera to a set of crafted landmarks.
A bull's-eye-like pattern was used to mitigate possible errors.
Measurements were performed using laser tape measures
relative to four points, representing the four corners of the car
windshield. The results were later translated to the coordinate
system origin, which we considered to be the midpoint of
the car's rear axle, just as in the Apollo framework. Feeding
the measurements into a trilateration algorithm resulted in
a system of linear equations, whose solution we chose to
approximate using the conjugate gradient method. The second
step in determining the extrinsic calibration matrix was to
compute the rotation vector. We have manually extracted 2D
coordinates of pixels representing the center of the landmark
in the picture. Then, the 2D-3D correspondences were used
for solving a Perspective-n-Point (PnP) problem using tools
from the OpenCV library [3].
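The trilateration step can be sketched as follows: subtracting one range equation from the others cancels the quadratic term and leaves a linear system, solved here in a least-squares sense (`np.linalg.lstsq` stands in for the conjugate gradient solver mentioned above):

```python
import numpy as np

def trilaterate(landmarks, dists):
    """Estimate a 3D position from landmark positions and measured
    distances. For each i: |x - p_i|^2 = d_i^2; subtracting the first
    equation cancels |x|^2 and gives the linear system
        2 (p_i - p_0) . x = (d_0^2 - d_i^2) + (|p_i|^2 - |p_0|^2).
    A sketch of the approach, not the project's exact solver."""
    p = np.asarray(landmarks, dtype=float)
    d = np.asarray(dists, dtype=float)
    A = 2.0 * (p[1:] - p[0])
    b = (d[0] ** 2 - d[1:] ** 2) + (np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With four or more non-coplanar landmarks the system is overdetermined and the least-squares solution averages out small measurement errors.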
The second method relies on visual cues of known landmarks,
taking advantage of parallel and perpendicular lines
and planes. Each value of the rotation vector was determined
by matching real distances and lines with their projected
counterparts. The translation vector was determined beforehand
by measuring the camera positioning against the car
size specifications provided by the manufacturer. Again,
the midpoint of the car's rear
axle was used as the system origin.
In the second iteration of the dataset, which will contain
more advanced sensors, calibrations will be performed using
the LiDAR system. We will employ a combined LiDAR and
camera calibration approach similar to those presented for
other datasets [18].
Another challenge, also considered in the literature [4, 9], is
the synchronization procedure of all the sensors. An efficient
method is to generate the same signal on all the different
sources and offset the respective data streams accordingly. We
propose using either the turn signal lights or the horn sound
(which, in our case, was not logged on the CAN bus).
Both signals can be recorded by all devices, since the related
events are usually advertised on the CAN bus, together with
either sound or video from the phone and cameras at session
start. We mention that a variation in synchronization offsets
might be possible, but we did not experience this during our
45 min recording sessions. In order to obtain higher quality
data, we pay close attention to any offsets observed while
replaying the processed data. We also noticed that during our
first trials we were able to efficiently synchronize a session
by manually annotating the collision timestamps with speed
bumps and synchronize them with corresponding accelerome-
ter sensor spikes and camera movement.
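The manual spike matching described above can also be automated by cross-correlating two streams that observe the same event (e.g. accelerometer magnitude against a motion signal derived from the video); the sketch below assumes both streams have already been resampled to a common rate:

```python
import numpy as np

def estimate_offset(sig_a, sig_b, rate_hz):
    """Estimate the time offset between two streams sampled at the same
    rate that both see the same speed-bump spikes. Positive result means
    sig_b lags sig_a. A sketch of the idea, not our pipeline code."""
    a = np.asarray(sig_a, float) - np.mean(sig_a)
    b = np.asarray(sig_b, float) - np.mean(sig_b)
    corr = np.correlate(b, a, mode="full")
    # In 'full' mode the zero-lag term sits at index len(a) - 1.
    lag = int(np.argmax(corr)) - (len(a) - 1)
    return lag / rate_hz
```

The peak of the cross-correlation picks out the shift at which the spike trains align, replacing the manual timestamp annotation.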
Based on the phone sensors, the approximated global car
path has high errors, with temporally collocated drifts
that reach an estimated 35 m, as seen in Figure
2. The GPS recordings produce a mean error estimate of
4.5 m ±3.6 m. We observe an increase in global location
accuracy when computing the car path based on the speed
and steer angle reported on the CAN bus. To mitigate error
accumulation, we split the data in chunks of 36 s and manually
calibrate them using visual cues from the map and
video data, with the possibility of further error correction for
sub-segments. Although we do not reach the recommended
location accuracy of a few centimeters [22], we are able to
lower it to an estimated maximum global error of 1.5 m, which
can be further improved through more fine-grained manual correction.
Fig. 2. Difference between GPS coordinates in red and corrected path in
blue. Measurements in the images are represented in meters.
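Computing the car path from the CAN speed and steer angle, as described above, can be sketched with a kinematic bicycle model; the wheelbase value below is illustrative (not the Logan's exact specification), and we assume the steering-wheel angle has already been converted to a wheel angle via the steering ratio:

```python
import math

def integrate_path(speeds_mps, steer_rad, dt, wheelbase=2.6):
    """Dead-reckon the car path from speed (m/s) and wheel steering
    angle (rad) samples using a kinematic bicycle model.
    Returns a list of (x, y, heading) states; a sketch only."""
    x = y = yaw = 0.0
    path = [(x, y, yaw)]
    for v, delta in zip(speeds_mps, steer_rad):
        x += v * math.cos(yaw) * dt          # advance along current heading
        y += v * math.sin(yaw) * dt
        yaw += v * math.tan(delta) / wheelbase * dt  # turn rate from steer
        path.append((x, y, yaw))
    return path
```

Integrated in 36 s chunks anchored to map cues, this keeps the accumulated drift bounded, as discussed above.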
Considering one of the intended use cases of our dataset,
we recorded deviations from the normal car trajectory (erratic
driving sessions, Figure 7.d) for performing a qualitative
assessment of out-of-distribution drifts. Using the vector map
we manually approximate what should have been the correct
UTM coordinates of the car, establishing the ground truth
for what should have been a normal driving behavior (useful
in a quantitative assessment). Another solution, given enough
data, that could have been used to describe correct trajectories
for normal driving behaviour is to estimate the mean of
previous non-erratic driving data recorded within that region.
However, since the campus is open to the public and lacking
in sidewalks, there are many situations and specific locations
where the driver is forced to avoid pedestrians on the road
(Figure 7.f). This means that the average trajectory produced
would become skewed for such locations. We provide the
approximated ground truth for a quantitative evaluation within
the rideID JSON file.
In the process of cleaning the dataset we automatically
disregard sequences where there are drops in any sensor stream
above a set threshold. A minimum of 3 data points per second
is considered. We synchronize all the data using manually
annotated timestamps and we perform a shift for small drifts
in the time the phone log was received versus the time it was
recorded. We disregard any variance in the delays of the CAN
and video data and use a constant offset. We manually check
all the synchronized processed raw data and manually annotate
unintended driving behaviour since it cannot be inferred from
the information available in the collected data. Situations such
as periodic, unannounced maintenance stops (e.g. checking
camera positioning) or backing up the car after erratic driving
situations are labeled and disregarded at export time. We also
check for high errors in car steering geometry by estimating
the mean steering collected from the car on long straight
trajectories of driving.
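The automatic rejection of sequences with sensor drops can be sketched as a sliding-window check against the minimum rate of 3 data points per second; the function below is an illustrative sketch, not our cleaning script:

```python
def valid_windows(timestamps, window_s=1.0, min_points=3):
    """Flag fixed-length windows of a sensor stream that meet the minimum
    data rate. Returns (window_start, ok) pairs; sequences overlapping a
    failed window can then be disregarded at export time."""
    if not timestamps:
        return []
    t_end = timestamps[-1]
    out = []
    start = timestamps[0]
    while start < t_end:
        n = sum(1 for t in timestamps if start <= t < start + window_s)
        out.append((start, n >= min_points))
        start += window_s
    return out
```

A stream dropping to one sample in a given second is flagged, and the surrounding sequence is excluded from the export.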
Among the errors encountered in the data some have proved
to have interesting sources. The magnetometer sensor was
often affected by the power inverter of the car which was
installed very close to the dashboard under the bonnet, without
us realizing it. Speed readings from the car sensors were affected
when running over speed bumps. The recorded steering angle
reaches higher errors during tight turns. Also, even with the
high frequency of recording the raw CAN data we discovered
missing information, sometimes for up to 10 m while driving
at 30 km/h.
As expected, larger labeled datasets increase the performance
of machine learning algorithms [20]. In spite of
this, efforts for improving labeling tools have been spread too
thin, and although many autonomous driving dataset articles
mention such software, these are often developed in-house and
are not publicly available [15, 25].
We use the open source software toolkit ELAN to annotate
different driver behaviors and causal relationships, as proposed
in [19]. We provide an annotation tool for labeling navigation
commands before intersections. The tool automatically sets
labels after intersections are marked on the map. Commands
can be either discrete (e.g. turn right, left) or represented
by future course difference and can be used to conditionally
train the car to steer in intersections similar with previous
works [6, 23]. We also annotate causal relations that cannot
be inferred from the data (e.g. lane change with hesitation),
so we can avoid exporting them.
To increase video labeling efficiency and because of the
high overlapping fields of view of the three cameras we first
produce a panorama by stitching together all three images.
For annotating object bounding boxes we use CVAT. In an
attempt to implement a guided and interactive segmentation
process on our dataset we used a state-of-the-art segmenta-
tion network trained on Cityscapes [10] to generate initial
segmentation proposals. However, correcting the segmentation
proposals of the network proved less efficient than manually
annotating from scratch. This is especially true for the road
class which usually overlaps with many dynamic objects (like
vehicles and pedestrians).
Finally, given annotated timestamps of desired labels, we
provide an export tool which splits all the data in a homo-
geneous format of a fixed time length (36 s in our case).
Each set is identified by a unique ID and contains a list of
features for every location point of the trajectory which was
generated at about 100 Hz. The frequency is determined by the
highest logging rate of all sensors with the other data being
interpolated linearly to match this rate.
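The export-time resampling can be sketched with linear interpolation onto a 100 Hz timeline; the function and parameter names below are illustrative:

```python
import numpy as np

def export_timeline(start, length_s=36.0, rate_hz=100.0):
    """Timestamps for one fixed-length export chunk at the export rate."""
    return start + np.arange(int(length_s * rate_hz)) / rate_hz

def resample_to(ts_target, ts_src, values):
    """Linearly interpolate a lower-rate stream (e.g. 25 Hz speed
    readings) onto the export timeline."""
    return np.interp(ts_target, ts_src, values)
```

Each lower-rate stream is interpolated independently onto the shared timeline, so every exported location point carries a full feature vector.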
Our dataset contains features that are organized in the
following form:
- rideID.json, which contains:
  - ID identifying the session
  - startTime and endTime (Unix timestamps reported in milliseconds)
  - camera intrinsic and extrinsic parameters
  - locations (easting, northing, timestamp, course, latitude, longitude, speed, steer, brake)
  - raw_phone (20 features collected from phone sensors: GPS, accelerometer, gyroscope, magnetometer)
  - raw data from CAN (steer, speed, acceleration pedal, brake pedal)
- rideID-camera_id.mp4 for each camera (camera_id values are 0: Center, 1: Left, 2: Right)
Table II describes key metrics of the UPB campus dataset,
of the kind usually reported for driving data collections. Also, Figure 6
describes the values obtained for the main features of the dataset.
We built the vector map, depicted in Figure 3, of the
university campus road network based on a modified version of
the OpenDRIVE format [8]. The map is customized according
to the requirements in the Apollo Open Platform [1] making it
compatible for route planning. It contains a logical description
of the road network, with information such as direction of
lanes, pedestrian crossings and traffic signs.
To validate our entire dataset production pipeline we use
the exported data and evaluate it with the publicly avail-
able discrete action prediction model released for Berkeley
DeepDrive Video Dataset. This model is in accordance with
TABLE II
UPB Campus dataset
Metric                 | Normal driving | Erratic driving
Videos                 | 408 x 3        | 45 x 3
Frames                 | 458026         | 53716
Distance (km)          | 72.7           | 7.0
Duration (min)         | 254            | 30
Low light (% of total) | 21%
Intersections          | 739
- Turn Left            | 164
- Turn Right           | 172
- Straight             | 403
Fig. 3. University campus vector map.
the work done by [24] but we emphasize that it is not
within the list of models evaluated in the published article.
In their work they investigate various models and report
accuracy scores for this task, on the BDD dataset, varying
from 82.0 % for the CNN-1Frame configuration up to
84.1 % for the FCN-LSTM configuration. We evaluate the
TCNN1 model, with a temporal convolution of window size 1,
which yields 77.52 % accuracy on the publicly available BDD
evaluation set. The network was trained on the BDD dataset
[25] to predict discrete driving actions such as Straight,
Stop,Turn Left and Turn Right by minimizing the
cross entropy loss using a weighting scheme to compensate
for the biased distribution in steering. The discrete turning
actions were obtained from the relative difference in course
calculated with a period of 1/3 seconds by using a hard
threshold of 35°. The Stop action is determined by a decrease
in speed under 2 km/h. As presented in Table III we obtain
an accuracy of 62.22 % which is lower than the random guess
of 76.07 % computed on the uneven distribution of classes.
Figure 4 depicts the confusion matrix, which reveals that the
limit between considering Straight and Stop actions is
very sensitive and difficult for the network to discern. Results
indicate that the model trained on the 700,000 videos is not
able to yield very good results on setups like ours.
TABLE III
Model accuracy and class distributions
Random guess | 76.07%
TCNN1        | 62.22%
Classes distributions:
Straight | Stop  | Turn Left | Turn Right
86.87%   | 5.45% | 4.45%     | 3.23%
There are several works reporting on the topic of predicting steering
commands, like [2, 5, 16], and to the best of our knowledge
only [4] made available an official model which uses video
and laser points data as input.
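The discrete action derivation described above (35° course-difference threshold over a 1/3 s period, Stop below 2 km/h) can be sketched as below; the sign convention for left vs. right turns is our assumption, and the random-guess baseline is reproduced as the sum of squared class frequencies, which matches the reported 76.07 %:

```python
def discrete_action(course_diff_deg, speed_kmh,
                    turn_thresh_deg=35.0, stop_kmh=2.0):
    """Map course difference over a 1/3 s period and current speed to the
    BDD-style discrete actions. Positive course difference taken here as
    a right turn (sign convention assumed, not stated in the paper)."""
    if speed_kmh < stop_kmh:
        return "Stop"
    if course_diff_deg > turn_thresh_deg:
        return "Turn Right"
    if course_diff_deg < -turn_thresh_deg:
        return "Turn Left"
    return "Straight"

def random_guess_accuracy(class_probs):
    """Accuracy of a guesser that samples predictions from the class
    distribution itself: the sum of squared class probabilities."""
    return sum(p * p for p in class_probs)
```

Plugging in the class distribution from Table III gives approximately 0.7606, in agreement with the 76.07 % baseline.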
Fig. 4. Confusion matrix of discrete steering predictions.
We present in Figure 5 a data point visualization of the
entire dataset corrected path coordinates. We reach a maximum
of 41 passes on the same road segment across all trips.
Fig. 5. Visualization of all UTM coordinates for all trips collected in this
UPB campus dataset.
Fig. 6. Histograms of sensor measurements for the entire dataset. The Y axis represents the number of data points for each of the raw stream (sensors produce
data at different rates). Note that we are using log scale.
Fig. 7. Examples of video frames from our dataset. (a) Synchronized views in a three camera setup. (b) Night time. (c) Normal driving behaviour. (d) Erratic
driving behaviour. (e) Unmarked road. (f) Pedestrians walking on the road. (g) View from the left camera in a setup with parallel positioning.
Our work describes the steps that were necessary to take
in order to record, process and validate a dataset for self-
driving applications. From a technical point of view, advice
regarding this in the literature is scarce, and we view
our proposals as a guideline for helping others develop
their own local datasets much faster. We consider, as indicated by
some of the results in Section V, that a safe adaptation to
the particularities of a specific geographical area (in terms
of road network, traffic participants behaviors, etc.) can only
be achieved with local data. This view is also supported by
works like [12, 19]. Moreover, data format homogeneity is
also a desired feature since it allows faster prototyping and
knowledge transfer from one dataset to another.
The dataset presented in this paper is the first iteration of
a two stage data collection strategy. The first phase is based
on low-cost commodity sensors and hardware with quick setup
time, easy calibration and usage. Data collected in this manner
is useful for rapid prototyping and initial evaluations in the
process of transferring knowledge from a large scale dataset.
To this end, we hope that in the future we will see better
benchmarking standards and widely tested technologies on a
larger diversity of locally gathered data. The second stage of
our work will be based on more accurate sensors and a setup
more robust to errors. Although the need for robust algorithms
which can handle practical use within comfortable safety limits
has captured most attention in the literature, the sensing and
processing setup needs to be proven at least as robust. It is not
only that the algorithms need to adapt to new setups,
but also that it is imperative to ensure reliability, redundancy,
low latency and adequate data bandwidth.
The lessons learned in this first stage will reflect upon the
preparation of the second stage. This concerns using fixed
mounts at known coordinates relative to the base of the
vehicle, a more reliable data stream synchronization method
and specialized extrinsic calibration landmarks.
Future work concerning the first stage of data collection
involves finishing the labeling procedures of driving related
objects and semantic segmentation of image frames and mak-
ing them publicly available. Also, it is our intent to develop
and evaluate end-to-end steering and investigate knowledge
transferring from larger driving datasets.
This research was funded by grant PN-III-P1-1.2-PCCDI-
2017-0734 and grant NETIO – subs. 1225/22.01.2018
[1] Baidu. Apollo open platform for autonomous driving,
[2] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner,
B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller,
J. Zhang, et al. End to end learning for self-driving cars.
arXiv preprint arXiv:1604.07316, 2016.
[3] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal
of Software Tools, 2000.
[4] Y. Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue, and C. Wang.
Lidar-video driving dataset: Learning driving policies effectively. In
The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), June 2018.
[5] L. Chi and Y. Mu. Deep steering: Learning end-to-end
driving model from spatial and temporal visual cues.
arXiv preprint arXiv:1708.03798, 2017.
[6] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy.
End-to-end driving via conditional imitation learning. In 2018 IEEE
International Conference on Robotics and Automation (ICRA), pages 1–9.
IEEE, 2018.
[7] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. En-
zweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele.
The cityscapes dataset for semantic urban scene under-
standing. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 3213–
3223, 2016.
[8] M. Dupuis et al. OpenDRIVE format specification. VIRES
Simulationstechnologie GmbH, 2010.
[9] L. Fridman, D. E. Brown, M. Glazer, W. Angell, S. Dodd,
B. Jenik, J. Terwilliger, J. Kindelsberger, L. Ding, S. Sea-
man, et al. Mit autonomous vehicle technology study:
Large-scale deep learning based analysis of driver be-
havior and interaction with automation. arXiv preprint
arXiv:1711.06976, 2017.
[10] J. Fu, J. Liu, H. Tian, Z. Fang, and H. Lu. Dual
attention network for scene segmentation. arXiv preprint
arXiv:1809.02983, 2018.
[11] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision
meets robotics: The kitti dataset. The International
Journal of Robotics Research, 32(11):1231–1237, 2013.
[12] J. Guo, U. Kurup, and M. Shah. Is it safe to drive?
an overview of factors, challenges, and datasets for
driveability assessment in autonomous driving. arXiv
preprint arXiv:1811.11277, 2018.
[13] C. S. V. Gutiérrez, L. U. S. Juan, I. Z. Ugarte, and V. M.
Vilches. Time-sensitive networking for robotics. arXiv
preprint arXiv:1804.07643, 2018.
[14] S. Hecker, D. Dai, and L. Van Gool. Failure prediction
for autonomous driving. In 2018 IEEE Intelligent Vehi-
cles Symposium (IV), pages 1792–1799. IEEE, 2018.
[15] X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and
R. Yang. The apolloscape open dataset for autonomous
driving and its application. ArXiv e-prints, 2018.
[16] J. Kim and C. Park. End-to-end ego lane estimation based
on sequential transfer learning for self-driving cars. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition Workshops, pages 30–38, 2017.
[17] W. Maddern, G. Pascoe, C. Linegar, and P. Newman. 1 year, 1000
km: The oxford robotcar dataset. The International Journal of Robotics
Research, 36(1):3–15, 2017.
[18] NuTonomy. nuScenes dataset, 2018.
[19] V. Ramanishka, Y.-T. Chen, T. Misu, and K. Saenko. To-
ward driving scene understanding: A dataset for learning
driver behavior and causal reasoning. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 7699–7707, 2018.
[20] C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revis-
iting unreasonable effectiveness of data in deep learning
era. In Proceedings of the IEEE international conference
on computer vision, pages 843–852, 2017.
[21] Y. Tian, K. Pei, S. Jana, and B. Ray. Deeptest: Automated
testing of deep-neural-network-driven autonomous cars.
In Proceedings of the 40th international conference on
software engineering, pages 303–314. ACM, 2018.
[22] R. Vivacqua, R. Vassallo, and F. Martins. A low cost sensors
approach for accurate vehicle localization and autonomous driving
application. Sensors, 17(10):2359, 2017.
[23] Q. Wang, L. Chen, and W. Tian. End-to-end driving
simulation via angle branched network. arXiv preprint
arXiv:1805.07545, 2018.
[24] H. Xu, Y. Gao, F. Yu, and T. Darrell. End-to-end learning
of driving models from large-scale video datasets. In
Proceedings of the IEEE conference on computer vision
and pattern recognition, pages 2174–2182, 2017.
[25] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan,
and T. Darrell. Bdd100k: A diverse driving video
database with scalable annotation tooling. arXiv preprint
arXiv:1805.04687, 2018.