Collecting and processing a self-driving dataset in
the UPB campus
Andrei Cristian Nica
andrei.nica@cti.pub.ro
Mihai Trăscău
mihai.trascau@cti.pub.ro
Alexandru Andrei Rotaru
alexandru.rotaru@stud.acs.upb.ro
Constantin Andreescu
candreescu@stud.acs.upb.ro
Alexandru Sorici
alexandru.sorici@cs.pub.ro
AIMAS, Faculty of Automatic Control and Computers
University Politehnica Bucharest
Bucharest, Romania
Adina Magda Florea
adina.florea@cs.pub.ro
Vlad Bacue
Prime Motors Industries
Bucharest, Romania
vlad.bacue@primemotors.eu
Abstract—Although there is a diversity of publicly available
datasets for autonomous driving, from small-scale to larger
collections with thousands of miles of driving, we consider that
the process of collecting and processing them is often overlooked
in the literature. From a data-driven perspective, the quality of a dataset has proven to be as important as its quantity, especially when evaluating self-driving technologies where safety is crucial. In this paper, we provide a guideline covering all the steps from configuring the hardware setup to obtaining a clean dataset. We describe the data collection scenario design, the hardware and software employed in the process, the challenges that must be considered, and the data filtering and validation stages. This work stems
from our experience in collecting the UPB campus driving dataset
released together with this work. It is our belief that having a
clean and efficient process of collecting a small but meaningful
dataset has the potential to improve benchmarking autonomous
driving solutions, capturing local environment particularities.
Index Terms—autonomous driving, dataset collection, machine
learning
I. INTRODUCTION
The popularity of autonomous driving has led, in the last few years, to an increase in vision-based dataset releases [7, 17,
18, 24]. Commercial interest has also spiked, with companies
involved in autonomous driving collecting vast amounts of
driving data. Unofficial statements place Uber at slightly over
1 billion miles and Waymo at 10 million miles collected, with
other companies not lagging far behind. Data collection has been at the forefront of the effort to make the main algorithmic components involved in self-driving decisions live up to the required safety standards. As data becomes more abundant, the
general belief is that the self-driving behavior becomes less
error-prone. However, most of the collected data consists of
well constrained and structured environments, like highway
driving in sunny weather. Thus, the challenge of ensuring that autonomous vehicles behave predictably in complex, cluttered environments remains unresolved, and the available margin is small: even spurious errors might have tragic consequences. Evaluating systems against these
environments [14, 21] indicates that such errors indeed occur
with current technologies. Considering that most vehicles tend
to be driven in largely the same geographical area, we believe
that collecting a dataset from the target environment has the
potential to increase performance and reduce errors.
In this paper we present our view on the process and results
of collecting a dataset for a specific location, in our case the
UPB campus. This work is part of a larger project intended
to develop computer vision methods for a wide array of
assisted driving tasks. Algorithmic modules will be proposed
for autonomous driving together with a prototype system based
on an electric converted car from Prime Motors Industry.
The data collection sub-project is split in two stages. The
first stage involves less expensive commodity hardware re-
quiring a less complex setup and faster collection procedures.
Although data is gathered at a faster pace and with less time
required to repair any software or hardware faults, the less
accurate setup means more time is spent during calibration,
data pre-processing, cleaning and validation stages. The second stage of data collection involves more accurate sensors and more data modalities: a LiDAR, 5 monocular cameras, a frontal radar and a dedicated IMU+GPS module. This second setup requires more effort to install but usually yields more reliable data. Our motivation for this two-stage
approach is based on our plans of performing initial tests
on a low-cost setup, mainly targeting RGB sensors, and then
comparing performance versus a more complicated setup. This
paper describes the results of the first stage of data collection.
In terms of vision-based algorithms, we intend to use the
collected dataset in order to evaluate end-to-end steering
models based on deep neural networks, similar to [25]. We
intend to leverage large scale datasets in a semi-supervised
manner in order to successfully transfer knowledge to our local
scenarios.
Our goal for the dataset is to collect particularities of
the local scene. We are interested in varying environmental conditions, like the time of day, and in capturing different behaviors by using multiple drivers. One aspect we have taken into consideration is to record and then annotate examples of "bad driving", in which the driver is instructed to deviate from optimal trajectories. This produces recordings of situations where the car veers slightly off the road, wrongfully departs its lane or is driven into a dead end.
Another interesting aspect we introduced in our dataset was
the use of a three front facing camera setup. These commodity
cameras were mounted on top of the car in diverse setups
in which we varied their position or viewing angle, allowing
us to explore data augmentation techniques. Also notable is that our driving sessions started and ended in the same location, which was carefully chosen to contain multiple visual cues at various ranges. This has
facilitated the calibration process and allowed for faster data
validation.
Our contributions are as follows:
•we provide an overview of a dataset collection pipeline
and process, with various challenges and our proposed
technical solutions for them
•we collected, processed, validated and released a public
dataset in the UPB campus
•we open-source and describe solutions and tools for these
steps
In Section II we describe other related datasets and how
they relate to our work. Section III describes the hardware
setup and the software tools and scripts used to collect the
dataset. We present solutions and ideas for cleaning the data
in Section IV. The characteristics of the dataset are discussed
in Section V, while in Section VI we present our conclusions
and ideas for future work.
II. RELATED WORK
Being one of the first large datasets released, KITTI [11]
quickly became one of the most widely used datasets for
autonomous driving benchmarks. It provides diverse traffic
conditions under different illumination of the environment. The size varies depending on the desired vision task, summing up to around 300 gigabytes. The provided data contains videos, images, LiDAR scans and GPS readings.
Since then there has been a variety of work and investment in this direction, fuelled by advancements in the autonomous driving industry. Recent research in [12] presents a well-documented overview of publicly available datasets for autonomous driving, covering 45 released datasets. Recent works have
managed to advance the standards for this topic both in size
and in quality.
Two large-scale and complex datasets that we believe are
worth mentioning are NuScenes [18] and the ApolloScape [15]
dataset. NuScenes provides information from a comprehensive autonomous vehicle sensor suite and contains around 1k videos with three-dimensional bounding boxes and behavior labels. ApolloScape also contains rich labelling and claims to make available at least 15 times more data than state-of-the-art datasets for multiple autonomous driving related tasks, such as per-pixel labeling, instance segmentation, 3D car instance labeling and many more. The authors propose labeling tools and algorithms for each task, as the cost of manual processing is prohibitive.
The Berkeley Deep Drive (BDD) [25] dataset, which in-
spired us in using their proposed data format—given the
similar setup—, has two main characteristics, namely the large
size and the diversity of conditions. BDD contains 100K
videos and provides urban, rural, highway traffic in different
illumination conditions. Each raw video measures around 40
seconds and are annotated with weather tags, time and scene
context (residential, tunnel, etc). This particularity is corre-
lated with characteristics needed for scaling machine learning
algorithms. However, the quality of some of the videos in
terms of their relevance to driving is somewhat degraded (e.g.
stationary cars, unintended changes in the position of the
recording smartphone).
One of the few research works similar to ours, due to its aim of discussing in more detail a methodology for collecting data at large scale, is presented in [9]. The authors propose a
full pipeline with all the stages for building an autonomous
driving dataset: off-loading, cleaning, synchronizing and ex-
tracting knowledge from data. Each stage is described with
its challenges and proposed technical solutions. Not only do
they formalize all the tasks, but they also present the hardware
components necessary to semi-automate a continuous data
collection pipeline.
As discussed in Section III, we have used commodity hardware and produced a quality dataset based on the university campus road layout. However, as shown throughout this paper, the amount of work required to obtain a dataset is high, and any public research attempting to better explain and facilitate this process is very valuable, especially to the open research community.
III. HARDWARE AND SOFTWARE TOOLS AND CHALLENGES
In the process of designing a self-driving dataset the first
step is to determine the hardware setup which is going to be
used. As stated previously, one of the goals of the project is
to implement a self-driving prototype on an electric car. In
our case, we used an electric powered Dacia Logan (version
2), converted from a petrol internal combustion engine. In our
experience, using an electric vehicle for autonomous driving is sometimes easier in terms of control due to its usually advanced Battery Management System (BMS). Commercially available BMSs typically expose useful information and alerts regarding the charging state, acceleration and braking (i.e. regenerative braking). Although communication with the
BMS is easy, recovering the state of other on-vehicle sensors
(steering angle, wheel speed, etc.) requires decoding messages
of the vehicle CAN bus.
From the point of view of installed sensors, we designed a
setup containing 3 cameras (Logitech C930e), a CAN to
USB adapter (Orion BMS CANdapter) and a smartphone
(Samsung S8). All were connected to an onboard laptop
which served the role of recording the sessions.
The 3 cameras captured video at 1080 p with their bit-
stream copied to the output, without compression, resulting
in 30 FPS in MJPEG YUVJ422P format. Notably, the frame
rate drops to about 10 FPS in low light conditions.
TABLE I
RAW SENSOR DATA COLLECTION LOG FREQUENCY METRICS

Source data stream   Mean     Std      Threshold   Very low frequency drop conditions
Camera               30 Hz    -        10 Hz       low light
Phone                60 Hz    5 Hz     4 stdev     every 41 min
 - GPS               1 Hz     -        -           -
Steer                100 Hz   7.4 Hz   4 stdev     every 45 min
Speed                25 Hz    2 Hz     4 stdev     every 47 min

Recording
using H.264 format proved unreliable due to latency and
instability at 1080p resolution. We report other data logging frequency measurements in Table I and show how often outliers occur in the system pipeline. Even in this relatively short recording time, we noticed gaps in the raw CAN data covering distances of almost 12 m while the car was driving at about 40 km/h.
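To make the capture configuration concrete, the sketch below shows the kind of FFmpeg invocation a recording script could issue to copy a camera's MJPEG bitstream to disk without re-encoding. The device path, resolution and output naming are illustrative assumptions, not the exact configuration of our camera.py script.

# Minimal sketch of a stream-copy capture; device path and naming are placeholders.
import subprocess
import time

def record_camera(device="/dev/video0", out_dir="session"):
    # Copy the webcam's native MJPEG bitstream to disk without re-encoding.
    out_file = f"{out_dir}/camera_{int(time.time())}.mkv"
    cmd = [
        "ffmpeg",
        "-f", "v4l2",               # Video4Linux2 webcam input (Linux)
        "-input_format", "mjpeg",   # request the camera's MJPEG stream
        "-video_size", "1920x1080",
        "-framerate", "30",
        "-i", device,
        "-c:v", "copy",             # bitstream copy: avoids encoding latency
        out_file,
    ]
    return subprocess.Popen(cmd)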
The cameras were mounted front facing on top of the
car, just above the windscreen, using commodity suction cup
camera mounts. The central camera was positioned along the
longitudinal axis of the car. The two lateral cameras were set either 40 or 56 cm away from the central camera, with a viewing angle of either 0° or 30° from the longitudinal axis. The smartphone was placed horizontally
on the center of the dashboard. Figure 1 exemplifies the
positioning of the cameras and of the smartphone.
Fig. 1. Camera and smartphone positioning setups. In some setups the
cameras were translated away from the center of the car, and in other setups
the camera viewing angle was rotated outwards.
Most of the software tools for recording, cleaning, synchro-
nizing, processing and visualizing the data are implemented
as Python scripts, with some of them running on the onboard
laptop. The following list briefly describes the most important
scripts that we used for the dataset collection and preparation:
•collect.py - This launches the recording server based
on a configuration file which specifies data sources and
formats. It starts threads for each data source and displays
the log screen for them, with recording starting only after
all desired sensor sources are validated. Incoming data is
saved raw so as to avoid the possibility of losing data
due to encoding lag, and each data item is assigned a
timestamp.
•Phone sensor collection app - A Unity ap-
plication deployed on the Android smartphone which
gathers the required data from the embedded sensors and
passes them to the server via a websocket connection.
Unity was used for ease of cross-platform deployment of
the app.
•camera.py - The camera script offers a configurable
FFmpeg¹ recording thread and tools for visually assisting
the user during camera installation.
•can.py - Raw CAN messages are recorded using the
default Linux candump² tool, which can also be used
for package analysis. The message decoding process
is configured by a DBC Format file which offers the
conversion scheme.
•process_raw_data.py - This tool makes use of the
recorded session configuration and guides the process
of validating, syncing and post-processing all gathered
sensor data.
•visualize_session.py - We use this tool after the session data has been processed to view several statistics, such as those mentioned in Table I and Figure 6. The script also visually replays all important dataset features while allowing the user to easily annotate important timestamps.
•trajectory_visualizer.py - We offer a tool to
preview the future car path and several distance land-
marks in the camera frame based on the camera properties
(intrinsic and extrinsic), the car specifications (wheelbase
and steering ratio) and the steering angle.
•clean_dataset_generator.py - This is the final
step in exporting all session data in a homogeneous
format, described in Section V. It uses previously anno-
tated desired labels for export and partitions the data in
fixed temporal length splits of 36 seconds. In this stage,
unwanted sections are disregarded (e.g. long stationary
sequences) and other configurable transformations are
applied (e.g. video compression).
•evaluate_driving.py - We also provide a tool for
estimating an autonomy measure, as proposed in [2],
which is based on the number of interventions necessary
when the simulated vehicle departs from the recorded
trajectory. It takes advantage of the three camera setup
and visually simulates deviations from the recorded ori-
entation and position of the car.
For each script exposing a recording function, we have also provided tools for raw data visualization, preprocessing and metrics computation.
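As a concrete illustration of the recording pattern used by collect.py (one thread per data source, raw payloads written as received, every item timestamped on arrival), the following minimal sketch shows one way such a server loop could be organized. The queue-per-source layout and the file naming are our own simplification, not the exact implementation.

import json
import threading
import time
from queue import Queue

def writer(source_name, queue, out_path):
    # Drain one sensor queue to disk, tagging every raw item with its arrival time.
    with open(out_path, "w") as f:
        while True:
            payload = queue.get()
            if payload is None:          # sentinel value stops the recording thread
                break
            f.write(json.dumps({"ts": time.time(),
                                "source": source_name,
                                "data": payload}) + "\n")

sources = {"phone": Queue(), "can": Queue(), "camera_meta": Queue()}
threads = [threading.Thread(target=writer, args=(name, q, name + ".log"))
           for name, q in sources.items()]
for t in threads:
    t.start()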
¹ https://ffmpeg.org/
² https://github.com/linux-can/can-utils
³ http://www.ros.org/
Another possible solution we considered for recording all the data is using the ROS framework³. This allows taking advantage of a system already designed for communication involving heterogeneous data streams and all the tools already implemented for data playback. Systems such as the Apollo Framework 3.0 and other datasets [19] used this framework for this use case. The first reason we chose a different solution was the resulting large dataset size. We estimated that storing a single raw 1080p video
stream at 30 FPS would have reached more than 500 GB. This
would most likely result in low logging frequency because of storage and bandwidth limitations. The second reason was the known latency and throughput problems of ROS-based systems, also indicated in [13], which became one of the major reasons for the development of ROS 2⁴.
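The storage estimate above can be checked with simple arithmetic, assuming uncompressed 1080p frames at 2 bytes per pixel (YUV 4:2:2) as they would be stored in a raw recording; the 75-minute session length used below is only illustrative.

# Back-of-the-envelope estimate for one uncompressed 1080p video stream.
width, height, bytes_per_pixel, fps = 1920, 1080, 2, 30
bytes_per_second = width * height * bytes_per_pixel * fps    # about 124 MB/s
session_minutes = 75                                          # illustrative session length
total_gb = bytes_per_second * session_minutes * 60 / 1e9
print(round(total_gb), "GB per camera")                       # roughly 560 GB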
Calibrating the sensors is an important step in ensuring that reliable data is being recorded. For a vision-based dataset like ours, camera calibration before a recording session is a necessary step.
For computing the camera intrinsic matrix, we resorted
to the ROS solution⁵ for monocular cameras, using the cameracalibrator.py node, which outputs the camera matrices and distortion coefficients for the three cameras used in the data collection process.
We have tested two different methods for obtaining the cam-
era extrinsic matrix. The first method consists of measuring the distances from the camera to a set of crafted landmarks. A bulls-eye-like pattern was used to mitigate possible errors. Measurements were performed with laser tape measure tools relative to four points, representing the four corners of the car windshield. The results were later translated to the coordinate system origin, which we considered to be the midpoint of the car's rear axle, just as in the Apollo framework⁶. Feeding
the measurements into a trilateration algorithm resulted in
a system of linear equations, whose solution we chose to
approximate using the conjugate gradient method. The second
step in determining the extrinsic calibration matrix was to
compute the rotation vector. We have manually extracted 2D
coordinates of pixels representing the center of the landmark
in the picture. Then, the 2D−3Dcorrespondences were used
for solving a Perspective-n-Point (PnP) problem using tools
from the OpenCV library [3].
The second method relies on visual cues of known landmarks, taking advantage of parallel and perpendicular lines and planes. Each value of the rotation vector was determined by matching real distances and lines with their projected counterparts. The translation vector was determined beforehand from measurements of the camera positioning, based on the car size specifications provided by the manufacturer. Again, the midpoint of the car's rear
axle was used as the system origin.
In the second iteration of the dataset, which will contain
more advanced sensors, calibrations will be performed using
the LiDAR system. We will employ a combined LiDAR and
camera calibration approach similar to those presented for other datasets [18].
⁴ https://index.ros.org/doc/ros2/
⁵ http://wiki.ros.org/camera_calibration/Tutorials/MonocularCalibration
⁶ https://github.com/apolloauto
Another challenge, also considered in the literature [4, 9], is the synchronization procedure for all the sensors. An efficient method is to generate the same signal on all the different
sources and offset the respective data streams accordingly. We
propose using either the turn signal lights or the horn sound (which, in our case, was not logged on the CAN bus). Both signals can be recorded by all devices at session start, since the related events are usually advertised on the CAN bus and captured as sound or video by the phone and cameras. A drift in synchronization offsets is possible, but we did not experience it during our 45-minute recording sessions. To obtain higher quality data, we pay close attention to any offsets observed while replaying the processed data. We also note that, during our first trials, we were able to efficiently synchronize a session by manually annotating the timestamps of collisions with speed bumps and matching them with the corresponding accelerometer spikes and camera movement.
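As an illustration of how such an offset could also be estimated automatically rather than by manual annotation, the sketch below cross-correlates two streams around a shared event (for example the accelerometer spike caused by a speed bump against the vertical image motion seen by a camera); the function and its interface are our own illustration, not part of our released tools.

import numpy as np

def estimate_offset(ts_a, sig_a, ts_b, sig_b, rate=100.0):
    # Resample both streams onto a common time grid, then find the lag that
    # maximizes their cross-correlation (e.g. around a speed-bump spike).
    t0 = max(ts_a[0], ts_b[0])
    t1 = min(ts_a[-1], ts_b[-1])
    grid = np.arange(t0, t1, 1.0 / rate)
    a = np.interp(grid, ts_a, sig_a) - np.mean(sig_a)
    b = np.interp(grid, ts_b, sig_b) - np.mean(sig_b)
    corr = np.correlate(a, b, mode="full")
    lag = np.argmax(corr) - (len(b) - 1)   # best-aligning lag in samples; verify the
    return lag / rate                       # sign convention against a known event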
IV. DATA CLEANING, ANNOTATION AND VALIDATION
Based on the phone sensors, the approximated global car path has high errors, with temporally collocated drifts reaching an estimated 35 m, as seen in Figure 2. The GPS recordings have a mean error estimate of 4.5 m ± 3.6 m. We observe an increase in global location accuracy when computing the car path from the speed and steering angle reported on the CAN bus. To mitigate error accumulation, we split the data into chunks of 36 s and manually calibrate them using visual cues from the map and video data, with the possibility of further error correction for sub-segments. Although we do not reach the recommended location accuracy of a few centimeters [22], we are able to lower the error to an estimated maximum global error of 1.5 m, which can be further improved through more fine-grained manual calibration.
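The path computed from CAN speed and steering can be reproduced with a simple kinematic bicycle model, as in the sketch below; the wheelbase and steering ratio values are illustrative, not the converted car's exact specifications.

import numpy as np

def dead_reckon(speed_mps, steer_wheel_deg, dt, wheelbase=2.6, steering_ratio=18.0):
    # Integrate CAN speed and steering-wheel angle with a kinematic bicycle model.
    # Wheelbase and steering ratio here are illustrative values only.
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for v, sw in zip(speed_mps, steer_wheel_deg):
        delta = np.radians(sw) / steering_ratio           # road-wheel angle
        heading += v / wheelbase * np.tan(delta) * dt     # yaw-rate integration
        x += v * np.cos(heading) * dt
        y += v * np.sin(heading) * dt
        path.append((x, y))
    return np.array(path)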
Fig. 2. Difference between GPS coordinates in red and corrected path in
blue. Measurements in the images are represented in meters.
Considering one of the intended use cases of our dataset, we recorded deviations from the normal car trajectory (erratic driving sessions, Figure 7.d) for a qualitative assessment of out-of-distribution drifts. Using the vector map, we manually approximate what the correct UTM coordinates of the car should have been, establishing the ground truth for normal driving behavior (useful for a quantitative assessment). Another solution for describing correct trajectories for normal driving behavior, given enough data, would be to estimate the mean of previous non-erratic drives recorded within that region.
However, since the campus is open to the public and lacking
in sidewalks, there are many situations and specific locations
where the driver is forced to avoid pedestrians on the road
(Figure 7.f). This means that the average trajectory produced
would become skewed for such locations. We provide the
approximated ground truth for a quantitative evaluation within
the rideID JSON file.
In the process of cleaning the dataset, we automatically disregard sequences with drops in any sensor stream above a set threshold: a minimum of 3 data points per second is required. We synchronize all the data using manually annotated timestamps and apply a shift for small drifts between the time the phone log was received and the time it was recorded. We disregard any variance in the delays of the CAN and video data and use a constant offset. We manually check all the synchronized processed raw data and annotate unintended driving behavior, since it cannot be inferred from the information available in the collected data. Situations such as periodic, unannounced maintenance stops (e.g. checking camera positioning) or backing up the car after erratic driving situations are labeled and disregarded at export time. We also check for high errors in the car steering geometry by estimating the mean steering angle collected from the car on long straight driving trajectories.
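A minimal sketch of the automatic drop check described above (at least 3 data points in every one-second window) is shown below; the function name and interface are our own illustration.

import numpy as np

def has_frequency_drop(timestamps, min_points_per_second=3, window=1.0):
    # Return True if any one-second window of the stream contains fewer than
    # the minimum required number of data points (3 per second in our case).
    timestamps = np.asarray(timestamps)
    for start in np.arange(timestamps[0], timestamps[-1], window):
        count = np.sum((timestamps >= start) & (timestamps < start + window))
        if count < min_points_per_second:
            return True
    return False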
Among the errors encountered in the data, some proved to have interesting sources. The magnetometer was often affected by the power inverter of the car, which, without us realizing it, was installed very close to the dashboard, under the bonnet. Speed readings from the car sensors were affected when running over speed bumps. The recorded steering angle shows higher errors during tight turns. Also, even with the high recording frequency of the raw CAN data, we discovered missing information, sometimes for up to 10 m while driving at 30 km/h.
As expected, larger labeled datasets increase the performance of machine learning algorithms [20]. In spite of this, efforts to improve labeling tools have been spread too thin, and although many autonomous driving dataset articles mention such software, the tools are often developed in-house and are not publicly available [15, 25].
We use the open source software toolkit ELAN⁷ to annotate different driver behaviors and causal relationships, as proposed in [19]. We provide an annotation tool for labeling navigation commands before intersections; the tool automatically sets the labels once intersections are marked on the map. Commands can be either discrete (e.g. turn right, turn left) or represented by the future course difference, and can be used to conditionally train the car to steer through intersections, similar to previous works [6, 23]. We also annotate causal relations that cannot be inferred from the data (e.g. lane change with hesitation) so we can avoid exporting them.
⁷ https://tla.mpi.nl/tools/tla-tools/elan/
To increase video labeling efficiency, and given the highly overlapping fields of view of the three cameras, we first produce a panorama by stitching the three images together. For annotating object bounding boxes we use CVAT⁸. In an attempt to implement a guided and interactive segmentation process on our dataset, we used a state-of-the-art segmentation network trained on Cityscapes [10] to generate initial segmentation proposals. However, correcting the segmentation proposals of the network proved less efficient than manually annotating them from scratch. This is especially true for the road class, which usually overlaps with many dynamic objects (such as vehicles and pedestrians).
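One possible way to produce such a panorama with off-the-shelf tools is OpenCV's high-level stitcher, sketched below; whether to use this or a fixed homography derived from the camera calibration is a design choice we do not prescribe here, and the frame paths are illustrative placeholders.

import cv2

# Stitch synchronized left / center / right frames into a single panorama so
# that one annotation pass covers all three overlapping views.
frames = [cv2.imread(p) for p in ("left.jpg", "center.jpg", "right.jpg")]
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(frames)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)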
Finally, given annotated timestamps of the desired labels, we provide an export tool which splits all the data into a homogeneous format of fixed time length (36 s in our case). Each split is identified by a unique ID and contains a list of features for every location point of the trajectory, generated at about 100 Hz. This frequency is determined by the highest logging rate among all sensors, with the other data being linearly interpolated to match this rate.
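The resampling step can be sketched as follows: every stream is linearly interpolated onto the common ~100 Hz time base defined by the fastest sensor. The dictionary-of-streams interface is our own simplification of the export tool.

import numpy as np

def resample_to_common_rate(streams, rate_hz=100.0):
    # streams: {name: (timestamps, values)}; interpolate every stream onto the
    # common time grid defined by the highest-rate sensor (about 100 Hz here).
    t_start = max(ts[0] for ts, _ in streams.values())
    t_end = min(ts[-1] for ts, _ in streams.values())
    grid = np.arange(t_start, t_end, 1.0 / rate_hz)
    resampled = {name: np.interp(grid, ts, vals)
                 for name, (ts, vals) in streams.items()}
    return grid, resampled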
V. DATA FEATURES, PREVIEW AND USAGE
Our dataset contains features that are organized in the following form:
•rideID.json which contains:
–ID identifying the session
–startTime and endTime (Unix timestamp re-
ported in milliseconds)
–cameras intrinsic and extrinsic parameters
–locations (easting, northing, timestamp, course, lati-
tude, longitude, speed, steer, break)
–raw_phone (20 features collected from phone sen-
sors: GPS, Accelerometer, Gyroscope, Magnetome-
ter)
–raw data from CAN (steer, speed, acceleration pedal, brake pedal)
•rideID-camera_id.mp4 for each camera
(camera_id values are 0: Center, 1: Left, 2: Right)
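A minimal sketch of how one exported ride could be opened follows, assuming the file layout above; the ride identifier and the exact per-camera file naming (ride ID plus camera id) are illustrative assumptions.

import json
import cv2

ride_id = "example_ride"                       # placeholder for an actual ride ID
with open(ride_id + ".json") as f:
    ride = json.load(f)

locations = ride["locations"]                  # easting, northing, speed, steer, ...
cap = cv2.VideoCapture(ride_id + "-0.mp4")     # camera_id 0: center camera (assumed naming)
print(len(locations), "trajectory points,",
      int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), "video frames")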
Table II describes key metrics of the UPB campus dataset, of the kind usually reported for driving data collections. Figure 6 presents the distributions of the values obtained for the main features of the dataset.
We built the vector map, depicted in Figure 3, of the
university campus road network based on a modified version of
the OpenDRIVE format [8]. The map is customized according
to the requirements of the Apollo Open Platform [1], making it suitable for route planning. It contains a logical description
of the road network, with information such as direction of
lanes, pedestrian crossings and traffic signs.
To validate our entire dataset production pipeline we use
the exported data and evaluate it with the publicly avail-
able discrete action prediction model released for the Berkeley DeepDrive Video Dataset⁹. This model is in accordance with the work done in [24], but we emphasize that it is not within the list of models evaluated in the published article.
⁸ https://github.com/opencv/cvat
⁹ https://github.com/gy20073/BDD_Driving_Model
TABLE II
KEY METRICS FOR THE UPB DATASET

UPB Campus dataset         Normal driving   Erratic driving
Videos                     408 x 3          45 x 3
Frames                     458026           53716
Distance (km)              72.7             7.0
Duration (min)             254              30
Low light (% of total)     21%
Intersections              739
 - Turn Left               164
 - Turn Right              172
 - Straight                403
Fig. 3. University campus vector map.
In their work, the authors investigate various models and report accuracy scores for this task, on the BDD dataset, varying from 82.0% for the CNN-1-Frame configuration up to 84.1% for the FCN-LSTM configuration. We evaluate the TCNN1 model, with a temporal convolution of window size 1, which yields 77.52% accuracy on the publicly available BDD evaluation set. The network was trained on the BDD dataset [25] to predict discrete driving actions such as Straight, Stop, Turn Left and Turn Right by minimizing the cross-entropy loss with a weighting scheme that compensates for the biased distribution in steering. The discrete turning actions were obtained from the relative difference in course calculated over a period of 1/3 seconds, using a hard threshold of 35°. The Stop action is determined by a decrease in speed under 2 km/h. As presented in Table III, we obtain
an accuracy of 62.22%, which is lower than the random guess of 76.07% computed from the uneven class distribution. Figure 4 depicts the confusion matrix, which reveals that the boundary between the Straight and Stop actions is very sensitive and difficult for the network to discern. The results indicate that the model trained on the 700,000 videos is not able to yield very good results on setups like ours.
TABLE III
EVALUATION RESULTS OF THE BDD MODEL USING THE UPB CAMPUS DATASET

Accuracy
Random guess   76.07%
TCNN1          62.22%

Class distribution
Straight   Stop    Turn Left   Turn Right
86.87%     5.45%   4.45%       3.23%
There are several works reporting on the topic of predicting steering commands, like [2, 5, 16], and to the best of our knowledge only [4] has made available an official model, which uses video and laser point data as input.
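The label derivation used for this evaluation (course difference over a 1/3 s window with a hard 35° threshold, and Stop when speed falls under 2 km/h) can be sketched as follows; the sign convention for left versus right turns and the 100 Hz input rate are our assumptions for this sketch.

import numpy as np

def discretize_actions(course_deg, speed_kmh, rate_hz=100.0,
                       turn_threshold_deg=35.0, stop_speed_kmh=2.0):
    # Derive discrete driving actions from course and speed streams sampled at
    # rate_hz. The compass-like sign convention (course increases when turning
    # right) is an assumption for this sketch.
    window = int(rate_hz / 3)                      # samples in a 1/3 second period
    actions = []
    for i in range(len(course_deg) - window):
        if speed_kmh[i] < stop_speed_kmh:
            actions.append("Stop")
            continue
        diff = course_deg[i + window] - course_deg[i]
        diff = (diff + 180.0) % 360.0 - 180.0      # wrap difference to [-180, 180)
        if diff >= turn_threshold_deg:
            actions.append("Turn Right")
        elif diff <= -turn_threshold_deg:
            actions.append("Turn Left")
        else:
            actions.append("Straight")
    return actions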
Fig. 4. Confusion matrix of discrete steering predictions.
Figure 5 presents a visualization of the corrected path coordinates for the entire dataset. We reach a maximum of 41 passes over the same road segment across all trips.
Fig. 5. Visualization of all UTM coordinates for all trips collected in this
UPB campus dataset.
Fig. 6. Histograms of sensor measurements for the entire dataset. The Y axis represents the number of data points for each of the raw streams (sensors produce data at different rates). Note that a log scale is used.
Fig. 7. Examples of video frames from our dataset. (a) Synchronized views in a three camera setup. (b) Night time. (c) Normal driving behaviour. (d) Erratic
driving behaviour. (e) Unmarked road. (f) Pedestrians walking on the road. (g) View from the left camera in a setup with parallel positioning.
VI. CONCLUSIONS AND FUTURE WORK
Our work describes the steps necessary to record, process and validate a dataset for self-driving applications. From a technical point of view, advice on this topic in the literature is rather scarce, and we view our proposals as a guideline helping others develop their own local datasets much faster. We consider, as indicated by some of the results in Section V, that safe adaptation to the particularities of a specific geographical area (in terms of road network, traffic participant behaviors, etc.) can only be achieved with local data. This view is also supported by
works like [12, 19]. Moreover, data format homogeneity is
also a desired feature since it allows faster prototyping and
knowledge transfer from one dataset to another.
The dataset presented in this paper is the first iteration of
a two stage data collection strategy. The first phase is based
on low-cost commodity sensors and hardware with quick setup
time, easy calibration and usage. Data collected in this manner
is useful for rapid prototyping and initial evaluations in the
process of transferring knowledge from a large scale dataset.
To this end, we hope that in the future we will see better
benchmarking standards and widely tested technologies on a
larger diversity of locally gathered data. The second stage of
our work will be based on more accurate sensors and a setup
more robust to errors. Although the need for robust algorithms which can handle practical use within comfortable safety limits has captured most of the attention in the literature, the sensing and processing setup needs to be proven at least as robust. It is not only a matter of the algorithms adapting to new setups, but also of ensuring reliability, redundancy, low latency and adequate data bandwidth.
The lessons learned in this first stage will reflect upon the
preparation of the second stage. This concerns using fixed
mounts at known coordinates relative to the base of the
vehicle, a more reliable data stream synchronization method
and specialized extrinsic calibration landmarks.
Future work concerning the first stage of data collection involves finishing the labeling of driving-related objects and the semantic segmentation of image frames, and making them publicly available. It is also our intent to develop and evaluate end-to-end steering models and to investigate knowledge transfer from larger driving datasets.
ACKNOWLEDGMENT
This research was funded by grant PN-III-P1-1.2-PCCDI-
2017-0734 and grant NETIO – subs. 1225/22.01.2018
REFERENCES
[1] Baidu. Apollo open platform for autonomous driving,
2018.
[2] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner,
B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller,
J. Zhang, et al. End to end learning for self-driving cars.
arXiv preprint arXiv:1604.07316, 2016.
[3] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal
of Software Tools, 2000.
[4] Y. Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue,
and C. Wang. Lidar-video driving dataset: Learning
driving policies effectively. In The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), June
2018.
[5] L. Chi and Y. Mu. Deep steering: Learning end-to-end
driving model from spatial and temporal visual cues.
arXiv preprint arXiv:1708.03798, 2017.
[6] F. Codevilla, M. Müller, A. López, V. Koltun, and
A. Dosovitskiy. End-to-end driving via conditional imi-
tation learning. In 2018 IEEE International Conference
on Robotics and Automation (ICRA), pages 1–9. IEEE,
2018.
[7] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. En-
zweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele.
The cityscapes dataset for semantic urban scene under-
standing. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 3213–
3223, 2016.
[8] M. Dupuis et al. Opendrive format specification. VIRES
Simulationstechnologie GmbH, 2010.
[9] L. Fridman, D. E. Brown, M. Glazer, W. Angell, S. Dodd,
B. Jenik, J. Terwilliger, J. Kindelsberger, L. Ding, S. Sea-
man, et al. Mit autonomous vehicle technology study:
Large-scale deep learning based analysis of driver be-
havior and interaction with automation. arXiv preprint
arXiv:1711.06976, 2017.
[10] J. Fu, J. Liu, H. Tian, Z. Fang, and H. Lu. Dual
attention network for scene segmentation. arXiv preprint
arXiv:1809.02983, 2018.
[11] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision
meets robotics: The kitti dataset. The International
Journal of Robotics Research, 32(11):1231–1237, 2013.
[12] J. Guo, U. Kurup, and M. Shah. Is it safe to drive?
an overview of factors, challenges, and datasets for
driveability assessment in autonomous driving. arXiv
preprint arXiv:1811.11277, 2018.
[13] C. S. V. Gutiérrez, L. U. S. Juan, I. Z. Ugarte, and V. M.
Vilches. Time-sensitive networking for robotics. arXiv
preprint arXiv:1804.07643, 2018.
[14] S. Hecker, D. Dai, and L. Van Gool. Failure prediction
for autonomous driving. In 2018 IEEE Intelligent Vehi-
cles Symposium (IV), pages 1792–1799. IEEE, 2018.
[15] X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and
R. Yang. The apolloscape open dataset for autonomous
driving and its application. ArXiv e-prints, 2018.
[16] J. Kim and C. Park. End-to-end ego lane estimation based
on sequential transfer learning for self-driving cars. In
Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition Workshops, pages 30–38, 2017.
[17] W. Maddern, G. Pascoe, C. Linegar, and P. Newman.
1 year, 1000 km: The oxford robotcar dataset. The
International Journal of Robotics Research, 36(1):3–15,
2017.
[18] NuTonomy. Nuscenes dataset, 2018.
[19] V. Ramanishka, Y.-T. Chen, T. Misu, and K. Saenko. To-
ward driving scene understanding: A dataset for learning
driver behavior and causal reasoning. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 7699–7707, 2018.
[20] C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revis-
iting unreasonable effectiveness of data in deep learning
era. In Proceedings of the IEEE international conference
on computer vision, pages 843–852, 2017.
[21] Y. Tian, K. Pei, S. Jana, and B. Ray. Deeptest: Automated
testing of deep-neural-network-driven autonomous cars.
In Proceedings of the 40th international conference on
software engineering, pages 303–314. ACM, 2018.
[22] R. Vivacqua, R. Vassallo, and F. Martins. A low cost
sensors approach for accurate vehicle localization and
autonomous driving application. Sensors, 17(10):2359,
2017.
[23] Q. Wang, L. Chen, and W. Tian. End-to-end driving
simulation via angle branched network. arXiv preprint
arXiv:1805.07545, 2018.
[24] H. Xu, Y. Gao, F. Yu, and T. Darrell. End-to-end learning
of driving models from large-scale video datasets. In
Proceedings of the IEEE conference on computer vision
and pattern recognition, pages 2174–2182, 2017.
[25] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan,
and T. Darrell. Bdd100k: A diverse driving video
database with scalable annotation tooling. arXiv preprint
arXiv:1805.04687, 2018.