https://doi.org/10.1007/s10489-020-01827-9
Deep learning and control algorithms of direct perception
for autonomous driving
Der-Hau Lee1 · Kuan-Lin Chen2 · Kuan-Han Liou2 · Chang-Lun Liu2 · Jinn-Liang Liu2
1 Department of Electrophysics, National Chiao Tung University, Hsinchu, Taiwan
2 Institute of Computational and Modeling Science, National Tsing Hua University, Hsinchu, Taiwan
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
We propose an end-to-end machine learning model that integrates multi-task (MT) learning, convolutional neural networks (CNNs), and control algorithms to achieve efficient inference and stable driving for self-driving cars. The CNN-MT model simultaneously performs regression and classification tasks that estimate perception indicators and driving decisions, respectively, based on the direct perception paradigm of autonomous driving. The model can also be used to evaluate the inference efficiency and driving stability of different CNNs in dynamic traffic using the metrics of network size, complexity, accuracy, processing speed, and number of collisions. We also propose new control algorithms that drive a car using these indicators and its short-range sensory data to avoid collisions in real-time testing. We collect a set of images from a camera of The Open Racing Car Simulator (TORCS) in various driving scenarios, train the model on this dataset, test it in unseen traffic, and find that it outperforms earlier models in highway traffic. The stability of end-to-end learning and self-driving depends crucially on the dynamic interplay between the CNN and the control algorithms. The source code and data of this work are available on our website and can be used as a simulation platform to evaluate different learning models on an equal footing and to quantify collisions precisely for further studies on autonomous driving.
Keywords Self-driving cars · Autonomous driving · Deep learning · Image perception · Control algorithms
1 Introduction
The direct perception model proposed by Chen et al. [1]
maps an input image (high dimensional pixels) from a
sensory device of a vehicle to fourteen affordance indicators
(a low dimensional representation) by a convolutional
neural network (CNN). Controllers then drive the vehicle
autonomously using these indicators in an end-to-end (E2E)
and real-time manner. This paradigm falls between the mediated perception [4–8] and behavior reflex [9–13] paradigms and combines the merits of both [1–3]. We refer to these papers, recent review articles [14–18], and the references therein for more thorough discussions of these three major paradigms in state-of-the-art machine learning algorithms for autonomous driving.
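To make the first of these mappings concrete, the following is a minimal sketch, written in PyTorch, of a CNN that maps a camera frame to a low-dimensional affordance vector. It is not the network used in this work; the backbone layers and the 14-dimensional output are illustrative assumptions following the affordance representation of Chen et al. [1].

```python
# Minimal sketch of a direct perception CNN: image -> affordance indicators.
# Layer sizes and the 14 outputs are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class AffordanceCNN(nn.Module):
    def __init__(self, num_affordances: int = 14):
        super().__init__()
        # Small convolutional backbone (any CNN backbone could be substituted here)
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Regression head: affordance indicators (e.g., distances, angles, offsets)
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(48 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, num_affordances),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) camera frame -> (batch, num_affordances)
        return self.regressor(self.features(image))

# Example: one camera frame of an assumed 210x280 resolution
affordances = AffordanceCNN()(torch.randn(1, 3, 210, 280))
print(affordances.shape)  # torch.Size([1, 14])
```

A multi-task variant in the spirit of the CNN-MT model described above would attach a second, classification head to the same backbone to predict discrete driving decisions alongside the regressed indicators.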
We instead study the interplay between CNNs and controllers and its effects on the overall performance of self-driving cars in the training and testing phases, issues that are not addressed in earlier studies. The CNN is a perception mapping from sensory input to affordance output. Controllers then map key affordances to driving actions, namely, to accelerate, brake, or steer [1].
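As an illustration of this second mapping, the sketch below converts a few key affordances into steering, acceleration, and braking commands. The affordance names, gains, and thresholds are hypothetical placeholders, not the control algorithms proposed in this paper.

```python
# Hypothetical controller sketch: key affordances -> (steer, accel, brake).
# Gains and thresholds are illustrative assumptions only.
def control(dist_to_lane_center, angle_to_road, dist_to_lead_car,
            k_lane=0.5, k_angle=1.0, safe_gap=20.0):
    """Return (steer, accel, brake) from a few affordance indicators."""
    # Lateral control: proportional correction of lane offset and heading error
    steer = -k_lane * dist_to_lane_center - k_angle * angle_to_road
    # Longitudinal control: brake when the lead vehicle is closer than safe_gap
    if dist_to_lead_car < safe_gap:
        accel = 0.0
        brake = min(1.0, (safe_gap - dist_to_lead_car) / safe_gap)
    else:
        accel, brake = 0.3, 0.0
    return steer, accel, brake

# Example call with made-up affordance values
print(control(dist_to_lane_center=0.4, angle_to_road=0.05, dist_to_lead_car=15.0))
```

Even in this toy form, the closed loop makes clear why perception errors and controller gains interact: a biased affordance estimate is amplified or damped by the control law at every time step.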
These two mapping algorithms are generally proposed
and verified separately since automotive control systems
are highly complex and vary with vehicle type and level
of automation [14–19]. A great variety of simulators have
been developed for simulation testing of autonomous cars in