Real-time Detection and Recognition of Traffic Signs
A. Martinović, G. Glavaš, M. Juribašić, D. Sutić and Z. Kalafatić
Faculty of Electrical Engineering and Computing
Unska 3, 10000 Zagreb
E-mail: {andelo.martinovic, goran.glavas, matko.juribasic, davor.sutic, zoran.kalafatic}
Abstract - Automated recognition of traffic signs is
becoming a very interesting area in computer vision
with clear possibilities of its application in automotive
industry. For example, it would be possible to design a
system which could recognize the current speed limit
on the road and notify the driver in an appropriate
manner. In this paper we deal with methods for
automated localization of certain traffic signs, and
classification of those signs according to the official
designations. We propose two different approaches of
determining the current speed limit after the sign was
localized. A demo software system was developed to
demonstrate the presented methods. Finally, we
compare results obtained from the developed software,
and discuss the influence of different parameters on
recognition performance and quality.
1. INTRODUCTION

Traffic sign detection and recognition has found its
application in many driver assistance systems, which aim
to display helpful information to the driver using
knowledge about the current conditions on the road. A
complete system should have three distinct functions:
1. detection of a traffic sign in an image;
2. classification of the detected sign;
3. sign tracking through time.
A complete traffic sign detection and recognition system
should be able to recognize all of the traffic signs used in
Croatia. Croatian regulations define five sign classes:
warning signs, explicit order signs, information signs,
direction signs and supplemental panels [10]. Due to the
limited annotated image database of traffic signs, we
focused our efforts on detecting and classifying only a
subset of explicit order signs. After a detailed analysis of
image and video databases at our disposal, we determined
that the five most common traffic signs in this category are
those shown in Fig. 1.
Fig. 1. Five common signs in the category of explicit order traffic signs
Since all of the signs from Fig. 1. are similar in
appearance (red circles containing black or red symbols),
our detection algorithm is trained to detect only circular
traffic signs, while an additional classification stage is
needed to separate these signs from each other.
Additionally, there is one traffic sign of specific
importance: the speed limit sign (rightmost in Fig. 1). The
system should also be able to determine the exact speed
limit, if the corresponding sign is classified as such.
This work is organized as follows. In Section 2 we
mention different approaches by various researchers. In
Section 3 we briefly describe an algorithm used to detect
sign in an image. In Section 4 we present two algorithms
used to recognize the detected sign. Section 5 contains
details about how to combine results from individual
frames to achieve detection and tracking in a video.
Results are illustrated and discussed in Section 6. We
conclude this article in Section 7.
2. RELATED WORK

The approaches to sign detection vary in their use of color
and geometric information. Various color-based approaches
use RGB or other color models (e.g. HSV, L*a*b,
CIECAM97). Intensity-decoupling color schemes [1,5] are
preferred because of the diverse lighting conditions usually
encountered in real-life applications. Some authors use
simple thresholding [1,2], while others use clustering
methods [3] or recursive region splitting [4,5,6].
Geometric information can be extracted with Hough
transform [6,7,15], histogram of orientation vectors [5,13]
or template matching [16]. Standard classifiers like SVM
can be used with these geometric features [13]. A large
body of work is based on the Viola-Jones detector
proposed in [8]. This approach has been used in
[9,10,11,12]. Most Viola-Jones based implementations
extract shape information in grayscale images, while [9]
uses color based Haar features. Neural networks are used
for detection in [14].
Unlike some of the related work, which considers static
images [5,6,7], our system works on video sequences in
real time (over 20 fps) on a mainstream CPU (~2GHz).
3. SIGN DETECTION

Detecting an object in an image is a computer vision
problem for which a wide variety of algorithms exist.
Because we wanted our system to work in real time, we
decided to employ the Viola-Jones detection algorithm.

A. The Viola-Jones algorithm in traffic sign detection
The Viola-Jones detector works by sliding a detection
window across an image. At each position, the classifier
decides whether the desired object is present inside the
window. In the vast majority of window positions, the
object is not found. The number of classifications for an
image equals the number of window positions, which can
be on the order of 10^5 or 10^6. This is why the
classification itself has to be as fast as possible.
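As a rough sanity check on that count, the following sketch tallies the window positions across all scales for a hypothetical 640x480 frame, a 24x24 base window and a 1.1 scale factor (our own illustration; the one-pixel stride is the worst case, and real detectors usually stride more coarsely, which brings the count down into the quoted 10^5-10^6 range):

```python
def num_window_positions(img_w, img_h, base=24, scale=1.1):
    """Count sliding-window evaluations over all scales (stride of 1 pixel)."""
    total, w = 0, float(base)
    while round(w) <= min(img_w, img_h):
        s = int(round(w))
        total += (img_w - s + 1) * (img_h - s + 1)  # positions at this scale
        w *= scale
    return total

print(num_window_positions(640, 480))  # several million positions at stride 1
```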
The Viola-Jones algorithm is based on a cascade of boosted
Haar features. More on Haar features can be found in the
original paper [8]. Boosting is done through AdaBoost, a
machine learning algorithm which combines weak classifiers
built on Haar features into a strong classifier. Given enough
different weak classifiers, AdaBoost will produce a strong
classifier with arbitrary precision. A theoretical proof and a
good introduction to boosting can be found in a tutorial by
AdaBoost's creators, Freund and Schapire [17]. The decision
whether the object is detected is made through voting of the
weak classifiers, each according to its weight. Cascading the
classifiers speeds up this process significantly, because more
important weak classifiers get to vote first: if their decision is
negative, the image is rejected and the remaining, less
important classifiers do not vote at all. An object is classified
positively only if it successfully passes through the cascade,
being positively classified in each stage. The final classifier
works in real time because:
- Haar features take constant time to compute from the
integral image;
- in a classifier produced by AdaBoost, voting is done as a
summation of weighted classifiers;
- on average, only a small subset of classifiers votes each
time, because of the cascading.
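The three points above can be sketched as follows (a minimal illustration, not the authors' implementation; the stage structure and the feature functions passed in are hypothetical):

```python
import numpy as np

def integral_image(img):
    # Zero-padded cumulative sum, so any rectangle sum needs only 4 lookups.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Constant-time sum over the rectangle with top-left (x, y) and size w x h.
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def run_cascade(stages, ii, x, y):
    # stages: list of (stage_threshold, [(weak_classifier, weight), ...]).
    # Early stages vote first; a negative stage decision rejects the window
    # immediately, so the later weak classifiers never run at all.
    for threshold, weak_classifiers in stages:
        score = sum(wt for clf, wt in weak_classifiers if clf(ii, x, y))
        if score < threshold:
            return False
    return True
```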
An analysis of the variants in the training process can be
found in a paper by Lienhart, Kuranov and Pisarevsky [18].

B. Viola-Jones training
To train the classifier, we used 757 images of traffic
signs. Each positive image contained only a cropped traffic
sign normalized to the size of 24x24. 3000 images were
used as negatives. We trained the cascade with OpenCV,
which is an open source library of computer vision
functions. We used it with the following parameters:
- Minimum hit rate of 0.995 per stage: only one in 200
positive images is falsely rejected in each stage; the others
are positively classified.
- Maximum false positive rate of 0.4 per stage: up to 40%
of the positively classified images may be false positives.
- The number of stages was set to 20.
With these parameters the theoretical hit rate is expected
to be more than 0.995^20 ≈ 0.9, with at most 0.4^20 ≈ 10^-8
false positives.
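These figures follow directly from compounding the per-stage rates over the 20 stages; a quick check:

```python
hit_rate = 0.995 ** 20  # per-stage hit rate compounded over 20 stages
fp_rate = 0.4 ** 20     # per-stage false positive rate compounded over 20 stages
print(f"{hit_rate:.3f}")  # 0.905
print(f"{fp_rate:.2e}")   # 1.10e-08
```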
Training of Viola-Jones detector took approximately 16
hours on a 4 CPU computer, with OpenMP enabled. The
training procedure stopped after reaching the desired
number of stages.
The trained cascade was afterwards tested with a test set
of 286 images. Images from the test set were taken with a
different camera and under various lighting conditions.
Results are shown in Table 1. The number of false
positives is expressed relative to the number of signs in the
test set.
TABLE I. Experimental results for the trained Viola-Jones detector

Scale factor   Hits     Misses   False positives
1.3            61.53%   38.46%   11.88%
1.2            67.13%   32.86%   18.88%
1.1            75.17%   24.83%   28.67%
The Viola-Jones detector works by sliding a detection
window across an image and enlarging that window by a
scale factor after it reaches the end of the image. Therefore,
modifying the scale factor affects the detection quality and
speed. By reducing the scale factor, we increase the
probability that a sign will be detected, but we also increase
the time needed for the algorithm to finish. In our work we
used a scale factor of 1.1 because it provided the best hit
ratio with an acceptable frame rate (over 20 fps). False
positives are expected to be removed by later stages of sign
recognition.
Further analysis of the results revealed that most
unsuccessful detections are caused by signs which are
smaller than the samples used to train the cascade (24x24
pixels). This behavior is not unexpected; its impact
diminishes when the algorithm is used on a video sequence,
because a traffic sign grows in apparent size as the sequence
progresses. At a certain moment, it becomes large enough
to be detected.
4. SIGN RECOGNITION

After a sign has been successfully detected in an image,
the classification process begins, in order to determine the
type of the sign. The classifier expects an adequate input
vector, which must first be prepared by means of image
preprocessing.
A. Sign preprocessing
The sign resulting from the detection stage can have
arbitrary size. In order to correctly classify the sign, size
normalization is required. In our work, we used a standard
sign size of 10x10 pixels. The image resizing procedure is
implemented using bilinear interpolation. A clipping
operation is also required, so that only the central part of
the sign, which holds the useful information, remains.
Fig. 2 shows the results of two different interpolation
algorithms.
Fig. 2. (a) nearest neighbor interpolation, (b) bilinear filtering
After size normalization, color information is discarded.
Conversion from 24-bit RGB to a grayscale image is
conducted according to the ITU-R BT.601 (CCIR 601)
standard. The resulting grayscale image has to be
transformed into a binary image (an image with pixel
intensities 0 and 1) using a thresholding algorithm. All
pixels with intensities over a defined threshold are assigned
the value 1, and the remaining pixels become 0. An
iterative threshold selection method [22] is used. This
method produces very good results when used on images
where the objects of interest are evenly illuminated (which
is the case with most traffic signs). In the case of
non-uniform illumination, it is advisable to use one of the
adaptive thresholding methods [23].
As a result of the segmentation procedure we get a
binary image with dimensions of 10x10 pixels. The input
vector for the classifier is formed by reading the binary
image as a one-dimensional vector with 100 elements, with
one slight modification: values of 0 are replaced with
values of -1, which improves the neural network
performance by distributing values equally around zero.
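The whole preprocessing chain can be sketched as follows (grayscale weights from BT.601, iterative threshold selection in the spirit of [22], and the 0 to -1 remapping; the function names and the convergence tolerance are our own):

```python
import numpy as np

def to_grayscale(rgb):
    # Luma weights from the ITU-R BT.601 (CCIR 601) recommendation.
    return rgb @ np.array([0.299, 0.587, 0.114])

def iterative_threshold(gray, eps=0.5):
    # Iterative threshold selection: repeatedly move the threshold to the
    # midpoint of the mean intensities of the two classes it induces.
    t = gray.mean()
    while True:
        below, above = gray[gray <= t], gray[gray > t]
        new_t = 0.5 * (below.mean() + above.mean())
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

def to_input_vector(rgb):
    # rgb: 10x10x3 array -> 100-element vector of +1/-1 for the MLP.
    gray = to_grayscale(rgb)
    t = iterative_threshold(gray)
    return np.where(gray > t, 1.0, -1.0).ravel()
```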
B. Classification using neural networks
There are many types of neural networks (e.g. feed-
forward networks, radial basis function networks, recurrent
networks) and many possible applications for them (e.g.
pattern recognition, function interpolation). It has been
shown that multilayer perceptron networks with a single
hidden layer and a nonlinear activation function are
universal classifiers [19, 20]. Therefore, in our work we
have chosen a multilayer perceptron (MLP) with
backpropagation (BP) training for classification.
For the purposes of this project we have developed our
own software library for MLPs trained with the BP
algorithm. The library provides support for creating and
training an arbitrary MLP (an arbitrary number of hidden
layers with an arbitrary number of units in each layer) with
the sigmoidal activation function. Using the developed
MLP library we created and trained two different
multilayer perceptrons.
- The purpose of the first network was to classify a
given traffic sign (input vector) into one of five
categories (as shown in Fig. 1).
- The task of the second MLP was to recognize the
actual speed limit if the first network classified the
input vector into speed limit category. Hence, the
second network was trained to recognize decimal
digits (0-9).
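A forward pass through either network, with the sigmoidal activation, reduces to a chain of weighted sums. A minimal sketch (the random initialization and class interface are ours, not the authors' library; backpropagation training is omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    def __init__(self, layer_sizes, seed=0):
        # layer_sizes, e.g. [100, 10, 5] for the sign classification network.
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(0.0, 0.1, (m, n))
                        for n, m in zip(layer_sizes, layer_sizes[1:])]
        self.biases = [np.zeros(m) for m in layer_sizes[1:]]

    def forward(self, x):
        # Each layer: weighted sum plus bias, squashed by the sigmoid.
        for W, b in zip(self.weights, self.biases):
            x = sigmoid(W @ x + b)
        return x  # one sigmoidal output per class

net = MLP([100, 10, 5])
out = net.forward(np.ones(100))
print(out.shape)  # (5,)
```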
In the case of MLP there are always several network
parameters left to be determined experimentally (the
number of units in hidden layers, learning factor for weight
correction, etc.) [21]. To determine the optimal parameters,
we trained the MLP observing the performance on the
validation set to avoid overfitting.
Table II shows optimal parameters for both multilayer
perceptrons (the values for number of units in hidden
layer, learning factor for weight correction, maximal
number of epochs and satisfactory average epoch error
were obtained empirically).
TABLE II. Neuron numbers per layer

Network               Input layer          Hidden layer   Output layer
Sign classification   100 (10x10 pixels)   10             5
Speed limit           72 (6x12 pixels)     10             10
TABLE III. Weight correction learning factors, maximum number of
epochs and average epoch errors

Network               Learning factor (η)                Max. number of epochs
Sign classification   0.05 when error drops below 0.1    10 000
Speed limit           0.005 when error drops below 0.1   50 000
The input training samples for the first network were
10x10-pixel images of traffic signs obtained by the
localization process on the initial images. The input
samples for the speed limit MLP were 6x12-pixel clean
images of decimal digits, and their copies with random
noise added to the pattern. Noise was created by flipping
10% of the bits in the original binary image.
C. Determining the speed limit
In order to analyze the numbers in the speed limit sign, a
segmentation algorithm must be employed to correctly
separate the digits. A straightforward algorithm searches
for maxima in the vertical projection of the input image.
Fig. 3. Input image (top) and the corresponding vertical
projection (bottom)
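A minimal sketch of that projection-based segmentation (our own naming; the input is a binary image given as a list of rows of 0/1 values):

```python
def segment_digits(binary, min_width=1):
    """Split a binary image at empty columns of its vertical projection;
    returns (start, end) column ranges, one per digit candidate."""
    proj = [sum(col) for col in zip(*binary)]  # vertical projection
    digits, start = [], None
    for x, v in enumerate(proj + [0]):  # trailing sentinel closes the last run
        if v and start is None:
            start = x                   # a run of non-empty columns begins
        elif not v and start is not None:
            if x - start >= min_width:
                digits.append((start, x))
            start = None
    return digits
```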
It is possible that some artifacts or noise will remain in
the obtained digit after the segmentation is complete. To
minimize such interference, we extract the primary
connected component (Fig. 4).
Fig. 4. Example of a binary image and the corresponding
connected components
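Extracting the primary (largest) connected component can be sketched with a simple breadth-first flood fill (illustrative only; we assume 4-connectivity here):

```python
from collections import deque

def largest_component(binary):
    """Keep only the largest 4-connected component of 1-pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                comp, q = [], deque([(sy, sx)])  # flood-fill one component
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out
```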
The extracted digit is then normalized to a size of 6x12
pixels and transformed into the input vector for the digit
classifier.

D. Classification based on structural analysis

Alongside classification by a neural network, we
developed another method of digit classification, based on
structural analysis.
Structural analysis deals with more complex structures
than pixels or edges. For example, it considers loops, line
ends and junctions. In order to extract this high-level
information from a binary image, we must obtain the
skeleton of the digit by thinning the object. The
skeletonization procedure is essentially a reduction of an
object to a graph, and is mathematically defined by the
medial axis transform. Fig. 5 shows an example of
skeletonization.
Fig. 5. Original digit (left) and its skeleton (right)
After the skeleton has been obtained, we can extract
structural features from it. We consider line ends, junctions
and loops, as shown in Fig. 6.
Fig. 6. Line ends (a), junctions (b),(c) and loops (d),(e)
A line end is defined simply as a black pixel with only one
black neighbor pixel. Junctions can be found by counting
the number of white-black transitions in the 8-pixel
neighborhood of the observed pixel. To determine the
number of loops, we can invert the image and search for
connected components. In the end we subtract 1 from the
total number of components, because one component
represents the background.
Each digit can be described with the number of distinct
features and their relative positions in the image. For
example, “0” is the only digit with one loop and zero line
ends. Digits “1” and “2” have the same number of features
(two line ends and zero junctions and loops), but their
relative and absolute positions differ. We can use this
information to directly distinguish the digits.
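The line-end and loop counts described above can be sketched directly on a binary grid (a sketch under the stated definitions: foreground pixels are 1, line ends use the 8-neighborhood, loops count 4-connected components of the inverted image minus the background):

```python
def count_line_ends(skel):
    # A line end: a skeleton pixel with exactly one 8-connected neighbor.
    h, w = len(skel), len(skel[0])
    ends = 0
    for y in range(h):
        for x in range(w):
            if skel[y][x]:
                nb = sum(skel[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x))
                ends += (nb == 1)
    return ends

def count_loops(img):
    # Loops = connected components of the inverted image, minus 1
    # for the component that represents the background.
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = 0
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] and not seen[sy][sx]:
                comps += 1
                stack = [(sy, sx)]
                seen[sy][sx] = True
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and not img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return comps - 1
```

On a ring-shaped "0" this yields one loop and zero line ends, matching the description above.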
5. DETECTION AND TRACKING IN VIDEO

In a video sequence, a sign will typically appear in
multiple consecutive frames. Due to the imperfection of
the detection procedure, the sign might not be detected in
every frame. Additionally, there is a possibility that false
positives will appear. In order to efficiently track only the
sign that is actually in the video, the system would have to
remember information from previous frames and use it to
correct the detection in the current frame.
We propose a system which uses an auto-degrading
reinforcement principle. It is based on two premises:
1. Auto-degradation: The system should have a
short-term memory which only remembers
information from a certain number of the most
recent frames. The system "forgets" older
information.
2. Reinforcement: The system should be updated
whenever an object is detected in the current frame.
The auto-degradation ensures that information from the
recent past has more impact than older information.
Such a system can be implemented as a cluster of
accumulators. Each accumulator represents a certain object
that can be tracked. The value stored in the accumulator is
proportional to the number of detections of the respective
object. When an object is detected, the respective
accumulator's value is increased by a certain amount.
Objects not detected have their accumulators' values
decreased. If a value in one or more of the accumulators
becomes greater than the defined threshold, the object is
considered to be tracked. If the tracked object leaves the
field of view, the respective accumulator's value will be
decreased through time. When it falls under the threshold,
the object ceases to be tracked.
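The accumulator cluster can be sketched as follows (the gain, decay, cap and threshold values are hypothetical; the paper does not specify them):

```python
class AccumulatorTracker:
    """Auto-degrading reinforcement tracker: one accumulator per object."""

    def __init__(self, objects, gain=2.0, decay=1.0, threshold=5.0, cap=10.0):
        self.acc = {o: 0.0 for o in objects}
        self.gain, self.decay = gain, decay
        self.threshold, self.cap = threshold, cap

    def update(self, detected):
        # Reinforce detected objects; decay the rest (auto-degradation).
        for o in self.acc:
            if o in detected:
                self.acc[o] = min(self.acc[o] + self.gain, self.cap)
            else:
                self.acc[o] = max(self.acc[o] - self.decay, 0.0)

    def tracked(self):
        # An object is tracked while its accumulator exceeds the threshold.
        return {o for o, v in self.acc.items() if v > self.threshold}
```

With these sample values, a few consecutive detections push an object above the threshold, and a few missed frames let it decay back out of the tracked set.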
6. RESULTS

In our work, we developed a software system that
implements all of the described algorithms. The system
consists of two front-end applications with an equivalent
program core. The first application can be used to detect
and recognize traffic signs in a stationary image, with an
easy-to-use graphical user interface (Fig. 7).
The second application uses a video file as its input and
can be used to detect traffic signs in the video in real-time.
The video application uses the same program core as the
static image application, with the addition of an object
tracking subsystem.
The developed system was tested using a test set of 146
static images (for the first application) and a 98-minute
video sequence incorporating 128 traffic signs. The results
are shown in Table IV. The system has a 73% hit ratio;
15% of all errors are caused by misdetections, while the
other 13% are classification errors.

Fig. 7. Graphical user interface for detection of signs in static images

Fig. 8. Video application with a recognized speed limit traffic sign

TABLE IV. Static image application experimental results

                        Number   Percentage
Total number of signs   146      100%
Correctly recognized    106      72.6%
Detection errors        22       15.1%
Classification errors   10       13%
The results obtained from the video application are shown
in Table V. The noted performance exceeds that of the
static image application. Since the only difference between
the two applications is the addition of the object tracking
algorithm, we conclude that the improved performance is
the result of the increased number of processed frames. For
example, a misdetection in one frame can be rectified by a
correct detection in one of the following frames.
The application was tested on a dual-core Athlon
processor (X2 5600+), with an average speed of 21 frames
per second, and a dual-core Intel processor (2.2 GHz), with
an average speed of 30 frames per second. The improved
performance on Intel processors is caused by the fact that
the OpenCV library uses optimized instructions on Intel
platforms, and even makes use of the Intel Integrated
Performance Primitives (IPP) if they are present.
TABLE V. Video application experimental results

                             Number   Percentage
Total number of signs        128      100%
Correctly recognized         106      82.8%
Detection errors             13       10.16%
Sign classification errors   7        4.6%
Speed limit errors           2        1.56%
False positives              20       -
7. CONCLUSION

We developed a system that recognizes traffic signs,
with real-time operation as the main goal. With this goal
in mind, we used the Viola-Jones detection algorithm and
neural networks for classification. The results obtained
from the developed software show that our system is
applicable to real-time video processing. Furthermore, we
conclude that the system achieves better results when used
on a video sequence, compared to the standard approach of
traffic sign detection in static images. Relatively low error
rates in the classification stage indicate that the multilayer
perceptron can be successfully used as a classifier of
traffic signs and digits.
Further improvements of the system should include a
larger number of supported traffic sign classes. Currently,
the main obstacle is the relatively low number of training
samples for less frequent traffic signs. In addition, some
problems could arise when training the neural network
with a larger number of classes, due to the increased
dimensionality of the search space. There is also the
problem of similarity between some traffic sign classes.
Other improvements would include recognition of signs
of different shape (triangular, for example), using more
advanced methods of feature extraction such as principal
component analysis, and increasing the code portability by
implementing the functions from external program
libraries, which could prove useful when implementing the
system on different platforms or embedded systems.
REFERENCES

[1] D. G. Shaposhnikov, L. N. Podladchikova, A. V.
Golovan, N. A. Shevtsova, Road Sign Recognition
by Single Positioning of Space-Variant Sensor, Proc.
15th International Conference on Vision Interface, 2002,
pp. 213-217.
[2] Andrzej Ruta, Yongmin Li, Xiaohui Liu, Real-time
traffic sign recognition from video by class-specific
discriminative features, Pattern Recognition, Vol. 43,
2010, pp. 416-430.
[3] S. Tominaga, Color image segmentation using three
perceptual attributes, Proc. CVPR-86, 1986, 628–630.
[4] R. Ohlander, K. Price, D. Reddy, Picture segmentation
using a recursive region splitting method, Computer
Graphics Image Processing Conference 13, 1978, pp.
[5] X.W. Gao, L. Podladchikova, D. Shaposhnikov, K.
Hong, N. Shevtsova, Recognition of traffic signs based
on their colour and shape features extracted using human
vision models, Journal of Visual Comunication and
Image Representation, 2006, pp. 675-685.
[6] John Hatzidimos, Automatic Traffic Sign Recognition in
Digital Images, Proceedings of the International
Conference on Theory and Applications of Mathematics
and Informatics - ICTAMI 2004, Thessaloniki, Greece,
pp. 174-184.
[7] V. Barrile, M. Cacciola, G. M. Meduri, F. C. Morabito,
Automatic Recognition of Road Signs by Hough
Transform, "International archives of the
photogrammetry, remote sensing and spatial information
sciences", n. XXXVI Part 5, 2008, pp. 62-67.
[8] Paul Viola and Michael Jones. Robust real-time object
detection. International Journal of Computer Vision,
2004, pp. 137-154.
[9] C. Bahlmann, Y. Zhu, Visvanathan Ramesh, M.
Pellkofer, and T. Koehler. A system for traffic sign
detection, tracking, and recognition using color, shape,
and motion information. Intelligent Vehicles Symposium,
2005, pp. 255–260.
[10] Karla Brkic, Axel Pinz and Sinisa Segvic ``Traffic sign
detection as a component of an automated traffic
infrastructure inventory system'', in Proc. of the annual
Workshop of the Austrian Association for Pattern
Recognition (OAGM/AAPR), Austria, 2009, pp. 1-12.
[11] Keller, C.G. Sprunk, C. Bahlmann, C. Giebel, J.
Baratoff, G., Real-time recognition of U.S. speed signs,
Intelligent Vehicles Symposium, 2008 , pp. 518-523.
[12] Ach, R.; Luth, N.; Techmer, A., Real-time detection of
traffic signs on a multi-core processor, Intelligent
Vehicles Symposium, 2008, pp. 307–312.
[13] Sho Shimamura, Satoshi Yonemoto, Road Sign
Recognition with Color and Edge based features, IEICE
Tech. Rep., vol. 108, no. 471, 2009, pp. 23-28.
[14] Ghica, D., Si Wei Lu, Xiaobu Yuan, Recognition of
traffic signs by artificial neural network, IEEE
International Conference on Neural Networks, 1995, pp.
[15] Hua Huang, Chao Chen, Yulan Jia, Shuming Tang,
Automatic Detection and Recognition of Circular Road
Sign, Mechatronic and Embedded Systems and
Applications, MESA 2008, pp. 626-630.
[16] Bogusław Cyganek, Road Signs Recognition by the
Scale-Space Template Matching in the Log-Polar
Domain, Proc. of the 3rd Iberian conference on Pattern
Recognition and Image Analysis, Part I, 2007, pp. 330-
[17] Freund, Y., Schapire R. A Short Introduction to Boosting.
Journal of Japanese Society for Artificial Intelligence,
1999, pp. 771-780.
[18] R. Lienhart, A. Kuranov, V. Pisarevsky, Empirical
Analysis of Detection Cascades of Boosted Classifiers for
Rapid Object Detection, DAGM'03, 25th Pattern
Recognition Symposium, Germany, 2003, pp. 297-304.
[19] G. Cybenko, Approximation by superpositions of a
sigmoidal function, Mathematics of Control Signals and
Systems 2,1989, pp. 303–314.
[20] K. Funahashi, On the approximate realization of
continuous mappings by neural networks, Neural
Networks 2, 1989, pp. 183–192.
[21] C. M. Bishop, Neural Networks for Pattern Recognition,
Oxford University Press, 1995.
[22] T.W. Ridler, S. Calvard, Picture Thresholding Using an
Iterative Selection Method, SMC(8), 1978, pp. 629-632.
[23] L.G. Shapiro, G.C. Stockman, Computer Vision, Prentice
Hall, 2002.
... The use of Neural Networks in the recognition stage for this application was investigated in many researches. Authors in [15][16] have developed Networks were the input layer consists in the extracted digit pattern. The total number of input neurons is equal to the total number of pixels. ...
... The total number of input neurons is equal to the total number of pixels. Respectively 400 pixels (20x20) for [15] and 72 pixels (6x12) for [16]. The use of a large number of input units requires more computation time and more memory resources. ...
... This table exhibits the classification percentage for the NN and the DT separately and then for the whole system. TABLE V shows the performance comparison of the proposed method with the method discussed in [16]. [16]. ...
Full-text available
This Paper presents a new hybrid technique for digit recognition applied to the speed limit sign recognition task. The complete recognition system consists in the detection and recognition of the speed signs in RGB images. A pretreatment is applied to extract the pictogram from a detected circular road sign, and then the task discussed in this work is employed to recognize digit candidates. To realize a compromise between performances, reduced execution time and optimized memory resources, the developed method is based on a conjoint use of a Neural Network and a Decision Tree. A simple Network is employed firstly to classify the extracted candidates into three classes and secondly a small Decision Tree is charged to determine the exact information. This combination is used to reduce the size of the Network as well as the memory resources utilization. The evaluation of the technique and the comparison with existent methods show the effectiveness.
... Although feature based shape detectors are more robust than direct shape detectors, the detected candidates are still not always accurately localized. Another way to detect traffic signs is to regard the regions containing traffic signs as maximally stable extremal regions [13], but this method needs manual selection of various thresholds. ...
... To work well, classification relies on accurate detection and localisation of candidate traffic signs. Some works [10], [13] have tried to concatenate detection and classification, adding a normalization step which aims to accurately locate the detected candidates. However these normalization steps just rely on shape detectors and are not robust enough for real applications. ...
Full-text available
We propose a localization refinement approach for candidate traffic signs. Previous traffic sign localization approaches, which place a bounding rectangle around the sign, do not always give a compact bounding box, making the subsequent classification task more difficult. We formulate localization as a segmentation problem, and incorporate prior knowledge concerning color and shape of traffic signs. To evaluate the effectiveness of our approach, we use it as an intermediate step between a standard traffic sign localizer and a classifier. Our experiments use the well-known German Traffic Sign Detection Benchmark (GTSDB) as well as our new Chinese Traffic Sign Detection Benchmark. This newly created benchmark is publicly available and goes beyond previous benchmark data sets: it has over 5000 high-resolution images containing more than 14,000 traffic signs taken in realistic driving conditions. Experimental results show that our localization approach significantly improves bounding boxes when compared with a standard localizer, thereby allowing a standard traffic sign classifier to generate more accurate classification results.
... More precisely, SVM is a binary classifier that separates two different classes by a subset of data samples called support vectors. It was implemented as a classifier for traffic sign recognition in [44,55,88,[129][130][131][132][133][134][135][136]. This classification method is robust, highly accurate and extremely fast which is a good choice for large amounts of training data. ...
Full-text available
The automatic traffic sign detection and recognition (TSDR) system is very important research in the development of advanced driver assistance systems (ADAS). Investigations on vision-based TSDR have received substantial interest in the research community, which is mainly motivated by three factors, which are detection, tracking and classification. During the last decade, a substantial number of techniques have been reported for TSDR. This paper provides a comprehensive survey on traffic sign detection, tracking and classification. The details of algorithms, methods and their specifications on detection, tracking and classification are investigated and summarized in the tables along with the corresponding key references. A comparative study on each section has been provided to evaluate the TSDR data, performance metrics and their availability. Current issues and challenges of the existing technologies are illustrated with brief suggestions and a discussion on the progress of driver assistance system research in the future. This review will hopefully lead to increasing efforts towards the development of future vision-based TSDR system.
Conference Paper
Due to exhaustive proposal generations, conventional 3D object tracking methods based on template matching are time-consuming. Some recent works that leverage template clues to directly obtain 3D boxes are more efficient, but they don't take full advantage of the template context. In this work, we propose a novel Multi-Level Context Fusion Network (MLCFNet) to track objects robustly. Our main idea is to fuse template context in multiple levels (point, local, and global features) into the search area and utilize the joint information to predict the final box. Specifically, a 3D Siamese Network firstly extracts multi-level features in the search area and template. Then, to promote the guidance of the template, a Context Fusion Network fuses these features into the search area and generates guided points. Finally, these points are used to regress potential object centers and cluster 3D object proposals. Experiments on KITTI and nuScenes tracking datasets demonstrate that MLCFNet outperforms other state-of-the-art methods by a large margin.
A Driver Assistance and Monitoring System (DAMS) plays an important role in traffic management, especially on Indian roads: it reduces accidents and serious injuries, and improves safety and driving comfort. The goal of our work is to design an effective driver assistance and monitoring methodology that alerts the driver whenever a road sign is detected, so that the driver can take appropriate action. The proposed method detects road signs present in the dataset under cluttered backgrounds and different lighting conditions, based on colour and shape. Images are enhanced and denoised with median filters, the edges of road signs are detected using the Canny edge operator, and the signs are classified as stop, no entry or speed limit using a Convolutional Neural Network (CNN) classifier.
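The median-filter denoising step mentioned in the abstract above can be sketched in a few lines of numpy; the 3×3 window and the toy image are illustrative choices, not details from the paper:

```python
import numpy as np

def median_filter(img, k=3):
    """Apply a k x k median filter to a 2-D grayscale image (edge-replicated)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

# A single "salt" noise pixel is removed while the flat background survives.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255                       # impulse noise
assert median_filter(img)[2, 2] == 0
```

Unlike linear smoothing, the median preserves step edges, which is why it is a common preprocessing step before edge detectors such as Canny.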
Traffic sign recognition is among the major tasks of a driver assistance system. Convolutional neural networks (CNN) play an important role in achieving good recognition accuracy, which helps limit dangerous driver behaviour and enforce road laws; the accuracy of detection and classification determines how powerful the technique is. The Single Shot MultiBox Detector (SSD), an approach based on the convolutional neural network paradigm, is adopted in this paper: firstly because it is suitable for real-time applications, running at 59 FPS (frames per second), and secondly to mitigate the difficulties of deeper CNNs with many layers while providing finer accuracy. Our experiments on the German Traffic Sign Recognition Benchmark (GTSRB) demonstrate that the proposed approach achieves competitive results (83.2% after 140,000 learning steps) using a GPU parallel system and TensorFlow.
This paper proposes the application of a swarm intelligence algorithm, Artificial Bee Colony (ABC), to feature selection for a Random Forest (RF) classifier aimed at recognising traffic signs. The authors define and assess several fitness functions for the feature selection stage: the idea is that minimising the correlation and maximising the entropy of the set of masks used for feature extraction yields higher information gain, and allows recognition accuracies comparable with other state-of-the-art algorithms. The RF is a committee of decision trees, which handles large datasets and many features with high performance, enabling a Traffic Sign Recognition (TSR) system oriented towards real-time implementations. The German Traffic Sign Recognition Benchmark (GTSRB) was used for the experiments, serving as a realistic basis for comparing the performance of the authors' proposal.
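The fitness idea described above, rewarding high entropy while penalizing inter-feature correlation, might be sketched as follows; the weighting `alpha`, the 10-bin entropy estimate and the function name `subset_fitness` are assumptions for illustration, not the authors' definitions:

```python
import numpy as np

def subset_fitness(X, mask, alpha=1.0):
    """Score a binary feature mask over data X (samples x features):
    reward mean per-feature Shannon entropy, penalize mean absolute
    pairwise correlation. Assumes selected features are non-constant."""
    sel = X[:, mask.astype(bool)]
    if sel.shape[1] < 2:
        return -np.inf
    # Mean absolute off-diagonal correlation of the selected features.
    c = np.corrcoef(sel, rowvar=False)
    off = np.abs(c[~np.eye(c.shape[0], dtype=bool)]).mean()
    # Shannon entropy of each feature, estimated from a 10-bin histogram.
    ent = 0.0
    for col in sel.T:
        p, _ = np.histogram(col, bins=10)
        p = p / p.sum()
        p = p[p > 0]
        ent += -(p * np.log2(p)).sum()
    ent /= sel.shape[1]
    return ent - alpha * off
```

A swarm algorithm such as ABC would then search the space of masks for the highest-scoring subset; redundant (highly correlated) feature pairs score lower than independent ones.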
We study the problem of traffic sign detection in the context of traffic infrastructure inventory. The data acquired while filming roads in Croatia is presented. Based on recent approaches, and motivated by constraints present in our data, we employ the Viola-Jones object detector for the detection of triangular warning signs. The detector achieves correct detection rates better than 90%, which is sufficient for our application. The false positive rate is a concern, in some cases exceeding 160%, so the causes of false positive detections are analyzed in detail. We suggest a new approach that fuses the Viola-Jones detector with a priori knowledge, in the form of a sign model and geometric constraints, in order to increase the correct detection rate and decrease the false positive rate.
This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image", which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [6]. The third contribution is a method for combining classifiers in a "cascade", which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems [18, 13, 16, 12, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
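The "Integral Image" (summed-area table) idea is simple enough to sketch directly; this is a generic illustration of the representation, not the paper's implementation:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from at most four table lookups."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 2, 2) == img[1:3, 1:3].sum()
```

Because any rectangle sum costs a constant number of lookups, the Haar-like features used by the detector can be evaluated in constant time regardless of their size.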
This paper presents a new road sign recognition method based on color and edge features. In our approach, color regions are first extracted to reduce the computational cost. Next, Histogram of Oriented Gradient (HOG) features are calculated in all the extracted regions; in the HOG computation, the edge gradients are amplified on the specific color boundaries associated with road signs. Road sign recognition is then performed using a support vector machine. Experimental results showed that the proposed method works on real image sequences.
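The orientation-histogram core of HOG can be illustrated with a toy numpy cell; block normalization, vote interpolation and the color-boundary amplification described above are deliberately omitted from this sketch:

```python
import numpy as np

def hog_cell(cell, n_bins=9):
    """Unsigned gradient-orientation histogram for one cell,
    with each pixel voting its gradient magnitude into one bin
    (a minimal stand-in for a full HOG pipeline)."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist

# A vertical step edge votes all its magnitude into the 0-degree bin.
cell = np.tile(np.array([0, 0, 0, 1, 1, 1], float), (6, 1)) * 255
assert hog_cell(cell).argmax() == 0
```

In the full method these per-cell histograms are concatenated over a detection window and fed to the SVM classifier.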
The objective of this project is the development of an algorithm for the automatic recognition of traffic signs in digital images. The program An.Si. was created (from the Greek words Anagnorisi Simaton, meaning Sign Recognition). Many algorithms for traffic sign detection and classification have been introduced to date, and extensive research is being conducted by major car manufacturers in collaboration with universities and other institutes on real-time automatic recognition of traffic signs, so that it can become part of so-called "Driver Support Systems" ([7]). Two major problems affect the detection process. First, road signs are frequently partially occluded by other vehicles, and many objects present in traffic scenes make sign detection hard (pedestrians, other vehicles, buildings and billboards may confuse the detection system with patterns similar to those of road signs). Second, colour information in traffic scene images is affected by varying illumination caused by weather conditions, time of day (day-night) and shadowing from buildings ([7]). The proposed method detects the location of the sign in the image based on its geometrical characteristics and recognises it using colour information. Partial occlusion is dealt with by the use of the Hough Transform, and suggestions are made for future improvements to increase the robustness of the algorithm to changing lighting conditions.
An object may be extracted from its background in a picture by threshold selection. Ideally, if the object has a different average gray level from that of its surroundings, thresholding will produce a white object on a black background, or vice versa. In practice, however, it is often difficult to select an appropriate threshold, and a technique is described whereby an optimum threshold may be chosen automatically through an iterative process, with successive iterations providing increasingly cleaner extractions of the object region. An application to low-contrast images of handwritten text is discussed.
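One common reading of such an iterative scheme: start from the global mean, then repeatedly move the threshold to the midpoint of the two class means until it stabilizes. A minimal sketch (the stopping tolerance `eps` and the toy data are illustrative assumptions):

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Iteratively refine a gray-level threshold: T starts at the global
    mean and is replaced by the midpoint of the below-T and above-T class
    means until it changes by less than eps. Assumes both classes stay
    nonempty at every step."""
    t = img.mean()
    while True:
        lo, hi = img[img <= t], img[img > t]
        new_t = 0.5 * (lo.mean() + hi.mean())
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

# Bimodal toy data: 90 background pixels at 10, 10 object pixels at 200.
img = np.concatenate([np.full(90, 10.0), np.full(10, 200.0)])
assert 10.0 < iterative_threshold(img) < 200.0
```

On this toy histogram the threshold settles at the midpoint between the two class means, cleanly separating object from background.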
The segmentation of images should generate segments which correspond to objects, parts of objects, or groups of objects which appear in the image. This paper presents a description of a general segmentation method which can be applied to many different types of scenes. It describes the segmentation method in detail and discusses the potential performance of other segmentation techniques on general scenes. Also presented is a subset of the images which have been analyzed using this technique and a summary of the computational effort required. Details of some of the major programs are given in the appendix.
Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting's relationship to support-vector machines. Some examples of recent applications of boosting are also described.
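AdaBoost's reweighting loop can be sketched with one-feature threshold stumps; this toy version (exhaustive stump search, labels in {-1, +1}) is for illustration only, not the paper's formulation:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost: each round picks the weighted-error-minimizing
    stump sign(s * (x[f] - t)) and upweights the examples it misclassifies."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (+1.0, -1.0):
                    pred = s * np.sign(X[:, f] - t + 1e-12)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, s)
        err, f, t, s = best
        err = max(err, 1e-10)                     # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # stump weight
        pred = s * np.sign(X[:, f] - t + 1e-12)
        w *= np.exp(-alpha * y * pred)            # reweight examples
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.sign(X[:, f] - t + 1e-12) for a, f, t, s in ensemble)
    return np.sign(score)
```

The exhaustive stump search is quadratic in the data and only suitable for small examples; the reweighting and the weighted-majority vote are the essence of the algorithm.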
In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
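The theorem is existential, but a quick numerical illustration is easy: fix random hidden-layer weights under a sigmoid nonlinearity and fit only the output coefficients by least squares. All sizes, scales and the target function below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target: a continuous function on [0, 1], sampled on a grid.
x = np.linspace(0.0, 1.0, 200)
f = np.sin(3.0 * x) + 0.5 * x

# Single hidden layer: G(x) = sum_j alpha_j * sigma(w_j * x + b_j).
n_hidden = 300
w = rng.normal(scale=10.0, size=n_hidden)
b = rng.uniform(-10.0, 10.0, size=n_hidden)
H = sigma(np.outer(x, w) + b)                  # (200, 300) hidden activations
alpha, *_ = np.linalg.lstsq(H, f, rcond=None)  # fit only the output layer

err = np.max(np.abs(H @ alpha - f))
assert err < 0.05   # uniform error on the sample grid is small
```

With the hidden weights frozen, the fit is a linear problem, so the sketch shows approximation capacity only; training all layers jointly is a separate (nonconvex) matter.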