Научно-технический вестник информационных технологий, механики и оптики,
546 2019, том 19, № 3
НАУЧНО-ТЕХНИЧЕСКИЙ ВЕСТНИК ИНФОРМАЦИОННЫХ ТЕХНОЛОГИЙ, МЕХАНИКИ И ОПТИКИ
май—июнь 2019 Том 19 № 3 ISSN 2226-1494 http://ntv.itmo.ru/
SCIENTIFIC AND TECHNICAL JOURNAL OF INFORMATION TECHNOLOGIES, MECHANICS AND OPTICS
May—June 2019 Vol. 19 No 3 ISSN 2226-1494 http://ntv.itmo.ru/en/ май—июнь 2019 Том 19 Номер 3
doi: 10.17586/2226-1494-2019-19-3-546-552
EFFECT OF VARIOUS DIMENSION CONVOLUTIONAL LAYER FILTERS
ON TRAFFIC SIGN CLASSIFICATION ACCURACY
V.N. Sichkar, S.A. Kolyubin
ITMO University, Saint Petersburg, 197101, Russian Federation
Corresponding author: vsichkar@itmo.ru
Article info
Received 25.03.19, accepted 30.04.19
Article in English
For citation: Sichkar V.N., Kolyubin S.A. Effect of various dimension convolutional layer filters on traffic sign classification accuracy.
Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 3, pp. 546–552 (in English). doi:
10.17586/2226-1494-2019-19-3-546-552
Abstract
The paper presents a study of an effective traffic sign classification method based on a convolutional neural network with filters of various dimensions. Every model of the convolutional neural network has the same architecture but a different filter dimension for the convolutional layer. The studied dimensions of the convolutional layer filters are 3 × 3, 5 × 5, 9 × 9, 13 × 13, 15 × 15, 19 × 19, 23 × 23, 25 × 25 and 31 × 31. In each experiment, the input image is convolved with filters of a certain dimension and with a certain processing depth of the image borders, which depends directly on the filter dimension and varies from 1 to 15 pixels. The performance of the proposed methods is evaluated on the German Traffic Sign Recognition Benchmark (GTSRB). Images from this dataset were reduced to 32 × 32 pixels. The whole dataset was divided into three subsets: training, validation and testing. The effect of the convolutional layer filter dimension on the extracted feature maps is analyzed with respect to classification accuracy and average processing time. The testing dataset contains 12000 images that do not participate in the training of the convolutional neural network. The experimental results demonstrate that every model achieves a testing accuracy of more than 82 %. The models with filter dimensions of 9 × 9, 15 × 15 and 19 × 19 form the top three by classification accuracy, with 86.4 %, 86 % and 86.8 %, respectively. The models with filter dimensions of 5 × 5, 3 × 3 and 13 × 13 form the top three by average processing time, with 0.001879, 0.002046 and 0.002364 seconds, respectively. The usage of convolutional layer filters of middle dimensions has shown not only a high classification accuracy of more than 86 %, but also a fast classification rate, which enables these models to be used in real-time applications.
Keywords
traffic sign classification, convolutional neural network, convolutional layer filters, feature map extraction, classification accuracy
Introduction
The classification of traffic signs is important from the traffic safety point of view. A significant part of road traffic violations occurs due to drivers' non-compliance with the speed limit. For this reason, many car manufacturers use traffic sign classification systems to detect speed limit signs in camera images. These systems compare the current speed of the vehicle with the speed allowed on the current section of the road and either notify the driver of exceeding the limit or adjust the speed automatically. Used in conjunction with the navigation system, traffic sign classification makes it possible to obtain speed limit data even when a sign has not been detected: the driver is then informed of the possible presence of the sign.
The main challenge associated with this task is the classification of traffic signs in real conditions. Night time and bad weather significantly complicate the classification of signs in images.
Image classification methods based on neural networks are actively used and well described in the literature [1–6]. However, every method has its advantages and disadvantages, so the development of a reliable algorithm remains an open research problem. When sign classification systems are tested in real traffic conditions, some signs may be misinterpreted due to varying lighting levels, vibration, and different shooting angles. Convolutional neural networks have proven effective at overcoming these shortcomings [7–11]. They solve image classification problems more efficiently than fully connected neural networks in terms of computational load, owing to a considerably smaller number of trainable parameters. The main advantage of convolutional neural networks, however, is their robustness to variations in shape, rotation and color intensity of the input images.
This paper considers the effect of the convolutional layer filter dimension on the accuracy and rate of classification. This dimension determines the number of features that are combined to obtain a new feature in the output feature map. The use of a small filter dimension (3 × 3) combines fewer features and can lead to the loss of important information. On the contrary, the use of a large filter dimension (31 × 31) combines more features, but can lead to redundant information about irrelevant or unnecessary characteristics. The convolutional neural network is trained on the GTSRB dataset [12, 13].
Convolutional neural network for traffic sign classification
Classification using convolutional neural networks is a modern method of pattern recognition in computer vision. A convolutional neural network receives an image and processes it in convolutional layers. Every convolutional layer consists of a set of trainable filters that process the input image with a convolution operation. The essence of this operation is that the filter slides over the image and performs an elementwise multiplication of the filter values and the current image area. The result is summed and written to the corresponding position of the output feature map. The peculiarity of the convolutional layer filters is that they make it possible to detect the same specific features in different parts of the image. Mathematically, the convolution operation is described by the following equation:
(f ∗ g)[m, n] = Σ_{k,l} f[m − k, n − l] · g[k, l],
where f is the matrix of the input image; g is the convolution filter; m, n are the height and width indices of the feature map; k, l are the height and width indices of the filter.
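The sliding multiply-and-sum described above can be illustrated with a minimal numpy sketch (not the authors' code; it implements the "valid" cross-correlation commonly used in CNN libraries, i.e. without the kernel flip of the mathematical convolution in the equation above):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and, at every position, take the
    elementwise product with the current image area and sum it up
    ('valid' cross-correlation, as usually implemented in CNNs)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1        # output feature-map size
    out = np.zeros((oh, ow))
    for m in range(oh):
        for n in range(ow):
            out[m, n] = np.sum(image[m:m + kh, n:n + kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
fmap = conv2d(img, np.ones((3, 3)))              # 3x3 summing filter
print(fmap.shape)  # (3, 3)
```

Each output value here is the sum of one 3 × 3 image patch, which is exactly the elementwise multiply-and-sum step a convolutional layer repeats for every filter.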
Before being fed to the input of the convolutional neural network, every image is preprocessed. Since images from the GTSRB dataset were used for this study, they were first reduced to a resolution of 32 × 32 pixels. The dataset was divided into three subsets for training, validation and testing, preserving the class proportions of the images. Further, the images were normalized by dividing them by 255 and
subtracting the mean image, which, in turn, was calculated from the training dataset. As a result, a dataset of 3-channel RGB images was prepared. The training subset contains 50000 images, the validation subset contains 4000 images and the testing subset contains 12000 images. The convolutional neural network is trained with batches of 50 examples at a time.
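The preprocessing steps above can be sketched as follows (a minimal illustration in which random arrays stand in for the GTSRB images; the array shapes match the paper's 3-channel 32 × 32 format, the pixel data does not):

```python
import numpy as np

# Random arrays stand in for the GTSRB images (values 0..255).
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.float64)
test = rng.integers(0, 256, size=(20, 32, 32, 3)).astype(np.float64)

# Normalize to [0, 1], then subtract the mean image computed from the
# training subset only; the same mean is reused for the other subsets.
train /= 255.0
test /= 255.0
mean_image = train.mean(axis=0)
train -= mean_image
test -= mean_image

print(train.shape, abs(train.mean()) < 1e-9)  # centered training data
```

Computing the mean image on the training subset only, and reusing it for validation and test data, keeps the held-out subsets untouched by training statistics.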
The architecture of the convolutional neural network is the same in all experiments; only the dimension of the convolutional layer filters changes. The architecture under study is shown in Fig. 1.
The three-channel RGB input image is fed to the convolutional layer, which consists of 32 filters. Since the input image has 3 channels, every filter of the convolutional layer also has 3 channels. As a result of the convolution, 32 feature maps are computed, in accordance with the number of convolutional layer filters. The ReLU (Rectified Linear Unit) activation function is applied to the resulting feature maps, replacing negative values with zeros [14, 15]. This is followed by a dimension reduction layer (also known as a pooling layer) and then a hidden fully connected layer with 500 neurons. The output layer consists of 43 neurons, in accordance with the number of traffic sign classes in the GTSRB dataset.
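The layer sizes implied by this architecture can be traced with a small helper (a sketch, assuming stride-1 convolution with the zero-padding frame of (d − 1)/2 pixels described later in the paper, so the feature maps keep their 32 × 32 size):

```python
import numpy as np

def layer_shapes(d, in_size=32, n_filters=32, hidden=500, n_classes=43):
    """Trace the layer sizes of the architecture in Fig. 1 for a filter
    dimension d, assuming stride-1 convolution with a zero-padding
    frame of (d - 1) / 2 pixels and 2x2 max pooling with stride 2."""
    pad = (d - 1) // 2
    conv_out = in_size + 2 * pad - d + 1     # feature-map side length
    pool_out = conv_out // 2                 # after 2x2 max pooling
    flat = n_filters * pool_out * pool_out   # input size of the FC layer
    return conv_out, pool_out, flat, hidden, n_classes

# With this padding rule the feature maps stay 32x32 for every odd d
print(layer_shapes(9))
```

This makes the design choice visible: because the padding grows with the filter dimension, all nine models feed the same 32 × 16 × 16-shaped tensor into the fully connected part, so only the convolution itself differs between experiments.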
Parameters of the developed convolutional neural network are described in Table 1. As can be seen from Table 1, the loss function in this study is the negative log-likelihood function. The cost function is defined as the average of the loss functions over the current training batch. Training the convolutional neural network consists in minimizing the cost function by gradient descent, with gradients computed by the backpropagation method.
Fig. 1. Architecture of convolutional neural network
Table 1. Parameters of the developed convolutional neural network
Parameter Description
Weights Initialization He Normal
Weights Update Policy Adam
Activation Function ReLU
Pooling 2 × 2 Max
Loss Function Negative log-likelihood
Cost Function Average of Loss Functions
Stride for Convolutional Layer 1
Stride for Pooling Layer 2
The negative log-likelihood function is described by the following equation:
L(r, y) = −[y·ln r + (1 − y)·ln(1 − r)],
where r is the probability obtained with the convolutional neural network for each of the 43 classes; y is the true probability for each of the 43 classes.
The cost function is described by the following equation:
J(w, b) = (1/m) Σ_{i=1}^{m} L(r⁽ⁱ⁾, y⁽ⁱ⁾),
where w, b are the weights and biases of the output fully connected layer; m is the number of examples in the current batch; r⁽ⁱ⁾, y⁽ⁱ⁾ are the predicted and true probabilities for the i-th example.
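A minimal numpy sketch of this loss and the batch-averaged cost (an illustration with 3 classes instead of 43; `nll_loss` and `cost` are hypothetical names, not the authors' code):

```python
import numpy as np

def nll_loss(r, y, eps=1e-12):
    """Negative log-likelihood of predicted class probabilities r
    against one-hot true probabilities y, per the equation above."""
    r = np.clip(r, eps, 1.0 - eps)                 # guard against log(0)
    return -np.sum(y * np.log(r) + (1.0 - y) * np.log(1.0 - r))

def cost(R, Y):
    """Average of the per-example losses over the current batch."""
    return float(np.mean([nll_loss(r, y) for r, y in zip(R, Y)]))

# Toy batch with 3 classes instead of 43
R = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # predicted
Y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # true (one-hot)
print(round(cost(R, Y), 4))  # lower cost for better predictions
```

The closer the predicted probability of the true class is to 1 (and of the other classes to 0), the smaller the loss, which is what gradient descent minimizes during training.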
There is another important parameter that is directly related to the processing of the input image boundaries. This parameter is a zero frame (also called a zero-padding frame) created around the input image before it is sent to the convolutional layer. In this study, this parameter depends linearly on the dimension of the convolutional layer filters and is calculated by the following equation:
p = (d − 1) / 2,
where d is the dimension of the convolutional layer filters and p is the width of the zero-padding frame in pixels.
Since this study analyzes the dependence of traffic sign classification accuracy and rate on the dimension of the convolutional layer filters, the zero-padding parameter is very important. For example, convolutional layer filters of dimension 9 × 9 process an input image with a zero-padding frame of 4 pixels, that is, the input image is enlarged from 32 × 32 to 40 × 40 pixels. Consequently, the image border, namely the extreme pixels of the input image (rows and columns 32, 31, 30, etc.), is processed by the filters to an additional depth of 4 pixels. As the dimension of the convolutional layer filters gradually increases, the size of the zero-padding frame and the processing depth of the image borders increase as well. This processing makes it possible not to miss data located on the border of the image, especially in cases where important information in the image has been cropped.
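The zero-padding frame can be sketched with `numpy.pad` (an illustration, assuming a frame width of (d − 1)/2 pixels per side, which matches the stated range of 1 to 15 pixels for filter dimensions 3 to 31):

```python
import numpy as np

def pad_for_filter(image, d):
    """Add a zero-padding frame of (d - 1) / 2 pixels around a 2D
    image for a filter of dimension d."""
    p = (d - 1) // 2
    return np.pad(image, p, mode="constant"), p

img = np.ones((32, 32))
padded, p = pad_for_filter(img, 9)
print(p, padded.shape)  # 4 (40, 40)
```

For a 9 × 9 filter the frame is 4 pixels wide, so the padded image is 40 × 40 and a stride-1 "valid" convolution over it returns a 32 × 32 feature map, preserving the input size.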
Experimental results
In this study, the convolutional neural network is trained with various dimensions of the convolutional layer filters, namely 3 × 3, 5 × 5, 9 × 9, 13 × 13, 15 × 15, 19 × 19, 23 × 23, 25 × 25 and 31 × 31. Consequently, 9 models in total are trained with the same architecture but different filter dimensions. Each model is trained on the preprocessed GTSRB training dataset with 50000 examples of traffic signs using Python 3 and the pure "numpy" library. The training process for every model with its own dimension of the convolutional layer filters consists of 9000 iterations divided into 5 epochs. The training dataset is divided into small batches of 50 examples that are fed into the convolutional neural network simultaneously. At the end of every epoch the accuracy is calculated on the training dataset and on the validation dataset; for this, one thousand examples are randomly taken from the training and validation datasets, respectively. At the end of the first epoch all current model parameters are written to a file. If the validation accuracy in a subsequent epoch is higher than in the previous one, the parameters are updated. In this way, after the training process is finished, every model has the best parameters according to validation accuracy. Fig. 2 and Fig. 3 compare the accuracy data in the training process.
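The keep-best checkpointing described above can be sketched as follows (a toy illustration with made-up validation accuracies; in the paper the parameters themselves are written to a file, here the `saved_at` list merely records when a save would happen):

```python
# Keep-best checkpointing across epochs: parameters are saved after the
# first epoch and overwritten only when validation accuracy improves.
best_val = 0.0
saved_at = []
for epoch, val_acc in enumerate([0.71, 0.80, 0.78, 0.84, 0.83], start=1):
    # val_acc would come from evaluating the model on 1000 random
    # validation examples, as described above
    if val_acc > best_val:
        best_val = val_acc
        saved_at.append(epoch)   # in the paper: write parameters to file
print(best_val, saved_at)
```

With this scheme a late-epoch drop in validation accuracy (such as the final epoch above) cannot overwrite a better earlier checkpoint.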
After the 9 models are trained, the accuracy is checked on the testing dataset. This dataset consists of 12000 images that did not participate in the training process. For this operation, every model is loaded with its own best parameters found during training from the saved file. Summary results are shown in Table 2.
The testing process is as follows. The 12000 images are fed to the input of the developed and trained convolutional neural network, and the result is written to a vector. This vector consists of 12000 class numbers of traffic signs classified by the convolutional neural network. The obtained classes are then compared with the true classes, and the result is converted into an accuracy value between 0 and 1. The described process is applied to all 9 models with their own filter dimensions.
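The described comparison of predicted and true classes reduces to a single mean over the prediction vector (a sketch with 8 made-up class numbers instead of the 12000 test images):

```python
import numpy as np

# Hypothetical predicted and true class numbers (0..42); the real test
# vector has 12000 entries, 8 are used here for brevity.
predicted = np.array([0, 5, 12, 12, 38, 2, 7, 41])
true_cls = np.array([0, 5, 12, 13, 38, 2, 9, 41])

accuracy = float(np.mean(predicted == true_cls))
print(accuracy)  # 0.75: six of the eight classes match
```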
As is clear from Table 2, the best testing accuracy is obtained by the model with a convolutional layer filter dimension of 19 × 19 pixels, with the 9 × 9 model a close second.
Convolutional layer filters can be visualized to see the change from the initial state, when they are initialized randomly, to the final state, when the training process is completed. Fig. 4 and Fig. 5 compare the initialized and trained filters for the model with a filter dimension of 19 × 19.
Fig. 4 shows that the initialized filters are a chaotic set of pixels of different colours. After training, the filters have acquired specific characteristics in the form of lines, curves, waves, dots, etc. These specific patterns are searched for in the input image and, where they are found, a maximum response is produced and written to the appropriate place of the feature map.
Fig. 2. Training accuracy of models with different dimension of convolutional layer filters
Fig. 3. Validation accuracy of models with different dimension of convolutional layer filters
Table 2. Summarized results for accuracy of every model
Model Training Accuracy Validation Accuracy Testing Accuracy
31 × 31 0.965 0.83 0.843
25 × 25 0.957 0.846 0.851
23 × 23 0.95 0.843 0.846
19 × 19 0.963 0.867 0.868
15 × 15 0.967 0.863 0.86
13 × 13 0.955 0.85 0.854
9 × 9 0.963 0.868 0.864
5 × 5 0.961 0.849 0.848
3 × 3 0.931 0.805 0.828
In addition to the accuracy, the image classification rate for each of the 9 models with their own filter
dimensions is also estimated. Experimental results are presented in Table 3.
Table 3. The rate of image classification of every model
Model Classification Time, s
31 × 31 0.005419
25 × 25 0.003651
23 × 23 0.004366
19 × 19 0.002786
15 × 15 0.003181
13 × 13 0.002364
9 × 9 0.004472
5 × 5 0.001879
3 × 3 0.002046
The experiments were carried out on a 64-bit laptop with a 4-core i3 processor and 4 GB of RAM. As is clear from Table 3, the highest classification rate is shown by the model with convolutional layer filters of dimension 5 × 5 pixels.
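The average classification time per image can be measured as in the following sketch (the `classify` function is a dummy stand-in for a trained model's forward pass, and the measured timings will of course differ from those in Table 3):

```python
import time
import numpy as np

def classify(image):
    """Dummy stand-in for a trained model's forward pass: it just picks
    the channel with the largest total intensity."""
    return int(np.argmax(image.sum(axis=(0, 1))))

images = [np.random.rand(32, 32, 3) for _ in range(100)]
start = time.perf_counter()
predictions = [classify(img) for img in images]
avg_time = (time.perf_counter() - start) / len(images)
print(f"average classification time: {avg_time:.6f} s")
```

Averaging over many images smooths out per-call timer jitter, which matters when the per-image times are in the millisecond range, as in Table 3.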
Conclusion
This paper studies the implementation of a traffic sign classification algorithm based on a convolutional neural network. The main contribution is an analysis of the effect of the convolutional layer filter dimension on the accuracy and rate of traffic sign classification. The efficiency of the developed algorithm is evaluated on the GTSRB dataset.
Experimental results show that convolutional layer filters with dimensions of 9 × 9 and 19 × 19 give the best accuracy on the testing dataset, 0.864 and 0.868 respectively. Convolutional layer filters of dimension 5 × 5 give the best classification rate. At the same time, the classification time with 9 × 9 and 19 × 19 filters is 0.004472 and 0.002786 seconds respectively, which still enables their usage in real-time applications.
In future studies, we plan to research the effect of the number of convolutional layers on classification accuracy. In addition, we plan to use convolutional neural networks not only for classification, but also for detection of traffic signs.
Fig. 4. Initialized filters for the 19 × 19 model
Fig. 5. Trained filters for the 19 × 19 model
References
1. Balali V., Ashouri Rad A., Golparvar-Fard M. Detection,
classification, and mapping of U.S. traffic signs using google
street view images for roadway inventory management.
Visualization in Engineering, 2015, vol. 3, no. 1. doi:
10.1186/s40327-015-0027-1
2. Lu Y., Lu J., Zhang S., Hall P. Traffic signal detection and
classification in street views using an attention model.
Computational Visual Media, 2018, vol. 4, no. 3, pp. 253–266.
doi: 10.1007/s41095-018-0116-x
3. Balali V., Golparvar-Fard M. Segmentation and recognition of
roadway assets from car-mounted camera video streams
using a scalable non-parametric image parsing method.
Automation in Construction, 2016, vol. 49, pp. 27–39. doi:
10.1016/j.autcon.2014.09.007
4. Khalilikhah M., Heaslip K. The effects of damage on sign
visibility: an assist in traffic sign replacement. Journal of Traffic
and Transportation Engineering, 2016, vol. 3, no. 6, pp. 571–
581. doi: 10.1016/j.jtte.2016.03.009
5. Kryvinska N., Poniszewska-Maranda A., Gregus M. An approach
towards service system building for road traffic signs detection
and recognition. Procedia Computer Science, 2018, vol. 141,
pp. 64–71. doi: 10.1016/j.procs.2018.10.150
6. Khalilikhah M., Heaslip K. Analysis of factors temporarily
impacting traffic sign readability. International Journal of
Transportation Science and Technology, 2016, vol. 5, no. 2,
pp. 60–67. doi: 10.1016/j.ijtst.2016.09.003
7. Shustanov A., Yakimov P. CNN design for real-time traffic sign
recognition. Procedia Engineering, 2017, vol. 201, pp. 718–725.
doi: 10.1016/j.proeng.2017.09.594
8. Indolia S., Kumar Goswami A., Mishra S.P., Asopa P. Conceptual understanding of convolutional neural network – a deep learning approach. Procedia Computer Science, 2018, vol. 132, pp. 679–688. doi: 10.1016/j.procs.2018.05.069
9. Ozturk S., Akdemir B. Effects of histopathological image
pre-processing on convolutional neural networks. Procedia
Computer Science, 2018, vol. 132, pp. 396–403. doi:
10.1016/j.procs.2018.05.166
10. Kurniawan J., Syahra S.G.S., Dewa C.K., Afiahayati. Traffic
congestion detection: learning from CCTV monitoring images
using convolutional neural network. Procedia Computer Science,
2018, vol. 144, pp. 291–297. doi: 10.1016/j.procs.2018.10.530
11. Aghdam H.H., Heravi E.J., Puig D. A practical approach for
detection and classification of traffic signs using Convolutional
Neural Networks. Robotics and Autonomous Systems, 2016,
vol. 84, pp. 97–112. doi: 10.1016/j.robot.2016.07.003
12. Stallkamp J., Schlipsing M., Salmen J., Igel C. The
German traffic sign recognition benchmark: a multi-class
classification competition. Proc. Int. Joint Conference on
Neural Networks. San Jose, USA, 2011, pp. 1453–1460. doi:
10.1109/IJCNN.2011.6033395
13. Houben S., Stallkamp J., Salmen J., Schlipsing M., Igel C. Detection of traffic signs in real-world images: the German traffic sign detection benchmark. Proc. Int. Joint Conference on Neural Networks. Dallas, USA, 2013, pp. 1–8. doi: 10.1109/IJCNN.2013.6706807
14. Eckle K., Schmidt-Hieber J. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Networks, 2019, vol. 110, pp. 232–242. doi: 10.1016/j.neunet.2018.11.005
15. Lin G., Shen W. Research on convolutional neural network based on improved Relu piecewise activation function. Procedia Computer Science, 2018, vol. 131, pp. 977–984. doi: 10.1016/j.procs.2018.04.239
Authors
Valentyn N. Sichkar — postgraduate, software engineer, ITMO
University, Saint Petersburg, 197101, Russian Federation, ORCID
ID: 0000-0001-9825-0881, vsichkar@itmo.ru
Sergey A. Kolyubin — PhD, Associate Professor, ITMO
University, Saint Petersburg, 197101, Russian Federation,
Scopus ID: 35303066700, ORCID ID: 0000-0002-8057-1959,
s.kolyubin@itmo.ru
... We adopted a detection strategy hinging on pre-trained YOLOv3 and CNN models [27]. This approach, introduced in 2020 and documented by two pivotal articles [28,29], has consistently yielded robust outcomes. The architecture of this approach is shown in Figure 7. ...
Article
Full-text available
In the field of smart mobility, Artificial Intelligence (AI) approaches are influential and can make a highly beneficial contribution. Our project aims to develop a real-time ecological map of road traffic. This map will allow electric vehicles (EVs) and thermal vehicles (TVs) to display the cost of energy consumption and CO2 emissions on different road sections. In urban environments, road traffic emissions are a significant contributor to environmental pollution, with vehicle emissions being a major component. Addressing these impacts requires a thorough understanding of the operational behavior of vehicles on different road infrastructures within the region. This paper presents a novel, comprehensive dataset, the Vehicle Activity Dataset (VAD), designed to assess the emissions and fuel consumption characteristics of vehicles about their actual operating environment. Constructed from a large number of real-world driving scenarios, VAD incorporates emission data collected by an industrial Portable Emission Measurement System (PEMS), road scenes captured by an RGB camera, and the detection of different object classes within these images. The primary objective of VAD is to provide a comprehensive understanding of the relationship between vehicle emissions and the diverse range of objects present on the road. Experimental results in real road traffic environments through different studies demonstrate the robustness of the developed dataset.
... In Sichkar and Kolyubin (2019) [36], a study was conducted to explore the impact of different dimensions of convolutional layer filters on the performance of a CNN for traffic sign classification. The dimensions considered in the experiment were 3, 5, 9, 13, 15, 19, 23, 25, and 31. ...
Article
Full-text available
Autonomous vehicles have become a topic of interest in recent times due to the rapid advancement of automobile and computer vision technology. The ability of autonomous vehicles to drive safely and efficiently relies heavily on their ability to accurately recognize traffic signs. This makes traffic sign recognition a critical component of autonomous driving systems. To address this challenge, researchers have been exploring various approaches to traffic sign recognition, including machine learning and deep learning. Despite these efforts, the variability of traffic signs across different geographical regions, complex background scenes, and changes in illumination still poses significant challenges to the development of reliable traffic sign recognition systems. This paper provides a comprehensive overview of the latest advancements in the field of traffic sign recognition, covering various key areas, including preprocessing techniques, feature extraction methods, classification techniques, datasets, and performance evaluation. The paper also delves into the commonly used traffic sign recognition datasets and their associated challenges. Additionally, this paper sheds light on the limitations and future research prospects of traffic sign recognition.
... The participants achieved a very high performance of up to 98.98% correct recognition rate, which is similar to human performance on this dataset. 2) : [6] Valentyn Sichkar, Sergey A. Kolyubin, "Effect of various dimension convolutional layer filters on traffic sign classification accuracy": This paper proposes a classification model for traffic sign detection together with carefully chosen evaluation metrics and baseline results. In their evaluation, they separate sign detection from classification, and also measure the performance on relevant categories of signs to allow for benchmarking of specialized solutions. ...
Article
Full-text available
This paper presents an effective solution to detecting traffic signs on the road by first classifying traffic sign images using a Convolutional Neural Network (CNN) on the German Traffic Sign Recognition Benchmark (GTSRB) [1] and then detecting images of Indian traffic signs using the Indian dataset, which is used as the testing dataset while building the classification model. This system therefore helps electric cars or self-driving cars to recognise traffic signs efficiently and correctly. The system involves two parts: detection of traffic signs from the environment, and classification based on CNN, thereby recognising the traffic sign. The classification involves building CNN models with filters of dimensions 3 × 3, 5 × 5, 9 × 9, 13 × 13, 15 × 15, 19 × 19, 23 × 23, 25 × 25 and 31 × 31, from which the most efficient filter is chosen for further classifying the detected image. The detection involves detecting the traffic sign using YOLO v3-v4 and BLOB detection. Transfer learning is used to apply the trained model to detecting Indian traffic sign images.
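The filter-size comparison described above can be illustrated with a minimal numpy sketch (an assumption for illustration, not code from any of the cited papers): a naive "same"-padded 2-D convolution applied to a 32 × 32 image for each studied filter dimension. It also shows why the border-processing depth grows from 1 to 15 pixels as the filter grows from 3 × 3 to 31 × 31: the padding needed to preserve the image size is (k - 1) / 2.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 'same' 2-D convolution: pads the border by (k - 1) // 2 pixels,
    which is why the border-processing depth grows with the filter size."""
    k = kernel.shape[0]
    pad = (k - 1) // 2
    padded = np.pad(image, pad, mode="constant")
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

image = np.random.rand(32, 32)          # GTSRB images resized to 32 x 32
for k in (3, 5, 9, 13, 15, 19, 23, 25, 31):
    kernel = np.random.rand(k, k)       # random weights, for shape checking only
    fmap = conv2d_same(image, kernel)
    print(k, (k - 1) // 2, fmap.shape)  # filter size, border depth 1..15, map shape
```

With "same" padding every filter size yields a 32 × 32 feature map; the printed border depths run 1, 2, 4, 6, 7, 9, 11, 12, 15, matching the 1-to-15-pixel range stated in the original paper's abstract.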
... We compare speed and accuracy of classification to the 1-NN classifier on the medium-size road signs dataset [30] with 43 classes (images are resized to 256-dimensional representation, 10% test set) and two large binary classification datasets HIGGS (1.1 × 10^7 items) and SUSY (5 × 10^6 items) [4] (5% test set). To measure detailed speedup statistics we used another medium-size Cover Type dataset [7]. ...
Preprint
Full-text available
K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and excellent empirical performance. However, large-scale datasets are a big problem for building fast and compact neighbourhood-based classifiers. This work targets the design and implementation of a classification algorithm to be used with efficient data structures, which would allow us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach which uses navigable small world (NSW) proximity graph representation of large-scale datasets. Our approach shows 2-4 times classification speedup for both average and 99th percentile time compared to the 1-NN method with asymptotically close classification accuracy. We observe two orders of magnitude better classification time in cases when the method uses swap memory. We show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.
Article
Full-text available
K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets are a big problem for building fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which would allow us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows 2–4 times classification speedup for both average and 99th percentile time with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases when method uses swap memory. We show that NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.
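The graph-based nearest-neighbour classification idea above can be sketched in a few lines of numpy. This is a deliberately simplified illustration, not the NSW construction from the paper: the proximity graph is built by brute force (each node linked to its m nearest neighbours plus a couple of random long-range shortcuts standing in for NSW's navigable links), and a query is classified by greedy descent on the graph instead of exhaustive 1-NN search.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: two Gaussian blobs, labels 0 and 1.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
n = len(X)

# Crude proximity graph: m nearest neighbours per node (brute force here;
# NSW builds this incrementally) plus random long-range shortcut links.
m = 8
dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
short = np.argsort(dists, axis=1)[:, 1:m + 1]
long_links = rng.integers(0, n, size=(n, 2))  # NSW-style shortcuts (simplified)
graph = np.hstack([short, long_links])

def greedy_1nn(q, start=0):
    """Greedy descent on the graph: hop to any neighbour closer to the
    query until no neighbour improves, then return that node's label."""
    cur = start
    cur_d = np.linalg.norm(X[cur] - q)
    improved = True
    while improved:
        improved = False
        for nb in graph[cur]:
            d = np.linalg.norm(X[nb] - q)
            if d < cur_d:
                cur, cur_d, improved = nb, d, True
    return y[cur]

print(greedy_1nn(np.array([5.0, 5.0]), start=250))  # greedy stays in the query's blob
print(greedy_1nn(np.array([0.0, 0.0]), start=5))
```

Each hop only needs m + 2 distance computations instead of n, which is the source of the speedup the paper reports; the real NSW graph additionally guarantees good navigability from arbitrary entry points.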
Article
Full-text available
The issue of effective detection and classification of various traffic signs is studied. A two-stage method is proposed for creation of a holistic model with an end-to-end solution. The first stage includes effective localization of traffic signs by the YOLO version 3 algorithm (You Only Look Once). At the first stage the traffic signs are grouped into four categories according to their shapes. At the second stage, an accurate classification of the located traffic signs is performed into one of forty-three predefined categories. The second stage is based on another model with one convolutional neural layer. The model for detection of traffic signs was trained on the German Traffic Sign Detection Benchmark (GTSDB) with 630 and 111 RGB images for training and validation, respectively. The classification model was trained on the German Traffic Sign Recognition Benchmark (GTSRB) with 66000 RGB images using the pure "numpy" library with 19 × 19 convolutional layer filters and reached 0.868 accuracy on the testing dataset. The experimental results illustrated that training the first model's deep network with only four categories for locating traffic signs produced high mAP (mean Average Precision) accuracy reaching 97.22 %. The additional convolutional layer of the second model applied for final classification creates an efficient entire system. Experiments on processing video files demonstrated frames per second (FPS) between thirty-six and sixty-one, which makes the system feasible for real-time applications. The frame rate depended on the number of traffic signs to be detected and classified in each frame, which ranged from six to one.
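The second-stage classifier described above (one convolutional layer implemented in pure numpy) can be sketched as a forward pass. The number of filters and the classifier weights below are assumptions for illustration; the abstract only states one convolutional layer, 19 × 19 filters, 32 × 32-style inputs and 43 output classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical parameter shapes; only the 19 x 19 filter size and the
# 43 GTSRB classes come from the abstract.
n_filters, k, n_classes = 8, 19, 43
filters = rng.standard_normal((n_filters, k, k)) * 0.01
W = rng.standard_normal((n_classes, n_filters * 14 * 14)) * 0.01

def forward(image):
    """Valid convolution (32 - 19 + 1 = 14), ReLU, flatten, linear, softmax."""
    h = 32 - k + 1
    maps = np.empty((n_filters, h, h))
    for f in range(n_filters):
        for i in range(h):
            for j in range(h):
                maps[f, i, j] = np.sum(image[i:i + k, j:j + k] * filters[f])
    return softmax(W @ relu(maps).ravel())

probs = forward(rng.random((32, 32)))
print(probs.shape, probs.sum())  # a 43-class probability vector summing to 1
```

With a large 19 × 19 filter on a 32 × 32 input, a single valid convolution already reduces the spatial extent to 14 × 14, which is one reason a one-layer design can stay fast enough for the real-time frame rates reported above.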
Article
Full-text available
In this paper, we present an intelligent traffic congestion detection method using an image classification approach on CCTV camera image feeds. We use a deep learning architecture, the convolutional neural network (CNN), currently the state of the art for image processing. We apply only minimal image preprocessing steps to small images, whereas conventional methods require high-quality, manually computed handcrafted features. The CNN model is trained to perform binary classification of road traffic conditions using 1000 CCTV monitoring image feeds with a balanced distribution. The results show that a CNN with a simple, basic architecture trained on small grayscale images has an average classification accuracy of 89.50%.
Article
Full-text available
Detecting small objects is a challenging task. We focus on a special case: the detection and classification of traffic signals in street views. We present a novel framework that utilizes a visual attention model to make detection more efficient, without loss of accuracy, and which generalizes. The attention model is designed to generate a small set of candidate regions at a suitable scale so that small targets can be better located and classified. In order to evaluate our method in the context of traffic signal detection, we have built a traffic light benchmark with over 15,000 traffic light instances, based on Tencent street view panoramas. We have tested our method both on the dataset we have built and the Tsinghua–Tencent 100K (TT100K) traffic sign benchmark. Experiments show that our method has superior detection performance and is quicker than the general faster RCNN object detection framework on both datasets. It is competitive with state-of-the-art specialist traffic sign detectors on TT100K, but is an order of magnitude faster. To show generality, we tested it on the LISA dataset without tuning, and obtained an average precision in excess of 90%.
Article
Full-text available
Deep learning has become an area of interest to researchers in the past few years. The Convolutional Neural Network (CNN) is a deep learning approach that is widely used for solving complex problems. It overcomes the limitations of traditional machine learning approaches. The motivation of this study is to provide knowledge and understanding of various aspects of CNNs. This study provides a conceptual understanding of the CNN along with its three most common architectures and learning algorithms. It will help researchers to gain a broad comprehension of CNNs and motivate them to venture into this field, and will serve as a resource and quick reference for those interested in it.
Conference Paper
Full-text available
In this study, the classification performance of histopathological images processed by pre-processing algorithms using a convolutional neural network structure is examined. The images are divided into four different pre-processing classes: original, normal pre-processing, other normal pre-processing and over pre-processing. Histopathological images of these four classes include cancerous and non-cancerous image patches. For these image classes, cancer patch classification is done using the same convolutional neural network structure. In this way, the effect of pre-processing on the classification success of the convolutional neural network is examined. For the normal pre-processing algorithm, background noise reduction and cell enhancement are applied. For over pre-processing, thresholding and morphological operations are applied in addition to the normal pre-processing operations. At the end of the experiments, the most successful classification results are produced with the normal pre-processing algorithms. This is because normal pre-processing leaves the meaningful features of the image for the CNN, which learns features automatically, whereas the over pre-processing algorithm removes most of these important features from the image.
Article
Full-text available
With the continuous development of deep learning, convolutional neural networks, with their excellent recognition performance, have obtained a series of major breakthrough results in target detection, image recognition and other fields. An improved piecewise-corrected ReLU activation function is proposed, and the traditional convolutional neural network is improved by adding a local response normalization layer, using max pooling, and so on. Based on Google's deep learning platform TensorFlow, the activation function is used to construct the modified convolutional neural network structure model, using the CIFAR-10 dataset as the neural network input for model training and evaluation. We analyze the effects of different neuron activation functions on the neural network's convergence speed and the accuracy of image recognition. The experimental results show that with the improved unsaturated nonlinear piecewise activation function SignReLu, the convergence rate is faster, the vanishing gradient problem is effectively alleviated, and the accuracy of neural network identification is noticeably improved.
Article
Full-text available
Deep neural networks (DNNs) generate much richer function spaces than shallow networks. Since the function spaces induced by shallow networks have several approximation-theoretic drawbacks, this does not, however, necessarily explain the success of deep networks. In this article we take another route by comparing the expressive power of DNNs with ReLU activation function to piecewise linear spline methods. We show that MARS (multivariate adaptive regression splines) is improperly learnable by DNNs in the sense that for any given function that can be expressed as a function in MARS with M parameters there exists a multilayer neural network with O(M log(M/ε)) parameters that approximates this function up to sup-norm error ε. We show a similar result for expansions with respect to the Faber-Schauder system. Based on this, we derive risk comparison inequalities that bound the statistical risk of fitting a neural network by the statistical risk of spline-based methods. This shows that deep networks perform better or only slightly worse than the considered spline methods. We provide a constructive proof for the function approximations.
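The connection between MARS and ReLU networks in the abstract above rests on a simple observation that can be checked numerically: each truncated-power basis function (x - t)_+ of a MARS expansion is exactly one ReLU unit, so in one dimension a single-hidden-layer ReLU network reproduces an M-term MARS model with zero error. The toy expansion below is an assumed example, not one from the article.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# A 1-D MARS-style expansion: f(x) = 2 + 3*(x - 1)_+ - 4*(x - 2.5)_+
# Each truncated-power basis (x - t)_+ is exactly one ReLU unit, so a
# single-hidden-layer ReLU network with M units reproduces an M-term
# 1-D MARS model exactly (a toy instance of the general result).
knots = np.array([1.0, 2.5])
coefs = np.array([3.0, -4.0])
intercept = 2.0

def mars(x):
    return intercept + coefs @ relu(x - knots)

def relu_net(x):
    # hidden layer: unit weights, biases -knots; output layer: coefs, bias intercept
    hidden = relu(1.0 * x + (-knots))
    return coefs @ hidden + intercept

xs = np.linspace(-1, 4, 11)
assert all(abs(mars(x) - relu_net(x)) < 1e-12 for x in xs)
print(mars(3.0))  # 2 + 3*2 - 4*0.5 = 6.0
```

The interesting content of the article lies in the multivariate case, where MARS uses products of such hinges and the network needs extra layers and O(M log(M/ε)) parameters to approximate them; the 1-D identity above is only the base case.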
Article
Full-text available
Nowadays, more and more object recognition tasks are being solved with Convolutional Neural Networks (CNNs). Due to their high recognition rate and fast execution, convolutional neural networks have enhanced most computer vision tasks, both existing and new. In this article, we propose an implementation of a traffic sign recognition algorithm using a convolutional neural network. The paper also presents several CNN architectures, which are compared to each other. Training of the neural network is implemented using the TensorFlow library and the massively parallel architecture for multithreaded programming, CUDA. The entire procedure for traffic sign detection and recognition is executed in real time on a mobile GPU. The experimental results confirmed the high efficiency of the developed computer vision system.
Article
Full-text available
Traffic sign readability can be affected by the existence of dirt on traffic sign faces. However, among damaged signs, dirty traffic signs are unique since their damage is not permanent and they can simply be cleaned instead of replaced. This study aimed to identify the most important factors contributing to traffic sign dirt. To do so, a large number of traffic signs in Utah were measured by deploying a vehicle instrumented with mobile LiDAR imaging and digital photolog technologies. Each individual daytime digital image was inspected for dirt. Location and climate observations obtained from official sources were compiled using ArcGIS throughout the process. To identify contributing factors to traffic sign dirt, the chi-square test was employed. To analyze the data and rank all of the factors by their importance to sign dirt, a random forests statistical model was utilized. After analyzing the data, it can be concluded that ground elevation, sign mount height, and air pollution had the highest effect on making traffic signs dirty. The findings of this investigation assist transportation agencies in determining traffic signs with a higher likelihood of sign dirt. In this way, agencies can schedule cleaning of such traffic signs more frequently.
Article
Automatic detection and classification of traffic signs is an important task in smart and autonomous cars. Convolutional Neural Networks have shown great success in classification of traffic signs and have surpassed human performance on a challenging dataset called the German Traffic Sign Benchmark. However, these ConvNets suffer from two important issues. They are not computationally suitable for real-time applications in practice. Moreover, they cannot be used for detecting traffic signs for the same reason. In this paper, we propose a lightweight and accurate ConvNet for detecting traffic signs and explain how to implement the sliding window technique within the ConvNet using dilated convolutions. Then, we further optimize our previously proposed real-time ConvNet for the task of traffic sign classification and make it faster and more accurate. Our experiments on the German Traffic Sign Benchmark datasets show that the detection ConvNet locates the traffic signs with average precision equal to . Using our sliding window implementation, it is possible to process 37.72 high-resolution images per second in a multi-scale fashion and locate traffic signs. Moreover, the single ConvNet proposed for the task of classification is able to correctly classify of the test samples. Finally, our stability analysis reveals that the ConvNet is tolerant against Gaussian noise when .