Residual Net for Car Detection with spatial transformation
Zabir Al Nazi
MazeGeek, Inc.
Dhaka, Bangladesh
zabir@vinndo.com
Mamunur Rahaman Mamun
MazeGeek, Inc.
Dhaka, Bangladesh
mamunur@vinndo.com
Abidur Rahman Mallik
MazeGeek, Inc.
Dhaka, Bangladesh
abid@mazegeek.com
Tanmoy Tapos Datta
MazeGeek, Inc.
Dhaka, Bangladesh
tanmoy@vinndo.com
Mohammad Samawat Ullah
American International
University-Bangladesh
Dhaka, Bangladesh
samawat@aiub.edu
ABSTRACT
In spite of the immense success of deep neural networks in classification tasks, it is challenging to extend their use to industrial applications. The difficulty stems from the variation in data sources and in the distribution of data. For image classification and detection, it is important to design models that are less sensitive to transformation shifts. In this work, we propose Ladon-Net, a CNN that trains on multiple spatial transformations of each input instance. The model is built on a residual CNN and is trained on both the original training samples and generated augmented samples. It was compared with other residual architectures and showed competitive performance. In the future, the model can be extended with attention to better visualize the strongest features.
CCS CONCEPTS
• Computing methodologies; • Machine learning; • Machine learning approaches; • Neural networks;
KEYWORDS
neural networks, car detection, classification, residual networks
ACM Reference Format:
Zabir Al Nazi, Mamunur Rahaman Mamun, Abidur Rahman Mallik, Tanmoy Tapos Datta, and Mohammad Samawat Ullah. 2020. Residual Net for Car Detection with spatial transformation. In Proceedings of ICCA 2020 - International Conference on Computing Advancements (ICCA 2020). ACM, Dhaka, Bangladesh, 4 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
In recent years, deep learning has shown promising performance in domains ranging from biomedical applications to power systems [10, 11, 13, 15]. Deep convolutional neural networks (DCNNs) are widely applied in computer vision tasks. Even though a DCNN performs well when trained with an optimized hyperparameter set and regularizer, it needs a significant amount of data to train. Augmentation and transfer learning can help in this regard [1, 7, 18]. However, it is still challenging to train a model that is robust enough to be transformation aware, which is a highly desired characteristic of a learning algorithm in real-life scenarios. In this work, we design a spatial transformation scheme that is applied to the input images before the convolution operations. The new data points are mixed with augmented data, and the resulting batches are used to train the proposed Ladon-Net, which uses multiple spatial transformations to learn a better geometric representation of each data instance. Because transformations are applied automatically to each data point, the model can learn a better representation of the data and requires less data to train, which is another advantage.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICCA 2020, 10-12th of January 2020, Dhaka, Bangladesh
©2020 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
2 RELATED WORKS
Deformable part models (DPMs) are used for object detection in cluttered images; they reduce the problem to a classification task with latent variables. A DPM uses local part templates and geometric constraints on the locations of the parts to represent each object [3]. A two-stage DPM-based model has been proposed for car detection [17]. The model is based on a composite feature set (DPM/CF) and can localize cars of different types and orientations. The first stage performs HOG template matching, while the second stage works on the ROI obtained from the first stage and uses part detectors based on HOG and CNN features for validation. A latent logistic regression is used to optimize the location, window size, and features. CNNs are conventionally used for car classification, and cascade methods and feature fusion can improve the performance [6, 8, 19]. A car detection system based on infrared (IR) images has also been proposed, which uses the spatio-temporal evolution of car temperature with a ConvNet to localize cars [2].
3 METHODOLOGY
We develop a car classification system designed to generalize better. In real-life scenarios, every learning algorithm must adapt to new instances. Our system consists of two modules: the generator module and Ladon-Net. The generator module assembles data points by mixing data from many sources. A small set of old data is kept and passed through the augmentation scheme, the incoming car images are combined with these generated samples, and the training batch is then formed. Ladon-Net generates four spatial transformations per sample to better capture geometric invariance. The overall methodology for training Ladon-Net is shown in Fig 1.
Figure 1: Training of Ladon-Net
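The generator module is described only at a high level. The NumPy sketch below shows one possible batch-assembly step under stated assumptions: the retained old data fills a fixed fraction of each batch (the 25% default is a placeholder of ours, not a value from the paper), only the old samples are augmented, and the batch is shuffled before use. The function `make_training_batch` and its parameters are hypothetical; the batch size of 32 follows Table 1.

```python
import numpy as np


def make_training_batch(new_x, new_y, old_x, old_y, augment,
                        batch_size=32, old_fraction=0.25, rng=None):
    """Hypothetical generator step: mix incoming images with augmented old data.

    new_x/old_x: arrays of shape (N, H, W, C); new_y/old_y: matching labels.
    augment: callable mapping one image to an augmented image of the same shape.
    old_fraction: ASSUMED share of each batch drawn from the retained old set.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_old = int(round(batch_size * old_fraction))
    n_new = batch_size - n_old
    # Sample with replacement only when a pool is smaller than its quota.
    new_idx = rng.choice(len(new_x), size=n_new, replace=len(new_x) < n_new)
    old_idx = rng.choice(len(old_x), size=n_old, replace=len(old_x) < n_old)
    # Incoming images are used as-is; retained old samples are augmented.
    xs = [new_x[i] for i in new_idx] + [augment(old_x[i]) for i in old_idx]
    ys = [new_y[i] for i in new_idx] + [old_y[i] for i in old_idx]
    # Shuffle so old and new samples are interleaved within the batch.
    order = rng.permutation(batch_size)
    return np.stack(xs)[order], np.asarray(ys)[order]
```

Any per-image augmentation can be passed in, for example a horizontal flip `lambda im: im[:, ::-1, :]`, which is one of the augmentations listed in Table 1.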
3.1 Spark Augmentation
Commercial images often have watermarks and impaired regions. To address this, we propose an augmentation technique called 'spark', in which random regions of an image are erased (set to white). This works as a regularization scheme and helps the model learn a better representation of the training objects. The algorithm for generating the effect in an 8-bit image is given in Algorithm 1. Some samples generated by the algorithm are shown in Fig 2.
4 RESULT ANALYSIS
In our experiments, we used the Stanford Cars and Tiny ImageNet datasets, and we extended the dataset with car images scraped from different sources. For the positive class, 22307 images were used in total: 16185 images from Stanford Cars and 7200 from Tiny ImageNet. For the negative class, 42918 images were used [5, 12, 16]. We experimented with three different residual architectures:
• ResNet18
• Xception
• ResNet50
ResNet18 was trained on a balanced dataset without pretrained weights or data augmentation. It achieved 100% accuracy on the validation split, but it failed to predict some tricky images that were manually picked to probe the generalization of the model [4, 9].
Xception was trained on the unbalanced dataset and achieved similar accuracy on the validation split. We initialized the model with weights pre-trained on the ImageNet dataset. It did slightly better on the manually picked images but still failed to recognize all of them.
ResNet50 was trained in the same manner as Xception. It achieved similar accuracy on the validation set and did much better on the manual test, but it still failed to predict some images in which cars were located in unusual orientations.

Algorithm 1: Algorithm for applying spark augmentation in images
function SparkAug(I, r, s, d, ks, bw)
Input: image I, spark center radius r, spark strength s, spark density d, kernel speed ks, blob width bw
Output: augmented image I′
    h, w, c = I.dim                      // height, width, channel
    // randomInRange(a, b) returns a random integer from [a, b]
    Cx = w/2 + randomInRange(−r, r)
    Cy = h/2 + randomInRange(−r, r)
    Sp = [(0, ks), (ks, ks), (ks, 0), (0, −ks), (−ks, −ks), (−ks, 0)]
    Csd = randomInRange(d − ks, d + ks)
    while Csd > 0 do
        Csd = Csd − 1
        Cksi = randomInRange(0, Sp.len − 1)
        Cks = Sp[Cksi]
        fluc = randomInRange(−ks/√2, ks/√2)
        dx = Cks[0] + fluc
        dy = Cks[1] + fluc
        Css = randomInRange(s/3, s)
        cx = Cx
        cy = Cy
        while Css > 0 do
            Css = Css − 1
            cx = cx + dx + randomInRange(−ks/2, ks/2)
            cy = cy + dy + randomInRange(−ks/2, ks/2)
            if cx in [0, w) and cy in [0, h) then
                w1 = randomInRange(1, 2·bw)
                w2 = randomInRange(1, 2·bw)
                w3 = randomInRange(1, 2·bw)
                w4 = randomInRange(1, 2·bw)
                I[max(0, cy − w3) : cy + w4, max(0, cx − w1) : cx + w2, :] = 255   // rows (y), then columns (x)
            end
        end
    end
    return I′ = I
Another version of ResNet50 was trained with data augmentation. Unlike the other models, this version did not reach 100% accuracy on the validation set, but it successfully recognized images with different orientations.
Figure 2: Spark Augmentation. (a) Original Image (b) Spark Augmented Image (c) Warp Variation (d) Scatter Variation
Figure 3: ResNet50 without Augmentation (Accuracy)
So, we decided to combine both ResNet50 models and use their average score, which produced zero false positive predictions. We refer to these models as RNet1 and RNet2 throughout the rest of this paper. To push the performance further, we investigated the predictions from the final layers and found that both models were very confident about their predictions: in almost all cases, they were at least 100 times more confident in their true positive predictions. We also found that RNet2 missed some positive images, assigning them a confidence of only around 10-20%. We therefore lowered the threshold of the sigmoid layer to 10%, and instead of taking the plain average, we took a weighted average in which the confidence of RNet2 counts twice.
These heuristics gave a significant boost in performance. Some of the images that were used to manually check the model are attached below.
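The combination rule above can be sketched in a few lines. The sketch assumes the weighted average uses weights 1 and 2 for RNet1 and RNet2 normalized by their sum, i.e. (p1 + 2·p2)/3, and that a score exactly at the 10% threshold counts as positive; both are our reading of the text rather than details the paper states. `ensemble_predict` is a hypothetical name.

```python
import numpy as np


def ensemble_predict(p_rnet1, p_rnet2, threshold=0.10, w1=1.0, w2=2.0):
    """Weighted average of two sigmoid scores followed by a lowered threshold.

    ASSUMPTION: RNet2 gets twice the weight of RNet1 and the weights are
    normalized by their sum; scores >= threshold are treated as positive.
    """
    p1 = np.asarray(p_rnet1, dtype=float)
    p2 = np.asarray(p_rnet2, dtype=float)
    score = (w1 * p1 + w2 * p2) / (w1 + w2)
    return score, score >= threshold
```

For example, with p1 = 0.02 and p2 = 0.15 the weighted score is (0.02 + 0.30)/3 ≈ 0.107, which passes the 10% threshold, while the plain average 0.085 would not. This is the kind of low-confidence positive from RNet2 that the heuristic is meant to recover.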
We propose Ladon-Net, which generates multiple spatial transformations of each image and trains on all of them. It learns the geometry and orientation of the classes better than the previous models and outperformed them on test images with different geometric orientations [14]. The model takes input images of size 160x120 and then generates k geometric variations; in our experiments, we used k = 4. The model reached 99.98% validation accuracy after 18 epochs and contains 23.6 million parameters.
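The paper does not list which k = 4 transformations Ladon-Net applies or how the outputs for the variants are combined. The NumPy sketch below assumes four shape-preserving transforms (identity, horizontal flip, vertical flip, and 180° rotation, chosen so a 160x120 input keeps its size, unlike a 90° rotation), a shared backbone applied to every variant, and averaging of the per-variant scores. `geometric_variants`, `ladon_forward`, `expand_labels`, and the `backbone` callable are all hypothetical.

```python
import numpy as np


def geometric_variants(batch):
    """Return k = 4 spatial variants of an (N, H, W, C) batch as (N, 4, H, W, C).

    ASSUMED variants (not specified in the paper): identity, horizontal flip,
    vertical flip and 180-degree rotation, all of which keep H x W unchanged.
    """
    return np.stack([
        batch,                 # identity
        batch[:, :, ::-1],     # horizontal flip (reverse the width axis)
        batch[:, ::-1, :],     # vertical flip (reverse the height axis)
        batch[:, ::-1, ::-1],  # 180-degree rotation
    ], axis=1)


def ladon_forward(batch, backbone):
    """Hypothetical forward pass: score each variant with a shared backbone
    (one sigmoid score per image) and average the k scores per sample."""
    variants = geometric_variants(batch)
    n, k = variants.shape[:2]
    flat = variants.reshape((n * k,) + variants.shape[2:])
    scores = np.asarray(backbone(flat)).reshape(n, k)
    return scores.mean(axis=1)


def expand_labels(labels, k=4):
    """Repeat each label k times to match the flattened variant order."""
    return np.repeat(labels, k)
```

For training on all variants, `flat` and `expand_labels(y)` line up sample by sample, because the reshape keeps the k variants of each image adjacent.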
Figure 4: ResNet50 without Augmentation (Loss curve)
Table 1: Hyperparameters
Parameter                          Value
Input shape                        160x120
Batch size                         32
Interpolation                      Bicubic
Epochs                             RNet1: 10; RNet2: 15
Optimizer                          Adam (learning rate 0.0001, beta_1 0.9, beta_2 0.999, decay 0)
Learning-rate reduction            Patience 1, factor 0.3, min. learning rate 0.00001
Augmentations                      Flip; rotation, shear; height, width shift; zoom, spark; channel shift
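The learning-rate reduction settings in Table 1 (patience 1, factor 0.3, minimum 0.00001, starting from Adam's 0.0001) follow the same pattern as the Keras `ReduceLROnPlateau` callback. The plain-Python sketch below shows the resulting schedule; it assumes the monitored quantity is validation loss and omits the `min_delta` and `cooldown` options that the Keras callback also supports. `plateau_schedule` is a hypothetical name.

```python
def plateau_schedule(val_losses, lr=1e-4, factor=0.3, patience=1, min_lr=1e-5):
    """Learning rate in effect after each epoch under a reduce-on-plateau rule.

    ASSUMPTION: the monitored metric is validation loss; min_delta and
    cooldown (available in Keras's ReduceLROnPlateau) are not modeled.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0          # improvement: reset the counter
        else:
            wait += 1                     # no improvement this epoch
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history
```

With validation losses [0.5, 0.4, 0.45, 0.46, 0.3, 0.31], the rate goes 1e-4, 1e-4, 3e-5, 1e-5, 1e-5, 1e-5: with patience 1, every non-improving epoch triggers a reduction, and the second reduction (0.3 × 3e-5 = 9e-6) is floored at the 1e-5 minimum.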
5 CONCLUSION
In this work, an automatic car detection scheme was developed with residual networks. Multiple variants of ResNet and Xception were evaluated for the car detection task, and we contributed a learning scheme that helps residual CNNs generalize better. The proposed model was trained on an extended, augmented dataset and applies multiple geometric transformations to learn a better representation of the data. The transformation algorithm improved the overall performance of the residual network. This work is expected to be useful in self-driving car research.
Figure 5: ResNet50 with Augmentation (Accuracy)
Figure 6: ResNet50 with Augmentation (Loss curve)
REFERENCES
[1] Zabir Al Nazi and Tasnim Azad Abir. 2020. Automatic Skin Lesion Segmentation and Melanoma Detection: Transfer Learning Approach with U-Net and DCNN-SVM. In Proceedings of International Joint Conference on Computational Intelligence. Springer, 371–381.
[2] Muhammet Bastan, Kim-Hui Yap, and Lap-Pui Chau. 2018. Idling car detection with ConvNets in infrared image sequences. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
[3] Pedro Felzenszwalb, Ross Girshick, David McAllester, and Deva Ramanan. 2013. Visual Object Detection with Deformable Part Models. Commun. ACM 56, 9 (Sept. 2013), 97–105. https://doi.org/10.1145/2494532
[4] Heechul Jung, Min-Kook Choi, Jihun Jung, Jin-Hee Lee, Soon Kwon, and Woo Young Jung. 2017. ResNet-based vehicle classification and localization in traffic surveillance systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 61–67.
[5] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 2013. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (ICCVW ’13). IEEE Computer Society, Washington, DC, USA, 554–561. https://doi.org/10.1109/ICCVW.2013.77
[6] Jun Liang, Xu Chen, Mei-ling He, Long Chen, Tao Cai, and Ning Zhu. 2018. Car detection and classification using cascade model. IET Intelligent Transport Systems 12, 10 (2018), 1201–1209.
[7] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. 2017. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2208–2217.
[8] Satish Madhogaria, Paul M Baggenstoss, Marek Schikora, Wolfgang Koch, and Daniel Cremers. 2015. Car detection by fusion of HOG and causal MRF. IEEE Trans. Aerospace Electron. Systems 51, 1 (2015), 575–590.
[9] Mamunur Rahaman Mamun, Zabir Al Nazi, and Md Salah Uddin Yusuf. 2018. Bangla Handwritten Digit Recognition Approach with an Ensemble of Deep Residual Networks. In 2018 International Conference on Bangla Speech and Language Processing (ICBSLP). IEEE, 1–4.
[10] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).
[11] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241.
[12] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115, 3 (2015), 211–252.
[13] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. 2017. Dynamic routing between capsules. In Advances in neural information processing systems. 3856–3866.
[14] Ravi Teja Mullapudi, William R Mark, Noam Shazeer, and Kayvon Fatahalian. 2018. Hydranets: Specialized dynamic architectures for efficient inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8080–8089.
[15] Yixing Wang, Meiqin Liu, and Zhejing Bao. 2016. Deep learning neural network for power system fault diagnosis. In 2016 35th Chinese Control Conference (CCC). IEEE, 6678–6683.
[16] Jiayu Wu, Qixiang Zhang, and Guoxi Xu. 2019. Tiny ImageNet Challenge. CS231n, Stanford University. Accessed (2019), 06–17.
[17] Hao Xu, Qin Huang, and C-C Jay Kuo. 2016. Car detection using deformable part models with composite features. In 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 3812–3816.
[18] Zhilin Yang, Ruslan Salakhutdinov, and William W Cohen. 2017. Transfer learning for sequence tagging with hierarchical recurrent networks. arXiv preprint arXiv:1703.06345 (2017).
[19] Ye Yu, Qiang Jin, and Chang Wen Chen. 2018. FF-CMnet: A CNN-based model for fine-grained classification of car models based on feature fusion. In 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1–6.