Residual Net for Car Detection with spatial transformation

Zabir Al Nazi
MazeGeek, Inc.
Dhaka, Bangladesh
zabir@vinndo.com
Mamunur Rahaman Mamun
MazeGeek, Inc.
Dhaka, Bangladesh
mamunur@vinndo.com
Abidur Rahman Mallik
MazeGeek, Inc.
Dhaka, Bangladesh
abid@mazegeek.com
Tanmoy Tapos Datta
MazeGeek, Inc.
Dhaka, Bangladesh
tanmoy@vinndo.com
Mohammad Samawat Ullah
American International University-Bangladesh
Dhaka, Bangladesh
samawat@aiub.edu
ABSTRACT
In spite of the immense success of deep neural networks in classification tasks, it is challenging to extend their use to industrial applications. The difficulty stems from the variation in data sources and in the distribution of data. For image classification and detection, it is important to design models that are robust to transformation shifts. In this work, we propose Ladon-Net, a CNN that trains on multiple spatial transformations of each input instance. The model is built on a residual CNN and trained on both the original training samples and generated augmented samples. Compared with other residual architectures, it showed competitive performance. In future work, the model can be extended with attention to better visualize the strongest features.
CCS CONCEPTS
Computing methodologies → Machine learning; Machine learning approaches; Neural networks.
KEYWORDS
neural networks, car detection, classification, residual networks
ACM Reference Format:
Zabir Al Nazi, Mamunur Rahaman Mamun, Abidur Rahman Mallik, Tanmoy Tapos Datta, and Mohammad Samawat Ullah. 2020. Residual Net for Car Detection with spatial transformation. In Proceedings of ICCA 2020 - International Conference on Computing Advancements (ICCA 2020). ACM, Dhaka, Bangladesh, 4 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
In recent years, deep learning has shown promising performance in domains ranging from biomedicine to power systems [10, 11, 13, 15]. Deep convolutional neural networks (DCNNs) have wide applications in many computer vision tasks. Even though a DCNN performs very well when trained with an optimized hyperparameter set and regularizer, it needs a significant amount of data to train. Augmentation and transfer learning can be useful in this regard [1, 7, 18], but it is still challenging to train a model that is robust enough to be transformation aware, which is a highly desirable characteristic of a learning algorithm in real-life scenarios. In this work, we design a spatial transformation scheme that is applied to the input images before the convolution operations. The new data points are mixed with augmented data, and the resulting batches are used to train the proposed Ladon-Net, which uses multiple spatial transformations to learn a better geometric representation of each data instance. Because the transformations are applied automatically to each data point, the model can learn a better representation of the data and requires less training data, which is another advantage.
ICCA 2020, 10-12th of January 2020, Dhaka, Bangladesh
©2020 Association for Computing Machinery.
2 RELATED WORKS
Deformable part models (DPMs) are used for object detection in cluttered images; they reduce the problem to a classification task with latent variables. A DPM uses local part templates and geometric constraints on the locations of the parts to represent each object [3]. A two-stage DPM-based model has been proposed for car detection [17]. The model is based on a composite feature set (DPM/CF) and can localize cars of different types and orientations. The first stage performs HOG template matching, while the second stage works on the ROI obtained from the first stage and uses HOG- and CNN-feature-based part detectors for validation. A latent logistic regression is used to optimize the location, window size, and features. Conventionally, CNNs are used for car classification, and cascade methods and feature fusion can improve their performance [6, 8, 19]. Bastan et al. proposed a car detection system based on infrared (IR) images that uses the spatio-temporal evolution of car temperature with a ConvNet to localize cars [2].
3 METHODOLOGY
A car classication system is developed which can show better gen-
eralization. In real life scenarios, all the learning algorithms need
to adapt to new instances. Our system consists of two modules the
generator module, and Ladon-Net. The generator module assem-
bles the data points by mixing data from many sources. A small
set of old data is kept which pass through augmentation scheme.
The incoming car images are combined with generated samples.
Finally, the training batch is generated. The Ladon-Net generates
four spatial transformation per sample to understand the geometric
ICCA 2020, 10-12th of January 2020, Dhaka, Bangladesh Tanmoy and Zabir, et al.
Figure 1: Training of Ladon-Net
invariance better. The overall methodology for training Ladon-Net
is shown in Fig 1.
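The batch assembly performed by the generator module can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `assemble_batch`, the `augment` callback, and the `old_fraction` split between retained and incoming samples are hypothetical; only the batch size of 32 is taken from Table 1.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def assemble_batch(
    incoming: Sequence[T],
    old_pool: Sequence[T],
    augment: Callable[[T], T],
    batch_size: int = 32,
    old_fraction: float = 0.25,  # hypothetical share of retained old samples
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Mix augmented samples from a small retained pool with incoming car images."""
    rng = rng or random.Random()
    # Number of retained (old) samples in this batch, capped by the pool size.
    n_old = min(len(old_pool), round(batch_size * old_fraction))
    # The remaining slots are filled with newly arrived images.
    n_new = min(len(incoming), batch_size - n_old)
    # Every retained sample passes through the augmentation scheme.
    old_part = [augment(x) for x in rng.sample(list(old_pool), n_old)]
    new_part = rng.sample(list(incoming), n_new)
    batch = old_part + new_part
    rng.shuffle(batch)  # interleave old and new samples within the batch
    return batch
```

With the defaults, a batch of 32 holds 8 augmented old samples and 24 incoming ones; in practice `augment` would wrap an image transformation such as the spark augmentation.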
3.1 Spark Augmentation
Commercial images often contain watermarks and impaired regions. To handle this, we propose an augmentation technique called 'spark', in which random regions of an image are erased. This works as a regularization scheme and helps the model learn a better representation of the training objects. The algorithm for generating the effect in an 8-bit image is given in Algorithm 1, and some samples generated by the algorithm are shown in Fig. 2.
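Algorithm 1 can be translated into runnable Python/NumPy roughly as follows. This is a sketch, not the authors' code, and it rests on several assumptions of ours: the image is a NumPy array laid out as height × width × channels, so rows are indexed by the y coordinate; `ks/2` and `s/3` are integer divisions; `2bw` means 2·bw; and painted blobs are clipped at the image border instead of wrapping around. The step list is copied from Algorithm 1 unchanged. The name `spark_aug` and the `rng` argument are hypothetical.

```python
import numpy as np


def spark_aug(image, r, s, d, ks, bw, rng=None):
    """Sketch of spark augmentation for an 8-bit H x W x C image.

    r: spark-centre radius, s: spark strength, d: spark density,
    ks: kernel speed (step size), bw: blob width (must be >= 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the caller's image untouched
    h, w, _ = out.shape
    half = ks // 2

    def rand_int(a, b):
        # Uniform random integer from the closed interval [a, b] (randomInRange).
        return int(rng.integers(a, b + 1))

    # Spark centre: image centre jittered by up to r pixels in each direction.
    cx0 = w // 2 + rand_int(-r, r)
    cy0 = h // 2 + rand_int(-r, r)
    # Step directions, copied from Algorithm 1.
    steps = [(0, ks), (ks, ks), (ks, 0), (0, ks), (-ks, ks), (-ks, 0)]

    for _ in range(rand_int(d - ks, d + ks)):  # number of sparks
        step_x, step_y = steps[rand_int(0, len(steps) - 1)]
        fluc = rand_int(-half, half)
        dx, dy = step_x + fluc, step_y + fluc
        cx, cy = cx0, cy0
        for _ in range(rand_int(s // 3, s)):  # length of this spark
            cx += dx + rand_int(-half, half)
            cy += dy + rand_int(-half, half)
            if 0 <= cx < w and 0 <= cy < h:
                w1, w2, w3, w4 = (rand_int(1, 2 * bw) for _ in range(4))
                # Paint a white blob; the start indices are clamped at 0.
                out[max(cy - w3, 0):cy + w4, max(cx - w1, 0):cx + w2, :] = 255
    return out
```

For example, `spark_aug(img, r=10, s=30, d=8, ks=4, bw=3)` on a 120x160 RGB image draws several white streaks radiating from a point near the image centre.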
4 RESULT ANALYSIS
In our experiments, we used Stanford Cars and Tiny ImageNet, and we also scraped car images from different sources to extend our dataset. For the positive class, 22307 images were used in total: 16185 images from Stanford Cars and 7200 from Tiny ImageNet. For the negative class, 42918 images were used [5, 12, 16]. We experimented with three different residual network architectures:
ResNet18
Xception
ResNet50
ResNet18 was trained on a balanced dataset without pretrained weights or data augmentation. It achieved 100% accuracy on the validation split, but it failed to classify some tricky images that we picked manually to probe the generalization of the model [4, 9].
Xception was trained on the unbalanced dataset and achieved similar accuracy on the validation split. We initialized the model with weights pre-trained on ImageNet. It did slightly better on the manually picked images but still failed to recognize all of them.
ResNet50 was trained in the same manner as Xception. It achieved similar accuracy on the validation set and did much better on the manual test, but it still failed to classify some images in which cars appeared in unusual orientations.

Algorithm 1: Algorithm for applying spark augmentation in images
function SparkAug(I, r, s, d, ks, bw)
Input: image I, spark center radius r, spark strength s, spark density d, kernel speed ks, blob width bw
Output: augmented image I
    h, w, c = I.dim                          // height, width, channel
    Cx = w/2 + randomInRange(−r, r)
    Cy = h/2 + randomInRange(−r, r)
    // randomInRange(a, b) returns a random integer from [a, b]
    Sp = [(0, ks), (ks, ks), (ks, 0), (0, ks), (−ks, ks), (−ks, 0)]
    Csd = randomInRange(d − ks, d + ks)
    while Csd > 0 do
        Csd = Csd − 1
        Cksi = randomInRange(0, Sp.len − 1)
        Cks = Sp[Cksi]
        fluc = randomInRange(−ks/2, ks/2)
        dx = Cks[0] + fluc
        dy = Cks[1] + fluc
        Css = randomInRange(s/3, s)
        cx = Cx
        cy = Cy
        while Css > 0 do
            Css = Css − 1
            cx = cx + dx + randomInRange(−ks/2, ks/2)
            cy = cy + dy + randomInRange(−ks/2, ks/2)
            if cx inRange(0, w) and cy inRange(0, h) then
                w1 = randomInRange(1, 2·bw)
                w2 = randomInRange(1, 2·bw)
                w3 = randomInRange(1, 2·bw)
                w4 = randomInRange(1, 2·bw)
                I[cy − w3 : cy + w4, cx − w1 : cx + w2, :] = 255
            end
        end
    end
    return I
Another version of ResNet50 was trained with data augmentation. Unlike the other models, this version did not reach 100% accuracy on the validation set, but it successfully recognized images with different orientations.
Figure 2: Spark Augmentation. Panels: (a) Original Image, (b) Spark Augmented Image, (c) Warp Variation, (d) Scatter Variation.
Figure 3: ResNet50 without Augmentation (Accuracy)
We therefore decided to use both ResNet50 models (with and without augmentation) and average their scores, which produced zero false-positive predictions. Throughout the rest of this paper, we refer to these models as RNet1 and RNet2. To push performance further, we investigated the predictions of the final layers and found that both models were very confident in their predictions: in almost all cases, they had at least 100 times more confidence in the true-positive prediction. We also found that RNet2 missed some positive images, to which it assigned a confidence of only around 10-20%. We therefore moved the threshold of the sigmoid layer to 10%, and instead of taking the plain average, we took a weighted average in which the confidence of RNet2 counts twice.
These heuristics gave a significant boost in performance. Some of the images that were used to manually check the model are attached below.
We propose Ladon-Net, which generates multiple spatial transformations of each image and trains on all of them. It learns the geometry and orientation of the classes better than the previous models, and it outperformed them in detecting test images with different geometric orientations [14]. The model takes input images of size 160x120 and then generates k geometric variations; in our experiment, we used k = 4. The model reached 99.98% validation accuracy after 18 epochs. It contains 23.6 million parameters.
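The text does not state which k = 4 transformations Ladon-Net generates. Purely as an illustration, the NumPy sketch below assumes four shape-preserving transforms (identity, horizontal flip, vertical flip, and 180° rotation), chosen so that every variation keeps the 160x120 input size; a 90° rotation would swap height and width. The function name `ladon_variations` and the choice of transforms are hypothetical.

```python
import numpy as np


def ladon_variations(image: np.ndarray, k: int = 4) -> np.ndarray:
    """Return k spatial variations of an H x W x C image, stacked on a new axis 0."""
    transforms = [
        lambda x: x,              # identity
        lambda x: x[:, ::-1],     # horizontal flip (mirror the columns)
        lambda x: x[::-1, :],     # vertical flip (mirror the rows)
        lambda x: x[::-1, ::-1],  # 180-degree rotation
    ]
    if not 1 <= k <= len(transforms):
        raise ValueError(f"k must be between 1 and {len(transforms)}")
    # Every transform preserves H and W, so the results can be stacked directly.
    return np.stack([t(image) for t in transforms[:k]], axis=0)
```

The result has shape (k, H, W, C), so the variations of one sample can be treated as a small batch and passed through a shared backbone.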
Figure 4: ResNet50 without Augmentation (Loss curve)
Table 1: Hyperparameters
Input shape: 160x120
Batch size: 32
Interpolation: Bicubic
Epochs: RNet1 10, RNet2 15
Optimizer: Adam (learning rate 0.0001, beta_1 0.9, beta_2 0.999, decay 0)
Learning-rate reduction: patience 1, factor 0.3, minimum learning rate 0.00001
Augmentations: flip; rotation, shear; height and width shift; zoom; spark; channel shift
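The learning-rate reduction settings in Table 1 resemble the arguments of Keras' `ReduceLROnPlateau` callback, although the paper does not name the framework. The sketch below is a simplified pure-Python version of such a plateau rule (it ignores `min_delta` and cooldown), written only to show how the learning rate would step down from 0.0001 to 0.00003 and then stop at the 0.00001 floor; `plateau_schedule` is a hypothetical name.

```python
from typing import List, Sequence


def plateau_schedule(
    val_losses: Sequence[float],
    lr: float = 1e-4,      # initial Adam learning rate (Table 1)
    factor: float = 0.3,   # reduction factor (Table 1)
    patience: int = 1,     # epochs without improvement before reducing (Table 1)
    min_lr: float = 1e-5,  # lower bound on the learning rate (Table 1)
) -> List[float]:
    """Return the learning rate used in each epoch under a simplified plateau rule."""
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        lrs.append(lr)  # rate used during this epoch
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Reduce for the next epoch, never going below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs
```

With validation losses 0.5, 0.4, 0.45, 0.41, 0.39, 0.40, the per-epoch rates are 1e-4, 1e-4, 1e-4, 3e-5, 1e-5, 1e-5: with a patience of 1, each non-improving epoch triggers a cut, and the second cut (to 9e-6) is clamped to the minimum.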
5 CONCLUSION
In this work, an automatic car detection scheme was developed with residual networks. Multiple variants of ResNet and Xception were evaluated for the car detection task, and we contributed a learning scheme that helps residual CNNs generalize better. The proposed model was trained on an extended augmented dataset and applied multiple geometric transformations to learn a better representation of the data. The transformation algorithm improved the overall performance of the residual network. This work is expected to be useful in self-driving car research.
Figure 5: ResNet50 with Augmentation (Accuracy)
Figure 6: ResNet50 with Augmentation (Loss curve)
REFERENCES
[1] Zabir Al Nazi and Tasnim Azad Abir. 2020. Automatic Skin Lesion Segmentation and Melanoma Detection: Transfer Learning Approach with U-Net and DCNN-SVM. In Proceedings of International Joint Conference on Computational Intelligence. Springer, 371–381.
[2] Muhammet Bastan, Kim-Hui Yap, and Lap-Pui Chau. 2018. Idling car detection with ConvNets in infrared image sequences. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.
[3] Pedro Felzenszwalb, Ross Girshick, David McAllester, and Deva Ramanan. 2013. Visual Object Detection with Deformable Part Models. Commun. ACM 56, 9 (Sept. 2013), 97–105. https://doi.org/10.1145/2494532
[4] Heechul Jung, Min-Kook Choi, Jihun Jung, Jin-Hee Lee, Soon Kwon, and Woo Young Jung. 2017. ResNet-based vehicle classification and localization in traffic surveillance systems. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 61–67.
[5] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 2013. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops (ICCVW ’13). IEEE Computer Society, Washington, DC, USA, 554–561. https://doi.org/10.1109/ICCVW.2013.77
[6] Jun Liang, Xu Chen, Mei-ling He, Long Chen, Tao Cai, and Ning Zhu. 2018. Car detection and classification using cascade model. IET Intelligent Transport Systems 12, 10 (2018), 1201–1209.
[7] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. 2017. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2208–2217.
[8] Satish Madhogaria, Paul M Baggenstoss, Marek Schikora, Wolfgang Koch, and Daniel Cremers. 2015. Car detection by fusion of HOG and causal MRF. IEEE Trans. Aerospace Electron. Systems 51, 1 (2015), 575–590.
[9] Mamunur Rahaman Mamun, Zabir Al Nazi, and Md Salah Uddin Yusuf. 2018. Bangla Handwritten Digit Recognition Approach with an Ensemble of Deep Residual Networks. In 2018 International Conference on Bangla Speech and Language Processing (ICBSLP). IEEE, 1–4.
[10] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).
[11] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241.
[12] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115, 3 (2015), 211–252.
[13] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. 2017. Dynamic routing between capsules. In Advances in neural information processing systems. 3856–3866.
[14] Ravi Teja Mullapudi, William R Mark, Noam Shazeer, and Kayvon Fatahalian. 2018. Hydranets: Specialized dynamic architectures for efficient inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8080–8089.
[15] Yixing Wang, Meiqin Liu, and Zhejing Bao. 2016. Deep learning neural network for power system fault diagnosis. In 2016 35th Chinese Control Conference (CCC). IEEE, 6678–6683.
[16] Jiayu Wu, Qixiang Zhang, and Guoxi Xu. 2019. Tiny ImageNet Challenge. CS231n, Stanford University. Accessed (2019), 06–17.
[17] Hao Xu, Qin Huang, and C-C Jay Kuo. 2016. Car detection using deformable part models with composite features. In 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 3812–3816.
[18] Zhilin Yang, Ruslan Salakhutdinov, and William W Cohen. 2017. Transfer learning for sequence tagging with hierarchical recurrent networks. arXiv preprint arXiv:1703.06345 (2017).
[19] Ye Yu, Qiang Jin, and Chang Wen Chen. 2018. FF-CMnet: A CNN-based model for fine-grained classification of car models based on feature fusion. In 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1–6.
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These improvements lead to improvements over the current state-of-the-art on several well-studied tasks.