Despite the immense success of deep neural networks in classification tasks, extending them to industrial applications remains challenging. The difficulty comes from variation in data sources and in the distribution of the data. For image classification and detection, it is important to design models that are less sensitive to transformation shifts. In this work, we propose Ladon-Net, a CNN that trains on multiple spatial transformations of each input instance. The model builds on a residual CNN and is trained on both the original training samples and generated augmented samples. It was compared with other residual architectures and showed competitive performance. In the future, the model can be extended with attention to better visualize the strongest features.
performs significantly well when trained with an optimized hyperparameter set and regularizer, but it needs a significant amount of data to train. Augmentation and transfer learning can be useful in this regard [ ]. It is still challenging, however, to train a model that is robust enough to be transformation aware. In real-life scenarios, this is a highly desirable characteristic of a learning algorithm. In this work, we have designed a spatial transformation scheme that is applied to the input images before the convolution operations. The new data points are mixed with augmented data. The resulting batches are used to train the proposed Ladon-Net, which uses multiple spatial transformations to learn a better geometric representation of each data instance. Because transformations are applied automatically to each data point, the model can learn a better representation of the data and requires less data to train, which is another advantage.
Deformable part models (DPM) are used for object detection in cluttered images; they reduce the problem to a classification task with latent variables. DPM uses local part templates and geometric constraints on the location of the parts to represent each object [ ]. A two-stage DPM-based model has been proposed for car detection [ ]. The model is based on a composite feature set (DMP/CF) and can localize cars of different types and orientations. The first stage performs HOG template matching, while the second stage works on the ROI obtained from the first stage and uses part detectors based on HOG and CNN features for validation. A latent logistic regression is used to optimize the location, window size and features. Conventionally, CNNs are used for car classification. Cascade methods and feature fusion can improve the performance [ ]. A car detection system based on infrared (IR) images has also been proposed; it uses the spatio-temporal evolution of car temperature with a ConvNet to localize cars [2].
We developed a car classification system designed to generalize better. In real-life scenarios, learning algorithms need to adapt to new instances. Our system consists of two modules: the generator module and Ladon-Net. The generator module assembles the data points by mixing data from many sources. A small set of old data is kept and passed through the augmentation scheme. The incoming car images are combined with these generated samples to form the training batch. Ladon-Net then generates four spatial transformations per sample to better capture geometric invariance. The overall methodology for training Ladon-Net is shown in Fig 1.
ICCA 2020, 10-12th of January 2020, Dhaka, Bangladesh Tanmoy and Zabir, et al.
Figure 1: Training of Ladon-Net
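The batch assembly performed by the generator module could look roughly like the following Python sketch. The function name, the `old_fraction` parameter and the `augment` callback are hypothetical and only illustrate mixing augmented old samples with incoming images; the batch size of 32 is the value from Table 1, while the actual mixing ratio is not stated in this paper.

```python
import random

def build_training_batch(old_samples, incoming, augment,
                         batch_size=32, old_fraction=0.25, rng=None):
    """Mix augmented old samples with incoming samples into one batch.

    old_fraction is an assumed ratio, not a value reported in the paper.
    """
    rng = rng or random.Random()
    # Number of retained (old) samples that go through augmentation.
    n_old = min(int(batch_size * old_fraction), len(old_samples))
    # The rest of the batch is filled with incoming images.
    n_new = min(batch_size - n_old, len(incoming))
    batch = [augment(x) for x in rng.sample(old_samples, n_old)]
    batch += rng.sample(incoming, n_new)
    rng.shuffle(batch)
    return batch
```

With `old_fraction=0.25` and a batch size of 32, eight samples per batch would come from the augmented old data, provided enough old and incoming samples are available.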
3.1 Spark Augmentation
Commercial images often contain watermarks and impaired regions. To handle this, we propose an augmentation technique called 'spark', in which random regions of an image are erased. This works as a regularization scheme and helps the model learn a better representation of the training objects. The procedure for generating the effect in an 8-bit image is given in Algorithm 1, and some samples generated by it are shown in Fig 2.
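As an illustration, Algorithm 1 can be sketched in Python with NumPy roughly as follows. This is a reading of the pseudocode under a few assumptions, not the authors' implementation: the image is taken to be a height x width x channel `uint8` array indexed as rows (y) then columns (x), the bound printed as `2bw` is read as `2 * bw`, and slice bounds are clamped at zero so that negative indices do not wrap around. The function name `spark_aug` and the `rng` argument are hypothetical.

```python
import random
import numpy as np

def spark_aug(image, r, s, d, ks, bw, rng=None):
    """Paint white blobs along random rays starting near the image centre."""
    rng = rng or random.Random()
    out = image.copy()
    h, w = out.shape[:2]
    half = ks // 2
    # Spark centre: image centre shifted by up to r pixels.
    cx0 = w // 2 + rng.randint(-r, r)
    cy0 = h // 2 + rng.randint(-r, r)
    # Step directions as printed in Algorithm 1 (duplicate entry included).
    directions = [(0, ks), (ks, ks), (ks, 0), (0, ks), (-ks, ks), (-ks, 0)]
    n_rays = rng.randint(d - ks, d + ks)
    for _ in range(max(n_rays, 0)):
        step_x, step_y = rng.choice(directions)
        fluc = rng.randint(-half, half)
        dx, dy = step_x + fluc, step_y + fluc
        cx, cy = cx0, cy0
        for _ in range(rng.randint(s // 3, s)):
            cx += dx + rng.randint(-half, half)
            cy += dy + rng.randint(-half, half)
            if 0 <= cx < w and 0 <= cy < h:
                w1, w2, w3, w4 = (rng.randint(1, 2 * bw) for _ in range(4))
                # Assumption: rows are y and columns are x.
                out[max(cy - w3, 0):cy + w4, max(cx - w1, 0):cx + w2] = 255
    return out
```

The sketch returns a copy rather than modifying the input in place, which makes it easier to keep the original image alongside its augmented version in the same batch.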
In our experiments, we used Stanford Cars and Tiny ImageNet. We also scraped car images from different sources to extend our dataset. For the positive class, 22307 images were used in total (16185 images from Stanford Cars, 7200 from Tiny ImageNet). For the negative class, 42918 images were used [ ]. We experimented with three different versions of residual networks:
Resnet18 was trained on a balanced dataset. No pretrained weights or data augmentation techniques were used. It achieved 100% accuracy on the validation split, but it failed on some tricky images that we picked manually to observe how well the model generalizes [4, 9].
Xception was trained on the unbalanced dataset and achieved similar accuracy on the validation split. We initialized the model with weights pre-trained on the ImageNet dataset. It did slightly better on the manually picked images but still failed to recognize all of them.
Algorithm 1: Algorithm for applying spark augmentation
function SparkAug(I, r, s, d, ks, bw)
Input: image I, spark center radius r, spark strength s, spark density d, kernel speed ks, blob width bw
Output: augmented image I
    h, w, c = I.dim                      // height, width, channel
    Cx = w/2 + randomInRange(-r, r)
    Cy = h/2 + randomInRange(-r, r)
    // randomInRange(a, b) returns a random integer from [a, b]
    Sp = [(0, ks), (ks, ks), (ks, 0), (0, ks), (-ks, ks), (-ks, 0)]
    Csd = randomInRange(d - ks, d + ks)
    while Csd > 0 do
        Csd -= 1
        Cksi = randomInRange(0, Sp.len - 1)
        Cks = Sp[Cksi]
        fluc = randomInRange(-ks/2, ks/2)
        dx = Cks[0] + fluc
        dy = Cks[1] + fluc
        Css = randomInRange(s/3, s)
        cx = Cx
        cy = Cy
        while Css > 0 do
            Css -= 1
            cx = cx + dx + randomInRange(-ks/2, ks/2)
            cy = cy + dy + randomInRange(-ks/2, ks/2)
            if cx inRange(0, w) and cy inRange(0, h) then
                w1 = randomInRange(1, 2*bw)
                w2 = randomInRange(1, 2*bw)
                w3 = randomInRange(1, 2*bw)
                w4 = randomInRange(1, 2*bw)
                I[cx - w1 : cx + w2, cy - w3 : cy + w4, :] = 255
            end
        end
    end
    return I
Resnet50 was trained in the same manner as Xception. It achieved similar accuracy on the validation set but did much better on the manual test. Still, it failed on some images where cars appeared in unusual orientations.
Another version of Resnet50 was trained with data augmentation. Unlike the other models, this version did not reach 100% accuracy on the validation set, but it successfully recognized images with different orientations.
We therefore decided to use both Resnet50 models and average their scores. The combined model had zero false positive predictions. From now on, we refer to these models as RNet1 and RNet2. To push the performance further, we investigated the predictions from the final layers and found that both models were very confident about their predictions: in almost all cases, they had at least 100 times more confidence in the true positive prediction. We also found that RNet2 missed some positive images, for which its confidence was around 10-20%. So we moved the threshold of the sigmoid layer to 10%. Then, instead of taking the plain average, we took a weighted average in which the confidence of RNet2 counts twice.
Residual Net for Car Detection with spatial transformation ICCA 2020, 10-12th of January 2020, Dhaka, Bangladesh
Figure 2: Spark Augmentation. (a) Original Image (b) Spark Augmented Image (c) Warp Variation (d) Scatter Variation
Figure 3: ResNet50 without Augmentation (Accuracy)
These heuristics gave a significant boost in performance. Some of the images that were used to manually check the model are attached below.
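The scoring heuristic for RNet1 and RNet2 can be written out as a short Python sketch. It assumes that each model outputs a sigmoid probability for the car class, that "two times" means a weight of 2 for RNet2 against 1 for RNet1, and that the 10% threshold is applied to the combined score; the function names are hypothetical.

```python
def ensemble_score(p_rnet1, p_rnet2, w_rnet1=1.0, w_rnet2=2.0):
    """Weighted average of two sigmoid outputs; RNet2 counts twice (assumed)."""
    return (w_rnet1 * p_rnet1 + w_rnet2 * p_rnet2) / (w_rnet1 + w_rnet2)

def is_car(p_rnet1, p_rnet2, threshold=0.10):
    """Predict the positive class when the combined score reaches the threshold."""
    return ensemble_score(p_rnet1, p_rnet2) >= threshold
```

For example, if RNet1 outputs 0.02 and RNet2 outputs 0.15, the combined score is (0.02 + 2 * 0.15) / 3, which is about 0.107, so the image is still classified as a car under the lowered threshold.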
We proposed Ladon-Net, which generates multiple spatial transformations of each image and trains on all of them. It learns the geometry and orientation of the classes better than the previous models, and it outperformed them at detecting test images oriented in different geometries [ ]. The model takes input images of size 160x120 and generates k geometric variations of each; in our experiment, we used k = 4. The model reached 99.98% validation accuracy after 18 epochs. It contains 23.6 million parameters.
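To make the idea of k geometric variations concrete, the following Python sketch produces k = 4 variants of one image with NumPy. The specific transformations used by Ladon-Net are not listed here, so the identity, the two flips and the 180-degree rotation below are placeholders chosen only because they preserve the input shape; the function name is hypothetical, and a 160x120 input is assumed to mean 160 pixels wide and 120 pixels high.

```python
import numpy as np

def geometric_variations(image, k=4):
    """Return k shape-preserving variants of an H x W x C image, stacked."""
    # Placeholder transformations; the paper does not specify the actual ones.
    candidates = [
        image,                 # identity
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, 2),    # 180-degree rotation
    ]
    if k > len(candidates):
        raise ValueError("only %d placeholder variants are defined" % len(candidates))
    return np.stack(candidates[:k])
```

For an input of shape (120, 160, 3), the result has shape (4, 120, 160, 3), so the variants of one sample can be fed to the network as a small batch.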
Figure 4: ResNet50 without Augmentation (Loss curve)
Table 1: Hyperparameters
Input shape          160x120
Batch Size           32
Interpolation        Bicubic
Epochs               RNet1: 10
                     RNet2: 15
Optimizer            Adam
                     L. Rate: 0.0001
                     Beta_1: 0.9
                     Beta_2: 0.999
                     Decay: 0
L. Rate Reduction    Patience: 1
                     Factor: 0.3
                     Min. L.
Augmentations        Flip
                     Rotation, Shear
                     Height, width shift
                     Zoom, Spark
                     Channel shift
In this work, an automatic car detection scheme was developed with a residual network. Multiple variations of ResNet and Xception were evaluated for the car detection task, and we contributed to the learning scheme of the residual CNN so that it generalizes better. The proposed model was trained on an extended, augmented dataset and applied multiple geometric transformations to learn a better representation of the data. The transformation algorithm improved the overall performance of the residual network. This work is expected to be useful in self-driving car research.
