All content in this area was uploaded by Rohan Saha on Dec 19, 2018
Transfer Learning – A Comparative Analysis
Project report submitted
in partial fulfilment of the requirement for the degree of
Bachelor of Technology
By
Rohan Saha (1505614)
Debaruna Saha (1505029)
Under the supervision of
(Dr.) Prof. Sudan Jha
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
Deemed to be University, BHUBANESWAR
DECEMBER 2018
Certificate
This is to certify that the project entitled “TRANSFER LEARNING – A COMPARATIVE
APPROACH” submitted by
ROHAN SAHA 1505614
DEBARUNA SAHA 1505029
in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology
in Computer Science and Engineering is a bonafide record of the work carried out under my (our)
guidance and supervision at School of Computer Engineering, KIIT Deemed to be University.
Signature of Supervisor
(Dr.) Prof Sudan Jha
School of Computer Engineering
KIIT Deemed to be University
The Project was evaluated by us on _____________
EXAMINER 1
EXAMINER 2
EXAMINER 3
Acknowledgement
We feel extremely privileged to express our sincere gratitude to our supervisor and mentor,
(Dr.) Prof. Sudan Jha, for his constant guidance and supervision throughout our project work.
His way of solving real-world problems by implementing them in machines has inspired us to
take up this project. Our heartfelt thanks to you, sir, for the support and patience shown to us.
We are also thankful to our department, the School of Computer Science, for giving us this
opportunity to turn our idea into a real-time implementation.
Rohan Saha (1505614)
Debaruna Saha (1505029)
Abstract
Deep learning has paved the way for critical and revolutionary applications in almost every field
of life. From engineering to healthcare, machine learning and deep learning have left their mark
as state-of-the-art technologies that set a high benchmark for solutions. Incorporating neural
network architectures into applications has become a common part of the software development
process.
In this paper we look at transfer learning and how it can augment the performance of a neural
network architecture by combining pre-existing models with densely connected layers suited to a
specific problem.
We implement different transfer learning approaches and models to arrive at a single conclusion
regarding the best model for the handwritten digit recognition problem in terms of loss and
accuracy metrics. We then perform a comparative analysis of the models used and visualize the
metrics using TensorBoard. Visualization lets us analyze how the metrics vary over time, which
gives deeper insight.
The comparative analysis provides the data for choosing the final model, which will later be used
in a system that extracts features and recognizes digits and characters from an image of
information written on a piece of paper or a blackboard.
Transfer learning, though new, has proved to be one of the crucial inventions in the field of deep
learning and computer vision because of the promising results it provides, with very little
distortion from the entropy and randomness of real-world unstructured data.
Table Of Contents
Chapter   Title
1         Introduction
2         Literature Survey
3         Related Work
4         Motivation
5         Transfer Learning
6         Implementation
7         Results
8         Conclusion
9         References
Chapter 1
Introduction
_____________________________________________________________________________________
Analyzing the gigantic mass of digital visual information around the world requires several image
analysis techniques. The main purpose is to analyze the semantic contents of images
automatically. Distinguishing an object from the whole image requires object recognition
techniques. Humans recognize objects in the real world without any effort, but a machine cannot
recognize them until it is given an algorithmic description. Object recognition techniques
therefore need to be developed that are both less complex and efficient.
Significant efforts have been made to develop representation schemes and algorithms aimed at
recognizing generic objects in images under different imaging conditions. Presently there are
several techniques that address the object recognition scenario. Several of them consider color
and shape features for recognition. A simple object recognition and segmentation scheme extracts
a set of features from an image sequence and uses them for further learning in the learning phase.
Feature extraction, one of the most important fields in artificial intelligence, extracts the most
relevant features of an image and assigns them to a label. Transfer learning is the improvement of
learning in a new task through the transfer of knowledge from a related task that has already been
learned. The recognition process is carried out by matching the test image against the stored
object representations or models.
Various techniques are available, each with pros and cons. The paper first covers different types
of techniques for image recognition. The main focus is then image recognition using transfer
learning, which is based on deep learning. Deep learning has many advantages over conventional
machine learning algorithms, including statistical training, detecting relationships between
dependent and independent variables, detecting possible interactions between predictor variables,
pattern recognition and many more. Deep learning models such as Convolutional Neural Networks
(CNNs) and Deep Belief Networks (DBNs) have been widely used for image classification, object
recognition and action recognition, opening new applications in every field.
Chapter 2
Literature Survey
_____________________________________________________________________________
Image recognition has centered on appearance-based techniques, among which feature extraction
and template matching algorithms are the more advanced. We look at them in detail below.
2.1 Template Matching
In [1], template matching is described as a technique for locating small parts of an image that
match a template image. It is a straightforward process in which template images for different
objects are stored, and an input image is matched against the stored templates to determine the
object in the input. The authors proposed a mathematical morphological template matching approach
for object recognition in inertial navigation systems (INS). The focus of the paper is to detect
and track ground objects. Flying systems equipped with cameras were used to capture photos of the
ground and identify the objects. Their method is independent of the altitude and orientation of
the object.
In [2], an approach is presented for measuring similarity between visual images based on matching
internal self-similarities. A template image is compared to another image. While measuring
similarity across images can be complex, the similarity within each image can be easily revealed
with a simple similarity measure such as SSD (Sum of Squared Differences), resulting in local
self-similarity descriptors which can be matched across images.
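To make the SSD measure concrete, here is a minimal sketch in plain Python that slides a template over a grayscale image and returns the position with the smallest sum of squared differences. The function names `ssd` and `best_match` and the tiny example image are illustrative assumptions and are not taken from [1] or [2]; in particular, this is plain template matching, not the self-similarity descriptor of [2].

```python
def ssd(patch, template):
    """Sum of squared differences between two equally sized 2-D lists."""
    return sum(
        (p - t) ** 2
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def best_match(image, template):
    """Slide the template over the image; return (row, col, score) of the lowest SSD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ssd(patch, template)
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# Hypothetical 4x5 grayscale image containing the 2x2 template at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

print(best_match(image, template))  # (1, 2, 0)
```

A score of 0 means an exact match; on real images the lowest score marks the most similar location rather than an identical one.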
2.2 Feature Extraction
Feature extraction is one of the most important fields in artificial intelligence. It extracts the
most relevant features of an image and assigns them to a label. In image classification, the
crucial step is to analyze the properties of image features and to organize the numerical features
into classes.
In [3], the performance of data models obtained by different feature extraction techniques is
presented in the context of binary and multiclass classification using different classifiers.
Some of the techniques on which the paper's conclusion is based are PHOG (Pyramid Histogram of
Oriented Gradients) and LBP (Local Binary Patterns). The extracted features are based on the
following.
2.2.1 Color Features
In image classification and image retrieval, color is the most important feature. The color
histogram is the most common method of extracting color features. It is regarded as the
distribution of color in the image. The efficacy of the color feature resides in the fact that it
is independent of and insensitive to the size, rotation and zoom of the image.
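The rotation claim can be checked with a small sketch. The code below builds a normalized color histogram in plain Python and confirms that rotating the image by 90 degrees leaves it unchanged, since rotation only moves pixels around. The bin count, the helper names `color_histogram` and `rotate_90`, and the six-pixel image are assumptions made for illustration; for arbitrary rotations or zooms of real images the histogram is only approximately preserved.

```python
from collections import Counter


def color_histogram(image, bins_per_channel=4):
    """Normalized histogram of quantized (R, G, B) colors for a 2-D list of pixels."""
    step = 256 // bins_per_channel
    counts = Counter(
        (r // step, g // step, b // step)
        for row in image
        for (r, g, b) in row
    )
    total = sum(counts.values())
    # Dividing by the pixel count makes the histogram independent of image size.
    return {color_bin: n / total for color_bin, n in counts.items()}


def rotate_90(image):
    """Rotate a 2-D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical 2x3 RGB image: three red-ish pixels, two blue-ish pixels, one white pixel.
image = [
    [(250, 10, 10), (240, 20, 5), (10, 10, 250)],
    [(255, 0, 0), (5, 5, 240), (255, 255, 255)],
]

hist = color_histogram(image)
assert hist == color_histogram(rotate_90(image))  # rotation only permutes pixels
print(hist)
```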
2.2.2 Texture Features
Texture feature extraction is a very robust technique for large images that contain repetitive
regions. A texture is a group of pixels that share certain characteristics. Texture feature
methods are classified into two categories: spatial texture feature extraction and spectral
texture feature extraction.
2.2.3 Shape Features
Shape features are widely used in object recognition and shape description. Shape feature
extraction techniques are classified as region based and contour based. Contour methods calculate
the feature from the boundary and ignore its interior, while region methods calculate the feature
from the entire region.
In the study [3], the dataset was split into two parts, a training set and a test set: 60% of the
instances were used in the training phase and the remaining 40% constituted the test set. Linear
SVM, SVM with Gaussian kernel, Least Squares SVM (LS-SVM) and k-nearest neighbor were used for
classification. The results show that the PHOG, Gabor and LBP methods reached a high
classification accuracy rate and are very precise and efficient methods.
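The evaluation protocol described above (a 60/40 split followed by classification) can be sketched in a few lines of plain Python, here with a k-nearest neighbor classifier. The feature vectors, the seed and the helper names `split_60_40` and `knn_predict` are made up for illustration; they do not reproduce the data or code of [3].

```python
import random
from collections import Counter


def split_60_40(samples, seed=0):
    """Shuffle the samples and return (training set, test set) in a 60/40 ratio."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Label a feature vector by majority vote among its k nearest training samples."""
    def squared_distance(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], features))

    nearest = sorted(train, key=squared_distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 2-D feature vectors: class "a" near (0, 0), class "b" near (10, 10).
samples = [((i * 0.1, i * 0.2), "a") for i in range(10)]
samples += [((10 + i * 0.1, 10 - i * 0.2), "b") for i in range(10)]

train, test = split_60_40(samples)
correct = sum(knn_predict(train, x) == y for x, y in test)
print(f"train={len(train)} test={len(test)} accuracy={correct / len(test):.2f}")
```

In practice the feature vectors would come from an extractor such as PHOG or LBP, and the classifier could be swapped for an SVM.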
2.3 Image Segmentation
Among image recognition techniques, image segmentation is the process of dividing an object into
segments so that the resulting representation is clearer, more meaningful and easier to analyze.
Segmentation is classified into two types based on the camera's mobility:
a) Static camera segmentation
b) Moving camera segmentation
In static camera segmentation, the camera is placed at a particular position at a fixed angle, so
that the object and the background are stable. In moving camera segmentation, the camera keeps
moving, which is more challenging because it requires accounting for the movement of the camera
and the changing background.
In [4], a method based on spectral graph partitioning is discussed. The paper formulates a joint
optimization problem using patch grouping and pixel grouping. Their work showed that falsely
detected parts can be effectively cut away by perceptual organization. The results were shown on
a 120×120 synthetic image. They used a trained linear Fisher classifier for every body part, and
the output was a reliably better object segmentation.
In [5], a method is presented for recognizing and segmenting objects with the help of SIFT and
graph cuts. They took 20 object models from angles at every 45 degrees, and obtained a recall of
0.81 and a precision of 1.00. The seeds were given to graph cuts automatically using SIFT, but
the disadvantage of the approach was that if the number of key points decreased, the accuracy of
recognition and segmentation would fall and the computing time would increase.
In [6], a method is presented for viewpoint-independent recognition of free-form objects and
their segmentation. Their work automatically registered unordered views of an object with a
complexity of O(n). The experiment produced an overall recognition rate of 95% for synthetic as
well as real images. They compared their work with spin image recognition algorithms and stated
that their algorithm was better in all aspects, such as recognition rate and efficiency.
In [7], an approach is described in which objects are represented by a hierarchy of fragments.
The approach was effective and provided an accessible framework for organization, segmentation
and recognition. Its advantages included class-specific features and pictorial features of the
fragments that represented and recognized the complete object and its parts. The approach was
consistent with both psychological and physiological evidence. Major challenges for the future
include full scene interpretation and dynamic recognition of objects.
In [8], Kim et al. proposed an unsupervised moving object segmentation and recognition method
for Intelligent Transportation Systems (ITS). The method used a clustering and neural network
approach. Its advantages included real-time and robust performance and efficient remote
monitoring. The experiments were performed on a 1.4 GHz Pentium IV processor running Windows 98,
and the algorithm was implemented using MS Visual C++. Processing each image frame took 0.36
seconds on average. Comparing with Badenas et al. [13], it produced an average time of 0.57 s and
a segmentation rate of 91.3%, and the method presented by them produced an average time of 0.22 s
and a segmentation rate of 95.8%.
In [9], machine learning-based object recognition algorithms are described that use MLS (mobile
laser scanning) point clouds to create maps of the road environment. The authors collected
unorganized 3D point clouds and performed pre-processing, segmentation and feature extraction.
The local predictions for the segmented objects then formed the labelled object locations. Using
MLS helped improve the segmentation of point clouds, with classification accuracy varying from
78.3% to 87.9%.
In [10], it is described that transfer learning mainly consists of two approaches:
1) Preserving the original pre-trained network and updating the weights based on the new
training dataset.
2) Using the pre-trained network for feature extraction and representation, followed by a generic
classifier such as an SVM for classification.
The second approach has been successfully applied to many recognition and classification
tasks [11], [12]. The human action recognition technique in [10] also falls under the second
category. Among recently proposed benchmark deep models such as AlexNet and GoogleNet, AlexNet is
selected as the source model for building a target model for the action recognition task. The
source model is used for feature extraction and representation, followed by a hybrid Support
Vector Machine and K-Nearest Neighbor (SVM-KNN) classifier for action recognition.
Chapter 3
Related Work
___________________________________________________
The existing state-of-the-art methods for action recognition use handcrafted representations and
deep learning. Handcrafted representations have achieved remarkable performance for human action
recognition, but the feature engineering process is labor-intensive and requires subject-matter
expertise.
Due to these limitations, more research is being directed to deep learning-based approaches.
These approaches have been used in several domains such as image classification, speech
recognition and object recognition. These models have also been explored for human activity
recognition. In [13], a human action recognition method was proposed using an unsupervised
on-line deep learning technique. This method achieved accuracies of 89.86% and 88.5% on the KTH
and UCF Sports datasets, respectively.
Handcrafted feature-based techniques, in particular trajectory-based methods, have less
discriminative power. Conversely, deep network architectures are inefficient at capturing salient
motion. To address this issue, [14] combined deep convolutional networks with trajectories for
action recognition. However, deep learning-based methods also have limitations: these models
require a huge amount of training data, and collecting a huge amount of domain-specific data is
time consuming and expensive. Therefore, training a deep learning model from scratch is not
feasible for domain-specific problems. This problem can be solved by using a pre-trained network
as the source architecture for training the target model with a small dataset, which is known as
transfer learning.
Models from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) such as AlexNet,
GoogleNet and ResNet are publicly available as pre-trained networks and can be used for transfer
learning. One important way to employ existing models for a new task is to use pre-trained models
as feature extractors and combine this deep representation with off-the-shelf classifiers for
action recognition.
Chapter 4
Motivation
Our primary motivation for investing in this project was to build a solid foundation for
implementing a system that can recognize handwritten characters written on a blackboard or
whiteboard. Such a system can be used in an environment where the primary method of instruction
is writing on a mounted board.
The system should be able to recognize the instructor's writing and automatically generate typed
notes in a suitable font for easier understanding and cross-referencing by the students.
Existing systems are available, such as the electronic whiteboard launched by Panasonic. The
electronic whiteboard works by assessing the contents of the written information and creating a
digital version of it. The digital representation can later be shared with people to disseminate
the information. However, such systems have some limitations. Some are listed below:
Expensive
Requires additional hardware
Cannot be attached to an existing blackboard
Machinery may suffer wear and tear
Difficult to maintain
Not power efficient
The fundamental motive of this project is to eliminate the above costs and develop a sustainable
system that overcomes the shortcomings of the existing systems.
Our system will be designed with the user experience and cost in mind. It is also to be designed
so as to obviate additional hardware. In essence, it should be a plug-and-play system.
In this project, we primarily focus on the models that might be suitable for recognizing the
symbols written on the board with reasonable performance.
Chapter 5
Transfer Learning
______________________________________________________________________________
Training a new deep learning model from scratch requires a huge amount of data, high
computational resources, and hours or, in some cases, days of training. In real-world
applications, collecting and annotating a huge amount of domain-specific data is time consuming
and expensive, which makes it quite challenging to apply deep learning models. To overcome this
challenge, researchers have argued that knowledge of previously learned objects assists in
learning new objects through their similarity and connection with the new objects. Based on this
idea, some studies suggest that deep learning models trained for one classification task can be
employed for another classification task. Thus, CNN models trained on a specific dataset or task
can be fine-tuned for a new task, even in a different domain. This concept is known as transfer
learning.
Humans have inherent ways of transferring knowledge between tasks. We recognize and apply
relevant knowledge from previous learning experiences when we encounter new tasks. The more
related a new task is to our previous experience, the more easily we can master it. Common
machine learning algorithms traditionally address isolated tasks. Transfer learning attempts to
change this by developing methods to transfer knowledge learned in one or more source tasks and
use it to improve learning in a related target task.
For a long time, transfer learning has been studied as a machine learning technique for solving
different visual categorization problems. In recent years, due to the explosion of information
such as images, audio and video over the internet, the demands for high accuracy and
computational efficiency have increased. For these reasons, transfer learning has attracted a lot
of interest in the areas of machine learning and computer vision. Where traditional machine
learning techniques have reached their limits, transfer learning opens up new directions for
visual categorization. It has been applied successfully to visual categorization tasks in the
domains of object recognition, image classification and human action recognition.
5.1 Methodology
In computer vision, neural networks usually try to detect edges in their earlier layers, shapes
in their middle layers and task-specific features in their later layers. With transfer learning,
the early and middle layers are reused and only the later layers are retrained. This leverages
the labeled data of the task the model was initially trained on.
For example, consider a model trained to recognize a backpack in an image that will now be used
to identify sunglasses. In the earlier layers, the model has learned to recognize objects, so we
only retrain the later layers so that it learns what separates sunglasses from other objects.
In transfer learning, we try to transfer as much knowledge as possible from the previous task the
model was trained on to the new task at hand. This knowledge can take various forms depending on
the problem and the data. For example, it could be how models are composed, which would allow us
to identify novel objects more easily.
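As a toy illustration of retraining only the later layers, the sketch below keeps a fixed "pre-trained" feature layer frozen and fits only a small logistic output layer by gradient descent. It is plain Python rather than a real CNN, and the weights, the data and the names (`FROZEN_WEIGHTS`, `retrain_head`) are made up for illustration.

```python
import math

# Stand-in for the pre-trained early and middle layers: fixed weights reused as-is.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Reused layers: a fixed linear map followed by ReLU. Never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def head_probability(head, features):
    """Trainable last layer: a logistic unit on top of the frozen features."""
    z = head["bias"] + sum(w * f for w, f in zip(head["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(head, data):
    """Mean binary cross-entropy of the head over (input, label) pairs."""
    total = 0.0
    for x, y in data:
        p = head_probability(head, frozen_features(x))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def retrain_head(head, data, learning_rate=0.1, steps=200):
    """Gradient descent that updates only the head; FROZEN_WEIGHTS is left untouched."""
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0] * len(head["weights"])
        grad_b = 0.0
        for x, y in data:
            f = frozen_features(x)
            error = head_probability(head, f) - y
            grad_w = [g + error * fi for g, fi in zip(grad_w, f)]
            grad_b += error
        head["weights"] = [
            w - learning_rate * g / n for w, g in zip(head["weights"], grad_w)
        ]
        head["bias"] -= learning_rate * grad_b / n
    return head


# Hypothetical new task: label 1 when the first input exceeds the second.
data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
head = {"weights": [0.0, 0.0], "bias": 0.0}

loss_before = mean_loss(head, data)
retrain_head(head, data)
print(loss_before, mean_loss(head, data))
```

Only three numbers (two head weights and a bias) are learned here, which mirrors why retraining the last layers of a large pre-trained network needs far less data and time than training every layer.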
Figure: Basic convergence graph for transfer learning.
Figure: Architecture of the VGG19 neural network, given as an example.
5.2 Applications
The main advantages of transfer learning are that time is saved, that the neural network model
performs better in most cases, and that a lot of data is not needed.
Usually, a lot of data is needed to train a neural network from scratch, but access to enough
data is not always available. That is where transfer learning comes into play: because the model
is already pre-trained, a solid machine learning model can be built with comparatively little
training data. This is especially valuable in Natural Language Processing (NLP), where expert
knowledge is mostly required to create large labeled datasets. A lot of training time can also be
saved, because it sometimes takes days or even weeks to train a deep neural network from scratch
on a complex task.
5.3 Advantages:
1. Higher start: the initial skill (before refining the model) is higher than it would be
without transfer.
2. Higher slope: the rate of improvement of skill during training is steeper than it would be
without transfer.
3. Higher asymptote: the converged skill of the trained model is better than it would be without
transfer.
5.4 Disadvantages:
1. The distribution of the training data used by the pre-trained model should be similar to the
data it will face at test time, or at least not vary too much.
2. The amount of training data for transfer learning should be such that it does not overfit the
model.
3. Layers cannot be removed with confidence to reduce the number of parameters. The number of
layers is essentially a hyper-parameter, and there is no consensus on how to choose it. If the
first convolutional layers are removed, experience suggests the model will not learn well,
because the architecture relies on those layers to find low-level features. Furthermore, if the
first layers are removed, there will be a problem for the dense layers, because the number of
trainable parameters changes. Densely connected layers and deep convolutional layers can be good
candidates for reduction, but it may take time to find how many layers and neurons to remove
without overfitting.
Chapter 6
Implementation
____________________________________________________
For the implementation, we wrote a procedure for comparing different existing transfer learning
models on the MNIST dataset of handwritten digits. Our primary goal is to test the different
transfer learning models on the dataset, after which the best model can be used in conjunction
with additional densely connected hidden layers to predict the output. The feature set from the
existing model is fed to a series of three densely connected layers, the last of which is the
output layer.
The following tools were used for implementing the project:
Google Colaboratory
Python
Keras (with TensorFlow backend)
Numpy
Pandas
Scikit-learn
TensorBoard (for computation graph visualization and visualizing loss)
In this paper, we apply the different transfer learning approaches to the MNIST dataset of
handwritten digits, which serves as a benchmark for performance evaluation.
The following models were used in the experiment with ‘imagenet’ weights.
VGG16
VGG19
ResNet50
Xception
MobileNet
DenseNet
Chapter 7
Results
____________________________________________________
We used a GPU execution environment to train the models. The time taken by the training process
depends on the number of epochs and on the batch size. We used the Keras deep learning framework
to implement the models, together with the built-in pre-trained applications it provides.
7.1 Algorithm for generalized transfer learning
Algorithm: Transfer Learning for digit classification
Input: Input dataset (Dimension - 1 x 784)
Output: Probability of digit (0 to 9)
1. Import pre-trained model
2. Set weights:= ‘imagenet’
3. Reshape the input data to match the size of the input of the model
4. Feed the input dataset into the model
5. Extract bottleneck features
6. Feed the features to the densely connected network
7. Extract output
8. Plot the loss and accuracy over successive iterations
In our implementation, the following were the values of the parameters and the number of
examples for the training process.
1. Batch size – 100
2. Epochs – 100
3. Training size – 60000 images
4. Validation size – 10000 images
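The steps of the algorithm above could be sketched in Keras roughly as follows. This is a minimal sketch under stated assumptions, not the code used for the experiments: VGG16 stands in for any of the six models, the 28×28 digits are zero-padded to 32×32 and copied to three channels because 32×32 is the smallest input VGG16 accepts without its top layers, pixels are simply scaled to [0, 1], and the dense layer widths (256 and 128) and the helper names `prepare_digits` and `build_transfer_model` are hypothetical.

```python
import numpy as np
import tensorflow as tf


def prepare_digits(flat_images, size=32):
    """Reshape rows of 784 pixel values into (size, size, 3) images scaled to [0, 1]."""
    images = np.asarray(flat_images, dtype="float32").reshape(-1, 28, 28)
    pad = (size - 28) // 2
    # Zero-pad the border so the image reaches the input size of the model.
    images = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    # Copy the single grey channel into three channels.
    images = np.repeat(images[..., np.newaxis], 3, axis=-1)
    return images / 255.0


def build_transfer_model(weights="imagenet", size=32, num_classes=10):
    """Frozen pre-trained base followed by a small densely connected classifier."""
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(size, size, 3)
    )
    base.trainable = False  # the base only supplies bottleneck features
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),  # assumed width
        tf.keras.layers.Dense(128, activation="relu"),  # assumed width
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With MNIST loaded through `tf.keras.datasets.mnist.load_data()`, training with the parameters listed above would be a call such as `model.fit(x_train, y_train, batch_size=100, epochs=100, validation_data=(x_val, y_val), callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs")])`, where the variable names and the log directory are assumptions.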
7.2 Metrics
To train the models and measure their performance, we used the following loss function and
optimizer.
Loss function: Sparse categorical cross-entropy
Optimizer: Adam
Experimentally it has been found that this combination is most effective when dealing with sparse
feature sets or data such as images.
The densely connected layers use the ‘Rectified Linear Unit’ (ReLU) activation function, and the
output layer uses the ‘Softmax’ function.
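For readers unfamiliar with these two functions, the following plain-Python sketch shows how Softmax turns the output-layer scores into probabilities and how sparse categorical cross-entropy scores a prediction against an integer label. The scores are made-up values for a single image, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_cross_entropy(probabilities, label):
    """Loss for one example whose true class is given as an integer index."""
    return -math.log(probabilities[label])


# Hypothetical output-layer scores for one image whose true digit is 3.
logits = [0.1, 0.2, 0.0, 4.0, 0.3, 0.1, 0.0, 0.2, 0.1, 0.0]
probabilities = softmax(logits)
print(sparse_categorical_cross_entropy(probabilities, label=3))

# A network that assigns equal probability to all ten digits scores ln(10).
uniform = softmax([0.0] * 10)
print(sparse_categorical_cross_entropy(uniform, label=3), math.log(10))
```

Since ln(10) ≈ 2.303 is the loss of a uniform guess over ten digits, it is a useful reference point when reading the loss values reported in this chapter.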
The results are plotted below with supplemental visual representation for better understanding.
All the models are trained with the ‘imagenet’ weights.
All the values shown in the graphs are actual results. The data is visualized as ‘loss’ and
‘accuracy’, in that order.
The blue line refers to the results corresponding to the validation data.
The orange line refers to the results corresponding to the training data.
7.3 Analysis
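1. DenseNet
[Figures: loss and accuracy plotted against epochs]
Loss: 14.6499
Accuracy: 6.24 %
2. MobileNet
[Figures: loss and accuracy plotted against epochs]
Loss: 2.4834
Accuracy: 5.15 %
3. Xception
[Figures: loss and accuracy plotted against epochs]
Loss: 2.5570
Accuracy: 15.09 %
4. ResNet50
[Figures: loss and accuracy plotted against epochs]
Loss: 7.1141
Accuracy: 22.59 %
5. VGG19
[Figures: loss and accuracy plotted against epochs]
Loss: 0.1177
Accuracy: 96.64 %
6. VGG16
[Figures: loss and accuracy plotted against epochs]
Loss: 0.1354
Accuracy: 96.19 %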
7.4 Inference
From the results above, it can be inferred that the VGG19 and VGG16 architectures provide the
best results under the stipulated constraints. Since VGG16 and VGG19 hardly differ
architecturally, both can be used to solve similar problems. In this case, the problem of
handwritten digit recognition is solved with a high validation accuracy of 96.64 % for VGG19 and
96.19 % for VGG16.
Therefore, we conclude that VGG19 or VGG16 can be used for handwritten digit recognition and for
related tasks such as character recognition.
The table below gives the results for the different neural network architectures.
Architecture   Loss      Accuracy (%)
DenseNet       14.6499   6.24
MobileNet      2.4834    5.15
Xception       2.5570    15.09
ResNet50       7.1141    22.59
VGG19          0.1177    96.64
VGG16          0.1354    96.19
Chapter 8
Conclusion
____________________________________________________
8.1 Summary
In this project, we introduced to the reader the motivation for building a character recognition
system using transfer learning methodologies. We first explained the drawbacks of existing
systems and set out goals for our system that would alleviate them. We then explained the method
of transfer learning and cited its advantages and disadvantages relative to conventional methods.
We analyzed the different architectures of the pre-existing models and implemented them, plotting
graphs of the metrics of the respective models. In the end, we stated the model best suited to
the problem of recognizing digits, which can also be applied to similar problems such as
character recognition.
8.2 Cost Analysis
Since the dataset and the implementation were entirely cloud based, no additional costs were
incurred. The only miscellaneous cost that may be incurred is that of a high-resolution camera
for efficient extraction of features and accurate recognition of characters and digits.
8.3 Challenges
We encountered numerous challenges during the planning and implementation of the project.
Some of them are listed below.
Deciding the correct neural network architecture for the solution.
Transforming the shape of the input data so that it can be fed into the input layer.
Selecting a cloud environment in which to run the implemented code.
8.4 Planning and Project Management
Following is the detailed project planning and management:
Activity                              Start Date   Number of weeks
Literature Review                     10-Oct       1
Finalizing problem definition         18-Oct       1
Requirement gathering                 24-Oct       1
Implementation                        10-Nov       2
Result analysis                       24-Nov       1
Preparation of project report         4-Nov        1
Preparation of project presentation   4-Nov        1
References
[1] W. Hu, A. M. Gharuib, A. Hafez, "Template Match Object Detection for Inertial Navigation
Systems", Scientific Research (SCIRP), pp. 78-83, May 2011.
[2] E. Shechtman, M. Irani, "Matching Local Self-Similarities across Images and Videos", IEEE
International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[3] S. A. Medjahed, "A Comparative Study of Feature Extraction Methods in Images Classification",
I.J. Image, Graphics and Signal Processing.
[4] X. Y. Stella, R. Gross, J. Shi, "Concurrent Object Recognition and Segmentation by Graph
Partitioning", Neural Information Processing Systems, Vancouver, pp. 1-8, 2002.
[5] A. Suga, K. Fukuda, T. Takiguchi, Y. Ariki, "Object Recognition and Segmentation using SIFT
and Graph cuts", 19th IEEE Conference on Pattern Recognition, pp. 1-4, 2008.
[6] A. S. Mian, M. Bennamoun, R. Owens, "Three-Dimensional Model-Based Object Recognition and
Segmentation in Cluttered Scenes", IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 28, no. 10, pp. 1584-1601, 2006.
[7] S. Ullman, "Object recognition and segmentation by a fragment-based hierarchy", Trends in
Cognitive Sciences, vol. 11, no. 12, pp. 58-64, 2007.
[8] J. B. Kim, H. S. Park, M. H. Park, H. J. Kim, "Unsupervised Moving Object Segmentation and
Recognition Using Clustering and a Neural Network", International Joint Conference on Neural
Network, pp. 1240-1245, 2002.
[9] M. Lehtomäki, A. Jaakkola, J. Hyyppä, J. Lampinen, H. Kaartinen, A. Kukko, E. Puttonen,
H. Hyyppä, "Object Classification and Recognition from Mobile Laser Scanning Point Clouds in a
Road Environment", IEEE Transactions on Geoscience and Remote Sensing, vol. PP, no. 99,
pp. 1-14, 2015.
[10] A. B. Sargano, "Human action recognition using transfer learning with deep representations",
2017 International Joint Conference on Neural Networks (IJCNN), IEEE.
[11] H. Azizpour, S. A. Razavian, J. Sullivan, "From generic to specific deep representations for
visual recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, 2015.
[12] S. A. Razavian, H. Azizpour, J. Sullivan, "CNN features off-the-shelf: an astounding
baseline for recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops, 2014.
[13] K. Charalampous, A. Gasteratos, "On-line deep learning method for action recognition",
Pattern Analysis and Applications, vol. 19, no. 2, pp. 337-354, 2016.
[14] L. Wang, Y. Qiao, X. Tang, "Action recognition with trajectory-pooled deep-convolutional
descriptors", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2015.
[15] L. Cao, Z. Liu, T. S. Huang, "Cross-dataset action detection", IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2010.