Facial Analysis using Jacobians and Gradient Boosting
Vinay A, Abhijay Gupta, Vinayaka R Kamath, Aprameya Bharadwaj, Arvind Srinivas, KN Balasubramanya Murthy and S Natarajan
Centre for Pattern Recognition and Machine Intelligence, PES University, Bengaluru, India
Abstract. Security and identity have become primary concerns in this digital world, and person authentication and identification are transforming the way such services are provided. Earlier this was achieved mainly through passwords and patterns, but with significant advances in face recognition technologies, face recognition is now exploited to provide authentication on smartphones and computers. Face Recognition (FR) extends to applications such as face tagging in social media and surveillance systems at theaters, airports, and so on. The proposed mathematical model employs linear algebra and mathematical simulations for the task of person identification. Kernel singular value decomposition is used to denoise the facial image, which is then passed to a feature detector and descriptor based on nonlinear diffusion filtering. The obtained descriptors are quantized into a vector using an encoding model called VLAD, which uses k-means++ for clustering. Classification is then done using gradient boosted decision trees. The proposed pipeline aims at reducing the average computational power required while enhancing the efficiency of the system. The proposed system has been tested on three benchmark datasets, namely Faces95, Faces96, and Grimace.
Keywords: Linear Algebra · Kernel-SVD · Feature Quantization · Gradient Boosted Decision Tree.
1 Introduction
Recognizing a person among the masses is important for a wide variety of applications, and face recognition is gaining prominence in almost every field where humans and machines interact. Face recognition involves matching a facial biometric blueprint against an existing database of individuals. This can be achieved by several means: the images of the subjects are put through a series of mathematical operations that capture these blueprints in the desired format. The choice of these operations determines both the correctness of the application built on them and the computational demand the pipeline incurs when recognizing a subject. Choosing the right mathematical models therefore plays a prominent role in directly influencing the end result: the right preprocessing techniques, an optimal feature extraction algorithm, and an appropriate representation of the extracted features.
Such a system faces numerous problems in practice, with many factors affecting the image and imparting variations in the end result. Variations in illumination and in the pose of the subject constitute a major hurdle, and recognizing these variations and mapping them to the same class is a further challenge. A procedure may give good results on frontal views of a person while performing poorly when the subject turns away from the source capturing the image. Facial hair such as a mustache or beard can cause loss of features in the lower half of the face, contributing to imperfect output. Distortions may come from the subject itself or from variations in the background, and they need to be handled to avoid inconsistent decisions. Even though modern techniques claim to overcome these challenges, the scope for improvement is perennial. Keeping the system computationally light is another characteristic to keep in mind when designing a model. Many face recognition algorithms assume that a large number of samples per person is available for training, an assumption far removed from real-world scenarios. There is also a need to make face recognition resistant to aging, since wrinkles developed over time can bring significant changes to a person's appearance. The task as a whole can be broken down into three key aspects: detecting the region of interest, extracting the prominent key features, and representing these features for classification. Most of the obstacles mentioned hinder the robustness of the descriptors, which brings out the need for strategies that can overcome the problems listed. It may be difficult to design a model that overcomes every hitch, but systems that excel under specific conditions can be drafted to fit use cases closer to the real world.
2 Related Work
In their recent work, D. Suter and K. Schindler used incremental kernel SVD [1] to achieve face recognition with image sets. They extended a popular linear subspace updating algorithm to the nonlinear case via the kernel methodology and applied a reduced set construction method to generate sparse expressions [2] for the derived subspace basis, keeping processing speed and memory usage constant.
In [3], KPCA was used to extract feature descriptors from numerous images for use in mobile robot navigation and localization. Reduced set (RS) expansions are constructed to compress the KPCA-derived bases and reduce the computational load during their use. In [4], AKAZE was used in remote sensing [5] image matching. Distortions caused by changes in camera orientation were first modeled by variously tilted images; the keypoints were then localized by an improved Accelerated-KAZE (AKAZE) algorithm. The feature points are detected in a nonlinear scale space constructed by Fast Explicit Diffusion (FED) with the help of a variable conductance function, and the resulting feature points are described by an improved SIFT [6] descriptor. Finally, the Euclidean distance metric was used to determine correspondences, and the Random Sample Consensus (RANSAC) [7] algorithm was used to eliminate false matches.
Given a collection of local features taken from an image, VLAD is generated by quantizing the local features against a visual vocabulary, collecting the residual statistics of the quantized features for each centroid, and summing the aggregated residual vectors from each centroid. Search accuracy can be improved by increasing the size of the vocabulary, but this proves costly both in memory and in sheer computational power. Demonstrating a remarkable accuracy-efficiency trade-off, VLAD has gained prominence in the community, and a large number of extensions have been proposed. In [8], an in-depth analysis of the framework was made, aimed at a thorough understanding of its various processing steps and at inflating its overall performance. It involved the evaluation of various existing and novel extensions along with a study of the consequences of several unexplored parameters, focusing on more productive local features, better aggregated representations, and tuning of the indexing scheme for better results. The authors produced insights into extensions that contributed to performance, and into multiple others that did not.
In [9], discriminative feature descriptors were constructed as an application of the Vector of Locally Aggregated Descriptors. A hierarchical multi-VLAD was introduced to negotiate the trade-off between descriptor discriminability and computational complexity, and a tree-structured hierarchical quantization (TSHQ) was constructed to speed up VLAD computation with a large vocabulary. Since quantization error propagates from root to leaf node with TSHQ, a multi-VLAD is built by constructing a VLAD descriptor for each level of the vocabulary tree to cope with the quantization error [10] at that level. Extensive analysis on various benchmark datasets showed the proposed approach to be far better than the state-of-the-art methods in terms of retrieval accuracy, extraction speed, and memory cost. A mechanism for pose-based human action recognition using Extreme Gradient Boosting [11] was proposed by Vina Ayumi, along with a clear comparison of gesture recognition using SVM [12], naive Bayes [13], and XGBoost. Clustering is a strategy of grouping a set of objects such that objects sharing similar properties lie closer to each other than to points in other clusters. A comparative study of partition-based clustering techniques, such as K-Means [14], K-Means++ [15], and the Fuzzy C-Means clustering algorithm, was presented by Kapoor and Singhal, along with a methodology for obtaining better results by feeding sorted and unsorted data into the algorithms. Gradient Boosting Decision Tree (GBDT) has several effective implementations such as XGBoost and pGBRT, but these are unsatisfactory when there is a great deal of data to process. To overcome this hurdle, researchers from Microsoft designed two techniques, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB); the resulting LightGBM [16] speeds up the training of conventional GBDT by over 20 times without compromising accuracy.
Fig. 1. Schematic of the proposed pipeline
3 Proposed Methodology
This section explains the procedures implemented to achieve face recognition. The pipeline of the proposed model is represented in Fig 1.
3.1 Image De-Noising
To analyze collections of images that are numerous and compound in nature, subspaces that are nonlinear are required. To achieve this, the input data is mapped nonlinearly to a higher-dimensional space by a function $\phi : \mathbb{R}^m \rightarrow J$, and singular value decomposition is then performed in $J$. The kernel function $K$ induces the mapping $\phi$ by providing the inner products between the mapped input data in the feature space. The matrix $A = [\,\phi(x_1) \ldots \phi(x_n)\,]$ is obtained by transforming the input data through $\phi$. Let $M = A^T A$; by eigenvalue decomposition we can deduce that $M = Q \delta Q^T$, where $M$ is the matrix of inner products between the columns of $A$ and is evaluated using the kernel function. The rank-$r$ singular factorization of $A$ is given by:

$$A_r = [A Q_r (\delta_r)^{-1/2}]\,[(\delta_r)^{1/2}]\,[(Q_r)^T] = U_r \sigma_r (V_r)^T \qquad (1)$$

where $Q_r = Q[:, 1:r]$ and $\delta_r = \delta[1:r, 1:r]$. The matrix $M$ is positive semi-definite, being obtained from a Mercer kernel. The basis $U_r$ is given by:

$$U_r = A Q_r (\delta_r)^{-1/2} =: A\alpha \qquad (2)$$
Fig. 2. Image De-noising using Kernel-SVD
The basis $U_r$ is thus a linear combination of the mapped input data. A second basis $X_r = B\mu$ is acquired from another set of mapped data $B$ by applying KSVD to it. The resulting matrix $D$ can then be computed as:

$$D = (U_r)^T X_r = \alpha^T A^T B \mu \qquad (3)$$

where the kernel function supplies the entries of $A^T B$. Applying SVD to $D$ gives:

$$Y^T D Z = \theta \qquad (4)$$

Here $\mathrm{diag}(\theta) = \{\theta_1, \ldots, \theta_r\}$ gives the principal angles formed by $\mathrm{span}(U_r)$ and $\mathrm{span}(X_r)$; functions of $\theta$ are often used to measure the distance between subspaces. Fig 2 shows the denoising of images.
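As an illustration of the denoising idea, the sketch below keeps only the leading singular components of a noisy matrix. This is the plain linear-SVD analogue of the Kernel-SVD step, which applies the same truncation after the kernel mapping $\phi$; the synthetic image, rank choice, and noise level here are assumptions for the demonstration, not the paper's settings.

```python
import numpy as np

def svd_denoise(img, rank):
    """Low-rank reconstruction: keep only the top-`rank` singular values.

    Kernel-SVD performs the same truncation after mapping the data into a
    feature space via a Mercer kernel; this linear version shows the core idea.
    """
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

# Synthetic "image": a low-rank pattern corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 3, 64)))
noisy = clean + 0.05 * rng.standard_normal((64, 64))

denoised = svd_denoise(noisy, rank=2)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
# The low-rank projection discards most of the full-rank noise.
print(round(err_noisy, 2), round(err_denoised, 2))
```

Noise spreads its energy across all singular directions, so truncating to the signal's rank removes the bulk of it while preserving the structure the later feature detector needs.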
3.2 Feature detection and description
In this method, the input image is first converted to grayscale, which removes unwanted features that arise from variation in color. To detect facial features at different scale levels, the determinant of the Hessian matrix is computed for each filtered image $L^i$:

$$L^i_{Hessian} = \sigma^2 \left( L^i_{xx}\, L^i_{yy} - (L^i_{xy})^2 \right) \qquad (5)$$
Fig. 3. Keypoint detection using AKAZE
Here $L$ represents a filtered image in a nonlinear scale space. The Fast Explicit Diffusion (FED) technique is used to speed up the nonlinear scale space computation, and the features extracted in this scale space are used to describe the different characteristics of the image. Robustness and rotation invariance are achieved, respectively, by binary tests between the means of areas, and by estimating the orientation of each facial interest point and rotating the grid of the local binary descriptor accordingly. Gradient and intensity information are central to fast detection and description of features in the image. The descriptor obtained is then used for feature aggregation and finally for classification. Keypoint detection using AKAZE can be seen in Fig 3.
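The determinant-of-Hessian response of Eq. (5) can be sketched with finite differences. This is only an illustration of the response map on a synthetic blob, not the AKAZE implementation, which builds its scale space with FED and uses more careful derivative filters; the blob size and $\sigma$ below are assumptions.

```python
import numpy as np

def hessian_response(L, sigma):
    """Scale-normalised determinant of the Hessian, cf. Eq. (5).

    L is a filtered image at one level of the scale space; second
    derivatives are approximated with repeated central differences.
    """
    Ly, Lx = np.gradient(L)          # first derivatives (rows, cols)
    Lxy, Lxx = np.gradient(Lx)       # d/dy and d/dx of Lx
    Lyy, _ = np.gradient(Ly)         # d/dy of Ly
    return sigma ** 2 * (Lxx * Lyy - Lxy ** 2)

# A bright Gaussian blob on a dark background: the kind of structure a
# blob detector such as (A)KAZE responds to most strongly at its centre.
y, x = np.mgrid[0:41, 0:41]
blob = np.exp(-((x - 20) ** 2 + (y - 20) ** 2) / (2 * 4.0 ** 2))
resp = hessian_response(blob, sigma=4.0)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(tuple(int(i) for i in peak))  # → (20, 20), the blob centre
```

At a blob centre both principal curvatures are large with the same sign and the cross term vanishes, so the determinant peaks there; keypoints are taken at such maxima across scale levels.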
3.3 VLAD
Once the feature vectors are obtained by applying the descriptors to the images, they have to be aggregated. These vectors are long and contain a great deal of information about the keypoints being described, such as color, location, intensity, and information about neighboring pixels. However, not all of this information is needed for classification; if the superfluous features are not discarded, classification becomes harder and the time complexity increases. To prevent this, the features are quantized using the vector of locally aggregated descriptors (VLAD), a non-probabilistic Fisher kernel that uses a codebook computed with the k-means++ algorithm. Each descriptor $x_t$ is affiliated with its closest visual word in the codebook. Let $\mu_1, \mu_2, \ldots, \mu_k$ represent the codewords; the difference $x_t - \mu_i$ is accumulated into a vector $l_i$. The algorithm for feature quantization is described below. In it, the second for loop accumulates the residuals, the third for loop applies power normalization, and L2 normalization is finally applied to $V$. The resulting vector is the quantized version of our facial descriptors and is apt for the classification job.
Algorithm 1 Computing the descriptor V from a set of descriptors x_1, x_2, ..., x_T,
given codewords µ_1, µ_2, ..., µ_k computed using the k-means++ algorithm

for i = 1, ..., K do
    l_i := 0_d
end for
for t = 1, ..., T do
    i := index of the codeword nearest to x_t
    l_i := l_i + (x_t − µ_i)
end for
V := [l_1^T ... l_K^T]
for u = 1, ..., K·d do
    V_u := sign(V_u) · |V_u|^α
end for
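The steps above can be sketched directly. The sketch below uses scikit-learn's KMeans with k-means++ initialization as the codebook and random vectors as stand-ins for the facial descriptors; the value α = 0.5 for the power normalisation is an assumption, as the paper does not state it.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlad(descriptors, kmeans, alpha=0.5):
    """VLAD encoding as in Algorithm 1: accumulate residuals x_t - mu_i
    against each descriptor's nearest codeword, flatten, apply signed
    power normalisation, then L2-normalise."""
    k, d = kmeans.cluster_centers_.shape
    assign = kmeans.predict(descriptors)          # nearest codeword per x_t
    V = np.zeros((k, d))
    for x, i in zip(descriptors, assign):
        V[i] += x - kmeans.cluster_centers_[i]    # residual accumulation
    V = V.ravel()
    V = np.sign(V) * np.abs(V) ** alpha           # power normalisation
    norm = np.linalg.norm(V)
    return V / norm if norm > 0 else V            # L2 normalisation

rng = np.random.default_rng(1)
descs = rng.standard_normal((200, 8))             # stand-in for AKAZE descriptors
codebook = KMeans(n_clusters=16, init="k-means++", n_init=5,
                  random_state=0).fit(descs)
v = vlad(descs, codebook)
print(v.shape)  # (128,): one K·d-dimensional vector per image
```

Each image thus yields a single fixed-length vector regardless of how many keypoints it produced, which is what the classifier in the next step consumes.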
3.4 Classification using Gradient Boosted Trees
After the features are quantized into a vector, they are passed to tree boosting algorithms for person classification. Two implementations based on gradient boosted decision trees are used in this paper, providing a comparison between the speed and the accuracy of the model.
The first is XGBoost, a scalable end-to-end tree boosting algorithm. Boosting combines a set of relatively weak learners into a complex predictor that tends to have a low error rate, as each learner learns from the mistakes of the previous one. The previous learners' weights are also accounted for, and at each iteration they are updated with respect to the residuals. Multiple decision trees are constructed with a specific number of terminal nodes per tree, six in our case. This allows intercommunication of node values within a tree, resulting in better feature understanding. Gradient descent is used to minimize the error. The algorithm is sparsity-aware, providing robust and inexpensive computation.
The second implementation is LightGBM, which uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). To compute information gain at a faster rate, GOSS samples the data instances with large gradients, as they contribute more to the final classification. EFB is a greedy technique that bundles mutually exclusive features, reducing the feature count to gain speed at a small cost in accuracy.
Together, these two implementations of gradient boosted decision trees provide a robust and reliable method for classifying facial features.
4 Datasets Used
4.1 Faces95
Faces95, as shown in Fig 4, is a collection of facial images of 72 individuals with large head-scale variations. The dataset provides images of resolution 180 x 200 pixels in portrait mode. The collection contains images of both male and female subjects, delivering a challenge in the upper half of the region of interest as well. No background disfigurement is offered, but slight variations are observed in the red background because of shadows and changes in illumination. The person is not stationary and is subject to slight movements, which are reflected in brightness changes in the region of interest; the artificial lighting adds to the changes in glare. The dataset offers only meager variations in the subjects' expressions, which is not a significant obstacle, and the same hairstyle is maintained across all sample images of a single subject.
Fig. 4. Sample record from the Faces95 Dataset
4.2 Faces96
Using a still camera, 152 subjects were photographed with 20 images per person. The database possesses images of size 192 x 192 pixels of both male and female candidates. One of the paramount hurdles offered by Faces96, however, is the variation in the background, which lies outside the region of interest; the major challenge constituted by the dataset is variation in the trivial parts of the images. The individuals proffer only slight changes in expression, while tremendous changes in head scale are exhibited. The person also moves towards the camera between test images, delivering changes in the lighting conditions as well. The dataset demonstrates variations in head tilt, turn, and slant, though these are not significant. The collection was designed and maintained by Dr. Libor Spacek under the Computer Vision Science Research Projects.
4.3 Grimace
The individual moves his/her head after every picture and makes grimaces, which grow drastic towards the end of every sequence. A set of 18 individuals was put through this process to form the collection. The images are of size 180 x 200 pixels, which carries a considerable amount of information for further processing. No variations in background or head scale are presented, although a considerable amount of discrepancy is unveiled in head tilt and turn. Very little variability is presented in terms of illumination, and slightly more in the positioning of the head in the image. The database, as shown in Fig 5, offers excessive fluctuations in the expressions of the subjects, which delivers a major challenge for all mathematical models.
Fig. 5. Subject exhibiting variations in facial expressions
5 Results and Conclusion
We executed our proposed model on every group of images present in the three benchmark databases, namely Faces95, Faces96, and Grimace. The results obtained using our technique on the three datasets are tabulated in Table 1. On Faces95 and Faces96 our model predicts correctly 91% of the time on average. Even though these datasets contain a high number of classes (72 and 152), our model is still able to identify each individual with a good confidence level. Grimace contains faces with extreme variations in expression, illumination, and translation, yet our model is able to correctly identify the person with an accuracy of 93.33%.
Table 1. Performance Grid of XGBoost
Dataset Accuracy Recall Precision Prediction Time (100 images)
Faces95 0.9214 0.9214 0.9312 0.42ms
Faces96 0.90 0.9010 0.9134 0.94ms
Grimace 0.9333 0.9333 0.9467 0.13ms
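The metrics reported in Table 1 can be reproduced for any prediction run with scikit-learn. The toy labels below and the macro averaging mode are assumptions, since the paper does not state how the multi-class precision and recall were averaged.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative 3-class ground truth and predictions (not the paper's data).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, average="macro")
prec = precision_score(y_true, y_pred, average="macro")
print(round(acc, 3), round(rec, 3), round(prec, 3))  # → 0.875 0.889 0.917
```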
Kernel-SVD plays an important role in denoising the image, reducing a few
features and thereby helping to lower computation cost in further steps.
Table 2. Accuracy with a feature-aggregation vector of size 200 vs. 100
Dataset  Accuracy (size 200)  Accuracy (size 100)
Faces95  0.9214               0.8927
Faces96  0.90                 0.8533
Grimace  0.9333               0.9167
Changing the vector size of the feature aggregator impacts the accuracy considerably. Increasing the number of features in the quantization step improves the accuracy of the model up to a certain extent, as can be seen in Table 2.
Table 3 shows a direct comparison of XGBoost with LightGBM for feature classification. Both algorithms ran for 500 epochs, with the maximum depth of each tree set to 7 and the learning rate set to 0.5. From Table 3 it can be seen that LightGBM outperforms XGBoost on all three datasets by an average margin of 2%.
Table 3. Comparison of XGBoost with LightGBM
Dataset LightGBM XGBoost
Faces95 0.94 0.9214
Faces96 0.9367 0.90
Grimace 0.9633 0.9333
The framework proposed is a mathematical model employing singular value decomposition for image denoising. A Hessian matrix is then computed to extract features from the facial image, and a gradient boosting algorithm performs classification by minimizing a loss function. The aim of the paper is thus accomplished using mathematical models, doing justice to the theme of the conference.
A direct comparison of our proposed method with state-of-the-art models cannot be made for the following reasons. Implementation and testing of the modules are performed on different hardware, which can yield different efficiency results, and the datasets and pre-processing steps used by the state-of-the-art models differ from ours, which leads to variation in results. The authors of [17] developed a deep hypersphere embedding for face recognition, achieving a remarkable accuracy of 95% on the YTF dataset. Chain-code-based local descriptors are proposed in [18] for face recognition; tested on CAS-PEAL, ColorFERET, and FG-NET, they achieve an average accuracy of 98%. A deep learning approach was developed in [19], where a trunk-branch ensemble convolutional neural network was designed to address pose variation and occlusion, resulting in an average accuracy of 95% on the PaSC, COX Face, and YouTube Faces datasets. A different approach was proposed in [20] using a multi-resolution wavelet combining the discrete cosine transform and the Walsh transform, which achieved an accuracy of 99.24% on the Faces94 dataset.
References

1. Tat-Jun Chin, K. Schindler and D. Suter, "Incremental kernel SVD for face recognition with image sets," 7th International Conference on Automatic Face and Gesture Recognition (FGR06), Southampton, 2006, pp. 461-466.
2. Z. W. Wang, M. W. Huang and Z. L. Ying, "The performance study of facial expression recognition via sparse representation," 2010 International Conference on Machine Learning and Cybernetics, Qingdao, 2010, pp. 824-827.
3. J. Meltzer, M.-H. Yang, R. Gupta, and S. Soatto, "Multiple view feature descriptors from image sequences via kernel PCA," in ECCV, 2004, pp. 215-227.
4. Y. Liu, C. Lan, F. Yao, L. Li and C. Li, "Oblique remote sensing image matching based on improved AKAZE algorithm," 2016 Sixth International Conference on Information Science and Technology (ICIST), Dalian, 2016, pp. 448-454.
5. U. Sakarya et al., "A short survey of hyperspectral remote sensing and hyperspectral remote sensing research at TUBITAK Uzay," 2015 7th International Conference on Recent Advances in Space Technologies (RAST), Istanbul, 2015, pp. 187-192.
6. J. Liu, Q. Wu and X. Li, "Research on Image Matching Algorithm Based on Local Invariant Features," 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Beijing, 2013, pp. 113-116.
7. M. Hu, J. Chen and C. Shi, "Three-dimensional mapping based on SIFT and RANSAC for mobile robot," 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, 2015, pp. 139-144.
8. E. Spyromitros-Xioufis, S. Papadopoulos, I. Y. Kompatsiaris, G. Tsoumakas and I. Vlahavas, "A Comprehensive Study Over VLAD and Product Quantization in Large-Scale Image Retrieval," in IEEE Transactions on Multimedia, vol. 16, no. 6, pp. 1713-1728, Oct. 2014.
9. Y. Wang, L. Y. Duan, J. Lin, Z. Wang and T. Huang, "Hierarchical multi-VLAD for image retrieval," 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 4629-4633.
10. Z. Lu, F. Xu and Q. Tian, "Research on quantization errors of stability for model-based networked control system," 2012 Proceedings of International Conference on Modelling, Identification and Control, Wuhan, Hubei, China, 2012, pp. 867-872.
11. V. Ayumi, "Pose-based human action recognition with Extreme Gradient Boosting," 2016 IEEE Student Conference on Research and Development (SCOReD), Kuala Lumpur, 2016, pp. 1-5.
12. Q. Shubo, G. Shuai and Z. Tongxing, "Research on Paper Defects Recognition Based on SVM," 2010 WASE International Conference on Information Engineering, Beidaihe, Hebei, 2010, pp. 177-180.
13. G. Qiang, "Research and improvement for feature selection on naive bayes text classifier," 2010 2nd International Conference on Future Computer and Communication, Wuhan, 2010, pp. V2-156-V2-159.
14. S. Na, L. Xumin and G. Yong, "Research on k-means Clustering Algorithm: An Improved k-means Clustering Algorithm," 2010 Third International Symposium on Intelligent Information Technology and Security Informatics, Jinggangshan, 2010, pp. 63-67.
15. A. Kapoor and A. Singhal, "A comparative study of K-Means, K-Means++ and Fuzzy C-Means clustering algorithms," 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT), Ghaziabad, 2017.
16. Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu, "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," Conference on Neural Information Processing Systems, 2017.
17. Liu, Weiyang, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song, "SphereFace: Deep hypersphere embedding for face recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2017.
18. Karczmarek, Paweł, Adam Kiersztyn, Witold Pedrycz, and Michał Dolecki, "An application of chain code-based local descriptor and its extension to face recognition," Pattern Recognition 65 (2017): 26-34.
19. Ding, Changxing, and Dacheng Tao, "Trunk-branch ensemble convolutional neural networks for video-based face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
20. Choudhary, Alpa, and Rekha Vig, "Face recognition using multiresolution wavelet combining discrete cosine transform and Walsh transform," in Proceedings of the 2017 International Conference on Biometrics Engineering and Application, pp. 33-38, ACM, 2017.
... The proposed framework uses 17 classifiers, 10 of which are machine learning models [23,24], and rest of them are deep and pre-trained transfer learning models. Hence, several machine learning classifiers are used such as Adaboost [25], Decision Tree (DT) [26,27], Gradient Boosting (GB) [28], K-Nearest Neighbour (KNN) [29], Logistic Regression (LR) [30], Multi-layer Perceptron (MLP) [31], Naïve Bayes (NB) [32], Random Forest (RF) [33], Support Vector Machine (SVM) [34], Gradient Boosting (XGB) [35], Convolutional Neural Network (CNN) and pre-trained CNNs are DenseNet121 [36], ResNet50 [33], VGG16 [37], VGG19 [38], MobileNet-V1 [39] and MobileNet-V2 [40]. However, the results of default pre-trained models were not promising; hence, we appended several additional layers in each of models. ...
Full-text available
Autism spectrum disorder (ASD) is a complex neuro-developmental disorder that affects social skills, language, speech and communication. Early detection of ASD individuals, especially children, could help to devise and strategize right therapeutic plan at right time. Human faces encode important markers that can be used to identify ASD by analyzing facial features, eye contact, and so on. In this work, an improved transfer-learning-based autism face recognition framework is proposed to identify kids with ASD in the early stages more precisely. Therefore, we have collected face images of children with ASD from the Kaggle data repository, and various machine learning and deep learning classifiers and other transfer-learning-based pre-trained models were applied. We observed that our improved MobileNet-V1 model demonstrates the best accuracy of 90.67% and the lowest 9.33% value of both fall-out and miss rate compared to the other classifiers and pre-trained models. Furthermore, this classifier is used to identify different ASD groups investigating only autism image data using k-means clustering technique. Thus, the improved MobileNet-V1 model showed the highest accuracy (92.10%) for k = 2 autism sub-types. We hope this model will be useful for physicians to detect autistic children more explicitly at the early stage.
Conference Paper
Full-text available
This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter m. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.
Full-text available
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blurinsensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.
Conference Paper
Hyperspectral remote sensing (HSRS) is becoming increasingly attractive. Recent advances in sensor technologies have enabled numerous applications of this imaging modality. HSRS research has been conducted at TUBITAK UZAY since 2012. This paper provides a short survey of these research and development activities, ranging from hyperspectral remote sensing applications, radiometric correction, geometric correction and denoising to classification and fusion of HS data with other modalities, using the most recent algorithms.
Conference Paper
In this paper, a face recognition system based on a multi-resolution hybrid wavelet approach is presented. The multi-resolution hybrid wavelet transform matrix is generated as the Kronecker product of the Walsh and DCT transform matrices. This wavelet is used to extract features from face images with different expressions of the subjects' faces. A feature map is generated using an energy compaction technique, which serves as a template to extract features from enrolled and test images. The experiments are performed on the faces94 database with variations in facial expression, face position and occlusion. The recognition rate achieved is 99.24%.
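The Kronecker-product construction can be sketched directly: combining a 2×2 orthonormal Walsh (Hadamard) matrix with an orthonormal DCT-II matrix yields an orthonormal hybrid transform, applied separably to an image as T·img·Tᵀ. The sizes below (2×2 Walsh, 4×4 DCT) are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def walsh2():
    # 2x2 orthonormal Walsh (Hadamard) matrix
    return np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def dct_matrix(n):
    # orthonormal DCT-II matrix of size n x n
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)  # DC row rescaled for orthonormality
    return C

def hybrid_wavelet(n_dct=4):
    # Kronecker product of the two bases gives the hybrid transform matrix
    return np.kron(walsh2(), dct_matrix(n_dct))

def transform_image(img, T):
    # separable 2-D transform: coefficients = T * img * T^T
    return T @ img @ T.T
```

Since the Kronecker product of orthonormal matrices is orthonormal, the transform is lossless and the inverse is simply Tᵀ·coeffs·T, which is what makes energy-compaction-based feature selection on the coefficients well defined.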
Conference Paper
This paper investigates action recognition using Extreme Gradient Boosting (XGBoost), a supervised classification technique based on an ensemble of decision trees. In this study, we also compare the performance of XGBoost with two other machine learning techniques, Support Vector Machine (SVM) and Naive Bayes (NB). The experimental study on the human action dataset shows that XGBoost outperforms SVM and NB in classification accuracy. Although it takes more computational time, XGBoost performs good classification on action recognition.
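Gradient boosting, the mechanism behind XGBoost, fits each new tree to the negative gradient (pseudo-residuals) of the loss at the current prediction. XGBoost itself adds second-order information and regularisation; the sketch below shows only the first-order core idea, using depth-1 decision stumps on a single feature for brevity:

```python
import numpy as np

def fit_stump(x, residual):
    # best threshold split of a 1-D feature minimising squared error to residual
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        lv, rv = left.mean(), right.mean()
        err = ((left - lv) ** 2).sum() + ((right - rv) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]  # (threshold, left value, right value)

def gradient_boost(x, y, rounds=20, lr=0.3):
    # each round fits the negative gradient of logistic loss, y - sigmoid(F),
    # and adds a shrunken stump to the additive model F
    F = np.zeros_like(y, dtype=float)
    stumps = []
    for _ in range(rounds):
        p = 1.0 / (1.0 + np.exp(-F))
        t, lv, rv = fit_stump(x, y - p)        # pseudo-residuals
        F += lr * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    return stumps

def predict(x, stumps, lr=0.3):
    F = np.zeros_like(x, dtype=float)
    for t, lv, rv in stumps:
        F += lr * np.where(x <= t, lv, rv)
    return (F > 0).astype(int)                 # sign of the boosted score
```

The learning-rate shrinkage (lr) trades more rounds for better generalisation, the same knob exposed as `eta` in the real XGBoost library.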
Local descriptors are a widely used feature extraction technique for obtaining information about both local and global properties of an object. Here, we discuss an application of the Chain Code-Based Local Descriptor to face recognition, focusing on various datasets and considering different variants of this description method. We augment the generic form of the descriptor by adding the possibility of grouping pixels into blocks, i.e., effectively describing larger neighborhoods. The results of the experiments show the efficiency of the approach. We demonstrate that the obtained results are comparable to or even better than those delivered by other important algorithms in the class of methods based on the Bag-of-Visual-Words paradigm.
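A chain-code descriptor starts from the Freeman chain code: each step along a contour is encoded as one of eight neighbour directions, and a normalised histogram of those codes serves as a simple descriptor. The sketch below uses a mathematical (y-up) direction convention and plain histograms; the paper's descriptor is richer (block grouping, local neighbourhoods), so this only illustrates the encoding idea:

```python
import numpy as np

# 8-neighbour offsets indexed by Freeman chain code 0..7 (counter-clockwise)
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def chain_code(points):
    # Freeman chain code of a pixel contour given as successive (x, y) points
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(DIRS.index((x1 - x0, y1 - y0)))
    return codes

def descriptor(points):
    # normalised 8-bin histogram of chain codes as a simple local descriptor
    h = np.bincount(chain_code(points), minlength=8).astype(float)
    return h / h.sum()
```

Grouping pixels into blocks, as the paper does, amounts to computing such histograms over larger neighbourhoods before pooling them into the final representation.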
Conference Paper
In recent years, mobile robots have been playing an increasingly important role in rescue work, and related research has received wide attention. Mapping of the environment and navigation become more important when robots need to move in a complex environment. Two-dimensional (2D) mapping falls short for robot navigation because of the limited information it provides, while laser-based 3D mapping remains costly. Besides, the accuracy of ordinary 3D mapping cannot meet requirements. In this paper, a method is presented to realize low-cost 3D environment mapping. We combine a two-dimensional navigation algorithm based on extended Kalman filtering with three-dimensional mapping based on the SIFT algorithm, with a 2D laser ranger cooperating with a Kinect sensor. Experimental results show that the method can guide the robot to explore independently in an unknown environment with good mapping and navigation performance.
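The navigation component rests on Kalman filtering: alternate a motion-model prediction with a measurement update. A minimal linear predict-update step is sketched below; the extended variant used for robot navigation follows the same cycle but linearises the motion and measurement models (F and H become Jacobians) at each step:

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of a linear Kalman filter.

    x, P : state estimate and covariance; z : measurement;
    F, H : motion and measurement models; Q, R : process and measurement noise.
    """
    x = F @ x                        # predict state
    P = F @ P @ F.T + Q              # predict covariance
    y = z - H @ x                    # innovation (measurement residual)
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y                    # corrected state
    P = (np.eye(len(x)) - K @ H) @ P # corrected covariance
    return x, P
```

With equal prior and measurement uncertainty, the gain is 0.5 and the estimate moves halfway towards the measurement, which matches the intuition of weighting the two sources of information by their confidence.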
Conference Paper
To address the problems of matching oblique remote sensing images under viewpoint change, geometric deformation and radiometric distortion, this paper presents a new point-based matching method that performs feature extraction by building a nonlinear scale space on simulation images derived from a full-range resample of the original image. All distortions caused by the change of camera position are first modeled by differently tilted images; then the feature points are localized by an improved Accelerated-KAZE (AKAZE) algorithm, i.e., detected in a nonlinear scale space constructed with Fast Explicit Diffusion (FED) and a variable conductance function, and described by an improved SIFT descriptor; finally, Euclidean distance is used as the similarity metric to determine correspondences, and the Random Sample Consensus (RANSAC) algorithm is employed to eliminate false matches. In experiments on oblique images and UAV images, the proposed method extracts the largest number of feature points, yields the most effective correspondences and achieves the highest correct matching ratio among SIFT, ASIFT and the proposed algorithm. Thus, the proposed algorithm can be successfully applied to oblique remote sensing image matching.
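The correspondence step described here (Euclidean nearest neighbour before RANSAC) is commonly strengthened with Lowe's ratio test, which keeps a match only when the best distance is clearly smaller than the second-best. A brute-force sketch of that stage (the ratio threshold 0.8 is a conventional choice, not a value from the paper; RANSAC verification is omitted):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Euclidean nearest-neighbour matching with Lowe's ratio test.

    d1, d2 : (n, k) and (m, k) descriptor arrays. Surviving index pairs
    would then go to RANSAC for geometric verification.
    """
    matches = []
    for i, d in enumerate(d1):
        dist = np.linalg.norm(d2 - d, axis=1)  # distances to all candidates
        order = np.argsort(dist)
        best, second = order[0], order[1]
        if dist[best] < ratio * dist[second]:  # unambiguous nearest neighbour
            matches.append((i, int(best)))
    return matches
```

Filtering ambiguous matches before RANSAC lowers the outlier ratio, which in turn sharply reduces the number of RANSAC iterations needed for a reliable model.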