A Deep Learning Approach for Face Detection using Max Pooling

F.M. Javed Mehedi Shamrat
Department of Software Engineering
Daffodil International University
Dhaka, Bangladesh
javedmehedicom@gmail.com
Md. Masum Billah
Department of Software Engineering
Daffodil International University
Dhaka, Bangladesh
masum.swe.ndc@gmail.com
Md. Alauddin
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
alauddin12340@gmail.com
Md. Al Jubair
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
jubair@eub.edu.bd
Sovon Chakraborty
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
sovonchakraborty2014@gmail.com
Rumesh Ranjan*
Department of Plant Breeding and Genetics
Punjab Agriculture University
Punjab, India
rumeshranjan@pau.edu
Abstract: Deep learning is a widely used term these days; it refers to
a branch of machine learning in which algorithms are trained to
identify patterns in vast amounts of data. It mostly involves learning
multiple layers of representation, which helps in understanding data
such as text, sound, and images. To work with the objects in a video
sequence, many researchers use a form of deep learning called a
convolutional neural network (CNN). Face detection underpins several
face-related technologies, such as face authentication, facial
recognition, and face clustering. For identification and
understanding, effective training must be carried out. Conventional
techniques have not produced satisfactory face recognition accuracy.
The objective of this research is to enhance the accuracy of face
detection using a deep learning model. To recognize faces from
datasets, the proposed model uses a CNN. The proposed work is
implemented using max pooling, a well-known deep learning operation.
Our model is trained and validated using the LFW dataset, which
includes 13000 photos collected from Kaggle. The training accuracy of
the model was 95.72%, and the validation accuracy was 96.27%.
Keywords: face detection; CNN; max-pooling; deep learning
I. INTRODUCTION
Face detection is a basic step in computer vision and pattern
recognition, as well as a foundational process in face-related
research such as face analysis [1], verification [2], and tracking
[3]. After decades of development and testing, face recognition is
now commonly used in a variety of fields, including security
screening. In video analysis, it is rapidly becoming a research
hotspot. Skin-color detection and the SVM classifier [4] are two
widely used face detectors that are not based on neural networks.
Face detection using conventional image feature extraction algorithms
is accurate and fast. To obtain cascade classifiers for several
feature types, including HOG (for better discrimination) and
Haar-like features, Ma et al. [5] introduced an AdaBoost-based
training approach. Owing to the large number of weak classifiers,
this requires a lot of computation. To overcome severe face
occlusion, a Bayesian framework-based algorithm [6] used the Omega
shape formed by a person's head and shoulders for head localization.
It performs well in detecting faces with severe occlusion at
Automatic Teller Machines, but the scene it covers is limited. In
addition to AdaBoost-based approaches, Mathias et al. [7] proposed
face detection using deformable part models (DPM) and obtained
promising performance. However, the computational cost of this
approach is typically high. Another DPM-based approach has been
proposed for detecting faces with occlusion [8]. Although its novelty
is limited, since only face recognition representations are used in
the experiments, it can reduce false-negative face detection and
identification error rates.
In the Inception-V2 block, two 3x3 convolutions connected in series
replace a 5x5 convolution, which reduces the number of parameters
while keeping the receptive field the same [9]. The paper [10]
reports replacing the fully connected layers in the last stage with
global average pooling (GAP) and using AM-Softmax in R-Net. The
Labeled Faces in the Wild (LFW) dataset was introduced by Huang et
al. [11]; it covers variations such as pose, illumination,
accessories, occlusions, and background. There are 13233 images in
the dataset, depicting 5749 persons of different ages. In recent
years, machine learning and deep learning have been widely used for
various purposes [13-18]. The DeepID3 deep CNN was proposed by Sun et
al. [12]. Ideas from VGGNet and GoogLeNet are merged in their
convolutional neural network, whose architecture is built from
convolution and inception layers. Using the LFW dataset, they were
able to achieve 96% on the test results. Taigman et al. [19] proposed
a method based on 3D face modeling, in which a piecewise affine
transformation is used to derive features from the 3D face model.
This was achieved using a nine-layer convolutional neural network. On
the LFW dataset, they scored 97.35%. Sun et al. [20] proposed a
facial recognition system based on a convolutional neural network.
They used the features from the network's last hidden layer.
Complementary and over-complete representations are formed from
features extracted from different regions of the face. On the LFW
dataset, they scored 97.45%.
5th International Conference on Trends in Electronics and Informatics
(ICOEI 2021), Tirunelveli, India, 3-5 June 2021 (pre-print)
The rest of the paper is organized as follows. Recent progress on
detection and identification has been discussed in this section. The
methodology for the design of the whole system is described in
Section II. In Section III, the results of the system are analyzed.
Section IV concludes with a discussion of the findings and
shortcomings, as well as plans for future work.
II. RESEARCH METHODOLOGY
The human face plays an important role in our everyday social
interactions, for example in expressing an individual's personality.
Face recognition is a biometric technique that identifies individuals
by mathematically mapping their facial features and storing them as a
faceprint. Biometric facial recognition technology has gained a lot
of interest in recent years owing to its wide variety of uses in law
enforcement and in other civilian sectors, institutes, and
organisations. Thanks to its non-contact operation, facial
recognition technology has a slight advantage over other biometric
technologies such as fingerprint, palmprint, and iris recognition. A
face recognition device can identify users from a distance, without
touching or interacting with them. Furthermore, facial recognition
technology assists in crime detection by storing the recorded picture
in a repository, which can then be used in several ways, such as
recognizing an individual.
Face recognition systems are currently used in social networking
platforms such as Twitter, and in malls, train stations, bus stops,
high-security locations, advertising, and health care. These
applications aim to prevent crime and false verification and to
monitor compulsive gamblers in casinos, while Facebook uses facial
recognition technology for automated tagging. Large datasets and
complex features are needed for face recognition to uniquely
recognize different subjects while handling challenges such as
lighting, pose, and aging. Facial recognition technologies have
advanced dramatically over the past three years, and the last decade
has seen tremendous progress in the field of face detection. Most
facial recognition technologies still work best with just a few faces
in the picture. Furthermore, these methods have mostly been tested
under controlled lighting, with correct facial poses and non-blurry
photographs. Machine learning [21-24] on edge computing nodes is
already gaining traction and is likely to keep growing. The complete
proposed system is shown in Fig. 1.
Fig. 1. Proposed System diagram.
A. Dataset Collection:
We used the LFW (Face Recognition) dataset
(https://www.kaggle.com/atulanandjha/lfwpeople) for this study. The
dataset contains over 13000 photos, of which we used exactly 13000.
Each picture is labeled with the name of the person. We also created
our own data collection, with a total of 104 images.
Fig. 2. Datasets Images Sample.
B. Preprocessing and Augmentation of Data:
A CNN generally performs better as the amount of training data
increases. We used ImageDataGenerator to increase the amount of data
from our existing dataset. Within ImageDataGenerator, we enabled
zooming, shearing, and scaling. Initially, the images were resized to
256 x 256 pixels.
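The paper does not list the ImageDataGenerator settings, so the following is only a library-free sketch of two of the named operations (scaling and zooming) written with NumPy rather than Keras. The function names, the zoom range of 1.0 to 1.2, and the random test image are assumptions made for illustration; shearing is omitted to keep the sketch short.

```python
import numpy as np

def rescale(image):
    """Map uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

def zoom_in(image, factor):
    """Zoom into the image centre by `factor` (>= 1), keeping the output size.

    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    h, w = image.shape[:2]
    crop_h, crop_w = int(round(h / factor)), int(round(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    # Source row/column index for every output row/column.
    rows = top + (np.arange(h) * crop_h) // h
    cols = left + (np.arange(w) * crop_w) // w
    return image[rows[:, None], cols[None, :]]

def augment(image, rng, max_zoom=1.2):
    """Rescale the image and apply a random centre zoom (assumed range)."""
    factor = rng.uniform(1.0, max_zoom)
    return zoom_in(rescale(image), factor)

rng = np.random.default_rng(0)
# Hypothetical 256 x 256 RGB image standing in for one dataset photo.
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
augmented = augment(image, rng)
```

Each call to `augment` produces a slightly different view of the same photo, which is how augmentation enlarges the effective training set.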
C. Proposed Convolution Neural Network (CNN) architecture
A CNN is used for classification and image processing. A CNN is made
up of one or more convolution layers. Rather than dealing with the
whole picture at once, a CNN tries to identify useful local features
inside it. A CNN has several hidden layers, as well as an input layer
and an output layer. In this analysis, we used a deep CNN [25-27]
with four convolution layers. Convolution is an operation that
combines two functions to produce a third. The working process of our
CNN model is depicted in Fig. 3.
Fig. 3. Three convolution layers with max-pooling operation.
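To make the convolution step concrete, here is a minimal NumPy sketch of a single-channel convolution followed by ReLU. It assumes an odd-sized kernel, zero "same" padding, and stride 1; the 6 x 6 input and the edge kernel are hypothetical. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation). This is an illustration of the operation, not the authors' implementation.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide an odd-sized 2-D kernel over a 2-D image with zero 'same' padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the window centred on pixel (i, j).
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)

# Hypothetical input whose intensity grows from left to right.
image = np.arange(36, dtype=np.float64).reshape(6, 6)
# Kernel that responds to left-to-right intensity increases.
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = relu(conv2d_same(image, edge_kernel))
```

A real convolution layer learns many such kernels and sums over input channels; the loop above shows only the core sliding-window computation.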
First, images are resized to 256 x 256 pixels and passed to the first
convolutional layer of the proposed architecture. There are 128
hidden layers in all. After the max-pooling operation, the feature
maps are reduced to 128 x 128. The second convolutional layer then
extracts features, and max pooling is applied again. After the final
pooling layer, the feature maps are 32 x 32 pixels. To simplify
computation, images are converted into NumPy arrays. The final step
is to apply a fully connected layer. We used the ReLU activation
function in the convolution layers and the Softmax activation
function in the output layer. The Adam optimizer, a variant of
stochastic gradient descent, is used for training. The algorithm of
the proposed system is shown below:
Algorithm of the proposed system:
Input: Image data
Output: Image classification
Step 1: Input image data
Step 2: Call Conv2D(number of filters, matrix = 256 x 256, padding)
Step 3: Apply ReLU activation
Step 4: Call MaxPooling2D(data, pool size)
Step 5: Call Conv2D(number of filters, matrix = 128 x 128, padding)
Step 6: Apply ReLU activation
Step 7: Call MaxPooling2D(data, pool size)
Step 8: Call Conv2D(number of filters, matrix = 64 x 64, padding)
Step 9: Apply ReLU activation
Step 10: Call MaxPooling2D(data, pool size)
Step 11: Reshape image, set list
Step 12: Add(Multiply(image, pool size), matrix)
Step 13: Apply Softmax activation
Step 14: Output image classification
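The size progression in the steps above (256 to 128 to 64 to 32) is what a 2 x 2 max-pooling window with stride 2 would produce. The paper does not state the pool size, so that value is an assumption here. A minimal NumPy sketch of the pooling operation and the resulting shapes:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = feature_map.shape
    # Group pixels into non-overlapping 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical single-channel 256 x 256 feature map.
x = np.random.default_rng(0).random((256, 256))
shapes = [x.shape]
for _ in range(3):  # three pooling stages, as in Steps 4, 7, and 10
    x = max_pool_2x2(x)
    shapes.append(x.shape)
print(shapes)  # [(256, 256), (128, 128), (64, 64), (32, 32)]
```

Each stage keeps only the strongest response in every 2 x 2 block, so the spatial size halves while the most salient features are retained.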
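D. Evaluating Performance Using Performance Metrics:
We measured the performance using precision, recall, f1-score, and
accuracy after completing the training and testing phase. The
formulas that we used are as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
Here TP, TN, FP, and FN denote the numbers of true positives, true
negatives, false positives, and false negatives, respectively.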
III. EXPERIMENTAL RESULT ANALYSIS
Our model can recognize the face of a particular individual and label
it with that person's name. The dataset includes 13000 images,
including images of 1680 people. We split the data into training and
testing sets in an 80/20 ratio. Our model reaches 95.72% accuracy on
the training data. The highest validation accuracy is 96.27%, and the
minimum validation loss is 5.32%. The results on our training and
validation data are shown in Table I.
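The 80/20 split of 13000 images corresponds to 10400 training and 2600 testing images. The following sketch shows one way such a split could be drawn with a shuffled index; the function name and the fixed seed are assumptions, since the paper does not describe how the split was made.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = train_test_split_indices(13000)
print(len(train_idx), len(test_idx))  # 10400 2600
```

Shuffling before splitting avoids placing all images of one person in only one of the two sets when the files are ordered by name.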
TABLE I. TRAINING AND VALIDATION RESULTS PER EPOCH.
Epoch   Training Loss   Training Accuracy   Validation Loss   Validation Accuracy
1       44.13%          86.24%              13.64%            89.98%
2       12.14%          89.53%              8.32%             91.05%
3       10.42%          90.27%              8.12%             92.53%
4       9.43%           90.79%              8.01%             92.25%
5       9.12%           91.87%              7.63%             93.15%
6       8.87%           92.57%              7.17%             93.89%
7       8.45%           92.89%              6.89%             94.04%
8       8.15%           94.05%              6.77%             95.12%
9       6.23%           94.34%              6.01%             95.53%
10      5.89%           95.72%              5.32%             96.27%
Fig. 4. Validation Accuracy and Training Accuracy for CNN with Max
Pooling Layer.
The max-pooling model performed well in this study, achieving 95.72%
training accuracy and 96.27% validation accuracy. The LFW dataset
contains 13000 images, with 80% of them used for training and the
remaining 2600 images used for testing. The results of the evaluation
on the dataset are summarized in Table II.
TABLE II. CONFUSION MATRIX ON LFW DATASET.
Class          Precision (%)   Recall (%)   Accuracy (%)
Training Set   96.22           93.03        93.28
Testing Set    93.24           93.29        93.42
Later, we used our self-captured selfie photos, with a total of four
classes (1. Shamrat, 2. Shongkho, 3. Masum, 4. Jubair). We have a
total of 200 pictures, with 50 images from each class. The results
are displayed in Table III.
TABLE III. CONFUSION MATRIX AFTER APPLYING MAXPOOLING.
Class      Precision (%)   Recall (%)   Accuracy (%)
Shamrat    96.24           94.79        96.33
Shongkho   95.89           94.36        95.78
Masum      95.93           94.77        96.01
Jubair     94.69           94.23        94.73
This model is capable of recognizing faces from datasets. Fig. 5
shows the detection results of our proposed model.
Fig. 5. Successful detection of faces from dataset images.
IV. CONCLUSION AND FUTURE WORK
Face recognition is a way of identifying or verifying an individual's
identity using his or her face. It has been used for many purposes,
such as automated attendance systems, monitoring of restricted-access
areas (for example, intruder detection in living areas), public
recognition of celebrities, recognition of household members by a
networked home automation device, and many others. In this study, a
deep convolutional neural network (CNN) [28] architecture was used.
Our main goal was to propose a suitable model with high accuracy so
that face recognition can be reliable. In future work, we will apply
further models to a larger dataset and compare their accuracy with
that of our proposed model. Our model can be applied to any automated
system [29] or IoT [30-34] embedded device for more accurate face
recognition.
REFERENCES
[1] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular
margin loss for deep face recognition," in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4690-4699.
[2] D. Chen, C. Xu, J. Yang, J. Qian, Y. Zheng, and L. Shen, "Joint Bayesian
guided metric learning for end-to-end face verification,"
Neurocomputing, vol. 275, pp. 560-567, Jan. 2018.
[3] M. H. Khan, J. McDonagh, and G. Tzimiropoulos, "Synergy between
face alignment and tracking via discriminative global consensus
optimization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017,
pp. 3811-3819.
[4] M. Drożdż and T. Kryjak, "FPGA implementation of multi-scale face
detection using HOG features and SVM classifier," Image Process.
Commun., vol. 21, no. 3, pp. 27-44, Sep. 2016.
[5] C. Ma, N. Trung, H. Uchiyama, H. Nagahara, A. Shimada, and R.-I.
Taniguchi, "Adapting local features for face detection in thermal image,"
Sensors, vol. 17, no. 12, p. 2741, Nov. 2017.
[6] T. Zhang, J. Li, W. Jia, J. Sun, and H. Yang, "Fast and robust occluded
face detection in ATM surveillance," Pattern Recognit. Lett., vol. 107,
pp. 33-40, May 2018.
[7] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, "Face
detection without bells and whistles," in Proc. Eur. Conf. Comput. Vis.
Springer, 2014, pp. 720-735.
[8] D. Marcetic and S. Ribaric, "Deformable part-based robust face detection
under occlusion by using face decomposition into face components," in
Proc. 39th Int. Conv. Inf. Commun. Technol., Electron. Microelectron.
(MIPRO), May 2016, pp. 1365-1370.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp.
1-9.
[10] X. Li, Z. Yang and H. Wu, "Face Detection Based on Receptive Field
Enhanced Multi-Task Cascaded Convolutional Neural Networks," in
IEEE Access, vol. 8, pp. 174922-174930, 2020, doi:
10.1109/ACCESS.2020.3023782.
[11] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled faces
in the wild: A database for studying face recognition in unconstrained
environments," in Proc. Workshop Faces 'Real-Life' Images, Detection,
Alignment, Recognit., Oct. 2008, pp. 1-11.
[12] Y. Sun, D. Liang, X. Wang, and X. Tang, "DeepID3: Face recognition
with very deep neural networks," Feb. 2015, arXiv:1502.00873. [Online].
Available: https://arxiv.org/abs/1502.00873.
[13] F. M. Javed Mehedi Shamrat, Z. Tasnim, P. Ghosh, A. Majumder and M.
Z. Hasan, "Personalization of Job Circular Announcement to Applicants
Using Decision Tree Classification Algorithm," 2020 IEEE International
Conference for Innovation in Technology (INOCON), Bangluru, India,
2020, pp. 1-5, doi: 10.1109/INOCON50539.2020.9298253.
[14] S. Manlangit, “Novel Machine Learning Approach for Analyzing
Anonymous Credit Card Fraud Patterns,” International Journal of
Electronic Commerce Studies, vol. 10, no. 2, 2019.
[15] F. M. Javed Mehedi Shamrat, P. Ghosh, M. H. Sadek, M. A. Kazi and S.
Shultana, "Implementation of Machine Learning Algorithms to Detect the
Prognosis Rate of Kidney Disease," 2020 IEEE International Conference
for Innovation in Technology (INOCON), Bangluru, India, 2020, pp. 1-7,
doi: 10.1109/INOCON50539.2020.9298026.
[16] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin, A. A.
Anjum and A. A. Khan, "Optimization of Prediction Method of Chronic
Kidney Disease Using Machine Learning Algorithm," 2020 15th
International Joint Symposium on Artificial Intelligence and Natural
Language Processing (iSAI-NLP), Bangkok, Thailand, 2020, pp. 1-6, doi:
10.1109/iSAI-NLP51646.2020.9376787.
[17] K. Mahmud, S. Azam, A. Karim, S. Zobaed, B. Shanmugam, and D.
Mathur, “Machine Learning Based PV Power Generation Forecasting in
Alice Springs,” IEEE Access, pp. 1–1, 2021.
[18] F.M. Javed Mehedi Shamrat, Md. Asaduzzaman, A.K.M. Sazzadur
Rahman, Raja Tariqul Hasan Tusher, Zarrin Tasnim “A Comparative
Analysis of Parkinson Disease Prediction Using Machine Learning
Approaches” International Journal of Scientific & Technology Research,
Volume 8, Issue 11, November 2019, ISSN: 2277-8616, pp: 2576-2580.
[19] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the
gap to human-level performance in face verification," in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1701-1708.
[20] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from
predicting 10,000 classes," in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., Jun. 2014, pp. 1891-1898.
[21] F. M. Javed Mehedi Shamrat, Md. Abu Raihan, A.K.M. Sazzadur
Rahman, Imran Mahmud, Rozina Akter, “An Analysis on Breast Disease
Prediction Using Machine Learning Approaches” International Journal of
Scientific & Technology Research, Volume 9, Issue 02, February 2020,
ISSN: 2277-8616, pp: 2450-2455.
[22] P. Ghosh et al., "Efficient Prediction of Cardiovascular Disease Using
Machine Learning Algorithms With Relief and LASSO Feature Selection
Techniques," in IEEE Access, vol. 9, pp. 19304-19326, 2021, doi:
10.1109/ACCESS.2021.3053759.
[23] A.K.M Sazzadur Rahman, F. M. Javed Mehedi Shamrat, Zarrin Tasnim,
Joy Roy, Syed Akhter Hossain “A Comparative Study on Liver Disease
Prediction Using Supervised Machine Learning Algorithms”
International Journal of Scientific & Technology Research, Volume 8,
Issue 11, November 2019, ISSN: 2277-8616, pp: 419-422.
[24] F. M. Javed Mehedi Shamrat, Zarrin Tasnim, Imran Mahmud, Ms. Nusrat
Jahan, Naimul Islam Nobel, “Application Of K-Means Clustering
Algorithm To Determine The Density Of Demand Of Different Kinds Of
Jobs”, International Journal of Scientific & Technology Research,
Volume 9, Issue 02, February 2020, ISSN: 2277-8616, pp: 2550-2557.
[25] A. Karim, S. Azam, B. Shanmugam, and K. Kannoorpatti, “Efficient
Clustering of Emails Into Spam and Ham: The Foundational Study of a
Comprehensive Unsupervised Framework,” IEEE Access, vol. 8, pp.
154759-154788, 2020.
[26] P. Ghosh et al., “A Comparative Study of Different Deep Learning Model
for Recognition of Handwriting Digits,” International Conference on IoT
Based Control Networks and Intelligent Systems (ICICNIS 2020), pp.
857-866, January 19, 2021.
[27] M. F. Foysal, M. S. Islam, A. Karim, and N. Neehal, “Shot-Net: A
Convolutional Neural Network for Classifying Different Cricket Shots,”
Communications in Computer and Information Science, pp. 111-120,
2019.
[28] Junayed M.S., Jeny A.A., Neehal N., Atik S.T., Hossain S.A. (2019) A
Comparative Study of Different CNN Models in City Detection Using
Landmark Images. In: Santosh K., Hegadi R. (eds) Recent Trends in
Image Processing and Pattern Recognition. RTIP2R 2018.
Communications in Computer and Information Science, vol 1035.
Springer, Singapore. https://doi.org/10.1007/978-981-13-9181-1_48.
[29] Biswas A., Chakraborty S., Rifat A.N.M.Y., Chowdhury N.F., Uddin J.
(2020) Comparative Analysis of Dimension Reduction Techniques Over
Classification Algorithms for Speech Emotion Recognition. In: Miraz
M.H., Excell P.S., Ware A., Soomro S., Ali M. (eds) Emerging
Technologies in Computing. iCETiC 2020. Lecture Notes of the Institute
for Computer Sciences, Social Informatics and Telecommunications
Engineering, vol 332. Springer, Cham. https://doi.org/10.1007/978-3-
030-60036-5_12.
[30] Javed Mehedi Shamrat F.M., Allayear S.M., Alam M.F., Jabiullah M.I.,
Ahmed R. (2019) A Smart Embedded System Model for the AC
Automation with Temperature Prediction. In: Singh M., Gupta P., Tyagi
V., Flusser J., Ören T., Kashyap R. (eds) Advances in Computing and
Data Sciences. ICACDS 2019. Communications in Computer and
Information Science, vol 1046. Springer, Singapore.
https://doi.org/10.1007/978-981-13-9942-8_33
[31] F. M. Javed Mehedi Shamrat, Zarrin Tasnim, Naimul Islam Nobel, and
Md. Razu Ahmed. 2019. An Automated Embedded Detection and Alarm
System for Preventing Accidents of Passengers Vessel due to Overweight.
In Proceedings of the 4th International Conference on Big Data and
Internet of Things (BDIoT'19). Association for Computing Machinery,
New York, NY, USA, Article 35, 1-5.
DOI: https://doi.org/10.1145/3372938.3372973
[32] Shamrat F.M.J.M., Nobel N.I., Tasnim Z., Ahmed R. (2020)
Implementation of a Smart Embedded System for Passenger Vessel
Safety. In: Saha A., Kar N., Deb S. (eds) Advances in Computational
Intelligence, Security and Internet of Things. ICCISIoT 2019.
Communications in Computer and Information Science, vol 1192.
Springer, Singapore. https://doi.org/10.1007/978-981-15-3666-3_29.
[33] A. Islam Chowdhury, M. Munem Shahriar, A. Islam, E. Ahmed, A.
Karim, and M. Rezwanul Islam, “An Automated System in ATM Booth
Using Face Encoding and Emotion Recognition Process,” 2020 2nd
International Conference on Image Processing and Machine Vision, 2020.
[34] F.M. Javed Mehedi Shamrat, Shaikh Muhammad Allayear and Md. Ismail
Jabiullah "Implementation of a Smart AC Automation System with Room
Temperature Prediction ", Journal of the Bangladesh Electronic Society,
Volume 18, Issue 1-2, June-December 2018, ISSN: 1816-1510, pp: 23-
32.
... Proposed an effective method for updating linear subspace representations in real-time video recognition. [9] "Deep Metric Learning" ...
Research
Full-text available
This paper explores the intersection of video recognition, computer vision, and artificial intelligence, highlighting its broad applicability across various fields. The research fo-cuses on the applications, challenges, ethical dilemmas, and outcomes of artificial intelligence , which continues to grow in significance in the 21st century. We propose a systematic approach that incorporates models for face detection, feature extraction, and recognition. Our methodology includes the accurate segmentation of 100 human faces from video frames, with each face averaging 150x150 pixels. The feature extraction process yielded 1,000 face feature vectors, with an average size of 128, representing key characteristics for recognition. By applying a cosine similarity threshold of 0.7, we filtered irrelevant data and determined whether the two images matched. Our recognition system achieved 85% accuracy, demonstrating the effectiveness of the models and techniques employed. Additionally, ethical considerations were addressed, emphasizing the importance of data privacy, informed consent, cybersecurity, and transparency. This research advances the understanding of face recognition from video data and highlights the need for further exploration in this domain
... By identifying the most crucial data, it reduces the sample size of the feature maps. A popular method called "max pooling" uses the maximum value within a narrow window is retained, reducing the spatial dimensions of the feature maps [4]. ...
... The expression of fear is explicated with opened mouth, lips stretched, and skewed eyebrows. Eminently majority of FER systems were prompted to recognize six basic emotional expressions and certain more emotions such as contempt, envy, pain, and drowsiness were attempted by a few FER systems [5]. Furthermore, some FER systems associate expressions with spontaneous and pose-based expressions. ...
... Javed et.al. [7], Suggested a novel framework employing a deep learning approach known as Convolutional Neural Networks (CNN). The presented system incorporated Max Pooling, a widely recognized technique in deep learning. ...
Article
In this study, we present an innovative approach to real-time facial feature detection utilizing the MTCNN (MultiTask Cascaded Neural Network) deep learning architecture. Unlike traditional methods, our novel framework combines advanced techniques to achieve unparalleled precision and efficiency in facial feature localization. Through a meticulous exploration of MTCNN's capabilities, we unveil a transformative methodology that significantly enhances the speed and accuracy of real-time facial detection. Our research focuses on pushing the boundaries of existing facial recognition technologies, introducing a fresh perspective on the application of MTCNN. By leveraging its unique architecture, we not only address the challenges associated with real-time detection but also enhance the overall robustness of the system. The proposed approach showcases the untapped potential of MTCNN, establishing it as a key player in the realm of facial feature detection. Through rigorous experimentation and evaluation, we demonstrate the superiority of our approach over conventional methods, highlighting its effectiveness in diverse scenarios. This work contributes to the ongoing evolution of deep learning applications in computer vision, with implications for security, surveillance, and various human-computer interaction domains. Our findings open new avenues for researchers and practitioners seeking cutting-edge solutions in the dynamic field of real-time facial feature detection.
... 8 [3,44,45,46,47,48,31,13] Batch Normalization Normalizes the activations of each layer, improving convergence and training stability, often accelerating training and enhancing generalization. ...
Article
Full-text available
The development of deep learning algorithms has led to major improvements in image classification, a key problem in computer vision. In this study, the researcher provide an in-depth analysis of the various deep learning method architectures used for image classification. By efficiently learning hierarchical representations straight from raw image data, deep learning has brought about amazing performance gains across a wide range of applications, therefore revolutionizing the discipline. The objective was to review how different architectural choices impact the performance of deep learning models in image classification. Journals and papers published by IEEE access, ACM, Springer, Google scholar, Wiley online library, and Springer between 2013 and 2023 were analyzed. Sixty two publications were chosen based on their titles from the results of the search. The results show that more complex designs usually have better accuracy, but they may also be prone to overfitting and so benefit from regularization methods. Convolutional layers for feature extraction, pooling layers for down sampling and lowering spatial dimensions, and fully linked layers for classification are typical architectural components in deep learning algorithms for image classification. The common occurrence of skip connections in residual networks allows for a more uniform gradient flow and the training of more complex models. Models' discriminatory skills may be improved with the use of attention processes that help them zero down on important parts of a picture. In conclusion to prevent overfitting, regularization techniques like batch normalization and dropout are often used. Improved feature propagation and targeted learning, enabled by skip connections and attention techniques, greatly boosts model performance.
Article
In this research, a novel deep learning -based method called OCEAN (Otsu Combined Entity Aware Network) has been introduced to enhance face detection accuracy, particularly in the presence of occlusion. Initially, Adaptive Median Filtering is applied to the input occluded images to remove noise. The pre-processed images are then analyzed using the YOLOv7 network to detect occluded objects on the face. Next, the face entities on the occluded regions are segmented using the Otsu image segmentation algorithm. Angular Vector Projection Scaling is applied to these segments to correlate the entities using an Angular Vector Transformation Matrix based on the training dataset. This step helps identify variant features and non-variant hotspot variables from pixel variations in the occluded face objects. Finally, the face is detected using the YOLOv7 network based on the regenerated image and training samples. The proposed method enhances the accuracy by 5.18%, 4.21%, and 2.33% compared to YOLOv3, Tiny-YOLOv4, and YOLOv5, respectively.
Article
Full-text available
Visual Feature Learning (VFL) is a critical area of research in computer vision that involves the automatic extraction of features and patterns from images and videos. The applications of VFL are vast, including object detection and recognition, facial recognition, scene understanding, medical image analysis, and autonomous vehicles. In this paper, we propose to conduct extensive systematic literature review (SLR) on VFL based on deep learning algorithms. The paper conducted an SLR covering deep learning algorithms such as Convolutional Neural Networks (CNNs), Autoencoders, and Generative Adversarial Networks (GANs) including their variants. The review highlights the importance of VFL in computer vision and the limitations of traditional feature extraction techniques. Furthermore, it provides an in-depth analysis of the strengths and weaknesses of various deep learning algorithms for solving problems in VFL. The discussion of the applications of VFL provides an insight into the impact of VFL on various industries and domains. The review also analyzed the challenges faced by VFL, such as data scarcity and quality, overfitting, generalization, interpretability, and explainability. The discussion of future directions for VFL includes hybrid techniques, unsupervised feature learning, continual learning, attention-based models, and explainable AI. These techniques aim to address the challenges faced by VFL and improve the performance of the models. The systematic literature review concludes that VFL is a rapidly evolving field with the potential to transform many industries and domains. The review highlights the need for further research in VFL and emphasizes the importance of responsible use of VFL models in various applications. The review provides valuable insights for researchers and practitioners in the field of computer vision, who can use these insights to enhance their work and ensure the responsible use of VFL models.
Conference Paper
This research work delves into the significance of people counting systems, emphasizing their role in furnishing valuable data for operational improvement, security enhancement, and resource optimization in various businesses and organizations. The focus is on a thorough examination of face detection and object detection methodologies rooted in computer vision and deep learning. Specifically, the study scrutinizes Multi-task Cascaded Convolutional Networks (MTCNN) and YuNet for face detection, along with the Histogram of Oriented Gradients (HOG) feature descriptor coupled with Support Vector Machine (SVM) and the real-time capabilities of You Only Look Once version 8 (YOLOv8) for object detection. Through empirical evaluation across diverse conditions and comparative analysis, this research aims to elucidate the strengths and limitations of these algorithms in the context of people counting tasks. The findings provide valuable insights to guide the selection of suitable approaches for specific use cases, contributing to the ongoing progress in the field of computer vision.
Article
The generation volatility of photovoltaics (PVs) has created several control and operation challenges for grid operators. For secure and reliable day-ahead or hour-ahead electricity dispatch, grid operators need visibility of the capacity of their synchronous and asynchronous generators. This helps them manage the spinning reserve, inertia, and frequency response during any contingency events. This study attempts to provide machine learning-based PV power generation forecasting for both the short and long term. The study has chosen Alice Springs, one of the geographically solar energy-rich areas in Australia, and considered various environmental parameters. Different machine learning algorithms, including Linear Regression, Polynomial Regression, Decision Tree Regression, Support Vector Regression, Random Forest Regression, Long Short-Term Memory, and Multilayer Perceptron Regression, are considered in the study. Comparative performance analyses are conducted for both normal and uncertain cases, and Random Forest Regression is found to perform better for our dataset. The impact of data normalization on forecasting performance is also analyzed using multiple performance metrics. The study may help grid operators choose an appropriate PV power forecasting algorithm and plan for time-ahead generation volatility.
Article
Cardiovascular diseases (CVDs) are among the most common serious illnesses affecting human health. CVDs may be prevented or mitigated by early diagnosis, and this may reduce mortality rates. Identifying risk factors using machine learning models is a promising approach. We propose a model that incorporates different methods to achieve effective prediction of heart disease. For our proposed model to be successful, we have used efficient Data Collection, Data Pre-processing and Data Transformation methods to create accurate information for the training model. We have used a combined dataset (Cleveland, Long Beach VA, Switzerland, Hungarian and Stat log). Suitable features are selected by using the Relief and Least Absolute Shrinkage and Selection Operator (LASSO) techniques. New hybrid classifiers like Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) are developed by integrating the traditional classifiers with bagging and boosting methods, which are used in the training process. We have also used some machine learning algorithms to calculate the Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE) and F1 Score (F1) of our model, along with the Negative Predictive Value (NPR), False Positive Rate (FPR), and False Negative Rate (FNR). The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model produced the highest accuracy (99.05%) when using RFBM with the Relief feature selection method.
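The evaluation metrics named in this abstract all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    Hypothetical helper: tp/fp/tn/fn are the numbers of true positives,
    false positives, true negatives and false negatives.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is empty.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + tn + fn
    accuracy = ratio(tp + tn, total)
    return {
        "accuracy": accuracy,
        "error_rate": ratio(fp + fn, total),
        "sensitivity": ratio(tp, tp + fn),      # recall / true positive rate
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PRE and SEN
        "npv": ratio(tn, tn + fn),              # negative predictive value
        "fpr": ratio(fp, fp + tn),              # false positive rate
        "fnr": ratio(fn, fn + tp),              # false negative rate
    }


# Made-up counts, purely for illustration.
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Returning 0.0 for an empty denominator is a design choice for the sketch; depending on the use case, raising an error or returning NaN may be more appropriate.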
Article
With the expansion of Artificial Neural Networks (ANNs), Deep Learning (DL) has brought an interesting turn to various fields of Artificial Intelligence (AI), making it smarter and more efficient than what we had even 10-2 years back. DL is used in various fields due to its versatility. The Convolutional Neural Network (CNN) is at the forefront of the advancement that brings together ANNs and innovative DL techniques. In this research paper, we have devised a multi-layer, fully connected neural network (NN) with 10 and 12 hidden layers for handwritten digit (HD) recognition. The testing is performed on the publicly available MNIST handwritten database. We selected 60,000 images from the MNIST database for training and 10,000 images for testing. Our multi-layer ANN (10), ANN (12), and CNN achieve overall accuracies of 99.10%, 99.34%, and 99.70%, respectively, when recognizing digits from the MNIST handwriting dataset.
Conference Paper
Chronic kidney disease is the loss of kidney function. Often, the symptoms of the disease are not noticeable, and a significant number of lives are lost annually due to the disease. Using machine learning algorithms for medical studies, the disease can be predicted with a high accuracy rate and in a very short time. The prediction can be done using four supervised classification algorithms: logistic regression, Decision Tree, Random Forest, and KNN. In the paper, the prediction performance of the algorithms is analyzed using a pre-processed dataset. The performance analysis is based on the accuracy of the results, prediction time, ROC and AUC curves, and error rate. The comparison of the algorithms suggests which algorithm is the best fit for predicting chronic kidney disease.
Preprint
Chronic Kidney Disease (CKD), a slow and late-diagnosed disease, is one of the most important contributors to mortality in the medical sector nowadays. Because of this critical issue, a significant number of men and women suffer each year due to the lack of early screening systems and appropriate care. However, patients' lives can be saved with fast detection of the disease at the earliest stage. In addition, machine learning algorithms can detect the stage of this deadly disease much more quickly given a reliable dataset. In this paper, the overall study has been implemented based on four reliable approaches, namely Support Vector Machine (henceforth SVM), AdaBoost (henceforth AB), Linear Discriminant Analysis (henceforth LDA), and Gradient Boosting (henceforth GB), to obtain highly accurate prediction results. These algorithms are applied to an online dataset from the UCI machine learning repository. The highest prediction accuracy, about 99.80%, is obtained from the Gradient Boosting (GB) classifier. Different performance evaluation metrics are then presented to show the outcomes. Finally, the most efficient and optimized algorithms for the proposed task can be selected based on these benchmarks.
Article
With the continuous development of deep learning, face detection methods have made great progress. For real-time detection, cascade CNN based on a lightweight model is still the dominant structure; it predicts faces in a coarse-to-fine manner with strong generalization ability. Compared to other methods, it does not require a fixed input size. However, MTCNN still performs poorly when detecting tiny targets. To improve model generalization ability, we propose a Receptive Field Enhanced Multi-Task Cascaded CNN. This network takes advantage of the Inception-V2 block and the receptive field block to enhance feature discriminability and robustness for small targets. The experimental results show that, in comparison with MTCNN, the performance of our network is improved by 1.08% on AFW, 2.84% on PASCAL FACE, 1.31% on FDDB, and 2.3%, 2.1%, and 6.6% on the three sub-datasets of the WIDER FACE benchmark, respectively. Furthermore, our structure uses 16% fewer parameters.
Conference Paper
This research work presents a comparative study of improving the performance of classifier algorithms by using dimension reduction algorithms. The dataset was collected from the Ryerson AudioVisual Database (RAVDESS). The research was conducted to accurately detect five speech emotions (happy, sad, angry, fearful, neutral). First, Mel Frequency Cepstrum Coefficients (MFCC) were computed, from which seven dominant features were extracted. Two other features were extracted directly from the dataset. Then different classifier algorithms (Random Forest, Gradient Boosting, and Support Vector Machine) were applied to the dataset. This initial study showed that Random Forest had the highest accuracy, 61.26%. After that, the dimension reduction techniques Recursive Feature Elimination, Principal Component Analysis, and P-value Calculation were applied to the dataset, and the classifier algorithms were evaluated for accuracy again. This later study showed that Gradient Boosting yielded an improvement in accuracy (63.12%).
Conference Paper
Nowadays, the banking transaction system is more flexible than before. When the banking sector introduced the ATM booth, it was a step forward in reducing human effort. An ATM is an automated teller machine that dispenses money to the consumer when a card is inserted. All ATM booths support both credit and debit cards for transactions, and this has saved everyone's time. Still, in certain situations, such as forgetting the card authentication details, a failed transaction can ruin a consumer's day. For this reason, this paper proposes a system to help in such situations. The proposed system uses a face encoding process with an emotion recognition test, based on a Convolutional Neural Network (CNN), to make transactions faster and more accurate. Normal card transactions remain possible alongside the proposed system. The FER2013 dataset was used for training, and the model was then tested using our own sample images. The results show that the proposed system can correctly separate ‘Happy’ faces from other emotional faces and allow the transaction to proceed.
Article
The spread and adoption of spam emails in malicious activities such as information and identity theft, malware propagation, and monetary and reputational damage are on the rise, with increased effectiveness and diversification. Without doubt, these criminal acts endanger the privacy of many users and businesses. Several research initiatives have addressed the issue, with no complete solution until now, and we believe an intelligent and automated methodology should be the way forward to tackle the challenges. However, to date, limited studies have been conducted on the application of purely unsupervised frameworks and algorithms to the problem. To explore and investigate the possibilities, we propose an anti-spam framework that relies fully on unsupervised methodologies through a multi-algorithm clustering approach. This paper presents an in-depth analysis of the methodologies of the first component of the framework, examining only the domain and header related information found in email headers. A novel method of feature reduction using an ensemble of ‘unsupervised’ feature selection algorithms has also been investigated in this study. In addition, a comprehensive novel dataset of 100,000 records of ham and spam emails has been developed and used as the data source. Key findings are summarized as follows: I) Out of the six clustering algorithms used, Spectral and K-means demonstrated acceptable performance, while OPTICS produced the optimum clustering with an average of 3.5% better efficiency than Spectral and K-means, validated through a range of validation processes. II) The other three algorithms, BIRCH, HDBSCAN, and K-modes, did not fare well enough. III) The average balanced accuracy for the optimum three algorithms has been found to be ≈94.91%. IV) The proposed feature reduction framework achieved its goal with high confidence.