A Deep Learning Approach for Face Detection
using Max Pooling
F.M. Javed Mehedi Shamrat
Department of Software Engineering
Daffodil International University
Dhaka, Bangladesh
javedmehedicom@gmail.com
Md. Masum Billah
Department of Software Engineering
Daffodil International University
Dhaka, Bangladesh
masum.swe.ndc@gmail.com
Md. Alauddin
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
alauddin12340@gmail.com
Md. Al Jubair
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
jubair@eub.edu.bd
Sovon Chakraborty
Department of Computer Science and Engineering
European University of Bangladesh
Dhaka, Bangladesh
sovonchakraborty2014@gmail.com
Rumesh Ranjan*
Department of Plant Breeding and Genetics
Punjab Agriculture University
Punjab, India
rumeshranjan@pau.edu
Abstract—Deep learning is a trendy term these days, and it
refers to a modern age in machine learning in which algorithms
are taught to identify patterns in vast amounts of data. It mostly
refers to studying various layers of representation, which assists in
the understanding of data that includes text, sound, and pictures.
To work with objects in a video sequence, many researchers use
a form of deep learning called a CNN. Face detection involves
several face-related technologies, such as face authentication,
facial recognition, and face clustering. For identification and
understanding, effective training must be carried out. The
standard techniques did not produce satisfactory results in terms
of face recognition precision. The objective of this research is
to enhance the accuracy of face detection using a deep learning
model. For recognizing faces from datasets, the proposed model
utilizes a deep learning technique named convolutional neural
networks. The proposed work is applied using Max Pooling, a well-
known deep learning process. Our model is trained and validated
using the LFW dataset, which includes 13,000 photos collected
from Kaggle. The training accuracy of the model was 95.72%,
and the validation accuracy was 96.27%.
Keywords—face detection; CNN; max-pooling; deep learning
I. INTRODUCTION
Facial recognition is a basic step in computer vision and
pattern recognition, as well as a foundational process in face-
related tasks such as face analysis [1], verification [2], and
tracking [3]. Face recognition has been commonly used in a
variety of fields, including protection screening, after decades of
growth and testing. In the world of video clips, it is rapidly
becoming a research focus. Skin-color recognition and the
SVM classifier [4] are two non-neural network-based face
detectors that are widely used. Face detection using
conventional image feature extraction algorithms is accurate and
fast. Ma et al. [5] introduced an AdaBoost-based training
approach to obtain cascade classifiers of different feature types,
such as Haar-like features and HOG features for better
discrimination. Owing to the large number of weak classifiers,
this requires substantial computation. To
overcome extreme face occlusion, a Bayesian framework-based
algorithm [6] used the Omega shape created by a person's head
and shoulder for head localization. In Automatic Teller
Machines, it performs well in detecting faces with extreme
occlusion, but the scene is small. Mathias et al. [7], in addition
to AdaBoost-based approaches, suggested face identification
using deformable component models (DPM) and obtained
promising performance. However, the computing expense of
this approach is typically high. For detecting faces with
occlusion, another approach focused on DPM is suggested [8].
While its distinctiveness is limited, since only face
representations are used in the experiments, it can reduce the
false-negative rate and identification errors in face recognition.
In the Inception-V2 block, two 3×3 convolutions are stacked in
series to substitute for a single 5×5 convolution; the receptive
field stays the same while the number of parameters decreases [9].
The paper [10] shows that replacing the last fully connected
layers of R-Net with global average pooling (GAP) and training
the fully connected layer with AM-Softmax improves results. The Labeled Faces in
the Wild dataset was introduced by Huang et al. [11], and it
covers unconstrained variations such as pose, illumination,
occlusion, and background. There are 13,233 images in the dataset,
depicting 5749 persons of different ages. The DeepID3 deep
CNN was proposed by Sun et al. [12]; in the recent era, machine
learning and deep learning have been widely used for various
purposes [13-18].
5th International Conference on Trends in Electronics and Informatics (ICOEI 2021), Tirunelveli, India, 3-5 June 2021. Pre-Print.
VGGNet and GoogLeNet are merged in their
convolutional neural network. Convolution and inception layers
are used to construct their architecture. Using the LFW dataset, they
were able to achieve 96% on the test results. The piecewise
affine transformation is often used to derive features from 3D
face rendering. The method was suggested by Taigman et al.,
[19] and is focused on 3D face modeling. This was achieved
utilizing a nine-layer convolutional neural network. On the LFW
dataset, they scored 97.35%. Sun et al. [20] suggested a system
for facial recognition that utilized a convolution neural network.
They used the features from the convolution neural network's
last secret layer. Complementary and over-complementary
images are created by merging different sections of the face. On
the LFW dataset, they scored 97.45%.
The rest of the paper is arranged as follows. Recent progress
on face detection and recognition has been discussed in this
section. The methodology and design of the whole system are
described in Section II. In Section III, the results of the system
are analyzed. Section IV concludes with a discussion of the
findings and shortcomings, as well as plans for future work.
II. RESEARCH METHODOLOGY
The human face is a significant force that plays an important
role in our everyday social experiences, such as expressing an
individual's personality. Face recognition is a biometric system
that identifies individuals by using mathematics to extract facial
features and then preserving them as a faceprint. Biometric
facial recognition technology has gained a lot of interest in
recent years owing to a large variety of uses in law enforcement
and other civilian sectors, institutes, and organisations. Thanks
to its non-contact operation, facial recognition technology has an
advantage over other biometric technologies such as
fingerprint, palmprint, and iris. Without contacting or interacting
with the user, a face recognition device may identify them from
a distance. Furthermore, facial recognition technology assists in
crime detection by preserving the recorded picture in a
repository, which can then be utilized in several ways, such as
recognizing an individual.
Face recognition systems are currently used in social
networking platforms such as Twitter, malls, train stations, bus
stops, heavily protected locations, advertisements, and health
care. The aim of these applications is to eliminate crime and
false verification, and to monitor compulsive gamblers in
casinos, while Facebook uses facial recognition technology for
automated labeling. Wide data sets and complex features are
needed for face recognition in order to uniquely recognize
various subjects by manipulating different challenges such as
lighting, posture, and aging. Facial recognition technologies
have advanced dramatically over the past three years. In the last
decade, there has been a tremendous advancement in the field of
face detection. Most facial recognition technologies now work
best with just a few faces in the picture. Furthermore, these
methods have been put to the test in controlled lighting, with
correct facial poses and non-blurry photographs. Machine
learning [21-24] on edge computing nodes is already gaining
traction, and it is only going to grow with time. The complete
diagram of the proposed system is shown in Fig. 1.
Fig. 1. Proposed System diagram.
A. Dataset Collection:
We used LFW (Face Recognition) dataset
(https://www.kaggle.com/atulanandjha/lfwpeople) for this
study. The dataset contains over 13,000 photos, of which we used
exactly 13,000 images for this study. Each picture is properly
labeled with the person's name. We have also generated our own
data collection with a total of 104 images.
Fig. 2. Sample Images from the Dataset.
B. Preprocessing and Augmentation of Data:
CNN performs better with an increasing amount of data.
ImageDataGenerator has been used to increase the amount of
data from our existing dataset. Within the ImageDataGenerator
function, we have enabled zooming, shearing, and rescaling.
Initially, the images were resized to 256 × 256 pixels.
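As a rough illustration of what the zoom and rescale options do, the transform can be sketched in NumPy (this is a stand-in, not the actual Keras ImageDataGenerator implementation; the zoom range of 0.8-1.0 is an assumed parameter, and shear is omitted for brevity):

```python
import numpy as np

def zoom_and_rescale(img, rng, zoom_range=(0.8, 1.0)):
    """Rough NumPy stand-in for ImageDataGenerator's zoom + rescale options."""
    h, w = img.shape[:2]
    z = rng.uniform(*zoom_range)          # random zoom factor (assumed range)
    ch, cw = int(h * z), int(w * z)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]    # central zoom crop
    ys = np.arange(h) * ch // h           # nearest-neighbour resize back
    xs = np.arange(w) * cw // w
    return crop[ys][:, xs] / 255.0        # rescale pixel values to [0, 1]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3))
out = zoom_and_rescale(img, rng)
print(out.shape)  # (256, 256, 3)
```

The output keeps the 256 × 256 shape expected by the first convolution layer while the pixel values are rescaled into the [0, 1] range.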
C. Proposed Convolution Neural Network (CNN) architecture
For classification and image processing, CNN is used. One
or more convolution layers make up a CNN. Rather than dealing
with the whole picture, CNN tries to identify elements that are
useful inside it. There are several hidden layers in CNN, as well
as an input layer and an output layer. In this analysis, we used a
deep CNN [25-27] with four convolution layers. Convolution is
a technique for merging two mathematical functions to create a
single one. Our CNN model's working process is depicted in Fig.
3.
Fig. 3. Three Convolution Layers with Max Pooling Operation.
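The pooling stages sketched in Fig. 3 can be reproduced in NumPy: each 2 × 2 max-pooling operation keeps the largest activation in every 2 × 2 block and halves the spatial size, taking a 256 × 256 feature map down to 32 × 32 over three stages.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value in each block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.random.default_rng(1).random((256, 256))
for _ in range(3):            # three pooling stages: 256 -> 128 -> 64 -> 32
    fmap = max_pool_2x2(fmap)
print(fmap.shape)  # (32, 32)
```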
First, images are transformed into 256 X 256 pixels and
transferred to the first convolutional layer in the proposed
architecture. There are 128 hidden layers in all. We transformed
all images into 128 X 128 after running the max-pooling
process. The second convolutional layer then extracts the feature
and again max pooling is applied. The final layer resizes the
images to 32 X 32 pixels. To render measurements simpler,
photos are transformed into NumPy arrays. The final step is to
apply a fully connected layer. We used the ReLU activation
function in each of the convolution layers, and the Softmax
activation function in the output layer. Adam stochastic gradient
descent is applied to find the optimal result. The proposed
system algorithm is shown below:
Algorithm of the proposed system:
Input: Image data
Output: Image Classification
Step 1: Input Image data
Step 2: Call Function Conv2D (Add (Number of filters,
matrix = 256 x 256, padding)
Step 3: Activate RELU)
Step 4: Call Function MaxPooling2D (add (data, pool size))
Step 5: Call Function Conv2D (Add (Number of filters,
matrix = 128 x 128, padding)
Step 6: Activate RELU)
Step 7: Call Function MaxPooling2D (add (data, pool size))
Step 8: Call Function Conv2D (Add (Number of filters,
matrix = 64 x 64, padding)
Step 9: Activate RELU)
Step 10: Call Function MaxPooling2D (add (data, pool size))
Step 11: Reshape image, set list
Step 12: Add (Multiply (image, pool size), matrix)
Step 13: Activate Softmax
Step 14: Output image classification
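One concrete reading of the steps above, sketched with Keras; the filter counts (32/64/128), the 3 × 3 kernel size, and the number of output classes are assumptions, since the paper does not specify them.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # assumed; e.g. the four self-captured classes

# Three Conv2D + ReLU + MaxPooling2D stages, then Flatten and Softmax,
# mirroring Steps 2-13 of the algorithm.
model = models.Sequential([
    layers.InputLayer(input_shape=(256, 256, 3)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 256 -> 128
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 128 -> 64
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),   # 64 -> 32
    layers.Flatten(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)
```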
D. Evaluating Performance Using Performance Metrics:
We measured the performance using precision, recall, f1-
score, and accuracy after completing the training and testing
phase. The formulas that we used are as follows:
Precision = TP / (TP + FP)                                   (1)
Recall = TP / (TP + FN)                                      (2)
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (3)
Accuracy = (TP + TN) / (TP + TN + FP + FN)                   (4)
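As a quick check of these formulas, a short helper computes all four metrics from confusion-matrix counts; the TP/FP/FN/TN values below are made-up illustrative numbers, not the paper's results.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

p, r, f1, acc = metrics(tp=90, fp=10, fn=10, tn=90)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.9 0.9 0.9 0.9
```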
III. EXPERIMENTAL RESULT ANALYSIS
Our model can recognize the face of a particular individual
based on their name. The dataset includes 13000 images, with
1680 people's images included. Our model detects faces with
95.72% accuracy on the training data. We split the data into
training and testing sets at an 80/20 ratio. The highest
validation accuracy is 96.27%. The minimum loss during
validation is 5.32%. The findings of our training and validation
dataset are shown in Table I.
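The 80/20 split described above works out as follows for 13,000 images (a plain slice for illustration; in practice the split is typically shuffled):

```python
n = 13000                     # total images in the dataset
split = n * 8 // 10           # 80% for training
train_idx = list(range(split))
test_idx = list(range(split, n))
print(len(train_idx), len(test_idx))  # 10400 2600
```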
TABLE I. RESULTS OF OUR TRAINING AND VALIDATION DATASET.
Epoch | Training Loss | Training Accuracy | Validation Loss | Validation Accuracy
1     | 44.13%        | 86.24%            | 13.64%          | 89.98%
2     | 12.14%        | 89.53%            | 8.32%           | 91.05%
3     | 10.42%        | 90.27%            | 8.12%           | 92.53%
4     | 9.43%         | 90.79%            | 8.01%           | 92.25%
5     | 9.12%         | 91.87%            | 7.63%           | 93.15%
6     | 8.87%         | 92.57%            | 7.17%           | 93.89%
7     | 8.45%         | 92.89%            | 6.89%           | 94.04%
8     | 8.15%         | 94.05%            | 6.77%           | 95.12%
9     | 6.23%         | 94.34%            | 6.01%           | 95.53%
10    | 5.89%         | 95.72%            | 5.32%           | 96.27%
Fig. 4. Validation Accuracy and Training Accuracy for CNN with Max
Pooling Layer.
The max-pooling model performed well in this study. The
model achieved 95.72% training accuracy and 96.27% validation
accuracy. The LFW dataset contains 13,000 images, with 80% of
them used for training and the remaining 2600 images used for
testing. The confusion matrix is shown below, after the
evaluation was completed on the dataset.
The confusion matrix is depicted in Table II.
TABLE II. CONFUSION MATRIX ON LFW DATASET.
Class        | Precision | Recall | F1-Score | Accuracy
Training Set | 96.22%    | 93.03% | 91.00%   | 93.28%
Testing Set  | 93.24%    | 93.29% | 91.87%   | 93.42%
Later, we used our self-captured selfie photos, with a total of
four classes (1. shamrat, 2. shongkho, 3. masum, 4. jubair). We
have a total of 200 pictures, with 50 images from each class. The
confusion matrix is displayed in Table III.
TABLE III. CONFUSION MATRIX AFTER APPLYING MAXPOOLING.
Class    | Precision | Recall | F1-Score | Accuracy
Shamrat  | 96.24     | 94.79  | 94.43    | 96.33
Shongkho | 95.89     | 94.36  | 94.00    | 95.78
Masum    | 95.93     | 94.77  | 94.38    | 96.01
Jubair   | 94.69     | 94.23  | 94.62    | 94.73
This model is capable of recognizing faces from datasets.
Fig. 5 shows the detection results of our proposed model.
Fig. 5. Successful Detection of Faces from Dataset Images.
IV. CONCLUSION AND FUTURE WORK
Face detection is a way of identifying or verifying an
individual's identity by utilizing his/her face. Face recognition
has been used for multiple purposes, such as automated
attendance control, monitoring of limited-access areas (e.g.,
intruder detection in living areas), public recognition of
celebrities, recognition of household residents by a networked
home automation device, and much more. In this study, a deep
Convolutional Neural Network (CNN)
[28] architecture was used. Our main goal was to recommend a
proper model with high accuracy such that face recognition
could be accurate. In future work, we will attempt to apply
further models on a larger dataset and compare them with our
proposed model. Our model can be applied to any automated
system [29] or IoT [30-34] embedded device for more accurate
face recognition.
REFERENCES
[1] J. Deng, J. Guo, N. Xue, and S. Zafeiriou, ‘‘ArcFace: Additive angular
margin loss for deep face recognition,’’ in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4690–4699.
[2] D. Chen, C. Xu, J. Yang, J. Qian, Y. Zheng, and L. Shen, ‘‘Joint Bayesian
guided metric learning for end-to-end face verification,’’
Neurocomputing, vol. 275, pp. 560–567, Jan. 2018.
[3] M. H. Khan, J. McDonagh, and G. Tzimiropoulos, ‘‘Synergy between
face alignment and tracking via discriminative global consensus
optimization,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017,
pp. 3811–3819.
[4] M. Drożdż and T. Kryjak, ‘‘FPGA implementation of multi-scale face
detection using HOG features and SVM classifier,’’ Image Process.
Commun., vol. 21, no. 3, pp. 27–44, Sep. 2016.
[5] C. Ma, N. Trung, H. Uchiyama, H. Nagahara, A. Shimada, and R.-I.
Taniguchi, ‘‘Adapting local features for face detection in thermal image,’’
Sensors, vol. 17, no. 12, p. 2741, Nov. 2017.
[6] T. Zhang, J. Li, W. Jia, J. Sun, and H. Yang, ‘‘Fast and robust occluded
face detection in ATM surveillance,’’ Pattern Recognit. Lett., vol. 107,
pp. 33–40, May 2018.
[7] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, ‘‘Face
detection without bells and whistles,’’ in Proc. Eur. Conf. Comput. Vis.
Springer, 2014, pp. 720–735.
[8] D. Marcetic and S. Ribaric, ‘‘Deformable part-based robust face detection
under occlusion by using face decomposition into face components,’’ in
Proc. 39th Int. Conv. Inf. Commun. Technol., Electron. Microelectron.
(MIPRO), May 2016, pp. 1365–1370.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
V. Vanhoucke, and A. Rabinovich, ``Going deeper with convolutions,'' in
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp.
1-9.
[10] X. Li, Z. Yang and H. Wu, "Face Detection Based on Receptive Field
Enhanced Multi-Task Cascaded Convolutional Neural Networks," in
IEEE Access, vol. 8, pp. 174922-174930, 2020, doi:
10.1109/ACCESS.2020.3023782.
[11] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, ‘‘Labeled faces
in the wild: A database for studying face recognition in unconstrained
environments,’’ in Proc. Workshop Faces ‘Real-Life’ Images, Detection,
Alignment, Recognit., Oct. 2008, pp. 1–11.
[12] Y. Sun, D. Liang, X. Wang, and X. Tang, ‘‘DeepID3: Face recognition
with very deep neural networks,’’ Feb. 2015, arXiv:1502.00873. [Online].
Available: https://arxiv.org/abs/1502.00873.
[13] F. M. Javed Mehedi Shamrat, Z. Tasnim, P. Ghosh, A. Majumder and M.
Z. Hasan, "Personalization of Job Circular Announcement to Applicants
Using Decision Tree Classification Algorithm," 2020 IEEE International
Conference for Innovation in Technology (INOCON), Bengaluru, India,
2020, pp. 1-5, doi: 10.1109/INOCON50539.2020.9298253.
[14] S. Manlangit, “Novel Machine Learning Approach for Analyzing
Anonymous Credit Card Fraud Patterns,” International Journal of
Electronic Commerce Studies, vol. 10, no. 2, 2019.
[15] F. M. Javed Mehedi Shamrat, P. Ghosh, M. H. Sadek, M. A. Kazi and S.
Shultana, "Implementation of Machine Learning Algorithms to Detect the
Prognosis Rate of Kidney Disease," 2020 IEEE International Conference
for Innovation in Technology (INOCON), Bengaluru, India, 2020, pp. 1-7,
doi: 10.1109/INOCON50539.2020.9298026.
[16] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin, A. A.
Anjum and A. A. Khan, "Optimization of Prediction Method of Chronic
Kidney Disease Using Machine Learning Algorithm," 2020 15th
International Joint Symposium on Artificial Intelligence and Natural
Language Processing (iSAI-NLP), Bangkok, Thailand, 2020, pp. 1-6, doi:
10.1109/iSAI-NLP51646.2020.9376787.
[17] K. Mahmud, S. Azam, A. Karim, S. Zobaed, B. Shanmugam, and D.
Mathur, “Machine Learning Based PV Power Generation Forecasting in
Alice Springs,” IEEE Access, pp. 1–1, 2021.
[18] F.M. Javed Mehedi Shamrat, Md. Asaduzzaman, A.K.M. Sazzadur
Rahman, Raja Tariqul Hasan Tusher, Zarrin Tasnim “A Comparative
Analysis of Parkinson Disease Prediction Using Machine Learning
Approaches” International Journal of Scientific & Technology Research,
Volume 8, Issue 11, November 2019, ISSN: 2277-8616, pp: 2576-2580.
[19] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, ‘‘DeepFace: Closing the
gap to human-level performance in face verification,’’ in Proc. IEEE
Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1701–1708.
[20] Y. Sun, X. Wang, and X. Tang, ‘‘Deep learning face representation from
predicting 10,000 classes,’’ in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., Jun. 2014, pp. 1891–1898.
[21] F. M. Javed Mehedi Shamrat, Md. Abu Raihan, A.K.M. Sazzadur
Rahman, Imran Mahmud, Rozina Akter, “An Analysis on Breast Disease
Prediction Using Machine Learning Approaches” International Journal of
Scientific & Technology Research, Volume 9, Issue 02, February 2020,
ISSN: 2277-8616, pp: 2450-2455.
[22] P. Ghosh et al., "Efficient Prediction of Cardiovascular Disease Using
Machine Learning Algorithms With Relief and LASSO Feature Selection
Techniques," in IEEE Access, vol. 9, pp. 19304-19326, 2021, doi:
10.1109/ACCESS.2021.3053759.
[23] A.K.M Sazzadur Rahman, F. M. Javed Mehedi Shamrat, Zarrin Tasnim,
Joy Roy, Syed Akhter Hossain “A Comparative Study on Liver Disease
Prediction Using Supervised Machine Learning Algorithms”
International Journal of Scientific & Technology Research, Volume 8,
Issue 11, November 2019, ISSN: 2277-8616, pp: 419-422.
[24] F. M. Javed Mehedi Shamrat, Zarrin Tasnim, Imran Mahmud, Ms. Nusrat
Jahan, Naimul Islam Nobel, “Application Of K-Means Clustering
Algorithm To Determine The Density Of Demand Of Different Kinds Of
Jobs”, International Journal of Scientific & Technology Research,
Volume 9, Issue 02, February 2020, ISSN: 2277-8616, pp: 2550-2557.
[25] A. Karim, S. Azam, B. Shanmugam, and K. Kannoorpatti, “Efficient
Clustering of Emails Into Spam and Ham: The Foundational Study of a
Comprehensive Unsupervised Framework,” IEEE Access, vol. 8, pp.
154759–154788, 2020.
[26] P. Ghosh et al., “A Comparative Study of Different Deep Learning Model
for Recognition of Handwriting Digits,” International Conference on IoT
Based Control Networks and Intelligent Systems (ICICNIS 2020), pp. 857
– 866, January 19, 2021.
[27] M. F. Foysal, M. S. Islam, A. Karim, and N. Neehal, “Shot-Net: A
Convolutional Neural Network for Classifying Different Cricket Shots,”
Communications in Computer and Information Science, pp. 111–120,
2019.
[28] Junayed M.S., Jeny A.A., Neehal N., Atik S.T., Hossain S.A. (2019) A
Comparative Study of Different CNN Models in City Detection Using
Landmark Images. In: Santosh K., Hegadi R. (eds) Recent Trends in
Image Processing and Pattern Recognition. RTIP2R 2018.
Communications in Computer and Information Science, vol 1035.
Springer, Singapore. https://doi.org/10.1007/978-981-13-9181-1_48.
[29] Biswas A., Chakraborty S., Rifat A.N.M.Y., Chowdhury N.F., Uddin J.
(2020) Comparative Analysis of Dimension Reduction Techniques Over
Classification Algorithms for Speech Emotion Recognition. In: Miraz
M.H., Excell P.S., Ware A., Soomro S., Ali M. (eds) Emerging
Technologies in Computing. iCETiC 2020. Lecture Notes of the Institute
for Computer Sciences, Social Informatics and Telecommunications
Engineering, vol 332. Springer, Cham. https://doi.org/10.1007/978-3-
030-60036-5_12.
[30] Javed Mehedi Shamrat F.M., Allayear S.M., Alam M.F., Jabiullah M.I.,
Ahmed R. (2019) A Smart Embedded System Model for the AC
Automation with Temperature Prediction. In: Singh M., Gupta P., Tyagi
V., Flusser J., Ören T., Kashyap R. (eds) Advances in Computing and
Data Sciences. ICACDS 2019. Communications in Computer and
Information Science, vol 1046. Springer, Singapore.
https://doi.org/10.1007/978-981-13-9942-8_33
[31] F. M. Javed Mehedi Shamrat, Zarrin Tasnim, Naimul Islam Nobel, and
Md. Razu Ahmed. 2019. An Automated Embedded Detection and Alarm
System for Preventing Accidents of Passengers Vessel due to Overweight.
In Proceedings of the 4th International Conference on Big Data and
Internet of Things (BDIoT'19). Association for Computing Machinery,
New York, NY, USA, Article 35, 1–5.
DOI:https://doi.org/10.1145/3372938.3372973
[32] Shamrat F.M.J.M., Nobel N.I., Tasnim Z., Ahmed R. (2020)
Implementation of a Smart Embedded System for Passenger Vessel
Safety. In: Saha A., Kar N., Deb S. (eds) Advances in Computational
Intelligence, Security and Internet of Things. ICCISIoT 2019.
Communications in Computer and Information Science, vol 1192.
Springer, Singapore. https://doi.org/10.1007/978-981-15-3666-3_29.
[33] A. Islam Chowdhury, M. Munem Shahriar, A. Islam, E. Ahmed, A.
Karim, and M. Rezwanul Islam, “An Automated System in ATM Booth
Using Face Encoding and Emotion Recognition Process,” 2020 2nd
International Conference on Image Processing and Machine Vision, 2020.
[34] F.M. Javed Mehedi Shamrat, Shaikh Muhammad Allayear and Md. Ismail
Jabiullah "Implementation of a Smart AC Automation System with Room
Temperature Prediction ", Journal of the Bangladesh Electronic Society,
Volume 18, Issue 1-2, June-December 2018, ISSN: 1816-1510, pp: 23-
32.