Computer Vision in Healthcare Applications

Junfeng Gao ,
Yong Yang ,
Pan Lin,
and Dong Sun Park
College of Biomedical Engineering, South-Central University for Nationalities, Wuhan 430074, China
Key Laboratory of Cognitive Science, State Ethnic Affairs Commission, Wuhan 430074, China
Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis & Treatment, Wuhan 430074, China
School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032, China
IT Convergence Research Center, Chonbuk National University, Jeonju, Jeonbuk 54896, Republic of Korea
Correspondence should be addressed to Yong Yang;
Received 27 December 2017; Accepted 28 December 2017; Published 4 March 2018
Copyright © 2018 Junfeng Gao et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Research in computer vision, image processing, and pattern recognition has made substantial progress over the past several decades. Medical imaging has also attracted increasing attention in recent years, owing to its vital role in healthcare applications. Investigators have published a wealth of basic science and data documenting the progress of medical imaging and its healthcare applications. Since developments in these research fields have enabled clinicians to advance from the bench to the bedside, the Journal of Healthcare Engineering set out to publish this special issue devoted to advanced computer vision methods for healthcare engineering, along with review articles intended to stimulate continuing efforts to understand the problems commonly encountered in this field. The result is a collection of fifteen outstanding articles submitted by investigators.
In line with the goal of the special issue, we identify four major domains covered by the papers: the first is medical image analysis for healthcare, the second is computer vision for predictive analytics and therapy, the third is fundamental algorithms for medical images, and the last focuses on machine learning algorithms for medical images. Below, we review these published papers.
1. Analysis of Medical Images
This theme addresses improvements and new techniques in the analysis of medical images. First, integrating multimodal information obtained from different diagnostic imaging techniques is essential for a comprehensive characterization of the region under examination. Image coregistration has therefore become crucial both for qualitative visual assessment and for quantitative multiparametric analysis in research applications. S. Monti et al. (Italy), in "An Evaluation of the Benefits of Simultaneous Acquisition on PET/MR Coregistration in Head/Neck Imaging," compare the performance of traditional coregistration methods applied to PET and MR images acquired as single modalities against the implicit coregistration of a hybrid PET/MR scanner, in complex anatomical regions such as the head/neck (HN). The experimental results show that hybrid PET/MR provides higher registration accuracy than retrospectively coregistered images.
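At its simplest, retrospective coregistration reduces to estimating a geometric transform between two images. As a minimal illustration (not the methods evaluated in the paper, which handle full 3D multimodal registration), the sketch below recovers a pure translation between two images via FFT-based cross-correlation:

```python
import numpy as np

def estimate_shift(fixed, moving):
    """Estimate the integer translation that realigns `moving` to `fixed`
    via FFT cross-correlation (a minimal rigid-registration step).
    The returned shift, applied to `moving` with np.roll, recovers `fixed`."""
    f = np.fft.fft2(fixed)
    g = np.fft.fft2(moving)
    cc = np.fft.ifft2(f * np.conj(g)).real          # correlation surface
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # wrap peaks past the midpoint around to negative shifts
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, cc.shape))

# synthetic demo: displace an image by a known offset and recover it
rng = np.random.default_rng(0)
img = rng.random((64, 64))
moved = np.roll(img, shift=(5, -3), axis=(0, 1))
print(estimate_shift(img, moved))  # → (-5, 3): the shift that undoes the roll
```

Real PET/MR coregistration must additionally handle rotation, scaling, deformation, and differing contrasts, which is what makes the hybrid scanner's implicit alignment attractive.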
Feature extraction is one of the key issues in the analysis of medical images. I. I. Esener et al. (Turkey), in "A New Feature Ensemble with a Multistage Classification Scheme for Breast Cancer Diagnosis," develop a new and effective feature ensemble with a multistage classification scheme for use in a computer-aided diagnosis (CAD) system for breast cancer. In this method, features including local configuration pattern-based, statistical, and frequency-domain features are concatenated into feature vectors, and eight well-known classifiers are used in a multistage classification scheme. High classification accuracy was obtained, showing that the proposed multistage scheme is more effective than single-stage classification for breast cancer diagnosis.
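A multistage scheme of this general shape can be sketched with off-the-shelf classifiers. The example below is a hypothetical two-stage reduction of the idea (the paper uses eight classifiers and its own feature ensemble): a first classifier decides the confidently predicted samples and passes the remainder to a second stage. The data, classifiers, and confidence threshold are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic stand-in for concatenated texture/statistical/frequency features
X, y = make_classification(n_samples=600, n_features=40,
                           n_informative=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

stage1 = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
stage2 = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

proba = stage1.predict_proba(Xte)
confident = proba.max(axis=1) >= 0.8                 # stage 1 keeps easy cases
pred = proba.argmax(axis=1)
pred[~confident] = stage2.predict(Xte[~confident])   # hard cases go to stage 2

print("accuracy:", (pred == yte).mean())
```

The appeal of staging is that later classifiers only see the samples earlier ones found ambiguous, which is where a more expressive model earns its cost.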
Journal of Healthcare Engineering
Volume 2018, Article ID 5157020, 4 pages
Currently, the standard approach to reducing colorectal cancer-related mortality is regular screening for polyps, which suffers from a nonzero polyp miss rate and an inability to visually assess polyp malignancy. D. Vazquez et al. (Spain and Canada), in "A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images," propose an extended benchmark for colonoscopy image segmentation and establish a new strong baseline for colonoscopy image analysis. By training a standard fully convolutional network (FCN), they show that in endoluminal scene segmentation the FCN outperforms the results of prior research.
2. Computer Vision for Predictive Analytics
and Therapy
Computer vision techniques have found important applications in surgery and in the therapy of certain diseases. Recently, three-dimensional (3D) modeling and rapid prototyping technologies, built on medical imaging modalities such as CT and MRI, have developed rapidly. P. Gargiulo et al. (Iceland), in "New Directions in 3D Medical Modeling: 3D-Printing Anatomy and Functions in Neurosurgical Planning," combine CT and MRI images with DTI tractography and use image segmentation protocols to build 3D models of the skull base, tumor, and five eloquent fiber tracts. The authors provide an approach with great potential for advanced neurosurgical preparation.
Elderly people are prone to falls, which harm the body and can also have serious negative psychological effects. T.-H. Lin et al. (Taiwan), in "Fall Prevention Shoes Using Camera-Based Line-Laser Obstacle Detection System," design an interesting line-laser obstacle detection system to prevent falls among the elderly. In the system, a laser line sweeps a horizontal plane at a specific height above the ground, while the camera's optical axis is inclined at a specific angle to that plane; the camera can therefore observe the laser pattern and detect potential obstacles. A limitation is that the system is designed mainly for indoor applications rather than outdoor environments.
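The distance measurement in such a camera-plus-laser-plane arrangement comes from simple triangulation: the viewing ray through an image row is intersected with the known laser plane. The sketch below uses entirely hypothetical camera parameters to illustrate the geometry; the actual system's calibration and layout may differ.

```python
import math

# illustrative parameters (hypothetical, not from the paper)
F_PX    = 600.0                  # focal length in pixels
CY      = 240.0                  # principal-point image row
TILT    = math.radians(20.0)     # camera depression angle below horizontal
H_CAM   = 0.30                   # camera height above ground (m)
H_LASER = 0.05                   # height of the horizontal laser plane (m)

def pixel_row_to_distance(v):
    """Horizontal distance at which the viewing ray for image row v
    crosses the laser plane (larger v = lower in the image)."""
    angle = TILT + math.atan2(v - CY, F_PX)   # total depression angle of ray
    return (H_CAM - H_LASER) / math.tan(angle)

def distance_to_pixel_row(d):
    """Inverse mapping: image row where a laser return at distance d appears."""
    angle = math.atan2(H_CAM - H_LASER, d)
    return CY + math.tan(angle - TILT) * F_PX

# round trip: a laser return at 0.8 m maps to an image row and back
row = distance_to_pixel_row(0.8)
print(round(pixel_row_to_distance(row), 6))  # → 0.8
```

An obstacle intercepting the laser plane shifts the stripe to a different image row, which this mapping converts into a distance estimate.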
Human activity recognition (HAR) is one of the most widely studied computer vision problems. S. Zhang et al. (China), in "A Review on Human Activity Recognition Using Vision-Based Method," present an overview of various HAR approaches and their evolution, with representative classical literature. The authors highlight advances in image representation approaches and classification methods for vision-based activity recognition. Representation approaches generally include global, local, and depth-based representations. They accordingly divide human activities into three levels: action primitives, actions/activities, and interactions. They also summarize the classification techniques used in HAR applications, spanning seven types of methods from classic dynamic time warping (DTW) to the latest deep learning. Lastly, they note that although recent HAR approaches have achieved great success, applying them in real-world systems remains highly challenging, and they recommend three future directions.
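For reference, the classic DTW baseline mentioned above can be written in a few lines of dynamic programming; its value for activity recognition is that it tolerates temporal misalignment between motion traces that a pointwise distance would penalize.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: insertion, deletion, or match with the previous cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# a time-shifted copy of a motion trace stays close under DTW
t = np.linspace(0, 2 * np.pi, 50)
a, b = np.sin(t), np.roll(np.sin(t), 3)
print(dtw_distance(a, a))                       # identical sequences: 0.0
print(dtw_distance(a, b), np.abs(a - b).sum())  # DTW < direct L1 distance
```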
3. Fundamental Algorithms for Medical Images
The majority of this issue focuses on improved algorithms for medical images. Organ segmentation is a prerequisite for CAD systems; indeed, segmentation is among the most important and fundamental operations in image processing, and it also raises the quality of disease prediction and therapy. C. Pan et al. (China), in "Leukocyte Image Segmentation Using Novel Saliency Detection Based on Positive Feedback of Visual Perception," use an ensemble of polyharmonic extreme learning machines (EPELM) and positive feedback of perception to detect salient objects; unlike existing algorithms, the method is entirely data-driven, requiring no prior knowledge or labeled samples. A positive feedback module based on EPELM focuses on the fixation area in order to intensify objects, inhibit noise, and promote saturation in perception. Experiments on several standard image databases show that the algorithm outperforms conventional saliency detection algorithms and successfully segments nucleated cells under different imaging conditions.
High-intensity focused ultrasound (HIFU) has been proposed for the safe ablation of both malignant and benign tissues and as an agent for drug delivery, while MRI has been proposed for guidance and monitoring of the therapy. A. Vargas-Olivares et al. (Mexico and Canada), in "Segmentation Method for Magnetic Resonance-Guided High-Intensity Focused Ultrasound Therapy Planning," use MR images for HIFU therapy planning and propose an efficient segmentation approach. The segmentation scheme uses the watershed method to identify the regions of interest for HIFU treatment. In addition, the authors propose a thread-pool strategy to reduce the processing time of the segmentation algorithm over groups of MR images.
Recently, the random walker (RW) algorithm has attracted growing interest for medical image segmentation. However, the classical RW method requires long computation times and high memory usage, because a large-scale graph must be constructed and the resulting sparse linear system solved. C. Dong et al. (China and USA), in "An Improved Random Walker with Bayes Model for Volumetric Medical Image Segmentation," incorporate prior (shape and intensity) knowledge into the optimization of the sparse linear system. By integrating the Bayes model into the RW sparse system, the organ is automatically segmented in adjacent slices; the resulting method is called the RWBayes algorithm in the article. Compared with conventional RW and state-of-the-art interactive segmentation methods, their method significantly improves segmentation accuracy and could be extended to segment other organs in the future.
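To make the "sparse linear system" concrete: in the classical RW formulation, seeded pixels impose Dirichlet boundary conditions and the label probabilities of unseeded pixels solve a sparse graph-Laplacian system, which is exactly what grows expensive on volumetric data. The sketch below is a minimal two-label version of the classical algorithm (not the RWBayes extension); the beta parameter and test image are illustrative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def random_walker_2class(img, seeds, beta=50.0):
    """Minimal classical random-walker segmentation for two labels.
    seeds: 0 = unknown, 1 or 2 = user-placed seeds."""
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    # 4-connected edges with intensity-based Gaussian weights
    pairs = np.concatenate([
        np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1),
        np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1),
    ])
    g = img.ravel().astype(float)
    wts = np.exp(-beta * (g[pairs[:, 0]] - g[pairs[:, 1]]) ** 2)
    W = sparse.coo_matrix(
        (np.r_[wts, wts],
         (np.r_[pairs[:, 0], pairs[:, 1]], np.r_[pairs[:, 1], pairs[:, 0]])),
        shape=(n, n)).tocsr()
    L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W   # graph Laplacian
    s = seeds.ravel()
    u = s == 0                                  # unseeded nodes
    # Dirichlet problem: seeded nodes fix the boundary values
    rhs = -L[u][:, ~u] @ (s[~u] == 1).astype(float)
    prob1 = spsolve(L[u][:, u].tocsc(), rhs)    # P(label 1) at unseeded nodes
    out = s.copy()
    out[u] = np.where(prob1 > 0.5, 1, 2)
    return out.reshape(h, w)

# two flat intensity regions split at column 5, one seed in each
img = np.zeros((10, 10)); img[:, 5:] = 1.0
seeds = np.zeros((10, 10), dtype=int); seeds[5, 1] = 1; seeds[5, 8] = 2
seg = random_walker_2class(img, seeds)
print(seg)
```

Even on this 10x10 toy image the system has ~100 unknowns; on a volume the unknown count is the voxel count, which is why the paper's priors and the literature's solver tricks matter.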
Automatic segmentation of the spinal cord in MR images remains a difficult task. C.-C. Liao et al. (Taiwan), in "Atlas-Free Cervical Spinal Cord Segmentation on Midsagittal T2-Weighted Magnetic Resonance Images," present an automatic segmentation method for sagittal T2-weighted images. The method is atlas-free: an expectation-maximization algorithm clusters the pixels of a midsagittal MR image according to their gray levels (signal intensities), and dynamic programming is used to detect anatomical structures and their edges. Detection of the anterior and posterior edges of the spinal cord within the cervical spinal canal succeeds in all 79 images, demonstrating high accuracy and robustness. Based on this algorithm, used alone or combined with others, one could develop a computer-aided diagnosis system for mass screening of cervical spine diseases. Finally, the authors point out several limitations of the algorithm, such as its inability to be applied to lower lumbar spinal levels.
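The EM clustering step described above amounts to fitting a Gaussian mixture to pixel gray levels. A minimal 1-D, two-component EM (a simplification of the paper's clustering, with deterministic min/max initialization chosen here for robustness) looks like this:

```python
import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture: the kind of intensity
    clustering used to separate tissue classes on a single slice."""
    mu = np.array([x.min(), x.max()], dtype=float)   # deterministic init
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel
        d = (x[:, None] - mu) ** 2
        p = pi * np.exp(-0.5 * d / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, pi

# synthetic "pixel intensities": dark background plus a brighter structure
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.2, 0.05, 2000), rng.normal(0.8, 0.05, 500)])
mu, var, pi = em_gmm_1d(x)
print(np.sort(mu))   # component means near 0.2 and 0.8
```

Each pixel is then assigned to the component with the highest responsibility, giving the intensity clusters that the paper's dynamic programming step refines into anatomical edges.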
Misalignment caused by motion and deformation often introduces errors when fitting an apparent diffusion coefficient (ADC) map from prostate diffusion-weighted imaging (DWI), and the ADC map is an important indicator in diagnosing prostate cancer. To date, few studies have focused on this misalignment in prostate DWI. L. Hao et al. (China), in "Nonrigid Registration of Prostate Diffusion-Weighted MR," apply an affine transformation to the DWI to correct intraslice motion; nonrigid registration based on free-form deformation (FFD) is then used to compensate for intraimage deformations. The experimental results show that the proposed algorithm corrects the misalignment of prostate DWI and decreases artifacts in the regions of interest of the ADC maps. The resulting ADC maps have sharper lesion contours, which helps improve the diagnosis and clinical staging of prostate cancer.
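For context on the quantity the registration protects: under the monoexponential DWI model S(b) = S0·exp(-b·ADC), the ADC map is obtained by a per-voxel log-linear least-squares fit over b-values; misaligned b-value images corrupt this fit voxel by voxel. A minimal noise-free sketch (synthetic values, not the paper's data):

```python
import numpy as np

def fit_adc(signals, bvals):
    """Log-linear least-squares fit of the monoexponential DWI model
    S(b) = S0 * exp(-b * ADC), fitted per voxel.
    signals: (n_b, n_vox) DWI intensities; bvals: (n_b,) b-values."""
    bvals = np.asarray(bvals, dtype=float)
    # log S = log S0 - b * ADC  →  linear system in [log S0, ADC]
    A = np.stack([np.ones_like(bvals), -bvals], axis=1)
    coef, *_ = np.linalg.lstsq(A, np.log(signals), rcond=None)
    log_s0, adc = coef
    return np.exp(log_s0), adc

# two synthetic voxels with known ADC values (units: mm^2/s)
bvals = np.array([0.0, 400.0, 800.0])
true_adc = np.array([0.8e-3, 2.0e-3])
s0 = np.array([1000.0, 1200.0])
signals = s0 * np.exp(-bvals[:, None] * true_adc)
s0_hat, adc_hat = fit_adc(signals, bvals)
print(adc_hat)
```

If the b-value images were shifted relative to one another, each voxel's signals would mix neighboring tissues and the fitted ADC would be biased, which is the artifact the affine + FFD registration removes.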
Medical ultrasound is widely used in the diagnosis and assessment of internal body structures and plays a key role in treating various diseases, owing to its safety, noninvasiveness, and good patient tolerance. However, the images are invariably contaminated with speckle noise, which hinders the identification of image details. Many methods have been proposed to remove this noise while preserving image detail. M. Szczepański and K. Radlak (Poland), in "Digital Path Approach Despeckle Filter for Ultrasound Imaging and Video," propose so-called escaping paths, which build on traditional digital paths, and extend the concept from the spatial domain (2D) to the spatiotemporal domain (3D); the filter is designed for multiplicative noise suppression, specifically for ultrasound image and video filtering. In addition, an extended neighborhood model based on the von Neumann neighborhood from cellular automata theory is used to increase the filter's denoising ability. The experimental results show that the proposed technique outperforms state-of-the-art approaches for multiplicative noise removal with lower computational load, enabling image processing and enhancement of video streams in real time.
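The paper's escaping-path filter is specialized, but the core premise of treating speckle as multiplicative noise can be illustrated with a standard baseline: filter in the log domain, where multiplicative noise becomes approximately additive, then exponentiate back. The noise model and filter size below are illustrative, not the authors' method.

```python
import numpy as np
from scipy import ndimage

def log_domain_despeckle(img, size=5):
    """Baseline multiplicative-noise suppression: take logs so speckle
    becomes (approximately) additive, median-filter, exponentiate back."""
    eps = 1e-6                                   # avoid log(0)
    logged = np.log(img + eps)
    smoothed = ndimage.median_filter(logged, size=size)
    return np.exp(smoothed) - eps

# synthetic test: piecewise-constant image corrupted by multiplicative speckle
rng = np.random.default_rng(0)
clean = np.ones((64, 64)); clean[:, 32:] = 4.0
speckled = clean * rng.gamma(shape=10.0, scale=0.1, size=clean.shape)
denoised = log_domain_despeckle(speckled)

err_before = np.abs(speckled - clean).mean()
err_after = np.abs(denoised - clean).mean()
print(err_before, err_after)   # error drops after despeckling
```

The median filter preserves the intensity edge at column 32 while suppressing the speckle, which is the same edge-preservation goal the digital-path family pursues with far more sophisticated neighborhoods.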
A primary challenge in accelerating MR imaging is reconstructing high-resolution images from undersampled k-space data; there is a trade-off between spatial resolution and temporal resolution. J. Chen et al. (China), in "Low-Rank and Sparse Decomposition Model for Accelerating Dynamic MRI Reconstruction," introduce a low-rank and sparse decomposition model, based on the theory of robust principal component analysis (RPCA), to address this problem. Unlike k-t RPCA (a method that uses the low-rank plus sparse decomposition prior to reconstruct dynamic MRI from partial k-space measurements), the authors use the inexact augmented Lagrangian method (IALM) to solve the RPCA optimization and accelerate dynamic MRI reconstruction from highly undersampled k-space data; the formulation generalizes the separation of dynamic MR data into low-rank and sparse components. Experimental results on cardiac datasets show that the proposed method achieves more satisfactory reconstruction performance and faster reconstruction speed than state-of-the-art reconstruction methods.
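The RPCA decomposition that IALM solves can be sketched directly: alternate singular-value thresholding for the low-rank part with entrywise soft-thresholding for the sparse part, updating a dual variable between them. This is a generic IALM sketch on a synthetic matrix, not the authors' reconstruction pipeline (which additionally handles undersampled k-space measurements).

```python
import numpy as np

def rpca_ialm(M, max_iter=200, tol=1e-7):
    """Inexact augmented Lagrangian method (IALM) for robust PCA:
    split M into L (low-rank) + S (sparse) by solving
    min ||L||_* + lam * ||S||_1   subject to   M = L + S."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(M, 2)
    Y = M / max(norm2, np.abs(M).max() / lam)        # dual variable init
    mu, rho = 1.25 / norm2, 1.5
    S = np.zeros_like(M)
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    for _ in range(max_iter):
        # low-rank update: singular-value soft-thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # sparse update: entrywise soft-thresholding
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S                                # constraint residual
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

# synthetic "dynamic MR" matrix: rank-2 background plus sparse events
rng = np.random.default_rng(0)
L0 = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
S0 = np.where(rng.random((40, 40)) < 0.05, 5.0, 0.0)
M = L0 + S0
L, S = rpca_ialm(M)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))   # near-exact recovery
```

In the dynamic-MRI setting, frames are stacked as columns of M: the slowly varying anatomy forms the low-rank part and rapid motion or contrast changes form the sparse part.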
4. Machine Learning Algorithms for
Medical Images
The older adult population is growing rapidly worldwide, and this will have a great impact on healthcare systems. Many elderly people have limited self-care ability, so healthcare and nursing robots have drawn much attention in recent years. Although somatosensory technology has been introduced into activity recognition and healthcare interaction for the elderly, traditional detection methods typically rely on a single modality. To develop an efficient and convenient interaction assistant system for nurses and patients with dementia, X. Dang et al. (China), in "An Interactive Care System Based on a Depth Image and EEG for Aged Patients with Dementia," propose two novel multimodal sparse autoencoder frameworks based on motion and mental features. First, motion features are extracted after preprocessing of the depth images, and EEG signals are recorded as the mental feature. The proposed system is built on multimodal deep neural networks for patients with dementia who have special needs. The input features of the networks are (1) motion features extracted from the depth image sensor and (2) EEG features; the output layer recognizes the type of help the patient requires. Experimental results show that the proposed algorithm simplifies the recognition process and achieves accuracy and recall of 96.5% and 96.4%, respectively, on the shuffled dataset, and 90.9% and 92.6%, respectively, on the continuous dataset. The proposed algorithms also simplify acquisition and data processing while maintaining a high action recognition rate compared with traditional methods.
Recently, deep learning has become very popular in artificial intelligence. Q. Song et al. (China), in "Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images," employ a convolutional neural network (CNN), a deep neural network (DNN), and a stacked autoencoder (SAE) to assist doctors in the early diagnosis of lung cancer. The experimental results suggest that the CNN achieved the best performance of the three.
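The building block distinguishing the CNN from the other two models is the convolutional layer. A minimal single-channel forward pass (a didactic sketch, not the paper's architecture) shows how a small kernel produces an edge-sensitive feature map from an image patch:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D convolution (cross-correlation, as in
    CNN frameworks) computed with an explicit sliding window."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def relu(x):
    return np.maximum(x, 0.0)

# a horizontal-gradient kernel responds along the right edge of a
# bright square, standing in for a nodule-like structure
img = np.zeros((8, 8)); img[2:6, 2:6] = 1.0
edge_kernel = np.array([[1.0, -1.0]])
feature_map = relu(conv2d_valid(img, edge_kernel))
print(feature_map.shape)   # → (8, 7)
```

Stacking many such learned kernels, with pooling and nonlinearities between them, is what lets a CNN build nodule-discriminative features directly from CT pixels, whereas the DNN and SAE operate on flattened inputs without this spatial weight sharing.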
N. D. Kamarudin et al. (Malaysia and Japan), in "A Fast SVM-Based Tongue's Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis," propose a two-stage classification system for tongue color diagnosis aided by devised clustering identifiers; it can diagnose three tongue colors: red, light red, and deep red. The diagnosis system is very useful for the early detection of imbalance conditions inside the body. Experimental results show that this classification system outperforms conventional SVM by 20% in computational time and 15% in classification accuracy.
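A two-stage pipeline of this general form, with unsupervised k-means outputs feeding a supervised SVM, can be sketched as below. The colour values, cluster count, and the use of centroid distances as the "clustering identifier" features are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# hypothetical stand-in for tongue-region colour features (mean RGB):
# three colour classes drawn as Gaussian blobs
rng = np.random.default_rng(0)
centers = np.array([[180, 60, 60], [220, 150, 150], [120, 30, 40]], float)
X = np.vstack([c + rng.normal(0, 8, (100, 3)) for c in centers])
y = np.repeat([0, 1, 2], 100)            # red / light red / deep red

# stage 1: k-means supplies cluster-identifier features
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X_aug = np.hstack([X, km.transform(X)])  # append distances to each centroid

# stage 2: SVM classifies using colour + cluster-distance features
clf = SVC(kernel="rbf", gamma="scale").fit(X_aug, y)
print(clf.score(X_aug, y))
```

The unsupervised stage is cheap and gives the SVM features that already encode the coarse colour grouping, which is one plausible source of the reported speedup.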
5. Conclusion
These authors highlight both the promise and the challenges of applying medical imaging in healthcare. Their research identifies the critical need for both clinical and theoretical perspectives on medical images. This special issue brings together various new developments in computer vision for medical images and their clinical application; in summary, it provides a snapshot of computer vision in healthcare applications on medical images across the globe. We hope this publication will serve as a good reference for future work on computer vision, analysis algorithms, and machine learning for medical images. Some key messages nevertheless emerge from the papers compiled in this special issue: limitations and challenges remain for computer vision and for the various algorithms and processing techniques applied to medical images, even though these works show better efficiency than traditional and state-of-the-art methods. We hope this theme issue will further advance our understanding of computer vision for medical image processing and healthcare applications, and pave the way for new directions in medical image and computer vision research across health and disease.
Acknowledgments

This work was supported by the National Natural Science Foundation of China (61773408, 81271659, 61662026, and 61473221). The guest editors are very grateful to all the anonymous reviewers of the journal and for the perseverant and generous support of the editor-in-chief.
Junfeng Gao
Yong Yang
Pan Lin
Dong Sun Park