System for detecting social distance during
COVID-19 using YOLOv3 and OpenCV
Barnabas Zakariya
Computer Science and Engineering
GH Raisoni College of Engineering
Nagpur, India
Barnazaka@gmail.com
Abstract—COVID-19 spreads between people in close proximity via minute droplets created through talking, sneezing, and coughing, and is most commonly contracted by inhalation. Many people have died from the pandemic's severe respiratory infection, which is still present today. The risk of contracting COVID-19 can be reduced by avoiding close physical contact with others. This study proposes a real-time AI framework that detects people in live video feeds, classifies the social distances between them, and monitors social distance violations. YOLOv3 is adopted for object detection; its straightforward neural network architecture makes it suitable for reasonably priced embedded devices, and compared with other real-time detection methods the proposed model is the better choice. The framework additionally relies on OpenCV, an open-source toolkit for computer vision, machine learning, and image processing. The main purpose of the image processing stage is to enhance image quality so that the AI detection system can accurately recognise human movement, while computer vision is used to analyse the photos and videos. The final iteration of the prototype algorithm has been deployed on low-cost fixed CCTV cameras placed in public areas where large crowds congregate. The proposed method is suitable for a surveillance system in sustainable smart cities for people detection, social distance classification, and tracking of social distance violations, making it easier for the government to assess how well social distancing is being observed.
Index Terms—COVID-19, AI, Machine Learning, YOLOv3
I. INTRODUCTION
A novel coronavirus infection is the cause of the acute respiratory infectious illness COVID-19 [1]. The major symptoms are fever, dry cough, and exhaustion. Nasal congestion, runny nose, diarrhoea, and other symptoms of the upper respiratory tract and digestive system appear in a small percentage of individuals. After a week, severe patients frequently experience respiratory problems and can quickly progress to irreversible metabolic acidosis, coagulation malfunction, and multiple organ failure. More than 6.58 million people in nations throughout the world have died as a result of COVID-19 up to this point. To stop the virus from spreading, numerous areas have implemented policies such as limiting traffic, wearing face masks, using hand sanitizer frequently, and cancelling significant events.
The next stage is to figure out how to prevent the virus from spreading as much as possible in everyday settings. By giving consistent information from health care officials, the health system makes it simpler for patients to prevent infection. Any unexpectedly sharp and rapid rise in the infection rate will result in a failure of health care services and, as a result, an increase in the number of deaths. The aim of adhering to social distancing recommendations is to limit the transmission of the virus among persons [2, 3]. Although certain vaccinations [4] have been created to combat the virus's transmission, the most effective method is to keep a safe social distance between pedestrians. Social distancing entails staying away from large crowds and preserving a six-foot gap, roughly the length of a body, from each individual. Isolation and quarantine are not the same as social distancing. The government uses social distancing as a preventative strategy for everyone: those who have been affected or are suspected of being afflicted with infectious illnesses must be isolated in a ward and cared for by specially trained medical professionals, while persons who have been exposed to infectious people but have not yet contracted the disease are quarantined. This means that good pedestrian detection and distance measuring technologies can aid in controlling COVID-19 transmission. In public settings, the most often used pedestrian detection approach is based on a computer vision solution [5]. Pedestrian recognition and social distance assessment may be accomplished easily and affordably using existing public-area security cameras. Compared with systems relying on mobile devices such as GPS sensors, computer vision-based pedestrian detection approaches offer a broader variety of applications, including intelligent-assisted driving [6, 7], intelligent monitoring [8, 9], pedestrian analysis [10], and intelligent robots [11]. Furthermore, various open-source, computer vision-based pedestrian identification datasets have been produced to aid in the evaluation and improvement of detection algorithms, such as the INRIA person dataset [10], the Caltech pedestrian detection benchmark [11], and the ETH dataset [12].
Previously, background modelling methods [13] were frequently used to extract foreground moving targets, after which features were extracted in the target area and classifiers (e.g., multi-layer perceptron, support vector machine, and random forest) were used to determine whether pedestrians were present. In practice, this approach still encounters the following issues: (1) lighting variations can readily produce large changes in picture grey levels, lowering detection accuracy; (2) camera shaking can easily cause background modelling to fail, affecting target position computation; (3) there may be ghost zones that impair the model's assessment. The statistical learning approach [14] automatically mines characteristics from large amounts of data and builds a pedestrian detection classifier. The retrieved characteristics primarily comprise the target's grayscale, edge, texture, colour, and gradient histogram. Statistical learning is also confronted with the following challenges: (1) variable pedestrian stance, apparel, scale, and lighting environment; (2) classifiers often require a significant number of training examples; (3) the quality of the features has a direct impact on the classifier's ultimate detection performance. There have been some advances in the use of multi-feature fusion and cascaded classifiers; Haar features [15], HOG features [16], LBP features [17], and Edgelet features [18] are examples of often used features. In this research, I compare and study the processes for detecting pedestrians and watching their social distance in order to increase the efficiency of epidemic prevention. This work's contributions include:
• A unique vision-based surveillance system for monitoring social distance violations in public spaces.
• A robust algorithm to recognise people and quantify the distance between them; compared with previous techniques, it yields faster and more accurate results.
• An accurate method of translating a camera frame recorded from a perspective point of view to a top-down view, which keeps the conversion rate between pixel distance and physical distance consistent (an illustrative sketch follows this list).
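As an illustrative sketch of that perspective-to-top-down conversion (the four reference points, image coordinates, and output size below are assumed example values, not taken from this paper), OpenCV's homography utilities can map detected ground points into a bird's-eye view:

```python
# A minimal sketch of warping ground-plane points from a perspective camera
# view to a top-down view with OpenCV. The four source corners would be
# picked once per fixed camera; the values here are illustrative.
import cv2
import numpy as np

# Four points on the ground plane as seen by the camera (pixels) ...
src = np.float32([[400, 300], [900, 300], [1200, 700], [100, 700]])
# ... and where they should land in the top-down view (pixels).
dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

M = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography matrix

# Map detected person centres into the top-down view, where pixel
# distance is proportional to physical distance everywhere.
people = np.float32([[[650, 500]], [[700, 520]]])  # shape (N, 1, 2)
top_down = cv2.perspectiveTransform(people, M)
print(top_down.reshape(-1, 2))
```

Because the homography is computed once per fixed camera, the pixel-to-metre ratio in the top-down view stays constant across the whole frame.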
The goal of this project is to provide an AI-based solution that reduces the transmission of coronavirus among individuals and its economic effect. I present YOLOv3 [19], a revolutionary deep learning model, together with the construction of an algorithm for social distancing, and OpenCV, a computer vision and machine learning library used for effective image processing.
II. LITERATURE REVIEW
On a variety of datasets, including the Caltech dataset and the KITTI dataset, the integration of several attributes yields the best results. Through the use of upgraded decision forests and low-level characteristics in the intermediate layer, Zhang et al. [20] created a number of cutting-edge pedestrian detectors. For quick and precise pedestrian recognition in low-end surveillance systems, Kim et al. [21] applied a model compression technique based on the teacher-student paradigm to the random forest (RF) classifier. Their experiments demonstrate that the technique outperforms various state-of-the-art methods in detection performance on the Performance Evaluation of Tracking and Surveillance 2006 dataset, the Town Centre dataset, and the Caltech benchmark dataset. A multi-scale pedestrian detector based on a self-attention mechanism and adaptive spatial feature fusion has also been presented, in which the asymmetric pyramid non-local block (APNB) module is used to better extract global information [22]. According to Nam et al. [23], upgraded decision trees remain effective for quick rigid object recognition even in the face of sophisticated, data-intensive approaches. Drawing inspiration from previous work on the identification and decorrelation of HOG features, they suggested an efficient feature transformation, suited for use with orthogonal decision trees, that eliminates correlation in the local neighbourhood. In practice, the orthogonal tree with locally decorrelated features outperforms the oblique tree trained on the original data, and it does so at a low computational cost. Magoo et al. [24] used the YOLOv3 object recognition model with key-point regression to find important feature points in a surveillance video application framework based on a bird's-eye perspective. Zhang et al. [25] proposed a pedestrian detector that blends common sense and everyday information into a straightforward and computationally efficient functional architecture. The suggested features are resilient to occlusion, and experimental findings on the INRIA and Caltech pedestrian datasets demonstrate that their detector delivers the most advanced performance at minimal computing cost. Walk et al. [26] found that motion characteristics produced from optical flow remain useful even for low-quality video, where the flow field degrades. They also introduced a brand-new feature, self-similarity on the colour channel, which continually enhances detection efficiency on static photos and video sequences across various datasets. The authors concluded by discussing the critical complexity of detector assessment and demonstrating how the existing benchmark technique misses vital information that might skew the evaluation. The detection process was meticulously examined and optimised by Tomè et al. [27], who also offered a unique deep learning architecture that outperformed the conventional approach in terms of both task accuracy and computational time. Finally, the authors put the suggested technique to the test on the 192-core NVIDIA Jetson TK1 platform, a premier computing platform for future autonomous cars. Chen et al. [28] developed a unique attention-guided encoder-decoder convolutional neural network to address the poor resolution and low signal-to-noise ratio of infrared pictures, whose characteristics may change depending on the weather. To re-weight the multi-scale features produced by the encoder-decoder module, they also suggested an attention module. The suggested technique improves the precision of the most sophisticated algorithms by 5.1% and 23.78% on the KMU and CVC-09 pedestrian datasets, respectively.
III. METHODOLOGY
A. The architecture of social distancing
In this part, I will go through the steps required to create a sequence design that determines and verifies whether or not individuals follow social distancing norms (a minimal sketch of the distance and violation logic follows this list):
1. Stream the video footage captured by the camera that shows people.
2. Extract the camera's footage frame by frame.
3. Use the YOLOv3 architecture to identify only the people in the camera recordings.
4. For good and accurate image processing, use the OpenCV image processing tools to count the number of individuals in the camera recordings.
5. Determine the separation between the centres of the bounding boxes, which mark where the people in the videos are located.
6. Finally, based on the number of people in the videos and the measured separation between the bounding box centroids, the algorithm decides whether the individuals are in a violation or a safe state. It should be noted that I established two distinct violation levels with two distinct threshold set points for the measured distance between the centre points of the bounding boxes. Risk is the violation level, and the bounding box is coloured red to indicate it; I coloured the bounding box green to indicate the safe state.
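The following minimal sketch covers steps 5 and 6 under stated assumptions: detections are already available as (x, y, w, h) pixel boxes, and the Risk threshold value and helper names are illustrative, not values from this paper.

```python
# A minimal sketch of steps 5 and 6, assuming person detections have
# already been produced by YOLOv3 as bounding boxes (x, y, w, h) in pixels.
import math

RISK_THRESHOLD = 50.0  # hypothetical pixel distance for the Risk level

def centroid(box):
    """Centre point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def classify_violations(boxes, risk_threshold=RISK_THRESHOLD):
    """Return the indices of boxes whose pairwise centroid distance falls
    below the risk threshold (coloured red); all others are safe (green)."""
    centres = [centroid(b) for b in boxes]
    violators = set()
    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            dist = math.dist(centres[i], centres[j])  # Euclidean distance
            if dist < risk_threshold:
                violators.update((i, j))
    return violators

# Example: three detections; the first two stand too close together.
boxes = [(100, 200, 40, 90), (130, 210, 40, 90), (400, 220, 40, 90)]
print(classify_violations(boxes))  # {0, 1}
```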
B. Object detection
Object detection is a computer vision technique that locates the objects in an image or video. The initial step in this investigation is to determine the coordinates of the people in the footage. For people detection in the camera footage, I used YOLOv3 [19], which is built on a 53-layer convolutional neural network (CNN). The purpose of this research is to develop a lightweight model that takes into account the real-time application needs of convolutional neural networks (CNNs) in low-cost embedded systems, such as IoT devices. YOLOv3 is made up of two major variants: the conventional model, which has a high recognition accuracy, and the tiny model, which has a slightly reduced recognition accuracy. For primary feature extraction, the standard model YOLOv3-416, which offers high mAP (accuracy) and is composed of convolutional blocks (Conv) and residual networks (ResNet), is used. The YOLOv3 network seeks to forecast each object's bounding box (the area of interest of the candidate object) as well as the probability of the class to which the object belongs. To accomplish this, the model divides each input image into an S×S grid of cells, with each grid cell predicting B bounding boxes and C class probabilities for objects whose centres fall within that cell. According to the research, each bounding box may specialise in detecting a specific type of object. The number of anchors utilised corresponds to the number of bounding boxes B. Each bounding box includes 5+C attributes, where 5 refers to the five bounding box attributes (centre coordinates (bx, by), height (bh), width (bw), and confidence score) and C is the number of classes. Because we are working on an S×S grid, the output of passing an image through a single forward pass of the convolutional network is a 3-D tensor of shape [S, S, B*(5+C)].
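As a concrete check of this shape, using the values standard for YOLOv3 trained on COCO (C = 80 classes, B = 3 anchors per scale, S = 13 at the coarsest scale; these are generic YOLOv3 settings, not numbers reported by this paper):

```python
# Output tensor shape for one YOLOv3 detection scale: [S, S, B * (5 + C)].
S = 13   # grid size at the coarsest detection scale (416 / 32)
B = 3    # anchor boxes per grid cell at each scale
C = 80   # number of classes in COCO

depth = B * (5 + C)   # 5 = (bx, by, bh, bw, confidence)
print((S, S, depth))  # (13, 13, 255)
```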
1) Anchor Boxes: Previously, researchers employed the sliding-window approach, running an image classification algorithm on each window to detect an object. They quickly recognised that this was inefficient, so they switched to ConvNets and processed the entire image in a single shot. Because the ConvNet generates square matrices of feature values (e.g., 13×13 or 26×26 in the case of YOLO), the concept of a "grid" entered the picture. The square feature matrix is defined as a grid; however, the main issue arose when the objects to detect were not square in shape. These objects can be of any shape (mostly rectangular). Anchor boxes were therefore introduced. Anchor boxes are pre-defined boxes with specified aspect ratios. These aspect ratios are defined, even before training, by executing K-means clustering on the full dataset. The anchor boxes are connected to the grid cells and share their centroids. YOLOv3 employs three anchor boxes for each detection scale, for a total of nine anchor boxes.
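The sketch below illustrates the clustering idea with plain K-means over ground-truth (width, height) pairs; note this is a simplified stand-in, since YOLOv3's authors cluster with an IoU-based distance rather than the Euclidean distance used here, and the synthetic box sizes are made up for the example.

```python
# A minimal sketch of deriving anchor boxes by K-means clustering over the
# (width, height) of ground-truth boxes, as done before YOLO training.
import random

def kmeans_anchors(box_sizes, k=9, iters=100, seed=0):
    """box_sizes: list of (w, h) tuples; returns k anchor (w, h) pairs."""
    rng = random.Random(seed)
    centres = rng.sample(box_sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in box_sizes:
            # assign each box to the nearest centre (squared Euclidean)
            i = min(range(k), key=lambda c: (w - centres[c][0]) ** 2
                                            + (h - centres[c][1]) ** 2)
            clusters[i].append((w, h))
        for i, cl in enumerate(clusters):
            if cl:  # recompute each centre as its cluster mean
                centres[i] = (sum(w for w, _ in cl) / len(cl),
                              sum(h for _, h in cl) / len(cl))
    return sorted(centres)

sizes = [(random.uniform(10, 300), random.uniform(20, 400)) for _ in range(500)]
print(kmeans_anchors(sizes, k=9))  # nine (width, height) anchors
```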
2) Non-Maximum Suppression: The output predicted after the single forward pass may contain numerous bounding boxes for the same object, because they share a centroid, yet we need only the one bounding box that fits the object best. For this, we can employ a technique known as non-maximum suppression (NMS), which essentially cleans up these detections. I may specify a particular confidence threshold as a constraint for the NMS technique, causing it to disregard all bounding boxes whose confidence is lower than the specified threshold, thereby removing a few. However, this does not exclude everything, so the next stage of NMS is executed: arrange all the bounding box confidences in decreasing order and select the one with the highest score as the most appropriate box for the object. Then we find all the other boxes that have a high intersection over union (IoU) with that bounding box and delete them as well.
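OpenCV ships this exact filtering step as cv2.dnn.NMSBoxes; a minimal sketch, with illustrative boxes and threshold values, might look like:

```python
# A minimal sketch of filtering detections with OpenCV's built-in
# non-maximum suppression. The boxes and threshold values are illustrative.
import cv2
import numpy as np

boxes = [[100, 200, 40, 90], [104, 198, 42, 92], [400, 220, 40, 90]]  # x, y, w, h
confidences = [0.90, 0.75, 0.85]

SCORE_THRESHOLD = 0.5  # discard boxes with confidence below this
NMS_THRESHOLD = 0.4    # discard boxes whose IoU with a better box exceeds this

indices = cv2.dnn.NMSBoxes(boxes, confidences, SCORE_THRESHOLD, NMS_THRESHOLD)
for i in np.array(indices).flatten():  # the first two boxes overlap: one survives
    print(i, boxes[i], confidences[i])
```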
C. OpenCV
OpenCV (Open Source Computer Library) was first lunched
in 1999 by intel[29] OpenCV (Open Source Computer Vision
Library) is a free and open source software library for com-
puter vision and machine learning. OpenCV was created to
offer a standard foundation for computer vision applications
and to speed up the adoption of machine perception.The
library contains over 2500 optimised algorithms, including a
complete variety of both traditional and cutting-edge computer
vision and machine learning techniques. These algorithms can
be used to detect and recognise faces, identify objects, classify
human actions in videos[30], track camera movements, track
moving objects, extract 3D models of objects, produce 3D
point clouds from stereo cameras, stitch images together to
produce a high resolution image of an entire scene, find similar
images from an image database, remove red eyes from images
taken with flash, follow eye movements, recognise scenery,
and establish markers to overlay. OpenCV has around 47
thousand users and an estimated 18 million downloads[21].
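A minimal sketch of how the person-detection stage of such a pipeline could run through OpenCV's DNN module is shown below; the configuration and weight file names are the standard public Darknet releases, and the frame path is a placeholder, none of which come from this paper.

```python
# A minimal sketch of running YOLOv3 person detection through OpenCV's DNN
# module, assuming the standard Darknet yolov3.cfg/yolov3.weights files are
# present locally and "frame.jpg" is one frame extracted from the stream.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()  # the three YOLO output layers

frame = cv2.imread("frame.jpg")
# Normalise to [0, 1], resize to the 416x416 network input, swap BGR->RGB.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)  # each row: (bx, by, bw, bh, conf, class scores)

h, w = frame.shape[:2]
for output in outputs:
    for det in output:
        scores = det[5:]
        class_id = scores.argmax()
        if class_id == 0 and scores[class_id] > 0.5:  # class 0 is "person" in COCO
            cx, cy = int(det[0] * w), int(det[1] * h)  # centre in pixel coordinates
            print("person centre:", (cx, cy))
```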
D. Dataset training procedure
The suggested method was trained on two distinct image datasets. Dataset I contains 1000 photos gathered by FLIR for thermal cameras [32]; it is the first collection of photos captured by CCTV cameras equipped with infrared radiation sensors. Dataset II features 950 photos of various persons collected under realistic conditions during surveillance and monitoring. They come from various settings and include individuals creeping, strolling, jogging, and in various body postures; these photos were gathered from various online sources. The photos in both datasets were labelled for a single class: people. For each dataset, the photos were divided into 70% for training, 20% for validation, and 10% for testing the architecture. Stochastic gradient descent with momentum (sgdm) was used to train YOLOv3 [33]. To regulate the model's response to error, the learning rate was tuned in the training options. The learning rate was fine-tuned to 10⁻³, and the loss curve remained stable at this value for both datasets.
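A minimal sketch of that 70/20/10 split, with hypothetical file names standing in for the real dataset contents:

```python
# A minimal sketch of the 70/20/10 train/validation/test split described
# above. The file names and dataset size are illustrative assumptions.
import random

def split_dataset(items, seed=42):
    """Shuffle and split items into 70% train, 20% validation, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (items[:n_train],                 # training set
            items[n_train:n_train + n_val],  # validation set
            items[n_train + n_val:])         # test set

images = [f"img_{i:04d}.jpg" for i in range(1000)]  # e.g. Dataset I size
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 700 200 100
```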
IV. RESULTS AND DISCUSSION
All outcome details and comparisons are presented in this section, where I depict the outcome from several angles. I ran the algorithm over the testing photos from both datasets to evaluate the performance of the suggested technique. The photographs capture true situations recorded by various cameras in outdoor settings, and we picked these datasets for our tests with this in mind. I also applied YOLOv3 and the proposed OpenCV-based social distance measurement approach to a large collection of videos. These videos screen people's movements while the cameras measure their mutual distances to determine whether or not they break the social distance rule. In addition to this investigation, I conducted another experiment by studying Fast R-CNN and you-only-look-once (YOLOv2) detectors for person detection, both trained on the identical images from the two training datasets and evaluated on the same testing photos from both datasets and the videos database, in order to compare these architectures with YOLOv3 and the suggested approach. To evaluate the suggested approach, confusion matrix criteria were utilised for metric computation. The criteria selected to evaluate the algorithm's goodness are recall, accuracy, and precision; see Eq. (1), where TP denotes the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
Based on the findings of these studies, YOLOv3 produced encouraging results for people detection on pictures in both testing datasets and the videos database; person detection points were exhibited in the OpenCV view window for both safe and risk circumstances with their assigned colours, respectively. Furthermore, YOLOv3 outperformed the other approaches in terms of accuracy [34, 35, 36].
A. Equations
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall = TP / (TP + FN)    (1)
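For illustration, Eq. (1) can be computed directly from confusion-matrix counts; the counts below are made-up numbers, not results from this study:

```python
# Computing the Eq. (1) metrics from confusion-matrix counts.
# The counts are illustrative, not measured results from this paper.
TP, TN, FP, FN = 90, 80, 10, 20

precision = TP / (TP + FP)
accuracy = (TP + TN) / (TP + TN + FP + FN)
recall = TP / (TP + FN)

print(f"precision={precision:.2f} accuracy={accuracy:.2f} recall={recall:.2f}")
# precision=0.90 accuracy=0.85 recall=0.82
```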
Fig. 1. Social distancing status with the proposed method, showing 8 persons who violated the Risk threshold distance and 10 persons in safe conditions: a perspective transformation of human detection points with the OpenCV view.
CONCLUSION
This study presented a deep learning-based social distancing approach for people detection in videos or photos utilising the OpenCV view. The obtained findings demonstrate that the designed intelligent surveillance system recognises persons who violate social distancing using good image processing. YOLOv3 performed well in terms of accuracy and precision, and OpenCV with the CCTV camera view technology was developed to efficiently map human detection points. The proposed technique gives the authorities a way to monitor whether pedestrians follow social distancing norms in outdoor locations. I coloured the bounding boxes green for the safe condition and red for the unsafe state, and the algorithm, YOLOv3, identifies and counts how often individuals breach social distancing.
REFERENCES
[1] The Visual and Data Journalism Team: Coronavirus: a visual guide to the outbreak. 6 Mar. 2020
[2] Fong, M.W., Gao, H., Wong, J.Y., Xiao, J., Shiu, E.Y., Ryu, S.,
Cowling, B.J.: Nonpharmaceutical measures for pandemic influenza in
nonhealthcare settings—social distancing measures. Emerg. Infect. Dis.
26, 976 (2020)
[3] Ahmed, F., Zviedrite, N., Uzicanin, A.: Effectiveness of workplace social distancing measures in reducing influenza transmission: a systematic review. BMC Public Health 18, 518 (2018)
[4] Hotez, P.J.: COVID-19 and the antipoverty vaccines. Mol. Front. J. 4,
58–61 (2020)
[5] Mou, Q., Wei, L., Wang, C., et al.: Unsupervised domain-adaptive
scene-specific pedestrian detection for static video surveillance. Pattern
Recogn. 118(9), 108038 (2021)
[6] Liu, T., Du, S., Liang, C., et al.: A novel multi-sensor fusion based object
detection and recognition algorithm for intelligent assisted driving. IEEE
Access 9, 81564–81574 (2021)
[7] Zheng, Q., Zhao, P., Zhang, D., Wang, H.: MR-DCAE: Manifold
regularization-based deep convolutional autoencoder for unauthorized
broadcasting identification. Int. J. Intell. Syst. (2021).
[8] Chen, Y., Ma, J., Wang, S.: Spatial regression analysis of pedestrian
crashes based on point-of-interest data. J. Data Anal. Inf. Process. 08(1),
1–19 (2020)
[9] Zheng, Q., Yang, M., Tian, X., Jiang, N., Wang, D.: A full stage data
augmentation method in deep convolutional neural network for natural
image classification. Discrete Dyn. Nat. Soc. 2020, 1–11 (2020).
[10] Dalal, N., Triggs, B.: Histograms of oriented gradients for human
detection. In: IEEE Computer Society Conference on Computer Vision
Pattern Recognition (2005)
[11] Dollar, P., Wojek, C., Schiele, B., et al.: Pedestrian detection: an
evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell.
34(4), 743–761 (2011)
[12] Ess, A., Leibe, B., Schindler, K. et al.: Moving obstacle detection in
highly dynamic scenes. In: IEEE Int. Conf. Robot. Autom. pp. 56–63
(2009)
[13] Rodriguez, P., Wohlberg, B.: Incremental principal component pursuit
for video background modeling. J. Math. Imaging Vis. 55(1), 1–18
(2016)
[14] Zheng, J., Peng, J.: A novel pedestrian detection algorithm based
on data fusion of face images. Int. J. Distrib. Sens. Netw. 15(5),
155014771984527 (2019)
[15] Park, K.Y., Hwang, S.Y.: An improved Haar-like feature for efficient
object detection. Pattern Recogn. Lett. 42, 148–153 (2014)
[16] Sheng, Y., Liao, X., Borasy, U.K.: A pedestrian detection method based
on the HOG-LBP feature and gentle AdaBoost. Int. J. Adv. Comput.
Technol. 4(19), 553–560 (2012)
[17] Costa, Y., Oliveira, L.S., Koerich, A.L., et al.: Music genre classification
using LBP textural features. Signal Process. 92(11), 2723–2737 (2012)
[18] Zhao, J.: Boundary extraction using supervised edgelet classification.
Opt. Eng. 51(1), 7002 (2012)
[19] Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
[20] Zhang, S., Benenson, R., Schiele, B.: Filtered channel features for
pedestrian detection. In: IEEE Conf. on Computer Vision and Pattern
Rec. (CVPR) (2015)
[21] Wang, M., Chen, H., Li, Y., et al.: Multi-scale pedestrian detection based
on self-attention and adaptively spatial feature fusion. IET Intell. Transp.
Syst. 15(6), 837–849 (2021)
[22] Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved detection. Adv. Neural Inf. Process. Syst. 1, 424–432 (2014)
[23] Magoo, R., Singh, H., Jindal, N., et al.: Deep learning-based bird eye
view social distancing monitoring using surveillance video for curbing
the COVID-19 spread. Neural Comput. Appl. 33(22), 15807–15814
(2021)
[24] Magoo, R., Singh, H., Jindal, N., et al.: Deep learning-based bird eye
view social distancing monitoring using surveillance video for curbing
the COVID-19 spread. Neural Comput. Appl. 33(22), 15807–15814
(2021)
[25] Zhang, S. et al.: Informed Haar-like features improve pedestrian detec-
tion. In: IEEE Computer Vision Pattern Recognition (CVPR) (2014)
[26] Walk, S., Majer, N., Schindler, K. et al.: New features, and insights for
pedestrian detection. In: IEEE Conference on Computer Vision Pattern
Recognition (CVPR) (2010)
[27] Tomè, D., Monti, F., Baroffio, L., et al.: Deep convolutional neural networks for pedestrian detection. Signal Process. Image Commun. 47, 482–489 (2016)
[28] Chen, Y., Shin, H.: Pedestrian detection at night in infrared images using
an attention guided encoder decoder convolutional neural network. Appl.
Sci. 10(3), 809 (2020)
[29] Culjak, I., Abram, D., Pribanic, T., Dzapo, H., Cifrek, M.: A brief introduction to OpenCV. In: Proceedings of the 35th International Convention MIPRO, pp. 1725–1730 (2012)
[30] Saponara, S., Elhanashi, A., Gagliardi, A.: Implementing a real-time,
AI-based, people detection and social distancing measuring system for
Covid-19. J. Real-Time Image Proc. (2021).
[31] Mahamkali, N., Ayyasamy, V.: OpenCV for Computer Vision Applications (2015)
[32] FLIR Thermal Dataset for Algorithm Training, FLIR Systems.
[33] Glorot, X. et al.: Understanding the difficulty of training deep feed-
forward neural networks. In: Int. Conf. on Artificial Intelligence and
Statistics (2010)
[34] Sener, F., et al.: Two-person interaction recognition via spatial multiple
instances embedding. J. Vis. Commun. Image Represent. 32, 63 (2015)
[35] Rinkal, K., et al.: Real-time social distancing detector using social
distancingnet-19 deep learning network. SSRN Electron. J. 40, 6 (2020)
[36] Yadav, S.: Deep learning based safe social distancing and face mask
detection in public areas for covid-19 safety guidelines adherence. Int.
J. Res. Appl. Sci. Eng. Technol. 8, 1–10 (2020)