Hybrid SNN-based Privacy-Preserving Fall Detection using
Neuromorphic Sensors
Shyam Sunder Prasad1,2, Naval Kishore Mehta1,2, Himanshu Kumar1,2, Abeer Banerjee1,2, Sumeet
Saurav1,2, and Sanjay Singh1,2
1CSIR-Central Electronics Engineering Research Institute (CSIR-CEERI)
Pilani, Rajasthan, India
2Academy of Scientic and Innovative Research (AcSIR)
Ghaziabad, Uttar Pradesh, India
ABSTRACT
Indoor surveillance is crucial for ensuring the safety and security of occupants within the premises. Those who are ill or elderly tend to spend the most time at home, and continuous indoor monitoring of their safety could help in the early detection and prevention of tragic incidents. Ensuring privacy while achieving this task has led to a recent research focus on protecting privacy in human fall detection. This paper addresses privacy-preserving fall detection by employing the Dynamic Vision Sensor (DVS), which captures intensity changes without compromising individuals' privacy. We introduce a novel event-based dataset named "DVSFall", incorporating diverse activities of daily living (ADL) and simulated falls. Captured from multiple viewpoints using DVS cameras, the dataset encompasses twenty-one participants across varying age groups. To evaluate the dataset, we employed Spiking Neural Networks (SNNs) designed to replicate neural activity. Furthermore, we explored a hybrid framework, the 3D-CNN & SNN (NeuCube) approach, for fall detection. Our proposed framework achieved an accuracy of 94.59% with the SNN, which notably improved to 97.84% using the hybrid approach, as measured on the recorded dataset.
KEYWORDS
Action Classication, DVS, Fall Detection, Privacy-Preserving, SNN
ACM Reference Format:
Shyam Sunder Prasad, Naval Kishore Mehta, Himanshu Kumar, Abeer Banerjee, Sumeet Saurav, and Sanjay Singh. 2023. Hybrid SNN-based Privacy-Preserving Fall Detection using Neuromorphic Sensors. In Proceedings of 14th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP'23). ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3627631.3627650
1 INTRODUCTION
In computer vision, human activity recognition is a challenging
problem and has been used for surveillance and security purposes.
In this paper, we particularly concentrate on the task of human fall
Figure 1: The right image, captured using a neuromorphic camera, visualizes the events triggered during a falling action, while the left one depicts a normal grayscale frame. It is quite evident that identifying the person is considerably more challenging in the right image than in the left.
detection. Automatic fall detection systems are crucial for monitoring the safety of sick and older adults, swiftly identifying and notifying falls to enable timely assistance and minimize response time [28, 44]. While researchers have developed various fall detection and prevention systems, those relying on external environmental sensors, particularly cameras, raise privacy concerns when deployed in residential settings. Traditional frame-based video cameras have historically been utilized in computer vision applications, and certain end-to-end convolutional neural network-based models remain operational, producing notable outcomes. Nevertheless, real-time tasks like human activity recognition and dynamic scene cognition demand significantly greater robustness than what current techniques can provide. In scenarios with ample natural light or rapidly changing lighting conditions, current frame-based methods struggle to perform effectively and may necessitate substantial computational resources. Another major limitation of frame-based indoor surveillance cameras is privacy: the privacy concern restricts user acceptance of vision-based solutions built on conventional frame-based RGB cameras. With general cameras, video anonymization is often necessary for privacy preservation [18]. Utilizing event-based camera systems like the dynamic vision sensor (DVS) can mitigate this issue. The DVS captures video streams differently, registering only event changes and significantly reducing data redundancies, which is particularly beneficial for infrequent events like human falls (Figure 1).
Event-based cameras operate independently at each pixel, generating a stream of events that indicate changes in illumination. This characteristic allows them to achieve very low latency, on the order of microseconds (µs), unlike standard cameras that exhibit a latency of 50-200 milliseconds (ms) [1, 12]. As a result, event-based cameras excel in handling real-time tasks, outperforming standard cameras. An advantage of event-based cameras over frame-based cameras is their immunity to motion blur, since they only register data when intensity changes occur. Moreover, by not capturing RGB images as frames, event-based cameras substantially reduce data redundancies, enabling computationally feasible real-time video processing. The commercial availability of the Dynamic Vision Sensor (DVS), a revolutionary vision sensor mimicking the human retina's capabilities, holds tremendous potential for deep learning applications [3, 20, 29, 42].
Previous eorts for privacy-aware human fall detection, such as
the work by Jixin Liu et al. [
21
], utilized multi-layer compressed
sensing to visually shield the video stream. However, since the
video was captured using frame-based cameras, privacy invasion
risks persisted. Other approaches, like those of Tateno et al. [
41
],
Ma et al. [
23
], and Gupta et al. [
15
], used low-resolution infrared
sensors for neural network-based action recognition, eectively
preserving privacy. Nevertheless, a drawback of such methods is
the sacrice of input quality to achieve privacy preservation.
In this paper, we use an SNN and a hybrid 3D-CNN & SNN architecture for privacy-preserving fall detection on our own dataset recorded with DVS. The recorded dataset involved the participation of 21 subjects (14 male and 7 female) who were asked to perform falling actions and other activities of daily living (ADL). We chose two different types of classifier networks to validate the performance of our methods on the DVSFall dataset. Firstly, we use Spiking Neural Networks (SNNs), an emerging research field of brain-inspired neural networks. Since DVS inputs are inherently spatiotemporal events, they are perfectly suited as inputs for SNNs. SNNs are a promising candidate for processing spatiotemporal data, being based on brain-inspired modeling of memory and learning mechanisms, and have been explored for human fall detection due to their proven efficiency, their ability to learn quickly, and their capacity to recall patterns that occur across very different time scales. Therefore, we explore the functionality of SNNs for directly processing the DVS data to perform action classification. Secondly, we explore a hybrid approach using a combination of 3D-CNNs and SNNs to achieve better classification accuracy. In the hybrid network, we use spatiotemporal features obtained from a 3D-CNN as the input to the SNN for classifying whether an action was a fall or not.
Implementation of spiking neural networks on extremely low-power chips has become possible due to the availability of neuromorphic hardware such as TrueNorth [25], SpiNNaker [11], and Intel Loihi [7]. These neuromorphic chips do not have an online learning mechanism; hence, learning is usually done on a computer. According to Shrestha et al. [35], the Spike LAYer Error Reassignment architecture, known as SLAYER, is a PyTorch-based architecture [2] for backpropagation-based spiking neural network (SNN) learning that can be used for offline training of a network before deploying it on neuromorphic hardware. The NeuCube architecture, proposed by Kasabov [17], was one of the first computational architectures for the implementation of SNNs and can also be used for offline training for neuromorphic hardware. Both frameworks are used to process spatiotemporal data. The NeuCube architecture consists of functional modules that include a spike encoder module for input data, a 3D-SNN reservoir module (SNNr), an output classification module, and a gene regulatory network. First, the continuous-valued input data is encoded into spike sequences referred to as trains of spikes. Then, the SNNr module is trained in an unsupervised mode to learn the spike sequences representing the input patterns. An evolving SNN classifier is then trained in a supervised mode to classify different dynamic patterns of SNNr activity corresponding to different classes. These steps are repeated to obtain maximum accuracy, after which the model can be used on new data. In this paper, we use both NeuCube- and SLAYER-based methods for fall detection and have utilized offline learning to achieve the solution.
Our contributions are summarized as follows:
• A hybrid framework for event-based privacy-preserving fall detection using Spiking Neural Networks and 3D-convolutional architectures.
• An original multi-view labelled DVS dataset of human falls and activities of daily living (ADL) from diverse subject groups. To the best of our knowledge, this event-based dataset is a novel contribution to the research community of privacy-preserving machine learning.
• Extensive performance evaluation and comparative analysis of the designed individual and hybrid models using our original dataset.
2 RELATED WORK
Fall detection can be achieved using wearable devices and ambient sensors, and, owing to recent advancements in deep learning and computer vision techniques, real-time fall detection is now also performed using neural networks [40, 45].
The most common method is to use wearable electronic devices with an inertial measurement unit (IMU) to gather motion data for further processing. Other sensors, such as magnetometers and barometers, are also used to improve motion measurement accuracy so that movement estimation can be performed precisely. Based on the data captured by the IMUs, different classification techniques are used for human fall detection, the most frequent being threshold-based fall detection. But, like most other classical image processing-based techniques, threshold-based classifiers are not good at exception handling; e.g., in cases where a falling action and an ADL sequence (like someone lying down) look similar, it is very hard to handcraft a threshold that distinguishes between the two actions. Small sensors are now available and cheaply embedded in a wide range of everyday devices, including smartphones and smartwatches, making human fall detection accessible to the masses. However, a big limitation of wearable devices is that they must be worn by the user all of the time to keep the fall detection mechanism operational. In addition, wearable devices rely on batteries, so timely recharging of the device is mandatory, during which the device is not operational.
Fall detection using ambient sensors has also been explored by Sun et al. [38]. The basic idea behind using ambient sensors for fall detection is that a human fall triggers certain special signals in the surrounding environment that can be analyzed for classifying the
Figure 2: A complete overview of our methods using DVS inputs. Our methods include a pure SNN (SLAYER) architecture, and a
hybrid 3D-CNN & SNN (NeuCube) architecture, both of which use DVS data for privacy-preserving fall detection.
action as a fall or not. The vibration data on the floor is monitored in real time by piezoelectric sensors placed on the floor surface.
Many methods, such as [23, 31, 33], employ 3D-CNNs to extract spatiotemporal features that include not only spatial information connected with pose identification but also the temporal link established between successive poses leading to a fall. This approach is used in [10], which creates a dynamic image by fusing all of the frames from a time window into a single image and passing this image to the ANN as the input from which features are extracted. Certain convolutional architectures, such as those built into OpenPose and used in [38, 43], can identify human body key points using convolutional pose machines (CPMs), which are CNNs trained to recognize those features. In a bottom-up approach, these key points are used to build a vector model of the human body. Galvao et al. [13] combined RGB images and accelerometer data to train a multimodal convolutional neural network to detect falls. DVS-OUTLAB, a dataset and solution for outdoor activity monitoring systems based on neuromorphic vision sensors, was recently proposed [5]. Lee et al. [19] proposed embedded real-time fall detection using DVS and achieved 31.25 FPS on the NVIDIA Jetson TX1 mobile GPU board. Another study by Liu et al. [22] used an SNN architecture to propose a solution for event-based action recognition, which was evaluated on the DailyAction-DVS dataset. In contrast to RNN-based frameworks, SNNs excel at processing sparse features, closely mimicking biological neuron behavior with discrete spikes, while RNNs are optimized for continuous data, necessitating complex adaptations for sparse inputs and potentially causing information loss. Thus, for applications like fall detection that require efficient sparse temporal data processing, SNNs offer a natural and accurate solution.
Our study primarily emphasizes vision-based solutions for the detection of human falls. In a previous work by Miguel et al. [8], a fall detection system for smart homes was introduced, utilizing a single camera and employing computer vision techniques such as background subtraction and Kalman filtering. In another research effort by Shieh et al. [34], a multi-camera video surveillance setup was proposed for detecting falling incidents. Their approach involves a falling-pattern recognition algorithm that extracts relevant images from each camera's field of view to monitor designated regions. MapCam omnidirectional cameras have been used to perform 360° scene-based fall detection, where a background subtraction model is used to extract the silhouettes of people in the scene and threshold-based techniques are then used to classify an action as a fall [27]. Stone et al. [37] used the Microsoft Kinect® camera to perform fall detection based on the vertical state of a person obtained from depth maps, but their method is limited by the performance of the motion tracker used to obtain the vertical state. Gasparrini et al. [14] leveraged depth cameras for fall detection, utilizing a Kinect® sensor to record depth information. The captured elements within the depth scene underwent recognition through a segmentation algorithm, which facilitated the classification of scene blobs. Belbachir et al. [4] employed a pair of event-driven sensor chips for efficient activity recording, featuring low data volume and high temporal resolution. Notably, they curated a dataset for real-time fall detection using event-driven stereo-vision systems, yielding promising outcomes.
3 OVERVIEW OF PROPOSED METHODOLOGY
Human action recognition based on Dynamic Vision Sensors (DVS) represents an emerging field of study. Our primary emphasis lies in human fall detection, achieved through the utilization of a unique DVS dataset and the exploration of SNNs. From an environmental point of view, SNNs are faster to train and consume far less power than CNNs for a given task [36]. Previous approaches have used traditional convolutional networks with DVS to perform fall detection, while we, in this paper, explore the possibilities of spiking neural networks for dealing with DVS input. Figure 2 provides an overview of the approach. The recording procedures for the DVSFall dataset are provided in Section 4.2.
4 DATASET PREPARATION
Earlier studies have introduced various datasets for human fall detection utilizing frame-based cameras. In contrast, our contribution centers on presenting an event-based privacy-preserving dataset specifically tailored for fall detection. This dataset leverages multiple DVS cameras to capture diverse viewpoints. Given the infrequent occurrence of human falls, assembling a suitable dataset for fall detection poses significant challenges. Further elaboration on the recording format and dataset labeling procedure is provided in the subsequent sections.
4.1 Recording format
Our dataset was recorded using four DVS cameras, specifically three DAVIS 346 and one DAVIS 240 model. These cameras offer resolutions of 346 × 260 and 240 × 180, respectively. To capture data with the DVS, it is essential to have relative motion between the sensor and the scene, as the DVS solely detects changes in pixel intensity. Each event is a tuple (t, x, y, p), where t represents the event's timestamp, and x and y denote the pixel coordinates within the frame. The parameter p signifies the event's polarity, indicating the direction of the brightness change [29]. Our data logging occurred at microsecond resolution, with the information being stored in Address Event Data format (AEDAT). Both the DAVIS 346 and DAVIS 240 models exhibit a dynamic range of 120 dB, with simultaneous active-pixel frame outputs.
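For illustration, the (t, x, y, p) event stream can be held compactly in a NumPy structured array; the field names and example values below are our own convention, not part of the AEDAT specification:

```python
import numpy as np

# A minimal sketch of an event-stream container, assuming the events have
# already been decoded from the camera into (t, x, y, p) tuples.
event_dtype = np.dtype([
    ("t", np.int64),  # timestamp in microseconds (Unix time in AEDAT 4.0)
    ("x", np.int16),  # pixel column, 0..345 for the DAVIS 346
    ("y", np.int16),  # pixel row,    0..259 for the DAVIS 346
    ("p", np.int8),   # polarity: 1 for ON (brightness increase), 0 for OFF
])

events = np.zeros(2, dtype=event_dtype)
events[0] = (1_650_000_000_000_000, 120, 80, 1)  # illustrative ON event
events[1] = (1_650_000_000_000_450, 121, 80, 0)  # illustrative OFF event
```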
Figure 3: Recording setup for the DVSFall dataset: four DVS camera systems (three DAVIS 346 and one DAVIS 240) were placed at the four corners of the room. A falling platform was placed at the center of an adequately illuminated room.
The entire dataset was saved in AEDAT format version 4.0. AEDAT data comprises both positive and negative events, detected based on changes in lighting intensity resulting from human motion. The AEDAT 4.0 format uses Google FlatBuffers to serialize data in a convenient and efficient format. FlatBuffers is a cross-platform serialization library that also allows quick and easy support for languages like Python by auto-generating the appropriate support files for them. All FlatBuffers are size-prefixed, meaning the first four bytes represent a 32-bit integer encoding the size of the following, actual FlatBuffer data. All timestamps inside the data are 64-bit integers, representing Unix time in microseconds.
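As a minimal illustration of this size-prefix convention (not a full AEDAT 4.0 parser, which additionally requires skipping the text header and decoding each blob with the FlatBuffers schemas shipped with the format, or a library such as iniVation's dv):

```python
import struct

def iter_packet_blobs(path):
    """Yield the raw bytes of each size-prefixed FlatBuffer packet.

    Sketch only: it demonstrates the 32-bit little-endian size prefix
    described above; a real reader must still decode each blob.
    """
    with open(path, "rb") as f:
        while True:
            prefix = f.read(4)
            if len(prefix) < 4:           # end of file
                break
            (size,) = struct.unpack("<i", prefix)  # int32 size prefix
            yield f.read(size)            # the actual FlatBuffer payload
```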
The AEDAT 4.0 le encompasses various data components, in-
cluding DVS noise-ltered output events, gray frames, accumulated
frames at 33 ms intervals, and direct camera events. DVS cameras
often undergo preprocessing involving noise lters due to their
vulnerability to background activity noise caused by temporal noise
and junction leakage currents. In conditions characterized by low
light or heightened sensitivity, background activity tends to esca-
late, leading to the adoption of a noise lter. This lter demonstrates
ecacy in specic scenarios by attenuating authentic events arising
from subtle light uctuations and thereby enhancing the dierenti-
ation between the moving subject and the background. The DVS
noise lter is a popular spatiotemporal lter that keeps a time map
of events. A trainable hot-pixel lter is included in the lter to lter
out “broken”(always-active) pixels in the sensor. It also contains a
refractory period lter, which restricts the number of events that
may be generated in a row by single pixels. It has an 8-neighboring-
events xed mask size and various additional adjustable features.
The accumulated frames within the AEDAT 4.0 le retain human
shape and movement, rendering them suitable for direct utilization
in convolutional neural networks.
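A minimal sketch of the underlying spatiotemporal (background-activity) filtering idea follows; the support window dt_us is an illustrative parameter, and the hot-pixel and refractory stages of the actual DV filter are omitted:

```python
import numpy as np

def background_activity_filter(events, width, height, dt_us=10_000):
    """Keep an event only if one of its 8 spatial neighbours fired within
    the last dt_us microseconds (sketch of the time-map idea, not the
    exact DV implementation)."""
    last_ts = np.full((height + 2, width + 2), -np.inf)  # padded time map
    kept = []
    for t, x, y, p in events:
        nbhd = last_ts[y:y + 3, x:x + 3].copy()  # 3x3 neighbourhood
        nbhd[1, 1] = -np.inf                     # ignore the pixel itself
        if t - nbhd.max() <= dt_us:              # a neighbour fired recently
            kept.append((t, x, y, p))
        last_ts[y + 1, x + 1] = t                # update the time map
    return kept
```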
We also processed the raw AEDAT data (DVS noise-filtered output events), which is an ideal input choice for Spiking Neural Networks (SNNs). Each event has four components, x, y, p, and t, representing the x-coordinate, y-coordinate, polarity, and timestamp, respectively. Using the SLAYER-PyTorch library, each 2-second spike event stream is converted into a tensor and saved as a 260 × 346 spatial event map with two polarities (ON and OFF) over 725 time bins, in NumPy format.
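A NumPy sketch of this event-to-tensor conversion, assuming timestamps are in microseconds relative to the clip start and a 2 ms bin width (1450 ms sample length over 725 bins); the paper uses the SLAYER-PyTorch I/O utilities for this step:

```python
import numpy as np

def events_to_tensor(events, H=260, W=346, T=725, bin_us=2000):
    """Accumulate (t, x, y, p) events into a P x H x W x T spike tensor."""
    spikes = np.zeros((2, H, W, T), dtype=np.float32)
    for t, x, y, p in events:
        b = int(t // bin_us)                 # time-bin index
        if 0 <= b < T:
            spikes[int(p), int(y), int(x), b] = 1.0
    return spikes
```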
4.2 DVSFall dataset recording procedure
The dataset is captured using three DAVIS 346 cameras and one DAVIS 240 dynamic vision camera, at resolutions of 346 × 260 and 240 × 180, respectively, mounted at four different viewpoints as shown in Figure 3. Our subjects were asked to mimic falling actions, including falling sideways, falling backward, falling from a chair, etc., and other activities of daily living (ADL) like sleeping, standing, walking, running, sitting, picking up objects, etc. To increase the diversity of the dataset, multiple subjects were used and their falling actions were recorded from multiple viewpoints. Recording actions that do not faithfully mimic real falls might lead to the final model being trained on wrong or irrelevant features, rendering the model useless in practical scenarios; therefore, due care was taken to ensure that the falling sequences look as real as possible.
The complete dataset has 75 falling activity sequences recorded using 21 subjects (14 male and 7 female) from diverse age groups. The recording setup involved placing the DVS camera systems at the four corners of the room. The room was sufficiently illuminated and had a falling platform at the centre. To compensate for the lower resolution of the DAVIS 240 sensor, we mounted it close to the falling platform.
Figure 4: A sample activity sequence recorded from multiple
viewpoints using the DVSFall recording setup.
The subjects were allowed to enter the room, walk around, sit on the chair, and do other regular activities. They were also allowed to perform a realistic falling action at any random time of their choosing, while adequate measures were taken to ensure their safety. All the activities performed by the subjects had an average screen time of 2 minutes. The falling action involved imitating a sudden stroke or sudden loss of body balance. A sample activity sequence from multiple viewpoints is provided in Figure 4.
4.3 Labelling Procedure
The data was manually labeled into two classes: "fall" and "no fall". For each two-second video clip, a running window of 30 frames was employed to scan through the recorded data, encompassing falling activities and activities of daily living (ADL). The classification of a sequence of frames as "fall" or "no fall" was determined by the degree of overlap between the 30-frame window and the falling sequence. Specifically, a video sequence was labeled as "fall" only if more than 50% of the frames within the window exhibited falling action (illustrated in the top section of Figure 5); otherwise, the sequence was labeled as "no fall" (illustrated in the bottom section of Figure 5).
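A short sketch of this windowed labeling rule, assuming a list fall_frames of ground-truth frame indices marking the annotated falling interval (all names here are our own):

```python
def label_windows(num_frames, fall_frames, win=30, stride=1, thresh=0.5):
    """Label each 30-frame window "fall" if more than 50% of its frames
    overlap the annotated falling interval (the rule described above)."""
    fall_set = set(fall_frames)
    labels = []
    for start in range(0, num_frames - win + 1, stride):
        overlap = sum(1 for f in range(start, start + win) if f in fall_set)
        labels.append("fall" if overlap / win > thresh else "no fall")
    return labels
```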
Figure 5: The gure illustrates a captured falling sequence
(top) and a daily living activity (bottom). In the daily living
sequence, it’s clear that the person is probably seated on a
chair. This action spans a total of 2 seconds.
The complete distribution of falling and ADL sequences from the four different cameras is listed in Table 1. A detailed comparison of our proposed dataset with the existing dataset of Miao et al. [26] is provided in Table 2.
Table 1: Distribution of fall and ADL sequences in the training and testing splits of the recorded DVSFall dataset. Fall activities include: falling while getting up from a chair, falling forward, falling sideways, and falling backward. ADL activities include: sitting in a chair, picking up objects, laying, squatting, walking, and changing body positions while sleeping.

Cameras    Fall   ADL
Camera 1   114    578
Camera 2   85     239
Camera 3   159    1425
Camera 4   100    2020
Total      458    4262
Table 2: Comparison with an existing fall detection dataset.

Attributes          Miao et al. [26]   Ours
Recordings          180                4720
Duration            5 s                2 s
Subjects            15                 21
Classes             4                  10 (1 fall / 9 ADL)
No. of cameras      1                  3 & 1
Camera resolution   346 × 260          346 × 260 / 240 × 180
Camera position     Dynamic            Fixed
The entire dataset, obtained from four camera viewpoints and featuring twenty-one subjects, was divided into three sets for training and testing, following the methodology of [32]. The first testing split included recordings from five male subjects and two randomly selected female subjects, while the recordings of the remaining subjects were used for the training split. The second testing split also had five male subjects and two female subjects, with the training split containing the rest. The third testing split had four male subjects and three female subjects. The detailed distribution of videos in each split can be found in Table 3.
Table 3: Distribution of videos in the DVSFall dataset

              Fall            ADL             Total
Data Splits   Train   Test    Train   Test    Fall   ADL
Split-1       349     197     3212    1600    546    4812
Split-2       379     167     3176    1636    546    4812
Split-3       342     204     2983    1829    546    4812
5 TRAINING AND IMPLEMENTATION
In this section, the various methods employed for fall detection are explored. We explored fall detection performance using spiking neural networks, namely NeuCube and SLAYER, and a hybrid 3D-CNN & NeuCube architecture. Since the DVS provides spatiotemporal events that can easily be converted to spikes, SNN-based offline frameworks were used to achieve fall detection, such that the trained model could later be deployed on neuromorphic hardware like SpiNNaker [11], TrueNorth [25], Intel Loihi [7], etc.
5.1 Implementation of SNN-based Fall Detection
SNNs are a class of neural architectures that are fundamentally different from traditional neural nets. Unlike traditional neural nets that use state-independent data, the basic neural unit of an SNN requires spike signals. These require a time axis for their representation, which means the propagation of information through an SNN is very different from that of a traditional artificial neural network. Rather than working with continuous values, SNNs work with discrete values (called spikes) at defined times. Therefore, an encoded train of spikes is set as the input to the SNN, and another set of spikes is expected at the output. The SNN architecture is composed of spiking neurons and linking synapses that are described by configurable scalar weights. The analog data is encoded into spike trains using an encoding technique [30], and the modeled spiking neurons have pure threshold dynamics dependent on the intensity of incoming spikes. We used two different types of SNN architectures to analyze the DVSFall dataset. First, we experimented with the proposed SNN architectures using only spike-based data. Second, we used the features extracted with a 3D-CNN to train the SNN-based NeuCube architecture.
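To make the threshold dynamics concrete, below is a minimal sketch of a leaky integrate-and-fire (LIF) neuron driven by a binary spike train; the weight, time constant, and threshold values are illustrative, not those used by SLAYER or NeuCube:

```python
import numpy as np

def lif_neuron(in_spikes, w=0.6, tau=20.0, v_th=1.0, dt=1.0):
    """Integrate weighted input spikes with leak; emit a spike and reset
    whenever the membrane potential crosses the threshold."""
    v, out = 0.0, np.zeros(len(in_spikes), dtype=np.int8)
    for t, s in enumerate(in_spikes):
        v = v * np.exp(-dt / tau) + w * s  # leak, then integrate the input
        if v >= v_th:
            out[t] = 1                     # threshold crossing -> spike
            v = 0.0                        # reset after firing
    return out
```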
5.1.1 Implementation on the SLAYER architecture. We experimented with the SLAYER [2, 35] SNN architecture, based on the work of Marchisio et al. [24]. This 10-layer SNN is implemented in PyTorch, and the proposed model is illustrated in Figure 6.
Figure 6: Architecture of the SLAYER framework, which uses Convolution Spike Blocks (CSBs) and Fully-Connected Spike Blocks (FSBs) for the implementation of spiking neural networks.
The structure of the model consists of four convolution-spike blocks (CSBs) followed by two fully-connected spike blocks (FSBs). The input to the SNN model is a P × H × W × T tensor, where P = 2 is the polarity, H = 260 is the height in pixels, W = 346 is the width in pixels, and the number of time bins T is the sample length divided by the unit sampling time in milliseconds. Each CSB consists of Pool-SpikeLoihi-DelayShift-Dropout-Conv-DelayShift-SpikeLoihi layers, and each FSB consists of FC-SpikeLoihi-DelayShift layers. The layers with CNN nomenclature, namely Pool, Dropout, Conv, and FC, perform the same operations as in CNNs on the input feature maps at each time step. SpikeLoihi returns an output spike tensor after applying Loihi neuron dynamics to the weighted spike inputs, and DelayShift introduces a delay along the time dimension. We used the numSpikes loss function and the NAdam optimizer, which is essentially the Adam optimizer with Nesterov momentum. The learning rate was set to 1 × 10⁻² [2], and training ran for 200 epochs on a DGX-5 with 32 GB of memory. A study with various time bin sizes T is discussed in the following section.
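A condensed sketch of such a CSB/FSB stack, written in the style of the public slayerPytorch Loihi examples, is shown below; the channel counts, kernel sizes, dense-layer shape, and YAML parameter file are illustrative assumptions, not the exact configuration of our 10-layer model:

```python
import torch
import slayerSNN as snn  # slayerPytorch, https://github.com/bamsumit/slayerPytorch

class FallNet(torch.nn.Module):
    """Condensed CSB/FSB sketch (two of the four CSBs shown)."""
    def __init__(self, netParams):  # netParams = snn.params('network.yaml')
        super().__init__()
        # Loihi neuron dynamics and simulation step come from the YAML file.
        self.slayer = snn.loihi(netParams['neuron'], netParams['simulation'])
        self.pool1 = self.slayer.pool(4)                  # spatial pooling
        self.conv1 = self.slayer.conv(2, 16, 5, padding=2)
        self.pool2 = self.slayer.pool(2)
        self.conv2 = self.slayer.conv(16, 32, 3, padding=1)
        self.fc1 = self.slayer.dense((43, 32, 32), 2)     # shape illustrative; fall / no-fall

    def forward(self, spikes):  # spikes: batch x P x H x W x T
        s = self.slayer.spikeLoihi(self.conv1(self.pool1(spikes)))
        s = self.slayer.delayShift(s, 1)
        s = self.slayer.spikeLoihi(self.conv2(self.pool2(s)))
        s = self.slayer.delayShift(s, 1)
        return self.slayer.spikeLoihi(self.fc1(s))
```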
5.1.2 Implementation on the NeuCube architecture. Figure 7 illustrates the training steps of the SNN-based NeuCube architecture. SNNs generally train faster than CNNs for a given task, but CNNs tend to be more accurate [36]. According to El-Assal et al. [9], the gap between SNNs and CNNs in terms of detection accuracy can be bridged by a pre-processing step before passing the data to the SNN.
Figure 7: Data mapping: the training pipeline of NeuCube [17] involves the conversion of spatiotemporal data chunks to spike trains via the spike-encoder block. Learning: spike sequence patterns are learnt in an unsupervised mode and the output classification module is trained in a supervised way. Optimization: the data mapping and learning procedure is repeated until the maximum accuracy is achieved.
Therefore, we explore a hybrid 3D-CNN & SNN architecture in which pre-processed features extracted using a 3D-CNN are fed to the SNN. Figure 8 shows the flowchart of the hybrid 3D-CNN and NeuCube pipeline for action classification. The spatiotemporal features derived from the final pool layer of the 3D-CNN are resized into a 48 × 32 feature map. We adopted Ben's Spiker Algorithm (BSA) encoding [30] in our experiments (described in Algorithm 1), which maximized the NeuCube model's classification accuracy. BSA creates unipolar spike sequences (positive spikes and zeros) using a Finite Impulse Response (FIR) filter. Two error terms are examined at each time point: the first is obtained by subtracting the filter coefficients from subsequent signal values s(t), and the second from the unmodified input signal. If the subtraction error is smaller than the unmodified-signal error minus a threshold, a positive (excitatory) spike is generated, and the filter coefficients are subtracted from the signal. The Spiking Neuron Coordinates (SNNc) structure is defined by providing the number of evenly spaced neurons along the x, y, and z coordinates, resulting in a cuboid; in our case, we used a 5 × 5 × 5 cube. The small-world connectivity (SWC) method with a radius of 3 was used as the linking parameter to connect neurons in the SNNc. The STDP learning rule is given in Eq. 1, where the change in synaptic weight is
proportional to the degree of correlation between the spikes of the pre-synaptic and post-synaptic neurons. Repeated spike arrival a few milliseconds before the post-synaptic action potential causes synaptic long-term potentiation (LTP), resulting in synaptic connection strengthening, whereas repeated spike arrival after post-synaptic spikes causes long-term depression (LTD) and synaptic weight loss [16, 39].
Figure 8: Training the NeuCube architecture using the fea-
tures extracted by the 3D-CNN.
Algorithm 1 BSA Encoding [30]
Input: signal s, FIR filter fir, threshold θ = 0.5
L ← length(s), F ← length(fir)
s ← s − min(s)
output ← zeros(L)
for t = 1 : (L − F) do
    error1 ← 0, error2 ← 0
    for k = 1 : F do
        error1 ← error1 + |s(t + k − 1) − fir(k)|
        error2 ← error2 + |s(t + k − 1)|
    end for
    if error1 ≤ (error2 − θ) then
        output(t) ← 1
        for k = 1 : F do
            s(t + k − 1) ← s(t + k − 1) − fir(k)
        end for
    end if
end for
return output, min(s)
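A direct Python rendering of Algorithm 1, with an illustrative smoothing kernel (in practice the FIR filter and threshold are tuned):

```python
import numpy as np

def bsa_encode(signal, fir, threshold=0.5):
    """Ben's Spiker Algorithm: encode an analog signal as a unipolar spike
    train by greedily subtracting an FIR kernel (see Algorithm 1)."""
    s = signal.astype(float) - signal.min()      # shift to non-negative values
    L, F = len(s), len(fir)
    spikes = np.zeros(L, dtype=np.int8)
    for t in range(L - F):
        err1 = np.abs(s[t:t + F] - fir).sum()    # error if we fire here
        err2 = np.abs(s[t:t + F]).sum()          # error if we stay silent
        if err1 <= err2 - threshold:
            spikes[t] = 1
            s[t:t + F] -= fir                    # remove the kernel's contribution
    return spikes

# Illustrative usage with a simple averaging kernel:
fir = np.ones(8) / 8.0
spike_train = bsa_encode(np.sin(np.linspace(0, 6, 200)) + 1.0, fir)
```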
Δw = { A · exp(Δt / τ₊),    if Δt < 0
     { −B · exp(−Δt / τ₋),  if Δt ≥ 0        (1)

where Δw represents the change in synaptic weight, and A and B are the synaptic learning rates for LTP and LTD, respectively. The time constants τ₊ and τ₋ determine the rate of decay for LTP and LTD, and the arrival-time difference between the pre-synaptic and post-synaptic spikes is denoted by Δt. In this study, we used A and B values of 10 ms, τ₊ and τ₋ values of 2 ms, a firing threshold of 0.5, and a refractory duration of 6 ms. Finally, the NeuCube analogue of KNN, the dynamic evolving SNN classifier (deSNN), adopts the rank-order (RO) coding technique. The method parameters are α (mod) = 0.8 for the weight update on the initial spike occurrence and d (drift) = 0.005 for the weight update on successive spike occurrences.
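For concreteness, Eq. 1 as a small function (a sketch following the sign convention reconstructed above, where Δt < 0 means the pre-synaptic spike arrived before the post-synaptic one):

```python
import math

def stdp_delta_w(dt_ms, A=10.0, B=10.0, tau_plus=2.0, tau_minus=2.0):
    """STDP weight change per Eq. 1: potentiation when the pre-synaptic
    spike precedes the post-synaptic spike (dt < 0), depression otherwise."""
    if dt_ms < 0:
        return A * math.exp(dt_ms / tau_plus)   # LTP branch
    return -B * math.exp(-dt_ms / tau_minus)    # LTD branch
```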
The NeuCube is trained in an unsupervised way using the DVSFall Split-2 test set. The gradients of the neurons in the SNNc are clearly depicted in Figure 9(a), allowing for a clearer understanding of the learning patterns during the STDP process. The 25th and 28th features are found to be the most dominant features learned by the SNN. The "neuron percentage" statistic from NeuCube was utilized to calculate the relative intensity of interaction among the total number of neurons in the SNNc, shown in Figure 9(b).
Figure 9: SNNc visualization: (a) neuronal connections and gradient strength visualized for a trained SNNc, allowing for a clearer understanding of learning patterns during the STDP process; (b) neuron proportion signifying the gradient strength, indicating that the 25th and 28th features are dominant, as visualized in the pie chart.
Figure 10: The gure demonstrates Fall Detection using SNN
on our recorded dataset. The upper row displays the corre-
sponding grayscale frames of the event logs, while the lower
row shows the polarity-coded event logs from the DVS (ob-
tained using DV software [7]).
6 RESULTS
In our approach, we address the challenge of detecting infrequent human falls, which inherently represent rare events. Consequently, a dataset containing a broad range of human actions will exhibit significant class imbalance. When assessing a model's performance using accuracy in the presence of such class imbalance, the accuracy paradox arises, as noted by Branco et al. [6]. Therefore, we prefer to judge our models' performance based on the obtained sensitivity and specificity, given by Equations 2 and 3.
Sensitivity = TruePositive / (TruePositive + FalseNegative)    (2)

Specificity = TrueNegative / (TrueNegative + FalsePositive)    (3)
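As a quick reference, the two metrics computed directly from confusion-matrix counts (the example counts below are illustrative, not taken from our experiments):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (Eq. 2) and specificity (Eq. 3) from confusion counts."""
    sensitivity = tp / (tp + fn)  # recall on the rare "fall" class
    specificity = tn / (tn + fp)  # correctness on the majority "no fall" class
    return sensitivity, specificity

# Illustrative usage:
sen, spe = sensitivity_specificity(tp=85, fn=15, tn=950, fp=50)
print(f"sensitivity={sen:.2%}, specificity={spe:.2%}")
```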
We utilized DVS noise-ltered output events to train the SLAYER
SNN. The performance of the SLAYER SNN was evaluated with
dierent time bins (T) using intervals of 25 ms, ranging from 600
ms to 1000 ms, corresponding to sample lengths of 1200 ms and
2000 ms at a sampling rate of 2, as illustrated in Figure 11 with the
best performance on time bin of 725 ms on DVSFall Spilt-1. Hence
we choose sample length 1450 ms(time bin: 725 ms) to train on the
other splits, the model achieved an average accuracy of 94.59% and
sensitivity of 80.63% on the DVSFall test set as provided in Table
4. The qualitative results of the best-performing model trained on
split-1 is shown in Figure 10.
Figure 11: SLAYER SNN performance on various time bins.
In the next set of experiments, we obtained results through the utilization of a 3D-CNN, trained over 10 epochs on the DVSFall Split-1 training set. The network consists of two 3D convolution layers employing 16 and 32 kernels, each with a size of 3 × 3 × 3, sequentially followed by a 3D max-pooling layer with a kernel size of 3 × 3 × 3. The resultant features, of dimensions 16 × 32 with a depth of 3, are used as input to the spike encoder. The spike encoder processed these features, resulting in a feature map of size 48 × 32 for NeuCube. The extracted features were then evaluated on the DVSFall Split-1 test set. The same procedural steps were repeated for DVSFall Split-2 and Split-3 in order to train and assess the NeuCube architecture's performance. Table 4 indicates that the model achieved an impressive average accuracy of 97.84% and a sensitivity of 85.03% when applied to the DVSFall dataset.
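A minimal PyTorch sketch of the described feature extractor (two 3D convolutions with 16 and 32 kernels of size 3 × 3 × 3, followed by 3 × 3 × 3 max pooling); the single-channel input, clip shape, strides, and padding are our illustrative assumptions:

```python
import torch
import torch.nn as nn

class FeatureExtractor3D(nn.Module):
    """Two Conv3d layers (16 and 32 kernels, 3x3x3) + 3x3x3 max pooling,
    as described above; exact strides/padding are illustrative."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=3),
        )

    def forward(self, clips):          # clips: N x 1 x D x H x W
        f = self.features(clips)
        return f.flatten(start_dim=1)  # flattened for the spike encoder

# Illustrative shape check on a 30-frame accumulated-frame clip:
x = torch.randn(1, 1, 30, 48, 96)
print(FeatureExtractor3D()(x).shape)
```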
7 CONCLUSION
In this work, our objective is to achieve privacy-preserving human fall detection through the utilization of Dynamic Vision Sensors (DVS). To tackle this challenge, we investigated two approaches:
Table 4: Results of the SLAYER SNN and the hybrid 3D-CNN & NeuCube models trained on DVSFall (all values in %).

Method                                             Acc.    Sen.    Spe.
SLAYER SNN (DVSFall-Split1 Test)                   95.28   85.73   96.34
SLAYER SNN (DVSFall-Split2 Test)                   94.31   81.62   96.12
SLAYER SNN (DVSFall-Split3 Test)                   94.17   74.55   95.85
SLAYER SNN (DVSFall Test, avg)                     94.59   80.63   96.10
Hybrid 3D-CNN and NeuCube (DVSFall-Split1 Test)    98.16   92.73   98.72
Hybrid 3D-CNN and NeuCube (DVSFall-Split2 Test)    97.83   83.64   99.26
Hybrid 3D-CNN and NeuCube (DVSFall-Split3 Test)    97.51   78.95   99.45
Hybrid 3D-CNN and NeuCube (DVSFall Test, avg)      97.84   85.03   99.14
employing an SNN alone and a hybrid 3D-CNN & SNN architecture. We perform privacy-preserving fall detection both directly with SNNs and with the hybrid 3D-CNN & SNN architecture, and report the performance details comparatively. We observed that when the SNN is used to process the DVS data directly, the achieved accuracy is significantly lower than that of the hybrid approach. However, when the SNN received the features extracted by the 3D-CNN, a substantial improvement was observed. A future task could be to improve the accuracy solely using SNNs with minor or preferably no pre-processing step. The presented DVSFall dataset opens up a wide range of possibilities for future work. Sparse convolutional networks compatible with DVS output can be used to accelerate computation with a smaller memory footprint, and the fall detection algorithm could be executed directly using spiking neural networks without the features derived from the 3D-CNN. The prevalence of fatal human falls among the elderly and the frequent occurrence of minor, non-fatal falls in children suggest the potential utility of an attention mechanism that triggers alarms according to fall severity in practical situations. Looking ahead, we intend to expand our method to other SNN applications such as action detection and localization, and real-time edge processing on various neuromorphic processors.
REFERENCES
[1] 2022. Dynamic Vision Sensor. https://inivation.com/products/customsolutions/videos/. Accessed: 2022-04-13.
[2] 2022. SLAYER PyTorch. https://bamsumit.github.io/slayerPytorch/build/html/spikeLoss.html. Accessed: 2022-04-28.
[3] Abeer Banerjee, Shyam Sunder Prasad, Naval Kishore Mehta, Himanshu Kumar, Sumeet Saurav, and Sanjay Singh. 2022. Gaze Detection Using Encoded Retinomorphic Events. In International Conference on Intelligent Human Computer Interaction. Springer, 442–453.
[4] Ahmed Nabil Belbachir, Stephan Schraml, and Aneta Nowakowska. 2011. Event-driven stereo vision for fall detection. In CVPR 2011 Workshops. IEEE, 78–83.
[5] Tobias Bolten, Regina Pohle-Frohlich, and Klaus D. Tonnies. 2021. DVS-OUTLAB: A Neuromorphic Event-Based Long Time Monitoring Dataset for Real-World Outdoor Scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1348–1357.
[6] Paula Branco, Luís Torgo, and Rita P. Ribeiro. 2016. A survey of predictive modeling on imbalanced domains. ACM Computing Surveys (CSUR) 49, 2 (2016), 1–50.
[7] Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al. 2018. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 1 (2018), 82–99.
[8] Koldo De Miguel, Alberto Brunete, Miguel Hernando, and Ernesto Gambao. 2017. Home camera-based fall detection system for the elderly. Sensors 17, 12 (2017), 2864.
[9] Mireille El-Assal, Pierre Tirilly, and Ioan Marius Bilasco. 2021. A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP. In 2021 International Conference on Content-Based Multimedia Indexing (CBMI). IEEE, 1–6.
[10] Yaxiang Fan, Martin D. Levine, Gongjian Wen, and Shaohua Qiu. 2017. A deep neural network for real-time detection of falling humans in naturally occurring scenes. Neurocomputing 260 (2017), 43–58.
[11] Steve B. Furber, Francesco Galluppi, Steve Temple, and Luis A. Plana. 2014. The SpiNNaker project. Proc. IEEE 102, 5 (2014), 652–665.
[12] Guillermo Gallego, Tobi Delbrück, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J. Davison, Jörg Conradt, Kostas Daniilidis, et al. 2020. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 1 (2020), 154–180.
[13] Yves M. Galvão, Janderson Ferreira, Vinícius A. Albuquerque, Pablo Barros, and Bruno J. T. Fernandes. 2021. A multimodal approach using deep learning for fall detection. Expert Systems with Applications 168 (2021), 114226.
[14] Samuele Gasparrini, Enea Cippitelli, Susanna Spinsante, and Ennio Gambi. 2014. A depth-based fall detection system using a Kinect® sensor. Sensors 14, 2 (2014), 2756–2775.
[15] Ronak Gupta, Prashant Anand, Santanu Chaudhury, Brejesh Lall, and Sanjay Singh. 2020. Compressive sensing based privacy for fall detection. In Computer Vision, Pattern Recognition, Image Processing, and Graphics: 7th National Conference, NCVPRIPG 2019, Hubballi, India, December 22–24, 2019, Revised Selected Papers 7. Springer, 429–438.
[16] Nikola Kasabov, Nathan Matthew Scott, Enmei Tu, Stefan Marks, Neelava Sengupta, Elisa Capecci, Muhaini Othman, Maryam Gholami Doborjeh, Norhanifah Murli, Reggio Hartono, et al. 2016. Evolving spatio-temporal data machines based on the NeuCube neuromorphic framework: Design methodology and selected applications. Neural Networks 78 (2016), 1–14.
[17] Nikola K. Kasabov. 2014. NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data. Neural Networks 52 (2014), 62–76.
[18] Erik Krempel, Pascal Birnstill, and Jürgen Beyerer. 2017. A Privacy-Aware Fall Detection System for Hospitals and Nursing Facilities. European Journal for Security Research 2, 2 (2017), 83–95.
[19] Hyunwoo Lee, Jooyoung Kim, Dojun Yang, and Joon-Ho Kim. 2017. Embedded real-time fall detection using deep learning for elderly care. arXiv preprint arXiv:1711.11200 (2017).
[20] Fuyou Liao, Feichi Zhou, and Yang Chai. 2021. Neuromorphic vision sensors: Principle, progress and perspectives. Journal of Semiconductors 42, 1 (2021), 013105.
[21] Jixin Liu, Rong Tan, Guang Han, Ning Sun, and Sam Kwong. 2020. Privacy-Preserving In-Home Fall Detection Using Visual Shielding Sensing and Private Information-Embedding. IEEE Transactions on Multimedia 23 (2020), 3684–3699.
[22] Qianhui Liu, Dong Xing, Huajin Tang, De Ma, and Gang Pan. 2021. Event-based Action Recognition Using Motion Information and Spiking Neural Networks. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Z.-H. Zhou (Ed.). International Joint Conferences on Artificial Intelligence Organization, 1743–1749.
[23] Chao Ma, Atsushi Shimada, Hideaki Uchiyama, Hajime Nagahara, and Rin-ichiro Taniguchi. 2019. Fall detection using optical level anonymous image sensing system. Optics & Laser Technology 110 (2019), 44–61.
[24] A. Marchisio, G. Pira, M. Martina, G. Masera, and M. Shafique. 2021. DVS-Attacks: Adversarial Attacks on Dynamic Vision Sensors for Spiking Neural Networks. In 2021 International Joint Conference on Neural Networks (IJCNN).
[25] Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, et al. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 6197 (2014), 668–673.
[26] Shu Miao, Guang Chen, Xiangyu Ning, Yang Zi, Kejia Ren, Zhenshan Bing, and Alois Knoll. 2019. Neuromorphic Vision Datasets for Pedestrian Detection, Action Recognition, and Fall Detection. Frontiers in Neurorobotics 13 (2019). https://doi.org/10.3389/fnbot.2019.00038
[27] S.-G. Miaou, Pei-Hsu Sung, and Chia-Yuan Huang. 2006. A customized human fall detection system using omni-camera images and personal information. In 1st Transdisciplinary Conference on Distributed Diagnosis and Home Healthcare, 2006. D2H2. IEEE, 39–42.
[28] Dariusz Mrozek, Anna Koczur, and Bożena Małysiak-Mrozek. 2020. Fall detection in older adults with mobile IoT devices and machine learning in the cloud and on the edge. Information Sciences 537 (2020), 132–147.
[29] Elias Mueggler, Christian Forster, Nathan Baumli, Guillermo Gallego, and Davide Scaramuzza. 2015. Lifetime estimation of events from dynamic vision sensors. In 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 4874–4881.
[30] Balint Petro, Nikola Kasabov, and Rita M. Kiss. 2019. Selection and optimization of temporal spike encoding methods for spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems 31, 2 (2019), 358–370.
[31] Shyam Sunder Prasad, Naval Kishore Mehta, Abeer Banerjee, Himanshu Kumar, Sumeet Saurav, and Sanjay Singh. 2022. Real-Time Privacy-Preserving Fall Detection using Dynamic Vision Sensors. In 2022 IEEE 19th India Council International Conference (INDICON). IEEE, 1–6.
[32] Sumeet Saurav, Ravi Saini, and Sanjay Singh. 2022. A dual-stream fused neural network for fall detection in multi-camera and 360° videos. Neural Computing and Applications 34, 2 (2022), 1455–1482.
[33] Sumeet Saurav, Ravi Saini, and Sanjay Singh. 2022. Vision-based techniques for fall detection in 360° videos using deep learning: Dataset and baseline results. Multimedia Tools and Applications 81, 10 (2022), 14173–14216.
[34] Wann-Yun Shieh and Ju-Chin Huang. 2012. Falling-incident detection and throughput enhancement in a multi-camera video-surveillance system. Medical Engineering & Physics 34, 7 (2012), 954–963.
[35] Sumit B. Shrestha and Garrick Orchard. 2018. SLAYER: Spike layer error reassignment in time. Advances in Neural Information Processing Systems 31 (2018).
[36] Martino Sorbaro, Qian Liu, Massimo Bortone, and Sadique Sheik. 2020. Optimizing the energy consumption of spiking neural networks for neuromorphic applications. Frontiers in Neuroscience (2020), 662.
[37] Erik E. Stone and Marjorie Skubic. 2014. Fall detection in homes of older adults using the Microsoft Kinect. IEEE Journal of Biomedical and Health Informatics 19, 1 (2014), 290–301.
[38] Guangmin Sun and Zhongqi Wang. 2020. Fall detection algorithm for the elderly based on human posture estimation. In 2020 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). IEEE, 172–176.
[39] Clarence Tan, Marko Šarlija, and Nikola Kasabov. 2020. Spiking neural networks: Background, recent development and the NeuCube architecture. Neural Processing Letters 52, 2 (2020), 1675–1701.
[40] Huachun Tan, Yang Zhou, Yong Zhu, Danya Yao, and Keqiang Li. 2014. A novel curve lane detection based on Improved River Flow and RANSA. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 133–138.
[41] Shigeyuki Tateno, Fanxing Meng, Renzhong Qian, and Yuriko Hachiya. 2020. Privacy-preserved fall detection method with three-dimensional convolutional neural network using low-resolution infrared array sensor. Sensors 20, 20 (2020), 5957.
[42] Jixiang Wan, Ming Xia, Zunkai Huang, Li Tian, Xiaoying Zheng, Victor Chang, Yongxin Zhu, and Hui Wang. 2021. Event-Based Pedestrian Detection Using Dynamic Vision Sensors. Electronics 10, 8 (2021), 888.
[43] Bo-Hua Wang, Jie Yu, Kuo Wang, Xuan-Yu Bao, and Ke-Ming Mao. 2020. Fall detection based on dual-channel feature integration. IEEE Access 8 (2020), 103443–103453.
[44] Xueyi Wang, Joshua Ellul, and George Azzopardi. 2020. Elderly fall detection systems: A literature survey. Frontiers in Robotics and AI 7 (2020), 71.
[45] Pei-Chen Wu, Chin-Yu Chang, and Chang Hong Lin. 2014. Lane-mark extraction for automobiles under complex conditions. Pattern Recognition 47, 8 (2014), 2756–2767.