Conference Paper

Compressive sensing based privacy for fall detection


Abstract

Fall detection is critically important in health care, where timely detection enables immediate medical assistance. In this context, we propose a 3D ConvNet architecture built from 3D Inception modules for fall detection. The proposed architecture is a customized version of the Inflated 3D (I3D) architecture: instead of the raw video sequence used by the I3D convolutional neural network, it takes as spatio-temporal input the compressed measurements of the video sequence obtained from a compressive sensing framework. We adopt this design because privacy is a major concern for patients monitored through RGB cameras. The proposed framework is flexible with respect to a wide variety of measurement matrices. Ten action classes, randomly selected from Kinetics-400 and containing no fall examples, are used to train our 3D ConvNet after compressive sensing is applied to the original video clips with different types of sensing matrices. Our results show that the 3D ConvNet's performance remains unchanged across the different sensing matrices. Moreover, the performance obtained with the Kinetics pre-trained 3D ConvNet on compressively sensed fall videos from benchmark datasets is better than that of state-of-the-art techniques.
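
The compressive sensing step described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes each frame is flattened to a vector x and compressed as y = Phi x with one matrix shared across the clip. The function names, the 0.25 measurement ratio, and the two matrix types (Gaussian and Bernoulli) are illustrative assumptions, not details taken from the paper.

```python
import random

def make_sensing_matrix(m, n, kind="gaussian", seed=0):
    """Return an m x n measurement matrix as a list of rows (hypothetical helper)."""
    rng = random.Random(seed)
    scale = (1.0 / m) ** 0.5  # common 1/sqrt(m) normalisation
    if kind == "gaussian":
        return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]
    if kind == "bernoulli":
        return [[scale if rng.random() < 0.5 else -scale for _ in range(n)]
                for _ in range(m)]
    raise ValueError(f"unknown matrix kind: {kind}")

def measure(phi, x):
    """Compute the compressed measurements y = Phi x for a flattened signal x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

def sense_clip(clip, ratio=0.25, kind="gaussian", seed=0):
    """Compress every flattened frame of a clip with one shared matrix."""
    n = len(clip[0])
    m = max(1, int(ratio * n))  # number of measurements per frame
    phi = make_sensing_matrix(m, n, kind, seed)
    return [measure(phi, frame) for frame in clip]

# Toy clip: 3 frames of 4 x 4 = 16 pixels each (assumed data).
clip = [[float((t + i) % 5) for i in range(16)] for t in range(3)]
measurements = sense_clip(clip, ratio=0.25)
print(len(measurements), len(measurements[0]))
```

In this sketch the classifier would consume `measurements` rather than `clip`, so the raw frames never need to be stored or transmitted.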


... The main reasons of this unsatisfying performance are related to the low number of features and the use of handcraft rules based on threshold values. Strategy (ii) [18], [19], [20], [21] relies on features extracted (e.g., maximum and minimum peak, acceleration, angular velocity, velocity) from a sliding window and classical machine learning algorithms (e.g., k-Nearest Neighbors, Support Vector Machine, Naïve Bayes). The performances are very good. ...
... However, unlike researchers, humans are reluctant to accept cameras due to privacy concerns. Thus, other types of cameras have been proposed to detect falls, such as thermal and depth cameras [9], [10], [21], [22]. The main advantage of these latter technologies is that patients cannot be recognized (or it is a very complex task to recognize patients). ...
Article
Among elderly populations around the world, a high percentage of individuals are affected by physical or mental diseases that greatly influence their quality of life. Because they generally wish to remain in their own homes for as long as possible, solutions must be designed to detect these diseases automatically, limiting the reliance on human resources. To this end, our team developed a sensor platform based on infrared proximity sensors to accurately recognize basic bathroom activities such as going to the toilet and showering. This work builds on a body of scientific literature establishing that activities related to bodily hygiene are strongly correlated with health status and can be important signs of developing disorders. The system is built to be simple, affordable and highly reliable. Our experiments have shown that it can yield an F-score of 96.94%. In addition, the durations collected by our kit differ from the real ones by approximately 6 seconds; these results confirm the reliability of our kit.
... The main reasons of this unsatisfying performance are related to the low number of features and the use of handcraft rules based on threshold values. Strategy (ii) [22], [23], [24], [25] relies on features extracted (e.g., maximum and minimum peak, acceleration, angular velocity, velocity) from a sliding window and classical machine learning algorithms (e.g., k-Nearest Neighbors, Support Vector Machine, Naïve Bayes). The performances are very good. ...
... However, unlike researchers, humans are reluctant to accept cameras due to privacy concerns. Thus, other types of cameras have been proposed to detect falls, such as thermal and depth cameras [10], [11], [25], [26]. The main advantage of these latter technologies is that patients cannot be recognized (or it is a very complex task to recognize patients). ...
Article
Fall detection is a major challenge for researchers. A fall can cause injuries such as femoral neck fracture, brain hemorrhage, or skin burns, leading to significant pain. Moreover, in some cases, trauma caused by an undetected fall can worsen over time and lead to a painful end of life or even death. One solution is to detect falls efficiently in order to alert somebody (e.g., nurses) as quickly as possible. To respond to this need, we propose to detect falls in a real 40-square-meter apartment by exploiting three ultra-wideband radars and a deep neural network model. The deep neural network is composed of a convolutional neural network stacked with a long short-term memory network and a fully connected neural network to identify falls. In other words, the problem addressed in this paper is a binary classification that differentiates fall and non-fall events. Because real falls take different forms, the data used to train and test the classification model were generated from four types of falls simulated by 10 participants in three locations in the apartment. Finally, the training and testing stages were carried out according to three strategies, including the leave-one-subject-out method, which measures how well the proposed system generalizes. The results are very promising, since we reach almost 90% accuracy.
... In Chapter 5, a conclusion is made. Gupta et al., 2020). However, it is known that systems working with this method are rejected by elderly individuals. ...
... The model described achieved a rapid fall detection time of 0.312 seconds for the first fall (average of 0.5 seconds) with 100% accuracy. Ronak et al. [21] proposed a fall detection system named as 3D ConvNet architecture using NVIDIA DGX-1 through developing a compressive sensing framework. Tsung et al. [22] implemented a fall detection solution with NVIDIA Jetson TX2, for real-time information extraction incorporated with traditional algorithms. ...
Conference Paper
Accidental falls have created an alarming situation for elderly people who are either living independently or left alone by their families for a short time, such as during a pandemic. Falls have become one of the leading causes of death and irreversible injuries for aged people worldwide. Our proposed system combines machine learning and object detection with low-power sensors and computing systems to create a portable and low-cost solution for fall detection. Specifically, our system for human fall detection uses a millimeter wave (mmWave) sensor connected to an NVIDIA Jetson Nano system. We use neural network models to analyze the output response from the mmWave system to classify different human postures and provide accurate fall detection. The developed system preserves privacy, as no camera images are recorded, and detects falls in real time with high accuracy for both single and multiple individuals simultaneously.
... In this section, recent methods in fall detection are briefly explained whose results are compared in Sec. 4. A 3D ConvNet architecture is used to identify falls. 12 It consists of 3D Fall Detection Inception modules. It is a modified variant of Inflated 3D (I3D) architecture, which takes Structural Measurement Matrix (SMM) of video sequence as spatio-temporal data, obtained from compressive sensing frameworks, rather than video sequence as data, as in the case of I3D ConvNet. ...
Article
Falls are a serious problem for elderly people, and constant monitoring is important for identifying them. Fall detection methods are currently a significant area of research for safety purposes and for the healthcare industry. The objective of this paper is to identify elderly falls. The proposed method introduces keyframe-based fall detection in an elderly care system. Experiments were conducted on the University of Rzeszow (UR) Fall Detection dataset, the Fall Detection Dataset and the MultiCam dataset. The proposed method achieves accuracy rates of 99%, 98.15% and 99% on the UR Fall Detection dataset, the Fall Detection Dataset and the MultiCam dataset, respectively. The performance of the proposed method is compared with that of other methods and shown to have a higher accuracy rate.
Chapter
Music is sound that arouses sensations in the human mind and body. Since the beginning of time, music has been present in every culture and life form, in audio and symbolic form, and in physical or digital modes of communication. While availability and scope for advancements increase exponentially, so does the need to search, compare, and organise music. The music industry has been striving to find the best possible approach to categorising music, that is, whether classification on the basis of emotions, instrumentation, genres, or any other music information will be most efficient as well as useful to listeners and music service providers. With the aim of supporting the best music experience, the current study statistically shows, with the help of prior research in music information retrieval and the implementation of several powerful machine learning-based technologies, that genre classification, specifically ensemble-based classification, can be as accurate as 73.17%. The study analyses the performance of all models on genre classification and concludes that max-voting ensemble-based models are more accurate than each component classifier and than advanced ensemble models, and are also optimal for real-world music genre classification compared with prior experiments on the GTZAN database, which is the novel contribution of the study.
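The max-voting ensemble mentioned in this abstract can be sketched as follows. This is only an illustration of the general technique, assuming each component classifier outputs one label per sample; the labels and the tie-breaking rule (the label seen first wins) are assumptions, not details from the study.

```python
from collections import Counter

def max_vote(predictions):
    """Return the most frequent label per sample.

    predictions holds one list of labels per classifier, all of equal length.
    Ties go to the label that appears first among the votes (an assumption).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical outputs of three component classifiers on three tracks.
clf_a = ["rock", "jazz", "pop"]
clf_b = ["rock", "pop", "pop"]
clf_c = ["jazz", "jazz", "pop"]
ensemble = max_vote([clf_a, clf_b, clf_c])
print(ensemble)
```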
Conference Paper
Human falls occur very rarely, which makes it difficult to employ supervised classification techniques. Moreover, the sensing modality used must preserve the privacy of those being monitored. In this paper, we investigate the use of a thermal camera for fall detection, since it effectively masks the identity of those being monitored. We formulate fall detection as an anomaly detection problem and aim to use autoencoders to identify falls. We also present a new anomaly scoring method that combines the reconstruction score of a frame across different video sequences. Our experiments suggest that convolutional LSTM autoencoders perform better than convolutional and deep autoencoders in detecting unseen falls.
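One way to combine a frame's reconstruction score across the overlapping windows that contain it is to average them. The sketch below assumes exactly that; the abstract does not specify the scoring rule, so the averaging, the window layout, and the 0.5 threshold are all illustrative assumptions.

```python
def frame_scores(window_errors, num_frames, window_len, stride=1):
    """Average each frame's reconstruction error over all windows containing it.

    window_errors[w][k] is the error of the k-th frame of window w,
    and window w starts at frame w * stride.
    """
    totals = [0.0] * num_frames
    counts = [0] * num_frames
    for w, errors in enumerate(window_errors):
        start = w * stride
        for k in range(window_len):
            totals[start + k] += errors[k]
            counts[start + k] += 1
    # Frames covered by no window get a score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# 4 frames, windows of length 2 with stride 1, giving 3 windows (assumed data).
errors = [[0.1, 0.2], [0.4, 0.9], [0.7, 0.3]]
scores = frame_scores(errors, num_frames=4, window_len=2)

# Flag frames whose combined score exceeds a hypothetical threshold.
anomalous = [i for i, s in enumerate(scores) if s > 0.5]
print(anomalous)
```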
Conference Paper
This paper addresses the real-time encoding-decoding problem for high-frame-rate video compressive sensing (CS). Unlike prior works that perform reconstruction using iterative optimization-based approaches, we propose a non-iterative model, named "CSVideoNet". CSVideoNet directly learns the inverse mapping of CS and reconstructs the original input in a single forward propagation. To overcome the limitations of existing CS cameras, we propose a multi-rate CNN and a synthesizing RNN to improve the trade-off between compression ratio (CR) and spatial-temporal resolution of the reconstructed videos. The experiment results demonstrate that CSVideoNet significantly outperforms the state-of-the-art approaches. With no pre/post-processing, we achieve 25dB PSNR recovery quality at 100x CR, with a frame rate of 125 fps on a Titan X GPU. Due to the feedforward and high-data-concurrency natures of CSVideoNet, it can take advantage of GPU acceleration to achieve three orders of magnitude speed-up over conventional iterative-based approaches. We share the source code at https://github.com/PSCLab-ASU/CSVideoNet.
Article
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
Persistent surveillance from camera networks, such as at parking lots, UAVs, etc., often results in large amounts of video data, resulting in significant challenges for inference in terms of storage, communication and computation. Compressive cameras have emerged as a potential solution to deal with the data deluge issues in such applications. However, inference tasks such as action recognition require high quality features which implies reconstructing the original video data. Much work in compressive sensing (CS) theory is geared towards solving the reconstruction problem, where state-of-the-art methods are computationally intensive and provide low-quality results at high compression rates. Thus, reconstruction-free methods for inference are much desired. In this paper, we propose reconstruction-free methods for action recognition from compressive cameras at high compression ratios of 100 and above. Recognizing actions directly from CS measurements requires features which are mostly nonlinear and thus not easily applicable. This leads us to search for such properties that are preserved in compressive measurements. To this end, we propose the use of spatio-temporal smashed filters, which are compressive domain versions of pixel-domain matched filters. We conduct experiments on publicly available databases and show that one can obtain recognition rates that are comparable to the oracle method in uncompressed setup, even for high compression ratios.
Article
Faced with the growing population of seniors, developed countries need to develop new healthcare systems to help elderly people stay at home in a secure environment. Falls are one of the major risks for seniors living alone at home, causing severe injuries. Computer vision provides a new and promising solution for fall detection. The number of works on fall detection using computer vision has increased in the last few years, and currently there is no easy way to compare the different algorithms. We present here a unique video data set which will be very useful for the scientific community to test their fall detection algorithms. This report provides an overview of our video data set acquired from a calibrated multi-camera system. This video data set contains simulated falls and normal daily activities acquired in realistic situations.
Article
Since falls are a major public health problem in an ageing society, there is considerable demand for low-cost fall detection systems. One of the main reasons seniors do not accept the currently available solutions is that fall detectors using only inertial sensors generate too many false alarms. This means that some daily activities are erroneously signaled as falls, which in turn frustrates the users. In this paper we present how to design and implement a low-cost system for reliable fall detection with a very low false alarm ratio. The fall is detected on the basis of accelerometric data and depth maps. A tri-axial accelerometer is used to indicate a potential fall as well as to indicate whether the person is in motion. If the measured acceleration is higher than an assumed threshold value, the algorithm extracts the person, calculates the features and then executes the SVM-based classifier to authenticate the fall alarm. It is a 24/7/365 embedded system permitting unobtrusive fall detection as well as preserving the privacy of the user.
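The two-stage idea in this abstract, a cheap accelerometer threshold that gates a more expensive confirmation step, can be sketched as below. The 2.5 g threshold, the sample values, and the `confirm` callback standing in for the depth-map features and SVM classifier are all hypothetical.

```python
import math

def acceleration_magnitude(ax, ay, az):
    """Euclidean norm of a tri-axial accelerometer sample, in g."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def potential_fall_indices(samples, threshold_g=2.5):
    """Indices of samples whose magnitude exceeds the assumed threshold."""
    return [i for i, (ax, ay, az) in enumerate(samples)
            if acceleration_magnitude(ax, ay, az) > threshold_g]

def detect_fall(samples, confirm, threshold_g=2.5):
    """Run the costly confirmation step only for samples that pass the threshold.

    confirm is a placeholder callable (index -> bool) for the second stage.
    """
    return any(confirm(i) for i in potential_fall_indices(samples, threshold_g))

# Assumed readings: mostly about 1 g at rest, with one sharp spike.
samples = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.1), (2.0, 1.5, 2.0), (0.0, 0.1, 0.9)]
candidates = potential_fall_indices(samples)
print(candidates)
```

Gating on the threshold keeps the second stage idle during ordinary activity, which is what allows a design like this to run continuously on embedded hardware.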
Article
The elderly population is growing in most countries, and many seniors live alone at home. Falling is among the most dangerous events that frequently occur and may require immediate medical care. Automatic fall detection systems could help elderly people and patients to live independently. Vision-based systems have advantages over wearable devices. These visual systems extract features from video sequences and classify fall and normal activities. These features usually depend on the camera's view direction. Using several cameras to solve this problem increases the complexity of the final system. In this paper we propose to use variations in silhouette area obtained from only one camera. We use a simple background separation method to find the silhouette. We show that the proposed feature is view invariant. The extracted feature is fed into a support vector machine for classification. Simulation of the proposed method using a publicly available dataset shows promising results.
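The silhouette-area feature can be illustrated with a minimal sketch, assuming grayscale frames stored as nested lists and a simple per-pixel difference against a static background. The difference threshold of 30 and the toy frames are assumptions; the paper's actual background separation method is not described in the abstract.

```python
def silhouette_area(frame, background, diff_threshold=30):
    """Count pixels that differ from the background by more than the threshold."""
    return sum(1
               for frame_row, bg_row in zip(frame, background)
               for f, b in zip(frame_row, bg_row)
               if abs(f - b) > diff_threshold)

def area_variations(frames, background, diff_threshold=30):
    """Frame-to-frame change in silhouette area, usable as a classifier feature."""
    areas = [silhouette_area(f, background, diff_threshold) for f in frames]
    return [later - earlier for earlier, later in zip(areas, areas[1:])]

# Assumed 3 x 3 toy data: the foreground grows from 2 to 4 pixels.
background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
frame_1 = [[0, 200, 0], [0, 200, 0], [0, 0, 0]]
frame_2 = [[0, 0, 0], [200, 200, 0], [200, 200, 0]]
variations = area_variations([frame_1, frame_2], background)
print(variations)
```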
Conference Paper
Results in compressed sensing describe the feasibility of reconstructing sparse signals using a small number of linear measurements. In addition to compressing the signal, do these measurements provide secrecy? This paper considers secrecy in the context of an adversary that does not know the measurement matrix used to encrypt the signal. We demonstrate that compressed sensing-based encryption does not achieve Shannon's definition of perfect secrecy, but can provide a computational guarantee of secrecy.
Conference Paper
The compressed sensing (CS) paradigm unifies sensing and compression of sparse signals in a simple linear measurement step. Reconstruction of the signal from the CS measurements relies on the knowledge of the measurement matrix used for sensing. Generation of the pseudo-random sensing matrix utilizing a cryptographic key, offers a natural method for encrypting the signal during CS. This CS based encryption has the inherent advantage that encryption occurs implicitly in the sensing process - without requiring additional computation. Additionally, the robustness of recovery from compressed sensing, allows a new form of "robust encryption" for multimedia data, wherein the signal is recoverable with high fidelity despite the introduction of additive noise in the encrypted data. In this paper, we examine the security and robustness of this CS based encryption method. The security implications are investigated by considering brute force and structured attacks. Robustness is characterized empirically. Our analysis and results indicate that the computational complexity of these attacks renders them infeasible in practice. In addition, the CS based encryption is found to have fair robustness against additive noise, making it a promising "robust encryption" technique for multimedia.
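Generating the sensing matrix from a key can be sketched as follows, assuming the key is hashed into a seed for a pseudo-random generator that fills a +1/-1 matrix. The helper names and matrix type are illustrative assumptions. Python's `random` module is used only to keep the sketch self-contained; it is not cryptographically secure, so a real system would need a cryptographic generator.

```python
import hashlib
import random

def keyed_sensing_matrix(key, m, n):
    """Derive an m x n matrix of +1/-1 entries deterministically from a key."""
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    rng = random.Random(seed)  # illustration only, not a secure generator
    return [[1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]
            for _ in range(m)]

def measure(phi, x):
    """Sensing and encryption in one step: y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

signal = [0.0] * 16
signal[3], signal[11] = 1.0, -2.0  # assumed sparse signal

phi_sender = keyed_sensing_matrix(b"shared-secret", 4, 16)
phi_receiver = keyed_sensing_matrix(b"shared-secret", 4, 16)
phi_attacker = keyed_sensing_matrix(b"wrong-guess", 4, 16)

ciphertext = measure(phi_sender, signal)
print(phi_sender == phi_receiver, phi_sender == phi_attacker)
```

Only a party holding the same key regenerates the matrix needed for reconstruction; the reconstruction algorithm itself is outside the scope of this sketch.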
Article
Faced with the growing population of seniors, developed countries need to establish new healthcare systems to ensure the safety of elderly people at home. Computer vision provides a promising solution to analyze personal behavior and detect certain unusual events such as falls. In this paper, a new method is proposed to detect falls by analyzing human shape deformation during a video sequence. A shape matching technique is used to track the person's silhouette along the video sequence. The shape deformation is then quantified from these silhouettes based on shape analysis methods. Finally, falls are detected from normal activities using a Gaussian mixture model. This paper has been conducted on a realistic data set of daily activities and simulated falls, and gives very good results (as low as 0% error with a multi-camera setup) compared with other common image processing methods.
Article
Fall events are one of the greatest risks to public safety, especially in complex scenes with large numbers of people. Nevertheless, there is little research on fall detection in complex scenes, and there are no public datasets. We construct a fall event dataset covering crowded and complex scenes. To detect fall events in such scenes, we further propose an attention-guided LSTM model. Our method provides the spatial and temporal locations of fall events, which are indispensable for raising danger alarms in complex public scenes. Specifically, YOLO v3 is employed to detect pedestrians in videos, followed by a tracking module. CNN features are extracted for each tracked bounding box. Fall events are detected by the attention-guided LSTM. Experimental results show that our method achieves good performance, outperforming the state-of-the-art methods.
Article
Fall detection is an important public healthcare problem. Timely detection could enable instant delivery of medical service to the injured. A popular non-intrusive solution for fall detection is based on videos obtained through an ambient camera; the corresponding methods usually require a large dataset to train a classifier and tend to be influenced by image quality. However, it is hard to collect fall data, so simulated falls are recorded instead to construct the training dataset, which is limited in quantity. To address these problems, a three-dimensional convolutional neural network (3D CNN) based method for fall detection is developed which only uses video kinematic data to train an automatic feature extractor and could circumvent the requirement for the large fall dataset that deep learning solutions need. A 2D CNN can only encode spatial information, whereas the employed 3D convolution can extract motion features from the temporal sequence, which is important for fall detection. To further locate the region of interest in each frame, an LSTM (Long Short-Term Memory) based spatial visual attention scheme is incorporated. The sports dataset Sports-1M, which contains no fall examples, is employed to train the 3D CNN, which is then combined with the LSTM to train a classifier with a fall dataset. Experiments have verified the proposed scheme on a fall detection benchmark with accuracy as high as 100%. Superior performance has also been obtained on other activity databases.
Conference Paper
The compressed sensing (CS) theory has been successfully applied to image compression in the past few years, as most image signals are sparse in a certain domain. Several CS reconstruction models have recently been proposed and have obtained superior performance. However, two important challenges remain within the CS theory. The first is how to design a sampling mechanism that achieves optimal sampling efficiency, and the second is how to perform the reconstruction so as to achieve optimal signal recovery. In this paper, we try to address these two problems with a deep network. First, we learn the sampling matrix through network training instead of using a traditional manually designed one, which is much more appropriate for our deep network based reconstruction process. Then, we propose a deep network to recover the image, which imitates traditional compressed sensing reconstruction processes. Experimental results demonstrate that our deep network based CS reconstruction method offers a very significant quality improvement over state-of-the-art methods.
Article
The theory of Compressive Sensing (CS) enables the compact storage of image datasets which are exponentially generated today. In this application, the high computational complexity CS reconstruction process is considered to be outsourced to the cloud for its abundant computing and storage resources. Although it is promising, how to protect data privacy and simultaneously maintain management of the image remains challenging. To address the challenge, we propose a novel outsourced image reconstruction and identity authentication service in cloud, which integrates the techniques of signal processing in the CS domain and computation outsourcing. In our system, the image CS samples are outsourced to cloud for reduced storage. For privacy, the scheme ensures the cloud to securely reconstruct image without revealing the underlying content. For management, whether the cloud determines to supply the reconstruction service is depending on the identity authentication result. Theoretical analysis and empirical evaluations show a satisfactory security performance and low computational complexity of the proposed system. Besides, experimental results also confirm the feasibility of identity authentication in the CS domain.
Conference Paper
Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
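The rearrangement performed at the end of a sub-pixel convolution layer (often called pixel shuffle or depth-to-space) can be sketched without any framework. The sketch assumes the channel ordering in which the r*r maps belonging to output channel c are stored contiguously; the learned convolution that produces those maps is omitted, and the function name is illustrative.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange C*r*r low-resolution maps of size H x W into C maps of size rH x rW.

    Map number c*r*r + i*r + j supplies the pixel at offset (i, j)
    inside every r x r block of output channel c.
    """
    channels = len(feature_maps) // (r * r)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(channels)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                src = feature_maps[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out

# Four 1 x 1 low-resolution maps become one 2 x 2 high-resolution map (r = 2).
lr_maps = [[[1]], [[2]], [[3]], [[4]]]
hr = pixel_shuffle(lr_maps, 2)
print(hr)
```

Because the rearrangement only moves values, all learned computation stays in low-resolution space, which is the source of the efficiency gain the abstract describes.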
Conference Paper
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
Conference Paper
Compressive imaging can acquire image signal in an under-sampled (i.e., under Nyquist rate) representation called measurement. However, measurement compression still has an essential problem in its overall rate-distortion performance. In this paper, we propose a measurement prediction method in which the best predictor is directionally selected in order to reduce the entropy of measurement to be sent. Generally, the measurement prediction usually works well with a small block while the quality of recovery is known to be better with a large block. In order to overcome this dilemma, we propose to use a structural measurement matrix with which compressive sensing is done in a small block size but recovery is performed in a large block size. In this way, both prediction and recovery are expected to be improved at the same time. Experimental results show its superiority in measurement coding amounting up to bitrate reduction by 39 %.
Conference Paper
Recent advances in camera architectures and associated mathematical representations now enable compressive acquisition of images and videos at low data-rates. In such a setting, we consider the problem of human activity recognition, which is an important inference problem in many security and surveillance applications. We propose a framework for understanding human activities as a non-linear dynamical system, and propose a robust, generalizable feature that can be extracted directly from the compressed measurements without reconstructing the original video frames. The proposed feature is termed recurrence texture and is motivated from recurrence analysis of non-linear dynamical systems. We show that it is possible to obtain discriminative features directly from the compressed stream and show its utility in recognition of activities at very low data rates.
Gao, X., Zhang, J., Che, W., Fan, X., Zhao, D.: Block-based compressive sensing coding of natural images by local structural measurement matrix. In: 2015 Data Compression Conference. pp. 133-142. IEEE (2015)
Krishnaswamy, B., Usha, G.: Falls in older people
Wakin, M.B., Laska, J.N., Duarte, M.F., Baron, D., Sarvotham, S., Takhar, D., Kelly, K.F., Baraniuk, R.G.: An architecture for compressive imaging. In: Image Processing, 2006 IEEE International Conference on. pp. 1273-1276. IEEE (2006)
Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)