Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Low-cost depth sensors, such as the Microsoft Kinect, have potential for non-contact health monitoring that is robust to ambient lighting conditions. However, captured depth images typically suffer from high acquisition noise, and hence processing them to estimate biometrics is difficult. In this paper, we propose to capture depth video of a human subject using Kinect 2.0 to estimate his/her heart rate and rhythm; as blood is pumped from the heart to circulate through the head, tiny oscillatory head motion due to Newtonian mechanics can be detected for periodicity analysis. Specifically, we first restore a captured depth video via a joint bit-depth enhancement / denoising procedure, using a graph-signal smoothness prior for regularization. Second, we track an automatically detected head region throughout the depth video to deduce 3D motion vectors. The detected vectors are fed back to the depth restoration module in a loop to ensure that the motion information in the two modules is consistent, improving the performance of both restoration and motion tracking. Third, the computed 3D motion vectors are projected onto their principal component for 1D signal analysis, composed of trend removal, band-pass filtering, and wavelet-based motion denoising. Finally, the heart rate is estimated via Welch power spectrum analysis, and the heart rhythm is computed via peak detection. Experimental results show accurate estimation of the heart rate and rhythm by our proposed algorithm as compared to the rate and rhythm estimated by a portable oximeter.
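The final 1D analysis stage described above (trend removal, band-pass filtering, Welch power spectrum analysis for the rate, and peak detection for the rhythm) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the filter order, the cardiac passband, and the Welch segment length are assumptions.

```python
import numpy as np
from scipy import signal

def estimate_heart_rate(motion_1d, fs):
    """Estimate heart rate (bpm) and beat locations from a 1D
    head-motion signal sampled at fs Hz.
    """
    # Trend removal: subtract a least-squares linear fit.
    x = signal.detrend(motion_1d)

    # Band-pass filter to a plausible cardiac band (0.7-3 Hz,
    # i.e. 42-180 bpm; the exact passband is an assumption).
    b, a = signal.butter(4, [0.7, 3.0], btype="bandpass", fs=fs)
    x = signal.filtfilt(b, a, x)

    # Welch power spectrum: the dominant frequency gives the rate.
    f, pxx = signal.welch(x, fs=fs, nperseg=min(len(x), 256))
    hr_bpm = f[np.argmax(pxx)] * 60.0

    # Peak detection gives beat locations for rhythm analysis.
    peaks, _ = signal.find_peaks(x, distance=int(fs / 3.0))
    return hr_bpm, peaks
```

On a synthetic 1.2 Hz (72 bpm) oscillation sampled at 30 Hz, this recovers the rate to within the Welch frequency resolution.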


... Besides, these in-head or in-hand imaging architectures are suitable for personal use, and they are difficult to extend to large-scale use. Inspired by our previous research [11], we attempt to develop an algorithm to extract the pure breath-related signals from depth video when people walk past the global depth camera. ...
... Before GSA, we pre-process [11,15] each channel of the raw signal y (i,j) (t). We replace outliers by linear interpolation, remove the trend by the least-squares method, and filter each channel to a passband of [0.167, 0.667] Hz with a Butterworth filter. ...
Preprint
This paper presents an unobtrusive solution that can automatically identify deep breaths when a person is walking past the global depth camera. Existing non-contact breath assessments achieve satisfactory results under restricted conditions in which the human body stays relatively still. When someone moves forward, the breath signals detected by the depth camera are hidden within signals of trunk displacement and deformation, and the signal length is short due to the short stay time, posing great challenges for establishing models. To overcome these challenges, a multiple regions-of-interest (ROIs) based signal extraction and selection method is proposed to automatically obtain the signal informative of breath from depth video. Subsequently, graph signal analysis (GSA) is adopted as a spatial-temporal filter to remove the components unrelated to breath. Finally, a classifier for identifying deep breaths is established based on the selected breath-informative signal. In validation experiments, the proposed approach outperforms the comparative methods with accuracy, precision, recall and F1 of 75.5%, 76.2%, 75.0% and 75.2%, respectively. This system can be extended to public places to provide timely and ubiquitous help for those who may have or are going through physical or mental trouble.
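The per-channel pre-processing the excerpt above describes (outlier replacement by linear interpolation, least-squares trend removal, and Butterworth band-pass filtering to [0.167, 0.667] Hz, i.e. 10-40 breaths per minute) might be sketched like this. The outlier criterion (a robust z-score) and the filter order are assumptions, not the authors' choices.

```python
import numpy as np
from scipy import signal

def preprocess_channel(y, fs, z_thresh=3.0):
    """Pre-process one raw depth-signal channel: replace outliers by
    linear interpolation, remove the trend by a least-squares fit,
    and band-pass filter to the breathing band.
    """
    y = np.asarray(y, dtype=float).copy()
    n = np.arange(len(y))

    # 1. Outlier replacement: mark samples far from the median
    #    (robust z-score via the MAD) and fill them by linear
    #    interpolation from the remaining samples.
    dev = np.abs(y - np.median(y))
    mad = np.median(dev) + 1e-12
    bad = dev / (1.4826 * mad) > z_thresh
    if bad.any() and (~bad).sum() >= 2:
        y[bad] = np.interp(n[bad], n[~bad], y[~bad])

    # 2. Trend removal by a least-squares straight-line fit.
    coef = np.polyfit(n, y, 1)
    y = y - np.polyval(coef, n)

    # 3. Butterworth band-pass to [0.167, 0.667] Hz.
    b, a = signal.butter(2, [0.167, 0.667], btype="bandpass", fs=fs)
    return signal.filtfilt(b, a, y)
```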
... Thus, many previous studies have proposed heart-rate estimation approaches using video input. These approaches can be classified into touchless [20][21][22][23][24][25] and touch-based [26,27] monitoring systems (Figure 1). Detailed reviews of vision-based heart-rate monitoring systems are available in the literature [28,29]. ...
... A desktop-based touchless system that tracks patient head motion from video input to estimate the heart rate has been proposed previously [21]. In another study, 3D head motion was tracked using a depth sensor [22]. That approach, referred to as ballistocardiography (BCG), was based on the fact that the influx of blood through the carotid artery causes a Newtonian reaction that results in subtle head motion (approximately 5 mm) that can be used to determine the heart rate. ...
Article
Full-text available
Newtonian reaction to blood influx into the head at each heartbeat causes subtle head motion at the same frequency as the heartbeats. Thus, this head motion can be used to estimate the heart rate. Several studies have shown that heart rates can be measured accurately by tracking head motion using a desktop computer with a static camera. However, implementation of vision-based head motion tracking on smartphones demonstrated limited accuracy due to the hand-shaking problem caused by the non-static camera. The hand-shaking problem could not be handled effectively with only the frontal camera images. It also required a more accurate method to measure the periodicity of noisy signals. Therefore, this study proposes an improved head-motion-based heart-rate monitoring system using smartphones. To address the hand-shaking problem, the proposed system leverages the front and rear cameras available in most smartphones and dedicates each camera to tracking facial features that correspond to head motion and background features that correspond to hand-shaking. Then, the locations of facial features are adjusted using the average point of the background features. In addition, a correlation-based signal periodicity computation method is proposed to accurately separate the true heart-rate-related component from the head motion signal. The proposed system demonstrates improved accuracy (i.e., lower mean errors in heart-rate measurement) compared to conventional head-motion-based systems, and the accuracy is sufficient for daily heart-rate monitoring.
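A correlation-based periodicity computation of the kind the abstract mentions can be illustrated with a plain autocorrelation search over the lags that correspond to plausible heart rates. This generic sketch is not the authors' exact method, and the 40-180 bpm search range is an assumption.

```python
import numpy as np

def periodicity_by_autocorrelation(x, fs, min_bpm=40, max_bpm=180):
    """Estimate the dominant period of a noisy motion signal from
    its autocorrelation, returning the rate in beats per minute.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation: keep non-negative lags and normalize by lag 0.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac /= ac[0]
    # Search only lags corresponding to plausible heart rates.
    lo = int(fs * 60.0 / max_bpm)
    hi = int(fs * 60.0 / min_bpm)
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * fs / lag  # beats per minute
```

Autocorrelation concentrates uncorrelated noise at lag zero, which is why this is more robust than locating individual peaks in a noisy trace.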
... There are two main ways to obtain point clouds. One way is to capture the data with range-scanning devices, like Lidar or Kinect [7][8][9][10]. However, the point clouds captured by range-scanning devices are often corrupted by noise and cannot be recognized directly. ...
... After training, the trained networks are transferred to the inference module to synthesize point clouds (named Point Set i in the red circle of Figure 8), and then the synthesized point clouds from all the groups are combined. Finally, the synthesized point clouds are recovered using Equation (9). After the above generating procedure, a new 3D point cloud model is synthesized. ...
Article
Full-text available
Studying representation learning and generative modelling has been at the core of the 3D learning domain. By leveraging generative adversarial networks and convolutional neural networks for point-cloud representations, we propose a novel framework which can directly generate 3D objects represented by point clouds. The novelties of the proposed method are threefold. First, generative adversarial networks are applied to 3D object generation in the point-cloud space, where the model learns object representation from point clouds independently. In this work, we propose a 3D spatial transformer network and integrate it into the generation model, improving its ability to extract and reconstruct features of 3D objects. Second, a point-wise approach is developed to reduce the computational complexity of the proposed network. Third, an evaluation system is proposed to measure the performance of our model across various categories and methods; the error, defined as the quantitatively compared difference between synthesized objects and raw objects, is less than 2.8%. Extensive experiments on the benchmark dataset show that this method has a strong ability to generate 3D objects in the point-cloud space, and the synthesized objects differ only slightly from man-made 3D objects.
... As our approach aims at recovering the pulse peak locations, including N-N intervals, this heart rate estimation technique is not enough. Another method uses head movement to extract the heart rate [42], but in contrast to rgbPPG methods, the depth data is used for motion tracking. Our method also uses solely a depth camera, but, in contrast, we utilize the amplitude data to track skin absorption changes. ...
... For these kinds of signals, a strong frequency response at several harmonic positions is characteristic. Due to this property, the energy contained in the main frequency component and its second harmonic is used as a measure for a pulse-wave-like signal [30], [42]. Two derived descriptive values are used: this contained energy is put once into relation to the overall contained peak energies, and once into relation to the rest of the peak energies. ...
Conference Paper
Full-text available
The heart beat is one of the basic vital signs, but the pulse wave can communicate much more information than the beat frequency of the heart. Through heart rate variability (HRV), among other things, stress level and drowsiness can be inferred. Reliable HRV measurements are commonly obtained by electrocardiography (ECG). In this paper we analyse the correlation between HRV and pulse rate variability (PRV) obtained from contact or remote photoplethysmography (PPG) on a Time-of-Flight (ToF) camera from PMD Technologies AG. The ToF camera is independent of passive illumination, as it uses an infrared (IR) light source that is invisible to the human eye. Therefore, potential use cases include driver monitoring or convenient heart rate measurement with smart phones. We demonstrate methods for using a ToF camera for contact and remote PPG. For contact PPG, the sensor is mounted directly on the subject's body. The pulse wave is calculated as the mean amplitude intensity of all pixels. For remote PPG, the sensor is mounted to observe the subject's face. The pulse wave cannot be found in every part of exposed skin; therefore, the part of the skin with the clearest signal is localised and the pulse wave is recovered from there by means of passband filtering and blind source separation. Our results show that contact PPG on a ToF camera has high accuracy and correlates highly with a commercial IR pulse oximetry device, as well as with an ECG-grade chest belt. For remote PPG, our method shows correlation with the ToF contact PPG signal.
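The harmonic-energy measure of a "pulse-wave-like" signal referenced in the excerpts above (energy in the dominant frequency and its second harmonic, relative to the total spectral energy) can be illustrated as follows. The cardiac search band and the harmonic window width are assumptions made for the sketch.

```python
import numpy as np

def pulse_quality(x, fs):
    """Score how pulse-wave-like a signal is: the ratio of spectral
    energy near the dominant frequency and its second harmonic to
    the total spectral energy.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)

    # Dominant component within a plausible cardiac band (0.7-3 Hz).
    band = (freqs >= 0.7) & (freqs <= 3.0)
    f0 = freqs[band][np.argmax(spec[band])]

    def band_energy(fc, width=0.2):
        # Energy within +/- width Hz of the centre frequency fc.
        sel = np.abs(freqs - fc) <= width
        return spec[sel].sum()

    harmonic = band_energy(f0) + band_energy(2 * f0)
    return harmonic / (spec.sum() + 1e-12)
```

A clean pulse-like signal (fundamental plus second harmonic) scores near 1, while broadband noise scores low, which is what makes the measure usable for selecting the skin region with the clearest signal.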
... The Kinect v2 sensor is very robust to body rotation, flips, scale changes, lighting changes, cluttered backgrounds and distortions. The Kinect v2 sensor is commercially available at low cost and with high-level programming interfaces; thus, it is a promising technology for many biomedical imaging applications in both clinical and non-clinical environments [135,137,141,[271][272][273][274][275][276][277][278][279][280][281][282][283]. ...
... Another study by Gambi et al. [282] used the EVM technique to reveal facial skin colour from RGB frames obtained by a Kinect sensor, when only the face region was exposed for analysis. Moreover, Yang et al. [283] recently proposed estimating the cardiac signal and rhythm at different head poses using 3D head motion tracking based on a Kinect depth sensor. The limitation was the difficulty of extracting the cardiac signal when the subject was lying on a bed, as well as some problems caused by an unclear ROI. ...
Thesis
Interest in the remote measurement of physiological signs using camera imaging technologies for clinical and biomedical applications continues to increase rapidly. Physiological signs are vital parameters that indicate the internal state of the human body. Conventionally, these signs are measured using adhesive sensors and electrodes, which may cause discomfort and constrain the patient if used for long periods. These methods may also cause skin damage and infection, especially when applied to premature infants or people with sensitive skin. Thus, this project aims to explore possible remote monitoring systems based on camera imaging technologies to monitor heart rate and respiratory rate from different regions of the human body where cardiorespiratory signals can be detected. The work in this thesis relies on magnification of nearly imperceptible variations resulting from the cardiorespiratory activity of the human body, which include skin colour variations, arterial pulse motion, head motion, and thoracic and abdominal motion. A range of image and signal processing techniques are applied to extract the cardiorespiratory signal and differentiate the various forms of abnormal cardiorespiratory events. The contributions of this thesis address the issue of creating robust remote image-based monitoring systems under various challenges, including different applied regions of interest, the existence of noise and motion artefacts correlated with the signal, monitoring multiple people at a given time (up to six people), monitoring from a long distance (up to 60 metres) and long-term monitoring. The experimental results show promising performance in comparison with the reference measurements from clinical instruments, with close agreement, strong correlation and a low error rate. Therefore, this thesis leads to a new perspective on remote physiological monitoring and remote sensing assessment, showing promising performance in clinical and biomedical applications.
... From the feature point of view, remote HR measurement methods can be divided into color-based and motion-based approaches. Color-based approaches such as [16], [17], [4], [8], [10] and [22] rely on the subtle color changes of facial skin pixels to measure HRs, while motion-based approaches such as [1], [11] and [23] track the motion trajectories of facial pixels to measure HRs. From the learning point of view, remote HR measurement methods can be divided into training-free and learning-based approaches. ...
Preprint
Remote measurement of physiological signals from videos is an emerging topic. The topic draws great interest, but the lack of publicly available benchmark databases and of a fair validation platform is hindering its further development. To address this concern, we organize the first challenge on Remote Physiological Signal Sensing (RePSS), in which two databases, VIPL and OBF, are provided as benchmarks for researchers to evaluate their approaches. The 1st challenge of RePSS focuses on measuring the average heart rate from facial videos, which is the basic problem of remote physiological measurement. This paper presents an overview of the challenge, including data, protocol, analysis of results and discussion. The top-ranked solutions are highlighted to provide insights for researchers, and future directions are outlined for this topic and this challenge.
... Chen et al. [28] directly used the NIR images captured by RealSense 3 and demonstrated the possibility of using NIR images for HR measurement. Instead of using non-color sensors to perform rPPG-based HR estimation, a BCG-based method using depth images was studied in [31]. As with the color-sensor-based methods, all remote HR estimation methods using non-color sensors are still hand-crafted and make certain assumptions. ...
Preprint
Heart rate (HR) is an important physiological signal that reflects the physical and emotional status of a person. Traditional HR measurements usually rely on contact monitors, which may cause inconvenience and discomfort. Recently, some methods have been proposed for remote HR estimation from face videos; however, most of them focus on well-controlled scenarios, and their generalization ability to less-constrained scenarios (e.g., with head movement and bad illumination) is not known. At the same time, the lack of large-scale HR databases has limited the use of deep models for remote HR estimation. In this paper, we propose an end-to-end RhythmNet for remote HR estimation from the face. In RhythmNet, we use a spatial-temporal representation encoding the HR signals from multiple ROI volumes as its input. Then the spatial-temporal representations are fed into a convolutional network for HR estimation. We also take into account the relationship of adjacent HR measurements from a video sequence via a Gated Recurrent Unit (GRU) and achieve efficient HR measurement. In addition, we build a large-scale multi-modal HR database (named VIPL-HR, available at 'http://vipl.ict.ac.cn/view_database.php?id=15'), which contains 2,378 visible light (VIS) videos and 752 near-infrared (NIR) videos of 107 subjects. Our VIPL-HR database contains various variations such as head movements, illumination variations, and acquisition device changes, replicating a less-constrained scenario for HR estimation. The proposed approach outperforms the state-of-the-art methods on both the public-domain and our VIPL-HR databases.
... Most digital cameras use RGB color models, where each sensor captures the intensity of light at the red, green and blue levels of the spectrum [25]. The depth camera is a relatively new device for human motion detection; it generates a depth image in which each pixel encodes the distance to the corresponding point in the scene [26]. It should be noted that low-cost RGB-Depth sensors such as the Kinect can provide both depth maps and color images, which is very conducive to the practice of human motion analysis [27]-[29]. ...
... Non-contacting 3-D position estimation can be widely useful for human motion tracking [1][2][3] and tip tracking of soft robots [4]. 3-D position estimation is also very important for the localization of medical robots or devices moving inside the human body [5][6]. ...
Article
In this letter, a new 3-D electromagnetic position sensing method is proposed for localization of continuum medical robots. An electromagnet and magnetic sensors are placed outside the human body while only a piece of passive mu-metal with high magnetic permeability is attached to the robot moving inside the body, resulting in a wireless non-contacting position estimation system. The mu-metal gets easily magnetized by the electromagnet and thus exerts position-dependent influence on the external magnetic field, which is measured for position estimation using a particle filter. An alternating magnetic field from the electromagnet is used and hence disturbances from nearby ferromagnetic objects can be rejected. The 3-D position estimation system is evaluated on a flexible trans-esophageal robot for ultrasound imaging with motions of insertion and maneuver. Experiments show that the mean position estimation error is about 5 mm and the system is robust in the presence of magnetic disturbances from a ferromagnetic object. This new wireless and robust 3-D position estimation system is demonstrated to have the potential to localize a continuum medical robot, which can enable autonomous navigation of the robot.
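The letter above does not give its motion or measurement models, but the particle-filter estimator it mentions follows the generic predict/weight/resample pattern. The sketch below uses placeholder models passed in as functions; nothing here is the paper's actual field model.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement,
                         predict_fn, likelihood_fn, rng):
    """One generic particle-filter iteration: predict, reweight by
    the measurement likelihood, and resample when degenerate.
    """
    # Predict: propagate each particle through the motion model.
    particles = predict_fn(particles, rng)

    # Update: reweight by measurement likelihood and normalize.
    weights = weights * likelihood_fn(measurement, particles)
    weights /= weights.sum()

    # Systematic resampling when the effective sample size is low.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        pos = (rng.random() + np.arange(len(weights))) / len(weights)
        idx = np.searchsorted(np.cumsum(weights), pos)
        idx = np.minimum(idx, len(weights) - 1)
        particles = particles[idx]
        weights = np.full(len(idx), 1.0 / len(idx))

    # Point estimate: the weighted mean of the particle cloud.
    estimate = np.average(particles, axis=0, weights=weights)
    return particles, weights, estimate
```

With a Gaussian measurement model, repeated steps concentrate the particle cloud around the true position.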
... When the right medicine is given at the right time, it can help prevent heart attacks and reduce the probability of death. Thus, the design of stress-detection and health-monitoring technology that could help people understand their state of mind and body is essential [9]. In recent years, wireless technology has been playing a crucial role in various sectors, including biomedicine, to provide better health care. ...
... However, both ECG and PPG sensors need to be attached to body parts, which may cause discomfort and is inconvenient for long-term monitoring. To counter this issue, the new technology of remote photoplethysmography (rPPG) [1,2,3,4,5,6,7,8,9,10] has been developing fast in recent years; it aims to measure heart activity remotely without any contact. ...
Preprint
Recently, the average heart rate (HR) has been measured relatively accurately from human face videos based on non-contact remote photoplethysmography (rPPG). However, in many healthcare applications, knowing only the average HR is not enough; the measured blood volume pulse signal and its heart rate variability (HRV) features are also important. We propose the first end-to-end rPPG signal recovering system (PhysNet), using deep spatio-temporal convolutional networks, to measure both HR and HRV features. PhysNet extracts the spatial and temporal hidden features simultaneously from raw face sequences while directly outputting the corresponding rPPG signal. The temporal context information helps the network learn more robust features with less fluctuation. Our approach was tested on two datasets and achieved superior performance on HR and HRV features compared with the state-of-the-art methods.
... A natural and basic constraint is that the converted S̃ should facilitate "local processing", that is, it should describe topologically the same graph, which is essential in virtually all GSP applications, such as filter design [8], sampling [15], denoising [16], and classification [17]; otherwise, the conversion will dramatically increase the calculation complexity. To ensure that the converted graph facilitates "local processing", that is, an implementation of an L-th order polynomial filter requires L data exchanges between neighbouring nodes [13], we introduce Definition 2. In fact, the definition of a matrix describing a graph (see details in Definition 2) is not new in spectral graph theory. ...
Preprint
Full-text available
It has recently been shown that, contrary to the wide belief that a shift-enabled condition (necessary for any shift-invariant filter to be representable by a graph shift matrix) can be ignored because any non-shift-enabled matrix can be converted to a shift-enabled matrix, such a conversion in general may not hold for a directed graph with a non-symmetric shift matrix. This paper extends this prior work, focusing on undirected graphs where the shift matrix is generally symmetric. We show that while, in this case, the shift matrix can be converted to satisfy the original shift-enabled condition, the converted matrix is not associated with the original graph; that is, it no longer captures the structure of the graph signal. We show via a counterexample that a non-shift-enabled matrix cannot be converted to a shift-enabled one and still maintain the topological structure of the underlying graph, which is necessary to facilitate localized signal processing.
... From the 'feature' point of view, remote HR measurement methods can be divided into 'color-based approaches' and 'motion-based approaches'. Color-based approaches such as [16], [17], [4], [8], [10] and [22] rely on the subtle color changes of facial skin pixels to measure HRs, while motion-based approaches such as [1], [11] and [23] track the motion trajectories of facial pixels to measure HRs. From the 'learning' point of view, remote HR measurement methods can be divided into 'training-free approaches' and 'learning-based approaches'. ...
... Huang et al. [4] utilized depth information for error concealment in video transmission. Yang et al. [5,6] leveraged depth video to better track and measure patients' chest and abdominal movements and heart rate over time. Shotton et al. [7], Shen et al. [8], and Lopes et al. [9] developed human pose recognition and correction models using depth images. ...
Article
Full-text available
Depth has been a valuable piece of information for perception tasks such as robot grasping, obstacle avoidance, and navigation, which are essential tasks for developing smart homes and smart cities. However, not all applications have the luxury of using depth sensors or multiple cameras to obtain depth information. In this paper, we tackle the problem of estimating the per-pixel depths from a single image. Inspired by the recent works on generative neural network models, we formulate the task of depth estimation as a generative task where we synthesize an image of the depth map from a single Red, Green, and Blue (RGB) input image. We propose a novel generative adversarial network that has an encoder-decoder type generator with residual transposed convolution blocks trained with an adversarial loss. Quantitative and qualitative experimental results demonstrate the effectiveness of our approach over several depth estimation works.
... In 2015, Lam et al. improved blind source separation by randomly choosing pairs of patches from the face to obtain multiple extracted PPG signals and, finally, used a majority voting scheme to robustly recover the HR [7]. Differing from existing methods based on RGB videos, Yang et al. proposed a non-intrusive heart rate estimation system via 3D motion tracking in depth video [17]. In 2018, Prakash [18] used a bounded Kalman filter for motion estimation and feature tracking in remote photoplethysmography methods. ...
Article
With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. As heart rate conveys some of the most important physiological information, researchers have remotely estimated it from facial videos in recent years. Although progress has been made over the past few years, some limitations remain, such as processing time that increases with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that heart rate information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper for remotely estimating the heart rate under realistic conditions by combining spatial and temporal filtering with a convolutional neural network. Our proposed approach shows better performance than the benchmark on the MMSE-HR dataset in terms of both average heart rate estimation and short-time heart rate estimation. High consistency in short-time heart rate estimation is observed between our method and the ground truth.
... Graph signal processing (GSP) extends classical digital signal processing (DSP) to signals on graphs, and provides potential solutions to numerous real-world problems that involve signals defined on topologically complicated domains, such as social networks, point clouds, biological networks, environmental and condition monitoring sensor networks [1], [2], [3], [4], [5]. However, there are several challenges in extending classical DSP to signals on graphs, particularly related to the scope of graph filters. ...
Preprint
In this letter, we consider the implementation problem of distributed graph filters, where each node only has access to the signals of the current and its neighboring nodes. By using Gaussian elimination, we show that as long as the graph is connected, we can implement any graph filter by decomposing the filter into a product of directly implementable filters, filters that only use the signals at the current and neighboring nodes as inputs. We have also included a concrete example as an illustration.
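The decomposition above relies on the fact that an L-th order polynomial graph filter is "directly implementable": applying it needs only L multiplications by the shift matrix, each corresponding to one exchange of values between neighbouring nodes. A minimal sketch, with dense numpy matrices standing in for the distributed exchanges:

```python
import numpy as np

def polynomial_graph_filter(S, h, x):
    """Apply H = sum_l h[l] * S^l to a graph signal x using only
    repeated shifts: each multiplication by S corresponds to one
    exchange of values between neighbouring nodes.
    """
    y = np.zeros_like(x, dtype=float)
    z = x.astype(float)   # z holds S^l x, starting with S^0 x = x
    for coef in h:
        y += coef * z     # accumulate h[l] * S^l x
        z = S @ z         # one neighbour exchange: advance to S^(l+1) x
    return y
```

On the 3-node path graph with filter coefficients h = [1, 0.5], the result matches the direct matrix polynomial I + 0.5 S applied to the signal.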
... However, it is not sufficient just to have H represented as a polynomial of any arbitrary S̃. A natural and basic constraint is that the converted S̃ should keep the same topological structure as the graph's S, which is essential in virtually all GSP applications, such as filter design [6], sampling [7], denoising [8], and classification [9]. In a nutshell, two graph matrices describe the same graph if the conversion from one to another preserves the graph topological structure. ...
Preprint
Full-text available
In a 2013 paper, Sandryhaila and Moura introduced a condition (herein called the shift-enabled condition) under which any shift-invariant filter can be represented by the shift matrix. In the same paper, the authors also argued that the shift-enabled condition can be ignored, since any non-shift-enabled matrix can be converted to a shift-enabled one. In our prior work, we proved that such a conversion in general may not hold for a directed graph with a non-symmetric shift matrix. This letter focuses on undirected graphs, where the shift matrix is generally symmetric. Though the shift matrix can be converted to satisfy the shift-enabled condition, the converted matrix is not associated with the original graph, making the conversion moot. Finally, some potential methods for converting graph shift matrices while preserving the main graph topology are introduced. Note that these methods also do not hold for all matrices, and further research on shift-enabled conditions is needed.
... However, it is not sufficient to have H represented as a polynomial of any arbitrary S̃. A natural and basic constraint is that the converted S̃ should facilitate "local processing", that is, it should describe topologically the same graph, which is essential in virtually all GSP applications, such as filter design [19], sampling [20], denoising [21], and classification [22]; otherwise, the conversion will dramatically increase the calculation complexity. To ensure that the converted graph facilitates "local processing", that is, an implementation of an L-th order polynomial filter requires L data exchanges between neighbouring nodes [17], we introduce Definition 2. In fact, the definition of a matrix describing a graph (see details in Definition 2) is not new in spectral graph theory. ...
Article
Full-text available
With the growing application of undirected graphs for signal/image processing on graphs and distributed machine learning, we demonstrate that the shift-enabled condition is as necessary for undirected graphs as it is for directed graphs. It has recently been shown that, contrary to the widespread belief that a shift-enabled condition (necessary for any shift-invariant filter to be representable by a graph shift matrix) can be ignored because any non-shift-enabled matrix can be converted to a shift-enabled matrix, such a conversion in general may not hold for a directed graph with a non-symmetric shift matrix. This paper extends this prior work, focusing on undirected graphs where the shift matrix is generally symmetric. We show that while, in this case, the shift matrix can be converted to satisfy the original shift-enabled condition, the converted matrix is not associated with the original graph; that is, it no longer captures the structure of the graph signal. We show via examples that a non-shift-enabled matrix cannot be converted to a shift-enabled one and still maintain the topological structure of the underlying graph, which is necessary to facilitate localized signal processing.
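For a symmetric (hence diagonalizable) shift matrix, the minimal polynomial equals the characteristic polynomial exactly when all eigenvalues are distinct, so the shift-enabled condition reduces to an eigenvalue test. A small sketch illustrating the idea (not code from the paper; the tolerance is an assumption):

```python
import numpy as np

def is_shift_enabled_symmetric(S, tol=1e-9):
    """Check the shift-enabled condition for a symmetric shift
    matrix S: symmetric matrices are diagonalizable, so the minimal
    polynomial equals the characteristic polynomial if and only if
    all eigenvalues are distinct.
    """
    eig = np.sort(np.linalg.eigvalsh(S))
    return bool(np.all(np.diff(eig) > tol))
```

The adjacency matrix of the 3-node path graph (eigenvalues -sqrt(2), 0, sqrt(2)) passes the test, while that of the triangle K3 (eigenvalues 2, -1, -1) fails it because of the repeated eigenvalue.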
... 3D motion capture data analytics is a currently emerging research area, with most studies using it as a validation tool rather than an analysis tool. In the last few years, research on analytics has been picking up pace, and 3D data analytics is the most challenging problem [23]. Motion features such as trajectories, velocities and angles between markers are used for the classification of human motion [24]. ...
Article
Full-text available
Machine translation of sign language is a critical task of computer vision. In this work, we propose to use 3D motion capture technology for sign capture and graph matching for sign recognition. Two problems related to 3D sign matching are addressed in this work: (1) how to identify the same signs with different numbers of motion frames, and (2) sign extraction from a clutter of non-sign hand motions. These two problems make 2D or 3D sign language machine translation a challenging task. We propose graph matching with an early estimation model to address these problems in two phases. The first phase consists of intra-graph matching for motion frame extraction, which retains motion-intensive frames in database and query 3D videos. The second phase applies inter-graph matching with the early estimation model on the motion-extracted query and dataset 3D videos. The proposed model increases the speed of the graph matching algorithm by estimating a sign from fewer frames. To test the graph matching model, we recorded 350 words of Indian sign language with 3D motion capture technology. For testing, 4 variations per sign are captured for all signs with 5 different signers at the same, slower and faster hand speeds, and with sign-mixed cluttered hand motions. The early estimation graph matching model is tested for accuracy and efficiency in classifying 3D signs under the two induced real-time constraints. In addition to the 3D sign language dataset, the proposed method is validated on five benchmark datasets and against the state-of-the-art graph matching methods.
... 3D motion capture data analytics is a currently emerging research field, and most studies use the analytics as a validation tool rather than an analysis tool. In the last few years, research on analytics has been picking up pace, and 3D data analytics is the most challenging problem [47]. Motion features such as trajectories, velocities, and angles between markers are used for the classification of human motion [8]. ...
Article
Full-text available
A machine cannot easily understand and interpret three-dimensional (3D) data. In this study, we propose the use of graph matching (GM) to enable 3D motion capture for Indian sign language recognition. The sign classification and recognition problem for interpreting 3D motion signs is considered an adaptive GM (AGM) problem. However, the current models for solving an AGM problem have two major drawbacks. First, spatial matching can be performed only on a fixed set of frames with a fixed number of nodes. Second, temporal matching divides the entire 3D dataset into a fixed number of pyramids. The proposed approach solves these problems by employing interframe GM for spatial matching and multiple intraframe GM for temporal matching. To test the proposed model, a 3D sign language dataset is created that involves 200 continuous sentences in sign language, captured through a motion capture setup with eight cameras. The method is also validated on the 3D motion capture benchmark action datasets HDM05 and CMU. We demonstrate that our approach increases the accuracy of recognizing signs in continuous sentences.
... Another study by Gambi et al. [23] also used an EVM technique to reveal facial skin colour changes from RGB frames obtained by the Kinect sensor, when only the face region was exposed for analysis. Estimating the cardiac signal and rhythm at different head poses using 3D head motion tracking based on a Kinect depth sensor was proposed by Yang et al. [24]. The limitation was the difficulty of extracting the cardiac signal when the subject was lying down on a bed and some other problems caused by unclear ROI. ...
Article
Full-text available
Monitoring of cardiopulmonary activity is a challenge when attempted under adverse conditions, including different sleeping postures, environmental settings, and an unclear region of interest (ROI). This study proposes an efficient remote imaging system based on a Microsoft Kinect v2 sensor for the observation of cardiopulmonary signals and the detection of related abnormal cardiopulmonary events (e.g., tachycardia, bradycardia, tachypnea, bradypnea, and central apnoea) in many possible sleeping postures and varying environmental settings, including total darkness and whether or not the subject is covered by a blanket. The proposed system extracts the signal from the abdominal-thoracic region, where cardiopulmonary activity is most pronounced, using a real-time image sequence captured by the Kinect v2 sensor. The proposed system shows promising results in any sleep posture, regardless of illumination conditions and an unclear ROI, even in the presence of a blanket, whilst being reliable, safe, and cost-effective.
... Low-cost real-time depth sensors integrated into devices like Kinect and new smartphones have contributed to obtaining 3D data in an easy and affordable way [7], [8]. These developments have triggered the design of fully-automatic ...
Article
Full-text available
3D anthropometric measurement extraction is of paramount importance for several applications such as clothing design, online garment shopping, and medical diagnosis, to name a few. State-of-the-art 3D anthropometric measurement extraction methods estimate the measurements either through some landmarks found on the input scan or by fitting a template to the input scan using optimization-based techniques. Finding landmarks is very sensitive to noise and missing data. Template-based methods address this problem, but the employed optimization-based template fitting algorithms are computationally very complex and time-consuming. To address the limitations of existing methods, we propose a deep neural network architecture which fits a template to the input scan and outputs the reconstructed body as well as the corresponding measurements. Unlike existing template-based anthropometric measurement extraction methods, the proposed approach does not need to transfer and refine the measurements from the template to the deformed template, thereby being faster and more accurate. A novel loss function, developed especially for 3D anthropometric measurement extraction, is introduced. Additionally, two large datasets of complete and partial front-facing scans are proposed and used in training. This results in two models, dubbed Anet-complete and Anet-partial, which extract the body measurements from complete and partial front-facing scans, respectively. Experimental results on synthesized data as well as on real 3D scans captured by a photogrammetry-based scanner, an Azure Kinect sensor, and the very recent TrueDepth camera system demonstrate that the proposed approach systematically outperforms the state-of-the-art methods in terms of accuracy and robustness.
... However, it is not sufficient to have H represented as a polynomial of any arbitrary S. A natural and basic constraint is that the converted S should facilitate "local processing", that is, it should describe topologically the same graph, which is essential in virtually all GSP applications, such as filter design [8], sampling [9], denoising [10], and classification [11]; otherwise, the conversion is meaningless. To ensure that the converted graph facilitates "local processing", that is, that an implementation of an L-th order polynomial filter requires L data exchanges between neighbouring nodes [6], we introduce Definition 2. In fact, the definition of a matrix describing a graph (see details in Definition 2) is not new in spectral graph theory. ...
Preprint
Full-text available
It has recently been shown that, contrary to the widespread belief that a shift-enabled condition (necessary for any shift-invariant filter to be representable by a graph shift matrix) can be ignored because any non-shift-enabled matrix can be converted to a shift-enabled matrix, such a conversion in general may not hold for a directed graph with a non-symmetric shift matrix. This paper extends this prior work, focusing on undirected graphs where the shift matrix is generally symmetric. We show that while, in this case, the shift matrix can be converted to satisfy the original shift-enabled condition, the converted matrix is not associated with the original graph; that is, it no longer captures the structure of the graph signal. We show via a counterexample that a non-shift-enabled matrix cannot be converted to a shift-enabled one and still maintain the topological structure of the underlying graph, which is necessary to facilitate localized signal processing.
Article
Full-text available
IoT devices are used in many fields and make users' daily lives more comfortable. These smart sensor devices are used to collect heartbeat data, which is used to assess the health condition of the patient. Communicating the collected information to the doctor, making accurate decisions on the collected data, and notifying the patient are the challenging tasks in IoT. This paper gives a comparative study on health detection and monitoring of the patient.
Article
Quality sleep is a basic human need for well-being, yet sleep deprivation has been a long-term global problem. A common type of sleep deprivation is obstructive sleep apnea, where people repeatedly stop breathing during sleep with subsequent abnormal vital signs, namely, respiration rate and heart rate. While tremendous effort has been made on vital signs monitoring systems during sleep, existing works still lack either portability, in the case of bulky and intrusive systems, or reliability, in the case of consumer-level, nonintrusive systems. To bridge the gap between practicability and accuracy and facilitate the Internet of Things for smart healthcare, in this article, we propose a vital signs estimation system during sleep via a thermal camera. The system first captures thermal image sequences of a sleeping subject and then processes the facial regions within the thermal images for vital signs signal extraction. Specifically, leveraging the inherent graph structure among subregions of the facial area, we propose a graph-based, spatial-temporal signal denoising scheme. Experimental results show that the graph-based denoising scheme in our system effectively reduces the noise level introduced by cameras and subjects, and our proposed system outperforms state-of-the-art nonintrusive vital signs monitoring systems. Since the algorithm components in our system have relatively low time complexity and no model training is required, our system can be deployed efficiently at the edge devices in a smart home setting. The extracted vital signs can then be used for sleep abnormality detection and disease screening.
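The spatial side of such a graph-based denoising scheme can be sketched as quadratic smoothing with a graph Laplacian prior over the facial subregions. The formulation below, solving (I + λL)x = y per time step, is a standard choice and an assumption of mine, not the paper's exact scheme.

```python
import numpy as np

def graph_smooth(X, W, lam=0.5):
    """Smooth a multi-region signal X (regions x time) with a graph
    Laplacian prior: per time step, minimize ||x - y||^2 + lam * x^T L x,
    whose closed-form solution is x = (I + lam * L)^{-1} y."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    A = np.eye(W.shape[0]) + lam * L
    return np.linalg.solve(A, X)            # solves all columns at once

# Hypothetical example: 3 facial subregions on a chain graph
# (region 1 adjacent to regions 0 and 2).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
```

A constant graph signal is left untouched (L annihilates constants), while node-to-node fluctuations are shrunk toward neighbouring values.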
Article
Recent advances in image acquisition and analysis have resulted in disruptive innovation in physical rehabilitation systems, facilitating cost-effective, portable, video-based gait assessment. While these inexpensive motion capture systems, suitable for home rehabilitation, do not generally provide accurate kinematics measurements on their own, image processing algorithms ensure gait analysis that is accurate enough for rehabilitation programs. This paper proposes high-accuracy classification of gait phases and muscle actions, using readings from low-cost motion capture systems. First, 12 gait parameters, drawn from the medical literature, are defined to characterize gait patterns. These proposed parameters are then used as input to our proposed multi-channel time-series classification and gait phase reconstruction methods. The proposed methods fully utilize the temporal information of gait parameters, thus improving the final classification accuracy. The validation, conducted using 126 experiments with 6 healthy volunteers and 9 stroke survivors with manually-labelled gait phases, achieves state-of-the-art classification accuracy of gait phase with lower computational complexity compared to previous solutions.
Article
Heart rate (HR) is an important physiological signal that reflects the physical and emotional status of a person. Traditional HR measurements usually rely on contact monitors, which may cause inconvenience and discomfort. Recently, some methods have been proposed for remote HR estimation from face videos; however, most of them focus on well-controlled scenarios, and their generalization ability to less-constrained scenarios (e.g., with head movement and bad illumination) is not known. At the same time, the lack of large-scale HR databases has limited the use of deep models for remote HR estimation. In this paper, we propose an end-to-end RhythmNet for remote HR estimation from the face. In RhythmNet, we use a spatial-temporal representation encoding the HR signals from multiple ROI volumes as its input. Then the spatial-temporal representations are fed into a convolutional network for HR estimation. We also take into account the relationship of adjacent HR measurements from a video sequence via a Gated Recurrent Unit (GRU) and achieve efficient HR measurement. In addition, we build a large-scale multi-modal HR database (named VIPL-HR), which contains 2,378 visible light (VIS) videos and 752 near-infrared (NIR) videos of 107 subjects. Our VIPL-HR database contains various variations such as head movements, illumination variations, and acquisition device changes, replicating a less-constrained scenario for HR estimation. The proposed approach outperforms the state-of-the-art methods on both the public-domain and our VIPL-HR databases. VIPL-HR is available at: http://vipl.ict.ac.cn/view_database.php?id=15
Article
Existing discriminative learning methods for image denoising use either a single residual learning or a nonresidual learning design. However, we observe that these two schemes perform differently with the same noise level, and yet, there have been no explorations regarding whether residual or nonresidual designs are better suited for denoising. Additionally, many discriminative denoisers are designed to learn a model that corresponds to a fixed noise level, which means that multiple models are required to recover corrupted images with noise at different levels. In this paper, we propose a dynamic dual learning network for blind image denoising, namely, DualBDNet. Instead of modeling a sole task prediction network, the proposed DualBDNet investigates the inherent relations between the residual estimation and the nonresidual estimation. In particular, DualBDNet produces task-dependent feature maps, and each part of the features is devoted to one specific task (residual/nonresidual mapping). To address different noise levels with a single network or even cases where the statistics of noise are unknown, we further introduce an embedded subnetwork into DualBDNet. One output of the subnetwork is the learning of a dynamic compositional attention to highlight the more significant task-dependent feature maps, adaptively coinciding with the extent of corruption. The other output is the learning of a weight used for fusion of the results to ensure an end-to-end manner. Extensive experiments demonstrate that the proposed DualBDNet outperforms the state-of-the-art methods on both synthetic and real noisy images without estimating the noise levels as input.
Article
Vital sign (e.g., respiration rate) monitoring has become increasingly more important because it offers useful clues about medical conditions such as sleep disorders. There is a compelling need for technologies that enable contact-free and easy deployment of vital sign monitoring over an extended period of time for healthcare. In this article, we present a SonarBeat system to leverage a phase-based active sonar to monitor respiration rates with smartphones. We provide a sonar phase analysis and discuss the technical challenges for respiration rate estimation utilizing an inaudible sound signal. Moreover, we design and implement the SonarBeat system, with components including signal generation, data extraction, received signal preprocessing, and breathing rate estimation with Android smartphones. Our extensive experimental results validate the superior performance of SonarBeat in different indoor environment settings.
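The core of a phase-based active sonar such as SonarBeat is arctangent demodulation of the received I/Q pair, since chest displacement modulates the phase of the reflected tone. A minimal sketch of that one step follows; the simulation parameters in the usage note are illustrative, not from the paper.

```python
import numpy as np

def phase_signal(i_ch, q_ch):
    """Recover the displacement-proportional phase from I/Q channels
    via arctangent demodulation, with unwrapping to remove 2*pi jumps."""
    return np.unwrap(np.arctan2(q_ch, i_ch))
```

Given a simulated breathing phase phi(t) = 4·sin(2π·0.2·t) sampled at 50 Hz, feeding cos(phi) and sin(phi) through `phase_signal` reconstructs phi, and the breathing rate can then be read off the dominant periodicity of the recovered phase.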
Article
The aim of this study is to develop an affordable and remote intelligent respiratory monitoring system. To achieve low-cost and remote measurement of the respiratory signal, an RGB camera combined with marker tracking is used as the data acquisition sensor, and a Raspberry Pi is used as the data processing platform. To overcome challenges in actual applications, the signal processing algorithms are designed to remove sudden body movements and smooth the raw signal. Subsequently, the respiratory rate is estimated by a translational cross point algorithm, and the respiratory pattern is identified by a recurrent neural network. For estimating respiratory rate, the translational cross point algorithm performs better than other methods, with an RMSE of 3.29 bpm. With respect to the classification of breathing patterns, the established neural network performs better than SVM-based classifiers, with accuracy, precision, recall, and F1 of 89.0%, 89.0%, 90.5%, and 89.0%, respectively. The obtained decision-making information and some original information are sent to the user's smartphone via a cloud service platform. In a way, due to its low-cost, non-contact, and portable merits, the established system can be seen as a "respiratory consultant" by your side.
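A much-simplified stand-in for the crossing-based rate estimation described above is a plain rising-zero-crossing counter; note this is my illustrative sketch, not the paper's translational cross point algorithm.

```python
import numpy as np

def breathing_rate_bpm(signal, fs):
    """Estimate respiratory rate by counting rising zero crossings of
    the mean-removed signal; each rising crossing marks one breath cycle."""
    x = np.asarray(signal, float) - np.mean(signal)
    rising = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    if len(rising) < 2:
        return 0.0
    cycles = len(rising) - 1
    duration_s = (rising[-1] - rising[0]) / fs
    return 60.0 * cycles / duration_s
```

For a clean 0.25 Hz sinusoid this returns 15 breaths per minute; real signals would first need the detrending and body-movement removal the paper describes.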
Article
In recent years, multi-view learning has emerged as a promising approach for 3D shape recognition, which identifies a 3D shape based on its 2D views taken from different viewpoints. Usually, the correspondences inside a view or across different views encode the spatial arrangement of object parts and the symmetry of the object, which provide useful geometric cues for recognition. However, such view correspondences have not been explicitly and fully exploited in existing work. In this paper, we propose a correspondence-aware representation (CAR) module, which explicitly finds potential intra-view correspondences and cross-view correspondences via k-NN search in semantic space and then aggregates the shape features from the correspondences via learned transforms. Particularly, the spatial relations of correspondences in terms of their viewpoint positions and intra-view locations are taken into account for learning correspondence-aware features. Incorporating the CAR module into a ResNet-18 backbone, we propose an effective deep model called CAR-Net for 3D shape classification and retrieval. Extensive experiments have demonstrated the effectiveness of the CAR module as well as the excellent performance of the CAR-Net.
Chapter
This chapter presents contactless vital signs monitoring through continuous-wave (CW) radar sensors. The chapter is divided into two parts: contactless vital signs monitoring through CW radar sensors and contactless vital signs monitoring through frequency-modulated continuous-wave (FMCW) radar sensors. The theory and signal-processing algorithms of the radar are introduced. The architectures of CW/FMCW radar systems are simple, which facilitates the integration of radar technology into compact devices. Commercial CW/FMCW radar front-end transceivers are widely available. With the advantages of strong environmental adaptability, low power consumption, and penetrability, CW/FMCW radars are promising for contactless vital signs monitoring applications. This chapter introduces their potential applications in cardiopulmonary monitoring, cancer-related medical applications, and indoor human tracking.
Chapter
Vital signs (e.g., the breathing rate) monitoring has become increasingly important because it can offer useful clues to medical conditions such as sleep disorders or anomalies. There is a compelling need for technologies that enable contact-free, easily deployable, and long-term vital signs monitoring for healthcare. In this chapter, we focus on acoustic-based vital signs monitoring for which related acoustic-based sensing techniques with smartphones are introduced. Then, we present the SonarBeat system to exploit a phase-based active sonar for monitoring breathing rates with smartphones. We design and implement the SonarBeat system, with components including signal generation, data extraction, received signal preprocessing, and breathing-rate estimation, with Android smartphones. Our experimental results validate the superior performance of SonarBeat in various indoor environment settings.
Article
Contactless exercise monitoring is a new trend that makes people feel more comfortable and unconstrained. However, the decrease in accuracy of pulse rate measurement caused by large motion artifacts is an urgent problem to be solved. In this paper, we propose a novel approach to monitor step count and improve the accuracy of remote pulse rate measurement based on an image detection method. We designed a chrominance-based adaptive filter and normalization (CADN) method and a domain selection scheme (DSS) to enhance the accuracy of contactless pulse rate measurement during exercise. Various exercises such as biking, stepping, and treadmill running were conducted to evaluate the motion robustness of the proposed CADN+DSS and the accuracy of step counts. The results reveal that the detection rates of the proposed step count method are 99.57% and 99.77% for stepping and treadmill exercise, respectively. The pulse rate accuracy is compared with two state-of-the-art algorithms: chrominance (CHROM) and chrominance-based adaptive filter (CAD). The results show the proposed CADN+DSS method provides a lower discrepancy between the detected pulse rate and a ground-truth device (Polar H7) for all activities. We expand the scope of contactless measurement for physical activity detection and develop an unfettered step count and pulse rate measurement method for exercise. Therefore, step count and pulse rate can be measured synchronously without relying on any contact sensors.
Article
Full-text available
Inverse imaging problems are inherently under-determined, and hence, it is important to employ appropriate image priors for regularization. One recent popular prior—the graph Laplacian regularizer—assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the graph Laplacian regularizer on the original inverse problem are not well understood. To address this problem, in this paper, we interpret neighborhood graphs of pixel patches as discrete counterparts of Riemannian manifolds and perform analysis in the continuous domain, providing insights into several fundamental aspects of graph Laplacian regularization for image denoising. Specifically, we first show the convergence of the graph Laplacian regularizer to a continuous-domain functional, integrating a norm measured in a locally adaptive metric space. Focusing on image denoising, we derive an optimal metric space assuming non-local self-similarity of pixel patches, leading to an optimal graph Laplacian regularizer for denoising in the discrete domain. We then interpret graph Laplacian regularization as an anisotropic diffusion scheme to explain its behavior during iterations, e.g., its tendency to promote piecewise smooth signals under certain settings. To verify our analysis, an iterative image denoising algorithm is developed. Experimental results show that our algorithm performs competitively with state-of-the-art denoising methods, such as BM3D for natural images, and outperforms them significantly for piecewise smooth images.
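The graph Laplacian regularizer itself is easy to state in code. The sketch below denoises a small patch by solving min_x ||x - y||² + λ·x^T L x on a 4-neighbour pixel graph with Gaussian photometric edge weights; these are common default choices of mine, not the optimal metric-space weights the paper derives.

```python
import numpy as np

def patch_graph_denoise(patch, sigma=0.3, lam=0.4):
    """Graph Laplacian regularized denoising of a pixel patch.
    Edges join 4-neighbours with weights w_ij = exp(-(y_i - y_j)^2 / sigma^2);
    the closed-form minimizer is x = (I + lam * L)^{-1} y."""
    h, w = patch.shape
    y = patch.ravel()
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    W[i, j] = W[j, i] = np.exp(-((y[i] - y[j]) ** 2) / sigma ** 2)
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(n) + lam * L, y).reshape(h, w)
```

A flat patch is a fixed point of this filter, and for small additive noise the output moves closer to the underlying flat patch, which is the smoothness behaviour the regularizer promotes.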
Article
Full-text available
Our society will face a notable demographic shift in the near future. According to a United Nations report, the ratio of the elderly population (aged 60 years or older) to the overall population increased from 9.2% in 1990 to 11.7% in 2013 and is expected to reach 21.1% by 2050 [1]. According to the same report, 40% of older people live independently in their own homes. This ratio is about 75% in the developed countries. These facts will result in many societal challenges as well as changes in the health-care system, such as an increase in diseases and health-care costs, a shortage of caregivers, and a rise in the number of individuals unable to live independently [2]. Thus, it is imperative to develop ambient intelligence-based assisted living (AL) tools that help elderly people live independently in their homes. The recent developments in sensor technology and decreasing sensor costs have made the deployment of various sensors in various combinations viable, including static setups as well as wearable sensors. This article presents a survey that concentrates on the signal processing methods employed with different types of sensors. The types of sensors covered are pyro-electric infrared (PIR) and vibration sensors, accelerometers, cameras, depth sensors, and microphones.
Article
Full-text available
Background The cardiac parameters, such as heart rate (HR) and heart rate variability (HRV), are very important physiological data for daily healthcare. Recently, camera-based photoplethysmography techniques have been proposed for HR measurement. These techniques allow us to estimate the HR contactlessly with a low-cost camera. However, previous works showed limited success in estimating HRV because the R–R intervals, the primary data for HRV calculation, are sensitive to noise and artifacts. Methods This paper proposes a non-contact method to extract the blood volume pulse signal using a chrominance-based method followed by a proposed CWT-based denoising technique. The R–R intervals can then be obtained by finding the peaks in the denoised signal. In this paper, we recorded 12 video clips using the frontal camera of a smartphone under different scenarios to make comparisons between our method and the other alternatives using the absolute errors between the estimated HRV metrics and the ones obtained by an ECG-accurate chest band. Results As shown in the experiments, our algorithm can greatly reduce the absolute errors of HRV metrics compared with related works using RGB color signals. The mean absolute error of the HRV metrics from our method is only 3.53 ms for the static-subject video clips. Conclusions The proposed camera-based method is able to produce reliable HRV metrics which are close to the ones measured by contact devices under different conditions. Thus, our method can be used for remote health monitoring in a convenient and comfortable way.
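Once R–R intervals are extracted from the denoised pulse signal, standard time-domain HRV metrics reduce to short formulas. SDNN and RMSSD below follow the usual definitions; the exact metric set compared in the paper may differ.

```python
import numpy as np

def hrv_metrics(rr_ms):
    """Time-domain HRV from R-R intervals in milliseconds:
    SDNN  = standard deviation of the intervals,
    RMSSD = root mean square of successive interval differences."""
    rr = np.asarray(rr_ms, float)
    sdnn = np.std(rr)
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))
    return sdnn, rmssd
```

For intervals [790, 810, 790, 810] ms this gives SDNN = 10 ms and RMSSD = 20 ms.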
Conference Paper
Full-text available
Heart rate is an important indicator of people's physiological state. Recently, several papers reported methods to measure heart rate remotely from face videos. Those methods work well on stationary subjects under well controlled conditions, but their performance significantly degrades if the videos are recorded under more challenging conditions, specifically when subjects' motions and illumination variations are involved. We propose a framework which utilizes face tracking and Normalized Least Mean Square adaptive filtering methods to counter their influences. We test our framework on a large difficult and public database MAHNOB-HCI and demonstrate that our method substantially outperforms all previous methods. We also use our method for long term heart rate monitoring in a game evaluation scenario and achieve promising results.
Article
Full-text available
In recent years, the videogame industry has been characterized by a great boost in gesture recognition and motion tracking, following the increasing request of creating immersive game experiences. The Microsoft Kinect sensor allows acquiring RGB, IR and depth images with a high frame rate. Because of the complementary nature of the information provided, it has proved an attractive resource for researchers with very different backgrounds. In summer 2014, Microsoft launched a new generation of Kinect on the market, based on time-of-flight technology. This paper proposes a calibration of Kinect for Xbox One imaging sensors, focusing on the depth camera. The mathematical model that describes the error committed by the sensor as a function of the distance between the sensor itself and the object has been estimated. All the analyses presented here have been conducted for both generations of Kinect, in order to quantify the improvements that characterize every single imaging sensor. Experimental results show that the quality of the delivered model improved applying the proposed calibration procedure, which is applicable to both point clouds and the mesh model created with the Microsoft Fusion Libraries.
Conference Paper
Full-text available
Quality of sleep greatly affects a person's physiological well-being. Traditional sleep monitoring systems are expensive in cost and intrusive enough that they disturb the natural sleep of clinical patients. In our previous work, we proposed a non-intrusive sleep monitoring system to first record depth video in real-time, then offline analyze the recorded depth data to track a patient's chest and abdomen movements over time. Detection of abnormal breathing is then interpreted as episodes of apnoea or hypopnoea. Leveraging recent advances in graph signal processing (GSP), in this paper we propose two new additions to further improve our sleep monitoring system. First, temporal denoising is performed using a block motion vector smoothness prior expressed in the graph-signal domain, so that unwanted temporal flickering can be removed. Second, a graph-based event classification scheme is proposed, so that detection of apnoea / hypopnoea can be performed accurately and robustly. Experimental results show first that the graph-based temporal denoising scheme outperforms an implementation of a temporal median filter in terms of flicker removal. Second, we show that our graph-based event classification scheme is noticeably more robust to errors in training data than two conventional implementations of the support vector machine (SVM).
Conference Paper
Full-text available
Depth sensors like Microsoft Kinect can acquire partial geometric information in a 3D scene via captured depth images, with potential application to non-contact health monitoring. However, captured depth videos typically suffer from low bit-depth representation and acquisition noise corruption, and hence using them to deduce health metrics that require tracking subtle 3D structural details is difficult. In this paper, we propose to capture depth video using Kinect 2.0 to estimate the heart rate of a human subject; as blood is pumped to circulate through the head, tiny oscillatory head motion can be detected for periodicity analysis. Specifically, we first perform a joint bit-depth enhancement / denoising procedure to improve the quality of the captured depth images, using a graph-signal smoothness prior for regularization. We then track an automatically detected nose region throughout the depth video to deduce 3D motion vectors. The deduced 3D vectors are then analyzed via principal component analysis to estimate heart rate. Experimental results show improved tracking accuracy using our proposed joint bit-depth enhancement / denoising procedure, and estimated heart rates are close to ground truth.
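The tail of this pipeline, projecting the tracked 3D motion vectors onto their principal component and reading the heart rate off the power spectrum, can be sketched as follows. The journal version of the work uses Welch power spectrum analysis as below, but the band limits and window length here are illustrative assumptions of mine.

```python
import numpy as np
from scipy.signal import welch

def heart_rate_from_motion(V, fs):
    """Heart rate (bpm) from per-frame 3D head-motion vectors V (frames x 3):
    project onto the first principal component, then take the dominant
    Welch-spectrum peak inside a cardiac band (0.8-3 Hz, i.e. 48-180 bpm)."""
    Vc = V - V.mean(axis=0)
    _, _, Vt = np.linalg.svd(Vc, full_matrices=False)   # PCA via SVD
    s = Vc @ Vt[0]                                      # 1D principal projection
    f, P = welch(s, fs=fs, nperseg=min(256, len(s)))
    band = (f >= 0.8) & (f <= 3.0)
    return 60.0 * f[band][np.argmax(P[band])]
```

On a synthetic 1.2 Hz oscillation along a fixed 3D axis, this recovers roughly 72 bpm, limited by the spectral resolution of the chosen window length.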
Conference Paper
Full-text available
Quality of sleep greatly affects a person's physiological well-being. Traditional sleep monitoring systems are expensive in cost and intrusive enough that they disturb the natural sleep of clinical patients. In this paper, we propose an inexpensive non-intrusive sleep monitoring system using recorded depth video only. In particular, we propose a two-part solution composed of depth video compression and analysis. For acquisition and compression, we first propose an alternating-frame video recording scheme, so that a different 8 of the 11 bits in MS Kinect-captured depth images are extracted at different instants for efficient encoding using the H.264 video codec. At the decoder, the uncoded 3 bits in each frame can be recovered accurately via a block-based search procedure. For analysis, we estimate parameters of our proposed dual-ellipse model in each depth image. Sleep events are then detected via a support vector machine trained on statistics of the estimated ellipse model parameters over time. Experimental results show first that our depth video compression scheme outperforms a competing scheme that records only the eight most significant bits, in PSNR, in mid- to high-bitrate regions. Further, we show that our monitoring system can detect critical sleep events such as hypopnoea using our trained SVM with a very high success rate.
Article
Full-text available
Image denoising is the most basic inverse imaging problem. As an under-determined problem, appropriate definition of image priors to regularize the problem is crucial. Among recent proposed priors for image denoising are: i) graph Laplacian regularizer where a given pixel patch is assumed to be smooth in the graph-signal domain; and ii) self-similarity prior where image patches are assumed to recur throughout a natural image in non-local spatial regions. In our first contribution, we demonstrate that the graph Laplacian regularizer converges to a continuous time functional counterpart, and careful selection of its features can lead to a discriminant signal prior. In our second contribution, we redefine patch self-similarity in terms of patch gradients and argue that the new definition results in a more accurate estimate of the graph Laplacian matrix, and thus better image denoising performance. Experiments show that our designed algorithm based on graph Laplacian regularizer and gradient-based self-similarity can outperform non-local means (NLM) denoising by up to 1.4 dB in PSNR.
Article
Full-text available
Sparse coding methods have achieved great success in visual tracking. We present a strong classifier built on structural local sparse descriptors for robust visual tracking. Since summary features computed from the sparse codes are sensitive to occlusion and other interfering factors, we extract local sparse descriptors from a fraction of all patches by performing a pooling operation. The collection of local sparse descriptors is combined into a boosting-based strong classifier for robust visual tracking using a discriminative appearance model. Furthermore, a weight computation method based on the structural reconstruction error is proposed to adjust the classification score of each candidate for more precise tracking results. To handle appearance changes during tracking, we present an occlusion-aware template update scheme. Comprehensive experimental comparisons with state-of-the-art algorithms demonstrate the better performance of the proposed method.
Article
Full-text available
Thermal infrared imaging has been proposed as a potential system for the computational assessment of human autonomic nervous activity and psychophysiological states in a contactless and noninvasive way. Through bioheat modeling of facial thermal imagery, several vital signs can be extracted, including localized blood perfusion, cardiac pulse, breath rate, and sudomotor response, since all these parameters impact the cutaneous temperature. The obtained physiological information could then be used to draw inferences about a variety of psychophysiological or affective states, as proved by the increasing number of psychophysiological studies using thermal infrared imaging. This paper therefore presents a review of the principal achievements of thermal infrared imaging in computational physiology with regard to its capability of monitoring psychophysiological activity.
Conference Paper
Full-text available
Image denoising is an under-determined problem, and hence it is important to define appropriate image priors for regularization. One recently popular prior is the graph Laplacian regularizer, where a given pixel patch is assumed to be smooth in the graph-signal domain. The strength and direction of the resulting graph-based filter are computed from the graph's edge weights. In this paper, we derive the optimal edge weights for local graph-based filtering using gradient estimates from non-local pixel patches that are self-similar. To analyze the effects of the gradient estimates on the graph Laplacian regularizer, we first show theoretically that, given a graph-signal h_D that is a set of discrete samples of a continuous function h(x, y) in a closed region Ω, the graph Laplacian regularizer h_D^T L h_D converges to a continuous functional S_Ω integrating the gradient norm of h in a metric space G, i.e., (∇h)^T G^{-1} (∇h), over Ω. We then derive the optimal metric space G: one that leads to a graph Laplacian regularizer that is discriminant when the gradient estimates are accurate, and robust when the gradient estimates are noisy. Finally, having derived G, we compute the corresponding edge weights to define the Laplacian L used for filtering. Experimental results show that our image denoising algorithm using the per-patch optimal metric space G outperforms non-local means (NLM) by up to 1.5 dB in PSNR.
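A small numeric sketch of the edge-weight idea: weights computed from gradient estimates make the quadratic form x^T L x = sum over edges (i,j) of w_ij (x_i - x_j)^2 discriminant, penalizing a jump across a strong (similar-gradient) edge more than a jump aligned with a genuine gradient discontinuity. The Gaussian weighting and the toy values below are illustrative assumptions, not the paper's optimal metric G.

```python
import math

def edge_weight(gi, gj, sigma=1.0):
    """Gaussian edge weight from per-pixel gradient estimates
    (illustrative; the paper derives weights from an optimal metric G)."""
    return math.exp(-((gi - gj) ** 2) / (2 * sigma ** 2))

def laplacian_quadratic(x, edges, w):
    """Evaluate x^T L x directly as sum_{(i,j)} w_ij (x_i - x_j)^2,
    the discrete smoothness measure the regularizer is built on."""
    return sum(w[k] * (x[i] - x[j]) ** 2 for k, (i, j) in enumerate(edges))

# 4-node path graph with weights from toy gradient estimates
grad = [0.1, 0.2, 0.9, 1.0]
edges = [(0, 1), (1, 2), (2, 3)]
w = [edge_weight(grad[i], grad[j]) for i, j in edges]

smooth = [1.0, 1.0, 2.0, 2.0]   # jump aligned with the large gradient gap
rough  = [1.0, 2.0, 1.0, 2.0]   # jumps across strong (similar-gradient) edges
print(laplacian_quadratic(smooth, edges, w) <
      laplacian_quadratic(rough, edges, w))   # True
```

The weak edge between nodes 1 and 2 (where the gradient estimates differ most) lets a signal discontinuity pass cheaply, which is the discriminant behavior the paper designs for.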
Conference Paper
Full-text available
RGB-D cameras, also known as range imaging cameras, are a recent generation of sensors. As they are suitable for measuring distances to objects at a high frame rate, such sensors are increasingly used for 3D acquisition, and more generally for applications in robotics and computer vision. This kind of sensor became popular especially after the Kinect v1 (Microsoft) arrived on the market in November 2010. In July 2014, Microsoft released a new sensor, the Kinect for Windows v2, based on a different technology than the first device. However, because of its initial development for video games, the quality assessment of this new device for 3D modelling represents a major investigation axis. In this paper, first experiences with the Kinect v2 sensor are reported, and its ability for close-range 3D modelling is investigated. For this purpose, error sources in the output data as well as a calibration approach are presented.
Article
Full-text available
Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often insufficient or biased toward certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, which makes comparisons among reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus the quantitative results reported in the literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce sequence attributes for performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
Conference Paper
Full-text available
Heart rate (HR) and respiration rate (RR) are considered among the most useful biomedical signals for evaluating a subject's health condition. HR and RR are routinely monitored in hospitalized patients, and any variations of these quantities need to be measured and reported. Today HR and RR are measured with standard methods: electrocardiography (ECG) and spirometry (SP). Both of these methods require contact with the subject and the presence of expert personnel to be operated correctly. Consequently, their use is limited to hospital or ambulatory environments, and they are rarely found in domestic settings. In this paper we present a novel method for contactless measurement of HR and RR. The proposed method is realized by means of a Kinect™ device (KD), a widely diffused multi-sensor device based on a depth sensor, a camera sensor and 4 microphones. In our work it is used in conjunction with a dedicated processing algorithm to calculate the HR and RR values. To measure HR and RR, 10 healthy subjects were observed with the proposed method and with reference methods (ECG and SP). Tests show that the standard deviations of the residuals (differences between the ECG or SP data and the corresponding KD measurements) are 6% and 9.7% for HR and RR, respectively. Therefore the proposed measurement method, based on the KD, could be used for home monitoring of HR and RR in healthy subjects without the presence of experts or clinicians.
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
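The orientation-binning step at the heart of HOG can be sketched for a single cell as follows; full HOG additionally groups cells into overlapping blocks with local contrast normalization, which this toy version omits.

```python
import math

def hog_cell(img):
    """Magnitude-weighted histogram of gradient orientations (9 unsigned
    bins over 0..180 degrees) for one cell, the core HOG building block.
    `img` is a 2D list of intensities; borders are skipped for simplicity."""
    h, w = len(img), len(img[0])
    hist = [0.0] * 9
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang // 20) % 9] += mag
    return hist

# vertical step edge -> all gradient energy falls in the horizontal bin (bin 0)
cell = [[0, 0, 10, 10]] * 4
h = hog_cell(cell)
print(h.index(max(h)))  # 0
```

Fine orientation binning with magnitude weighting is exactly the design choice the paper finds important for detection performance.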
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Conference Paper
Conventional video tracking operates on RGB or grey-level data, which contain significant clues for the identification of targets. While this is often desirable in a video surveillance context, the use of video tracking in privacy-sensitive environments such as hospitals and care facilities is often perceived as intrusive. Therefore, in this work we present a tracker that provides effective target tracking based solely on depth data. The proposed tracker is an extension of the popular Struck algorithm, which leverages a structural SVM framework for tracking. The main contributions of this work are novel depth features based on local depth patterns and a heuristic for effectively handling occlusions. Experimental results over the challenging Princeton Tracking Benchmark (PTB) dataset report a remarkable accuracy compared to the original Struck tracker and other state-of-the-art trackers using depth and RGB data.
Article
Obstructive sleep apnea, characterized by repetitive obstruction of the upper airway during sleep, is a common sleep disorder that can significantly compromise sleep quality and quality of life in general. The obstructive respiratory events can be detected by attended in-laboratory or unattended ambulatory sleep studies. Such studies require many attachments to a patient's body to track respiratory and physiological changes, which can be uncomfortable and compromise the patient's sleep quality. In this paper, we propose to record depth video and audio of a patient using a Microsoft Kinect camera during his/her sleep, and to extract relevant features to correlate with obstructive respiratory events scored manually by a scientific officer based on data collected by the Philips Alice6 LDxS system commonly used in sleep clinics. Specifically, we first propose an alternating-frame H.264 video encoding scheme and a bit recovery scheme at the decoder. Next, we perform depth video temporal denoising using a motion vector graph smoothness prior. Then, we build a dual-ellipse model and track a patient's chest and abdominal movements in the denoised videos. Finally, we extract features from both depth video and audio for classifier training and respiratory event detection. Experimental results show 1) that our depth video compression scheme outperforms a competitor that records only the 8 most significant bits, 2) that our graph-based temporal denoising scheme reduces the flickering effect without over-smoothing, and 3) that our trained classifiers can deduce respiratory events scored manually based on data collected by the Alice6 LDxS system with high accuracy.
Article
Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors (a Laplacian prior for DCT coefficients, and sparsity and graph-signal smoothness priors for image patches) to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal noticeably outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations.
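The difference between hard and soft decoding can be shown with a one-coefficient sketch: soft decoding may place the coefficient anywhere inside the indexed quantization bin, guided by a prior, instead of at the bin center. The prior estimate below is just a given number; in the paper it would come from the Laplacian, sparsity, and graph-smoothness priors.

```python
def hard_decode(k, q):
    """Hard decoding: reconstruct a DCT coefficient at the center of its
    indexed quantization bin."""
    return k * q

def soft_decode(k, q, prior_estimate):
    """Soft decoding sketch: keep any coefficient value consistent with the
    transmitted bin index, i.e. clamp a prior-driven estimate to the bin
    [k*q - q/2, k*q + q/2]."""
    lo, hi = k * q - q / 2.0, k * q + q / 2.0
    return min(max(prior_estimate, lo), hi)

q = 16.0
k = 3                           # transmitted bin index -> bin [40, 56]
print(hard_decode(k, q))        # 48.0
print(soft_decode(k, q, 52.7))  # 52.7 (inside the bin, kept)
print(soft_decode(k, q, 70.0))  # 56.0 (clamped to the bin edge)
```

The bin boundaries thus act as hard constraints, while the prior chooses the point inside them; this is the structure the paper's three-prior optimization exploits.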
Conference Paper
Video cameras are increasingly being used to measure human heart rates non-invasively, without contact. Such systems find applications in tele-medicine and remote monitoring of quarantined patients and premature neonates, and are also useful for health-conscious consumers. Several algorithms have been reported in the literature for measuring the heart rate from videos of human subjects. These algorithms use offline, computationally involved techniques such as Independent Component Analysis (ICA) or Principal Component Analysis (PCA), which render them infeasible for implementation on real-time embedded systems. We conducted experiments to find an optimal colorspace for measuring the heart rate. We subsequently used a novel means of optical filtering of this colorspace to develop an accurate real-time algorithm, without the need for ICA or PCA, using ordinary standard-definition (SD) web cameras. In this paper we present our algorithm and compare it with existing state-of-the-art algorithms.
Conference Paper
Real-time non-contact heart rate detection is performed using different methods for detecting a periodic signal in noise. The heart rate is estimated from a scalar signal formed from the average color values of the observed skin, first from previously recorded videos and then from video captured in real time with a built-in laptop webcam. Cramer-Rao lower bounds are calculated as a function of SNR for an assumed parametric model, and the heart rate is detected via autocorrelation, maximum likelihood and Fourier-based methods. The performance is evaluated by comparing the error rates of the different detection techniques in each case.
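The autocorrelation-based detector compared in this work can be sketched as follows on a synthetic noisy pulse signal; the sampling rate, the physiological 40-180 bpm search range, and the noise model are illustrative assumptions.

```python
import math
import random

def estimate_heart_rate(sig, fs, bpm_lo=40, bpm_hi=180):
    """Estimate heart rate by locating the autocorrelation peak within a
    physiologically plausible lag range (assumed 40-180 bpm)."""
    n = len(sig)
    mean = sum(sig) / n
    x = [v - mean for v in sig]
    def acf(lag):
        return sum(x[i] * x[i + lag] for i in range(n - lag))
    lag_min = int(fs * 60.0 / bpm_hi)   # smallest plausible beat period
    lag_max = int(fs * 60.0 / bpm_lo)   # largest plausible beat period
    best = max(range(lag_min, lag_max + 1), key=acf)
    return 60.0 * fs / best

random.seed(0)
fs = 30.0                   # frames per second
f_heart = 1.2               # true pulse frequency: 72 bpm
sig = [math.sin(2 * math.pi * f_heart * t / fs) + random.gauss(0, 0.2)
       for t in range(300)]
bpm = estimate_heart_rate(sig, fs)
```

Restricting the lag search to the plausible beat-period range is what keeps the simple autocorrelation peak from locking onto noise or harmonics.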
Article
The heart's pulsation sends blood throughout the body. The rate at which the heart performs this vital task, the heartbeat rate, is of crucial importance to the body. Therefore, measuring the heartbeat rate, a.k.a. pulse detection, is very important in many applications, especially medical ones. To measure it, physicians traditionally either sense the pulsations of some blood vessels or install sensors on the body. In either case, physical contact between the sensor and the body is needed to obtain the heartbeat rate. This might not always be feasible, for example in applications like remote patient monitoring. In such cases, contactless sensors, mostly based on computer vision techniques, are emerging as interesting alternatives. This paper proposes such a system, in which the heartbeats (pulses) are detected from the subtle motions that appear on the face due to blood circulation. The proposed system has been tested under different facial expressions. The experimental results show that the proposed system is accurate and robust and outperforms the state of the art.
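Once a pulse-related motion signal has been extracted, individual beats can be located with a simple peak detector; the following is a minimal sketch (the mean threshold and the 0.4 s refractory period are assumed values, not this paper's exact detector).

```python
import math

def detect_beats(sig, fs, refractory_s=0.4):
    """Locate pulse peaks as local maxima above the signal mean, enforcing
    a refractory period so one heartbeat yields one peak.
    Returns peak times in seconds."""
    mean = sum(sig) / len(sig)
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i in range(1, len(sig) - 1):
        if (sig[i] > mean and sig[i] >= sig[i - 1] and sig[i] > sig[i + 1]
                and i - last >= refractory):
            peaks.append(i)
            last = i
    return [p / fs for p in peaks]

fs = 50.0
# synthetic 60 bpm pulse signal, 5 seconds long
sig = [math.cos(2 * math.pi * 1.0 * t / fs) for t in range(250)]
beats = detect_beats(sig, fs)
ibis = [b - a for a, b in zip(beats, beats[1:])]  # inter-beat intervals (s)
```

The inter-beat intervals are exactly what a rhythm analysis needs: their mean gives the rate, and their variability characterizes the rhythm.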
Article
Block-based image or video coding standards (e.g. JPEG) compress an image lossily by quantizing transform coefficients of non-overlapping pixel blocks. If the chosen quantization parameters (QP) are large, then hard decoding of a compressed image—using indexed quantization bin centers as reconstructed transform coefficients—can lead to unpleasant blocking artifacts. Leveraging recent advances in graph signal processing (GSP), we propose a dequantization scheme specifically for piecewise smooth (PWS) images: images with sharp object boundaries and smooth interior surfaces. We first mathematically define a PWS image as a low-frequency signal with respect to an inter-pixel similarity graph with edges of weights 1 or 0. Using quantization bin boundaries as constraints, we then jointly optimize the desired graph-signal and the similarity graph in a unified framework. A generalization to consider generalized piecewise smooth (GPWS) images—where sharp object boundaries are replaced by transition regions—is also proposed. Experimental results show that our proposed scheme outperforms a state-of-the-art dequantization method by 1 dB on average in PSNR.
Conference Paper
Non-contact image photoplethysmography has gained a lot of attention during the last 5 years. Starting with the work of Verkruysse et al. [1], various methods for estimating the human pulse rate from video sequences of the face under ambient illumination have been presented. Applied on a mobile service robot aimed at motivating elderly users to do physical exercises, the pulse rate can be valuable information for adapting to the user's condition. For this paper, a typical processing pipeline was implemented on a mobile robot, and a detailed comparison of methods for face segmentation was conducted, which is the key factor for robust pulse rate extraction even if the subject is moving. A benchmark data set is introduced, focusing on the amount of head motion during the measurement.
Article
Contact measurements of the cardiac pulse using conventional electrocardiogram equipment require patients to wear adhesive gel patches or chest straps that can cause skin irritation and discomfort. Commercially available pulse oximetry sensors that attach to the fingertips or earlobes also inconvenience patients, and the spring-loaded clips can be painful to use. Therefore, a novel robust non-contact technique is developed for the evaluation of heart rate variation. Because heartbeats change the blood volume in the facial blood vessels, and hemoglobin absorptivity varies across the visible light spectrum, the reflectance strength of the face varies periodically; a reflectance signal is therefore extracted from consecutive frames of the green channel of the facial region. Furthermore, ensemble empirical mode decomposition of the Hilbert-Huang transform (HHT) is used to acquire the primary heart rate signal while reducing the effect of ambient light changes. The effective instantaneous frequencies of the intrinsic mode functions decomposed by HHT are fed into a multiple linear regression model to evaluate heart rate, with the frequencies rectified by a maximum likelihood method assuming a Poisson distribution; the minimum elapsed time for heart rate evaluation is also assessed in the estimation process. Experimental results show that our proposed approach provides a convenient non-contact method to evaluate heart rate and outperforms the current state-of-the-art method with higher accuracy and smaller variance.
Article
This paper proposes a remote photoplethysmography measurement technique in which human skin color variations are analysed to observe human vital signs, including but not limited to the average heart rate and its variation. Remote monitoring of vital signs could be useful for non-contact physiological and psychological diagnosis. For this purpose, an off-the-shelf non-invasive video camera is used. Facial appearance modelling is performed to stabilize color variations in the selected facial region during the signal acquisition stage. The proposed method offers a novel signal processing approach for extracting the periodic component of the raw color signal for heart rate and variation estimation. To this end, we have collected a ground-truth dataset using a PPG instrument attached to the skin of the subject under observation. Objective performance tests show strong correlation with the ground-truth values for the estimated heart rate and variation.
Article
Coverage and accuracy of unobtrusively measured biosignals are generally relatively low compared to clinical modalities. This can be improved by exploiting redundancies in multiple channels with methods of sensor fusion. In this paper, we demonstrate that two modalities, skin color variation and head motion, can be extracted from the video stream recorded with a webcam. Using a Bayesian approach, these signals are fused with a ballistocardiographic signal obtained from the seat of a chair with a mean absolute beat-to-beat estimation error below 25 milliseconds and an average coverage above 90% compared to an ECG reference.
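The Bayesian fusion of redundant channels can be sketched, under a simplifying independent-Gaussian assumption, as inverse-variance weighting: the fused estimate has lower variance than any single channel. This is a textbook simplification, not the paper's exact model.

```python
def fuse_estimates(values, variances):
    """Inverse-variance (Gaussian Bayesian) fusion of redundant channel
    estimates, e.g. beat intervals from skin color, head motion and a
    ballistocardiographic channel. Returns the fused value and variance."""
    weights = [1.0 / v for v in variances]
    fused = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    fused_var = 1.0 / sum(weights)   # always below the smallest input variance
    return fused, fused_var

# hypothetical beat-to-beat interval estimates (seconds) from three channels
vals = [0.82, 0.86, 0.80]
vars_ = [0.002, 0.004, 0.001]
est, var = fuse_estimates(vals, vars_)
```

The fused value is pulled toward the most reliable channel (smallest variance), which is how redundancy across modalities improves both coverage and accuracy.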