Conference Paper

Abstract

Automatic radar-based classification of automotive targets, such as pedestrians and cyclists, poses several challenges due to low inter-class variation between the different classes and large intra-class variation. Furthermore, the different targets that need to be tracked in a typical automotive scenario can have completely different dynamics, which is challenging for a tracker using conventional state vectors. In contrast to the state of the art, which performs classification and tracking independently, in this paper we propose an integrated tracker and classifier within a novel Bayesian framework. The tracker's state vector in the proposed framework not only includes the localization parameters of the targets but is also augmented with the targets' feature embedding vector. As a consequence, the tracker's performance is improved due to better separability of the targets, and the classifier's performance is enhanced because the Bayesian formulation temporally smooths the classifier's embedding vector.
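For illustration only, the sketch below shows one way an embedding-augmented state could be filtered with a linear Kalman model: a constant-velocity kinematic block plus a random-walk block for the classifier embedding, so the update acts as a temporal smoother of the per-frame embeddings. All dimensions, noise values, and the measurement model are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np

D = 8        # embedding dimension (illustrative)
dt = 0.05    # frame interval in seconds (assumed)

# State: [x, y, vx, vy, e_1..e_D]; kinematics = constant velocity, embedding = random walk
F_kin = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]])
F = np.block([[F_kin, np.zeros((4, D))],
              [np.zeros((D, 4)), np.eye(D)]])

# Measurement: detected position plus the classifier's per-frame embedding
H = np.block([[np.eye(2), np.zeros((2, 2 + D))],
              [np.zeros((D, 4)), np.eye(D)]])
Q = np.diag([0.05] * 2 + [0.5] * 2 + [0.01] * D)   # process noise (assumed)
R = np.diag([0.3] * 2 + [0.2] * D)                 # measurement noise (assumed)

def kf_step(x, P, z):
    """One predict/update cycle on the augmented state."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In this toy setup, z per frame would be the detection position concatenated with the classifier embedding; the smoothed embedding x[4:] could then be fed to the classifier head and also used as an appearance cue in track-to-detection association.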


... Illustration of the training (dotted lines) and inference (solid lines) phases of the developed integrated Bayesian radar classification and tracking framework [2]. ...
Conference Paper
Full-text available
Autonomous driving is disrupting the automotive industry as we know it today. For this, fail-operational behavior is essential in the sense, plan, and act stages of the automation chain so that the vehicle can handle safety-critical situations on its own, which is not yet achieved with state-of-the-art approaches. The European ECSEL research project PRYSTINE realizes Fail-operational Urban Surround perceptION (FUSION) based on robust Radar and LiDAR sensor fusion and control functions in order to enable safe automated driving in urban and rural environments. This paper showcases some of the key exploitable results (e.g., novel Radar sensors, innovative embedded control and E/E architectures, pioneering sensor fusion approaches, AI-controlled vehicle demonstrators) achieved up to its final year (year 3).
Thesis
Full-text available
Both pedestrians and cyclists, as vulnerable road users (VRUs), exhibit agile and complex behaviors, and their protection has the highest priority in the design of advanced driver assistance systems (ADAS). Additionally, a pedestrian, with a radar cross section (RCS) of about -5 dBsm, is a typical low-observable target whose backscattered signals can be masked by strong reflections from the ground or from metallic objects (cars, guardrails) in urban areas. Recently, there has been extensive work on the detection and classification of VRUs using different point-estimate-based neural network architectures. However, there is no discriminative target-signature estimation method in the state of the art (SOTA). As a result, SOTA methods inevitably suffer in appearance learning and fail to correctly recognize targets in the presence of similar targets. Additionally, SOTA methods do not quantify the uncertainty of their predictions. In this thesis, the problem is approached from the perspective of both deep representation learning, to increase robustness, and Bayesian inference, to measure the uncertainty of the estimates produced by the learning algorithms. In practice, there are several challenges for learning-based solutions using radar systems, particularly for a varying set of target representations, where the system needs to handle variations of the input data in an unknown operating environment with many similar target classes. Conventional deep learning approaches rely on a softmax output layer, which provides separability between the target classes but does not provide discriminative class boundaries. Hence, many unknown classes are erroneously predicted as one of the known classes with high confidence, resulting in poor performance in real-world environments. Further challenges arise from the strong similarity of intra-class features from one target to another and from the sparse representation of the targets. To address this, a novel integrated representation learning framework, dubbed BayesRadar, and a hierarchical attention-based end-to-end learning framework, dubbed HARadNet, are proposed in this thesis. The two frameworks are designed for different target representations as input: BayesRadar takes image-based micro-Doppler signatures, and HARadNet takes point-cloud-based spatial signatures of the targets. The BayesRadar framework addresses the challenges that arise from the strong similarity between intra-class features from one target to another. This is done by learning to project the input map into a latent feature space where similar classes are grouped while dissimilar classes are far apart. Thus, the BayesRadar framework simultaneously learns separable inter-class and compact, discriminative intra-class differences, which is essential for open-set classification problems. Subsequently, the feature embedding corresponding to the target's signature is fed into the tracker's state, which brings two major advantages. First, the classifier's performance is enhanced due to temporal smoothing of the feature embedding vector. Second, the tracker's performance is better optimized because the appearance modality is included in the target association formulation. HARadNet, on the other hand, uses a direction field vector as a motion modality to achieve attention inside the network at different latent feature spaces, enhancing the representation learning for VRUs.
The attention operates at different hierarchies of the latent feature abstraction layers, with each point sampled according to a conditional direction field vector, allowing the network to exploit and learn a joint feature representation and its correlation to the neighborhood. This leads to a significant improvement in classification performance and is particularly useful for sparse input data such as radar point clouds. Thus, the HARadNet framework enables an end-to-end pipeline for VRU detection and classification using a single frame. To enable better optimization of both learning frameworks, i.e., BayesRadar and HARadNet, hybrid loss functions are proposed in this thesis. Further, both BayesRadar and HARadNet are evaluated in the context of Bayesian inference to quantify the uncertainty of the algorithms' estimates. The resulting hybrid network architecture can predict estimates in the form of a maximum-likelihood distribution conditioned on the input data representation and the network architecture design. Consequently, the framework can quantify the stochasticity and uncertainty of deep neural network architectures, without additional parameters, using multiple inference schemes defined in the literature. The proposed methods show generalization and scalability on different seen and unseen target classes over different target representations.
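One common way to approximate the kind of Bayesian predictive uncertainty described here, without adding parameters, is Monte-Carlo dropout: keep dropout active at test time and average several stochastic forward passes. The sketch below is only an illustration of that general idea under the assumption of a dropout-equipped classifier; it is not the thesis code.

```python
import torch

def mc_dropout_predict(model, x, n_samples=30):
    """Approximate the predictive distribution by keeping dropout stochastic
    at test time and averaging multiple forward passes (MC dropout)."""
    model.train()                      # keeps dropout layers active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)])   # (n_samples, B, C)
    mean = probs.mean(dim=0)           # predictive class probabilities
    # predictive entropy as a simple per-sample uncertainty measure
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy
```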
Article
Target localization and classification from radar point clouds is a challenging task due to the inherently sparse nature of the data and the highly non-uniform target distribution. This work presents HARadNet, a novel attention-based, anchor-free target detection and classification network architecture in a multi-task learning framework for radar point cloud data. A direction field vector is used as a motion modality to achieve attention inside the network. The attention operates at different hierarchies of the feature abstraction layers, with each point sampled according to a conditional direction field vector, allowing the network to exploit and learn a joint feature representation and its correlation to the neighborhood. This leads to a significant improvement in classification performance. Additionally, a parameter-free target localization is proposed using Bayesian sampling conditioned on a pre-trained direction field vector. An extensive evaluation on a public radar dataset shows a substantial increase in localization and classification performance.
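As a very rough illustration of what attention conditioned on a direction field vector could look like on a point cloud, the toy sketch below weights each point's nearest neighbors by the agreement of their (unit-norm) direction-field vectors before aggregating neighbor features. All of it is an assumption-laden stand-in, not the HARadNet architecture.

```python
import numpy as np

def direction_field_attention(points, features, directions, k=8):
    """Toy neighborhood attention: for each point, weight its k nearest
    neighbours (including itself) by the cosine similarity of their
    direction-field vectors, then aggregate neighbour features.
    points: (N, 3), features: (N, F), directions: (N, 3) unit vectors."""
    n = len(points)
    out = np.zeros_like(features)
    for i in range(n):
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        nbrs = np.argsort(d2)[:k]
        sim = directions[nbrs] @ directions[i]   # attention logits
        w = np.exp(sim - sim.max())
        w /= w.sum()
        out[i] = w @ features[nbrs]              # attention-weighted aggregation
    return out
```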
Article
Full-text available
Variational Auto-Encoders (VAEs) are deep latent-space generative models which have been immensely successful in many applications such as image generation, image captioning, protein design, mutation prediction, and language models, among others. The fundamental idea in VAEs is to learn the distribution of data in such a way that new meaningful data can be generated from the encoded distribution. This concept has led to tremendous research and variations in the design of VAEs in the last few years, creating a field of its own referred to as unsupervised representation learning. This paper provides a much-needed comprehensive evaluation of the variations of VAEs based on their end goals and resulting architectures. It further provides intuition as well as mathematical formulation and quantitative results for each popular variation, presents a concise comparison of these variations, and concludes with challenges and future opportunities for research in VAEs.
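The objective underlying these variants is the evidence lower bound (ELBO); in standard notation,

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta,\phi;x)
= \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right),
```

where q_phi(z|x) is the encoder, p_theta(x|z) the decoder, and p(z) is typically a standard normal prior.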
Article
Full-text available
Short-range compact radar systems offer an attractive modality for localization and tracking of human targets in indoor and outdoor environments for industrial and consumer applications. Micro-Doppler radar reflections from human targets can be sensed and used for human activity classification, which has applications in human-computer interaction and health assessment, among others. Traditionally, the detected human targets' locations are tracked and their micro-Doppler spectrograms extracted for subsequent activity classification. In this paper, we propose a novel integrated human localization and activity classification approach using an unscented Kalman filter and demonstrate our results using a short-range 60-GHz frequency-modulated continuous-wave radar. The proposed solution is shown to improve classification accuracy while providing uncertainty through the associated classification probabilities, and is thus a simple mechanism to achieve Bayesian classification.
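For context, a micro-Doppler spectrogram such as the one mentioned above is conventionally obtained by a short-time Fourier transform of the slow-time signal at the tracked target's range bin. The sketch below illustrates that conventional step only; function and parameter names are illustrative, not from the paper.

```python
import numpy as np
from scipy.signal import stft

def micro_doppler_spectrogram(range_profiles, target_bin, prf, win=128, overlap=96):
    """Conventional micro-Doppler extraction sketch.
    range_profiles: complex array (n_chirps, n_range_bins); prf: chirp rate in Hz."""
    slow_time = range_profiles[:, target_bin]          # slow-time signal at the target
    f, t, Zxx = stft(slow_time, fs=prf, nperseg=win, noverlap=overlap,
                     return_onesided=False)            # complex input -> two-sided spectrum
    spec_db = 20 * np.log10(np.abs(np.fft.fftshift(Zxx, axes=0)) + 1e-12)
    return np.fft.fftshift(f), t, spec_db              # Doppler axis, time axis, spectrogram
```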
Conference Paper
Full-text available
We present a radar-based detection processing framework for accurate detection and counting of human targets in an indoor environment. This can be used to control lighting, heating, ventilation and air conditioning (HVAC) in smart homes, and other presence-related loads in commercial, office, and public spaces. Such smart-home applications can facilitate monitoring, controlling, and saving energy. Conventionally, the radar range-Doppler processing pipeline includes moving target indication (MTI) to remove static targets, maximal ratio combining (MRC) to integrate data across antennas, constant false alarm rate (CFAR) based detectors, and then clustering algorithms to generate the target range-Doppler detections. However, in the case of indoor human target detection, the conventional pipeline suffers from ghost targets and multi-path reflections from static objects such as walls and furniture. Further, conventional parametric clustering algorithms lead to single-target splits and adjacent-target merges in the range-Doppler detections. To overcome these issues, we propose a deep residual U-Net architecture that generates human target detections directly from static-target-removed range-Doppler images (RDIs). To train this network, we record RDIs from a variety of indoor scenes with different configurations and multiple humans performing several regular activities. We devise a custom loss function and apply augmentation strategies so that the model generalizes during real-time inference. We demonstrate that the proposed network can efficiently learn to detect and correctly count human targets in different indoor environments where the conventional signal processing pipeline fails.
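As a point of reference for the conventional pipeline the authors replace, the CFAR stage can be sketched as a one-dimensional cell-averaging CFAR detector. Parameters are illustrative only.

```python
import numpy as np

def ca_cfar_1d(power, num_train=16, num_guard=4, pfa=1e-4):
    """Cell-averaging CFAR along one dimension of a range-Doppler map.
    power: non-negative detection statistic per cell."""
    n = len(power)
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)   # CA-CFAR threshold factor
    half = num_train // 2 + num_guard
    detections = np.zeros(n, dtype=bool)
    for i in range(half, n - half):
        # training cells on both sides of the cell under test, excluding guard cells
        window = np.r_[power[i - half:i - num_guard],
                       power[i + num_guard + 1:i + half + 1]]
        detections[i] = power[i] > alpha * window.mean()
    return detections
```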
Article
Full-text available
This paper presents a millimeter-wave radar in the 24-GHz ISM band for detection, tracking, and classification of human targets. Linear frequency modulation of the transmit signal and two receive antennas enable distance and angle measurements, respectively. Multiple consecutive frequency chirps are used to calculate the target velocity. Hardware as well as firmware concepts of the proposed system are described in detail. Various algorithms for human detection and tracking are investigated and combined into a new signal processing routine optimized for compactness and low power to run on a microcontroller. Additionally, a novel Doppler-compensated angle-of-arrival estimation method as well as a one-class support vector machine for human classification are proposed to further enhance the human detection and tracking performance. The achieved performance of the designed hardware and the implemented algorithms is verified in extensive measurements. The distance and angle errors of the realized radar sensor are at most 25 cm over a measurement range of 18 m and 10° over a two-sided angle sweep of 65°, respectively. The achieved range resolution is 0.9 m. Dedicated verifications of the most important signal processing routines are presented to confirm their functionality, and experiments with several human targets illustrate the performance and limits of the overall tracking algorithm. It is shown that range, velocity, and angle of up to five humans are correctly detected and tracked. The presented one-class classifier successfully distinguishes human targets from other quasi-static targets such as trees and from shadowing effects of human subjects on walls.
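As a back-of-the-envelope check (not a statement from the paper), the quoted 0.9 m range resolution is consistent with the usual FMCW relation

```latex
\Delta R = \frac{c}{2B}
\;\;\Rightarrow\;\;
B = \frac{c}{2\,\Delta R} = \frac{3\times 10^{8}\,\mathrm{m/s}}{2\times 0.9\,\mathrm{m}} \approx 167\ \mathrm{MHz},
```

which fits within the roughly 250 MHz of sweep bandwidth available in the 24-GHz ISM band.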
Article
Full-text available
Deep metric learning has been demonstrated to be highly effective in learning semantic representations and encoding information that can be used to measure data similarity, by relying on the embedding learned through metric learning. At the same time, the variational autoencoder (VAE) has been widely used for approximate inference and has proven to perform well for directed probabilistic models. However, for the traditional VAE, the data label or feature information is intractable. Similarly, traditional representation learning approaches fail to represent many salient aspects of the data. In this project, we propose a novel integrated framework to learn a latent embedding in a VAE by incorporating deep metric learning. The features are learned by optimizing a triplet loss on the mean vectors of the VAE in conjunction with the standard evidence lower bound (ELBO) of the VAE. This approach, which we call the Triplet-based Variational Autoencoder (TVAE), allows us to capture more fine-grained information in the latent embedding. Our model is tested on the MNIST data set and achieves a triplet accuracy of 95.60%, while the traditional VAE (Kingma & Welling, 2013) achieves a triplet accuracy of 75.08%.
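The combined objective described here can be written, in one common formulation (the trade-off weight lambda is an assumption, not taken from the paper), as

```latex
\mathcal{L}_{\mathrm{TVAE}}
= -\,\mathrm{ELBO}(x)
+ \lambda \sum_{(a,p,n)} \Big[\,\|\mu(a)-\mu(p)\|_2^2 - \|\mu(a)-\mu(n)\|_2^2 + \alpha\,\Big]_+,
```

where mu(.) denotes the encoder mean vector, (a, p, n) are anchor/positive/negative triplets, and alpha is the triplet margin.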
Conference Paper
Full-text available
Pushed by the EuroNCAP regulations, the number of autonomous emergency braking systems for pedestrians (AEB-P) has been rapidly increasing since 2016. The same rise is expected for cyclist protection systems, driven by the new EuroNCAP test scenarios from 2018 onwards. To achieve an adequate system reaction, target objects have to be classified reliably. Visual sensors provide some benefits in object recognition, but their performance suffers in adverse environmental conditions. Here radar sensors have proven very robust against darkness, fog, or rain. For classification, unique features are required. To extract relevant features, a motion analysis of cyclists is presented in this work, which is subsequently used to derive a reflection point model. Specific properties of radar evaluation have been added to the model as well. In combination with an automotive radar sensor model, the detection limits of current sensors can be explored. It is also possible to determine the performance requirements for future sensor generations and to evaluate new signal processing algorithms.
Article
Full-text available
Biological motion contains information about the identity of an agent as well as about his or her actions, intentions, and emotions. The human visual system is highly sensitive to biological motion and capable of extracting socially relevant information from it. Here we investigate the question of how such information is encoded in biological motion patterns and how such information can be retrieved. A framework is developed that transforms biological motion into a representation allowing for analysis using linear methods from statistics and pattern recognition. Using gender classification as an example, simple classifiers are constructed and compared to psychophysical data from human observers. The analysis reveals that the dynamic part of the motion contains more information about gender than motion-mediated structural cues. The proposed framework can be used not only for analysis of biological motion but also to synthesize new motion patterns. A simple motion modeler is presented that can be used to visualize and exaggerate the differences in male and female walking patterns.
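As a loose illustration of the linear analysis described above, the sketch below fits a least-squares linear discriminant to fixed-length motion descriptors (for example, flattened joint trajectories). The descriptor choice and the classifier are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def fit_linear_gender_classifier(X, y):
    """X: (n_walkers, n_features) motion descriptors; y: binary labels.
    Fits w, b by least squares so that sign(X @ w + b) predicts the label."""
    X1 = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    t = np.where(y > 0, 1.0, -1.0)              # targets in {-1, +1}
    wb, *_ = np.linalg.lstsq(X1, t, rcond=None)
    return wb[:-1], wb[-1]

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```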
Conference Paper
Full-text available
To solve the radar target tracking problem with range rate measurements, in which the errors of the range and range rate measurements are correlated, a sequential unscented Kalman filter (SUKF) is proposed in this paper. First, a pseudo measurement is constructed by block-partitioned Cholesky factorization; this keeps the range, bearing, and elevation (or the two direction-cosine) measurements unchanged while decorrelating the errors of the original range and range rate measurements. Then, based on the UKF, the bearing, elevation (or two direction-cosine) and pseudo measurements are processed sequentially to enhance the filtering precision and computational efficiency simultaneously. The validity and consistency of the proposed algorithm are verified by Monte Carlo simulation.
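The decorrelation step can be illustrated with a standard result (generic notation, not the paper's): if the range and range-rate errors have standard deviations sigma_r, sigma_rdot and correlation coefficient rho, the pseudo measurement

```latex
\eta = \dot r - \frac{\rho\,\sigma_{\dot r}}{\sigma_r}\, r,
\qquad
\operatorname{var}(e_\eta) = \sigma_{\dot r}^{2}\,\bigl(1-\rho^{2}\bigr),
```

has an error uncorrelated with the range error, so range, angles, and eta can then be processed one after another in the sequential filter.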
Article
Full-text available
A low-complexity radar for human tracking in three-dimensional space is reported. The system consists of a three-element receiving array and two continuous-wave (CW) frequencies configured to provide azimuth bearing, elevation bearing and range measurements. Doppler processing is used for clutter removal as well as to separate multiple moving targets. A radar prototype is constructed and tested. Three-dimensional tracking of multiple humans is performed in both unobstructed and through-wall scenarios.
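The ranging principle behind a two-frequency CW configuration is the standard phase-difference relation (general relation, not the paper's specific parameters):

```latex
R = \frac{c\,\Delta\varphi}{4\pi\,\Delta f},
\qquad
R_{\max} = \frac{c}{2\,\Delta f},
```

where Delta-phi is the phase difference between the echoes at the two carriers separated by Delta-f, and R_max is the unambiguous range.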
Book
This exciting new resource presents emerging applications of artificial intelligence and deep learning in short-range radar. The book covers applications ranging from the industrial and consumer space to emerging automotive applications. It presents several human-machine interface (HMI) applications, such as gesture recognition and sensing, human activity classification, air-writing, material classification, vital sensing, people sensing, people counting, people localization, in-cabin automotive occupancy, and smart trunk opening. The underpinnings of deep learning are explored, outlining the history of neural networks and the optimization algorithms used to train them. Modern deep convolutional neural networks (DCNN), popular DCNN architectures for computer vision, and their features are also introduced. The book presents other deep learning architectures, such as long short-term memory (LSTM), auto-encoders, variational auto-encoders (VAE), and generative adversarial networks (GAN). The application of human activity recognition as well as the application of air-writing using a network of short-range radars are outlined. This book demonstrates and highlights how deep learning is enabling several advanced industrial, consumer, and in-cabin applications of short-range radars that weren't otherwise possible. It illustrates various advanced applications, their respective challenges, and how they are addressed using different deep learning architectures and algorithms. Table of Contents: 1. Introduction to Radar Signal Processing; 2. Introduction to Deep Learning; 3. Gesture Sensing and Recognition; 4. Human Activity Recognition and Elderly-Fall Detection; 5. Air Writing; 6. Material Classification; 7. Vital Sensing & Classification; 8. People Sensing, Counting & Localization; 9. Automotive In-cabin Sensing.
Article
Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions.
Chapter
Micro-Doppler effects in narrowband radar are introduced in this chapter. We mainly analyze the micro-Doppler effects induced by rotations, vibrations, and precessions of targets. Rotations, vibrations, and precessions are typical micromotion dynamics of targets in the real world. Typical rotations include rotations of helicopter rotors, mechanical scanning radar antennas, turbine blades, etc. Typical vibrations include engine-induced car surface vibrations, mechanical oscillations of a bridge, etc. Precession is a major movement form of spatial targets, for example, precessions usually go with a ballistic missile in its midcourse flight. Analyzing the micro-Doppler effects of rotations, vibrations, and precessions provides a basic foundation for further discussions on micro-Doppler effects of complicated motions. The influence of radar platform vibration on micro-Doppler effect analysis is also investigated. Finally, related MATLAB codes are provided on the companion Website: http://booksite.elsevier.com/9780128098615/ .
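As an example of the effects analyzed in the chapter, the classic narrowband micro-Doppler expressions for a point scatterer vibrating along the line of sight with amplitude D_v and angular frequency omega_v, and for one rotating at radius r with rate Omega in a plane containing the line of sight, are (standard results quoted from the general micro-Doppler literature, not from the chapter's text):

```latex
f_{mD,\mathrm{vib}}(t) = \frac{2 f_c}{c}\, D_v\,\omega_v \cos(\omega_v t),
\qquad
f_{mD,\mathrm{rot}}(t) = \frac{2 f_c}{c}\, \Omega\, r \cos(\Omega t + \theta_0),
```

where f_c is the carrier frequency and theta_0 the initial rotation angle.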
Article
Despite significant recent advances in the field of face recognition, implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors. Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches. To train, we use triplets of roughly aligned matching / non-matching face patches generated using a novel online triplet mining method. The benefit of our approach is much greater representational efficiency: we achieve state-of-the-art face recognition performance using only 128 bytes per face. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%. On YouTube Faces DB it achieves 95.12%. Our system cuts the error rate in comparison to the best published result by 30% on both datasets.
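The triplet loss popularized by FaceNet can be sketched as follows; the batch layout and margin value are illustrative defaults, not the paper's training configuration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor-positive distance below the anchor-negative distance
    by at least `margin`. Inputs: L2-normalized embeddings of shape (batch, dim)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```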
Conference Paper
This paper presents a multiple hypothesis tracking (MHT) framework for tracking the ranges and velocities of a variable number of moving human targets via a mono-static ultra-wideband (UWB) radar. The multi-target tracking (MTT) problem for UWB radar-based human target tracking differs from traditional applications because of the multitude of observations (multipath scattering) per target in each scan, due to the short spatial extent of the transmitted UWB signal pulse width. We develop an MHT framework for UWB radar-based multiple human target tracking that extends a previously studied human tracking algorithm. We present experimental results in which a monostatic UWB radar tracks both individual and multiple human targets, even with changing numbers of targets across radar scans.
Article
An artificial neural network is proposed to track a human using the Doppler information measured by a set of spatially distributed sensors. The neural network estimates the target position and velocity given the observed Doppler data from multiple sensors. It is trained using data from a simple point scatterer model in free space. The minimum number of sensors required for robust target tracking is investigated, and the effect of sensor position on the estimation error is studied. For the verification of the proposed method, a toy car and a human moving on a circular track are measured in line-of-sight and through-wall environments. The resulting normalized estimation errors on the target parameters are less than 5%.
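A free-space point-scatterer model of the kind used for training could be sketched as below: each sensor observes the Doppler shift induced by the target's radial velocity toward it. The carrier frequency, geometry, and monostatic assumption are illustrative, not the paper's simulator.

```python
import numpy as np

def doppler_from_point_target(target_pos, target_vel, sensors, fc=5.8e9, c=3e8):
    """Monostatic Doppler shift (Hz) seen by each sensor for a point target.
    target_pos, target_vel: (2,) arrays; sensors: (N, 2) sensor positions."""
    los = target_pos - sensors                              # sensor-to-target vectors
    los /= np.linalg.norm(los, axis=1, keepdims=True)       # unit line-of-sight
    range_rate = los @ target_vel                           # radial velocity per sensor
    return -2.0 * range_rate * fc / c                       # Doppler shift per sensor
```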
Conference Paper
This paper presents an algorithm for human presence detection and tracking using an Ultra-Wideband (UWB) impulse-based mono-static radar. UWB radar can complement other human tracking technologies, as it works well in poor visibility conditions. UWB electromagnetic wave scattering from moving humans forms a complex returned signal structure which can be approximated to a specular multi-path scattering model (SMPM). The key technical challenge is to simultaneously track multiple humans (and non-humans) using the complex scattered waveform observations. We develop a multiple-hypothesis tracking (MHT) framework that solves the complicated data association and tracking problem for an SMPM of moving objects/targets. Human presence detection utilizes SMPM signal features, which are tested in a classical likelihood ratio (LR) detector framework. The process of human detection and tracking is a combination of the MHT method and the LR human detector. We present experimental results in which a mono-static UWB radar tracks human and non-human targets, and detects human presence by discerning human from moving non-human objects.
Article
A numerically well-conditioned, quasi-extended Kalman filter is proposed. The filter is numerically described. The simulation results presented show that the estimation performance of the quasi-extended filter is superior, for short distances, compared with the widely used linear tracking filters. In addition, the simplicity of the quasi-extended filter makes it very easy to implement
Sequential unscented Kalman filter for radar target tracking with range rate measurements
  • Zhansheng Duan
  • X. R. Li
  • Chongzhao Han
  • Hongyan Zhu
Zhansheng Duan, X. R. Li, Chongzhao Han, and Hongyan Zhu, "Sequential unscented Kalman filter for radar target tracking with range rate measurements," in 2005 7th International Conference on Information Fusion, vol. 1, 2005, 8 pp.
Bayesian representation learning with oracle constraints
  • T Karaletsos
  • S Belongie
  • G Rätsch