Conference Paper

Generative Adversarial Network based Extended Target Detection for Automotive MIMO Radar

... Applying the OS-CFAR algorithm for peak detection requires clustering to group all detections of the same individual target. Therefore, density-based spatial clustering (DBSCAN) is used [66]. ...
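The grouping step described in the excerpt above can be illustrated with a minimal DBSCAN sketch in plain Python. It assumes detections are given as hypothetical (range bin, Doppler bin) pairs; the function name `dbscan` and the values of `eps` and `min_pts` are illustrative choices, not taken from the cited work.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points (including i itself) within distance eps of point i.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

# Hypothetical CFAR detections as (range bin, Doppler bin) pairs.
detections = [(10, 5), (10, 6), (11, 5), (11, 6),
              (30, 20), (31, 20), (30, 21),
              (50, 2)]
labels = dbscan(detections, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Here the two dense groups of cells are reported as two extended targets, while the isolated detection is labelled as noise.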
... The design space of the CNN architecture involves a large number of parameters, which makes it hard to find the optimum architecture for the given problem definition. Thus, the choice of the architecture design hyper-parameters is mainly inspired by [66], where the authors used a similar architecture for target detection on sparse radar RD-maps. Both encoder and decoder have a 3-layered convolution layer with a rectified linear unit (ReLU) as the non-linearity function. ...
Article
Full-text available
With recent advancements in radar systems, radar sensors offer a promising and effective perception of the surroundings. This includes target detection, classification and tracking. Compared to the state-of-the-art, where the state vector of a classical tracker considers only localization parameters, this paper proposes an integrated Bayesian framework that augments the state vector with a feature embedding as an appearance parameter alongside the localization parameters. In the context of automotive vulnerable road users (VRUs) such as pedestrians and cyclists, the classical tracker faces multiple challenges in preserving the identity of the tracked target during partial or complete occlusion, due to low inter-class (pedestrian-cyclist) variations and strong intra-class (pedestrian-pedestrian) similarity. Subsequently, feature embeddings corresponding to the target’s micro-Doppler signature are learned using novel Bayesian deep metric learning approaches. The tracker’s performance is optimized due to a better separability of the targets. At the same time, the classifier’s performance is enhanced due to the Bayesian formulation utilizing the temporal smoothing of the classifier’s embedding vector. In this work, we demonstrate the performance of the proposed Bayesian framework using several vulnerable user targets based on a 77 GHz automotive radar.
... Recently, this model has been applied in a variety of fields, including medicine [13], to minimize ultrasonic speckle in echographic images. To produce load profiles with conditions like temperature and time of day, [14] employed a variation of the Generative Adversarial Network (GAN) called the conditional GAN. On an electricity load database, [15] investigated and compared multiple GAN variations to forecast short-term profiles. ...
Chapter
This study provides a comprehensive comparison of different algorithms implemented on a reservoir system, and the results are statistically analyzed against the results of other machine learning algorithms. Different weights and activation methods have been used to obtain the results. The algorithms implemented on the reservoir system data are generative adversarial networks, a synthetic model, and the non-dominated sorting genetic algorithm 2. We then compare and visualize the data obtained. We have attempted to implement generative adversarial networks on a reservoir system whose data is a time series with values from June 1, 1989, to May 1, 2016. Data was collected from the reservoir authorities, who did not have records for some of the months. The goal is to regenerate those missing values and to predict the next data values for the upcoming months.
Article
When deep learning is applied to intelligent textile defect detection, insufficient training data may result in low accuracy and poor adaptability of the trained model to varying defect types. To address this problem, an enhanced generative adversarial network for data augmentation and improved fabric defect detection is proposed. First, the dataset is preprocessed to generate defect localization maps, which are combined with non-defective fabric images and input into the network for training; this helps to better extract defect features. In addition, a Double U-Net network is used to enhance the fusion of defects and textures. Next, random noise and a multi-head attention mechanism are introduced to improve the model’s generalization ability and enhance the realism and diversity of the generated images. Finally, the newly generated defect image data is merged with the original defect data to achieve data augmentation. Comparison experiments were performed using the YOLOv3 object detection model on the training data before and after data augmentation. The experimental results show a significant accuracy improvement for five defect types (float, line, knot, hole, and stain), increasing from 41%, 44%, 38%, 42%, and 41% to 78%, 76%, 72%, 67%, and 64%, respectively.
Thesis
Full-text available
Pedestrians and cyclists, as vulnerable road users (VRUs), exhibit agile and complex behaviors, and their protection is a top-priority concern in the design of advanced driver assistance systems (ADAS). Additionally, a pedestrian with a radar cross section (RCS) of about -5 dBsm is a typical low-observable target, whose backscattered signals can be masked by the strong reflections from the ground or metallic targets (cars, guardrails) in urban regions. Recently, the detection and classification of VRUs using different point estimate-based neural network architectures has been studied extensively. However, there is no discriminative target signature estimation method in the state-of-the-art (SOTA). As a result, SOTA methods inevitably struggle with appearance learning and fail to correctly recognize targets in the presence of similar targets. Additionally, SOTA methods do not quantify the uncertainty of their predictions. In this thesis, the problem is approached from the perspective of both deep representation learning for increasing the robustness and Bayesian inference for measuring the uncertainty of the estimates from learning algorithms. In practice, there are several challenges to learning-based solutions using radar systems, particularly for a varying set of target representations. In the varying set of target representations, the system needs to handle variations of the input data for the unknown operating environment with many similar target classes. Conventional deep learning approaches rely on the softmax output layer, which provides separability only for target classes but does not provide discriminative class boundaries. Hence, many unknown classes are erroneously predicted as one of the known classes with high confidence, resulting in poor performance in real-world environments.
Furthermore, other challenges arise due to the strong similarity between intra-class features from one target to another or the sparse representation of the target. To address this, a novel integrated representation learning framework, dubbed BayesRadar, and a hierarchical attention-based end-to-end learning framework, dubbed HARadNet, are proposed in this thesis. The two frameworks are designed for different target representations as input, i.e. BayesRadar takes image-based micro-Doppler signatures and HARadNet takes point-cloud-based spatial signatures of the targets. The BayesRadar framework addresses the challenges that arise due to the strong similarity between intra-class features from one target to another. This is done by learning to project the input map to a latent feature space where similar classes are grouped while dissimilar classes are far apart. Thus, the BayesRadar framework simultaneously learns separable inter-class and compact discriminative intra-class differences, essential for open set classification problems. Subsequently, the feature embedding corresponding to the target’s signature is fed into the tracker’s state, which brings two major advantages. First, the classifier’s performance is enhanced due to the temporal smoothing of the feature embedding vector. Second, the tracker’s performance is better optimized due to the appearance modality inside the target association formulation. On the other hand, HARadNet uses a direction field vector as a motion modality to achieve attention inside the network at different latent feature spaces to enhance the representation learning for VRUs. The attention operates at the different hierarchies of the latent feature abstraction layer with each point sampled according to a conditional direction field vector, allowing the network to exploit and learn a joint feature representation and correlation to its neighborhood. This leads to a significant improvement in the performance of the classification.
This approach is very useful for sparse input data such as radar point clouds. Thus, the HARadNet framework successfully enables an end-to-end framework for VRU detection and classification using a single frame. To enable better optimization of both learning frameworks, i.e. BayesRadar and HARadNet, hybrid loss functions are proposed in this thesis. Further, both BayesRadar and HARadNet are evaluated in the context of Bayesian inference to quantify the uncertainty of the algorithms' estimates. The resulting hybrid network architecture can predict estimates in the form of a maximum likelihood distribution conditioned on the input data representation and the network architecture design. Consequently, the framework can quantify the stochasticity and uncertainty of deep neural network architectures without additional parameters using multiple inferences defined in the literature. The proposed methods show generalization and scalability on different seen and unseen target classes over different target representations.
Chapter
The chapter introduces the fundamentals of radar signal processing, describes how a target is detected and tracked, and explains the rationale behind it. The chapter also introduces deep learning, its evolution over time, and the different facets that make deep learning so powerful. Various components of conventional convolutional neural networks, recurrent neural networks, and fully connected layers are introduced in relation to tasks such as classification, localization, and segmentation.
Article
Target localization and classification from radar point clouds is a challenging task due to the inherently sparse nature of the data with highly non-uniform target distribution. This work presents HARadNet, a novel attention-based, anchor-free target detection and classification network architecture in a multi-task learning framework for radar point cloud data. A direction field vector is used as a motion modality to achieve attention inside the network. The attention operates at different hierarchies of the feature abstraction layer with each point sampled according to a conditional direction field vector, allowing the network to exploit and learn a joint feature representation and correlation to its neighborhood. This leads to a significant improvement in the performance of the classification. Additionally, a parameter-free target localization is proposed using Bayesian sampling conditioned on a pre-trained direction field vector. The extensive evaluation on a public radar dataset shows a substantial increase in localization and classification performance.
Conference Paper
Automatic radar-based classification of automotive targets, such as pedestrians and cyclists, poses several challenges due to low inter-class variations among different classes and large intra-class variations. Further, the different targets that must be tracked in a typical automotive scenario can have completely different dynamics, which is challenging for a tracker using conventional state vectors. Compared to the state-of-the-art using independent classification and tracking, in this paper, we propose an integrated tracker and classifier leading to a novel Bayesian framework. The tracker's state vector in the proposed framework not only includes the localization parameters of the targets but is also augmented with the targets' feature embedding vector. In consequence, the tracker's performance is optimized due to a better separability of the targets. Furthermore, the classifier's performance is enhanced due to the Bayesian formulation utilizing the temporal smoothing of the classifier's embedding vector.
Article
Full-text available
Road extraction from aerial images has been a hot research topic in the field of remote sensing image analysis. In this letter, a semantic segmentation neural network which combines the strengths of residual learning and U-Net is proposed for road area extraction. The network is built with residual units and has an architecture similar to that of U-Net. The benefits of this model are two-fold: first, residual units ease training of deep networks. Second, the rich skip connections within the network facilitate information propagation, allowing us to design networks with fewer parameters yet better performance. We test our network on a public road dataset and compare it with U-Net and two other state-of-the-art deep learning based road extraction methods. The proposed approach outperforms all the compared methods, which demonstrates its superiority over recently developed state-of-the-art methods.
Article
Full-text available
Image semantic segmentation is of increasing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review of deep learning methods for semantic segmentation applied to various application areas. First, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. We close by pointing out a set of promising future works and drawing our own conclusions about the state of the art of semantic segmentation using deep learning techniques.
Article
Full-text available
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Article
Full-text available
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
Article
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
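The reshaping of cross entropy described in the abstract above can be sketched for a single binary prediction. This is a minimal illustration, assuming the commonly stated form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function names and the default values of `gamma` and `alpha` are illustrative assumptions, not an excerpt from the Detectron code.

```python
import math

def cross_entropy(p, y):
    """Standard binary cross entropy for p = P(y = 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Compare how strongly an easy and a hard positive example are down-weighted.
for p in (0.9, 0.1):
    ratio = focal_loss(p, 1) / cross_entropy(p, 1)
    print(f"p={p}: focal/CE ratio = {ratio:.4f}")
```

With these settings the well-classified example (p = 0.9) keeps a far smaller fraction of its cross-entropy loss than the misclassified one (p = 0.1), which is the down-weighting effect the abstract describes.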
Conference Paper
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Conference Paper
In radar, the reflected signal is received by the antenna, amplified and down-converted, and then the required signal (the video signal) is extracted. The video signal is then passed through a Moving Target Indicator (MTI) processor, which suppresses clutter. The post-MTI data is passed through a Constant False Alarm Rate (CFAR) processor, which qualifies echoes as targets or otherwise. The role of the CFAR processor is to determine a threshold above which any return can be considered to be from a target. If this threshold is too low, more targets will be detected at the expense of more false alarms. Conversely, if the threshold is set too high, fewer targets will be detected but there will be fewer false alarms. The distribution of the clutter can be approximated by certain probability distribution functions, where each medium follows a different probability distribution. We investigate important CFAR processing techniques in Gaussian noise and Rayleigh clutter. The threshold is adaptive, that is, it is raised or lowered to maintain a required Probability of False Alarm (Pfa). This paper discusses various CFAR processing techniques by applying them to raw video of a real radar and analyzing the advantages and disadvantages of each technique.
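The adaptive threshold described above can be illustrated with a minimal cell-averaging CFAR (CA-CFAR) sketch. It is one of several CFAR variants and is chosen here only as an example; it assumes square-law detected samples with exponentially distributed noise power, for which the scaling factor alpha = N (Pfa^(-1/N) - 1) applies. The function name, window sizes and test signal are hypothetical.

```python
def ca_cfar(power, num_train, num_guard, pfa):
    """Return indices of cells whose power exceeds the adaptive threshold."""
    n = 2 * num_train                        # training cells on both sides
    alpha = n * (pfa ** (-1.0 / n) - 1.0)    # scaling for exponential noise
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        # Training cells before and after the cell under test,
        # skipping the guard cells directly next to it.
        lead = power[i - half : i - num_guard]
        lag = power[i + num_guard + 1 : i + half + 1]
        noise = (sum(lead) + sum(lag)) / n   # local noise level estimate
        if power[i] > alpha * noise:         # threshold follows the noise level
            detections.append(i)
    return detections

# Hypothetical power profile: flat noise floor with one strong return.
power = [1.0] * 40
power[20] = 50.0
print(ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3))  # [20]
```

Lowering `pfa` increases `alpha` and therefore the threshold, which trades fewer false alarms for fewer detections, as the abstract notes.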
Article
Generative Adversarial Nets [8] were recently introduced as a novel way to train generative models. In this work we introduce the conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and discriminator. We show that this model can generate MNIST digits conditioned on class labels. We also illustrate how this model could be used to learn a multi-modal model, and provide preliminary examples of an application to image tagging in which we demonstrate how this approach can generate descriptive tags which are not part of training labels.
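The conditioning idea in the abstract above, feeding y to both the generator and the discriminator, can be sketched as simple input concatenation. This is only an illustration under stated assumptions: the one-hot encoding, the helper names and the vector sizes are hypothetical, and the networks themselves are omitted.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def generator_input(noise, label, num_classes):
    """Noise vector z concatenated with the condition y."""
    return noise + one_hot(label, num_classes)

def discriminator_input(sample, label, num_classes):
    """Real or generated sample x concatenated with the same condition y."""
    return sample + one_hot(label, num_classes)

z = [0.1, -0.3]                        # toy noise vector
print(generator_input(z, 2, 4))        # [0.1, -0.3, 0.0, 0.0, 1.0, 0.0]

x = [0.0] * 784                        # e.g. a flattened 28x28 image
print(len(discriminator_input(x, 7, 10)))  # 794
```

Because both networks see the same label, the discriminator can reject samples that look realistic but do not match the condition, which pushes the generator to respect it.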
Sensor fusion: A comparison of sensing capabilities of human drivers and highly automated vehicles
  • B Schoettle