Application of a Neural Network to Radar Detection
Diego ANDINA and José L. SANZ-GONZALEZ
Departamento de Señales, Sistemas y Radiocomunicaciones, ETSI de Telecomunicación Universidad Politécnica de
Madrid, Ciudad Universitaria s/n, 28040 Madrid, Spain, Tel/Fax: +34 [1] 549 5700 (ext. 384) / 543 9652,
E-Mail: andina@ics.upm.es
Abstract. The application of neural networks to radar detection raises many open questions, and some of them are addressed
in this paper. First, we propose a network structure suited to the problem of binary detection. We model the input as signal
and noise (given by its complex envelope), and the binary output is 1 or 0. We evaluate different structures and their
dependence on the training signal-to-noise ratio and the threshold value. Then, we evaluate the detector's performance by Monte Carlo
trials. We present its Receiver Operating Characteristic (ROC) and detection curves.
1. Introduction
The binary detection problem can be briefly stated as deciding whether an input complex value (the complex envelope
involving signal and noise) should be classified into one of two outputs, 0 or 1. Neural networks have proved their abilities in
classification problems and could have interesting nonlinear capabilities for detection when the input is affected by non-Gaussian
noise. Obviously, the binary detection problem is highly dependent on each application, but we can try to model each
characteristic as a learning parameter, or as a modification of the network structure. For example, the need to process complex
signals with back-propagation learning [2] can be met by using complex weights and an adapted sigmoidal function, or simply
by separating the inputs into their real and imaginary parts and doubling the number of input nodes. Also, the presence of only
0 or 1 at the output does not imply that the nodes in our network must have hard limiters: we can establish a threshold at the output
and assign the two binary values to the two sides separated by this threshold.
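The two modelling choices just described (splitting each complex input into real and imaginary parts, and thresholding the single output) can be sketched as follows; the example vector and the threshold value 0.5 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def complex_to_real_features(x):
    """Split a vector of M complex samples into 2M real inputs
    (real parts followed by imaginary parts)."""
    return np.concatenate([x.real, x.imag])

def threshold_output(y, T=0.5):
    """Map a network output in [0,1] to a binary decision:
    values in [0,T) -> 0, values in [T,1] -> 1."""
    return 1 if y >= T else 0

# illustrative input: M=2 complex samples -> 4 real input nodes
x = np.array([1 + 2j, -0.5 + 0.3j])
features = complex_to_real_features(x)
```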
In this paper, we model the input at time t as a complex value composed by
were xi is the input vector, si is the signal vector (s1 corresponds to "0" and s2 corresponds to "1") and n is the noise vector. Each
component of xi corresponds to the complex envelope of the received signal.
At the neural network output we will have values ranging in [0,1]. Then we will have to choose a threshold value T ∈ [0,1]
so that output values in [0,T) will be considered as binary output 0 and values in [T,1] will represent the value 1.
2. The Neural Network as Detector
For the present study, we have chosen one representative algorithm for supervised learning: the multilayer perceptron with back-
propagation. This choice is mainly motivated by the fact that it is an efficient technique, widely used for classification tasks.
Back-propagation has performed well on many real problems and has become the most popular learning algorithm for multilayer
networks. This learning method yields a good approximation to the detection problem.
One of the parameters to choose is the learning rate, which can be the same for every weight in the network, different for each
layer, different for each node, or different for each weight. In general, determining the best learning rate is not
an easy task, so we have adopted a general solution proposed in [3], making the learning rate for each node inversely proportional
to the average magnitude of the vectors feeding into the node.
In the basic algorithm to update each weight, we add the well-known momentum term [3] as a simple approach to adapting the
learning rate as a function of the local curvature of the error surface.
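The update rule described above (a per-node learning rate scaled inversely with the average input magnitude, plus a momentum term) can be sketched as follows; the base rate `eta0` and momentum constant `alpha` are illustrative assumptions.

```python
import numpy as np

def update_weights(w, grad, prev_delta, fan_in_mag, eta0=0.1, alpha=0.9):
    """One back-propagation step for a node's weight vector.
    The effective learning rate is scaled inversely with the average
    magnitude of the vectors feeding into the node, and the momentum
    term alpha*prev_delta smooths the trajectory on the error surface."""
    eta = eta0 / max(fan_in_mag, 1e-8)        # per-node learning rate
    delta = -eta * grad + alpha * prev_delta  # gradient step + momentum
    return w + delta, delta
```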
There are several methods for stopping the training algorithm. One can terminate it when the magnitude of the gradient is
sufficiently small, since by definition the gradient is zero at a minimum. One can stop the algorithm when the estimation
error falls below a fixed threshold that fulfils the design requirements. Or one can stop when a fixed number of iterations has
been performed, although there is little guarantee that this stopping condition will terminate the algorithm at a minimum. In fact,
with these solutions one does not optimize the net, prematurely terminating the learning algorithm.
The method we have chosen is cross-validation: we split the data into two sets, a training set used to train the
network and a test set used to estimate the error probability (Pe) of the neural network detector. During learning, the
performance of the network on the training data will continue to improve, but its performance on the test data will improve only
up to a point, beyond which it will start to degrade. It is at this point, where the network starts to be overtrained, that the learning
algorithm is terminated. Although more computationally intensive, this method avoids premature termination, improving the
generalization performance of the network. For low values of Pe, estimation problems can arise because very low
probability values require extremely large test sets. For better estimation of such probabilities one could use techniques
such as Importance Sampling, although this complicates the algorithms.
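The cross-validation stopping rule can be sketched generically as follows; `train_step` and `eval_error` stand in for the actual back-propagation pass and the test-set Pe estimate, and the `patience` value is an illustrative assumption.

```python
def train_with_early_stopping(train_step, eval_error, max_epochs=1000, patience=5):
    """Stop when the test-set error has not improved for `patience`
    consecutive epochs, i.e. at the onset of overtraining.
    Returns the epoch of the best test error and that error."""
    best_err, best_epoch, bad = float('inf'), 0, 0
    for epoch in range(max_epochs):
        train_step()           # one pass over the training set
        err = eval_error()     # estimated Pe on the test set
        if err < best_err:
            best_err, best_epoch, bad = err, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch, best_err
```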
If the rate of convergence is too slow, one can use learning rate adaptation methods as proposed in [4]. The
essence of this method is to trace the curvature of the error surface. It increases the learning rate if the error
performance surface is flat at the current point in the parameter space; otherwise, the learning rate is decreased to avoid potential
oscillations. The drawback of this method is the increase in computational complexity, since every network weight has its own
learning rate to be computed. However, exploiting the fact that the weight changes (due to each training pattern) are usually small
compared with the magnitudes of the weights, we can combine a gradient reuse method [5] with the learning rate adaptation
method.
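A per-weight rate adaptation in the spirit of [4] can be sketched as follows; the increase and decrease factors are illustrative assumptions.

```python
import numpy as np

def adapt_rates(rates, grad, prev_grad, up=1.05, down=0.7):
    """Increase a weight's learning rate while successive gradient
    components keep the same sign (a flat, consistent error surface),
    and decrease it when the sign flips (potential oscillation)."""
    same_sign = grad * prev_grad > 0
    return np.where(same_sign, rates * up, rates * down)
```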
Although the net will deal with complex signals, it will have all-real coefficients, as we have indicated in the previous
section. To provide generalization, it will have at least three layers. The number of layers and the number of nodes in them will be discussed
in the next section. The last layer will have only one node. The output will be "1" or "0" after thresholding.
3. Application to radar
For the radar case we can build a simplified model of the input: the complex envelope is constituted by a sequence of M
complex samples (the radar azimuth samples, referred to the same range bin). We define the two detection hypotheses

H0: x(kT0) = n(kT0)                        (2a)
H1: x(kT0) = S·exp(jΘ) + n(kT0)            (2b)

where T0 is the pulse repetition period, k varies from 0 to M-1 (MT0 being the time on target), S is the signal amplitude, Θ
indicates an initial phase (constant within the same sequence) and n(kT0) represents an uncorrelated zero-mean Gaussian variable
with variance σ². Each complex input is separated into its real and imaginary parts, yielding two real inputs to the net, so the
number of input units must be 2M. This input model allows us to generate as many training and test pairs as we need in our
analysis. Then we compare different MLP structures. Taking M=8 as a typical value for radar detection, the input layer has 16
nodes. We choose the threshold value T to achieve a given (Pd, Pfa) relationship.
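The input model above can be simulated as sketched here, using M=8 and σ=1 from the text; the amplitude-to-SNR relation S = σ·10^(SNR_dB/20) is an assumption for illustration, since the paper does not state its SNR definition.

```python
import numpy as np

def make_sample(signal_present, snr_db, M=8, sigma=1.0, rng=None):
    """One input vector of M complex azimuth samples:
    noise only (hypothesis H0) or S*exp(j*Theta) plus noise (H1),
    returned as 2M real inputs (real parts, then imaginary parts)."""
    if rng is None:
        rng = np.random.default_rng()
    # zero-mean Gaussian noise on each component
    n = rng.normal(0, sigma, M) + 1j * rng.normal(0, sigma, M)
    x = n
    if signal_present:
        S = sigma * 10 ** (snr_db / 20)       # assumed amplitude/SNR relation
        theta = rng.uniform(0, 2 * np.pi)     # phase constant within the sequence
        x = S * np.exp(1j * theta) + n
    return np.concatenate([x.real, x.imag])
```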
The training and testing pairs have been simulated on the computer with noise standard deviation σ=1, for different values
of the training signal-to-noise ratio (TSNR). This training signal-to-noise ratio is one of the parameters that have shown the
most influence on the performance of the net.
The learning procedure consists of alternately presenting sequences of noise-only and signal-plus-noise samples, with a given
TSNR, so that the desired output alternates between 0 and 1 at each iteration. The desired output value is 0 for noise only and
1 in the presence of signal plus noise. We vary the phase Θ randomly in the interval [0,2π) to force the network to
generalize over the input phases during the learning procedure.
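The alternating presentation with random phase can be sketched as a generator of training pairs; as before, the amplitude-to-TSNR relation is an assumed convention, and the seed is only for reproducibility.

```python
import numpy as np

def training_pairs(tsnr_db, M=8, sigma=1.0, seed=0):
    """Yield (input, desired_output) pairs, alternating noise-only
    (target 0) and signal-plus-noise (target 1) at each iteration,
    with the initial phase drawn uniformly in [0, 2*pi) per sequence."""
    rng = np.random.default_rng(seed)
    S = sigma * 10 ** (tsnr_db / 20)   # assumed amplitude/TSNR relation
    target = 0
    while True:
        n = rng.normal(0, sigma, M) + 1j * rng.normal(0, sigma, M)
        x = n
        if target == 1:
            theta = rng.uniform(0, 2 * np.pi)
            x = S * np.exp(1j * theta) + n
        yield np.concatenate([x.real, x.imag]), target
        target = 1 - target
```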
4. Computer Results
In our experiment, we have chosen an MLP with 8 nodes in the hidden layer (and 16 nodes in the input layer) as a result of
a thorough study of the net structure, in which we have seen that no significant improvements are achieved by increasing the
complexity of the net, at least for this type of model. A more complex net will probably be needed for more complex
models of signal and noise, and for increasing the robustness of the detector. But this simple net could be good enough if the
statistical characteristics of the input do not change. Or, due to its quick training, it could present interesting adaptation
properties to changes in input distributions, if continuously trained (in real time).
4.1. ROC curves.
First we study how the network performs under changes in the training signal-to-noise ratio (TSNR), by means of the
Receiver Operating Characteristic (ROC). What we could expect is that a net trained with a given TSNR is good at detecting
inputs whose Signal-to-Noise Ratio (SNR) is within a certain range of values, close to its TSNR. We can see in Figure 1(a) that
this is generally true for SNR = 6 dB. For low values of this SNR, we see that the value of Pfa imposes a limit on the last
statement. In Figure 1(b), with SNR = 0 dB, the net trained with 3 dB presents worse characteristics than the one trained with
6 dB, because for low values of TSNR the value of Pfa imposes a threshold value too high to get a good Pd.

Figure 1. Receiver Operating Characteristic (ROC) curves: detection probability (Pd) vs. false alarm probability (Pfa) for an
MLP with 8 nodes in 1 hidden layer and 3 different training signal-to-noise ratios (TSNR). (a) Input Signal-to-Noise Ratio
(SNR) = 6 dB; (b) SNR = 0 dB.
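The (Pfa, Pd) points of such ROC curves can be estimated from Monte Carlo detector outputs as sketched below; the output values in the usage example are illustrative, not data from the paper.

```python
import numpy as np

def roc_points(outputs_h0, outputs_h1, thresholds):
    """Estimate (Pfa, Pd) pairs from Monte Carlo trials:
    Pfa is the fraction of noise-only detector outputs at or above
    the threshold, Pd the fraction of signal-plus-noise outputs."""
    outputs_h0 = np.asarray(outputs_h0)
    outputs_h1 = np.asarray(outputs_h1)
    return [(float(np.mean(outputs_h0 >= T)), float(np.mean(outputs_h1 >= T)))
            for T in thresholds]

# illustrative usage with made-up detector outputs
pts = roc_points([0.1, 0.2, 0.6, 0.9], [0.4, 0.7, 0.8, 0.95], [0.5])
```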
4.2. Detection curves.
In this section we study how the network performs under changes in the training signal-to-noise ratio (TSNR), by means of
the detection curves. As we can see in Figure 2, if you train with a very low TSNR, the value of Pfa will limit the
detection capabilities of the net. If you train with a very high TSNR, the value of Pfa will not be a limitation for the detection
capabilities, but the net does not present better detection performance than one trained with a lower TSNR (around 6 dB). In
other words, train your net in adverse conditions (low TSNR), and it will be a better detector for the same SNR than if you train
it in favourable conditions (high TSNR). But if the training conditions are too adverse you will never get a Pd high enough for
practical purposes.

Figure 2. Detection probability (Pd) vs. Signal-to-Noise Ratio (SNR) for an MLP with 8 nodes in 1 hidden layer and 4
different training signal-to-noise ratios (TSNR). (a) Pfa = 0.01; (b) Pfa = 0.001.

In this experiment, the net that generally performs best is the one trained with 6 dB. This has been found empirically. Finding
an optimal net or an analytical expression for the TSNR-detection performance relationship is still an open question.
5. Conclusions
We can summarize this paper in the following points.
a) A three-layer neural network is probably sufficient to achieve your design requirements in terms of detection and false
alarm probabilities (or error probability). The net can be an all-real-coefficient one with an even number of inputs (twice the
number of input samples) and one node in the output layer.
b) Increasing the complexity of the network does not improve the performance of your net for a specific task. It is better to
optimize the relationship among the signal-to-noise ratio for training (TSNR), the detection probability (Pd) and the false alarm
probability (Pfa). Increasing the complexity of the network likely improves the robustness of the detector, or it could be
necessary for more complex models; but the training time increases.
c) Training the net with a low training signal-to-noise ratio (TSNR) will improve its detection performance. But if the TSNR
is too low, the detection capabilities are seriously degraded. On the other hand, if you train your net with a high training signal-to-
noise ratio, it will only be efficient for high input signal-to-noise ratios, and that is not desirable.
d) As a general rule, the net should be trained with an intermediate TSNR (the minimum TSNR depends on Pfa) and the
threshold value then adjusted to achieve the design requirement on Pfa. If this is impossible, a higher training signal-to-noise
ratio has to be used.
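The threshold adjustment in point (d) can be sketched as choosing T from the empirical distribution of noise-only detector outputs so that the design Pfa is met; the linearly spaced outputs in the usage example are purely illustrative.

```python
import numpy as np

def threshold_for_pfa(noise_outputs, pfa):
    """Pick the output threshold T so that approximately a fraction
    `pfa` of noise-only outputs exceeds it (Monte Carlo estimate:
    T is the (1 - pfa) quantile of the noise-only outputs)."""
    return float(np.quantile(np.asarray(noise_outputs), 1.0 - pfa))

# illustrative usage: uniformly spread noise-only outputs in [0, 1]
T = threshold_for_pfa(np.linspace(0.0, 1.0, 101), pfa=0.1)
```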
6. References
[1] D.E. Rumelhart, G.E. Hinton, R.J. Williams, "Learning Internal Representations by Error Propagation", in D.E.
Rumelhart & J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition,
Vol. 1: Foundations, MIT Press, 1986.
[2] M.S. Kim and C.C. Guest, "Modification of Backpropagation Networks for Complex-Valued Signal Processing in
Frequency Domain", IEEE Proceedings of IJCNN, Vol. III, pp. 27-31, San Diego, June 1990.
[3] D.R. Hush and B.G. Horne, "Progress in Supervised Neural Networks: What's New Since Lippmann?", IEEE Signal
Processing Magazine, pp. 8-36, January 1993.
[4] R.A. Jacobs, "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, vol. 1, pp.
295-307, 1988.
[5] D.R. Hush and J.M. Salas, "Improving the Learning Rate of Back-propagation with the Gradient Reuse Algorithm",
Proc. IEEE Second Int. Conf. on Neural Networks, vol. I, pp. 441-447, San Diego, 1988.