Principal component analysis in the wavelet domain: New features for
underwater object recognition
Gordon Okimoto*a and David Lemonds**b
aTrex Enterprises, Inc., 3398 Manoa Road, Honolulu, HI 96822
bORINCON Corporation, 970 North Kalaheo Ave., Ste. C-215, Kailua, HI 96822
ABSTRACT
Principal component analysis (PCA) in the wavelet domain provides powerful features for underwater object recognition
applications. The multiresolution analysis of the Morlet wavelet transform (MWT) is used to pre-process echo returns from
targets ensonified by biologically motivated broadband signals. PCA is then used to compress and denoise the resulting
time-scale signal representation for presentation to a hierarchical neural network for object classification. Wavelet/PCA
features combined with multi-aspect data fusion and neural networks have resulted in impressive underwater object
recognition performance using backscatter data generated by simulated dolphin echolocation clicks and bat-like linear
frequency modulated (LFM) upsweeps. For example, wavelet/PCA features extracted from LFM echo returns have resulted
in correct classification rates of 98.6% over a six target suite, which includes two mine simulators and four clutter objects.
For the same data, ROC analysis of the two-class mine-like versus non-mine-like problem resulted in a probability of
detection (Pd) of 0.981 and a probability of false alarm (Pfa) of 0.032 at the "optimal" operating point. The wavelet/PCA
feature extraction algorithm is currently being implemented in VLSI for use in small, unmanned underwater vehicles
designed for mine-hunting operations in shallow water environments.
Keywords: principal component analysis; Morlet wavelet transform; hierarchical neural network; biomimetic systems;
underwater object recognition.
1. INTRODUCTION
Biomimetics is the attempt to emulate in hardware and software the natural biosonar of animals such as the bottlenose
dolphin and big brown bat. These creatures exhibit extraordinary echolocation capabilities in acoustically harsh
environments by exploiting structural cues in the acoustic backscatter generated by their broadband transmit signals[1].
Although the higher level processing responsible for the observed performance of these animals is not well understood, we
have nevertheless implemented a biomimetic system for underwater object recognition that uses dolphin-like echolocation
clicks and bat-like LFM upsweeps to ensonify targets of interest. In our approach, a compact set of features is extracted from
the echo return based on PCA compression in the wavelet transform domain. The resulting features are then presented to a
hierarchical neural network for aspect fusion and classification processing. Figure 1 (next page) illustrates the signal
processing that we have implemented using the MWT and PCA. In general, a set of features should be parsimonious and
faithful in order to alleviate the curse of dimensionality that plagues real-world pattern recognition systems[2]. In this study, we
show that PCA compression in the wavelet domain provides such features and results in a robust system that generalizes well
from training to test data (see Section 7).
This paper focuses on the specification and evaluation of signal features obtained by performing PCA in the wavelet
transform domain (wavelet/PCA features). In Section 2, we provide a brief description of the data that are used to evaluate
the impact of wavelet/PCA features on system performance. We note that wavelet/PCA features conform to the so-called
"expansion/compression" (E/C) paradigm that we describe in Section 3. Section 4 provides an overview of the Morlet
wavelet transform that implements the expansion phase of the E/C paradigm. Section 5 provides a discussion of PCA, which
is used to implement the compression phase of E/C. We discuss the evaluation of wavelet/PCA features using neural
networks in Section 6 and present performance results in Section 7. In Section 8 we present conclusions and ideas for future work.
*Correspondence: email: gokimoto@thermotrex; telephone: 808 988 7158
** Correspondence: email: lemonds@orinconhi.com; telephone: 808 254 1532
Part of the SPIE Conference on Detection and Remediation Technologies
for Mines and Minelike Targets IV • Orlando, Florida • April 1999
SPIE Vol. 3710 • 0277-786X/99/$10.00
Downloaded From: http://proceedings.spiedigitallibrary.org/ on 07/31/2016 Terms of Use: http://spiedigitallibrary.org/ss/TermsOfUse.aspx
The data consists of echo returns from two distinct target suites, one of which was ensonified with a simulated dolphin
echolocation click and the other with an LFM upsweep. The dolphin-like data set was generated using the WAU1 biosonar
transducer which ensonifies underwater targets using a simulated dolphin echolocation click. The targets are suspended in
the water column of a test tank facility located at the Hawaii Institute of Marine Biology (HIMB) on Coconut Island,
Kaneohe, HI. Each click is approximately 50 microseconds in duration and is highly impulsive and broadband, with a peak
frequency in excess of 100 kHz and a sound pressure level of around 200 dB (see Figure 2). Four targets of identical
dimensions but of differing material composition are ensonified by the WAU1 at a distance of 2 meters. The target types are:
1) foam-filled aluminum cylinder, 2) coral rock cylinder, 3) hollow aluminum cylinder and 4) hollow stainless steel cylinder.

Figure 2. Dolphin-like data collection and pre-processing

Each target is rotated about its vertical axis from 0 degrees aspect (broadside) to 90 degrees aspect (left-end view) in
5-degree increments. Fifty echoes consisting of 1024 samples each are collected, range-gated and digitized at a rate of 1
MHz for each aspect angle. Each echo is then peak-centered using a matched filter based on the transmitted signal. An echo
consisting of 512 samples is then extracted based on the signal peak provided by the matched filter. Only the first 20 echoes
from each aspect are actually used. The echo data are generally of good quality, with signal-to-noise ratio (SNR) varying
significantly with aspect angle.
The LFM backscatter data were provided by Dr. Gerald Dobeck of Coastal Systems Station, Dahlgren Division, NSWC,
Panama City, FL. These data consist of echo returns from a six-target suite which includes: 1) a bullet-shaped mine
simulator; 2) a conical mine simulator; 3) a water-filled fifty gallon drum; 4) an irregularly shaped limestone rock; 5) a
smooth granite rock; and 6) a water-soaked log. Each target was ensonified by a single 20 kHz to 60 kHz LFM upsweep at 5-
degree increments from 0 to 355 degrees, resulting in 72 echo returns per target. An H52 hydrophone placed between the F33
Figure 1. Biomimetic signal processing using wavelet/PCA features

2. DATA

(Figure 2 annotations: simulated dolphin click, peak frequency near 100 kHz; cylindrical targets: 1) foam-filled aluminum, 2) coral rock, 3) hollow aluminum, 4) stainless steel; 1 MHz sample rate; 512-point data records; 0 to 90 degrees in 5-degree increments; 1,520 total samples.)
LFM acoustic projector and the target was used to detect the backscatter signal which was sampled at 2 MHz. Each echo
return was pre-processed to remove anomalous scatterers (e.g. target mount) and multipath reflections. The resulting time
series was then down-sampled to 1024 points at 250 kHz to reduce the amount of data (see Figure 3). Aspect dependent
reverberation was then added to each echo return at the 12 dB level with respect to a class-dependent reference amplitude[3].
The synthetic reverberation was modeled by convolving the LFM transmit signal with a Gaussian white noise process.
Figure 3. LFM data collection and pre-processing. (Transmit signal: linear FM upsweep, 20 kHz to 60 kHz. Pre-processing: remove biases, remove anomalous scatterers, add aspect-dependent reverberation at 12 dB, downsample to 1024 points at 250 kHz.)
3. THE EXPANSION/COMPRESSION PARADIGM[4]
The expansion/compression (E/C) paradigm is a general approach for extracting features for pattern recognition applications.
Essentially, the E/C paradigm first expands the input signal in some transform domain and then compresses the resulting
expansion for presentation to a classifier. The aim of the expansion phase is to better separate signal from noise and to "pre-
whiten" nonstationary and non-Gaussian noise backgrounds (e.g., fractal noise). Studies have shown that the orthogonal
wavelet transform of fractal noise is Karhunen-Loeve-like in terms of correlation structure[5]. This implies that the wavelet
transform can be used to pre-condition a signal to enhance signal-to-noise ratio (SNR) by concentrating signal information in
a small number of non-zero coefficients. In this study, we have implemented the expansion phase of the E/C paradigm as a
transformation to the time-scale domain using the Morlet wavelet transform (MWT). Although the MWT is not orthogonal,
it is multiresolution and still provides some degree of signal/noise separation and background equalization. Moreover, the
redundancy of the MWT provides a signal representation with features that are appealing and more easily interpretable than
those provided by an orthogonal wavelet transform.
Assuming that signal and noise are better conditioned in the wavelet domain, we expect to obtain better features by
compressing the wavelet transform of the signal rather than the signal itself. We have implemented the compression phase of
the E/C paradigm using standard PCA based on the singular value decomposition (SVD) of the wavelet data matrix. PCA is
a classical statistical technique that: 1) decorrelates the wavelet coefficients of the echo return; 2) removes the pre-
conditioned noise background; and 3) reduces the dimensionality of the wavelet feature vector that is presented to the neural
classifier. SVD was chosen to compute the wavelet/PCA features because it operates directly on the wavelet data matrix and
is more numerically stable than standard PCA.
We note that any number of signal transforms can be used to implement the expansion phase of the E/C paradigm.
Examples include the short-time Fourier transform, orthogonal wavelets such as those from the Daubechies family, adapted
wavelet packets or the Wigner-Ville transform. Similarly, any number of variations of PCA can be used to implement the
compression phase of E/C. These include PCA based on the Fisher criterion, non-linear PCA, PCA neural networks and
independent component analysis. Consequently, any transform/PCA pair results in a distinct set of features that may be
useful for underwater object recognition applications.
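As a concrete sketch of the E/C paradigm, the fragment below expands synthetic echoes with a magnitude STFT (a stand-in for any of the expansions named above) and compresses the result with SVD-based PCA. The window sizes, data shapes, and number of retained components are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Expansion phase: map each echo into a (redundant) transform domain.
# A magnitude STFT stands in here for the wavelet transform; any of the
# expansions named in the text would fill the same role.
def expand(echo, win=32, hop=16):
    frames = [echo[i:i + win] * np.hanning(win)
              for i in range(0, len(echo) - win + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).ravel()  # flatten to a vector

# --- Compression phase: PCA of the expanded training vectors via SVD.
def pca_fit(X, m):
    Xc = X - X.mean(axis=0)                     # demean the data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:m]                               # m x N projection matrix P

echoes = rng.standard_normal((40, 512))         # 40 synthetic "echo returns"
X = np.stack([expand(e) for e in echoes])       # expansion
P = pca_fit(X, m=10)
features = (X - X.mean(axis=0)) @ P.T           # compression: 40 x 10 features
```

The resulting feature rows are mutually uncorrelated, which is the property the compression phase is after.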
(Figure 3 annotations: targets: 1) bullet mine, 2) cone-shaped mine, 3) 50 gal. drum, 4) limestone rock, 5) granite rock, 6) log; 2 MHz sample rate; 8192-point data records; 0 to 355 degrees in 5-degree increments; 432 total samples; 216 training and 216 test samples.)
4. THE MORLET WAVELET TRANSFORM

Conventional signal processing based on the Fourier transform utilizes analysis windows of fixed size, and is hence scale dependent. Wavelet signal processing, on the other hand, analyzes signals over a range of scales using wavelets of different time durations, thus providing a multiresolution signal analysis that is scale independent[6]. For each scale, there corresponds a time-domain wavelet and an associated filter in the frequency domain. The wavelet can be viewed as having a constant time duration or scale of resolution and is used as a replica for correlation processing of the echo return in time. The associated filter, which is the Fourier transform of the wavelet function, has a center frequency and bandwidth dependent on the time resolution of the wavelet. In this study, the higher scales correspond to the more compressed wavelets and hence, by the uncertainty principle, to filters having higher center frequencies and wider bandwidths.

All wavelet functions are derived from a single mother wavelet. The Morlet mother wavelet is defined as

ψ(u) = π^(−1/4) e^(iηu) e^(−u²/2)     (1)

where the real parameter η can be interpreted as the center frequency of the associated filter. Figure 4(a) shows the Morlet mother wavelet in the time domain, and Figure 4(b) shows the corresponding filter in the frequency domain.

Figure 4. (a) Morlet mother wavelet in the time domain, and (b) corresponding filter in the frequency domain

A time-scale analysis of signals is achieved by correlating the signal f with translated and dilated versions of the Morlet mother wavelet. The wavelets are defined by

ψ_{s,t}(u) = |s|^(−1/2) ψ((u − t)/s)     (2)

where s is a continuous scale parameter and t is a continuous translation parameter. The continuous MWT of a signal f is defined as

f̃(s,t) = ⟨f, ψ_{s,t}⟩ = ∫ f(u) ψ*_{s,t}(u) du     (3)
where ψ* is the complex conjugate of ψ and ⟨·,·⟩ is the inner product in L²(ℝ). By Parseval's identity we have that

f̃(s,t) = |s|^(1/2) {ψ̂*(sω) f̂(ω)}^∨(t)     (4)

where (·)^∨ is the inverse Fourier transform of (·) and ψ̂, f̂ denote the Fourier transforms of ψ and f. That is, as a function of t, f̃(s,t) is the inverse Fourier transform of the product ψ̂*(sω) f̂(ω) with respect to the variable ω, with s playing the role of a parameter. This result illustrates how the MWT operates in the frequency domain and allows us to efficiently compute f̃(s,t) using the Fourier transform.
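The frequency-domain route of equation (4) can be sketched as follows: one row of the MWT is the inverse FFT of ψ̂*(sω)f̂(ω), scaled by √s. The Morlet parameterization (center frequency w0 = 5, playing the role of η) and the normalization are assumptions made for illustration, not the paper's exact settings.

```python
import numpy as np

def morlet_hat(w, w0=5.0):
    """Fourier transform of an (approximately admissible) Morlet wavelet.
    w0 is the nondimensional center frequency; the correction term is
    negligible for w0 >= 5.  This parameterization is an assumption."""
    return np.pi ** -0.25 * np.exp(-0.5 * (w - w0) ** 2) * (w > 0)

def cwt_at_scale(f, s, dt=1.0, w0=5.0):
    """One row of the MWT per eq. (4): the inverse FFT of
    conj(psi_hat(s*w)) * f_hat(w), scaled by sqrt(s)."""
    n = len(f)
    w = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    f_hat = np.fft.fft(f)
    return np.sqrt(s) * np.fft.ifft(np.conj(morlet_hat(s * w, w0)) * f_hat)

# A tone at 0.05 cycles/sample should respond most strongly at the scale
# that tunes the wavelet's center frequency w0/(2*pi*s) to the tone.
t = np.arange(1024)
sig = np.cos(2 * np.pi * 0.05 * t)
scales = np.arange(4, 40)
energy = [np.abs(cwt_at_scale(sig, s)).max() for s in scales]
best = scales[int(np.argmax(energy))]   # expected near w0/(2*pi*0.05) ~ 16
```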
Because the signal representation provided by the MWT is highly redundant, it is possible to subsample equation (3) over
scale and time without losing the information that is necessary for the reconstruction of the original signal f That is, we
dyadically subsample the continuous signal representation provided by equation (3) at the discrete lattice points (q, p)
where we set s_q = 2^q, t_p = p τ₀ 2^q, and τ₀ = the sampling period of f (which we now regard as a discrete signal). We
then use equation (4) to compute f̃(s_q, t_p) = f̃(q, p) using wavelets of the form

ψ_{p,q}(u) = 2^(−q/2) ψ(2^(−q)u − pτ₀).     (5)

In practice, the time and scale indices p and q have finite ranges since the signal f is usually discrete and is assumed to
have finite time resolution.
For underwater object recognition, the dyadic subsampling of the Morlet filter bank described above is much too coarse in
the frequency domain and results in the loss of spectral information that could hurt classification performance. Our
implementation of the MWT spectrally decomposes a signal over three octaves with 16-20 filters per octave using a
technique known as voicing. This technique uses multiple "mother" wavelets to more densely cover the frequency domain in
order to prevent the loss of spectral information that would have otherwise occurred had we dyadically spaced the filter-bank
using just a single mother wavelet[7]. Specifically, given a mother wavelet ψ, the general form of the voices
ψ⁰, ψ¹, ψ², ..., ψ^(N−1) is shown in equation (6) below

ψ^j(u) = 2^(−j/N) ψ(2^(−j/N) u)     (6)

for j = 0, 1, 2, ..., N−1. Each voice generates a discrete, constant-Q filterbank that is shifted in frequency but aligned in
time. Because the filterbank associated with each voice is time-aligned we are able to capture all the lattice points for each
voice in the 2-dimensional time-scale domain that corresponds to a given sample point in the signal. At the same time, we
have closed the gaps in the frequency domain. Figure 5 shows a sequence of filterbanks that cover three octaves in the
frequency domain for an LFM echo return (these octaves correspond to scales 6, 7, and 8, which is where the energy for both
the dolphin-like and LFM data resides). Note how the filters widen as frequency increases, illustrating the constant-Q nature
of the Morlet filterbank. Also note that had voicing not been used, we would have had just one filter to cover each octave,
which is clearly not sufficient for spectral coverage without gaps.
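The voiced scale lattice above can be sketched directly. Three octaves and N = 20 voices follow the text, giving the 60-filter bank of Figure 5; the convention that scale s = 2^(q + j/N) indexes octave q and voice j is an assumption about the paper's bookkeeping.

```python
import numpy as np

# Voiced scale lattice: N voices per octave fill the gaps that a purely
# dyadic (one-filter-per-octave) spacing would leave.
N = 20
octaves = [6, 7, 8]
scales = np.array([2 ** (q + j / N) for q in octaves for j in range(N)])

# Uniform spacing on a log-frequency axis: adjacent scales always differ
# by the factor 2**(1/N), so each filter keeps the same Q (bandwidth
# proportional to center frequency).
ratios = scales[1:] / scales[:-1]
```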
Figure 5. Morlet filterbank for an LFM echo return: (a) scale 6, (b) scale 7 and (c) scale 8
The magnitude of the output of each filter is sampled at every sample point of the signal resulting in a 60x1024 image
(assuming 20 filters per octave) that represents a highly redundant analysis of the signal over scale and time. This two-
dimensional representation is known as a scalogram (see Figure 6(b)). Each pixel of the scalogram represents the amount of
signal energy at a given time corresponding to the output of a filter associated with a wavelet at a given scale. Each
horizontal scan represents the signal's energy distribution over time with respect to a filter at a fixed scale and each vertical
scan represents the signal's energy distribution over the entire filter-bank with respect to a fixed time point of the signal.
Note also in Figure 6(b) how the MWT separates signal from reverberation background, especially at the higher resolution
scales.
Figure 6. (a) LFM echo return from 50 gallon drum, and (b) scalogram of echo return
The Morlet scalogram in Figure 6(b) is composed of 61,440 coefficients and is an extremely rich but dense signal
representation that is much too large for direct input to a neural classifier. To address this problem we must find a way to
compress the wavelet representation without losing important information. The first step we take is to bin-average in scale
and time to produce a 15 by 16 representation that is raster-scanned to a vector with 240 coefficients. Although the number
of coefficients has been reduced significantly, the dimensionality of the feature vector is still too high. One way to reduce
the dimensionality even further is to extract the principal components of the wavelet data matrix whose rows are equal to the
bin-averaged wavelet coefficients of the time series data. For example, after applying PCA compression to the wavelet
transform of the LFM backscatter data, we end up with feature vectors of length thirty.
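The bin-averaging step above can be sketched as follows: a 60 x 1024 scalogram (scales x time) is reduced to 15 x 16 by averaging rectangular bins of 4 scales x 64 time samples, then raster-scanned to a 240-vector. The even divisibility of the bins is assumed, as it is for the sizes in the text.

```python
import numpy as np

def bin_average(scalogram, out_rows=15, out_cols=16):
    r, c = scalogram.shape
    br, bc = r // out_rows, c // out_cols          # bin sizes: 4 and 64 here
    binned = scalogram.reshape(out_rows, br, out_cols, bc).mean(axis=(1, 3))
    return binned.ravel()                          # raster scan to a vector

scalogram = np.abs(np.random.default_rng(0).standard_normal((60, 1024)))
feat = bin_average(scalogram)                      # length-240 feature vector
```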
5. PRINCIPAL COMPONENT ANALYSIS IN THE WAVELET DOMAIN

PCA is a classical statistical technique for characterizing the linear correlation that exists in a set of data[10]. It is closely
related to the Karhunen-Loeve transform (random processes), singular value decomposition (matrix diagonalization) and
factor analysis (correlation structure of multivariate stochastic observations). It has recently been getting much attention as a
means of extracting features for pattern recognition applications. The primary goal of PCA in pattern recognition is to find a
linear transformation that maps a vector of noisy, correlated time domain components into a much smaller vector of
denoised, uncorrelated feature components.
Let A = [x₁, x₂, ..., x_K]^T be a data matrix whose rows are K noisy data vectors x_i of dimension N with correlated
components (where superscript T is the matrix transpose operator). We desire a linear transformation P such that the
vector y_i = P x_i has uncorrelated, denoised components and dimensionality M much smaller than N (i.e., M << N).
PCA states that there is an orthogonal matrix V and a diagonal matrix D such that A^T A = V D V^T. Note that A^T A is
essentially the covariance matrix of the data {x₁, x₂, ..., x_K}. The columns of V are eigenvectors of A^T A and form an
orthonormal basis for ℝ^N, while the diagonal entries of D are the eigenvalues λ_j of A^T A and are ordered so that
λ_j ≥ λ_{j+1} for j = 1, 2, ..., N−1. Now choose the eigenvectors of V that correspond to the M largest eigenvalues, where
M << N, and form the matrix V′ whose columns are equal to these eigenvectors. Then P : ℝ^N → ℝ^M defined by
P = (V′)^T is the linear transformation we seek, since it maps an N-dimensional vector into an M-dimensional vector where
M << N. The M components of y_i = P x_i are known as the principal components of x_i. The resulting feature vector
is also denoised due to the truncation of those eigenvectors of V whose associated eigenvalues are below a certain
threshold. We note that the eigenvectors that were ignored may contain information that is useful for classification and one
needs to be careful that this information is not lost in the thresholding process. Usually though, a visual analysis of a plot of
the eigenvalues makes it clear where signal ends and noise begins and where the threshold should be set.
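The visual knee-finding described above can also be automated by retaining enough components to capture a target fraction of the total "energy" (85% per the text). The synthetic eigenvalue spectrum below, with a knee followed by a flat noise floor, loosely mimics the shape of Figure 7; all numbers in it are illustrative, not the paper's data.

```python
import numpy as np

# Smallest M whose leading eigenvalues capture `frac` of the total energy:
# an automated stand-in for eyeballing the eigenvalue plot.
def n_components_for_energy(eigvals, frac=0.85):
    lam = np.sort(np.asarray(eigvals))[::-1]     # descending eigenvalues
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, frac) + 1)

# Synthetic 240-eigenvalue spectrum: a decaying signal portion followed by
# a flat noise floor across the remaining components.
lam = np.concatenate([np.linspace(400.0, 20.0, 30), np.full(210, 2.0)])
M = n_components_for_energy(lam, frac=0.85)
```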
Figure 7 shows a plot of the 240 eigenvalues obtained for the LFM training data set. Note that the plot is essentially flat at
some nominal value starting at about the 30th eigenvalue; that is, about 85% of all the "energy" in the data is captured by the
first 30 principal components. The remaining principal components span the noise subspace. This implies that we use the
top 30 eigenvectors of V to construct a transformation matrix that generates the principal wavelet components that
characterize the signal subspace for both the training and test data. To the extent that our training set characterizes the
universe of possibilities, the retained eigenvectors will allow the neural network to interpolate over to the test cases. Similarly,
the eigenvalues for the dolphin-like training data can be plotted. The eigenvalues for the dolphin-like data level off at about
the 45th principal component, capturing a little over 90% of the variation in the training data. The eigenvectors associated
with these 45 eigenvalues are used to construct the PCA compression matrix for the dolphin-like training and test data.
Figure 7. Eigenvalue plot for LFM training data (even angles)
We have implemented PCA by taking the SVD of the wavelet data matrix whose rows are equal to the wavelet transforms of
the echo returns in the training data[11]. Note that SVD operates directly on the wavelet data matrix and precludes the
computation of the data covariance matrix, and is hence more numerically stable than standard PCA. Essentially, if A is
an M×N matrix, then there are orthogonal matrices U and V and a diagonal matrix Σ = {σ_j} such that

A = U Σ V^T     (7)

where U is M×M, V is N×N, and Σ has the same dimensions as A. The columns of U and V are known as the left
and right singular vectors of A, respectively, while the diagonal elements σ_j of Σ are called the singular values of A.
We note that: 1) the orthogonal matrix V is the same for both SVD and PCA; and 2) the eigenvalues of PCA are related to
the singular values of SVD by λ_j = σ_j². It follows that if A is the demeaned wavelet data matrix, then the SVD of
(1/√K)A results in the PCA of (1/K)A^T A (based on the covariance matrix of the training data). As shown above, we
can then construct a linear transformation P that generates the principal components of the wavelet coefficients.
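The SVD/PCA correspondence stated above can be checked numerically: the singular values of A/√K squared equal the eigenvalues of the covariance (1/K)AᵀA, with no covariance matrix ever formed on the SVD route. A random matrix stands in for the wavelet data matrix; its shape is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 200, 40
A = rng.standard_normal((K, N))
A = A - A.mean(axis=0)                       # demeaned "wavelet data matrix"

# Route 1: SVD of A / sqrt(K) -- numerically preferred, no covariance formed.
_, svals, _ = np.linalg.svd(A / np.sqrt(K), full_matrices=False)

# Route 2: eigendecomposition of the covariance (1/K) A^T A.
lam = np.linalg.eigh(A.T @ A / K)[0][::-1]   # eigenvalues, descending

match = np.allclose(lam, svals ** 2)         # lambda_j = sigma_j^2
```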
Modern time-frequency signal representations such as the wavelet transform are highly redundant and are often unsuitable
for direct input to a classification algorithm. Indeed, the initial impetus for the development of the wavelet transform was to
produce a more intuitive and appealing way of presenting the essential features of signals for visual inspection. To the
human eye, the wavelet representation is quite informative in that signal, clutter and noise are often well separated and better
conditioned. But the price paid for this visual acuity is high dimensionality. PCA alleviates this curse of dimensionality by
compressing the wavelet representation into a few uncorrelated features that characterize the main features of the signal. The
effect of PCA compression is especially pronounced in the wavelet domain since the wavelet coefficients are often very well
conditioned in terms of SNR and noise equalization.
6. PERFORMANCE EVALUATION OF WAVELET/PCA FEATURES

The multilayer perceptron (MLP) is used as the baseline neural classifier for evaluation of wavelet/PCA features. The design
we have chosen uses hyperbolic tangent activation functions for the hidden nodes and logistic activations for the output
nodes. The net is trained using backpropagation to output a value of 0.9 for the node associated with a target class and 0.1
for the remaining output nodes. The node with the highest output value determines the class declaration of the network.
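The forward pass of this baseline MLP can be sketched as follows, with tanh hidden units, logistic outputs, and the class declared by the largest output. Sizes follow the 6-class LFM net (90x45x6); the random weights are placeholders standing in for the backpropagation-trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.standard_normal((45, 90)), np.zeros(45)   # hidden layer
W2, b2 = 0.1 * rng.standard_normal((6, 45)), np.zeros(6)     # output layer

def forward(x):
    h = np.tanh(W1 @ x + b1)                       # hyperbolic tangent hidden
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # logistic outputs in (0, 1)

x = rng.standard_normal(90)        # a fused 90-component feature vector
out = forward(x)
declared = int(np.argmax(out))     # node with the highest output wins
```

Training would drive the winning node toward 0.9 and the rest toward 0.1, as described above.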
Neural networks with six, four and two output nodes were implemented to solve the six, four and two class problems,
respectively. A six-class net was implemented to classify backscatter from the six-target LFM target suite. A four-class net
was implemented for the dolphin-like data set based on the four target cylinders. Two-class nets were implemented for the
mine-like vs. non-mine-like and the man-made vs. non-man-made scenarios with respect to the LFM target suite. Here, the
two mine simulators were combined into the mine-like class while the remaining objects comprised the non-mine-like class.
For the man-made vs. non-man-made scenario, the two mine simulators and 50-gallon drum were combined to form the
man-made class while the other three objects comprised the non-man-made class. Finally, a two-node net was implemented
to address the man-made vs. non-man-made problem for the dolphin-like backscatter data. In this case, the coral rock
cylinder comprised the non-man-made class while the hollow aluminum, stainless steel and foam-filled aluminum cylinders
were grouped to form the man-made class.
All available data for a given scenario are divided into training and crossvalidation/test sets for neural network training.
When the mean-squared-error on the crossvalidation/test data set begins to increase, training is stopped. This approach
prevents over-training as the number of training exemplars is small, especially for the LFM data. As indicated above,
training data consists of exemplars based on even aspect angles, while crossvalidation/test exemplars are based on data
collected from odd aspect angles, i.e., the neural classifier is trained on even angles to maximize classification performance
on the odd angles. This training strategy was adopted to evaluate the interpolation capabilities of the net using wavelet/PCA
feature patterns.
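The even/odd aspect split described above can be sketched as follows: train on even angles, test on odd, forcing the classifier to interpolate between trained aspects. The 5-degree grid follows the text; the dict-of-angles layout is an illustrative assumption.

```python
import numpy as np

def split_by_aspect(exemplars):
    """exemplars: dict mapping aspect angle (deg, 5-degree grid) -> features."""
    train = {a: v for a, v in exemplars.items() if a % 10 == 0}  # 0, 10, ...
    test = {a: v for a, v in exemplars.items() if a % 10 == 5}   # 5, 15, ...
    return train, test

data = {a: np.random.default_rng(a).standard_normal(30)
        for a in range(0, 360, 5)}
train, test = split_by_aspect(data)    # 36 even-angle, 36 odd-angle exemplars
```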
Multi-aspect data fusion was implemented by concatenating three exemplars that were separated by 30 degrees across all
aspect angles. This procedure increased the size of the input feature vector to the neural network threefold. But because the
PCA-compressed single-aspect LFM exemplars were only 30 components long, the final input vector had a length of only 90
components. Had we instead concatenated three uncompressed LFM exemplars of length 240, the input feature vector would
have been 720 components long. The greatly reduced size of the multi-aspect feature vector significantly improved LFM
classification performance (see Section 7). The same remarks also apply to the dolphin-like data. In general, PCA
compression in the wavelet domain enhances the positive effect that multi-aspect data fusion has on classification
performance because of the reduced size of the concatenated multi-aspect feature vector.
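The multi-aspect fusion described above amounts to concatenating three single-aspect feature vectors taken 30 degrees apart: with 30-component PCA features the fused vector has 90 components, versus 720 had the uncompressed 240-vectors been used. The wrap-around at 360 degrees is an assumption about the pairing scheme.

```python
import numpy as np

def fuse_aspects(features_by_angle, spacing=30):
    """Concatenate each aspect's features with those 30 and 60 degrees away."""
    fused = {}
    for a in features_by_angle:
        trio = [features_by_angle[(a + k * spacing) % 360] for k in (0, 1, 2)]
        fused[a] = np.concatenate(trio)
    return fused

feats = {a: np.random.default_rng(a).standard_normal(30)
         for a in range(0, 360, 5)}                 # 72 single-aspect vectors
fused = fuse_aspects(feats)                         # 72 fused 90-vectors
```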
We note that the two-class results using wavelet/PCA features were obtained using hierarchical neural networks. That is, an
optimal, single-aspect 6-class neural network was trained using wavelet/PCA features. The 6-dimensional output vectors
generated by the trained net were then concatenated to form multi-aspect feature vectors of dimension 18 and input to a small
2-class neural network. This two-stage process resulted in the best ROC curves for the two-class problems involving the
LFM and dolphin-like data sets.
7. RESULTS
Confusion matrices and ROC curves summarizing classification performance are presented below. Results for the LFM data
are presented first, followed by results for the dolphin-like data. The entries in all confusion matrices are counts with respect
to the total number of test exemplars. All results use wavelet/PCA features together with 3-ping multi-aspect data fusion.
The average correct classification rate over the six target classes of the LFM dataset using wavelet/PCA features was 98.6%
(see Figure 8). The test data consisted of 36 exemplars from each target class corresponding to the odd angles that range
from 5 through 355 degrees for a total of 216 test exemplars. We note that the neural classifier was trained on the even
angles from 0 through 350 degrees for a total of 216 training exemplars. Hence the neural classifier was required to
interpolate between the even angles to identify echo returns at odd aspect angles. As Figure 8 shows, the combination of
wavelet/PCA features and multi-aspect data fusion enables the net to do a very good job of interpolating between the even
angles in order to classify the odd angles.
Figure 8. Six-class confusion matrix for LFM data (216 training and 216 test samples; peak-centered signal extraction; 6-class MLP with 90x45x6 nodes; targets: 1) bullet mine, 2) cone-shaped mine, 3) 50 gal. drum, 4) limestone rock, 5) granite rock, 6) log; Avg Pcc = 0.986)
Figure 9 shows ROC curves for the mine-like vs. non-mine-like and the man-made vs. non-man-made problems using LFM
data. The "knee" of the mine-like vs. non-mine-like ROC curve of Figure 9(a) corresponds to a Pd of 0.982 and a Pfa of
0.032. The knee of the man-made vs. non-man-made ROC curve of Figure 9(b) corresponds to a Pd of 0.968 and a Pfa of
0.033. These performance results compare very well to other results obtained to date for this data set.
(2-class MLP: 18x10x2 nodes; 216 training and 216 test exemplars)
Figure 9. ROC curves for LFM data: (a) mine-like vs.
non-mine-like, and (b) man-made vs. non-man-made
What follows are performance results for the dolphin-like data. Figure 10 shows the confusion matrix for the 4-cylinder
target suite of the dolphin-like data set using wavelet/PCA features. An average correct classification rate of 96.7% over the
four target cylinders was recorded. We note a number of misses involving the foam-filled aluminum cylinder. The data
record length was 512 points, and we believe that taking a longer data record may help in alleviating the "miss" problem for
the foam-filled aluminum cylinder. Still, the overall rate of correct classification is quite impressive considering that
discrimination is based on material composition alone.
(Classes: foam-filled aluminum, coral rock, hollow aluminum, stainless steel; Avg Pcc = 0.967)
Figure 10. Four-class confusion matrix for dolphin-like data
Figure 11 (next page) shows the ROC curve for the man-made versus non-man-made problem using the dolphin-like data.
Figure 11(b) shows that the knee of the zoomed ROC curve corresponds to a Pd of 0.972 and a Pfa of 0.0058. A hierarchical
neural network design was used to fuse the multi-aspect outputs of an optimal single-aspect, four-class neural network. We
note the unusually low Pfa for this two-class problem coupled with a very respectable Pd.
Figure 11. ROC curve for man-made vs. non-man-made scenario for dolphin-like data: (a) full ROC curve, (b) zoomed ROC curve
8. CONCLUSIONS AND FUTURE WORK
The principal conclusion based on the results of Section 7 is that wavelet/PCA features are capable of accurately and robustly classifying different target types in a wide range of situations. The effectiveness of these features has been demonstrated using two different data sets, each generated by a different transmit signal and target suite. We note that one of the data sets (LFM) was contaminated with synthetic reverberation whose spectral distribution coincided with that of the signal of interest, making these echo returns especially difficult to classify. We have also demonstrated the effectiveness of combining wavelet/PCA features with multi-aspect data fusion using both data sets. Indeed, we observed that the large dimensionality reduction achieved using PCA in the wavelet domain significantly enhances the effectiveness of multi-aspect data fusion.
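The wavelet/PCA pipeline discussed above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Morlet sampling and normalization, the scale set, the synthetic echoes, and the choice of k = 18 retained components (echoing the 18-input MLP noted with Figure 9) are all assumptions made for the sketch.

```python
import numpy as np

def morlet_scalogram(x, scales, w0=6.0):
    """Magnitude of a Morlet continuous wavelet transform of x.

    Minimal sketch: for each scale, correlate the signal with a sampled
    Morlet wavelet. Edge effects and exact admissibility/normalization
    constants are ignored.
    """
    n = len(x)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1) / s
        psi = np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2) / np.sqrt(s)
        # convolving with the reversed conjugate wavelet = correlation
        out[i] = np.abs(np.convolve(x, np.conj(psi)[::-1], mode="same"))
    return out

def pca_features(echoes, scales, k=18):
    """Flatten each echo's scalogram and project onto the top-k PCs."""
    X = np.stack([morlet_scalogram(e, scales).ravel() for e in echoes])
    X -= X.mean(axis=0)                      # center across exemplars
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                      # k PCA features per echo

rng = np.random.default_rng(0)
echoes = rng.standard_normal((40, 256))      # 40 synthetic "echo returns"
feats = pca_features(echoes, scales=np.array([2.0, 4.0, 8.0, 16.0]), k=18)
print(feats.shape)  # -> (40, 18)
```

The point of the sketch is the compression: a 4 x 256 time-scale image per echo collapses to 18 numbers, which is the kind of dimensionality reduction that makes downstream multi-aspect fusion tractable.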
The results presented in this paper suggest that we continue to exploit the B/C paradigm for new signal features by using different signal expansions (e.g., biorthogonal wavelets, adapted wavelet bases, the Wigner-Ville distribution) and various nonlinear extensions of PCA compression (nonlinear PCA, PCA neural networks, independent component analysis). Also, given the success of wavelet/PCA features using dolphin-like and LFM transmit signals, we intend to investigate the fusion of features extracted from the echo returns generated by these two very different signals off a common target set for enhanced object recognition. Finally, an effort to implement wavelet/PCA features and multi-aspect data fusion in VLSI is currently underway for use in small, unmanned underwater vehicles designed for mine-hunting operations in very shallow water environments. The challenge here is to realize the potential of wavelet/PCA features, multi-aspect data fusion, and similar novel signal processing approaches in real-world settings.
ACKNOWLEDGEMENTS
This work was supported by the Office of Naval Research (ONR) under Contract N00014-98-C-011. The authors wish to thank Dr. Harold Hawkins of ONR for his support and encouragement during this research effort.
REFERENCES
1. W. Au, "Sonar of Dolphins," Springer-Verlag, 1993.
2. R.O. Duda and P.E. Hart, "Pattern Classification and Scene Analysis," John Wiley & Sons, 1973.
3. L.L. Burton and H. Lai, "Active sonar target imaging and classification system," in Proceedings of the SPIE Inter. Symp. on Aerospace/Defense Sensing and Control, vol. 3079, pp. 19-33, Orlando, FL, April 1997.
4. H.H. Szu and P. Watanapongse, "Application of principal wavelet component in pattern classification," in Proceedings of the SPIE Inter. Symp. on Aerospace/Defense Sensing and Control, vol. 3391, pp. 194-205, Orlando, FL, April 1998.
5. G. Wornell, "Signal Processing with Fractals: A Wavelet-Based Approach," Prentice Hall, 1996.
6. R. Young, "Wavelet Theory and its Applications," Kluwer Academic Publishers, 1993.
707
, —---:------------------;----
0.9
0.8
0.7 %..
0.6 .. '
-O504
0.3
0.2
0_I
(.-
099
0.98
0.97
0.96
0.95
0.94
093
0.92
0.91
.::...: ::.
I't
- 0 0 2 0.4 0 6 0.8 I0.02 0 04 0.06 008
P1 pf
Downloaded From: http://proceedings.spiedigitallibrary.org/ on 07/31/2016 Terms of Use: http://spiedigitallibrary.org/ss/TermsOfUse.aspx
7. T. Masters, "Signal and Image Processing with Neural Networks," John Wiley & Sons, Inc., New York, NY, 1994.
8. G. Kaiser, "A Friendly Guide to Wavelets," Birkhauser, 1994.
9. S. Mallat, "A Wavelet Tour of Signal Processing," Academic Press, 1998.
10. K.I. Diamantaras and S.Y. Kung, "Principal Component Neural Networks: Theory and Applications," John Wiley & Sons, Inc., 1996.
11. D. Kalman, "A singularly valuable decomposition: The SVD of a matrix," The College Mathematics Journal, vol. 27, no. 1, January 1996.