Bulletin of the Seismological Society of America, Vol. 92, No. 5, pp. 1660–1674, June 2002
An Automatic, Adaptive Algorithm for Refining Phase Picks
in Large Seismic Data Sets
by C. A. Rowe,* R. C. Aster, B. Borchers, and C. J. Young
Abstract

We have developed an adaptive, automatic, correlation- and clustering-based method for greatly reducing the degree of picking inconsistency in large, digital seismic catalogs and for quantifying similarity within, and discriminating among, clusters of disparate waveform families. Innovations in the technique include (1) the use of eigenspectral methods for cross-spectral phase estimation and for providing subsample pick lag error estimates in units of time, as opposed to dimensionless relative scaling of uncertainties; (2) adaptive, cross-coherency-based filtering; and (3) a hierarchical waveform stack correlation method for adjusting mean intercluster pick times without compromising tight intracluster relative pick estimates. To solve the systems of cross-correlation lags we apply an iterative, optimized conjugate gradient technique that minimizes an L1-norm misfit. Our repicking technique not only provides robust similarity classification and event discrimination without making a priori assumptions regarding waveform similarity as a function of preliminary hypocenter estimates, but also facilitates high-resolution relocation of seismic sources. Although knowledgeable user input is needed initially to establish run-time parameters, significant improvement in pick consistency and waveform-based event classification may be obtained by then allowing the programs to operate automatically on the data. The process shows promise for enhancing catalog reliability while at the same time reducing analyst workload, although careful assessment of the automatic results is still important.
Introduction
Earthquake location and many other traveltime-based seismological applications historically have depended critically on the ability of human analysts to estimate arrival times of body waves. Standard network operations generally involve the manual measuring of P-wave and S-wave arrivals or, more recently, computer identification of these phases using software autopickers. The most common human or computer picking approach is done one event at a time, with records from several or all recording stations. These methods, although well suited for the near-real-time processing demands of network operation, do not necessarily produce consistent phase arrival times (picks), because path effects, signal-to-noise conditions, and source radiation pattern differences within the network of receivers may be large. Picks obtained from the resulting heterogeneous suite of waveforms may thus be highly inconsistent for even very similar
events. These inconsistent picks are then used to calculate the hypocenter, and the event is archived. Seldom are any but the most egregious picking errors noted and corrected prior to moving on to the next earthquake; hence, picking inconsistencies between similar events remain unresolved.

*Present address: Department of Geology and Geophysics, University of Wisconsin–Madison, 1215 W. Dayton St., Madison, Wisconsin 53706 (char@geology.wisc.edu).
Such routine network operations have produced very large sets of hypocenter locations (e.g., the online SCEC database yields more than 430,000 southern California earthquakes having more than 28 million picks from 1981–2001) with location error estimates of a few to a few 10s of kilometers for regional-scale networks with good azimuthal coverage. Standard catalog locations have served to document general seismicity levels and the gross geometry of comparably scaled seismogenic features; however, within the diffuse scatter of locations that result from picking inconsistencies, fine details remain unresolved. Significant improvement in the precision of hypocenter location and the resulting delineation of the details of seismic source regions has sometimes been achieved through careful, painstaking visual cross-correlation and repicking of phases for preliminarily located events (e.g., Phillips et al., 1997; Phillips, 2000), but this is generally a time- and cost-prohibitive undertaking that will necessarily be limited to small, focused subsets of the larger catalogs.
Quantitative, waveform-correlation-based phase repicking and relative doublet and multiplet relocations have produced some impressive resolution of seismogenic structures within families of similar events. Fremont and Malone (1987) used relative relocations based on cross-correlation lags of multiplets at Mount St. Helens to delineate source regions on the order of a few 10s of meters. Deichmann and Garcia-Fernandez (1992) used cross-correlation methods to identify relative arrival time differences among similar Alpine events for precise relative location. Got et al. (1994) relocated seismicity at Kilauea volcano, Hawaii, using multiplets chosen based on cross-spectral coherency. Slunga et al. (1995) used relative arrival times calculated from Fourier-interpolated cross-correlation functions to determine precise relative locations and improved absolute locations in clusters of similar microearthquakes in Iceland by incorporating the cross-correlation lags into a modified joint hypocentral determination (JHD) application. Gillard et al. (1998) used cross-correlation methods and multiplet analysis at Kilauea, Hawaii, to reduce quasi-linear 'cigars' of microearthquake foci into precise relative relocations delineating pencil-thin lines of seismicity. Rubin et al. (1999) applied similar methods to identify tight multiplets on the Hayward Fault in California.
Dodge et al. (1995) developed an automatic, computer-based correlation approach which calculates individual pick-time inconsistencies for event pairs and uses the weighted lag constraints to adjust for consistent picks. This approach is a significant improvement over earlier efforts that rely on a master-event method, in that master-event bias is reduced, highly similar multiplets are not required, and dissimilar event pairs will have limited influence over the pick adjustments. Shearer (1997) demonstrated significant improvement in delineating the seismogenic features associated with the Whittier Narrows aftershock sequence in California by also invoking a pick lag estimation method, after first segregating events into groups meeting minimum cross-correlation criteria (e.g., Aster and Scott, 1993). The above methods, although successively more quantitative and efficient, still rely to some significant degree on user interaction and have been so far applied to specific studies of catalog subsets chosen either through spatial restrictions or genetic assumptions (e.g., a limited box of data, a specific aftershock sequence), or they operate by preliminary exclusion of individual events failing to meet very high (≥0.9) cross-correlation criteria. A certain a priori selection to ensure correlatability has therefore been invoked. We describe a method that combines the most desirable features of techniques already available with additional adaptability and portability, in an automatic package that can be implemented for large catalogs such that a wide variety of applications may be addressed with little time-consuming customization. Further, as with the Dodge et al. (1995) or Shearer (1997) approaches, our method provides corrected picks. These may be used not only for event relocation, using either standard single-event location methods or more sophisticated joint location methods such as JHD (e.g., Pujol, 1992), JHD collapsing (e.g., Fehler et al., 2000), or the HypoDD technique (Waldhauser and Ellsworth, 2000), but are available also for other applications such as seismic tomography (e.g., Kissling, 1988).
We first outline the technique with a discussion of catalog segregation (clustering) and identification of similar event families. This is followed by a description of the signal-processing tools in our algorithm and an explanation of how we apply them to the data. We then outline the final calculation of lags and standard errors from the interevent constraints.
Technique
Our method may be outlined as follows:
• Grooming the catalog
• Preliminary cross-correlation for a waveform similarity matrix
• Clustering the catalog based on waveform similarity
• Adaptive window-length cross-correlation within clusters
• Solving for consistent pick lags within clusters
• Stacking realigned waveforms within clusters
• Cross-correlating stacks to obtain intercluster pick adjustments.
Preliminary Data Organization
Prior to processing, waveforms have preliminary P or S picks (or both) produced by an analyst or autopicker. These parameters are read from the trace headers, along with other potentially useful parametric data such as preliminary hypocenter coordinates.

As with other quantitative waveform correlation methods for precise earthquake relocation, we simultaneously analyze traces from many events on a station-by-station basis (e.g., Dodge et al., 1995; Shearer, 1997; Waldhauser and Ellsworth, 2000). This data regrouping allows one to improve the consistency of pick times among similar events by exploiting waveform resemblance for similar source–receiver raypaths.
Clustering
Once the data are properly formatted, we begin by obtaining preliminary cross-correlation values to divide the catalog into clusters of highly similar events. We have found that using cross-correlation results to adjust picks in a heterogeneous catalog provides unsatisfactory results by making inappropriate comparisons among inconsistent waveforms. Downweighting relative lags based on preliminary interhypocentral distance may incorrectly associate or dissociate constraints in the case of mislocated events. Solving the matrix of first differences to obtain consistent pick adjustments may be compromised by effectively zeroing some constraints without adjusting degrees of freedom appropriately. By decoupling the system into highly similar subgroups, each can be solved for consistency among closely related waveform correlation lags (Rowe, 2000).

Figure 1. Schematic illustration of the dendrogram-based, hierarchical pair-group clustering method. (a) The dendrogram is built beginning by selection of the most similar event pair. Its combination yields a new single cluster entity, which is then compared against all other events. Subsequent joinings may be between two individual events, one event and a preexisting cluster, or between two clusters, depending on the values in the reduced similarity matrix. (b) The cophenetic correlation parameter of equation (4) is calculated with each fusion step. Fusion continues until all events have been associated; the retroactive segregation cutoff is chosen as the step prior to the greatest drop in cophenetic correlation value.
This subdivision, or clustering (discussed in detail below), is followed by relative lag estimation among member traces within each cluster. Individual picks are adjusted and the realigned waveforms are stacked to provide a composite representative waveform for the cluster. These stacks may then be cross-correlated to obtain optimal pick adjustments between clusters, providing improved intercluster as well as intracluster hypocentral relationships. We will outline details of these steps later in this article.
Many candidate approaches for similarity clustering exist. Among those used successfully in other seismological applications are signal envelope cross-correlation (Carr et al., 1999a,b), sonogram pattern recognition (Joswig, 1995) (sometimes referred to as spectral fingerprinting), and multistation median waveform cross-correlation (e.g., Aster and Scott, 1993). Lees (1998) applied equivalence class analysis to events that had been segregated by preliminary hypocenter location. We have chosen the waveform cross-correlation coefficient for all interevent pairs as our catalog clustering criterion. The clustering may be performed on correlation values for a single station, the median correlation value for a suite of stations, or some other criterion best suited to the catalog being evaluated. Performing clustering early in the analysis boosts the efficiency of our technique, as time and memory requirements for correlation and lag estimation on resulting subclusters decrease quadratically with the number of events.
An agglomerative, dendrogram-based hierarchical pair-group clustering algorithm (e.g., Lance and Williams, 1967; Sneath and Sokal, 1973; Ludwig and Reynolds, 1988) has been chosen for catalog segregation. Available clustering options include cluster centroid mean or median, single link, complete link, or flexible combinational weighting. Our technique was implemented following the flexible method in the MATLAB-based seismic analysis package, MATSEIS (Harris and Young, 1997; Young et al., 2001). We outline the algorithm and its implementation for seismic waveform clustering later in this article.
Preliminary cross-correlation of N events yields an N × N symmetric event similarity matrix M, whose i,j entries represent the cross-correlation maxima for the ith and jth events. From this the event dissimilarity matrix K is constructed:

K_{i,j} = 1.001 − M_{i,j} .   (1)

The K_{i,j} may be viewed as a measure of interevent distance in waveform-similarity space for events i and j, where a value of ≈0 equates with colocation and a value of ≈1 represents infinite similarity distance. The use of 1.001 rather than 1.0 in equation (1) eliminates divide-by-zero errors in the rare instances where cross-correlation maxima are 1.0 to machine precision. The algorithm constructs a hierarchical structure of event similarity for all events (Fig. 1).
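Equation (1) is a one-line transformation; a minimal numpy sketch (the 3-event similarity matrix here is illustrative, not from the paper):

```python
import numpy as np

def dissimilarity_matrix(M):
    """Convert an N x N matrix of cross-correlation maxima into the
    dissimilarity matrix K of equation (1). The 1.001 offset keeps K
    strictly positive even when a pair correlates at 1.0 to machine
    precision."""
    return 1.001 - np.asarray(M, dtype=float)

# Toy 3-event similarity matrix (self-similarity on the diagonal).
M = np.array([[1.00, 0.92, 0.30],
              [0.92, 1.00, 0.25],
              [0.30, 0.25, 1.00]])
K = dissimilarity_matrix(M)
# Events 0 and 1 (most similar) now have the smallest off-diagonal K.
```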
First, the two events i and j (i ≠ j) with the smallest K value (equation 1) are fused. A new vector k^1 is constructed whose entries are weighted by the α_1, α_2, and β coefficients (γ = 0) described in Table 1 (Lance and Williams, 1967). For N events,

for m = 1:N   k^1_m = α_1 K_{i,m} + α_2 K_{j,m} + β K_{i,j} .   (2)

This vector is added as a new row and column (plus a dummy diagonal value) to K, whose dimension is now N + 1. The two rows and two columns, i and j, are then annihilated and the matrix is thus reduced to a matrix K^g of dimension N − 1, where g denotes the fusion step and g = 1:N − 1. The matrix is again searched for minimum distance, and individual events may continue to be grouped by equation (2) until the shortest distance is found to belong either to a cluster with an individual event, or two clusters.
Table 1
Cluster Combinational Weighting Parameters for Different Hierarchical Clustering Strategies

Strategy                                        α_1            α_2            β
Centroid (unweighted centroid)                  t(j)/t(j,k)    t(k)/t(j,k)    −t(j)t(k)/t(j,k)²
Centroid (weighted)/median                      1/2            1/2            −1/4
Group mean/unweighted pair-grouping method      t(j)/t(j,k)    t(k)/t(j,k)    0
Flexible                                        0.625          0.625          −0.25

The number of entities in the jth and kth groups are represented by t(j) and t(k), respectively, and the number of entities in the combined (j,k) group is t(j,k). (After Ludwig and Reynolds, 1988.)
The new combinational vector k^g is derived using the linear combinational equation (Lance and Williams, 1967) of the form

for m = 1:N   k^g_m = α_1 K^{g−1}_{i,m} + α_2 K^{g−1}_{j,m} + β K^{g−1}_{i,j} ,   (3)

where the distance between the cluster {i,j} and another entity {h} may be computed from the known distances K_{i,h} and K_{j,h}, and the weighting parameters α_1, α_2, and β. Note that {h} may be an individual entity, or a previously joined cluster. These combinational steps are repeated, and the K^g matrix reduced, until all events have been associated into a single group, requiring a total of N − 1 cycles.
For the waveform similarity problem, we choose the flexible combinational weighting scheme. This weighting scheme has coefficients chosen so that the sum of the α_1, α_2, and β parameters equals 1, which means that with successive joinings, the K^g matrix values (distances between clusters) move monotonically in a single direction, either continually contracting or continually expanding, with no reversals of direction that would cause problems for an automated system. The value chosen for β can be shown (Sneath and Sokal, 1973; Ludwig and Reynolds, 1988) to govern the spatial relationships of the clustering hierarchy. When β approaches −1 the system is dilated as in complete-linkage clustering schemes, whereas β → 1 contracts the space similar to single-linkage methods (Ludwig and Reynolds, 1988). In other words, after fusion the reconstructed matrix has distances that are, respectively, much greater or smaller than the original matrix. A choice near −0.25 for β tends to minimize this distortion (e.g., Sneath and Sokal, 1973).
The final problem is determining at what point to terminate clustering in an automated manner. Carr et al. (1999a) have obtained good seismicity clustering results using the technique of cophenetic correlation (Sneath and Sokal, 1973), wherein the distances between elements in the K^g matrix are compared at each clustering step with the event distances in the original K matrix.
The cophenetic correlation for each pairwise combination (of entities or multievent clusters) is calculated after each fusion step using the original K matrix and a cophenetic matrix K^c. The cophenetic matrix begins as a duplicate of the original matrix, but with each successive grouping, all entries in K^c associated with the clustering step (either individuals or all members of the clusters being addressed) are replaced with the current dissimilarity distance value. This reduces the similarity of K^c to the original K matrix. The gth value of the cophenetic correlation parameter (for the gth clustering step) is then

C_g = ( Σ_{j=1}^{n} Σ_{k=1}^{n} K_{j,k} K^c_{j,k} ) / [ ( Σ_{j=1}^{n} Σ_{k=1}^{n} K_{j,k}² ) ( Σ_{j=1}^{n} Σ_{k=1}^{n} (K^c_{j,k})² ) ]^{1/2} ,   (4)
where j and k represent the entities being combined in the gth correlation step (either individuals or previously clustered groups). As larger groups are formed, the similarity between the K^c matrix and the original K matrix will continue to decrease as the original entries are replaced with the dissimilarity values calculated for the growing clusters. Overall values of C_g (equation 4) will thus decline, although the function decrease is not necessarily monotonic. We target the largest drop in cophenetic correlation as the point immediately before which fusion should stop, as this represents the transition where the greatest leap in disparity between K^c and K occurs.
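A sketch of equation (4) and the retroactive stopping rule, assuming C_g has been accumulated at each fusion step (uncentered correlation form; the function names are ours):

```python
import numpy as np

def cophenetic_correlation(K, Kc):
    """Uncentered cophenetic correlation C_g between the original
    dissimilarity matrix K and the cophenetic matrix Kc (equation 4)."""
    num = np.sum(K * Kc)
    den = np.sqrt(np.sum(K * K) * np.sum(Kc * Kc))
    return num / den

def fusion_cutoff(C):
    """Retroactive stopping rule: return the fusion step immediately
    before the largest drop in the C_g sequence."""
    C = np.asarray(C, dtype=float)
    return int(np.argmin(np.diff(C)))
```

For example, if the successive C_g values were [0.99, 0.97, 0.96, 0.70, 0.65], the largest drop occurs between steps 2 and 3, so fusion is cut off after step 2 and the memberships defined at that step are kept.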
We illustrate the clustering technique with a 12-object example dendrogram and cophenetic correlation function in Figure 1. Each joining on the dendrogram represents the identification of smallest distance in similarity space for the entries in the reduced matrix at each fusion step. In this example, objects 7 and 8 are most similar, and so are joined as a pair by equation (2). The matrix is reduced through annihilation of rows and columns 7 and 8, with a replacement row and column added whose entries are the new relationship of cluster {7,8} to the remaining 10 members, governed by equation (2). We next join objects 10 and 11, then 5 and 6, then 3 and 4. The fifth combinational step finds the smallest distance in the reduced matrix to lie between cluster {7,8} and object 9, so these are also joined under equation (3). The process continues until step 11, when all entities are united. The point at which to stop fusion is determined retroactively, using the maximum negative difference of the cophenetic correlation function, C_g. At each step we have calculated a value for the cophenetic correlation, displayed beneath the dendrogram in Figure 1. The greatest drop in cophenetic correlation occurs at the fusion of cluster {5,6,3,4,1,2} with cluster {7,8,9}, which implies that in the hierarchy of this dendrogram the greatest dissimilarity occurs at this fusion step. We therefore segregate the dataset
into cluster memberships as defined immediately prior to this fusion step, leaving three clusters, {5,6,3,4,1,2}, {7,8,9}, {10,11}, and an orphan object {12} that is not very similar to any of the others. Other automatic decision-making techniques exist, such as comparing the inter- and intracluster variances (e.g., Ludwig and Reynolds, 1988). We find, however, that the cophenetic correlation seems well suited to catalogs that divide robustly into unrelated groups of distinct waveforms, such as the discrimination of mine blasts from different locations (e.g., Carr et al., 1999a,b), or teleseisms from different source regions.
Efforts to apply the cophenetic correlation method to catalogs of earthquakes exhibiting continuous waveform variation, however, were less satisfactory (e.g., Rowe, 2000; Rowe et al., 2002). Under such circumstances the cophenetic correlation function becomes erratic, and a derivative-based termination of fusion on such a function is unreliable. We have therefore modified the algorithm so that we may instead select a similarity threshold, below which fusion stops. We have found that using a threshold of 0.8 works well, although optimal results may vary from catalog to catalog. The threshold approach yields a large number of small (doublet and multiplet) similarity groups and may be overly aggressive in cases where large general earthquake families are present. The final cluster memberships under any segregation scheme depend strongly on the length of the correlation window and the degree of filtering; hence, some interactive testing on a random catalog subset is advisable to determine such parameters.

Once a satisfactory division has been found, the catalog is separated into corresponding clusters, and individual phases within each cluster are cross-correlated to obtain relative pick lag estimates.
Relative Lag Estimation
Relative lag estimation between pairs of traces proceeds in two steps: a coarse discrete correlation step that provides an estimate of lag to the nearest time sample and a fine correlation step that provides a refinement to the subsample level. Among all events recorded at a given station, we compare each event pair for each phase (P, S) to measure waveform similarity and to estimate lag. A user-specified M-sample time window, including a fractional prepick offset, is established about the preliminary picks for each pair of events to be compared. This window length is chosen based on sample rate and overall frequency content of the targeted phase. Generally speaking, two cycles of the dominant waveform is an acceptable length; however, the choice of correlation window length varies depending on the intended use of the correlation results. After cluster separation, intracluster cross-correlations are performed using a suite of correlation window lengths. This range is chosen such that the minimum window length includes one to two cycles of the highest-frequency component that may be consistently identified among a representative sample of waveforms; the longest window is chosen based on a conservative estimate of likely maximum pick error. These window length ranges may be different for P waves and S waves and will generally vary among stations. The entire cluster membership for each phase is cross-correlated for each of the window lengths. The resulting systems of cross-correlation values, lags, and standard deviations are compared to identify the best overall cross-correlation values and smallest average lag standard deviations for each phase in question.
For stations with multiple components (usually three), we use polarization filtering to improve P and S signal-to-noise levels prior to waveform comparison. This provides the best function for subsequent correlation when the source–receiver geometry is not favorable for a particular component.
A mean covariance matrix (e.g., Aster et al., 1990) is calculated from the sum of the energy-normalized multicomponent signals for each event in the pair:

C = (1/2) ( x_1^T x_1 / E_1² + x_2^T x_2 / E_2² ) ,   (5)

where x_j is the (usually) three-component matrix with columns that are individual component time series for event j, and

E_j = [ trace( x_j^T x_j ) ]^{1/2} .   (6)
The diagonalization of the positive definite C matrix gives the unit eigenvector û_1, corresponding to the largest eigenvalue, characterizing the best data projection for mutually linearizing particle motion between the two traces. For two- or three-component seismograms, the eigenvalue and eigenvector decomposition of the signal covariance matrix may be calculated exactly; however, the use of four-component (or more) sensors in reservoir microearthquake studies (e.g., Baria et al., 1999) requires an iterative approach to the signal decomposition. We have therefore made use of repeated Jacobi transformations (Press et al., 1989) to find the principal components of the tensor. All subsequent analysis for the waveform pair is performed on the projected (1-dimensional) data

x′_j = (x_j · û_1) û_1 .   (7)
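For the common two-event, three-component case, equations (5)-(7) reduce to a direct eigendecomposition; a minimal numpy sketch (the 1/2 normalization of the mean covariance and the function name are our reading of the text, and we return the one-dimensional series along û_1 directly, which is what enters the subsequent correlation):

```python
import numpy as np

def polarization_project(x1, x2):
    """Project a pair of multicomponent records onto the principal
    eigenvector of their mean covariance matrix (equations 5-7).
    x1, x2: (nsamples, ncomponents) arrays for the two events.
    A direct eigh call suffices here; the paper's Jacobi rotations are
    needed only for sensors with more than three components."""
    def energy(x):                            # equation (6)
        return np.sqrt(np.trace(x.T @ x))
    # Mean covariance of the energy-normalized signals (equation 5).
    C = 0.5 * (x1.T @ x1 / energy(x1) ** 2 + x2.T @ x2 / energy(x2) ** 2)
    _, V = np.linalg.eigh(C)                  # eigenvalues in ascending order
    u1 = V[:, -1]                             # unit eigenvector, largest eigenvalue
    # Equation (7) projects each record onto u1; the scalar series along
    # u1 is what the one-dimensional correlation then operates on.
    return x1 @ u1, x2 @ u1
```

For a signal polarized purely along one component, û_1 recovers that component axis (up to sign) and the projection reproduces the original trace.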
Each waveform pair is next transformed via fast Fourier transform (FFT) into the frequency domain:

X_{jk} = Σ_{l=0}^{M−1} x′_{jl} e^{−2πilk/N} .   (8)
The cross-spectrum,

s_k = X_{1k} X*_{2k}   (9)

(where * denotes complex conjugate), and coherence,

c_k = | Σ_{l=k−m}^{k+m} s_l | / Σ_{l=k−m}^{k+m} |s_l| ,   (10)

are calculated, where the coherence averaging width is 2m + 1 Rayleigh bins (incrementally decreasing near the zero and Nyquist frequencies). We choose m as a fraction of the window length, M, with a minimum of five Rayleigh bins.
Prior to cross-correlation, we use the coherence and signal power to uniformly prefilter the two seismograms under consideration to emphasize frequencies that have high signal-to-noise and high coherency (e.g., Rowe and Aster, 1999; Aster and Rowe, 2000; Rowe, 2000). The spectral weighting is

c̄_k = ( |X_{1k}| |X_{2k}| )^{1/2} c_k .   (11)

This filtering thus downweights incoherent frequency bands while reducing removal of potentially useful signal (a risk in an a priori band-pass filter choice; Fig. 2). Although the most coherent frequencies will most likely be found in the lower frequencies of the spectrum, the adaptive nature of this approach enhances the comparisons of highly similar waveforms in a diverse catalog by passing more of the coherent spectrum to each interevent correlation function. At the same time, we can accommodate gross similarities among those event pairs whose higher-frequency details may not correlate well. A similar method has been applied to processing of seismic array signals by Wassermann and Ohrnberger (2001). Use of coherency filtering provides an additional benefit of permitting robust cross-correlation of slightly to moderately clipped waveforms. The spurious spectral contributions that may arise from clipping generally exhibit poor coherency, so are downweighted in the integer correlation step.
The coherency-filtered signals, with spectra

Y_{jk} = X_{jk} c̄_k ,   (12)

are next cross-correlated (with zero padding to eliminate circular correlation wraparound) in the frequency domain. The filtered cross-spectrum is transformed back into the time domain, and the maximum of this cross-correlation function is the estimate for the coarse interevent pick lag. To estimate a standard deviation for the coarse (integer-sample) correlation lags, l_a, we perform a set of narrowband correlations (typically eight) and find the variance, where each term is weighted by the cross-spectral power in that band (Aster and Rowe, 2000; Rowe, 2000). This coarse correlation standard deviation, σ_a, is used, with the coarse cross-correlation maximum, as a discriminator to determine which waveforms are sufficiently similar to merit being passed to the subsample cross-correlation step for further lag refinement.
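The coarse step of equations (8)-(12) can be sketched as follows. This is a simplified illustration, with names of our own choosing: it uses a uniform coherence smoothing width (the paper narrows it near the zero and Nyquist frequencies), a fixed zero-padding factor, and omits the narrowband standard-deviation estimate:

```python
import numpy as np

def coherency_filtered_lag(x1, x2, m=5):
    """Coarse (integer-sample) lag from coherency-weighted
    cross-correlation, sketching equations (8)-(12). m is the coherence
    smoothing half-width (2m + 1 bins)."""
    n = len(x1)
    nfft = 2 * n                              # zero padding: no circular wraparound
    X1 = np.fft.rfft(x1, nfft)                # equation (8)
    X2 = np.fft.rfft(x2, nfft)
    s = X1 * np.conj(X2)                      # cross-spectrum, equation (9)
    kernel = np.ones(2 * m + 1)
    num = np.abs(np.convolve(s, kernel, mode="same"))
    den = np.convolve(np.abs(s), kernel, mode="same")
    c = num / np.maximum(den, 1e-30)          # smoothed coherence, equation (10)
    w = np.sqrt(np.abs(X1) * np.abs(X2)) * c  # spectral weighting, equation (11)
    # Filter both spectra (equation 12) and correlate in the frequency domain.
    cc = np.fft.irfft((X1 * w) * np.conj(X2 * w), nfft)
    lags = np.fft.fftfreq(nfft, d=1.0 / nfft).astype(int)
    return int(lags[np.argmax(cc)])           # positive: x1 delayed w.r.t. x2
```

Because the weighting w is real and nonnegative, it reshapes the cross-spectral amplitude without touching its phase, so the correlation peak location is preserved while incoherent bands are suppressed.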
In some instances, the desired level of relative lag resolution is less than the sample interval. For example, in an area where the P-wave velocity is approximately 5 km/sec and where data are sampled at 100 samples/sec, the integer-sample arrival time resolution for even high-signal-to-noise data can be as poor as 0.005 sec, which may introduce a worst-case location error of up to 25 m. The ability to consistently pick waveforms to subsample precision can dramatically improve resolution of small-scale features if sufficiently high-quality data are available.
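The quoted worst case is simple arithmetic: a half-sample timing ambiguity at 100 samples/sec, mapped through the P velocity, gives

```latex
\Delta x \approx v_P \,\Delta t
        = 5\ \mathrm{km/sec} \times 0.005\ \mathrm{sec}
        = 25\ \mathrm{m}.
```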
The cross-spectral method (e.g., Poupinet et al., 1984) provides a means of determining subsample lag adjustments by estimating a continuous function, the zero-intercept slope of the cross-spectral phase (φ), where the subsample lag term estimate is

l_b = (1/2π) dφ(f)/df   (13)

and

φ_k = atan[ imag(s_k) / real(s_k) ] .   (14)
In many applications of this technique (e.g., Poupinet et al., 1984; Got et al., 1994), the relative weights of the phase values used in slope estimation have been calculated using a coherency-based measure. This provides useful relative weights for the phase points, but requires ad hoc scaling to the standard deviations needed to estimate meaningful error statistics for the subsample lag. For re-estimation of phase picks and their subsequent inclusion in relocation or other applications, quantitative estimates of the pick standard deviations in time units are important.
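A bare-bones version of the phase-slope estimate of equations (13)-(14) can be sketched as below. It uses an unweighted least-squares fit over the bins with significant cross-spectral power, rather than the coherency- or eigenspectrum-weighted fit the paper advocates; the function name and the power threshold are ours:

```python
import numpy as np

def subsample_lag(x1, x2, fs=100.0, power_frac=1e-3):
    """Subsample relative lag from the slope of the cross-spectral phase
    (equations 13-14). With s_k = X1_k * conj(X2_k) as in equation (9),
    a positive result means x1 leads x2."""
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    s = X1 * np.conj(X2)                         # cross-spectrum, equation (9)
    f = np.fft.rfftfreq(len(x1), d=1.0 / fs)     # frequencies in Hz
    # Fit only where the cross-spectrum carries real power.
    idx = np.where(np.abs(s) > power_frac * np.abs(s).max())[0]
    phi = np.unwrap(np.angle(s[idx]))            # cross-spectral phase, eq. (14)
    slope = np.polyfit(f[idx], phi, 1)[0]        # d(phi)/df, radians per Hz
    return slope / (2.0 * np.pi)                 # equation (13): lag in seconds
```

Applying a known fractional-sample shift in the frequency domain and feeding the pair back through this estimator recovers the shift to well below the 0.01-sec sample interval.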
A further difficulty in applying the cross-spectral method arises from the inherent difficulty of characterizing the spectrum for a short time series. Spectral estimation on a sampled time series via FFT may be severely compromised when the length of the time window is shortened, because spectral leakage, which results from truncation of the windowing function, can bias the high-frequency rolloff of the spectral estimate both in amplitude and phase (Park et al., 1987). For many seismological applications, however, including the cross-correlation-based repicking algorithm, it is desirable to use a fairly short time window to isolate the limited and correlatable (direct phase arrival) portion of an intrinsically nonstationary signal; lengthening the window to reduce spectral leakage introduces a higher proportion of background noise, as well as scattering contributions from the coda, and degrades the direct phase arrival comparison.

A standard approach to reducing spectral leakage is to apply a taper to the truncated time series, one that smoothly downweights data points toward zero at the ends of the window. This provides good results in terms of reduced spectral leakage, but causes a severely elevated variance for the spectral estimate. The Hann taper, for instance, discards approximately 5/8 of the statistical information of the time series (Park et al., 1987). A further difficulty arises because of the
Page 7
1666
C. A. Rowe, R. C. Aster, B. Borchers, and C. J. Young
0 0.05 0.10.150.2 0.25 0.30.350.4
2
0
2
05 101520 25 30354045 50
0
1
10
20
0 0.05 0.10.150.20.250.3 0.35 0.4
1
0
05 1015 202530 3540 4550
0
0.5
1
0 0.050.10.150.2 0.250.3 0.350.4
2
0
2
00.05 0.10.150.20.250.30.35 0.4
1
0
1
seconds
Hz
seconds
Hz
seconds
seconds
a)
b)
c)
d)
e)
f)
SYNTHETIC EXAMPLE  ADAPTIVE PREFILTERING
Figure 2.
with added Gaussian noise. (b) Amplitude–frequency spectra of unfiltered seismograms
from Fig. 1a. (c) Crosscorrelation function for unfiltered noisy seismograms in Fig. 1a;
the maximum crosscorrelation coefficient is approximately 0.5. (d) Crosscoherency for
spectra shown in Fig. 1b, plotted as a function of frequency. Although most coherent
energy resides below 20 Hz, crosscoherency has a small peak at 35 Hz. A priori lowpass
filtering may reject this energy, which could be an important common constituent to in
clude in the crosscorrelation. (e) Adaptively filtered seismograms from Fig. 1a; note
significant reduction in the random noise constituent and overall similarity of resulting
waveforms. (f) Recomputed crosscorrelation function for the waveform pair, showing
maximum crosscorrelation coefficient ? 0.9. Initial crosscorrelation provided a zero
sample lag; the filtered trace pair yields a lag of 2 samples (from Aster and Rowe, 2000).
Example illustration of adaptive prefiltering. (a) Two synthetic seismograms
intrinsic nonstationarity of the signal. Use of the Hann (or
similar) taper tends to preferentially emphasize the spectral properties of that portion of the time series which falls in the central part of the window, while neglecting most of the information near the extrema, which may not be well represented by the center portion of the window.
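This leakage-versus-variance trade-off can be illustrated with a minimal sketch (the `hann_taper_spectrum` helper is ours, not the authors' code): tapering a truncated sinusoid strongly suppresses leakage far from the true spectral peak, at the cost of the statistical information discarded near the window ends.

```python
import numpy as np

def hann_taper_spectrum(x):
    """Amplitude spectrum after applying a Hann taper.

    Illustrative sketch only: the taper suppresses spectral leakage from
    window truncation at the cost of discarding statistical information
    near the window ends.
    """
    n = len(x)
    w = np.hanning(n)                 # Hann taper, zero at both ends
    xw = x * w
    # Normalize for the power lost to tapering so amplitudes are comparable
    return np.abs(np.fft.rfft(xw)) / np.sqrt(np.mean(w ** 2))

# Example: a truncated sinusoid whose frequency falls between DFT bins
n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 10.3 * t / n)
spec_plain = np.abs(np.fft.rfft(x))
spec_hann = hann_taper_spectrum(x)
# Leakage far from the peak near bin 10 is much lower for the tapered estimate
print(spec_plain[60] > spec_hann[60])
```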
One of the most successful approaches to solving these problems is multitaper spectral estimation (Thomson, 1982), wherein discrete prolate spheroidal wave functions, which are eigenfunctions of the Dirichlet kernel, are employed. The eigenfunctions, denoted by U_k(N, W; f), k = 0, 1, ..., N - 1, are solutions to

\int_{-W}^{W} \frac{\sin N\pi(f - f')}{\sin \pi(f - f')}\, U_k(N, W; f')\, df' = \lambda_k(N, W)\, U_k(N, W; f),  (15)

where W (0 < W < 1/2) is a bandwidth, normally of the order 1/N. The functions are ordered by their eigenvalues:

1 > \lambda_0(N, W) > \lambda_1(N, W) > \cdots > \lambda_{N-1}(N, W).  (16)

The first 2NW eigentapers have eigenvalues that are extremely close to 1. Of all functions that are the Fourier transform of an index-limited sequence, the discrete prolate spheroidal wave function has the greatest fractional energy concentration within the bandwidth (-W, W) (Thomson, 1982). These eigenfunctions are orthogonal over the interval (-W, W) and are orthonormal over (-1/2, 1/2). Their Fourier transforms provide the discrete prolate spheroidal sequences, also known as prolate eigentapers, with which we can window the time series prior to estimating its spectrum (Park et al., 1987). We illustrate the five lowest-order multitaper functions for a 128-sample window in Figure 3.
Figure 3. First five prolate spheroidal eigentapers for a time-bandwidth product of 4. Each taper recovers a different portion of the windowed seismogram; note that higher-order tapers have increasingly steep initial slopes; hence, spectral leakage becomes greater with higher-order tapers. The window is 128 samples long.
The products of the time series with each of the eigentapers are mutually orthogonal, as are their Fourier transforms; hence, we obtain a linearly independent series of eigenspectra for the time series, which may be combined in a weighted sum to estimate its true spectrum. The orthogonality of the eigenspectra further permits us to calculate estimates of error statistics for the summed spectrum in units of time. Use of the eigenspectral method therefore also enables us to address the question of dimensionally meaningful estimates of subsample pick lag errors, in lieu of dimensionless weights derived from ad hoc scaling of coherency, discussed earlier.

We precompute the tapering functions for a particular data length N and a specified time-bandwidth product, W. W is typically chosen to be 4, where 2W approximately specifies the resolution of the resulting spectral estimates in Rayleigh bins (Thomson, 1982; Park et al., 1987).
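The precomputation of the prolate eigentapers can be sketched via the standard symmetric tridiagonal formulation of the Slepian sequences (a minimal illustration under the parameter values quoted in the text, not the authors' code; `dpss_tapers` is a hypothetical helper):

```python
import numpy as np

def dpss_tapers(n, nw, k):
    """Return the k lowest-order discrete prolate spheroidal (Slepian)
    eigentapers, computed as eigenvectors of the standard symmetric
    tridiagonal matrix and ordered by decreasing energy concentration.
    """
    w = nw / n                                    # half-bandwidth W, of order 1/N
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                 # ascending eigenvalue order
    return vecs[:, ::-1][:, :k].T                 # best-concentrated first

tapers = dpss_tapers(128, 4, 6)                   # time-bandwidth product 4

# The tapers are orthonormal over the window ...
print(np.allclose(tapers @ tapers.T, np.eye(6), atol=1e-8))
# ... and the lowest-order taper concentrates nearly all of its energy
# inside the resolution band |f| <= W.
U0 = np.fft.fft(tapers[0], 1024)
f = np.fft.fftfreq(1024)
conc = np.sum(np.abs(U0[np.abs(f) <= 4 / 128]) ** 2) / np.sum(np.abs(U0) ** 2)
print(conc > 0.999)
```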
In our application, we calculate multitaper estimates of the cross-spectrum from two seismograms through conjugate multiplication (equation 9) of the corresponding multitaper spectra (Fig. 4). Figure 4a shows two synthetic seismograms of length 32 samples. Each waveform is multiplied by the six lowest-order eigentaper functions, providing six linearly independent tapered realizations of each trace (Fig. 4b). Each is then transformed to the frequency domain, and we compute six linearly independent cross-spectral phase estimates from the tapered functions (Fig. 4c).
The lowest-order eigentapers (especially 0 and 1) have very low spectral leakage outside of the specified spectral resolution bandwidth (f - W, f + W). Higher-order tapers, however, have progressively worse spectral leakage characteristics, which may unacceptably flatten the estimated phase slope and hence underestimate the subsample correlation lag (e.g., Aster and Rowe, 2000; Rowe, 2000). We therefore use the average of the two lowest-order tapers to obtain the spectral values, while using the standard deviation of the six lowest-order tapers to estimate standard deviations on each phase point. This provides an acceptable trade-off between the advantages of the multitaper method (particularly quantitative error bars) and the need to reduce spectral leakage and the resulting underestimation of the subsample lag (Aster and Rowe, 2000; Rowe, 2000). We illustrate by showing the average cross-spectral phase in Figure 4d, calculated from the 0th- and 1st-order spectra represented by solid lines in Figure 4c. Error bars in Figure 4d were calculated using all six of the eigen-cross-spectra shown in Figure 4c. We further downweight the most uncertain phase values by stretching the standard deviations of the multitaper estimates, \sigma_k, using the mapping

\sigma'_k = \tan(\sigma_k), \quad (-\pi/2 + \epsilon \le \sigma_k \le \pi/2 - \epsilon),  (17)

where \epsilon = 0.01, and removing phase points from the phase slope estimation if \sigma_k is outside the range specified in equation (17). The phase slope and its standard deviation \sigma_b are estimated using the L2 (least-squares) zero-intercept linear regression for the K usable data points with

\frac{d\phi}{df} = \frac{\sum_{i=1}^{K} \phi_i/\sigma'_i}{\sum_{i=1}^{K} f_i/\sigma'_i}  (18)

and

\sigma_b\!\left(\frac{d\phi}{df}\right) = \frac{\left(\sum_{i=1}^{K} 1/\sigma'_i\right)^{1/2}}{\sum_{i=1}^{K} f_i/\sigma'_i}.  (19)

Because we use a preliminary integer cross-correlation step that adjusts the traces to the nearest sample prior to invoking the cross-spectral phase slope method, we do not need to concern ourselves with phase unwrapping when estimating the slope of the cross-spectral phase. Event pairs that are sufficiently similar to be passed to the subsample lag estimator will have a maximum phase lag of \pm\pi. Since slight to moderate waveform clipping does not affect the signal phase, this subsample lag estimation is relatively robust even when applied to clipped waveforms (e.g., Poupinet et al., 1984).
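A simplified sketch of recovering a subsample lag from the cross-spectral phase slope: for a pure delay the phase is linear in frequency, so the fitted slope divided by 2π gives the lag in samples. Here a single (untapered) cross-spectrum and uniform weights stand in for the multitaper, 1/σ′-weighted fit of equations (18) and (19); `subsample_lag` is a hypothetical helper.

```python
import numpy as np

def subsample_lag(x, y, n_use=10):
    """Estimate a subsample lag between two already integer-aligned traces
    from the slope of their cross-spectral phase.

    Simplified sketch: single-taper cross-spectrum and uniform weights
    replace the paper's multitaper estimates and 1/sigma' weighting.
    """
    n = len(x)
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    cross = X * np.conj(Y)                    # conjugate multiplication
    freqs = np.arange(len(cross)) / n         # cycles per sample
    phase = np.angle(cross)
    # Zero-intercept fit restricted to low-frequency, unwrapped phase points
    f, p = freqs[1:n_use], phase[1:n_use]
    slope = np.sum(f * p) / np.sum(f * f)
    return slope / (2 * np.pi)                # lag of y relative to x, in samples

# A band-limited pulse and a copy delayed by 0.3 samples (shift applied in
# the frequency domain so that it is exact).
n = 64
t = np.arange(n)
x = np.exp(-0.5 * ((t - 20) / 3.0) ** 2)
shift = 0.3
X = np.fft.rfft(x)
y = np.fft.irfft(X * np.exp(-2j * np.pi * np.arange(len(X)) * shift / n), n)
print(round(subsample_lag(x, y), 2))
```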
The final interevent cross-correlation lag d_i for event pair i is determined by summing the coarse, integer lag value l_a and the subsample lag estimate l_b. Total lag standard deviation estimates \hat{\sigma} are the quadrature sum of the coarse and fine lag standard deviations:

\hat{\sigma}_i = \sqrt{\sigma_{a,i}^2 + \sigma_{b,i}^2}.  (20)

Figure 4. Example of estimation of cross-spectral phase slope and associated standard deviation. (a) Two synthetic seismograms that have been aligned to the nearest sample with integer correlation. (b) Six tapered representations of one of the synthetic traces; each function is the result of using one of the six lowest-order eigentapers to weight the time series. (c) Six linearly independent estimates of the cross-spectral phase for the traces in panel (a). Each trace was windowed with the six lowest-order eigentapers (as in panel b), then six corresponding cross-spectra were calculated. Solid lines represent the cross-spectra corresponding to the two lowest-order eigentapers. The dashed functions are cross-spectral phases estimated from the third through sixth tapers. (d) Mean cross-spectral phase function estimated using the two lowest-order cross-spectra. Dashed line represents the phase slope estimate; vertical error bars show standard deviations for each Nyquist bin, estimated from all six cross-spectra. The final phase-frequency point with the arrowed error bar has a standard deviation large enough to disqualify the point in the phase slope fitting (after Aster and Rowe, 2000).
Solving the Systems of Constraints for Outlier-Resistant Pick Corrections
The desired N-vector of pick adjustments, \vec{b}, is the solution to

G \vec{b} = \vec{d},  (21)

where \vec{d} is an M-length (up to N(N - 1)/2) vector of weighted interevent lags,

d_i = \frac{l_{a,i} + l_{b,i}}{\hat{\sigma}_i} \quad (i \le M),  (22)
and the system matrix, G, is a weighted first-difference operator on \vec{b} of the form

G = \begin{pmatrix}
-1/\hat{\sigma}_{2,1} & 1/\hat{\sigma}_{2,1} & 0 & 0 & 0 & \cdots \\
-1/\hat{\sigma}_{3,1} & 0 & 1/\hat{\sigma}_{3,1} & 0 & 0 & \cdots \\
-1/\hat{\sigma}_{4,1} & 0 & 0 & 1/\hat{\sigma}_{4,1} & 0 & \cdots \\
0 & -1/\hat{\sigma}_{3,2} & 1/\hat{\sigma}_{3,2} & 0 & 0 & \cdots \\
0 & -1/\hat{\sigma}_{4,2} & 0 & 1/\hat{\sigma}_{4,2} & 0 & \cdots \\
0 & -1/\hat{\sigma}_{5,2} & 0 & 0 & 1/\hat{\sigma}_{5,2} & \cdots \\
0 & 0 & -1/\hat{\sigma}_{4,3} & 1/\hat{\sigma}_{4,3} & 0 & \cdots \\
0 & 0 & -1/\hat{\sigma}_{5,3} & 0 & 1/\hat{\sigma}_{5,3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}  (23)
(Aster and Rowe, 2000; Rowe, 2000). G is sparse; its M × N dimension results in MN ≈ N³ entries, of which only 2M ≈ N² entries are nonzero. This system sparseness can be exploited to greatly reduce computer storage and solution time. We parameterize G by two M-length index vectors, A− and A+, which contain the indices of the negative and positive entries for each constraint, and by an M-length vector, \hat{\sigma}, containing the lag standard deviations. This storage scheme is capable of easily representing very large (tens of millions of elements) G systems within currently available workstation memory limits.

Straightforward linear approaches to solving equation (21) for a least-squares residual (L2) solution include Cholesky factorization or other techniques for solving the normal equations, or involve singular-value decomposition (e.g., Press et al., 1989). Such methods, however, require calculation of the (nonsparse) G^T G (which contains M² ≈ N⁴ entries) or other large intermediary objects, which eliminates the computational and storage advantages associated with our (A−, A+, \hat{\sigma}) representation of the sparse G. Additionally, the L2 solution has the undesirable property of being strongly perturbed by outliers (e.g., Parker and McNutt, 1980; Shearer, 1998).

We instead solve equation (21) by implementing an iterative Polak-Ribière conjugate gradient minimization (Polak, 1971; Press et al., 1989) formulated to operate efficiently with the (A−, A+, \hat{\sigma}) sparse storage scheme. This can also be implemented for the more robust minimum one-norm residual (L1) solution. The functional to be minimized is
f = l^{(1)} = \sum_{i=1}^{M} \frac{|d_i - d_{i,\mathrm{pred}}|}{\sigma_i},  (24)

and the gradient of f at a general solution-space point, \vec{x}, is

\nabla f = G^T \mathrm{sgn}(G\vec{x} - \vec{d}),  (25)

where the sign function operates on each element of a vector, returning 1 if the argument is positive, -1 if the argument is negative, and 0 if the argument is zero. Although it has superior resistance to outliers, implementation of the L1 residual minimization becomes problematic when any of the residuals becomes too small, as the derivative function becomes discontinuous. We have successfully addressed this difficulty by modifying the misfit function for values of f_i that lie within the region -\epsilon < f_i < \epsilon for small \epsilon:

\mathrm{if}\ |f_i| \ge \epsilon, \qquad f_i = \frac{|d_i - d_{i,\mathrm{pred}}|}{\sigma_i},  (26)

and \nabla f is as described in equation (25) (Aster and Rowe, 2000; Rowe, 2000).

\mathrm{If}\ |f_i| < \epsilon, \qquad f_i = \frac{(d_i - d_{i,\mathrm{pred}})^2}{2\sigma_i \epsilon} + \frac{\epsilon}{2},  (27)

and

\partial f_i = \mathrm{sgn}(d_i - d_{i,\mathrm{pred}})\, \frac{|d_i - d_{i,\mathrm{pred}}|}{\sigma_i \epsilon}.  (28)
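The index-vector storage of G admits matrix-free products G·x and Gᵀ·y, which are all a conjugate gradient iteration needs. A minimal sketch under that storage scheme (`G_apply` and `G_transpose_apply` are our names, not the authors'):

```python
import numpy as np

def G_apply(x, neg_idx, pos_idx, sigma):
    """Forward product G @ x for the weighted first-difference operator,
    stored only as index vectors and lag standard deviations.

    Row m of G has -1/sigma[m] in column neg_idx[m] and +1/sigma[m] in
    column pos_idx[m]; no M x N matrix is ever formed.
    """
    return (x[pos_idx] - x[neg_idx]) / sigma

def G_transpose_apply(y, neg_idx, pos_idx, sigma, n):
    """Adjoint product G.T @ y, accumulated with scatter-adds."""
    out = np.zeros(n)
    np.add.at(out, pos_idx, y / sigma)
    np.add.at(out, neg_idx, -y / sigma)
    return out

# Toy system: 4 events, constraints on pairs (2,1), (3,1), (3,2)
# (zero-based event indices), mirroring the structure of equation (23).
neg = np.array([0, 0, 1])       # column of the -1/sigma entry
pos = np.array([1, 2, 2])       # column of the +1/sigma entry
sig = np.array([0.5, 1.0, 2.0])
b = np.array([0.1, -0.2, 0.3, 0.0])

d_pred = G_apply(b, neg, pos, sig)
# Verify against an explicitly constructed dense G
G = np.zeros((3, 4))
G[np.arange(3), neg] = -1.0 / sig
G[np.arange(3), pos] = 1.0 / sig
print(np.allclose(d_pred, G @ b))
print(np.allclose(G_transpose_apply(d_pred, neg, pos, sig, 4), G.T @ d_pred))
```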
This modification has a theoretical drawback, insofar as the smallest misfit we may obtain is \epsilon/2, as opposed to zero; however, this poses no practical difficulty. We are currently obtaining satisfactory results using a value of \epsilon = 0.1. Calculation of the solution probability (outlined next) may be done by recomputing f with the exact L1 formulation, although this will not be the true minimum because of our approximation for small f_i (Rowe, 2000).
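A minimal sketch of this smoothed one-norm misfit and its continuous gradient in the spirit of equations (26)-(28) (a Huber-style smoothing; the function name and the vectorized form are our assumptions, not the authors' implementation):

```python
import numpy as np

def smoothed_l1_misfit(residual, sigma, eps=0.1):
    """Outlier-resistant misfit: |r|/sigma outside +-eps, replaced inside by
    a quadratic with matching value and slope at the boundary, so the
    gradient is continuous through zero and the minimum attainable
    per-datum misfit is eps/2 rather than zero.
    """
    r = residual / sigma
    quad = r ** 2 / (2 * eps) + eps / 2
    lin = np.abs(r)
    f = np.where(np.abs(r) < eps, quad, lin)
    grad = np.where(np.abs(r) < eps, r / eps, np.sign(r)) / sigma
    return f.sum(), grad

res = np.array([0.0, 0.05, -2.0])
sig = np.ones(3)
f, g = smoothed_l1_misfit(res, sig)
print(round(f, 4))   # 0.05 + (0.0025/0.2 + 0.05) + 2.0 = 2.1125
print(g)
```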
From a probabilistic viewpoint, the L1 solution is the maximum likelihood solution under the assumption of exponentially distributed data errors, described by

P^{(1)}(x) = \frac{1}{\sigma} \exp(-|x - m|/\sigma).  (29)
Parker and McNutt (1980) describe the statistics of l^{(1)} (equation 24) under an assumption of Gaussian data errors, which we invoke as a useful quality-of-fit measure to assess whether the relative lags estimated by our L1 solutions form a consistent set of first-difference constraints on \vec{b} (equation 21). The L1 analogue to the L2 (\chi^2) q-statistic for M - N = K degrees of freedom is approximated by a third-moment expression for the probability that a greater value of l^{(1)}(M) than the observed one (equation 24) could have occurred:

q(f, M) = P(x) - \frac{c}{6} Z^{(2)}(x),  (30)

where P(x) is the cumulative probability integral

P(x) = \frac{1}{\sigma_1 (2\pi)^{1/2}} \int_{-\infty}^{x} \exp\!\left(-\frac{t^2}{2\sigma_1^2}\right) dt  (31)

for a zero-mean Gaussian distribution with the variance \sigma_1^2 = \sigma^2(l^{(1)}), where

\sigma_1^2 = (1 - 2/\pi)\, M.  (32)

The second term is proportional both to the skewness of l^{(1)}:

c = \frac{2 - \pi/2}{(\pi/2 - 1)^{3/2}\, M^{1/2}},  (33)

and to

Z^{(2)}(x) = \frac{1}{(2\pi)^{1/2}} (x^2 - 1) \exp(-x^2/2),  (34)

where

x = \frac{f - \bar{l}}{\sigma_1}  (35)

and

\bar{l} = (2/\pi)^{1/2}\, M.  (36)
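A sketch of this quality-of-fit measure assembled from equations (30)-(36), with the Gaussian tail integral and skewness correction written in the standard Edgeworth-expansion form (the sign convention of the correction term and the `one_norm_q` helper name are our assumptions):

```python
import math

def one_norm_q(f, M):
    """Approximate probability that a one-norm misfit larger than f would
    occur for M Gaussian-error data, after Parker and McNutt (1980):
    Gaussian tail plus a third-moment (skewness) correction.
    """
    mean = math.sqrt(2.0 / math.pi) * M                  # l-bar, equation (36)
    var = (1.0 - 2.0 / math.pi) * M                      # sigma_1^2, equation (32)
    x = (f - mean) / math.sqrt(var)                      # equation (35)
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))           # Gaussian tail P(X > x)
    c = (2.0 - math.pi / 2.0) / ((math.pi / 2.0 - 1.0) ** 1.5 * math.sqrt(M))
    z2 = (x * x - 1.0) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + (c / 6.0) * z2

# A misfit near its expected value should not be improbable...
print(0.3 < one_norm_q(math.sqrt(2 / math.pi) * 8, 8) < 0.7)
# ...while a much larger misfit should be highly unlikely.
print(one_norm_q(3 * math.sqrt(2 / math.pi) * 8, 8) < 0.01)
```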
We have found that we can further improve the solution by conservatively rejecting correlation outliers. The first step is to discard lag constraints whose cross-correlation maxima are sufficiently poor that there is little likelihood of constructive contribution to the solution. We have adopted an a priori threshold of 0.8; any constraints that fail to meet this minimum are rejected prior to the first attempt at solving the system. This tolerance will vary depending on the quality of the catalog being addressed and the desired similarity threshold in the application.

Preliminary clustering helps to ensure that disparate families of earthquakes are not being compared, but outliers and poorly correlating events resulting from correlation cycle skips, excessive noise, or grossly inaccurate initial picks may still remain. To eliminate the influence of these outliers, we first calculate the L1 solution to equation (21) and its misfit measure, f (equation 24), using the full constraint and data set for the cluster (minus the a priori rejections). If there are data outliers or if the system is otherwise highly inconsistent, a large value of c (equation 33) will produce a highly unlikely (very small) value of q(f, M) (equation 30). We then successively cull constraints and corresponding data from the system, using a binary search mechanism (e.g., Aster and Rowe, 2000; Rowe, 2000). The data misfit vector is sorted and the constraints corresponding to the worse half of the misfit estimate are discarded. Discarding, rather than downweighting, the highest-misfit constraints ensures correct probability calculations for subsequent solutions by appropriately adjusting the degrees of freedom of the system. The reduced system is solved, and the value of q(f_i, M_i) is recalculated for the ith bisection step under the new degrees of freedom. If this value is too good (q(f_i, M_i) \gg 0.02), we assume that too many constraints have been discarded and we restore a portion of them. We recompute q(f_i, M_i) and restore or discard constraints again, as appropriate. This process generally converges to a satisfactory value of q(f_i, M_i) within 10 steps, depending on predetermined thresholds for convergence and bisection step size parameters. After convergence has been achieved, we obtain a final pick adjustment solution for the reduced system of M' constraints, and calculate 1\sigma error bars for each element of the final solution via Monte Carlo propagation of Gaussian data errors. An a posteriori zero-mean constraint is applied to the final set of pick adjustments.
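One sorted-discard step of this culling procedure might be sketched as follows (the `cull_constraints` helper and the fixed discard fraction are illustrative assumptions; the real loop re-solves the system and checks q(f, M) between steps, restoring constraints when the fit becomes too good):

```python
import numpy as np

def cull_constraints(misfit, keep_mask, frac=0.5):
    """One bisection step: discard the worst `frac` of the currently kept
    constraints, ranked by per-constraint misfit (simplified sketch of
    the residual-based rejection).
    """
    kept = np.flatnonzero(keep_mask)
    order = kept[np.argsort(misfit[kept])]      # best -> worst
    n_keep = max(1, int(len(order) * (1 - frac)))
    new_mask = np.zeros_like(keep_mask)
    new_mask[order[:n_keep]] = True
    return new_mask

# Six constraints, two of which (indices 1 and 4) are gross outliers
misfit = np.array([0.1, 5.0, 0.2, 0.15, 7.0, 0.12])
mask = np.ones(6, dtype=bool)
mask = cull_constraints(misfit, mask)
print(np.flatnonzero(mask))   # the smallest-misfit constraints survive
```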
Figure 5 illustrates the solution process for an m = 6 event synthetic cluster, initially constrained by a full set of M = (6)(5)/2 = 15 interevent lag estimates and 1 zero-mean constraint. The true pattern of pick adjustments was chosen arbitrarily to be a zero-mean half-period sine function with an amplitude of 1 time unit. The 15 first-difference data points were randomized by adding a Gaussian error term with a standard deviation of 0.2 time units. Outliers were introduced to the system by adding large random terms to data points 3 and 7.

Figures 5a and 5b show the L1 solution and data fit for the entire data and constraint set, where the recovery of the true pick adjustments has been skewed by the data outliers, and the probability of a worse misfit is q = 0 to single precision. After automatically rejecting the two system constraints with the largest residual contributions, as we have already described, and re-solving the problem, we obtain a revised solution (Fig. 5c) with an acceptable data misfit (Fig. 5d) of q(9.22, 8) = 0.06 and an L1 misfit improvement between solution and true model,

f' = \frac{\sum_{i=1}^{M'} |x_i - x_{i,\mathrm{true}}|}{\sigma_1},  (37)

of 59%, with generally tighter 1\sigma error bars.
Absolute versus Relative Locations
Introduction of the new picks for each cluster provides precise relative event relocations within clusters, but the question of improved intercluster locations is not addressed in this fashion. Within any given cluster it is commonly observed that analyst picks do not always scatter about a zero mean; they are often systematically late in instances of low signal/noise. The resulting adjusted picks for a particular phase may therefore exhibit a significant bias. Such biases will result in the relative mislocation of mean cluster centroids, an artifact that is carried forward from the preliminary catalog to the intracluster relative relocations (Rowe, 2000; Rowe et al., 2002).

To correct for relative cluster centroid mislocations, we compare waveform similarity among the clusters. Relative pick lags within clusters are estimated as we have outlined, adjusting the phase picks accordingly. Waveforms for each phase within each cluster are then aligned on their adjusted picks and stacked (Fig. 6). Each stack is then treated as a composite earthquake trace for the cluster. Ensembles of stacked seismograms are then cross-correlated and relative pick lags determined between the composite earthquakes using the L1-norm conjugate gradient solver as before. The resulting intercluster lags are used to adjust the mean picks within each cluster. In this way the very tightly constrained relative adjustments for intracluster associations are preserved, with no risk of degrading these relative locations by including uncorrelated events, and the overall intercluster relationships are adjusted according to the composite waveform cross-correlation lags (Rowe, 2000; Rowe et al., 2002).
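A toy sketch of the stack-and-correlate step (integer lags only, via plain cross-correlation; `stack` and `integer_lag` are hypothetical helpers, and the full method would pass the stack lags to the L1 solver described above):

```python
import numpy as np

def stack(traces):
    """Composite waveform for a cluster: mean of pick-aligned traces."""
    return np.mean(traces, axis=0)

def integer_lag(a, b):
    """Integer lag of b relative to a from the full cross-correlation."""
    cc = np.correlate(a, b, mode="full")
    return (len(b) - 1) - int(np.argmax(cc))

# Two clusters sharing a waveform shape, with cluster 2's picks
# systematically 3 samples late relative to cluster 1's.
n = np.arange(64)
pulse = np.exp(-0.5 * ((n - 30) / 2.5) ** 2)
rng = np.random.default_rng(0)
cluster1 = [pulse + 0.02 * rng.standard_normal(64) for _ in range(5)]
cluster2 = [np.roll(pulse, 3) + 0.02 * rng.standard_normal(64) for _ in range(5)]

# Cross-correlating the stacks recovers the intercluster pick bias,
# which would then be applied to every pick in cluster 2.
lag = integer_lag(stack(cluster1), stack(cluster2))
print(lag)
```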
We illustrate this hierarchical correlation and stacking
method in Figure 6 for three synthetic clusters. In Figures
6a, 6b, and 6c we show preliminary and repicked waveform
alignments for each of the three clusters.

Figure 5. Improved solution robustness using the L1 residual minimization and iterative residual-based system constraint rejection. (a) Full data set solution (triangles) with Monte Carlo-estimated standard deviations, compared to the true model (circles). (b) Predicted data (triangles) and actual data (circles) accompanied by standard errors. (c) Reduced data set with outliers removed by residual-based rejection. (d) Refined solution following removal of outliers (after Aster and Rowe, 2000).

These synthetic
clusters were generated by isolating one cluster of similar waveforms and variously perturbing the preliminary picks to generate different pick variances and means among the three examples. Although within each of the aligned clusters the new picks are consistent, note that the mean adjusted pick (horizontal dashed line) occurs at somewhat different times on the resulting alignment stack. This is an artifact of the differing preliminary pick distributions among each of the clusters. Figure 6d illustrates the problem that results if we assume that preliminary cluster centroids are accurate: in the upper panels we show on the left all member events of the three clusters, aligned on their preliminary picks, with the resulting stack shown to the right. In the lower panels of Figure 6d we show the traces aligned on their revised intracluster picks, and the resulting stack. Although the waveforms have clearly been well aligned for intracluster consistency, a serious misalignment is exhibited among the three clusters in terms of the resulting mean pick estimates.
If we subsequently cross-correlate the stacked waveforms, however, and determine relative lags among the stacks (Fig. 6e), we can then apply the additional pick correction to each of the member traces and adjust all events by their relative intercluster lags (Fig. 6f). This bilevel cross-correlation approach still does not address the question of overall analyst bias for an entire catalog, but any remaining picking artifact could be handled through individual station corrections determined through JHD or other joint location methods. We note, however, that this approach cannot remove the artifacts from the picks themselves. Hence, other applications that rely on the picks will be unable to separate picking bias from actual travel-time residuals at the receivers. This overall bias exists in the preliminary as well as the relocated catalog. Furthermore, cluster stacks and individual events that do not correlate well with other events or clusters remain uncorrected with this technique. Addressing these orphans, as well as addressing overall catalog bias, is the subject of ongoing work.
Summary
We have developed an automatic, adaptive algorithm for adjusting phase picks for consistency among similar events within large digital seismic waveform catalogs. Innovations include automatic, adaptive, cross-coherency and polarization filtering, and the use of eigenspectral methods for estimating subsample phase lags and dimensionally meaningful lag standard deviations. After initial cross-correlation the catalog is clustered using a hierarchical, dendrogram-based pair-group classification scheme with segregation based either on the cophenetic correlation function or on a predetermined cross-correlation threshold. Resulting clusters are solved for consistent intracluster pick lags using an L1-norm minimizing, outlier-resistant, iterative, conjugate gradient method formulated to minimize memory and computation requirements. Intracluster seismograms are aligned on the zero-mean adjusted repicks, then stacked to provide a composite waveform. Ensembles of stacked seismograms are then cross-correlated among clusters to determine intercluster pick lag adjustments; this corrects for possible analyst biases and provides for consistent intercluster pick (and location) relationships within the
catalog, with no dependence on preliminary hypocenters.

Figure 6. Hierarchical clustering, lag adjustment, and stacking method (a-c). Three hypothetical clusters of five events each. Upper panels show traces aligned on preliminary picks and the associated waveform stack; lower panels show traces aligned on adjusted picks, with the associated stack. Horizontal dashed lines indicate pick times on the stacked trace in each. (d) Clusters from a, b, and c combined to show initial scatter (upper panels) and relationships among intracluster adjustments (lower panels). (e) Stacks from the clusters of a, b, and c, showing preliminary alignment (upper panels) and appropriately shifted stacks, following hierarchical stacking and cross-correlation (lower panels). (f) The same three clusters showing initial misalignments (upper panels) and final, corrected alignments (lower panels) after both intracluster and intercluster lags have been applied.
We stress that, although the method is termed automatic, this does not imply black-box or completely hands-off operation, insofar as application requires some intelligent choices on the part of the user to tune the parameters in the algorithm so that it then may be allowed to process the data automatically. The goal of this software is to provide substantial improvement in pick consistency and accuracy while reducing the burden on analysts; it goes without saying that problematic events will require human intervention, and assessment of the success of the automatic processing will be required. We are continuing development of the procedure to further minimize user intervention and to implement near-real-time functioning.
An application of this technique to a large seismic catalog can be found in Rowe et al. (2002), in which the algorithm is applied to microearthquakes associated with injection experiments at the Soultz, France, hot dry rock geothermal site.
Acknowledgments

Very useful discussions of the method were provided by H. Moriya and R. Jones. Helpful reviews of the manuscript were provided by M. Fehler, C. Thurber, H. Tobin, J. Schlue, and N. Deichmann. We also appreciate thorough reviews by S. Moran and H. Asanuma. This work was supported under a grant from Sandia National Laboratories, Albuquerque, New Mexico. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the U.S. Department of Energy under Contract Number DE-AC04-94AL85000. Funding was also provided by Niitsuma Laboratories, Tohoku University, Sendai, Japan, and by National Science Foundation Office of Polar Programs Grant Number 9419267.
References

Aster, R. C., and C. A. Rowe (2000). Automatic phase pick refinement and similar event association in large seismic data sets, in Advances in Seismic Event Location, C. Thurber and N. Rabinowitz (Editors), Kluwer, Amsterdam, 231-263.
Aster, R. C., and J. Scott (1993). Comprehensive characterization of waveform similarity in microearthquake data sets, Bull. Seism. Soc. Am. 83, 1307-1314.
Aster, R. C., P. M. Shearer, and J. Berger (1990). Quantitative measurements of shear wave polarization at the Anza seismic network, southern California: implications for shear wave splitting and earthquake prediction, J. Geophys. Res. 95, 12,449-12,473.
Baria, R., J. Baumgartner, A. Gerard, R. Jung, and J. Garnish (1999). European HDR research programme at Soultz-sous-Forêts (France) 1987-1996, Geothermics 28, 655-669.
Carr, D., C. Young, R. Aster, and X. Zhang (1999a). Cluster analysis for CTBT seismic event monitoring, 21st Annual Seismic Research Symposium on Monitoring a CTBT, 285-293.
Carr, D., C. Young, J. Harris, R. Aster, and X. Zhang (1999b). Cluster analysis for CTBT seismic event monitoring (abstract), Seism. Res. Lett. 70, 227-228.
Deichmann, N., and M. Garcia-Fernandez (1992). Rupture geometry from high-precision relative hypocentre locations of microearthquake clusters, Geophys. J. Int. 110, 501-517.
Dodge, D. A., G. C. Beroza, and W. L. Ellsworth (1995). Foreshock sequence of the 1992 Landers, California, earthquake and its implications for earthquake nucleation, J. Geophys. Res. 100, 9865-9880.
Fehler, M., W. S. Phillips, R. Jones, L. House, R. Aster, and C. Rowe (2000). A method for improving relative earthquake locations, Bull. Seism. Soc. Am. 90, 775-780.
Fremont, M.-J., and S. D. Malone (1987). High precision relative locations of earthquakes at Mount St. Helens, Washington, J. Geophys. Res. 92, 10,223-10,236.
Gillard, D., A. M. Rubin, and P. Okubo (1998). Highly concentrated seismicity caused by deformation of Kilauea's deep magma system, Nature 384, 343-346.
Got, J.-L., J. Fréchet, and F. W. Klein (1994). Deep fault plane geometry inferred from multiplet relative relocation beneath the south flank of Kilauea, J. Geophys. Res. 99, 15,375-15,386.
Harris, M., and C. Young (1997). MatSeis: a seismic GUI and toolbox for MATLAB, Seism. Res. Lett. 68, 267-269.
Joswig, M. (1995). Automated classification of local earthquake data in the BUG small array, Geophys. J. Int. 160, 262-285.
Kissling, E. (1988). Geotomography with local earthquake data, Rev. Geophys. 26, 659-698.
Lance, G. N., and W. T. Williams (1967). A general theory for classificatory sorting strategies. 1. Hierarchical systems, Comput. J. 10, 271-276.
Lees, J. M. (1998). Multiplet analysis at Coso geothermal, Bull. Seism. Soc. Am. 88, 1127-1143.
Ludwig, J. A., and J. F. Reynolds (1988). Statistical Ecology: A Primer on Methods and Computing, John Wiley & Sons, New York.
Park, J., C. R. Lindberg, and F. L. Vernon III (1987). Multitaper spectral analysis of high-frequency seismograms, J. Geophys. Res. 92, 12,675-12,684.
Parker, R., and M. McNutt (1980). Statistics for the one-norm misfit measure, J. Geophys. Res. 85, 4429-4430.
Phillips, W. S. (2000). Precise microearthquake locations and fluid flow in the geothermal reservoir at Soultz-sous-Forêts, France, Bull. Seism. Soc. Am. 90, 212-228.
Phillips, W. S., L. S. House, and M. C. Fehler (1997). Detailed joint structure in a geothermal reservoir from studies of induced microearthquake clusters, J. Geophys. Res. 102, 11,745-11,763.
Polak, E. (1971). Computational Methods in Optimization, Academic Press, New York.
Poupinet, G., W. L. Ellsworth, and J. Fréchet (1984). Monitoring velocity variations in the crust using earthquake doublets: an application to the Calaveras Fault, California, J. Geophys. Res. 89, 5719-5731.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1989). Numerical Recipes in C, Cambridge Univ. Press, New York.
Pujol, J. (1992). Joint hypocentral location in media with lateral velocity variations and interpretation of the station corrections, Phys. Earth Planet. Inter. 75, 7-24.
Rowe, C. (2000). Correlation-Based Phase Pick Correction and Similar Earthquake Family Identification in Large Seismic Waveform Catalogs, Ph.D. Thesis, New Mexico Institute of Mining and Technology, Socorro.
Rowe, C. A., and R. C. Aster (1999). Application of automatic, adaptive filtering and eigenspectral techniques to large digital waveform catalogs for improved phase pick consistency and uncertainty estimates (abstract), EOS 80, F660.
Rowe, C. A., R. C. Aster, W. S. Phillips, R. H. Jones, B. Borchers, and M. C. Fehler (2002). Relocation of induced microseismicity at the Soultz geothermal reservoir using automated, high-precision repicking, Pure Appl. Geophys. 159, 563-596.
Rubin, A. M., D. Gillard, and J.-L. Got (1999). Streaks of microearthquakes along creeping faults, Nature 400, 635-641.
Shearer, P. M. (1997). Improving local earthquake locations using the L1 norm and waveform cross correlation: application to the Whittier Narrows, California, aftershock sequence, J. Geophys. Res. 102, 8269-8283.
Shearer, P. M. (1998). Evidence from a cluster of small earthquakes for a fault at 18 km depth beneath Oak Ridge, southern California, Bull. Seism. Soc. Am. 88, 1327-1336.
Slunga, R., S. T. Rognvaldsson, and R. Bodvarsson (1995). Absolute and relative locations of similar events with application to microearthquakes in southern Iceland, Geophys. J. Int. 123, 409-419.
Sneath, P. H. A., and R. R. Sokal (1973). Numerical Taxonomy, W. H. Freeman & Company, San Francisco.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis, Proc. IEEE 70, 1055-1096.
Waldhauser, F., and W. L. Ellsworth (2000). A double-difference earthquake location algorithm: method and application to the northern Hayward Fault, California, Bull. Seism. Soc. Am. 90, 1353-1368.
Wassermann, J., and M. Ohrnberger (2001). Automatic hypocenter determination of volcano induced seismic transients based on wavefield coherence: an application to the 1998 eruption of Mt. Merapi, Indonesia, J. Volcanol. Geotherm. Res. 110, 57-77.
Young, C. J., B. J. Merchant, and R. C. Aster (2001). Comparison of cluster analysis methods for identifying regional seismic events, 23rd Annual DTRA/NNSA Seismic Research Review, 229-238.
Department of Earth and Environmental Science and Geophysical
Research Center
New Mexico Institute of Mining and Technology
Socorro, New Mexico 87801
char@geology.wisc.edu
(C.A.R., R.C.A.)
Department of Mathematics
New Mexico Institute of Mining and Technology
Socorro, New Mexico 87801
(B.B.)
Sandia National Laboratories
P.O. Box 5800, MS 1138
Albuquerque, New Mexico 87185–1138
(C.J.Y.)
Manuscript received 22 August 2001.