
The effects of acoustic misclassification on cetacean species abundance estimation

To estimate the density or abundance of a cetacean species using acoustic detection data, it is necessary to correctly identify the species that are detected. Developing an automated species classifier with 100% correct classification rate for any species is likely to stay out of reach. It is therefore necessary to consider the effect of misidentified detections on the number of observed data and consequently on abundance or density estimation, and develop methods to cope with these misidentifications. If misclassification rates are known, it is possible to estimate the true numbers of detected calls without bias. However, misclassification and uncertainties in the level of misclassification increase the variance of the estimates. If the true numbers of calls from different species are similar, then a small amount of misclassification between species and a small amount of uncertainty around the classification probabilities does not have an overly detrimental effect on the overall variance. However, if there is a difference in the encounter rate between species calls and/or a large amount of uncertainty in misclassification rates, then the variance of the estimates becomes very large and this dramatically increases the variance of the final abundance estimate.
Marjolaine Caillat a)
Sea Mammal Research Unit, Scottish Oceans Institute, St Andrews University, St Andrews KY16 8LB,
United Kingdom
Len Thomas
Centre for Research into Ecological and Environmental Modelling, The Observatory Buchanan Gardens,
University of St Andrews, St Andrews KY16 9LZ, United Kingdom
Douglas Gillespie
Sea Mammal Research Unit, Scottish Oceans Institute, St Andrews University, St Andrews KY16 8LB,
United Kingdom
(Received 4 July 2012; revised 29 January 2013; accepted 11 February 2013)
© 2013 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4816569]
PACS number(s): 43.60.Bf [ZHM] Pages: 2469–2476
I. INTRODUCTION
Over the last two decades, researchers and managers
have become increasingly aware of the advantages of using
passive acoustic monitoring over visual cues to detect marine
mammals. Many studies, in particular those processing large
datasets from long-term fixed hydrophone deployments, rely
on automatic detectors and species classifiers to decrease the
time and cost of analysis.
The repertoire of vocalizations by marine mammals is
large and highly variable across species. Some species, such
as large whales, produce calls that are easily recognized by
an experienced observer or by an automatic classifier.
However, many of the delphinid species produce highly vari-
able calls where the frequency range of the different species’
vocalizations overlaps to a large degree. These sounds are
more challenging to classify. Classification algorithms have
been developed by a number of researchers to identify del-
phinid sounds (e.g., Datta and Sturtivant, 2002; Gillespie
et al., 2013; Oswald et al., 2007). The rate of misclassifica-
tion in these examples was determined by testing the classi-
fiers on recordings of species whose identity had been
determined visually. However, none of these classifiers are
perfect, and there remains considerable misclassification
between species.
In any management strategy, accurate and precise quan-
tification of population size (“abundance”) is crucial to de-
velop appropriate management actions. A standard method
for estimating abundance based on acoustic detections is cue
counting, where the cues are the vocalizations detected
(Marques et al., 2009, 2011). The general formula to estimate
a species’ abundance from cues is given by
$$\hat{N} = \frac{n(1-\hat{c})}{a\,T\,\hat{P}\,\hat{r}}\,A, \qquad (1)$$
where $n$ is the number of detected cues, $\hat{c}$ is the estimated proportion of false positives detected (calls classified as the species of interest which originated from other species or other sources of noise), $a$ is the area in which cues can be detected, $\hat{P}$ is the estimated average probability of a cue being detected within this area during recording time $T$, $\hat{r}$ is the estimated cue production rate, and $A$ is the total study area (Marques et al., 2009). Apart from the fact that this formula requires knowledge of the cue production (i.e., vocalization) rate, which is unknown for many species, the abundance estimate in Eq. (1) only considers the presence of one species at a time in the area of interest.
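As a minimal sketch of how Eq. (1) is evaluated, the function below implements the formula term by term. The function name, the units, and all input values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from any survey.

```python
def cue_count_abundance(n, c_hat, a, T, P_hat, r_hat, A):
    """Cue-counting abundance estimate following Eq. (1).

    n      -- number of detected cues
    c_hat  -- estimated proportion of false positives
    a      -- area in which cues can be detected (assumed km^2)
    T      -- recording time (assumed hours)
    P_hat  -- estimated average detection probability within a
    r_hat  -- estimated cue rate (assumed cues per animal per hour)
    A      -- total study area (same units as a)
    """
    true_cues = n * (1.0 - c_hat)                  # n (1 - c_hat)
    density = true_cues / (a * T * P_hat * r_hat)  # animals per unit area
    return density * A                             # scale up to the study area


# Hypothetical inputs, for illustration only.
N_hat = cue_count_abundance(n=12000, c_hat=0.1, a=50.0, T=100.0,
                            P_hat=0.4, r_hat=6.0, A=1000.0)
print(round(N_hat, 6))  # 900.0
```

With these made-up inputs, 10 800 of the 12 000 detections are treated as true cues, and the estimate scales linearly with $(1-\hat{c})$, so any error in the false-positive proportion carries straight through to the abundance estimate.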
In this paper, we only address the issue of determining the true number of calls $\hat{v}$, which in Eq. (1) is the term $\hat{v} = n(1-\hat{c})$. Marques et al. (2009) estimated the proportion of false positive detections, $\hat{c}$, by visually examining 30 periods of 10 min from 6 days of recordings, a process which relied heavily on a human operator being able to distinguish between the sounds of interest and a range of other sound sources.
a) Author to whom correspondence should be addressed. Electronic mail: mc326@st-andrews.ac.uk
J. Acoust. Soc. Am. 134 (3), Pt. 2, September 2013. 0001-4966/2013/134(3)/2469/8/$30.00 © 2013 Acoustical Society of America
If the main source of false positive detections is the pres-
ence of other species with similar vocalizations in the study
area, then the rate of false positive detections will be strongly
related to the relative call densities from the different species.
For example, if we know that species A and B are often con-
fused by the classifier, and that species B is much more com-
mon or more vocal than species A, then a high percentage of
the detections attributed by the classifier to species A will in
fact be false positive detections resulting from the presence
of species B. If on the other hand, species B were extremely
rare or very silent, then there would be few misclassifications
assigned to species A from species B.
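The dependence of the false-positive rate on relative call density can be illustrated with a two-species sketch. Here `p_AA` is the probability that a call from species A is classified as A and `p_AB` the probability that a call from species B is classified as A; these names, the probabilities, and the call counts are hypothetical values chosen for illustration.

```python
def false_positive_share(p_AA, p_AB, v_A, v_B):
    """Expected fraction of detections labelled 'A' that really came from B."""
    correct = p_AA * v_A      # calls from A classified as A
    false_pos = p_AB * v_B    # calls from B misclassified as A
    return false_pos / (correct + false_pos)


# Same classifier in both cases; only the true call counts change.
share_when_B_common = false_positive_share(0.85, 0.05, v_A=50, v_B=8000)
share_when_B_rare = false_positive_share(0.85, 0.05, v_A=8000, v_B=50)

print(f"B common: {share_when_B_common:.3f}")
print(f"B rare:   {share_when_B_rare:.5f}")
```

With a 5% confusion rate, roughly nine in ten detections attributed to species A are false positives when B is far more vocal, whereas almost none are when B is rare.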
Since we are interested in estimating the density of multiple species within a given study area, it becomes necessary to replace the $(1-\hat{c})$ term with the more general equation
$$\hat{v} = M(n), \qquad (2)$$
where $\hat{v}$ and $n$ are now vectors representing the true numbers of calls and the numbers of calls counted for each species after misclassification, respectively, and $M$ is a more general misclassification operator.
The level of misclassification between species can generally be described in terms of a confusion matrix (e.g., Oswald et al., 2007; Gillespie et al., 2013), which summarizes the probabilities for correct, false positive and false negative classifications of all species considered. The confusion matrix [Eq. (3)] is a square matrix of dimension $m \times m$ in which each element $p_{ij}$ is the probability of classifying species $j$ (column) as species $i$ (row). In particular, the entries for $i = j$ represent the probabilities of correctly classifying a species (success) and the off-diagonals ($i \neq j$) are probabilities of incorrectly classifying species $j$ as species $i$ (failure). A small $p_{ij}$, $\forall\, i \neq j$, means a low misclassification rate of species $j$ as species $i$ while a large $p_{ij}$, $\forall\, i \neq j$, means a high misclassification rate. On the other hand, a small $p_{ij}$, $\forall\, i = j$, means a low correct classification rate of species $j$ and vice versa for a high $p_{ij}$, $\forall\, i = j$. Hence, the confusion matrix is given as
$$C = (p_{ij})_{1 \le i,j \le m} =
\begin{pmatrix}
p_{11} & \cdots & p_{1j} & \cdots & p_{1m} \\
\vdots &        & \vdots &        & \vdots \\
p_{i1} & \cdots & p_{ij} & \cdots & p_{im} \\
\vdots &        & \vdots &        & \vdots \\
p_{m1} & \cdots & p_{mj} & \cdots & p_{mm}
\end{pmatrix}, \qquad (3)$$
where $\sum_{i=1}^{m} p_{ij} = 1$ $\forall\, 1 \le j \le m$, i.e., each column sums to 1.
The expected number of detected calls $E(n)$ for each species following misclassification is therefore given by
$$E(n) = C\,v \qquad (4)$$
and it follows that the true number of detections for each species can be estimated using
$$\hat{v} = C^{-1} n, \qquad (5)$$
where $C^{-1}$ is the inverse of the confusion matrix $C$.
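Equations (4) and (5) can be checked numerically with a short sketch. The matrix below is confusion matrix (c) of Table I and the call counts are the unbalanced values used in the Methods; the use of NumPy and of `numpy.linalg.solve` (rather than forming the inverse explicitly) is a choice made here for illustration.

```python
import numpy as np

# Confusion matrix (c) from Table I: rows = predicted species, columns = true species.
C = np.array([
    [0.85, 0.08, 0.02, 0.01],
    [0.10, 0.85, 0.03, 0.09],
    [0.03, 0.05, 0.85, 0.05],
    [0.02, 0.02, 0.10, 0.85],
])
v = np.array([8000.0, 3000.0, 950.0, 50.0])  # true numbers of calls (unbalanced case)

# Eq. (4): expected observed counts after misclassification.
expected_n = C @ v
print(expected_n)  # [7059.5 3383.  1200.   357.5]

# Eq. (5): recover the true counts; solving C x = n is equivalent to C^{-1} n.
v_hat = np.linalg.solve(C, expected_n)
print(np.round(v_hat, 6))
```

In expectation, species D is credited with 357.5 calls although only 50 are truly its own, which shows how strongly the raw counts of a rare species can be inflated by calls from common species before the correction is applied.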
Species classification is a stochastic process where each
classification may be considered as an independent random
event. In addition, we cannot assume that the confusion ma-
trix is known precisely since it is typically derived from a fi-
nite sample of real data. Gillespie et al. (2013) show
uncertainties, expressed as a measure of standard deviation,
ranging from 0.04 to 0.48 for the probabilities of a typical
confusion matrix. The stochastic nature of the classification process combined with our imperfect knowledge of the confusion matrix add to the uncertainty of any estimate of the true number of detected cues ($\hat{v}$) and consequently, to the uncertainty of estimated species abundance if misclassification is taken into account.
With this in mind, this paper presents the first statistical
analysis of the effects of species misclassification in acoustic
surveys. In particular, it examines the bias and precision of
the estimates of the true number of detected calls from multi-
ple species which arise from the stochastic nature of the con-
fusion process, as well as the uncertainty within the confusion
matrix. We achieved this by looking at hypothetical confusion
matrices and simulated data.
After a brief description of the classification process in mathematical terms, which also serves as an introduction of notation, we begin by looking at a simple model containing only the stochasticity within the classification process. We then extend this analysis by incorporating uncertainty in the rates of misclassification.

TABLE I. The five different confusion matrices (a–e) used during the simulation studies. Confusion matrix a is the identity matrix (no misclassification), b and c both have a high correct classification rate, but differ in that the misclassification rates of b are equal between species, whereas they are different in c. Confusion matrices d and e both have low rates of correct classification and again differ in that misclassification is equal between species in d, but varies in e. In each matrix, rows are the predicted species and columns the true species.

(a) Scenario x.a      True species
Predicted species     SpA    SpB    SpC    SpD
SpA                   1      0      0      0
SpB                   0      1      0      0
SpC                   0      0      1      0
SpD                   0      0      0      1

(b) Scenario x.b      True species
Predicted species     SpA    SpB    SpC    SpD
SpA                   0.85   0.05   0.05   0.05
SpB                   0.05   0.85   0.05   0.05
SpC                   0.05   0.05   0.85   0.05
SpD                   0.05   0.05   0.05   0.85

(c) Scenario x.c      True species
Predicted species     SpA    SpB    SpC    SpD
SpA                   0.85   0.08   0.02   0.01
SpB                   0.10   0.85   0.03   0.09
SpC                   0.03   0.05   0.85   0.05
SpD                   0.02   0.02   0.10   0.85

(d) Scenario x.d      True species
Predicted species     SpA    SpB    SpC    SpD
SpA                   0.52   0.16   0.16   0.16
SpB                   0.16   0.52   0.16   0.16
SpC                   0.16   0.16   0.52   0.16
SpD                   0.16   0.16   0.16   0.52

(e) Scenario x.e      True species
Predicted species     SpA    SpB    SpC    SpD
SpA                   0.52   0.04   0.20   0.20
SpB                   0.15   0.52   0.13   0.05
SpC                   0.10   0.14   0.52   0.23
SpD                   0.23   0.30   0.15   0.52
II. THE CLASSIFICATION PROCESS
We assume that classification events are independent of each other. Thus the classification for each species $j$ can be described as the outcome of a multinomial process, where the vector of probabilities of the corresponding multinomial distribution is given by the $j$th column of the confusion matrix.
The numbers of trials in these multinomial distributions are the true numbers of detections $v$, i.e., $v_j$ is the number of trials, or the true number of detections for species $j$.
The expected observed number of vocalizations of species $i$ ($n_i$) is equal to the number of vocalizations of species $i$ correctly classified as species $i$ plus the false positive classifications when vocalizations of another species $j \neq i$ have been misclassified as species $i$,
$$E[n_i] = \underbrace{p_{ii}\,v_i}_{\text{correctly classified}} + \underbrace{\sum_{j \neq i} p_{ij}\,v_j}_{\text{misclassified species}}. \qquad (6)$$
The following interpretation will be useful when simulations are considered later on: Since we have identified each column with the probability vector of a multinomial distribution, it follows from Eq. (6) that the observed data for species $i$ ($n_i$) is the sum of the output values of the $i$th components of $m$ multinomial distributions, i.e.,
$$n_i = \sum_{j=1}^{m} \mathrm{Multi}(v_j, p_{\cdot j})[i] \qquad (7)$$
with the number of trials being the true number of detections $v_j$ and the multinomial probability for species $j$ being the $j$th column $p_{\cdot j}$ of the confusion matrix, e.g., $n_1$ is the sum of the first realized values of $m$ multinomial distributions.
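One realization of the process in Eq. (7) can be sketched as follows. The function name `simulate_observed`, the random seed, and the choice of NumPy's `Generator.multinomial` are assumptions made for this illustration; the matrix is confusion matrix (b) of Table I with 3000 true calls per species.

```python
import numpy as np


def simulate_observed(v, C, rng):
    """Draw one vector of observed counts n following Eq. (7).

    v   -- integer array of true call counts, one per species
    C   -- confusion matrix, C[i, j] = P(classified as i | true species j)
    rng -- numpy random Generator
    """
    m = C.shape[0]
    n = np.zeros(m, dtype=np.int64)
    for j in range(m):
        # The v[j] calls of species j are spread over the m predicted
        # classes according to the j-th column of C.
        n += rng.multinomial(v[j], C[:, j])
    return n


C_b = np.full((4, 4), 0.05) + 0.80 * np.eye(4)   # matrix (b) of Table I
v = np.array([3000, 3000, 3000, 3000])
rng = np.random.default_rng(0)

n = simulate_observed(v, C_b, rng)
print(n, n.sum())
```

Every call is assigned to exactly one predicted species, so the observed counts always sum to the total number of true calls even though the individual entries vary from draw to draw.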
III. METHODS
For this study, we have not considered the effects of ani-
mal encounter rate, which can be an important source of
uncertainty on animal abundance estimates, but would
detract from the primary purpose of this paper which is to
examine the effects of misclassification. We therefore con-
sider only the following two sources of uncertainty:
(1) The stochastic nature of the classification process.
(2) Uncertainty in our knowledge of the classifier perform-
ance (i.e., uncertainty on the values of the elements of
the confusion matrix).
First, we only consider the stochastic nature of the classifi-
cation process, by assuming that the confusion matrix is
known (i.e., no uncertainty). In a second step, we include
additional uncertainty in the values of the confusion matrix
itself.
The bias and variance of our estimates of the true number of detected calls were assessed using five different confusion matrices (Table I) with increasing levels of misclassification. These include the identity matrix (i.e., no misclassification) and four others containing both low and high rates of misclassification with the misclassification being either the same (scenarios b or d) or differing for each species (scenarios c or e).
For each confusion matrix we evaluated the bias and
variance using both balanced data (i.e., same number of calls
for each species, scenario 1) and unbalanced data (i.e., dif-
fering numbers of calls per species, scenario 2). All models
were developed with four species. For balanced data, we
assumed that the true number of calls was exactly 3000 for
each species. For unbalanced data, we selected values of
8000, 3000, 950, and 50 calls, respectively. Thus the total
number of calls is the same as the balanced data, but with a
160-fold difference in the number of vocalizations between
the most and the least abundant species.
The ten different scenarios (five confusion matrixes
with balanced and unbalanced data) are summarized in
Table II.
TABLE II. Summary of the scenarios tested in the simulation study: similar misclassification rates means that elements of the confusion matrix outside the diagonal are the same between species (scenarios x.b and x.d), whereas for different misclassification rates, they are different between species (scenarios x.c and x.e).

                                                                  Balanced data   Unbalanced data
No misclassification                                              Scenario 1.a    Scenario 2.a
Low misclassification rates    Similar misclassification rates    Scenario 1.b    Scenario 2.b
                               Different misclassification rates  Scenario 1.c    Scenario 2.c
High misclassification rates   Similar misclassification rates    Scenario 1.d    Scenario 2.d
                               Different misclassification rates  Scenario 1.e    Scenario 2.e
TABLE III. Examples of Dirichlet α parameters used for species A for each scenario. For the remaining species the α parameters were the same but in different order to match the confusion matrices.

α for:             Sc x.a      Sc x.b       Sc x.c               Sc x.d               Sc x.e
Low uncertainty    100,0,0,0   85,5,5,5     85,10,3,2            52,16,16,16          52,15,10,23
High uncertainty   0.1,0,0,0   0.85,5,5,5   0.85,0.1,0.03,0.02   0.52,0.16,0.16,0.16  0.52,0.15,0.1,0.23
For the simple case, in which the variance within the
values of the confusion matrix is assumed zero, we have
derived an analytical solution for the bias and variance on
the true number of detected calls (Appendix). However,
when uncertainty is added to the confusion matrix, the ana-
lytical approach becomes more complex, so we also explore
bias and variance through simulation. When variability in
the values of the confusion matrix is added to the model,
bias and precision are measured from simulation only.
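For the case with a known confusion matrix, the variance of $\hat{v}$ can be sketched numerically. The code below uses the standard covariance of a sum of independent multinomials, $\Sigma_n = \sum_j v_j\,[\mathrm{diag}(p_{\cdot j}) - p_{\cdot j} p_{\cdot j}^{T}]$, propagated through the linear estimator of Eq. (5) as $C^{-1} \Sigma_n (C^{-1})^{T}$. It is an assumption that this matches the derivation in the Appendix; the function name `analytic_cv` is hypothetical.

```python
import numpy as np


def analytic_cv(v, C):
    """CV (in percent) of v_hat = C^{-1} n when C is known exactly."""
    v = np.asarray(v, dtype=float)
    m = C.shape[0]
    # Covariance of n: sum over true species of multinomial covariances.
    cov_n = np.zeros((m, m))
    for j in range(m):
        p = C[:, j]
        cov_n += v[j] * (np.diag(p) - np.outer(p, p))
    # Propagate through the linear map n -> C^{-1} n.
    C_inv = np.linalg.inv(C)
    cov_v_hat = C_inv @ cov_n @ C_inv.T
    return 100.0 * np.sqrt(np.diag(cov_v_hat)) / v


C_b = np.full((4, 4), 0.05) + 0.80 * np.eye(4)   # matrix (b) of Table I

cv_balanced = analytic_cv([3000, 3000, 3000, 3000], C_b)
cv_unbalanced = analytic_cv([8000, 3000, 950, 50], C_b)
print(np.round(cv_balanced, 2))
print(np.round(cv_unbalanced, 2))
```

Under these assumptions the sketch gives a CV of about 1.19% for every species with balanced data and about 59.9% for the rarest species with unbalanced data, in line with the values reported for confusion matrix b in Table IV.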
For each simulation ($b$), the numbers of misclassified, or observed, calls $n_b$ were generated from the sum of four multinomial distributions with parameters $v$ representing the true number of calls and the $p$'s being the confusion matrix probabilities [Eq. (7)]. The true number of calls $\hat{v}_b$ was then estimated by multiplying the inverse of the confusion matrix by the number of misclassified (observed) calls [Eq. (8)],
$$\hat{v}_b = C^{-1} n_b. \qquad (8)$$
For each scenario, this process was repeated 10 000 times and the mean [Eq. (A3) in Appendix] and variance [Eq. (A10) in Appendix] of the estimated $\hat{v}$ calculated.
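The simulation loop described above can be sketched as follows for scenario 1.b (confusion matrix b of Table I, 3000 calls per species). The function name, the seed, and the vectorized use of `Generator.multinomial` with a `size` argument are choices made for this illustration and are not taken from the original analysis code.

```python
import numpy as np


def simulate_estimates(v, C, n_sims, rng):
    """Return an (n_sims, m) array of estimates v_hat_b = C^{-1} n_b."""
    m = C.shape[0]
    n = np.zeros((n_sims, m))
    for j in range(m):
        # n_sims independent multinomial draws for true species j [Eq. (7)].
        n += rng.multinomial(v[j], C[:, j], size=n_sims)
    # Solve C x = n_b for all simulations at once [Eq. (8)].
    return np.linalg.solve(C, n.T).T


C_b = np.full((4, 4), 0.05) + 0.80 * np.eye(4)
v = np.array([3000, 3000, 3000, 3000])
rng = np.random.default_rng(1)

v_hat = simulate_estimates(v, C_b, n_sims=10_000, rng=rng)
mean_v_hat = v_hat.mean(axis=0)
cv_percent = 100.0 * v_hat.std(axis=0, ddof=1) / mean_v_hat

print(np.round(mean_v_hat, 1))
print(np.round(cv_percent, 2))
```

The simulated means should sit very close to 3000 and the CVs close to 1.2%, which is the behaviour summarized for scenario 1.b in Table V.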
When uncertainty in the confusion matrix was considered, the columns $p_{\cdot j}$ of the confusion matrix were treated as realizations of a probability distribution. To meet the requirement that columns have to sum to 1, this distribution was chosen to be a Dirichlet. The Dirichlet distribution is a multivariate probability distribution parameterized by a vector $\alpha$ of positive reals, $p \sim \mathrm{Dir}(\alpha)$ where $\sum_{i=1}^{k} p_{ij} = 1$ (Gelman, 2004).
For each of the 10 000 simulation trials, new values for the confusion matrix probabilities $p_{ij}$ were generated from a Dirichlet distribution; these were then used in the same multinomial misclassification process as for the simpler situation. The true number of calls $\hat{v}$ was again estimated using the inverse of the mean of the confusion matrix used to simulate the observed data [Eq. (5)].
Simulations were run with two levels (low and high) of
uncertainty on the confusion matrix. In both situations, the
alpha parameters of the Dirichlet distribution were selected
such that the means of the parameters were equal to the con-
fusion matrix probabilities of the different scenarios (Table
III). However to generate low uncertainty in the confusion
matrix, the parameters were selected to have a variance
equal to 0.01 on average. The parameters for the high uncer-
tainty were selected to match a variance of 0.1 observed
with real data of Gillespie et al. (2013).
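A minimal sketch of the simulation with an uncertain confusion matrix is given below for scenario 1.b. It assumes that the Dirichlet parameters of column $j$ are a common concentration factor times the mean column (100 for the low-uncertainty case, matching the 85,5,5,5 entry of Table III); the function name, the seed, and the reduced number of trials are choices made here for illustration only.

```python
import numpy as np


def simulate_with_uncertainty(v, C_mean, concentration, n_sims, rng):
    """Simulate v_hat when each column of C is redrawn from a Dirichlet."""
    m = C_mean.shape[0]
    v_hat = np.empty((n_sims, m))
    for b in range(n_sims):
        n_b = np.zeros(m)
        for j in range(m):
            # Realized classification probabilities for true species j.
            p_j = rng.dirichlet(concentration * C_mean[:, j])
            n_b += rng.multinomial(v[j], p_j)
        # Estimation uses the mean matrix, not the realized one [Eq. (5)].
        v_hat[b] = np.linalg.solve(C_mean, n_b)
    return v_hat


C_b = np.full((4, 4), 0.05) + 0.80 * np.eye(4)   # mean confusion matrix (b)
v = np.array([3000, 3000, 3000, 3000])
rng = np.random.default_rng(2)

v_hat_low = simulate_with_uncertainty(v, C_b, concentration=100.0,
                                      n_sims=2000, rng=rng)
mean_low = v_hat_low.mean(axis=0)
cv_low = 100.0 * v_hat_low.std(axis=0, ddof=1) / mean_low

print(np.round(mean_low, 1))
print(np.round(cv_low, 2))
```

Because the estimator is linear in the observed counts and the mean of the realized matrices equals the matrix used for inversion, the estimates stay centred on the true values while their spread grows markedly compared with the case of a known confusion matrix.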
IV. RESULTS
Throughout this study the variance was represented by the coefficient of variation (CV), which is the standard deviation of the estimate divided by the estimate, generally reported in percent. When uncertainties in the probabilities of the confusion matrix were not taken into account, the analytical approach (Appendix) demonstrated that the means of $\hat{v}$ were an unbiased estimate of the truth ($v$) (Table IV). The simulations verified this result (Table V); no significant difference between means and variances calculated analytically and estimated through simulation was observed.
As expected, without misclassification, the estimates were unbiased and precise (CV = 0). A decrease in the rate of correct classifications (scenarios b and c versus d and e) did not affect the means of the $\hat{v}$ estimates, but it did significantly increase the variance and so the CV of these estimates (Fig. 1).
Where there were different numbers of calls from the four species, we again obtained unbiased estimates of the true numbers of calls [Fig. 1(B)]. The CV on the estimates of numbers of the more common species dropped (due to lower variance coming from misclassifications of the rarer species) but the CV of the estimates of the numbers of rare species calls rose significantly, reaching over 200% with confusion matrices d and e (Fig. 2 and Table V).

TABLE IV. Analytically derived mean expected values for the true number of calls, E[$\hat{v}$], and coefficient of variation (CV, expressed as a percentage).

Scenario 1 (balanced data)
Confusion matrix   SpA            SpB            SpC            SpD
a                  3000 (0%)      3000 (0%)      3000 (0%)      3000 (0%)
b                  3000 (1.19%)   3000 (1.19%)   3000 (1.19%)   3000 (1.19%)
c                  3000 (1.12%)   3000 (1.36%)   3000 (1.14%)   3000 (1.17%)
d                  3000 (4.10%)   3000 (4.10%)   3000 (4.10%)   3000 (4.10%)
e                  3000 (3.98%)   3000 (3.00%)   3000 (4.07%)   3000 (4.96%)

Scenario 2 (unbalanced data)
Confusion matrix   SpA            SpB            SpC            SpD
a                  8000 (0%)      3000 (0%)      950 (0%)       50 (0%)
b                  8000 (0.54%)   3000 (1.19%)   950 (3.34%)    50 (59.9%)
c                  8000 (0.57%)   3000 (1.48%)   950 (2.91%)    50 (43.85%)
d                  8000 (1.75%)   3000 (4.10%)   950 (12.13%)   50 (223.51%)
e                  8000 (1.59%)   3000 (3.29%)   950 (10.66%)   50 (299.92%)

TABLE V. Simulation result, without uncertainty in the confusion matrix, of mean expected values for the true number of calls E[$\hat{v}$], and coefficient of variation (CV, expressed as a percentage).

Scenario 1
               SpA               SpB               SpC               SpD
Scenario x.a   3000 (0%)         3000 (0%)         3000 (0%)         3000 (0%)
Scenario x.b   2999.93 (1.18%)   3000.12 (1.18%)   3000.01 (1.19%)   2999.94 (1.18%)
Scenario x.c   3000.69 (1.12%)   2998.99 (1.36%)   3000.14 (1.15%)   3000.18 (1.17%)
Scenario x.d   2999.87 (4.09%)   3001.49 (4.14%)   2998.55 (4.08%)   3000.09 (4.12%)
Scenario x.e   2997.28 (4.03%)   3002.00 (2.98%)   3000.30 (4.07%)   3000.41 (4.92%)

Scenario 2
               SpA               SpB               SpC               SpD
Scenario x.a   8000 (0%)         3000 (0%)         950 (0%)          50 (0%)
Scenario x.b   8000.37 (0.55%)   2999.47 (1.19%)   950.14 (3.67%)    50.02 (59.89%)
Scenario x.c   7999.46 (0.56%)   3000.40 (1.49%)   949.95 (2.94%)    50.19 (43.7%)
Scenario x.d   8000.74 (1.75%)   3000.72 (4.08%)   949.64 (12.14%)   48.90 (229.82%)
Scenario x.e   7999.63 (1.59%)   3000.88 (3.27%)   948.58 (10.69%)   50.92 (295.94%)
When uncertainty in the confusion matrix was included, the simulations again showed unbiased estimation of $\hat{v}$ for all the misclassification scenarios (Table VI and Table VII). However, adding uncertainty to the confusion matrix generated a large increase in the CV due to an increase of the variance (Fig. 3). With balanced data the CV, across all scenarios, increased on average from 2% without uncertainty to 11.7% with low uncertainty and to 87.7% with high uncertainty [Fig. 3(A)].
With the unbalanced data the average CV across all scenarios for the common species (species A and B) increased from 1.4% without uncertainty to 9% with low uncertainty and to 68.6% with high uncertainty in the confusion matrix. For the rare species (species D) the average CV across the five scenarios was 124.9% without uncertainty, rising to 1009.3% with a low level of uncertainty and 7030.3% with a high level of uncertainty [Fig. 3(B)]. With the high variability in the confusion matrix some individual simulation results gave negative estimates of $\hat{v}$, which is clearly not possible with real data.
The presence of uncertainties in the confusion matrix did not alter the fact that a confusion matrix with low misclassification rates will give a more precise estimation of $\hat{v}$ than a confusion matrix with high misclassification rates (Tables VI and VII).
V. DISCUSSION
Our results show that it is possible to derive unbiased estimates of the true number of detections of each species from data containing misclassified acoustic detections. However, the precision of the estimates is strongly related to the degree of misclassification (Fig. 1) and the degree of uncertainty within the confusion matrix (Fig. 3).
A low CV (<10%) on the estimated numbers of calls can be achieved in some situations, such as when there are similar numbers of calls between species, a low misclassification rate, and low uncertainty within the confusion matrix. In cases where there are large differences in the numbers of detected calls between species (scenarios 2.x), the uncertainty is much higher on the estimates of the number of calls from the rarer species. In the more optimistic scenarios (low misclassification rate and low uncertainty within the confusion matrix), the CV for the common species A and B varied from 0.55% to almost 9%. However, the CV rises close to 100% for less common species (species C) in scenarios with a high rate of misclassification and low uncertainty for the values of the confusion matrix. For species with a very low encounter rate (species D), even with a small level of uncertainty and low misclassification rate, the CV is higher than 400%, reaching about 2500% with a high misclassification rate. With uncertainties in the confusion matrix similar to those observed in real data (Gillespie et al., 2013), the CV is higher than 50%, even for common species, and the estimate becomes totally uninformative for the rare species (CV > 10 000%).

FIG. 1. (Color online) Expected true number of detections for each species, from simulation without uncertainty within the confusion matrix: (A) for balanced data scenarios Sc1a to Sc1e; (B) for unbalanced data scenarios Sc2a to Sc2e. Solid bars show the standard deviation and the dotted line the true number of detections.

TABLE VI. Simulation result, with a low level of uncertainty in the confusion matrix, of mean expected values for the true number of calls E[$\hat{v}$], and coefficient of variation (CV, expressed as a percentage).

Scenario 1
         SpA                SpB                SpC                SpD
Sc x.a   3000 (0%)          3000 (0%)          3000 (0%)          3000 (0%)
Sc x.b   3000.11 (6.51%)    3000.58 (6.58%)    2999.38 (6.61%)    2999.92 (6.54%)
Sc x.c   2999.72 (6.68%)    2999.89 (6.54%)    3000.13 (6.57%)    3000.25 (6.61%)
Sc x.d   3002.12 (22.90%)   2996.36 (22.77%)   3001.90 (22.25%)   2999.35 (22.81%)
Sc x.e   2999.25 (21.00%)   2999.79 (17.48%)   2999.06 (21.97%)   3001.90 (28.79%)

Scenario 2
         SpA                SpB                SpC                SpD
Sc x.a   8000 (0%)          3000 (0%)          950 (0%)           50 (0%)
Sc x.b   8000.48 (4.60%)    2999.24 (8.57%)    949.98 (24.85%)    50.30 (467.87%)
Sc x.c   7999.70 (4.60%)    3000.05 (8.58%)    950.17 (25.19%)    50.07 (471.00%)
Sc x.d   7998.41 (14.47%)   3000.28 (30.81%)   950.71 (92.89%)    50.60 (1722.71%)
Sc x.e   8001.65 (13.42%)   2999.24 (19.90%)   950.78 (105.79%)   48.32 (2578.82%)
From our results it appears that uncertainty in the confusion matrix is the parameter responsible for most of the variance of the estimates. Indeed the average CV, across all species and all misclassification rates, is 70 times higher when a high level of uncertainty (average CV across 4 species = 1885) is assumed for the confusion matrix than where there is no uncertainty in the confusion matrix (average CV across 4 species = 27). By contrast, the average CV, across all species and all levels of uncertainty within the confusion matrix, is only 29 times higher for models with a high misclassification rate (mean CV = 13 211) than for models with a low misclassification rate (mean CV = 450). A CV of 10% on a density estimate is considered as very good, a CV of 20% as reasonable and a CV of 100% near useless (Thomas and Marques, 2012). Particularly for rare species, CVs are often high, generally due to a low encounter rate. For example, Hammond et al. (2002) used visual line transect distance sampling methods to estimate the abundance of the relatively common European harbor porpoise, Phocoena phocoena, with a CV of 14%, but the abundance of the rarer common dolphin, Delphinus delphis, from the same survey, had a CV of 67%. Gerrodette et al. (2011) estimated the abundance of the extremely rare vaquita, Phocoena sinus, in the Gulf of California with a CV of 73%.

FIG. 2. (Color online) CV for unbalanced data for each scenario (Sc2b to Sc2e), with different misclassification rates. The y axis is on the log10 scale.

TABLE VII. Simulation result, with a high level of uncertainty in the confusion matrix, of mean expected values for the true number of calls E[$\hat{v}$], and coefficient of variation (CV, expressed as a percentage).

Scenario 1
         SpA                 SpB                 SpC                 SpD
Sc x.a   3000 (0%)           3000 (0%)           3000 (0%)           3000 (0%)
Sc x.b   2999.94 (61.89%)    3000.14 (61.28%)    3000.02 (62.65%)    2999.90 (61.70%)
Sc x.c   2999.96 (62.53%)    3000.00 (60.69%)    3000.06 (65.51%)    2999.98 (62.27%)
Sc x.d   3000.26 (214.59%)   2999.97 (217.66%)   2999.84 (212.69%)   2999.94 (218.69%)
Sc x.e   2999.66 (195.02%)   2999.97 (164.58%)   3000.15 (200.83%)   3000.22 (274.79%)

Scenario 2
         SpA                 SpB                 SpC                SpD
Sc x.a   8000 (0%)           3000 (0%)           950 (0%)           50 (0%)
Sc x.b   8000.04 (42.64%)    3000.03 (80.85%)    949.97 (226.46%)   49.94 (4485.69%)
Sc x.c   7999.79 (44.42%)    3000.11 (83.19%)    950.01 (236.21%)   50.08 (4490.55%)
Sc x.d   8000.43 (101.44%)   2999.42 (214.96%)   949.61 (646.61%)   50.53 (12 788.65%)
Sc x.e   8000.13 (93.18%)    2999.67 (138.27%)   949.83 (751.36%)   50.37 (16 944.28%)
In this paper, we have only considered uncertainty in estimates of the true number of detections due to misclassification. In practice, however, significant contributions to the overall CV can be expected from the estimate of detection range, the encounter rate, and the estimate of vocalization (cue) rate which is unknown for many species. Thomas and Marques (2012) outline a number of methods for estimating both detection range and cue rate and the method chosen will be dependent on both the species and the study area.
Clearly the additional contributions to the overall CV of
an acoustic abundance estimate from both misclassification
and from uncertainty of the vocalization rate are important.
However, acoustic survey methods using fixed sensors can
often collect significantly more data than visual surveys,
which will reduce the contribution to the CV from the en-
counter rate.
If we consider the species for which the true number of detections is estimated with a CV lower than 50% (for example, common species A and B), we can hope that, despite unavoidable misclassifications, acoustic detections provide useful information. However, for the rare species, a small amount of misclassification from the more common species can render the acoustic data useless for all practical purposes.
Uncertainty on the values of the confusion matrix depends heavily on the amount and the quality of the available training data. The more data available to train the classifier, the more accurate the statistics of the classified sounds used in the whistle classifier become, and the lower the uncertainty on the values of the confusion matrix. However, whistle classifiers should ideally be trained using visually confirmed data from the same study area since it is known that different sub-populations of a species may produce significantly different vocalizations (e.g., Rendell et al., 2006; Riesch et al., 2006; May-Collado and Wartzok, 2008; Janik, 2009). When developing a classifier for use in a particular study, there may therefore be a trade-off between the desire to acquire as much data as possible from multiple studies, possibly in different geographic areas, and the desire to use a smaller amount of locally acquired data.
Determining the true number of detections from misidentified observed data is not a problem specific to the cue counting method discussed in this paper. When estimating the abundance of cetacean populations using unidentified acoustic cues, the first question will always be about the true number of detections of each species, irrespective of the specific survey method applied. Thus, at its root, the problem considered here arises equally in any situation where it is known that there is misclassification between multiple species.
Since the uncertainty in the estimate for each species is
highly dependent on the presence of other species,
incorporating information on the likely abundance of calls from
other species will hopefully lead to more robust estimates.
We are therefore developing a Bayesian model which
incorporates prior information on the relative abundance of calls
from different species (based on previous survey work and
information on call rates) as well as the uncertainty in the
values of the confusion matrix.
FIG. 3. (Color online) Mean CV across the five scenarios [(A) Sc1a to Sc1e and (B) Sc2a to Sc2e] for each species and each level of uncertainty of the
confusion matrix values: no uncertainty, low uncertainty, and high uncertainty. The y axis is on the log10 scale.
J. Acoust. Soc. Am., Vol. 134, No. 3, Pt. 2, September 2013 Caillat et al.: The effects of misclassification 2475
ACKNOWLEDGMENTS
We particularly thank Professor Peter Jupp for assistance
in deriving the analytical approach given in the Appendix.
This work was funded through the Natural Environment
Research Council and SMRU Ltd.
APPENDIX: ANALYTIC ESTIMATE OF THE BIAS AND
VARIANCE OF THE TRUE NUMBER OF DETECTED
CALLS WHEN THERE IS NO UNCERTAINTY IN THE
VALUES OF THE CONFUSION MATRIX
The notations used in this appendix are the same as the
notations defined in the main body of the text.
The mean of a multinomially distributed random variable
$y \sim \mathrm{Multinom}(v, p)$ is (Royle and Dorazio, 2008)
$$E[y_j] = v p_j \tag{A1}$$
with $v$ being the number of trials and $p$ the event
probabilities.
The expected value of a sum is equal to the sum of the
expected values
$$E\left(\sum_{j=1} Y_j\right) = \sum_{j=1} E(Y_j). \tag{A2}$$
In the following, these two expressions [Eqs. (A1) and (A2)]
are used to derive the expected values of $\hat{v}$.
Our model can be described as
$$E[\hat{v}] = E[C^{-1} n] = C^{-1} E[n] \tag{A3}$$
with $\hat{v}$ being the estimate of the true number of
detections $v$, $C$ being a constant confusion matrix, and
$n$ the observed detections.
Since $n$ is a sum of several multinomial elements [Eq.
(7)], the latter is given by
$$n_i = y_{i1} + y_{i2} + y_{i3} + y_{i4} \quad \text{with} \quad y_{\cdot j} \sim \mathrm{Multinom}(v_j, p_j),$$
$$E[n_i] = \sum_{j=1}^{m} E(y_{ij}) = \sum_{j=1}^{m} v_j p_{ij}. \tag{A4}$$
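As a worked illustration of Eqs. (A3) and (A4), consider a hypothetical two-species case. The confusion matrix and call numbers below are assumed values chosen for easy arithmetic, not results from this study; column $j$ of $C$ holds the classification probabilities $p_{ij}$ for calls of true species $j$.

```latex
% Hypothetical values, for illustration only.
\begin{align*}
C &= \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}, \qquad
v = \begin{pmatrix} 1000 \\ 100 \end{pmatrix}, \\
E[n_1] &= 0.9 \times 1000 + 0.2 \times 100 = 920, \\
E[n_2] &= 0.1 \times 1000 + 0.8 \times 100 = 180, \\
C^{-1} &= \frac{1}{0.7} \begin{pmatrix} 0.8 & -0.2 \\ -0.1 & 0.9 \end{pmatrix}, \\
C^{-1} E[n] &= \frac{1}{0.7}
  \begin{pmatrix} 0.8 \times 920 - 0.2 \times 180 \\ -0.1 \times 920 + 0.9 \times 180 \end{pmatrix}
  = \frac{1}{0.7} \begin{pmatrix} 700 \\ 70 \end{pmatrix}
  = \begin{pmatrix} 1000 \\ 100 \end{pmatrix}.
\end{align*}
```

The expected observed counts differ from the true numbers, but multiplying by $C^{-1}$ recovers them exactly, which is the unbiasedness expressed by Eq. (A3).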
The variance and covariance of a multinomial distribution
are (Royle and Dorazio, 2008)
$$\mathrm{Var}(y_j) = v p_j (1 - p_j), \tag{A5}$$
$$\mathrm{cov}(y_i, y_j) = -v p_i p_j. \tag{A6}$$
In general, the variance/covariance of a constant matrix $C$
multiplying a random vector $Z$ is
$$\mathrm{cov}(CZ) = C \, \mathrm{cov}(Z) \, C^{T}. \tag{A7}$$
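Equation (A7) can be checked numerically. The sketch below uses NumPy with an arbitrary, assumed covariance matrix and an arbitrary constant matrix (neither is taken from the paper). Because the sample covariance is bilinear, the identity holds to rounding error for the simulated draws, including when the components of $Z$ are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance of a 3-component random vector Z with correlated components.
sigma_z = np.array([[2.0, 0.6, -0.3],
                    [0.6, 1.0, 0.2],
                    [-0.3, 0.2, 0.5]])
# Arbitrary constant matrix C (here 2 x 3, so CZ has two components).
C = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])

# Draw samples of Z; rows are draws, columns are components.
Z = rng.multivariate_normal(mean=np.zeros(3), cov=sigma_z, size=2000)

cov_z = np.cov(Z, rowvar=False)           # sample covariance of Z (3 x 3)
cov_cz = np.cov(Z @ C.T, rowvar=False)    # sample covariance of CZ (2 x 2)

# Compare the two sides of Eq. (A7) on the simulated draws.
identity_holds = np.allclose(cov_cz, C @ cov_z @ C.T)
print(identity_holds)
```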
With our model, from Eq. (A7),
$$\mathrm{cov}(\hat{v}) = \mathrm{cov}(C^{-1} n) = C^{-1} \, \mathrm{cov}(n) \, (C^{-1})^{T}. \tag{A8}$$
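Again identifying $n$ as the sum of multinomial random
variables, we have
$$\mathrm{cov}(n) =
\begin{bmatrix}
\mathrm{var}(n_1) & \cdots & \mathrm{cov}(n_1, n_i) & \cdots & \mathrm{cov}(n_1, n_m) \\
\vdots & \ddots & \vdots & & \vdots \\
\mathrm{cov}(n_i, n_1) & \cdots & \mathrm{var}(n_i) & \cdots & \mathrm{cov}(n_i, n_m) \\
\vdots & & \vdots & \ddots & \vdots \\
\mathrm{cov}(n_m, n_1) & \cdots & \mathrm{cov}(n_m, n_i) & \cdots & \mathrm{var}(n_m)
\end{bmatrix} \tag{A9}$$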
with
$$\mathrm{var}(n_i) = \sum_{j=1}^{m} \mathrm{var}(y_{ij}) = \sum_{j=1}^{m} v_j p_{ij} (1 - p_{ij}) \tag{A10}$$
and
$$\mathrm{cov}(n_i, n_k) = \sum_{j} \mathrm{cov}(y_{ij}, y_{kj}) = -\sum_{j} v_j p_{ij} p_{kj}. \tag{A11}$$
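Equations (A8) to (A11) translate directly into a few lines of NumPy. The following is a minimal sketch rather than the code used for this study: the function names `cov_observed` and `cov_estimate`, the two-species confusion matrix, and the call numbers are hypothetical, and the covariance between counts is taken as negative, as for any multinomial distribution.

```python
import numpy as np

def cov_observed(C, v):
    """Covariance of the observed counts n, following Eqs. (A10) and (A11).

    C[i, j] is the probability that a call of true species j is classified
    as species i (each column sums to 1); v[j] is the true number of calls.
    """
    C = np.asarray(C, dtype=float)
    v = np.asarray(v, dtype=float)
    m = len(v)
    cov_n = np.empty((m, m))
    for i in range(m):
        for k in range(m):
            if i == k:
                cov_n[i, k] = np.sum(v * C[i] * (1.0 - C[i]))   # Eq. (A10)
            else:
                cov_n[i, k] = -np.sum(v * C[i] * C[k])          # Eq. (A11)
    return cov_n

def cov_estimate(C, v):
    """Covariance of v_hat = C^{-1} n, following Eq. (A8)."""
    C_inv = np.linalg.inv(np.asarray(C, dtype=float))
    return C_inv @ cov_observed(C, v) @ C_inv.T

# Hypothetical two-species example: one common species, one rare species.
C = np.array([[0.9, 0.2],
              [0.1, 0.8]])
v = np.array([1000.0, 100.0])

cov_n = cov_observed(C, v)
cov_v_hat = cov_estimate(C, v)
cv = np.sqrt(np.diag(cov_v_hat)) / v
print(cov_n)
print(np.round(100 * cv, 2))   # CV of each estimate, in percent
```

For these assumed values both estimates have the same standard deviation, so the CV of the rare species (about 15%) is ten times that of the common species (about 1.5%).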
Datta, S., and Sturtivant, C. (2002). “Dolphin whistle classification for
determining group identities,” Signal Process. 82, 251–258.
Gelman, A. (2004). Bayesian Data Analysis (Chapman and Hall/CRC,
London), pp. 576–577.
Gerrodette, T., Taylor, B. L., Swift, R., Rankin, S., Jaramillo-Legorreta, A.
M., and Rojas-Bracho, L. (2011). “A combined visual and acoustic
estimate of 2008 abundance, and change in abundance since 1997, for the
vaquita, Phocoena sinus,” Mar. Mamm. Sci. 27, E79–E100.
Gillespie, D., Caillat, M., Gordon, J., and White, P. R. (2013). “Automatic
detection and classification of odontocete whistles,” J. Acoust. Soc. Am.
134, xxx–xxx.
Hammond, P. S., Berggren, P., Benke, H., Borchers, D. L., Collet, A.,
Heide-Jørgensen, M. P., Heimlich, S., Hiby, A. R., Leopold, M. F., and Øien, N.
(2002). “Abundance of harbour porpoise and other cetaceans in the North
Sea and adjacent waters,” J. Appl. Ecol. 39, 361–376.
Janik, V. M. (2009). “Acoustic communication in delphinids,” Adv. Study
Behav. 40, 123–157.
Marques, T., Munger, L., Thomas, L., Wiggins, S., and Hildebrand, J.
(2011). “Estimating North Pacific right whale Eubalaena japonica density
using passive acoustic cue counting,” Endangered Species Res. 13,
163–172.
Marques, T. A., Thomas, L., Ward, J., DiMarzio, N., and Tyack, P. L.
(2009). “Estimating cetacean population density using fixed passive
acoustic sensors: An example with Blainville’s beaked whales,” J. Acoust. Soc.
Am. 125, 1982–1994.
May-Collado, L. J., and Wartzok, D. (2008). “A comparison of bottlenose
dolphin whistles in the Atlantic Ocean: Factors promoting whistle
variation,” J. Mammal. 89, 1229–1240.
Oswald, J. N., Rankin, S., Barlow, J., and Lammers, M. O. (2007). “A tool
for real-time acoustic species identification of delphinid whistles,”
J. Acoust. Soc. Am. 122, 587–595.
Rendell, L. E., Matthews, J. N., Gill, A., Gordon, J. C. D., and Macdonald,
D. W. (2006). “Quantitative analysis of tonal calls from five odontocete
species, examining interspecific and intraspecific variation,” J. Zool. 249,
403–410.
Riesch, R., Ford, J. K. B., and Thomsen, F. (2006). “Stability and group
specificity of stereotyped whistles in resident killer whales, Orcinus orca,
off British Columbia,” Anim. Behav. 71, 79–91.
Royle, J. A., and Dorazio, R. M. (2008). Hierarchical Modeling and
Inference in Ecology: The Analysis of Data from Populations,
Metapopulations and Communities (Academic Press, Oxford, UK), p. 31.
Thomas, L., and Marques, T. A. (2012). “Passive acoustic monitoring for
estimating animal density,” Acoust. Today 8, 35–44.
... Our findings add to the growing body of evidence that misidentification is pervasive in passive sampling of multiple, related animal species (e.g., Lukacs and Burnham 2005, Simons et al. 2007, McClintock et al. 2010b). Although we found misidentification rates to be relatively low for most of our ice seal species, even low levels of misidentification have been demonstrated to induce substantial biases in estimators of species distribution and abundance (e.g., McClintock et al. 2010a, Miller et al. 2011, Caillat et al. 2013, Conn et al. 2013a. Thus, even when these events are rare, it is important that analytical methods account for such errors. ...
... Historically, surveys required observers to visually identify and count target species during flight, but technological advances in digital imaging have facilitated the collection of large quantities of photographic data from aerial (e.g.,Conn et al. 2013b) and satellite (e.g.,LaRue et al. 2011, Fretwell et al. 2012 surveys, which have the benefit of documenting all observations for postprocessing, verification, and archiving. However, species and age class identification from aerial or satellite imagery can often be difficult (e.g.,O'Brien and Lindzey 1998, Fretwell et al. 2014), and if not accounted for, misidentification can result in unreliable inference about species distribution and abundance (e.g.,McClintock et al. 2010a, Miller et al. 2011, Caillat et al. 2013, Conn et al. 2013a. ...
Full-text available
Technical Report
Bearded, spotted, ribbon and ringed seals are key components of Arctic marine ecosystems and they are important subsistence resources for northern coastal Alaska Native communities. Although these seals are protected under the Marine Mammal Protection Act (MMPA) and bearded and ringed seals have been listed as threatened1 under the Endangered Species Act (ESA), no reliable, comprehensive abundance estimates are available for any of the species. Obtaining reliable abundance estimates for ice-associated seals is vital for developing sound plans for management, conservation, and responses to potential environmental impacts of oil and gas activities and climate change. The Bering Okhotsk Seal Surveys (BOSS) project addressed the most critical need for fundamental assessment data on iceassociated seals (also known as ice seals) in the Bering and Okhotsk Seas. Improved monitoring of ice seals is fundamental for the National Marine Fisheries Service (NMFS) to meet its management and regulatory mandates for stock assessments under the MMPA and extinction-risk assessments under the ESA. The best way to estimate the abundances of ice-associated seals is to conduct aerial photographic or sightings surveys during the reproductive and molting period when the geographic structure of the population reflects the breeding structure and the greatest proportions of the populations are hauled out on the ice and are available to be seen. The distributions of these seals are broad and patchy and so surveys must cover large areas. Similarly, the extent, locations, and conditions of the sea ice habitat change so rapidly that surveys must be conducted in a relatively short period of time. The expense and logistic complexity of these surveys have been the primary impediments to acquisition of comprehensive and reliable estimates, though the complexity of the seals’ behavior is also a factor. 
Scientists at the Polar Ecosystems Program of NOAA’s National Marine Mammal Laboratory, Alaska Fisheries Science Center, collaborated with colleagues from the State Research and Design Institute for the Fishing Fleet (“Giprorybflot”) in Saint Petersburg, Russia, to conduct synoptic aerial surveys of ice associated seals in the Bering and Okhotsk Seas. Conducting spring-time surveys in those areas will yield abundance estimates for the entire population of ribbon seals, and all but a small fraction of the spotted seal population. For bearded seals, the surveys included the large and important fraction of the population that overwinters and breeds in the Bering and Okhotsk Seas. The U.S. Bureau of Ocean Energy Management (BOEM) provided critical financial support in 2012 and 2013 to complete the U.S. surveys of the central and eastern Bering Sea. Surveys for the portions of the bearded and ringed seal populations that breed in the Chukchi and Beaufort seas will require separate and subsequent surveys, possibly with different seasonal timing. Two years of survey effort were required to achieve adequate precision for abundance estimates and to ensure that sufficient periods of suitable weather occurred during survey periods. Aerial surveys were conducted in spring 2012 and 2013. In the United States and Russia combined, the teams flew more than 47,000 nautical miles (nmi) (87,000 km) of survey track. The completion of this project marks the largest survey of ice-associated seals ever completed and will provide the first comprehensive estimates of abundance for bearded, spotted, ribbon, and ringed seals in the Bering Sea and Sea of Okhotsk. Analysis of full data sets from both years indicate substantial annual variation in numbers of seals in the U.S. portion of the Bering Sea during April and early May. Model-averaged estimates in 2012 were 240,000 spotted seals, 117,000 ribbon seals, 170,000 bearded seals, and 186,000 ringed seals. 
In contrast, the estimates for 2013 were lower: 163,000 spotted seals, 38,000 ribbon seals, 125,000 bearded seals, and 119,000 ringed seals. Seals may have been distributed farther to the west in 2013 (i.e. more in Russian waters), but there is substantial uncertainty about ribbon and spotted seal numbers in 2012 because weather constraints prohibited us from conducting many flights over the southwest portion of our study (at the ice edge) where densities were the highest. Based on the proportions of 165 seals instrumented with satellite tags in separate studies, we estimate that 69,000-101,000 (42%) of spotted seals and 6,000-25,000 (21%) of ribbon seals that occupy the eastern (U.S.) Bering Sea in spring used the Chukchi Sea during the summer, open-water period in 2013 and 2012, respectively.
... Historically, surveys required observers to visually identify and count target 1 species during flight, but technological advances in digital imaging have facilitated the collection of large quantities of photographic data from aerial (e.g., Conn et al. 2013b) and satellite (e.g., LaRue et al. 2011, Fretwell et al. 2012 surveys, which have the benefit of documenting all observations for postprocessing, verification, and archiving. However, species and age class identification from aerial or satellite imagery can often be difficult (e.g., O'Brien andLindzey 1998, Fretwell et al. 2014), and if not accounted for, misidentification can result in unreliable inference about species distribution and abundance (e.g., McClintock et al. 2010a, Miller et al. 2011, Caillat et al. 2013, Conn et al. 2013a. ...
... Our findings add to the growing body of evidence that misidentification is pervasive in passive sampling of multiple, related animal species (e.g., Lukacs and Burnham 2005, Simons et al. 2007, McClintock et al. 2010b). Although we found misidentification rates to be relatively low for most of our ice seal species, even low levels of misidentification have been demonstrated to induce substantial biases in estimators of species distribution and abundance (e.g., McClintock et al. 2010a, Miller et al. 2011, Caillat et al. 2013, Conn et al. 2013a. Thus, even when these events are rare, it is important that analytical methods account for such errors. ...
Article
Technological advances have facilitated collection of vast quantities of photographic data from aerial surveys of marine mammals. However, when it is difficult to distinguish species from a distance, reliable identification from aerial images can often be challenging. This is the case for ice-associated seals, species for which global climate change has motivated intensive monitoring efforts in recent years. We assess species and age class identification from aerial images of four ice seal species (bearded seals, Erignathus barbatus; ribbon seals, Histriophoca fasciata; ringed seals, Pusa hispida; spotted seals, Phoca largha) in the Bering Sea. We also investigate the specific phenomenological and behavioral traits commonly associated with species identification and observer confidence. We generally found species and age class misidentification occurred at relatively low levels, but only 83% of spotted seals tended to be correctly identified (with 11% mistaken as ribbon seals). We also found certain traits were strong predictors for observed species, age class, or observer confidence. Our findings add to the growing body of evidence that species misidentification is pervasive in passive sampling of animal populations. Even low levels of misidentification have been demonstrated to induce substantial biases in estimators of species distribution and abundance, and it is important that statistical models account for such errors.
... However, the data collected often produces uncertainty in the species classification of recorded individuals (e.g. Walters et al. 2012;Caillat, Thomas & Gillespie 2013). The ability to robustly analyse datasets with uncertain species identification could enable economically efficient technological monitoring techniques to be successfully deployed to address a wide range of important ecological questions. ...
... Although Runge, Hines & Nichols (2007) account for species uncertainty in estimates of survival and some studies have considered the impact of species mis-identification on estimates of abundance (Caillat, Thomas & Gillespie 2013;Conn et al. 2013). ...
Article
1.Many emerging methods for ecological monitoring use passive monitoring techniques, which cannot always be used to identify the observed species with certainty. Digital aerial surveys of birds in marine areas are one such example of passive observation and they are increasingly being used to quantify the abundance and distribution of marine birds to inform assessments of impact for proposed offshore wind developments. However, the uncertainty in species identification presents a major hurdle to determining the abundance and distribution of individual species.2.Using a novel analytical approach, we combined data from two surveys in the same area: aerial digital imagery that identified only 23% of individuals to species level and boat survey records that identified 95% of individuals to species level. The datasets were analysed to estimate the effects of environmental covariates on species density and to produce species-specific estimates of population size.3.For each digital aerial observation without certain species identification, randomised species assignments were generated using the observed species proportions from the boat surveys. For each species, we modelled several random realisations of species assignments and produced a density surface from the ensemble of models. The uncertainty from each stage of the process was propagated, so that final confidence limits accounted for all sources of uncertainty, including species identification.4. In the breeding season several species had higher densities near colonies and this pattern was clearest for three auk species. Sandeel density was an important predictor of the density of several gull species.5.Synthesis and applications. This method shows it is possible to construct maps of species density in situations in which ecological observations cannot be identified to species level with certainty. 
The increasing use of passive detection methods is providing many more datasets with uncertain species identification and this method could be used with these data to produce species-specific abundance estimates. We discuss the advantages of this approach for estimating the abundance and distribution of birds in marine areas, particularly for quantifying the impacts of offshore renewable developments by making the estimates derived from the older digital surveys more comparable to the recently improved surveys.This article is protected by copyright. All rights reserved.
... The manual matching and detection process used in this work was labor intensive and, while important for this initial demonstration, may be difficult to implement for a longerduration study. However, it would be possible to extend the method to account for potential mis-associations (Caillat et al., 2013). A fully automated cross correlation process to match tracked fin whales to fin whales detected on the glider was not possible, primarily because there was such an abundance of fin whale pulses, with many instances of multiple animals detectable and some very distant pulses and multipath detections. ...
Article
A single-hydrophone ocean glider was deployed within a cabled hydrophone array to demonstrate a framework for estimating population density of fin whales ( Balaenoptera physalus) from a passive acoustic glider. The array was used to estimate tracks of acoustically active whales. These tracks became detection trials to model the detection function for glider-recorded 360-s windows containing fin whale 20-Hz pulses using a generalized additive model. Detection probability was dependent on both horizontal distance and low-frequency glider flow noise. At the median 40-Hz spectral level of 97 dB re 1 μPa ² /Hz, detection probability was near one at horizontal distance zero with an effective detection radius of 17.1 km [coefficient of variation (CV) = 0.13]. Using estimates of acoustic availability and acoustically active group size from tagged and tracked fin whales, respectively, density of fin whales was estimated as 1.8 whales per 1000 km ² (CV = 0.55). A plot sampling density estimate for the same area and time, estimated from array data alone, was 1.3 whales per 1000 km ² (CV = 0.51). While the presented density estimates are from a small demonstration experiment and should be used with caution, the framework presented here advances our understanding of the potential use of gliders for cetacean density estimation.
... Performance on that class would likely be improved by augmenting the training set with additional examples from other deployments. Misclassification of rare species is an important issue, as it can lead to significant errors in later steps such as density estimation [55]. Manual review and editing of the labels using a batch review tool such as PAMGuard [56] or DetEdit [6] remains particularly important in these cases. ...
Full-text available
Article
Machine learning algorithms, including recent advances in deep learning, are promising for tools for detection and classification of broadband high frequency signals in passive acoustic recordings. However, these methods are generally data-hungry and progress has been limited by challenges related to the lack of labeled datasets adequate for training and testing. Large quantities of known and as yet unidentified broadband signal types mingle in marine recordings, with variability introduced by acoustic propagation, source depths and orientations, and interacting signals. Manual classification of these datasets is unmanageable without an in-depth knowledge of the acoustic context of each recording location. A signal classification pipeline is presented which combines unsupervised and supervised learning phases with opportunities for expert oversight to label signals of interest. The method is illustrated with a case study using unsupervised clustering to identify five toothed whale echolocation click types and two anthropogenic signal categories. These categories are used to train a deep network to classify detected signals in either averaged time bins or as individual detections, in two independent datasets. Bin-level classification achieved higher overall precision (>99%) than click-level classification. However, click-level classification had the advantage of providing a label for every signal, and achieved higher overall recall, with overall precision from 92 to 94%. The results suggest that unsupervised learning is a viable solution for efficiently generating the large, representative training sets needed for applications of deep learning in passive acoustics.
... Spending sufficient time for fieldwork recording animals during all aspects of their behavioural repertoire, including foraging, socializing and resting, and across a range of group sizes seems crucial to fully describe their acoustic repertoire. This is important for correct species classification using passive acoustic monitoring and related density and abundance estimation (Caillat et al. 2013), as NBHF species are distributed globally and generally are sympatric with broadband toothed whale species (e.g. Heinrich et al. 2010). ...
Full-text available
Article
Toothed whales use powerful ultrasonic biosonar pulses (i.e. clicks) for echolocation. Underwater acoustic recordings have suggested that the majority of toothed whale species can be grouped acoustically as either producing broadband clicks or narrowband high-frequency (NBHF) clicks. Recently, it has been shown that Heaviside’s dolphins, Cephalorhynchus heavisidii, emit NBHF clicks for echolocation but also clicks of lower frequency and broader bandwidth for communication. Here, we use acoustic recorders and drone video footage to reinforce previous findings that Commerson’s dolphins (C. commersonii) produce signals similar to Heaviside’s dolphins. We reveal that they use clicks with a lower frequency and broader bandwidth in the form of click trains and burst-pulses. These sounds were not recorded in the presence of smaller groups of Commerson’s dolphins, indicating that they may fulfil a communication function in larger groups. Also, we utilised a novel combination of drone video footage paired with underwater acoustic recordings to estimate the source level of echolocation clicks produced by Commerson’s dolphins. In addition, we compare the acoustic signals produced by Commerson’s and Heaviside’s dolphins to identify interspecific similarities and differences. Spectral differences were found in NBHF click trains, buzzes and burst-pulses between species; however, bandwidth and duration parameters were not significantly different for broadband click trains. Our findings make it likely that all four species of the Cephalorhynchus genus have the ability to generate both signal types, and further challenges the evolutionary concept of NBHF signal production. Significance statement This study confirms the presence of a duel echolocation click (i.e. biosonar) strategy in Commerson’s dolphins, making them the second species of their genus known to produce two types of biosonar. 
We provide an in-depth quantitative analysis of Commerson’s dolphin acoustic signal types, and include a comparison of signal types between Commerson’s dolphins and the other species known to produce two types of biosonar, the Heaviside’s dolphin. In addition, this is the first study to combine drone footage with underwater acoustic recordings to measure the source level of toothed whale echolocation signals. We use this novel technique to provide source levels measured from Commerson’s dolphin echolocation clicks which are comparable to published values for this species calculated using an expensive and complicated array of hydrophones. Thus, we provide a simpler and more cost effective way to study sounds produced by marine mammals.
... This enhancement can prove significant in situations where a mix of rare and abundant species are present. 13 The goal of this study is thus to improve on the whistle extraction process, by casting it into a MTT framework, which allows for simultaneous tracking of multiple objects of interest from the noisy measurements in the presence of missed detections, and false alarms (i.e., clutter, additional measurements not generated by a whistle). Additionally, in contrast to the majority of automated methods for whistle contour extraction, the MTT accounts for the time-varying number of whistles by modelling their birth (when a whistle starts) and their death (when a whistle ends). ...
Full-text available
Article
The need for automated methods to detect and extract marine mammal vocalizations from acoustic data has increased in the last few decades due to the increased availability of long-term recording systems. Automated dolphin whistle extraction represents a challenging problem due to the time-varying number of overlapping whistles present in, potentially, noisy recordings. Typical methods utilize image processing techniques or single target tracking, but often result in fragmentation of whistle contours and/or partial whistle detection. This study casts the problem into a more general statistical multi-target tracking framework, and uses the probability hypothesis density (PHD) filter as a practical approximation to the optimal Bayesian multi-target filter. In particular, a particle version, referred to as a Sequential Monte Carlo PHD (SMC-PHD) filter, is adapted for frequency tracking and specific models are developed for this application. Based on these models, two versions of the SMC-PHD filter are proposed and their performance is investigated on an extensive real-world dataset of dolphin acoustic recordings. The proposed filters are shown to be efficient tools for automated extraction of whistles, suitable for real-time implementation.
... Large-scale PAM arrays (Carlén et al., 2018) can sample the spatial and temporal distribution of animals, potentially in tandem with environmental forcing factors, including human disturbance (McCarthy et al., 2011). However, classification is challenging for species with similar calls or when dealing with rare species (Caillat, Thomas, & Gillespie, 2013). Deep learning and engaging with citizen scientists are promising avenues for improving the cost-effectiveness of acoustic data processing. ...
Article
Earth‐based observations of the biosphere are spatially biased in ways that can limit our ability to detect macroecological patterns and changes in biodiversity. To resolve this problem, we need to supplement the ad hoc data currently collected with planned biodiversity monitoring, in order to approximate global stratified random sampling of the planet. We call this all‐encompassing observational system ‘the macroscope’. With a focus on the marine realm, we identify seven main biosphere observation tools that compose the macroscope: satellites, drones, camera traps, passive acoustic samplers, biologgers, environmental DNA and human observations. By deploying a nested array of these tools that fills current gaps in monitoring, we can achieve a macroscope fit for purpose and turn these existing powerful tools into more than the sum of their parts. Building a macroscope requires commitment from many fields, together with coordinated actions to attract the level of funding required for such a venture. We call on macroecologists to become advocates for the macroscope and to engage with existing global observation networks.
... Other species of cetacean may echolocate less predictably or not at all (Van Parijs et al. 2009), and it is not currently possible to discriminate between different dolphin species based upon their click characteristics (Thompson, Brookes & Cordes 2015). Species misidentification from acoustic detections is not a problem in this study because only one species of porpoise is found in the study area; however, further research would be necessary to discriminate between species in other areas (Caillat, Thomas & Gillespie 2013). The coloration and small size of harbour porpoises also makes them relatively straightforward to identify from digital images. ...
Full-text available
Article
1.Robust estimates of the density or abundance of cetaceans are required to support a wide range of ecological studies and inform management decisions. Considerable effort has been put into the development of line-transect sampling techniques to obtain estimates of absolute density from aerial and boat-based visual surveys. Surveys of cetaceans using acoustic loggers or digital cameras provide alternative methods to estimate relative density that have the potential to reduce cost and provide a verifiable record of all detections. However the ability of these methods to provide reliable estimates of relative density has yet to be established.
Full-text available
Article
We present a method for estimating animal density from fixed passive acoustic detectors, and illustrate it by estimating the density of North Pacific right whales Eubalaena japonica in the areas surrounding 3 hydrophones deployed in the southeastern Bering Sea in 2001 to 2002 and 2005 to 2006. Input data were the distances to detected right whale calls, estimated using a normal mode sound propagation model, and call production rate, estimated from encounters by survey vessels with right whale groups. Given the scarcity of information about this highly endangered species, we also extrapolate our results to provide a tentative estimate of the total population size in shelf waters of the eastern Bering Sea. This gives a point estimate of 25 animals (CV 29.1%; 95% confidence interval 13-47), which agrees well with what little is known for this species. We discuss the assumptions underlying the method. Obtaining more reliable values requires a larger sample of randomly located hydrophones, together with improved estimates of call rate.
Full-text available
Article
Methods for the fully automatic detection and species classification of odontocete whistles are described. The detector applies a number of noise cancellation techniques to a spectrogram of sound data and then searches for connected regions of data which rise above a pre-determined threshold. When tested on a dataset of recordings which had been carefully annotated by a human operator, the detector was able to detect (recall) 79.6% of human identified sounds that had a signal-to-noise ratio above 10 dB, with 88% of the detections being valid. A significant problem with automatic detectors is that they tend to partially detect whistles or break whistles into several parts. A classifier has been developed specifically to work with fragmented whistle detections. By accumulating statistics over many whistle fragments, correct classification rates of over 94% have been achieved for four species. The success rate is, however, heavily dependent on the number of species included in the classifier mix, with the mean correct classification rate dropping to 58.5% when 12 species were included.
Article
Whistles are narrowband, frequency-modulated sounds produced by many cetaceans. Whistles are extensively studied in delphinids, where several factors have been proposed to explain between- and within-species variation. We examined factors associated with geographic variation in whistles of common bottlenose dolphins (Tursiops truncatus) by assessing the role of ambient noise, noise from boats, and sympatry with other dolphin species, and reviewing and comparing whistle structure across populations in the western and eastern Atlantic Ocean. Whistles of adjacent populations differed, particularly in frequency parameters. A combination of factors may contribute to microgeographic whistle variation, including differences in ambient noise levels (dolphins produced relatively higher frequency whistles in the noisiest habitat), and differences in number of boats present (when multiple boats were present, dolphins whistled with greater frequency modulation and whistles were higher in maximum frequency and longer than when a single boat was present). Whistles produced by adjacent populations were relatively similar in structure. However, for clearly separated populations, the distance between them did not relate directly to whistle structure. We propose that plasticity in bottlenose dolphin whistles facilitates adaptation to local and changing conditions of their habitat, thus promoting variation between populations at different geographic scales.
Article
A line-transect survey for the critically endangered vaquita, Phocoena sinus, was carried out in October–November 2008, in the northern Gulf of California, Mexico. Areas with deeper water were sampled visually from a large research vessel, while shallow water areas were covered by a sailboat towing an acoustic array. Total vaquita abundance in 2008 was estimated to be 245 animals (CV = 73%, 95% CI 68–884). The 2008 estimate was 57% lower than the 1997 estimate, an average rate of decline of 7.6%/yr. Bayesian analyses found an 89% probability of decline in total population size during the 11 yr period, and a 100% probability of decline in the central part of the range. Acoustic detections were assumed to represent porpoises with an average group size of 1.9, the same as visual sightings. Based on simultaneous visual and acoustic data in a calibration area, the probability of detecting vaquitas acoustically on the trackline was estimated to be 0.41 (CV = 108%). The Refuge Area for the Protection of the Vaquita, where gill net fishing is currently banned, contained approximately 50% of the population. While animals move in and out of the Refuge Area, on average half of the population remains exposed to bycatch in artisanal gill nets.
Book
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including:
* occurrence or occupancy models for estimating species distribution
* abundance models based on many sampling protocols, including distance sampling
* capture-recapture models with individual effects
* spatial capture-recapture models based on camera trapping and related methods
* population and metapopulation dynamic models
* models of biodiversity, community structure and dynamics
The book also features:
* a wide variety of examples involving many taxa (birds, amphibians, mammals, insects, plants)
* development of classical, likelihood-based procedures for inference, as well as Bayesian methods of analysis
* detailed explanations describing the implementation of hierarchical models using freely available software such as R and WinBUGS
* computing support in technical appendices in an online companion web site
Article
Whistle vocalizations of five odontocete cetaceans, the false killer whale P. crassidens, short-finned pilot whale G. macrorhynchus, long-finned pilot whale G. melas, white-beaked dolphin L. albirostris and Risso's dolphin G. griseus, were analysed and summarized quantitatively. Recordings were acquired from a number of locations and encounters. Significant differences were found between species and, to a lesser extent, between locations. The calls of the two pilot whale species are distinct despite their close relatedness, and similar size and morphology. This may be due to selection pressures to maintain distinctiveness. The variance was partitioned into between-species, between-location (within species) and within-location factors. For the frequency variables, variation between-species is high relative to variation between locations. Thus geographic variation is a relatively minor effect, compared to the many processes which cause interspecific differences. The within-location component includes such factors as social context, behaviour and group composition. This component is of a similar magnitude to the between-species component, indicating that whistles vary considerably with these factors. Significant between-location differences may be attributable to these confounding factors. For whistle duration, most of the variation occurred within location. There is less significant variation in duration across species compared with the frequency measures. This study highlights the need to collect samples across all potential strata whenever possible, and provides a framework for future, more comprehensive work.
Article
Delphinid communication has been shaped by the marine environment. This resulted in specific adaptations such as echolocation and a sophisticated communication system that allows animals to maintain contact over several kilometers even if no other cues are available. The communication system of delphinids is characterized by large call repertoires, recognition calls shaped by vocal learning, and a great plasticity of the vocal repertoire. Delphinids also display complex cognitive skills that influence how they use communication signals. Complex social systems provide opportunities to apply these skills. Most of our knowledge on delphinid communication comes from studies on bottlenose dolphins and killer whales. Future studies need to focus on additional species and try to assess the threat imposed by anthropogenic noise on the communication behavior of delphinids.
Article
Resident killer whales off British Columbia form four acoustically distinct clans, each with a unique dialect of discrete pulsed calls. Three clans belong to the northern and one to the southern community. Resident killer whales also produce tonal whistles, which play an important role in close-range communication within the northern community. However, there has been no comparative analysis of repertoires of whistles across clans. We investigated the structural characteristics, stability and group specificity of whistles in resident killer whales off British Columbia. Acoustic recordings and behavioural observations were made between 1978 and 2003. Whistles were classified spectrographically and additional observers were used to confirm our classification. Whistles were compared across clans using discriminant function analysis. We found 11 types of stereotyped whistles in the northern and four in the southern community with some of the whistle types being stable over at least 13 years. In northern residents, 10 of the 11 whistle types were structurally identical in two of the three acoustic clans, whereas the whistle types of southern residents differed clearly from those of the northern residents. Our study shows that killer whales that have no overlap in their call repertoire use essentially the same set of stereotyped whistles. Shared stereotyped whistles might provide a community-level means of recognition that facilitates association and affiliation of members of different clans, which otherwise use distinct signals. We further suggest that vocal learning between groups plays an important role in the transmission of whistle types.