Whistles characterisation using articial
intelligence: responses of short-beaked common
dolphins (Delphinus delphis) to a bio-inspired
acoustic mitigation device
Loïc Lehnhoff
Marine Biodiversity Exploitation and Conservation
Hervé Glotin
Laboratoire d’Informatique et Systèmes
Yves Le Gall
French Research Institute for the Exploitation of the Sea IFREMER
Hélène Peltier
Observatoire Pélagis
Alain Pochat
SAS Ocean technology (OCTECH)
Krystel Pochat
SAS Ocean technology (OCTECH)
Olivier Van Canneyt
Observatoire Pélagis
Bastien Mérigot
Marine Biodiversity Exploitation and Conservation
Article
Keywords:
Posted Date: October 24th, 2024
DOI: https://doi.org/10.21203/rs.3.rs-5234650/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. 
Additional Declarations: No competing interests reported.
Whistles characterisation using artificial intelligence: responses of short-beaked common dolphins (Delphinus delphis) to a bio-inspired acoustic mitigation device

Loïc Lehnhoff1,2,6,*, Hervé Glotin2,6, Yves Le Gall3, Eric Menut3, Hélène Peltier4, Alain Pochat5, Krystel Pochat5, Olivier Van Canneyt4, and Bastien Mérigot1,6

1MARBEC, Université de Montpellier, CNRS, IFREMER, IRD, Sète, 34200, France
2Université de Toulon, Aix Marseille Univ, CNRS, LIS DYNI, Toulon, France
3French Research Institute for Exploitation of the Sea (IFREMER), Centre Bretagne, Plouzané, 29280, France
4Observatoire Pelagis (UAR 3462), La Rochelle Université, CNRS, La Rochelle, 17000, France
5SAS Ocean technology (OCTECH), Brest, 29200, France
6Int. Center of AI for Natural Acoustics https://cian.univ-tln.fr
*loic.lehnhoff@gmail.com
ABSTRACT
Understanding cetacean whistles is crucial for assessing their social interactions, behaviors, and responses to anthropic
activities. However, to detect and dissociate different kinds of whistles within acoustic records remains challenging. We
developed an innovative semi-automatic deep learning approach (DYOC) to rapidly extract whistle contours from audio
recordings, using YOLOv8m for detection and ResNet18 for identification. Applied to 808 minutes of audio recordings of wild
free-ranging short-beaked common dolphins from the Bay of Biscay, France, DYOC enabled the annotation of 8,730 contours
6 times faster than manual annotation. Their features (such as duration, frequency range, and number of inflections) were
then compared based on dolphin behavior, presence of fishing nets, and the DOLPHINFREE acoustic beacon’s influence.
Beacon activation led to significant frequency shifts and lower Signal-to-Noise Ratios, while during activation and deactivation
phases, whistles were longer with more inflections. A dimension reduction technique (UMAP) revealed gradients between
archetypal whistle shapes. This study provides the first characterisation of whistle features for a population of short-beaked
common dolphins in the Bay of Biscay. The proposed methodological approach has the potential to be applied to the study of
whistles across a wide range of research areas, species and applications related to animal behaviour.
Introduction

Sounds are extensively used by cetacean species in all aspects of their lives. Echolocation clicks are mainly used by odontocetes for spatial orientation[1–3] and prey detection[4], but can also be used in social interactions in the form of buzzes[5]. On the other hand, whistles are used most often to communicate between individuals[6–8] during social interactions. Whistles are stationary signals, meaning that they are continuous frequency-modulated narrow-band signals. For cetaceans, their duration varies from about 0.1 to 5 seconds, with very distinct frequency ranges depending on species. For example, whales are known to emit low-frequency whistles (<200 Hz[9]), while delphinids emit whistles at higher frequencies (1–25 kHz[10]). Traditionally, whistle contours are annotated manually in order to study them in detail. However, this initial step is particularly time-consuming and repetitive, making it an ideal task for the development of Artificial Intelligence (AI) models and other algorithms with the aim of facilitating or speeding up the process. Several approaches already exist, based on deep learning models (e.g.[11–14]) or not (e.g.[15–20]). While these methods facilitate the initial annotation of whistles, they are not equally effective on all datasets. Contour extraction is a difficult task, even for a human annotator, and further developments are still needed to improve the identification and characterisation of whistles in audio recordings.
Whistles of delphinids have been extensively studied through research carried out on the common bottlenose dolphin Tursiops truncatus (Montagu, 1821)[21], with numerous experiments conducted on both captive and free-ranging individuals. Whistles of other delphinid species are relatively less well-documented. Those of the short-beaked common dolphin Delphinus delphis (Linnaeus, 1758) were first described from audio recordings made on captive dolphins off the coast of California[6]. Since then, many studies have described the repertoire of this species[22,23] for a variety of populations. The main results from past literature showed that these whistles vary between neighbouring and distant populations of common dolphin subspecies[24–29] and with other variables. Short-beaked common dolphins located in South Africa and in the Western North Atlantic Ocean are also believed to produce signature whistles similar to those described for common bottlenose dolphins[30,31], with highly stereotyped shapes.
In the Bay of Biscay, France, short-beaked common dolphins are threatened by fishery bycatch, with worrying levels of strandings since 2016[32]. The latest reliable estimates show that 8,950 (95% CI [6,710; 12,630]) individuals died as a result of bycatch in 2021[33]. While the number of short-beaked common dolphins in the Bay of Biscay is estimated at around 634,286 (95% CI: 352,227–1,142,213) individuals[32], with variations depending on year and sources[32–34], some models are rather pessimistic concerning the long-term state of this population[10]. In response, numerous projects have been launched in an attempt to find ways of keeping these dolphins away from fishing nets. During the DOLPHINFREE project (“Dolphins free from fishery bycatch”), a bio-inspired acoustic beacon prototype, emitting an informative signal, was developed and tested at sea from research boats and onboard fishing vessels[35].

As part of the DOLPHINFREE project, we recorded the signals emitted by wild free-ranging short-beaked common dolphins in Brittany, France[35]. The whistle repertoire of this population is poorly known, even though it could provide insights into dolphin social behaviours, group dynamics, responses to environmental stimuli and individual recognition[24,25,28,36], and could even be linked to bycatch events[37]. We therefore aimed at conducting a comprehensive analysis of the whistles of these animals as part of the DOLPHINFREE experiments.
However, annotating animal vocalisations that vary in time and frequency is a difficult and time-consuming task, regardless of the species being studied. As stated above, further developments are still needed to improve the accuracy of models used for the identification of whistles in audio recordings.

We propose an innovative methodological approach that combines AI and human intervention to accelerate the annotation of whistles in audio recordings. During the development of this semi-automatic process, we sought a compromise between accuracy and complexity, in order to annotate acoustic data as easily as possible and without too many errors. This novel approach was applied to the dataset collected during the DOLPHINFREE experiments. The aim was to investigate the dolphins’ whistling behaviours in response to the DOLPHINFREE bio-inspired beacon, the presence or absence of a fishing net, and the animals’ behavioural states during experiments. The approach enabled us to annotate whistle contours in audio recordings 6 times faster than manual annotation, to extract their features, and to assess the effects of a bio-inspired acoustic beacon on these features.
Methods

Data collection

Experiments were conducted with wild free-ranging short-beaked common dolphins at sea off the coast of Brittany (France) during the summers of 2020, 2021[35] and 2022 (see map in Figure 1). Experiments were conducted in accordance with relevant guidelines and regulations (see institutional review statement).
Figure 1. Map of dolphin encounters during the sampling campaigns of the DOLPHINFREE project (2020-2022).
Opportunistic encounters enabled us to test prototypes of a bio-inspired beacon for the DOLPHINFREE project. Tests involved visual observations from the boat and the recording of acoustic signals emitted by dolphins. They were based on the observation of three main variables: the behavioural state of the dolphins (attracted to the boat, foraging, socialising, travelling or milling/resting), the absence/presence of a fishing net, and the activation state of the beacon. These experiments are described in detail in[35]. A total of 808 minutes of audio recordings were collected using a single ’Ocean Sonics icListen HF’ hydrophone (dynamic range: 118 dB, sensitivity: -170 dBV re. µPa) at a 512 kHz sampling rate and 32-bit depth. We chose to group observations and audio recordings into 1-minute bins as a more practical solution. Thus, each audio file is associated with the behavioural state observed during its recording, as well as with the activation state of the DOLPHINFREE acoustic beacon during that minute: off (before or after its activation), on (while activated), or switched on/off during the recording (the moment when the beacon’s signal is introduced into the acoustic environment).
Data annotation

Annotating all the whistles in our whole dataset manually would be very time-consuming. In this context, AI-based models are of particular interest. Supervised learning algorithms can be trained on a small sample of a dataset in order to make predictions on the whole dataset. Therefore, we annotated the whistle contours of a small sample of the DOLPHINFREE dataset (43 files, or about 5% of the recordings, chosen arbitrarily) using a custom-made annotation tool[38]. Annotations were made on spectrograms computed from raw audio recordings resampled at 96 kHz, with a frame size of 1024 samples (11 ms) and a hop length of 512 samples (5 ms), represented on a linear frequency scale. Dolphins often vocalise in groups, and whistles of different individuals often overlap. Therefore, during the annotation process, the bins of the spectrograms were limited to a minimum value of -60 dBFS in order to consider only the whistles of highest energy for contour extraction. Only the fundamental frequencies of the whistles were annotated manually, resulting in a dataset with a total of 1,750 whistles. In order to avoid training on fragments of whistles and to reduce overfitting, a hard threshold was applied to keep only the whistles with a duration >100 ms and a Signal-to-Noise Ratio >20 dB; 431 whistles met these constraints and were kept (distributions shown in Figure 2). The Signal-to-Noise Ratio (SNR) is computed as the difference (in dB) between the mean level of a signal and the mean level of ambient noise, for a similar time-frequency frame.
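As a point of reference, the spectrogram settings above (96 kHz audio, 1024-sample frames, 512-sample hop, linear frequency scale, floor at -60 dBFS) can be sketched with NumPy alone. The function name, the Hann window and the full-scale normalisation are our own assumptions, not the authors’ implementation.

```python
import numpy as np

def spectrogram_dbfs(audio, frame=1024, hop=512, floor_db=-60.0):
    """Magnitude spectrogram in dBFS with a hard floor, as used for annotation."""
    window = np.hanning(frame)
    n_frames = 1 + (len(audio) - frame) // hop
    frames = np.stack([audio[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))       # shape: (time, freq bins)
    mag /= (np.sum(window) / 2)                     # amplitude-1 sine -> 0 dBFS
    db = 20 * np.log10(np.maximum(mag, 1e-12))
    return np.maximum(db, floor_db).T               # (freq bins, time), floored

# A pure 10 kHz tone sampled at 96 kHz: its frequency bin should dominate
sr = 96_000
t = np.arange(sr) / sr
spec = spectrogram_dbfs(np.sin(2 * np.pi * 10_000 * t))
peak_bin = spec.max(axis=1).argmax()                # strongest frequency bin
```

With a 1024-sample frame at 96 kHz, each bin spans 93.75 Hz, so `peak_bin` should fall within a bin or two of 10 kHz.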
[Figure: scatter of whistle SNR (dB) against duration (s), with the GLM prediction mean, its standard error, and the selection region for training.]
Figure 2. Distribution of Signal-to-Noise Ratios (SNRs) and durations of annotated whistles (N = 1,750). Green areas show which whistles were selected for model training. Red curve shows GLM fit.

Note that there is a relation between the two criteria chosen for whistle selection: longer whistles tend to have higher SNRs (GLM, z-value = 5.5, d.f. = 1610, p-value < 0.01).
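The SNR definition and the training-set threshold above can be written down directly; the helper names below and the toy `(duration, SNR)` values are ours, for illustration only.

```python
def snr_db(signal_db_mean, noise_db_mean):
    """SNR as defined in the text: mean signal level minus mean ambient
    noise level (in dB), for a similar time-frequency frame."""
    return signal_db_mean - noise_db_mean

def keep_for_training(duration_s, snr, min_dur=0.1, min_snr=20.0):
    """Hard threshold used to build the training set: >100 ms and >20 dB SNR."""
    return duration_s > min_dur and snr > min_snr

# Hypothetical (duration, SNR) pairs: only the 2nd and 4th pass both thresholds
whistles = [(0.05, 25.0), (0.4, 25.0), (0.4, 15.0), (1.2, 32.0)]
kept = [w for w in whistles if keep_for_training(*w)]
```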
The ”Draw Your Own Contours” (DYOC) extraction method

The detection and extraction of whistle contours is challenging and requires a model to achieve satisfactory performance in both object detection and instance segmentation tasks. The model has to delimit whistle boundaries and to find the contours of the whistles within these boundaries. In order to achieve this task, we created our own method, DYOC (”Draw Your Own Contours”), relying on two existing deep learning models. For its training, we use our annotated dataset of whistles.
The DYOC structure

We built DYOC with the aim of developing a rapid annotation process, using two efficient models: YOLOv8m[39,40] and ResNet18[41], which are trained and tested on our dataset using PyTorch[42]. The efficiency of YOLO applied to marine bioacoustic data has already been demonstrated in other studies[43–49], and ResNet is a standard model widely used for image processing tasks. Pretrained weights were used for both models, as transfer learning has been shown to be more effective than using randomly assigned weights for similar tasks[50]. The structure of DYOC was designed to be minimalist.

In order to predict the contours of whistles, the method directly takes audio files as inputs. Each waveform is converted to a spectrogram image, and each 6-second segment of these spectrograms is resized to a 640x640 pixel image in order to match YOLO’s input size. In order to facilitate the recognition of whistles despite ambient noise, contrast is also added using a gamma correction (γ = 0.5) before making predictions. YOLO then predicts the time-frequency limits of the whistles within each square image using bounding boxes (bboxes); YOLO only predicts a single ’whistle’ class. Then, an image is extracted from the coordinates of each bbox and resized to 224x224 pixels so that it can be used by ResNet18. Finally, ResNet18 performs a regression on these images and outputs a prediction of the whistle contour coordinates: 112 y-positions corresponding to uniformly distributed points along the x-axis. The complete DYOC architecture is described in Figure 3.
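The pre-processing steps above (6-second segmentation, resize to 640x640, γ = 0.5 gamma correction) can be sketched as follows. The nearest-neighbour resize and the segmentation by column count are our own simplifications, not the authors’ exact implementation.

```python
import numpy as np

def gamma_correct(img, gamma=0.5):
    """Contrast boost applied before prediction; values assumed in [0, 1]."""
    return img ** gamma

def resize_nn(img, out_h=640, out_w=640):
    """Nearest-neighbour resize of a spectrogram segment to YOLO's 640x640 input."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]

def segments_6s(spec, sr=96_000, hop=512, seg_s=6.0):
    """Split a spectrogram into consecutive 6-second segments (by column count)."""
    cols = int(seg_s * sr / hop)        # 1125 columns per 6 s at this hop
    return [spec[:, i:i + cols] for i in range(0, spec.shape[1] - cols + 1, cols)]

# A dummy spectrogram (513 frequency bins x 2500 time frames) yields 2 segments
spec = np.random.default_rng(0).random((513, 2500))
segs = [resize_nn(gamma_correct(s)) for s in segments_6s(spec)]
```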
[Figure: flow diagram. Audio files → spectrograms → segmented spectrograms → YOLO bbox prediction → manual check of bboxes → images of detected whistles → ResNet18 contour prediction → manual check of contours → kept predictions. Diagram keys distinguish data, predictions, AI models and manual interactions.]
Figure 3. Simplified diagram of the architecture of the ”Draw Your Own Contours” (DYOC) extraction method.
Initially, YOLO was designed to predict the position of objects in complex landscapes and photographic scenes using bboxes. Thus, we expected that a simple whistle detection task would be a fairly straightforward process for this model. With regard to ResNet18, the model performs a regression to predict the contours of the whistles extracted from YOLO predictions. Its purpose is to achieve accurate annotation of the simplest isolated whistles. The model is not expected to make accurate predictions in complex situations where whistles overlap with other sounds, as these situations are also challenging for a human annotator to judge. This is why steps with manual checks are included in DYOC (see Fig. 3), allowing for rapid correction of its predictions.
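Since ResNet18 outputs 112 y-positions within the cropped bbox image, mapping them back to absolute time and frequency only requires the bbox limits. The function and variable names below are ours; the bbox convention (t0, t1, f0, f1) is an assumption.

```python
def contour_to_tf(y_norm, bbox):
    """Map normalised contour heights (0 = top of crop, 1 = bottom) back to
    (time, frequency) pairs, given a YOLO bbox (t_start, t_end, f_low, f_high)."""
    t0, t1, f0, f1 = bbox
    n = len(y_norm)
    points = []
    for i, y in enumerate(y_norm):
        t = t0 + (t1 - t0) * i / (n - 1)   # points uniformly spaced in time
        f = f1 - (f1 - f0) * y             # image rows run top (f1) to bottom (f0)
        points.append((t, f))
    return points

# A flat contour across the middle of a 0.5 s, 8-12 kHz bbox sits at 10 kHz
pts = contour_to_tf([0.5] * 112, bbox=(10.0, 10.5, 8_000.0, 12_000.0))
```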
DYOC Training and performance

DYOC is trained in two parts, by independently training YOLO and ResNet18. YOLO uses segmented spectrograms as input and outputs bboxes. ResNet18 uses images of isolated whistles to predict the coordinates of their frequencies along the time axis of the image. We used ground-truth bboxes to train ResNet18 in order to avoid propagating YOLO’s prediction errors to the ResNet18 training.

In order to train DYOC, the dataset was split into train/test/validation subsets at proportions of 80%/10%/10%. Test scores were computed using an 8-fold cross-validation. For YOLO, we used TP (true positive), FP (false positive) and FN (false negative) rates to compute several metrics (see eq. 1): precision (proportion of relevant detections among predictions) and recall (proportion of relevant detections made).

Precision = TP / (TP + FP),   Recall = TP / (TP + FN).   (1)
For ResNet18, we mainly used the RMSE (Root Mean Square Error, where the errors are the distances between predicted and ground-truth coordinates), as it is easily interpreted as a proportion of the image size. In addition, YOLO uses a specific metric for its evaluation: the mAP (mean Average Precision), which is the average of the area under precision-recall curves for different confidence thresholds.
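The metrics above can be computed directly; eq. (1) is reproduced as-is, and the RMSE-as-a-fraction-of-image-size convention follows the text. The counts and coordinates in the usage example are made up for illustration.

```python
import math

def precision_recall(tp, fp, fn):
    """Eq. (1): precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def rmse_fraction(pred_y, truth_y, image_size=224):
    """RMSE between predicted and ground-truth contour coordinates,
    expressed as a proportion of the image size (as in Table 1)."""
    se = sum((p - t) ** 2 for p, t in zip(pred_y, truth_y))
    return math.sqrt(se / len(pred_y)) / image_size

precision, recall = precision_recall(tp=68, fp=32, fn=31)
err = rmse_fraction([10.0, 20.0, 30.0], [12.0, 18.0, 32.0])
```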
Spectrograms were converted to and treated as black-and-white images (0–255 pixel values correspond to [-60, -40] dBFS values in the spectrograms). To enhance performance, we also included common data augmentation transformations for the training of our models: white padding, vertical and horizontal flipping, as well as slight colour changes. For ResNet18, we created synthetic whistle contours in order to train the model on a wide variety of shapes not present in our initial dataset. These were obtained by generating points at random coordinates in an image, which were then linked by a quadratic spline. Columns of black pixels were added to simulate echolocation clicks, and white noise to simulate ambient noise. Finally, we added contrast to all our images using a gamma correction (γ = 0.5). We show examples of the images used as inputs in Figure 4, and test results in Table 1.
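A minimal sketch of the synthetic-contour generation described above: for brevity we fit a single quadratic through the random control points instead of a true quadratic spline, and the click/noise parameters are our own choices.

```python
import numpy as np

def synthetic_contour(width=224, n_points=5, seed=0):
    """Random smooth contour: a quadratic fitted through random control points
    (the paper links points with a quadratic spline; one quadratic suffices here)."""
    rng = np.random.default_rng(seed)
    xs = np.sort(rng.uniform(0, width, n_points))
    ys = rng.uniform(0, width, n_points)
    coeffs = np.polyfit(xs, ys, deg=2)
    return np.clip(np.polyval(coeffs, np.arange(width)), 0, width - 1)

def add_noise(img, click_cols=(50, 120), noise_std=0.05, seed=0):
    """Simulate echolocation clicks (black pixel columns) and ambient white noise."""
    rng = np.random.default_rng(seed)
    out = img + rng.normal(0.0, noise_std, img.shape)
    out[:, list(click_cols)] = 0.0          # black columns stand in for clicks
    return np.clip(out, 0.0, 1.0)

contour = synthetic_contour()
img = add_noise(np.ones((224, 224)))
```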
Figure 4. Examples of inputs to the DYOC model.
(4a) Segmented spectrogram, (4b) whistle in a bounding box and (4c) randomly generated whistle.
           YOLO (scores)                          Regression (errors in % of image size)
Data       Precision  Recall  mAP50  mAP50-95     RMSE (mean)  RMSE (median)
Baseline   63.2%      50.6%   52.7%  29.5%        22.1%        13.8%
Improved   68.1%      68.8%   69.2%  42.5%        6.1%         4.6%

Table 1. DYOC scores on test sets (means of an 8-fold cross-validation).
Baseline: no data augmentation; Improved: data augmentation and added contrast.
The results presented in Table 1 indicate that the transformations applied to the dataset improved DYOC’s predictions on test data. In addition, we tested the generalisation capabilities of DYOC by adding random Gaussian noise to the test images in order to artificially reduce the SNRs of the whistles. The associated results are presented in Figure 5 and show that YOLO scores increased linearly with SNR (i.e. higher scores with less noise). ResNet18 is more resistant to noise: errors decreased in a logarithmic fashion when the SNR increased (see Figure 5b).
Figure 5. Performance of the DYOC model for different SNR levels. (5a) YOLO scores and (5b) regression errors.
Means computed from an 8-fold cross-validation; error bars show standard deviations.
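The robustness test above degrades the whistle SNRs by adding Gaussian noise to the test images; a minimal version, with our own noise level and patch-based SNR measurement, looks like this:

```python
import numpy as np

def add_gaussian_noise(img, sigma, seed=0):
    """Add white Gaussian noise to a [0, 1] image to artificially lower whistle SNRs."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def mean_level_db(patch):
    """Mean level of an image patch in dB relative to full scale (1.0)."""
    return 10 * np.log10(float(np.mean(patch ** 2)) + 1e-12)

clean = np.zeros((64, 64))
clean[30:34, :] = 1.0                       # a bright band stands in for a whistle
noisy = add_gaussian_noise(clean, sigma=0.2)

# SNR = mean whistle level minus mean ambient level, before vs after noise
snr_clean = mean_level_db(clean[30:34]) - mean_level_db(clean[:20])
snr_noisy = mean_level_db(noisy[30:34]) - mean_level_db(noisy[:20])
```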
DYOC in practice

DYOC predicted whistle bboxes in a spectrogram with a mAP50 of 69.2% on average. Precision and recall scores in Table 1 are comparable to those of other approaches on neighbouring populations of dolphins[20]. These scores show that YOLO makes few errors, but this is not sufficient to permit its use in DYOC without verifying its predictions. Even if there is still room for improvement, these results are satisfactory for our purposes, as our aim is only to accelerate the annotation process, not to make perfect predictions. It should also be noted that there were many overlapping whistles in the initial manually annotated dataset. These are difficult cases to handle, even for a human annotator.

The ResNet18 regression makes few errors: the RMSE is 6.1% of the image size on average. This is satisfactory for further use: predicted contours are arguably of equal or better quality than human labels in easy cases of annotation (see Fig. 6a & 6b), but predictions are poor when the model is confronted with low-SNR whistles (Fig. 6c) or overlapping signals (Fig. 6d). This means that, depending on YOLO bbox predictions, the annotator only needs to correct the predictions in the most difficult cases, thus saving a good deal of time. We were therefore able to apply DYOC to the DOLPHINFREE dataset. The most complex cases, however, remain the responsibility of the annotator. In all, around 60 hours were required to manually annotate the 1,500 whistles used in training. Using DYOC, it took only 10 hours to annotate the same quantity of whistles: the annotation time is divided by 6. The method could be further improved by using more training data or by filtering the audio data. However, again, our goal was simply to speed up the annotation process, which DYOC does.
(6a) RMSE: 6.4%, (6b) RMSE: 4.4%, (6c) RMSE: 14.7%, (6d) RMSE: 16.7%
Figure 6. Examples of predictions by ResNet18. Figures (6a) and (6b) show correct predictions; Figures (6c) and (6d) show common error types (low SNR and signal overlap). Scores are expressed as RMSE in proportion to the image size. Green: annotations, Red: predictions.
Case study: DOLPHINFREE experiments

Overview of results

Using DYOC trained on our manually annotated dataset, we were able to detect and verify the whistle contours extracted from 808 minutes of audio recordings collected during the DOLPHINFREE experiments. In total, 13,576 whistles were extracted with DYOC. These whistles have durations ranging from 0.02 to 3.37 seconds, and frequencies from 1.71 to 31.8 kHz. In order to minimise the number of whistle fragments in our final analysis, 8,730 whistles were retained for further analyses (constraints: signal > -60 dB, SNR > 10 dB, duration > 200 ms). We then extracted classic characteristics from these contours[51]: duration, minimum/starting/ending/maximum frequencies, frequency range, number of inflections and global gradient. We analysed these features depending on the behavioural state of the dolphins observed, the presence/absence of a fishing net and the activation state of the DOLPHINFREE bio-inspired beacon. Then, we used a dimension reduction technique (UMAP) to represent whistle shapes in a 2D space.
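The classic contour features listed above can be computed from a sequence of (time, frequency) samples. Counting inflections as sign changes in the frequency slope is our reading of the usual definition, and the example contour is made up.

```python
def contour_features(times_s, freqs_hz):
    """Extract classic whistle-contour features from (time, frequency) samples."""
    duration = times_s[-1] - times_s[0]
    features = {
        "duration_s": duration,
        "freq_min_hz": min(freqs_hz),
        "freq_max_hz": max(freqs_hz),
        "freq_start_hz": freqs_hz[0],
        "freq_end_hz": freqs_hz[-1],
        "freq_range_hz": max(freqs_hz) - min(freqs_hz),
        "gradient_hz_per_s": (freqs_hz[-1] - freqs_hz[0]) / duration,
    }
    # Inflections: sign changes in the slope of the frequency contour
    slopes = [b - a for a, b in zip(freqs_hz, freqs_hz[1:])]
    signs = [s for s in slopes if s != 0]
    features["n_inflections"] = sum(
        1 for a, b in zip(signs, signs[1:]) if (a > 0) != (b > 0))
    return features

# A 0.5 s up-down whistle: rises 8 -> 12 kHz then falls back to 9 kHz
t = [i * 0.05 for i in range(11)]
f = [8000, 9000, 10000, 11000, 12000, 11500, 11000, 10500, 10000, 9500, 9000]
feats = contour_features(t, f)
```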
Effects on whistles

We compared the features of detected whistles on the basis of three variables of interest: the observed behaviour of the dolphins, the activation state of the DOLPHINFREE acoustic beacon and the presence/absence of a fishing net. Results are presented in Figures 7 & 8. Frequency ranges and frequency means are not shown in order to make the figures easier to read. In Figure 8, we chose to represent whistle features depending on travelling and foraging behaviours only, because these behaviours were the most likely to be associated with bycatch. Figure 8 shows fewer variables than Figure 7 for easier readability. Violin plots represent data with kernel density estimates of the distributions and box-plots at their centres. Kruskal-Wallis tests and Dunn’s post-hoc tests were computed for each variable in order to determine whether there were differences between the distributions of features.
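The Kruskal-Wallis test used throughout compares distributions via ranks; a compact sketch of the H statistic (average ranks for ties, no tie correction, Dunn’s post-hoc omitted) is given below under the assumption of independent groups.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction; illustration only)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:                       # assign average ranks to ties
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2           # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    return 12 / (n_total * (n_total + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)) - 3 * (n_total + 1)

# Identical groups give H = 0; clearly shifted groups give H > 0
h_same = kruskal_wallis_h([1, 2, 3], [1, 2, 3])
h_diff = kruskal_wallis_h([1, 2, 3], [10, 11, 12])
```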
[Figure: panels showing duration (s), SNR (dB), frequency gradient (kHz/s), number of inflections and min./start./end./max. frequencies (kHz), for each observed behaviour (socialising, travelling, attraction, foraging, milling), each beacon state (before, activation, activated, deactivation, after) and fishing net absent/present. Letters above each distribution mark Dunn’s post-hoc groups.]
Figure 7. Distribution of features extracted from whistles, depending on 3 variables: observed behaviour of the dolphins, activation state of the DOLPHINFREE acoustic beacon and presence/absence of a fishing net. Two representations: violin plots (duration & SNR), or means and 95% CI (frequency gradient, number of inflections and other frequency features). For each plot, different letters show different distributions according to Dunn’s post-hoc tests.
[Figure: panels showing duration (s), SNR (dB), number of inflections and min./max. frequencies (kHz) for travelling vs foraging dolphins, across beacon states (before, activation, activated, deactivation, after) and fishing net absent/present. Letters above each distribution mark Dunn’s post-hoc groups.]
Figure 8. Distribution of features extracted from whistles, depending on 2 variables (activation state of the DOLPHINFREE acoustic beacon and presence/absence of a fishing net) represented in interaction with 2 behaviours: travelling and foraging. Two representations: violin plots (duration & SNR), or means with 95% CI (number of inflections and frequency features). For each plot, different letters show different distributions according to Dunn’s post-hoc tests.
Our results highlighted differences in the features of detected whistles depending on the behaviour of dolphins (Fig. 7). The average detected SNRs were lowest when dolphins were attracted by the boat or travelling, intermediate during socialising and milling activities and highest during foraging behaviours (Fig. 7, K-W test, H = 522, d.f. = 4, p-value < 0.001), with SNRs ranging from 17.4 to 22.2 dB depending on the behavioural state. Whistles recorded from milling dolphins were, on average, slightly longer (by 0.1 s) than those recorded when other activities were observed (Fig. 7, K-W test, H = 38, d.f. = 4, p-value < 0.001). Slight variations of the average frequency gradient can be observed depending on behavioural activities: dolphins attracted to the boat emitted whistles with globally decreasing frequencies, as opposed to relatively stable whistles when they were travelling (Fig. 7, Dunn’s post-hoc test, p-value < 0.05). However, these mean values are associated with fairly wide confidence intervals, showing great variations within each behavioural state. Conversely, smaller intervals around the mean numbers of inflections per whistle enabled us to show high variations of this feature depending on behavioural activities. Whistles emitted during socialisation, milling activities, or when dolphins were attracted to the boat had the highest mean numbers of inflections measured. These numbers were lower when dolphins were travelling and lowest when dolphins were foraging (Fig. 7, K-W test, H = 77, d.f. = 4, p-value < 0.001). Whistle minimum and maximum frequencies also showed subtle differences depending on behavioural states (Fig. 7, K-W test, p-values < 0.05), which was not the case for starting and ending frequencies. Additionally, whistles detected during milling activities showed slightly higher maximum frequencies and lower minimum frequencies than whistles detected during other behavioural activities. This reflected a slightly wider frequency range during milling activities (K-W test, H = 81, d.f. = 4, p-value < 0.001) (frequency ranges not shown in Fig. 7 for readability).
During experiments, when a fishing net was set underwater, detected whistles had a clearly lower SNR (Fig. 7, K-W test, H = 959, d.f. = 1, p-value < 0.001) and, on average, a higher number of inflections per whistle (Fig. 7, K-W test, H = 21, d.f. = 1, p-value < 0.001). Frequency features were less affected by the presence of a fishing net: the average of maximum frequencies shows a subtle decrease (Fig. 7, K-W test, H = 16, d.f. = 1, p-value < 0.001), while other frequency features did not show changes in the presence or absence of a fishing net (Fig. 7, K-W tests, p-values > 0.05). In addition, neither the duration nor the frequency gradients had statistically different distributions when a fishing net was absent/present (Fig. 7, K-W tests, p-values > 0.05).
Distributions of whistle features also varied depending on the behavioural state of dolphins in the presence (or not) of a fishing net (i.e. the interaction of these two variables, see Fig. 8). Travelling dolphins emitted longer whistles than foraging individuals when the fishing net was absent (0.7 s and 0.62 s, respectively). However, this effect reversed when a fishing net was present (Fig. 8, K-W test, H = 46, d.f. = 3, p-value < 0.001), although the differences are smaller (i.e. 0.65 s and 0.7 s, respectively). The same inversion can be observed for the number of inflections and frequency features extracted from the whistles (see Fig. 8), with very little difference between each modality. Finally, the SNRs varied in a different manner (Dunn’s post-hoc test, p-value < 0.05): detected whistles were recorded with lower SNRs when a fishing net was present underwater (17.95 dB). Conversely, in the absence of this disturbance, whistles of travelling dolphins were recorded at significantly higher SNRs (22.49 dB), and those of foraging dolphins at even higher SNRs (23.19 dB).
The use of the bio-inspired DOLPHINFREE beacon also had an effect on detected whistles. The mean SNR decreased upon activation of the beacon (i.e. from 21.3 dB to 19.4 dB), and then rose to levels that were higher than before its use (Fig. 7, K-W tests, H = 109, d.f. = 4, p-value < 0.001), with wide variations (from 19.4 to 22.7 dB). Other features of whistles showed completely different patterns. Duration and average numbers of inflections per whistle followed the same trend: whistles were slightly longer, with noticeably more inflections, at beacon activation/deactivation than when the beacon remained activated or after its deactivation (K-W tests, H = 41 and H = 63 respectively, d.f. = 4, p-value < 0.001). All measured frequency features varied depending on the activation state of the beacon during the experiments (K-W tests, p-values < 0.05). They followed the same relative pattern as the duration of detected whistles: measured frequency features were on average distinctly higher before the activation of the beacon. Then, they were lowest at activation/deactivation of the beacon, with little difference between before and after the beacon’s deactivation.
The features of detected whistles from foraging and travelling dolphins showed little difference depending on the activation state of the DOLPHINFREE beacon (Fig. 8). Frequency features showed some differences in distributions depending on behaviours within each activation modality (K-W tests, H = 45, d.f. = 9, p-value < 0.001). Namely, minimum whistle frequencies were higher (10 kHz) for travelling dolphins than for foraging ones (9 kHz) after the beacon's deactivation (Dunn's post-hoc test, p-value < 0.05). Detected whistles of travelling dolphins showed a distinctly higher number of inflections compared to foraging dolphins when the beacon was switched on/off, but followed the same patterns for the rest of the experimental sequences (Dunn's post-hoc tests, p-values > 0.05). Similarly, durations of recorded whistles remained mostly similar during the activation sequences between foraging and travelling dolphins, with little to no variation detectable (Dunn's post-hoc tests, p-values > 0.05). For the SNRs, recorded levels of sound intensity were lower by 4.8 dB on average for travelling dolphins compared to foraging dolphins over the whole experimental sequence (see Fig. 8). For all modalities in the activation sequence, distributions of recorded SNRs were statistically different when comparing foraging and travelling dolphins, except before the activation of the beacon (Dunn's post-hoc tests, p-values > 0.05). After the use of the DOLPHINFREE signal, the detected SNR of whistles emitted by travelling dolphins declined, while it increased for foraging dolphins.
Whistle shapes
In order to visualise overall differences in the shapes of the whistles, we applied a dimension reduction method to the 8,730 whistle contours extracted with DYOC. To this end, we first calculated the Dynamic Time Warping (DTW52) metric between all pairs of contours (with normalised frequencies). This metric is commonly used to compare two temporal sequences that are not perfectly synchronised. Then, we performed a Uniform Manifold Approximation and Projection for dimension reduction (UMAP53) on the contours, using this metric. Our goal was to group together whistles that have similar shapes, with no a priori information on these data. We obtained the representation shown in Figure 9.
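As a minimal illustration of this pipeline (the study itself used the DTW52 and UMAP53 packages; this sketch re-implements DTW from scratch on toy stand-in contours), a pairwise distance matrix over variable-length contours can be built as follows and then passed, e.g., to umap-learn's `UMAP(metric="precomputed")`:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # best alignment extends a match, an insertion or a deletion
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy stand-ins for extracted contours (normalised frequency over time);
# real DYOC contours have varying lengths, which DTW handles natively.
rng = np.random.default_rng(1)
contours = [np.sort(rng.random(rng.integers(20, 40))) for _ in range(15)]

# Symmetric pairwise distance matrix with a zero diagonal, ready for
# any dimension reduction method accepting precomputed distances.
n = len(contours)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(contours[i], contours[j])
```

Because DTW tolerates sequences that are stretched or shifted in time, two whistles with the same shape but different durations end up close together in the embedding.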
[Figure 9 here: four 2D UMAP scatter plots (dimensions 1 and 2), coloured by behaviour, activation sequence, fishing net presence, and whistle shape: upsweeps (n=1703), downsweeps (n=1511), convexes (n=2540), sinusoidal (n=1146), concaves (n=1830).]
Figure 9. 2D Uniform Manifold Approximation and Projection (UMAP) representations53 of the shapes of the whistles, based on the DTW metric52. Sub-figures are coloured according to behavioural states (9a), activation state of the DOLPHINFREE beacon (9b), presence/absence of a fishing net during experiments (9c) and manual shape identification (9d).
This representation did not highlight any relation between the shapes of the annotated whistles and our experimental modalities (Figures 9a, 9b & 9c). However, it enabled us to manually differentiate groups of similar shapes (associated with different colours in Fig. 9d). Upsweeps and downsweeps are easily clustered together in this representation. However, it is interesting to note that convex and concave whistles appear as clear gradients linking upsweeps and downsweeps. Sinusoidal whistles stand out in the centre of the UMAP representation. This position is not surprising, as those whistles are constituted of concatenations of the other shapes. This last group is the one whose forms vary the most, as can be seen from the presence within it of many smaller clusters. Sinusoidal whistles have more than 3 inflections, with original and varying shapes.
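The number of inflections used throughout this analysis can be operationalised as the number of sign changes in the contour's frequency slope. The sketch below is our reading of that counting rule (not the paper's exact code), applied to two synthetic contours:

```python
import numpy as np

def count_inflections(freqs):
    """Count sign changes in the frequency slope of a whistle contour.

    An inflection is where the contour switches from rising to falling
    (or vice versa); flat segments are ignored.
    """
    slopes = np.diff(freqs)
    signs = np.sign(slopes)
    signs = signs[signs != 0]  # drop flat steps
    return int(np.sum(signs[1:] != signs[:-1]))

t = np.linspace(0, 1, 200)
upsweep = 8_000 + 4_000 * t                             # monotonic rise
sinusoid = 10_000 + 2_000 * np.sin(2 * np.pi * 2 * t)   # two full oscillations

print(count_inflections(upsweep))   # 0
print(count_inflections(sinusoid))  # 4
```

Under this rule, upsweeps and downsweeps have 0 inflections, convex/concave whistles have 1, and sinusoidal whistles accumulate one inflection per extremum, matching the "more than 3 inflections" observation above.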
Discussion
In this study, we propose a new approach, DYOC, that combines deep learning models and uses transfer learning to efficiently speed up the extraction of whistle contours. Image post-processing techniques were used to improve its predictions, despite being applied to audio data. We applied DYOC to a dataset containing 808 minutes of audio recordings of short-beaked common dolphins. With DYOC, we completed the annotation of all our recordings about 6 times faster than if we had annotated them manually.
The main criticism that can be made regarding the use of YOLOv8m and ResNet18 in DYOC is that they were not initially intended to be applied to sounds, but rather to images. Here, we made predictions using spectrograms treated as images, which means that the models do not differentiate between the temporal and frequency axes (i.e. images are not read from left to right as they are by humans). Nevertheless, the results of the regression part of DYOC showed very good prediction scores on test data (RMSE = 6.1% of image size). Visualisations (Fig. 6) also showed that the model makes predictions that have a polynomial-like appearance, even for images that are difficult to predict. This underlines that the ResNet part of DYOC has successfully integrated the general pattern of a whistle. Predictions made by YOLO were less convincing than expected (mAP50 = 69.2%), but are satisfactory for our purposes.
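A score such as "RMSE = 6.1% of image size" can be computed by comparing predicted and ground-truth contour points in pixel coordinates and normalising by the crop size. A sketch with hypothetical values (the 224-pixel crop, 64 contour points and constant 10-pixel error below are illustrative, not the study's figures):

```python
import numpy as np

def contour_rmse_percent(pred, target, image_size):
    """RMSE between predicted and ground-truth contour points (in pixels),
    expressed as a percentage of the spectrogram image size."""
    err = np.asarray(pred, float) - np.asarray(target, float)
    return 100.0 * np.sqrt(np.mean(err ** 2)) / image_size

# Hypothetical example: 64 contour points on a 224x224 spectrogram crop,
# with predictions off by a constant 10 pixels.
target = np.linspace(50, 150, 64)
pred = target + 10.0
print(f"{contour_rmse_percent(pred, target, image_size=224):.1f}% of image size")
```

Normalising by image size makes scores comparable across crops of different spectrogram resolutions.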
Both models used in DYOC are fully operational without having to create a deep learning architecture from scratch for each particular use, which is why we chose them. Ultralytics, the company which develops YOLO, recommends training YOLOv8 with 10,000 instances per class in order to obtain good results. Despite using a relatively small number of whistle instances for our training (431), we were still able to obtain satisfactory results, proving the efficiency of a generalist method of this kind. Knowing that the whistles annotated for the training of DYOC had varying shapes, SNRs, and overall visual qualities, our results could probably be improved by using a larger training dataset.
Overall, these high-performance models are well-suited for our general task, but are not good enough to be entirely relied upon. Without being perfect, the DYOC method enabled us to correctly detect the coordinates of whistles in spectrograms and to make precise annotations of their contours. Although YOLO makes false positive and false negative detections (see Table 1) and ResNet18 has difficulty making predictions on low-energy whistles and in the presence of overlapping noises (see Figures 6c & 6d), the steps that we included to facilitate the manual correction of predictions within DYOC enabled us to greatly speed up our annotation process (Fig. 3). When making predictions, these steps were the slowest, as YOLO and ResNet forward passes are almost instantaneous, but they were necessary to ensure the quality of our contours for our analyses.
We extracted features commonly used in previous studies and found that the duration and frequency ranges of our whistles were similar to those collected from different populations of short-beaked common dolphins in other zones24,25,29,51. Common dolphin whistling rates are known to vary in response to anthropogenic acoustic stimuli35,54. In this study, we aimed to study whistle responses in more detail, by comparing their characteristics as a function of different variables: behavioural state, presence/absence of a fishing net and activation state of a bio-inspired acoustic beacon designed to limit bycatch. We used non-parametric statistical tests to account for differences in data sampling.
We showed that almost all features extracted from detected whistles (except starting and ending frequencies) varied as a function of the behaviour of dolphins. Namely, milling dolphins emit longer whistles, on average, than dolphins in other behavioural states (0.74 compared to 0.65 seconds on average for the other states, respectively), with a larger mean frequency range (6.1 compared to 4.9 kHz, respectively) and a higher mean number of inflections (2.2 compared to 2, respectively). This means that despite being in an apparent resting state, dolphins continue to exchange information. These whistles could also help dolphins to stay grouped together55–57 when they emit fewer echolocation clicks and are not actively travelling. Whistles also had a similarly higher number of inflections when dolphins were socialising and attracted to the survey boat, which is comparable to other populations of the same dolphin species24,58. In addition, mean SNRs varied greatly between 3 modes: 19 dB when travelling or when attracted by the boat, 20.6 dB when socialising or resting and 22.4 dB during foraging activities. These differences are greater than what we would have expected from variations between the emissions of on-axis and off-axis dolphins towards the hydrophone used for experiments59. This shows that travelling and attracted dolphins were either emitting whistles at a lower sound intensity level than foraging dolphins, or from a greater distance, hence reducing the signal's intensity. Bottlenose dolphins have been documented to produce whistles at a higher rate when feeding in order to attract conspecifics60. The elevated SNRs recorded during our experiments while dolphins were foraging could play a similar role.
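SNR values like those above are typically estimated by comparing the power inside the whistle's time-frequency region with nearby background noise. The sketch below shows one common estimator on synthetic data; it is an assumption for illustration, not the paper's exact procedure, and the 12 kHz tone and 48 kHz sampling rate are arbitrary choices:

```python
import numpy as np

def snr_db(signal_segment, noise_segment):
    """Signal-to-noise ratio in dB from the mean power of a segment
    containing the whistle versus a noise-only reference segment."""
    p_sig = np.mean(np.square(signal_segment))
    p_noise = np.mean(np.square(noise_segment))
    return 10.0 * np.log10(p_sig / p_noise)

# Synthetic 1-second example: background noise plus a 12 kHz tone
# standing in for a whistle, sampled at 48 kHz.
rng = np.random.default_rng(2)
fs = 48_000
noise = rng.normal(0.0, 1.0, fs)
whistle = noise + 10.0 * np.sin(2 * np.pi * 12_000 * np.arange(fs) / fs)

print(f"{snr_db(whistle, noise):.1f} dB")
```

In practice the noise reference is usually taken just before or after the whistle, or from frequency bins adjacent to the contour, so that slow changes in ambient noise do not bias the estimate.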
The estimated frequency gradient of detected whistles did not show noticeable variations (Figure 7), which is why we did not further consider this variable (Figure 8). Overall, its only significant difference in distribution was between travelling dolphins and those attracted to the boat: attracted animals emitted whistles with a lower frequency gradient (with faster descending frequencies in time). Whistles recorded during foraging, travelling or attraction behaviour represented 90% of the detections. We chose to focus our analyses on foraging and travelling states as those behaviours are more likely to occur during interaction with fishing activities.
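The frequency gradient discussed here can be estimated as the least-squares slope of a contour's frequency over time; a rising whistle yields a positive gradient, a falling one a negative gradient. A minimal sketch (our reading of the feature; the synthetic 0.6 s upsweep is illustrative):

```python
import numpy as np

def frequency_gradient(times, freqs):
    """Overall frequency gradient of a whistle contour (Hz/s),
    estimated as the least-squares slope of frequency over time."""
    slope, _ = np.polyfit(times, freqs, deg=1)
    return slope

t = np.linspace(0.0, 0.6, 120)        # a 0.6 s whistle
f = 9_000 + 5_000 * t / 0.6           # linear upsweep from 9 to 14 kHz
print(frequency_gradient(t, f))       # about 8333 Hz/s
```

Fitting a single slope deliberately collapses the contour's shape, which may explain why this feature discriminated modalities less well than the number of inflections.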
The use of the bio-inspired DOLPHINFREE beacon led to considerable variations in the observed characteristics of the detected whistles (Fig. 7). Whistles were slightly longer when the beacon was switched on/off compared to when it was fully activated or after its use (mean durations of 0.72 s and 0.73 s compared to 0.65 s and 0.63 s, respectively). A similar pattern, with greater differences, was observed for the numbers of inflections per whistle (2.45 inflections at beacon activation/deactivation compared to 1.9 inflections during and after use). This underlines that dolphins communicated differently when the signal was introduced rather than during the whole emission sequence of the acoustic beacon. In addition, SNRs of recorded whistles decreased strongly when the beacon was activated (19.4 compared to 20.3 dB before activation), but showed very high values after the extinction of the beacon (22.7 dB). These variations show a response to the signal emitted by the beacon, but it is more difficult to explain why SNRs were higher after the use of the beacon. Odontocetes have highly variable recovery rates to return to their initial behavioural state after an acoustic event61, ranging from a couple of minutes to a couple of hours. In this experiment, we could only wait for up to 5 minutes after the extinction of the beacon, as dolphins generally left the emission zone in less than 1 minute, which could be why these higher SNRs can still be observed. Finally, frequency features showed a shift to lower frequencies when the beacon was activated (e.g. mean of maximum recorded frequencies: 14.7 kHz before the activation of the beacon compared to 14 kHz for other modalities). This difference is further accentuated at the activation and deactivation of the beacon. This response could be interpreted as a way to inform dolphins further away about the presence of a particular acoustic event36.
Our results showed some differences in the distribution of the frequency and number of inflections of whistles among activation sequences during foraging and travelling behaviours, while those of duration and SNRs presented slight differences (see Fig. 8). The number of inflections and the frequency features of the whistles varied more as a function of behaviour and the activation state of the beacon alone, rather than their interaction. Interestingly, after the deactivation of the beacon, travelling dolphins emitted whistles at considerably lower SNRs than foraging dolphins (16.8 vs. 25.8 dB, respectively), a difference that can be explained by our visual observations: travelling dolphins exhibited a tendency to move away from the beacon faster than foraging dolphins. In contrast, the numbers of inflections per whistle were significantly higher for travelling dolphins compared to foraging ones at the activation/deactivation of the beacon, showing that more complex whistles are emitted by travelling dolphins in response to the DOLPHINFREE signal.
Finally, whistle characteristics varied significantly in response to the presence of a fishing net: detected signals were quieter but more complex. When a fishing net was set underwater, detected whistles had a lower mean SNR (18.2 compared to 23 dB) but a higher mean number of inflections (2.1 compared to 1.8), and a slightly lower mean maximum frequency (13.9 compared to 14.1 kHz), while other frequency features showed no significant changes. The slight variation in maximum frequencies could have two origins: an acoustic response of dolphins to the presence of the fishing net, or dolphins emitting from farther away when a fishing net was present (high frequencies are the first to be absorbed by the environment). The fishing net was set underwater from the boat, from which the hydrophone was also deployed. Thus, the lower SNRs corroborate the hypothesis of dolphins avoiding the area when a fishing net is present. Unfortunately, our data do not enable us to estimate precisely the distance from which the whistles were emitted. According to previous research, dolphins can detect fishing nets only from a few metres away62. When considered in relation to dolphins' behaviour (Fig. 8), detected whistles of foraging dolphins had a higher SNR on average than those of travelling dolphins when no fishing net was set. This reflects the observed higher SNR during foraging behaviours. However, despite this difference in SNR between behaviours in the absence of a fishing net, the average SNR decreased to a similar level when a fishing net was present, showing an equal acoustic response to the presence of a fishing net whether dolphins were travelling or foraging. Moreover, the duration of whistles decreased slightly when travelling dolphins encountered a fishing net, while detected whistles of foraging dolphins became longer (from 0.62 to 0.7 seconds). Similar relative patterns of change can be observed for the number of inflections per detected whistle. In addition, maximum frequencies (along with ending frequencies and frequency ranges, not shown in Fig. 8 for easier readability) vary depending on the absence/presence of the fishing net and the behaviour of dolphins. When there is no fishing net, measured maximum frequencies are higher for travelling than foraging dolphins, but when a fishing net is present, it is the exact opposite. Overall, these differences show complex whistling responses of dolphins to the presence of an object in their environment (the fishing net), depending on their behaviour. Foraging dolphins modulate their whistles sharply in the presence of a fishing net compared to travelling dolphins. This tends to show that dolphins are more responsive to the presence of a fishing net when they are hunting than when they are travelling.
We did not find any direct link between the general shapes of the whistles and our experimental variables. Assessing in detail the repertoire of short-beaked common dolphins was not the main aim of this study. However, using an unsupervised clustering technique with no a priori information, we showed that unsupervised methods can be used to group whistles into categories similar to those previously reported for other populations of short-beaked common dolphins24,25,51. Moreover, this technique highlights that the categorisation of whistles into different types appears to be overly simplistic, as some whistles exhibit a more gradual transition between classes (especially for concave and convex whistles, see Fig. 9). Recent studies30,31 also showed that signature whistles could be identified among common dolphin whistles. Stereotyped whistles could be included in the small clusters of Figure 9.
Ultimately, the findings of this study indicate that generalist computer vision deep learning models are well-suited for use on spectrograms, even when trained on limited datasets. This is true when the objective is to achieve efficiency and rapidity (not perfect accuracy) of annotation. The application of this approach to a case study enabled us to substantially accelerate our annotation process. Detected whistle characteristics were shown to vary markedly depending on the behaviour of dolphins, the presence/absence of a fishing net and the activation state of a DOLPHINFREE bio-inspired beacon. Interactions between these variables show a complex acoustic behaviour of dolphins in response to environmental and acoustic conditions. Regarding the DOLPHINFREE acoustic beacon, these results suggest that emitting its signal periodically elicits more pronounced acoustic responses from short-beaked common dolphins than a continuous emission. For that reason, the latest version of the beacon includes a passive acoustic monitoring module, so that the emission of the signal only occurs when dolphins are detected within a radius of 250 m from the beacon. Variations observed in whistle characteristics suggested that dolphins could use these signals to inform their conspecifics about a change in their environment, notably the emissions of the DOLPHINFREE bio-inspired beacon36,60. This could potentially enhance their alertness and increase their chances of identifying fishing nets. An action plan introduced by the French government, planned for 2024–2026 in collaboration with scientists and fishers, aims to assess the efficiency of the bio-inspired acoustic beacon DOLPHINFREE in limiting bycatch of short-beaked common dolphins in the Bay of Biscay. More generally, the proposed methodological approach based on deep learning models has the potential to be applied to the study of whistles across a wide range of species and geographical areas, with applications related to social interactions, behaviour, and species responses to anthropogenic activities.
References378
1. Norris, K. S., Prescott, J. H., Asa-Dorian, P. V. & Perkins, P. An experimental demonstration of echolocation behavior in379
the porpoise, Tursiops truncatus (montagu). The Biol. Bull. 120, 163–176, DOI: 10.2307/1539374 (1961).380
2.
Tyack, P. Population biology, social behavior and communication in whales and dolphins. Trends Ecol. & Evol. 1, 144–150,
381
DOI: 10.1016/0169-5347(86)90042-X (1986).382
3. Au, W. W. L. The Sonar of Dolphins (Springer New York, 1993).383
4.
Au, W. W. L. & Hastings, M. C. Principles of marine bioacoustics, vol. 1 of Modern acoustics and signal processing
384
(Springer, 2008), springer edn.385
5.
Overstrom, N. A. Association between burst-pulse sounds and aggressive behavior in captive Atlantic bottlenosed dolphins
386
(Tursiops truncatus). Zoo Biol. 2, 93–103, DOI: 10.1002/zoo.1430020203 (1983).387
6.
Caldwell, M. C. & Caldwell, D. K. Vocalization of Naive Captive Dolphins in Small Groups. Science 159, 1121–1123,
388
DOI: 10.1126/science.159.3819.1121 (1968).389
7.
Sayigh, L. S. Sayigh, Laela Suad. Development and functions of signature whistles of free-ranging bottlenose dolphins,
390
Tursiops truncatus. Ph.D. thesis, Massachusetts Institute of Technology (1992).391
8. Au, W. W. & Hastings, M. C. Emission of Social Sounds by Marine Animals, 401–499 (Springer US, 2008).392
9.
Rivers, J. A. Blue whale, Balaenoptera musculus , vocalizations from the waters off central california. Mar. Mammal Sci.
393
13, 186–195, DOI: 10.1111/j.1748-7692.1997.tb00626.x (1997).394
10.
Esch, H. C., Sayigh, L. S. & Wells, R. S. Quantifying parameters of bottlenose dolphin signature whistles. Mar. Mammal
395
Sci. 25, 976–986, DOI: 10.1111/j.1748-7692.2009.00289.x (2009).396
11.
Li, P. et al. Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours. In 2020 International
397
Joint Conference on Neural Networks (IJCNN), 1–10, DOI: 10.1109/IJCNN48605.2020.9206992 (IEEE, Glasgow, United
398
Kingdom, 2020).399
12.
Conant, P. C. et al. Silbido profundo : An open source package for the use of deep learning to detect odontocete whistles.
400
The J. Acoust. Soc. Am. 152, 3800–3808, DOI: 10.1121/10.0016631 (2022).401
13.
Jin, C., Kim, M., Jang, S. & Paeng, D.-G. Semantic segmentation-based whistle extraction of indo-pacific bottlenose
402
dolphin residing at the coast of jeju island. Ecol. Indic. 137, 108792, DOI: 10.1016/j.ecolind.2022.108792 (2022).403
14.
Li, P., Liu, X., Klinck, H., Gruden, P. & Roch, M. A. Using deep learning to track time × frequency whistle contours of
404
toothed whales without human-annotated training data. The J. Acoust. Soc. Am. 154, 502–517, DOI: 10.1121/10.0020274
405
(2023).406
15.
Halkias, X. C. & Ellis, D. P. Call detection and extraction using bayesian inference. Appl. Acoust. 67, 1164–1174, DOI:
407
10.1016/j.apacoust.2006.05.006 (2006).408
14/17
16.
Oswald, J. N., Rankin, S., Barlow, J. & Lammers, M. O. A tool for real-time acoustic species identification of delphinid
409
whistles. The J. Acoust. Soc. Am. 122, 587–595, DOI: 10.1121/1.2743157 (2007).410
17.
Roch, M. A. et al. Automated extraction of odontocete whistle contours. The J. Acoust. Soc. Am. 130, 2212–2223, DOI:
411
10.1121/1.3624821 (2011).412
18.
Baumgartner, M. F. & Mussoline, S. E. A generalized baleen whale call detection and classification system. The J. Acoust.
413
Soc. Am. 129, 2889–2902, DOI: 10.1121/1.3562166 (2011).414
19.
Gruden, P. & White, P. R. Automated tracking of dolphin whistles using gaussian mixture probability hypothesis density
415
filters. The J. Acoust. Soc. Am. 140, 1981–1991, DOI: 10.1121/1.4962980 (2016).416
20.
Miralles, R., Gallardo, C., Lara, G. & Bou Cabo, M. Extracting dolphin whistles in complex acoustic scenarios: a case
417
study in the bay of biscay. Bioacoustics 33, 260–277, DOI: 10.1080/09524622.2024.2338387 (2024).418
21.
Janik, V. M. Chapter 4 acoustic communication in delphinids. In Advances in the Study of Behavior, vol. 40, 123–157,
419
DOI: 10.1016/S0065-3454(09)40004-4 (Elsevier, 2009).420
22.
Moore, S. E. & Ridgway, S. H. Whistles produced by common dolphins from the Southern California Bight. Aquatic
421
Mamm. 21, 55–55 (1995).422
23.
Carlón-Beltrán, O., Viloria-Gómora, L., Urbán R., J., Martínez-Aguilar, S. & Antichi, S. Whistle characterization of
423
long-beaked common dolphin ( Delphinus delphis bairdii ) in La Paz Bay, Gulf of California. PeerJ 11, e15687, DOI:
424
10.7717/peerj.15687 (2023).425
24.
Ansmann, I. C., Goold, J. C., Evans, P. G., Simmonds, M. & Keith, S. G. Variation in the whistle characteristics of
426
short-beaked common dolphins, Delphinus delphis , at two locations around the British Isles. J. Mar. Biol. Assoc. United
427
Kingd. 87, 19–26, DOI: 10.1017/S0025315407054963 (2007).428
25.
Petrella, V., Martinez, E., Anderson, M. G. & Stockin, K. A. Whistle characteristics of common dolphins ( Delphinus sp.)
429
in the Hauraki Gulf, New Zealand. Mar. Mammal Sci. 28, 479–496, DOI: 10.1111/j.1748-7692.2011.00499.x (2011).430
26.
Papale, E. et al. Macro- and micro-geographic variation of short-beaked common dolphin’s whistles in the Mediterranean
431
Sea and Atlantic Ocean. Ethol. Ecol. & Evol. 26, 392–404, DOI: 10.1080/03949370.2013.851122 (2014).432
27.
Azzolin, M. et al. Whistle variability of the Mediterranean short beak common dolphin. Aquatic Conserv. Mar. Freshw.
433
Ecosyst. 31, 36–50, DOI: 10.1002/aqc.3168 (2021).434
28.
Oswald, J. N. et al. Species information in whistle frequency modulation patterns of common dolphins. Philos. Transactions
435
Royal Soc. B: Biol. Sci. 376, 20210046, DOI: 10.1098/rstb.2021.0046 (2021).436
29.
Pagliani, B., Amorim, T. O. S., De Castro, F. R. & Andriolo, A. Intraspecific variation in short-beaked common dolphin’s
437
whistle repertoire. Bioacoustics 31, 1–16, DOI: 10.1080/09524622.2020.1858449 (2022).438
30.
Fearey, J., Elwen, S. H., James, B. S. & Gridley, T. Identification of potential signature whistles from free-ranging common
439
dolphins (Delphinus delphis) in South Africa. Animal Cogn. 22, 777–789, DOI: 10.1007/s10071-019-01274-1 (2019).440
31.
Cones, S. et al. Probable signature whistle production in atlantic white-sided ( Lagenorhynchus acutus ) and short-
441
beaked common ( Delphinus delphis ) dolphins near cape cod, massachusetts. Mar. Mammal Sci. 39, 338–344, DOI:
442
10.1111/mms.12976 (2023).443
32.
ICES. Workshop on mitigation measures to reduce bycatch of short-beaked common dolphins in the Bay of Biscay
444
(WKEMBYC2). Tech. Rep., ICES Scientific Reports (2023). DOI: 10.17895/ICES.PUB.21940337.V1.445
33.
Dars, C. et al. Les échouages de mammifères marins sur le littoral français en 2021. Rapport scientifique de l’Observatoire
446
Pelagis, Réseau National échouage (2021).447
34.
Gilles, A. et al. Estimates of cetacean abundance in european atlantic waters in summer 2022 from the SCANS-IV aerial
448
and shipboard surveys. (2023).449
35.
Lehnhoff, L. et al. Behavioural Responses of Common Dolphins Delphinus delphis to a Bio-Inspired Acoustic Device for
450
Limiting Fishery By-Catch. Sustainability 14, 13186, DOI: 10.3390/su142013186 (2022).451
36.
Antichi, S., Urbán R., J., Martínez-Aguilar, S. & Viloria-Gómora, L. Changes in whistle parameters of two common
452
bottlenose dolphin ecotypes as a result of the physical presence of the research vessel. PeerJ 10, e14074, DOI: 10.7717/
453
peerj.14074 (2022).454
37.
Corrias, V. et al. Bottlenose dolphin (tursiops truncatus) whistle modulation during a trawl bycatch event in the adriatic sea.
455
Animals 11, 3593, DOI: 10.3390/ani11123593 (2021).456
15/17
38. Lehnhoff, L. PyAVA : Python interface for the Annotation of Vocalisations in Audio recordings (2022).457
39. Jocher, G., Chaurasia, A. & Qiu, J. Ultralytics YOLO (2023).458
40.
Redmon, J., Divvala, S., Girshick, R. & Farahadi, A. You Only Look Once: Unified, Real-Time Object Detection. In
459
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).460
41. He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition (2015).461
42.
Ansel, J. et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph
462
Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming
463
Languages and Operating Systems, Volume 2, 929–947, DOI: 10.1145/3620665.3640366 (ACM, La Jolla CA USA, 2024).
464
43.
Venkatesh, S., Moffat, D. & Miranda, E. R. You only hear once: A YOLO-like algorithm for audio segmentation and
465
sound event detection. Appl. Sci. 12, 3293, DOI: 10.3390/app12073293 (2022).466
44.
Duan, D. et al. Real-time identification of marine mammal calls based on convolutional neural networks. Appl. Acoust.
467
192, 108755, DOI: 10.1016/j.apacoust.2022.108755 (2022).468
45.
Escobar-Amado, C., Badiey, M. & Wan, L. Computer vision for bioacoustics: Detection of bearded seal vocalizations in
469
the chukchi shelf using YOLOV5. IEEE J. Ocean. Eng. 49, 133–144, DOI: 10.1109/JOE.2023.3307175 (2024).470
46.
Chavin, S., Simard, A., Jetté, J.-F., Villard, M.-A. & Glotin, H. Efficient automatic diarization of simultaneously voicing
471
birds from 82 recorders during 6 years from arctic to tempered quebec. submission to int. J. Sci. Rep. (2024).472
47.
Chavin, S., Couvat, J., Best, P., Ourmières, Y. & Glotin, H. Exploring the repertoire and evolution of humpback whale
473
songs in the caribbean sea: a multi-year survey using yolov5 neural network. submission to PLOS Comput. Biol. (2024).474
48.
Chavin, S., Ferré, L., Villepreux, T. & Glotin, H. Passive acoustic monitoring to study the behaviour of river dolphins
475
(sotalia guianensis). submission to Ecol. Informatics J. - Elsevier (2024).476
49.
Girardet, J., Chavin, S., Poupard, M., Guiderdoni, J. & Glotin, H. Orcas, fin whales and humpback whales overlaped
477
voicings in arctic fjord : interactions with anthropophonic pressure. review J. Acoust. Soc. Am. (2024).478
50.
Palanisamy, K., Singhania, D. & Yao, A. Rethinking CNN models for audio classification. arXiv:2007.11154 [cs, eess]
479
(2020). 2007.11154.480
51.
Figueiredo, L. D. d., Maciel, I., Viola, F. M., Savi, M. A. & Simão, S. M. Nonlinear features in whistles produced by the
481
short-beaked common dolphin ( Delphinus delphis ) off southeastern Brazil. The J. Acoust. Soc. Am. 153, 2436, DOI:
482
10.1121/10.0017883 (2023).483
52.
Giorgino, T. Computing and visualizing dynamic time warping alignments in R: The dtw package. J. Stat. Softw. 31, DOI:
484
10.18637/jss.v031.i07 (2009).485
53.
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
486
Tech. Rep. arXiv:1802.03426, arXiv (2020).487
54.
Casey, C. et al. Common dolphin whistle responses to experimental mid-frequency sonar. PLOS ONE 19, e0302035, DOI:
488
10.1371/journal.pone.0302035 (2024).489
55.
Lammers, M. O. & Au, W. W. L. Directionality in the whistles of hawaiian spinner dolphins (Stenella longirostris): a
490
signal feature to cue direction of movement? Mar. Mammal Sci. 19, 249–264, DOI: 10.1111/j.1748-7692.2003.tb01107.x
491
(2003).492
56.
Rasmussen, M. H., Lammers, M., Beedholm, K. & Miller, L. A. Source levels and harmonic content of whistles in
493
white-beaked dolphins ( Lagenorhynchus albirostris ). The J. Acoust. Soc. Am. 120, 510–517, DOI: 10.1121/1.2202865
494
(2006).495
57.
Branstetter, B. K., Moore, P. W., Finneran, J. J., Tormey, M. N. & Aihara, H. Directional properties of bottlenose
496
dolphin ( Tursiops truncatus ) clicks, burst-pulse, and whistle sounds. The J. Acoust. Soc. Am. 131, 1613–1621, DOI:
497
10.1121/1.3676694 (2012).498
58.
Henderson, E. E., Hildebrand, J. A., Smith, M. H. & Falcone, E. A. The behavioral context of common dolphin (delphinus
499
sp.) vocalizations. Mar. Mammal Sci. 28, 439–460, DOI: 10.1111/j.1748-7692.2011.00498.x (2012).500
59. Lehnhoff, L., Glotin, H., Pochat, H., Pochat, K. & Mérigot, B. Cetacean bearing using a compact four-hydrophone array: echolocation and communication features highlighted for the free-ranging short-beaked common dolphin. Submitted to Applied Acoustics (2024).
60. Acevedo-Gutiérrez, A. & Stienessen, S. C. Bottlenose dolphins (Tursiops truncatus) increase number of whistles when feeding. Aquatic Mamm. 30, 357–362, DOI: 10.1578/AM.30.3.2004.357 (2004).
61. Finneran, J. J. Noise-induced hearing loss in marine mammals: A review of temporary threshold shift studies from 1996 to 2015. The J. Acoust. Soc. Am. 138, 1702–1726, DOI: 10.1121/1.4927418 (2015).
62. Mooney, T. A., Au, W. W. L., Nachtigall, P. E. & Trippel, E. A. Acoustic and stiffness properties of gillnets as they relate to small cetacean bycatch. ICES J. Mar. Sci. 64, 1324–1332, DOI: 10.1093/icesjms/fsm135 (2007).
Acknowledgements
We thank Michael Paul for improving the English of the paper. We also thank Yves Le Gall and Eric Menut from IFREMER as well as Eleonore Meheust and Jérôme Spitz from the Pélagis Observatory for their help in collecting data during field surveys of the DOLPHINFREE project.
Author contributions
Conceptualisation: B.M., H.G. and L.L.; methodology: B.M., H.G., L.L., H.P., O.V.C., A.P., K.P.; software: L.L. and H.G.; validation: B.M. and H.G.; formal analysis: L.L., H.G. and B.M.; data acquisition: all authors; data curation: L.L., H.G. and B.M.; writing, original draft preparation: L.L.; writing, review and editing: L.L., B.M., H.G., H.P., O.V.C.; visualisation and figure production: L.L., O.V.C.; supervision: B.M. and H.G.; project administration: B.M. All authors read the manuscript.
Data availability
Acoustic recordings that do not contain the bio-inspired signal, which is confidential, are available upon request. The DYOC model, along with all scripts and results related to this work, is available at https://gitlab.lis-lab.fr/loic.lehnhoff/dyoc-df (accessed on 16 July 2024).
Funding
The DOLPHINFREE project, coordinated by B.M., is funded by the European Maritime and Fisheries Fund (EMFF) and France Filière Pêche (FFP). L.L.'s PhD grant is provided by Montpellier University. Part of this work is funded by the national Chair in Artificial Intelligence for bioacoustics ADSIL ANR-20-CHIA-0014-01, supported by DGA and AID (PI: H.G.).
Institutional Review
The DOLPHINFREE project had (i) agreement #0-12520-2021/PREMAR_ATLANT/AEM/NP from the French Maritime Prefecture of the Atlantic "to conduct a survey for monitoring groups of common dolphins by means of scientific instruments off the south Finistère coast, following Décret n°2017-956 on scientific marine research", and (ii) a favourable notification from the Ethical Committee in Animal Experiment of Languedoc Roussillon (CEEA-LR) for request #26568 "Behavioural study of wild dolphin groups in response to acoustic signals for limiting bycatch from professional fishery".
Competing interests
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.