Conference Paper

A General Purpose Automatic Detector of Broadband Transient Signals in Underwater Audio

... In contrast, some of the other methods rely on image-processing techniques as their workhorse. Examples include "edge" detectors (Gillespie, 2004), "ridge" detectors (Kershenbaum and Roch, 2013;Madhusudhana et al., 2016), and "blob" detectors (Madhusudhana et al., 2018). ...
... The "blob" detector is derived from an image-processing technique which facilitates the automatic extraction of 2 D blob-like features in grayscale images (Lindeberg, 1993). A general-purpose implementation for the extraction of regions of arbitrary bandwidth and duration in spectrograms was proposed by Madhusudhana et al. (2018). The detector employed a suite of one-dimensional convolutional operators for detecting contiguous regions of higher intensity in spectrogram frames and then associated detections from neighboring frames together (based on heuristics) to "trace" an underlying band-limited signal over time. ...
... The blob detector, a lightweight algorithm implemented with a very low operational-memory footprint, offers high throughput and is readily capable of performing detections on streaming audio (Madhusudhana et al., 2018) in in situ applications. Its ability to detect arbitrarily shaped spectrotemporal regions of higher intensity was restricted to the frequency range corresponding to the bandwidth of Omura's whale vocalizations. ...
Article
Automatically detecting animal signals in soundscape recordings is of benefit to passive acoustic monitoring programs which may be undertaken for research or conservation. Numerous algorithms exist, which are typically optimized for certain situations (i.e., certain animal sound types and ambient noise conditions). Adding to the library of algorithms, this paper developed, tested, and compared three detectors for Omura's whale vocalizations (15–62 Hz; <15 s) in marine soundscape recordings which contained noise from other animals, wind, earthquakes, ships, and seismic surveys. All three detectors were based on processing of spectrographic representations. The specific methods were spectrogram cross-correlation, entropy computation, and spectral intensity “blob” tracing. The latter two were general-purpose detectors that were adapted for detection of Omura's whale vocalizations. Detector complexity and post-processing effort varied across the three detectors. Performance was assessed qualitatively using demonstrative examples, and quantitatively using Receiver-Operating Characteristics and Precision-Recall curves. While the results of quantitative assessment were dominated by the spectrogram cross-correlation method, qualitative assessment showed that all three detectors offered promising performance.
... The proposed pre-conditioning layer applies one-dimensional Laplacian of Gaussian (LoG) operators at two scales (σ = 2, 4) along the frequency axis of the input spectrogram and is followed by the application of twelve 3 × 3 convolutions to the responses at each scale. LoG operators enhance the signal-to-noise ratio (SNR) of features in spectrograms, and the application of multi-scale LoG operators allows spectrographic features of different sizes to be captured in the responses at comparable scales [52] (figure 4). The pre-conditioning layer's outputs, formed from the concatenation of the LoG operators' outputs and convolution outputs, comprised 26 channels (commonly also referred to as feature maps). ...
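The snippet above can be illustrated with a minimal sketch, assuming a sampled, negated, zero-mean 1-D LoG kernel and edge replication at the spectrogram boundaries. None of these choices are stated in the source, and the function names (`log_kernel_1d`, `filter_along_frequency`, `channel_count`) are hypothetical. The channel arithmetic follows the text: 2 LoG responses plus 2 × 12 convolution outputs gives 26.

```python
import math

def log_kernel_1d(sigma, radius=None):
    """Sampled, negated 1-D Laplacian of Gaussian (assumed form, not from the source)."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
        # Negated second derivative of a Gaussian: positive at the centre,
        # so a bright band in the spectrogram yields a positive response.
        kernel.append(-(x * x - sigma ** 2) / sigma ** 4 * g)
    # Remove the small residual mean so flat regions give zero response.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def filter_along_frequency(frame, kernel):
    """Same-size filtering of one spectrogram frame (values per frequency bin)."""
    r = len(kernel) // 2
    n = len(frame)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # replicate edge bins
            acc += k * frame[idx]
        out.append(acc)
    return out

def channel_count(n_scales=2, convs_per_scale=12):
    """LoG responses plus the convolution outputs computed at each scale."""
    return n_scales + n_scales * convs_per_scale

# One frame with a band of raised intensity centred on bin 20.
frame = [1.0 + math.exp(-((b - 20) ** 2) / 8.0) for b in range(41)]
responses = {s: filter_along_frequency(frame, log_kernel_1d(s)) for s in (2, 4)}
```

At both scales the response peaks at the centre of the band and is near zero on the flat background, which is the contrast-enhancing behaviour the snippet attributes to LoG operators.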
Article
Many animals rely on long-form communication, in the form of songs, for vital functions such as mate attraction and territorial defence. We explored the prospect of improving automatic recognition performance by using the temporal context inherent in song. The ability to accurately detect sequences of calls has implications for conservation and biological studies. We show that the performance of a convolutional neural network (CNN), designed to detect song notes (calls) in short-duration audio segments, can be improved by combining it with a recurrent network designed to process sequences of learned representations from the CNN on a longer time scale. The combined system of independently trained CNN and long short-term memory (LSTM) network models exploits the temporal patterns between song notes. We demonstrate the technique using recordings of fin whale (Balaenoptera physalus) songs, which comprise patterned sequences of characteristic notes. We evaluated several variants of the CNN + LSTM network. Relative to the baseline CNN model, the CNN + LSTM models reduced performance variance, offering a 9-17% increase in area under the precision-recall curve and a 9-18% increase in peak F1-scores. These results show that the inclusion of temporal information may offer a valuable pathway for improving the automatic recognition and transcription of wildlife recordings.
Presentation
Convolutional neural networks (CNNs) are commonly employed for detecting animal vocalizations. We explored whether use of the temporal patterns in song notes can improve recognition. Fin whales (Balaenoptera physalus) produce sequences of low-frequency, down-swept calls (20 Hz pulses) over many minutes. Timing between calls can be exploited to improve detection. We trained a base CNN model to detect 20 Hz pulses in 4 s audio segments. Then, we trained three variants of long short-term memory (LSTM) networks to process sequences produced by the CNN. In the first, the inputs to the LSTM were the scalar prediction scores from the CNN. The second examined sequences of features produced by the CNN before classification. The third combined the feature vectors and scores produced by the CNN. We conducted cross-validation experiments on recordings from the Southern California Bight collected between 2008 and 2014. All three variants outperformed the CNN. The precision-recall (PR) curves of the hybrid models dominated that of the base model, with improvements of 8%–13% in both peak F1-score and area under PR-curve. The second and third hybrid variants performed better than the first. CNN-LSTM hybrid models efficiently improve recognition of call sequences by incorporating temporal context.
Article
Dolphins and whales use tonal whistles for communication, and it is known that frequency modulation encodes contextual information. An automated mathematical algorithm could characterize the frequency modulation of tonal calls for use with clustering and classification. Most automatic cetacean whistle processing techniques are based on peak or edge detection or require analyst assistance in verifying detections. An alternative paradigm is introduced using techniques of image processing. Frequency information is extracted as ridges in whistle spectrograms. Spectral ridges are the fundamental structure of tonal vocalizations, and ridge detection is a well-established image processing technique, easily applied to vocalization spectrograms. This paradigm is implemented as freely available MATLAB scripts, coined IPRiT (image processing ridge tracker). Its fidelity in the reconstruction of synthesized whistles is compared to another published whistle detection software package, silbido. Both algorithms are also applied to real-world recordings of bottlenose dolphin (Tursiops truncatus) signature whistles and tested for the ability to identify whistles belonging to different individuals. IPRiT gave higher fidelity and lower false detection than silbido with synthesized whistles, and reconstructed dolphin identity groups from signature whistles, whereas silbido could not. IPRiT appears to be superior to silbido for the extraction of the precise frequency variation of the whistle.
Article
Since May 1996, an array of autonomous hydrophone moorings has been continuously deployed in the eastern equatorial Pacific to provide long-term monitoring of seismic activity, including low-level volcanic signals, along the East Pacific Rise between 20°N and 20°S and the Galapagos Ridge. The instruments and moorings were designed to continuously record low-frequency acoustic energy in the SOFAR channel for extended periods and produce results comparable to those previously derived by using the U.S. Navy Sound Surveillance System (SOSUS) in the northeast Pacific. The technology and methodology developed for this experiment, including instrument design, mooring configuration, analysis software, location algorithms (with an analysis of errors), and a predicted error field, are described in detail. Volcanic activity is observed throughout the Pacific, along with seismicity along transform faults, subduction zones, and intraplate regions. Comparison data sets indicate detection thresholds and accuracy better than the land networks for open ocean areas and results comparable to, or better than, SOSUS. Volcanic seismicity along the fast spreading East Pacific Rise appears similar to documented examples in the northeast Pacific but with much shorter durations. One example from the intermediate spreading Galapagos Ridge is comparable to northeast Pacific examples, and several episodes of activity were observed in the Wilkes Transform Fault Zone. A site of continuing off-axis seismicity is located near 18°S and 116°W. Isolated intraplate earthquakes are observed throughout the study area. Earthquake information from this experiment and future observations will be provided through the World Wide Web and earthquake data centers.
Article
We propose a new probabilistic scheme for the automatic recognition of underwater acoustic signals generated by teleseismic P-waves recorded by hydrophones in the ocean. The recognition of a given signal is based on the relative distribution of its power among different frequency bands. The signal's power distribution is compared with a statistical model developed by analyzing relative power distributions of many signals of the same origin and a numerical criterion is calculated, which can serve as a measure of the probability for the signal to belong to the statistical model. Our recognition scheme was applied to 6-month-long continuous records of seven ocean bottom hydrophones (OBH) deployed in the Ligurian Sea. A maximum of 94% of all detectable teleseismic P-waves recorded during the deployment of the OBHs were recognized correctly with no false recognitions. The proposed recognition method will be implemented in autonomous underwater robots dedicated to detect and transmit acoustic signals generated by teleseismic P-waves. Citation: Sukhovich, A., J.-O. Irisson, F. J. Simons, A. Oge, Y. Hello, A. Deschamps, and G. Nolet (2011), Automatic discrimination of underwater acoustic signals generated by teleseismic P-waves: A probabilistic approach, Geophys. Res. Lett., 38, L18605, doi:10.1029/2011GL048474.
Article
An automated procedure has been developed for detecting and localizing frequency-modulated bowhead whale sounds in the presence of seismic airgun surveys. The procedure was applied to four years of data, collected from over 30 directional autonomous recording packages deployed over a 280 km span of continental shelf in the Alaskan Beaufort Sea. The procedure has six sequential stages that begin by extracting 25-element feature vectors from spectrograms of potential call candidates. Two cascaded neural networks then classify some feature vectors as bowhead calls, and the procedure then matches calls between recorders to triangulate locations. To train the networks, manual analysts flagged 219 471 bowhead call examples from 2008 and 2009. Manual analyses were also used to identify 1.17 million transient signals that were not whale calls. The network output thresholds were adjusted to reject 20% of whale calls in the training data. Validation runs using 2007 and 2010 data found that the procedure missed 30%-40% of manually detected calls. Furthermore, 20%-40% of the sounds flagged as calls are not present in the manual analyses; however, these extra detections incorporate legitimate whale calls overlooked by human analysts. Both manual and automated methods produce similar spatial and temporal call distributions.
Article
An algorithm is presented for the detection of frequency contour sounds: whistles of dolphins and many other odontocetes, moans of baleen whales, chirps of birds, and numerous other animal and non-animal sounds. The algorithm works by tracking spectral peaks over time, grouping together peaks in successive time slices in a spectrogram if the peaks are sufficiently near in frequency and form a smooth contour over time. The algorithm has nine parameters, including the ones needed for spectrogram calculation and normalization. Finding optimal values for all of these parameters simultaneously requires a search of parameter space, and a grid search technique is described. The frequency contour detection method and parameter optimization technique are applied to the problem of detecting "boing" sounds of minke whales from near Hawaii. The test data set contained many humpback whale sounds in the frequency range of interest. Detection performance is quantified, and the method is found to work well at detecting boings, with a false-detection rate of 3% for the target missed-call rate of 25%. It has also worked well anecdotally for other marine and some terrestrial species, and could be applied to any species that produces a frequency contour, or to non-animal sounds as well.
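The peak-grouping idea in this abstract can be sketched roughly as follows. This is a heavily simplified illustration and not the published algorithm: it follows a single peak per time slice and uses only three hypothetical parameters (`threshold`, `max_jump`, `min_length`) rather than the nine the abstract mentions, and it omits the smoothness test and spectrogram normalization.

```python
def track_contours(spectrogram, threshold, max_jump, min_length):
    """Link the strongest peak of each time slice into frequency contours.

    spectrogram: list of frames, each a list of magnitudes per frequency bin.
    Returns a list of contours, each a list of (frame_index, bin_index) pairs.
    """
    contours, current = [], []
    for t, frame in enumerate(spectrogram):
        peak_bin = max(range(len(frame)), key=lambda k: frame[k])
        is_peak = frame[peak_bin] >= threshold
        if is_peak and current and abs(peak_bin - current[-1][1]) <= max_jump:
            # Peak is close in frequency to the previous one: extend the contour.
            current.append((t, peak_bin))
        else:
            # Contour ended (or never started): keep it only if long enough.
            if len(current) >= min_length:
                contours.append(current)
            current = [(t, peak_bin)] if is_peak else []
    if len(current) >= min_length:
        contours.append(current)
    return contours

def make_frame(peak_bin=None, n_bins=10):
    """Hypothetical test frame: low background with one optional strong bin."""
    frame = [0.1] * n_bins
    if peak_bin is not None:
        frame[peak_bin] = 5.0
    return frame

# Upsweep over four frames, one quiet frame, then an isolated peak.
demo = [make_frame(2), make_frame(3), make_frame(4), make_frame(5),
        make_frame(None), make_frame(9)]
found = track_contours(demo, threshold=1.0, max_jump=1, min_length=3)
```

In this toy input the four-frame upsweep is returned as one contour, while the isolated peak in the last frame is discarded for being too short.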
Thesis
Long-term continuous monitoring of ice break-up on ice shelves and icebergs in Antarctica is essential for a global observation system of climate change and its consequences. While calving of massive pieces of ice from the Antarctic ice shelf is well observed from satellites, numerous ice breaks of smaller volume cannot be systematically monitored and statistically analysed by the existing means of remote sensing and local in-situ observations. This study aimed to investigate the feasibility of an alternative monitoring approach based on remote acoustic observations of ice rifting and breaking events on Antarctic ice shelves and icebergs using distant underwater acoustic listening stations in the ocean. This investigation was carried out using long-term continuous sea noise recordings made from 2002 to 2007 at two hydroacoustic stations deployed in the Indian Ocean as part of the International Monitoring System of the Comprehensive Nuclear-Test-Ban Treaty: off Cape Leeuwin in Western Australia (HA01) and off Chagos Archipelago (HA08). Investigations of a number of scientific and technical issues relevant to the main objective were carried out in this study. 
They include: 1) processing of the CTBT hydroacoustic data from the two IMS stations with the aim of detecting and identifying signals received from Antarctic ice breaking events; 2) investigating the time-frequency arrival structure of the signals expected from ice events using experimental data and numerical modelling of acoustic propagation from Antarctica to the IMS stations in the Indian Ocean; 3) analysing the bearing accuracy of the IMS stations; 4) examining three different schemes for localization of ice events using either one or two IMS stations; 5) analysing the spatial distribution of Antarctic ice events observed over 6 years of data collection and its correlation with the major glacial features of the Eastern Antarctic coastal zone which are most likely sources of newly calved icebergs and underwater noise produced by ice breakup; 6) analysing long-term variations in the occurrence of ice events and their links with changes in climate related metocean characteristics of the Eastern Antarctic coastal zone. A number of important findings and conclusions were made based on the results of this study. It was revealed that Antarctica is one of the major sources of low-frequency underwater noise at the two IMS stations in the Indian Ocean. The transient signals received at the IMS stations from Antarctic ice events consist mainly of a mode one arrival pulse with strong frequency dispersion, which is due to the acoustic propagation characteristics in the near-surface acoustic channel of the polar ocean environment south of the Antarctic Convergence Zone (ACZ). Both HA01 and H08S stations have bearing estimate accuracy for transient acoustic noise in Antarctica of about 0.2° RMS. The bearing error of HA01 also has a systematic component of around 0.8° clockwise. 
The bearing deviation induced by horizontal refraction of acoustic propagation across the ACZ polar frontal zone and over the continental slopes can be considerable, up to 1° for sources located in the easternmost and westernmost parts of the Eastern Antarctic coastal zone observed from the IMS stations. The localization of Antarctic ice events can be achieved either by triangulation using bearing estimates, if the same event can be detected at both stations, or by estimating the range to the noise source through inversion of mode 1 dispersion characteristic when the signal is detected only at one station. The location of ice events in the Antarctic coastal zone can also be coarsely estimated from the low cut-off frequency of mode 1 measured at the receive station. The majority of ice events observed at HA01 were located within a number of back-azimuth sectors which correspond to the directions to the ice shelves and iceberg tongues which are known as active zones of ice break-up in Eastern Antarctica. The temporal changes in the occurrence frequency of ice events detected at HA01 reveal strong seasonal variations but no significant interannual trend. Based on the main results and findings, this study achieved its primary aim to demonstrate the feasibility of remote monitoring of ice rifting and breaking events on Antarctic ice shelves and icebergs using the IMS hydroacoustic listening stations deployed in the Indian Ocean.
Article
This article describes an automatic detector for marine mammal vocalizations. Even though there has been previous research on optimizing automatic detectors for specific calls or specific species, the detection of any type of call by a diversity of marine mammal species still poses quite a challenge, and one that is faced more frequently as the scope of passive acoustic monitoring studies and the amount of data collected increase. Information (Shannon) entropy measures the amount of information in a signal. A detector based on spectral entropy surpassed two commonly used detectors based on peak-energy detection. Receiver operating characteristic curves were computed for performance comparison. The entropy detector performed considerably faster than real time. It can be used as a first step in an automatic signal analysis yielding potential signals. It should be followed by automatic classification, recognition, and identification algorithms to group and identify signals. Examples are shown from underwater recordings in the Western Canadian Arctic. Calls of a variety of cetacean and pinniped species were detected.
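As a rough illustration of the quantity involved, the Shannon entropy of a power spectrum can be computed by normalizing the spectrum so it sums to one and treating it as a probability distribution. The sketch below assumes that definition; the abstract does not give the detector's framing, thresholds, or normalization, so the function name `spectral_entropy` and the example spectra are illustrative only.

```python
import math

def spectral_entropy(power_spectrum):
    """Shannon entropy (in bits) of a power spectrum normalized to sum to 1."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        q = p / total
        if q > 0:  # a zero-probability bin contributes nothing
            entropy -= q * math.log2(q)
    return entropy

noise_like = [1.0] * 8                                   # energy spread evenly
tonal_call = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0]    # energy in one bin

h_noise = spectral_entropy(noise_like)   # 3.0 bits, the maximum for 8 bins
h_call = spectral_entropy(tonal_call)    # 0.0 bits
```

A frame dominated by broadband noise sits near the maximum entropy, while a frame containing a concentrated signal scores lower, so a drop in entropy relative to the background is one plausible detection cue.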
Article
This paper makes available a concise review of data windows and their effect on the detection of harmonic signals in the presence of broad-band noise, and in the presence of nearby strong harmonic interference. We also call attention to a number of common errors in the application of windows when used with the fast Fourier transform. This paper includes a comprehensive catalog of data windows along with their significant performance parameters from which the different windows can be compared. Finally, an example demonstrates the use and value of windows to resolve closely spaced harmonic signals characterized by large differences in amplitude.
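The effect described here can be seen in a small numerical sketch. It compares spectral leakage far from a tone whose frequency falls between two DFT bins, with no window (rectangular) and with a periodic Hann window. The signal length, tone frequency, and the bin used to measure leakage are arbitrary choices for illustration, not values from the paper, and the direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitudes of the DFT for bins 0..N/2 (direct O(N^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def leakage_ratio(x, far_bin):
    """Magnitude at a bin far from the tone, relative to the spectral peak."""
    mag = dft_magnitude(x)
    return mag[far_bin] / max(mag)

N = 64
# 10.5 cycles per record: the tone sits halfway between bins 10 and 11.
tone = [math.cos(2 * math.pi * 10.5 * t / N) for t in range(N)]

rect_leak = leakage_ratio(tone, 25)
hann_leak = leakage_ratio([w * s for w, s in zip(hann(N), tone)], 25)
```

With the Hann window the leakage at the distant bin is far smaller than without it, which is why a weak tone near bin 25 would be easier to resolve next to the strong one. The cost is a wider main lobe, one of the trade-offs a window catalog quantifies.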
Article
This article presents: (i) a multiscale representation of grey-level shape called the scale-space primal sketch, which makes explicit both features in scale-space and the relations between structures at different scales, (ii) a methodology for extracting significant blob-like image structures from this representation, and (iii) applications to edge detection, histogram analysis, and junction classification demonstrating how the proposed method can be used for guiding later-stage visual processes. The representation gives a qualitative description of image structure, which allows for detection of stable scales and associated regions of interest in a solely bottom-up data-driven way. In other words, it generates coarse segmentation cues, and can hence be seen as preceding further processing, which can then be properly tuned. It is argued that once such information is available, many other processing tasks can become much simpler. Experiments on real imagery demonstrate that the proposed theory gives intuitive results.
Article
Narrowband acoustic signals occur prominently in underwater environments and are commonly used in the identification of the vocalizing species. Disparate ad hoc systems have previously been developed for the detection or recognition of several forms of narrowband signals produced by specific species of marine mammals. We present a generic system, based on post-processing of spectrograms, for the automatic extraction of time-frequency contours of narrowband signals produced by marine mammals. A two-phase approach is proposed where the first phase is based on an image-processing technique for detecting intensity ridges and the second phase is a Bayesian filtering approach for tracing the trajectory of detected ridge apices. The choice of algorithm parameters and conditionals are backed with theoretical motivations and are geared to result in a generic (non-targeted) system. In comparison to an existing method, using publicly available pre-annotated recordings containing four species of dolphins, our system offered an average increase of 19% and 18% in precision and recall, respectively. The proposed system is well-suited for both offline and in-situ applications. A streaming-mode implementation in MATLAB® processes inputs with an average real-time factor of 0.13 on modest desktop computers.
Article
Prior research has shown that echolocation clicks of several species of terrestrial and marine fauna can be modelled as Gabor-like functions. Here, a system is proposed for the automatic detection of a variety of such signals. By means of mathematical formulation, it is shown that the output of the Teager-Kaiser Energy Operator (TKEO) applied to Gabor-like signals can be approximated by a Gaussian function. Based on the inferences, a detection algorithm involving the post-processing of the TKEO outputs is presented. The ratio of the outputs of two moving-average filters, a Gaussian and a rectangular filter, is shown to be an effective detection parameter. Detector performance is assessed using synthetic and real (taken from MobySound database) recordings. The detection method is shown to work readily with a variety of echolocation clicks and in various recording scenarios. The system exhibits low computational complexity and operates several times faster than real-time. Performance comparisons are made to other publicly available detectors including PAMGuard.
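For reference, the standard discrete-time form of the Teager-Kaiser Energy Operator is ψ[n] = x[n]² − x[n−1]·x[n+1]. The sketch below implements only that operator and checks it against the known closed form for a pure sinusoid, A²·sin²(ω). The moving-average post-processing described in the abstract is not reproduced, and the test signal is an arbitrary illustration.

```python
import math

def tkeo(x):
    """Discrete Teager-Kaiser Energy Operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The output is two samples shorter than the input because the first and
    last samples lack a neighbour on one side.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For x[n] = A*cos(w*n + phi) the operator output is the constant A^2 * sin(w)^2.
A, w, phi = 2.0, 0.3, 0.7
signal = [A * math.cos(w * n + phi) for n in range(50)]
energy = tkeo(signal)
expected = A ** 2 * math.sin(w) ** 2
```

Because the output depends on both amplitude and frequency, short high-frequency transients such as clicks produce a pronounced bump in the operator output, which is what a later thresholding stage would act on.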
Article
In this paper we present algorithms for the solution of the general assignment and transportation problems. In Section 1, a statement of the algorithm for the assignment problem appears, along with a proof for the correctness of the algorithm. The remarks which constitute the proof are incorporated parenthetically into the statement of the algorithm. Following this appears a discussion of certain theoretical aspects of the problem. In Section 2, the algorithm is generalized to one for the transportation problem. The algorithm of that section is stated as concisely as possible, with theoretical remarks omitted. 1. THE ASSIGNMENT PROBLEM. The personnel-assignment problem is the problem of choosing an optimal assignment of n men to n jobs, assuming that numerical ratings are given for each man’s performance on each job. An optimal assignment is one which makes the sum of the men’s ratings for their assigned jobs a maximum. There are n! possible assignments (of which several may be optimal), so that it is physically impossible, except
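To make the problem statement concrete, the sketch below solves a tiny instance by exhaustive search over all n! assignments. This is deliberately not the algorithm the paper presents; it is the brute-force baseline whose factorial cost motivates a more efficient method. The ratings matrix and the function name `best_assignment` are made up for illustration.

```python
from itertools import permutations

def best_assignment(ratings):
    """Exhaustively find the assignment with the maximum total rating.

    ratings[i][j] is the rating of worker i on job j. Returns (jobs, total),
    where jobs[i] is the job given to worker i. Cost grows as n!, so this is
    only usable for very small n.
    """
    n = len(ratings)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(ratings[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

ratings = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
jobs, total = best_assignment(ratings)  # jobs == [2, 0, 1], total == 21
```

Here the greedy choice of giving worker 0 their best job (rating 9) leads to a total of at most 20, while the optimum of 21 assigns worker 0 to job 2. Already at n = 10 the exhaustive search would have to examine 3,628,800 assignments.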
Conference Paper
The sea is home to a myriad of marine animal species, many of which use sound as a primary means of communication, navigation and foraging. Of particular interest are the Blue whales (Balaenoptera musculus) of the cetacean family. Massive commercial whaling prior to 1960 brought the species close to extinction and its population still remains very low. Passive acoustic monitoring of baleen whales has recently been used to provide long-term information about their presence and behavior, and provides an attractive complement to traditional visual based monitoring. In this work we present a frequency domain based algorithm developed for extracting the frequency contours of the dominant harmonic in tonal calls of blue whales (B and D calls). The algorithm uses a two pass approach to contour extraction. In the first pass, partial candidate contours are formed, followed by a second pass which uses the partial information to construct complete contours. When evaluated on a one hour labeled recording, the algorithm had 90% recall and 76% precision.
Article
The energy ratio mapping algorithm (ERMA) was developed to improve the performance of energy-based detection of odontocete echolocation clicks, especially for application in environments with limited computational power and energy such as acoustic gliders. ERMA systematically evaluates many frequency bands for energy ratio-based detection of echolocation clicks produced by a target species in the presence of the species mix in a given geographic area. To evaluate the performance of ERMA, a Teager-Kaiser energy operator was applied to the series of energy ratios as derived by ERMA. A noise-adaptive threshold was then applied to the Teager-Kaiser function to identify clicks in data sets. The method was tested for detecting clicks of Blainville's beaked whales while rejecting echolocation clicks of Risso's dolphins and pilot whales. Results showed that the ERMA-based detector correctly identified 81.6% of the beaked whale clicks in an extended evaluation data set. Average false-positive detection rate was 6.3% (3.4% for Risso's dolphins and 2.9% for pilot whales).
Article
Passive acoustic systems used to study and monitor marine mammals generate enormous datasets which are costly and time-consuming to analyze. As part of a Joint Industry Programme sponsored effort, we reviewed automated and semi-automated methods and software packages available to detect, extract, and classify marine mammal sounds; identified gaps in capabilities and knowledge; and suggested ways forward. Because of the variability in marine mammal sounds, no single method is effective for all species. While spectrogram correlation works well for stereotyped calls, more general methods like band-limited threshold detection are more effective for variable sounds. Feature extraction is a rapidly evolving field, but a reliable, automated method has yet to be successfully implemented into existing software. A major gap in our capabilities is the ability to reliably detect and classify the highly variable signals produced by some species. The development of effective, efficient, and standardized methods applicable to many species will require large, validated datasets. The acquisition, maintenance, and availability of such datasets will entail concerted, collaborative efforts. Development of common datasets and organization of workshops that focus on furthering detection, extraction, and classification methods are two ways to address these important issues in the automated analysis of marine mammal sounds.
Article
Little is known about the spatial and temporal distribution of blast fishing, which hampers enforcement against this activity. We have demonstrated that a triangular array of hydrophones 1 m apart is capable of detecting blast events whilst effectively rejecting other sources of underwater noise such as snapping shrimp and nearby boat propellers. A total of 13 blasts were recorded in Sepangor bay, North of Kota Kinabalu, Sabah, Malaysia from 7th to 15th July 2002 at distances estimated to be up to 20 km, with a directional uncertainty of 0.2 degrees. With such precision, a network of similar hydrophone arrays has potential to locate individual blast events by triangulation to within 30 m at a range of 10 km.
Detection and classification of right whale calls using an ‘edge’ detector operating on a smoothed spectrogram
  • D Gillespie
Operational processing of hydroacoustics at the Prototype International Data Center
  • J Hanson
  • R Lebras
  • D Brumbaugh
  • J Guern
  • P Dysart
  • A Gault