Article

A generic system for the automatic extraction of narrowband signals of biological origin in underwater audio

Abstract

Narrowband acoustic signals occur prominently in underwater environments and are commonly used in the identification of the vocalizing species. Disparate ad hoc systems have previously been developed for the detection or recognition of several forms of narrowband signals produced by specific species of marine mammals. We present a generic system, based on post-processing of spectrograms, for the automatic extraction of time-frequency contours of narrowband signals produced by marine mammals. A two-phase approach is proposed where the first phase is based on an image-processing technique for detecting intensity ridges and the second phase is a Bayesian filtering approach for tracing the trajectory of detected ridge apices. The choices of algorithm parameters and conditionals are backed by theoretical motivations and are geared toward a generic (non-targeted) system. In comparison to an existing method, using publicly available pre-annotated recordings containing four species of dolphins, our system offered an average increase of 19% and 18% in precision and recall, respectively. The proposed system is well-suited for both offline and in-situ applications. A streaming-mode implementation in MATLAB® processes inputs with an average real-time factor of 0.13 on modest desktop computers.
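As a rough illustration of the second phase, the sketch below traces a frequency trajectory with a scalar Kalman filter, which is one simple instance of Bayesian filtering. The abstract does not state which filter or state model the authors use, so the random-walk model, the function name `trace_ridge`, and all variance values are assumptions made purely for illustration.

```python
def trace_ridge(apex_freqs_hz, process_var=1.0, meas_var=4.0, init_var=1000.0):
    """Smooth a sequence of ridge-apex frequencies (one per time frame).

    Hypothetical sketch: a scalar Kalman filter with a random-walk state
    model. All variances are illustrative, not values from the paper.
    """
    estimate = apex_freqs_hz[0]   # start from the first observed apex
    variance = init_var           # large initial uncertainty
    track = []
    for z in apex_freqs_hz:
        # Predict: the state is assumed to drift as a random walk.
        variance += process_var
        # Update: blend prediction and measurement by their uncertainties.
        gain = variance / (variance + meas_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        track.append(estimate)
    return track


# Hypothetical apex frequencies (Hz) from six consecutive spectrogram frames.
noisy = [1000.0, 1012.0, 1019.0, 1033.0, 1038.0, 1051.0]
smoothed = trace_ridge(noisy)
```

A larger `meas_var` makes the trace smoother but slower to follow rapid frequency modulation; a larger `process_var` does the opposite.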

... In contrast, some of the other methods rely on image-processing techniques as their workhorse. Examples include "edge" detectors (Gillespie, 2004), "ridge" detectors (Kershenbaum and Roch, 2013; Madhusudhana et al., 2016), and "blob" detectors (Madhusudhana et al., 2018). ...
Article
Automatically detecting animal signals in soundscape recordings is of benefit to passive acoustic monitoring programs which may be undertaken for research or conservation. Numerous algorithms exist, which are typically optimized for certain situations (i.e., certain animal sound types and ambient noise conditions). Adding to the library of algorithms, this paper developed, tested, and compared three detectors for Omura's whale vocalizations (15–62 Hz; <15 s) in marine soundscape recordings which contained noise from other animals, wind, earthquakes, ships, and seismic surveys. All three detectors were based on processing of spectrographic representations. The specific methods were spectrogram cross-correlation, entropy computation, and spectral intensity “blob” tracing. The latter two were general-purpose detectors that were adapted for detection of Omura's whale vocalizations. Detector complexity and post-processing effort varied across the three detectors. Performance was assessed qualitatively using demonstrative examples, and quantitatively using Receiver-Operating Characteristics and Precision-Recall curves. While the results of quantitative assessment were dominated by the spectrogram cross-correlation method, qualitative assessment showed that all three detectors offered promising performance.
... If signature whistles can be assigned to individuals, or at least stable groups of dolphins, then passive acoustic monitoring programs can be developed using long-term, autonomous recorders, and automatic whistle detection tools (e.g. [20,21]). These programs would provide information on dolphin distribution, demographics, and abundance throughout the Swan-Canning River System, for effective conservation management. ...
Article
The Swan–Canning River System is home to an Indo-Pacific bottlenose dolphin (Tursiops aduncus) community of currently 17 adult and juvenile individuals. While a complete photo-identification catalogue exists, visual monitoring requires repeated boat-based surveys and is thus laborious and expensive. Bottlenose dolphins are known to emit individually distinctive signature whistles, and therefore, passive acoustic monitoring could be a reliable and more efficient tool. Archived acoustic and photographic data from the Fremantle Inner Harbour were reviewed for instances when dolphin whistles and individual identifying images were simultaneously available. As dolphin whistles are commonly used in social encounters, dolphins producing whistles in this study were always in groups. Consequently, to assess whether distinctive whistles could be attributed to individual dolphins, conditional probabilities for recording a specific whistle in the presence of certain individuals, as well as Bayesian posterior probabilities for encountering a specific individual at times of certain whistles were computed. While a larger sample size is needed to capture all individuals in diverse groupings, this study provides the first step in developing a passive acoustic program for monitoring this small dolphin community, in order to ultimately inform its conservation management.
Article
Dolphins and whales use tonal whistles for communication, and it is known that frequency modulation encodes contextual information. An automated mathematical algorithm could characterize the frequency modulation of tonal calls for use with clustering and classification. Most automatic cetacean whistle processing techniques are based on peak or edge detection or require analyst assistance in verifying detections. An alternative paradigm is introduced using techniques of image processing. Frequency information is extracted as ridges in whistle spectrograms. Spectral ridges are the fundamental structure of tonal vocalizations, and ridge detection is a well-established image processing technique, easily applied to vocalization spectrograms. This paradigm is implemented as freely available matlab scripts, coined IPRiT (image processing ridge tracker). Its fidelity in the reconstruction of synthesized whistles is compared to another published whistle detection software package, silbido. Both algorithms are also applied to real-world recordings of bottlenose dolphin (Tursiops truncatus) signature whistles and tested for the ability to identify whistles belonging to different individuals. IPRiT gave higher fidelity and lower false detection than silbido with synthesized whistles, and reconstructed dolphin identity groups from signature whistles, whereas silbido could not. IPRiT appears to be superior to silbido for the extraction of the precise frequency variation of the whistle.
Conference Paper
This paper presents a novel approach to categorize dolphin whistles into various types. Most accurate methods to identify dolphin whistles are tedious and not robust, especially in the presence of ocean noise. One of the biggest challenges of dolphin whistle extraction is the coexistence of short-time duration wide-band echo clicks with the whistles. In this research a subspace of select orientation parameters of the 2-D Gabor wavelet frames is utilized to enhance or suppress signals by their orientation. The result is a Gabor image that contains a noise free grayscale representation of the fundamental dolphin whistle which is resampled and fed into the Sparse Representation Classifier. The classifier uses the l1-norm to select a match. Experimental studies conducted demonstrate: (a) a robust technique based on the Gabor wavelet filters in extracting reliable call patterns, and (b) the superior performance of Sparse Representation Classifier for identifying dolphin whistles by their call type.
Article
An automated procedure has been developed for detecting and localizing frequency-modulated bowhead whale sounds in the presence of seismic airgun surveys. The procedure was applied to four years of data, collected from over 30 directional autonomous recording packages deployed over a 280 km span of continental shelf in the Alaskan Beaufort Sea. The procedure has six sequential stages that begin by extracting 25-element feature vectors from spectrograms of potential call candidates. Two cascaded neural networks then classify some feature vectors as bowhead calls, and the procedure then matches calls between recorders to triangulate locations. To train the networks, manual analysts flagged 219 471 bowhead call examples from 2008 and 2009. Manual analyses were also used to identify 1.17 million transient signals that were not whale calls. The network output thresholds were adjusted to reject 20% of whale calls in the training data. Validation runs using 2007 and 2010 data found that the procedure missed 30%-40% of manually detected calls. Furthermore, 20%-40% of the sounds flagged as calls are not present in the manual analyses; however, these extra detections incorporate legitimate whale calls overlooked by human analysts. Both manual and automated methods produce similar spatial and temporal call distributions.
Article
In this paper, the problem of detecting and recognizing North Atlantic right whale (NARW), Eubalaena glacialis, contact calls in the presence of ambient noise is considered. A proposed solution is based on a multistage, hypothesis-testing technique that involves the generalized likelihood ratio test (GLRT) detector, spectrogram testing, and feature vector testing algorithms. The main contributions of this paper are the inclusion of noise kernels for signals likely to produce false alarms and a second stage classification algorithm which extracts parameters from candidate contact calls and constructs a scaled squared error statistic for parameters which lie outside the range of expected calls. Closed-form representations of the algorithms are derived and realizable detection schemes are developed. Test results show that the proposed technique is able to detect approximately 80% of the contact calls detected by the human operator with about 26 false alarms per 24 h of observation. Testing data set included 44 227 right whale contact calls detected by eight human operators who performed visual and aural inspection of the data spectrogram. Data were collected in different periods from March 2001 to February 2007, in Cape Cod Bay, Great South Channel, and in the coastal waters of Georgia.
Article
Traditionally, dolphin recognition techniques in the field have relied upon photographic identification, but this has several practical disadvantages. Some whistled vocalisations may be used for group identification, and these are viable at longer ranges than visual means. Novel automated algorithms have been developed to detect, encode and classify these whistles, with the aim of allowing a rapid, quantitative assessment of group identity. Hidden Markov models were constructed for each whistle class together with statistical representations of the whistles’ detailed shapes, in an unsupervised manner and from little a priori information. The encoding and classification routines were applied to whistles from a 15 min recording made during a field trial, which contained three periods of whistle activity. Cross-group comparison of the whistle classes suggested that the whistles from the first period were vocally distinct from the second and third. Further analysis revealed that the latter two periods contained whistles that had been recorded simultaneously from two separate groups, but which could indeed be separated with the classification routines. This paper will detail the problems involved with detecting the whistles, characterising and classifying them, and finally will show the analysis of the results to calculate group similarity probabilities.
Article
Many odontocetes produce frequency modulated tonal calls known as whistles. The ability to automatically determine time × frequency tracks corresponding to these vocalizations has numerous applications including species description, identification, and density estimation. This work develops and compares two algorithms on a common corpus of nearly one hour of data collected in the Southern California Bight and at Palmyra Atoll. The corpus contains over 3000 whistles from bottlenose dolphins, long- and short-beaked common dolphins, spinner dolphins, and melon-headed whales that have been annotated by a human, and released to the Moby Sound archive. Both algorithms use a common signal processing front end to determine time × frequency peaks from a spectrogram. In the first method, a particle filter performs Bayesian filtering, estimating the contour from the noisy spectral peaks. The second method uses an adaptive polynomial prediction to connect peaks into a graph, merging graphs when they cross. Whistle contours are extracted from graphs using information from both sides of crossings. The particle filter was able to retrieve 71.5% (recall) of the human annotated tonals with 60.8% of the detections being valid (precision). The graph algorithm's recall rate was 80.0% with a precision of 76.9%.
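The recall and precision percentages quoted in this abstract follow the standard definitions, which the short sketch below spells out. The counts in the example are hypothetical and are not taken from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from detection counts.

    precision: fraction of detections that match an annotated tonal.
    recall:    fraction of annotated tonals that were detected.
    Assumes both denominators are non-zero.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts: 60 correct detections, 20 spurious, 40 missed.
p, r = precision_recall(true_pos=60, false_pos=20, false_neg=40)
```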
Article
An algorithm is presented for the detection of frequency contour sounds: whistles of dolphins and many other odontocetes, moans of baleen whales, chirps of birds, and numerous other animal and non-animal sounds. The algorithm works by tracking spectral peaks over time, grouping together peaks in successive time slices in a spectrogram if the peaks are sufficiently near in frequency and form a smooth contour over time. The algorithm has nine parameters, including the ones needed for spectrogram calculation and normalization. Finding optimal values for all of these parameters simultaneously requires a search of parameter space, and a grid search technique is described. The frequency contour detection method and parameter optimization technique are applied to the problem of detecting "boing" sounds of minke whales from near Hawaii. The test data set contained many humpback whale sounds in the frequency range of interest. Detection performance is quantified, and the method is found to work well at detecting boings, with a false-detection rate of 3% for the target missed-call rate of 25%. It has also worked well anecdotally for other marine and some terrestrial species, and could be applied to any species that produces a frequency contour, or to non-animal sounds as well.
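The peak-grouping idea described in this abstract can be sketched as a greedy linker that joins peaks in successive time slices when they are close in frequency. This is a simplified illustration only: it implements the nearness criterion but not the smoothness test or the nine-parameter configuration, and the function name `link_peaks` and the `max_jump_hz` parameter are hypothetical.

```python
def link_peaks(frames, max_jump_hz):
    """Greedily link spectral peaks across time slices into contours.

    frames: list of lists; frames[t] holds the peak frequencies (Hz)
            found in time slice t.
    Returns a list of contours, each a list of (slice_index, freq) pairs.
    """
    finished = []
    active = []  # contours that were extended in the previous slice
    for t, peaks in enumerate(frames):
        unused = list(peaks)
        still_active = []
        for contour in active:
            last_freq = contour[-1][1]
            if unused:
                # Candidate: the unused peak nearest to the contour's end.
                best = min(unused, key=lambda f: abs(f - last_freq))
                if abs(best - last_freq) <= max_jump_hz:
                    contour.append((t, best))
                    unused.remove(best)
                    still_active.append(contour)
                    continue
            finished.append(contour)  # no nearby peak: contour ends here
        # Every unclaimed peak starts a new candidate contour.
        still_active.extend([(t, f)] for f in unused)
        active = still_active
    finished.extend(active)
    return finished


# Hypothetical peaks: one rising whistle plus two isolated high peaks.
frames = [[1000.0, 5000.0], [1020.0], [1045.0, 5100.0], [1060.0]]
contours = link_peaks(frames, max_jump_hz=50.0)
longest = max(contours, key=len)
```

In practice, short contours such as the single-point ones here would typically be discarded by a minimum-duration rule.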
Article
Low frequency (<100 Hz) downsweep vocalizations were repeatedly recorded from ocean gliders east of Cape Cod, MA in May 2005. To identify the species responsible for this call, arrays of acoustic recorders were deployed in this same area during 2006 and 2007. 70 h of collocated visual observations at the center of each array were used to compare the localized occurrence of this call to the occurrence of three baleen whale species: right, humpback, and sei whales. The low frequency call was significantly associated only with the occurrence of sei whales. On average, the call swept from 82 to 34 Hz over 1.4 s and was most often produced as a single call, although pairs and (more rarely) triplets were occasionally detected. Individual calls comprising the pairs were localized to within tens of meters of one another and were more similar to one another than to contemporaneous calls by other whales, suggesting that paired calls may be produced by the same animal. A synthetic kernel was developed to facilitate automatic detection of this call using spectrogram-correlation methods. The optimal kernel missed 14% of calls, and of all the calls that were automatically detected, 15% were false positives.
Article
Analysis of acoustic signals recorded from the U.S. Navy's SOund SUrveillance System (SOSUS) was used to detect and locate blue whale (Balaenoptera musculus) calls offshore in the northeast Pacific. The long, low-frequency components of these calls are characteristic of calls recorded in the presence of blue whales elsewhere in the world. Mean values for frequency and time characteristics from field-recorded blue whale calls were used to develop a simple matched filter for detecting such calls in noisy time series. The matched filter was applied to signals from three different SOSUS arrays off the coast of the Pacific Northwest to detect and associate individual calls from the same animal on the different arrays. A U.S. Navy maritime patrol aircraft was directed to an area where blue whale calls had been detected on SOSUS using these methods, and the presence of vocalizing blue whale was confirmed at the site with field recordings from sonobuoys.
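A matched filter of the kind mentioned in this abstract amounts to sliding a template along the time series and taking the inner product at each lag. The sketch below shows that operation on toy data; the template and signal values are hypothetical, and a real detector would build the template from measured call characteristics and normalize for noise level.

```python
def matched_filter(signal, template):
    """Correlate `template` against `signal` at every valid lag."""
    n = len(template)
    return [
        sum(template[k] * signal[i + k] for k in range(n))
        for i in range(len(signal) - n + 1)
    ]


# Toy example: the template is embedded in the signal starting at sample 2.
template = [1, -1, 1]
signal = [0, 0, 1, -1, 1, 0, 0]
output = matched_filter(signal, template)
peak_lag = output.index(max(output))
```

The lag with the largest output marks the best alignment between template and signal; a detection threshold would then be applied to that output.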
Article
This paper makes available a concise review of data windows and their effect on the detection of harmonic signals in the presence of broad-band noise, and in the presence of nearby strong harmonic interference. We also call attention to a number of common errors in the application of windows when used with the fast Fourier transform. This paper includes a comprehensive catalog of data windows along with their significant performance parameters from which the different windows can be compared. Finally, an example demonstrates the use and value of windows to resolve closely spaced harmonic signals characterized by large differences in amplitude.
Article
The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic approach for dealing with this problem: a heuristic principle is presented stating that local extrema over scales of different combinations of gamma-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is proposed that this idea can be used as a major mechanism in algorithms for automatic scale selection, which adapt the local scales of processing to the local image structure. Support is given in terms of a general theoretical investigation of the behaviour of the scale selection method under rescalings of the input pattern and by experiments on real-world and synthetic data. Support is also given by a detailed analysis of how different types of feature detectors perform when integrated with a scale selection mechanism and then applied to characteristic model patterns. Specifically, it is described in detail how the proposed methodology applies to the problems of blob detection, junction detection, edge detection, ridge detection and local frequency estimation.
Article
When computing descriptors of image data, the type of information that can be extracted may be strongly dependent on the scales at which the image operators are applied. This article presents a systematic methodology for addressing this problem. A mechanism is presented for automatic selection of scale levels when detecting one-dimensional image features, such as edges and ridges. A novel concept of a scale-space edge is introduced, defined as a connected set of points in scale-space at which: (i) the gradient magnitude assumes a local maximum in the gradient direction, and (ii) a normalized measure of the strength of the edge response is locally maximal over scales. An important consequence of this definition is that it allows the scale levels to vary along the edge. Two specific measures of edge strength are analysed in detail, the gradient magnitude and a differential expression derived from the third-order derivative in the gradient direction. For a certain way of normalizing these ...
Article
In this paper we present algorithms for the solution of the general assignment and transportation problems. In Section 1, a statement of the algorithm for the assignment problem appears, along with a proof for the correctness of the algorithm. The remarks which constitute the proof are incorporated parenthetically into the statement of the algorithm. Following this appears a discussion of certain theoretical aspects of the problem. In Section 2, the algorithm is generalized to one for the transportation problem. The algorithm of that section is stated as concisely as possible, with theoretical remarks omitted. 1. THE ASSIGNMENT PROBLEM. The personnel-assignment problem is the problem of choosing an optimal assignment of n men to n jobs, assuming that numerical ratings are given for each man’s performance on each job. An optimal assignment is one which makes the sum of the men’s ratings for their assigned jobs a maximum. There are n! possible assignments (of which several may be optimal), so that it is physically impossible, except
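To make the problem statement concrete, the sketch below solves a tiny assignment problem by enumerating all n! permutations. This brute-force search is deliberately not the algorithm of the paper; it only illustrates the objective (maximum total rating) and why enumeration becomes impractical as n grows. The ratings matrix and the function name `best_assignment` are hypothetical.

```python
from itertools import permutations


def best_assignment(ratings):
    """Return (best_total, jobs) where jobs[i] is the job given to person i.

    ratings[i][j] is the rating of person i on job j. All n! assignments
    are enumerated, so this is only usable for very small n.
    """
    n = len(ratings)
    best_total, best_jobs = None, None
    for jobs in permutations(range(n)):
        total = sum(ratings[person][jobs[person]] for person in range(n))
        if best_total is None or total > best_total:
            best_total, best_jobs = total, jobs
    return best_total, best_jobs


# Hypothetical 3 x 3 ratings matrix.
ratings = [
    [7, 5, 1],
    [2, 9, 4],
    [6, 3, 8],
]
total, jobs = best_assignment(ratings)
```

For realistic sizes, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be the usual choice.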
Article
Methods for the automatic recognition of low‐frequency sounds of baleen whales are presented. Matched filtering is implemented with a synthetic filter kernel derived from measurements of whale sounds, and this method is found effective at detecting blue whale (Balaenoptera musculus) sounds in white background noise. Spectrogram correlation is implemented and found effective at detection of blue whale vocalizations in the presence of interfering sounds. Spectrogram correlation employs image correlation using spectrograms and carefully designed image kernels to give good performance in the presence of noise. The two methods are briefly compared. The effectiveness of the spectrogram correlator is also demonstrated with finback whale (B. physalus) sounds. A supplementary method for detecting regular, repetitive sequences of sounds is applied to minke whale (B. acutorostrata) pulse trains, and found to improve detection in conditions of high ambient noise.
Article
Marine mammal vocalizations have always presented an intriguing topic for researchers not only because they provide an insight on their interaction, but also because they are a way for scientists to extract information on their location, number and various other parameters needed for their monitoring and tracking. In the past years field researchers have used submersible microphones to record underwater sounds in the hopes of being able to understand and label marine life. One of the emerging problems for both on site and off site researchers is the ability to detect and extract marine mammal vocalizations automatically and in real time given the copious amounts of existing recordings. In this paper, we focus on signal types that have a well-defined single frequency maximum and offer a method based on Sine wave modeling and Bayesian inference that will automatically detect and extract such possible vocalizations belonging to marine mammals while minimizing human interference. The procedure presented in this paper is based on global characteristics of these calls thus rendering it a species independent call detector/extractor.
Article
A time-frequency contour extraction and classification algorithm was created to analyze humpback whale vocalizations. The algorithm automatically extracted contours of whale vocalization units by searching for gray-level discontinuities in the spectrogram images. The unit-to-unit similarity was quantified by cross-correlating the contour lines. A library of distinctive humpback units was then generated by applying an unsupervised, cluster-based learning algorithm. The purpose of this study was to provide a fast and automated feature selection tool to describe the vocal signatures of animal groups. This approach could benefit a variety of applications such as species description, identification, and evolution of song structures. The algorithm was tested on humpback whale song data recorded at various locations in Hawaii from 2002 to 2003. Results presented in this paper showed low probability of false alarm (0%-4%) under noisy environments with small boat vessels and snapping shrimp. The classification algorithm was tested on a controlled set of 30 units forming six unit types, and all the units were correctly classified. In a case study on humpback data collected in the Auau Channel, Hawaii, in 2002, the algorithm extracted 951 units, which were classified into 12 distinctive types.
Conference Paper
The sea is home to a myriad of marine animal species, many of which use sound as a primary means of communication, navigation and foraging. Of particular interest are the Blue whales (Balaenoptera musculus) of the cetacean family. Massive commercial whaling prior to 1960 brought the species close to extinction and its population still remains very low. Passive acoustic monitoring of baleen whales has recently been used to provide long-term information about their presence and behavior, and provides an attractive complement to traditional visual based monitoring. In this work we present a frequency domain based algorithm developed for extracting the frequency contours of the dominant harmonic in tonal calls of blue whales (B and D calls). The algorithm uses a two pass approach to contour extraction. In the first pass, partial candidate contours are formed, followed by a second pass which uses the partial information to construct complete contours. When evaluated on a one hour labeled recording, the algorithm had 90% recall and 76% precision.
Article
Passive acoustic monitoring allows the assessment of marine mammal occurrence and distribution at greater temporal and spatial scales than is now possible with traditional visual surveys. However, the large volume of acoustic data and the lengthy and laborious task of manually analyzing these data have hindered broad application of this technique. To overcome these limitations, a generalized automated detection and classification system (DCS) was developed to efficiently and accurately identify low-frequency baleen whale calls. The DCS (1) accounts for persistent narrowband and transient broadband noise, (2) characterizes temporal variation of dominant call frequencies via pitch-tracking, and (3) classifies calls based on attributes of the resulting pitch tracks using quadratic discriminant function analysis (QDFA). Automated detections of sei whale (Balaenoptera borealis) downsweep calls and North Atlantic right whale (Eubalaena glacialis) upcalls were evaluated using recordings collected in the southwestern Gulf of Maine during the spring seasons of 2006 and 2007. The accuracy of the DCS was similar to that of a human analyst: variability in differences between the DCS and an analyst was similar to that between independent analysts, and temporal variability in call rates was similar among the DCS and several analysts.
Article
Marine mammal vocalizations are often analyzed using time-frequency representations (TFRs) which highlight their nonstationarities. One commonly used TFR is the spectrogram. The characteristic spectrogram time-frequency (TF) contours of marine mammal vocalizations play a significant role in whistle classification and individual or group identification. A major hurdle in the robust automated extraction of TF contours from spectrograms is underwater noise. An image-based algorithm has been developed for denoising and extraction of TF contours from noisy underwater recordings. An objective procedure for measuring the accuracy of extracted spectrogram contours is also proposed. This method is shown to perform well when dealing with the challenging problem of denoising broadband transients commonly encountered in warm shallow waters inhabited by snapping shrimp. Furthermore, it would also be useful with other types of broadband transient noise.
Article
A method is described for the automatic recognition of transient animal sounds. Automatic recognition can be used in wild animal research, including studies of behavior, population, and impact of anthropogenic noise. The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps. For a sound type of interest, a two-dimensional synthetic kernel is constructed and cross-correlated with a spectrogram of a recording, producing a recognition function, that is, the likelihood at each point in time that the sound type was present. A threshold is applied to this function to obtain discrete detection events, instants at which the sound type of interest was likely to be present. An extension of this method handles the temporal variation commonly present in animal sounds. Spectrogram correlation was compared to three other methods that have been used for automatic call recognition: matched filters, neural networks, and hidden Markov models. The test data set consisted of bowhead whale (Balaena mysticetus) end notes from songs recorded in Alaska in 1986 and 1988. The method had a success rate of about 97.5% on this problem, and the comparison indicated that it could be especially useful for detecting a call type when relatively few (5-200) instances of the call type are known.
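The kernel-and-threshold procedure described in this abstract can be sketched on a toy spectrogram as follows. The two-bin "upsweep" kernel, the spectrogram values, and the threshold are all hypothetical; real kernels are designed from the call's measured frequency sweep, and the function names are illustrative rather than taken from any published code.

```python
def recognition_function(spec, kernel):
    """Slide a 2-D kernel along the time axis of a spectrogram.

    spec[f][t]   : spectrogram magnitude at frequency bin f, frame t.
    kernel[f][k] : kernel weight at frequency bin f, frame offset k.
    Both must have the same number of frequency bins.
    """
    n_frames = len(spec[0])
    width = len(kernel[0])
    scores = []
    for t in range(n_frames - width + 1):
        score = sum(
            kernel[f][k] * spec[f][t + k]
            for f in range(len(kernel))
            for k in range(width)
        )
        scores.append(score)
    return scores


def detect(scores, threshold):
    """Return the frame indices where the score reaches the threshold."""
    return [t for t, s in enumerate(scores) if s >= threshold]


# Toy kernel: energy in bin 0 followed one frame later by energy in bin 1.
kernel = [[1.0, 0.0],
          [0.0, 1.0]]
# Toy spectrogram (2 bins x 5 frames) containing that pattern at frame 1.
spec = [[0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0]]
scores = recognition_function(spec, kernel)
events = detect(scores, threshold=1.5)
```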
Detection and classification of right whale calls using an 'edge' detector operating on a smoothed spectrogram
  • D Gillespie
Gillespie, D. (2004). "Detection and classification of right whale calls using an 'edge' detector operating on a smoothed spectrogram." Canadian Acoustics, 32(2), 39-47.
Automatic extraction of biological narrowband signals
  • S Madhusudhana
Madhusudhana, S., et al. (2017). "Automatic extraction of biological narrowband signals in underwater audio." Proceedings of Meetings on Acoustics, 29, 010002.
The Moby Sound Database for Research in the Automatic Recognition of Marine Mammal Calls
  • S Heimlich
  • H Klinck
  • D K Mellinger
Heimlich, S., Klinck, H., and Mellinger, D. K. (2011). "The Moby Sound Database for Research in the Automatic Recognition of Marine Mammal Calls." http://www.mobysound.org/ (Last viewed on March 31, 2015).
On the generalized distance in statistics
  • P C Mahalanobis
Mahalanobis, P. C. (1936). "On the generalized distance in statistics." Proceedings of the National Institute of Sciences (Calcutta), 2, 49-55.