The Xeno-canto collection and its relation to sound recognition and classification

Willem-Pier Vellinga¹ and Robert Planqué¹

¹ Stichting Xeno-canto voor natuurgeluiden (Xeno-canto Foundation), The Netherlands
{wp,bob}@xeno-canto.org
Abstract. This paper discusses distinguishing characteristics of the Xeno-canto bird sound collection. The main aim is to indicate the relation between automated recognition of bird sounds (or feature recognition in digital recordings more generally) and curating large bioacoustics collections. Not only do large collections make it easier to design robust algorithmic approaches to automated species classifiers, those same algorithms should also become useful in determining the actual content of the collections.
Keywords: LifeCLEF2015, BirdCLEF2015, Xeno-canto, bird sounds, automated
recognition, citizen science, data mining.
1 Introduction
For the past two years the BirdCLEF challenge [1,2], part of the LifeCLEF
workshops [3,4], has been based on sounds from Xeno-canto. Xeno-canto (XC)
aims to popularise bird sound recording, to improve accessibility of bird sounds,
and to increase knowledge of bird sounds. It tries to achieve these aims by
facilitating and curating a collaborative, shared, global bird sound collection on
www.xeno-canto.org. The collection was initiated by the authors in 2005 [5].
When XC started out it was mainly a project to aid identification of small
collections of bird sounds made by the authors in tropical forests in Peru and
Ecuador. Identifying species by sound using the means available at the time, mostly commercial cassette tapes or CDs with up to a hundred recordings, was cumbersome, and many sounds were simply not available (for a discussion see [6]).
Sjoerd Mayer's "Birds of Bolivia" CD-ROMs [7,8] were an inspiration. They increased the number of sounds available and species represented by an order of magnitude, made navigation of the sounds much easier, mapped locations, and identified background species on a recording. Mayer also engaged the birding community by welcoming and crediting contributions of sounds by birders, and he published corrections of errors on his website.
The authors essentially took these concepts a step further, and designed and
constructed an interface to a non-commercial, open database situated on the
world wide web. A number of guiding principles were formulated that distin-
guished XC from other sound collections at the time:
- Anyone with web access is invited to upload sounds. XC does not refuse recordings. Contributors can share any bird sound they find interesting, provided the files are below a fixed maximum size (initially 1 MB, now 10 MB) and provided a required minimum set of metadata is given: species, recordist name, location name, country, recording date, time of day, elevation, and sound type(s) (a minimal sketch of such a metadata check follows this list). This system certainly has drawbacks: a considerable fraction of the recordings is short, of dodgy quality, or both. Still, such recordings may be useful. They may represent poorly known locations or vocalisations, or may simply contribute to the sample size of individual species. Also, in the context of automated species identification algorithms, it is clear that any real-life deployment of such an algorithm would have to deal with poor quality recordings as well.
- The recordings uploaded to XC are shared. Re-use of the recordings is intended, for purposes that are in line with the aims of XC, such as downloading to personal collections, embedding sounds in educational or personal web sites, use for scientific research, etcetera. The Creative Commons licenses (http://creativecommons.org/) offer a useful framework. After consultation with the community it was decided to settle on CC-BY-NC-ND (attribution, non-commercial, no-derivatives) licenses. Since this is in fact a rather restrictive license, nowadays one can also choose CC-BY-NC-SA (SA stands for share-alike) and CC-BY-SA licenses that allow more liberal re-use. In all cases attribution of the author/contributor on republication is mandatory. For discussion of the limits of the other terms, see the Creative Commons website.
- The XC website is built with free, open source software. It is based on a standard LAMP (Linux, Apache, MySQL, PHP) set-up, with some additional software written to show sonograms, implement mapping, and so on.
- Anyone can contribute to the collection in some way. Apart from sharing recordings, people may contribute expertise on identification, set identification challenges, offer experience with equipment, write articles on-site, or just comment on recording achievements.
- Anyone can challenge an identification (ID) on the site. The vast majority of recordings have been identified correctly to species by the recordist, but errors are inevitable. When challenged, the recording is set aside and does not appear in search results until the ID is resolved by the community. This is usually done in an open discussion on the forum. If the ID is agreed upon, the recording is put back into the collection by the administrators. The administrators therefore have the role of arbiters, rather than authorities, and in fact there are no designated authorities that decide on species identification. This is one of the more uncommon features of Xeno-canto, and in this sense it differs from other well-known community projects on natural history, such as eBird (ebird.org) and Observado (waarneming.nl / observado.org).
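As a concrete illustration of the upload rules above, the following is a minimal sketch of the kind of metadata check an upload pipeline could perform. The field names and dictionary format are illustrative assumptions; the actual XC site is a PHP application on the LAMP stack described above, so this Python fragment is not its implementation.

```python
# Illustrative metadata check for an upload pipeline. The field names and
# the size limit mirror the requirements described in the text; everything
# else (names, structure) is an assumption, not the actual XC code.

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # current limit: 10 MB (initially 1 MB)

REQUIRED_FIELDS = [
    "species", "recordist_name", "location_name", "country",
    "recording_date", "time_of_day", "elevation", "sound_types",
]

def validate_upload(file_size: int, metadata: dict) -> list:
    """Return a list of problems; an empty list means the upload passes."""
    problems = []
    if file_size > MAX_UPLOAD_BYTES:
        problems.append("file exceeds maximum size")
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append("missing required field: " + field)
    return problems

# Example: a recording that lacks its elevation is flagged, nothing else.
print(validate_upload(3_500_000, {
    "species": "Luscinia megarhynchos", "recordist_name": "A. Birder",
    "location_name": "Hoge Veluwe", "country": "Netherlands",
    "recording_date": "2015-05-01", "time_of_day": "06:30",
    "sound_types": ["song"],
}))  # -> ['missing required field: elevation']
```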
At present (May 2015), the XC collection contains some 243,000 recordings of over 9,300 bird species, shared by more than 2,400 contributors from all
over the world. In the rest of this paper, the development and current status of
the XC collection are illustrated and a few points relevant to its relation with
automatic sound classification and recognition are discussed.
Fig. 1. Cumulative number of contributors over time.
2 Characteristics of the collection
The growth of XC is illustrated in Figures 1 and 2, by plotting the number of
recordings and the number of contributors over time. Two things are noteworthy.
Firstly, the data for the initial period is incomplete, since the uploading dates
were initially not recorded. Secondly, there are pronounced seasonal effects, most
obvious in the number of contributors. These points are remedied to some extent
in subsequent figures by plotting versus the number of recordings instead of
versus time.
2.1 Contributors
Both the number of recordings and the number of contributors grow at increas-
ing rates. Remarkably, plotting the number of recordings versus the number of
contributors shows that they have consistently increased at approximately the
same rate. See Figure 3. This leads to a more or less constant average number of
recordings per contributor, which turns out to be about 100. It should be noted, however, that the distribution of recordings per contributor is very broad and skewed. At this moment 298 contributors have shared more than 100 recordings each, while many more, around 2,100, have shared fewer than 100.
Fig. 2. Cumulative number of recordings over time.
The three largest contributions each comprise more than 10,000 recordings, over 100 times the average; 729 contributors have contributed a single recording, one hundredth of the average. The Zipf-like plots in Figure 4 serve to characterise the distribution at various stages during the development of XC.
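The Zipf-style characterisation of Figure 4 can be reproduced from a snapshot of the collection metadata in a few lines. A minimal sketch, assuming one contributor name per recording as input (a toy list stands in for a real metadata dump):

```python
import collections
import matplotlib.pyplot as plt

def zipf_plot(recordists, label):
    """Plot recordings-per-contributor against contributor rank (log-log).

    `recordists` holds one contributor name per recording, e.g. one column
    of a metadata dump of the collection (an assumed input format).
    """
    counts = sorted(collections.Counter(recordists).values(), reverse=True)
    plt.loglog(range(1, len(counts) + 1), counts, ".", label=label)

# Toy snapshot standing in for the real metadata: a few heavy contributors
# and a tail of one-recording contributors, as in Figure 4.
recordists = ["A"] * 120 + ["B"] * 30 + ["C"] * 5 + ["D", "E", "F"]
zipf_plot(recordists, label="toy snapshot")
plt.xlabel("contributor rank")
plt.ylabel("number of recordings")
plt.legend()
plt.show()
```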
2.2 Species
When a sound is uploaded a set of metadata is required, among which the name
of the species. Specifying the subspecies is optional. The taxonomy of the site
was initially based on the taxonomy in Neotropical Birds [11]. Other regions were added over time (North America, Africa, Asia, Europe and Australasia) using other regional taxonomies, which led to problems with species occurring in several regions. In 2011 the global IOC (International Ornithological Council) taxonomy was adopted for all recordings, and XC currently uses version 4.1 [12].
The constant revision of taxonomy at the species level means that the species
assignment of the recordings needs to be updated frequently. This task falls to
the team of administrators. Splits can be problematic, since the subspecific taxon
to which a recording belongs may not be indicated (see below).
IOC 4.1 recognises 10,518 extant species and 150 extinct species; to this list
XC has added 16 additional recently described or as yet undescribed species.
About 9,330 of these species are represented in XC at this moment. To the best of our knowledge this constitutes the largest number of species in any public collection of bird sounds.
(There is at least one private collection that has more species, but it includes
sounds of all species from XC.)
Fig. 3. Order of contributor versus number of first recording by that contributor. To a reasonable approximation the increase is linear, with the slope indicating that every contributor adds about 100 recordings on average.
Fig. 4. Distribution of the number of recordings per contributor plotted in a Zipf plot after 2,000, 5,000, 20,000, 50,000 and 200,000 recordings. These plots show that the distribution of the number of recordings per contributor is very wide.
Fig. 5. Species accumulation curve (blue dots) and randomised species accumulation curve (brown dots). The drawn line is an extrapolation shown further in Figure 8.
The growth of the number of species may provide a clue about the moment of completion of the collection at the species level. Figure 5 shows the species accumulation curve up to May 2015, together with a randomised accumulation
curve and a fit used for extrapolation. The randomised version is based on a
random draw from all recordings present in XC. Clearly the two curves differ
significantly. This is caused by the fact that XC started out with only Neotropical
species, and that other world areas were added later. The randomised species
accumulation curve does not take that into account. The two curves are seen to
meet up after about 170,000 recordings, well after XC went global.
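The randomised curve in Figure 5 is straightforward to compute: shuffle all recordings, then count the number of distinct species seen as a function of the number of recordings drawn. A minimal sketch, assuming one species name per recording as input:

```python
import random

def randomised_accumulation(species_per_recording, seed=0):
    """Species accumulation curve after a random shuffle of all recordings.

    `species_per_recording` holds one species name per recording (an
    assumed input format); the result gives the number of distinct
    species seen after each successive draw.
    """
    recordings = list(species_per_recording)
    random.Random(seed).shuffle(recordings)
    seen, curve = set(), []
    for species in recordings:
        seen.add(species)
        curve.append(len(seen))
    return curve

# Toy example: three species spread unevenly over eight recordings.
print(randomised_accumulation(["a", "a", "b", "a", "c", "b", "a", "a"]))
```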
For any number of reasons (abundance and size range, vocal (in)activity, accessibility of the range of the species, and accessibility of the recording site, to name just four) the recordings are not evenly distributed across the species. The current expected number of recordings per species is around 20. However, some 20 species have attracted over 500 recordings, while around 1,200 species are still waiting for their first upload. The distribution plotted in Zipf fashion is shown in Figure 6; probability densities are shown in Figure 7.
The species accumulation curve can be extrapolated into the future by making assumptions on the probability that species not represented at this time will be uploaded. A reasonable fit is achieved by assuming that the probability of a new species being uploaded is 1/3 of that of a species with 1 recording in XC, with the ratios between the probabilities of species already represented remaining equal. An extrapolation based on this assumption is shown in Figure 8. Of course the extrapolation follows the randomised species accumulation curve very well. It is shown up to 900,000 recordings, at which point it is still about 600 species short of the total number of species. The precise number will depend on the assumptions made, but it seems reasonable to conclude that completion at the species level will require a multiple of the number of recordings present at this moment.
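The extrapolation model is easy to simulate directly. The sketch below is a Monte-Carlo version under one simple reading of the assumption above: each represented species keeps a fixed upload weight proportional to its current number of recordings, and each missing species gets 1/3 of the weight of a species with a single recording. The helper and its toy inputs are illustrations, not the fit used for Figure 8.

```python
import random

def extrapolate_accumulation(counts, n_missing, n_future, seed=0):
    """Monte-Carlo extrapolation of the species accumulation curve.

    Each future upload draws one species. Represented species keep fixed
    weights proportional to their current recording counts (so the ratios
    between their probabilities stay equal); each missing species gets 1/3
    of the weight of a species with a single recording.
    """
    rng = random.Random(seed)
    weights = list(counts) + [1 / 3] * n_missing
    present = [True] * len(counts) + [False] * n_missing
    n_species, curve = len(counts), []
    for _ in range(n_future):
        i = rng.choices(range(len(weights)), weights=weights)[0]
        if not present[i]:
            present[i] = True
            n_species += 1
        curve.append(n_species)
    return curve

# Toy example: four represented species, three missing, fifty future uploads.
print(extrapolate_accumulation([10, 4, 1, 1], n_missing=3, n_future=50)[-1])
```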
Fig. 6. Distribution of the number of recordings per species plotted in a Zipf plot after 2,000, 5,000, 20,000, 50,000 and 200,000 recordings.
Fig. 7. Probability densities of the number of recordings per species after 2,000, 5,000, 20,000, 50,000 and 200,000 recordings.
Fig. 8. Extrapolation of species accumulation curve assuming that as yet unrepresented
species have a probability of being uploaded that is 1/3 that of a species represented
with 1 recording. It is likely the collection needs to multiply in size before completion
at the species level is reached.
3 Linking bird song databases and automated species recognition
The BirdCLEF challenge requires entrants to identify recordings from the XC collection to species level, based on the species-level identifications provided by the XC community. It is therefore worthwhile to have another look at the species-level IDs in XC. For a number of reasons an ID to species level, even if correct, may be misleading.
Presence of unnamed background species. Although recordists are asked to mention the background species present in their recordings, not all recordists do so. On average 2 species are identified per recording, but it is certain that many more could be identified. Interestingly, the presence of named background species helps humans to identify a sound of interest (as the authors know from personal experience), but it does not seem to lead to a higher identification rate for the algorithms [1,2].
Hidden diversity. The IOC 4.1 list not only recognises 10,668 species, but identifies another 20,976 subspecies for 5,093 of these species, bringing the total number of taxa to 26,551. On XC about 9,330 species and 9,140 additional subspecies have been identified. This does not mean that 18,470 taxa are represented. It is likely that some recordings represent subspecies that have not yet been named on XC, which would make the currently recognised number an underestimate. But it is also likely that in some cases the taxa represented by recordings without subspecific ID are in fact already named, which would lead to an overestimate. Of the 9,300 species present on XC, 4,484 are monotypic. The 9,100 subspecies therefore belong to about 4,900 polytypic species; since the first subspecies recorded for each such species coincides with the species-level taxon, they add at least 9,100 - 4,900 = 4,200 taxa. The maximum number of named taxa represented is therefore about 18,400 and the minimum about 13,500. An estimate based on the number of species present, (9,300/10,668) × 26,551, would suggest about 23,000 taxa present at this time. Based on this estimate it seems likely that a considerable number of taxa on XC remains to be named. At the same time this also means that a single species category may represent considerable taxonomic diversity. It is to be expected that such diversity hidden within species on XC is reflected in the sounds, since many subspecies are known to have distinct vocalizations [9,10].
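The bookkeeping behind these bounds is compact enough to spell out. The sketch below reproduces the arithmetic of the paragraph above with the rounded figures quoted there; the only convention it assumes is the one used in the text, namely that the nominate subspecies of a polytypic species coincides with its species-level taxon.

```python
# IOC 4.1 totals as quoted above. The nominate subspecies of each polytypic
# species coincides with its species-level taxon, hence the subtractions.
ioc_species, ioc_subspecies, ioc_polytypic = 10_668, 20_976, 5_093
ioc_taxa = ioc_species + ioc_subspecies - ioc_polytypic   # = 26,551

# Xeno-canto counts, rounded as in the text.
xc_species, xc_subspecies, xc_monotypic = 9_300, 9_100, 4_484
xc_polytypic = xc_species - xc_monotypic                  # ~4,900 species

# Lower bound: only subspecies beyond one per polytypic species are new taxa.
taxa_min = xc_species + (xc_subspecies - xc_polytypic)    # ~13,500

# Upper bound: every named subspecies on XC is a distinct extra taxon.
taxa_max = xc_species + xc_subspecies                     # ~18,400

# Pro-rata estimate from species coverage alone.
taxa_est = round(xc_species / ioc_species * ioc_taxa)     # ~23,000

print(ioc_taxa, taxa_min, taxa_max, taxa_est)
```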
Other contributions to the diversity of sounds, which do not necessarily align with subspecies categories, are geographical dialects, such as in Yellowhammer (Emberiza citrinella), and the size of the vocabulary of a species, such as in Common Nightingale (Luscinia megarhynchos). Little quantitative information is available on the extent of dialect formation and on vocabulary size across the ranges of the overwhelming majority of the 10,518 species of birds. The effect of such diversity at the species level on the results of automatic recognition has apparently not yet been quantified. Intuitively, given a set of training data, one would expect a species that shows little diversity to be recognised more faithfully than a species that shows a lot of variability.
In [1] it was concluded that the recognition algorithms worked better on average for species with more recordings in the training set. It would be interesting to look for correlations with the number of subspecies recognised, or with the known vocabulary size; the sketch below illustrates such a check.
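Once per-species recognition scores are available, such a correlation check is a few lines of code. All numbers below are placeholders; real scores would come from the BirdCLEF evaluations and real subspecies counts from the IOC list. Only the computation itself is the point.

```python
from scipy.stats import spearmanr

# Hypothetical per-species data: recognition score from a challenge run
# and the number of IOC subspecies for the same species. Placeholders only.
scores     = [0.82, 0.55, 0.61, 0.90, 0.38]
subspecies = [1, 7, 3, 1, 9]

# Rank correlation is a natural first check, since neither quantity is
# expected to be normally distributed.
rho, p = spearmanr(scores, subspecies)
print("Spearman rho = %.2f, p = %.2f" % (rho, p))
```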
4 Conclusion
The results from the 2014 and 2015 BirdCLEF challenges offer an interesting
perspective on the use of automated algorithmic techniques on the one hand,
and large accessible public archives of sound data on the other.
At present, the focus in the challenges lies squarely in the field of automated recognition, and understandably so. The large Xeno-canto database has been the basis of the challenges and gives the first general insights into automated feature extraction and classification to species level for a wide variety of vocalizations. The species set included in the latest, 2015, edition spans 1,000 species with a huge range of different types of bird songs and calls. The BirdCLEF paper in this volume contributes to our understanding of which techniques excel at this type of challenge.
We would welcome a second application of the algorithms, however: one that would allow deeper insight into the variety of vocalizations actually represented in archives such as Xeno-canto. There is great potential for collaborative projects in which a number of statistics would be estimated. Examples include (a) estimates of repertoire sizes in song birds (or other taxa); (b) discovery of subspecies with different vocal signatures; (c) the extraction of a small representative sample of different vocalizations for focal species or focal localities. We hope to attract the computer science community to work with us on these types of challenges.
References
1. Goëau, H., Glotin, H., Vellinga, W.P., Planqué, R., Joly, A.: LifeCLEF Bird Identification Task 2014. In: Proceedings of CLEF 2014 (2014)
2. Goëau, H., Glotin, H., Vellinga, W.P., Planqué, R., Rauber, A., Joly, A.: LifeCLEF Bird Identification Task 2015. In: CLEF working notes 2015 (2015)
3. Joly, A., Müller, H., Goëau, H., Glotin, H., Spampinato, C., Rauber, A., Bonnet, P., Vellinga, W.P., Fisher, B., Planqué, R.: LifeCLEF 2014: multimedia life species identification. In: Proceedings of CLEF 2014 (2014)
4. Joly, A., Müller, H., Goëau, H., Glotin, H., Rauber, A., Bonnet, P., Vellinga, W.P., Fisher, B., Planqué, R.: LifeCLEF 2015: multimedia life species identification challenges. In: Cappellato, L., Ferro, N., Jones, G., San Juan, E. (eds.): CLEF 2015 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org/ Vol-1391/. (2015)
5. Vellinga, W.P., Planqué, R.: http://tinyurl.com/xcstart05 (2005)
6. Moore, J.V.: Ecuador's avifauna: the state of knowledge and availability of sound recordings. Cotinga 29, 19–21 (2008)
7. Mayer, S.: Bird sounds of Bolivia / Sonidos de aves de Bolivia, 1.0. CD-ROM. Bird Songs International, Westernieland, The Netherlands (1996)
8. Mayer, S.: Bird sounds of Bolivia / Sonidos de aves de Bolivia, 2.0. CD-ROM. Bird Songs International, Westernieland, The Netherlands (2000)
9. Kroodsma, D.E., Miller, E.H. (eds.): Ecology and Evolution of Acoustic Communication in Birds. Comstock Publishing Associates (1996)
10. Marler, P., Slabbekoorn, H.: Nature's Music. Elsevier Academic Press (2004)
11. Stotz, D.F., Fitzpatrick, J.W., Parker, T.A., III, Moskovits, D.K.: Neotropical Birds. University of Chicago Press (1996)
12. Gill, F., Donsker, D.: IOC World Bird Names v4.1. Available at www.worldbirdnames.org. CC-BY 3.0 (2015)