ABSTRACT: This study uses an audio signal transformation, splicing, to create an experimental situation where human listeners judge the similarity of audio signals that they cannot easily categorize. Splicing works by segmenting audio signals into 50-ms frames, then shuffling these frames and concatenating them back in random order. Splicing a signal masks the identification of the categories that it normally elicits: for instance, human participants cannot easily identify the sound of cars in a spliced recording of a city street. This study compares human performance on both normal and spliced recordings of soundscapes and music. Splicing is found to degrade human similarity performance significantly less for soundscapes than for music: when two spliced soundscapes are judged similar to one another, the original recordings also tend to sound similar. This establishes that humans are capable of reconstructing consistent similarity relations between soundscapes without relying much on the identification of the natural categories associated with such signals, such as their constituent sound sources. This finding contradicts previous literature and points to new ways to conceptualize the different ways in which humans perceive soundscapes and music.
The Journal of the Acoustical Society of America 05/2009; 125(4):2155-61. · 1.65 Impact Factor
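The splicing transformation described in this abstract (50-ms frames, shuffled, then concatenated) can be illustrated with a minimal sketch. The function name `splice`, the `seed` parameter, and the choice to drop a trailing partial frame are assumptions made for illustration; the abstract does not specify how frame boundaries or leftover samples are handled.

```python
import numpy as np


def splice(signal, sample_rate, frame_ms=50.0, seed=None):
    """Cut a mono signal into fixed-length frames, shuffle them, and concatenate.

    Hypothetical helper: only the 50-ms frame length comes from the abstract.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(signal) // frame_len
    # Assumption: samples that do not fill a whole frame are discarded.
    frames = np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_frames)  # random frame order
    return frames[order].reshape(-1)
```

Within each frame the samples stay in their original order, so short-term spectral content is preserved while the longer-term temporal structure is scrambled.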
ABSTRACT: The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF only provides limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, the application of the same measure of acoustic similarity on both soundscape and music data sets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed in the music data set. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscapes and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
The Journal of the Acoustical Society of America 09/2007; 122(2):881-91. · 1.65 Impact Factor
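The idea of representing a signal as the long-term distribution of its frame-level spectral features can be sketched as follows. This is a deliberately simplified illustration, not the measure used in the paper: it assumes log band energies as features, a single Gaussian per signal, and a closed-form symmetrised Kullback-Leibler divergence as the distance. All function names and parameters are hypothetical.

```python
import numpy as np


def frame_features(signal, sample_rate, frame_ms=50.0, n_bands=8):
    """Log energy in n_bands equal-width frequency bands, one row per frame."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energies + 1e-10)


def fit_gaussian(features, ridge=1e-6):
    """Summarise the bag of frames by its mean and covariance (order is ignored)."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, cov


def kl_gaussian(p, q):
    """Closed-form KL divergence between two multivariate Gaussians."""
    mean_p, cov_p = p
    mean_q, cov_q = q
    inv_q = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    d = mean_p.shape[0]
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - d
                  + logdet_q - logdet_p)


def bof_distance(sig_a, sig_b, sample_rate):
    """Symmetrised KL divergence between the two signals' frame distributions."""
    a = fit_gaussian(frame_features(sig_a, sample_rate))
    b = fit_gaussian(frame_features(sig_b, sample_rate))
    return kl_gaussian(a, b) + kl_gaussian(b, a)
```

Because only the distribution of frames is modelled, reordering the frames of a signal leaves this distance essentially unchanged, which is the property that makes the representation a "bag" of frames.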
ABSTRACT: The "bag of frames" approach (BOF) to audio pattern recognition
represents signals as the long-term statistical distribution of their
local spectral features. This computational technique was recently
used to simulate human judgements of the holistic similarity between
pieces of polyphonic music ("this sounds like The Beatles"), without
any direct modelling of their individual musical instruments (namely,
voice, electric guitar, drums, etc.). This paper proposes to apply
the same measure of acoustic similarity to natural and human sound
environments (or soundscapes). We find that the approach can simulate
human categorization in a simple taxonomy of urban soundscapes to
near-perfect precision, better in fact than that achieved with musical
signals. Such techniques to recognize environmental acoustic configurations
globally without any direct modelling of their constituent sound
sources can be used to disambiguate finer-but-noisier source identification
algorithms: if this globally sounds like a "park", then this "car
horn" must be a "bird". Based on the proposed algorithm, we discuss
the difference between such contextual effects in soundscapes and
ABSTRACT: The goal of the FDAI project is to create a general system that computes
an efficient representation of the acoustic environment. More precisely,
FDAI has to compute a noise disturbance indicator based on the identification
of six categories of sound sources. This paper describes experiments
carried out to identify acoustic features and recognition models
that were implemented in FDAI. This framework is based on EDS (Extractor
Discovery System), an innovative acoustic feature extraction
system. The design and development of FDAI
raised two critical issues. Completeness: it is very difficult to
design descriptors that identify every sound source in urban environments;
and Consistency: some sound sources are not acoustically consistent.
We solved the first issue with a conditional evaluation of a family
of acoustic descriptors, rather than the evaluation of a single general-purpose
extractor. Indeed, a first hierarchical separation between vehicles
(moped, bus, motorcycle and car) and non-vehicles (bird and voice)
significantly raised the accuracy of identification of the buses.
The second issue turned out to be more complex and is still under
study. We give here preliminary results.
ABSTRACT: Starting with a panel of 20 stimuli of 15 seconds each, representing an urban sound environment, a perceptual study centred on the "noise object" is undertaken. Listeners are invited to rate every sound source heard in a three-dimensional perceptual space covering the "prominence", the "presence" and the "proximity" of the source. Statistical analyses illustrate the relative sensitivity of listeners to each type of sound source (vehicles, birds and voices). Different physical measures related to time and energy, calculated from the source-coded LAeq curves, are correlated with the perceptual dimensions for each type of source. Unpleasantness and perceived loudness of each stimulus are rated in a second and third test. As expected, unpleasantness is correlated with perceived loudness, but a multiple regression analysis that includes variables linked to the "prominence", "presence" or "proximity" of certain sources increases the correlation with the unpleasantness of the stimuli. With the regression model based on perceived loudness, presence of buses, presence of mopeds and prominence of children's voices, the correlation with unpleasantness rises to 98 %. With the regression model based on objective measures such as loudness calculated with Zwicker's model, and physical criteria extracted from the source-coded LAeq curves, the correlation with unpleasantness rises to 95 %.
ABSTRACT: The European Directive 2002/49/CE proposes the Lden and/or the Lnight criteria to assess noise impact on populations. These indicators are based on an average of the noise levels over long periods. When an infrastructure is subject not to a heavy and regular flow of vehicles but rather to specific events that emerge from the background noise, the major influence on inhabitants' feelings is the nature of the sound sources identified. Moreover, although mechanical sounds are perceived negatively, some urban locations are perceived as pleasant (for instance a park or a market) due to the presence of birds or voices. The goal of the O.R.U.S. project is to create a general system that provides an efficient representation of the acoustic environment, that is, one that reflects what citizens perceive and describe. Monitoring systems located in strategic areas compute in real time an indicator of unpleasantness, which is based on the identification of six categories of sound sources. Strategic maps providing spatial and temporal representations of sound source identification and of the unpleasantness indicator are produced. An experiment is conducted in a Parisian neighbourhood.
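To make concrete what "an average of the noise levels over long periods" means for Lden, the sketch below computes the day-evening-night level from three period levels. The abstract itself gives no formula; the code assumes the definition in Annex I of the directive with its default period lengths (12 h day, 4 h evening with a 5 dB penalty, 8 h night with a 10 dB penalty). The function name `lden` is illustrative only.

```python
import math


def lden(l_day, l_evening, l_night):
    """Day-evening-night level in dB from A-weighted long-term period levels.

    Assumes the directive's default periods: 12 h day, 4 h evening, 8 h night.
    """
    energy = (
        12 * 10 ** (l_day / 10)
        + 4 * 10 ** ((l_evening + 5) / 10)   # evening penalty: +5 dB
        + 8 * 10 ** ((l_night + 10) / 10)    # night penalty: +10 dB
    )
    return 10 * math.log10(energy / 24)
```

Since the result is an energy average over 24 hours, a few isolated events barely move it, which is why the abstract argues that such indicators miss the influence of the sound sources that listeners actually identify.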