- ABSTRACT: The "bag-of-frames" approach (BOF), which encodes audio signals as the long-term statistical distribution of short-term spectral features, has been commonly regarded as an effective and sufficient way to represent environmental sound recordings (soundscapes) since its introduction in an influential 2007 article. The present paper describes a conceptual replication of this seminal article using several new soundscape datasets, with results strongly questioning the adequacy of the BOF approach for the task. We show that the good accuracy originally reported with BOF likely results from a particularly favorable dataset with low within-class variability, and that for more realistic datasets, BOF in fact does not perform significantly better than a mere one-point average of the signal's features. Soundscape modeling, therefore, may not be the closed case it was once thought to be. Progress, we argue, could lie in reframing the problem to consider the individual acoustical events within each soundscape.
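The bag-of-frames representation discussed in the abstract above can be sketched as follows. This is a minimal numpy-only illustration, not the original study's implementation (which typically uses MFCC features and Gaussian mixture models): short-term spectral features are computed per frame, then frame order is discarded and only long-term statistics are kept.

```python
import numpy as np

def spectral_frames(signal, sr, frame_ms=50):
    """Cut a signal into short frames and return a log-spectral feature per frame."""
    n = int(sr * frame_ms / 1000)
    n_frames = len(signal) // n
    frames = signal[:n_frames * n].reshape(n_frames, n)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(n), axis=1))
    return np.log1p(spectra)

def bof_summary(features):
    """Bag-of-frames: discard frame order, keep only the long-term statistics
    of the frame distribution (here simply per-dimension mean and std)."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

def bof_distance(sig_a, sig_b, sr=16000):
    """Euclidean distance between the BOF summaries of two recordings."""
    a = bof_summary(spectral_frames(sig_a, sr))
    b = bof_summary(spectral_frames(sig_b, sr))
    return float(np.linalg.norm(a - b))
```

Note the design point the abstract criticizes: because `bof_summary` collapses every frame into one statistical summary, any temporal structure of individual acoustic events within the recording is lost.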
- ABSTRACT: Managing noise nuisance and fitting into their surrounding environment are among the challenges facing large infrastructures such as international airports. With regard to noise generated by air traffic, Aéroports de Paris has been entrusted by the French State with carrying out noise measurements, in accordance with the requirements of ACNUSA (Autorité de Contrôle des NUisances Sonores Aéroportuaires), and with making available to the public both environmental information and measurement results on the noise nuisance caused by aircraft. Aéroports de Paris operates a network of some forty noise monitoring stations installed around the Paris airports. These stations continuously measure the sound level and automatically detect aircraft acoustic events by analysing the evolution of that level. In addition, aircraft flight information received in real time from a secondary radar makes it possible to identify precisely which aircraft were measured. All of these data are then analysed through several statistical indicators. The annual indicator IGmp (Indicateur Global Mesuré Pondéré, decree of 28 January 2003), which caps the noise impact of air traffic at Paris-Charles de Gaulle airport, requires an exhaustive acoustic measurement of every aircraft overflight. Meeting this requirement means improving the robustness of acoustic detection and of its correlation with radar tracks. To this end, the Aéroports de Paris Laboratory is testing a new technology based on a fine-grained analysis of the audio signal. The performance of this new system is comparable to the human ear and shows strong potential for improving the detection of aircraft noise, and consequently the quality of the noise indicators.
- ABSTRACT: This study uses an audio signal transformation, splicing, to create an experimental situation where human listeners judge the similarity of audio signals, which they cannot easily categorize. Splicing works by segmenting audio signals into 50-ms frames, then shuffling and concatenating these frames back in random order. Splicing a signal masks the identification of the categories that it normally elicits: For instance, human participants cannot easily identify the sound of cars in a spliced recording of a city street. This study compares human performance on both normal and spliced recordings of soundscapes and music. Splicing is found to degrade human similarity performance significantly less for soundscapes than for music: When two spliced soundscapes are judged similar to one another, the original recordings also tend to sound similar. This establishes that humans are capable of reconstructing consistent similarity relations between soundscapes without relying much on the identification of the natural categories associated with such signals, such as their constituent sound sources. This finding contradicts previous literature and points to new ways to conceptualize the different ways in which humans perceive soundscapes and music.
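The splicing transformation described above (50-ms segmentation, random shuffle, concatenation) is simple enough to sketch directly; this is an illustrative implementation, with the frame length and random seed as adjustable parameters:

```python
import numpy as np

def splice(signal, sr, frame_ms=50, seed=0):
    """Splicing: segment a signal into frames (50 ms by default), shuffle
    the frames into random order, and concatenate them back together.
    Any samples beyond the last whole frame are dropped."""
    n = int(sr * frame_ms / 1000)
    n_frames = len(signal) // n
    frames = signal[:n_frames * n].reshape(n_frames, n).copy()
    rng = np.random.default_rng(seed)
    rng.shuffle(frames)  # shuffles along the first axis, i.e. frame order
    return frames.reshape(-1)
```

Because the transform only permutes frames, the long-term statistical distribution of short-term features is essentially preserved while the temporal structure carrying source identity is destroyed, which is what makes it a useful probe of category-free similarity judgments.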
- ABSTRACT: The "bag-of-frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This approach has proved nearly optimal for simulating the auditory perception of natural and human environments (or soundscapes), and is also the predominant paradigm for extracting high-level descriptions from music signals. However, recent studies show that, contrary to its application to soundscape signals, BOF provides only limited performance when applied to polyphonic music signals. This paper proposes to explicitly examine the difference between urban soundscapes and polyphonic music with respect to their modeling with the BOF approach. First, the application of the same measure of acoustic similarity to both soundscape and music datasets confirms that the BOF approach can model soundscapes to near-perfect precision, and exhibits none of the limitations observed on the music dataset. Second, the modification of this measure by two custom homogeneity transforms reveals critical differences in the temporal and statistical structure of the typical frame distribution of each type of signal. Such differences may explain the uneven performance of BOF algorithms on soundscape and music signals, and suggest that their human perception relies on cognitive processes of a different nature.
- ABSTRACT: The "bag of frames" approach (BOF) to audio pattern recognition represents signals as the long-term statistical distribution of their local spectral features. This computational technique was recently used to simulate human judgements of the holistic similarity between pieces of polyphonic music ("this sounds like The Beatles"), without any direct modelling of their individual musical instruments (namely, voice, electric guitar, drums, etc.). This paper proposes to apply the same measure of acoustic similarity to natural and human sound environments (or soundscapes). We find that the approach can simulate human categorization in a simple taxonomy of urban soundscapes to near-perfect precision, better in fact than that achieved with musical signals. Such techniques to recognize environmental acoustic configurations globally without any direct modelling of their constituent sound sources can be used to disambiguate finer-but-noisier source identification algorithms: if this globally sounds like a "park", then this "car horn" must be a "bird". Based on the proposed algorithm, we discuss the difference between such contextual effects in soundscapes and music perception.
- ABSTRACT: The aim of this research is to characterize the appraisal of urban soundscapes in which sound sources are explicitly recognized. A series of three experiments is conducted in a laboratory where subjects listen to twenty 15-s sound samples representing the sound environment of a typical Paris street. In these samples, listeners can recognize cars, mopeds, motorbikes, buses, birds and human voices. The first experiment collects ratings of the subjective descriptors "prominence", "presence" and "proximity" of sound sources; the second and third respectively obtain assessments of the overall hedonic judgment and of the loudness of the sound samples. Physical parameters of the different sound sources are extracted from the source-coded LAeq curves. Multiple regressions provide a good summary of the relationship between appraisal and subjective descriptors or objective parameters. They show that adding information about sound source characteristics to the perceived loudness, or to Zwicker's loudness, increases the percentage of explained variance. To investigate the appraisal of other typical urban soundscapes such as a market and a park, an on-site experiment using the same procedure is carried out during an urban walk divided into sixteen 90-s sequences. Again, the percentage of explained variance of the hedonic judgement is increased by taking the sound source characteristics into account. Unpleasantness ratings can even be predicted effectively from objective characteristics of the sound sources alone, such as the number of sources or their time ratio of presence. These results are discussed in terms of sound "events" as opposed to ambient sound.
Conference Paper: Automatic Recognition of Urban Sound Sources ABSTRACT: The goal of the FDAI project is to create a general system that computes an efficient representation of the acoustic environment. More precisely, FDAI has to compute a noise disturbance indicator based on the identification of six categories of sound sources. This paper describes experiments carried out to identify the acoustic features and recognition models that were implemented in FDAI. This framework is based on EDS (Extractor Discovery System), an innovative system for acoustic feature extraction. The design and development of FDAI raised two critical issues. Completeness: it is very difficult to design descriptors that identify every sound source in urban environments. Consistency: some sound sources are not acoustically consistent. We solved the first issue with a conditional evaluation of a family of acoustic descriptors, rather than the evaluation of a single general-purpose extractor. Indeed, a first hierarchical separation between vehicles (moped, bus, motorcycle and car) and non-vehicles (bird and voice) significantly raised the identification accuracy for buses. The second issue turned out to be more complex and is still under study. We give preliminary results here.
- ABSTRACT: Starting from a panel of 20 stimuli of 15 seconds each, representing an urban sound environment, a perceptual study centred on the "noise object" is undertaken. Listeners are invited to judge each sound source heard within a three-dimensional perceptual space covering the "prominence", "presence" and "proximity" of the source. Statistical analyses illustrate the listeners' relative sensitivity to each type of sound source (vehicles, birds and voices). Different physical measures related to time and energy, calculated from the source-coded LAeq curves, are correlated with the perceptual dimensions for each type of source. The unpleasantness and perceived loudness of each stimulus are collected in a second and third test. Naturally, unpleasantness is correlated with perceived loudness, but a multiple regression analysis including the variables linked to the "prominence", "presence" or "proximity" of certain sources increases the correlation with the unpleasantness of the stimuli. With a regression model based on perceived loudness, presence of buses, presence of mopeds and prominence of children's voices, the correlation with unpleasantness reaches 98%. With a regression model based on objective measures, such as loudness calculated with Zwicker's model and physical criteria extracted from the source-coded LAeq curves, the correlation reaches 95%.
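The regression modelling described in the abstract above can be illustrated with an ordinary least-squares fit. The predictor names mirror the study's variables (perceived loudness, presence of buses and mopeds, prominence of voices), but the data below are entirely synthetic, with made-up coefficients, since the study's ratings are not available here:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20  # 20 stimuli, as in the panel described above

# Synthetic predictors and ratings (hypothetical values, for illustration only)
loudness = rng.uniform(50, 80, n)
bus_presence = rng.uniform(0, 1, n)
moped_presence = rng.uniform(0, 1, n)
voice_prominence = rng.uniform(0, 1, n)
unpleasantness = (0.1 * loudness + 2.0 * bus_presence
                  + 3.0 * moped_presence - 1.0 * voice_prominence
                  + rng.normal(0, 0.1, n))

# Ordinary least-squares multiple regression with an intercept column
X = np.column_stack([np.ones(n), loudness, bus_presence,
                     moped_presence, voice_prominence])
coefs, *_ = np.linalg.lstsq(X, unpleasantness, rcond=None)

# Correlation between predicted and rated unpleasantness
pred = X @ coefs
r = np.corrcoef(pred, unpleasantness)[0, 1]
```

The quantity `r` corresponds to the multiple correlation the abstract reports (98% with perceptual predictors, 95% with objective ones); here its exact value depends only on the synthetic noise level.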
Article: Separation of urban sound sources ABSTRACT: The perception of urban sound environmental quality depends on several acoustic indices but also on the identification of the sources. Sources related to human activities are not perceived in the same manner as mechanical or natural sources. This work presents a method to separate these types of sources starting from their spectral and temporal signatures, in order to extract acoustic parameters that take the identification of these sound sources into account. To this end, a spectral signature called a "gauge" is set up for each type of source. These gauges are used as filters to store in real time a specific temporal evolution per source category. On each evolution, the background noise and emerging peak levels are extracted from different sliding periods at every moment. These two parameters make it possible to calculate, by difference, a new index called the "activity" of the source. This method requires optimising the parameters that influence the activity measurements for the various sources (gauges and sliding periods). The gauges and time parameters, extracted from recordings carried out along Parisian streets, which allow the differentiation of categories such as trucks, private cars and motorcycles, are studied in particular.
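The gauge-and-activity scheme described above can be sketched as follows. This is an interpretation under stated assumptions, not the article's actual algorithm: the gauge is taken as a simple per-band spectral weighting, the background level as a low percentile over a sliding window, and the emerging peak as the window maximum:

```python
import numpy as np

def source_level(spectrogram, gauge):
    """Weight each short-term spectrum by a source's spectral 'gauge' and
    collapse it to one level per frame (the source-coded evolution).
    spectrogram: (frames, bands) power values; gauge: (bands,) weights."""
    return 10 * np.log10((spectrogram * gauge[None, :]).sum(axis=1) + 1e-12)

def activity(levels, window=20):
    """'Activity' index: emerging peak level minus background level, both
    estimated over a sliding window (here: max vs 10th percentile)."""
    acts = []
    for i in range(len(levels) - window + 1):
        w = levels[i:i + window]
        background = np.percentile(w, 10)  # steady background estimate
        peak = w.max()                     # emerging event level
        acts.append(peak - background)
    return np.array(acts)
```

A steady level yields zero activity, while a short event emerging from the background produces a peak in the activity curve, which is the behaviour the index is meant to capture; the window length plays the role of the "sliding periods" to be optimised per source category.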
- ABSTRACT: The European Directive 2002/49/EC proposes the Lden and/or Lnight criteria to assess the noise impact on populations. These indicators are based on an average of noise levels over long periods. When an infrastructure is not subjected to a heavy and regular flow of vehicles but rather to specific events that emerge from the background noise, the major influence on inhabitants' feelings is the nature of the sound sources identified. Moreover, although mechanical sounds are perceived negatively, some urban locations are perceived as pleasant (for instance a park or a market) due to the presence of birds or voices. The goal of the O.R.U.S. project is to create a general system that provides an efficient representation of the acoustic environment, i.e. one that matches what citizens perceive and describe. Monitoring systems located in strategic areas compute in real time an indicator that reflects unpleasantness and is based on the identification of six categories of sound sources. Strategic maps providing spatial and temporal representations of the identified sound sources and of the unpleasantness indicator are produced. An experiment is conducted in a Parisian neighbourhood.
Article: Activity of Urban Sound Sources ABSTRACT: The objective characterization of an urban environment is generally assessed through the evolution of the sound level. This study complements the classical global indicators with new indicators that characterize the importance, or activity, of urban sources. They are calculated from activity curves, using the spectral characteristics of each type of source. The feelings of prominence, proximity and presence of sources are studied. For cars and mopeds, the prominence of a source is correlated with percentile criteria calculated on the activity curves, whereas the feeling of presence, defined as the time of source presence, does not seem to be interpreted as easily.
- ABSTRACT: The objective of this work is to characterize the noise annoyance of an urban environment based on the identification of sound sources. An in-situ perceptual test is carried out: a 45-minute walk through an urban site passing a park, a neighbourhood street and a market. The subjects, divided into two groups according to their walking direction, evaluate the sound environment along three perceptual dimensions (presence, prominence and proximity) for each category of sound source heard (vehicles, voices and birds); the loudness and the noise annoyance of the sequences are also rated. A statistical analysis then shows that the measurement of short-term noise annoyance (sequence duration: 90 seconds) in quiet ambiences (market and park) can rely only weakly on a physical intensity indicator, but is improved by taking into account variables linked to the sound sources characteristic of the ambience considered. Moreover, the walking direction has no influence on the evaluation, except at the transitions between ambiences.