Conference Paper

Non-Speech Audio Event Detection

INESC-ID, Lisboa
DOI: 10.1109/ICASSP.2009.4959998
Conference: ICASSP 2009 - IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan
Source: IEEE Xplore

ABSTRACT: Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on the detection of non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have only built detectors for 15 semantic concepts so far, the method seems easily portable to other concepts. The paper reports experiments with multiple features, different kernels and several analysis windows. Preliminary experiments on documentaries and films yielded promising results, despite the difficulties posed by the mixtures of audio events that characterize real sounds.

  • ABSTRACT: Audio event detection (AED) and recognition is a signal processing and analysis domain used in a wide range of applications including surveillance, home automation and behavioral assessment. The field presents numerous challenges to the current state-of-the-art due to its highly nonlinear nature. High false alarm rates (FARs) in such applications particularly limit the capabilities of vision-based perimeter monitoring systems by inducing high operator dependence. On the other hand, conventional fence-based vibration detectors and pressure-driven “taut wires” offer high sensitivity at the cost of a high FAR due to debris, animals and weather.
    Robotics and Autonomous Systems 04/2015; DOI:10.1016/j.robot.2015.04.004 · 1.11 Impact Factor
  • ABSTRACT: In this paper, we propose a new front-end for Acoustic Event Classification (AEC) tasks. First, we study the spectral characteristics of different acoustic events in comparison with the structure of speech spectra. Second, from the findings of this study, we propose a new parameterization for AEC, which is an extension of the conventional Mel Frequency Cepstrum Coefficients (MFCC) and is based on the high pass filtering of the acoustic event signal. The proposed front-end has been tested in clean and noisy conditions and compared to the conventional MFCC in an AEC task. Results support the fact that the high pass filtering of the audio signal is, in general terms, beneficial for the system, showing that the removal of frequencies below 100-275 Hz in the feature extraction process in clean conditions, and below 400-500 Hz in noisy conditions, significantly improves the performance of the system with respect to the baseline.
    Computer Speech & Language 04/2014; 30(1). DOI:10.1016/j.csl.2014.04.001 · 1.81 Impact Factor
  • Foundations and Trends® in Information Retrieval 01/2012; 5(3):235-422. DOI:10.1561/1500000020
