Conference Paper

Non-Speech Audio Event Detection

INESC-ID, Lisboa
DOI: 10.1109/ICASSP.2009.4959998 Conference: ICASSP 2009 - IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan
Source: IEEE Xplore


Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on the detection of non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have only built detectors for 15 semantic concepts so far, the method seems easily portable to other concepts. The paper reports experiments with multiple features, different kernels and several analysis windows. Preliminary experiments on documentaries and films yielded promising results, despite the difficulties posed by the mixtures of audio events that characterize real sounds.
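As a rough illustration of the frame-level front end such detectors typically build on, here is a minimal sketch (not from the paper; the frame length, hop size, and choice of features are assumptions for the example) computing two common low-level descriptors, short-time energy (STE) and zero-crossing rate (ZCR), over sliding analysis windows:

```python
import math

def frame_features(samples, frame_len=400, hop=200):
    """Return a list of (STE, ZCR) pairs, one per analysis window,
    for a mono signal given as a list of floats."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean squared amplitude of the frame.
        ste = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs
        # whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)
        feats.append((ste, zcr))
    return feats

# A pure tone yields a nonzero ZCR; silence yields zero energy and zero ZCR.
tone = [math.sin(2 * math.pi * 50 * n / 8000) for n in range(800)]
silence = [0.0] * 800
print(frame_features(silence)[0])  # (0.0, 0.0)
```

Feature vectors like these, computed per analysis window, would then be fed to a classifier (e.g. an SVM with one of the kernels compared in the paper); the specific feature set used by the authors is not detailed here.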

  • Source
    • "Acoustic event detection and classification (AED/C) [1] has recently drawn great attention from the audio research community [2]. Its potential applications are diverse, such as surveillance [3], healthcare [4], and meeting room transcription [5], among many others. "

    Full-text · Conference Paper · Oct 2015
  • Source
    • "A number of scene classification studies have explored the relevance of low-level features in capturing scene characteristics. These features include low-level time-based and frequency-based descriptors like short-time energy (STE), zero-crossing rate (ZCR), voicing features like periodicity and pitch information, linear predictive coding (LPC) coefficients, as well as the energy distribution entropy of discrete Fourier transform components [3] [4] [5] [6] [7] [8]. These reports suggest that low-level acoustic features are powerful in distinguishing simple scenes. "

    Full-text · Conference Paper · Oct 2015
  • Source
    • "Instead of matching SIFT or SURF features [5], distributed acoustic SLAM (DASLAM) would be based on matching acoustic events. There exist many algorithms in the field of audio event detection (AED) [6] [7]. However, in the case of acoustic SLAM the event detection and classification accuracy is irrelevant. "
    Conference Paper: Distributed acoustic SLAM
    ABSTRACT: Vision-based methods are very popular for simultaneous localization and environment mapping (SLAM). One can imagine that exploiting the natural acoustic landscape of the robot’s environment can prove to be a useful alternative to vision SLAM. Visual SLAM depends on matching local features between images, whereas distributed acoustic SLAM is based on matching acoustic events. The proposed DASLAM is based on distributed microphone arrays, where each microphone is connected to a separate, moving, controllable recording device, which requires compensation for their different clock shifts. We show that this controlled mobility is necessary to deal with underdetermined cases. Estimation is done using particle filtering. Results show that both tasks can be accomplished with good precision, even in the theoretically underdetermined cases. For example, we were able to achieve a mapping error as low as 17.53 cm for sound sources, with a localization error of 18.61 cm and a clock synchronization error of 42 μs for 2 robots and 2 sources.
    Full-text · Conference Paper · Aug 2015