Conference Paper

Non-Speech Audio Event Detection

INESC-ID, Lisboa
DOI: 10.1109/ICASSP.2009.4959998
Conference: ICASSP 2009 - IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan
Source: IEEE Xplore

ABSTRACT: Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on the detection of non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have only built detectors for 15 semantic concepts so far, the method seems easily portable to other concepts. The paper reports experiments with multiple features, different kernels and several analysis windows. Preliminary experiments on documentaries and films yielded promising results, despite the difficulties posed by the mixtures of audio events that characterize real sounds.

Available from: António Joaquim Serralheiro, Sep 25, 2015
  • Source
    • "Acoustic event detection and classification (AED/C) [1] recently draws great attention of audio research community [2]. Its potential applications are diverse, such as surveillance [3], healthcare [4], and meeting room transcription [5], among many others. "
    IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2015), New York, USA; 10/2015
  • Source
    • "SIFT or SURF features [5], distributed acoustic SLAM (DASLAM) would be based on matching acoustic events. There exist many algorithms in the field of audio event detection (AED) [6] [7]. However, in case of acoustic SLAM the event detection and classification accuracy is irrelevant. "
    Conference Paper: Distributed acoustic SLAM
    ABSTRACT: Vision-based methods are very popular for simultaneous localization and environment mapping (SLAM). One can imagine that exploiting the natural acoustic landscape of the robot’s environment can prove to be a useful alternative to vision SLAM. Visual SLAM depends on matching local features between images, whereas distributed acoustic SLAM is based on matching acoustic events. Proposed DASLAM is based on distributed microphone arrays, where each microphone is connected to a separate, moving, controllable recording device, which requires compensation for their different clock shifts. We show that this controlled mobility is necessary to deal with underdetermined cases. Estimation is done using particle filtering. Results show that both tasks can be accomplished with good precision, even for the theoretically underdetermined cases. For example, we were able to achieve mapping error as low as 17.53 cm for sound sources with localization error of 18.61 cm and clock synchronization error of 42 μs for 2 robots and 2 sources.
    EUSIPCO 2015; 08/2015
  • Source
    • "The feature extraction module extracts different types of features, some of these features are common in Automatic Speech Recognition (ASR), others borrowed from Music Information Retrieval (MIR). The audio inference module concerns the final detection of the audio events, which can be achieved using different machine learning methods [1], [2]. Different techniques are used and in the literature there is no optimal solution, because many times they focus on specific cases. "
    Conference Paper: OxyBlood
    ABSTRACT: With the growth of the video game industry, interest in video game research has increased, leading to the study of Serious Games. Serious Games are generally perceived as games with purposes other than mere entertainment. These purposes range from education to training, marketing to design, among others. By exploiting the potential of these virtual worlds, we can provide the possibility to experience situations that would otherwise be impossible to experience in the real world, mainly due to reasons of cost, safety and time. Web browsers can now provide easy access to realistic virtual worlds with WebGL, which grants video game developers the tools to create compelling and rich environments that can be used in the future by virtually anyone. This paper presents a prototype of a Serious Game developed with WebGL.
    Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on; 07/2011