Conference Paper

Non-Speech Audio Event Detection

INESC-ID, Lisboa
DOI: 10.1109/ICASSP.2009.4959998
Conference: ICASSP 2009 - IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan
Source: IEEE Xplore

ABSTRACT Audio event detection is one of the tasks of the European project VIDIVIDEO. This paper focuses on the detection of non-speech events, and as such only searches for events in audio segments that have been previously classified as non-speech. Preliminary experiments with a small corpus of sound effects have shown the potential of this type of corpus for training purposes. This paper describes our experiments with SVM and HMM-based classifiers, using a 290-hour corpus of sound effects. Although we have only built detectors for 15 semantic concepts so far, the method seems easily portable to other concepts. The paper reports experiments with multiple features, different kernels and several analysis windows. Preliminary experiments on documentaries and films yielded promising results, despite the difficulties posed by the mixtures of audio events that characterize real sounds.


Available from: António Joaquim Serralheiro, Jun 29, 2015
  • Source
    Conference Paper: OxyBlood
    ABSTRACT: With the growth of the video game industry, interest in video game research has increased, leading to the study of Serious Games. Serious Games are generally perceived as games with purposes other than mere entertainment. These purposes range from education to training, marketing to design, among others. By exploiting the potential of these virtual worlds, we can provide the possibility to experience situations that would otherwise be impossible to experience in the real world, mainly due to reasons of cost, safety and time. Web browsers can now provide easy access to realistic virtual worlds with WebGL, which grants video game developers the tools to create compelling and rich environments that can be used in the future by virtually anyone. This paper presents a prototype of a Serious Game developed with WebGL.
    Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on; 07/2011
  • Source
    ABSTRACT: This paper focuses on Audio Event Detection (AED), a research area which aims to substantially enhance access to audio in multimedia content. With the ever-growing quantity of multimedia documents uploaded to the Web, automatic description of the audio content of videos can provide very useful information for indexing, archiving and searching multimedia documents. Preliminary experiments with a sound effects corpus showed good results for training models. However, performance is lower on the real data test set, which contains overlapping audio events and continuous background noise. This paper describes the AED framework and the methodologies used to build 6 Audio Event detectors, based on statistical machine learning tools (Support Vector Machines). The detectors showed some promising improvements when background noises were added to the training data, which consists of clean sound effects that are quite different from the real audio events in real-life videos and movies. A graphical interface prototype is also presented; it allows browsing a movie by its content and provides an audio event description with time codes.
    Proceedings of EUROCON 2011, International Conference on Computer as a Tool, 27-29 April 2011, Lisbon, Portugal; 04/2011
  • Source
    ABSTRACT: In this paper we present a novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. Our scheme is based on our recently introduced machine learning algorithm called Simultaneous Temporal And Contextual Splitting (STACS) that discovers the appropriate number of states and efficiently learns accurate Hidden Markov Model (HMM) parameters for the given data. STACS-based algorithms train HMMs up to five times faster than Baum-Welch, avoid the overfitting problem commonly encountered in learning large state-space HMMs using Expectation Maximization (EM) methods such as Baum-Welch, and achieve superior classification results on a very diverse dataset with minimal pre-processing. Furthermore, our scheme has proven to be highly effective for building real-world applications and has been integrated into a commercial surveillance system as an event detection component.
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, 14-19 March 2010, Sheraton Dallas Hotel, Dallas, Texas, USA; 01/2010