Nikos Fakotakis

University of Patras, Rhion, West Greece, Greece

Publications (291) · 106.72 Total impact

  • T. Theodorou · I. Mporas · N. Fakotakis
    ABSTRACT: Audio analysis of a speaker's surroundings has been a first step for several processing systems that support the speaker's mobility throughout daily life. These algorithms usually perform short-time analysis, decomposing the incoming events in the time and frequency domains. In this paper, an automatic sound recognizer is studied, which investigates audio events of interest from an urban environment. Our experiments were conducted using a closed set of audio events, from which well-known and commonly used audio descriptors were extracted, and models were trained using powerful machine learning algorithms. The best urban sound recognition performance was achieved by support vector machines (SVMs), with an accuracy of approximately 93%.
    Article · Jan 2015
  • Theodoros Theodorou · Iosif Mporas · Nikos Fakotakis
    Article · Oct 2014
  • ABSTRACT: Expression of emotional state is considered to be a core facet of an individual's emotional competence. Emotional processing in bulimia nervosa (BN) has rarely been studied and has not been considered from a broad perspective. This study aimed at examining implicit and explicit emotional expression in BN patients, in the acute state and after recovery. Sixty-three female participants were included: 22 BN, 22 recovered BN (R-BN), and 19 healthy controls (HC). The clinical cases were drawn from consecutive admissions and diagnosed according to DSM-IV-TR diagnostic criteria. Self-reported (explicit) emotional expression was measured with the State-Trait Anger Expression Inventory-2, the State-Trait Anxiety Inventory, and the Symptom Checklist-90-Revised. Emotional facial expression (implicit) was recorded by means of an integrated camera, using facial feature tracking, during a 20-minute therapeutic video game. In the acute illness, explicit emotional expression [anxiety (p<0.001) and anger (p<0.05)] was increased. In the recovered group, this was decreased to an intermediate level between the acute illness and healthy controls [anxiety (p<0.001) and anger (p<0.05)]. In the implicit measurement of emotional expression, patients with acute BN expressed more joy (p<0.001) and less anger (p<0.001) than both healthy controls and those in the recovered group. These findings suggest that there are differences in implicit and explicit emotional processing in BN, which are significantly reduced after recovery, suggesting an improvement in emotional regulation.
    Article · Jul 2014 · PLoS ONE
  • ABSTRACT: Monitoring of animal communities is necessary to assess the conservation status of threatened species and to implement efficient conservation measures. However, classical observer-based survey techniques are expensive and time-consuming. Automated acoustic monitoring provides a solution for monitoring sound-emitting animals, such as mammals, birds, amphibians, and insects. Several autonomous recording units (ARUs) can be operated simultaneously, 24/7.
    Conference Paper · Sep 2013
  • ABSTRACT: We report on the development of an automated acoustic bird recognizer with improved noise robustness, which is part of a long-term project aimed at establishing an automated biodiversity monitoring system at the Hymettus Mountain near Athens, Greece. In particular, a typical audio processing strategy, which has proven quite successful in various audio recognition applications, was amended with a simple and effective mechanism for integrating temporal contextual information into the decision-making process. In the present implementation, temporal contextual information is integrated by jointly post-processing the recognition results for a number of preceding and subsequent audio frames. To evaluate the usefulness of the proposed scheme for acoustic bird recognition, we experimented with six widely used classifiers and a set of real-field audio recordings of seven bird species commonly present at the Hymettus Mountain. The highest recognition accuracy obtained on the real-field data was approximately 93%, while experiments with additive noise showed significant robustness at low signal-to-noise ratios. In all cases, the integration of temporal contextual information improved the overall accuracy of the recognizer.
    Article · Jul 2013 · International Journal of Intelligent Systems Technologies and Applications
  • ABSTRACT: The MoveOn speech and noise database was purposely designed and implemented in support of research on spoken dialogue interaction in a motorcycle environment. The distinctiveness of the MoveOn database results from the requirements of the application domain (an information support and operational command and control system for the two-wheel police force) and from the specifics of the adverse open-air acoustic environment. In this article, we first outline the target application, which motivates the database design and purpose, and then report on the implementation details. We discuss the main challenges related to the choice of equipment and the organization of the recording sessions, as well as some difficulties experienced during this effort. We give a detailed account of the database statistics and the suggested splits into subsets, and discuss results from automatic speech recognition experiments that illustrate the complexity of the operational environment.
    Article · Jun 2013 · Language Resources and Evaluation
  • ABSTRACT: The performance of recent dereverberation methods for preprocessing reverberant speech prior to Automatic Speech Recognition (ASR) is compared over an extensive range of room and source-receiver configurations. It is shown that room acoustic parameters such as clarity (C50) and definition (D50) correlate well with the ASR results. When available, such room acoustic parameters can provide insight into reverberant-speech ASR performance and into the potential improvement from dereverberation preprocessing. It is also shown that a recent dereverberation method based on perceptual modelling can be applied in this context and achieves a significant improvement in Phone Recognition (PR), especially under highly reverberant conditions.
    Article · Jan 2013 · Computer Speech & Language
  • I Mporas · I Kotinas · D Kyrgiopoulos · O Jahn · K Riede · O Kocsis · N Fakotakis
    Article · Jan 2013
  • ABSTRACT: We report on recent progress in the development of an automated bioacoustic bird recognizer, which is part of a long-term project aimed at establishing an automated biodiversity monitoring system at the Hymettus Mountain near Athens. In particular, employing a classical audio processing strategy, which has proven quite successful in various audio recognition applications, we evaluate the appropriateness of six classifiers for the bird species recognition task. In the experimental evaluation of the acoustic bird recognizer, we made use of real-field audio recordings of two bird species that are known to be present at the Hymettus Mountain. Encouraging recognition accuracy was obtained on the real-field data, and further experiments with additive noise demonstrated significant noise robustness in low signal-to-noise ratio (SNR) conditions.
    Conference Paper · Nov 2012
  • ABSTRACT: We describe the design, implementation and evaluation of a novel speech interface, as part of a platform for the development of serious games. The speech interface consists of a speech recognition component and a component for emotion recognition from speech. It is built on a platform designed and implemented to support the development of serious games for the cognitive-based treatment of patients with mental disorders. The implementation of the speech interface is based on the Olympus/RavenClaw framework, which has been extended for the needs of the specific serious games and the respective application domain by integrating new components, such as emotion recognition from speech. The speech interface was evaluated on a purposely collected, domain-specific dataset. The speech recognition experiments show that emotional speech moderately affects the performance of the speech interface. Furthermore, the emotion detectors demonstrated satisfactory performance for the emotional states of interest, Anger and Boredom, and contributed towards successful modelling of the patient's emotional status. The performance achieved for speech recognition and for the detection of the emotional states of interest was satisfactory. A recent evaluation of the serious games showed that patients began to display new styles of coping with negative emotions in everyday stressful situations.
    Article · Sep 2012 · Expert Systems with Applications
  • S. Ntalampiras · I. Potamitis · N. Fakotakis
    ABSTRACT: Automatic recognition of sound events can be valuable for efficient situation analysis of audio scenes. In this article, we address the problem of detecting human activities in natural environments based solely on the acoustic modality. The primary goal is the continuous acoustic surveillance of a particular natural scene for illegal human activities (trespassing, hunting, etc.) in order to promptly alert an authorized officer to take the appropriate measures. We constructed a novel system that is mainly characterized by its hierarchical structure and by its acoustic parameters. Each sound class is represented by a hidden Markov model built using descriptors from the time, frequency, and wavelet domains. The system can automatically adapt to the acoustic conditions of different scenes through a feedback loop that performs unsupervised model refinement. We conducted extensive experiments to assess the performance of the system with respect to its recognition and detection capabilities. To this end, we employed confusion matrices and Detection Error Tradeoff (DET) curves, and we report high performance for both detection and recognition.
    Article · Sep 2012
  • ABSTRACT: This paper gives an overview of the assessment and evaluation methods that have been used to determine the quality of the INSPIRE smart home system. The system allows different home appliances to be controlled via speech, and consists of speech and speaker recognition, speech understanding, dialogue management, and speech output components. The performance of these components is first assessed individually, and then the entire system is evaluated in an interaction experiment with test users. Initial results of the assessment and evaluation are given, in particular with respect to the transmission channel impact on speech and speaker recognition, and the assessment of speech output for different system metaphors.
    Dataset · Aug 2012
  • ABSTRACT: We propose a two-stage phone duration modelling scheme, which can be applied to improve prosody modelling in speech synthesis systems. This scheme builds on a number of independent feature constructors (FCs) employed in the first stage, and a phone duration model (PDM) that operates on an extended feature vector in the second stage. The feature vector that acts as input to the first stage consists of numerical and non-numerical linguistic features extracted from text. The extended feature vector is obtained by appending the phone duration predictions estimated by the FCs to the initial feature vector. Experiments on the American-English KED TIMIT and the Modern Greek WCL-1 databases validated the advantage of the proposed two-stage scheme, which improves prediction accuracy over the best individual predictor and over a two-stage scheme that simply fuses the first-stage outputs. Specifically, compared with the best individual predictor, relative reductions in the mean absolute error and the root mean square error of 3.9% and 3.9% on KED TIMIT, and of 4.8% and 4.6% on WCL-1, respectively, are observed.
    Article · Aug 2012 · Computer Speech & Language
  • T Ganchev · I Potamitis · O Jahn · K Riede · N Fakotakis
    Article · Jun 2012
  • I Mporas · T Ganchev · I Potamitis · N Fakotakis · O Kocsis · O Jahn · K Riede
    Article · Jun 2012
  • ABSTRACT: We report on a research effort aimed at developing an acoustic bird activity detector (ABAD), which plays an important role in automating traditional biodiversity assessment studies, presently performed by human experts. The proposed on-line ABAD is considered an integral part of an automated system for acoustic identification of bird species, which is currently under development. In particular, taking advantage of real-field recordings collected at the Hymettus Mountain near Athens, we investigate the applicability of various machine learning techniques for the needs of our ABAD, which is intended to run on a mobile device. Performance is reported in terms of recognition accuracy at the audio frame level, due to the restrictions imposed by the requirement of run-time decision making with limited memory and energy resources. We report a recognition accuracy of approximately 86% at the frame level, which is quite promising and encourages further research in that direction.
    Chapter · May 2012
  • I Mporas · T Ganchev · I Potamitis · N Fakotakis · O Kocsis · O Jahn · K Riede
    Article · Apr 2012
  • T Ganchev · I Potamitis · O Jahn · K Riede · N Fakotakis
    Article · Apr 2012
  • E. Skodras · G. Siogkas · E. Dermatas · N. Fakotakis
    ABSTRACT: Vehicle detection using on-board cameras is an integral component of many driver assistance systems that aim to warn the driver of impending collisions. In this paper, an automated algorithm for detecting preceding vehicles is proposed, based on the detection of rear vehicle lights. Unlike many systems that use static thresholds for the red color segmentation of rear lights, our method combines color and radial symmetry cues while the threshold is dynamically adapted. The extracted candidate rear lights are morphologically paired in order to define possible areas where vehicles are present. Vehicle presence is then verified through an axial symmetry check. Finally, experimental results are presented that demonstrate the system's high detection rates and its robustness even under adverse illumination and weather conditions.
    Conference Paper · Apr 2012
  • T Kostoulas · C Tsimpouris · O Jahn · K Riede · T Ganchev · O Kocsis · N Fakotakis
    Article · Jan 2012

Publication Stats

2k Citations
106.72 Total Impact Points

Institutions

  • 1925-2014
    • University of Patras
      • Laboratory of Wire Communications
      • Department of Electrical and Computer Engineering
      Rhion, West Greece, Greece
  • 2012
    • Hospital Universitari de Bellvitge
      • Department of Psychiatry
      l'Hospitalet de Llobregat, Catalonia, Spain
  • 2001
    • University of Bucharest
      București, Bucureşti, Romania