Claude Barras

Publications (25)

  • Conference Proceeding: Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data
    ABSTRACT: Short-term cepstral features have long been chosen as standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, calculated by a multi-layer perceptron (MLP) from much longer stretches of time, have been gradually adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with long-term MLP features makes it possible to significantly improve the performance of ASR. In this work, we investigate the possibility of augmenting short-term cepstral features with MLP features in order to improve the performance of text-independent speaker verification. We show that, even though augmenting cepstral features with MLP features does not directly improve speaker verification performance, reducing the dimension of the augmented features using principal component analysis (PCA) yields a relative reduction of around 12% in the equal error rate (EER). Experiments are performed on telephone data of the 2008 NIST SRE (speaker recognition evaluation) database.
    INTERSPEECH; 08/2013
  • Conference Proceeding: Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization.
    INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011; 01/2011
  • Article: Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition.
    IEEE Transactions on Audio, Speech & Language Processing. 01/2010; 18:1366-1378.
  • Chapter: Acoustic Speaker Identification: The LIMSI CLEAR’07 System
    ABSTRACT: The CLEAR 2007 acoustic speaker identification task aims to identify speakers in CHIL seminars via the acoustic channel. The LIMSI system for this task consists of a standard Gaussian mixture model based system working on cepstral coefficients, with MAP adaptation of a Universal Background Model (UBM). It builds upon the LIMSI CLEAR’06 system with several modifications: removal of feature normalization and frame filtering, and pooling of all speaker enrollment data for UBM training. The primary system uses beamforming of all audio channels, while a single channel is selected for the contrastive system. This latter system performs the best and improves the baseline system by 50% relative for the 1-second and 5-second test conditions.
    06/2008: pages 233-239;
  • Conference Proceeding: Annotation and analysis of overlapping speech in political interviews.
    Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, 26 May - 1 June 2008, Marrakech, Morocco; 01/2008
  • Conference Proceeding: Comparing prosodic models for speaker recognition.
    INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008; 01/2008
  • Conference Proceeding: The LIMSI RT07 Lecture Transcription System.
    Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers; 01/2007
  • Conference Proceeding: Multi-stage Speaker Diarization for Conference and Lecture Meetings.
    Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers; 01/2007
  • Conference Proceeding: Acoustic Speaker Identification: The LIMSI CLEAR'07 System.
    Multimodal Technologies for Perception of Humans, International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers; 01/2007
  • Conference Proceeding: Speaker Diarization: From Broadcast News to Lectures.
    Machine Learning for Multimodal Interaction, Third International Workshop, MLMI 2006, Bethesda, MD, USA, May 1-4, 2006, Revised Selected Papers; 01/2006
  • Article: Multistage speaker diarization of broadcast news.
    IEEE Transactions on Audio, Speech & Language Processing. 01/2006; 14:1505-1512.
  • Conference Proceeding: The CLEAR'06 LIMSI Acoustic Speaker Identification System for CHIL Seminars.
    Multimodal Technologies for Perception of Humans, First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, April 6-7, 2006, Revised Selected Papers; 01/2006
  • Conference Proceeding: Combining speaker identification and BIC for speaker diarization.
    INTERSPEECH 2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005; 01/2005
  • Article: Feature And Score Normalization For Speaker
    Claude Barras, Jean-Luc Gauvain
    ABSTRACT: This paper presents some experiments with feature and score normalization for text-independent speaker verification of cellular data. The speaker verification system is based on cepstral features and Gaussian mixture models with 1024 components. The following methods, which have been proposed for feature and score normalization, are reviewed and evaluated on cellular data: cepstral mean subtraction (CMS), variance normalization, feature warping, T-norm, Z-norm and the cohort method. We found that the combination of feature warping and T-norm gives the best results on the NIST 2002 test data (for the one-speaker detection task). Compared to a baseline system using both CMS and variance normalization and achieving a 0.410 minimal decision cost function (DCF), feature warping and T-norm respectively bring 8% and 12% relative reductions, whereas the combination of both techniques yields a 22% relative reduction, reaching a DCF of 0.320. This result approaches the state-of-the-art performance level obtained for speaker verification with land-line telephone speech.
    02/2004;
  • Article: Processing Broadcast Audio for Information Access
    ABSTRACT: This paper addresses recent progress in speaker-independent, large vocabulary, continuous speech recognition, which has opened up a wide range of near and mid-term applications. One rapidly expanding application area is the processing of broadcast audio for information access. At LIMSI, broadcast news transcription systems have been developed for English, French, German, Mandarin and Portuguese, and systems for other languages are under development. Audio indexation must take into account the specificities of audio data, such as needing to deal with the continuous data stream and an imperfect word transcription.
    10/2002;
  • Article: Transcribing Audio-Video Archives
    Claude Barras, Alexandre Allauzen, Lori Lamel
    ABSTRACT: This paper addresses the automatic transcription of audio-video archives using a state-of-the-art broadcast news speech transcription system. A 9-hour corpus spanning the latter half of the 20th century (1945-1995) has been transcribed and an analysis of the transcription quality carried out. In addition to the challenges of transcribing heterogeneous broadcast news data, we are faced with changing properties of the archive over time, such as the audio quality, the speaking style, vocabulary items and manner of expression. After assessing the performance of the transcription system, several paths are explored in an attempt to reduce the mismatch between the acoustic and language models and the archived data.
    07/2002;
  • Article: Automatic Transcription Of Compressed Broadcast Audio
    Claude Barras, Lori Lamel, Jean-Luc Gauvain
    ABSTRACT: With increasing volumes of audio and video data broadcast over the web, it is of interest to assess the performance of state-of-the-art automatic transcription systems on compressed audio data for media indexation applications. In this paper the performance of the LIMSI 10x French broadcast news transcription system is measured on a two-hour audio set for a range of MP3 and RealAudio codecs at various bitrates and the GSM codec used for European cellular phone communications. The word error rates are compared with those obtained on high quality PCM recordings prior to compression. For a 6.5 kbps audio bit rate (the most commonly used on the web), word error rates under 40% can be achieved, which makes automatic media monitoring systems over the web a realistic task.
    06/2001;
  • Article: An Overview of Speech Recognition Activities at LIMSI
    ABSTRACT: This paper provides an overview of recent activities at LIMSI in multilingual speech recognition and its applications. The main goal of speech recognition is to provide a transcription of the speech signal as a sequence of words. Speech recognition is a core technology for most applications involving voice technology. The two main classes of applications currently addressed are transcription and indexation of broadcast data and spoken language dialog systems for information access. Speaker-independent, large vocabulary, continuous speech recognition systems for different European languages (French, German and British English) and for American English and Mandarin Chinese have been developed. These systems rely on supporting research in acoustic-phonetic modeling, lexical modeling and language modeling.
    03/2001;
  • Article: The LIMSI SDR System for TREC-9
    ABSTRACT: In this paper we describe the LIMSI Spoken Document Retrieval system used in the TREC-9 evaluation. This system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Compared with the LIMSI TREC-8 system, this year's system is able to index the audio data without knowledge of the story boundaries using a double windowing approach. The query expansion procedure of the information retrieval component has been revised and makes use of contemporaneous text sources.
    03/2001;
  • Article: Automatic Transcription
    Claude Barras, Lori Lamel, Jean-Luc Gauvain
    ABSTRACT: With increasing volumes of audio and video data broadcast over the web, it is of interest to assess the performance of state-of-the-art automatic transcription systems on compressed audio data for media indexation applications. In this paper the performance of the LIMSI 10x French broadcast news transcription system is measured on a two-hour audio set for a range of MP3 and RealAudio codecs at various bitrates and the GSM codec used for European cellular phone communications. The word error rates are compared with those obtained on high quality PCM recordings prior to compression. For a 6.5 kbps audio bit rate (the most commonly used on the web), word error rates under 40% can be achieved, which makes automatic media monitoring systems over the web a realistic task.
    03/2001;