
Omid Ghahabi - Ph.D.
- Research Scientist at EML Speech Technology GmbH
About
23
Publications
5,987
Reads
304
Citations
Introduction
Current institution
EML Speech Technology GmbH
Current position
- Research Scientist
Additional affiliations
November 2016 - July 2020
EML Speech Technology GmbH
Position
- Researcher
July 2015 - July 2015
RTTH Summer School on Speech Technology: A Deep Learning Perspective
Position
- Lecturer
Description
- Deep Neural Networks for Speaker Recognition. http://rtthss2015.talp.cat/videos/RTTHSS2015-07Jul2015_Gahabi.mp4 http://rtthss2015.talp.cat/download/RTTHSS2015_Ghahabi.pdf
Education
September 2011 - February 2016
September 2006 - March 2009
Publications
Publications (23)
Speech Activity Detection (SAD), locating speech segments within an audio recording, is a main part of most speech technology applications. Robust SAD is usually more difficult in noisy conditions with varying signal-to-noise ratios (SNR). The Fearless Steps challenge has recently provided such data from the NASA Apollo-11 mission for different spe...
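For background, the simplest SAD baseline thresholds per-frame energy relative to the loudest frame. The sketch below is illustrative only: the challenge systems described here are far more robust to varying SNR than an energy threshold, and every name and parameter in this example is invented for illustration.

```python
import numpy as np

def energy_sad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Label each frame as speech (True) or non-speech (False) by
    comparing its log energy to a threshold relative to the peak frame.
    A toy baseline; real SAD systems use trained classifiers."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    frames = np.array(frames)
    energy = np.sum(frames ** 2, axis=1) + 1e-12      # per-frame energy
    log_e = 10.0 * np.log10(energy / energy.max())    # dB relative to peak
    return log_e > threshold_db

# toy signal: silence, then a loud "speech" burst, then silence
rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.standard_normal(1600),
                      0.5 * rng.standard_normal(1600),
                      0.001 * rng.standard_normal(1600)])
labels = energy_sad(sig)
```

An energy threshold fails exactly where the abstract says robust SAD is hard: at low or varying SNR the energy gap between speech and background noise collapses.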
This technical report describes the EML submission to the first VoxCeleb speaker diarization challenge. Although the aim of the challenge has been the offline processing of the signals, the submitted system is basically the EML online algorithm which decides about the speaker labels in runtime approximately every 1.2 sec. For the first phase of the...
It is supposed in Speaker Recognition (SR) that everyone has a unique voice which could be used as an identity rather than or in addition to other identities such as fingerprint, face, or iris. Even though steps have been taken long ago to apply neural networks in SR, recent advances in computing hardware, new deep learning (DL) architectures and t...
Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker and/or phonetic labels for the background data, which are not easily accessible in practice. On the other hand, the lack of speaker-labeled background data makes a big performance gap, i...
Over the last few years, i-vectors have been the state-of-the-art technique in speaker and language recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need speaker and/or phonetic labels for the background data, which are not easily acce...
Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternat...
The lack of labeled background data makes a big performance gap between cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring baseline techniques for i-vectors in speaker recognition. Although there are some unsupervised clustering techniques to estimate the labels, they cannot accurately predict the true labels and they also assume...
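For context, cosine scoring needs no speaker-labeled background data, which is why it serves as the fallback baseline that the abstract contrasts with PLDA. A minimal sketch of cosine scoring between two i-vectors (the function name and the toy vectors are invented for illustration):

```python
import numpy as np

def cosine_score(enroll_ivec, test_ivec):
    """Cosine similarity between an enrollment and a test i-vector.
    Requires no labeled background data, unlike PLDA scoring."""
    a = enroll_ivec / np.linalg.norm(enroll_ivec)
    b = test_ivec / np.linalg.norm(test_ivec)
    return float(np.dot(a, b))

# toy usage: a test i-vector close to the enrollment one scores high
rng = np.random.default_rng(7)
e = rng.standard_normal(400)              # toy enrollment i-vector
t = e + 0.1 * rng.standard_normal(400)    # toy same-speaker test i-vector
score = cosine_score(e, t)
```

PLDA, by contrast, models within- and between-speaker variability and therefore needs speaker labels to train, which is the performance gap the abstract refers to.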
This paper is focused on the application of the Language Identification (LID) technology for intelligent vehicles. We cope with short sentences or words spoken in moving cars in four languages: English, Spanish, German, and Finnish. As the response time of the LID system is crucial for user acceptance in this particular task, speech signals of diff...
Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying...
In this paper, we propose to discriminatively model target and impostor spectral features using Deep Belief Networks (DBNs) for speaker recognition. In the feature level, the number of impostor samples is considerably large compared to previous works based on i-vectors. Therefore, those i-vector based impostor selection algorithms are not computati...
The use of Restricted Boltzmann Machines (RBM) is proposed in this paper as a non-linear transformation of GMM supervectors for speaker recognition. It will be shown that the RBM transformation will increase the discrimination power of raw GMM supervectors for speaker recognition. The experimental results on the core test condition of the NIST SRE...
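To illustrate the general idea, the hidden-layer activation of an RBM can act as a non-linear transformation of a GMM supervector. The sketch below uses random placeholder weights; in a real system `W` and `b_hidden` would be learned (e.g. by contrastive-divergence training), and all names and dimensions here are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_transform(supervector, W, b_hidden):
    """Hidden-unit activation probabilities of an RBM, used as a
    non-linear transformation of a GMM supervector."""
    return sigmoid(supervector @ W + b_hidden)

rng = np.random.default_rng(42)
sv = rng.standard_normal(512)              # toy GMM supervector
W = rng.standard_normal((512, 128)) * 0.01 # placeholder for trained weights
b = np.zeros(128)                          # placeholder hidden biases
h = rbm_transform(sv, W, b)                # non-linear fixed-dim representation
```

The sigmoid squashing is what makes the mapping non-linear; the raw supervector is projected and then bounded to (0, 1) per hidden unit.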
An effective global impostor selection method is proposed in this paper for discriminative Deep Belief Networks (DBN) in the context of a multi-session i-vector based speaker recognition. The proposed method is an iterative process in which in each iteration the whole impostor i-vector dataset is divided randomly into two subsets. The impostors in...
The acoustic environment of a typical neonatal intensive care unit (NICU) is very rich and may contain a large number of different sounds, which come either from the equipment or from the human activities taking place in it. There exists a medical concern about the effect of that acoustical environment on preterm infants, since loud sounds or par...
In this paper we propose an impostor selection method for a Deep Belief Network (DBN) based system which models i-vectors in a multi-session speaker verification task. In the proposed method, instead of choosing a fixed number of most informative impostors, a threshold is defined according to the frequencies of impostors. The selected impostors are...
The use of Deep Belief Networks (DBNs) is proposed in this paper to model discriminatively target and impostor i-vectors in a speaker verification task. The authors propose to adapt the network parameters of each speaker from a background model, which will be referred to as Universal DBN (UDBN). It is also suggested to backpropagate class errors up...
A fast, efficient and scalable algorithm is proposed, in this paper, for re-encoding of perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high quality speech and is called "adaptive variable degree-k zero-trees" (AVDZ). The quantization process is carried out by taking into account some basic perceptual considerations,...
In this paper an efficient and low complexity perceptual method is proposed for quantizing the wavelet packet coefficients of high quality speech signals. The performance of the proposed method is compared, using the same codec, with the case where all coefficients are quantized using a fixed number of bits. The results on 500 TIMIT files show that...
In this paper an adaptive variable degree-k zero-tree (AVDZ) algorithm is proposed for re-encoding of perceptually quantized wavelet packet transform coefficients of high quality wideband speech. Its performance is compared with two well-known schemes comprising: 1- Embedded Zero-tree Wavelet (EZW) and 2- The set partitioning in hierarchical trees...
This paper evaluates the problems of implementing two well-known zero-tree-based re-encoding schemes of Embedded Zero-tree Wavelet (EZW) and the set partitioning in hierarchical trees (SPIHT) for perceptually audio and high quality speech coding. Since the original EZW and SPIHT algorithms are designed for image compression, some new modifications...
This paper reports on the results of four re-encoding schemes on perceptually quantized wavelet packet transform (WPT) coefficients of audio and high quality speech. These schemes comprises: 1- embedded zero-tree wavelet (EZW) 2- The set partitioning in hierarchical trees (SPIHT) 3-JPEG-based entropy/run length Huffman and 4-JPEG-type audio Huffman...