Zhi Hao Lim’s scientific contributions

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (5)


Performance of the primary submissions on the development and evaluation sets.
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
  • Preprint
  • File available

April 2019 · 307 Reads · Ville Hautamaki · [...]

The I4U consortium was established to facilitate joint entries to the NIST speaker recognition evaluations (SRE). The latest such joint submission was to SRE 2018, in which the I4U entry was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned from the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view of the advancements, progress, and major paradigm shifts that we have witnessed as an SRE participant in the past decade, from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representations to deep speaker embeddings, and a shift of the central research challenge from channel compensation to domain adaptation.
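To make the paradigm shift the abstract names concrete, here is a minimal sketch of verifying a speaker trial by cosine scoring of deep speaker embeddings, the representation that replaced supervectors. The extract_embedding() function is a hypothetical stand-in for an x-vector-style network, not the paper's system; only the scoring step reflects common practice.

```python
import numpy as np

def extract_embedding(utterance_features: np.ndarray) -> np.ndarray:
    # Placeholder: a real system would run a trained DNN over the
    # frame-level features and pool them into a fixed-length vector.
    return utterance_features.mean(axis=0)

def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    # Cosine similarity between enrollment and test embeddings;
    # higher scores favour the same-speaker hypothesis.
    return float(np.dot(enroll, test) /
                 (np.linalg.norm(enroll) * np.linalg.norm(test)))

# Example trial: two utterances as (frames x feature_dim) arrays.
enroll_utt = np.random.randn(300, 40)
test_utt = np.random.randn(250, 40)
score = cosine_score(extract_embedding(enroll_utt),
                     extract_embedding(test_utt))
print(f"trial score: {score:.3f}")
```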





Table 2: List of telephone speech corpora partitioned into training, development, and test sets used in I4U for SRE'16.
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016

August 2017 · 324 Reads · 18 Citations

The 2016 speaker recognition evaluation (SRE'16) is the latest edition in the series of benchmarking events conducted by the National Institute of Standards and Technology (NIST). I4U is a joint entry to SRE'16, the result of collaboration and active exchange of information among researchers from sixteen institutes and universities across four continents. The joint submission and several of its 32 subsystems were among the top-performing systems. Much effort was devoted to two major challenges, namely, unlabeled training data and the dataset shift from Switchboard-Mixer to the new Call My Net dataset. This paper summarizes the lessons learned and presents the sixteen research groups' shared view of the recent advances, major paradigm shifts, and common tool chain used in speaker recognition, as witnessed in SRE'16. More importantly, we look into the intriguing question of fusing a large ensemble of subsystems and the potential benefit of large-scale collaboration.
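The fusion question the abstract raises can be illustrated with a short sketch: subsystem scores for each trial are stacked as features and a linear model learns the fusion weights. This uses scikit-learn's LogisticRegression as an illustrative stand-in for the calibration and fusion tooling actually used by I4U, and the scores here are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_subsystems = 1000, 32

# Synthetic development scores: each column is one subsystem's score
# per trial, with target (same-speaker) trials shifted upward.
labels = rng.integers(0, 2, n_trials)            # 1 = same speaker
scores = rng.normal(size=(n_trials, n_subsystems)) + 1.5 * labels[:, None]

# Fit linear fusion weights on the development set ...
fusion = LogisticRegression().fit(scores, labels)

# ... then fuse scores into a single log-odds value per trial.
fused = fusion.decision_function(scores)
print("fusion weights:", np.round(fusion.coef_[0][:5], 2), "...")
print("first fused scores:", np.round(fused[:3], 2))
```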

Citations (4)


... Promising results have been reported in the literature using machine learning (ML) and deep learning (DL) models and techniques, including support vector machines (SVMs) [2], multilayer perceptron (MLP) DNNs [3][4], recurrent neural networks (RNNs) with long short-term memory (LSTM) cells [5][6], and convolutional neural networks (CNNs) or convolutional-recurrent neural networks (CRNNs) [7]. In [8], a very large feature set was used, obtained by applying several statistical functions (e.g., mean, variance, and the first, second, and third quartiles) to mathematical descriptors (e.g., the maximum value, the minimum value, and frame-to-frame differences) computed for the log-energy, the estimated pitch, the Mel-frequency cepstral coefficients (MFCCs), and their delta and delta-delta coefficients, with the employed models being SVMs. ...
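A hedged sketch of the functional-based feature set this excerpt describes: statistical functions (mean, variance, quartiles) are applied to frame-level descriptors (here MFCCs plus delta and delta-delta coefficients) to produce one fixed-length vector per utterance for an SVM. The librosa/numpy calls are standard; the descriptor list in [8] is richer than shown here.

```python
import numpy as np
import librosa

def functional_features(y: np.ndarray, sr: int) -> np.ndarray:
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, frames)
    delta = librosa.feature.delta(mfcc)                   # first-order
    delta2 = librosa.feature.delta(mfcc, order=2)         # second-order
    descriptors = np.vstack([mfcc, delta, delta2])        # (39, frames)

    # Apply statistical functionals along the time axis.
    funcs = [
        descriptors.mean(axis=1),
        descriptors.var(axis=1),
        np.percentile(descriptors, 25, axis=1),
        np.percentile(descriptors, 50, axis=1),
        np.percentile(descriptors, 75, axis=1),
    ]
    return np.concatenate(funcs)  # one fixed-length utterance vector

y = np.random.randn(16000).astype(np.float32)  # 1 s of synthetic audio
print(functional_features(y, sr=16000).shape)  # (195,)
```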

Reference:

Speech Emotion Recognition Using Deep Neural Networks, Transfer Learning, and Ensemble Classification Techniques
Investigation of fixed-dimensional speech representations for real-time speech emotion recognition system
  • Citing Conference Paper
  • December 2017

... Apart from that, standard replay detection systems model all frames identically; however, voiced and unvoiced regions could mask channel information differently. A speaker's voice might mask the channel information in voiced regions, so if unvoiced portions are focused on, the channel information may become more pronounced [17]. This further suggests that voiced and unvoiced frames might not contain identically emphasized discriminative information for replay detection. ...
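A minimal sketch of the voiced/unvoiced partitioning idea in this excerpt: frames are split by a voicing decision so that features (and the channel cues they carry) from the two regions can be modelled separately. The energy and zero-crossing heuristic below is a common illustrative stand-in, not the detector used in the cited work, and the thresholds are arbitrary.

```python
import numpy as np

def partition_frames(y: np.ndarray, frame_len: int = 400, hop: int = 160):
    frames = [y[i:i + frame_len]
              for i in range(0, len(y) - frame_len, hop)]
    voiced, unvoiced = [], []
    for f in frames:
        energy = np.mean(f ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2
        # Voiced speech: relatively high energy, low zero-crossing rate.
        (voiced if energy > 1e-3 and zcr < 0.25 else unvoiced).append(f)
    return voiced, unvoiced

y = np.random.randn(16000).astype(np.float32)
v, u = partition_frames(y)
print(f"{len(v)} voiced frames, {len(u)} unvoiced frames")
```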

An investigation of spectral feature partitioning for replay attacks detection
  • Citing Conference Paper
  • December 2017

... In this regard, state-of-the-art SR systems incorporate joint factor analysis (JFA) [12] and the i-vector [13][14][15], one of its important variants, in their implementations. In recent years, deep neural networks (DNNs) have also been used in the structure of i-vector-based systems [16,17]. The initial attempts with DNNs for SR were made in the context of i-vector speaker modeling, in terms of computing the phonetic posteriors [18,19]. ...
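A sketch of the idea this excerpt names: in a DNN-based i-vector system, per-frame phonetic posteriors replace the GMM-UBM occupancies when accumulating the zero- and first-order Baum-Welch statistics that the i-vector extractor consumes. Shapes and data are illustrative; in practice the posterior matrix would come from a trained senone classifier.

```python
import numpy as np

n_frames, feat_dim, n_components = 500, 40, 64

features = np.random.randn(n_frames, feat_dim)        # frame features
logits = np.random.randn(n_frames, n_components)
posteriors = np.exp(logits)
posteriors /= posteriors.sum(axis=1, keepdims=True)   # softmax rows

# Zero-order statistics: expected frame count per component.
N = posteriors.sum(axis=0)                            # (n_components,)

# First-order statistics: posterior-weighted feature sums.
F = posteriors.T @ features                           # (n_components, feat_dim)

print(N.shape, F.shape)
```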

The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016

... Apart from the work carried out as part of this thesis, collaborations were conducted with colleagues from the LIA laboratory as well as with researchers from other research teams in the field of natural language processing (Lee et al., 2017; Ajili et al., 2017a; Laaridh et al., 2017; Rouvier et al., 2016; Ajili et al., 2017b, 2016; Ben Kheder et al., 2016b). ...

The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016