Conference Paper

EXTRACTING DEEP BOTTLENECK FEATURES USING STACKED AUTO-ENCODERS

DOI: 10.1109/ICASSP.2013.6638284 Conference: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Volume: 38

ABSTRACT In this work, a novel training scheme for generating bottleneck features from deep neural networks is proposed. A stack of denoising auto-encoders is first trained in a layer-wise, unsupervised manner. Afterwards, a bottleneck layer and an additional layer are added on top, and the whole network is fine-tuned to predict target phoneme states. Experiments on a Cantonese conversational telephone speech corpus show that increasing the number of auto-encoders in the network produces more useful features, but requires pre-training, especially when little training data is available. Using additional unlabeled data solely for pre-training yields further gains. Evaluations on larger datasets and on different system setups demonstrate the general applicability of our approach. In terms of word error rate, relative improvements of 9.2% (Cantonese, ML training), 9.3% (Tagalog, BMMI-SAT training), 12% (Tagalog, confusion network combination with MFCCs), and 8.7% (Switchboard) are achieved.
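The training scheme summarized above can be sketched in a few lines of NumPy: each denoising auto-encoder corrupts its input, reconstructs the clean version through tied weights, and is trained greedily on the encoding produced by the layers below it. This is an illustrative toy implementation, not the authors' code; the corruption level, learning rate, layer sizes, and toy data are arbitrary assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoEncoder:
    """One layer: corrupt the input, encode, reconstruct with tied
    weights, and minimize squared reconstruction error."""
    def __init__(self, n_in, n_hidden, noise=0.2, lr=0.1):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # encoder bias
        self.c = np.zeros(n_in)       # decoder bias
        self.noise, self.lr = noise, lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x):
        # Denoising: randomly zero out a fraction of the input features.
        x_tilde = x * (rng.random(x.shape) > self.noise)
        h = self.encode(x_tilde)
        x_hat = sigmoid(h @ self.W.T + self.c)         # tied decoder
        # Backprop of the squared reconstruction error.
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ self.W) * h * (1.0 - h)
        self.W -= self.lr * (x_tilde.T @ d_hid + d_out.T @ h) / len(x)
        self.b -= self.lr * d_hid.mean(axis=0)
        self.c -= self.lr * d_out.mean(axis=0)
        return float(np.mean((x_hat - x) ** 2))

def pretrain_stack(data, layer_sizes, epochs=50):
    """Greedy layer-wise pre-training: each auto-encoder is trained
    on the (clean) encoding produced by the layers below it."""
    stack, x = [], data
    for n_hidden in layer_sizes:
        dae = DenoisingAutoEncoder(x.shape[1], n_hidden)
        for _ in range(epochs):
            dae.train_step(x)
        stack.append(dae)
        x = dae.encode(x)             # input to the next layer
    return stack, x

# Pre-train a two-layer stack on toy "feature" vectors.
features = rng.random((64, 20))
stack, hidden = pretrain_stack(features, [16, 8])
```

In the scheme described by the abstract, the pre-trained encoders would then initialize a deep network topped with a narrow bottleneck layer and an output layer over phoneme states; after supervised fine-tuning, the bottleneck activations serve as features for the downstream recognition system.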

