Javier Tejedor

University Foundation San Pablo CEU | CEU

PhD Computer Science and Telecommunic. Engineering

About

86
Publications
12,220
Reads
870
Citations
Additional affiliations
February 2008 - August 2008
The University of Edinburgh
Position
  • Research visitor
Description
  • Internship at AMIDA project
April 2007 - September 2007
The University of Edinburgh
Position
  • Research visitor
Description
  • Keyword Spotting and Spoken Term Detection

Publications (86)
Preprint
Full-text available
This work focuses on designing low complexity hybrid tensor networks by considering trade-offs between the model complexity and practical performance. Firstly, we exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN. Secondly, a hybrid model combining LR-TT-DNN with a convoluti...
Preprint
Full-text available
This work aims to design a low complexity spoken command recognition (SCR) system by considering different trade-offs between the number of model parameters and classification accuracy. More specifically, we exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline. Our command recognition system, namely...
Preprint
Full-text available
This work investigates an extension of transfer learning applied in machine learning algorithms to the emerging hybrid end-to-end quantum neural network (QNN) for spoken command recognition (SCR). Our QNN-based SCR system is composed of classical and quantum components: (1) the classical part mainly relies on a 1D convolutional neural network (CNN)...
Article
The large amount of information stored in audio and video repositories makes search on speech (SoS) a challenging area that is continuously receiving much interest. Within SoS, spoken term detection (STD) aims to retrieve speech data given a text-based representation of a search query (which can include one or more words). On the other hand, query-...
Article
Full-text available
This study evaluates and compares the suitability for child–computer interaction (CCI, the branch within human–computer interaction focused on interactive computer systems for children) of two devices: a standard computer mouse and the ENLAZA interface, a head mouse that measures the user’s head posture using an inertial sensor. A multidirectional...
Article
Full-text available
Heart disease is currently the leading cause of death in the world. The electrocardiogram (ECG) is the recording of the electrical activity generated by the heart. Its low cost and simplicity have made it an essential test for monitoring heart disease, especially for the identification of arrhythmias. With the advances in electronic technology, the...
Article
Full-text available
We present a new pipeline integrity surveillance system for long gas pipeline threat detection and classification. The system is based on distributed acoustic sensing with phase-sensitive optical time domain reflectometry (ϕ-OTDR) and pattern recognition for event classification. The proposal incorporates a multi-position approach in a Gaussian Mix...
Article
Full-text available
Time and spatial domains in ϕ-OTDR perturbation detection and recognition for pipeline and border security applications in very long fiber under test (FUT) environments have not been properly analyzed so far. We propose in this paper several issues and the corresponding solutions that should be considered in both domains when developing those appli...
Preprint
Full-text available
Time and spatial domains in φ-OTDR perturbation detection and recognition for pipeline and border security applications in very long fiber under test (FUT) environments have not been properly analyzed so far. We propose in this paper several issues and the corresponding solutions that should be considered in both domains when developing those appli...
Preprint
This paper proposes to generalize the variational recurrent neural network (RNN) with variational inference (VI)-based dropout regularization employed for the long short-term memory (LSTM) cells to more advanced RNN architectures like gated recurrent unit (GRU) and bi-directional LSTM/GRU. The new variational RNNs are employed for slot filling, whi...
Conference Paper
Full-text available
Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models. This work studies the use of submodular functions to design a rank aggregation on score-based permutations, which can be used for distributed ASR systems in both supervised and unsupervised modes. Specifically, we comp...
Preprint
Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models. This work studies the use of submodular functions to design a rank aggregation on score-based permutations, which can be used for distributed ASR systems in both supervised and unsupervised modes. Specifically, we comp...
Article
Full-text available
Φ-OTDR perturbation detection applications demand optimal precision of the perturbation location. Strategies for improving both the Signal-to-Noise Ratio (SNR) and the precision of the perturbation location in a laboratory environment may fail when applied to a very long fiber under test (FUT) in real-field environments. With this deployment, meaningful energy...
Article
Full-text available
This paper presents a review of the techniques found in the literature that aim to achieve a robust heartbeat detection from fusing multi-modal physiological signals (e.g., electrocardiogram (ECG), blood pressure (BP), artificial blood pressure (ABP), stroke volume (SV), photoplethysmogram (PPG), electroencephalogram (EEG), electromyogram (EMG), an...
Article
Full-text available
In this work, a new clustering algorithm especially geared towards merging data arising from multiple sensors is presented. The algorithm, called PN-EAC, is based on the ensemble clustering paradigm and it introduces the novel concept of negative evidence. PN-EAC combines both positive evidence, to gather information about the elements that should...
Article
Full-text available
Nuisance Alarm Rate (NAR) is critical in ϕ-OTDR perturbation detection systems. We present in this letter a novel match filtering-based feature extractor that aims to reduce noise so that the detection system achieves improved performance. This feature extractor requires a small number of data vectors to be acquired which is combined with a random...
Article
Full-text available
Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a speech repository given a textual representation of a search term (which can include one or more words). This paper presents a multi-domain inter...
Article
Full-text available
The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research on this area is continuously fostered with the organization of QbE STD evalua...
Article
This paper presents a novel pipeline integrity surveillance system aimed at the detection and classification of threats in the vicinity of a long gas pipeline. The sensing system is based on phase-sensitive optical time domain reflectometry (ϕ-OTDR) technology for signal acquisition and pattern recognition strategies for threat identificati...
Conference Paper
Full-text available
Huge training datasets for automatic speech recognition (ASR) typically contain redundant information so that a subset of data is generally enough to obtain similar ASR performance to that obtained when the entire dataset is employed for training. Although the centralized submodular-based data selection methods have been successfully applied to obt...
Article
Full-text available
Query-by-example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given an acoustic (spoken) query containing the term of interest as the input. This paper presents the systems submitted to the ALBAYZIN QbE STD 2016 Evaluation held as a part of the ALBAYZIN 2016 Evaluation Campaign at the IberSPEECH 2016 conference. Sp...
Conference Paper
A pipeline integrity threat detection system using Distributed Acoustic Sensing and Artificial Intelligence (AI) is presented. The AI uses a combination of Gaussian Mixture Models and Hidden Markov Models (GMMs-HMMs), outperforming our former GMM-based system.
Article
This paper presents an on-line augmented surveillance system aimed at real-time monitoring of activities along a pipeline. The system is deployed in a fully realistic scenario and exposed to real activities carried out in unknown places at unknown times within a given test time interval (so-called blind field tests). We describe the system arch...
Article
Full-text available
Within search-on-speech, Spoken Term Detection (STD) aims to retrieve data from a speech repository given a textual representation of a search term. This paper presents an international open evaluation for search-on-speech based on STD in Spanish and an analysis of the results. The evaluation has been designed carefully so that several analyses of...
Article
Full-text available
There is an increasing interest among researchers and companies in the combination of Distributed Acoustic Sensing (DAS) and a Pattern Recognition System (PRS) to detect and classify potentially dangerous events that occur in areas above fiber optic cables deployed along active pipelines, aiming to construct pipeline surveillance systems. This paper p...
Article
Full-text available
Unsupervised rank aggregation on score-based permutations, which is widely used in many applications, has not been deeply explored yet. This work studies the use of submodular optimization for rank aggregation on score-based permutations in an unsupervised way. Specifically, we propose an unsupervised approach based on the Lovasz Bregman divergence...
Article
Full-text available
This paper presents a novel surveillance system aimed at the detection and classification of threats in the vicinity of a long gas pipeline. The sensing system is based on phase-sensitive optical time domain reflectometry (ϕ-OTDR) technology for signal acquisition and pattern recognition strategies for threat identification. The proposal incorporat...
Article
This paper presents the first available report in the literature of a system aimed at the detection and classification of threats in the vicinity of a long gas pipeline. The system is based on phase-sensitive optical time domain reflectometry (φ-OTDR) technology for signal acquisition and pattern recognition strategies for threat identification. Th...
Article
The popular n-gram language model (LM) is weak for infrequent words. Conventional approaches such as class-based LMs pre-define some sharing structures (e.g., word classes) to solve the problem. However, defining such structures requires prior knowledge, and the context sharing based on these structures is generally inaccurate. This paper presents...
Conference Paper
Full-text available
Distributed deep neural networks are commonly employed for building automatic speech recognition (ASR) systems. In this work, we employ the robust submodular partitioning approach, which aims to split the training data into small disjoint data subsets and use each of these subsets to train a particular deep neural network. Two efficient algorithms...
Conference Paper
Full-text available
This work originated from the MLSP 2014 Classification Challenge, which tries to automatically detect subjects with schizophrenia and schizo-affective disorder by analyzing multi-modal features derived from magnetic resonance imaging (MRI) data. We employ Deep Neural Network (DNN)-based multi-view representation learning for combining multi-modal...
Article
Full-text available
Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due to the large volume of multimedia information. This paper presents the systems submitted to the ALBAYZIN QbE STD 2014 evaluation held as a par...
Article
Full-text available
Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient...
Article
Full-text available
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data...
Conference Paper
Full-text available
The preliminary results of a surveillance system set up for real-time monitoring of activities along a pipeline and analysis of possible threats are presented. The system consists of a phi-OTDR based sensor used to monitor vibrations along an optical fiber combined with a pattern recognition system that classifies the recorded signals. The acoustic...
Article
Full-text available
One of the aims of Assistive Technologies is to help people with disabilities to communicate with others and to provide means of access to information. As an aid to Deaf people, we present in this work a production-quality rule-based machine system for translating from Spanish to Spanish Sign Language (LSE) glosses, which is a necessary precursor t...
Chapter
This paper presents the ATVS-CSLT-HCTLab spoken term detection (STD) system submitted to the NIST 2013 Open Keyword Search evaluation. The evaluation consists of searching a list of query terms in Vietnamese conversational speech data. Our submission involves an automatic speech recognition (ASR) subsystem which converts speech signals into word/ph...
Conference Paper
Full-text available
Speaker verification suffers from significant performance degradation with emotion variation. In a previous study, we have demonstrated that an adaptation approach based on MLLR/CMLLR can provide a significant performance improvement for verification on emotional speech. This paper follows this direction and presents an emotional adaptive training...
Article
Full-text available
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) an...
Data
The bottleneck (BN) feature, particularly based on deep structures, has gained significant success in automatic speech recognition (ASR). However, applying the BN feature to small/medium-scale tasks is nontrivial. An obvious reason is that the limited training data prevent the training of a complicated deep network; another reason, which is more sub...
Data
Recent work demonstrates impressive success of the bottleneck (BN) feature in speech recognition, particularly with deep networks plus appropriate pre-training. A widely admitted advantage associated with the BN feature is that the network structure can learn multiple environmental conditions with abundant training data. For tasks with limited t...
Conference Paper
Full-text available
The bottleneck (BN) feature, particularly based on deep structures, has gained significant success in automatic speech recognition (ASR). However, applying the BN feature to small/medium-scale tasks is nontrivial. An obvious reason is that the limited training data prevent the training of a complicated deep network; another reason, which is more sub...
Conference Paper
Full-text available
Recent work demonstrates impressive success of the bottleneck (BN) feature in speech recognition, particularly with deep networks plus appropriate pre-training. A widely admitted advantage associated with the BN feature is that the network structure can learn multiple environmental conditions with abundant training data. For tasks with limited t...
Article
Discriminative confidence based on multi-layer perceptrons (MLPs) and multiple features has shown significant advantage compared to the widely used lattice-based confidence in spoken term detection (STD). Although the MLP-based framework can handle any features derived from a multitude of sources, choosing all possible features may lead to over com...
Conference Paper
Deaf people cannot properly access the speech information stored in any kind of recording format (audio, video, etc.). We present a system that provides subtitling and Spanish Sign Language representation capabilities so that the Spanish Deaf population can access such speech content. The system is composed of a speech recognition module, a m...
Conference Paper
Watch a demo at: http://www.youtube.com/watch?v=ctmvCHguJKM An on-line Spanish-Spanish Sign Language (LSE) translation system is presented in which Spanish speech content is translated into LSE to provide Spanish deaf people access to speech information. It is cloud-based, built over a speech recognition module, a transfer-based machine translat...
Article
This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text, but selected in speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first producing phone-state posteriors and the second making use of a compressive NN layer. They are combined with th...
Article
An important component of a spoken term detection (STD) system involves estimating confidence measures of hypothesised detections. A potential problem of the widely used lattice-based confidence estimation, however, is that the confidence scores are treated uniformly for all search terms, regardless of how much they may differ in terms of phonetic...
Article
Convolutive non-negative matrix factorization (CNMF) and its sparse version, convolutive non-negative sparse coding (CNSC), exhibit great success in speech processing. A particular limitation of the current CNMF/CNSC approaches is that the convolution ranges of the bases in learning are identical, resulting in patterns covering the same time-span....
Article
An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems that have been widely adopted to achieve vocabulary-independent detection. While the finite state transducer (FST) composition provides a standard indexing approach, the n-gram reverse indexing is more flexi...
Conference Paper
This paper presents the first results of the integration of a Spanish-to-LSE Machine Translation (MT) system into an e-learning platform. Most e-learning platforms provide speech-based content, which makes them inaccessible to the Deaf. To solve this issue, we have developed an MT system that translates Spanish speech-based content into LSE. To t...
Article
We propose a new discriminative confidence measurement approach based on an evolution strategy for spoken term detection (STD). Our evolutionary algorithm, named evolutionary discriminant analysis (EDA), optimizes classification errors directly, which is a salient advantage compared with some conventional discriminative models which optimize object...
Conference Paper
We present the three approaches submitted to the Spoken Web Search. Two of them rely on Acoustic Keyword Spotting (AKWS) while the other relies on Dynamic Time Warping. Features are 3-state phone posteriors. Results suggest that applying a Karhunen-Loeve transform to the log-phone posteriors representing the query to build a GMM/HMM for each query an...
Article
Spoken term detection (STD) is the task of searching for occurrences of spoken terms in audio archives. It relies on robust confidence estimation to make a hit/false alarm (FA) decision. In order to optimize the decision in terms of the STD evaluation metric, the confidence has to be discriminative. Multi-layer perceptrons (MLPs) and support vector...
Article
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, March 2009. Bibliography: pp. 155-165
Article
Full-text available
Phones and graphemes are strongly dependent on each other in the Spanish language, contrary to English and other languages. Modelling graphemes instead of phones avoids the need for extensive knowledge of the Spanish language, and no grapheme-to-phone conversion rules are needed. A Keyword Spotting system based on graphemes as acoustic modelling was tested over the geographic c...
Conference Paper
Full-text available
Discriminative confidence estimation along with confidence normalisation have been shown to construct robust decision maker modules in spoken term detection (STD) systems. Discriminative confidence estimation, making use of term-dependent features, has been shown to improve the widely used lattice-based confidence estimation in STD. In this work, w...
Article
Full-text available
Query-by-example (QbE) spoken term detection (STD) is necessary for low-resource scenarios where training material is hardly available and word-based speech recognition systems cannot be employed. We present two novel contributions to QbE STD: the first introduces several criteria to select the optimal example used as query throughout the search...
Article
Confidence measures play a very important role in keyword spotting systems. Traditional confidence measures are based on the score computed when the audio is decoded. Classification-based techniques by means of Multi-layer Perceptrons (MLPs) and Support Vector Machines have been shown to be powerful ways to improve the final performance in terms of hits...
Conference Paper
Full-text available
Confidence measures play a key role in spoken term detection (STD) tasks. The confidence measure expresses the posterior probability of the search term appearing in the detection period, given the speech. Traditional approaches are based on the acoustic and language model scores for candidate detections found using automatic speech recognition, wit...