Figure 6 - uploaded by David Novak
Source publication
We give an overview of current problems in audio retrieval and time-series
subsequence matching. We discuss the use of subsequence matching approaches in
audio data processing, especially in the area of automatic speech recognition
(ASR), and we aim to improve the performance of the retrieval process. To
overcome the problems known from the time-series area like the...
Contexts in source publication
Context 1
... measures allow sequences to be compared even if they have different lengths. This is made possible by time warping (see Fig. ...
Context 2
... sort of elasticity to the process of time series comparison. It allows sequences to be warped in time to eliminate scaling or gaps on the time axis. Semantically, this means that, for example, in an audio retrieval process we can identify a spoken word even if it was spoken at a different tempo than the indexed reference word. Fig. 6 shows how the fixed-step Euclidean distance and the elastic DTW differ in the way they pair values of the series being compared. Generally, DTW tries to find an optimal match between two sequences. More formally, Dynamic Time Warping is a dynamic programming method for finding a minimum cost path in an accumulation ...
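The contrast described in Context 2 can be illustrated with a minimal sketch. It is not the implementation from the source publication. It assumes one-dimensional series, absolute difference as the local cost, and the three standard step directions (match, insertion, deletion). The function names `euclidean_distance` and `dtw_distance` are hypothetical and chosen only for this example.

```python
import math


def euclidean_distance(a, b):
    """Fixed-step comparison: pairs a[i] with b[i], so lengths must match."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Elastic comparison: cost of the cheapest warping path between a and b.

    Assumption for this sketch: local cost is |a[i] - b[j]| and no
    warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimum accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # a[i-1] paired with an already used b value
                acc[i][j - 1],      # b[j-1] paired with an already used a value
                acc[i - 1][j - 1],  # both series advance together
            )
    return acc[n][m]


if __name__ == "__main__":
    reference = [0, 1, 2, 1, 0]
    slower = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]  # same shape at half the tempo
    print("DTW, different lengths:", dtw_distance(reference, slower))

    shifted_a = [0, 1, 2, 1, 0, 0]
    shifted_b = [0, 0, 1, 2, 1, 0]  # same shape, delayed by one step
    print("Euclidean, shifted:", euclidean_distance(shifted_a, shifted_b))
    print("DTW, shifted:", dtw_distance(shifted_a, shifted_b))
```

In this sketch the slower copy of the reference is matched by DTW at no cost, while the Euclidean distance cannot be computed at all for sequences of different lengths. The full accumulation matrix needs memory proportional to the product of the two lengths, which is acceptable for short examples but not for large audio collections.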
Similar publications
With the increase in multi-media data over the Internet, query by example spoken term detection (QbE-STD) has become important in providing a search mechanism to find spoken queries in spoken audio. Audio search algorithms should be efficient in terms of speed and memory to handle large audio files. In general, approaches derived from the well know...
In this paper we propose a fast and memory efficient Dynamic Time Warping (MES-DTW) algorithm for the task of Query-by-Example Spoken Term Detection (QbE-STD). The proposed algorithm is based on the subsequence-DTW (S-DTW) algorithm, which allows the search for small spoken queries within a much bigger search collection of spoken documents by consi...
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2012 MediaEval benchmark evaluation campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available. Continuing in the spirit of the 2011 SpokenWeb Search Task, which used speech from four...
Citations
... Automatic Speech Recognition (ASR) is an extremely important area for human-computer interaction, and a fundamental problem in this field is Spoken Term Detection (STD). Besides well-developed complex approaches based on specific language models, pattern recognition based on unsupervised methods, which focus on situations where no linguistic corpus is available, represents a quite recent stream of research [28]. One of the studied approaches uses posteriorgram templates [29] extracted using a phonetic recognizer. ...
Subsequence matching has emerged as an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a potentially fruitful
combination of fragments from different research areas may be neither obvious
nor easy to realize. The leading authors in this field also mention the
implementation bias that makes a proper comparison of competing approaches
difficult. Therefore, we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems with a uniform frame
that simplifies and speeds up the design, development and evaluation of
subsequence matching related systems. We identify several relatively separate
subtasks that are solved differently across the literature, and SMF makes it
possible to combine them in a straightforward manner, achieving new quality
and efficiency. This framework can be used in many application domains and its
components can be reused effectively. Its strictly modular architecture and
openness also enable the integration of efficient solutions from different
fields, for instance efficient metric-based indexes. This is an extended
version of a paper published at DEXA 2012.