About
48 Publications
10,654 Reads
552 Citations
Introduction
Jakub Gałka has been involved in several Polish and European research projects related to speech and audio processing, signal processing and data analysis. His research focus lies in speech and language processing and recognition, speaker recognition, multimedia signal processing and data analysis. He works on the development of commercially available ASR and speaker verification systems.
Additional affiliations
January 2003 - present
Publications (48)
Multimodal data is being used more widely for human action recognition nowadays due to the progress of machine learning methods and the development of new types of sensors. The acquisition of the data required by such solutions is often troublesome, and it is difficult to find the proper tools for this process. In this paper, we present a new toolk...
The basic definitions of wavelet theory are presented. A transformation is proposed that combines two transformations: wavelet and Fourier. It is compared with the well-known composition of the Hilbert and Fourier transformations. The properties of the new transformation and an example application are presented.
The most popular systems for automatic sign language recognition are based on vision. They are user-friendly, but very sensitive to changes in recording conditions. This paper describes the construction of a more robust system, an accelerometer glove, as well as its application in the recognition of sign language gestures. T...
The aim of the described system is to provide an online solution that profiles customers of a call centre. As an auxiliary module, it could enhance the functionality of modern call centre systems through active voice analysis. Integrated with existing databases, our system allows for analysis of constant and temporal caller characteristics during a call - re...
This paper presents the creation of a Polish Sign Language corpus suitable for recognition research and automatic translation of sign language. The recording approach used and the captured data modalities are presented, as well as the description of the acquisition system implementation. The evaluation of the collected corpus is presented and compa...
A speech recognition system for the Polish language is described. The presentation focuses on an adaptation of the Kaldi toolkit for Polish, our own grapheme-to-phoneme conversion tool, and a corpus of Polish that we collected. Approaches to commercial applications are also described.
A supporting system of voice analysis for emergency call centers is being developed at AGH University of Science and Technology in Krakow. The aim of our work is to provide an innovative supporting tool for rapid and accurate assessment of caller profile. The project covers not only speaker identification (when speaker's speech sample is known), bu...
Playback attacks constitute one of the biggest threats in biometric speaker verification systems, in which a previously recorded passphrase is played back by an unprivileged person in order to gain access. This paper features a description of the playback attack detection (PAD) algorithm, designed to protect text-dependent speaker verification syst...
The paper presents the concept of an embedded solution for a voice biometric access system. The most important requirements for access control systems are presented, as well as the resulting design intent. The architecture of the system, its functionality and the methods used to verify the speakers are described along with a discussion of performance...
The paper presents the concept of an embedded solution for a voice biometric access system. The most important requirements for access control systems are presented, as well as the resulting design intent. The architecture of the created system, its functionality and the methods used to verify the speakers are described along with a discussion of basic t...
The aim of our work is to develop software that identifies a caller or builds the caller's profile by analysing their voice. Based on collected speech samples, our system aims to identify emergency callers both on-line and off-line. This homeland security project covers speaker recognition (when speaker's speech sample is known), speaker's ge...
An automatic speech recognition system for Polish is demonstrated. A few layers of our system are different from popular approaches as a result of differences between Polish and English languages.
The paper presents successful experiments on combining two speaker recognition methods into a hybrid system. The first branch of recognition is an innovative approach based on the discrete wavelet-Fourier transform. The second is a classic one, based on HTK and classification into voiced and unvoiced segments. The hybrid solution outperforms both on a small...
The phonemic statistics were collected from several large Polish corpora. The paper presents methodology of the acquisition process, summarisation of the data and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech technologies. The phonemic alphabet for Polish, SAMPA, an...
A method of choosing a word hypothesis from a dictionary of a speech recognition system is presented. The method applies a modified weighted Levenshtein distance for better accuracy. The distance is counted between phonetic transcriptions of a string of phonemes received from a classifier and of a dictionary. It allows efficient conducting of speec...
The statistics of Polish phones, biphones and triphones were collected from several corpora. The paper summarises the data and presents some statistical phenomena, including the frequency distribution of biphone and triphone occurrences. A model applying these statistics in speech recognition is presented as well.
We demonstrate an automatic speech recognition system for Polish continuous speech. As most of the progress in the field is done for English, a few layers of our system are different from popular approaches in this field. These elements of our system could be successfully ported to other languages which share some features with Polish: the speech c...
This paper presents the Mean Best Basis algorithm, an extension of Wickerhauser's well-known Best Basis method, for an adaptive wavelet decomposition of variable-length signals. A novel approach is used to obtain a decomposition tree of the wavelet-packet cosine hybrid transform for speech signal feature extraction. Obtained features are tested using...
This paper suggests a speech enhancement approach to an eavesdropping audio system. Speech signal is disturbed by non-stochastic noise. The algorithm is based on recordings from dual-microphone system. The Wiener filter was applied for speech extraction. The algorithm is designed to capture dialogues in noisy environment as well. It uses the small...
A non-uniform speech segmentation method based on discrete wavelet transform is used for the localization of phoneme boundaries. A vector of real values representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of the multiresolution analysis. The final decision on localization of...
A non-uniform speech segmentation method based on wavelet packet transform is used for the localisation of phoneme boundaries. Eleven subbands are chosen by applying the mean best basis algorithm. Perceptual scale is used for decomposition of speech via Meyer wavelet in the wavelet packet structure. A real valued vector representing the digital spe...
The paper presents modifications of the well-known Levenshtein metric. The suggested improvements result in better automatic speech recognition when the Levenshtein metric is applied to compare words from a dictionary with speech recognition hypotheses. This makes it possible to evaluate hypotheses and to choose the word that was actually spoken.
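As a rough illustration of the idea in the abstract above, the sketch below computes a weighted Levenshtein distance between a recognised phoneme string and dictionary transcriptions and picks the closest word. It is not the authors' modification: the cost values, the `sub_cost` callback and the `best_word` helper are hypothetical names and numbers chosen only for this example.

```python
def weighted_levenshtein(hyp, ref, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Edit distance between two phoneme sequences with configurable costs.

    sub_cost(a, b) is a hypothetical callback returning the cost of
    replacing phoneme a with phoneme b; by default it is 0 for a match
    and 1 otherwise, which gives the plain Levenshtein distance.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = [j * ins_cost for j in range(len(ref) + 1)]
    for i, h in enumerate(hyp, start=1):
        curr = [i * del_cost]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + del_cost,            # drop h from the hypothesis
                curr[j - 1] + ins_cost,        # insert r into the hypothesis
                prev[j - 1] + sub_cost(h, r),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def best_word(hyp, lexicon, **costs):
    """Return the lexicon word whose transcription is closest to hyp."""
    return min(lexicon, key=lambda w: weighted_levenshtein(hyp, lexicon[w], **costs))


# Toy lexicon with made-up transcriptions (assumption, not from the paper).
lexicon = {"kot": ["k", "o", "t"], "dom": ["d", "o", "m"]}
hypothesis = ["k", "o", "d"]

# Assumed weighting: confusing the acoustically similar pair t/d is cheaper.
similar = {("t", "d"), ("d", "t")}
cheap_td = lambda a, b: 0.0 if a == b else (0.5 if (a, b) in similar else 1.0)

chosen = best_word(hypothesis, lexicon, sub_cost=cheap_td)
```

With weights of this kind, substitutions between phonemes that a classifier often confuses contribute less to the distance than unrelated substitutions, which is one plausible way a weighted variant can rank dictionary words differently from the unweighted metric.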
This paper presents two different methods of speech extraction: cross-correlation analysis and adaptive filtering. Algorithms are designed to extract conversations in noisy environment. Such situations can appear in police investigations' materials or multi-speaker environment. Noise can be added intentionally by suspects or not intentionally (e....
In this paper, we propose a feature selection and transformation approach for universal steganalysis based on genetic algorithm (GA) and higher order statistics. We choose three types of typical statistics as candidate features and twelve kinds of basic functions as candidate transformations. The GA is utilized to select a subset of candidate featu...
In this paper a new application of the wavelet packet cosine transform (WPCT), used in the adaptive wavelet parameterization scheme, is presented. This is an extension of the best basis algorithm. Obtained optimized wavelet decomposition schemes are used for speech feature extraction and are tested using Polish language hidden Markov model (HMM) ph...
The phonetical statistics of Polish were collected from a newspaper corpus of around 110 000 000 words. The paper presents summarisation of the data which are phoneme ngrams and some phenomena in the statistics including a distribution of frequency of triphones occurring. Triphone statistics apply context-dependent speech units which have an import...
The phonetical statistics were collected from several Polish corpora. The paper is a summary of the data which are phoneme n-grams and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts...
A speech recognition system based on HTK for Polish is presented. It was trained on 365 utterances, all spoken by 26 males. The features of Polish with respect to speech recognition are described. Some aspects of speech recognition differ in comparison to English. Errors in recognition were analysed in detail in an attempt to find reasons and scen...
A new event-driven method of speech signal segmentation is presented. The discrete wavelet transform was used for spectral analysis and to create a segmentation procedure. An innovative event detector is the core of the process. The efficiency of the algorithm is tested against a hand-annotated speech corpus.
Speech segmentation is widely used in many speech applications. We propose a new wavelet-based extension of the typical spectrum-based non-uniform speech segmentation methods. The use of wavelets improves computation performance and provides easy and flexible adjusting of algorithm parameters. Segmentation accuracy measures are introduced and appli...
Speech segmentation is a very difficult problem because of the continuous nature of speech. Segmenting speech into various units (phonemes, syllables, and acoustic atoms) is essential in many applications. Choosing the best method of segmentation must be preceded by evaluation of its performance. This paper is a study of various numerical measures for...
The Polish text corpus was analysed to find information about phoneme statistics. We were especially interested in triphones, as they are commonly used in many speech processing applications like the HTK speech recogniser. An attempt to create the full list of triphones for the Polish language is presented. A vast amount of phonetically transcribed...
Progress in the development of automatic speech recognition (ASR) systems is made, inter alia, by using signal representations sensitive to more and more sophisticated features. This paper is an overview of our investigation of a new context-sensitive speech signal representation, based on the wavelet-Fourier transform (WFT), and a proposal of its quality...
Statistical data on phonemes, useful in continuous speech recognition systems, are presented. This paper explains the basics of a simple system for estimating phoneme, diphone and triphone statistics from a text corpus of the Polish language. The results obtained are presented for an example text database. A possible application of the statistics is suggested.
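A minimal sketch of how phoneme, diphone and triphone statistics of this kind could be estimated is given below. It assumes the text has already been converted to a phoneme sequence; the helper names and the toy transcription are hypothetical and do not come from the paper.

```python
from collections import Counter


def ngram_counts(phonemes, n):
    """Count every run of n consecutive phonemes in a transcription."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )


def relative_frequencies(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


# Toy transcription of the word "mama" (assumed, for illustration only).
transcription = ["m", "a", "m", "a"]

phoneme_stats = ngram_counts(transcription, 1)
diphone_stats = ngram_counts(transcription, 2)
triphone_stats = ngram_counts(transcription, 3)
diphone_freqs = relative_frequencies(diphone_stats)
```

On a real corpus the same counting would run over the phonetic transcription of every sentence, and word or sentence boundaries would need an explicit convention (for example a silence symbol), which this sketch leaves out.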
The wavelet-Fourier transform is introduced as a new approach to the representation and analysis of dynamically changing signals, especially speech. It delivers a global characteristic of local frequency changes. This representation is applied to phonemes to find the similarities between them. To explore behavior of a wavelet-Fourier transf...
The statistics of Polish phonemes, diphones and triphones were collected from a large literature corpus. The paper presents summarisation of the data and focuses on interesting phenomena in the statistics. Triphone statistics play an important role in speech recognition systems. They are used to improve the proper transcription of the analysed spe...