Jakub Gałka

AGH University of Science and Technology in Kraków | AGH · Department of Electronics

PhD

About

48
Publications
10,654
Reads
552
Citations
Citations since 2017
6 Research Items
413 Citations
Introduction
Jakub Gałka has been involved in several Polish and European research projects related to speech and audio processing, signal processing and data analysis. His research focus lies in speech and language processing and recognition, speaker recognition, multimedia signal processing and data analysis. He works on the development of commercially available ASR and speaker verification systems.
Additional affiliations
January 2003 - present
AGH University of Science and Technology in Kraków
Position
  • Professor (Assistant)

Publications

Article
Full-text available
Multimodal data is being used more widely for human action recognition nowadays due to the progress of machine learning methods and the development of new types of sensors. The acquisition of the data required by such solutions is often troublesome, and it is difficult to find the proper tools for this process. In this paper, we present a new toolk...
Conference Paper
The basic definitions of wavelet theory are presented. A transformation that combines the wavelet and Fourier transformations is proposed. It is compared with the well-known composition of the Hilbert and Fourier transformations. The properties of the new transformation and an example application are presented.
Article
The most popular systems for automatic sign language recognition are based on vision. They are user-friendly, but very sensitive to changes in recording conditions. This paper presents a description of the construction of a more robust system, an accelerometer glove, as well as its application in the recognition of sign language gestures. T...
Conference Paper
Full-text available
The aim of the described system is to provide an online solution that profiles customers of a call centre. As an auxiliary module it might enhance functionality of modern call centre systems by active voice analysis. Integrated with existing databases, our system allows for analysis of constant and temporal caller characteristics during a call — re...
Article
Full-text available
This paper presents the creation of a Polish Sign Language corpus suitable for recognition research and automatic translation of sign language. The recording approach used and the captured data modalities are presented, as well as the description of the acquisition system implementation. The evaluation of the collected corpus is presented and compa...
Conference Paper
Full-text available
A speech recognition system for the Polish language is described. The presentation will focus on an adjustment of the Kaldi toolkit for Polish, our own grapheme to phoneme conversion tool and a corpus of Polish we collected. The approaches to commercial applications will also be described.
Conference Paper
Full-text available
A supporting system of voice analysis for emergency call centers is being developed at AGH University of Science and Technology in Krakow. The aim of our work is to provide an innovative supporting tool for rapid and accurate assessment of caller profile. The project covers not only speaker identification (when speaker's speech sample is known), bu...
Article
Full-text available
Playback attacks constitute one of the biggest threats in biometric speaker verification systems, in which a previously recorded passphrase is played back by an unprivileged person in order to gain access. This paper features a description of the playback attack detection (PAD) algorithm, designed to protect text-dependent speaker verification syst...
Article
The paper presents the concept of an embedded solution for a voice biometric access system. The most important requirements for access control systems are presented, as well as the resulting design intent. The architecture of the system, its functionality and the methods used to verify the speakers are described along with a discussion of performance...
Article
The paper presents the concept of an embedded solution for a voice biometric access system. The most important requirements for access control systems are presented, as well as the resulting design intent. The architecture of the created system, its functionality and the methods used to verify the speakers are described along with a discussion of basic t...
Conference Paper
The aim of our work is to develop software that identifies a caller or builds the caller's profile through voice analysis. Based on collected speech samples, our system aims to identify emergency callers both on-line and off-line. This homeland security project covers speaker recognition (when speaker's speech sample is known), speaker's ge...
Conference Paper
An automatic speech recognition system for Polish is demonstrated. A few layers of our system are different from popular approaches as a result of differences between Polish and English languages.
Article
Full-text available
The paper presents successful experiments on combining two speaker recognition methods into a hybrid system. The first branch of recognition is an innovative approach based on the discrete wavelet-Fourier transform. The second one is classic, based on HTK and classification into voiced and unvoiced segments. The hybrid solution outperforms both on a small...
Conference Paper
Full-text available
The phonemic statistics were collected from several large Polish corpora. The paper presents methodology of the acquisition process, summarisation of the data and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech technologies. The phonemic alphabet for Polish, SAMPA, an...
Article
A method of choosing a word hypothesis from a dictionary of a speech recognition system is presented. The method applies a modified weighted Levenshtein distance for better accuracy. The distance is counted between phonetic transcriptions of a string of phonemes received from a classifier and of a dictionary. It allows efficient conducting of speec...
Conference Paper
Full-text available
The statistics of Polish phones, biphones and triphones were collected from several corpora. The paper presents a summary of the data and some statistical phenomena, including the frequency distribution of biphones and triphones. A model applying these statistics to speech recognition is presented as well.
Conference Paper
Full-text available
We demonstrate an automatic speech recognition system for Polish continuous speech. As most of the progress in the field is done for English, a few layers of our system are different from popular approaches in this field. These elements of our system could be successfully ported to other languages which share some features with Polish: the speech c...
Conference Paper
This paper presents the Mean Best Basis algorithm, an extension of Wickerhauser's well-known Best Basis method, for an adaptive wavelet decomposition of variable-length signals. A novel approach is used to obtain a decomposition tree of the wavelet-packet cosine hybrid transform for speech signal feature extraction. Obtained features are tested using...
Conference Paper
Full-text available
This paper suggests a speech enhancement approach to an eavesdropping audio system. Speech signal is disturbed by non-stochastic noise. The algorithm is based on recordings from dual-microphone system. The Wiener filter was applied for speech extraction. The algorithm is designed to capture dialogues in noisy environment as well. It uses the small...
Article
Full-text available
A non-uniform speech segmentation method based on discrete wavelet transform is used for the localization of phoneme boundaries. A vector of real values representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of the multiresolution analysis. The final decision on localization of...
Conference Paper
Full-text available
A non-uniform speech segmentation method based on wavelet packet transform is used for the localisation of phoneme boundaries. Eleven subbands are chosen by applying the mean best basis algorithm. Perceptual scale is used for decomposition of speech via Meyer wavelet in the wavelet packet structure. A real valued vector representing the digital spe...
Conference Paper
Full-text available
The paper presents modifications of the well-known Levenshtein metric. The suggested improvements result in better automatic speech recognition when the Levenshtein metric is applied to compare words from a dictionary with speech recognition hypotheses. This makes it possible to evaluate hypotheses and to choose the word which was actually spoken.
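As a rough illustration of the idea in this abstract, a weighted Levenshtein distance can rank dictionary entries against a recognised phoneme string. The sketch below is not the authors' modification of the metric: the substitution costs, the `SIMILAR` phoneme pairs, and the names `sub_cost`, `weighted_levenshtein` and `best_word` are all assumptions invented for this example.

```python
from typing import Callable, Dict, Sequence

# Hypothetical pairs of acoustically similar phonemes (illustrative only).
SIMILAR = {frozenset(("s", "z")), frozenset(("t", "d")), frozenset(("p", "b"))}


def sub_cost(a: str, b: str) -> float:
    """Assumed substitution cost: cheaper for phonemes that are easily confused."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in SIMILAR else 1.0


def weighted_levenshtein(
    src: Sequence[str],
    tgt: Sequence[str],
    sub: Callable[[str, str], float] = sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Edit distance between two phoneme sequences with per-operation weights."""
    # prev[j] holds the cost of turning the processed prefix of src into tgt[:j].
    prev = [j * ins_cost for j in range(len(tgt) + 1)]
    for i, s in enumerate(src, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(tgt, start=1):
            curr.append(min(
                prev[j] + del_cost,       # delete s
                curr[j - 1] + ins_cost,   # insert t
                prev[j - 1] + sub(s, t),  # substitute s with t (or match)
            ))
        prev = curr
    return prev[-1]


def best_word(hypothesis: Sequence[str], dictionary: Dict[str, Sequence[str]]) -> str:
    """Return the dictionary word whose transcription is closest to the hypothesis."""
    return min(dictionary, key=lambda w: weighted_levenshtein(hypothesis, dictionary[w]))
```

With a toy dictionary such as `{"tom": ["t", "o", "m"], "kot": ["k", "o", "t"]}`, calling `best_word(["d", "o", "m"], ...)` selects the entry with the lowest weighted distance.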
Conference Paper
Full-text available
This paper presents two different methods of speech extraction: cross-correlation analysis and adaptive filtering. The algorithms are designed to extract conversations in noisy environments. Such situations can appear in police investigations' materials or multi-speaker environments. Noise can be added intentionally by suspects or not intentionally (e....
Conference Paper
In this paper, we propose a feature selection and transformation approach for universal steganalysis based on genetic algorithm (GA) and higher order statistics. We choose three types of typical statistics as candidate features and twelve kinds of basic functions as candidate transformations. The GA is utilized to select a subset of candidate featu...
Article
Full-text available
In this paper a new application of the wavelet packet cosine transform (WPCT), used in the adaptive wavelet parameterization scheme, is presented. This is an extension of the best basis algorithm. Obtained optimized wavelet decomposition schemes are used for speech feature extraction and are tested using Polish language hidden Markov model (HMM) ph...
Conference Paper
Full-text available
The phonetical statistics of Polish were collected from a newspaper corpus of around 110 000 000 words. The paper presents summarisation of the data which are phoneme ngrams and some phenomena in the statistics including a distribution of frequency of triphones occurring. Triphone statistics apply context-dependent speech units which have an import...
Article
Full-text available
The phonetical statistics were collected from several Polish corpora. The paper is a summary of the data which are phoneme n-grams and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts...
Conference Paper
A speech recognition system based on HTK for Polish is presented. It was trained on 365 utterances, all spoken by 26 males. The features of Polish with respect to speech recognition are described. Some aspects of speech recognition differ in comparison to English. Errors in recognition were analysed in detail in an attempt to find reasons and scen...
Conference Paper
A new event-driven method of speech signal segmentation is presented. The discrete wavelet transform was used for spectral analysis and to create a segmentation procedure. An innovative event detector is the core of the process. The efficiency of the algorithm is tested against a hand-annotated speech corpus.
Article
Full-text available
Speech segmentation is widely used in many speech applications. We propose a new wavelet-based extension of the typical spectrum-based non-uniform speech segmentation methods. The use of wavelets improves computation performance and provides easy and flexible adjusting of algorithm parameters. Segmentation accuracy measures are introduced and appli...
Article
Full-text available
Speech segmentation is a very difficult problem because of the continuous nature of speech. Segmenting speech into various units (phonemes, syllables, and acoustic atoms) is essential in many applications. Choosing the best method of segmentation must be preceded by evaluation of its performance. This paper is a study of various numerical measures for...
Conference Paper
Full-text available
The Polish text corpus was analysed to find information about phoneme statistics. We were especially interested in triphones as they are commonly used in many speech processing applications like HTK speech recogniser. An attempt to create the full list of triphones for Polish language is presented. A vast amount of phonetically transcribed...
Chapter
Full-text available
Progress in the development of automatic speech recognition (ASR) systems is driven, among other things, by signal representations sensitive to increasingly sophisticated features. This paper is an overview of our investigation of a new context-sensitive speech signal representation, based on the wavelet-Fourier transform (WFT), and proposal of its quality...
Conference Paper
Full-text available
Statistical data on phonemes, useful in continuous speech recognition systems, are presented. This paper explains the basics of a simple system for estimating phoneme, diphone and triphone statistics from a text corpus of the Polish language. The results obtained are presented for an example text database. A possible application of the statistics is suggested.
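Counting phoneme, diphone and triphone statistics of the kind this abstract describes might look roughly like the sketch below. It assumes the corpus has already been transcribed into lists of phoneme symbols; the function names and the toy symbols in the usage note are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_counts(utterances: Iterable[Sequence[str]], n: int) -> Counter:
    """Count phoneme n-grams (n=1 phonemes, n=2 diphones, n=3 triphones).

    N-grams are counted within each utterance only, never across boundaries.
    """
    counts: Counter = Counter()
    for phones in utterances:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


def relative_frequencies(counts: Counter) -> Dict[NGram, float]:
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}
```

For two toy utterances `[["k", "o", "t"], ["k", "o", "s"]]`, `ngram_counts(..., 2)` counts the diphone `("k", "o")` twice, and `relative_frequencies` turns those counts into proportions.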
Article
Full-text available
The wavelet-Fourier transform is introduced as a new approach to the representation and analysis of dynamically changing signals, especially speech. It delivers a global characteristic of local frequency changes. This representation is applied to phonemes to find the similarities between them. To explore behavior of a wavelet-Fourier transf...
Article
Full-text available
The statistics of Polish phonemes, diphones and triphones were collected from a large literature corpus. The paper presents summarisation of the data and focuses on interesting phenomena in the statistics. Triphone statistics play an important role in speech recognition systems. They are used to improve the proper transcription of the analysed spe...
