Stanisław Kacprzak

AGH University of Science and Technology in Kraków | AGH · Department of Electronics

PhD

About

17
Publications
48,574
Reads
168
Citations
Additional affiliations
April 2020 - present
AGH University of Science and Technology in Kraków
Position
  • Professor (Assistant)
March 2017 - March 2020
AGH University of Science and Technology in Kraków
Position
  • Research Assistant
Education
October 2012 - January 2020
October 2005 - March 2011
Lodz University of Technology
Field of study
  • Computer Science

Publications (17)
Conference Paper
Full-text available
In this paper we propose applying a new feature, called Minimum Energy Density (MED), to discriminate between speech and music in audio signals. Our method is based on the analysis of local energy for 1- or 2.5-second audio signals. An elementary analysis of the probability for the power distribution is an effective tool supporting the decision m...
Preprint
In classification tasks, the classification accuracy diminishes when the data is gathered in different domains. To address this problem, in this paper, we investigate several adversarial models for domain adaptation (DA) and their effect on the acoustic scene classification task. The studied models include several types of generative adversarial ne...
Conference Paper
Full-text available
A new VR application for voice and speech training has emerged from a problem observable in everyday life: anxiety about public speaking. In the design process, we incorporated both expert domain knowledge and research with end users in order to explore the needs and the context of the problem. Functionalities of the prototype are the ef...
Article
Full-text available
Phones for 239 non-annotated languages were selected by automatic segmentation based on changes of energy in the time-frequency representation of speech signals. Phone boundaries were set at locations of relatively large changes in the energy distribution between seven frequency bands. A vector of average energies calculated for eleven frequency bands w...
Conference Paper
Full-text available
In this paper, we examine the use of i-vectors for both age regression and age classification. Although i-vectors have previously been used for the age regression task, we extend this approach by applying a fusion of i-vector and acoustic feature regression to estimate the speaker's age. With this fusion we obtain a relative improvement of 12.6%...
Conference Paper
Automatic segmentation and parametrization based on frequency analysis were compared with manually annotated phones. Phone boundaries were placed at points of relatively large changes in the energy distribution between the frequency bands. Frequency parametrization and clustering enabled the division of phones into groups (cluster...
Article
Full-text available
A comparative analysis of multi-language speech samples is conducted using acoustic characteristics of phoneme realisations in spoken languages. Different approaches to investigation of phonemic diversity in the context of language evolution are compared and discussed. We introduced our approach (materials and methods) and presented preliminary res...
Article
Full-text available
The paper presents the possibility of automatic speech processing to determine the acoustic similarity between phones. Subsequent processing steps of the recorded speech signal result in phone segmentation, even without prior knowledge of phone boundaries. The use of frequency signal parameterization and clustering algorithms facilitates a d...
Conference Paper
Full-text available
The results of an investigation of the differences among the phonemes of 574 languages from all over the world are presented. We attempt to verify the hypothesis of an African origin for all languages and gradual language diversification in other parts of the globe. The obtained results justify the language classification by applying the methods used in evolu...

Questions (6)
Question
From what I found (according to a comparison from 2007), the best ratio is achieved by YULS. Is this still the case?
Question
As I'm new to the topic, I'm looking for information on benchmark corpora that can be obtained (not necessarily free) for audio event classification or computational auditory scene analysis.
I'm especially interested in house/street sounds.
Question
I'm looking for a fast DPGMM (Dirichlet Process Gaussian Mixture Model) implementation intended for a large number of observations.
Question
I'm doing research on clustering speech utterances based on language. It seems to me that the only article dealing with this problem is:
Reynolds, Douglas A., et al. "Blind clustering of speech utterances based on speaker and language characteristics." ICSLP. 1998.
Maybe someone here is familiar with more recent work or is also working on that problem?

Projects (3)
Archived project
The project aims at measuring the acoustic diversity in the phoneme inventories of the world’s languages.
Project
Use of unsupervised algorithms for language/OOS modeling. Spoken language clustering.