In this paper we propose a new feature, called Minimum Energy Density (MED), for discriminating between speech and music in audio signals. Our method is based on the analysis of local energy in 1- or 2.5-second audio segments. An elementary analysis of the probability of the power distribution is an effective tool supporting the decision m...
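The abstract does not specify how MED is computed, but a minimal sketch of the underlying idea (frame-wise local energy, with the minimum acting as the discriminative cue, since speech contains low-energy pauses that music usually lacks) might look like this; the frame length, hop size, and normalisation are assumptions:

```python
import numpy as np

def min_energy_density(signal, frame_len=512, hop=256):
    """Hypothetical sketch of a minimum-energy-density feature:
    compute local (frame-wise) energy and return the minimum,
    normalised by the mean energy. Speech segments tend to contain
    near-silent frames (pauses), so their MED is low; continuous
    music tends to have a higher MED."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energies = np.array([np.mean(np.asarray(f, dtype=float) ** 2)
                         for f in frames])
    return energies.min() / (energies.mean() + 1e-12)
```

A simple threshold on this value could then support the speech/music decision for each 1- or 2.5-second segment.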
In classification tasks, accuracy diminishes when the data are gathered in different domains. To address this problem, in this paper we investigate several adversarial models for domain adaptation (DA) and their effect on the acoustic scene classification task. The studied models include several types of generative adversarial ne...
A new VR application for voice and speech training has emerged from a problem observable in everyday life: the anxiety of public speaking. In the design process, we incorporated both the domain knowledge of experts and research with end-users in order to explore the needs and the context of the problem. Functionalities of the prototype are the ef...
Phones for 239 non-annotated languages were selected by automatic segmentation based on changes of energy in the time-frequency representation of speech signals. Phone boundaries were set at locations of relatively large changes in the energy distribution between seven frequency bands. A vector of average energies calculated for eleven frequency bands w...
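The segmentation idea described above (boundaries where the energy distribution across frequency bands changes most) can be sketched roughly as follows; the framing parameters, the band split, the L1 change measure, and the peak-picking by top-k are all assumptions, not the paper's exact procedure:

```python
import numpy as np

def band_energy_boundaries(signal, n_bands=7, frame_len=400, hop=160, top_k=5):
    """Hypothetical sketch: frame the signal, split each frame's power
    spectrum into n_bands equal-width bands, normalise to a band-energy
    distribution, and mark candidate phone boundaries where that
    distribution changes most between adjacent frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    dists = []
    for f in frames:
        spec = np.abs(np.fft.rfft(f * np.hanning(frame_len))) ** 2
        bands = np.array([b.sum() for b in np.array_split(spec, n_bands)])
        dists.append(bands / (bands.sum() + 1e-12))
    dists = np.array(dists)
    # L1 distance between adjacent frames' band-energy distributions
    change = np.abs(np.diff(dists, axis=0)).sum(axis=1)
    idx = np.argsort(change)[-top_k:]
    return sorted((i + 1) * hop for i in idx)  # boundary sample positions
```

On a signal that switches abruptly between two tones, the largest changes cluster around the switch point, which is the behaviour a phone-boundary detector of this kind relies on.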
In this paper, we examine the use of i-vectors for both age regression and age classification. Although i-vectors have previously been used for the age regression task, we extend this approach by fusing i-vector and acoustic-feature regression to estimate the speaker's age. With this fusion we obtain a relative improvement of 12.6%...
Automatic segmentation and parametrization based on frequency analysis was compared with manually annotated phones. Phone boundaries were fixed at places of relatively large changes in the energy distribution between the frequency bands. Frequency parametrization and clustering enabled the division of phones into groups (cluster...
A comparative analysis of multi-language speech samples is conducted using acoustic characteristics of phoneme realisations in spoken languages. Different approaches to the investigation of phonemic diversity in the context of language evolution are compared and discussed. We introduce our approach (materials and methods) and present preliminary res...
The paper presents the possibility of automatic speech processing to determine the acoustic similarity between phones. Subsequent processing steps on the recorded speech signal result in phone segmentation, even without prior knowledge of phone boundaries. The use of frequency-domain signal parameterization and clustering algorithms facilitates a d...
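The combination described (frequency parameterization of segmented phones followed by clustering) could be illustrated with a minimal k-means sketch over band-energy vectors; the deterministic initialisation and k=3 are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def cluster_phones(phone_vectors, k=3, iters=50):
    """Minimal k-means sketch: group phones (each described by a vector
    of band energies) by acoustic similarity, without phonetic labels."""
    X = np.asarray(phone_vectors, dtype=float)
    # simple deterministic initialisation: centres spread over the data
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each phone vector to its nearest centre
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Phones landing in the same cluster would then be treated as acoustically similar, which is the kind of label-free grouping the abstract refers to.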
The results of an investigation of the differences among the phonemes of 574 languages from all over the world are presented. We attempt to verify the hypothesis of an African origin of all languages and of gradual language diversification across other parts of the globe. The obtained results justify the language classification by applying the methods used in evolu...
From what I found (according to a comparison from 2007), the best ratio is achieved by YULS. Is this still the case?
As I'm new to the topic, I'm looking for information on benchmark corpora that can be obtained (not necessarily free) for audio event classification or computational auditory scene analysis.
I'm especially interested in house/street sounds.
I'm doing research on clustering speech utterances based on language. It seems to me that the only article dealing with this problem is:
Reynolds, Douglas A., et al. "Blind clustering of speech utterances based on speaker and language characteristics." ICSLP. 1998.
Maybe someone here is familiar with more recent work or is also working on that problem?