András Beke

Hungarian Academy of Sciences | HAS · Department of Phonetics

PhD

Publications: 37
Reads: 5,942
Citations: 217
Citations since 2016: 176 (11 research items)

Publications (37)
Conference Paper
In forensic comparison, document classification techniques are used mainly for authorship classification and author profiling. In the present study, we aim to introduce paragraph vector modelling (by Doc2Vec) into the likelihood-ratio framework paradigm of forensic evidence comparison. Transcriptions of spontaneous speech recordings are used as in...
Article
This paper reviews the applied Deep Learning (DL) practices in the field of Speaker Recognition (SR), both in verification and identification. Speaker Recognition has been a widely studied topic of speech technology. Much research has been carried out, but little progress has been achieved in the past 5–6 years. However, as Deep Learning techniq...
Preprint
This paper summarizes the applied deep learning practices in the field of speaker recognition, both verification and identification. Speaker recognition has been a widely studied topic of speech technology. Much research has been carried out, but little progress has been achieved in the past 5-6 years. However, as deep learning techniques...
Chapter
In human speech, laughter has a special role as an important non-verbal element, signaling a general positive affect and cooperative intent. However, laughter occurrences may be categorized into several sub-groups, each having a slightly or significantly different role in human conversation. It means that, besides automatically locating laughter ev...
Conference Paper
Punctuation of ASR-produced transcripts has received increasing attention in recent years; RNN-based sequence modelling solutions which exploit textual and/or acoustic features show encouraging performance. Switching the focus from the technical side, qualifying and quantifying the benefits of such punctuation from the end-user perspective have not...
Article
Numerous investigations have identified weaknesses in speech processing and language skills in children with dyslexia; however, little is known about these abilities in children with reading difficulties (RD). The primary objective of this investigation was to determine the utility of auditory speech processing tasks in differentiating children wit...
Article
Filled pauses may reveal speech planning or execution problems to a greater extent in L2 spontaneous speech than in L1. The purpose of this study was to analyze the forms and position of all filled pauses, and the durations and the formants of vocalic filled pauses in English (L2) and in Hungarian (L1) spontaneous speech produced by 30 young learne...
Conference Paper
This paper addresses speech summarization of highly spontaneous speech. The audio signal is transcribed using an Automatic Speech Recognizer, which operates at relatively high word error rates due to the complexity of the recognition task and high spontaneity of speech. An analysis is carried out to assess the propagation of speech recognition erro...
Conference Paper
This paper addresses speech summarization of highly spontaneous speech. Speech is converted into text using an ASR, then segmented into tokens. Human-made and automatic, prosody-based tokenizations are compared. The obtained sentence-like units are analysed by a syntactic parser to help automatic sentence selection for the summary. The preprocessed...
Conference Paper
The aim of this paper is an objective presentation of temporal features of spontaneous Hungarian narratives, as well as a characterization of separable portions of spontaneous speech. Ten speakers’ spontaneous speech materials taken from the BEA Hungarian Spontaneous Speech Database were analyzed in terms of hierarchical units of narratives (durati...
Article
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNN) for laughter dete...
Conference Paper
Several studies use idealized, fluent utterances to comprehend spoken language. Disfluencies are often regarded as mere noise in the speech flow. Other works argue that fragmented structures (disfluencies, silent and filled pauses) are important and can aid understanding. By extending the original concept of speech disfluency, the curr...
Conference Paper
Generating proper and natural sounding prosody is one of the key interests of today’s speech synthesis research. An important factor in this effort is the availability of a precisely labelled speech corpus with adequate prosodic stress marking. Obtaining such a labelling constitutes a huge effort, whereas interannotator agreement scores are usually...
Article
Information extraction from written or spoken archives is a challenging infocommunication task, especially if a deep automatic analysis of the information structure is also targeted. The present research investigates focus detection from an automatic analysis point of view for text (NLP) and speech (prosody) modalities. Deep syntactic a...
Conference Paper
In this paper, a large Hungarian spoken language database is introduced. This phonetically-based multi-purpose database contains various types of spontaneous and read speech from 333 monolingual speakers (a speech sample of about 50 minutes per speaker). This study presents the background and motivation of the development of the BEA Hungarian database, d...
Conference Paper
The accuracy of speech recognizers may decrease in the case of spontaneous speech because of non-verbal vocalizations such as laughter. Previous studies showed that laughter resembles speech sounds in terms of its acoustic characteristics. The aim of the present research is to perform, for the first time for the Hungarian language, an acoustic analys...
Article
The present paper investigates automatic prosodic phrasing of spontaneous speech: a two-step segmentation technique is presented, based on unsupervised learning. In the first step, the Intonational Phrases (IP) are detected automatically based on speech energy, spectral centroid and a double-thresholding technique. In the second step, Phonological...
Conference Paper
The aim of this research is to segment spontaneous speech using an unsupervised learning technique. We approach the task from a machine perception or detection point of view, and focus on revealing some structure of prosody in spontaneous speech. The BEA spontaneous speech database is used to develop a speech segmentation system. The sponta...
Conference Paper
Spontaneous conversations frequently contain various non-verbal vocalizations (such as laughter). The accuracy of a speech recognizer may decrease in the case of spontaneous speech because of these non-verbal vocalization phenomena. The aim of the present research is to develop an accurate and efficient method in order to recognize laughter in spon...
Article
This paper investigates the usage of prosody for the improvement of keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be effectively performed using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore,...
Article
The relation between syntax and prosody is evident, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition/understanding systems. This paper presents an expe...
Conference Paper
Many research papers in phonetics and phonology have analyzed segmental duration: which factors, and which interactions between factors, determine it. Their results often play an important role in language technology applications widely used in infocommunication, for example TTS (text-to-speech synthesis) and ASR (automatic speech recognition). Speec...
Conference Paper
Speech prosody and speech syntax are closely related, and this correspondence - syntax to prosody mapping - is exploited in text-to-speech infocommunication applications. However, in automatic speech recognition and understanding based inter-cognitive infocommunication, the use of prosody to syntax mapping is mostly restricted to minimal pair disam...
Conference Paper
Dealing with spontaneous speech constitutes a big challenge for both linguists and speech technology engineers. For read speech, prosody was assessed as an automatic decomposition into phonological phrases using a supervised method (HMM) in earlier experiments. However, when trying to adapt this automatic approach for spontaneous speech, the cluste...
Conference Paper
Subglottal resonances are claimed to divide front/back vowels and low/high vowels in several languages, including Hungarian. However, some ‘recalcitrant’ vowels appear to resist this mould. We therefore performed a careful analysis of the role coarticulation and speaker-dependent effects might play in the recalcitrance of these vowels in Hungarian....
Conference Paper
Prosody and syntax are highly related, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. This paper presents an experiment exploring to what degree a powerful HMM-based automatic prosodic segmentation tool can recover the syntactic structure of an utterance in speech understanding systems. Results sho...
Article
Subglottal resonances (SGRs) have been reported to divide vowels into certain contrasting natural categories: low – non-low; front – back; front unrounded non-low – other front. This role of the subglottal resonances has been investigated for a handful of languages in the speech of adults and children. The present paper aims to consider the pattern...
Article
This paper analyzes the nature of the process involved in optional vowel reduction in Hungarian, and the acoustic structure of schwa variants in spontaneous speech. The study focuses on the acoustic patterns of both the basic realizations of Hungarian vowels and their realizations as neutral vowels (schwas), as well as on the design, implementation...
Article
The duration of the vowel and the nasal was analyzed in the casual pronunciation of Hungarian words containing the sequence Vn.C, where . is a syllable boundary and C is a stop, affricate, fricative, or approximant. It was found that due to anticipatory coarticulation the duration of n is significantly shorter before fricatives and approximants than b...
Article
In Hungarian casual speech, [n] may not be fully pronounced before continuant consonants. In this paper the durational changes of the vowel and the nasal as well as the formants of the vowel were analyzed in VNC sequences. The spreading coarticulatory effect in these sequences is supposed to originate from the continuant C (fricatives and approxima...

Projects (2)
Project
The aim of the FORENSICspeech project is to create a reliable, representative, follow-up forensic speaker database for the Hungarian language. The database is necessary both for forensic phonetic research and for the development and validation of forensic voice comparison systems. The voices of at least 120 native Hungarian speakers (60 males and 60 females) are planned to be recorded at the Laboratory of Speech Acoustics at the Department of Telecommunication and Media Informatics of Budapest University of Technology and Economics. The collection of the voice samples will follow a strict protocol in accordance with international standards: 1) there must be at least two non-contemporaneous recordings of each speaker; 2) the database will contain recordings of each speaker in different speaking styles, with sessions covering an informal telephone conversation, an information exchange task over the telephone, and a pseudo-police-style interview; 3) the database will comply with research and forensic criteria (including the modelling of recording and transmission channel mismatches).