John Kane

Cogito Corporation

BComm, MPhil, PhD

About

48
Publications
9,742
Reads
1,777
Citations
Introduction
I am a researcher and developer of algorithms to derive meaning and value from large volumes of data. To do this I exploit advanced methods from digital signal processing, statistical modelling and machine learning. My career to date has focused on detecting, modelling and predicting human behaviour, mainly through analysis of speech signals. I am most interested in creating scalable analytics systems for processing behavioural data in real-time.
Additional affiliations
January 2014 - present
Cogito Corporation
Position
  • Signal processing researcher and developer
October 2012 - December 2013
Trinity College Dublin
Position
  • PostDoc Position
Education
September 2008 - September 2012
Trinity College Dublin
Field of study
  • Speech processing
September 2007 - September 2008
Trinity College Dublin
Field of study
  • Speech and Language Processing
September 2001 - August 2004
University College Dublin
Field of study
  • Commerce and Marketing

Publications (48)
Article
Full-text available
Background: There is a critical need for real-time tracking of behavioral indicators of mental disorders. Mobile sensing platforms that objectively and noninvasively collect, store, and analyze behavioral indicators have not yet been clinically validated or scalable. Objective: The aim of our study was to report on models of clinical symptoms fo...
Article
This paper investigates the temporal excitation patterns of creaky voice. Creaky voice is a voice quality frequently used as a phrase-boundary marker, but also as a means of portraying attitude, affective states and even social status. Consequently, the automatic detection and modelling of creaky voice may have implications for speech technology ap...
Conference Paper
Full-text available
This paper presents a voice source modelling method employing a deep neural network (DNN) to map from acoustic features to the time-domain glottal flow waveform. First, acoustic features and the glottal flow signal are estimated from each frame of the speech database. Pitch-synchronous glottal flow time-domain waveforms are extracted, interpolat...
Conference Paper
Full-text available
As part of a broader study of voice prosody in speech communication, this paper looks at intonation in turn-taking. It examines the distribution of pitch patterns and communicative types in the interpausal units (IPUs) preceding pause or gap silences extracted from a corpus of spontaneous speech of Irish English. IPUs preceding speaker change ('Gap...
Conference Paper
Full-text available
Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementations based on literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This paper presents a new collaborati...
Article
The effectiveness of glottal source analysis is known to be dependent on the phonetic properties of its concomitant supraglottal features. Phonetic classes like nasals and fricatives are particularly problematic. Their acoustic characteristics, including zeros in the vocal tract spectrum and aperiodic noise, can have a negative effect on glottal in...
Article
Full-text available
For many applications in human-computer interaction, it is desirable to predict between-speaker silences (gaps) and within-speaker silences (pauses) independently of automatic speech recognition (ASR). In this study, we focus on a dataset of 6 dyadic task-based interactions and aim at automatic discrimination of gaps and pauses based on f0, energy and glottal paramete...
Article
This paper presents GlóRí, the glottal research instrument. GlóRí is a speech analysis interface which offers flexibility and a multiplicity of approaches to voice analysis. The system allows for fully automatic processing, for instance for analysis of large corpora. However, for more fine-grained studies, which may require precise voice source m...
Article
This paper examines how pitch range declination and reset contribute to turn-taking organisation. This is part of a broader study of voice prosody, i.e., how pitch, voice quality and temporal features combine for various prosodic functions, both linguistic and paralinguistic. The present study first investigates the effect of the speech unit positi...
Conference Paper
Full-text available
Voice quality plays a pivotal role in speech style variation. Therefore, control and analysis of voice quality is critical for many areas of speech technology. Until now, most work has focused on small purpose-built corpora. In this paper we apply state-of-the-art voice quality analysis to large speech corpora built for expressive speech synthesis....
Conference Paper
Full-text available
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance their naturalness, speech synthesisers should be able to generate speech in all its expressive diversity. This includes a proper use of creaky voice. The goal of this pape...
Conference Paper
Full-text available
The robust and efficient extraction of features related to the glottal excitation source has become increasingly important for speech technology. The glottal open quotient (OQ) is one relevant measurement which is known to significantly vary with changes in voice quality on a breathy to tense continuum. The extraction of OQ, however, is hampered in...
Conference Paper
Full-text available
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance the naturalness of speech synthesis, synthesisers should be able to generate speech in all its expressive diversity, including creaky voice. The present study looks to exploit our recent develop...
Conference Paper
This paper documents a comprehensive evaluation carried out on automatic glottal inverse filtering and glottal source parameterisation methods. The experiments consist of analysis of a wide variety of synthetic vowels and assessment of the ability of derived parameters to differentiate breathy to tense voice. One striking finding is that glottal mo...
Article
This paper describes a new algorithm for automatically detecting creak in speech signals. Detection is made by utilising two new acoustic parameters which are designed to characterise creaky excitations following previous evidence in the literature combined with new insights from observations in the current work. In particular the new method focuse...
Article
This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating breathy to tense voice. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where locations of these maxima organize in the vicinity of the edge location. Similarly for tense voice, which typically displa...
Article
A large part of the research carried out at the Phonetics and Speech Laboratory is concerned with the role of the voice source in the prosody of spoken language, including its linguistic and expressive dimensions. Due to the lack of robustness of automatic voice source analysis methods, we have tended to use labour-intensive methods which require pu...
Article
Recently developed speech technology platforms, such as statistical speech synthesis and voice transformation systems, facilitate the modification of voice characteristics. To fully exploit the potential of such platforms, speech analysis algorithms need to be able to handle the different acoustic characteristics of a variety of voice qualities. Gl...
Article
The dynamic use of voice qualities in spoken language can reveal useful information on a speaker's attitude, mood and affective states. This information may be very desirable for a range of speech technology applications, both input and output. However, voice quality annotation of speech signals may frequently produce far from consistent labeling....
Article
This paper explores the interplay of source correlates of accentuation, examining a hypothesis (the Voice Prominence Hypothesis) that different source parameters are involved and may serve as equivalent. It predicts that where accentuation is not marked by pitch salience there will be more extensive changes in other source parameters. This follows...
Article
Parameterisation of the glottal source has become increasingly useful for speech technology. For many applications it may be desirable to restrict the glottal source feature data to only speech regions where it can be reliably extracted. In this paper we exploit the previously proposed set of binary phonetic feature extractors to help determine opt...
Conference Paper
Full-text available
The dynamic use of voice qualities in spoken language can reveal useful information on a speaker's attitude, mood and affective states. This information may be desirable for a range of speech technology applications. However, annotation of voice quality may frequently be inconsistent across raters. But whom should one trust or is the truth somewher...
Conference Paper
Full-text available
In order to produce natural-sounding output, corpus-based speech synthesis systems need to be able to properly model the acoustic variability in the corpus. Creaky voice is a voice quality frequently produced in many languages, in both read and conversational speech settings. However, the creaky excitation displays different acoustic characteristi...
Article
Full-text available
Creak, also referred to as vocal fry or glottal fry, is a perceptually distinctive vocal occurrence that is frequently produced in running speech. The presence of creak causes problems in speech processing, particularly with f0 and spectral analysis. However, as creak frequently occurs in natural spoken communication and as it often carries impo...
Article
Full-text available
Creaky voice is used by speakers for a variety of interactive, expressive and stylistic reasons. As a result, the accurate detection of creaky regions in speech can yield important information not captured within the propositional content of spoken utterances. Hence, we describe a new method for automatically detecting creaky regions following the...
Article
Much of our research has focused on the role of the voice source in the prosody of spoken language, including its linguistic and expressive dimensions. However, as automatic methods, both for deriving the voice source and for modelling it, tend to lack robustness, we have generally conducted studies on small amounts of speech data. These studies hav...
Conference Paper
Full-text available
The present study proposes a new parameter for identifying breathy to tense voice qualities in a given speech segment using measurements from the wavelet transform. Techniques that can deliver robust information on the voice quality of a speech segment are desirable as they can help tune analysis strategies as well as provide automatic voice qualit...
Conference Paper
Full-text available
According to the source-filter model of speech production, speech can be represented by passing the excitation signal through the vocal tract filter. The epoch or instant of maximum excitation corresponds to the glottal closure instant. Several speech processing applications require robust epoch detection but this can be a difficult task. Although...
Conference Paper
Full-text available
The detection of short utterances in conversational or interactive speech is essential to the proper processing of meaning in spoken interaction. Short, simple utterances are extremely common, and because of their highly variable prosody, carry many different forms of subtle interpersonal information. This paper reports on our approach to this prob...
Conference Paper
Full-text available
This paper presents a new method of extracting LF model based parameters using a spectral model matching approach. Strategies are described for overcoming some of the known difficulties of this type of approach, in particular high frequency noise. The new method performed well compared to a typical time based method particularly in terms of robustn...
Conference Paper
Full-text available
This pilot study explores how the voice source parameters vary in focally accented syllables. It examines the dynamics of the voice source parameters in an all-voiced short declarative utterance in which the focus placement was varied. The voice source parameters F0, EE, UP, OQ, RG, RA, RK and RD were obtained through inverse filtering and subseque...
Article
Full-text available
This paper describes a new technique for automatically parameterising the inverse filtered speech waveform by exploiting frequency domain measures and amplitude measures in the time domain. The technique is motivated by the difficulties posed by time domain analysis and by the consequent risks of inconsistencies on the part of both research...
