
Aidan O. T. HoggQueen Mary University of London | QMUL · School of Electronic Engineering and Computer Science
Aidan O. T. Hogg
PhD, MEng
About
17
Publications
3,119
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
121
Citations
Introduction
My current research focuses on using deep learning to capture head-related transfer functions and, more generally, spatial acoustics and immersive audio. Other research interests include speaker diarization and statistical signal processing for audio applications.
More information about my current research can be found at: www.aidanhogg.uk.
Publications
Publications (17)
An individualised head-related transfer function (HRTF) is essential for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient HRTF upsampli...
Immersive audio technologies, ranging from rendering spatialized sounds accurately to efficient room simulations, are vital to the success of augmented and virtual realities. To produce realistic sounds through headphones, the human body and head must both be taken into account. However, the measurement of the influence of the external human morpho...
Direction of arrival (DoA) estimation for sound source localization is increasingly prevalent in modern devices. In this paper, we explore a polynomial extension to the multiple signal classification (MUSIC) algorithm, spatio-spectral polynomial (SSP)-MUSIC, and evaluate its performance when using speech sound sources. In addition, we also propose...
This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during ove...
Estimating the quality of speech without the use of
a clean reference signal is a challenging problem, in part due to
the time and expense required to collect sufficient training data
for modern machine learning algorithms. We present a novel,
non-intrusive estimator that exploits recurrent neural network
architectures to predict the intrusive POLQ...
A Head-Related Transfer Function (HRTF) is able to capture alterations a sound wave undergoes from its source before it reaches the entrances of a listener's left and right ear canals, and is imperative for creating immersive experiences in virtual and augmented reality (VR/AR). Nevertheless, creating personalized HRTFs demands sophisticated equipm...
An individualised (HRTF) is very important for creating realistic (VR) and (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient HRTF upsampling has been exploited in the past where a high-resolution HR...
Individualised head-related transfer functions (HRTFs) are essential for creating realistic virtual reality (VR) and augmented reality (AR) environments and interactions. Performing acoustic measurements is the most accurate way to capture these individualised HRTFs. However, one of the main challenges is acoustically capturing high-quality HRTFs w...
Human-based speaker diarization experiments were carried out on a five-minute extract of a typical AMI corpus meeting to see how much variance there is in human reviews based on hearing only and to compare with state-of-the-art diarization systems on the same extract. There are three distinct experiments: (a) one with no prior information; (b) one...
Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and computation of these applications. In this paper, we propose...
Diarization systems are an essential part of many speech processing applications, such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker domains. This thesis will focus on the first task of the diarization process, that being the task of speake...
An essential part of any diarization system is the task of speaker segmentation which is important for many applications including speaker indexing and automatic speech recognition (ASR) in multi-speaker environments. Segmentation of overlapping speech has recently been a key focus of this work. In this paper we explore the use of a new multimodal...
Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ground truth speaker segment boundaries such that estimated speaker segment boundaries with such collars are considered completely correct. This paper shows that the popular recent approach of removing forgiveness collars from speaker diarization eva...
Speaker segmentation is an essential part of any diarization system. Applications of diarization include tasks such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method...
This paper shows that time varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual's pitch is smoothly varying and, therefore, can be pre...