About
75 Publications
22,726 Reads
1,883 Citations
Introduction
Signal processing
Machine learning
Robotic perception
Current institution
Additional affiliations
September 2003 - April 2010
Publications (75)
In this paper we investigate the sampling rate mismatch problem in distributed microphone arrays and propose a correlation maximization algorithm to blindly estimate the sampling rate offset between two asynchronously sampled microphone signals. We approximate the sampling rate offset with a linear-phase drift model in the short-time Fourier transf...
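A minimal sketch of this idea in Python, assuming the linear-phase drift model and a simple grid search over candidate offsets (the function name, grid range, and STFT parameters are illustrative assumptions, not the authors' implementation):

```python
# Sketch: blind sampling-rate-offset estimation via correlation maximization,
# assuming a linear-phase drift model: a relative offset eps makes channel 2
# drift by ~eps*n*hop samples at frame n, which appears as a frequency-dependent
# linear phase in the STFT. Assumes x1 and x2 have the same length.
import numpy as np
from scipy.signal import stft

def estimate_sro(x1, x2, fs, nfft=1024, hop=512, grid=np.linspace(-1e-4, 1e-4, 201)):
    """Return the sampling-rate offset (as a fraction, e.g. 1e-5 = 10 ppm)
    that maximizes the correlation between the two STFTs."""
    _, _, X1 = stft(x1, fs, nperseg=nfft, noverlap=nfft - hop)
    _, _, X2 = stft(x2, fs, nperseg=nfft, noverlap=nfft - hop)
    n_bins, n_frames = X1.shape
    k = np.arange(n_bins)[:, None]          # frequency-bin index
    n = np.arange(n_frames)[None, :]        # frame index
    best_eps, best_score = 0.0, -np.inf
    for eps in grid:
        # compensate the accumulated drift of eps*n*hop samples at bin k
        comp = np.exp(1j * 2 * np.pi * k * eps * n * hop / nfft)
        score = np.abs(np.sum(X1 * np.conj(X2 * comp)))
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps
```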
We investigate the problem of sensor and source joint localization using time-difference of arrivals (TDOAs) of an ad-hoc array. A major challenge is that the TDOAs contain unknown time offsets between asynchronous sensors. To address this problem, we propose a low-rank approximation method that does not need any prior knowledge of sensor and sourc...
In this paper, a multi-microphone noise reduction system based on the generalized sidelobe canceller (GSC) structure is investigated. The system consists of a fixed beamformer providing an enhanced speech reference, a blocking matrix providing a noise reference by suppressing the target speech, and a single-channel spectral post-filter. The spectra...
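The GSC structure described here can be illustrated with a minimal time-domain sketch, assuming the target is already time-aligned across channels; the fixed beamformer is a plain average, the blocking matrix takes adjacent-channel differences, and a basic NLMS canceller stands in for the adaptive stage (all names and parameters are illustrative, and the spectral post-filter is omitted):

```python
# Minimal generalized sidelobe canceller (GSC) sketch, assuming time-aligned channels.
import numpy as np

def gsc(x, n_taps=64, mu=0.1, eps=1e-8):
    """x: (n_channels, n_samples) time-aligned microphone signals."""
    d = x.mean(axis=0)                       # fixed beamformer (speech reference)
    u = x[1:] - x[:-1]                       # blocking matrix (noise references)
    n_ref, n_samples = u.shape
    w = np.zeros((n_ref, n_taps))
    y = np.zeros(n_samples)
    for t in range(n_taps, n_samples):
        frame = u[:, t - n_taps:t]           # (n_ref, n_taps) noise snapshot
        noise_est = np.sum(w * frame)
        y[t] = d[t] - noise_est              # enhanced output
        norm = np.sum(frame ** 2) + eps
        w += mu * y[t] * frame / norm        # NLMS update
        # NOTE: in practice the filter should adapt only in noise-only periods
        # to avoid cancelling the target speech.
    return y
```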
We propose an over-determined source separation and localization method for a set of M microphones distributed around an unknown number, N < M, of sources. We reformulate the over-determined acoustic mixing procedure with a new determined mixing model and apply a determined M × M independent component analysis (ICA) in each frequency bin directly. The...
Multi-modal emotion recognition is challenging due to the difficulty of extracting features that capture subtle emotional differences. Understanding multi-modal interactions and connections is key to building effective bimodal speech emotion recognition systems. In this work, we propose Bimodal Connection Attention Fusion (BCAF) method, which inclu...
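A generic cross-modal attention fusion sketch in PyTorch, illustrating the kind of audio-text interaction modelling described above; this is not the proposed BCAF method, and all dimensions and layer choices are assumptions:

```python
# Sketch: audio frames attend to text tokens and vice versa; pooled
# representations are concatenated for emotion classification.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_classes=4):
        super().__init__()
        self.audio_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, audio, text):
        # audio: (batch, T_a, d_model), text: (batch, T_t, d_model)
        a, _ = self.audio_to_text(query=audio, key=text, value=text)   # audio attends to text
        t, _ = self.text_to_audio(query=text, key=audio, value=audio)  # text attends to audio
        fused = torch.cat([a.mean(dim=1), t.mean(dim=1)], dim=-1)      # pool and concatenate
        return self.classifier(fused)

logits = CrossModalFusion()(torch.randn(8, 120, 256), torch.randn(8, 30, 256))
```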
Multi-modal emotion recognition in conversations is a challenging problem due to the complex and complementary interactions between different modalities. Audio and textual cues are particularly important for understanding emotions from a human perspective. Most existing studies focus on exploring interactions between audio and text modalities at th...
Linear Text Segmentation is the task of automatically tagging text documents with topic shifts, i.e. the places in the text where the topics change. A well-established area of research in Natural Language Processing, drawing from well-understood concepts in linguistics and computational linguistics research, the field has recently seen a lot of inter...
Background
Hypertension is the commonest cause of left ventricular hypertrophy (LVH). Using cardiac magnetic resonance (CMR) imaging, four distinct hypertension mediated LVH phenotypes have been reported: normal LV, LV remodelling, eccentric LVH and concentric LVH. Early detection of hypertensives with LVH can enable timely initiation of therapy ho...
Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Pr...
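One common remedy for this kind of imbalance, sketched below, is an inverse-frequency class-weighted loss; this is an illustrative baseline, not the method proposed in the paper:

```python
# Sketch: weight the cross-entropy loss inversely to class frequency so that
# minority emotion classes contribute more to the gradient.
import torch
import torch.nn as nn

labels = torch.tensor([0, 0, 0, 0, 1, 2, 2, 3])            # toy imbalanced label set
counts = torch.bincount(labels, minlength=4).float()
weights = counts.sum() / (len(counts) * counts)             # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)                                  # model outputs for 8 samples
loss = criterion(logits, labels)
```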
In this paper we propose a new framework and new methods for the reference-free evaluation of topic segmentation systems directly in the embedding space. Specifically, we define a common framework for reference-free, embedding-based topic segmentation metrics, and show how this applies to an existing metric. We then define new metrics, based on a p...
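A minimal sketch of a reference-free, embedding-based segmentation score in the spirit of this framework (not the paper's specific metric): reward high cosine similarity within predicted segments and penalize similarity between adjacent segments.

```python
# Sketch: score a predicted segmentation directly in the sentence-embedding space.
import numpy as np

def segmentation_score(embeddings, boundaries):
    """embeddings: (n_sentences, dim); boundaries: indices where a new topic starts."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    edges = [0] + sorted(boundaries) + [len(emb)]
    segments = [emb[a:b] for a, b in zip(edges[:-1], edges[1:]) if b > a]
    within = np.mean([np.mean(seg @ seg.T) for seg in segments])
    across = np.mean([np.mean(s1 @ s2.T) for s1, s2 in zip(segments[:-1], segments[1:])]) \
        if len(segments) > 1 else 0.0
    return within - across        # higher = more coherent segments, sharper boundaries
```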
Background
Four hypertension-mediated left ventricular hypertrophy (LVH) phenotypes have been reported using cardiac magnetic resonance (CMR): normal LV, LV remodeling, eccentric and concentric LVH, with varying prognostic implications. The electrocardiogram (ECG) is routinely used to detect LVH, however its capacity to differentiate between LVH ph...
Neural sentence encoders (NSE) are effective in many NLP tasks, including topic segmentation. However, no systematic comparison of their performance in topic segmentation has been performed. Here, we present such a comparison, using supervised and unsupervised segmentation models based on NSEs. We first compare results with baselines, showing that...
Multi‐rotor drones equipped with acoustic sensors have great potential for bioacoustically monitoring vocal species in the environment for biodiversity conservation. The bottleneck of this emerging technology is the ego‐noise from the rotating motors and propellers, which can completely mask the target sound and make sound recordings unusable for f...
Recent works on linear text segmentation have shown new state-of-the-art results nearly every year. Most times, however, these recent advances include a variety of different elements which makes it difficult to evaluate which individual components of the proposed methods bring about improvements for the task and, more generally, what actually works...
Funding Acknowledgements
Type of funding sources: Foundation. Main funding source(s): British Heart Foundation Pat Merriman Clinical Research Training Fellowship
Background
Hypertension is the commonest cause of left ventricular hypertrophy (LVH), an established independent predictor of cardiovascular morbidity and mortality. Four distinct LVH phe...
Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. We present an embedded multichannel sound acquisition and recording system with eight microphones mounted on a quadco...
We present two multimodal models for topic segmentation of podcasts built on pre-trained neural text and audio embeddings. We show that results can be improved by combining different modalities; but also by combining different encoders from the same modality, especially general-purpose sentence embeddings with specifically fine-tuned ones. We also...
Aims
Left ventricular hypertrophy (LVH) is an established, independent predictor of cardiovascular disease. Indices derived from the electrocardiogram (ECG) have been used to infer the presence of LVH with limited sensitivity. This study aimed to classify LVH defined by cardiovascular magnetic resonance (CMR) imaging using the 12-lead ECG for cost-...
Speech enhancement for drone audition is made challenging by the strong ego-noise from the rotating motors and propellers, which leads to extremely low signal-to-noise ratios (e.g. SNR < -15 dB) at onboard microphones. In this paper, we extensively assess the ability of single-channel deep learning approaches to ego-noise reduction on drones. We tr...
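A minimal single-channel masking sketch of the kind of approach assessed here (illustrative architecture and dimensions; not one of the networks evaluated in the paper):

```python
# Sketch: a small recurrent network predicts a time-frequency mask from the
# noisy STFT magnitude; the mask is applied to suppress ego-noise before resynthesis.
import torch
import torch.nn as nn

class MaskNet(nn.Module):
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, noisy_mag):                 # (batch, frames, bins)
        h, _ = self.rnn(noisy_mag)
        mask = torch.sigmoid(self.out(h))         # mask values in [0, 1]
        return mask * noisy_mag                   # masked (enhanced) magnitude
```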
Sound source localization from a flying drone is a challenging task due to the strong ego-noise from rotating motors and propellers as well as the movement of the drone and the sound sources. To address this challenge, we propose a deep learning-based framework that integrates single-channel noise reduction and multi-channel source localization. In...
This chapter demonstrates how adversarial learning can be used in the mobile computing domain. Specifically, we address the problem of improving the recognition of human activities from smartphone sensors, when limited training data is available. Generative Adversarial Networks (GANs) provide an approach to model the distribution of a dataset and c...
The Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenges aim to advance and capture the state-of-the-art in locomotion and transportation mode recognition from smartphone motion (inertial) sensors. The goal of this series of machine learning and data science challenges was to recognize eight locomotion and transportation activities...
In this paper we summarize the contributions of participants to the fourth Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp/ISWC 2021. The goal of this machine learning/data science challenge is to recognize eight locomotion and transportation activities (Still, Walk, Run, Bike, Bus, Car...
Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight...
In this paper we summarize the contributions of participants to the third Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp/ISWC 2020. The goal of this machine learning/data science challenge is to recognize eight locomotion and transportation activities (Still, Walk, Run, Bike, Bus, Car,...
This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the fi...
We present the first work that investigates the potential of improving the performance of transportation mode recognition through fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, which work with the three types of sensors, respectively. We then propose tw...
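A simple late-fusion baseline for combining three modality-specific classifiers (a sketch only; the paper's two proposed fusion schemes are not reproduced here):

```python
# Sketch: average the per-class probabilities from the motion, sound and vision networks.
import numpy as np

def late_fusion(prob_motion, prob_sound, prob_vision, weights=(1.0, 1.0, 1.0)):
    """Each prob_* is an (n_samples, n_classes) array of softmax outputs."""
    w = np.asarray(weights, dtype=float)
    stacked = np.stack([prob_motion, prob_sound, prob_vision])   # (3, n, c)
    fused = np.tensordot(w, stacked, axes=1) / w.sum()           # weighted average
    return fused.argmax(axis=1)                                  # predicted class per sample
```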
Acoustic sensing from a multi-rotor drone is heavily degraded by the strong ego-noise produced by the rotating motors and propellers. To address this problem, we propose a blind source separation (BSS) framework that extracts a target sound from noisy multi-channel signals captured by a microphone array mounted on a drone. The proposed method addre...
In this chapter we present a case study on drinking gesture recognition from a dataset annotated by Experience Sampling (ES). The dataset contains 8825 “sensor events” and users reported 1808 “drink events” through experience sampling. We first show that the annotations obtained through ES do not reflect accurately true drinking events. We present...
The Sussex-Huawei Transportation-Locomotion (SHL) Recognition Challenge 2018 aims to recognize eight transportation activities (Still, Walk, Run, Bike, Bus, Car, Train, Subway) from the inertial and pressure sensor data of a smartphone. In this chapter, we, as part of competition organizing team, present reference recognition performance obtained b...
In this paper we summarize the contributions of participants to the Sussex-Huawei Transportation-Locomotion (SHL) Recognition Challenge organized at the HASCA Workshop of Ubi-Comp 2019. The goal of this machine learning/data science challenge is to recognize eight locomotion and transportation activities (Still, Walk, Run, Bike, Bus, Car, Train, Su...
Transportation and locomotion mode recognition from multimodal smartphone sensors is useful to provide just-in-time context-aware assistance. However, the field is currently held back by the lack of standardized datasets, recognition tasks and evaluation criteria. Currently, recognition methods are often tested on ad-hoc datasets acquired for one-o...
In this paper we, as part of the Sussex-Huawei Locomotion-Transportation (SHL) Recognition Challenge organizing team, present reference recognition performance obtained by applying various classical and deep-learning classifiers to the testing dataset. We aim to recognize eight modes of transportation (Still, Walk, Run, Bike, Bus, Car, Train, Subwa...
In this paper we summarize the contributions of participants to the Sussex-Huawei Transportation-Locomotion (SHL) Recognition Challenge organized at the HASCA Workshop of UbiComp 2018. The SHL challenge is a machine learning and data science competition, which aims to recognize eight transportation activities (Still, Walk, Run, Bike, Bus, Car, Trai...
In this paper we present a case study on drinking gesture recognition from a dataset annotated by Experience Sampling (ES). The dataset contains 8825 "sensor events" and users reported 1808 "drink events" through experience sampling. We first show that the annotations obtained through ES do not reflect accurately true drinking events. We present th...
We propose a method to track from a multi-rotor drone a moving source, such as a human speaker or an emergency whistle, whose sound is mixed with the strong ego-noise generated by rotating motors and propellers. The proposed method is independent of the specific drone and does not need pre-training nor reference signals. We first employ a time-freq...
Scientific advances build on reproducible research, which needs publicly available benchmark datasets. The computer vision and speech recognition communities have led the way in establishing benchmark datasets. There are far fewer datasets available in mobile computing, especially for rich locomotion and transportation analytics. This paper presents...
We propose a time-frequency processing method that localizes and enhances a target sound by exploiting spectral and spatial characteristics of the ego-noise captured by a microphone array mounted on a multi-rotor micro aerial vehicle. We first exploit the time-frequency sparsity of the acoustic signal to estimate at each individual time-frequency b...
We propose a pseudo-determined blind source separation framework that exploits the information from a large number of microphones in an ad-hoc network to extract and enhance sound sources in a reverberant scenario. After compensating for the time offsets and sampling rate mismatch between (asynchronous) signals, we interpret as a determined M $\tim...
The ego-noise generated by the motors and propellers of a micro aerial vehicle (MAV) masks the environmental sounds and considerably degrades the quality of the on-board sound recording. Sound enhancement approaches generally require knowledge of the direction of arrival of the target sound sources, which are difficult to estimate due to the low si...
When a micro aerial vehicle (MAV) captures sounds emitted by a ground or aerial source, its motors and propellers are much closer to the microphone(s) than the sound source, thus leading to extremely low signal-to-noise ratios (SNR), e.g. -15 dB. While microphone-array techniques have been investigated intensively, their application to MAV-based eg...
We propose a time difference of arrival (TDOA) estimation framework based on time-frequency inter-channel phase difference (IPD) to count and localize multiple acoustic sources in a reverberant environment using two distant microphones. The time-frequency processing enables exploitation of the nonstationarity and sparsity of audio signals, increasi...
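A sketch of the basic IPD-to-TDOA idea under simplifying assumptions (far field, no phase wrapping); the paper's actual estimation framework is more elaborate:

```python
# Sketch: convert per-bin inter-channel phase differences to candidate delays
# and histogram them; peaks in the histogram indicate per-source TDOAs.
import numpy as np
from scipy.signal import stft

def tdoa_histogram(x1, x2, fs, nfft=1024, max_tdoa=1e-3, n_bins=200):
    f, _, X1 = stft(x1, fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs, nperseg=nfft)
    ipd = np.angle(X1 * np.conj(X2))                  # phase difference per TF bin
    valid = f > 0
    tau = ipd[valid] / (2 * np.pi * f[valid, None])   # candidate delay per TF bin
    hist, edges = np.histogram(tau, bins=n_bins, range=(-max_tdoa, max_tdoa))
    return hist, 0.5 * (edges[:-1] + edges[1:])       # counts and bin centres
```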
We investigate the self-localisation problem of an ad-hoc network of randomly distributed and independent devices in an open-space environment with low reverberation but heavy noise (e.g. smartphones recording videos of an outdoor event). Assuming a sufficient number of sound sources, we estimate the distance between a pair of devices from the extre...
We use audio fingerprinting to solve the synchronization problem between multiple recordings from an ad-hoc array consisting of randomly placed wireless microphones or hand-held smartphones. Synchronization is crucial when employing conventional microphone array techniques such as beamforming and source localization. We propose a fine audio landmar...
The paper investigates vector quantization coding of high-order (e.g., 20th-50th order) linear prediction coding (LPC) parameters, and proposes a novel hierarchical decomposition vector quantization method for a scalable speech coding framework with variable orders of LPC analysis. Instead of vector quantizing the whole group of LPC parameters in t...
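For illustration, a plain split vector quantization sketch, a simpler stand-in for the hierarchical decomposition VQ proposed here; the sub-vector splits, bit allocation, and use of k-means codebooks are assumptions:

```python
# Sketch: split a high-order LPC/LSF vector into sub-vectors, each with its own codebook.
import numpy as np
from sklearn.cluster import KMeans

def train_split_vq(lsf, splits=(10, 10, 10), bits_per_split=6):
    """lsf: (n_frames, order) training vectors; returns one codebook per split."""
    codebooks, start = [], 0
    for size in splits:
        sub = lsf[:, start:start + size]
        km = KMeans(n_clusters=2 ** bits_per_split, n_init=5, random_state=0).fit(sub)
        codebooks.append(km)
        start += size
    return codebooks

def quantize(vec, codebooks, splits=(10, 10, 10)):
    out, start = [], 0
    for km, size in zip(codebooks, splits):
        idx = km.predict(vec[None, start:start + size])[0]   # nearest codeword index
        out.append(km.cluster_centers_[idx])
        start += size
    return np.concatenate(out)
```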
Blind source separation (BSS) and beamforming are two well-known multiple-microphone techniques for speech separation and extraction in cocktail-party environments. However, both perform poorly in highly reverberant and dynamic scenarios. Emulating human auditory systems, this chapter proposes a combined method for better separation and...
Conventional fixed-point implementation of the DCT coefficient quantization algorithm in video compression may result in deteriorated image quality. The paper investigates this problem and proposes an improved floating-to-fixed-point conversion scheme. With a proper scaling factor and a newly established look-up table, the proposed fixed-point schem...
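The floating-to-fixed-point idea can be illustrated with a toy sketch (illustrative step sizes and shift only): the floating-point division by the quantization step is replaced by an integer multiply with a precomputed look-up-table entry followed by a shift.

```python
# Sketch: fixed-point DCT-coefficient quantization using a scale-factor look-up table.
SHIFT = 16
STEP_TABLE = [8, 10, 13, 16, 20, 26, 32, 40]                # example quantization steps
MULT_TABLE = [round((1 << SHIFT) / s) for s in STEP_TABLE]  # precomputed scale factors

def quantize_float(coef, qp):
    return round(coef / STEP_TABLE[qp])                     # floating-point reference

def quantize_fixed(coef, qp):
    # integer multiply + rounding offset + shift replaces the division
    return (int(coef) * MULT_TABLE[qp] + (1 << (SHIFT - 1))) >> SHIFT

print(quantize_float(250.0, 3), quantize_fixed(250, 3))     # both give 16
```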
An important objective of binaural noise reduction algorithms in hearing aids is the preservation of the binaural cues. Recently a Multi-channel Wiener filter with instantaneous binaural cue preservation (MWF-ITFhc) has been presented, which relies on an accurate estimate of the noise signal vector. In this paper we propose a GSC-like structure for...
Due to the ambient noise, interferences, reverberation, and the speakers moving and talking concurrently, it is a challenge to extract a target speech in a real cocktail-party environment. Emulating human auditory systems, this paper proposes a two-stage target speech extraction method which combines fixed beamforming and blind source separation. W...
The convolutive blind source separation (BSS) problem can be solved efficiently in the frequency domain, where instantaneous BSS is performed separately in each frequency bin. However, the permutation ambiguity in each frequency bin should be resolved so that the separated frequency components from the same source are grouped together. To solve the...
This paper introduces the principles of 3D audio technology, reviews the signal processing methods used in 3D audio for the measurement, computation, interpolation, and approximation of head-related transfer functions (HRTFs) and for crosstalk cancellation, and summarizes current hot topics in this area. Finally, the future research trends...
This paper proposes an improved method of solving the permutation problem inherent in frequency-domain of convolutive blind source separation (BSS). It combines a novel inter-frequency dependence measure: the power ratio of separated signals, and a simple but effective bin-wise permutation alignment scheme. The proposed method is easy to implement...
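A sketch of bin-wise permutation alignment using envelope correlation, in the spirit of the power-ratio measure described here but simplified (the running-centroid reference and exhaustive permutation search are assumptions):

```python
# Sketch: for each frequency bin, pick the source permutation whose power ratios
# correlate best with centroids accumulated over already-aligned bins.
import numpy as np
from itertools import permutations

def align_permutations(Y):
    """Y: (n_freq, n_frames, n_src) separated STFT signals; returns aligned copy."""
    n_freq, n_frames, n_src = Y.shape
    env = np.abs(Y) ** 2
    env /= env.sum(axis=2, keepdims=True) + 1e-12     # power ratio per bin/frame
    aligned = Y.copy()
    centroid = env[0].copy()                          # (n_frames, n_src) reference
    for f in range(1, n_freq):
        best_perm, best_score = None, -np.inf
        for perm in permutations(range(n_src)):
            score = sum(np.corrcoef(centroid[:, s], env[f][:, perm[s]])[0, 1]
                        for s in range(n_src))
            if score > best_score:
                best_perm, best_score = perm, score
        aligned[f] = Y[f][:, list(best_perm)]
        centroid = (centroid * f + env[f][:, list(best_perm)]) / (f + 1)
    return aligned
```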
Frequency-domain blind source separation (BSS) performs poorly in high reverberation because the independence assumption collapses in each frequency bin as the number of bins increases. To improve the separation result, this paper proposes a method which combines two techniques by using beamforming as a preprocessor of blind source separation. W...
Crosstalk cancellation plays an important role in displaying binaural signals with loudspeakers. It aims to reproduce binaural signals at a listener's ears via inverting acoustic transfer paths. The crosstalk cancellation filter should be updated in real time according to the head position. This demands high computational efficiency for a crosstalk...
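A sketch of computing crosstalk-cancellation filters by regularized inversion of the 2x2 acoustic transfer matrix at each frequency (illustrative only; the paper's concern is making this update cheap enough to follow head movement in real time):

```python
# Sketch: per-frequency Tikhonov-regularized inversion of the loudspeaker-to-ear paths.
import numpy as np

def crosstalk_canceller(H, beta=1e-3):
    """H: (n_freq, 2, 2) loudspeaker-to-ear transfer matrices.
    Returns C: (n_freq, 2, 2) such that H @ C ~ I at each frequency."""
    C = np.zeros_like(H)
    eye = np.eye(2)
    for k in range(H.shape[0]):
        Hk = H[k]
        # regularization limits the filter gain where Hk is ill-conditioned
        C[k] = Hk.conj().T @ np.linalg.inv(Hk @ Hk.conj().T + beta * eye)
    return C
```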
Head-related transfer function (HRTF) interpolation plays an important role in the implementation of 3D sound systems because it can not only reduce the number of HRTF measurements required, but also reduce the amount of HRTF data needed for seamless binaural synthesis. This paper addresses the problem of accurately realizing the interpolation of HRTF for synthesis of...
The paper proposes a hybrid compression method to resolve the storage problem of a large number of head-related transfer functions (HRTFs). First, each HRTF is approximated by a minimum-phase HRTF and an all-pass filter whose group delay equals the interaural time delay (ITD). Second, principal component analysis is applied to the entire HRTF set t...
An indirect interpolation method for head-related transfer functions (HRTFs) has been developed on the basis of all-zero interpolation methods. The method transforms the pole-zero models into all-zero models and then interpolates the HRTF from the reference all-zero models using linear interpolation. The interpolated all-zero model i...
The efficiency of the traditional binary Huffman decoding method is low. A new decoding method based on octonary Huffman trees is presented in this paper to improve the decoding speed. Huffman codes are represented as an octonary tree and reconstructed as a one-dimensional array according to the position of each node in the tree. When decoding,...
The realization scheme of "out of head" sound field enhancement for headphones is proposed, which is dedicated to embedded systems. The stereo enhancement system renders out-of-head localization by simulating virtual loudspeakers in the room, using head-related transfer function (HRTF) filtering and adding slight reverberation. Furthermore,...
The computational burden and the huge memory size of the head-related transfer function (HRTF) database challenge the practical application of 3D sound systems. This paper therefore proposes a novel method which employs principal component analysis and vector quantization jointly to reduce the size of the HRTF set in a 3D sound system. Numerical experiment...
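A sketch of joint PCA plus vector quantization compression of an HRTF set (illustrative dimensions and codebook size; not the paper's exact configuration):

```python
# Sketch: PCA reduces each magnitude response to a few coefficients, and k-means
# quantizes the coefficient vectors so only codebook indices need to be stored.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def compress_hrtf_set(hrtf_mag, n_components=16, n_codewords=256):
    """hrtf_mag: (n_directions, n_freq_bins) magnitude responses (e.g. in dB);
    assumes n_directions >= n_codewords."""
    pca = PCA(n_components=n_components).fit(hrtf_mag)
    coeffs = pca.transform(hrtf_mag)                       # (n_directions, n_components)
    vq = KMeans(n_clusters=n_codewords, n_init=5, random_state=0).fit(coeffs)
    indices = vq.predict(coeffs)                           # one codebook index per direction
    return pca, vq, indices

def reconstruct(pca, vq, indices):
    return pca.inverse_transform(vq.cluster_centers_[indices])
```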