Archived project

Source counting by ad-hoc microphones

Project log

Shahab Pasha
added 4 research items
A spatially modified multi-channel linear prediction analysis is proposed and tested for the dereverberation of ad-hoc microphone arrays. The proposed spatial multi-channel linear prediction takes into account the estimated spatial distances between each microphone and the source and is applied for short-term dereverberation (pre-whitening). Delayed linear prediction is then applied to suppress the late reverberation. Results suggest that the proposed method outperforms standard linear prediction-based methods when applied to ad-hoc microphones. It is also concluded that the kurtosis of the linear prediction residual signal is a reliable distance feature when microphone gains are inconsistent and the sources' energy levels vary.
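As a rough, single-channel illustration of the distance cue described above, the Python sketch below computes the kurtosis of a linear prediction residual; the LP order and the autocorrelation-based solution of the normal equations are assumptions for illustration, not necessarily the paper's configuration.

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter
from scipy.stats import kurtosis

def lp_residual_kurtosis(x, order=12):
    # Kurtosis of the linear-prediction residual of one channel.
    # Stronger reverberation (a more distant microphone) tends to
    # Gaussianise the residual and lower its kurtosis, so the value can
    # be compared across channels as a relative distance cue.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation method: solve the normal equations R a = r.
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    # Prediction-error (whitening) filter A(z) = 1 - sum_k a_k z^{-k}.
    residual = lfilter(np.concatenate(([1.0], -a)), [1.0], x)
    return kurtosis(residual, fisher=False)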
In an era of ubiquitous digital devices with built-in microphones and recording capability, distributed microphone arrays of a few digital recording devices are the emerging recording tool in hands-free speech communications and immersive meetings. Such so-called ad hoc microphone arrays can facilitate high-quality spontaneous recording experiences for a wide range of applications and scenarios, though critical challenges have limited their applications. These challenges include unknown and changeable positions of the recording devices and sound sources, resulting in varying time delays of arrival between microphones in the ad hoc array as well as varying recorded sound power levels. This paper reviews state-of-the-art techniques to overcome these issues and provides insight into possible ways to make existing methods more effective and flexible. The focus of this paper is on scenarios in which the microphones are arbitrarily located in an acoustic scene and do not communicate directly or through a fusion centre.
Shahab Pasha
added a research item
Ad-hoc microphone arrays formed from the microphones of mobile devices such as smart phones, tablets and notebooks are emerging recording platforms for meetings, press conferences and other sound scenes. Unlike Wireless Acoustic Sensor Networks (WASNs), ad-hoc microphones do not communicate within the array and the location of each microphone is unknown. Analysing speech signals and the acoustic scene in the context of ad-hoc microphones is the goal of this thesis. Unlike conventional known-geometry microphone arrays (e.g. a Uniform Linear Array), ad-hoc arrays do not have fixed geometries or structures, and therefore standard speech processing techniques such as beamforming and dereverberation cannot be directly applied to them. The main reasons for this include the unknown distances between microphones, and hence unknown relative time delays, and the changeable array topology. This thesis focuses on utilising the side information obtained from acoustic scene analysis to improve speech enhancement by ad-hoc microphone arrays randomly distributed within a reverberant environment. New discriminative features are proposed, applied and tested for various signal and audio processing applications such as microphone clustering, source localisation, multi-channel dereverberation, source counting and multi-talk detection. The main contributions of this thesis fall into two categories: 1) novel spatial features extracted from Room Impulse Responses (RIRs) and speech signals, and 2) speech enhancement and acoustic scene analysis methods specifically designed for ad-hoc arrays. Microphone clustering, source localisation, speech enhancement, source counting and multi-talk detection in the context of ad-hoc arrays are investigated in this thesis and novel methods are proposed and tested. A clustered speech enhancement and dereverberation method tailored for ad-hoc microphones is proposed, and it is concluded that exclusively using a cluster of microphones located closer to the source improves the dereverberation performance. Also proposed is a multi-channel speech dereverberation method based on a novel spatial multi-channel linear prediction analysis approach for ad-hoc microphones. The spatially modified multi-channel linear prediction approach takes into account the estimated relative distances between the source and the microphones and improves the dereverberation performance. Coherence-based features are applied for multi-talk detection and source counting in highly reverberant environments, and it is shown that the proposed features are reliable source counting features in the context of ad-hoc microphones. Highly accurate offline source counting and pseudo real-time multi-talk detection results are achieved by the proposed methods.
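The microphone-clustering idea above can be sketched as follows; the two-cluster k-means split and the convention that a higher cue value indicates a closer microphone are illustrative assumptions rather than the thesis method.

import numpy as np
from sklearn.cluster import KMeans

def closest_microphone_cluster(channels, distance_cue):
    # channels: list of 1-D signal arrays, one per ad-hoc microphone.
    # distance_cue: callable mapping a channel to a scalar cue, e.g. the
    # LP-residual kurtosis sketched earlier, where a higher value is taken
    # to suggest a closer, less reverberant microphone.
    cues = np.array([distance_cue(ch) for ch in channels]).reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cues)
    # Keep the cluster with the higher mean cue (assumed closest to the source).
    keep = int(np.argmax([cues[labels == k].mean() for k in (0, 1)]))
    return [i for i, lab in enumerate(labels) if lab == keep]

Enhancement or dereverberation would then be run only on the returned channel subset, in the spirit of the clustered method described above.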
Jacob Donley
added a research item
This paper proposes the use of the frequency domain Magnitude Squared Coherence (MSC) between two ad-hoc recordings of speech as a reliable speaker discrimination feature for source counting applications in highly reverberant environments. The proposed source counting method does not require knowledge of the microphone spacing and does not assume any relative distance between the sources and the microphones. Source counting is based on clustering the frequency domain MSC of the speech signals derived from short time segments. Experiments show that the frequency domain MSC is speaker-dependent and the method was successfully used to obtain highly accurate source counting results for up to six active speakers for varying levels of reverberation and microphone spacing.
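A minimal sketch of the idea, assuming Welch-based coherence from scipy, fixed one-second segments and a silhouette criterion for choosing the cluster count (the segment length and the model-selection rule are assumptions, not necessarily those of the paper):

import numpy as np
from scipy.signal import coherence
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def count_sources_msc(x1, x2, fs, seg_len=1.0, max_sources=6):
    # Per-segment MSC vectors between the two ad-hoc channels.
    n = int(seg_len * fs)
    feats = []
    for start in range(0, min(len(x1), len(x2)) - n + 1, n):
        _, msc = coherence(x1[start:start + n], x2[start:start + n],
                           fs=fs, nperseg=512)
        feats.append(msc)
    feats = np.array(feats)
    # Pick the cluster count with the best silhouette score (an assumed
    # selection rule; the single-talker case would need a separate
    # compactness test and is omitted here).
    best_k, best_score = 2, -1.0
    for k in range(2, max_sources + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
        score = silhouette_score(feats, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k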
Shahab Pasha
added 2 research items
Coherent-to-diffuse ratio (CDR) estimates over short time frames are utilized for source counting using ad-hoc microphone arrays to record speech from multiple participants in scenarios such as a meeting. It is shown that the CDR estimates obtained at ad-hoc dual (two channel) microphone nodes, located at unknown locations within an unknown reverberant room, can detect time frames with more than one active source and are informative for source counting applications. Results show that interfering sources can be detected with accuracies ranging from 69% to 89% for delays ranging from 20 ms to 300 ms, with source counting accuracies ranging from 61% to 81% for two sources over the same range of delays.
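The sketch below gives a crude per-frame CDR estimate for a dual-microphone node. It assumes the diffuse-field coherence is negligible at the analysed frequencies, which is a simplification of published DOA-independent CDR estimators and is only meant to illustrate how inter-channel coherence maps to a coherent-to-diffuse ratio.

import numpy as np
from scipy.signal import csd, welch

def rough_cdr(x1, x2, fs, nperseg=512):
    # Inter-channel coherence magnitude for one two-channel frame.
    f, S12 = csd(x1, x2, fs=fs, nperseg=nperseg)
    _, S11 = welch(x1, fs=fs, nperseg=nperseg)
    _, S22 = welch(x2, fs=fs, nperseg=nperseg)
    gamma = np.abs(S12) / np.sqrt(S11 * S22 + 1e-12)
    gamma = np.clip(gamma, 0.0, 1.0 - 1e-6)
    # If the diffuse field were fully incoherent, |Gamma_x| = CDR / (CDR + 1),
    # so the CDR follows directly from the coherence magnitude.
    return f, gamma / (1.0 - gamma)

Frames whose CDR drops markedly could then be flagged as containing interfering or more distant talkers; the paper's actual decision rule and thresholds are not reproduced here.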
This paper proposes a novel approach to detecting multiple, simultaneous talkers in multi-party meetings using localisation of active speech sources recorded with an ad-hoc microphone array. Cues indicating the relative distance between sources and microphones are derived from speech signals and room impulse responses recorded by each of the microphones distributed at unknown locations within a room. Multiple active sources are localised by analysing a surface formed from these cues, derived at different locations within the room. The number of localised active sources per frame or utterance is then counted to estimate when multiple sources are active. The proposed approach does not require prior information about the number and locations of sources or microphones. Synchronisation between microphones is also not required. A meeting scenario with competing speakers is simulated and results show that simultaneously active sources can be detected with an average accuracy of 75% and the number of active sources counted correctly 65% of the time.
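For the counting step only, a hedged sketch: given a 2-D surface of cues over candidate locations (how that surface is built from the ad-hoc recordings is the paper's contribution and is not reproduced here), local maxima above a relative threshold are counted as active sources. The neighbourhood size and threshold are illustrative assumptions.

import numpy as np
from scipy.ndimage import maximum_filter, label

def count_active_sources(cue_surface, rel_threshold=0.7, neighbourhood=5):
    # cue_surface: 2-D array of distance/activity cues over a grid of
    # candidate locations for one frame or utterance.
    cue_surface = np.asarray(cue_surface, dtype=float)
    local_max = cue_surface == maximum_filter(cue_surface, size=neighbourhood)
    strong = cue_surface >= rel_threshold * cue_surface.max()
    # Each connected region of strong local maxima counts as one source.
    _, n_sources = label(local_max & strong)
    return n_sources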