Mark R. P. Thomas

Dolby Laboratories, Inc. · ATG Sound Tech Research

MEng PhD

About

55 Publications
9,572 Reads
1,104 Citations
Citations since 2016
11 Research Items
721 Citations
Introduction
I am interested in many areas of signal processing for speech, audio, and acoustics. I enjoy working on both theoretical and practical problems in spatial audio capture and reproduction, multichannel acoustic signal processing, and glottal-synchronous speech processing.
Additional affiliations
October 2011 - October 2015
Microsoft
Position
  • Research Associate
Description
  • Contributed to the development of HoloLens, Windows 10, Kinect for Xbox One, Surface Pro 3, Cities Unlocked. Conducted basic research in audio/acoustic signal processing (beamforming, Fourier acoustics, dereverberation, HRTFs, WFS).
February 2010 - October 2011
Imperial College London
Position
  • Research Associate
Description
  • MEng thesis: A Novel Loudspeaker Equalizer. PhD thesis: Glottal-Synchronous Speech Processing.
April 2007 - October 2011
Imperial College London
Position
  • Research Assistant
Description
  • Real-Time DSP: MSc and MEng (part IV). Digital Signal Processing: BEng/MEng (part III). Digital Electronics: BEng/MEng (part I).
Education
October 2006 - March 2010
Imperial College London
Field of study
  • Glottal-Synchronous Speech Processing
September 2002 - September 2006
Imperial College London
Field of study
  • Electrical and Electronic Engineering

Publications (55)
Preprint
We propose a straightforward and cost-effective method to perform diffuse soundfield measurements for calibrating the magnitude response of a microphone array. Typically, such calibration is performed in a diffuse soundfield created in reverberation chambers, an expensive and time-consuming process. A method is proposed for obtaining diffuse field...
Preprint
The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art G...
Conference Paper
The perception of source location using multi-loudspeaker amplitude panning is considered. While there exist many perceptual models for pairwise panning, relatively few studies consider the general multi-loudspeaker case. This paper evaluates panning scenarios in which a source is panned on the boundary or within the volume bounded by discrete loud...
Conference Paper
The near-uniform distribution of nodes on the surface of a sphere has found many uses in numerical integration, physics, chemistry, crystallography, and more recently in the capture, representation and reproduction of spatial audio. A popular solution posed by Fliege and Meyer treats nodes as charged particles that are constrained to lie on the sur...
Conference Paper
Full-text available
Head-related transfer functions (HRTFs) depend on the shape of the human head and ears, motivating HRTF personalization methods that detect and exploit morphological similarities between subjects in an HRTF database and a new user. Prior work determined similarity from sets of morphological parameters. Here we propose a non-parametric morphological...
Conference Paper
Full-text available
Microphone arrays are beneficial for distant speech capture because the signals they capture can be exploited with beamforming to suppress noise and reverberation. The theory for the design and analysis of microphone arrays is well established; however, the performance of a microphone array beamformer is often subject to conflicting criteria that ne...
Research
Full-text available
Accurate modelling of the interaural time difference (ITD) is crucial for rendering localised sound. Parametric models allow personalising ITDs using anthropometrics. However, the mapping between anthropometric features and model parameters is not straightforward. Here, we propose deriving personalised ITD model parameters from a sphere fitte...
Conference Paper
Full-text available
Reverberation time, or T60, is a key parameter used for characterizing acoustic spaces. Blind T60 estimation is useful for many applications including speech intelligibility estimation, acoustic scene analysis and dereverberation. In our previous work, a single-channel blind T60 estimator was proposed employing spectral analysis in the modulation f...
Patent
Full-text available
Systems, methods, and computer media for generating an avatar reflecting a player's current appearance. Data describing the player's current appearance is received. The data includes a visible spectrum image of the player, a depth image including both the player and a current background, and skeletal data for the player. The skeletal data indicates...
Conference Paper
Full-text available
Reverberation time is an important parameter for characterizing acoustic environments. It is useful in many applications including acoustic scene analysis, robust automatic speech recognition and dereverberation. Given knowledge of the acoustic impulse response, reverberation time can be measured using Schroeder’s backward integration method. Since...
Conference Paper
Full-text available
This paper presents a method for speech time scale modification. Voiced speech is pseudo-periodic, allowing time scale modification by the repetition or removal of cycles as necessary. However, in the case of unvoiced speech and at the boundaries of voiced speech, no such periodicity exists so the speech should not be modified. To address this issu...
Conference Paper
Full-text available
Signals captured by microphone arrays provide spatial diversity that can be exploited by multichannel processing algorithms to suppress noise and reverberation. Beamforming is a class of approaches that treats the problem with respect to the spatial location of wanted and competing sources, leveraging properties of propagation of waves in free spac...
Conference Paper
Full-text available
We propose a method for the synthesis of the magnitudes of Head-related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject’s anthropometric features w.r.t. the anthropometric features in the training set. The fundamenta...
Conference Paper
Full-text available
Spherical microphone and circular microphone arrays are useful for sampling sound fields that may be resynthesized with loudspeaker arrays. Spherical microphone arrays are desirable because of their ability to capture three-dimensional sound fields; however, it is often more practical to construct loudspeaker arrays in the form of a closed circle lo...
Conference Paper
Full-text available
Reverberation is a process that distorts a wanted signal and impairs perceived speech quality. In the context of multichannel dereverberation, channel-based methods and beamforming are two common approaches. Channel-based methods such as the multiple input/output inverse theorem (MINT) can provide perfect dereverberation provided the exact acoustic...
Conference Paper
Full-text available
The Spectral Division Method is an analytic approach for sound field synthesis that determines the loudspeaker driving function in the wavenumber domain. Compact expressions for the driving function in time-frequency domain or in time domain can only be determined for a low number of special cases. Generally, the involved spatial Fourier transforms...
Conference Paper
Full-text available
We propose the concept of gentle acoustic crosstalk cancelation, which aims at reducing the crosstalk between a loudspeaker and the listener's contralateral ear instead of eliminating it completely as aggressive methods intend to do. The expected benefit is higher robustness and a tendency to collapse less unpleasantly. The proposed method employs...
Patent
Full-text available
A method and system for providing a user with user-friendly handles for manipulating graphics and other displayed objects using a pointer. An initial toolset of handles can evolve into a toolset with enhanced functionality. Selecting an object can invoke a first toolset. Pausing the pointer over an object for a preset length of time can invoke a se...
Conference Paper
Full-text available
Head-related transfer functions (HRTFs) represent the acoustic transfer function from a sound source at a given location to the ear drums of a human. They are typically measured from discrete source positions at a constant distance. Spherical harmonics decompositions have been shown to provide a flexible representation of HRTFs. Practical constrain...
Conference Paper
Full-text available
The design process for time-invariant acoustic beamformers often assumes that the microphones have an omnidirectional directivity pattern, a flat frequency response in the range of interest, and a 2D environment in which wavefronts propagate as a function of azimuth angle only. In this paper we investigate those cases in which one or more of these...
Conference Paper
Full-text available
The design of time-invariant beamformers is often posed as an optimization problem using practical design constraints. In many scenarios it is sufficient to assume that the microphones have an omnidirectional directivity pattern, a flat frequency response in the range of interest, and a 2D environment in which wavefronts propagate as a function of...
Article
Simulated room impulse responses have been proven to be both useful and indispensable for comprehensive testing of acoustic signal processing algorithms while controlling parameters such as the reverberation time, room dimensions, and source-array distance. In this work, a method is proposed for simulating the room impulse responses between a sound...
Article
Acoustic scene reconstruction is a process that aims to infer characteristics of the environment from acoustic measurements. We investigate the problem of locating planar reflectors in rooms, such as walls and furniture, from signals obtained using distributed microphones. Specifically, localization of multiple two-dimensional (2-D) reflectors is...
Article
Full-text available
The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art G...
Conference Paper
Full-text available
The effect of additive sensor noise on single-input-multiple-output (SIMO) blind system identification (BSI) algorithms based upon cross-relation (CR) error is investigated. Previous studies have shown that additive noise in the observed signal results in systems comprising the true estimated channels convolved with an erroneous 'common filter', an...
Article
A data-driven approach is introduced for studying, analyzing and processing the voice source signal. Existing approaches parameterize the voice source signal by using models that are motivated, for example, by a physical model or function-fitting. Such parameterization is often difficult to achieve and it produces a poor approximation to a large va...
Article
Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing including pitch tracking, prosodic speech modification, speech dereverberation, synthesis and study of pathological voice. We propose the Yet Another GCI/GOI Algorithm (Y...
Conference Paper
Full-text available
SCENIC is an EC-funded project aimed at developing a harmonized corpus of methodologies for environment-aware acoustic sensing and rendering. The project focusses on space-time acoustic processing solutions that do not just accommodate the environment in the modeling process but that make the environment help towards achieving the goal at hand. The...
Conference Paper
Full-text available
In this paper we discuss a method for localizing acoustic reflectors in space based on acoustic measurements on source-to-microphone reflective paths. The method converts Time of Arrival (TOA) and Time Difference of Arrival (TDOA) into quadratic constraints on the line corresponding to the reflector. In order to be robust against measurement errors...
Conference Paper
Full-text available
The inverse-filtering of acoustic impulse responses (AIRs) can be achieved with existing methods provided a good estimate of the channel is available and the observed signals contain little or no noise. Such assumptions are not generally valid in practical scenarios, leading to much interest in the issue of robustness. In particular, channel shorte...
Conference Paper
Full-text available
The problem of localizing reflective boundaries in an acoustic environment from acoustic measurements is considered. Specifically, localization of multiple two-dimensional (2-D) line reflectors is achieved by estimation of the time of arrival (TOA) of reflected signals by analysis of acoustic impulse responses (AIRs). The estimated TOAs are used in...
Conference Paper
Linear microphone arrays have been extensively used for dereverberation. In this paper we look at the dereverberation performance of two types of spherical microphone array: the open array (microphones suspended in free space) and the rigid array (microphones mounted on a rigid baffle). Dereverberation is performed in the spherical harmonic domain...
Conference Paper
Full-text available
The SCENIC project is aimed at making the environment become an integral part of the acoustic system. The goal is to boost the performance of arrays of speakers and microphones and, in some cases, to enable applications that would not be possible otherwise. This paper describes how this can be achieved.
Conference Paper
A method is proposed for simulating the sound pressure signals on a spherical microphone array in a reverberant enclosure. The method employs spherical harmonic decomposition and takes into account scattering from a solid sphere. An analysis shows that the error in the decomposition can be made arbitrarily small given a sufficient number of spheric...
Conference Paper
Adaptive blind system identification with LMS-type algorithms is prone to misconvergence in the presence of noise. In this paper we consider the hypothesis that such misconvergence is due to the introduction of a common filter to the estimated impulse responses. A technique is presented for identifying and removing the common filter using prior know...
Chapter
A class of reverberant speech enhancement techniques involves processing of the linear prediction residual signal following Linear Predictive Coding (LPC). These approaches are based on the assumption that reverberation is mainly confined to the prediction residual and affects the LPC coefficients to a lesser extent. This chapter begins with a study...
Conference Paper
Full-text available
Artificial bandwidth extension (ABWE) of speech signals aims to estimate wideband speech (50 Hz - 7 kHz) from narrowband signals (300 Hz - 3.4 kHz). Applying the source-filter model of speech, many existing algorithms estimate vocal tract filter parameters independently of the source signal. However, many current methods for extending the narrowban...
Thesis
Full-text available
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames...
Article
Full-text available
Accurate estimation of glottal closure instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing. The majority of existing approaches detect GCIs by comparing the differentiated EGG signal to a threshold and are able to provide accurate results during voiced speech....
Conference Paper
Full-text available
The paper presents a voice source waveform modeling technique based on principal component analysis (PCA) and Gaussian mixture modeling (GMM). The voice source is obtained by inverse-filtering speech with the estimated vocal tract filter. This decomposition is useful in speech analysis, synthesis, recognition and coding. Existing models of t...
Conference Paper
Full-text available
Accurate estimation of glottal closure instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing. This paper proposes a novel improvement to the DYPSA framework, based upon a multiscale analysis technique and an accurate estimation of glottal volume velocity. This r...
Conference Paper
Full-text available
Accurate estimation of glottal closure instants (GCIs) in voiced speech is important for speech analysis applications which benefit from glottal-synchronous processing. Electroglottograph (EGG) recordings give a measure of the electrical conductance of the glottis, providing a signal which is proportional to its contact area. EGG signals contain...
Conference Paper
Full-text available
Equalization of room transfer functions (RTFs) is important in many speech and audio processing applications. It is a challenging problem because RTFs are several thousand taps long and non-minimum phase and in practice only approximate measurements of the RTFs are available. In this paper, we present a subband multichannel least squares method for...
Conference Paper
Full-text available
Speech signals for hands-free telecommunication applications are received by one or more microphones placed at some distance from the talker. In an office environment, for example, unwanted signals such as reverberation and background noise from computers and other talkers will degrade the quality of the received signal. These unwanted components h...
Conference Paper
Full-text available
Identification of glottal closure instants (GCIs) is important in speech applications which benefit from larynx-synchronous processing. In modern telecommunication applications, speech signals are often obtained inside office rooms, with one or more microphones placed at a distance from the talker. Such speech signals are affected by reverberati...
Thesis
Full-text available
Fundamentally, loudspeaker design has changed very little in the past 30 years. The late 70s saw the introduction of electromechanical modelling techniques which remain the basis for the design and analysis of loudspeakers today. Aside from the use of lighter, stronger materials in the construction of drive units, the area which has undergone the...
