Zhuohuang Zhang

Shenzhen Polytechnic University

Doctor of Philosophy

About

Publications: 28
Reads: 4,191
Citations: 388
Introduction
Zhuohuang Zhang is currently a lecturer at Shenzhen Polytechnic University. Zhuohuang's research covers artificial intelligence, speech enhancement and separation, and human speech perception.
Additional affiliations
May 2020 - October 2020
Tencent
Position
  • Research Intern
Description
  • Speech/NLP research intern at Tencent AI Lab, working on multimodal, multichannel speech enhancement algorithms (ADL-MVDR).
August 2022 - June 2023
Tencent
Position
  • Senior Researcher
May 2021 - August 2021
Microsoft
Position
  • Research Intern
Description
  • Research intern working on multi-channel speech enhancement algorithms.
Education
August 2017 - May 2022
Indiana University Bloomington
Field of study
  • Speech and Hearing Sciences, Computer Science
August 2015 - May 2017
University of Rochester
Field of study
  • Electrical and Computer Engineering
August 2011 - June 2015
Beijing Institute of Technology
Field of study
  • Opto-electrical Information Engineering

Publications (28)
Preprint
Full-text available
Phase is a critical component of speech that influences its quality and intelligibility. Current speech enhancement algorithms are beginning to address phase distortions, but these algorithms focus on normal-hearing (NH) listeners. It is not clear whether phase enhancement is beneficial for hearing-impaired (HI) listeners. We investigated the...
Preprint
Full-text available
Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement; however, these approaches have not been compared to state-of-the-art (SOTA) non-GAN-based approaches. Additionally, many loss functions have been proposed for GAN-based approaches, but they have not been adequately compared. In this stud...
Preprint
Full-text available
Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural-network-based speech separation systems often cause nonlinear distortion that is harmful to automatic speech recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamform...
Preprint
Full-text available
Recently, we proposed an all-deep-learning minimum variance distortionless response (ADL-MVDR) method in which the unstable matrix inverse and principal component analysis (PCA) operations in the MVDR were replaced by recurrent neural networks (RNNs). However, it is not clear whether the success of the ADL-MVDR is due to the calculated covariance matr...
Article
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions, howe...
Preprint
Full-text available
With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is trad...
Article
Full-text available
Personal hearing devices, such as hearing aids, may be fine-tuned by allowing users to conduct self-adjustment. Two self-adjustment procedures were developed to collect listener-preferred gains in six octave-frequency bands from 0.25 kHz to 8 kHz. These procedures were designed to allow rapid exploration of a multi-dimensional parameter spa...
Preprint
Full-text available
Continuous speech separation (CSS) aims to separate overlapping voices from a continuous influx of conversational audio containing an unknown number of utterances spoken by an unknown number of speakers. A common application scenario is transcribing a meeting conversation recorded by a microphone array. Prior studies explored various deep learning...
Preprint
Full-text available
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions, howe...
Presentation
Phase is important for speech because it contributes to quality and intelligibility during speech perception. Many speech enhancement algorithms cannot predict phase for speech reconstruction and apply the noisy phase instead. In this study, we investigated the influence of phase distortion on the speech-qua...
Conference Paper
Full-text available
Hearing loss is prevalent among elderly adults and leads to speech-understanding difficulties in noisy environments. Speech enhancement algorithms have thus been proposed to alleviate this speech-in-noise problem. However, most of these algorithms have not been evaluated for hearing-impaired people, either with or without the use of hearing aids. In th...
Conference Paper
Full-text available
Many speech enhancement algorithms have been proposed over the years, and it has been shown that deep neural networks can lead to significant improvements. These algorithms, however, have not been validated for hearing-impaired listeners. Additionally, these algorithms are often evaluated under a limited range of signal-to-noise ratios (SNR). Here,...
Poster
Full-text available
Personal hearing devices, such as hearing aids, may be fine-tuned for individual users’ preferences by allowing them to self-adjust the amplification profiles. The purpose of the current study was to compare two self-adjustment methods in terms of their test-retest reliability. Both methods estimated preferred amplification profiles in six octave-f...
Presentation
Objective speech-quality metrics have been widely used as a tool to evaluate the performance of speech enhancement algorithms. Two widely adopted metrics are Perceptual Evaluation of Speech Quality (PESQ) and Hearing-Aid Speech Quality Index (HASQI). While PESQ is based on a highly simplified phenomenological model of auditory perception for normal...
Article
Full-text available
A Bayesian adaptive procedure, the interleaved-equal-loudness contour (IELC) procedure, was developed to improve the efficiency in estimating the equal-loudness contour. Experiment 1 evaluated the test-retest reliability of the IELC procedure using six naive normal-hearing listeners. Two IELC runs of 200 trials were conducted and excellent test-ret...
Article
A retina-like imaging system is an imaging system whose space-variant resolution resembles the photoreceptor distribution of the primate retina. This paper introduces the design and implementation of a retina-like imaging system based on a non-uniform lens array. First, the mathematical model of the non-uniform lens array is derived. Secon...
