John H. L. Hansen

University of Texas at Dallas | UTD · CRSS: Center for Robust Speech Systems, Dept. of Electrical Engineering

About

625
Publications
126,304
Reads
12,585
Citations

Publications (625)
Preprint
Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-t...
Conference Paper
Full-text available
Adult-child interaction is an important component for language development in young children. However, such development varies based on the quality and quantity of these conversations. Teachers responsible for the language acquisition of their students have a vested interest in improving such conversation in their classrooms. Advancements in speech...
Article
Assessing child growth in terms of speech and language development is a critical indicator of long-term learning ability and life-long development progress. The earlier an at-risk child is identified, the earlier support can be provided to reduce the social impact of the speech or language issue. The preschool classroom provides an opportunity for...
Article
Natural compensation of speech production in challenging listening environments is referred to as the Lombard effect (LE). The resulting acoustic differences between neutral and Lombard speech have been shown to provide intelligibility benefits for normal hearing (NH) and cochlear implant (CI) listeners alike. Motivated by this outcome, three LE pe...
Preprint
Full-text available
Audio analysis for forensic speaker verification offers unique challenges in system performance due in part to data collected in naturalistic field acoustic environments where location/scenario uncertainty is common in the forensic data collection process. Forensic speech data as potential evidence can be obtained in random naturalistic environment...
Preprint
Full-text available
With the rapid development of intelligent vehicles and Advanced Driver-Assistance Systems (ADAS), a new trend is that mixed levels of human driver engagements will be involved in the transportation system. Therefore, necessary visual guidance for drivers is vitally important under this situation to prevent potential risks. To advance the developmen...
Preprint
Full-text available
The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and availability of large datasets, speech separation has been formulated as a supervised learning problem. These approaches aim to learn discriminative patterns of speech, speakers, and background...
Preprint
Full-text available
Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Unfortunately, this information might not always be available in real-world applications. In this study, we propose a real-time, single-channel attentio...
Article
Full-text available
Experimental hardware-research interfaces play a crucial role during the developmental stages of any medical signal-monitoring system, as they allow researchers to test and optimize output results before perfecting the design for the actual FDA-approved medical device and large-scale production. These testing platforms take in the raw signal through...
Conference Paper
Full-text available
Many communities which are experiencing increased gun violence are turning to acoustic gunshot detection systems (GSDS) with the hope that their deployment would provide increased 24/7 monitoring and the potential for more rapid response by law enforcement to the scene. In addition to real-time monitoring, data collected by gunshot detection system...
Article
Young children's friendships fuel essential developmental outcomes (e.g., social-emotional competence) and are thought to provide even greater benefits to children with or at-risk for disabilities. Teacher and parent report and sociometric measures are commonly used to measure friendships, and ecobehavioral assessment has been used to capture its f...
Preprint
In this study, we investigate triplet loss as the basis of an alternative feature representation for ASR. We consider a general non-semantic speech representation, which is trained with a self-supervised criterion based on triplet loss called TRILL, for acoustic modeling to represent the acoustic characteristics of each audio. This str...
Conference Paper
Full-text available
While speech understanding for cochlear implant (CI) users in quiet is relatively effective, listeners experience difficulty in identification of speaker and sound location. Previous studies have reported improved localization and better speech perception when the CI is coupled with a second CI or hearing aid in the contralateral ear. This is refer...
Article
Full-text available
An objective metric that predicts speech intelligibility under different types of noise and distortion would be desirable in voice communication. To date, the majority of studies concerning speech intelligibility metrics have focused on predicting the effects of individual noise or distortion mechanisms. This study proposes an objective metric, the...
Article
Full-text available
Intelligent vehicles and Advanced Driver Assistance Systems (ADAS) have been developed rapidly over the past few years. Many applications such as vehicle localization, environment perception, and path planning have shown promising potential. While there is great interest in migrating from completely human-controlled vehicles towards fully autono...
Preprint
Full-text available
Part 1: Broad Description: This Early-concept Grants for Exploratory Research (EAGER) funding project focuses on exploring and developing a novel operational collection of speech, language and perception-based measures to objectively assess speech intelligibility for second language (L2) speech production, as well as providing effective learner-spe...
Article
Full-text available
With the rapid development of intelligent vehicles and Advanced Driver-Assistance Systems (ADAS), a new trend is that mixed levels of human driver engagements will be involved in the transportation system. Therefore, necessary visual guidance for drivers is vitally important under this situation to prevent potential risks. To advance the developmen...
Article
Speech, speaker, and language systems have traditionally relied on carefully collected speech material for training acoustic models. There is an enormous amount of freely accessible audio content. A major challenge, however, is that such data is not professionally recorded, and therefore may contain a wide diversity of background noise, nonlinear d...
Article
The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies are, however, focused on images. Some studies use GANs for s...
Poster
Full-text available
It has been previously shown that advantages in auditory processing exist when the following situational context traits or subject/system properties are present: (i) availability of a wider radial range up to 360 degrees, (ii) intolerance towards acoustic or visual obstruction, (iii) distant event horizon, (iv) availability for quick neural process...
Article
Full-text available
Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Unfortunately, this information might not always be available in real-world applications. In this study, we propose a real-time, single-channel attentio...
Article
Performance of Automatic Speech Recognition (ASR) systems is known to suffer considerable degradation when exposed to Far-Field speech data capture. Consequently, far-field ASR has received considerable attention in recent years. Motivated by our recent work using Curriculum Learning (CL) based strategies to improve Speaker Identification (SID) und...
Article
Speaker recognition continues to grow as a research challenge in the field with expanded application in commercial, forensic, educational and general speech technology interfaces. However, challenges remain, especially for naturalistic audio streams including recordings with mismatch between train and test data (i.e., when train or system developme...
Conference Paper
Restoration of auditory function among hearing impaired individuals using Cochlear Implant (CI) technology has contributed significantly towards an improved quality of life. CI users experience greater challenges in recognizing speech effectively in noisy, reverberant, or time-varying diverse environments. Most CI research efforts focus on enhancin...
Article
Full-text available
Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly-occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the...
Article
Cochlear implants (CIs) and hearing aids (HAs) are advanced assistive hearing devices that perform sound processing to achieve acoustic to acoustic/electrical stimulation, thus enabling the prospects for hearing restoration and rehabilitation. Since commercial CIs/HAs are typically constrained by manufacturer design/production constraints, it is ne...
Preprint
Full-text available
Although speaker verification has achieved significant performance improvement with the development of deep neural networks, domain mismatch is still a challenging problem in this field. In this study, we propose a novel framework to disentangle speaker-related and domain-specific features and apply domain adaptation on the speaker-related feature s...
Preprint
Full-text available
With the widespread use of telemedicine services, automatic assessment of health conditions via telephone speech can significantly impact public health. This work summarizes our preliminary findings on automatic detection of respiratory distress using well-known acoustic and prosodic features. Speech samples are collected from de-identified telemed...
Article
Full-text available
Speech technology systems such as Automatic Speech Recognition (ASR), speaker diarization, speaker recognition, and speech synthesis have advanced significantly with the emergence of deep learning techniques. However, none of these voice-enabled systems perform well in natural environmental circumstances, specifically in situations where one or more p...
Article
Normalizing intrinsic variabilities (e.g., variability in speech production brought on by aging, physical or cognitive task stress, Lombard effect, etc.) in speech and speaker recognition models is essential for system robustness. This study focuses on analysis of speech under physical task stress with its application for speaker recognition and ph...
Preprint
Full-text available
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowle...
Conference Paper
Full-text available
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available. In this study, we propose a pipeline solution to improve speaker verification on a small actual forensic field dataset. By leveraging large-scale out-of-domain datasets, a knowle...
Conference Paper
Full-text available
Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directl...
Conference Paper
Full-text available
In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations. We use the global context information to enhance important channels and recalibrate salient time-frequency locations by computing the similari...
Preprint
Traditionally, the performance of non-native mispronunciation verification systems relied on effective phone-level labelling of non-native corpora. In this study, a multi-view approach is proposed to incorporate discriminative feature representations which requires less annotation for non-native mispronunciation verification of Mandarin. Here, mode...
Preprint
Full-text available
Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings. The lack of real naturalistic forensic audio corpora with ground-truth speaker identity represents a major challenge in this field. It is also difficult to directl...
Preprint
In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations. We use the global context information to enhance important channels and recalibrate salient time-frequency locations by computing the similari...
Preprint
Full-text available
The Fearless Steps Initiative by UTDallas-CRSS led to the digitization, recovery, and diarization of 19,000 hours of original analog audio data, as well as the development of algorithms to extract meaningful information from this multi-channel naturalistic data resource. The 2020 FEARLESS STEPS (FS-2) Challenge is the second annual challenge held f...
Article
Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To b...
Preprint
Full-text available
The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications. One of the most popular variants of these FCNs is the 'U-Net', which is an encoder-decoder network with skip connections. In this study, we propose 'SkipConvNet' where we replace each skip connection with mu...
Preprint
With the rapid development of intelligent vehicles and Advanced Driving Assistance Systems (ADAS), a mixed level of human driver engagements is involved in the transportation system. Visual guidance for drivers is essential under this situation to prevent potential risks. To advance the development of visual guidance systems, we introduce a novel s...
Conference Paper
Full-text available
With the rapid development of intelligent vehicles and Advanced Driving Assistance Systems (ADAS), a mixed level of human driver engagements is involved in the transportation system. Visual guidance for drivers is essential under this situation to prevent potential risks. To advance the development of visual guidance systems, we introduce a novel s...
Conference Paper
Full-text available
The Internet of Things (IoT) in healthcare has efficiently accelerated medical monitoring and assessment through the real-time analysis of collected data. Hence, to support the hearing-impaired community with better calibrations to their clinical processors and hearing aids, a portable smart space interface, AURIS, has been developed by the Cochlear...
Article
Full-text available
The goal of this study is to determine potential intelligibility benefits from Lombard speech for cochlear implant (CI) listeners in speech-in-noise conditions. “Lombard effect” (LE) is the natural response of adjusting speech production via auditory feedback due to noise exposure within acoustic environments. To evaluate intelligibility performanc...
Chapter
There are a number of scenarios where effective human-to-human speech communication is vital, yet either limited speech production capabilities due to pathology or noisy environmental conditions limit intelligible information exchange and reduce overall quality. Traditionally, front-end speech enhancement techniques have been employed to allevia...
Preprint
Full-text available
Naturalistic speech recordings usually contain speech signals from multiple speakers. This phenomenon can degrade the performance of speech technologies due to the complexity of tracing and recognizing individual speakers. In this study, we investigate the detection of overlapping speech on segments as short as 25 ms using Convolutional Neural Netwo...
Preprint
Full-text available
Speech separation refers to extracting each individual speech source in a given mixed signal. Recent advancements in speech separation and ongoing research in this area have made these approaches promising techniques for pre-processing of naturalistic audio streams. After incorporating deep learning techniques into speech separation, performanc...
Preprint
Despite significant efforts over the last few years to build a robust automatic speech recognition (ASR) system for different acoustic settings, the performance of the current state-of-the-art technologies significantly degrades in noisy reverberant environments. Convolutional Neural Networks (CNNs) have been successfully used to achieve substantia...
Preprint
Full-text available
Training acoustic models with sequentially incoming data, while both leveraging new data and avoiding the forgetting effect, is an essential obstacle to achieving human-level intelligence in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all d...
Article
Over the past two decades, machine learning technologies have been targeting real-world problems in the Speech-Language (SLT) domain. Speech corpora developed under diverse environments have been paramount to progress, though most are simulated/controlled scenarios. Success in Machine Learning for SLT requires new innovati...
Article
Apollo-11 was the first manned space mission to successfully bring astronauts to the moon and return them safely. As a massive collaborative effort, with astronauts flying the missions in outer space, the entire communications between flight controllers, their backroom support teams, and astronauts have taken place inside...
Article
Between 1963 and 1972, a massive team of dedicated scientists, engineers, and specialists at the NASA Mission Control Center (MCC) worked seamlessly together in a cohesive manner to successfully carry out multiple manned missions to the moon. All communications between personnel were carried out over multiple inter-connect...
Article
Cochlear Implants/Hearing Aids (CI/HA) offer opportunities for researchers wishing to advance algorithm development, since the existing commercial CIs/HAs are closed/sealed for customization, due to intellectual property and safety requirements. In general, Research Platforms (RPs), such as CCi-MOBILE (developed at CRSS-CIL...
Conference Paper
Full-text available
Advanced driver assistance and automated driving systems rely on risk estimation modules to predict and avoid dangerous situations. Current methods use expensive sensor setups and complex processing pipelines, limiting their availability and robustness. To address these issues, we introduce a novel deep learning based action recognition framework fo...