Julien Epps

UNSW Sydney

About

299 Publications
77,001 Reads
9,013 Citations

Publications (299)
Article
Quantum computing, communications, sensing, and simulations are radically transformative technologies, with great potential to impact industries and economies. Worldwide, national governments, industries, and universities are moving to create a new class of workforce—the Quantum Engineers. Demand for such engineers is predicted to be in the tens of...
Article
Adults over the age of 60 years are a rising population at risk of depression, and there is a need to create automatic screening for this illness. Most existing voice-based depression datasets comprise speakers younger than 60, and variations in speech due to age and depression are not well understood. In this study, which uses Patient Health Quest...
Conference Paper
Full-text available
While the psychological Stroop color test has frequently been used to analyze response delays in temporal cognitive processing, minimal research has examined incorrect/correct verbal test response pattern differences exhibited in healthy control and clinically depressed populations. Further, the development of speech error features with an emphasis...
Preprint
Full-text available
Quantum technology is exploding. Computing, communication, and sensing are just a few areas likely to see breakthroughs in the next few years. Worldwide, national governments, industries, and universities are moving to create a new class of workforce - the Quantum Engineers. Demand for such engineers is predicted to be in the tens of thousands with...
Preprint
Full-text available
Biologically inspired auditory models play an important role in developing effective audio representations that can be tightly integrated into speech and audio processing systems. Current computational models of the cochlea are typically expressed in terms of systems of differential equations and do not directly lend themselves for use in computati...
Article
Full-text available
Currently, there is an increasing global need for COVID-19 screening to help reduce the rate of infection and at-risk patient workload at hospitals. Smartphone-based screening for COVID-19 along with other respiratory illnesses offers excellent potential due to its rapid-rollout remote platform, user convenience, symptom tracking, comparatively low...
Article
The cochlea is a remarkable spectrum analyser with desirable properties including sharp frequency tuning and level-dependent compression, and the potential advantages of incorporating these characteristics in a speech processing front-end are investigated. This paper develops a framework for an active transmission line cochlear model employing adapt...
Conference Paper
Detecting depression from the voice in naturalistic environments is challenging, particularly for short-duration audio recordings. This enhances the need to interpret and make optimal use of elicited speech. The rapid consonant-vowel syllable combination ‘pataka’ has frequently been selected as a clinical motor-speech task. However, there is signif...
Article
In this article, we describe and discuss the design-based approach for signal processing education at the undergraduate level at the University of New South Wales (UNSW) Sydney. The electrical engineering (EE) undergraduate curriculum at UNSW Sydney includes three dedicated signal processing courses as well as a design course that involves a major...
Article
Individuals who have incurred trauma due to a suicide attempt often acquire residual health complications, such as cognitive, mood, and speech-language disorders. Due to limited access to suicidal speech audio corpora, behavioral differences in patients with a history of suicidal ideation and/or behavior have not been thoroughly examined using sub...
Conference Paper
Full-text available
In this paper we describe our children’s Automatic Speech Recognition (ASR) system for the first shared task on ASR for English non-native children’s speech. The acoustic model comprises 6 Convolutional Neural Network (CNN) layers and 12 Factored Time-Delay Neural Network (TDNN-F) layers, trained on data from 5 different children’s speech corpora....
Presentation
Full-text available
Currently, there is an increasing global need for COVID-19 screening to help reduce the rate of infection and at-risk patient workload at hospitals. Smartphone-based screening for COVID-19 along with other respiratory illnesses offers excellent potential due to its rapid-rollout remote platform, user convenience, symptom tracking, comparatively lo...
Article
Physiological and behavioral measures allow computing devices to augment user interaction experience by understanding their mental load. Current techniques often utilize complementary information between different modalities to index load level typically within a specific task. In this study, we propose a new approach utilizing the timing between p...
Conference Paper
Full-text available
Depression disorders are a major growing concern worldwide, especially given the unmet need for widely deployable depression screening for use in real-world environments. Speech-based depression screening technologies have shown promising results, but primarily in systems that are trained using laboratory-based recorded speech. They do not generali...
Article
Like many psychological scales, depression scales are ordinal in nature. Depression prediction from behavioural signals has so far been posed either as classification or regression problems. However, these naive approaches have fundamental issues because they are not focused on ranking, unlike ordinal regression, which is the most appropriate appro...
Article
With the emergence of low-cost wearable hardware for eye activity analysis comes the opportunity to use pupil and blink behavior during conversations to improve human computer interaction. Conversations in general can be decomposed into four segments, i.e. listening, speaking, thinking (transition from listening to speaking) and waiting (transition...
Article
Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In th...
Article
Full-text available
In the future, automatic speech-based analysis of mental health could become widely available to help augment conventional healthcare evaluation methods. For speech-based patient evaluations of this kind, protocol design is a key consideration. Read speech provides an advantage over other verbal modes (e.g. automatic, spontaneous) by providing a cl...
Article
For longitudinal behavior analysis, task type is an inevitable and important variable. In this paper, we propose an event-based behavior modeling approach and employ non-invasive wearable sensing modalities - eye activity, speech and head movement - to recognize task load level under four different task load types. The novelty lies in converting ph...
Article
Full-text available
The processing of speech as an explicit sequence of events is common in automatic speech recognition (linguistic events), but has received relatively little attention in paralinguistic speech classification despite its potential for characterizing broad acoustic event sequences. This paper proposes a framework for analyzing speech as a sequence of...
Article
Full-text available
The eyelid contour, pupil contour, and blink event are important features of eye activity, and their estimation is a crucial research area for emerging wearable camera-based eyewear in a wide range of applications, e.g. mental state estimation. Current approaches often estimate a single eye activity, such as blink or pupil center, from far-field an...
Article
Full-text available
The massive and growing burden imposed on modern society by depression has motivated investigations into early detection through automated, scalable and non-invasive methods, including those based on speech. However, speech-based methods that capture articulatory information effectively across different recording devices and in naturalistic environ...
Preprint
Artificial intelligence and machine learning systems have demonstrated huge improvements and human-level parity in a range of activities, including speech recognition, face recognition and speaker verification. However, these diverse tasks share a key commonality that is not true in affective computing: the ground truth information that is inferred...
Preprint
Full-text available
Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs to be improved to make commercial application of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In...
Conference Paper
Full-text available
Detection of depression from speech has attracted significant research attention in recent years but remains a challenge, particularly for speech from diverse smartphones in natural environments. This paper proposes two sets of novel features based on speech landmark bigrams associated with abrupt speech articulatory events for depression detection...
Preprint
Full-text available
Speech emotion recognition is a challenging task and heavily depends on hand-engineered acoustic features, which are typically crafted to echo human perception of speech signals. However, a filter bank that is designed from perceptual evidence is not always guaranteed to be the best in a statistical modelling framework where the end goal is for exa...
Article
Full-text available
Physical activity recognition using wearable sensors has achieved good performance in discriminating heterogeneous activities for health monitoring, but there has been less investigation of sedentary activities, e.g. desk work, which is often physically homogeneous, to improve health in office environments. In this study, we explored head movement a...
Conference Paper
Full-text available
As the use of machine-learning techniques expands throughout the healthcare industry, the future of automatic speech analysis holds substantial promise as a non-invasive, investigative diagnosis or monitoring method for numerous medical conditions. However, depending on the disease/disorder, there is still significant uncertainty about which elicit...
Presentation
Full-text available
While acoustic-based links between clinical depression and abnormal speech have been established, there is still little knowledge regarding which kinds of phonological content are most impacted. Moreover, for automatic speech-based depression classification and depression assessment elicitation protocols, even less is understood as to what ph...
Presentation
Full-text available
As the use of machine-learning techniques expands throughout the healthcare industry, the future of automatic speech analysis holds substantial promise as a non-invasive, investigative diagnosis or monitoring method for numerous medical conditions. However, depending on the disease/disorder, there is still significant uncertainty about which elicit...
Article
Full-text available
There are many benefits to facilitating ‘always-on’ pupillary light reflex (PLR)-aware pupil size measurement in eyewear, including improving the reliability of pupil-based cognitive and affective load monitoring and enabling PLR-based diagnosis of cognitive and eye-related diseases which have neurological symptoms manifested in the form of aberran...
Article
Eye activity based within-task cognitive load measurement is currently not feasible in everyday situations. One important issue to be addressed to move such cognitive load measurement beyond controlled laboratory environments is determining practical methods for mitigating the pupillary light reflex (PLR) effect in cognitive load measurement. In th...
Conference Paper
Full-text available
This paper presents a novel framework for speech-based continuous emotion prediction. The proposed model characterises the perceived emotion estimation as time-invariant responses to salient events. Then arousal and valence variation over time is modelled as the output of a parallel array of time-invariant filters where each filter represents a sali...
Article
Full-text available
A task is arguably the most basic unit of human activity, yet we currently have only extremely limited means by which to detect a change in task and to estimate the level of physical, mental, and other types of load experienced during tasks. Task analysis today is typically a manual, subjective process, except when all of the user’s primary tasks...
Conference Paper
Full-text available
Depression is a leading cause of disease burden worldwide; however, there is an unmet need for screening and diagnostic measures that can be widely deployed in real-world environments. Voice-based diagnostic methods are convenient, non-invasive to elicit, and can be collected and processed in near real-time using modern smartphones, smart speakers,...
Article
The effects of psychomotor retardation associated with clinical depression are linked to a reduction in variability in acoustic parameters. However, linguistic stress differences between non-depressed and clinically depressed individuals have yet to be investigated. In this paper, by examining vowel articulatory parameters, statistically significan...
Article
Full-text available
The fact that emotions are dynamic in nature and evolve across time has been explored relatively less often in automatic emotion recognition systems to date. Although within-utterance information about emotion changes has recently received some attention, open questions remain unresolved, such as how to approach delta emotion ground truth, ho...
Article
Full-text available
Phonetic variability has long been considered a confounding factor for emotional speech processing, so phonetic features have been rarely explored. However, surprisingly some features with purely phonetic information have shown state-of-the-art performance for continuous prediction of emotions (e.g. arousal and valence), for which the underlying ca...
Article
Full-text available
Cross-corpus speech emotion recognition can be a useful transfer learning technique to build a robust speech emotion recognition system by leveraging information from various speech datasets - cross-language and cross-corpus. However, more research needs to be carried out to understand the effective operating scenarios of cross-corpus speech emotio...
Article
Full-text available
Learning latent representations of data in an unsupervised fashion is a very interesting process. It provides more relevant features that can enhance the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Recently, deep generative models such as Variational Autoencoders (VAEs) have gained enormous su...
Article
Full-text available
This paper introduces a novel speech-based depression score prediction paradigm, the 2-stage ranking prediction framework, and highlights the benefits it brings to depression prediction. Conventional regression approaches aim to discern a single functional relationship between speech features and depression scores, making an implicit assumption abo...