One in five people experiences hearing loss, and the World Health Organization estimates that this number will increase to one in four by 2050. Fortunately, effective hearing devices such as hearing aids and cochlear implants exist, with advanced speaker enhancement algorithms that can significantly improve the quality of life of people with hearing loss. State-of-the-art hearing devices, however, underperform in the so-called `cocktail party' scenario, in which multiple people talk simultaneously (for example, at a family dinner or a reception). In such a situation, the hearing device does not know which speaker the user intends to attend to, and thus which speaker to enhance and which ones to suppress. This gives rise to a new problem: determining which speaker a user is attending to, referred to as the auditory attention decoding (AAD) problem.
The attended speaker could be selected using simple heuristics, such as picking the loudest speaker or the one in the user's look direction. A potentially better approach, however, is to decode the auditory attention at its origin: the brain. Using neurorecording techniques such as electroencephalography (EEG), it is possible to perform AAD, for example, by reconstructing the attended speech envelope from the EEG using a neural decoder (i.e., the stimulus reconstruction (SR) algorithm). Integrating AAD algorithms in a hearing device could then lead to a so-called `neuro-steered hearing device'. These traditional AAD algorithms, however, are not fast enough to adequately react to a switch in auditory attention, and they are supervised and fixed over time, i.e., they do not adapt to non-stationarities in the EEG and audio data. The general aim of this thesis is therefore to develop novel signal processing algorithms for EEG-based AAD that enable fast, accurate, unsupervised, and time-adaptive decoding of the auditory attention.
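The SR pipeline mentioned above can be sketched in a few lines: a linear decoder maps (time-lagged) EEG channels to a reconstruction of the speech envelope, is trained by (ridge-regularized) least squares, and attention is decided by correlating the reconstruction with each candidate speaker's envelope. The helper names, the number of lags, and the ridge parameter below are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def build_lagged_matrix(eeg, num_lags):
    """Stack time-lagged copies of each EEG channel (time x (channels*lags))."""
    T, C = eeg.shape
    X = np.zeros((T, C * num_lags))
    for lag in range(num_lags):
        X[: T - lag, lag * C : (lag + 1) * C] = eeg[lag:, :]
    return X

def train_decoder(eeg, attended_env, num_lags=16, ridge=1e-3):
    """Least-squares (ridge) decoder mapping lagged EEG to the attended envelope."""
    X = build_lagged_matrix(eeg, num_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ attended_env)

def decode_attention(eeg, env1, env2, decoder, num_lags=16):
    """Correlate the reconstructed envelope with both candidates; pick the larger."""
    rec = build_lagged_matrix(eeg, num_lags) @ decoder
    c1 = np.corrcoef(rec, env1)[0, 1]
    c2 = np.corrcoef(rec, env2)[0, 1]
    return 0 if c1 >= c2 else 1
```

Note that this is a backward (decoding) model: EEG samples at and after time t are used to reconstruct the stimulus at time t, since the neural response follows the stimulus.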
In the first part of the thesis, we compare different AAD algorithms, which allows us to identify the gaps in the current AAD literature that are partly addressed in this thesis. To perform this comparative study, we develop a new performance metric to evaluate AAD algorithms in the context of adaptive gain control for neuro-steered hearing devices.
In the second part, we address one of the main signal processing challenges in AAD: unsupervised and time-adaptive algorithms. We first develop an unsupervised version of the stimulus decoder that can be trained on a large batch of EEG and audio data without knowledge of ground-truth attention labels. This unsupervised but subject-specific stimulus decoder, starting from a random initial decoder, outperforms a supervised subject-independent decoder and, using subject-independent information, even approximates the performance of a supervised subject-specific decoder. We also extend this unsupervised algorithm to an efficient time-adaptive algorithm for settings in which EEG and audio data continuously stream in, and show that it has the potential to outperform a fixed supervised decoder in a practical AAD use case.
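The core idea of such unsupervised training can be sketched as an iterative self-labeling loop: the current decoder predicts attention labels on unlabeled segments, and the decoder is then retrained on the envelopes it predicted to be attended. The sketch below is a toy illustration under assumed helper names, lag count, and ridge parameter, not the thesis's actual algorithm:

```python
import numpy as np

def lag_eeg(eeg, L=8):
    """Stack L time-lagged copies of each EEG channel (time x (channels*L))."""
    T, C = eeg.shape
    X = np.zeros((T, C * L))
    for l in range(L):
        X[: T - l, l * C : (l + 1) * C] = eeg[l:, :]
    return X

def ls_decoder(X, env, ridge=1e-3):
    """Ridge-regularized least-squares fit of the decoder."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ env)

def unsupervised_decoder(segments, n_iter=20, L=8):
    """segments: list of (eeg, env1, env2) tuples without attention labels.
    Iteratively predict labels with the current decoder, then retrain the
    decoder on the predicted attended envelopes, starting from random weights."""
    rng = np.random.default_rng(0)
    d = rng.standard_normal(segments[0][0].shape[1] * L)  # random initial decoder
    for _ in range(n_iter):
        X_all, env_all = [], []
        for eeg, env1, env2 in segments:
            X = lag_eeg(eeg, L)
            rec = X @ d
            # predicted label: the envelope correlating best with the reconstruction
            c1 = np.corrcoef(rec, env1)[0, 1]
            c2 = np.corrcoef(rec, env2)[0, 1]
            X_all.append(X)
            env_all.append(env1 if c1 >= c2 else env2)
        d = ls_decoder(np.vstack(X_all), np.concatenate(env_all))
    return d
```

The loop is self-correcting in this toy setting because segments with wrongly predicted labels contribute envelopes that are uncorrelated with the EEG (and thus roughly cancel), while correctly labeled segments consistently reinforce the true EEG-to-envelope mapping.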
In the third part, we develop novel AAD algorithms that decode the spatial focus of auditory attention, providing faster and more accurate decoding. The developed methods achieve a much higher accuracy than the SR algorithm at a very fast decision rate. Furthermore, we show that these methods are also applicable to different directions of auditory attention, when using only EEG channels close to the ears, and when generalizing to data from an unseen subject.
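Decoding the spatial focus of attention can be framed as a classification problem on spatial EEG patterns, for instance with common spatial pattern (CSP) filters followed by a linear classifier on log-variance features. The sketch below is a generic CSP + LDA pipeline under assumed names and parameters, shown only to illustrate this class of methods:

```python
import numpy as np

def csp_filters(cov1, cov2, k=2):
    """CSP via whitening: spatial filters maximizing the variance ratio
    between the two attention classes (e.g., attend-left vs. attend-right)."""
    evals, evecs = np.linalg.eigh(cov1 + cov2)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    d, V = np.linalg.eigh(whiten @ cov1 @ whiten)
    V = whiten @ V
    order = np.argsort(d)  # keep the most discriminative (extreme) filters
    return V[:, np.concatenate([order[:k], order[-k:]])]

def log_var_features(trials, W):
    """Log-variance of each CSP-filtered trial (trials are time x channels)."""
    return np.array([np.log(np.var(t @ W, axis=0)) for t in trials])

def fit_lda(F0, F1):
    """Two-class LDA: discriminant weight vector and midpoint threshold."""
    m0, m1 = F0.mean(axis=0), F1.mean(axis=0)
    Sw = np.cov(F0.T) + np.cov(F1.T)
    w = np.linalg.solve(Sw, m1 - m0)
    return w, w @ (m0 + m1) / 2

def classify(F, w, b):
    """Predict class 0 or 1 per feature vector."""
    return (F @ w > b).astype(int)
```

Because the decision is based on the spatial distribution of EEG power within a short window, rather than on a correlation with the speech envelope, such methods can operate at much shorter decision lengths than envelope-tracking approaches.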
To summarize, in this thesis we have developed crucial building blocks for a plug-and-play, time-adaptive, unsupervised, fast, and accurate AAD algorithm that can be integrated with a low-latency speaker separation and enhancement algorithm and a wearable, miniaturized EEG system, eventually leading to a neuro-steered hearing device.