The human auditory system handles very different tasks, ranging from orientation in complex traffic situations to speech communication at a crowded party or via mobile devices. It does so even in highly adverse situations where the target signal is disturbed by different types of maskers, such as environmental noise, interfering talkers, detrimental sound reflections, or distortions from signal processing. Experimental methods from different fields of hearing research, such as psychoacoustics (discrimination or detection thresholds), speech intelligibility, and audio quality, are therefore required to capture the abilities and limitations of the auditory system. Only a few rather complex auditory models have been shown to predict data from psychoacoustics, speech intelligibility, and audio quality, the three areas of auditory perception considered in this thesis. However, some parameters (e.g., the frequency range of the auditory filterbank) were often adapted to the individual experiments. A generalized modeling approach that consistently uses identical model parameters and processing stages to extract auditory features in the model front end, combined with a task-dependent decision stage (back end), would be required to identify and understand which features are universal and capture information relevant for predicting experiments in these three areas. Moreover, with regard to the computational efficiency required for applications such as online monitoring of speech quality for signal processing algorithms in hearing aids, it is unclear to what extent such a generalized auditory modeling approach can be simplified while still providing reasonable prediction performance.
Hence, the aim of this thesis is to provide a low-complexity modeling approach that consists of a joint front end, including only the basic auditory processing stages required to account for the most relevant masking effects, and a task-dependent back end for predicting effects of psychoacoustic masking, speech intelligibility, and audio quality. The first part of this thesis (chapter 2) suggests an auditory modeling approach that uses the power spectrum model (PSM; Fletcher, 1940; Patterson and Moore, 1986) and the envelope power spectrum model (EPSM; Ewert and Dau, 2000) as front end to predict psychoacoustic masking and speech intelligibility on the basis of spectral and temporal features. The proposed model was assessed with a critical set of psychoacoustic and speech intelligibility experiments and achieved a prediction performance comparable to that of state-of-the-art models for these data. Motivated by findings from Schubotz et al. (2016), which imply the relevance of short-time power features for speech intelligibility predictions, the second part (chapter 3) provides a revised spectral feature analysis within the PSM pathway of the model suggested in the first part. This revised model was successfully evaluated with the same set of experiments used in the first part and with the speech intelligibility experiments carried out in Schubotz et al. (2016).
An analysis of the PSM and EPSM pathways of the revised model provides information about the contribution of spectral and temporal cues to speech intelligibility predictions for different maskers. The third part of this thesis (chapter 4) extends the auditory models presented in chapters 2 and 3 to account for signal degradations in terms of audio quality. The suggested audio quality model was successfully evaluated on four databases with different types of distortions that cover a broad range of quality-influencing factors, and it offered better average prediction performance across the four databases than other state-of-the-art quality models. The modeling approaches proposed in chapters 2 to 4 rely only on monaural cues; binaural cues are not considered. The fourth part of this thesis (chapter 5) contributes towards a binaural extension of these models by providing an experimental evaluation framework that can be applied as a benchmark test for binaural speech intelligibility models. In chapter 5, building on the studies of Schubotz et al. (2016) and Ewert et al. (2017), the effect of different room acoustical properties on speech reception thresholds and on spatial release from masking was assessed. The findings of this study indicate the importance of spatial cues for speech intelligibility in reverberant surroundings. Taken together, this thesis offers a generalized modeling approach
for predicting data from psychoacoustic masking, speech intelligibility, and audio quality
experiments. Additionally, the thesis provides benchmark databases that can be used for the development and evaluation of auditory models.