Sebastian Bosse's research while affiliated with Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut and other places

Publications (61)

Preprint
The emerging field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global expla...
Preprint
Despite significant advances in machine learning, decision-making of artificial agents is still not perfect and often requires post-hoc human interventions. If the prediction of a model relies on unreasonable factors it is desirable to remove their effect. Deep interactive prototype adjustment enables the user to give hints and correct the model's...
Article
Full-text available
Gaming video streaming services are growing rapidly due to new services such as passive video streaming of gaming content, e.g. Twitch.tv, as well as cloud gaming, e.g. Nvidia GeForce NOW and Google Stadia. In contrast to traditional video content, gaming content has special characteristics such as extremely high and special motion patterns, synthe...
Article
Full-text available
Several video quality metrics (VQMs) have been proposed in many publications to predict how humans perceive video quality. It is common to observe significant disagreements amongst the quality predictions of these VQMs for the same video sequence. Following an extensive literature search, we found no publicised work that has investigated if such di...
Conference Paper
Accurate and reliable gesture recognition is a central problem in human-computer interaction (HCI). Many applications that make use of gesture recognition call for mobile devices with reduced power consumption, weight and form factors. Recent advances in computer vision have been driven in particular by deep neural networks and come at the cost of high...
Preprint
The performance of visual quality prediction models is commonly assumed to be closely tied to their ability to capture perceptually relevant image aspects. Models are thus either based on sophisticated feature extractors carefully designed from extensive domain knowledge or optimized through feature learning. In contrast to this, we find feature ex...
Conference Paper
Volumetric video allows viewers to experience highly-realistic 3D content with six degrees of freedom in mixed reality (MR) environments. Rendering complex volumetric videos can require a prohibitively high amount of computational power for mobile devices. A promising technique to reduce the computational burden on mobile devices is to perform the...
Article
Full-text available
This article investigates neural and physiological correlates of simulator sickness (SS) through a controlled experiment conducted within a fully immersive dome projection system. Our goal is to establish a reliable, objective, and in situ measurable predictive indicator of SS. SS is a problem common to all types of visual simulators consisting of...
Preprint
Volumetric video allows viewers to experience highly-realistic 3D content with six degrees of freedom in mixed reality (MR) environments. Rendering complex volumetric videos can require a prohibitively high amount of computational power for mobile devices. A promising technique to reduce the computational burden on mobile devices is to perform the...
Preprint
Full-text available
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable int...
Technical Report
With the coming of age of virtual/augmented reality and interactive media, numerous definitions, frameworks, and models of immersion have emerged across different fields ranging from computer graphics to literary works. Immersion is oftentimes used interchangeably with presence as both concepts are closely related. However, there are noticeable int...
Article
Background: Electroencephalography (EEG) is widely used to investigate human brain function. Simulation studies are essential for assessing the validity of EEG analysis methods and the interpretability of results. New method: Here we present a simulation environment for generating EEG data by embedding biologically plausible signal and noise int...
Article
Full-text available
The assessment of perceived quality based on psychophysiological methods has recently gained attention as it potentially overcomes certain flaws of psychophysical approaches. Although studies report promising results, it is not possible to arrive at decisive and comparable conclusions that recommend the use of one or another method for a specific appl...
Article
Full-text available
The PSNR and MSE are the computationally simplest and thus most widely used measures for image quality, although they correlate only poorly with perceived visual quality. More accurate quality models that rely on processing on both the reference and distorted image are potentially difficult to integrate in time-critical communication systems where...
Article
Steady-state visual evoked potentials (SSVEP) are neural responses, measurable using electroencephalography (EEG), that are directly linked to sensory processing of visual stimuli. In this study, SSVEP are used to assess the perceived quality of texture images. The EEG-based assessment method is compared to conventional methods and recorded EEG dat...
Article
Full-text available
In most practical situations, images and videos can neither be compressed nor transmitted without introducing distortions that will eventually be perceived by a human observer. Vice versa, most applications of image and video restoration techniques, such as inpainting or denoising, aim to enhance the quality of experience of human viewers. Correctl...
Article
This paper presents a deep neural network-based approach to image quality assessment (IQA). The network can be trained end-to-end and comprises 10 convolutional layers and 5 pooling layers for feature extraction, and 2 fully connected layers for regression, which makes it significantly deeper than related IQA methods. A unique feature of the propo...
Conference Paper
In this study we have assessed the quality of experience objectively when vertical disparity is introduced in the stereoscopic presentation of images. Four different conditions including a cube in 2D and the same cube in 3D with and without vertical disparities are compared based on the EEG signals recorded from 17 subjects. Two different vertical...
Conference Paper
In an objective approach for the assessment of quality of experience the neural correlates of EEG data are studied when stereoscopic images are presented in three different conditions containing vertical disparity. These conditions are compared to a similar image in 2D both on the channel level by studying the ERP components and on the source level...
Article
Objective: Neurophysiological correlates of vertical disparity in 3D images are studied in an objective approach using EEG technique. These disparities are known to negatively affect the quality of experience and to cause visual discomfort in stereoscopic visualizations. Approach: We have presented four conditions to subjects: one in 2D and thre...
Conference Paper
This paper presents a full-reference (FR) image quality assessment (IQA) method based on a deep convolutional neural network (CNN). The CNN extracts features from distorted and reference image patches and estimates the perceived quality of the distorted ones by combining and regressing the feature vectors using two fully connected layers. The CNN c...
Conference Paper
This paper investigates the robustness of two state-of-the-art action recognition algorithms: a pixel domain approach based on 3D convolutional neural networks (C3D) and a compressed domain approach requiring only partial decoding of the video, based on feature description using motion vectors and Fisher vector encoding (MV-FV). We study the robust...
Conference Paper
The assessment of perceived multimedia quality is a central research field in information and media technology. Conventionally, psychophysical techniques are used for determining the quality of multimedia signals. Recently, Brain-Computer Interfacing (BCI)-based methods have been proposed for the assessment of perceived multimedia signal quality. I...
Article
We present a survey of psychophysiology-based assessment for Quality of Experience (QoE) in advanced multimedia technologies. We provide a classification of methods relevant to QoE and describe related psychological processes, experimental design considerations, and signal analysis techniques. We summarise multimodal techniques and discuss several...
Conference Paper
This paper presents a no-reference (NR) image quality assessment (IQA) method based on a deep convolutional neural network (CNN). The CNN takes unpreprocessed image patches as input and estimates the quality without employing any domain knowledge. Features and natural scene statistics are thus learned in a purely data-driven manner and combined with pool...
Conference Paper
This paper proposes a reduced-reference image quality assessment method that uses only a small number of features. It involves a shearlet decomposition, directional pooling of the obtained coefficients, and extraction of the scalewise statistical location parameter as a feature. The proposed method is tested and compared to similar approaches on the LIVE image...
Conference Paper
In this paper we propose a hybrid tracking method which detects moving objects in videos compressed according to H.265/HEVC standard. Our framework largely depends on motion vectors (MV) and block types obtained by partially decoding the video bitstream and occasionally uses pixel domain information to distinguish between two objects. The compresse...
Conference Paper
This paper presents a full-reference (FR) image quality assessment (IQA) method based on a deep convolutional neural network (CNN). The CNN extracts features from distorted and reference image patches and estimates the quality of the distorted ones by combining and regressing the feature vectors using two fully connected layers. Experiments are per...
Conference Paper
An approach to the neural measurement of perceived image quality using electroencephalography (EEG) is presented. Six different images were tested at six different distortion levels. The distortions were introduced by a hybrid video encoder. The presented study consists of two parts: In the first part, subjects were asked to evaluate the quality of the t...
Article
Full-text available
Recent studies exploit the neural signal recorded via electroencephalography (EEG) to get a more objective measurement of perceived video quality. Most of these studies capitalize on the event-related potential component P3. We follow an alternative approach to the measurement problem investigating steady state visual evoked potentials (SSVEPs) as...
Article
Conventionally, the quality of images and related codecs are assessed using subjective tests, such as Degradation Category Rating. These quality assessments consider the behavioral level only. Recently, it has been proposed to complement this approach by investigating how quality is processed in the brain of a user (using electroencephalography, EE...
Article
Full-text available
The paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter and inter-view residual prediction for coding of the dependent video views have been developed and integrated. Furthermo...
Conference Paper
This paper presents an approach for 3D video coding that uses a format in which a small number of views as well as associated depth maps are coded and transmitted. At the receiver side, additional views required for displaying the 3D video on an autostereoscopic display can be generated based on the corresponding decoded signals by using depth imag...
Article
The presented approach for 3D video coding uses the multiview video plus depth format, in which a small number of video views as well as associated depth maps are coded. Based on the coded signals, additional views required for displaying the 3D video on an autostereoscopic display can be generated by depth image based rendering techniques. The devel...
Article
This paper describes a new encoder control method for multiview video plus depth coding. Since large parts of a multiview scenery are present in more than one of the captured video sequences, a depth-aware encoder control is introduced, which identifies those regions based on given depth maps and omits the coding of the residual signal for those re...
Article
Full-text available
An approach to the direct measurement of perception of video quality change using electroencephalography (EEG) is presented. Subjects viewed 8-s video clips while their brain activity was registered using EEG. The video signal was either uncompressed at full length or changed from uncompressed to a lower quality level at a random time point. The di...
Article
Today, H.264/AVC is the state-of-the-art video coding standard. Especially after the 2004 development of its High Profile (HP), it has become one of the primary formats in high definition video content delivery. Recently, a joint Call for Proposals (CfP) on video compression technology has been issued by ISO/IEC MPEG and ITU-T VCEG, targeting at th...
Conference Paper
This paper describes a novel video coding scheme that can be considered a generalization of the block-based hybrid video coding approach of H.264/AVC. While the individual building blocks of our approach are kept as simple as those in H.264/AVC, the flexibility of the block partitioning for prediction and transform coding has been substantially...
Article
Full-text available
A video coding architecture is described that is based on nested and pre-configurable quadtree structures for flexible and signal-adaptive picture partitioning. The primary goal of this partitioning concept is to provide a high degree of adaptability for both temporal and spatial prediction as well as for the purpose of space-frequency representati...
Conference Paper
Recent investigations have shown that a non-separable Wiener filter, that is applied inside the motion-compensation loop, can improve the coding efficiency of hybrid video coding designs. In this paper, we study the application of separable Wiener filters. Our design includes the possibility to adaptively choose between the application of the verti...

Citations

... There is a standing version in which the user pushes a virtual shopping trolley and a sitting version in which the user steers a virtual electric scooter. Navigation is based on a real handle bar (shown in the right image of Fig. 1) that is mapped to the respective handlebar in VR [25]. A straightforward approach to implement self-representation with synchronous movement would be a motion capture system. ...
... Another aspect concerning the fulfillment of social needs by means of telemeeting technology is the feeling of co-presence [136], [137], i.e., the feeling of being there with the other person(s), or "a sense of being together in a shared space at the same time" [138], [139]. Another related term is that of social presence, i.e., "the sense of being together with a virtual or remotely located communication partner", which implies the feeling of co-presence and being in a communication with the other persons [138]-[141]. ...
... In the last decades, with the development of image quality assessment, scientists began to explore the neural mechanism of image quality perception. Neurophysiological approaches are treated as complementary methods to traditional psychophysical ones since quality assessment processes occur inside the media consumer's brain [1][2][3][4][5][6]. In the wake of the development of the electroencephalogram (EEG) technique, neurophysiological assessment of image quality becomes more economical and portable [7][8][9][10][11][12][13][14][15][16]. ...
... They introduced a novel mixed reality system for nondestructive evaluation (NDE) training, for which, after a user study, they concluded that such systems are preferred for NDE training. In [168], Gul et al. presented a Kalman filter for head-motion prediction for cloud-based volumetric video streaming. In practice, server-side rendering systems suffer from interaction latency, although they can provide high-resolution 3D content on any device with an acceptable internet connection speed. ...
... More recently, the use of deep learning and machine learning (ML) techniques has been proposed as a potential method to gain a better insight into the relationship between physiological changes and the subjective feeling of VIMS (Tauscher et al., 2020). For instance, Li et al. (2019) showed that classification of users in a VR application as sick or non-sick was possible with a high accuracy rate based on measures of electroencephalography (EEG), postural sway, and head and waist motion tracking. ...
... Regression techniques are fit for forecasting, e.g., future traffic demand or user behavior, or for learning complex relationships, such as relating network Quality of Service (QoS) indicator to user Quality of Experience (QoE) as exemplified in Fig. 4-(a). In this latter context, a large body of literature employed ML techniques, to e.g., learn QoS indicators such as latency distribution [82], [83] from topology, traffic matrix and routing information, or learn QoE indicators for specific applications such as Web [84], [85], video [86], [87] or games [88]. ...
... A modification of the PSNR, called PSNR-HVS-M, that takes into account the calculated between-coefficient masking and the contrast sensitivity function is proposed. Another approach that proved effective is simple block-wise weighting of the MSE based on spatial activity of each block, resulting in WPSNR and XPSNR measures (Helmrich et al., 2020). ...
... wPSNR [29] weights the spatial frequencies of the error image using the contrast sensitivity function to calculate the peak signal-to-noise ratio. Comparing wPSNR as in Fig. 4d, the proposed codec outperformed all other codecs on the Funt and Mixed datasets, with performance improvements ranging from 3.6% to 32.7%. ...
... However, both algorithms were proposed for different use cases from the proposed algorithm. Bosse et al. proposed a distortion sensitivity model based on a deep neural network to estimate bitrate at CTU-level [43]. The work shows a significant improvement compared with a constant QP setting under the 'all-intra' configuration of the HEVC encoder. ...