Article

The influence of video quality on perceived audio quality and vice versa


Abstract

Three questions are studied. Question one: `To what extent is the perceived audio quality of an audiovisual stream influenced by the video quality?' Question two is the reverse: `To what extent is the perceived video quality of an audiovisual stream influenced by the audio quality?' Finally: `How do audio and video quality contribute to the overall perceived audiovisual quality?' The quality ranges from broadcast audio and video quality to standard videophone (telephone) quality. The main conclusion is that when subjects are asked to judge the audio quality of an audiovisual stimulus, the video quality contributes significantly to the subjectively perceived audio quality. The effect is about 1.2 points on a nine-point quality scale. The reverse effect is much smaller, about 0.2 points. Furthermore, a simple mapping from the audio and video quality to the overall audiovisual quality is given, and it is shown that the video quality dominates the overall perceived audiovisual quality in nonconversational experiments.
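The abstract mentions a simple mapping from audio and video quality to overall audiovisual quality without reproducing it. As an illustration only, integration models in this literature are often fitted as a linear regression with a multiplicative cross term; the sketch below uses that common form with entirely hypothetical data and coefficients, not the paper's own values.

```python
# Hedged sketch: fit AV = b0 + b1*A + b2*V + b3*A*V by ordinary least squares.
# All data below are illustrative placeholders on a nine-point scale.
import numpy as np

def fit_av_model(A, V, AV):
    """Fit overall audiovisual quality as a linear model with an A*V cross term."""
    X = np.column_stack([np.ones_like(A), A, V, A * V])
    coef, *_ = np.linalg.lstsq(X, AV, rcond=None)
    return coef  # b0, b1, b2, b3

def predict_av(coef, A, V):
    b0, b1, b2, b3 = coef
    return b0 + b1 * A + b2 * V + b3 * A * V

# Hypothetical ratings, video-dominant as the abstract reports:
A = np.array([2.0, 5.0, 8.0, 2.0, 8.0])   # audio quality
V = np.array([2.0, 5.0, 8.0, 8.0, 2.0])   # video quality
AV = np.array([2.1, 5.0, 7.9, 6.0, 3.5])  # overall audiovisual quality
coef = fit_av_model(A, V, AV)
```

Note that in this toy data the row with high video and low audio (6.0) scores well above the row with high audio and low video (3.5), mirroring the video dominance the abstract describes.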


... The two most recent studies regarding audio and video quality were conducted in 1999 and 2001 [10,11]. Unfortunately, they took place at the very beginning of the digital era, so what counted as standard home media and reproduction equipment at the time falls far short of the high-definition or even ultra-high-definition media available today. ...
... Streaming was also not available at that time. Joley, Montard, and Buttin concluded that the audio significantly influenced the video [11], whereas Beerends and De Caluwe reported the opposite [10]. In 2015, a similar experiment was conducted, although solely on audio changes. ...
... In contrast, the perceived audio quality ratings across all subjects (Fig. 5) showed a significant main effect of the audio resolution and not of the video resolution. This means that different video resolutions did not influence the perceived audio quality, which contradicts the study reported by Beerends and De Caluwe [10] and several other studies reporting that the auditory system is dominated by the visual system [5][6][7][8][9]. The interaction between the audio and video resolutions was significant, probably because the mean ratings were very similar (and not statistically different) for the 56 kbps and 320 kbps audio resolutions. ...
Conference Paper
Full-text available
A new quality assessment test was carried out to examine the relationship between the perception of audio and video resolutions. Three video resolutions and four audio resolutions were used to answer the question: "Does lower-resolution video influence the perceived quality of audio, or vice versa?" Subjects were asked to use their own equipment, the kind they would be likely to stream media with. They were asked to watch short video clips of various qualities and to indicate the perceived audio and video qualities on separate 5-point Likert scales. Four unique 10-second video clips were presented in each of 12 experimental conditions. The perceived audio and video quality ratings showed different effects of audio and video resolutions. The perceived video quality ratings showed a significant effect of audio resolution, whereas the perceived audio quality did not show a significant effect of video resolution. Subjects were divided into two groups based on self-identification as visually or auditorily inclined. These groups showed slightly different response patterns in the perceived audio quality ratings.
... Though various details of the neurophysiological processing of audiovisual data remain unknown, empirical studies have demonstrated that the auditory and visual domains have a mutual influence on the perceived overall audiovisual quality [14]. One set of studies, e.g., [159,160], indicated that the video channel is more important to perceived audiovisual quality. Others, however, e.g., [2], have suggested that the audio channel is more vital than the video one, especially in teleconferencing scenarios where humans pay more attention to audio information. ...
... Detailed studies of the effects of various types of interaction between the audio and video modalities on perceptual quality can be found, e.g., in [2,5,6]. The majority of previous works indicate that video quality has more influence on perceptual audio quality than vice versa [159]. However, contrary results have been reported in the literature; e.g., the study in [2] showed audio quality to be more vital than video quality in 'talking head' scenarios. ...
... A few studies on human cognitive understanding suggest that the audio and video channels might be integrated in an early phase of human perception formation [167]. Based on this, several researchers [2,159] proposed audiovisual quality models as a multiplication of audio and video quality with equal importance, as shown in (Eq. 6): ...
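The snippet above cites the model only as Eq. (6) without reproducing it. The multiplicative form it describes can be sketched as follows; the scaling constant c and the quality scales are assumptions for illustration, not values taken from the cited works.

```python
# Hedged sketch of a multiplicative audiovisual integration model, in which
# audio quality (AQ) and video quality (VQ) contribute with equal importance.
# The scaling constant c is a fitted parameter; the default here is illustrative.

def av_quality_multiplicative(aq: float, vq: float, c: float = 1.0) -> float:
    """Predict overall audiovisual quality as c * AQ * VQ.

    aq, vq: unimodal quality scores (e.g. MOS on a 1-5 scale).
    """
    return c * aq * vq

# Because the model is symmetric in AQ and VQ, swapping the two scores
# leaves the prediction unchanged -- the 'equal importance' property.
```

In practice c is fitted per experiment so that the product maps back onto the rating scale used by the subjects.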
Article
Full-text available
Measuring the perceived quality of audio-visual signals at the end-user has become an important parameter in many multimedia networks and applications. It plays a crucial role in shaping audio-visual processing, compression, transmission and systems, along with their implementation, optimization and testing. Service providers are enacting different quality of service (QoS) solutions to deliver the best quality of experience (QoE) to their customers. Thus, devising precise perception-based quality metrics will greatly help improve multimedia services over wired and wireless networks. In this paper, we provide a comprehensive survey of the work that has been carried out over recent decades in perceptual audio, video and joint audiovisual quality assessment, describing existing methodologies in terms of the requirement for a reference signal, feature extraction, feature mapping and classification schemes. In this context, an overview of quality formation and perception, QoS, QoE as well as QoP (quality of perception) is also presented. Finally, open issues and challenges in audio-visual quality assessment are highlighted and potential future research directions are discussed.
... The mutual influence is often established by the MOS of the audio and video quality while stimuli are presented in unimodal and audio-visual modes. Results of previous studies suggest a minor mutual influence of the unimodal qualities (Beerends & De Caluwe, 1999; Kitawaki, Arayama & Yamada, 2005; ITU SG 12 Contribution COM 12.61-E, 1998), such that judgments of audio quality were influenced by video quality and vice versa. Beerends and De Caluwe (1999) additionally examined whether judgments of a unimodal quality differ from audio-visual presentations but could not find any significant differences. ...
... Results of previous studies suggest a minor mutual influence of the unimodal qualities (Beerends & De Caluwe, 1999; Kitawaki, Arayama & Yamada, 2005; ITU SG 12 Contribution COM 12.61-E, 1998), such that judgments of audio quality were influenced by video quality and vice versa. Beerends and De Caluwe (1999) additionally examined whether judgments of a unimodal quality differ from audio-visual presentations but could not find any significant differences. In fact, the judgments between presentation modes differed by less than one percent in their experiment. ...
... With regard to the mutual influence of the unimodal qualities, results indicate a cross-modal influence of the video quality on perceived audio quality. Previous studies suggest a small but noticeable influence of both modalities on each other (Beerends & De Caluwe, 1999; Kitawaki et al., 2005; ITU SG 12 Contribution COM 12.61-E, 1998). The results suggest that the perceived video quality was not influenced by the audio quality, which should not exclude other possibilities. ...
Article
Full-text available
One of the main questions in audio-visual quality perception is what influences the overall impression. While video quality has the highest impact on audio-visual quality assessment across all video genres (You et al., 2010), research has shown that audio quality is more influential in music videos than in other video genres (Garcia et al., 2011). The present study investigated the extent to which unimodal qualities influence the perceived audio-visual quality, as well as the mutual influences of the unimodal qualities, using three professional music videos. Participants evaluated the subjectively perceived audio and video quality in addition to a combination (audio-visual quality). Results show that the quality of both modalities influenced the audio-visual quality similarly and that it was not dominated by the video quality. Furthermore, the perceived audio quality was affected by the video quality but not vice versa. In other words, participants’ judgments were not influenced by the simultaneously presented audio quality when judging video quality, yet they were influenced by the video quality when judging audio quality. This indicates that in music videos the perceived audio quality is more strongly influenced by the video quality than vice versa.
... Our study investigates how different qualities together shape a combined quality perception. This is similar to two aspects that have been researched in the area of QoE: how variations in quality over time are combined into a final judgment [21] and how audio and video quality are combined into an overall perceived audiovisual quality [5]. Research on time-varying quality identified the 'recency' effect [2,21]: when participants are asked to rate the quality of a video with segments of different quality, the last presented quality has a stronger-than-average impact. ...
... Here it is investigated how the audio and video channels impact the overall experience of the user [5-9, 27, 56]. One of the main findings from this research is a clear interaction effect between the audio and visual channels: if one of the two is kept constant and the other stream is varied in quality, the perceived quality of the unmodified stream will also be rated worse [5]. ...
... The crowdsourcing experiment was conducted over the crowdsourcing platform Microworkers. In total, 959 crowdworkers finished one of the campaigns, of which 153 did not answer the content questions correctly. We further removed 12 participants because they gave unreasonable ages (e.g. 2 years). ...
Article
Full-text available
In desktop multi-party video-conferencing, the video streams of participants are delivered in different qualities, but we know little about how such a composition of the screen affects the quality of experience. Do the different video streams serve as indirect quality references, making the perceived video quality dependent on the other streams in the same session? What is the relation between the perceived qualities of each stream and the perceived quality of the overall session? To answer these questions we conducted a crowdsourcing study in which we gathered over 5000 perceived quality ratings of overall sessions and individual streams. Our results show a contrast effect: high quality streams are rated better when more low quality streams are co-present, and vice versa. In turn, the quality perception of an overall session can increase significantly by exchanging one low quality stream with a high quality one. When comparing the means of individual and overall ratings, we further observe that users are more critical when asked about individual streams than when asked for an overall rating. However, the results show that while the contrast effect exists, it is not strong enough to optimize the experience by lowering the quality of other participants.
... Perceptual findings, like the one presented by Beerends and De Caluwe [18], stating that there is a significant mutual influence between audio and video quality, suggest a joint design and examination of audio and video in audiovisual systems. Having this in mind, we present a joint system design and implementation of a complete audio-visual end-to-end chain from spatial audio and video recording, data processing, to immersive reproduction, albeit focusing mainly on the audio part. ...
... Beerends and De Caluwe [18], there is a significant mutual influence between audio and video. Our findings confirm that especially spatial audio perception can be significantly supported by a visual stimulus. ...
... Especially professional musicians were extremely sensitive to temporal misalignment. Our observations are in line with the quantitative findings by Beerends and De Caluwe [18]. ...
Conference Paper
The complex mutual interaction between human visual perception and hearing demands combined examinations of 360° video and spatial audio systems for Virtual Reality (VR) applications. Therefore, we present a joint audio-visual end-to-end chain from spatial recording to immersive reproduction with full rotational three degrees of freedom (3DOF). The audio subsystem is based on Higher Order Ambisonics (HOA) obtained from Spherical Microphone Array (SMA) recordings, while the video is captured with a 360° camera rig. A spherical multi-loudspeaker setup for audio, in conjunction with a VR head-mounted video display, is used to reproduce a scene as close as possible to the original with regard to the perceptual modalities of the user. A database of immersive content as a basis for future research in spatial signal processing was set up by recording several rehearsals and concerts of the Aachen Symphony Orchestra. The data were used for a qualitative assessment of the eligibility of the proposed end-to-end system. A discussion shows the potential and limitations of the approach. Therein, we highlight the importance of coherent audio and video in achieving a high degree of immersion with VR recordings.
... Additionally, the inQ was developed for audio/video content, whereas the (ITU-R, 2012) recommendations are to use video material only. Reviewing the literature on combining audio and video quality yielded results from Beerends & de Caluwe (1999) showing that people rate video content higher when high-quality audio is presented together with the video content. Furthermore, video content presented together with low-quality audio lowered the ratings for perceived video quality. ...
... If the audio/video content mainly consists of talking heads, the audio quality weighs slightly heavier than the video quality. For content with higher levels of motion, video quality appears to be more important (Beerends & de Caluwe, 1999;Hands, 2004). ...
Thesis
Full-text available
The main objective of the thesis work is to show how image and video quality studies can incorporate cognitive and affective elements when evaluating Quality of Experience (QoE) for audio and video content. The focus of the thesis is on perceived video quality when video is streamed via a wireless network, as this can give an irregular throughput. The latest definition of QoE, as stated in the Qualinet White Paper (2013), “is the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and / or enjoyment of the application or service in the light of the user’s personality and current state”. There are still quite some unknowns in this area, especially around what the exact link is between affective factors such as involvement, sensory perception, and the underlying technical parameters. This thesis therefore sets out a model which aims to clarify the relationship between perceived video quality and involvement. It also aims to supply an operational definition of involvement, as well as a measure which can be used in the bigger framework of Quality of Experience. Assuming that Human Influencing Factors (IFs) are important to determine people’s preferences, a measure which reflects people’s opinion about scene content is necessary. A literature study indicated that related existing measures are not applicable since they assume objects (in marketing), interaction with the content (User Experience, games and immersion) or virtual reality (presence). To develop a fitting measure, a psychometric approach was adopted. First, it was necessary to devise an operational definition of involvement. This was done through concept mapping, and the end result is a model for involvement with audio/video content with 6 attributes: captivation, expressions of involvement, informative interest, relatedness, negative affect and disinterest. 
Based on the developed model, an involvement questionnaire (inQ) was developed and tested. A first draft version was quickly tested for understanding and wording through cognitive interviews. The second version of the inQ was then placed online, where about 100 participants scored 9 different video scenes with the involvement questionnaire. Results were analysed with the help of exploratory factor analyses, and a third version of the inQ was created. The third version of the inQ was validated through an experiment in which 100 participants were shown 15 one-minute scenes: 5 different fragments, with three versions each: a reference version, a version where temporal artifacts (e.g. jerkiness) were induced, and a version where spatial artifacts (e.g. blockiness or blurriness) were induced. The results of this test showed that there was a two-way relationship between the inQ and the perceived video quality, meaning that scores on the inQ can partially predict scores on perceived video quality and vice versa. The results also showed that involvement can be defined by three attributes: positive/negative affect, relatedness and internal captivation. Video content can be judged differently across these factors, which could help differentiate people's preferences for temporal or spatial artifacts in the future. To conclude, the work presented in this thesis has shown that it is possible to create reliable and valid measures for Human IFs. This was established through both an explorative data analysis and a confirmative analysis. Furthermore, it has also shown that involvement with audio/video content is a salient Human IF and an important characteristic for QoE. Hence, involvement should be taken into account whenever participants are asked to judge video or audio quality, as it may change their quality judgement.
... Audio/Video Quality: Previous literature has shown that the audio/video quality of a video has a major impact on viewers' perception of product quality [3,50]. To explore whether production quality is also important for crowdfunding videos, we asked crowd workers four questions. ...
... We used these subjective ratings to conduct our quantitative analysis. We collected opinions and subjective ratings from 15 different MTurk workers for each of the 210 videos to reduce bias across individual workers. For the statistical analysis, we averaged the 15 responses for each video. ...
Conference Paper
To successfully raise money using crowdfunding, it is important for a campaign to communicate ideas or products effectively to the potential backers. One of the lesser explored but powerful components of a crowdfunding campaign is the campaign video. To better understand how videos affect campaign outcomes, we analyzed videos from 210 Kickstarter campaigns across three different project categories. In a mixed-methods study, we asked 3150 Amazon Mechanical Turk (MTurk) workers to evaluate the campaign videos. We found six recurrent factors from a qualitative analysis as well as quantitative analysis. Analysis revealed product related and video related factors that were predictive of the final outcome of campaigns over and above the static project representation features identified in previous studies. Both the qualitative and quantitative analysis showed that videos influenced perception differently for projects in different categories, and the differential perception was important for predicting successes of the projects. For example, in technology campaigns, projects perceived to have a lower level of complexity were more likely to be successful; but in design and fashion campaigns, projects perceived to have a higher level of complexity - which perhaps reflected craftsmanship - were more likely to be successful. We conclude with design implications to better support the video making process.
... Furthermore, a significant cross-modal influence of audio on visual quality and vice versa has been reported in [4,27]. Beerends and De Caluwe [4] found a bidirectional interaction between video and audio, with the influence of the video quality on the perceived audio quality being significantly (about six times) higher than the inverse effect. ...
... Furthermore, a significant cross-modal influence of audio on visual quality and vice versa has been reported in [4,27]. Beerends and De Caluwe [4] found a bidirectional interaction between video and audio, with the influence of the video quality on the perceived audio quality being significantly (about six times) higher than the inverse effect. Coherently, the overall perceived audiovisual quality was found to be dominated by the video quality in nonconversational experiments. ...
Article
Full-text available
The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-016-3360-z. Recent studies encourage the development of sensorially-enriched media to enhance the user experience by stimulating senses other than sight and hearing. Sensory effects such as odor, wind, vibration and light, as well as enhanced audio quality, have been found to favour media enjoyment and to have a positive influence on the sense of Presence and on the perceived quality, relevance and reality of a multimedia experience. In particular, sports is among the genres that could benefit the most from these solutions. Several works have also demonstrated the technical feasibility of implementing and deploying end-to-end solutions that integrate sensory effects into a legacy system. Thus, multi-sensorial media emerges as a means to deliver a new form of immersive experience to the mass market in a non-disruptive manner. However, many questions remain, such as which sensory effects best complement a given audiovisual content, or how best to integrate and combine them to enhance the user experience of a target audience segment. The work presented in this paper aims to gain insight into the impact of binaural audio and sensory (light and olfactory) effects on the sports media experience, both at the overall level (average effects) and as a function of users' characteristics (heterogeneous effects). To this aim, we conducted an experimental study exploring the influence of these immersive elements on the quality and Presence dimensions of the media experience. Along the quality dimension, we look for possible variations in the quality scores assigned to the overall media experience and to the media components: content, image, audio and sensory effects. The potential impact on Presence is analyzed in terms of Spatial Presence and Engagement.
The users' characteristics considered encompass specific personal affective, cognitive and behavioral attributes. At the overall level we found that participants preferred binaural audio over standard stereo audio and that the presence of sensory effects significantly increased the level of Spatial Presence. Several heterogeneous effects were also revealed as a result of our experimental manipulations. Whereas binaural audio was found to have a generalized impact on the majority of the quality and Presence measures considered, the effects of the sensory effects concentrate mainly on the Presence dimension. Personal characteristics explained most of the variation in the dependent variables, with individuals' preferences in relation to the content, knowledge of the involved technologies, tendency to emotional involvement, and conscientiousness being the user variables with the most generalized influence. In particular, the former two features seem to present a conflict in the allocation of attentional resources towards the media content versus the technical features of the system, respectively. Additionally, football fans' experience seems to be modulated by emotional processes, whereas for non-fans cognitive processes (in particular those related to quality judgment) prevail.
... Nevertheless, from an end-to-end point of view, the impact of different impairments on user experience has been analyzed in some works, but only taking into account specific impairments (Eg, Griwodz, Halvorsen, & Behne, 2015) or with a small population (Arslan, Usman, & Shin, 2015). In this sense, many issues are simply assumed, such as the preference for audio versus video (Beerends & De Caluwe, 1999). However, the formal evaluation of different types of impairments over different types of content has not been taken into account in subjective studies. ...
... But not only the different types of content, affected by different types of impairments, influence the QoE. Other factors also affect the perceived quality, such as the relationship between perceived video quality and perceived audio quality (Beerends & De Caluwe, 1999). As Gaston, Boley, Selter, and Ratterman (2010) state, audio artifacts can either enhance or reduce the ability to detect a certain video impairment present in the same sequence. ...
Article
Nowadays, access to multimedia content is one of the most demanded services on the Internet. However, the transmission of audio and video over these networks is not free of problems that negatively affect the user experience. Factors such as low image quality, cuts during playback or losses of audio or video, among others, can occur, and there is no clear picture of the level of distortion they introduce into the perceived quality. For that reason, different impairments should be evaluated based on user opinions, with the aim of analyzing their impact on perceived quality. In this work, we carried out a subjective evaluation of different types of impairments with different types of content, including news, cartoons, sports and action movies. A total of 100 individuals, between the ages of 20 and 68, participated in the subjective study. Results show that short-term rebuffering events negatively affect the quality of experience and that desynchronization between audio and video is the least annoying impairment. Moreover, we found that the content type shapes the subjective results according to the impairment present during playback.
... Beerends and de Caluwe [8] found very clear multimodal interactions in real audiovisual systems. Tests showed that video quality influences subjective opinions of audio quality and vice versa. ...
... This test focuses on the recording aspects of a VC room only, and vision is omitted. This is consistent with the second conclusion by Beerends and de Caluwe [8] stating that the overall audiovisual quality can be predicted from the perceived audio quality in an audio only experiment, and the perceived video quality in a video only experiment. ...
Article
In telephone and video conferencing situations, the transfer function from a talker to a listener involves two rooms, one microphone and one loudspeaker. The room and the electroacoustical factors affect the speech quality for the listener, and this influence is studied in this paper. Experiments were carried out with a loudspeaker playing back anechoically recorded speech in a room with variable reverberation time. Three different types of microphones recorded this speech at two different distances. These monaurally recorded clips were then played back to listeners over a single loudspeaker in a separate semi-anechoic room. A test was carried out as a within-subject design with three factors: two reverberation times, two microphone distances, and three microphone types (omnidirectional, supercardioid, and 3rd-order hypercardioid). 21 persons participated in the pair-comparison test, with the task of judging which of the two stimuli in each pair was preferred, including two repetitions. Results were analyzed using ANOVA and the loglinear Bradley-Terry model. Significant effects were found for all three main factors, but interactions were found to be negligible. The three levels of microphone directivity led to steps in preference rating of the same order as the steps caused by changing the microphone distance. The change in room reverberation had less effect on experienced quality.
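The Bradley-Terry analysis mentioned in the abstract above estimates a preference strength per stimulus from pair-comparison data. A minimal sketch of such a fit, using the classical MM (minorization-maximization) iteration on made-up win counts; the matrix below is illustrative, not the paper's data.

```python
# Hedged sketch of Bradley-Terry strength estimation from paired comparisons.
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from a win-count matrix.

    wins[i][j] = number of times stimulus i was preferred over stimulus j.
    Uses the classical MM update p_i = W_i / sum_j N_ij / (p_i + p_j);
    strengths are normalized to sum to 1. Assumes a connected comparison graph.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    total_wins = wins.sum(axis=1)  # W_i: total wins of stimulus i
    games = wins + wins.T          # N_ij: comparisons between i and j
    p = np.ones(n) / n
    for _ in range(n_iter):
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j and games[i, j] > 0:
                    denom[i] += games[i, j] / (p[i] + p[j])
        p = total_wins / denom
        p /= p.sum()
    return p

# Hypothetical preference counts for three stimuli (10 comparisons per pair):
wins = [[0, 8, 9],
        [2, 0, 7],
        [1, 3, 0]]
strengths = bradley_terry(wins)
```

The estimated strengths reproduce the intuitive ordering: the stimulus that wins most pairwise comparisons receives the largest strength.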
... Subjective tests have clearly shown that there is a strong inter-relationship between audio and video quality [2], and thus research has progressively focused on developing combined audiovisual models. The authors in [33] focused on the relative importance of the audio and video quality in the audiovisual quality assessment and questioned whether a regression model predicting audiovisual quality can be devised that is generally applicable. ...
... All audiovisual metric combinations yield relatively good Pearson coefficients between MOS_n and MOS_p (PLCC > 0.8). Figure 8 presents surface fittings of MOS_n using parametric regression of the proposed model, i.e. (2). The obtained surfaces fit the data in a less rigid manner than the surfaces obtained with (1), suggesting a better approximation of MOS_n. ...
Article
Full-text available
The MPEG-DASH protocol has been rapidly adopted by most major network content providers and enables clients to make informed decisions in the context of HTTP streaming, based on network and device conditions, using the available media representations. A review of the literature on adaptive streaming over mobile networks shows that most emphasis has been on adapting the video quality, whereas this work examines the trade-off between video and audio quality. In particular, subjective tests were undertaken for live music streaming over emulated mobile networks with MPEG-DASH. A group of audio/video sequences was designed to emulate varying bandwidth arising from network congestion, with a varying trade-off between audio and video bit rates. Absolute Category Rating was used to evaluate the relative impact of both audio and video quality on the overall Quality of Experience (QoE). One key finding from the statistical analysis of Mean Opinion Score (MOS) results using Analysis of Variance is that reducing audio quality has a much lower impact on QoE than reducing video quality in similar total-bandwidth situations. This paper also describes an objective model for audiovisual quality estimation that combines the outcomes from audio and video metrics into a joint parametric model. The correlation between predicted and subjective MOS was computed using several criteria (Pearson and Spearman correlation coefficients, Root Mean Square Error (RMSE) and epsilon-insensitive RMSE). The obtained results indicate that the proposed approach is a viable solution for objective audiovisual quality assessment in the context of live music streaming over mobile networks.
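The evaluation criteria named in the abstract above (Pearson and Spearman correlations plus RMSE between predicted and subjective MOS) can be sketched in plain NumPy. The toy MOS values are hypothetical, and the simple rank transform below assumes no ties.

```python
# Hedged sketch of objective-model evaluation against subjective MOS.
import numpy as np

def _ranks(x):
    # Simple rank transform (assumes no tied values).
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def evaluate_predictions(mos_subjective, mos_predicted):
    s = np.asarray(mos_subjective, dtype=float)
    p = np.asarray(mos_predicted, dtype=float)
    plcc = float(np.corrcoef(s, p)[0, 1])                   # Pearson: linearity
    srocc = float(np.corrcoef(_ranks(s), _ranks(p))[0, 1])  # Spearman: rank agreement
    rmse = float(np.sqrt(np.mean((s - p) ** 2)))            # absolute prediction error
    return {"PLCC": plcc, "SROCC": srocc, "RMSE": rmse}

# Toy example with a near-linear relation between predicted and subjective MOS:
metrics = evaluate_predictions([1.0, 2.0, 3.0, 4.0, 5.0],
                               [1.2, 1.9, 3.1, 4.2, 4.8])
```

PLCC rewards a linear mapping, SROCC only monotonicity, and RMSE penalizes absolute error, which is why quality-metric papers commonly report all three together.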
... Two studies support the hypothesis that television images presented with a high-quality soundtrack are more 'involving' and of better quality [7,17]. However, another study by Beerends [28], based on 25-s commercials, reported asymmetric interaction effects, with a noticeable influence of the video quality level on the perceived audio quality (0.5 on a 5-point MOS scale) and a weaker influence of the audio quality level on the perceived video quality (only 0.15 MOS). Comparing these results to the ones obtained with head-and-shoulders material, Hands pointed out that the nature of the audiovisual content may have influenced the results, as commercials are visually more captivating, thus leading to a more video-dominant situation [29]. ...
... In turn, the video quality level had an impact on the subjective audio quality toward high audio quality and for the passive context only. The observed cross-modal interactions were measured to be, on average, 0.5 on a 5-point MOS scale, which is in line with two studies from the literature [26,28] but contrasts with the larger value found in [27] (above 2 MOS points). The absence of a significant influence of the audio quality level on the perceived video quality for both experimental contexts in [26] can be explained by the reduced range of variation in audio quality level used in that experiment. ...
Article
Full-text available
This paper investigates multi-modal aspects of audiovisual quality assessment for interactive communication services. It shows how perceived auditory and visual qualities integrate to an overall audiovisual quality perception in different experimental contexts. Two audiovisual experiments are presented and provide experimental data for the present analysis. First, two experimental contexts are compared, i.e., passive ‘viewing and listening’ and interactive, with regard to their impact on the audiovisual qualities as subjectively perceived by the user. Second, the effects of cross-modal interactions on the assessment of the audio and video qualities are measured for those experimental contexts. The results are compared to the ones found in the literature revealing both similarities and differences in terms of magnitude and also in which cases they occur. Third, the impact of the conversational scenario on the assessment of the auditory and visual qualities is investigated. Finally, results from the literature related to audiovisual integration are gathered by the type of application. A general integration function is proposed for each category, and the performances of these ‘application-oriented’ models demonstrate a direct gain in prediction.
... Furthermore, although in many cases, the test sequences in subjective quality experiments are watched by the participants without audio, subjective studies have shown that it can influence the perceived quality, and that the degradations affecting (jointly or separately) the audio and video signals impact the user QoE [60], [65], [66]. Thus, objective metrics have been developed trying to model the overall audiovisual quality [65]-[67]. This influence can be emphasized when visualising 360° videos, given that audio can impact the exploration and attention of the observers [68]-[71], thus, possibly influencing the quality assessments. ...
Article
Full-text available
Recently an impressive development in immersive technologies, such as Augmented Reality (AR), Virtual Reality (VR) and 360 video, has been witnessed. However, methods for quality assessment have not been keeping up. This paper studies quality assessment of 360 video from the cross-lab tests (involving ten laboratories and more than 300 participants) carried out by the Immersive Media Group (IMG) of the Video Quality Experts Group (VQEG). These tests were designed to assess and validate subjective evaluation methodologies for 360 video. Audiovisual quality, simulator sickness symptoms, and exploration behavior were evaluated with short (from 10 seconds to 30 seconds) 360 sequences. The influence of the following factors was also analyzed: assessment methodology, sequence duration, Head-Mounted Display (HMD) device, uniform and non-uniform coding degradations, and simulator sickness assessment methods. The obtained results have demonstrated the validity of Absolute Category Rating (ACR) and Degradation Category Rating (DCR) for subjective tests with 360 videos, the possibility of using 10-second videos (with or without audio) when addressing quality evaluation of coding artifacts, as well as any commercial HMD (satisfying minimum requirements). Also, more efficient methods than the long Simulator Sickness Questionnaire (SSQ) have been proposed to evaluate related symptoms with 360 videos. These results have been instrumental for the development of the ITU-T Recommendation P.919. Finally, the annotated dataset from the tests is made publicly available for the research community.
... However, the first audiovisual quality models to be found in the literature appeared as late as in the 90's. At this time, these models addressed either analog degradations, such as audio and video noise [49,35,3], or compression artifacts, such as blockiness [8,50,52,24]. For an overview of audiovisual quality models covering analog and compression degradations, see [106]. ...
Chapter
Full-text available
This chapter addresses QoE in the context of video streaming services. Both reliable and unreliable transport mechanisms are covered. An overview of video quality models is provided for each case, with a focus on standardized models. The degradations typically occurring in video streaming services, and which should be covered by the models, are also described. In addition, the chapter presents the results of various studies conducted to fill the gap between the existing video quality models and the estimation of QoE in the context of video streaming services. These studies include work on audiovisual quality modeling, field testing, and on the user impact. The chapter finishes with a discussion on the open issues related to QoE.
... Train colors can influence the loudness estimation of train sound [4-5]; the loudness estimation of a red train's sound was 15% higher than that of a light green train. An interaction between the human perception of audio quality and video quality also exists [6,7]. When the correlation between these two factors differed, the degree to which they affected each other also differed. ...
Article
Though the importance of visual-auditory interaction is increasing, research on auditory perception in the multisensory mode is not comprehensive enough. This paper addresses the JND (Just Noticeable Difference) change of auditory perception with synchronous visual stimuli. Through psychoacoustic experiments, the loudness JND, subjective duration JND and pitch JND of pure tones were measured in auditory-only mode and in visual-auditory mode with different visual stimuli having different attributes such as color, illumination, quality and moving state. Statistical analyses of the experimental data indicate that, compared with the JND in auditory-only mode, the JND with visual stimuli is often larger. The average JND increments for subjective duration, pitch and loudness are 45.1%, 14.8% and 12.3%, respectively. The conclusion is that the ability of JND-based auditory perception often decreases with visual stimuli. The incremental amount of JND is significantly affected by the attributes of the visual stimuli. If the visual stimuli make subjects feel more comfortable, the change in the JND of auditory perception will be smaller.
... Real-life applicability is the degree to which users can use Smart TV appropriately in various situations [5]. Perceived sound quality is the degree to which the sound output from the Smart TV is perceived as being of high quality [6]. Perceived security is the degree to which the Smart TV appears to safely handle personal information and avoid unnecessary exposure [14]. ...
Conference Paper
The objective of this research is to explore and identify Smart TV user experience (UX) factors over different time periods employing multiple methods so as to overcome the weakness of a single study approach. To identify the effect of contextual dimensions on the Smart TV UX, we conducted empirical studies exploiting the different methods of think-aloud and diary keeping under two usage conditions: laboratory and real-life in the participants' residence. The factors identified through each study were integrated into a single set and further refined through peer review, resulting in a final set of 19 UX factors. Metrics for these 19 UX factors were generated and used in an online survey, in which over 300 Smart TV users participated. The empirical evidence from each study suggests that the UX factors vary with respect to product temporality. The findings indicate practical implications for Smart TV manufacturers, marketing managers, application developers, and service providers.
... For video, a whole range of metrics have emerged, such as the basic PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity), and PEVQ (Perceptual Evaluation of Video Quality) [6]. However, subjective tests have shown [7] that there is a strong inter-relationship between audio and video, and thus research has more recently focused on developing a combined audio-visual model. In [8], a review of audio and video metrics is presented as well as an investigation into the key issues in developing joint audio-visual quality metrics. ...
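As a minimal, hedged example of the simplest of these full-reference metrics, the following computes PSNR over flat lists of 8-bit pixel values (SSIM and PEVQ are considerably more involved; the sample pixel values below are invented):

```python
# Hedged sketch: PSNR for 8-bit frames, shown as the most basic of the
# full-reference video metrics mentioned in the excerpt.
import math

def psnr(ref, deg, max_val=255.0):
    # ref, deg: flat lists of pixel values from reference and degraded frames
    mse = sum((r - d) ** 2 for r, d in zip(ref, deg)) / len(ref)
    if mse == 0:
        return float("inf")  # identical frames: infinite PSNR by convention
    return 10.0 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 130, 140]
deg = [101, 118, 133, 137]
```

Pixel-error metrics like this ignore the audio stream entirely, which is exactly the gap the joint audio-visual models discussed here try to close.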
Conference Paper
Full-text available
The rapid adoption of MPEG-DASH is testament to its core design principles that enable the client to make informed decisions relating to media encoding representations, based on network conditions, device type and preferences. Typically, the focus has mostly been on the different video quality representations rather than audio. However, for device types with small screens, the relative bandwidth budget difference allocated to the two streams may not be that large. This is especially the case if high quality audio is used, and in this scenario, we argue that increased focus should be given to the bit rate representations for audio. Arising from this, we have designed and implemented a subjective experiment to evaluate and analyse the possible effect of using different audio quality levels. In particular, we investigate the possibility of providing reduced audio quality so as to free up bandwidth for video under certain conditions. Thus, the experiment was implemented for live music concert scenarios transmitted over mobile networks, and we suggest that the results will be of significant interest to DASH content creators when considering the bandwidth trade-off between audio and video.
... The analysis showed that the perceived audio quality was also slightly affected by the manipulation of the video quality (note that the audio quality stayed the same throughout the experiment). This effect has been shown in previous research [13]. Our analysis showed that the differences in video quality could only explain small parts of the differences in perceived audio quality, but gave indications that the differences are related to either user or group factors. ...
... Especially in popular music genres an increasing number of music videos are produced to raise more awareness of the performers. Research on intermodal integration of video and sound suggest a reciprocal influence of these modalities on perception (Beerends & de Caluwe, 1999;You, Reiter, Hannuksela, Gabbouj, & Perkis, 2010;Ernst & Rohde, 2012). Although the combination of music with visual components has a long tradition predating the internet, the widespread use of music videos has multiplied with the internet, leading to an increased need for compressed digital formats. ...
Conference Paper
The widespread use of music videos is a result of technological developments and has fundamentally changed the possibilities and habits of music perception in the last decade. As a consequence of these new possibilities, it is necessary to further investigate the influence of the visual modality on the perceived music experience for audiovisual recordings. This explorative study investigates the extent to which the quality of compressed visual information affects the perceived audio quality of music videos. A live music video (performance) was edited according to three visual compression rates. The audio quality remained constant. Musically experienced participants judged the auditory quality of the music video as well as the visual and overall quality. Results reveal a significant influence of the visual compression rate on perceived audio quality. Therefore, the rate of data compression needs to be carefully considered for music videos if individuals are to be encouraged to purchase the music.
... In general, the results Diffgrade obtained showed that video presence had a small (but statistically significant) effect on the audio scores. This observation is in line with results obtained by Beerends and Caluwe [21]. ...
Article
The subjective effects of controlled limitation of audio bandwidth on the assessment of audio quality were studied. The investigation was focused on the standard 5.1 multichannel audio setup and limited to the optimum listening position. The effect of video presence on the audio quality assessment was also investigated. The results of formal subjective tests indicate that it is possible to limit the bandwidth of the center or the rear channels without significant deterioration of the audio quality for most program material types investigated. Video presence had a small effect on the audio quality assessment.
... There is also an interaction between audio quality and video quality. When the relevance of the audio and visual contents differs, they affect each other to different degrees [3]-[4]. John J. O'Hare used four colors to investigate their influence on the audible thresholds of four tones with different frequencies. ...
Conference Paper
Previous research on visual-auditory interaction has shown that the quality of visual stimuli may have an effect on auditory perception. This paper investigated the quantitative relation between the just noticeable difference (JND) of pure tone loudness and the mean opinion score (MOS) of video quality at different bit rates. In the psychoacoustic experiments, the loudness JND was measured while visual stimuli at three bit rates were presented to subjects simultaneously with the audio stimuli. The MOS of video quality was evaluated by subjective assessment. The results indicated that as the video bit rate decreased, the video quality MOS declined and the loudness JND increased. From these results, it is concluded that the sensitivity of loudness perception decreases as the MOS of video quality declines.
... For contents corresponding to news, teleconference or music clip, the audio stream quality has greater weight on the overall quality [4]. In addition, some studies have shown that there is a significant mutual interaction between the video and the audio quality [5]. ...
... Rimell and Hollier [109] state that: "the quality of one mode affects the perceived quality of the other mode and a single mode should not be considered in isolation". When judging the overall quality of an audiovisual system, video quality is found to be the dominant element [110,111]. Furthermore, if the perceiver's attention is focused on a particular modality, their ability to detect errors in the other modality is greatly impaired [112]. Audiovisual content is found to have an impact on the relative importance of audio and video quality [113]. ...
... where a, b, c are weighting coefficients and d represents a constant. The precision of a prediction of the audiovisual quality using the above function was studied, among others, by Beerends et al. [42]. The exact values of the coefficients are likely to depend on many factors. ...
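The excerpt above elides the function itself; a parametric form commonly used in the audiovisual-quality literature combines the audio and video MOS with a multiplicative interaction term. The sketch below uses that common form with purely illustrative coefficients (not fitted values from any cited study):

```python
# Common parametric audiovisual integration form:
#   MOS_AV = a + b * MOS_A + c * MOS_V + d * MOS_A * MOS_V
# where a is a constant, b and c weight the audio and video terms, and
# d weights their interaction. The defaults below are illustrative
# placeholders only, not fitted coefficients from any study.
def audiovisual_mos(mos_a, mos_v, a=0.0, b=0.25, c=0.5, d=0.05):
    return a + b * mos_a + c * mos_v + d * mos_a * mos_v
```

In fitted models of non-conversational viewing, the video term typically receives the larger weight, consistent with the video dominance reported in the abstract above.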
Conference Paper
This paper addresses the need to develop unified methods for subjective and objective quality assessment across speech, audio, picture and multimedia applications. Commonalities and differences between the currently used standards are overviewed. Examples of the already undertaken research attempting to “bridge the gap” between the quality assessment methods used in various disciplines are indicated. Prospective challenges faced by researchers in the unification process are outlined. They include development of unified scales, defining unified anchors, integration of objective models, maintaining “backward comparability” and undertaking joint standardization efforts across industry sectors.
... We decided to leave audio out of the tests, as the goal was to evaluate subjective video quality. Since audio quality can affect video quality perception, this would have been an undesirable influence on the results [4]. Note that the influence of video quality on perceived audio quality is stronger than the reverse, but audio quality does affect perceived video quality, and this influence should be avoided. ...
Conference Paper
Full-text available
In viewport-adaptive streaming of omnidirectional video, only the field of view is streamed in high quality. While this has significant benefits over streaming the entire 360 sphere, no standard test method for perceived quality and simulator sickness is available to evaluate the quality of experience (QoE) of such streaming approaches. QoE testing is important as tile-based viewport-adaptive streaming technologies are replacing classical approaches because of significant bandwidth savings and increases in viewing quality. In this work, we propose a testbed, a test method, as well as test metrics for QoE tests of viewport-adaptive streaming approaches. The proposed method is validated in two different test setups, using a specific tile-based streaming technology available in the market. The chosen input variables (video sequences, resolution, bandwidth, and network round-trip delay) are tested for their statistical significance. We found that our test method is suitable for QoE testing of viewport-adaptive streaming technologies. We also found that simulator sickness scores increase with test duration, but that breaks between tests reduce this effect. With our systematic test approach, it is possible to compare metrics among different test setups. On the tested technology, we found that a typical network delay (47 ms) only has a minimal effect on the quality ratings. Furthermore, the magnitude of the network delay does not influence simulator sickness for the system we have tested.
... However, it was only the case for one of the three used content types. Similarly, a study by Beerends et al., using a 29cm monitor, found that the rating of video quality was slightly higher when accompanied by CD quality audio than when accompanied by no audio (Beerends & de Caluwe, 1999). The effect, however, was small and has not been replicated with small screens. ...
Chapter
Full-text available
This chapter provides an overview of the key factors that influence the quality of experience (QoE) of mobile TV services. It compiles the current knowledge from empirical studies and recommendations on four key requirements for the uptake of mobile TV services: (1) handset usability and its acceptance by the user, (2) the technical performance and reliability of the service, (3) the usability of the mobile TV service (depending on the delivery of content), and (4) the satisfaction with the content. It illustrates a number of factors that contribute to these requirements ranging from the context of use to the size of the display and the displayed content. The chapter highlights the interdependencies between these factors during the delivery of content in mobile TV services to a heterogeneous set of low resolution devices.
... In certain tests it may be necessary to perform the tests in an anechoic chamber, but for headphone-based tests, a quiet listening test room is deemed sufficient in most cases (Brüggen, 2001; Faller and Baumgarte, 2003; Par et al., 2005; Katz and Parseihian, 2012; Brinkmann et al., 2014b; Ahrens and Andersson, 2019). In evaluations where audio is accompanied by visuals, video quality has been found to have a notable effect on the perceived quality of audio (Beerends and De Caluwe, 1999). Therefore, even in audio-only tests, the accompanying visual components are still considered. ...
Thesis
--- Download link: http://etheses.whiterose.ac.uk/26445/ --- Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio at any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis. This thesis presents novel HRTF pre-processing techniques, such that when using the augmented HRTFs in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. 
Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility. Results conclude that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that using multiple pre-processing techniques can produce cumulative, and statistically significant, improvements.
... As for the audio, because we do not evaluate the audio quality or volume of the streams, we maintain the same audio source (from one camera) for all the simulations, so we will not alter the overall perception of the videos [13], [58]. In practice, this can be achieved by "late-binding" the selected audio source and exposing only that audio source on the HAS manifest. ...
Thesis
Full-text available
Video recording devices are often equipped with sensors (smartphones for example, with GPS receiver, gyroscope etc.), or used in settings where sensors are present (e.g. monitor cameras, in areas with temperature and/or humidity sensors). As a result, many systems process and distribute video together with timed metadata streams, often sourced as User-Generated Content. Video delivery has been thoroughly studied, however timed metadata streams have varying characteristics and forms, thus a consistent and effective way to handle them in conjunction with the video streams does not exist. In this Thesis we study ways to enhance video applications through timed metadata. We define as timed metadata all the non-audiovisual data recorded or produced, that are relevant to a specific time on the media timeline. ”Enhancing” video applications has a double meaning, and this work consists of two respective parts. First, using the timed metadata to extend the capabilities of multimedia applications, by introducing novel functionalities. Second, using the timed metadata to improve the content delivery for such applications. To extend multimedia applications, we have taken an exploratory approach, and we demonstrate two use cases with application examples. In the first case, timed metadata is used as input for generating content, and in the second, it is used to extend the navigational capabilities for the underlying multimedia content. By designing and implementing two different application scenarios we were able to identify the potential and limitations of video systems with timed metadata. We use the findings from the first part, to work from the perspective of enhancing video applications, by using the timed metadata to improve delivery of the content. More specifically, we study the use of timed metadata for multi-variable adaptation in multi-view video delivery - and we test our proposals on one of the platforms developed previously. 
Our final contribution is a buffering scheme for synchronous and lowlatency playback in live streaming systems.
... We have chosen to mute the sound in all the videos, for both the reference and effect versions, so as to avoid any interfering factor in the comparison. Indeed, Beerends and De Caluwe showed in [5] that the video quality heavily impacts the subjectively perceived audio quality of an audiovisual stimulus. Conversely, the audio quality impacts the subjectively perceived video quality to a lesser extent, but it impacts it nonetheless. ...
Article
Virtual reality videos are an important element in the range of immersive contents as they open new perspectives for story-telling, journalism or education. Accessing these immersive contents through Internet streaming is however much more difficult, owing to required data rates much higher than for regular videos. While current streaming strategies rely on video compression, in this paper we investigate a radically new stance: we posit that degrading the visual quality is not the only choice to reduce the required data rate, and not necessarily the best. Instead, we propose two new impairments, Virtual Walls (VWs) and Slow Downs (SDs), that change the way the user can interact with the 360° video in an environment with insufficient available bandwidth. User experiments with a double-stimulus approach show that, when triggered in proper time periods, these impairments are better perceived than visual quality degradation from video compression. We confirm with network simulations the usefulness of these new types of impairments: incorporated into a FoV-based adaptation, they can enable reduction in stalls and startup delay, and increase quality in FoV, even in the presence of substantial playback buffers.
... For realistic experiences of feedback, e.g. from interactive virtual objects, congruence logically depends on the perceived realism of objects themselves, through all sensory modalities. In a study on the perceptual relationship between audio, video and audiovisual quality, results showed that high image quality positively affects the perception of (accompanying) audio quality, and vice versa [25,26]. As the quality of the visual display is steadily increasing, it can be cogently derived that rendering techniques for interfaces targeting other sensory modalities must be paid equal attention and enhancement, to maintain sensory consistency. ...
Conference Paper
Full-text available
This paper describes a novel framework for real-time sonification of surface textures in virtual reality (VR), aimed at realistically representing the experience of driving over a virtual surface. A combination of techniques for capturing real-world surfaces is used to map 3D geometry, texture maps, and auditory (aural and vibrotactile) feedback attributes. For the sonification rendering, we propose using information from primarily graphical texture features to define target units in concatenative sound synthesis. To foster models that go beyond the current generation of simple sound textures (e.g., wind, rain, fire), toward highly "synchronized" and expressive scenarios, our contribution draws a framework for higher-level modeling of a bicycle's kinematic rolling on ground contact, with enhanced perceptual symbiosis between auditory, visual and vibrotactile stimuli. We scanned two surfaces represented as texture maps, consisting of different features, morphology and matching navigation. We define target trajectories in a 2-dimensional audio feature space, according to a temporal model and morphological attributes of the surfaces. This synthesis method serves two purposes: real-time auditory feedback, and vibrotactile feedback induced by playing back the concatenated sound samples through a vibrotactile inducer speaker.
... Other research [6] shows biasing in assessments of AV media quality from both audio and visual interactions. The research indicates that, in their study, quality of visual presentation had more impact on assessments of audio quality than quality of audio presentation had on assessments of visual quality. ...
Article
Full-text available
Age demographics have led to an increase in the proportion of the population suffering from some form of hearing loss. The introduction of object-based audio to television broadcast has the potential to improve the viewing experience for millions of hearing impaired people. Personalization of object-based audio can assist in overcoming difficulties in understanding speech and understanding the narrative of broadcast media. The research presented here documents a Multi-Dimensional Audio (MDA) implementation of object-based clean audio to present independent object streams based on object category elicitation. Evaluations were carried out with hearing impaired people and participants were able to personalize audio levels independently for four object-categories using an on-screen menu: speech, music, background effects, and foreground effects related to on-screen events. Results show considerable preference variation across subjects but indicate that expanding object-category personalization beyond a binary speech/non-speech categorization can substantially improve the viewing experience for some hearing impaired people.
... Regarding the input media, the Immersive Methodology recommends using audio-visual sequences (audio + video components) as the stimuli source. This recommendation is made on the grounds that using video-only (or audio-only) stimuli does not reflect the way users consume a multimedia service, since transmission of video without sound is not likely to occur [9]. Using audio-visual sequences as test material requires that participants rate the overall audio-visual quality, independent of the component under study. ...
Chapter
This section presents a brief description of the perception of audio and video. In general, the perception of a stimulus can be divided roughly into four major steps. This holds not only for audio and video stimuli.
Article
This study investigated the overall subjective impression of audiovisual material. Audio stimuli of varied degradation were coupled with actual loudspeakers of different visual appearance, as well as 1:1 and 1:10 scale photographs of the same set of loudspeakers. Additional unimodal experiments produced a baseline against which the audiovisual evaluation was compared. Results indicate that the influence of the auditory modality dominates the audiovisual evaluation and suggest that in audiovisual subjective evaluations a photograph presentation can be a valid substitute of the actual product. © 2015 Journal of the Audio Engineering Society.
Article
This paper presents a method for improving users' quality of experience through processing of movie soundtracks. Dialogue clarity enhancement algorithms were introduced to detect dialogue in movie soundtrack mixes and then amplify the dialogue components. The front channel signals (left, right, center) are analyzed in the frequency domain. The selected partials in the center channel signal, which yield high disparity between the left and right channels, are detected as dialogue. Subsequently, the dialogue frequency components are boosted to achieve increased dialogue intelligibility. Techniques for reducing unpleasant artifacts in the processed signal, based on smoothing in both the time and frequency domains, are also introduced. The results of objective and subjective tests are provided, which prove that increased dialogue intelligibility is achieved with the aid of the proposed algorithm. The algorithm is particularly applicable in mobile devices when listening in changing conditions and in the presence of noise.
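As a rough, hedged illustration of the frequency-domain boosting idea described above (this is not the paper's algorithm: the detection criterion here is simple center-channel dominance rather than the left/right disparity analysis the authors describe, and all parameter values are invented):

```python
# Crude dialogue-boost sketch: boost center-channel frequency bins where
# the center magnitude dominates the left/right average. A simplified
# stand-in for the detection criterion described in the abstract.
import numpy as np

def boost_center_frame(left, right, center, gain=2.0, dominance=1.5):
    # FFT of one frame per front channel
    L, R, C = (np.fft.rfft(np.asarray(x, dtype=float)) for x in (left, right, center))
    side = 0.5 * (np.abs(L) + np.abs(R))
    # candidate "dialogue" bins: center clearly dominates left/right
    mask = np.abs(C) > dominance * (side + 1e-12)
    # boost only those bins, then return to the time domain
    return np.fft.irfft(np.where(mask, gain * C, C), n=len(center))
```

A real implementation would process overlapping windowed frames (overlap-add) and apply the time- and frequency-domain smoothing the abstract describes to avoid audible artifacts at frame boundaries.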
Article
The paper presents a concept of multi-modal quality evaluation for measuring high-definition multimedia degraded by network or other impairments. Sophisticated audio and video metric algorithms enhanced with a structure detector are implemented. The detector reveals the Regions of Interest known to have an important impact on the overall quality score in case of degradation. This evaluator is a useful tool for investigating the interaction between the different modal streams perceived by the end-user.
Article
Extant research on online user-generated content, especially product reviews, has consistently examined the effects of textual reviews on consumers and has ignored an emerging and popular review format such as product review video (PRV) on YouTube. Specifically, no research exists that suggests how to develop an effective PRV. The present article addresses this gap by examining the effects of three important attributes of PRVs – review depth, review frame, and review disposition – on consumers' attitude toward the PRV and their propensity to share it. Technical quality of the video is included as a moderator. A between-subjects experiment was conducted with a sample of Internet users. The findings suggest that PRVs are most effective when the review depth is moderate, the review is comparative in nature, and it highlights product benefits instead of attributes. Technical quality positively moderates these aforementioned relationships.
Chapter
This chapter reviews the existing objective QoE methodologies and provides a taxonomy of objective quality metrics that may be grouped using the characteristics of the human visual system and the availability of the original signal.
Chapter
Audiovisual communication has expanded rapidly over the last years on computers and mobile devices. This chapter discusses the key aspects of Quality of Experience of the audiovisual communication. We will give an introduction to audiovisual communication and explain technical elements and perceptual features which relate to the Quality of Experience. Main subjective and instrumental quality assessment methods will be presented. Finally, we specifically discuss a few key aspects impacting quality, namely, time-varying quality perception, audiovisual quality integration as well as the impact of overall delay and audiovisual synchrony, and give an outlook for future work.
Chapter
There has been increasing interest in visual quality assessment (VQA) during recent years. Among these VQA methods, machine learning (ML) based ones have become more and more popular. In this book, ML-based VQA and related issues have been extensively investigated. Chapters 1–2 present the fundamental knowledge of VQA and ML. In Chap. 3, ML was exploited for image feature selection and image feature learning. Chapter 4 presents two ML-based frameworks for pooling the image features of an image into a numerical score. In Chap. 5, two metric fusion frameworks, designed to combine multiple existing metrics into a better one, were developed with the aid of ML tools.
Article
Media consumption in broadcasting is heading towards high degrees of content personalization also in audio thanks to next-generation audio systems. It is thus crucial to assess the benefit of personalized media delivery. To this end, the adjustment/satisfaction test was recently proposed. This is a perceptual test where subjects interact with a user-adjustable system and their adjustments and the resulting satisfaction levels are studied. Two configurations of this test paradigm are implemented and compared for the evaluation of Dialogue Enhancement (DE). This is an advanced broadcast service which enables the personalization of the relative level of the dialog and the background sounds. The test configuration closer to the final application is found to provide less noisy data and to be more conclusive about the quality of experience. For this configuration, DE is tested both in the case in which the original audio objects are readily available and in the case in which they are estimated by blind source separation. The results show that personalization is extensively used, resulting in increased user satisfaction, in both cases.
Chapter
Most existing visual quality assessment databases are created in controlled conditions where the experimental environments are always kept silent. However, practical viewing environments often contain diverse environmental sounds. It is our daily experience that different sounds (e.g. chatter, honking and music) can affect our emotions, hence influencing our perception of images. So, there is a gap between visual quality under environmental sounds and existing research on visual quality. Therefore, in this paper, we perform subjective quality evaluations with different types and volumes of environmental sounds. We build a rigorous experimental system to control various conditions of environmental sounds and construct an environmental sound–image database. Afterwards, the influence of environmental sounds on perceived visual quality is analysed from four perspectives: sound categories, sound volumes, distortion levels of images, and image contents.
Thesis
Full-text available
Subjective quality assessment of multi-modal services depends on a number of external factors that affect the final judgment, e.g. user expectations, user fatigue, room environment or methodology used in the evaluation process. In order to obtain as accurate as possible measurement of the perceived quality, an experimenter should carefully consider all factors contributing to the overall experience. Particularly important is to choose the right measurement method for the purpose of a specific task. In spite of the fact that a number of standardized test procedures for the quality assessment exist it is not always possible to find the one which suits a certain research purpose. In such case, the development of new assessment techniques is usually necessary, which then needs to be followed by appropriate testing and validation procedures. The lack of an appropriate methodology for instantaneous measurement of user’s audio-visual quality expectations/preferences over extended periods of time, as well as the scarce attention devoted to the topic of temporal development of Quality of Experience, were the main driving factors for this work. This dissertation, composed of five papers, helps to understand the underlying attributes of perceived quality and user cognitive processes used in evaluation of long duration audiovisual content. The work described here is twofold: firstly, a novel methodology for continuous quality evaluation is proposed, and secondly, using the method, the effect of the time dimension on user’s behavioral reaction to the experienced quality is investigated. The momentary-based approach described in this work reflects instantaneous measures of users’ quality judgements. Such measures allow capturing time varying changes of system characteristics and help to contribute to the holistic vision of quality of experience. 
The results obtained from several experiments described in this work reveal the importance of content duration in the process of quality assessment and its impact on user’s quality requirements. The knowledge gained from those studies can be directly applied to the quality assurance models of multimedia content providers and may serve as a valuable source of information for objective quality metrics development.
Chapter
The proliferation of video cameras, such as those embedded in smartphones and wearable devices, has made it increasingly easy for users to film interesting events (such as public performance, family events, and vacation highlights) in their daily lives. Moreover, often there are multiple cameras capturing the same event at the same time, from different views. Concatenating segments of the videos produced by these cameras together along the event time forms a video mashup, which could depict the event in a less monotonous and more informative manner. It is, however, inefficient and costly to manually create a video mashup. This chapter aims to introduce the problem of automated video mashup to the readers, survey the state-of-the-art research work in this area, and outline the set of open challenges that remain to be solved. It provides a comprehensive introduction to practitioners, researchers, and graduate students who are interested in the research and challenges of automated video mashup.
Conference Paper
Full-text available
This paper presents the results from three lab-based studies that investigated different ways of delivering Mobile TV News by measuring user responses to different encoding bitrates, image resolutions and text quality. All studies were carried out with participants watching News content on mobile devices, with a total of 216 participants rating the acceptability of the viewing experience. Study 1 compared the acceptability of a 15-second video clip at different video and audio encoding bit rates on a 3G phone at a resolution of 176x144 and an iPAQ PDA (240x180). Study 2 measured the acceptability of the video quality of full feature news clips of 2.5 minutes which were recorded from broadcast TV, encoded at resolutions ranging from 120x90 to 240x180, and combined with different encoding bit rates and audio qualities presented on an iPAQ. Study 3 improved the legibility of the text included in the video by simulating separate text delivery. The acceptability of the News video quality was greatly reduced at a resolution of 120x90. The legibility of text was a decisive factor in the participants' assessment of the video quality. Resolutions of 168x126 and higher were substantially more acceptable when they were accompanied by optimized high quality text compared to proportionally scaled inline text. When accompanied by high quality text, TV news clips were acceptable to the vast majority of participants at resolutions as small as 168x126 for video encoding bitrates of 160 kbps and higher. Service designers and operators can apply this knowledge to design a cost-effective mobile TV experience.
Chapter
Artistic renditions are mediated by the performance rooms in which they are staged. The perceived egocentric distance to the artists and the perceived room size are relevant features in this regard. The influences of both the presence and the properties of acoustic and visual environments on these features were investigated. Recordings of music and a speech performance were integrated into direct renderings of six rooms by applying dynamic binaural synthesis and chroma-key compositing. By the use of a linearized extraaural headset and a semi-panoramic stereoscopic projection, the auralized, visualized, and auralized-visualized spatial scenes were presented to test participants who were asked to estimate the egocentric distance and the room size. The mean estimates differed between the acoustic and the visual as well as between the acoustic-visual and the combined single-domain conditions. Geometric estimations in performance rooms relied nine-tenths on the visual and one-tenth on the acoustic properties of the virtualized spatial scenes, but negligibly on their interaction. Structural and material properties of rooms may also influence auditory-visual distance perception.
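Read literally, the reported nine-tenths/one-tenth split with negligible interaction suggests a simple linear combination; the following is an illustrative reading, not the authors' fitted model:

```latex
\hat{d}_{AV} \;\approx\; 0.9\,\hat{d}_{V} \;+\; 0.1\,\hat{d}_{A}
```

where \(\hat{d}_{V}\) and \(\hat{d}_{A}\) denote the distance (or room-size) estimates obtained under the visual-only and acoustic-only conditions, respectively.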
Chapter
Similarly to the study that was presented in the previous chapter, user perception of system dynamics that might occur during a video call in heterogeneous wireless networks are addressed in this chapter. The presented work is based on several user tests that were designed to analyse the Quality of Experience in wireless networks, such as WiFi and HSDPA, where either robust or high fidelity video codecs are available and whose encoding bit rate can be adjusted. In particular, it is analysed how the effect of switching between those networks, codecs, and bit rates is perceived during an ongoing video call and whether those techniques can be used to adapt the transmission quality. Similarly to the study that was presented in the previous chapter, the results are used to draw guidelines for a perception-based mobility management, and to analyse how well the existing quality predictions models can estimate the perceived quality in the context of time-varying transmission that is expected in heterogeneous wireless networks.
Article
Several recent studies have suggested that ordinary stereophonic systems are not sufficient for HDTV use. These investigations have noted the localization error between picture and sound as a reason, but have not studied the extent to which this phenomenon causes viewer annoyance. The psychological experiment described was designed to investigate the acceptable extent of angular displacement between visual and auditory images for on-axis viewing. The results show that it is about 11° for acoustic engineers and 20° for members of the general audience. In addition, the number of frontal channels in HDTV is discussed.
Article
Subjects were presented with a film and its soundtrack through apparatus which enabled asynchrony between picture and sound to be increased. It was found that asynchrony is more easily detected when sound precedes picture, and for a hammer hitting a peg than for someone speaking. These preliminary results suggest that we learn to tolerate the asynchrony between hearing and vision produced by the slower transmission of sound than of light.
Article
Previous research has shown that perceivers naturally integrate auditory and visual information in face-to-face speech perception. Two experiments were carried out to study whether integration would be disrupted by differences in the stimulus onset asynchrony (SOA), the temporal arrival of the two sources of information. Synthetic visible and natural and synthetic auditory syllables of /ba/, /va/, /ða/, and /da/ were used in an expanded factorial design to present all possible combinations of the auditory and visual syllables, as well as the unimodal syllables. The fuzzy logical model of perception (FLMP), which accurately describes integration, was used to measure the degree to which integration of audible and visible speech occurred. These findings provide information about the temporal window of integration and its apparent dependence on the range of speech events in the test.
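For context, the FLMP's integration step takes a multiplicative form. A standard two-alternative statement (the symbols \(a\) and \(v\) for the auditory and visual degrees of support are mine) is:

```latex
P(\text{/da/} \mid A, V) \;=\; \frac{a\,v}{a\,v + (1-a)(1-v)}
```

i.e. the fuzzy truth values from each modality are multiplied and normalized over the response alternatives, which is what allows the model to quantify how strongly the two sources were integrated.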
The Detection of Auditory Visual Desynchrony
  • N F Dixon
  • L Spitz
N. F. Dixon and L. Spitz, "The Detection of Auditory Visual Desynchrony," Perception, vol. 9, pp. 719–721 (1980).
… Standards Institute, New York (1996 May).
"… the Subjective Effects of Timing Errors between Sound and Vision Signals in Television" (1995 Nov.).
Perception of Asynchronous and Conflicting Visual and Auditory Speech
  • Smeele
Smeele, "Perception of Asynchronous and Conflicting Visual and Auditory Speech," J. Acoust. Soc. Am., vol. 100, pp. 1777–1786 (1996 Sept.).
The Influence of Audio on Perceived Picture Quality and Subjective Audio-Video Delay Tolerance
  • S Rihs
S. Rihs, "The Influence of Audio on Perceived Picture Quality and Subjective Audio-Video Delay Tolerance," in "RACE MOSAIC deliverable no: R2111 80CESR007.B1" (1995 June), chap. 13.
Television Sound and Viewer Perceptions
  • W R Neuman
  • A N Crigler
  • V M Bove
W. R. Neuman, A. N. Crigler, and V. M. Bove, "Television Sound and Viewer Perceptions," in Proc. AES 9th Int. Conf. (Detroit, MI, 1991 Feb.), pp. 101104.
Objective Performance Assessment: Video Quality as an Influence on Audio Perception
  • M P Hollier
  • R Voelcker
M. P. Hollier and R. Voelcker, "Objective Performance Assessment: Video Quality as an Influence on Audio Perception," presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p. 1022 (1997 Nov.), preprint 4590.
Listening Tests on Loudspeakers
IEC 268-13, "Sound System Equipment, Part 13: Listening Tests on Loudspeakers," International Electrotechnical Commission, Geneva, Switzerland (1985).
Methodology for the Subjective Assessment of the Quality of Television Pictures
  • Itu-R Rec
  • Bt
ITU-R Rec. BT.500-6, "Methodology for the Subjective Assessment of the Quality of Television Pictures," International Telecommunications Union, Geneva, Switzerland (1994).
Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios
  • Itu-R Rec
  • Bt
ITU-R Rec. BT.601-5, "Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios," International Telecommunications Union, Geneva, Switzerland (1995).
Principles of a Reference Impairment System for Video
  • . P Itu-T Rec
ITU-T Rec. P.930, "Principles of a Reference Impairment System for Video," International Telecommunications Union, Geneva, Switzerland (1996).
Telephone Transmission Quality Subjective Opinion Tests
  • P Itu-T Rec
ITU-T Rec. P.800, "Telephone Transmission Quality Subjective Opinion Tests," International Telecommunications Union, Geneva, Switzerland (1996 Aug.).
Subjective Video Quality Assessment Methods for Multimedia Applications
  • . P Itu-T Rec
ITU-T Rec. P.910, "Subjective Video Quality Assessment Methods for Multimedia Applications," International Telecommunications Union, Geneva, Switzerland (1996 Aug.).
Pulse Code Modulation (PCM) of Voice Frequencies
  • . G Itu-T Rec
ITU-T Rec. G.711, "Pulse Code Modulation (PCM) of Voice Frequencies," International Telecommunications Union, Geneva, Switzerland (1988).
7 kHz Audio-Coding within
  • . G Itu-T Rec
ITU-T Rec. G.722, "7 kHz Audio-Coding within