Conference Paper

Eye gaze correction for videoconferencing

Authors: Jason Jerald and Mike Daily

Abstract

This paper describes a 2D videoconferencing system with eye gaze correction. Tracking the eyes and warping them appropriately in each frame creates the appearance of natural eye contact between users. The warp is determined by the geometry of the eyes and by the displacement between the camera and the remote user's image. The system is implemented entirely in software and requires no specialized hardware.
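For illustration, a minimal sketch of this kind of per-frame eye warp is given below. It assumes the eye centre and radius come from an external eye tracker and that the shift is derived from the known camera-to-screen displacement; the falloff shape, parameter names, and the OpenCV remap approach are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np
import cv2

def warp_eye(frame, eye_center, eye_radius, shift, falloff=2.0):
    """Shift pixels inside one eye region toward the camera direction.

    frame      : BGR image (H x W x 3)
    eye_center : (x, y) pixel coordinates of the eye centre (assumed to come
                 from an eye tracker)
    eye_radius : approximate eye radius in pixels
    shift      : (dx, dy) maximum displacement in pixels, derived from the
                 camera-to-image offset (an assumed calibration value)
    falloff    : exponent controlling the non-linear decay of the warp
    """
    h, w = frame.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Normalised distance from the eye centre (0 at the centre, 1 at the boundary).
    d = np.sqrt((xs - eye_center[0]) ** 2 + (ys - eye_center[1]) ** 2) / eye_radius
    # Weight is 1 at the centre and decays to 0 at the eye boundary.
    weight = np.clip(1.0 - d, 0.0, 1.0) ** falloff
    # Sample "backwards": output pixel (x, y) is taken from (x - w*dx, y - w*dy).
    map_x = (xs - weight * shift[0]).astype(np.float32)
    map_y = (ys - weight * shift[1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Hypothetical usage: shift the left eye 6 px upward to compensate for a
# camera mounted above the screen.
# corrected = warp_eye(frame, eye_center=(310, 240), eye_radius=22, shift=(0, -6))
```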


... Andersson presents a simplified variant of the previous work that reorients the eyes by shifting only the iris and eyelids in the 2D camera image (Andersson, 1997). Jerald & Daily (2002) also shift the iris but apply a non-linear warp. However, their work requires prior calibration to determine the maximum iris offset. ...
... While such a system makes spontaneous videoconferencing impossible, it also has problems with rapid head movements, which lead to an artificial experience due to poor visual results. Synthesizing videoconferencing images is an active area of research (Ott, Lewis, & Cox, 1993) that ranges from systems with one or multiple cameras that artificially replace the eye gaze in the video stream (Jerald & Daily, 2002; Gemmel et al., 2000; Tsai, Kao, Hung, & Shih, 2004; Schreer et al., 2008; Vertegaal, Weevers, Sohn, & Cheung, 2003) to systems that combine a half-silvered mirror with multiple cameras for multi-party videoconferencing (Vertegaal et al., 2003). ...
Article
Full-text available
Videoconferencing allows geographically dispersed parties to communicate by simultaneous audio and video transmissions. It is used in a variety of application scenarios with a wide range of coordination needs and efforts, such as private chat, discussion meetings, and negotiation tasks. In particular, in scenarios requiring certain levels of trust and judgement, non-verbal communication cues are highly important for effective communication. Mutual gaze support plays a central role in those high coordination need scenarios but generally lacks adequate technical support from videoconferencing systems. In this paper, we review technical concepts and implementations for mutual gaze support in videoconferencing, classify them, evaluate them according to a defined set of criteria, and give recommendations for future developments. Our review gives decision makers, researchers, and developers a tool to systematically apply and further develop videoconferencing systems in "serious" settings requiring mutual gaze. This should lead to well-informed decisions regarding the use and development of this technology and to a more widespread exploitation of the benefits of videoconferencing in general. For example, if videoconferencing systems supported high-quality mutual gaze in an easy-to-set-up and easy-to-use way, we could hold more effective and efficient recruitment interviews, court hearings, or contract negotiations.
... A very recent development is that of systems that morph video images taken from different angles in order to produce an accurate representation of gaze direction [14,32,18]. Although this is a promising approach, the main problem with these systems has been the computational complexity associated with realistic image morphing. ...
... Yang and Zhang [32] describe a system that finds and warps the eyes in the user's video image to correct visual parallax. However, the authors report that it is difficult to achieve the correct warp equation for all desired angles, given any initial angle of the user's eyes in a video image [18]. ...
Conference Paper
Full-text available
GAZE-2 is a novel group video conferencing system that uses eye-controlled camera direction to ensure parallax-free transmission of eye contact. To convey eye contact, GAZE-2 employs a video tunnel that allows placement of cameras behind participant images on the screen. To avoid parallax, GAZE-2 automatically directs the cameras in this video tunnel using an eye tracker, selecting a single camera closest to where the user is looking for broadcast. Images of users are displayed in a virtual meeting room, and rotated towards the participant each user looks at. This way, eye contact can be conveyed to any number of users with only a single video stream per user. We empirically evaluated whether eye contact perception is affected by automated camera direction, which causes angular shifts in the transmitted images. Findings suggest camera shifts do not affect eye contact perception, and are not considered highly distractive.
... The issue of whether video adds value to audio at all is a classic problem, and a recent study has found that for all business needs video of content is needed but not necessarily video of people [96]. When people are considered to be important, much attention has been paid to exotic solutions that preserve eye gaze, usually by warping the image of the eyes [49] but sometimes by changing the interface itself [107]. Beyond gaze, preserving spatiality has also been explored [42,102]. ...
Preprint
Effective meetings are effortful, but traditional videoconferencing systems offer little support for reducing this effort across the meeting lifecycle. Generative AI (GenAI) has the potential to radically redefine meetings by augmenting intentional meeting behaviors. CoExplorer, our novel adaptive meeting prototype, preemptively generates likely phases that meetings would undergo, tools that allow capturing attendees' thoughts before the meeting, and for each phase, window layouts, and appropriate applications and files. Using CoExplorer as a technology probe in a guided walkthrough, we studied its potential in a sample of participants from a global technology company. Our findings suggest that GenAI has the potential to help meetings stay on track and reduce workload, although concerns were raised about users' agency, trust, and possible disruption to traditional meeting norms. We discuss these concerns and their design implications for the development of GenAI meeting technology.
... In order to reduce the algorithmic complexity, pure monocular image-based approaches were proposed for approximate eye contact correction. In [JD02], a local image manipulation within the eye region was introduced. Based on eye tracking, the relevant image parts that need to be warped for eye contact correction are identified. ...
Thesis
The problem of missing eye contact diminishes the impression of a natural communication situation in videoconferencing. While a person looks at the screen, they are captured by cameras that are normally located directly next to it. With the advent of massively parallel computer hardware, and in particular of very powerful gaming graphics cards, it has become possible to process many input views for real-time 3D reconstruction. A larger number of input views mitigates occlusion problems and leads to more complete 3D data. This thesis proposes new algorithms that enable high-quality real-time 3D reconstruction, continuous adaptation of the photometric camera parameters, and user-independent estimation of the eye contact cameras. The real-time 3D analysis consists of two complementary approaches: on the one hand, an algorithm based on the processing of geometric shapes, and on the other, a patch-based technique that evaluates 3D hypotheses by comparing image textures. In preparation for image synthesis, it is necessary to align textures from different views. For this, a new algorithm for the continuous photometric adjustment of the camera parameters is proposed. The photometric adjustment is carried out iteratively, alternating with a 3D registration of the corresponding views. Thus the quality of the photometric parameters is directly linked to that of the 3D analysis results, and vice versa. A further important prerequisite for a correct synthesis of the eye contact view is the estimation of a suitable virtual eye contact camera. To this end, the eye contact camera is continuously adjusted to the users' eye positions. In this way, a virtual communication environment is created that enables more natural communication.
... However, these systems often correct gaze misalignment only over small angles. Similarly, the work of Jason Jerald and Mike Daily [13] involves real-time tracking and warping of the eyes on each frame using machine learning algorithms, giving a feeling of natural eye contact between the interacting participants. Such solutions are applicable only to bipartite videoconferencing. ...
Article
Full-text available
Existing live tele-teaching systems enable eye contact between interacting participants; however, they are often incomplete as they neglect finer levels of adherence to gaze such as gaze awareness and gaze following. A multi-location eLearning classroom setting often does not preserve relative neighborhood, i.e., displays showing videos of remote participants at each location might not be congruent with their actual seating positions. This leads to incoherent gaze patterns during interactions. We present a media-rich distributed classroom architecture with multiple cameras and displays in each classroom. During interaction changes, cameras capturing appropriate perspectives of participants are streamed to displays in other classrooms. Hence for all interactions, the physical participants of a classroom are presented with appropriate perspectives of remote participants resembling gaze patterns during conventional-classroom interactions. We also present a framework to systematically analyze gaze patterns and their dependencies. The framework dictates optimal placement of media devices, ensuring minimal deviation in capturing appropriate perspectives for a given set of resources. Evaluation results on a three-classroom test-bed indicate a marked reduction in viewer cognitive load in discerning the entity-at-focus in an eLearning classroom environment.
... Vetter [1998] maps a single image of a face on a generic 3D model of a human head. Similarly, Jerald and Daily [2002] warp the imagery of the eyes within a single captured video frame so that the eyes appear to be looking at the camera, even though the head is turned away. Gemmell et al. [2000] texture map the face onto a 3D head model and furthermore replace the eyes with newly synthesized ones with corrected gaze. ...
Thesis
Conventional video conferencing (e.g. Skype with a webcam) suffers from some fundamental flaws that keep it from attaining a true sense of immersivity and copresence and thereby emulating a real face-to-face conversation. Not least, it does not allow its users to look directly into each other's eyes. The webcam is usually set up next to the screen or at best integrated into the bezel. This forces the user to alternate his gaze between looking at the screen to observe his remote conferencing partner and looking into the webcam. It is this conflict between both viewing directions that stands in the way of experiencing true eye contact. This issue of missing eye contact is the central problem to solve in this dissertation.

An Image-Based Approach
We opt for an image-based approach to solving our problem, meaning that we synthesize an eye gaze corrected image from real-world (live) captured images. In the newly reconstructed image, the user's gaze will be corrected and thus the conflict between viewing directions will no longer be present. By using live imagery, we avoid the more artificial look and feel of many previous solutions that employ model-based reconstructions or avatar-based representations. Specifically, we investigate three main view synthesis algorithms to reach our goal. This results in contributions to environment mapping, disparity estimation from rectified stereo, and plane sweeping. By designing and implementing all algorithms for and on the GPU exclusively, we take advantage of its massive parallel processing capabilities and guarantee real-time performance and future-proof scalability. This strategy of exploiting the GPU for general (non-graphical) computations is known as general-purpose GPU (GPGPU) computing. Although developed here to correct eye gaze in video conferencing, our algorithms are more generally applicable to any type of scene and usage scenario.

Four Prototypes
We develop four different system prototypes, with each prototype relying on its specific (combination of) view synthesis algorithm(s) to reconstruct the eye gaze corrected image. Each view synthesis algorithm is enabled by a specific configuration of the capturing cameras, allowing us to arrange and present the prototypes according to increasing physical complexity of their camera setup.

Maintaining Camera Calibration
Maintaining the calibration of those cameras, however, may pose a challenge for a prototype that can be subject to a lot of dynamic user activity. Therefore, we first develop an efficient algorithm to detect camera movement and to subsequently reintegrate a single displaced camera into an a priori calibrated network of cameras. Assuming the intrinsic calibration of the displaced camera remains known (physical movement is reflected in the extrinsic parameters), we robustly recompute its extrinsic calibration as follows. First, we compute pairs of essential matrices between the displaced camera and its neighboring cameras using image point correspondences. This provides us with an estimate of a local coordinate frame for each camera pair, with each pair related to the real world coordinates up to a similarity transformation. From all these estimates, we deduce a (mean) rotation and (intersecting) translation in the common coordinate frame of the previously fully calibrated system. Unlike other approaches, we do not explicitly reconstruct any 3D scene structure, but rely solely on image-space correspondences.
We achieve a reprojection error of less than a pixel, comparable to state-of-the-art (de-)centralized network recalibration algorithms.

Prototype 1: Environment Remapping
Our first prototype is immediately our most outside-of-the-box solution. It requires only the bare minimum of capturing cameras, namely a single one, together with a single projector for display. Drawing inspiration from the field of environment mapping, we capture omnidirectional video (in other words, the environment) by filming a spherical mirror (the northern hemisphere) and combine this, after a remap of the captured image, with projection on an identically-shaped spherical screen (the southern hemisphere). Both hemispheres are combined into a single full sphere, forming a single communication device that allows to capture from the top and display at the bottom. The unconventional novelty lies in the observation that we do not perform image interpolation in the traditional sense, but rather compose an eye gaze corrected image by remapping the captured environment pixel-to-pixel. We develop the mathematical equations that govern this image transformation by mapping the captured input to the projected output, both interpreted as parallel rays of light under an affine camera model. The resulting equations are completely independent of the scene structure and do not require the recovery of the depth of scene. Consequently, they have to be precomputed only once, which allows for an extremely lightweight implementation that easily operates in real-time on any contemporary GPU and even CPU. Unfolding the environmental reflection captured on a (relatively small) specular sphere yields omnidirectional imagery with a projection center located at the center of that sphere. Consequently, the user looks directly into the camera when looking at the center of the sphere and eye contact is inherently guaranteed. Moreover, the prototype effortlessly supports multiple users simultaneously, unveils their full spatial context and offers them an unprecedented freedom of movement. Its main drawback, however, is the image quality. It is severely diminished by limitations of the mathematical model and off-the-shelf hardware components.

Edge-Sensitive Disparity Estimation with Iterative Refinement
Our second prototype, which we will present in a moment, relies heavily on our novel algorithm for accurate disparity estimation. We make three main contributions. First, we present a matching cost aggregation method that uses two edge-sensitive shape-adaptive support windows per pixel neighborhood. The windows are defined such that they cover image patches of similar color; one window follows horizontal edges in the image, the other vertical edges. Together they form the final aggregation window shape that closely follows all object edges and thereby achieves increased disparity hypothesis confidence. Second, we formalize an iterative process to further refine the estimated disparity map. It consists of four well-defined stages (cross-check, bitwise fast voting, invalid disparity handling, median filtering) and primarily relies on the same horizontal and vertical support windows. By assuming that color discontinuity boundaries in the image are also depth discontinuity boundaries in the scene, the refinement is able to efficiently detect and fill in occlusions. It only requires the input color images as prior knowledge, can be applied to any initially estimated disparity map and quickly converges to a final solution.
Third, next to improving the cost aggregation and disparity refinement, we introduce the idea of restricting the disparity search range itself. We observe that peaks in the disparity map's histogram indicate where objects are located in the scene, whereas noise with a high probability represents mismatches. We derive a two-pass hierarchical method, where, after analyzing the histogram at a reduced image resolution, all disparity hypotheses for which the histogram bin value does not reach a dynamically determined threshold (proportional to the image resolution or the histogram entropy) are excluded from the disparity search range at the full resolution. Constructing the low-resolution histogram is relatively cheap and in turn the potential to simultaneously increase the matching quality and decrease the processing complexity (of any local stereo matching algorithm) becomes very high. Implementation is done in CUDA, a modern GPU programming paradigm that exposes the hardware as a massive pool of directly operable parallel threads and that maps very well to scanline-rectified pixel-wise algorithms. On contemporary hardware, we reach real-time performance of about 12 FPS for the standard resolution (450 × 375) of the Middlebury dataset. Our algorithm is easy to understand and implement and generates smooth disparity maps with sharp object edges and little to no artifacts. It is very competitive with the current state-of-the-art of real-time local stereo matching algorithms.

Prototype 2: Stereo Interpolation
Our second prototype turns to rectified stereo interpolation. We mount two cameras around the screen, one to the left and one to the right, and let the user be seated in the horizontal middle. We then interpolate the intermediate (and thus eye gaze corrected) viewpoint by following (and extending) the depth-image-based rendering (DIBR) pipeline. This pipeline essentially consists of a disparity estimation and view synthesis stage. The view synthesis is straightforward and very lightweight, but relies heavily on accurate disparity estimation to correctly warp the input pixels to the intermediate viewpoint. On the one hand, the prototype is able to synthesize an eye gaze corrected image that contains very sharp and clearly discernible eyes. On the other hand, its reliance on stereo matching also gives rise to its biggest disadvantages. First, the user is restricted to move on the horizontal baseline between the left and right cameras, which causes eye contact to be difficult to maintain. Second, the small baseline preference of dense stereo matching forces us to either place the cameras around a smaller screen or assume a larger user-to-screen distance to avoid too large occlusions.

Prototype 3: Plane Sweeping
Our third prototype aims to overcome these shortcomings by mounting six cameras closely around the screen on a custom-made lightweight metal frame. The more general camera configuration avoids large occlusions, but, as such a configuration is no longer suitable for rectified stereo, we must turn to plane sweeping to interpolate the eye gaze corrected image. The flexible plane sweeping algorithm allows us to reconstruct any freely selectable viewpoint, without the need of image extrapolation. Combined with a concurrently running eye tracker to determine the user's viewpoint, this ensures that eye contact is maintained at all times and from any position and angle.
A number of carefully considered design and implementation choices ensures over real-time performance of about 40 FPS for the SVGA resolution (800 × 600) without noticeable loss of visual quality, even on low-end hardware. First, from our strategy for disparity range restriction, we devise a method to efficiently keep a uniform distribution of planes focused around a single dominant object-of-interest (e.g. the user's head and torso) as it moves through the scene. A Gaussian fit on the histogram of the depth map will indicate the depth (mean) and extent (standard deviation) of the object. We can use this to retroactively respond to movements of the object by dynamically shifting a condensed set of planes back and forth, instead of sweeping the entire space with a sparser distribution. This not only leverages the algorithmic performance, but also implicitly increases the accuracy of the plane sweep by significantly reducing the chance at mismatches. Second, we present an iterative spatial filter that removes photometric artifacts from the interpolated image. It does so by detecting and correcting geometric outliers in the jointly linked depth map that is assumed to be locally linear. Third, we use OpenGL and Cg to reprogram the GPU vertex and fragment processing stages of the traditional graphics rendering pipeline, which better suits the inherent structure and scattered memory access patterns of plane sweeping. We even further improve the end-to-end performance by developing granular optimization schemes that map well to the polygon-based processing of the traditional graphics pipeline. Finally, a fine-tuned set of user-independent parameters grants the system a general applicability. The result is a fully functional prototype for close-up one-to-one eye gaze corrected video conferencing that has a minimal amount of constraints, is intuitive to use and is very convincing as a proof-of-concept.

Prototype 4: Immersive Collaboration Environment
Our fourth and final prototype is realized after recognizing that current tools for computer-supported cooperative work (CSCW) suffer from two major deficiencies. First, they do not allow to observe the body language, facial expressions and spatial context of the (remote) collaborators. Second, they miss the ability to naturally and synchronously manipulate objects in a shared environment. We solve these issues by integrating our plane sweeping algorithm for eye gaze correction into an immersive environment that supports collaboration at a distance. In doing so, we identify and implement five fundamental technical requirements of the ultimate collaborative environment, namely dynamic image-based modeling, subsequent reconstruction and correction for rendering, a spatially immersive display, cooperative surface computing, and aural communication. We also propose our last adaptation of the plane sweeping algorithm to efficiently interpolate a complex scene that contains multiple dominant depths, e.g. when multiple users are present in the environment. This time, we interpret the cumulative histogram of the depth map as a probability density function that describes the likelihood that a plane should be positioned at a particular depth in the scene. The result is a non-uniform plane distribution that responds to a redistribution of any and all content in the scene.
Our final prototype truly brings together many key research areas that have been the focus of our institute as a whole over the past years: view interpolation for free viewpoint video, calibration of camera networks, tracking, omnidirectional cameras, multi-projector immersive displays, multi-touch interfaces, and audio processing.

Seven Evaluated Requirements
From practical experience with our prototypes, we learn that other factors besides eye contact contribute to attaining a true sense of immersivity and copresence in video conferencing. Seven constantly recurring requirements have been identified: eye contact (and the related gaze awareness), spatial context, freedom of movement, visual quality, algorithmic performance, physical complexity, and communication modes (one-to-one, many-to-many, multi-party). We discover that they are subject to many trade-offs and interdependencies as we use them to (informally) evaluate and compare all our prototypes. A concise sociability study not only points toward the importance of the seven requirements, but also validates our initial preference for image-based methods. However, to arrive at the ideal video conferencing solution, more insight should be gained into the concept of presence, what it means to experience a virtual telepresence and exactly what factors enable this experience. Nevertheless, we believe that the seven requirements provide a reference framework around the experience gained in this dissertation on which to design, develop and evaluate any future solution to eye gaze corrected video conferencing.
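The disparity-search-range restriction described in the thesis abstract above (analysing a low-resolution disparity histogram and discarding under-populated bins) can be illustrated roughly as follows. This is a simplified reading under assumed parameters (downscale factor, unit-width bins, threshold fraction) and uses OpenCV's block matcher as a stand-in for the thesis' own edge-sensitive matcher.

```python
import numpy as np
import cv2

def restrict_disparity_range(left, right, max_disp=64, scale=0.25, thresh_frac=0.002):
    """Estimate which disparities are worth searching at full resolution.

    A cheap block matcher runs on downscaled images; disparities whose
    histogram bins fall below a threshold (a fraction of the downscaled
    pixel count) are treated as noise/mismatches and excluded from the
    search range used by the full-resolution matcher.
    """
    small_l = cv2.resize(left, None, fx=scale, fy=scale)
    small_r = cv2.resize(right, None, fx=scale, fy=scale)

    # numDisparities must be a positive multiple of 16 for OpenCV's block matcher.
    num_disp = max(16, int(np.ceil(max_disp * scale / 16.0)) * 16)
    bm = cv2.StereoBM_create(numDisparities=num_disp, blockSize=9)
    disp = bm.compute(cv2.cvtColor(small_l, cv2.COLOR_BGR2GRAY),
                      cv2.cvtColor(small_r, cv2.COLOR_BGR2GRAY))
    disp = disp.astype(np.float32) / 16.0        # StereoBM returns fixed-point values
    valid = disp[disp > 0] / scale               # back to full-resolution disparity units

    hist, edges = np.histogram(valid, bins=max_disp, range=(0, max_disp))
    threshold = thresh_frac * small_l.shape[0] * small_l.shape[1]
    kept = edges[:-1][hist >= threshold]
    if kept.size == 0:                           # fall back to the full range
        return 0, max_disp
    return int(kept.min()), int(kept.max()) + 1

# Hypothetical usage:
# d_min, d_max = restrict_disparity_range(left_img, right_img)
```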
... Other systems (e.g., Triesch, Sullivan, Hayhoe, & Ballard, 2002) take advantage of the visual suppression during saccades to update graphical displays without the user noticing. Yet another rather novel use is tracking the point-of-regard during video-conferencing, and warping the image of the eyes so that they maintain eye contact with other participants in the meeting (Jerald & Daily, 2002). ...
Chapter
Full-text available
Eye-movement tracking is a method that is increasingly being employed to study usability issues in HCI contexts. The objectives of the present chapter are threefold. First, we introduce the reader to the basics of eye-movement technology, and also present key aspects of practical guidance to those who might be interested in using eye tracking in HCI research, whether in usability-evaluation studies, or for capturing people's eye movements as an input mechanism to drive system interaction. Second, we examine various ways in which eye movements can be systematically measured to examine interface usability. We illustrate the advantages of a range of different eye-movement metrics with reference to state-of-the-art usability research. Third, we discuss the various opportunities for eye-movement studies in future HCI research, and detail some of the challenges that need to be overcome to enable effective application of the technique in studying the complexities of advanced interactive-system use.
... Only if someone looks directly at the camera will the image of the person on screen seem to look at the viewers. Gemmel, Toyama, Zitnick, Kang & Seitz (2000), as well as Jerald & Daily (2002), manipulated the real-time video image by rendering a modified image of the eyes upon the original video image. The idea was that after the manipulation, the eyes seemed to look in the correct direction, creating an illusion of eye contact. ...
Article
Full-text available
Interactive applications that make use of eye tracking have traditionally been based on command-and-control. Applications that make more subtle use of eye gaze have recently become increasingly popular in the domain of attentive interfaces that adapt their behaviour based on the visual attention of the user. We provide a review of the main systems and application domains where this genre of interfaces has been used.
Article
Videoconferencing has become a ubiquitous medium for collaborative work. It suffers, however, from various drawbacks such as zoom fatigue. This paper addresses the quality of user experience by exploring an enhanced system concept with the capability of conveying gaze and attention. Gazing Heads is a round-table virtual meeting concept that uses only a single screen per participant. It enables direct eye contact, and signals gaze via controlled head rotation. The technology to realise this novel concept is not quite mature though, so we built a camera-based simulation for four simultaneous videoconference users. We conducted a user study comparing Gazing Heads with a conventional "Tiled View" video conferencing system, for 20 groups of 4 people, on each of two tasks. The study found that head rotation clearly conveys gaze and strongly enhances the perception of attention. Measurements of turn-taking behaviour did not differ decisively between the two systems (though there were significant differences between the two tasks). A novel insight in comparison to prior studies is that there was a significant increase in mutual eye contact with Gazing Heads, and that users clearly felt more engaged, encouraged to participate and more socially present. Overall, participants expressed a clear preference for Gazing Heads. These results suggest that fully implementing the Gazing Heads concept, using modern computer vision technology as it matures, could significantly enhance the experience of videoconferencing.
Article
In video conferences, one user can either look at the remote image of the other captured by a video camera, or look at the camera that is capturing her/him, but not both at the same time. The lack of eye contact caused by this misalignment substantially reduces the effectiveness of communication and leaves an unpleasant feeling of disconnectedness. We propose an approach to bring eye contact back while the user looks at the remote image of the other, using a novel system composed of a Time-of-Flight depth sensor and traditional stereo. The key to the success of this system is faithfully recovering the scene's depth. In this 2.5D space, controlling the user's eye gaze becomes relatively easy. To evaluate the performance of the system, we conducted two user studies: one focuses on subjects who have been trained to be familiar with eye gaze displayed in images; the other is a blind evaluation in which subjects have no prior knowledge about eye gaze. Both evaluations show that the system can bring desktop participants closer to each other.
Conference Paper
In traditional video conferencing systems, it is impossible for users to have eye contact when looking at the conversation partner’s face displayed on the screen, due to the disparity between the locations of the camera and the screen. In this work, we implemented a gaze correction system that can automatically maintain eye contact by replacing the eyes of the user with the direct looking eyes (looking directly into the camera) captured in the initialization stage. Our real-time system has good robustness against different lighting conditions and head poses, and it provides visually convincing and natural results while relying only on a single webcam that can be positioned almost anywhere around the screen.
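A crude sketch of the eye-replacement step described above follows, assuming the eye bounding boxes in the current frame are already known (e.g. from a facial landmark tracker); the rectangle-based regions and Poisson blending are assumptions for the sketch, not the paper's actual pipeline.

```python
import cv2
import numpy as np

def paste_direct_gaze_eye(frame, eye_patch, eye_box):
    """Blend a pre-captured direct-gaze eye patch over the current eye region.

    frame     : current BGR video frame
    eye_patch : BGR image of the same eye captured while the user looked
                straight into the camera (from an initialization stage)
    eye_box   : (x, y, w, h) bounding box of the eye in the current frame
    """
    x, y, w, h = eye_box
    patch = cv2.resize(eye_patch, (w, h))
    mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)
    center = (x + w // 2, y + h // 2)
    # Poisson blending hides the seam between the stored patch and the
    # current skin tone / lighting (a simplification, not the paper's method).
    return cv2.seamlessClone(patch, frame, mask, center, cv2.NORMAL_CLONE)

# Hypothetical usage for both eyes:
# frame = paste_direct_gaze_eye(frame, left_eye_patch, left_eye_box)
# frame = paste_direct_gaze_eye(frame, right_eye_patch, right_eye_box)
```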
Conference Paper
The purpose of this paper is to assess the feasibility of predicting customer churn using eye-tracking data. The eye movements of 175 respondents were tracked when they were looking at advertisements of three mobile operators. These data are combined with data that indicate whether or not a customer has churned in the one year period following the collection of the eye tracking data. For the analysis we used Random Forest and leave-one-out cross validation. In addition, at each fold we used variable selection for Random Forest. An AUC of 0.598 was obtained. On the eve of the commoditization of eye-tracking hardware this is an especially valuable insight. The findings denote that the upcoming integration of eye-tracking in cell phones can create a viable data source for predictive Customer Relationship Management. The contribution of this paper is that it is the first to use eye-tracking data in a predictive customer intelligence context.
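As a rough outline of the evaluation protocol summarised above (Random Forest, per-fold variable selection, leave-one-out cross-validation, AUC), a scikit-learn sketch follows; the feature matrix of eye-tracking measures and the importance-based selection step are stand-ins, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

def loo_auc(X, y, n_keep=10, random_state=0):
    """Leave-one-out evaluation with simple per-fold feature selection.

    X : numpy array of eye-tracking features, one row per respondent
    y : binary churn labels (1 = churned within one year)
    """
    scores = np.zeros(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        rf = RandomForestClassifier(n_estimators=500, random_state=random_state)
        rf.fit(X[train_idx], y[train_idx])
        # Keep the n_keep most important features (a stand-in for the
        # variable-selection step mentioned in the abstract) and refit.
        keep = np.argsort(rf.feature_importances_)[-n_keep:]
        rf.fit(X[train_idx][:, keep], y[train_idx])
        scores[test_idx] = rf.predict_proba(X[test_idx][:, keep])[:, 1]
    return roc_auc_score(y, scores)

# Hypothetical usage (the paper reports an AUC of about 0.6):
# auc = loo_auc(features, churned)
```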
Article
This article presents a method for automating rendering parameter selection to simplify tedious user interaction and improve the usability of visualization systems. Our approach acquires the important/interesting regions of a dataset through simple user interaction with an eye tracker. Based on this importance information, we automatically compute reasonable rendering parameters using a set of heuristic rules, which are adapted from visualization experience and psychophysical experiments. A user study has been conducted to evaluate these rendering parameters, and while the parameter selections for a specific visualization result are subjective, our approach provides good preliminary results for general users while allowing additional control adjustment. Furthermore, our system improves the interactivity of a visualization system by significantly reducing the required amount of parameter selections and providing good initial rendering parameters for newly acquired datasets of similar types.
Article
We present a set of algorithms and an associated display system capable of producing correctly rendered eye contact between a three-dimensionally transmitted remote participant and a group of observers in a 3D teleconferencing system. The participant's face is scanned in 3D at 30Hz and transmitted in real time to an autostereoscopic horizontal-parallax 3D display, displaying him or her over more than a 180° field of view observable to multiple observers. To render the geometry with correct perspective, we create a fast vertex shader based on a 6D lookup table for projecting 3D scene vertices to a range of subject angles, heights, and distances. We generalize the projection mathematics to arbitrarily shaped display surfaces, which allows us to employ a curved concave display surface to focus the high speed imagery to individual observers. To achieve two-way eye contact, we capture 2D video from a cross-polarized camera reflected to the position of the virtual participant's eyes, and display this 2D video feed on a large screen in front of the real participant, replicating the viewpoint of their virtual self. To achieve correct vertical perspective, we further leverage this image to track the position of each audience member's eyes, allowing the 3D display to render correct vertical perspective for each of the viewers around the device. The result is a one-to-many 3D teleconferencing system able to reproduce the effects of gaze, attention, and eye contact generally missing in traditional teleconferencing systems.
Article
Full-text available
Human interfaces for computer graphics systems are now evolving towards a multi-modal approach. Information gathered using visual, audio and motion capture systems is now becoming increasingly important within user-controlled virtual environments. This paper discusses real-time interaction with the virtual world through the visual analysis of human facial features from video. The underlying approach to recognize and analyze the facial movements of a real performance is described in detail. The output of the system is compatible with the MPEG-4 standard and therefore enhances the ability to use the available data in any other MPEG-4 compatible application. The MPEG-4 standard mainly focuses on networking capabilities and it therefore offers interesting possibilities for teleconferencing, as the requirements for the network bandwidth are quite low. The real-time facial analysis system enables the user to control the facial animation. This is used primarily with real-time facial animation systems,...
Article
Full-text available
We present a new facial animation approach that produces a wide variety of realistic facial expressions with advantages over existing geometry deformation and morphing methods for performance-driven animation systems. These systems sense and reconstruct a real person's facial expressions from an image sequence. The novelty of our approach is to sense and animate both textures and geometry using classification, volume morphing, and 3D-texture interpolations. Feature tracking drives our volume morphing animation system directly, without a need for intermediate animation parameters. Classification encodes realistic dynamic skin wrinkles, eye blinking, and eye motions. Classification results control 3D-texture animations that reconstruct the observed facial appearance. Classification leads to extremely low bandwidths for communicating visual information and independence of the rendered resolution and image quality from sensing resolution and pose. We demonstrate our approach with results obtained with a realtime volume morphing engine that animates the geometry of a facial model from sparse motion samples. A wavelet-based classifier and a 3D-texture engine reproduce dynamic textures on the animated geometry, including wrinkling effects, eye blinking, and eye movements.
Article
This paper reviews techniques for achieving eye contact between participants in a videoconference, where large video screens (more than 30 cm) have forced the video camera out of acceptable alignment with the participant's face. The techniques are of particular use with screens portraying life-size images, where camera misalignment becomes unacceptable. It also addresses the role of eye contact in interactions between two people, to demonstrate the value of preserving it.
Article
A system is described which allows for the synthesis of a video sequence of a realistic-appearing talking human head. A phonetic-based approach is used to describe facial motion; image processing rather than physical modeling techniques are used to create the video frames.
Article
This paper presents the major outcomes of a human factors experiment which investigated the benefits of a new multipoint desktop videoconferencing feature. The videoconference system enables interlocutors to establish individual eye contact, a nonverbal cue which has not been transmitted by standard videoconferencing systems until now. Eye contact is seen as an important component of interpersonal interaction, particularly with regard to the individual addressability of interlocutors. It was assumed that the transmission of this nonverbal signal produces, among other things, a smoother conversation flow and therefore enhances the feeling of telepresence, as well as affecting other relevant dependent variables in a positive direction (e.g. overall user satisfaction). Two different videoconferencing systems were compared against each other: one supported individual eye contact, the other did not. An audio conference system was included in the experimental design as a reference condition. The scenario for the experiment was a group discussion on a given topic. The results show that users significantly more often have the feeling of being addressed, i.e. of being looked at, and recognise whether or not they are being addressed. The expected advantages in terms of a higher degree of telepresence or a higher overall satisfaction did not occur.

Introduction: The experiment described in the following was part of the research project "Telepresence in Desktop Multimedia Conferencing", aiming to investigate means that can improve the feeling of telepresence, which former findings suggested is particularly important for enhancing the effectiveness and attractiveness of tele-cooperation and telework.
Article
In distributed collaborative virtual environments, participants are often embodied or represented in some form within a virtual world. The representations take many different forms and are often driven by limitations in the available technology. Desktop Web based environments typically use textual or two dimensional representations, while high end environments use motion trackers to embody a participant and their actions in an avatar or human form. This paper describes this wide range of virtual user representations and their creation and performance issues investigated as part of the Human-Computer Symbiotes project within DARPA's Intelligent Collaboration and Visualization (IC&V) program.
Article
This paper addresses the problem of robust 2D image motion estimation in natural environments. We develop an adaptive tracking-region selection and optical-flow estimation technique. The strategy of adaptive region selection locates reliable tracking regions and makes their motion estimation more reliable and computationally efficient. The multi-stage estimation procedure makes it possible to discriminate between good and poor estimation areas, which maximizes the quality of the final motion estimation. Furthermore, the model fitting stage further reduces the estimation error and provides a more compact and flexible motion field representation that is better suited for high-level vision processing. We demonstrate the performance of our techniques on both synthetic and natural image sequences. Keywords: motion estimation, optical flow, parametric model, augmented reality
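In the spirit of the region-selection idea described above, the sketch below picks well-textured points with a corner detector and tracks them with pyramidal Lucas-Kanade; it only illustrates the general approach and is not the paper's multi-stage estimator or parametric model fit.

```python
import cv2
import numpy as np

def track_reliable_points(prev_gray, next_gray, max_points=200):
    """Pick well-textured points and track them between two grayscale frames."""
    # Shi-Tomasi corners favour regions with strong local gradients,
    # i.e. regions where the motion estimate is well conditioned.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_points,
                                  qualityLevel=0.01, minDistance=8)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    return pts[good].reshape(-1, 2), nxt[good].reshape(-1, 2)

# A global parametric (e.g. affine) motion model could then be fitted robustly:
# model, inliers = cv2.estimateAffine2D(src_pts, dst_pts, method=cv2.RANSAC)
```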