Thesis

Unconstrained Gaze Estimation Using RGB-D Camera

Abstract

In this work, we tackled the problem of automatic gaze estimation in unconstrained user environments. By using an RGB-D sensor, we aimed to exploit multi-modal data to provide a sufficiently accurate and robust system. This work can be summarized along three principal axes: model, paradigm, and data.

In the first part of this thesis, we described the Random Forest algorithm in detail. Throughout our investigation, we formulated several tasks as learning problems and used decision forests as the model that captures the mapping functions. We gave a global overview of this tool and compared it to other machine learning techniques on different tasks. We finished this part by highlighting recent improvements of this algorithm in computer vision and medical image analysis. Through this survey, we reported empirical evidence of the potential of Random Forests for handling highly non-linear problems such as gaze estimation.

The second axis of this work concerns gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: a feature-based and a semi-appearance-based approach. Our feature-based system relies on a robust eye-pupil localization component that allows building a 3D eye model; combined with head pose estimation, 3D gaze information can be inferred. The second system is fundamentally based on building a frontal gaze manifold corrected with the head pose parameters: it learns gaze information from eye-image appearance under frontal configurations, then uses a head-pose-based geometric transformation to infer the final gaze information. The major limitation of such paradigms lies in their design, which assumes total independence between the eye-appearance and head-pose blocks. To overcome this limitation, we converged to a novel paradigm that unifies the two previous components and builds a global gaze manifold. To achieve this unification, we built the input data from both an RGB cue related to eye appearance and a depth cue related to head pose, and learned a robust mapping between this input space and the gaze-information space. We performed comprehensive comparisons between these systems in unconstrained environments and reported an in-depth analysis of the obtained results.

The final axis of this work concerns the data. Providing sufficient input data to learn mapping functions that generalize well is fundamental. We explored two global approaches across the experiments, using synthetic and real RGB-D gaze samples. For each type of data, we described the acquisition protocol and evaluated its ability to handle the task at hand. We finished by performing a synthetic/real learning comparison in terms of robustness and accuracy and inferred some empirical correlations.
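Since the full text is not available here, the following is only a rough, self-contained sketch of the kind of geometry the feature-based paradigm describes: back-projecting the localized pupil with the depth cue and rotating the optical axis of a 3D eye model by the estimated head pose. The function names, the pinhole back-projection, and the fixed eye-model points are illustrative assumptions, not the author's actual implementation.

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with depth z (meters) into 3D camera
    coordinates using a pinhole model of the RGB-D sensor's intrinsics."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def gaze_from_eye_model(eyeball_center_head, pupil_center_head, head_rotation):
    """Infer a 3D gaze direction from a geometric eye model.

    eyeball_center_head, pupil_center_head: 3D points of the eye model,
    expressed in the head coordinate frame (e.g. from pupil localization
    plus an anatomical eyeball-radius prior -- an assumption here).
    head_rotation: 3x3 rotation of the head pose (head frame -> camera frame).

    The optical axis is the ray from the eyeball center through the pupil;
    rotating it by the head pose yields the gaze vector in camera coordinates.
    """
    optical_axis = pupil_center_head - eyeball_center_head
    optical_axis /= np.linalg.norm(optical_axis)
    return head_rotation @ optical_axis

# Example: frontal head pose, pupil offset along -Z (~12 mm eyeball radius).
gaze_cam = gaze_from_eye_model(
    eyeball_center_head=np.array([0.0, 0.0, 0.0]),
    pupil_center_head=np.array([0.0, 0.0, -0.012]),
    head_rotation=np.eye(3),
)
```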

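The unified paradigm can be illustrated with a similarly minimal sketch: concatenating an appearance cue (a flattened eye patch) with a depth-derived head-pose cue into a single input space and fitting a multi-output Random Forest regressor on it, so that eye-appearance/head-pose interactions are modeled jointly rather than as two independent blocks. The data below is random stand-in data, and scikit-learn's RandomForestRegressor is assumed in place of the thesis's own forest implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-ins for real training data: flattened eye patches (the RGB
# appearance cue) and head-pose parameters estimated from depth.
n_samples = 1000
eye_patches = rng.random((n_samples, 24 * 36))        # appearance cue
head_poses = rng.uniform(-0.5, 0.5, (n_samples, 3))   # depth cue: yaw/pitch/roll (rad)
gaze_angles = rng.uniform(-0.4, 0.4, (n_samples, 2))  # targets: gaze yaw/pitch (rad)

# Unified input space built from both cues; one forest learns the
# mapping from this joint space to gaze information.
X = np.hstack([eye_patches, head_poses])
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, gaze_angles)

sample = np.hstack([eye_patches[0], head_poses[0]])
predicted_gaze = forest.predict(sample.reshape(1, -1))
```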

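The synthetic/real comparison of the data axis can likewise be sketched: train the same regressor once on synthetic samples and once on real samples, then score both on held-out real data with a mean angular error. The error metric, the yaw/pitch-to-vector convention, and the placeholder dataset names below are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mean_angular_error(y_true, y_pred):
    """Mean 3D angular error (degrees) between gaze (yaw, pitch) pairs."""
    def to_vec(yp):
        yaw, pitch = yp[:, 0], yp[:, 1]
        return np.stack([np.cos(pitch) * np.sin(yaw),
                         np.sin(pitch),
                         np.cos(pitch) * np.cos(yaw)], axis=1)
    a, b = to_vec(y_true), to_vec(y_pred)
    cos = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

def evaluate(train_X, train_y, test_X, test_y):
    """Fit a forest on one training source and score it on real test data."""
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(train_X, train_y)
    return mean_angular_error(test_y, forest.predict(test_X))

# X_syn/y_syn and X_real/y_real would come from the synthetic renderer and
# the recorded RGB-D corpus respectively (hypothetical placeholders here):
# err_syn_to_real = evaluate(X_syn, y_syn, X_real_test, y_real_test)
# err_real_to_real = evaluate(X_real_train, y_real_train, X_real_test, y_real_test)
```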