Learning to be a Depth Camera for Close-Range Human Capture and Interaction

ACM Transactions on Graphics (Impact Factor: 4.1). 07/2014; 33(4). DOI: 10.1145/2601097.2601223


We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form factor.
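The light fall-off baseline the abstract compares against can be sketched as follows. This is an illustrative inverse-square model only, with an assumed lumped constant `k`; the function name and constants are ours, not the authors', and the paper's learned forests exist precisely to avoid this constant-albedo assumption:

```python
import numpy as np

def falloff_depth(ir, k=1.0):
    """Invert the inverse-square law I = k / d**2 to get d = sqrt(k / I).
    `k` lumps illuminant power and surface albedo (an assumed constant
    here), which is why this baseline breaks on varying-albedo scenes."""
    return np.sqrt(k / np.clip(ir, 1e-9, None))

# Synthetic check: simulate NIR intensities at known close-range depths
depth = np.linspace(0.2, 1.0, 5)      # metres
intensity = 1.0 / depth**2            # ideal fall-off with k = 1
recovered = falloff_depth(intensity)
```

Under the ideal model the inversion is exact; in practice albedo and shading variation corrupt it, which is the gap the learned intensity-to-depth mapping closes.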

  • ABSTRACT: Full body interactions are becoming increasingly important for Human-Computer Interaction (HCI) and essential in thriving areas such as mobile applications, games, and Ambient Assisted Living (AAL) solutions. While this enriches the design space of interactive applications in ubiquitous and pervasive environments, it dramatically increases the complexity of programming and customising such systems for end-users and non-professional interaction developers. This work addresses the growing need for simple ways to define, customise, and handle user interactions through manageable means of demonstration and declaration. Our novel approach fosters the use of Labanotation (one of the most popular visual notations for movement description) and off-the-shelf motion capture technologies for interaction recording, generation, and analysis. This paper presents a novel reference implementation, called the Ambient Movement Analysis Engine, which allows for recording movement scores and subscribing to events in Labanotation format from live motion data streams.
    No preview · Conference Paper · Jun 2015
  • ABSTRACT: In this paper, we propose an articulated and generalized Gaussian kernel correlation (GKC)-based framework for human pose estimation. We first derive a unified GKC representation that generalizes previous sum of Gaussians (SoG)-based methods for the similarity measure between a template and an observation, both of which are represented by various SoG variants. Then we develop articulated GKC (AGKC) by integrating a kinematic skeleton in a multivariate SoG template that supports subject-specific shape modeling and articulated pose estimation for both the full body and hands. We further propose a sequential (body/hand) pose tracking algorithm by incorporating three regularization terms in the AGKC function: visibility, an intersection penalty, and pose continuity. Our tracking algorithm is simple yet effective and computationally efficient. We evaluate our algorithm on two benchmark depth datasets. The experimental results are promising and competitive when compared with state-of-the-art algorithms.
    No preview · Article · Dec 2015 · IEEE Transactions on Image Processing
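The pairwise Gaussian overlap at the heart of the GKC similarity described above has a closed form for isotropic Gaussians. A minimal sketch of that overlap sum follows; the function name, the isotropic assumption, and the toy two-blob model are our illustration, not the paper's implementation (which adds the kinematic skeleton and regularization terms):

```python
import numpy as np

def gkc(mu_a, sig_a, mu_b, sig_b):
    """Kernel correlation of two sums of isotropic 3D Gaussians:
    the closed-form overlap integral summed over all Gaussian pairs.
    mu_*: (N, 3) centres; sig_*: (N,) standard deviations."""
    total = 0.0
    for ma, sa in zip(mu_a, sig_a):
        for mb, sb in zip(mu_b, sig_b):
            s2 = sa**2 + sb**2                      # combined variance
            d2 = np.sum((ma - mb)**2)               # squared centre distance
            # integral of exp(-|x-ma|^2/(2sa^2)) * exp(-|x-mb|^2/(2sb^2))
            total += (2*np.pi*sa**2*sb**2/s2)**1.5 * np.exp(-d2/(2*s2))
    return total

# Toy template: two Gaussian "blobs" along the x-axis
mu = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
sig = np.array([0.5, 0.5])
self_corr = gkc(mu, sig, mu, sig)                           # aligned
shifted_corr = gkc(mu, sig, mu + np.array([0.5, 0, 0]), sig)  # translated
```

By Cauchy-Schwarz the correlation is maximal when the two models are aligned, which is what makes maximizing (A)GKC over pose parameters a sensible fitting objective.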