Danhang Tang
Microsoft · Interactive 3D Technologies Research Group (I3D)

Doctor of Philosophy

About

30 Publications · 6,522 Reads · 1,612 Citations
Additional affiliations
September 2011 - January 2016 · Imperial College London · PhD Student

Publications (30)
Preprint · Full-text available
We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). In contrast to most existing approaches based on the signed distance function (SDF) which handles spatial locations, our representation operates on oriented rays. Specifically, PRIF is formulated to directly produce the surface hit point of a given i...
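The contrast between the two formulations can be sketched on an analytic sphere: an SDF maps a spatial location to a signed distance, while a ray-based function maps an oriented ray directly to its surface hit point. The function names and the closed-form intersection below are illustrative assumptions for a toy shape, not the paper's learned network.

```python
import math

def sdf_sphere(p, radius=1.0):
    """SDF view: maps a spatial location to its signed distance from the surface."""
    return math.sqrt(sum(c * c for c in p)) - radius

def ray_hit_sphere(origin, direction, radius=1.0):
    """Ray-based view: maps an oriented ray directly to its surface hit point
    (or None). Assumes `direction` is unit-length."""
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # ray misses the sphere
    t = -b - math.sqrt(disc)
    if t < 0:
        return None  # surface lies behind the ray origin
    return tuple(o + t * d for o, d in zip(origin, direction))

# SDF answers "how far is this point from the surface?"
print(sdf_sphere((0.0, 0.0, -3.0)))                      # 2.0
# The ray-based function answers "where does this ray hit?" in one evaluation,
# with no sphere tracing over repeated SDF queries.
print(ray_hit_sphere((0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))  # (0.0, 0.0, -1.0)
```

Rendering from an SDF typically needs many queries per ray (sphere tracing); a ray-based representation amortizes that into a single forward evaluation.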
Preprint · Full-text available
Immersive maps such as Google Street View and Bing Streetside provide true-to-life views with a massive collection of panoramas. However, these panoramas are only available at sparse intervals along the path they are taken, resulting in visual discontinuities during navigation. Prior art in view synthesis is usually built upon a set of perspective...
Preprint
We propose VoLux-GAN, a generative framework to synthesize 3D-aware faces with convincing relighting. Our main contribution is a volumetric HDRI relighting method that can efficiently accumulate albedo, diffuse and specular lighting contributions along each 3D ray for any desired HDR environmental map. Additionally, we show the importance of superv...
Preprint · Full-text available
We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail and also achieve bett...
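The latent-grid hierarchy can be sketched as coarse codes refined by finer residual grids: decoding only the top level gives a global (completion-friendly) code, while deeper levels add local detail. The nearest-neighbor upsampling, additive residuals, and grid sizes below are assumptions purely for illustration; the actual decoder is learned.

```python
import numpy as np

def upsample2(grid):
    """Nearest-neighbor 2x upsampling of a cubic grid (an illustrative
    stand-in for the learned decoder's feature upsampling)."""
    return grid.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

def decode(levels, depth):
    """Combine the coarse latent grid with finer residual grids up to `depth`.
    Shallow depth -> global shape only; deeper levels -> finer detail."""
    out = levels[0]
    for residual in levels[1:depth + 1]:
        out = upsample2(out) + residual
    return out

rng = np.random.default_rng(0)
levels = [rng.standard_normal((4 * 2**i,) * 3) for i in range(3)]  # 4^3, 8^3, 16^3
coarse = decode(levels, depth=0)  # global level of detail
fine = decode(levels, depth=2)    # full hierarchy
print(coarse.shape, fine.shape)   # (4, 4, 4) (16, 16, 16)
```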
Preprint · Full-text available
In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., left vs. right hand. In contrast, we propose a de...
Preprint · Full-text available
We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly c...
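A TSDF grid stores signed distances clamped to a narrow band around the surface, which is what makes block-based compression effective: voxels far from the surface saturate to a constant. The helper name and truncation distance below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def tsdf_from_sdf(sdf_values, trunc=0.1):
    """Truncate signed distances to [-trunc, trunc] and normalize to [-1, 1].
    Far from the surface the grid saturates to +-1, so most blocks carry
    little information and compress well."""
    return np.clip(sdf_values, -trunc, trunc) / trunc

# Toy example: SDF of a sphere of radius 0.5 sampled on a 32^3 voxel grid
n = 32
axis = np.linspace(-1.0, 1.0, n)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5
tsdf = tsdf_from_sdf(sdf, trunc=0.1)

print(tsdf.min(), tsdf.max())  # -1.0 1.0 (saturated away from the surface)
```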
Conference Paper
Motivated by recent availability of augmented and virtual reality platforms, we tackle the challenging problem of immersive storytelling experiences on mobile devices. In particular, we show an end-to-end system to generate 3D assets that enable real-time rendering of an opera on high end mobile phones. We call our system AR-ia and in this paper we...
Article
We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high resolution textures, much less work has been done to recover photometric properties needed...
Preprint · Full-text available
In this work, we present a modified fuzzy decision forest for real-time 3D object pose estimation based on typical template representation. We employ an extra preemptive background rejector node in the decision forest framework to terminate the examination of background locations as early as possible, resulting in a significant improvement in effici...
Conference Paper · Full-text available
We introduce a realtime compression architecture for 4D performance capture that is two orders of magnitude faster than current state-of-the-art techniques, yet achieves comparable visual quality and bitrate. We note how much of the algorithmic complexity in traditional 4D compression arises from the necessity to encode geometry using an explicit m...
Conference Paper
The advent of consumer depth cameras has incited the development of a new cohort of algorithms tackling challenging computer vision problems. The primary reason is that depth provides direct geometric information that is largely invariant to texture and illumination. As such, substantial progress has been made in human and object pose estimation, 3...
Article
Hand pose estimation, formulated as an inverse problem, is typically optimized by an energy function over pose parameters using a ‘black box’ image generation procedure, knowing little about either the relationships between the parameters or the form of the energy function. In this paper, we show significant improvement upon the black box optimizat...
Article
The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative hand model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function and a local iterative optimization is performed from these...
Article
In this paper we present Latent-Class Hough Forests, a method for object detection and 6 DoF pose estimation in heavily cluttered and occluded scenarios. We adapt a state of the art template matching feature into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. We train with p...
Article
In this paper we present the latent regression forest (LRF), a novel framework for real-time, 3D hand pose estimation from a single depth image. Prior discriminative methods often fall into two categories: holistic and patch-based. Holistic methods are efficient but less flexible due to their nearest neighbour nature. Patch-based methods can genera...
Article
Videos tend to yield a more complete description of their content than individual images, and egocentric vision often provides a more controllable and practical perspective for capturing useful information. In this study, we present new insights into different object recognition methods for video-based rigid object instance recognition. In order...
Conference Paper · Full-text available
In this paper we propose a novel framework, Latent-Class Hough Forests, for 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template matching feature, LINEMOD [14], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based s...
Conference Paper · Full-text available
In this paper we present the Latent Regression Forest (LRF), a novel framework for real-time, 3D hand pose estimation from a single depth image. In contrast to prior forest-based methods, which take dense pixels as input, classify them independently and then estimate joint positions afterwards, our method can be considered as a structured coarse-to...
Conference Paper · Full-text available
This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies among realistic and synthetic pose data undermine the performances of existing approaches that use synthetic data...

Projects

Project (1)
Real-time depth data is readily available on mobile phones with passive or active sensors and on VR/AR devices. However, this rich data about our environment is under-explored in mainstream AR applications. Slow adoption of depth information in the UX layer may be due to the complexity of processing depth data to simply render a mesh or detect interaction based on changes in the depth map. In this paper, we introduce DepthLab, a software library that encapsulates a variety of UI/UX features for depth, including geometry-aware rendering (occlusion, shadows), depth-interactive behaviors (physically based collisions, avatar path planning), and visual effects (relighting, aperture effects). We break down the usage of the depth map into point depth, surface depth, and per-pixel depth and introduce our real-time algorithms for different use cases. We present the design process, system, and components of DepthLab to streamline and centralize the development of interactive depth features. We open-sourced our software to external developers and collected both qualitative and quantitative feedback. Our results and feedback from engineers suggest that DepthLab helps mobile AR developers unleash their creativity, effortlessly integrate depth in mobile AR experiences, and amplify their prototyping efforts.

Real-time environmental tracking has become a fundamental capability in modern mobile phones and AR/VR devices. However, it only allows user interfaces to be anchored at a static location. Although fiducial and natural-feature tracking overlays interfaces on specific visual features, it typically requires developers to define the pattern before deployment. In this paper, we introduce opportunistic interfaces to grant users complete freedom to summon virtual interfaces on everyday objects via voice commands or tapping gestures.
We present the workflow and technical details of Ad hoc UI (AhUI), a prototyping toolkit to empower users to turn everyday objects into opportunistic interfaces on the fly. We showcase a set of demos with real-time tracking, voice activation, 6DoF interactions, and mid-air gestures and prospect the future of opportunistic interfaces.
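The point/surface/per-pixel breakdown of depth usage described for DepthLab can be illustrated with a minimal sketch; the helper names and the neighborhood-average surface estimate below are assumptions for illustration, not DepthLab's actual API.

```python
import numpy as np

def point_depth(depth_map, u, v):
    """Point depth: a single-pixel lookup, e.g. for placing one anchor."""
    return depth_map[v, u]

def surface_depth(depth_map, u, v, k=2):
    """Surface depth: a local neighborhood estimate (here a simple mean),
    a cheap stand-in for the localized surface used by physics or path
    planning."""
    patch = depth_map[max(v - k, 0):v + k + 1, max(u - k, 0):u + k + 1]
    return float(patch.mean())

def per_pixel_occlusion(depth_map, virtual_depth):
    """Per-pixel depth: full-map tests, e.g. hiding virtual content behind
    closer real geometry (occlusion)."""
    return depth_map < virtual_depth  # True where the real surface occludes

depth = np.full((4, 4), 2.0)  # toy depth map in meters
depth[0, 0] = 0.5             # one close real surface
print(point_depth(depth, 0, 0))               # 0.5
print(int(per_pixel_occlusion(depth, 1.0).sum()))  # 1 occluding pixel
```

The three access patterns trade cost for fidelity: a point lookup is cheapest, a neighborhood estimate is more robust to sensor noise, and a full per-pixel pass enables effects such as occlusion and relighting.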