Conference Paper
Real-Time High Resolution 3D Data on the HoloLens


Abstract

The recent appearance of augmented reality headsets, such as the Microsoft HoloLens, is a marked move from traditional 2D screens to 3D hologram-like interfaces. Striving to be completely portable, these devices unfortunately suffer from multiple limitations, such as the lack of real-time, high quality depth data, which severely restricts their use as research tools. To mitigate this restriction, we provide a simple method to augment a HoloLens headset with much higher resolution depth data. To do so, we calibrate an external depth sensor connected to a computer stick that communicates with the HoloLens headset in real-time. To show how this system could be useful to the research community, we present an implementation of small object detection on the HoloLens device.
Mathieu Garon, Pierre-Olivier Boulet
Laval University
Jean-Philippe Doiron, Luc Beaulieu
Frima Studio, Inc.
Jean-François Lalonde
Laval University
(a) User wearing our system (b) Scene (c) Scene observed through HoloLens
Figure 1: By combining a high resolution depth camera (Intel RealSense) with a Microsoft HoloLens headset (a), we can use object detection
algorithms to precisely locate small objects (b) and overlay object-dependent information in the HoloLens field of view (c). Our resulting system
is tetherless, and allows the use of algorithms that exploit high resolution depth information for HoloLens applications.
1 Introduction

As one of the first AR headsets available today on the market, the
Microsoft HoloLens is bound to have a profound impact on the de-
velopment of AR applications. For the first time, an intuitive and
easy-to-use device is available to developers, who can use it to de-
ploy AR applications to an ever expanding range of users and cus-
tomers. What is more, that device is itself a portable computer, ca-
pable of operating without being connected to an external machine,
making it well-suited for a variety of applications.
A downside of this portability is that the raw data provided by the HoloLens sensors is not accessible. This severely restricts
the use of the HoloLens as a research tool: by being forced to ex-
clusively use the provided API functionality, future research using
the HoloLens is quite limited indeed. Of note, the unavailability of
high resolution depth data prohibits the development of novel object detection [7, 9], SLAM [1, 5], or tracking [6, 8] algorithms, to name just a few, on board these devices.

Figure 2: Overview of the hardware setup. We attach an Intel RealSense RGBD camera to a HoloLens unit via a custom mount. The RealSense is connected to a stick PC via USB. This PC can then relay the high resolution depth data (or information such as detected objects computed from it) back to the HoloLens via WiFi.
In this poster, we present a system that bypasses this limitation
and provides high resolution 3D data to HoloLens applications in
real-time. The 3D data is accurately registered to the HoloLens
reference frame and allows the integration of any depth- or 3D-
based algorithm for use on the HoloLens. As seen in fig. 1, our key
idea is to couple a depth camera (an Intel RealSense in our case, but other such portable RGBD cameras could be used as well) with
a HoloLens unit using a custom-made attachment, and to transfer
the depth information in real-time via a WiFi connection operated
on a stick PC, also attached to the HoloLens. We do so without
sacrificing the portability of the device: our system is still tetherless,
as is the original HoloLens.
The remainder of this short paper describes the hardware setup in greater detail in sec. 2, and demonstrates the usability of our approach with real-time 3D object detection on the HoloLens in sec. 3.
2 Hardware setup

2.1 System overview
Fig. 1-(a) shows a user wearing our system, which is also illus-
trated schematically in fig. 2. It is composed of an Intel RealSense
(a) HoloLens 3D (b) RealSense 3D (c) Scene
Figure 3: Comparison of 3D data obtained from the HoloLens (a) and the RealSense (b) for the same scene (c). The 3D HoloLens data is
insufficient for many applications such as small scale object detection. For clarity, the background has been cropped out in the 3D data for both
(a) and (b). Please see the supplementary video for an animated version of this figure [2].
RGBD camera that is attached to a Microsoft HoloLens headset
via a custom-made mount. The RealSense is connected to an Intel
Compute Stick PC (specifically, Intel Core M5-6Y57 at 1.1 GHz
with 4GB RAM) via USB 3.0. The stick PC is attached to the back
of the HoloLens, and the battery pack can typically be carried in the user's pockets. Thus, our system does not sacrifice mobility and is
still completely tetherless.
Real-time depth data obtained from the RealSense camera can
be processed by the PC, which can beam it back to the HoloLens
directly via WiFi. To limit bandwidth usage however, it is typically
preferable to have the PC implement the particular task at hand and
send the results back to the HoloLens, rather than transmitting the
raw depth data. The specific task and transmitted data depend upon
the application: in sec. 3 we demonstrate the use of our system in the context of small-scale object detection, but other tasks (e.g. depth-based face detection) could be implemented.
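This "send results, not raw depth" design can be sketched as follows. The snippet below is our own illustration, not the authors' published networking code: it packs one hypothetical 6-DOF detection (object id, translation, quaternion) into a 32-byte datagram, which is far cheaper to stream over WiFi than a full depth frame. All names and the wire format are assumptions.

```python
import socket
import struct

# Hypothetical wire format: object id + translation (x, y, z) in metres
# + rotation quaternion (qw, qx, qy, qz) -> 1 int32 + 7 float32 = 32 bytes.
POSE_FORMAT = "<i7f"

def pack_pose(obj_id, translation, quaternion):
    """Serialize one 6-DOF detection into a fixed-size datagram payload."""
    return struct.pack(POSE_FORMAT, obj_id, *translation, *quaternion)

def unpack_pose(payload):
    """Inverse of pack_pose, as the HoloLens-side client would run it."""
    values = struct.unpack(POSE_FORMAT, payload)
    return values[0], values[1:4], values[4:8]

def send_pose(sock, addr, obj_id, translation, quaternion):
    """Beam a single detection result to the headset over UDP."""
    sock.sendto(pack_pose(obj_id, translation, quaternion), addr)

if __name__ == "__main__":
    # Round-trip one detection locally (no headset needed for the demo).
    payload = pack_pose(3, (0.1, -0.05, 0.6), (1.0, 0.0, 0.0, 0.0))
    obj_id, t, q = unpack_pose(payload)
    print(obj_id, t, q)
```

A 32-byte message per detection keeps the WiFi link lightly loaded regardless of the depth camera's resolution, which is the point of running the task on the stick PC.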
2.2 Calibration
This section describes the calibration procedure to transform the
depth data in the RealSense reference frame to the HoloLens ref-
erence frame so that it can be displayed accurately to the user.
Unfortunately, as shown in fig. 3, the depth data provided by the HoloLens is too coarse to be useful for calibration. Therefore, we must instead rely on the color cameras present on both devices.
Fig. 4 provides an overview of the transformations that must be computed to display the RealSense depth, originally in the “Depth” coordinate system, in the HoloLens virtual camera coordinate system “Virtual”. This amounts to computing the transform $T^{\text{Virtual}}_{\text{Depth}}$ (in our notation, $T^{b}_{a}$ is the transformation that maps a point $p_a$ in reference frame $a$ to reference frame $b$, i.e. $p_b = T^{b}_{a} p_a$). Following fig. 4 and chaining the transformations, we have:

$$T^{\text{Virtual}}_{\text{Depth}} = T^{\text{Virtual}}_{\text{Webcam}} \, T^{\text{Webcam}}_{\text{RGB}} \, T^{\text{RGB}}_{\text{Depth}}, \quad (1)$$
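In code, the chain in (1) is simply a product of 4×4 homogeneous matrices. The sketch below is our own illustration (the placeholder transforms are made up, not calibrated values), using the same $T^{b}_{a}$ convention:

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply(T, p):
    """Map a 3D point p from frame a to frame b via T^b_a (p_b = T^b_a p_a)."""
    return (T @ np.append(p, 1.0))[:3]

# Placeholder values standing in for the three calibrated transforms in (1):
T_rgb_depth = make_transform(np.eye(3), [0.025, 0.0, 0.0])       # from RealSense API
T_webcam_rgb = make_transform(np.eye(3), [0.0, 0.08, 0.0])       # checkerboard calibration
T_virtual_webcam = make_transform(np.eye(3), [0.0, 0.0, -0.01])  # from HoloLens API

# Chain as in (1): T^Virtual_Depth = T^Virtual_Webcam T^Webcam_RGB T^RGB_Depth
T_virtual_depth = T_virtual_webcam @ T_webcam_rgb @ T_rgb_depth

p_depth = np.array([0.0, 0.0, 0.5])   # a point measured by the depth sensor
p_virtual = apply(T_virtual_depth, p_depth)
```

Every RealSense depth pixel, once back-projected to a 3D point, can be mapped into the virtual camera frame this way before rendering.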
where “RGB” and “Webcam” are the RealSense color camera and the HoloLens webcam coordinate systems respectively.

We must therefore estimate the three individual transformations in (1). First, we directly employ the transformations $T^{\text{RGB}}_{\text{Depth}}$ and $T^{\text{Virtual}}_{\text{Webcam}}$ that are provided by the RealSense and HoloLens APIs respectively. Then, we estimate the remaining $T^{\text{Webcam}}_{\text{RGB}}$ by placing a planar checkerboard calibration target (which defines the “Calib” reference frame) in the scene so that it is visible by all cameras simultaneously, and estimating the individual transformations $T^{\text{Calib}}_{\text{RGB}}$ and $T^{\text{Calib}}_{\text{Webcam}}$ with standard camera calibration (we use the OpenCV implementation of [10]). Finally, $T^{\text{Webcam}}_{\text{RGB}} = (T^{\text{Calib}}_{\text{Webcam}})^{-1} \, T^{\text{Calib}}_{\text{RGB}}$. Please refer to fig. 4 for a graphical illustration of this process.
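The composition of the two checkerboard poses can be sketched numerically. This is our own NumPy-only illustration with synthetic transforms; in practice the two checkerboard poses would come from standard calibration (e.g. OpenCV's `cv2.calibrateCamera` or `cv2.solvePnP` on detected corners), and all variable names are ours:

```python
import numpy as np

def invert(T):
    """Invert a rigid 4x4 transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def compose_webcam_from_rgb(T_calib_rgb, T_calib_webcam):
    """T^Webcam_RGB = (T^Calib_Webcam)^-1 T^Calib_RGB."""
    return invert(T_calib_webcam) @ T_calib_rgb

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic check: choose a ground-truth T^Webcam_RGB, derive consistent
# checkerboard poses from it, and verify the composition recovers it.
T_webcam_rgb_true = np.eye(4)
T_webcam_rgb_true[:3, :3] = rot_z(0.1)
T_webcam_rgb_true[:3, 3] = [0.0, 0.1, 0.02]

T_calib_rgb = np.eye(4)
T_calib_rgb[:3, :3] = rot_z(-0.3)
T_calib_rgb[:3, 3] = [0.2, -0.1, 0.9]

# Consistency constraint: T^Calib_Webcam = T^Calib_RGB (T^Webcam_RGB)^-1
T_calib_webcam = T_calib_rgb @ invert(T_webcam_rgb_true)

T_webcam_rgb = compose_webcam_from_rgb(T_calib_rgb, T_calib_webcam)
```

The synthetic round trip makes the frame bookkeeping easy to verify before plugging in real checkerboard estimates.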
Figure 4: Several rigid transformations must be estimated in order to
express the 3D information acquired by the RealSense depth sen-
sor in the HoloLens virtual camera coordinate system. We employ
a checkerboard pattern to determine the relationship between the
color cameras, since the HoloLens depth information is not reliable
enough to obtain accurate calibration results. Bold lines and transfor-
mations indicate what is required, and the grayed ones indicate what
we explicitly calibrate.
3 Real-time 3D object detection

By comparing the 3D data provided by the HoloLens to that provided by the RealSense in fig. 3, it is easy to see that the HoloLens data is
insufficient for applications that require high resolution depth data.
One such application is object detection and pose estimation, where
the task is to accurately locate a known object in the scene.
To demonstrate the usefulness of our system, we thus imple-
mented the multimodal detection system proposed in [4] to detect
candidate objects and estimate their corresponding poses. Similarly
to [3], we train a template-based detector from a 3D CAD model.
The detections are then refined through a photometric verification
with the projected 3D model, followed by a fast geometric verification to remove false positives. The most confident object poses are subsequently refined through a dense and more accurate geometric
verification step with ICP. This procedure is repeated independently
at every frame (although tracking [8] could also be employed to ob-
tain more stable results under object or camera movement).
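The final refinement step can be illustrated with the closed-form rigid alignment (Kabsch) that a point-to-point ICP iteration solves internally. The sketch below is our own minimal version on synthetic data with known correspondences, not the paper's implementation, which iterates this update with nearest-neighbor matching:

```python
import numpy as np

def rigid_align(src, dst):
    """Best rigid (R, t) mapping src points onto dst in the least-squares
    sense: the core update of a point-to-point ICP iteration (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic check: rigidly move a small model "point cloud" and recover the pose.
rng = np.random.default_rng(0)
model = rng.standard_normal((50, 3))

angle = 0.4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.02, 0.3])
observed = model @ R_true.T + t_true   # simulated depth measurements of the object

R, t = rigid_align(model, observed)
```

With noiseless correspondences the pose is recovered exactly; on real depth data the same update runs inside an ICP loop against the high resolution RealSense point cloud.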
Fig. 5 shows results obtained on three different, small-scale objects. The object detection and pose estimation algorithms are run on the stick PC, and only the 6-DOF pose information is transmitted back to the HoloLens (see fig. 2). The high resolution mesh of the corresponding object, pre-computed and pre-loaded on the HoloLens, is then transformed according to the detected pose and overlaid at the correct location in the HoloLens display.

Figure 5: Object detection results. Original scenes are shown on the top row. On the bottom row, the corresponding scene is photographed by placing the HoloLens over the camera to simulate what a viewer would see. High resolution meshes of objects detected via the high resolution depth stream are realistically overlaid on the scene. Please see the supplementary video for an animated version of this figure [2].
4 Conclusion

In this poster, we presented a simple system for providing HoloLens
applications with real-time and high resolution 3D data. We do so
by attaching another RGBD camera (the Intel RealSense in our case, but others could be used as well) to the HoloLens, and by
streaming the acquired data (or post-processed information) back to
the HoloLens via a stick PC. We demonstrate the usefulness of our
system by detecting small objects with the high resolution 3D data
and overlaying their 3D model in the HoloLens viewpoint, which
would be impossible to do from the original 3D data.
In future work, we will improve the precision of the system by
explicitly measuring the latency caused by communication and the
data acquisition rates of the RealSense. Our current solution works
well even without latency compensation, but fast camera or object
motions may cause misalignments between the detections and the
real objects. We believe that accounting for latency will be an im-
portant next step in the development of even more stable systems.
Acknowledgments

This project was supported by Frima Studio and Mitacs through an
internship to Mathieu Garon. We also thank Frima Studio for the
use of their equipment and facilities.
References

[1] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt. BundleFusion: Real-time globally consistent 3D reconstruction using online surface re-integration. In IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[2] M. Garon, P.-O. Boulet, J.-P. Doiron, L. Beaulieu, and J.-F. Lalonde. Real-time high resolution 3D data on the HoloLens: Project webpage, ~jflalonde/projects/hololens3d/, July 2016.
[3] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, K. Konolige, and
N. Navab.
[4] S. Hinterstoisser, S. Holzer, C. Cagniart, S. Ilic, K. Konolige,
N. Navab, and V. Lepetit. Multimodal templates for real-time de-
tection of texture-less objects in heavily cluttered scenes. In IEEE
International Conference on Computer Vision, 2011.
[5] R. Newcombe, D. Fox, and S. M. Seitz. DynamicFusion: Reconstruc-
tion and tracking of non-rigid scenes in real-time. In IEEE Conference
on Computer Vision and Pattern Recognition, 2015.
[6] C. Y. Ren, V. Prisacariu, O. Kaehler, I. Reid, and D. Murray. 3D
tracking of multiple objects with identical appearance using RGB-D
input. In International Conference on 3D Vision, 2015.
[7] R. Rios-Cabrera and T. Tuytelaars. Discriminatively trained templates
for 3D object detection: A real time scalable approach. In IEEE Inter-
national Conference on Computer Vision, 2013.
[8] D. J. Tan, F. Tombari, S. Ilic, and N. Navab. A versatile learning-based
3D temporal tracker: Scalable, robust, online. In IEEE International
Conference on Computer Vision, 2015.
[9] P. Wohlhart and V. Lepetit. Learning descriptors for object recognition
and 3D pose estimation. In IEEE Conference on Computer Vision and
Pattern Recognition, 2015.
[10] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, 2000.