Matthias Nießner’s research while affiliated with Technical University of Munich and other places

Publications (30)


Intrinsic Image Diffusion for Indoor Single-view Material Estimation
  • Conference Paper
  • June 2024
  • 24 Citations

Peter Kocsis · Vincent Sitzmann · Matthias Nießner

SSR-2D: Semantic 3D Scene Reconstruction from 2D Images
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • June 2024
  • 15 Reads
  • 1 Citation

Alexey Artemov · [...] · Matthias Nießner
Most deep learning approaches to comprehensive semantic modeling of 3D indoor spaces require costly dense annotations in the 3D domain. In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction, without using any 3D annotations. The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images, fusing cross-domain features into volumetric embeddings to predict complete 3D geometry, color, and semantics with only 2D labeling, which can be either manual or machine-generated. Our key technical innovation is to leverage differentiable rendering of color and semantics to bridge 2D observations and the unknown 3D space, using the observed RGB images and 2D semantics as supervision, respectively. We additionally develop a learning pipeline and corresponding method to enable learning from imperfect predicted 2D labels, which can be acquired by synthesizing an augmented set of virtual training views that complement the original real captures, enabling a more efficient self-supervision loop for semantics. As a result, our end-to-end trainable solution jointly addresses geometry completion, colorization, and semantic mapping from limited RGB-D images, without relying on any 3D ground-truth information. Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, Matterport3D and ScanNet, surpassing even baselines trained with costly 3D annotations in predicting both geometry and semantics. To our knowledge, our method is also the first 2D-driven method to address completion and semantic segmentation of real-world 3D scans simultaneously.
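The abstract's central mechanism, supervising an unknown 3D volume purely through differentiably rendered 2D observations, can be illustrated with a minimal sketch. The code below assumes PyTorch and substitutes a toy dense voxel grid for the paper's learned volumetric embeddings; the names (VolumetricField, render), shapes, and the ray/loss setup are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch: supervise a 3D volume with only 2D labels by differentiably
# rendering color and semantic logits along camera rays, then comparing them
# against observed RGB pixels and 2D semantic maps. Hypothetical setup, not
# the SSR-2D implementation.
import torch
import torch.nn.functional as F

NUM_CLASSES = 21          # assumed number of semantic classes
SAMPLES_PER_RAY = 64      # assumed ray sampling resolution

class VolumetricField(torch.nn.Module):
    """Toy dense voxel grid holding density, color, and semantic logits."""
    def __init__(self, res=64):
        super().__init__()
        self.density = torch.nn.Parameter(torch.zeros(1, 1, res, res, res))
        self.color = torch.nn.Parameter(torch.zeros(1, 3, res, res, res))
        self.logits = torch.nn.Parameter(torch.zeros(1, NUM_CLASSES, res, res, res))

    def sample(self, pts):
        # pts: (R, S, 3) in [-1, 1]^3; grid_sample wants grid (N, D, H, W, 3)
        grid = pts.view(1, -1, 1, 1, 3)
        def tap(vol):
            out = F.grid_sample(vol, grid, align_corners=True)  # (1, C, R*S, 1, 1)
            return out.view(vol.shape[1], *pts.shape[:2]).permute(1, 2, 0)
        return tap(self.density), tap(self.color), tap(self.logits)

def render(field, origins, dirs, near=0.1, far=4.0):
    """Alpha-composite density, color, and semantics along each ray."""
    t = torch.linspace(near, far, SAMPLES_PER_RAY)
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]  # (R, S, 3)
    sigma, rgb, sem = field.sample(pts)
    delta = (far - near) / SAMPLES_PER_RAY
    alpha = 1.0 - torch.exp(-F.softplus(sigma.squeeze(-1)) * delta)  # (R, S)
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], -1), -1)[:, :-1]
    w = (alpha * trans)[..., None]                                   # (R, S, 1)
    return (w * torch.sigmoid(rgb)).sum(1), (w * sem).sum(1)

# One supervision step: photometric loss against observed RGB plus
# cross-entropy against (possibly machine-generated) 2D semantic labels.
# No 3D ground truth appears anywhere in the objective.
field = VolumetricField()
opt = torch.optim.Adam(field.parameters(), lr=1e-2)
origins = torch.zeros(1024, 3)                      # dummy camera rays
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
gt_rgb = torch.rand(1024, 3)                        # observed pixel colors
gt_sem = torch.randint(0, NUM_CLASSES, (1024,))     # 2D semantic labels

opt.zero_grad()
pred_rgb, pred_sem = render(field, origins, dirs)
loss = F.l1_loss(pred_rgb, gt_rgb) + F.cross_entropy(pred_sem, gt_sem)
loss.backward()
opt.step()
```

In the paper's setting, the rendered semantics would additionally be compared against predicted 2D labels synthesized in augmented virtual views, closing the self-supervision loop the abstract describes; the sketch shows only the core rendering-based 2D supervision.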

Citations (29)


... LightIt [27] achieves consistent and controllable lighting changes in image generation by conditioning diffusion models on shading and normal maps. Other recent image-based methods have explored various representations with diffusion models, including shading maps [38] and spherical Gaussians [28]. ...

Reference:

LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
  • Citing Conference Paper
  • June 2024

... A popular approach is to leverage object retrieval to create 3D scenes with high-fidelity object structures, synthesizing only the scene graph of object layouts [2,8,13,16,23,35,45,46,61,65,68,75,76,79]. Due to the use of object retrieval, scene geometry remains limited to the object database used for retrieval. ...

Learning 3D Scene Priors with 2D Supervision
  • Citing Conference Paper
  • June 2023

... Dynamic surface reconstruction: Reconstructing dynamic surfaces from monocular video is essential for applications such as intelligent robotics and virtual reality. Traditional approaches often depend on predefined object templates [6,20,68] or temporal tracking [11,16,67]. With advances in neural implicit 3D representations [37,40], methods like LASR [56] and ViSER [57] reconstruct articulated shapes using differentiable rendering techniques [30], while BANMo [58] and PPR [59] apply NeRF to dynamic scenes, and SDFFlow [35] models dynamic motion by estimating derivatives of the SDF value. ...

Neural Head Avatars from Monocular RGB Videos
  • Citing Conference Paper
  • June 2022

... Due to the scarcity of supervised data, these methods often suffer from poor reconstruction quality in unseen scenarios. The other class of methods [17,19,26,30-32] stores 3D models in a database, then retrieves and assembles similar 3D models to match the input image. However, the limited geometric clues from a single image make it difficult to precisely identify and arrange the correct models. ...

ROCA: Robust CAD Model Retrieval and Alignment from a Single Image
  • Citing Conference Paper
  • June 2022

... However, these methods struggle with accurate geometry and physical surfaces and are designed primarily for photorealistic rendering. Some approaches [11,39,50] incorporate depth supervision for better geometry, and others [36,42] attempt to extract watertight meshes from NeRFs. However, these methods are generally limited to small synthetic objects or carefully constructed inward-facing captures. ...

Dense Depth Priors for Neural Radiance Fields from Sparse Input Views
  • Citing Conference Paper
  • June 2022

... 3. Dynamic Dataset. This spatiotemporal dataset consists of 10.5k individual trajectories, each of 100 timesteps, resulting in a length of 10 seconds per trajectory. ...

DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data
  • Citing Conference Paper
  • June 2020

... SSC was first proposed by SSCNet [69] and recently surveyed in [67]. Prior works mainly focus on indoor scenes [7,10,11,16,17,19,25,35,47,48,54,71,78,79] with dense, uniform, and small-scale point clouds. SemanticKITTI [3] sparked interest in SSC for urban scenes, which pose new challenges due to LiDAR sparsity, large scale, and varying density. ...

RevealNet: Seeing Behind Objects in RGB-D Scans
  • Citing Conference Paper
  • June 2020

... Traditionally, texture generation relied on manual or procedural techniques [29,31,50,54], which were effective for basic applications but lacked complexity. The introduction of global optimization techniques [25,53] allowed for more detailed textures that better matched 3D model geometries. AI-based 3D texture generation was initially dominated by generative adversarial networks (GANs) [18,36,41,64]; the focus has since shifted toward latent diffusion models (LDMs) [21,44], with models like Stable Diffusion [39,44] showing promising results. ...

Adversarial Texture Optimization From RGB-D Scans
  • Citing Conference Paper
  • June 2020

... DeepDeform [7] leverages deep learning to replace classical feature matching with CNN-based correspondence matching. Li et al. [24] go a step further by differentiating through the N-ICP algorithm, yielding a dense feature-matching term. Neural Non-Rigid Tracking [6] shares a similar approach but emphasizes end-to-end robust correspondence estimation. ...

Learning to Optimize Non-Rigid Tracking
  • Citing Conference Paper
  • Full-text available
  • June 2020