July 2024 · 95 Reads · 2 Citations
June 2024 · 18 Reads · 9 Citations
IEEE Transactions on Visualization and Computer Graphics
High-fidelity online 3D scene reconstruction from monocular videos remains challenging, especially for coherent and fine-grained geometry. Previous learning-based online 3D reconstruction approaches with neural implicit representations have shown promising ability for coherent scene reconstruction, but often fail to consistently recover fine-grained geometric details during online reconstruction. This paper presents a new on-the-fly monocular 3D reconstruction approach, named GP-Recon, that performs high-fidelity online neural 3D reconstruction with fine-grained geometric details. We incorporate geometric priors (GP) into the scene's neural geometry learning to better capture its geometric details and, more importantly, propose an online volume rendering optimization to reconstruct and maintain those details throughout the online reconstruction task. Extensive comparisons with state-of-the-art approaches show that GP-Recon consistently generates more accurate and complete reconstruction results with much better fine-grained details, both quantitatively and qualitatively.
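As a rough illustration of the kind of mechanism the abstract describes, the following minimal PyTorch sketch composites colors along rays and adds a normal-prior term to the rendering objective. All names (volume_render, geometric_prior_loss, the tensor shapes) are assumptions for illustration, not GP-Recon's actual implementation.

    import torch

    def volume_render(densities, colors, dists):
        # densities: (N_rays, N_samples), colors: (N_rays, N_samples, 3), dists: (N_rays, N_samples)
        alpha = 1.0 - torch.exp(-densities * dists)               # per-sample opacity
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
        )[:, :-1]                                                 # accumulated transmittance
        weights = alpha * trans                                   # rendering weights
        rgb = (weights.unsqueeze(-1) * colors).sum(dim=1)         # composited ray color
        return rgb, weights

    def geometric_prior_loss(pred_normals, prior_normals, weights):
        # Encourage rendered surface normals to agree with monocular geometric priors,
        # weighted by the same rendering weights used for color.
        cos = (pred_normals * prior_normals).sum(-1)              # (N_rays, N_samples)
        return (weights * (1.0 - cos)).sum(dim=1).mean()

In such a setup the total objective would combine a photometric rendering loss with this prior term; the exact weighting and the online optimization schedule are specific to the paper.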
March 2024 · 29 Reads · 11 Citations
Proceedings of the AAAI Conference on Artificial Intelligence
Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models to generate plausible images at novel viewpoints or to distill pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method tailored to sparse-view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from the input views, guiding a pre-trained diffusion model such as Stable Diffusion to produce novel-view images that remain 3D-consistent with the input. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results, even when faced with open-world objects. To address the blurriness introduced by conventional SDS, we introduce category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art works on metrics for both NVS and geometry reconstruction.
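For context, conventional score distillation sampling (which the abstract's C-SDS modifies) updates the 3D representation parameters with the standard DreamFusion-style gradient; the C-SDS variant itself is not reproduced here:

    \nabla_\theta \mathcal{L}_{\mathrm{SDS}}
    = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],

where x is an image rendered from the 3D representation with parameters \theta, x_t its noised version at diffusion timestep t, \hat{\epsilon}_\phi the diffusion model's noise prediction under conditioning y (in Sparse3D, guidance derived from the input views' epipolar features), and w(t) a timestep weighting.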
March 2024 · 8 Reads
Proceedings of the AAAI Conference on Artificial Intelligence
Recent neural surface reconstruction approaches based on volume rendering have achieved impressive reconstruction quality, but remain limited to dense, highly accurate posed views. To overcome these drawbacks, this paper focuses on consistent surface reconstruction from sparse views with noisy camera poses. Unlike previous approaches, the key idea of this paper is to exploit multi-view constraints directly from the explicit geometry of the neural surface, which serve as effective regularization for jointly learning the neural surface and refining the camera poses. To build effective multi-view constraints, we introduce a fast differentiable on-surface intersection to generate on-surface points, and propose view-consistent losses on these differentiable points to regularize neural surface learning. Building on this, we propose a joint learning strategy, named SC-NeuS, that performs geometry-consistent surface reconstruction in an end-to-end manner. With extensive evaluation on public datasets, SC-NeuS achieves consistently better surface reconstruction results with finer-grained details than previous approaches, especially from sparse and noisy camera views. The source code is available at https://github.com/zouzx/sc-neus.git.
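One way to picture a differentiable on-surface intersection and a view-consistent loss, under assumed names and a simple sphere-tracing scheme rather than the paper's exact procedure:

    import torch
    import torch.nn.functional as F

    def sphere_trace(sdf, origins, dirs, n_steps=32):
        # March rays toward the SDF zero level set; sdf(points) returns (N, 1) signed
        # distances, so gradients flow through the SDF network into the surface points.
        t = torch.zeros(origins.shape[0], 1, device=origins.device)
        for _ in range(n_steps):
            t = t + sdf(origins + t * dirs)
        return origins + t * dirs                          # approximate on-surface points

    def project(points, K, R, trans):
        # Pinhole projection with world-to-camera rotation R and translation trans.
        cam = points @ R.T + trans
        uv = cam @ K.T
        return uv[:, :2] / uv[:, 2:3]

    def view_consistent_loss(points, imgs, Ks, Rs, ts):
        # Sample colors at the projections of the same on-surface points in two views and
        # penalize their difference, a simple photometric-consistency surrogate.
        samples = []
        for img, K, R, trans in zip(imgs, Ks, Rs, ts):     # img: (3, H, W)
            uv = project(points, K, R, trans)
            h, w = img.shape[-2:]
            grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                                2 * uv[:, 1] / (h - 1) - 1], dim=-1)
            samples.append(F.grid_sample(img[None], grid[None, None], align_corners=True))
        return (samples[0] - samples[1]).abs().mean()

Because the projections depend on the camera parameters, the same loss also provides gradients for refining noisy poses, which is the joint learning the abstract refers to.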
October 2023 · 23 Reads · 4 Citations
August 2023 · 30 Reads
Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. While recent techniques employ image diffusion models to generate plausible images at novel viewpoints or to distill pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), these methods often struggle to simultaneously achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry. In this work, we present Sparse3D, a novel 3D reconstruction method tailored to sparse-view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from the input views, guiding a pre-trained diffusion model such as Stable Diffusion to produce novel-view images that remain 3D-consistent with the input. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results, even when faced with open-world objects. To address the blurriness introduced by conventional SDS, we introduce category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art works on metrics for both NVS and geometry reconstruction.
July 2023 · 21 Reads
Recent neural surface reconstruction approaches based on volume rendering have achieved impressive reconstruction quality, but remain limited to dense, highly accurate posed views. To overcome these drawbacks, this paper focuses on consistent surface reconstruction from sparse views with noisy camera poses. Unlike previous approaches, the key idea is to exploit multi-view constraints directly from the explicit geometry of the neural surface, which serve as effective regularization for jointly learning the neural surface and refining the camera poses. To build effective multi-view constraints, we introduce a fast differentiable on-surface intersection to generate on-surface points, and propose view-consistent losses based on these differentiable points to regularize neural surface learning. Building on this, we propose a joint learning strategy for the neural surface and camera poses, named SC-NeuS, that performs geometry-consistent surface reconstruction in an end-to-end manner. With extensive evaluation on public datasets, SC-NeuS achieves consistently better surface reconstruction results with finer-grained details than previous state-of-the-art neural surface reconstruction approaches, especially from sparse and noisy camera views.
September 2022 · 50 Reads
High-fidelity 3D scene reconstruction from monocular videos remains challenging, especially for complete and fine-grained geometry. Previous 3D reconstruction approaches with neural implicit representations have shown promising ability for complete scene reconstruction, but their results are often over-smoothed and lack geometric detail. This paper introduces a novel neural implicit scene representation with volume rendering for high-fidelity online 3D scene reconstruction from monocular videos. For fine-grained reconstruction, our key insight is to incorporate geometric priors into both the neural implicit scene representation and neural volume rendering, leading to an effective geometry learning mechanism based on volume rendering optimization. Benefiting from this, we present MonoNeuralFusion to perform online neural 3D reconstruction from monocular videos, by which the 3D scene geometry is efficiently generated and optimized during on-the-fly monocular 3D scanning. Extensive comparisons with state-of-the-art approaches show that MonoNeuralFusion consistently generates more complete and finer-grained reconstruction results, both quantitatively and qualitatively.
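For reference, the neural volume rendering that such implicit reconstruction methods build on composites per-sample colors c_i with densities \sigma_i along each ray (the standard discretized formulation, not a paper-specific one):

    \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i,
    \qquad \alpha_i = 1 - \exp(-\sigma_i\,\delta_i),
    \qquad T_i = \prod_{j<i}\,(1 - \alpha_j),

where \delta_i is the distance between adjacent samples along the ray; geometric priors enter by constraining the geometry (e.g., normals or depth) implied by these rendering weights.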
July 2022 · 42 Reads · 12 Citations
Graphical Models
Previous object-level Simultaneous Localization and Mapping (SLAM) approaches still fail to create high-quality object-oriented 3D maps efficiently. The main challenges are how to represent object shape effectively and how to apply such an object representation to accurate online camera tracking efficiently. In this paper, we present ObjectFusion, a novel object-level SLAM system for static scenes that efficiently creates an object-oriented 3D map with high-quality object reconstruction by leveraging neural object priors. We propose a neural object representation with only a single encoder–decoder network to effectively express object shape across various categories, which benefits high-quality reconstruction of object instances. More importantly, we propose to convert this neural object representation into precise measurements that jointly optimize the object shape, object pose, and camera pose for the final accurate 3D object reconstruction. With extensive evaluations on synthetic and real-world RGB-D datasets, we show that ObjectFusion outperforms previous approaches, with better object reconstruction quality, a much smaller memory footprint, and greater efficiency, especially at the object level.
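A loose sketch of how a decoded object prior can act as a measurement in joint optimization, with hypothetical names (decoder, T_obj) and no claim to match ObjectFusion's formulation: observed depth points are mapped into the object frame and evaluated against the decoded SDF, whose residuals should vanish for the correct shape code and poses.

    import torch

    def object_measurement_residual(decoder, z, T_obj, depth_points):
        # T_obj: 4x4 object-to-world transform; depth_points: (N, 3) points from the depth map.
        R, t = T_obj[:3, :3], T_obj[:3, 3]
        pts_obj = (depth_points - t) @ R        # world -> object frame (R is orthonormal)
        return decoder(pts_obj, z)              # SDF residuals for a joint shape/pose optimizer

Feeding these residuals, together with camera-tracking terms, to a nonlinear least-squares or gradient-based optimizer is one plausible way to realize the joint shape, object-pose, and camera-pose refinement the abstract describes.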
January 2022 · 84 Reads · 14 Citations
IEEE Transactions on Systems Man and Cybernetics Systems
Lecturers, as the guides of the classroom, play a significant role in the teaching process. However, lecturers' sense of spatial immersion has been ignored in current virtual teaching systems. In this article, we explore cyber–physical–social intelligence for the Edu-Metaverse in cyber–physical–social space and design a lecturer-centered immersive teaching system that takes social and lecturer factors into consideration. We call this system VirtualClassroom (V-Classroom). Specifically, we first introduce the cyber–physical–social system (CPSS) paradigm of V-Classroom so that the workflow is standardized and significantly simplified, and the system can be built with off-the-shelf hardware. The key component of V-Classroom is a cyber-world representation of a physical-world classroom instrumented with sparse consumer-grade RGBD cameras that capture the 3D geometry and texture of the classroom. We provide each V-Classroom lecturer with a physical device for sending 6DoF view-change messages and showing view-dependent content of the remote classroom. Following this paradigm, we develop the V-Classroom algorithms, including the V-Classroom depth algorithm (V-DA) and the V-Classroom view algorithm (V-VA), to achieve real-time rendering of remote classrooms: V-DA recovers accurate depth information of the classrooms, while V-VA performs real-time novel view synthesis. Finally, we present our implemented CPSS-driven V-Classroom prototype, based on real-world classroom scenarios we collected, and discuss the main challenges and future directions.
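To make the depth-to-geometry step concrete, here is a generic back-projection of an RGBD depth map into a camera-frame point cloud, the kind of intermediate that a V-DA/V-VA-style pipeline would feed into novel view synthesis; it is a generic utility, not the paper's algorithm:

    import numpy as np

    def backproject_depth(depth, K):
        # Lift a depth map (H, W) to a 3D point cloud using pinhole intrinsics K (3x3).
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.reshape(-1)
        x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
        y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
        return np.stack([x, y, z], axis=-1)     # (H*W, 3) points in the camera frame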
... These methods learn motion patterns from independent time inputs in a vanilla way, neglecting internal cross-time relationships. Some studies introduce motion flow regularization [23] to explicitly learn motion patterns from neighboring frames [32,34]. Although these methods promote similar motion between adjacent timestamps, they adopt a local perspective on the time series during optimization. ...
July 2024
... Online scene reconstruction and rendering. Most online scene reconstruction methods [4,35,43,54,55] focus only on 3D reconstruction, via TSDF-based fusion [4,35], surfel-based fusion [43], and SLAM-based reconstruction [54]. However, they cannot produce photorealistic novel view synthesis. ...
June 2024
IEEE Transactions on Visualization and Computer Graphics
... Only SplatterImage [97] can predict 3D splats from a single image, and even this method is limited to masked, object-centric scenes. While some approaches incorporate information from generative models ad-hoc to increase plausibility of uncertain regions [89,77,138,63,19,69], they do not learn to sample the true posterior distribution on scenes, which our method aims to learn. ...
March 2024
Proceedings of the AAAI Conference on Artificial Intelligence
... The success of NeRF (Mildenhall et al. 2021) and subsequent works (Trevithick and Yang 2021; Wang et al. 2021b; Yu et al. 2021) has enabled impressive novel view synthesis applications. To overcome the drawback of dense input views, multiple works propose extra regularizations or priors for sparse-view novel view synthesis, such as depth and appearance smoothness (RegNeRF (Niemeyer et al. 2022), MVSNeRF), ray entropy regularization (InfoNeRF (Kim, Seo, and Han 2022)), perceptual losses (SVS (González et al. 2022)), spatio-temporal consistency (Li et al. 2023), or ray distortion (Mip-NeRF360), etc. Besides, some recent approaches (Wei et al. 2021; Deng et al. 2022; Roessle et al. 2022) use depth priors to constrain the NeRF optimization, which also achieves promising novel view synthesis results from sparse input views. ...
October 2023
... Far and Rad (2022) state the importance of digital twins in immersive educational contexts, which bridge the physical and digital domains using modern 3D scanning tools. Shen et al. (2023) discuss the V-Classroom, which improves educational interactions through real-time 6DoF visual communication and spatial immersion. Mitra (2023) shows the enhancement of virtual learning environments by incorporating non-player characters (NPCs) that use AI and ML for characteristics such as sentiment analysis and facial recognition. ...
January 2022
IEEE Transactions on Systems Man and Cybernetics Systems
... This method estimates all camera poses and objects within the scene across different views and then employs bundle adjustment to optimize the estimated poses. Certain studies utilize object-level SLAM to estimate the pose of objects [29,30]. Some methods utilize known camera poses, requiring only the prediction of objects' poses [31,32]. ...
July 2022
Graphical Models
... Traditionally, 3D semantic scene reconstruction has relied heavily on 3D data from sensors such as LiDAR and depth cameras [8,9]. While effective, these sensors are often expensive, bulky, and less portable, limiting their widespread use. ...
December 2021
IEEE Transactions on Visualization and Computer Graphics
... Neural implicit scene representations, also known as neural fields [29], have attracted considerable attention in the field of RGB-D SLAM for their impressive expressiveness and low memory footprint. Initial studies, including iMap [6] and DI-Fusion [30], explored the utilization of a single MLP and a feature grid to encode scene geometries within a latent space. However, they both share a critical issue: the problem of network forgetting, which is catastrophic for long-term localization and mapping. ...
June 2021
... However, the predictions on 2D images are not geometry- and temporally-aware, which makes the fusion step difficult and inaccurate. Fusion-aware 3D-Conv [37] and SVCNN [11] construct data structures to maintain the information of previous frames and conduct point-based 3D aggregation to fuse the 3D features for semantic segmentation. INS-CONV [16] extends sparse convolution [9; 5] to an incremental CNN to efficiently extract global 3D features for semantic and instance segmentation. ...
August 2021
ACM Transactions on Graphics
... They evaluate whether feature points are dynamic by comparing the similarity of triangles formed by three sets of feature points in two keyframes. Zheng-Jun Du [12] proposed a method based on graph cut RANSAC for initial camera pose estimation, using a long-term consistent CRF to perform dynamic 3D landmark detection and constructing a conditional random field with long-term observation consistency for more accurate dynamic point identification. Hyeongjun Jeon [13] proposed a method based on scene flow and conditional random fields. ...
October 2020
IEEE Transactions on Visualization and Computer Graphics