Rendered images and viewpoint candidates. (a) Top fixed-trajectory viewpoints and CAD model images; (b) Fibonacci viewpoints and point cloud rendering images

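The "Fibonacci viewpoints" in panel (b) presumably refer to the standard Fibonacci (golden-angle) sphere lattice, which places camera positions nearly uniformly around an object. As an illustration only (the function name and parameters here are hypothetical, not taken from the paper), such viewpoint candidates could be generated like this:

```python
import math

def fibonacci_sphere_viewpoints(n, radius=1.0):
    """Sample n approximately uniform camera positions on a sphere
    using the Fibonacci (golden-angle) lattice."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # ~2.39996 rad
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n           # height in (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))    # radius of the z-slice
        theta = golden_angle * i                # azimuth advances by the golden angle
        points.append((radius * r * math.cos(theta),
                       radius * r * math.sin(theta),
                       radius * z))
    return points
```

Unlike a fixed top trajectory as in panel (a), this lattice covers the whole sphere without clustering at the poles, which is why it is a common choice for rendering-based viewpoint candidates.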
Source publication
Article
In this paper, we propose a novel unsupervised pre-training method for point cloud deep learning models using multimodal contrastive learning. Point clouds, which consist of sets of three-dimensional coordinate points acquired from 3D scanners, lidars, depth cameras, and similar sensors, play an important role in representing 3D scenes, and understanding them is...

Similar publications

Article
Autonomous driving technology faces significant challenges in processing complex environmental data and making real-time decisions. Traditional supervised learning approaches rely heavily on extensive data labeling, which incurs substantial costs. This study presents a complete implementation framework combining Deep Deterministic Policy Gradient (...

Citations

... The contrastive learning framework [37] is widely adopted to form a latent feature space shared across different modalities. A notable example is bimodal contrastive learning, which combines 3D point clouds and 2D images [38][39][40][41][42][43][44][45]. In this method, each positive pair is created between a point cloud and an image derived from the same object/scene, while a negative pair is formed from different objects/scenes. ...
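The positive/negative pairing described above is typically trained with a symmetric InfoNCE objective. The following is a minimal NumPy sketch of that loss, not code from any of the cited works; the function name, batch convention (row i of both matrices comes from the same object), and the temperature value are illustrative assumptions:

```python
import numpy as np

def infonce_loss(pc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch: row i of pc_emb and img_emb
    come from the same object (positive pair); all other rows in the
    batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    pc = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = pc @ img.T / temperature      # (B, B); positives on the diagonal
    labels = np.arange(len(logits))

    def xent(l):
        # numerically stable cross-entropy against the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average the point-cloud->image and image->point-cloud directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each point cloud embedding toward the image rendered from the same object and pushes it away from the other objects in the batch, which is how the shared latent space across modalities is formed.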
Preprint
Parameter-efficient fine-tuning (PEFT) of pre-trained 3D point cloud Transformers has emerged as a promising technique for 3D point cloud analysis. While existing PEFT methods attempt to minimize the number of tunable parameters, they still suffer from high temporal and spatial computational costs during fine-tuning. This paper proposes a novel PEFT algorithm for 3D point cloud Transformers, called Side Token Adaptation on a neighborhood Graph (STAG), to achieve superior temporal and spatial efficiency. STAG employs a graph convolutional side network that operates in parallel with a frozen backbone Transformer to adapt tokens to downstream tasks. STAG's side network realizes high efficiency through three key components: a connection with the backbone that enables reduced gradient computation, a parameter-sharing framework, and efficient graph convolution. Furthermore, we present Point Cloud Classification 13 (PCC13), a new benchmark comprising diverse publicly available 3D point cloud datasets, enabling comprehensive evaluation of PEFT methods. Extensive experiments using multiple pre-trained models and PCC13 demonstrate the effectiveness of STAG. Specifically, STAG maintains classification accuracy comparable to existing methods while reducing tunable parameters to only 0.43M and achieving significant reductions in both computational time and memory consumption for fine-tuning. Code and benchmark will be available at: https://github.com/takahikof/STAG
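The side-network idea in the STAG abstract — a small trainable path running in parallel with a frozen backbone and reading its token stream — can be sketched in forward-pass form. This is a generic illustration of the architecture family, not the authors' implementation: the dimensions, the ReLU blocks standing in for Transformer/graph-convolution layers, and all weight names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen backbone: two "Transformer-like" blocks with fixed weights.
D, D_side = 64, 16
W_frozen = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(2)]

# Trainable side network: a narrow down-projected path that taps the
# backbone's tokens at each block. Only these weights would be updated,
# so backpropagation never needs gradients through the frozen blocks.
W_down = rng.standard_normal((D, D_side)) / np.sqrt(D)
W_side = [rng.standard_normal((D_side, D_side)) / np.sqrt(D_side) for _ in range(2)]
W_up = rng.standard_normal((D_side, D)) / np.sqrt(D_side)

def forward(tokens):
    side = tokens @ W_down                    # project input tokens to the side width
    for Wf, Ws in zip(W_frozen, W_side):
        tokens = np.maximum(tokens @ Wf, 0.0)                 # frozen backbone block
        side = np.maximum(side @ Ws + tokens @ W_down, 0.0)   # side path taps backbone tokens
    return tokens + side @ W_up               # fuse side output back into feature space

x = rng.standard_normal((128, D))  # e.g. 128 point tokens
y = forward(x)
```

The efficiency argument is visible in the parameter counts: the side path here holds far fewer weights than the backbone, which is the same trade-off that lets STAG report only 0.43M tunable parameters.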