Satoshi Ikehata’s research while affiliated with Tokyo Institute of Technology and other places

Publications (38)


A Simple Finetuning Strategy Based on Bias-Variance Ratios of Layer-Wise Gradients
  • Chapter

December 2024 · 1 Read

Mao Tomita · Ikuro Sato · Rei Kawakami · [...] · Masayuki Tanaka


Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
  • Preprint

October 2024

In this paper, we present a groundbreaking spectrally multiplexed photometric stereo approach for recovering surface normals of dynamic surfaces without the need for calibrated lighting or sensors, a notable advancement in the field traditionally hindered by stringent prerequisites and spectral ambiguity. By embracing spectral ambiguity as an advantage, our technique enables the generation of training data without specialized multispectral rendering frameworks. We introduce a unique, physics-free network architecture, SpectraM-PS, that effectively processes multiplexed images to determine surface normals across a wide range of conditions and material types, without relying on specific physically-based knowledge. Additionally, we establish the first benchmark dataset, SpectraM14, for spectrally multiplexed photometric stereo, facilitating comprehensive evaluations against existing calibrated methods. Our contributions significantly enhance the capabilities for dynamic surface recovery, particularly in uncalibrated setups, marking a pivotal step forward in the application of photometric stereo across various domains.



Fig. 1: Overview of Gumbel-NeRF. In the forward pass, a set of experts is processed to return densities and radiances. Out of N experts, only the expert with the highest density is selected. This max-pooling expert selection guarantees continuity in the final density field, as in the original NeRF. Each expert is associated with an expert-specific latent code so that each expert learns to model a part of the object.
Fig. 3: Qualitative results of novel view synthesis of unseen objects using one-shot test-time optimization. Compared to CodeNeRF (CN) and Coded Switch-NeRF (CSN), our Gumbel-NeRF (GN-C) generally produces higher quality, especially for those parts marked by red boxes.
Fig. 4: Visualization of the decomposition provided by Coded Switch-NeRF (CSN) and Gumbel-NeRF (GN). Images in each column are rendered from only the 3D points handled by the corresponding expert.
Quantitative evaluation on the ShapeNet-SRN cars test set. Note that we clip the rendered values to [0, 1].
Gumbel-NeRF: Representing Unseen Objects as Part-Compositional Neural Radiance Fields

October 2024 · 7 Reads

We propose Gumbel-NeRF, a mixture-of-experts (MoE) neural radiance field (NeRF) model with a hindsight expert selection mechanism for synthesizing novel views of unseen objects. Previous studies have shown that the MoE structure provides high-quality representations of a given large-scale scene consisting of many objects. However, we observe that such an MoE NeRF model often produces low-quality representations in the vicinity of the experts' boundaries when applied to novel view synthesis of an unseen object from one- or few-shot input. We find that this deterioration is primarily caused by the foresight expert selection mechanism, which may leave an unnatural discontinuity in the object shape near the experts' boundaries. Gumbel-NeRF adopts a hindsight expert selection mechanism, which guarantees continuity in the density field even near the experts' boundaries. Experiments on the SRN cars dataset demonstrate the superiority of Gumbel-NeRF over the baselines in terms of various image quality metrics.
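The contrast between foresight and hindsight selection can be illustrated with a toy one-dimensional sketch. It assumes only what the Fig. 1 caption states: every expert returns a density and a radiance, and the expert with the highest density is kept (max-pooling). The expert functions, the hard router, and all names below are hypothetical stand-ins, not the authors' code, and the sketch does not model training, latent codes, or volume rendering.

```python
from typing import Callable, List, Tuple

RGB = Tuple[float, float, float]
# Hypothetical expert: maps a 1-D coordinate x to (density, rgb).
Expert = Callable[[float], Tuple[float, RGB]]


def hindsight_select(experts: List[Expert], x: float) -> Tuple[float, RGB, int]:
    """Evaluate every expert at x, then keep the one with the highest density.

    The returned density is max_i sigma_i(x), which is continuous in x
    whenever each expert's density is continuous.
    """
    outputs = [expert(x) for expert in experts]
    best = max(range(len(outputs)), key=lambda i: outputs[i][0])
    density, rgb = outputs[best]
    return density, rgb, best


def foresight_select(experts: List[Expert], router: Callable[[float], int],
                     x: float) -> Tuple[float, RGB, int]:
    """Let a router choose the expert first, then evaluate only that expert."""
    i = router(x)
    density, rgb = experts[i](x)
    return density, rgb, i


# Toy experts standing in for per-part MLPs (hypothetical).
def expert_a(x: float) -> Tuple[float, RGB]:
    return max(0.0, 1.0 - x), (1.0, 0.0, 0.0)


def expert_b(x: float) -> Tuple[float, RGB]:
    return max(0.0, x), (0.0, 0.0, 1.0)


experts: List[Expert] = [expert_a, expert_b]


def hard_router(x: float) -> int:
    # Spatial split decided before any expert is evaluated.
    return 0 if x < 0.3 else 1


if __name__ == "__main__":
    for x in (0.29, 0.31):
        print(x, hindsight_select(experts, x)[0],
              foresight_select(experts, hard_router, x)[0])
```

Crossing the router's boundary at x = 0.3, the hindsight density changes by only about 0.02, while the foresight density jumps from about 0.71 to about 0.31. This is the kind of boundary discontinuity the abstract attributes to foresight selection.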



Investigating the Perception of Facial Anonymization Techniques in 360° Videos

September 2024 · 10 Reads · 2 Citations

ACM Transactions on Applied Perception

In this work, we investigate facial anonymization techniques in 360° videos and assess their influence on the perceived realism, anonymization effect, and presence of participants. In comparison to traditional footage, 360° videos can convey engaging, immersive experiences that accurately represent the atmosphere of real-world locations. As the entire environment is captured simultaneously, it is necessary to anonymize the faces of bystanders in recordings of public spaces. Since this alters the video content, the perceived realism and immersion could be reduced. To understand these effects, we compare non-anonymized and anonymized 360° videos using blurring, black boxes, and face-swapping, shown either on a regular screen or in a head-mounted display (HMD). Our results indicate significant differences in the perception of the anonymization techniques. We find that face-swapping is the most realistic and least disruptive; however, participants raised concerns regarding the effectiveness of the anonymization. Furthermore, we observe that presence is affected by facial anonymization in the HMD condition. Overall, the results underscore the need for facial anonymization techniques that balance both photo-realism and a sense of privacy.


Fig. 7: Qualitative results of photometric stereo on the LUCES dataset [27]. We compare (a) single image-based normals obtained by the material decoder of MERLiN with the normals estimated by Fast-NFPS [24] from (b) images relit by MERLiN and (c) 32 real images. Note that the relit images are generated by MERLiN from a single image.
Quantitative comparison of svBRDF estimation (MSE, ×10⁻²) and relighting (SSIM) of MERLiN with Li et al. [22] and Sang et al. [34] on images under point-light global illumination from the test set of [22].
MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

September 2024 · 13 Reads

Photometric stereo typically demands intricate data acquisition setups involving multiple light sources to recover surface normals accurately. In this paper, we propose MERLiN, an attention-based hourglass network that integrates single image-based inverse rendering and relighting within a single unified framework. We evaluate the performance of photometric stereo methods using these relit images and demonstrate how they can circumvent the underlying challenge of complex data acquisition. Our physically-based model is trained on a large synthetic dataset containing complex shapes with spatially varying BRDF and is designed to handle indirect illumination effects to improve material reconstruction and relighting. Through extensive qualitative and quantitative evaluation, we demonstrate that the proposed framework generalizes well to real-world images, achieving high-quality shape, material estimation, and relighting. We assess these synthetically relit images over photometric stereo benchmark methods for their physical correctness and resulting normal estimation accuracy, paving the way towards single-shot photometric stereo through physically-based relighting. This work allows us to address the single image-based inverse rendering problem holistically, applying well to both synthetic and real data and taking a step towards mitigating the challenge of data acquisition in photometric stereo.


Investigating the Perception of Facial Anonymization Techniques in 360° Videos

August 2024 · 10 Reads




Citations (18)


... Several composite radiance fields for decoupling static and dynamic elements have been proposed with NeRFs [14,38] and 3DGS [43,47] learned from monocular video. Large-scale novel view synthesis with NeRF and 3DGS in urban scenes has incorporated scene semantics [21,24,32] and 3D bounding boxes [35,41,48] to specify moving objects. Moreover, approaches that model backgrounds separately to disentangle static objects from lighting and weather conditions have been proposed [13,17,39]. ...

Reference:

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
  • Citing Conference Paper
  • June 2024

... Their benefits include ease of use and low cost, as well as interactivity and the possibility of conveying a sense of immersive realism and presence [2,3]. Furthermore, 360° videos of real-world locations can be enriched with further interaction and collaboration possibilities to create virtual tours or photo-realistic virtual environments [21,22,23,8]. In this way, they enhance the engagement and presence of users and bridge the gap between videos and 3D-generated virtual environments. ...

360RVW: Fusing Real 360° Videos and Interactive Virtual Worlds
  • Citing Conference Paper
  • October 2023

... Low-light image enhancement techniques allow observation and localization of surface defects in the first wall of the vacuum chamber. To further analyze the impact of the defects on the next round of experiments in the fusion reactor, 3D reconstruction of the defects is also necessary to assess their damage [21,22]. Photometric stereo has the advantages of simple operation, fast measurement, and the ability to capture detailed surface features, and is therefore widely used [23,24]. ...

Scalable, Detailed and Mask-Free Universal Photometric Stereo
  • Citing Conference Paper
  • June 2023

... The intersection over union (IoU) measures the overlap between the predicted bounding box and the ground-truth box. A higher IoU means that the predicted bounding box coordinates match the ground-truth box coordinates more closely (Cao et al., 2023). Mean average precision (mAP) and IoU are used to assess the classification and localization performance of the detection approach, respectively. ...

Field-of-View IoU for Object Detection in 360° Images
  • Citing Article
  • July 2023

IEEE Transactions on Image Processing
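The planar IoU described in the excerpt above can be sketched in a few lines. This assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name is illustrative. It is the standard planar definition only, not the field-of-view variant for 360° images proposed in the cited article.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Planar intersection over union of two axis-aligned boxes."""
    # Overlap rectangle; width and height clamp to 0 when the boxes are disjoint.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For a predicted box (0, 0, 2, 2) and a ground-truth box (1, 1, 3, 3), the overlap is 1 and the union is 7, so IoU = 1/7 ≈ 0.14. Identical boxes give 1.0 and disjoint boxes give 0.0.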

... A first network estimates the lighting [5], which makes the problem calibrated, but this approach fails if the assumption of directional lighting is violated. Universal PS, proposed very recently to reduce this risk [16,17], handles a wide variety of lighting conditions learned through deep learning. It is arguably the best-performing approach for solving uncalibrated PS today. ...

Universal Photometric Stereo Network using Global Lighting Contexts
  • Citing Conference Paper
  • June 2022

... They performed basic image processing, key bone point localization, and RoI enhancement. Sawabe et al. [29] worked on RoI extraction in 360° images. Considering the benefits obtained in different fields, RoI extraction is also considered in this work. ...

Saliency-Based Multiple Region of Interest Detection From a Single 360° Image

IEEE Access

... In bandwidth-constrained environments, it becomes crucial to consider both downscaling and compression distortions to optimize visual quality and, subsequently, improve the users' quality of experience (QoE). Although recent research has proposed methods specifically designed for omnidirectional image and video SR [16,17,18,19], most of these approaches focus on upscaling uncompressed content. There remains a scarcity of specialized techniques for enhancing compressed content, particularly at low bitrates. ...

360° Single Image Super Resolution via Distortion-Aware Network and Distorted Perspective Images
  • Citing Conference Paper
  • September 2021

... An alternative to maximum likelihood for classic HS was proposed in Roubtsova and Guillemaut (2014a, b, 2017, 2018). They present a maximum a posteriori formulation for both classic HS (Roubtsova & Guillemaut, 2014a, 2018) and [...]. The methods are initialised using a Visual Hull (VH) obtained from silhouettes of the input images. ...

CNN-PS: CNN-Based Photometric Stereo for General Non-convex Surfaces (15th European Conference on Computer Vision, Munich, Germany, September 8-14, 2018, Proceedings, Part XV)
  • Citing Chapter
  • September 2018

Lecture Notes in Computer Science

... The pipeline consists of several steps: first, two cost volume cubes are calculated from the elemental images (Section 3.2) and the focal stack (Section 3.3), respectively. These two cubes are then refined using a multi-scale approach similar to [31] (Section 3.4) and a contribution from superpixels inspired by [32] (Section 3.5). Appendix A gives an overview of the parameters in Table A1. ...

Efficiency-enhanced cost-volume filtering featuring coarse-to-fine strategy