SPEC-NERF: MULTI-SPECTRAL NEURAL RADIANCE FIELDS
Jiabao Li, Yuqi Li, Ciliang Sun, Chong Wang, and Jinhui Xiang
Ningbo University

This work is supported by the Key Research and Development Program of Ningbo.
ABSTRACT
We propose Multi-spectral Neural Radiance Fields (Spec-NeRF) for jointly reconstructing a multispectral radiance field and the spectral sensitivity functions (SSFs) of the camera from a set of color images filtered by different filters. The proposed method focuses on modeling the physical imaging process, and applies the estimated SSFs and radiance field to synthesize novel views of multispectral scenes. In this method, the data acquisition requires only a low-cost trichromatic camera and several off-the-shelf color filters, making it more practical than using specialized 3D scanning and spectral imaging equipment. Our experiments on both synthetic and real scenario datasets demonstrate that utilizing filtered RGB images with learnable NeRF and SSFs can achieve high fidelity and promising spectral reconstruction while retaining the inherent capability of NeRF to comprehend geometric structures. Code is available at https://github.com/CPREgroup/SpecNeRF-v2.
Index Terms: Novel view synthesis, spectral reconstruction, spectral sensitivity function
1. INTRODUCTION
In high-fidelity scanning applications, there is a need to accurately reconstruct spectral and geometric information, as they provide insights into the inherent physical properties of objects. These attributes play a crucial role in representing the fundamental characteristics of the scanned objects, making their precise reconstruction highly desirable [1]. Recently, there has been growing interest in combining 3D computer vision and spectral analysis. Numerous researchers have developed spectral 3D models with the aim of enhancing the accuracy and reliability of computer vision tasks across diverse applications, including plant modeling [2], agricultural surveillance [3], digital cultural heritage preservation [4], and material classification [5].
The conventional methods can be classified into three categories [1]. 1) Multi-source data fusion: mapping and estimation strategies combine different data types for better 3D models. The mapping strategy mainly focuses on projecting 2D spectral data onto a 3D point cloud [6, 7, 8, 9, 10], while the estimation strategy uses active illumination to capture and estimate spectral reflection on the 3D structure [11, 12]; notably, in [12], multi-spectral data was acquired directly using LED bulbs with various spectral power distributions. 2) Structure from spectra: a standard Structure from Motion (SfM) technique is employed to generate 3D models band by band from multi-view images at the same wavelength, which are subsequently fused to create multispectral 3D models [13, 14]. 3) Depth estimation: spectral data can offer more depth cues (including reflectance, chromatic aberration, and defocus blur) than standard RGB images [15, 16, 17, 18]. With a dual-camera system, [19, 20] are able to obtain pixels from multiple bands, which helps to establish a better correlation for the stereo disparity. In recent years, NeRF and its follow-ups have made great strides in multi-view synthesis for 3D reconstruction. [21] proposed X-NeRF, which, given a set of images acquired from sensors with different light spectrum sensitivities (such as infrared), learns a shared cross-spectral scene representation, allowing for novel view synthesis across spectra.
In this paper, to endow NeRF with the capacity to recover the spectral information of a scene without the need for costly spectrum measuring devices or a priori knowledge of the spectral sensitivity function (SSF) of the camera, we modify the conventional NeRF paradigm and propose Spec-NeRF, which only takes filtered RGB images from different views as input. Within a NeRF framework, we simulate the filtered images via volume rendering, using the known filter transmittance profiles and the estimated SSF. This self-supervised learning approach enables us to learn the spectral characteristics of the scene effectively. The contributions of this paper can be summarized as follows:
• We present Spec-NeRF to jointly recover the spectral and geometric information of the scene and the degradation parameter SSF.
• Our method leverages only a regular camera and a few cheap filters, without known SSFs or a constructed training set, and achieves excellent spectral recovery results on both quantitative metrics and subjective evaluations.
• We construct both synthetic and real multi-view multi-spectral image datasets, providing a general-purpose benchmark for the training and evaluation of spectral 3D computer vision tasks.
Fig. 1: The workflow of our proposed method, Spec-NeRF. Left: the experimental setup. We utilize a filter wheel to switch the filters and acquire the RGB images of the scene using different filters at different positions. Right: the proposed Spec-NeRF network. a) The assembly groups of the training set, which are selected before being sent to b), the key module, where a neural radiance field is adopted to reconstruct the spectral information. c) The spectrum is rendered into tristimulus values (e.g., RGB) using the estimated SSF and the corresponding filter. The fidelity loss is built upon the RGB modality.
2. METHOD
2.1. Preliminary
The NeRF network takes the 5D coordinates $(x, y, z, \theta, \phi)$ of light rays as inputs and predicts the color and density $(c, \sigma)$, which can be formulated as:

$c, \sigma = F_\Theta(x, y, z, \theta, \phi) \quad (1)$

where $\Theta$ denotes the network parameters. Specifically, the color $C_r$ of a camera ray, with $n$ sample points selected along the ray, is defined as follows:
$C_r = \sum_{i=1}^{n} T_i \alpha_i c_i \quad (2)$

where $T_i = \prod_{j=0}^{i-1} (1 - \alpha_j)$ is the accumulated transmittance, i.e., the probability that the ray reaches sample $i$ without terminating, $\alpha_i = 1 - \exp(-\delta_i \sigma_i)$ is the opacity, and $\delta_i$ is the distance between adjacent samples.
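For reference, the following is a minimal PyTorch sketch of the compositing in Eq. 2; the function name and tensor shapes are our own illustration and are not taken from the released code.

import torch

def volume_render(sigma, c, delta):
    """Composite per-sample predictions along one ray as in Eq. 2.

    sigma: (n,)   densities predicted by F_Theta
    c:     (n, k) per-sample colors/spectra (k = 3 for vanilla NeRF)
    delta: (n,)   distances between adjacent samples on the ray
    """
    alpha = 1.0 - torch.exp(-delta * sigma)                  # alpha_i = 1 - exp(-delta_i * sigma_i)
    # T_i = prod_{j<i} (1 - alpha_j): transmittance accumulated before sample i
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = T * alpha                                      # (n,)
    return (weights[:, None] * c).sum(dim=0)                 # rendered color C_r, shape (k,)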
2.2. Spec-NeRF
Our objective is to reconstruct a neural representation field that encompasses the scene spectrum using $k$ spectral bands within a specific wavelength range (e.g., 380 nm-730 nm); once the scene reconstruction is completed, we can render a multi-spectral image from any desired novel viewpoint. Hence, the NeRF network $F_\Theta$ outputs $c \in \mathbb{R}^{1 \times k}$, and the rendered color $C_r \in \mathbb{R}^{1 \times k}$ in Eq. 2 now represents the spectrum.
To gather the spectral information of the scene, we capture RGB images of the scene through distinct filters. To do so, we built an automatic capturing device to construct the real-world datasets, shown in Fig. 1 (left): the camera and the rotatable filter disc are firmly bonded by a mount so that their relative position remains unchanged, and each filter is rotated to the front of the lens before the camera takes a shot. Consider an observed image of size $m \times n$, denoted as $Y \in \mathbb{R}^{(mn) \times 3}$ (e.g., a tristimulus color image) at each pose, captured through a filter $f \in \mathbb{R}^{1 \times k}$. This process can be regarded as a degradation model, expressed as follows:

$Y_i = (X_i \odot f) R \quad (3)$

Here, $i \in [1, mn]$ represents the index of a specific pixel, $X \in \mathbb{R}^{(mn) \times k}$ denotes the multi-spectral image at a single viewpoint, $R \in \mathbb{R}^{k \times 3}$ denotes the camera's SSF, and $\odot$ denotes the Hadamard product.
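A minimal NumPy sketch of this degradation model (Eq. 3), with hypothetical variable names, is:

import numpy as np

def degrade(X, f, R):
    """Simulate a filtered observation from a multispectral image (Eq. 3).

    X: (m*n, k) multispectral image at one viewpoint, k spectral bands
    f: (k,)     transmittance profile of the filter placed in front of the lens
    R: (k, 3)   camera spectral sensitivity function (k x 1 for a monochrome camera)
    returns Y: (m*n, 3) tristimulus (or single-channel) image
    """
    return (X * f[None, :]) @ R   # Hadamard product with the filter, then projection by the SSF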
To recover the spectral and geometric information from the multi-view filtered images, we propose a modified NeRF framework called Spec-NeRF, which adheres to the physical imaging process described in Eq. 3 to learn the scene representation and estimate the SSF, as shown in Fig. 1. Since the number of filters is limited, and the solution to the equation becomes theoretically more precise as the number of filters increases, it is desirable to maintain a filter transmittance matrix with as high a rank as possible; that is, the filters should be distinguishable from each other and the total covered wavelength range should be continuous.
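As a rough illustration of this criterion, one could check the numerical rank and wavelength coverage of a candidate filter set as follows; this is a sketch with assumed array shapes, not part of the paper's pipeline.

import numpy as np

def filter_set_diagnostics(F, tol=1e-3):
    """F: (num_filters, k) measured transmittance profiles, one row per filter.

    Returns the numerical rank of the transmittance matrix and the fraction of
    wavelength bands covered by at least one filter."""
    rank = np.linalg.matrix_rank(F, tol=tol)
    coverage = float((F.max(axis=0) > tol).mean())
    return rank, coverage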
To represent the camera parameter SSF $R$ during the training of Spec-NeRF, the most straightforward solution is to represent it directly as a matrix with $3k$ trainable parameters, or as a neural function with positional encoding [22]. However, this would introduce a greater degree of freedom into the optimization process. We address this problem by leveraging an implicit neural low-rank representation of the desired SSFs. The basis functions $b \in \mathbb{R}^{k \times s}$ of SSFs can be extracted from SSF datasets such as [23] by applying nonnegative PCA, and the coefficients $\alpha \in \mathbb{R}^{s \times 3}$ are generated by an MLP with positional encoding. Here $s$ denotes the number of basis functions. Therefore, the representation of SSFs is formulated as:

$R = b \alpha. \quad (4)$
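The sketch below illustrates the low-rank parameterization of Eq. 4 in PyTorch. For brevity, the MLP here maps a small learned latent vector to the coefficients $\alpha$, whereas the paper uses an MLP with positional encoding; the layer sizes are illustrative only.

import torch
import torch.nn as nn

class LowRankSSF(nn.Module):
    """SSF represented as R = b @ alpha (Eq. 4): a fixed nonnegative basis b (k x s)
    extracted offline from an SSF database, and coefficients alpha (s x 3) produced
    by a small MLP from a learned latent vector (illustrative simplification)."""

    def __init__(self, basis, latent_dim=16, hidden=64):
        super().__init__()
        b = torch.as_tensor(basis, dtype=torch.float32)   # basis: (k, s) array
        self.register_buffer("b", b)
        self.latent = nn.Parameter(torch.randn(latent_dim))
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, b.shape[1] * 3))

    def forward(self):
        alpha = self.mlp(self.latent).reshape(self.b.shape[1], 3)   # (s, 3)
        return self.b @ alpha                                       # estimated SSF R, shape (k, 3)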
2.3. Training
Our experiments are conducted using the TensoRF framework [24]; however, the method can be applied to other NeRF-based approaches, given its model-agnostic nature. We define the fidelity loss as

$\mathcal{L}_{recon} = \frac{1}{2} \left\| \frac{Y_i - \hat{Y}_i}{\mathrm{sg}(\hat{Y}_i) + \epsilon} \right\|_F^2 \quad (5)$

where $\hat{Y}_i = (C_r \odot f_Y) R$, in which $C_r$ represents the volume-rendered spectrum and $f_Y$ denotes the filter associated with the input image. Note that we use a relative MSE loss [25] to penalize low-radiance areas, and $\mathrm{sg}(\cdot)$ indicates a stop-gradient operator. Furthermore, we optionally utilize the distortion loss [26] to reduce the "floaters" in empty space. To this end, the final training loss is:

$\mathcal{L} = \mathcal{L}_{recon} + \beta \mathcal{L}_{dist} \quad (6)$

where $\beta$ is a hyper-parameter.
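A compact sketch of this objective (Eqs. 5-6), with illustrative names and a mean over the batch rather than a strict Frobenius norm, is:

import torch

def spec_nerf_loss(Y, Y_hat, dist_loss=0.0, eps=0.01, beta=0.1):
    """Relative-MSE fidelity term (Eq. 5) plus the optional distortion term (Eq. 6).

    Y, Y_hat: (batch, c) captured and re-rendered pixel values (c = 3, or 1 for a
              monochrome camera); Y_hat is obtained from the volume-rendered spectrum.
    dist_loss: distortion loss provided by the NeRF backbone (0 if not used).
    """
    denom = Y_hat.detach() + eps                       # sg(.) implemented via detach()
    recon = 0.5 * ((Y - Y_hat) / denom).pow(2).mean()  # relative MSE, penalizes low radiance
    return recon + beta * dist_loss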
On real datasets, due to imperfections in the capturing procedure, we initially train the network using only the images captured with the same filter for the first 2000 epochs to stabilize the geometry; we then increase the learning rate to 1.2 times its initial value to avoid local minima and train the network with the remaining images. We do not adopt this strategy on synthetic datasets.
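A minimal sketch of this schedule, with the epoch threshold and learning-rate factor as stated above and data structures of our own choosing, could look like:

def training_images_for_epoch(epoch, images_by_filter, reference_filter):
    """Two-stage schedule used on real data: stabilize geometry with a single
    filter for the first 2000 epochs, then train with all filtered images."""
    if epoch < 2000:
        return images_by_filter[reference_filter]
    return [img for imgs in images_by_filter.values() for img in imgs]

def learning_rate_for_epoch(epoch, init_lr):
    """The learning rate is raised to 1.2x its initial value once the full set is introduced."""
    return init_lr if epoch < 2000 else 1.2 * init_lr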
3. EXPERIMENT
3.1. Experimental Setup & Datasets
Here we describe the experimental settings and the construction of the real and synthetic datasets. We set $\epsilon$ to 0.01, $\beta$ to 0.1 when the distortion loss term is employed, and the number of spectral bands $k$ to 31 for the real dataset and 15 for the synthetic dataset. Our model is trained on an NVIDIA 3090 GPU with a batch size of 8192. The training process lasts for 25000 epochs, which takes approximately 15 minutes when the input consists of 170 images.

We specifically selected 25 SSFs from the database and identified the most plausible basis matrix through multiple iterations of the NMF algorithm, shown in Fig. 3 a). The remaining three SSFs are used for evaluation in the synthetic dataset experiments.
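A sketch of this basis extraction step, assuming the selected SSF curves are stacked column-wise into a matrix S of shape (k, 25) and using scikit-learn's NMF; the number of components s is our own choice here.

import numpy as np
from sklearn.decomposition import NMF

def extract_basis(S, s=6, trials=20, seed=0):
    """Run NMF several times with different initializations and keep the
    factorization with the lowest reconstruction error; the returned W (k x s)
    serves as the basis b in Eq. 4."""
    best_W, best_err = None, np.inf
    for t in range(trials):
        model = NMF(n_components=s, init="random", random_state=seed + t, max_iter=2000)
        W = model.fit_transform(S)          # S ~ W @ H, with W: (k, s) and H: (s, 25)
        if model.reconstruction_err_ < best_err:
            best_W, best_err = W, model.reconstruction_err_
    return best_W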
For the real-scenario dataset, images are captured with a single-band industrial camera, featuring an SSF with only one band, denoted as $R \in \mathbb{R}^{k \times 1}$. The images have a resolution of 1920 × 1080, and the RAW format is required for training. During the scene capture process, we kept the camera settings constant, such as the aperture size, ISO, and shutter speed. Specifically, we set the gain to 0 dB and the gamma values to (1, 1, 1). The measured transmittance profiles of the filters cover the range from 430 nm to 730 nm with an interval of 10 nm.
As for our capturing device, the disc has 20 uniformly spaced holes to which the filters can be securely attached. A stepping motor connected to an ESP32 board controls the rotation of the filter disc, and the camera and disc are combined using a 3D-printed holder; a web interface was also developed to trigger the camera's shutter when each filter rotates to the front of the lens. The apparatus and multi-view images are presented in Fig. 2.
For the synthetic dataset, we built the scenes, including a 24-patch color checker, in Blender and then exported them to Mitsuba [27, 28]. We used the "spectral" mode in Mitsuba and adjusted the color checker's reflectance to match its true spectral profiles; the multi-spectral images were then rendered at a size of 512 × 512, covering the wavelength range from 440 nm to 720 nm with an interval of 20 nm. Next, a synthetic filter set and a test SSF were adopted to generate the RGB images.
Since there remains redundancy among the input images, we randomly select a specific proportion of them, which means that each viewpoint contributes a different number and subset of filtered images.
Fig. 3: Visualization of the three-channel basis matrices in a), derived from the NMF decomposition of the SSF database. In b), the jointly optimized three-channel SSF on a synthetic dataset, using the NMF basis functions.
3.2. Results
3.2.1. Synthetic Scene
We first evaluate our model on a synthetic dataset and conduct a visual assessment of the estimated SSF and multi-spectral image. Fig. 3 b) shows the SSF estimated using the NMF basis matrix; this experiment is performed with 140 input images. Next, in Fig. 4 we present the reconstruction results for the color checker achieved with four different numbers of input images. The model is able to approximate the ground-truth shape even when the number of input samples is exceedingly low. As the number of input samples increases to a reasonable level, our approach consistently delivers high-quality results that closely align with the ground truth.
3.2.2. Real Scene
We further test our method on a real dataset captured with our apparatus in Fig. 2. For the spectral reconstruction results of the checkerboard shown in Fig. 5, the three most representative spectra of color blocks (red, green, and blue) are presented. Note that although we do not have the actual reflected spectrum of the color board, we possess its reflectance profiles from the manufacturer, and the ambient light spectrum is also measured; with both, we can calculate a reflected spectrum as a "reference". The RGB image is rendered by employing an authentic SSF with three bands, and additionally, a depth map is rendered to visualize the geometric understanding.

Fig. 2: A multi-view capturing system for image acquisition; the resulting images in the real dataset were filtered using the designated filters attached to the rotatable disc. Note that the camera utilized in our real dataset setup is a single-channel camera, capturing images in a monochromatic format, while a normal three-channel SSF is used in the synthetic datasets.

Fig. 4: The reconstructed spectrum profiles of the color-checker blocks within the synthetic dataset. These profiles are generated using four different numbers of input images. The order of color blocks is from left to right, from top to bottom.

Table 1: Four metrics between the rendered and captured images on the real dataset.
PSNR    SSIM    LPIPS (alex)    LPIPS (vgg)
45.50   0.993   0.014           0.108
We present four quantitative metrics on the test set, as outlined in Tab. 1, which have been computed on degraded images. These metrics compare two monochromatic images: one is directly captured, while the other is generated by rendering using the reconstructed multi-spectral image and the estimated SSF.
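A sketch of how such a comparison could be computed for one image pair, using scikit-image and the lpips package; single-channel images are replicated to three channels for LPIPS, and the function and variable names are ours.

import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(captured, rendered):
    """captured, rendered: (H, W) float arrays in [0, 1] (monochromatic images)."""
    psnr = peak_signal_noise_ratio(captured, rendered, data_range=1.0)
    ssim = structural_similarity(captured, rendered, data_range=1.0)
    # LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1]; replicate the single channel.
    to_tensor = lambda x: torch.from_numpy(
        np.repeat(x[None, None].astype(np.float32), 3, axis=1)) * 2.0 - 1.0
    lp_alex = lpips.LPIPS(net="alex")(to_tensor(captured), to_tensor(rendered)).item()
    lp_vgg = lpips.LPIPS(net="vgg")(to_tensor(captured), to_tensor(rendered)).item()
    return psnr, ssim, lp_alex, lp_vgg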
Furthermore, we provide synthesized multi-spectral images from two novel viewpoints across four spectral bands in Fig. 6, demonstrating the capability of our method to generate high-fidelity images while preserving high resolution in both the spatial and spectral dimensions.
Fig. 5: The rendered RGB image, depth map, and spectrum reconstruction results on the real dataset at a novel view.
Fig. 6: Visualization of the rendered multi-spectral images at two novel views; we present the scene at four specific wavelengths (430 nm, 490 nm, 530 nm, and 590 nm).

4. CONCLUSION
In this paper, we propose Spec-NeRF, which jointly optimizes the degradation parameters and achieves high-quality multi-spectral image reconstruction at novel views while requiring only a low-cost camera and filters. We also provide two types of datasets for related studies. For future work, we seek to decouple the albedo into the ambient light and the reflectance of the surfaces under similarly low-cost settings.
5. REFERENCES
[1] Yajie Sun et al., “Spectral 3d computer vision–a review,” arXiv preprint arXiv:2302.08054, 2023.
[2] Jie Liang et al., “3d plant modelling via hyperspectral
imaging,” in CVPR, 2013, pp. 172–177.
[3] Luís Pádua et al., “Vineyard variability analysis through uav-based vigour maps to assess climate change impacts,” Agronomy, vol. 9, no. 10, pp. 581, 2019.
[4] Camille Simon Chane et al., “Registration of 3d and multispectral data for the study of cultural heritage surfaces,” Sensors, vol. 13, no. 1, pp. 1004–1020, 2013.
[5] Haida Liang et al., “Remote spectral imaging with simultaneous extraction of 3d topography for historical wall paintings,” ISPRS, vol. 95, pp. 13–22, 2014.
[6] Magdy Elbahnasawy et al., “Multi-sensor integration onboard a uav-based mobile mapping system for agricultural management,” in IGARSS. IEEE, 2018, pp. 3412–3415.
[7] Alfonso López et al., “Generation of hyperspectral point clouds: Mapping, compression and rendering,” Computers & Graphics, vol. 106, pp. 267–276, 2022.
[8] Alejandro Graciano et al., “Quadstack: An efficient representation and direct rendering of layered datasets,” TVCG, vol. 27, no. 9, pp. 3733–3744, 2020.
[9] Alfonso López et al., “An optimized approach for generating dense thermal point clouds from uav-imagery,” ISPRS, vol. 182, pp. 78–95, 2021.
[10] Juan Manuel Jurado et al., “Multispectral mapping on 3d models and multi-temporal monitoring for individual characterization of olive trees,” Remote Sensing, vol. 12, no. 7, pp. 1106, 2020.
[11] Chunyu Li et al., “Pro-cam ssfm: Projector-camera sys-
tem for structure and spectral reflectance from motion,”
in ICCV, 2019, pp. 2414–2423.
[12] Chunyu Li et al., “Spectral mvir: Joint reconstruction
of 3d shape and spectral reflectance,” in ICCP. IEEE,
2021, pp. 1–12.
[13] Ali Zia et al., “3d reconstruction from hyperspectral
images,” in WACV. IEEE, 2015, pp. 318–325.
[14] Ali Zia et al., “3d plant modelling using spectral data
from visible to near infrared range,” in Computer Vision
and Pattern Recognition in Environmental Informatics,
pp. 273–294. IGI Global, 2016.
[15] Himanshu Kumar et al., “Defocus map estimation from
a single image using principal components,” in ISPCC.
IEEE, 2015, pp. 163–167.
[16] Shin Ishihara et al., “Depth from spectral defocus blur,”
in ICIP. IEEE, 2019, pp. 1980–1984.
[17] Ali Zia et al., “Relative depth estimation from hyper-
spectral data,” in DICTA. IEEE, 2015, pp. 1–7.
[18] Ali Zia et al., “Exploring chromatic aberration and de-
focus blur for relative depth estimation from monocu-
lar hyperspectral image,” TIP, vol. 30, pp. 4357–4370,
2021.
[19] Nina Heide et al., “Real-time hyperspectral stereo pro-
cessing for the generation of 3d depth information,” in
ICIP. IEEE, 2018, pp. 3299–3303.
[20] Yisong Luo et al., “Augmenting depth estimation from deep convolutional neural network using multi-spectral photometric stereo,” in SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI. IEEE, 2017, pp. 1–6.
[21] Matteo Poggi et al., “Cross-spectral neural radiance
fields,” in 3DV. IEEE, 2022, pp. 606–616.
[22] Jiabao Li, Yuqi Li, Chong Wang, et al., “Busifusion: Blind unsupervised single image fusion of hyperspectral and rgb images,” TCI, vol. 9, pp. 94–105, 2023.
[23] Jun Jiang et al., “What is the space of spectral sensitivity
functions for digital color cameras?,” in WACV. IEEE,
2013, pp. 168–179.
[24] Anpei Chen et al., “Tensorf: Tensorial radiance fields,”
in ECCV. Springer, 2022, pp. 333–350.
[25] Ben Mildenhall et al., “Nerf in the dark: High dynamic range view synthesis from noisy raw images,” in CVPR, 2022, pp. 16190–16199.
[26] Jonathan T Barron et al., “Mip-nerf 360: Unbounded
anti-aliased neural radiance fields,” in CVPR, 2022, pp.
5470–5479.
[27] Wenzel Jakob et al., “Dr.Jit: A just-in-time compiler for differentiable rendering,” Transactions on Graphics (Proceedings of SIGGRAPH), vol. 41, no. 4, July 2022.
[28] Wenzel Jakob et al., “Mitsuba 3: A retargetable forward and inverse renderer,” https://www.mitsuba-renderer.org/, 2023.