PHYSICAL REVIEW APPLIED 19, 034090 (2023)
Compressive Non-Line-of-Sight Imaging with Deep Learning
Shenyu Zhu,1,2 Yong Meng Sua,1,2,* Ting Bu,1,2 and Yu-Ping Huang1,2,†
1Department of Physics, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, New Jersey 07030,
USA
2Center for Quantum Science and Engineering, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken,
New Jersey 07030, USA
(Received 19 July 2022; revised 19 January 2023; accepted 23 February 2023; published 28 March 2023)
In non-line-of-sight (NLOS) imaging, the spatial information of hidden targets is reconstructed from the time-of-flight (TOF) of the multiply bounced signal photons. The need for NLOS imagers to perform extensive scanning in the transverse spatial dimensions constrains the imaging speed and reconstruction quality while limiting their applications to static scenes. Utilizing a photon TOF histogram with picosecond temporal resolution, we develop compressive non-line-of-sight imaging enabled by deep learning. Two-dimensional images (32 × 32 pixels) of the NLOS targets can be reconstructed with superior reconstruction quality via a convolutional neural network (CNN), using significantly downscaled data (8 × 8 scanning points) at a downsampling ratio of 6.25% compared to the traditional methods. The CNN is trained end-to-end purely using simulated data but is robust for image reconstruction with experiment data. Our results suggest that deep learning is effective for reducing the scanning points and total capture time towards scanningless NLOS imaging and videography.
DOI: 10.1103/PhysRevApplied.19.034090
I. INTRODUCTION
Imaging and sensing a hidden target outside of the
direct line-of-sight has been an emerging research topic
in various fields, such as autonomous driving [1], remote
sensing [2], and biomedical imaging [3]. To image a hid-
den object around the corner, the geometric information
of the observable “wall” as the diffuser is first measured.
The time-of-flight (TOF) information of the returning sig-
nal photons is measured at a series of scanning points on
the wall and then processed with respect to the geometric information [4–7]. The temporal profile and statistics of the
returning TOF histograms of the signal photons are related
to the travel path length in the scene of detection, from
which the computational imaging methods can image and
sense the target in various environments. Thus, the three-
dimensional position information of the hidden target can
be retrieved. At each scanning point, the scattered probe
light reaches a certain area on the target (defined by the
scattering angle of the wall); thus, the measured tempo-
ral histogram of the returning photons contains the spatial
information of that area of the target [5]. This scanning
area coverage opens up the possibility for reconstructing
a higher pixel number NLOS image of the hidden target
using much fewer scanning points, which can be achieved
*ysua@stevens.edu
†yuping.huang@stevens.edu
using compressed sensing [8]. Provided adequate temporal resolution, the reconstruction using fewer scanning points can also be done using deep-learning-based methods, which have recently played a role in extracting effective information and reconstructing the target image in various fields [9], such as TOF imaging [10,11],
compressed sensing [12], polarimetric imaging [13], and
optoacoustic tomography [14].
Various algorithms have been used to reconstruct the
NLOS image. One branch of methods first models the
imaging scenario and then computes the three-dimensional
position of the hidden target based on the model. Back-
projection-based methods estimate the most probable posi-
tion and shape for the target [4,15]. Deconvolution-based
methods compute the least-error estimation of the target
position according to the Poisson statistics and light prop-
agation model [16,17]. Fast Fourier transform increases
the processing speed for solving the deconvolution prob-
lem [5,18–20]. These methods can reconstruct a three-
dimensional point cloud of the hidden target. However,
these methods usually require the same dimension for the
input data and the result, which increases the data acqui-
sition time if higher pixel numbers are needed. Another
branch of methods utilizes deep learning to train a mathematical model projecting from the three-dimensional TOF information to the desired feature of the hidden target
[21]. The feature can be the three-dimensional positions
of the target, the two-dimensional intensity image, or
the depth map of the hidden target. Recent studies have
demonstrated that the depth map and image of the hidden
scene can be reconstructed using various kinds of deep
learning methods, such as the U-Net [22], the nonlocal
neural network [23], the neural transient field [24], and so
on [25]. These methods provide an alternative route for image reconstruction and may extract the desired information from the input data more effectively [11,26]. In both
cases, higher temporal resolution is required for higher
reconstruction quality.
Conventional single-photon detection systems are lim-
ited by the timing jitter and the “pile-up” effect. The timing
resolution is several tens of picoseconds for state-of-the-art single-photon avalanche diode (SPAD) imaging systems
[17,27]. Recently, up-conversion single-photon detection
has achieved picosecond-resolution NLOS imaging and
sensing by performing nonlinear optical gating for the
picosecond pulses of the returning NLOS signal photons
[28,29]. Here, we utilize a single-pixel nonlinear gated
single-photon imager for NLOS data acquisition [30,31],
which has a 10-ps timing resolution and is independent of the timing jitter of the associated electronics. It also isolates the triple-bounced signal photons from the environmental noise, especially the single-bounce photons scattered from the "wall," which are usually several orders of magnitude stronger. As a result, the imager is free of the "pile-up" effect. The higher temporal resolution thus provides more precise temporal information of the hidden target and is able to yield more detailed spatial information of the target.
Utilizing photon TOF histograms with picosecond temporal resolution, we demonstrate a NLOS image-reconstruction method using a convolutional neural network (CNN), which projects the three-dimensional input data (photon-counting histograms at different pixels) onto the two-dimensional NLOS image. The CNN can reconstruct the image of the hidden target using spatially downscaled data [8,12] at a downsampling ratio of 6.25%. Because the simulated photon-counting temporal histograms closely resemble the experimental ones, the CNN can be trained end-to-end using only simulated data, yet it is very robust in image reconstruction with experiment data. This shows that the high timing resolution shrinks the difference between the simulation and experiment data, enabling CNN training without
real-world data [22]. Additionally, the simulated data are
generated using a set of simple geometric shapes [32],
which are different from the shapes of experimental targets.
This shows that, fed with high-temporal-resolution input data, the CNN can build the reconstruction model from three-dimensional data to the two-dimensional NLOS image [23,33]. Using the CNN, the input data are downscaled in the temporal domain and upscaled in the spatial domain to reconstruct the NLOS image, demonstrating the capability of retrieving spatial information from temporal data [26].
II. IMAGING SYSTEM SETUP
The NLOS imaging system utilizes a nonlinear gated
single-photon detection (NGSPD) system [29] to capture
the three-bounced signal photons. The NLOS imaging sys-
tem is shown in Fig. 1. The imaging system uses a 50-MHz
femtosecond mode-locked laser as the light source. The
pump and probe laser pulses are generated by filtering
the mode-locked laser using a pair of 200-GHz dense
wavelength-division multiplexers (DWDMs). The center
wavelength of the pump pulse is 1565.5 nm, and that of
the probe is 1554.1 nm. The pulse widths of the pump
and the probe are 6.8 and 6.3 ps, respectively, as shown in the upper-left and upper-middle insets of Fig. 1. The pump laser passes
through an optical delay line (ODL). The probe laser pulse
is sent out from an optical transceiver at a beam size
of 2.2 mm FWHM, from which the probe beam is first
steered by a MEMS mirror, and then illuminates differ-
ent scanning points on the diffuser to probe the hidden
target. The transceiver receives the returning signal pho-
tons on the same scanning point for illumination, which
forms a confocal configuration [5]. The optical transceiver
is made of a fiber collimator, which is composed of an
angle-polished single mode fiber and an aspheric lens
(f=11.0 mm, NA =0.25). The returning signal pulses
are first isolated from the outgoing probe using an opti-
cal circulator (55-dB isolation ratio), then mixed with an
optical pump pulse train using another DWDM. Then the
pump and signal are fiber coupled into a commercial quasi-
phase-matching nonlinear optical waveguide, where the
pump up-converts the returning signal into sum-frequency
photons.
The nonlinear optical waveguide has its center quasi-phase-matching wavelength at 1559.8 nm and an internal conversion efficiency of 137%/(W cm²). When the delay
time of the pump is scanned by the ODL, the pump
pulses temporally sweep across the signal photons. At each
optical delay, the photons at the band of the designed sum-
frequency wavelength are detected using a silicon-based
SPAD (approximately 70 % efficiency at 780 nm), then
counted by a field-programmable gate array (FPGA). At
each scanning point, the photon count versus the delay
time of the pump forms a temporal histogram. The up-
conversion process can happen effectively only when the signal is in a certain temporal-frequency mode [30]. The impulse response of the system is defined by the FWHM of the sum-frequency temporal histogram, as shown in the upper-right inset of Fig. 1. This FWHM is 10 ps, which defines the temporal resolution [31,34] of the system. The background count rate is about 5.5 kHz, including the intrinsic dark count rate of the SPAD (200 Hz) and the Raman noise (5.3 kHz) in the nonlinear optical waveguide.
The detected scene is composed of a 2-inch diame-
ter metallic diffuser (120 grit, reflectivity >96 %) as the
“wall,” and the hidden target, which is 12 cm away from
the diffuser. The hidden targets, made of retroreflective
tape cut into various shapes, are attached on an ordinary
BK-7 glass plate. The transmittance of the glass plate is about 92%, so most of the light transmits through the glass plate.
III. NLOS DATA SIMULATION AND
EXPERIMENT ACQUISITION
The training dataset for the CNN is generated from the
confocal light-cone model [5] using simulated simple geo-
metric shapes as the targets. First, eight kinds of simple
geometric shapes (circle, arc, square, triangle, semicircle, rectangle, ring, and L shape) are randomly generated with different geometric parameters (position, size, rotation angle, etc.) [32], which are later used as the training labels (outputs) for the CNN. These shapes are used
since they contain the basic geometric elements of more
complex targets, such as letters, while not containing the exact experimental targets themselves. The total number of shapes is 20 000, and the size of each shape is 32 × 32
pixels. The simulated data, i.e., the temporal histograms of each shape, are generated from the confocal model as

$$c(u,v,t) = h(t) \ast \iiint_{x,y,z} \frac{1}{r^{b}}\, s(\alpha)\, \delta\!\left[(u-x)^{2}+(v-y)^{2}+z^{2}-\left(\frac{ct}{2}\right)^{2}\right] o(x,y,z)\,\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}z, \tag{1}$$
where x, y, z are the three-dimensional coordinates in the object space, u, v are the coordinates of the scanning points in the diffuser space, and t is the measured photon arrival time. In such a way, we define o(x,y,z) as the reflectivity of the target in the object space, and c(u,v,t) is the retrieved photon-counting temporal histogram at each scanning point (u,v). The function δ[(u−x)² + (v−y)² + z² − (ct/2)²] describes the TOF t for the probe beam to travel from point (u,v) on the diffuser to the object point (x,y,z). We assume the object surface is even, and the reflectivity is normalized to 1. s(α) is the scattering-angle profile of the diffuser (about 60° FWHM), where α is the angle between the incident laser and the scattering direction toward point (x,y,z) at the scanning point (u,v). The optical power falloff caused by the scattering is described by the term 1/r^b, where r = √[(u−x)² + (v−y)² + z²] and b is the falloff factor.
The falloff factor is evaluated by comparing the temporal
[Fig. 1 schematic: probe laser pulses from the optical transceiver are steered by the MEMS mirror onto the diffuser; the returning signal is routed by the optical circulator to the NGSPD; the hidden target is concealed behind a block; the scanning pattern on the diffuser averages five points within a circle around each center point.]
FIG. 1. Setup of the NLOS imaging system. MEMS, microelectromechanical system; NGSPD, nonlinear optical gated single-photon detector. The upper three insets show the pulse profiles of the pump and signal as well as the sum-frequency impulse response of the
system. The pump and the probe pulse profiles are measured using frequency-resolved optical gating (FROG), while the sum-frequency
power is measured using a power meter at different optical delays. The middle left subplot is the scanning pattern on the diffuser for
NLOS imaging.
[Fig. 2 diagram: input of size 8×8×80 → Conv3D and Conv3DTranspose layers (kernel 3×3×32) for 3D feature extraction → Flatten/Reshape → Dense down-sampler → Conv2D layers (kernel 3×3) for 2D feature extraction → output of size 32×32.]
FIG. 2. Structure of the convolutional neural network. The sizes of the input and output layers are labeled at the bottom of the blocks. The connections between layers are indicated by arrows of different colors, as labeled in the bottom-right legend.
histograms of the experiment results and simulated results, and we set b to be approximately 2.3 for the retroreflective target. h(t) is the impulse response of the system. The simulated dataset is then computed using Eq. (1). We simulate 8 × 8 scanning points for each shape as the input of the training set, while the time-bin width is set to 1 ps.
Using the NGSPD, the returning photons from the diffuser
and the background are gated out from those from the tar-
get. Thus, the temporal histogram has no effective signal at arrival times earlier than the TOF position of the target. Hence, we discard most of the time bins beyond the position of the target, leaving only the 80 time bins that contain the signal from the target as the input of the CNN. The temporal histograms are normalized before being fed into the CNN.
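For concreteness, the following NumPy sketch shows how histograms of the form of Eq. (1) could be generated. It is a minimal illustration of ours, not the authors' code: the 35-mm transverse extent, the Gaussian approximation of s(α), and all function and parameter names are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

C = 3e8  # speed of light (m/s)

def simulate_histograms(target, z=0.12, extent=0.035, n_scan=8,
                        n_bins=80, bin_w=1e-12, b=2.3, fwhm=10e-12):
    """Generate c(u,v,t) per Eq. (1) for a binary 32x32 target at depth z.

    Assumed (not stated in the paper): scan grid and target share a
    35-mm extent; s(alpha) is Gaussian with a 60-deg FWHM.
    """
    ny, nx = target.shape
    xs = np.linspace(-extent / 2, extent / 2, nx)
    X, Y = np.meshgrid(xs, np.linspace(-extent / 2, extent / 2, ny))
    us = np.linspace(-extent / 2, extent / 2, n_scan)

    sig_a = np.deg2rad(60) / 2.355          # Gaussian width for s(alpha)
    t0 = 2 * z / C                          # earliest possible return time
    hist = np.zeros((n_scan, n_scan, n_bins))

    for i, u in enumerate(us):
        for j, v in enumerate(us):
            r = np.sqrt((u - X) ** 2 + (v - Y) ** 2 + z ** 2)
            alpha = np.arccos(z / r)        # scattering angle toward (x,y,z)
            w = target * np.exp(-alpha ** 2 / (2 * sig_a ** 2)) / r ** b
            k = np.round((2 * r / C - t0) / bin_w).astype(int)  # delta term
            ok = (k >= 0) & (k < n_bins)
            np.add.at(hist[i, j], k[ok], w[ok])

    # Convolve with the 10-ps-FWHM system impulse response h(t)
    hist = gaussian_filter1d(hist, sigma=fwhm / 2.355 / bin_w, axis=-1)
    peak = hist.max(axis=-1, keepdims=True)
    return hist / np.maximum(peak, 1e-12)   # normalize each histogram
```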
The CNN is built using Keras [35]. We first use three-
dimensional convolutional layers for feature extraction,
and use one dense layer to downsample the features
and then reshape them into two-dimensional images, and
finally use a series of two-dimensional convolutional lay-
ers to optimize the result. The structure of the neural
network is shown in Fig. 2. All the layers use the ReLU function as the activation function, except for the dense layer, which uses the sigmoid function. All the simulated data are
used as the training dataset. During the training process,
we first randomize the sequence of the dataset and divide
the dataset into five parts, with each part having 4000
shapes. Then, we use k-fold cross validation to address the possible overfitting problem caused by the randomness of the training and validation datasets. Finally, the CNN is fit to all the training data, and the mean-squared error is used as the loss function. Consequently, we build a three-dimensional-to-two-dimensional NLOS image-reconstruction model using the CNN, with an input size of 8 × 8 scanning positions × 80 time bins and an output size of 32 × 32 pixels. We also work on a more compressed case with 4 × 4 scanning positions and a lower-temporal-resolution case, where the structures of the CNNs are the same but they are trained using the respective simulated datasets. The training is conducted on a computer with an Intel Core i9-10900 CPU, 64 GB RAM, and an NVIDIA GeForce RTX 3080 GPU. In the training process, the learning rate is set to 1 × 10⁻⁵, and the CNN converges in 200 epochs. Even though a simple CNN model is being
used in this work, note that other deep learning models
with more sophisticated architectures incorporating physical priors [36] may provide better reconstruction quality;
see the Supplemental Material [37] where a simple U-Net
is used as a comparison.
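As a rough illustration, a Keras model consistent with Fig. 2 and the description above could look as follows. The kernel sizes (3 × 3 × 32 and 3 × 3), the sigmoid dense down-sampler, the ReLU activations elsewhere, the mean-squared-error loss, and the 1 × 10⁻⁵ learning rate follow the text; the layer counts, channel widths, and the choice of the Adam optimizer are our placeholders, since the paper does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_nlos_cnn(n_scan=8, n_bins=80, out_px=32, ch=16):
    """3D-to-2D reconstruction CNN: (8, 8, 80) histograms -> 32x32 image."""
    inp = layers.Input((n_scan, n_scan, n_bins, 1))
    # 3D feature extraction: Conv3D / Conv3DTranspose with kernel 3x3x32
    x = layers.Conv3D(ch, (3, 3, 32), padding='same', activation='relu')(inp)
    x = layers.Conv3DTranspose(ch, (3, 3, 32), padding='same',
                               activation='relu')(x)
    x = layers.Conv3D(ch, (3, 3, 32), padding='same', activation='relu')(x)
    # Flatten, sigmoid dense down-sampler, reshape to a 2D image
    x = layers.Flatten()(x)
    x = layers.Dense(out_px * out_px, activation='sigmoid')(x)
    x = layers.Reshape((out_px, out_px, 1))(x)
    # 2D feature extraction: Conv2D with kernel 3x3
    x = layers.Conv2D(ch, (3, 3), padding='same', activation='relu')(x)
    out = layers.Conv2D(1, (3, 3), padding='same', activation='relu')(x)
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss='mse')
    return model

# e.g., 5-fold cross validation over the 20 000 simulated shapes, then
# model.fit(train_x[..., None], train_y[..., None], epochs=200)
```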
To capture the experiment data for each target, the
MEMS mirror is steered to raster scan the probe laser
beam on the diffuser. To train the CNN by using purely
simulated data, the experiment data has to closely resem-
ble the simulated data. This places stringent requirements
on noise rejection and temporal resolution of the pho-
ton detection. The former helps to mitigate anomalous spikes in the photon arrival-time histogram, and the latter
is crucial to fully capture the geometrical information of
the target. To minimize the effect of speckle noise (arising from the rough surface of the diffusive wall) that typically
causes discrepancy between the simulated and experiment
temporal histograms (see the Appendix) [38], at each scanning position, we measure the temporal histograms of five neighboring scanning points and calculate their average as the final histogram for each pixel. The scanning pattern is
shown in the middle-left inset of Fig. 1. In this way, the
random photon-number spike in the temporal histogram
caused by the speckle field can be mitigated, reducing the dissimilarity between the experiment data and the simulated data. At each scanning point, the dwell time per delay is 2 ms, so that the total dwell time per delay for each pixel is summed up to 10 ms [29]. The temporal scanning interval is 1 ps. We scan 16 × 16 pixels for each target in
order to reconstruct the image using the LCT algorithm [5]
shown in the third row in Fig. 3, and then downsample the
[Fig. 3 image grid: each panel spans X, Y = −17.5 to 17.5 mm; the rows are labeled Ground Truth; LCT, 20-ps temporal resolution, 16×16 pixels; LCT, 16×16 pixels; CNN simulation and experiment, 8×8 pixels; CNN simulation and experiment, 4×4 pixels; CNN simulation and experiment, 8×8 pixels at 20-ps temporal resolution; color bars span 0.0 to 1.0.]
FIG. 3. NLOS imaging results for the different letters of "NLOS." Each column lists the ground truth and reconstruction results of one letter. The ground truths of the letters are listed in the first row. The next two rows list the images reconstructed using the LCT method, the second row for the simulated 20-ps temporal-resolution data and the third row for the experimental 10-ps-resolution data. The other six rows are results reconstructed using the CNN. The fourth and fifth rows are the reconstruction results using 4 × 4 pixels from the simulated data and the experiment data, respectively. The sixth and seventh rows are the reconstruction results of the 20-ps temporal-resolution data using 8 × 8 pixels. The eighth and ninth rows are the reconstruction results using 8 × 8 pixels. The color bar indicates the normalized reflectivity of the surface.
experiment data to 8 ×8 pixels as the input of the neural
network (the ninth row in Fig. 3).
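A minimal sketch of this preprocessing step, under our assumptions about the data layout (five speckle realizations per pixel and block averaging for the 16 × 16 → 8 × 8 downsampling, neither of which the text specifies):

```python
import numpy as np

def preprocess(hist5, block=2):
    """Average the five per-pixel speckle histograms, then downsample
    the scan grid (16x16 -> 8x8 for block=2).

    hist5: (5, 16, 16, n_bins) array of temporal histograms, one per
    neighboring scanning point (layout assumed).
    """
    h = hist5.mean(axis=0)                      # mitigate speckle spikes
    n = h.shape[0] // block
    h = h.reshape(n, block, n, block, -1).mean(axis=(1, 3))
    peak = h.max(axis=-1, keepdims=True)
    return h / np.maximum(peak, 1e-12)          # per-pixel normalization
```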
IV. COMPRESSIVE NLOS IMAGE
RECONSTRUCTION
Four shapes of English letters “NLOS” are prepared as
the target in the experiment. The results are shown in Fig.
3. The CNN retrieves the 32 ×32 pixel spatial image of
the target from 8 × 8 temporal histograms, showing excellent agreement with the simulation results. The eighth row in Fig. 3 shows the predicted results from the simulated data, and the ninth row shows the results from the experiment data. As a comparison, although using 16 × 16 scanning points, the LCT reconstructed results are not as clear as the CNN reconstructed results, as shown in the third row of Fig. 3. A three-dimensional Gaussian filter has been applied to the LCT results to remove the background noise. The quality of the reconstructed results is evaluated by cross-correlating the result figures with the ground truth. The similarity evaluation criterion is defined as the maximum value of the cross-correlation result divided by the maximum of the autocorrelation of the ground truth, as listed in Table I. The CNN reconstructed results (image size 32 × 32) are directly compared with the training labels, while the LCT reconstructed results (image size 16 × 16) are compared with the resized training labels. The
evaluation indicates that the CNN reconstructed images
have higher similarity to the ground truth than the LCT
ones; see Supplemental Material [37] for the raw experi-
mental data and the three-dimensional point clouds of the
LCT results.
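For reference, the similarity criterion above admits a compact implementation; the sketch below is ours and assumes the images are compared as stored, with no normalization beyond that already applied.

```python
import numpy as np
from scipy.signal import correlate2d

def similarity(recon, truth):
    """Table I criterion: peak of the cross-correlation with the ground
    truth, divided by the peak of the ground-truth autocorrelation."""
    xc = correlate2d(recon, truth, mode='full').max()
    ac = correlate2d(truth, truth, mode='full').max()
    return xc / ac
```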
Several results using other reconstruction conditions are
also shown in Fig. 3. A more compressed case spatially downsamples the input data even further and uses only 4 × 4 pixels as input, as shown in the fourth and fifth rows
of Fig. 3. The other decreases the temporal resolution to
about 20 ps in both the simulation and experiment. For this
case, we simulate the temporal histograms with 20-ps tem-
poral resolution as the simulated data. For the experiment
data, we convolve the experimental temporal histograms on each pixel with a Gaussian function of 17-ps FWHM, so that the resulting impulse response is 20 ps. The time-bin width of the histograms remains 1 ps. For each shape, the CNN-predicted experiment results using 8 × 8 pixels have the highest similarity to the ground truth, as shown in the seventh and eighth rows of Table I. The 4 × 4 pixel input case provides reconstruction results with lower similarity. The reconstructed shapes of the letters are distorted but still show a coarse profile indicating the letter, as evaluated in the third and fourth rows of Table I. The reduced-temporal-resolution results, although using 8 × 8 scanning points as the input, have even lower reconstruction quality compared with the results using 4 × 4
scanning points. The lower-temporal-resolution experiment results using the LCT method with 16 × 16 pixels are shown in the second row of Fig. 3, and the CNN reconstructed results are shown in the sixth and seventh rows of Fig. 3. The result images become coarser, and the letters cannot be distinguished, indicating the loss of detailed spatial information, as the evaluation results in the second, fifth, and sixth rows of Table I are among the lowest. This indicates the relevance of adequate temporal information for NLOS image reconstruction. Regarding the spatial resolution of the system, Δx = cΔt√[(x/2)² + z²]/x ≈ 1.1 cm, the reconstruction results show a clear shape beyond the spatial resolution [32]; a worked numeric check follows this paragraph. The reconstructed
TABLE I. Evaluation of the reconstruction results using cross-correlation.

Target                                                                              N shape   L shape   O shape   S shape
LCT reconstruction using 16 × 16 scanning points (experiment)                        0.588     0.526     0.707     0.702
LCT reconstruction of 20-ps resolution using 16 × 16 scanning points (experiment)    0.450     0.456     0.520     0.540
CNN reconstruction using 4 × 4 scanning points (simulation)                          0.502     0.549     0.808     0.689
CNN reconstruction using 4 × 4 scanning points (experiment)                          0.458     0.490     0.713     0.692
CNN reconstruction of 20-ps resolution using 8 × 8 scanning points (simulation)      0.453     0.488     0.740     0.587
CNN reconstruction of 20-ps resolution using 8 × 8 scanning points (experiment)      0.418     0.459     0.642     0.600
CNN reconstruction using 8 × 8 scanning points (simulation)                          0.644     0.642     0.850     0.786
CNN reconstruction using 8 × 8 scanning points (experiment)                          0.596     0.507     0.898     0.798
letters do not exist in the training dataset, yet the CNN still provides results that match the ground truths, which indicates that the trained CNN builds a model projecting from the temporal data to the spatial image.
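As a worked check of this resolution estimate (our arithmetic, assuming the 35-mm scan aperture implied by the −17.5 to 17.5 mm axes of Fig. 3, the 12-cm target standoff, and the 10-ps temporal resolution):

```latex
\Delta x \approx \frac{c\,\Delta t\,\sqrt{(x/2)^{2}+z^{2}}}{x}
         = \frac{(3\times10^{8}\,\mathrm{m/s})(10\,\mathrm{ps})\,
                 \sqrt{(17.5\,\mathrm{mm})^{2}+(120\,\mathrm{mm})^{2}}}{35\,\mathrm{mm}}
         \approx 1.0\,\mathrm{cm},
```

consistent with the ≈1.1-cm value quoted above.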
The capability of capturing the NLOS signal at high temporal resolution underpins the CNN reconstruction results. First, the differences in the temporal histograms of different targets lead to different reconstructed images. Several samples of the normalized temporal histograms from both simulation and experiment are shown in Fig. 4. The temporal histograms at the center scanning point differ little between targets, as shown in the third row of Fig. 4. The reason is that the TOF of the back-scattered photons from the edge of the target to the center scanning point does not differ much; thus
scanning points, however, the temporal histograms differ
from each other because of the target shape difference.
On these scanning points, the TOF difference from differ-
ent locations of the target contributes to a longer tail in
the histogram, which provides the information for recon-
structing the shape. Second, considering that the CNN is trained using simulated data, the experimental histograms are required to match those of the simulation. In this case, high temporal resolution is required for capturing the slight differences between the histograms of the
targets. As a comparison, reconstruction results of lower
temporal resolution have even lower quality than the spa-
tially downsampled case using 4 ×4 pixels. Additionally,
the fluctuation in the temporal histogram, caused by the
scattering randomness of the diffuser and the target, is
mitigated by averaging the histograms from nearby
[Fig. 4 plots: normalized photon counts (0 to 1) versus delay (0 to 80 ps) for simulation and experiment, for the letters N, L, O, and S, at the top-left, top-right, middle, bottom-left, and bottom-right scanning points.]
FIG. 4. Comparison of the temporal histograms between the simulated data and the experimentally captured data. Each column in the figure indicates one target letter, while each row lists the temporal histogram from the same scanning point. The picked scanning points are at the corners and the middle of the scanning area, as labeled. The histograms are normalized at each scanning point.
[Fig. 5 plots: temporal histograms versus delay (0 to 100 ps) at the top-left, top-right, middle, bottom-left, and bottom-right positions; the first row shows the summed-up histograms, and the rows below show the normalized photon counts of the five individual points.]
FIG. 5. Temporal histograms of the letter-N target at different points. The first row shows the summed-up histograms at the top-left, top-right, middle, bottom-left, and bottom-right positions. The second to sixth rows list the temporal histograms at the five scanning points that make up each combined histogram. Each column corresponds to one combined temporal histogram on the diffuser. Each row of temporal histograms is normalized to the maximum value of the summed-up histogram to better show the amplitude and shape differences.
scanning points. The remaining discrepancy between the simulation and experiment data causes the residual mismatch between the reconstructed images. From the experiment results, we show that the nonlinear gated single-photon detection system is able to capture the NLOS signal at a temporal resolution high enough that the temporal histograms from the experiment match those from the simulation.
V. DISCUSSION
Image reconstruction of NLOS hidden targets from the
temporal signal has been a long-pursued task, where higher
temporal resolution gives better reconstruction results. We
demonstrate that high temporal resolution enables com-
pressive NLOS image reconstruction through deep learn-
ing. Using a simulated-data-trained CNN, a 32 ×32 pixel
image can be reconstructed using 8 ×8 temporal his-
tograms. The CNN is trained using the simulated data
generated from the LCT model. Given that the exper-
iment histograms match the simulation histograms, the
CNN can interpret the spatially downsampled experi-
ment captured histograms into the NLOS image. Our
results show the potential of enhancing the NLOS data-acquisition speed, which can be helpful for NLOS videography [6,27]. One potential advantage of this deep-learning method is that the CNN, once trained, does not need the iterative optimization calculations of compressive sensing [8], therefore requiring fewer computational resources and offering faster reconstruction. Moreover,
the spatial resolution of the system can be enhanced by
further improving temporal resolution using narrower opti-
cal pulses for the optical gating [28]. In conclusion, we demonstrate that a NLOS imaging modality from three-dimensional data to two-dimensional images is built by combining the nonlinear gated single-photon imaging
system and deep-learning methods. Such a method may
extend some imaging and sensing applications such as
pose estimation [39] and item recognition [40] into the
NLOS scenario.
ACKNOWLEDGMENTS
This material is based upon work supported by the ACC-
New Jersey under Contract No. W15QKN-18-D-0040. We thank Yayuan Li for his support and suggestions.
APPENDIX: MINIMIZING DISCREPANCY
BETWEEN SIMULATION AND EXPERIMENT
DATA
In the data-acquisition process of NLOS imaging, we scan the probe beam over five scanning points at each pixel. This is to mitigate the photon-counting randomness caused
by the scattering effect from the diffuser and the target. An
example of the temporal-histogram differences between the scanning points is shown in Fig. 5. The second to the sixth rows show that, even at very close scanning points, the temporal histograms differ in both amplitude and shape, and are also quite different from the simulation results. This difference is mainly caused by the scattering randomness of the diffuser and the target. We sum up these five histograms and normalize the resulting histogram, as shown in the first row of Fig. 5. This averaging partially compensates for the random fluctuations from the scattering and makes the histogram more similar to the simulated case.
[1] C. Rablau, in Fifteenth Conference on Education and
Training in Optics and Photonics: ETOP 2019, Vol. 11143,
International Society for Optics and Photonics (SPIE, Que-
bec City, Quebec, Canada, 2019), p. 84.
[2] K. Zhang, B. Li, X. Zhu, H. Chen, and G. Sun, NLOS signal
detection based on single orthogonal dual-polarized GNSS
antenna, Int. J. Antennas Propag. 2017, 8548427 (2017).
[3] P. Bruza, A. Petusseau, A. Ulku, J. Gunn, S. Streeter, K.
Samkoe, C. Bruschini, E. Charbon, and B. Pogue, Single-
photon avalanche diode imaging sensor for subsurface
fluorescence LIDAR, Optica 8, 1126 (2021).
[4] A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M.
G. Bawendi, and R. Raskar, Recovering three-dimensional
shape around a corner using ultrafast time-of-flight imag-
ing, Nat. Commun. 3, 745 (2012).
[5] M. O’Toole, D. B. Lindell, and G. Wetzstein, Confocal non-
line-of-sight imaging based on the light-cone transform,
Nature 555, 338 (2018).
[6] X. Feng and L. Gao, Ultrafast light field tomography
for snapshot transient and non-line-of-sight imaging, Nat.
Commun. 12, 2179 (2021).
[7] G. Musarra, A. Lyons, E. Conca, Y. Altmann, F. Villa,
F. Zappa, M. Padgett, and D. Faccio, Non-Line-of-Sight
Three-Dimensional Imaging with a Single-Pixel Camera,
Phys. Rev. Appl. 12, 011002 (2019).
[8] J.-T. Ye, X. Huang, Z.-P. Li, and F. Xu, Compressed sensing
for active non-line-of-sight imaging, Opt. Express 29, 1749
(2021).
[9] G. Barbastathis, A. Ozcan, and G. Situ, On the use of deep
learning for computational imaging, Optica 6, 921 (2019).
[10] J. Peng, Z. Xiong, X. Huang, Z.-P. Li, D. Liu, and F. Xu,
in European Conference on Computer Vision (Springer,
Glasgow, UK, 2020), p. 225.
[11] A. Turpin, G. Musarra, V. Kapitany, F. Tonolini, A. Lyons,
I. Starshynov, F. Villa, E. Conca, F. Fioranelli, R. Murray-
Smith, and D. Faccio, Spatial images from temporal data,
Optica 7, 900 (2020).
[12] F. Wang, C. Wang, C. Deng, S. Han, and G. Situ,
Single-pixel imaging using physics enhanced deep learn-
ing, Photon. Res. 10, 104 (2022).
[13] L. Si, T. Huang, X. Wang, Y. Yao, Y. Dong, R. Liao,
and H. Ma, Deep learning Mueller matrix feature retrieval
from a snapshot Stokes image, Opt. Express 30, 8676
(2022).
[14] J. Li, C. Wang, T. Chen, T. Lu, S. Li, B. Sun, F. Gao, and V.
Ntziachristos, Deep learning-based quantitative optoacous-
tic tomography of deep tissues in the absence of labeled
experimental data, Optica 9, 32 (2022).
[15] M. Buttafava, J. Zeman, A. Tosi, K. Eliceiri, and A. Velten,
Non-line-of-sight imaging using a time-gated single photon
avalanche diode, Opt. Express 23, 20997 (2015).
[16] F. Xu, G. Shulkind, C. Thrampoulidis, J. H. Shapiro, A.
Torralba, F. N. Wong, and G. W. Wornell, Revealing hidden
scenes by photon-efficient occlusion-based opportunistic
active imaging, Opt. Express 26, 9945 (2018).
[17] C. Wu, J. Liu, X. Huang, Z.-P. Li, C. Yu, J.-T. Ye, J. Zhang,
Q. Zhang, X. Dou, V. K. Goyal, F. Xu, and J.-W. Pan,
Non–line-of-sight imaging over 1.43 km, Proc. Natl. Acad.
Sci. 118, 10 (2021).
[18] D. B. Lindell, G. Wetzstein, and M. O’Toole, Wave-based
non-line-of-sight imaging using fast f–k migration, ACM
Trans. Graph. 38, 116 (2019).
[19] X. Liu, I. Guillén, M. La Manna, J. H. Nam, S. A. Reza, T.
H. Le, A. Jarabo, D. Gutierrez, and A. Velten, Non-line-of-
sight imaging using phasor-field virtual wave optics, Nature
572, 620 (2019).
[20] X. Liu, S. Bauer, and A. Velten, Phasor field diffrac-
tion based reconstruction for fast non-line-of-sight imaging
systems, Nat. Commun. 11, 1 (2020).
[21] R. Geng, Y. Hu, and Y. Chen, Recent advances on non-
line-of-sight imaging: Conventional physical models, deep
learning, and new scenes, arXiv preprint arXiv:2104.13807
(2021).
[22] J. G. Chopite, M. B. Hullin, M. Wand, and J. Iseringhausen,
in 2020 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR) (IEEE Computer Society, Los
Alamitos, CA, USA, 2020), p. 957.
[23] J. Peng, F. Mu, J. H. Nam, S. Raghavan, Y. Li, A. Vel-
ten, and Z. Xiong, Towards non-line-of-sight photography,
arXiv preprint arXiv:2109.07783 (2021).
[24] S. Shen, Z. Wang, P. Liu, Z. Pan, R. Li, T. Gao, S. Li, and
J. Yu, Non-line-of-sight imaging via neural transient fields,
IEEE Trans. Pattern Anal. Mach. Intell. 43, 2257 (2021).
[25] W. Chen, F. Wei, K. N. Kutulakos, S. Rusinkiewicz, and
F. Heide, Learned feature embeddings for non-line-of-sight
imaging and recognition, ACM Trans. Graphics (Proc.
SIGGRAPH Asia) 39, 230 (2020).
[26] A. Turpin, V. Kapitany, J. Radford, D. Rovelli, K. Mitchell,
A. Lyons, I. Starshynov, and D. Faccio, 3D Imaging from
Multipath Temporal Echoes, Phys. Rev. Lett. 126, 174301
(2021).
[27] J. H. Nam, E. Brandt, S. Bauer, X. Liu, M. Renna, A.
Tosi, E. Sifakis, and A. Velten, Low-latency time-of-flight
non-line-of-sight imaging at 5 frames per second, Nat.
Commun. 12, 6526 (2021).
[28] B. Wang, M.-Y. Zheng, J.-J. Han, X. Huang, X.-P. Xie,
F. Xu, Q. Zhang, and J.-W. Pan, Non-Line-of-Sight Imag-
ing with Picosecond Temporal Resolution, Phys. Rev. Lett.
127, 053602 (2021).
[29] S. Zhu, Y. M. Sua, P. Rehain, and Y.-P. Huang, Single pho-
ton imaging and sensing of highly obscured objects around
the corner, Opt. Express 29, 40865 (2021).
[30] A. Shahverdi, Y. M. Sua, I. Dickson, M. Garikapati, and
Y.-P. Huang, Mode selective up-conversion detection for
LIDAR applications, Opt. Express 26, 15914 (2018).
[31] P. Rehain, Y. M. Sua, S. Zhu, I. Dickson, B. Muthuswamy,
J. Ramanathan, A. Shahverdi, and Y.-P. Huang, Noise-
tolerant single photon sensitive three-dimensional imager,
Nat. Commun. 11, 921 (2020).
[32] A. A. Pushkina, G. Maltese, J. I. Costa-Filho, P. Patel,
and A. I. Lvovsky, Superresolution Linear Optical Imag-
ing in the Far Field, Phys. Rev. Lett. 127, 253602
(2021).
[33] C. Pei, A. Zhang, Y. Deng, F. Xu, J. Wu, D. U.-L. Li,
H. Qiao, L. Fang, and Q. Dai, Dynamic non-line-of-sight
imaging system based on the optimization of point spread
functions, Opt. Express 29, 32349 (2021).
[34] S. Maruca, P. Rehain, Y. M. Sua, S. Zhu, and Y. Huang,
Non-invasive single photon imaging through strongly scat-
tering media, Opt. Express 29, 9981 (2021).
[35] F. Chollet et al., Keras, https://keras.io (2015).
[36] F. Mu, S. Mo, J. Peng, X. Liu, J. H. Nam, S. Raghavan, A.
Velten, and Y. Li, Physics to the rescue: Deep non-line-of-
sight reconstruction for high-speed imaging, arXiv preprint
arXiv:2205.01679 (2022).
See Supplemental Material at http://link.aps.org/supplemental/10.1103/PhysRevApplied.19.034090 for (i) the
simulated and the experimental results using a U-Net;
and (ii) additional figures for the raw data and the three-
dimensional light-cone transformation reconstructed point
clouds.
[38] I. Starshynov, O. Ghafur, J. Fitches, and D. Faccio, Coher-
ent Control of Light for Non-Line-of-Sight Imaging, Phys.
Rev. Appl. 12, 064045 (2019).
[39] P. Kirkland, V. Kapitany, A. Lyons, J. Soraghan, A. Turpin,
D. Faccio, and G. D. Caterina, in Emerging Imaging
and Sensing Technologies for Security and Defence V;
and Advanced Manufacturing Technologies for Micro- and
Nanosystems in Security and Defence III, Vol. 11540, Inter-
national Society for Optics and Photonics (SPIE, 2020), p.
66.
[40] G. Mora-Martín, A. Turpin, A. Ruget, A. Halimi, R. Hen-
derson, J. Leach, and I. Gyongy, High-speed object detec-
tion with a single-photon time-of-flight image sensor, Opt.
Express 29, 33184 (2021).

Supplementary resource (1)

Data
October 2023
Shenyu Zhu · Yong Meng Sua · Ting Bu · Yu-Ping Huang
... There have been substantial advancements in the field with the exploration of unconventional principles, including the time-of-flight (TOF) of light [3][4][5][6][7][8][9][10][11][12] , speckle correlations [13][14][15] , wavefront shaping 16,17 and others [18][19][20][21][22] . The TOF-based approaches [3][4][5][6][7][8][9][10][11][12] have demonstrated robust three-dimensional (3D) reconstructions in a wide range of scenarios [23][24][25][26][27][28][29][30][31] , and stand out as one of the most promising real-life solutions. Despite their advances, achieving real-time NLOS video for complex and dynamic scenes remains an outstanding obstacle limited by a few factors 2 . ...
... In addition, the computational complexity of the algorithm should allow for fast reconstruction on common central processing units (CPUs) or graphics processing units (GPUs). Consequently, recent NLOS imaging systems typically require capture times of minutes or tens of seconds to reconstruct a room-sized scene [4][5][6][7][8][9][10][11][12][27][28][29][30][31] , imposing limitations on their practical applications. ...
Article
Full-text available
Non-line-of-sight (NLOS) imaging aims at recovering the shape and albedo of hidden objects. Despite recent advances, real-time video of complex and dynamic scenes remains a major challenge owing to the weak signal of multiply scattered light. Here we propose and demonstrate a framework of spectrum filtering and motion compensation to realize high-quality NLOS video for room-sized scenes. Spectrum filtering leverages a wave-based model for denoising and deblurring in the frequency domain, enabling computational image reconstruction with a small number of sampling points. Motion compensation tailored with an interleaved scanning scheme can compute high-resolution live video during the acquisition of low-quality image sequences. Together, we demonstrate live NLOS videos at 4 fps for a variety of dynamic real-life scenes. The results mark a substantial stride toward real-time, large-scale and low-power NLOS imaging and sensing applications.
... Recently, NLOS reconstruction methods based on deep learning have shown remarkable visual gains over traditional methods, as they have a powerful ability to extract features. Existing deep learning-based passive reconstruction methods can be roughly divided into two categories: the hybrid method that is combined with physical models [14,15] and end-to-end learning methods [16][17][18][19][20][21][22][23][24][25][26][27][28][29]. The former can perform effectively even in situations with limited or no available data. ...
... Nevertheless, it still relies on prior knowledge of the reconstruction system for computations like the light transport matrix. The latter end-to-end methods, which do not require prior knowledge of the reconstruction system, can be broadly categorized into two types: convolutional neural networks (CNNs) [16][17][18][19][20][21][22][23][24][25]29] and generative models [26][27][28]. Among these, CNNs, especially U-Net and its variants, are the most widely used due to their exceptional performance in tasks such as image segmentation and image restoration. ...
Article
Full-text available
Passive non-line-of-sight (NLOS) reconstruction has received considerable success in diverse fields. However, the existing reconstruction methods ignore that complex scenes attenuate object-related information and view object-related information and noise in measured images as equivalent, yielding low-quality recovery. We propose an attention-based encoder–decoder (AED) network to tackle this problem. Specifically, we introduce an attention in the attention (A2B) module that can prune the attention layers to help the network focus on the object-related information in the measured images. In addition, we establish several datasets in complex scenes, including varying ambient light conditions and parameter settings of reconstruction systems, as well as complex hidden objects, to verify the generalization of our method. Experiments on our constructed datasets demonstrate that our methods achieve better recovery performance than existing methods, with more robustness to complex scenes.
... To balance acquisition time and reconstruction time, methods based on deep learning have been proposed to learn operators that can recover high spatial resolution signals 33 or learn reconstruction results from under-sampled data directly. 34 Leveraging in-depth physics information is also one of the efficient methods with limited sampling data. 35 These methods can reduce the number of sampling points to a maximum of 4 × 4, enabling near real-time imaging. ...
Article
Full-text available
Real-time non-line-of-sight imaging is crucial for practical applications. Among existing methods, transient methods present the best visual reconstruction ability. However, most transient methods require a long acquisition time, thus failing to deal with real-time imaging tasks. Here, we provide a dual optical coupling model to describe the spatiotemporal propagation of photons in free space, then propose an efficient non-confocal transformation algorithm and establish a non-confocal time-to-space boundary migration model. Based on these, a scan-free boundary migration method is proposed. The data acquisition speed of the method can reach 151 fps, which is ∼7 times faster than the current fastest data acquisition method, while the overall imaging speed can also reach 19 fps. The background stability brought by fast scan-free acquisition makes the method suitable for dynamic scenes. In addition, the high robustness of the model to noise makes the method have the capability of non-line-of-sight imaging in outdoor environments during the daytime. To further enhance the practicality of this method in real-world scenarios, we exploit the statistical prior and propose a plug-in-and-play super-resolution method to extract higher spatial resolution signals, reducing the detector array requirement from 32 × 32 to 8 × 8 without compromising imaging quality, thus reducing the device expense of detectors.
... Single-pixel Imaging (SPI) design offers a cost-effective multispectral imaging system compared to conventional NLOS imaging. NLOS imaging enables unprecedented capabilities in various applications [5], including internal pipeline inspection, deep downhole object detection imaging, internal topography measurement of special-shaped parts, and detection of special objects in underground caves [6]. Therefore, NLOS imaging based on the SPI technique has received wide attention and a few researchers have concentrated on achieving color imaging by SPI system [7][8][9]. ...
Article
Full-text available
Non-line-of-sight (NLOS) imaging aims to reconstruct objects obscured by direct line of sight. Traditional Single-pixel Imaging (SPI) performs correlation operations on signals through the illumination pattern and intensity of a single-pixel detector. However, the reconstructed result mainly provides spatial information of objects, which limits its practical applications, including autonomous driving and smart cities for defense. In this work, leveraging active correlations-based imaging techniques, a multi-wavelength single-pixel non-line-of-sight (NLOS) reconstruction framework is proposed. By introducing compressive sensing, a Total Variation minimization (TV) RGB color space algorithm is designed for more object information reconstructions via under-sampling. The proposed approach is capable of reconstructing both the space and color information of hidden objects with fine detail under the intermediate reflector and filter settings. The experimental results demonstrate that the proposed scheme achieves a compression rate of 29% and outperforms conventional single-pixel imaging in terms of object information at low sampling rates, having potential practical applications.
Preprint
Full-text available
Normal reconstruction is crucial in non-line-of-sight (NLOS) imaging, as it provides key geometric and lighting information about hidden objects, which significantly improves reconstruction accuracy and scene understanding. However, jointly estimating normals and albedo expands the problem from matrix-valued functions to tensor-valued functions that substantially increasing complexity and computational difficulty. In this paper, we propose a novel joint albedo-surface reconstruction method, which utilizes the Frobenius norm of the shape operator to control the variation rate of the normal field. It is the first attempt to apply regularization methods to the reconstruction of surface normals for hidden objects. By improving the accuracy of the normal field, it enhances detail representation and achieves high-precision reconstruction of hidden object geometry. The proposed method demonstrates robustness and effectiveness on both synthetic and experimental datasets. On transient data captured within 15 seconds, our surface normal-regularized reconstruction model produces more accurate surfaces than recently proposed methods and is 30 times faster than the existing surface reconstruction approach.
Article
Non-line-of-sight (NLOS) imaging has the ability to reconstruct hidden objects, allowing a wide range of applications. Existing NLOS systems rely on pulsed lasers and time-resolved single-photon detectors to capture the information encoded in the time of flight of scattered photons. Despite remarkable advances, the pulsed time-of-flight LIDAR approach has limited temporal resolution and struggles to detect the frequency-associated information directly. Here, we propose and demonstrate the coherent scheme—frequency-modulated continuous wave calibrated by optical frequency comb—for high-resolution NLOS imaging, velocimetry, and vibrometry. Our comb-calibrated coherent sensor presents a system temporal resolution at subpicosecond and its superior signal-to-noise ratio permits NLOS imaging of complex scenes under strong ambient light. We show the capability of NLOS localization and 3D imaging at submillimeter scale and demonstrate NLOS vibrometry sensing at an accuracy of dozen Hertz. Our approach unlocks the coherent LIDAR techniques for widespread use in imaging science and optical sensing.
Article
Non-line-of-sight (NLOS) imaging aims to reconstruct the three-dimensional hidden scenes by using time-of-flight photon information after multiple diffuse reflections. The under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a serious ill-posed inverse problem, the solution of which is highly likely to be degraded due to noises and distortions. In this paper, we propose novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (signal and object)-domain curvature regularization model. In what follows, we develop efficient optimization algorithms relying on the alternating direction method of multipliers (ADMM) with the backtracking stepsize rule, for which all solvers can be implemented on GPUs. We evaluate the proposed algorithms on both synthetic and real datasets, which achieve state-of-the-art performance, especially in the compressed sensing setting. Based on GPU computing, our algorithm is the most effective among iterative methods, balancing reconstruction quality and computational time. All our codes and data are available at https://github.com/Duanlab123/CurvNLOS.
Article
Non-line-of-sight imaging recovers hidden objects around the corner by analyzing the diffuse reflection light on the relay surface that carries hidden scene information. Due to its huge application potential in the fields of autonomous driving, defense, medical imaging, and post-disaster rescue, non-line-of-sight imaging has attracted considerable attention from researchers at home and abroad, especially in recent years. The research on non-line-of-sight imaging primarily focuses on imaging systems, forward models, and reconstruction algorithms. This paper systematically summarizes the existing non-line-of-sight imaging technology in both active and passive scenes, and analyzes the challenges and future directions of non-line-of-sight imaging technology.
Article
Full-text available
A Mueller matrix (MM) provides a comprehensive representation of the polarization properties of a complex medium and encodes very rich information on the macro- and microstructural features. Histopathological features can be characterized by polarization parameters derived from MM. However, a MM must be derived from at least four Stokes vectors corresponding to four different incident polarization states, which makes the qualities of MM very sensitive to small changes in the imaging system or the sample during the exposures, such as fluctuations in illumination light and co-registration of polarization component images. In this work, we use a deep learning approach to retrieve MM-based specific polarimetry basis parameters (PBPs) from a snapshot Stokes vector. This data post-processing method is capable of eliminating errors introduced by multi-exposure, as well as reducing the imaging time and hardware complexity. It shows the potential for accurate MM imaging on dynamic samples or in unstable environments. The translation model is designed based on generative adversarial network with customized loss functions. The effectiveness of the approach was demonstrated on liver and breast tissue slices and blood smears. Finally, we evaluated the performance by quantitative similarity assessment methods in both pixel and image levels.
Article
Full-text available
Deep learning (DL) shows promise for quantitating anatomical features and functional parameters of tissues in quantitative optoacoustic tomography (QOAT), but its application to deep tissue is hindered by a lack of ground truth data. We propose DL-based “QOAT-Net,” which functions without labeled experimental data: a dual-path convolutional network estimates absorption coefficients after training with data-label pairs generated via unsupervised “simulation-to-experiment” data translation. In simulations, phantoms, and ex vivo and in vivo tissues, QOAT-Net affords quantitative absorption images with high spatial resolution. This approach makes DL-based QOAT and other imaging applications feasible in the absence of ground truth data.
Article
Full-text available
Non-Line-Of-Sight (NLOS) imaging aims at recovering the 3D geometry of objects that are hidden from the direct line of sight. One major challenge with this technique is the weak available multibounce signal limiting scene size, capture speed, and reconstruction quality. To overcome this obstacle, we introduce a multipixel time-of-flight non-line-of-sight imaging method combining specifically designed Single Photon Avalanche Diode (SPAD) array detectors with a fast reconstruction algorithm that captures and reconstructs live low-latency videos of non-line-of-sight scenes with natural non-retroreflective objects. We develop a model of the signal-to-noise-ratio of non-line-of-sight imaging and use it to devise a method that reconstructs the scene such that signal-to-noise-ratio, motion blur, angular resolution, and depth resolution are all independent of scene depth suggesting that reconstruction of very large scenes may be possible.
Article
Full-text available
Non-line-of-sight (NLOS) optical imaging and sensing of objects imply new capabilities valuable to autonomous technology, machine vision, and other applications, in which case very few informative photons are buried in strong background counts. Here, we introduce a new approach to NLOS imaging and sensing using the picosecond-gated single photon detection generated by nonlinear frequency conversion. With exceptional signal isolation, this approach can reliably achieve imaging and position retrieval of obscured objects around the corner, in which case only 4 × 10⁻³ photons are needed to be detected per pulse for each pixel with high temporal resolution. Furthermore, the vibration frequencies of different objects can be resolved by analyzing the photon number fluctuation received within a ten-picosecond window, allowing NLOS acoustic sensing. Our results highlight the prospect of photon efficient NLOS imaging and sensing for real-world applications.
Article
Full-text available
Single-pixel imaging (SPI) is a typical computational imaging modality that allows two- and three-dimensional image reconstruction from a one-dimensional bucket signal acquired under structured illumination. It is of particular interest for imaging under low-light conditions and in spectral regions where good cameras are unavailable. However, the resolution of the reconstructed image in SPI depends strongly on the number of measurements in the temporal domain. Data-driven deep learning has been proposed for high-quality image reconstruction from an undersampled bucket signal, but poor generalization prohibits its practical application. Here we propose a physics-enhanced deep learning approach for SPI. By blending a physics-informed layer with a model-driven fine-tuning process, we show that the proposed approach generalizes well for image reconstruction. We implement the proposed method in an in-house SPI setup and an outdoor single-pixel LiDAR system, and demonstrate that it outperforms several other widespread SPI algorithms in terms of both robustness and fidelity. The proposed method establishes a bridge between data-driven and model-driven algorithms, allowing one to impose both data and physics priors on inverse-problem solvers in computational imaging, from remote sensing to microscopy.
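For readers unfamiliar with the SPI forward model, the following minimal sketch shows the classic correlation reconstruction that such learning-based methods improve upon (random binary patterns and a toy scene, not the paper's physics-enhanced network): each pattern P_i yields one bucket value y_i = sum(P_i * x), and a basic estimate of x is the fluctuation-weighted average of the patterns.

import numpy as np

rng = np.random.default_rng(0)
x = np.zeros((32, 32)); x[8:24, 12:20] = 1.0            # hidden test image
patterns = rng.integers(0, 2, size=(4096, 32, 32)).astype(float)
y = np.einsum("ijk,jk->i", patterns, x)                 # 1D bucket signal
x_hat = np.einsum("i,ijk->jk", y - y.mean(), patterns) / len(y)

Undersampling means using far fewer than 32 × 32 patterns, which is where a learned or physics-informed prior must fill in the missing information.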
3D time-of-flight (ToF) imaging is used in a variety of applications such as augmented reality (AR), computer interfaces, robotics, and autonomous systems. Single-photon avalanche diodes (SPADs) are one of the enabling technologies, providing accurate depth data even over long ranges. By developing SPADs in array format with integrated processing, combined with pulsed, flood-type illumination, high-speed 3D capture is possible. However, array sizes tend to be relatively small, limiting the lateral resolution of the resulting depth maps and, consequently, the information that can be extracted from the image for applications such as object detection. In this paper, we demonstrate that these limitations can be overcome through the use of convolutional neural networks (CNNs) for high-performance object detection. We present outdoor results from a portable SPAD camera system that outputs 16-bin photon timing histograms at 64 × 32 spatial resolution, with each histogram containing thousands of photons. The results, obtained with exposure times down to 2 ms (equivalent to 500 FPS) and at signal-to-background ratios (SBR) as low as 0.05, point to the advantages of providing the CNN with full histogram data rather than point clouds alone. Alternatively, a combination of point-cloud and active-intensity data may be used as input for a similar level of performance. In either case, the GPU-accelerated processing time is less than 1 ms per frame, giving an overall latency (image acquisition plus processing) in the millisecond range and making the results relevant for safety-critical computer vision applications that would benefit from faster-than-human reaction times.
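A natural way to feed full histogram data to a 2D CNN is to treat the timing bins as input channels. The PyTorch sketch below shows only this input convention for the 16-bin, 64 × 32 data described above; the backbone layers are placeholders, not the detection network from the paper.

import torch
import torch.nn as nn

# Each pixel's 16-bin ToF histogram becomes 16 input channels, so the
# network sees the full temporal profile instead of a single depth value.
backbone = nn.Sequential(
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
hists = torch.rand(1, 16, 64, 32)   # (batch, bins, height, width)
features = backbone(hists)          # would feed a detection head in practice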
Non-line-of-sight (NLOS) imaging reveals hidden objects reflected from diffusing surfaces or behind scattering media. NLOS reconstruction is usually achieved by computational deconvolution of time-resolved transient data from a scanning single-photon avalanche diode (SPAD) detection system. However, such a system requires a lengthy acquisition, making it impossible to capture dynamic NLOS scenes. We propose a novel SPAD array and an optimization-based computational method to achieve NLOS reconstruction at 20 frames per second (fps). The imaging system's high efficiency drastically reduces the acquisition time for each frame, and the forward-projection optimization method robustly reconstructs NLOS scenes from the low-SNR data collected by the SPAD array. Experiments were conducted over a wide range of dynamic scenes in comparison with confocal and phase-field methods; under the same exposure time, the proposed algorithm shows superior performance among state-of-the-art methods. We also validated the advantages on simulated scenes using quantitative benchmarks such as PSNR, SSIM, and total-variation analysis. Our system is anticipated to have the potential to achieve video-rate NLOS imaging.
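Of the quantitative benchmarks mentioned, PSNR has a standard closed form that is worth stating: PSNR = 10 log10(peak² / MSE). The sketch below computes it on toy data (SSIM and total variation are omitted for brevity; the arrays are placeholders).

import numpy as np

def psnr(reference, estimate, peak=1.0):
    # Peak signal-to-noise ratio in dB for images scaled to [0, peak].
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

ref = np.clip(np.random.rand(64, 64), 0, 1)
noisy = np.clip(ref + 0.05 * np.random.randn(64, 64), 0, 1)
print(f"PSNR: {psnr(ref, noisy):.1f} dB")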
The computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. In a recent development towards practical NLOS imaging, Nam et al. [1] demonstrated a high-speed non-confocal imaging system that operates at 5 Hz, 100× faster than the prior art. This enormous gain in acquisition rate, however, necessitates numerous approximations in light transport, breaking many existing NLOS reconstruction methods that assume an idealized image formation model. To bridge the gap, we present a novel deep model that incorporates the complementary physics priors of wave propagation and volume rendering into a neural network for high-quality and robust NLOS reconstruction. This orchestrated design regularizes the solution space by relaxing the image formation model, resulting in a deep model that generalizes well on real captures despite being trained exclusively on synthetic data. Further, we devise a unified learning framework that enables our model to be trained flexibly with diverse supervision signals, including target intensity images or even raw NLOS transient measurements. Once trained, our model renders both intensity and depth images at inference time in a single forward pass, processing more than 5 captures per second on a high-end GPU. Through extensive qualitative and quantitative experiments, we show that our method outperforms prior physics- and learning-based approaches on both synthetic and real measurements. We anticipate that our method, together with the fast capture system, will accelerate the development of NLOS imaging for real-world applications that require high-speed imaging.
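As a toy illustration of how intensity and depth images can both be read out of a reconstructed hidden-scene volume (a generic post-processing step, not the paper's learned renderer), one can take a maximum projection along the depth axis and record where that maximum occurs:

import numpy as np

volume = np.random.rand(64, 32, 32)   # stand-in (depth, height, width) reconstruction
intensity = volume.max(axis=0)        # (32, 32) intensity image
depth_map = volume.argmax(axis=0)     # (32, 32) map of depth-bin indices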
The resolution of optical imaging devices is ultimately limited by the diffraction of light. To circumvent this limit, modern superresolution microscopy techniques employ active interaction with the object by exploiting its optical nonlinearities, nonclassical properties of the illumination beam, or near-field probing. Thus, they are not applicable whenever such interaction is not possible, for example, in astronomy or noninvasive biological imaging. Far-field, linear optical superresolution techniques based on passive analysis of the light coming from the object would cover these gaps. In this Letter, we present the first proof-of-principle demonstration of such a technique for 2D imaging. It works by accessing information about the spatial correlations of the image optical field, and hence about the object itself, via measuring projections onto Hermite-Gaussian transverse spatial modes. With a basis of 21 spatial modes in each transverse dimension, we perform two-dimensional imaging with a twofold resolution enhancement beyond the diffraction limit.
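The numerical sketch below illustrates the measurement principle only (toy field, illustrative grid and normalization, not the experimental procedure): a 2D field is projected onto Hermite-Gaussian modes HG_nm, whose profile is H_n(√2 x/w) exp(-x²/w²), and the 21 × 21 overlap coefficients carry the spatial-correlation information used for reconstruction.

import numpy as np
from numpy.polynomial.hermite import hermval

x = np.linspace(-4, 4, 256)
dx = x[1] - x[0]

def hg(n, x, w=1.0):
    # 1D Hermite-Gaussian amplitude profile, normalized numerically.
    c = np.zeros(n + 1); c[n] = 1.0          # select physicists' H_n
    mode = hermval(np.sqrt(2) * x / w, c) * np.exp(-(x / w) ** 2)
    return mode / np.sqrt(np.sum(mode**2) * dx)

field = np.exp(-((x[:, None] - 0.3) ** 2 + x[None, :] ** 2))  # toy image field
coeffs = np.array([[np.sum(field * hg(n, x)[:, None] * hg(m, x)[None, :]) * dx**2
                    for m in range(21)] for n in range(21)])  # 21x21 projections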