Adams 1 32nd Annual AIAA/USU
Conference on Small Satellites
SSC18-x-x
A Near Real Time Space Based Computer Vision System for Accurate Terrain Mapping
Caleb Adams
University of Georgia Small Satellite Research Laboratory
106 Pineview Court, Athens, GA 30606
CalebAshmoreAdams@gmail.com
Faculty Advisor: Dr. David Cotten
Center for Geospatial Research, UGA Small Satellite Research Laboratory
ABSTRACT
The Multiview Onboard Computational Imager (MOCI) is a 3U cube satellite designed to convert high resolution
imagery, 4K images at 8m Ground Sample Distance (GSD), into useful end data products in near real time. The
primary data products that MOCI seeks to provide are 3D terrain models of the surface of Earth that can be directly
compared to the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) v3 global Digital
Elevation Model (DEM). MOCI utilizes an Nvidia TX2 Graphics Processing Unit (GPU)/System on a Chip (SoC) to
perform the complex calculations required for such a task. The reconstruction problem, which MOCI can solve,
contains many complex computer vision subroutines that can be used in less complicated computer vision pipelines.
INTRODUCTION
This paper does not seek to describe the entire satellite
system; it seeks to describe, in detail, the complex
computation system that MOCI will utilize to generate
scientific data on orbit. An overview of the satellite
system and optical system is provided for clarity and
context. The subroutines in MOCI’s primary computer
vision pipeline are described in detail over the course
of this paper.
System Overview
The MOCI satellite primarily uses Commercial Off the
Shelf (COTS) hardware so that the focus can be on
payload development. The MAI-401 with a star-tracker
is utilized to achieve the necessary pointing
requirements. The GomSpace BP4 P60 Electrical Power
System (EPS) is used. The F’Sati Ultra High Frequency
(UHF) transceiver and F’Sati S-Band transmitter are
used for communications of commands, telemetry, and
science data. A Clyde Space On Board Computer (OBC)
is used as the main flight computer. The payload uses the
Nvidia TX2 SoC as the high-performance computation
unit and a custom optical system developed by
Ruda-Cardinal that produces 4K images at 8m GSD from
a 400km orbit.
Surface Reconstruction Pipeline
In our case, a computer vision pipeline, sometimes
referred to as a workflow [1], consists of a set of chained
computer vision subroutines where the output of each
subroutine is the input to the next. Subroutines are often
referred to as stages in this sense [1]. MOCI implements a
Surface Reconstruction Pipeline with the initial inputs
being a set of images, the position of the spacecraft per
image, and the orientation of the spacecraft per image.
The first stage in the pipeline is the feature detection
stage.
Figure 1: Multiview Reconstruction [2]
The feature detection stage identifies regions within each
image that should be considered for feature description.
This stage produces a set of points at location $(x, y)$,
scale $s$, and rotation $\theta$. The feature description stage
takes this set and encodes the information from local
regions into a feature vector $f$. The next stage is the
feature matching stage, which seeks to find the best
correspondence, or minimum difference, between the sets
of points in the images. Once points have been matched
between images, they need to be placed into $\mathbb{R}^3$
from $\mathbb{R}^2$ for reprojection. Vectors are made at the
position of each matched point and used to calculate the
point of minimum distance between all projected lines,
which is taken as the point of intersection. The output of
the reprojection is a set of points in $\mathbb{R}^3$. After the
initial reprojection a Bundle Adjustment is performed as the
next stage in the pipeline. This takes the sets of points
and uses camera data to estimate the reprojection error
and remove it from the generated points. The result after
the Bundle Adjustment is a more accurate point cloud.
The final stage is surface reconstruction. First, normals
are calculated for the set of points to make an oriented
point set. The oriented point set is then used for a Poisson
Surface Reconstruction to make the final data product. Any
additional computer vision subroutines discussed here are
not part of MOCI’s primary pipeline.
RELATED WORK
The techniques relayed here are not new, but are built
from well understood algorithms and computer vision
subroutines. The implementations of these well
understood principles are built from previous work that
the University of Georgia (UGA) Small Satellite
Research Laboratory (SSRL) has done. The adaptation
of structure from motion and real time mapping for aerial
based photogrammetry and autonomous robotics is
commonplace.
Multiview Reconstruction
GPU accelerated Structure from Motion mechanics have
been implemented on many occasions. Changchang
Wu’s research to develop an incremental approach to
Structure from Motion demonstrated that it was possible
to solve the reconstruction problem in $O(n)$ rather than
$O(n^4)$, greatly improving efficiency and speed [3].
Additional research has shown that the triangulation
problem can achieve a 40x speedup [4] when utilizing
Compute Unified Device Architecture (CUDA) capable
Nvidia GPUs. Additionally, multicore GPU Bundle
Adjustment has been shown to achieve a 30x speedup
over previous implementations. GPU accelerated feature
detectors and descriptors are now commonplace. They
can typically identify and extract features within a few
milliseconds [5], which allows near real time extraction
of input information into the pipeline. A standard Poisson
Reconstruction, fundamentally limited by the octree data
structure, can run two orders of magnitude [6] faster when
implemented on a CUDA capable GPU.
Surface Normal Calculation
A key problem in a surface reconstruction or structure
from motion pipeline is the generation of an oriented
point set. Recent research, testing the feasibility of
generating oriented point sets from cube satellites, has
claimed that normal estimation is only accurate between
5° and 29° when utilizing 2m GSD imagery [7].
Additionally, new techniques have recently been
demonstrated showing that a Randomized Hough
Transform can preserve sharp features, improving the
accuracy of point normals while being almost an order of
magnitude faster [8]. It is expected that more efficient point
normalization methods will improve the accuracy of the
3D models generated by MOCI.
Cloud Height and Planetary Modeling
With previous studies, we have shown that image data
from the International Space Station (ISS) High
Definition Earth-Viewing System (HDEV) can produce
accurate cloud height models within 5.926 7.012 km [9].
Additionally, available structure from motion pipelines
and surface reconstruction, in conjunction with the
SSRL’s custom simulation software, have been used to
demonstrate that 3D surfaces of mountain ranges can
be reconstructed. The SSRL has demonstrated that the
proposed pipeline can generate 3D models of large
geographic features within 68.2% accuracy of ASTER
v3 global DEM data [10], resulting in an approximately
10m resolution surface model.
PAYLOAD SYSTEM OVERVIEW
A simple overview of the system is provided in this
section to make later sections about scientific
computations clearer. The payload sits at the top of
the electronics stack of the MOCI system and contains
the Nvidia TX2, an optical assembly, an e-con Systems
See3CAM_CU135 with an AR1335 image sensor from
ON Semiconductor, and a Core GPU Interface (CORGI)
Board that connects all the subsystems together and allows
them to communicate over a standard PC104+ bus.
Figure 2: Payload Electronics
Data Interfacing and Power
The Nvidia TX2 is interfaced to the CORGI Board via a
400-pin connector. The CORGI Board is connected to
the See3CAM_CU135 interface board via a Universal
Serial Bus type C (USB-C) connector, allowing 4K
image data to be streamed to the GPU. The CORGI Board
also routes an Inter-Integrated Circuit (I2C) bus, Ethernet,
and a Serial Peripheral Interface (SPI) into the satellite’s
PC104+ bus. The TX2’s maximum power draw is 7W,
but current computations are only running at
approximately 4.5W.
Thermal Properties
For a worst-case thermal analysis, an unrealistic power
draw of 14W is used. Additionally, a system with 0%
efficiency was assumed as a worst-case scenario. A
maximum temperature of approximately 51 °C was
simulated under these conditions. The TX2 is attached to
a Thermal Transfer Plate and simulations have shown
that the max operating temperature is sustainable for the
system. Further, more detailed, research will soon be
published on how we have managed these thermal
conditions.
Optical System
The SSRL is partnering with Ruda-Cardinal to make a
custom optical assembly capable of generating images at
a resolution of at most 8m GSD. The optical system has
a 4.5° Field of View (FOV) and an effective focal length
of 120mm.
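As a rough consistency check on the figures quoted above, the ground swath and GSD can be estimated from the altitude and field of view. This is only a sketch under stated assumptions: a flat Earth, a nadir-pointing camera, and an image width of 4096 pixels (the text says only "4K", so the pixel count is an assumption).

```python
import math

def swath_km(altitude_km: float, fov_deg: float) -> float:
    """Ground swath width for a nadir-pointing camera over a flat Earth."""
    return 2.0 * altitude_km * math.tan(math.radians(fov_deg) / 2.0)

def gsd_m(swath_width_km: float, pixels_across: int) -> float:
    """Ground sample distance: swath width divided by the pixels across it."""
    return swath_width_km * 1000.0 / pixels_across

swath = swath_km(400.0, 4.5)   # altitude and FOV quoted in the text
gsd = gsd_m(swath, 4096)       # 4096 px is an assumed 4K image width
print(f"swath = {swath:.1f} km, GSD = {gsd:.2f} m")
```

Under these assumptions the estimate comes out slightly below 8m, which agrees with the stated requirement.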
Figure 3: MOCI Optical Assembly
GPU SYSTEM OVERVIEW
The GPU (Nvidia TX2) is a complete SoC, capable of
running GNU/Linux on an ARM v8 CPU with a Tegra
GPU based on the Pascal architecture. The TX2 has
256 CUDA cores, 8 GB of 128-bit LPDDR4 memory, and
32 GB of eMMC storage.
Radiation Mitigation
The primary concerns in LEO are Single Event Upsets
(SEU), Single Event Functional Interrupts (SEFI), and
Single Event Latchups (SEL) [11]. These are certainly
concerns for a dense SoC like the TX2. Thus, MOCI will
utilize aluminized Kapton as a thin layer of protection for
the payload. Software mitigation is also implemented.
The Clyde Space OBC contains hardware-encoded ECC
and could flash a new image onto the TX2 if necessary.
The TX2 also utilizes a custom implementation of
software-encoded error correction coding (ECC).
Further, more detailed, research will soon be published
on how we have managed and characterized these
radiation conditions.
Compute Unified Device Architecture
Currently the TX2 utilizes CUDA 9.0. CUDA capable
GPUs can parallelize tasks, leading to computational
speeds orders of magnitude higher than those of a CPU
system performing the same operations. CUDA’s
parallel computation model is composed of a grid that
contains blocks made up of threads. The TX2 can handle
up to 65535 blocks per dimension, leading to a potential
total of $2.81 \times 10^{14}$ blocks, each containing a maximum
of 1024 threads. The potential for parallelization here is
substantial, and is the key to developing a near real time
computer vision system.
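The block count quoted above follows directly from the per-dimension limit. The short calculation below reproduces it; the two constants are the limits stated in the text, and the totals describe how many blocks and threads a single launch can address, not how many run concurrently.

```python
MAX_BLOCKS_PER_DIM = 65535      # per-dimension grid limit quoted in the text
MAX_THREADS_PER_BLOCK = 1024    # per-block thread limit quoted in the text

# A grid has three dimensions, so the addressable block count is the cube.
total_blocks = MAX_BLOCKS_PER_DIM ** 3
total_threads = total_blocks * MAX_THREADS_PER_BLOCK

print(f"{total_blocks:.3e} blocks, {total_threads:.3e} threads")
```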
GPU Accelerated Linear Algebra
In Hartley’s survey paper on optimal algorithms in
Multiview Geometry, every algorithm he identifies
benefits greatly from highly optimized matrix operations
that are made possible by the massive parallelization that
CUDA enables [12]. Furthermore, widely available linear
algebra libraries, such as the Basic Linear Algebra
Subprograms (BLAS), have been accelerated with CUDA [13].
These accelerated libraries, such as cuBLAS and
cuSOLVER, are critical to improving the functions
needed in complex computer vision pipelines.
FEATURE DETECTION AND DESCRIPTION
After images have been acquired from the payload
system, the first step in the pipeline is feature detection.
Typically, feature detection attempts to identify regions
within an image that should be considered for feature
description [14]. Feature descriptions are only given to
candidate regions/points that meet the requirements of
the algorithm. For our purposes, we utilize the Scale
Invariant Feature Transform (SIFT) developed by Lowe.
The SIFT algorithm, which contains several standard
subroutines, has become a standard in computer vision.
Detection of Scale Space Extrema
To detect features that are scale-invariant, the SIFT
algorithm uses a Difference of Gaussians (DoG) to
identify local extrema in scale-space. The Laplacian of
Gaussians (LoG) is often used to detect stable features in
scale-space [14]. The scale space of an image is defined
by a function $L(x, y, \sigma)$, which is produced by the
convolution of a scale space Gaussian [15], $G(x, y, \sigma)$,
with the input image, $I(x, y)$, where $*$ is the convolution
operation between functions:
$$ L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{1} $$
An efficient way to calculate the DoG function,
$D(x, y, \sigma)$, is to simply compute the difference of two
nearby scales separated by a constant factor $k$:
$$ D(x, y, \sigma) = \left( G(x, y, k\sigma) - G(x, y, \sigma) \right) * I(x, y) \tag{2} $$
$$ D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{3} $$
The Gaussian kernel is convolved with the input image
to form a Gaussian Scale Pyramid.
Figure 4: The DoG in Scale-Space [15]
To detect the local minima and maxima of the DoG
function, the candidate point is compared to its 8 local
neighbors at its current scale, the 9 neighbors in the scale
above, and the 9 neighbors in the scale below in the DoG
pyramid. For ease of computation, the scale separation
$k$ is chosen to be $k = 2^{1/s}$, where $s$ is an integer
number of intervals into which each octave of scale space,
a doubling of $\sigma$, is divided [15].
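The scale ladder and the 26-neighbor comparison described above can be sketched in a few lines. This is an illustrative CPU version, not MOCI's CUDA implementation: `octave_sigmas` and `is_extremum` are hypothetical helper names, the DoG stack is assumed to be a nested list indexed as `dog[level][row][col]`, and the `s + 3` images per octave follow Lowe's description of SIFT.

```python
def octave_sigmas(sigma0: float, s: int) -> list[float]:
    """Scales sigma0 * k**i with k = 2**(1/s); s + 3 blurred images span one octave."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

def is_extremum(dog, level: int, row: int, col: int) -> bool:
    """True if dog[level][row][col] is strictly above or below all 26 neighbors."""
    center = dog[level][row][col]
    neighbors = [
        dog[level + dl][row + dr][col + dc]
        for dl in (-1, 0, 1)      # scale below, same scale, scale above
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dl, dr, dc) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)
```

With `s = 3`, the fourth entry returned by `octave_sigmas` is twice the first, which is the doubling of the scale that defines an octave.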
Keypoint Localization and Filtering
Given a set of candidate points from the detection of
scale-space extrema in the DoG, the challenge is to
localize each point and then filter it by the ratio of its
principal curvatures. A method proposed by Brown uses a
Taylor expansion of the scale-space function, $D(x, y, \sigma)$,
where the origin is at the sample point [16]:
$$ D(\mathbf{x}) = D + \frac{\partial D^T}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x} \tag{4} $$
The derivatives are evaluated at the sample point and the
offset from the sample point is defined as
$\mathbf{x} = (x, y, \sigma)^T$. The local extremum $\hat{\mathbf{x}}$ is given
by taking the derivative of the function with respect to
$\mathbf{x}$ and setting it equal to zero:
$$ \hat{\mathbf{x}} = -\left( \frac{\partial^2 D}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}} \tag{5} $$
Often additional calculations are performed to eliminate
unstable extrema and edge responses. To eliminate
strong edge responses, which a DoG will often produce,
the principal curvatures are computed from a Hessian
matrix containing the second partial derivatives of the DoG
function, $D(x, y, \sigma)$:
$$ H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \tag{6} $$
Harris and Stephens have shown that we need only be
concerned with the ratio of the eigenvalues [17]. We let
$\alpha$ be the eigenvalue with the largest magnitude and
$\beta$ be the eigenvalue with the smaller magnitude. Then we
compute the trace and the determinant of $H$:
$$ \mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta \tag{7} $$
$$ \mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta \tag{8} $$
We then let $r$ be the ratio between the largest and
smallest eigenvalues such that $\alpha = r\beta$. Then we
find:
$$ \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r} \tag{9} $$
We can then use this ratio as a cutoff for undesired
edge points. Typically, a value of $r = 10$ is used [15] to
eliminate keypoints whose ratio of principal curvatures is
greater than $r$.
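The edge rejection test above reduces to a single comparison per keypoint. A minimal sketch follows, assuming the three second derivatives of the DoG have already been estimated (for example by finite differences); `passes_edge_test` is a hypothetical helper name, and rejecting non-positive determinants follows Lowe's description of SIFT rather than anything stated here.

```python
def passes_edge_test(dxx: float, dyy: float, dxy: float, r: float = 10.0) -> bool:
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures of opposite sign (or a flat direction): not a stable extremum.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r
```

A blob-like point with equal curvatures gives a ratio of 4 and is kept, while a ridge with curvatures of 100 and 1 gives a ratio of about 102 and is rejected against the threshold of 12.1.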
Orientation and Magnitude Assignment
To achieve rotation invariance, so that we can identify
the same keypoints from any rotation, we must assign an
orientation to the keypoints from the previous step. We
want these computations to occur in a scale invariant
manner as well, so we select the Gaussian smoothed
image $L(x, y)$ at the scale $\sigma$ where the extremum was
detected. We can use pixel differences to compute the
gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$,
to assign:
$$ m(x, y) = \sqrt{(L(x + 1, y) - L(x - 1, y))^2 + (L(x, y + 1) - L(x, y - 1))^2} \tag{10} $$
$$ \theta(x, y) = \tan^{-1} \left( \frac{L(x, y + 1) - L(x, y - 1)}{L(x + 1, y) - L(x - 1, y)} \right) \tag{11} $$
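The two pixel-difference formulas above translate directly into code. The sketch below assumes the smoothed image is a nested list indexed as `L[y][x]` and that the pixel is not on the image border; it uses `math.atan2` in place of a plain inverse tangent so that the orientation covers the full circle and a zero horizontal difference does not divide by zero. That substitution is an implementation choice, not something specified in the text.

```python
import math

def gradient_at(L, x: int, y: int) -> tuple[float, float]:
    """Return (magnitude, orientation in radians) at pixel (x, y) of image L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]   # horizontal pixel difference
    dy = L[y + 1][x] - L[y - 1][x]   # vertical pixel difference
    return math.hypot(dx, dy), math.atan2(dy, dx)
```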
Feature Description
For each keypoint, the SIFT algorithm starts by
calculating the image gradient magnitudes and
orientations in a 16 × 16 region around the keypoint,
using its scale to select the level of Gaussian blur for the
image. A set of orientation histograms is created for each
4 × 4 subregion of the image gradient window [15].
A Gaussian weighting function with $\sigma$ equal to half the
region size assigns weights to each sample point. Given
that there are 4 × 4 histograms, each with 8 possible
orientations, the length of the generated feature vector is
4 × 4 × 8 = 128. In other words, there are 128 elements
describing each point in the final output of the SIFT
algorithm.
Figure 5: A SIFT Feature Descriptor [15]
FEATURE MATCHING
Feature matching can be thought of as a simple problem
of Euclidean distance. First, pairs of points that do not
fit within a radius $r$ of each other are eliminated: a set of
close points is generated with the simple Euclidean
distance $d$, where each feature has a coordinate $(x, y)$ on
the image.
$$ d = \sqrt{(y_2 - y_1)^2 + (x_2 - x_1)^2} \tag{12} $$
We iterate through each point in image one, $I_1$, and
image two, $I_2$, and accumulate potential matches where
$d < r$.
Figure 6: SIFT feature matching
For each candidate pair of feature vectors, $f$, we find the
minimum 128 dimensional Euclidean distance, $m$.
$$ m = \sqrt{\sum_{i=1}^{128} (f_{i,1} - f_{i,2})^2} \tag{13} $$
The resulting “matched” points should also be checked
against some maximum threshold. If the minimum
Euclidean distance is more than that threshold, the match
should be discarded.
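The descriptor comparison and threshold check can be sketched as a brute-force nearest-neighbor search. This is a plain CPU illustration with hypothetical names (`match_features`, `max_distance`), not MOCI's GPU implementation; for brevity it omits the spatial pre-filter on the radius $r$ and works with descriptors of any length, 128 in the SIFT case.

```python
import math

def match_features(desc_a, desc_b, max_distance: float):
    """Return (i, j, distance) for each descriptor in desc_a and its nearest in desc_b."""
    matches = []
    for i, fa in enumerate(desc_a):
        best_j, best_m = -1, math.inf
        for j, fb in enumerate(desc_b):
            m = math.dist(fa, fb)    # Euclidean distance over all elements
            if m < best_m:
                best_j, best_m = j, m
        if best_m <= max_distance:   # discard matches above the threshold
            matches.append((i, best_j, best_m))
    return matches
```

The search is quadratic in the number of features, which is the kind of independent pairwise work that maps naturally onto one GPU thread per pair.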
MULTIVIEW RECONSTRUCTION
Once the features have been identified for each image
and the features between images have been matched, the
image planes must be placed into $\mathbb{R}^3$. The matched
keypoints and camera information must be used to
triangulate the location of the identified feature in $\mathbb{R}^3$.
Moving into 3D space
The first step in moving a keypoint into $\mathbb{R}^3$ is to place
it onto a plane in $\mathbb{R}^2$. The coordinates $(x', y')$ in $\mathbb{R}^2$
require the size of a pixel $d_{pix}$, the location of the
keypoint $(x, y)$, and the resolution of the image
$(x_{res}, y_{res})$ to yield:
$$ x' = d_{pix} \left( x - \frac{x_{res}}{2} \right) \tag{14} $$
$$ y' = d_{pix} \left( \frac{y_{res}}{2} - y \right) \tag{15} $$
This is repeated for the other matching keypoint. The
coordinate $(x', y', z')$ in $\mathbb{R}^3$ of the keypoint $(x', y')$ in
$\mathbb{R}^2$ is given by three rotation matrices and one
translation matrix. First we treat $(x', y')$ in $\mathbb{R}^2$ as a
homogeneous vector in $\mathbb{R}^3$ to yield $(x', y', 1)$. Given a
unit vector representing the orientation of the camera, in
our case the spacecraft’s camera, $(r_x, r_y, r_z)$, we find
the angle to rotate about each axis $(\theta_x, \theta_y, \theta_z)$. In a
simple case, we find the angle in the $xy$ plane with:
$$ \theta_z = \cos^{-1} \left( \frac{[1 \;\; 0 \;\; 0] \cdot [r_x \;\; r_y \;\; r_z]}{[r_x \;\; r_y \;\; r_z] \cdot [r_x \;\; r_y \;\; r_z]} \right) \tag{16} $$
Rotations for all planes are generated in an identical way.
Now, given a rotation in each plane $(\theta_x, \theta_y, \theta_z)$, we
calculate the homogeneous coordinate $(r_x, r_y, r_z, 1)$ in
$\mathbb{R}^3$ using linear transformations. The values
$(T_x, T_y, T_z)$ represent a translation in $\mathbb{R}^4$ and use the
camera position coordinates $C_x, C_y, C_z$, the camera unit
vectors representing orientation $u_x, u_y, u_z$, and the
focal length $f$:
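$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x & 0 \\ 0 & \sin\theta_x & \cos\theta_x & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 & 0 \\ \sin\theta_z & \cos\theta_z & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
\tag{17}
$$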
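$$
\begin{bmatrix} C_x - (x'' + f * u_x) \\ C_y - (y'' + f * u_y) \\ C_z - (z'' + f * u_z) \\ 1 \end{bmatrix}
\quad
\begin{bmatrix} x'' \\ y'' \\ z'' \\ 1 \end{bmatrix}
=
\begin{bmatrix} T_x \\ T_y \\ T_z \\ 1 \end{bmatrix}
\tag{18}
$$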
$$
\begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x'' \\ y'' \\ z'' \\ 1 \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix}
\tag{19}
$$
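The chain from pixel coordinates to a transformed point can be sketched with plain 4 × 4 homogeneous matrices. The helper names below (`pixel_to_plane`, `rotation`, `translation`, `apply`) are hypothetical, the rotation order is the one written in equation 17, and the translation values are passed in directly rather than derived from equation 18. This is an illustrative CPU version under those assumptions, not the flight code.

```python
import math

def pixel_to_plane(x, y, x_res, y_res, d_pix):
    """Equations 14 and 15: pixel (x, y) to metric image-plane coordinates."""
    return d_pix * (x - x_res / 2.0), d_pix * (y_res / 2.0 - y)

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rotation(tx, ty, tz):
    """Product Rx(tx) Ry(ty) Rz(tz) of 4x4 homogeneous rotations (equation 17)."""
    cx, sx = math.cos(tx), math.sin(tx)
    cy, sy = math.cos(ty), math.sin(ty)
    cz, sz = math.cos(tz), math.sin(tz)
    rx = [[1, 0, 0, 0], [0, cx, -sx, 0], [0, sx, cx, 0], [0, 0, 0, 1]]
    ry = [[cy, 0, sy, 0], [0, 1, 0, 0], [-sy, 0, cy, 0], [0, 0, 0, 1]]
    rz = [[cz, -sz, 0, 0], [sz, cz, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    return matmul(matmul(rx, ry), rz)

def translation(t_x, t_y, t_z):
    """4x4 homogeneous translation (equation 19)."""
    return [[1, 0, 0, t_x], [0, 1, 0, t_y], [0, 0, 1, t_z], [0, 0, 0, 1]]

def apply(m, point):
    """Apply a 4x4 transform to a 3D point and return the 3D result."""
    col = [[point[0]], [point[1]], [point[2]], [1.0]]
    out = matmul(m, col)
    return out[0][0], out[1][0], out[2][0]
```

For example, `apply(rotation(0, 0, math.pi / 2), (1, 0, 0))` rotates the x axis onto the y axis, and `apply(translation(1, 2, 3), (1, 1, 1))` shifts a point by the translation values.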
Point and Vector format
The transformations in equations 17 and 19
should be performed for all matched points. This should
result in $n$ homogeneous points of the form
$(x_n, y_n, z_n, 1)$. Each point has a corresponding camera,
which is already known by the camera coordinate,
$(C_{x_n}, C_{y_n}, C_{z_n}, 1)$. From this we find a vector $V_n$
from the camera position:
$$ V_n = \begin{bmatrix} C_{x_n} - x_n & C_{y_n} - y_n & C_{z_n} - z_n \end{bmatrix} \tag{20} $$
$V_n$ should then be normalized so that it is a unit vector.
N-view Reprojection
Now the point cloud can finally be generated. The goal of
the n-view reprojection is to find the point, $p$, that best
fits a set of lines. Traa shows that we can start with the
distance function, $D$, between our ideal point, $p$, and a
parameterized line with vector, $V$, and point, $P$. We can
think of the distance function as a projector onto the
orthocomplement of $V$, giving [18]:
$$ D(p; P, V) = I - V V^T \tag{21} $$
Equation 21 should be thought of as projecting the vectors
$p$ and $P$ onto the space orthogonal to $V$. The challenge is
solving this least squares problem given only matching
sets of points $P_n$ and their vectors $V_n$. Let the set of
matched points/vectors be represented by the set
$L = \{P_0, V_0, \ldots, P_n, V_n\}$. We can view this set $L$ as a
set of parameterized lines. We should minimize the sum of
squared distances with the equation:
$$ D(p; P, V) = \sum_{i=0}^{n} D(p; P_i, V_i) \tag{22} $$
To produce the point $\hat{p}$, the equation to minimize is:
$$ \hat{p} = \operatorname*{arg\,min}_{p} D(p; P_n, V_n) \tag{23} $$
Taking the derivative with respect to $p$, we obtain:
$$ \frac{\partial D}{\partial p} = \sum_{i=0}^{n} -2 \left( I - V_i V_i^T \right) (P_i - p) = 0 \tag{24} $$
We then obtain a linear system of the form $Rp = q$, where:
$$ R = \sum_{i=0}^{n} \left( I - V_i V_i^T \right), \quad q = \sum_{i=0}^{n} \left( I - V_i V_i^T \right) P_i \tag{25} $$
Traa shows that we can either solve the system directly
or apply the Moore-Penrose pseudoinverse:
$$ \hat{p} = R^{\dagger} q \tag{26} $$
The resulting $\hat{p}$ is the point of best fit for the members
of set $L$. Once $\hat{p}$ is calculated, the next set of
point/vector matches is loaded. The computation is repeated
until all best fit points $\hat{p}$ have been calculated. At the
end of this stage in the pipeline, the point cloud has been
generated.
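The least squares intersection above amounts to accumulating a 3 × 3 matrix and a 3-vector over the lines and solving one small linear system per reconstructed point. The sketch below is a plain CPU illustration with hypothetical names (`nearest_point_to_lines`, `solve3`); it solves the system directly by Gaussian elimination instead of forming the pseudoinverse, so it assumes the lines are not all parallel (otherwise the matrix is singular).

```python
def nearest_point_to_lines(points, directions):
    """Solve R p = q with R = sum(I - v v^T) and q = sum((I - v v^T) a)."""
    R = [[0.0] * 3 for _ in range(3)]
    q = [0.0] * 3
    for a, v in zip(points, directions):
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]                    # unit direction of the line
        for i in range(3):
            for j in range(3):
                proj = (1.0 if i == j else 0.0) - v[i] * v[j]
                R[i][j] += proj                      # accumulate I - v v^T
                q[i] += proj * a[j]                  # accumulate (I - v v^T) a
    return solve3(R, q)

def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):                              # back substitution
        x[i] = (aug[i][3] - sum(aug[i][j] * x[j] for j in range(i + 1, 3))) / aug[i][i]
    return x
```

Two lines that cross at a known point recover that point: a line through (0, 2, 3) along the x axis and a line through (1, 0, 3) along the y axis meet at (1, 2, 3). Each matched feature is an independent solve, which suits a one-thread-per-point layout on the GPU.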
Bundle Adjustment
All the calculations to this point have been in preparation
for a reprojection, which is a simple triangulation. The
Bundle Adjustment is used to refine the position of
features in $\mathbb{R}^3$ based on a camera calibration matrix and
to minimize the reprojection error [19]. The camera
calibration matrix, $K$, is stored by the spacecraft at the
time of image acquisition. Given that the location of an
observed feature is $(x, y)$ and the real location of the
feature is $(\hat{x}, \hat{y})$, the reprojection error, $r$, for that
feature is given by:
$$ r = \begin{bmatrix} x - \hat{x} & y - \hat{y} \end{bmatrix} \tag{27} $$
The camera parameters include the focal lengths
$(f_x, f_y)$, the center pixel $(C_x, C_y)$, and coefficients
$(k_1, k_2)$ that represent the first and second order radial
distortion of the lens system. The vector $\mathbf{x}$ contains
those 6 camera parameters, the feature’s position in $\mathbb{R}^3$
given in equation 19 as $(x', y', z')$, and the camera’s
position and orientation represented by $C_x, C_y, C_z$ and
$u_x, u_y, u_z$, as in equation 18. The goal is to minimize
the function of vector $\mathbf{x}$:
$$ \min f(\mathbf{x}) = \frac{1}{2} r(\mathbf{x})^T r(\mathbf{x}) \tag{28} $$
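The cost in equation 28 is half the squared length of the stacked residual vector. The sketch below evaluates it for lists of feature locations; `reprojection_cost` and its argument names are hypothetical, and the second list holds whatever locations the residual is measured against (in a typical bundle adjustment these are the positions predicted by the current camera and point parameters, which is an assumption here). A Levenberg-Marquardt solver would evaluate this cost repeatedly while updating the parameter vector.

```python
def reprojection_cost(observed, reference):
    """Return f = 0.5 * r^T r, where r stacks the (x, y) residuals of every feature."""
    r = []
    for (x_obs, y_obs), (x_ref, y_ref) in zip(observed, reference):
        r.extend((x_obs - x_ref, y_obs - y_ref))   # equation 27 for one feature
    return 0.5 * sum(c * c for c in r)
```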
MOCI’s system will supply a pre-estimation of camera
data from sensors onboard, allowing for a bound
constrained bundle adjustment that will greatly improve
the speed of the computation. The Levenberg-Marquardt
(LM) algorithm is used to minimize the problem by
iteratively solving a sequence of linear least squares
problems. A constrained multicore implementation of the
bundle adjustment would only be a slight modification
of the parallel algorithm proposed and described by
Wu [19]. This is the last step of the point cloud generation
process: when the reprojection error is minimized, the
point cloud computation is considered done. Additional
research is necessary to determine if MOCI needs to
perform a bundle adjustment rather than a real time custom
calibration step before a standard n-view reprojection.
POINT CLOUD NORMALIZATION
Once the point set has been generated it must be oriented
so that a more accurate surface reconstruction can take
place.
Finding the Normals of a Point Set
The inputs are the coordinates of the points in the point
cloud and the camera position $(C_x, C_y, C_z)$ which
generated each point. The problem of determining the
normal to a point on the surface is approximated by
estimating the tangent plane of the point and then taking
the normal vector to the plane. However, the correct
orientation of the normal vector cannot directly be inferred
mathematically, so an additional subroutine is needed to
orient each normal vector. Let the points in the point cloud
be members of the set $P = \{p_0, p_1, \ldots, p_n\}$ where
$p_i = (x_i, y_i, z_i)$. The normal vector of $p_i$ is
$n_i = (n_{x_i}, n_{y_i}, n_{z_i})$, which we want to compute for all
$p_i \in P$. Lastly, the camera position corresponding to
$p_i$ is denoted, in vector form, $C_i = (C_{x_i}, C_{y_i}, C_{z_i})$.
An octree data structure is used to search for the nearest
neighbors of point $p_i$. The $k$ nearest neighbors are
defined by the set $Q = \{q_1, q_2, \ldots, q_k\}$. The centroid of
$Q$ is calculated by:
$$ m = \frac{1}{k} \sum_{q \in Q} q \tag{29} $$
Let $A$ be a $k \times 3$ matrix whose rows are the neighbors
relative to the centroid:
$$ A = \begin{bmatrix} q_1 - m \\ q_2 - m \\ \vdots \\ q_k - m \end{bmatrix} \tag{30} $$
Now we factor matrix $A$ using singular value
decomposition (SVD) into $A = U \Sigma V^T$, where $U$ is a
($k \times k$) orthogonal matrix, $V^T$ is a (3 × 3) orthogonal
matrix, and $\Sigma$ is a ($k \times 3$) diagonal matrix whose
diagonal elements, called the “singular values” of $A$,
appear in descending order. Note that the covariance
matrix, $A^T A$, can be easily diagonalized using our
singular value decomposition:
$$ A^T A = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T \tag{31} $$
The eigenvectors of the covariance matrix are the
columns of matrix $V$. The eigenvalues of the covariance
matrix are the elements on the diagonal of $\Sigma^T \Sigma$, and
they are exactly the squares of the singular values of
matrix $A$. In this formula, both $V$ and $\Sigma^T \Sigma$ are (3 × 3)
matrices, just like the covariance matrix $A^T A$. For
randomly ordered diagonal elements $\sigma^2 \in \Sigma^T \Sigma$ we
keep only the largest $r$ of them, along with their
corresponding eigenvectors in matrix $V$. To produce the
best approximation of a plane in $\mathbb{R}^3$ we take the
two eigenvectors, $(v_1, v_2)$, of the covariance matrix with
the highest corresponding eigenvalues. Thus, the normal
vector $n_i$ is simply the cross product of these
eigenvectors, $n_i = v_1 \times v_2$.
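The normal estimate for one point can be sketched without a full SVD. The version below builds the 3 × 3 covariance matrix of the centered neighbors and finds its two dominant eigenvectors by power iteration with deflation, then takes their cross product as described above. Power iteration is swapped in here only to keep the example self-contained; it is not the SVD routine the text describes. `estimate_normal` is a hypothetical name, and the sketch assumes the neighbors are not all collinear.

```python
def estimate_normal(neighbors, iterations=200):
    """Normal of the best-fit plane through `neighbors` (a list of 3D points)."""
    k = len(neighbors)
    m = [sum(q[i] for q in neighbors) / k for i in range(3)]       # centroid
    a = [[q[i] - m[i] for i in range(3)] for q in neighbors]       # centered rows
    cov = [[sum(row[i] * row[j] for row in a) for j in range(3)] for i in range(3)]

    def dominant(matrix, start):
        """Power iteration: unit eigenvector of the largest eigenvalue."""
        v = start
        for _ in range(iterations):
            w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
            norm = sum(c * c for c in w) ** 0.5
            v = [c / norm for c in w]
        return v

    v1 = dominant(cov, [1.0, 0.7, 0.3])
    lam1 = sum(v1[i] * cov[i][j] * v1[j] for i in range(3) for j in range(3))
    # Deflate: remove the first eigenpair so the second becomes dominant.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(3)] for i in range(3)]
    v2 = dominant(deflated, [0.3, 1.0, 0.7])
    # The cross product of the two in-plane eigenvectors is the plane normal.
    return [v1[1] * v2[2] - v1[2] * v2[1],
            v1[2] * v2[0] - v1[0] * v2[2],
            v1[0] * v2[1] - v1[1] * v2[0]]
```

The sign of the returned vector is arbitrary, which is why a separate orientation step is still required.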
Orienting a Point Set
Orientation of all the normals begins once we have
computed the normal for every point $p_i \in P$. We also
want the normals of neighboring points to be consistently
oriented. For the simple case where only a single
viewpoint $C'$ is used to generate a point cloud, we can
simply orient our normal vector such that the following
equation holds:
$$ (C' - p_i) \cdot n_i < 0 \tag{32} $$
If the equation does not hold for a computed normal
vector, we simply “flip” the normal vector by taking
$-n_i = (-n_{x_i}, -n_{y_i}, -n_{z_i})$. If the dot product between
the two vectors is exactly 0, then additional methods need
to be applied. We need to account for the fact that multiple
camera positions may apply to each point in the point
cloud. Let $C = \{C_0, C_1, \ldots, C_n\}$ be the set of all
camera position vectors for point $p_i$.
We define $n_i$ to be an “ambiguous normal” if:
1. There exists a $C_j \in C$ such that
$(C_j - p_i) \cdot n_i > 0$ AND
2. There exists a $C_j \in C$ such that
$(C_j - p_i) \cdot n_i < 0$
Normals are assigned to all points that do not generate
ambiguous normals. The way to make sure all normals
are consistently oriented is to orient all the
non-ambiguous normals, adding them to a list of finished
normals, while placing all ambiguous normals into a
queue. We then use the queue of ambiguous normals and
try to determine the orientation by looking at the
neighboring points of $p_i$. If the neighboring points of
$p_i$ already have finished normals, we orient $n_i$
consistently with the neighboring normals $N_i$ by setting
$n_i \cdot N_i > 0$. If the neighboring points do not already
have finished normals, we move $n_i$ to the back of the
queue, and continue until all normals are finalized.
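The single-view test and the ambiguity check can be sketched as follows. The function names are hypothetical, the sign convention is taken from equation 32 as written, and the queue-based propagation to neighboring points is left out for brevity.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def view_dots(p, n, cameras):
    """Values of (C - p) . n for every camera position C that observed point p."""
    return [dot([c[i] - p[i] for i in range(3)], n) for c in cameras]

def is_ambiguous(p, n, cameras):
    """True when some cameras give a positive dot product and others a negative one."""
    values = view_dots(p, n, cameras)
    return any(v > 0 for v in values) and any(v < 0 for v in values)

def orient(p, n, camera):
    """Flip n when (C - p) . n < 0 does not hold (the condition of equation 32)."""
    d = dot([camera[i] - p[i] for i in range(3)], n)
    if d == 0:
        # The text notes that this case needs additional methods.
        raise ValueError("normal is perpendicular to the view direction")
    return list(n) if d < 0 else [-c for c in n]
```

Points for which `is_ambiguous` returns True would be placed in the queue, and the remainder passed through `orient` and added to the list of finished normals.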
The point normal generation process needs significant
improvements and can be aided greatly by the
implementation of an octree data structure in CUDA [6].
SURFACE RECONSTRUCTION
MOCI implements a Poisson Surface Reconstruction
algorithm that is parallelized with CUDA. The input to
this algorithm is an oriented set of points and the output
is a 3D modeled surface. This surface is the final end-
product of MOCI’s computer vision pipeline and is
stored in the Stanford PLY format.
Poisson Surface Reconstruction
Thanks to Kazhdan, Bolitho, and Hoppe, an
implementation of Poisson surface reconstruction on any
GPU has become feasible [20]. A Poisson surface
reconstruction takes in a set of oriented points, $V$, and
generates a surface.
Figure 7: Stages of Poisson Reconstruction in 2D
Poisson reconstruction computes an indicator function for
the oriented point set by finding the scalar function $\chi$
whose gradient best approximates the vector field $V$.
When the divergence operator is applied, this becomes a
Poisson problem where the goal is to compute the scalar
function whose Laplacian equals the divergence of the
vector field $V$:
$$ \Delta\chi \equiv \nabla \cdot \nabla\chi = \nabla \cdot V \tag{33} $$
CONCLUSION AND INITIAL RESULTS
Comparison to ASTER data
When compared with ASTER data MOCI is already
meeting minimum mission success: MOCI can generate
Digital Elevation Models within one sigma of accuracy
relative to ASTER models.
Accuracy is calculated by a percent pixel difference. A
simple program is used to project the 3D surface onto a
plane, essentially rasterization. This plots a histogram of
how likely it is that a given elevation is off. The percent
difference is calculated from the minimum and
maximum elevations. In other words, the percent pixel
difference is the inverse of the magnitude of the
elevation error.
Figure 8: A comparison of a simulated
Mount Everest from MOCI with ASTER data
MOCI’s accuracy is likely to increase as better methods
are implemented in the computer vision pipeline.
2-view Reconstruction
A simple 2-view projection has been fully implemented
and is often used to test the point cloud reconstruction
portion of the pipeline. When supplied with near perfect
keypoint pairs, the algorithm can reconstruct a point
cloud with 86% accuracy. Accuracy is calculated by a
percent pixel difference, which plots a histogram of how
likely a given elevation is to be a given distance off.
A CPU implementation of 2-view reconstruction runs on
a 50,000 point set in approximately 25 minutes. It is
expected that this stage, when optimized for the GPU,
will take less than 90 seconds [19].
Figure 9: A point cloud of Mount Everest generated
from a simulation of a 2-view reconstruction
Simulation of Data Acquisition
The simple Blender workflow that was demonstrated in
the initial feasibility study has been expanded and
improved, and is now included in a custom simulation
package [10]. This simulation package can simulate a
satellite with variable imaging payloads, which allowed
the SSRL to determine the optical lens system, GSD,
focal length, and sensor that meet the mission
requirements. The simulation software allows the user to
edit these as parameters, and the GSD is calculated from
them. The simulation also allows for variable orbits,
variation of ground targets, custom target objects, and
more. A list of generated and input variables, seen to the
left as a JSON file, shows some of the current capabilities
of the simulation. The SSRL is currently working on
porting these simulations to the supercomputing cluster
available at UGA. Once the user has created a JSON file
with the variables and environment that they would like
to simulate, they can run the image acquisition
simulation in a terminal [10].
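To show how GSD can be derived from the other payload parameters, the sketch below reads a small JSON configuration and applies the standard nadir pinhole relation, GSD = altitude × pixel pitch / focal length. The key names and every numeric value are hypothetical, not taken from the simulation package or from MOCI's optics. The values were picked only so that the result lands near the 8 m GSD quoted in the abstract.

```python
import json

# Hypothetical configuration; key names and values are illustrative only.
CONFIG_TEXT = """
{
  "altitude_m": 400000.0,
  "focal_length_m": 0.275,
  "pixel_pitch_m": 5.5e-6,
  "sensor_width_px": 3840
}
"""

def ground_sample_distance(altitude_m, focal_length_m, pixel_pitch_m):
    """Nadir GSD in meters per pixel for a pinhole camera."""
    if focal_length_m <= 0:
        raise ValueError("focal length must be positive")
    return altitude_m * pixel_pitch_m / focal_length_m

def swath_width(gsd_m, sensor_width_px):
    """Ground width covered by one image row, in meters."""
    return gsd_m * sensor_width_px

config = json.loads(CONFIG_TEXT)
gsd = ground_sample_distance(
    config["altitude_m"], config["focal_length_m"], config["pixel_pitch_m"]
)
swath = swath_width(gsd, config["sensor_width_px"])
print(f"GSD: {gsd:.2f} m, swath: {swath / 1000:.1f} km")
```

Treating GSD as a derived quantity rather than an input keeps the configuration consistent: changing the altitude or lens automatically changes the ground resolution.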
Figure 10: A point cloud of Mount Everest
generated from a simulated orbit over the region.
Simulated data acquisition can be piped into
any reconstruction algorithm. The position and
orientation of the camera, as well as the image set, are all
part of the standard output.
FUTURE WORK
Testing and Simulation
While initial results are promising and terrestrial
technologies have shown that MOCI’s computer vision
pipeline is successful, more tests are needed to
understand the limitations and capabilities of MOCI’s
computer vision system.
N-view Reconstruction vs. Bundle Adjustment
It is unclear whether a full Bundle Adjustment is
necessary given MOCI’s knowledge of its camera
parameters. It may be possible to calibrate the first and
second order radial distortion of the lens system once for
all images and feature matches instead of calculating it
every time a reprojection occurs. It may also be the case
that, despite somewhat accurate knowledge of camera
parameters, MOCI’s multi-view reconstruction stage
will gain accuracy from a bound-constrained
Bundle Adjustment.
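The idea of calibrating radial distortion once can be made concrete with the common polynomial model x_d = x (1 + k1 r^2 + k2 r^4), where r is the distance from the optical axis in normalized image coordinates. The sketch below assumes this model and hypothetical coefficient values. It is not drawn from MOCI's pipeline. If k1 and k2 are fixed by a single calibration, every keypoint can be undistorted before matching and reprojection, and the coefficients need not be re-estimated inside each reprojection.

```python
import numpy as np

def distort(points, k1, k2):
    """Apply first and second order radial distortion.

    points: (N, 2) normalized image coordinates (origin at the
    principal point). Model: x_d = x * (1 + k1*r^2 + k2*r^4).
    """
    points = np.asarray(points, dtype=float)
    r2 = np.sum(points**2, axis=1, keepdims=True)
    return points * (1.0 + k1 * r2 + k2 * r2**2)

def undistort(points, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration.

    Converges for mild distortion; strongly distorted lenses would
    need a more careful solver.
    """
    distorted = np.asarray(points, dtype=float)
    estimate = distorted.copy()
    for _ in range(iterations):
        r2 = np.sum(estimate**2, axis=1, keepdims=True)
        estimate = distorted / (1.0 + k1 * r2 + k2 * r2**2)
    return estimate
```

A bound-constrained Bundle Adjustment could still refine k1 and k2, but only within a narrow interval around the calibrated values.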
IMPLICATIONS
The complex computation system on the MOCI cube
satellite may show that it is worth performing more
complex computations in space rather than on the
ground. In MOCI’s case, it is beneficial because the 40
or more 4K images that it takes to generate a 3D model
contain much more data than the final 3D model. This
could also be the case for real time data analysis: with
a GPU accelerated system it may be possible to analyze
data onboard a spacecraft to determine which data is the
most useful and prioritize the downlink of that data. This
has clear applications for autonomous space systems and
deep space missions. During a deep space mission, it
would be possible to implement an AI to decide what
data is worth sending back to Earth. In general, it is likely
that neural networks will be easily implemented for
space based applications on the TX2, or a system like
MOCI’s.
Acknowledgments
A significant thank you to Aaron Martinez, who helped
me understand most of the mathematics of MOCI’s
computer vision pipeline. The same thanks goes to
Nicholas (Hollis) Neel, who helped me understand the
concepts relating to the SIFT algorithm. Thanks to
Jackson Parker for being the person who usually has to
experiment with writing these algorithms in CUDA.
Last, but not least, I would like to thank Dr. David
Cotten, who has helped guide the SSRL to where it is
today.
References
1. Rossi, Adam J., "Abstracted Workflow
Framework with a Structure from Motion
Application" (2014). Thesis. Rochester Institute of
Technology. Accessed from
http://scholarworks.rit.edu/theses/7814
2. Michot J., Bartoli A., Gaspard F., “Algebraic Line
Search for Bundle Adjustment”, British Machine
Vision Conference (BMVC)
3. Wu C., “Towards linear-time incremental structure
from motion”, 3D Vision-3DV 2013, 2013
International Conference on, 2013
4. J. Mak, M. Hess-Flores, S. Recker, J. D. Owens, K. I.
Joy, “GPU-accelerated and efficient multi-view
triangulation for scene reconstruction”,
Applications of Computer Vision (WACV), 2014
IEEE, 2014
5. Aniruddha Acharya K and R. Venkatesh Babu,
"Speeding up SIFT using GPU," 2013 Fourth
National Conference on Computer Vision, Pattern
Recognition, Image Processing and Graphics
(NCVPRIPG), Jodhpur, 2013, pp. 1-4.
6. K. Zhou, M. Gong, X. Huang and B. Guo, "Data-
Parallel Octrees for Surface Reconstruction," in
IEEE Transactions on Visualization and
Computer Graphics, vol. 17, no. 5, pp. 669-681,
May 2011.
7. J. Stoddard, D. Messinger and J. Kerekes, "Effects
of cubesat design parameters on image quality and
feature extraction for 3D reconstruction," 2014
IEEE Geoscience and Remote Sensing
Symposium, Quebec City, QC, 2014, pp. 1995-
1998.
8. Alexandre Boulch, Renaud Marlet. Fast Normal
Estimation for Point Clouds with Sharp Features
using a Robust Randomized Hough Transform.
Computer Graphics Forum, Wiley. 2012, 31 (5),
pp. 1765-1774. HAL Id: hal-00732426
https://hal-enpc.archives-ouvertes.fr/hal-00732426
9. Adams, C., Neel N., “Structure from Motion from
a Constrained Orbiting Platform Using ISS image
data to generate cloud height models.”, presented
at the NASA/CASIS International Space Station
Research and Development Conference,
Washington D.C., 2017.
10. Adams, C., Neel N., “The Feasibility of Structure
from Motion over Planetary Bodies with Small
Satellites”, presented at the AIAA/Utah State
Small Satellite Conference - SmallSat, Logan,
Utah, 2017.
11. Likar J. J., Stone S. E., Lombardi R. E., Long K.
A., “Novel Radiation Design Approach for
CubeSat Based Missions.” Presented at 24th
Annual AIAA/USU Conference on Small
Satellites, August 2010
12. Hartley R., Kahl F. (2007) Optimal Algorithms in
Multiview Geometry. In: Yagi Y., Kang S.B.,
Kweon I.S., Zha H. (eds) Computer Vision
ACCV 2007. ACCV 2007. Lecture Notes in
Computer Science, vol 4843. Springer, Berlin,
Heidelberg
13. Chrzeszczyk, Andrzej & Chrzeszczyk, Jakub.
(2013). Matrix computations on the GPU,
CUBLAS and MAGMA by example.
14. Hassaballah, M & Ali, Abdelmgeid & Alshazly,
Hammam. (2016). Image Features Detection,
Description and Matching. 630. 11-45.
10.1007/978-3-319-28854-3_2.
15. Lowe, D.G.: Distinctive image features from
scale-invariant keypoints. Int. J. Comput. Vis.
60(2), 91-110 (2004)
16. M. Brown and D. Lowe. Invariant Features from
Interest Point Groups. In David Marshall and Paul
L. Rosin, editors, Proceedings of the British
Machine Vision Conference, pages 23.1-23.10.
BMVA Press, September 2002.
17. Harris, C. and Stephens, M. (1988) A Combined
Corner and Edge Detector. Proceedings of the 4th
Alvey Vision Conference, Manchester, 31
August-2 September 1988, 147-151.
18. Traa J., “Least Square Intersection of Lines”, UIUC
2013,
http://cal.cs.illinois.edu/~johannes/research/LS_line_intersect.pdf
19. Wu C., Agarwal S., Curless B., Seitz S. M.
Multicore bundle adjustment. Computer Vision
and Pattern Recognition (CVPR), 2011 IEEE
Conference on …, 2011. 574, 2011.
20. Michael Kazhdan, Matthew Bolitho, and Hugues
Hoppe. 2006. Poisson surface reconstruction. In
Proceedings of the fourth Eurographics
symposium on Geometry processing (SGP '06).
Eurographics Association, Aire-la-Ville,
Switzerland, 61-70.