

SSC18-x-x

A Near Real Time Space Based Computer Vision System for Accurate Terrain Mapping

Caleb Adams

University of Georgia Small Satellite Research Laboratory

106 Pineview Court, Athens, GA 30606

CalebAshmoreAdams@gmail.com

Faculty Advisor: Dr. David Cotten

Center for Geospatial Research, UGA Small Satellite Research Laboratory

ABSTRACT

The Multiview Onboard Computational Imager (MOCI) is a 3U cube satellite designed to convert high resolution

imagery, 4K images at 8m Ground Sample Distance (GSD), into useful end data products in near real time. The

primary data products that MOCI seeks to provide are 3D terrain models of the surface of Earth that can be directly

compared to the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) v3 global Digital

Elevation Model (DEM). MOCI utilizes an Nvidia TX2 Graphics Processing Unit (GPU)/System on a Chip (SoC) to

perform the complex calculations required for such a task. The reconstruction problem, which MOCI can solve,

contains many complex computer vision subroutines that can be used in less complicated computer vision pipelines.

INTRODUCTION

This paper does not seek to describe the entire satellite system; it seeks to describe, in detail, the complex computation system that MOCI will utilize to generate scientific data on orbit. An overview of the satellite system and optical system is provided for clarity and context. The subroutines in MOCI's primary computer vision pipeline are described in detail over the course of this paper.

System Overview

The MOCI satellite primarily uses Commercial Off the

Shelf (COTS) hardware so that the focus can be on

payload development. The MAI-401 with a star-tracker

is utilized to achieve the necessary pointing

requirements. The GomSpace BP4 P60 Electrical Power

System (EPS) is used. The F’Sati Ultra High Frequency

(UHF) transceiver and F’Sati S-Band transmitter are

used for communications of commands, telemetry, and

science data. A Clyde Space On Board Computer (OBC)

is used as the main flight computer. The payload uses the

Nvidia TX2 SoC as the high-performance computation

unit and a custom optical system developed by Ruda-Cardinal that produces 4K images at 8m GSD from a 400km orbit.

Surface Reconstruction Pipeline

In our case, a computer vision pipeline, sometimes

referred to as a workflow1, consists of a set of chained

computer vision subroutines where the output of the

previous is the input to the next. Subroutines are often

referred to as stages in this sense1. MOCI implements a

Surface Reconstruction Pipeline with the initial inputs

being a set of images, the position of the spacecraft per

image, and the orientation of the spacecraft per image.

The first stage in the pipeline is the feature detection

stage.

Figure 1: Multiview Reconstruction2

The feature detection stage identifies regions within each

image that should be considered for feature description.

This stage produces a set of points at location (x, y), scale σ, and rotation θ. The feature description stage takes this set and encodes the information from local regions into a feature vector f. The next stage is the feature matching stage, which seeks to find the best correspondence, or minimum difference, between the sets of points in the images. Once points have been matched in the images they need to be placed into ℝ³ from ℝ² for reprojection. Vectors are made at the position of each matched point and used to calculate the point of minimum distance between all projected lines, which is the calculated point of intersection. The output of the reprojection is a set of points in ℝ³.

reprojection a Bundle Adjustment is performed as the

next stage in the pipeline. This takes the sets of points

and uses camera data to estimate the reprojection error

and remove it from the generated points. The result after

the Bundle Adjustment is a more accurate point cloud.

The next stage is the final stage, which is a surface

reconstruction. First, normals are calculated for the set of

points to make an oriented point set. The oriented point

set is then used for a Poisson Surface reconstruction to

make the final data product. Any additional computer

vision subroutines discussed here are not part of MOCI’s

primary pipeline.
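To make the stage chaining concrete, the sketch below outlines the pipeline as host-side C++ stubs. It is illustrative only; the type and function names are assumptions made for this description, not MOCI flight code.

```cpp
// Hypothetical outline of the Surface Reconstruction Pipeline's stage
// chaining. Types, names, and signatures are illustrative assumptions.
#include <vector>

struct Image      { /* pixels + per-image position and orientation */ };
struct Features   { /* keypoints and 128-element descriptors */ };
struct Matches    { /* corresponding keypoint index pairs */ };
struct PointCloud { /* 3D points, later with oriented normals */ };
struct Mesh       { /* reconstructed surface, written as PLY */ };

// Each stage consumes the output of the previous stage.
Features   detectAndDescribe(const Image&)                       { return {}; }
Matches    matchFeatures(const Features&, const Features&)       { return {}; }
PointCloud reproject(const Matches&, const Image&, const Image&) { return {}; }
PointCloud bundleAdjust(const PointCloud&)                       { return {}; }
PointCloud orientNormals(const PointCloud&)                      { return {}; }
Mesh       poissonReconstruct(const PointCloud&)                 { return {}; }

// Two-image case for brevity; MOCI would chain many views.
Mesh runPipeline(const std::vector<Image>& imgs) {
    Features f0 = detectAndDescribe(imgs[0]);
    Features f1 = detectAndDescribe(imgs[1]);
    Matches m = matchFeatures(f0, f1);
    PointCloud cloud = reproject(m, imgs[0], imgs[1]);
    cloud = bundleAdjust(cloud);
    cloud = orientNormals(cloud);
    return poissonReconstruct(cloud);
}

int main() {
    std::vector<Image> imgs(2);
    Mesh result = runPipeline(imgs);
    (void)result;
    return 0;
}
```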

RELATED WORK

The techniques relayed here are not new, but are built from well-understood algorithms and computer vision subroutines. The implementations of these well-understood principles are built from previous work that the University of Georgia (UGA) Small Satellite Research Laboratory (SSRL) has done. The adaptation of structure from motion and real-time mapping for aerial photogrammetry and autonomous robotics is commonplace.

Multiview Reconstruction

GPU-accelerated Structure from Motion has been implemented on many occasions. Changchang Wu's research to develop an incremental approach to Structure from Motion demonstrated that it was possible to solve the reconstruction problem in O(n) rather than O(n⁴) time, greatly improving efficiency and speed³.

Additional research has shown that the triangulation problem can achieve a 40x speedup⁴ when utilizing Compute Unified Device Architecture (CUDA) capable Nvidia GPUs. Additionally, multicore GPU Bundle Adjustment has been shown to achieve a 30x speedup over previous implementations. GPU-accelerated feature

detectors and descriptors are now commonplace. This

can typically lead to the identification and extraction of

features within a few milliseconds5, which allows the

near real time extraction of input information into the

pipeline. A standard Poisson Reconstruction,

fundamentally limited by the octree data structure, can

run two orders of magnitude6 faster when implemented

on a CUDA capable GPU.

Surface Normal Calculation

A key problem in a surface reconstruction or structure

from motion pipeline is the generation of an oriented

point set. Recent research, testing the feasibility of

generating oriented point sets from cube satellites, has

claimed that normal estimation is only accurate between 5° and 29° when utilizing 2m GSD imagery⁷. Additionally, new techniques have recently been demonstrated showing that a Randomized Hough Transform can preserve sharp features, improving the accuracy of point normals while being almost an order of magnitude faster⁸. It is expected that more efficient point normal estimation methods will improve the accuracy of the 3D models generated by MOCI.

Cloud Height and Planetary Modeling

With previous studies, we have shown that image data

from the International Space Station (ISS) High

Definition Earth-Viewing System (HDEV) can produce

accurate cloud height models within 5.926 – 7.012 km9.

Additionally, available structure from motion and surface reconstruction pipelines, in conjunction with the SSRL's custom simulation software, have been used to demonstrate that 3D surfaces of mountain ranges can be reconstructed. The SSRL has demonstrated that the

proposed pipeline can generate 3D models of large

geographic features within 68.2% accuracy of ASTER

v3 global DEM data10, resulting in an approximately

10m resolution surface model.

PAYLOAD SYSTEM OVERVIEW

A simple overview of the system is provided in this

section to make later sections about scientific

computations more clear. The payload sits at the top of

the electronics stack of the MOCI system and contains the Nvidia TX2, an optical assembly, an e-con Systems See3CAM_CU135 with an AR1335 image sensor from ON Semiconductor, and a Core GPU Interface (CORGI) board that connects all the subsystems together and allows them to communicate over a standard PC104+ bus.

Figure 2: Payload Electronics


Data Interfacing and Power

The Nvidia TX2 is interfaced to the CORGI board via a 400-pin connector. The CORGI board is connected to the See3CAM_CU135 interface board via a Universal Serial Bus Type-C (USB-C) connector, allowing 4K image data to be streamed to the GPU. The CORGI also

routes an Inter-Integrated Circuit (I2C) bus, Ethernet, and

a Serial Peripheral Interface (SPI) into the satellite’s

PC104+ bus. The TX2’s maximum power draw is 7W,

but current computations are only running at

approximately 4.5W.

Thermal Properties

For a worst-case thermal analysis, an unrealistic power draw of 14W is used. Additionally, a system with 0% efficiency was also assumed as a worst-case scenario. A maximum temperature of approximately 51 °C was

simulated with these conditions. The TX2 is attached to

a Thermal Transfer Plate and simulations have shown

that the max operating temperature is sustainable for the

system. Further, more detailed research will soon be

published on how we have managed these thermal

conditions.

Optical System

The SSRL is partnering with Ruda-Cardinal to make a

custom optical assembly capable of generating images at

a resolution of at most 8m GSD. The optical system has

a 4.5° Field of View (FOV) and an effective focal length

of 120mm.

Figure 3: MOCI Optical Assembly

GPU SYSTEM OVERVIEW

The GPU (Nvidia TX2) is a complete SoC, capable of

running GNU/Linux on an ARM v8 CPU with a Tegra

GPU running the Pascal architecture. The TX2 has 256 CUDA cores, 8 GB of 128-bit LPDDR4 memory, and 32 GB of eMMC storage.

Radiation Mitigation

The primary concerns in LEO are Single Event Upsets

(SEU), Single Event Functional Interrupts (SEFI), and

Single Event Latchups (SEL)11. These are certainly

concerns for a dense SoC like the TX2. Thus, MOCI will

utilize aluminized Kapton as a thin layer of protection for

the payload. Software mitigation is also implemented.

The Clyde Space OBC contains hardware-encoded error correction coding (ECC) and could flash a new image onto the TX2 if necessary. The TX2 also utilizes a custom implementation of software-encoded ECC.

Further, more detailed research will soon be published

on how we have managed and characterized these

radiation conditions.

Compute Unified Device Architecture

Currently the TX2 utilizes CUDA 9.0. CUDA capable

GPUs can parallelize tasks, leading to computational

speeds orders of magnitude higher than those of a CPU

system performing the same operations. CUDA’s

computational parallel model is comprised of a grid that

contains blocks made up of threads. The TX2 can handle

up to 65535 blocks per dimension, leading to a potential

total of 2.81 × 10¹⁴ blocks, each containing a maximum

of 1024 threads. The potential for parallelization here is

substantial, and is the key to developing a near real time

computer vision system.
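As a minimal sketch of this grid/block/thread model (the image size and kernel are illustrative assumptions, not MOCI flight code), the following CUDA program launches one thread per pixel of a 4K frame:

```cuda
#include <cuda_runtime.h>

// Each thread handles one pixel; blocks tile the image in 2D.
__global__ void invert(unsigned char* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];
}

int main() {
    const int w = 4096, h = 2160;            // one 4K frame
    unsigned char* dImg;
    cudaMalloc(&dImg, w * h);
    cudaMemset(dImg, 128, w * h);            // placeholder image data

    dim3 threads(32, 32);                    // 1024 threads per block
    dim3 blocks((w + 31) / 32, (h + 31) / 32);
    invert<<<blocks, threads>>>(dImg, w, h);
    cudaDeviceSynchronize();

    cudaFree(dImg);
    return 0;
}
```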

GPU Accelerated Linear Algebra

In Hartley’s survey paper on optimal algorithms in

Multiview Geometry, every algorithm he identifies

benefits greatly from hyper optimized matrix operations

that are made possible by the massive parallelization that

CUDA enables12. Furthermore, widely available linear

algebra libraries, such as the Basic Linear Algebra

Subsystem (BLAS) have been accelerated with CUDA13.

These modified libraries, such as cuBLAS and

cuSOLVER are critical to the improving the functions

needed in complex computer vision pipelines.
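As a brief, hedged illustration of these libraries (not MOCI's actual code), the following program performs a single-precision matrix multiply with cuBLAS:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 4;                       // small square matrices
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C (column-major storage)
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);          // expect 8.0 for these inputs

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```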

FEATURE DETECTION AND DESCRIPTION

After images have been acquired from the payload

system, the first step in the pipeline is feature detection.

Typically, feature detection attempts to identify regions

within an image that should be considered for feature

description14. Feature descriptions are only given to

candidate regions/points that meet the requirements of

the algorithm. For our purposes, we utilize the Scale

Invariant Feature Transform (SIFT) developed by Lowe.

The SIFT algorithm, which contains several standard

subroutines, has become a standard in computer vision.

Detection of Scale Space Extrema

To detect features that are scale-invariant, the SIFT algorithm uses a Difference of Gaussians (DoG) to identify local extrema in scale-space. The Laplacian of Gaussians (LoG) is often used to detect stable features in scale-space¹⁴. The convolution of an image with a Gaussian kernel is defined by a function $L(x, y, \sigma)$, which is produced by the convolution of a scale-space Gaussian¹⁵, $G(x, y, \sigma)$, with an input image, $I(x, y)$, where $*$ is the convolution operation between functions:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \qquad (1)$$

An efficient way to calculate the DoG function, $D(x, y, \sigma)$, is to simply compute the difference of two nearby scales separated by a constant factor $k$:

$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) \qquad (2)$$

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (3)$$

The Gaussian kernel is convolved with the input image to form a Gaussian scale pyramid.

Figure 4: The DoG in Scale-Space¹⁵

To detect the local minima and maxima of the DoG function, each candidate point is compared to its 8 local neighbors at its current scale, the 9 neighbors above its scale, and the 9 neighbors below its scale in the DoG pyramid. For ease of computation, the scale separation is chosen to be $k = 2^{1/s}$, where $s$ is an integer chosen such that each doubling of $\sigma$ (one octave of scale space) is divided into $s$ intervals¹⁵.
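A minimal CUDA sketch of the DoG stage, assuming the two Gaussian-blurred levels L(x, y, σ) and L(x, y, kσ) have already been computed (buffer names are assumptions):

```cuda
#include <cuda_runtime.h>

// D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), one thread per pixel.
// Launch with a 2D grid covering width x height, as in the earlier example.
__global__ void differenceOfGaussians(const float* blurFine,   // L(x, y, sigma)
                                      const float* blurCoarse, // L(x, y, k*sigma)
                                      float* dog, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int i = y * width + x;
        dog[i] = blurCoarse[i] - blurFine[i];
    }
}
```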

Keypoint Localization and Filtering

Given a set of candidate points from the detection of scale-space extrema of the DoG, the challenge is to localize each point and to filter out poorly determined points by examining the ratio of principal curvatures. A method proposed by Brown uses a Taylor expansion of the scale-space function, $D(x, y, \sigma)$, with the origin at the center of the sample point¹⁶:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{\top}\mathbf{x} + \frac{1}{2}\mathbf{x}^{\top}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x} \qquad (4)$$

The derivatives are evaluated at the center of the sample point and the offset from the sample point is defined as $\mathbf{x} = (x, y, \sigma)^{\top}$. The local extremum $\hat{\mathbf{x}}$ is given by taking the derivative of the function with respect to $\mathbf{x}$ and setting it equal to zero:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} \qquad (5)$$

Often additional calculations are performed to eliminate unstable extrema and edge responses. To eliminate strong edge responses, which a DoG will often produce, the principal curvature is computed from a Hessian matrix containing the partial derivatives of the DoG function, $D(x, y, \sigma)$:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (6)$$

Harris and Stephens have shown that we need only be concerned with the ratio of the eigenvalues¹⁷. We let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller. We then compute the trace and the determinant of $H$:

$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta \qquad (7)$$

$$\mathrm{Det}(H) = D_{xx}D_{yy} - (D_{xy})^{2} = \alpha\beta \qquad (8)$$

We then let $r$ be the ratio between the largest and smallest eigenvalues such that $\alpha = r\beta$. Then we find:

$$\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^{2}}{\alpha\beta} = \frac{(r\beta + \beta)^{2}}{r\beta^{2}} = \frac{(r + 1)^{2}}{r} \qquad (9)$$

We can then use this ratio as a cutoff for undesired edge points. Typically, a value of $r = 10$ is used¹⁵ to eliminate keypoints whose ratio of principal curvatures is greater than $r$.
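As a small, hedged sketch of this test (the Hessian entries Dxx, Dyy, Dxy would come from finite differences of the DoG image; this is illustrative, not MOCI's implementation):

```cuda
// Returns true if the keypoint passes the edge-response test (eq. 9).
// Dxx, Dyy, Dxy are second-order finite differences of the DoG at the point.
__host__ __device__ bool passesEdgeTest(float Dxx, float Dyy, float Dxy,
                                        float r = 10.0f) {
    float tr  = Dxx + Dyy;                  // alpha + beta
    float det = Dxx * Dyy - Dxy * Dxy;      // alpha * beta
    if (det <= 0.0f) return false;          // curvatures of opposite sign
    // Accept only if Tr^2/Det < (r+1)^2/r, i.e. the curvature ratio < r.
    return tr * tr / det < (r + 1.0f) * (r + 1.0f) / r;
}
```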

Orientation and Magnitude Assignment

To achieve rotation invariance, so that we can identify the same keypoints under any rotation, we must assign an orientation to the keypoints from the previous step. We want these computations to occur in a scale-invariant manner as well, so we select the Gaussian-smoothed image $L(x, y)$ at the scale $\sigma$ where the extremum was detected. We can use pixel differences to compute the gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, to assign:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}} \qquad (10)$$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \qquad (11)$$


Feature Description

For each keypoint, the SIFT algorithm starts by calculating the image gradient magnitudes and orientations in a 16 × 16 region around the keypoint, using its scale to select the level of Gaussian blur for the image. A set of orientation histograms is created for each 4 × 4 subregion of the image gradient window¹⁵.

A Gaussian weighting function with σ equal to half the region size assigns weights to each sample point. Given that there are 4 × 4 histograms with 8 possible orientations each, the length of the generated feature vector is 128. In other words, there are 128 elements describing each point in the final output of the SIFT algorithm.

Figure 5: A SIFT Feature Descriptor15

FEATURE MATCHING

Feature matching can be thought of as a simple problem of Euclidean distance. First, sets of points that do not fit within a radius $r$ are eliminated: a set of close points is generated with the simple Euclidean distance $d$, where each feature has a coordinate $(x, y)$ on the image:

$$d = \sqrt{(y_{2} - y_{1})^{2} + (x_{2} - x_{1})^{2}} \qquad (12)$$

We iterate through each point in image one, $I_{1}$, and image two, $I_{2}$, and accumulate potential matches where $d < r$.

Figure 6: SIFT feature matching

For each candidate pair we then compare the feature vectors, $f$, element by element, and find the match that minimizes the 128-dimensional Euclidean distance, $M$:

$$M = \sqrt{\sum_{i=0}^{128}\left(f^{1}_{i} - f^{2}_{i}\right)^{2}} \qquad (13)$$

The resulting "matched" points should also be checked against some maximum threshold. If the minimum Euclidean distance is more than that threshold, the match should be discarded.
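A hedged brute-force CUDA sketch of this matching step (one thread per query descriptor; the array layout and threshold handling are assumptions):

```cuda
// For each descriptor in A, find the descriptor in B with minimum 128-D
// Euclidean distance (eq. 13), then apply a maximum-distance threshold.
__global__ void matchDescriptors(const float* descA, int nA,
                                 const float* descB, int nB,
                                 int* bestIdx, float maxDist) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nA) return;

    float best = 1e30f;
    int bestJ = -1;
    for (int j = 0; j < nB; ++j) {
        float d2 = 0.0f;
        for (int k = 0; k < 128; ++k) {
            float diff = descA[i * 128 + k] - descB[j * 128 + k];
            d2 += diff * diff;
        }
        if (d2 < best) { best = d2; bestJ = j; }
    }
    // Discard matches beyond the threshold (squared distances compared).
    bestIdx[i] = (best < maxDist * maxDist) ? bestJ : -1;
}
```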

MULTIVIEW RECONSTRUCTION

Once the features have been identified for each image

and the features between images have been matched, the

image planes must be placed into ℝ³. The matched keypoints and camera information must be used to triangulate the location of each identified feature in ℝ³.

Moving into 3D space

The first step to moving a keypoint into ℝ³ is to place it onto a plane in ℝ². The coordinates $(x', y')$ in ℝ² require the size of a pixel, $d_{pix}$, the location of the keypoint, $(x, y)$, and the resolution of the image, $(x_{res}, y_{res})$, to yield:

$$x' = d_{pix}\left(x - \frac{x_{res}}{2}\right) \qquad (14)$$

$$y' = d_{pix}\left(\frac{y_{res}}{2} - y\right) \qquad (15)$$

This is repeated for the other matching keypoint. The coordinate $(x', y', z')$ in ℝ³ of the keypoint $(x', y')$ in ℝ² is given by three rotation matrices and one translation matrix. First we treat $(x', y')$ in ℝ² as a homogeneous vector in ℝ³ to yield $(x', y', 1)$. Given a unit vector representing the camera's orientation, in our case the spacecraft camera's, $(r_x, r_y, r_z)$, we find the angle to rotate about each axis, $(\theta_x, \theta_y, \theta_z)$. In a simple case, we find the angle in the $xy$ plane with:

$$\theta_z = \cos^{-1}\frac{\begin{bmatrix}1 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix}r_x & r_y & r_z\end{bmatrix}}{\left\lVert\begin{bmatrix}1 & 0 & 0\end{bmatrix}\right\rVert \left\lVert\begin{bmatrix}r_x & r_y & r_z\end{bmatrix}\right\rVert} \qquad (16)$$

Rotations for all planes are generated in an identical way. Now, given a rotation in each plane, $(\theta_x, \theta_y, \theta_z)$, we calculate the homogeneous coordinate $(x', y', z', 1)$ in ℝ³ using linear transformations. The values $(T_x, T_y, T_z)$ represent a translation in ℝ⁴ and use the camera position coordinates $C_x, C_y, C_z$, the camera unit vectors representing orientation $u_x, u_y, u_z$, and the focal length $f$; here $(x_0, y_0, z_0)$ denotes the rotated point from equation 17 prior to translation:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x & 0 \\ 0 & \sin\theta_x & \cos\theta_x & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 & 0 \\ \sin\theta_z & \cos\theta_z & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} \qquad (17)$$

$$\begin{bmatrix} C_x - (x_0 + f \cdot u_x) \\ C_y - (y_0 + f \cdot u_y) \\ C_z - (z_0 + f \cdot u_z) \\ 1 \end{bmatrix} = \begin{bmatrix} T_x \\ T_y \\ T_z \\ 1 \end{bmatrix} \qquad (18)$$

$$\begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \\ z_0 \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} \qquad (19)$$

Point and Vector format

The resulting transformations in equations 17 and 19 should be performed for all matched points. This results in $n$ homogeneous points of the form $(x_n, y_n, z_n, 1)$. Each point has a corresponding camera, whose position is already known as the coordinate $(C_{x_n}, C_{y_n}, C_{z_n}, 1)$. From this we find a vector $v_n$ from the camera position:

$$v_n = \begin{bmatrix} C_{x_n} - x_n \\ C_{y_n} - y_n \\ C_{z_n} - z_n \end{bmatrix} \qquad (20)$$

$v_n$ should then be normalized so that it is a unit vector.
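A hedged sketch of equations 14, 15, and 20 (the names and the flat-plane placement before rotation and translation are assumptions):

```cuda
#include <math.h>

struct Vec3 { float x, y, z; };

// Eqs. (14)-(15): pixel (x, y) to image-plane coordinates, given pixel
// pitch dpix and the image resolution.
__host__ __device__ Vec3 pixelToPlane(float x, float y,
                                      float dpix, float xres, float yres) {
    Vec3 p;
    p.x = dpix * (x - xres / 2.0f);
    p.y = dpix * (yres / 2.0f - y);
    p.z = 0.0f;   // on the plane, prior to rotation and translation
    return p;
}

// Eq. (20): unit ray from camera position C through the 3D keypoint p.
__host__ __device__ Vec3 rayFromCamera(Vec3 C, Vec3 p) {
    Vec3 v = { C.x - p.x, C.y - p.y, C.z - p.z };
    float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    v.x /= len; v.y /= len; v.z /= len;
    return v;
}
```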

N-view Reprojection

Now the point cloud can finally be generated. The goal of the n-view reprojection is to find the point, $p$, that best fits a set of lines. Traa shows we can start with the distance function, $D$, between our ideal point, $p$, and a line parameterized by a vector, $v$, and a point, $a$. We can think of the distance function as a projector onto the orthocomplement of $v$, giving¹⁸:

$$D(p;\, a, v) = I - vv^{\top} \qquad (21)$$

Equation 21 should be thought of as projecting the vectors $p$ and $a$ onto the space orthogonal to $v$. The challenge is solving this least squares problem given only matching sets of points $a_n$ and their vectors $v_n$. Let the set of matched points/vectors be represented by the set $L = \{a_0, v_0, \ldots, a_n, v_n\}$. We can view this set $L$ as a set of parameterized lines. We minimize the sum of squared differences with the equation:

$$D(p;\, a, v) = \sum_{i=0}^{n} D(p;\, a_i, v_i) \qquad (22)$$

To produce the best-fit point $\hat{p}$, the equation to minimize is:

$$\hat{p} = \underset{p}{\arg\min}\; D(p;\, a_n, v_n) \qquad (23)$$

Taking the derivative with respect to $p$ and setting it equal to zero, we obtain:

$$\frac{\partial D}{\partial p} = -2\sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right)\left(a_i - p\right) = 0 \qquad (24)$$

We then obtain a linear system of the form $Ap = b$, where:

$$A = \sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right), \qquad b = \sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right) a_i \qquad (25)$$

Traa shows that we can either solve the system directly or apply the Moore-Penrose pseudoinverse:

$$\hat{p} = A^{+} b \qquad (26)$$

The resulting $\hat{p}$ is the point of best fit for the members of set $L$. Once $\hat{p}$ is calculated, the next set of point/vector matches is loaded. The computation is repeated until all best-fit points $\hat{p}$ have been calculated. At the end of this stage in the pipeline, the point cloud has been generated.
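A hedged host-side sketch of equations 24 through 26, accumulating A and b and solving the 3×3 system directly with Cramer's rule rather than a pseudoinverse:

```cuda
struct Vec3 { float x, y, z; };

static float det3(const float M[3][3]) {
    return M[0][0]*(M[1][1]*M[2][2]-M[1][2]*M[2][1])
         - M[0][1]*(M[1][0]*M[2][2]-M[1][2]*M[2][0])
         + M[0][2]*(M[1][0]*M[2][1]-M[1][1]*M[2][0]);
}

// Least-squares intersection of n lines (point a_i, unit direction v_i):
// build A = sum(I - v v^T) and b = sum((I - v v^T) a) (eq. 25), then solve
// A p = b. Sketch only; assumes A is well-conditioned.
Vec3 intersectLines(const Vec3* a, const Vec3* v, int n) {
    float A[3][3] = {{0,0,0},{0,0,0},{0,0,0}}, b[3] = {0,0,0};
    for (int i = 0; i < n; ++i) {
        const float vi[3] = { v[i].x, v[i].y, v[i].z };
        const float ai[3] = { a[i].x, a[i].y, a[i].z };
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c) {
                float P = (r == c ? 1.0f : 0.0f) - vi[r] * vi[c]; // I - v v^T
                A[r][c] += P;
                b[r]    += P * ai[c];      // row r of (I - v v^T) a
            }
    }
    float d = det3(A);
    float Ax[3][3], Ay[3][3], Az[3][3];
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
            Ax[r][c] = (c == 0) ? b[r] : A[r][c];
            Ay[r][c] = (c == 1) ? b[r] : A[r][c];
            Az[r][c] = (c == 2) ? b[r] : A[r][c];
        }
    Vec3 p = { det3(Ax) / d, det3(Ay) / d, det3(Az) / d };
    return p;
}
```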

Bundle Adjustment

All the calculations to this point have been in preparation for a reprojection, which is a simple triangulation. The Bundle Adjustment is used to calibrate the positions of features in ℝ³ based on a camera calibration matrix and to minimize the reprojection error¹⁹. The camera calibration matrix, $K$, is stored by the spacecraft at the time of image acquisition. Given the location of an observed feature, $(\hat{x}, \hat{y})$, and the real location of the feature, $(x, y)$, the reprojection error, $r$, for that feature is given by:

$$r = (\hat{x} - x,\; \hat{y} - y) \qquad (27)$$

The camera parameters include the focal lengths $(f_x, f_y)$, the center pixel $(c_x, c_y)$, and coefficients $(k_1, k_2)$ that represent the first and second order radial distortion of the lens system. The vector $P$ contains those six camera parameters, the feature's position in ℝ³ given in equation 19 as $(x', y', z')$, and the camera's position and orientation represented by $C_x, C_y, C_z$ and $u_x, u_y, u_z$, as in equation 18. The goal is to minimize the function of vector $P$:

$$\min f(P) = \frac{1}{2}\, r(P)^{\top} r(P) \qquad (28)$$

MOCI's system will supply a pre-estimation of camera data from sensors onboard, allowing for a bound-constrained bundle adjustment that will greatly improve the speed of the computation. The Levenberg-Marquardt (LM) algorithm is utilized to minimize the problem by iteratively solving a sequence of linear least-squares problems. A constrained multicore implementation of the bundle adjustment would only be a slight modification of the parallel algorithm proposed and described by Wu¹⁹. This is the last step of the point cloud generation process; when the reprojection error is minimized, the point cloud computation is considered done. Additional research is necessary to determine if MOCI needs to perform a bundle adjustment rather than a real-time custom calibration step before a standard n-view reprojection.
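For intuition, a single hedged Levenberg-Marquardt step is sketched below: a damped Gauss-Newton update on a small parameter block with a dense Jacobian. This is the textbook method, not Wu's multicore implementation¹⁹.

```cuda
// One Levenberg-Marquardt update for a 3-parameter problem:
// solve (J^T J + lambda I) delta = -J^T r, then p <- p + delta.
// J is m x 3 (Jacobian of the residuals), r is m x 1. Sketch only.
void lmStep(const float* J, const float* r, int m, float lambda, float p[3]) {
    float H[3][3] = {{0,0,0},{0,0,0},{0,0,0}}, g[3] = {0,0,0};
    for (int i = 0; i < m; ++i) {
        for (int a = 0; a < 3; ++a) {
            g[a] += J[i*3 + a] * r[i];                 // J^T r
            for (int b = 0; b < 3; ++b)
                H[a][b] += J[i*3 + a] * J[i*3 + b];    // J^T J
        }
    }
    for (int a = 0; a < 3; ++a) H[a][a] += lambda;     // damping term

    // Solve H delta = -g by Gaussian elimination (no pivoting; sketch).
    float x[3] = { -g[0], -g[1], -g[2] };
    for (int k = 0; k < 3; ++k) {
        for (int i = k + 1; i < 3; ++i) {
            float f = H[i][k] / H[k][k];
            for (int j = k; j < 3; ++j) H[i][j] -= f * H[k][j];
            x[i] -= f * x[k];
        }
    }
    for (int k = 2; k >= 0; --k) {
        for (int j = k + 1; j < 3; ++j) x[k] -= H[k][j] * x[j];
        x[k] /= H[k][k];
    }
    for (int a = 0; a < 3; ++a) p[a] += x[a];
}
```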

POINT CLOUD NORMALIZATION

Once the point set has been generated it must be oriented

so that a more accurate surface reconstruction can take

place.

Finding the Normals of a Point Set

The inputs to this stage are the coordinates of the points in the point cloud and the camera position $(C_x, C_y, C_z)$ which generated each point. The problem of determining the normal to a point on the surface is approximated by estimating the tangent plane of the point and then taking the normal vector to the plane. However, the correct orientation of the normal vector cannot be directly inferred mathematically, so an additional subroutine is needed to orient each normal vector. Let the points in the point cloud be members of the set $P = \{p_0, p_1, \ldots, p_n\}$ where $p_n = (x_n, y_n, z_n)$. The normal vector of $p_n$ is $n_n$, which we want to compute for all $p_n \in P$. Lastly, the camera position corresponding to $p_n$ is denoted, in vector form, $C_n = (C_{x_n}, C_{y_n}, C_{z_n})$.

An octree data structure is used to search for the nearest neighbors of a point $p_i$. The $k$ nearest neighbors are defined by the set $B = \{b_0, b_1, \ldots, b_k\}$. The centroid of $B$ is calculated by:

$$\bar{c} = \frac{1}{k}\sum_{b \in B} b \qquad (29)$$

Let $A$ be a $k \times 3$ matrix whose rows are the neighbors (in practice taken relative to the centroid $\bar{c}$, so that $A^{\top}A$ is the covariance of the neighborhood):

$$A = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} \qquad (30)$$

Now we factor matrix $A$ using singular value decomposition (SVD) into $A = U\Sigma V^{\top}$, where $U$ is a $(k \times k)$ orthogonal matrix, $V^{\top}$ is a $(3 \times 3)$ orthogonal matrix, and $\Sigma$ is a $(k \times 3)$ diagonal matrix whose diagonal elements, called the "singular values" of $A$, appear in descending order. Note that the covariance matrix, $A^{\top}A$, can be easily diagonalized using our singular value decomposition:

$$A^{\top}A = V\Sigma^{\top}U^{\top}U\Sigma V^{\top} = V\left(\Sigma^{\top}\Sigma\right)V^{\top} \qquad (31)$$

The eigenvectors of the covariance matrix are the columns of $V$. The eigenvalues of the covariance matrix are the elements on the diagonal of $\Sigma^{\top}\Sigma$, and they are exactly the squares of the singular values of matrix $A$. In this formula, both $V$ and $\Sigma^{\top}\Sigma$ are $(3 \times 3)$ matrices, just like the covariance matrix $A^{\top}A$. From the diagonal elements $(\sigma_i)^2 \in \Sigma^{\top}\Sigma$ we keep only the largest ones, along with their corresponding eigenvectors in matrix $V$. To produce the best approximation of a plane in ℝ³ we take the two eigenvectors, $(e_1, e_2)$, of the covariance matrix with the highest corresponding eigenvalues. Thus, the normal vector $n_i$ is simply the cross product of these eigenvectors, $n_i = e_1 \times e_2$.
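A hedged host-side sketch of this plane fit: centered covariance, the two dominant eigenvectors via power iteration with deflation, and the normal as their cross product. A flight implementation would more likely call an SVD or eigensolver routine (e.g. from cuSOLVER); the iteration counts here are assumptions.

```cuda
#include <math.h>

struct Vec3 { float x, y, z; };

static Vec3 mul(const float C[3][3], Vec3 v) {      // 3x3 matrix times vector
    return { C[0][0]*v.x + C[0][1]*v.y + C[0][2]*v.z,
             C[1][0]*v.x + C[1][1]*v.y + C[1][2]*v.z,
             C[2][0]*v.x + C[2][1]*v.y + C[2][2]*v.z };
}
static Vec3 normalize(Vec3 v) {
    float n = sqrtf(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/n, v.y/n, v.z/n };
}

// Estimate the normal of the plane through the k neighbors b[0..k-1].
Vec3 estimateNormal(const Vec3* b, int k) {
    Vec3 c = {0, 0, 0};                              // centroid (eq. 29)
    for (int i = 0; i < k; ++i) { c.x += b[i].x; c.y += b[i].y; c.z += b[i].z; }
    c.x /= k; c.y /= k; c.z /= k;

    float C[3][3] = {{0,0,0},{0,0,0},{0,0,0}};       // centered covariance
    for (int i = 0; i < k; ++i) {
        float d[3] = { b[i].x - c.x, b[i].y - c.y, b[i].z - c.z };
        for (int r = 0; r < 3; ++r)
            for (int s = 0; s < 3; ++s) C[r][s] += d[r] * d[s];
    }

    Vec3 e1 = normalize({1, 0, 0});                  // dominant eigenvector
    for (int it = 0; it < 50; ++it) e1 = normalize(mul(C, e1));

    Vec3 e2 = {0, 1, 0};                             // second eigenvector via
    for (int it = 0; it < 50; ++it) {                // power iteration with
        Vec3 w = mul(C, e2);                         // deflation against e1
        float d = w.x*e1.x + w.y*e1.y + w.z*e1.z;
        e2 = normalize({ w.x - d*e1.x, w.y - d*e1.y, w.z - d*e1.z });
    }
    // Normal is perpendicular to the two in-plane directions: n = e1 x e2.
    return normalize({ e1.y*e2.z - e1.z*e2.y,
                       e1.z*e2.x - e1.x*e2.z,
                       e1.x*e2.y - e1.y*e2.x });
}
```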

Orienting a Point Set

Orientation of all the normals begins once we have computed the normal for every point $p_i \in P$. We also want the normals of neighboring points to be consistently oriented. For the simple case where only a single viewpoint $C'$ is used to generate a point cloud, we can simply orient our normal vector such that the following equation holds:

$$\left(C' - p_i\right) \cdot n_i < 0 \qquad (32)$$

If the equation does not hold for a computed normal vector, we simply "flip" the normal vector by taking $-n_i = (-n_x, -n_y, -n_z)$. If the dot product between the two vectors is exactly 0, then additional methods need to be applied. We also need to account for the fact that multiple camera positions may apply to each point in the point cloud. Let $C = \{\bar{C}_0, \bar{C}_1, \ldots, \bar{C}_m\}$ be the set of all $m$ camera position vectors for point $p_i$.

We define $n_i$ to be an "ambiguous normal" if:

1. There exists a $\bar{C}_j \in C$ such that $(\bar{C}_j - p_i) \cdot n_i > 0$ AND

2. There exists a $\bar{C}_j \in C$ such that $(\bar{C}_j - p_i) \cdot n_i < 0$

Normals are assigned to all points that do not generate ambiguous normals. The way to make sure all normals are consistently oriented is to first orient all the non-ambiguous normals, adding them to a list of finished normals, while placing all ambiguous normals into a queue. We then take each ambiguous normal from the queue and try to determine its orientation by looking at the neighboring points of $p_i$. If the neighboring points of $p_i$ already have finished normals, we orient $n_i$ consistently with a neighboring normal $n_j$ by requiring $n_i \cdot n_j > 0$. If the neighboring points do not yet have finished normals, we move $n_i$ to the back of the queue, and continue until all normals are finalized.
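A minimal sketch of the single-viewpoint flip, following the sign convention of equation 32 (illustrative; the multi-camera ambiguity queue described above is not shown):

```cuda
struct Vec3 { float x, y, z; };

// Orient a normal against a single viewpoint C per eq. (32): flip n when
// (C - p) . n does not satisfy the chosen sign convention. Sketch only.
__host__ __device__ Vec3 orientNormal(Vec3 C, Vec3 p, Vec3 n) {
    float d = (C.x - p.x) * n.x + (C.y - p.y) * n.y + (C.z - p.z) * n.z;
    if (d > 0.0f) {                 // violates (C - p) . n < 0: flip
        n.x = -n.x; n.y = -n.y; n.z = -n.z;
    }
    // d == 0 is the ambiguous case handled by the queue described above.
    return n;
}
```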

The point normal generation process needs significant

improvements and can be aided greatly by the

implementation of an octree data structure in CUDA6.

SURFACE RECONSTRUCTION

MOCI implements a Poisson Surface Reconstruction

algorithm that is parallelized with CUDA. The input to

this algorithm is an oriented set of points and the output

is a 3D modeled surface. This surface is the final end-

product of MOCI’s computer vision pipeline and is

stored in the Stanford PLY format.
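For reference, PLY is a simple header-plus-data container. A minimal hand-written example of an oriented point set with one face (values are illustrative only):

```
ply
format ascii 1.0
comment example oriented point set, illustrative only
element vertex 3
property float x
property float y
property float z
property float nx
property float ny
property float nz
element face 1
property list uchar int vertex_indices
end_header
0.0 0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0 1.0
0.0 1.0 0.0 0.0 0.0 1.0
3 0 1 2
```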

Poisson Surface Reconstruction

Thanks to Kazhdan, Bolitho, and Hoppe, an

implementation of Poisson surface reconstruction on any

GPU has become feasible20. A Poisson surface

reconstruction takes in a set of oriented points, V, and

generates a surface.

Figure 7: Stages of Poisson Reconstruction in 2D

The Poisson method computes an indicator function for the oriented point set by first finding the function $\chi$ whose gradient best approximates the vector field $V$. When the divergence operator is applied, this becomes a Poisson problem where the goal is to compute the scalar function $\chi$ whose Laplacian equals the divergence of the vector field $V$:

$$\Delta\chi \equiv \nabla \cdot \nabla\chi = \nabla \cdot V \qquad (33)$$

CONCLUSION AND INITIAL RESULTS

Comparison to ASTER data

When compared with ASTER data, MOCI is already

meeting minimum mission success: MOCI can generate

Digital Elevation Models within one sigma of accuracy

relative to ASTER models.

Accuracy is calculated by a percent pixel difference. A simple program is used to project the 3D surface onto a plane, essentially rasterization. This produces a histogram of how likely it is that a given elevation is off by a given amount. The percent difference is calculated relative to the range between the minimum and maximum elevations. In other words, a higher percent pixel difference corresponds to a smaller elevation error.

Figure 8: A comparison of a simulated Mount Everest reconstruction from MOCI with ASTER data

MOCI’s accuracy is likely to increase as better methods

are implemented in the computer vision pipeline.

2-view Reconstruction

A simple 2-view projection has been fully implemented

and is often used to test the point cloud reconstruction

portion of the pipeline. When supplied with near perfect

keypoint pairs, the algorithm can reconstruct a point

cloud with 86% accuracy. Accuracy is calculated by a

percent pixel difference, which plots a histogram of how

likely a given elevation is to be a given distance off.

A CPU implementation of 2-view reconstruction runs on

a 50,000 point set in approximately 25 minutes. It is

expected that this stage, when optimized for the GPU,

will take less than 90 seconds19.


Figure 9: A point cloud of Mount Everest generated

from a simulation of a 2-view reconstruction

Simulation of Data Acquisition

The simple blender workflow that was demonstrated in

the initial feasibility study has been expanded, improved,

and is now included in a custom simulation package10.

This simulation package has the capability to simulate a

satellite with variable imaging payloads. This allowed the SSRL to determine the optical lens system, GSD, focal length, and sensor that meet the mission requirements. The simulation software allows the user to edit these as parameters, from which the GSD is calculated. The simulation also allows for variable orbits, variation of ground targets, custom target objects, and more. A list of generated and input variables, stored in a JSON file, shows some

of the current capabilities of the simulation. The SSRL is

currently working on porting these simulations to the

supercomputing cluster available at UGA. Once the user

has created a JSON file with the variables and environment

that they would like to simulate, they can run the image

acquisition simulation in a terminal10.

Figure 10: A point cloud of Mount Everest

generated from a simulated orbit over the region.

Simulated data acquisition can be piped into any reconstruction algorithm. The

orientation of the camera, as well as the image set, are all

part of the standard output.

FUTURE WORK

Testing and Simulation

While initial results are promising and terrestrial

technologies have shown that MOCI’s computer vision

pipeline is successful, more tests are needed to

understand the limitations and capabilities of MOCI’s

computer vision system.

N-view Reconstruction vs. Bundle Adjustment

It is unclear if a full Bundle Adjustment is necessary

given MOCI’s knowledge of its camera parameters. It

may be possible to calibrate the first and second order

radial distortion of the lens system once for all images

and feature matches instead of calculating it every time

a reprojection occurs. It may also be the case that, despite

somewhat accurate knowledge of camera parameters,

MOCI’s multi-view reconstruction stage will benefit

from improved accuracy with a bound constrained

Bundle Adjustment.

IMPLICATIONS

The complex computation system on the MOCI cube

satellite may show that it is worth performing more

complex computations in space rather than on the

ground. In MOCI’s case, it is beneficial because the 40

or more 4K images that it takes to generate a 3D model

contain much more data than the final 3D model. This

could also be the cause for real time data analysis, with

a GPU accelerated system it may be possible to analyze

data onboard a spacecraft to determine which data is the

most useful and prioritize the downlink of that data. This

has clear applications for autonomous space system or

deep space missions. During a deep space mission, it

would be possible to implement an AI to decide what

data is worth sending back to Earth. In general, it’s likely

that Neural Networks will be easily implemented for

space based applications on the TX2, or a system like

MOCI’s.

Acknowledgments

A significant thank you to Aaron Martinez, who helped

me understand most of the mathematics of MOCI’s

computer vision pipeline. The same thanks goes to

Nicholas (Hollis) Neel, who helped me understand the

concepts relating to the SIFT algorithm. Thanks to

Jackson Parker for being the person who usually has to

experiment with writing these algorithms in CUDA.

Last, but not least, I would like to thank Dr. David Cotten, who has helped guide the SSRL to where it is

today.


References

1. Rossi, Adam J., "Abstracted Workflow

Framework with a Structure from Motion

Application" (2014). Thesis. Rochester Institute of

Technology. Accessed from

http://scholarworks.rit.edu/theses/7814

2. Michot J., Bartoli A., Gaspard F., "Algebraic Line Search for Bundle Adjustment," British Machine Vision Conference (BMVC).

3. Wu C., "Towards linear-time incremental structure from motion," 2013 International Conference on 3D Vision (3DV), 2013.

4. Mak J., Hess-Flores M., Recker S., Owens J.D., Joy K.I., "GPU-accelerated and efficient multi-view triangulation for scene reconstruction," IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.

5. Aniruddha Acharya K and R. Venkatesh Babu,

"Speeding up SIFT using GPU," 2013 Fourth

National Conference on Computer Vision, Pattern

Recognition, Image Processing and Graphics

(NCVPRIPG), Jodhpur, 2013, pp. 1-4.

6. K. Zhou, M. Gong, X. Huang and B. Guo, "Data-

Parallel Octrees for Surface Reconstruction," in

IEEE Transactions on Visualization and

Computer Graphics, vol. 17, no. 5, pp. 669-681,

May 2011.

7. J. Stoddard, D. Messinger and J. Kerekes, "Effects

of cubesat design parameters on image quality and

feature extraction for 3D reconstruction," 2014

IEEE Geoscience and Remote Sensing

Symposium, Quebec City, QC, 2014, pp. 1995-

1998.

8. Boulch A., Marlet R., "Fast Normal Estimation for Point Clouds with Sharp Features using a Robust Randomized Hough Transform," Computer Graphics Forum, Wiley, 2012, 31(5), pp. 1765-1774. HAL Id: hal-00732426, https://hal-enpc.archives-ouvertes.fr/hal-00732426

9. Adams, C., Neel N., “Structure from Motion from

a Constrained Orbiting Platform Using ISS image

data to generate cloud height models.”, presented

at the NASA/CASIS International Space Station

Research and Development Conference,

Washington D.C., 2017.

10. Adams, C., Neel N., “The Feasibility of Structure

from Motion over Planetary Bodies with Small

Satellites”, presented at the The AIAA/Utah State

Small Satellite Conference - SmallSat, Logan

Utah, 2017.

11. Likar J. J., Stone S. E., Lombardi R. E., Long K.

A., “Novel Radiation Design Approach for

CubeSat Based Missions.” Presented at 24th

Annual AIAA/USU Conference on Small

Satellites, August 2010

12. Hartley R., Kahl F. (2007) Optimal Algorithms in

Multiview Geometry. In: Yagi Y., Kang S.B.,

Kweon I.S., Zha H. (eds) Computer Vision –

ACCV 2007. ACCV 2007. Lecture Notes in

Computer Science, vol 4843. Springer, Berlin,

Heidelberg

13. Chrzeszczyk, Andrzej & Chrzeszczyk, Jakub.

(2013). Matrix computations on the GPU,

CUBLAS and MAGMA by example.

14. Hassaballah, M & Ali, Abdelmgeid & Alshazly,

Hammam. (2016). Image Features Detection,

Description and Matching. 630. 11-45.

10.1007/978-3-319-28854-3_2.

15. Lowe, D.G.: Distinctive image features from

scale-invariant keypoints. Int. J. Comput. Vis.

60(2), 91–110 (2004)

16. M. Brown and D. Lowe. Invariant Features from

Interest Point Groups. In David Marshall and Paul

L. Rosin, editors, Proceedings of the British

Machine Conference, pages 23.1-23.10. BMVA

Press, September 2002.

17. Harris, C. and Stephens, M. (1988) A Combined

Corner and Edge Detector. Proceedings of the 4th

Alvey Vision Conference, Manchester, 31

August-2 September 1988, 147-151.

18. Traa J., "Least Squares Intersection of Lines," UIUC, 2013, http://cal.cs.illinois.edu/~johannes/research/LS_line_intersect.pdf

19. Wu C., Agarwal S., Curless B., Seitz S.M., "Multicore bundle adjustment," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

20. Michael Kazhdan, Matthew Bolitho, and Hugues

Hoppe. 2006. Poisson surface reconstruction. In

Proceedings of the fourth Eurographics

symposium on Geometry processing (SGP '06).

Eurographics Association, Aire-la-Ville,

Switzerland, Switzerland, 61-70.