

SSC18-x-x

A Near Real Time Space Based Computer Vision System for Accurate Terrain Mapping

Caleb Adams

University of Georgia Small Satellite Research Laboratory

106 Pineview Court, Athens, GA 30606

CalebAshmoreAdams@gmail.com

Faculty Advisor: Dr. David Cotten

Center for Geospatial Research, UGA Small Satellite Research Laboratory

ABSTRACT

The Multiview Onboard Computational Imager (MOCI) is a 3U cube satellite designed to convert high resolution

imagery, 4K images at 8m Ground Sample Distance (GSD), into useful end data products in near real time. The

primary data products that MOCI seeks to provide are 3D terrain models of the surface of Earth that can be directly

compared to the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) v3 global Digital

Elevation Model (DEM). MOCI utilizes an Nvidia TX2 Graphics Processing Unit (GPU)/System on a Chip (SoC) to

perform the complex calculations required for such a task. The reconstruction problem, which MOCI can solve,

contains many complex computer vision subroutines that can be used in less complicated computer vision pipelines.

INTRODUCTION

This paper does not seek to describe the entire satellite system; it seeks to describe, in detail, the complex computation system that MOCI will utilize to generate scientific data on orbit. An overview of the satellite system and optical system is provided for clarity and context. The subroutines in MOCI's primary computer vision pipeline are described in detail over the course of this paper.

System Overview

The MOCI satellite primarily uses Commercial Off the

Shelf (COTS) hardware so that the focus can be on

payload development. The MAI-401 with a star-tracker

is utilized to achieve the necessary pointing

requirements. The GomSpace BP4 P60 Electrical Power

System (EPS) is used. The F’Sati Ultra High Frequency

(UHF) transceiver and F’Sati S-Band transmitter are

used for communications of commands, telemetry, and

science data. A Clyde Space On Board Computer (OBC)

is used as the main flight computer. The payload uses the

Nvidia TX2 SoC as the high-performance computation

unit and a custom optical system developed by Ruda-Cardinal that produces 4K images at 8m GSD from a 400km orbit.

Surface Reconstruction Pipeline

In our case, a computer vision pipeline, sometimes

referred to as a workflow1, consists of a set of chained

computer vision subroutines where the output of the

previous is the input to the next. Subroutines are often

referred to as stages in this sense1. MOCI implements a

Surface Reconstruction Pipeline with the initial inputs

being a set of images, the position of the spacecraft per

image, and the orientation of the spacecraft per image.

The first stage in the pipeline is the feature detection

stage.

Figure 1: Multiview Reconstruction2

The feature detection stage identifies regions within each

image that should be considered for feature description.

This stage produces a set of points at location (x, y), scale σ, and rotation θ. The feature description stage takes this set and encodes the information from local regions into a feature vector f. The next stage is the feature matching stage, which seeks to find the best correspondence, or minimum difference, between the sets of points in the images. Once points have been matched in the images they need to be placed into ℝ³ from ℝ² for reprojection. Vectors are made at the position of each matched point and used to calculate the point of minimum distance between all projected lines, which is the calculated point of intersection. The output of the reprojection is a set of points in ℝ³.

reprojection a Bundle Adjustment is performed as the

next stage in the pipeline. This takes the sets of points

and uses camera data to estimate the reprojection error

and remove it from the generated points. The result after

the Bundle Adjustment is a more accurate point cloud.

The next stage is the final stage, which is a surface

reconstruction. First, normals are calculated for the set of

points to make an oriented point set. The oriented point

set is then used for a Poisson Surface reconstruction to

make the final data product. Any additional computer

vision subroutines discussed here are not part of MOCI’s

primary pipeline.
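To make the stage chaining concrete, the sketch below outlines the pipeline as host-side C++ stubs. It is illustrative only; the type and function names are assumptions made for this description, not MOCI flight code.

```cpp
// Hypothetical outline of the Surface Reconstruction Pipeline's stage
// chaining. Types, names, and signatures are illustrative assumptions.
#include <vector>

struct Image      { /* pixels + per-image position and orientation */ };
struct Features   { /* keypoints and 128-element descriptors */ };
struct Matches    { /* corresponding keypoint index pairs */ };
struct PointCloud { /* 3D points, later with oriented normals */ };
struct Mesh       { /* reconstructed surface, written as PLY */ };

// Each stage consumes the output of the previous stage.
Features   detectAndDescribe(const Image&)                       { return {}; }
Matches    matchFeatures(const Features&, const Features&)       { return {}; }
PointCloud reproject(const Matches&, const Image&, const Image&) { return {}; }
PointCloud bundleAdjust(const PointCloud&)                       { return {}; }
PointCloud orientNormals(const PointCloud&)                      { return {}; }
Mesh       poissonReconstruct(const PointCloud&)                 { return {}; }

// Two-image case for brevity; MOCI would chain many views.
Mesh runPipeline(const std::vector<Image>& imgs) {
    Features f0 = detectAndDescribe(imgs[0]);
    Features f1 = detectAndDescribe(imgs[1]);
    Matches m = matchFeatures(f0, f1);
    PointCloud cloud = reproject(m, imgs[0], imgs[1]);
    cloud = bundleAdjust(cloud);
    cloud = orientNormals(cloud);
    return poissonReconstruct(cloud);
}

int main() {
    std::vector<Image> imgs(2);
    Mesh result = runPipeline(imgs);
    (void)result;
    return 0;
}
```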

RELATED WORK

The techniques relayed here are not new, but are built from well-understood algorithms and computer vision subroutines. The implementations of these well-understood principles are built from previous work that the University of Georgia (UGA) Small Satellite Research Laboratory (SSRL) has done. The adaptation of structure from motion and real-time mapping for aerial photogrammetry and autonomous robotics is commonplace.

Multiview Reconstruction

GPU-accelerated Structure from Motion has been implemented on many occasions. Changchang Wu's research to develop an incremental approach to Structure from Motion demonstrated that it was possible to solve the reconstruction problem in O(n) rather than O(n⁴) time, greatly improving efficiency and speed³.

Additional research has shown that the triangulation problem can achieve a 40x speedup⁴ when utilizing Compute Unified Device Architecture (CUDA) capable Nvidia GPUs. Additionally, multicore GPU Bundle Adjustment has been shown to achieve a 30x speedup over previous implementations. GPU-accelerated feature

detectors and descriptors are now commonplace. This

can typically lead to the identification and extraction of

features within a few milliseconds5, which allows the

near real time extraction of input information into the

pipeline. A standard Poisson Reconstruction,

fundamentally limited by the octree data structure, can

run two orders of magnitude6 faster when implemented

on a CUDA capable GPU.

Surface Normal Calculation

A key problem in a surface reconstruction or structure

from motion pipeline is the generation of an oriented

point set. Recent research, testing the feasibility of

generating oriented point sets from cube satellites, has

claimed that normal estimation is only accurate between 5° and 29° when utilizing 2m GSD imagery⁷. Additionally, new techniques have recently been demonstrated showing that a Randomized Hough Transform can preserve sharp features, improving the accuracy of point normals while being almost an order of magnitude faster⁸. It is expected that more efficient point normal estimation methods will improve the accuracy of the 3D models generated by MOCI.

Cloud Height and Planetary Modeling

With previous studies, we have shown that image data

from the International Space Station (ISS) High

Definition Earth-Viewing System (HDEV) can produce

accurate cloud height models within 5.926 – 7.012 km9.

Additionally, available structure from motion and surface reconstruction pipelines, in conjunction with the SSRL's custom simulation software, have been used to demonstrate that 3D surfaces of mountain ranges can be reconstructed. The SSRL has demonstrated that the

proposed pipeline can generate 3D models of large

geographic features within 68.2% accuracy of ASTER

v3 global DEM data10, resulting in an approximately

10m resolution surface model.

PAYLOAD SYSTEM OVERVIEW

A simple overview of the system is provided in this

section to make later sections about scientific

computations more clear. The payload sits at the top of

the electronics stack of the MOCI system and contains the Nvidia TX2, an optical assembly, an e-con Systems See3CAM_CU135 with an AR1335 image sensor from ON Semiconductor, and a Core GPU Interface (CORGI) board that connects all the subsystems together and allows them to communicate over a standard PC104+ bus.

Figure 2: Payload Electronics


Data Interfacing and Power

The Nvidia TX2 is interfaced to the CORGI board via a 400-pin connector. The CORGI board is connected to the See3CAM_CU135 interface board via a Universal Serial Bus Type-C (USB-C) connector, allowing 4K image data to be streamed to the GPU. The CORGI also

routes an Inter-Integrated Circuit (I2C) bus, Ethernet, and

a Serial Peripheral Interface (SPI) into the satellite’s

PC104+ bus. The TX2’s maximum power draw is 7W,

but current computations are only running at

approximately 4.5W.

Thermal Properties

For a worst-case thermal analysis, an unrealistic power draw of 14W is used. Additionally, a system with 0% efficiency was also assumed as a worst-case scenario. A maximum temperature of approximately 51 °C was

simulated with these conditions. The TX2 is attached to

a Thermal Transfer Plate and simulations have shown

that the max operating temperature is sustainable for the

system. Further, more detailed research will soon be

published on how we have managed these thermal

conditions.

Optical System

The SSRL is partnering with Ruda-Cardinal to make a

custom optical assembly capable of generating images at

a resolution of at most 8m GSD. The optical system has

a 4.5° Field of View (FOV) and an effective focal length

of 120mm.

Figure 3: MOCI Optical Assembly

GPU SYSTEM OVERVIEW

The GPU (Nvidia TX2) is a complete SoC, capable of

running GNU/Linux on an ARM v8 CPU with a Tegra

GPU running the Pascal architecture. The TX2 has 256 CUDA cores, 8 GB of 128-bit LPDDR4 memory, and 32 GB of eMMC storage.

Radiation Mitigation

The primary concerns in LEO are Single Event Upsets

(SEU), Single Event Functional Interrupts (SEFI), and

Single Event Latchups (SEL)11. These are certainly

concerns for a dense SoC like the TX2. Thus, MOCI will

utilize aluminized Kapton as a thin layer of protection for

the payload. Software mitigation is also implemented.

The Clyde Space OBC contains hardware-encoded error correction coding (ECC) and could flash a new image onto the TX2 if necessary. The TX2 also utilizes a custom implementation of software-encoded ECC.

Further, more detailed research will soon be published

on how we have managed and characterized these

radiation conditions.

Compute Unified Device Architecture

Currently the TX2 utilizes CUDA 9.0. CUDA capable

GPUs can parallelize tasks, leading to computational

speeds orders of magnitude higher than those of a CPU

system performing the same operations. CUDA’s

computational parallel model is comprised of a grid that

contains blocks made up of threads. The TX2 can handle

up to 65535 blocks per dimension, leading to a potential

total of 2.81 × 10¹⁴ blocks, each containing a maximum

of 1024 threads. The potential for parallelization here is

substantial, and is the key to developing a near real time

computer vision system.
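As a minimal sketch of this grid/block/thread model (the image size and kernel are illustrative assumptions, not MOCI flight code), the following CUDA program launches one thread per pixel of a 4K frame:

```cuda
#include <cuda_runtime.h>

// Each thread handles one pixel; blocks tile the image in 2D.
__global__ void invert(unsigned char* img, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];
}

int main() {
    const int w = 4096, h = 2160;            // one 4K frame
    unsigned char* dImg;
    cudaMalloc(&dImg, w * h);
    cudaMemset(dImg, 128, w * h);            // placeholder image data

    dim3 threads(32, 32);                    // 1024 threads per block
    dim3 blocks((w + 31) / 32, (h + 31) / 32);
    invert<<<blocks, threads>>>(dImg, w, h);
    cudaDeviceSynchronize();

    cudaFree(dImg);
    return 0;
}
```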

GPU Accelerated Linear Algebra

In Hartley’s survey paper on optimal algorithms in

Multiview Geometry, every algorithm he identifies

benefits greatly from hyper optimized matrix operations

that are made possible by the massive parallelization that

CUDA enables12. Furthermore, widely available linear

algebra libraries, such as the Basic Linear Algebra

Subsystem (BLAS) have been accelerated with CUDA13.

These modified libraries, such as cuBLAS and

cuSOLVER are critical to the improving the functions

needed in complex computer vision pipelines.
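As a brief, hedged illustration of these libraries (not MOCI's actual code), the following program performs a single-precision matrix multiply with cuBLAS:

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int n = 4;                       // small square matrices
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C (column-major storage)
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);          // expect 8.0 for these inputs

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```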

FEATURE DETECTION AND DESCRIPTION

After images have been acquired from the payload

system, the first step in the pipeline is feature detection.

Typically, feature detection attempts to identify regions

within an image that should be considered for feature

description14. Feature descriptions are only given to

candidate regions/points that meet the requirements of

the algorithm. For our purposes, we utilize the Scale

Invariant Feature Transform (SIFT) developed by Lowe.

The SIFT algorithm, which contains several standard

subroutines, has become a standard in computer vision.

Detection of Scale Space Extrema

To detect features that are scale-invariant, the SIFT algorithm uses a Difference of Gaussians (DoG) to identify local extrema in scale-space. The Laplacian of Gaussians (LoG) is often used to detect stable features in scale-space¹⁴. The convolution of an image with a Gaussian kernel is defined by a function $L(x, y, \sigma)$, which is produced by the convolution of a scale-space Gaussian¹⁵, $G(x, y, \sigma)$, with an input image, $I(x, y)$, where $*$ is the convolution operation between functions:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \qquad (1)$$

An efficient way to calculate the DoG function, $D(x, y, \sigma)$, is to simply compute the difference of two nearby scales separated by a constant factor $k$:

$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) \qquad (2)$$

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \qquad (3)$$

The Gaussian kernel is convolved with the input image to form a Gaussian scale pyramid.

Figure 4: The DoG in Scale-Space¹⁵

To detect the local minima and maxima of the DoG function, each candidate point is compared to its 8 local neighbors at its current scale, the 9 neighbors above its scale, and the 9 neighbors below its scale in the DoG pyramid. For ease of computation, the scale separation is chosen to be $k = 2^{1/s}$, where $s$ is an integer chosen such that each doubling of $\sigma$ (one octave of scale space) is divided into $s$ intervals¹⁵.
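A minimal CUDA sketch of the DoG stage, assuming the two Gaussian-blurred levels L(x, y, σ) and L(x, y, kσ) have already been computed (buffer names are assumptions):

```cuda
#include <cuda_runtime.h>

// D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), one thread per pixel.
// Launch with a 2D grid covering width x height, as in the earlier example.
__global__ void differenceOfGaussians(const float* blurFine,   // L(x, y, sigma)
                                      const float* blurCoarse, // L(x, y, k*sigma)
                                      float* dog, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int i = y * width + x;
        dog[i] = blurCoarse[i] - blurFine[i];
    }
}
```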

Keypoint Localization and Filtering

Given a set of candidate points from the detection of scale-space extrema of the DoG, the challenge is to localize each point and to filter out poorly determined points by examining the ratio of principal curvatures. A method proposed by Brown uses a Taylor expansion of the scale-space function, $D(x, y, \sigma)$, with the origin at the center of the sample point¹⁶:

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{\top}\mathbf{x} + \frac{1}{2}\mathbf{x}^{\top}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x} \qquad (4)$$

The derivatives are evaluated at the center of the sample point and the offset from the sample point is defined as $\mathbf{x} = (x, y, \sigma)^{\top}$. The local extremum $\hat{\mathbf{x}}$ is given by taking the derivative of the function with respect to $\mathbf{x}$ and setting it equal to zero:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}} \qquad (5)$$

Often additional calculations are performed to eliminate unstable extrema and edge responses. To eliminate strong edge responses, which a DoG will often produce, the principal curvature is computed from a Hessian matrix containing the partial derivatives of the DoG function, $D(x, y, \sigma)$:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix} \qquad (6)$$

Harris and Stephens have shown that we need only be concerned with the ratio of the eigenvalues¹⁷. We let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller. We then compute the trace and the determinant of $H$:

$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta \qquad (7)$$

$$\mathrm{Det}(H) = D_{xx}D_{yy} - (D_{xy})^{2} = \alpha\beta \qquad (8)$$

We then let $r$ be the ratio between the largest and smallest eigenvalues such that $\alpha = r\beta$. Then we find:

$$\frac{\mathrm{Tr}(H)^{2}}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^{2}}{\alpha\beta} = \frac{(r\beta + \beta)^{2}}{r\beta^{2}} = \frac{(r + 1)^{2}}{r} \qquad (9)$$

We can then use this ratio as a cutoff for undesired edge points. Typically, a value of $r = 10$ is used¹⁵ to eliminate keypoints whose ratio of principal curvatures is greater than $r$.
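As a small, hedged sketch of this test (the Hessian entries Dxx, Dyy, Dxy would come from finite differences of the DoG image; this is illustrative, not MOCI's implementation):

```cuda
// Returns true if the keypoint passes the edge-response test (eq. 9).
// Dxx, Dyy, Dxy are second-order finite differences of the DoG at the point.
__host__ __device__ bool passesEdgeTest(float Dxx, float Dyy, float Dxy,
                                        float r = 10.0f) {
    float tr  = Dxx + Dyy;                  // alpha + beta
    float det = Dxx * Dyy - Dxy * Dxy;      // alpha * beta
    if (det <= 0.0f) return false;          // curvatures of opposite sign
    // Accept only if Tr^2/Det < (r+1)^2/r, i.e. the curvature ratio < r.
    return tr * tr / det < (r + 1.0f) * (r + 1.0f) / r;
}
```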

Orientation and Magnitude Assignment

To achieve rotation invariance, so that we can identify the same keypoints under any rotation, we must assign an orientation to the keypoints from the previous step. We want these computations to occur in a scale-invariant manner as well, so we select the Gaussian-smoothed image $L(x, y)$ at the scale $\sigma$ where the extremum was detected. We can use pixel differences to compute the gradient magnitude, $m(x, y)$, and orientation, $\theta(x, y)$, to assign:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}} \qquad (10)$$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \qquad (11)$$


Feature Description

For each keypoint, the SIFT algorithm starts by calculating the image gradient magnitudes and orientations in a 16 × 16 region around the keypoint, using its scale to select the level of Gaussian blur for the image. A set of orientation histograms is created for each 4 × 4 subregion of the image gradient window¹⁵.

A Gaussian weighting function with σ equal to half the region size assigns weights to each sample point. Given that there are 4 × 4 histograms with 8 possible orientations each, the length of the generated feature vector is 128. In other words, there are 128 elements describing each point in the final output of the SIFT algorithm.

Figure 5: A SIFT Feature Descriptor15

FEATURE MATCHING

Feature matching can be thought of as a simple problem of Euclidean distance. First, sets of points that do not fit within a radius $r$ are eliminated: a set of close points is generated with the simple Euclidean distance $d$, where each feature has a coordinate $(x, y)$ on the image:

$$d = \sqrt{(y_{2} - y_{1})^{2} + (x_{2} - x_{1})^{2}} \qquad (12)$$

We iterate through each point in image one, $I_{1}$, and image two, $I_{2}$, and accumulate potential matches where $d < r$.

Figure 6: SIFT feature matching

For each candidate pair we then compare the feature vectors, $f$, element by element, and find the match that minimizes the 128-dimensional Euclidean distance, $M$:

$$M = \sqrt{\sum_{i=0}^{128}\left(f^{1}_{i} - f^{2}_{i}\right)^{2}} \qquad (13)$$

The resulting "matched" points should also be checked against some maximum threshold. If the minimum Euclidean distance is more than that threshold, the match should be discarded.
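A hedged brute-force CUDA sketch of this matching step (one thread per query descriptor; the array layout and threshold handling are assumptions):

```cuda
// For each descriptor in A, find the descriptor in B with minimum 128-D
// Euclidean distance (eq. 13), then apply a maximum-distance threshold.
__global__ void matchDescriptors(const float* descA, int nA,
                                 const float* descB, int nB,
                                 int* bestIdx, float maxDist) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nA) return;

    float best = 1e30f;
    int bestJ = -1;
    for (int j = 0; j < nB; ++j) {
        float d2 = 0.0f;
        for (int k = 0; k < 128; ++k) {
            float diff = descA[i * 128 + k] - descB[j * 128 + k];
            d2 += diff * diff;
        }
        if (d2 < best) { best = d2; bestJ = j; }
    }
    // Discard matches beyond the threshold (squared distances compared).
    bestIdx[i] = (best < maxDist * maxDist) ? bestJ : -1;
}
```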

MULTIVIEW RECONSTRUCTION

Once the features have been identified for each image

and the features between images have been matched, the

image planes must be placed into ℝ³. The matched keypoints and camera information must be used to triangulate the location of each identified feature in ℝ³.

Moving into 3D space

The first step to moving a keypoint into ℝ³ is to place it onto a plane in ℝ². The coordinates $(x', y')$ in ℝ² require the size of a pixel, $d_{pix}$, the location of the keypoint, $(x, y)$, and the resolution of the image, $(x_{res}, y_{res})$, to yield:

$$x' = d_{pix}\left(x - \frac{x_{res}}{2}\right) \qquad (14)$$

$$y' = d_{pix}\left(\frac{y_{res}}{2} - y\right) \qquad (15)$$

This is repeated for the other matching keypoint. The coordinate $(x', y', z')$ in ℝ³ of the keypoint $(x', y')$ in ℝ² is given by three rotation matrices and one translation matrix. First we treat $(x', y')$ in ℝ² as a homogeneous vector in ℝ³ to yield $(x', y', 1)$. Given a unit vector representing the camera's orientation, in our case the spacecraft camera's, $(r_x, r_y, r_z)$, we find the angle to rotate about each axis, $(\theta_x, \theta_y, \theta_z)$. In a simple case, we find the angle in the $xy$ plane with:

$$\theta_z = \cos^{-1}\frac{\begin{bmatrix}1 & 0 & 0\end{bmatrix} \cdot \begin{bmatrix}r_x & r_y & r_z\end{bmatrix}}{\left\lVert\begin{bmatrix}1 & 0 & 0\end{bmatrix}\right\rVert \left\lVert\begin{bmatrix}r_x & r_y & r_z\end{bmatrix}\right\rVert} \qquad (16)$$

Rotations for all planes are generated in an identical way. Now, given a rotation in each plane, $(\theta_x, \theta_y, \theta_z)$, we calculate the homogeneous coordinate $(x', y', z', 1)$ in ℝ³ using linear transformations. The values $(T_x, T_y, T_z)$ represent a translation in ℝ⁴ and use the camera position coordinates $C_x, C_y, C_z$, the camera unit vectors representing orientation $u_x, u_y, u_z$, and the focal length $f$; here $(x_0, y_0, z_0)$ denotes the rotated point from equation 17 prior to translation:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x & 0 \\ 0 & \sin\theta_x & \cos\theta_x & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 & 0 \\ \sin\theta_z & \cos\theta_z & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} \qquad (17)$$

$$\begin{bmatrix} C_x - (x_0 + f \cdot u_x) \\ C_y - (y_0 + f \cdot u_y) \\ C_z - (z_0 + f \cdot u_z) \\ 1 \end{bmatrix} = \begin{bmatrix} T_x \\ T_y \\ T_z \\ 1 \end{bmatrix} \qquad (18)$$

$$\begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \\ z_0 \\ 1 \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} \qquad (19)$$

Point and Vector format

The resulting transformations in equations 17 and 19 should be performed for all matched points. This results in $n$ homogeneous points of the form $(x_n, y_n, z_n, 1)$. Each point has a corresponding camera, whose position is already known as the coordinate $(C_{x_n}, C_{y_n}, C_{z_n}, 1)$. From this we find a vector $v_n$ from the camera position:

$$v_n = \begin{bmatrix} C_{x_n} - x_n \\ C_{y_n} - y_n \\ C_{z_n} - z_n \end{bmatrix} \qquad (20)$$

$v_n$ should then be normalized so that it is a unit vector.
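A hedged sketch of equations 14, 15, and 20 (the names and the flat-plane placement before rotation and translation are assumptions):

```cuda
#include <math.h>

struct Vec3 { float x, y, z; };

// Eqs. (14)-(15): pixel (x, y) to image-plane coordinates, given pixel
// pitch dpix and the image resolution.
__host__ __device__ Vec3 pixelToPlane(float x, float y,
                                      float dpix, float xres, float yres) {
    Vec3 p;
    p.x = dpix * (x - xres / 2.0f);
    p.y = dpix * (yres / 2.0f - y);
    p.z = 0.0f;   // on the plane, prior to rotation and translation
    return p;
}

// Eq. (20): unit ray from camera position C through the 3D keypoint p.
__host__ __device__ Vec3 rayFromCamera(Vec3 C, Vec3 p) {
    Vec3 v = { C.x - p.x, C.y - p.y, C.z - p.z };
    float len = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    v.x /= len; v.y /= len; v.z /= len;
    return v;
}
```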

N-view Reprojection

Now the point cloud can finally be generated. The goal of the n-view reprojection is to find the point, $p$, that best fits a set of lines. Traa shows we can start with the distance function, $D$, between our ideal point, $p$, and a line parameterized by a vector, $v$, and a point, $a$. We can think of the distance function as a projector onto the orthocomplement of $v$, giving¹⁸:

$$D(p;\, a, v) = I - vv^{\top} \qquad (21)$$

Equation 21 should be thought of as projecting the vectors $p$ and $a$ onto the space orthogonal to $v$. The challenge is solving this least squares problem given only matching sets of points $a_n$ and their vectors $v_n$. Let the set of matched points/vectors be represented by the set $L = \{a_0, v_0, \ldots, a_n, v_n\}$. We can view this set $L$ as a set of parameterized lines. We minimize the sum of squared differences with the equation:

$$D(p;\, a, v) = \sum_{i=0}^{n} D(p;\, a_i, v_i) \qquad (22)$$

To produce the best-fit point $\hat{p}$, the equation to minimize is:

$$\hat{p} = \underset{p}{\arg\min}\; D(p;\, a_n, v_n) \qquad (23)$$

Taking the derivative with respect to $p$ and setting it equal to zero, we obtain:

$$\frac{\partial D}{\partial p} = -2\sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right)\left(a_i - p\right) = 0 \qquad (24)$$

We then obtain a linear system of the form $Ap = b$, where:

$$A = \sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right), \qquad b = \sum_{i=0}^{n}\left(I - v_i v_i^{\top}\right) a_i \qquad (25)$$

Traa shows that we can either solve the system directly or apply the Moore-Penrose pseudoinverse:

$$\hat{p} = A^{+} b \qquad (26)$$

The resulting $\hat{p}$ is the point of best fit for the members of set $L$. Once $\hat{p}$ is calculated, the next set of point/vector matches is loaded. The computation is repeated until all best-fit points $\hat{p}$ have been calculated. At the end of this stage in the pipeline, the point cloud has been generated.
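A hedged host-side sketch of equations 24 through 26, accumulating A and b and solving the 3×3 system directly with Cramer's rule rather than a pseudoinverse:

```cuda
struct Vec3 { float x, y, z; };

static float det3(const float M[3][3]) {
    return M[0][0]*(M[1][1]*M[2][2]-M[1][2]*M[2][1])
         - M[0][1]*(M[1][0]*M[2][2]-M[1][2]*M[2][0])
         + M[0][2]*(M[1][0]*M[2][1]-M[1][1]*M[2][0]);
}

// Least-squares intersection of n lines (point a_i, unit direction v_i):
// build A = sum(I - v v^T) and b = sum((I - v v^T) a) (eq. 25), then solve
// A p = b. Sketch only; assumes A is well-conditioned.
Vec3 intersectLines(const Vec3* a, const Vec3* v, int n) {
    float A[3][3] = {{0,0,0},{0,0,0},{0,0,0}}, b[3] = {0,0,0};
    for (int i = 0; i < n; ++i) {
        const float vi[3] = { v[i].x, v[i].y, v[i].z };
        const float ai[3] = { a[i].x, a[i].y, a[i].z };
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c) {
                float P = (r == c ? 1.0f : 0.0f) - vi[r] * vi[c]; // I - v v^T
                A[r][c] += P;
                b[r]    += P * ai[c];      // row r of (I - v v^T) a
            }
    }
    float d = det3(A);
    float Ax[3][3], Ay[3][3], Az[3][3];
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
            Ax[r][c] = (c == 0) ? b[r] : A[r][c];
            Ay[r][c] = (c == 1) ? b[r] : A[r][c];
            Az[r][c] = (c == 2) ? b[r] : A[r][c];
        }
    Vec3 p = { det3(Ax) / d, det3(Ay) / d, det3(Az) / d };
    return p;
}
```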

Bundle Adjustment

All the calculations to this point have been in preparation for a reprojection, which is a simple triangulation. The Bundle Adjustment is used to calibrate the positions of features in ℝ³ based on a camera calibration matrix and to minimize the reprojection error¹⁹. The camera calibration matrix, $K$, is stored by the spacecraft at the time of image acquisition. Given the location of an observed feature, $(\hat{x}, \hat{y})$, and the real location of the feature, $(x, y)$, the reprojection error, $r$, for that feature is given by:

$$r = (\hat{x} - x,\; \hat{y} - y) \qquad (27)$$

The camera parameters include the focal lengths $(f_x, f_y)$, the center pixel $(c_x, c_y)$, and coefficients $(k_1, k_2)$ that represent the first and second order radial distortion of the lens system. The vector $P$ contains those six camera parameters, the feature's position in ℝ³ given in equation 19 as $(x', y', z')$, and the camera's position and orientation represented by $C_x, C_y, C_z$ and $u_x, u_y, u_z$, as in equation 18. The goal is to minimize the function of vector $P$:

$$\min f(P) = \frac{1}{2}\, r(P)^{\top} r(P) \qquad (28)$$

MOCI's system will supply a pre-estimation of camera data from sensors onboard, allowing for a bound-constrained bundle adjustment that will greatly improve the speed of the computation. The Levenberg-Marquardt (LM) algorithm is utilized to minimize the problem by iteratively solving a sequence of linear least-squares problems. A constrained multicore implementation of the bundle adjustment would only be a slight modification of the parallel algorithm proposed and described by Wu¹⁹. This is the last step of the point cloud generation process; when the reprojection error is minimized, the point cloud computation is considered done. Additional research is necessary to determine if MOCI needs to perform a bundle adjustment rather than a real-time custom calibration step before a standard n-view reprojection.
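For intuition, a single hedged Levenberg-Marquardt step is sketched below: a damped Gauss-Newton update on a small parameter block with a dense Jacobian. This is the textbook method, not Wu's multicore implementation¹⁹.

```cuda
// One Levenberg-Marquardt update for a 3-parameter problem:
// solve (J^T J + lambda I) delta = -J^T r, then p <- p + delta.
// J is m x 3 (Jacobian of the residuals), r is m x 1. Sketch only.
void lmStep(const float* J, const float* r, int m, float lambda, float p[3]) {
    float H[3][3] = {{0,0,0},{0,0,0},{0,0,0}}, g[3] = {0,0,0};
    for (int i = 0; i < m; ++i) {
        for (int a = 0; a < 3; ++a) {
            g[a] += J[i*3 + a] * r[i];                 // J^T r
            for (int b = 0; b < 3; ++b)
                H[a][b] += J[i*3 + a] * J[i*3 + b];    // J^T J
        }
    }
    for (int a = 0; a < 3; ++a) H[a][a] += lambda;     // damping term

    // Solve H delta = -g by Gaussian elimination (no pivoting; sketch).
    float x[3] = { -g[0], -g[1], -g[2] };
    for (int k = 0; k < 3; ++k) {
        for (int i = k + 1; i < 3; ++i) {
            float f = H[i][k] / H[k][k];
            for (int j = k; j < 3; ++j) H[i][j] -= f * H[k][j];
            x[i] -= f * x[k];
        }
    }
    for (int k = 2; k >= 0; --k) {
        for (int j = k + 1; j < 3; ++j) x[k] -= H[k][j] * x[j];
        x[k] /= H[k][k];
    }
    for (int a = 0; a < 3; ++a) p[a] += x[a];
}
```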

POINT CLOUD NORMALIZATION

Once the point set has been generated it must be oriented

so that a more accurate surface reconstruction can take

place.

Finding the Normals of a Point Set

The inputs to this stage are the coordinates of the points in the point cloud and the camera position $(C_x, C_y, C_z)$ which generated each point. The problem of determining the normal to a point on the surface is approximated by estimating the tangent plane of the point and then taking the normal vector to the plane. However, the correct orientation of the normal vector cannot be directly inferred mathematically, so an additional subroutine is needed to orient each normal vector. Let the points in the point cloud be members of the set $P = \{p_0, p_1, \ldots, p_n\}$ where $p_n = (x_n, y_n, z_n)$. The normal vector of $p_n$ is $n_n$, which we want to compute for all $p_n \in P$. Lastly, the camera position corresponding to $p_n$ is denoted, in vector form, $C_n = (C_{x_n}, C_{y_n}, C_{z_n})$.

An octree data structure is used to search for the nearest neighbors of a point $p_i$. The $k$ nearest neighbors are defined by the set $B = \{b_0, b_1, \ldots, b_k\}$. The centroid of $B$ is calculated by:

$$\bar{c} = \frac{1}{k}\sum_{b \in B} b \qquad (29)$$

Let $A$ be a $k \times 3$ matrix whose rows are the neighbors (in practice taken relative to the centroid $\bar{c}$, so that $A^{\top}A$ is the covariance of the neighborhood):

$$A = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} \qquad (30)$$

Now we factor matrix $A$ using singular value decomposition (SVD) into $A = U\Sigma V^{\top}$, where $U$ is a $(k \times k)$ orthogonal matrix, $V^{\top}$ is a $(3 \times 3)$ orthogonal matrix, and $\Sigma$ is a $(k \times 3)$ diagonal matrix whose diagonal elements, called the "singular values" of $A$, appear in descending order. Note that the covariance matrix, $A^{\top}A$, can be easily diagonalized using our singular value decomposition:

$$A^{\top}A = V\Sigma^{\top}U^{\top}U\Sigma V^{\top} = V\left(\Sigma^{\top}\Sigma\right)V^{\top} \qquad (31)$$

The eigenvectors of the covariance matrix are the columns of $V$. The eigenvalues of the covariance matrix are the elements on the diagonal of $\Sigma^{\top}\Sigma$, and they are exactly the squares of the singular values of matrix $A$. In this formula, both $V$ and $\Sigma^{\top}\Sigma$ are $(3 \times 3)$ matrices, just like the covariance matrix $A^{\top}A$. From the diagonal elements $(\sigma_i)^2 \in \Sigma^{\top}\Sigma$ we keep only the largest ones, along with their corresponding eigenvectors in matrix $V$. To produce the best approximation of a plane in ℝ³ we take the two eigenvectors, $(e_1, e_2)$, of the covariance matrix with the highest corresponding eigenvalues. Thus, the normal vector $n_i$ is simply the cross product of these eigenvectors, $n_i = e_1 \times e_2$.
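A hedged host-side sketch of this plane fit: centered covariance, the two dominant eigenvectors via power iteration with deflation, and the normal as their cross product. A flight implementation would more likely call an SVD or eigensolver routine (e.g. from cuSOLVER); the iteration counts here are assumptions.

```cuda
#include <math.h>

struct Vec3 { float x, y, z; };

static Vec3 mul(const float C[3][3], Vec3 v) {      // 3x3 matrix times vector
    return { C[0][0]*v.x + C[0][1]*v.y + C[0][2]*v.z,
             C[1][0]*v.x + C[1][1]*v.y + C[1][2]*v.z,
             C[2][0]*v.x + C[2][1]*v.y + C[2][2]*v.z };
}
static Vec3 normalize(Vec3 v) {
    float n = sqrtf(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/n, v.y/n, v.z/n };
}

// Estimate the normal of the plane through the k neighbors b[0..k-1].
Vec3 estimateNormal(const Vec3* b, int k) {
    Vec3 c = {0, 0, 0};                              // centroid (eq. 29)
    for (int i = 0; i < k; ++i) { c.x += b[i].x; c.y += b[i].y; c.z += b[i].z; }
    c.x /= k; c.y /= k; c.z /= k;

    float C[3][3] = {{0,0,0},{0,0,0},{0,0,0}};       // centered covariance
    for (int i = 0; i < k; ++i) {
        float d[3] = { b[i].x - c.x, b[i].y - c.y, b[i].z - c.z };
        for (int r = 0; r < 3; ++r)
            for (int s = 0; s < 3; ++s) C[r][s] += d[r] * d[s];
    }

    Vec3 e1 = normalize({1, 0, 0});                  // dominant eigenvector
    for (int it = 0; it < 50; ++it) e1 = normalize(mul(C, e1));

    Vec3 e2 = {0, 1, 0};                             // second eigenvector via
    for (int it = 0; it < 50; ++it) {                // power iteration with
        Vec3 w = mul(C, e2);                         // deflation against e1
        float d = w.x*e1.x + w.y*e1.y + w.z*e1.z;
        e2 = normalize({ w.x - d*e1.x, w.y - d*e1.y, w.z - d*e1.z });
    }
    // Normal is perpendicular to the two in-plane directions: n = e1 x e2.
    return normalize({ e1.y*e2.z - e1.z*e2.y,
                       e1.z*e2.x - e1.x*e2.z,
                       e1.x*e2.y - e1.y*e2.x });
}
```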

Orienting a Point Set

Orientation of all the normals begins once we have computed the normal for every point $p_i \in P$. We also want the normals of neighboring points to be consistently oriented. For the simple case where only a single viewpoint $C'$ is used to generate a point cloud, we can simply orient our normal vector such that the following equation holds:

$$\left(C' - p_i\right) \cdot n_i < 0 \qquad (32)$$

If the equation does not hold for a computed normal vector, we simply "flip" the normal vector by taking $-n_i = (-n_x, -n_y, -n_z)$. If the dot product between the two vectors is exactly 0, then additional methods need to be applied. We also need to account for the fact that multiple camera positions may apply to each point in the point cloud. Let $C = \{\bar{C}_0, \bar{C}_1, \ldots, \bar{C}_m\}$ be the set of all $m$ camera position vectors for point $p_i$.

We define $n_i$ to be an "ambiguous normal" if:

1. There exists a $\bar{C}_j \in C$ such that $(\bar{C}_j - p_i) \cdot n_i > 0$ AND

2. There exists a $\bar{C}_j \in C$ such that $(\bar{C}_j - p_i) \cdot n_i < 0$

Normals are assigned to all points that do not generate ambiguous normals. The way to make sure all normals are consistently oriented is to first orient all the non-ambiguous normals, adding them to a list of finished normals, while placing all ambiguous normals into a queue. We then take each ambiguous normal from the queue and try to determine its orientation by looking at the neighboring points of $p_i$. If the neighboring points of $p_i$ already have finished normals, we orient $n_i$ consistently with a neighboring normal $n_j$ by requiring $n_i \cdot n_j > 0$. If the neighboring points do not yet have finished normals, we move $n_i$ to the back of the queue, and continue until all normals are finalized.
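A minimal sketch of the single-viewpoint flip, following the sign convention of equation 32 (illustrative; the multi-camera ambiguity queue described above is not shown):

```cuda
struct Vec3 { float x, y, z; };

// Orient a normal against a single viewpoint C per eq. (32): flip n when
// (C - p) . n does not satisfy the chosen sign convention. Sketch only.
__host__ __device__ Vec3 orientNormal(Vec3 C, Vec3 p, Vec3 n) {
    float d = (C.x - p.x) * n.x + (C.y - p.y) * n.y + (C.z - p.z) * n.z;
    if (d > 0.0f) {                 // violates (C - p) . n < 0: flip
        n.x = -n.x; n.y = -n.y; n.z = -n.z;
    }
    // d == 0 is the ambiguous case handled by the queue described above.
    return n;
}
```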

The point normal generation process needs significant

improvements and can be aided greatly by the

implementation of an octree data structure in CUDA6.

SURFACE RECONSTRUCTION

MOCI implements a Poisson Surface Reconstruction

algorithm that is parallelized with CUDA. The input to

this algorithm is an oriented set of points and the output

is a 3D modeled surface. This surface is the final end-

product of MOCI’s computer vision pipeline and is

stored in the Stanford PLY format.
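For reference, PLY is a simple header-plus-data container. A minimal hand-written example of an oriented point set with one face (values are illustrative only):

```
ply
format ascii 1.0
comment example oriented point set, illustrative only
element vertex 3
property float x
property float y
property float z
property float nx
property float ny
property float nz
element face 1
property list uchar int vertex_indices
end_header
0.0 0.0 0.0 0.0 0.0 1.0
1.0 0.0 0.0 0.0 0.0 1.0
0.0 1.0 0.0 0.0 0.0 1.0
3 0 1 2
```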

Poisson Surface Reconstruction

Thanks to Kazhdan, Bolitho, and Hoppe, an

implementation of Poisson surface reconstruction on any

GPU has become feasible20. A Poisson surface

reconstruction takes in a set of oriented points, V, and

generates a surface.

Figure 7: Stages of Poisson Reconstruction in 2D

The Poisson method computes an indicator function for the oriented point set by first finding the function $\chi$ whose gradient best approximates the vector field $V$. When the divergence operator is applied, this becomes a Poisson problem where the goal is to compute the scalar function $\chi$ whose Laplacian equals the divergence of the vector field $V$:

$$\Delta\chi \equiv \nabla \cdot \nabla\chi = \nabla \cdot V \qquad (33)$$

CONCLUSION AND INITIAL RESULTS

Comparison to ASTER data

When compared with ASTER data, MOCI is already

meeting minimum mission success: MOCI can generate

Digital Elevation Models within one sigma of accuracy

relative to ASTER models.

Accuracy is calculated by a percent pixel difference. A simple program is used to project the 3D surface onto a plane, essentially rasterization. This produces a histogram of how likely it is that a given elevation is off by a given amount. The percent difference is calculated relative to the range between the minimum and maximum elevations. In other words, a higher percent pixel difference corresponds to a smaller elevation error.

Figure 8: A comparison of a simulated Mount Everest reconstruction from MOCI with ASTER data

MOCI’s accuracy is likely to increase as better methods

are implemented in the computer vision pipeline.

2-view Reconstruction

A simple 2-view projection has been fully implemented

and is often used to test the point cloud reconstruction

portion of the pipeline. When supplied with near perfect

keypoint pairs, the algorithm can reconstruct a point

cloud with 86% accuracy. Accuracy is calculated by a

percent pixel difference, which plots a histogram of how

likely a given elevation is to be a given distance off.

A CPU implementation of 2-view reconstruction runs on

a 50,000 point set in approximately 25 minutes. It is

expected that this stage, when optimized for the GPU,

will take less than 90 seconds19.


Figure 9: A point cloud of Mount Everest generated

from a simulation of a 2-view reconstruction

Simulation of Data Acquisition

The simple blender workflow that was demonstrated in

the initial feasibility study has been expanded, improved,

and is now included in a custom simulation package10.

This simulation package has the capability to simulate a

satellite with variable imaging payloads. This allowed the SSRL to determine the optical lens system, GSD, focal length, and sensor that meet the mission requirements. The simulation software allows the user to edit these as parameters, from which the GSD is calculated. The simulation also allows for variable orbits, variation of ground targets, custom target objects, and more. A list of generated and input variables, stored in a JSON file, shows some

of the current capabilities of the simulation. The SSRL is

currently working on porting these simulations to the

supercomputing cluster available at UGA. Once the user

has created a JSON file with the variables and environment

that they would like to simulate, they can run the image

acquisition simulation in a terminal10.

Figure 10: A point cloud of Mount Everest

generated from a simulated orbit over the region.

Simulated data acquisition can be piped into any reconstruction algorithm. The

orientation of the camera, as well as the image set, are all

part of the standard output.

FUTURE WORK

Testing and Simulation

While initial results are promising and terrestrial

technologies have shown that MOCI’s computer vision

pipeline is successful, more tests are needed to

understand the limitations and capabilities of MOCI’s

computer vision system.

N-view Reconstruction vs. Bundle Adjustment

It is unclear if a full Bundle Adjustment is necessary

given MOCI’s knowledge of its camera parameters. It

may be possible to calibrate the first and second order

radial distortion of the lens system once for all images

and feature matches instead of calculating it every time

a reprojection occurs. It may also be the case that, despite

somewhat accurate knowledge of camera parameters,

MOCI’s multi-view reconstruction stage will benefit

from improved accuracy with a bound constrained

Bundle Adjustment.

IMPLICATIONS

The complex computation system on the MOCI cube

satellite may show that it is worth performing more

complex computations in space rather than on the

ground. In MOCI’s case, it is beneficial because the 40

or more 4K images that it takes to generate a 3D model

contain much more data than the final 3D model. This

could also be the cause for real time data analysis, with

a GPU accelerated system it may be possible to analyze

data onboard a spacecraft to determine which data is the

most useful and prioritize the downlink of that data. This

has clear applications for autonomous space system or

deep space missions. During a deep space mission, it

would be possible to implement an AI to decide what

data is worth sending back to Earth. In general, it’s likely

that Neural Networks will be easily implemented for

space based applications on the TX2, or a system like

MOCI’s.

Acknowledgments

A significant thank you to Aaron Martinez, who helped

me understand most of the mathematics of MOCI’s

computer vision pipeline. The same thanks goes to

Nicholas (Hollis) Neel, who helped me understand the

concepts relating to the SIFT algorithm. Thanks to

Jackson Parker for being the person who usually has to

experiment with writing these algorithms in CUDA.

Last, but not least, I would like to thank Dr. David Cotten, who has helped guide the SSRL to where it is

today.


References

1. Rossi, Adam J., "Abstracted Workflow

Framework with a Structure from Motion

Application" (2014). Thesis. Rochester Institute of

Technology. Accessed from

http://scholarworks.rit.edu/theses/7814

2. Michot J., Bartoli A., Gaspard F., "Algebraic Line Search for Bundle Adjustment," British Machine Vision Conference (BMVC).

3. Wu C., "Towards linear-time incremental structure from motion," 2013 International Conference on 3D Vision (3DV), 2013.

4. Mak J., Hess-Flores M., Recker S., Owens J.D., Joy K.I., "GPU-accelerated and efficient multi-view triangulation for scene reconstruction," IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.

5. Aniruddha Acharya K and R. Venkatesh Babu,

"Speeding up SIFT using GPU," 2013 Fourth

National Conference on Computer Vision, Pattern

Recognition, Image Processing and Graphics

(NCVPRIPG), Jodhpur, 2013, pp. 1-4.

6. K. Zhou, M. Gong, X. Huang and B. Guo, "Data-

Parallel Octrees for Surface Reconstruction," in

IEEE Transactions on Visualization and

Computer Graphics, vol. 17, no. 5, pp. 669-681,

May 2011.

7. J. Stoddard, D. Messinger and J. Kerekes, "Effects

of cubesat design parameters on image quality and

feature extraction for 3D reconstruction," 2014

IEEE Geoscience and Remote Sensing

Symposium, Quebec City, QC, 2014, pp. 1995-

1998.

8. Boulch A., Marlet R., "Fast Normal Estimation for Point Clouds with Sharp Features using a Robust Randomized Hough Transform," Computer Graphics Forum, Wiley, 2012, 31(5), pp. 1765-1774. HAL Id: hal-00732426, https://hal-enpc.archives-ouvertes.fr/hal-00732426

9. Adams, C., Neel N., “Structure from Motion from

a Constrained Orbiting Platform Using ISS image

data to generate cloud height models.”, presented

at the NASA/CASIS International Space Station

Research and Development Conference,

Washington D.C., 2017.

10. Adams, C., Neel N., “The Feasibility of Structure

from Motion over Planetary Bodies with Small

Satellites”, presented at the The AIAA/Utah State

Small Satellite Conference - SmallSat, Logan

Utah, 2017.

11. Likar J. J., Stone S. E., Lombardi R. E., Long K.

A., “Novel Radiation Design Approach for

CubeSat Based Missions.” Presented at 24th

Annual AIAA/USU Conference on Small

Satellites, August 2010

12. Hartley R., Kahl F. (2007) Optimal Algorithms in

Multiview Geometry. In: Yagi Y., Kang S.B.,

Kweon I.S., Zha H. (eds) Computer Vision –

ACCV 2007. ACCV 2007. Lecture Notes in

Computer Science, vol 4843. Springer, Berlin,

Heidelberg

13. Chrzeszczyk, Andrzej & Chrzeszczyk, Jakub.

(2013). Matrix computations on the GPU,

CUBLAS and MAGMA by example.

14. Hassaballah, M & Ali, Abdelmgeid & Alshazly,

Hammam. (2016). Image Features Detection,

Description and Matching. 630. 11-45.

10.1007/978-3-319-28854-3_2.

15. Lowe, D.G.: Distinctive image features from

scale-invariant keypoints. Int. J. Comput. Vis.

60(2), 91–110 (2004)

16. M. Brown and D. Lowe. Invariant Features from

Interest Point Groups. In David Marshall and Paul

L. Rosin, editors, Proceedings of the British

Machine Conference, pages 23.1-23.10. BMVA

Press, September 2002.

17. Harris, C. and Stephens, M. (1988) A Combined

Corner and Edge Detector. Proceedings of the 4th

Alvey Vision Conference, Manchester, 31

August-2 September 1988, 147-151.

18. Traa J., "Least Squares Intersection of Lines," UIUC, 2013, http://cal.cs.illinois.edu/~johannes/research/LS_line_intersect.pdf

19. Wu C., Agarwal S., Curless B., Seitz S.M., "Multicore bundle adjustment," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

20. Michael Kazhdan, Matthew Bolitho, and Hugues

Hoppe. 2006. Poisson surface reconstruction. In

Proceedings of the fourth Eurographics

symposium on Geometry processing (SGP '06).

Eurographics Association, Aire-la-Ville,

Switzerland, Switzerland, 61-70.