Comprehensive Assessment of Neural Network Synthetic Training Methods
using Domain Randomization for Orbital and Space-based Applications
Marco Peterson, Minzhen Du, Nadhir Cherfaoui, Alby Koolipurackal, Daniel D. Doyle, Jonathan T. Black
Kevin T. Crofton Department of Aerospace and Ocean Engineering,
Hume Center for National Security and Technology,
Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Abstract: The continued advancement of neural networks
and other deep learning architectures has fundamentally
changed the definition of “State of the art” (SOTA) in a
wide and ever-growing range of disciplines. Arguably the most
impacted field of study is that of computer vision, providing
flexible and general function-approximation frameworks capa-
ble of accurately and reliably performing object identification
and classification on a wide range of image datasets. However,
the impressive gains achieved by deep learning methods come at
a cost. The incredibly large number of images required to train
a deep network makes them prohibitive for certain applications
where large image datasets are limited or simply do not exist.
Collecting the required image data is often too expensive, too
dangerous, or too cumbersome to gather for many problem
sets. Space-based applications are a perfect example of an
imagery limited domain due to its complex and extreme envi-
ronment. Conversely, breakthroughs in space-based computer
vision applications would enable a wide range of fundamental
capabilities required for the eventual automation of this critical
domain, including robotics-based construction and assembly,
repair, and surveying tasks of orbital platforms or celestial
bodies. To bridge this gap in capability researchers have started
to rely upon 3D rendered synthetic image datasets generated
from advanced 3D rasterization software.
Generating synthetic data is only step one. Space-based
computer vision is more complex than traditional terrestrial
tasks due to the extreme variances encountered that can confuse
or degrade optical sensors and CV and ML algorithms. These
include orientation and translation tumble in all axes, inability
for a model to orient on a horizon, and extreme light saturation
on lit sides of celestial bodies and occlusion on dark sides
and in shadows. To produce a more reliable and robust result
for computer vision architectures, a new method of synthetic
image generation called domain randomization has started to
be applied to more traditional computer vision problem sets.
This method involves creating an environment of randomized
patterns, colors, and lighting while maintaining rigid structures
for objects of interest. This may prove a promising solution
to the space-based variance problem. This paper explores
computer vision, domain randomization, and the necessary
computational hardware required to apply them to space-based
applications.
I. INTRODUCTION
*This work was supported by the Hume Center for National Security and Technology, and the Institute for Critical Technology and Applied Science (ICTAS), https://hume.vt.edu/ and https://ictas.vt.edu/

Fig. 1: Synthetic Training (top image) used for real-world inferencing (bottom image) on space-based applications.

Optical-based navigation deployed onboard the Mariner 6 and Mariner 7 missions [1] to Mars in 1969 is arguably the first use of computer vision for a space application.
This capability would later prove to be mission essential
technology for the Voyager missions a decade later. Today,
the most recent use of this technology can be found onboard
the recent history-making Mars Helicopter Ingenuity [2],
which employed a downward-facing monochrome camera
to capture relative motion velocity for real-time navigation
using velocimetry. This method of local navigation was selected in lieu of absolute navigation because sufficient image data to perform feature matching on the surface of Mars did not exist. So how do we extend computer vision capability beyond simple feature extraction and matching on known image datasets to generalized automation tasks across a wide range of environments, and how do we obtain the image datasets required?
Pioneering Convolutional Neural Network (CNN) architectures such as AlexNet [4], GoogLeNet [5], and ResNet [6], and most recently YOLO [7], have provided researchers
and engineers with an automation tool to perform object
identification and classification. However, the success of
these neural network architectures has long been a product
of training with human labeled, real-world imagery.
A promising alternative to overcome the dependence on
these often time-consuming real world datasets is the adapta-
tion of physics and environmental engines such as the Unreal
Engine [8], Unity [9], and Blender [10] to generate synthetic
datasets, allowing a faster and more efficient data
collection process. Although it accelerates the impact of
machine learning on industries such as robotics, synthetic
data generation comes with numerous limitations due to
discrepancies between simulation and the real world. Even
by altering the parameters of the simulation to match those
of the physical world, the process remains error-prone and
ultimately unreliable due to physical behaviors like fluid
dynamics that cannot be incorporated in the simulator, a
problem otherwise known as the reality gap. In this paper,
we explore the concept of Domain Randomization [18] [19]
[20] [21] [23] [24], its effectiveness in closing the reality
gap, the most recent advances in CNNs and how to deploy
them in orbit (Fig. 1).
A. Today’s State-of-the-Art CNNs
CNNs are networks of computational layers that employ
convolutions over grid-like topology input data (e.g. images
or time-series data) to extract feature maps. These architec-
tures are most commonly used to derive features from images
and videos because they are capable of deducing relation-
ships and dependencies between pixel structures. Due to the
rapid development of machine learning, new architectures
are developed monthly (if not weekly) that advance the state
of the art, but almost all of them use the same fundamental
building blocks. Today’s CNNs have two general functions:
feature extraction and classification.
Feature extraction is accomplished via the convolutional
block, which is further divided into three subparts. The first
is the convolution operation itself, which is governed by the
size and weights of the convolution kernel. Downstream,
an activation function (generally zero-centered) defines the
output of a node and adds non-linearity to the network.
Lastly, a pooling operation is often included which serves
to refine a large matrix space into smaller feature maps by
prioritizing the most-pronounced components of the input
and discarding the rest. After pooling is completed, a feature
map matrix containing the relevant information encoded
within the input image is returned. Multiple convolutional
blocks can be (and often are) chained together to further
refine the feature map.
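As a minimal, hypothetical sketch of the convolutional block just described (convolution, activation, pooling), the following PyTorch snippet is illustrative only and not drawn from any architecture cited in this paper; the channel counts and kernel size are arbitrary assumptions:

```python
import torch
import torch.nn as nn

# One convolutional block: convolution -> activation -> pooling.
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # learnable kernel weights
    nn.ReLU(),                      # activation adds non-linearity to each node's output
    nn.MaxPool2d(kernel_size=2),    # keep the most pronounced response in each 2x2 window
)

image = torch.randn(1, 3, 64, 64)   # random stand-in for one RGB input image
feature_map = conv_block(image)     # feature map returned by the block
print(feature_map.shape)            # torch.Size([1, 16, 32, 32])
```

Chaining several such blocks, as described above, amounts to appending further convolution/activation/pooling triplets to the same container.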
Classification is the last operation employed by (most)
CNNs. During this step, the feature map is processed via a
series of fully connected layers. This results in a probability
score that represents the CNNs confidence that the input
image contains/matches a defined object. This classification
is then compared against the ground truth of the (manually)
labeled input, and an error metric is computed. The error is
then backpropagated throughout the neural network to each
weight via (stochastic) gradient descent which minimizes the
error between the final probability output and the ground
truth.
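A correspondingly minimal sketch of the classification stage and a single backpropagation update follows; it is again hypothetical, with the layer sizes, class count, and learning rate chosen purely for illustration:

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),                    # feature-map matrix -> vector
    nn.Linear(16 * 32 * 32, 128),    # fully connected layers
    nn.ReLU(),
    nn.Linear(128, 10),              # scores for 10 hypothetical object classes
)
loss_fn = nn.CrossEntropyLoss()      # error metric against the ground-truth label
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)  # (stochastic) gradient descent

feature_map = torch.randn(1, 16, 32, 32)   # output of the convolutional blocks
label = torch.tensor([3])                  # manually labeled ground-truth class index

optimizer.zero_grad()
scores = classifier(feature_map)           # confidence scores for each class
loss = loss_fn(scores, label)              # error between prediction and ground truth
loss.backward()                            # backpropagate the error to every weight
optimizer.step()                           # update the weights to reduce the error
```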
Given the extremely rapid pace of machine learning al-
gorithms, techniques, and methodologies, it is nearly im-
possible to exhaustively describe all of them. Some of the
best-performing algorithms for these tasks today are R-CNN [11], Fast R-CNN [12], and YOLO [7]. The YOLO architecture, which stands for “You Only Look Once,” and its accompanying Darknet framework are consistently regarded as leaders in the field of image classification, and YOLO is the only one of these capable of acceptable frame rates on low-power edge hardware. The architecture has gone through four iterations since its conception to keep pace with new advancements. YOLO derives its detection speed in part from the introduction of anchor boxes, defined as a predefined collection of boxes with widths and heights chosen to match object sizes in a particular dataset, or some other criterion based on the problem set. Instead of the sliding-window method, in which an arbitrarily sized box requires multiple passes over an image in the hope of finding an object feature that nearly perfectly matches the predefined window dimensions, thousands of candidate anchor boxes are dispersed throughout a given input image. Each anchor box is assigned an objectness score, computed from its Intersection over Union (IoU) with labeled boxes, which estimates the probability that an object exists within that anchor box regardless of the predicted label, thereby requiring only a single pass over the image (Fig. 2). Objectness scores combined with object classification probabilities determine where an object is and what it is for a given input image. Both of these parameters are incrementally updated via backpropagation, improving the size and shape of the anchor boxes as the model is trained.
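The IoU measure underlying these objectness scores can be written down directly. The short Python sketch below is an illustrative example only (the corner-coordinate box convention is an assumption, not taken from the YOLO/Darknet source):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes [x_min, y_min, x_max, y_max]."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

anchor = [50, 50, 150, 150]        # candidate anchor box
ground_truth = [60, 40, 170, 160]  # labeled object box
print(iou(anchor, ground_truth))   # ~0.63; compare against an IoU threshold such as 0.5
```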
B. How CNNs are Evaluated
Mean average precision is one of the industry standards
for evaluation of a machine learning architecture, labeling
methodology, and input imagery in regards to their combined
ability to classify and localize an object or set of objects. This
metric is a function of the precision and recall curves defined below in Equations 1 and 2, respectively, where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. Precision and recall serve as metrics that quantify classification accuracy during training.
Fig. 2: Intersection over Union (IoU) bounding box threshold
examples.
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \quad (1) \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \quad (2) \]
Average Precision is a measure of the area under the
Precision-Recall Curve (PR Curve) and is often defined as in
Equation 3. The Mean Average Precision (mAP) is simply
defined as in Equation 4.
\[ AP = \int_{0}^{1} p(r)\, dr \quad (3) \]

\[ \mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i \quad (4) \]
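For concreteness, Equations 1 through 4 can be evaluated numerically as in the NumPy sketch below; the detector scores and labels are fabricated toy values, and trapezoidal integration of the PR curve is one of several common approximations of Equation 3:

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision (Eq. 1) and recall (Eq. 2) for one class at a confidence threshold."""
    predicted = scores >= threshold
    tp = np.sum(predicted & labels)          # true positives
    fp = np.sum(predicted & ~labels)         # false positives
    fn = np.sum(~predicted & labels)         # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Toy detector outputs for a single class: confidence scores and ground-truth flags.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.2])
labels = np.array([True, True, False, True, False, True, False])

# Sweep thresholds to trace the PR curve, then integrate it to approximate AP (Eq. 3).
points = [precision_recall(scores, labels, t) for t in np.linspace(1.0, 0.0, 21)]
precisions, recalls = zip(*points)
ap = np.trapz(precisions, recalls)           # area under the PR curve
print(f"AP = {ap:.3f}")
# mAP (Eq. 4) is the mean of AP over all N object classes.
```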
A brief background on computer vision used in space
applications, the state of the art of CNNs, and one of the
industry standards for evaluating classifier performance has
been covered in this section. In Section II, an approach to
synthetic imagery is covered along with design questions that
help to address training data development for CNN models.
After initial training data development, improving model
performance through domain randomization is covered in
Section III. The ability to move from one domain to another
by domain adaptation is covered in Section IV. Next, Section
V covers computer vision in the space industry today. Section VI addresses the space hardware required for deployment of a space-based computer vision system and takes a look at the tempo of related research. Lastly, conclusions and future work are provided in Section VII.
II. SYNTHETIC IMAGERY
The precision of neural networks has often been attributed
to architectural decisions such as number of layers, number
of neurons populating those layers, learning rates, as well
as the nonlinear and pooling functions adopted. Over the
past decade, several best practices have been developed and combined in a variety of ways to produce more capable neural networks. However, the quality (or lack thereof) of the input data also has a profound impact on performance.
Training models entirely on computer-generated worlds was introduced in 2017 [25] as a means of generating massive amounts of training data for the complex “self-driving automobile” problem. Furthermore, by leveraging the photo-realistic environment of Grand Theft Auto V (GTA 5) [21], this method demonstrated the ability to augment datasets with controllable parameters such as geographic region, time of day, and weather to increase model robustness. With a generated dataset of over 200,000 synthetic images pushed through the Faster R-CNN architecture, this method achieved a mean average precision of just under 70 percent at an IoU threshold of 0.7 when validated against the KITTI dataset [17]. Fortunately for this particular problem set, a high-fidelity world already existed, with liberal copyright terms for research applications and the ability to extract data directly from the game engine's buffers. The next step is expanding this capability to a wider range of use cases. Given the recent advancement
and proliferation of state of the art 3D rasterization engines
capable of rendering hyper-realistic environments at ever
increasing frame rates, it is now possible to substitute this
data with photo-realistic or cell shaded versions generated
within a digital world. Tools such as Nvidia’s “Ray Tracing”
[14] and “deep learning super-sampling” (DLSS) [15], as
well as the Unreal Engine’s “MetaHuman” Creator [16], have removed significant technical barriers to entry. Synthetic
datasets are now becoming more and more practical to a
growing number of research teams. Moreover, effectively
controlling desired parameters such as size, shape, distance,
and angle of every object and asset in a scene is a capability
difficult to replicate in the real world without the use of
a studio workspace. For these reasons, synthetic data has
become increasingly popular to both reduce the cost of the
data collection process and introduce greater flexibility and
variety. It also provides a thorough understanding of, and control over, the simulated environment and can therefore be used to create large datasets. Designing such datasets raises several questions:
- How accurately can the real world, with all its complex physical attributes, be simulated?
- Which characteristics are relevant for modeling and which ones are unnecessary?
- Which augmentations matter for a specific problem set and which do not?
III. DOMAIN RANDOMIZATION
Domain randomization is simply the process of introducing enough variability into a simulation that real-world objects are eventually treated as just another randomized permutation, allowing models to generalize to a greater extent than those trained on non-randomized datasets. In the past, the use of
synthetic data has led to a problem known as the reality gap,
or the unavoidable deviation of any simulation from reality.
Domain Randomization is a method for creating simulated
data by constantly changing the environment of any given
dataset, reducing the need for hyper-realistic simulations.
In some use cases, producing simulated data via domain randomization has increased a neural network's ability to detect and classify real-world objects, not only inside the intended domain but across a wide range of domains.
Given the wide range of parameters that can be changed
dynamically within a synthetic environment, it is natural to
ask which parameters impact mean average precision the
most. The University of Toronto conducted an ablation study
[18] evaluating parameters such as lighting, asset textures,
orientation, Gaussian filters, and the use of flying distrac-
tors when validating their model against the KITTI dataset
[17]. Each parameter individually affected overall mAP50 by anywhere from 2% to 7%, with lighting variation having the largest impact. This approach was also applied to the task of self-driving cars. For space-domain-specific computer vision applications, the approach can be posed as a question of which other variations can be used to increase model performance, such as the following (a parameter-sampling sketch appears after this list):
- Variation of the size and distance of the objects of interest
- Variation in the object of interest itself (if applicable)
- Angle of the camera with respect to the object of interest
- Light source intensity and camera aperture size
- Inferencing with supplemental lighting only
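As referenced above, the following is a minimal sketch of how such a per-frame randomization scheme might be driven. The parameter names and ranges are illustrative assumptions only; the resulting dictionary would be handed to whichever rendering engine (e.g., Unreal Engine or Blender) generates the synthetic frame:

```python
import random

def sample_randomization(rng=random):
    """Sample one set of domain-randomization parameters for a single synthetic frame."""
    return {
        # Object of interest: geometry stays rigid, but scale and range vary.
        "target_scale": rng.uniform(0.5, 2.0),
        "target_distance_m": rng.uniform(1.0, 50.0),
        # Camera pose and optics relative to the object of interest.
        "camera_elevation_deg": rng.uniform(-90.0, 90.0),
        "camera_azimuth_deg": rng.uniform(0.0, 360.0),
        "aperture_f_stop": rng.choice([1.4, 2.8, 5.6, 11.0]),
        # Lighting: from deep shadow to full solar saturation, plus an optional fill light.
        "sun_intensity": rng.uniform(0.0, 10.0),
        "supplemental_light_only": rng.random() < 0.2,
        # Scene clutter and surface appearance.
        "num_flying_distractors": rng.randint(0, 20),
        "texture_id": rng.randint(0, 999),
    }

for frame in range(3):
    print(frame, sample_randomization())   # each dictionary drives one rendered frame
```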
To apply these new gains in mean average precision to real-world manipulation - specifically robotic arm control - several researchers, including a team from UC Berkeley [23], have implemented domain randomization. When applied to robotic arms, object classification, while still important, comes second to the ability to perform accurate object localization. Domain randomization generally improves localization, allowing robotic arms to be accurate to within 1.5 cm while retaining the ability to identify and track objects even if they are partially occluded. However, domain randomization has another use case related to robotics. In the years to come, robotic manipulators will be tasked with grasping and maneuvering small parts and fasteners, such as bolts, screws, and other fastening hardware, to assemble larger structures. Synthetically generated, domain-randomized imagery data can and should be used to train robotic platforms to identify and localize the often small parts necessary for construction or repair tasks. A sample is shown in Fig. 3.
Domain Randomization has another secondary benefit: Its
ability to supplement existing datasets by training a single
end-to-end model on a wide range of imagery that can
generalize and classify objects of interest across a wide
range of domain environments. In relation to the self driving
problem set, this translates into the ability to classify a
vehicle in any environment or weather condition across the
planet, or the ability to classify new classes of cars outside
the norm that might be found on roadways, such as golf carts. This method has been shown to achieve state-of-the-art performance by using episodic training to gradually introduce new domains to the network using only real-world imagery [22].

Fig. 3: Synthetic Data Domain Randomization (Unreal Engine 4)

Incorporating domain randomization into a
large domain generalized data set will not only be more cost
effective when collecting large amounts of data, but may also
provide grounds for significant increases in performance.
IV. DOMAIN ADAPTATION
While domain generalization attempts to train one model
for several domain use cases, domain adaptation is the
ability to effectively train a new model (target model) for
a specific domain (source) based on what has already been
learned by another model from a different domain. This
methodology [22] [27] is related to what is known as transfer
learning. Domain shift dramatically affects a deep network’s
performance because features extracted become more and
more specific with each training epoch. If the network
becomes too specific, overfitting occurs, causing the network
to fail to generalize objects within the domain from which
it was trained, and therefore, ineffective for other domains.
A number of approaches have been proposed, including re-
training the model in the target domain, adapting the weights
of the model based on the statistics of the source and target
domains, learning invariant features between domains, and
learning a mapping from the target domain to the source
domain. Researchers in the reinforcement learning commu-
nity have also studied the problem of domain adaptation
by learning invariant feature representations, adapting pre-
trained networks, and other methods.
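One of the simplest of these approaches, adapting a pre-trained network to the target domain, can be sketched as follows. This is a generic transfer-learning example in PyTorch, not an implementation of any specific cited method; the backbone choice, class count, and learning rate are assumptions, and the pre-trained weights are downloaded on first use:

```python
import torch
import torch.nn as nn
from torchvision import models

# Backbone pre-trained on a large source domain (here, generic terrestrial imagery).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False              # freeze the source-domain feature extractor

# Replace the final fully connected layer for the target-domain classes.
num_target_classes = 5                       # illustrative value
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the new head is updated when training on the (small) target-domain dataset.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)         # stand-in batch of target-domain images
labels = torch.randint(0, num_target_classes, (8,))

optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```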
A group of researchers from Google Brain [26] used domain adaptation to reduce the amount of target-domain training data required. They used a generative adversarial network (GAN) based approach to train a robotic arm to grasp objects with a dataset of nine million real images and more than eight million synthetic, domain-randomized images. By varying the percentage of real data used, the neural networks were tested on their ability to pick up previously unseen physical objects. They were trained with only synthetic data, only real data, and then synthetic data plus a percentage of real data. The neural network was able to match the accuracy of the full real dataset using only 2% of the real data when mixed with the synthetic data. In addition, combining synthetic with real data produced a significant gain in accuracy compared to using only synthetic or only real data.
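A rough sketch of how such a synthetic-plus-real mixture might be assembled for training is shown below. The datasets are tiny random placeholders standing in for the millions of images used in [26]; only the 2% real-data fraction mirrors the proportion reported there:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset, TensorDataset

# Tiny random placeholders for domain-randomized synthetic renders and real images.
synthetic = TensorDataset(torch.randn(800, 3, 32, 32), torch.randint(0, 2, (800,)))
real = TensorDataset(torch.randn(900, 3, 32, 32), torch.randint(0, 2, (900,)))

real_fraction = 0.02                                   # use only 2% of the real data
subset_size = int(len(real) * real_fraction)
indices = torch.randperm(len(real))[:subset_size].tolist()
mixed = ConcatDataset([synthetic, Subset(real, indices)])

loader = DataLoader(mixed, batch_size=32, shuffle=True)
print(len(synthetic), "synthetic +", subset_size, "real =", len(mixed), "training samples")
```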
Current neural networks will likely not work as well in space because they have been trained under gravity and on solid foundations. When moving objects
in space, the reactionary forces caused by the robot’s own
movement and that of the object it is moving would come
into effect. Disturbances and vibrations become more im-
portant in space as these can propagate through the arm or
even move the spacecraft. However, space is simply another
domain that can be adapted to from a model trained on Earth.
In addition, by training these networks on synthetic data plus a limited amount of real data from space, the networks can gain accuracy at a significantly higher rate and lower cost compared to training solely on real data.
V. SPACE INDUSTRY COMPUTER VISION
Perhaps the two most prolific topics in the field of space-based automation are On-Orbit Servicing, the process of rendezvousing and docking with a damaged or inoperable satellite and effecting necessary repairs, and In-Space Assembly, the manufacture and/or assembly of the materials needed to construct a structure in micro-gravity. The space shuttle program fulfilled servicing
missions on several occasions, including five repair missions
to the Hubble telescope. Of course, these missions were
completed with astronauts performing extravehicular activities
(EVAs), increasing overall mission risk. These EVAs helped
lay the ground work for the International Space Station
(ISS). However, as our capabilities grow, a desire to shift these responsibilities from astronauts to robotic systems has percolated throughout the industry. Furthermore, automating these capabilities will become mission-critical technology as we look to exploit cislunar space and our spacecraft travel further into the solar system. The aerospace industry has dedicated significant time and resources to solving these problem sets over the last several years, as detailed in Fig. 4.
Several thousand publications from leading conferences
such as Institute of Electrical and Electronics Engineers
(IEEE), Association for Computing Machinery (ACM), and
Society of Photo-Optical Instrumentation Engineers (SPIE)
illustrating solutions such as optimal orbits, best materials for
additive manufacturing, inverse kinematics in space, and the
economics of such a venture have been introduced over the
last 5 years. However, the publications employing machine
learning and computer vision capabilities needed to provide
a true human out of the loop solution are far less numerous
as detailed in Fig. 5.
To date, there are even fewer publications employing
domain randomization for computer vision use cases as
illustrated by Fig. 6, and to our knowledge, no literature has been published deploying domain randomization as a means to overcome the variance problem within the space domain.
VI. SPACE HARDWARE REQUIRED FOR
DEPLOYMENT
Once a computer vision model has been sufficiently trained for a desired task, deploying that model on-board a satellite, rover, or space station with the available hardware remains quite challenging.

Fig. 4: Number of In-Space Assembly and On-Orbit Assembly related publications from major Conferences and Journals 2016 - 2020

Fig. 5: Number of Computer Vision based In-Space Assembly and On-Orbit Assembly related publications from major Conferences and Journals 2016 - 2020
Conventional avionics architectures are controlled by one or more central on-board computers (OBCs) acting as a hub for all other subsystems, including sensor data, and can generally be implemented with 8-bit micro-controllers. However, performing object inferencing in real time is computationally demanding, requiring a high-performance embedded computer capable of high-bandwidth communication with the OBC.
Providing space-grade radiation hardening for such a sub-
system, for now, is inherently expensive. The computational
architecture will more than likely be modeled after a CUDA
[29] capable Graphics Processing Unit (GPU) with several
Gigabytes of onboard virtual memory (VRAM) that will need
to be protected against radiation, extreme vibrations, hard
vacuum, and high-temperature variations.
Fig. 6: Number of publications employing domain randomization for computer vision from major Conferences and Journals 2016 - 2020

Traditional Commercial-off-the-shelf (COTS) hardware is ill-suited for the extreme rigors of space. There are several edge-computing COTS devices available today, such as the
Jetson TX2 and Jetson Xavier [28] capable of running on-
board computer vision architectures at acceptable frame
rates, but almost all compute devices sold for terrestrial
applications have a temperature rating between -30°C to
+70°C, while exterior temperatures of the international space
station (ISS) range between -157°C to +121°C. Lastly, and
perhaps the most problematic issue for space electronics is
the possibility of bit flipping and data corruption caused
by high amounts of ionizing radiation found within the
Van Allen belts encapsulating our planet, solar particles,
and cosmic rays from outside our solar system. Memory
modules containing any trained computer vision model itself
will require hardened electronics. Corrupting the weights of
the neural network or the neural network itself will degrade
its capability to perform object detection and localization.
However, a correlation study between corrupt model data
and its effect on mean average precision to our knowledge
has not been conducted.
As of today, the latest radiation-tolerant 32-bit microprocessors such as the LEON4 [30] or RAD750 [31] are capable of just over 400 DMIPS (Dhrystone Million Instructions per Second), an architecture-independent performance measure. Compared to the roughly 32 TeraOPS (trillion operations per second) required to run the most modern computer vision architectures, such as YOLOv4, above 30 frames per second (FPS), space-grade CPUs are orders of magnitude short of the necessary computational capability. However, dedicated graphics processing units use 32-bit floating-point arithmetic for rasterizing polygons, a level of mathematical precision not necessarily required for the multiply-and-accumulate operations used in machine learning. Using dedicated analog or digital machine learning integrated circuits capable of high volumes of low-precision calculations, such as 8-bit integer computational architectures, will allow significantly more calculations at a lower energy cost.
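As an illustration of the kind of low-precision arithmetic referred to here, the sketch below applies PyTorch's dynamic 8-bit quantization to a small placeholder model. This is a generic example of the technique on commodity hardware, not a flight-qualified or radiation-tolerant toolchain, and backend support for quantized kernels varies by build:

```python
import torch
import torch.nn as nn

# Small trained model standing in for the dense layers of a vision classifier.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Convert linear-layer weights to 8-bit integers; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)   # same interface as the float model, lower-precision arithmetic
print(quantized)            # shows dynamically quantized Linear modules in place of nn.Linear
```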
Fig. 7: Radiation Hardened vs. Traditional Commercial-off-
the-shelf computational hardware
Given the constraints of satellite design, particularly the
cost to deliver mass to orbit and limited power distribution
provided by onboard solar panels, designing high perfor-
mance avionics and computation architectures required to
effectively deploy not only computer vision algorithms but
general machine learning technologies to the space domain
is still a significant challenge that will need to be addressed
before true automation can be achieved.
VII. CONCLUSIONS AND FUTURE WORK
Given the promising applications of CNN-based computer vision architectures, synthetic data generation, and domain randomization, applying these technologies to space-based problem sets may prove to be the mission-essential end-to-end solution needed to achieve truly autonomous on-orbit robotics capability.
In future work, we intend to start exploring this capability
by bridging both the reality gap as it pertains to the space
environment and the gap in literature on several of the unan-
swered questions outlined in this paper by demonstrating:
- The performance of a neural network trained solely on domain-randomized synthetic datasets when validated against real-world imagery within the space domain.
- How effective the combination of real-world and synthetic training (domain generalization) is at improving accuracy.
- The impact of domain randomization characteristics, such as the introduction of flying distractors as well as various textures and lighting, on mean average precision results.
- The limitations of simulation and the overall performance of neural networks trained on synthetic data.
REFERENCES
[1] “Mariner 6 amp; 7,” NASA, 07-Sep-2019. [Online]. Available:
https://mars.nasa.gov/mars-exploration/missions/mariner-6-7/
[2] J. Balaram, M. M. Aung, and M. P. Golombek, “The Ingenuity
Helicopter on the Perseverance Rover,” Space Sci. Rev., vol. 217, no.
4, pp. 1–11, 2021
[3] “ImageNet large Scale visual Recognition CHALLENGE 2017
(ILSVRC2017),”.
[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. ”Imagenet
classification with deep convolutional neural networks. Advances in
neural information processing systems 25 (2012): 1097-1105.
[5] C. Szegedy et al., “Going deeper with convolutions,” Proc. IEEE
Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 07-12-June-
2015, pp. 1–9, 2015, doi: 10.1109/CVPR.2015.7298594.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Recognit., vol. 2016-December, pp. 770–778, 2016.
[7] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal
Speed and Accuracy of Object Detection,”
[8] Unreal Engine. [Online]. Available: https://www.unrealengine.com/en-
US/unreal
[9] U. Technologies, “Solutions, Unity. [Online]. Available:
https://unity.com/solutions.
[10] Blender Foundation, “blender.org - Home of the Blender project - Free
and Open 3D Creation Software,” Blender.org. [Online]. Available:
https://www.blender.org.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Tech report (v5)
R-CNN: Regions with CNN features,” Proc. ieee Conf. Comput. Vis.
pattern Recognit., 2014
[12] E. Hanna and M. Cardillo, “Faster R-CNN2015,” Biol. Conserv., vol.
158, pp. 196–204, 2013.
[13] A. Geiger, P. Lenz, and R. Urtasun, “VSLAM Datasets, Proc. IEEE
Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 3354–3361,
2012.
[14] B. Bitterli, C. Wyman, M. Pharr, P. Shirley, A. Lefohn, and W. Jarosz,
“Spatiotemporal reservoir resampling for real-time ray tracing with
dynamic direct lighting,” ACM Trans. Graph., vol. 39, no. 4, 2020,
doi: 10.1145/3386569.3392481.
[15] “NVIDIA DLSS 2.0: A Big Leap In AI Rendering,” Nvidia.com. [On-
line]. Available: https://www.nvidia.com/en-us/geforce/news/nvidia-
dlss-2-0-a-big-leap-in-ai-rendering/.
[16] Unreal Engine, “Early Access to MetaHuman Creator is now
available!, Unreal Engine, 14-Apr-2021. [Online]. Available:
https://www.unrealengine.com/en-US/blog/early-access-to-
metahuman-creator-is-now-available.
[17] A. Geiger, P. Lenz, and R. Urtasun, “VSLAM Datasets, Proc. IEEE
Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 3354–3361,
2012.
[18] J. Tremblay et al., “Training deep networks with synthetic data:
Bridging the reality gap by domain randomization,” arXiv [cs.CV],
2018
[19] J. Tremblay, A. Prakash, D. Acuna, M. Brophy, V. Jampani, C. Anil,
T. To, E. Cameracci, S. Boochoon, and S. Birchfield, “Training Deep
Networks with Synthetic Data: Bridging the Reality Gap by Domain
Randomization,” 2018 IEEE/CVF Conference on Computer Vision and
Pattern Recognition Workshops (CVPRW), 2018.
[20] J. Huang, D. Guan, A. Xiao, and S. Lu, “FSDR: Frequency Space
Domain Randomization for Domain Generalization,” pp. 6891–6902,
2021, [Online]. Available: http://arxiv.org/abs/2103.02370.
[21] X. Yue, Y. Zhang, S. Zhao, A. Sangiovanni-Vincentelli, K. Keutzer,
and B. Gong, “Domain randomization and pyramid consistency:
Simulation-to-real generalization without accessing target domain
data,” Proc. IEEE Int. Conf. Comput. Vis., vol. 2019-Octob, pp.
2100–2110, 2019, doi: 10.1109/ICCV.2019.00219.
[22] B. Huang, S. Chen, F. Zhou, C. Zhang, and F. Zhang, “Episodic Train-
ing for Domain Generalization Using Latent Domains,” Commun.
Comput. Inf. Sci., vol. 1397 CCIS, pp. 85–93, 2021, doi: 10.1007/978-
981-16-2336-37.
[23] J. Tobin, R. Fong, A. Ray, J.Schneider, W.Zaremba, P. Abbeel,
”Domain Randomization for Transferring Deep Neural Networks
from Simulation to the Real World, 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS)
[24] E. Ameperosa, P. Bhounsule, ”Domain Randomization Using Deep
Neural Networks for Estimating Positions of Bolts,” Journal of Com-
puting and Information Science in Engineering, 2018,
[25] M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen,
and R. Vasudevan. Driving in the matrix: Can virtual worlds replace
human-generated annotations for real world tasks? In ICRA, 2017
[26] K.Bousmalis, A. Irpan, P. Wohlhar, Y. Bai, M. Kelcey, M. Kalakr-
ishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, V.
Vanhoucke, ”Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping,” 2018 IEEE International Conference on Robotics and Automation (ICRA)
[27] M. Wang and W. Deng, “2018 Survey,” Neurocomputing, vol. 312,
pp. 135–153, 2018.
[28] “Jetson AGX Xavier Developer Kit,” Nvidia.com, 09-Jul-2018. [On-
line]. Available: https://developer.nvidia.com/embedded/jetson-agx-
xavier-developer-kit
[29] CUDA Toolkit,” Nvidia.com, 02-Jul-2013. [Online]. Available:
https://developer.nvidia.com/cuda-toolkit.
[30] LEON4,” Gaisler.com. [Online]. Available:
https://www.gaisler.com/index.php/products/processors/leon4.
[31] Baesystems.com. [Online]. Available:
https://www.baesystems.com/en-us/product/radiation-hardened-
electronics