From the Lab to the Desert: Fast Prototyping and
Learning of Robot Locomotion
Kevin Sebastian Luck∗§, Joseph Campbell§, Michael Andrew Jansen†§, Daniel M. Aukes and Heni Ben Amor
School of Computing, Informatics, and Decision Systems Engineering,
School of Life Sciences,
The Polytechnic School,
Arizona State University, Tempe, Arizona 85281
Email: {ksluck, jacampb1, majanse1, danaukes, hbenamor}@asu.edu
§Authors contributed equally
Abstract—We present a methodology for fast prototyping
of morphologies and controllers for robot locomotion. Going
beyond simulation-based approaches, we argue that the form
and function of a robot, as well as their interplay with real-
world environmental conditions are critical. Hence, fast de-
sign and learning cycles are necessary to adapt robot shape
and behavior to their environment. To this end, we present
a combination of laminate robot manufacturing and sample-
efficient reinforcement learning. We leverage this methodology
to conduct an extensive robot learning experiment. Inspired
by locomotion in sea turtles, we design a low-cost crawling
robot with variable, interchangeable fins. Learning is performed
using both bio-inspired and original fin designs in an artificial
indoor environment as well as a natural environment in the
Arizona desert. The findings of this study show that static
policies developed in the laboratory do not translate to effective
locomotion strategies in natural environments. In contrast to
that, sample-efficient reinforcement learning can help to rapidly
accommodate changes in the environment or the robot.
I. INTRODUCTION
Robots are often tasked with operating in challenging en-
vironments that are difficult to model accurately. Search-and-
rescue or space exploration tasks, for example, require robots
to navigate through loose, granular media of varying density
and unknown composition, such as sandy desert environments.
A common approach is to use simulations in order to de-
velop ideal locomotion strategies before deployment. Such an
approach, however, requires prior knowledge about ground
composition which may not be available or may fluctuate
significantly. In addition, the sheer complexity of such ter-
rain necessitates the use of approximations when simulating
interactions between the robot and its environment. However,
inaccuracies inherent to approximations can lead to substantial
discrepancies between simulated and real-world performance.
These limitations are especially troublesome as robot design
is also guided by simulations in order to overcome time con-
straints and material deterioration associated with traditional
physical testing.
In this work we argue that the design of effective locomotion
strategies is dependent on the interplay between (a) the shape
of the robot, (b) the behavioral and adaptive capabilities of
the robot, and (c) the characteristics of the environment.

Fig. 1: A robot made from a multi-layer composite learns how to move across sand in the Arizona desert.

In particular, adverse and dynamic terrains require a design process in which both form and function of a robot can be rapidly
adapted to numerous environmental constraints. To this end,
we introduce a novel methodology employing a combination
of fast prototyping and manufacturing with sample-efficient
reinforcement learning, thereby enabling practical, physical testing-
based design.
First, we describe a manufacturing process in which foldable
robotic devices (Fig. 1) are constructed out of a single planar
shape consisting of multiple laminated layers of material. The
overall production time of a robot using this manufacturing
method is in the range of a few hours, i.e., from the first laser-
cut to the deployment. As a result, changes to the robot shape
can be performed by quickly iterating over several low-cost
design cycles.
In addition to rapid design refinement and iteration, the
synthesis of effective robot control policies is also of vital
importance. Variations in terrain, the assembly process, motor
properties, and other factors can heavily influence the optimal
locomotion policy. Manual coding and adaptation of control
policies is, therefore, a laborious and time-intensive process
which may have to be repeated whenever the robot or terrain
properties change, especially drift in actuation or changes in
media granularity. Reinforcement learning (RL) methods [21]
are a potential solution to this problem. Using a trial-and-error
process, RL methods explore the policy space in search of
solutions that maximize the expected reward, e.g., the distance
traveled while executing the policy. However, RL algorithms
typically require thousands or hundreds of thousands of trials
before they converge on a suitable policy [17]. Performing
large numbers of experiments on a physical robot causes
wear-and-tear on hardware, leads to drift in sensing and
actuation, and may require extensive human involvement. This
severely limits the number of learning experiments that can be
performed within a reasonable amount of time.
A key element of our approach is a sample-efficient RL [11]
method which is used for swift learning and adaptation
whenever changes occur to the robot or the environment.
By leveraging the low-dimensional nature and periodicity of
locomotion gaits, we can rapidly synthesize effective control
policies that are best adapted to the current terrain. We show
that using this method, the learning process quickly converges
towards appropriate policy parameters. This translates to learn-
ing times of about 2-3 hours on the physical robot.
We leverage this methodology to conduct an extensive robot
learning experiment. Inspired by locomotion in sea turtles, we
design a low-cost crawler robot with variable, interchangeable
fins. Learning is performed with different bio-inspired and
original fin designs in both an indoor, artificial environment,
as well as a natural environment in the Arizona desert. The
findings of this experiment indicate that artificial environments
consisting of poppy seeds, plastic granulates or other popular
loose media substitutes may be a poor replacement for true
environmental conditions. Hence, even policies that are not
learned in simulation, but rather on granulate substitutes in
the lab may not translate to reasonable locomotion skills in the
real-world. In addition, our findings show that reinforcement
learning is a crucial component in adapting and coping with
variability in the environment, the robot, and the manufactur-
ing process.
We thus demonstrate that the combination of a rapid proto-
typing process for robot design (form) and the fast learning of
robot policies (function) enables environment-adaptive robot
locomotion.
II. RELATED WORK
Prior studies have indicated that locomotion in granulate
media is dependent upon successful compaction of the sub-
strate, without causing fluidization [9, 15, 14]. Unfortunately,
the dynamic response of granulate media during locomotion
is difficult to predictively simulate or replicate [9, 15, 1]. In
desert environments, this difficulty is compounded by the het-
erogeneous composition of the loose, sandy topsoil, making it
nearly impossible to predict the effectiveness of any locomotor
strategy a priori [9, 1]. In practice, the performance of robotic
systems in heterogeneous granulate media, particularly in xeric
habitats, must be evaluated post-hoc and iteratively improved
(see for example [15]) through successive design refinements
and adaptive learning of locomotion.
Finned animals, and sea turtles in particular, have achieved
highly stable and efficient locomotion through heterogeneous
granulate substrata [14, 9, 15]. Of the many animals capable
of effective locomotion in sand, we drew most heavily upon
the sea turtle due to the simplicity and stability of its motion
[25]. Unlike sand-swimming animals, like sand lizards [13],
finned animals (such as sea turtles) require fewer degrees of
freedom and actuated joints to achieve forward motion.
A robotic analogue to sea turtles, FlipperBot (FBot), was
designed to provide a two-limbed approximation of sea turtle
locomotion in an ongoing effort to characterize the motion of
finned animals through sand [15]. Unlike other robotic devices
inspired by turtles, FBot was designed for locomotion in
granulate media and not for swimming [10, 26]. FBot features
two degrees of freedom for each limb; however, the fins were
configured such that they could be either fixed (relative to the
arm) or free to rotate. The combined quasi-static motion of
the limbs was similar to a “breast stroke”, dragging the body
through the sand [15].
In general, the “bio-inspired robotics” approach [2] has
proven fruitful for designing laboratory robots with new ca-
pabilities (new gaits, morphologies, control schemes), includ-
ing rapid running [3, 18], slithering [22], flying [12], and
“swimming” in sand [13]. In addition, using the biologically
inspired robots as “physical models” of the organisms has
revealed scientific insights into the principles that govern
movement in biological systems, as well as new insights into
low-dimensional dynamical systems (see for example [7] and
references therein). Our work differs fundamentally from these
works, not only in execution, but also in principle: we aim to
generate optimal motion through bio-mimicry and learning,
rather than learning how optima are generated in a biological
system.
III. METHODOLOGY
In this section, we describe our methodology for fast robot
prototyping and learning. We discuss a sample-efficient rein-
forcement learning method that enables fast learning of new
locomotion skills. In combination with a laminate robot man-
ufacturing process, our method allows for rapid iterations over
both form and function of a robot. The main rationale behind
this approach is that environmental conditions are often hard
to reproduce outside of the natural application domain. Hence,
the development cycle should be informed by experiences
in the real application domain, e.g., on challenging terrain
such as desert environments. Our approach facilitates this
process and significantly reduces the underlying development
time. Consequently, we will describe the methodology for
prototyping form and function in more detail.
A. FORM: Laminate Robot Manufacturing
Laminate manufacturing can be used in order to con-
struct affordable, light-weight robots. Laminate fabrication
processes (known as SCM [24], PC-MEMS [20], popup-book
MEMS [23], lamina-emergent mechanisms [6], etc.) permit
rapid construction from planar sheets of material which are
iteratively cut, aligned, stacked, and laminated to form a
composite material.
Fig. 2: Manufacturing of laminate robotic mechanisms: (a) The robot components are engraved using a laser cutter on planar
sheets and later laminated. (b) The components are folded into a robotic structure. The motors, the control board, and the
battery are added manually. (c) The full fabrication process.
Fig. 3: The laminates involved are constructed as a sandwich
of five layers: poster-board, adhesive, polyester, adhesive, and
poster-board.
The laminate mechanisms discussed in this paper were
printed in five layers. As shown in Figure 3, two rigid layers of
1mm-thick poster-board are sandwiched around two adhesive
layers of Drytac MHA heat-activated acrylic adhesive, which
are themselves sandwiched around a single layer of 50 µm-thick
polyester film from McMaster-Carr. Flat sheets of each mate-
rial are cut out on a laser cutter, then stacked and aligned using
a jig with holes pre-cut in the first pass of the laser cut. Once
aligned, these layers are fused together using a heated press in
order to create a single laminate. The adhesive cures at around
85-104 degrees Celsius, and comes with a paper backing
which allows it to be cut, aligned with the poster-board, and
laminated. The paper backing is then removed, and the two
adhesive-mounted poster-board layers are aligned with the
center layer of polyester and laminated again. This laminate
is returned to the laser, where a second release cut permits the
device to be separated from surrounding scrap material and
erected into a final three-dimensional configuration.
Laminate mechanisms resulting from this process are ca-
pable of a high degree of precision through bending of
flexure-based hinges created through the selective removal of
rigid material along desired bend axes. With fewer rolling
contacts (bearings) than would typically be found in traditional
robots, laminate mechanisms are ideal in sandy environments,
where sand infiltration can shorten service life. Connections
between layers can be established through adhesive layers,
in addition to plastic rivets which permit quick attachment
between laminates. Mounting holes permit attaching a va-
riety of off-the-shelf components including motors, micro-
controllers, and sensors. Rapid attachment/detachment is a
highly desired feature for this platform, as different flipper
designs can be tested using the same base platform. In all,
this fabrication method permits rapid iteration during the
design phase, and rapid re-configuration for testing a variety
of designs across a wide range of force and length scales, due
to its compatibility with a wide range of materials. Fig. 2(a)
depicts the basic planar sheets after cutting. Fig. 2(b) shows
the individual components of the robot after they are detached
from the sheets and folded into a structure. Fig. 2(c) depicts the full
fabrication process. The whole manufacturing process of one
robot takes up to 50 minutes while the 3D-printing process
of four horns, which serve as connections between the motors
and the paper, takes 58 minutes.
B. FUNCTION: Sample-Efficient Reinforcement Learning
In this section we discuss a sample-efficient RL method
that converges on optimal locomotion policies within a small
number of robot trials. Our approach leverages two key
insights about human and animal locomotion. In particular,
locomotion is (a) inherently low-dimensional and based on a
small number of motor synergies [8], as well as (b) highly
periodic in nature.
To implement these insights within a reinforcement learning
framework, we build upon the Group Factor Policy Search
(GrouPS) algorithm introduced by Luck et al. [11]. GrouPS jointly searches for a low-dimensional control policy as well as a projection matrix $W$ for embedding the results into a high-dimensional control space. It was previously shown [11] that the algorithm is able to uncover optimal policies after a few iterations with only hundreds of samples. Group Factor Policy Search models the joint actions as $a^{(m)}_t = (W^{(m)}Z + M^{(m)} + E^{(m)})\,\phi(s, t)$ for each time step $t$ of a trajectory and each $m$-th group of actions. The matrix $W$ is the transformation from the low-dimensional to the high-dimensional space (exploitation) and $M$ holds the parameters of the current mean policy. The entries of the matrices $Z$ and $E$ are Gaussian distributed, with $z_{ij} \sim \mathcal{N}(0, 1)$ for the latent values and $e_{ij} \sim \mathcal{N}(0, \tau_m^{-1})$ for the isotropic exploration. The function $\phi(s, t)$ consists of basis functions $\phi_i(s, t)$ and, in our experiments, depends only on the time step $t$ and not on the full state $s$. In contrast to the work in [11], however, we incorporate periodicity constraints into the search process by focusing on periodic feature functions. We use periodic basis functions over 20 time steps for the control signal, see Fig. 4. Given a point in time $t$ we compute each control dimension $a_i$ by
$$a_i = \sum_j (\tilde{w}_{ij} + m_{ij} + e_{ij}) \sin\!\left(\frac{t}{T}\,720^\circ + \frac{j-1}{J}\,360^\circ\right) \quad (1)$$
with $\tilde{w}_{ij} = \sum_k w_{ik} z_{kj}$ and $J$ being the number of basis functions in $\phi(s, t)$.
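As a concrete illustration of this parameterization, the following Python sketch evaluates the periodic basis functions and assembles an action according to Equation (1). It is our own minimal reconstruction rather than the authors' released code; the dimensions (four control dimensions, ten basis functions, three latent dimensions) and the exploration precision are assumptions chosen for the example, and the per-group structure of W, M, and E used by GrouPS is omitted for brevity.

```python
import numpy as np

def periodic_features(t, T=20, J=10):
    """Periodic basis functions phi_j(t) of Eq. (1): sine curves shifted in
    phase, spanning 720 degrees over one gait period of T time steps."""
    j = np.arange(J)  # 0-based index, i.e. (j - 1) in Eq. (1)
    return np.sin(np.deg2rad(t / T * 720.0 + j / J * 360.0))

def sample_action(t, W, M, tau, rng, T=20):
    """Sample a joint command a_t = (W Z + M + E) phi(t).

    W   : (D, n) projection from the n-dim latent space to D control dimensions
    M   : (D, J) mean-policy weights over the J basis functions
    tau : exploration precision, so entries of E are drawn from N(0, 1/tau)
    """
    D, J = M.shape
    n = W.shape[1]
    Z = rng.standard_normal((n, J))                 # latent exploration, z_ij ~ N(0, 1)
    E = rng.standard_normal((D, J)) / np.sqrt(tau)  # isotropic exploration
    phi = periodic_features(t, T=T, J=J)            # shape (J,)
    return (W @ Z + M + E) @ phi                    # shape (D,): one command per joint

# Example: four joints (two fin joints, two base joints) and three latent dimensions.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))
M = 0.1 * rng.standard_normal((4, 10))
gait = np.array([sample_action(t, W, M, tau=100.0, rng=rng) for t in range(20)])
print(gait.shape)  # (20, 4): one joint command per time step of the gait cycle
```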
GrouPS also takes prior information about potential group-
ings of joints into account when searching for an optimal
transformation matrix W. For our robotic device we used
two groups: the first group consists of the two fin-joints and
the second group of the two base-joints. Thus, we exploit the
symmetry of the design. The number of dimensions of the
manifold was set to three and the rank parameter, controlling
the sparsity and structure of W, to one. The outline of
the algorithm can be found in Algorithm 1. Incorporating
dimensionality reduction, periodicity, and information about
the group structure yields a highly sample-efficient algorithm.
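To make the grouping concrete, a configuration for this robot might be written as follows. The joint names and dictionary keys are purely illustrative assumptions and do not correspond to any released implementation.

```python
# Hypothetical GrouPS configuration for the crawling robot described above.
groups_config = {
    "groups": {
        "fins": ["left_fin_joint", "right_fin_joint"],    # first group: the two fin joints
        "base": ["left_base_joint", "right_base_joint"],  # second group: the two base joints
    },
    "latent_dimensions": 3,   # size of the low-dimensional manifold
    "rank": 1,                # rank parameter controlling the sparsity/structure of W
    "basis_functions": 10,    # periodic features per control dimension
    "period_time_steps": 20,  # length of one gait cycle
}
```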
Input: Reward function R(·) and initializations of parameters. Choose number of latent dimensions n and rank r. Set hyper-parameters and define groupings of actions.
while reward not converged do
    for h = 1:H do    # Sample H rollouts
        for t = 1:T do
            a_t = W Z φ + M φ + E φ,
                with Z ∼ N(0, I) and E ∼ N(0, τ̃), where τ̃^(m) = τ̃_m^{-1} I
            Execute action a_t
        Observe and store reward R(τ)
    Initialize q-distribution
    while not converged do
        Update q(M), q(W), q(Z̃), q(α) and q(τ̃)
    M = E_{q(M)}[M]
    W = E_{q(W)}[W]
    α = E_{q(α)}[α]
    τ̃ = E_{q(τ̃)}[τ̃]
Result: Linear weights M for the feature vector φ, representing the final policy. The columns of W represent the factors of the latent space.
Algorithm 1: Outline of the Group Factor Policy Search (GrouPS) algorithm as presented in [11].
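The sketch below mirrors only the loop structure of Algorithm 1: sample H rollouts per iteration, score each rollout, and update the mean policy. The reward-weighted update at the end of each iteration is a deliberately simplified stand-in for the variational q-distribution updates of GrouPS, and the reward function is a toy placeholder; on the physical robot the reward is the measured distance traveled. All constants are assumptions for illustration.

```python
import numpy as np

def gait_reward(actions):
    """Toy placeholder reward that prefers joint commands close to a smooth wave.
    On the physical robot this would be the distance traveled during the rollout."""
    T, D = actions.shape
    target = np.tile(np.sin(2.0 * np.pi * np.arange(T) / T)[:, None], (1, D))
    return -np.mean((actions - target) ** 2)

def run_policy_search(n_iters=10, H=21, D=4, J=10, n=3, tau=50.0, T=20, seed=0):
    """Outer loop in the spirit of Algorithm 1, with a simplified policy update."""
    rng = np.random.default_rng(seed)
    M = np.zeros((D, J))                   # mean-policy weights
    W = 0.1 * rng.standard_normal((D, n))  # latent-to-control projection
    # Periodic basis functions phi_j(t) for all time steps of the gait, shape (T, J).
    t_idx = np.arange(T)[:, None]
    j_idx = np.arange(J)[None, :]
    phis = np.sin(np.deg2rad(t_idx / T * 720.0 + j_idx / J * 360.0))
    for _ in range(n_iters):
        samples, rewards = [], []
        for _ in range(H):                 # sample H rollouts
            Z = rng.standard_normal((n, J))
            E = rng.standard_normal((D, J)) / np.sqrt(tau)
            M_h = M + W @ Z + E            # perturbed policy weights for this rollout
            actions = phis @ M_h.T         # (T, D) joint commands executed on the robot
            samples.append(M_h)
            rewards.append(gait_reward(actions))
        # Simplified update: reward-weighted average of the sampled weights,
        # standing in for the variational updates of q(M), q(W), q(alpha), q(tau).
        weights = np.exp(np.array(rewards) - np.max(rewards))
        weights /= weights.sum()
        M = np.tensordot(weights, np.array(samples), axes=1)
    return M

final_M = run_policy_search()
print(final_M.shape)  # (4, 10): linear weights for the feature vector phi
```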
IV. A FOLDABLE ROBOTIC SEA TURTLE
With the general methodology established, this section
introduces the design of the robotic device used in this
research. As discussed in Sec. II, our design takes inspiration
from sea turtles.

(a) Basis functions φ1(t) to φ5(t). (b) Basis functions φ6(t) to φ10(t).
Fig. 4: The sinusoidal basis functions φ(t) used for learning in this paper. Each basis function is based on a sine curve and shifted in time. The final policy is based on a linear combination of these functions.

Fig. 5: The initial “flat body” design of the robot. The front of the robot buried into the sand during motion. The body was later curved.

By necessity, the design also conforms to the constraints of the laminate fabrication techniques being
employed – primarily that it is composed of a single planar
layer. The salient aspects of Chelonioid morphology integrated
into our design are described below.
A. Biological Inspiration
The design of our laminate device was primarily inspired
by the anatomy and locomotion of sea turtles. We chose to
focus on the terrestrial locomotion of adult sea turtles, rather
than juveniles or hatchlings, emphasizing the greater load-
bearing capacity and stability of their anatomy and behavior.
There are seven recognized species in Chelonioidea in six
genera [19]. In spite of considerable inter-specific differences
in morphology, all sea turtles use the same set of anatomical
features to generate motion. Specifically, adult sea turtles
support themselves on the radial edge of the forelimbs to (1)
elevate the body (thus reducing or eliminating drag) and (2)
generate forward motion [25]. This unique behavior allows
these large and exceedingly heavy animals (up to 915kg in
Dermochelys coriacea (Vandelli, 1761)¹) to move in a stable
and effective manner through granular media [5].
B. Robot Design
Focusing on the turtle’s forelimb for generating locomotion,
the robot form and structure were determined within an iterative design cycle.

¹Pursuant to the International Code of Zoological Nomenclature, the first mention of any specific epithet will include the full genus and species names as a binomen (two-part name) followed by the author and date of publication of the name. This is not an in-line reference; it is a part of the name itself and refers to a particular species-concept as indicated in the description of the species by that author.

Fig. 6: Sequence of actions the robotic arm executes in each learning cycle: (a) First, the robot under test is located in the testbed, grasped, and then (b) subsequently moved into a resting position. The robotic arm proceeds to (c) smooth the testbed with a tool. Finally, the robot under test is (d) put into its initial position and the next trajectory is executed.

In all designs, the body was suitably broad
to prevent sinking during forward motion, and remained in
contact with the ground at rest. This provides stability while
removing the need for the limbs to bear the weight of the
body at all times. A major benefit of this configuration is
that only the two forelimbs are needed to generate forward
thrust. Transmission of load occurs primarily under tension
(as in muscles), to accommodate the laminate material and to
provide damping to reduce joint wear. The limbs have two
rotational degrees of freedom, such that the fins move down
and back into the substrate, while the body moves up and
forward. This two degree of freedom arm was sufficient to
replicate the circular motion of the fins (and particularly of
the radial edge) observed in sea turtles (see [25]).
Initial experiments attempted on early prototypes revealed
a critical design flaw: the anterior end was prone to “plowing”
into the substrate (see Fig. 5). This limitation was solved by
mimicking two features of turtle anatomy. First, the apical
portion of the design is shaped to elevate the body above sand,
with an upturned apex, similar to upturned intergular and gular
scales of the anterior sea turtle plastron (see [19]). Second, the
back end of the body was tapered to reduce drag (as compared
to a rectangular end of equal length).
In the final design cycle, we also sought to mimic and
explore the morphology of the fins. Extant sea turtle species
exhibit a variety of fin shapes and include irregularities seen
on the outer edges, such as scales and claws. These features
are known to be used for terrestrial locomotion by articulating
with the surface directly (rather than being buried in the
substrate) [14, 4]. In order to understand how fin shape affects
locomotion performance, we designed four pairs of fins: two
generated from outlines of sea turtle fins which include all
irregularities (Caretta caretta (Linnaeus, 1758) and Natator
depressus (Garman, 1880), from [19]), and two based on
artificial shapes with no irregularities, as shown in Fig. 7. All
of these were attached to the main body at a position equivalent
to the anatomical location of the humeroradial joint (part of
the elbow in the fin), and scaled to the width of the body.
The arms of the robot were designed such that the fins can
be interchanged at will, allowing for easy comparison of fin
performance.
Fig. 7: The four different fin designs used for the presented
robotic device. Designs A and C are accurate reproductions
of the actual shape of sea turtle fins, namely Caretta caretta
(A) and Natator depressus (C). Designs B and D are simple
rectangular and ellipsoid shapes.
V. EXPERIMENTS
In this section, we focus on evaluating the locomotion
performance of the prototypes generated with our laminate
fabrication process. In particular, we evaluate the robustness to variations stemming from the terrain and manufacturing process, and the sensitivity to changes in the physical fin shape.
More formally, there are three hypotheses that we experi-
mentally evaluate:
H1 Group Factor Policy Search is able to find an improved
locomotion policy – with respect to distance traveled forward
– in a limited number of trials, despite the presence of
variations in the rapidly prototyped robotic device and the
environment.
H2 The shape of the fin influences the performance of the
locomotion policy.
H3 The locomotion policies learned in the natural
environment out-perform those learned in the artificial
environment, when executed in the natural environment.
These hypotheses are tested through the following experi-
ments.
A. Evaluation of Fin Designs
This experiment is designed to evaluate the effectiveness
of locomotion policies generated for the four types of fins
described in Sec. IV.

(a) Comparison between fin A and fin B. (b) Comparison between fin C and fin D. (c) Comparison between all fins.
Fig. 8: Comparison of the learning process for different fin designs. Each experiment was performed five times and mean/standard deviations were computed. The learning process was performed on poppy seeds.

Five independent learning sessions were conducted for each fin, consisting of 10 policy search iterations each, for a total of 1050 policy executions per fin. The
experiment was performed in an indoor, artificial environment
utilizing poppy seeds (similar to [16]) as a granulate material
substitute for sand – they are less abrasive and increase
the longevity of prototypes. Human involvement, and thus
randomness, was minimized during the learning process by
employing an articulated robotic arm (UR5). This arm was
responsible for placing the robot under test in the artificial
environment prior to each policy execution, then subsequently
removing it and resetting the environment with a leveling tool.
This sequence of actions is depicted in Fig. 6.
The policy search reward was automatically computed by
measuring the distance (in pixel values) that a target affixed to
the robot traveled with a standard 2D high-definition webcam.
This was computed from still frames captured before and after
policy execution. After learning, the mean iteration policies
were manually executed and measured in order to produce
metric distance rewards for comparison.
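A minimal sketch of how such a pixel-distance reward could be computed from the before/after still frames is given below. The paper does not detail its tracking pipeline, so the colored-marker segmentation with OpenCV, the HSV thresholds, and the function names are illustrative assumptions only.

```python
import cv2
import numpy as np

def target_centroid(frame_bgr, lower_hsv=(40, 80, 80), upper_hsv=(80, 255, 255)):
    """Locate the centroid of a colored target affixed to the robot.
    The HSV range (here roughly green) is a placeholder to be tuned to the real marker."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv, dtype=np.uint8),
                       np.array(upper_hsv, dtype=np.uint8))
    moments = cv2.moments(mask)
    if moments["m00"] == 0:
        raise ValueError("target not visible in frame")
    return np.array([moments["m10"] / moments["m00"],
                     moments["m01"] / moments["m00"]])

def pixel_distance_reward(frame_before, frame_after):
    """Reward for one policy execution: pixel displacement of the target between
    the still frames captured before and after the rollout."""
    return float(np.linalg.norm(target_centroid(frame_after) -
                                target_centroid(frame_before)))
```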
B. Policy Learning in a Desert Environment
The second experiment was designed to test how well
policies transfer between environments, and whether policies
learned in-situ are more effective than policies learned in
other environments. Over the course of two days, the policies
generated for each fin in the artificial environment from the
first experiment were executed in a desert environment in the
Tonto National Forest of Arizona in order to measure their
distance rewards. We opted to create a flattened test bed as
shown in Fig. 9, rather than using untouched ground, in order
to reduce locomotion bias due to inclines, rocks, and plants.
Furthermore, two additional learning sessions were con-
ducted for fins A and C in the same test bed in order
to provide a point of comparison. To maintain consistency
with the first experiment, learning was performed with 10
Policy Search iterations and reward values were measured via
camera. Manually measured distance values for each mean
iteration policy were obtained after learning. A video of the
learning process and supplementary material can be found on
http://www.c-turtle.org.
Fig. 9: The testbed in the Arizona desert used for evaluating
and learning policies in a real environment. The surface of the
testbed is flattened in order to increase comparability between
the values measured for each policy.
VI. RESULTS
The rewards achieved by policies learned on poppy seeds are
presented in Figure 8 with their mean and standard deviation
over the conducted experiments. Figure 8 (a) compares the bio-
logically inspired fin A (C. caretta) and the simple rectangular
shape. The second biologically inspired fin C (N. depressus)
and the artificial oval fin can be found in Figure 8 (b), both
with a similar performance. The mean values of the learned
policies are given in Figure 8 (c). The reward in these plots is given as the pixel distance, recorded by the camera, that the robot covered with its movement; by this measure, fin A (C. caretta) outperforms all other fin designs. In contrast, the rectangular fin shows the worst performance. This
can also be seen in Figure 10 which compares the mean and
standard deviation of achieved rewards in the last iteration of
the learning process between the four different fin designs.
Two different fin designs, A (C. caretta) and C (N. de-
pressus), were selected for the comparison between policies
learned on poppy seeds and policies learned in a natural
environment. Figure 11 (a) and (b) show the covered distances
in centimeters for policies learned and executed on poppy
seeds as well as executed in the desert for each iteration.
The third policy for each fin was learned and evaluated in
the desert. It can be seen that the policy learned in the natural
environment outperforms the policies learned on the substitute
in the laboratory environment.
Fig. 10: The mean and standard deviation of policies for each
fin design in the last iteration of the learning process. The
rewards represent the distance the robot moved forward.
A series of images from the executions of the policies are
shown in Figure 12. The pictures show the final position
after execution of policies learned in iteration one, four, six,
eight and ten. The images in Figure 12 (b) and (c) show the
difference in covered length between policies learned on poppy
seeds and the policies learned in the natural environment.
VII. DISCUSSION
The results shown in Fig. 8 and Fig. 11 indicate that for
every fin that underwent learning, in both artificial and natural
environments, the final locomotion policy shows some degree
of improvement with regard to distance traveled by the robot
after only 10 iterations. This supports hypothesis H1 which
postulated that Group Factor Policy Search would find an
improved locomotion policy in a limited number of trials,
despite variations in the environment and fin shape.
However, the results also indicate that some fins clearly
performed better than others. For example, fin B only achieved
a mean pixel reward of 35.2 in the artificial environment, while
fin A saw a mean pixel reward of 141.8, as shown in Fig. 8a.
This supports H2, which hypothesized that the physical shape
of the fin affects locomotion performance.
It is interesting to note that the biologically inspired fins (A
and C) out-performed the artificial fins (B and D) on average.
At least part of this may be due to the intersection of the fin
and the ground when they make contact at an angle, as is the
case in our robotic design. The biological fins have a curved design which increases the surface area in contact with the granulate media compared to the artificial fins, even though the overall surface areas of the artificial and biologically inspired fins are comparable. Furthermore, fin B exhibited
significant deformation when in contact with the ground which
likely reduced its effectiveness in producing forward motion.
The results shown in Fig. 11 support hypothesis H3, in
that policies learned in the natural environment outperform
the policies that were learned in the artificial environment. We
reason that part of this discrepancy is due to the composition of
the granulate material. The poppy seeds used in the artificial
environment have an average density of 0.54 g/ml with a –
qualitatively speaking – homogeneous seed size, while the
sand grains in the desert have an average density of 1.46 g/ml and a heterogeneous grain size.

(a) Comparison between policies learned for fin design C. The algorithm was initialized with the same random number generator for learning.
(b) Comparison between policies learned for fin design A. The algorithm was initialized with the same random number generator for learning. Due to a technical issue, only pixel distances were recorded for learning in the desert. For comparability, those pixel distances were transformed into centimeters, but carry a variance of about 3.5 cm.
Fig. 11: Comparison between policies learned on poppy seeds and executed on poppy seeds (LPS), learned on poppy seeds and executed in a desert environment, and policies learned and executed in a desert environment.

These results indicate
that artificial environments consisting of popular granulate
substitutes, such as poppy seeds, may not yield performance
comparable to the real-world environments that they are
mimicking. Thus, it is not only simulations that can yield
performance discrepancies, but also physical environments.
Additionally, we observed that the composition of the
natural environment itself fluctuated over time. For instance,
we measured a difference in the moisture content of the sand
of nearly 82% between the two days in which we performed
experiments: 1.59% and 0.87% by weight. These factors may
serve to make the target environment difficult to emulate,
and suggest that not only are discrepancies possible between
simulated environments, artificial environments, and actual
environments, but also between the same actual environment
over time. We suspect that lifelong learning might be a
possible solution to this problem.
(a) Executions of policies learned on poppy seeds. The start position of the robot was on the wall of the testbed on the left side.
(b) Executions of policies learned on poppy seeds and executed in a real desert environment. The white line shows the start position of the robot.
(c) Executions of policies learned in a desert environment. The white line shows the start position of the robot.
Fig. 12: Executions of learned policies on poppy seeds and in a real desert environment. Row (a) shows the execution of the policies learned on poppy seeds, which are also executed in a real desert environment in (b). Finally, (c) shows the policies learned and executed in the desert. For both learning experiments the same initial values and random number generators were used. The images show the executions of trajectories after 1, 4, 6, 8 and 10 iterations.

Fig. 13: Top: the gait produced by the right fin after iteration 10 with fin A. Bottom: the gait produced by the right fin of the robot after iteration 3.

Yet another interesting observation can be made from the gaits shown in Fig. 13. The cycle produced by the fin during a more effective policy extends deeper and further than that produced during a less effective policy. Intuitively, we can
reason that this more effective policy pushes against a larger
volume of sand, generating more force for forward motion.
VIII. CONCLUSION
In this paper, we presented a methodology for rapid proto-
typing of robotic structures for terrestrial locomotion. A com-
bination of laminate robot manufacturing and sample-efficient
reinforcement learning enables re-configuration and adaptation
of both form and function to best fit environmental constraints.
In turn, this approach decreases the amount of time for the
development-production-learning-deployment cycle. With the
presented techniques, it is possible to construct a robot out of
raw material and learn a controller for locomotion in under a
day. We designed a bio-inspired robotic device using the new
methodology and, consequently, conducted an extensive robot
learning study which involved several thousand executions.
The experiment was performed with different sets of fins, both
inside the lab, as well as in the desert of Arizona. Our results
indicate the approach is well-suited for fast adaptation to new
ground.
The results also show that granulates which are commonly
used as a replacement for sand in robotics laboratories may not
be an effective replacement. More specifically, robot control policies learned on such granulates in the laboratory were not as effective when deployed outside. A variety
of factors such as variability in actuation, energy supply, the
manufacturing process, or the terrain may contribute to this
phenomenon. Consequently, learning and adaptation are of cru-
cial importance. The discussed sample-efficient reinforcement
learning algorithm enabled robots to quickly adapt an existing
policy or learn a new one. Learning time was typically in the
range of 2-3 hours. The results also show that biological
inspiration in the fin design can lead to significant advantages
in the resulting policies, even when learning was employed.
For future work we aim to investigate life-long learning
approaches that do not separate between a training and a
deployment phase. Using an accelerometer, the robot could
continuously calculate rewards and update the control policy.
REFERENCES
[1] Hesam Askari and Ken Kamrin. Intrusion rheology in
grains and other flowable materials. Nature Materials,
15(12):1274–1279, 2016.
[2] Bharat Bhushan. Biomimetics: lessons from nature–an
overview. Philosophical Transactions of the Royal Soci-
ety A: Mathematical, Physical and Engineering Sciences,
367(1893):1445–1486, 2009.
[3] Jonathan E Clark, Jorge G Cham, Sean A Bailey, Ed-
ward M Froehlich, Pratik K Nahata, Robert J Full, and
Mark R Cutkosky. Biomimetic design and fabrication
of a hexapedal running robot. In Proceedings of IEEE
International Conference on Robotics and Automation,
volume 4, pages 3643–3649, 2001.
[4] C Kenneth Dodd Jr. Synopsis of the biological data on
the loggerhead sea turtle Caretta caretta (Linnaeus, 1758).
Technical report, DTIC Document, 1988.
[5] Karen L Eckert and Chris Luginbuhl. Death of a giant.
Marine Turtle Newsletter, 43:2–3, 1988.
[6] Paul S. Gollnick, Spencer P. Magleby, and Larry L.
Howell. An Introduction to Multilayer Lamina Emergent
Mechanisms. Journal of Mechanical Design, 133(8):
081006, 2011. ISSN 10500472.
[7] Philip Holmes, Robert J Full, Dan Koditschek, and John
Guckenheimer. The dynamics of legged locomotion:
Models, analyses, and challenges. Siam Review, 48(2):
207–304, 2006.
[8] Nedialko Krouchev, John F. Kalaska, and Trevor Drew.
Sequential activation of muscle synergies during loco-
motion in the intact cat as revealed by cluster analysis
and direct decomposition. Journal of Neurophysiology,
96(4):1991–2010, 2006. ISSN 0022-3077.
[9] Chen Li, Tingnan Zhang, and Daniel I. Goldman. A
terradynamics of legged locomotion on granular media.
Science, 339:1408–1412, 2013.
[10] Kin-Huat Low, Chunlin Zhou, TW Ong, and Junzhi Yu.
Modular design and initial gait study of an amphibian
robotic turtle. In Robotics and Biomimetics, 2007.
ROBIO 2007. IEEE International Conference on, pages
535–540. IEEE, 2007.
[11] Kevin Sebastian Luck, Joni Pajarinen, Erik Berger, Ville
Kyrki, and Heni Ben Amor. Sparse latent space policy
search. In AAAI, pages 1911–1918, 2016.
[12] Kevin Y Ma, Pakpong Chirarattananon, Sawyer B Fuller,
and Robert J Wood. Controlled flight of a biologically
inspired, insect-scale robot. Science, 340(6132):603–607,
2013.
[13] Ryan D Maladen, Yang Ding, Chen Li, and Daniel I
Goldman. Undulatory swimming in sand: subsurface
locomotion of the sandfish lizard. Science, 325(5938):
314–318, 2009.
[14] Nicole Mazouchova, Nick Gravish, Andrei Savu, and
Daniel I Goldman. Utilization of granular solidification
during terrestrial locomotion of hatchling sea turtles.
Biology Letters, 6:398–401, 2010.
[15] Nicole Mazouchova, Paul B Umbanhowar, and Daniel I
Goldman. Flipper-driven terrestrial locomotion of a sea
turtle-inspired robot. Bioinspiration and Biomimetics, 8
(2):026007, 2013.
[16] Nicole Mazouchova, Paul B Umbanhowar, and Daniel I
Goldman. Flipper-driven terrestrial locomotion of a sea
turtle-inspired robot. Bioinspiration & biomimetics, 8(2):
026007, 2013.
[17] Volodymyr Mnih, Koray Kavukcuoglu, David Silver,
Andrei A. Rusu, Joel Veness, Marc G. Bellemare,
Alex Graves, Martin Riedmiller, Andreas K. Fidjeland,
Georg Ostrovski, Stig Petersen, Charles Beattie, Amir
Sadik, Ioannis Antonoglou, Helen King, Dharshan Ku-
maran, Daan Wierstra, Shane Legg, and Demis Hassabis.
Human-level control through deep reinforcement learn-
ing. Nature, 518(7540):529–533, 02 2015.
[18] Robert Playter, Martin Buehler, and Marc Raibert. Big-
dog. In Douglas W. Gage Grant R. Gerhart, Charles
M. Shoemaker, editor, Unmanned Ground Vehicle Tech-
nology VIII, volume 6230 of Proceedings of SPIE, pages
62302O1–62302O6, 2006.
[19] Peter Pritchard and Jeanne Mortimer. Taxonomy, ex-
ternal morphology, and species identification. Research
and management techniques for the conservation of sea
turtles, 4:21, 1999.
[20] Pratheev S Sreetharan, John P Whitney, Mark D Strauss,
and Robert J Wood. Monolithic fabrication of millimeter-
scale machines. Journal of Micromechanics and Micro-
engineering, 22(5):055027, may 2012. ISSN 0960-1317.
[21] Richard S. Sutton and Andrew G. Barto. Introduction to
Reinforcement Learning. MIT Press, Cambridge, MA,
USA, 1st edition, 1998. ISBN 0262193981.
[22] Matthew Tesch, Kevin Lipkin, Isaac Brown, Ross Hatton,
Aaron Peck, Justine Rembisz, and Howie Choset. Pa-
rameterized and scripted gaits for modular snake robots.
Advanced Robotics, 23(9):1131–1158, 2009.
[23] John P Whitney, Pratheev S Sreetharan, Kevin Y Ma,
and Robert J Wood. Pop-up book MEMS. Journal of
Micromechanics and Microengineering, 21(11):115021,
nov 2011. ISSN 0960-1317.
[24] Robert J Wood, Srinath Avadhanula, Ranjana Sahai, Erik
Steltz, and Ronald S Fearing. Microrobot Design Using
Fiber Reinforced Composites. Journal of Mechanical
Design, 130(5):052304, 2008. ISSN 10500472.
[25] Jeanette Wyneken. Sea turtle locomotion: Mechanics,
behavior, and energetics. In Peter L Lutz, editor, The
Biology of Sea Turtles, pages 168–198. CRC Press, 1997.
[26] Guocai Yao, Jianhong Liang, Tianmiao Wang, Xingbang
Yang, Qi Shen, Yucheng Zhang, Hailiang Wu, and We-
icheng Tian. Development of a turtle-like underwater
vehicle using central pattern generator. In Robotics and
Biomimetics (ROBIO), 2013 IEEE International Confer-
ence on, pages 44–49. IEEE, 2013.