
Modeling of Deformable Objects for Robotic Manipulation: A Tutorial and Review


published: 17 September 2020
doi: 10.3389/frobt.2020.00082
Frontiers in Robotics and AI | 1September 2020 | Volume 7 | Article 82
Edited by:
Hakan Karaoguz,
Independent Researcher, Stockholm, Sweden
Reviewed by:
Farah Bouakrif,
University of Jijel, Algeria
Dongming Gan,
Purdue University, United States
*Correspondence:
Veronica E. Arriola-Rios
Puren Guler
Specialty section:
This article was submitted to
Robot and Machine Vision,
a section of the journal
Frontiers in Robotics and AI
Received: 13 February 2020
Accepted: 19 May 2020
Published: 17 September 2020
Citation: Arriola-Rios VE, Guler P, Ficuciello F, Kragic D, Siciliano B and Wyatt JL (2020) Modeling of Deformable Objects for Robotic Manipulation: A Tutorial and Review. Front. Robot. AI 7:82. doi: 10.3389/frobt.2020.00082
Veronica E. Arriola-Rios¹*, Puren Guler²*, Fanny Ficuciello³, Danica Kragic⁴,
Bruno Siciliano³ and Jeremy L. Wyatt⁵
¹ Department of Mathematics, Faculty of Science, UNAM Universidad Nacional Autónoma de México, Ciudad de México, Mexico
² Autonomous Mobile Manipulation Laboratory, Centre for Applied Autonomous Sensor Systems, Örebro University, Örebro, Sweden
³ PRISMA Laboratory, Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
⁴ Robotics, Learning and Perception Laboratory, Centre for Autonomous Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden
⁵ School of Computer Science, University of Birmingham, Birmingham, United Kingdom
Manipulation of deformable objects has given rise to an important set of open problems
in the field of robotics. Application areas include robotic surgery, household robotics,
manufacturing, logistics, and agriculture, to name a few. Related research problems
span modeling and estimation of an object’s shape, estimation of an object’s material
properties, such as elasticity and plasticity, object tracking and state estimation during
manipulation, and manipulation planning and control. In this survey article, we start by
providing a tutorial on foundational aspects of models of shape and shape dynamics.
We then use this as the basis for a review of existing work on learning and estimation of
these models and on motion planning and control to achieve desired deformations. We
also discuss potential future lines of work.
Keywords: deformable objects, shape representation, learning of deformation, control of deformable objects,
registration of shape deformation, tracking of deformation
Robotic manipulation work tends to focus on rigid objects (Bohg et al., 2014; Billard and Kragic,
2019). However, most objects manipulated by animals and humans change shape upon contact.
Manipulating a deformable object presents a quite different set of challenges from those that arise
when manipulating a rigid object. For example, forces applied to a rigid object simply sum to
determine the external wrench and, when integrated over time, result in a sequence of rigid body
transformations in SE(3). This is a relatively simple dynamic model, albeit one that is still difficult to estimate
for a given object, manipulator, and set of environment contacts.
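For contrast with the deformable case, the rigid-body bookkeeping described above fits in a few lines. The following is a minimal sketch, assuming point contacts whose positions and forces are expressed in one common frame; the helper name `net_wrench` is hypothetical. Each contact force adds to the net force and contributes a moment r × f about the chosen reference point.

```python
import numpy as np


def net_wrench(points, forces, ref=None):
    """Sum point-contact forces into a single wrench [force, torque] about `ref`.

    points, forces: (k, 3) arrays of contact positions and contact forces.
    ref: reference point for the torque (defaults to the origin).
    """
    points = np.asarray(points, dtype=float)
    forces = np.asarray(forces, dtype=float)
    ref = np.zeros(3) if ref is None else np.asarray(ref, dtype=float)
    total_force = forces.sum(axis=0)
    # Each force contributes a moment (r - ref) x f about the reference point.
    total_torque = np.cross(points - ref, forces).sum(axis=0)
    return np.concatenate([total_force, total_torque])


# Equal and opposite forces at the two ends of a bar: zero net force, pure torque about z.
w = net_wrench([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
               [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
print(w)
```

For a deformable body, no such six-dimensional summary suffices, because the same forces also change the object's shape.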
Forces applied to a deformable body, by contrast, both move the object and change its shape. The
exact combination of deformation and motion depends on the precise material composition. Thus,
material properties become a critical part of the system dynamics, and consequently the underlying
physics of deformation is complex and hard to capture. In addition, the dynamics models typically
employed in high-fidelity mechanical modeling, such as finite element models, are precise but
require detailed knowledge of the material properties, which would be unavailable to a robot
in the wild. Yet such a lack of detailed physics knowledge does not prevent humans and other
animals from performing dexterous manipulation of deformable objects. Consider the way a New
Caledonian crow shapes a tool from a branch (Weir and Kacelnik, 2006) or how a pizzaiolo
Arriola-Rios et al. Modeling of Deformable Objects for Robotic Manipulation
dexterously transforms a ball of dough into a pizza. Clearly,
robots have a long way to go to match these abilities.
Even though a great deal of work has been done on developing
solutions for each of the stages involved in manipulating deformable objects (Cootes et al., 1992; Montagnat
et al., 2001; Nealen et al., 2006; Moore and Molloy, 2007),
only a few combinations have actually been tried in robotics
to date (Nadon et al., 2018; Sanchez et al., 2018). In contrast
to earlier review articles, this paper consists of both a tutorial
and a review of related work, aimed at newcomers to robotic
modeling and manipulation of deformable objects who need
a quick introduction to the basic methods, which are adopted
from various other fields and backgrounds. We provide, in
a single place, a menu of possible approaches motivated by
computer vision, computer graphics, physics, machine learning,
and other fields, in the hope that readers can find new material
and inspiration for creatively developing their work. We then
review how these methods are applied in practice. In this way,
we take a more holistic approach than existing reviews. Since the
literature in the fields we touch on is abundant, we cannot be
exhaustive and will focus mostly on manipulation of volumetric
solid objects.
This review is motivated by a future goal in the form of an
ideal scenario. In this scenario a general-purpose robot would
be capable of perceiving shape, dynamics, and necessary material
properties (e.g., elasticity, plasticity) of deformable objects to
implement manipulation strategies relying on planning and
control methods. Currently no robot has all these capabilities.
FIGURE 1 | A general-purpose robot must be capable of: (i) perceiving and segmenting the object from the scene; (ii) tracking the object’s motion; (iii) predicting the
object’s behavior; (iv) planning and controlling new manipulation strategies based on the predictions; (v) selecting the appropriate model for each task, since all
models for shape and dynamics have limitations; and (vi) learning new models for previously unknown shapes and materials.
Therefore, we decompose the problem space into five main parts
that fit together like pieces of a puzzle. Because representational choices
are fundamental, we explain, in a tutorial style, (1) the modeling
of shape (section 2) and (2) the modeling of deformation
dynamics (section 3), to provide the reader with the necessary
mathematical background. We then discuss, in survey form,
(3) learning and estimation of the parameters of these models that
are related to deformability of objects (e.g., material properties,
such as elasticity, or shape properties, such as resolution of a
mesh; section 4), (4) the application of the models to perception
and prediction, and (5) planning and control of manipulation
actions (section 5). We treat these three topics as a survey because they build on the models
explained in (1) and (2) and span such a wide range
of approaches that it would be impossible to cover
them all in depth. Figure 1 shows our guiding processing
stream: (i) The robot perceives the object, segments it from
its environment, and selects an adequate representation for its
shape and intended task; the desired type of representation
will determine which algorithms must be used to recognize the
object. (ii) As the robot interacts with the object, it deforms the
object and must register the deformation by modifying the shape
representation accordingly; while doing so, it can make use of
tracking/registration techniques or enhanced predictive tracking
(which requires a model of the dynamics). (iii) A suitable model
for the dynamics is selected to predict new configurations of
the shape representation as the robot interacts with the object.
(iv) Information from the previous stages is used to integrate
FIGURE 2 | Relationships between shape, dynamics, learning models, and control methodologies as they have been used in the literature. Colored arrows indicate
subcategories, while black arrows show when a methodology in one level has been used for the next one.
the inputs and rules for the control strategies. Finally, all this
information can be fed into learning algorithms capable of
increasing the repertoire of known objects. At this stage, there
are three types of strategies: (a) estimating new parameters
directly (which is rarely viable); (b) calibrating known physics-
based models automatically by allowing the robot to take some
measurements that will help to determine the value of model
parameters; and (c) approximating new functions that describe
the dynamics, as is done with neural networks.
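Strategy (b) can be illustrated with the simplest physics-based model one could calibrate: a linear spring, f = k x, with unknown stiffness k. This is a toy sketch under stated assumptions, not a method from the reviewed works: the robot is assumed to indent the object at several depths and record (displacement, force) pairs, which are simulated here with synthetic noise, and `fit_stiffness` is a hypothetical helper that estimates k by least squares.

```python
import numpy as np


def fit_stiffness(displacements, forces):
    """Least-squares estimate of k in the linear model f = k * x (no offset term)."""
    x = np.asarray(displacements, dtype=float)
    f = np.asarray(forces, dtype=float)
    # Minimizing sum (f - k x)^2 over k gives k = <x, f> / <x, x>.
    return float(x @ f / (x @ x))


# Simulated probing of an object whose true stiffness is 250 N/m, with force-sensor noise.
rng = np.random.default_rng(0)
x = np.linspace(0.001, 0.01, 20)                  # indentation depths in meters
f = 250.0 * x + rng.normal(0.0, 0.05, x.size)     # measured forces in newtons
k_hat = fit_stiffness(x, f)
print(f"estimated stiffness: {k_hat:.1f} N/m")
```

Real objects are rarely this linear, but the pattern is the same for richer models: choose measurements that make the unknown parameters identifiable, then fit them.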
Figure 2 shows the connections between the different models
covered in this review as they have been used in the publications
mentioned. Table 1 gives a summary of the publications
discussed in each section. The following describes the notation
that will be used throughout the paper:
Scalars: italic lower-case letters, e.g., $x, y, z$.
Vectors: bold lower-case letters, e.g., $\mathbf{p} = \{x, y, z\}^T$.
Matrices: bold upper-case letters, e.g., $\mathbf{R}$.
TABLE 1 | Summary of selected publications discussed in each section.
| Category | Subcategory | Method | Publications |
| Shape | Implicit | Algebraic | Gascuel, 1993; Kumar et al., 1995; Jaklic et al., 2000 |
| | | Level set | Sethian, 1997; Cremers, 2006; Sun et al., 2008 |
| | | Eigenmodes | Cootes et al., 1995; Blake et al., 1998; Leventon et al., 2000; Tsai et al., 2001; Cremers, 2006 |
| | Parameterized | Splines | de Boor, 1976; Catmull and Clark, 1978; Kass et al., 1988; Gibson and Mirtich, 1997; Unser, 1999; Cordero Valle and Cortes Parejo, 2003; Sederberg et al., 2003; Maraffi, 2004; Song and Bai, 2008; Prasad et al., 2010 |
| | | Modal decomposition | Szekely et al., 1995 |
| | | Free-form | Sederberg and Parry, 1986; Moore and Molloy, 2007 |
| | | Multigrid | Xian et al., 2019 |
| | Discrete | Meshes | Delingette, 1999; Montagnat et al., 2001; Arvanitis et al., 2019 |
| | | Skeletons | Schaefer and Yuksel, 2007 |
| | | Templates | Yuille et al., 1992; Basri et al., 1998; Ravishankar et al., 2008; Arriola-Rios et al., 2013; Gallardo et al., 2020 |
| | | Landmarks | Blake et al., 1998; Cootes and Taylor, 2004 |
| | | Particles | Nealen et al., 2006 |
| | | Cloud of points | Cretu et al., 2009; Newcombe and Davison, 2010; Martínez et al., 2019; Makovetskii et al., 2020 |
| Dynamics | Particle-based | Particle systems | Tonnesen and Terzopoulos, 2000 |
| | | Mass-spring systems | Bianchi et al., 2004; Teschner et al., 2004; Morris and Salisbury, 2008; Schulman et al., 2013; Arriola-Rios and Wyatt, 2017 |
| | | Neural networks | Nurnberger et al., 1998; Zhang et al., 2019 |
| | | Position-based | Müller et al., 2005; Zhu et al., 2008; Tian et al., 2013; Macklin et al., 2014; Sidorov and Marshall, 2014; Guler et al., 2017; Romeo et al., 2020 |
| | Constitutive | FEM | Essa et al., 1992; Frank et al., 2014; Petit et al., 2018 |
| | | FVM | Teran et al., 2003; Barth et al., 2018 |
| | | FDM | Terzopoulos et al., 1987 |
| | | BEM | Greminger and Nelson, 2008 |
| | | LEM | Balaniuk and Salisbury, 2002 |
| | Approximations | Modal analysis | Pentland and Williams, 1989; Barbič and James, 2005; Fulton et al., 2019 |
| | | Active contours | Kass et al., 1988; Ahlberg, 1996; Nisirat, 2019 |
| Learning | | Discrete | Gelder, 1998 |
| | | Exhaustive search | Guler et al., 2015 |
| | | Iterative methods | Teschner et al., 2004; Frank et al., 2014 |
| | | Genetic algorithms | Bianchi et al., 2004 |
| | | Neural networks | Cretu et al., 2012 |
| | | Probability | Risholm et al., 2010; Schulman et al., 2013 |
| Control and Planning | Planning | Model-based | Gopalakrishnan and Goldberg, 2004; Das and Sarkar, 2011; Frank et al., 2014 |
| | | Data-driven | Mira et al., 2015; Li et al., 2016 |
| | Control | Model-based | Largilliere et al., 2015; Lin et al., 2015; Zaidi et al., 2017; Ficuciello et al., 2018 |
| | | Sensor-based | Wada et al., 2001; Smolen and Patriciu, 2009; Berenson, 2013; Navarro-Alarcon et al., 2016; Delgado et al., 2017b; Hu et al., 2019; Cherubini et al., 2020 |
FIGURE 3 | (A) Algebraic surfaces can represent basic and complex shapes, such as circles or the tangled cube; a limited but useful set of deformations is
straightforward to define, while other deformations that are not so intuitive tend to be used in combination with skeletons and skinning techniques. (B) Parametric
surfaces can represent a wider variety of shapes and their great flexibility allows deformation to be controlled with more intuitive parameters; therefore their use in
connection with models of dynamics is considerable. (C) Level set curves can show the evolution of a 2D shape in time; the intersection of a time plane with the
surface in 3D is a level curve, which represents the 2D contour shape of the object at a given time t. In this example $\cos^2(r + c)$ is the mathematical representation of
the 2D contour shape’s evolution through time.
Scalar functions whose range is $\mathbb{R}$: italic lower-case letters
followed by parentheses, e.g., $f(\cdot)$.
Vector functions whose range is $\mathbb{R}^n$ with $n > 1$: bold italic
letters followed by parentheses, e.g., $\boldsymbol{S}(\cdot)$.
Sets: calligraphic letters, e.g., $\mathcal{S}$.
Number sets: blackboard bold upper-case letters, e.g., $\mathbb{R}, \mathbb{N}$.
The initial problem of manipulating a deformable object is
to perceive and segment the object shape from the scene as
it deforms (Figure 1i). The main difficulty with this problem
is the number of degrees of freedom required to model the
object shape. Depending on the scenario, the expressiveness, accuracy, and flexibility of
the shape model can either ease or complicate the modeling of the dynamics.
For this reason, this section
introduces a variety of models from mathematics, computer
graphics, and computer vision for representation of the shape of
deformable objects.
2.1. Implicit Curves and Surfaces
An implicit curve or surface of dimension $n - 1$ is generally
defined as the zero set of a function $f: \mathbb{R}^n \to \mathbb{R}$,
$$\mathcal{S}_f = \{\mathbf{p} \in \mathbb{R}^n \mid f(\mathbf{p}) = 0\}, \tag{1}$$
where $\mathbf{p}$ is a point of the $n$-dimensional space in which
the surface is embedded (Montagnat et al., 2001). Therefore,
the set $\mathcal{S}_f$ is the surface formed by all points $\mathbf{p} \in \mathbb{R}^n$
at which the function $f$ evaluates to
zero. Surface points are located by
solving the equation $f(\mathbf{p}) = 0$ (Figure 3A). In robotics, $n$ is
usually 3, corresponding to Cartesian coordinates, but sometimes an
extra dimension is used to represent time. Representations
that fall within this category are explained in the rest of
this subsection.
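As a minimal illustration, assuming a sphere as the implicit function (the names `sphere` and `classify` are hypothetical), the sign of f classifies points as inside, on, or outside the surface, and the points where f vanishes form exactly the set in Eq. (1). The sign convention here (negative inside) matches the one used for level sets below.

```python
import numpy as np


def sphere(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Implicit function f(p) = ||p - c||^2 - r^2; its zero set is a sphere."""
    d = np.asarray(p, dtype=float) - np.asarray(center, dtype=float)
    return np.sum(d * d, axis=-1) - radius**2


def classify(f, points, tol=1e-9):
    """Label each point -1 (inside, f < 0), 0 (on the surface), or +1 (outside)."""
    values = f(points)
    labels = np.sign(values).astype(int)
    labels[np.abs(values) <= tol] = 0
    return labels


pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(classify(sphere, pts))  # inside, on, outside
```

The tolerance matters in practice: sampled or noisy points almost never satisfy f(p) = 0 exactly.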
2.1.1. Algebraic Curves and Surfaces
Algebraic curves and surfaces satisfy (1) with $f(\mathbf{p})$ being
a polynomial. First-degree polynomials define planes and
hyperplanes; second-degree polynomials define conics, which
include circles, ellipses, parabolas, and hyperbolas, and
quadrics, which include ellipsoids, paraboloids, hyperboloids,
cones, and cylinders. Their $(n-1)$-dimensional
extensions are hypersurfaces in an $n$-dimensional space that satisfy
the equation
$$f(\mathbf{p}) = \mathbf{p}^T \mathbf{A} \mathbf{p} + \mathbf{b} \mathbf{p} + c = 0, \tag{2}$$
where $\mathbf{p} = \{x_1, x_2, \ldots, x_n\}^T \in \mathbb{R}^n$ is a column vector, $\mathbf{p}^T$
denotes the transpose of $\mathbf{p}$ (a row vector), $\mathbf{A} \in \mathbb{R}^{n \times n}$ is a
matrix, $\mathbf{b} \in \mathbb{R}^n$ is a row vector, and $c$ is a scalar constant. Note
that all the aforementioned shapes are included as particular
cases of this definition. To define a particular shape that
satisfies a given set of constraints, the constants in $\mathbf{A}$, $\mathbf{b}$,
and $c$ must be determined. For example, with $n = 2$, a circle
requires $\mathbf{A}$ to be a multiple of the identity; fixing $\mathbf{A} = \mathbf{I}$ leaves
three unknowns, $b_1$, $b_2$, and $c$, which can be found for a circle passing through three
given non-collinear points $\mathbf{p}_i$ by solving the linear system of three
equations $f(\mathbf{p}_i) = 0$, $i = 1, 2, 3$. In some contexts the same equation can
be rewritten to facilitate this estimation; for example, it is easy
to read off the center $(x_c, y_c)$ and radius $r$ of a circle if the
second-degree polynomial is written as $(x - x_c)^2 + (y - y_c)^2 = r^2$
with $\mathbf{p} = \{x, y\}^T$.
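The circle case can be worked through explicitly. A minimal sketch, assuming the three points are not collinear (otherwise the system is singular) and using the hypothetical helper `circle_through`: with $\mathbf{A} = \mathbf{I}$, each point gives one linear equation $b_1 x + b_2 y + c = -(x^2 + y^2)$, and completing the square recovers the center and radius.

```python
import numpy as np


def circle_through(p1, p2, p3):
    """Fit f(x, y) = x^2 + y^2 + b1*x + b2*y + c = 0 through three non-collinear points.

    Returns (center, radius).
    """
    pts = np.array([p1, p2, p3], dtype=float)
    # One linear equation per point: b1*x + b2*y + c = -(x^2 + y^2).
    M = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    rhs = -(pts[:, 0] ** 2 + pts[:, 1] ** 2)
    b1, b2, c = np.linalg.solve(M, rhs)
    # Completing the square: (x + b1/2)^2 + (y + b2/2)^2 = b1^2/4 + b2^2/4 - c.
    center = np.array([-b1 / 2.0, -b2 / 2.0])
    radius = np.sqrt(center @ center - c)
    return center, radius


center, radius = circle_through((2, 1), (0, 1), (1, 2))
print(center, radius)
```

Fitting a general conic instead (arbitrary symmetric $\mathbf{A}$) has five degrees of freedom up to scale, so it needs five points rather than three.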
Superquadrics generalize quadrics by replacing the exponent 2 with arbitrary real exponents,
while hyperquadrics are a more general form still and allow the
representation of complex non-symmetric shapes; they are given
by the equation
$$\sum_{i=1}^{N} \left| a_i x + b_i y + c_i z + d_i \right|^{\gamma_i} = 1, \tag{3}$$
where $\mathbf{p} = \{x, y, z\}^T \in \mathbb{R}^3$, $N$ is an arbitrary number of planes
whose intersection surrounds the object, $a_i$, $b_i$, $c_i$, and $d_i$ are shape
parameters of these planes, and $\gamma_i \in \mathbb{R}$ with $\gamma_i \ge 0$ for all $i$.
Kumar et al. (1995) presented a method for fitting hyperquadrics
to very complex deformable shapes registered as range data. A
model like (3) could be used as the base representation for a
model of the dynamics if the deformations are small. It can also be
used in combination with local surface deformations to represent
a wider set of surfaces. For example, if the superquadric is the set
of points $\mathcal{Q}$ that satisfy the corresponding equation, a deformed
model could be given by $\mathcal{S} = \mathbf{c} + \mathbf{R}(\mathcal{Q} + \mathbf{d})$, where $\mathbf{c}$ represents the
inertial center of the superquadric, $\mathbf{R}$ is a rotation matrix, and $\mathbf{d}$
is a vectorial displacement field. Other types of deformation can
be defined as well (Montagnat et al., 2001). For more information
about superquadrics, see Jaklic et al. (2000).
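Eq. (3) and the deformed model $\mathcal{S} = \mathbf{c} + \mathbf{R}(\mathcal{Q} + \mathbf{d})$ are straightforward to evaluate on sample points. The sketch below is illustrative only: the plane parameters are chosen by hand (three terms $|x|$, $|y|$, $|z|$ sharing one exponent), the displacement $\mathbf{d}$ is a hypothetical constant offset, and the helper names are hypothetical.

```python
import numpy as np


def hyperquadric(points, planes, gammas):
    """Evaluate H(p) = sum_i |a_i x + b_i y + c_i z + d_i| ** gamma_i (left side of Eq. 3).

    planes: (N, 4) array with rows (a_i, b_i, c_i, d_i); gammas: (N,) exponents.
    H = 1 on the surface, H < 1 inside, and H > 1 outside.
    """
    p = np.atleast_2d(np.asarray(points, dtype=float))
    planes = np.asarray(planes, dtype=float)
    lin = p @ planes[:, :3].T + planes[:, 3]  # (M, N) plane values
    return np.sum(np.abs(lin) ** np.asarray(gammas, dtype=float), axis=1)


def deform(points, c, R, d):
    """Apply S = c + R (Q + d) to sample points Q stored one per row.

    d may be a single 3-vector (uniform offset) or an (M, 3) displacement field.
    """
    q = np.asarray(points, dtype=float) + d
    return np.asarray(c, dtype=float) + q @ np.asarray(R, dtype=float).T


# Terms |x|, |y|, |z| with gamma = 2 give the unit sphere; large gammas give a near-cube.
planes = np.eye(3, 4)
print(hyperquadric([[0, 0, 0], [1, 0, 0], [1, 1, 1]], planes, [2, 2, 2]))  # [0. 1. 3.]

# Offset a surface sample along x, rotate it 90 degrees about z, then translate it.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = deform([[1.0, 0.0, 0.0]], c=[1.0, 0.0, 0.0], R=R, d=np.array([0.1, 0.0, 0.0]))
print(moved)
```

With exponents of 20, a point such as (0.9, 0.9, 0.9) lies inside the shape even though it is outside the unit sphere, which shows how the exponents control squareness.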
Algebraic curves, surfaces, and volumes can be used as 1D, 2D,
and 3D skeletons, or to represent objects of similar shapes and
deformations (Gascuel, 1993). They are easy to deform in certain
cases, but the deformations that are straightforward to apply
are limited mostly to matrix transformations that
bend, pitch, twist, stretch, or translate all the points on a curve
or the space in which the curve is embedded. For this reason,
algebraic curves and surfaces are more suited to representing
articulated or semi-articulated objects. Objects can also be
composed of several algebraic curves, where each component
is easy to manipulate with this representation. To model more
complex deformed shapes, Raposo and Gomes (2019) introduced
products of primitive algebraic surfaces, such as spheres and
cylinders, which enable both local and global deformations to
be controlled more easily than in traditional algebraic shape
models. Moreover, these models can be combined with skinning
techniques to emulate soft deformable objects and also have
parameterized representations.
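The matrix transformations mentioned above are easy to write down. A minimal sketch of one of them, a twist about the z-axis, assuming the twist angle grows linearly with height (an illustrative choice; the text does not fix a formula, and `twist_z` is a hypothetical name): each point is multiplied by a rotation matrix whose angle is `rate * z`, so the deformation is a matrix transformation whose matrix varies from point to point.

```python
import numpy as np


def twist_z(points, rate):
    """Rotate each point about the z-axis by the angle rate * z (radians)."""
    p = np.asarray(points, dtype=float)
    theta = rate * p[:, 2]
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    x = cos_t * p[:, 0] - sin_t * p[:, 1]
    y = sin_t * p[:, 0] + cos_t * p[:, 1]
    return np.column_stack([x, y, p[:, 2]])


# A vertical line of points at x = 1 becomes a helical arc: a quarter turn over unit height.
line = np.column_stack([np.ones(5), np.zeros(5), np.linspace(0.0, 1.0, 5)])
twisted = twist_z(line, rate=np.pi / 2)
print(np.round(twisted, 3))
```

Because the same function can be applied to any set of points, it deforms a curve, a surface, or the whole embedding space in the same way.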
2.1.2. Level Set Methods
In level set methods, the deformable model is embedded in a
higher-dimensional space, where the extra dimension represents
time (Sethian, 1997; Montagnat et al., 2001). A hypersurface
$\Psi$ is defined by $\Psi(\mathbf{p}, 0) = \mathrm{dist}(\mathbf{p}, \mathcal{S}_0)$, where $\mathcal{S}_0$ is the
initial surface and dist can be the signed Euclidean distance
between a point $\mathbf{p}$ and the surface. The distance is positive if
the point lies outside the surface and negative otherwise. The
evolution of the surface $\mathcal{S}$ is governed by the partial differential
equation $\Psi_t + |\nabla \Psi| F = 0$, involving the function $\Psi(\mathbf{p}, t)$
and a speed function $F$, which determines the speed at which
each point of the surface moves along its normal. Thus, the function
$\Psi$ evolves in time, and the current shape corresponds to the
surface given by $\Psi(\mathbf{p}, t) = 0$ (Figure 3C). The function $\Psi$
could have any form, such as a parameterized one,
$\Psi = \Psi(x(u, v), y(u, v), z(u, v), t)$, with the surface still defined in
implicit form, $\Psi(x(u, v), y(u, v), z(u, v), t) = 0$. Unfortunately, it
could also be that $\Psi$ does not even have an algebraic expression
and must be approximated with numerical methods.
The main advantage of level set methods is that they handle
changes of surface topology implicitly. The set $\mathcal{S}$ may split into
several connected components, or several distinct components
may merge, while $\Psi$ remains a single function.
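The topology-change property can be checked numerically. A minimal sketch, assuming a constant speed $F$ and an initial signed distance function: then $|\nabla \Psi| = 1$ wherever $\Psi$ is differentiable, so $\Psi(\mathbf{p}, t) = \Psi(\mathbf{p}, 0) - F t$ satisfies $\Psi_t + |\nabla \Psi| F = 0$ there, and its zero level set is the initial shape grown outward by $F t$. This closed form stands in for the numerical PDE solvers used in practice and is an assumption made for brevity; the helper names are hypothetical. Starting from two disjoint discs, the sketch counts connected components of the region $\Psi \le 0$ on a grid.

```python
from collections import deque

import numpy as np


def two_disc_sdf(X, Y, radius=0.5, offset=1.0):
    """Signed distance to two disjoint discs centered at (+offset, 0) and (-offset, 0).

    Negative inside a disc and positive outside, as in the text.
    """
    d1 = np.hypot(X - offset, Y) - radius
    d2 = np.hypot(X + offset, Y) - radius
    return np.minimum(d1, d2)


def count_components(mask):
    """Count 4-connected regions of True cells using a breadth-first flood fill."""
    rows, cols = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j] or seen[i, j]:
                continue
            count += 1
            seen[i, j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= na < rows and 0 <= nb < cols and mask[na, nb] and not seen[na, nb]:
                        seen[na, nb] = True
                        queue.append((na, nb))
    return count


xs = np.linspace(-3.0, 3.0, 121)
X, Y = np.meshgrid(xs, xs)
psi0 = two_disc_sdf(X, Y)
F = 1.0  # constant outward speed
for t in (0.0, 0.3, 0.6):
    # For constant F and a signed distance function, Psi(p, t) = Psi(p, 0) - F t.
    psi = psi0 - F * t
    print(f"t = {t}: {count_components(psi <= 0.0)} component(s)")
```

The discs (radius 0.5, centers 2 units apart) touch once each has grown by 0.5, so the count drops from two to one between t = 0.3 and t = 0.6, while the single function Ψ never had to be split or merged.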
In computer vision applications, such as human tracking and
medical imaging, level set methods have been used successfully
in tracking deformable objects (Sethian, 1997; Cremers, 2006).
For example, Sun et al. (2008) recursively segmented deformable
objects across a sequence of frames using a low-dimensional
modal representation and applied their technique to left
ventricular segmentation across a cardiac cycle. The dynamics
are represented using a distance level set function, whose
representation is simplified using principal component analysis
(PCA). Sun et al. used methods of particle-based smoothing as
well as non-parametric belief propagation on a loopy graphical
model capturing the temporal periodicity of the heart, with the
objective being to estimate the current state of the object not only
from the data observed at that instant but also from predictions
based on past and future boundary estimates. Even though we did
not find examples of this method being used in robotics, it seems
a suitable candidate since it models the shape change through
time implicitly and would thus allow the robot to keep track of
the evolving shape of an object during manipulation.
2.1.3. Gaussian Principal Component Eigenmodes
This kind of representation is valid when the types of
deformations can be described with a single mathematical
formulation. Given a representative set $\mathcal{S}_N = \{\mathcal{S}_0, \mathcal{S}_1, \ldots, \mathcal{S}_{N-1}\}$
of the types of surface deformation that objects can undergo, it is
possible to use PCA to detect the main modes of deformation
(i.e., the eigenmodes $\Phi_n$) and thus re-express the shapes as
a linear combination of those modes. Hence a new shape
can be estimated using
$$\mathcal{S} = \mathcal{S}_\mu + \sum_{n} \alpha_n \Phi_n, \tag{4}$$
where $\Phi_n$ ($n \in \mathbb{N}$) is the $n$-th largest eigenmode of shape variations in
$\mathcal{S}_N$, $\mathcal{S}_\mu$ is the mean of the representative set of shapes $\mathcal{S}_N$, and the $\alpha_n$ are
coefficients. Such an eigenmode representation is useful
for dealing with missing or misleading information (e.g., noise
or occlusions) coming from sensory data while constructing the
shape of the object (Cootes et al., 1995; Blake et al., 1998; Cremers,
2006; Sinha et al., 2019).
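A minimal sketch of the eigenmode model, assuming each training shape is given as the same number of corresponding 2D points flattened into one vector (point correspondence is what makes the linear combination meaningful). The modes $\Phi_n$ come from an SVD of the centered data, and a shape is synthesized as $\mathcal{S}_\mu$ plus a weighted sum of modes as in Eq. (4). The toy training set of ellipses and the helper names are hypothetical.

```python
import numpy as np


def fit_shape_model(shapes, n_modes):
    """PCA shape model: mean shape, leading modes (rows), and their variances.

    shapes: (N, D) array with one flattened shape per row.
    """
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions sorted by singular value.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return mean, Vt[:n_modes], variances[:n_modes]


def synthesize(mean, modes, alpha):
    """New shape S = S_mu + sum_n alpha_n * Phi_n (Eq. 4)."""
    return mean + np.asarray(alpha) @ modes


def project(mean, modes, shape):
    """Coefficients alpha that best explain `shape` within the span of the modes."""
    return (np.asarray(shape) - mean) @ modes.T


# Toy training set: ellipses x = a cos u, y = b sin u with random semi-axes a and b.
u = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
rng = np.random.default_rng(1)
shapes = []
for _ in range(50):
    a, b = 1.0 + 0.3 * rng.standard_normal(2)
    shapes.append(np.concatenate([a * np.cos(u), b * np.sin(u)]))
mean, modes, var = fit_shape_model(shapes, n_modes=2)

# An unseen ellipse is reconstructed from just two coefficients.
target = np.concatenate([1.2 * np.cos(u), 0.8 * np.sin(u)])
recon = synthesize(mean, modes, project(mean, modes, target))
print(np.max(np.abs(recon - target)))
```

In active shape models, the coefficients are commonly constrained to a few standard deviations of each mode (for example, $|\alpha_n| \le 3\sqrt{\lambda_n}$, with $\lambda_n$ the variances returned above), which is what lets the model reject noisy or occluded measurements.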
Employing combinations of previously cited methods,
Leventon et al. (2000) used eigenmode representation with
level set curves to segment images, such as medical images
of the femur and corpus callosum, by defining a probability
distribution over the variances of a set of training shapes. The
segmentation process embeds an initial curve as the zero level
set of a higher-dimensional surface, and then evolves the surface
such that the zero level set converges on the boundary of the
object to be segmented. At each step of the surface evolution, the
maximum a posteriori position and shape of the object in the
image were estimated based on the prior shape information and
the image information. The surface was then evolved globally
FIGURE 4 | (A) Two Bezier curves with control points; displacing one control point (e.g., p2) affects the entire curve. In this case, the line that joins the control points p1 and p2 is the tangent of the polynomial at p1, and the line that joins p3 and p4 is the tangent at p4. (B) Two chained Bezier curves; by placing p3, p4, and p5 on a line it is possible to make the curve $C^1$ continuous at p4. (C) Free-form deformation of a spline; the black circles form a lattice in a new local coordinate system, the red circles show the deformed lattice, and the red spline is the result of shifting the blue spline in accordance with this deformation.
toward the maximum a posteriori estimate and locally based on
image gradients and curvature. The results were demonstrated
on synthetic data and medical images, in both 2D and 3D. Tsai
et al. (2001) further developed this idea.
2.2. Explicit Parameterized
Explicit parameterized representations are evaluated directly
from their functional definition (Figure 3B). In 3D, they are
of the form $\mathbf{C}(u) = \{x(u), y(u), z(u)\}^T$ for curves and
$\mathbf{S}(u, v) = \{x(u, v), y(u, v), z(u, v)\}^T$ for surfaces, where $u$ and $v$
are parameters. The curve or surface is traced out as the values
of the parameters are varied. For example, if $u \in [0, 1]$, the shape
is traced out as $u$ varies from 0 to 1, as happens with the circle
$$x = \cos(2\pi u), \qquad y = \sin(2\pi u). \tag{5}$$
It is common practice to parameterize a curve by time $t$ if it will
represent a trajectory, or by arc length $l$.¹
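The arc-length integral in footnote 1 rarely has a closed form, so it is usually evaluated numerically. A minimal sketch, assuming the curve is given as a Python function of the parameter u (the name `arc_length` is hypothetical): derivatives come from finite differences via `np.gradient` and the integral from the composite trapezoidal rule, a choice made for brevity.

```python
import numpy as np


def arc_length(curve, u0, u1, samples=2001):
    """Approximate L = integral from u0 to u1 of sqrt(x'^2 + y'^2 + z'^2) du.

    curve(u) must return an array of shape (3, len(u)).
    """
    u = np.linspace(u0, u1, samples)
    xyz = curve(u)
    velocity = np.gradient(xyz, u, axis=1)  # d/du of each coordinate
    speed = np.linalg.norm(velocity, axis=0)
    # Composite trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(u)))


def helix(u):
    """Helix of radius 1 rising 0.5 per radian; its exact speed is sqrt(1.25)."""
    return np.array([np.cos(u), np.sin(u), 0.5 * u])


L = arc_length(helix, 0.0, 2.0 * np.pi)
print(L, 2.0 * np.pi * np.sqrt(1.25))
```

Reparameterizing by arc length then amounts to inverting the cumulative version of this integral, so that equal steps in l correspond to equal distances along the curve.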
2.2.1. Splines
A mathematical spline $S$ is a piecewise-defined real function, with
$k$ polynomial pieces $s_i(u)$ parameterized by $u \in [u_0, u_k]$, used to
represent curves or surfaces (Cordero Valle and Cortes Parejo,
2003). The order $n$ of the spline corresponds to the highest degree
of its polynomial pieces. The values $u_0, u_1, \ldots, u_{k-1}, u_k$ where the
polynomial pieces connect are called knots.
Frequently, for a spline of order $n$, $S$ is required to be
differentiable up to order $n - 1$, that is, to be $C^{n-1}$ at the knots and
$C^\infty$ everywhere else. However, it is also possible to reduce its
differentiability to take discontinuities into account.
In general, any spline function $S(u)$ of order $n$ with knots
$u_0, \ldots, u_k$ can be expressed as
$$S(u) = \sum_{j} \mathbf{p}_j \, s_j(u), \tag{6}$$
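¹ The arc length is the distance along the curve between a starting point, at parameter value $\alpha$, and the
current point, at parameter value $\beta$. For example, the length of a 1D curve embedded in 3D space,
parameterized by $u$, is given by $L = \int_{\alpha}^{\beta} \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} \, du$, where $\dot{x} = dx/du$, and
similarly for $\dot{y}$ and $\dot{z}$.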
where the coefficients $\mathbf{p}_j$ are interpreted geometrically as the
coordinates of the control points that determine the shape of the
spline, and
$$s_j(u) = (u - u_j)_+^n \;\; \text{for } j = 1, \ldots, k - 1, \qquad s_{k-1+j}(u) = u^{j-1} \;\; \text{for } j = 1, \ldots, n + 1, \tag{7}$$
with $(x)_+ = \max(x, 0)$, constitute a basis for the space of all spline functions with knots
$u_0, \ldots, u_k$, called the (truncated) power basis. This space of functions is a
$(k + n)$-dimensional linear space. By using other bases, a
large family of spline variations is generated (Gibson and Mirtich,
1997); the most important ones are the following.
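The power basis of Eq. (7) can be evaluated directly. A minimal sketch under two stated assumptions: the truncated powers are read as $(u - u_j)_+^n = \max(u - u_j, 0)^n$, and only the interior knots $u_1, \ldots, u_{k-1}$ receive a truncated term, since a term attached to the end knot $u_k$ vanishes everywhere on $[u_0, u_k]$. With $k$ pieces this gives $k - 1$ truncated powers plus the $n + 1$ monomials. The function names are hypothetical.

```python
import numpy as np


def power_basis(u, knots, n):
    """Truncated power basis for degree-n splines with knots u_0 < ... < u_k.

    Returns an array of shape (len(u), k + n): the k - 1 truncated powers
    (u - u_j)_+^n at the interior knots, followed by 1, u, ..., u^n.
    """
    u = np.asarray(u, dtype=float)[:, None]
    interior = np.asarray(knots[1:-1], dtype=float)[None, :]
    truncated = np.maximum(u - interior, 0.0) ** n
    monomials = u ** np.arange(n + 1)[None, :]
    return np.hstack([truncated, monomials])


def spline(u, knots, n, coeffs):
    """Evaluate S(u) = sum_j p_j s_j(u) (Eq. 6) for scalar coefficients p_j."""
    return power_basis(u, knots, n) @ np.asarray(coeffs, dtype=float)


# Cubic spline on [0, 3] with knots 0, 1, 2, 3 (k = 3 pieces): 3 + 3 = 6 basis functions.
knots = [0.0, 1.0, 2.0, 3.0]
u = np.linspace(0.0, 3.0, 7)
B = power_basis(u, knots, 3)
print(B.shape)  # (7, 6)
```

The power basis is convenient for reasoning about dimension and continuity, but it is numerically ill-conditioned, which is one reason the bases below are preferred in practice.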
In Bezier splines, each segment is a Bezier curve given by
$$\mathbf{C}(u) = \sum_{i=0}^{n} \mathbf{p}_i \, B_i^n(u), \tag{8}$$
$$B_i^n(u) = \binom{n}{i} (1 - u)^{n-i} u^i, \tag{9}$$
where each $\mathbf{p}_i$ is a control point, the $B_i^n$ are the Bernstein
polynomials of degree $n$, the $\binom{n}{i}$ are the binomial coefficients,
and $u \in [0, 1]$. The curve passes through its first and last
control points, $\mathbf{p}_0$ and $\mathbf{p}_n$, and remains close to the control
polygon obtained by joining all the control points, in order,
with straight lines. Also, at its extremes it is tangent to the
line segments $\mathbf{p}_0\mathbf{p}_1$ and $\mathbf{p}_{n-1}\mathbf{p}_n$. It is easy to add and
remove control points from a Bezier curve, but displacing one
causes the entire curve to change, which is why usually only
third-degree polynomial segments are used (see Figure 4).
A two-dimensional Bezier surface is obtained as the tensor
product of two Bezier curves:
$$\mathbf{S}(u, v) = \sum_{i=0}^{n} \sum_{j=0}^{m} B_i^n(u) \, B_j^m(v) \, \mathbf{p}_{i,j}. \tag{10}$$
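Eqs. (8) to (10) translate almost directly into code. A minimal sketch, assuming control points are stored as NumPy arrays (one point per row for curves, an $(n+1) \times (m+1)$ grid for surfaces); the function names are hypothetical. It evaluates the Bernstein polynomials of Eq. (9) explicitly, which is clear but less numerically robust than de Casteljau's algorithm at high degree.

```python
from math import comb

import numpy as np


def bernstein(n, i, u):
    """B_i^n(u) = C(n, i) (1 - u)^(n - i) u^i, Eq. (9)."""
    u = np.asarray(u, dtype=float)
    return comb(n, i) * (1.0 - u) ** (n - i) * u**i


def bezier_curve(ctrl, u):
    """C(u) = sum_i p_i B_i^n(u), Eq. (8); ctrl has shape (n + 1, dim)."""
    ctrl = np.asarray(ctrl, dtype=float)
    n = len(ctrl) - 1
    basis = np.stack([bernstein(n, i, u) for i in range(n + 1)], axis=-1)
    return basis @ ctrl


def bezier_surface(ctrl, u, v):
    """S(u, v) = sum_i sum_j B_i^n(u) B_j^m(v) p_ij, Eq. (10); ctrl: (n+1, m+1, dim)."""
    ctrl = np.asarray(ctrl, dtype=float)
    n, m = ctrl.shape[0] - 1, ctrl.shape[1] - 1
    bu = np.array([bernstein(n, i, u) for i in range(n + 1)])
    bv = np.array([bernstein(m, j, v) for j in range(m + 1)])
    return np.einsum("i,j,ijd->d", bu, bv, ctrl)


# A cubic segment passes through its first and last control points.
P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
ends = bezier_curve(P, np.array([0.0, 1.0]))

# A bilinear patch (n = m = 1): the center is the average of the four corners.
patch = np.array([[[0, 0, 0], [0, 1, 0]], [[1, 0, 0], [1, 1, 1]]], dtype=float)
center = bezier_surface(patch, 0.5, 0.5)
print(ends, center)
```

Moving any interior control point of `P` changes `bezier_curve(P, u)` for every u strictly between 0 and 1, which is the global-influence property noted above.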
B-splines are more stable, since changes to the positions
of control points induce only local changes around that
control point; in general, however, the curve only approximates the
control points rather than passing through them (de Boor, 1976). They are particularly suitable for 3D
reconstructions. Song and Bai (2008) show how they can be
used to fill holes and smooth surfaces originally captured as
dense clouds of points, while producing a much more compact
and manipulable representation.
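The local effect of moving a control point can be checked with the Cox-de Boor recursion for the B-spline basis functions $N_{i,p}$. A minimal sketch, assuming a clamped cubic knot vector chosen for illustration (the function names are hypothetical): with these knots, the basis function attached to the second control point is nonzero only for 0 < u < 2, so moving that point leaves the curve unchanged for u ≥ 2.

```python
import numpy as np


def bspline_basis(i, p, t, u):
    """Cox-de Boor recursion for the basis function N_{i,p}(u); 0/0 terms are taken as 0."""
    u = np.asarray(u, dtype=float)
    if p == 0:
        return np.where((t[i] <= u) & (u < t[i + 1]), 1.0, 0.0)
    left = np.zeros_like(u)
    right = np.zeros_like(u)
    if t[i + p] > t[i]:
        left = (u - t[i]) / (t[i + p] - t[i]) * bspline_basis(i, p - 1, t, u)
    if t[i + p + 1] > t[i + 1]:
        right = (t[i + p + 1] - u) / (t[i + p + 1] - t[i + 1]) * bspline_basis(i + 1, p - 1, t, u)
    return left + right


def bspline_curve(ctrl, p, t, u):
    """C(u) = sum_i N_{i,p}(u) P_i for control points P_i stored one per row."""
    ctrl = np.asarray(ctrl, dtype=float)
    basis = np.stack([bspline_basis(i, p, t, u) for i in range(len(ctrl))], axis=-1)
    return basis @ ctrl


p = 3
ctrl = np.array([[0, 0], [1, 1], [2, 0], [3, 1], [4, 0], [5, 1], [6, 0]], dtype=float)
t = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4], dtype=float)  # clamped; len = 7 + p + 1
u = np.linspace(0.0, 3.999, 400)  # stay inside the half-open last knot span

before = bspline_curve(ctrl, p, t, u)
moved = ctrl.copy()
moved[1] += [0.0, 2.0]  # displace the second control point
after = bspline_curve(moved, p, t, u)

changed = np.any(np.abs(after - before) > 1e-12, axis=1)
print(f"curve changed only for u in ({u[changed].min():.3f}, {u[changed].max():.3f})")
```

With a clamped knot vector, the curve interpolates only the first and last control points; elsewhere, it is pulled toward the control points without passing through them.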
Catmull-Clark subdivision surfaces are smooth surfaces that approximate the vertices of a control mesh
of arbitrary topology (Catmull and Clark, 1978).
Non-uniform rational B-splines (NURBS) are notable
because they can represent circles, ellipses, spheres, and other
curves and surfaces that, in spite of their commonness and simplicity,
cannot be represented exactly by polynomial splines. They achieve this
by introducing quotients of polynomials in the basis.²
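The effect of the rational basis can be checked on the standard construction of a quarter circle as a quadratic rational Bezier curve, that is, a single NURBS segment with control points (1, 0), (1, 1), (0, 1) and weights 1, √2/2, 1. A minimal sketch; the function name `rational_bezier` is hypothetical.

```python
from math import comb, sqrt

import numpy as np


def rational_bezier(ctrl, weights, u):
    """C(u) = sum_i w_i B_i^n(u) p_i / sum_i w_i B_i^n(u)."""
    ctrl = np.asarray(ctrl, dtype=float)
    w = np.asarray(weights, dtype=float)
    u = np.asarray(u, dtype=float)[:, None]
    n = len(ctrl) - 1
    i = np.arange(n + 1)[None, :]
    binom = np.array([comb(n, k) for k in range(n + 1)], dtype=float)[None, :]
    basis = binom * (1.0 - u) ** (n - i) * u**i  # Bernstein polynomials
    weighted = basis * w                         # weighted basis (numerator terms)
    return (weighted @ ctrl) / weighted.sum(axis=1, keepdims=True)


ctrl = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
arc = rational_bezier(ctrl, [1.0, sqrt(2.0) / 2.0, 1.0], np.linspace(0.0, 1.0, 50))
radii = np.linalg.norm(arc, axis=1)
print(radii.min(), radii.max())  # both equal 1 up to rounding
```

With all weights equal to 1, the same control points give an ordinary polynomial Bezier curve whose midpoint is (0.75, 0.75), at distance of about 1.06 from the origin; the non-unit middle weight is what makes the arc exactly circular.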
T-splines and non-uniform rational Catmull-Clark
surfaces with T-junctions (T-NURCCs) allow high-resolution
representations of 3D deformable objects with a
greatly reduced number of faces and improved representations
of joints between surface patches, by allowing T-shaped
junctions between edges of the control mesh (Sederberg
et al., 2003).
Splines are a very flexible tool for representing all sorts
of deformable shapes. They are extremely useful for signal
and image processing (Unser, 1999) as well as for computer
animation (Maraffi, 2004) and shape reconstruction of 2D and
3D deformable objects (Song and Bai, 2008; Prasad et al., 2010).
Active contours (see section 3.4.2 and Kass et al., 1988), also
known as snakes, are splines governed by an energy function that
introduces dynamic elements to the shape representation and are
used for tasks such as object segmentation (Marcos et al., 2018;
Chen et al., 2019; Hatamizadeh et al., 2019).
The advantage of splines is that compact representations of
deformable objects can be built on them in accordance with
the complexity of their shape at each moment. When new corners
or points of high curvature appear, more control points can be
added for an adequate representation, and if the shape becomes
simplified these points can be removed. However, this flexibility
also makes splines sensitive to noise and can lead to computation
of erroneous deformations. Hence, learning the dynamics of such
a representation is difficult with current learning algorithms, and
a general solution remains an open problem.
2.2.2. Modal Decompositions
In modal decomposition, a curve or a surface is expressed as
the sum of terms in a basis, whose elements correspond to
frequency harmonics. The sum of the first modes constituting
the surface gives a good rough approximation of its shape, which
becomes more detailed as more modes are included (Montagnat
et al., 2001). Among methods for modal decomposition, Fourier
decomposition is in widespread use. A curve may be represented
as a sum of sinusoidal terms and a surface as a combination of
spherical harmonics $Y_l^m(\theta, \phi)$, which are explicitly parameterized:
$$\mathbf{S}(\theta, \phi) = \sum_{l \ge 0} \sum_{m=-l}^{l} r_l \, \mathbf{c}_l^m \, Y_l^m(\theta, \phi), \tag{11}$$
where $r_l$ is a normalization factor for $Y$ and the $\mathbf{c}_l^m$ are constants.
Spherical harmonics suit surfaces homeomorphic to a sphere; other bases may be more suitable for
other shapes, such as surfaces homeomorphic to a torus,
cylinder, or plane.
Modal decomposition has mostly been used in model-based
segmentation and recognition of 2D and 3D medical images
(Szekely et al., 1995). Its main advantage is that it creates a
compact and easy-to-manipulate representation of objects whose
shape can be described as a linear combination of a few dominant
modes of deformation. The disadvantage of such methods is that
it is easy for them to miss details in objects, such as small dents,
because shapes are approximated by a limited number of terms.
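The trade-off between compactness and detail is easy to see in the 2D counterpart of Eq. (11): a Fourier series of a closed contour. A minimal sketch, assuming the contour is sampled at equally spaced parameter values and stored as complex numbers x + iy; the dented-circle test shape and the name `truncate_modes` are hypothetical.

```python
import numpy as np


def truncate_modes(contour, n_modes):
    """Keep only Fourier terms with |frequency| <= n_modes of a closed complex contour."""
    coeffs = np.fft.fft(contour)
    freqs = np.fft.fftfreq(len(contour), d=1.0 / len(contour))  # integer frequencies
    coeffs[np.abs(freqs) > n_modes] = 0.0
    return np.fft.ifft(coeffs)


# Unit circle with a small dent (depth 0.1, angular width about 0.1 rad) near angle 0.
M = 256
theta = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
wrapped = np.angle(np.exp(1j * theta))  # angle mapped to (-pi, pi]
radius = 1.0 - 0.1 * np.exp(-(wrapped / 0.1) ** 2)
contour = radius * np.exp(1j * theta)

errors = {}
for k in (2, 8, 64):
    errors[k] = np.max(np.abs(truncate_modes(contour, k) - contour))
    print(f"{k:3d} modes: max deviation {errors[k]:.4f}")
```

With only a few modes, the circle is captured but the dent is almost entirely lost; the dent reappears only once enough high-frequency terms are kept, which is the weakness described above.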
2.3. Free-Forms
Free-form deformation is a method whereby the space in
which a figure is embedded is deformed according to a set of
control points of the deformation (Moore and Molloy, 2007;
see Figure 4C). It can be used to deform primitives, such as
planes, quadrics, parametric surface patches, or implicitly defined
surfaces. The deformation can be applied either globally or
locally. A local coordinate system is defined using a parallelepiped
so that the coordinates inside it are $\mathbf{p} = \{x_1, x_2, x_3\}$ with $0 < x_i < 1$
for all $i$. A set of control points $\mathbf{p}_{ijk}$ lies on a lattice. When they are
displaced from their original positions, they define a deformation
of the original space with new coordinates $\mathbf{p}'$. The new position
of any point is interpolated by applying a transformation
formula that maps $\mathbf{p}$ into $\mathbf{p}'$. For some transformations it is
enough to estimate the new coordinates of the nodes of a mesh
or control points of a spline with respect to the new positions
of the control points of the deformed space, and the rest of
the shape will follow them, as in Sederberg and Parry (1986),
where a trivariate tensor product Bernstein polynomial was
proposed as the transformation function. Loosely related are
multigrid representations, which also allow for local management
of deformation (Xian et al., 2019).
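A minimal sketch of free-form deformation with a trivariate tensor-product Bernstein polynomial, in the style of Sederberg and Parry (1986), is shown below. It assumes points have already been expressed in the local coordinates $(x_1, x_2, x_3) \in [0,1]^3$ of the parallelepiped; `bernstein`, `ffd`, and `regular_lattice` are hypothetical helper names. Each deformed point is the Bernstein-weighted sum of the (possibly displaced) control points $p_{ijk}$, so an undisplaced lattice leaves space unchanged and moving one control point deforms space smoothly around it.

```python
from math import comb

import numpy as np

# Hypothetical helpers: an illustrative FFD sketch, not a reference implementation.


def bernstein(n: int, i: int, t: np.ndarray) -> np.ndarray:
    """Bernstein basis polynomial B_i^n(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)


def ffd(points: np.ndarray, lattice: np.ndarray) -> np.ndarray:
    """Map local coordinates to deformed space.

    points: (K, 3) local coordinates, each component in [0, 1].
    lattice: (l+1, m+1, n+1, 3) control points p_ijk (possibly displaced).
    """
    l, m, n = (s - 1 for s in lattice.shape[:3])
    s, t, u = points[:, 0], points[:, 1], points[:, 2]
    out = np.zeros_like(points, dtype=float)
    for i in range(l + 1):
        bi = bernstein(l, i, s)
        for j in range(m + 1):
            bj = bernstein(m, j, t)
            for k in range(n + 1):
                w = bi * bj * bernstein(n, k, u)        # weight of p_ijk for every point
                out += w[:, None] * lattice[i, j, k]    # accumulate the weighted control point
    return out


def regular_lattice(l: int, m: int, n: int) -> np.ndarray:
    """Undisplaced control points p_ijk = (i/l, j/m, k/n) in the unit cube."""
    gi, gj, gk = np.meshgrid(np.arange(l + 1) / l, np.arange(m + 1) / m,
                             np.arange(n + 1) / n, indexing="ij")
    return np.stack([gi, gj, gk], axis=-1)


lattice = regular_lattice(2, 2, 2)
lattice[2, 2, 2] += np.array([0.5, 0.0, 0.0])   # pull one corner control point along x
pts = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
print(ffd(pts, lattice))
```

Because Bernstein polynomials reproduce linear functions, a regular lattice maps every point to itself; only displacements of the control points change the shape.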
2.4. Discrete Representations
Discrete representations contain only a finite fixed number
of key elements describing them, mainly points and lines.
Representations that fall into this category include the following:
Meshes are collections of vertices connected through edges
that form a graph. Common shapes for their faces are triangles
(triangulations), quadrilaterals, and hexagons for surfaces, and
tetrahedrons for volumes. A special case consists of the simplex
meshes, which have a constant vertex connectivity. This type
of shape representation permits smooth deformations in a
simple and efficient manner (Delingette, 1999; Montagnat
et al., 2001). Therefore, meshes are used for various tasks, such
as 3D object recognition (e.g., Madi et al., 2019) and simulation
of the dynamics of deformable objects (see section 3.3.1) with
efficient coding (Arvanitis et al., 2019).
Skeletons are made of rigid edges connected by joints that
allow bending. The position and deformation of elements
attached to the skeleton are defined with respect to their
assigned bone. Skeletons tend to be used together with the
method known as skinning, where a deformable surface
is attached to the bone and softens the visual appearance
of the articulated joints through interpolation techniques.
By nature skeletons are designed to model articulated
deformations. Schaefer and Yuksel (2007) proposed a
method to automatically extract skeletons by detecting
articulated deformations.
Deformable templates are parameterized representations of a
shape that specify key values of salient features in an image.
They deform with greater ease around these features. The
features can be peaks and valleys in the image intensity, edges,
and the intensity itself, as well as points of high curvature³
(sharp turns or bends; see Yuille et al., 1992; Basri et al., 1998).
Deformable templates are mainly used for object recognition
and object tracking (e.g., Ravishankar et al., 2008; Xia et al.,
2019; Gallardo et al., 2020). They underscore the particular relevance of critical points in modeling deformations, which
could make them key to developing a robot’s ability to generate
its own optimal representation of a deformable object. The
use of deformable templates is explored and illustrated by
experiments on natural and artificial agents in Arriola-Rios
et al. (2013).
Landmark points are points that would remain stable across
deformations. They can be corners, T-junctions, or points of
high curvature. For example, when a rectangular sponge is
pushed, its corners will still be corners after the deformation,
while the point of contact with the external force will become
a point of high curvature during the process and will remain
as such; these are all stable points. In particular, landmark
points can correspond to control points of splines (Blake et al.,
1998). Methods for further processing, such as the application
of deformations, can work more efficiently if they focus only
(or mainly) on landmark points rather than on the whole
representation (Cootes and Taylor, 2004).
Particles are idealized zero-dimensional dots. Their positions
are specified as a vector function parameterized by time, P(t).
They can store a set of attributes, such as mass, temperature,
shape (for visualization purposes), age, lifetime, and so on.
These attributes influence the dynamical behavior of the
particles over time and are subject to change due to procedural
stochastic processes. The particles can pass through three
different phases during their lifetime: generation, dynamics,
and death. However, manipulating them and maintaining constraints, such as boundaries between them, can become non-trivial. For this reason, particles are used mainly to
represent gases or visual effects in animations, where the
interaction between them is very limited (Nealen et al., 2006).
Clouds of points are formed by large collections of coordinates that lie on a surface. They are frequently obtained from 3D scanned data and may include the color of each point. A typical problem is reconstructing 3D surfaces from such clouds (Newcombe and Davison, 2010; Makovetskii et al., 2020). Cretu et al. (2009) give a comparative review of several methods for efficiently processing clouds of points and introduce the use of self-organizing neural gas networks for this purpose. Clouds of points can be structured and enriched with orientations of the 3D normals, as well as other feature descriptors for perceptual applications (Martínez et al., 2019).

³The curvature of a function is defined as $\kappa = \frac{d\varphi}{ds}$, where $\varphi$ is the tangential angle, with $\tan\varphi = \frac{dy}{dx}$, and $s$ is the arc length as defined earlier.
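Footnote 3 defines curvature as $\kappa = d\varphi/ds$, and several of the representations above (deformable templates, landmark points) rely on points of high curvature. As an illustrative sketch, not a method from the cited works, the following code approximates $\kappa$ on a closed polyline by dividing the turning angle at each vertex by the local arc length, then flags vertices above a threshold as landmark candidates. The helper names and the threshold value are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a discrete-curvature sketch for landmark candidates.


def discrete_curvature(points: np.ndarray) -> np.ndarray:
    """Approximate kappa = d(phi)/ds at each vertex of a closed 2D polyline.

    phi is the tangential angle; its change between the incoming and outgoing
    segments is divided by the mean length of those two segments.
    """
    d_in = points - np.roll(points, 1, axis=0)     # incoming segment at each vertex
    d_out = np.roll(points, -1, axis=0) - points   # outgoing segment at each vertex
    phi_in = np.arctan2(d_in[:, 1], d_in[:, 0])
    phi_out = np.arctan2(d_out[:, 1], d_out[:, 0])
    dphi = np.angle(np.exp(1j * (phi_out - phi_in)))   # wrap turning angle to (-pi, pi]
    ds = 0.5 * (np.linalg.norm(d_in, axis=1) + np.linalg.norm(d_out, axis=1))
    return dphi / ds


def landmark_candidates(points: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of vertices whose |curvature| exceeds the (hypothetical) threshold."""
    return np.flatnonzero(np.abs(discrete_curvature(points)) > threshold)


# A unit square sampled with four points per side; only the corners turn.
side = np.linspace(0.0, 1.0, 4, endpoint=False)
square = np.concatenate([
    np.column_stack([side, np.zeros(4)]),        # bottom edge
    np.column_stack([np.ones(4), side]),         # right edge
    np.column_stack([1.0 - side, np.ones(4)]),   # top edge
    np.column_stack([np.zeros(4), 1.0 - side]),  # left edge
])
print(landmark_candidates(square, threshold=1.0))   # indices of the four corners
```

On a densely sampled circle of radius $r$ the same function returns values close to $1/r$, which is a quick sanity check of the discretization.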
After an object’s shape is defined as described in section 2, a
suitable model of the dynamics can be used to register and predict
deformations as a robot interacts with the object (Figures 1ii,iii).
In this section, we introduce some of the most commonly used
models from different fields (e.g., computer graphics; see Gibson
and Mirtich, 1997; Nealen et al., 2006; Moore and Molloy, 2007;
Bender et al., 2014) for predicting the dynamics of deformable
objects. In robotics, the important features used to select an
appropriate model are computational complexity (e.g., for real-
time perception and manipulation), physical accuracy or visual
plausibility, and simplicity or intuitiveness (i.e., the ability to
implement simple cases easily and to be built on iteratively to
accommodate more complex cases). Therefore, we divide the
models into three classes: (1) particle-based models, which are
usually computationally efficient and intuitive but physically
not very accurate; (2) constitutive models, which are physically
accurate but computationally complex and not very intuitive;
and (3) approximations of constitutive models, which aim to
decrease the computational complexity of constitutive models
through approximations.
3.1. Background Knowledge of
First, we briefly review some background information about the
physics and dynamics of deformation. Initially, the object is in
a rest shape $S_0$. In the discrete case it could be $S_0 = \{\mathbf{p}^0_i \in \mathbb{R}^{n=3},\ i = 1, \ldots, N\}$, where $N$ is the number of points constituting the shape of the object. Then, when an external force $\mathbf{f}_{ext}$ acts on the object, such as gravitational force or force applied by a manipulator, the object deforms and its points move to new positions $\mathbf{p}_{new}$. In physics-based models, the resulting deformation is typically defined using a displacement vector field $\mathbf{u} = \mathbf{p}_{new} - \mathbf{p}^0$. From this displacement, the deformation can be computed through the stress $\sigma$ (i.e., the force applied per unit area of the object shape) and the strain $\varepsilon$ (i.e., the ratio of deformation to the original size of the object shape). The stress tensor $\sigma$ is usually calculated for each point on the object shape using Hooke's law, $\sigma = \mathbf{E}\varepsilon$, where $\varepsilon$ can be calculated as $\varepsilon = \frac{1}{2}(\nabla\mathbf{u} + \nabla\mathbf{u}^T)$, with $\nabla\mathbf{u}$ denoting the spatial derivative of the displacement field, and $\mathbf{E}$ is a tensor that depends on the real physical material properties of the object, such as Young's modulus $E$ and Poisson's ratio $\upsilon$. These properties are parameters in constitutive models
FIGURE 5 | (A) A simple circular deformable object represented in 2D with $N = 7$ particles in the particle system; the particles are in static equilibrium (initial rest state). (B) When the distance $r$ between particles $i$ and $j$ increases, new forces exerted on $i$ are calculated and the particles move. (C) A mass-spring model of a simple cubic deformable object with $N = 8$ particles in 3D; the particles are connected with structural and shear springs that enable the object to resist longitudinal and shear deformations. (D) In Nurnberger et al. (1998), the neurons of a recurrent neural network represent the mass nodes (light blue) and springs (gray) of a mass-spring model; the activation functions between the neurons of the network were devised to reproduce the mass-spring system equations.
(e.g., finite element models). Constitutive models describe strain-
stress relationships as the response of materials (e.g., elastic
or plastic) to different loads (e.g., forces applied) based on
the material properties; they are commonly used to simulate
deformation because of their high physical accuracy.
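As a small numerical illustration of these relations, the sketch below computes the strain $\varepsilon = \frac{1}{2}(\nabla\mathbf{u} + \nabla\mathbf{u}^T)$ from a given $3\times 3$ displacement gradient and then the stress $\sigma = \mathbf{E}\varepsilon$. It assumes a homogeneous, isotropic, linear-elastic material, for which the tensor $\mathbf{E}$ is fully determined by Young's modulus $E$ and Poisson's ratio $\upsilon$ through the Lamé parameters; anisotropic materials would need the full tensor. The function names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical helpers: small-strain, isotropic Hooke's law only.


def small_strain(grad_u: np.ndarray) -> np.ndarray:
    """Linearized strain eps = (grad_u + grad_u^T) / 2 from a 3x3 displacement gradient."""
    return 0.5 * (grad_u + grad_u.T)


def isotropic_stress(strain: np.ndarray, E: float, nu: float) -> np.ndarray:
    """Hooke's law sigma = lambda * tr(eps) * I + 2 * mu * eps for an isotropic material."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))   # first Lame parameter
    mu = E / (2.0 * (1.0 + nu))                       # shear modulus
    return lam * np.trace(strain) * np.eye(3) + 2.0 * mu * strain


# Uniaxial stretch along x with the lateral contraction predicted by Poisson's ratio:
# the resulting stress should be E * e along x and zero elsewhere.
E, nu, e = 1.0e6, 0.3, 1.0e-3
grad_u = np.diag([e, -nu * e, -nu * e])
print(isotropic_stress(small_strain(grad_u), E, nu))
```

A purely rotational (skew-symmetric) displacement gradient yields zero strain, which is why only the symmetric part of $\nabla\mathbf{u}$ enters the stress.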
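To simulate the dynamical behavior of deformation over time, Newton's second law of motion is employed. Let $\mathbf{p}^t_i$ be the position of particle $i$ at time $t$:

$$m_i\mathbf{a}^t_i = \mathbf{f}^t_{ext,i}, \qquad \mathbf{v}^t_i = \dot{\mathbf{p}}^t_i, \qquad \mathbf{a}^t_i = \dot{\mathbf{v}}^t_i, \qquad (13)$$

where $m_i \in \mathbb{R}$, $\mathbf{f}^t_{ext,i} \in \mathbb{R}^3$, $\mathbf{a}^t_i \in \mathbb{R}^3$, and $\mathbf{v}^t_i \in \mathbb{R}^3$ are respectively the mass, external forces, acceleration, and velocity at time $t$, and $\dot{\mathbf{p}}^t_i \approx (\mathbf{p}^{t+\Delta t}_i - \mathbf{p}^t_i)/\Delta t$ and $\dot{\mathbf{v}}^t_i \approx (\mathbf{v}^{t+\Delta t}_i - \mathbf{v}^t_i)/\Delta t$ are the first-order time derivatives of the position and velocity, respectively, approximated using finite differences. Then, according to these derivative approximations, in each time step $\Delta t$ the points move according to a time integration scheme. The simplest such scheme is explicit Euler integration:

$$\mathbf{p}^{t+\Delta t}_i = \mathbf{p}^t_i + \mathbf{v}^t_i\,\Delta t, \qquad (14)$$

$$\mathbf{v}^{t+\Delta t}_i = \mathbf{v}^t_i + \frac{1}{m_i}\mathbf{f}^t_{ext,i}\,\Delta t, \qquad (15)$$

where $\mathbf{v}^t_i$ and $\mathbf{p}^t_i$ are the velocity and position of point $i$ at time $t$.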
We remark that explicit Euler integration can cause problems, such as unrealistic deformation behavior (e.g., overshooting), especially with stiff materials or large time steps. Other, more stable integration schemes (e.g., implicit integration, Verlet, Runge-Kutta; see Hauth et al., 2003) can be used instead.
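The sketch below implements one explicit Euler step, (14) and (15), for $N$ particles; the function name is hypothetical. It then illustrates the stability issue on an assumed undamped test problem, a single particle on a linear spring: with explicit Euler, the total energy grows by the same factor, $1 + (k/m)\Delta t^2$, at every step instead of staying constant.

```python
import numpy as np

# Hypothetical helper: explicit Euler exactly as in Eqs. (14)-(15).


def explicit_euler_step(p, v, f_ext, m, dt):
    """One explicit Euler step; both updates use quantities at time t.

    p, v, f_ext: (N, 3) positions, velocities, and forces at time t; m: (N,) masses.
    """
    p_next = p + v * dt                       # (14): position advanced with the old velocity
    v_next = v + (f_ext / m[:, None]) * dt    # (15): velocity advanced with the force at time t
    return p_next, v_next


# Undamped particle on a spring (force -k p), released from rest at distance 1.
k, dt = 100.0, 0.05
m = np.array([1.0])
p, v = np.array([[1.0, 0.0, 0.0]]), np.zeros((1, 3))
energy0 = 0.5 * k * np.sum(p**2) + 0.5 * m[0] * np.sum(v**2)
for _ in range(100):
    p, v = explicit_euler_step(p, v, -k * p, m, dt)
energy = 0.5 * k * np.sum(p**2) + 0.5 * m[0] * np.sum(v**2)
print(energy / energy0)   # far above 1: explicit Euler injects energy here
```

Swapping the order of the updates (velocity first, then position with the new velocity) gives the semi-implicit, or symplectic, Euler scheme, which keeps the energy of this oscillator bounded for small enough $\Delta t$.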
Such a dynamical model can be represented simply as $G(S_0, \mathbf{f}_{ext}, \theta)$, where the input to the model $G$ includes the initial state of points $\mathbf{p}^0 \in S_0$ of the object and the external forces $\mathbf{f}_{ext}$; $\theta$ represents model parameters that could be related to material properties (e.g., $E$, $\upsilon$), as in constitutive models, to simulate desired deformations. Then, within $G$, the deformation is computed and the points $\mathbf{p}^t$ are advanced to time $t$ using an integration scheme, such as (14) and (15).
3.2. Particle-Based Models
3.2.1. Particle Systems
In a particle system, a solid object shape $S$ is represented as a collection of $N$ particles (see section 2.4). These particles are initially in an equilibrium position, $\mathbf{p}^0_i \in \mathbb{R}^3$, which can be regarded as the initial coordinates of each particle $i \in \{1, \ldots, N\}$ (Figure 5A). When an external force is applied, the object deforms and the particles move to new coordinates $\mathbf{p}^t_i$ based on physics laws, in particular Newton's second law of motion (13), according to a time integration scheme, such as (14) and (15) (Figure 5B).
Although particles are usually used to model substances such as clouds or liquids, there are also particle frameworks for
simulation of solids. These frameworks are based on so-called
dynamically coupled particles that represent the volume of an
object (Tonnesen and Terzopoulos, 2000). The advantage of
particle systems is their simplicity, which allows simulation of
a huge number of particles to represent complex scenes. A
disadvantage of particle systems is that the surface is not explicitly
defined. Therefore, maintaining the initial shape of the deforming
object is difficult, and this can be problematic for applications,
such as tracking the return of elastic objects to their original
shape after deformation during robotic manipulation. Hence, for
objects that are supposed to maintain a given structure, particle-
based models with fixed particle couplings are more appropriate,
such as models that employ meshes for shape representation.
3.2.2. Mass-Spring Systems
Mass-spring (MS) models use meshes for shape representation
(see section 2.4). In such a model, $N$ particles are connected by
a network of springs (Figure 5C). As in particle systems, particle
motion is simulated using Newton’s second law of motion (13).
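However, there are other forces between the connected particles, say $i$ and $j$, that affect their motion, in particular the spring force

$$\mathbf{f}_s(\mathbf{p}_i) = k_s\big(\|\mathbf{p}_j - \mathbf{p}_i\| - l_{ij}\big)\frac{\mathbf{p}_j - \mathbf{p}_i}{\|\mathbf{p}_j - \mathbf{p}_i\|},$$

where $k_s$ is the spring's stiffness and $l_{ij}$ is the rest length of the spring, and the damping force $\mathbf{f}_d(\mathbf{p}_i) = k_d(\mathbf{v}_j - \mathbf{v}_i)$ of the spring, where $k_d$ is the damping coefficient. Then, the equation of motion (13) becomes

$$m_i\mathbf{a}_i = \mathbf{f}_{ext}(\mathbf{p}_i) + \mathbf{f}_d(\mathbf{p}_i) + \mathbf{f}_s(\mathbf{p}_i). \qquad (16)$$

For the entire particle system, this can be expressed in matrix form as

$$\mathbf{M}\mathbf{a} + \mathbf{D}\mathbf{v} + \mathbf{K}\mathbf{u} = \mathbf{f}_{ext}, \qquad (17)$$

where $\mathbf{M} \in \mathbb{R}^{3N\times 3N}$, $\mathbf{D} \in \mathbb{R}^{3N\times 3N}$, and $\mathbf{K} \in \mathbb{R}^{3N\times 3N}$ are a diagonal mass matrix, a diagonal damping matrix, and a stiffness matrix for $n = 3$ dimensions. The MS system can be represented as a model $G_{MS}(S_0, \mathbf{f}_{ext}, \theta)$, where the input to $G_{MS}$ consists of the initial state of the mesh shape $\mathbf{p}^0 \in S_0$, the external forces $\mathbf{f}_{ext}$, and the model parameters $\theta = \{k_s, k_d\}$, which can be changed (tuned) to determine object deformability.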
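A compact sketch of the forces in (16) and of one integration step follows. Assumptions: springs are given as index pairs with rest lengths $l_{ij}$, $k_s$ and $k_d$ are shared by all springs, the damping term is applied exactly as written above, $k_d(\mathbf{v}_j - \mathbf{v}_i)$, and each force on $i$ is matched by an equal and opposite force on $j$. For the step we swap the explicit Euler scheme of (14) and (15) for semi-implicit Euler (velocity first), a common choice for stability. All names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a minimal mass-spring sketch, not a tuned simulator.


def mass_spring_forces(p, v, springs, rest_len, ks, kd):
    """Spring and damping forces of Eq. (16) on every particle.

    p, v: (N, 3) positions and velocities; springs: (S, 2) integer pairs (i, j);
    rest_len: (S,) rest lengths l_ij; ks, kd: stiffness and damping coefficients.
    """
    i, j = springs[:, 0], springs[:, 1]
    d = p[j] - p[i]                                          # p_j - p_i for every spring
    length = np.linalg.norm(d, axis=1)
    f_s = (ks * (length - rest_len) / length)[:, None] * d   # pulls i toward j when stretched
    f_d = kd * (v[j] - v[i])                                 # damping term k_d (v_j - v_i)
    f = np.zeros_like(p)
    np.add.at(f, i, f_s + f_d)                               # force on particle i
    np.add.at(f, j, -(f_s + f_d))                            # equal and opposite force on j
    return f


def mass_spring_step(p, v, m, f_ext, springs, rest_len, ks, kd, dt):
    """Advance the MS model by dt using semi-implicit Euler (velocity, then position)."""
    f = f_ext + mass_spring_forces(p, v, springs, rest_len, ks, kd)
    v_next = v + dt * f / m[:, None]
    p_next = p + dt * v_next
    return p_next, v_next


# Two unit masses joined by one spring of rest length 1, released stretched to length 2.
p = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
v = np.zeros_like(p)
m = np.ones(2)
springs, rest = np.array([[0, 1]]), np.array([1.0])
for _ in range(2000):
    p, v = mass_spring_step(p, v, m, np.zeros_like(p), springs, rest, ks=10.0, kd=0.5, dt=0.01)
print(np.linalg.norm(p[1] - p[0]))   # settles close to the rest length
```

Because the internal forces come in equal and opposite pairs, the center of mass stays fixed when no external force is applied; only $k_s$ and $k_d$ (the parameters $\theta$) change how quickly and how stiffly the spring relaxes.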
MS systems are a widely used type of physics-based
model for predicting and tracking object states during robotic
manipulation, since they are intuitive and computationally
efficient (Schulman et al., 2013). However, the spring constants
are difficult to tune according to the material properties to
obtain the desired deformation behavior. One way to overcome
this tuning problem is to use learning algorithms and reference
solutions (Bianchi et al., 2004; Morris and Salisbury, 2008;
Arriola-Rios and Wyatt, 2017). Another disadvantage of MS models is that, in their basic formulation, they cannot directly simulate volumetric effects such as volume conservation. To simulate such effects, Teschner et al. (2004) introduced additional energy
formulations. Also, the behavior of an MS model is affected
by the directions in which the springs are placed; to deal
with this issue, Bourguignon and Cani (2000) added virtual
springs to compensate for this effect. In addition, Xu et al.
(2018) proposed a new method by introducing extra elastic
forces into the traditional MS model to integrate more complex
mechanical behaviors, such as viscoelasticity, non-linearity,
and incompressibility.
3.2.3. Neural Networks
Nurnberger et al. (1998) designed a method for controlling the
dynamics of an MS model using a recurrent neural network
(NN). Different types of neurons are used to represent the
positions p, velocities v, and accelerations aof the mass points
(nodes) and the springs (spring nodes) of the mesh shape S
(Figure 5D). The differential equations governing the behavior
of the MS system are codified in the structure of the network.
The spring functions are used as activation functions for the
corresponding neurons. The whole system poses the simulation as an energy-minimization problem. The information is
propagated to the neurons in stages, starting from the mass points
where the applied force is greatest, and an equilibrium point
must be reached to obtain the new configuration of the nodes
at each time t. The training is carried out with gradient descent
(backpropagation) for the NN. In addition, Zhang et al. (2019)
employed a convolutional neural network (CNN) to model
propagation of mechanical load using the Poisson equation
rather than an MS model.
The advantage of using an NN to control deformation lies in the method's flexibility: for example, the network structure can be modified during simulation (e.g., by removing springs as in Nurnberger et al., 1998), and large deformations can be simulated efficiently (e.g., Zhang et al., 2019).
3.2.4. Position-Based Dynamics
Particle systems and MS models are force-based models where,
based on given forces, the velocities and positions of particles are
determined by a time integration scheme. In contrast, position-
based dynamics (PBD) models compute the positions directly by
applying geometrical constraints in each simulation step. PBD
methods can be used for various purposes, such as simulating
liquids, gases, and melting or visco-elastic objects undergoing
topological changes (Bender et al., 2014). Here we focus on a
special PBD method, called meshless shape matching (MSM; see
Müller et al., 2005), that is used to simulate volumetric solid
objects while preserving their topological shape.
FIGURE 6 | (A) Meshless shape matching applied to a simple object consisting of $N = 4$ particles. (B).1 The method estimates the optimal linear transformation $\mathbf{A}$ that allows the particles to move to the actual deformed positions $\mathbf{p}$ with respect to the rest state $\mathbf{p}^0$, as in (C). (B).2 Then $\mathbf{A}$ is decomposed into a rotational (rigid) part $\mathbf{R}$ and a symmetric (deformation) part $\mathbf{S}$. (B).3 The $\mathbf{R}$ transformation is used to simulate rigid motion; to simulate deformation, $\mathbf{A}$ and $\mathbf{R}$ are combined using a parameter, $\beta$.
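In MSM, as in particle systems, an object is represented by a set of $N$ particles without any connectivity (Figure 6A). Since there is no connectivity information between the particles, when they are disturbed by external forces (Figure 6B) they tend to adopt a configuration that does not respect the original shape $\mathbf{p}^0$ of the object. We call this disturbed configuration the intermediate deformed shape, $\bar{\mathbf{p}}_i = \mathbf{p}^{t-1}_i + \bar{\mathbf{v}}^t_i\Delta t$, where $\bar{\mathbf{v}}^t_i = \mathbf{v}^{t-1}_i + \mathbf{f}_{ext}\frac{\Delta t}{m_i}$. MSM calculates an optimal linear transformation,

$$\mathbf{A} = \Big(\sum_i m_i\mathbf{r}_i\mathbf{q}_i^T\Big)\Big(\sum_i m_i\mathbf{q}_i\mathbf{q}_i^T\Big)^{-1},$$

between the initial shape $\mathbf{p}^0$ and the intermediate deformed shape $\bar{\mathbf{p}}$ that allows preservation of the original shape of the object; here $\mathbf{r}_i = \bar{\mathbf{p}}_i - \bar{\mathbf{c}}$ and $\mathbf{q}_i = \mathbf{p}^0_i - \mathbf{c}^0$, with $\mathbf{c} = \frac{1}{\sum_i m_i}\sum_i m_i\mathbf{p}_i$ being the center of mass of the object (evaluated on $\bar{\mathbf{p}}$ for $\bar{\mathbf{c}}$ and on $\mathbf{p}^0$ for $\mathbf{c}^0$) (Figure 6B.1–3). Then, the linear transformation $\mathbf{A}$ is separated into rotational and symmetric parts, $\mathbf{A} = \mathbf{R}\mathbf{S}$, where $\mathbf{R}$ represents rigid behavior and $\mathbf{S}$ represents deformable behavior. Hence, to simulate rigid behavior, the goal position $\mathbf{g}_i$ of the particles is

$$\mathbf{g}_i = \mathbf{R}\mathbf{q}_i + \mathbf{t}, \qquad (18)$$

where $\mathbf{t} = \bar{\mathbf{c}}$ is the translation of the object. If the object is deformable, then $\mathbf{S}$ is also included and the goal position is

$$\mathbf{g}_i = \mathbf{R}\big((1-\beta)\mathbf{I} + \beta\mathbf{S}\big)\mathbf{q}_i + \mathbf{t} = \big((1-\beta)\mathbf{R} + \beta\mathbf{A}\big)\mathbf{q}_i + \mathbf{t}, \qquad (19)$$

where $\beta$ is a control parameter that determines the degree of deformation coming from the $\mathbf{S}$ matrix. If $\beta = 0$, (19) becomes (18). As $\beta$ approaches 1, the range of deformation increases. Subsequently, the new position and velocity at time $t$ are updated using the following integration scheme:

$$\mathbf{p}^t_i = \bar{\mathbf{p}}_i + \alpha(\mathbf{g}_i - \bar{\mathbf{p}}_i), \qquad (20)$$

$$\mathbf{v}^t_i = (\mathbf{p}^t_i - \mathbf{p}^{t-1}_i)/\Delta t, \qquad (21)$$

where $\alpha$ affects the stiffness of the model (similar to the MS model) and determines the speed of convergence of the intermediate positions to the goal positions. In its simplest form, this model can be represented as $G_{MSM}(S_0, \mathbf{f}_{ext}, \theta)$, where the input to the model $G_{MSM}$ consists of the initial state $S_0$, external forces $\mathbf{f}_{ext}$, and model parameters $\theta = \{\beta, \alpha\}$, which can be tuned to decide the range of deformability and stiffness of the model.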
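A minimal sketch of one MSM step following (18)-(21) is given below. Assumptions to note: the rotation $\mathbf{R}$ is extracted from the polar decomposition of the moment matrix $\sum_i m_i\mathbf{r}_i\mathbf{q}_i^T$ (the first factor of $\mathbf{A}$), as Müller et al. (2005) do, computed here through an SVD with a reflection check; the rest shape must not be degenerate (e.g., not all particles coplanar) so that the second factor of $\mathbf{A}$ is invertible; and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: an illustrative meshless shape matching sketch.


def shape_matching_goals(p0, p_bar, m, beta):
    """Goal positions g_i of Eqs. (18)-(19).

    p0: (N, 3) rest positions; p_bar: (N, 3) intermediate deformed positions;
    m: (N,) masses; beta in [0, 1] blends rigid (R) and linear (A) matching.
    """
    w = m[:, None]
    c0 = (w * p0).sum(axis=0) / m.sum()         # rest center of mass c^0
    c_bar = (w * p_bar).sum(axis=0) / m.sum()   # deformed center of mass, the translation t
    q = p0 - c0
    r = p_bar - c_bar
    A_pq = (w * r).T @ q                         # sum_i m_i r_i q_i^T
    A = A_pq @ np.linalg.inv((w * q).T @ q)      # times (sum_i m_i q_i q_i^T)^-1
    U, _, Vt = np.linalg.svd(A_pq)               # rotation of the polar decomposition of A_pq
    if np.linalg.det(U @ Vt) < 0:                # avoid returning a reflection
        U[:, -1] *= -1.0
    R = U @ Vt
    T = (1.0 - beta) * R + beta * A              # Eq. (19)
    return q @ T.T + c_bar


def shape_matching_step(p_prev, v_prev, p0, m, f_ext, alpha, beta, dt):
    """Predict p_bar, compute goals, and pull toward them (Eqs. 20-21)."""
    v_bar = v_prev + dt * f_ext / m[:, None]
    p_bar = p_prev + dt * v_bar
    g = shape_matching_goals(p0, p_bar, m, beta)
    p_new = p_bar + alpha * (g - p_bar)
    v_new = (p_new - p_prev) / dt
    return p_new, v_new


def pairwise_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


# Four non-coplanar unit-mass particles, squashed along y.
p0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
m = np.ones(4)
squashed = p0 * np.array([1.0, 0.5, 1.0])
goals = shape_matching_goals(p0, squashed, m, beta=0.0)
print(np.abs(pairwise_distances(goals) - pairwise_distances(p0)).max())   # ~0: a rigid copy

# One step from rest with alpha = 0.5 moves each particle halfway toward its goal.
p, v = shape_matching_step(squashed, np.zeros_like(p0), p0, m, np.zeros_like(p0),
                           alpha=0.5, beta=0.0, dt=0.01)
```

With $\beta = 0$ the goals are always a rigidly transformed copy of the rest shape, which is what lets MSM restore the original shape without any connectivity; raising $\beta$ lets part of the linear deformation in $\mathbf{A}$ persist.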
The main advantages of PBD methods are their simplicity, their computational and memory efficiency (e.g., no mesh model is needed), and their scalability owing to their particle-based
parallel nature. Also, they are able to calculate more visually
plausible deformations than MS models. Hence, they have been
used in a wide range of interactive graphical applications (Tian
et al., 2013; Macklin et al., 2014), particularly for modeling
the deformation of human body parts (Zhu et al., 2008;
Sidorov and Marshall, 2014; Romeo et al., 2020), and robotic
manipulation tasks (Caccamo et al., 2016; Guler et al., 2017).
A disadvantage of PBD methods is that they simulate physical
deformation less accurately than constitutive models, since they
are geometrically motivated.
3.3. Constitutive Models
To simulate more physically accurate deformations, constitutive
models, which incorporate real physical material properties,
are used. In this subsection, we start by introducing
the most commonly used constitutive models, namely
finite element models, and then briefly mention other
models that simplify finite element models to increase
computational efficiency.
3.3.1. Finite Element Method
The finite element method (FEM) aims to approximate the true
physical behavior of a deformable object by dividing its body into
smaller and simpler parts called finite elements. These elements
are connected through $N$ nodes that make up an irregular grid
mesh (Figure 7A). Thus, instead of particles, we work with
node displacements. The mesh deformation is calculated through
the displacement vector field u. For simulation, an equation of
motion similar to (17) is used for an entire mesh. Usually, to
decrease the computational complexity, the dynamical parts of
the equation are skipped and the deformation is calculated for a
static state in equilibrium (a=v=0). Then, the relationship
FIGURE 7 | (A) An irregular grid as the mesh of a cubic deformed object in 3D using the finite element method (left) and an element $e$ of the mesh with its $N_e = 4$ nodes (right); the arrows show the displacement fields $\mathbf{u}_1 = \{u_{1,x}, u_{1,y}, u_{1,z}\}$ at node $i = 1$, and $\mathbf{u}_e$ is the nodal displacement vector of the element $e$. (B) An element with node $j$ and forces applied on three adjacent faces in the finite volume model. (C) A regular discrete mesh to be used in calculations of the finite difference method.
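between a finite element $e$ and its $N_e$ nodes (e.g., $N_e = 4$ for a tetrahedron as in Figure 7A) can be expressed as

$$\mathbf{f}_e = \mathbf{K}_e\mathbf{u}_e, \qquad (22)$$

where $\mathbf{f}_e \in \mathbb{R}^{3N_e}$ stacks the $N_e$ nodal forces, $\mathbf{u}_e \in \mathbb{R}^{3N_e}$ is the displacement of the element's nodes between the rest and the deformed positions, and $\mathbf{K}_e \in \mathbb{R}^{3N_e\times 3N_e}$ is the stiffness matrix of the element. The stiffness matrices of the different elements are assembled into a single matrix $\mathbf{K} \in \mathbb{R}^{3N\times 3N}$ for an entire mesh with $N$ nodes, with each $\mathbf{K}_e$ added into the rows and columns of its own nodes:

$$\mathbf{K} = \sum_e \mathbf{K}_e, \qquad (23)$$

$$\mathbf{K}\mathbf{u} = \mathbf{f}. \qquad (24)$$

The matrix $\mathbf{K}$ depends on the nodal displacements $\mathbf{u} \in \mathbb{R}^{3N}$ and on the constitutive material properties (e.g., $E$ and $\upsilon$), and it is used to compute the nodal forces $\mathbf{f} \in \mathbb{R}^{3N}$ of the entire mesh. Therefore, this huge matrix $\mathbf{K}$ must be recomputed at every time step $t$.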
The FEM can be represented as a model $G_{FEM}(S_0, \mathbf{f}_{ext}, \theta)$, which takes as input the initial state $S_0$, external forces $\mathbf{f}_{ext}$, and constitutive model parameters $\theta$, such as $E$ and $\upsilon$. Then, within model $G_{FEM}$, the new positions and velocities of the nodes are updated using a time integration scheme, such as (14) and (15).
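To keep the assembly step readable, the sketch below uses the simplest possible setting rather than the 3D tetrahedral elements of Figure 7A: a uniform 1D bar split into two-node linear elements, each with the standard $2\times 2$ stiffness matrix $\frac{EA}{L_e}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}$ (here $A$ is the cross-sectional area and $L_e$ the element length). It assembles $\mathbf{K}$ as in (23) and solves the static problem (24) with one end fixed. This is the linear case in which $\mathbf{K}$ stays constant; the function names and material values are hypothetical.

```python
import numpy as np

# Hypothetical helpers: 1D linear bar elements as a stand-in for 3D FEM assembly.


def bar_element_stiffness(E, area, length):
    """2x2 stiffness matrix K_e of a linear two-node bar element (1D analogue of Eq. 22)."""
    return (E * area / length) * np.array([[1.0, -1.0], [-1.0, 1.0]])


def assemble_bar(n_elements, E, area, total_length):
    """Assemble the global stiffness matrix K (Eq. 23) for a uniform bar of equal elements."""
    n_nodes = n_elements + 1
    K = np.zeros((n_nodes, n_nodes))
    Ke = bar_element_stiffness(E, area, total_length / n_elements)
    for e in range(n_elements):
        idx = [e, e + 1]                 # global indices of this element's nodes
        K[np.ix_(idx, idx)] += Ke        # scatter-add K_e into K
    return K


def solve_static(K, f, fixed):
    """Solve K u = f (Eq. 24) with zero displacement at the `fixed` node indices."""
    free = np.setdiff1d(np.arange(len(f)), fixed)
    u = np.zeros(len(f))
    u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
    return u


E, area, length = 200e9, 1e-4, 2.0       # steel-like bar: Pa, m^2, m
K = assemble_bar(4, E, area, length)
f = np.zeros(5)
f[-1] = 1000.0                           # 1 kN pulling on the free end
u = solve_static(K, f, fixed=[0])
print(u)   # grows linearly to F * L / (E * A) = 1e-4 m at the tip
```

Fixing a node before solving is what removes the rigid-body motion that makes the assembled $\mathbf{K}$ singular; the same step is needed for 3D meshes.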
The FEM can produce physically realistic simulations and
model complex deformed configurations. Owing to these
properties, FEM models have been used in many robotics
applications, such as tracking (Essa et al., 1992; Petit et al.,
2018; Sengupta et al., 2019) and planning manipulation around
deformable objects (Frank et al., 2014). However, they can have a
heavy computational burden due to re-evaluating $\mathbf{K}$ at each time step. This can be avoided by using linear FEM models, where $\mathbf{K}$ in (24) stays constant.⁴ The drawback is that this assumption
limits the model to simulating only small deformations. Other
methods that can decrease computational complexity, such as
co-rotational FEM (Müller and Gross, 2004), can also be used.
3.3.2. Finite Volume Method
In the finite volume method (FVM), instead of calculating the
nodal forces of the mesh shape $S$ individually as in the FEM, the force per unit area with respect to a certain plane orientation is calculated. This is done by using the constitutive law for the computation of the stress tensor $\sigma$ (section 3.1). Then, the total force acting on face $i$ of a finite element can be calculated using the formula

$$\mathbf{f}_{A_i} = A_i\,\sigma\,\mathbf{n}_i, \qquad (25)$$

where $A_i$ is a scalar representing the area of face $i$ and $\mathbf{n}_i$ is its normal vector. To calculate the nodal forces, the forces of the faces adjacent to node $j$ are summed and distributed evenly to each node (Teran et al., 2003; see Figure 7B):

$$\mathbf{f}_j = \frac{1}{n}\left(\mathbf{f}_{A_1} + \mathbf{f}_{A_2} + \mathbf{f}_{A_3}\right). \qquad (26)$$
⁴For a detailed tutorial on how to compute the $\mathbf{K}$ matrix efficiently, we direct the reader to the notes of Müller et al. (2008).
The FVM model can be represented as $G_{FVM}(S_0, \mathbf{f}_{ext}, \theta)$, which takes as input the initial state $S_0$, from which the areas $A$ and normals $\mathbf{n}$ can be calculated, the external forces $\mathbf{f}_{ext}$, and the constitutive model parameters $E$ and $\upsilon$. Then, within model $G_{FVM}$, the new positions $\mathbf{p}$ of the nodes of $S_0$ are updated at each time step $t$ using a time integration scheme. Since this method is computationally more efficient than the FEM, it has been used in many computer graphics applications (Barth et al., 2018; Cardiff and Demirdžić, 2018). However, it restricts the types of deformation that can be simulated (e.g., the deformation of irregular meshes).
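The sketch below illustrates (25) and (26) for a single tetrahedral element, assuming the stress $\sigma$ is constant inside the element and using outward unit normals. For the distribution step it takes $n = 3$ in (26): each vertex of a tetrahedron touches exactly three faces and receives one third of each of their forces. This reading of (26), the face ordering, and the function names are our assumptions. A useful sanity check is that, for constant stress, the face forces of a closed element sum to zero.

```python
import numpy as np

# Hypothetical helpers: per-element FVM forces for one tetrahedron.


def tet_face_forces(vertices, sigma):
    """Force A_i * sigma @ n_i on each face of a tetrahedron (Eq. 25).

    vertices: (4, 3) corner positions; sigma: (3, 3) stress, constant in the element.
    Returns (faces, forces), where faces[i] lists the 3 vertex indices of face i.
    """
    faces = [(1, 2, 3), (0, 3, 2), (0, 1, 3), (0, 2, 1)]
    centroid = vertices.mean(axis=0)
    forces = []
    for a, b, c in faces:
        cross = np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
        area = 0.5 * np.linalg.norm(cross)
        n = cross / np.linalg.norm(cross)
        if np.dot(n, vertices[a] - centroid) < 0:   # make the normal point outward
            n = -n
        forces.append(area * sigma @ n)
    return faces, np.array(forces)


def nodal_forces(vertices, sigma):
    """Give each vertex one third of the force of every face it touches (Eq. 26, n = 3)."""
    faces, forces = tet_face_forces(vertices, sigma)
    f_nodes = np.zeros((4, 3))
    for face, f in zip(faces, forces):
        for j in face:
            f_nodes[j] += f / 3.0
    return f_nodes


tet = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
sigma = np.diag([2.0, 0.0, 0.0])   # uniaxial stress along x (arbitrary units)
print(nodal_forces(tet, sigma))
print(nodal_forces(tet, sigma).sum(axis=0))   # zero up to rounding
```

In a full simulation these per-element nodal forces would be accumulated over all elements sharing a node before the time integration step.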
3.3.3. Finite Difference Method
In the finite difference method (FDM), the volume of the object is defined as a regular $M \times N \times P$ discrete mesh of nodes with horizontal, vertical, and stacked inter-node spacings $h_1$, $h_2$, and $h_3$, respectively (Figure 7C). The nodes are indexed as $[m, n, p]$, where $1 \le m \le M$ (parallel to the x-axis), $1 \le n \le N$ (parallel to the y-axis), and $1 \le p \le P$ (parallel to the z-axis), and $\mathbf{p}_{m,n,p} \in \mathbb{R}^3$ is the position of the node in 3D space. The object is deformed when an external force is applied. To calculate the nodal forces, a displacement vector $\mathbf{u}$ must be calculated using spatial derivatives. This is done by defining finite difference operators between the new node positions in the deformed mesh. For example, for $\mathbf{p}_{m,n,p}$ the first-order finite difference operator along the x-axis can be defined as $d_x(\mathbf{p}_{m,n,p}) = (\mathbf{p}_{m+1,n,p} - \mathbf{p}_{m,n,p})/h_1$. Using the finite difference operators, the nodal forces are calculated and the deformation of the object can be computed as in the FEM (section 3.3.1).
The FDM is one of the alternative methods suggested
for decreasing the computational complexity of the FEM
(Terzopoulos et al., 1987). A disadvantage of this method is that
it is more difficult to approximate the boundaries of objects using
a regular grid for the mesh (Nealen et al., 2006), and hence the
accuracy is decreased.
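The finite difference operator above is straightforward to vectorize. The sketch below applies forward differences along all three axes of an $M \times N \times P$ array of node positions; the function name and the choice to leave the last layer along each axis as zero (it has no forward neighbor) are assumptions. For an undeformed grid the operators return the unit axis vectors, so subtracting them gives the derivatives of the displacement field that the FDM uses to compute nodal forces.

```python
import numpy as np

# Hypothetical helper: forward finite differences on a regular node grid.


def forward_differences(p, h):
    """First-order forward differences of node positions along each grid axis.

    p: (M, N, P, 3) node positions p[m, n, p]; h: (h1, h2, h3) rest spacings.
    Returns dx, dy, dz of shape (M, N, P, 3); the last layer along the
    differenced axis has no forward neighbor and is left as zero.
    """
    dx, dy, dz = (np.zeros_like(p) for _ in range(3))
    dx[:-1] = (p[1:] - p[:-1]) / h[0]                     # (p[m+1,n,p] - p[m,n,p]) / h1
    dy[:, :-1] = (p[:, 1:] - p[:, :-1]) / h[1]
    dz[:, :, :-1] = (p[:, :, 1:] - p[:, :, :-1]) / h[2]
    return dx, dy, dz


M, N, P = 4, 3, 2
h = (0.5, 0.25, 1.0)
m_idx, n_idx, p_idx = np.meshgrid(np.arange(M), np.arange(N), np.arange(P), indexing="ij")
rest = np.stack([m_idx * h[0], n_idx * h[1], p_idx * h[2]], axis=-1).astype(float)
deformed = rest * np.array([1.1, 1.0, 1.0])   # stretch the block by 10% along x
dx, dy, dz = forward_differences(deformed, h)
print(dx[0, 0, 0])   # approximately [1.1, 0, 0]: the stretch appears in the x-derivative
```

The regular grid is what makes this indexing so simple, and it is also why curved object boundaries are hard to represent accurately with the FDM.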
3.3.4. Boundary Element Method
The boundary element method (BEM) computes the deformation of $S$ by calculating the equation of motion (17) over a surface rather than over a volume as in the FEM. The boundary (surface) $S$ is discretized into a set of $N$ non-overlapping elements (e.g., mesh elements) $e$, whose node coordinates $\mathbf{p}_i$, $i = 1, \ldots, N$, are the centroids of the elements. These elements represent displacements and tractions, and $S_u$ and $S_r$ are the surface parts where the displacement and traction boundary conditions are defined, respectively.
The BEM provides a significant speedup compared to the
FEM because it requires fewer nodes and elements. However, it
only works for objects whose interior consists of homogeneous
material. It has been used in the ArtDefo System (James and Pai, 1999) to simulate volumetric models in real time. It has also been used by Greminger and Nelson (2008) to improve tracking accuracy in the presence of occlusions and spurious edges.
3.3.5. Long Elements Method
In the long elements method (LEM), a solid object is considered
to be filled with incompressible fluid as in biological tissues. The
volume of the object shape $S$ is discretized into Cartesian meshes (i.e.,
FIGURE 8 | (A) In the boundary element method, the deformable object's shape $S$ is discretized into a set of elements $e$; $S$ is disturbed by a force that displaces the centroid node coordinates $\mathbf{p}$, and the deformation is calculated based on the displacement and traction boundary conditions defined on $S_u$ and $S_r$, respectively. (B) A virtual object discretized as Cartesian meshes according to the long elements method. Here, for the sake of simplicity, we show a 2D object with only two axes ($i$ and $j$). The left and right figures show meshes of the object discretized with long elements $L_i$ and $L_j$ parallel to the $i$ and $j$ axes, respectively. External pressure is applied at a particular point on the surface of the object, and the resulting force $\mathbf{f}$ on the particle (red dot) is calculated using the deformations $\Delta l_i$ of the long elements crossing the particle.
one mesh for each axis), and each mesh contains long elements (LEs) $e \in \{1, \ldots, N_i\}$, where $N_i$ is the number of elements in the mesh shape $S_i$ discretized parallel to axis $i$ (Figure 8B).
The crossings of the LEs of the different axes define cells, each
of which contains a particle. By calculating the state of these
particles (e.g., position pand velocity v), the deformation of the
object is simulated.
The state of each particle is calculated using the laws of fluid
mechanics (e.g., Pascal’s law). Then, a system of linear equations
for each $e$ that fills the object volume is created. By solving this system of equations with numerical methods, the deformation of $e$, $\Delta l_i$, is calculated. From $\Delta l_i$ the forces occurring due to
deformation are computed. Here, the LE is regarded as a spring
attached to a particle with known mass. As an example, in
Figure 8B, pressure is applied to an object. As a result of this
pressure, a force $\mathbf{f}_i$ acts on the particle along the $i$th axis and is
calculated using the displacements of the crossing LEs attached
to the particle:
fi=kLi(1li1li)+kLj(1lj1lj) (27)
where $k_{L_i}$ and $k_{L_j}$ are the spring constants of the long elements. Therefore, an LEM model can be represented as $G_{LEM}(S_i, S_j, \mathbf{f}_{ext}, \theta)$, which takes as input the meshes of axes $i$ and $j$ for 2D space, external forces $\mathbf{f}_{ext}$, and
model parameters, such as spring constants that determine
object deformability. Subsequently, the force obtained is used
to calculate the velocities and positions of the particles along
each axis.
The LEM was developed for modeling soft tissues, especially
for surgical simulation (Balaniuk and Salisbury, 2002). It uses
a smaller number of elements than tetrahedral (e.g., FEM)
and cubic (e.g., FD) meshing, so the computational complexity
of the model is reduced as well. It is therefore capable
of interactive real-time soft tissue simulation for haptic and
graphic applications, such as robotic surgery. However, it
provides only an approximation of real physical deformation
and so presents a trade-off between physical accuracy and
computational efficiency.
3.4. Approximations of Constitutive Models
3.4.1. Modal Analysis
What makes constitutive models such as the FEM expensive is the calculation of the motion with the large matrices $\mathbf{M}$, $\mathbf{D}$, and $\mathbf{K}$ in Equation (17); for example, with $N = 20$ nodal points $\mathbf{p} = (x, y, z)$ of a mesh shape, the calculation would involve three matrices of size $60 \times 60$. Pentland and Williams (1989) proposed a way
of reducing this computational complexity based on a method
called modal analysis. Modal analysis is used for identifying an
object’s vibrational modes (Figure 9A) by decoupling (17). This
is done by using linear algebraic formulations (Nealen et al.,
2006). Below, we outline the steps of modal analysis using these
formulations, while skipping the detailed derivations.
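First, the matrices are diagonalized by solving the following generalized eigenvalue problem (i.e., a whitening transform):

$$\mathbf{M}\Phi\Lambda = \mathbf{K}\Phi, \qquad (28)$$

where $\Lambda$ and $\Phi$ are matrices containing the eigenvalues and eigenvectors of $\mathbf{M}^{-1}\mathbf{K}$. Then, the eigenvectors in $\Phi$ are used to transform the displacement vector $\mathbf{u}$:

$$\mathbf{u} = \Phi\mathbf{q}. \qquad (29)$$

By substituting (29) into (17) and multiplying by $\Phi^T$, the following system of equations is constructed:

$$\Phi^T\mathbf{M}\Phi\ddot{\mathbf{q}} + \Phi^T\mathbf{D}\Phi\dot{\mathbf{q}} + \Phi^T\mathbf{K}\Phi\mathbf{q} = \Phi^T\mathbf{f}_{ext}, \qquad (30)$$

$$\tilde{\mathbf{M}}\ddot{\mathbf{q}} + \tilde{\mathbf{D}}\dot{\mathbf{q}} + \tilde{\mathbf{K}}\mathbf{q} = \tilde{\mathbf{f}}_{ext}, \qquad (31)$$

where $\tilde{\mathbf{M}}$, $\tilde{\mathbf{D}}$, and $\tilde{\mathbf{K}}$ are all diagonal matrices ($\tilde{\mathbf{D}}$ is diagonal when $\mathbf{D}$ is a linear combination of $\mathbf{M}$ and $\mathbf{K}$, i.e., Rayleigh damping). This generates $3N$ independent equations of motion for the modes:

$$\tilde{M}_i\ddot{q}_i + \tilde{D}_i\dot{q}_i + \tilde{K}_i q_i = \tilde{f}_{ext,i}. \qquad (32)$$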
FIGURE 9 | (A) A simple 2D rectangular object in different deformation modes in modal analysis: the upper shape is a deformation mode in response to compression, and the middle and lower shapes are deformation modes in response to bending forces from different directions. (B) Active contour $S(u)$ controlled by external ($E_{ext}$) and shape ($E_{shape}$) energies that attract or repel according to the shape of the object (e.g., in an image); as the object deforms, the position of $S(u)$ is updated iteratively.
Then, (32) can be solved analytically for $q_i$ to compute the motion of each mode $i \in \{1, 2, \dots, 3N\}$. The matrix $\Phi$ contains a different mode shape in each of its columns (Figure 9A), i.e., $\Phi = [\Phi_1, \Phi_2, \dots, \Phi_{3N}]$. Each eigenvalue in $\Lambda$ is the squared natural frequency of the corresponding mode, so by analyzing the eigenvalues in $\Lambda$, high-frequency modes in $\Phi$ can be eliminated and only the dominant low-frequency modes are updated using (32). This reduces the number of equations in (32) and hence lowers the computational cost significantly.
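Because each mode in (32) is an independent scalar equation, it has a closed-form solution. The sketch below assumes a hypothetical NumPy mass-spring chain with a diagonal $M$ and, for brevity, no damping and no external force. It projects an initial displacement onto the modes, evolves each mode analytically as $q_i(t) = q_i(0)\cos(\omega_i t)$ with $\omega_i = \sqrt{\lambda_i}$, and reconstructs $u = \Phi q$ from only the lowest-frequency modes.

```python
import numpy as np


def chain_matrices(n, m=1.0, k=100.0):
    """Hypothetical mass-spring chain: n masses m, springs k, first mass tied to a wall."""
    M = m * np.eye(n)
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] += k
        if i > 0:
            K[i - 1, i - 1] += k
            K[i - 1, i] -= k
            K[i, i - 1] -= k
    return M, K


def modal_response(M, K, u0, t, n_modes):
    """Displacement at time t of the undamped, unforced system M u'' + K u = 0
    released at rest from u0, keeping only the n_modes lowest-frequency modes."""
    m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))  # assumes a diagonal M
    lam, Y = np.linalg.eigh(m_inv_sqrt @ K @ m_inv_sqrt)
    Phi = m_inv_sqrt @ Y  # mode shapes, lowest frequency first, Phi^T M Phi = I
    Phi_k, omega = Phi[:, :n_modes], np.sqrt(lam[:n_modes])
    q0 = Phi_k.T @ M @ u0  # modal coordinates of the initial displacement
    q_t = q0 * np.cos(omega * t)  # closed-form solution of q'' + omega^2 q = 0
    return Phi_k @ q_t  # back to nodal displacements, u = Phi q


M, K = chain_matrices(20)
u0 = np.linspace(0.0, 0.1, 20)  # hypothetical initial stretch of the chain
full = modal_response(M, K, u0, t=0.05, n_modes=20)
reduced = modal_response(M, K, u0, t=0.05, n_modes=5)
print("relative error with 5 of 20 modes:",
      np.linalg.norm(full - reduced) / np.linalg.norm(full))
```

Keeping all $3N$ modes reproduces the full linear model exactly; dropping the high-frequency ones trades a small error for fewer equations, which is the saving described above.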
In modal analysis, as in constitutive models, taking $K$ to be constant can increase the computational efficiency. However, this assumption is valid only for small deformations, for which the linear strain approximation holds, and it leads to errors when large deformations are simulated dynamically. To overcome this problem, some methods use different formulations of the strain tensor (e.g., the Green strain, as in Barbič and James, 2005) to enable simulation of larger non-linear deformations (An et al., 2008; Pan and Manocha, 2018), or adopt more data-driven approaches (e.g., by employing CNNs, as in Fulton et al., 2019).
3.4.2. Active Contours
Active contour (AC) models are approximations of constitutive models, such as the FEM. They were first introduced by Kass et al. (1988) in the form of the snakes model. In their simplest form, they can be described as a parametric spline shape (section 2.2.1), $S(u) \in \mathbb{R}^n$ for $u \in [0, 1]$, in an $n$-dimensional space, for example $S(u) = (x(u), y(u)) \in \mathbb{R}^2$ in an image $I: \mathbb{R}^2 \to \mathbb{R}$ (Figure 9B). This spline is fitted to the shape of an object in the image by minimizing the following energy formulation:
$$E_{snake} = \int_0^1 \left[ E_{ext}(S(u)) + E_{shape}(S(u)) \right] du. \tag{33}$$
Here $E_{ext}$ depends on the contour position with respect to an attractor function $f$:
$$E_{ext} = f(S(u)), \tag{34}$$
where, in the $n = 2$ case, $f(x, y)$ could be the negated image intensity $-I(x, y)$, which would attract the snake to the brightest regions (since the energy is minimized), or the negated response of an edge detector, such as $-|\nabla I(x, y)|^2$, which would attract the snake to the edges (Moore and Molloy, 2007).
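The two choices of attractor can be sketched with NumPy. The synthetic image, the nearest-pixel sampling of $f$, and the use of `np.gradient` (central differences) as the edge detector are assumptions made for illustration; lower $E_{ext}$ marks a more attractive contour position, because the energy is minimized.

```python
import numpy as np


def intensity_attractor(image):
    """f(x, y) = -I(x, y): low values (attractive) where the image is bright."""
    return -np.asarray(image, dtype=float)


def edge_attractor(image):
    """f(x, y) = -|grad I(x, y)|^2: low values (attractive) on strong edges."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # d/drow (y), d/dcol (x)
    return -(gx**2 + gy**2)


def external_energy(f, points):
    """Discrete E_ext: sum of f at the contour points (x, y), nearest pixel."""
    return float(sum(f[int(round(y)), int(round(x))] for x, y in points))


# Hypothetical test image: a bright 10 x 10 square on a dark background.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
on_edge = [(5, y) for y in range(6, 14)]  # left boundary of the square
inside = [(10, y) for y in range(6, 14)]  # vertical line through the middle
outside = [(2, y) for y in range(6, 14)]  # vertical line in the background

print(external_energy(edge_attractor(image), on_edge),
      external_energy(edge_attractor(image), inside))
print(external_energy(intensity_attractor(image), inside),
      external_energy(intensity_attractor(image), outside))
```

The edge attractor prefers the square's boundary over its flat interior, while the intensity attractor prefers the bright interior over the dark background.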
In (33), $E_{shape}$ is the internal energy of the contour, which depends on the shape of the contour:
$$E_{shape} = \alpha(u) |S'(u)|^2 + \beta(u) |S''(u)|^2. \tag{35}$$
The first-order term $|S'(u)|^2$ penalizes stretching and thus controls the length of the contour; minimizing it shortens the contour. The second-order term $|S''(u)|^2$ penalizes bending and thus controls the smoothness of the contour. Together, these terms let the contour resist stretching and bending by the external forces due to $f(S(u))$, and they regularize the contour. The weight parameters $\alpha$ and $\beta$ determine elasticity and rigidity, respectively.
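The energy $E_{snake}$ can be discretized into $N$ parts, $s_i = S(u_i)$, where the $u_i = ih$, for $i \in \{1, \dots, N\}$, are knots and $h = 1/N$:
$$E^*_{snake} = \sum_{i=1}^{N} \left[ E_{ext}(s_i) + E_{shape}(s_i) \right]. \tag{36}$$
Then, from the discretization, the derivatives in $E_{shape}$ can be approximated using finite difference operators, for example
$$S'(u_i) \approx \frac{s_i - s_{i-1}}{h}, \qquad S''(u_i) \approx \frac{s_{i-1} - 2 s_i + s_{i+1}}{h^2}.$$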
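The discrete energy (36) can be evaluated in a few lines. This minimal NumPy sketch assumes a closed contour (indices wrap around), constant $\alpha$ and $\beta$, a backward difference for $S'(u_i)$ and a central second difference for $S''(u_i)$, and attractor values $f(s_i)$ that are computed beforehand; the circular test contour is hypothetical.

```python
import numpy as np


def snake_energy(points, f_values, alpha, beta):
    """Discrete energy E*_snake of Equation (36) for a closed contour.

    points:   (N, 2) array of knots s_i = S(u_i), with u_i = i*h and h = 1/N
    f_values: (N,) attractor values f(s_i), i.e. the E_ext terms
    alpha, beta: constant elasticity and rigidity weights of E_shape
    """
    s = np.asarray(points, dtype=float)
    h = 1.0 / len(s)
    prev, nxt = np.roll(s, 1, axis=0), np.roll(s, -1, axis=0)  # s_{i-1}, s_{i+1}, wrapping
    d1 = (s - prev) / h  # backward difference for S'(u_i)
    d2 = (prev - 2 * s + nxt) / h**2  # central difference for S''(u_i)
    e_shape = alpha * np.sum(d1**2, axis=1) + beta * np.sum(d2**2, axis=1)
    return float(np.sum(np.asarray(f_values, dtype=float) + e_shape))


def circle(radius, n=32):
    """Knots of a circle of the given radius (hypothetical test contour)."""
    theta = 2 * np.pi * np.arange(n) / n
    return radius * np.column_stack([np.cos(theta), np.sin(theta)])


zero_f = np.zeros(32)  # no external attraction: only E_shape contributes
print(snake_energy(circle(1.0), zero_f, alpha=1.0, beta=0.01))
print(snake_energy(circle(2.0), zero_f, alpha=1.0, beta=0.01))
```

With no external term, the internal energy grows quadratically with the contour's size, which is why minimizing $E_{shape}$ alone shrinks the snake.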
The contour that minimizes the total energy is found by minimizing $E^*_{snake}$. The resulting expression is then put into matrix form and used to update the position of the contour iteratively in time by using a time integration scheme, as demonstrated in Kass et al. (1988). To represent an AC in 3D, some additional parameters are included in the shape energy formulation: the rigidity parameter $\beta$ is defined along the third axis as well, and an extra parameter is added to control the resistance to twisting (Ahlberg, 1996).

FIGURE 10 | The general schema for learning: (A) A type of model and ground truth are selected, such as 2D images used as ground truth to calibrate a mass-spring model. (B) If the model’s parameters have not been calibrated correctly, there will be a difference between the observed ground truth and the simulation; the observed ground truth is shown in red, the simulation in blue, and their intersection in yellow. (C) When the model is properly calibrated, it should be able to predict the behavior of the object if given the corresponding interaction parameters, such as external forces (e.g., contact forces of the manipulator) or geometric constraints (e.g., not crossing the floor).
AC models have been widely used, especially in medical
imaging, for motion tracking and shape registration tasks
(Williams and Shah, 1992; Leventon et al., 2000; Das and
Banerjee, 2004), and they can also be combined with constitutive
models to achieve greater physical accuracy (Luo and Nelson,
2001). The main disadvantage of AC models is their reliance on a
good initialization of the snake contour near the desired shape in
the image. To overcome this drawback, attractor functions other
than image intensity, such as edge maps, have been proposed in
recent years (e.g., Nisirat, 2019).
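The iterative time integration of a snake can be sketched with the semi-implicit update $x_t = (A + \gamma I)^{-1}(\gamma x_{t-1} - \nabla f(x_{t-1}))$, the form commonly associated with Kass et al. (1988). This NumPy sketch is an illustration under stated assumptions: a closed contour, constant $\alpha$ and $\beta$ with the knot spacing folded into them, an attractor supplied through its gradient, and a hypothetical ring-shaped attractor $f(p) = (|p| - 2)^2$ standing in for an image term. Here $\gamma$ acts as an inverse step size.

```python
import numpy as np


def internal_matrix(n, alpha, beta):
    """Pentadiagonal (circulant) matrix A of a closed snake with constant alpha, beta.

    Row i applies -alpha * (second difference) + beta * (fourth difference) to
    the knots, with the knot spacing folded into alpha and beta."""
    stencil = {-2: beta, -1: -(alpha + 4 * beta), 0: 2 * alpha + 6 * beta,
               1: -(alpha + 4 * beta), 2: beta}
    A = np.zeros((n, n))
    for i in range(n):
        for offset, weight in stencil.items():
            A[i, (i + offset) % n] += weight  # wrap around: the contour is closed
    return A


def evolve_snake(points, grad_f, alpha, beta, gamma, iterations):
    """Semi-implicit update x_t = (A + gamma I)^-1 (gamma x_{t-1} - grad f(x_{t-1})).

    Internal forces are treated implicitly and external forces explicitly;
    the x and y coordinates are updated together as the two columns of x."""
    x = np.array(points, dtype=float)
    step = np.linalg.inv(internal_matrix(len(x), alpha, beta) + gamma * np.eye(len(x)))
    for _ in range(iterations):
        x = step @ (gamma * x - grad_f(x))
    return x


def ring_gradient(x, radius=2.0):
    """Gradient of a hypothetical attractor f(p) = (|p| - radius)^2."""
    r = np.linalg.norm(x, axis=1, keepdims=True)
    return 2.0 * (r - radius) * x / r


theta = 2 * np.pi * np.arange(40) / 40
start = 3.0 * np.column_stack([np.cos(theta), np.sin(theta)])  # initial contour
final = evolve_snake(start, ring_gradient, alpha=0.5, beta=0.5, gamma=10.0, iterations=300)
print("mean radius:", np.linalg.norm(final, axis=1).mean())
```

The contour settles slightly inside the attractor's ring of radius 2 rather than exactly on it, at the point where the inward internal force balances the external pull. A far-off initial contour would be pulled only weakly by a real image attractor, which illustrates the initialization sensitivity noted above.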
This concludes our tutorial-style description of models, which is intended to provide some technical grounding in the basic mathematical approaches to deformable object modeling. Sections 4 (learning and estimation) and 5 (planning and control) build on the models we have described. These sections are written as broad surveys, because the range of different approaches is too wide for us to cover them all in depth.
In the previous sections, we introduced computational models of
deformable objects that have numerous applications. However,
for the models to be useful, several parameters must be
known beforehand (Figure 1vi), so these models should be
calibrated carefully. In this section we give an overview of some
representative cases of applying various learning algorithms,
which can make the calibration process autonomous (Figure 10).
The methods we review can be grouped into three types of
strategies: (a) estimating parameters directly, which is rarely
feasible (section 4.1); (b) calibrating known physics-based
models, G, automatically by allowing the robot to take some
measurements that will help to determine the values of model
parameters, θ (section 4.2.2); and (c) approximating new
functions that describe the dynamics, as is done with neural
networks (section 4.2.4).
4.1. Direct Estimation
For some models, it is possible to derive a formula to directly calculate the parameters. For example, Gelder (1998) obtained a formula for the parameter $k_s$ of an MS model (section 3.2.2) in a static state, where the materials are non-uniform but isotropic. An isotropic material is a material whose local deformation in response to force is independent of the direction in which the force is applied, whereas in a non-uniform material the response varies with the position where the force is applied. For 3D tetrahedral meshes, the spring constant $k_s$ of the spring on edge $c$ can be obtained from the formula
$$k_s = \frac{E \sum_{e} V_e}{|c|^2},$$
where the sum is over the volumes $V_e$ of the tetrahedral elements $e$ of a 3D mesh shape $S_0$ that contain edge $c$, and $|c|$ is the length of that edge. Young's modulus, $E$, is chosen empirically to give the desired amount of elasticity.
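A minimal sketch of this direct computation, assuming the formula has the form $k_c = E \sum_{e \ni c} V_e / |c|^2$, that is, Young's modulus times the summed volume of the tetrahedra sharing edge $c$, divided by the squared rest length of $c$. The mesh and the function names are hypothetical; only the Python standard library is used.

```python
from itertools import combinations


def tet_volume(a, b, c, d):
    """Volume of the tetrahedron abcd: |det[b - a, c - a, d - a]| / 6."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def spring_constants(vertices, tets, young_modulus):
    """Assumed formula k_c = E * (sum of V_e over tetrahedra e sharing edge c) / |c|^2.

    vertices: list of (x, y, z) rest positions (the mesh S0)
    tets:     list of 4-tuples of vertex indices
    Returns a dict mapping each edge (i, j), i < j, to its spring constant."""
    volume_sum = {}
    for tet in tets:
        vol = tet_volume(*(vertices[i] for i in tet))
        for edge in combinations(sorted(tet), 2):  # the 6 edges of this tetrahedron
            volume_sum[edge] = volume_sum.get(edge, 0.0) + vol
    k = {}
    for (i, j), vsum in volume_sum.items():
        length_sq = sum((vertices[i][m] - vertices[j][m]) ** 2 for m in range(3))
        k[(i, j)] = young_modulus * vsum / length_sq
    return k


# Hypothetical mesh: two unit right tetrahedra glued along the triangle (0, 1, 2).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, -1)]
tets = [(0, 1, 2, 3), (0, 1, 2, 4)]
for edge, k_c in sorted(spring_constants(vertices, tets, young_modulus=12.0).items()):
    print(edge, round(k_c, 6))
```

Edges shared by both tetrahedra accumulate both volumes and so come out stiffer than edges of the same length that belong to a single element.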
Direct estimation is computationally efficient. However, such closed-form calculations are often not possible for models that rely on complex constitutive material laws, as in the FEM.
4.2. Minimizing Error
This group of methods relies on the definition of an error function $\mathrm{Err}(p_\theta, \hat{p}) = \mathrm{dist}(p_\theta, \hat{p})$ that measures the difference (e.g., Euclidean distance) between the deformation of some ground truth $\hat{p}$ and the position of the simulated virtual deformable object, $p_\theta \in G(S_0, f_{ext}, p_c, \theta)$, where $p_c$ is the point of contact. The ground truth $\hat{p}$ can be obtained from camera observations of a real-world deformable object or from another, more reliable simulation, usually an FEM simulation. Then, $p_\theta \in S_\theta$ is simulated with various $\theta$ values and the same interaction parameters as in the ground-truth observations, such as the contact forces $f_{ext}$ and positions $p_c \in S_\theta$ of the manipulator, or geometric constraints (boundary conditions) like the object not crossing the bottom surface. The objective of the learning algorithm is to find a set of parameters $\theta$ for the model $G$ that minimizes the error function with the given interaction parameters, i.e., $\theta^* = \arg\min_\theta \mathrm{Err}(p_\theta, \hat{p})$.
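The error-minimization loop can be sketched in a few lines of standard-library Python. Everything here is an illustrative assumption: `simulate_chain` is a hypothetical stand-in for $G$ (a 1D chain of identical springs under a tip force, with a single stiffness parameter $\theta$), the "observations" are synthetic and generated from the same model, and golden-section search stands in for whatever optimizer a particular method would use.

```python
import math


def simulate_chain(theta, n_nodes, rest_length, tip_force):
    """Hypothetical stand-in for G(S0, f_ext, p_c, theta): static node positions of
    a 1D chain of identical springs with stiffness theta, fixed at x = 0 and
    pulled at its last node (the contact point p_c) by tip_force."""
    stretch = tip_force / theta  # in equilibrium every spring carries the full tip force
    return [i * (rest_length + stretch) for i in range(n_nodes)]


def err(p_sim, p_obs):
    """Err(p_theta, p_hat): Euclidean distance between simulated and observed nodes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_sim, p_obs)))


def calibrate(p_obs, lo, hi, tol=1e-9, **interaction):
    """Golden-section search for the theta in [lo, hi] that minimizes Err,
    simulating with the same interaction parameters as the observation."""
    def cost(theta):
        return err(simulate_chain(theta, **interaction), p_obs)

    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if cost(c) < cost(d):  # the minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:  # the minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


interaction = dict(n_nodes=6, rest_length=0.1, tip_force=5.0)
p_hat = simulate_chain(250.0, **interaction)  # synthetic "ground truth" for illustration
theta_hat = calibrate(p_hat, lo=1.0, hi=1000.0, **interaction)
print("estimated stiffness:", theta_hat)
```

Golden-section search is adequate here only because this toy error function has a single minimum in $\theta$; models with several parameters or non-convex error landscapes generally need other optimizers.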