Available via license: CC BY 4.0


REVIEW

published: 17 September 2020

doi: 10.3389/frobt.2020.00082

Frontiers in Robotics and AI | www.frontiersin.org 1September 2020 | Volume 7 | Article 82

Edited by:

Hakan Karaoguz,

Independent Researcher, Stockholm,

Sweden

Reviewed by:

Farah Bouakrif,

University of Jijel, Algeria

Dongming Gan,

Purdue University, United States

*Correspondence:

Veronica E. Arriola-Rios

v.arriola@ciencias.unam.mx

Puren Guler

puren.guler@oru.se

Specialty section:

This article was submitted to

Robot and Machine Vision,

a section of the journal

Frontiers in Robotics and AI

Received: 13 February 2020

Accepted: 19 May 2020

Published: 17 September 2020

Citation:

Arriola-Rios VE, Guler P, Ficuciello F,

Kragic D, Siciliano B and Wyatt JL

(2020) Modeling of Deformable

Objects for Robotic Manipulation: A

Tutorial and Review.

Front. Robot. AI 7:82.

doi: 10.3389/frobt.2020.00082

Modeling of Deformable Objects for

Robotic Manipulation: A Tutorial and

Review

Veronica E. Arriola-Rios 1*, Puren Guler 2*, Fanny Ficuciello 3, Danica Kragic 4, Bruno Siciliano 3 and Jeremy L. Wyatt 5

1 Department of Mathematics, Faculty of Science, UNAM Universidad Nacional Autonoma de Mexico, Ciudad de México, Mexico

2 Autonomous Mobile Manipulation Laboratory, Centre for Applied Autonomous Sensor Systems, Orebro University, Orebro, Sweden

3 PRISMA Laboratory, Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy

4 Robotics, Learning and Perception Laboratory, Centre for Autonomous Systems, EECS, KTH Royal Institute of Technology, Stockholm, Sweden

5 School of Computer Science, University of Birmingham, Birmingham, United Kingdom

Manipulation of deformable objects has given rise to an important set of open problems

in the ﬁeld of robotics. Application areas include robotic surgery, household robotics,

manufacturing, logistics, and agriculture, to name a few. Related research problems

span modeling and estimation of an object’s shape, estimation of an object’s material

properties, such as elasticity and plasticity, object tracking and state estimation during

manipulation, and manipulation planning and control. In this survey article, we start by

providing a tutorial on foundational aspects of models of shape and shape dynamics.

We then use this as the basis for a review of existing work on learning and estimation of

these models and on motion planning and control to achieve desired deformations. We

also discuss potential future lines of work.

Keywords: deformable objects, shape representation, learning of deformation, control of deformable objects,

registration of shape deformation, tracking of deformation

1. INTRODUCTION

Robotic manipulation work tends to focus on rigid objects (Bohg et al., 2014; Billard and Kragic,

2019). However, most objects manipulated by animals and humans change shape upon contact.

Manipulating a deformable object presents a quite diﬀerent set of challenges from those that arise

when manipulating a rigid object. For example, forces applied to a rigid object simply sum to

determine the external wrench and, when integrated over time, result in a sequence of rigid body

transformations in SE(3). This is a relatively simple dynamic model, albeit still difficult to estimate

for a given object, manipulator, and set of environment contacts.

Forces applied to a deformable body, by contrast, both move the object and change its shape. The

exact combination of deformation and motion depends on the precise material composition. Thus,

material properties become a critical part of the system dynamics, and consequently the underlying

physics of deformation is complex and hard to capture. In addition, the dynamics models typically

employed in high-ﬁdelity mechanical modeling—such as ﬁnite element models—while precise,

require detailed knowledge of the material properties, which would be unavailable to a robot

in the wild. Yet such a lack of detailed physics knowledge does not prevent humans and other

animals from performing dexterous manipulation of deformable objects. Consider the way a New

Caledonian crow shapes a tool from a branch (Weir and Kacelnik, 2006) or how a pizzaiolo


dexterously transforms a ball of dough into a pizza. Clearly,

robots have a long way to go to match these abilities.

Even though a great deal of work has been done on developing

solutions for each of these stages (Cootes et al., 1992; Montagnat

et al., 2001; Nealen et al., 2006; Moore and Molloy, 2007),

only a few combinations have actually been tried in robotics

to date (Nadon et al., 2018; Sanchez et al., 2018). In contrast

to earlier review articles, this paper consists of both a tutorial

and a review of related work, aimed at newcomers to robotic

modeling and manipulation of deformable objects who need

a quick introduction to the basic methods, which are adopted

from various other ﬁelds and backgrounds. We provide, in

a single place, a menu of possible approaches motivated by

computer vision, computer graphics, physics, machine learning,

and other ﬁelds, in the hope that readers can ﬁnd new material

and inspiration for creatively developing their work. We then

review how these methods are applied in practice. In this way,

we take a more holistic approach than existing reviews. Since the

literature in the ﬁelds we touch on is abundant, we cannot be

exhaustive and will focus mostly on manipulation of volumetric

solid objects.

This review is motivated by a future goal in the form of an

ideal scenario. In this scenario a general purpose robot would

be capable of perceiving shape, dynamics, and necessary material

properties (e.g., elasticity, plasticity) of deformable objects to

implement manipulation strategies relying on planning and

control methods. Currently no robot has all these capabilities.

FIGURE 1 | A general purpose robot must be capable of: (i) perceiving and segmenting the object from the scene; (ii) tracking the object’s motion; (iii) predicting the

object’s behavior; (iv) planning and controlling new manipulation strategies based on the predictions; (v) selecting the appropriate model for each task, since all

models for shape and dynamics have limitations; and (vi) learning new models for previously unknown shapes and materials.

Therefore, we decompose the problem space into ﬁve main parts

that work like pieces in a puzzle: Because representational choices

are fundamental, we explain, in a tutorial style, (1) the modeling

of shape (section 2) and (2) the modeling of deformation

dynamics (section 3) to provide the reader with the necessary

mathematical background; then we discuss, in survey form,

(3) learning and estimation of the parameters of these models that

are related to deformability of objects (e.g., material properties,

such as elasticity, or shape properties, such as resolution of a

mesh; section 4), (4) the application of the models to perception

and prediction, and (5) planning and control of manipulation

actions (section 5), since these topics build on the models

explained in (1) and (2) and there is such a wide range

of diﬀerent approaches that it would be impossible to cover

them all in depth. Figure 1 shows our guideline processing

stream: (i) The robot perceives the object, segments it from

its environment, and selects an adequate representation for its

shape and intended task; the desired type of representation

will determine which algorithms must be used to recognize the

object. (ii) As the robot interacts with the object, it deforms the

object and must register the deformation by modifying the shape

representation accordingly; while doing so, it can make use of

tracking/registration techniques or enhanced predictive tracking

(which requires a model of the dynamics). (iii) A suitable model

for the dynamics is selected to predict new conﬁgurations of

the shape representation as the robot interacts with the object.

(iv) Information from the previous stages is used to integrate


FIGURE 2 | Relationships between shape, dynamics, learning models, and control methodologies as they have been used in the literature. Colored arrows indicate

subcategories, while black arrows show when a methodology in one level has been used for the next one.


the inputs and rules for the control strategies. Finally, all this

information can be fed into learning algorithms capable of

increasing the repertoire of known objects. At this stage, there

are three types of strategies: (a) estimating new parameters

directly (which is rarely viable); (b) calibrating known physics-

based models automatically by allowing the robot to take some

measurements that will help to determine the value of model

parameters; and (c) approximating new functions that describe

the dynamics, as is done with neural networks.

Figure 2 shows the connections between the diﬀerent models

covered in this review as they have been used in the publications

mentioned. Table 1 gives a summary of the publications

discussed in each section. The following describes the notation

that will be used throughout the paper:

• Scalars: italic lower-case letters, e.g., $x, y, z$.

• Vectors: bold lower-case letters, e.g., $\mathbf{p} = \{x, y, z\}^T$.

• Matrices: bold upper-case letters, e.g., $\mathbf{R}$.

TABLE 1 | Publication summary based on some papers from each section. Each row lists its group labels (where a new group starts), the method, and the publications.

Shape / Implicit / Algebraic: Gascuel, 1993; Kumar et al., 1995; Jaklic et al., 2000
Level set: Sethian, 1997; Cremers, 2006; Sun et al., 2008
Eigenmodes: Cootes et al., 1995; Blake et al., 1998; Leventon et al., 2000; Tsai et al., 2001; Cremers, 2006
Parameterized / Splines: de Boor, 1976; Catmull and Clack, 1978; Kass et al., 1988; Gibson and Mirtich, 1997; Unser, 1999; Cordero Valle and Cortes Parejo, 2003; Sederberg et al., 2003; Maraffi, 2004; Song and Bai, 2008; Prasad et al., 2010
Modal decomposition: Szekely et al., 1995
Free-form: Sederberg and Parry, 1986; Moore and Molloy, 2007
Multigrid: Xian et al., 2019
Discrete / Meshes: Delingette, 1999; Montagnat et al., 2001; Arvanitis et al., 2019
Skeletons: Schaefer and Yuksel, 2007
Templates: Yuille et al., 1992; Basri et al., 1998; Ravishankar et al., 2008; Arriola-Rios et al., 2013; Gallardo et al., 2020
Landmarks: Blake et al., 1998; Cootes and Taylor, 2004
Particles: Nealen et al., 2006
Cloud of points: Cretu et al., 2009; Newcombe and Davison, 2010; Martínez et al., 2019; Makovetskii et al., 2020

Dynamics / Particle-based / Particle systems: Tonnesen and Terzopoulos, 2000
Mass-spring systems: Bianchi et al., 2004; Teschner et al., 2004; Morris and Salisbury, 2008; Schulman et al., 2013; Arriola-Rios and Wyatt, 2017
Neural networks: Nurnberger et al., 1998; Zhang et al., 2019
Position-based: Müller et al., 2005; Zhu et al., 2008; Tian et al., 2013; Macklin et al., 2014; Sidorov and Marshall, 2014; Guler et al., 2017; Romeo et al., 2020
Constitutive / FEM: Essa et al., 1992; Frank et al., 2014; Petit et al., 2018
FVM: Teran et al., 2003; Barth et al., 2018
FDM: Terzopoulos et al., 1987
BEM: Greminger and Nelson, 2008
LEM: Balaniuk and Salisbury, 2002
Approximations / Modal analysis: Pentland and Williams, 1989; Barbič and James, 2005; Fulton et al., 2019
Active contours: Kass et al., 1988; Ahlberg, 1996; Nisirat, 2019

Learning / Discrete: Gelder, 1998
Minimizing error / Exhaustive search: Guler et al., 2015
Iterative methods: Teschner et al., 2004; Frank et al., 2014
Genetic algorithms: Bianchi et al., 2004
Neural networks: Cretu et al., 2012
Probability: Risholm et al., 2010; Schulman et al., 2013

Control and planning / Planning / Model-based: Gopalakrishnan and Goldberg, 2004; Das and Sarkar, 2011; Frank et al., 2014
Data-driven: Mira et al., 2015; Li et al., 2016
Control / Model-based: Largilliere et al., 2015; Lin et al., 2015; Zaidi et al., 2017; Ficuciello et al., 2018
Sensor-based: Wada et al., 2001; Smolen and Patriciu, 2009; Berenson, 2013; Navarro-Alarcon et al., 2016; Delgado et al., 2017b; Hu et al., 2019; Cherubini et al., 2020


FIGURE 3 | (A) Algebraic surfaces can represent basic and complex shapes, such as circles or the tangled cube; a limited but useful set of deformations is

straightforward to deﬁne, while other deformations that are not so intuitive tend to be used in combination with skeletons and skinning techniques. (B) Parametric

surfaces can represent a wider variety of shapes and their great ﬂexibility allows deformation to be controlled with more intuitive parameters; therefore their use in

connection with models of dynamics is considerable. (C) Level set curves can show the evolution of a 2D shape in time; the intersection of a time plane with the

surface in 3D is a level curve, which represents the 2D contour shape of the object at a given time $t$. In this example $\cos^2(r + c)$ is the mathematical representation of

the 2D contour shape’s evolution through time.

• Scalar functions whose range is $\mathbb{R}$: italic lower-case letters followed by parentheses, e.g., $f(\cdot)$.

• Vector functions whose range is $\mathbb{R}^n$ with $n > 1$: bold italic letters followed by parentheses, e.g., $\mathbf{S}(\cdot)$.

• Sets: calligraphic letters, e.g., $\mathcal{S}$.

• Number sets: blackboard bold upper-case letters, e.g., $\mathbb{R}, \mathbb{N}$.

2. REPRESENTING SHAPE FOR

DEFORMABLE OBJECTS

The initial problem of manipulating a deformable object is

to perceive and segment the object shape from the scene as

it deforms (Figure 1i). The main diﬃculty with this problem

is the number of degrees of freedom required to model the

object shape. The expressiveness, accuracy, and ﬂexibility of

the model can ease the modeling of the dynamics or make it

more diﬃcult in diﬀerent scenarios. For this reason, this section

introduces a variety of models from mathematics, computer

graphics, and computer vision for representation of the shape of

deformable objects.

2.1. Implicit Curves and Surfaces

An implicit curve or surface of dimension $n-1$ is generally defined as the zero set of a function $f: \mathbb{R}^n \to \mathbb{R}$,

$$\mathcal{S}_f = \{\mathbf{p} \in \mathbb{R}^n \mid f(\mathbf{p}) = 0\}, \tag{1}$$

where $\mathbf{p}$ is a coordinate in an $n$-dimensional space in which the surface is embedded (Montagnat et al., 2001). Therefore, the set $\mathcal{S}_f$ defines the surface formed by all points $\mathbf{p}$ in $\mathbb{R}^n$ such that the function $f$, when evaluated at $\mathbf{p}$, is equal to zero. The implicit function is used to locate surface points by solving the equation $f(\mathbf{p}) = 0$ (Figure 3A). In robotics, $n$ is usually 3 to represent Cartesian coordinates, but sometimes an extra dimension can be used to represent time. Representations that fall within this category are explained in the rest of this subsection.
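As a minimal sketch of definition (1), the following Python snippet evaluates an implicit function for a sphere and uses its sign to decide whether a point lies on, inside, or outside the surface. The function names `sphere_implicit` and `classify`, the tolerance, and the choice of $f(\mathbf{p}) = \|\mathbf{p}\|^2 - r^2$ are illustrative assumptions, not taken from the cited works; the inside/outside sign convention holds for this particular $f$ only.

```python
import numpy as np

def sphere_implicit(points, radius=1.0):
    """Implicit function f(p) = ||p||^2 - r^2; its zero set is a sphere."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.sum(points**2, axis=1) - radius**2

def classify(f_value, tol=1e-9):
    """Label a point by the value of f (negative means inside for this f)."""
    if abs(f_value) <= tol:
        return "on"
    return "inside" if f_value < 0 else "outside"

points = [[1.0, 0.0, 0.0],   # satisfies f(p) = 0
          [0.0, 0.0, 0.5],   # f(p) < 0
          [1.0, 1.0, 1.0]]   # f(p) > 0
labels = [classify(v) for v in sphere_implicit(points)]
print(labels)
```

Locating surface points in general requires solving $f(\mathbf{p}) = 0$ numerically (for example by root finding along a ray), which this sketch does not attempt.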

2.1.1. Algebraic Curves and Surfaces

Algebraic curves and surfaces satisfy (1) with $f(\mathbf{p})$ being a polynomial. First-degree polynomials define planes and hyperplanes; second-degree polynomials define conics, which include circles, ellipses, parabolas, and hyperbolas, and quadrics, which include ellipsoids, paraboloids, hyperboloids, cones, and cylinders; their $(n-1)$-dimensional extensions are surfaces in an $n$-dimensional space that satisfy the equation

$$f(\mathbf{p}) = \mathbf{p}^T \mathbf{A} \mathbf{p} + \mathbf{b}\mathbf{p} + c = 0, \tag{2}$$

where $\mathbf{p} = \{x_1, x_2, \ldots, x_n\}^T \in \mathbb{R}^n$ is a column vector, $\mathbf{p}^T$ denotes the transpose of $\mathbf{p}$ (a row vector), $\mathbf{A} \in \mathbb{R}^{n \times n}$ is a matrix, $\mathbf{b} \in \mathbb{R}^n$ is a row vector, and $c$ is a scalar constant. Note that all the aforementioned shapes are included as particular cases of this definition. To define a particular shape, which satisfies a given set of constraints, the constant values in $\mathbf{A}$, $\mathbf{b}$, and $c$ must be determined; for example, with $n = 2$, the constants that define a circle passing through a given set of three points can be found by solving the system of three equations $f(\mathbf{p}_i) = 0$, $i = 1, 2, 3$, where $f$ is a second-degree polynomial. In some contexts the same equation can be rewritten to facilitate this estimation; for example, it is easy to determine the circle centered at $(x_c, y_c)$ with radius $r$ if the second-degree polynomial is written as $(x - x_c)^2 + (y - y_c)^2 = r^2$ with $\mathbf{p} = \{x, y\}$.
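The three-point circle example can be sketched as follows. The snippet assumes the circle is written as $x^2 + y^2 + Dx + Ey + F = 0$, a special case of (2) in which the three unknowns $D$, $E$, $F$ enter linearly; this parameterization and the function name `circle_through` are illustrative choices rather than the formulation of any cited work.

```python
import numpy as np

def circle_through(p1, p2, p3):
    """Circle through three points, written as x^2 + y^2 + D x + E y + F = 0.

    Each point gives one linear equation D x_i + E y_i + F = -(x_i^2 + y_i^2).
    """
    pts = np.array([p1, p2, p3], dtype=float)
    M = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    rhs = -(pts[:, 0]**2 + pts[:, 1]**2)
    D, E, F = np.linalg.solve(M, rhs)
    center = np.array([-D / 2.0, -E / 2.0])      # (x_c, y_c)
    radius = np.sqrt(center @ center - F)        # r^2 = x_c^2 + y_c^2 - F
    return center, radius

center, radius = circle_through((1, 0), (0, 1), (-1, 0))
print(center, radius)
```

Collinear input points make the linear system singular, which reflects the fact that no circle passes through them.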

Superquadrics generalize quadrics by replacing the fixed exponent 2 with arbitrary real exponents, while hyperquadrics are the most general form and allow the


representation of complex non-symmetric shapes; they are given by the equation

$$\sum_{i=1}^{N} |a_i x + b_i y + c_i z + d_i|^{\gamma_i} = 1, \tag{3}$$

where $\mathbf{p} = \{x, y, z\}^T \in \mathbb{R}^3$, $N$ is an arbitrary number of planes whose intersection surrounds the object, $a_i$, $b_i$, $c_i$, and $d_i$ are shape parameters of these planes, and $\gamma_i \in \mathbb{R}$ with $\gamma_i \geq 0$ for all $i$.

Kumar et al. (1995) presented a method for ﬁtting hyperquadrics

to very complex deformable shapes registered as range data. A

model like (3) could be used as the base representation for a

model of the dynamics if the deformations are small. It can be

used in combination with local surface deformations to represent

a wider set of surfaces. For example, if the superquadric is the set of points $\mathcal{Q}$ that satisfy the corresponding equation, a deformed model could be given by $\mathcal{S} = \mathbf{c} + \mathbf{R}(\mathcal{Q} + \mathbf{d})$, where $\mathbf{c}$ represents the inertial center of the superquadric, $\mathbf{R}$ is a rotation matrix, and $\mathbf{d}$ is a vectorial displacement field. Other types of deformation can

be deﬁned as well (Montagnat et al., 2001). For more information

about superquadrics, see Jaklic et al. (2000).
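To make (3) and the deformed model $\mathcal{S} = \mathbf{c} + \mathbf{R}(\mathcal{Q} + \mathbf{d})$ concrete, the sketch below evaluates the left-hand side of (3) for a set of planes and applies the deformation to sampled surface points. All names (`hyperquadric`, `deform`), the plane parameters, and the radial displacement field are hypothetical choices for illustration; with three axis-aligned planes and $\gamma_i = 2$ the shape reduces to the unit sphere.

```python
import numpy as np

def hyperquadric(points, planes, gammas):
    """Left-hand side of (3): sum_i |a_i x + b_i y + c_i z + d_i|^gamma_i."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    planes = np.asarray(planes, dtype=float)        # rows (a_i, b_i, c_i, d_i)
    lin = points @ planes[:, :3].T + planes[:, 3]   # shape (num_points, N)
    return np.sum(np.abs(lin) ** np.asarray(gammas, dtype=float), axis=1)

def deform(Q, c, R, d):
    """Apply S = c + R (Q + d) to every row of Q."""
    return c + (Q + d) @ R.T

# Assumed example: three axis-aligned planes with gamma_i = 2 (unit sphere).
planes = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]])
gammas = [2.0, 2.0, 2.0]
Q = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])   # points on the surface

R = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])      # 90 degrees about z
c = np.array([1.0, 2.0, 3.0])                           # new center
d = 0.1 * Q                                             # small radial displacement field
S = deform(Q, c, R, d)
print(hyperquadric(Q, planes, gammas), S[0])
```

Because $\mathbf{R}$ is orthogonal, the undeformed points can be recovered as `(S - c) @ R - d`, which is a convenient consistency check when fitting such models.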

Algebraic curves, surfaces, and volumes can be used as 1D, 2D,

and 3D skeletons, or to represent objects of similar shapes and

deformations (Gascuel, 1993). They are easy to deform in certain

cases, but the types of deformations that are straightforward

to apply are very limited, such as matrix transformations that

bend, pitch, twist, stretch, or translate all the points on a curve

or the space in which the curve is embedded. For this reason,

algebraic curves and surfaces are more suited to representing

articulated or semi-articulated objects. Objects can also be

composed of several algebraic curves, where each component

is easy to manipulate with this representation. To model more

complex deformed shapes, Raposo and Gomes (2019) introduced

products of primitive algebraic surfaces, such as spheres and

cylinders, which enable both local and global deformations to

be controlled more easily than in traditional algebraic shape

models. Moreover, these models can be combined with skinning

techniques to emulate soft deformable objects and also have

parameterized representations.

2.1.2. Level Set Methods

In level set methods, the deformable model is embedded in a

higher-dimensional space, where the extra dimension represents

time (Sethian, 1997; Montagnat et al., 2001). A hypersurface $\Psi$ is defined by $\Psi(\mathbf{p}, 0) = \mathrm{dist}(\mathbf{p}, \mathcal{S}_0)$, where $\mathcal{S}_0$ is the initial surface and dist can be the signed Euclidean distance between a point $\mathbf{p}$ and the surface. The distance is positive if the point lies outside the surface and negative otherwise. The evolution of the surface $\mathcal{S}$ is governed by the partial differential equation $\Psi_t + |\nabla\Psi| F = 0$ involving the function $\Psi(\mathbf{p}, t)$ and a speed function $F$, which determines the speed at which each point of the surface must be moved. Thus, the function $\Psi$ evolves in time and the current shape corresponds to the surface given by $\Psi(\mathbf{p}, t) = 0$ (Figure 3C). The function $\Psi$ could have any form, such as a parameterized one, $\Psi = \Psi(x(u,v), y(u,v), z(u,v), t)$, with the surface still defined in implicit form, $\Psi(x(u,v), y(u,v), z(u,v), t) = 0$. Unfortunately, it could also be that $\Psi$ does not even have an algebraic expression and may need to be approximated with numerical methods.

The main advantage of level set methods is that they allow changes of surface topology implicitly. The set $\mathcal{S}$ may split into several connected components, or several distinct components may merge, while $\Psi$ remains a function.
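The evolution equation $\Psi_t + |\nabla\Psi| F = 0$ can be illustrated numerically. The sketch below is a toy 2D example under several assumptions that are not part of the text: a constant speed $F = 1$, a circle of radius 0.5 as initial contour, a uniform grid, and a first-order upwind discretization of $|\nabla\Psi|$ (a standard choice for non-negative $F$, swapped in here because the text does not prescribe a numerical scheme). Function names are hypothetical.

```python
import numpy as np

def upwind_grad_norm(psi, h):
    """First-order upwind estimate of |grad psi|, valid for speed F >= 0."""
    p = np.pad(psi, 1, mode="edge")
    dxm = (p[1:-1, 1:-1] - p[1:-1, :-2]) / h   # backward difference in x
    dxp = (p[1:-1, 2:] - p[1:-1, 1:-1]) / h    # forward difference in x
    dym = (p[1:-1, 1:-1] - p[:-2, 1:-1]) / h   # backward difference in y
    dyp = (p[2:, 1:-1] - p[1:-1, 1:-1]) / h    # forward difference in y
    return np.sqrt(np.maximum(dxm, 0.0)**2 + np.minimum(dxp, 0.0)**2
                   + np.maximum(dym, 0.0)**2 + np.minimum(dyp, 0.0)**2)

def evolve(psi, speed, h, dt, steps):
    """Explicit Euler steps of psi_t + speed * |grad psi| = 0."""
    for _ in range(steps):
        psi = psi - dt * speed * upwind_grad_norm(psi, h)
    return psi

def front_radius(psi, x):
    """Zero crossing of psi along the positive x-axis (linear interpolation)."""
    mid = len(x) // 2
    row, xs = psi[mid, mid:], x[mid:]
    k = int(np.argmax(row > 0))                # first sample outside the contour
    return xs[k - 1] - row[k - 1] * (xs[k] - xs[k - 1]) / (row[k] - row[k - 1])

x = np.linspace(-2.0, 2.0, 201)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x)
psi0 = np.sqrt(X**2 + Y**2) - 0.5              # signed distance to a circle, r = 0.5
psi = evolve(psi0, speed=1.0, h=h, dt=0.25 * h, steps=100)   # total time 0.5
print(front_radius(psi0, x), front_radius(psi, x))
```

With unit speed the contour should expand by roughly the elapsed time, so the radius is expected to grow from 0.5 to about 1.0. Practical implementations typically add reinitialization of $\Psi$ and higher-order schemes, which are omitted here.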

In computer vision applications, such as human tracking and

medical imaging, level set methods have been used successfully

in tracking deformable objects (Sethian, 1997; Cremers, 2006).

For example, Sun et al. (2008) recursively segmented deformable

objects across a sequence of frames using a low-dimensional

modal representation and applied their technique to left

ventricular segmentation across a cardiac cycle. The dynamics

are represented using a distance level set function, whose

representation is simpliﬁed using principal component analysis

(PCA). Sun et al. used methods of particle-based smoothing as

well as non-parametric belief propagation on a loopy graphical

model capturing the temporal periodicity of the heart, with the

objective being to estimate the current state of the object not only

from the data observed at that instant but also from predictions

based on past and future boundary estimates. Even though we did

not ﬁnd examples of this method being used in robotics, it seems

a suitable candidate since it models the shape change through

time implicitly and would thus allow the robot to keep track of

the evolving shape of an object during manipulation.

2.1.3. Gaussian Principal Component Eigenmodes

This kind of representation is valid when the types of

deformations can be described with a single mathematical

formulation. Given a representative set $\mathcal{S}_N = \{S_0, S_1, \ldots, S_{N-1}\}$ of the types of surface deformation that objects can undergo, it is possible to use PCA to detect the main modes of deformation (i.e., the eigenmodes $\Phi_n$) and thus re-express the shapes as a linear combination of those modes. Hence a new shape estimation can be done using

$$\bar{S} = S_\mu + \alpha \Phi_n, \tag{4}$$

where $\Phi_n$ ($n \ll N$) contains the $n$ largest eigenmodes of shape variations in $\mathcal{S}_N$, $S_\mu$ is the mean of the representative set of shapes $\mathcal{S}_N$, and $\alpha$ is a set of coefficients. Such an eigenmode representation is useful

for dealing with missing or misleading information (e.g., noise

or occlusions) coming from sensory data while constructing the

shape of the object (Cootes et al., 1995; Blake et al., 1998; Cremers,

2006; Sinha et al., 2019).
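The estimate in (4) can be sketched with a small synthetic example. The snippet assumes that each shape is stored as a flat vector of contour coordinates, obtains the eigenmodes from a singular value decomposition of the centered training set, and computes the coefficients $\alpha$ by projection; the training ellipses, the noise level, and the function names are hypothetical and only serve to show how the projection suppresses noise.

```python
import numpy as np

def fit_shape_model(shapes, n_modes):
    """Return the mean shape and the n_modes leading eigenmodes (as columns)."""
    X = np.asarray(shapes, dtype=float)              # shape (N, D)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_modes].T                      # modes are orthonormal

def project(shape, mean, modes):
    """Shape estimate as in (4): mean plus a combination of the eigenmodes."""
    alpha = modes.T @ (shape - mean)                 # coefficients
    return mean + modes @ alpha, alpha

def ellipse(a, theta):
    """Contour of an ellipse with semi-axes (a, 1), flattened to one vector."""
    return np.concatenate([a * np.cos(theta), np.sin(theta)])

theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
training = [ellipse(a, theta) for a in np.linspace(0.5, 1.5, 20)]
mean, modes = fit_shape_model(training, n_modes=1)

truth = ellipse(0.8, theta)
rng = np.random.default_rng(0)
noisy = truth + 0.05 * rng.standard_normal(truth.shape)   # simulated sensor noise
estimate, alpha = project(noisy, mean, modes)
print(np.linalg.norm(noisy - truth), np.linalg.norm(estimate - truth))
```

Because the training shapes vary along a single direction, one mode suffices here; real data usually needs several modes, chosen for instance from the decay of the singular values.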

Employing combinations of previously cited methods,

Leventon et al. (2000) used eigenmode representation with

level set curves to segment images, such as medical images

of the femur and corpus callosum, by deﬁning a probability

distribution over the variances of a set of training shapes. The

segmentation process embeds an initial curve as the zero level

set of a higher-dimensional surface, and then evolves the surface

such that the zero level set converges on the boundary of the

object to be segmented. At each step of the surface evolution, the

maximum a posteriori position and shape of the object in the

image were estimated based on the prior shape information and

the image information. The surface was then evolved globally


FIGURE 4 | (A) Two Bezier curves with control points; displacing one control point (e.g., $\mathbf{p}_2$) affects the entire curve. In this case, the line that joins the control points $\mathbf{p}_1$ and $\mathbf{p}_2$ is the tangent of the polynomial at $\mathbf{p}_1$; and the line that joins $\mathbf{p}_3$ and $\mathbf{p}_4$ is the tangent at $\mathbf{p}_4$. (B) Two enchained Bezier curves; by placing $\mathbf{p}_3$, $\mathbf{p}_4$, and $\mathbf{p}_5$ on a line it is possible to make the curve $C^1$ at $\mathbf{p}_4$. (C) Free-form deformation of a spline; the black circles form a lattice in a new local coordinate system, the red circles show the deformed lattice, and the red spline is the result of shifting the blue spline in accordance with this deformation.

toward the maximum a posteriori estimate and locally based on

image gradients and curvature. The results were demonstrated

on synthetic data and medical images, in both 2D and 3D. Tsai

et al. (2001) further developed this idea.

2.2. Explicit Parameterized

Representations

Explicit parameterized representations are evaluated directly

from their functional definition (Figure 3B). In 3D, they are of the form $\mathbf{C}(u) = \{x(u), y(u), z(u)\}^T$ for curves and $\mathbf{S}(u, v) = \{x(u,v), y(u,v), z(u,v)\}^T$ for surfaces, where $u$ and $v$ are parameters. The curve or surface is traced out as the values of the parameters are varied. For example, if $u \in [0, 1]$, the shape is traced out as $u$ varies from 0 to 1, as happens with the circle

$$\mathbf{C}(u) = \begin{cases} x = \cos(2\pi u) \\ y = \sin(2\pi u). \end{cases} \tag{5}$$

It is common practice to parameterize a curve by time $t$ if it will represent a trajectory, or by arc length $l$ (see footnote 1).

2.2.1. Splines

A mathematical spline $S$ is a piecewise-defined real function, with $k$ polynomial pieces $s_i(u)$ parameterized by $u \in [u_0, u_k]$, used to represent curves or surfaces (Cordero Valle and Cortes Parejo, 2003). The order $n$ of the spline corresponds to the highest order of the polynomials. The values $u_0, u_1, \ldots, u_{k-1}, u_k$ where the polynomial pieces connect are called knots.

Frequently, for a spline of order $n$, $S$ is required to be differentiable up to order $n-1$, that is, to be $C^{n-1}$ at knots and $C^\infty$ everywhere else. However, it is also possible to reduce its differentiability to take into account discontinuities.

In general, any spline function $S(u)$ of order $n$ with knots $u_0, \ldots, u_k$ can be expressed as

$$S(u) = \sum_{j=1}^{k+n+1} p_j s_j(u), \tag{6}$$

1 The arc length is the distance between a starting point $\alpha$ on the curve and the current point $\beta$. For example, the length of a 1D curve embedded in 3D space, parameterized by $u$, is given by $L = \int_\alpha^\beta \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\, du$, where $\dot{x} = \partial x / \partial u$ and similarly for $\dot{y}$ and $\dot{z}$.

where the coefficients $p_j$ are interpreted geometrically as the coordinates of the control points that determine the shape of the spline, and

$$s_j(u) = (u - u_j)_+^n \quad \text{for } j = 1, \ldots, k, \qquad s_{k+j}(u) = u^{j-1} \quad \text{for } j = 1, \ldots, (n+1), \tag{7}$$

with $(x)_+ = \max(x, 0)$, constitute a basis for the space of all spline functions with knots $u_0, \ldots, u_k$, called the power basis. This space of functions is a $(k + (n+1))$-dimensional linear space. By using other bases, a large family of spline variations is generated (Gibson and Mirtich, 1997); the most important ones are the following.

Bezier splines consist of segments that are each a Bezier curve given by

$$\mathbf{B}(u) = \sum_{i=0}^{n} \mathbf{p}_i B_i^n(u), \tag{8}$$

$$B_i^n(u) = \binom{n}{i} (1 - u)^{n-i} u^i, \tag{9}$$

where each $\mathbf{p}_i$ is a control point, the $B_i^n$ are the Bernstein polynomials of degree $n$, $\binom{n}{i}$ are the binomial coefficients, and $u \in [0, 1]$. The curve passes through its first and last control points, $\mathbf{p}_0$ and $\mathbf{p}_n$, and remains close to the control polygon obtained by joining all the control points, in order, with straight lines. Also, at its extremes it is tangent to the line segments $\mathbf{p}_0\mathbf{p}_1$ and $\mathbf{p}_{n-1}\mathbf{p}_n$. It is easy to add and remove control points from a Bezier curve, but displacing one causes the entire curve to change, which is why usually only third-degree polynomial segments are used (see Figure 4).

A two-dimensional Bezier surface is obtained as the tensor product of two Bezier curves:

$$\mathbf{B}(u, v) = \sum_{i=0}^{n} \sum_{j=0}^{m} B_i^n(u) B_j^m(v)\, \mathbf{p}_{i,j}. \tag{10}$$

B-splines are more stable, since changes to the positions

of control points induce only local changes around that

control point, and the polynomials pass through the control

points (de Boor, 1976). They are particularly suitable for 3D


reconstructions. Song and Bai (2008) show how they can be

used to ﬁll holes and smooth surfaces originally captured as

dense clouds of points, while producing a much more compact

and manipulable representation.

Catmull-Clark surfaces approximate points lying on a mesh

of arbitrary topology (Catmull and Clack, 1978).

Non-uniform rational B-splines (NURBS) are notable

because they can represent circles, ellipses, spheres, and other

curves which, in spite of their commonness and simplicity,

cannot be represented by polynomial splines. They achieve this

by introducing quotients of polynomials in the basis (see footnote 2).

T-splines and non-uniform rational Catmull-Clark

surfaces with T-junctions (T-NURCCs) allow for high-

resolution representations of 3D deformable objects with a

highly reduced number of faces and improved representations

of joints between surface patches, by introducing connections

with the shape of a T between edges of the shape (Sederberg

et al., 2003).
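As a concrete illustration of the simplest member of the spline family listed above, the Bezier curve of (8) and (9) can be evaluated directly from the Bernstein polynomials. The sketch below is a plain evaluation (not the de Casteljau algorithm); the control points and function names are assumptions chosen for the example. It also shows the global effect of moving a single control point, which is the behavior that motivates the locally supported bases.

```python
import numpy as np
from math import comb

def bernstein(n, i, u):
    """Bernstein polynomial B_i^n(u) from (9)."""
    return comb(n, i) * (1.0 - u)**(n - i) * u**i

def bezier(control_points, u):
    """Evaluate the Bezier curve (8) at parameter values u in [0, 1]."""
    P = np.asarray(control_points, dtype=float)      # shape (n + 1, dim)
    n = len(P) - 1
    u = np.atleast_1d(np.asarray(u, dtype=float))
    B = np.stack([bernstein(n, i, u) for i in range(n + 1)], axis=1)
    return B @ P                                     # shape (len(u), dim)

P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])   # cubic segment
u = np.linspace(0.0, 1.0, 5)
base = bezier(P, u)

P_moved = P.copy()
P_moved[2] += [0.0, 1.0]                             # displace one control point
moved = bezier(P_moved, u)
print(base[0], base[2], base[-1])
print(np.abs(moved - base)[:, 1])                    # change at every sample
```

The endpoints stay fixed at the first and last control points, while every interior sample shifts, since each Bernstein polynomial is positive on the open interval (0, 1).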

Splines are a very ﬂexible tool for representing all sorts

of deformable shapes. They are extremely useful for signal

and image processing (Unser, 1999) as well as for computer

animation (Maraﬃ, 2004) and shape reconstruction of 2D and

3D deformable objects (Song and Bai, 2008; Prasad et al., 2010).

Active contours (see section 3.4.2 and Kass et al., 1988), also

known as snakes, are splines governed by an energy function that

introduces dynamic elements to the shape representation and are

used for tasks, such as object segmentation (Marcos et al., 2018;

Chen et al., 2019; Hatamizadeh et al., 2019).

The advantage of splines is that compact representations of

deformable objects can be built on them in accordance with

the complexity of their shape at each time. When new corners

or points of high curvature appear, more control points can be

added for an adequate representation, and if the shape becomes

simpliﬁed these points can be removed. However, this ﬂexibility

also makes splines sensitive to noise and can lead to computation

of erroneous deformations. Hence, learning the dynamics of such

a representation is diﬃcult with current learning algorithms, and

a general solution remains an open problem.

2.2.2. Modal Decompositions

In modal decomposition, a curve or a surface is expressed as

the sum of terms in a basis, whose elements correspond to

frequency harmonics. The sum of the ﬁrst modes constituting

the surface gives a good rough approximation of its shape, which

becomes more detailed as more modes are included (Montagnat

et al., 2001). Among methods for modal decomposition, Fourier

decomposition is in widespread use. A curve may be represented

as a sum of sinusoidal terms and a surface as a combination of spherical harmonics $Y_l^m(\theta, \phi)$, which are explicitly parameterized:

$$S(r, \theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} c_l^m r^l Y_l^m(\theta, \phi), \tag{11}$$

2 https://pages.mtu.edu/~shene/COURSES/cs3621/NOTES/

where $r^l$ is a normalization factor for $Y$ and the $c_l^m$ are constants.

It is also possible to use other bases that may be more suitable for

other shapes, such as surfaces homeomorphic to a sphere, torus,

cylinder, or plane.
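For closed 2D curves, the idea of keeping only the first modes can be sketched with a discrete Fourier transform. The snippet assumes the contour is sampled uniformly and encoded as complex numbers $x + iy$ (Fourier descriptors), which is one common way to realize the sum of sinusoidal terms mentioned above; the test shape with a five-lobed ripple and the function name `truncate_modes` are hypothetical.

```python
import numpy as np

def truncate_modes(points, n_modes):
    """Rebuild a closed 2D contour from harmonics with |index| <= n_modes."""
    z = points[:, 0] + 1j * points[:, 1]             # contour as complex samples
    coeffs = np.fft.fft(z)
    harmonics = np.rint(np.fft.fftfreq(len(z)) * len(z))   # integer mode indices
    coeffs[np.abs(harmonics) > n_modes] = 0.0        # discard the higher modes
    z_rec = np.fft.ifft(coeffs)
    return np.column_stack([z_rec.real, z_rec.imag])

t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
radius = 1.0 + 0.3 * np.cos(5 * t)                   # circle with a 5-lobed ripple
shape = np.column_stack([radius * np.cos(t), radius * np.sin(t)])

coarse = truncate_modes(shape, n_modes=1)            # keeps only the rough outline
fine = truncate_modes(shape, n_modes=6)              # recovers the ripple
err_coarse = np.abs(coarse - shape).max()
err_fine = np.abs(fine - shape).max()
print(err_coarse, err_fine)
```

With a single mode the ripple is lost entirely and only a circle remains, which mirrors the remark that a truncated modal representation can miss small details; including modes up to index 6 reproduces this particular shape to numerical precision.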

Modal decomposition has mostly been used in model-based

segmentation and recognition of 2D and 3D medical images

(Szekely et al., 1995). Its main advantage is that it creates a

compact and easy-to-manipulate representation of objects whose

shape can be described as a linear combination of a few dominant

modes of deformation. The disadvantage of such methods is that

it is easy for them to miss details in objects, such as small dents,

because shapes are approximated by a limited number of terms.

2.3. Free-Forms

Free-form deformation is a method whereby the space in

which a ﬁgure is embedded is deformed according to a set of

control points of the deformation (Moore and Molloy, 2007;

see Figure 4C). It can be used to deform primitives, such as

planes, quadrics, parametric surface patches, or implicitly deﬁned

surfaces. The deformation can be applied either globally or

locally. A local coordinate system is deﬁned using a parallelpiped

so that the coordinates inside it are p= {x1,x2,x3}with 0 <xi<

1 for all i. A set of control points pijk lie on a lattice. When they are

displaced from their original positions, they deﬁne a deformation

of the original space with new coordinates p′. The new position

of any coordinate is interpolated by applying a transformation

formula that maps $\mathbf{p}$ into $\mathbf{p}'$. For some transformations it is

enough to estimate the new coordinates of the nodes of a mesh

or control points of a spline with respect to the new positions

of the control points of the deformed space, and the rest of

the shape will follow them, as in Sederberg and Parry (1986),

where a trivariate tensor product Bernstein polynomial was

proposed as the transformation function. Loosely related are

multigrid representations, which also allow for local management

of deformation (Xian et al., 2019).
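As a minimal sketch of the trivariate Bernstein mapping mentioned above, assume a lattice of $(l+1)\times(m+1)\times(n+1)$ control points spanning the unit cube, so that local coordinates coincide with world coordinates before deformation. The helper names (`bernstein`, `make_lattice`, `deform`) are hypothetical; the weighting follows the tensor-product Bernstein form attributed to Sederberg and Parry (1986).

```python
from math import comb


def bernstein(degree, index, t):
    """Bernstein basis polynomial B_index^degree evaluated at t in [0, 1]."""
    return comb(degree, index) * t**index * (1.0 - t) ** (degree - index)


def make_lattice(l, m, n):
    """Undeformed control points p_ijk placed regularly on the unit cube."""
    return {
        (i, j, k): [i / l, j / m, k / n]
        for i in range(l + 1)
        for j in range(m + 1)
        for k in range(n + 1)
    }


def deform(point, lattice, l, m, n):
    """Map local coordinates (x1, x2, x3) to the deformed position p'."""
    x1, x2, x3 = point
    result = [0.0, 0.0, 0.0]
    for (i, j, k), control in lattice.items():
        weight = bernstein(l, i, x1) * bernstein(m, j, x2) * bernstein(n, k, x3)
        for axis in range(3):
            result[axis] += weight * control[axis]
    return result


lattice = make_lattice(2, 2, 2)
lattice[(2, 2, 2)][2] += 0.5  # pull one corner control point upwards
bent_corner = deform((1.0, 1.0, 1.0), lattice, 2, 2, 2)
```

Moving a single control point changes the embedded space smoothly around it, so any mesh node or spline control point evaluated through `deform` follows the lattice.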

2.4. Discrete Representations

Discrete representations contain only a ﬁnite ﬁxed number

of key elements describing them, mainly points and lines.

Representations that fall into this category include the following:

Meshes are collections of vertices connected through edges

that form a graph. Common shapes for their faces are triangles

(triangulations), quadrilaterals, and hexagons for surfaces, and

tetrahedrons for volumes. A special case consists of the simplex

meshes, which have a constant vertex connectivity. This type

of shape representation permits smooth deformations in a

simple and eﬃcient manner (Delingette, 1999; Montagnat

et al., 2001). Therefore, meshes are used for various tasks, such

as 3D object recognition (e.g., Madi et al., 2019) and simulation

of the dynamics of deformable objects (see section 3.3.1) with

eﬃcient coding (Arvanitis et al., 2019).

Skeletons are made of rigid edges connected by joints that

allow bending. The position and deformation of elements

attached to the skeleton are deﬁned with respect to their

assigned bone. Skeletons tend to be used together with the

method known as skinning, where a deformable surface


is attached to the bone and softens the visual appearance

of the articulated joints through interpolation techniques.

By nature skeletons are designed to model articulated

deformations. Schaefer and Yuksel (2007) proposed a

method to automatically extract skeletons by detecting

articulated deformations.

Deformable templates are parameterized representations of a

shape that specify key values of salient features in an image.

They deform with greater ease around these features. The

features can be peaks and valleys in the image intensity, edges,

and the intensity itself, as well as points of high curvature3

(sharp turns or bends; see Yuille et al., 1992; Basri et al., 1998).

Deformable templates are mainly used for object recognition

and object tracking (e.g., Ravishankar et al., 2008; Xia et al.,

2019; Gallardo et al., 2020). They support the particular

relevance of critical points in modeling deformations, which

could make them key to developing a robot’s ability to generate

its own optimal representation of a deformable object. The

use of deformable templates is explored and illustrated by

experiments on natural and artiﬁcial agents in Arriola-Rios

et al. (2013).

Landmark points are points that would remain stable across

deformations. They can be corners, T-junctions, or points of

high curvature. For example, when a rectangular sponge is

pushed, its corners will still be corners after the deformation,

while the point of contact with the external force will become

a point of high curvature during the process and will remain

as such; these are all stable points. In particular, landmark

points can correspond to control points of splines (Blake et al.,

1998). Methods for further processing, such as the application

of deformations, can work more eﬃciently if they focus only

(or mainly) on landmark points rather than on the whole

representation (Cootes and Taylor, 2004).

Particles are idealized zero-dimensional dots. Their positions

are speciﬁed as a vector function parameterized by time, P(t).

They can store a set of attributes, such as mass, temperature,

shape (for visualization purposes), age, lifetime, and so on.

These attributes inﬂuence the dynamical behavior of the

particles over time and are subject to change due to procedural

stochastic processes. The particles can pass through three

diﬀerent phases during their lifetime: generation, dynamics,

and death. However, manipulating them and maintaining

constraints, such as boundaries between them can become

non-trivial. For this reason, particles are used mainly to

represent gases or visual eﬀects in animations, where the

interaction between them is very limited (Nealen et al., 2006).

Clouds of points are formed by large collections of

coordinates that lie on a surface. They are frequently

obtained from 3D scanned data and may include the color

of each point. A typical problem consists in reconstructing

3D surfaces from such clouds (Newcombe and Davison,

2010; Makovetskii et al., 2020). Cretu et al. (2009) give

a comparative review of several methods for eﬃciently

3 The curvature of a function is defined as $\kappa = \frac{d\varphi}{ds}$, where $\varphi$ is the tangential angle, with $\tan\varphi = \frac{dy}{dx}$, and $s$ is the arc length as defined earlier.

processing clouds of points and introduces the use of self-

organizing neural gas networks for this purpose. Clouds of

points can be structured and enriched with orientations of the

3D normals, as well as other feature descriptors for perceptual

applications (Martínez et al., 2019).

3. REPRESENTING DYNAMICS FOR

DEFORMABLE OBJECTS

After an object’s shape is deﬁned as described in section 2, a

suitable model of the dynamics can be used to register and predict

deformations as a robot interacts with the object (Figures 1ii,iii).

In this section, we introduce some of the most commonly used

models from diﬀerent ﬁelds (e.g., computer graphics; see Gibson

and Mirtich, 1997; Nealen et al., 2006; Moore and Molloy, 2007;

Bender et al., 2014) for predicting the dynamics of deformable

objects. In robotics, the important features used to select an

appropriate model are computational complexity (e.g., for real-

time perception and manipulation), physical accuracy or visual

plausibility, and simplicity or intuitiveness (i.e., the ability to

implement simple cases easily and to be built on iteratively to

accommodate more complex cases). Therefore, we divide the

models into three classes: (1) particle-based models, which are

usually computationally eﬃcient and intuitive but physically

not very accurate; (2) constitutive models, which are physically

accurate but computationally complex and not very intuitive;

and (3) approximations of constitutive models, which aim to

decrease the computational complexity of constitutive models

through approximations.

3.1. Background Knowledge of

Deformation

First, we brieﬂy review some background information about the

physics and dynamics of deformation. Initially, the object is in

a rest shape $S_0$. In the discrete case it could be $S_0 = \{\mathbf{p}_i^0 = \{x_i^0, y_i^0, z_i^0\} \in \mathbb{R}^{n=3},\ i \in \{1, \dots, N\}\}$, where $N$ is the number of points

constituting the shape of the object. Then, when an external

force $\mathbf{f}_{ext}$ acts on the object, such as gravitational force or force applied by a manipulator, the object deforms and its points move to a new position $\mathbf{p}^{new}$. In physics-based models, the resulting

deformation is typically deﬁned using a displacement vector ﬁeld

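$\mathbf{u} = \mathbf{p}^{new} - \mathbf{p}^0$. From this displacement, the deformation can be computed through the stress $\sigma$ (i.e., the force applied per area of the object shape) and the strain $\epsilon$ (i.e., the ratio of deformation to the original size of the object shape). The stress tensor $\sigma$ is usually calculated for each point on the object shape using Hooke's law, $\sigma = \mathbf{E}\epsilon$, where $\epsilon$ can be calculated as $\epsilon = \frac{1}{2}(\nabla\mathbf{u} + \nabla\mathbf{u}^T)$ with $\nabla\mathbf{u}$ denoting the spatial derivative of the displacement field,

$$\nabla\mathbf{u} = \begin{pmatrix} \partial u_x/\partial x & \partial u_x/\partial y & \partial u_x/\partial z \\ \partial u_y/\partial x & \partial u_y/\partial y & \partial u_y/\partial z \\ \partial u_z/\partial x & \partial u_z/\partial y & \partial u_z/\partial z \end{pmatrix}; \qquad (12)$$

$\mathbf{E}$ is a tensor that is dependent on the real physical material properties of the object, such as Young's modulus $E$ and Poisson's ratio $\upsilon$. These properties are parameters in constitutive models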


FIGURE 5 | (A) A simple circular deformable object represented in 2D with $N = 7$ particles in the particle system; the particles are in static equilibrium (initial rest state). (B) When the distance $r$ between the particles $i$ and $j$ increases, new forces exerted on $i$ are calculated and the particles move. (C) A mass-spring model of a simple cubic deformable object with $N = 8$ particles in 3D; the particles are connected with structural and shear springs that enable the object to resist longitudinal and shear deformations. (D) In Nurnberger et al. (1998), the neurons of a recurrent neural network represent the mass nodes (light blue) and springs (gray) of a mass-spring model; the activation functions between the neurons of the network were devised to reproduce the mass-spring system equations.

(e.g., ﬁnite element models). Constitutive models describe strain-

stress relationships as the response of materials (e.g., elastic

or plastic) to diﬀerent loads (e.g., forces applied) based on

the material properties; they are commonly used to simulate

deformation because of their high physical accuracy.
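The strain and stress definitions above can be sketched for one material point as follows. This is a minimal illustration under a stated assumption that is not made in the text: the material is isotropic and linearly elastic, so that Hooke's law can be written with the Lamé parameters derived from Young's modulus and Poisson's ratio. The function names are hypothetical.

```python
def small_strain(grad_u):
    """Strain tensor 0.5 * (grad_u + grad_u^T) from a 3x3 displacement gradient."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def isotropic_stress(strain, young, poisson):
    """Hooke's law for an isotropic linear material (assumption of this sketch)."""
    lam = young * poisson / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    mu = young / (2.0 * (1.0 + poisson))
    trace = sum(strain[i][i] for i in range(3))
    return [
        [lam * trace * (1.0 if i == j else 0.0) + 2.0 * mu * strain[i][j] for j in range(3)]
        for i in range(3)
    ]


# Uniaxial stretch along x with lateral contraction given by Poisson's ratio.
young, poisson, stretch = 1.0e6, 0.3, 0.01
grad_u = [
    [stretch, 0.0, 0.0],
    [0.0, -poisson * stretch, 0.0],
    [0.0, 0.0, -poisson * stretch],
]
stress = isotropic_stress(small_strain(grad_u), young, poisson)
```

In this uniaxial case the axial stress reduces to Young's modulus times the axial strain, and the lateral stresses vanish, which gives a quick sanity check of the implementation.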

To simulate the dynamical behavior of deformation over time,

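Newton's second law of motion is employed. Let $\mathbf{p}_i^t$ be the position of particle $i$ at time $t$:

$$\mathbf{v}_i^t = \dot{\mathbf{p}}_i^t, \qquad \mathbf{a}_i^t = \dot{\mathbf{v}}_i^t, \qquad m_i\mathbf{a}_i^t = \mathbf{f}_{ext,i}^t, \qquad (13)$$

where $m_i$, $\mathbf{f}_{ext,i}^t \in \mathbb{R}^3$, $\mathbf{a}_i^t \in \mathbb{R}^3$, and $\mathbf{v}_i^t \in \mathbb{R}^3$ are respectively the mass, external forces, acceleration, and velocity at time $t$, and $\dot{\mathbf{p}}_i^t = (\mathbf{p}_i^{t+\Delta t} - \mathbf{p}_i^t)/\Delta t$ and $\dot{\mathbf{v}}_i^t = (\mathbf{v}_i^{t+\Delta t} - \mathbf{v}_i^t)/\Delta t$ are first-order time derivatives of the position and velocity, respectively, which are approximated using finite differences. Then, according to these derivative approximations, in each time step $\Delta t$ the points move according to a time integration scheme. The simplest such scheme is explicit Euler integration:

$$\mathbf{p}_i^{t+\Delta t} = \mathbf{p}_i^t + \mathbf{v}_i^t\,\Delta t, \qquad (14)$$

$$\mathbf{v}_i^{t+\Delta t} = \mathbf{v}_i^t + \frac{1}{m_i}\mathbf{f}_{ext,i}^t\,\Delta t, \qquad (15)$$

where $\mathbf{v}_i^t$ and $\mathbf{p}_i^t$ are the velocity and position of point $i$ at time $t$.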

We remark that explicit Euler integration can cause problems,

such as unrealistic deformation behavior (e.g., overshooting).

There are other more stable integration schemes (e.g., implicit

integration, Verlet, Runge-Kutta; see Hauth et al., 2003) that can

be used.
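A minimal sketch of the explicit Euler update (14) and (15) for a single particle is given below, together with a small experiment that makes the overshooting problem visible. The undamped spring anchored at the origin, its parameter values, and the function names are assumptions chosen only for illustration.

```python
def explicit_euler_step(p, v, f_ext, mass, dt):
    """One explicit Euler step: position uses the old velocity, velocity the old force."""
    p_next = [pi + vi * dt for pi, vi in zip(p, v)]
    v_next = [vi + fi / mass * dt for vi, fi in zip(v, f_ext)]
    return p_next, v_next


def spring_energy(p, v, mass, stiffness):
    """Kinetic plus elastic energy of a particle tied to the origin by a spring."""
    kinetic = 0.5 * mass * sum(x * x for x in v)
    potential = 0.5 * stiffness * sum(x * x for x in p)
    return kinetic + potential


def simulate_spring(steps, dt, mass=1.0, stiffness=100.0):
    """Integrate an undamped spring starting from a unit displacement at rest."""
    p, v = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    for _ in range(steps):
        force = [-stiffness * x for x in p]
        p, v = explicit_euler_step(p, v, force, mass, dt)
    return p, v


p_end, v_end = simulate_spring(steps=100, dt=0.01)
energy_start = spring_energy([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 1.0, 100.0)
energy_end = spring_energy(p_end, v_end, 1.0, 100.0)
```

For this undamped spring the energy should stay constant, yet with explicit Euler it increases at every step, so the oscillation amplitude grows; reducing the time step only slows this growth, which is one reason the more stable schemes cited above are often preferred.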

Such a dynamical model can be represented simply as

$G(S_0, \mathbf{f}_{ext}, \theta)$, where input to the model $G$ includes the initial state of points $\mathbf{p}^0 \in S_0$ of the object and the external forces $\mathbf{f}_{ext}$; $\theta$ represents model parameters that could be related to material properties (e.g., $E$, $\upsilon$), as in constitutive models, to simulate desired deformations. Then, within $G$, the deformation is computed and the points $\mathbf{p}^t$ are iterated to time state $t$ using an integration scheme, such as (14) and (15).

3.2. Particle-Based Models

3.2.1. Particle Systems

In a particle system, a solid object shape $S$ is represented as a collection of $N$ particles (see section 2.4). These particles are initially in an equilibrium position, $\mathbf{p}_i^0 = \{x_i^0, y_i^0, z_i^0\} \in \mathbb{R}^3$, which can be regarded as the initial coordinates of each particle $i \in \{1, \dots, N\}$ (Figure 5A). When an external force is applied, the object deforms and the particles move to new coordinates $\mathbf{p}_i^t$ based on physics laws, in particular Newton's second law of motion (13), according to a time integration scheme, such as (14) and (15) (Figure 5B).

Although particles are usually used to model objects, such

as clouds or liquids, there are also particle frameworks for

simulation of solids. These frameworks are based on so-called

dynamically coupled particles that represent the volume of an

object (Tonnesen and Terzopoulos, 2000). The advantage of

particle systems is their simplicity, which allows simulation of

a huge number of particles to represent complex scenes. A

disadvantage of particle systems is that the surface is not explicitly

deﬁned. Therefore, maintaining the initial shape of the deforming

object is diﬃcult, and this can be problematic for applications,

such as tracking the return of elastic objects to their original

shape after deformation during robotic manipulation. Hence, for

objects that are supposed to maintain a given structure, particle-

based models with ﬁxed particle couplings are more appropriate,

such as models that employ meshes for shape representation.

3.2.2. Mass-Spring Systems

Mass-spring (MS) models use meshes for shape representation

(see section 2.4). In such a model $N$ particles are connected by

a network of springs (Figure 5C). As in particle systems, particle

motion is simulated using Newton’s second law of motion (13).


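However, there are other forces between the connected particles, say $i$ and $j$, that affect their motion, in particular the spring force $\mathbf{f}_s(\mathbf{p}_i) = k_s\left(|\mathbf{p}_j - \mathbf{p}_i| - l_{ij}\right)\frac{\mathbf{p}_j - \mathbf{p}_i}{|\mathbf{p}_j - \mathbf{p}_i|}$, where $k_s$ is the spring's stiffness and $l_{ij}$ is the rest length of the spring, and the damping force $\mathbf{f}_d(\mathbf{p}_i) = k_d(\mathbf{v}_j - \mathbf{v}_i)$ of the spring, where $k_d$ is the damping coefficient. Then, the equation of motion (13) becomes

$$m_i\mathbf{a}_i = \mathbf{f}_{ext}(\mathbf{p}_i) + \mathbf{f}_d(\mathbf{p}_i) + \mathbf{f}_s(\mathbf{p}_i). \qquad (16)$$

For the entire particle system, this can be expressed in matrix form as

$$M\mathbf{a} + D\mathbf{v} + K\mathbf{u} = \mathbf{f}_{ext}, \qquad (17)$$

where $M \in \mathbb{R}^{3N \times 3N}$, $D \in \mathbb{R}^{3N \times 3N}$, and $K \in \mathbb{R}^{3N \times 3N}$ are a diagonal mass matrix, a diagonal damping matrix, and a stiffness matrix for $n = 3$ dimensions. The MS system can be represented as a model $G_{MS}(S_0, \mathbf{f}_{ext}, \theta)$, where input to $G_{MS}$ consists of the initial state of the mesh shape $\mathbf{p}^0 \in S_0$, the external forces $\mathbf{f}_{ext}$, and the model parameters $\theta = \{k_s, k_d\}$, which can be changed (tuned) to determine object deformability.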
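The forces entering (16) can be sketched for one spring as follows. This is a minimal illustration: the function names are hypothetical, and the two particles are assumed to occupy distinct positions so that the spring direction is well defined.

```python
import math


def spring_force(p_i, p_j, k_s, rest_length):
    """Spring force on particle i: pulls towards j when stretched, pushes away when compressed."""
    delta = [b - a for a, b in zip(p_i, p_j)]
    length = math.sqrt(sum(x * x for x in delta))
    scale = k_s * (length - rest_length) / length  # assumes length > 0
    return [scale * x for x in delta]


def damping_force(v_i, v_j, k_d):
    """Damping force on particle i, proportional to the relative velocity."""
    return [k_d * (b - a) for a, b in zip(v_i, v_j)]


def total_force(f_ext, p_i, p_j, v_i, v_j, k_s, k_d, rest_length):
    """Right-hand side of Eq. (16) for a particle attached to a single spring."""
    f_s = spring_force(p_i, p_j, k_s, rest_length)
    f_d = damping_force(v_i, v_j, k_d)
    return [e + d + s for e, d, s in zip(f_ext, f_d, f_s)]


stretched = spring_force((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), k_s=10.0, rest_length=1.0)
```

In a full mesh, `total_force` would be summed over all springs attached to particle $i$ and then passed to a time integration scheme; the two parameters `k_s` and `k_d` are the quantities $\theta = \{k_s, k_d\}$ that have to be tuned.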

MS systems are a widely used type of physics-based

model for predicting and tracking object states during robotic

manipulation, since they are intuitive and computationally

eﬃcient (Schulman et al., 2013). However, the spring constants

are diﬃcult to tune according to the material properties to

obtain the desired deformation behavior. One way to overcome

this tuning problem is to use learning algorithms and reference

solutions (Bianchi et al., 2004; Morris and Salisbury, 2008;

Arriola-Rios and Wyatt, 2017). Another disadvantage of MS models is that, in their basic formulation, they cannot directly simulate volumetric effects, such as volume conservation. To simulate

such eﬀects, Teschner et al. (2004) introduced additional energy

formulations. Also, the behavior of an MS model is aﬀected

by the directions in which the springs are placed; to deal

with this issue, Bourguignon and Cani (2000) added virtual

springs to compensate for this eﬀect. In addition, Xu et al.

(2018) proposed a new method by introducing extra elastic

forces into the traditional MS model to integrate more complex

mechanical behaviors, such as viscoelasticity, non-linearity,

and incompressibility.

3.2.3. Neural Networks

Nurnberger et al. (1998) designed a method for controlling the

dynamics of an MS model using a recurrent neural network

(NN). Diﬀerent types of neurons are used to represent the

positions $\mathbf{p}$, velocities $\mathbf{v}$, and accelerations $\mathbf{a}$ of the mass points

(nodes) and the springs (spring nodes) of the mesh shape S

(Figure 5D). The diﬀerential equations governing the behavior

of the MS system are codiﬁed in the structure of the network.

The spring functions are used as activation functions for the

corresponding neurons. The whole system poses the simulation

as a problem of minimization of energy. The information is

propagated to the neurons in stages, starting from the mass points

where the applied force is greatest, and an equilibrium point

must be reached to obtain the new conﬁguration of the nodes

at each time t. The training is carried out with gradient descent

(backpropagation) for the NN. In addition, Zhang et al. (2019)

employed a convolutional neural network (CNN) to model

propagation of mechanical load using the Poisson equation

rather than an MS model.

The advantage of using an NN to control deformation

is the method’s ﬂexibility, such as being able to modify the

network structure during simulation (e.g., by removing springs

as in Nurnberger et al., 1998) and simulate large deformations

eﬃciently (e.g., Zhang et al., 2019).

3.2.4. Position-Based Dynamics

Particle systems and MS models are force-based models where,

based on given forces, the velocities and positions of particles are

determined by a time integration scheme. In contrast, position-

based dynamics (PBD) models compute the positions directly by

applying geometrical constraints in each simulation step. PBD

methods can be used for various purposes, such as simulating

liquids, gases, and melting or visco-elastic objects undergoing

topological changes (Bender et al., 2014). Here we focus on a

special PBD method, called meshless shape matching (MSM; see

Müller et al., 2005), that is used to simulate volumetric solid

objects while preserving their topological shape.

FIGURE 6 | (A) Meshless shape matching applied to a simple object consisting of $N = 4$ particles. (B.1) The method estimates the optimal linear transformation $A$ that allows the particles to move to the actual deformed positions $\mathbf{p}$ with respect to the rest state $\mathbf{p}^0$, as in (C). (B.2) Then $A$ is decomposed into a rotational (rigid) part $R$ and a symmetric (deformation) part $S$. (B.3) The $R$ transformation is used to simulate rigid motion; to simulate deformation, $A$ and $R$ are combined using a parameter, $\beta$.


In MSM, as in particle systems, an object is represented

by a set of $N$ particles without any connectivity (Figure 6A). Since there is no connectivity information between the particles, when they are disturbed by external forces (Figure 6B) they tend to adopt a configuration that does not respect the original shape $\mathbf{p}^0$ of the object. We call this disturbed configuration the intermediate deformed shape, $\bar{\mathbf{p}}_i = \mathbf{p}_i^{t-1} + \bar{\mathbf{v}}_i^t\,\Delta t$, where $\bar{\mathbf{v}}_i^t = \mathbf{v}_i^{t-1} + \mathbf{f}_{ext}\frac{\Delta t}{m_i}$. MSM calculates an optimal linear transformation, $A = \left(\sum_i m_i\mathbf{r}_i\mathbf{q}_i^T\right)\left(\sum_i m_i\mathbf{q}_i\mathbf{q}_i^T\right)^{-1}$, between the initial shape $\mathbf{p}^0$ and the intermediate deformed shape $\bar{\mathbf{p}}$ that allows preservation of the original shape of the object; here $\mathbf{r}_i = \bar{\mathbf{p}}_i - \bar{\mathbf{c}}$ and $\mathbf{q}_i = \mathbf{p}_i^0 - \mathbf{c}^0$, with $\mathbf{c} = \frac{1}{\sum_i^N m_i}\sum_i^N m_i\mathbf{p}_i$ being the center of mass of the object (Figure 6B.1–3). Then, the linear transformation $A$ is separated into rotational and symmetric parts: $A = RS$, where $R$ represents rigid behavior and $S$ represents deformable behavior.

Hence, to simulate rigid behavior, the goal (actual) position of the particles is

$$\mathbf{p}_i = R\mathbf{q}_i + \mathbf{t}, \qquad (18)$$

where $\mathbf{t} = \mathbf{c}$ is the translation of the object. If the object is deformable, then $S$ is also included and the goal position is

$$\mathbf{p}_i = R\big((1-\beta)I + \beta S\big)\mathbf{q}_i + \mathbf{t} = \big((1-\beta)R + \beta A\big)\mathbf{q}_i + \mathbf{t}, \qquad (19)$$

where $\beta$ is a control parameter that determines the degree of deformation coming from the $S$ matrix. If $\beta = 0$, (19) becomes (18). If $\beta$ approaches 1, then the range of deformation increases.

Subsequently, using the following integration scheme, the new position and velocity at time $t$ are updated:

$$\mathbf{p}_i^t = \bar{\mathbf{p}}_i + \alpha(\mathbf{p}_i - \bar{\mathbf{p}}_i), \qquad (20)$$

$$\mathbf{v}_i^t = (\mathbf{p}_i^t - \mathbf{p}_i^{t-1})/\Delta t, \qquad (21)$$

where $\alpha$ affects the stiffness of the model (similar to the MS model) and determines the speed of convergence of the intermediate positions to the goal positions. In simplest form this model can be represented as $G_{MSM}(S_0, \mathbf{f}_{ext}, \theta)$, where input to the model $G_{MSM}$ consists of the initial state $S_0$, external forces $\mathbf{f}_{ext}$, and model parameters $\theta = \{\beta, \alpha\}$, which can be tuned to decide the range of deformability and stiffness of the model.
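The shape matching steps above can be sketched in 2D, where the rotational part of $A$ has a simple closed form. This is a minimal illustration under stated assumptions: the object is planar, the rotation angle is obtained as $\mathrm{atan2}(A_{21} - A_{12}, A_{11} + A_{22})$ (valid when $A$ has a positive determinant), and the translation $\mathbf{t}$ is taken to be the center of mass of the intermediate shape. All function names are hypothetical.

```python
import math


def center_of_mass(points, masses):
    total = sum(masses)
    return [sum(m * p[a] for m, p in zip(masses, points)) / total for a in range(2)]


def shape_matching_goals(rest, intermediate, masses, beta):
    """Goal positions ((1 - beta) R + beta A) q_i + t for a 2D particle set."""
    c0 = center_of_mass(rest, masses)
    c = center_of_mass(intermediate, masses)
    q = [[p[0] - c0[0], p[1] - c0[1]] for p in rest]
    r = [[p[0] - c[0], p[1] - c[1]] for p in intermediate]
    # A = (sum m r q^T) (sum m q q^T)^-1
    a_rq = [[sum(m * ri[a] * qi[b] for m, ri, qi in zip(masses, r, q))
             for b in range(2)] for a in range(2)]
    a_qq = [[sum(m * qi[a] * qi[b] for m, qi in zip(masses, q))
             for b in range(2)] for a in range(2)]
    det = a_qq[0][0] * a_qq[1][1] - a_qq[0][1] * a_qq[1][0]
    inv = [[a_qq[1][1] / det, -a_qq[0][1] / det],
           [-a_qq[1][0] / det, a_qq[0][0] / det]]
    A = [[sum(a_rq[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Rotational part R of A = R S (2D closed form, assumption of this sketch).
    theta = math.atan2(A[1][0] - A[0][1], A[0][0] + A[1][1])
    R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
    T = [[(1.0 - beta) * R[i][j] + beta * A[i][j] for j in range(2)] for i in range(2)]
    return [[T[0][0] * qi[0] + T[0][1] * qi[1] + c[0],
             T[1][0] * qi[0] + T[1][1] * qi[1] + c[1]] for qi in q]


def msm_update(previous, intermediate, goals, alpha, dt):
    """Eqs. (20)-(21): move intermediate positions towards the goals, then derive velocities."""
    new_pos = [[pb[a] + alpha * (g[a] - pb[a]) for a in range(2)]
               for pb, g in zip(intermediate, goals)]
    new_vel = [[(pn[a] - pp[a]) / dt for a in range(2)]
               for pn, pp in zip(new_pos, previous)]
    return new_pos, new_vel


rest = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
stretched = [[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
masses = [1.0, 1.0, 1.0, 1.0]
rigid_goals = shape_matching_goals(rest, stretched, masses, beta=0.0)
soft_goals = shape_matching_goals(rest, stretched, masses, beta=1.0)
```

With $\beta = 0$ the goals reproduce the undeformed square placed at the new center of mass, so the stretch is rejected; with $\beta = 1$ the linear stretch is accepted in full, which matches the role of $\beta$ described above.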

The main advantages of PBD methods are their simplicity,

computational and memory-wise (i.e., not needing a mesh

model) eﬃciency, and scalability owing to their particle-based

parallel nature. Also, they are able to calculate more visually

plausible deformations than MS models. Hence, they have been

used in a wide range of interactive graphical applications (Tian

et al., 2013; Macklin et al., 2014), particularly for modeling

the deformation of human body parts (Zhu et al., 2008;

Sidorov and Marshall, 2014; Romeo et al., 2020), and robotic

manipulation tasks (Caccamo et al., 2016; Guler et al., 2017).

A disadvantage of PBD methods is that they simulate physical

deformation less accurately than constitutive models, since they

are geometrically motivated.

3.3. Constitutive Models

To simulate more physically accurate deformations, constitutive

models, which incorporate real physical material properties,

are used. In this subsection, we start by introducing

the most commonly used constitutive models, namely

ﬁnite element models, and then brieﬂy mention other

models that simplify ﬁnite element models to increase

computational eﬃciency.

3.3.1. Finite Element Method

The ﬁnite element method (FEM) aims to approximate the true

physical behavior of a deformable object by dividing its body into

smaller and simpler parts called ﬁnite elements. These elements

are connected through $N$ nodes that make up an irregular grid

mesh (Figure 7A). Thus, instead of particles, we work with

node displacements. The mesh deformation is calculated through

the displacement vector ﬁeld u. For simulation, an equation of

motion similar to (17) is used for an entire mesh. Usually, to

decrease the computational complexity, the dynamical parts of

the equation are skipped and the deformation is calculated for a

static state in equilibrium (a=v=0). Then, the relationship

FIGURE 7 | (A) An irregular grid as the mesh of a cubic deformed object in 3D using the finite element method (left) and an element $e$ of the mesh with its $N_e = 4$ nodes (right); the arrows show the displacement fields $\mathbf{u}_1 = \{u_{1,x}, u_{1,y}, u_{1,z}\}$ at node $i = 1$, and $\mathbf{u}_e$ is the nodal displacement vector of the element $e$. (B) An element with node $j$ and forces applied on three adjacent faces in the finite volume model. (C) A regular discrete mesh to be used in calculations of the finite difference method.


between a finite element $e$ and its $N_e$ nodes (e.g., $N_e = 4$ for a tetrahedron as in Figure 7A) can be expressed as

$$K_e\mathbf{u}_e = \mathbf{f}_e, \qquad (22)$$

where $\mathbf{f}_e \in \mathbb{R}^{3N_e}$ contains the $N_e$ nodal forces, $\mathbf{u}_e \in \mathbb{R}^{3N_e}$ is the displacement of an element between the actual and the deformed positions, and $K_e \in \mathbb{R}^{3N_e \times 3N_e}$ is the stiffness matrix of the element. The stiffness matrices of the different elements are assembled into a single matrix $K \in \mathbb{R}^{3N \times 3N}$ for an entire mesh with $N$ nodes:

$$K = \sum_e K_e, \qquad (23)$$

$$K\mathbf{u} = \mathbf{f}. \qquad (24)$$

The matrix $K$ relies on the nodal displacement $\mathbf{u} \in \mathbb{R}^{3N}$ and the constitutive material properties (e.g., $E$ and $\upsilon$) to compute the nodal forces $\mathbf{f} \in \mathbb{R}^{3N}$ of the entire mesh. Therefore, this huge matrix $K$ should be calculated at every time step $t$.

The FEM can be represented as a model $G_{FEM}(S_0, \mathbf{f}_{ext}, \theta)$, which takes as input the initial state $S_0$, external forces $\mathbf{f}_{ext}$, and constitutive model parameters $\theta$, such as $E$ and $\upsilon$. Then, within model $G_{FEM}$, the new positions and velocities of the nodes are updated using a time integration scheme, such as (14) and (15).

The FEM can produce physically realistic simulations and

model complex deformed conﬁgurations. Owing to these

properties, FEM models have been used in many robotics

applications, such as tracking (Essa et al., 1992; Petit et al.,

2018; Sengupta et al., 2019) and planning manipulation around

deformable objects (Frank et al., 2014). However, they can have a

heavy computational burden due to re-evaluating $K$ at each time step. This can be avoided by using linear FEM models where $K$ in (24) stays constant (see footnote 4). The drawback is that this assumption

limits the model to simulating only small deformations. Other

methods that can decrease computational complexity, such as

co-rotational FEM (Müller and Gross, 2004), can also be used.
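The assembly (23) and the static solve (24) can be sketched on the simplest possible case. The following is a minimal illustration, assuming a 1D bar made of two-node elements with stiffness $EA/L$ instead of tetrahedra, a constant (linear) $K$, and a small Gaussian elimination in place of a sparse solver. All function names are hypothetical.

```python
def element_stiffness(young, area, length):
    """2x2 stiffness matrix K_e of a two-node bar element (1D simplification)."""
    k = young * area / length
    return [[k, -k], [-k, k]]


def assemble(num_nodes, elements):
    """Eq. (23): add each K_e into the rows and columns of its own nodes."""
    K = [[0.0] * num_nodes for _ in range(num_nodes)]
    for nodes, k_e in elements:
        for a, row in zip(nodes, k_e):
            for b, value in zip(nodes, row):
                K[a][b] += value
    return K


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def solve_static(K, forces, fixed_nodes):
    """Eq. (24) with zero displacement imposed on the fixed nodes."""
    free = [i for i in range(len(forces)) if i not in fixed_nodes]
    reduced = [[K[a][b] for b in free] for a in free]
    u_free = solve_linear(reduced, [forces[a] for a in free])
    u = [0.0] * len(forces)
    for node, value in zip(free, u_free):
        u[node] = value
    return u


k_e = element_stiffness(young=1.0, area=1.0, length=1.0)
K = assemble(3, [((0, 1), k_e), ((1, 2), k_e)])
u = solve_static(K, forces=[0.0, 0.0, 5.0], fixed_nodes={0})
```

Because $K$ is assembled once and reused, this corresponds to the linear case; a non-linear model would have to repeat `assemble` at every time step, which is the cost discussed above.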

3.3.2. Finite Volume Method

In the ﬁnite volume method (FVM), instead of calculating the

nodal forces of the mesh shape $S$ individually as in FEM, the force per unit area with respect to a certain plane orientation is calculated. This is done by using the constitutive law for the computation of the stress tensor $\sigma$ (section 3.1). Then, the total force acting on face $i$ of a finite element can be calculated using the formula

$$\mathbf{f}_{A_i} = A_i\,\sigma\,\mathbf{n}_i, \qquad (25)$$

where $A_i$ is a scalar representing the area of face $i$ and $\mathbf{n}_i$ is its normal vector. To calculate the nodal forces, the forces of surfaces adjacent to node $j$ are summed and distributed evenly to each node (Teran et al., 2003; see Figure 7B):

$$\mathbf{f}_j = -\frac{1}{n}\left(\mathbf{f}_{A_1} + \mathbf{f}_{A_2} + \mathbf{f}_{A_3}\right). \qquad (26)$$

4 For a detailed tutorial on how to efficiently compute the $K$ matrix, we direct the reader to the notes of Müller et al. (2008).

The FVM model can be represented as $G_{FVM}(S_0, \mathbf{f}_{ext}, \theta)$, which takes as input the initial state $S_0$, in which areas $A$ and normals $\mathbf{n}$ can be calculated, the external forces $\mathbf{f}_{ext}$, and the constitutive model parameters $E$ and $\upsilon$. Then, within model $G_{FVM}$, the new positions $\mathbf{p}$ of nodes of $S_0$ are updated at each time step $t$ using a time integration scheme. Since this method is computationally more efficient than the FEM, it has been used in many computer graphics applications (Barth et al., 2018; Cardiff and Demirdžić, 2018). However, it restricts the types of deformation that can be

simulated, such as the deformation of irregular meshes.
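Equations (25) and (26) can be sketched for triangular faces as follows. This is a minimal illustration with hypothetical function names; it assumes that each face is a triangle whose vertex order defines the direction of the normal, and that the stress tensor is constant over the element.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def area_weighted_normal(v0, v1, v2):
    """A_i * n_i of a triangle: half the cross product of two edge vectors."""
    e1 = [b - a for a, b in zip(v0, v1)]
    e2 = [b - a for a, b in zip(v0, v2)]
    return [0.5 * c for c in cross(e1, e2)]


def face_force(stress, v0, v1, v2):
    """Eq. (25): force on a face, f = A * sigma * n."""
    an = area_weighted_normal(v0, v1, v2)
    return [sum(stress[i][j] * an[j] for j in range(3)) for i in range(3)]


def nodal_force(face_forces, n=3):
    """Eq. (26): negative sum of the adjacent face forces divided by n."""
    return [-sum(f[a] for f in face_forces) / n for a in range(3)]


# Hydrostatic pressure of magnitude 2 acting on a face lying in the xy-plane.
pressure = 2.0
stress = [[-pressure, 0.0, 0.0], [0.0, -pressure, 0.0], [0.0, 0.0, -pressure]]
force = face_force(stress, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

For the unit right triangle in the example the area is 0.5 and the normal points along $+z$, so the face force is directed along $-z$ with magnitude equal to pressure times area.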

3.3.3. Finite Difference Method

In the ﬁnite diﬀerence method (FDM), the volume of the object

is defined as a regular $M \times N \times P$ discrete mesh of nodes with horizontal, vertical, and stacked inter-node spacings $h_1$, $h_2$, and $h_3$, respectively (Figure 7C). The nodes are indexed as $[m, n, p]$, where $1 \le m \le M$ (parallel to the x-axis), $1 \le n \le N$ (parallel to the y-axis), and $1 \le p \le P$ (parallel to the z-axis), and $\mathbf{p}_{m,n,p} \in \mathbb{R}^3$ is the position of the node in 3D space. The object is deformed when an external force is applied. To calculate the nodal forces, a displacement vector $\mathbf{u}$ should be calculated using spatial derivatives. This is done by defining finite difference operators between the new node positions in the deformed mesh. For example, for $\mathbf{p}_{m,n,p}$ the first-order finite difference operator along the x-axis can be defined as $d_x(\mathbf{p}_{m,n,p}) = (\mathbf{p}_{m+1,n,p} - \mathbf{p}_{m,n,p})/h_1$. Using the finite difference operators, the nodal forces are calculated and the deformation of the object can be computed as in the FEM (section 3.3.1).
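The forward difference operator above can be sketched on a small regular grid. This is a minimal illustration with hypothetical function names; note that the code uses zero-based indices, whereas the text indexes nodes from 1.

```python
def make_grid(M, N, P, h1, h2, h3, stretch_x=1.0):
    """Node positions p[m][n][p] of a regular grid, optionally stretched along x."""
    return [[[[stretch_x * m * h1, n * h2, p * h3] for p in range(P)]
             for n in range(N)]
            for m in range(M)]


def forward_difference_x(positions, m, n, p, h1):
    """d_x(p_{m,n,p}) = (p_{m+1,n,p} - p_{m,n,p}) / h1, valid for m < M - 1."""
    ahead = positions[m + 1][n][p]
    here = positions[m][n][p]
    return [(a - b) / h1 for a, b in zip(ahead, here)]


rest_grid = make_grid(3, 3, 3, 0.5, 0.5, 0.5)
deformed_grid = make_grid(3, 3, 3, 0.5, 0.5, 0.5, stretch_x=1.5)
rest_dx = forward_difference_x(rest_grid, 1, 1, 1, 0.5)
deformed_dx = forward_difference_x(deformed_grid, 1, 1, 1, 0.5)
# Displacement gradient along x: deformed operator minus the rest operator.
du_dx = [a - b for a, b in zip(deformed_dx, rest_dx)]
```

On the undeformed grid the operator returns the unit vector along x, so the difference between the deformed and rest values gives the x-derivative of the displacement field, here a uniform stretch of 0.5.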

The FDM is one of the alternative methods suggested

for decreasing the computational complexity of the FEM

(Terzopoulos et al., 1987). A disadvantage of this method is that

it is more diﬃcult to approximate the boundaries of objects using

a regular grid for the mesh (Nealen et al., 2006), and hence the

accuracy is decreased.

3.3.4. Boundary Element Method

The boundary element method (BEM) computes the

deformation of $S$ by calculating the equation of motion (17) over a surface rather than over a volume as in the FEM. The boundary (surface) $S$ is discretized into a set of $N$ non-overlapping elements (e.g., mesh elements) $e$, whose node coordinates $\mathbf{p}_i$, $i = 1, \dots, N$, are the centroids of the elements. These elements represent displacements and tractions, and $S_u$ and $S_r$ are surface parts where the displacement and traction boundary conditions are defined, respectively.

The BEM provides a signiﬁcant speedup compared to the

FEM because it requires fewer nodes and elements. However, it

only works for objects whose interior consists of homogeneous

material. It has been used in the ArtDefo System (James and Pai,

1999) to simulate volumetric models in real-time. Also, it has

been used to improve tracking accuracy against occlusions and

spurious edges by Greminger and Nelson (2008).

3.3.5. Long Elements Method

In the long elements method (LEM), a solid object is considered

to be ﬁlled with incompressible ﬂuid as in biological tissues. The

volume of object shape Sis discretized into Cartesian meshes (i.e.,


FIGURE 8 | (A) In the boundary element method, the deformable object's shape $S$ is discretized into a set of elements $e$; $S$ is disturbed by a force that displaces the centroid node coordinates $\mathbf{p}$, and the deformation is calculated based on the displacement and traction boundary conditions defined on $S_u$ and $S_r$, respectively. (B) A virtual object discretized as Cartesian meshes according to the long elements method. Here, for the sake of simplicity, we show a 2D object with only two axes ($i$ and $j$). On the left and right figures are meshes of the object discretized with long elements $L_i$ and $L_j$ parallel to the $i$ and $j$ axes, respectively. External pressure is applied at a particular point on the surface of the object, and the resulting force $f$ on the particle (red dot) is calculated using the deformations $\Delta l_i$ of long elements crossing the particle.

one mesh for each axis), and each mesh contains long elements

(LEs) $e \in \{1, \dots, N_i\}$, where $N_i$ is the number of elements in the mesh shape $S_i$ discretized parallel to axis $i$ (Figure 8B). The crossings of the LEs of the different axes define cells, each of which contains a particle. By calculating the state of these particles (e.g., position $\mathbf{p}$ and velocity $\mathbf{v}$), the deformation of the object is simulated.

The state of each particle is calculated using the laws of fluid mechanics (e.g., Pascal's law). Then, a system of linear equations for each $e$ that fills the object volume is created. By solving this system of equations by numerical methods, the deformation of $e$, $\Delta l_i$, is calculated. From $\Delta l_i$ the forces occurring due to deformation are computed. Here, the LE is regarded as a spring attached to a particle with known mass. As an example, in Figure 8B, pressure is applied to an object. As a result of this pressure, a force $f_i$ acts on the particle along the $i$th axis and is calculated using the displacements of the crossing LEs attached to the particle:

$$f_i = k_{L_i}(\Delta l_i - \Delta l_{i'}) + k_{L_j}(\Delta l_j - \Delta l_{j'}), \qquad (27)$$

where $k$ is the spring constant of the corresponding long element. Therefore, an LEM model can be represented as $G_{LEM}(S_i, S_j, \mathbf{f}_{ext}, \theta)$, which takes as input meshes of axes $i$ and $j$ for 2D space, external forces $\mathbf{f}_{ext}$, and model parameters, such as spring constants that determine object deformability. Subsequently, the force obtained is used to calculate the velocities and positions of the particles along each axis.

The LEM was developed for modeling soft tissues, especially

for surgical simulation (Balaniuk and Salisbury, 2002). It uses

a smaller number of elements than tetrahedral (e.g., FEM)

and cubic (e.g., FDM) meshing, so the computational complexity

of the model is reduced as well. It is therefore capable

of interactive real-time soft tissue simulation for haptic and

graphic applications, such as robotic surgery. However, it

provides only an approximation of real physical deformation

and so presents a trade-oﬀ between physical accuracy and

computational eﬃciency.

3.4. Approximations of Constitutive Models

3.4.1. Modal Analysis

What makes constitutive models, such as the FEM, expensive is the calculation of the motion with the large matrices $M$, $D$, and $K$ in Equation (17); for example, with $N = 20$ nodal points of a mesh shape, each with coordinates $\mathbf{p} = (x, y, z)$, the calculation would involve three matrices of size $60 \times 60$. Pentland and Williams (1989) proposed a way

of reducing this computational complexity based on a method

called modal analysis. Modal analysis is used for identifying an

object’s vibrational modes (Figure 9A) by decoupling (17). This

is done by using linear algebraic formulations (Nealen et al.,

2006). Below, we outline the steps of modal analysis using these

formulations, while skipping the detailed derivations.

First, the matrices are diagonalized by solving the following

eigenvalue problem (i.e., a whitening transform):

$$M \Phi \Lambda = K \Phi, \tag{28}$$

where $\Lambda$ and $\Phi$ are matrices containing the eigenvalues and

eigenvectors of $M^{-1}K$. Then, the eigenvectors in $\Phi$ are used to

transform the displacement vector $u$:

$$u = \Phi q. \tag{29}$$

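By substituting (29) into (17) and multiplying by $\Phi^T$, the

following system of equations is constructed:

$$\Phi^T M \Phi \ddot{q} + \Phi^T D \Phi \dot{q} + \Phi^T K \Phi q = \Phi^T f_{ext}, \tag{30}$$

$$\tilde{M} \ddot{q} + \tilde{D} \dot{q} + \tilde{K} q = \tilde{f}_{ext}, \tag{31}$$

where $\tilde{M}$, $\tilde{D}$, and $\tilde{K}$ are all diagonal matrices. This generates

$3N$ independent equations of motion for the modes:

$$\tilde{M}_i \ddot{q}_i + \tilde{D}_i \dot{q}_i + \tilde{K}_i q_i = \tilde{f}_{ext,i}. \tag{32}$$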


FIGURE 9 | (A) A simple 2D rectangular object in different deformation modes in modal analysis: the upper shape is a deformation mode in response to compression,

and the middle and lower shapes are deformation modes in response to bending forces from different directions. (B) Active contour $S(u)$ controlled by external ($E_{ext}$)

and shape ($E_{shape}$) energies that attract or repel according to the shape of the object (e.g., in an image); as the object deforms, the position of $S(u)$ is updated iteratively.

Then, (32) can be solved analytically for $q_i$ to compute the

motion of each mode $i \in \{1, 2, \ldots, N\}$. The matrix $\Phi$ contains

a different mode shape in each of its columns (Figure 9A), i.e.,

$\Phi = [\Phi_1, \Phi_2, \ldots, \Phi_N]$. Hence, by analyzing the eigenvalues in

$\Lambda$, high-frequency modes in $\Phi$ can be eliminated so that only the

most dominant modes are updated using (32). This reduces the

number of equations in (32) and hence lowers the computational

cost significantly.
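The steps of modal analysis outlined above can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration: the toy 4-node spring chain, the Rayleigh damping $D = aM + bK$ (chosen so that $D$ is diagonalized by the same basis), the helper name `modal_basis`, and the use of a static per-mode solve instead of a full time integration of (32).

```python
import numpy as np


def modal_basis(M, K):
    """Solve K Phi = M Phi Lambda for symmetric positive-definite M, symmetric K.

    Returns the eigenvalues in ascending order and the mass-normalized
    mode shapes as the columns of Phi.
    """
    L = np.linalg.cholesky(M)          # M = L L^T
    L_inv = np.linalg.inv(L)
    A = L_inv @ K @ L_inv.T            # equivalent symmetric standard problem
    lam, V = np.linalg.eigh(A)
    Phi = L_inv.T @ V                  # Phi^T M Phi = I, Phi^T K Phi = diag(lam)
    return lam, Phi


# Toy system (assumed): chain of 4 masses joined by springs, both ends fixed.
n = 4
M = np.diag([1.0, 2.0, 1.5, 1.0])
K = 100.0 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
D = 0.01 * M + 0.001 * K               # Rayleigh damping (assumed)

lam, Phi = modal_basis(M, K)
M_t, D_t, K_t = (Phi.T @ X @ Phi for X in (M, D, K))   # all diagonal

# Keep only the r lowest-frequency modes and solve each mode independently.
r = 2
Phi_r = Phi[:, :r]
f_ext = np.array([0.0, 1.0, 0.0, 0.0])
q_static = (Phi_r.T @ f_ext) / lam[:r]  # static limit of (32): K~_i q_i = f~_i
u_reduced = Phi_r @ q_static            # approximate displacement, u = Phi q
u_full = np.linalg.solve(K, f_ext)      # reference solution with all modes
```

Comparing `u_reduced` with `u_full` for increasing `r` shows how much of the response the retained low-frequency modes capture; with `r = n` the two coincide.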

In modal analysis, as in constitutive models, taking $K$ to be

constant can increase the computational efficiency. However,

this assumption is valid only when simulating small linear

deformations and leads to errors when dynamically simulating

large deformations. To overcome this problem, some methods

use different formulations of the strain tensor (e.g., the Green

strain as in Barbič and James, 2005) to enable simulation of larger

non-linear deformations (An et al., 2008; Pan and Manocha,

2018), or adopt more data-driven approaches (e.g., by employing

CNNs as in Fulton et al., 2019).

3.4.2. Active Contours

Active contour (AC) models are approximations of constitutive

models, such as the FEM. They were first introduced by Kass et al.

(1988) in the form of the snakes model. In their simplest form

they can be described as a function of a spline shape (section

2.2.1), $S(u) \in \mathbb{R}^n$ for $u \in [0, 1]$, in an $n$-dimensional space,

for example $S(u) = \{x(u), y(u)\} \in \mathbb{R}^2$ in an image $I: \mathbb{R}^2 \to \mathbb{R}$

(Figure 9B). This spline is fitted to the shape of an object in the

image by minimizing the following energy formulation:

$$E_{snake} = \int_0^1 E_{ext}(S(u)) + E_{shape}(S(u)) \, du. \tag{33}$$

Here $E_{ext}$ depends on the contour position with respect to an

attractor function $f$:

$$E_{ext} = f(S(u)), \tag{34}$$

where in the $n = 2$ case the function $f(x, y)$ could be the image

intensity $I(x, y)$, which would attract the snake to the brightest

regions, or an edge detector, which would attract the snake to the

edges (Moore and Molloy, 2007).

In (33), $E_{shape}$ is the internal energy of the contour, which

depends on the shape of the contour:

$$E_{shape} = \alpha(u) |S'(u)|^2 + \beta(u) |S''(u)|^2. \tag{35}$$

The first-order term $|S'(u)|^2$ controls the length of the

contour, and the goal is to minimize the total length. The

second-order term $|S''(u)|^2$ controls the smoothness of the contour;

this term enables the contour to resist stretching or bending

by external forces due to $f(S(u))$ and is used to regularize the

contour. The weight parameters $\alpha$ and $\beta$ determine elasticity and

rigidity, respectively.

The energy $E_{snake}$ can be discretized into $N$ parts as $s_i(u)$, where

the $u_i = ih$, for $i \in \{1, \ldots, N\}$, are knots and $h = 1/N$:

$$E^*_{snake} = \sum_{i=1}^{N} E_{ext}(s_i(u)) + E_{shape}(s_i(u)). \tag{36}$$

Then, from the discretization, the derivatives in $E_{shape}$ can be

approximated using finite difference operators:

$$E_{shape}(i) = \alpha_i \frac{|s_i(u) - s_{i-1}(u)|^2}{2h^2} + \beta_i \frac{|s_{i-1}(u) - 2s_i(u) + s_{i+1}(u)|^2}{2h^4}. \tag{37}$$
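As an illustration of the discretized energy, the following sketch evaluates the sum in (36) with the finite-difference shape energy of (37) for a contour given as a list of knots. Several simplifications are assumed here and are not part of the original formulation: the contour is closed (indices wrap around), the weights $\alpha_i$ and $\beta_i$ are constant along the contour, and the attractor $f$ is passed in as an ordinary Python function.

```python
import numpy as np


def snake_energy(points, f, alpha=1.0, beta=1.0):
    """Discrete snake energy for a closed contour.

    points: (N, 2) array of knots s_i; f: attractor function f(x, y).
    alpha, beta: constant weights (an assumed simplification of alpha_i, beta_i).
    """
    n = len(points)
    h = 1.0 / n
    prev = np.roll(points, 1, axis=0)     # s_{i-1}, wrapping around
    nxt = np.roll(points, -1, axis=0)     # s_{i+1}, wrapping around
    first = np.sum((points - prev) ** 2, axis=1) / (2.0 * h ** 2)
    second = np.sum((prev - 2.0 * points + nxt) ** 2, axis=1) / (2.0 * h ** 4)
    e_shape = alpha * first + beta * second
    e_ext = np.array([f(x, y) for x, y in points])
    return float(np.sum(e_ext + e_shape))


def circle(radius, n=40):
    """Knots of a circle, used here only as a test contour."""
    t = 2.0 * np.pi * np.arange(n) / n
    return radius * np.column_stack([np.cos(t), np.sin(t)])


no_attractor = lambda x, y: 0.0
e_big = snake_energy(circle(1.0), no_attractor)
e_small = snake_energy(circle(0.5), no_attractor)
```

With no external attractor, the shape energy alone favours the smaller circle (here `e_small` is one quarter of `e_big`), which is the shrinking behaviour that the external energy has to counterbalance near the object boundary.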

To find the contour that minimizes the total energy, $E^*_{snake}$ is

minimized. The resulting expression is then put into matrix form

and used to update the position of the contour iteratively in time

by using a time integration scheme as demonstrated in Kass et al.

(1988). To represent an AC in 3D, some additional parameters are

included in the shape energy formulation: the elasticity parameter


FIGURE 10 | The general schema for learning: (A) A type of model and ground truth are selected, such as 2D images used as ground truth to calibrate a mass-spring

model. (B) If the model’s parameters have not been calibrated correctly, there will be a difference between the observed ground truth and the simulation; the observed

ground truth is shown in red, the simulation in blue, and their intersection in yellow. (C) When the model is properly calibrated, it should be able to predict the behavior

of the object if given the corresponding interaction parameters, such as external forces (e.g., contact forces of the manipulator) or geometric constraints (e.g., not

crossing the floor).

$\beta$ is defined along the third axis as well, and an extra parameter is

added to control the resistance to twisting (Ahlberg, 1996).

AC models have been widely used, especially in medical

imaging, for motion tracking and shape registration tasks

(Williams and Shah, 1992; Leventon et al., 2000; Das and

Banerjee, 2004), and they can also be combined with constitutive

models to achieve greater physical accuracy (Luo and Nelson,

2001). The main disadvantage of AC models is their reliance on a

good initialization of the snake contour near the desired shape in

the image. To overcome this drawback, attractor functions other

than image intensity, such as edge maps, have been proposed in

recent years (e.g., Nisirat, 2019).

This concludes our tutorial-style description of models to

provide some technical grounding in the basic mathematical

approaches to deformable object modeling. The content of

sections 4 (learning and estimation) and 5 (planning and control)

builds on the models we have described. These sections are

written in the form of broad surveys, as there is such a wide range

of different approaches that we cannot cover them all in depth.

4. LEARNING AND ESTIMATION OF MODEL PARAMETERS

In the previous sections, we introduced computational models of

deformable objects that have numerous applications. However,

for the models to be useful, several parameters must be

known beforehand (Figure 1vi), so these models should be

calibrated carefully. In this section we give an overview of some

representative cases of applying various learning algorithms,

which can make the calibration process autonomous (Figure 10).

The methods we review can be grouped into three types of

strategies: (a) estimating parameters directly, which is rarely

feasible (section 4.1); (b) calibrating known physics-based

models, $G$, automatically by allowing the robot to take some

measurements that will help to determine the values of model

parameters, $\theta$ (section 4.2.2); and (c) approximating new

functions that describe the dynamics, as is done with neural

networks (section 4.2.4).

4.1. Direct Estimation

For some models, it is possible to derive a formula to directly

calculate the parameters. For example, Gelder (1998) obtained a

formula for the parameter $k_s$ of an MS model (section 3.2.2) in

a static state, where the materials are non-uniform but isotropic.

An isotropic material is a material whose local deformation in

response to force is independent of the direction in which the

force is applied. However, in a non-uniform material the response

varies with the position where the force is applied. For 3D

tetrahedral meshes, the spring constant $k_s$ can be obtained from

the formula

$$k_s = \frac{E \sum_e V_e}{|c|^2}, \tag{38}$$

where the sum is over the volumes $V_e$ of the tetrahedral elements $e$ of

a 3D mesh shape $S_0$ that share the edge $c$. Young's modulus, $E$, is chosen

empirically to give the desired amount of elasticity.
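A direct evaluation of Equation (38) might look like the sketch below. The mesh layout (a vertex array plus a list of 4-index tetrahedra), the function names, and the numerical values are assumptions made for illustration; the sum is taken over the tetrahedra that contain both end points of the edge.

```python
import numpy as np


def tet_volume(p0, p1, p2, p3):
    """Volume of a tetrahedron from its four vertices."""
    return abs(np.linalg.det(np.array([p1 - p0, p2 - p0, p3 - p0]))) / 6.0


def spring_constant(vertices, tets, edge, young_modulus):
    """k_s = E * (sum of volumes of tetrahedra sharing the edge) / |c|^2."""
    a, b = edge
    length_sq = float(np.sum((vertices[a] - vertices[b]) ** 2))
    volume = sum(tet_volume(*vertices[list(t)])
                 for t in tets if a in t and b in t)
    return young_modulus * volume / length_sq


# Two tetrahedra glued along the face (0, 1, 2); illustrative values only.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [0.0, 0.0, -1.0]])
tets = [(0, 1, 2, 3), (0, 1, 2, 4)]

k_shared = spring_constant(vertices, tets, edge=(0, 1), young_modulus=3000.0)
k_single = spring_constant(vertices, tets, edge=(0, 3), young_modulus=3000.0)
```

The edge (0, 1) belongs to both tetrahedra and therefore receives twice the stiffness of the edge (0, 3), which belongs to only one; this is how the formula accounts for the amount of material surrounding each spring.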

Direct estimation is a computationally efficient method.

However, often it is not possible to do such calculations for

models that rely on complex constitutive material laws as in

the FEM.

4.2. Minimizing Error

This group of methods relies on the definition of an error

function $\mathrm{Err}(p_\theta, \hat{p}) = \mathrm{dist}(p_\theta, \hat{p})$ that measures the difference

(e.g., Euclidean distance) between the deformation of some

ground truth $\hat{p}$ and the simulated virtual deformable object

position $p_\theta \in G(S_0, f_{ext}, p_c, \theta)$, where $p_c$ is the point of

contact. The ground truth $\hat{p}$ can be obtained from camera

observations of a real-world deformable object or from another,

more reliable, simulation, usually an FEM simulation. Then,

$p_\theta \in S_\theta$ is simulated with various $\theta$ values and the same interaction

parameters as in the ground truth observations, such as the

contact forces $f_{ext}$ and positions $p_c \in S_\theta$ of the manipulator

or geometric constraints (boundary conditions) like the object

not crossing the bottom surface. The objective of the learning

algorithm is to find a set of parameters $\theta$ for the model $G$

that minimizes the error function $\mathrm{Err}(p_\theta, \hat{p})$ with the given

interaction parameters.
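The calibration loop described above can be sketched with a deliberately simple stand-in for the model $G$. The simulator below (a static 1D chain of identical springs pulled at its free end), the exhaustive search over candidate values, and all names and numbers are hypothetical choices made for illustration; the methods surveyed in this section use richer models and more elaborate search or learning strategies.

```python
import numpy as np


def simulate(k, f_ext, n_nodes=5):
    """Hypothetical stand-in for G: node positions of a static spring chain.

    Each spring stretches by f_ext / k, so node j moves by j * f_ext / k.
    """
    rest = np.arange(n_nodes, dtype=float)
    return rest + np.arange(n_nodes) * f_ext / k


def err(p_theta, p_hat):
    """Euclidean distance between simulated and ground-truth positions."""
    return float(np.linalg.norm(p_theta - p_hat))


def calibrate(p_hat, f_ext, candidates):
    """Return the candidate parameter with the smallest error, and that error."""
    errors = [err(simulate(k, f_ext), p_hat) for k in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]


# Ground truth generated by the same toy model with k = 40 (illustrative only;
# in practice it would come from camera observations or an FEM simulation).
p_hat = simulate(40.0, f_ext=2.0)
candidates = np.linspace(10.0, 100.0, 91)     # 10, 11, ..., 100
k_best, e_best = calibrate(p_hat, 2.0, candidates)
```

Because the same interaction parameter (`f_ext`) is used for the ground truth and for every simulated candidate, the error depends only on the model parameter being calibrated.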
