All content in this area was uploaded by Ion Matei on Feb 19, 2020


Inferring Particle Interaction Physical Models and Their Dynamical Properties

Ion Matei, Christos Mavridis, John S. Baras and Maksym Zhenirovskyy

Abstract— We propose a framework based on the port-Hamiltonian modeling formalism aimed at learning interaction models between particles (or networked systems) and dynamical properties, such as trajectory symmetries and conservation laws, of the ensemble (or swarm). The learning process is based on approaches and platforms used for large-scale optimization and uses features such as automatic differentiation to compute gradients of optimization loss functions. We showcase our approach on the Cucker-Smale particle interaction model, which is first represented in port-Hamiltonian form and for which we re-discover the interaction model and learn dynamical properties that were previously proved analytically. Our approach has the potential to discover novel particle cooperation rules that can be extracted and used in cooperative control system applications.

I. Introduction

Extracting physical laws that govern a given system from data is a central challenge in many diverse areas of science and engineering. Most complex systems can be described as discrete structures (graphs) with dynamical relations [1]. Such networked systems are ubiquitous and include multi-body systems, chemical reaction networks, animal and UAV swarms, and power systems. A fundamental challenge in complex networked systems is to infer the laws of interaction between particles and their dynamical properties [1]. The problem has been approached either by using statistical learning [2], [3], or by learning the parameters of equations modeling the system. In [4] symbolic equations are generated from numerically calculated derivatives of the system variables. In [5], [6] the constitutive equations of the physical components of the system are learned using acausal representations, while in [7] the order of fractional differential equations modeling the system is estimated.

A general and powerful geometric framework for modeling complex dynamical networked systems is the port-Hamiltonian modeling formalism [8], [9], [10]. Port-Hamiltonian systems are based on a known energy function (Hamiltonian) and the interconnection of atomic structure elements (e.g. inertias, springs and dampers for mechanical systems) that interact by exchanging energy. They provide an energy-consistent description of a physical system, with the property that a power-conserving interconnection of port-Hamiltonian systems is again a port-Hamiltonian system [11].

Ion Matei and Maksym Zhenirovskyy are with the Palo Alto Research Center (PARC), Palo Alto, CA (emails: ion.matei@parc.com, maksym.zhenirovskyy@parc.com). Christos Mavridis and John S. Baras are with the Department of Electrical and Computer Engineering and the Institute for Systems Research, University of Maryland, College Park, MD (emails: mavridis@umd.edu, baras@isr.umd.edu).

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00111990027.

In addition, the port-Hamiltonian framework is particularly suited for finding symmetries and conserved quantities [12]. In particular, it allows one to find conserved quantities other than the Hamiltonian, called Casimir functions [9], by examining conditions related to the port-Hamiltonian system at hand, which can lead to model simplification (reduction). Moreover, finding parameterized symmetries, e.g., Lie groups of transformations, can enable data generation without experimentation, as well as provide insight into the modeling equations of the system itself [13], [14].

In this work, we are interested in models describing the dynamics of swarms or particle ensembles (e.g. bird flocks), which have been studied intensively over the years [15], [16], [17]. We model the system of interacting particles as a graph topology based on port-Hamiltonian components, and investigate its dynamical properties, such as discrete symmetries of trajectories, Lie groups of invariance transformations and conservation laws. To showcase our approach we use the Cucker-Smale (CS) model [16] to generate training and test data for the learning tasks. We apply large-scale optimization methods implemented on deep learning platforms to learn the particle interaction model from data and recover its dynamical properties. Finally, we compare the results of our method with those derived from the theoretical analysis.

The rest of the manuscript is organized as follows: Section II introduces the CS dynamical model for particle interactions, the port-Hamiltonian formalism, discrete symmetries and Lie groups of invariance transformations. Section III describes the port-Hamiltonian representation of the CS interaction model. In Section IV we prove theoretical results on discrete symmetries, Lie groups of invariance transformations and conservation laws based on Casimir functions. Section V describes optimization-based learning algorithms that are used to recover the particle interaction model, the discrete and Lie symmetry maps and the conserved quantities. Finally, Section VI concludes the paper.

II. Preliminaries

In this section we first describe the CS dynamical model used to showcase our approach, give a brief description of the port-Hamiltonian formalism, and introduce the notions of symmetries and Lie groups of invariance transformations.

A. Cucker-Smale Particle Interaction Model

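Let $i$ denote a particle in an ensemble of $N$ particles. The CS particle interaction model [16] is given by $\dot x_i = v_i$ and $\dot v_i = \frac{1}{N}\sum_{j=1}^{N} G(\|x_i - x_j\|)(v_j - v_i)$, where a typical choice for the interaction function $G$ is $G(r) = \frac{1}{(1+r^2)^{\gamma}}$. These dynamics ensure velocity alignment of all particles [16], [17]. An extension of the original model is obtained by adding a potential function [16], [17], resulting in the dynamics $\dot x_i = v_i$ and

$$\dot v_i = \frac{1}{N}\sum_{j=1}^{N} G(\|x_i - x_j\|)(v_j - v_i) - \frac{1}{N}\sum_{j \neq i} \nabla U(\|x_i - x_j\|),$$

where the gradient is taken with respect to $x_i$ and the potential function takes the form $U(r) = -C_A e^{-r/l_A} + C_R e^{-r/l_R}$, with $C_A, C_R, l_A, l_R$ positive scalars. The above model can be compactly written as

$$\dot x = v, \tag{1}$$

$$\dot v = \mathcal{G}(x)\, v - \nabla U(x), \tag{2}$$

where $[x]_i = x_i$, $[v]_i = v_i$, $[\mathcal{G}(x)]_{i,i} = -\frac{1}{N}\sum_{j \neq i} G(\|x_i - x_j\|)$, $[\mathcal{G}(x)]_{i,j} = \frac{1}{N} G(\|x_i - x_j\|)$ for $i \neq j$, and $[\nabla U(x)]_i = \frac{1}{N}\sum_{j \neq i} \nabla U(\|x_i - x_j\|)$.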
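To make (1)-(2) concrete, here is a minimal pure-Python sketch of the right-hand side in one spatial dimension. The potential parameters `C_A`, `C_R`, `L_A`, `L_R` are hypothetical illustrative values (they are not reported here), while γ = 0.15 is the value used in Section V. The script integrates the model without the potential using forward Euler and prints the velocity spread before and after, which should shrink as the particles align.

```python
import math
import random

GAMMA = 0.15                               # interaction exponent used in Section V
C_A, C_R, L_A, L_R = 1.0, 0.5, 2.0, 0.5    # hypothetical potential parameters


def G(r):
    """Interaction function G(r) = 1 / (1 + r^2)^gamma."""
    return 1.0 / (1.0 + r * r) ** GAMMA


def dU_dr(r):
    """Derivative of U(r) = -C_A exp(-r/l_A) + C_R exp(-r/l_R)."""
    return (C_A / L_A) * math.exp(-r / L_A) - (C_R / L_R) * math.exp(-r / L_R)


def cs_rhs(x, v, with_potential=True):
    """Return (dx/dt, dv/dt) for the 1-D Cucker-Smale model (1)-(2)."""
    n = len(x)
    dv = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = x[i] - x[j]
            dv[i] += G(abs(d)) * (v[j] - v[i]) / n            # alignment term
            if with_potential and d != 0.0:
                # gradient of U(|x_i - x_j|) with respect to x_i
                dv[i] -= dU_dr(abs(d)) * math.copysign(1.0, d) / n
    return list(v), dv


def spread(v):
    """Velocity spread max(v) - min(v)."""
    return max(v) - min(v)


random.seed(0)
x = [random.uniform(0, 10) for _ in range(5)]
v = [random.uniform(0, 10) for _ in range(5)]
v0_spread = spread(v)
dt = 0.01
for _ in range(2000):                                # forward Euler up to t = 20
    dx, dv = cs_rhs(x, v, with_potential=False)
    x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    v = [vi + dt * dvi for vi, dvi in zip(v, dv)]
print(v0_spread, spread(v))
```

Because each pairwise term enters once with each sign, the velocity derivatives sum to zero with or without the potential, which is the conservation property stated in Proposition 4.5.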

B. Port-Hamiltonian Systems

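Consider a finite-dimensional linear state space $\mathcal{X}$ along with a Hamiltonian $H: \mathcal{X} \to \mathbb{R}_+$ defining energy storage, and a set of pairs of effort and flow variables $\{(e_i, f_i) \in \mathcal{E}_i \times \mathcal{F}_i,\ i \in \{S, R, P\}\}$, describing ports (ensembles of elements) that interact by exchanging energy. Then, the dynamics of a port-Hamiltonian system $\Sigma = (\mathcal{X}, H, \mathcal{S}, \mathcal{R}, \mathcal{P}, \mathcal{D})$ are defined by a Dirac structure $\mathcal{D}$ [9], [10] as

$$(f_S, e_S, f_R, e_R, f_P, e_P) \in \mathcal{D} \iff e_S^T f_S + e_R^T f_R + e_P^T f_P = 0,$$

where (i) $\mathcal{S} = (f_S, e_S) \in \mathcal{F}_S \times \mathcal{E}_S = \mathcal{X} \times \mathcal{X}$ is an energy-storing port, consisting of the union of all the energy-storing elements of the system (e.g. inertias and springs in mechanical systems), satisfying $f_S = -\dot x$, $e_S = \frac{\partial H}{\partial x}(x)$, $x \in \mathcal{X}$, such that $\frac{d}{dt}H = -e_S^T f_S = e_R^T f_R + e_P^T f_P$; (ii) $\mathcal{R} = (f_R, e_R) \in \mathcal{F}_R \times \mathcal{E}_R$ is an energy-dissipation (resistive) port, consisting of the union of all the resistive elements of the system (e.g. dampers in mechanical systems), satisfying $\langle e_R, f_R\rangle \leq 0$ and, usually, an input-output relation $f_R = -R(e_R)$; (iii) $\mathcal{P} = (f_P, e_P) \in \mathcal{F}_P \times \mathcal{E}_P$ is an external port modeling the interaction of the system with the environment, consisting of a control port $\mathcal{C}$ and an interconnection port $\mathcal{I}$; and (iv) $\mathcal{D} \subset \mathcal{F} \times \mathcal{E} = \mathcal{F}_S \times \mathcal{E}_S \times \mathcal{F}_R \times \mathcal{E}_R \times \mathcal{F}_P \times \mathcal{E}_P$ is a central power-conserving interconnection (energy-routing) structure (e.g. transformers in electrical systems), satisfying $\langle e, f\rangle = 0$ for all $(f, e) \in \mathcal{D}$ and $\dim\mathcal{D} = \dim\mathcal{F}$, where $\mathcal{E} = \mathcal{F}^*$ and the duality product $\langle e, f\rangle$ represents power.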

The basic property of port-Hamiltonian systems is that the power-conserving interconnection of any number of port-Hamiltonian systems is again a port-Hamiltonian system. An important and useful special case is the class of input-state-output port-Hamiltonian systems

$$\dot x = [J(x) - R(x)]\frac{\partial H}{\partial x}(x) + g(x)u, \qquad y = g^T(x)\frac{\partial H}{\partial x}(x),$$

where $u, y$ are the input-output pairs corresponding to the control port $\mathcal{C}$, $J(x) = -J^T(x)$ is skew-symmetric, and the matrix $R(x) = R^T(x) \geq 0$ specifies the resistive structure.
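The defining property of this class is the power balance $\frac{d}{dt}H = -\frac{\partial H}{\partial x}^T R(x)\frac{\partial H}{\partial x} + y^T u \leq y^T u$, which follows from the skew-symmetry of $J(x)$. A minimal numerical check on a hypothetical two-state system (the matrices and the quadratic $H$ below are illustrative choices, not taken from the paper):

```python
# Hypothetical two-state example: J, R, g and H are illustrative choices.
J = [[0.0, 1.0], [-1.0, 0.0]]     # skew-symmetric interconnection
R = [[0.0, 0.0], [0.0, 0.3]]      # symmetric positive semidefinite dissipation
g = [0.0, 1.0]                     # input matrix for a single input u
K = 2.0                            # H(x) = x1^2 / 2 + K x2^2 / 2


def grad_H(x):
    return [x[0], K * x[1]]


def matvec(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def rhs(x, u):
    """x_dot = (J - R) dH/dx + g u."""
    e = grad_H(x)
    Je, Re = matvec(J, e), matvec(R, e)
    return [Je[k] - Re[k] + g[k] * u for k in range(len(x))]


x, u = [0.7, -1.2], 0.4
e = grad_H(x)
dH_dt = dot(e, rhs(x, u))            # chain rule: dH/dt = (dH/dx)^T x_dot
y = dot(g, e)                         # output y = g^T dH/dx
dissipation = dot(e, matvec(R, e))    # (dH/dx)^T R (dH/dx) >= 0
print(dH_dt, -dissipation + y * u)
```

The two printed numbers agree because the $J$ term contributes $e^T J e = 0$ to $\frac{d}{dt}H$.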

C. Symmetries and Lie Group of Transformations

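Given a differential algebraic equation (DAE) $F(\dot x, x) = 0$ with $x \in \mathcal{X} \subseteq \mathbb{R}^n$, a map $\Psi: \mathcal{X} \to \mathcal{X}$ is a symmetry map if it is a diffeomorphism and $\hat x = \Psi(x)$ is a solution of the DAE whenever $x$ is, that is, $F(\dot{\hat x}, \hat x) = 0$. Therefore the symmetry map must obey the symmetry condition $F\left(\frac{\partial \Psi(x)}{\partial x}\dot x, \Psi(x)\right) = 0$.

A particular type of symmetry maps are Lie groups of invariance transformations. The map $\Psi: \mathcal{X} \times S \to \mathcal{X}$, where $S \subset \mathbb{R}$ is an interval with $0 \in S$, along with a composition law $\phi: S \times S \to S$, defines a parametrized (Lie) symmetry [12], and, in particular, a Lie group of invariance transformations, if for any solution $x$ of the DAE and for all $\varepsilon \in S$, $\hat x(t) = \Psi(x(t), \varepsilon)$ is also a solution of the system, and (i) $\Psi(x, \varepsilon)$ is smooth in $x$ and analytic in $\varepsilon$, (ii) $\Psi(\cdot, \varepsilon)$ is an injection for all $\varepsilon \in S$, (iii) $(S, \phi)$ forms a group with identity element zero, and $\phi$ is analytic, (iv) $\Psi(x, 0) = x$ for all $x \in \mathcal{X}$, and (v) if $x_\varepsilon = \Psi(x, \varepsilon)$ and $x_\delta = \Psi(x_\varepsilon, \delta)$, then $x_\delta = \Psi(x, \phi(\varepsilon, \delta))$. Using the infinitesimal operator $X = \frac{\partial \Psi(x, \varepsilon)}{\partial \varepsilon}\Big|_{\varepsilon = 0}\frac{\partial}{\partial x} = \eta(x)\frac{\partial}{\partial x}$, we have that $\hat x = \Psi(x, \varepsilon) = e^{\varepsilon X}x$, and the symmetry condition becomes $X' F(\dot x, x) = 0$ whenever $F(\dot x, x) = 0$, where $X'$ denotes the prolongation of $X$ to $(\dot x, x)$; equivalently,

$$\eta(x)^T \frac{\partial F}{\partial x}(\dot x, x) + \dot x^T \frac{\partial \eta}{\partial x}(x)\frac{\partial F}{\partial \dot x}(\dot x, x) = 0.$$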
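For an explicit ODE, which is the case used in Section V, this condition takes a simpler form. Writing $F(\dot x, x) = \dot x - f(x)$, evaluating on solutions ($\dot x = f(x)$), and letting $\partial\eta/\partial x$ and $\partial f/\partial x$ denote Jacobian matrices, the infinitesimal symmetry condition reads

$$\frac{\partial \eta}{\partial x}(x)\, f(x) - \frac{\partial f}{\partial x}(x)\, \eta(x) = 0 \quad \text{for all } x,$$

that is, the Lie bracket of the vector fields $\eta$ and $f$ vanishes. If $\eta$ also depends explicitly on time (with time itself left untransformed), the term $\frac{\partial \eta}{\partial t}(x, t)$ is added to the left-hand side. As a one-line example, for the linear ODE $\dot x = -x$ the scaling generator $\eta(x) = x$ gives $I(-x) - (-I)x = 0$, so $\hat x = e^{\varepsilon}x$ maps solutions to solutions.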

III. Port-Hamiltonian representation of the Cucker-Smale interaction model

We introduce the notion of generalized mass-spring-damper (gMSD) components. We typically consider masses as having one port, and springs and dampers as having two ports. Ports are component interfaces through which energy is exchanged. The dynamical representations of the gMSD components are as follows: mass $\dot p = f$, $v = \frac{\partial H}{\partial p}$; spring $\dot q = v$, $f = \frac{\partial H}{\partial q}$; damper $f = R(q)v$. In the case of the mass, $p$ is the momentum, $f$ is the force acting on the mass, $v$ is the mass velocity and $H$ is the mass Hamiltonian function. In the case of the spring, $q$ is the spring elongation (the difference between the positions at the two ports), $v$ is the relative velocity, $f$ is the force through the spring and $H$ denotes the spring's Hamiltonian. In the case of the damper, $f$ is the force through the damper, $q$ is the relative position of the damper, $R$ is a resistive term as a function of $q$, and $v$ is the relative velocity.

Proposition 3.1: The CS model with potential is equivalent to a fully connected $N$-dimensional network of generalized mass-spring-dampers, where each node $i$ in the network is a mass, and each link $(i, j)$ is a parallel composition of a spring and a damper. The Hamiltonian functions of the masses and springs are given by $H(p) = \frac{1}{2}p^T p$ and

$$H(q) = \frac{1}{N}\left[-C_A e^{-\frac{\|q\|}{l_A}} + C_R e^{-\frac{\|q\|}{l_R}}\right],$$

respectively, and the resistive function of the damper is given by $R(q) = \frac{1}{N\left[1 + \|q\|^2\right]^{\gamma}}$.

We show an example of this result for the one-dimensional case ($p, q \in \mathbb{R}$) with 3 particles. The result holds for the general case, but the notation becomes more cumbersome. The fully connected topology of the gMSD network is shown in Figure 1. We denote by $H_i$ and $H_{ij}$ the Hamiltonian functions of the masses and springs, respectively. Since we assume unit masses, the momenta are equal to the mass velocities, that is, $p_i = v_i$, $i \in \{1, 2, 3\}$. The forces through the links are the sums of the forces through the dampers and springs, and are given by $f_{ij} = \frac{\partial H_{ij}}{\partial q_{ij}} + R(q_{ij})(v_i - v_j)$, for $(i, j) \in \{(1,2), (2,3), (3,1)\}$. The forces acting on the masses can be expressed as $f_1 = f_{31} - f_{12}$, $f_2 = f_{12} - f_{23}$ and $f_3 = f_{23} - f_{31}$.

Fig. 1: Fully connected, 3-dimensional gMSD network

We obtain the following expressions for the mass momenta dynamics:

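$$\dot p_1 = \frac{\partial H_{31}}{\partial q_{31}} - \frac{\partial H_{12}}{\partial q_{12}} + R(q_{31})(v_3 - v_1) + R(q_{12})(v_2 - v_1), \tag{3}$$

$$\dot p_2 = \frac{\partial H_{12}}{\partial q_{12}} - \frac{\partial H_{23}}{\partial q_{23}} + R(q_{12})(v_1 - v_2) + R(q_{23})(v_3 - v_2), \tag{4}$$

$$\dot p_3 = \frac{\partial H_{23}}{\partial q_{23}} - \frac{\partial H_{31}}{\partial q_{31}} + R(q_{23})(v_2 - v_3) + R(q_{31})(v_1 - v_3). \tag{5}$$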

The dynamics of the spring elongations are

$$\dot q_{ij} = v_i - v_j = \frac{\partial H_i}{\partial p_i} - \frac{\partial H_j}{\partial p_j} \tag{6}$$

for $(i, j) \in \{(1,2), (2,3), (3,1)\}$. To recover the CS model with potential, we replace the relative positions $q_{ij}$ with the absolute positions, namely $q_{ij} = q_i - q_j$. Recalling that spring potentials are symmetric functions, so that their gradients are odd, we get

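$$\frac{\partial H_{31}}{\partial q_{31}} - \frac{\partial H_{12}}{\partial q_{12}} = -\frac{1}{3}\left(\nabla U(q_1 - q_3) + \nabla U(q_1 - q_2)\right), \tag{7}$$

$$\frac{\partial H_{12}}{\partial q_{12}} - \frac{\partial H_{23}}{\partial q_{23}} = -\frac{1}{3}\left(\nabla U(q_2 - q_1) + \nabla U(q_2 - q_3)\right), \tag{8}$$

$$\frac{\partial H_{23}}{\partial q_{23}} - \frac{\partial H_{31}}{\partial q_{31}} = -\frac{1}{3}\left(\nabla U(q_3 - q_2) + \nabla U(q_3 - q_1)\right), \tag{9}$$

where $\nabla U(q_i - q_j)$ is shorthand for the gradient of $U(\|q_i - q_j\|)$ with respect to $q_i$. Substituting (7)-(9) into (3)-(5), and recalling that $R(q_{ij}) = \frac{1}{3}G(\|q_i - q_j\|)$ and that, under our assumptions, $p_i = v_i$, we recover exactly the CS model with potential.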

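By introducing the notation $z^T = [p^T, q^T]$, with $p^T = [p_1, p_2, p_3]$ and $q^T = [q_{12}, q_{23}, q_{31}]$, equations (3)-(5) and (6) can be expressed compactly as

$$\dot z = [\mathcal{J}(z) - \mathcal{R}(z)]\frac{\partial H}{\partial z}(z), \tag{10}$$

where $H(z) = H_1(p_1) + H_2(p_2) + H_3(p_3) + H_{12}(q_{12}) + H_{23}(q_{23}) + H_{31}(q_{31})$,

$$\mathcal{R}(z) = \begin{bmatrix} R(z) & 0 \\ 0 & 0 \end{bmatrix}, \quad R(z) = \begin{bmatrix} R(q_{12}) + R(q_{31}) & -R(q_{12}) & -R(q_{31}) \\ -R(q_{12}) & R(q_{12}) + R(q_{23}) & -R(q_{23}) \\ -R(q_{31}) & -R(q_{23}) & R(q_{31}) + R(q_{23}) \end{bmatrix},$$

and

$$\mathcal{J}(z) = \begin{bmatrix} 0 & J \\ -J^T & 0 \end{bmatrix}, \quad J = \begin{bmatrix} -1 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}.$$

We recognize equation (10) as a typical input-state-output port-Hamiltonian system [9], [10], here without external inputs.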
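The construction generalizes to any $N$. The sketch below (one spatial dimension, hypothetical potential parameters) builds $J$ for the complete graph, assembles the momentum equation of (10) with the damper matrix applied link by link, and checks numerically that it reproduces the CS acceleration (2). Links are ordered as $(a, b)$ with $a < b$ and $q_{ab} = q_a - q_b$, which differs from the $(1,2), (2,3), (3,1)$ ordering above only in the orientation of one link; the orientation does not affect the dynamics.

```python
import math
import random

GAMMA = 0.15
C_A, C_R, L_A, L_R = 1.0, 0.5, 2.0, 0.5    # hypothetical potential parameters


def G(r):
    return 1.0 / (1.0 + r * r) ** GAMMA


def dU_q(q):
    """d/dq of U(|q|) in 1-D; odd in q because U(|q|) is even."""
    if q == 0.0:
        return 0.0
    r = abs(q)
    dU_dr = (C_A / L_A) * math.exp(-r / L_A) - (C_R / L_R) * math.exp(-r / L_R)
    return dU_dr * math.copysign(1.0, q)


def build_J(n):
    """Links (a, b), a < b; J[a][e] = -1, J[b][e] = +1, so q_dot = -J^T p = p_a - p_b."""
    edges = [(a, b) for a in range(n) for b in range(a + 1, n)]
    J = [[0.0] * len(edges) for _ in range(n)]
    for e, (a, b) in enumerate(edges):
        J[a][e], J[b][e] = -1.0, 1.0
    return edges, J


def ph_rhs(p, q, edges, J):
    """Right-hand side of (10) in 1-D: p_dot = J dH/dq - R(q) p, q_dot = -J^T p."""
    n, m = len(p), len(q)
    dH_dq = [dU_q(qe) / n for qe in q]                        # spring forces
    p_dot = [sum(J[i][e] * dH_dq[e] for e in range(m)) for i in range(n)]
    for e, (a, b) in enumerate(edges):                        # -R(q) p, link by link
        R_e = G(abs(q[e])) / n
        p_dot[a] += R_e * (p[b] - p[a])
        p_dot[b] += R_e * (p[a] - p[b])
    q_dot = [-sum(J[i][e] * p[i] for i in range(n)) for e in range(m)]
    return p_dot, q_dot


def cs_dv(x, v):
    """CS acceleration, eq. (2), in 1-D."""
    n = len(x)
    return [sum(G(abs(x[i] - x[j])) * (v[j] - v[i]) - dU_q(x[i] - x[j])
                for j in range(n) if j != i) / n for i in range(n)]


random.seed(1)
n = 4
x = [random.uniform(0, 10) for _ in range(n)]
v = [random.uniform(0, 10) for _ in range(n)]
edges, J = build_J(n)
q = [x[a] - x[b] for a, b in edges]          # q_ab = x_a - x_b
p_dot, q_dot = ph_rhs(v, q, edges, J)         # unit masses: p = v
err = max(abs(s - t) for s, t in zip(p_dot, cs_dv(x, v)))
print(err)
```

The printed discrepancy should be at round-off level.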

IV. Dynamical properties of the Cucker-Smale model

In this section we introduce a set of maps and show that they satisfy the properties required of symmetry or Lie symmetry maps. In addition, we introduce a conserved quantity that differs from the Hamiltonian function. The maps and the conserved quantity will be rediscovered in the learning section. The symmetry maps are introduced for both the original CS model and its port-Hamiltonian representation. We consider the 1-D case ($p \in \mathbb{R}^N$), since the results can be easily generalized to higher dimensions.

A. Symmetry maps

The following result introduces a symmetry map for the CS dynamics in port-Hamiltonian form.

Proposition 4.1: The map $\Gamma(p, q) = (p + \alpha\mathbf{1}, q)$, for $\alpha \in \mathbb{R}$, is a symmetry map for the port-Hamiltonian dynamics (10).

For the CS dynamics with potential in its original form (1)-(2), the symmetry map is slightly different, as shown next.

Proposition 4.2: The map $\Gamma(x, v, t) = (x + \alpha\mathbf{1}t + \beta\mathbf{1},\ v + \alpha\mathbf{1},\ t)$, for $\alpha, \beta \in \mathbb{R}$, is a symmetry map for the CS dynamics with potential (1)-(2).
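Proposition 4.2 can be checked numerically: integrating (1)-(2) from the transformed initial condition $(x_0 + \beta\mathbf{1}, v_0 + \alpha\mathbf{1})$ should give the original trajectory shifted by $\alpha\mathbf{1}t + \beta\mathbf{1}$ in position and $\alpha\mathbf{1}$ in velocity. A minimal sketch with fixed-step RK4 in one dimension, assuming hypothetical potential parameters and arbitrary values of α and β; since the right-hand side depends only on relative positions and velocities, the fixed-step scheme preserves the symmetry up to round-off.

```python
import math
import random

GAMMA = 0.15
C_A, C_R, L_A, L_R = 1.0, 0.5, 2.0, 0.5    # hypothetical potential parameters


def f(z):
    """Right-hand side of (1)-(2) in 1-D for the stacked state z = [x, v]."""
    n = len(z) // 2
    x, v = z[:n], z[n:]
    dv = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            if j == i:
                continue
            d = x[i] - x[j]
            r = abs(d)
            acc += (v[j] - v[i]) / (1.0 + r * r) ** GAMMA                 # alignment
            acc -= ((C_A / L_A) * math.exp(-r / L_A)
                    - (C_R / L_R) * math.exp(-r / L_R)) * math.copysign(1.0, d)  # potential
        dv.append(acc / n)
    return v + dv                                   # concatenation: [x_dot, v_dot]


def rk4(z, h, steps):
    """Classical fixed-step Runge-Kutta 4; returns the state after `steps` steps."""
    for _ in range(steps):
        k1 = f(z)
        k2 = f([a + h / 2 * b for a, b in zip(z, k1)])
        k3 = f([a + h / 2 * b for a, b in zip(z, k2)])
        k4 = f([a + h * b for a, b in zip(z, k3)])
        z = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(z, k1, k2, k3, k4)]
    return z


random.seed(2)
n, alpha, beta = 4, 0.8, -1.5                       # arbitrary symmetry parameters
x0 = [random.uniform(0, 10) for _ in range(n)]
v0 = [random.uniform(0, 10) for _ in range(n)]
h, steps = 0.01, 500
T = h * steps
z_T = rk4(x0 + v0, h, steps)                                                  # original solution
z_hat_T = rk4([xi + beta for xi in x0] + [vi + alpha for vi in v0], h, steps)  # from Gamma(z0, 0)
expected = [xi + alpha * T + beta for xi in z_T[:n]] + [vi + alpha for vi in z_T[n:]]
err = max(abs(a - b) for a, b in zip(z_hat_T, expected))
print(err)
```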

B. Lie group of invariance transformations

As introduced in Section II-C, Lie groups of invariance transformations [18], [19] are a particular type of symmetry maps, of the form $\hat z = z + \varepsilon\eta(z, t) + O(\varepsilon^2)$. The following result introduces the infinitesimal for the CS model in its original form.

Proposition 4.3: The map $\eta(z, t) = \eta(x, v, t) = \alpha t\left[\mathbf{1}^T, \mathbf{0}^T\right]^T + \left[\beta\mathbf{1}^T, \alpha\mathbf{1}^T\right]^T$, for all $\alpha, \beta \in \mathbb{R}$, is an infinitesimal for the Lie group of invariance transformations corresponding to the CS dynamics in its original form.

A similar result holds for the CS model in port-Hamiltonian form, where the time dependence of the infinitesimal map is no longer present.

Proposition 4.4: The map $\eta(z) = \eta(p, q) = \alpha\left[\mathbf{1}^T, \mathbf{0}^T\right]^T$, for all $\alpha \in \mathbb{R}$, is an infinitesimal for the Lie group of invariance transformations corresponding to the CS dynamics in port-Hamiltonian form.
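The infinitesimal conditions behind Propositions 4.3 and 4.4 can be checked with finite differences. For $\dot z = f(z)$ and a generator acting only on $z$, the condition is $\frac{\partial\eta}{\partial t} + \frac{\partial\eta}{\partial z}f - \frac{\partial f}{\partial z}\eta = 0$; here $\partial\eta/\partial z = 0$, so only the directional derivative of $f$ along $\eta$ is needed. This sketch omits the potential for brevity and uses one spatial dimension and arbitrary values of α, β and t; the printed residuals should be at round-off level.

```python
import random

GAMMA = 0.15


def G(r):
    return 1.0 / (1.0 + r * r) ** GAMMA


def f_cs(z):
    """Original CS form (1)-(2) without potential, 1-D, z = [x, v]."""
    n = len(z) // 2
    x, v = z[:n], z[n:]
    dv = [sum(G(abs(x[i] - x[j])) * (v[j] - v[i]) for j in range(n)) / n for i in range(n)]
    return v + dv


def f_ph(z, n):
    """Port-Hamiltonian form (10) without potential, 1-D, z = [p, q], links (a, b), a < b."""
    edges = [(a, b) for a in range(n) for b in range(a + 1, n)]
    p, q = z[:n], z[n:]
    p_dot = [0.0] * n
    for e, (a, b) in enumerate(edges):
        R_e = G(abs(q[e])) / n                        # damper R(q_e)
        p_dot[a] += R_e * (p[b] - p[a])
        p_dot[b] += R_e * (p[a] - p[b])
    return p_dot + [p[a] - p[b] for a, b in edges]    # q_dot = -J^T p


def directional_derivative(f, z, eta, h=1e-6):
    """Central-difference approximation of (df/dz)(z) eta."""
    fp = f([a + h * b for a, b in zip(z, eta)])
    fm = f([a - h * b for a, b in zip(z, eta)])
    return [(s - t) / (2 * h) for s, t in zip(fp, fm)]


random.seed(3)
n, alpha, beta, t = 4, 0.7, -0.3, 2.5                # arbitrary values
m = n * (n - 1) // 2

# Proposition 4.3: residual d(eta)/dt - (df/dz) eta, since d(eta)/dz = 0.
z = [random.uniform(0, 10) for _ in range(2 * n)]
eta = [alpha * t + beta] * n + [alpha] * n
deta_dt = [alpha] * n + [0.0] * n
res_43 = [a - b for a, b in zip(deta_dt, directional_derivative(f_cs, z, eta))]

# Proposition 4.4: eta is constant and time independent, so the residual is -(df/dz) eta.
z_ph = [random.uniform(0, 10) for _ in range(n + m)]
eta_ph = [alpha] * n + [0.0] * m
res_44 = directional_derivative(lambda w: f_ph(w, n), z_ph, eta_ph)

print(max(map(abs, res_43)), max(map(abs, res_44)))
```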

C. Conserved quantities

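The port-Hamiltonian representation has the advantage of providing a natural energy-like quantity, the Hamiltonian, which is conserved in the absence of dissipation and is non-increasing otherwise, since $\frac{d}{dt}H = -\frac{\partial H}{\partial z}^T\mathcal{R}(z)\frac{\partial H}{\partial z} \leq 0$ along solutions of (10). Beyond the Hamiltonian, there are quantities that are exactly conserved. The following results introduce such quantities for both the original and the port-Hamiltonian representations of the CS particle dynamics.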

Proposition 4.5: The quantity $\mathbf{1}^T v$ is conserved by the CS dynamics (1)-(2), that is, $\mathbf{1}^T\dot v = 0$ for all $t \geq 0$.

Similar results hold for the port-Hamiltonian representation. We make use of Casimir functions, which represent conserved quantities of port-Hamiltonian systems.

Proposition 4.6: Any function of the form $C(p, q) = \alpha\mathbf{1}^T p + u^T q + \beta$, where $u \in \mathrm{Null}(J)$ and $\alpha, \beta \in \mathbb{R}$, is a conserved quantity for the CS dynamics in port-Hamiltonian form (10), where $z^T = [p^T, q^T]$.

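Remark 4.1: Note that in the 3-particle example, the matrix $J$ is square and the null spaces of $J$ and $J^T$ coincide, both being spanned by $\mathbf{1}$. In general this is not true, since $J \in \mathbb{R}^{N \times M}$, where $M = N(N-1)/2$. Hence, only the null space of $J^T$ is given by $\{\alpha\mathbf{1}, \alpha \in \mathbb{R}\}$, while the null space of $J$ has dimension $M - N + 1$ (the cycle space of the fully connected graph).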
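A small numerical illustration of Proposition 4.6 and Remark 4.1 for $N = 4$ ($M = 6$) in one spatial dimension, with links ordered $(a, b)$, $a < b$. The sketch checks $\mathbf{1}^T J = 0$, builds one cycle vector $u \in \mathrm{Null}(J)$, and evaluates $\dot C$ along (10) at a random state. The spring forces are an arbitrary odd function (a hypothetical choice), since they drop out of $\dot C$ through $\mathbf{1}^T J = 0$.

```python
import math
import random

n = 4
edges = [(a, b) for a in range(n) for b in range(a + 1, n)]     # M = n(n-1)/2 = 6 links
m = len(edges)
J = [[0.0] * m for _ in range(n)]
for e, (a, b) in enumerate(edges):
    J[a][e], J[b][e] = -1.0, 1.0                                 # q_e = q_a - q_b

ones_J = [sum(J[i][e] for i in range(n)) for e in range(m)]      # 1^T J

# Cycle 0 -> 1 -> 2 -> 0: links (0, 1), (1, 2) forward and (0, 2) backward.
u = [0.0] * m
u[edges.index((0, 1))] = 1.0
u[edges.index((1, 2))] = 1.0
u[edges.index((0, 2))] = -1.0
J_u = [sum(J[i][e] * u[e] for e in range(m)) for i in range(n)]  # J u


def ph_rhs(p, q):
    """Right-hand side of (10), 1-D, with arbitrary odd spring forces (hypothetical)."""
    dH_dq = [math.sin(qe) / n for qe in q]
    p_dot = [sum(J[i][e] * dH_dq[e] for e in range(m)) for i in range(n)]
    for e, (a, b) in enumerate(edges):
        R_e = 1.0 / (n * (1.0 + q[e] ** 2) ** 0.15)               # damper of Prop. 3.1
        p_dot[a] += R_e * (p[b] - p[a])
        p_dot[b] += R_e * (p[a] - p[b])
    q_dot = [p[a] - p[b] for a, b in edges]                      # -J^T p
    return p_dot, q_dot


random.seed(4)
alpha = 2.0                                                      # arbitrary
p = [random.uniform(0, 10) for _ in range(n)]
q = [random.uniform(-5, 5) for _ in range(m)]
p_dot, q_dot = ph_rhs(p, q)
C_dot = alpha * sum(p_dot) + sum(ue * qd for ue, qd in zip(u, q_dot))  # dC/dt
print(ones_J, J_u, C_dot)
```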

V. Learning interaction models and their dynamical properties

To demonstrate that we can indeed recover the theoretical results proved in the previous sections, we consider an example where twenty particles ($N = 20$) evolve according to the CS dynamics. We consider both the original and the port-Hamiltonian representations of the CS dynamics. The particles operate in a two-dimensional space, that is, the (relative) position and velocity vectors of each particle have dimension two. The training data were generated by simulating the CS model with parameter $\gamma = 0.15$ over the time interval $[0, 40]$ sec, starting from random initial conditions in the interval $[0, 10]$. A realization of the CS simulation results is shown in Figure 2, where we plot the particle speeds (norms of the velocity vectors). The structure of the time series used for training is $z^T = [x^T, y^T, v_x^T, v_y^T]$, where $x, y, v_x, v_y \in \mathbb{R}^N$. In the port-Hamiltonian representation the structure is slightly different, namely $z^T = [p_x^T, p_y^T, q_x^T, q_y^T]$, where $q_x, q_y \in \mathbb{R}^{N(N-1)/2}$ and $p_x, p_y \in \mathbb{R}^N$. Gradients and Jacobians were computed using automatic differentiation. The learning problems were implemented using the Python package Autograd [20] and the deep learning platform PyTorch [21], both of which feature automatic differentiation.

Fig. 2: Particle speed over time, $\|v_i(t)\|$, $i \in \{1, \ldots, N\}$

A. Particle interaction model

Our first task is to recover the interaction model between particles. We consider the port-Hamiltonian representation without potential, which can be obtained by choosing the parameters of the potential function so that the spring potential is approximately zero. Using the port-Hamiltonian formalism, this task translates to learning the constitutive equation of a generalized damper. In particular, we learn $F_{ij} = g(q_{ij}^2; w)\dot q_{ij}$, which describes the force acting between two particles $i, j$, where $q_{ij}$ is the relative position between the two particles. We choose the map $g$ to be a neural network (NN) with one hidden layer of size 12, whose output is given by $y = W^{[1]}\tanh\left(W^{[0]}u + b^{[0]}\right) + b^{[1]}$, where the superscripts denote the layer number. Hence we have a total of 37 parameters ($12 + 12$ in the hidden layer and $12 + 1$ in the output layer). Note that a ReLU activation can be added on the last layer to impose a non-negative output of the NN. To learn the parameters of the map $g$, we solve the optimization problem

$$\min_w \frac{1}{n}\sum_{i=1}^{n}\left\|z(t_i) - \hat z(t_i; w)\right\|^2,$$

where $n$ is the number of time samples, $w = \{W^{[0]}, b^{[0]}, W^{[1]}, b^{[1]}\}$ is the set of optimization variables, and $\hat z(t_i; w)$ are time samples of the solution of (10) with the resistive term defined by $R(q) = g(\|q\|^2; w)$ and no potential between particles. The initial positions and velocities were uniformly drawn from the interval $[0, 10]$. We used the Autograd package and its Adam algorithm implementation to solve the least-squares problem introduced above. The optimization was set to terminate when the loss fell below $10^{-5}$. We compared the trained interaction model with the “real” interaction model, as shown in Figure 3. We limited ourselves to relative distances in $[-35, 35]$, since this was the maximum distance the particles reached between them over time. The MSE between the trained and the “real” interaction curves over the interval $[-35, 35]$ is $1.3 \times 10^{-4}$. We note that there is some mismatch near zero, due to the fact that the particles never got close enough to each other. Next, we tested the interaction model on data not used in training but whose initial conditions have similar statistics to those of the training data. The error $MSE_{test}(t_i) = \frac{1}{N}\|z(t_i) - \hat z(t_i)\|^2$, where $z(t_i)$ and $\hat z(t_i)$ designate samples of the time series obtained with the “true” and learned interaction models, respectively, is shown in Figure 4. We note that the prediction error stabilizes to a reasonably small value.

Fig. 3: Comparison between the “real” (blue) and the trained (dotted red) particle interaction models
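A minimal pure-Python sketch of the two ingredients of this learning problem: the one-hidden-layer network $g$ (12 tanh units, scalar input $\|q\|^2$, 37 parameters) and the trajectory-matching loss evaluated on a rollout of (10) without potential. Assumptions not taken from the paper: one spatial dimension, $N = 5$, forward-Euler integration, Gaussian initialization, and synthetic data produced by the true damper $R(q)$ of Proposition 3.1. The Adam optimization itself (done with Autograd in the paper) is omitted; in practice the loss would be differentiated with an automatic-differentiation tool.

```python
import math
import random

HIDDEN = 12                        # hidden-layer size used in the paper


def init_params(seed=0):
    """Hypothetical Gaussian initialization of w = {W0, b0, W1, b1} (scalar in, scalar out)."""
    rng = random.Random(seed)
    return {"W0": [rng.gauss(0.0, 0.5) for _ in range(HIDDEN)],
            "b0": [rng.gauss(0.0, 0.5) for _ in range(HIDDEN)],
            "W1": [rng.gauss(0.0, 0.5) for _ in range(HIDDEN)],
            "b1": rng.gauss(0.0, 0.5)}


def n_params(w):
    return len(w["W0"]) + len(w["b0"]) + len(w["W1"]) + 1


def g(u, w, nonneg=False):
    """y = W1 tanh(W0 u + b0) + b1, optionally clipped at zero (ReLU on the output)."""
    y = sum(w1 * math.tanh(w0 * u + b0)
            for w0, b0, w1 in zip(w["W0"], w["b0"], w["W1"])) + w["b1"]
    return max(y, 0.0) if nonneg else y


def simulate(damper, x0, v0, dt=0.01, steps=400):
    """Forward-Euler rollout of (10) in 1-D, no potential, unit masses (p = v).

    damper(q2) returns R(q) as a function of q2 = |q|^2; returns the samples [x, v].
    """
    n = len(x0)
    x, v, traj = list(x0), list(v0), []
    for _ in range(steps):
        dv = [0.0] * n
        for a in range(n):
            for b in range(a + 1, n):
                R_ab = damper((x[a] - x[b]) ** 2)
                dv[a] += R_ab * (v[b] - v[a])
                dv[b] += R_ab * (v[a] - v[b])
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        v = [vi + dt * dvi for vi, dvi in zip(v, dv)]
        traj.append(x + v)
    return traj


def loss(damper, data, x0, v0):
    """(1/n) sum_i ||z(t_i) - z_hat(t_i; w)||^2 over the n time samples."""
    pred = simulate(damper, x0, v0)
    return sum(sum((a - b) ** 2 for a, b in zip(zd, zp))
               for zd, zp in zip(data, pred)) / len(data)


N, GAMMA = 5, 0.15


def true_R(q2):
    """Damper of Proposition 3.1, R(q) = 1 / (N (1 + |q|^2)^gamma)."""
    return 1.0 / (N * (1.0 + q2) ** GAMMA)


rng = random.Random(1)
x0 = [rng.uniform(0, 10) for _ in range(N)]
v0 = [rng.uniform(0, 10) for _ in range(N)]
data = simulate(true_R, x0, v0)                          # synthetic training trajectory
w = init_params()
print(n_params(w), loss(lambda q2: g(q2, w, nonneg=True), data, x0, v0))
```

The ReLU option is used in the rollout so that the untrained network cannot produce negative damping, which would make the simulated trajectory diverge.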

Fig. 4: The MSEs of the velocity vectors for test data

B. Lie group of invariance transformations

The Lie group of transformations $\psi$ has the structure $\psi(z) = z + \varepsilon\eta(z; w) + O(\varepsilon^2)$, where $\eta(z; w)$ is the infinitesimal of the transformation [18], [19]. We consider a linear parameterization of the form $\eta(z; w) = Az + b$, and the goal is to find the parameters of the infinitesimal by solving the optimization problem

$$\min_{A, b}\frac{1}{n \times N}\sum_{i=1}^{n}\left\|\frac{\partial\eta}{\partial z}(z^{(i)})f(z^{(i)}) - \frac{\partial f}{\partial z}(z^{(i)})\eta(z^{(i)})\right\|^2,$$

where $f(z)$ denotes the vector field of the dynamics, $\dot z = f(z)$, and $n$ denotes the total number of vector samples. The optimization problem was solved for the port-Hamiltonian representation using the Adam algorithm, with Autograd computing the gradient of the cost function by automatic differentiation. To speed up the optimization algorithm, we computed offline the values of the maps $f(z)$ and $\frac{\partial f}{\partial z}$ at each sample $z^{(i)}$ of the training data. We generated 50 time series describing the CS dynamics over the time interval $[0, 40]$ sec, using $M = 50$ initial condition vectors uniformly drawn from $[0, 10]$, generating roughly 5000 data samples. We stopped the optimization process when the MSE loss function reached $MSE_{train} = 1.1 \times 10^{-4}$. As a sanity check, we looked at the structure of the learned $A$ and $b$. The structure of $b$ is as expected: the same value (roughly 1.8049) for the entries in the first half of the vector and small values ($< 10^{-4}$) for the second half. The entries of $A$, although small, were not zero, which may be a result of limiting the number of optimization iterations.
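A sketch of this loss for the linear infinitesimal $\eta(z) = Az + b$, with $f(z)$ and $\partial f/\partial z$ computed once, offline, as described above. Assumptions: one spatial dimension, $N = 4$, no potential, random states in place of trajectory samples (the condition must hold at every state), central differences in place of Autograd, and normalization by the number of samples times the state dimension. With $A = 0$, the loss should vanish for $b = \alpha[\mathbf{1}^T, \mathbf{0}^T]^T$ (Proposition 4.4, here with an arbitrary α = 1.5) and be positive for a shift of a single link variable.

```python
import random

GAMMA, N = 0.15, 4
EDGES = [(a, b) for a in range(N) for b in range(a + 1, N)]
DIM = N + len(EDGES)                                  # z = [p, q] in 1-D


def f(z):
    """Port-Hamiltonian CS model (10), 1-D, no potential."""
    p, q = z[:N], z[N:]
    p_dot = [0.0] * N
    for e, (a, b) in enumerate(EDGES):
        R_e = 1.0 / (N * (1.0 + q[e] ** 2) ** GAMMA)
        p_dot[a] += R_e * (p[b] - p[a])
        p_dot[b] += R_e * (p[a] - p[b])
    return p_dot + [p[a] - p[b] for a, b in EDGES]


def jacobian(z, h=1e-6):
    """Central-difference Jacobian df/dz (stand-in for automatic differentiation)."""
    cols = []
    for k in range(DIM):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        cols.append([(s - t) / (2 * h) for s, t in zip(f(zp), f(zm))])
    return [[cols[k][r] for k in range(DIM)] for r in range(DIM)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def lie_loss(A, b, samples):
    """Mean over samples and entries of ||A f(z) - (df/dz)(z) (A z + b)||^2."""
    total = 0.0
    for z, fz, Jz in samples:
        eta = [s + t for s, t in zip(matvec(A, z), b)]
        r = [s - t for s, t in zip(matvec(A, fz), matvec(Jz, eta))]
        total += sum(x * x for x in r)
    return total / (len(samples) * DIM)


rng = random.Random(5)
states = [[rng.uniform(0, 10) for _ in range(DIM)] for _ in range(20)]
samples = [(z, f(z), jacobian(z)) for z in states]      # f and df/dz precomputed offline
A0 = [[0.0] * DIM for _ in range(DIM)]
b_sym = [1.5] * N + [0.0] * len(EDGES)                  # alpha [1; 0] with alpha = 1.5
b_other = [0.0] * N + [1.0] + [0.0] * (len(EDGES) - 1)  # shift of a single link variable
print(lie_loss(A0, b_sym, samples), lie_loss(A0, b_other, samples))
```

Precomputing `samples` mirrors the offline evaluation of $f$ and $\partial f/\partial z$; each loss evaluation then needs only matrix-vector products.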

The test data were generated randomly, in a similar way as the training data, using the time interval $[0, 80]$ sec and generating roughly 10000 samples. The longer time interval also checks time extrapolation. As a metric we used, this time, the MSE applied to trajectories. We have two types of trajectories. The first type, denoted by $z(t)$, is a trajectory generated by solving the CS differential equations with initial conditions obtained by applying the learned symmetry transformation to the initial conditions of the test data. The second type, denoted by $\hat z(t)$, is obtained by applying the learned symmetry map to the test data itself. Formally, we define the metric

$$MSE_{test} = \frac{1}{n \times M \times N}\sum_{i=1}^{M}\sum_{j=1}^{n}\left\|z^{(i)}(t_j) - \hat z^{(i)}(t_j)\right\|^2,$$

where $n$ is the number of time samples per time series, $M$ is the number of time series, and $z^{(i)}(t_j)$ is the vector of position and velocity coordinates at time $t_j$ of time series $i$. We obtained $MSE_{test} = 6.2 \times 10^{-4}$ on the test data. We also computed the evolution of the MSE over time for the trajectories, where the averaging was taken over the time series indices ($M$ of them) and the entries of the state vector, but not over time. The result is shown in Figure 5. We note that the prediction error accumulates over time, which most likely comes from the fact that the learned symmetry map was not exact, due in part to the limited number of optimization iterations.

Fig. 5: MSE of particle velocities over time

C. Symmetry maps

We repeat the learning process for the discrete symmetry case, this time using the CS model in its original form. We search for a map $\Gamma$ such that $\hat z = \Gamma(z, t)$ is also a solution of the CS ODE $\dot z = f(z)$. We assume that time is left unchanged by the symmetry, hence no map for the time is included. We consider a linear parameterization of the symmetry map, $\Gamma(z, t) = Az + bt + c$, which includes time dependence as well. To learn the map parameters, we solve the following optimization problem:

$$\min_{A, b, c}\frac{1}{n \times N}\sum_{i=1}^{n}\left\|\frac{\partial\Gamma}{\partial z}(z^{(i)}, t_i)f(z^{(i)}) + \frac{\partial\Gamma}{\partial t}(z^{(i)}, t_i) - f\left(\Gamma(z^{(i)}, t_i)\right)\right\|^2.$$
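A corresponding sketch of this loss for $\Gamma(z, t) = Az + bt + c$, where $\partial\Gamma/\partial z = A$ and $\partial\Gamma/\partial t = b$. Assumptions: one spatial dimension, $N = 4$, hypothetical potential parameters, random $(z, t)$ samples, and normalization by the number of samples times the state dimension. The parameters of Proposition 4.2 ($A = I$, $b = [\alpha\mathbf{1}^T, \mathbf{0}^T]^T$, $c = [\beta\mathbf{1}^T, \alpha\mathbf{1}^T]^T$) should give a loss at round-off level, whereas a velocity shift without the matching position drift gives a loss of $\alpha^2/2$ under this normalization.

```python
import math
import random

GAMMA, N = 0.15, 4
C_A, C_R, L_A, L_R = 1.0, 0.5, 2.0, 0.5          # hypothetical potential parameters


def f(z):
    """Original 1-D CS model with potential, z = [x, v]."""
    x, v = z[:N], z[N:]
    dv = []
    for i in range(N):
        acc = 0.0
        for j in range(N):
            if j != i:
                d = x[i] - x[j]
                r = abs(d)
                acc += (v[j] - v[i]) / (1.0 + r * r) ** GAMMA
                acc -= ((C_A / L_A) * math.exp(-r / L_A)
                        - (C_R / L_R) * math.exp(-r / L_R)) * math.copysign(1.0, d)
        dv.append(acc / N)
    return v + dv


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def symmetry_loss(A, b, c, samples):
    """Mean over samples and entries of ||A f(z) + b - f(A z + b t + c)||^2."""
    total = 0.0
    for z, t in samples:
        gz = [s + bi * t + ci for s, bi, ci in zip(matvec(A, z), b, c)]   # Gamma(z, t)
        r = [s + bi - u for s, bi, u in zip(matvec(A, f(z)), b, f(gz))]
        total += sum(x * x for x in r)
    return total / (len(samples) * 2 * N)


rng = random.Random(6)
samples = [([rng.uniform(0, 10) for _ in range(2 * N)], rng.uniform(0, 40)) for _ in range(20)]
eye = [[1.0 if r == k else 0.0 for k in range(2 * N)] for r in range(2 * N)]
alpha, beta = 0.5, 2.0                                            # arbitrary values
b = [alpha] * N + [0.0] * N
c = [beta] * N + [alpha] * N                                      # Proposition 4.2
loss_sym = symmetry_loss(eye, b, c, samples)
loss_other = symmetry_loss(eye, [0.0] * (2 * N), [0.0] * N + [alpha] * N, samples)
print(loss_sym, loss_other)
```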

We used the PyTorch deep learning platform to implement the optimization process, using the same Adam algorithm as in the case of the Lie symmetry group. PyTorch features automatic differentiation as well, but has the advantage that it can be used with graphics processing units (GPUs) when the optimization problem can be parallelized. To illustrate why PyTorch can be more effective when scaling up the problem in the number of particles, Figure 6 compares the average time per optimization iteration of the PyTorch Adam algorithm when using the CPU and GPUs, as a function of the number of particles. We note that, unlike the CPU case, when using GPUs the average iteration time grows linearly with the number of particles. In addition, in terms of average iteration time on the CPU, PyTorch is superior to Autograd: 4.9 sec for Autograd versus 2.2 sec for PyTorch in the 20-particle case, for the same number of training samples. We used a similar strategy to generate training and test data as in the case of the Lie symmetry group. We stopped the optimization algorithm when the MSE reached $5.5 \times 10^{-5}$. The MSE for the test data was 0.007. The test data MSE as a function of time is shown in Figure 7. The same phenomenon of error accumulation over time is observed as in the case of the Lie symmetry group.

Fig. 6: Average time for an Adam iteration when using the CPU (blue curve) and GPUs (red curve), as a function of the number of particles

D. Conservation laws

In this section we demonstrate that we can recover conserved quantities as introduced in Proposition 4.6, whose statement can be easily generalized to the two-dimensional case. Namely, the Casimir functions have the form $C(p_x, p_y, q_x, q_y) = \alpha_x\mathbf{1}^T p_x + \alpha_y\mathbf{1}^T p_y + u_x^T q_x + u_y^T q_y + \beta$, for all $\alpha_x, \alpha_y, \beta \in \mathbb{R}$ and all $u_x, u_y \in \mathrm{Null}(J)$. To learn the Casimir function $C(z; w)$, we solve the optimization problem

$$\min_w\frac{1}{n}\sum_{i=1}^{n}\left(\left\|J^T\frac{\partial C}{\partial p_x}(z^{(i)})\right\|^2 + \left\|J^T\frac{\partial C}{\partial p_y}(z^{(i)})\right\|^2 + \left\|J\frac{\partial C}{\partial q_x}(z^{(i)})\right\|^2 + \left\|J\frac{\partial C}{\partial q_y}(z^{(i)})\right\|^2\right).$$

We considered two types of parameterizations: a linear parameterization given by $C(p_x, p_y, q_x, q_y) = a_x^T p_x + a_y^T p_y + b_x^T q_x + b_y^T q_y$, and a nonlinear parameterization given by a neural network with a hidden layer of size $2N + N(N-1)$, defined by $C(z) = W^{[1]}\tanh\left(W^{[0]}z + b^{[0]}\right) + b^{[1]}$. For the linear case, the partial derivatives of $C$ were hard-coded, since they are simple and do not depend on the training data. We did, however, use Autograd to compute the gradient of the loss function. We initialized the optimization variables randomly and ran the Adam algorithm for 2500 iterations with a fixed step size of 0.001. Each iteration of the Adam algorithm takes roughly 3 msec. As a sanity check, we looked at the structure of the learned vectors $a_x$ and $a_y$, whose entries are shown in Figure 8. We note that we indeed recovered the expected structure, namely $\alpha_x\mathbf{1}$ and $\alpha_y\mathbf{1}$.

Fig. 7: MSE of particle velocities over time
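For the linear parameterization the loss is quadratic, $\|J^T a\|^2 + \|J b\|^2$ per spatial dimension, so its minimizers are exactly $a \in \mathrm{Null}(J^T) = \{\alpha\mathbf{1}\}$ and $b \in \mathrm{Null}(J)$. The sketch below uses one spatial dimension, $N = 5$, and plain gradient descent instead of Adam (a substitution for brevity). Since the gradient $2JJ^T a$ is orthogonal to $\mathbf{1}$, the mean of $a$ is preserved, so the learned $a$ converges to its initial mean times $\mathbf{1}$, consistent with the observation, made next, that the learned Casimir depends on the initialization of the optimization variables.

```python
import random

n = 5
edges = [(a, b) for a in range(n) for b in range(a + 1, n)]
m = len(edges)
J = [[0.0] * m for _ in range(n)]
for e, (a, b) in enumerate(edges):
    J[a][e], J[b][e] = -1.0, 1.0


def Jt_times(a):
    """J^T a (length m)."""
    return [sum(J[i][e] * a[i] for i in range(n)) for e in range(m)]


def J_times(b):
    """J b (length n)."""
    return [sum(J[i][e] * b[e] for e in range(m)) for i in range(n)]


def casimir_loss(a, b):
    """||J^T dC/dp||^2 + ||J dC/dq||^2 for the linear C(p, q) = a^T p + b^T q (1-D)."""
    return sum(x * x for x in Jt_times(a)) + sum(x * x for x in J_times(b))


rng = random.Random(7)
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(m)]
a_mean0 = sum(a) / n
lr = 0.05
for _ in range(200):                                   # plain gradient descent
    grad_a = [2 * gv for gv in J_times(Jt_times(a))]   # d/da ||J^T a||^2 = 2 J J^T a
    grad_b = [2 * gv for gv in Jt_times(J_times(b))]   # d/db ||J b||^2 = 2 J^T J b
    a = [ai - lr * gi for ai, gi in zip(a, grad_a)]
    b = [bi - lr * gi for bi, gi in zip(b, grad_b)]
print(casimir_loss(a, b), a)
```

With this step size every non-null component shrinks by a factor of 0.5 per iteration, because the nonzero eigenvalues of $JJ^T$ and $J^TJ$ for the fully connected graph all equal $N$.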

Fig. 8: Entries of vectors $a_x$ and $a_y$

Another sanity check is to plot the evolution of the Casimir function over time, depicted in Figure 9, which shows that it has a constant value of 1077.53. The value of the Casimir function depends on the initial conditions of both the training data and the optimization variables. We repeated the learning process for a nonlinear (neural network) parameterization. In this case, we used Autograd to construct functions that compute the partial derivatives of the Casimir function. In addition to these Jacobians, we used Autograd to generate the gradient of the loss function. As a result, each iteration of the Adam algorithm becomes slower, namely 3 sec. We ran the algorithm for 500 iterations, starting from random initial values of the optimization variables selected around zero. The Casimir function for the nonlinear parameterization, computed at each point on the state trajectory, is shown in Figure 10, where we notice that the function takes a constant value of approximately $-0.4641$.

Fig. 9: Casimir function over time for the linear parameterization

Fig. 10: Casimir function over time for the nonlinear parameterization

VI. Conclusions

In this paper we proposed a framework based on the port-Hamiltonian modeling formalism, aimed at learning interaction models between particles and dynamical properties, such as trajectory symmetries and conservation laws, of ensembles (or swarms) using large-scale optimization approaches. We built upon the Cucker-Smale particle interaction model, which we represented in port-Hamiltonian form, and for which we re-discovered the interaction model and learned the dynamical properties that were previously proved analytically. Our approach can potentially be used for discovering novel particle interaction rules, which can lead to new cooperative control laws. Future steps will include scaling up the problem to a very large number of particles, considering nonlinear parameterizations for the symmetry maps, and re-casting the learning tasks in a form that is compatible with parallel GPU computations on deep learning platforms. In addition, we will explore whether symbolic computation of Jacobians, together with automatic differentiation of the loss function, leads to a significant decrease in time per optimization iteration.

References

[1] J.S. Baras. A fresh look at network science: Interdependent multigraphs models inspired from statistical physics. In Proceedings of the 6th International Symposium on Communication, Control and Signal Processing, pages 497–500, May 2014.

[2] F. Lu, M. Zhong, S. Tang, and M. Maggioni. Nonparametric inference of interaction laws in systems of agents from trajectory data. arXiv preprint arXiv:1812.06003, 2018.

[3] S.L. Brunton, J.L. Proctor, and J.N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.

[4] J. Bongard and H. Lipson. Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 104(24):9943–9948, 2007.

[5] I. Matei, J. de Kleer, and R. Minhas. Learning constitutive equations of physical components with constraints discovery. In 2018 Annual American Control Conference (ACC), pages 4819–4824, June 2018.

[6] I. Matei, J. de Kleer, M. Zhenirovskyy, and A. Feldman. Learning constitutive equations of physical components with predefined feasibility conditions. In 2019 American Control Conference (ACC), pages 922–927, July 2019.

[7] Z. Mao, Z. Li, and G.E. Karniadakis. Nonlocal flocking dynamics: Learning the fractional order of PDEs from particle simulations. arXiv preprint arXiv:1810.11596, 2018.

[8] A.J. van der Schaft and B.M. Maschke. Port-Hamiltonian systems on graphs. SIAM Journal on Control and Optimization, 51(2):906–937, 2013.

[9] A.J. van der Schaft. Port-Hamiltonian systems: an introductory survey. In M. Sanz-Sole, J. Soria, J.L. Varona, and J. Verdera, editors, Proceedings of the International Congress of Mathematicians Vol. III, number suppl 2, pages 1339–1365. European Mathematical Society Publishing House (EMS Ph), 2006.

[10] A.J. van der Schaft and D. Jeltsema. Port-Hamiltonian systems theory: An introductory overview. Foundations and Trends® in Systems and Control, 1(2-3):173–378, 2014.

[11] J. Cervera, A.J. van der Schaft, and A. Baños. Interconnection of port-Hamiltonian systems and composition of Dirac structures. Automatica, 43(2):212–225, 2007.

[12] A. Mouchet. Applications of Noether conservation theorem to Hamiltonian systems. Annals of Physics, 372, 12 2015.

[13] J. Schwichtenberg. Physics from symmetry. Springer, 2015.

[14] J.S. Baras. Group invariance and symmetries in nonlinear control and estimation. Nonlinear Control in the Year 2000, A. Isidori, F. Lamnabhi-Lagarrigue, W. Respondek (Eds.), 1:137–171, December 2000.

[15] C.W. Reynolds. Flocks, herds and schools: A distributed behavioral model. In ACM SIGGRAPH Computer Graphics, volume 21, pages 25–34. ACM, 1987.

[16] J.A. Carrillo, M. Fornasier, G. Toscani, and F. Vecil. Particle, kinetic, and hydrodynamic models of swarming. Birkhäuser Boston, Boston, 2010.

[17] J.A. Carrillo, S. Martin, and V. Panferov. A new interaction potential for swarming models. Physica D: Nonlinear Phenomena, 260:112–126, 2013.

[18] G.W. Bluman and S.C. Anco. Symmetry and integration methods for differential equations. Applied Mathematical Sciences, (154), 2002.

[19] G.W. Bluman, A.F. Cheviakov, and S.C. Anco. Applications of symmetry methods to partial differential equations. Applied Mathematical Sciences, (163), 2010.

[20] D. Maclaurin, D. Duvenaud, M. Johnson, and J. Townsend. Autograd. https://github.com/HIPS/autograd, 2018.

[21] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. 2017.