
Neural Networks 29–30 (2012) 8–19

Contents lists available at SciVerse ScienceDirect

Neural Networks

journal homepage: www.elsevier.com/locate/neunet

The eMOSAIC model for humanoid robot control

Norikazu Sugimoto [a,b,*], Jun Morimoto [b], Sang-Ho Hyon [b,c], Mitsuo Kawato [b]

[a] National Institute of Information and Communications Technology, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan

[b] ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan

[c] Department of Robotics, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan

ARTICLE INFO

Article history:

Received 16 November 2010

Received in revised form 20 September 2011

Accepted 13 January 2012

Keywords:

Modular architecture

Nonlinear and non-stationary control problem

Humanoid robot

Computational neuroscience

ABSTRACT

In this study, we propose an extension of the MOSAIC architecture to control real humanoid robots.

MOSAIC was originally proposed by neuroscientists to understand the human ability of adaptive control.

The modular architecture of the MOSAIC model can be useful for solving nonlinear and non-stationary

control problems. Both humans and humanoid robots have nonlinear body dynamics and many degrees

of freedom. Since they can interact with environments (e.g., carrying objects), control strategies need to

deal with non-stationary dynamics. Therefore, MOSAIC has strong potential as a human motor-control

model and a control framework for humanoid robots. Yet application of the MOSAIC model has been

limited to simple simulated dynamics since it is susceptible to observation noise and also cannot be applied

to partially observable systems. Our approach introduces state estimators into MOSAIC architecture to

cope with real environments. By using an extended MOSAIC model, we are able to successfully generate

squatting and object-carrying behaviors on a real humanoid robot.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Previous studies have suggested that the human central

nervous system acquires and switches internal models of outside

environments to adaptively perform motor control of the body

in various environments (Imamizu, Kuroda, Yoshioka, & Kawato,

2004; Imamizu et al., 2007). Modular selection and identification

for control (MOSAIC) architecture composed of multiple linear

state predictors and controllers was originally proposed in

order to explain the motor-control strategy of the human

brain (Haruno, Wolpert, & Kawato, 2001; Wolpert & Kawato,

1998). Humans and humanoid robots both have nonlinear body

dynamics and many degrees of freedom (DOF). Moreover, they

both interact with objects in real environments, which naturally

requires nonlinear and non-stationary control strategies. MOSAIC

architecture has strong potential as a human motor-control model

and a control framework for humanoid robots, and its flexible

structure enables the robot to control the nonlinear and non-

stationary environment. But application of the MOSAIC model

has been limited to simple simulated dynamics, which can be

partially attributed to two problems: (1) the MOSAIC model does

* Corresponding author at: National Institute of Information and Communications Technology, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, 619-0288 Kyoto, Japan. Tel.: +81 0774 95 1064; fax: +81 0774 95 1259.

E-mail addresses: xsugi@nict.go.jp (N. Sugimoto), xmorimo@atr.jp (J. Morimoto), gen@fc.ritsumei.ac.jp (S. Hyon), kawato@atr.jp (M. Kawato).

not explicitly consider the existence of noise input to sensory

systems and (2) it assumes full observation and thus cannot deal

with partially observable systems. In this study, we propose an

extension of the MOSAIC architecture to cope with observation

noise and partially observable systems. Each module of MOSAIC

has a forward model, and these forward models can be used to construct a

state estimator. Moreover, using state estimators can provide a

reasonable model of the sensorimotor function of the central

nervous system, as previously suggested in Wolpert, Ghahramani,

and Jordan (1995). A state estimation strategy using switching

linear models is also considered to be a useful approach for

estimating hidden variables of complicated nonlinear dynamics

(Ghahramani & Hinton, 2000). Our new method, called extended

MOSAIC with state estimators (eMOSAIC), can deal with large

observation noise and partially observable systems. Based on these advantages, the proposed control framework can be applied to real

systems such as humanoid robots.

A popular approach to controlling nonlinear dynamics is using

the inverse dynamics model and canceling the nonlinear terms.

Then, linear error dynamics are introduced to reduce the tracking

error. This approach is called feedback linearization (FL; Slotine &

Li, 1991). FL performs well if the given inverse dynamics model is

accurate. However, it is well known that estimating the inverse

model of the real system is difficult due to the existence of

friction. The controller of the proposed method can be derived as a linear optimal controller, which is well known to have low sensitivity to modeling errors. Thus, the proposed method is more robust against modeling errors than FL. Another

0893-6080/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.

doi:10.1016/j.neunet.2012.01.002

Page 2

N. Sugimoto et al. / Neural Networks 29–30 (2012) 8–19

9

popular nonlinear control method is gain scheduling (GS) control

(Rugh, 1991; Rugh & Shamma, 2000). GS treats nonlinear dynamics as a linear parameter-varying (LPV) system (Shamma & Athans, 1992) defined around pre-determined states. Although robustness analyses are well studied (Chesi, 2010; Lim & How, 1998), designing the scheduling parameter requires knowledge of the structure of the task and plant; thus, it is difficult to apply GS to a complex environment such as a humanoid robot. In our proposed method, on the other hand, appropriate models can be selected according to the prediction accuracy of each linear state predictor.

Adaptive control (Astrom & Wittenmark, 1989) is a standard

method for controlling a non-stationary system. In this method,

controller parameters are adaptively modified according to the

tracking error. Since this method assumes that the environment

change is gradual, it cannot cope with a sudden change of

dynamics; e.g., carrying or grabbing extra weight. To control

such an environment, the switching control method seems to be

suitable (Morse, 1996). In the proposed method, again, modules can be switched according to the prediction accuracy of each linear state predictor.

In Section 2, we introduce eMOSAIC. In this study, we adopt an

optimal control approach in the MOSAIC model (as proposed in

Doya, Samejima, Katagiri, and Kawato (2002)).

In Sections 3 and 4, we evaluate the control performance of

the eMOSAIC model in environments with large observation noise, a partially observable setup, and modeling errors. We investigate (1) nonlinear control performance through a squatting task (Grimes, Chalodhorn, & Rao, 2006; Nakaoka, Nakazawa, & Ikeuchi, 2004) using either a two-link robot model or the humanoid robot, and (2) non-stationary control performance through an object-carrying

task using either a single-pendulum robot model or the humanoid

robot with a weight. We also show that eMOSAIC significantly

outperforms the original MOSAIC in these tasks. In Section 3, we

consider the squatting task by using the two-link robot model and

the lifting task by using the single-pendulum robot model in a

simulated environment. In Section 4, we apply eMOSAIC to control

our humanoid robot CB-i (Fig. 1) to demonstrate that the proposed

model can be used in a real environment. The results show that

a humanoid robot can maintain its balance while performing the

squatting and object-carrying tasks using the eMOSAIC model.

2. The eMOSAIC model

In this section, we introduce the eMOSAIC model to control a

highly nonlinear system, such as our CB-i humanoid robot (see

Fig. 1), in a non-stationary environment.

2.1. Optimal control problem of nonlinear and non-stationary dynamics

We consider an optimal control problem of nonlinear and non-stationary dynamics, where the dynamics are represented as:

x(t + 1) = f(x(t), u(t), t) + n(t),  (1)

y(t) = h(x(t), t) + v(t),  (2)

where x ∈ ℜ^N, u ∈ ℜ^D, and y ∈ ℜ^L are the state, action, and observation vectors, respectively, and n(t) ∼ N(0, Σ_x) and v(t) ∼ N(0, Σ_y) are the system and observation noises. N(0, Σ) denotes a Gaussian distribution with zero mean and covariance Σ.

In an optimal control framework, the learning system tries to find the optimal controller to minimize the objective function:

J = E[ Σ_{s=0}^{∞} r(x(s), u(s)) ],  (3)

Fig. 1. Humanoid robot CB-i. Height: 1.6 m. Weight: 90 kg. The robot has 51 degrees of freedom.

where r(x, u) is the cost function. To find the optimal controller for minimizing the objective function, we estimate the value function:

V(x(t)) = E[ Σ_{s=t}^{∞} r(x(s), u(s)) ].  (4)
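As a concrete illustration, the expectation in Eq. (3) can be approximated by averaging finite-horizon rollouts. The following sketch is our own, not part of the paper: it assumes full state access and uses hypothetical callables `f`, `controller`, and `r` standing in for the dynamics, policy, and cost of Eqs. (1)–(3).

```python
import numpy as np

def rollout_cost(f, controller, r, x0, T, Sigma_x, rng):
    """Finite-horizon Monte Carlo estimate of the objective J in Eq. (3).

    f, controller, r are stand-ins for the dynamics, policy, and cost;
    Sigma_x is the covariance of the system noise n(t)."""
    x, J = x0, 0.0
    for t in range(T):
        u = controller(x)
        J += r(x, u)                                       # accumulate r(x(s), u(s))
        n = rng.multivariate_normal(np.zeros(len(x)), Sigma_x)
        x = f(x, u, t) + n                                 # Eq. (1)
    return J
```

Averaging `rollout_cost` over many runs and a long horizon T approximates J; the infinite-horizon sum is truncated here for tractability.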

2.2. The eMOSAIC model

The eMOSAIC model has a modular architecture. Each module

is composed of a state estimator, responsibility predictor, value

function estimator, and controller. We approximate nonlinear

and non-stationary dynamics by switching linear models and a

nonlinear cost function by switching quadratic models:

x(t + 1) = A_i x(t) + B_i u(t) + c_i + n(t),  (5)

y(t) = H_i x(t) + v(t),  (6)

r_i(x(t), u(t)) = (1/2) x(t)^T Q_i x(t) + (1/2) u(t)^T R_i u(t),  (7)

where A_i ∈ ℜ^{N×N} and B_i ∈ ℜ^{N×D} are the regression parameters of the ith linear dynamics, c_i ∈ ℜ^N is the bias parameter, and H_i ∈ ℜ^{L×N} is an observation matrix. Q_i ∈ ℜ^{N×N} and R_i ∈ ℜ^{D×D} are the parameters of the ith quadratic cost function. Therefore, each state estimator and controller can be represented by a linear model, and the value function estimator can be represented by a quadratic model.

Fig. 2 shows a schematic diagram of the eMOSAIC model. The responsibility predictor derives the responsibility of each module based on the state-prediction accuracy of the state estimator. The final output from the learning system is then derived as the weighted sum of each module's output by the responsibility signal. Below, we explain the details of (I) learning forward models, (II) state estimators, (III) responsibility predictors, (IV) value function estimators, and (V) controllers.


Fig. 2. Extended MOSAIC with state estimators (eMOSAIC).

2.2.1. (I) Learning forward models

The objective of learning in each linear dynamics model is

to minimize the weighted prediction error. If each model is represented in the linear form x_i(t + 1) = W_i z(t),¹ the expected values for the regression parameters W_i can be derived as

W_i = ⟨x z^T⟩_i(T) ⟨z z^T⟩_i(T)^{-1}.  (8)

(8)

The notation ⟨g⟩i(T) annotates a weighted mean of a function g

with respect to the responsibility signal λi(t):

⟨g⟩i(T) =

1

T

T

t=1

g(t)λi(t).

(9)

The responsibility signal is a probability distribution of the module

selection (described in Section 2.2.3). By iterating the responsibility-signal calculation and the parameter update, the likelihood increases toward a local optimum.
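The responsibility-weighted regression of Eqs. (8) and (9) reduces to ordinary matrix operations. The function below is a minimal numpy sketch written for this exposition; the names and data layout are our own, not from the paper.

```python
import numpy as np

def fit_forward_model(X, U, X_next, lam):
    """Responsibility-weighted least squares for one module (cf. Eq. (8)).

    X      : (T, N) states x(t)
    U      : (T, D) actions u(t)
    X_next : (T, N) successor states x(t + 1)
    lam    : (T,)  responsibility signal lambda_i(t)

    Returns A_i (N, N), B_i (N, D), c_i (N,) from W_i = <x z^T> <z z^T>^{-1},
    where z = [x^T, u^T, 1]^T.
    """
    T, N = X.shape
    D = U.shape[1]
    Z = np.hstack([X, U, np.ones((T, 1))])            # (T, N+D+1)
    # Weighted moment matrices <x z^T>_i(T) and <z z^T>_i(T), Eq. (9)
    XZ = (X_next * lam[:, None]).T @ Z / T
    ZZ = (Z * lam[:, None]).T @ Z / T
    W = XZ @ np.linalg.pinv(ZZ)                        # (N, N+D+1)
    return W[:, :N], W[:, N:N + D], W[:, -1]
```

With λ_i ≡ 1 this is plain least squares; in the eMOSAIC loop the responsibilities soft-assign each sample to the module that predicts it best.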

2.2.2. (II) State estimators

The state estimator estimates the latent states of the dynamics

from an observation. We use a linear state estimator:

x̂_i(t + 1|t) = A_i x̂_i(t) + B_i u(t) + c_i,  (10)

x̂_i(t + 1) = x̂_i(t + 1|t) + K_i (y(t + 1) − H_i x̂_i(t + 1|t)),  (11)

where x̂_i is the estimated state and K_i is the gain of the state estimator. We derive the parameter K_i by solving the linear optimal estimation problem (Kalman & Bucy, 1961; Lewis, 1986) (see Appendix A).
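The predict/update recursion of Eqs. (10) and (11) has the standard Kalman form. Below is a minimal numpy sketch of ours; for brevity the steady-state gain K_i is obtained by iterating the discrete Riccati recursion, whereas the paper derives it from the linear optimal estimation problem in its Appendix A.

```python
import numpy as np

def steady_state_gain(A, H, Sigma_x, Sigma_y, iters=500):
    """Iterate the discrete Riccati recursion to a steady-state Kalman gain."""
    n = A.shape[0]
    P = np.eye(n)
    K = np.zeros((n, H.shape[0]))
    for _ in range(iters):
        P_pred = A @ P @ A.T + Sigma_x              # predicted covariance
        S = H @ P_pred @ H.T + Sigma_y              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)
        P = (np.eye(n) - K @ H) @ P_pred
    return K

def estimator_step(x_hat, u, y, A, B, c, H, K):
    """One step of the module's linear state estimator (Eqs. (10)-(11))."""
    x_pred = A @ x_hat + B @ u + c                  # Eq. (10): prediction
    e = y - H @ x_pred                              # innovation (cf. Eq. (15))
    return x_pred + K @ e                           # Eq. (11): update
```

Run per module; each module's innovation e_i then feeds the responsibility predictor of Section 2.2.3.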

2.2.3. (III) Responsibility predictors

The contribution of each module is represented by probability

distribution λi, which we call the ‘‘responsibility signal’’. The

responsibility signal λ_i is given by the following Bayes' rule:

λ_i(t) = P(i) p(x(t) | y(1:t), i) / Σ_{i'∈M} P(i') p(x(t) | y(1:t), i'),  (12)

where M is the set of module indices, p(x(t) | y(1:t), i) is the likelihood of the ith module, and P(i) is the prior. By assuming that the prediction error and estimation error are Gaussian with covariances Σ_x and Σ_y, the likelihood of the ith module p(x(t) | y(1:t), i) is given by

p(x(t) | y(t), i) ∝ p(y(t) | x(t|t−1), i) p(x(t|t−1), i),  (13)

p(y(t) | x(t|t−1), i) = (1/√((2π)^L |Σ_y|)) exp(−(1/2) e_i(t)^T Σ_y^{−1} e_i(t)),  (14)

e_i(t) = y(t) − H_i x̂_i(t|t−1),  (15)

p(x(t|t−1), i) = p(x̂_i(t|t−1) | x̂_i(t−1), u(t−1)) = (1/√((2π)^N |Σ_x|)) exp(−(1/2) d_i(t)^T Σ_x^{−1} d_i(t)),  (16)

d_i(t) = x̂_i(t) − {A_i x̂_i(t−1) + B_i u(t−1) + c_i},  (17)

where x̂_i(t|t−1) is the predicted state of the state estimator of the ith module and e_i(t) is the so-called error of innovation.

¹ W_i and z represent [A_i B_i c_i] and [x^T u^T 1]^T, respectively.

Finally, the estimated state is derived as the weighted sum over the modules:

x̂(t) = Σ_{i∈M} λ_i(t) x̂_i(t),  (18)

where x̂_i(t) is the estimated state of the ith module at time t.
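The responsibility computation of Eqs. (12)–(18) can be sketched numerically as below. This is our own illustration: we work in log space to avoid underflow, and we drop the Gaussian normalization constants, which cancel in the softmax when all modules share the same Σ_x and Σ_y (as assumed here).

```python
import numpy as np

def responsibilities(errors_e, errors_d, Sigma_y, Sigma_x, prior):
    """Responsibility signal lambda_i(t) via Bayes' rule (cf. Eq. (12)).

    errors_e : (M, L) innovation errors e_i(t) of the M modules (Eq. (15))
    errors_d : (M, N) prediction errors d_i(t)            (Eq. (17))
    prior    : (M,)  prior P(i) over modules
    """
    Sy_inv = np.linalg.inv(Sigma_y)
    Sx_inv = np.linalg.inv(Sigma_x)
    # Log-likelihoods of Eqs. (14) and (16); shared constants omitted
    log_like = np.array([
        -0.5 * e @ Sy_inv @ e - 0.5 * d @ Sx_inv @ d
        for e, d in zip(errors_e, errors_d)
    ])
    log_post = np.log(prior) + log_like
    log_post -= log_post.max()                  # numerical stability
    lam = np.exp(log_post)
    return lam / lam.sum()                      # normalize, Eq. (12)

def blended_state(lam, x_hats):
    """Weighted sum of the module estimates (cf. Eq. (18))."""
    return lam @ x_hats                         # (M,) @ (M, N) -> (N,)
```

The module whose linear model best explains the current observation receives a responsibility near one; ties produce a soft blend.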

2.2.4. (IV) Value function estimators

We derive the controller by locally solving the linear-quadratic

optimal control problem. Since we approximate nonlinear and

non-stationary dynamics by multiple linear models in Eq. (5) and

the cost function by the quadratic functions in Eq. (7), we can

locally estimate the value function by using a quadratic function:

V_i(x(t)) = (1/2) (x̂(t) − x^v_i)^T P_i (x̂(t) − x^v_i),  (19)

where the matrix P_i is given by solving the Riccati equation:

0 = P_i A_i + A_i^T P_i − P_i B_i R_i^{−1} B_i^T P_i + Q_i.  (20)

The center x^v_i of the ith value function is given by

x^v_i = −(Q_i + P_i A_i)^{−1} P_i c_i.  (21)
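Eq. (20) is the continuous-time algebraic Riccati equation; libraries such as scipy provide `solve_continuous_are`, but a dependency-free sketch can simply integrate the Riccati ODE forward to its fixed point. This is our own illustration, adequate for small problems only.

```python
import numpy as np

def solve_care(A, B, Q, R, dt=1e-3, iters=200000, tol=1e-10):
    """Solve 0 = P A + A^T P - P B R^{-1} B^T P + Q (cf. Eq. (20))
    by Euler-integrating the Riccati ODE from P = 0 to a fixed point."""
    R_inv = np.linalg.inv(R)
    P = np.zeros_like(A)
    for _ in range(iters):
        dP = P @ A + A.T @ P - P @ B @ R_inv @ B.T @ P + Q
        P = P + dt * dP
        if np.abs(dP).max() < tol:              # residual of Eq. (20) vanished
            break
    return P

def value_center(A, P, Q, c):
    """Center of the ith value function, Eq. (21)."""
    return -np.linalg.inv(Q + P @ A) @ P @ c
```

For the double integrator (A = [[0, 1], [0, 0]], B = [[0], [1]], Q = I, R = [[1]]) the known solution is P = [[√3, 1], [1, √3]], which this iteration recovers.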

2.2.5. (V) Controllers

By using the estimated value function in Eq. (19), we can derive

the linear optimal controller for the ith module (Doya, 2000; Lewis,


Fig. 3. (a) Two-link robot model. The mass and length of each link are 5 kg and 0.5 m, respectively, and the friction coefficient of each joint is 0.1. (b) Relationship between the phase of the periodic pattern generator φ and the posture of the two-link robot model. For details, see Appendix D. (c) Single-pendulum model with payload. The masses of the weights attached at the corner and tip are 10 kg and 5 kg, and the lengths of the long and short axes are 1.5 m and 0.5 m, respectively.

1986) as:

u_i(t) = −R_i^{−1} B_i^T P_i (x̂(t) − x^v_i).  (22)

The final output of the controller is derived as the weighted sum of the module outputs:

u(t) = Σ_{i∈M} λ_i(t) u_i(t).  (23)
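Putting Eqs. (22) and (23) together, the blended control output can be sketched as follows. The per-module dictionary layout is our own convenience; the paper obtains P_i and x_i^v as in Section 2.2.4.

```python
import numpy as np

def emosaic_control(x_hat, lam, modules):
    """Blend the per-module linear optimal controllers (Eqs. (22)-(23)).

    x_hat   : blended state estimate from Eq. (18)
    lam     : responsibility signals lambda_i(t)
    modules : list of dicts with keys 'B', 'R', 'P', 'xv' per module
              (hypothetical container for the module parameters)."""
    u = 0.0
    for lam_i, m in zip(lam, modules):
        # Eq. (22): u_i = -R_i^{-1} B_i^T P_i (x_hat - x_i^v)
        u_i = -np.linalg.inv(m['R']) @ m['B'].T @ m['P'] @ (x_hat - m['xv'])
        u = u + lam_i * u_i                     # Eq. (23): responsibility-weighted sum
    return u
```

With a single fully responsible module this reduces to an ordinary LQR feedback law.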

3. Simulations

We evaluated the proposed method by using a simple robot

model (see Fig. 3(a) and (c)). The squatting and lifting tasks

evaluate the nonlinear and non-stationary control performance,

respectively.

3.1. Squatting task

3.1.1. Simulation setup

A basic squatting behavior is provided by a periodic pattern

generator. Fig. 3(b) shows the relationship between the phase of

the periodic pattern generator (φ) and the posture of the two-

link robot model (see Appendix D). The pattern generator outputs

a simple sinusoidal trajectory to a proportional derivative (PD)

controller of the robot. The frequency of the squatting movement

is 0.2 Hz. Since a robot flexes and extends its legs periodically,

a complex control law is required to prevent it from falling. We

apply eMOSAIC to control the two-link robot model so that it can

maintain its balance during the squatting. Note that the two-link

robot model cannot maintain its balance by only using the output

of the periodic pattern generator.

The angle of the lower link and the center of mass (CoM) are represented by θ_1 and θ_2 (see Fig. 3(a)). The input state vector is x = [θ_1, θ_2, θ̇_1, θ̇_2]^T. The output of eMOSAIC is a relative desired trajectory of the root joint. The sum of the relative desired trajectory and the output of the periodic pattern generator is used to derive the torque at each joint based on a PD controller. The torque commands for the actuators are generated by the PD controller, which is given by

τ = −K(q − q_d),  (24)

where q = [q_root, q_middle, q̇_root, q̇_middle]^T represents the positions and velocities of the joint angles, q_d is the desired trajectory, τ = [τ_root, τ_middle]^T is a torque command for the actuators of the robot, and

K = [ 50  0  10   0
       0 50   0  10 ]

is a gain matrix of the PD controller.

The cost is given in the quadratic form (Eq. (7)). The parameters of the quadratic cost are Q_i = diag{0, 1, 0, 1} and R_i = 0.1 for all modules. One trial lasted 10 s. The simulation and observation time step was 0.002 s.

We evaluated the prediction error, the control cost, and the computation time without observation and system noise in order to select the number of modules. Each module segmented the squatting movement at regular phase intervals of the pattern generator from φ = 0 (standing position) to φ = π (crouching


Fig. 4. Evaluation results of (a) average prediction error, (b) average cost, and (c) computation time for different numbers of modules.

Fig. 5. Results of the squatting task. (a) Average costs with different observation noise levels. The means and standard deviations over 100 simulation runs are plotted. (i) The dotted line indicates the result of the original MOSAIC. (ii) The dashed line indicates the result of eMOSAIC without the responsibility predictor. (iii) The solid line indicates the result of the proposed method. (b) The trajectory of the CoM (θ_2). The dotted and solid lines indicate the original MOSAIC and eMOSAIC, respectively. (c) The responsibility signals of the proposed method (iii). Panel A indicates the phase of the periodic pattern generator, and panels B–D indicate the responsibility signals of the three modules. The low-pass-filtered signals are plotted with a cutoff frequency of 10 Hz.

position). For example, in the case of three modules (M = 3), each forward model predicts the dynamics around φ = 0, π/2, and π. Then, for each module, we sampled trajectories around the corresponding posture by using random Gaussian noise as a control input and estimated the parameters of the forward model (see Eq. (8)). Fig. 4 shows the relationships between the number of modules and the average prediction error, the average control cost, and the computation time for one trial. The prediction error and the control cost decreased drastically from M = 1 to M = 3, while significant differences between these values with M = 3 and those with M = 6 were not observed. On the other hand, the computation time increased linearly with the number of modules. Therefore, we decided to use three modules for this task. The acquired parameters of the linear models are presented in Appendix B.

In this simulated environment, we focused on showing three advantages of the eMOSAIC model: (1) it can be applied to an environment with large observation noise (see Section 3.1.2), (2) it can be applied to a partially observable system (see Section 3.1.3), and (3) it can cope with modeling errors (see Section 3.1.4).
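For completeness, the PD law of Eq. (24) above can be sketched as below. The 2 × 4 arrangement of the gain matrix K (position gains 50, velocity gains 10) is our reading of the printed matrix, so treat it as illustrative.

```python
import numpy as np

# Gain matrix of the PD controller in Eq. (24): position gains 50,
# velocity gains 10 for the two joints (layout is our reconstruction).
K = np.array([[50.0,  0.0, 10.0,  0.0],
              [ 0.0, 50.0,  0.0, 10.0]])

def pd_torque(q, q_d, K=K):
    """tau = -K (q - q_d), with q = [q_root, q_middle, dq_root, dq_middle]
    and tau = [tau_root, tau_middle]."""
    return -K @ (q - q_d)
```

A 0.1 rad position error on the root joint alone thus yields a restoring torque of −5 N·m on that joint and zero on the other.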

3.1.2. Robustness against large observation noises

First we evaluated the robustness of eMOSAIC against large

observation noises. For this comparison, we considered two

methods: original MOSAIC and eMOSAIC.

We tested the control performance of the two methods with observation noise Σ_y = σ_obs² I, where σ_obs² = 0–0.12. Fig. 5(a)

shows the relationship between the size of the observation noise

and the average cost in the squatting task. The mean and standard

deviation of the cost over 100 simulation runs are plotted. Dotted

and solid lines represent results acquired by the two methods,

original MOSAIC and eMOSAIC, respectively. The average cost of

the original MOSAIC rapidly increased with the

size of the observation noise. In contrast, the average cost of

eMOSAIC was relatively insensitive to the size of the observation

noise. These results indicate that eMOSAIC is robust against

observation noise.

Next, we evaluated the effect of prior information on module selection. If the linear modules are sparsely allocated, the

responsibility signal tends to chatter around a border of the local

state spaces specified by the linear models, due to the softmax

calculation of Eq. (12). Chattering of the responsibility signals

causes damage to the hardware in the real environment. To avoid such hardware damage, temporal continuity, which assumes that the modules switch smoothly, can be introduced as prior information for module selection. We describe the details of the temporal continuity in Appendix C. The natural frequency of the pendulum represented by the CoM at the standing position (φ = 0) is 0.705 Hz. We empirically found that a time-constant parameter of the temporal continuity ρ around 10–20 times the natural frequency leads to good control performance. In this simulation, we set the time-constant parameter of the temporal continuity to 15 times the natural frequency (ρ = 1/(15 × 0.705)). Dashed