Content uploaded by Bin Yang

Author content

All content in this area was uploaded by Bin Yang on Jan 16, 2015

Content may be subject to copyright.

Evolving Additive Trees for Modeling

Biochemical Systems

Yuehui Chen

∗

Bin Yang Yaou Zhao

Qingfang Meng

Computational Intelligence Lab, School of Information Science and Engineering,

University of Jinan, Shandong, Jinan 250022, P.R. China

Abstract This paper presents a hybrid evolutionary method for identifying a system of ordinary

differential equations (ODEs) from the observed time series. In this approach, the tree-structure

based evolution algorithm and particle swarm optimization (PSO) are employed to evolve the ar-

chitecture and the parameters of the additive tree models for the problem of system identiﬁcation.

Experimental results on modeling biochemical system show that the proposed method is more fea-

sible and effective than other related works.

Keywords Additive tree models, Evolutionary Algorithms, Ordinary differential equations, Par-

ticle swarm optimization, Biochemical systems

1 Introduction

Many complex systems and nonlinear phenomena changing over time exist in physics,

chemistry, economics, bioinformatics etc. Weather forecasting, quantum mechanics, wave

propagation, stock market dynamics and identiﬁcation of biological systems are exam-

ples [1]. The system of differential equations is a powerful and ﬂexible model, which

can describe complex relations among components [3]. So a lot of problems in these

ﬁelds can be expressed by the ordinary differential equations (ODEs). Thus the differen-

tial equation identiﬁcation is very important, and various methods have been proposed to

infer ODEs during the last few years. The methods can be classiﬁed into two categories:

one is to identify the parameters of the ODEs and another is to identify its structure. The

former is exempliﬁed by the Genetic Algorithms (GA), and the latter is by the Genetic

Programming (GP). Cao and his colleagues use GP to evolve the ODEs from the ob-

served time series in 1999 [2]. His main idea is to embed the genetic algorithm (GA) in

genetic programming (GP), where GP is employed to discover and optimize the model’s

structure, and GA is employed to optimize the model’s parameters. They show that the

GP-based approach introduces numerous advantages over the most available modeling

method. H.Iba propose the ODEs identiﬁcation method by using the least mean square

(LMS) along with the ordinary GP [3]. Some individuals are created at some intervals of

generations and replace the worst individuals in the population by the LMS. I.G.Tsoulos

∗

Email: yhchen@ujn.edu.cn

The Third International Symposium on Optimization and Systems Biology (OSB’09)

Zhangjiajie, China, September 20–22, 2009

Copyright © 2009 ORSC & APORC, pp. 132–141

and I.E.Lagar propose a novel method based on grammatical evolution [1]. This method

forms generations of trial solutions expressed in an analytical closed form.

In 2005, we have proposed a new representation scheme of the evolved additive tree

models for the system identiﬁcation, especially the reconstruction of polynomials and

the identiﬁcation of linear/nonlinear systems. This model is robust, and it is easy to be

analyzed by traditional techniques. This is because the evolved additive tree model is

simple in form and is very similar with the traditional representation of the system to be

reconstructed [ 10].

In this paper, we propose a hybrid evolutionary method, in which the tree-structure

based evolution algorithm and particle swarm optimization (PSO) are employed to evolve

the architecture and the parameters of the additive tree models for system of ordinary dif-

ferential equation identiﬁcation. The partitioning [4] is used in the process of identifying

the system’s structure. Each ODE in the ODEs is separately inferred.

The paper is organized as follows. In Section 2, we describe the details of our method.

In section 3, the four examples are performed to validate the effectiveness and precision

of the proposed method. Conclusions are reported in Section 4.

2 Method

2.1 The models’ structure optimization

2.1.1 The additive tree model

We use the tree-structure based evolution algorithm to evolve the architecture of the

additive tree models for the system of ordinary differential equation identiﬁcation. For

this purpose, we encode the right-hand side of a ODE into an additive tree individual

(Figure.1).

1

5

H[S

VLQ

[

[

[

[

[

:

:

:1

:1

G[L

GW

:[[:5:1H[S[[:1VLQ[

Figure 1: Example of a ODE which encoded into an additive tree model.

Two instruction / operator sets I

0

and I

1

are used to generate the additive tree.

I

0

= {+

2

,+

3

,...,+

N

}

Evolving Additive Trees for Modeling Biochemical Systems 133

I

1

= F ∪T = {∗, /, sin, cos, exp,rlog,x,R}

Here F = {∗,/,sin,cos,exp,rlog} and T = {x,R} are respectively the function sets

and the terminal sets, where +

N

,∗,/,sin,cos,exp,rlog,x, and R denote addition, mul-

tiplication, division, sine, cosine, exponent, logarithm, system inputs, random constant

number taking N, 2, 2, 1, 1, 1, 1, 0 and -1 arguments respectively [10].

N is an integer number(the maximum number of the ODE terms), I

0

is the instruction

set of the root node, and the instructions of other nodes are selected from the instruction

set I

1

. Note that if the right-hand sides of ODEs are the polynomial, the instruction set I

1

can be deﬁned as I

1

= {∗

2

,∗

3

,...,∗

n

,x

1

,x

2

,...,x

n

,R}.

We infer the system of ODEs with partitioning. Partitioning ( equations that describe

each variable of the system can be inferred separately) reduce the research space signif-

icantly. When using partitioning, a candidate equation for a signal variable is integrated

by substituting references to other variables with data from the observed time series [4].

Thus each right hand side of the ODE system is evolved independently in parallel.

2.1.2 Evolving structure of the equation

Finding an optimal or near-optimal additive tree is an evolutionary process. In this

study, the additive tree operators are used as follows.

(1) Mutation. In this paper, we choose three mutation operators to generate offsprings

from the parents. These mutation operators are as following:

1) Change one terminal node: randomly select one terminal node in the tree and

replace it with another terminal node which is generated randomly.

2) Grow: select a random leaf in the hidden layer of the tree and replace it with a

newly generated subtree.

3) Prone: randomly select a function node in a tree and replace it with a terminal

node selected in the set T.

(2) Crossover. First two parents are selected according to the predeﬁned crossover

probability P

c

. And select one nonterminal node in the hidden layer for each addi-

tive tree randomly, and then swap the selected subtree.

(3) Selection. EP-style tournament selection is applied to select the parents for the

next generation. This is repeated in each generation until the predeﬁned number of

generations or the best structure is found.

2.2 Parameter optimization of models using PSO

According to Fig.1, we check the parameters in equations, namely counting the num-

ber n

i

(i=1,2,...,N, N is the number of the equations).

According to n

i

, the particles are randomly generated initially. Each particle x

i

rep-

resents a potential solution. A swarm of particles moves through space, with the moving

velocity of each particle represented by the velocity vector v

i

. At each step, each particle

is evaluated and keep track of its own best position, which is associated with the best ﬁt-

ness it has achieved so far in a vector Pbest

i

. And the best position among all the particles

is kept as Gbest [6]. A new velocity for particle i is updated by

v

i

(t + 1) = v

i

(t) + c

1

r

1

(Pbest

i

−x

i

(t)) + c

2

r

2

(Gbest(t) −x

i

(t)) (1)

134 The 3rd International Symposium on Optimization and Systems Biology

Table 1: Parameters for experiments

Exp1 Exp2 Exp3 Exp4

Population size 20 50 20 300

Generation 50 100 50 100

Crossover rate 0.7 0.7 0.7 0.7

Time series 1 1 1 10

Stepsize 0.01 0.01 0.05 0.01

Data point 30 100 48 10

where c

1

and c

2

are positive constant, and r

1

and r

2

are uniformly distributed random

number in [0,1]. Based on the updated velocities, each particle changes its position ac-

cording to the following equation:

x

i

(t + 1) = x

i

(t) + v

i

(t + 1) (2)

2.3 Fitness deﬁnition

The ﬁtness of each variable is deﬁned as the sum of squared error(SSE) :

fitness(i) =

T−1

∑

k=0

(

x

′

i

(t

0

+ k∆t) −x

i

(t

0

+ k∆t)

)

2

(3)

where t

0

is the starting time, △t is the step size, T is the number of the data point,

x

i

(t

0

+k△t) is the actual outputs of the i-th sample, and x

′

i

(t

0

+k△t) is ODEs outputs. All

outputs are calculated by using the approximate forth-order Runge-Kutta method. When

calculating the outputs, some individuals may cause overﬂow. In this case, the individual

which ﬁtness becomes large will be weeded out from the population.

2.4 Summary of the proposed algorithm

The optimal design of each ODE can be described as follows.

(1) Randomly create an initial population(the structure and its corresponding parame-

ters).

(2) Structure optimization is achieved by the additive tree variation operators, which is

described in subsection 2.1.

(3) At some interval of generations, select the better structures to optimize parameters.

Parameter optimization is achieved by PSO as described in subsection 2.2. In this

process, the structure is ﬁxed.

(4) If satisfactory solution is found, then stop; otherwise go to step (2).

3 Experimental results

We have prepared four tasks to test the effectiveness of our method. Experimental

parameters are summarized in Table 1.

Evolving Additive Trees for Modeling Biochemical Systems 135

3.1 Example 1: E-cell simulation

In this part of the simulation, we use data of a metabolic network, called the E-cell sys-

tem (a part of the biological phospholipid pathway). The network includes two reactions

catalyzed by Glycerol kinase (EC2.7.1.30) and Glycerol-1-phosphatase (EC3.1.3.21). The

network’s external input is ATP [11]. Glycerol and sn-Glycerol-3phosphate are produced

and consumed by the two reactions [11]. This network can be approximated as

⎧

⎨

⎩

.

X

1

= −10.32X

1

X

3

.

X

2

= 9.72X

1

X

3

−17.5X

2

.

X

3

= −9.7X

1

X

3

−17.5X

2

.

X

4

= −2.3X

1

(4)

0 5 10 15 20 25 30

−0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

Time Points

Concentrations

X1

X2

X3

X4

pred X1

pred X2

pred X3

pred X4

Figure 2: Time series of the acquired model.

The last equation is added to the model for testing whether the proposed method could

produce false positives. The time series was generated for the above set of reactions with

initial conditions {1.2, 0.1, 0.1, 0.1 } for {X

1

, X

2

, X

3

,X

4

}. Experimental parameters

for this task are shown in Table 1. The used instruction set I

0

= {+

2

,+

3

,+

4

,+

5

} and

I

1

= {∗, X

1

,X

2

,X

3

,X

4

}. We have acquired the system of eq.(6), which gave the sums of

squared errors as (X

1

, X

2

, X

3

,X

4

)=(4.0 ×10

−12

, 9.0 ×10

−12

, 3.0 ×10

−12

, 3.6 ×10

−12

).

The time series generated is shown in Fig.3 along with that of the target.

The resulting model using our method is listed in Table 2. Comparing with the true

value, we can conﬁrm that our generating system is almost identical to the target ODEs.

We also have made further compare to examine the effectiveness of our proposed ap-

proach with GP+RLS and GP+KF. Obviously, the parameters of our model are closer to

the targeted model than GP+RLS and GP+KF. And the GP+KF needs the 1000 individu-

als ﬁrstly. Our initial population size is only 20.

3.2 Experiment 2: Three-species Lotka-Volterra model

The Lotka-Volterra model describes interactions between two species, i.e., predators

and preys, in an ecosystem [7]. The following ODEs represent a three-species Lotka-

136 The 3rd International Symposium on Optimization and Systems Biology

Table 2: Obtained Parameters by GP+RLS [13], GP+KF [12] and Our Proposed Method

true value GP+RLS GP+KF Our Method

w

11

-10.32 -9.64 -10.34 -10.319973

w

21

9.72 13.42 8.87 9.720075

w

22

-17.5 21.8 -17.42 -17.500157

w

31

-9.7 -5.63 -9.74 -9.699986

w

32

17.5 12.64 17.15 -17.500034

w

41

-2.3 -2.14 -2.24 -2.299913

Volterra model:

⎧

⎨

⎩

.

X

1

= (1 −X

1

−X

2

−10X

3

)X

1

.

X

2

= (0.992 −1.5X

1

−X

2

−X

3

)X

2

.

X

3

= (−1.2 + 5X

1

+ 0.5X

2

)X

3

(5)

This system models the introduction of third species, i.e., a predator, into a two-

species system of competition, i.e., preys. More precisely, X

1

and X

2

are the number

of preys competing with each other, whereas X

3

represents the number of predators. [11]

0 20 40 60 80 100

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Time Points

Concentrations

X1

X2

X3

pred X1

pred X2

pred X3

Figure 3: Time series of the acquired model for Lotka-Volterra model.

The time series are generated for the above set of ODEs with initial conditions {1.0,

2.0, 0.1 } for {X

1

,X

2

,X

3

}. The generated time series are shown in Fig.4. Experimental pa-

rameter for this task are shown in Table 1. The used instruction set I

0

= {+

3

,+

4

,+

5

,+

6

}

and I

1

= {∗,X

1

,X

2

,X

3

}. We have acquired the system of eq.(6), and note that the two sys-

tems of ODEs, i.e., eqs.(5) and eqs.(6), are almost identical except for slightly different

Evolving Additive Trees for Modeling Biochemical Systems 137

coefﬁcients. The sums of squared errors of ODEs are 8.1×10

−11

.

⎧

⎨

⎩

.

X

1

= 1.00122X

1

−1.0019X

1

2

−0.9991X

1

X

2

−10.0093X

1

X

3

.

X

2

= 0.9934X

2

−1.5008X

1

X

2

−0.9995X

2

2

−1.001X

2

X

3

.

X

3

= −1.1988X

3

+ 5.001X

1

X

3

+ 0.4984X

2

X

3

(6)

We conduct further experiments with the above model to compare with the perfor-

mance of the Multi Expression Programming (MEP), where the structure of the ODE is

inferred by the Multi Expression Programming (MEP) and the parameters of the ODE

are optimized by using particle swarm optimization (PSO). With the same population

size and iteration, the SSE of MEP is 7.431 ×10

−7

, and the SSE of our method is only

1.936×10

−9

. So our proposed method performs better than MEP in precision.

3.3 Experiment 3: bimolecular reaction

A bimolecular reaction equations [14]are described below:

X

2

+ X

1

K

1

−→ X

3

(7)

X

3

K

2

−→ X

4

+ X

2

(8)

The corresponding rate equations for all the four species are as follows:

⎧

⎨

⎩

.

X

1

= −2X

1

X

2

.

X

2

= −2X

1

X

2

+ 1.2X

3

.

X

3

= 2X

1

X

2

−1.2X

3

.

X

4

= 1.2X

3

(9)

The time series are generated for the reactions with initial conditions{1, 0.1,0,0}for

{X

1

,X

2

,X

3

,X

4

}which is shown in Fig.4 along with targeted time series. Experimental pa-

rameters for this task are shown in Table 1. The used instruction set I

0

= {+2,+4,+5,+6}

and I

1

= {∗,X

1

,X

2

,X

3

,X

4

}. We have acquired the system of eq.(11), which give the sums

of squared errors as (X

1

,X

2

,X

3

,X

4

)=( 1.0×10

−11

, 5.7×10

−12

, 3.0×10

−12

, 1.5×10

−11

).

⎧

⎨

⎩

.

X

1

= −1.9920X

1

X

2

.

X

2

= −1.1983X

1

X

2

+ 1.9920X

3

.

X

3

= 1.9920X

1

X

2

−1.1983X

3

.

X

4

= 1.1983X

3

(10)

Compared with eq.(10) [14], our predicted model and parameters are closer to the

target system. The method is executed with time series with different time step size and

with different initial condition. And the results have scarcely any change.

⎧

⎨

⎩

.

X

1

= −1.99999X

1

X

2

.

X

2

= 1.20000X

3

−1.99999X

1

X

2

.

X

3

= −1.20000X

3

−2.00000X

1

X

2

.

X

4

= 1.99999X

3

(11)

138 The 3rd International Symposium on Optimization and Systems Biology

0 10 20 30 40 50

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Time points

Concentrations

X1

X2

X3

X4

pred X1

pred X2

pred X3

pred X4

Figure 4: Time series of the acquired model for bimolecular reaction.

Table 3: Parameters of the genetic network system

i

α

i

g

i1

g

i2

g

i3

g

i4

g

i5

β

i

h

i1

h

i2

h

i3

h

i4

h

i5

1 5.0 1.0 -1.0 10.0 2.0

2 10.0 2.0 10.0 2.0

3 10.0 -1.0 10.0 -1.0 2.0

4 8.0 2.0 -1.0 10.0 2.0

5 10.0 2.0 10.0 2.0

3.4 Experiment 4: a gene regulatory network

Figure 5 shows the example of a gene regulatory network. This type of network can

be modeled by a so-called S-system model [15]. This model is based on approximating

kinetic laws with multivariate power-law functions. The model consists of n nonlinear

ODEs and the generic form of equation i is given as follows:

X

′

i

(t) =

α

i

n

∏

j=1

X

g

ij

j

(t) −

β

i

n

∏

j=1

X

h

ij

j

(t) (12)

where X is the vector of dependent variable,

α

and

β

are vectors of non-negative rate

constants and g and h are matric of kinetic orders [15].

The parameters of the genetic network are given in Table 3. And the time series con-

sists of 10 different experiments which initial conditions are created randomly with 11

uniformly sampled data-points per variable. The used instruction set I

0

= {+2,+3,+4},

F = {∗,a

x

}, and we identify the correct model(parameters in the Table 4) with Intel

Pentinum Dual 2.00GHz processor and 1GB memory in 1.9 h averagely. Shinichi Kikuchi [15]

obtained one false positive interaction (h

53

= 0.7) using 70 h on a super-computer with

a cluster of 1G processors (Pentium 3, 933 MHz). Obviously our approach is more ac-

curate and signiﬁcantly faster compared with the genetic algorithm approach by Shinichi

Evolving Additive Trees for Modeling Biochemical Systems 139

Table 4: Parameters estimated by our method

i

α

i

g

i1

g

i2

g

i3

g

i4

g

i5

1 5.0 0.9528 -0.9948

2 10.0 1.9984

3 10.0 -0.9978

4 7.9579 2.0 -1.047

5 10.0 1.9943

β

i

h

i1

h

i2

h

i3

h

i4

h

i5

1 10.0 1.8775

2 10.0 2.0

3 10.0 -0.9988 1.9988

4 9.9524 2.0

5 10.0 2.0

Kikuchi.

The above four experiments have been used widely to test the performance of the

methods inferring the ordinary differential equations for identiﬁcation of biochemical sys-

tems in the previous research [11, 12, 13, 14, 15]. From our experimental results, we can

see that not only the linear differential equation but also the nonlinear differential equation

could be correctly identiﬁed. And compared with the general methods, our method not

only can identify correctly the biochemical systems especially parameter through the very

short iterative times, but also needs the less initial population. So our proposed method

works well for modeling biochemical systems.

4 Conclusion

In this paper, a hybrid evolutionary method of evolving ODEs is proposed. By sev-

eral experiments, we succeed in creating the systems of ODEs which are very close to

the target systems. The experimental results show the effectiveness and veracity of the

proposed method. The proposed method has two advantages. (1) The evolved additive

tree model is robust and easy to analyze by using traditional techniques. This is because

the evolved additive tree model is simple in form and is very similar with the traditional

representation of the system. So we can acquire the best structure of the ODE only by a

small population. (2) With partitioning, each ODE of the ODEs can be inferred separately

and the research space reduces rapidly, so we can acquire the best system very fast.

In the future work, we will apply our approach to solve some real problems in physics,

chemistry, economics, bioinformatics etc. Weather forecasting, quantum mechanics, stock

market dynamics and identiﬁcation of biological systems are some examples.

Acknowledgment

This research was supported by the NSFC (60573065), the the Natural Science Foun-

dation of Shandong Province (Y2007G33), and the Key Subject Research Foundation of

Shandong Province.

140 The 3rd International Symposium on Optimization and Systems Biology

References

[1] I.G.Tsoulos, I.E.Lagaris, Solving differential equations with genetic programming, Genetic

Programming and Evolvable Machines, Volume 7, Issue 1, pp.1389-2576, 2006.

[2] H.Cao, L.Kang, Y.Chen, J.Yu, Evolutionary Modeling of Systems of Ordinary Differential

Equations with Genetic Programming, Genetic Programming and Evolvable Machines, vol.1,

no.40, pp.309-337, 2000.

[3] Erina Sakamoto, H.Iba, Inferring a system of differential equations for a gene regulatory net-

work by using genetic programming, Proc. Congress on Evolutionary Computation, pp.720-

726, 2001.

[4] Bongard J., Lipson H., Automated reverse engineering of nonlinear dynamical systems, Pro-

ceedings of the National Academy of Science, 104(24), pp. 9943-9948, 2007.

[5] Oltean, M., Grosan, C., Evolving digital circuits using multi expression programming, In:

Zebulum, R. et al., NASA/DoD Conference on Evolvable Hardware, 24-26 June, Seattle.

IEEE Press, NJ, pp. 87-90, 2004.

[6] Yuehui Chen, Bo Yang, Ajith Abraham, Flexible Neural Trees Ensemble for Stock Index

Modeling, Neurocomputing, Vol. 70, Issues 4-6, pp. 697-703, 2007.

[7] Y.Takeuchi, Global Dynamical Properties of Lotka-Volterra Systems, Singapore: World Sci-

entiﬁc, 1996. mechanisms from measured time-series, J. Phys. Chem, pp.970-979, 1995.

[8] Gennemark P, Wedelin D. Efﬁcient algorithms for ordinary differential equation model iden-

tiﬁcation of biological systems. IET Syst Biol, 1(2):120-129, 2007 .

[9] Savageau, M.A., Biochemical systems analysis: a study of function and design in molecular

biology, (Addison-Wesley, Reading, MA,1976).

[10] Yuehui Chen, Ju Yang, Yong Zhang and Jiwen Dong, Evolving Additive tree models for

System Identiﬁcation, International Journal of Computational Cognition, Vol.3, No.2, pp. 19-

26, 2005.

[11] Hitoshi Iba, Inference of differential equation models by genetic programming, Elsevier Sci-

ence Inc, Volume 178, Issue 23, pp. 4453-4468, 2008.

[12] Lijun Qian, Haixin Wang, Dougherty, E.R. Inference of Noisy Nonlinear Differential Equa-

tion Models for Gene Regulatory Networks Using Genetic Programming and Kalman Filter-

ing, Signal Processing, IEEE Transactions on, Volume: 56, Issue: 7, pp. 3327-3339, 2008.

[13] S. Ando, E. Sakamoto, and H. Iba, Evolutionary modeling and inference of gene network,

Inf. Sci., vol. 145, pp. 237-259, 2002.

[14] Srividhya, JCrampin, EJMcSharry, PESchnell, S, Reconstructing biochemical pathways from

time course data, Proteomics, 7(6), pp.828-838, 2007.

[15] Shinichi Kikuchi, Daisuke Tominaga1, Masanori Arita, Dynamic modeling of genetic net-

works using genetic algorithm and S-system, BIOINFORMATICS, 19(5), pp. 643-650, 2003.

Evolving Additive Trees for Modeling Biochemical Systems 141