
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 9, SEPTEMBER 2010

Selection Policy-Induced Reduction Mappings for Boolean Networks

Ivan Ivanov, Plamen Simeonov, Noushin Ghaffari, Xiaoning Qian, and Edward R. Dougherty

Abstract—Developing computational models paves the way to understanding, predicting, and influencing the long-term behavior of genomic regulatory systems. However, several major challenges have to be addressed before such models are successfully applied in practice. Their inherent high complexity requires strategies for complexity reduction. Reducing the complexity of the model by removing genes and interpreting them as latent variables leads to the problem of selecting which states and their corresponding transitions best account for the presence of such latent variables. We use the Boolean network (BN) model to develop a general framework for selection and reduction of the model's complexity via designating some of the model's variables as latent ones. We also study the effects of the selection policies on the steady-state distribution and the controllability of the model.

Index Terms—Compression, control, gene regulatory networks, selection policy.

I. INTRODUCTION

A fundamental goal of gene regulatory modeling is to derive and study intervention strategies for the purpose of beneficially altering the dynamic behavior of the underlying biological system [1]. Owing to the inherent computational burden of optimal control methods [2], the extreme complexity, which grows exponentially with network size, creates an impediment to the design of optimal control policies for large gene regulatory networks. The computational burden can be mitigated in different ways, for instance, by using an approximation to the optimal policy [3] or by designing greedy algorithms that do not involve optimization relative to a cost function [4], [5], but even approximation and greedy-control methods can only work with networks that are still relatively small. Another approach, the one considered here, is to reduce the network.

Manuscript received September 15, 2009; accepted April 23, 2010. Date of publication May 17, 2010; date of current version August 11, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yufei Huang. The work presented in the paper was partially supported by the NSF Grant CCF-0514644.

I. Ivanov is with the Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843 USA (e-mail: iivanov@cvm.tamu.edu).

P. Simeonov is with the Department of Mathematics, University of Houston-Downtown, Houston, TX 77002 USA (e-mail: SimeonovP@uhd.edu).

N. Ghaffari is with the Department of Electrical and Computer Engineering, and the Department of Statistics, Texas A&M University, College Station, TX 77843 USA (e-mail: nghaffari@tamu.edu).

X. Qian is with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: xqian@cse.usf.edu).

E. R. Dougherty is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA. He is also with the Translational Genomics Research Institute (TGEN), Phoenix, AZ 85004 USA (e-mail: edward@ece.tamu.edu).

Digital Object Identifier 10.1109/TSP.2010.2050314

The Boolean network (BN) model [6] and the probabilistic Boolean network (PBN) model [7], which in its binary form is essentially composed of a collection of constituent Boolean networks connected via a probability structure, have played key roles in the study of gene regulatory systems, in particular, with regard to regulatory intervention. These models are especially useful when there is evidence of switch-like behavior. In addition, the dynamical behavior of the PBN model is described by the well-developed theory of Markov chains and their associated transition probability matrices, which allows for a rigorous mathematical treatment of optimal regulatory intervention. To address the issue of changing the long-run behavior, stochastic control has been employed to find stationary control policies that affect the steady-state distribution of a PBN [8]; however, the algorithms used to find these solutions have complexity that increases exponentially with the number of genes in the network. Hence, there is a need for size-reducing mappings that produce more tractable models whose dynamical behavior and stationary control policies are “close” to those of the original model, the key issue here being closeness, however it is characterized.

One way to reduce a network is to delete genes. The first mapping of this kind was introduced in [9]. It preserves the probability structure of a given PBN; however, the number of possible constituent Boolean networks in the reduced model can increase exponentially compared to the original PBN, thereby increasing complexity of a different kind and making the biological interpretation of the reduced PBN problematic.

The reduction mapping proposed in [10] addresses this issue by mapping a given PBN onto a set of candidate PBNs without increasing the number of constituent networks in the original model. The removed gene is considered as a latent variable, which induces a specific “collapsing” of pairs of states from the state space of the original PBN and induces a selection of their successor states based on the steady-state distribution of the network. The “collapsing” procedure represents a situation in which a gene is not observable, which implies that states of the regulatory system that differ only in the expression of that gene become identical and thus “collapse” into each other. The notion of collapsing is both natural and general. If $x$ and $\tilde{x}^j$ are two states in the network that differ only in the value of gene $x_j$, and gene $x_j$ is to be deleted, then these states will be identified as a single state $x^{(j)}$ in the reduced network, and the question becomes how to treat them within the functional structure of the network.

This paper introduces a general framework for the construction of reduction mappings based on the “collapsing” heuristic

1053-587X/$26.00 © 2010 IEEE

Authorized licensed use limited to: University of South Florida. Downloaded on August 10,2010 at 18:36:18 UTC from IEEE Xplore. Restrictions apply.


by utilizing the concept of a selection policy, of which the reduction mapping of [10] is a special case. It studies the effects of selection policies on the steady-state distribution and on stationary control, in particular, the mean-first-passage-time (MFPT) control policy [4]. Since a binary PBN is a collection of Boolean networks with perturbation, we treat selection policies within the framework of Boolean networks with perturbation ($BN_p$). By reducing each constituent BN for the same gene, in effect, we reduce the PBN.

II. BACKGROUND

A. Boolean and Probabilistic Boolean Networks

A BN with perturbation $p$, $BN_p = (V, f, p)$, on $n$ genes is defined by a set of nodes $V = \{x_1, \ldots, x_n\}$ and a vector of Boolean functions $f = (f_1, \ldots, f_n)$. The variable $x_i \in \{0, 1\}$ represents the expression level of gene $i$, with 1 representing high and 0 representing low expression. The vector $f$ represents the regulatory rules between genes. At every time step, the value of a gene $x_i$ is predicted by the values of a set, $W_i$, of genes at the previous time step, based on the regulatory function $f_i$. The set of genes $W_i$ is called the predictor set and the function $f_i$ is called the predictor function of $x_i$. A state of the $BN_p$ is a vector $x = (x_1, \ldots, x_n) \in \{0, 1\}^n$, and the state space of the $BN_p$ is the collection $S$ of all $2^n$ states of the network. The perturbation parameter $p \in (0, 1)$ models random gene mutations, i.e., at each time point there is a probability $p$ of any gene changing its value uniformly randomly. Since there are $n$ genes, the probability of a random perturbation occurring at any time point is $1 - (1 - p)^n$. At any time point $t$, the state of the network is given by $x(t) = (x_1(t), \ldots, x_n(t))$, and $x(t)$ is called a gene activity profile (GAP). The dynamics of a $BN_p$ are completely described by its $2^n \times 2^n$ transition probability matrix $P = (P_{x,y})$, where $P_{x,y}$ is the probability of the underlying Markov chain undergoing the transition from the state $x$ to the state $y$. The perturbation probability $p$ makes the chain ergodic, and therefore it possesses a steady-state probability distribution $\pi = (\pi_x)_{x \in S}$.

Computing the elements of $P$ is straightforward. We elect to present it here because of its importance in the subsequent considerations. When computing the transition probabilities for a $BN_p$, one has to realize that at every time step one of two mutually exclusive events happens: either the chain transitions according to the regulatory rules $f$ or a perturbation occurs. This interpretation implies that when no perturbation occurs the network regulatory rules are applied. There are two important cases in computing $P_{x,y}$ for every given state $x$. The first case is when $x$ is a singleton attractor, i.e., $f(x) = x$. In this case, $P_{x,x} = (1-p)^n$ and $P_{x,y} = p^{\eta(x,y)}(1-p)^{n-\eta(x,y)}$ for $y \neq x$, where $\eta(x,y)$ is the number of positions where the binary representations of $x$ and $y$ differ from each other, i.e., the Hamming distance between $x$ and $y$. The second case is when $f(x) \neq x$. In this case, $P_{x,x} = 0$, and the transition $x \to y$, for any $y \neq x$, can happen either by applying the regulatory rules $f$, with a probability of $(1-p)^n$ if $f(x) = y$, or by perturbation, with a probability of $p^{\eta(x,y)}(1-p)^{n-\eta(x,y)}$. In summary,

$$P_{x,y} = 1_{[f(x)=y]}\,(1-p)^n + 1_{[x \neq y]}\, p^{\eta(x,y)}(1-p)^{n-\eta(x,y)}, \tag{1}$$

where $1_{[f(x)=y]}$ is the indicator function that takes value 1 if $f(x) = y$ according to the truth table and is equal to 0 otherwise.
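Equation (1) translates directly into code. The following is a minimal sketch, not the authors' implementation. It assumes a hypothetical encoding in which a state is an integer whose most significant of $n$ bits is gene $x_1$, and in which the truth table $f$ is given as a list `succ` with `succ[x]` equal to the code of $f(x)$.

```python
import numpy as np

def bnp_transition_matrix(succ, p):
    """Transition matrix of a BN_p built entry by entry from (1).

    succ: list of length 2**n, succ[x] = integer code of f(x)
          (hypothetical encoding: gene x_1 is the most significant bit).
    p:    per-gene perturbation probability, 0 < p < 1.
    """
    N = len(succ)
    n = N.bit_length() - 1                 # N = 2**n states
    P = np.zeros((N, N))
    for x in range(N):
        for y in range(N):
            eta = bin(x ^ y).count("1")    # Hamming distance eta(x, y)
            if succ[x] == y:               # regulatory term: no gene flipped
                P[x, y] += (1 - p) ** n
            if x != y:                     # perturbation term
                P[x, y] += p ** eta * (1 - p) ** (n - eta)
    return P

# Hypothetical 3-gene example in which state 0 is a singleton attractor.
P = bnp_transition_matrix([0, 3, 5, 7, 1, 2, 4, 6], p=0.1)
print(P.sum(axis=1))   # every row sums to 1
print(P[0, 0])         # equals (1 - p)**3 for the singleton attractor
```

The double loop mirrors the two cases discussed above; for anything beyond a handful of genes a vectorized construction is preferable, but the $2^n \times 2^n$ size of $P$ is the real limitation.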

A probabilistic Boolean network (PBN) consists of a set of nodes/genes $V = \{x_1, \ldots, x_n\}$, $x_i \in \{0, 1, \ldots, d-1\}$, and a set of vector-valued network functions, $f_1, f_2, \ldots, f_k$, governing the state transitions of the genes, each network function being of the form $f_l = (f_{l1}, f_{l2}, \ldots, f_{ln})$, where $f_{li}: \{0, 1, \ldots, d-1\}^n \to \{0, 1, \ldots, d-1\}$ [7]. In most applications, the discretization is either binary or ternary. Here we use binary, $d = 2$. At each time point a random decision is made as to whether to switch the network function for the next transition, with the probability $q$ of a switch being a system parameter. If a decision is made to switch the network function, then a new function is chosen from among $f_1, f_2, \ldots, f_k$, with the probability of choosing $f_l$ being the selection probability $r_l$ (note that network selection is not conditioned by the current network, and the current network can be selected). Each network function $f_l$ determines a BN, the individual BNs being called the contexts of the PBN. The PBN behaves as a fixed BN until a random decision (with probability $q$) is made to switch contexts according to the probabilities $r_1, \ldots, r_k$ from among $f_1, \ldots, f_k$. If $q = 1$, then the PBN is said to be instantaneously random; if $q < 1$ [11], then the PBN is said to be context-sensitive. Our interest is with context-sensitive PBNs with perturbation, meaning that at each time point there is a probability $p$ of any gene flipping its value uniformly randomly. Excluding the selection-probability structure, a context-sensitive PBN is a collection of $k$ $BN_p$'s, the contexts of the PBN. It is in this sense that by considering reduction mappings for $BN_p$'s we are ipso facto considering reduction mappings for PBNs.
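One transition of a context-sensitive binary PBN with perturbation can be sketched as follows. The ordering used here (first decide on a context switch with probability $q$, then either perturb or apply the current context) is an assumption made for illustration, as are the names `pbn_step`, `contexts` and `r`. The constituent BNs are encoded as successor tables over integer states, with gene $x_1$ as the most significant bit.

```python
import random

def pbn_step(state, ctx, contexts, r, q, p, n, rng):
    """One step of a context-sensitive binary PBN with perturbation (sketch).

    contexts: list of successor tables, contexts[l][x] = code of f_l(x)
    r:        selection probabilities r_1, ..., r_k (summing to 1)
    q:        probability of a context switch; p: per-gene flip probability
    """
    # Context switch: the new context is drawn with probabilities r and
    # may coincide with the current one.
    if rng.random() < q:
        ctx = rng.choices(range(len(contexts)), weights=r)[0]
    # Random perturbation: each gene flips independently with probability p.
    mask = 0
    for i in range(n):
        if rng.random() < p:
            mask |= 1 << (n - 1 - i)       # gene x_{i+1}; x_1 is the top bit
    if mask:
        return state ^ mask, ctx
    # No perturbation: apply the regulatory rules of the current context.
    return contexts[ctx][state], ctx

rng = random.Random(0)
contexts = [[0, 3, 5, 7, 1, 2, 4, 6], [1, 0, 3, 2, 5, 4, 7, 6]]
state, ctx = 2, 0
for _ in range(5):
    state, ctx = pbn_step(state, ctx, contexts, r=[0.5, 0.5], q=0.2, p=0.05, n=3, rng=rng)
```

With $q = 0$ the sketch reduces to a single $BN_p$, which is the setting used in the remainder of the paper.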

B. Effect of Rank-One Perturbations on the Steady-State Distribution

When studying network reduction, we will need to characterize the changes in the steady-state distribution resulting from certain kinds of changes in the Markov chain, the so-called rank-one perturbations (not to be confused with the notion of perturbation in $BN_p$ transitions). This problem has been studied in the framework of structural intervention in gene regulatory networks [12] and, more generally, in the framework of Markov chain perturbation theory [13].

Letting $\tilde{P}$ and $\tilde{\pi}$ denote the transition probability matrix and steady-state distribution for the perturbed Markov chain, we have $\pi^T P = \pi^T$ and $\tilde{\pi}^T \tilde{P} = \tilde{\pi}^T$, where $T$ denotes transpose. Analytical expressions for the steady-state distribution change can be obtained using the fundamental matrix, which exists for any ergodic Markov chain and is given by $Z = (I - P + e\pi^T)^{-1}$, where $e$ is a column vector whose components are all unity [14]. Letting $E = \tilde{P} - P$, the steady-state distribution change is $\tilde{\pi}^T - \pi^T = \tilde{\pi}^T E Z$.

For a rank-one perturbation, the perturbed Markov chain has the transition probability matrix $\tilde{P} = P + ab^T$, where $a$ and $b$ are two arbitrary vectors satisfying $b^T e = 0$, and $ab^T$ represents a


rank-one perturbation of the original transition probability matrix $P$. In this case, it can be shown that [12]

$$\tilde{\pi}^T = \pi^T + \frac{\pi^T a}{1 - b^T Z a}\, b^T Z. \tag{2}$$

An important special case occurs when the transition mechanisms before and after perturbation differ only in one state, say the $i$th state. Then $E = \tilde{P} - P$ has nonzero values only in its $i$th row, so that $\tilde{P} = P + e_i b^T$, where $e_i$ is the coordinate vector with a 1 in the $i$th position and 0s elsewhere. Substituting this into (2) yields

$$\tilde{\pi}^T = \pi^T + \frac{\pi_i}{1 - (b^T Z)_i}\, b^T Z, \tag{3}$$

where $\pi_i$ and $(b^T Z)_i$ are the $i$th coordinates of the respective vectors $\pi$ and $b^T Z$. For the $k$th state,

$$\tilde{\pi}_k = \pi_k + \frac{\pi_i\, (b^T Z)_k}{1 - (b^T Z)_i}. \tag{4}$$

The results for these special cases can be extended to arbitrary types of perturbations, so that it is possible to compute the steady-state distributions of arbitrarily perturbed Markov chains in an iterative fashion [12].
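The identities of this subsection can be checked numerically. The sketch below assumes NumPy and a small, randomly generated ergodic chain (the helper names are illustrative). It computes $\pi$ and the fundamental matrix $Z$, checks $\tilde{\pi}^T - \pi^T = \tilde{\pi}^T E Z$, and then reaches a chain that differs from $P$ in several rows by applying the single-row update (3) once per changed row, recomputing $Z$ for the current chain after each step.

```python
import numpy as np

def stationary(P):
    """Steady state of an ergodic chain: pi^T (I - P) = 0, sum(pi) = 1."""
    N = P.shape[0]
    A = (np.eye(N) - P).T
    A[-1, :] = 1.0                  # replace one redundant equation by normalization
    rhs = np.zeros(N)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

def fundamental(P, pi):
    """Fundamental matrix Z = (I - P + e pi^T)^{-1}."""
    N = P.shape[0]
    return np.linalg.inv(np.eye(N) - P + np.outer(np.ones(N), pi))

def row_update(P, pi, i, new_row):
    """Steady state after replacing row i of P, via (3) with a = e_i."""
    b = new_row - P[i]              # b^T e = 0 because both rows sum to 1
    bZ = b @ fundamental(P, pi)
    return pi + pi[i] * bZ / (1.0 - bZ[i])

rng = np.random.default_rng(0)
P = rng.random((6, 6)); P /= P.sum(axis=1, keepdims=True)
P_new = P.copy()
P_new[[1, 3, 4]] = rng.random((3, 6))
P_new[[1, 3, 4]] /= P_new[[1, 3, 4]].sum(axis=1, keepdims=True)

# Check pi~^T - pi^T = pi~^T E Z for the combined perturbation E.
pi, pi_new = stationary(P), stationary(P_new)
E = P_new - P
gap = np.abs(pi_new - pi - pi_new @ E @ fundamental(P, pi)).max()

# Apply (3) one row at a time, updating the chain and its steady state.
P_cur, pi_cur = P.copy(), pi.copy()
for i in (1, 3, 4):
    pi_cur = row_update(P_cur, pi_cur, i, P_new[i])
    P_cur[i] = P_new[i]
```

After the loop, `pi_cur` agrees with the steady state of `P_new` computed directly, which is the iterative use of (3) and (4) referred to above.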

C. MFPT Control Policy

The problem of optimal intervention is usually formulated as an optimal stochastic control problem [2]. We focus on treatments that specify the intervention on a single control gene $x_g$. A control policy $\mu = (\mu_0, \mu_1, \ldots)$ based on $x_g$ is a sequence of decision rules $\mu_t: S \to \{0, 1\}$ for each time step $t$. The values 0/1 are interpreted as off/on for the application of the control.

The MFPT algorithm is based on the comparison between the MFPTs of a state $x$ and its flipped (with respect to $x_g$) state $\tilde{x}^g$ [4]. When considering therapeutic intervention, the state space can be partitioned into desirable ($D$) and undesirable ($U$) states according to the expression values of a given set of genes. For simplicity we will assume that this set consists of a single gene, i.e., only one gene determines the partition. The intuition behind the MFPT algorithm is that, given the control gene $x_g$, when a desirable state $x$ reaches $U$ on average faster than $\tilde{x}^g$, it is reasonable to apply control and start the next network transition from $\tilde{x}^g$. The roles of $D$ and $U$ are reversed when $x \in U$. Without loss of generality one can assume that the gene determining the partition is $x_1$, the leftmost gene in the state's binary representation $x = (x_1, \ldots, x_n)$, and that the desirable states correspond to the value $x_1 = 0$. With this assumption, the probability transition matrix $P$ of the Markov chain can be written as

$$P = \begin{pmatrix} P_{D,D} & P_{D,U} \\ P_{U,D} & P_{U,U} \end{pmatrix}. \tag{5}$$

Using this representation one can compute the mean first-passage times $K_U$ and $K_D$ by solving the following system of linear equations [15]:

$$K_U = e + P_{D,D} K_U, \tag{6}$$

$$K_D = e + P_{U,U} K_D, \tag{7}$$

where $e$ denotes the all-ones vector of the appropriate length. The vectors $K_U$ and $K_D$ contain the MFPTs from each state in $D$ to the set $U$, and from each state in $U$ to the set $D$, respectively. The MFPT algorithm designs stationary (time-independent) control policies $\mu_g$ for each gene $x_g$ in the network by comparing the coordinate differences $K_U(\tilde{x}^g) - K_U(x)$, for $x \in D$, and $K_D(x) - K_D(\tilde{x}^g)$, for $x \in U$, to $\gamma$, a tuning parameter that can be set to a higher value when the ratio of the cost of control to the cost of the undesirable states is higher, the intent being to apply the control less frequently.
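Equations (6) and (7) are two linear solves. The sketch below assumes the state ordering of (5) (gene $x_1$ is the most significant bit, so the first half of the states is $D$) and implements the comparison with $\gamma$ as we read it from the description above: control is applied at $x \in D$ when $K_U(\tilde{x}^g) - K_U(x) > \gamma$ and at $x \in U$ when $K_D(x) - K_D(\tilde{x}^g) > \gamma$. The function names and the random test chain are hypothetical.

```python
import numpy as np

def mfpt_vectors(P):
    """Solve (6) and (7): K_U = e + P_DD K_U and K_D = e + P_UU K_D."""
    h = P.shape[0] // 2                     # states 0..h-1 are desirable (x_1 = 0)
    K_U = np.linalg.solve(np.eye(h) - P[:h, :h], np.ones(h))
    K_D = np.linalg.solve(np.eye(h) - P[h:, h:], np.ones(h))
    return K_U, K_D

def mfpt_policy(P, g, n, gamma):
    """Stationary MFPT policy mu_g for control gene x_g (1-indexed, g != 1)."""
    K_U, K_D = mfpt_vectors(P)
    h, flip = P.shape[0] // 2, 1 << (n - g)
    mu = np.zeros(2 * h, dtype=int)
    for x in range(2 * h):
        xt = x ^ flip                       # flipped state with respect to x_g
        if x < h:                           # desirable: does xt reach U more slowly?
            mu[x] = K_U[xt] - K_U[x] > gamma
        else:                               # undesirable: does xt reach D faster?
            mu[x] = K_D[x - h] - K_D[xt - h] > gamma
    return mu

rng = np.random.default_rng(1)
P = rng.random((8, 8)); P /= P.sum(axis=1, keepdims=True)
K_U, K_D = mfpt_vectors(P)
mu = mfpt_policy(P, g=2, n=3, gamma=0.0)
```

Because $g \neq 1$, flipping $x_g$ never moves a state across the $(D, U)$ partition, which is why the flipped index stays in the same half.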

III. SELECTION POLICIES AND REDUCTION MAPPINGS

To motivate the definition of a selection policy, we consider a specific reduction mapping designed using information about the steady-state distribution of the network [10]. That mapping is constructed assuming that the gene $x_j$ is to be “deleted” from the network. For every state $x$, let $\tilde{x}^j$ denote the state that differs from $x$ only in the value of gene $x_j$. Next, for every such pair $x$ and $\tilde{x}^j$, we consider their successor states $f(x)$ and $f(\tilde{x}^j)$. Following deletion, $x_j$ becomes a latent variable and, under the reduction mapping, the states $x$ and $\tilde{x}^j$ “collapse” to a state $x^{(j)}$ in the reduced network. The state $x^{(j)}$ is obtained from either $x$ or $\tilde{x}^j$ by removing their $j$th coordinate. The reduction mapping, denoted by $R_j$, constructs the truth table of the reduced network by selecting the transition $x^{(j)} \to (f(x))^{(j)}$ if $\pi_x > \pi_{\tilde{x}^j}$, or $x^{(j)} \to (f(\tilde{x}^j))^{(j)}$, otherwise. This particular type of reduction is a special case of a reduction mapping $R_{c_j}$ induced by a selection policy $c_j$, which we define next.

Definition 1: A selection policy $c_j$ corresponding to the deleted gene $x_j$ is a $2^n$-dimensional vector, $c_j = (c_j(x))_{x \in S}$, indexed by the states of $BN_p$ and having components equal to 1 at exactly one of the positions corresponding to each pair $(x, \tilde{x}^j)$ and equal to 0 at the other.

For each gene $x_j$ there are $2^{2^{n-1}}$ different selection policies, since each of the $2^{n-1}$ pairs $(x, \tilde{x}^j)$ admits two choices.

Using this definition, one can consider the reduction mapping $R_{c_j}$ corresponding to the selection policy $c_j$, which constructs the truth table of the reduced network by selecting the transition $x^{(j)} \to (f(x))^{(j)}$ if $c_j(x) = 1$, or $x^{(j)} \to (f(\tilde{x}^j))^{(j)}$, otherwise. Greater reduction of the network can be achieved by sequentially deleting genes in an iterative manner following the approach outlined in this section.

Each selection policy is obtained by the action of a corresponding selection matrix $\Xi_{c_j}$ of dimension $2^n \times 2^n$ on the all-ones column vector $e$,

$$c_j = \Xi_{c_j} e, \tag{8}$$

where $\Xi_{c_j}$ is obtained from the identity matrix by replacing the rows corresponding to the 0 entries in the selection policy $c_j$ with zero rows. The selection matrix serves as a right identity, $\Xi'_{c_j} \Xi_{c_j} = \Xi'_{c_j}$, to the $2^{n-1} \times 2^n$ matrix $\Xi'_{c_j}$ obtained


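from the selection matrix by deleting its zero rows. In what follows, we will focus on the selection policies that correspond to the case $j = n$. This could be achieved by permuting the gene indexes and does not restrict the generality of our considerations. For example, in the case of BNs on 3 genes, one of the possible $4 \times 8$ matrices $\Xi'_{c_3}$ has the following format:

$$\Xi'_{c_3} = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&0&1 \end{pmatrix}. \tag{9}$$

The $2^n \times 2^{n-1}$ matrix $\Lambda$ is obtained from $\Xi'_{c_n}$ by first transposing and then by setting, for each column $k$, $\Lambda_{2k-1,k} = 1$ if $\Lambda_{2k,k} = 1$, or $\Lambda_{2k,k} = 1$ if $\Lambda_{2k-1,k} = 1$, $k = 1, \ldots, 2^{n-1}$. Thus every column of $\Lambda$ has 1s exactly at the two positions indexed by a pair of flipped states, and $\Lambda$ does not depend on the selection policy. The above example of the matrix $\Xi'_{c_3}$ yields

$$\Lambda = \begin{pmatrix} 1&0&0&0 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&1 \end{pmatrix}. \tag{10}$$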
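The objects of Definition 1 and (8) to (10) are easy to build for $j = n$, where the two states of a flipped pair occupy positions $2k-1$ and $2k$ (indices $2k-2$ and $2k-1$ in the zero-based code below). This sketch assumes NumPy and the integer state encoding with $x_1$ as the most significant bit, so that deleting $x_n$ amounts to dropping the last bit. It also implements the steady-state rule of [10] as one particular selection policy; resolving ties in favor of the even-indexed state is our assumption.

```python
import itertools
import numpy as np

def selection_matrices(c):
    """Xi from (8), Xi' (Xi without its zero rows) and Lambda, for deleting x_n."""
    c = np.asarray(c)
    N = len(c)
    Xi = np.diag(c).astype(float)                # identity with unselected rows zeroed
    Xi_red = Xi[c == 1]                          # Xi', size 2^(n-1) x 2^n
    Lam = np.zeros((N, N // 2))
    Lam[np.arange(N), np.arange(N) // 2] = 1.0   # states 2k and 2k+1 -> column k
    return Xi, Xi_red, Lam

def reduced_truth_table(succ, c):
    """R_c: reduced state k inherits f of the selected member of pair k, minus x_n."""
    return [succ[2 * k] >> 1 if c[2 * k] else succ[2 * k + 1] >> 1
            for k in range(len(c) // 2)]

def all_policies(n):
    """Enumerate the 2^(2^(n-1)) selection policies for the deleted gene x_n."""
    for choice in itertools.product((0, 1), repeat=2 ** (n - 1)):
        yield np.array([bit for b in choice for bit in ((0, 1) if b else (1, 0))])

def steady_state_policy(pi):
    """Rule of [10]: keep the pair member with the larger pi (ties -> even index)."""
    c = np.zeros(len(pi), dtype=int)
    for k in range(len(pi) // 2):
        c[2 * k if pi[2 * k] >= pi[2 * k + 1] else 2 * k + 1] = 1
    return c

c = np.array([1, 0, 0, 1, 1, 0, 0, 1])           # an example policy for n = 3
Xi, Xi_red, Lam = selection_matrices(c)
```

Enumerating all policies is only feasible for very small $n$; the count $2^{2^{n-1}}$ already exceeds $10^{19}$ for $n = 7$.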

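We define the companion matrix $\hat{P}_{c_n}$ for the reduction mapping $R_{c_n}$ by

$$\hat{P}_{c_n} = \Xi'_{c_n} P \Lambda. \tag{11}$$

The companion matrix $\hat{P}_{c_n}$ defines a Markov chain that is closely related to the dynamics of the reduced network $BN_p^{(c_n)}$ produced by $R_{c_n}$ (a network on $n-1$ genes with the same perturbation parameter $p$). Its importance can be seen from the following theorem, which states that $\hat{P}_{c_n}$ is asymptotically equal to the transition probability matrix $P^{(c_n)}$ of $BN_p^{(c_n)}$.

Theorem 1: For every selection policy $c_n$,

$$\|\hat{P}_{c_n} - P^{(c_n)}\|_{\max} \le p(1-p)^{n-1},$$

where $\|\cdot\|_{\max}$ denotes the maximum absolute value of the elements of the respective matrix.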

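Proof: Without loss of generality we set $j = n$. By definition, each row of the companion matrix $\hat{P}_{c_n}$ is obtained as follows. First, the appropriate rows from $P$ are selected according to the selection policy $c_n$. The other rows are discarded, and the columns in the resulting $2^{n-1} \times 2^n$ matrix are added in pairs indexed by the pairs of flipped states $y$ and $\tilde{y}^n$, thus forming the companion matrix. Because the elements of the matrices $\hat{P}_{c_n}$ and $P^{(c_n)}$ are indexed with the states of the reduced network, we can compare them element-wise. Note that only the rows of the original probability transition matrix $P$ indexed by states $x$ for which $c_n(x) = 1$ contribute to the formation of both $\hat{P}_{c_n}$ and $P^{(c_n)}$: the row of $\hat{P}_{c_n}$ indexed by $x^{(n)}$ is given by $(\hat{P}_{c_n})_{x^{(n)},y^{(n)}} = P_{x,y} + P_{x,\tilde{y}^n}$, and the regulatory rule of $BN_p^{(c_n)}$ at $x^{(n)}$ is $x^{(n)} \to (f(x))^{(n)}$. Thus, for each state $x$ with $c_n(x) = 1$, there are several cases to consider.

1. $f(x) = x$. In this case considerations similar to the ones in Section II show that $(\hat{P}_{c_n})_{x^{(n)},x^{(n)}} = P_{x,x} + P_{x,\tilde{x}^n} = (1-p)^n + p(1-p)^{n-1}$, which equals $(1-p)^{n-1} = P^{(c_n)}_{x^{(n)},x^{(n)}}$. When $y^{(n)} \neq x^{(n)}$, let $y$ be the member of the pair $\{y, \tilde{y}^n\}$ that agrees with $x$ in gene $x_n$. Then $\eta(x,y) = \eta^{(n)}$ and $\eta(x,\tilde{y}^n) = \eta^{(n)} + 1$, where $\eta^{(n)} = \eta^{(n)}(x^{(n)}, y^{(n)})$ is the Hamming distance between the reduced states, defined similarly to $\eta$ in Section II. Thus

$$(\hat{P}_{c_n})_{x^{(n)},y^{(n)}} = p^{\eta^{(n)}}(1-p)^{n-\eta^{(n)}} + p^{\eta^{(n)}+1}(1-p)^{n-\eta^{(n)}-1} = p^{\eta^{(n)}}(1-p)^{n-1-\eta^{(n)}}.$$

Now it is enough to notice that, since $(f(x))^{(n)} = x^{(n)}$, the right-hand side is exactly $P^{(c_n)}_{x^{(n)},y^{(n)}}$ for the reduced network. This observation shows that the two rows coincide.

2. $f(x) = \tilde{x}^n$. In this case $P_{x,x} = 0$ and $P_{x,\tilde{x}^n} = (1-p)^n + p(1-p)^{n-1}$, which gives $(\hat{P}_{c_n})_{x^{(n)},x^{(n)}} = (1-p)^{n-1} = P^{(c_n)}_{x^{(n)},x^{(n)}}$, because again $(f(x))^{(n)} = x^{(n)}$. For the rest of the states, the verification that $(\hat{P}_{c_n})_{x^{(n)},y^{(n)}} = P^{(c_n)}_{x^{(n)},y^{(n)}}$ can be done as in case 1.

3. $f(x) \notin \{x, \tilde{x}^n\}$. In this case, $P_{x,x} = 0$ and $P_{x,\tilde{x}^n} = p(1-p)^{n-1}$, while $(f(x))^{(n)} \neq x^{(n)}$ and hence $P^{(c_n)}_{x^{(n)},x^{(n)}} = 0$. Computations similar to the previous two cases show that

$$(\hat{P}_{c_n})_{x^{(n)},x^{(n)}} - P^{(c_n)}_{x^{(n)},x^{(n)}} = p(1-p)^{n-1}, \qquad (\hat{P}_{c_n})_{x^{(n)},(f(x))^{(n)}} - P^{(c_n)}_{x^{(n)},(f(x))^{(n)}} = (1-p)^n - (1-p)^{n-1} = -p(1-p)^{n-1}.$$

Finally, for states $y^{(n)} \notin \{x^{(n)}, (f(x))^{(n)}\}$, the computations are the same as in the previous case and show that $(\hat{P}_{c_n})_{x^{(n)},y^{(n)}} = P^{(c_n)}_{x^{(n)},y^{(n)}}$.

The above considerations show that each row of the companion matrix $\hat{P}_{c_n}$ could possibly differ from the corresponding row in the transition probability matrix $P^{(c_n)}$ in only two entries, one of them perturbed by a term equal to $p(1-p)^{n-1}$ and the other one by a term equal to $-p(1-p)^{n-1}$. Both quantities tend to 0 as $n \to \infty$. The norm of the matrix difference in the statement of the theorem is therefore at most $p(1-p)^{n-1}$, and thus we obtain the conclusion of the theorem.

An immediate corollary of the theorem is that, if all of the states in a $BN_p$ are either singleton attractors or part of attractor cycles of the type $x \to \tilde{x}^n \to x$, then the transition probability matrix $P^{(c_n)}$ for the reduced network is identical to the companion matrix $\hat{P}_{c_n}$ for every reduction mapping induced by a selection policy. The salient observation following from the theorem is that for large networks, or for networks with a high probability of perturbation, $\hat{P}_{c_n}$ is, up to a very small perturbation, identical to $P^{(c_n)}$.

In the next two sections, we use the companion matrix $\hat{P}_{c_n}$ to study the effects of selection-policy-induced reduction mappings on both the steady-state distribution of the original network and a specific type of stationary control policy, the MFPT control policy. The issue of the effects of a reduction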


on the steady-state distribution is different from the issue of its effects on a stationary control policy for the network; however, the reduction effects on the steady-state distribution and the control policy can both be used as measures for the goodness of reduction mappings.
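Theorem 1 and its corollary can be checked on small random examples. The sketch below (NumPy; the helper names, the random truth tables and the random selection policies are our own) builds $P$ from (1), the companion matrix (11) as $\Xi' P \Lambda$ (selecting rows by $c_n$ is the same as multiplying by $\Xi'_{c_n}$), and the transition matrix of the reduced network on $n-1$ genes with the same $p$. It then compares the largest entrywise difference with $p(1-p)^{n-1}$, the size of the two per-row discrepancies identified in the proof.

```python
import numpy as np

def bnp_matrix(succ, p):
    """P from (1); succ[x] is the integer code of f(x), gene x_1 = top bit."""
    N = len(succ)
    n = N.bit_length() - 1
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    eta = np.vectorize(lambda v: bin(int(v)).count("1"))(x ^ y)   # Hamming distances
    P = np.where(x != y, p ** eta * (1 - p) ** (n - eta), 0.0)   # perturbation term
    P[np.arange(N), succ] += (1 - p) ** n                       # regulatory term
    return P

def companion(P, c):
    """(11): rows selected by c (this is Xi' P), then columns summed in flipped pairs."""
    N = len(c)
    Lam = np.zeros((N, N // 2))
    Lam[np.arange(N), np.arange(N) // 2] = 1.0
    return P[np.asarray(c) == 1] @ Lam

def reduced_succ(succ, c):
    """Truth table of the reduced network produced by R_c (gene x_n deleted)."""
    return [succ[2 * k] >> 1 if c[2 * k] else succ[2 * k + 1] >> 1
            for k in range(len(c) // 2)]

rng = np.random.default_rng(2)
n, p = 5, 0.05
bound = p * (1 - p) ** (n - 1)
worst = 0.0
for _ in range(20):
    succ = rng.integers(0, 2 ** n, size=2 ** n)          # random truth table
    c = np.zeros(2 ** n, dtype=int)                      # random selection policy
    c[2 * np.arange(2 ** (n - 1)) + rng.integers(0, 2, size=2 ** (n - 1))] = 1
    gap = companion(bnp_matrix(succ, p), c) - bnp_matrix(reduced_succ(succ, c), p)
    worst = max(worst, np.abs(gap).max())
```

For random truth tables case 3 of the proof almost always occurs, so `worst` typically equals the bound; for a network in which every state is a singleton attractor the two matrices coincide, as the corollary states.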

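IV. EFFECT OF REDUCTION ON THE STEADY-STATE DISTRIBUTION

In this section we use the companion matrix $\hat{P}_{c_n}$ for the reduction mapping $R_{c_n}$ and results from Markov chain perturbation theory to study the effects of the reduction on the steady-state distribution of the $BN_p$. Similarly to the previous section, one can assume without loss of generality that the gene $x_1$ determines the partition $(D, U)$ of the state space $S$, and $x_n$ is the gene to be “deleted” from the network.

Since $P_{x,y} = 1_{[f(x)=y]}(1-p)^n + 1_{[x \neq y]}\, p^{\eta(x,y)}(1-p)^{n-\eta(x,y)}$ in (1), all the entries in $P$ have two terms. We note that the first term is the product of the probability of no random flipping, $(1-p)^n$, and the indicator $1_{[f(x)=y]}$ determined by the Boolean functions $f$ of the $BN_p$. We denote it by $F_{x,y}$. The second term is determined by the perturbation parameter $p$ and the Hamming distance between $x$ and $y$. We denote it by $G_{x,y}$. Hence, $P_{x,y} = F_{x,y} + G_{x,y}$ for any entry in the transition matrix $P$. Thus, we can write $P = F + G$, where each entry in $F$ corresponds to the first term and each entry in $G$ corresponds to the second term. Obviously, $G$ is the same for all $BN_p$'s with the same number of genes and the same value of the perturbation parameter $p$, and $F$ is determined by the regulatory rules, or Boolean functions, $f$ given in the $BN_p$.

For a given selection policy $c_n$, define a new Markov chain with transition probability matrix $\bar{P}_{c_n}$, obtained from $P$ by replacing the rows in $P$ that correspond to states $x$ with $c_n(x) = 0$ by the rows indexed by their flipped states $\tilde{x}^n$ (for which $c_n(\tilde{x}^n) = 1$), or by leaving them unchanged if $c_n(x) = 1$. Similarly to $P$, the probability transition matrix $\bar{P}_{c_n}$ for this new Markov chain can be decomposed as $\bar{P}_{c_n} = \bar{F}_{c_n} + \bar{G}_{c_n}$, where $\bar{F}_{c_n}$ and $\bar{G}_{c_n}$ are obtained from $F$ and $G$ by the same row replacements. For example, for a $BN_p$ on 3 genes, every pair of neighboring odd and even rows of $\bar{F}_{c_3}$ consists of two identical rows, each having a single nonzero entry equal to $(1-p)^3$.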
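The decomposition $P = F + G$ and the row-replaced matrix $\bar{P}_{c_n}$ can be formed as follows. This is a NumPy sketch using the same integer encoding as before ($x_n$ is the least significant bit, so $\tilde{x}^n$ is `x ^ 1`); the helper names and example networks are illustrative. The final lines confirm two facts used in the text: $G$ depends only on $n$ and $p$, and neighboring odd and even rows of $\bar{P}_{c_n}$ coincide.

```python
import numpy as np

def split_F_G(succ, p):
    """F: regulatory term of (1); G: perturbation term of (1)."""
    N = len(succ)
    n = N.bit_length() - 1
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    eta = np.vectorize(lambda v: bin(int(v)).count("1"))(x ^ y)
    G = np.where(x != y, p ** eta * (1 - p) ** (n - eta), 0.0)
    F = np.zeros((N, N))
    F[np.arange(N), succ] = (1 - p) ** n
    return F, G

def replace_rows(M, c):
    """Bar-matrix: each row x with c[x] = 0 becomes the row of its flipped state."""
    c = np.asarray(c)
    idx = np.arange(len(c))
    return M[np.where(c == 1, idx, idx ^ 1)]

p = 0.1
F1, G1 = split_F_G([0, 3, 5, 7, 1, 2, 4, 6], p)   # two hypothetical 3-gene BNs
F2, G2 = split_F_G([1, 0, 3, 2, 5, 4, 7, 6], p)
c = np.array([0, 1, 1, 0, 1, 0, 0, 1])
P_bar = replace_rows(F1 + G1, c)
```

Since row replacement is linear, $\bar{P}_{c_n} = \bar{F}_{c_n} + \bar{G}_{c_n}$ holds automatically.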

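The newly introduced Markov chain is obtained from the original Markov chain representing the $BN_p$ by performing at most $2^{n-1}$ rank-one perturbations. Thus, following [12], one can compute the total change in the steady-state distribution caused by those perturbations. By the definition of the steady-state distribution, $\bar{\pi}^T \bar{P}_{c_n} = \bar{\pi}^T$. For each state $x$, we have $\bar{\pi}_x = \bar{\pi}^T \bar{P}_{\cdot,x}$, where $\bar{P}_{\cdot,x}$ denotes the $x$th column of $\bar{P}_{c_n}$. For the state $\tilde{x}^n$, we have the similar form $\bar{\pi}_{\tilde{x}^n} = \bar{\pi}^T \bar{P}_{\cdot,\tilde{x}^n}$. Note that $x$ and $\tilde{x}^n$ are flipped states and correspond to neighboring odd and even columns. Now, for every such pair of flipped states we compute

$$\bar{\pi}_x + \bar{\pi}_{\tilde{x}^n} = \sum_{y \in S} \bar{\pi}_y \big(\bar{P}_{y,x} + \bar{P}_{y,\tilde{x}^n}\big).$$

Recall that the neighboring odd and even rows in $\bar{P}_{c_n}$ are identical, since both are copies of the row of $P$ indexed by the selected member of the pair. Hence, if $z$ denotes the member of the pair $\{y, \tilde{y}^n\}$ with $c_n(z) = 1$, then $\bar{P}_{y,x} + \bar{P}_{y,\tilde{x}^n} = P_{z,x} + P_{z,\tilde{x}^n} = (\hat{P}_{c_n})_{y^{(n)},x^{(n)}}$ by the definition (11) of the companion matrix, and grouping the terms of the sum by pairs gives

$$\bar{\pi}_x + \bar{\pi}_{\tilde{x}^n} = \sum_{y^{(n)}} \big(\bar{\pi}_y + \bar{\pi}_{\tilde{y}^n}\big)\, (\hat{P}_{c_n})_{y^{(n)},x^{(n)}}.$$

We denote $\hat{\pi}_{x^{(n)}} = \bar{\pi}_x + \bar{\pi}_{\tilde{x}^n}$ for all $x$. Thus $\hat{\pi}^T \hat{P}_{c_n} = \hat{\pi}^T$, with $\hat{\pi}_{x^{(n)}} \ge 0$ and $\sum_{x^{(n)}} \hat{\pi}_{x^{(n)}} = 1$. The procedure followed here leads to the steady-state distribution for the companion matrix $\hat{P}_{c_n}$.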
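The conclusion that the pairwise sums of $\bar{\pi}$ form the steady-state distribution of the companion matrix is easy to confirm numerically. The sketch below (NumPy; random test network, random selection policy and helper names are illustrative) computes both sides independently: $\bar{\pi}$ from $\bar{P}_{c_n}$, and $\hat{\pi}$ directly from $\hat{P}_{c_n} = \Xi' P \Lambda$.

```python
import numpy as np

def bnp_matrix(succ, p):
    """P from (1); succ[x] is the integer code of f(x), gene x_1 = top bit."""
    N = len(succ)
    n = N.bit_length() - 1
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    eta = np.vectorize(lambda v: bin(int(v)).count("1"))(x ^ y)
    P = np.where(x != y, p ** eta * (1 - p) ** (n - eta), 0.0)
    P[np.arange(N), succ] += (1 - p) ** n
    return P

def stationary(P):
    """Steady state of an ergodic chain: pi^T (I - P) = 0, sum(pi) = 1."""
    N = P.shape[0]
    A = (np.eye(N) - P).T
    A[-1, :] = 1.0
    rhs = np.zeros(N)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(3)
n, p = 4, 0.05
succ = rng.integers(0, 2 ** n, size=2 ** n)
c = np.zeros(2 ** n, dtype=int)
c[2 * np.arange(2 ** (n - 1)) + rng.integers(0, 2, size=2 ** (n - 1))] = 1

P = bnp_matrix(succ, p)
idx = np.arange(2 ** n)
P_bar = P[np.where(c == 1, idx, idx ^ 1)]            # row replacement defining P-bar
Lam = np.zeros((2 ** n, 2 ** (n - 1)))
Lam[idx, idx // 2] = 1.0
P_hat = P[c == 1] @ Lam                              # companion matrix (11)

pi_hat_from_sums = stationary(P_bar).reshape(-1, 2).sum(axis=1)   # pi_bar_x + pi_bar_{x~}
pi_hat = stationary(P_hat)
```

The two vectors agree to machine precision, which is the identity $\hat{\pi}_{x^{(n)}} = \bar{\pi}_x + \bar{\pi}_{\tilde{x}^n}$ derived above.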

This observation, when combined with Markov chain perturbation theory and Theorem 1, shows that the effects of a selection policy $c_n$ on the steady-state distribution of a given $BN_p$


can be estimated approximately by comparing the $2^{n-1}$-dimensional vector with coordinates equal to the appropriate sums $\pi_x + \pi_{\tilde{x}^n}$ of the original steady-state probabilities to the steady-state distribution $\hat{\pi}$ for the companion matrix $\hat{P}_{c_n}$. Indeed, according to Theorem 1, the steady-state distribution $\hat{\pi}$ of the Markov chain described by the companion matrix rapidly converges with $n$ to the steady-state distribution $\pi^{(c_n)}$ of the reduced network. At the same time, this probability distribution can be obtained via the relation $\hat{\pi}_{x^{(n)}} = \bar{\pi}_x + \bar{\pi}_{\tilde{x}^n}$ from the steady-state distribution $\bar{\pi}$ of the Markov chain described by the probability transition matrix $\bar{P}_{c_n}$. The latter is obtained from the probability transition matrix of the original $BN_p$ by means of at most $2^{n-1}$ rank-one perturbations. The discussion in Section II-B, and (4) in particular, shows how one can compute the effect on the steady-state distribution after each one of these perturbations. The total effect on the steady-state distribution produced by all of the perturbations that lead to the matrix $\bar{P}_{c_n}$ can be computed by iteratively applying (4). Thus, for every state $x$ in the original $BN_p$, one can compute the perturbation effect on the sums $\pi_x + \pi_{\tilde{x}^n}$, which represent the proper collapsing of the steady state of the original network if one considers the deleted gene $x_n$ as a latent variable. This discussion shows that the differences $(\pi_x + \pi_{\tilde{x}^n}) - \hat{\pi}_{x^{(n)}}$ can serve as a good approximation to the shift in the steady-state distribution of the original $BN_p$.
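The estimation procedure just described can be sketched end to end: start from $\pi$, reach $\bar{\pi}$ through one single-row (rank-one) update of the form (3) per unselected state, and compare the collapsed sums $\pi_x + \pi_{\tilde{x}^n}$ with $\hat{\pi}_{x^{(n)}} = \bar{\pi}_x + \bar{\pi}_{\tilde{x}^n}$. This is an illustration under our own assumptions (NumPy, a random test network and selection policy, a fresh fundamental matrix after every step), not an optimized implementation.

```python
import numpy as np

def bnp_matrix(succ, p):
    """P from (1); succ[x] is the integer code of f(x), gene x_1 = top bit."""
    N = len(succ)
    n = N.bit_length() - 1
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    eta = np.vectorize(lambda v: bin(int(v)).count("1"))(x ^ y)
    P = np.where(x != y, p ** eta * (1 - p) ** (n - eta), 0.0)
    P[np.arange(N), succ] += (1 - p) ** n
    return P

def stationary(P):
    """Steady state of an ergodic chain: pi^T (I - P) = 0, sum(pi) = 1."""
    N = P.shape[0]
    A = (np.eye(N) - P).T
    A[-1, :] = 1.0
    rhs = np.zeros(N)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(4)
n, p = 4, 0.05
N = 2 ** n
succ = rng.integers(0, N, size=N)                       # hypothetical truth table
c = np.zeros(N, dtype=int)                              # hypothetical selection policy
c[2 * np.arange(N // 2) + rng.integers(0, 2, size=N // 2)] = 1

P = bnp_matrix(succ, p)
pi = stationary(P)
P_cur, pi_cur = P.copy(), pi.copy()
for i in np.flatnonzero(c == 0):                        # one rank-one step per unselected state
    b = P[i ^ 1] - P_cur[i]                             # new row minus old row; sums to 0
    Z = np.linalg.inv(np.eye(N) - P_cur + np.outer(np.ones(N), pi_cur))
    bZ = b @ Z
    pi_cur = pi_cur + pi_cur[i] * bZ / (1.0 - bZ[i])    # update (3), i.e., (4) for every k
    P_cur[i] = P[i ^ 1]                                 # the chain now has this row replaced

pi_bar = pi_cur                                         # steady state of P-bar
collapsed_original = pi.reshape(-1, 2).sum(axis=1)      # pi_x + pi_{x~n}
pi_hat = pi_bar.reshape(-1, 2).sum(axis=1)              # steady state of the companion matrix
shift = collapsed_original - pi_hat
```

The vector `shift` is the approximation to the steady-state shift discussed above; both collapsed vectors are probability distributions, so its entries sum to zero.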

V. EFFECT OF REDUCTION ON THE CONTROL POLICY

In this section we study the effects of the selection-policy-induced reduction mappings on control policies designed for a $BN_p$. Since our goal is to arrive at a measure of performance for reduction mappings, we need to consider a control policy whose mathematical formulation can be related to selection policies. As we will see, this requirement is met by the MFPT control policy.

For a selection policy $c_n$ and its corresponding reduction mapping $R_{c_n}$, we can consider the MFPT stationary control policy on the reduced network. In this regard we assume without loss of generality that $x_1$ determines the partition $(D, U)$ of the state space, $x_g$, $1 < g < n$, is the control gene, and $x_n$ is the gene to be deleted from the network. Interpreting the deletion of gene $x_n$ as creation of a latent, or non-observable, variable, it is desirable that the MFPT control policy $\mu_g^{(c_n)}$ for the reduced network $BN_p^{(c_n)}$ be as close as possible to the policy $\mu_g$ designed for the original network when both control policies have the same parameter $\gamma$. In this way, one can achieve similar control actions for every state $x$ and its corresponding reduced state $x^{(n)}$. Obviously, this is achieved only if $\mu_g(x) = \mu_g(\tilde{x}^n)$. Thus, we arrive at the following definition, which is formulated for the general case of stationary control policies.

Definition 2: Given a stationary control policy $\mu_g$ and a gene $x_n$ to be deleted from the $BN_p$, the policy $\mu_g$ is called $n$-inconsistent at the state $x$ if and only if $\mu_g(x) \neq \mu_g(\tilde{x}^n)$. The state $x$ is then called $n$-inconsistent. The ratio of the number of $n$-inconsistent states to the total number of states in the $BN_p$ is called the $n$-relative inconsistency of the control policy $\mu_g$.

The pairs of $n$-inconsistent states in the original network present us with two possible options for defining the control action for the states in the reduced network obtained by collapsing those pairs into their respective reduced states. Thus, one should measure the effects of a selection-policy-induced reduction mapping by comparing the control actions for the subset $C$ of states that are not $n$-inconsistent to the control actions for their corresponding reduced states in the reduced network $BN_p^{(c_n)}$. The next definition provides a relative measure of those effects.

Definition 3: Given a stationary control policy $\mu_g$ and a gene $x_n$ to be deleted from the $BN_p$, the policy $\mu_g$ is called $c_n$-affected at the state $x \in C$ if and only if the control action $\mu_g^{(c_n)}(x^{(n)})$ for the reduced state $x^{(n)}$ is different from $\mu_g(x)$. The ratio of the number of states $x \in C$ where the control policy is $c_n$-affected to the total number of states in $C$ is called the relative effect of the selection policy $c_n$ on the control policy $\mu_g$.

Because there is only a finite number of selection policies $c_n$, for any given stationary control policy $\mu_g$ designed for a $BN_p$ there exists a selection policy $c_n^*$ that minimizes the relative effect on $\mu_g$ among all possible selection policies.

Recall that the MFPT control policy $\mu_g$ is designed based on the comparisons of the differences $K_U(\tilde{x}^g) - K_U(x)$ and $K_D(x) - K_D(\tilde{x}^g)$ to $\gamma$. Thus, one can achieve the desired similarity between the MFPT control policies by minimizing the differences between the respective vectors of mean first-passage times for the states in the original and the reduced networks. However, the dimension of $K_U$ and $K_D$ is $2^{n-1}$, while the dimension of their counterparts $\hat{K}_U$ and $\hat{K}_D$ for the reduced network is $2^{n-2}$, and one cannot simply subtract the vectors of mean first-passage times for the reduced network from their counterparts for the original network.

This problem can be circumvented by using the companion matrix. Because the sets of desirable and undesirable states are the same for the Markov chain corresponding to the reduced network and the one corresponding to $\hat{P}_{c_n}$, one can consider equations similar to (6) and (7), namely

$$\hat{K}_U = e + (\hat{P}_{c_n})_{D,D}\, \hat{K}_U, \tag{12}$$

$$\hat{K}_D = e + (\hat{P}_{c_n})_{U,U}\, \hat{K}_D, \tag{13}$$

where $(\hat{P}_{c_n})_{D,D}$ and $(\hat{P}_{c_n})_{U,U}$ denote the blocks of the companion matrix indexed by the reduced desirable and undesirable states, respectively.
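Definitions 2 and 3 reduce to counting. The following minimal sketch assumes that the policies are given as 0/1 arrays over integer-coded states with $x_n$ as the least significant bit, so that $\tilde{x}^n$ is `x ^ 1` and $x^{(n)}$ is `x >> 1`; the function names and the example policies are hypothetical.

```python
import numpy as np

def relative_inconsistency(mu):
    """Definition 2: fraction of states x with mu(x) != mu(x~^n)."""
    mu = np.asarray(mu)
    idx = np.arange(len(mu))
    return float(np.mean(mu != mu[idx ^ 1]))

def relative_effect(mu, mu_red):
    """Definition 3: fraction of consistent states x (the set C) whose
    reduced state x^(n) receives a different control action."""
    mu, mu_red = np.asarray(mu), np.asarray(mu_red)
    idx = np.arange(len(mu))
    C = idx[mu == mu[idx ^ 1]]            # states that are not n-inconsistent
    if C.size == 0:
        return 0.0                        # convention for an empty C (our choice)
    return float(np.mean(mu_red[C >> 1] != mu[C]))

mu = [0, 0, 1, 0, 1, 1, 0, 0]             # hypothetical policy on 3 genes
mu_red = [0, 1, 0, 1]                     # hypothetical policy on the reduced network
print(relative_inconsistency(mu))         # 0.25: only the pair (2, 3) disagrees
print(relative_effect(mu, mu_red))        # 4 of the 6 consistent states are affected
```

In practice `mu_red` would come from running the MFPT design on the reduced network $BN_p^{(c_n)}$ for each candidate selection policy, which is exactly the costly step discussed below.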

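Consider a state $x \in D$ such that $c_n(x) = 1$. Then, from (6) and (12), one gets

$$K_U(x) - \hat{K}_U(x^{(n)}) = \sum_{y \in D} P_{x,y}\big(K_U(y) - \hat{K}_U(y^{(n)})\big), \tag{14}$$

where the index $y^{(n)}$ describes the coordinates of the vector $\hat{K}_U$, and the states $y$ and $\tilde{y}^n$ collapse to the state $y^{(n)}$. The right hand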


side suggests that the proper vector to subtract from $K_U$ is the vector $\Lambda_D \hat{K}_U$ defined by the action of the matrix $\Lambda_D$ on the vector $\hat{K}_U$, where, referring to (6), $\Lambda_D$ is the upper left $2^{n-1} \times 2^{n-2}$ submatrix of $\Lambda$, and $\Lambda_D \hat{K}_U$ has dimension $2^{n-1}$ with identical elements for each pair of positions indexed by a pair of flipped states $(y, \tilde{y}^n)$. Thus, if one pairs every equation of type (14) with an artificially introduced identity corresponding to the state $\tilde{x}^n$ that was not selected by the policy $c_n$,

$$K_U(\tilde{x}^n) - \hat{K}_U(x^{(n)}) = K_U(\tilde{x}^n) - \hat{K}_U(x^{(n)}), \tag{15}$$

then one gets a linear system of $2^{n-1}$ equations. That system can be represented in matrix format as

$$K_U - \Lambda_D \hat{K}_U = Q_U \big(K_U - \Lambda_D \hat{K}_U\big), \tag{16}$$

where the matrix $Q_U$ is obtained from the matrix $P_{D,D}$ in (6) by replacing the rows corresponding to 0 entries in the selection policy $c_n$ with the appropriate unit basis vectors $e_x^T$. Similar considerations lead to

$$K_D - \Lambda_U \hat{K}_D = Q_D \big(K_D - \Lambda_U \hat{K}_D\big), \tag{17}$$

where the matrix $Q_D$ is obtained from the matrix $P_{U,U}$ in (7) by replacing the rows corresponding to 0 entries in the selection policy $c_n$ with the appropriate unit basis vectors, and with $\Lambda_U$ being the lower right $2^{n-1} \times 2^{n-2}$ submatrix of $\Lambda$. Equations (16) and (17) show that the differences $K_U - \Lambda_D \hat{K}_U$ and $K_D - \Lambda_U \hat{K}_D$ are eigenvectors corresponding to the eigenvalue 1 of the matrices $Q_U$ and $Q_D$, respectively.

Recall that there exists a selection policy $c_n^*$ that is optimal with regard to minimizing the relative effect on the stationary MFPT control policy $\mu_g$ among all possible selection policies $c_n$. Whereas for every gene that is a candidate for deletion there is an optimal selection policy to reduce the network, finding that optimal selection policy requires finding the MFPT control policy for each one of the $2^{2^{n-1}}$ possible reduced networks. This procedure is highly complex because it requires finding the respective transition probability matrices for all of the reduced networks, then solving the corresponding systems of linear equations, e.g., (6) and (7), to find the vectors of mean first-passage times, and finally, executing the algorithm that determines the control policy for each reduced network.

Although the optimal selection policy $c_n^*$ does not necessarily minimize the differences $K_U - \Lambda_D \hat{K}_U$ and $K_D - \Lambda_U \hat{K}_D$, every selection policy $c_n$ that ensures close to 0 coordinates for these vectors at the positions where its own coordinates equal 1 is asymptotically (as $n \to \infty$) optimal, and Theorem 1 shows that the convergence is fast. The relations given by (16) and (17) show that the differences $K_U - \Lambda_D \hat{K}_U$ and $K_D - \Lambda_U \hat{K}_D$ belong to the linear subspaces spanned by the eigenvectors corresponding to the eigenvalue 1 of their respective matrices $Q_U$ and $Q_D$. Our interest is in finding the elements of those subspaces that have small coordinates at the positions where the selection policy $c_n$ equals 1.

While we do not consider the question of characterizing those subspaces based on the structure of a given selection policy, we

provide two theorems concerning estimating the probability of

finding vectors that belong to the subspaces and have a certain

growth condition imposed on their coordinates. These probabil-

ities are proportional to either the surface area or the volume of

certainportionsofthe -ballin

or metric.Thus,weareinterestedinthesets

whichisendowedwitheither

,

and

, where, . By

adjusting the parameters

having the vectors of MFPT

to each other at the coordinates where the selection policy

equals to 1. Here we only state the theorems, leaving the proofs

in the Appendix.

Theorem 2: Consider the set

, one can estimate the probability of

and( and) close

. Then

Theorem 3: Consider the sets and . Then

where

functions of the respective sets,

onto the subspace spanned by the coordinate vectors

and

,and are the characteristic

is the projection of

Thequantitiesand

describe the area of the projec-

tion of the intersection between the positive coordinate cone in

and the surface of the -ball in

topology respectively. The projections are onto the subspace

of

spanned by the coordinate vectors

Similarly, the quantity

-dimensional volume of the set

portion of the

-ball in the

endowed with the and

.

describes the

, the volume of the

space that is bounded by the

Authorized licensed use limited to: University of South Florida. Downloaded on August 10,2010 at 18:36:18 UTC from IEEE Xplore. Restrictions apply.

Page 8

4878IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 9, SEPTEMBER 2010

Fig. 1. 4-gene ?? has 256 possible selection policies. All of these selection policies and their SSD shift toward Desirable states are shown in this figure. 16 out

of 256 selection policies produce the maximum shift toward Desirable states and are identified by a circle around them. Out heuristic selection policy is one of

these 16 optimal selection policies.

surface

on the positive coordinate cone in

We focus on these two norms because of the task at hand: estimating the closeness of the MFPT vectors of the original and of the reduced network in either norm. Taking into account all of the possible permutations of the coordinate vectors, one can see that the probability of finding vectors with the structure described in the statements of the theorems is proportional to one quantity in the case of the first norm and to another in the case of the second norm. These two quantities rapidly tend to zero if some of the parameters that bound the coordinates tend to zero. It is important to notice that the minimum, in either norm, of the coordinates of interest of the difference vectors depends only on the network and the selection policy. At the same time, the relative effect of the selection policy on the control policy depends on the choice of the threshold parameter of the MFPT control policy, which is related to the cost of control [4]. Smaller values of this parameter imply that control is applied more often, a larger set of controlled states, and a potential increase of the relative effect. Smaller values of the parameter also imply that a brute-force search for vectors that are close enough to the MFPT vectors is a hard problem.

Thus, there is a need for heuristic approaches that identify both a gene for deletion and a selection policy that has a small relative effect on the control policy. It is also desirable that the reduced network share similar controllability properties with the original one, where controllability is understood in terms of the shift of the steady-state distribution towards the desirable states after the application of the MFPT control policy. The control policy developed in [16] employs such a heuristically chosen selection policy in the following scenario: (i) a gene is selected for deletion by a procedure involving the coefficient of determination (CoD) [17]; (ii) a reduction mapping using the heuristically chosen selection policy is applied; (iii) steps (i) and (ii) are repeated iteratively until a network of a predetermined (computationally feasible) size is achieved; (iv) the MFPT control policy is derived for the reduced network; and (v) a control policy is induced on the original network from the policy derived for the reduced network.
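As a rough illustration of step (ii), the following sketch builds the next-state map of a reduced deterministic BN from a selection policy. It rests on our own assumptions rather than the paper's notation: states are integers whose bit i holds gene i, the original network is given by a next-state list `f`, and a selection policy for deleting gene `g` is stored as a dict that maps each reduced state to the parent state whose transition is kept (equivalent to a 0/1 vector over the original states). Random gene perturbations are ignored, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def drop_bit(x: int, g: int) -> int:
    """Remove bit g from state x (gene g becomes a latent variable)."""
    low = x & ((1 << g) - 1)
    high = x >> (g + 1)
    return low | (high << g)


def insert_bit(r: int, g: int, b: int) -> int:
    """Inverse of drop_bit: re-insert value b for gene g into reduced state r."""
    low = r & ((1 << g) - 1)
    high = r >> g
    return low | (b << g) | (high << (g + 1))


def policy_from_choices(choices: Sequence[int], g: int) -> Dict[int, int]:
    """choices[r] in {0, 1} says which parent of reduced state r is selected
    (the parent with gene g = 0 or the parent with gene g = 1)."""
    return {r: insert_bit(r, g, b) for r, b in enumerate(choices)}


def reduce_network(f: Sequence[int], n: int, g: int,
                   policy: Dict[int, int]) -> List[int]:
    """Next-state map of the (n-1)-gene network obtained by deleting gene g.

    Each reduced state r inherits the transition of its selected parent,
    and the parent's successor is projected by deleting gene g as well.
    """
    reduced = []
    for r in range(1 << (n - 1)):
        parent = policy[r]
        if drop_bit(parent, g) != r:
            raise ValueError(f"state {parent} is not a parent of reduced state {r}")
        reduced.append(drop_bit(f[parent], g))
    return reduced


# Toy 2-gene network (illustrative values only); delete gene 1.
f = [1, 3, 0, 2]
policy = policy_from_choices([0, 1], g=1)          # keeps parents 0 and 3
print(reduce_network(f, n=2, g=1, policy=policy))  # [1, 0]
```

Different choice vectors generally yield different reduced networks, which is why the choice of selection policy matters for the control policy designed afterwards.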

VI. SIMULATION STUDY

We performed simulation studies to illustrate the new concepts developed in this paper. Specifically, we study the relationship between the relative effect of the selection policy on the MFPT control policy and the shift of the steady-state distribution of the network towards the desirable states.

Fig. 2. The average SSD shift toward Desirable states and the relative effects on the control policies of successive reductions of 2 sets of 100 BNs. Each set has randomly generated attractors that are constrained to be evenly distributed between the Desirable and Undesirable states. At each step, the MFPT control policy is designed and applied on the network. As the figure shows, the SSD shift is similar in the original and reduced networks when each is controlled by its own control policy. Also, the SSD shift and relative effect curves follow inverse patterns. (a) 9 genes. (b) 10 genes.

The key concept in this paper is that of the selection policy, which determines the reduction mapping and, subsequently, the structure of the reduced network. Given the large number of possible selection policies, it is not feasible to exhaustively search for the optimal selection policy. The optimal selection policy minimizes the relative effect on a given stationary control policy; therefore, if that stationary control policy is changed, then the optimal selection policy could change as well. Thus, it is important to develop heuristics for determining suboptimal selection policies. The algorithm CoD-Reduce outlined in [16] designs such a policy. Fig. 1 shows the performance of that algorithm using a 4-gene network as an example: the selection policy designed by CoD-Reduce, although obtained heuristically, is among the several optimal selection policies. In our simulation study, we use CoD-Reduce to design the selection policy.
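To make the size of the search space concrete, the sketch below enumerates every selection policy for deleting one gene. The $2^n$ states form $2^{n-1}$ pairs of flipped states and a policy picks one parent per pair, so there are $2^{2^{n-1}}$ policies per candidate gene: 256 for the 4-gene network of Fig. 1, but already $2^{512}$ for 10 genes. The bit-per-gene state encoding and the function names are our assumptions for illustration.

```python
from itertools import product
from typing import Dict, Iterator


def insert_bit(r: int, g: int, b: int) -> int:
    """Re-insert value b for gene g into reduced state r (bit i = gene i)."""
    low = r & ((1 << g) - 1)
    high = r >> g
    return low | (b << g) | (high << (g + 1))


def selection_policies(n: int, g: int) -> Iterator[Dict[int, int]]:
    """Yield every selection policy for deleting gene g from an n-gene BN.

    Each policy maps a reduced state to the parent state whose transition
    is kept; one binary choice is made for each of the 2**(n-1) pairs.
    """
    for choice in product((0, 1), repeat=1 << (n - 1)):
        yield {r: insert_bit(r, g, b) for r, b in enumerate(choice)}


def count_selection_policies(n: int) -> int:
    """Number of selection policies for one candidate gene."""
    return 2 ** (2 ** (n - 1))


print(sum(1 for _ in selection_policies(4, g=0)))  # 256, as in Fig. 1
for n in (5, 10, 17):
    # log2 of the count: exhaustive search quickly becomes hopeless
    print(n, count_selection_policies(n).bit_length() - 1)
```

Exhaustive enumeration, as done for Fig. 1, is only practical for very small networks, which is the motivation for heuristics such as CoD-Reduce.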

We have generated several sets of BNs, 100 networks each, for different numbers of genes, using the algorithm developed in [18]. The networks are randomly generated subject to only one constraint on their attractors: half of them are among the desirable states of the respective network. For each randomly generated network, also referred to as the original network in the simulation study, we design and apply the respective MFPT control policy. Then we compute the difference between the total mass of the desirable states in the SSD of the network before and after applying the MFPT control policy. We refer to this difference as the SSD shift (toward the desirable states). More efficient stationary control policies produce larger SSD shifts. For the next step in the simulations, we use CoD-Reduce to delete one gene from the network and find the MFPT control policy for the reduced network. We then calculate the relative effect of the selection policy on the MFPT control policy for the original network by comparing that control policy to the MFPT control policy for the reduced network. The reduction step and the calculation of the relative effect on the MFPT control policy are repeated iteratively, using the reduced network from the previous step as the “original” network for the new reduction step.
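The first part of each simulation step (design an MFPT control policy, then measure the SSD shift) can be sketched for a toy network as follows. Everything here is an illustrative assumption rather than the code used in the study: states are integers with bit i holding gene i; the BN is given by a next-state list `f` and made ergodic with the usual random gene perturbation of probability `p`; a state is desirable when the target gene is 0; applying control flips the control gene before the transition; and the threshold rule with parameter `gamma` is our reading of the MFPT policy of [4]. The mean first-passage times solve the standard first-step equations $(I - P_{SS})K = \mathbf{1}$. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def transition_matrix(f: Sequence[int], n: int, p: float) -> Matrix:
    """BN with perturbation: each gene flips independently with probability p;
    if no gene flips, the state follows the network function f."""
    N = 1 << n
    P = [[0.0] * N for _ in range(N)]
    for x in range(N):
        P[x][f[x]] += (1 - p) ** n
        for y in range(N):
            h = bin(x ^ y).count("1")  # Hamming distance = genes flipped
            if h:
                P[x][y] += p ** h * (1 - p) ** (n - h)
    return P


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (A nonsingular)."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(m):
            if r != c:
                factor = M[r][c] / M[c][c]
                for k in range(c, m + 1):
                    M[r][k] -= factor * M[c][k]
    return [M[i][m] / M[i][i] for i in range(m)]


def stationary(P: Matrix) -> List[float]:
    """Steady-state distribution: solve pi (P - I) = 0 with sum(pi) = 1."""
    N = len(P)
    A = [[P[x][y] - (1.0 if x == y else 0.0) for x in range(N)] for y in range(N)]
    A[-1] = [1.0] * N  # replace one redundant equation by normalization
    return solve(A, [0.0] * (N - 1) + [1.0])


def mfpt(P: Matrix, source: Sequence[int]) -> Dict[int, float]:
    """Mean first-passage times from each state in `source` to its complement."""
    idx = list(source)
    A = [[(1.0 if i == j else 0.0) - P[x][y] for j, y in enumerate(idx)]
         for i, x in enumerate(idx)]
    return dict(zip(idx, solve(A, [1.0] * len(idx))))


def mfpt_policy(P: Matrix, n: int, target: int, control: int,
                gamma: float) -> List[int]:
    """1 = flip the control gene in that state (desirable: target gene = 0)."""
    N = 1 << n
    U = [x for x in range(N) if (x >> target) & 1]
    D = [x for x in range(N) if not (x >> target) & 1]
    KU, KD = mfpt(P, U), mfpt(P, D)
    policy = []
    for x in range(N):
        xt = x ^ (1 << control)  # control != target, so xt stays in x's class
        if x in KU:  # undesirable: act if flipping shortens time to reach D
            policy.append(int(KU[x] - KU[xt] > gamma))
        else:        # desirable: act if flipping lengthens time to reach U
            policy.append(int(KD[xt] - KD[x] > gamma))
    return policy


def controlled(P: Matrix, policy: Sequence[int], control: int) -> Matrix:
    """Where control is applied, the chain moves as if from the flipped state."""
    return [P[x ^ (1 << control)] if policy[x] else P[x] for x in range(len(P))]


def ssd_shift(P: Matrix, Pc: Matrix, desirable: Sequence[int]) -> float:
    """Desirable-state mass after control minus desirable-state mass before."""
    pi, pic = stationary(P), stationary(Pc)
    return sum(pic[x] - pi[x] for x in desirable)


# Illustrative 3-gene network: target gene 2, control gene 0.
P = transition_matrix([3, 5, 1, 6, 2, 7, 0, 4], n=3, p=0.01)
policy = mfpt_policy(P, n=3, target=2, control=0, gamma=0.5)
print(policy, ssd_shift(P, controlled(P, policy, 0), desirable=[0, 1, 2, 3]))
```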

Fig. 2 illustrates our findings: on average, CoD-Reduce does not have a significant effect on the amount of SSD shift if a single gene is removed from the model. Similarly, there is a small change (on average) in the relative effect of the selection policy on the MFPT control policy when moving from a network to its reduced version. Thus, on average, our heuristic for performing selection policy-based reduction does not have much of an impact on the controllability of the network when a single gene is removed from it. However, the effects of reduction tend to accumulate with the removal of more genes, and ultimately the reduction mappings produce poor results when very few genes remain in the network.

In general, the SSD shift and the relative effect tend to follow inverse trends. This is to be expected because a large relative effect on the control policy implies a significant difference between the control actions for the states of the larger network and those for their respective reduced states in the smaller network, which could ultimately lead to a significant difference in the shifts that those control policies induce in the steady-state distributions of the models.

A. Gastrointestinal Cancer Network Application

We have considered the effects of the reduction on a real-world, experiment-driven network generated using the gastrointestinal cancer data set from [19]. Before applying reduction, we must infer the gene regulatory network from the gene expression data and obtain the BN. In the preprocessing step, the gene expression data are normalized, filtered and binarized using the methods from [20]. For the inference, we use the procedure of [16], which is a modified version of the seed algorithm proposed by Hashimoto et al. [21]. The algorithm is initialized with the C9orf65 gene as the seed gene, and the gene CXCL12 is chosen as the control gene. The seed gene is selected because it is one of the two genes composing the best classifier in [19], and the control gene is selected based on its strong CoD [17] connection to the seed gene. The coefficient of determination (CoD) measures how much a set of random variables improves the prediction of a target variable, relative to the best prediction in the absence of any conditioning observation [17]. Let $\mathbf{X} = (X_1, \dots, X_k)$ be a vector of predictor binary random variables, $Y$ a binary target variable, and $f$ a Boolean function such that $f(\mathbf{X})$ predicts $Y$. The mean-square error (MSE) of $f(\mathbf{X})$ as a predictor of $Y$ is the expected squared difference, $E[|f(\mathbf{X}) - Y|^2]$. Let $\varepsilon_{\mathbf{X}}$ be the minimum MSE among all predictor functions $f(\mathbf{X})$ for $Y$, and let $\varepsilon_0$ be the error of the best estimate of $Y$ without any predictors. The CoD is defined as

$$\theta_{\mathbf{X}} = \frac{\varepsilon_0 - \varepsilon_{\mathbf{X}}}{\varepsilon_0}. \qquad (18)$$
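For binary variables, the squared error $|f(\mathbf{X}) - Y|^2$ equals 1 exactly when the prediction is wrong, so (18) can be estimated from binarized samples by counting errors: the best constant predictor outputs the majority value of $Y$, and the best Boolean predictor outputs the majority value of $Y$ within each observed pattern of $\mathbf{X}$. The sketch below computes this plug-in (resubstitution) estimate, with `eps_opt` playing the role of $\varepsilon_{\mathbf{X}}$. The function name and the convention of returning 0 when $\varepsilon_0 = 0$ (a constant target) are our assumptions; resubstitution estimates of this kind tend to be optimistic on small samples.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cod(X: Sequence[Tuple[int, ...]], y: Sequence[int]) -> float:
    """Plug-in CoD of binary predictor vectors X for a binary target y."""
    n = len(y)
    ones = sum(y)
    eps0 = min(ones, n - ones) / n  # error of the best constant predictor
    if eps0 == 0:
        return 0.0  # constant target: CoD undefined, return 0 by convention
    counts: Dict[Tuple[int, ...], List[int]] = defaultdict(lambda: [0, 0])
    for xi, yi in zip(X, y):
        counts[tuple(xi)][yi] += 1
    # Majority vote within each predictor pattern minimizes the error count.
    eps_opt = sum(min(c) for c in counts.values()) / n
    return (eps0 - eps_opt) / eps0


# Y = X1 XOR X2: the pair predicts Y perfectly, X1 alone does not help.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
print(cod(X, y), cod([(a,) for a, _ in X], y))  # 1.0 0.0
```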

TABLE I

SSD SHIFT TOWARD THE DESIRABLE STATES IN GASTROINTESTINAL CANCER NETWORK

The network is grown by adding one gene at a time; at each iterative step, the gene having the strongest connectivity, measured by the CoD, to one of the genes from the current network is added to the network. Then, the network is rewired, taking into account that a new gene in the network can change the way genes influence each other. The total number of the genes in the final network is 17: C9orf65, CXCL12, TK1, SOCS2, THC2168366, SEC61B, ENST00000361295, KCNH2, ACTB, RPS18, RPS13, THC2199344, SNX26, RPL26, SLC20A1, RPS11, THC2210612, THC2161967, IER2 and LAMP1. This 17-gene BN has a very large state space of size $2^{17}$, so exact computation of its SSD is infeasible. Thus, the SSD is estimated by first running the network for a long time and then using the Kolmogorov-Smirnov test to decide whether the network has reached its steady state. Initially, the best gene for deletion is selected using CoD-Reduce [16] and the network is reduced by deleting one gene; then CoD-Reduce is applied consecutively to reduce the network down to 10 genes: C9orf65, CXCL12, RPS18, RPS13, THC2199344, SNX26, RPL26, SLC20A1, RPS11 and THC2210612. After reducing the original network down to 10 genes, the MFPT control policy for the reduced network is designed and then induced back on the original 17-gene network. The induction of the control policy is a necessary step, since the control policy designed on the reduced network is of smaller dimension than the original one. We use the method outlined in [16] for inducing the control policy designed on the 10-gene network to the original 17-gene network.
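The greedy growth step described above can be sketched as follows. This is a deliberately simplified illustration, not the procedure of [16] or [21]: it scores each candidate gene by its largest single-predictor CoD with any gene already in the network (in either direction), uses plug-in estimates from binarized samples, and omits the rewiring step entirely. The names `cod1` and `grow_network` and the dict-of-columns data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def cod1(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in CoD of a single binary predictor x for a binary target y."""
    n = len(y)
    eps0 = min(sum(y), n - sum(y)) / n
    if eps0 == 0:
        return 0.0  # constant target: treat as no improvement
    counts = defaultdict(lambda: [0, 0])
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    eps_opt = sum(min(c) for c in counts.values()) / n
    return (eps0 - eps_opt) / eps0


def grow_network(data: Dict[str, List[int]], seed: str, size: int) -> List[str]:
    """Greedily add the gene most strongly CoD-connected to the current network."""
    network = [seed]
    while len(network) < size:
        best: Optional[str] = None
        best_score = -1.0
        for cand, values in data.items():
            if cand in network:
                continue
            # Strongest connection to any gene already in the network.
            score = max(max(cod1(values, data[g]), cod1(data[g], values))
                        for g in network)
            if score > best_score:
                best, best_score = cand, score
        if best is None:  # no candidate genes left
            break
        network.append(best)
    return network


# Toy binarized samples (illustrative): B copies A, C is unrelated to A.
data = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
print(grow_network(data, seed="A", size=2))  # ['A', 'B']
```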

After deleting one gene, each state in the reduced network corresponds to two states of the original network, referred to here as “parent” states, that collapse together. After designing the control policy and assigning the control actions to the states of the reduced network, the induction procedure duplicates the control action of each state in the reduced network as the control action for each of its parent states.
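Repeating this duplication over several deletions (17 to 10 genes here, so each reduced state has $2^7 = 128$ parent states) amounts to looking up each original state's projection onto the kept genes. The sketch below assumes, for illustration only, that states are integers with bit i holding gene i and that the reduced encoding keeps the remaining genes in their original order; `project` and `induce_policy` are hypothetical names.

```python
from typing import List, Sequence


def project(x: int, kept: Sequence[int]) -> int:
    """Reduced state of x: bit j of the result is the value of gene kept[j]."""
    r = 0
    for j, g in enumerate(kept):
        r |= ((x >> g) & 1) << j
    return r


def induce_policy(reduced_policy: Sequence[int], n: int,
                  kept: Sequence[int]) -> List[int]:
    """Copy each reduced state's control action to all of its parent states.

    Duplicating the action to both parents one deleted gene at a time gives
    the same result as this direct projection onto the kept genes.
    """
    if len(reduced_policy) != 1 << len(kept):
        raise ValueError("reduced policy needs one action per reduced state")
    return [reduced_policy[project(x, kept)] for x in range(1 << n)]


# 3-gene network reduced to genes 0 and 2 (illustrative policy values).
print(induce_policy([0, 1, 1, 0], n=3, kept=[0, 2]))  # [0, 1, 0, 1, 1, 0, 1, 0]
```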

Table I shows the total probability mass of the desirable and undesirable states before and after applying the induced control policy. As the table illustrates, the steady-state distribution of the network shifts by about 13% toward the desirable states.

VII. CONCLUSION


This paper presents a general compression strategy for Boolean network models of genomic regulation. The strategy is based on the concept of a selection policy, and the complexity of the model is reduced by consecutively removing genes from the network. The removed genes are treated as latent, or non-observable, variables, and the truth tables for the reduced models are constructed based on the particular selection policy. The effects of the compression on both the dynamical behavior of the model and the MFPT control policy are discussed using the companion to the selection policy matrix. It is important to emphasize that, while there is always an optimal selection policy that minimizes the effects of compression on the stationary control policy, finding it is a hard problem. Thus, there is a need to develop suboptimal compression algorithms based on heuristic approaches.

APPENDIX

Proof (Theorem 2): The integral evaluation is easily verified by induction. In the base case, direct integral evaluation shows that the formula holds. Next, assuming that the statement of the theorem is true at the previous step, one can compute the integral at the current step as follows: … . This completes the proof of Theorem 2.

Proof (Theorem 3): We use induction. First we establish the base case. From the definition of the set, it is clear that

(19)

For every index we have (with the empty sum being zero) a corresponding expansion. Thus,

(20)

Then, we should have

(21)

which is true by (19) and (20). Now we handle the general case. Define

(22)

Then

(23)

Definition (22) clearly implies

(24)

Accordingly, a relevant induction hypothesis is

(25)

This hypothesis is verified using (24). We set

(26)

and obtain

(27)

The last expression gives the right-hand side of (25) after a simple evaluation of the integral. The first formula of the theorem follows from (23) and (25).

It remains to evaluate the second quantity. From its definition and (21) we have

(28)

where

(29)

By the above definition, we obtain the recurrence relation

(30)

Furthermore, evaluating the initial cases provides the following hypothesis:

(31)

To verify (31), we again make the substitutions of (26). Then, (30) yields

(32)

Now (31) follows from (26) and (32). From (28) and (31) we get the second formula in the statement of the theorem. This completes the proof of Theorem 3.

ACKNOWLEDGMENT

The authors want to thank B. Shekhtman for the inspiring discussions.


REFERENCES

[1] A. Datta, R. Pal, A. Choudhary, and E. R. Dougherty, “Control approaches for probabilistic gene regulatory networks,” IEEE Signal Process. Mag., vol. 24, no. 1, pp. 54–63, 2007.

[2] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA: Athena Scientific, 2005.

[3] B. Faryabi, A. Datta, and E. R. Dougherty, “On approximate stochastic control in genetic regulatory networks,” IET Syst. Biol., vol. 1, no. 6, pp. 361–368, 2007.

[4] G. Vahedi, B. Faryabi, J.-F. Chamberland, A. Datta, and E. R. Dougherty, “Intervention in gene regulatory networks via a stationary mean-first-passage-time control policy,” IEEE Trans. Biomed. Eng., pp. 2319–2331, Oct. 2008.

[5] X. Qian, I. Ivanov, N. Ghaffari, and E. R. Dougherty, “Intervention in gene regulatory networks via greedy control policies based on long-run behavior,” BMC Syst. Biol., 2009.

[6] S. A. Kauffman, “Metabolic stability and epigenesis in randomly constructed genetic nets,” J. Theoret. Biol., vol. 22, pp. 437–467, 1969.

[7] I. Shmulevich, E. R. Dougherty, S. Kim, and W. Zhang, “Probabilistic Boolean networks: A rule-based uncertainty model for gene regulatory networks,” Bioinform., vol. 18, no. 2, pp. 261–274, 2002.

[8] R. Pal, A. Datta, and E. R. Dougherty, “Robust intervention in probabilistic Boolean networks,” IEEE Trans. Signal Process., vol. 56, no. 3, pp. 1280–1294, 2008.

[9] E. R. Dougherty and I. Shmulevich, “Mappings between probabilistic Boolean networks,” Signal Process., vol. 83, no. 4, pp. 799–809, 2003.

[10] I. Ivanov and E. R. Dougherty, “Reduction mappings between probabilistic Boolean networks,” EURASIP J. Appl. Signal Process., vol. 1, no. 1, pp. 125–131, 2004.

[11] M. Brun, E. R. Dougherty, and I. Shmulevich, “Steady-state probabilities for attractors in probabilistic Boolean networks,” Signal Process., vol. 85, no. 10, pp. 1993–2013, 2005.

[12] X. Qian and E. R. Dougherty, “On the long-run sensitivity of probabilistic Boolean networks,” J. Theoret. Biol., pp. 560–577, 2009.

[13] J. J. Hunter, “Stationary distributions and mean first passage times of perturbed Markov chains,” Linear Algebra Appl., vol. 410, pp. 217–243, 2005.

[14] P. J. Schweitzer, “Perturbation theory and finite Markov chains,” J. Appl. Probab., vol. 5, pp. 401–413, 1968.

[15] J. Norris, Markov Chains. Cambridge, U.K.: Cambridge Univ. Press, 1998.

[16] N. Ghaffari, I. Ivanov, X. Qian, and E. R. Dougherty, “A CoD-based reduction algorithm for designing stationary control policies on Boolean networks,” Bioinformatics, vol. 26, pp. 1556–1563, 2010.

[17] E. R. Dougherty, S. Kim, and Y. Chen, “Coefficient of determination in nonlinear signal processing,” Signal Process., vol. 80, pp. 2219–2235, 2000.

[18] R. Pal, I. Ivanov, A. Datta, and E. R. Dougherty, “Generating Boolean networks with a prescribed attractor structure,” Bioinformatics, vol. 21, no. 21, pp. 4021–4025, Nov. 2005.

[19] N. D. Price, J. Trent, A. K. El-Naggar, D. Cogdell, E. Taylor, K. K. Hunt, R. E. Pollock, L. Hood, I. Shmulevich, and W. Zhang, “Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas,” Proc. Nat. Acad. Sci., vol. 104, no. 9, pp. 3414–3419, 2007.

[20] I. Shmulevich and W. Zhang, “Binary analysis and optimization-based normalization of gene expression data,” Bioinform., vol. 18, no. 4, pp. 555–565, 2002.

[21] R. Hashimoto, S. Kim, I. Shmulevich, W. Zhang, M. L. Bittner, and E. R. Dougherty, “A directed-graph algorithm to grow genetic regulatory subnetworks from seed genes based on strength of connection,” Bioinform., vol. 20, no. 8, pp. 1241–1247, 2004.

Ivan Ivanov received the Ph.D. degree in mathematics from the University of South Florida, Tampa.

He is an Assistant Professor with the Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX. His current research is focused on genomic signal processing and, in particular, on modeling the genomic regulatory mechanisms and on mappings reducing the complexity of the models of genomic regulatory networks.

Plamen Simeonov received the Ph.D. degree in mathematics from the University of South Florida, Tampa.

He is an Associate Professor of mathematics with the Department of Computer and Mathematical Sciences, University of Houston-Downtown, Houston, TX. His current research interests are in the areas of analysis, approximation theory, potential theory, orthogonal polynomials and special functions, computer-aided geometric design, and biostatistics.

Noushin Ghaffari received the M.Sc. degree in computer information systems from the University of Houston-Clear Lake, TX, in 2006.

Currently, she is pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Texas A&M University, College Station. Her research interests include genomic signal processing, systems biology, and computational biology, especially complexity reduction and control of genetic regulatory networks.

Xiaoning Qian received the Ph.D. degree in electrical engineering from Yale University, New Haven, CT, in 2005.

Currently, he is an Assistant Professor with the Department of Computer Science and Engineering, University of South Florida, Tampa. He was with the Bioinformatics Training Program, Texas A&M University, College Station. His current research interests include computational biology, genomic signal processing, and biomedical image analysis.

Edward R. Dougherty received the M.S. degree in computer science from Stevens Institute of Technology, Hoboken, NJ, the Ph.D. degree in mathematics from Rutgers University, New Brunswick, NJ, and the Doctor Honoris Causa by the Tampere University of Technology, Finland.

He is a Professor with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, where he holds the Robert M. Kennedy ’26 Chair in Electrical Engineering and is Director of the Genomic Signal Processing Laboratory. He is also co-Director of the Computational Biology Division of the Translational Genomics Research Institute, Phoenix, AZ, and is an Adjunct Professor with the Department of Bioinformatics and Computational Biology, University of Texas M. D. Anderson Cancer Center, Houston. He is author of 15 books, editor of five others, and author of 250 journal papers. He has contributed extensively to the statistical design of nonlinear operators for image processing and the consequent application of pattern recognition theory to nonlinear image processing. His current research in genomic signal processing is aimed at diagnosis and prognosis based on genetic signatures and at using gene regulatory networks to develop therapies based on the disruption or mitigation of aberrant gene function contributing to the pathology of a disease.

Dr. Dougherty is a fellow of SPIE, has received the SPIE President’s Award, and served as the editor of the SPIE/IS&T Journal of Electronic Imaging. At Texas A&M University, he has received the Association of Former Students Distinguished Achievement Award in Research, been named a Fellow of the Texas Engineering Experiment Station, and been named Halliburton Professor of the Dwight Look College of Engineering.
