Data-dependencies and Learning in Artificial Systems

Palem GopalaKrishna (krishna@cse.iitb.ac.in)
Research Scholar, Computer Science & Engineering, Indian Institute of Technology - Bombay, Mumbai, India

Abstract

Data-dependencies play an important role in the performance of learning algorithms. In this paper we analyze the concept of data-dependencies in the context of artificial systems. When a problem and its solution are viewed as points in a system configuration, variations in the problem configurations can be used to study the variations in the solution configurations and vice versa. These variations can be used to infer solutions to unknown instances of problems from the solutions to known instances, thus reducing the problem of learning to that of identifying the relations among problems and their solutions. We use this concept to construct a formal framework for a learning mechanism based on the relations among data attributes. As part of the framework we provide two metrics, quality and quantity, for data samples and establish a knowledge conservation theorem. We explain how these concepts can be used in practice by considering an example problem, and we discuss the limitations.

1. Introduction

Two instances of a function can differ only in their arguments, i.e. the input data. When a function is sensitive to the data it operates upon, even a slight variation in the nature of the data can cause large variations in the path of execution. This property of being sensitive to data is termed data-dependency, and it places critical restrictions on the applicability of the algorithms themselves.

The success of any data-dependent learning algorithm depends heavily on the nature of the data samples it learns from. A well-designed algorithm with mismatched data is unlikely to succeed in generalization. Thus a careful analysis of the size and quality of the input data samples is vital for the success of every learning algorithm. While there exist sufficient metrics in this regard for learning in traditional systems (Kearns, 1990; Angluin, 1992), there exist almost none for learning in artificial systems, where the typical requirements are action selection and planning implemented through agents (Wilson, 1994; Bryson, 2003). These agents act as deterministic systems and thus demand non-probabilistic metrics with data-independent algorithms.

Data-independence essentially means that the path of execution (the series of instructions carried out) is independent of the nature of the input data. In other words, when an algorithm is said to be data-independent, all instances of the algorithm follow the same execution path no matter what the input data is. We can understand this with the following example. Consider an algorithm to search for a number in a given array of numbers. Such an algorithm would typically look like the following.

int Search(int Array[], int ArrLen, int Number) {
    for (int i = 0; i < ArrLen; ++i)
        if (Array[i] == Number)
            return i;
    return -1;
}

The above procedure sequentially scans a given array of numbers to find whether a given number is present in the array. It returns the index of the number if it finds a match, and -1 otherwise. The time complexity of this algorithm is O(1) in the best case and O(n) in the average and worst cases. However, if we change the iterator construct from for(i = 0; i < ArrLen; ++i) to for(i = ArrLen-1; i >= 0; --i), then for any particular input the performance can swap from best case to worst case and vice versa.
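To make this concrete, consider the reversed variant (our illustration, not in the original text): an input whose match sits at index 0 was the best case for Search but becomes the worst case here, and an input whose match sits at the last index behaves the opposite way.

int SearchReverse(int Array[], int ArrLen, int Number) {
    // Scans from the last element backwards: a match at index 0 is
    // now found only after examining all n elements.
    for (int i = ArrLen - 1; i >= 0; --i)
        if (Array[i] == Number)
            return i;
    return -1;
}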

On the other hand, consider the following data-independent version of the same code.

int Search1(int Array[], int ArrLen, int Number) {
    int nIndex = -1;
    for (int i = 0; i < ArrLen; ++i) {
        int bEqual = (Number == Array[i]);
        nIndex = bEqual * i + !bEqual * nIndex;
    }
    return nIndex;
}


Search1 is the same as Search with the sole exception that we have replaced the data-dependent if statement with a series of deterministic arithmetic constructs that in the end produce the same results. The advantage of this replacement is that the path of execution is deterministic and independent of the input array values, which allows us to reorder or even parallelize the individual iterations. This is possible because no (i+1)th iteration depends on the results of the ith iteration, unlike the case of Search, where the (i+1)th iteration is processed only if the ith iteration fails to find a match.
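As a further illustration (ours, not from the original text), the overwrite of nIndex in Search1 can be recast as a max-reduction over per-iteration candidates, which makes the order-independence explicit: each iteration computes its candidate from i and Array[i] alone, so the iterations can be evaluated in any order, or in parallel chunks, and combined afterwards.

int Search2(int Array[], int ArrLen, int Number) {
    int nIndex = -1;
    for (int i = 0; i < ArrLen; ++i) {
        int bEqual = (Number == Array[i]);
        // The candidate is i on a match and -1 otherwise; it depends
        // only on the current iteration, never on earlier ones.
        int nCandidate = bEqual * i + !bEqual * -1;
        // max() is associative and commutative, so the order in which
        // the iterations are combined does not affect the result.
        nIndex = (nCandidate > nIndex) ? nCandidate : nIndex;
    }
    return nIndex;
}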

Since it demands O(n) time in all cases, it might appear that Search1 is inferior to Search in performance. However, for this small cost in performance we gain two invaluable properties that are crucial for our present discussion: stability and predictability.

It is a well-known phenomenon in the practice of learning algorithms that the performance of a learner is highly affected by the order of the training data samples, making the learner unstable and at times unpredictable. In this regard, what relation could one infer between the stability of the learner and the dependencies among data samples? How does such a relation affect the performance of the learner? Can these dependencies be analyzed in a formal framework to assist the learning? These are some of the issues that we try to address in the following.

2. Learning in Artificial Systems

By an artificial system we essentially mean a man-made system that has a software module, commonly known as an agent, as one of its components. The artificial system itself could be a software program, such as a simulation program in a digital computer, or it could be a hardware system, such as an autonomous robot in the real world. There could also be more than one agent in an artificial system. The system can use the agents in many ways, such as to steer the course of a simulation, or to process environmental inputs (or events) and take the necessary actions. Additionally, the functionality of the agents could be static, i.e. not changing with experience, or dynamic, varying with experience. The literature addressing these can be broadly classified into two classes: theories that study the agents as pre-programmed units (such as Reynolds, 1987; Ray, 1991), and theories that consider the agents as learning units which can adjust their functionality based on their experience (e.g. Brooks, 1991; Ramamurthy et al., 1998; Cliff & Grand, 1999). The present discussion falls into the second category: we discuss a learning mechanism for agents based on the notion of data-dependencies.

Figure 1. Different paths indicate different algorithms to solve a task instance in the configuration space.

Consider an agent that is trying to accomplish a task in an artificial system, such as solving a maze or sorting events by priority.

Assume that the instantaneous configuration (the current state) of the artificial system is described by n generalized coordinates q_1, q_2, ..., q_n, which correspond to a particular point in a Cartesian hyperspace known as the configuration space, where the q's form the n coordinate axes. As the state of the system changes with time, the system point moves in the configuration space, tracing out a curve that represents "the path of motion of the system".

In such a configuration space, a task is specified by a set of system point pairs representing the initial and final configurations for different instances of the task. An instance of the task is said to be solvable if there exists an algorithm that can compute the final configuration from its initial configuration. The task is said to be solvable if there exists an algorithm that can solve all its instances.

Each instance of the algorithm solving an instance of the task represents a path of the system in the configuration space between the corresponding initial and final system points. If there exists more than one algorithm to solve the task, then naturally there might exist more than one path between the two points.

The goal of an agent that is trying to learn a task in such a system is to observe the algorithm instances and infer the algorithm. In this regard, all the information that the agent gets from an algorithm instance is just an initial-final configuration pair along with the series of configuration changes that lead from the initial configuration to the final configuration. The agent is not aware of the details of the underlying process that is responsible for these changes, and has to infer the process purely from these observations.

The agent is said to have "learned the task" if it can perform the task on its own, by moving the system from any given initial configuration to the corresponding final configuration in the configuration space. It should be noted that the procedure used (the algorithm inferred) by the agent may not be the same as the original algorithm from whose instances it has learned.

It should also be noted that the notion of learning the task, as described above, does not allow any probabilistic or approximate solutions. The agent should be able to perform the task correctly under all circumstances. An additional constraint that we put on the agent is that it should infer the algorithm from as few algorithm instances as possible. This is important for agents of real-world systems and simulation systems alike: agents observing samples from a real-world environment may not be able to pick up as many samples as they want, and in simulated environments each sample instance incurs an execution cost in terms of time and other resources, and hence samples should be used sparingly.

We formalize these concepts in the following.

2.1. A Formal Framework

Consider an agent that is trying to learn a task T in a system S whose configuration space is given by

C(S) = {~s_1, ~s_2, ..., ~s_N},

where each ~s_i is a system point represented by the n coordinates {q_i1, q_i2, ..., q_in}.

Let A be an algorithm to solve the task T, and let A_1, A_2, ..., A_k be the instances of A solving the instances T_1, T_2, ..., T_k of T respectively.

In the configuration space, each T_i is represented by a pair of system points (~s_i1, ~s_i2), and the corresponding A_i by a path between those system points.

Let I and F be two operators that, when applied to an algorithm instance A_i, yield the corresponding initial and final system points respectively, so that I(A_i) = ~s_i1 and F(A_i) = ~s_i2. We also define the corresponding set versions ~I and ~F of these operators as follows: for all A' ⊆ {A_1, A_2, ..., A_k},

~I(A') = {I(A_i) | A_i ∈ A'} and ~F(A') = {F(A_i) | A_i ∈ A'}.

The goal of the agent is to perform T, by mimicking or modeling A, inferring A's details from a subset of its instances.

Following the tradition of learning algorithms, let us call A the target concept, the subset D = {D_1, D_2, ..., D_d} ⊆ {A_1, A_2, ..., A_k} of its instances the training set or data samples, and the agent the learner. We use the symbol L to denote the learner.

At any instant during the learning phase, the set D can be partitioned into two subsets O, O' such that (O ∪ O' = D) ∧ (O ∩ O' = ∅). The set O ⊆ D denotes the data samples that the learner has already seen, and the set O' ⊆ D denotes the data samples the learner has yet to see. Learning progresses by presenting the learner with an unseen sample D_i ∈ O' and marking it as seen, by moving it to the set O. Starting from O = ∅, O' = D, this transition is repeated until O = D, O' = ∅.

In this process, each data sample D_i decreases the ignorance of the learner L about the target concept A, and hence can be assigned a specific information content value that indicates how much additional information L can gain from D_i about A.

We can determine the information content values of data samples by establishing the concept of a zone, where we treat an ensemble of system points that share a common relation as a single logical entity.

Definition. A set Z ⊆ C(S) defines a zone if there exists a function f : C(S) → {0, 1} such that for each ~s_i ∈ C(S):

f(~s_i) = 1 if ~s_i ∈ Z, and f(~s_i) = 0 if ~s_i ∉ Z.

The function f is called the characteristic function of Z.
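As a toy illustration (our example, not the paper's), in a configuration space with two coordinates, the points satisfying q1 < q2 form a zone whose characteristic function is directly computable:

// Characteristic function of the hypothetical zone { (q1, q2) : q1 < q2 }:
// returns 1 for system points inside the zone and 0 otherwise.
int InZone(int q1, int q2) {
    return (q1 < q2) ? 1 : 0;
}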

For the configuration space C(S) = {~s_1, ~s_2, ..., ~s_N}, we can construct an equivalent zone-space Z(S) = {Z_1, Z_2, ..., Z_r} such that the following holds:

∀~s_i ∃Z_j [~s_i ∈ Z_j] ∧ ∀i, j : 1 ≤ i < j ≤ r [Z_i ∩ Z_j = ∅] ∧ ∀i : 1 ≤ i ≤ r [Z_i ≠ ∅] ∧ ∪_{i=1}^{r} Z_i = C(S).

That is, the zones are non-empty, pairwise disjoint, and together cover the configuration space.

The zone-space can be viewed as an m-dimensional space with each Z_i being a point in it, where m is some function of n whose value depends upon, and hence is decided by, the nature of T. Let the range of the ith coordinate of this m-dimensional space be [0, r_i].


If we use the notation |P| to indicate the size of any set P, then we can express the volume of the zone-space as

vol(Z(S)) = |Z(S)| = ∏_{i=1}^{m} r_i.

Define an operator ∇ : C(S) → Z(S) that, when applied to a system point in the configuration space, yields the corresponding zone in the zone-space. Similarly, let ~∇ be the corresponding set version of this operator, defined as follows: for all S' ⊆ C(S),

~∇(S') = {∇(~s_i) | ~s_i ∈ S'}.

At any instant, the knowledge of L about A depends on the set of samples it has seen until then, and the information content of a data sample depends on whether the sample has already been seen by L or not.

Formally, the knowledge of the learner, after having seen a set of samples O ⊆ D, is given by

K_O(L) = Σ_{Z ∈ ~∇(~I(O))} |Z|.

The information content of any data sample D_i ∈ D, after the learner has seen a set of samples O ⊆ D, is given by

IC_O(D_i) = |∇(I(D_i))| if ∀D_j ∈ O [∇(I(D_i)) ≠ ∇(I(D_j))], and IC_O(D_i) = 0 otherwise;

and the information content of all the data samples together (the set version, denoted ~IC) is the total size of the zones of the unseen samples that are not already covered by the seen ones:

~IC_O(D) = Σ_{Z ∈ ~∇(~I(D−O)) − ~∇(~I(O))} |Z|.

It should be noted that the above definitions measure the information content of data samples relative to the state of the learner, and that they satisfy the limiting conditions K_∅(L) = 0 and ~IC_D(D) = 0.

The process of learning is essentially a process of transfer of information from the data samples to the learner, resulting in a change in the state of the learner. When these changes are infinitesimal, spanning many steps, the transformation process satisfies the condition that the line integral

L = ∫_{O=∅}^{O=D} E dO,    (2.1)

where E = ~IC_O(D) − K_O(L), has a stationary value. This is Hamilton's principle, which states that out of all the possible paths by which the learner could move from K_∅(L) to K_D(L), it will actually travel along the path for which the value of the line integral (2.1) is stationary. The phrase "stationary value" for a line integral typically means that the integral along the given path has the same value, to within first-order infinitesimals, as that along all neighboring paths (Goldstein, 1980; McCauley, 1997).

We can summarize this by saying that the process of learning is such that the variation of the line integral L is zero:

δL = δ∫_{O=∅}^{O=D} E dO = 0.

Thus we can formulate the following conservation theorem.

Theorem 1. The sum K_O(L) + ~IC_O(D) is conserved for all O ⊆ D.

Proof. We shall prove this by establishing that K_Oi(L) + ~IC_Oi(D) = K_Oj(L) + ~IC_Oj(D) for all O_i, O_j ⊆ D.

Consider O_1 ⊆ O_2 ⊆ D such that |O_2| − |O_1| = 1. Let O_2 − O_1 = {D_i}. To calculate the information content value of D_i, we need to consider two cases.

Case 1: ∇(I(D_i)) = ∇(I(D_j)) for some D_j ∈ O_1. In this case ~∇(~I(O_2)) = ~∇(~I(O_1)), and hence IC_O1(D_i) = IC_O2(D_i) = 0. Then

K_O2(L) = Σ_{Z ∈ ~∇(~I(O_2))} |Z| = Σ_{Z ∈ ~∇(~I(O_1))} |Z| = K_O1(L),

~IC_O2(D) = ~IC_O1(D) − IC_O1(D_i) = ~IC_O1(D),

and therefore K_O2(L) + ~IC_O2(D) = K_O1(L) + ~IC_O1(D).

Case 2: ∇(I(D_i)) ≠ ∇(I(D_j)) for all D_j ∈ O_1. In this case IC_O1(D_i) = |∇(I(D_i))|. Then

~IC_O2(D) = ~IC_O1(D) − IC_O1(D_i) = ~IC_O1(D) − |∇(I(D_i))|,

K_O2(L) = Σ_{Z ∈ ~∇(~I(O_2))} |Z| = Σ_{Z ∈ ~∇(~I(O_1 ∪ {D_i}))} |Z| = Σ_{Z ∈ ~∇(~I(O_1))} |Z| + |∇(I(D_i))| = K_O1(L) + |∇(I(D_i))|,

and therefore

K_O2(L) + ~IC_O2(D) = K_O1(L) + |∇(I(D_i))| + ~IC_O1(D) − |∇(I(D_i))| = K_O1(L) + ~IC_O1(D).


Thus whenever O_1 ⊆ O_2 and |O_2| − |O_1| = 1, it holds that K_O2(L) + ~IC_O2(D) = K_O1(L) + ~IC_O1(D).

Now consider two sets O_i ⊆ O_j ⊆ D such that |O_j| − |O_i| = l, l > 1. Let O_j − O_i = {D_j1, ..., D_jl}. We can construct sets P_1, ..., P_{l−1} such that

P_1 = O_i ∪ {D_j1}, ..., P_{l−1} = O_i ∪ {D_j1, ..., D_j(l−1)}.

Then it holds that

|P_1| − |O_i| = |P_2| − |P_1| = ... = |O_j| − |P_{l−1}| = 1.

Since we have proved the claim for any two sets differing by a single sample, it follows that

K_Oi(L) + ~IC_Oi(D) = K_P1(L) + ~IC_P1(D),
K_P1(L) + ~IC_P1(D) = K_P2(L) + ~IC_P2(D),
...
K_P(l−1)(L) + ~IC_P(l−1)(D) = K_Oj(L) + ~IC_Oj(D),

and thereby K_Oi(L) + ~IC_Oi(D) = K_Oj(L) + ~IC_Oj(D). For two arbitrary subsets O_i, O_j ⊆ D that are not nested, the same chain argument applied against ∅ shows that each sum equals K_∅(L) + ~IC_∅(D).

Hence the sum K_O(L) + ~IC_O(D) is conserved for all O ⊆ D.

An important consequence of this theorem is that irrespective of the order of the individual samples that L chooses to learn from, the gain in its knowledge is always equal to the corresponding loss in the information content of the data samples:

K_Oj(L) − K_Oi(L) = ~IC_Oi(D) − ~IC_Oj(D).

We now define two metrics, quality and quantity, for the data samples, to capture the notions of necessity and sufficiency.

The metric quality measures the relative information strength of the individual samples, and is defined as

quality(D) = (|~∇(~I(D))| / |D|) × 100%.

Ideally a data sample set should have this value equal to 100%. Smaller values indicate the presence of unnecessary samples that do not contribute to learning.

Similarly, we define the quantity of the data samples as

quantity(D) = (|~∇(~I(D))| / |Z(S)|) × 100%.

This is a sufficiency measure, and hence a value less than 100% indicates that the data samples are insufficient to complete the learning.
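Continuing the hypothetical setting above: D = {D_1, D_2, D_3} covers the zones ~∇(~I(D)) = {Z_1, Z_2}, so quality(D) = (2/3) × 100% ≈ 67% (D_2 is redundant, since D_1 already covers Z_1), while quantity(D) = (2/2) × 100% = 100% (every zone of Z(S) is represented, so the learning can be completed from D).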

Theorem 2. The knowledge of the learner, after completing the learning over data samples D having quantity(D) = 100%, equals the volume of the configuration space |C(S)|.

Proof. When quantity(D) = 100%, |~∇(~I(D))| = |Z(S)|. Since ~∇(~I(D)) ⊆ Z(S),

|~∇(~I(D))| = |Z(S)| ⇒ ~∇(~I(D)) = Z(S).

From Theorem 1 we have

K_D(L) + ~IC_D(D) = K_∅(L) + ~IC_∅(D).

Since K_∅(L) = 0 and ~IC_D(D) = 0,

K_D(L) = ~IC_∅(D) = Σ_{Z ∈ ~∇(~I(D−∅))} |Z| = Σ_{Z ∈ ~∇(~I(D))} |Z| = Σ_{Z ∈ Z(S)} |Z| = |C(S)|.

Thus when quantity(D) = 100%, K_D(L) = |C(S)|.

Theorem 3. The target concept cannot be learnt with fewer than |Z(S)| data samples.

Proof. Consider a data sample set D = {D_1, ..., D_d} having quality(D) = 100% and |D| < |Z(S)|. Then |~∇(~I(D))| = |D| < |Z(S)|, so the set P = Z(S) − ~∇(~I(D)) = {P_1, ..., P_l}, l ≥ 1, is non-empty. Assume that L has learned the target concept completely from D. Then, by Theorem 2, K_D(L) = |C(S)|, and by Theorem 1,

K_D(L) + ~IC_D(D) = K_∅(L) + ~IC_∅(D).

Since K_∅(L) = 0 and ~IC_D(D) = 0, this leads to

|C(S)| = ~IC_∅(D) = Σ_{Z ∈ ~∇(~I(D))} |Z| = Σ_{Z ∈ (Z(S) − P)} |Z| = Σ_{Z ∈ Z(S)} |Z| − Σ_{Z ∈ P} |Z| = |C(S)| − Σ_{Z ∈ P} |Z|.

This is not possible unless P = ∅, in which case Z(S) = ~∇(~I(D)) and |D| ≥ |Z(S)|. Hence proved.


2.2. A Learning Mechanism Based on Data-dependencies

Consider a task instance T_1 with end points (~s_1, ~s_2) in the configuration space. If we express T_1 as a point function f, then we can write ~s_2 = f(~s_1). An algorithm instance A_1 that solves T_1 would typically implement the functionality of f, thereby representing a path between ~s_1 and ~s_2. If there exists more than one way to implement f, then there exists more than one path between ~s_1 and ~s_2. Such a set of paths may be denoted by f(~s_1, α), with f(~s_1, 0) representing some arbitrary path chosen as the reference path. Further, if we select some function η(~x) that vanishes at ~x = ~s_1 and ~x = ~s_2, then a possible set of varied paths is given by

f(~x, α) = f(~x, 0) + α η(~x).

It should be noted that all these varied paths terminate at the same end points; that is, f(~x, α) = f(~x, 0) at ~x = ~s_1 and ~x = ~s_2 for all values of α.

However, when we try to represent another task instance T_2 with these variations, we need to make them less constrained. The tasks T_1 and T_2 would not have the same end points in the configuration space, and hence there would be a variation in the coordinates at those points. We can, however, continue to use the same parameterization as in the case of a single task instance, and represent the family of possible varied paths by

f_i(~x, α) = f_i(~x, 0) + α η_i(~x),

where α is an infinitesimal parameter that goes to zero for some assumed reference path. Here the functions η_i do not necessarily have to vanish at the end points, either for the reference path or for the varied paths.

Upon close inspection, one can see that the variation in this family of paths is composed of two parts:

1. Variations within a task instance, due to different algorithmic implementations.

2. Variations across task instances, due to different initial system point configurations.

The learner can overcome the first type of variation by observing that the end points, and their corresponding zones, are invariant to the paths between them. In this regard, all the system points that belong to the same initial and final zone pair can be learned from a single algorithm instance. For the second type of variation, however, the learner may not be able to overcome it without prior knowledge of the task. All the different instances of the task have different corresponding zones for their end points, and hence these need to be remembered as they are.

Figure 2. Schematic illustration of path variations across task instances in the configuration space.

Below we present a mechanism that uses these concepts of variations in the configuration paths to infer solutions to unknown problem instances based on the solutions to known problem instances. This results in learning-like behavior, where the known problem-solution configuration pairs form the training set samples. Such samples can be collected by the following procedure.

1. For the given problem, identify the appropriate configuration space and its number of dimensions.

2. Express the solution as a logical relation R_s in terms of the coordinates of the configuration space.

3. Use R_s to identify an appropriate characteristic function F_s that forms a solution zone.

4. Use F_s in deciding the characteristic functions for the other zones and the number of dimensions of the zone-space.

5. Define the operator ∇ to map the system points from the configuration space to the zones in the zone-space.

6. Define an appropriate variation operator δ in the configuration space such that variations in the known problem configurations give a clue to the variations in the solution configurations, e.g. solution(x + δx) = solution(x) + δx.

7. Construct the sample problem-solution configuration pairs by using any traditional algorithm. The samples should be such that all zones are represented.

Once we have all the required data samples, the training procedure is simple and straightforward: all that is needed is to mark each of the sample problem configurations as the reference configuration for the corresponding zone and to remember the respective solution configurations for those references. We can use a memory lookup table to store these reference solutions. The procedure is as follows.

for each data sample D_i = (~p_i, ~s_i) {
    Z = ∇(~p_i);
    RefProbConfig[Z] = ~p_i;
    RefSolConfig[Z] = ~s_i;
}

Once the training is over, we can compute the solution configuration ~s for any given problem configuration ~p in the configuration space as follows.

1. For the given problem configuration ~p, apply the operator ∇ and find the zone Z = ∇(~p).

2. Get the reference problem configuration ~p_i = RefProbConfig[Z], and compute the variation δ(~p, ~p_i).

3. Compute the required solution configuration from the reference solution configuration by applying the variation parameter: ~s = RefSolConfig[Z] + δ(~p, ~p_i).
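The following C++ sketch (ours, and schematic: the types Config and Zone and the three task-specific operators are assumptions to be instantiated per task, as is done for sorting in Section 2.3) puts the training and inference procedures together.

#include <functional>
#include <map>
#include <vector>

using Config = std::vector<int>;  // a system point q1..qn
using Zone   = std::vector<int>;  // a point of the zone-space

struct ZoneLearner {
    // Task-specific operators supplied by the designer (steps 1-6 above).
    std::function<Zone(const Config&)> Nabla;                   // system point -> zone
    std::function<Config(const Config&, const Config&)> Delta;  // variation delta(p, pRef)
    std::function<Config(const Config&, const Config&)> Apply;  // combine sRef with a delta

    std::map<Zone, Config> RefProbConfig;  // reference problem per zone
    std::map<Zone, Config> RefSolConfig;   // reference solution per zone

    // Training: remember one reference (problem, solution) pair per zone.
    void Train(const Config& p, const Config& s) {
        Zone z = Nabla(p);
        RefProbConfig[z] = p;
        RefSolConfig[z]  = s;
    }

    // Inference: find the zone, then shift the stored reference solution
    // by the variation between p and the stored reference problem.
    Config Solve(const Config& p) const {
        Zone z = Nabla(p);
        Config d = Delta(p, RefProbConfig.at(z));
        return Apply(RefSolConfig.at(z), d);
    }
};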

2.3. An Example Problem

To explain how these concepts of variations in the configuration paths can be used in practice, we consider the example problem of sorting. We outline a procedure that implements sorting based on the concepts we have discussed so far.

The reason for choosing sorting over any other alternative is that the problem of sorting has been well studied and well understood, and requires no additional introduction. However, it should be noted that our interest here is to explain how zones can be constructed and used for the sorting problem, rather than to propose a new sorting algorithm, and hence we do not consider any performance comparisons. In fact, the procedure we outline below runs with O(n^2) time complexity while requiring O(2^{n^2}) memory, so any performance comparisons would be futile.

To start with, we can consider the task of sorting as being represented by its instances, such as {(3,5,4), (3,4,5)}, where the second element (3,4,5) represents the sorted result of the first element (3,5,4). We can consider these elements as points in a 3-dimensional space.

Thus in general, given an array of n integers to be sorted, we can form a system with n coordinate axes, resulting in an n-dimensional configuration space. If we assume that each element of the array can take a value in the range [0, N], where N is some maximum integer value, then there would be a total of N^n system points in the configuration space. That is, following our notation from Section 2.1, |C(S)| = N^n. To construct the corresponding zone-space for this configuration space, consider the following mathematical specification for sorting,

∀i ∈ {1, ..., n} ∀j ∈ {1, ..., n} [ i < j ⇒ a[i] < a[j] ],

where a is an array of n integers. This specification represents a group of conditions that need to be satisfied by the array if it is to be considered as being in sorted order. We can now use this specification to identify the following.

1. Number of dimensions of the zone-space: The specification involves the two quantifiers ∀i and ∀j, with the additional constraint i < j. Thus the valid values are i = 1, ..., n and j = i+1, ..., n, resulting in a group of n(n−1)/2 conditions to be accounted for. Each condition forms one coordinate axis in the zone-space, and hence we have n(n−1)/2 axes.

2. Range of each axis of the zone-space: Since each axis is formed out of the condition (a[i] < a[j]), with the various values of i, j representing the various axes, the range of each axis is defined by the number of possible outcomes (a[i] < a[j]), (a[i] = a[j]) and (a[i] > a[j]), which is three. Hence the range of each axis is r_i = 3. (For n = 3, for example, this gives 3 axes with 3 values each, i.e. 27 zones.)

3. Operator ∇: Each zone is a point in the zone-space with n(n−1)/2 coordinates. To find these coordinate values we need to evaluate the n(n−1)/2 conditions (one for each axis) as below.

for (int i = 0, r = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j, ++r)
        ZCoord[r] = (a[i] == a[j]) ? 0 : ((a[i] < a[j]) ? 1 : 2);

4. Variation operator δ: We implement the variation operator using the differences between the relative array positions of the numbers before and after sorting. We can use array indexing and de-indexing operations for this purpose. For example, sorting a = {3,5,4} produces ~a = {3,4,5}, which gives us a variation in the indices of the elements from (0,1,2) to (0,2,1). Thus we can use our variation operator to express ~a as ~a = {a[0], a[2], a[1]}.

Once we have these necessary operators, we can start assigning the reference (unsorted, sorted) configuration pairs for each zone by using any traditional sorting algorithm, such as heapsort or quicksort, as shown below.

for (int i = 0; i < nSamples; ++i) {
    GetZCoord(Unsorted[i], ZCoord);
    quicksort(Unsorted[i], Sorted[i]);
    SetRefConfig(ZCoord, Unsorted[i], Sorted[i]);
}

It should be noted that there are 3^{n(n−1)/2} zones in the zone-space, and hence we need that many sample (unsorted, sorted) pairs as well. However, once we complete the training with all those samples, we can use the following procedure to sort any of the N^n possible arrays.

void LSort(int nArray[], int nSize, int nSorted[]) {
    GetZCoord(nArray, ZCoord);
    GetRefConfig(ZCoord, RefProb, RefSol);
    for (int i = 0; i < nSize; ++i)
        nSorted[i] = nArray[RefSol[i]];
}
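For concreteness, the following self-contained sketch is our reconstruction of the whole pipeline under the stated assumptions (the names GetZone, Train, and LSort are ours; a std::map stands in for the lookup table, and std::stable_sort plays the role of the traditional sorting algorithm). Per zone it stores the sorting permutation, which is exactly the variation δ described in step 4 above.

#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

using Array = std::vector<int>;
using Zone  = std::vector<int>;  // n(n-1)/2 ternary coordinates

// The nabla operator: one ternary coordinate per pair (i, j), i < j.
Zone GetZone(const Array& a) {
    Zone z;
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = i + 1; j < a.size(); ++j)
            z.push_back(a[i] == a[j] ? 0 : (a[i] < a[j] ? 1 : 2));
    return z;
}

std::map<Zone, std::vector<int>> RefPerm;  // zone -> reference permutation (the variation)

// Training: remember, per zone, the permutation that sorts the sample.
void Train(const Array& sample) {
    std::vector<int> perm(sample.size());
    for (size_t i = 0; i < perm.size(); ++i) perm[i] = static_cast<int>(i);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](int x, int y) { return sample[x] < sample[y]; });
    RefPerm[GetZone(sample)] = perm;
}

// LSort: find the zone of the input and apply that zone's reference permutation.
Array LSort(const Array& a) {
    const std::vector<int>& perm = RefPerm.at(GetZone(a));
    Array sorted(a.size());
    for (size_t i = 0; i < a.size(); ++i) sorted[i] = a[perm[i]];
    return sorted;
}

int main() {
    Train({3, 5, 4});                       // one sample covers its whole zone
    Array r = LSort({10, 70, 30});          // a different array from the same zone
    for (int v : r) std::printf("%d ", v);  // prints: 10 30 70
    return 0;
}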

2.4. Limitations

Having presented the mechanism for learning based on the concepts of variations in the system configuration paths, we now discuss the limitations of this approach.

Disadvantages:

1. As can be easily understood, the concepts of configuration space and zone-space form the central theme of this approach. However, it may not always be possible to come up with an appropriate configuration space or zone-space for a given problem. In fact, for many tasks, such as face recognition, we do not readily have any clues for logical relations among the data attributes. This is one of the biggest limitations of this approach.

2. To present the learner with sample configurations, we assumed the existence of an algorithm that can solve the task at hand. However, this assumption may not hold at all times. Once again, face recognition is an example.

3. The memory requirements are too high. We have already seen that we need 3^{n(n−1)/2} samples to correctly learn the sorting task.

However, given the goal of mimicking a human being and the scope of the abstract concepts the agents have to learn from human beings, and given the virtually unlimited number of problem instances that can be solved by this learning mechanism, the memory requirements should not become a problem at all (note that the memory requirements do not depend on N but on n, so there is no upper limit on the number of problem instances that can be solved correctly). Further advantages are as follows.

Advantages:

1. Independence of the order of the training data samples. In this method, the learner is invariant to the order in which it receives the data samples. All that the learner does with a data sample is compute the corresponding zone and mark the sample as a reference for that zone. This process is clearly independent of the order of the data samples and hence gives the same results in all circumstances. It should be noted that traditional learning algorithms do not guarantee any such invariance.

2. Additional samples do not create any bias. If there exists more than one sample per zone, the characteristic functions of the zones guarantee that they all produce the same results as the first sample. Hence the training is not biased by the presence of additional samples. Further, a sample can be repeated as many times as one wants without affecting the training results. This is useful in situations where a robot is learning from the real world, where some typical observations (such as changing traffic lights or the flow of vehicles) get repeated frequently compared with some rare observations (such as earthquakes or accidents). Traditional learning algorithms fail to provide unbiased results in such situations.

3. Non-probabilistic metrics and accurate results. To meet the demands of artificial systems, the metrics we have devised are completely deterministic and void of any probabilistic assumptions, and thus can be adapted to any suitable system.


4. Expandable to multi-task learning. Though we have concentrated on learning a single task in this discussion, there is nothing in this method that prevents the learner from learning more than one task at the same time. For example, once an agent learns to sort in ascending order (SASC), it can further learn to sort in descending order (SDSC) simply by computing the new variation operator δ_SDSC directly from δ_SASC, instead of from new sample problem-solution configuration pairs (see the sketch after this list). This saves the training time and cost for SDSC. However, to implement this feature the agent should be informed of the relation between the tasks a priori. Smart agents that can automatically recognize the relation among tasks based on their configuration spaces would be an interesting option to explore further in this direction.

5. Knowledge transfer. All the knowledge of the learner is represented in terms of reference configurations for the individual zones. Any learner who has access to these reference configurations can perform as well as the owner of the knowledge itself, without the need to go through all the training again. This could lead to the concept of tradable knowledge resources for agents.

6. Perfect partial learning. Just as additional samples do not create bias, a lack of samples does not create problems for learning. A training set with quantity less than 100% still gives correct results as long as the problem instance at hand is from one of the learnt zones. That is, whatever the agent learns, it learns perfectly. This feature comes in handy for implementing low-cost bootstrapping robots with reduced features and functionality, which can be used as "data sample suppliers" for other full-blown implementations. This concept of bootstrapping robots is one of the fundamental concepts of artificial life studies, in that it might invoke the possibility of self-replicating robots (Freitas & Gilbreath, 1980; Freitas & Merkle, 2004).
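As a small illustration of point 4 (our sketch, under the assumption that the variation is stored as a permutation, as in Section 2.3): for arrays without repeated elements, the descending-order permutation of a zone is just the reverse of its ascending-order permutation, so δ_SDSC can be derived from δ_SASC without any new training samples.

#include <vector>

// Derive the descending-order reference permutation for a zone from the
// ascending-order one learned earlier: reversing an ascending arrangement
// yields the descending arrangement.
std::vector<int> DescendingPerm(const std::vector<int>& ascPerm) {
    return std::vector<int>(ascPerm.rbegin(), ascPerm.rend());
}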

3. Conclusions

The notion of data-independence for an algorithm speaks for constant execution paths across all its instances. A variation in the execution path is generally attributable to variations in the nature of the data. When a problem and its solution are viewed as points in a system configuration, variations in the problem configurations can be used to study the variations in the solution configurations and vice versa. These variations can be used to infer solutions to unknown instances of problems based on the solutions to the known instances.

This paper analyzed the problem of data-dependencies in the learning process and presented a learning mechanism based on the relations among data attributes. The mechanism constructs a Cartesian hyperspace, namely the configuration space, for a given task, and finds a set of paths from the initial configurations to the final configurations that represent the different instances of the task. As part of the learning process the learner gradually gains information from the data samples one by one, until all data samples have been processed. Once such a transfer of information is complete, the learner can solve any instance of the task without any restrictions.

The mechanism presented is independent of the order of the data samples and has the flexibility to be expandable to multi-task learning. However, the practicality of this approach may be hindered by the lack of appropriate algorithms that can provide sample instances. Further study to eliminate such bottlenecks could make this a perfect choice for implementing learning behavior in artificial agents.

References

Angluin, D. (1992). Computational learning theory: Survey and selected bibliography. Proceedings of the Twenty-fourth Annual ACM Symposium on Theory of Computing (pp. 351-369). New York: ACM Press.

Balmer, M., Cetin, N., Nagel, K., & Raney, B. (2004). Towards truly agent-based traffic and mobility simulations. AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 60-67). Washington, DC, USA: IEEE Computer Society.

Brooks, R. A. (1991). Intelligence without reason. Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91) (pp. 569-595). San Mateo, CA, USA: Morgan Kaufmann Publishers Inc.

Brugali, D., & Sycara, K. (2000). Towards agent oriented application frameworks. ACM Computing Surveys, 32, 21-27.

Bryson, J. J. (2003). Action selection and individuation in agent based modelling. Proceedings of Agent 2003: Challenges of Social Simulation.

Cliff, D., & Grand, S. (1999). The Creatures global digital ecosystem. Artificial Life, 5, 77-93.

Collins, J. C. (2001). On the compatibility between physics and intelligent organisms (Technical Report DESY 01-013). Deutsches Elektronen-Synchrotron DESY, Hamburg.

Decugis, V., & Ferber, J. (1998). Action selection in an autonomous agent with a hierarchical distributed reactive planning architecture. AGENTS '98: Proceedings of the Second International Conference on Autonomous Agents (pp. 354-361). New York, NY, USA: ACM Press.

Franklin, S. (2005). A "consciousness" based architecture for a functioning mind. In D. N. Davis (Ed.), Visions of Mind, chapter 8. IDEA Group Inc.

Freitas, R. A., & Gilbreath, W. P. (Eds.). (1980). Advanced automation for space missions. Proceedings of the 1980 NASA/ASEE Summer Study. National Aeronautics and Space Administration and the American Society for Engineering Education. Santa Clara, California: NASA Conference Publication 2255.

Freitas, R. A., & Merkle, R. C. (2004). Kinematic self-replicating machines. Georgetown, TX: Landes Bioscience.

Goldstein, H. (1980). Classical mechanics. Addison-Wesley Series in Physics. London: Addison-Wesley.

Kamareddine, F., Monin, F., & Ayala-Rincón, M. (2002). On automating the extraction of programs from proofs using product types. Electronic Notes in Theoretical Computer Science, 67, 1-21.

Katsuhiko, T., Takahiro, K., & Yasuyoshi, I. (2002). Translating multi-agent autoepistemic logic into logic program. Electronic Notes in Theoretical Computer Science, 70, 1-18.

Kearns, M. J. (1990). The computational complexity of machine learning. ACM Distinguished Dissertation. Massachusetts: MIT Press.

Lau, T., Domingos, P., & Weld, D. S. (2003). Learning programs from traces using version space algebra. K-CAP '03: Proceedings of the International Conference on Knowledge Capture (pp. 36-43). New York, USA: ACM Press.

Laue, T., & Röfer, T. (2004). A behavior architecture for autonomous mobile robots based on potential fields. RoboCup 2004. Springer-Verlag.

Littlestone, N. (1987). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2.

Lopez, R., & Armengol, E. (1998). Machine learning from examples: Inductive and lazy methods. Data & Knowledge Engineering, 25, 99-123.

Maes, P. (1989). How to do the right thing. Connection Science Journal, 1.

McCauley, J. (1997). Classical mechanics. Cambridge University Press.

Moses, Y. (1992). Knowledge and communication: A tutorial. TARK '92: Proceedings of the 4th Conference on Theoretical Aspects of Reasoning about Knowledge (pp. 1-14). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Raedt, L. D. (1997). Logical settings for concept-learning. Artificial Intelligence, 95, 187-201.

Ramamurthy, U., Franklin, S., & Negatu, A. (1998). Learning concepts in software agents. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior. Cambridge: MIT Press.

Ray, T. S. (1991). An approach to the synthesis of life. In Artificial Life II. New York: Addison-Wesley.

Ray, T. S. (1994). Evolution, complexity, entropy and artificial reality. Physica D, 75, 239-263.

Reynolds, C. W. (1987). Flocks, herds, and schools: A distributed behavioral model. Computer Graphics, 21, 25-34.

Schmidhuber, J. (2000). Algorithmic theories of everything (Technical Report IDSIA-20-00, Version 2.0). Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Manno-Lugano, Switzerland.

Wilson, S. W. (1994). ZCS: A zeroth level classifier system. Evolutionary Computation, 2, 1-18.

Zurek, W. H. (1989). Algorithmic randomness and physical entropy. Physical Review A, 40, 4731-4751.