
Knowledge-based simulation of genetic regulation in bacteriophage lambda


Abstract

We have developed a general-purpose computer program for the functional simulation of regulatory genetics. This simulator is knowledge-based and was developed using the Unit System, a software tool for the acquisition, representation, and manipulation of hierarchically organized knowledge. The advantages of a knowledge-based design are presented, and the simulator's architecture is described. Its performance on the decision between lytic and lysogenic growth in Bacteriophage Lambda is reported.
Nucleic Acids Research, Volume 12, Number 1, 1984
Scott Meyers* and Peter Friedland
MOLGEN Project, Department of Computer Science, Stanford University, Stanford, CA 94305, USA
Received 12 August 1983
INTRODUCTION
The development of flexible, easy-to-use simulation programs can provide the experimental biologist with powerful tools for elucidating the functioning of complex natural systems. Simulators serve two major purposes: the verification of scientific theories and the prediction of experimental results. The verification function is called upon when existing theories are being extended, or new theories generated, to explain experimental data; the predictive capabilities are used to anticipate laboratory results and so eliminate a great deal of experimental effort.
An especially important role for a simulation program would be as part of a larger system employing artificial intelligence techniques to develop models of a biological system from experimental observations. Such a program would accept observations of a system as input and produce as output a model of the system that could account for those observations. The simulation portion of such a program would be a crucial tool for ensuring that only theories consistent with the data were developed. A major research goal of the MOLGEN project is to explore methods for building a system that performs this kind of automatic theory formation.
Traditional Approaches
The most widespread approach to genetic simulation is based on developing large systems of differential equations that predict protein and nucleic acid levels as a function of time. An example of this approach applied to Bacteriophage Lambda can be found in [1]. This methodology yields quantitative predictions, but it suffers from the drawback that no functional information is provided: the predicted levels do not by themselves explain why the system behaves as it does. It is also restricted to systems for which a great number of kinetic parameters are accurately known, which is frequently not the case.

© IRL Press Limited, Oxford, England.
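To make the parameter dependence concrete, the following minimal Python sketch integrates a single, hypothetical rate equation for one protein, dp/dt = k_syn * g(t) - k_deg * p, using explicit Euler steps. It is not the model of [1]; the equation form, the function name `simulate_protein`, and the rate constants are illustrative assumptions. Even this one-protein toy cannot be run until numerical values for the synthesis and degradation rates are supplied.

```python
def simulate_protein(k_syn, k_deg, gene_on, t_end, dt=0.01):
    """Integrate dp/dt = k_syn * g(t) - k_deg * p by explicit Euler steps.

    k_syn, k_deg: hypothetical synthesis and degradation rate constants.
    gene_on: function of time returning True while the gene is transcribed.
    """
    p = 0.0  # protein level, starting from none
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        synthesis = k_syn if gene_on(t) else 0.0
        p += dt * (synthesis - k_deg * p)
        t += dt
    return p


# With the gene always on, p approaches the steady state k_syn / k_deg = 4.0.
level = simulate_protein(k_syn=2.0, k_deg=0.5, gene_on=lambda t: True, t_end=40.0)
print(round(level, 3))
```

Changing either rate constant changes the answer, which is the dependence on kinetic data that the paragraph above describes.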
Another approach to genetic simulation models "genetic control circuits" as sets of logic equations and predicts sequences of genetic events by searching for stable states in a matrix derived from those equations [2]. While this approach results in functional predictions, much of the information inherent in the system is lost in the abstraction of representing biochemical entities as boolean variables. As a result, it is difficult to explain why a system behaves in a certain way without describing how the logic equations were constructed.
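A minimal Python sketch of the logic-equation style, using a hypothetical two-variable circuit in which the cI and cro repressors each switch the other off. It finds stable states by exhaustively checking which boolean states a synchronous update maps onto themselves; this is a simplification for illustration, not the matrix method of [2]. The result names two stable states but, as the paragraph notes, carries no record of why either one is stable.

```python
from itertools import product


def step(state):
    """Hypothetical logic equations: each gene is ON next step only if the other is OFF now."""
    ci, cro = state
    return (not cro, not ci)


def stable_states():
    """Return every (ci, cro) boolean state that the synchronous update leaves unchanged."""
    return [s for s in product([False, True], repeat=2) if step(s) == s]


print(stable_states())  # [(False, True), (True, False)]
```

The states (False, False) and (True, True) are not stable; under this synchronous update they alternate with each other.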
Knowledge-Based Approach
Our work has applied a knowledge-based approach to the problem of biological simulation. The distinction between a system founded on a database and one founded on a knowledge base is somewhat fuzzy, but the following list points out some of the basic differences.
* Content: knowledge bases contain both factual and procedural information, while databases tend to contain only factual information. Furthermore, the factual information in knowledge bases ranges from simple symbols (numbers, strings, and the like) to much more complicated forms (DNA, restriction maps, electron density maps, etc.). An important type of information that is common in knowledge bases but rare in databases is meta-knowledge: knowledge about how to use other knowledge. An example of such meta-knowledge is a set of heuristics for determining how relevant and trustworthy other knowledge is under a variety of circumstances.

* Structure: databases most often have a rigid, well-defined format that can be expanded or compressed, but cannot fundamentally be changed. Most references in databases are explicit. Knowledge bases, in contrast, have a very fluid, dynamic structure, and many references are contained implicitly.

* Complexity: the information in databases often lends itself to rather straightforward analysis, while the information in knowledge bases can be used as a basis for "reasoning" about poorly defined problems. For example, databases have been used to find statistical correlations between symptoms and diseases [3], while knowledge bases have been used in the development of programs that perform automatic diagnosis of human illnesses [4, 5].
Our work has shown that a knowledge-based approach can have significant advantages over the alternative methodologies mentioned above. Unlike a simulator based on differential equations, our simulator does not depend on detailed kinetic data; and unlike a simulator based on logic equations, it retains all the information inherent in the system. A knowledge-based system can take advantage of all data about a system, both declarative and procedural. This allows our simulator to predict not only what happens, but also why it happens.
THE SIMULATOR
The simulator was developed using the Unit System [6, 7], a system for the acquisition, representation, and manipulation of hierarchically organized knowledge. It was chosen for this project because it has a well-developed user interface customized for the realm of molecular genetics, and because it contains a Rules Language: an English-like language that allows specialists unfamiliar with computers to describe procedural knowledge directly to the knowledge base.
The simulator knowledge base contains two kinds of information. The first is general information about molecular genetics (e.g., the concepts of genes, proteins, transcription, etc.); the second is specific information about the particular biological system under study, in this case Bacteriophage Lambda.
The general knowledge consists of rules in the Rules Language; it is the program that makes the simulator run. These rules are used to determine whether genes are being transcribed, whether proteins are present, and so on.
Two aspects of these rules are noteworthy. The first is that they are completely general to regulatory genetics; they are not limited to a simulation of Bacteriophage Lambda. The simulator can determine the state of any set of genes and proteins that are described in the system-specific portion of the knowledge base. The second is that the rules are designed for geneticists rather than for computer programmers: it is relatively easy for a scientist familiar with molecular genetics and the general design of the knowledge base to determine what each statement in the Rules Language accomplishes. The user does not need to understand any of the internal details of knowledge representation and manipulation within the Unit System to "program" in the Rules Language.
In addition, the language postpones the determination of context until the rules are actually executed. This means that the user can describe procedural information about objects and properties that do not yet exist; the rule interpreter heuristically "binds" all variables to plausible items in the relevant units being considered. Users can freely employ words like "the," "to," etc. to make statements more readable.
For example, the rules used to determine the current status of a protein are shown below; each rule is followed by a comment explaining it.
    IF BEING-TRANSCRIBED@GENE IS FALSE
    THEN SET NEWLIFE TO -1
    ELSE SET NEWLIFE TO LIFESPAN
This rule determines a tentative value for the change in the time that the protein of interest will remain active, and stores that value in the variable called NEWLIFE. This value is analogous to the protein's change in concentration, for the simulator equates effective concentration with a positive remaining-time. If the protein is not being produced (that is, if the corresponding gene is not being transcribed), the protein's remaining-time (concentration) decays by one unit. Otherwise, the protein's remaining-time is set to the maximum possible, its lifespan.

The "@" notation allows the user to explicitly specify the context of a variable; in this case, BEING-TRANSCRIBED relates to GENEs.
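The following Python sketch illustrates execution-time binding in a much simplified form. It assumes, hypothetically, that the units relevant to a rule are collected into a dictionary keyed by unit type; a reference of the form SLOT@TYPE is looked up in the named unit, and an unqualified reference is bound to the first relevant unit that has such a slot. The interpreter's actual heuristics are not described in this paper, so the search order and the example slot values here are assumptions.

```python
def resolve(reference, context):
    """Bind a Rules-Language-style reference such as 'LIFESPAN' or
    'BEING-TRANSCRIBED@GENE' to a value at execution time (simplified sketch).

    context maps a unit type (e.g. 'PROTEIN', 'GENE') to that unit's slots.
    """
    slot, _, unit_type = reference.partition("@")
    if unit_type:
        # Explicit context: look only in the named unit.
        return context[unit_type][slot]
    # Implicit context: take the first relevant unit that has the slot.
    for slots in context.values():
        if slot in slots:
            return slots[slot]
    raise KeyError(f"cannot bind {reference!r}")


# Hypothetical units under consideration while updating one protein.
context = {
    "PROTEIN": {"LIFESPAN": 3.0, "REMAINING-TIME": 3.0},
    "GENE": {"BEING-TRANSCRIBED": "FALSE"},
}
print(resolve("BEING-TRANSCRIBED@GENE", context), resolve("LIFESPAN", context))
```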
    SET MODULATOR-EFFICIENCY TO 0
    FOR EACH ROW IN THE MODULATORS TABLE
        IF REMAINING-TIME@MODULATING-PROTEIN >= 1
        THEN SET MODULATOR-EFFICIENCY TO ACTION-COEFFICIENT
        OTHERWISE SET MODULATOR-EFFICIENCY TO ACTION-COEFF TIMES REMAINING-TIME@MODULATING-PROTEIN
Some proteins are affected by other proteins. These modulating proteins are kept in a table for each affected protein. This rule, together with the statement before it, looks at the relative strengths and concentrations of each of the modulators and uses them to determine an overall value for the variable MODULATOR-EFFICIENCY, a measure of how the modulating proteins affect the remaining-time of the protein in question.
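A minimal Python rendering of the MODULATORS rule, written as a sketch rather than the simulator's own code; the class and field names are illustrative. As printed, the rule SETs MODULATOR-EFFICIENCY on each row. This sketch assumes instead that contributions from several modulators are summed into the "overall value" the text describes, which gives the same result whenever a protein has a single modulator. It also assumes that a negative action coefficient represents a modulator that shortens the affected protein's remaining-time.

```python
from dataclasses import dataclass


@dataclass
class ModulatorRow:
    """One row of a protein's MODULATORS table (names are illustrative)."""
    remaining_time: float      # REMAINING-TIME@MODULATING-PROTEIN
    action_coefficient: float  # ACTION-COEFFICIENT; negative values speed decay (assumed)


def modulator_efficiency(rows):
    """Overall MODULATOR-EFFICIENCY; summing over rows is an assumption."""
    total = 0.0
    for row in rows:
        if row.remaining_time >= 1:
            # A modulator that will persist at least one more cycle acts at full strength.
            total += row.action_coefficient
        else:
            # A modulator that is nearly gone acts in proportion to its remaining time.
            total += row.action_coefficient * row.remaining_time
    return total
```

For example, `modulator_efficiency([ModulatorRow(3.0, -0.4)])` evaluates to -0.4, while a modulator with only 0.5 units of remaining time contributes half of its coefficient.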
    SET NEWLIFE TO NEWLIFE + MODULATOR-EFFICIENCY
    SET REMAINING-TIME TO REMAINING-TIME + NEWLIFE
The two preceding statements use the values of NEWLIFE and MODULATOR-EFFICIENCY to determine the amount of time that the protein in question will remain active.
    IF REMAINING-TIME > LIFESPAN THEN SET REMAINING-TIME TO LIFESPAN
This statement ensures that no protein ever exceeds its maximum possible concentration.
    IF REMAINING-TIME > 0
    THEN SET PRESENT TO TRUE
    ELSE SET REMAINING-TIME TO 0 AND SET PRESENT TO FALSE
A protein is present if it will still be active for some time greater than zero.
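Taken together, the statements above update one protein per simulation cycle. The Python function below is a minimal sketch of that sequence, assuming the modulator contribution has already been computed; the names `ProteinState` and `update_protein` are illustrative and not part of the Unit System. As a check, it reproduces the cII remaining-times in the example transcript later in this paper (3.0 after the first iteration, 1.6 after the second), using a cII lifespan of 3.0 and an hfl action coefficient of -0.4. That coefficient is an assumption inferred from the transcript, not a value stated by the authors.

```python
from dataclasses import dataclass


@dataclass
class ProteinState:
    lifespan: float              # maximum remaining-time (akin to a half-life)
    remaining_time: float = 0.0  # stands in for effective concentration
    present: bool = False


def update_protein(protein, gene_being_transcribed, modulator_efficiency):
    """Apply one cycle of the protein-status rules to `protein` in place."""
    # IF BEING-TRANSCRIBED@GENE IS FALSE THEN NEWLIFE = -1 ELSE NEWLIFE = LIFESPAN
    new_life = protein.lifespan if gene_being_transcribed else -1.0
    # NEWLIFE = NEWLIFE + MODULATOR-EFFICIENCY; REMAINING-TIME += NEWLIFE
    new_life += modulator_efficiency
    protein.remaining_time += new_life
    # Never exceed the maximum possible concentration.
    if protein.remaining_time > protein.lifespan:
        protein.remaining_time = protein.lifespan
    # Present only while some remaining-time is left; otherwise clamp to 0.
    if protein.remaining_time > 0:
        protein.present = True
    else:
        protein.remaining_time = 0.0
        protein.present = False
    return protein


cii = ProteinState(lifespan=3.0)
update_protein(cii, gene_being_transcribed=True, modulator_efficiency=0.0)   # iteration 1
update_protein(cii, gene_being_transcribed=False, modulator_efficiency=-0.4)  # iteration 2
print(round(cii.remaining_time, 2), cii.present)
```

In the first iteration the modulator term is taken as 0, on the assumption that hfl protein had not yet accumulated when cII was first produced.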
The information in the knowledge base that is specific to the biological system under study is divided into three classes: genes, proteins, and special DNA control loci (promoters, operators, terminators, and nut sites). Each of these classes is defined as a prototypical unit within the knowledge base, with slots to contain the properties used by the simulator. These slots may hold either static facts associated with a particular object, such as the lifespan (akin to a half-life) or binding affinity of a protein, or dynamic properties modified during the simulation, such as whether a gene is currently being transcribed or which protein, if any, is attached to a special DNA locus.
The process of describing a genetic system to be simulated consists of creating the appropriate children of these units and filling in all of the slots for which information is known.
An example of one of the prototypical units (GENES) is shown below. Each slot occupies one line in the listing, and for each slot three pieces of information are given: its name, its datatype, and its value. Datatypes describe what kind of information is stored in the slot: for example, a description (for the DESCR slot), a string (for the ORGANISM slot), or a list of other units (for the PROMOTERS slot). For a prototypical unit, a slot value may be either an actual instance of the datatype (for the CREATOR slot) or a restriction on possible values for an instance of the datatype (a choice of "LAMBDA" or "E-COLI" for the unit's ORGANISM, for example). As in the prior listing, comments follow the slots they describe.
    Name                     Datatype    Value
    DESCR:                   <DESCR>     This node is the root of all genes.
    CREATOR:                 <CREATOR>   "CSD.MEYERS"

The above two slots (as well as several others not shown) contain "bookkeeping" information associated with the unit.

    ORGANISM:                <STRING>    One of: ["LAMBDA" "E-COLI"]
    PROMOTERS:               <LIST>      UNITs:

There may be several promoters associated with a gene. For a specific gene, this slot will be filled with a list of the names of one or more specific promoter units.

    INTERVENING-TERMINATOR:  <UNIT>

If there is a terminator locus between a gene's promoter and its coding region, the name of that locus will appear here.

    NUT-SITE:                <UNIT>

If a nut site occurs within the coding region of a gene, the name of that site will be entered here.

    MUTATED:                 <STRING>    One of: ["TRUE" "FALSE"]
    BEING-TRANSCRIBED:       <STRING>    One of: ["TRUE" "FALSE"]

The above two properties are used by the simulator to indicate the current state of the gene.
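The prototype-and-children scheme can be sketched in a few lines of Python. This is not the Unit System's interface: the classes `Unit` and `OneOf`, the helper `make_child`, and the slot spellings with underscores (Python keyword arguments cannot contain hyphens, so BEING-TRANSCRIBED becomes `BEING_TRANSCRIBED`) are assumptions for illustration. A child unit inherits any slot it does not fill from its prototype, and a value supplied for a slot with a "One of" restriction is checked against the allowed choices. The promoter name "P-R" given for cro is likewise illustrative.

```python
class OneOf:
    """A prototype slot restriction such as One of: ["LAMBDA" "E-COLI"]."""

    def __init__(self, *choices):
        self.choices = choices


class Unit:
    """A minimal frame: named slots, with lookups falling back to the parent unit."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        unit = self
        while unit is not None:
            if slot in unit.slots:
                return unit.slots[slot]
            unit = unit.parent
        raise KeyError(slot)


def make_child(prototype, name, **values):
    """Create a specific unit, checking each value against the prototype's restriction."""
    for slot, value in values.items():
        restriction = prototype.get(slot)  # KeyError if the prototype has no such slot
        if isinstance(restriction, OneOf) and value not in restriction.choices:
            raise ValueError(f"{slot} must be one of {restriction.choices}, not {value!r}")
    return Unit(name, parent=prototype, **values)


GENES = Unit(
    "GENES",
    DESCR="This node is the root of all genes.",
    CREATOR="CSD.MEYERS",
    ORGANISM=OneOf("LAMBDA", "E-COLI"),
    PROMOTERS=[],
    INTERVENING_TERMINATOR=None,
    NUT_SITE=None,
    MUTATED=OneOf("TRUE", "FALSE"),
    BEING_TRANSCRIBED=OneOf("TRUE", "FALSE"),
)

cro = make_child(GENES, "CRO", ORGANISM="LAMBDA", PROMOTERS=["P-R"], MUTATED="FALSE")
print(cro.get("ORGANISM"), cro.get("CREATOR"))  # CREATOR is inherited from GENES
```

Calling `make_child(GENES, "X", ORGANISM="T4")` would raise a `ValueError`, because "T4" is not among the allowed organisms.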
The overall structure of the knowledge base used to simulate Bacteriophage Lambda is shown in Figure 1. It is important to note that both the rules for the simulator (the "program") and the data about Lambda (i.e., the genes, proteins, and DNA loci) are stored in the same knowledge base. This type of uniformity in representation would be virtually impossible if a database were used.
APPLICATION TO BACTERIOPHAGE LAMBDA
Bacteriophage Lambda is an attractive genetic system for simulation studies for two reasons. First, it exhibits a variety of regulatory mechanisms that are believed to be common to all biological systems; Lambda is thus the subject of intense interest in the biological community. Second, Lambda has been studied for over two decades and is now probably better understood than any comparable organism. This paper presupposes the reader's familiarity with the basic interactions that take place in the regulatory region of Lambda during the process that leads to lysis or lysogeny; readers unfamiliar with the system may wish to consult [8].
The Model of Lambda
Only those genes, proteins, and DNA control loci that affect the decision to go lytic or lysogenic were included in the model. This led to the simplified model of 6 Lambda genes, 7 promoters, 3 operators, 3 terminators, and 2 nut sites shown in Figure 2. In addition, the hfl gene of E. coli was included in the model, for, unlike most E. coli genes, the interaction of the hfl gene product with the Lambda proteins is both important and well understood.
Figure 1: Structure of the Lambda Knowledge Base
Figure 2: Regulatory Region of Simplified Lambda Genome
Results
We ran simulations on eight different genetic systems: wild-type Lambda; single-locus mutants of Lambda at cI, cII, cro, N, and nutR; a Lambda double mutant at N and tR1; and wild-type Lambda in an hfl host. All mutations were assumed to be completely deleterious (the affected DNA locus suffered complete loss of function).
In all but two of these cases, the results of the simulator agreed with what is observed in the laboratory. Those two cases, the N mutant and the nutR mutant, led to the prediction that the system would oscillate between two different genetic states without committing to either the lysogenic or the lytic pathway. Within the confines of the model, this is a correct prediction, for the current model assumes entirely deterministic behavior.
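Why determinism produces such oscillation can be illustrated with a short Python sketch. If the full simulator state can be represented as a hashable value and each cycle's update depends only on that state, then as soon as any state recurs the entire trajectory must repeat forever. The helper below, which is hypothetical and not part of the authors' simulator, reports where a repeat begins and its period: a period of 1 means the system has settled into a single state, while a longer period means it cycles among several states without settling, as in the N and nutR predictions. The two-state update in the usage line is only a stand-in for the real rules.

```python
def find_cycle(initial_state, step, max_iterations=1000):
    """Iterate a deterministic update until a state repeats.

    Returns (first_iteration_of_cycle, period), or None if no repeat is seen
    within max_iterations. States must be hashable (e.g. tuples or strings).
    """
    seen = {}  # state -> iteration at which it was first observed
    state = initial_state
    for i in range(max_iterations):
        if state in seen:
            return seen[state], i - seen[state]
        seen[state] = i
        state = step(state)
    return None


# Stand-in update that flips between two genetic states forever.
flip = {"STATE-A": "STATE-B", "STATE-B": "STATE-A"}
print(find_cycle("STATE-A", flip.get))  # (0, 2): an oscillation of period 2
```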
In reality, of course, phage development is not entirely deterministic. In particular, terminator sites do not stop all transcripts that have not been explicitly antiterminated; they are "leaky," a fact not taken into account by our model. Adding nondeterministic behavior to the model would significantly increase its power; we plan to make this enhancement in the future.
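One way such nondeterminism might be added is sketched below in Python: a terminator lets a transcript that has not been antiterminated read through with some probability. The function name and the leak probability are hypothetical, since the paper does not specify how the enhancement will be made; with a leak probability of 0 the function reproduces the current, fully deterministic terminators. A seeded random generator keeps runs reproducible.

```python
import random


def passes_terminator(antiterminated, leak_probability, rng):
    """Hypothetical leaky terminator.

    Antiterminated transcripts always read through; other transcripts read
    through with probability leak_probability (0.0 = the current deterministic model).
    """
    if antiterminated:
        return True
    return rng.random() < leak_probability


rng = random.Random(1)  # fixed seed for reproducible runs
trials = 10_000
leaked = sum(passes_terminator(False, 0.1, rng) for _ in range(trials))
print(leaked / trials)  # close to 0.1
```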
The knowledge-based simulator of Lambda also provides a useful tool for teaching students about regulatory models. A student can describe his or her current understanding of Lambda to the simulator, run the system under various conditions, and see whether the results reflect reality. Where conflicts occur, the student can determine exactly what knowledge was incorrect or incompletely understood, and modify the knowledge base accordingly.
The use of the simulator for educational purposes can be extended to encompass serious scientific study of the model by experts. Presumably, if an expert's view of the Lambda system does not reflect reality when the simulator is run, then the model itself must be improved. As was mentioned earlier in the article, emulating the human process of scientific model construction, testing, and improvement is a major research goal of the MOLGEN project; the knowledge-based simulator will be an important tool for this research.
Significantly, the Lambda knowledge base (including the simulator rules) took only on the order of two man-months to build and test. The existence of a sophisticated knowledge representation and acquisition tool (the Unit System) allowed the construction of a simulator in far less time than would have been required with conventional programming techniques.
AN EXAMPLE
The following example illustrates the type of information that the simulator provides to the biologist. The ability to describe the functional state of each important entity (e.g., promoter, protein, etc.) at all times is one of the major advantages of the knowledge-based approach. The example shows the results of running the simulator on the N and tR1 double mutant. An annotation follows each excerpt of the transcript.
    ENTER LIST OF MUTATED UNITS: n t-r1
The user is first asked to provide a list of mutations. The mutations are specified by listing the names of the units in the knowledge base that represent the DNA loci to be mutated. Entering the null list (no entries) results in a simulation of wild-type Lambda.
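A minimal Python sketch of how such a response might be turned into knowledge-base changes, assuming each unit's slots are held in a dictionary and that every mutable unit carries a MUTATED slot like the one in the GENES listing. The function `apply_mutations` and the set of unit names are hypothetical. An empty response leaves every unit wild type, as described above.

```python
def apply_mutations(units, response):
    """Mark the units named in `response` as mutated; all others stay wild type.

    units maps a unit name to its slot dictionary (an illustrative structure).
    """
    names = [token.upper() for token in response.split()]
    unknown = [name for name in names if name not in units]
    if unknown:
        raise KeyError(f"no units named {unknown}")
    for name, slots in units.items():
        slots["MUTATED"] = "TRUE" if name in names else "FALSE"
    return names


units = {name: {} for name in ["N", "T-R1", "CRO", "CI"]}
print(apply_mutations(units, "n t-r1"))  # ['N', 'T-R1']
```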
    ENTER NUMBER OF PHAGE TO SIMULATE: 1
The only other information specified by the user is the number of bacteriophages to simulate. Because the model is deterministic, all simulations for any given set of mutations will result in identical behavior. When nondeterminism is added to the model, however, it will be possible to obtain a distribution of results.
    ** STARTING ITERATION NUMBER 1 **
The user is informed every time the simulator begins a new cycle. A cycle consists of updating the state of the system and then checking for a commitment to lysis or lysogeny.
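The cycle structure can be sketched as a loop in Python. The two callables are placeholders, since the paper does not spell out the commitment test: `update_state` stands for one pass of the rules, and `check_commitment` returns "LYTIC", "LYSOGENIC", or None. The iteration cap is an added assumption that keeps a run from looping forever when, as in the N and nutR mutants, the system never commits.

```python
def run_phage(update_state, check_commitment, max_iterations=50):
    """Repeat cycles of (update state, check commitment); return (outcome, iterations)."""
    for iteration in range(1, max_iterations + 1):
        print(f"** STARTING ITERATION NUMBER {iteration} **")
        update_state()
        outcome = check_commitment()
        if outcome is not None:
            return outcome, iteration
    return None, max_iterations  # no commitment reached


# Toy stand-ins: the phage "commits" to lysis on the third cycle.
cycles = []
result = run_phage(lambda: cycles.append(1),
                   lambda: "LYTIC" if len(cycles) >= 3 else None)
print(result)  # ('LYTIC', 3)
```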
    P-E IS INACTIVE
    P-L IS ACTIVE
    P-M IS INACTIVE
    P-R IS ACTIVE
The state of each of the promoters is printed out after the promoters have been updated. No mention is made of the operators, since their effect is already reflected in the promoter states: if a promoter is not active, genes originating at that point cannot be transcribed.
    CI IS NOT BEING TRANSCRIBED
    CII IS BEING TRANSCRIBED
    CIII IS NOT BEING TRANSCRIBED
    CRO IS BEING TRANSCRIBED
    HFL IS BEING TRANSCRIBED
    N IS NOT BEING TRANSCRIBED
    Q IS BEING TRANSCRIBED
After the status of the genes is updated, their states are also printed out.
    REMAINING TIME OF CI-PROTEIN IS 0
    REMAINING TIME OF CII-PROTEIN IS 3.0
    REMAINING TIME OF CIII-PROTEIN IS 0
    REMAINING TIME OF CRO-PROTEIN IS 3.0
    REMAINING TIME OF HFL-PROTEIN IS 3.0
    REMAINING TIME OF N-PROTEIN IS 0
    REMAINING TIME OF Q-PROTEIN IS 4.0
The final piece of information for an iteration is the remaining time of each of the proteins. The remaining time represents the number of cycles for which a protein's concentration will remain sufficient for it to exert an influence on the system. The remaining times shown for cII, cro, hfl, and Q are the same as their lifespans, since no time has yet passed since their production.
    ** STARTING ITERATION NUMBER 2 **

    P-E IS INACTIVE
    P-L IS INACTIVE
    P-M IS INACTIVE
    P-R IS INACTIVE

    CI IS NOT BEING TRANSCRIBED
    CII IS NOT BEING TRANSCRIBED
    CIII IS NOT BEING TRANSCRIBED
    CRO IS NOT BEING TRANSCRIBED
    HFL IS BEING TRANSCRIBED
    N IS NOT BEING TRANSCRIBED
    Q IS NOT BEING TRANSCRIBED

    REMAINING TIME OF CI-PROTEIN IS 0
    REMAINING TIME OF CII-PROTEIN IS 1.6
    REMAINING TIME OF CIII-PROTEIN IS 0
    REMAINING TIME OF CRO-PROTEIN IS 2.0
    REMAINING TIME OF HFL-PROTEIN IS 3.0
    REMAINING TIME OF N-PROTEIN IS 0
    REMAINING TIME OF Q-PROTEIN IS 3.0
Note how the remaining times have changed. Hfl, being an E. coli gene, is always transcribed, and the remaining time of its protein is therefore always the same as its lifespan. The proteins from cro and Q have begun to decay, and their remaining times have decreased by one time unit over the course of the second iteration. Because the hfl gene product attacks the cII protein, the decay rate of the latter gene product is faster than normal; its remaining time has thus decreased by 1.4 time units in the past cycle.
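The 1.4-unit drop for cII can be checked against the protein-status rules given earlier. Since the cII gene is no longer transcribed, NEWLIFE starts at -1; hfl's remaining time (3.0) is at least 1, so its full action coefficient c is added. The value of c is not reported in the paper; the arithmetic below infers it from the transcript, assuming hfl is the only modulator listed for cII.

```latex
\begin{aligned}
\mathrm{NEWLIFE} &= -1 + c = -1.4 \quad\Rightarrow\quad c = -0.4,\\
\mathrm{REMAINING\text{-}TIME}_{\mathrm{cII}} &= 3.0 + (-1.4) = 1.6 .
\end{aligned}
```

For cro and Q, the drop of exactly one unit is consistent with a modulator contribution of zero.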
Iterations 3 and 4 are omitted here for brevity.
    ** STARTING ITERATION NUMBER 6 **
    ** FINISHED SIMULATION NUMBER 1 **
Having concluded that the phage being simulated would pursue lytic growth, the simulator announces that it has finished the simulation.
    *** SIMULATION RESULTS ***
    NUMBER OF LYSOGENIC PHAGE = 0
    NUMBER OF LYTIC PHAGE = 1
    PERCENTAGE LYSOGENIC PHAGE = 0.0
    PERCENTAGE LYTIC PHAGE = 100.0
When all simulations are completed, the simulator prints a summary of the results.
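The summary is simple bookkeeping over the outcome of each simulated phage. A minimal Python sketch, assuming each run yields the string "LYTIC" or "LYSOGENIC" (the function name and output layout are illustrative), reproduces the figures above for the single lytic phage; with a nondeterministic model the same tally would report a distribution.

```python
from collections import Counter


def summarize(outcomes):
    """Count lytic and lysogenic outcomes and express them as percentages."""
    if not outcomes:
        raise ValueError("no simulations were run")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "NUMBER OF LYSOGENIC PHAGE": counts["LYSOGENIC"],
        "NUMBER OF LYTIC PHAGE": counts["LYTIC"],
        "PERCENTAGE LYSOGENIC PHAGE": 100.0 * counts["LYSOGENIC"] / total,
        "PERCENTAGE LYTIC PHAGE": 100.0 * counts["LYTIC"] / total,
    }


for label, value in summarize(["LYTIC"]).items():
    print(f"{label} = {value}")
```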
ACKNOWLEDGEMENTS
This work is part of the MOLGEN project, a joint research effort among the departments of Computer Science, Medicine, and Biochemistry at Stanford University. The research has been supported under NSF grant MCS80-16247. Computational resources have been provided by the SUMEX-AIM National Biomedical Research Resource, NIH grant RR-00785-08, and by the Department of Computer Science. Thanks are due to Dr. Rene Bach for valuable comments during the development of the simulator and during the preparation of this manuscript. MOLGEN is a trademark of the Board of Trustees of Stanford University.
*Current address: Silvar-Lisco, 3172 Porter Drive, Palo Alto, CA 94304, USA
REFERENCES
1. Kananyan, G. K., et al., "Expanded Model of the Ontogenesis of Phage Lambda," Soviet Genetics, Vol. 16, 1980, pp. 1280-1286.
2. Thomas, R., et al., "A Complex Control Circuit: Regulation of Immunity in Temperate Bacteriophages," Eur. J. Biochem., Vol. 71, 1976, pp. 211-227.
3. Blum, R. L., "Discovery, Confirmation, and Incorporation of Causal Relationships from a Large Time-Oriented Clinical Database: the RX Project," in Computers in Biomedical Research, H. R. Warner, ed., Academic Press, New York, 1982, pp. 164-187.
4. Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, Elsevier/North Holland, New York, 1976.
5. Miller, R. A., et al., "INTERNIST-1, An Experimental Computer-Based Diagnostic Consultant for General Internal Medicine," New England Journal of Medicine, Vol. 307, 1982, pp. 468-476.
6. Smith, R. G., and Friedland, P. E., "Unit Package User's Guide," Heuristic Programming Project Memo HPP-80-28, Stanford University, 1980.
7. Stefik, M., "An Examination of a Frame-Structured Representation System," Proceedings of the Sixth International Joint Conference on Artificial Intelligence (IJCAI), 1979, pp. 845-852.
8. Hershey, A. D., ed., The Bacteriophage Lambda, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1971.
9
... Une simulation consisteà répéter le processus de confrontations de la base de faits avec les conditions, età opérer les actions dont les conditions sont satisfaites. De tels modèles ont eté appliquésà la biologie, du phage lambda[132,157]. ...
Article
The scientific domain of the Systems Biology studies the interactions between the components of a biological system in order to understand its functioning as a whole. In this thesis, we first used searched to apprehend how a biological network, modelled as a simple graph, interact with its environment, modelled by another graph. Next, we have defined the MIB formalism (for Model of Interactions in Biology) that enables to model, to search and to study the heterogeneous motifs in biological networks. Finally, for deepening the study of structure and dynamics of biological networks, we have proposed the MIN formalism (for Modular Interaction Network). MIN inherited the bipartite structure of MIB, but also includes the richer annotations for nodes, arcs and possible states of the network, thus enabling the automatic translation of data contained in MIN into other formalisms commonly used in biology for dynamics modelling, such as logical networks, differential equations or Petri nets.
Article
Biological analysis of Drosophila embryogenesis has provided a model of protein interaction in segment formation. In this paper we introduce GEISHA system, which verifies and revises the rules of pattern formation in embryogenesis. The system consists of three parts: rule-based simulator, evaluator, and user interface. The simulator tests all the possible rule patterns, and the evaluator qualitatively evaluates results of the simulator; it searches for the desired pattern of protein expression. The user interface enables us to input or save data using GUI.
Article
FBA(flux balance analysis) with Boolean rules for representing regulatory events has correctly predicted cellular behaviors, such as optimal flux distribution, maximal growth rate, metabolic by-product, and substrate concentration changes, with various environmental conditions. However, until now, since FBA has not taken into account a hierarchical regulatory network, it has limited the representation of the whole transcriptional regulation mechanism and interactions between specific regulatory proteins and genes. In this paper, in order to solve these problems, we describe the construction of hierarchical regulatory network with defined symbols and the introduction of a weight for representing interactions between symbols. Finally, the whole cellular behaviors with time were simulated through the linkage of a hierarchical regulatory network module and dynamic simulation module including FBA. The central metabolic network of E. coli was chosen as the basic model to identify our suggested modeling method.
Chapter
Zu den wichtigsten Informationsquellen der modernen biologischen Forschung gehören die Daten, die die Natur in Nukleinsäuren In verbaler oder geometrischer Form (in Basensequenzen oder Raumstrukturen) gespeichert hat. Es ist selbstverständlich, daß sich die Molekularbiologen in den letzten Jahren — neben chemischen und physikalischen Untersuchungsmethoden — zunehmend auch der Mittel der Mathematik, der Informationstheorie und der Computer bedienen, um diese „makromolekulare Sprache“ mit ihrer eigenartigen Semantik verstehen und analysieren zu können. Neben der Aufgeschlossenheit, die bei den Vertretern einer verhältnismäßig jungen Wissenschaft nicht überraschend ist, spielt in diesem Vorgang auch die Größe und Komplexität der schon vorhandenen Sequenzinformation eine nicht unwesentliche Rolle, von ihren Zuwachsraten gar nicht zu sprechen. Wir müssen uns nur vergegenwärtigen, daß nach dem Erscheinen der ersten vollständigen Nukleotidsequenz (Holley et al. 1965), in den 13 Jahren von 1965 bis 1978 insgesamt Sequenzen von etwa 12000 Nukleotiden publiziert wurden. In den darauffolgenden 4 Jahren erschienen in etwa 1200 Arbeiten über eine Million Nukleotide, und ihre Zahl hat sich seitdem jährlich ungefähr verdoppelt.
Chapter
The question of whether or not the human genome contains retroviral sequences is interesting in terms of both evolution and potential expression of these sequences. We have isolated baboon endogenous virus (BaEV) related sequences from a human DNA library (1) in order to address these questions (2,3). Hybridization and DNA sequence analyses of clones containing the human locus, ERV3, reveals that this region contains an apparent full length integrated retroviral genome (3,4). Significant homology was found between the gag and pol DNA sequences of this provirus and those of other mammalian type C retroviruses including: BaEV (5), Moloney murine murine leukemia virus (M-MuLV) (6), human T-cell leukemia virus (HTLV) (7) and two previously characterized human proviruses, ERV1 (2) and 51–1 (8). The complete nucleotide sequence has been obtained for both ERV3 long terminal repeats (LTRs). These elements contain known transcriptional control sequences within the LTRs in positions characteristic of mammalian type C retro-viruses. In contrast, the nucleotide sequence immediately following the U5 region of the 5’ LTR (that region commonly containing the primer binding site) is complementary not to the tRNApro utilized in the replication of type C mammalian retroviruses (9), but to a tRNAarg (10). This difference may define a new group of human retroviruses.
Chapter
This chapter provides a short review of the modelling of Genetic Regulatory Networks (GRNs). GRNs have a basic requirement to model (at least) some parts of a biological system using some kind of logical formalism. They represent the set of all interactions among genes and their products for determining the temporal and spatial patterns of expression of a set of genes. The origin of modelling the regulation of gene expression goes back to the Nobel-prize winning work of Lwoff, Jacob and Monod on the mechanisms underlying the behaviour of bacterial viruses that switch between so-called lytic and lysogenic states. Some of the circuit-based approaches to GRNs such as the work of Kauffman, Thomas, and Shapiro and Adam are discussed.
Article
The recent rapid increase in genomic data related to many microorganisms and the development of computational tools to accurately analyze large amounts of data have enabled us to design several kinds of simulation approaches for the complex behaviors of cells. Among these approaches, dFBA (dynamic flux balance analysis), which utilizes FBA, differential equations. and regulatory events, has correctly predicted cellular behaviors under given environmental conditions. However, until now, dFBA has centered on substrate concentration, cell growth. and gene on/off. but a detailed hierarchical structure of a regulatory network has not been taken into account. The use of Boolean rules for regulatory events in dFBA has limited the representation of interactions between specific regulatory proteins and genes and the whole transcriptional regulation mechanism with environmental change. In this paper, we adopted the operon as the basic structure, constructed a hierarchical structure for a regulatory network with defined fundamental symbols, and introduced a weight between symbols in order to solve the above problems. Finally, the total control mechanism of regulatory elements (operons, genes, effectors, etc.) with time was simulated through the linkage of dFBA with regulatory network modeling. The lac operon, trp operon, and tna operon in the central metabolic network of E. coli were chosen as the basic models for control patterns. The suggested modeling method in this study can be adopted as a basic framework to describe other transcriptional regulations, and provide biologists and engineers with useful information on transcriptional regulation mechanisms under extracellular environmental change.
Article
In order to synthesize functional proteins, mRNA must be translated by the ribosome with a certain level of accuracy. As we will discuss below, our knowledge concerning the levels of precision ultimately attained by the translational machinery, especially at the level of the nascent polypeptide chain, is still incomplete. Furthermore, it will be interesting to compare the sparse data that exist with the results of a naïve calculation of the following type: in order to synthesize correctly at least 50% of the polypeptide chains of an enzyme as large as β-galactosidase which contains 1169 amino acid residues, an average error no larger than 1 in 1700 per amino acid residue can be tolerated (Fig. 5.1). Why should attaining this level of accuracy pose a problem?
In the living cell, a vast number of physical and chemical reactions occur to execute specific functions. It is a great challenge of this century to design a cell model and simulate it on a computer based on the available data. Detailed knowledge of the chemical substances, the reactions, and the spatial network connections among different reactions may help in formulating the function of the cell mathematically. A suitable software package is ultimately needed to simulate the model on a computer. The simulation results can then be compared with real experimental data, the success of the model assessed, and the model modified accordingly. Building a successful cell model faces many problems, such as incomplete knowledge of the exact mechanisms of cell function, a lack of quantitative data, scaling, limited computational power, and the need for good simulation software. In this paper I review the fundamental approach to designing a cell model, the problems related to it, and the probable applications of a cell model.
The Unit Package is an interactive knowledge representation system with representations for individuals, classes, indefinite individuals, and abstractions. Links between the nodes are structured with explicit definitional roles, types of inheritance, defaults, and various data formats. This paper presents the general ideas of the Unit Package and compares it with other current knowledge representation languages. The Unit Package was created for a hierarchical planning application, and is now in use by several AI projects.
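The idea of units linked into a class hierarchy, with values on more general units serving as inherited defaults, can be sketched in a few lines. This is a hypothetical illustration of slot inheritance with defaults only. It is not the Unit Package's API, and it omits the Package's definitional roles, its different types of inheritance, and its data formats. The class `Unit`, its `get` method, and the example units are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Unit:
    name: str
    parent: Optional[Unit] = None
    slots: dict[str, Any] = field(default_factory=dict)

    def get(self, slot: str) -> Any:
        # Walk up the hierarchy until some unit defines the slot;
        # values on ancestors act as inherited defaults.
        unit: Optional[Unit] = self
        while unit is not None:
            if slot in unit.slots:
                return unit.slots[slot]
            unit = unit.parent
        raise KeyError(f"{self.name} has no value for slot {slot!r}")


# A small class hierarchy: Gene -> RepressorGene -> cI (an individual).
gene = Unit("Gene", slots={"product": "protein", "expressed": False})
repressor_gene = Unit("RepressorGene", parent=gene, slots={"product": "repressor"})
cI = Unit("cI", parent=repressor_gene, slots={"expressed": True})

print(cI.get("product"), cI.get("expressed"))  # prints: repressor True
```

The individual `cI` overrides only the slot it needs and inherits the rest. This is the main economy a hierarchical knowledge base offers when describing many similar genetic elements.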
The objectives of the methods and computer implementation presented here are (1) to automate the process of hypothesis generation and exploratory analysis of data in large nonrandomized, time-oriented clinical data bases, (2) to provide knowledgeable assistance in performing studies on large data bases, and (3) to increase the validity of medical knowledge derived from nonprotocol data. The RX computer program consists of a knowledge base (KB), a discovery module, a study module, and a clinical data base. Utilizing techniques from the field of artificial intelligence, the KB contains hierarchically organized medical and statistical knowledge, and is used to assist in the discovery and study of new hypotheses. Confirmed results from the data base are automatically encoded into the KB. The discovery module uses lagged, nonparametric correlations to generate hypotheses. These are then studied in detail by the study module, which automatically determines confounding variables and methods for controlling their influence. In determining the confounders of a new hypothesis, the study module uses previously "learned" causal relationships. The study module selects a study design and statistical method based on knowledge of the confounders and their distribution in the data base. Most studies have used a longitudinal design involving a multiple regression model applied to individual patient records. Data for system development were obtained from the American Rheumatism Association Medical Information System.
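The abstract does not name the nonparametric correlation used by the discovery module. The sketch below assumes Spearman's rank correlation, computed between one variable at visit t and another at visit t + lag within a single patient record. The function names are hypothetical, and RX's actual procedure, including how it handles irregular visit times, may differ.

```python
from __future__ import annotations

import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either series is constant


def lagged_spearman(cause: list[float], effect: list[float], lag: int) -> float:
    """Spearman correlation between cause at visit t and effect at visit t + lag."""
    if len(cause) != len(effect) or not 0 <= lag < len(cause) - 1:
        raise ValueError("series must have equal length and leave at least two pairs")
    x = cause[: len(cause) - lag]
    y = effect[lag:]
    return pearson(ranks(x), ranks(y))


dose = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [7.0, 1.0, 10.0, 20.0, 30.0, 40.0]
print(lagged_spearman(dose, response, lag=2))  # prints 1.0
```

In this toy series the response tracks the dose perfectly two visits later but not at lag 0. That kind of lag-specific signal is what a lagged correlation screen is designed to surface as a candidate hypothesis.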
Temperate bacteriophages can stably display two essentially different behaviours. In the immune state, a gene (cI) produces a repressor that prevents expression of all the other viral genes; in the non-immune state, the typically viral functions are expressed. The choice between the two pathways and the establishment of one of them have much in common with cell determination and differentiation. This choice depends on a complex control system, in fact one of the most intricate regulatory networks known in some detail. Our paper provides a formal description and partial analysis of this regulatory network. It is shown that even for relatively simple known models, this kind of analysis uncovers predictions that had previously remained hidden. Some of these predictions were checked experimentally.
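A drastically simplified logical description can show how two stable behaviours arise from a regulatory net. It reduces the decision to two genes, cI and cro, that repress each other, each either off (0) or on (1). This is an assumption-laden toy, not the authors' model. It ignores CII, CIII, N, cI autoregulation, and all thresholds, and it updates both genes synchronously. The sketch enumerates all states and keeps the fixed points. These correspond to the immune state (cI on, cro off) and the non-immune state (cro on, cI off).

```python
from __future__ import annotations

from itertools import product
from typing import Tuple

State = Tuple[int, int]  # (cI, cro), each 0 = off or 1 = on


def update(state: State) -> State:
    ci, cro = state
    # CI repressor blocks cro transcription; Cro blocks cI transcription.
    return (int(not cro), int(not ci))


def fixed_points() -> list[State]:
    # States that map to themselves are the stable regimes of the logical model.
    return [s for s in product((0, 1), repeat=2) if update(s) == s]


print(fixed_points())  # prints [(0, 1), (1, 0)]
```

Under synchronous updating, the states (0, 0) and (1, 1) flip into each other indefinitely. Asynchronous update schemes, such as the one used in Thomas's logical formalism, treat such cases differently. The choice of update rule is therefore part of the model, not a mere implementation detail.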
An enlarged threshold model of the regulatory system of development of phage lambda (RSDP lambda-2) has been built. It includes 15 synthetic blocks for proteins and mRNAs and 4 blocks corresponding to ontogenetic processes: two-stage replication, integration and excision of the phage genome, formation of aggregates of regulatory proteins, and regulation of bacterial lysis. Computer simulation of the RSDP lambda-2 model describes the dynamics of the concentrations of all main proteins, and of the respective fractions of mRNAs and DNA, in the lytic and lysogenic regimes of phage ontogenesis. The results are in good agreement with available experimental data. The model also yields the percentage of lysogenic responses as a function of the mean multiplicity n of phage infection of the bacterial culture. This curve has a maximum, in accordance with the experimental data of Kourilsky [10].
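The abstract describes the model only at the level of blocks. The sketch below is a hypothetical illustration of what a single synthetic block in a threshold model might look like. It assumes that a product is synthesized at a constant rate while a regulator concentration exceeds a threshold, and is otherwise only degraded at a first-order rate. The names `simulate_block`, `k_syn`, and `k_deg`, and all parameter values, are invented; the actual RSDP lambda-2 equations are not given here.

```python
from __future__ import annotations

from typing import Callable


def heaviside(x: float) -> float:
    # Step function: synthesis is fully on above threshold, off otherwise.
    return 1.0 if x > 0.0 else 0.0


def simulate_block(activator: Callable[[float], float], threshold: float,
                   k_syn: float = 1.0, k_deg: float = 0.1,
                   dt: float = 0.1, steps: int = 500) -> list[float]:
    """Forward-Euler integration of dx/dt = k_syn * H(a(t) - threshold) - k_deg * x."""
    x = 0.0
    xs = [x]
    for i in range(steps):
        a = activator(i * dt)
        x += dt * (k_syn * heaviside(a - threshold) - k_deg * x)
        xs.append(x)
    return xs


below = simulate_block(lambda t: 0.5, threshold=1.0)
above = simulate_block(lambda t: 2.0, threshold=1.0)
print(below[-1], round(above[-1], 2))
```

With the regulator held above threshold, the product rises monotonically toward the steady state k_syn / k_deg. Held below threshold, the product stays at zero.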
INTERNIST-I is an experimental computer program capable of making multiple and complete diagnoses in internal medicine. It differs from most other programs for computer-assisted diagnosis in the generality of its approach and the size and diversity of its knowledge base. To document the strengths and weaknesses of the program, we performed a systematic evaluation of the capabilities of INTERNIST-I. Its performance on a series of 19 clinicopathological exercises (Case Records of the Massachusetts General Hospital) published in the Journal appeared qualitatively similar to that of the hospital clinicians but inferior to that of the case discussants. The evaluation demonstrated that the present form of the program is not sufficiently reliable for clinical applications. Specific deficiencies that must be overcome include the program's inability to reason anatomically or temporally, its inability to construct differential diagnoses spanning multiple areas, its occasional attribution of findings to improper causes, and its inability to explain its "thinking".
Hershey, A. D., ed., The Bacteriophage Lambda, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1971.
Smith, R. G. and Friedland, P. E., "Unit Package User's Guide," Heuristic Programming Project Memo HPP-80-28, Stanford University, 1980.
Thomas, R., et al., "A Complex Control Circuit: Regulation of Immunity in Temperate Bacteriophages," Eur. J. Biochem., Vol. 71, 1976, pp. 211-227.