STUDIA UNIV. BABEŞ-BOLYAI, INFORMATICA, Volume XLIV, Number 1, 1999
THEOREM PROVING AND DNA COMPUTING
DOINA TATAR AND MIHAI OLTEAN
Abstract. We start from sticker systems as language-generating devices for DNA computing ([3]) and we define the sticker systems associated with a set of clauses. Robinson's theorem is stated in terms of the language generated by this sticker system.
1. Introduction
A DNA (deoxyribonucleic acid) molecule is a large double-stranded (helical) structure that contains, in coded form, all the information needed to generate proteins for living organisms. This coded information is a sequence of four nucleotides, A (adenine), T (thymine), C (cytosine) and G (guanine), paired A-T and C-G according to the so-called Watson-Crick complementarity [3].
One can think of DNA as a program interpreted by a complex biological machinery that generates sequences of amino acids (the building blocks of proteins). There are precisely 20 amino acids that can be generated from the 64 possible triplets (codons) of nucleotides, and most of them can be represented by several triplets [2]. For example, the amino acid Ala (alanine) can be formed by the following triplets: GCA, GCC, GCT, GCG. In a simplified manner, the generation of proteins from DNA proceeds in four phases: transcription, splicing, amino acid generation and protein folding.
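The codon arithmetic can be checked with a small Python sketch: three positions over a four-letter alphabet give $4^3 = 64$ triplets, and in the standard genetic code alanine is encoded exactly by the four triplets beginning with GC (written with T, as in DNA). The helper names are hypothetical and only serve the illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_codons():
    """Every nucleotide triplet (codon), in lexicographic order over "ACGT"."""
    return ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]


def codons_with_prefix(prefix):
    """Codons beginning with `prefix` (hypothetical helper for the Ala example)."""
    return [codon for codon in all_codons() if codon.startswith(prefix)]


if __name__ == "__main__":
    print(len(all_codons()))         # 64
    print(codons_with_prefix("GC"))  # ['GCA', 'GCC', 'GCG', 'GCT']
```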
In [3], [4] the authors present a language-theoretic model of DNA splicing. This model will be adopted for the resolution method in automated theorem proving. In the following section we present the sticker operation and the sticker systems as introduced in [3]. In Section 3 we present the resolution method as a complete computation in a sticker system. In Section 4 the corresponding justifications in propositional calculus are introduced.
2. Sticker operation and sticker system
If we have a single-stranded sequence of A, C, G, T nucleotides together with a single-stranded sequence composed of the complementary nucleotides, the two sequences will be glued together (by hydrogen bonds), forming a double-stranded DNA sequence. What [3] extracts from here is the operation of prolonging to the right a sequence of (single or double) symbols by using given single-stranded strings, matching them with portions of the current sequence according to a complementarity relation.
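A minimal sketch of the gluing condition, assuming strands are Python strings over A, C, G, T and ignoring the antiparallel orientation of real DNA strands (the sticker model compares symbols position by position). The names `complement` and `can_glue` are hypothetical.

```python
# Watson-Crick complementarity: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the strand that pairs position by position with `strand`."""
    return "".join(COMPLEMENT[base] for base in strand)


def can_glue(upper, lower):
    """True if two single strands of equal length are complementary at every position."""
    return len(upper) == len(lower) and all(
        COMPLEMENT[up] == lo for up, lo in zip(upper, lower)
    )


print(complement("ACGGT"))         # TGCCA
print(can_glue("ACGGT", "TGCCA"))  # True
print(can_glue("ACGGT", "TGCCT"))  # False
```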
In the following we present the sticker operation and the sticker system as in [3].
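Let $V$ be an alphabet endowed with a symmetric relation $\rho \subseteq V \times V$ (of complementarity). Let $\#$ be a special symbol not in $V$, denoting an empty space (the blank symbol). Using the elements of $V \cup \{\#\}$ one constructs the composite symbols of the following sets:
$$\binom{V}{V}_{\rho} = \left\{ \binom{a}{b} \;\middle|\; a, b \in V,\ (a, b) \in \rho \right\}, \qquad \binom{V}{\#} = \left\{ \binom{a}{\#} \;\middle|\; a \in V \right\}, \qquad \binom{\#}{V} = \left\{ \binom{\#}{a} \;\middle|\; a \in V \right\}.$$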
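To make the three sets concrete, the sketch below writes a composite symbol $\binom{a}{b}$ as a Python pair `(a, b)` (upper, lower) and the blank as the string "#"; these representation choices are assumptions of the example, which takes $V = \{A, C, G, T\}$ with $\rho$ the Watson-Crick relation.

```python
BLANK = "#"
V = ["A", "C", "G", "T"]
# Symmetric complementarity relation rho (Watson-Crick pairs).
RHO = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def double_symbols(alphabet, rho):
    """Composite symbols (a over b) with a, b in V and (a, b) in rho."""
    return [(a, b) for a in alphabet for b in alphabet if (a, b) in rho]


def upper_symbols(alphabet):
    """Composite symbols (a over #): a symbol on the upper level only."""
    return [(a, BLANK) for a in alphabet]


def lower_symbols(alphabet):
    """Composite symbols (# over a): a symbol on the lower level only."""
    return [(BLANK, a) for a in alphabet]


print(double_symbols(V, RHO))                    # [('A', 'T'), ('C', 'G'), ('G', 'C'), ('T', 'A')]
print(upper_symbols(V)[0], lower_symbols(V)[0])  # ('A', '#') ('#', 'A')
```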
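Now the set
$$W_{\rho}(V) = \binom{V}{V}_{\rho}^{*} S(V), \quad \text{where } S(V) = \binom{V}{\#}^{*} \cup \binom{\#}{V}^{*},$$
is introduced, and the elements of this set are called well-started sequences. Stated otherwise, the elements of $W_{\rho}(V)$ start with pairs of symbols in $V$, as selected by the complementarity relation, and end either (a) with a suffix consisting of pairs $\binom{a}{\#}$ or (b) with a suffix consisting of pairs $\binom{\#}{b}$, for $a, b \in V$ (the two kinds of symbols are not mixed).
The sticker operation, denoted by $\mu$, is a partially defined mapping from $W_{\rho}(V) \times S(V)$ to $W_{\rho}(V)$, defined as follows: for $x \in W_{\rho}(V)$, $y \in S(V)$, $z \in W_{\rho}(V)$, one writes $\mu(x, y) = z$ if and only if one of the following cases holds (where $\binom{a_1 \cdots a_k}{b_1 \cdots b_k}$ abbreviates the string $\binom{a_1}{b_1} \cdots \binom{a_k}{b_k}$):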
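1. $$x = \binom{a_1 \cdots a_k}{b_1 \cdots b_k} \binom{a_{k+1} \cdots a_{k+r+p}}{\# \cdots \#}, \quad y = \binom{\# \cdots \#}{c_1 \cdots c_r}, \quad z = \binom{a_1 \cdots a_{k+r}}{b_1 \cdots b_k\, c_1 \cdots c_r} \binom{a_{k+r+1} \cdots a_{k+r+p}}{\# \cdots \#},$$
for $k \geq 0$, $r \geq 1$, $p \geq 1$, $a_i \in V$ for $1 \leq i \leq k+r+p$, $b_i \in V$ for $1 \leq i \leq k$, $c_i \in V$ for $1 \leq i \leq r$, and $(a_{k+i}, c_i) \in \rho$ for $1 \leq i \leq r$;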
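2. $$x = \binom{a_1 \cdots a_k}{b_1 \cdots b_k} \binom{a_{k+1} \cdots a_{k+r}}{\# \cdots \#}, \quad y = \binom{\# \cdots \#}{c_1 \cdots c_{r+p}}, \quad z = \binom{a_1 \cdots a_{k+r}}{b_1 \cdots b_k\, c_1 \cdots c_r} \binom{\# \cdots \#}{c_{r+1} \cdots c_{r+p}},$$
for $k \geq 0$, $r \geq 0$, $r + p \geq 1$, $a_i \in V$ for $1 \leq i \leq k+r$, $b_i \in V$ for $1 \leq i \leq k$, $c_i \in V$ for $1 \leq i \leq r+p$, and $(a_{k+i}, c_i) \in \rho$ for $1 \leq i \leq r$;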
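3. $$x = \binom{a_1 \cdots a_k}{b_1 \cdots b_k} \binom{\# \cdots \#}{b_{k+1} \cdots b_{k+r+p}}, \quad y = \binom{c_1 \cdots c_r}{\# \cdots \#}, \quad z = \binom{a_1 \cdots a_k\, c_1 \cdots c_r}{b_1 \cdots b_{k+r}} \binom{\# \cdots \#}{b_{k+r+1} \cdots b_{k+r+p}},$$
for $k \geq 0$, $r \geq 1$, $p \geq 1$, $a_i \in V$ for $1 \leq i \leq k$, $b_i \in V$ for $1 \leq i \leq k+r+p$, $c_i \in V$ for $1 \leq i \leq r$, and $(c_i, b_{k+i}) \in \rho$ for $1 \leq i \leq r$;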
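4. $$x = \binom{a_1 \cdots a_k}{b_1 \cdots b_k} \binom{\# \cdots \#}{b_{k+1} \cdots b_{k+r}}, \quad y = \binom{c_1 \cdots c_{r+p}}{\# \cdots \#}, \quad z = \binom{a_1 \cdots a_k\, c_1 \cdots c_r}{b_1 \cdots b_{k+r}} \binom{c_{r+1} \cdots c_{r+p}}{\# \cdots \#},$$
for $k \geq 0$, $r \geq 0$, $r + p \geq 1$, $a_i \in V$ for $1 \leq i \leq k$, $b_i \in V$ for $1 \leq i \leq k+r$, $c_i \in V$ for $1 \leq i \leq r+p$, and $(c_i, b_{k+i}) \in \rho$ for $1 \leq i \leq r$.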
In case 1 one adds complementary symbols on the lower level without completing all the blank spaces. In case 2 we complete the blank spaces on the lower level of $x$ and possibly add more composite symbols of the form $\binom{\#}{c}$, $c \in V$. Cases 3 and 4 are symmetric to cases 1 and 2, respectively, completing blank spaces on the upper level of the string.
Note that in all cases the string $y$ must contain at least one composite symbol, and that cases 2 and 4 allow the prolongation of "blunt" strings in $W_{\rho}(V)$: when $r = 0$ there is no blank position in $x$.
For strings $x, y$ that do not satisfy any of the previous conditions, $\mu(x, y)$ is not defined.
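The four cases can be folded into one procedure: find the first blank position of $x$, check that $y$ is a strand for the level where the blanks are, pair the overlapping symbols under $\rho$, and keep whichever tail is left over (remaining blanks of $x$ in cases 1 and 3, extra symbols of $y$ in cases 2 and 4). The Python sketch below does this under stated assumptions: a composite symbol is a pair `(upper, lower)`, "#" is the blank, `None` stands for "$\mu(x, y)$ is not defined", and all function names are hypothetical.

```python
# Sketch of the sticker operation mu. Composite symbols are pairs
# (upper, lower); "#" is the blank. Function names are hypothetical.
BLANK = "#"


def single_level(seq):
    """0 if every symbol is (a, #), 1 if every symbol is (#, b), else None.
    Callers handle the empty sequence separately."""
    if all(up != BLANK and lo == BLANK for up, lo in seq):
        return 0
    if all(up == BLANK and lo != BLANK for up, lo in seq):
        return 1
    return None


def split_well_started(x, rho):
    """Return (k, level) if x is in W_rho(V), else None.

    k is the number of leading complete pairs; level is 0 (upper-only suffix),
    1 (lower-only suffix) or None (x is blunt: no blank position)."""
    k = 0
    while k < len(x) and BLANK not in x[k]:
        if x[k] not in rho:
            return None
        k += 1
    if k == len(x):
        return k, None
    level = single_level(x[k:])
    return None if level is None else (k, level)


def sticker(x, y, rho):
    """mu(x, y), or None where mu is undefined (x well-started, y non-empty in S(V))."""
    parsed = split_well_started(x, rho)
    y_level = single_level(y) if y else None
    if parsed is None or y_level is None:
        return None
    k, x_level = parsed
    if x_level == y_level:
        return None  # y would stick on the level that is already filled
    suffix = x[k:]
    m = min(len(suffix), len(y))  # positions that get paired (r in the cases)
    merged = []
    for i in range(m):
        if y_level == 1:  # cases 1 and 2: y goes on the lower level
            pair = (suffix[i][0], y[i][1])
        else:             # cases 3 and 4: y goes on the upper level
            pair = (y[i][0], suffix[i][1])
        if pair not in rho:
            return None
        merged.append(pair)
    # At most one tail is non-empty: leftover blanks of x (cases 1, 3)
    # or extra symbols of y (cases 2, 4).
    return list(x[:k]) + merged + list(suffix[m:]) + list(y[m:])


RHO = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
x = [("A", "T"), ("C", BLANK), ("G", BLANK)]
z1 = sticker(x, [(BLANK, "G")], RHO)                 # case 1
z2 = sticker(z1, [(BLANK, "C"), (BLANK, "A")], RHO)  # case 2
print(z1)  # [('A', 'T'), ('C', 'G'), ('G', '#')]
print(z2)  # [('A', 'T'), ('C', 'G'), ('G', 'C'), ('#', 'A')]
print(sticker(x, [(BLANK, "A")], RHO))  # None: (C, A) is not in rho
```

Treating "undefined" as `None` keeps the mapping partial, as in the definition; a blunt $x$ simply gets $y$ appended, which corresponds to $r = 0$ in cases 2 and 4.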
Using the sticker operation, the authors of [3] define a generating/computing mechanism, the sticker system, as follows:
$$\gamma = (V, \rho, A, B_l, B_u),$$
where $V$ is an alphabet, $\rho \subseteq V \times V$ is a symmetric relation on $V$, $A$ is a finite subset of $W_{\rho}(V)$ (the axioms), and $B_l$ and $B_u$ are finite subsets of $\binom{\#}{V}^{+}$ and $\binom{V}{\#}^{+}$, respectively.
The idea behind such a machinery is the following. One starts with the sequences in $A$ and prolongs them to the right with the strings in $B_l$ and $B_u$ according to the sticker operation (the elements of $B_l$ are used on the lower row, and those of $B_u$ on the upper row). When no blank symbol is present, we obtain a string over the alphabet $\binom{V}{V}_{\rho}$. The language of all such strings is the language generated by $\gamma$. Formally, [3] defines this language as follows.
For two strings $x, z \in W_{\rho}(V)$ one writes $x \Longrightarrow z$ if and only if $z = \mu(x, y)$ for some $y \in B_l \cup B_u$. As usual, $\Longrightarrow^{*}$ denotes the reflexive and transitive closure of the relation $\Longrightarrow$.
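A sequence $x_1 \Longrightarrow x_2 \Longrightarrow \cdots \Longrightarrow x_k$ with $x_1 \in A$ is called a computation in $\gamma$ (of length $k - 1$). A computation as above is complete if $x_k \in \binom{V}{V}_{\rho}^{*}$ (no blank symbol is present in the last string of composite symbols).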
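The language generated by $\gamma$, denoted by $L(\gamma)$, is defined by
$$L(\gamma) = \left\{ w \in \binom{V}{V}_{\rho}^{*} \;\middle|\; x \Longrightarrow^{*} w,\ x \in A \right\}.$$
Therefore, only the complete computations are taken into account when defining $L(\gamma)$.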
3. Resolution method as a sticker system
One of the most widely used refutation methods in automated theorem proving is the resolution method. It can be introduced as a formal system [5]:
$$R = (\Sigma_R, F_R, A_R, R_R),$$
where:
$\Sigma_R = \{p, q, r, \ldots, p_i, q_i, r_i, \ldots\} \cup \{\neg, \vee\}$, where $p, q, r, \ldots, p_i, q_i, r_i$ are propositional variables;
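$F_R$ is a set $\{f, g, \ldots, f_i, g_i, \ldots\}$ of clauses, which have the following form:
$$p_1^{\alpha_1} \vee p_2^{\alpha_2} \vee \cdots \vee p_k^{\alpha_k}, \qquad \text{where } p^{\alpha} = \begin{cases} p & \text{if } \alpha = 1, \\ \neg p & \text{if } \alpha = 0. \end{cases}$$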
The constructs $p$ and $\neg p$ are called literals. A special clause is the empty clause, which is free of literals; it is denoted by $\square$.
$A_R = \emptyset$ (the resolution system has no axioms).
$R_R = \{\mathit{res}\}$, so the only deduction rule is called res, and it is defined as follows:
$$f \vee p,\ g \vee \neg p \;\vdash_{\mathit{res}}\; f \vee g, \qquad f, g \in F_R.$$
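A sketch of the rule res, assuming clauses are Python frozensets of literals and that $\neg p$ is written as the string "~p" (a notation chosen for this example only). For every complementary pair of literals the rule yields $f \vee g$; the empty frozenset plays the role of the empty clause $\square$.

```python
def negate(literal):
    """Complementary literal: "p" <-> "~p"."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(clause1, clause2):
    """All clauses obtained by one application of res to clause1 and clause2.

    For each literal p of clause1 whose complement ~p occurs in clause2,
    the resolvent is (clause1 without p) union (clause2 without ~p)."""
    result = set()
    for literal in clause1:
        if negate(literal) in clause2:
            result.add((clause1 - {literal}) | (clause2 - {negate(literal)}))
    return result


# q v p and r v ~p resolve on p, giving the clause q v r.
print(resolvents(frozenset({"q", "p"}), frozenset({"r", "~p"})))
# p and ~p resolve to the empty clause.
print(resolvents(frozenset({"p"}), frozenset({"~p"})))  # {frozenset()}
```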
It is known that any problem of the form $u_1, \ldots, u_n \models v$ is equivalent to the unsatisfiability of $\{u_1, \ldots, u_n, \neg v\}$. Moreover, each of the formulae $u_1, \ldots, u_n, \neg v$ can be reduced to a set of clauses. So, verifying $u_1, \ldots, u_n \models v$ is equivalent to verifying whether a set of clauses is unsatisfiable.
Theorem (J. A. Robinson). A set of clauses $C$ from propositional calculus is unsatisfiable (or contradictory) if and only if $C \vdash_{\mathit{res}} \square$.
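For small clause sets the theorem can be checked experimentally: saturate the set under res until the empty clause appears or nothing new is produced, and compare the answer with a truth-table test of satisfiability. The sketch below does both, again writing $\neg p$ as "~p" (an assumed notation); saturation terminates because only finitely many clauses can be formed from the literals present, although it is exponential in general.

```python
from itertools import product


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """Clauses obtained by one application of res to c1 and c2."""
    return {(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2}


def derives_empty_clause(clauses):
    """True iff the empty clause is derivable, found by saturating under res."""
    known = {frozenset(c) for c in clauses}
    while frozenset() not in known:
        items = list(known)
        new = set()
        for i, c1 in enumerate(items):
            for c2 in items[i + 1:]:
                new |= resolvents(c1, c2)
        if new <= known:
            return False  # saturated without deriving the empty clause
        known |= new
    return True


def satisfiable(clauses):
    """Brute-force truth-table test."""
    names = sorted({lit.lstrip("~") for c in clauses for lit in c})
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if all(any(env[lit.lstrip("~")] != lit.startswith("~") for lit in c)
               for c in clauses):
            return True
    return False


C1 = [{"q", "p"}, {"r", "~p"}, {"~q"}, {"~r"}]
C2 = [{"q", "p"}, {"r", "~p"}]
for C in (C1, C2):
    # By Robinson's theorem the two answers are always opposite.
    print(derives_empty_clause(C), satisfiable(C))
# prints "True False" and then "False True"
```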
What we explain here is applicable if the original disjunctions have length at most 2 (contain at most two literals). In Section 4 we will show that any refutation by resolution can be reduced to clauses of length at most 2.
We encode each clause of $C$ by a single-stranded DNA sequence formed by the sequence of the literals in that clause.
For each clause $p \vee q$ we put in the test tube two kinds of sequences: $pq$ and $qp$.
Due to complementarity, the application of the resolution rule between two DNA sequences forms a new DNA sequence of greater length.
Example: $C = \{q \vee p,\ r \vee \neg p\}$.
It is clear that obtaining the empty clause means a complete double-stranded DNA sequence.
In terms of sticker systems, the deduction of the empty clause means a complete computation (no blank symbol is present in the last string of composite symbols). In the previous example, if we also have the clauses $\neg r$ and $\neg q$, we obtain the empty clause, which means a DNA sequence without blank symbols:
Example: $C = \{q \vee p,\ r \vee \neg p,\ \neg q,\ \neg r\}$.
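One possible complete computation for this example is sketched below. It assumes that $\rho$ pairs each literal with its negation and that the clause $q \vee p$ is used as the upper strand $qp$; it is an illustration, not a reproduction of the original figure. The steps use, in turn, the lower strand $\neg q$ (case 1), the lower strand $\neg p\, r$ (case 2) and the upper strand $\neg r$ (case 4):
$$\binom{q}{\#}\binom{p}{\#} \Longrightarrow \binom{q}{\neg q}\binom{p}{\#} \Longrightarrow \binom{q}{\neg q}\binom{p}{\neg p}\binom{\#}{r} \Longrightarrow \binom{q}{\neg q}\binom{p}{\neg p}\binom{\neg r}{r}.$$
Read as resolution steps: $q \vee p$ and $\neg q$ give $p$; $p$ and $\neg p \vee r$ give $r$; $r$ and $\neg r$ give the empty clause. The uncovered suffix after each step is exactly the current resolvent.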
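Definition. The sticker system associated with a set $C$ of clauses is
$$\gamma = (V, \rho, A, B_l, B_u),$$
where:
$V$ is the set of propositional variables in the clauses, together with their negations;
$\rho$ is the complementarity relation of propositional calculus, that is, $(p, \neg p) \in \rho$ and $(\neg p, p) \in \rho$ for every variable $p$;
$B_l$ is a set of elements of $\binom{\#}{V}^{+}$, constructed as $\binom{\# \cdots \#}{p_1^{\alpha_1} \cdots p_k^{\alpha_k}} \in B_l$ if $p_1^{\alpha_1} \vee \cdots \vee p_k^{\alpha_k} \in C$;
$B_u$ is a set of elements of $\binom{V}{\#}^{+}$, constructed as $\binom{p_1^{\alpha_1} \cdots p_k^{\alpha_k}}{\# \cdots \#} \in B_u$ if $p_1^{\alpha_1} \vee \cdots \vee p_k^{\alpha_k} \in C$;
$A = B_l \cup B_u$.
(Let us remark that $k \leq 2$; see Section 4.)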
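The construction can be sketched in Python under the same assumed conventions ("~p" for $\neg p$, pairs `(upper, lower)`, "#" for the blank). Following the remark that both $pq$ and $qp$ are put in the test tube, every ordering of the literals of a clause is turned into a strand; this reading of the definition, and the helper name `associated_sticker_system`, are assumptions of the sketch.

```python
from itertools import permutations

BLANK = "#"


def negate(literal):
    """Complementary literal: "p" <-> "~p" (notation assumed for this sketch)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def associated_sticker_system(clauses):
    """Return (V, rho, A, B_l, B_u) for a set of clauses (hypothetical helper).

    Each clause becomes a strand for every ordering of its literals, so a
    two-literal clause p v q yields both pq and qp."""
    V = set()
    for clause in clauses:
        for literal in clause:
            V |= {literal, negate(literal)}
    # A literal is complementary to its negation; rho is symmetric because
    # V is closed under negation.
    rho = {(a, negate(a)) for a in V}
    B_l, B_u = set(), set()
    for clause in clauses:
        for order in permutations(sorted(clause)):
            B_l.add(tuple((BLANK, lit) for lit in order))  # lower strands
            B_u.add(tuple((lit, BLANK) for lit in order))  # upper strands
    A = B_l | B_u
    return V, rho, A, B_l, B_u


C = [{"q", "p"}, {"r", "~p"}, {"~q"}, {"~r"}]
V, rho, A, B_l, B_u = associated_sticker_system(C)
print(sorted(V))                   # ['p', 'q', 'r', '~p', '~q', '~r']
print(len(B_l), len(B_u), len(A))  # 6 6 12
```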
In this sticker system, for two strings $x, z \in W_{\rho}(V)$ we have $x \Longrightarrow z$ (that is, $z = \mu(x, y)$ for some $y \in B_l \cup B_u$) if $x$ is a string, $y$ is a string from $B_l$ or $B_u$ (a clause), and $z$ is the string obtained by resolution between $y$ and $x$ (more exactly, between $y$ and the suffix of $x$).
A part of Robinson's theorem can now be stated as follows.
Theorem. A set of clauses $C$ with at most two literals per clause is unsatisfiable if $L(\gamma) \neq \emptyset$, where $\gamma$ is the sticker system associated with $C$.
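The sufficient condition can be explored with a bounded breadth-first search for a complete computation, sketched below. The search strategy, the step bound `max_steps` and all function names are assumptions made for illustration, not the authors' procedure; `stick` is a simplified $\mu$ that skips the well-started check, since every string built here from $A = B_l \cup B_u$ is well-started. Because the theorem only states the "if" direction, finding no complete computation within the bound proves nothing about satisfiability.

```python
from itertools import permutations

BLANK = "#"


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def strands(clauses):
    """B_l and B_u: every ordering of every clause as a lower and an upper strand."""
    lower, upper = set(), set()
    for clause in clauses:
        for order in permutations(sorted(clause)):
            lower.add(tuple((BLANK, lit) for lit in order))
            upper.add(tuple((lit, BLANK) for lit in order))
    return lower, upper


def stick(x, y):
    """Simplified mu(x, y): x assumed well-started, y non-empty from B_l or B_u."""
    k = 0
    while k < len(x) and BLANK not in x[k]:
        k += 1
    suffix = x[k:]
    y_level = 1 if y[0][0] == BLANK else 0  # 1: y is a lower strand
    if suffix and (0 if suffix[0][1] == BLANK else 1) == y_level:
        return None                          # that level is already written
    m = min(len(suffix), len(y))
    merged = []
    for i in range(m):
        pair = (suffix[i][0], y[i][1]) if y_level == 1 else (y[i][0], suffix[i][1])
        if pair[0] != negate(pair[1]):       # rho: a literal and its negation
            return None
        merged.append(pair)
    return x[:k] + tuple(merged) + suffix[m:] + y[m:]


def find_complete_computation(clauses, max_steps=6):
    """Return a blank-free string reachable from A in <= max_steps steps, or None."""
    lower, upper = strands(clauses)
    B = lower | upper
    frontier = set(B)                        # A = B_l u B_u
    for _ in range(max_steps):
        successors = set()
        for x in frontier:
            for y in B:
                z = stick(x, y)
                if z is None:
                    continue
                if all(BLANK not in symbol for symbol in z):
                    return z                 # a word of L(gamma)
                successors.add(z)
        frontier = successors
    return None


C1 = [{"q", "p"}, {"r", "~p"}, {"~q"}, {"~r"}]  # unsatisfiable
C2 = [{"q", "p"}, {"r", "~p"}]                  # satisfiable
print(find_complete_computation(C1) is not None)  # True
print(find_complete_computation(C2))              # None
```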
4. Propositional calculus considerations
In this section we justify the limitation to clauses with at most two literals and the formulation of Robinson's theorem, as a sufficient condition, in terms of sticker systems.
Let us consider the four possibilities for a formula:
a) with $\wedge$ in the conclusion;
b) with $\vee$ in the premise;
c) with $\vee$ in the conclusion;
d) with $\wedge$ in the premise.
a) For this case we consider the theorem
$$((p \to q) \wedge (p \to r)) \leftrightarrow (p \to q \wedge r).$$
This theorem says that, to prove a theorem such as "$p \to q \wedge r$", it is enough to prove "$p \to q$" and "$p \to r$". Both formulae "$p \to q$" and "$p \to r$" (or, as clauses, $\neg p \vee q$ and $\neg p \vee r$) have fewer literals than "$p \to q \wedge r$".
b) For this case let us consider the theorem
$$((p \to r) \wedge (q \to r)) \leftrightarrow (p \vee q \to r).$$
So, to prove a theorem such as "$p \vee q \to r$" it is enough to prove "$p \to r$" and "$q \to r$". Both these formulae have fewer literals than "$p \vee q \to r$".
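A truth-table sketch confirms that the two formulas of cases a) and b) hold in both directions, so splitting the goal loses nothing. Formulas are encoded as Python functions of three booleans; the names `implies` and `is_tautology` are hypothetical helpers.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def is_tautology(formula):
    """True if the three-variable formula holds under all 8 assignments."""
    return all(formula(p, q, r) for p, q, r in product([False, True], repeat=3))


def case_a(p, q, r):
    # ((p -> q) and (p -> r)) <-> (p -> q and r)
    return (implies(p, q) and implies(p, r)) == implies(p, q and r)


def case_b(p, q, r):
    # ((p -> r) and (q -> r)) <-> (p or q -> r)
    return (implies(p, r) and implies(q, r)) == implies(p or q, r)


print(is_tautology(case_a), is_tautology(case_b))  # True True
```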
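The cases a) and b) provide the following observation. If one of the formulae from the set $\{u_1, \ldots, u_n, \neg v\}$, let us say $u_i$, is of the form
$$p_1 \vee p_2 \vee \cdots \vee p_k \to q_1 \wedge q_2 \wedge \cdots \wedge q_l,$$
then this formula will introduce $k \cdot l$ clauses with two literals, of the form
$$\neg p_1 \vee q_1,\ \neg p_1 \vee q_2,\ \ldots,\ \neg p_k \vee q_l.$$
If the set of these clauses (for $u_i$), in union with the rest of the clauses (for $u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_n, \neg v$), is unsatisfiable, then we conclude that $\{u_1, \ldots, u_n, \neg v\}$ is unsatisfiable, and conversely.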
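The $k \cdot l$ translation can be generated and checked mechanically for small $k$ and $l$. The sketch below builds the clauses $\neg p_i \vee q_j$ and compares their conjunction with $p_1 \vee \cdots \vee p_k \to q_1 \wedge \cdots \wedge q_l$ on every assignment; the "~" notation for negation and the helper names are assumptions of the example.

```python
from itertools import product


def expand(ps, qs):
    """The k*l two-literal clauses ~p_i v q_j for (p_1 v ... v p_k) -> (q_1 ^ ... ^ q_l)."""
    return [("~" + p, q) for p in ps for q in qs]


def literal_true(literal, env):
    return not env[literal[1:]] if literal.startswith("~") else env[literal]


def same_truth_table(ps, qs):
    """Check that the formula and the conjunction of its clauses agree everywhere."""
    names = ps + qs
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        formula = (not any(env[p] for p in ps)) or all(env[q] for q in qs)
        clauses = all(any(literal_true(lit, env) for lit in c) for c in expand(ps, qs))
        if formula != clauses:
            return False
    return True


ps, qs = ["p1", "p2"], ["q1", "q2", "q3"]
print(len(expand(ps, qs)))       # 6
print(expand(ps, qs)[:2])        # [('~p1', 'q1'), ('~p1', 'q2')]
print(same_truth_table(ps, qs))  # True
```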
c) For this case we have the implication
$$((p \to q) \wedge (p \to r)) \to (p \to q \vee r).$$
Observe that the reverse implication is not a valid formula. As clauses, this formula can be rewritten as
$$((\neg p \vee q) \wedge (\neg p \vee r)) \to (\neg p \vee q \vee r).$$
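By the equivalence
$$((u \to F) \wedge (v \to F)) \leftrightarrow ((u \vee v) \to F) \qquad (\text{or } (\neg u \wedge \neg v) \leftrightarrow \neg(u \vee v)),$$
where $F$ denotes falsity, we can deduce that if a set of clauses $C = \{\neg p \vee q, \neg p \vee r, \ldots\}$ is unsatisfiable, then the set of clauses $C' = \{\neg p \vee q \vee r, \ldots\}$ is also unsatisfiable.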
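d) For this case we have the implication
$$((p \to r) \wedge (q \to r)) \to (p \wedge q \to r),$$
or, as clauses,
$$((\neg p \vee r) \wedge (\neg q \vee r)) \to (\neg p \vee \neg q \vee r).$$
Similarly to case c), we can deduce that if a set of clauses $C = \{\neg p \vee r, \neg q \vee r, \ldots\}$ is unsatisfiable, then the set of clauses $C' = \{\neg p \vee \neg q \vee r, \ldots\}$ is also unsatisfiable.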
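A truth-table sketch confirms the implications used in cases c) and d) and shows that their reverses fail. The source states the failure of the reverse explicitly for case c); the check for d) is included for comparison. Function names are hypothetical, and `counterexample` returns the first falsifying assignment in the order produced by `itertools.product`.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def counterexample(formula):
    """First assignment (p, q, r) falsifying formula, or None if it is valid."""
    for p, q, r in product([False, True], repeat=3):
        if not formula(p, q, r):
            return (p, q, r)
    return None


def c_forward(p, q, r):  # ((p -> q) and (p -> r)) -> (p -> q or r)
    return implies(implies(p, q) and implies(p, r), implies(p, q or r))


def c_reverse(p, q, r):  # (p -> q or r) -> ((p -> q) and (p -> r))
    return implies(implies(p, q or r), implies(p, q) and implies(p, r))


def d_forward(p, q, r):  # ((p -> r) and (q -> r)) -> (p and q -> r)
    return implies(implies(p, r) and implies(q, r), implies(p and q, r))


def d_reverse(p, q, r):  # (p and q -> r) -> ((p -> r) and (q -> r))
    return implies(implies(p and q, r), implies(p, r) and implies(q, r))


for f in (c_forward, c_reverse, d_forward, d_reverse):
    print(f.__name__, counterexample(f))
# c_forward None
# c_reverse (True, False, True)
# d_forward None
# d_reverse (False, True, False)
```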
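The cases c) and d) provide the following observation. If one of the formulae $\{u_1, \ldots, u_n, \neg v\}$, let us say $u_i$, is of the form
$$p_1 \wedge p_2 \wedge \cdots \wedge p_k \to q_1 \vee q_2 \vee \cdots \vee q_l,$$
then this formula will introduce $k \cdot l$ clauses with two literals, of the form $\neg p_i \vee q_j$ ($1 \leq i \leq k$, $1 \leq j \leq l$). If the set of these clauses (for $u_i$), in union with the set of clauses for $u_1, \ldots, u_{i-1}, u_{i+1}, \ldots, u_n, \neg v$, is unsatisfiable, then we conclude that $\{u_1, \ldots, u_n, \neg v\}$ is unsatisfiable (but not conversely).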
REFERENCES
1. Adleman, Molecular Computation of Solutions to Combinatorial Problems, Science, Vol. 266, 1994, pp. 1021.
2. Jacques Cohen, Computational Molecular Biology: a promising application using LP and its extensions, The LP paradigms: a 25-year perspective, ed. K. Apt, Springer, 1999.
3. L. Kari, Gh. Păun, G. Rozenberg, A. Salomaa, Sheng Yu, DNA computing, sticker systems, and universality, Acta Informatica 35, 1998, pp. 401-4
4. Gh. Păun, Splicing: a challenge for formal language theorists, Bulletin of EATCS, 57, 1995, pp. 183-194.
5. D. Tatar, The mathematical bases of computer science, Univ. Babeş-Bolyai, Cluj-Napoca, 1993 (in Romanian).
University "Babeş-Bolyai", Cluj-Napoca, Romania, Faculty of Mathematics and Computer Science, Department of Computer Science
E-mail address: dtatar@cs.ubbcluj.ro, moltean@cs.ubbcluj.ro