A Bountiful Harvest of Rainwater
OVER THOUSANDS OF YEARS, SOCIETIES HAVE developed a diversity of local water harvesting and management regimes that continue to survive in South Asia, Africa, and other parts of the world (1). Such systems are often integrated with agroforestry (2) and local forest management practices (3). In their Policy Forum "Managing water for people and nature" (Science's Compass, 11 May, p. 1071), Nels Johnson and co-authors discuss several market mechanisms for sustainable water management, including taxing users to pay commensurate costs of supply and distribution and costs of integrated watershed management, and charging polluters for effluent treatment. Although such measures are indeed essential, I would argue that they are insufficient: They should be complemented with policy innovations to promote rainwater harvesting (4).

Revival of local practices of rainwater harvesting could provide substantial amounts of water. For example, a hectare of land in Barmer, one of India's driest places, with 100 millimeters of rainfall annually, could yield 1 million liters of water per year from harvesting rainwater. Even with simple technology such as ponds and earthen embankments called tanks, at least half a million liters a year can be harvested from rain falling over 1 hectare of land, as is being done in the Thar Desert, making it the most densely populated desert in the world. Indeed, there are 1.5 million village tanks in use, sustaining everyday life in the 660,000 villages in India.
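The arithmetic behind these figures is simply rainfall depth times catchment area (1 mm of rain over 1 m² is 1 liter); a quick sketch, where the 50% tank efficiency is an assumption standing in for the letter's "at least half":

```python
def harvest_liters(rainfall_mm: float, area_m2: float, efficiency: float = 1.0) -> float:
    """Volume of rain falling on a catchment, in liters.

    1 mm of rain over 1 m^2 is exactly 1 liter; `efficiency`
    discounts for runoff and evaporation losses (assumed value).
    """
    return rainfall_mm * area_m2 * efficiency

# Barmer: 100 mm/year over 1 hectare (10,000 m^2)
ideal = harvest_liters(100, 10_000)            # 1,000,000 L/year upper bound
with_tanks = harvest_liters(100, 10_000, 0.5)  # >= 500,000 L/year with simple tanks
print(ideal, with_tanks)
```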
Letters to the Editor
Letters (~300 words) discuss material published in Science in the previous 6 months or issues of general interest. They can be submitted by e-mail (science_letters@aaas.org), the Web (www.letter2science.org), or regular mail (1200 New York Ave., NW, Washington, DC 20005, USA). Letters are not acknowledged upon receipt, nor are authors generally consulted before publication. Whether published in full or in part, letters are subject to editing for clarity and space.
In the Negev Desert, decentralized harvesting of water in microcatchments from rain falling over a 1-hectare watershed yielded 95,000 liters of water per hectare per year, whereas collection efforts from a single large unit from a 345-hectare watershed yielded only 24,000 liters per hectare per year (5). Thus, 75% of the collectible water was lost as a result of the longer distance of runoff. Indeed, this is consistent with local knowledge distilled in an Indian proverb, "Capture rain where it rains."
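The 75% figure can be checked directly from the two per-hectare yields:

```python
micro = 95_000   # L/ha/yr from 1-hectare microcatchments
large = 24_000   # L/ha/yr from the single 345-hectare watershed

# Fraction of the collectible water lost to the longer runoff distance
loss_fraction = (micro - large) / micro
print(f"{loss_fraction:.0%}")
```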
In the cities, rainwater could be harvested from building rooftops for residential use, and any surplus could be channeled through bore wells to replenish the groundwater,
[Photo caption: Villages in the Thar Desert in India, the world's most densely populated desert, rely heavily on the harvesting of rainwater for their daily needs.]
avoiding loss to runoff. However, if tanks and other rain harvesting technology are to be used to their full potential, policy innovations must include institutional changes so that such common-pool resources are effectively managed (6). Also, all forms of government subsidies need to be removed to allow market mechanisms, such as the ones Johnson et al. discuss, to run their course. Users would then find it prudent not only to make efficient use of priced water, but they would also have the incentive to collect the gift that Mother Nature has to offer in the form of rain.

DEEP NARAYAN PANDEY
Indian Forest Service, Indian Institute of Forest Management, Bhopal 462003, India. E-mail: dnpandey@vsnl.com or deep@inef.org
References and Notes
1. A. Agarwal, S. Narain, Eds., Dying Wisdom: Rise, Fall and Potential of India's Traditional Water Harvesting Systems (Centre for Science and Environment, New Delhi, 1997).
2. H. R. Wagachchi, K. F. Wiersum, Agrofor. Syst. 35, 291 (1997).
3. D. N. Pandey, Ethnoforestry: Local Knowledge for Sustainable Forestry and Livelihood Security (Himanshu/AFN, New Delhi, 1998).
4. T. M. Boers, J. Ben-Asher, Agric. Water Manag. 5, 145 (1982).
5. M. Evenari, L. Shanan, N. Tadmor, The Negev: The Challenge of a Desert (Harvard Univ. Press, Cambridge, ed. 2, 1982).
6. E. Ostrom et al., Science 284, 278 (1999).
Long-Term Storage of Information in DNA
IN THIS DIGITAL AGE, THE TECHNOLOGY USED for information storage is undergoing rapid advances. Data currently being stored in magnetic or optical media will probably become unrecoverable within a century or less, through the combined effects of hardware and software obsolescence and decay of the storage medium. New approaches are required that will permit retrieval of information stored for centuries or even millennia.

DNA has three properties that recommend it as a vehicle for long-term information storage. First, DNA has stood the informational "test of time" during the billions of years since life emerged. Nonreplicating DNA molecules are also quite robust. Although DNA stored under nonideal conditions (e.g., in archaeological deposits) is subject to hydrolytic and oxidative damage (1), mitochondrial DNA extracted and amplified from 7000-year-old human remains yielded an accurate DNA sequence (2). Storage of DNA under more favorable conditions can result in extremely long stability, as evidenced by the reported recovery of viable bacteria from 250-million-year-old salt crystals (3). Second, because DNA is our genetic material, methods for both storage and reading of DNA-encoded information should remain central to technological civilizations and undergo continual improvements. Third, use of DNA as a storage medium would permit each segment of information to be stored in an enormous number of identical molecules. This extensive informational redundancy would strongly mitigate effects of any losses due to stochastic decay (4).

Data retrieval of information stored in DNA should ideally require minimal prior knowledge beyond a familiarity with molecular biological techniques. In the procedure we have developed, two standard techniques are required for recovery of stored information: polymerase chain reaction (PCR) and DNA sequence analysis.
www.sciencemag.org SCIENCE VOL 293 7 SEPTEMBER 2001
SCIENCE'S COMPASS
Central to our procedure is the use of two classes of DNA (see the figure): information DNAs (iDNAs) containing the stored information, and a single polyprimer key (PPK) that is the key to retrieving the information stored in the iDNAs. Each iDNA contains the following sequence elements: common flanking forward (F) and reverse (R) PCR amplification primers (~10 to 20 bases long), a unique sequencing primer (comparable in size to the F and R primers), a small common spacer (~3 to 4 bases long) serving as a cue to indicate the start of the stored information, and a unique information segment. The information to be stored is encoded successively in these information segments, beginning with Information 1. The PPK is also flanked by the common F and R primers and contains, in the proper order, the unique sequencing primers for the ordered retrieval from each iDNA of its information segment sequence. Common spacer sequences indicate the demarcations between each sequencing primer. Each information segment should be capable of encoding any possible data (e.g., text), whereas correct readout requires that each sequencing primer prime a sequencing reaction only from the appropriate position within a specific iDNA, and not misprime on any iDNA. Various approaches can be taken to satisfy these conditions (5).
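The layout just described can be sketched as simple string concatenation. The primer and spacer sequences below are hypothetical placeholders (the sequences actually used in the prototype are in the supplementary material), and each DNA element is modeled simply as a string:

```python
# Hypothetical placeholder sequences, not the prototype's actual sequences.
F_PRIMER = "ACGTACGTACGTACGTACGT"   # common forward PCR primer (~20 bases)
R_PRIMER = "TGCATGCATGCATGCATGCA"   # common reverse PCR primer
SPACER   = "GGGG"                   # small common spacer marking the info start

def make_idna(seq_primer: str, information: str) -> str:
    """iDNA: F primer, unique sequencing primer, spacer, information, R primer."""
    return F_PRIMER + seq_primer + SPACER + information + R_PRIMER

def make_ppk(seq_primers: list[str]) -> str:
    """PPK: the sequencing primers in retrieval order, spacer-separated,
    flanked by the same common F and R primers as the iDNAs."""
    return F_PRIMER + SPACER.join(seq_primers) + R_PRIMER

idna1 = make_idna("ACTGACTGACTGACTGACTG", "AAACAAC")   # carries Information 1
ppk = make_ppk(["ACTGACTGACTGACTGACTG", "TCAGTCAGTCAGTCAGTCAG"])
```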
To retrieve the stored information, a future reader would proceed as follows. First, the PPK is amplified and sequenced, by using co-stored F and R primers. PCR amplification would yield amounts of the PPK sufficient for further analysis, even if extensive degradation or modification had occurred during storage. Sequence analysis of the entire PPK would reveal the sequences of the F and R PCR primers, plus an internal ordered series of elements of comparable size, suggesting roles for these elements as sequencing primers. This interpretation would lead the reader to the second step, sequence analysis of the information segments. Assuming that the F and R primers were interpreted to be "universal" PCR primers, the reader would use these for simultaneous PCR amplification of all of the collectively stored iDNAs. Sequential use of each sequencing primer to prime a sequencing reaction on the collection of PCR products would then yield the sequence of each of the information segments, arranged in the proper order to be decoded and read as a continuous block.

We have carried out a simple prototype of this technique, using only the bases A, C, and T to encode English text based on an "obvious" ternary code (5, 6). However, even if this encoding were not obvious to a future reader, recovery of a sufficient number of information segment sequences would permit use of standard cryptanalytical techniques to determine how text had been encoded in the DNA. Two iDNAs were constructed to encode, respectively, "IT WAS THE BEST OF TIMES IT WAS THE WORST OF TIMES" and "IT WAS THE AGE OF FOOLISHNESS IT WAS THE EPOCH OF BELIEF." Not only is this text one of the most famous opening lines of a novel (7), but the fourfold repetition of the phrase "it was the" provided a test of the ability of this approach to deal with repeated DNA sequences both within and between information segments. We simultaneously amplified by PCR the two iDNAs, and then sequenced each of the products; decoding of the resultant DNA sequences successfully recovered the stored text (8).

The two standard molecular biological techniques used in our model for information storage in DNA could form the basis for a variety of DNA-based memory storage devices (9). Moreover, the use of DNA microarray technology should permit extensive scale-up of this model. Current microarray technology, in which up to 10,000 small DNA samples can be spotted in an ordered array onto an approximately 3-square-centimeter surface (glass, etc.) (10), would have to be modified to permit spotting of DNA into small wells at a comparable density on a "microchip." Use of such a microchip for storage would impose two levels of order on the information stored in iDNAs placed in these microwells: the x and y coordinates of each microwell, plus the order within each microwell provided by the scheme described here. A single series of unique identification primers, encoded within a single PPK, should suffice to order the collection of iDNAs within every microwell and thus permit readout in the proper order of the information stored on the entire chip. Because of the enormous number of different potential 20-base primer sequences (11), the capacity for information storage in microarrayed DNA is presently limited by practical rather than theoretical considerations. It seems reasonable that with minor advances in microarray technology, about 200 novels or other data each equivalent in size to A Tale of Two Cities could be stored in a DNA microchip with the area of a postage stamp (12). Ongoing technological advances should greatly increase this capacity.

[Figure: Structures of DNA molecules used for information storage and readout. The single polyprimer key contains a series of sequencing primer sequences (Seq Primer #) flanked by common forward (F Primer) and reverse (R Primer) PCR primer sequences, each separated by a small common spacer. Each information DNA is also flanked by the common PCR primer sequences and contains two unique elements, separated by the common spacer: a Seq Primer and a numbered information segment (Information #) (not drawn to scale).]

CARTER BANCROFT,1* TIMOTHY BOWLER,2 BRIAN BLOOM,1 CATHERINE TAYLOR CLELLAND1
Departments of 1Physiology and Biophysics and 2Biochemistry and Molecular Biology, Mount Sinai School of Medicine, New York, NY 10029, USA
*To whom correspondence should be addressed. E-mail: carter.bancroft@mssm.edu
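The two-step readout described above can be simulated in a few lines. The primer and spacer sequences here are hypothetical placeholders, "sequencing" is idealized as exact string matching, and the element lengths (20-base primers, 4-base spacer) are assumed known to the reader, as the PPK's internal structure would suggest:

```python
# Hypothetical placeholder sequences; not the prototype's actual primers.
F = "ACGTACGTACGTACGTACGT"
R = "TGCATGCATGCATGCATGCA"
SPACER, PRIMER_LEN = "GGGG", 20

def read_ppk(ppk: str) -> list[str]:
    """Step 1: sequence the PPK to recover the ordered sequencing primers."""
    inner = ppk[len(F):-len(R)]
    stride = PRIMER_LEN + len(SPACER)
    return [inner[i:i + PRIMER_LEN] for i in range(0, len(inner), stride)]

def read_segments(ppk: str, idnas: list[str]) -> list[str]:
    """Step 2: for each primer in PPK order, 'prime' on the iDNA carrying it
    and read the information segment between the spacer and the R primer."""
    segments = []
    for primer in read_ppk(ppk):
        for idna in idnas:
            if primer in idna:
                start = idna.index(primer) + PRIMER_LEN + len(SPACER)
                segments.append(idna[start:-len(R)])
    return segments

sp1, sp2 = "ACTGACTGACTGACTGACTG", "TCAGTCAGTCAGTCAGTCAG"
ppk = F + sp1 + SPACER + sp2 + R
# iDNAs stored in no particular order; the PPK restores the proper order.
idnas = [F + sp2 + SPACER + "CCCTTT" + R,   # Information 2
         F + sp1 + SPACER + "AAACCC" + R]   # Information 1
print(read_segments(ppk, idnas))  # ['AAACCC', 'CCCTTT']
```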
References and Notes
1. M. Hoss, P. Jaruga, T. H. Zastawny, M. Dizdaroglu, S. Paabo, Nucleic Acids Res. 24, 1304 (1996).
2. S. Paabo, J. A. Gifford, A. C. Wilson, Nucleic Acids Res. 16, 9775 (1988).
3. R. H. Vreeland, W. D. Rosenzweig, D. W. Powers, Nature 407, 897 (2000).
4. Studies of ancient human remains provide information on long-term rates of DNA decay and/or modification, and thus a highly conservative estimate of minimum DNA amounts required for prolonged storage under more ideal conditions. About 0.1% of the DNA extracted from ancient, decomposed tissue is unmodified [(1); S. Paabo et al., J. Biol. Chem. 264, 9709 (1989)]. However, as little as 100 to 300 femtograms of this unmodified DNA can serve as a PCR template (2). In the prototype we have executed, information is stored in 20 nanograms (20,000 picograms) of identical DNA molecules ~250 base pairs in size [see text and (8)], far above the 100-picogram range. Moreover, use of this large number of molecules (about 80 billion) as PCR templates in our readout procedure should greatly suppress effects of any base modifications during prolonged storage that yield sequence changes in individual DNA molecules [O. Handt et al., Am. J. Hum. Genet. 59, 368 (1996)].
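The "about 80 billion" molecule count in note 4 can be reproduced from the stated mass and length, assuming an average molecular weight of roughly 650 g/mol per base pair of double-stranded DNA (a textbook approximation, not a figure given in the note):

```python
AVOGADRO = 6.022e23   # molecules per mole
MW_PER_BP = 650.0     # g/mol per base pair of dsDNA (approximate, assumed)

mass_g = 20e-9        # 20 nanograms of identical molecules
length_bp = 250       # ~250 base pairs per molecule

moles = mass_g / (length_bp * MW_PER_BP)
molecules = moles * AVOGADRO
print(f"{molecules:.1e}")  # on the order of 10^10-10^11, i.e. tens of billions
```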
5. In the simplest model, two bases (e.g., A and T) would be used to encode text in information segments, and the other two (e.g., G and C) to construct sequencing primers. This would prevent mishybridization of sequencing primers to information segments, but would greatly limit efficiency of text storage and the number of possible different sequencing primers. In the prototype we have executed, we have instead encoded text using only the bases A, C, and T. Sequencing primers were designed with all four bases, plus a requirement that each fourth position be a G. The resultant mismatch at (at least) each fourth position between the sequences of any sequencing primer and any information segment should prevent mispriming. Scale-up of the storage model presented here would ultimately require computer-generated design [see, for example, M. Garzon et al., in DNA Based Computers V, volume 54 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, E. Winfree, D. K. Gifford, Eds. (American Mathematical Society, Providence, RI, 2000), p. 91] of large numbers of both sequencing primers and iDNA sequences that satisfy the constraints on these elements.
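The mispriming guarantee in note 5 can be checked mechanically: a primer carrying G at every fourth position must mismatch a G-free information segment at least at those five positions, at every possible alignment. A sketch with an arbitrary primer satisfying the constraint:

```python
import random

random.seed(1)
# Sequencing primer: all four bases allowed, but every 4th position is G.
primer = "".join(random.choice("ACGT") if (i + 1) % 4 else "G" for i in range(20))
# Information segment: text is encoded with A, C, T only, so it contains no G.
segment = "".join(random.choice("ACT") for _ in range(200))

def min_mismatches(primer: str, segment: str) -> int:
    """Fewest mismatches between the primer and any same-length window."""
    return min(sum(a != b for a, b in zip(primer, segment[i:i + len(primer)]))
               for i in range(len(segment) - len(primer) + 1))

# Every window mismatches at least at the five fixed-G positions,
# so the primer cannot anneal perfectly anywhere in the segment.
print(min_mismatches(primer, segment) >= 5)  # True
```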
6. The DNA bases were ordered alphabetically (A, C, T). DNA codons were then constructed by means of a ternary code, beginning with "AAA" (encoding the letter "A"). The bases C, and then T, were inserted progressively into the third, second, and first positions, yielding a series of 27 codons encoding the English letters in alphabetical order, plus a space (8).
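The codon construction in note 6 amounts to counting in ternary with digits ordered A < C < T; a sketch that rebuilds the 27-codon table and round-trips the Dickens text:

```python
from itertools import product

BASES = "ACT"                                 # ordered alphabetically, per note 6
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "      # 26 letters + space = 27 symbols

# product() in this base order yields AAA, AAC, AAT, ACA, ... TTT: exactly the
# "insert C, then T, into the third, second, first positions" scheme.
CODON = {letter: "".join(c) for letter, c in zip(ALPHABET, product(BASES, repeat=3))}
DECODE = {v: k for k, v in CODON.items()}

def encode(text: str) -> str:
    return "".join(CODON[ch] for ch in text.upper())

def decode(dna: str) -> str:
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

print(CODON["A"], CODON[" "])                 # AAA TTT
print(decode(encode("IT WAS THE BEST OF TIMES")))
```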
7. C. Dickens, A Tale of Two Cities [Oxford Univ. Press, London, New York, 1953 (originally published in 1859)].
8. Supplementary material is available at http://www.sciencemag.org/cgi/content/full/293/5536/1763/DC1
9. The combined operations of PCR followed by sequence analysis are analogous to the retrieval of information from an addressable storage device such as the random access memory in a computer. The ability to use these combined operations to retrieve data permits construction of DNA representations of classical computer data structures such as arrays, linked lists, and trees. The model depicted in the figure is somewhat analogous to an array data structure. The PPK contains the addresses of the data elements (the iDNA segments), which can be calculated (sequenced) and then used for selective retrieval of the stored data. In an alternative serial model, analogous to a linked list, a series of iDNAs could be designed, each containing both a data element and the sequencing primer for retrieving the data element from the succeeding iDNA. Such a serial model would obviate the need for a separate PPK, but information retrieval would require considerably more experimental manipulations (and prior specific knowledge) than in the parallel model explored here.
10. D. Gerhold, T. Rushmore, C. T. Caskey, Trends Biochem. Sci. 24, 168 (1999).
11. There is a theoretical maximum of 4^20, but this number would be reduced by the requirement that primer sequences be designed to avoid mispriming at inappropriate sites.
12. A conservative upper limit on the size of the information segment is ~600 bases, set by the present limits on DNA sequence obtainable from a single sequencing primer. If four-base codons chosen from our three-base alphabet (A, C, T) were used (to permit encoding of all common English alphanumeric characters plus a space), each iDNA could store about 150 characters. Storage of A Tale of Two Cities, containing 742,901 alphanumeric characters plus spaces, would require ~5000 iDNAs. Current technology would permit single-pass sequence analysis of a single PPK containing up to 100 unique sequencing primers, implying that information could be stored in ~100 different iDNAs per microwell. Since 50 microwells would thus be required to store Dickens' novel in DNA form, a 10,000-well microchip could store ~200 texts.
13. Supported by Defense Advanced Research Projects Agency/National Science Foundation grant CCR-9724012.
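The capacity estimates in note 12 reduce to a few lines of arithmetic:

```python
import math

chars_per_idna = 600 // 4    # ~600-base segment, 4-base codons -> 150 characters
novel_chars = 742_901        # A Tale of Two Cities, characters plus spaces
idnas_needed = math.ceil(novel_chars / chars_per_idna)      # ~5000 iDNAs
idnas_per_well = 100         # one PPK resolves up to 100 sequencing primers
wells_per_novel = math.ceil(idnas_needed / idnas_per_well)  # 50 microwells
texts_per_chip = 10_000 // wells_per_novel                  # ~200 texts per chip
print(chars_per_idna, idnas_needed, wells_per_novel, texts_per_chip)
```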
The Challenge of Defining Disease
THE PHILOSOPHICAL DEFINITION OF DISEASE based on impairment or limitation of normal function that Boorse (1) proposed is rejected as being clinically impractical by L. K. F. Temple and co-authors in their Essay "Defining disease in the genomics era"
(Science's Compass, 3 Aug., p. 807). However, the definition they offer seems too broad and also fraught with its own set of difficulties. Temple et al. write that "disease is a state that places individuals at increased risk of adverse consequences." According to this definition, activities such as mountain climbing or bungee jumping could be construed as disease states. If we could provide appropriate modifiers for the words "state" and "adverse consequences," then we would be better poised to begin a more precise definition of disease in the genomics era.
The Mammalian Genotyping Service is funded by the National Heart, Lung, and Blood Institute to assist in linkage mapping of genes which cause or influence disease, and for other research purposes. Genotyping is carried out using whole genome polymorphism scans at Marshfield, Wisconsin, under the direction of Dr. James Weber. Capacity of the Service is currently about 7,000,000 genotypes (DNA samples times polymorphic markers) per year and growing. Although the Service was initially established for genetic projects dealing with heart, lung, and blood diseases, the Mammalian Genotyping Service will now consider all meritorious applications. Genome scans for humans, mice, rats, dogs, and zebrafish are available.

To ensure the most promising projects are undertaken, investigators must submit a brief application, which will be evaluated by a scientific advisory panel. At this time, only projects with at least 10,000 genotypes will be considered. DNA samples must be in hand at the time of application. Most genotyping within the Service is currently done with multiallelic STRPs (microsatellites). However, genotyping with human diallelic polymorphisms has been initiated and will likely expand. There are no genotyping fees for approved projects. The Service is funded through September 2006. Application deadlines are every six months.