Article (Literature Review)

DNA as a digital information storage device: hope or hype?

Authors:
  • ICAR-National Rice Research Institute

Abstract

The total digital information today amounts to 3.52 × 10²² bits globally, and at its consistent exponential rate of growth is expected to reach 3 × 10²⁴ bits by 2040. The data storage density of silicon chips is limited, and the magnetic tapes used to maintain large-scale permanent archives begin to deteriorate within 20 years. Because silicon has limited storage capacity and serious drawbacks, such as human health hazards and environmental pollution, researchers across the world are searching intently for an appropriate alternative. Deoxyribonucleic acid (DNA) is an appealing option because of its endurance, its higher degree of compaction, and its similarity to the sequential code of 0s and 1s used in a computer. This emerging field of DNA as a means of data storage has the potential to transform science fiction into reality, wherein a device that fits in our palms can accommodate the information of the entire world: the latest research has revealed that just four grams of DNA could store the annual global digital information. DNA has the properties needed to supersede the conventional hard disk, as it is capable of retaining ten times more data, has a thousandfold higher storage density, and consumes 10⁸ times less power to store a similar amount of data. Although DNA has enormous potential as a data storage device of the future, multiple bottlenecks such as exorbitant costs, excruciatingly slow writing and reading mechanisms, and vulnerability to mutations or errors need to be resolved. In this review, we critically analyze the emergence of DNA as a molecular storage device for the future, its ability to address the future digital data crunch, potential challenges in achieving this objective, various current industrial initiatives, and major breakthroughs.


... Smart contracts within the blockchain ecosystem further streamline processes, ensuring faster and more efficient transactions. Given the substantial volume of data generated from human genome sequencing, an alternative solution has emerged involving the physical storage of data, specifically storing it in DNA [3] [4]. However, this technology has certain inherent drawbacks, notably in data retrieval. ...
... Using the same technique used to encode the data, this artificial DNA can be obtained and decoded back to its original form, binary code [3] [4]. Assume that the nucleotide bases A, T, G, and C are encoded into or from binary digits as follows: A = 01, T = 00, G = 10, C = 11 [International Journal of Biotechnology & Bioengineering, 2024]. For instance: an image file's binary is acquired. Using the codes above, this binary will be encoded to A, T, G, and C. ...
... The 'storage issue' may be substantially resolved with this storage method. To fully appreciate the potential of this technology, consider that by 2040, one kilogram of DNA will be enough to fulfill the world's storage demands, which are projected to be around 3 × 10²⁴ bits [4]. However, there are some major drawbacks associated with this technology that cannot be overlooked. ...
Preprint
Full-text available
Individual genome projects have not been discussed in quite some time, possibly for various reasons. However, despite best efforts, there remains a need to establish a means by which individuals can benefit from the reference genome—a significant outcome of the Human Genome Project—and effectively utilize it. In light of this, a personalized genomic application has been visualized as a potent transformative tool accessible to people globally. An integrated framework has been proposed that melds blockchain technology, genome-tailored applications, and data compression techniques to ensure swift, secure, transparent, and space-efficient operations. This software leverages advanced Artificial Intelligence and Machine Learning methodologies, including neural networks, fuzzy logic, and expert systems, to process individual genomic data. Reports are then generated based on insights gleaned from the user's genome, compared to a reference genome to detect any disparities. Given the security, encryption, and immutability features of blockchain technology, it proves apt for both data transport and storage. Moreover, a technique called 'Data Abbreviation' has been devised to ensure that genetic data and generated reports occupy less space. All of this could contribute to a transformative step toward the greater good of humanity.
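The two-bit mapping quoted in the excerpt above (A = 01, T = 00, G = 10, C = 11) can be illustrated with a minimal Python sketch. The function names `encode` and `decode` are hypothetical and are not taken from the cited preprint; the sketch also assumes error-free synthesis and sequencing, which real DNA storage pipelines cannot assume.

```python
# Two-bit-per-base mapping as quoted in the excerpt: A=01, T=00, G=10, C=11.
BITS_TO_BASE = {"01": "A", "00": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a strand of A/T/G/C, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # ATGTAGGA
    print(decode(strand))  # b'Hi'
```

Each byte becomes exactly four bases, so the round trip is lossless as long as no base is substituted, inserted, or deleted.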
... Traditional storage technology has struggled to meet the growing demand for high-speed data storage in the future. In contrast, the DNA molecule, as a carrier of life information, has unique advantages in terms of storage time, stability and energy consumption [5,6]. Utilizing DNA for image data storage is a novel approach that is expected to develop into a cost-effective and highly stable solution. [Kun Bi, bik@seu.edu.cn; State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China] ...
Article
Full-text available
The increasing demand for image data storage exceeds the capabilities of current technology, and DNA, as an emerging storage medium, is expected to resolve the challenge of storing massive image data. Recent DNA image storage methods adopted fixed code tables and error-correcting codes without leveraging image characteristics, leading to low coding density, low security, and poor reconstruction. In this paper, according to the characteristics of images and DNA storage, we propose a method called DNA Lossy Storage Image Encryption and Denoising (DNA-LSIED), which focuses on image security and high-quality image reconstruction. First, DNA-LSIED converts the pixel matrix into DNA sequences using a chaotic encryption algorithm to balance the GC content. Then, it employs a maximum-probability insertion and deletion strategy to convert base insertion and deletion errors into substitution errors, which are dispersed and converted into noise on the image through data interleaving. Finally, median filtering is applied to remove the noise caused by base substitution errors and reconstruct high-quality images. Compared to other methods, DNA-LSIED has significant advantages in terms of reconstructed image quality, coding density and synthesis cost, which contributes to cost savings and large-scale applications. DNA-LSIED provides new insight into the storage of images in DNA and combines biotechnology with computer technology, which facilitates interdisciplinary applications.
... DNA storage technology refers to the technology in which information such as documents, images, and audio is stored and completely read using artificially synthesized deoxynucleotide chains. Compared with traditional electronic information storage, DNA storage has the advantages of large capacity, high density, and low energy consumption [1][2][3]. In an era where data storage demands are growing exponentially, DNA is expected to become a potential storage medium to replace traditional storage devices. ...
Article
Full-text available
The technology of DNA storage uses artificially synthesized deoxynucleotide chains to store information, ensuring precise and error-free reading. Compared with traditional electronic information storage, it has advantages in terms of capacity, density, and energy consumption. In this study, a K-Means clustering model is constructed with the aim of accurately clustering DNA sequences after DNA storage sequencing. To objectively evaluate the effectiveness of the model, clustering results are compared in detail with the correct DNA sequences. Experimental data show that when processing 100,000 DNA storage sequencing sequences, the accuracy of the model exceeds 90%, and the entire clustering process only takes 10 seconds. This result fully demonstrates the important role of the K-Means model in restoring original information sequences in DNA storage and provides a solid theoretical and practical foundation for future research.
... Furthermore, the biochemical reactions and inherent operations of DNA molecules demonstrate substantial parallelism. Despite current drawbacks such as high read/write costs and slow read/write speeds, DNA is still considered the optimal choice for storing over 60% of global cold data, making it a potential storage medium for current cloud storage applications [7]. Vast amounts of sensor data, images, and video streams are expected to be encoded and stored in tiny DNA molecules, achieving extremely high information density. ...
Preprint
In the wake of the swift evolution of technologies such as the Internet of Things (IoT), the global data landscape undergoes an exponential surge, propelling DNA storage into the spotlight as a prospective medium for contemporary cloud storage applications. This paper introduces a Semantic Artificial Intelligence-enhanced DNA storage (SemAI-DNA) paradigm, distinguishing itself from prevalent deep learning-based methodologies through two key modifications: 1) embedding a semantic extraction module at the encoding terminus, facilitating the meticulous encoding and storage of nuanced semantic information; 2) conceiving a forethoughtful multi-reads filtering model at the decoding terminus, leveraging the inherent multi-copy propensity of DNA molecules to bolster system fault tolerance, coupled with a strategically optimized decoder's architectural framework. Numerical results demonstrate the SemAI-DNA's efficacy, attaining 2.61 dB Peak Signal-to-Noise Ratio (PSNR) gain and 0.13 improvement in Structural Similarity Index (SSIM) over conventional deep learning-based approaches.
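For readers unfamiliar with the metric, the reported PSNR gain can be put in context with the standard definition, PSNR = 10·log10(MAX² / MSE). The sketch below is a generic illustration of that formula, not the SemAI-DNA evaluation code; the function name `psnr` and the flat pixel-list input are assumptions made for brevity.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (gain_db / 10)
```

Under this definition, a gain of 2.61 dB corresponds to the mean squared error shrinking by a factor of roughly 1.8.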
... DNA is a stable and efficient molecule for storing genetic information and is particularly attractive for space-based biomanufacturing, where products can be synthesized when and where they are needed [2]. However, the space environment poses significant challenges to the stability and functionality of DNA-based systems because exposure to cosmic radiation, microgravity, and extreme temperatures can cause DNA damage, strand breaks, and other genetic changes [3−7]. In addition, space environmental factors can also affect the gene expression and protein synthesis machinery of cells, impacting the functionality and efficiency of DNA-based systems. ...
Article
Full-text available
Effective transport of biological systems as cargo during space travel is a critical requirement to use synthetic biology and biomanufacturing in outer space. Bioproduction using microbes will drive the extent to which many human needs can be met in environments with limited resources. Vast repositories of biological parts and strains are available to meet this need, but their on-site availability requires effective transport. Here, we explore an approach that allows DNA plasmids, ubiquitous synthetic biology parts, to be safely transported to the International Space Station and back to the Kennedy Space Center without low-temperature or cryogenic stowage. Our approach relied on the cyanobacterium Nostoc punctiforme PC73102, which is naturally tolerant to prolonged desiccation. Desiccated N. punctiforme was able to carry the non-native pSCR119 plasmid as intracellular cargo safely to space and back. Upon return to the laboratory, the extracted plasmid showed no DNA damage or additional mutations and could be used as intended to transform the model synbio host Escherichia coli to bestow kanamycin resistance. This proof-of-concept study provides the foundation for a ruggedized transport host for DNA to environments where there is a need to reduce equipment and infrastructure for biological parts stowage and storage.
... The world of nanoscience is quickly evolving [1−4]. As DNA-related costs (synthesis and sequencing) are increasingly accessible, nanotechnological methods using DNA, and their applications, are burgeoning and far-reaching [5], ranging from well-known, rather simple applications (e.g., barcoding and data storage) [6,7] to highly complex ones (e.g., random number generation, logic gates and circuits, and cryptography) [8]. One deleterious limiting factor is the intrinsic nature of DNA that requires aqueous working environments. ...
Article
Full-text available
The present work describes a complete and reversible transformation of DNA’s properties allowing solubilization in organic solvents and subsequent chemical modifications that are otherwise not possible in an aqueous medium. Organo-soluble DNA (osDNA) moieties are generated by covalently linking a dsDNA fragment to a polyether moiety with a built-in mechanism, rendering the process perfectly reversible and fully controllable. The precise removal of the polyether moiety frees up the initial DNA fragment, unaltered, both in sequence and nature. The solubility of osDNA was confirmed in six organic solvents of decreasing polarity and six types of osDNAs. As a proof of concept, in the context of DNA-encoded library (DEL) technology, an amidation reaction was successfully performed on osDNA in 100% DMSO. The development of osDNA opens up entirely new avenues for any DNA applications that could benefit from working in nonaqueous solutions, including chemical transformations.
... With the exponential growth of internet data, the demand for efficient and high-quality storage media is increasingly evident (Zhirnov et al., 2016). Mainstream storage media include magnetic storage, optical storage, and semiconductor storage (Kun et al., 2020), all of which suffer from issues such as short storage lifespan and the need for high energy consumption to maintain stored content (Williams et al., 2002; Goda and Kitsuregawa, 2012; Extance, 2016; Panda et al., 2018). For instance, astronomical data records require the use of tons of heavy hard drives, and the lifespan of these hard drives is limited by their read/write cycles. ...
Article
Full-text available
DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base data meeting biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for storing Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any strand of DNA, even one with excessively long homopolymers. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
... Its structure comprises long twisted strings of sugar-phosphate chains connected through AT/GC base pairs (A: adenine, T: thymine, G: guanine and C: cytosine); the sequence of AT and GC is analogous to the coding of binary numbers in computers [28,29]. It is the best nanowire in natural existence, the cost of individual nucleobases is a few US cents, and it has benefits like self-assembly, self-replication and the capacity to adopt various conformers. Intrinsic DNA is electrically poorly conducting (insulating), but insertion of metal ions within the base pairs of this molecule appreciably enhances its electrical conductivity and hence makes such materials suitable in the design of bioelectronics. ...
Article
Full-text available
Green electronics, where functional organic/bio-materials that are biocompatible and easily disposable are implemented in electronic devices, have gained profound interest. DNA is the best biomolecule in existence that shows data storage capacity, in virtue of the sequential arrangement of AT and GC base pairs, analogous to the coding of binary numbers in computers. In the present work, a robust, uniform and repeatable room-temperature resistive switching in a Cu/Cu2S/DNA/Au heterojunction is demonstrated. The DNA nanostructures were anchored on the densely packed hexagonal Cu2S structures by simple electrochemical deposition. This heterostructure presents outstanding memristor behavior; the device exhibits resistive switching at a very low threshold voltage of 0.2 V and has a relatively high ON/OFF ratio of more than 10² with a good cycling stability of ∼1000 cycles and a negligible amount of variation. The justification for such a switching mechanism is also given on the basis of the energy-band diagram of the Cu2S–DNA interface. Based on the studies herein, the resistive switching is attributed to the reversible doping of DNA by Cu⁺ ions, leading to intrinsic trap states. Further, the switching is modeled with the help of different transport mechanisms, like Schottky-barrier emission, Poole–Frenkel emission and Fowler–Nordheim tunneling.
... DNA is made up of four nucleotides, whose bases are thymine (T), cytosine (C), guanine (G), and adenine (A). Therefore, DNA can be used as a storage medium for any kind of information [33]. The property of hybridization between the complementary DNA nucleotide bases A-T and C-G is exploited in the biomolecular computing field as a central process of computation. ...
Article
Full-text available
To address the need for secure digital image transmission, an algorithm that fulfils all prominent prerequisites of a steganography technique is developed. By incorporating the salient features of fractal cover images, dual-layer encryption using the standard chaotic map and DNA-hyperchaotic cryptography, and DWT-SVD embedding, this work targets key aspects such as robustness, better perceptual quality and high payload capacity to build a blind colour image steganography algorithm. A fractal cover image is used to hide a DNA-chaotic encrypted colour image using the DWT-SVD embedding method. A two-dimensional standard chaotic map, which exhibits robust chaos for a very large range of parameters, is used to generate pseudo-random number sequences of cryptographic quality. One core novelty of the proposed method is the two-layer chaotic encryption used to generate the DNA-encrypted secret image, which is finally embedded in a fractal cover image using a DWT-SVD transform-domain technique capable of withstanding the false positive attack. Comprehensive statistical security tests and standard evaluation benchmarks show that this efficient yet simple hybrid steganography algorithm is highly robust and sustainable against removal, geometrical, image enhancement and histogram attacks, offers better perceptual image quality, and also yields high perceptual quality of the extracted image.
... It has extremely high data storage density, remains stable for hundreds of years, and requires very little power [36,37]. Surely, there are a few engineering obstacles to be conquered before DNA can be used as a mass data storage device [38]. ...
Article
Full-text available
Constrained coding is a somewhat nebulous term which we may define by either inclusion or exclusion. A constrained system is defined by a constrained set of 'good' or 'allowable' sequences to be recorded or transmitted. Constrained coding focuses on the analysis of constrained systems and the design of efficient encoders and decoders that transform arbitrary user sequences into constrained sequences. Constrained coding has been used extensively since the advent of digital storage and communication devices in the 1950s. Constrained codes have found application in all hard disks, non-volatile memories, and optical discs such as CD, DVD and Blu-Ray Disc, and they are now projected for use in DNA-based storage. We survey the theory and practice of constrained coding, tracing the evolution of the subject from its origins in Shannon's classic 1948 paper to present-day applications in DNA-based data storage systems.
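As an illustration of what an 'allowable' sequence might mean for DNA storage, the sketch below checks two constraints that the abstracts on this page mention: balanced GC content and a limit on homopolymer runs. The thresholds (`max_run=3`, GC fraction between 0.4 and 0.6) and all function names are assumptions chosen for the example, not values taken from the survey.

```python
def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def satisfies_constraints(strand: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """True if the strand meets both illustrative constraints (assumed thresholds)."""
    return (longest_homopolymer(strand) <= max_run
            and gc_low <= gc_content(strand) <= gc_high)
```

A constrained encoder would only ever emit strands for which such a check passes; this sketch covers the checking side only and does not construct an encoder.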
... NGS represents a whole group of different sequencing methods, two of which are in wide commercial use today [5] [7]. Table 1. Tabular comparison of DNA capacity with existing approaches based on several characteristics [3]. ...
Article
The need for new alternatives for storing ever-growing amounts of data is evident. Storing data in DNA molecules is emerging as one option that has numerous advantages over existing possibilities. The impact it may have on the entire field of data storage, how it can be realized, and its advantages and drawbacks are analyzed in detail in this paper.
... Suffice to say that this has interestingly allowed the appreciation of quantum molecular interactions and information flow through dynamic signal processing and precision recordings that are only recently becoming accessible through quantum computing. The potential for quantum biology using DNA nanotechnologies like DNA origami [13], DNA for quantum information processing and personalized cryptographic encoding for healthcare data using DNA makes this an even more attractive area of recent research focus [14]. ...
... [33] In biomedical, luminescent, and the production of antibodies, nuclear coordination polymer materials have been used alongside; this analysis also uses artificial data storage devices. [34] Coordinating substances vary from coordination chemistry in general from crystal architecture and engineering. The frequency of the interactions gives the metal-organic compounds a degree of structural modularity. ...
Preprint
Full-text available
In biological systems, chirality is an important property, from small molecules to macromolecules. The construction of homochiral coordination supramolecules in crystal and helical forms connects molecular and macromolecular chirality. The complexity and properties of the biomolecule guanosine-5-monophosphate disodium salt (GMP) in the presence of cadmium ion and the bpe auxiliary ligand were studied. Two complexes, 1 and 2, were used to investigate the impact of the auxiliary ligand bpe, the hydroxy group on the sugar motif, and pH on the coordination of GMP ligands. The interaction of mixed ligands for the growth and advancement of chiral complexes was controlled by altering the pH for coordination of the guanosine-5-monophosphate nucleotide with the cadmium Cd(II) metal. The chirality of complexes 1 and 2 was studied with solid circular dichroism (CD) spectroscopy, including supramolecular chirality and extended auxiliary ligand (EAC), combined with crystal structure analysis. The various hydrogen bonds and the auxiliary ligand are the special means of transporting chirality from isolated molecules to dynamic supramolecular three-dimensional designs of GMP nucleotide crystals. The research results will benefit the control of supramolecular assembly with well-defined structure and properties.
... Data hiding based on the DNA sequence has been attracting much attention due to its potential storage capacity [11]. Several DNA steganography approaches have been proposed [12] [13] [14]. ...
Article
Full-text available
Cryptography is the science of protecting information by transforming data into formats that cannot be recognized by unauthorized users. Steganography is the science of hiding information using different media such as image, audio, video, text, and deoxyribonucleic acid (DNA) sequence. DNA-based steganography is a newly discovered information security technology characterized by high capacity, high randomization, and a low modification rate that leads to increased security. There are various DNA-based methods for hiding information. In this paper, we compared three DNA-based techniques (substitution, insertion, and complementary) in terms of their capacity, cracking property, Bit Per Nucleotide (BPN), and payload. The selected algorithms combine DNA-based steganography and cryptography techniques. The results show that the substitution technique offers the best BPN for short secret messages and offers the best imperceptibility feature. We also found that both the substitution and the complementary method have a threshold BPN. On the other hand, the insertion method does not have a threshold BPN and it is more difficult to crack.
... However, the amount of information humans process daily is comparable to all information stored in the world on printed media. This has inspired scientists to explore DNA's potential in applied computing and information storage (8,9). ...
Article
Full-text available
Nowadays, information processing is based on semiconductor (e.g., silicon) devices. Unfortunately, the performance of such devices has natural limitations owing to the physics of semiconductors. Therefore, the problem of finding new strategies for storing and processing an ever-increasing amount of diverse data is very urgent. To solve this problem, scientists have found inspiration in nature, because living organisms have developed uniquely productive and efficient mechanisms for processing and storing information. We address several biological aspects of information and artificial models mimicking corresponding bioprocesses. For instance, we review the formation of synchronization patterns and the emergence of order out of chaos in model chemical systems. We also consider molecular logic and ion fluxes as information carriers. Finally, we consider recent progress in infochemistry, a new direction at the interface of chemistry, biology, and computer science, considering unconventional methods of information processing. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 12 is June 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
... Scientists from the New York Genome Center and Columbia University, USA, developed a highly robust storage mechanism called 'DNA Fountain' and presented the storage of 2.14 × 10⁶ bytes of information, including videos and a complete computer operating system [30]. ...
Chapter
Full-text available
Every moment, millions of files are transferred and stored everywhere. Many malicious actions aim at obtaining data, whether for theft or destruction. Therefore, it is necessary to work on techniques that make storing and transferring data safe. To protect data, various technologies have evolved over the years using cryptography and information hiding. Deoxyribonucleic acid (DNA)-based cryptography has recently become the preferred trend for security. Machine learning (ML), Big Data (BD), and DNA coding created a revolution in security services that use biomolecules' concepts, giving us hope for unbreakable algorithms. However, the concepts still need to be explicitly exploited for practical applications where people use and apply them. This paper discusses the application of DNA coding features, service models, and security issues. It proposes a technique for securing data during storage or transfer that will be cost-effective and secure using bio-computational techniques. The tool uses ML, BD, and DNA steganography techniques and binary coding rules to make the algorithm secure, where it is an additional layer of biosecurity that is more effective than conventional cryptographic techniques. We use hash function techniques to make the tool more efficient for message authentication and non-repudiation purposes.
... In this work we experimentally and numerically studied the effect of spatial where permuted nucleobases are embedded in a longer DNA sequences. In particular, studying SERS/Raman spectra as a function of intermolecular [45] or nearest neighbor interaction may potentially enable more accurate SERS-based sensing platforms and allow lower read-out error needed to meet bio-applications [31], support DNA memory applications [46] and allow novel tool to control Raman/SERS response of complex molecules. Furthermore, our DFT simulation, presenting mode splitting and normal modes dependence on the position along the phosphate backbone, may lead to a novel structure dependent mechanism for inhomogeneous broadening of the peaks in complex molecules such as ssDNA sequences, which may be of importance from both fundamental and application perspectives. ...
Preprint
Full-text available
The surface-enhanced Raman scattering (SERS) process results in a tremendous increase in the Raman scattering cross section of molecules adsorbed to plasmonic metals and is influenced by numerous physico-chemical factors such as the geometry and optical properties of the metal surface, the orientation of chemisorbed molecules and the chemical environment. While SERS holds promise for single-molecule sensitivity and optical sensing of DNA sequences, a more detailed understanding of the rich physico-chemical interplay between various factors is needed to enhance the predictive power of existing and future SERS-based DNA sensing platforms. In this work we report on experimental results indicating that SERS spectra of adsorbed single-stranded DNA (ssDNA) isomers depend on the order in which individual bases appear in the 3-base long ssDNA due to intra-molecular interaction between DNA bases. Furthermore, we experimentally demonstrate that the effect holds under more general conditions when the molecules don't experience chemical enhancement due to the resonant charge transfer effect and also under standard Raman scattering without electromagnetic or chemical enhancements. Our numerical simulations qualitatively support the experimental findings and indicate that base permutation results in modification of both Raman and chemically enhanced Raman spectra.
... While in this work we considered only 3-base-long sequences to simplify the system and potentially enhance the effect, we expect this work to facilitate future theoretical studies, as well as experimental TERS-based sensing platforms, which admit sufficiently high spatial resolution to enable excitation of just a few bases [19,42,43,44], to study sensing of longer ssDNA molecules where permuted nucleobases are embedded in longer DNA sequences. In particular, studying SERS/Raman spectra as a function of intermolecular [45] or nearest-neighbor interaction may potentially enable more accurate SERS-based sensing platforms and allow the lower read-out error needed to meet bio-applications [31], support DNA memory applications [46], and provide a novel tool to control the Raman/SERS response of complex molecules. Furthermore, our DFT simulation, presenting mode splitting and normal-mode dependence on the position along the phosphate backbone, may lead to a novel structure-dependent mechanism for inhomogeneous broadening of the peaks in complex molecules such as ssDNA sequences, which may be of importance from both fundamental and application perspectives. ...
Article
The surface-enhanced Raman scattering (SERS) process results in a tremendous increase of the Raman scattering cross section of molecules adsorbed to plasmonic metals and is influenced by numerous physico-chemical factors such as the geometry and optical properties of the metal surface, the orientation of chemisorbed molecules, and the chemical environment. While SERS holds promise for single-molecule sensitivity and optical sensing of DNA sequences, a more detailed understanding of the rich physico-chemical interplay between various factors is needed to enhance the predictive power of existing and future SERS-based DNA sensing platforms. In this work, we report experimental results indicating that SERS spectra of adsorbed single-stranded DNA (ssDNA) isomers depend on the order in which individual bases appear in the 3-base-long ssDNA, due to intramolecular interaction between DNA bases. Furthermore, we experimentally demonstrate that the effect holds under more general conditions when the molecules do not experience chemical enhancement due to the resonant charge transfer effect, and also under standard Raman scattering without electromagnetic or chemical enhancements. Our numerical simulations qualitatively support the experimental findings and indicate that base permutation results in modification of both Raman and chemically enhanced Raman spectra.
Article
DNA is considered an ideal supramolecular material for information storage, with high storage density and long-term stability. Enzymes, as green and sustainable tools, offer several unique advantages in DNA-based...
Article
Deoxyribonucleic acid (DNA), the oldest natural storage medium, offers a highly promising mode for information computing and storage with its high information density, low maintenance costs, and the ability to do parallel processing. DNA cryptography is an emerging discipline focused on achieving information security within this new paradigm. In this paper, we first present the foundational concepts of cryptography and biological technologies involved in DNA cryptography. We then comprehensively review 2 types of DNA cryptography: pseudo-DNA cryptography and natural DNA cryptography. After summarizing and discussing the security foundations of these cryptographic methods, we highlight the main challenges relating to measurability, standard protocols, robustness, and operability. Finally, we outline future directions for DNA cryptography, hoping to facilitate the evolution of this nascent field.
Article
In the wake of the swift evolution of technologies such as the Internet of Things (IoT), the global data landscape is undergoing an exponential surge, propelling DNA storage into the spotlight as a prospective medium for contemporary cloud storage applications. This paper introduces a Semantic Artificial Intelligence-enhanced DNA storage (SemAI-DNA) paradigm, distinguishing itself from prevalent deep learning-based methodologies through two key modifications: 1) embedding a semantic extraction module at the encoding terminus, facilitating the meticulous encoding and storage of nuanced semantic information; 2) conceiving a forethoughtful multi-reads filtering model at the decoding terminus, leveraging the inherent multi-copy propensity of DNA molecules to bolster system fault tolerance, coupled with a strategically optimized decoder’s architectural framework. Numerical results demonstrate the SemAI-DNA’s efficacy, attaining 2.61 dB Peak Signal-to-Noise Ratio (PSNR) gain and 0.13 improvement in Structural Similarity Index (SSIM) over conventional deep learning-based approaches.
Article
It is more important now than ever to develop more effective data storage techniques in the constantly advancing field of computers. Due to their limits in terms of durability, information density, and physical space requirements, current storage technologies, such as magnetic and optical media, are finding it difficult to keep up with the global demand for data storage, which is expected to exceed 175 trillion gigabytes by 2025. Nature’s own method of storing genetic information for a long time, deoxyribonucleic acid (DNA), provides great data density, stability, and endurance, making it a viable solution to address this issue. Through a comprehensive analysis of existing literature, this paper aims to provide a nuanced understanding of DNA computing’s current state - mainly its trajectory towards becoming a viable alternative to traditional storage methods - highlighting the history, current developments/challenges, and future prospects.
Conference Paper
DNA storage is a new and promising method of data storage, offering remarkable characteristics such as high density, durability, and easy maintenance, making it ideal for data archiving. However, DNA storage has not been widely applied till now due to some inherent limitations, such as the high cost of DNA synthesis, unavoidable errors during DNA synthesis and sequencing processes, and difficulties in accomplishing random I/O. To enhance the robustness of DNA storage, conventional approaches include increasing redundancy, generating amplifications, and designing error-correcting codes (ECC) for a single DNA sequence structure. This paper focuses on image DNA storage and presents an innovative and robust DNA encoding strategy based on a residual convolutional neural network (CNN). When recovering images with noise or missing points, our experimental results show high reliability which can achieve a satisfying peak signal-to-noise ratio (PSNR) value when the error rate is less than or equal to 10%. Therefore, our work represents a viable solution to improve the robustness of image DNA storage.
Article
DNA has emerged as an attractive medium for storing large amounts of data due to its high information density, long-term stability, and low energy consumption. However, in contrast to commercially available storage media, DNA-based data storage currently falls behind in terms of writing and reading speeds, waste as well as cost. To harness the full potential of DNA as a data storage medium, it is imperative to advance high-throughput DNA synthesis without compromising cost and pollution. Industry-standard phosphoramidite DNA synthesis has reached its limitation because of its short nucleotide length (< 200), overconsumption of organic solvents leading to the production of toxic wastes, and slow writing speed. Enzymatic DNA synthesis shows promise as a replacement with long nucleotides, an environmentally friendly process, and fast writing speed. In this review, we overview enzymatic DNA synthesis methods, evaluate current methods that utilize high-throughput and parallel synthesis, and conclude with comments on how enzymatic DNA synthesis can be the answer to DNA data storage.
Article
Networks of interacting DNA oligomers are useful for applications such as biomarker detection, targeted drug delivery, information storage, and photonic information processing. However, differences in the chemical kinetics of hybridization reactions, referred to as kinetic dispersion, can be problematic for some applications. Here, it is found that limiting unnecessary stretches of Watson-Crick base pairing, referred to as unnecessary duplexes, can yield exceptionally low kinetic dispersions. Hybridization kinetics can be affected by unnecessary intra-oligomer duplexes containing only 2 base-pairs, and such duplexes explain up to 94% of previously reported kinetic dispersion. As a general design rule, it is recommended that unnecessary intra-oligomer duplexes larger than 2 base-pairs and unnecessary inter-oligomer duplexes larger than 7 base-pairs be avoided. Unnecessary duplexes typically scale exponentially with network size, and nearly all networks contain unnecessary duplexes substantial enough to affect hybridization kinetics. A new method for generating networks which utilizes in-silico optimization to mitigate unnecessary duplexes is proposed and demonstrated to reduce in-vitro kinetic dispersions as much as 96%. The limitations of the new design rule and generation method are evaluated in-silico by creating new oligomers for several designs, including three previously programmed reactions and one previously engineered structure.
Preprint
Networks of interacting DNA oligomers are useful for applications such as biomarker detection, targeted drug delivery, information storage, and photonic information processing. However, differences in the chemical kinetics of hybridization reactions, referred to as kinetic dispersion, can be problematic for certain applications. Here, it is found that controlling known factors is sufficient to mitigate most kinetic dispersion. Eliminating complementary base-sequences which are not part of the desired hybridization reaction, referred to as unnecessary duplexes, is key to achieving exceptionally low kinetic dispersions. An analysis of existing experimental data indicates that unnecessary duplexes explain up to 94% of previously reported kinetic dispersion. Nearly all networks are found to contain unnecessary duplexes substantial enough to affect hybridization kinetics. New networks are generated using in-silico optimization, reducing in-vitro kinetic dispersion up to 86%. Limitations of the generation method are tested by creating oligomers for three previously programmed reactions and one previously engineered structure.
Article
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage.
Article
Digital synthetic polymers with uniform chain lengths and defined monomer sequences have recently become intriguing alternatives to traditional silicon-based information devices or natural biomacromolecules for data storage. The structural diversity of information-containing macromolecules endows digital synthetic polymers with higher stability and storage density while occupying less space. By carefully designing each unit of the coded structure, information can be encoded into digital synthetic polymers more economically and decoded more readily, opening up new avenues for molecular digital data storage with high-level security. This tutorial review summarizes recent advances in salient features of digital synthetic polymers for data storage, including encoding, decoding, editing, erasing, encrypting, and repairing. Finally, the current challenges and outlook are discussed to offer guidance on potential solutions and new perspectives for the creation of next-generation digital synthetic polymers, and to broaden the scope of their applicability.
Article
As an emerging storage medium, DNA is a crucial carrier for future data storage. DNA has a better storage density and a longer storage life than conventional storage media. However, DNA's key limitations as a storage medium are the precision of information reading and storage stability. Herein, we designed and successfully developed a switchable DNA storage nanocomposite structure (MM-DNA/DNA'-SiO2). The composite structure employs magnetic microspheres (MMs) based on Fe3O4 nanoparticles as the DNA carrier for storing information (MM-DNA) and silicon dioxide nanoparticles modified with complementary strands as the protective outer layer (DNA'-SiO2), and its switchability and DNA sequence reading accuracy were both validated in this study. This method provides a novel nanostructure medium and technical proposal for efficient DNA storage with potential application value.
Article
Background: DNA storage has become a global research hotspot in recent years, and most research today focuses on storage density and big data. The security of DNA storage also needs attention. Some DNA-based security methods have been introduced for traditional information security problems. However, few encryption algorithms have considered the limitations of biotechnology and applied them to DNA storage. The difference between DNA cryptography and traditional cryptography is that the former is based on the limitations of biotechnology, which is unrelated to numeracy. Objective: An extended XOR algorithm (EXA) was introduced for encryption under the constraints of biotechnology, which can partly solve problems of synthesis and sequencing in DNA storage, such as GC content and homopolymers. Methods: The target file was converted by a quaternary DNA storage model to maximize the storage efficiency. The key file could be ‘anything’ converted into a DNA sequence by a binary DNA storage model to make the best use of the length of the key file. Results: The input files were encrypted into DNA storage and decrypted to error-free output files. Conclusion: Error-free encrypted DNA storage is feasible, and EXA paves the way for encryption in large-scale DNA storage.
Article
In 2019, at the World Economic Forum, DNA data storage was indicated as one of the breakthroughs expected to radically impact the global socio-economic order. Indeed, dry DNA is a relatively stable substance and an extremely capacious information carrier. One gram of DNA can hold up to 455 exabytes, provided that one nucleotide encodes two bits of information. In this critical review, the main attention is paid to nucleinography, meaning the conversion of digital data into nucleotide sequences. The evolution and diversity of approaches intended for encoding data with nucleotides are demonstrated. The most noticeable examples of storing minor as well as considerable quantities of non-biological information in DNA are given. Some issues of DNA data storage are also reported.
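The 455-exabyte figure can be roughly reproduced from first principles. The sketch below is a back-of-the-envelope check, not the authors' calculation: it assumes an average molar mass of about 330 g/mol per nucleotide of single-stranded DNA (a commonly used approximation, and an assumption here) together with the two bits per nucleotide stated in the abstract. The function name `bytes_per_gram` is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole (exact SI value)
NT_MOLAR_MASS_G = 330.0    # ASSUMPTION: average g/mol per ssDNA nucleotide
BITS_PER_NT = 2            # from the abstract: one nucleotide encodes two bits


def bytes_per_gram(nt_molar_mass_g=NT_MOLAR_MASS_G, bits_per_nt=BITS_PER_NT):
    """Theoretical storage capacity of one gram of DNA, in bytes."""
    nucleotides_per_gram = AVOGADRO / nt_molar_mass_g
    return nucleotides_per_gram * bits_per_nt / 8


exabytes = bytes_per_gram() / 1e18
print(f"{exabytes:.0f} EB per gram")
```

Under these assumptions the estimate comes out near 456 EB per gram, in line with the quoted upper bound. It ignores addressing, redundancy, and error-correction overhead, all of which lower the practical density.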
Article
Advances in synthetic chemistry have enabled abiotic, sequence defined polymers to imitate the structures and functions once exclusive to DNA. Indeed, the vast library of accessible backbones and pendant-group functionalities afford synthetic polymers an advantage over DNA in emerging applications as they can be tailored for stability or performance. Moreover, novel methodologies for sequencing and conjugation have been leveraged to elevate the versatility of discrete macromolecules. This review highlights abiotic, sequence-defined polymers in their capacity to mimic the primary functions of DNA – data storage and retrieval, sequence-specific self-assembly of duplexes, and replication and synthetic templating of new macromolecules.
Article
Deoxyribonucleic acid (DNA) comprises four nucleotides and twenty amino acids (a combination of nucleotides) that generate living organisms’ structures. These discrete components, jointly with DNA characteristics and functions, allow understanding the DNA as a digital component. Thus, when DNA is considered an organic digital memory, it becomes a compelling data storage medium given its superior density, stability, energy efficiency, longevity, and lack of foreseeable technical obsolescence compared with conventional electronic media. Furthermore, various challenging experiments (described in this work) have demonstrated that digital information (regardless of its type, i.e., text, audio, video, image) can be written in DNA, stored, and accurately read. On the other hand, since nature has designed DNA with a tremendous capacity to store information, compression techniques (also described in this work) are required for appropriately managing this enormous quantity of information. Finally, we discuss a bit’s representation for nucleotides and amino acids due to DNA digital characteristics.
Chapter
Over the last few decades, there has been rapidly growing interest in using nanostructures for biomedical applications such as bio-imaging, bio-sensing, targeted drug delivery, and therapy. This is particularly due to their unique size-dependent optical, electrical, catalytic, magnetic, and other properties. Nanostructures can be successfully used for biomedical applications only after appropriate ligand functionalization. In this chapter, we discuss various strategies for the preparation of ligand-functionalized colloidal nanostructures and their selected biomedical applications.
Chapter
The ability of nucleobases, nucleosides, nucleotides and their derivatives to support supramolecular interactions has enabled the construction of a variety of architectures. In particular, nucleolipid hybrids have gained significant interest as they serve as excellent scaffolds for the bottom-up generation of hierarchical assemblies with wide biomedical and material applications. In this chapter, we provide a detailed discussion of recent advances in the design and applications of nucleolipid assemblies. First, we discuss various design approaches to synthesizing nucleolipid supramolecular synthons and the various self-assembled architectures they form. In the second part, recent applications of nucleolipid assemblies are reviewed in detail. Emphasis is laid on assemblies that can be used as delivery tools, injectable gels, tissue engineering scaffolds, sensors and environmental remediation systems. The easy synthesis, the ability to tune the assembly process, and the useful applications of the nucleolipid architectures discussed in this chapter underscore the high potential of nucleolipid assemblies in real-life applications.
Article
With the explosion of data, DNA is considered an ideal carrier for storage due to its high storage density. However, low-quality DNA sets hamper the widespread use of DNA storage. This work proposes a new method to design high-quality DNA storage sets. Firstly, random switch and double-weight offspring strategies are introduced in the Double-strategy Black Widow Optimization Algorithm (DBWO). Experimental results on 26 benchmark functions show that the exploration and exploitation abilities of DBWO are greatly improved over previous work. Secondly, DBWO is applied to designing DNA storage sets, and compared with previous work, the lower bounds of storage sets are raised by 9%–37%. Finally, to improve the poor stability of sequences, the End-constraint is proposed for designing DNA storage sets. Measurements of the number of hairpin structures, melting temperature, and minimum free energy show that, with this constraint, DBWO not only constructs a larger number of storage sets but also enhances the physical and thermodynamic properties of DNA storage sets.
Article
DNA storage has been a thriving interdisciplinary research area because of its high density, low maintenance cost, and long durability for information storage. However, the complexity of errors in DNA sequences, including substitutions, insertions and deletions, hinders its application for massive data storage. Motivated by the divide-and-conquer algorithm, we propose a hierarchical error correction strategy for text DNA storage. The basic idea is to design robust codes for common characters which have one-base error correction ability, including insertion and/or deletion. The errors are gradually corrected by the codes in DNA reads, multiple alignment of character lines, and finally word spelling. On one hand, the proposed encoding method provides a systematic way to design storage-friendly codes, such as 50% GC content, no more than 2-base homopolymers, and robustness against secondary structures. On the other hand, the proposed error correction method not only corrects a single insertion or deletion, but also deals with multiple insertions or deletions. Simulation results demonstrate that the proposed method can correct more than 98% of errors when the error rate is less than or equal to 0.05. Thus, it is more powerful and adaptable to complicated DNA storage applications.
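Two of the "storage-friendly" constraints named above, GC content near 50% and homopolymer runs of at most two bases, are easy to check mechanically. The sketch below is an illustrative filter only, not the authors' code design procedure. The function names and the `gc_tolerance` parameter are hypothetical, and the secondary-structure constraint is not covered.

```python
from itertools import groupby


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))


def is_storage_friendly(seq, gc_target=0.5, gc_tolerance=0.05, max_run=2):
    """Check GC balance and homopolymer length.

    gc_tolerance is an ASSUMED slack around the 50% target; the abstract
    does not state one.
    """
    if not seq:
        return False
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tolerance
    return gc_ok and longest_homopolymer(seq) <= max_run


print(is_storage_friendly("ACGTACGT"))  # balanced GC, no repeated bases
print(is_storage_friendly("AAACGCGT"))  # contains a 3-base homopolymer
```

A filter like this would typically be applied to candidate codewords before synthesis, so that sequences prone to synthesis and sequencing errors are rejected early.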
Chapter
RNA is a versatile biopolymer that can fold into a variety of nano-sized architectures through programmable self-assembly by base pairing. Following the trail of DNA nanotechnology, RNA nanotechnology has recently emerged as a new field and has attracted widespread attention because of its applications, mainly in the field of nanomedicine. RNA nanoarchitectures are mainly constructed by bottom-up assembly of small RNA motifs generally found in folded RNA structures in living systems. RNA, apart from its role in the central dogma, also provides many functional modules, such as aptamers that bind to specific targets or ribozymes that perform catalysis, mimicking the role of proteins. Nanoarchitectures constructed using the principles of RNA nanotechnology can be decorated with these functional RNA modules, which are useful for many biological applications including drug delivery. In this chapter, we review the basic principles of constructing RNA nanoarchitectures using both natural and artificial RNA motifs that can be used like Lego pieces to design complex architectures. We also provide a brief overview of various applications of these RNA nanoarchitectures in diverse fields.
Chapter
This chapter briefly reviews DNA and RNA computing systems, concentrating on the motivation, origin, recent achievements, and future perspectives. While the book is a comprehensive review of the topic, the chapter, addressed to newcomers to the area, is easy to read and provides the most important references for students and researchers starting their work in this sophisticated area.
Chapter
Global structures are geared towards sustaining trade relationships that help economies achieve economic prosperity. Cities, both through their geographies and through their increasing economic role, help in this, and can even be seen to emerge as global superpowers, where some urban economies trump the economic performance of entire nations. As the role of capitalism gains in strength and cities reinforce their agenda of being global economic leaders, there will be a natural tendency towards models aimed at better performance and efficiency in urban spheres. Here, the role of technology, coupled with urban governance, is hailed. The smart city model is seen to help, but it is limited in the sense that data are computed yet still await decisions from people, a step which can be cost-inefficient. This chapter argues that there will be a natural tendency to move towards autonomous cities to better support economic motives.
Article
Here we report a simple and flexible method for DNA data storage based on a Perl script. For this approach, the text data of the preamble of the “Universal Declaration of Human Rights”, consisting of 2,046 words, was encoded into the corresponding 8,148 base pairs of DNA using Perl-based encoding with a hash table. The encoded DNA sequences were then artificially synthesized for storage. The information DNA consisted of a total of 22 chemically synthesized DNA fragments of 400 nucleotides each, which were inserted into a cloning vector to multiply the plasmid DNA. The nucleotide integrity of the data-carrying DNA sequences was confirmed under accelerated aging conditions. Also, an erroneous nucleotide in the information DNA sequences was successfully corrected using the overlap extension PCR method. The stored DNA was read by sequencing, and the resulting DNA sequence information was successfully decoded to convert the DNA records back to the original document. Our results indicate that textual data can be stored in DNA using a simple and flexible Perl script run from the command line.
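To illustrate what hash-table encoding of text into DNA can look like, here is a minimal sketch in Python rather than Perl. It is not the authors' script or lookup table. The two-bit mapping `BITS_TO_BASE` and the function names are assumptions chosen for illustration, and the scheme applies no GC or homopolymer constraints.

```python
# ASSUMED mapping: each pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(text):
    """Encode UTF-8 text as a DNA string, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna):
    """Invert encode(): map nucleotides back to bits, then bytes, then text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


dna = encode("Hi")
print(dna)          # CAGACGGC
print(decode(dna))  # Hi
```

A dictionary lookup keeps both directions of the mapping explicit and easy to swap, which is presumably part of the appeal of a hash-table approach. A practical scheme would also need addressing and error handling.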
Article
DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
Article
Steganography provides unconventional solutions to protect communication as well as copyright of intellectual property. In this paper, we propose a steganographic method that exploits some characteristics of deoxyribonucleic acid (DNA) for hiding other types of digital content. The proposed work is an extension of [1], as it provides a solution to the problem that the sender and the receiver have to secretly communicate both the stego-DNA and the reference sequence. Communicating such information could arouse suspicion and reveal the existence of the steganographic channel itself. Thus, the proposed hiding method is implemented in two main stages: the first hides the secret message in a reference DNA sequence using a generic substitution technique. The second employs a self-embedding algorithm that randomly inserts the stego-DNA sequence into the reference one. In this way, the extraction process can be done blindly, and the communicating parties do not have to exchange anything in advance except the secret key. Furthermore, a DNA-based Playfair cipher is applied to the secret data before embedding in order to increase the security of the hiding algorithm. When compared with other hiding methods, the proposed method showed outstanding performance, providing high security and embedding capacity.
Article
We describe the first DNA-based storage architecture that enables random access to data blocks and rewriting of information stored at arbitrary locations within the blocks. The newly developed architecture overcomes drawbacks of existing read-only methods that require decoding the whole file in order to read one data fragment. Our system is based on new constrained coding techniques and accompanying DNA editing methods that ensure data reliability, specificity and sensitivity of access, and at the same time provide exceptionally high data storage capacity. As a proof of concept, we encoded parts of the Wikipedia pages of six universities in the USA, and selected and edited parts of the text written in DNA corresponding to three of these schools. The results suggest that DNA is a versatile medium suitable for both ultrahigh-density archival and rewritable storage applications.
Article
The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr bp). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr bp), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr bp), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr bp, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.
Article
Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10⁶ bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
Article
Claims of extreme survival of DNA have emphasized the need for reliable models of DNA degradation through time. By analysing mitochondrial DNA (mtDNA) from 158 radiocarbon-dated bones of the extinct New Zealand moa, we confirm empirically a long-hypothesized exponential decay relationship. The average DNA half-life within this geographically constrained fossil assemblage was estimated to be 521 years for a 242 bp mtDNA sequence, corresponding to a per nucleotide fragmentation rate (k) of 5.50 × 10⁻⁶ per year. With an effective burial temperature of 13.1°C, the rate is almost 400 times slower than predicted from published kinetic data of in vitro DNA depurination at pH 5. Although best described by an exponential model (R² = 0.39), considerable sample-to-sample variance in DNA preservation could not be accounted for by geologic age. This variation likely derives from differences in taphonomy and bone diagenesis, which have confounded previous, less spatially constrained attempts to study DNA decay kinetics. Lastly, by calculating DNA fragmentation rates on Illumina HiSeq data, we show that nuclear DNA has degraded at least twice as fast as mtDNA. These results provide a baseline for predicting long-term DNA survival in bone.
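The link between the per-nucleotide fragmentation rate and the 521-year half-life quoted above can be checked with a short calculation. The sketch below assumes a simple model, not necessarily the exact one used in the study, in which each site of a fragment breaks independently at rate k, so the fraction of intact fragments of length L after t years is exp(-k·L·t). The function names are illustrative.

```python
import math

def fragment_half_life(k_per_site_per_year: float, length_bp: int) -> float:
    """Half-life in years of a fragment, assuming independent breaks at each site."""
    return math.log(2) / (k_per_site_per_year * length_bp)

def surviving_fraction(k_per_site_per_year: float, length_bp: int, years: float) -> float:
    """Fraction of fragments of the given length still intact after `years`."""
    return math.exp(-k_per_site_per_year * length_bp * years)

K = 5.50e-6   # per-nucleotide fragmentation rate per year, from the abstract
LENGTH = 242  # fragment length in bp, from the abstract

half_life = fragment_half_life(K, LENGTH)
print(f"half-life of a {LENGTH} bp fragment: {half_life:.0f} years")
print(f"intact after 10,000 years: {surviving_fraction(K, LENGTH, 10_000):.2e}")
```

Under this assumed model the computed half-life rounds to the 521 years stated in the abstract, which suggests the two reported figures are related in this way.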
Article
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. We developed a strategy to encode arbitrary digital information in DNA, wrote a 5.27-megabit book using DNA microchips, and read the book by using next-generation DNA sequencing.
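To make the idea of encoding arbitrary digital information in DNA concrete, here is a minimal sketch that maps every two bits to one base. The mapping (00→A, 01→C, 10→G, 11→T) and the function names are assumptions chosen for illustration; this is not the scheme described in the abstract above, and practical schemes typically add addressing, redundancy, and sequence constraints such as avoiding long runs of the same base.

```python
BASES = "ACGT"  # illustrative mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, two bits per base, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(seq: str) -> bytes:
    """Invert encode(); expects a sequence length that is a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

message = b"Hi"
strand = encode(message)
print(strand)
print(decode(strand))
```

At two bits per base this toy mapping sits at the upper bound for a four-letter alphabet; real systems trade some of that density for robustness.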
Article
Polar bears (PBs) are superbly adapted to the extreme Arctic environment and have become emblematic of the threat to biodiversity from global climate change. Their divergence from the lower-latitude brown bear provides a textbook example of rapid evolution of distinct phenotypes. However, limited mitochondrial and nuclear DNA evidence conflicts in the timing of PB origin as well as placement of the species within versus sister to the brown bear lineage. We gathered extensive genomic sequence data from contemporary polar, brown, and American black bear samples, in addition to a 130,000- to 110,000-y old PB, to examine this problem from a genome-wide perspective. Nuclear DNA markers reflect a species tree consistent with expectation, showing polar and brown bears to be sister species. However, for the enigmatic brown bears native to Alaska's Alexander Archipelago, we estimate that not only their mitochondrial genome, but also 5-10% of their nuclear genome, is most closely related to PBs, indicating ancient admixture between the two species. Explicit admixture analyses are consistent with ancient splits among PBs, brown bears and black bears that were later followed by occasional admixture. We also provide paleodemographic estimates that suggest bear evolution has tracked key climate events, and that PB in particular experienced a prolonged and dramatic decline in its effective population size during the last ca. 500,000 years. We demonstrate that brown bears and PBs have had sufficiently independent evolutionary histories over the last 4-5 million years to leave imprints in the PB nuclear genome that likely are associated with ecological adaptation to the Arctic environment.
Article
The Tyrolean Iceman, a 5,300-year-old Copper age individual, was discovered in 1991 on the Tisenjoch Pass in the Italian part of the Ötztal Alps. Here we report the complete genome sequence of the Iceman and show 100% concordance between the previously reported mitochondrial genome sequence and the consensus sequence generated from our genomic data. We present indications for recent common ancestry between the Iceman and present-day inhabitants of the Tyrrhenian Sea, that the Iceman probably had brown eyes, belonged to blood group O and was lactose intolerant. His genetic predisposition shows an increased risk for coronary heart disease and may have contributed to the development of previously reported vascular calcifications. Sequences corresponding to ~60% of the genome of Borrelia burgdorferi are indicative of the earliest human case of infection with the pathogen for Lyme borreliosis.
Article
A study on a method of encoding meaningful information as DNA sequences was presented. It was found that the embedded information survived its rough handling in the mail, proving that a DNA strand can be as dependable as a piece of paper in terms of information storage. Developments in finding a super-dependable storage medium to ensure adequate protection for the encoded DNA strands were discussed. Recent advances in genetic engineering that have allowed the introduction of foreign DNA molecules into the living cells of bacteria, humans, and other organisms were also presented.
Article
DNA that has been recovered from archaeological and palaeontological remains makes it possible to go back in time and study the genetic relationships of extinct organisms to their contemporary relatives. This provides a new perspective on the evolution of organisms and DNA sequences. However, the field is fraught with technical pitfalls and needs stringent criteria to ensure the reliability of results, particularly when human remains are studied.
Article
We report the design, synthesis, and assembly of the 1.08–mega–base pair Mycoplasma mycoides JCVI-syn1.0 genome starting from digitized genome sequence information and its transplantation into a M. capricolum recipient cell to create new M. mycoides cells that are controlled only by the synthetic chromosome. The only DNA in the cells is the designed synthetic DNA sequence, including “watermark” sequences and other designed gene deletions and polymorphisms, and mutations acquired during the building process. The new cells have expected phenotypic properties and are capable of continuous self-replication.
Article
An improved Huffman coding method for information storage in DNA is described. The method entails the utilization of modified unambiguous base assignment that enables efficient coding of characters. A plasmid-based library with efficient and reliable information retrieval and assembly with uniquely designed primers is described. We illustrate our approach by synthesis of DNA that encodes text, images, and music, which could easily be retrieved by DNA sequencing using the specific primers. The method is simple and lends itself to automated information retrieval.
Article
The microdot is a means of concealing messages (steganography) that was developed by Professor Zapp and used by German spies in the Second World War to transmit secret information. A microdot (``the enemy's masterpiece of espionage'') was a greatly reduced photograph of a typewritten page that was pasted over a full stop in an innocuous letter. We have taken the microdot a step further and developed a DNA-based, doubly steganographic technique for sending secret messages. A DNA-encoded message is first camouflaged within the enormous complexity of human genomic DNA and then further concealed by confining this sample to a microdot.
Article
High-throughput direct sequencing techniques have recently opened the possibility to sequence genomes from Pleistocene organisms. Here we analyze DNA sequences determined from a Neandertal, a mammoth, and a cave bear. We show that purines are overrepresented at positions adjacent to the breaks in the ancient DNA, suggesting that depurination has contributed to its degradation. We furthermore show that substitutions resulting from miscoding cytosine residues are vastly overrepresented in the DNA sequences and drastically clustered in the ends of the molecules, whereas other substitutions are rare. We present a model where the observed substitution patterns are used to estimate the rate of deamination of cytosine residues in single- and double-stranded portions of the DNA, the length of single-stranded ends, and the frequency of nicks. The results suggest that reliable genome sequences can be obtained from Pleistocene organisms. Keywords: 454, deamination, depurination, paleogenomics
Article
The scale of environmental impacts associated with the manufacture of microchips is characterized through analysis of material and energy inputs into processes in the production chain. The total weight of secondary fossil fuel and chemical inputs to produce and use a single 2-gram 32MB DRAM chip are estimated at 1600 g and 72 g, respectively. Use of water and elemental gases (mainly N2) in the fabrication stage are 32,000 and 700 g per chip, respectively. The production chain yielding silicon wafers from quartz uses 160 times the energy required for typical silicon, indicating that purification to semiconductor grade materials is energy intensive. Due to its extremely low-entropy, organized structure, the materials intensity of a microchip is orders of magnitude higher than that of "traditional" goods. Future analysis of semiconductor and other low entropy high-tech goods needs to include the use of secondary materials, especially for purification.
Article
DNA is an excellent medium for archiving data. Recent efforts have illustrated the potential for information storage in DNA using synthesized oligonucleotides assembled in vitro. A relatively unexplored avenue of information storage in DNA is the ability to write information into the genome of a living cell by the addition of nucleotides over time. Using the Cas1-Cas2 integrase, the CRISPR-Cas microbial immune system stores the nucleotide content of invading viruses to confer adaptive immunity. When harnessed, this system has the potential to write arbitrary information into the genome. Here we use the CRISPR-Cas system to encode the pixel values of black and white images and a short movie into the genomes of a population of living bacteria. In doing so, we push the technical limits of this information storage system and optimize strategies to minimize those limitations. We also uncover underlying principles of the CRISPR-Cas adaptation system, including sequence determinants of spacer acquisition that are relevant for understanding both the basic biology of bacterial adaptation and its technological applications. This work demonstrates that this system can capture and stably store practical amounts of real data within the genomes of populations of living cells.
Article
A reliable and efficient DNA storage architecture. DNA has the potential to provide large-capacity information storage. However, current methods have only been able to use a fraction of the theoretical maximum. Erlich and Zielinski present a method, DNA Fountain, which approaches the theoretical maximum for information stored per nucleotide. They demonstrated efficient encoding of information, including a full computer operating system, into DNA that could be retrieved at scale after multiple rounds of polymerase chain reaction. Science, this issue p. 950
Article
Modern archiving technology cannot keep up with the growing tsunami of bits. But nature may hold an answer to that problem already.
Article
Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Using DNA to archive data is an attractive possibility because it is extremely dense, with a raw limit of 1 exabyte/mm³ (10⁹ GB/mm³), and long-lasting, with observed half-life of over 500 years. This paper presents an architecture for a DNA-based archival storage system. It is structured as a key-value store, and leverages common biochemical techniques to provide random access. We also propose a new encoding scheme that offers controllable redundancy, trading off reliability for density. We demonstrate feasibility, random access, and robustness of the proposed encoding with wet lab experiments involving 151 kB of synthesized DNA and a 42 kB random-access subset, and simulation experiments of larger sets calibrated to the wet lab experiments. Finally, we highlight trends in biotechnology that indicate the impending practicality of DNA storage for much larger datasets.
Article
We report on a strong capacity boost in storing digital data in synthetic DNA. In principle, synthetic DNA is an ideal medium to archive digital data for very long times because the achievable data density and longevity outperform today's digital data storage media by far. On the other hand, neither the synthesis, nor the amplification and the sequencing of DNA strands can be performed error-free today and in the foreseeable future. In order to make synthetic DNA available as a digital data storage medium, specifically tailored forward error correction schemes have to be applied. For the purpose of realizing DNA data storage, we have developed an efficient and robust forward-error-correcting scheme adapted to the DNA channel. We based the design of the needed DNA channel model on data from a proof-of-concept conducted in 2012 by a team from the Harvard Medical School [1]. Our forward error correction scheme is able to cope with all error types of today's DNA synthesis, amplification and sequencing processes, e.g. insertion, deletion, and swap errors. In a recent successful experiment, we were able to store 22 MByte of digital data in synthetic DNA and retrieve it error-free. The residual error probability found is already of the same order as in hard disk drives and can be easily improved further. This proves the feasibility of using synthetic DNA as a long-term digital data storage medium.
Article
It was suggested more than thirty years ago that Watson-Crick base pairing might be used for the rational design of nanometre-scale structures from nucleic acids. Since then, and especially since the introduction of the origami technique, DNA nanotechnology has enabled increasingly more complex structures. But although general approaches for creating DNA origami polygonal meshes and design software are available, there are still important constraints arising from DNA geometry and sense/antisense pairing, necessitating some manual adjustment during the design process. Here we present a general method of folding arbitrary polygonal digital meshes in DNA that readily produces structures that would be very difficult to realize using previous approaches. The design process is highly automated, using a routeing algorithm based on graph theory and a relaxation simulation that traces scaffold strands through the target structures. Moreover, unlike conventional origami designs built from close-packed helices, our structures have a more open conformation with one helix per edge and are therefore stable under the ionic conditions usually used in biological assays.
Article
Structural DNA nanotechnology and the DNA origami technique, in particular, have provided a range of spatially addressable two- and three-dimensional nanostructures. These structures are, however, typically formed of tightly packed parallel helices. The development of wireframe structures should allow the creation of novel designs with unique functionalities, but engineering complex wireframe architectures with arbitrarily designed connections between selected vertices in three-dimensional space remains a challenge. Here, we report a design strategy for fabricating finite-size wireframe DNA nanostructures with high complexity and programmability. In our approach, the vertices are represented by n × 4 multi-arm junctions (n = 2-10) with controlled angles, and the lines are represented by antiparallel DNA crossover tiles of variable lengths. Scaffold strands are used to integrate the vertices and lines into fully assembled structures displaying intricate architectures. To demonstrate the versatility of the technique, a series of two-dimensional designs including quasi-crystalline patterns and curvilinear arrays or variable curvatures, and three-dimensional designs including a complex snub cube and a reconfigurable Archimedean solid were constructed.
Article
This review focuses on how to use DNA nanostructures as scaffolds to organize biological molecules. First, we introduce the use of structural DNA nanotechnology to engineer rationally designed nanostructures. Second, we survey approaches used to generate protein-DNA conjugates. Third, we discuss studies exploring DNA scaffolds to create DNA nanodevices to analyze protein structures, to engineer enzyme pathways, to create artificial light-harvesting systems, and to generate nanomachines in vitro and in vivo. Future challenges and perspectives of using DNA nanostructures as programmable biomolecular scaffolds are addressed at the end.
Article
In this digital age, the technology used for information storage is undergoing rapid advances. Data currently being stored in magnetic or optical media will probably become unrecoverable within a century or less, through the combined effects of hardware and software obsolescence and decay of the
Article
Information, such as text printed on paper or images projected onto microfilm, can survive for over 500 years. However, the storage of digital information for time frames exceeding 50 years is challenging. Here we show that digital information can be stored on DNA and recovered without errors for considerably longer time frames. To allow for the perfect recovery of the information, we encapsulate the DNA in an inorganic matrix, and employ error-correcting codes to correct storage-related errors. Specifically, we translated 83 kB of information to 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica. Accelerated aging experiments were performed to measure DNA decay kinetics, which show that data can be archived on DNA for millennia under a wide range of conditions. The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week. This is thermally equivalent to storing information on DNA in central Europe for 2000 years.
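The figures in this abstract allow a rough estimate of the net information density achieved after error-correction overhead. The sketch below assumes 1 kB = 1000 bytes (the abstract does not specify) and uses an illustrative function name; it simply divides payload bits by the total number of synthesized nucleotides.

```python
def bits_per_nucleotide(payload_bytes: int, n_segments: int, segment_length_nt: int) -> float:
    """Net payload bits stored per synthesized nucleotide."""
    total_nucleotides = n_segments * segment_length_nt
    return payload_bytes * 8 / total_nucleotides

# Values from the abstract; 83 kB is taken as 83,000 bytes (an assumption).
density = bits_per_nucleotide(83_000, 4991, 158)
print(f"{density:.2f} bits per nucleotide")
```

Under these assumptions the result is a little under half of the 2 bits per nucleotide that a four-letter alphabet allows in principle, with the remainder spent on redundancy, indexing, and any other overhead in the segment design.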
Article
A simple DNA-based data storage scheme is demonstrated in which information is written using "addressing" oligonucleotides. In contrast to other methods that allow arbitrary code to be stored, the resulting DNA is suitable for downstream enzymatic and biological processing. This capability is crucial for DNA computers and may allow for a diverse array of computational operations to be carried out using this DNA. Although here we use gel-based methods for information readout, we also propose more advanced methods involving protein/DNA complexes and atomic force microscopy/nanopore schemes for data readout.
Article
We have recently succeeded in synthesizing long oligonucleotides (90-mers) with high yield. This synthesis requires 360 virtual masks, and thus puts challenges on image placement and local contrast. We have updated our DNA synthesis modeling to Monte Carlo simulation from numerical approach. We also devised a method, called “Inverted Capping,” to remove sequence errors from edge scattering of light, which provides a large error reduction and the possibility of fabrication of higher resolutions. Finally, we have also implemented an image locking method to eliminate image drifts.
Article
In just seven years, next-generation technologies have reduced the cost and increased the speed of DNA sequencing by four orders of magnitude, and experiments requiring many millions of sequencing reads are now routine. In research, sequencing is being applied not only to assemble genomes and to investigate the genetic basis of human disease, but also to explore myriad phenomena in organismic and cellular biology. In the clinic, the utility of sequence data is being intensively evaluated in diverse contexts, including reproductive medicine, oncology and infectious disease. A recurrent theme in the development of new sequencing applications is the creative 'recombination' of existing experimental building blocks. However, there remain many potentially high-impact applications of next-generation DNA sequencing that are not yet fully realized.
Article
We report a partial ndhF sequence (1528 bp) of Magnolia latahensis and a partial rbcL sequence (699 bp) of Persea pseudocarolinensis from the Clarkia fossil beds of Idaho, USA (Miocene; 17-20 million years [my] BP). The ndhF sequence from M. latahensis was identical to those of extant M. grandiflora, M. schiediana, M. guatemalensis, and M. tamaulipana. Parsimony analysis of the ndhF sequence of M. latahensis and previously reported ndhF sequences for Magnoliaceae placed M. latahensis within Magnolia as a member of the Theorhodon clade. This result is reasonable considering that: (1) the morphology of M. latahensis is very similar to that of extant M. grandiflora, and (2) a recent molecular phylogenetic study of Magnoliaceae showed that the maximum sequence divergence of ndhF among extant species is very low (1.05% in subfamily Magnolioideae) compared with other angiosperm families. We reanalyzed the previously reported rbcL sequence of M. latahensis with sequences for all major lineages of extant Magnoliales and Laurales. This sequence is sister to Liriodendron, rather than grouped with a close relative of M. grandiflora as predicted by morphology and the results of the ndhF analysis, possibly due to a few erroneous base calls in the sequences. The rbcL sequence of P. pseudocarolinensis differed from rbcL of extant Persea species by 3-6 nucleotides and from rbcL of extant Sassafras albidum by two nucleotides. Phylogenetic analyses of rbcL sequences for all major lineages of Magnoliales and Laurales placed the fossil P. pseudocarolinensis within Lauraceae and as sister to S. albidum. These results reinforce the suggestion that Clarkia and other similar sites hold untapped potential for molecular analysis of fossils.
Article
Three recent lawsuits are focusing public attention on the environmental and occupational health effects of the world's largest and fastest growing manufacturing sector-the $150 billion semiconductor industry. The suits allege that exposure to toxic chemicals in semiconductor manufacturing plants led to adverse health effects such as miscarriage and cancer among workers. To manufacture computer components, the semiconductor industry uses large amounts of hazardous chemicals including hydrochloric acid, toxic metals and gases, and volatile solvents. Little is known about the long-term health consequences of exposure to chemicals by semiconductor workers. According to industry critics, the semiconductor industry also adversely impacts the environment, causing groundwater and air pollution and generating toxic waste as a by-product of the semiconductor manufacturing process. In contrast, the U.S. Bureau of Statistics shows the semiconductor industry as having a worker illness rate of about one-third of the average of all manufacturers, and advocates defend the industry, pointing to recent research collaborations and product replacement as proof that semiconductor manufacturers adequately protect both their employees and the environment.
Article
Biotechnological methods can be used for cryptography. Here two different cryptographic approaches based on DNA binary strands are shown. The first approach shows how DNA binary strands can be used for steganography, a technique of encryption by information hiding, to provide rapid encryption and decryption. It is shown that DNA steganography based on DNA binary strands is secure under the assumption that an interceptor has the same technological capabilities as sender and receiver of encrypted messages. The second approach shown here is based on steganography and a method of graphical subtraction of binary gel-images. It can be used to constitute a molecular checksum and can be combined with the first approach to support encryption. DNA cryptography might become of practical relevance in the context of labelling organic and inorganic materials with DNA 'barcodes'.
Article
A simple, practical method to watermark short trademarks or signatures into genomic DNA is introduced. Since the marking method is biologically innocuous, it can be applied to all commercialized bacteria to help establish brand names for the engineered strains and to resolve legal disputes regarding gene-related patents. The first such strain of Bacillus subtilis is engineered and is ready to be distributed.
Article
DNA is an attractive memory unit because of its immense information density. Here, we describe a memory model made of DNA, called Nested Primer Molecular Memory (NPMM). NPMM consists of many DNA strands, and each DNA strand consists of two areas: a data area and a data address area. When the address of target data is specified, only the target data can be extracted from NPMM. In this paper, we evaluate the validity of the basic operations of NPMM and then discuss the feasibility of scaled-up NPMM through some laboratory experiments. In the latter, we deal with scaled-up NPMM simulated by the Concentration Scaling method.
Article
The practical realization of DNA data storage is a major scientific goal. Here we introduce a simple, flexible, and robust data storage and retrieval method based on sequence alignment of the genomic DNA of living organisms. Duplicated data encoded by different oligonucleotide sequences was inserted redundantly into multiple loci of the Bacillus subtilis genome. Multiple alignment of the bit data sequences decoded by B. subtilis genome sequences enabled the retrieval of stable and compact data without the need for template DNA, parity checks, or error-correcting algorithms. Combined with the computational simulation of data retrieval from mutated message DNA, a practical use of this alignment-based method is discussed.
Article
We developed a system to encode digital information in DNA polymers based on the partial restriction digest (PRD). Our encoding method relies on the length of the fragments obtained by the PRD rather than the actual content of the nucleotide sequence, thus eliminating the need for expensive sequencing machinery. In this letter, we report on the encoding of 12 bits of data in a DNA fragment of 110 nucleotides and the process of recovering the data.
Article
The future of integrated electronics is the future of electronics itself. Integrated circuits will lead to such wonders as home computers, automatic controls for automobiles, and personal portable communications equipment. But the biggest potential lies in the production of large systems. In telephone communications, integrated circuits in digital filters will separate channels on multiplex equipment. Integrated circuits will also switch telephone circuits and perform data processing. In addition, the improved reliability made possible by integrated circuits will allow the construction of larger processing units. Machines similar to those in existence today will be built at lower costs and with faster turnaround.
Physics of the future: How science will shape human destiny and our daily lives by the year 2100: Anchor
  • M Kaku
Biocompatible writing of data into DNA
  • G M Skinner
  • K Visscher
  • M Mansuripur
Skinner GM, Visscher K, Mansuripur M (2007) Biocompatible writing of data into DNA. J Bionanosci 1:17-21
How much information is stored in the human genome?
  • Y Grigoryev
Grigoryev Y (2012) How much information is stored in the human genome? Technical report from BitesizeBio http://bitesizebio.com/8378/how-much-information-is-stored-in-the-human-genome/
Life Expectancy: How long will magnetic media last
  • JW Van Bogart