DNA could be the future of data storage

A full operating system and film stored on DNA were recovered with no errors.

The world is churning out so much data that hard drives may not be able to keep up, leading researchers to look at DNA as a possible storage medium. DNA is ultra-compact and, unlike cassettes and CDs, it doesn’t degrade over time. In a new study, Yaniv Erlich and Dina Zielinski demonstrate DNA’s full potential and reliability for storing data. The researchers wrote six files (a full computer operating system, an 1895 French film, an Amazon gift card, a computer virus, a Pioneer plaque, and a study by information theorist Claude Shannon) into 72,000 DNA strands, each 200 bases long. They then used sequencing technology to retrieve the data, and software to translate the genetic code back into binary. The files were recovered with no errors. We spoke with Erlich about the results, and what they mean for the future of data storage.

Yaniv Erlich and Dina Zielinski. Credit: New York Genome Center

ResearchGate: What motivated this study?

Yaniv Erlich: Humanity produces data at faster rates each year, yet progress in traditional data storage technologies has slowed dramatically over the last five years. This means that we need to think about new approaches for data storage.

RG: How does your study fit into this effort?

Erlich: We showed that we can reliably store information on DNA, and that our method of organizing information approaches “optimal packing,” meaning it is nearly impossible to fit more information on the same amount of DNA material. We stored a film, an operating system, and other types of data on DNA molecules.

RG: How did you achieve this?

Erlich: We mapped the bits of the files to DNA nucleotides. Then, we synthesized these nucleotides and stored the molecules in a test tube. To retrieve the information, we sequenced the molecules. This is the basic process. To pack the information, we devised a strategy, called DNA Fountain, that uses mathematical concepts from coding theory. It was this strategy that allowed us to achieve optimal packing, which was the most challenging aspect of the study.
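
As a rough illustration of the first step Erlich describes, mapping bits to nucleotides, the Python sketch below assumes a simple table of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). The table and the function names are illustrative assumptions rather than details given in the interview, and the sketch leaves out the DNA Fountain coding step entirely.

```python
# Illustrative two-bits-per-nucleotide table (an assumption for this sketch,
# not a detail taken from the interview).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a nucleotide string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                # CAGACGGC
    print(dna_to_bytes(strand))  # b'Hi'
```

This direct mapping only shows the bit-to-base idea and the round trip back to binary. The packing strategy Erlich mentions is a separate layer on top of it and is not shown here.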

RG: Why did you choose to use DNA?

Erlich: DNA has several big advantages. First, it is much smaller than traditional media. In fact, we showed that we can reach a density of 215 petabytes per gram of DNA! Second, DNA lasts for an extended period of time, over 100 years, which is orders of magnitude more than traditional media. Try to listen to any disk from the 90s, and see if it’s still good. Finally, traditional media suffers from digital obsolescence. My parents have 8 mm tapes that are basically useless now. DNA has been around for 3 billion years, and humanity is unlikely to lose its ability to read these molecules. If it does, we will have much bigger problems than data storage.

RG: How long do you think it will be until DNA storage is available to the general public?

Erlich: I would guess more than a decade. We are still in early days, but it also took magnetic media years of research and development before it became useful.

RG: What other applications do you foresee?

Erlich: DNA is versatile, and molecular biology offers an extensive toolkit to manipulate it. This opens the possibility of using molecular biology tools to assist computing. Usually, it is the other way around!

Featured image courtesy of Garrett Coakley.