Hard drives, flash drives, genetic drives? Hyunjun Park, co-founder and CEO of Catalog, explains how synthetic DNA can be used to safeguard data.
Dan Patterson, senior producer for CNET and CBS News, spoke with Hyunjun Park, co-founder and CEO of Catalog, a company working on DNA data storage. The following is an edited transcript of their conversation.
Hyunjun Park: For the purposes of this discussion, data is any kind of information that we generate in the world. But, for storage purposes, for us, it’s really a series of ones and zeros. It’s just a long string of ones and zeros, binary data that you would normally use a computer to store, with things like hard drives and flash drives. And we’re now trying to store it with a new medium, DNA.
We’re now using synthetic DNA to store this data, and we may in the future use organic DNA. But if you think about it, it’s actually the reverse: we’re already using organic DNA in our bodies, in all the organisms in the world, to store data. That’s the chromosomes, the DNA inside of your cells. That’s storing information in a very digital manner, and we’re trying to copy that scheme using synthetic DNA molecules. We’re mimicking nature to store information using this new medium.
What is synthetic DNA? It’s, in the end, exactly the same as organic DNA, but we come at it from a synthetic, artificial standpoint. We chemically make molecules that look exactly like the organic DNA that’s in your body.
Your body is already using DNA to store information in a very digital way. And by that I mean, there are four different things that make up DNA. Four different bases, A, T, G, and C, and the sequence of those bases dictates the information that’s stored in it. It’s a very digital manner in which the body stores information. So we’re taking cues from that and taking advantage of all the characteristics of DNA to now store digital information rather than genetic information. And this would happen in a test tube or in a lab rather than inside the cell.
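To make the "digital" comparison concrete, here is a minimal sketch of how ones and zeros could map onto the four bases. The two-bits-per-base table and the function names are assumptions chosen for illustration. This is a common textbook mapping, not a description of Catalog's encoding scheme, and it ignores practical concerns such as error correction.

```python
# Illustrative mapping only: two bits per base. This table is an assumption
# made for the example; it is not Catalog's encoding scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With four possible bases, each position carries two bits, so one byte takes four bases in this simplified scheme.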
Now, the advantages that I mentioned include things like incredible information density. Because the body has to pack so much genetic information into a small cell, DNA can store a lot of information in a very small volume. That’s called information density. And when you think about the information density of DNA, we’re looking at something on the order of 200 petabytes per gram of DNA.
That means a data center that contains an exabyte of data could be stored in a sugar cube’s worth of DNA. That’s a lot of information density. Another advantage is the stability of these molecules. You’re seeing news out there where we’ve been able to sequence the genome of horses that have been preserved in the permafrost for 700,000 years. With DNA information storage, you can put the information in that medium once, and you’ll be able to keep it around, essentially forever. And you can store that in a test tube at room temperature, and it’s a very stable form of information storage.
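The sugar cube comparison follows from simple arithmetic on the density figure quoted above. The sketch below works it through; the function name is hypothetical, and the use of decimal units (1 exabyte = 1,000 petabytes) is an assumption.

```python
PETABYTES_PER_GRAM = 200       # density figure quoted in the interview
PETABYTES_PER_EXABYTE = 1000   # decimal (SI) units, an assumption


def grams_of_dna(exabytes: float) -> float:
    """Mass of DNA needed to hold the given amount of data at the quoted density."""
    return exabytes * PETABYTES_PER_EXABYTE / PETABYTES_PER_GRAM


print(grams_of_dna(1))  # 5.0
```

At the quoted density, one exabyte works out to about five grams of DNA, a few grams being roughly the mass of a sugar cube.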
It sounds very far-fetched, but I can’t take credit for the idea of storing digital information in DNA. That idea has been around for decades, since the ’50s, even. It just hasn’t been possible to store a lot of information using DNA, because it’s been so expensive to write information into these molecules.
What we’re doing at Catalog that’s novel is that we’ve come up with a platform, a way of doing that in a much cheaper and faster way than what’s been possible with existing technology. We’ve bridged that gap.
How might that come into use in real life? As a proof of concept last year, we stored all of the English text of Wikipedia in DNA, using a new machine we built that prints DNA molecules. You can imagine a very near future where we would have these machines hooked up to data centers, and for things that need long-term archival or highly parallel processing, we would store that information in DNA form, query it as needed, and make thousands of copies if you need to. That ease of copying is another characteristic of DNA that’s very advantageous in a storage medium.