Creating a biological hard drive: scientists use genetic code to store information

When Shakespeare wrote his famous Sonnet 18 in 1609 – the one that begins with the iconic line “Shall I compare thee to a summer’s day?” – he had no idea that one day his words would also be written into the basic structure of life.

Recently, a team of scientists led by Nick Goldman encoded all of Shakespeare’s 154 sonnets onto synthetic DNA, along with Watson and Crick’s classic paper on DNA structure, an audio clip of Martin Luther King Jr.’s “I Have a Dream” speech, and a picture of Goldman’s research institute.

While this is not the first time scientists have stored synthetic information in DNA, it reflects growing experimentation with whether DNA could serve a purpose beyond carrying the genes necessary for life.

Last year, for example, Harvard geneticist George Church encoded his most recent book in DNA. In 2003, the German company Icon Genetics copied a line from Virgil’s “Georgics” into the DNA of genetically modified plants. The line, which translates as “Neither can every soil bear every fruit,” was written in the original Latin and served as a marker to distinguish the genetically modified plants from unmodified ones.

Goldman’s data amount to a little more than 700 kilobytes. That is a negligible quantity when one considers that the world holds an estimated three zettabytes of information. Still, Goldman’s work is significant because it opens a real field for scientists exploring new forms of information storage, a pressing need in today’s Information Age. Scientists estimate that a single gram of DNA could hold as much as 450 billion gigabytes of information.
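To put these figures side by side, the short calculation below takes the quoted 450-billion-gigabytes-per-gram density at face value, as a theoretical ceiling rather than a practical density. It also assumes decimal (SI) units, where 1 KB is 10^3 bytes, 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes. The names are illustrative only.

```python
# Assumption: decimal (SI) units; binary units would shift results by a few percent
KILOBYTE = 10**3
GIGABYTE = 10**9
ZETTABYTE = 10**21

# 450 billion gigabytes per gram, taken at face value as a theoretical ceiling
BYTES_PER_GRAM = 450 * 10**9 * GIGABYTE


def grams_needed(num_bytes: int) -> float:
    """Mass of DNA (in grams) needed for num_bytes at the quoted density."""
    return num_bytes / BYTES_PER_GRAM


goldman_archive = 700 * KILOBYTE  # "a little more than 700 kilobytes"
world_data = 3 * ZETTABYTE        # the estimated three zettabytes

print(f"share of world's data: {goldman_archive / world_data:.1e}")
print(f"DNA for Goldman's archive: {grams_needed(goldman_archive):.1e} g")
print(f"DNA for all the world's data: {grams_needed(world_data):.1f} g")
```

At this ceiling, the archive is roughly a 2-in-10^16 sliver of the world's data. Practical encodings add redundancy and overhead, so real-world masses and volumes would be larger.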

Storing man-made information in genetic code raises a lot of questions. For starters, scientists have only four nucleotide bases to work with (adenine, cytosine, guanine, and thymine), and these must somehow be combined to represent a wide range of information. The trick is to design a workable code that conveys information using essentially four letters. Goldman’s team first tried converting the information into binary code, using A and C to represent zero and G and T to represent one.
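The binary mapping just described can be sketched in a few lines. This is a minimal illustration, not the team’s actual software. Because each bit can be written with either of two bases, the sketch assumes a hypothetical rule: use A (for 0) or G (for 1) unless the previous base was the same letter, in which case use C or T instead. One consequence is that the same base never appears twice in a row. Decoding only needs to know which pair a base belongs to.

```python
def encode(data: bytes) -> str:
    """Encode bytes as DNA, one bit per base: 0 -> A or C, 1 -> G or T."""
    bases = []
    prev = ""
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            bit = (byte >> shift) & 1
            first, second = ("A", "C") if bit == 0 else ("G", "T")
            # Hypothetical rule: switch to the alternate base to avoid a repeat
            base = second if prev == first else first
            bases.append(base)
            prev = base
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Read A/C as 0 and G/T as 1, then regroup the bits into bytes."""
    lookup = {"A": 0, "C": 0, "G": 1, "T": 1}
    bits = [lookup[b] for b in strand]  # KeyError on any non-ACGT character
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):  # ignore trailing partial byte
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


strand = encode(b"Hi")
print(strand)  # prints AGACGACACGTAGACG
assert decode(strand) == b"Hi"
```

Each byte becomes eight bases, so this scheme stores one bit per base.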

But this simple binary code was prone to errors, and Church’s researchers eventually developed a much more complicated code and submitted it for a patent. With so many possible encoding schemes, scientists must find the one that is most cost-efficient while still protecting against errors in the DNA.
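The trade-off between cost and error protection can be illustrated with one of the simplest error-correction ideas: store several copies of the same sequence and take a majority vote at each position. This is a generic technique chosen for illustration. It is not the code either team used. The sketch also assumes the copies are already aligned and of equal length, so it ignores insertions and deletions. The three example reads are hypothetical copies of one strand, each with a different single-base error.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must all have the same length")
    consensus = []
    for column in zip(*reads):  # one tuple of bases per position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


original = "AGACGACACGTAGACG"
reads = [
    "AGTCGACACGTAGACG",  # error at position 2 (A -> T)
    "AGACGACTCGTAGACG",  # error at position 7 (A -> T)
    "AGACGACACGTACACG",  # error at position 12 (G -> C)
]
assert majority_consensus(reads) == original
```

The catch is cost. Three copies mean paying to synthesize three times as much DNA, which is why the choice of scheme matters so much.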

DNA holding synthetic information is stored outside living cells, in a vial of solution. DNA inside cells accumulates mutations over time, so messages kept in cellular DNA could be altered or lost.

The real significance of encoding Shakespeare’s poetry in synthetic DNA is what it may mean for the future of information storage. Goldman’s institution, the European Bioinformatics Institute in the United Kingdom, maintains multiple biological databases for scientists around the world, and that data is becoming increasingly costly to manage and store.

That is not to say DNA storage is cheap. At present it is expensive, particularly because of the cost of physically synthesizing the encoded nucleotide sequences.

The price of storing a megabyte of information in DNA is currently about $12,000, according to Nature, and reading the same information back costs a little more than $200. However, scientists believe that if current trends continue, prices will fall by a factor of ten every five years, a much steeper drop than that of electronic storage on today’s market.
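To see what a tenfold drop every five years implies, the sketch below projects the per-megabyte write cost forward. It assumes the trend holds as a smooth exponential, cost(t) = 12,000 × 10^(−t/5) dollars. That is an idealization of the quoted trend, not a forecast. It also inverts the formula to estimate how long the cost would take to reach a target price. The constant and function names are illustrative.

```python
import math

CURRENT_COST_PER_MB = 12_000.0  # USD per megabyte written, figure quoted from Nature
YEARS_PER_TENFOLD_DROP = 5.0    # assumed: the trend holds as a smooth exponential


def projected_cost(years: float) -> float:
    """Projected write cost per MB after `years`, if prices fall 10x every 5 years."""
    return CURRENT_COST_PER_MB * 10 ** (-years / YEARS_PER_TENFOLD_DROP)


def years_until(target_cost: float) -> float:
    """Years until the per-MB cost reaches `target_cost` under the same assumption."""
    return YEARS_PER_TENFOLD_DROP * math.log10(CURRENT_COST_PER_MB / target_cost)


for years in (0, 5, 10, 15):
    print(f"after {years:2d} years: ${projected_cost(years):,.2f} per MB")
print(f"about $1 per MB after roughly {years_until(1.0):.1f} years")
```

Under these assumptions, writing a megabyte would cost about $1 after roughly 20 years. Any slowdown in the trend would push that date back.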

Scientists also predict that falling prices may one day allow DNA storage to be used by people outside the scientific community, for things like saving a wedding video for future generations.

Besides cost, time is another constraint on DNA storage. Church’s book took several days to encode into DNA’s nucleotide bases, and reading the information back takes computers even longer. For long-term storage, however, Goldman foresees DNA becoming viable despite its cost and time requirements, both of which are expected to drop in the near future.

Interestingly, Goldman has described this form of DNA storage as “apocalypse-proof,” according to Nature. Scientists have recovered DNA from woolly mammoths more than 50,000 years old, as well as from Neanderthal samples that were far from ideally preserved. DNA’s durability over such long spans leads Goldman to believe that this method could preserve information for future generations even in the event of a hypothetical disaster of enormous scale.

Shakespeare’s 154 sonnets weigh about half a pound as a physical book and take up about 100 kilobytes on a digital device. In DNA form, according to Goldman, they shrink to literally a speck of dust. Goldman calculates that, using this biochemical storage technique, the three zettabytes of information in the world today could fit in one and a half cubic meters, or “the back of your station wagon.”

In the final lines of Sonnet 18, Shakespeare writes of his hope that poetry will preserve his love’s memory forever:

“When in eternal lines to time thou grow’st:
So long as men can breathe or eyes can see,
So long lives this, and this gives life to thee.”

Scientists like Goldman and Church have similar hopes. But instead of lines of poetry, they look toward lines of DNA.