
Microsoft studies using DNA as a hard disk and successfully stores 200 MB of data

Via: 博客园 (cnblogs)     Time: 2018/2/23 14:13:43

DNA is seen as a more portable and longer-lasting carrier for storing digital information, and the technology is progressing rapidly. Films, GIF images, and the famous literary work War and Peace have all been stored in DNA, and the amount of data being stored keeps growing.

Technically, storing digital information in DNA works like this: researchers convert the data from 0s and 1s into sequences of the four DNA bases, adenine (A), thymine (T), guanine (G), and cytosine (C), and then synthesize DNA strands that carry those sequences. When the data needs to be retrieved, the researchers sequence the DNA and convert the bases back into 0s and 1s.
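The conversion between 0s and 1s and bases can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest possible scheme, two bits per base with a made-up table (00→A, 01→C, 10→G, 11→T); it is not the encoding Microsoft and the University of Washington used, and practical schemes add redundancy and avoid troublesome sequences such as long runs of a single base.

```python
# Hypothetical mapping of each 2-bit pair to one base; chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a base string: 8 bits per byte, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to 2 bits and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

With this mapping every byte becomes exactly four bases, so decoding is a direct table lookup; the price of that simplicity is that nothing protects against synthesis or sequencing errors.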

As the volume of stored data grows, the techniques for locating data within DNA and for restoring it have had to keep pace.

Recently, Microsoft worked with the Molecular Information Systems Lab (MISL) at the University of Washington to develop a new technique for retrieving and decoding DNA sequences. They stored 35 files, totaling 200.2 MB of data, in 13 million DNA oligonucleotides (short DNA chains), then successfully located and decoded the data from a pool of 10.3 million DNA sequences without losing any data.
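A quick calculation gives a sense of scale. Assuming 1 MB = 10^6 bytes and an even split, each of the roughly 13 million strands accounts on average for only about 15 bytes of the original data, which at an assumed 2 bits per base would be about 62 bases:

```latex
\frac{200.2\times10^{6}\ \text{bytes}}{13\times10^{6}\ \text{strands}} \approx 15.4\ \text{bytes} \approx 123\ \text{bits per strand},
\qquad \frac{123\ \text{bits}}{2\ \text{bits/base}} \approx 62\ \text{bases}
```

Actual strands must be longer than this payload alone, because they also carry primer sites and, typically, redundancy for error correction.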

The data the researchers chose to store in DNA included a high-definition music video of the band OK Go's song "This Too Shall Pass", a collection of classical music, the Universal Declaration of Human Rights in 100 languages, and the Crop Trust's database for the Svalbard Global Seed Vault, among others.

Figure: Storing 200.2 MB of data in DNA

The paper was published in the journal Nature Biotechnology and is also available on Microsoft's official website.

They used a technique called random access.

So-called random access for DNA data is similar to the random access (as in RAM) that computers and mobile phones rely on when retrieving photos and songs. The difference is that when a computer or phone retrieves data, where the data is stored does not affect retrieval speed, and retrieval is very fast. For now, random access in DNA only ensures that the storage location does not matter; the speed of decoding has not improved.

Retrieving data from DNA generally follows the same steps: unwinding the DNA double helix, copying the sequences that hold the data, and converting them back into digital data. To obtain a particular piece of data, the entire DNA pool often has to be sequenced.

Random access in DNA is usually achieved by combining a primer library (a primer is a short segment of DNA or RNA) with the polymerase chain reaction (PCR). Primers added to both ends of each DNA sequence make it faster to locate the data, so researchers do not need to sequence all of the DNA when decoding. PCR speeds up decoding by copying only the sequences that need to be read.
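Primer-based selection can be modeled as simple string matching. In this toy Python sketch, the primer sequences, the file names, and the helpers `make_strand`, `pcr_select`, and `read_payloads` are all hypothetical; the primers are far shorter than real ones, and PCR is idealized as exact doubling per cycle with perfect primer matching (real primers bind to complementary sequence and amplification is imperfect).

```python
from collections import Counter

# Hypothetical primer pairs (forward, reverse), one pair per stored file.
PRIMERS = {
    "file_a": ("ACGTAC", "TTGCAA"),
    "file_b": ("GGATCC", "CCTAGG"),
}


def make_strand(file_id: str, payload: str) -> str:
    """Flank a data payload with its file's forward and reverse primer sites."""
    forward, reverse = PRIMERS[file_id]
    return forward + payload + reverse


def pcr_select(pool: list[str], file_id: str, cycles: int = 10) -> Counter:
    """Toy PCR: only strands carrying both primer sites of file_id are copied.

    Idealized doubling: each matching strand ends up with 2**cycles copies;
    strands belonging to other files are not amplified and are left out.
    """
    forward, reverse = PRIMERS[file_id]
    return Counter({s: 2 ** cycles for s in pool
                    if s.startswith(forward) and s.endswith(reverse)})


def read_payloads(amplified: Counter, file_id: str) -> list[str]:
    """'Sequence' only the amplified strands and strip off the primer sites."""
    forward, reverse = PRIMERS[file_id]
    return sorted(s[len(forward):len(s) - len(reverse)] for s in amplified)


if __name__ == "__main__":
    pool = [make_strand("file_a", "CAGA"), make_strand("file_b", "TTTT"),
            make_strand("file_a", "CGGC")]
    picked = pcr_select(pool, "file_a")
    print(read_payloads(picked, "file_a"))  # ['CAGA', 'CGGC']
```

The point of the model is that only the strands of the requested file are multiplied and read, which is why the whole pool does not have to be sequenced.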

Figure: The process of random access in DNA

In the experiment by Microsoft and the MISL lab, the researchers designed a new primer library along with new algorithms for decoding and restoring data. These improved fault tolerance during storage and decoding, and in the end no data was lost during retrieval. Microsoft contributed much of the work on the decoder and the algorithms.
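Majority voting over several noisy reads of the same strand is one simple way to tolerate errors. The Python sketch below is a generic illustration of that idea, not the researchers' decoding algorithm; it handles only substitution errors and assumes all reads have the same length (insertions and deletions would first require alignment).

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the per-position majority base across reads of the same strand.

    Assumes equal-length reads with substitution errors only.
    """
    if not reads:
        raise ValueError("need at least one read")
    if any(len(r) != len(reads[0]) for r in reads):
        raise ValueError("this sketch assumes all reads have the same length")
    # zip(*reads) yields one column (a tuple of bases) per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    noisy = ["CAGACGGC", "CAGTCGGC", "CTGACGGC"]  # two reads each carry one error
    print(consensus(noisy))  # CAGACGGC
```

Because PCR produces many copies of each strand, several reads of the same sequence are usually available, which is what makes a vote like this possible.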

Microsoft and the University of Washington have collaborated on many projects in DNA data storage and decoding. Microsoft researcher Karin Strauss is also one of the leaders of the MISL lab. For example, in 2016 the two organizations worked together to store 100 classic works, including War and Peace, in DNA.

DNA storage is one of the directions Microsoft favors for future storage technology, for example replacing hard drives in data centers with DNA. Karin Strauss once said:

As a storage medium, DNA has advantages over hard disks, TF cards, and similar media: it is light, and when kept in a dry, low-temperature environment it can be stored for a long time. But DNA synthesis is costly in both money and time. Storing Microsoft's 100 classic works, 200 MB of data in total, required 1.5 billion bases; at Twist Bioscience's price of 0.04 cents per base, that would still cost 60 million dollars.
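The cost estimate is simply the number of bases times the price per base. Worked out for both readings of the price, 0.04 cents per base gives about $600,000, while the $60 million figure corresponds to 4 cents per base:

```latex
\begin{aligned}
1.5\times10^{9}\ \text{bases}\times\$0.0004/\text{base} &= \$6\times10^{5} = \$600{,}000 \quad (\text{0.04 cents per base})\\
1.5\times10^{9}\ \text{bases}\times\$0.04/\text{base} &= \$6\times10^{7} = \$60\ \text{million} \quad (\text{4 cents per base})
\end{aligned}
```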

Reinhard Heckel, a postdoctoral researcher at the University of California, Berkeley, believes that although the cost of the technology keeps falling, it is hard to say whether it will ever become cheaper than tape:

Header image: 超体 (Lucy)
