Random Access Expectation in DNA Storage and Fountain Codes

2026-05-11Information Theory

Information Theory
AI summary

The authors study how many coded pieces from a special type of linear code need to be collected to decode one piece of original information, which is important for DNA data storage. They focus on codes with a fully symmetric structure and show these are equivalent to LT codes in binary form. By analyzing these codes with a peeling decoder for large data sizes, they find a lower bound on the average number of coded pieces needed, normalized by the size of the original data. Their results show that at least about 78.5% of coded pieces are required, and about 78.7% can be achieved.

DNA data storagelinear codesfully symmetric codesLT codesgenerator matrixrandom access expectationpeeling decoderblocklengthcoding theoryinformation symbols
Authors
Christoph Hofmeister, Rawad Bitar, Eitan Yaakobi
Abstract
Motivated by DNA data storage, we study the expected number of coded symbols drawn from a linear code until a desired information symbol can be decoded - the random access expectation. We focus on generator matrices with a type of symmetry, conjectured in prior work to be optimal, which we call fully symmetric. We point out an equivalence between binary fully symmetric codes and LT codes. Using this observation, we analyze the random access expectation of binary fully symmetric codes under a peeling decoder, in the large blocklength limit. Under these assumptions, the random access expectation, normalized by the number of information symbols, is at least π/4 {\approx} 0.7854, while a value of {\approx} 0.7869 is achievable.