Deciphering the Language of Stutter DNA

In this month’s blog I will dabble in some of the fascinating, newly found aspects of the molecular biology of short tandem repeat DNA sequences and their translation. One of the great steps in deciphering the genetic code was the discovery of the mechanism by which cells convert DNA information into proteins. The four-letter alphabet of DNA is first converted into a string of RNA in a process called transcription. The subsequent conversion of the messenger RNA strings into proteins (consisting of 20 different amino acids) is called translation. In general, eukaryotic translation of messenger RNA is tightly controlled at many steps by proteins and ribosomal RNA, resulting in strict regulation of protein synthesis. Paradigmatically, synthesis starts at a start codon, the famous AUG triplet, and continues along the protein-coding sequence that follows, laid out in so-called open reading frames. The process was worked out in the decade after the discovery of the double helix structure in 1953, and for a long time after its description in 1964, AUG start codon mediated translation initiation was a paradigm in molecular biology. However, if you make the effort to sort through the three billion letters that make up the human genome, you find some surprising things. Only about 1% of the three billion letters directly code for proteins.

So what about the 99% of the human genome that does not directly encode proteins? Microsatellites are short tandem repeat (STR) DNA sequences composed of simple motifs (usually 1-6 nucleotides) repeated multiple times at a specific genetic locus. They are found throughout the genome, in both coding and non-coding regions.
It has been found that the human genome is riddled with these microsatellites. Repetitive sequences as a whole cover up to 50% of the human genome, of which microsatellites account for roughly 3%. Microsatellites contribute to genetic diversity and are used for forensic purposes in DNA fingerprinting techniques. The complete collection of these repeats has recently been dubbed the human repeatome. For quite a while the repeatome was ignored in disease research and, although useful as a genetic marker, was regarded as molecular trash. This view started to change when the technical and computational complexities of sequencing and mapping these repeats were solved. Next-generation sequencing and long-read technologies opened up the genome for more systematic analysis of the repeats, and hence the repeatome is now a better categorized part of the human genome. Repeats are becoming more precisely mapped. Their functional effects were first analyzed in disease, but accumulating evidence is starting to sketch a clearer picture of the role of STRs.
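To make the definition above concrete, here is a minimal Python sketch that scans a DNA string for tandem repeats with a motif of 1-6 nucleotides. The function name `find_microsatellites`, the five-copy threshold, and the toy sequence are assumptions made for illustration; real repeat-mapping tools also have to handle imperfect repeats and sequencing errors, which this sketch ignores.

```python
import re

def find_microsatellites(seq, min_motif=1, max_motif=6, min_repeats=5):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Illustrative only: the thresholds are arbitrary assumptions.
    """
    seq = seq.upper()
    # Group 1 is the motif (shortest candidate tried first); the
    # backreference \1 then demands at least min_repeats - 1 further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Toy sequence (made up): a CAG triplet repeat and a GGGGCC hexanucleotide repeat.
toy = "TTGA" + "CAG" * 5 + "TCGT" + "GGGGCC" * 6 + "AT"
hits = find_microsatellites(toy)
```

On the toy sequence the scan reports the CAG run and the GGGGCC run together with their start positions and copy numbers.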
Repeat-associated non-AUG (RAN) translation was discovered relatively recently, in 2011, in studies of the hereditary diseases spinocerebellar ataxia type 8 (SCA8) and myotonic dystrophy type 1 (DM1), which are caused by repeat expansions beyond the range found in normal or “healthy” genes. The expanded microsatellite repeats not only affect translation but are also foci of antisense transcription, resulting in aberrant RNA species that were initially considered untranslatable. Once the repeats were shown to be translated after all, a new mechanism, dubbed repeat-associated non-ATG translation, was established to explain the observations. This raises several questions. How do these RAN proteins affect the health of cells? Why are RAN proteins not translated more often, given the ubiquitous distribution of repeated sequences in the human genome? How does the cell deal with RNAs generated from repeats? This non-canonical form of translation occurs in all reading frames, from both coding and non-coding regions of sense and antisense transcripts carrying expansions of trinucleotide to hexanucleotide repeat sequences. RAN translation challenges the canonical rules of translation initiation. It generates unexpected repeat proteins, opening new paradigms in disease mechanisms and cell biology. The general theme is the expansion of triplet and other short repeats from a small number of copies to a hundred or even thousands. It is now clear that many expansion mutations are bidirectionally transcribed, producing two toxic expansion RNAs, which can in turn produce up to six mutant proteins by RAN translation.
Because these microsatellites express both sense and antisense expansion transcripts, and RAN proteins in all three reading frames, each repeat expansion mutation can produce up to six different polymeric proteins: three from the sense transcript and three from the antisense transcript. Depending on the repeat motif, homo-, di-, tetra- and penta-peptide RAN proteins can be produced. Depending on the location of a given expansion mutation within its gene, toxicity may involve several non-exclusive mechanisms, including protein loss- or gain-of-function, RNA gain-of-function, and RAN protein toxicity.
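The arithmetic behind "up to six proteins" can be sketched in a few lines of Python. The sketch below simply reads an expanded repeat in all three frames on both strands using the standard genetic code, written in the DNA alphabet (T instead of U) for convenience. The function names and the copy number of 12 are assumptions for illustration, and the code says nothing about how the ribosome actually initiates without an AUG; it only shows which repeat peptides each frame would encode.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Sequence of the opposite (antisense) strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate_frame(dna, frame):
    """Translate every complete codon starting at offset frame (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def repeat_unit(peptide):
    """Shortest unit whose repetition reproduces the peptide."""
    for size in range(1, len(peptide) + 1):
        unit = peptide[:size]
        if (unit * len(peptide))[:len(peptide)] == peptide:
            return unit
    return peptide

def ran_products(motif, copies=12):
    """Repeat peptide unit for each (strand, frame) of an expanded repeat.

    Hypothetical helper: copies=12 is an arbitrary illustrative value.
    """
    sense = motif.upper() * copies
    antisense = reverse_complement(sense)
    return {
        (strand, frame): repeat_unit(translate_frame(seq, frame))
        for strand, seq in (("sense", sense), ("antisense", antisense))
        for frame in range(3)
    }

cag = ran_products("CAG")        # trinucleotide repeat: homopolymers
ggggcc = ran_products("GGGGCC")  # hexanucleotide repeat: dipeptide repeats
```

A CAG triplet repeat yields single-amino-acid polymers in every frame (glutamine, serine and alanine from the sense strand), while a GGGGCC hexanucleotide repeat yields dipeptide repeats, in line with the statement that the repeat motif determines the kind of RAN protein produced.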
So why were these translated repeats not discovered earlier? For one, the dogma of canonical translation initiation and its complex control discouraged research into how repeats behave in translation and disease. The discovery of repeat-associated non-AUG (RAN) translation was a significant breakthrough that challenged existing models of canonical translation. It revealed that repeat expansion mutations can lead to the expression of unexpected repetitive proteins in all three reading frames without an AUG initiation codon. Prior to this discovery, the canonical rules of translation initiation did not account for such unconventional protein synthesis.
It is now clear that many microsatellite expansion mutations are bidirectionally transcribed, producing both sense and antisense toxic expansion RNAs, which can produce up to six mutant proteins by repeat-associated non-AUG (RAN) translation. This shows that stutter DNA, better known as short tandem repeats, is not only useful for forensic applications but is also proving increasingly important for understanding the molecular biology of hereditary and aging-related diseases.

