In higher eukaryotes such as humans, the function of only a small part of the genome is known. Most of the genome (over 97% in humans) is designated as so-called junk DNA without any well-known function. To understand how the genome functions as a whole, we obviously need to understand the overall function of this genomic desert, since it is the largest part of the genome.
My guess is that there should exist one high-level theory that explains most of the junk DNA, instead of many smaller theories.
Here I do just this. I give a new theory to explain most of the junk DNA by using a few common properties of higher eukaryotes and the DNA in their genomes.
Let's use the human as a model organism for higher eukaryotes.
We make the following observations:
1) The human is a multicellular organism with several types of cells.
2) DNA (the chromosomes) with the same sequence is able to function in a specific way in each type of nucleated cell of our body.
3) The DNA molecule in the nucleus of an interphase cell is a huge and strongly three-dimensional biomolecule. This means that the DNA is not like a simple magnetic tape from which data can be read by transcription anywhere and at any time.
Current explanations for junk DNA seem to forget these basic properties of higher eukaryotes and their genomes. It is something of a miracle that the same DNA molecule (DNA with the same sequence) is able to function in a specific way in each type of our nucleated cells, in trillions of cells.
Obviously, one reason these basic things are forgotten is that quite little is known about the internal architecture of the nucleus and how the DNA behaves there as a three-dimensional molecule. However, I believe that despite the lack of information it is possible to determine some properties of the three-dimensional structure of functioning DNA in an interphase cell. And, what is essential here, those properties also have consequences for the one-dimensional structure of the DNA, i.e. its sequence.
I call my theory the “DNA spaghetti theory”; the name comes from a simple analogy I use later.
The human body contains trillions of nucleated cells of hundreds of types. DNA with the same sequence functions in a specific way in each of these cells.
A specific set of transcription units is read in each type of cell. These transcription units must somehow be biochemically visible in the nucleus in order to be transcribed.
Obviously, biochemical visibility implies that the transcription units must also be, in some sense, three-dimensionally visible in the nucleus.
This means that the DNA must have some three-dimensional structure which supports the transcription required in the cell.
If we use the same terms as for proteins, we can ask how the DNA folds into the specific three-dimensional structure that supports the required transcription.
To answer this, let's compare DNA and proteins. Compared to DNA, proteins are quite small molecules. By folding they achieve a more or less strict three-dimensional structure that supports one or only a few biochemical functions. In contrast to proteins, DNA is a very large and flexible biomolecule, and it is used in many different complex tasks in different types of cells.
For these reasons it is obvious that the DNA does not have any strict three-dimensional structure in a specific type of interphase cell. At the same time, however, it has some three-dimensional structure which supports the required transcription in that cell.
How is this possible?
There can be only one high-level model which explains this situation. It is obvious that the active parts of the DNA, especially active transcription units, are pulled out from the rest of the DNA due to the heterogeneous environment of the nucleus. The exact three-dimensional positions where the transcription units end up can be quite random (although some relations between them can exist). Of course, these three-dimensional positions must be suitable for transcription. It is well known that passive DNA, heterochromatin, is usually located at the periphery of the nucleus and the active parts normally in the inner areas of the nucleus.
This means that the DNA must have a different three-dimensional structure even in individual cells of the same type, and even in one cell from time to time, even when the expression is the same.
This model is simplified, but when we simplify it even more with the following spaghetti analogy (hence the DNA spaghetti theory), it is easy to understand that the genomes of higher eukaryotes must contain a lot of DNA which has been erroneously called junk DNA:
Let's imagine we have a box (the nucleus) containing some spaghetti (the genome of a higher eukaryote). On the spaghetti we have black beads (all the transcription units of the genome). Our task is to pull out a specific subset of beads (the transcription units transcribed in a specific cell) to randomly selected positions on the surface of the box. It is required that this task can be repeated over and over for partly overlapping sets of beads (for all the different cell types of the organism). For this to be possible, it is clear that there must be quite a lot of empty space between the beads, the exact amount depending on the dimensions of the box and on how the partly overlapping sets (transcriptomes) overlap.
This model is independent of the geometry of the box. Of course, in the nucleus the geometry is different, as briefly explained above.
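The spaghetti analogy can be tried out as a small toy simulation. The sketch below is only an illustration under assumptions that are not part of the theory itself: the box is taken to be a unit cube, the beads sit at equal intervals along one strand, and a set of beads counts as "reachable" if the strand between each pair of neighbouring selected beads is at least as long as the straight-line distance between their target positions. This is only a necessary condition; tangling and the thickness of the strand are ignored. All function and parameter names are hypothetical.

```python
import math
import random


def random_point_on_unit_cube_surface(rng):
    """Return a uniformly random point on the surface of the unit cube."""
    face = rng.randrange(6)        # six faces of equal area
    axis, side = divmod(face, 2)   # axis 0..2, side 0 or 1
    point = [rng.random(), rng.random(), rng.random()]
    point[axis] = float(side)      # fix one coordinate onto the chosen face
    return tuple(point)


def is_reachable(bead_indices, targets, spacing):
    """Necessary condition only: the strand between two neighbouring
    selected beads must be at least as long as the straight-line distance
    between their targets. Tangling and strand thickness are ignored."""
    for k in range(len(bead_indices) - 1):
        strand_length = (bead_indices[k + 1] - bead_indices[k]) * spacing
        if math.dist(targets[k], targets[k + 1]) > strand_length:
            return False
    return True


def feasible_fraction(n_beads, n_selected, spacing, trials=2000, seed=0):
    """Estimate how often a random subset of beads can be pulled to random
    surface positions, for a given spacer length between adjacent beads."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A random "cell type": a sorted subset of bead positions on the strand.
        chosen = sorted(rng.sample(range(n_beads), n_selected))
        targets = [random_point_on_unit_cube_surface(rng) for _ in chosen]
        if is_reachable(chosen, targets, spacing):
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for spacing in (0.01, 0.05, 0.1, 0.5, 2.0):
        print(spacing, feasible_fraction(200, 20, spacing))
```

With a fixed seed the same random subsets and targets are drawn for every spacing, so the estimated fraction cannot decrease as the spacing grows, and it reaches 1 once the spacing exceeds the diagonal of the cube (the square root of 3). In this toy setting, short spacers make most sets of beads unreachable, which is the point the analogy makes about empty space between the beads.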
We should remember that a cell is never born de novo, but arises through different kinds of divisions and associations of other cells, with a large number of inherited epigenetic features. So the pulling out of DNA never happens in a newborn homogeneous space, but in an environment built by evolution.
Some comments:
It is possible that in the real nucleus the pulling out of DNA happens gradually. Perhaps larger clusters are pulled out first, then smaller ones, depending on what kinds of sets of transcription units are needed in the specific biochemical situation. For example, transcription units can be located close to each other if they are always used at the same time. On the other hand, the known irregular distribution of junk DNA in genomes perhaps reflects these stages. It is also possible that at some times unused transcription units are pulled back when they are no longer needed, for example during development. It is also possible that the DNA forms some kind of net in the nucleus of a living cell.
In this model it would be sensible to regard many regulatory elements in DNA as “hanging points” of the DNA. For example, it is known that some regulatory elements in DNA function independently of orientation, and others can function as an enhancer or a repressor depending on the situation. It also sounds natural that many “old-fashioned” regulatory elements can be located far from the genes or transcription units they regulate.
The idea is quite old. I first published it on the internet in April 2003, but in Finnish.
If you want to read the Finnish version, it is here:
http://koti.welho.com/hvirkkun/artikkeli/noncoding.html
If you have any comments please send them to me:
Heikki Virkkunen
heikki.virkkunen@welho.com