I have been prompted by reading about genomes to produce a different version of the chromosome parable at reference 1, a parable itself prompted by going to a talk by one Paul Davies, physicist and personality.
A single cell.
A colony.
We are most interested in what goes on in a single cell, although this cell is likely to be one of many, all managing to coexist in a more or less shared environment. And there is likely to be rather more structure than is shown here; we have geometry as well as chemistry.
So, inside our cell we have a chemical soup, fairly well mixed, at some temperature and pressure.
These are fairly complicated chemicals and stuff is always going on. The sort of stuff which might be illustrated by the sort of figure snapped above. Some of this goes to maintain the boundary wall, which needs constant attention given the buffeting it takes, or perhaps even to make it a bit bigger, a bit of growth.
In addition, there is traffic at the entry ports, traffic potentially in both directions and probably regulated by more or less complicated gates. Inputs will usually include ingredients for whatever the cell is brewing up and something to provide the energy to keep the show on the road. Outputs might be for use elsewhere – or for waste disposal.
But sometimes this is not enough and the cell needs to deploy the protein manufacturing capability of DNA, with, for these purposes, proteins being considered as more or less long sequences of amino acids. The trick of the DNA is to hold compact, coded versions of the necessary sequences. Given that there are only around twenty of these acids, the letters of the alphabet, or five bits, would suffice as codes.
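As a rough check on that arithmetic, a minimal Python sketch. The function name `bits_needed` is made up for the purpose; it returns the smallest number of bits that can give each of a given number of symbols its own code.

```python
def bits_needed(n_symbols: int) -> int:
    """Smallest number of bits that can give each of n_symbols its own code."""
    # (n - 1).bit_length() is an exact integer version of ceil(log2(n)).
    return max(1, (n_symbols - 1).bit_length())


print(bits_needed(20))  # the twenty or so amino acids
print(bits_needed(26))  # the letters of the alphabet
```

Both come out at five, so twenty acids fit inside a 26 letter alphabet with room to spare.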
Genetic information is stored as base pairs in the double stranded DNA. But it is mostly used when transcribed into the mostly single stranded RNA, of which there are a number of different kinds. A process which, inter alia, goes on to drop all the introns and leave just the exons. For which see below.
It seems that nature is very conservative and most of life – at least the sort that you can see – uses the same code, unlike computers where there are lots of coding conventions and lots of standards.
As noted at reference 1, one novelty of DNA is that it is about information more than chemistry and it can store any sequence of letters, rather in the way that computer memory can store any sequence of bits. Memory is indifferent to what those bits might be saying and, in something of the same way, the chemical machinery associated with DNA will, within limits, work with any sequence of amino acids, without needing specialised reactions of the sort illustrated above.
That said, while a protein might be completely specified by its sequence of amino acids, the folding behaviour of that assembled sequence is important in understanding how it works. Noting in passing that one of the successes of Google AI, in the form of DeepMind's AlphaFold, has been to predict, reasonably accurately, how a protein will fold, saving organic chemists a lot of time and trouble.
Noting also that while DNA might be indifferent to the sequences it stores, I do not believe the converse to be true: an arbitrary sequence of amino acids might not make a viable protein; it might fall apart almost as fast as you can build it.
Noting also that it is the huge potential of proteins specified in this simple, linear way which makes life as we know it possible.
Some technical points
Most of the human genetic code is held in the 23 pairs of chromosomes which are held in nearly all the cells of a human body. At a more or less optical level, these are summarised in the figure above – lifted from reference 3 below – where plenty of zoom is available.
The chromosomes and their bands, which do not vary much from person to person, provide a useful framework for mapping.
For present purposes, the genetic code can be thought of as being expressed in triples of base pairs called codons, where each codon codes for an amino acid. Given that there are four bases, there are 64 different codons, which, allowing for a small amount of punctuation, gives plenty of redundancy: the map from codon to amino acid is very much many to one. The snap above, taken from the German, gives the idea. Green for start bottom and red for stop right.
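The counting in the paragraph above can be checked with a short Python sketch. It builds the standard genetic code from the usual compact 64 letter layout, written in DNA letters (T rather than the U of RNA), with `*` standing for a stop codon. The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one amino acid letter per codon, with the codons
# taken in TCAG order at each of the three positions. '*' means stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

codons_per_letter = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = sorted(aa for aa in codons_per_letter if aa != "*")

print(len(CODON_TABLE))        # 4 * 4 * 4 codons
print(len(amino_acids))        # distinct amino acids coded for
print(stop_codons)             # the punctuation
print(codons_per_letter["L"])  # leucine, one of the most redundant
print(codons_per_letter["W"])  # tryptophan, one of the least
```

So 64 codons serve twenty amino acids and three stops, with some acids having as many as six codons and others just one.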
Put simply, start and end are marked by start and end codons. These are supplemented by special sequences which serve in much the same way as an identifier on a computer record.
It was once thought that a gene consisted of a simple sequence of codons, started with a start codon and stopped with one of the stop codons. Things turned out to be more complicated, with that simple sequence often being broken up by stretches called introns, giving a sequence of exons and introns. Introns are removed by RNA processing before the serious business of translation into proteins starts.
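The removal of introns can be pictured with a toy Python sketch. Everything here is invented for illustration: the sequence, the intron positions and the function name `splice`. In particular, the sketch is simply told where the introns are, which says nothing about how the cell finds them.

```python
def splice(transcript: str, introns: list[tuple[int, int]]) -> str:
    """Cut out the half-open [start, end) intron spans and join what is left."""
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        pieces.append(transcript[pos:start])  # the exon before this intron
        pos = end                             # skip over the intron
    pieces.append(transcript[pos:])           # the final exon
    return "".join(pieces)


# Toy transcript: exons in upper case, introns in lower case, for legibility only.
transcript = "ATGGCC" + "gtaagtttcag" + "AAATTT" + "gtccctag" + "GGGTAA"
introns = [(6, 17), (23, 31)]  # assumed to be known in advance

print(splice(transcript, introns))
```

What comes out is the three exons run together, starting with the start codon ATG and ending with the stop codon TAA.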
Complications.
[BP506-06. An example of a section of DNA translated by a computer in all six possible reading frames. The open reading frames (ORFs) for gene 1 and gene 2 are highlighted]
RNA processing might remove more than just an intron, perhaps an exon flanked by two introns as well. This brings some variety into the proteins being generated, variety which is sometimes helpful, sometimes not. I have not been able to come up with a simple story about how introns are identified or marked, but there must be something: perhaps something left behind by the process which inserted them into the chromosome in the first place.
Note that we do not need to be able to identify, to find introns in the way that we do for genes. It is enough for RNA processing simply to dump them when it comes across them.
Codons might get out of alignment on their base pair foundation. If, for example, a base pair is deleted, the triples that follow are apt to be nonsense, reading as quite the wrong sequence of amino acids. In which connection the concept of reading frame is relevant.
There are lots of repeats. The same protein may be specified many different times, perhaps in different places in the set of chromosomes, perhaps with minor differences between them.
There is lots of copying going on. DNA is copied when cells divide and RNA is copying chunks of DNA all the time. Proteins do not last very long and cells need to keep stocked up. Now while the RNA machinery is very clever and the error rate is very low, the large numbers mean that plenty of errors – aka mutations – creep into the system. Some for better, some for worse.
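The point above about codons getting out of alignment can be seen in a small Python sketch. The sequence is a toy one, invented for the purpose, and the codon table is the standard one in DNA letters, with `*` standing for stop.

```python
from itertools import product

# Standard genetic code in DNA letters; '*' means stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Read dna three bases at a time from the start, ignoring any leftover."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGGCCAAATTTGGGTAA"          # toy gene: start, four codons, stop
deleted = original[:3] + original[4:]    # lose the single base after the start codon

print(translate(original))  # MAKFG*
print(translate(deleted))   # MPNLG
```

After the deletion every codon downstream of the start reads differently, and the stop codon has vanished from the reading altogether.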
In the snap above, lifted from Wikipedia, we have a stretch of bacterial DNA, together with the six possible readings, three forwards and three backwards, readings which use the standard one letter abbreviations for amino acids. With asterisk marking start and stop. The point being, I think, that the two genes use different reading frames, a complication which RNA processing has to cope with somehow.
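The six readings are easy enough to reproduce in a Python sketch: three offsets on the strand as given, three more on its reverse complement. The sequence below is a toy one and not the one in the snap; the function names and frame labels are made up; and in the table used here `*` stands for a stop codon.

```python
from itertools import product

# Standard genetic code in DNA letters; '*' means stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Read dna three bases at a time from the start, ignoring any leftover."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """The other strand, read in its own forward direction."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict[str, str]:
    """Translate dna at offsets 0, 1 and 2 on both strands."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCAAATTTGGGTAA").items():
    print(name, protein)
```

Only one of the six readings of this toy sequence runs cleanly from a start to a stop, which is the sort of thing an open reading frame search looks for.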
There is plenty of unused space in most genomes. To the extent that I recall an anecdote about chemists writing the works of Shakespeare into the genomes of suitable bacteria on slow Friday afternoons. I have not bothered to check this anecdote today, but I believe the spirit of it, at least, to be fair enough.
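To give the flavour of writing text into DNA, a Python sketch using about the simplest scheme one could think of: two bits per base, so four bases per byte. This scheme is an assumption made for illustration, not a claim about what anyone has actually done in a laboratory, and the function names are made up.

```python
BASES = "ACGT"  # A=00, C=01, G=10, T=11: an arbitrary, hypothetical assignment


def text_to_dna(text: str) -> str:
    """Encode text as bases, four bases per byte, most significant bits first."""
    data = text.encode("utf-8")
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna, rebuilding each byte from four bases."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")


line = "To be, or not to be"
encoded = text_to_dna(line)
print(encoded)
print(len(encoded), "bases for", len(line), "characters")
print(dna_to_text(encoded))
```

At four bases per character, a play of a hundred thousand or so characters would need some hundreds of thousands of bases.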
Conclusions
A bit of background for a post to follow, hopefully fairly soon.
PS: for those that prefer (possibly elderly) texts to Wikipedia, a lot of the foregoing is to be found in Chapter 1 of reference 4. With rather more serious treatment scattered through reference 5.
References
Reference 1: https://psmv4.blogspot.com/2019/03/maxwells-demon-revisited.html.
Reference 2: https://en.wikipedia.org/wiki/Protein.
Reference 3: https://en.wikipedia.org/wiki/Chromosome.
Reference 4: Molecular Biology of the Cell - Alberts, Johnson, Lewis, Raff, Roberts and Walter - 2002. First published 1983, with this being the fourth edition.
Reference 5: Genes VII - Benjamin Lewin - 2000. Originally published in 1983, now at least at Genes XI published in 2014, with a raft of co-authors.





