Error catastrophe


Error catastrophe is the extinction of an organism as a result of excessive mutations. Error catastrophe is something predicted in mathematical models and has also been observed empirically.
Like every organism, viruses 'make mistakes' during replication. The resulting mutations increase biodiversity among the population and help subvert the ability of a host's immune system to recognise it in a subsequent infection. The more mutations the virus makes during replication, the more likely it is to avoid recognition by the immune system and the more diverse its population will be. However, if it makes too many mutations, it may lose some of its biological features which have evolved to its advantage, including its ability to reproduce at all.
The question arises: how many mutations can be made during each replication before the population of viruses begins to lose self-identity?

Basic mathematical model

Consider a virus which has a genetic identity modeled by a string of ones and zeros. Suppose that the string has fixed length L and that during replication the virus copies each digit one by one, making a mistake with probability q independently of all other digits.
Due to the mutations resulting from erroneous replication, there exist up to 2L distinct strains derived from the parent virus. Let xi denote the concentration of strain i; let ai denote the rate at which strain i reproduces; and let Qij denote the probability of a virus of strain i mutating to strain j.
Then the rate of change of concentration xj is given by
At this point, we make a mathematical idealisation: we pick the fittest strain and assume that it is unique ; and we then group the remaining strains into a single group. Let the concentrations of the two groups be x, y with reproduction rates a>b, respectively; let Q be the probability of a virus in the first group mutating to a member of the second group and let R be the probability of a member of the second group returning to the first. The equations governing the development of the populations are:
We are particularly interested in the case where L is very large, so we may safely neglect R and instead consider:
Then setting z = x/y we have
Assuming z achieves a steady concentration over time, z settles down to satisfy
.
So the important question is under what parameter values does the original population persist ? The population persists if and only if the steady state value of z is strictly positive. i.e. if and only if:
This result is more popularly expressed in terms of the ratio of a:b and the error rate q of individual digits: set b/a = , then the condition becomes
Taking a logarithm on both sides and approximating for small q and s one gets
reducing the condition to:
RNA viruses which replicate close to the error threshold have a genome size of order 104 base pairs. Human DNA is about 3.3 billion base units long. This means that the replication mechanism for human DNA must be orders of magnitude more accurate than for the RNA of RNA viruses.

Information-theory based presentation

To avoid error catastrophe, the amount of information lost through mutation must be less than the amount gained through natural selection. This fact can be used to arrive at essentially the same equations as the more common differential presentation.
The information lost can be quantified as the genome length L times the replication error rate q. The probability of survival, S, determines the amount of information contributed by natural selection— and information is the negative log of probability. Therefore, a genome can only survive unchanged when
For example, the very simple genome where L = 1 and q = 1 is a genome with one bit which always mutates. Since Lq is then 1, it follows that S has to be ½ or less. This corresponds to half the offspring surviving; namely the half with the correct genome.

Applications

Some viruses such as polio or hepatitis C operate very close to the critical mutation rate. Drugs have been created to increase the mutation rate of the viruses in order to push them over the critical boundary so that they lose self-identity. However, given the criticism of the basic assumption of the mathematical model, this approach is problematic.
The result introduces a Catch-22 mystery for biologists, Eigen's paradox: in general, large genomes are required for accurate replication, but a large genome requires a high accuracy rate q to persist. Which comes first and how does it happen? An illustration of the difficulty involved is L can only be 100 if q' is 0.99 - a very small string length in terms of genes.