Using Statistics to Decipher Secrets of Natural Mutation
A new mathematical approach for analyzing the complex, subtle patterns of natural mutation in DNA is likely to help biologists understand how mutation contributes to evolutionary change in mammals.
The researchers, Howard Hughes Medical Institute investigator Philip Green and his student Dick Hwang, published a report describing the first applications of their new analytical approach in the August 3, 2004, online early edition of the Proceedings of the National Academy of Sciences.. Both Hwang and Green are at the University of Washington in Seattle.
“Understanding naturally occurring mutations has been of great interest because mutations are major drivers of evolution,” said Green. “However, it’s surprising how little is still known about their causes.”
Previous studies have revealed a number of biases in the rates of different types of mutational change. These arise in part from the innate biochemical characteristics of the four DNA nucleotide bases – adenine, guanine, cytosine and thymine — that affect their vulnerability to modification and the accuracy with which they are replicated when cells divide. Particular nucleotide sequences, for example, cytosine-guanine (CpG) dinucleotides, form “hotspots” — regions that are particularly vulnerable to alterations that convert one nucleotide to another, causing mutations.
To understand these biases, Hwang and Green sought to develop a flexible approach to analyze the process of “neutral DNA evolution” in regions of the genome thought to lack genes and other functionally important sequences. “If you want to get an unvarnished picture of the mutation process itself, uncorrupted by natural selection, you want to look at neutrally evolving DNA,” said Green. “Mutations in DNA that is not functional should better represent the complete spectrum of naturally occurring mutations. Mutations are of course also occurring in the genes and those are of interest because they can create new phenotypes and cause variation among traits. Some of those mutations are advantageous and consequently quickly spread through the species, while others are deleterious and are weeded out. So genes and other features don’t evolve at neutral rates.”
“Apart from their intrinsic interest, we think understanding the underlying mutation patterns better will also help us in finding the functionally important features in the genome. Basically, it’s a signal-to-noise issue, where the naturally occurring mutations are the `noise’ and the functional parts of the genome are the `signal.’ The better we understand the noise, the better job we can do of understanding the signals.”
To begin to understand the patterns of DNA changes that result from neutral mutation, Hwang and Green developed a new version of a powerful statistical technique that they call “Bayesian Markov chain Monte Carlo sequence analysis.” Basically, the technique enables them to feed in sequence information from genomes of different organisms and discern patterns that can distinguish models of mutational mechanisms.
According to Green, the statistical approach offers an effective way to analyze models that are very difficult or impossible to solve analytically. “Until recently,” he said, “the state of the art in the molecular evolution field was to use models that people knew were gross over-simplifications, but had the merit that you could solve them analytically. Without doing too much computation, you could make estimates of mutation rates of various sorts. However, the cost of that simplified approach was a model that is unrealistic.”
In particular, he said, the standard model treated all positions in the sequence as evolving independently of each other, rather than taking into account context effects, in which the identity of neighboring nucleotides influences the nature and rate of mutations.
“While a few other investigators have been working on how to take into account context effects, I think we are doing it in a more rigorous, more complete way,” said Green. Without such a rigorous approach, he said, models of evolution could give erroneous results regarding the effects of mutation.
“I think the more realistic you can make the model, the less likely you are to be led astray by drawing conclusions that really had more to do with the deficiencies of your model than with the underlying reality,” said Green.
Hwang and Green tested their analytical approach by using it to compare the sequences of corresponding genome segments from 19 mammalian species, including human, horse, lemur, rat, rabbit, hedgehog and armadillo. Such comparisons among species across the mammalian evolutionary tree can provide insight into how mutational patterns have changed over evolutionary time.
They focused their analysis on a 1.7 million base-pair DNA segment known as the “greater cystic fibrosis transmembrane conductance regulator region,” which was sequenced in the 19 mammals by Eric Green and his colleagues at the National Human Genome Research Institute. To concentrate on the neutrally evolving DNA, Hwang and Phil Green excluded the genes from those segments and compared what was left.
According to Green, the comparison of context-dependent mutation in the segments across the species revealed that the CpG mutations, unlike other mutation types, accumulated in a regular clock-like fashion. The analysis also distinguished other sources of naturally occurring mutations and their variation due to biological and biochemical influences, and appears to offer some insight into factors such as generation time and population size that have varied in mammalian evolution.
Green concluded that by contributing to a better understanding of naturally occurring mutations, the technique would help in understanding both how genetic disease arises and how evolution has occurred.
A next step, he said, will be to extend the analysis to sites on the genome that are not evolving neutrally. This should help identify genomic regions that were not previously recognized to be of functional importance, said Green. Also, he said, such analyses could offer considerable insight into how patterns of natural selection have varied across different species in the course of evolution.
“A more complex model of the neutral process should start to pay for itself in exploring these phenomena, because you’re frequently looking for relatively subtle effects,” said Green.