ICB 2007, Hong Kong Supplementary formulas Rates of Evolutionary Changes
Outline
- Nei & Gojobori’s unweighted method
  - Synonymous / nonsynonymous sites
  - Synonymous / nonsynonymous nucleotide differences
- Jukes and Cantor’s model for the multiple nucleotide substitution correction
- Estimation of the divergence time
Nei & Gojobori’s unweighted method
- Nei & Gojobori’s unweighted method is a simple way of estimating the number of synonymous and nonsynonymous nucleotide substitutions between two protein-coding sequences.
- We’ll show this method in two steps:
  - Step 1: Count the synonymous / nonsynonymous sites in one or more sequences.
  - Step 2: Count the synonymous / nonsynonymous nucleotide differences between two sequences.
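Synonymous / Nonsynonymous sites
- For example, the codons UUA, UUG, CUU, CUC, CUA and CUG all represent the same amino acid (leucine).
- We first calculate the synonymous fraction of each position of UUA, i.e. the fraction of the three possible nucleotide substitutions at that position that leave the amino acid type unchanged (O = synonymous, X = nonsynonymous):

  Position 1 (U), f1 = 1/3    Position 2 (U), f2 = 0/3    Position 3 (A), f3 = 1/3
  U→C  O                      U→C  X                      A→U  X
  U→A  X                      U→A  X                      A→G  O
  U→G  X                      U→G  X                      A→C  X

- The codon UUA therefore contributes s = f1 + f2 + f3 = 2/3 synonymous sites and n = 3 - s = 7/3 nonsynonymous sites. S and N are the sums of s and n over all codons of a sequence.
- When we are comparing two sequences, calculate S and N for both sequences over the length of the shorter sequence and average the values from the two sequences.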
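The per-position counting for UUA can be sketched in Python. This is a minimal illustration rather than code from the slides: the names `CODON_TABLE`, `synonymous_fractions` and `site_counts` are hypothetical, the standard genetic code is assumed, and a change to a stop codon is counted as nonsynonymous, which is what f2 = 0/3 for UUA implies.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, built compactly (assumption: '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fractions(codon):
    """Return [f1, f2, f3]: for each position, the fraction of the three
    possible substitutions that leave the encoded amino acid unchanged."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA spelling
    amino_acid = CODON_TABLE[codon]
    fractions = []
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino_acid:
                synonymous += 1
        fractions.append(Fraction(synonymous, 3))
    return fractions

def site_counts(sequence):
    """Sum s (synonymous) and n = 3 - s (nonsynonymous) sites over all codons."""
    S = Fraction(0)
    N = Fraction(0)
    usable = len(sequence) - len(sequence) % 3  # drop a trailing partial codon
    for i in range(0, usable, 3):
        s = sum(synonymous_fractions(sequence[i:i + 3]))
        S += s
        N += 3 - s
    return S, N

f1, f2, f3 = synonymous_fractions("UUA")
print(f1, f2, f3)          # 1/3 0 1/3
print(site_counts("UUA"))  # (Fraction(2, 3), Fraction(7, 3))
```

For a pair of sequences, one would call `site_counts` on each (truncated to the shorter length) and average the two S values and the two N values.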
Synonymous / Nonsynonymous nucleotide difference
- The differences between two sequences are counted codon by codon, with Sd and Nd as the running counts of synonymous and nonsynonymous differences.
- When comparing two codons, there are three possibilities: one, two or three nucleotide pairs are mismatched. Each case is handled separately.
- Case 1: 1 mismatched nucleotide pair
  - If the two codons represent the same amino acid, Sd = Sd + 1.
  - If the two codons represent different amino acids, Nd = Nd + 1.
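The codon-by-codon walk can be sketched as follows. This is only an illustration: `codon_pairs` and `mismatch_count` are hypothetical helper names, the example sequences are made up, and the sequences are assumed to be aligned in frame without gaps.

```python
def codon_pairs(seq_a, seq_b):
    """Yield aligned codon pairs over the length of the shorter sequence."""
    length = min(len(seq_a), len(seq_b))
    length -= length % 3  # ignore a trailing partial codon
    for i in range(0, length, 3):
        yield seq_a[i:i + 3], seq_b[i:i + 3]

def mismatch_count(codon_a, codon_b):
    """Number of mismatched nucleotide pairs (0 to 3) between two codons."""
    return sum(a != b for a, b in zip(codon_a, codon_b))

# Made-up sequences; each codon pair falls into case 1, 2 or 3.
pairs = list(codon_pairs("TTACCCAAACTT", "TTGCAATATAGG"))
cases = [mismatch_count(a, b) for a, b in pairs]
print(cases)  # [1, 2, 2, 3]
```

Pairs with a count of 0 contribute nothing. The other counts select which of the three cases applies.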
Synonymous / Nonsynonymous nucleotide difference
- Case 2: 2 mismatched nucleotide pairs
  - Assuming only one nucleotide substitution occurs at a time, there are two equally probable paths between the codons. For example, for CCC → CAA:
    - Path 1: CCC → CCA → CAA
    - Path 2: CCC → CAC → CAA
  - There are 4 substitutions in total over the two paths: 1 is synonymous (CCC → CCA) and 3 are nonsynonymous. Averaging over the 2 paths gives Sd = Sd + 1/2 and Nd = Nd + 3/2, so that the two increments add up to the 2 mismatches.
  - There are always 2 paths for 2 mismatched pairs, but if a path involves a stop codon, that path is ignored.
Synonymous / Nonsynonymous nucleotide difference
- Case 2: 2 mismatched nucleotide pairs, special case (stop codon in path)
  - One example of this special case is AAA → TAT:
    - Path 1: AAA → TAA → TAT
    - Path 2: AAA → AAT → TAT
  - In path 1, TAA is a stop codon, so path 1 is ignored and we only use the two substitutions in path 2, both of which are nonsynonymous. Averaging over the single remaining path gives Sd = Sd + 0 and Nd = Nd + 2.
Synonymous / Nonsynonymous nucleotide difference
- Case 3: 3 mismatched nucleotide pairs
  - There are six equally probable paths for such a substitution. The reasoning is the same as in the previous case, except that each path now has three substitutions, which can occur in 3! = 6 different orders. For example, for CTT → AGG:
    - CTT → ATT → AGT → AGG
    - CTT → ATT → ATG → AGG
    - CTT → CGT → AGT → AGG
    - CTT → CTG → ATG → AGG
    - CTT → CGT → CGG → AGG
    - CTT → CTG → CGG → AGG
Synonymous / Nonsynonymous nucleotide difference
  - There are 18 substitutions in total over the six paths for CTT → AGG: 5 are synonymous (CGT → CGG once, CGG → AGG twice, CTT → CTG twice) and 13 are nonsynonymous. Averaging over the 6 paths gives Sd = Sd + 5/6 and Nd = Nd + 13/6.
  - Similarly, if there is a stop codon in a path, that path is ignored.
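All three cases can be handled by one path enumeration. The sketch below is a hedged illustration, not the slides' own code: `path_counts` and `codon_differences` are hypothetical names, the standard genetic code is assumed, and both input codons are assumed to be sense codons. `path_counts` returns the synonymous and nonsynonymous steps summed over every stop-free path. `codon_differences` divides by the number of such paths, which is the usual per-codon contribution in the Nei-Gojobori procedure.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard genetic code (assumption: '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def path_counts(codon_a, codon_b):
    """Return (synonymous, nonsynonymous, valid_paths), summed over all
    paths from codon_a to codon_b that do not pass through a stop codon."""
    differing = [i for i in range(3) if codon_a[i] != codon_b[i]]
    syn_total = nonsyn_total = valid_paths = 0
    for order in permutations(differing):  # 1, 2 or 6 substitution orders
        current, syn, nonsyn, has_stop = codon_a, 0, 0, False
        for pos in order:
            nxt = current[:pos] + codon_b[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                has_stop = True  # discard the whole path
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if not has_stop:
            syn_total += syn
            nonsyn_total += nonsyn
            valid_paths += 1
    return syn_total, nonsyn_total, valid_paths

def codon_differences(codon_a, codon_b):
    """Average the path totals over the valid paths: increments for Sd, Nd."""
    syn, nonsyn, paths = path_counts(codon_a, codon_b)
    if paths == 0:
        raise ValueError("every path passes through a stop codon")
    return Fraction(syn, paths), Fraction(nonsyn, paths)

print(path_counts("CCC", "CAA"))  # (1, 3, 2)
print(path_counts("AAA", "TAT"))  # (0, 2, 1)
print(path_counts("CTT", "AGG"))  # (5, 13, 6)
```

Summing the two values returned by `codon_differences` over all codon pairs gives Sd and Nd for the sequence pair. For each codon pair the two values add up to the number of mismatched positions.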
Jukes and Cantor’s model for the multiple nucleotide substitution correction
- For synonymous substitutions: ds = -(3/4) ln(1 - (4/3) ps), where ps = Sd / S
- For nonsynonymous substitutions: dn = -(3/4) ln(1 - (4/3) pn), where pn = Nd / N
- The dn/ds ratio has proved useful in assessing the protein-coding parts of genomes. In protein-coding regions, generally dn < ds, which indicates strong purifying selection against amino acid changes. Conversely, a high dn/ds ratio indicates weak selection.
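As a small numerical sketch of the correction, assuming the standard Jukes-Cantor distance d = -(3/4) ln(1 - (4/3) p) applied to the proportions ps = Sd / S and pn = Nd / N: the function names and the input counts below are made up for illustration only.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p of
    differing sites; only defined for 0 <= p < 3/4."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(Sd, Nd, S, N):
    """Return (dn, ds, dn/ds) from difference counts and site counts."""
    ds = jukes_cantor(Sd / S)
    dn = jukes_cantor(Nd / N)
    ratio = dn / ds if ds > 0 else float("nan")
    return dn, ds, ratio

# Hypothetical counts, not taken from any real data set.
dn, ds, ratio = dn_ds(Sd=5, Nd=8, S=50, N=160)
print(f"dn = {dn:.4f}, ds = {ds:.4f}, dn/ds = {ratio:.3f}")
```

The corrected distance is always at least as large as the observed proportion, because multiple substitutions at the same site hide some of the changes.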
Estimation of the divergence time
- We can estimate the divergence time of the two species using the previous result. The formula is:
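A minimal sketch of such an estimate, assuming the usual molecular-clock relation T = d / (2r): d is a corrected distance from the previous step, r is a substitution rate per site per year along one lineage, and the factor 2 accounts for both lineages accumulating changes since the split. The symbols T, d and r, the function name and the numbers are illustrative assumptions, not values from the slides.

```python
import math

def divergence_time(distance, rate_per_site_per_year):
    """Estimate divergence time in years as T = d / (2 r), assuming a
    constant substitution rate r along both lineages."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate_per_site_per_year)

# Hypothetical inputs: d = 0.1 substitutions per site, r = 1e-9 per site per year.
t_years = divergence_time(0.1, 1e-9)
print(f"{t_years:.3g} years")
```

The result is only as good as the rate assumption. In practice r is usually calibrated from a pair of species whose divergence time is known independently.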