Drummond:Akashi's Test: Difference between revisions

From OpenWetWare
Revision as of 19:21, 23 January 2009

Akashi's test on a single gene

Akashi's test is very simple. Suppose you have two aligned codon sequences (a target sequence and an orthologous sequence) and a list of preferred codons. The question we wish to answer: Is there an association between preferred codons and conserved amino acids, controlling for differences between amino acids?

From the aligned codon sequences, build a 2x2 contingency table with entries a, b, c, and d like this:

AA = Ser | Conserved | Variable
Preferred | [math]\displaystyle{ a }[/math] | [math]\displaystyle{ b }[/math]
Unpreferred | [math]\displaystyle{ c }[/math] | [math]\displaystyle{ d }[/math]

for each amino acid. You'll usually have 18 tables; W and M have no synonymous codon alternatives and therefore don't contribute to Akashi's test.

  • [math]\displaystyle{ a }[/math] = the number of codons in your target sequence that encode amino acid AA, are PREFERRED, and encode an AA which is unchanged (CONSERVED) in the orthologous sequence
  • [math]\displaystyle{ b }[/math] = the number of codons in your target sequence that encode amino acid AA, are PREFERRED, and encode an AA which is different (VARIABLE) in the orthologous sequence
  • [math]\displaystyle{ c }[/math] = the number of codons in your target sequence that encode amino acid AA, are UNPREFERRED, and encode an AA which is unchanged (CONSERVED) in the orthologous sequence
  • [math]\displaystyle{ d }[/math] = the number of codons in your target sequence that encode amino acid AA, are UNPREFERRED, and encode an AA which is different (VARIABLE) in the orthologous sequence
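
The counting rules above can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_tables`, the use of DNA codons (T rather than U), the standard genetic code, and the choice to skip gaps, ambiguous codons, and stop codons in the target are not taken from the original page.

```python
from collections import defaultdict

# Standard genetic code in the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def build_tables(target_codons, ortholog_codons, preferred):
    """Return {amino_acid: [a, b, c, d]} for two aligned codon lists.

    `preferred` is a set of codons treated as preferred (hypothetical input).
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for t, o in zip(target_codons, ortholog_codons):
        aa_t = CODON_TABLE.get(t)
        aa_o = CODON_TABLE.get(o)
        # Assumption: skip gaps/ambiguous codons, target stop codons, and
        # W and M, which have no synonymous alternatives.
        if aa_t is None or aa_o is None or aa_t in "*WM":
            continue
        conserved = aa_t == aa_o
        is_preferred = t in preferred
        # Cell order: a (pref, cons), b (pref, var), c (unpref, cons), d (unpref, var).
        index = (0 if is_preferred else 2) + (0 if conserved else 1)
        tables[aa_t][index] += 1
    return dict(tables)


# Toy alignment; the codons and preferred set are made up for illustration.
target = ["TCT", "TCC", "TCA", "ATG", "TCT"]
ortholog = ["TCC", "GCC", "TCA", "ATG", "---"]
serine_tables = build_tables(target, ortholog, preferred={"TCT", "TCC"})
```

Each value in the returned dictionary is one 2x2 table in the order a, b, c, d, keyed by the one-letter amino acid code of the target codon.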

Now the statistics. Assuming no association -- that is, assuming that the probability of a codon being preferred (which we designate [math]\displaystyle{ p }[/math]) is independent of the probability that it encodes a conserved amino acid (which we designate [math]\displaystyle{ q }[/math]) -- we can write down the expected value and variance of [math]\displaystyle{ a }[/math], [math]\displaystyle{ E(a) }[/math] and [math]\displaystyle{ V(a) }[/math]:

[math]\displaystyle{ n = a + b + c + d }[/math]
[math]\displaystyle{ \hat{p} = \frac{a + b}{n} }[/math]
[math]\displaystyle{ \hat{q} = \frac{a + c}{n} }[/math]
[math]\displaystyle{ E(a) = n \hat{p}\hat{q} }[/math]
[math]\displaystyle{ V(a) = \frac{1}{n-1} n\hat{p}(1-\hat{p}) n\hat{q}(1-\hat{q}) }[/math]

With the mean and variance, we could write down a [math]\displaystyle{ Z }[/math]-score for one table:

[math]\displaystyle{ Z = \frac{a - E(a)}{\sqrt{V(a)}} }[/math]
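
As a minimal sketch, the expected value, variance, and [math]\displaystyle{ Z }[/math]-score for one table can be computed directly from the formulas above. The function name `akashi_z` and the decision to return `None` for the score when the variance is zero or the table has fewer than two codons are assumptions made for this illustration.

```python
import math


def akashi_z(a, b, c, d):
    """Return (E(a), V(a), Z) for one 2x2 table; Z is None if undefined."""
    n = a + b + c + d
    if n < 2:
        # V(a) divides by n - 1, so tiny tables carry no information here.
        return None, None, None
    p_hat = (a + b) / n  # estimated probability that a codon is preferred
    q_hat = (a + c) / n  # estimated probability that a site is conserved
    expected = n * p_hat * q_hat
    variance = (n * p_hat * (1 - p_hat)) * (n * q_hat * (1 - q_hat)) / (n - 1)
    if variance == 0:
        # All codons fall in one row or one column; Z cannot be computed.
        return expected, variance, None
    z = (a - expected) / math.sqrt(variance)
    return expected, variance, z


# Made-up counts for illustration only.
expected, variance, z = akashi_z(10, 5, 5, 10)
```

A positive score means preferred codons occur at conserved sites more often than independence would predict.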

A [math]\displaystyle{ Z }[/math]-score measures only statistical significance, so we also want an effect size -- the magnitude of the association between preferred codons and conserved sites -- which we can compute as an odds ratio: the odds that a preferred codon falls at a conserved site, [math]\displaystyle{ a/b }[/math], divided by the odds that an unpreferred codon falls at a conserved site, [math]\displaystyle{ c/d }[/math]:

[math]\displaystyle{ OR = \frac{ad}{bc} }[/math]
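
A short sketch of the odds ratio for one table follows. The function name `odds_ratio` is hypothetical, and returning `None` when [math]\displaystyle{ b }[/math] or [math]\displaystyle{ c }[/math] is zero is an assumption; the original page does not say how empty cells should be handled.

```python
def odds_ratio(a, b, c, d):
    """Return ad / bc for one 2x2 table, or None if the ratio is undefined."""
    if b == 0 or c == 0:
        # Assumption: report undefined rather than apply a correction.
        return None
    return (a * d) / (b * c)


# Made-up counts for illustration only.
example_or = odds_ratio(10, 5, 5, 10)
```

Values above 1 indicate that preferred codons are enriched at conserved sites; values below 1 indicate the opposite.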

Akashi's test on multiple genes

Combining 2x2 contingency tables using the Mantel-Haenszel procedure

But calculating [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ OR }[/math] for a single amino acid in a single gene is perhaps of limited interest. How do we combine tables so that we can ask questions like, "What is the overall association between preferred codons and conserved sites across the genome?"

To combine tables, we use the Mantel-Haenszel procedure.
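
The page does not spell out the combination step, so the following is a sketch of the standard Mantel-Haenszel procedure named in the heading, not necessarily the author's exact formulation. It sums [math]\displaystyle{ a }[/math], [math]\displaystyle{ E(a) }[/math], and [math]\displaystyle{ V(a) }[/math] over all tables to form a combined [math]\displaystyle{ Z }[/math]-score, and computes the pooled odds ratio as the sum of [math]\displaystyle{ ad/n }[/math] divided by the sum of [math]\displaystyle{ bc/n }[/math]. The function name `mantel_haenszel`, the omission of a continuity correction, and the handling of degenerate tables are assumptions.

```python
import math


def mantel_haenszel(tables):
    """Combine 2x2 tables given as (a, b, c, d) tuples.

    Returns (Z, OR); either is None if it cannot be computed.
    """
    sum_a = sum_e = sum_v = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # assumption: skip tables too small to have a variance
        sum_a += a
        # E(a) = n * p_hat * q_hat, written in terms of the margins.
        sum_e += (a + b) * (a + c) / n
        # V(a) = n p(1-p) * n q(1-q) / (n-1), written in terms of the margins.
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    z = (sum_a - sum_e) / math.sqrt(sum_v) if sum_v > 0 else None
    pooled_or = or_num / or_den if or_den > 0 else None
    return z, pooled_or


# Two made-up tables, e.g. the same amino acid in two genes.
combined_z, combined_or = mantel_haenszel([(10, 5, 5, 10), (10, 5, 5, 10)])
```

With a single table, the combined score reduces to the one-table [math]\displaystyle{ Z }[/math] and the pooled odds ratio reduces to [math]\displaystyle{ ad/bc }[/math], so the two levels of analysis stay consistent.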

References

  1. Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics. 1994 Mar;136(3):927-35. DOI:10.1093/genetics/136.3.927 | PubMed ID:8005445