# Guide to Excel for statistics

### From OpenWetWare

(t-test with example) |
Current revision (19:07, 7 September 2010) (view source) (→Evaluation of Excel for statistics) |
||

(6 intermediate revisions not shown.) | |||

Line 1: | Line 1: | ||

{{back to statistics portal}} | {{back to statistics portal}} | ||

+ | |||

== common statistical functions == | == common statistical functions == | ||

Line 5: | Line 6: | ||

* AVERAGE(cells) - arithmetical [[wikipedia:mean|mean]] of cells specified | * AVERAGE(cells) - arithmetical [[wikipedia:mean|mean]] of cells specified | ||

* MEDIAN(cells) - middle value; half the values are greater, half are less than the [[wikipedia:median|median]] | * MEDIAN(cells) - middle value; half the values are greater, half are less than the [[wikipedia:median|median]] | ||

- | * COUNT(cells) - counts the number of | + | * COUNT(cells) - counts the number of measurements n; useful for standard error calculation |

* VAR(cells) - variance of a sample population; variance = standard deviation squared | * VAR(cells) - variance of a sample population; variance = standard deviation squared | ||

* STDEV(cells) - standard deviation of a sample population; measures how much values vary from the mean | * STDEV(cells) - standard deviation of a sample population; measures how much values vary from the mean | ||

- | You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using | + | You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using commas as separators, e.g. A1,A3,A5. If you want to prevent automatic change of the column or row when copying formulas, use the dollar sign, e.g. $A1 will keep the column the same while A$1 will prevent a change in the row number. |

== Student t test == | == Student t test == | ||

- | Excel's | + | Excel's [[wikipedia:t-test|Student's t test]] is based on the function TTEST(). Watch out to correctly specify the tails and type parameters. If in doubt use: tails 2, type 3. This is the most stringent setting. |

=TTEST(group1,group2,tails,type) | =TTEST(group1,group2,tails,type) | ||

Line 22: | Line 23: | ||

2 two-tailed distribution | 2 two-tailed distribution | ||

type 1 paired = e.g. same cells after 1h, 2h | type 1 paired = e.g. same cells after 1h, 2h | ||

- | 2 2 | + | 2 2 series with equal standard deviation |

- | 3 2 | + | 3 2 series with unequal standard deviation |

</pre> | </pre> | ||

- | + | ===Example=== | |

- | + | ||

data series 1: 520, 460, 500, 470<br> | data series 1: 520, 460, 500, 470<br> | ||

data series 2: 230, 270, 250, 280<br> | data series 2: 230, 270, 250, 280<br> | ||

Line 33: | Line 33: | ||

Question: Are these 2 series significantly different from each other? | Question: Are these 2 series significantly different from each other? | ||

- | + | Calculation for all TTEST variations | |

<pre> | <pre> | ||

- | 0.000018 ttest(B5:B8,C5:C8,2,3) | + | 0.000018* ttest(B5:B8,C5:C8,2,3) 2 tails, unequal SD |

- | 0. | + | 0.000013 ttest(B5:B8,C5:C8,2,2) 2 tails, equal SD |

- | 0. | + | 0.000009 ttest(B5:B8,C5:C8,1,3) 1 tail, unequal SD |

- | 0.002559 ttest(B5:B8,C5:C8,2,1) | + | |

+ | (0.002559 ttest(B5:B8,C5:C8,2,1) 2 tails, paired series) | ||

</pre> | </pre> | ||

- | You can see how tails and type affect the values. Note this data is not paired. In most experiments you will comparing unpaired data, unless you follow the same system in different conditions. For example, | + | You can see how tails and type affect the values. Note this data is not paired; the last test should not be applied here and is just given for completeness. In most experiments you will be comparing unpaired data, unless you follow the same system in different conditions. For example, cells at 1h, same cells at 2h. Note that 2 tails, unequal SD (starred) gives the highest value, which means it is the most stringent test. |

- | Result: The data series are significantly different from each other | + | Result: The data series are significantly different from each other with p < 0.001. |

- | == | + | == Evaluation of Excel for statistics == |

- | Read this [http://www.practicalstats.com/ | + | Excel is a useful data management and instructional tool. However, the statistical functions should not be used for statistical analyses (McCullough and Heiser, 2008). Presumably the algebraic functions are accurate. |

+ | |||

+ | Read this [http://www.practicalstats.com/xlsstats/excelstats.html excellent review of Excel for statistics] on practicalstats.com. | ||

The main arguments are summarised below: | The main arguments are summarised below: | ||

Line 52: | Line 55: | ||

* many statistical methods are not available, e.g. [[wikipedia:box plots|box plots]], 2-way [[wikipedia:ANOVA|ANOVA]] with unequal sample size, [[wikipedia:nonparametric|nonparametric tests]] | * many statistical methods are not available, e.g. [[wikipedia:box plots|box plots]], 2-way [[wikipedia:ANOVA|ANOVA]] with unequal sample size, [[wikipedia:nonparametric|nonparametric tests]] | ||

* several procedures are misleading, e.g. confidence function | * several procedures are misleading, e.g. confidence function | ||

- | * distributions are not computed with precision (Excel only correct to 2nd digit) | + | * distributions are not computed with precision (Excel is only correct to 2nd digit) |

- | * regression routines are incorrect for | + | * regression routines are incorrect for multi-collinear data |

* ranks of tied data are computed incorrectly | * ranks of tied data are computed incorrectly | ||

Line 62: | Line 65: | ||

* [http://support.microsoft.com/kb/214076 standard error of the mean with Excel] | * [http://support.microsoft.com/kb/214076 standard error of the mean with Excel] | ||

* [http://www.utexas.edu/its/training/handouts/excelstat/ tutorial treating many Excel stats functions] | * [http://www.utexas.edu/its/training/handouts/excelstat/ tutorial treating many Excel stats functions] | ||

+ | |||

+ | ready-made spreadsheets: | ||

+ | * [http://udel.edu/~mcdonald/statsignedrank.html excellent summary of Wilcoxon signed rank test (non-parametric statistics) and Excel sheet to perform the test] | ||

+ | |||

+ | |||

+ | References | ||

+ | |||

+ | [http://dx.doi.org/10.1016/j.csda.2008.03.004 McCullough, B.D., and D.A. Heiser, 2008. On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis archive. Volume 52 , Issue 10 ] |

## Current revision

back to stats portal |

## Contents |

## common statistical functions

- AVERAGE(cells) - arithmetical mean of cells specified
- MEDIAN(cells) - middle value; half the values are greater, half are less than the median
- COUNT(cells) - counts the number of measurements n; useful for standard error calculation
- VAR(cells) - variance of a sample population; variance = standard deviation squared
- STDEV(cells) - standard deviation of a sample population; measures how much values vary from the mean

You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using commas as separators, e.g. A1,A3,A5. If you want to prevent automatic change of the column or row when copying formulas, use the dollar sign, e.g. $A1 will keep the column the same while A$1 will prevent a change in the row number.

## Student t test

Excel's Student's t test is based on the function TTEST(). Watch out to correctly specify the tails and type parameters. If in doubt use: tails 2, type 3. This is the most stringent setting.

=TTEST(group1,group2,tails,type)

group 1 A1:A8 1st group of measurements group 2 B1:B8 2nd group of measurements tails 1 one-tailed distribution 2 two-tailed distribution type 1 paired = e.g. same cells after 1h, 2h 2 2 series with equal standard deviation 3 2 series with unequal standard deviation

### Example

data series 1: 520, 460, 500, 470

data series 2: 230, 270, 250, 280

Question: Are these 2 series significantly different from each other?

Calculation for all TTEST variations

0.000018* ttest(B5:B8,C5:C8,2,3) 2 tails, unequal SD 0.000013 ttest(B5:B8,C5:C8,2,2) 2 tails, equal SD 0.000009 ttest(B5:B8,C5:C8,1,3) 1 tail, unequal SD (0.002559 ttest(B5:B8,C5:C8,2,1) 2 tails, paired series)

You can see how tails and type affect the values. Note this data is not paired; the last test should not be applied here and is just given for completeness. In most experiments you will be comparing unpaired data, unless you follow the same system in different conditions. For example, cells at 1h, same cells at 2h. Note that 2 tails, unequal SD (starred) gives the highest value, which means it is the most stringent test.

Result: The data series are significantly different from each other with p < 0.001.

## Evaluation of Excel for statistics

Excel is a useful data management and instructional tool. However, the statistical functions should not be used for statistical analyses (McCullough and Heiser, 2008). Presumably the algebraic functions are accurate.

Read this excellent review of Excel for statistics on practicalstats.com.

The main arguments are summarised below:

- many of Excel's charts violate standards of good graphics, i.e. fancy but not good for science
- many statistical methods are not available, e.g. box plots, 2-way ANOVA with unequal sample size, nonparametric tests
- several procedures are misleading, e.g. confidence function
- distributions are not computed with precision (Excel is only correct to 2nd digit)
- regression routines are incorrect for multi-collinear data
- ranks of tied data are computed incorrectly

## links

Tutorials:

- mean & standard deviation with Excel
- standard error of the mean with Excel
- tutorial treating many Excel stats functions

ready-made spreadsheets:

References