# Guide to Excel for statistics

### From OpenWetWare

(→evaluation of Excel for statistics: added summary from practicalstats.com review) |
(t-test with example) |
||

Line 10: | Line 10: | ||

You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using the comma as separator, e.g. A1,A3,A5. If you want to prevent automatic change of the column or row when copying formulae, use the dollar sign, e.g. $A1 will keep the column the same while A$1 will prevent a change in the row number. | You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using the comma as separator, e.g. A1,A3,A5. If you want to prevent automatic change of the column or row when copying formulae, use the dollar sign, e.g. $A1 will keep the column the same while A$1 will prevent a change in the row number. | ||

+ | |||

+ | == Student t test == | ||

+ | |||

+ | Excel's student t test is based on the function TTEST(). Watch out to correctly specify the tails and type. If in doubt use: tails 2, type 3. This is the most stringent setting. | ||

+ | |||

+ | =TTEST(group1,group2,tails,type) | ||

+ | <pre> | ||

+ | group 1 A1:A8 1st group of measurements | ||

+ | group 2 B1:B8 2nd group of measurements | ||

+ | tails 1 one-tailed distribution | ||

+ | 2 two-tailed distribution | ||

+ | type 1 paired = e.g. same cells after 1h, 2h | ||

+ | 2 2 samples, equal SD | ||

+ | 3 2 samples, unequal SD | ||

+ | </pre> | ||

+ | |||

+ | See an example below. It's based on this great web page [http://helios.bto.ed.ac.uk/bto/statistics/tress4a.html#Student's%20t-test] | ||

+ | |||

+ | data series 1: 520, 460, 500, 470<br> | ||

+ | data series 2: 230, 270, 250, 280<br> | ||

+ | |||

+ | Question: Are these 2 series significantly different from each other? | ||

+ | |||

+ | Results for all TTEST variations | ||

+ | <pre> | ||

+ | 0.000018 ttest(B5:B8,C5:C8,2,3) | ||

+ | 0.000009 ttest(B5:B8,C5:C8,1,3) | ||

+ | 0.000013 ttest(B5:B8,C5:C8,2,2) | ||

+ | 0.002559 ttest(B5:B8,C5:C8,2,1) | ||

+ | </pre> | ||

+ | You can see how tails and type affect the values. Note this data is not paired. In most experiments you will comparing unpaired data, unless you follow the same system in different conditions. For example, animal aged 10w, same animal aged 15w. | ||

+ | |||

+ | Result: The data series are significantly different from each other. P < 0.001. | ||

== evaluation of Excel for statistics == | == evaluation of Excel for statistics == |

## Revision as of 15:07, 21 March 2007

back to stats portal |

## Contents |

## common statistical functions

- AVERAGE(cells) - arithmetical mean of cells specified
- MEDIAN(cells) - middle value; half the values are greater, half are less than the median
- COUNT(cells) - counts the number of values n; useful for standard error calculation
- VAR(cells) - variance of a sample population; variance = standard deviation squared
- STDEV(cells) - standard deviation of a sample population; measures how much values vary from the mean

You can type these functions straight into the cells starting with the equal sign, e.g. type =stdev(A1:A10). Capitalisation is not important. Replace "cells" in the above list with the cells to be used for the calculation. Cells can be specified by clicking or typing. You can specify a range of cells by stating the (top) left and the (bottom) right cell, e.g. type A1:A10 (10 cells) or A1:B10 (2x10cells). Alternatively, you can enumerate cells using the comma as separator, e.g. A1,A3,A5. If you want to prevent automatic change of the column or row when copying formulae, use the dollar sign, e.g. $A1 will keep the column the same while A$1 will prevent a change in the row number.

## Student t test

Excel's student t test is based on the function TTEST(). Watch out to correctly specify the tails and type. If in doubt use: tails 2, type 3. This is the most stringent setting.

=TTEST(group1,group2,tails,type)

group 1 A1:A8 1st group of measurements group 2 B1:B8 2nd group of measurements tails 1 one-tailed distribution 2 two-tailed distribution type 1 paired = e.g. same cells after 1h, 2h 2 2 samples, equal SD 3 2 samples, unequal SD

See an example below. It's based on this great web page [1]

data series 1: 520, 460, 500, 470

data series 2: 230, 270, 250, 280

Question: Are these 2 series significantly different from each other?

Results for all TTEST variations

0.000018 ttest(B5:B8,C5:C8,2,3) 0.000009 ttest(B5:B8,C5:C8,1,3) 0.000013 ttest(B5:B8,C5:C8,2,2) 0.002559 ttest(B5:B8,C5:C8,2,1)

You can see how tails and type affect the values. Note this data is not paired. In most experiments you will comparing unpaired data, unless you follow the same system in different conditions. For example, animal aged 10w, same animal aged 15w.

Result: The data series are significantly different from each other. P < 0.001.

## evaluation of Excel for statistics

Read this excellent review of Excel for statistics on practicalstats.com.

The main arguments are summarised below:

- many of Excel's charts violate standards of good graphics, i.e. fancy but not good for science
- many statistical methods are not available, e.g. box plots, 2-way ANOVA with unequal sample size, nonparametric tests
- several procedures are misleading, e.g. confidence function
- distributions are not computed with precision (Excel only correct to 2nd digit)
- regression routines are incorrect for multicollinear data
- ranks of tied data are computed incorrectly

## links

Tutorials: