Practical Use of Statistics in Biology
Learning outcomes
• By the end of the statistics session you will be able to:
– Describe what statistics are
– Explain why statistics are necessary
– Understand what statistics do and don't allow us to say about data
– Understand the framework in which statistical analysis is used
What are statistics?
"There are three kinds of lies: lies, damned lies, and statistics." – Mark Twain / Benjamin Disraeli
"There are two kinds of statistics, the kind you look up and the kind you make up." – Rex Stout
"88.2% of statistics are made up on the spot." – Vic Reeves
A mean value is 1. The sum of all values divided by the number of values 2. The middle value in a range 3. The most common value 4. A measure of central tendency
A median value is? 1. The sum of all values divided by the number of values 2. The middle value in a range 3. The most common value 4. A measure of central tendency
An average is? 1. The sum of all values divided by the number of values 2. The middle value in a range 3. The most common value 4. A measure of central tendency
Adam is 1.75 m tall, Rachel is 1.77 m tall. This data is? 1. Ordinal 2. Nominal 3. Binary 4. Continuous
Frequency histogram
Fish A turns right, Fish B turns left. This data is? 1. Ordinal 2. Nominal 3. Binary 4. Continuous
Binomial choice [bar chart of counts for "place" vs "cue"]
3000 people supported Manchester United, 700 supported Chelsea, 1 supported Nottingham Forest. This data is? 1. Ordinal 2. Nominal 3. Binary 4. Continuous
The first contestant scored 7/10, the second 8/10, the third 1/10. This data is? 1. Ordinal 2. Nominal 3. Binary 4. Continuous
Which of the following is a statistic? 1. p = 0.042 2. Mean = 24.7 3. A 50% chance 4. χ² = 8.43
Statistics
• The quotes at the start suggest statistics is a way of "manipulating" perceptions of data to convince people of an argument, BUT, to scientists:
• A statistic is a measure derived from samples of data
• The expectation is that it relates closely to a real value (a parameter) in the population from which it is drawn
• In some cases, however (biased or incomplete sampling, for example), it may not
• Statistical significance tests whether differences between statistical measures are real or based on chance errors in the sampling
Height of football fans: is this significant? 1. Yes 2. No
Height of football fans: is this significant? 1. Yes 2. No
Measures of variation
• These are ways to measure the "confidence" in the mean value
• Standard deviation (= √variance) is a measure of how closely the sample data cluster around the mean
– For normally distributed data, ±1 standard deviation from the mean contains approximately 68% of the values
• Large standard deviation = large spread in the data
• Small standard deviation = small spread in the data
• Standard error is sometimes used (standard deviation/√n)
Standard deviation vs. standard error
• Put simply, standard error is an estimate of how close your sample mean is likely to be to the population mean
• Standard deviation is the degree to which individuals within the sample differ from the sample mean
• Standard error decreases with larger sample size, whereas standard deviation is unaffected by sample size
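To make the distinction concrete, here is a minimal sketch in Python (the slides recommend SPSS or R; NumPy is used here purely as an illustration, and the sample values are invented):

import numpy as np

# Invented sample of eight measurements, purely for illustration
sample = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 12.2, 13.0])

n = len(sample)
sd = sample.std(ddof=1)        # sample standard deviation = sqrt(sample variance)
se = sd / np.sqrt(n)           # standard error of the mean = SD / sqrt(n)

print(f"mean = {sample.mean():.2f}, SD = {sd:.2f}, SE = {se:.2f}")
# A larger sample would shrink the SE, but would not systematically change the SD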
What does statistical significance mean?
• Statistical tests calculate the probability that the samples of data that have been collected are drawn from the same population
• Usually the default is the NULL HYPOTHESIS (H0): the two samples are drawn from the same population (the sample means estimate the same population mean)
• The ALTERNATIVE HYPOTHESIS (Ha) is that the samples are not drawn from the same population (different sample means represent different population means)
• Statistical tests calculate the probability that the observed difference between sample means arose BY CHANCE
If you had a 1 in 20 chance of missing the bus (and thus being 5 mins late for your 9 am lecture) if you stay in bed an extra hour in the morning, would you? 1. Yes 2. No
If you had a 5% chance of getting rabies if you pat this dog, would you? 1. Yes 2. No
Statistical significance
• Biology uses a convention of p (α) < 0.05
• It is arbitrary; there is no single reason why significance is set at this level
• What is the difference between p = 0.051 and p = 0.049?
• Medicine often uses a stricter threshold (p < 0.01)
• The level at which significance is assigned can lead to errors, so choosing a significance level is really about deciding what is an acceptable error rate
Statistical Error
• Type I error:
– Falsely rejecting the null hypothesis, i.e. you have obtained a significant result by chance
– In effect our significance level of p < 0.05 sets a Type I error rate of 5%
– Failing to adhere to the assumptions of a statistical test can increase the Type I error probability
• Type II error:
– Failing to reject the null hypothesis when it is false, i.e. missing a real effect
– Is generally considered less of a "sin" because no positive claim is being made about the data
• Statistics is thus not about "proof", or about manipulating data, but about quantifying the probability of error
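The 5% Type I error rate can be seen directly by simulation. A minimal sketch in Python (NumPy/SciPy assumed, not part of the slides): both samples are drawn from the same population, so every "significant" result is a false positive, and these occur roughly 5% of the time.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 10_000, 0.05
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(loc=10, scale=2, size=20)   # both groups from the SAME population
    b = rng.normal(loc=10, scale=2, size=20)
    if stats.ttest_ind(a, b).pvalue < alpha:   # a "significant" result here is a Type I error
        false_positives += 1
print(false_positives / n_sims)                # close to 0.05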
Statistical tests • Statistical tests are based around comparing some aspect of the data, usually the mean, median or mode • The type of test that can be performed depends upon – The type of data (independent, paired or repeated measures) – The distribution of the data (normal or non-normal)
Parametric tests • Parametric tests for statistical significance assume that the RESIDUAL values of the data have a normal distribution • A normal (Gaussian) distribution is one in which the mean, median and modal values are all the same, and in which the distribution is bell-shaped
Parametric test assumptions
• Parametric tests assume that 95% of the data fall within ±1.96 standard deviations of the mean
• Thus, if a value falls outside this range, the probability that it does so by chance is p < 0.05
– This is where we get our significance level
• If a sample is not normally distributed, the values that fall outside this range change and errors result
• Equal variance between groups is also assumed
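The 1.96 figure falls straight out of the normal distribution; a quick check in Python (SciPy assumed, not part of the slides):

from scipy import stats

print(stats.norm.ppf(0.975))                         # ≈ 1.96: the cut-off leaving 2.5% in the upper tail
print(stats.norm.cdf(1.96) - stats.norm.cdf(-1.96))  # ≈ 0.95: proportion within ±1.96 SD of the mean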
One-tailed vs. two-tailed
• If we do not know whether one sample will be bigger or smaller than the other, our test is "two-tailed"
– The second sample could fall in the 2.5% that is smaller OR the 2.5% that is bigger than the first sample's population
• If we expect the difference to be in one direction, e.g. we only expect one group to be larger, then the whole 5% critical region sits in one tail of the normal curve
– A one-tailed test at p < 0.05 is therefore equivalent to a two-tailed test at p < 0.1
One-tailed tests
• Require a specific prediction to be made IN ADVANCE of the data being collected, AND results in the opposite direction must be reasonably regarded as equivalent to no difference
• Biologists are, as a rule, very suspicious of one-tailed tests, mainly because we usually don't know enough about the systems we are studying to be confident that the conditions above are met
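A sketch of the difference in Python (SciPy ≥ 1.6 assumed for the "alternative" argument; the data are invented): when the observed difference lies in the predicted direction, the one-tailed p-value is half the two-tailed one.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treated = rng.normal(10.8, 2.0, 15)   # invented data for illustration
control = rng.normal(10.0, 2.0, 15)

two_tailed = stats.ttest_ind(treated, control)                          # "the means differ in either direction"
one_tailed = stats.ttest_ind(treated, control, alternative="greater")   # "treated > control", predicted in advance
# If the treated mean happens to exceed the control mean, the second p-value is half the first
print(two_tailed.pvalue, one_tailed.pvalue)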
How do you know if your data is normal?
• Plot a frequency histogram (the eyeball method) – NOT STRICTLY CORRECT
• Test it using either the Shapiro-Wilk test (n < 2000) or the Kolmogorov-Smirnov test (n > 2000)
– These compare your data against a theoretical normal distribution
– If p < 0.05 then it is DIFFERENT from a normal distribution
• CAUTION: you are testing the RESIDUAL values (data value minus the mean), not the actual values
– Parametric tests assume normal distribution of the RESIDUALS, not the original data
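A minimal normality check in Python (SciPy assumed; the sample is invented):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(50, 5, 40)            # invented sample of 40 values

stat, p = stats.shapiro(data)           # Shapiro-Wilk, appropriate for n < 2000
print(p)                                # p < 0.05 would mean: different from a normal distribution

# Strictly, the residuals (value minus the fitted/group mean) are what should be tested
residuals = data - data.mean()
print(stats.shapiro(residuals).pvalue)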
If your data is not normal...
• You can transform it, to see if that makes it normal
– Examples are taking the logarithm (log10, i.e. the power to which 10 must be raised to give the value) or the square root
– If the transformed data are normal, do the parametric test on the transformed data
• If this does not help:
– If n > 50, the parametric test can often be used anyway: with large samples the sample means tend towards normality (central limit theorem) even if the underlying data do not
– Some parametric tests (e.g. ANOVA) are robust to deviations from normality IN SOME CIRCUMSTANCES
– OR consider other types of test that don't make assumptions about the data distribution
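A sketch of the transform-and-retest idea in Python (SciPy/NumPy assumed; right-skewed data invented via a log-normal draw):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=1.0, sigma=0.6, size=60)    # invented right-skewed data

print(stats.shapiro(skewed).pvalue)             # typically < 0.05: not normal
print(stats.shapiro(np.log10(skewed)).pvalue)   # log-transformed: typically no longer significant
print(stats.shapiro(np.sqrt(skewed)).pvalue)    # square-root transform: a milder alternative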
Non-parametric tests
• Make fewer (or sometimes no) assumptions about the data
• Use the ranks of the data values rather than the values themselves
• Are useful when sample sizes are small (which makes it hard to demonstrate normality in data)
• Are useful for ordinal (ranked) data, which are rarely normal
• Are more ROBUST (fewer assumptions) but less POWERFUL (more likely to commit a Type II error)
Examples: test for a difference between two means (data are independent)
• Parametric: t-test
– Assumes a normal distribution
– Assumes the samples have equal variance
– Does not require equal numbers in each group
• Non-parametric: Mann-Whitney U test
– Responses are ordinal, i.e. one is greater than another
– Uses the ranks of the data
– Does not require equal numbers in each group
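In Python/SciPy (an illustration only; the measurements are invented and the groups deliberately have unequal sizes) the two tests look like this:

from scipy import stats

group_a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]          # invented measurements
group_b = [10.9, 11.6, 12.0, 11.2, 10.7, 11.8, 11.4]    # unequal n is fine for both tests

print(stats.ttest_ind(group_a, group_b).pvalue)         # parametric: independent-samples t-test
print(stats.mannwhitneyu(group_a, group_b).pvalue)      # non-parametric: works on the ranks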
Differences between 2 means: data are paired
• Parametric: paired t-test
– Units are tested twice, e.g. measure performance before and after drinking alcohol
– Each pair is independent
– Requires equal numbers (i.e. matched pairs)
• Non-parametric: Wilcoxon signed-rank test
– Requires equal numbers (i.e. matched pairs)
– Each pair is independent
– Data must be on an interval scale (ordinal is not sufficient)
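A corresponding sketch for paired data (Python/SciPy assumed; the before/after scores are invented and come from the same six individuals):

from scipy import stats

before = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5]    # invented scores for six individuals
after  = [13.1, 14.8, 13.0, 15.2, 13.9, 14.6]    # the same six individuals, measured again

print(stats.ttest_rel(before, after).pvalue)     # parametric: paired t-test on the within-pair differences
print(stats.wilcoxon(before, after).pvalue)      # non-parametric: Wilcoxon signed-rank test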
Differences between categories: χ² test
• Measures the difference between observed and expected frequencies of counts
– e.g. how many blue-eyed and how many green-eyed people in a population
• Can only be used on raw counts, not on measurements
• Assumes random sampling
• Independent data

         blue   green
male      10      5
female     7      8
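Using the slide's eye-colour table, the χ² test is a one-liner in Python (SciPy assumed):

import numpy as np
from scipy.stats import chi2_contingency

#                      blue  green
observed = np.array([[  10,    5],    # male
                     [   7,    8]])   # female

chi2, p, dof, expected = chi2_contingency(observed)   # applies Yates' correction for 2x2 tables by default
print(chi2, p, dof)
print(expected)    # the counts expected if eye colour and sex were independent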
Differences amongst 3 or more means
• Parametric: ANOVA (analysis of variance)
– Independent data
– Use post hoc tests to determine where any differences lie
• Parametric, repeated measures: rANOVA
– Scores between participants should be independent
– Should be derived from a random sample
• Non-parametric: Kruskal-Wallis
– Independent data
– Use multiple pairwise comparisons to determine where differences lie
• Non-parametric, repeated measures: Friedman test
– Scores between participants should be independent
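A sketch of these tests in Python (SciPy assumed; the values are invented, for the Friedman test the three lists are treated as the same five individuals measured three times, and post hoc comparisons are not shown):

from scipy import stats

g1 = [5.1, 4.8, 5.5, 5.0, 4.9]    # invented data for three groups
g2 = [5.9, 6.1, 5.7, 6.3, 6.0]
g3 = [5.2, 5.4, 5.0, 5.6, 5.3]

print(stats.f_oneway(g1, g2, g3).pvalue)            # parametric: one-way ANOVA, independent groups
print(stats.kruskal(g1, g2, g3).pvalue)             # non-parametric: Kruskal-Wallis on the ranks
print(stats.friedmanchisquare(g1, g2, g3).pvalue)   # non-parametric repeated measures: Friedman test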
Complex tests
• General linear model, generalised linear model, generalised linear mixed model
– Are used for "complicated data sets"
– Usually when there are many variables other than the independent variables to be tested, and it is necessary to see how much variation these are contributing (e.g. day of treatment, sex, year of treatment)
– These are included as "random" effects
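A minimal mixed-model sketch in Python (statsmodels and pandas assumed, neither of which the slides mention; the data frame and variable names are hypothetical and the values invented):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per measurement, with the individual
# identity available as a grouping (random-effect) variable
data = pd.DataFrame({
    "score":      [12.0, 13.5, 11.2, 14.1, 12.8, 13.9, 11.5, 12.2],
    "treatment":  ["A", "B", "A", "B", "A", "B", "A", "B"],
    "sex":        ["m", "m", "f", "f", "m", "m", "f", "f"],
    "individual": [1, 1, 2, 2, 3, 3, 4, 4],
})

# Treatment and sex as fixed effects, individual as a random effect
model = smf.mixedlm("score ~ treatment + sex", data, groups=data["individual"])
print(model.fit().summary())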
Carrying out statistical tests
• By hand (if you did A-level statistics, you already have)
– Takes a long time, but you will understand more about how the test works; worth trying for e.g. a Mann-Whitney U test with small sample sizes
• By creating your own formulas in Excel, or via a programming package
– Takes time to set up, but will allow you to understand the tests; once set up, quicker than by hand
• Using specialist software packages (SPSS is available at Queen's)
– Good for complex statistical tests or large data sets, but you may get into a "shake and bake" mindset, i.e. not understand what is really going on; requires data input in the right format
• Using the R environment
– Can be used as a programming environment or with some level of prebuilt functions (once the data are in, most common statistical tests are a single command)
– May be daunting as it is close to programming, but it is becoming recognised as a vital skill and, while not as convenient as SPSS, will enhance employability
Designing an experiment: statistical considerations
• When designing an experiment it is prudent to take account of the type of data that you will have and the kind of statistical tests you will use
• Think about the type of data you will collect, how it will look in a spreadsheet and how it will be analysed
• What limitations might your design bring?
– e.g. small sample size, a non-continuous data set, paired data requiring two samples from each individual
Designing an experiment: student performance and alcohol
• Design an experiment to test the hypothesis that alcohol consumption affects academic performance on a field course
• Consider the type of data to collect and what type of statistical tests you will use to analyse the data
• There are only 30 students on the field course, so consider how best to reduce random sampling errors
If you only buy 2 books...