Statistics for Water Science: Hypothesis Testing: Fundamental concepts and a survey of methods
Unit 5: Module 17, Lecture 2

Statistics
· A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data:
  · Descriptive Statistics (Lecture 1)
    · Basic description of a variable
  · Hypothesis Testing (Lecture 2)
    · Asks the question: is X different from Y?
  · Predictions (Lecture 3)
    · What will happen if…
Developed by: Host. Updated: Jan. 21, 2004.

Objectives
· Introduce the basic concepts and assumptions of significance tests
  · Distributions on parade
  · Developing hypotheses
  · What is “true”?
· Survey statistical methods for testing for differences in populations of numbers
  · Sample size issues
  · Appropriate tests
· What we won’t do:
  · Elaborate on mathematical underpinnings of tests (take a good stats course for this!)

From our last lecture
· The mean:
  · A measure of central tendency
· The standard deviation:
  · A measure of the ‘spread’ of the data

Tales of the normal distribution
· Many kinds of data follow this symmetrical, bell-shaped curve, often called a Normal Distribution.
· Normal distributions have statistical properties that allow us to predict the probability of getting a certain observation by chance.

Tales of the normal distribution
· When sampling a variable, you are most likely to obtain values close to the mean
  · About 68% within 1 SD
  · About 95% within 2 SD
[Figure: normal curve, x-axis in standard deviations from the mean (-2.0 to 2.0)]
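The 68% and 95% figures can be checked numerically. The sketch below uses Python's standard-library `NormalDist`; the helper name `fraction_within` is introduced here for illustration and is not part of the lecture.

```python
from statistics import NormalDist


def fraction_within(k_sd: float) -> float:
    """Probability that a normally distributed value lies within
    k_sd standard deviations of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)


for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
```

The "95% within 2 SD" rule is a rounding of 95.45%; the exact 95% interval is about 1.96 SD on either side of the mean.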

Tales of the normal distribution
· Note that a couple of values are outside the 95% (2 SD) interval
  · These are improbable
[Figure: normal curve with observed values marked, x-axis in standard deviations from the mean (-2.0 to 2.0)]

Tales of the normal distribution
· The essence of hypothesis testing:
  · If an observation falls in one of the tails of a distribution, it is unlikely to have come from that population.
[Figure: normal curve, x-axis in standard deviations from the mean (-2.0 to 2.0)]

“Significant Differences”
· A difference is considered significant if the probability of getting that difference by random chance is very small.
· P value:
  · The probability of getting a difference at least as large as the one observed by chance alone, when no real difference exists
  · By convention, a difference is called significant when p < 0.05

The probability of detecting a significant difference is influenced by:
· The magnitude of the effect
  · A big difference is more likely to be significant than a small one

The probability of detecting a significant difference is influenced by:
· The spread of the data
  · If the standard deviation is low, it will be easier to detect a significant difference

The probability of detecting a significant difference is influenced by:
· The number of observations
  · A large sample is more likely to detect a difference than a small sample
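All three influences (effect size, spread, and sample size) appear in one expression: the test statistic is the difference divided by the standard error, and the standard error is the standard deviation divided by the square root of n. The sketch below uses made-up numbers purely for illustration; `z_statistic` is a hypothetical helper name.

```python
from math import sqrt


def z_statistic(difference: float, sd: float, n: int) -> float:
    """Difference expressed in units of standard error (sd / sqrt(n))."""
    return difference / (sd / sqrt(n))


# Illustrative values only, not data from the lecture.
baseline = z_statistic(difference=0.5, sd=1.0, n=10)
bigger_effect = z_statistic(difference=1.0, sd=1.0, n=10)
smaller_spread = z_statistic(difference=0.5, sd=0.5, n=10)
more_samples = z_statistic(difference=0.5, sd=1.0, n=40)

for label, z in [("baseline", baseline), ("bigger effect", bigger_effect),
                 ("smaller spread", smaller_spread), ("more samples", more_samples)]:
    print(f"{label:15s} z = {z:.2f}")
```

With these numbers the baseline statistic falls below the one-tailed 5% critical value of 1.645, while doubling the effect, halving the standard deviation, or quadrupling the sample size each push it above.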

Hypothesis testing
· Hypothesis:
  · A statement which can be proven false
· Null hypothesis (H0):
  · “There is no difference”
· Alternative hypothesis (HA):
  · “There is a difference…”
· In statistical testing, we try to “reject the null hypothesis”
  · If the null hypothesis is false, it is likely that our alternative hypothesis is true
  · “False” here means there is only a small probability that the results we observed could have occurred by chance if the null hypothesis were true

Common probability levels
Alpha level    Chance       Result                Reject null hypothesis?
P > 0.05       -            Not significant       No
P < 0.05       1 in 20      Significant           Yes
P < 0.01       1 in 100     Significant           Yes
P < 0.001      1 in 1000    Highly significant    Yes
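The table of probability levels amounts to a simple lookup. A minimal sketch follows, assuming a p value of exactly 0.05 is treated as not significant (the table does not say); `significance_label` is a name introduced here for illustration.

```python
def significance_label(p_value: float) -> str:
    """Map a p value onto the labels used in the table of common levels.

    Assumption: p == 0.05 exactly is reported as not significant.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value < 0.001:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    return "not significant"


for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, "->", significance_label(p))
```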

Types of statistical errors (you could be right, you could be wrong)
               Accept H0               Reject H0
H0 is true     Correct decision        Type I error (alpha)
H0 is false    Type II error (beta)    Correct decision

Examples of type I and type II errors
[Figure: normal curve with the Type I error and Type II error regions labeled, x-axis in standard deviations from the mean (-2.0 to 2.0)]
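For a one-tailed Z-test the two error rates can be computed directly. Alpha is chosen in advance and fixes the critical Z value; beta then depends on how large the true shift is relative to the standard error. This is a sketch under those assumptions, with `one_tailed_error_rates` as a hypothetical helper and illustrative inputs.

```python
from statistics import NormalDist


def one_tailed_error_rates(true_shift: float, standard_error: float,
                           alpha: float = 0.05) -> tuple[float, float]:
    """Return (critical z, beta) for an upper one-tailed Z-test.

    beta is the probability of failing to reject H0 when the true mean
    is actually true_shift above the H0 mean.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    beta = std_normal.cdf(z_crit - true_shift / standard_error)
    return z_crit, beta


# Illustrative values only: shifts of 0, 1, 2 and 3 standard errors.
for shift in (0.0, 1.0, 2.0, 3.0):
    z_crit, beta = one_tailed_error_rates(shift, standard_error=1.0)
    print(f"shift = {shift:.0f} SE  critical z = {z_crit:.3f}  beta = {beta:.3f}")
```

When the true shift is zero, H0 is true and the test accepts it with probability 1 - alpha; as the shift grows, beta shrinks, meaning that larger effects are less likely to be missed.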

Common statistical tests
Question                                                       Test
Does a single observation belong to a population of values?    Z-test
Are two (or more) populations of numbers different?            t-test, F-test (ANOVA)
Is there a relationship between x and y?                       Regression
Is there a trend in the data? (special case of above)          Regression

Does a single observation belong to a population of values: The Z-test
· On June 26, 2002, a temperature probe reading at 7 m depth in Medicine Lake was 20.3 °C. Is this unusually high for June?
· Note: this is a “one-tailed test”; we just want to know if it’s high
· We’re not asking if it is unusually low or high (two-tailed)

The Z distribution (standard normal distribution)
· The Z distribution is a Normal Distribution with special properties:
  · Mean = 0
  · Variance = 1
· Z = (observed value - mean) / standard error
· Standard error = standard deviation / sqrt(n)
[Figure: the Z distribution]

Medicine Lake example
· Calculate the Z-score for the observed data
· Compare the Z-score with the critical value for a one-tailed test at p = 0.05 (1.645)

The Deep Math…
· Z = (observed value - mean) / standard error
· Standard error = standard deviation / sqrt(n)
· Z = (20.3 - 19.7) / 0.08 = 6.89
· Since 6.89 > the critical Z value of 1.645, our deep temperature is significantly higher than the June average temperature.
· Further exploration shows that a storm the previous day caused the warmer surface waters to mix into the deeper waters.
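The Medicine Lake calculation can be reproduced in a few lines. This sketch uses only the three numbers printed on the slide (20.3, 19.7 and 0.08); the variable names are introduced here. With those rounded inputs the arithmetic gives about 7.5 rather than the 6.89 shown, presumably because the slide's inputs were rounded for display; either value is far above the critical value, so the conclusion does not change.

```python
from statistics import NormalDist

# Values as printed on the slide (possibly rounded).
observed = 20.3        # probe reading at 7 m, degrees C
june_mean = 19.7       # June mean at that depth, degrees C
standard_error = 0.08  # standard deviation / sqrt(n)

z = (observed - june_mean) / standard_error

std_normal = NormalDist()
critical = std_normal.inv_cdf(0.95)   # one-tailed test at p = 0.05
p_one_tailed = 1.0 - std_normal.cdf(z)

print(f"z = {z:.2f}, critical value = {critical:.3f}")
print("significantly high" if z > critical else "not significantly high")
```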

Are two populations different: The t-test
· Also called Student’s t-test. “Student” was the pseudonym of a statistician who worked for the Guinness brewery
· Useful for “small” samples (<30)
· One of the most basic statistical tests; it can be performed in Excel or any common statistical package

Are two populations different: The t-test
· Same principle as the Z-test: calculate a t value, and assess the probability of getting that value

In Excel
· Formula:
  · @ttest(Pop1, Pop2, #Tails, TestType)
· #Tails: 1 or 2
· TestType:
  · 1 - paired (if there is a logical pairing of XY data)
  · 2 - equal variance
  · 3 - unequal variance
· Test returns exact probability value

Example: 1-tailed temperature comparison
· @ttest(Pop1, Pop2, 1, 3) = 1.5 × 10^-149
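For readers who want to see what the unequal-variance option (test type 3) computes, here is a minimal standard-library sketch of the Welch t statistic and its approximate degrees of freedom. The function name `welch_t` and the sample values are hypothetical, not the lecture's data. Turning t into a probability requires the t distribution, which the Python standard library does not provide; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option if SciPy is installed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up temperatures (degrees C), for illustration only.
surface = [21.0, 22.0, 23.0, 24.0]
deep = [19.0, 20.0, 21.0]

t_value, dof = welch_t(surface, deep)
print(f"t = {t_value:.3f}, df = {dof:.2f}")
```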

ANOVA: Tests of multiple populations
· ANOVA: analysis of variance
· Compare 2 or more populations
  · Surface temperatures for 3 lakes
· Can handle single or multiple factors
  · One-way ANOVA: comparing lakes
  · Two-way ANOVA: compare two factors
    · Temperature x Light effects on algal populations
  · Repeated measures ANOVA: compare factors over time
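A one-way ANOVA compares the variation between group means with the variation within groups; their ratio is the F statistic. The sketch below computes F for three lakes using made-up surface temperatures; `one_way_anova_f` and the data are assumptions for illustration. Converting F into a probability requires the F distribution, which is outside the Python standard library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up surface temperatures (degrees C) for three lakes.
lake_a = [20.0, 21.0, 22.0]
lake_b = [22.0, 23.0, 24.0]
lake_c = [24.0, 25.0, 26.0]

f_stat, df_b, df_w = one_way_anova_f([lake_a, lake_b, lake_c])
print(f"F = {f_stat:.1f} with ({df_b}, {df_w}) degrees of freedom")
# F = 12.0 with (2, 6) degrees of freedom
```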

Next Time: Regression - Finding relationships among variables