Stat 301 – Day 28 Comparing groups on a quantitative response (Ch. 4)

Announcements
• Project 2

Example
• Researchers Holdgate et al. (2016) studied walking behavior of elephants in North American zoos to see whether there is a difference in average distance traveled by African and Asian elephants. They put GPS loggers on 33 randomly selected African elephants and 23 randomly selected Asian elephants and measured the distance (in kilometers) the elephants walked per day.
• How can we analyze these data?

Ch. 4 – Comparing groups on a quantitative response
• What are appropriate graphs to look at?
• What are appropriate statistics for summarizing the data numerically?
  ◦ Choice of statistic
• How do we assess statistical significance?
• How do we estimate the corresponding difference in population/treatment parameters?
• Factors that affect the p-value and confidence interval
• Scope of conclusions based on study design

Investigation 4.2 (p. 257)
• Salaries (in millions of dollars) of NBA players
• Open NBASalaries 2017.txt from the Lecture Notes page (just salaries and conference)
• Copy and paste into R or JMP
  ◦ R: read.table("clipboard", header=T) or Import Dataset
  ◦ JMP: paste with column names
• Answer (c)–(d)
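A rough Python stand-in for the "copy and paste into R or JMP" step (the course itself uses R, JMP, and applets). This sketch assumes the pasted text has a header row followed by whitespace-separated rows with the salary first and the conference label second; that column order and the name `summarize_by_group` are assumptions, so check them against the actual file. It returns the sample size, mean, and standard deviation for each conference, the usual numerical summaries for comparing two groups.

```python
import statistics
from collections import defaultdict


def summarize_by_group(text):
    """Return {group: (n, mean, sd)} for pasted, stacked two-column data.

    Assumed layout (hypothetical): a header row, then rows of
    "<salary> <conference>" separated by whitespace.
    """
    rows = [line for line in text.strip().splitlines() if line.strip()]
    groups = defaultdict(list)
    for line in rows[1:]:  # skip the header row, like header=T in read.table
        value, label = line.split()[:2]
        groups[label].append(float(value))
    # statistics.stdev uses the n - 1 divisor, matching the usual sample SD
    return {
        label: (len(vals), statistics.mean(vals), statistics.stdev(vals))
        for label, vals in groups.items()
    }
```

With the file saved locally, `summarize_by_group(open(path).read())` would give one summary per conference, where `path` is whatever location you saved it to.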

Investigation 4.2
• Answer (f) and (h)
  ◦ May want to open NBASalary_2 and use the Two Populations applet
  ◦ Take one sample
  ◦ Report the difference in sample means on the board
• Answer (n)
  ◦ 1000 differences in sample means
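The "take one sample, record the difference in sample means, repeat" routine can be sketched in Python as follows. This is a hypothetical helper, not the Two Populations applet's code: it assumes each population is a Python list of salaries and draws each sample without replacement (`random.Random.sample`), repeating 1000 times by default to match the 1000 differences in sample means.

```python
import random
import statistics


def simulate_mean_differences(pop1, pop2, n1, n2, reps=1000, seed=None):
    """Draw independent random samples of sizes n1 and n2 (without
    replacement) from two finite populations and record xbar1 - xbar2
    for each repetition."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    diffs = []
    for _ in range(reps):
        sample1 = rng.sample(pop1, n1)
        sample2 = rng.sample(pop2, n2)
        diffs.append(statistics.mean(sample1) - statistics.mean(sample2))
    return diffs
```

A histogram of the returned list plays the role of the applet's dotplot of simulated differences.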

Central Limit Theorem
• Does this theorem apply here?
• Did it successfully predict the behavior of the sampling distribution?

Recap
• Overall/on average, not much difference between the two populations
• However, taking 20 players from each conference, we might find a difference in the sample means just from random sampling error

Recap
• Luckily, the distribution of the differences in sample means follows a very predictable pattern
  ◦ Mean: $\mu_1 - \mu_2$
  ◦ SD: $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
  ◦ Approximately normal, as long as the populations are not too skewed or the samples are not too small
[Figure: differences in sample means]
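The recap's mean and SD formulas translate directly into a small calculator. A minimal sketch: `clt_prediction` is a made-up name, and, like the formula above, it ignores any finite-population correction. Plugging in the two populations' means and SDs together with the sample sizes gives the center and SD the CLT predicts for the simulated differences.

```python
import math


def clt_prediction(mu1, sigma1, n1, mu2, sigma2, n2):
    """Predicted center and SD of the sampling distribution of xbar1 - xbar2
    for independent random samples: mean mu1 - mu2,
    SD sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    center = mu1 - mu2
    sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return center, sd
```

For instance, `clt_prediction(5, 3, 9, 2, 4, 16)` gives a center of 3 and an SD of $\sqrt{1 + 1} = \sqrt{2}$.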

1000 trials – difference in sample means
• CLT prediction
  ◦ Approximately normal
  ◦ Center: 0.1927
  ◦ SD: 1.5255
[Figure: Simulation]

Recap
• Which means that when we use the sample standard deviations to calculate the standard error, the standardized statistic will be well modeled by a t-distribution
• The appropriate degrees of freedom are a little complicated, but we’ll let the computer deal with that
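For the curious, here is roughly what "the computer" does. R's `t.test(..., var.equal = FALSE)` uses the Welch approximation for the degrees of freedom; the sketch below computes the same standardized statistic and Welch–Satterthwaite degrees of freedom from two raw samples. The function name `welch_t` is hypothetical, and it stops short of a p-value, which would need a t-distribution CDF.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Standardized statistic (xbar1 - xbar2) / SE, with SE built from the
    sample SDs, plus the Welch-Satterthwaite approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    # df = (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1)); usually not a whole number
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df
```

The degrees of freedom always land between the smaller of $n_1 - 1, n_2 - 1$ and $n_1 + n_2 - 2$, which is why they look "a little complicated" in software output.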

Technology Options (pp. 264–265)
• Theory-based Inference applet
  ◦ Summary data
  ◦ Raw data (stacked vs. unstacked)
• R
  ◦ iscamtwosamplet
  ◦ t.test(y ~ x, alt = "", var.equal = FALSE)
• JMP
  ◦ Journal: Hypothesis Test for Two Means
  ◦ Fit Y by X, t Test
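The formula form `t.test(y ~ x, ...)` expects stacked data: one column of responses (`y`) and one column of group labels (`x`). Unstacked data instead keeps one column per group. A small Python sketch of converting between the two layouts; `unstack` and `stack` are illustrative names, not functions from the course materials.

```python
from collections import defaultdict


def unstack(values, labels):
    """Stacked -> unstacked: {label: [responses]}, groups in order of appearance."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return dict(groups)


def stack(groups):
    """Unstacked -> stacked: two parallel columns (responses, labels)."""
    values, labels = [], []
    for label, vals in groups.items():
        values.extend(vals)
        labels.extend([label] * len(vals))
    return values, labels
```

Either layout carries the same information; which one you need depends only on what the tool expects.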

To Do – for Thursday
• Finish Investigation 4.2
  ◦ Technology instructions
• Quiz 24
  ◦ Use the Two Proportions applet

Investigation 4.2 (q): To generate the t-statistics
• Make sure you are looking at the full dataset
• Edit the script
  ◦ Repeat earlier edits
  ◦ Uncomment new.Column("tstat", numeric)
  ◦ Comment out: //results.Table<<Add.Row…
  ◦ Uncomment tstat = and results.Table<<Add.Row…
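The script edits above switch the simulation from recording differences in sample means to recording t-statistics. A Python sketch of that idea, with an important assumption: here each t-statistic is centered at the true population difference $\mu_1 - \mu_2$, and the course script's `tstat` formula may be defined differently. The name `simulate_t_statistics` is hypothetical; populations are lists sampled without replacement.

```python
import math
import random
import statistics


def simulate_t_statistics(pop1, pop2, n1, n2, reps=1000, seed=None):
    """For each repetition, draw independent random samples and record
    (xbar1 - xbar2 - (mu1 - mu2)) / SE, where SE uses the sample SDs.
    Assumes each sample has nonzero variance (otherwise SE would be 0)."""
    rng = random.Random(seed)
    true_diff = statistics.mean(pop1) - statistics.mean(pop2)  # mu1 - mu2
    tstats = []
    for _ in range(reps):
        s1 = rng.sample(pop1, n1)
        s2 = rng.sample(pop2, n2)
        se = math.sqrt(statistics.variance(s1) / n1 + statistics.variance(s2) / n2)
        tstats.append((statistics.mean(s1) - statistics.mean(s2) - true_diff) / se)
    return tstats
```

If the t-distribution is a good model, these values should be centered near 0 with an SD close to 1.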

Investigation 4.2 (n)
• Investigate (independent) random samples from two different populations
  ◦ Run the script to generate two repetitions
  ◦ Edit the script to run 100 repetitions without pausing in between
• Examine all 3 distributions. How do they compare?
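One way to compare the three distributions (sample means from population 1, sample means from population 2, and their differences) is to summarize each by its mean and SD. A hypothetical Python helper, assuming list-valued populations sampled without replacement; it is not the course script.

```python
import random
import statistics


def three_distributions(pop1, pop2, n1, n2, reps=100, seed=None):
    """Run reps repetitions without pausing and return (mean, SD) for the
    simulated distributions of xbar1, xbar2, and xbar1 - xbar2."""
    rng = random.Random(seed)
    means1, means2, diffs = [], [], []
    for _ in range(reps):
        m1 = statistics.mean(rng.sample(pop1, n1))
        m2 = statistics.mean(rng.sample(pop2, n2))
        means1.append(m1)
        means2.append(m2)
        diffs.append(m1 - m2)
    return {
        name: (statistics.mean(vals), statistics.stdev(vals))
        for name, vals in (("xbar1", means1), ("xbar2", means2), ("diff", diffs))
    }
```

Because the samples are independent, the variances add: the SD of the differences should be larger than either individual SD, roughly $\sqrt{SD_1^2 + SD_2^2}$.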

Investigation 4.1
• Compare the groups using graphs with the same scaling
• Statistical inference: with small data sets, why not look at all possible arrangements of the observations to groups and count how many are more extreme than our observed result?
  ◦ Models random assignment
  ◦ Do need to consider what you mean by "more extreme"
  ◦ Not feasible in moderate to large data sets
  ◦ Doesn’t give us a confidence interval
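The "look at all possible arrangements" idea is an exact permutation (randomization) test. A Python sketch, assuming "more extreme" means a difference in means (group 1 minus group 2) at least as large as the observed one; that one-sided choice is an assumption, and other definitions are possible. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group1, group2):
    """Enumerate every assignment of the pooled observations to groups of the
    original sizes; count arrangements whose difference in means is at least
    the observed one. Returns (count, total arrangements, p-value)."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = mean(group1) - mean(group2)
    count = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(g1) - mean(g2) >= observed - 1e-12:  # tolerance for float round-off
            count += 1
    return count, total, count / total
```

For example, `exact_permutation_test([5, 6], [1, 2])` checks all 6 arrangements and finds only the observed one as extreme, so $p = 1/6$. The number of arrangements is $\binom{n_1 + n_2}{n_1}$, which grows so fast that full enumeration quickly becomes infeasible for moderate sample sizes.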