Hypothesis Testing

Coke vs. Pepsi
• Hypothesis: tweets reflect market share (people tweet as much as they drink)
• Market share:
  – 67% vs. 33%
• From tweets:
  – 71% vs. 29%
• Did this happen by chance? Or do people tend to talk about Coke more than they drink it?

A simpler hypothesis test
• Claim: I can distinguish Coke and Pepsi just by tasting.
• How do you verify my claim?

It's like a court judgment
• If you want to prove something, you have to assume the opposite, and find evidence that contradicts it.
• In a court, you want to prove a defendant guilty. You assume he/she is innocent.

You conducted an experiment…
• And have some outcome
  – 62 out of 100 correct
• Assume I cannot distinguish them and was just guessing at random. Is this result possible?
• Of course it is possible: if I'm lucky, I can get 100 out of 100. But is the result surprising?

How do we define surprising-ness?
• Let's play the random guessing game one million times. If it turns out that in only 4 of the 1 million games someone manages to score 62 or more, then we can say you have to be very super duper lucky to do that. Actually 0.0004% lucky (4 ÷ 1,000,000).
• And we are 99.9996% sure that you can't get 62 in one game just by luck.
• Thus I actually am able to distinguish Coke and Pepsi to some extent.

But we can't play this game that many times…
• Or can we?
• Open Excel.
• In cell B1, type =RAND()
• Can you make B1 say 0 if the random number is less than 0.5 and 1 otherwise?
• You just flipped a coin in Excel!

Random Guessing Game in Excel
• Flip the coin 100 times, in the same column.
• Find out how many heads you had in cell B101.
• We've just played the random guessing game one time.
• Can you do it 10 times?
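
For comparison, the same game can be sketched in Python. This is only an illustration, not part of the Excel exercise: the function name `play_game`, the fixed seed, and the choice to count values of 0.5 or more as heads are assumptions made here.

```python
import random

def play_game(n_flips=100, rng=random):
    """Play one random guessing game and return the number of heads."""
    # rng.random() returns a float in [0, 1), much like a spreadsheet random cell.
    # A value below 0.5 counts as tails (0), anything else as heads (1).
    flips = [0 if rng.random() < 0.5 else 1 for _ in range(n_flips)]
    return sum(flips)

# Fixed seed (an arbitrary choice) so that reruns print the same scores.
rng = random.Random(0)

one_score = play_game(rng=rng)                        # one game of 100 flips
ten_scores = [play_game(rng=rng) for _ in range(10)]  # ten games

print(one_score)
print(ten_scores)
```

Each number printed is one game's score, the counterpart of the head count in the spreadsheet column.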

Histogram
• We want to find out how many times we scored 62 or higher.
• It's also interesting to look at how the scores are distributed, i.e. which scores are more likely.
• A chart of that distribution is called a histogram.
• Let's create one by hand.
• Then in Excel.

• Now do it 50 times! (or more… doesn't have to be exact)
• Does the histogram look better?
• What about 500 times? Look at the histogram.
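
A rough Python counterpart of the histogram exercise, as a minimal sketch: it tallies how often each score occurs and prints one row of `#` marks per score. The helper names (`play_game`, `score_histogram`, `print_histogram`) and the seed are made up for this illustration.

```python
import random
from collections import Counter

def play_game(n_flips=100, rng=random):
    """One random guessing game: number of heads in n_flips fair coin flips."""
    return sum(1 for _ in range(n_flips) if rng.random() >= 0.5)

def score_histogram(n_games, rng=random):
    """Map each score to the number of games that ended with that score."""
    return Counter(play_game(rng=rng) for _ in range(n_games))

def print_histogram(counts):
    """Print a text histogram: one row per score, one '#' per game."""
    for score in sorted(counts):
        print(f"{score:3d} | {'#' * counts[score]}")

rng = random.Random(0)  # arbitrary fixed seed, only for repeatable output

hist_50 = score_histogram(50, rng)
hist_500 = score_histogram(500, rng)

print_histogram(hist_50)
print()
print_histogram(hist_500)
```

With more games the rows should fill in around 50, which is what "looks better" means for the 500-game histogram.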

How probable is a score of 62?
• You can calculate it from the histogram.
• Let's play the game in Python as many times as we want!
• Here are the steps:
  – Flip a coin 100 times, and record the number of heads (I'll show you how to flip coins in Python).
  – Do it 1,000 times. Record all the scores (numbers of heads).
  – Find out how many of them are 62 or higher. What's the percentage?
  – Now calculate this percentage for 2,000 games, 5,000 games, 10,000 games, and 50,000 games. What about a score of 57 or higher? 54? 50?
  – Aha, maybe you want to write a function…
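
One possible shape for such a function, offered as a sketch and not as the intended solution: `percent_at_least` is a hypothetical name, and the game counts in the loops are kept small here so the script runs quickly. Larger counts such as 10,000 or 50,000 work the same way, just more slowly.

```python
import random

def play_game(n_flips=100, rng=random):
    """One random guessing game: number of heads in n_flips fair coin flips."""
    return sum(1 for _ in range(n_flips) if rng.random() >= 0.5)

def percent_at_least(threshold, n_games, rng=random):
    """Percentage of n_games random games whose score is threshold or higher."""
    hits = sum(1 for _ in range(n_games) if play_game(rng=rng) >= threshold)
    return 100.0 * hits / n_games

rng = random.Random(0)  # arbitrary fixed seed, only for repeatable output

# Same threshold, more and more games: the percentage should settle down.
for n_games in (1_000, 2_000, 5_000):
    print(n_games, "games:", percent_at_least(62, n_games, rng), "% scored 62 or higher")

# Same number of games, lower thresholds: less surprising scores.
for threshold in (62, 57, 54, 50):
    print("score >=", threshold, ":", percent_at_least(threshold, 5_000, rng), "%")
```

The percentage for a given threshold is the simulated answer to "how surprising is this score if I am only guessing?"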

Back to Coke vs. Pepsi