Statistical Inference and Regression Analysis StatGB 3302 30

Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 5 – Hypothesis Testing

Objectives of Statistical Analysis
• Estimation
  - How long do hard drives last?
  - What is the median income among the 99%ers?
• Inference: hypothesis testing
  - Did minorities pay higher mortgage rates during the housing boom?
  - Is there a link between environmental factors and breast cancer on eastern Long Island?

General Frameworks
• Parametric Tests: features of specific distributions such as the mean of a Bernoulli or normal distribution.
• Specification Tests (Semiparametric)
  - Do the data arrive from a Poisson process?
  - Are the data normally distributed?
• Nonparametric Tests: Are two discrete processes independent?

Hypotheses
• Hypotheses - labels
  - State 0 of Nature: Null Hypothesis
  - State 1: Alternative Hypothesis
• Exclusive: Prob(H0 ∩ H1) = 0
• Exhaustive: Prob(H0) + Prob(H1) = 1
• Symmetric: Neither is intrinsically "preferred"; the objective of the study is only to support one or the other. (Rare?)

Testing Strategy

Posterior (to the Evidence) Odds

Does the New Drug Work?
• Hypotheses: H0: θ = .50, H1: θ = .75
• Priors: P0 = .40, P1 = .60
• Clinical Trial: N = 50, 31 patients "respond", p = .62
• Likelihoods:
  - L0(31 of 50 | θ = .50) = Binomial(50, 31, .50) = .0270059
  - L1(31 of 50 | θ = .75) = Binomial(50, 31, .75) = .0148156
• Posterior odds in favor of H0 = (.4/.6)(.0270059/.0148156) = 1.2152 > 1
• Priors favored H1 1.5 to 1, but the posterior odds favor H0, 1.2152 to 1. The evidence discredits H1.
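The drug trial arithmetic can be checked with a short sketch. It assumes the posterior odds are the prior odds multiplied by the likelihood ratio, as in the calculation on the slide; the function and variable names are illustrative only.

```python
from math import comb


def binomial_pmf(n: int, k: int, theta: float) -> float:
    """Probability of exactly k successes in n Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


# Values taken from the slide.
prior_h0, prior_h1 = 0.40, 0.60
n_patients, n_respond = 50, 31

like_h0 = binomial_pmf(n_patients, n_respond, 0.50)
like_h1 = binomial_pmf(n_patients, n_respond, 0.75)

# Posterior odds = prior odds x likelihood ratio.
posterior_odds_h0 = (prior_h0 / prior_h1) * (like_h0 / like_h1)

print(f"L0 = {like_h0:.7f}")
print(f"L1 = {like_h1:.7f}")
print(f"Posterior odds in favor of H0 = {posterior_odds_h0:.4f}")
```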

Decision Strategy
• Prefer the hypothesis with the higher posterior odds
• A gap in theory: How does the investigator do the cost benefit test?
  - Starting a new business venture or entering a new market: Priors and market research
  - FDA approving a new drug or medical device: Priors and clinical trials
• Statistical Decision Theory adds the costs and benefits of decisions and errors.

An Alternative Strategy
• Recognize the asymmetry of null and alternative hypotheses.
• Eliminate the prior odds (which are rarely formed or available).

http://query.nytimes.com/gst/fullpage.html?res=9C00E4DF113BF935A3575BC0A9649C8B63

Classical Hypothesis Testing
• The scientific method applied to statistical hypothesis testing
• Hypothesis: The world works according to my hypothesis
• Testing or supporting the hypothesis
  - Data gathering
  - Rejection of the hypothesis if the data are inconsistent with it
  - Retention and exposure to further investigation if the data are consistent with the hypothesis
• Failure to reject is not equivalent to acceptance.

Asymmetric Hypotheses
• Null Hypothesis: The proposed state of nature
• Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.

Hypothesis Testing Strategy
• Formulate the null hypothesis
• Gather the evidence
• Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
  - Very unlikely: Reject the hypothesis
  - Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)

Some Terms of Art
• Type I error: Incorrectly rejecting a true null
• Type II error: Failure to reject a false null
• Power of a test: Probability that a test will correctly reject a false null
• Alpha level: Probability that a test will incorrectly reject a true null. This is also called the size or the significance level of the test.
• Confidence level: Probability that a test will retain a true null = 1 - alpha.
• Rejection Region: Evidence that will lead to rejection of the null
• Test statistic: Specific sample evidence used to test the hypothesis
• Distribution of the test statistic under the null hypothesis: Probability model used to compute the probability of rejecting the null. (Crucial to the testing strategy: how does the analyst assess the evidence?)

Possible Errors in Testing

                                 Hypothesis is True   Hypothesis is False
I Do Not Reject the Hypothesis   Correct Decision     Type II Error
I Reject the Hypothesis          Type I Error         Correct Decision

A Legal Analogy: The Null Hypothesis is INNOCENT

                      Null Hypothesis: Not Guilty      Alternative Hypothesis: Guilty
Verdict: Not Guilty   Correct Decision                 Type II Error:
                                                       Guilty defendant goes free
Verdict: Guilty       Type I Error:                    Correct Decision
                      Innocent defendant is convicted

The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II in this setting.

(Jerzy) Neyman - (Egon) Pearson Methodology
• "Statistical" testing
• Methodology
  - Formulate the "null" hypothesis
  - Decide (in advance) what kinds of "evidence" (data) will lead to rejection of the null hypothesis, i.e., define the rejection region
  - Gather the data
  - Mechanically carry out the test.

Formulating the Null Hypothesis
• Stating the hypothesis: A belief about the "state of nature"
  - A parameter takes a particular value
  - There is a relationship between variables
  - And so on…
• The null vs. the alternative
  - By induction: If we wish to find evidence of something, first assume it is not true.
  - Look for evidence that leads to rejection of the assumed hypothesis.
  - Evidence that rejects the null hypothesis is significant.

Example: Credit Scoring Rule
• Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to "accept" an application.
  - Null hypothesis: There is no relationship.
  - Alternative hypothesis: They do use homeownership data.
• What decision rule should I use?

Some Evidence

              Reject   Accept
Renters       1845     5469
Homeowners    1100     5030

Hypothesis Test
• Acceptance rate for homeowners = 5030/(5030+1100) = .82055
• Acceptance rate for renters is .74774
• H0: Acceptance rate for renters is not less than for owners.
• H0: p(renters) ≥ .82055
• H1: p(renters) < .82055

The Rejection Region
• What is the "rejection region"?
• Data (evidence) that are inconsistent with my hypothesis
• Evidence is divided into two types:
  - Data that are inconsistent with my hypothesis (the rejection region)
  - Everything else

My Testing Procedure
• I will reject H0 if p(renters) < .815 (chosen arbitrarily)
• Rejection region is sample values of p(renters) < 0.815

Distribution of the Test Statistic Under the Null Hypothesis
• Test statistic: p(renters) = (1/N) Σi Accept_i, where Accept_i = 1 or 0
• Use the central limit theorem:
  - Assumed mean = .82055
  - Implied standard deviation = sqrt(.82055*.17945/7413) = .00459
• Using the CLT, p(renters) is approximately normally distributed (N is very large).
• Use z = (p(renters) - .82055) / .00459

Alpha Level and Rejection Region
• Prob(Reject H0 | H0 true) = Prob(p < .815 | H0 is true)
  = Prob[(p - .82055)/.00459 < (.815 - .82055)/.00459]
  = Prob[z < -1.209] = .11333
• This is the probability of a Type I error.
• It is the alpha level for this test.

Distribution of the Test Statistic and the Rejection Region
[Figure: density of the test statistic under the null hypothesis; the rejection region has area = .11333]

The Test
• The observed proportion is 5469/(5469+1845) = 5469/7314 = .74774
• The null hypothesis is rejected at the 11.333% significance level (by the design of the test)
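The credit scoring test can be retraced numerically. This is a minimal sketch that takes the standard deviation .00459 as stated on the slides instead of recomputing it; evaluating the formula with the 7,314 renter applications gives a value near .0045, so the last digits may differ slightly. All names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Prob[Z <= z]."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


p_null = 0.82055      # acceptance rate for owners, used as the null value
se_slide = 0.00459    # standard deviation as stated on the slide
cutoff = 0.815        # reject H0 if the renters' rate falls below this

accepted, rejected = 5469, 1845
n_renters = accepted + rejected
p_hat = accepted / n_renters

# Alpha level: probability of landing in the rejection region when H0 is true.
z_cutoff = (cutoff - p_null) / se_slide
alpha = normal_cdf(z_cutoff)

# Standard deviation recomputed from the renter sample size, for comparison.
se_recomputed = sqrt(p_null * (1.0 - p_null) / n_renters)

reject_null = p_hat < cutoff
print(f"z at cutoff = {z_cutoff:.3f}, alpha = {alpha:.5f}")
print(f"observed proportion = {p_hat:.5f}, reject H0: {reject_null}")
```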

Power of the test

Power Function for the Test
(Power = size when alternative = the null.)
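A power function for the credit scoring test can be sketched as follows. The simplifying assumption here, not taken from the slides, is that the standard deviation stays fixed at .00459 for every alternative value of the acceptance rate; under that assumption the power at the null value equals the size of the test.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Prob[Z <= z]."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


CUTOFF = 0.815   # rejection region: sample proportion below this value
SE = 0.00459     # held fixed across alternatives (simplifying assumption)


def power(p_true: float) -> float:
    """Probability of rejecting H0 when the true acceptance rate is p_true."""
    return normal_cdf((CUTOFF - p_true) / SE)


for p_true in (0.82055, 0.815, 0.810, 0.800):
    print(f"true rate = {p_true:.5f}  power = {power(p_true):.5f}")
```

The further the true rate lies below the null value, the more likely the test is to reject.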

Application: Breast Cancer On Long Island
• Null Hypothesis: There is no link between the high cancer rate on LI and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
• Neyman-Pearson Procedure
  - Examine the physical and statistical evidence
  - If there is convincing covariation, reject the null hypothesis
  - What is the rejection region?
• The NCI study:
  - Working null hypothesis: There is a link: We will find the evidence.
  - How do you reject this hypothesis?

Formulating the Testing Procedure
• Usually: What kind of data will lead me to reject the hypothesis?
• Thinking scientifically: If you want to "prove" a hypothesis is true (or you want to support one), begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.

Hypothesis About a Mean
• I believe that the average income of individuals in a population is $30,000.
  - H0: μ = $30,000 (The null)
  - H1: μ ≠ $30,000 (The alternative)
• I will draw the sample and examine the data.
• The rejection region is data for which the sample mean is far from $30,000.
• How far is far? That is the test.

Application
• The mean of a population takes a specific value:
• Null hypothesis: H0: μ = $30,000; H1: μ ≠ $30,000
• Test: Is the sample mean close to the hypothesized population mean?
• Rejection region: Sample means that are far from $30,000

Deciding on the Rejection Region
• If the sample mean is far from $30,000, reject the hypothesis.
• Choose the region, for example: reject if the sample mean is below 29,500 or above 30,500.
[Figure: number line marked 29,500, 30,000, 30,500 with the rejection region in the tails]
The probability that the mean falls in the rejection region even though the hypothesis is true (should not be rejected) is the probability of a type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.

Reduce the Probability of a Type I Error by Making the (non)Rejection Region Wider
Reduce the probability of a type I error by moving the boundaries of the rejection region farther out.
• Probability outside the interval 29,500 to 30,500 is large.
• Probability outside the interval 28,500 to 31,500 is much smaller.
You can make a type I error impossible by making the rejection region very far from the null. Then you would never make a type I error because you would never reject H0.

Setting the α Level
• "α" is the probability of a type I error
• Choose the width of the interval by choosing the desired probability of a type I error, based on the t or normal distribution. (How confident do I want to be?)
• Multiply the z or t value by the standard error of the mean.

Testing Procedure
• The rejection region will be the range of values greater than μ0 + zσ/√N or less than μ0 - zσ/√N
• Use z = 1.96 for 1 - α = 95%
• Use z = 2.576 for 1 - α = 99%
• Use the t table if the sample is small, the variance is estimated, and sampling is from a normal distribution.
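The boundaries of the rejection region follow directly from the formula above. In this sketch the values of σ and N are hypothetical, chosen only for illustration; the slides do not state them. The critical z value is taken from the standard normal distribution in Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def rejection_bounds(mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Return (lower, upper); reject H0 if the sample mean falls outside them."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


mu0 = 30_000.0     # hypothesized mean from the slides
sigma = 5_000.0    # hypothetical population standard deviation
n = 400            # hypothetical sample size

lower, upper = rejection_bounds(mu0, sigma, n)
print(f"Reject H0 if the sample mean is below {lower:.1f} or above {upper:.1f}")

lower99, upper99 = rejection_bounds(mu0, sigma, n, alpha=0.01)
print(f"At the 1% level: below {lower99:.1f} or above {upper99:.1f}")
```

Lowering α from 5% to 1% moves both boundaries farther from μ0, which is the widening described on the earlier slide.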

Deciding on the Rejection Region
• If the sample mean is far from $30,000, reject the hypothesis.
• Choose the region.
[Figure: rejection region in the tails of the distribution of the sample mean]
I am 95% certain that I will not commit a type I error (reject the hypothesis in error). (I cannot be 100% certain.)

The Testing Procedure (For a Mean)

Application

The Test Procedure
• Choosing z = 1.96 makes the probability of a Type I error 0.05.
• Choosing z = 2.576 would reduce the probability of a Type I error to 0.01.
• Reducing the probability of a Type I error reduces the power of the test because it reduces the probability that the null hypothesis will be rejected.

P Value
• Probability of observing evidence at least as extreme as the sample evidence, assuming the null hypothesis is true.
• Null hypothesis is rejected if P value < α

P value < α
P value = Prob[p(renters) < .74774]
        = Prob[z < (.74774 - .82055)/.00459]
        = Φ(-15.86)
        = .59946942854362260 * 10^-56 (Impossible)
α = .11333
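A tail probability this small is easiest to evaluate with the complementary error function, which keeps its precision far out in the tail. The sketch below uses the figures from the slides; the helper name is illustrative, and because z is not rounded to -15.86 the printed value will differ slightly from the slide in its leading digits.

```python
from math import erfc, sqrt


def normal_lower_tail(z: float) -> float:
    """Prob[Z < z] for a standard normal Z, computed via erfc."""
    return 0.5 * erfc(-z / sqrt(2.0))


p_observed = 0.74774
p_null = 0.82055
se_slide = 0.00459   # standard deviation as stated on the slides
alpha = 0.11333      # alpha level of the test as designed

z_obs = (p_observed - p_null) / se_slide
p_value = normal_lower_tail(z_obs)
reject_null = p_value < alpha

print(f"z = {z_obs:.2f}, P value = {p_value:.3e}, reject H0: {reject_null}")
```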

Confidence Intervals
• For a two sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338)

Confidence Interval
• If the sample mean is far from $30,000, reject the hypothesis.
• Choose the region.
[Figure: confidence interval in the center, with a rejection region on each side]
I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)

One Sided Tests
• H0: μ = 0, H1: μ ≠ 0. Rejection region is sample mean far from 0 in either direction.
• H0: μ = 0, H1: μ > 0. Sample means less than 0 cannot be in the rejection region. Entire rejection region is above 0.
• Reformulate: H0: μ ≤ 0, H1: μ > 0.

Likelihood Ratio Tests

Carrying Out the LR Test
• In most cases, the exact distribution of the statistic is unknown
• Use -2 log(likelihood ratio), which is approximately distributed as Chi squared[1]
• For a test about 1 parameter, the threshold value is 3.84 (5%) or 6.63 (1%)

Poisson Likelihood Ratio Test
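The worked example from this slide did not survive, so the following is only a generic sketch of a likelihood ratio test for a Poisson mean, H0: λ = λ0 against an unrestricted alternative. The counts and the null value are hypothetical. The statistic is twice the difference between the log likelihood at the sample mean (the unrestricted maximum) and at λ0; the log(x!) terms cancel.

```python
from math import log


def poisson_lr_statistic(counts, lam0: float) -> float:
    """-2 log(likelihood ratio) for H0: lambda = lam0 in a Poisson sample."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        # The MLE is 0, and only the -n*lambda terms remain.
        return 2.0 * n * lam0
    lam_hat = total / n   # unrestricted maximum likelihood estimate
    return 2.0 * (total * log(lam_hat / lam0) - n * (lam_hat - lam0))


visits = [0, 2, 1, 4, 3, 0, 1, 5, 2, 2]   # hypothetical counts
lam0 = 1.0                                # hypothetical null value

lr_stat = poisson_lr_statistic(visits, lam0)
print(f"LR statistic = {lr_stat:.4f}")
print("Reject H0 at 5%:", lr_stat > 3.84)
print("Reject H0 at 1%:", lr_stat > 6.63)
```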

Generalities About LR Test

Gamma Application

Specification Tests
• Generally a test about a distribution where the alternative is "some other distribution."
• Test is generally based on a feature of the distribution that is true under the null but not true under the alternative.

Poisson Specification Tests
• 3820 observations on doctor visits
• Poisson distribution?

Deviance Test
• Poisson Distribution: p(x) = exp(-λ) λ^x / x!
• H0: Everyone has the same Poisson distribution
• H1: Everyone has their own Poisson distribution
• Under H0, observations will tend to be near the mean. Under H1, there will be much more variation.
• Likelihood ratio statistic (Text, p. 348)

Deviance Test

Dispersion Test
• Poisson Distribution: p(x) = exp(-λ) λ^x / x!
• H0: The distribution is Poisson
• H1: The distribution is something else
• Under H0, the mean will be (almost) the same as the variance
• Approximate likelihood ratio statistic (Text, p. 348) = N * Variance / Mean
• For the doctor visit data, this is 22,348.6 vs. chi squared with 1 degree of freedom. H0 is rejected.
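The dispersion statistic N * Variance / Mean is simple to compute. In this sketch the counts are hypothetical (the doctor visit data are not reproduced in the slides), and the variance uses divisor N, which is an assumption since the slide does not say which divisor is intended.

```python
def dispersion_statistic(counts) -> float:
    """N * Variance / Mean, with the variance computed using divisor N."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / n
    return n * variance / mean


# Hypothetical, visibly overdispersed counts (many zeros, a few large values).
visits = [0, 0, 1, 0, 7, 2, 0, 12, 1, 0, 3, 0]

stat = dispersion_statistic(visits)
print(f"Dispersion statistic = {stat:.2f}")
# Compared, as on the slide, with the 5% chi squared[1] critical value.
print("Exceeds 3.84:", stat > 3.84)
```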

Specification Test - Normality
• Normal distribution is symmetric and has kurtosis = 3.
• Compare observed 3rd and 4th moments to what would be expected from a normal distribution.

Symmetric and Skewed Distributions

Kurtosis: t[5] vs. Normal
• Kurtosis of normal(0,1) = 3
• Kurtosis of t[k] = 3 + 6/(k-4); for t[5] = 3 + 6/(5-4) = 9.

Bowman and Shenton Test for Normality
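The formula on this slide was lost in extraction. A commonly used form of the statistic (also known as the Jarque-Bera statistic) is N[skewness²/6 + (kurtosis - 3)²/24], compared with a chi squared distribution with 2 degrees of freedom. The sketch below assumes that form and uses moments with divisor N; the sample is hypothetical and far too small for the asymptotic distribution to be reliable.

```python
def bowman_shenton(sample):
    """Return (skewness, kurtosis, statistic) using moments with divisor N."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2**1.5
    kurtosis = m4 / m2**2
    statistic = n * (skewness**2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)
    return skewness, kurtosis, statistic


data = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical, symmetric sample
skew, kurt, bs_stat = bowman_shenton(data)
print(f"skewness = {skew:.4f}, kurtosis = {kurt:.4f}, statistic = {bs_stat:.4f}")
# 5.99 is the 5% critical value for chi squared with 2 degrees of freedom.
print("Reject normality at 5%:", bs_stat > 5.99)
```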

Testing for a Distribution
• H0: The distribution is assumed
• H1: The assumed distribution is incorrect
• Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
  - Continuous: CDF of data resembles CDF of the assumed distribution
  - Discrete: Sample cell probabilities resemble predictions from the assumed distribution

Probability Plot for Normality

Normal (log)Income?

Random Sample from Normal

Normality Tests

Kolmogorov - Smirnov Test

Chi Squared Test for a Discrete Distribution
• Outcomes = A1, A2, …, AM
• Predicted probabilities based on a theoretical distribution = E1(θ), E2(θ), …, EM(θ).
• Sample cell frequencies = O1, …, OM

Test Statistics

V2 Rocket Hits
Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp. 99-101.
• 576 areas of 0.25 km² each in South London, arranged in a grid (24 by 24)
• 535 rockets were fired randomly into the grid = N
• P(a rocket hits a particular grid area) = 1/576 = 0.001736 = θ
• Expected number of rocket hits in a particular area = 535/576 = 0.92882
• How many rockets will hit any particular area? 0, 1, 2, … could be anything up to 535.
• The 0.9288 is the λ for a Poisson distribution.

[Figures: 13 by 13 grid of squares, rows and columns numbered 1 to 13, showing simulated hits]

Poisson Process
• θ = 1/169
• N = 144
• λ = 144 * 1/169 = 0.852
• True Probabilities:
  - P(X=0) = .4266
  - P(X=1) = .3634
  - P(X=2) = .1548
  - P(X=3) = .0437
  - P(X=4) = .0094
  - P(X>4) = .0021

Interpreting The Process
• λ = 0.852
• Probabilities:
  - P(X=0) = .4266
  - P(X=1) = .3634
  - P(X=2) = .1548
  - P(X=3) = .0437
  - P(X=4) = .0094
  - P(X>4) = .0021
• There are 169 squares
• There are 144 "trials"
• Expect .4266*169 = 72.1 to have 0 hits/square
• Expect .3634*169 = 61.4 to have 1 hit/square
• Etc.
• Expect the average number of hits/square to = .852.

Does the Theory Work?

           Theoretical Outcomes          Sample Outcomes
Outcome    Probability   Number of       Sample        Number of
                         Cells           Proportion    Cells
0          .4266         72              .4733         80
1          .3634         61              .2899         49
2          .1548         26              .1539         26
3          .0437          7              .0769         13
4          .0094          2              .0059          1
>4         .0021          1              .0000          0

Theoretical number of cells = 169*Prob(Outcome). Sample number of cells = observed frequencies.

Chi Squared for the Bombing Run
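The calculation on this slide was lost, so the following is a sketch of how a chi squared goodness of fit statistic could be computed from the observed frequencies on the preceding slide. It assumes the usual form, the sum over cells of (observed - expected)²/expected, with Poisson probabilities evaluated at λ = 144/169. Probabilities computed directly from the formula may differ from the rounded slide values in the third or fourth decimal.

```python
from math import exp, factorial

lam = 144 / 169                       # hits per square
observed = [80, 49, 26, 13, 1, 0]     # squares with 0, 1, 2, 3, 4, >4 hits
n_cells = sum(observed)

# Poisson probabilities for 0..4 hits, then the remaining tail for >4.
probs = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
probs.append(1.0 - sum(probs))

expected = [n_cells * p for p in probs]
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for label, o, e in zip(["0", "1", "2", "3", "4", ">4"], observed, expected):
    print(f"{label:>2} hits: observed {o:3d}, expected {e:6.2f}")
print(f"Chi squared = {chi_squared:.3f}")
```

Which critical value applies depends on the degrees of freedom chosen. With 6 cells, one common choice is 5 degrees of freedom (11.07 at the 5% level), and fewer if λ is treated as estimated or if sparse cells are combined.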


Difference in Means of Two Populations
• Two Independent Normal Populations
  - Common known variance
  - Common unknown variance
  - Different variances
  - One and two sided tests
• Paired Samples
  - Means of paired observations
  - Treatments and Controls: Diff-in-Diff SAT
• Nonparametric: Mann/Whitney
• Two Bernoulli Populations

Comparing Two Normal Populations

Unknown Common Variance

Household Incomes, Equal Variances

--------------------------------------------
t test of equal means   INCOME by MARRIED
--------------------------------------------
MARRIED = 0   Nx =  817
MARRIED = 1   Ny = 3057
t [ 3872] = 3.7238      P value = .0002
--------------------------------------------
INCOME        Mean     Std.Dev.  Std.Error
--------------------------------------------
MARRIED = 0   .27982   .12939    .00453
MARRIED = 1   .30145   .15194    .00275
--------------------------------------------
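The reported t statistic can be reproduced from the summary statistics in the table. This sketch assumes the standard pooled variance formula for two independent samples with a common unknown variance; the function name is illustrative.

```python
from math import sqrt


def pooled_t(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """t statistic and degrees of freedom for H0: equal means, common variance."""
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * sd_x**2 + (n_y - 1) * sd_y**2) / df
    std_error = sqrt(pooled_var * (1.0 / n_x + 1.0 / n_y))
    return (mean_y - mean_x) / std_error, df


# Summary statistics from the table (x: MARRIED = 0, y: MARRIED = 1).
t_stat, df = pooled_t(0.27982, 0.12939, 817, 0.30145, 0.15194, 3057)
print(f"t[{df}] = {t_stat:.4f}")
```

Because the inputs are rounded to five decimals, the result agrees with the table only to about three decimal places.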

Unknown Different Variances

2 Proportions
• Two Bernoulli Populations:
  - Xi ~ Bernoulli with Prob(xi=1) = θx
  - Yi ~ Bernoulli with Prob(yi=1) = θy
• H0: θx = θy
• The sample proportions are px = (1/Nx) Σi xi and py = (1/Ny) Σi yi
• Sample variances are px(1-px) and py(1-py).
• Use the Central Limit Theorem to form the test statistic.

z Test for Equality of Proportions
Application: Take up of public health insurance.

--------------------------------------------
t test of equal means   PUBLIC by FEMALE
--------------------------------------------
FEMALE = 0    Nx = 1812
FEMALE = 1    Ny = 1565
t [ 3375] = 5.8627      P value = .0000
--------------------------------------------
PUBLIC        Mean     Std.Dev.  Std.Error
--------------------------------------------
FEMALE = 0    .84713   .35996    .00846
FEMALE = 1    .91310   .28178    .00712
--------------------------------------------
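A z statistic for the two proportions can be formed from the sample proportions and the Bernoulli variances p(1-p). This sketch uses separate (unpooled) variances for the two groups, which is one reasonable reading of the preceding slide and is an assumption here. The printed output above is a pooled t test, so the two statistics are close but need not match.

```python
from math import sqrt


def two_proportion_z(p_x: float, n_x: int, p_y: float, n_y: int) -> float:
    """z statistic for H0: equal proportions, using unpooled Bernoulli variances."""
    var_of_difference = p_x * (1.0 - p_x) / n_x + p_y * (1.0 - p_y) / n_y
    return (p_y - p_x) / sqrt(var_of_difference)


# Sample proportions and sizes from the table (x: FEMALE = 0, y: FEMALE = 1).
z_stat = two_proportion_z(0.84713, 1812, 0.91310, 1565)
print(f"z = {z_stat:.4f}")
print("Reject H0 at the 5% level (two sided):", abs(z_stat) > 1.96)
```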

Paired Sample t and z Test
• Observations are pairs (Xi, Yi), i = 1, …, N
• Hypothesis: μx = μy. Both normal distributions. May be correlated.
  - Medical Trials: Smoking vs. Nonsmoking (separate individuals, probably independent)
  - SAT repeat tests, before and after. (Definitely correlated)
• Test is based on Di = Xi - Yi. Same as earlier with H0: μD = 0.

Treatment Effects
• SAT Do Overs
  - Experiment: X1, X2, …, XN = first SAT score, Y1, Y2, …, YN = second
  - Treatment: T1, …, TN = whether or not the student took a Kaplan (or similar) prep course
  - Hypothesis: μy > μx.
• Placebo: In medical trials, N1 subjects receive a drug (treatment), N2 receive a placebo.
  - Hypothesis: Effect is greater in the treatment group than in the control (placebo) group.

Measuring Treatment Effects

Treatment Effects in Clinical Trials
• Does Phenogyrabluthefentanoel (Zorgrab) work?
• Investigate: Carry out a clinical trial.
  - N+0 = "The placebo effect"
  - N+T - N+0 = "The treatment effect"
  - The hypothesis is that the difference in differences has mean zero.

                   Placebo   Drug Treatment
No Effect          N00       N0T
Positive Effect    N+0       N+T

A Test of Independence
• In the credit card example, are Own/Rent and Accept/Reject independent?
• Hypothesis: Ownership and Acceptance are independent
• Formal hypothesis, based only on the laws of probability: Prob(Own, Accept) = Prob(Own)Prob(Accept) (and likewise for the other three possibilities).
• Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.

Contingency Table Analysis

The Data: Frequencies
          Reject    Accept    Total
Rent       1,845     5,469     7,314
Own        1,100     5,030     6,130
Total      2,945    10,499    13,444

Step 1: Convert to Actual Proportions
          Reject    Accept    Total
Rent      0.13724   0.40680   0.54404
Own       0.08182   0.37414   0.45596
Total     0.21906   0.78094   1.00000

Independence Test
Step 2: Expected proportions assuming independence: If the factors are independent, then the joint proportions should equal the product of the marginal proportions.

[Rent, Reject]   0.54404 x 0.21906 = 0.11918
[Rent, Accept]   0.54404 x 0.78094 = 0.42486
[Own, Reject]    0.45596 x 0.21906 = 0.09988
[Own, Accept]    0.45596 x 0.78094 = 0.35608

Comparing Actual to Expected

When is the Chi Squared Large?
• Critical values from chi squared table
• Degrees of freedom = (R-1)(C-1).

Critical chi squared
D.F.     .05      .01
 1       3.84     6.63
 2       5.99     9.21
 3       7.81    11.34
 4       9.49    13.28
 5      11.07    15.09
 6      12.59    16.81
 7      14.07    18.48
 8      15.51    20.09
 9      16.92    21.67
10      18.31    23.21
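The two steps for the credit card data (actual proportions, then expected proportions under independence) lead to a chi squared statistic. The slide that compares actual to expected did not survive, so this sketch assumes the usual form: N times the sum over cells of (actual - expected)²/expected, computed on proportions. The dictionary layout and names are illustrative.

```python
# Frequencies from the credit card table.
counts = {
    ("Rent", "Reject"): 1845, ("Rent", "Accept"): 5469,
    ("Own", "Reject"): 1100, ("Own", "Accept"): 5030,
}
rows, cols = ("Rent", "Own"), ("Reject", "Accept")
n = sum(counts.values())

# Step 1: marginal proportions.
row_share = {r: sum(counts[r, c] for c in cols) / n for r in rows}
col_share = {c: sum(counts[r, c] for r in rows) / n for c in cols}

# Step 2: expected proportions under independence, then the statistic.
chi_squared = 0.0
for r in rows:
    for c in cols:
        actual = counts[r, c] / n
        expected = row_share[r] * col_share[c]
        chi_squared += n * (actual - expected) ** 2 / expected

df = (len(rows) - 1) * (len(cols) - 1)
print(f"Chi squared[{df}] = {chi_squared:.2f} vs. 5% critical value 3.84")
```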

Analyzing Default
• Do renters default more often (at a different rate) than owners?
• To investigate, we study the cardholders (only)

                    DEFAULT
OWNRENT        0         1       All
0           4854       615      5469
           46.23      5.86     52.09
1           4649       381      5030
           44.28      3.63     47.91
All         9503       996     10499
           90.51      9.49    100.00

(Second line of each row: percent of all cardholders.)

Hypothesis Test

Multiple Choices: Travel Mode
• 210 travelers between Sydney and Melbourne
• 4 available modes: air, train, bus, car
• Among the observed variables is income.
• Does income help to explain mode choice?
• Hypothesis: Mode choice and income are independent.

Travel Mode Choices

Travel Mode Choices and Income

+--------+------------------------------------------++----------+
|                      Travel MODE Data                         |
+--------+------------------------------------------++----------+
| INCOME |    AIR      TRAIN       BUS       CAR    ||   Total  |
+--------+------------------------------------------++----------+
| LOW    |     10        36          9         8    ||     63   |
|        | 0.04761   0.17143   0.04286   0.03810    || 0.30000  |
+--------+------------------------------------------++----------+
| MEDIUM |     19        20         13        24    ||     76   |
|        | 0.09048   0.09524   0.06190   0.11429    || 0.36190  |
+--------+------------------------------------------++----------+
| HIGH   |     29         7          8        27    ||     71   |
|        | 0.13810   0.03333   0.03810   0.12857    || 0.33810  |
+========+==========================================++==========+
| Total  |     58        63         30        59    ||    210   |
|        | 0.27619   0.30000   0.14286   0.28095    || 1.00000  |
+--------+------------------------------------------++----------+

Contingency Table

Each cell shows the count, the actual proportion, and the expected proportion under independence.

+--------+------------------------------------------++----------+
|                      Travel MODE Data                         |
+--------+------------------------------------------++----------+
| INCOME |    AIR      TRAIN       BUS       CAR    ||   Total  |
+--------+------------------------------------------++----------+
|        |     10        36          9         8    ||     63   |
| LOW    | 0.04761   0.17143   0.04286   0.03810    || 0.30000  |
|        | 0.08286   0.09000   0.04286   0.08429    ||          |
+--------+------------------------------------------++----------+
|        |     19        20         13        24    ||     76   |
| MEDIUM | 0.09048   0.09524   0.06190   0.11429    || 0.36190  |
|        | 0.09995   0.10857   0.05170   0.10168    ||          |
+--------+------------------------------------------++----------+
|        |     29         7          8        27    ||     71   |
| HIGH   | 0.13810   0.03333   0.03810   0.12857    || 0.33810  |
|        | 0.09338   0.10143   0.04830   0.09499    ||          |
+========+==========================================++==========+
| Total  |     58        63         30        59    ||    210   |
|        | 0.27619   0.30000   0.14286   0.28095    || 1.00000  |
+--------+------------------------------------------++----------+

Assuming independence, P(Income, Mode) = P(Income) x P(Mode).

Computing Chi Squared
For our transport mode problem, R = 3, C = 4, so DF = 2 x 3 = 6. The critical value is 12.59. The hypothesis of independence is rejected.
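The conclusion for the travel mode data can be checked from the counts in the contingency table. This is a sketch of the general R by C calculation, assuming the usual statistic: the sum over cells of (observed - expected)²/expected, with expected counts equal to row total times column total divided by N. The function name is illustrative.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: LOW, MEDIUM, HIGH income. Columns: AIR, TRAIN, BUS, CAR.
travel = [
    [10, 36, 9, 8],
    [19, 20, 13, 24],
    [29, 7, 8, 27],
]

chi_squared, df = chi_squared_independence(travel)
print(f"Chi squared[{df}] = {chi_squared:.2f}")
print("Reject independence at 5%:", chi_squared > 12.59)
```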