To understand God’s thoughts we must study statistics

“To understand God’s thoughts we must study statistics, for these are the measure of his purpose.”
Florence Nightingale, 1820-1910

CHAPTER 2 DESCRIBING CONTINGENCY TABLES

In this chapter we:
  • cover the construction of a contingency table
  • establish some terminology and notation
  • derive association measures
  • define marginal and partial contingency tables

Chapter 1: Levels of Measurement

                        X = MEDICATION
Y = CURED?          A      B      C      D      E
Yes                .1     .3     .09    .05    .07
No                 .2     .1     .01    .05    .03
Yes/(Yes + No)     .33    .75    .90    .50    .70

                        Y (column variable)
X (row variable)        1          2
1                                              Row margins
2
                        Column margins

Review questions

DISTRIBUTION             FIXED BY EXPERIMENTAL DESIGN
Poisson                  Nothing
Binomial/multinomial     n
Product multinomial      Row margins or column margins
Hypergeometric           Row margins and column margins

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

Real-data examples will now be given for each sampling design.

1. POISSON

Binding of antiphospholipid antibodies to trophoblasts and effect of neurologic diseases on pregnancy loss (data from a Wright State University (WSU) microbiology graduate student).

                        BRAIN INDEX
PLACENTAL BINDING       Low     High
Low                     39      9
High                    4       10

In this case, the total sample size is n = 62; it was NOT fixed by the experimenter. In fact, the experimenter decided to continue sampling pregnant women until she ran out of time and had to stop in order to carry out her analyses and write her dissertation. Therefore, n is a random variable.

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

2. MULTINOMIAL

This sampling design is also referred to as a “naturalistic” design or a “cross-sectional” design.

Jury decisions: jury outcome versus restriction of decision alternatives*.

                    DECISION ALTERNATIVES
JURY DECISION       1       2       3
First-degree        11      9       2
Second-degree       20      33      15
Manslaughter        22      29      5
Not Guilty          19      1       2

In this case, the total sample size n = 168 was fixed by the experimenter, perhaps because a preliminary sample size analysis revealed that n = 168 was needed to ensure reliable statistical results. Therefore, n is a fixed constant.

*Vidmar, N., 1972: J. Personality and Social Psych. 22, 211-218.

3. PRODUCT MULTINOMIAL

                        Y = CAUGHT A COLD?
X = TREATMENT GROUP     Yes          No      TOTAL
Placebo                 31 (.22)     109     140
Ascorbic Acid           17 (.12)     122     139

Row totals (140, 139): fixed by experimenter.
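The proportions shown in parentheses are computed within each row, because the row totals are the ones fixed by design. The arithmetic can be sketched as follows. This is a minimal illustration in Python, not the SAS used later in these slides; the variable and function names are hypothetical, and only the four counts come from the table.

```python
# Counts (Yes, No) from the cold table; the row totals were fixed by design,
# so proportions are computed within each row.
cold_counts = {"Placebo": (31, 109), "Ascorbic Acid": (17, 122)}


def proportion_yes(yes, no):
    """Sample proportion of 'Yes' within one treatment group."""
    return yes / (yes + no)


row_proportions = {
    group: proportion_yes(*cells) for group, cells in cold_counts.items()
}

for group, p in row_proportions.items():
    print(f"{group}: {p:.2f}")
```

Rounded to two decimals, these reproduce the .22 and .12 shown in the table.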

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

Linus Pauling, 1901-1994. Nobel prizes in chemistry and peace. Named the 16th greatest scientist in history by New Scientist.

                        MYOCARDIAL INFARCTION (MI)
ORAL CONTRACEPTIVE      Yes          No
Used                    23 (.40)     34 (.20)
Never used              35           132
TOTAL                   58           166

Column totals (58, 166): fixed by experimenter.

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

4. HYPERGEOMETRIC

Consider the ability to predict the order of milk or tea being poured first when preparing tea (Fisher, R. A., 1935: The Design of Experiments, Oliver & Boyd).

Actual incident (“tea tasting experiment”): regarding the making of a cup of tea, R. A. Fisher’s assistant, Dr. Muriel Bristol, claimed that, without seeing the preparation of the tea, she knew whether the milk had been added before or after the hot tea was poured (she preferred the milk poured first). Fisher doubted this assertion and wished to test it.

Note that in the table below, both the row and column marginal totals are fixed by design (Dr. Bristol is told that four cups of each type are prepared).

                          POURED FIRST--GUESS
POURED FIRST--ACTUAL      Milk     Tea      TOTAL
Milk                      3        1        4
Tea                       1        3        4
TOTAL                     4        4

Row and column totals: fixed by experimenter.
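With both margins fixed, the count in the Milk/Milk cell follows a hypergeometric distribution under pure guessing. The sketch below, in Python rather than the SAS used later in these slides, computes the probability of that cell being 3 or more. The slide itself does not carry out this calculation, so the function name and the choice of a one-sided tail are assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k, n_milk=4, n_tea=4, n_guess_milk=4):
    """P(exactly k of the milk-first cups are guessed 'milk') under pure guessing,
    with row and column totals fixed as in the tea-tasting table."""
    total = n_milk + n_tea
    return comb(n_milk, k) * comb(n_tea, n_guess_milk - k) / comb(total, n_guess_milk)


# One-sided tail: 3 or 4 milk-first cups correctly identified.
p_value = sum(hypergeom_pmf(k) for k in range(3, 5))
print(f"P(at least 3 correct) = {p_value:.4f}")
```

The tail probability is (16 + 1)/70 = 17/70, so three correct out of four is not strong evidence against guessing.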

Kempthorne wrote, with Klaus Hinkelmann, Design and Analysis of Experiments, Volumes 1 and 2.

Fisher was Kempthorne’s advisor.

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS – CAUTION

Contingency tables generated by a product-multinomial sampling scheme must be handled carefully. Only the variable having fixed margins may be conditioned on, even if that variable is not the predictor variable (which is often the case in retrospective or rare event studies).

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS – CAUTION (continued)

Consider a study investigating the relationship between GENDER and Amyotrophic Lateral Sclerosis (ALS, Lou Gehrig’s disease). ALS is so rare that a HUGE sample of the population would need to be sampled just to obtain one or two cases of ALS. Because of this “rare event” problem, a more practical strategy is to obtain a listing of all ALS patients and randomly sample 100 subjects from this listing and then randomly sample 100 subjects from the non-ALS population. Suppose the following contingency table results (artificial data):

                ALS
GENDER          Yes      No       TOTAL
Male            40       50       90
Female          60       50       110
TOTAL           100      100      n = 200

For these data, the conditional probabilities P[GENDER=Male|ALS], for ALS = Yes and No, are valid and interpretable since the ALS margins are fixed. But P[ALS=Yes|GENDER], for GENDER = Male and Female, are NOT valid because the GENDER margins are NOT fixed.
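One way to see why only the fixed margin may be conditioned on is to change the design and watch which summaries move. The sketch below is in Python rather than the SAS used later in these slides. The second table is a hypothetical redesign (ten times as many non-ALS subjects with the same gender split), not data from the source, and the function name is an assumption.

```python
def summarize(table):
    """table maps gender -> (ALS yes, ALS no); returns three summaries."""
    male_yes, male_no = table["Male"]
    female_yes, female_no = table["Female"]
    # Conditioning on the fixed ALS margin (valid under this design).
    p_male_given_yes = male_yes / (male_yes + female_yes)
    p_male_given_no = male_no / (male_no + female_no)
    # Conditioning on the unfixed GENDER margin (not valid under this design).
    naive_p_yes_given_male = male_yes / (male_yes + male_no)
    return p_male_given_yes, p_male_given_no, naive_p_yes_given_male


observed = {"Male": (40, 50), "Female": (60, 50)}
# Hypothetical redesign: 1000 non-ALS subjects instead of 100, same gender split.
more_controls = {"Male": (40, 500), "Female": (60, 500)}

print(summarize(observed))
print(summarize(more_controls))
```

The two column-conditional proportions stay at 0.4 and 0.5 in both designs, while the naive row proportion changes with the arbitrary number of non-ALS subjects sampled, so it does not estimate P[ALS=Yes|GENDER=Male].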

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

The three sampling designs we’ve introduced, Poisson, multinomial, and product-multinomial, are important because the statistical methods developed in this course are only applicable to contingency tables generated by these designs. Therefore, it will be important to recognize when a contingency table is NOT generated from one of these distributions. We look at a few examples of such tables. See if you can tell why the given contingency table is not generated by any of our three sampling designs.

1. Consider a random sample of 500 voters who respond to the statement “the President is doing a good job” at two different time points: (a) before a major speech, and again (b) after the speech. The data are given below.

                    AFTER SPEECH
BEFORE SPEECH       Yes      No
Yes                 425      70
No                  150      355

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

2. Consider the contingency table below presenting data on the piano choice of soloists scheduled during the 1973-74 concert season for selected major American orchestras. The experimental unit (EU) in this study is the soloist.

                            PIANO CHOICE
Orchestra                   Steinway     Other     TOTAL
Boston Symphony             4            2         6
Chicago                     13           1         14
Cleveland                   11           2         13
Minnesota                   2            2         4
New York Philharmonic       9            2         11
Philadelphia                6            0         6
TOTAL                       45           9         n = 54

TWO-WAY CONTINGENCY TABLE – SAMPLING DESIGNS

3. One hundred twenty-two paramedic trainees were assigned to four different training groups for field endotracheal intubation technique*. For several weeks after the end of the course the trainees were observed by a supervisor whenever they performed an intubation as part of their work with the ambulance service. The success or failure of the intubation attempt was recorded.

                    SUCCESS
TRAINING GROUP      Yes      No       TOTAL
I                   290      24       314
II                  211      30       241
III                 95       19       114
IV                  93       28       121
TOTAL               689      101      790

NOTE: this contingency table was analyzed incorrectly because the statistical methodology used for analysis was based on data generated by one of the three principal sampling designs (from which these data were not generated). Therefore, unreliable results were presented in the resulting publication. For a personal story behind this example, see the next two slides.

*Stewart, R. D., et al., 1984: Effect of varied training techniques on field endotracheal intubation success rates, Annals of Emergency Medicine 13, 1032-1036.

Concerning the “field endotracheal intubation technique” example:

I served on the Institutional Animal Care and Use Committee (IACUC) at Wright State University (WSU) in Dayton, Ohio in the early 2000s. This committee reviewed all protocols involving animal use for research or teaching on campus. As the “resident statistician” I was responsible for checking that the proposed sample sizes, study design, execution of the experimental procedure, data collection, and statistical analysis plan were appropriate for all such protocols.

A professor from the Boonshoft School of Medicine at WSU submitted a teaching protocol proposing to use live cats in the training of intubation technique rather than non-animal models such as mannequins or video presentations. The protocol was highly sensitive and it was certain that several animal rights activists would be attending the meeting (they are allowed to attend but not to speak).

I approached the University veterinarian (a permanent member of the IACUC) and asked him if there was some literature that I could read that would address the learning effectiveness of different intubation teaching techniques, especially live animal versus non-animal teaching models. He was able to provide all IACUC members with the article entitled “Effect of varied training techniques on field endotracheal intubation success rates”. The article compared the intubation success rate between live animal and non-live animal teaching models. It seemed like the perfect article to answer the question: “Do live animal teaching models lead to statistically and practically higher intubation success rates than non-live animal teaching models?”

Concerning the “field endotracheal intubation technique” example, continued:

As I eagerly read through the article I realized that the authors, in multiple instances, applied the standard chi-squared test to contingency tables that were not generated from any of the principal sampling designs (Poisson, multinomial, product-multinomial). Thus, their claims of statistical significance were unreliable. It was a great disappointment. I had to attend the meeting without knowing, based on valid scientific data, whether intubation teaching methods using live animals were superior to teaching methods that did not use live animals. At the meeting, I told my fellow committee members that I was generally disregarding the results of the article because of the incorrect statistical analysis of the data and, therefore, unreliable conclusions.

As it happens, the IACUC voted to approve the professor’s protocol and allowed him to use live cats in intubation training. One of the compelling arguments leading to the decision was given by a medical faculty member of the IACUC who said, “if I was lying in the roadway after an accident unable to breathe and two medical trainees were rushing toward me to intubate me and one had received training using a live animal while the other had trained on a mannequin or only received lectures and video presentations, I would choose the former to intubate me”.

Here is an example where statistical science, had it been applied correctly, could have answered a very important question, thereby helping to resolve a truly important issue.

--HJK

[2 x 2 table layout repeated on four slides; the accompanying formulas were not recoverable]

                Y, column
X, row          Success     Failure
1
2

TWO-WAY CONTINGENCY TABLE – ASSOCIATION

3. ODDS RATIO, OR (continued)

The first strong proponent of the OR as a measure of association was G. Udny Yule in the early 1900s. Loglinear models (LLMs) use ORs as “building blocks”. For this reason, Yule is regarded as the grandfather of the LLM approach. And because ORs are so important in the study of LLMs, we focus on them as a preferred measure of association in contingency tables.

G. Udny Yule, 1871-1951

                HEART ATTACK
GROUP           Yes      No          TOTAL
Placebo         189      10,845      11,034
Aspirin         104      10,933      11,037
TOTAL           293      21,778
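For a table like this one, the relative risk and the odds ratio can be computed directly from the four cell counts. The sketch below is in Python rather than the SAS used later in these slides. It uses the standard definitions (risk = Yes/(Yes + No), odds = Yes/No), and the function and variable names are hypothetical.

```python
# Counts (Yes, No) from the heart attack table.
heart_attack = {"Placebo": (189, 10845), "Aspirin": (104, 10933)}


def risk(yes, no):
    """Proportion of 'Yes' within a group."""
    return yes / (yes + no)


def odds(yes, no):
    """Odds of 'Yes' within a group."""
    return yes / no


relative_risk = risk(*heart_attack["Placebo"]) / risk(*heart_attack["Aspirin"])
odds_ratio = odds(*heart_attack["Placebo"]) / odds(*heart_attack["Aspirin"])

print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

Because heart attacks are rare in both groups, the odds ratio and the relative risk come out close to each other.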

                        RECEIVED DEATH PENALTY?
DEFENDANT’S RACE        Yes      No       TOTAL     % Yes
White                   53       430      483       11.0
Black                   15       176      191       7.9

PARTIAL AND MARGINAL CONTINGENCY TABLES

The table below is called a marginal table; it was formed by “collapsing” (or summing) over the two tables for the two Victim’s Races.

                        RECEIVED DEATH PENALTY?
DEFENDANT’S RACE        Yes      No       TOTAL     % Yes
White                   53       430      483       11.0
Black                   15       176      191       7.9

The two tables below, one for each Victim’s Race, are called partial tables.

White Victims
                        RECEIVED DEATH PENALTY
DEFENDANT’S RACE        Yes      No       % Yes
-----------------------------------------------
White                   53       414      11.3
Black                   11       37       22.9

Black Victims
                        RECEIVED DEATH PENALTY
DEFENDANT’S RACE        Yes      No       % Yes
-----------------------------------------------
White                   0        16       0.0
Black                   4        139      2.8

PARTIAL AND MARGINAL CONTINGENCY TABLES

SUMMARY OF RESULTS:

Marginal table of DEFENDANT’S RACE x RECEIVED DEATH PENALTY:
  θ = 1.45 => the odds of Whites receiving the death penalty are 45% higher than for Blacks

Partial tables of DEFENDANT’S RACE x RECEIVED DEATH PENALTY:
  White Victims: θ = 0.43 => the odds of Whites receiving the death penalty are 57% lower than for Blacks
  Black Victims: θ = 0 => the odds of Whites receiving the death penalty are 100% lower than for Blacks

NOTE: MARGINAL ASSOCIATION IS NOT THE SAME AS PARTIAL ASSOCIATION.
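The three odds ratios in this summary can be reproduced from the cell counts of the two partial tables. The sketch below is in Python rather than the SAS used later in these slides. The function name and the cell ordering (White Yes, White No, Black Yes, Black No) are choices made for this illustration, and the marginal table is obtained by summing the partial tables cell by cell.

```python
def odds_ratio(n11, n12, n21, n22):
    """Sample odds ratio (n11 * n22) / (n12 * n21) for a 2 x 2 table."""
    return (n11 * n22) / (n12 * n21)


# Cells: (White Yes, White No, Black Yes, Black No), one tuple per victim's race.
partial = {
    "White victims": (53, 414, 11, 37),
    "Black victims": (0, 16, 4, 139),
}

# Collapse over victim's race by summing corresponding cells.
marginal = tuple(sum(cells) for cells in zip(*partial.values()))

print("marginal:", marginal, round(odds_ratio(*marginal), 2))
for name, cells in partial.items():
    print(name + ":", round(odds_ratio(*cells), 2))
```

The marginal odds ratio is above 1 while both partial odds ratios are below 1, which is the reversal stated in the NOTE.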

[Figure residue: marginal table of X by Y]

Marginal table: TREATMENT (A, B) x RESPONSE (Success, Failure), θ = 2.0

The effect of transforming one variable

MEASURES OF ASSOCIATION

There are many different summary measures of association beyond what has been presented here. Each is designed to address specific kinds of categorical variables, sampling designs, and research questions. The following chart provides a brief summary of such measures of association. For definitions of the notation appearing in the chart and further information, see:

  • Goodman, L. A. and Kruskal, W. H., 1979. Measures of Association for Cross Classifications. Springer-Verlag, New York.
  • Khamis, H. J., 1998. Measures of Association, Encyclopedia of Biostatistics, Eds. Peter Armitage and Theodore Colton, John Wiley & Sons.

[Flowchart: Association Between Two Categorical Variables. Decision nodes: Measuring association? Measuring agreement? Both factors nominal? Both factors ordinal? One factor nominal and one factor ordinal? (Yes: Ridit Analysis). The remaining branches of the chart were not recoverable.]

Introduction to SAS

SAS = Statistical Analysis System

SAS is used for, among many other things, organizing, describing, and analyzing data. We begin by writing and editing a SAS program, which consists of two steps: the data step and the proc step (“proc” = procedure).

  • Data step: create, name, organize, and manipulate a set of data.
  • Proc step: carry out analyses and procedures on the data.

Introduction to SAS

Let’s create a small data set.

DATA STEP:

    data cholesterol;
      input id sex $ chol wt ht;
      datalines;
    1 m 230 240 69
    2 f 210 120 68
    3 f 240 170 .
    4 m 215 235 71
    ;
    run;

Notes:
  • “$” indicates that the variable “sex” takes character values rather than numerical values (i.e., alphanumeric data).
  • Use a period, “.”, to designate a missing value.
  • A semicolon must appear (i) at the end of every SAS statement and (ii) on the line after the last line of data.

SAS keywords (shown in blue on the original slide) are words recognized by SAS and which ask SAS to do something.

SAS names (shown in red on the original slide) are names that you get to choose for your data set and variables, subject to conditions:
  • begins with a letter (or an underscore),
  • contains no special characters such as “*”, “#”, or “-” (the underscore “_” is ok to use),
  • contains no blank spaces.

Introduction to SAS

New variables can be created between the “input” statement and the “datalines” statement:

    data cholesterol;
      input id sex $ chol wt ht;
      /* 703 converts lb/in**2 to kg/m**2; assumes wt is in pounds and ht in inches */
      bmi = 703*wt/ht**2;
      datalines;
    1 m 230 240 69
    2 f 210 120 68
    3 f 240 170 .
    4 m 215 235 71
    ;
    run;

The new variable “bmi” is created.

Introduction to SAS

Use the DATA STEP to manipulate an existing data set.

    data cholesterol2;
      set cholesterol;
      /* assumes wt is in pounds and ht in inches */
      bmi = 703*wt/ht**2;
      if bmi gt 29 then alert=1;
      else alert=0;
    run;

Two new variables are created in this data step: bmi and alert.

Introduction to SAS

PROC STEP

The proc step enables us to analyze the data. It comes after the “run;” statement at the end of the data step. There are many SAS procedures; three commonly used procedures are the “print”, “means”, and “freq” procedures.

    proc print data=cholesterol2;
      var id bmi alert;
    run;

    proc means data=cholesterol2;
      var chol wt ht bmi;
    run;

    proc freq data=cholesterol2;
      table sex alert;
    run;

Introduction to SAS

Data step:

    data cholesterol;
      input id sex $ chol wt ht;
      datalines;
    1 m 230 240 69
    2 f 210 120 68
    3 f 240 170 .
    4 m 215 235 71
    ;
    run;

    data cholesterol2;
      set cholesterol;
      /* assumes wt is in pounds and ht in inches */
      bmi = 703*wt/ht**2;
    run;

    data chol_nonmiss;
      set cholesterol2;
      if bmi ne .;    /* keep only observations with non-missing bmi */
      if bmi gt 29 then alert=1;
      else alert=0;
    run;

Proc step:

    /* alert is created in chol_nonmiss, so the procedures read that data set */
    proc print data=chol_nonmiss;
      var id bmi alert;
    run;

    proc means data=chol_nonmiss;
      var chol wt ht bmi;
    run;

Introduction to SAS

We will use the Statistical Analysis System (SAS) to analyze data.

How data are entered into a SAS program depends on the form of the data. There are two options.

1. RAW DATA

The values of each variable are given for each EU. For example:

    ID        1     2     3     4     5    ...
    GENDER    M     M     F     F     M    ...
    PASSED    Yes   Yes   No    No    ...

SAS program:

    data exam;
      input id gender $ passed $;
      datalines;
    1 m yes
    2 m yes
    3 f yes
    ...
    ;
    run;

    proc freq;
      table gender*passed;
    run;

                PASSED    (these levels change fastest)
GENDER          Yes      No
M               25       16
F               32       10
(the GENDER levels change slowest)

Notice the order in which the frequencies are written: with columns (“PASSED”) changing fastest. Notice that for “PASSED”, 1 = “Yes” and 2 = “No”. For “GENDER”, 1 = “M” and 2 = “F”.
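The ordering rule can be checked with a short sketch, written in Python rather than SAS purely for illustration. It walks the table with the row index in the outer loop and the column index in the inner loop, as nested do-loops would, and writes out the frequencies in that order. The variable names are hypothetical.

```python
# Rows (GENDER) vary slowest; columns (PASSED) vary fastest.
gender_levels = ["M", "F"]        # row codes 1, 2
passed_levels = ["Yes", "No"]     # column codes 1, 2
counts = {("M", "Yes"): 25, ("M", "No"): 16, ("F", "Yes"): 32, ("F", "No"): 10}

records = []
for row, gender in enumerate(gender_levels, start=1):
    for col, passed in enumerate(passed_levels, start=1):
        records.append((row, col, counts[(gender, passed)]))

# The frequencies in the order they would be typed on one data line.
data_line = " ".join(str(count) for _, _, count in records)
print(data_line)
```

Each record carries the numeric row and column codes together with its count, matching the coding 1 = “Yes”, 2 = “No” and 1 = “M”, 2 = “F” described above.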

MEASURES OF ASSOCIATION

To access measures of association in SAS, use the following language.

    data assoc;
      do row = 1 to 4;
        do col = 1 to 4;
          input count @@;
          output;
        end;    /* closes the col loop */
      end;      /* closes the row loop */
      datalines;
    12 43 26 ...
    ;

    proc freq;
      weight count;
      table row*col / all;
    run;

“all” provides a wide range of tests and coefficients for the two-way table.

Introduction to SAS

We illustrate with the Linus Pauling French Skier data.

    *--------------------------------------------------
     These data come from the Linus Pauling Cold Study of French skiers
     (Proc. National Academy of Sciences, 68, 2678-2681, 1971).
     This is a purposive cohort study (fixed number of subjects in each
     of the two treatment groups, Vitamin C and Placebo).
    *--------------------------------------------------;

    proc format;
      value rowfmt 1='placebo' 2='vitamin C';
      value colfmt 1='yes' 2='no';
    run;

    data pauling;
      do row=1 to 2;
        do col=1 to 2;
          input count @@;
          output;
        end;    /* closes the col loop */
      end;      /* closes the row loop */
      label row='treatment' col='caught cold';
      format row rowfmt. col colfmt.;
      datalines;
    31 109 17 122
    ;

    title 'Linus Pauling 1971 Data of French Skiers';
    proc freq;
      weight count;
      table row*col / riskdiff measures nopercent nocol;
    run;
    quit;

CHAPTER 2 CONCLUSION

This chapter covered the three principal sampling designs in categorical data analysis: Poisson, multinomial, and product-multinomial. The hypergeometric distribution was discussed for the case where both row and column margins are fixed. The most commonly used measures of association were introduced. The odds ratio was highlighted because of its importance in the study of loglinear models and because of its important properties. The connections among (1) the discrete distributions, (2) sampling designs, and (3) association measures were examined. Finally, we discussed the distinction between a marginal contingency table and partial contingency tables.

SUMMARY EXERCISES

1. About 0.85% of people have peanut allergies. A simple new clinical test for peanut allergy is studied. For those with the allergy the test is positive 85% of the time (this is called the sensitivity of the test); for those without the allergy the test is negative 65% of the time (this is called the specificity of the test). Suppose a person tests positive for peanut allergy. What is the probability that he/she has the allergy?

2. A random sample of 272 young women in Zimbabwe are classified according to (i) whether they married while still a “child” and (ii) their academic achievement. (Data kindly provided by Desmond Mwembe, National University of Science and Technology, Bulawayo, Zimbabwe.)

                            CHILD MARRIAGE
ACADEMIC ACHIEVEMENT        Yes      No
Low                         50       84
High                        21       117

Obtain and interpret the estimated:
  (a) Risk difference
  (b) Risk ratio
  (c) Odds ratio

SUMMARY EXERCISES, continued

3. If the probability that it will rain tomorrow is 0.8, what are the odds that it will rain tomorrow?

4. If the odds of winning a bet are 9.0, what is the probability of winning the bet?

5. The risk of a peanut allergy is 0.88% for men and 0.82% for women, so the risk ratio is 1.073. What is the approximate OR?

6. Identify the following designs as Poisson, multinomial, product-multinomial, hypergeometric, or other.

(a) Sample 100 men and 100 women and record whether they pass an IQ test as “highly intelligent” (>130).

(b) Randomly sample 300 college students and cross-classify them according to their gender (M, F) and ethnic background (White, Black, Asian, Hispanic, Other).

SUMMARY EXERCISES, continued

(c) At a highway checkpoint, count the number of 5-axle (or larger) trucks that pass during a period of 3 hours; for each truck cross-classify according to make and color.

(d) In proving that the 1970 draft lottery was unfair, statisticians looked at the contingency table where the 366 days of the year (including leap year) were divided into thirds (these were the 3 rows of the table) and the 12 months formed the columns. Each individual subject to the draft was cross-classified according to (1) which third of the 366 days they fell into according to the lottery, and (2) which month their birthday fell in. (Randomization and Social Affairs: The 1970 Draft Lottery, Stephen E. Fienberg, Science, Vol. 171, No. 3968 (Jan. 22, 1971), pp. 255-261, published by the American Association for the Advancement of Science, https://www.jstor.org/stable/1730983.) For a personal story relating to these data, see the next two slides.

(e) Each of 50 elderly individuals was tracked for one year. Every time they visited their doctor, the visit was cross-classified according to: (1) if the visit lasted longer than 1 hour or not, and (2) if they received a prescription or not. The total sample size was n = 168 visits.

Concerning the 1970 draft lottery exercise:

Richard M. Nixon signed Executive Order 11497 on 11/26/69 establishing the 1970 draft lottery for males 19 – 26 years of age. In his article cited above, Fienberg used a variety of statistical tools, including analysis of the 3 x 12 (lottery number tertile versus birth month) contingency table, to show overwhelmingly that the 1970 draft lottery was not fair. In fact, he showed that those young men born in the latter months (October – December) had a higher probability of receiving a low draft number than those born in the earlier months (January – March). Those receiving a draft number below 195 were drafted. Thus, most young American men hoped for a high draft number.

Upon learning of this discrepancy, the individuals in charge of the draft lottery (which included no statisticians) hired three former presidents of the American Statistical Association to advise them in conducting the 1971 draft lottery. Again using a variety of statistical methods, it was shown that the 1971 draft lottery was fair.

My birthday is in December. In the 1970 draft lottery my chance of getting a low draft number would have been disproportionately high. In 1970 I was 18 years old and, hence, did not qualify for the draft lottery. If I had qualified for it, my draft number would have been 135; being in good physical condition with no disabilities, I would certainly have been drafted.

Concerning the 1970 draft lottery exercise, continued:

The first year that I qualified for the draft lottery was 1971. My chance of receiving a low draft number that year was the same as any other 19 – 26 year-old. As it happens, my draft number was 218. I was not drafted. This is important because the Vietnam War was raging at the time and many draftees were sent there and many, tragically, died there. Because of these circumstances, I feel that statistics may very well have saved my life.

This discussion has nothing whatever to do with the politics of the Vietnam War, whether it was just or unjust, good or bad. It has to do, exclusively, with the fundamental principle that every young man has the same chance of receiving a low draft number. Because those who ran the 1970 draft lottery did not understand how much work it takes to ensure randomization, many young men were drafted and perhaps sent to Vietnam and died because they were born in a month late in the year rather than a month early in the year. One of those men could have been me had statistics not stepped in.

This is an illustration of the tragedy that can result when basic statistical principles are not understood and applied. I was a beneficiary of the statistical expertise that converted a biased lottery in 1970 to an unbiased one in 1971.

---HJK