Simpson's Paradox and related problems
Simpson's Paradox
• Admission data from the 1960s showed that male and female applicants had different admission rates at a famous university's graduate school.
• Yet everyone involved at the graduate school maintained that the process was fair.
Hypothetical Data
• Two schools (Arts and Engineering)
• Male admission rate = 35/80 = 0.44
• Female admission rate = 20/60 = 0.33
2 by 2 table
          Admit   Deny
Male        35     45
Female      20     40
Further analysis
• School of Arts
  – Male admission rate = 5/20 = 0.25
  – Female admission rate = 10/40 = 0.25
• School of Engineering
  – Male admission rate = 30/60 = 0.5
  – Female admission rate = 10/20 = 0.5
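Data
• Notice that the counts for the two schools add up to the overall table:
  – 30 + 5 = 35 (males admitted)
  – 30 + 15 = 45 (males denied)
  – 10 + 10 = 20 (females admitted)
  – 10 + 30 = 40 (females denied)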
School of Engineering
          Admit   Deny
Male        30     30
Female      10     10
School of Arts
          Admit   Deny
Male         5     15
Female      10     30
Why?
• Within each school, we can see that admission is fair.
• But, on the whole, it seems that female students are discriminated against.
Reason
• More female students apply to the School of Arts
• The admission rate for the School of Arts is low for both males and females
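The arithmetic behind this reason can be checked with a short sketch. It uses only the hypothetical counts from the "Further analysis" slide; the variable names are illustrative and not part of the original slides.

```python
# Hypothetical admissions data from the slides, as (admitted, applicants).
data = {
    "Arts": {"male": (5, 20), "female": (10, 40)},
    "Engineering": {"male": (30, 60), "female": (10, 20)},
}
sexes = ("male", "female")

# Admission rate within each school: equal for males and females.
school_rates = {
    school: {sex: by_sex[sex][0] / by_sex[sex][1] for sex in sexes}
    for school, by_sex in data.items()
}

# Pooled admission rate over both schools: males appear to be favoured.
totals = {sex: sum(data[s][sex][1] for s in data) for sex in sexes}
admitted = {sex: sum(data[s][sex][0] for s in data) for sex in sexes}
overall = {sex: admitted[sex] / totals[sex] for sex in sexes}

# Share of each sex applying to the low-admission school (Arts).
arts_share = {sex: data["Arts"][sex][1] / totals[sex] for sex in sexes}

for school, rates in school_rates.items():
    print(school, {sex: round(r, 2) for sex, r in rates.items()})
print("overall", {sex: round(r, 2) for sex, r in overall.items()})
print("share applying to Arts", {sex: round(a, 2) for sex, a in arts_share.items()})
```

The pooled rate is a weighted average of the school rates, and the weights (who applies where) differ between males and females. That difference alone produces the gap.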
Maori versus Non-Maori
Age      Maori (deaths/1000)   Non-Maori (deaths/1000)
0-4             3.68                   2.75
5-14            0.28                   0.27
15-24           1.26                   1.06
25-44           2.44                   1.31
45-64          15.0                    8.76
65+            67.36                  54.75
But, on the whole
• For Maori, death rate = 4.65/1000
• For non-Maori, death rate = 8.35/1000
The lesson we learn
• We cannot draw a conclusion from data without understanding how the data were obtained.
Causal relation
• When we say that there is sex discrimination in the admission process, we mean that sex is the cause and admission is the consequence.
• In science, how can we conclude that factor A causes outcome B?
Some possible mistakes
• Data from hospital records
• Death rates of surgical patients differ between operations that use different anesthetics
• Halothane (1.7%), Pentothal (1.7%), Cyclopropane (3.4%), Ether (1.9%)
• Can we say that cyclopropane is more dangerous than the other anesthetics?
Answer
• No! The patients in the worst condition were the ones receiving cyclopropane.
Study the effect of the vaccine on preventing polio
• Can we give the vaccine to all students and compare the proportion of students having polio at the end of the year with the proportion last year?
• Can we give the vaccine to all students in New York City and compare the proportion of students having polio with the corresponding proportion in Chicago?
Further questions
• Can we compare the above proportion for students from private schools with that for students from public schools?
• Can we compare the above proportion for male students with that for female students?
How to know the effect of the vaccine in preventing polio
• We need two groups:
  – control group (no “real” treatment)
  – treatment group (receives the vaccine)
We should compare the two groups under “equal” conditions
• People are different from each other
• By randomly assigning participants to the two groups, we can make the groups nearly identical in their conditions
  – e.g., about the same on average
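A minimal simulation sketch of this idea follows. The participants and their ages are made up for illustration (drawn from a normal distribution with an arbitrary mean and spread); nothing here comes from a real study.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical participants: each has an age that we cannot control.
ages = [random.gauss(40, 12) for _ in range(2000)]

# Randomly split the participants into two equal-sized groups.
indices = list(range(len(ages)))
random.shuffle(indices)
half = len(indices) // 2
treatment = [ages[i] for i in indices[:half]]
control = [ages[i] for i in indices[half:]]

# With random assignment, the group averages should be close.
mean_t = statistics.mean(treatment)
mean_c = statistics.mean(control)
print(f"mean age, treatment: {mean_t:.1f}")
print(f"mean age, control:   {mean_c:.1f}")
```

The same balancing happens for characteristics we did not measure, which is what makes random assignment more useful than matching on a few known factors.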
Real difficulties
• Many factors affect the outcome; it is impossible to control all of them
Design of an Experiment
• To compare one treatment (A) with another treatment (B), we need to randomly assign the patients to two groups, each receiving one of the treatments
The vaccine can prevent polio
• 1954, USA: over two million children involved
• Can we let the students choose their own treatment voluntarily?
Randomization
• We need to randomly assign each schoolchild to receive the vaccine or a placebo
• The purpose of this randomization is to ensure that the two groups are comparable
• Unfortunately, many physicians did not understand the importance of randomization
Placebo
• In this case, the placebo is another liquid, similar in appearance to the vaccine, that is injected into the children.
• It is used so that all children receive the “same” treatment, so that a difference in the results cannot be explained as a psychological effect
Data
                     Polio (after half year)   No polio (after half year)
Control (placebo)    A = 115                   B = 201,114
Treatment            C = 33                    D = 200,712
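One way to read this table is to turn the counts into rates and compare them. The sketch below uses only the four counts A, B, C, D from the slide. The two-proportion z statistic is a standard choice added here for illustration; the slides do not say which analysis was intended.

```python
import math

# Counts from the slide (A, B, C, D).
control_polio, control_no_polio = 115, 201_114
treat_polio, treat_no_polio = 33, 200_712

n_control = control_polio + control_no_polio
n_treat = treat_polio + treat_no_polio

# Polio rate in each group, also expressed per 100,000 children.
rate_control = control_polio / n_control
rate_treat = treat_polio / n_treat
print(f"control:   {rate_control * 1e5:.1f} per 100,000")
print(f"treatment: {rate_treat * 1e5:.1f} per 100,000")

# Relative risk: how the treatment rate compares with the control rate.
relative_risk = rate_treat / rate_control
print(f"relative risk: {relative_risk:.2f}")

# Two-proportion z statistic under the hypothesis of equal rates.
pooled = (control_polio + treat_polio) / (n_control + n_treat)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))
z = (rate_control - rate_treat) / std_err
print(f"z statistic: {z:.1f}")
```

Because the two groups were formed by random assignment, a large z statistic can be attributed to the vaccine rather than to differences between the groups.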
An example
• The University Group Diabetes Program
• Patients were randomly assigned to 4 groups:
  – Group 1: Placebo
  – Group 2: Tolbutamide
  – Group 3: Insulin standard
  – Group 4: Insulin variable
The results are controversial
• Is it really random?
Seven risk factors
• There are 7 risk factors related to diabetes
• Age of 55 or older, high blood pressure, history of chest pains, abnormal electrocardiogram (ECG), history of digitalis use, high cholesterol level, overweight, and calcification of the arteries
Risk factor distributions
Number of risk factors     0    1    2    3    4    5    6
Group I                   28   60   59   26   10    2    0
Group II                  25   50   58   34   17    4    1
Group III                 22   62   60   34    8    8    1
Group IV                  15   76   57   30    4    4    1
Surprise?
• The distributions in the four groups are almost identical
• Notice that the distributions were examined after the experiment was done. It is quite likely that randomization makes all potential risk factors about equally distributed across the groups
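A quick way to see how similar the four groups are is to convert the counts into percentages and an average number of risk factors per patient. The sketch below uses the counts from the risk factor table; the summary measures are chosen for illustration and are not taken from the original study.

```python
# Patients by number of risk factors (0 to 6), from the slide's table.
counts = {
    "I": [28, 60, 59, 26, 10, 2, 0],
    "II": [25, 50, 58, 34, 17, 4, 1],
    "III": [22, 62, 60, 34, 8, 8, 1],
    "IV": [15, 76, 57, 30, 4, 4, 1],
}

summary = {}
for group, row in counts.items():
    n = sum(row)
    # Percentage of the group's patients with 0, 1, ..., 6 risk factors.
    percents = [100 * c / n for c in row]
    # Average number of risk factors per patient in the group.
    mean_rf = sum(k * c for k, c in enumerate(row)) / n
    summary[group] = {"n": n, "percents": percents, "mean": mean_rf}
    shown = " ".join(f"{p:5.1f}" for p in percents)
    print(f"Group {group:3s} n={n}  %: {shown}  mean={mean_rf:.2f}")
```

Comparing the printed rows side by side gives a rough check on whether the randomization left the groups balanced on these risk factors.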
Exercise one
• How can we show that vitamin C can prevent catching a cold?
FDA
• Food and Drug Administration
• Guidelines for developing drugs and treatments
• Statisticians should be involved in the design of the experiment and the analysis of the data
Some past errors
• Hormone therapy (approved by the FDA) was used to treat menopausal symptoms and to prevent osteoporosis, or age-related loss of bone density
• Later experiments showed that it does not protect against heart disease or stroke, and that it increases the risk of dangerous blood clots and gallbladder disease.
Smoking and Lung Cancer
• For ethical reasons, we cannot randomly assign a person to smoke or not to smoke
Observational study
• Case-control study
• We study the smoking habits of patients with lung cancer in a hospital
• In the same hospital, we study the smoking habits of patients with other diseases (without lung cancer, of about the same age and the same gender)
• Or, we can study individuals without lung cancer from the same community
Example
• Oral contraceptives and thromboembolic diseases
• Cases: all women in the hospital having thromboembolic diseases
• Controls: ?
Selection of controls
• Hospital: same as case
• Discharge date: same 6-month interval
• Discharge status: all alive
• Age: same 5-year span
• Marital status: same
• Residence: same metropolitan area
• Race: same
Selection of controls
• Parity: same (no pregnancies, one or two, three or more)
• Hospital status: same (ward, semiprivate, or private room)
Observational study
• Cohort study
• At the beginning, we have two groups, one smoking and the other non-smoking
• Wait for 5 years and compare the proportions of people who develop lung cancer in the two groups
Cancer risk
• Many reports on cancer risk are based on observational studies, so their results are not fully reliable.
Exercise two
• Think about the validity of using a case-control study for the following task: showing that salted fish can cause nasopharyngeal cancer.
Question
• Comment on the following.
• In a 1996 study by Dr. Leslie Wolfson of the University of Connecticut, tai chi was compared to balance training, strength training, and combined balance and strength training in people with an average age of eighty. Those who learned tai chi gained significantly more balance and strength than the other groups.
Case 1
• We obtain data on recoveries (R) for males and females who have received either a treatment (t) or a control (c)
Males
         R=1   R=0
T=t       18    12
T=c        7     3
Females
         R=1   R=0
T=t        2     8
T=c        9    21
Combined
         R=1   R=0
T=t       20    20
T=c       16    24
Question
• The recovery rate is higher for T=c for both males and females
• But the recovery rate is higher for T=t for the combined group
• For a new subject whose gender is unknown, which treatment should we prefer, t or c?
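The reversal can be traced by writing each combined recovery rate as a weighted average of the male and female rates. The sketch below uses only the counts from the Case 1 tables and does not settle which treatment to prefer; the names are illustrative.

```python
# Case 1 counts as (recovered, not recovered), by sex and treatment arm.
table = {
    "male": {"t": (18, 12), "c": (7, 3)},
    "female": {"t": (2, 8), "c": (9, 21)},
}
arms = ("t", "c")

# Recovery rate within each sex and arm.
stratum_rate = {
    sex: {arm: rec / (rec + not_rec) for arm, (rec, not_rec) in by_arm.items()}
    for sex, by_arm in table.items()
}

# Number of subjects of each sex in each arm, and arm totals.
size = {sex: {arm: sum(table[sex][arm]) for arm in arms} for sex in table}
arm_total = {arm: sum(size[sex][arm] for sex in table) for arm in arms}

# Combined rate = sum over sexes of (share of arm that is that sex) * (rate).
combined = {
    arm: sum(size[sex][arm] / arm_total[arm] * stratum_rate[sex][arm] for sex in table)
    for arm in arms
}

for sex in table:
    print(sex, {arm: round(stratum_rate[sex][arm], 2) for arm in arms})
print("combined", {arm: round(combined[arm], 2) for arm in arms})
print("share male", {arm: size["male"][arm] / arm_total[arm] for arm in arms})
```

The treated group is mostly male and the control group is mostly female, and males recover more often under either treatment, so the combined comparison mixes the effect of treatment with the effect of sex.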
Another situation
• Data on yields (Y) and heights for samples of black and white varieties of a plant
Tall
         Y=1   Y=0
C=w       18    12
C=b        7     3
Short
         Y=1   Y=0
C=w        2     8
C=b        9    21
Combined
         Y=1   Y=0
C=w       20    20
C=b       16    24
Question
• Should we plant a white (C=w) or a black (C=b) variety of plant, without knowing the height the plant will grow to?