1 DATASET INTRODUCTION

2 Dataset: Urine From Cleveland Clinic 1981-1984

3 Outcome Variable: Categorical Variable
§ Calcium Oxalate Crystal Presence
  • In this analysis, this variable will be our
    • Outcome Variable
    • Response Variable
    • Dependent Variable
  • Note: The dataset is coded directly as Yes/No (not 0/1 coding)

4 Other Variables (Covariates)
§ Quantitative Variables
  • Specific Gravity
  • pH
  • Osmolarity
  • Conductivity
  • Urea Concentration (millimoles/liter)
  • Calcium Concentration (millimoles/liter)
  • Cholesterol: serum cholesterol levels

5 Discussion/Review
§ Purpose of dataset: Determine which of the covariates are related to the outcome. Covariates can also be called
  • Independent Variables
  • Predictors
  • Explanatory Variables
§ Outcomes and covariates can be categorical or quantitative
§ A given study can have more than one outcome and many covariates, with any mixture of variable types

6 Calcium Oxalate Crystal Presence
Summary statistics for Calcium Concentration (millimoles/liter) by crystal presence
Crystals    N   Mean  Std Dev   Min    Q1   Med    Q3    Max
No         42   2.69     1.90  0.17  1.22  2.16  3.93   8.48
Yes        31   5.92     3.59  0.27  3.10  6.19  7.82  14.34

7 Discussion
§ Clearly, those with calcium oxalate crystals present tend to have higher calcium concentrations
§ Later we will learn to conduct hypothesis tests in such situations
§ For now, we use these data to illustrate concepts of probability

8 Comments
§ To facilitate our discussion of probability and classification tests, we will categorize the quantitative variable Calcium Concentration into four groups
  • 1 = 0-1.99
  • 2 = 2-4.99
  • 3 = 5-7.99
  • 4 = 8 or More
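The grouping above can be sketched in code. This is only an illustration: the function name `calcium_group` is hypothetical, and it assumes concentrations are recorded to two decimals, so that "0-1.99" means "below 2", "2-4.99" means "at least 2 and below 5", and so on.

```python
def calcium_group(calcium_mmol_per_l: float) -> int:
    """Return the group code (1-4) for a calcium concentration.

    Assumption: group boundaries are treated as half-open intervals,
    e.g. group 1 is [0, 2), group 2 is [2, 5), group 3 is [5, 8).
    """
    if calcium_mmol_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if calcium_mmol_per_l < 2:
        return 1  # 0-1.99
    if calcium_mmol_per_l < 5:
        return 2  # 2-4.99
    if calcium_mmol_per_l < 8:
        return 3  # 5-7.99
    return 4      # 8 or more


# A few values taken from the summary statistics on slide 6
print([calcium_group(x) for x in (0.17, 2.16, 6.19, 14.34)])  # [1, 2, 3, 4]
```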

9 BASIC PROBABILITY Part 1 (Unconditional Probability using Logic)

10 Back to the Urine Dataset
§ Suppose one individual is selected from our sample and consider the following questions
  • What is the probability that the individual has calcium oxalate crystals present?
  • What is the probability that the individual has a calcium concentration of 5 or more?
  • What is the probability the individual has calcium oxalate crystals present AND has a calcium concentration of 5 or more?
  • What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?

11 Comments
§ All four of these probability questions relate to the ENTIRE SAMPLE
§ We begin by answering the questions logically from the table we created using software

12 Let’s Practice! Basic Probability of an Event
• What is the probability that the individual has calcium oxalate crystals present? We will denote this event by A.
• P(A) = 31/73 ≈ 0.425 = PREVALENCE of calcium oxalate crystals in our sample

Table of group (Calcium Concentration Group) by r (Calcium Oxalate Crystal Presence)
Frequency
Group        No   Yes   Total
0-1.99       19     4      23
2-4.99       17     9      26
5-7.99        5    11      16
8 or More     1     7       8
Total        42    31      73

13 Let’s Practice! Basic Probability of an Event
• What is the probability that the individual has a calcium concentration of 5 or more? We will denote this event by B.
• P(B) = (16 + 8)/73 = 24/73 ≈ 0.329
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

14 Let’s Practice! Basic Probability of an Event: Intersections
• What is the probability the individual has calcium oxalate crystals present AND has a calcium concentration of 5 or more?
• P(A and B) = (11 + 7)/73 = 18/73 ≈ 0.247
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

15 Let’s Practice! Basic Probability of an Event: Unions
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
• P(A or B) = (4 + 9 + 11 + 7 + 5 + 1)/73 = 37/73 ≈ 0.507
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)
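The four "logic" answers can be reproduced by counting cells of the frequency table. The sketch below is illustrative only; the dictionary layout and variable names are assumptions, while the counts are copied from the table.

```python
from fractions import Fraction

# Counts from the frequency table: calcium group -> (crystals No, crystals Yes)
table = {
    "0-1.99": (19, 4),
    "2-4.99": (17, 9),
    "5-7.99": (5, 11),
    "8 or more": (1, 7),
}
high = {"5-7.99", "8 or more"}  # event B: calcium concentration of 5 or more

total = sum(no + yes for no, yes in table.values())
n_a = sum(yes for _, yes in table.values())  # event A: crystals present
n_b = sum(no + yes for g, (no, yes) in table.items() if g in high)
n_a_and_b = sum(yes for g, (_, yes) in table.items() if g in high)
# A or B: every "Yes" cell plus the "No" cells in the high-calcium rows
n_a_or_b = n_a + sum(no for g, (no, _) in table.items() if g in high)

p_a = Fraction(n_a, total)
p_b = Fraction(n_b, total)
p_a_and_b = Fraction(n_a_and_b, total)
p_a_or_b = Fraction(n_a_or_b, total)

for name, p in [("P(A)", p_a), ("P(B)", p_b),
                ("P(A and B)", p_a_and_b), ("P(A or B)", p_a_or_b)]:
    print(f"{name} = {p} ≈ {float(p):.3f}")
```

Every probability here divides by the same denominator, the full sample of 73.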

16 USING PROBABILITY RULES Part 1

17 Probability Rules
§ Rules are created and used for many reasons
§ The rules and properties stated previously are important and useful in probability and sometimes in statistics
§ Rules are not always needed
  • If you can determine the answer through logic alone, you may not need a rule!
  • If you are provided only pieces of the puzzle, sometimes a rule is faster than logic!

18 Continuing
§ We now illustrate a few formulas using the questions we have already answered using logic

19 Let’s Practice Again! Complement Rule
• What is the probability that the individual DOES NOT have calcium oxalate crystals present?
• We could use logic and count the No’s instead of the Yes’s; however, knowing P(Yes) = P(A), the complement rule gives P(not A) = 1 - P(A) = 1 - 31/73 = 42/73 ≈ 0.575
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

20 Let’s Practice Again! Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
• Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

21 Let’s Practice Again! Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
• P(A or B) = 31/73 + 24/73 - 18/73 = 37/73 ≈ 0.507
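The complement and addition rules can be checked against the counts from the frequency table. A minimal sketch, using exact fractions so that no rounding error enters; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(31, 73)        # crystals present
p_b = Fraction(24, 73)        # calcium concentration of 5 or more
p_a_and_b = Fraction(18, 73)  # both

# Complement rule: P(not A) = 1 - P(A)
p_not_a = 1 - p_a

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

print(f"P(not A) = {p_not_a} ≈ {float(p_not_a):.3f}")     # 42/73 ≈ 0.575
print(f"P(A or B) = {p_a_or_b} ≈ {float(p_a_or_b):.3f}")  # 37/73 ≈ 0.507
```

Subtracting P(A and B) removes the 18 individuals who would otherwise be counted twice, once in A and once in B.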

22 INDEPENDENCE Part 1

23 Independent Events
§ Two events are independent if knowing one event occurs does not change the probability of the other
§ This is not the same as “disjoint” events, which are separate in that they cannot occur together
§ These are two different concepts entirely
§ Independence is a statement about the equality of the probability of one event whether or not the other event occurs (or is occurring, or has occurred)

24 Let’s Practice! Investigating Independence Part 1
§ We know the following from our sample
  • P(A) = 31/73 ≈ 0.425
  • P(B) = 24/73 ≈ 0.329
  • P(A and B) = 18/73 ≈ 0.247
§ If A and B were independent, P(A and B) would equal P(A)P(B)

25 Let’s Practice! Investigating Independence Part 1
§ From our sample we have: P(A)P(B) = (31/73)(24/73) ≈ 0.140
§ This is clearly not equal to P(A and B) = 0.247!!
§ In our sample the events are dependent (we can test this hypothesis about the population later)
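The product check for independence takes only a few lines. This is a sketch for the sample only; it says nothing about the population, which needs a hypothesis test.

```python
import math

p_a = 31 / 73        # crystals present
p_b = 24 / 73        # calcium concentration of 5 or more
p_a_and_b = 18 / 73  # both

product = p_a * p_b
print(f"P(A)P(B)   = {product:.3f}")    # 0.140
print(f"P(A and B) = {p_a_and_b:.3f}")  # 0.247

# Independence in the sample would require the two values to agree
independent_in_sample = math.isclose(product, p_a_and_b)
print("Independent in sample:", independent_in_sample)  # False
```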

26 BASIC PROBABILITY Part 2: Conditional Probability (Logic & Formula)

27 Conditional Probability
§ So far, we have divided by the TOTAL
§ Sometimes, however, we have additional CONDITIONS that cause us to alter the denominator (bottom) of our probability calculation
§ Suppose, when choosing one person from the Urine data, we ask
  • Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
§ “Conditional” refers to the fact that we have these additional conditions, restrictions, or other information

28 Let’s Practice! CONDITIONAL Probability of an Event
• Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
• Restricting to the 31 individuals in the Yes column: P(B|A) = (11 + 7)/31 = 18/31 ≈ 0.581
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

29 Let’s Practice! CONDITIONAL Probability FORMULA
• Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
• P(B|A) = P(A and B) / P(A) = (18/73) / (31/73) = 18/31 ≈ 0.581

30 Let’s Practice! CONDITIONAL Probability of an Event
• Given the individual DOES NOT HAVE Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
• Restricting to the 42 individuals in the No column: P(B|not A) = (5 + 1)/42 = 6/42 ≈ 0.143
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)
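Both routes to a conditional probability, shrinking the denominator and applying the formula P(B|A) = P(A and B) / P(A), can be compared directly. A minimal sketch with counts taken from the frequency table; the variable names are illustrative.

```python
from fractions import Fraction

total = 73
n_a, n_not_a = 31, 42  # column totals: crystals Yes / No
n_a_and_b = 11 + 7     # crystals Yes and calcium of 5 or more
n_not_a_and_b = 5 + 1  # crystals No and calcium of 5 or more

# Logic: change the denominator to the conditioning group
p_b_given_a = Fraction(n_a_and_b, n_a)
p_b_given_not_a = Fraction(n_not_a_and_b, n_not_a)

# Formula: P(B|A) = P(A and B) / P(A)
p_b_given_a_formula = Fraction(n_a_and_b, total) / Fraction(n_a, total)

print(f"P(B|A)     = {p_b_given_a} ≈ {float(p_b_given_a):.3f}")          # 18/31 ≈ 0.581
print(f"P(B|not A) = {p_b_given_not_a} ≈ {float(p_b_given_not_a):.3f}")  # 1/7 ≈ 0.143
```

With exact fractions the two routes agree exactly; rounding error appears only when rounded decimals are divided.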

31 MORE PRACTICE Conditional Probability

32 Let’s Verify! CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above, what is the probability the individual has calcium oxalate crystals?
• By counting: P(A|B) = (11 + 7)/(16 + 8) = 18/24 = 0.75
• By formula: P(A|B) = P(A and B) / P(B) ≈ 0.247 / 0.329 ≈ 0.751
• We have a small amount of rounding error this time
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

33 INDEPENDENCE Part 2

34 Let’s Practice! Investigating Independence Part 2
§ We know the following from our sample
  • P(B|A) = 18/31 ≈ 0.581
  • P(B|not A) = 6/42 ≈ 0.143

35 Comments: Investigating Independence Part 2
§ These probabilities are clearly unequal in our sample; our eventual question might be whether this is also true for our population
§ In this sample, these events are dependent
§ From our analysis so far, it seems likely they may be dependent in our population (we can test later)
§ Knowing whether or not the person has calcium oxalate crystals present CHANGES the probability of having a calcium concentration of 5 or above!!

36 GENERAL MULTIPLICATION RULE

37 General Multiplication Rule
P(A and B) = P(A) P(B|A) = P(B) P(A|B)
§ This formula comes from rearranging the definition of conditional probability, P(B|A) = P(A and B) / P(A)
§ To achieve the second formulation on the right, consider the formula for P(A|B) instead, P(A|B) = P(A and B) / P(B), and note that the numerator is unchanged
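Both forms of the general multiplication rule can be checked against the sample. A short sketch, with the counts taken from the frequency table and illustrative variable names; both products recover P(A and B) = 18/73.

```python
from fractions import Fraction

p_a = Fraction(31, 73)          # crystals present
p_b = Fraction(24, 73)          # calcium concentration of 5 or more
p_b_given_a = Fraction(18, 31)  # P(B|A), counted within the Yes column
p_a_given_b = Fraction(18, 24)  # P(A|B), counted within the high-calcium rows

# General multiplication rule, both formulations
via_a = p_a * p_b_given_a
via_b = p_b * p_a_given_b

print(via_a, via_b)  # 18/73 18/73
```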

38 General Multiplication Rule

39 REPEATED SAMPLING

40 Repeated Sampling
§ Often we consider problems in which we draw multiple individuals from a set of individuals
  • Drawing parts from a box where some are defective
  • Choosing multiple people from a certain population
§ The formulas we have investigated can be used to calculate probabilities in these situations

41 Let’s Practice!
§ If we select two subjects at random from our sample, what is the probability that both have a calcium concentration of 8 or more?
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)
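One way to work this question is with the general multiplication rule. The sketch below assumes the two subjects are drawn without replacement, which the slide does not state explicitly; under sampling with replacement the second factor would be 8/73 instead.

```python
from fractions import Fraction
from math import comb

n_total = 73  # subjects in the sample
n_high = 8    # subjects with calcium concentration of 8 or more

# Assumption: sampling WITHOUT replacement.
# P(first is high) * P(second is high | first is high)
p_both = Fraction(n_high, n_total) * Fraction(n_high - 1, n_total - 1)

# Cross-check by counting pairs: C(8, 2) / C(73, 2)
p_both_by_counting = Fraction(comb(n_high, 2), comb(n_total, 2))

print(f"P(both 8 or more) = {p_both} ≈ {float(p_both):.4f}")  # 7/657 ≈ 0.0107
```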

42 WANT TO LEARN MORE? READ THE FOLLOWING OPTIONAL MATERIAL
The remaining slides are optional. They illustrate some more difficult probability rules along with additional examples of probability related to the health sciences.

43 Optional Content: Read About
§ Relative Risk
§ Total Probability Rule
§ Bayes’ Rule
§ Screening Tests
  • Sensitivity/Specificity
  • PV+/PV-
  • False Positive and False Negative Rates
§ ROC Curves

44 Relative Risk
§ Relative risk is
  • the risk of an “event” relative to an “exposure”
  • the ratio of the probability of the event occurring among “exposed” versus “non-exposed”
  • If A and B are independent, the relative risk is 1
§ In our rule B is the EVENT and A is the EXPOSURE: RR = P(B|A) / P(B|not A)

45 Let’s Practice!
§ Find the Relative Risk of High Calcium Concentration Given Calcium Oxalate Crystal Presence
  • RR = P(B|A) / P(B|not A) = (18/31) / (6/42) ≈ 4.06
  • Note: this is the reverse of what we probably want in this case; consider that for more practice!
  • INTERPRET RR: Having a calcium concentration of 5 or more is around 4 times more likely among those with calcium oxalate crystals than among those without.
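The relative risk calculation can be sketched as follows, treating crystal presence (A) as the exposure and a calcium concentration of 5 or more (B) as the event, as on the slide. The function name `relative_risk` and its parameter names are illustrative, not from the original material.

```python
from fractions import Fraction


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of the event probability among exposed versus non-exposed."""
    risk_exposed = Fraction(events_exposed, n_exposed)
    risk_unexposed = Fraction(events_unexposed, n_unexposed)
    return risk_exposed / risk_unexposed


# Exposed = crystals present (31 subjects, 18 with calcium of 5 or more)
# Non-exposed = crystals absent (42 subjects, 6 with calcium of 5 or more)
rr = relative_risk(18, 31, 6, 42)
print(f"RR = {rr} ≈ {float(rr):.2f}")  # 126/31 ≈ 4.06
```

A relative risk of 1 would correspond to equal risk in both groups, which is what independence implies.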

46 Total Probability Rule

47 Bayes’ Rule
§ We want to find P(A|B), so we will need to “rearrange” the formula, swapping the A’s and B’s
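The slide images with the formulas are not reproduced here, so the sketch below uses the standard statements of the two rules: the total probability rule, P(B) = P(B|A)P(A) + P(B|not A)P(not A), and Bayes’ rule, P(A|B) = P(B|A)P(A) / P(B). The inputs come from the frequency table; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(31, 73)             # crystals present
p_not_a = 1 - p_a                  # complement rule
p_b_given_a = Fraction(18, 31)     # calcium of 5 or more, given crystals
p_b_given_not_a = Fraction(6, 42)  # calcium of 5 or more, given no crystals

# Total probability rule: rebuild P(B) from the two conditional pieces
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes' rule: reverse the conditioning
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B)   = {p_b} ≈ {float(p_b):.3f}")  # 24/73 ≈ 0.329
print(f"P(A|B) = {p_a_given_b}")             # 3/4
```

The result matches direct counting, 18/24 = 0.75, with no rounding error because exact fractions are used throughout.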

48 Bayes’ Rule

49 Let’s Verify! CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above, what is the probability the individual has calcium oxalate crystals?
• We have a small amount of rounding error this time
Table: frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence (see slide 12)

50 SCREENING TESTS and ROC Curves

51 Screening Tests

52 Sensitivity & Specificity (“Epi” Style)
                Has Condition                 Does not have Condition          Total
Test Positive   A (TP)                        B (FP)                           Total Positive Test (A+B)
Test Negative   C (FN)                        D (TN)                           Total Negative Test (C+D)
Total           Number with Condition (A+C)   Number without Condition (B+D)

53 Sensitivity & Specificity
Cut-point 2 or more: collapse the frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence
                       Has Condition   Does not have Condition
0-1.99 (NEGATIVE)                  4                        19
2 or more (POSITIVE)              27                        23
Total                             31                        42
Sensitivity = 27/31 ≈ 0.87; Specificity = 19/42 ≈ 0.45

54 Sensitivity & Specificity
Cut-point 5 or more: collapse the frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence
                       Has Condition   Does not have Condition
0-4.99 (NEGATIVE)                 13                        36
5 or more (POSITIVE)              18                         6
Total                             31                        42
Sensitivity = 18/31 ≈ 0.58; Specificity = 36/42 ≈ 0.86

55 Sensitivity & Specificity
Cut-point 8 or more: collapse the frequency table of Calcium Concentration Group by Calcium Oxalate Crystal Presence
                       Has Condition   Does not have Condition
0-7.99 (NEGATIVE)                 24                        41
8 or more (POSITIVE)               7                         1
Total                             31                        42
Sensitivity = 7/31 ≈ 0.23; Specificity = 41/42 ≈ 0.98

56 Bayes’ Rule
Here we define: A = Disease, B = Test Positive
                     Has Condition   Does not have Condition
Negative (0-7.99)               24                        41
Positive (≥ 8)                   7                         1
Total                           31                        42

57 Choosing Different Cut-Off: 2 or more
High Sensitivity but Low Specificity
Cut-point    Sensitivity   Specificity
2 or more           0.87          0.45
5 or more           0.58          0.86
8 or more           0.23          0.98

58 Choosing Different Cut-Off: 5 or more
Specificity increased, but you reduce sensitivity (orange arrow)
Cut-point    Sensitivity   Specificity
2 or more           0.87          0.45
5 or more           0.58          0.86
8 or more           0.23          0.98

59 Choosing Different Cut-Off: 8 or more
Very High Specificity, Very Low Sensitivity (High False Negative Rate)
Cut-point    Sensitivity   Specificity
2 or more           0.87          0.45
5 or more           0.58          0.86
8 or more           0.23          0.98
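The sensitivity and specificity at each cut-point can be recomputed from the frequency table by treating "calcium at or above the cut-point" as a positive test and crystal presence as the condition. A minimal sketch; the data layout and the function name `sens_spec` are illustrative.

```python
# Rows of the frequency table: (lower bound of calcium group, crystals No, crystals Yes)
rows = [(0, 19, 4), (2, 17, 9), (5, 5, 11), (8, 1, 7)]


def sens_spec(cutoff):
    """Sensitivity and specificity when 'calcium >= cutoff' counts as a positive test."""
    tp = sum(yes for low, _, yes in rows if low >= cutoff)  # positive test, has condition
    fn = sum(yes for low, _, yes in rows if low < cutoff)   # negative test, has condition
    fp = sum(no for low, no, _ in rows if low >= cutoff)    # positive test, no condition
    tn = sum(no for low, no, _ in rows if low < cutoff)     # negative test, no condition
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (2, 5, 8):
    sens, spec = sens_spec(cutoff)
    print(f"{cutoff} or more: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cut-point moves individuals from the positive to the negative side of the test, so sensitivity can only fall and specificity can only rise.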

60 What happens when
§ We assign all individuals a positive test result?
  • Sensitivity = P(Test+|Disease) = 1
  • Specificity = P(Test-|No Disease) = 0
  • 1 - Specificity = 1
§ We assign all individuals a negative test result?
  • Sensitivity = P(Test+|Disease) = 0
  • Specificity = P(Test-|No Disease) = 1
  • 1 - Specificity = 0

61 Receiver Operating Characteristic curve (ROC curve)
Cut-point    Sensitivity   Specificity
2 or more           0.87          0.45
5 or more           0.58          0.86
8 or more           0.23          0.98

62 ROC Curves
§ Area under the curve = probability that, for a randomly selected pair of normal and abnormal subjects, the test will correctly identify the normal subject given the “measurement”
§ Area = 0.89 for the example on the left

63 Trapezoidal Rule (FYI)
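As a rough illustration of the trapezoidal rule, the sketch below joins the three cut-points from the frequency table, plus the two extreme points (everyone negative, everyone positive), with straight lines and sums the trapezoid areas. The point list is built from this deck's counts; it is an assumption that this coarse curve is a fair stand-in, and its area need not agree with the 0.89 quoted for the curve pictured on slide 62.

```python
from fractions import Fraction

# ROC points as (1 - specificity, sensitivity), from 42 without and 31 with the condition
points = sorted([
    (Fraction(0), Fraction(0)),            # all tests negative
    (Fraction(1, 42), Fraction(7, 31)),    # cut-point: 8 or more
    (Fraction(6, 42), Fraction(18, 31)),   # cut-point: 5 or more
    (Fraction(23, 42), Fraction(27, 31)),  # cut-point: 2 or more
    (Fraction(1), Fraction(1)),            # all tests positive
])

# Trapezoidal rule: width times average height for each adjacent pair of points
auc = sum((x1 - x0) * (y0 + y1) / 2
          for (x0, y0), (x1, y1) in zip(points, points[1:]))

print(f"AUC ≈ {float(auc):.3f}")  # 0.768
```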