In today's lecture…
Probability
- Counting methods: permutations & combinations
- Independence
- Non-independence/Bayes' Rule
Example: Prostate Cancer Study
- Thompson et al. (2006)
- Prostate Specific Antigen (PSA) evaluation leads to early detection of prostate cancer
- Study looked at 5519 men who underwent prostate biopsy
- Characteristics looked at: age, race, history of prostate cancer, previous biopsies/screening
Table 1: Racial characteristics of study participants (n=5519)

Race     n      %
White    5310   96.2 (~96)
Black    209    3.8 (~4)
Table 2: Number of prostate cancers and high grade prostate cancers
Probability: chance of something happening (from 0 to 1)
- 0: cannot happen
- 1: sure to happen

P(A) = probability that event "A" will occur
  P(PSA 0-1) = probability that PSA level is from 0 to 1
P(~A) = probability that event "A" will NOT occur [complement]
  P(~PSA 0-1) = probability that PSA level is NOT from 0 to 1
P(A & B) = the probability that both A and B happen [joint probability]
  P(PSA 0-1 & white) = the probability of being a white male with PSA 0-1
P(A|B) = the probability that A occurs, given that B occurred [conditional probability]
  P(PSA 0-1|white) = the probability that PSA is 0-1, given that the patient is white

[Venn diagrams: circles A and B with overlap P(A&B); the overlap within B illustrates P(A|B)]
*Sainani K., Stanford
Assessing Probability
1. Theoretical/classical probability: based on theory (a priori understanding of a phenomenon), e.g.:
   - theoretical probability of rolling a 2 on a standard die is 1/6
   - theoretical probability of choosing an ace from a standard deck is 4/52
   - theoretical probability of getting heads on a regular coin is 1/2
2. Empirical probability: based on empirical data, e.g.:
   - you toss an irregular die (probabilities unknown) 100 times and find that you get a 2 twenty-five times; empirical probability of rolling a 2 is 1/4
   - empirical probability of an earthquake in the Bay Area by 2032 is .62 (based on historical data)
   - empirical probability of a lifetime smoker developing lung cancer is 15 percent (based on empirical data)
Computing theoretical probabilities: counting methods
Great for gambling! Fun to compute!
If outcomes are equally likely to occur…
  P(A) = (# of ways A can occur) / (total # of possible outcomes)
Note: these are called "counting methods" because we have to count the number of ways A can occur and the number of total possible outcomes.
Applying our example…
P(PSA level 0-1) = (# cases with PSA 0-1) / (total number of cases) = 1963/5519 ≈ 0.36
P(PSA level >6) = (# cases with PSA >6) / (total number of cases) = 150/5519 ≈ 0.03

You randomly pick a patient to test his PSA. What's the probability that he is white?
P(white) = (# cases who are white) / (total number of cases) = 5310/5519 ≈ 0.96
…that he is black?
P(black) = (# cases who are black) / (total number of cases) = 209/5519 ≈ 0.04
Example 2
What's the probability that you pick two patients who are black?
P(1st patient black) = (# cases who are black) / (total number of cases) = 209/5519 = 0.038 ~ 0.04
P(2nd patient black, given the 1st was black) = (# remaining cases who are black) / (remaining number of cases) = 208/5518 = 0.038 ~ 0.04
P(black & black) = P(1st patient black) x P(2nd patient black) = 0.04 x 0.04 = 0.0016 (with the rounded values; the unrounded product is about 0.0014)
This is an example of joint probability…more on this coming up!
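To see how much the rounding matters in Example 2, the product can be computed with exact fractions. This is a minimal sketch that assumes only the Table 1 counts (209 black patients out of 5519); the variable names are illustrative and not from the lecture.

```python
from fractions import Fraction

TOTAL = 5519   # men in the study (Table 1)
BLACK = 209    # black participants (Table 1)

# First pick: 209 of the 5519 patients are black.
p_first = Fraction(BLACK, TOTAL)

# Second pick, given the first was black: one fewer black patient
# and one fewer patient overall (sampling without replacement).
p_second_given_first = Fraction(BLACK - 1, TOTAL - 1)

# Joint probability: multiply along the branch.
p_both_black = p_first * p_second_given_first

print(float(p_first))
print(float(p_second_given_first))
print(float(p_both_black))
```

The exact product is a little smaller than the 0.0016 obtained by multiplying the two rounded 0.04 values.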
Example 3
If you have 5 patients (3 white, 2 black), and you want to test PSA of two randomly chosen patients, what's the probability that they are one white (W) and one black (B)?

Considering order of picking:
P(1 B, 1 W patient) = (# ways to pick one B, one W pair) / (# total patient pairs)

Numerator =
  W1B1  W1B2  W2B1  W2B2  W3B1  W3B2
  B1W1  B2W1  B1W2  B2W2  B1W3  B2W3
  = 12
Denominator = 5 x 4 = 20 (5 patients for the first pick, 4 patients for the second)

P(1 B, 1 W) = 12/20 = 0.6
Applying our PSA example, using a probability tree…
(P(B) = 0.04, from our Example 2)

First pick     Second pick    Outcome
P(B) = .04     P(B) = .04     P(BB) = 0.04*0.04 = 0.0016
P(B) = .04     P(W) = .96     P(BW) = 0.04*0.96 = 0.038
P(W) = .96     P(B) = .04     P(WB) = 0.96*0.04 = 0.038
P(W) = .96     P(W) = .96     P(WW) = 0.96*0.96 = 0.922

P(1 B, 1 W) = P(BW) + P(WB) = 0.038 + 0.038 = 0.076
Rule of thumb: in probability, "and" means multiply (for independent events; otherwise multiply by the conditional probability), "or" means add (for mutually exclusive outcomes).
Ignoring order of picking:
P(1 B, 1 W patient) = (# ways to pick one B, one W) / (total # ways to pick 2 patients)

Numerator = W1B1  W1B2  W2B1  W2B2  W3B1  W3B2 = 6
Denominator = (5 x 4)/2 = 10 (we divide out the order by dividing by 2 here)

P(picking a B, W patient) = 6 / ((5 x 4)/2) = 12/20 = 0.6
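Both versions of Example 3 can be checked by brute-force enumeration. The sketch below uses Python's itertools; the patient labels W1…B2 follow the slide, while the helper name is_mixed is just an illustrative choice.

```python
from itertools import combinations, permutations

patients = ["W1", "W2", "W3", "B1", "B2"]

def is_mixed(pair):
    """True when the pair holds exactly one white and one black patient."""
    return sorted(label[0] for label in pair) == ["B", "W"]

# Order matters: 5 x 4 = 20 ordered pairs.
ordered = list(permutations(patients, 2))
# Order ignored: (5 x 4) / 2 = 10 unordered pairs.
unordered = list(combinations(patients, 2))

ordered_hits = sum(is_mixed(pair) for pair in ordered)
unordered_hits = sum(is_mixed(pair) for pair in unordered)

p_ordered = ordered_hits / len(ordered)
p_unordered = unordered_hits / len(unordered)

print(ordered_hits, len(ordered), p_ordered)
print(unordered_hits, len(unordered), p_unordered)
```

Counting with or without order changes the numerator and the denominator by the same factor of 2, so the probability is the same either way.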
Summary of Counting Methods
Counting methods for computing probabilities:
- Permutations: order matters!
  - With replacement
  - Without replacement
- Combinations: order doesn't matter
Permutations: order matters!
A permutation is an ordered arrangement of objects.
- With replacement = once an event occurs, it can occur again (after you roll a 6, you can roll a 6 again on the same die).
- Without replacement = an event cannot repeat (after you draw an ace of spades out of a deck, there is 0 probability of getting it again).
Permutations with replacement
- Sample space: the set of all possible outcomes.
  Example: in genetics, if both the mother and father carry one copy of a recessive disease-causing mutation (d), there are three possible outcomes (the sample space):
  - child is not a carrier (DD)
  - child is a carrier (Dd)
  - child has the disease (dd)
- Probabilities: the likelihood of each of the possible outcomes (always 0 ≤ P ≤ 1.0).
  - P(genotype=DD) = .25
  - P(genotype=Dd) = .50
  - P(genotype=dd) = .25
Summary: order matters, with replacement
Formally, "order matters" and "with replacement" use powers.
Equation for total number of possible outcomes: n^r (n possible outcomes per event, r events)
Example 1: What's the chance of having a child with the disease (dd) if both parents are heterozygotes (Dd)?

Mother's allele   Father's allele   Child's outcome
P(♀D) = .5        P(♂D) = .5        P(DD) = .5*.5 = .25
P(♀D) = .5        P(♂d) = .5        P(Dd) = .5*.5 = .25
P(♀d) = .5        P(♂D) = .5        P(dD) = .5*.5 = .25
P(♀d) = .5        P(♂d) = .5        P(dd) = .5*.5 = .25
                                    Total          = 1.0

P(dd) = (1 way to get dd) / (2^2 possible outcomes) = 1/4 = 0.25
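The four branches of the cross can be enumerated directly, which also illustrates the "with replacement" count of 2^2 outcomes. This is a minimal sketch assuming each parent passes D or d with equal probability, as in the example; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

alleles = ["D", "d"]

# Each parent independently passes one of two alleles, so there are
# 2**2 = 4 equally likely ordered outcomes (mother's allele, father's allele).
outcomes = list(product(alleles, repeat=2))

# Ignore which parent contributed which allele: Dd and dD are the same genotype.
genotype_counts = Counter("".join(sorted(pair)) for pair in outcomes)

genotype_probs = {g: count / len(outcomes) for g, count in genotype_counts.items()}
print(genotype_probs)
```

Merging the Dd and dD branches is what turns four equally likely ordered outcomes into the three-genotype sample space with probabilities .25, .50, and .25.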
Permutations without replacement
Example 1: Suppose you want to test PSA levels of 4 patients: A, B, C, D. In how many orders can you test them?
[Tree diagram: 4 choices for the first test, 3 for the second, 2 for the third, 1 for the last]
# permutations = 4 x 3 x 2 x 1 = 4! = 24
So there are 4! ways of doing 4 tests for 4 patients.
Reminder! Factorial notation: n! = n x (n-1) x (n-2) x … x 1
Example 2: What if you had 3 different tests and 5 people?
Test 1: 5 possible (A B C D E)
Test 2: only 4 possible
Test 3: only 3 possible
# permutations = 5 x 4 x 3 = 60
Summary: order matters, without replacement
Formally, "order matters" and "without replacement" use factorials:
  # permutations = n! / (n-r)!   (n objects, r positions to fill)
Note: This formula also worked for Example 1. We were picking 4 people for 4 spots, so 4!/(4-4)! = 4!/0! = 4! = 24.
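The factorial formula can be compared against a direct enumeration. The sketch below checks the two examples (4 patients in 4 slots, and 3 different tests for 5 people); the function name n_permutations is illustrative, while math.perm and itertools.permutations are standard-library calls (Python 3.8+ for math.perm).

```python
from itertools import permutations
from math import factorial, perm

def n_permutations(n, r):
    """Ordered selections of r objects from n without replacement: n! / (n-r)!."""
    return factorial(n) // factorial(n - r)

# Example 1: 4 patients tested in some order.
orders_of_four = n_permutations(4, 4)

# Example 2: 3 different tests assigned to 3 of 5 people.
people = "ABCDE"
assignments = list(permutations(people, 3))

print(orders_of_four)                              # 4!/0!
print(n_permutations(5, 3), perm(5, 3), len(assignments))
```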
Recall Permutation Theory…
If you want to see if there is a difference between the mean PSA scores for black (n=209) and white patients (n=5310) in the PSA example:
1. Calculate mean scores of black patients & white patients
2. Shuffle scores of 5000 random patients
3. The number of possible permutations of shuffling is 5519! / (5519-5000)! = a huge number of permutations
4. Compare original mean scores to mean scores of each permutation.
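Because the number of possible shuffles is far too large to enumerate, a permutation comparison is usually approximated with a random sample of shuffles. The sketch below is one common way to do that, not necessarily the procedure used in the lecture: it pools both groups, reshuffles the labels, and counts how often the shuffled difference in means is at least as large as the observed one. The PSA values are made up for illustration, and the function name and parameters are assumptions.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Fraction of random relabelings whose absolute difference in means
    is at least as large as the observed difference (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random reassignment of scores to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

# Hypothetical PSA scores, NOT data from the study.
group_a = [1.2, 0.8, 2.5, 3.1, 1.9, 2.2]
group_b = [1.0, 0.7, 1.4, 0.9, 1.1, 1.3]

p_value = permutation_test(group_a, group_b)
print(p_value)
```

A small returned fraction suggests the observed difference would be unusual if group labels were unrelated to the scores.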
2. Combinations: order doesn't matter
A combination helps determine the number of ways "r" objects can be chosen from a larger group of "n" objects.
Introduction to the combination function, or "choosing"
Written as: nCr
Spoken: "n choose r"
Example of combinations
If you have 3 identical tests, in how many ways can you choose 3 out of 5 patients to be tested?
5C3 = 5! / (3! (5-3)!) = (5 x 4)/2 = 10
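The "5 choose 3" count can be verified by listing every group of three patients. This is a small sketch; the patient labels A to E and the function name n_choose_r are illustrative, while math.comb and itertools.combinations are standard-library calls.

```python
from itertools import combinations
from math import comb, factorial

def n_choose_r(n, r):
    """Unordered selections of r objects from n: n! / (r! (n-r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

patients = "ABCDE"
groups = list(combinations(patients, 3))  # each group appears once, order ignored

print(n_choose_r(5, 3), comb(5, 3), len(groups))
for group in groups:
    print("".join(group))
```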
Example: Distinct vs. Nondistinct objects
Suppose you want to calculate mean PSA scores of 4 patients (3 white, 1 black): A (White), B (White), C (White), & D (Black). How many ways can we arrange the 4 patients based on race?
Total number of arrangements of 4 people = 4! = 24
However, based on race, 3 of them are interchangeable (White: A, B, C) and 1 stands alone (Black: D). If you only consider the race of the patients, there will be fewer arrangements possible…
For example: arrangement A B C D (W W W B) = arrangement C B A D (W W W B)
In fact, the arrangement (W W W B) can be done in 6 distinct ways:
  A B C D    A C B D    B C A D
  B A C D    C A B D    C B A D
= 3! permutations of white patients x 1 permutation of the black patient = 6 x 1 = 6
This is one race-based arrangement.
Similarly, the arrangement (W W B W) can be done in 3! x 1! = 6 ways:
  A B D C    A C D B    B C D A
  B A D C    C A D B    C B D A
Since we don't care about order within each race, the 4! ways of arranging the 4 patients reduce to:
  4! / (3! 1!) = 4!/6 = 4
Hence, the number of ways of arranging n objects, of which k are white and m are black:
  n! / (k! m!)
(1. White or black are just examples of being nondistinct. 2. Can be extended to any number of nondistinct sets.)
This is also a "choosing" problem, since we are choosing 3 tests for white patients & 1 for the black patient:
4C3 = 4C1 = 4! / (3! 1!) = 4
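The reduction from 24 arrangements to 4 can be checked by generating all orderings of the four patients and keeping only the distinct race patterns. A minimal sketch, using the patient labels from the example; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

# Patients A, B, C are white (W); D is black (B), as in the example.
races = {"A": "W", "B": "W", "C": "W", "D": "B"}

# All 4! orderings of the distinct patients.
all_orders = list(permutations(races))

# Collapse each ordering to its race pattern; a set keeps each pattern once.
race_patterns = {"".join(races[p] for p in order) for order in all_orders}

# n! / (k! m!) with n = 4, k = 3 white, m = 1 black.
expected = factorial(4) // (factorial(3) * factorial(1))

print(len(all_orders), sorted(race_patterns), expected)
```

Each race pattern is produced by 3! x 1! = 6 of the 24 orderings, which is why dividing by 6 gives the count.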
Summary: combinations
If r objects are taken from a set of n objects without replacement and disregarding order, how many different samples are possible?
Formally, "order doesn't matter" and "without replacement" use choosing:
  nCr = n! / (r! (n-r)!)
Summary of Counting Methods
Counting methods for computing probabilities:
- Permutations: order matters!
  - With replacement: n^r
  - Without replacement: n(n-1)(n-2)…(n-r+1) = n! / (n-r)!
- Combinations: order doesn't matter
  - Without replacement: nCr = n! / (r! (n-r)!)
Independence
Formal definition: A and B are independent if and only if P(A&B) = P(A)*P(B)

Going back to our genetics example: the mother's and father's alleles are segregating independently.
  P(♂D|♀D) = .5 and P(♂D|♀d) = .5
Conditional probability: P(♂D|♀d) is read as "the probability that the father passes a D allele given that the mother passes a d allele."
What the father's gamete looks like is not dependent on the mother's; it doesn't depend on which branch you start on!
Formally, P(DD) = .25 = P(D♂)*P(D♀)

Joint probability: the probability of two events happening simultaneously.
Marginal probability: the probability that an event happens at all, ignoring all other outcomes.
On the tree

Mother's allele           Father's allele              Child's outcome
(marginal: mother)        (conditional probability)    (joint probability)
P(♀D) = .5                P(♂D|♀D) = .5                P(DD) = .5*.5 = .25
P(♀D) = .5                P(♂d|♀D) = .5                P(Dd) = .5*.5 = .25
P(♀d) = .5                P(♂D|♀d) = .5                P(dD) = .5*.5 = .25
P(♀d) = .5                P(♂d|♀d) = .5                P(dd) = .5*.5 = .25
                                                       Total          = 1.0

Marginal probability, father: P(♂D) = .5, P(♂d) = .5
Independent ≠ mutually exclusive
- Events A and ~A are mutually exclusive, but they are NOT independent.
- P(A&~A) = 0
- P(A)*P(~A) ≠ 0 (whenever 0 < P(A) < 1)
- Conceptually, once A has happened, ~A is impossible; thus, they are completely dependent.
Practice problem
If HIV has a prevalence of 3% in San Francisco, and a particular HIV test has a false positive rate of .001 and a false negative rate of .01, what is the probability that a random person selected off the street will test positive?
Answer

Marginal probability        Conditional probability        Joint probability
of carrying the virus       of the test result
P(+) = .03                  P(test +|+) = .99              P(+, test +) = .0297
P(+) = .03                  P(test -|+) = .01              P(+, test -) = .0003
P(-) = .97                  P(test +|-) = .001             P(-, test +) = .00097
P(-) = .97                  P(test -|-) = .999             P(-, test -) = .96903
                                                           Total         = 1.0

Conditional probability, e.g. P(test +|+): the probability of testing + given that a person is +.
Joint probability, e.g. P(+, test +): the probability of being + and testing +.
Marginal probability of testing positive: P(test +) = .0297 + .00097 = .03067

P(+ & test +) ≠ P(+)*P(test +): .0297 ≠ .03*.03067 (= .00092). Dependent!
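The tree can be reproduced in a few lines, which makes it easy to check that the four joint probabilities sum to 1 and that infection status and test result are dependent. A minimal sketch using the rates from the practice problem; the dictionary layout and variable names are illustrative.

```python
prevalence = 0.03            # P(+)
false_positive_rate = 0.001  # P(test + | -)
false_negative_rate = 0.01   # P(test - | +)

p_infected = prevalence
p_healthy = 1 - prevalence

# Joint probability of each branch = marginal x conditional.
joint = {
    ("+", "test+"): p_infected * (1 - false_negative_rate),
    ("+", "test-"): p_infected * false_negative_rate,
    ("-", "test+"): p_healthy * false_positive_rate,
    ("-", "test-"): p_healthy * (1 - false_positive_rate),
}

# Marginal probability of testing positive: add the two "test+" branches.
p_test_positive = joint[("+", "test+")] + joint[("-", "test+")]

# Under independence the joint would equal the product of the marginals.
product_of_marginals = p_infected * p_test_positive

for branch, p in joint.items():
    print(branch, round(p, 5))
print("P(test +) =", round(p_test_positive, 5))
print("P(+)*P(test +) =", round(product_of_marginals, 5))
```

The joint probability of being infected and testing positive is far larger than the product of the marginals, so the two events are not independent.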
Law of total probability
One of these has to be true (mutually exclusive, collectively exhaustive). They sum to 1.0.
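Law of total probability
- Formal rule: marginal probability for event A =
    P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + … + P(A|Bk)*P(Bk)
- Where: B1, B2, …, Bk are mutually exclusive and collectively exhaustive events
[Diagram: event A overlapping the regions B1, B2, B3]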
Non-independent events/Conditional Probability
When two events are not independent, the probability that one occurs depends on whether the other has occurred.
Bayes’ Rule
Bayes' Rule
Definition: Let A and B be two events with P(B) ≠ 0. The conditional probability of A given B is:
  P(A|B) = P(A&B) / P(B)
Bayes' Rule:
  P(A|B) = P(B|A)*P(A) / P(B)
OR
  P(A|B) = P(B|A)*P(A) / [P(B|A)*P(A) + P(B|~A)*P(~A)]
(the denominator comes from the "Law of Total Probability")
In-Class Exercise
- If HIV has a prevalence of 3% in San Francisco, and a particular HIV test has a false positive rate of .001 and a false negative rate of .01, what is the probability that a random person who tests positive is actually infected (also known as "positive predictive value")?
Answer: using probability tree

P(+) = .03    P(test +|+) = .99     P(+, test +) = .0297   (test + branch)
P(+) = .03    P(test -|+) = .01     P(+, test -) = .0003
P(-) = .97    P(test +|-) = .001    P(-, test +) = .00097  (test + branch)
P(-) = .97    P(test -|-) = .999    P(-, test -) = .96903
                                    Total         = 1.0

A positive test places one on either of the two "test +" branches. But only the top branch also fulfills the event "true infection." Therefore, the probability of being infected is the probability of being on the top branch given that you are on one of the two "test +" branches marked above.
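Answer: using Bayes' rule
P(+|test +) = P(test +|+)*P(+) / [P(test +|+)*P(+) + P(test +|-)*P(-)]
            = (.99)(.03) / [(.99)(.03) + (.001)(.97)]
            = .0297 / .03067
            ≈ .968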
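The positive predictive value can be computed directly from prevalence and the two error rates by applying Bayes' rule with the law of total probability in the denominator. This is a minimal sketch; the function and parameter names are illustrative, and the inputs are the rates from the exercise.

```python
def positive_predictive_value(prevalence, false_positive_rate, false_negative_rate):
    """P(infected | test +) via Bayes' rule."""
    sensitivity = 1 - false_negative_rate                  # P(test + | +)
    true_pos = sensitivity * prevalence                    # P(+, test +)
    false_pos = false_positive_rate * (1 - prevalence)     # P(-, test +)
    # Denominator is P(test +), from the law of total probability.
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(
    prevalence=0.03, false_positive_rate=0.001, false_negative_rate=0.01
)
print(round(ppv, 3))

# The same test in a lower-prevalence setting (hypothetical 0.1% prevalence).
ppv_low_prevalence = positive_predictive_value(0.001, 0.001, 0.01)
print(round(ppv_low_prevalence, 3))
```

The second call uses a hypothetical prevalence to show that the same test yields a much lower positive predictive value when the condition is rare.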
Conditional probability in epidemiology: Odds and Risk (probability)
Definitions:
Risk = P(A) = cumulative probability (you specify the time period!)
  For example, what's the probability that a person with a high sugar intake develops diabetes in 1 year, 5 years, or over a lifetime?
Odds = P(A)/P(~A)
  For example, "the odds are 3 to 1 against a horse" means that the horse has a 25% probability of winning.
Note: An odds is always higher than its corresponding probability when the probability is strictly between 0 and 1. At a probability of 0 the two are equal, and at a probability of 100% the odds are undefined (infinite).
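Converting between odds and probability is a one-line calculation in each direction. The sketch below uses the horse example from the slide; the function names are illustrative.

```python
def odds_from_probability(p):
    """Odds = P(A) / P(~A); undefined when p equals 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

# "3 to 1 against" means odds of winning are 1 to 3.
p_win = probability_from_odds(1 / 3)
print(round(p_win, 2))

# Odds exceed the probability for every p strictly between 0 and 1.
for p in (0.01, 0.25, 0.5, 0.9):
    print(p, round(odds_from_probability(p), 3))
```

For small probabilities the odds and the probability are nearly equal, and they diverge as the probability grows.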
Introduction to the 2 x 2 Table

                   Exposure (E)    No Exposure (~E)
Disease (D)        a               b                   a+b = P(D)
No Disease (~D)    c               d                   c+d = P(~D)
                   a+c = P(E)      b+d = P(~E)

Row totals: marginal probability of disease
Column totals: marginal probability of exposure
Coming soon…(Applications of today's lecture)
- More on odds ratios, risk ratios
- Patterns of categorical data/distributions
- Frequency tables
- Chi square
- Logistic regression
- Kaplan Meier, survival analysis
Special thanks to Dr. Cobb for her great slides from last year!