Topic 6: Probability. Dr J Frost (jfrost@tiffin.kingston.sch.uk). Last modified: 18th July 2013
Slide guidance. Key to question types:
SMC: Senior Maths Challenge (www.ukmt.org.uk). The level, 1 being the easiest, 5 the hardest, will be indicated.
BMO: British Maths Olympiad. Those with high scores in the SMC qualify for BMO Round 1. The top hundred students from this go through to BMO Round 2. Questions in these slides will have their round indicated.
MAT: Maths Aptitude Test. Admissions test for those applying for Maths and/or Computer Science at Oxford University.
Uni: Questions used in university interviews (possibly Oxbridge).
Frost: A Frosty Special. Questions from the deep dark recesses of my head.
Classic: Well-known problems in maths.
STEP: Exam used as a condition for offers to universities such as Cambridge and Bath.
Slide guidance. Any box with a ? can be clicked to reveal the answer (this works particularly well with interactive whiteboards!). Make sure you’re viewing the slides in slideshow mode. For multiple choice questions (e.g. SMC), click your choice to reveal the answer (try below!) Question: The capital of Spain is: A: London B: Paris C: Madrid
Topic 6: Probability
Part 1: Manipulating Probabilities
Part 2: Random Variables
a. Random Variables
b. Discrete and Continuous Distributions
c. Mean and Expected Value
d. Uniform Distributions
e. Standard Deviation and Variance
Part 3: Common Distributions
a. Binomial
b. Bernoulli
c. Poisson
d. Geometric
e. Normal/Gaussian
Some starting notes. Only some of those reading this will have done a Statistics module at A Level, so only GCSE knowledge is assumed. There is some overlap with the field of Combinatorics: for probability problems relating to ‘arrangements’ of things, look there instead.
Probability and Stats questions in…
In university interviews: Probability questions frequently come up (although not technically requiring any more than GCSE theory). In my experience, applicants tend to do particularly badly at these questions.
In SMC: Harder probability questions are quite rare (although one appeared towards the end of 2012’s paper).
In BMO: Used to be moderately common, but less so nowadays.
In STEP: Two questions at the end of every paper. You could avoid these, but you broaden your choice if you prepare for them.
ζ Topic 6 – Probability Part 1: Manipulating Probabilities
Events and Sets An event is a set of outcomes. ? ? ?
GCSE Recap ? ? ?
More useful identities ? ?
Conditional probabilities. We might want to express the probability of an event given that another occurred: the probability that A occurred given B occurred. To appreciate conditional probabilities, consider a probability tree (1st pick, then 2nd pick; branches Red and Green). A branch on the second level represents the probability that a green counter was picked second GIVEN that a red counter was picked first.
Conditional probabilities. Using the tree, we can construct the following identity for conditional probabilities: A B
Conditional probabilities The Common Sense Method The Formal Method ? ?
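To make the two methods concrete, here is a small Python sketch. The bag contents (3 red and 2 green counters, drawn without replacement) are assumed for illustration, since the slide's tree leaves the numbers unspecified:

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 2 green counters, drawn without replacement.
# (These numbers are assumed; the slide's tree does not specify them.)
red, green = 3, 2
total = red + green

# Formal method: P(green 2nd | red 1st) = P(red 1st AND green 2nd) / P(red 1st)
p_red_first = Fraction(red, total)
p_red_then_green = p_red_first * Fraction(green, total - 1)
p_green_given_red = p_red_then_green / p_red_first

# Common-sense method: once a red is removed, 2 of the 4 remaining are green.
assert p_green_given_red == Fraction(green, total - 1)
print(p_green_given_red)  # 1/2
```

Note how the formal formula collapses to the common-sense answer: the P(red first) factor cancels.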
Examples (Source: Edexcel) ? ? ?
Bayes’ Rule relates causes and effects. It allows us to find the probability of the cause given the effect, if we know the probability of the effect given the cause. ?
Bayes’ Rule. But we don’t always need to know the probability of the effect. Question: The probability that a game is called off if it’s raining is 0.7. The probability it’s called off if it didn’t rain (e.g. due to player illness) is 0.05. The probability that it rains on any given day is 0.2. Andy Murray’s game is called off. What’s the probability that rain was the cause?
Bayes’ Rule. Question: The probability that a game is called off if it’s raining is 0.7. The probability it’s called off if it didn’t rain (e.g. due to player illness) is 0.05. The probability that it rains on any given day is 0.2. Andy Murray’s game is called off. What’s the probability that rain was the cause? ? ?
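The calculation behind the hidden answer can be sketched in Python, using only the numbers given in the question (the denominator comes from the law of total probability):

```python
# Numbers from the question on the slide.
p_off_given_rain = 0.7
p_off_given_dry = 0.05
p_rain = 0.2

# Law of total probability: P(off) = P(off|rain)P(rain) + P(off|dry)P(dry)
p_off = p_off_given_rain * p_rain + p_off_given_dry * (1 - p_rain)

# Bayes' rule: P(rain | off) = P(off | rain) * P(rain) / P(off)
p_rain_given_off = p_off_given_rain * p_rain / p_off
print(round(p_rain_given_off, 3))  # 0.778
```

So even though rain is fairly unlikely on any given day, it is much the more likely cause once we know the game was called off.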
ζ Topic 6 – Probability Part 2: Random Variables
Random Variables. A random variable is a variable which can have multiple values, each with an associated probability. The variable can be thought of as a ‘trial’ or ‘experiment’, representing something which can have a number of outcomes. A random variable has 3 things associated with it:
1. The outcomes: the values the random variable can take (e.g. outcomes of the throw of a die), formally known as the ‘support’.
2. A probability function: the probability associated with each outcome.
3. Parameters: constants used in our probability function that can be set (e.g. number of throws).
Example random variables. (This symbol means “for all”.) We use capital letters for random variables. Parameters are values we can control, but do not change across different outcomes. We’ll see plenty more examples.
Random variable (X): The single throw of a fair die. Outcomes: {1, 2, 3, 4, 5, 6}. Parameters: None. Probability Function: ?
Random variable (X): The single throw of an unfair die. Outcomes: {1, 2, 3, 4, 5, 6}. Parameters: we can set the probability of each outcome: p1, p2, …, p6. ? Probability Function: ? ?
Sketching the probability function. It’s often helpful to show the probability function as a graph: the probabilities P(X) on the vertical axis against the outcomes X (1 to 6) on the horizontal axis. Suppose a random variable X represents the single throw of a biased die.
Discrete vs Continuous Distributions. Discrete distributions are ones where the outcomes are discrete, e.g. the throw of a die, or the number of Heads seen in 10 throws. In contrast, continuous distributions allow us to model things like height, weight, etc. Here are two possible probability functions: a discrete one, P(X=k) against the number of times a target is hit (k), and a continuous one, P(X=h) against the height of a randomly picked person (h).
Discrete vs Continuous Distributions. For a discrete distribution (e.g. P(X=k) against the number of times a target is hit, k), we call the probability function the probability mass function (PMF for short).
Continuous Distributions. Does it make sense to talk about the probability of someone being exactly 2m tall? ? Clearly not, but we could for example find the probability of a height being in a particular range. The probability associated with a particular value is known as the probability density. Its value alone is not particularly meaningful (and can be greater than 1!), but finding the area in a range gives us a probability mass. This is similar to histograms, where the y-axis is the ‘frequency density’, and finding the area under the bars gives us the frequency.
Probability Density. Question: Archers fire arrows at a target. The probability of the arrow being a certain distance from the centre of the target is proportional to this distance. No archer is terrible enough that his arrow will be more than 1m from the centre. What’s the probability that an arrow is less than 0.5m from the centre? (Source: Frosty Special) ?
Probability is proportional to distance, and the maximum distance is 1m. Since the area under the graph must be 1, the maximum probability density must be 2, so that the area of the triangle is ½ × 2 × 1 = 1. We’re finding the probability of the arrow being between 0m and 0.5m, so find the area under the graph in this region. We can see this will be 0.25.
Probability Density. Question: Archers fire arrows at a target. The probability of the arrow being a certain distance from the centre of the target is proportional to this distance. No archer is terrible enough that his arrow will be more than 1m from the centre. What’s the probability that an arrow is less than 0.5m from the centre? (Source: Frosty Special)
Alternatively, using a cleaner integration approach:
Step 1: Use the information to express the proportionality relationship. ?
Step 3: Finally, integrate over the desired range. ?
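As a numerical check on the answer above, a short Python sketch. It uses the density f(x) = 2x from the worked solution and approximates the integral with a midpoint Riemann sum:

```python
# PDF f(x) = 2x on [0, 1]: density proportional to distance,
# scaled so that the total area under the graph is 1.
def pdf(x):
    return 2 * x

# Midpoint Riemann sum over [0, 0.5].
n = 100_000
a, b = 0.0, 0.5
width = (b - a) / n
area = sum(pdf(a + (i + 0.5) * width) * width for i in range(n))
print(round(area, 4))  # 0.25
```

This agrees with both the triangle-area argument and the exact integral of 2x from 0 to 0.5.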
Mean and Expected Value. Mean of a Sample ? Mean of a Random Variable ?
x:      0     1     2     3
P(X=x): 0.25  0.05  0.2   ?
Expected Value. Question: Two people randomly think of a real number between 0 and 100. What is the expected difference between their numbers? (i.e. the average range) (Source: Frosty Special) (Hint: Make your random variable the difference between the two numbers.) As with many problems, it’s easier to consider a simpler scenario. Consider just, say, integers between 0 and 10. How many ways can the numbers be chosen if the range is 0? Or the range is 1? Or 2? What do you notice? Step 1: Use the information to express the proportionality relationship: ? ?
Expected Value. Question: Two people randomly think of a real number between 0 and 100. What is the expected difference between their numbers? (i.e. the average range) (Source: Frosty Special) (Hint: Make your random variable the difference between the two numbers.) Step 3: Finally, given our known PDF, find E[X]. ? One of the harder problem sheet exercises is to consider what happens when we introduce a 3rd number!
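The hidden answer can be checked by simulation. A Monte Carlo sketch in Python; the exact value (a standard result for two independent uniform picks on [0, 100], derivable from the triangular PDF above) is 100/3:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Estimate E[|X - Y|] for X, Y independently uniform on [0, 100].
n = 1_000_000
total = sum(abs(random.uniform(0, 100) - random.uniform(0, 100))
            for _ in range(n))
estimate = total / n
print(estimate)  # close to 100/3 ≈ 33.33
```

The simulation is no substitute for the integration, but it is a quick sanity check on a derived PDF.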
Modifying Random Variables. We often modify the value of random variables. Example: X = outcome of a single throw of a die, Y = outcome of another die. Consider X + 1. What does it mean? We add 1 to all the outcomes of the die (i.e. we now have 2 to 7). ? The probabilities remain unaffected. How does the expected value change? ? ?
Modifying Random Variables. We often modify the value of random variables. Example: X = outcome of a single throw of a die, Y = outcome of another die. Now consider X + Y. What does it mean? We consider all possible outcomes of X and Y, and combine them by adding them. The new set of outcomes is 2 to 12. ? Clearly we need to recalculate the probabilities.
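Recalculating the probabilities for X + Y amounts to summing, for each total, the probabilities of all (X, Y) pairs that produce it. A Python sketch with two fair dice:

```python
from collections import defaultdict
from fractions import Fraction

# PMF of a single fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# P(X + Y = s) = sum over pairs (x, y) with x + y = s of P(X=x) * P(Y=y)
pmf_sum = defaultdict(Fraction)
for x, px in die.items():
    for y, py in die.items():
        pmf_sum[x + y] += px * py

print(sorted(pmf_sum))       # outcomes 2 to 12
print(pmf_sum[7])            # 1/6: the most likely total
```

Notice the new PMF is no longer uniform: totals near 7 have more contributing pairs.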
Uniform Distribution A uniform distribution is one where all outcomes are equally likely. Discrete Example Continuous Example ? ?
Standard Deviation and Variance. Variance of a Sample; Variance of a Random Variable. (Note: the expected value of a constant is just the constant itself.)
Standard Deviation and Variance. Example: Find the variance of this biased spinner (which just has the values 1 and 2), represented by the random variable X.
k:      1    2
P(X=k): 0.6  0.4
?
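One way to check a variance calculation like this is with the identity Var(X) = E[X²] − E[X]². A Python sketch using exact fractions for the spinner above:

```python
from fractions import Fraction

# PMF of the biased spinner from the slide.
pmf = {1: Fraction(6, 10), 2: Fraction(4, 10)}

mean = sum(k * p for k, p in pmf.items())        # E[X]
mean_sq = sum(k**2 * p for k, p in pmf.items())  # E[X^2]
variance = mean_sq - mean**2                     # Var(X) = E[X^2] - E[X]^2

print(mean)      # 7/5
print(variance)  # 6/25 (= 0.24)
```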
STEP Question (We’ll do part (b) a bit later) What might be going through your head: “I need to consider each of the 3 cases. ” “I have a PDF. This requires me to use definite integration. ”
STEP Question
Mean and Variance of Random Variables. Click to choose points uniformly across x. No: we can see that because the lines are steeper either side of the circle, we’d likely have fewer points in these regions, and thus we haven’t chosen a point with uniform distribution around the circle. We’d have a similar problem if we were trying to pick a random point on a sphere and picked a random latitude/longitude coordinate (we’d favour the poles).
Mean and Variance of Random Variables In which case, how can we make sure we pick a point randomly? ? ?
Summary
ζ Topic 6 – Probability Part 3: Common Distributions
Common Distributions. We’ve seen so far that we can build whatever random variable we like using two essential ingredients: specifying the outcomes, and specifying a PMF/PDF that associates a probability with each outcome. But there are a number of well-known distributions for which we already have the outcomes and probability function defined: we just need to set some parameters.
Bernoulli: e.g. the throw of a (possibly biased) coin.
Multivariate: e.g. the throw of a (possibly biased) die.
Binomial: e.g. counts the number of heads and tails in 10 throws.
Multinomial: e.g. counting the number of each face in 10 throws of a die.
Poisson: e.g. the number of cars which pass in the next hour, given a known average rate.
Geometric: e.g. the number of times I have to flip a coin before I see a heads.
We won’t explore these:
Exponential: e.g. the possible time before a volcano next erupts.
Dirichlet: e.g. the possible probability distributions for the throw of a die, given I threw a die 60 times and saw 10 ones, 10 twos, 10 threes, 10 fours, 10 fives and 10 sixes.
Bernoulli Distribution. The Bernoulli Distribution is perhaps the simplest distribution. It models an experiment with just two outcomes, often referred to as ‘success’ and ‘failure’. It might represent the single throw of a coin (where ‘Heads’ could represent a ‘success’).
Description: A single trial with two outcomes.
Outcomes: “Failure”/”Success”, or {0, 1}. ?
Parameters: p, the probability of success. ?
Probability Function: ?
A trial with just two outcomes is known as a Bernoulli Trial. A sequence of Bernoulli Trials (all independent of each other) is known as a Bernoulli Process. An example is repeatedly flipping a coin and recording the result each time.
Binomial Distribution Question: If I throw a biased coin (with probability of heads p) 8 times, what is the probability I see 3 heads? H H H T T T ? ? ?
Binomial Distribution. Therefore, in general, the probability of k successes in n trials is:
Description: Number of ‘successes’ in n trials.
Outcomes: {0, 1, 2, …, n}, i.e. between 0 and n successes. ?
Parameters: ?
Probability Function: ?
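The general formula (choose which k of the n trials succeed, then multiply the probabilities) can be written as a short Python function:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Check against the coin question: 3 heads in 8 throws of a fair coin.
print(binomial_pmf(3, 8, 0.5))  # 56/256 ≈ 0.2188
```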
Frost Real-Life Example. While on holiday in Hawaii, I was having lunch with a family, where an unusually high number were left-handed: 5 out of the 8 of us (including myself). I was asked what the probability of this was. (Roughly 10% of the world population is left-handed.) ? ? (This example points out one of the assumptions of the Binomial Distribution: that each trial is independent. But this was unlikely to be the case, since most at the table were related, and left-handedness is in part hereditary. Sometimes when we model a scenario using an ‘off-the-shelf’ distribution, we have to compromise by making simplifying assumptions.)
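A sketch of the calculation in Python. It assumes the question is read as "at least 5 of the 8 are left-handed" with p = 0.1; the exact reading behind the hidden answer is an assumption:

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(at least 5 of 8 left-handed), assuming independence and p = 0.1.
# (The independence assumption is the one the slide itself questions.)
p_at_least_5 = sum(binomial_pmf(k, 8, 0.1) for k in range(5, 9))
print(p_at_least_5)  # roughly 4e-4: very unlikely under the model
```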
Summary of Distributions so far. Similarly, a multivariate distribution represents a single trial with any number of outcomes. A multinomial distribution is a generalisation of the Binomial Distribution, which gives us the probability of counts when we have multiple outcomes.
Generalise to n trials: Bernoulli (e.g. “What’s the probability of getting a Heads?”) becomes Binomial (e.g. “What’s the probability of getting 3 Heads and 2 Tails?”).
Generalise to k outcomes: Bernoulli becomes Multivariate (e.g. “What’s the probability of getting a 5?”).
Generalise to n trials: Multivariate becomes Multinomial (e.g. “What’s the probability of rolling 3 sixes, 2 fours and a 1?”).
(Use your combinatorics knowledge to try and work out the probability function for this!)
Poisson Distribution Cars pass you on a road at an average rate of 5 cars a minute. What’s the probability that 3 cars will pass you in the next minute? When you have a known average ‘rate’ of events occurring, we can use a Poisson Distribution to model the number of events that occur within that period. We can see that when the average rate is 10 (say per minute), we’re most likely to see 10 cars. But technically, we could see a million cars (even if the probability is very low!) k is the number of events (e. g. seeing a car) that occur.
Poisson Distribution. Assumptions that the Poisson Distribution makes:
1. All events occur independently (e.g. a car passing you doesn’t affect when the next car will pass you).
2. Events are equally likely to occur at any point in time (e.g. we’re not any more likely to see cars at the beginning of the period than at the end).
Description: Number of events occurring within a fixed period, given an average rate.
Outcomes: {0, 1, 2, …} up to infinity. ? i.e. the Poisson Distribution is a DISCRETE distribution.
Parameters: ?
Probability Function: ?
Poisson Distribution. Example: An active volcano erupts on average 5 times each year. It’s equally likely to erupt at any time.
Q1) What’s the probability that it erupts 10 times next year? ?
Q2) What’s the probability that it erupts at all next year? ?
Q3) What’s the probability that it next erupts between 2 and 3 years after the current date? ?
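The three answers can be sketched in Python. For Q3, the sketch uses the fact that "no eruptions in t years" has probability e^(−5t), so the next eruption falls between years 2 and 3 with probability e^(−10) − e^(−15); this reading of Q3 is an assumption:

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """P(exactly k events in a period, given the average rate per period)."""
    return exp(-rate) * rate**k / factorial(k)

rate = 5  # eruptions per year, from the question

q1 = poisson_pmf(10, rate)            # exactly 10 eruptions next year
q2 = 1 - poisson_pmf(0, rate)         # at least one eruption next year
q3 = exp(-2 * rate) - exp(-3 * rate)  # quiet for 2 years, but erupts by year 3

print(round(q1, 4), round(q2, 4))  # ≈ 0.0181 and ≈ 0.9933
print(q3)                          # a very small number, ≈ 4.5e-5
```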
Relationship to the Binomial Distribution Imagine that we segment this fixed period into a number of smaller chunks of time, in each of which an event can occur (which we’ll describe as a ‘success’), or not occur. 1 minute A car passed in this period! If we presumed that we only had at most one car passing in each of these smaller periods of time, then we could use a Binomial Distribution to model the total number of cars that pass across 1 minute, because it models the number of successes. Of course, multiple cars could actually pass within each smaller segment of time. How would we fix this?
Relationship to the Binomial Distribution We could simply use smaller chunks of time – in the limit, we have tiny slivers of time, so instantaneous that we couldn’t possibly have two cars passing at exactly the same time. 1 minute ?
Uniform Distribution. We saw earlier that a uniform distribution is one where each outcome is equally likely.
Description: Each outcome is equally likely.
Outcomes: x1, x2, …, xn. ?
Parameters: None.
Probability Function: ?
Examples: The throw of a fair die, the throw of a fair coin, the possible lottery numbers this week (presuming the ball machine isn’t biased!). ?
Geometric Distribution. You, Christopher Walken, are captured by the Viet Cong during the Vietnam War, and forced to play Russian Roulette. The gun has 6 slots on the barrel, one of which has a bullet; the other slots are empty. Before each shot, you rotate the barrel randomly, then shoot at your own head. If you survive, you repeat this ordeal. Q1) What’s the probability that you die on the first shot? ?
Geometric Distribution. If you have a number of trials, where in each trial you can have a ‘success’ or ‘failure’, and you repeat the trial until you have a success (at which point you stop), then the geometric distribution gives you the probability of succeeding on the 1st trial, the 2nd trial, and so on.
Description: Succeeding on the xth trial after previously failing. ?
Outcomes: {1, 2, 3, …}, the trial on which you succeed.
Parameters: ?
Probability Function: ?
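The probability function (k − 1 failures followed by one success) can be sketched in Python, using the Russian Roulette numbers from the previous slide:

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(first 'success' occurs on the k-th trial), success probability p."""
    return (1 - p) ** (k - 1) * p

# Russian Roulette: the chamber fires with probability 1/6 on each shot.
p = Fraction(1, 6)
print(geometric_pmf(1, p))  # 1/6: dying on the very first shot
print(geometric_pmf(3, p))  # 25/216: surviving twice, then dying
```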
Geometric Distribution. SMC question (level indicated).
Frost Real-Life Example. My mum (who works at John Lewis) was selling London Olympics ‘trading cards’, of which there were about 200 different cards to collect; they could be bought in packs. Her manager was curious how many cards you would have to buy on average before you collected them all. The problem was passed on to me! (Note: assume for simplicity that each card is equally likely to be acquired, unlike say ‘Pokemon cards’ [a childhood fad I never got into], where lower numbered cards are rarer.) Hint: Perhaps think of the trials needed to collect the next card as a geometric process? Then consider these processes all combined together. ? Explanation on next slide…
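The hinted approach gives the classic "coupon collector" result: the wait for each new card is a geometric process, and a geometric variable with success probability p has expected value 1/p. A Python sketch, assuming exactly 200 equally likely cards as the note suggests:

```python
# With k distinct cards still missing out of n, each purchase yields a new
# card with probability k/n, so the expected wait for it is n/k.
# Summing over k = n, n-1, ..., 1 gives n * (1 + 1/2 + ... + 1/n).
n = 200
expected_cards = sum(n / k for k in range(1, n + 1))
print(round(expected_cards))  # about 1176 cards on average
```

So you'd expect to buy nearly six times as many cards as there are distinct designs.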
Frost Real-Life Example
Coin Conundrums. How would you model a fair coin given you have just a fair die? Solution: Easy! Roll the die. If you get, say, an even number, declare ‘Heads’, else declare ‘Tails’. ?
How would you model a fair die given you have just a fair coin? ?
How would you model a fair coin using an unfair coin? ?
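The third puzzle has a classic solution, von Neumann's trick, which may or may not be the one hidden behind the answer box: flip the unfair coin twice; Heads-then-Tails and Tails-then-Heads are equally likely (both have probability p(1 − p)), so map one pair to 'Heads', the other to 'Tails', and retry on HH or TT. A Python sketch with an assumed bias of 0.9:

```python
import random

def fair_coin_from_biased(biased_flip):
    """Von Neumann's trick: flip twice; HT -> True, TH -> False, else retry.
    Works for any bias, since P(HT) = P(TH) = p * (1 - p)."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a

random.seed(1)  # fixed seed for reproducibility

def biased():
    return random.random() < 0.9  # a heavily biased coin (bias assumed)

flips = [fair_coin_from_biased(biased) for _ in range(10_000)]
print(sum(flips) / len(flips))  # close to 0.5 despite the 0.9 bias
```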
Gaussian/Normal Distribution. A Gaussian/Normal distribution is a continuous distribution which has a ‘bell-curve’ type shape. It’s useful for modelling variables where the values are clustered about the mean, with the probability dropping off as we move away from it. IQ (“Intelligence Quotient”) is a good example. The mean is (by definition) 100, and the probability of having a given IQ drops off symmetrically, with Standard Deviation 15 (by definition).
Z-values. (Graph: P(X=x) against IQ, with marks at 70, 85, 100, 115, 130 and 145, i.e. whole numbers of standard deviations from the mean.) ?
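Instead of looking z-values up in a table, the standard Normal CDF can be computed from the error function. A Python sketch using the IQ parameters from the slide (mean 100, standard deviation 15):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal distribution, via the error function."""
    z = (x - mu) / sigma  # the z-value: standard deviations from the mean
    return 0.5 * (1 + erf(z / sqrt(2)))

# Proportion of people with IQ below 130 (z = 2):
print(round(normal_cdf(130, mu=100, sigma=15), 4))  # 0.9772
```

The printed value matches the entry a z-table gives for z = 2.00.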
Z-values. We look up the units and tenths digits of our z-value here… …and our hundredths digit here. ?
Z-values