Chapter 7 Random Variables and Discrete Probability Distributions


7.2 Random Variables and Probability Distributions
• A random variable is a function or rule that assigns a numerical value to each simple event in a sample space.
• A random variable reflects the aspect of a random experiment that is of interest to us.
• There are two types of random variables:
  • Discrete random variable
  • Continuous random variable

Discrete and Continuous Random Variables
• A random variable is discrete if it can assume a countable number of values.
• A random variable is continuous if it can assume an uncountable number of values.
• Discrete random variable: after the first value is defined, the second value and any value thereafter are known (0, 1, 2, 3, ...). Therefore, the number of values is countable.
• Continuous random variable: after the first value is defined, any number can be the next one (for example 0, 1/16, 1/4, 1/2, 1). Therefore, the number of values is uncountable.

Discrete Probability Distribution
• A table, formula, or graph that lists all possible values a discrete random variable can assume, together with the associated probabilities, is called a discrete probability distribution.
• To calculate the probability that the random variable X assumes the value x, P(X = x):
  • add the probabilities of all the simple events for which X is equal to x, or
  • use probability calculation tools (tree diagram), or
  • apply probability definitions.

Requirements for a Discrete Distribution
• If a random variable can assume values x_i, then the following must be true:
  1. 0 ≤ p(x_i) ≤ 1 for all x_i
  2. Σ p(x_i) = 1, where the sum is taken over all x_i
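The two standard requirements (every probability lies between 0 and 1, and the probabilities sum to 1) can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_valid_distribution` and the tolerance are illustrative assumptions, and the sample values are those of Example 7.2.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every p(x) lies in [0, 1] and the p(x) sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Number of sales in three calls (values from Example 7.2)
sales = {0: 0.512, 1: 0.384, 2: 0.096, 3: 0.008}
print(is_valid_distribution(sales))             # True
print(is_valid_distribution({0: 0.7, 1: 0.4}))  # False: the values sum to 1.1
```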

Distribution and Relative Frequencies
• In practice, probability distributions are often estimated from relative frequencies.
• Example 7.1
  • A survey reveals the following frequencies (in 1,000s) for the number of color TVs per household.

  Number of TVs (x)   Number of households (1,000s)   p(x)
  0                    1,218                          1,218/Total = .012
  1                   32,379                          .319
  2                   37,961                          .374
  3                   19,387                          .191
  4                    7,714                          .076

Determining Probability of Events
• The probability distribution can be used to calculate the probability of different events.
• Example 7.1 (continued): calculate the probability of the following events:
  • P(the number of color TVs is 3) = P(X = 3) = .191
  • P(the number of color TVs is two or more) = P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = .374 + .191 + .076 + .028 = .669
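The event probabilities of Example 7.1 can be reproduced by adding p(x) over the values that make up the event. This is a minimal sketch, not part of the original slides; the helper name `prob` is an illustrative assumption, and the probabilities are the ones quoted in the example.

```python
# p(x) for the number of color TVs per household, as quoted in Example 7.1
p = {0: 0.012, 1: 0.319, 2: 0.374, 3: 0.191, 4: 0.076, 5: 0.028}

def prob(event):
    """P(event): add p(x) over every value x for which event(x) is true."""
    return sum(px for x, px in p.items() if event(x))

p_three = prob(lambda x: x == 3)        # P(X = 3)
p_two_or_more = prob(lambda x: x >= 2)  # P(X >= 2)
print(round(p_three, 3), round(p_two_or_more, 3))  # 0.191 0.669
```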

Developing a Probability Distribution
• Probability calculation techniques can be used to develop probability distributions.
• Example 7.2
  • A mutual fund salesperson knows that there is a 20% chance of closing a sale on each call she makes.
  • What is the probability distribution of the number of sales if she plans to call three customers?

Developing a Probability Distribution
• Solution
  • Use probability rules and trees.
  • Define the event S = {a sale is made}. On each call, P(S) = .2 and P(S^C) = .8.
  • [Figure: probability tree for three calls, each branching into S with probability .2 and S^C with probability .8]
  • Each path with exactly two sales has probability (.2)²(.8) = .032; each path with exactly one sale has probability (.2)(.8)² = .128.

  x   p(x)
  3   (.2)³ = .008
  2   3(.032) = .096
  1   3(.128) = .384
  0   (.8)³ = .512
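The tree calculation can be sketched by enumerating all eight paths of three calls and accumulating each path's probability under the number of sales on it. This is a minimal illustration, not part of the original slides; the variable names are assumptions.

```python
from itertools import product

P_SALE = 0.2  # chance of closing a sale on one call (Example 7.2)

dist = {x: 0.0 for x in range(4)}
for calls in product([True, False], repeat=3):  # the 8 paths of the tree
    path_prob = 1.0
    for sale in calls:
        path_prob *= P_SALE if sale else 1 - P_SALE
    dist[sum(calls)] += path_prob  # X = number of sales on this path

for x in sorted(dist, reverse=True):
    print(x, round(dist[x], 3))
# 3 0.008
# 2 0.096
# 1 0.384
# 0 0.512
```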

7.3 Describing the Population/Probability Distribution
• The probability distribution represents a population.
• We're interested in describing the population by computing various parameters.
• Specifically, we calculate the population mean and population variance.

Population Mean (Expected Value)
• Given a discrete random variable X with values x_i that occur with probabilities p(x_i), the population mean of X is
  E(X) = μ = Σ x_i p(x_i), where the sum is taken over all x_i.

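Population Variance
• Let X be a discrete random variable with possible values x_i that occur with probabilities p(x_i), and let E(X) = μ. The variance of X is defined by
  V(X) = σ² = Σ (x_i - μ)² p(x_i), where the sum is taken over all x_i.
• The standard deviation is σ = (σ²)^(1/2).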

The Mean and the Variance
• Example 7.3
  • Find the mean, the variance, and the standard deviation for the population of the number of color televisions per household in Example 7.1.
• Solution
  • E(X) = μ = Σ x_i p(x_i) = 0·p(0) + 1·p(1) + 2·p(2) + … = 0(.012) + 1(.319) + 2(.374) + … = 2.084
  • V(X) = σ² = Σ (x_i - μ)² p(x_i) = (0 - 2.084)² p(0) + (1 - 2.084)² p(1) + (2 - 2.084)² p(2) + … = 1.107
  • σ = (1.107)^(1/2) = 1.052
  • A shortcut formula can also be used for the variance.

The Mean and the Variance
• Solution (continued)
  • The variance can also be calculated as follows:
    σ² = Σ x_i² p(x_i) - μ² = [0²(.012) + 1²(.319) + 2²(.374) + …] - (2.084)² = 1.107
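Both routes to the variance in Example 7.3 can be checked numerically. The following is a minimal sketch, not part of the original slides; the variable names are illustrative, and the probabilities are those quoted for Example 7.1.

```python
import math

# p(x) for the number of color TVs per household, as quoted in Example 7.1
p = {0: 0.012, 1: 0.319, 2: 0.374, 3: 0.191, 4: 0.076, 5: 0.028}

mu = sum(x * px for x, px in p.items())                           # E(X)
var = sum((x - mu) ** 2 * px for x, px in p.items())              # definition
var_shortcut = sum(x ** 2 * px for x, px in p.items()) - mu ** 2  # shortcut
sigma = math.sqrt(var)

print(round(mu, 3), round(var, 3), round(sigma, 3))  # 2.084 1.107 1.052
print(math.isclose(var, var_shortcut))               # True
```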

Laws of Expected Value and Variance
• Laws of Expected Value
  • E(c) = c
  • E(X + c) = E(X) + c
  • E(cX) = cE(X)
• Laws of Variance
  • V(c) = 0
  • V(X + c) = V(X)
  • V(cX) = c²V(X)

Laws of Expected Value and Variance
• Example 7.4
  • The monthly sales at a computer store have a mean of $25,000 and a standard deviation of $4,000.
  • Profits are 30% of the sales less fixed costs of $6,000.
  • Find the mean and standard deviation of the monthly profit.

Laws of Expected Value and Variance
• Solution
  • Profit = .30(Sales) - 6,000
  • E(Profit) = E[.30(Sales) - 6,000]
      = E[.30(Sales)] - 6,000           by E(X + c) = E(X) + c
      = .30 E(Sales) - 6,000            by E(cX) = cE(X)
      = (.30)(25,000) - 6,000 = 1,500
  • V(Profit) = V[.30(Sales) - 6,000]
      = V[.30(Sales)]                   by V(X + c) = V(X)
      = (.30)² V(Sales)                 by V(cX) = c²V(X)
      = (.30)²(4,000)² = 1,440,000
  • σ = (1,440,000)^(1/2) = 1,200
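The profit calculation of Example 7.4 applies the laws to a linear function of sales. A minimal sketch, not part of the original slides; the names `a` and `b` for the slope and the constant are illustrative assumptions.

```python
import math

mean_sales, sd_sales = 25_000, 4_000  # given in Example 7.4
a, b = 0.30, -6_000                   # Profit = a * Sales + b

mean_profit = a * mean_sales + b      # E(aX + b) = a E(X) + b
var_profit = a ** 2 * sd_sales ** 2   # V(aX + b) = a^2 V(X)
sd_profit = math.sqrt(var_profit)

print(round(mean_profit), round(var_profit), round(sd_profit))  # 1500 1440000 1200
```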

7.4 Bivariate Distributions
• The bivariate (or joint) distribution is used when the relationship between two random variables is studied.
• The probability that X assumes the value x and Y assumes the value y is denoted
  p(x, y) = P(X = x and Y = y)


Bivariate Distributions
• Example 7.5
  • Xavier and Yvette are two real estate agents. Let X and Y denote the number of houses that Xavier and Yvette will sell next week, respectively.
  • The bivariate probability distribution is presented next.

Bivariate Distributions
• Example 7.5 (continued): the joint distribution p(x, y)

               X
  Y        0     1     2
  0      .12   .42   .06
  1      .21   .06   .03
  2      .07   .02   .01

[Figure: bar chart of p(x, y) for X = 0, 1, 2 and Y = 0, 1, 2]

Marginal Probabilities
• Example 7.5 (continued)
  • Sum across rows and down columns.

               X
  Y        0     1     2    p(y)
  0      .12   .42   .06    .60
  1      .21   .06   .03    .30
  2      .07   .02   .01    .10
  p(x)   .40   .50   .10   1.00

  • The marginal probability P(X = 0) = p(0, 0) + p(0, 1) + p(0, 2) = .40.
  • The marginal probability P(Y = 1) = .30.
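The marginal distributions of Example 7.5 can be obtained by accumulating each joint probability into the totals for its x and its y. A minimal sketch, not part of the original slides; the dictionary layout keyed by (x, y) is an illustrative assumption.

```python
# Joint probabilities p(x, y) from Example 7.5, keyed by (x, y)
joint = {
    (0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.07,
    (1, 0): 0.42, (1, 1): 0.06, (1, 2): 0.02,
    (2, 0): 0.06, (2, 1): 0.03, (2, 2): 0.01,
}

p_x = {x: 0.0 for x in range(3)}
p_y = {y: 0.0 for y in range(3)}
for (x, y), pxy in joint.items():
    p_x[x] += pxy  # sum over y for fixed x
    p_y[y] += pxy  # sum over x for fixed y

print({x: round(v, 2) for x, v in p_x.items()})  # {0: 0.4, 1: 0.5, 2: 0.1}
print({y: round(v, 2) for y, v in p_y.items()})  # {0: 0.6, 1: 0.3, 2: 0.1}
```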

Describing the Bivariate Distribution
• The joint distribution can be described by the mean, variance, and standard deviation of each variable.
• This is done using the marginal distributions.

  x   p(x)        y   p(y)
  0   .4          0   .6
  1   .5          1   .3
  2   .1          2   .1
  E(X) = .7       E(Y) = .5

Describing the Bivariate Distribution
• To describe the relationship between the two variables, we compute the covariance and the coefficient of correlation.
  • Covariance: COV(X, Y) = Σ (x - μ_x)(y - μ_y) p(x, y), where the sum is taken over all pairs (x, y)
  • Coefficient of correlation: ρ = COV(X, Y) / (σ_x σ_y)

Describing the Bivariate Distribution
• Example 7.6
  • Calculate the covariance and coefficient of correlation between the number of houses sold by the two agents in Example 7.5.
• Solution
  • COV(X, Y) = Σ (x - μ_x)(y - μ_y) p(x, y) = (0 - .7)(0 - .5) p(0, 0) + … + (2 - .7)(2 - .5) p(2, 2) = -.15
  • ρ = COV(X, Y) / (σ_x σ_y) = -.15 / [(.64)(.67)] = -.35
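The covariance and correlation of Example 7.6 can be recomputed directly from the joint table. A minimal sketch, not part of the original slides; the variable names are illustrative, and the joint probabilities are those of Example 7.5.

```python
import math

# Joint probabilities p(x, y) from Example 7.5, keyed by (x, y)
joint = {
    (0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.07,
    (1, 0): 0.42, (1, 1): 0.06, (1, 2): 0.02,
    (2, 0): 0.06, (2, 1): 0.03, (2, 2): 0.01,
}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
rho = cov / math.sqrt(var_x * var_y)

print(round(mu_x, 2), round(mu_y, 2))  # 0.7 0.5
print(round(cov, 2), round(rho, 2))    # -0.15 -0.35
```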

Conditional Probability (Optional)
• Example 7.5 (continued)

               X
  Y        0     1     2    p(y)
  0      .12   .42   .06    .60
  1      .21   .06   .03    .30
  2      .07   .02   .01    .10
  p(x)   .40   .50   .10   1.00

• For a fixed value of Y, the conditional probabilities of X sum to 1.0.

Conditions for Independence (Optional)
• Two random variables are said to be independent when P(X = x | Y = y) = P(X = x) or P(Y = y | X = x) = P(Y = y).
• This leads to the following relationship for independent variables: P(X = x and Y = y) = P(X = x) P(Y = y).
• Example 7.5 (continued)
  • Since P(X = 0 | Y = 1) = .7 but P(X = 0) = .4, the variables X and Y are not independent.
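The independence check for Example 7.5 can be sketched by computing a conditional probability as joint over marginal, then testing the product rule in every cell. This is an illustration, not part of the original slides; the names and the comparison tolerance are assumptions.

```python
# Joint probabilities p(x, y) from Example 7.5, keyed by (x, y)
joint = {
    (0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.07,
    (1, 0): 0.42, (1, 1): 0.06, (1, 2): 0.02,
    (2, 0): 0.06, (2, 1): 0.03, (2, 2): 0.01,
}
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(3)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(3)}

cond = joint[(0, 1)] / p_y[1]            # P(X = 0 | Y = 1)
print(round(cond, 2), round(p_x[0], 2))  # 0.7 0.4

# Independence requires p(x, y) = p(x) p(y) in every cell of the table
independent = all(
    abs(p - p_x[x] * p_y[y]) < 1e-9 for (x, y), p in joint.items()
)
print(independent)  # False
```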

Sum of Two Variables
• The probability distribution of X + Y is determined by
  • determining all the possible values that X + Y can assume, and
  • for every possible value c of X + Y, adding the probabilities of all the combinations of X and Y for which X + Y = c.
• Example 7.5 (continued)
  • Find the probability distribution of the total number of houses sold per week by Xavier and Yvette.
• Solution
  • X + Y is the total number of houses sold. X + Y can have the values 0, 1, 2, 3, 4.

The Probability Distribution of X+Y

               X
  Y        0     1     2    p(y)
  0      .12   .42   .06    .60
  1      .21   .06   .03    .30
  2      .07   .02   .01    .10
  p(x)   .40   .50   .10   1.00

• P(X+Y = 0) = P(X = 0 and Y = 0) = .12
• P(X+Y = 1) = P(X = 0 and Y = 1) + P(X = 1 and Y = 0) = .21 + .42 = .63
• P(X+Y = 2) = P(X = 0 and Y = 2) + P(X = 1 and Y = 1) + P(X = 2 and Y = 0) = .07 + .06 + .06 = .19
• The probabilities P(X+Y = 3) and P(X+Y = 4) are calculated the same way. The distribution follows.

The Expected Value and Variance of X+Y
• The distribution of X+Y

  x+y      0     1     2     3     4
  p(x+y)  .12   .63   .19   .05   .01

• The expected value and variance of X+Y can be calculated from the distribution of X+Y.
  • E(X+Y) = 0(.12) + 1(.63) + 2(.19) + 3(.05) + 4(.01) = 1.2
  • V(X+Y) = (0 - 1.2)²(.12) + (1 - 1.2)²(.63) + … = .56
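The distribution of X+Y and its mean and variance can be built by grouping the joint probabilities by the value of x + y. A minimal sketch, not part of the original slides; the names are illustrative, and the joint probabilities are those of Example 7.5.

```python
from collections import defaultdict

# Joint probabilities p(x, y) from Example 7.5, keyed by (x, y)
joint = {
    (0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.07,
    (1, 0): 0.42, (1, 1): 0.06, (1, 2): 0.02,
    (2, 0): 0.06, (2, 1): 0.03, (2, 2): 0.01,
}

p_sum = defaultdict(float)
for (x, y), p in joint.items():
    p_sum[x + y] += p  # collect every (x, y) combination with the same total

mean = sum(s * p for s, p in p_sum.items())
var = sum((s - mean) ** 2 * p for s, p in p_sum.items())

print({s: round(p, 2) for s, p in sorted(p_sum.items())})
# {0: 0.12, 1: 0.63, 2: 0.19, 3: 0.05, 4: 0.01}
print(round(mean, 2), round(var, 2))  # 1.2 0.56
```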

The Expected Value and Variance of X+Y
• The following relationships can assist in calculating E(X+Y) and V(X+Y):
  • E(X+Y) = E(X) + E(Y)
  • V(X+Y) = V(X) + V(Y) + 2 COV(X, Y)
  • When X and Y are independent, COV(X, Y) = 0 and V(X+Y) = V(X) + V(Y).
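As a check, the two laws can be evaluated on the joint distribution of Example 7.5 and compared with the values 1.2 and .56 obtained from the distribution of X+Y. A minimal sketch, not part of the original slides; the helper `expect` is an illustrative assumption.

```python
# Joint probabilities p(x, y) from Example 7.5, keyed by (x, y)
joint = {
    (0, 0): 0.12, (0, 1): 0.21, (0, 2): 0.07,
    (1, 0): 0.42, (1, 1): 0.06, (1, 2): 0.02,
    (2, 0): 0.06, (2, 1): 0.03, (2, 2): 0.01,
}

def expect(f):
    """E[f(X, Y)] under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))

mean_sum = mu_x + mu_y            # E(X+Y) = E(X) + E(Y)
var_sum = var_x + var_y + 2 * cov  # V(X+Y) = V(X) + V(Y) + 2 COV(X, Y)
print(round(mean_sum, 2), round(var_sum, 2))  # 1.2 0.56
```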

7.6 The Binomial Distribution
• The binomial experiment can result in only one of two possible outcomes.
• Typical cases where the binomial experiment applies:
  • A coin flipped results in heads or tails
  • An election candidate wins or loses
  • An employee is male or female
  • A car uses 87 octane gasoline, or another gasoline

Binomial Experiment
• There are n trials (n is finite and fixed).
• Each trial can result in a success or a failure.
• The probability p of success is the same for all the trials.
• All the trials of the experiment are independent.
• Binomial random variable
  • The binomial random variable counts the number of successes in n trials of the binomial experiment.

Developing the Binomial Probability Distribution (n = 3)
[Figure: probability tree for three trials; each trial branches into S (success, probability p) and F (failure, probability 1 - p), with conditional probabilities such as P(S2 | S1) and P(F3 | S2, S1) labeling the branches]
• Since the outcome of each trial is independent of the previous outcomes, we can replace the conditional probabilities with the marginal probabilities.
• Path probabilities:
  • P(SSS) = p³
  • P(SSF) = p²(1 - p)
  • P(SFS) = p(1 - p)p
  • P(SFF) = p(1 - p)²
  • P(FSS) = (1 - p)p²
  • P(FSF) = (1 - p)p(1 - p)
  • P(FFS) = (1 - p)²p
  • P(FFF) = (1 - p)³

Developing the Binomial Probability Distribution (n = 3)
• Let X be the number of successes in three trials. Then:
  • P(X = 3) = p³ (path SSS)
  • P(X = 2) = 3p²(1 - p) (paths SSF, SFS, FSS)
  • P(X = 1) = 3p(1 - p)² (paths SFF, FSF, FFS)
  • P(X = 0) = (1 - p)³ (path FFF)
• The multiplier (here 3) is calculated in the formula that follows.

Calculating the Binomial Probability
• In general, the binomial probability is calculated by
  P(X = x) = p(x) = C(n, x) p^x (1 - p)^(n - x), where C(n, x) = n! / [x!(n - x)!]
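The standard binomial formula, P(X = x) = C(n, x) p^x (1 - p)^(n - x), can be sketched in a few lines. This is an illustration, not part of the original slides; the function name `binomial_pmf` is an assumption, and p = 0.2 is borrowed from Example 7.2 so that n = 3 reproduces the tree results.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): x successes in n independent trials, success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# comb(3, x) supplies the multiplier: 1, 3, 3, 1 for x = 0, 1, 2, 3
three_trials = [round(binomial_pmf(x, 3, 0.2), 3) for x in range(4)]
print(three_trials)  # [0.512, 0.384, 0.096, 0.008]
```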

Calculating the Binomial Probability
• Example 7.9 & 7.10
  • Pat Statsdud is registered in a statistics course and intends to rely on luck to pass the next quiz.
  • The quiz consists of 10 multiple-choice questions with 5 possible choices for each question, only one of which is the correct answer.
  • Pat will guess the answer to each question.
  • Find the following probabilities:
    • Pat gets no answer correct
    • Pat gets two answers correct
    • Pat fails the quiz

Calculating the Binomial Probability
• Solution
  • Checking the conditions:
    • An answer can be either correct or incorrect.
    • There is a fixed finite number of trials (n = 10).
    • Each answer is independent of the others.
    • The probability p of a correct answer (.20) does not change from question to question.

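Calculating the Binomial Probability
• Solution (continued)
  • Determining the binomial probabilities: let X = the number of correct answers.
    • P(X = 0) = [10! / (0! 10!)] (.20)^0 (.80)^10 = .1074
    • P(X = 2) = [10! / (2! 8!)] (.20)² (.80)^8 = .3020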

Calculating the Binomial Probability
• Solution (continued)
  • Determining the binomial probabilities: Pat fails the test if the number of correct answers is less than 5, which means less than or equal to 4.
  • P(X ≤ 4) = p(0) + p(1) + p(2) + p(3) + p(4) = .1074 + .2684 + .3020 + .2013 + .0881 = .9672
  • This is called a cumulative probability.
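Pat's three probabilities can be reproduced with n = 10 and p = .2. A minimal sketch, not part of the original slides; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.2  # 10 questions, 1 correct choice out of 5

# p(x) for x = 0, 1, ..., 10 correct answers
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

p_none = pmf[0]        # no answer correct
p_two = pmf[2]         # exactly two answers correct
p_fail = sum(pmf[:5])  # cumulative probability P(X <= 4)

print(round(p_none, 4), round(p_two, 4), round(p_fail, 4))  # 0.1074 0.302 0.9672
```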

Mean and Variance of Binomial Variable
• E(X) = μ = np
• V(X) = σ² = np(1 - p)
• Example 7.11
  • If all the students in Pat's class intend to guess the answers to the quiz, what are the mean and the standard deviation of the quiz mark?
• Solution
  • μ = np = 10(.2) = 2
  • σ = [np(1 - p)]^(1/2) = [10(.2)(.8)]^(1/2) = 1.26
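The formulas μ = np and σ² = np(1 - p) can be cross-checked against the general definitions applied to the full binomial distribution. A minimal sketch, not part of the original slides; the names are illustrative.

```python
from math import comb, sqrt, isclose

n, p = 10, 0.2  # Example 7.11

mu = n * p                     # mean from the binomial formula
sigma = sqrt(n * p * (1 - p))  # standard deviation from the binomial formula

# Same quantities from the definitions E(X) and V(X)
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
mu_def = sum(x * px for x, px in enumerate(pmf))
var_def = sum((x - mu_def) ** 2 * px for x, px in enumerate(pmf))

print(round(mu, 2), round(sigma, 2))                           # 2.0 1.26
print(isclose(mu, mu_def), isclose(sigma, sqrt(var_def)))      # True True
```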

7.7 Poisson Distribution
• The Poisson experiment typically fits cases of rare events that occur over a fixed amount of time or within a specified region.
• Typical cases:
  • The number of errors a typist makes per page
  • The number of customers entering a service station per hour
  • The number of telephone calls received by a switchboard per hour

Properties of the Poisson Experiment
• The number of successes (events) that occur in a certain time interval is independent of the number of successes that occur in another time interval.
• The probability of a success in a certain time interval is
  • the same for all time intervals of the same size,
  • proportional to the length of the interval.
• The probability that two or more successes will occur in an interval approaches zero as the interval becomes smaller.

The Poisson Variable and Distribution
• The Poisson random variable
  • The Poisson variable indicates the number of successes that occur during a given time interval or in a specific region in a Poisson experiment.
• Probability distribution of the Poisson random variable
  P(X = x) = p(x) = e^(-μ) μ^x / x!,  x = 0, 1, 2, ...
  where μ is the mean number of successes in the interval or region.

Poisson Distributions (Graphs)
[Figure: Poisson probability distribution; horizontal axis x = 0 to 5]
[Figure: Poisson probability distributions with μ = 2, μ = 5, and μ = 7; horizontal axis x = 0 to 14]

Poisson Distribution
• Example 7.12
  • The number of typographical errors in new editions of textbooks is Poisson distributed with a mean of 1.5 per 100 pages.
  • 100 pages of a new book are randomly selected.
  • What is the probability that there are no typos?
• Solution
  • P(X = 0) = e^(-μ) μ^x / x! = e^(-1.5) (1.5)^0 / 0! = .2231
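The Poisson probability in Example 7.12 follows directly from the formula e^(-μ) μ^x / x!. A minimal sketch, not part of the original slides; the function name `poisson_pmf` is an illustrative assumption.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

p_no_typos = poisson_pmf(0, 1.5)  # mean of 1.5 typos per 100 pages
print(round(p_no_typos, 4))       # 0.2231
```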

Finding Poisson Probabilities
• Example 7.13
  • For a 400-page book, calculate the following probabilities:
    • There are no typos
    • There are five or fewer typos
• Important! A mean of 1.5 typos per 100 pages is equivalent to 6 typos per 400 pages.
• Solution
  • P(X = 0) = e^(-μ) μ^x / x! = e^(-6) 6^0 / 0! = .002479
  • P(X ≤ 5) = p(0) + p(1) + … + p(5) = .4457 (use the formula to find each of p(0), p(1), …, p(5), then add them)
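Example 7.13 rescales the mean to the new interval before applying the Poisson formula. A minimal sketch, not part of the original slides; the function and variable names are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

mu = 1.5 * (400 / 100)  # 1.5 typos per 100 pages -> 6 typos per 400 pages

p_none = poisson_pmf(0, mu)                               # P(X = 0)
p_five_or_fewer = sum(poisson_pmf(x, mu) for x in range(6))  # P(X <= 5)

print(round(p_none, 6), round(p_five_or_fewer, 4))  # 0.002479 0.4457
```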