STATISTICS: Univariate Distributions
Professor Ke-Sheng Cheng, Department of Bioenvironmental Systems Engineering, National Taiwan University
3/10/2021, Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept. of Bioenvironmental Systems Engineering, NTU
Probability density functions of discrete random variables
• Discrete uniform distribution
• Bernoulli distribution
• Binomial distribution
• Negative binomial distribution
• Geometric distribution
• Hypergeometric distribution
• Poisson distribution
Discrete uniform distribution
$$f(x) = \frac{1}{N}, \quad x = 1, 2, \ldots, N$$
where the parameter N ranges over the positive integers.
Bernoulli distribution
$$f(x) = p^{x}(1-p)^{1-x}, \quad x = 0, 1, \quad 0 \le p \le 1$$
$1 - p$ is often denoted by $q$.
Binomial distribution
$$f(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, \ldots, n$$
• The binomial distribution gives the probability of having exactly x successes in n independent and identical Bernoulli trials.
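As a quick numerical check, the sketch below evaluates the binomial probability $\binom{n}{x} p^x q^{n-x}$ directly and compares it with R's built-in `dbinom`. The values n = 10, p = 0.3, and x = 4 are illustrative assumptions, not taken from the slides.

```r
# Illustrative (assumed) parameters
n <- 10; p <- 0.3; x <- 4
q <- 1 - p

manual  <- choose(n, x) * p^x * q^(n - x)    # formula evaluated by hand
builtin <- dbinom(x, size = n, prob = p)     # R's binomial probability

# The probabilities over x = 0, ..., n should sum to 1
total <- sum(dbinom(0:n, size = n, prob = p))
```

`manual` and `builtin` should agree up to floating-point error.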
Negative binomial distribution
• The negative binomial distribution gives the probability that the r-th success is achieved on the x-th of a sequence of independent and identical Bernoulli trials:
$$f(x) = \binom{x-1}{r-1} p^{r} q^{x-r}, \quad x = r, r+1, \ldots$$
• Unlike the binomial distribution, for which the number of trials is fixed, here the number of successes is fixed and the number of trials varies from experiment to experiment. The negative binomial random variable represents the number of trials needed to achieve the r-th success.
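One practical point when moving to R: `dnbinom` counts the number of failures before the r-th success, not the number of trials. The sketch below, with assumed values r = 3, p = 0.4, and x = 7, shows how the trial-count definition used here maps onto `dnbinom` by passing x − r failures.

```r
# Illustrative (assumed) parameters: 3rd success on the 7th trial
r <- 3; p <- 0.4; x <- 7
q <- 1 - p

# Probability that the r-th success occurs on trial x (trial-count definition)
manual <- choose(x - 1, r - 1) * p^r * q^(x - r)

# R parameterizes by the number of failures, which is x - r
builtin <- dnbinom(x - r, size = r, prob = p)
```

The same shift applies to `pnbinom`, `qnbinom`, and `rnbinom`.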
Geometric distribution
$$f(x) = p\,q^{x-1}, \quad x = 1, 2, \ldots$$
• The geometric distribution gives the probability that the first success occurs on the x-th of a sequence of independent and identical Bernoulli trials.
Hypergeometric distribution
$$f(x) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}, \quad x = 0, 1, \ldots, n$$
where M is a positive integer, K is a nonnegative integer that is at most M, and n is a positive integer that is at most M.
• Let X denote the number of defective products in a sample of size n when sampling without replacement from a box containing M products, K of which are defective. Then X has a hypergeometric distribution with parameters M, K, and n.
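The defective-product setting can be sketched in R with `dhyper`, whose arguments are (number of defectives, number of non-defectives, sample size). The box size M = 50, defectives K = 5, and sample size n = 10 are assumed values chosen only for illustration.

```r
# Illustrative (assumed) box: 50 products, 5 defective, sample of 10
M <- 50; K <- 5; n <- 10
x <- 0:min(n, K)                       # possible numbers of defectives

# Probability of x defectives from the counting formula
manual <- choose(K, x) * choose(M - K, n - x) / choose(M, n)

# dhyper(x, m, n, k): m = defectives, n = non-defectives, k = sample size
builtin <- dhyper(x, m = K, n = M - K, k = n)
```

Note that the argument named `n` in `dhyper` is the number of non-defective items, not the sample size.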
Poisson distribution
$$f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda > 0$$
• The Poisson distribution provides a realistic model for many random phenomena in which the number of event occurrences within a given scope (time, length, area, volume) is of interest. Examples include the number of fatal traffic accidents per day in Taipei, the number of meteorites that collide with a satellite during a single orbit, the number of defects per unit of some material, and the number of flaws per unit length of some wire.
Assume that we are observing the number of occurrences of a certain event in time, space, region or length. Also assume that there exists a positive quantity $\lambda$ (the mean rate of occurrence) which satisfies the following properties:
1. The probability of exactly one occurrence in a small time interval of length h is approximately $\lambda h$.
2. The probability of more than one occurrence in a small time interval of length h is negligible.
The probability of success (occurrence) in each independent trial.
Comparison of Poisson and binomial distributions
• Example: Suppose that the average number of telephone calls arriving at the switchboard of a company is 30 calls per hour. Assume that telephone call occurrences are rare and can be modeled by a Poisson distribution.
1) What is the probability that no calls will arrive in a 3-minute period?
2) What is the probability that more than five calls will arrive in a 5-minute interval?
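The two questions can be worked out in R under the stated Poisson assumption. The sketch converts the rate of 30 calls per hour to 0.5 calls per minute, so the expected counts are 1.5 for the 3-minute period and 2.5 for the 5-minute interval; the variable names are illustrative.

```r
rate_per_min <- 30 / 60                      # 0.5 calls per minute

# 1) No calls in a 3-minute period: Poisson with mean 1.5
p_none_3min <- dpois(0, lambda = rate_per_min * 3)

# 2) More than five calls in a 5-minute interval: Poisson with mean 2.5
p_more5_5min <- ppois(5, lambda = rate_per_min * 5, lower.tail = FALSE)

round(c(p_none_3min, p_more5_5min), 4)       # 0.2231 0.0420
```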
Assuming time is measured in minutes (one Bernoulli trial per minute, with p = 30/60 = 0.5), the Poisson distribution is NOT a good approximation of the binomial distribution in this case.
Assuming time is measured in seconds (one Bernoulli trial per second, with p = 30/3600 = 1/120), the Poisson distribution is a good approximation of the binomial distribution.
• The first property provides the basis for transferring the mean rate of occurrence between different observation scales.
• The “small time interval of length h” can be measured in different observation scales.
• $n_i$ represents the number of observations if $\Delta t_i$ (in units of time) is chosen as the scale of observation (that is, the interval at which a random experiment is conducted).
• $\lambda_i$ is the mean rate of occurrence when the observation scale $\Delta t_i$ is used.
• If the first property holds for various observation scales, say $\Delta t_1, \Delta t_2, \ldots$, then the probability of exactly one happening in a small time interval h can be approximated by the mean rate of occurrence multiplied by h.
• The probability of more than one happening in the time interval h is negligible.
$\lambda_i$ represents the average rate of occurrence (or success) within an interval of length $\Delta t_i$. Within the interval $\Delta t_i$, the probability of more than one occurrence is negligible. This is equivalent to saying that a Bernoulli random experiment, with probability of success $\lambda_i$, is conducted once per $\Delta t_i$ units of time.
Whether the phenomenon under investigation can be modeled as a Poisson process depends on our choice of the scale of observation. The Poisson assumption is valid only if, under the chosen scale of observation, the occurrence of an event is rare, i.e., the probability of more than one occurrence is negligible.
The definition of a rare event depends on the scale of observation. For example, typhoon occurrences cannot be considered rare if they are observed on an annual basis. However, typhoon occurrences are rare if observed on a weekly basis.
• Probability that more than five calls will arrive in a 5-minute interval
• Occurrences of events that can be characterized by the Poisson distribution are known as a Poisson process.
If the problem is treated by considering a binomial distribution with n = 5 and p = 0.5, what is the probability of having more than five calls in a 5-minute interval?
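The effect of the observation scale can be checked numerically. The sketch below compares the Poisson answer with two binomial models of the same 5-minute interval: one trial per minute (n = 5, p = 0.5) and one trial per second (n = 300, p = 1/120). The per-second setup follows from the rate of 30 calls per hour; variable names are illustrative.

```r
# Poisson: mean 2.5 calls in 5 minutes
p_pois <- ppois(5, lambda = 2.5, lower.tail = FALSE)

# Binomial with one trial per minute: at most 5 calls can ever be counted
p_min <- pbinom(5, size = 5, prob = 0.5, lower.tail = FALSE)

# Binomial with one trial per second: 300 trials, p = 30/3600
p_sec <- pbinom(5, size = 300, prob = 1 / 120, lower.tail = FALSE)

c(poisson = p_pois, per_minute = p_min, per_second = p_sec)
```

With one trial per minute the probability of more than five calls is exactly zero, because five trials cannot produce six successes. At the per-second scale the binomial result is close to the Poisson result.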
Probability density functions of continuous random variables
• Uniform or rectangular distribution
• Normal distribution (also known as the Gaussian distribution)
• Exponential distribution (or negative exponential distribution)
• Gamma distribution (Pearson Type III)
• Lognormal distribution
Uniform or rectangular distribution
$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$
Figure: PDF of U(a, b)
Normal distribution (Gaussian distribution)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$
Standard normal variable: $Z = (X - \mu)/\sigma$, with $Z \sim N(0, 1)$.
Figure: X ~ N(μ1, σ1), Z ~ N(0, 1), Y ~ N(μ2, σ2)
Commonly used values of normal distributions
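The table on this slide is not recoverable, but values of this kind can be regenerated in R. The sketch below computes the probability within 1, 2, and 3 standard deviations of the mean and a few standard normal quantiles; the particular selection of values is an assumption about what such a table usually lists.

```r
# Probability that a normal variable lies within k standard deviations of its mean
within <- sapply(1:3, function(k) pnorm(k) - pnorm(-k))
round(within, 4)      # 0.6827 0.9545 0.9973

# Standard normal quantiles for commonly used cumulative probabilities
z <- qnorm(c(0.90, 0.95, 0.975, 0.995))
round(z, 4)           # 1.2816 1.6449 1.9600 2.5758
```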
Exponential distribution (negative exponential distribution)
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0, \quad \lambda > 0$$
$\lambda$: mean rate of occurrence in a Poisson process.
Gamma distribution
$$f(x) = \frac{\lambda^{r}}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, \quad x > 0, \quad r > 0, \quad \lambda > 0$$
$\lambda$ represents the mean rate of occurrence in a Poisson process; it is equivalent to $\lambda$ in the exponential density.
• The exponential distribution is a special case of the gamma distribution with shape parameter $r = 1$.
• The sum of n independent, identically distributed exponential random variables with parameter $\lambda$ has a gamma distribution with shape parameter n and rate parameter $\lambda$.
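The second statement can be checked by simulation. The sketch below sums n exponential variables many times and compares the empirical distribution of the sum with `pgamma`. The values n = 5, rate 2, the evaluation point 2, and the seed are assumptions chosen for illustration.

```r
set.seed(1)
n <- 5; rate <- 2; N <- 10000          # assumed illustrative values

# Each row holds n independent exponential draws; row sums are the simulated sums
s <- rowSums(matrix(rexp(N * n, rate = rate), nrow = N))

emp  <- mean(s <= 2)                          # empirical P(S <= 2)
theo <- pgamma(2, shape = n, rate = rate)     # gamma CDF with shape n and the same rate

# The exponential is the gamma with shape 1
same_density <- all.equal(dexp(0.7, rate = rate), dgamma(0.7, shape = 1, rate = rate))
```

`emp` and `theo` should be close, with the gap shrinking as N grows.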
Pearson Type III distribution (PT3)
$\mu$, $\sigma$, and $\gamma$ are the mean, standard deviation, and skewness coefficient of X, respectively. It reduces to the gamma distribution if the location parameter is 0.
• The Pearson type III distribution is widely applied in stochastic hydrology.
• Total rainfall depths of storm events can be characterized by the Pearson type III distribution.
• Annual maximum rainfall depths are also often characterized by the Pearson type III or log-Pearson type III distribution.
Log-normal distribution and log-Pearson type III distribution (LPT3)
• A random variable X is said to have a lognormal distribution if log(X) is normally distributed.
• A random variable X is said to have a log-Pearson type III distribution if log(X) has a Pearson type III distribution.
Lognormal distribution
Approximations between random variables
• Approximation of binomial distribution by Poisson distribution
• Approximation of binomial distribution by normal distribution
• Approximation of Poisson distribution by normal distribution
Approximation of binomial distribution by Poisson distribution
• If $n \to \infty$ and $p \to 0$ such that $np = \lambda$ remains constant, then
$$\binom{n}{x} p^{x} (1-p)^{n-x} \to \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
Approximation of binomial distribution by normal distribution
• Let X have a binomial distribution with parameters n and p. If $n \to \infty$, then for fixed $a < b$,
$$P\left[a \le \frac{X - np}{\sqrt{npq}} \le b\right] \to \Phi(b) - \Phi(a)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.
• Equivalently, as n approaches infinity, X can be approximated by a normal distribution with mean np and variance npq.
Approximation of Poisson distribution by normal distribution
• Let X have a Poisson distribution with parameter $\lambda$. If $\lambda \to \infty$, then for fixed $a < b$,
$$P\left[a \le \frac{X - \lambda}{\sqrt{\lambda}} \le b\right] \to \Phi(b) - \Phi(a)$$
• Equivalently, as $\lambda$ approaches infinity, X can be approximated by a normal distribution with mean $\lambda$ and variance $\lambda$.
Example
• Suppose that two fair dice are tossed 600 times. Let X denote the number of times that a total of 7 dots occurs, so that X has a binomial distribution with n = 600 and p = 6/36 = 1/6. What is the probability that X falls within a given range?
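The range asked about in this example is not recoverable from the slide, so the sketch below uses hypothetical bounds of 90 and 110 purely for illustration. It compares the exact binomial probability with the normal approximation (mean np, variance npq); the continuity correction of 0.5 is an addition of this sketch and is not stated in the slides.

```r
n <- 600; p <- 6 / 36                   # six of the 36 outcomes total 7
mu    <- n * p                          # mean np = 100
sigma <- sqrt(n * p * (1 - p))          # standard deviation sqrt(npq)

lo <- 90; hi <- 110                     # hypothetical bounds, assumed for illustration

# Exact binomial probability P(lo <= X <= hi)
exact <- pbinom(hi, n, p) - pbinom(lo - 1, n, p)

# Normal approximation with a continuity correction (an assumption of this sketch)
approx <- pnorm((hi + 0.5 - mu) / sigma) - pnorm((lo - 0.5 - mu) / sigma)
```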
Transformation of random variables
• [Theorem] Let X be a continuous random variable with density $f_X$. Let $Y = g(X)$, where g is strictly monotonic and differentiable. The density for Y, denoted by $f_Y$, is given by
$$f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d\,g^{-1}(y)}{dy} \right|$$
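• Proof: Assume that $Y = g(X)$ is a strictly monotonic increasing function of X. Then
$$F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \le g^{-1}(y)] = F_X\big(g^{-1}(y)\big)$$
and differentiating with respect to y gives
$$f_Y(y) = f_X\big(g^{-1}(y)\big) \frac{d\,g^{-1}(y)}{dy}$$
where the derivative is positive because g is increasing.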
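The theorem can be checked numerically on a case that connects to the lognormal distribution. In this sketch X is normal and Y = exp(X), so the inverse transformation is log(y) with derivative 1/y. The parameter values and evaluation points are assumptions chosen for illustration.

```r
mu <- 0.5; sigma <- 0.8                 # assumed parameters of X ~ N(mu, sigma)
y  <- c(0.2, 1, 2.5, 7)                 # points at which to evaluate the density of Y

# Transformation formula: f_X(g^{-1}(y)) * |d g^{-1}(y) / dy| with g^{-1}(y) = log(y)
f_Y <- dnorm(log(y), mean = mu, sd = sigma) * abs(1 / y)

# R's lognormal density should give the same values
f_ref <- dlnorm(y, meanlog = mu, sdlog = sigma)
```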
Example
• Let X be a gamma random variable with Y is also a gamma random variable with scale parameter 1/ and shape parameter .
Definition of the location parameter
Example of location parameter
Definition of the scale parameter
Example of scale parameter
Simulation
• Given a random variable X with CDF $F_X(x)$, there are situations in which we want to obtain a set of n random numbers (i.e., a random sample of size n) from $F_X(\cdot)$.
• Advances in computer technology have made it possible to generate such random numbers using computers. Work of this nature is termed “simulation”, or more precisely “stochastic simulation”.
Pseudo-random number generation
• Pseudo-random number generation (PRNG) is the technique of generating a sequence of numbers that appears to be a random sample of random variables uniformly distributed over (0, 1).
• A commonly applied approach to PRNG starts with an initial seed $x_0$ and the following recursive algorithm (Ross, 2002):
$$x_n = a\,x_{n-1} \ \text{modulo}\ m$$
where a and m are given positive integers, and the above equation means that $a\,x_{n-1}$ is divided by m and the remainder is taken as the value of $x_n$.
• The quantity $x_n/m$ is then taken as an approximation to the value of a uniform (0, 1) random variable.
• Such an algorithm deterministically generates a sequence of values and eventually repeats itself. Consequently, the constants a and m should be chosen to satisfy the following criteria:
• For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent uniform (0, 1) random variables.
• For any initial seed, the number of random variables that can be generated before repetition begins is large.
• The values can be computed efficiently on a digital computer.
• A guideline for the selection of a and m is that m be chosen to be a large prime number that fits the computer word size. For a 32-bit word computer, $m = 2^{31} - 1$ and $a = 7^5$ result in the desired properties (Ross, 2002).
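The recursion can be sketched in a few lines of R. The function name `lcg`, the seed, and the default constants a = 7^5 and m = 2^31 − 1 (a commonly quoted choice for 32-bit machines) are assumptions of this sketch. The products stay below 2^53, so double-precision arithmetic is exact here.

```r
# Multiplicative congruential generator: x_n = (a * x_{n-1}) mod m
# Defaults a = 7^5 and m = 2^31 - 1 are assumed, commonly quoted constants
lcg <- function(n, seed, a = 7^5, m = 2^31 - 1) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state) %% m      # remainder after dividing a * x_{n-1} by m
    x[i] <- state
  }
  x / m                            # scale to (0, 1)
}

u <- lcg(1000, seed = 12345)       # 1000 pseudo-random uniform numbers
```

Calling `lcg` twice with the same seed returns the same sequence, which illustrates the deterministic nature of the generator. For real work, R's built-in `runif` is the better choice.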
Simulating a continuous random variable
• Probability integral transformation: if X is a continuous random variable with CDF $F_X$, then $V = F_X(X)$ is uniformly distributed over (0, 1).
The cumulative distribution function of a continuous random variable is a monotonically increasing function.
Example
• Generate a random sample $\{v_1, \ldots, v_n\}$ of a random variable V which has a uniform density over (0, 1).
• Convert each $v_i$ to $x_i$ using the above V-to-X transformation.
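The two steps can be sketched for a distribution whose CDF inverts in closed form. The target used on the slide is not recoverable, so the exponential distribution with rate 2 is assumed here: its CDF is 1 − exp(−rate·x), which gives x = −log(1 − v)/rate. The seed and sample size are also illustrative.

```r
set.seed(42)
rate <- 2                          # assumed rate of the target exponential distribution

v <- runif(10000)                  # step 1: uniform (0, 1) sample
x <- -log(1 - v) / rate            # step 2: invert the CDF, x = F^{-1}(v)

# The built-in quantile function performs the same inversion
x_ref <- qexp(v, rate = rate)
```

The sample mean of `x` should be close to 1/rate.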
Random number generation in R
• R commands for stochastic simulation (for the normal distribution):
• pnorm – cumulative probability
• qnorm – quantile function
• rnorm – generating a random sample of a specific sample size
• dnorm – probability density function
For other distributions, simply change the distribution name. For example, punif, qunif, runif, and dunif for the uniform distribution, and ppois, qpois, rpois, and dpois for the Poisson distribution.
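A brief usage sketch of the four function families follows. The argument values and the seed are arbitrary choices for illustration.

```r
x <- 1.5
p      <- pnorm(x, mean = 0, sd = 1)   # cumulative probability P(Z <= 1.5)
x_back <- qnorm(p)                     # the quantile function inverts pnorm
dens   <- dnorm(x)                     # density at x

set.seed(2021)
smp <- rnorm(5, mean = 10, sd = 2)     # random sample of size 5 from N(10, 2)
cnt <- rpois(5, lambda = 3)            # same pattern for the Poisson distribution
u   <- runif(5, min = 0, max = 1)      # and for the uniform distribution
```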
Generating random numbers of discrete distributions in R
• Discrete uniform distribution
• R does not provide built-in functions for random number generation from the discrete uniform distribution.
• However, the following functions can be used for the discrete uniform distribution between 1 and k.
rdu <- function(n, k) sample(1:k, n, replace = TRUE)                     # random numbers
ddu <- function(x, k) ifelse(x >= 1 & x <= k & round(x) == x, 1/k, 0)    # density
pdu <- function(x, k) ifelse(x < 1, 0, ifelse(x <= k, floor(x)/k, 1))    # CDF
qdu <- function(p, k) ifelse(p <= 0 | p > 1, NA, ceiling(p * k))         # quantile (NA where undefined)
• Similar, yet more flexible, functions are defined as follows.
# density
dunifdisc <- function(x, min = 0, max = 1) ifelse(x >= min & x <= max & round(x) == x, 1/(max - min + 1), 0)
dunifdisc(23, 21, 40)
dunifdisc(c(0, 1))
# CDF
punifdisc <- function(q, min = 0, max = 1) ifelse(q < min, 0, ifelse(q > max, 1, floor(q - min + 1)/(max - min + 1)))
punifdisc(0.2)
punifdisc(5, 2, 19)
# quantile: smallest integer whose CDF is at least p
qunifdisc <- function(p, min = 0, max = 1) pmax(min, ceiling(p * (max - min + 1)) + min - 1)
qunifdisc(0.2222222, 2, 19)
qunifdisc(0.2)
# random numbers
runifdisc <- function(n, min = 0, max = 1) sample(min:max, n, replace = TRUE)
runifdisc(30, 2, 19)
runifdisc(30)
• Binomial distribution: dbinom, pbinom, qbinom, rbinom
• Negative binomial distribution: dnbinom, pnbinom, qnbinom, rnbinom
• Geometric distribution: dgeom, pgeom, qgeom, rgeom
• Hypergeometric distribution: dhyper, phyper, qhyper, rhyper
• Poisson distribution: dpois, ppois, qpois, rpois
An example of stochastic simulation
• The travel time from your home (or dormitory) to the NTU campus may involve a few factors:
• Walking to the bus stop (stopping for traffic lights, crowdedness on the streets, etc.)
• Transportation by bus
• Stopping by 7-11 or Starbucks for breakfast (long queue)
• Walking to campus
Gamma distribution with mean 30 minutes and standard deviation 10 minutes.
Exponential distribution with a mean of 20 minutes.
All $X_i$'s are independently distributed.
• If you leave home at 8:00 a.m. for a class session starting at 9:10, what is the probability of being late for the class?
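The mechanics of such a simulation can be sketched as follows. Only two component distributions survive on the slide, and which travel component each one describes is not recoverable, so this sketch sums just those two and leaves the others out; the resulting probability is therefore not the answer to the slide's question. The gamma parameters follow from mean = shape/rate and variance = shape/rate², giving shape 9 and rate 0.3. The seed and sample size are assumptions.

```r
set.seed(123)
N <- 100000                              # number of simulated trips (assumed)

# Gamma with mean 30 and sd 10: shape = (mean/sd)^2, rate = mean/sd^2
shape <- (30 / 10)^2                     # 9
rate  <- 30 / 10^2                       # 0.3
x_gamma <- rgamma(N, shape = shape, rate = rate)

# Exponential with mean 20 minutes
x_exp <- rexp(N, rate = 1 / 20)

# Partial travel time: only the two recoverable components are included
total <- x_gamma + x_exp

# 8:00 to 9:10 leaves 70 minutes; fraction of simulated trips exceeding it
p_late <- mean(total > 70)
```

Adding the remaining components with their own distributions would follow the same pattern: draw each independently, sum, and take the fraction of totals above 70 minutes.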
The Acceptance/Rejection Method • This method uses an auxiliary density for generation of random quantities from another distribution. This method is particularly useful for generating random numbers of random variables whose cumulative distribution functions cannot be expressed in closed form.
• Suppose that we want to generate random numbers of a random variable X with density f(X).
• An auxiliary density g(X), from which we know how to generate random samples, is identified such that cg(X) is everywhere no less than f(X) for some constant c, i.e., $c\,g(x) \ge f(x)$ for all x.
• Generate a random number x of density g(X).
• Generate a random number u from the density U[0, cg(x)).
• Reject x if u > f(x); otherwise, x is accepted as a random number from f(X).
• Repeat the above steps until the desired number of random numbers is obtained.
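The steps above can be sketched in R for a simple target. The choice of target f(x) = 6x(1 − x) on (0, 1), which is the Beta(2, 2) density, the uniform auxiliary density, and the constant c = 1.5 (the maximum of f) are all assumptions made for illustration; the function name `ar_sample` is hypothetical.

```r
set.seed(7)
f <- function(x) 6 * x * (1 - x)     # assumed target density, Beta(2, 2)
g <- function(x) dunif(x)            # auxiliary density, U(0, 1)
c_const <- 1.5                       # max of f is 1.5 at x = 0.5, so c * g(x) >= f(x)

ar_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n)                            # candidates drawn from g
    u <- runif(n, 0, c_const * g(x))         # u drawn from U[0, c g(x))
    out <- c(out, x[u <= f(x)])              # keep x unless u > f(x)
  }
  out[seq_len(n)]                            # return exactly n accepted values
}

smp <- ar_sample(5000)
```

On average a fraction 1/c of the candidates is accepted, so a tighter bound c wastes fewer draws. The sample mean and variance of `smp` should be close to the Beta(2, 2) values of 0.5 and 0.05.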