Probability distribution functions
• Normal distribution
• Lognormal distribution
• Mean, median and mode
• Tails
• Extreme value distributions
Normal (Gaussian) distribution
• Normal density function
• What does the figure tell us about the values of the CDF?
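For reference, the normal density and its CDF in their standard form, writing $\mu$ for the mean and $\sigma$ for the standard deviation (these symbols are an assumption and may differ from the slide's):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
F(x) = \int_{-\infty}^{x} f(t)\,dt
```

The CDF is the area under the density to the left of $x$, so by symmetry $F(\mu) = 0.5$.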
More on the normal distribution
• P = normcdf(X, MU, SIGMA) returns the cdf of the normal distribution with mean MU and standard deviation SIGMA, evaluated at the values in X. The size of P is the common size of X, MU and SIGMA.
• normcdf(1) = 0.8413
• 1-normcdf(6) = 9.8659e-010
• If X is normally distributed, Y = aX + b is also normally distributed. What would be the mean and standard deviation of Y?
• Notation
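A worked answer to the question about Y, using one common notation (assumed here): $N(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$.

```latex
X \sim N(\mu, \sigma^2)
\;\Longrightarrow\;
Y = aX + b \sim N\!\left(a\mu + b,\; a^2\sigma^2\right)
% mean of Y:               a\mu + b
% standard deviation of Y: |a|\,\sigma
```

The shift $b$ moves the mean only, while the scale $a$ multiplies the standard deviation by $|a|$.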
Estimating mean and standard deviation
• Given a sample from a normally distributed variable, the sample mean is the best linear unbiased estimator of the true mean.
• For the variance, the equation gives the best unbiased estimator, but its square root is not an unbiased estimate of the standard deviation.
x=randn(5,10000);   % 10000 samples of size 5, one per column
s=std(x);           % sample standard deviation of each column
mean(s)
0.9463
s2=s.^2;            % sample variances
mean(s2)
1.0106
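The variance estimator referred to here is presumably the usual sample variance with the $n-1$ divisor (an assumption about the slide's equation). For normal samples, the bias of its square root is known in closed form:

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad E\!\left[s^2\right] = \sigma^2

E[s] = c_4(n)\,\sigma,
\qquad
c_4(n) = \sqrt{\frac{2}{n-1}}\;
         \frac{\Gamma\!\left(\tfrac{n}{2}\right)}
              {\Gamma\!\left(\tfrac{n-1}{2}\right)},
\qquad c_4(5) \approx 0.940
```

With $n = 5$ and $\sigma = 1$, the expected value of s is about 0.94, which is in line with the simulated mean(s) falling below 1 while mean(s2) stays close to 1.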
Lognormal distribution
• If ln(X) has a normal distribution, X has a lognormal distribution. That is, if X is normally distributed, exp(X) is lognormally distributed.
• Notation:
• Probability distribution function (PDF)
• Mean and variance
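The standard lognormal formulas, written with $\mu$ and $\sigma$ as the mean and standard deviation of $\ln X$ (symbols assumed; they correspond to the second and third arguments of logncdf and lognrnd):

```latex
\ln X \sim N(\mu, \sigma^2)

f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right),
\qquad x > 0

E[X] = e^{\mu + \sigma^2/2},
\qquad
\operatorname{Var}[X] = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}
```

For $\mu = 0$, $\sigma = 1$ these give a mean of $e^{0.5} \approx 1.6487$ and a variance of $e\,(e-1) \approx 4.6708$.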
Mean, mode and median
• Mode (highest point)
• Median (50% of samples)
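Assuming this slide refers to the lognormal distribution with parameters $\mu$ and $\sigma$ of $\ln X$, the three measures of location are:

```latex
\text{mode} = e^{\mu - \sigma^2},
\qquad
\text{median} = e^{\mu},
\qquad
\text{mean} = e^{\mu + \sigma^2/2}

% Example with \mu = 0, \sigma = 1:
% mode = e^{-1} \approx 0.3679, median = 1, mean = e^{0.5} \approx 1.6487
```

For any $\sigma > 0$ the ordering is mode < median < mean, which reflects the long right tail.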
Light and heavy tails
• Normal distribution has a light tail. Six sigma is equivalent to 0.999999999 (nine nines) safety.
• Lognormal is heavy tailed: for the lognormal with parameters 0 and 1, the mean plus six standard deviations covers only 0.9963.
m=exp(0.5)
m = 1.6487
v=exp(1)*(exp(1)-1)
v = 4.6708
sig=sqrt(v)
sig = 2.1612
sig6=m+6*sig
sig6 = 14.6159
logncdf(sig6,0,1)
ans = 0.9963
Fitting distribution to data
• Typically fit to CDF.
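One way to fit to the CDF is a least-squares match between a model CDF and the empirical CDF. The sketch below is only an illustration under stated assumptions: the data are synthetic, the lognormal model is chosen for the example, and the names sse, p0, mu_fit and sigma_fit are hypothetical. It needs the Statistics Toolbox for lognrnd, logncdf and ecdf.

```matlab
% Sketch: fit a lognormal CDF to the empirical CDF by least squares.
rng(0);                             % fixed seed, for repeatability only
y = lognrnd(0, 1, 1, 200);          % synthetic data (true mu = 0, sigma = 1)

[F, X] = ecdf(y);                   % empirical CDF; first point is (min(y), 0)
F = F(2:end);                       % drop that duplicated first point
X = X(2:end);

% Sum of squared differences between the model CDF and the empirical CDF.
% p(1) = mu, p(2) = log(sigma), so sigma stays positive during the search.
sse = @(p) sum((logncdf(X, p(1), exp(p(2))) - F).^2);

p0 = [mean(log(y)), log(std(log(y)))];   % starting guess from log-moments
p  = fminsearch(sse, p0);                % derivative-free minimization

mu_fit    = p(1);
sigma_fit = exp(p(2));
fprintf('mu = %.3f, sigma = %.3f\n', mu_fit, sigma_fit);
```

Optimizing log(sigma) rather than sigma is a design choice that avoids a constrained search; other objective functions or a maximum-likelihood fit would be equally reasonable.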
Empirical CDF
[F, X] = ecdf(Y) calculates the Kaplan-Meier estimate of the cumulative distribution function (cdf), also known as the empirical cdf. Y is a vector of data values. F is a vector of values of the empirical cdf evaluated at X.
[F, X, FLO, FUP] = ecdf(Y) also returns lower and upper confidence bounds for the cdf. These bounds are calculated using Greenwood's formula, and are not simultaneous confidence bounds.
ecdf(...) without output arguments produces a plot of the empirical cdf. Use the data cursor to read precise values from the plot.
Example
x=lognrnd(0,1,1,20);      % small sample: 20 points
ecdf(x)
hold on
x=lognrnd(0,1,1,10000);   % large sample: 10000 points
ecdf(x)
Extreme value distributions
• No matter what distribution you sample from, the mean of the sample tends to be normally distributed as sample size increases (with what mean and standard deviation?)
• Similarly, the minimum (or maximum) of a sample tends to follow one of a small family of distributions.
• Even though there is an infinite number of distributions, there are only three extreme value distributions.
– Type I (Gumbel), derived from normal.
– Type II (Frechet), e.g. maximum daily rainfall
– Type III (Weibull), weakest link failure
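A worked answer to the question in the first bullet, together with the exact distribution of the extremes of $n$ independent samples. The symbols $\mu$, $\sigma$ and $F$ (mean, standard deviation and CDF of the sampled distribution) are assumptions introduced here, and the result for the mean requires a finite variance:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\approx\; N\!\left(\mu,\; \frac{\sigma^2}{n}\right)
\quad \text{for large } n
% mean \mu, standard deviation \sigma / \sqrt{n}

F_{\max}(x) = \left[F(x)\right]^{n},
\qquad
F_{\min}(x) = 1 - \left[1 - F(x)\right]^{n}
```

The extreme value distributions describe the limiting shape of $F_{\max}$ and $F_{\min}$ as $n$ grows, after suitable shifting and scaling.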
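Example
x=5-0.3*randn(10,1000);   % 1000 samples of size 10, one per column
minx=min(x);              % minimum of each sample
hist(minx);               % histogram of the minima
figure                    % new window so the histogram is not overwritten
ecdf(minx)                % empirical CDF of the minima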
Gumbel distribution
• PDF and CDF
• Mean, median, mode and variance
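A sketch of the Gumbel formulas in the form commonly used for maxima, with location $\mu$ and scale $\beta$. Both the symbols and the choice of the maximum form are assumptions; the version for minima follows by mirroring, that is, by applying these formulas to $-x$.

```latex
z = \frac{x - \mu}{\beta},
\qquad
f(x) = \frac{1}{\beta}\, e^{-\left(z + e^{-z}\right)},
\qquad
F(x) = e^{-e^{-z}}

\text{mean} = \mu + \gamma\beta \;\; (\gamma \approx 0.5772),
\qquad
\text{median} = \mu - \beta \ln(\ln 2),
\qquad
\text{mode} = \mu,
\qquad
\text{variance} = \frac{\pi^2 \beta^2}{6}
```

The median follows directly from setting $F(x) = 0.5$, which gives $e^{-z} = \ln 2$.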
Weibull distribution
• Probability distribution
• Used to describe the distribution of strength or fatigue life in brittle materials (weakest link connection)
• If it describes time to failure, then
– k<1 indicates that failure rate decreases with time,
– k=1 indicates constant rate,
– k>1 indicates increasing rate.
• Useful for other phenomena like wind speed distribution.
• Can add a 3rd parameter by replacing x by x-c.
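The two-parameter Weibull distribution in its standard form, with shape $k$ as in the bullets above and a scale parameter written here as $\lambda$ (an assumed symbol). The failure rate $h(x)$ shows where the three cases for $k$ come from:

```latex
f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}
       e^{-(x/\lambda)^{k}},
\qquad
F(x) = 1 - e^{-(x/\lambda)^{k}},
\qquad x \ge 0

h(x) = \frac{f(x)}{1 - F(x)}
     = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}
```

Since $h(x)$ is proportional to $x^{k-1}$, it decreases with time for $k<1$, is constant for $k=1$ (the exponential distribution), and increases for $k>1$.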