Spatial Statistics (SGG 2413): Descriptive Statistics
Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
Director, Centre for Real Estate Studies
Faculty of Engineering and Geoinformation Science
Universiti Teknologi Malaysia, Skudai, Johor
Spatial Statistics: Topic 3

Learning Objectives
• Overall: To give students a basic understanding of descriptive statistics
• Specific: Students will be able to:
  * understand the basic concept of descriptive statistics
  * understand the concept of distribution
  * calculate measures of central tendency and dispersion
  * calculate measures of skewness and kurtosis

Contents
• What is descriptive statistics
• Central tendency, dispersion, kurtosis, skewness
• Distribution

Descriptive Statistics
• Use sample information to explain/make an abstraction of population “phenomena”.
• Common “phenomena”:
  * Association (e.g. σ1,2.3 = 0.75)
  * Tendency (left skew, right skew)
  * Trend, pattern, location, dispersion, range
  * Causal relationship (e.g. if X then Y)
• Emphasis on meaningful characterisation of data (e.g. central tendency, variability), graphics, and description
• Use non-parametric analysis (e.g. χ², t-test, 2-way ANOVA)

E.g. of Abstraction of Phenomena [figure]

Inferential Statistics
• Using sample statistics to infer some “phenomena” of population parameters
• Common “phenomena”: cause-and-effect
  * One-way relationship: Y = f(X)
  * Feedback relationship: Y1 = f(Y2, X, e1); Y2 = f(Y1, Z, e2)
  * Recursive relationship: Y1 = f(X, e1); Y2 = f(Y1, Z, e2)
• Use parametric analysis (e.g. α and β) through regression analysis
• Emphasis on hypothesis testing

Parametric Statistics
• Statistical analysis that attempts to explain the population parameter using a sample
• E.g. of statistical parameters: mean, variance, std. dev., R², t-value, F-ratio, ρxy, etc.
• It assumes that the distributions of the variables being assessed belong to known parameterised families of probability distributions

Examples of Parametric Relationship
[figure: Dep = 9t − 215.8; Dep = 7t − 192.6]

Non-parametric Statistics
• First used by Wolfowitz (1942)
• Statistical analysis that attempts to explain the population parameter using a sample without making assumptions about the frequency distribution of the assessed variable
• In other words, the variable being assessed is distribution-free
• E.g. of non-parametric statistics: histogram, stochastic kernel, non-parametric regression

Descriptive & Inferential Statistics (DS & IS)
• DS gathers information about a population characteristic (e.g. income) and describes it with a parameter of interest (e.g. the mean)
• IS uses the parameter to test a hypothesis pertaining to that characteristic, e.g. H0: mean income = RM 4,000; H1: mean income < RM 4,000
• The result of hypothesis testing is used to make inferences about the characteristic of interest (e.g. Malaysian upper-middle income)

Sample Statistics: Central Tendency
Mean (sum of all values ÷ no. of values)
  Advantages: best-known average; exactly calculable; makes use of all data; useful for statistical analysis
  Disadvantages: affected by extreme values; can be absurd for discrete data (e.g. family size = 4.5 persons); cannot be obtained graphically
Median (middle value)
  Advantages: not influenced by extreme values; obtainable even if the data distribution is unknown (e.g. group/aggregate data); unaffected by irregular class width; unaffected by open-ended classes
  Disadvantages: needs interpolation for group/aggregate data (cumulative frequency curve); may not be characteristic of the group when (1) items are only few or (2) the distribution is irregular; very limited statistical use
Mode (most frequent value)
  Advantages: unaffected by extreme values; easy to obtain from a histogram; determinable from only the values near the modal class
  Disadvantages: cannot be determined exactly in group data; very limited statistical use

Central Tendency – Mean
• For individual observations, $\bar{x} = \frac{\sum x}{n}$. E.g. X = {3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12}: Σx = 96; n = 12
• Thus, x̄ = 96/12 = 8
• The above observations can be organised into a frequency table and the mean calculated on the basis of frequencies, $\bar{x} = \frac{\sum fx}{\sum f}$:
  x:   3   5   7   8   9  10  12
  f:   1   1   2   3   2   2   1
  fx:  3   5  14  24  18  20  12
  Σfx = 96; Σf = 12
  Thus, x̄ = 96/12 = 8
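
A minimal Python sketch (standard library only; the variable names are illustrative) checking that the mean of the individual observations and the mean computed from the frequency table agree:

```python
from collections import Counter

# Individual observations from the slide (two 10s, so n = 12)
x = [3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12]

# Mean of individual observations: sum(x) / n
raw_mean = sum(x) / len(x)

# The same data organised as a frequency table {value: frequency}
freq = Counter(x)
sum_fx = sum(value * f for value, f in freq.items())
sum_f = sum(freq.values())
table_mean = sum_fx / sum_f

print(raw_mean, table_mean)  # 8.0 8.0
```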

Central Tendency – Mean and Mid-point
• Let's say we have data like this:
  Price (RM ‘000/unit) of Shop Houses in Skudai
  Location   Min   Max
  Town A     228   450
  Town B     320   430
• Can you calculate the mean?

Central Tendency – Mean and Mid-point (contd.)
• Let's calculate: M = ½(Min + Max)
  Town A: (228 + 450)/2 = 339
  Town B: (320 + 430)/2 = 375
• Are these figures means?

Central Tendency – Mean and Mid-point (contd.)
• Let's say we have price data as follows:
  Town A: 228, 295, 310, 420, 450
  Town B: 320, 295, 310, 400, 430
• Calculate the means:
  Town A: (228 + 295 + 310 + 420 + 450)/5 = 1,703/5 = 340.6
  Town B: (320 + 295 + 310 + 400 + 430)/5 = 1,755/5 = 351.0
• Are the results the same as previously?
• Be careful about the mean and the “mid-point”!
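
The contrast between the mid-point and the mean can be sketched in a few lines of Python; this assumes the Min/Max figures from the first shop-house table and the individual prices from this slide (RM ‘000/unit):

```python
# Min/Max from the shop-house table; individual prices from this slide
ranges = {"Town A": (228, 450), "Town B": (320, 430)}
prices = {
    "Town A": [228, 295, 310, 420, 450],
    "Town B": [320, 295, 310, 400, 430],
}

# Mid-point M = ½(Min + Max) versus the arithmetic mean of all values
midpoints = {town: (lo + hi) / 2 for town, (lo, hi) in ranges.items()}
means = {town: sum(v) / len(v) for town, v in prices.items()}

for town in ranges:
    print(f"{town}: mid-point = {midpoints[town]:.1f}, mean = {means[town]:.1f}")
```

The mid-point only uses the two extreme values, so it coincides with the mean only when the values are spread symmetrically between them.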

Central Tendency – Mean of Grouped Data
• House rentals or prices in the PMR are frequently tabulated as a range of values, e.g.:
  Rental (RM/month)      135–140   140–145   145–150   150–155   155–160
  Mid-point value (x)      137.5     142.5     147.5     152.5     157.5
  Number of Taman (f)          5         9         6         2         1
  fx                       687.5    1282.5     885.0     305.0     157.5
• What is the mean rental across the areas?
  Σf = 23; Σfx = 3317.5
  Thus, x̄ = 3317.5/23 = 144.24
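
A short Python sketch of the grouped-data mean, assuming (as the slide does) that every Taman in a class sits at the class mid-point:

```python
# Class limits (RM/month) and number of Taman per class, from the table
classes = [(135, 140), (140, 145), (145, 150), (150, 155), (155, 160)]
f = [5, 9, 6, 2, 1]

midpoints = [(lo + hi) / 2 for lo, hi in classes]   # x
fx = [m * n for m, n in zip(midpoints, f)]          # f·x
grouped_mean = sum(fx) / sum(f)

print(sum(f), sum(fx), round(grouped_mean, 2))  # 23 3317.5 144.24
```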

Central Tendency – Median
• Let's say house rentals in a particular town are tabulated:
  Rental (RM/month)      130–135   135–140   140–145   145–150   150–155
  Number of Taman (f)          3         5         9         6         2
  Rental (RM/month)        < 135     < 140     < 145     < 150     < 155
  Cumulative frequency         3         8        17        23        25
• Calculation of the “median” rental needs a graphical aid (ogive):
  1. Median position = (n + 1)/2 = (25 + 1)/2 = 13th Taman
  2. (i.e. between 10 and 15 on the vertical axis of the ogive)
  3. This corresponds to RM 140–145/month on the horizontal axis
  4. There are (17 − 8) = 9 Taman in the range RM 140–145/month
  5. The 13th Taman is the 5th of these 9 Taman (13 − 8 = 5)
  6. The rental interval width is 5
  7. Therefore, the median rental is: 140 + (5/9 × 5) = RM 142.8/month
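
The seven steps above amount to linear interpolation inside the class that contains the median position. A minimal Python sketch, assuming Taman are spread evenly within each class (the helper name `grouped_quantile_position` is hypothetical):

```python
from itertools import accumulate

def grouped_quantile_position(classes, freqs, position):
    """Interpolate the value at a (1-based) rank position in grouped data,
    as done by hand on the ogive."""
    cum = list(accumulate(freqs))
    below = 0  # cumulative frequency below the current class
    for (lower, upper), f, c in zip(classes, freqs, cum):
        if position <= c:
            width = upper - lower
            return lower + (position - below) / f * width
        below = c
    raise ValueError("position lies beyond the last class")

classes = [(130, 135), (135, 140), (140, 145), (145, 150), (150, 155)]
freqs = [3, 5, 9, 6, 2]
n = sum(freqs)

median = grouped_quantile_position(classes, freqs, (n + 1) / 2)  # 13th Taman
print(round(median, 1))  # 142.8
```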

Central Tendency – Median (contd.) [figure]

Central Tendency – Quartiles (contd.)
• Following the same process as in calculating the “median”:
  Upper quartile position = ¾(n + 1) = 19.5th Taman (in the RM 145–150 class, which holds 6 Taman above a cumulative frequency of 17)
  UQ = 145 + (2.5/6 × 5) = RM 147.1/month
  Lower quartile position = (n + 1)/4 = 26/4 = 6.5th Taman (in the RM 135–140 class, which holds 5 Taman above a cumulative frequency of 3)
  LQ = 135 + (3.5/5 × 5) = RM 138.5/month
  Inter-quartile range = UQ − LQ = 147.1 − 138.5 = RM 8.6/month
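
The quartiles use the same interpolation as the median, only at positions ¼(n + 1) and ¾(n + 1). A self-contained Python sketch under the same even-spread assumption (the helper name is hypothetical):

```python
from itertools import accumulate

def grouped_quantile_position(classes, freqs, position):
    """Interpolate the value at a (1-based) rank position in grouped data."""
    below = 0
    for (lower, upper), f, c in zip(classes, freqs, accumulate(freqs)):
        if position <= c:
            return lower + (position - below) / f * (upper - lower)
        below = c
    raise ValueError("position lies beyond the last class")

classes = [(130, 135), (135, 140), (140, 145), (145, 150), (150, 155)]
freqs = [3, 5, 9, 6, 2]
n = sum(freqs)

uq = grouped_quantile_position(classes, freqs, 3 * (n + 1) / 4)  # 19.5th Taman
lq = grouped_quantile_position(classes, freqs, (n + 1) / 4)      # 6.5th Taman
print(round(uq, 1), round(lq, 1), round(uq - lq, 1))  # 147.1 138.5 8.6
```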

Variability
• Indicates dispersion, spread, variation, deviation
• For single population or sample data:
  $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}, \qquad s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
  where σ² and s² = population and sample variance respectively, xi = individual observations, μ = population mean, x̄ = sample mean, and n = total number of individual observations.
• The square roots, σ and s, are the standard deviations.

Variability (contd.)
• Why is a “measure of dispersion” important? Consider the yields of two plant species:
  * Plant A (ton) = {1.8, 1.9, 2.0, 2.1, 3.6}
  * Plant B (ton) = {1.0, 1.5, 2.0, 3.0, 3.9}
• Mean A = mean B = 2.28 ton
• But the variability differs! Var(A) = 0.557, Var(B) = 1.367
  * Would you choose to grow plant A or B?
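
The figures can be checked with Python's `statistics` module; `statistics.variance` uses the sample divisor n − 1, matching s² above:

```python
from statistics import mean, variance

plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]
plant_b = [1.0, 1.5, 2.0, 3.0, 3.9]

for name, yields in (("A", plant_a), ("B", plant_b)):
    # Same mean, different spread
    print(name, round(mean(yields), 2), round(variance(yields), 3))
# A 2.28 0.557
# B 2.28 1.367
```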

Variability (contd.)
• Coefficient of variation (CV): the std. deviation as a % of the mean,
  $CV = \frac{s}{\bar{x}} \times 100\%$
• A better measure than the std. dev. where samples have different means. E.g.
  * Plant X (ton/ha) = {1.2, 1.4, 2.6, 2.7, 3.9}
  * Plant Y (ton/ha) = {1.4, 1.5, 2.1, 3.2, 3.9}

Variability (contd.)
• Calculate the CV for both species.
  Yield (ton/ha)
  Farm No.    Species X   Species Y
  1              1.2         1.4
  2              1.4         1.5
  3              2.6         2.1
  4              2.7         3.2
  5              3.9         3.9
  Mean           2.36        2.42
  Var.           1.20        1.20
  Std. dev.      1.10        1.09
• CVx = (1.097/2.36) × 100 = 46.48%
• CVy = (1.094/2.42) × 100 = 45.21%
• Species X is a little more variable than species Y
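
A small Python sketch of the CV comparison; note that it divides the sample standard deviation (not the variance) by the mean, per the definition of CV:

```python
from statistics import mean, stdev

species = {
    "X": [1.2, 1.4, 2.6, 2.7, 3.9],
    "Y": [1.4, 1.5, 2.1, 3.2, 3.9],
}

def cv(values):
    """Coefficient of variation: sample std. dev. as a % of the mean."""
    return stdev(values) / mean(values) * 100

cvs = {name: cv(v) for name, v in species.items()}
for name, value in cvs.items():
    print(f"CV{name.lower()} = {value:.2f}%")
# CVx = 46.48%
# CVy = 45.21%
```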

Variability (contd.)
• Std. dev. of a frequency distribution, e.g. the age distribution of second-home buyers (SHB):
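
As a sketch, the standard deviation of a frequency distribution weights each squared deviation by its frequency. This example reuses, purely for illustration, the grouped rental mid-points and frequencies from the mean-of-grouped-data slide, and assumes the sample divisor Σf − 1 (as in s² above):

```python
import math

# Grouped rental data (mid-points x, frequencies f), reused for illustration
x = [137.5, 142.5, 147.5, 152.5, 157.5]
f = [5, 9, 6, 2, 1]

n = sum(f)
x_bar = sum(fi * xi for fi, xi in zip(f, x)) / n
# Assumption: sample form with divisor (sum of f) - 1
ss = sum(fi * (xi - x_bar) ** 2 for fi, xi in zip(f, x))
s = math.sqrt(ss / (n - 1))
print(round(x_bar, 2), round(s, 2))  # 144.24 5.35
```

Dividing by Σf instead of Σf − 1 gives the population form; for large Σf the two are very close.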

Probability Distribution
• Logical probability: If there are 20 lecturers, the probability that A becomes a professor is p = 1/20 = 0.05
• Experiential probability: Out of 100 births, half were girls (p = 0.5); as the number increased to 1,000, two-thirds were girls (p = 0.67); but from a record of 10,000 new-born babies, three-quarters were girls (p = 0.75)
• Subjective probability: The probability of a drug addict recovering from addiction is 50:50
• General rule:
  Pr(event X) = (No. of times event X occurs) / (Total number of occurrences)
• The probability of a certain event X occurring has a specific form of distribution

Probability Distribution
• Classical example of tossing two dice (each cell = sum of the two dice):
              Dice 1
  Dice 2    1   2   3   4   5   6
    1       2   3   4   5   6   7
    2       3   4   5   6   7   8
    3       4   5   6   7   8   9
    4       5   6   7   8   9  10
    5       6   7   8   9  10  11
    6       7   8   9  10  11  12
• What is the distribution of the sum of tosses?
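
One way to answer the question is to enumerate all 36 equally likely outcomes and apply the general rule Pr = (no. of times the event occurs)/(total occurrences). A Python sketch using exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (dice 1, dice 2) outcomes, tallied by their sum
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
dist = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in dist.items():
    print(s, p)
```

The probabilities rise from 1/36 at a sum of 2 to 1/6 at 7 and fall back to 1/36 at 12, and they add up to 1.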

Probability Distribution (contd.)
• Discrete variable: values of x are discrete (discontinuous)
• Sum of lengths of vertical bars: $\sum_{\text{all } x} p(X = x) = 1$

Probability Distribution (contd.)
• Continuous variable: age distribution of second-home buyers
  Age    Freq   Prob.
  36        3    0.02
  37       14    0.07
  38       10    0.04
  39       36    0.18
  40       73    0.36
  41       27    0.14
  42       20    0.10
  43       17    0.09
  Total   200    1.00
• Mean = 39.5; Std. dev. = 2.45
• Pr(area under curve) = 1
[figure: probability histogram of the age distribution of second-home buyers]

Probability Distribution (contd.)
• Pr(Age ≤ 36) = 0.02
• Pr(Age ≤ 37) = Pr(Age ≤ 36) + Pr(Age = 37) = 0.02 + 0.07 = 0.09
• Pr(Age ≤ 38) = Pr(Age ≤ 37) + Pr(Age = 38) = 0.09 + 0.04 = 0.13
• Pr(Age ≤ 39) = Pr(Age ≤ 38) + Pr(Age = 39) = 0.13 + 0.18 = 0.31
• Pr(Age ≤ 40) = Pr(Age ≤ 39) + Pr(Age = 40) = 0.31 + 0.36 = 0.67
• Pr(Age ≤ 41) = Pr(Age ≤ 40) + Pr(Age = 41) = 0.67 + 0.14 = 0.81
• Pr(Age ≤ 42) = Pr(Age ≤ 41) + Pr(Age = 42) = 0.81 + 0.10 = 0.91
• Pr(Age ≤ 43) = Pr(Age ≤ 42) + Pr(Age = 43) = 0.91 + 0.09 = 1.00
• Cumulative probability corresponds to the left tail of a distribution
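
The running sums above are exactly what `itertools.accumulate` produces. A short Python sketch using the slide's rounded probabilities (the rounding to 2 d.p. only hides floating-point noise):

```python
from itertools import accumulate

ages = [36, 37, 38, 39, 40, 41, 42, 43]
probs = [0.02, 0.07, 0.04, 0.18, 0.36, 0.14, 0.10, 0.09]

# Running total = Pr(Age <= a)
cumulative = [round(c, 2) for c in accumulate(probs)]
for age, c in zip(ages, cumulative):
    print(f"Pr(Age <= {age}) = {c:.2f}")
```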

Probability Distribution (contd.)
• As larger and larger samples are drawn, the probability distribution becomes smoother
• There are tens of different types of probability distribution: Z, t, F, gamma, etc.
• Most important: the normal distribution
[figure panels: Larger sample; Very large sample]

Normal Distribution (ND)
• Salient features of ND:
  * Bell-shaped, symmetrical
  * Total area under curve = 1
  * Area under curve between any two points = prob. of values in that range (shaded area)
  * Prob. of any exact value = 0
  * Has a function of:
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
    where μ = mean of variable x; σ = std. dev. of x; π = ratio of the circumference of a circle to its diameter ≈ 3.14; e = base of natural log ≈ 2.71828.
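
The density formula can be written directly in Python and cross-checked against the standard library's `statistics.NormalDist.pdf`. The parameters μ = 39.3 and σ = 2.4 are borrowed from the second-home buyer example for illustration:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(σ√(2π)) · exp(-(x - μ)² / (2σ²))"""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

mu, sigma = 39.3, 2.4
for x in (36, 39.3, 42):
    # Hand-written formula next to the library value
    print(x, round(normal_pdf(x, mu, sigma), 5), round(NormalDist(mu, sigma).pdf(x), 5))
```

The curve is highest at x = μ and symmetric about it, consistent with the bell shape described above.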

Normal Distribution (ND)
[figure: normal curves of Population 1 (μ1, σ1) and Population 2 (μ2, σ2)]
  * μ determines location, while σ determines the shape of the ND
  * A larger population has a narrower base (smaller variance)

Normal Distribution (contd.)
  * Has a mean μ and a variance σ², i.e. X ~ N(μ, σ²)
  * Has the following distribution of observations: “Home-buyers example…”
    Mean age = 39.3; Std. dev. = 2.42

Standard Normal Distribution (SND)
• Since different populations have different μ and σ (thus, different locations and shapes of distribution), they have to be standardised.
• Most common standardisation: the standard normal distribution (SND), also called the Z-distribution
• Probabilities for X are given by areas under the curve
• The normal density has no closed-form integral, so probabilities are read from tables of Z ~ N(0, 1)
• To transform f(x) into f(z):
  Z = (x − µ)/σ ~ N(0, 1)

Z-Distribution
• Probability is distributed in such a way that:
  * Approx. 68% lies within −1 < z < 1
  * Approx. 95% lies within −1.96 < z < 1.96
  * Approx. 99% lies within −2.58 < z < 2.58
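
These coverage figures can be confirmed with the standard normal CDF from Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, N(0, 1)
for k in (1, 1.96, 2.58):
    coverage = z.cdf(k) - z.cdf(-k)
    print(f"P({-k} < Z < {k}) = {coverage:.4f}")
# P(-1 < Z < 1) = 0.6827
# P(-1.96 < Z < 1.96) = 0.9500
# P(-2.58 < Z < 2.58) = 0.9901
```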

Z-Distribution (contd.)
• When X = μ, Z = 0
• When X = μ + σ, Z = 1; when X = μ + 2σ, Z = 2; when X = μ + 3σ, Z = 3; and so on.
• It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)
• The SND table shows the probability to the right of any particular value of Z.
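
A quick numerical illustration that standardising does not change probabilities. The parameters (μ = 39.3, σ = 2.4) are borrowed from the second-home buyer example, and the interval 36 to 41 is an arbitrary illustrative choice:

```python
from statistics import NormalDist

mu, sigma = 39.3, 2.4
X = NormalDist(mu, sigma)
Z = NormalDist()  # N(0, 1)

def to_z(x):
    """Standardise: subtract the mean, divide by the std. dev."""
    return (x - mu) / sigma

x1, xk = 36, 41  # illustrative interval
p_x = X.cdf(xk) - X.cdf(x1)
p_z = Z.cdf(to_z(xk)) - Z.cdf(to_z(x1))
print(round(p_x, 4), round(p_z, 4))  # the two probabilities agree
```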

Normal Distribution… Questions
• A study found that the mean age, A, of second-home buyers in Johor Bahru is 39.3 years, with a standard deviation of 2.4 years. Assuming normality, what is the probability that a buyer's age is: (a) ≥ 40 years; (b) 39 to 42 years?
• Answer:
  (a) P(A ≥ 40) = P[Z ≥ (40 − 39.3)/2.4] = P(Z ≥ 0.2917) ≈ P(Z ≥ 0.30) = 0.3821
  (b) P(39 ≤ A ≤ 42) = P(A ≥ 39) − P(A ≥ 42)
      = P[Z ≥ (39 − 39.3)/2.4] − P[Z ≥ (42 − 39.3)/2.4]
      = P(Z ≥ −0.125) − P(Z ≥ 1.125)
      ≈ (1 − 0.45224) − 0.12924 = 0.54776 − 0.12924 = 0.4185
• Use the Z-table! Always remember: to convert to the SND, subtract the mean and divide by the std. dev.
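
A Python sketch of the same question using `statistics.NormalDist`, assuming (as the worked arithmetic does) a mean of 39.3 years and a standard deviation of 2.4 years. Without rounding z to table precision this gives about 0.385 for (a) and 0.419 for (b); lookups in a Z-table at rounded z values will differ slightly.

```python
from statistics import NormalDist

A = NormalDist(mu=39.3, sigma=2.4)  # ages of second-home buyers, assumed normal

p_a = 1 - A.cdf(40)                 # (a) P(A >= 40)
p_b = A.cdf(42) - A.cdf(39)         # (b) P(39 <= A <= 42)
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}")
```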

“Student’s” t-Distribution
• Similar to the Z-distribution (bell-shaped, symmetrical)
• Has a function of:
  $f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
  where Γ = gamma function; ν = n − 1 = d.o.f.; π ≈ 3.14159
• Flatter, with thicker tails
• Distributed as t(0, σ), with −∞ < t < +∞
• As n → ∞, t(0, σ) → N(0, 1)
• Probability calculation requires information on the d.o.f.
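
A Python sketch of the t density above, compared with the standard normal density. It uses `math.lgamma` rather than `math.gamma` so that large ν does not overflow; the chosen ν values are illustrative:

```python
import math
from statistics import NormalDist

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(nu * math.pi)
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)

z_pdf = NormalDist().pdf
for nu in (1, 5, 30, 1000):
    # Peak height rises towards the normal's; tail height at t = 3 falls towards it
    print(nu, round(t_pdf(0, nu), 4), round(t_pdf(3, nu), 4))
print("normal", round(z_pdf(0), 4), round(z_pdf(3), 4))
```

The output shows the “flatter, thicker tails” property for small d.o.f. and the convergence to N(0, 1) as ν grows.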

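How Are the t-dist. and Z-dist. Related?
• Using the central limit theorem, x̄ ~ N(μ, σ²/n), so that $z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \to N(0, 1)$ as n → ∞
• For a large sample, the t-dist. of a variable or a parameter is given by: $t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$
• The interval of critical values for variable x is: $\bar{x} - t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,\nu}\frac{s}{\sqrt{n}}$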

Skewness, m3 & Kurtosis, m4
• Skewness, m3, measures the degree of symmetry of a distribution
• Kurtosis, m4, measures its degree of peakedness
• Both are useful when comparing sample distributions with different shapes
• Useful in data analysis
  $m_3 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^3}{n s^3}, \qquad m_4 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^4}{n s^4}$
  where xi = individual sample observation; x̄ = sample mean; s = std. deviation; n = sample size

Skewness
[figure panels: Right (+ve) skew; Left (−ve) skew; Bimodal; Uniform; Perfectly normal (zero skew); J-shaped]

Kurtosis
[figure panels: Leptokurtic (high peak, +ve kurtosis); Mesokurtic (normal, zero kurtosis); Platykurtic (low peak, −ve kurtosis)]
• The signs in the panels refer to excess kurtosis (kurtosis − 3):
  Mesokurtic distribution…kurtosis = 3
  Leptokurtic distribution…kurtosis > 3
  Platykurtic distribution…kurtosis < 3

Occurrence of Ganoderma
  X-coord. (000)   Y-coord. (000)   Trees with ganoderma
  535.60           104.80            8
  536.70           107.30           12
  536.80           106.80           11
  537.30           107.31           12
  537.15           105.40           13
  537.40           105.37           13
  538.48           107.82            9
  542.22           106.10            8
  540.35           105.91            7
  540.10           104.95            7
  540.30           104.75            6
  538.75           102.80            5
  545.10           105.90            4
  546.30           105.90            3
  547.15           105.90            2
  548.94           102.80            4
  547.75           106.08            5
  547.10           105.25            8
  547.80           101.05            7
  548.18           105.92            8
  548.80           105.90           12
  548.95           104.85           15
  548.94           104.50           13
  548.75           103.73            7
[figure: Occurrence of ganoderma]

Aluminium Residues in the Soil
  Al (p.p.m.)   Freq.
  0                 0
  250               7
  500              13
  750              25
  1000             18
  1250             13
  1500              9
  1750              7
  2000              3
  2250              4
  2500              3
  Sum          102.00
• E.g. Al2++ + H2++O-- → Al2O + H2
• Mean = 1073.53
• s = 553.05; s² = 305,867.94; s³ = 169,161,266.28; s⁴ = 93,555,193,911.64
• Skew = 0.77
• Kurtosis = 3.04
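
A Python sketch of m3 and m4 for the aluminium frequency table, assuming the std. deviation uses divisor n (the convention that reproduces s² = 305,867.94 above):

```python
import math

# Aluminium residues: value (p.p.m.) and frequency, from the table
x = [0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500]
f = [0, 7, 13, 25, 18, 13, 9, 7, 3, 4, 3]

n = sum(f)
x_bar = sum(fi * xi for fi, xi in zip(f, x)) / n

def central_moment(k):
    """k-th central moment with divisor n: sum f (x - mean)^k / n."""
    return sum(fi * (xi - x_bar) ** k for fi, xi in zip(f, x)) / n

s = math.sqrt(central_moment(2))   # std. dev. with divisor n
skew = central_moment(3) / s ** 3  # m3
kurt = central_moment(4) / s ** 4  # m4 (equals 3 for a normal distribution)
print(n, round(x_bar, 2), round(s, 2), round(skew, 2), round(kurt, 2))
```

A kurtosis this close to 3 means the distribution is roughly mesokurtic, while the positive skew reflects the longer right tail of high residue values.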

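Measures of Spatial Separation
• Weighted mean centre (X-coord.): $\bar{X}_w = \frac{\sum f_i x_i}{\sum f_i}$
• Weighted mean centre (Y-coord.): $\bar{Y}_w = \frac{\sum f_i y_i}{\sum f_i}$
• Distance between (x1, y1) and (x2, y2): $d = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$
• E.g. WC–M = ((545.10 − 542.86)² + (105.90 − 105.48)²)^0.5 = (5.0176 + 0.1764)^0.5 = 2.28 (i.e. 2,280 m)
• Standard distance: the spread of the points about the weighted mean centre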
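
A minimal Python sketch of these measures. The weighted mean centre and the distance follow the formulas above; the standard-distance function uses one common weighted form, which is an assumption here. The toy points are hypothetical and only illustrate the calls:

```python
import math

def weighted_mean_centre(points):
    """points: list of (x, y, f). Returns (Xw, Yw) = (Σfx/Σf, Σfy/Σf)."""
    total_f = sum(f for _, _, f in points)
    xw = sum(f * x for x, _, f in points) / total_f
    yw = sum(f * y for _, y, f in points) / total_f
    return xw, yw

def distance(p, q):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def standard_distance(points):
    """Assumed weighted form: sqrt(Σf[(x - Xw)² + (y - Yw)²] / Σf)."""
    xw, yw = weighted_mean_centre(points)
    total_f = sum(f for _, _, f in points)
    ss = sum(f * ((x - xw) ** 2 + (y - yw) ** 2) for x, y, f in points)
    return math.sqrt(ss / total_f)

# The WC–M example from the slide (coordinates in '000)
wc, m = (542.86, 105.48), (545.10, 105.90)
print(round(distance(wc, m), 2))  # 2.28

# Hypothetical toy points (x, y, number of trees), for illustration only
toy = [(0.0, 0.0, 1), (2.0, 0.0, 1), (1.0, 3.0, 2)]
print(weighted_mean_centre(toy), round(standard_distance(toy), 3))
```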

Spatial Distribution – Occurrence of Ganoderma
• Sum f = 191.00
• Weighted mean centre: Xw = 103,687.00/191 = 542.86; Yw = 20,147.40/191 = 105.48
• Σ(x − Xw)² = 588.46; Σ(y − Yw)² = 55.50
• Standard distance = 1.84
• Point-to-point distance, e.g. WC–M: x-dist. (squared) = 5.00; y-dist. (squared) = 0.17; distance = 2.27

Spatial Distribution – Point Data: Ethnic Distribution of Residence [figure]

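Ethnic Distribution of Residence
  x     f    fx   (x − x̄)²   f(x − x̄)²
  0    81     0     0.236       19.11
  1    50    50     0.264       13.22
  2     9    18     2.293       20.64
  Σ   140    68                 52.97
• x̄ = 68/140 = 0.4857; s² = 52.97/139 = 0.3811
• CV = s²/x̄ = 0.7846; s_CV = (2/139)^½ = 0.1200
• tc = (CV − 1)/s_CV = (0.7846 − 1)/0.1200 = −1.80
• H0: σ² = x̄ (pattern is random)
  H1: σ² > x̄ (pattern is clustered) or σ² < x̄ (pattern is scattered)
• |tc| = 1.80 < 1.96, so H0 is not rejected at the 5% level (two-tailed); CV < 1 indicates only a tendency towards a scattered residence pattern
• x = no. of observations per quadrat; f = frequency of quadrats; k = Σf = number of quadrats; x̄ = Σfx/Σf; s² = Σf(x − x̄)²/(k − 1); CV = s²/x̄; s_CV = (2/(k − 1))^½; test statistic tc = (CV − 1)/s_CV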
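
A Python sketch of the variance-to-mean ratio test for quadrat counts. It assumes the frequency-weighted sample variance with divisor k − 1, k = Σf = 140 quadrats, and a two-tailed 5% critical value of 1.96; under these assumptions it gives tc ≈ −1.80:

```python
import math

# Quadrat counts: x = observations per quadrat, f = number of quadrats
x = [0, 1, 2]
f = [81, 50, 9]

k = sum(f)  # number of quadrats
x_bar = sum(fi * xi for fi, xi in zip(f, x)) / k
s2 = sum(fi * (xi - x_bar) ** 2 for fi, xi in zip(f, x)) / (k - 1)

vmr = s2 / x_bar              # CV = s² / x̄ (equals 1 for a random pattern)
se = math.sqrt(2 / (k - 1))   # standard error of the ratio
t_c = (vmr - 1) / se

print(f"x̄ = {x_bar:.4f}, s² = {s2:.4f}, CV = {vmr:.4f}, tc = {t_c:.2f}")
if abs(t_c) > 1.96:           # two-tailed, 5% level
    print("Reject H0:", "clustered" if t_c > 0 else "scattered")
else:
    print("Do not reject H0: pattern consistent with random")
```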