Introduction to Descriptive Statistics 2/25/03
Population vs. Sample
Notation: population quantities use Greek letters (e.g., μ, σ, β); sample quantities use Roman letters (e.g., s, b)
Types of Variables
Describing data
            Moment                          Non-mean based measure
Center      Mean                            Mode, median
Spread      Variance (standard deviation)   Range, interquartile range
Skewed      Skewness                        --
Peaked      Kurtosis                        --
Mean
Variance, Standard Deviation
Variance, S.D. of a Sample
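For reference, the usual textbook definitions are written out here in the Greek (population) and Roman (sample) notation. The formulas on the original slides were not recovered, so treat these as the standard forms rather than the slides' exact wording; N and n denote population and sample size.

```latex
% Population mean, variance, and standard deviation
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}

% Sample mean, variance, and standard deviation
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}
```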
Coefficient of variation
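A minimal sketch of these measures, written in Python rather than Stata so it runs on its own. The ratings are made-up values on a 1-7 scale, not course data, and the coefficient of variation is taken here as the sample s.d. divided by the mean.

```python
import statistics

# Hypothetical ratings on a 1-7 scale (illustrative only, not course data).
ratings = [5, 6, 6, 7, 4, 6, 7, 5]

mean = statistics.fmean(ratings)          # arithmetic mean
pop_var = statistics.pvariance(ratings)   # divides by N (population form)
samp_var = statistics.variance(ratings)   # divides by n - 1 (sample form)
samp_sd = statistics.stdev(ratings)       # square root of the sample variance
cv = samp_sd / mean                       # coefficient of variation (unitless)

print(f"mean={mean:.3f} pop_var={pop_var:.3f} samp_var={samp_var:.3f} "
      f"sd={samp_sd:.3f} cv={cv:.3f}")
```

The sample variance is always a little larger than the population form on the same data, because it divides by n - 1 instead of N.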
Skewness: symmetrical distribution
• IQ
• SAT
Skewness: asymmetrical distribution
• GPA of MIT students
Skewness (asymmetrical distribution)
• Income
• Contribution to candidates
• Populations of countries
• “Residual vote” rates
Skewness
Skewness
Kurtosis
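Skewness and kurtosis can be sketched as the third and fourth standardized moments. This Python version uses the plain (divide-by-n) moment definitions and reports kurtosis unadjusted, so a normal curve scores 3. The function names and data are illustrative assumptions, and other software may apply small-sample corrections.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Third standardized moment: m3 / m2**1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth standardized moment: m4 / m2**2 (not excess kurtosis)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical data, illustrative only.
symmetric = [1, 2, 3, 4, 5]
left_tail = [1, 5, 6, 6, 7, 7, 7]   # one low score, most scores near the top

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(left_tail), kurtosis(left_tail))
```

A long tail toward low values gives negative skewness, the same sign as the teaching-evaluation items in the SEG table.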
A few words about the normal curve
• Skewness = 0
• Kurtosis = 3
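As a rough check on those two values, the sketch below draws simulated normal data in Python and computes the standardized moments. The seed, sample size, and helper name are arbitrary choices for illustration; the results only approximate 0 and 3 because of sampling noise.

```python
import random

def standardized_moment(data, k):
    """k-th central moment divided by the k-th power of the (divide-by-n) s.d."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    mk = sum((x - mean) ** k for x in data) / n
    return mk / m2 ** (k / 2)

rng = random.Random(2003)  # fixed seed so the run is repeatable
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

skew = standardized_moment(draws, 3)   # should be close to 0
kurt = standardized_moment(draws, 4)   # should be close to 3
print(f"skewness={skew:.3f} kurtosis={kurt:.3f}")
```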
More words about the normal curve
SEG example
The instructor and/or section leader:          Mean   s.d.   Skew   Kurt
Gives well-prepared, relevant presentations    6.0    0.69   -1.7   8.5
Explains clearly and answers questions well    5.9    0.68   -1.0   4.8
Uses visual aids well                          5.6    0.85   -1.8   8.9
Uses information technology effectively        5.5    0.91   -1.1   5.0
Speaks well                                    6.1    0.69   -1.5   6.8
Encourages questions & class participation     6.1    0.66   -0.88  3.7
Stimulates interest in the subject             5.9    0.76   -1.1   4.7
Is available outside of class for questions    5.9    0.68   -1.3   6.3
Overall rating of teaching                     5.9    0.67   -1.2   5.5
Graph some SEG variables
The instructor and/or section leader:          Mean   s.d.   Skew   Kurt
Uses visual aids well                          5.6    0.85   -1.8   8.9
Encourages questions & class participation     6.1    0.66   -0.88  3.7
Binary data
Commands in STATA for getting univariate statistics
• summarize, detail
• graph, bin() normal
• graph, box
• tabulate [NB: compare to table]
Explore Q9: Overall teaching evaluation
subject: 3.371, 3.982, 3.14, 14.02D, 21W.803, 21M.480, 17.906, 2.51
q9: 6.4375, 6.73333, 6.46154, 5.66667, 5.69231, 5.28571, 5.88235
n: 16, 15, 13, 3, 12, 13, 14, 17
Graph Q9
. graph q9

Divide into 7 “bins” and have them span 0..1, 1..2, 2..3, … 6..7
. graph q9, bin(7) xscale(0,7)

Add ticks at each integer score
. graph q9, bin(7) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)

Add a finer grain to the bars
. graph q9, bin(14) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)

Even finer grain
. graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)

Superimpose the normal curve (with the same mean and s.d. as the empirical distribution)
. graph q9, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7) norm
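To make the effect of the bin count concrete, here is a small Python sketch that counts values in equal-width bins over the 0 to 7 range. The function, the scores, and the rule for values on a bin edge are assumptions for illustration; Stata's own edge handling may differ.

```python
def histogram_counts(values, n_bins, lo=0.0, hi=7.0):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v < lo or v > hi:
            continue  # ignore values outside the plotted range
        # A value exactly at the top edge (v == hi) goes in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical class-average scores (illustrative only).
scores = [6.4, 6.7, 6.5, 5.7, 5.3, 5.9, 4.8, 6.1, 7.0]

print(histogram_counts(scores, 7))    # 1-point-wide bins
print(histogram_counts(scores, 14))   # half-point-wide bins
```

Doubling the number of bins splits each bar in two, which shows more detail but leaves fewer observations per bar.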
Do the previous graph with only larger classes (n > 20)
. graph q9 if n>20, bin(28) xscale(0,7) xlabel(0,1,2,3,4,5,6,7)

Draw the previous graph with a box plot
. graph q9 if n>20, box ylabel
Draw the box plots for small (0..20), medium (21..50), and large (50+) classes
. gen size = 0 if n <=20
(237 missing values generated)
. replace size=1 if n > 20 & n <=100
(196 real changes made)
. replace size = 2 if n > 100
(41 real changes made)
. sort size
. graph q9, box ylabel by(size)
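The gen/replace sequence builds a three-level grouping variable. A Python sketch of the same recoding follows, using the cutoffs that appear in the commands themselves (20 and 100); the function name and enrollment values are hypothetical.

```python
def size_group(n):
    """Map class size n to 0 (n <= 20), 1 (20 < n <= 100), or 2 (n > 100)."""
    if n <= 20:
        return 0
    if n <= 100:
        return 1
    return 2

# Hypothetical enrollments chosen to hit each boundary (illustrative only).
enrollments = [3, 20, 21, 100, 101, 250]
groups = [size_group(n) for n in enrollments]
print(groups)
```

Checking the boundary values (20, 21, 100, 101) is a quick way to confirm that the groups do not overlap or leave gaps.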
A note about histograms with unnatural categories
From the Current Population Survey (2000), Voter and Registration Survey
How long (have you/has name) lived at this address?
-9   No Response
-3   Refused
-2   Don't know
-1   Not in universe
 1   Less than 1 month
 2   1-6 months
 3   7-11 months
 4   1-2 years
 5   3-4 years
 6   5 years or longer
Simple graph
Solution, Step 1
Map artificial category onto “natural” midpoint
-9   No Response          missing
-3   Refused              missing
-2   Don't know           missing
-1   Not in universe      missing
 1   Less than 1 month    1/24 = 0.042
 2   1-6 months           3.5/12 = 0.29
 3   7-11 months          9/12 = 0.75
 4   1-2 years            1.5
 5   3-4 years            3.5
 6   5 years or longer    10 (arbitrary)
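The mapping step can be sketched as a lookup table in Python. The midpoints (in years) are the ones listed in the recoding table above; the dictionary name, function name, and sample responses are hypothetical.

```python
# Midpoints in years, following the recoding table; None marks missing.
MIDPOINT_YEARS = {
    -9: None, -3: None, -2: None, -1: None,
    1: 1 / 24,     # less than 1 month
    2: 3.5 / 12,   # 1-6 months
    3: 9 / 12,     # 7-11 months
    4: 1.5,        # 1-2 years
    5: 3.5,        # 3-4 years
    6: 10.0,       # 5 years or longer (arbitrary value for the open-ended top)
}

def recode(codes):
    """Replace survey codes with midpoints, dropping missing responses."""
    return [MIDPOINT_YEARS[c] for c in codes if MIDPOINT_YEARS[c] is not None]

# Hypothetical responses (illustrative only).
responses = [1, 4, 6, -9, 2, 5, 3, -1]
years = recode(responses)
print(years)
```

Once the codes are on a common scale of years, the horizontal axis of a histogram reflects actual time at the address rather than category number.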
Graph of recoded data
Density plot of data