Descriptive statistics for one variable
What to describe?
- What is the “location” or “center” of the data? (“measures of location”)
- How do the data vary? (“measures of variability”)
Types of statistics
- Descriptive statistics: gives numerical and graphical procedures to summarize a collection of data in a clear and understandable way
- Inferential statistics: provides procedures to draw inferences about a population from a sample
Reasons for using statistics
- aid in summarization
- aid in “getting at what’s going on”
- aid in extracting “information” from the data
- aid in communication
Frequency distribution
- The frequency with which observations are assigned to each category or point on a measurement scale
  - Most basic form of descriptive statistic
  - May be expressed as a percentage of the total sample found in each category
Source: Reasoning with Statistics, by Frederick Williams & Peter Monge, fifth edition, Harcourt College Publishers.
Frequency distribution
- The distribution is “read” differently depending upon the measurement level
  - Nominal scales are read as discrete measurements at each level (no ordering)
  - Ordinal measures show tendencies, but categories should not be compared (ordering exists, but not distance)
  - Interval (distance exists, but no ratios) and ratio scales (ratios exist) allow for comparison among categories
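As a minimal sketch of a frequency distribution in R, the snippet below counts how often each category occurs and then expresses the counts as percentages of the total. The `sex` vector is hypothetical illustration data, not taken from any study cited here.

```r
# Hypothetical nominal data: one category label per observation.
sex <- c("female", "male", "female", "female", "male",
         "female", "male", "female", "female", "male")

# Absolute frequencies: number of observations in each category.
freq <- table(sex)

# Relative frequencies as percentages of the total sample.
pct <- 100 * prop.table(freq)

print(freq)
print(pct)
```

Because the data are nominal, the table is read category by category; the order in which R prints the categories carries no meaning.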
Sex       N     Mean   Median  Tr. Mean  St. Dev  SE Mean
female  126    91.23    90.00     90.83    11.32     1.01
male    100    96.79   110.00    105.62    17.39     1.74

Sex     Minimum  Maximum     Q1      Q3
female    65.00   120.00  85.00   98.25
male      75.00   162.00  95.00  118.75
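The SE Mean column is consistent with the usual standard error of the mean, that is, the standard deviation divided by the square root of N. A quick check in R, assuming that definition and using the N and St. Dev values from the table above (the variable names are illustrative):

```r
# Values copied from the table above.
n      <- c(female = 126, male = 100)
st_dev <- c(female = 11.32, male = 17.39)

# Standard error of the mean: SD / sqrt(N).
se_mean <- st_dev / sqrt(n)

print(round(se_mean, 2))   # female 1.01, male 1.74
```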
Source: Protecting Children from Harmful Television: TV Ratings and the V-chip. Amy I. Nathanson, Ph.D., Lecturer, University of California at Santa Barbara; Joanne Cantor, Ph.D., Professor, Communication Arts, University of Wisconsin-Madison
Source: http://www.elonka.com/kryptos/ (web page on cryptography)
Ancestry of US residents
Source: UCLA International Institute
Source: Cornell University website
Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
Source: www.cit.cornell.edu/computer/students/bandwidth/charts.html
Source: Verisign
Search engine use
The percentage of online searches done by US home and work web surfers in July 2006.
NY Times
Source: Verisign
Old Faithful Geyser
Duration in minutes of 272 eruptions of the Old Faithful geyser.

> library(datasets)
> faithful[1:10, ]
   eruptions waiting
1      3.600      79
2      1.800      54
3      3.333      74
4      2.283      62
5      4.533      85
6      2.883      55
7      4.700      88
8      3.600      85
9      1.950      51
10     4.350      85
> summary(faithful)
   eruptions        waiting
 Min.   :1.600   Min.   :43.0
 1st Qu.:2.163   1st Qu.:58.0
 Median :4.000   Median :76.0
 Mean   :3.488   Mean   :70.9
 3rd Qu.:4.454   3rd Qu.:82.0
 Max.   :5.100   Max.   :96.0
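The individual entries reported by `summary()` can also be computed one at a time. A short sketch using base R functions on the same `faithful` data (the variable names are illustrative):

```r
library(datasets)

x <- faithful$eruptions                  # eruption durations, in minutes

n_obs  <- length(x)                      # number of eruptions
center <- c(mean = mean(x), median = median(x))
quart  <- quantile(x, c(0.25, 0.75))     # first and third quartiles
extent <- range(x)                       # minimum and maximum

print(n_obs)
print(round(center, 3))
print(round(quart, 3))
print(extent)
```

The mean and median answer the "location" question, while the quartiles and the minimum and maximum describe how the data vary.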
Normal distribution
- Many characteristics are distributed through the population in a ‘normal’ manner
  - Normal curves have well-defined statistical properties
  - Parametric statistics are based on the assumption that the variables are distributed normally
    - Most commonly used statistics
- This is the famous “Bell curve” where many cases fall near the middle of the distribution and few fall very high or very low
  - I.Q.
Statistical properties of the normal distribution
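One well-defined property of the normal curve is the share of cases that falls within a given number of standard deviations of the mean. A small R sketch using the standard normal cumulative distribution function `pnorm()`:

```r
# Proportion of a normal distribution lying within k standard
# deviations of the mean, for k = 1, 2, 3.
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
names(within_k) <- paste0("within ", k, " SD")

print(round(within_k, 4))
```

Roughly 68%, 95%, and 99.7% of cases fall within one, two, and three standard deviations of the mean, respectively.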
I.Q. distribution
Measures of central tendency
- Mode (Mo): the most frequent score in a distribution
  - good for nominal data
- Median (Md): the midpoint or midscore in a distribution
  - 50% of cases above, 50% of cases below
  - insensitive to extreme cases
  - interval or ratio data
Source: Reasoning with Statistics, by Frederick Williams & Peter Monge, fifth edition, Harcourt College Publishers.
Measures of central tendency
- Mean: the ‘average’ score, i.e., the total score divided by the number of scores
  - has a number of useful statistical properties
  - however, can be sensitive to extreme scores
  - many statistics are based on the mean
- Sensitive to ‘outliers’
  - Extreme cases that just happened to end up in your sample by chance
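The different sensitivity of the three measures to extreme scores can be sketched in R. The scores below are hypothetical, and `stat_mode()` and `central()` are helper functions defined here for illustration, not part of base R.

```r
# Hypothetical scores for illustration.
scores <- c(2, 3, 3, 4, 5)

# Base R has no function for the statistical mode (mode() reports the
# storage type), so a small helper is defined here.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Collect the three measures of central tendency in one vector.
central <- function(x) {
  c(mode = stat_mode(x), median = median(x), mean = mean(x))
}

print(central(scores))          # mode 3, median 3, mean 3.4
print(central(c(scores, 50)))   # same scores plus one extreme score
```

Adding the single extreme score of 50 moves the mean from 3.4 to about 11.2, while the median only moves from 3 to 3.5 and the mode stays at 3.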
Index of central tendency
Source: http://www.uwsp.edu/psych/stat/5/skewnone.gif
Source: Scianta. com
Source: www.wilderdom.com/.../L2-1UnderstandingIQ.html
Source: CSAP’s Data Pathways
Measures of dispersion
- Look at how widely scattered over the scale the scores are
- Groups with identical means can be more or less diverse
- To find out how the group is distributed, we need to know how far or close individual members are from the mean
- Like the mean, only meaningful for interval or ratio-level measures
Measures of dispersion
- Range: distance between the highest and lowest scores in a distribution
  - sensitive to extreme scores
  - compensate by calculating the interquartile range (distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution
- Usually used in combination with other measures of dispersion
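Both measures are available in base R. A minimal sketch, assuming the built-in `faithful` data from the Old Faithful example and illustrative variable names:

```r
library(datasets)

w <- faithful$waiting             # waiting times between eruptions

full_range  <- diff(range(w))     # highest score minus lowest score
middle_half <- IQR(w)             # 75th percentile minus 25th percentile

print(full_range)
print(middle_half)
```

The range is driven entirely by the two most extreme observations, whereas the interquartile range depends only on the middle half of the data.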
Range
Source: www.animatedsoftware.com/statglos/sgrange.htm
Source: http://pse.cs.vt.edu/SoSci/converted/Dispersion_I/box_n_hist.gif
Average Deviation (Mean Deviation)
Merits:
1. Easy to calculate and understand.
2. It can be calculated from any average.
3. It is less affected by extreme observations.
Demerits:
1. It is mathematically incomplete because it ignores negative signs.
2. As it can be calculated from any average, it does not have certainty (i.e., it is not a well-defined measure).
3. Its use is very limited in statistical work.
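A minimal R sketch of the average deviation, computed once about the mean and once about the median to show that the result depends on which average is chosen. The data and the helper `avg_dev()` are hypothetical, introduced only for illustration.

```r
# Hypothetical observations for illustration.
x <- c(1, 2, 3, 4, 10)

# Average (mean) deviation: the mean of the absolute distances from a
# chosen average. Taking absolute values is the step that ignores
# negative signs.
avg_dev <- function(x, center = mean(x)) mean(abs(x - center))

about_mean   <- avg_dev(x)                       # center = mean (4)
about_median <- avg_dev(x, center = median(x))   # center = median (3)

print(about_mean)     # 2.4
print(about_median)   # 2.2
```

Note that base R's `mad()` computes a different quantity, the (scaled) median absolute deviation, so it is not used here.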
Measures of dispersion
- Variance (S²)
  - Average of squared distances of individual points from the mean
  - High variance means that most scores are far away from the mean. Low variance indicates that most scores cluster tightly about the mean.
Standard Deviation (SD)
- A summary statistic of how much scores vary from the mean
- Square root of the variance, expressed in the original units of measurement
- Used in a number of inferential statistics
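The relationship between the variance and the standard deviation can be sketched in R as follows. The scores are hypothetical; the one fact worth flagging is that R's built-in `var()` and `sd()` divide by n - 1 rather than n.

```r
# Hypothetical scores for illustration.
x <- c(1, 2, 3, 4, 10)

sq_dist <- (x - mean(x))^2                      # squared distances from the mean

var_avg    <- mean(sq_dist)                     # plain average: divides by n
var_sample <- sum(sq_dist) / (length(x) - 1)    # sample version: divides by n - 1

# R's built-in var() and sd() use the n - 1 (sample) version.
print(c(var_avg = var_avg, var_sample = var_sample, var_builtin = var(x)))
print(c(sd_sample = sqrt(var_sample), sd_builtin = sd(x)))
```

Here the plain average of squared distances is 10, the sample variance is 12.5, and the sample standard deviation (about 3.54) is back in the original units of the scores.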
Variance vs. Standard Deviation
[Table: population and sample versions of the variance and the standard deviation]
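A sketch of the standard textbook definitions that such a comparison usually rests on. The notation (population mean $\mu$ and size $N$, sample mean $\bar{x}$ and size $n$) is an assumption chosen here, not taken from the slide.

```latex
% Population variance and standard deviation
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}

% Sample variance and standard deviation
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
```

In both cases the standard deviation is the square root of the corresponding variance; the sample version divides by $n - 1$ instead of $n$.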
Skewness of distributions
- Measures look at how lopsided distributions are, that is, how far from the ideal of the normal curve they are
- When the median and the mean are different, the distribution is skewed. The greater the difference, the greater the skew.
- Distributions that trail away to the left are negatively skewed and those that trail away to the right are positively skewed
- If the skewness is extreme, the researcher should either transform the data to make them better resemble a normal curve or else use a different set of statistics (nonparametric statistics) to carry out the analysis
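A minimal R sketch of these ideas, using hypothetical right-skewed scores. The `skewness()` helper is a moment-based definition written out here because base R does not provide one, and the log transform is just one possible choice of transformation, used as an assumption for illustration.

```r
# Hypothetical right-skewed scores: one value trails far to the right.
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)

# Moment-based skewness: third central moment divided by the second
# central moment raised to the power 1.5. Positive values indicate a
# tail to the right, negative values a tail to the left.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

print(c(mean = mean(x), median = median(x)))   # mean 4.7 > median 3
print(skewness(x))                             # positive skew
print(skewness(log(x)))                        # smaller after a log transform
```

The mean sits above the median because the long right tail pulls it upward, and the log transform reduces the skewness without removing it entirely.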
Different Shapes of Distributions
Source: http://faculty.vassar.edu/lowry/f0204.gif
Skewness of distributions
Source: http://www.polity.org.za/html/govdocs/reports/aids/image022.gif
Distribution of posting frequency on Usenet
Kurtosis
- Measures of kurtosis look at how sharply the distribution rises to a peak and then drops away
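One common way to put a number on this is the moment-based kurtosis, sketched below in R. The `kurtosis()` helper and both data sets are hypothetical and written out here because base R has no built-in kurtosis function; on this scale a normal distribution has a value of 3.

```r
# Moment-based kurtosis: fourth central moment divided by the squared
# second central moment. A normal distribution gives a value of 3.
kurtosis <- function(v) {
  d <- v - mean(v)
  mean(d^4) / mean(d^2)^2
}

flat   <- 1:100                     # hypothetical flat (uniform-like) scores
peaked <- c(rep(0, 98), -10, 10)    # hypothetical sharp peak with long tails

print(kurtosis(flat))     # below 3
print(kurtosis(peaked))   # well above 3
```

Values below 3 indicate a flatter shape than the normal curve, and values above 3 indicate a sharper peak with heavier tails.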