Descriptive Statistics: Frequency Tables, Visual Displays, Measures of Center
Overview
• Descriptive statistics summarize or describe the important characteristics of a known set of population data.
• Inferential statistics use sample data to make inferences (or generalizations) about a population.
Important Characteristics of Data
1. Center: a representative or average value that indicates where the middle of the data set is located.
2. Variation: a measure of the amount by which the values vary among themselves.
3. Distribution: the nature or shape of the distribution of the data (such as bell-shaped, uniform, or skewed).
4. Outliers: sample values that lie very far away from the vast majority of the other sample values.
5. Time: changing characteristics of the data over time.
Summarizing Data with Frequency Tables
• Frequency table: lists classes (or categories) of values, along with frequencies (or counts) of the number of values that fall into each class.
Table 2-1 Qwerty Keyboard Word Ratings
2 2 5 1 2 6 3 3 4 2 4 0 5 7 7 5 6 6 8 10 7 2 2 10 5 8 2 5 4 2 6 1 7 2 3 8 1 5 2 14 2 2 6 3 1 7 5
Table 2-3 Frequency Table of Qwerty Word Ratings
Rating   Frequency
0-2      20
3-5      14
6-8      15
9-11     2
12-14    1
Frequency Table Definitions
Lower Class Limits
Lower class limits are the smallest numbers that can actually belong to the different classes. In Table 2-3 the lower class limits are 0, 3, 6, 9, and 12.
Upper Class Limits
Upper class limits are the largest numbers that can actually belong to the different classes. In Table 2-3 the upper class limits are 2, 5, 8, 11, and 14.
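Class Boundaries
Class boundaries are the numbers used to separate the classes without the gaps created by the class limits; each boundary lies midway between the upper limit of one class and the lower limit of the next. In Table 2-3 the class boundaries are -0.5, 2.5, 5.5, 8.5, 11.5, and 14.5.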
Class Midpoints
Class midpoints are the midpoints of the classes: add the lower and upper class limits and divide by 2. In Table 2-3 the class midpoints are 1, 4, 7, 10, and 13 (for example, (0 + 2)/2 = 1).
Class Width
Class width is the difference between two consecutive lower class limits or two consecutive class boundaries. Every class in Table 2-3 has width 3 (for example, 3 - 0 = 3 and 5.5 - 2.5 = 3).
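The four definitions above can be checked mechanically. The following minimal Python sketch recomputes the class width, midpoints, and boundaries from the class limits of Table 2-3; it assumes whole-number data, so adjacent classes are separated by a gap of 1 and each boundary sits half of that gap away from a limit. The variable names are illustrative.

```python
# Class limits from Table 2-3 (Qwerty word ratings).
lower_limits = [0, 3, 6, 9, 12]
upper_limits = [2, 5, 8, 11, 14]

# Class width: difference between consecutive lower class limits.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

# Class midpoints: (lower limit + upper limit) / 2 for each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: midway between one class's upper limit and the next
# class's lower limit; the outermost boundaries extend by the same half-gap.
half_gap = (lower_limits[1] - upper_limits[0]) / 2
boundaries = [lo - half_gap for lo in lower_limits] + [upper_limits[-1] + half_gap]

print("widths:", widths)          # [3, 3, 3, 3]
print("midpoints:", midpoints)    # [1.0, 4.0, 7.0, 10.0, 13.0]
print("boundaries:", boundaries)  # [-0.5, 2.5, 5.5, 8.5, 11.5, 14.5]
```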
Guidelines for Frequency Tables
1. Be sure that the classes are mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data values.
Constructing a Frequency Table
1. Decide on the number of classes.
2. Determine the class width by dividing the range by the number of classes (range = highest score - lowest score), and round up:
$$\text{class width} \approx \frac{\text{range}}{\text{number of classes}} \quad \text{(rounded up)}$$
3. Select for the first lower class limit either the lowest score or a convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower class limit, add the width to the second lower limit to get the third, and so on.
5. List the lower class limits in a vertical column, then enter the upper class limits.
6. Represent each score by a tally mark in the appropriate class, and total the tally marks to find the frequency of each class.
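The six construction steps translate directly into a short loop. This is a minimal Python sketch, assuming whole-number scores and 5 classes, applied to the test grades that appear in the stem-and-leaf example later in this section. One design choice is an assumption: when the range divides evenly by the number of classes, the width is bumped up by one so that the highest score still falls inside the last class.

```python
import math

# Raw data: the test grades from the stem-and-leaf example.
grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]

num_classes = 5                                  # step 1 (a choice)
data_range = max(grades) - min(grades)           # highest - lowest = 33
# Step 2: range / classes, rounded up (33 / 5 = 6.6 -> 7). An exact
# quotient is bumped up by one so the highest score still fits.
width = math.floor(data_range / num_classes) + 1

start = min(grades)                              # step 3: first lower limit
table = []
for i in range(num_classes):
    lower = start + i * width                    # step 4: next lower limit
    upper = lower + width - 1                    # step 5: upper limit (integer data)
    count = sum(lower <= g <= upper for g in grades)  # step 6: the "tally"
    table.append((lower, upper, count))

for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")

# Guideline 6: the frequencies must add up to the number of data values.
assert sum(count for _, _, count in table) == len(grades)
```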
Inputting a Frequency Table into the TI-83 Calculator
• Press STAT and select EDIT.
• Enter the class marks (class midpoints) in L1.
• Enter the frequencies in L2.
Relative Frequency Table
$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$$
Relative Frequency Table
Rating   Frequency   Relative Frequency
0-2      20          20/52 = 38.5%
3-5      14          14/52 = 26.9%
6-8      15          15/52 = 28.8%
9-11     2           2/52 = 3.8%
12-14    1           1/52 = 1.9%
Total frequency = 52
Cumulative Frequency Table
Rating         Cumulative Frequency
Less than 3    20
Less than 6    34
Less than 9    49
Less than 12   51
Less than 15   52
Each cumulative frequency is the sum of the frequency for that class and the frequencies of all previous classes.
Frequency Tables
Rating   Frequency   Relative Frequency   Cumulative Frequency
0-2      20          38.5%                Less than 3: 20
3-5      14          26.9%                Less than 6: 34
6-8      15          28.8%                Less than 9: 49
9-11     2           3.8%                 Less than 12: 51
12-14    1           1.9%                 Less than 15: 52
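Both derived columns follow from the frequencies alone. A minimal Python sketch, using the frequencies of Table 2-3: relative frequencies divide by the total, and cumulative frequencies are running totals. The "less than" label is taken as the upper limit plus 1, which assumes whole-number ratings.

```python
from itertools import accumulate

# Table 2-3: classes and frequencies of the Qwerty word ratings.
classes = ["0-2", "3-5", "6-8", "9-11", "12-14"]
upper_limits = [2, 5, 8, 11, 14]
freqs = [20, 14, 15, 2, 1]

total = sum(freqs)  # sum of all frequencies

# Relative frequency (as a percentage) = class frequency / sum of frequencies.
relative = [100 * f / total for f in freqs]

# Cumulative frequency: this class's frequency plus all previous ones.
cumulative = list(accumulate(freqs))

for c, f, r, u, cf in zip(classes, freqs, relative, upper_limits, cumulative):
    print(f"{c:>6} {f:>3} {r:5.1f}%   less than {u + 1:>2}: {cf}")
```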
Pictures of Data
Pictures of data depict the nature or shape of the data distribution.
• Histogram: a bar graph in which the horizontal scale represents classes and the vertical scale represents frequencies.
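Histogram of Qwerty Word Ratings
[Figure: bars for the classes 0-2, 3-5, 6-8, 9-11, and 12-14 with heights equal to the frequencies in Table 2-3; horizontal axis: Rating, vertical axis: Frequency]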
Relative Frequency Histogram of Qwerty Word Ratings
[Figure: bars for the same classes with heights 38.5%, 26.9%, 28.8%, 3.8%, and 1.9%; horizontal axis: Rating, vertical axis: Relative Frequency]
Histogram and Relative Frequency Histogram
[Figure: the histogram and the relative frequency histogram of the Qwerty word ratings, side by side]
Viewing a Histogram Using the TI-83 Calculator
• First enter the class marks and frequencies in L1 and L2.
• Press 2nd Y= (STAT PLOT) and select 1.
• Select On.
• Select the histogram type (top row, 3rd graph).
• Set Xlist = L1 (2nd 1) and Freq = L2 (2nd 2).
• Press ZOOM 9 (ZoomStat).
Frequency Polygon
Viewing a Frequency Polygon Using the TI-83 Calculator
• First enter the class marks and frequencies in L1 and L2.
• Press 2nd Y= (STAT PLOT) and select 1.
• Select On.
• Select the polygon type (top row, 2nd graph).
• Set Xlist = L1 (2nd 1) and Freq = L2 (2nd 2).
• Press ZOOM 9 (ZoomStat).
Ogive

Dot Plot
Stem-and-Leaf Plot
Raw data (test grades): 67 72 89 85 88 90 75 89 99 100
Stem | Leaves
   6 | 7
   7 | 25
   8 | 5899
   9 | 09
  10 | 0
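The plot above splits each grade into a stem (all digits but the last) and a leaf (the last digit). A minimal Python sketch of that split, assuming whole-number data and one-digit leaves; empty stems in the range are still printed so that gaps in the data stay visible.

```python
from collections import defaultdict

grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]

# Stem = value // 10, leaf = last digit; sorting first keeps leaves in order.
leaves = defaultdict(list)
for g in sorted(grades):
    leaves[g // 10].append(g % 10)

# Print every stem from smallest to largest, including any empty ones.
for stem in range(min(leaves), max(leaves) + 1):
    print(f"{stem:>3} | {''.join(str(d) for d in leaves[stem])}")
```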
Pareto Chart
[Figure: Pareto chart "Accidental Deaths by Type"; bars in order of decreasing frequency: Motor Vehicle, Falls, Poison, Drowning, Fire, Ingestion of food or object, Firearms; vertical axis: Frequency, 0 to 45,000]
Pie Chart: Accidental Deaths by Type
• Motor vehicle (43,500, 57.8%)
• Falls (12,200, 16.2%)
• Poison (6,400, 8.5%)
• Drowning (4,600, 6.1%)
• Fire (4,200, 5.6%)
• Ingestion of food or object (2,900, 3.9%)
• Firearms (1,400, 1.9%)
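The Pareto chart and the pie chart are two views of the same counts: one sorts the categories by decreasing frequency, the other converts each count to a share of the total. A minimal Python sketch using the counts labelled on the pie chart; the central angle of each slice (share times 360 degrees) is an extra column added for illustration.

```python
# Accidental deaths by type, using the counts labelled on the pie chart.
deaths = {
    "Motor vehicle": 43_500,
    "Falls": 12_200,
    "Poison": 6_400,
    "Drowning": 4_600,
    "Fire": 4_200,
    "Ingestion of food or object": 2_900,
    "Firearms": 1_400,
}
total = sum(deaths.values())

# Pareto chart: bars arranged in order of decreasing frequency.
pareto_order = sorted(deaths, key=deaths.get, reverse=True)

# Pie chart: each slice's percentage of the total, to one decimal place.
shares = {cause: round(100 * n / total, 1) for cause, n in deaths.items()}

for cause in pareto_order:
    angle = 360 * deaths[cause] / total  # central angle of the slice, degrees
    print(f"{cause:<28} {deaths[cause]:>7,} {shares[cause]:5.1f}% {angle:6.1f}")
```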
Scatter Diagram
[Figure: scatter diagram of TAR (vertical axis, 0 to 20) against NICOTINE (horizontal axis, 0.0 to 1.5)]
Other Graphs
• Boxplots
• Pictographs
• Graphs of the pattern of data over time
Measures of Center
A measure of center is a value at the center or middle of a data set.
Definitions
• Mean (arithmetic mean, or average): the number obtained by adding the values and dividing the total by the number of values.
Notation
• $\Sigma$ denotes the addition of a set of values.
• $x$ is the variable usually used to represent the individual data values.
• $n$ represents the number of data values in a sample.
• $N$ represents the number of data values in a population.
Notation
$\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values:
$$\bar{x} = \frac{\Sigma x}{n}$$
$\mu$ is pronounced ‘mu’ and denotes the mean of all values in a population:
$$\mu = \frac{\Sigma x}{N}$$
Calculators can calculate the mean of data.
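The formula $\bar{x} = \Sigma x / n$ is a one-liner. A minimal Python sketch, using the test grades from the stem-and-leaf example; if the list held every value of a population, the same computation would give $\mu$ with $N$ in place of $n$.

```python
def mean(values):
    """Add the values and divide the total by the number of values."""
    return sum(values) / len(values)


grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]
print(mean(grades))  # 85.4
```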
Finding a Mean Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the data into L1.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1 (L1).
• The first output item is the mean.
Definitions
• Median: the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
• The median is often denoted by $\tilde{x}$ (pronounced ‘x-tilde’).
• The median is not affected by an extreme value.
Example (odd number of values)
Data: 6.72 3.46 3.60 6.44 26.70
In order: 3.46 3.60 6.44 6.72 26.70
There is an exact middle value, so the median is 6.44.
Example (even number of values)
Data: 6.72 3.46 3.60 6.44
In order: 3.46 3.60 6.44 6.72
There is no exact middle; the middle is shared by two numbers, so the median is their mean:
$$\tilde{x} = \frac{3.60 + 6.44}{2} = 5.02$$
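Both cases of the median fit in one function: sort, then take the middle value when n is odd or the mean of the two middle values when n is even. A minimal Python sketch, checked against the two examples above; Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([6.72, 3.46, 3.60, 6.44, 26.70]))    # 6.44
print(round(median([6.72, 3.46, 3.60, 6.44]), 2))  # 5.02
```

Replacing 26.70 in the first example with any other value larger than 6.44 leaves the median unchanged, which is the sense in which the median is not affected by an extreme value.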
Finding a Median Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the data into L1.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1 (L1).
• Arrow down to "Med" to see the value of the median.
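Definitions
• Mode: the score that occurs most frequently; denoted by M.
• Bimodal: two values occur with the same greatest frequency.
• Multimodal: more than two values occur with the same greatest frequency.
• No mode: no value is repeated.
• The mode is the only measure of central tendency that can be used with nominal data.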
Examples
a. 5 5 5 3 1 5 1 4 3 5 ⇒ the mode is 5
b. 1 2 2 2 3 4 5 6 6 6 7 9 ⇒ bimodal: 2 and 6
c. 1 2 3 6 7 8 9 10 ⇒ no mode
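Counting occurrences is all the mode needs, which is also why it works for nominal data. A minimal Python sketch using `collections.Counter`; it returns every value tied for the highest count and, as an assumption of this sketch, returns an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Most frequent value(s), sorted; [] when no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))        # [5]
print(modes([1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6] (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # [] (no mode)
print(modes(["red", "blue", "red"]))                # ['red'] (nominal data)
```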
Definitions
• Midrange: the value midway between the highest and lowest values in the original data set.
$$\text{midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$$
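The midrange uses only the two extreme values, so it is very sensitive to outliers. A minimal Python sketch, applied to the test grades from the stem-and-leaf example (highest 100, lowest 67).

```python
def midrange(values):
    """Value midway between the highest and lowest values."""
    return (max(values) + min(values)) / 2


grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]
print(midrange(grades))  # 83.5
```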
Finding a Midrange Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the data into L1.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1 (L1).
• Arrow down to see the "minX" and "maxX" values for the midrange calculation.
Round-off Rule for Measures of Center
Carry one more decimal place than is present in the original set of values.
Mean from a Frequency Table
Use the class midpoints as the values of the variable x:
$$\bar{x} = \frac{\Sigma (f \cdot x)}{\Sigma f}$$
where x = class midpoint, f = frequency, and $\Sigma f = n$.
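Applying $\bar{x} = \Sigma (f \cdot x) / \Sigma f$ to Table 2-3 gives a worked example. A minimal Python sketch using the class midpoints 1, 4, 7, 10, 13 and the frequencies of Table 2-3; because each midpoint stands in for every value in its class, the result is an approximation of the mean of the raw ratings. The ratings are whole numbers, so the round-off rule keeps one decimal place.

```python
midpoints = [1, 4, 7, 10, 13]  # x: class midpoints from Table 2-3
freqs = [20, 14, 15, 2, 1]     # f: class frequencies

n = sum(freqs)                                          # sum of f = n
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of f * x
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 1))  # 52 214 4.1
```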
Finding a Grouped Mean Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the class marks into L1 and the frequencies into L2.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1, 2nd 2 (L1, L2).
• The first output item is the mean.
Descriptive Statistics: Measures of Variability, Measures of Position
Definitions
• Symmetric: data are symmetric if the left half of the histogram is roughly a mirror image of its right half.
• Skewed: data are skewed if they are not symmetric and extend more to one side than the other.
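Skewness
[Figure: three distributions.
• Symmetric: mode = mean = median.
• Skewed left (negatively): the mean lies to the left of the median, and the mode is at the peak on the right.
• Skewed right (positively): the mode is at the peak on the left, followed by the median and then the mean.]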
Waiting Times of Bank Customers at Different Banks (in minutes)
Jefferson Valley Bank: 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Bank of Providence: 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0

           Jefferson Valley   Bank of Providence
Mean       7.15               7.15
Median     7.20               7.20
Mode       7.7                7.7
Midrange   7.10               7.10
Dotplots of Waiting Times
[Figure: dotplots of the waiting times at Jefferson Valley Bank and Bank of Providence]
Measures of Variation
$$\text{range} = \text{highest value} - \text{lowest value}$$
Measures of Variation
Standard deviation: a measure of the variation of the scores about the mean (roughly, a typical or "average" deviation from the mean).
Sample Standard Deviation Formula
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}$$
Calculators can compute the sample standard deviation of data.
Sample Standard Deviation Shortcut Formula
$$s = \sqrt{\frac{n(\Sigma x^2) - (\Sigma x)^2}{n(n - 1)}}$$
Calculators can compute the sample standard deviation of data.
Population Standard Deviation
$$\sigma = \sqrt{\frac{\Sigma (x - \mu)^2}{N}}$$
Calculators can compute the population standard deviation of data.
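The definition formula and the shortcut formula are algebraically equivalent, and the population version differs only in dividing by N instead of n - 1. A minimal Python sketch, using the test grades from the stem-and-leaf example, computes all three (plus the matching variances) so they can be checked against Python's standard `statistics.stdev`, `statistics.pstdev`, `statistics.variance`, and `statistics.pvariance`. Treating the same list as a population is only for comparison.

```python
import math
import statistics

grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]
n = len(grades)
x_bar = sum(grades) / n

# Sum of squared deviations from the mean.
ss = sum((x - x_bar) ** 2 for x in grades)

# Definition formula: divide by n - 1 for a sample.
s_definition = math.sqrt(ss / (n - 1))

# Shortcut formula: needs only n, the sum of x, and the sum of x^2.
sum_x = sum(grades)
sum_x2 = sum(x * x for x in grades)
s_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Population formula: divide by N (the list treated as a whole population).
sigma = math.sqrt(ss / n)

# Variances are the squares of the standard deviations.
sample_variance = s_definition ** 2
population_variance = sigma ** 2

print(round(s_definition, 2), round(s_shortcut, 2), round(sigma, 2))
```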
Symbols for Standard Deviation
Sample: s (textbook), Sx (some graphics calculators), xσn-1 (some non-graphics calculators)
Population: σ (textbook), σx (some graphics calculators), xσn (some non-graphics calculators)
Articles in professional journals and reports often use SD for standard deviation and VAR for variance.
Finding a Standard Deviation Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the data into L1.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1 (L1).
• The 4th output item (Sx) is the sample standard deviation and the 5th output item (σx) is the population standard deviation.
Measures of Variation
Variance: the standard deviation squared.
Notation: $s^2$ (sample variance) and $\sigma^2$ (population variance); use the square key on the calculator.
Variance
Sample variance:
$$s^2 = \frac{\Sigma (x - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\Sigma (x - \mu)^2}{N}$$
Round-off Rule for Measures of Variation
Carry one more decimal place than is present in the original set of values. Round only the final answer, never in the middle of a calculation.
Standard Deviation from a Frequency Table
$$s = \sqrt{\frac{n[\Sigma (f \cdot x^2)] - [\Sigma (f \cdot x)]^2}{n(n - 1)}}$$
Use the class midpoints as the x values. Calculators can compute the standard deviation for a frequency table.
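As a worked example of the grouped formula, here is a minimal Python sketch for Table 2-3, with the class midpoints as x and the class frequencies as f. Like the grouped mean, the result approximates the standard deviation of the raw ratings; rounded to one decimal place it is about 3.0.

```python
import math

midpoints = [1, 4, 7, 10, 13]  # x: class midpoints from Table 2-3
freqs = [20, 14, 15, 2, 1]     # f: class frequencies

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of f * x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of f * x^2

s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
print(round(s, 1))  # 3.0
```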
Finding a Grouped Standard Deviation Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the class marks into L1 and the frequencies into L2.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1, 2nd 2 (L1, L2).
• The 4th output item is the sample standard deviation and the 5th output item is the population standard deviation.
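Estimation of Standard Deviation: Range Rule of Thumb
Most values lie between $\bar{x} - 2s$ (minimum usual value) and $\bar{x} + 2s$ (maximum usual value), so the range is about 4s:
$$\text{range} \approx 4s \qquad\text{or}\qquad s \approx \frac{\text{range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4}$$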
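The rule of thumb gives a quick estimate of s and of the usual range. A minimal Python sketch for the Qwerty ratings in Table 2-3 (lowest 0, highest 14): the estimate s ≈ 3.5 is only rough, since the frequency-table formula gives about 3.0 for the same table. The mean of 4.1 used below is the frequency-table mean of Table 2-3 (214/52, rounded); the function names are illustrative.

```python
def range_rule_sd(highest, lowest):
    """Range rule of thumb: estimate s as (highest - lowest) / 4."""
    return (highest - lowest) / 4


def usual_range(mean, sd):
    """Minimum and maximum 'usual' values: mean - 2s and mean + 2s."""
    return mean - 2 * sd, mean + 2 * sd


# Qwerty word ratings (Table 2-3): lowest rating 0, highest rating 14.
s_estimate = range_rule_sd(14, 0)
print(s_estimate)  # 3.5

low, high = usual_range(4.1, s_estimate)
print(round(low, 1), round(high, 1))  # -2.9 11.1
```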
Usual Sample Values
minimum ‘usual’ value = (mean) - 2 (standard deviation) = $\bar{x} - 2s$
maximum ‘usual’ value = (mean) + 2 (standard deviation) = $\bar{x} + 2s$
The Empirical Rule (applies to bell-shaped distributions)
• About 68% of all values lie within 1 standard deviation of the mean (from $\bar{x} - s$ to $\bar{x} + s$), about 34% on each side of the mean.
• About 95% lie within 2 standard deviations (from $\bar{x} - 2s$ to $\bar{x} + 2s$); about 13.5% lie between 1 and 2 standard deviations on each side.
• About 99.7% lie within 3 standard deviations (from $\bar{x} - 3s$ to $\bar{x} + 3s$); about 2.4% lie between 2 and 3 standard deviations on each side, and about 0.1% lie beyond 3 standard deviations on each side.
Chebyshev’s Theorem
• Applies to distributions of any shape.
• The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.
• At least 3/4 (75%) of all values lie within 2 standard deviations of the mean.
• At least 8/9 (about 89%) of all values lie within 3 standard deviations of the mean.
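Chebyshev's bound $1 - 1/K^2$ is easy to tabulate exactly. A minimal Python sketch using `fractions.Fraction` so that K = 2 and K = 3 reproduce 3/4 and 8/9 exactly; the function name is illustrative, and it rejects K ≤ 1, where the theorem as stated does not apply.

```python
from fractions import Fraction


def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/K**2 on the proportion of values within K standard
    deviations of the mean, for a distribution of any shape (K > 1)."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 1 - 1 / Fraction(k) ** 2


for k in (2, 3):
    p = chebyshev_min_proportion(k)
    print(f"K = {k}: at least {p} ({float(p):.1%})")
```

For a bell-shaped distribution the empirical rule gives much higher coverage (about 95% and 99.7%); Chebyshev's figures are the guaranteed minimum for any shape.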
Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
Measures of Position
• z score (or standard score): the number of standard deviations that a given value x is above or below the mean.
Measures of Position: z Score
Sample:
$$z = \frac{x - \bar{x}}{s}$$
Population:
$$z = \frac{x - \mu}{\sigma}$$
Round z scores to 2 decimal places.
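A z score just rescales a deviation from the mean into standard-deviation units. A minimal Python sketch, using the test grades from the stem-and-leaf example with the sample mean and sample standard deviation from Python's `statistics` module; the lowest grade, 67, turns out to be an ordinary value because its z score lies between -2 and 2.

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]
x_bar = statistics.mean(grades)  # sample mean
s = statistics.stdev(grades)     # sample standard deviation (n - 1)

z = round(z_score(67, x_bar, s), 2)  # round to 2 decimal places
is_unusual = abs(z) > 2              # beyond -2 or 2 counts as unusual
print(z, is_unusual)  # -1.68 False
```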
Interpreting z Scores
[Figure: a z axis from -3 to 3; values with z scores between -2 and 2 are ordinary values, and values with z scores below -2 or above 2 are unusual values]
Measures of Position: Quartiles, Deciles, Percentiles
Quartiles
Q1, Q2, Q3 divide the ranked scores into four equal parts, each containing 25% of the data: (minimum) 25% Q1 25% Q2 (median) 25% Q3 25% (maximum).
Deciles
D1, D2, D3, D4, D5, D6, D7, D8, D9 divide the ranked data into ten equal parts, each containing 10% of the data.
Percentiles
The 99 percentiles P1, P2, ..., P99 divide the ranked data into 100 equal parts.
Quartiles, Deciles, Percentiles
Fractiles (quantiles) partition the data into approximately equal parts.
Quartiles: Q1 = P25, Q2 = P50, Q3 = P75
Deciles: D1 = P10, D2 = P20, D3 = P30, ..., D9 = P90
Viewing Quartiles Using the TI-83 Calculator
• Press STAT, select EDIT, and enter the data into L1.
• Press 2nd MODE (QUIT).
• Press STAT, arrow to CALC, and select 1 (1-Var Stats).
• Press 2nd 1 (L1).
• Arrow down to view the values of Q1, Med, and Q3 (the median, "Med", is Q2).
Exploratory Data Analysis
The process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.
Outliers
• A value located very far away from almost all of the other values.
• An extreme value.
• Can have a dramatic effect on the mean, the standard deviation, and the scale of the histogram, so that the true nature of the distribution is totally obscured.
Boxplots (Box-and-Whisker Diagrams)
A boxplot reveals the:
• center of the data
• spread of the data
• distribution of the data
• presence of outliers
Boxplots are excellent for comparing two or more data sets.
Boxplots: 5-Number Summary
• Minimum
• First quartile, Q1
• Median (Q2)
• Third quartile, Q3
• Maximum
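The five-number summary is what a boxplot draws. A minimal Python sketch, applied to the test grades from the stem-and-leaf example. Quartile conventions differ between textbooks and software, so this sketch assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when n is odd. This convention is believed to match the TI-83's 1-Var Stats output, but other packages may report slightly different quartiles.

```python
def median_of(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum); assumes at least two values."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]          # values below the median position
    upper_half = data[(n + 1) // 2 :]    # values above the median position
    return (data[0], median_of(lower_half), median_of(data),
            median_of(upper_half), data[-1])


grades = [67, 72, 89, 85, 88, 90, 75, 89, 99, 100]
print(five_number_summary(grades))  # (67, 75, 88.5, 90, 100)
```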
Boxplot of Qwerty Word Ratings
[Figure: boxplot on a horizontal scale from 0 to 14, with points labelled 0, 2, 4, 6, and 14]
Viewing Boxplots Using the TI-83 Calculator
• First enter the class marks and frequencies in L1 and L2.
• Press 2nd Y= (STAT PLOT) and select 1.
• Select On.
• Select the boxplot type (bottom row, 2nd graph).
• Set Xlist = L1 (2nd 1) and Freq = L2 (2nd 2).
• Press ZOOM 9 (ZoomStat).
Boxplots
[Figure: boxplots of bell-shaped, uniform, and skewed distributions]
Exploring
• Measures of center: mean, median, and mode
• Measures of variation: standard deviation and range
• Measures of spread and relative location: minimum value, maximum value, and quartiles
• Unusual values: outliers
• Distribution: histograms, stem-and-leaf plots, and boxplots