Daniel S. Yates, The Practice of Statistics, Third Edition
Chapter 1: Exploring Data
1.2 Describing Distributions with Numbers
Copyright © 2008 by W. H. Freeman & Company

Objectives for 1.2
• Given a data set, how do you compute the mean, median, quartiles, and the five-number summary?
• How do you construct a boxplot using the five-number summary?
• How do you compute the interquartile range?
• How do you identify an outlier using the interquartile range rule?
• How do you compute the standard deviation and variance?

Measures of the Center of a Distribution

The Mean of a Data Set
• So far, we know several measures of central tendency of a set of numbers: mean, median, and mode.
• The mean is the arithmetic average of the data set.

The Mean of a Data Set: “Average Value”
• Σ (sigma) means to add up all the data values to get a total.
• Divide the total by the number of data values: x̄ = (Σx) / n.

Example - Mean
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find the mean.
• Answer: 85
• Calculator: STAT, EDIT, enter the data in L1; then 2nd STAT, MATH, mean(L1), ENTER
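
The same arithmetic can be checked away from the calculator. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative.

```python
# Joey's 14 quiz grades, copied from the example above.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

total = sum(grades)          # the sigma step: add up every data value
mean = total / len(grades)   # divide the total by the number of values

print("total:", total)
print("mean:", mean)
```

The total is 1190, and dividing by 14 gives the mean of 85 reported on the slide.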

The Median of the Data Set
• The median is the center of the data set.
• Half of the data set is above the median and half is below it. It is the 50th percentile.
• The median may or may not be a value in the data set.

Calculation for Median “Middle Value”

Example - Median
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find the median.
• Answer: 85
• Calculator: STAT, EDIT, enter the data in L1; then 2nd STAT, MATH, median(L1), ENTER
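
As a cross-check on the calculator, here is a minimal Python sketch of the "middle value" procedure, assuming the usual convention of averaging the two middle values when the count is even. It is an illustration, not part of the original slides.

```python
# Joey's 14 quiz grades, copied from the example above.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

ordered = sorted(grades)     # the median is defined on the ordered data
n = len(ordered)
mid = n // 2

if n % 2 == 1:
    # Odd count: the median is the single middle value.
    median = ordered[mid]
else:
    # Even count: the median is the average of the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2

print("ordered:", ordered)
print("median:", median)
```

With 14 grades the two middle values are 84 and 86, so the median is 85. Note that 85 is not itself one of the grades.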

Terminology: “A measure is resistant”
• A resistant measure does not respond strongly to the influence of outliers (extreme observations).
• Furthermore, a resistant measure does not respond strongly to changes in a few observations.

Are mean and median resistant? Mean and Median Applet

Mean vs Median
• The mean is not a resistant measure.
  – It is sensitive to the influence of a few extreme observations (outliers).
  – It is sensitive to skewed distributions. The mean is pulled towards the tail.
• The median is resistant.
  – It is resistant to extreme values and skewed distributions.
• For skewed distributions, the median is the better measure of center.
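
A small numerical experiment makes the contrast concrete. The Python sketch below uses the quiz grades from the earlier examples and changes one grade (74 becomes 4); that altered value is hypothetical, chosen only to create an outlier.

```python
import statistics

# Joey's 14 quiz grades from the earlier examples.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

# Hypothetical change for illustration: the lowest grade, 74, becomes 4.
with_outlier = [4 if g == 74 else g for g in grades]

mean_before = statistics.mean(grades)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(grades)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One extreme value pulls the mean from 85 down to 80, while the median stays at 85.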

Measures of Spread
• Range
• Quartiles
• Five-Number Summary
• The Standard Deviation

Range
• The difference between the largest value and the smallest value.
• Gives the full spread of the data.
• But it depends only on the two most extreme values, so it is strongly affected by outliers.

Quartiles
• We can describe the spread (variability) of a distribution by giving several percentiles.
• Typically we use the 25th, 50th, and 75th percentiles.
• These are Q1, the median, and Q3.

Example
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Find Q1, the median, and Q3.
• Answer: Q1 = 78, Median = 85, Q3 = 91
• Calculator: STAT, CALC, 1-Var Stats L1, ENTER
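
The quartiles above can be reproduced with a short Python sketch. It assumes the "median of each half" convention (Q1 is the median of the values below the median's position, Q3 the median of those above it), which matches the answers on this slide. Library quantile functions often use other interpolation rules and may return slightly different quartiles. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + (n % 2):]     # skip the middle value when n is odd
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]
summary = five_number_summary(grades)
print(summary)
```

For Joey's grades this gives a minimum of 74, Q1 = 78, median = 85, Q3 = 91, and a maximum of 98.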

Five-Number Summary
• Using the calculator, we again use 1-Var Stats.

Five-Number Summary: Computer Software Output

Graphical Display of the Five-Number Summary

Example - Boxplot
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Answer: the boxplot is drawn from the five-number summary 74, 78, 85, 91, 98
• Calculator: STAT PLOT, make the appropriate selections on the menu, then ZOOM, 9: ZoomStat

Interquartile Range

Identifying Outliers
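
The body of this slide did not survive in the transcript. As a hedged illustration, the Python sketch below assumes the common 1.5 × IQR rule (a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3) and applies it to the quiz grades, using the quartiles from the earlier example.

```python
# Joey's 14 quiz grades from the earlier examples.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

q1, q3 = 78, 91            # quartiles from the earlier example
iqr = q3 - q1              # interquartile range

# Assumed rule: flag values more than 1.5 * IQR outside the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [g for g in grades if g < low_fence or g > high_fence]

print("IQR:", iqr)
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

Here the IQR is 13 and the fences are 58.5 and 110.5, so none of Joey's grades is flagged as an outlier under this rule.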

Variance and Standard Deviation

Example – Variance and Standard Deviation
Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
Calculate the variance and standard deviation.

  Grade    Deviation from mean    Squared deviation
  86       86 - 85 = 1              1
  84       84 - 85 = -1             1
  91       91 - 85 = 6             36
  75       75 - 85 = -10          100
  78       78 - 85 = -7            49
  80       80 - 85 = -5            25
  74       74 - 85 = -11          121
  87       87 - 85 = 2              4
  76       76 - 85 = -9            81
  96       96 - 85 = 11           121
  82       82 - 85 = -3             9
  90       90 - 85 = 5             25
  98       98 - 85 = 13           169
  93       93 - 85 = 8             64
  Total    1190                   806

Calculator: STAT, EDIT, enter the data in list 1, QUIT; then STAT, CALC, 1-Var Stats
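
The table stops at the two totals. The Python sketch below finishes the calculation under the assumption that the sample form is intended, dividing the sum of squared deviations by n - 1 (this corresponds to Sx rather than σx in the calculator's 1-Var Stats output). The slides themselves do not state the final values, so treat the results as a worked check rather than the author's figures.

```python
import math

# Joey's 14 quiz grades from the example above.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

n = len(grades)
mean = sum(grades) / n

# Squared deviation of each grade from the mean (third column of the table).
squared_devs = [(g - mean) ** 2 for g in grades]
total_squared = sum(squared_devs)

# Assumption: sample variance, so divide by n - 1 rather than n.
variance = total_squared / (n - 1)
std_dev = math.sqrt(variance)

print("sum of squared deviations:", total_squared)
print("variance:", variance)
print("standard deviation:", round(std_dev, 3))
```

The sum of squared deviations matches the table's total of 806. Dividing by 13 gives a variance of 62, and the standard deviation is its square root, about 7.874.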

Standard Deviation
• The standard deviation is zero when there is no spread.
• The standard deviation gets larger as the spread increases.

Impact of adding a constant to all data in the set?
• Joey’s first 14 quiz grades in a marking period were 86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93
• Add 32 points to each score, then store the results in L2.
• Compute 1-Var Stats. What has changed?
• Every value in the five-number summary has increased by 32, but the standard deviation has not changed.
• The measures of spread remain the same.
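
The calculator exercise can be mirrored in Python. In this minimal sketch the list names `grades` and `shifted` stand in for L1 and L2; they are illustrative, not part of the slides.

```python
import statistics

# L1: Joey's 14 quiz grades.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

# L2: the same grades with 32 points added to each.
shifted = [g + 32 for g in grades]

for name, data in (("L1", grades), ("L2", shifted)):
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "range:", max(data) - min(data),
          "stdev:", round(statistics.stdev(data), 3))
```

The mean and median both move from 85 to 117, while the range (24) and the standard deviation are identical for the two lists.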

The impact of multiplying each data value in the set by a constant?
• Using the data set in L1, multiply each value by 2.
• Compute 1-Var Stats. What has changed?
• Every value in the five-number summary has doubled, and the standard deviation has doubled.
• The measure of spread has increased.
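
As with the previous slide, the effect can be checked with a short Python sketch. The list name `doubled` is illustrative.

```python
import statistics

# L1: Joey's 14 quiz grades.
grades = [86, 84, 91, 75, 78, 80, 74, 87, 76, 96, 82, 90, 98, 93]

# Each value multiplied by the constant 2.
doubled = [2 * g for g in grades]

for name, data in (("original", grades), ("doubled", doubled)):
    print(name,
          "min:", min(data),
          "median:", statistics.median(data),
          "max:", max(data),
          "stdev:", round(statistics.stdev(data), 3))
```

The minimum, median, and maximum each double (74, 85, 98 become 148, 170, 196), and the standard deviation doubles as well. The variance, being in squared units, is multiplied by 4.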