Instructor: Nahid Farnaz (Nhn), North South University
BUS 173: Applied Statistics
Chapter 3 Describing Data: Numerical
Measures of Central Tendency
Overview of central tendency:
- Mean: arithmetic average
- Median: midpoint of ranked values
- Mode: most frequently observed value
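Arithmetic Mean
- The arithmetic mean (mean) is the most common measure of central tendency
- For a population of N values: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$, where the $x_i$ are the population values and $N$ is the population size
- For a sample of size n: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where the $x_i$ are the observed values and $n$ is the sample size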
Arithmetic Mean (continued)
- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis, labeled Mean = 3 and Mean = 4]
Median
- In an ordered list, the median is the "middle" number (50% above, 50% below)
- Not affected by extreme values
[Figure: dot plot on a 0 to 10 axis, labeled Median = 3]
Finding the Median
- The location of the median: median position = $\frac{n+1}{2}$ in the ordered data
- If the number of values is odd, the median is the middle number
- If the number of values is even, the median is the average of the two middle numbers
- Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
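The position rule can be sketched in Python. This is a minimal illustration rather than part of the original slides; the function name `median_by_position` and the sample values are assumptions chosen for the example.

```python
import math


def median_by_position(values):
    """Return (position, median) using the (n + 1) / 2 rule.

    Hypothetical helper for illustration; assumes a non-empty list of numbers.
    """
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2                    # 1-based position in the ranked data
    lower = ranked[math.floor(position) - 1]  # value at or just below the position
    upper = ranked[math.ceil(position) - 1]   # value at or just above the position
    return position, (lower + upper) / 2      # lower == upper when n is odd


# Illustrative data (not from the slides).
odd_position, odd_median = median_by_position([7, 1, 5])
even_position, even_median = median_by_position([7, 1, 5, 3])
print(odd_position, odd_median)
print(even_position, even_median)
```

With an odd count the position is a whole number and the median is the value found there; with an even count the position ends in .5 and the two neighbouring values are averaged.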
Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes
[Figure: dot plot on a 0 to 14 axis, labeled Mode = 9; dot plot on a 0 to 6 axis, labeled No Mode]
Which measure of location is the "best"?
- The mean is generally used, unless extreme values (outliers) exist
- Then the median is often used, since the median is not sensitive to extreme values
- Example: median home prices may be reported for a region, because the median is less sensitive to outliers
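A short Python sketch shows why the median is preferred when outliers are present. The home prices below are hypothetical values invented for this illustration; they do not come from the slides.

```python
from statistics import mean, median

# Hypothetical home prices in thousands of dollars (illustrative only).
prices = [180, 195, 210, 225, 240]
prices_with_outlier = prices + [2500]  # one very expensive home added

print(mean(prices), median(prices))
print(mean(prices_with_outlier), median(prices_with_outlier))
```

Adding the single extreme price pulls the mean far above most of the data, while the median moves only slightly.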
Shape of a Distribution
- Describes how data are distributed
- Measures of shape: symmetric or skewed
  Left-skewed: Mean < Median
  Symmetric: Mean = Median
  Right-skewed: Median < Mean
Exercise 3.2
A department store manager is interested in the number of complaints received by the customer service department about the quality of electrical products sold. Records over a 5-week period show the following number of complaints for each week: 13, 15, 8, 16, 8
a. Compute the mean number of weekly complaints
b. Compute the median
c. Find the mode
Ans: a. 12; b. 13; c. 8
Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.
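The answers to Exercise 3.2 can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable name `complaints` is an assumption.

```python
from statistics import mean, median, multimode

complaints = [13, 15, 8, 16, 8]  # weekly complaints from Exercise 3.2

complaints_mean = mean(complaints)      # slide answer: 12
complaints_median = median(complaints)  # slide answer: 13
complaints_modes = multimode(complaints)  # list of all most frequent values

print(complaints_mean, complaints_median, complaints_modes)
```

`multimode` returns a list, so it also handles data sets with several modes.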
Exercise 3.6
The demand for bottled water increases during the hurricane season in Florida. A random sample of 7 hours showed that the following numbers of 1-gallon bottles were sold in one store: 40, 55, 62, 43, 50, 60, 65.
a. Describe the central tendency of the data
b. Comment on symmetry or skewness
Ans: mean = 53.57; median = 55; no unique mode; left-skewed (mean < median)
Measures of Variability
Measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Measures of variation give information on the spread or variability of the data values.
[Figure: distributions with the same center but different variation]
Range
- Simplest measure of variation
- Difference between the largest and the smallest observations: Range = X_largest - X_smallest
Example: Range = 14 - 1 = 13
[Figure: dot plot on a 0 to 14 axis]
Disadvantages of the Range
- Ignores the way in which data are distributed
  [Figure: two dot plots on a 7 to 12 axis, each with Range = 12 - 7 = 5]
- Sensitive to outliers
  1, 1, 1, 2, 2, 3, 3, 4, 5: Range = 5 - 1 = 4
  1, 1, 1, 2, 2, 3, 3, 4, 120: Range = 120 - 1 = 119
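The outlier sensitivity can be reproduced directly from the two data sets on the slide. The helper name `value_range` is an assumption made for this sketch.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


typical = [1, 1, 1, 2, 2, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 120]  # only the last value differs

print(value_range(typical))
print(value_range(with_outlier))
```

Changing one observation moves the range from 4 to 119, even though the other eight values are identical.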
Interquartile Range
- Can eliminate some outlier problems by using the interquartile range
- Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
- Interquartile range = 3rd quartile - 1st quartile: IQR = Q3 - Q1
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
- First quartile position: Q1 = 0.25(n+1)
- Second quartile position: Q2 = 0.50(n+1) (the median position)
- Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
Interquartile Range
Example:
  X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
  (each of the four segments between these values contains 25% of the data)
  Interquartile range = 57 - 30 = 27
- Five-number summary: Minimum < Q1 < Median (Q2) < Q3 < Maximum
Quartiles
- Example: Find the first quartile
  Sample ranked data: 11 12 13 16 16 17 18 21 22 (n = 9)
  Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
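The (n+1) position rule can be written as a small Python function. This is a sketch under one stated assumption: fractional positions are resolved by linear interpolation, which matches the slide's "halfway between" for a .5 position but is not spelled out in the slides for other fractions. The names `value_at_position` and `quartile` are hypothetical.

```python
import math


def value_at_position(ranked, position):
    """Value at a 1-based, possibly fractional, position in ranked data.

    Assumes 1 <= position <= len(ranked); interpolates linearly between neighbours.
    """
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ranked[lower - 1]
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)


def quartile(values, p):
    """Quartile at proportion p (0.25, 0.50 or 0.75) using position p * (n + 1)."""
    ranked = sorted(values)
    return value_at_position(ranked, p * (len(ranked) + 1))


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = quartile(sample, 0.25)  # position 2.5
q2 = quartile(sample, 0.50)  # position 5
q3 = quartile(sample, 0.75)  # position 7.5
print(q1, q2, q3, q3 - q1)
```

For the slide's data this gives Q1 = 12.5, in agreement with the worked example.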
Example 3.4: Waiting Times at Gilotti's Grocery (five-number summary)
A stem-and-leaf display for a sample of 25 waiting times (in seconds) is given below.
- Compute the five-number summary.
- Calculate and interpret the IQR.
Stem-and-leaf display (Minutes, N = 25; Stem unit = 10.0, Leaf unit = 1.0):
  1 | 1 2 4 6 7 8 8 9 9
  2 | 1 2 2 2 4 6 8 9 9
  3 | 0 1 2 3 4
  4 | 0 2
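One way to work Example 3.4 is with `statistics.quantiles`, whose `"exclusive"` method places quartiles at the p(n+1) positions used in these slides. The list below is read off the stem-and-leaf display; the variable names are assumptions, and the result is a sketch rather than the textbook's printed solution.

```python
from statistics import quantiles

# Values read from the stem-and-leaf display (stem unit 10, leaf unit 1).
waiting_times = [
    11, 12, 14, 16, 17, 18, 18, 19, 19,  # stem 1
    21, 22, 22, 22, 24, 26, 28, 29, 29,  # stem 2
    30, 31, 32, 33, 34,                  # stem 3
    40, 42,                              # stem 4
]

# method="exclusive" interpolates at positions 0.25(n+1), 0.50(n+1), 0.75(n+1).
q1, q2, q3 = quantiles(waiting_times, n=4, method="exclusive")
five_number_summary = (min(waiting_times), q1, q2, q3, max(waiting_times))
iqr = q3 - q1

print(five_number_summary)
print(iqr)
```

Under this rule the middle 50% of the waiting times spans an interval of width Q3 - Q1, which is the IQR the exercise asks to interpret.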
Population Variance
- Average of squared deviations of values from the mean
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
  where $\mu$ = population mean, $N$ = population size, $x_i$ = ith value of the variable x
Sample Variance
- Average (approximately) of squared deviations of values from the mean
- Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  where $\bar{x}$ = sample mean, $n$ = sample size, $x_i$ = ith value of the variable x
Population Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Example: Sample Standard Deviation
Sample data (x_i): 10 12 14 15 17 18 18 24
n = 8, mean = x̄ = 16
The sample standard deviation is a measure of the "average" scatter around the mean
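The calculation for this sample can be followed step by step in Python. The sketch applies the usual sample standard deviation formula (squared deviations divided by n - 1) and cross-checks against `statistics.stdev`; the variable names are assumptions.

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)             # numerator of the variance
s = math.sqrt(sum_of_squares / (n - 1))              # divide by n - 1 for a sample

print(x_bar, sum_of_squares, round(s, 4))
print(round(stdev(data), 4))                         # library cross-check
```

The squared deviations sum to 130, so s = sqrt(130 / 7), roughly 4.31.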
Example 3.5
A professor teaches two large sections of marketing and randomly selects a sample of test scores from both sections. Find the range and standard deviation for each sample:
  Section 1: 50 60 70 80 90
  Section 2: 72 68 70 74 66
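Example 3.5 can be worked with a few lines of Python. This is a sketch using `statistics.stdev` (the n - 1 sample formula); the dictionary layout and names are assumptions.

```python
from statistics import mean, stdev

sections = {
    "Section 1": [50, 60, 70, 80, 90],
    "Section 2": [72, 68, 70, 74, 66],
}

results = {}
for name, scores in sections.items():
    score_range = max(scores) - min(scores)
    results[name] = (mean(scores), score_range, stdev(scores))
    print(name, results[name])
```

Both sections have the same mean of 70, but Section 1 is far more spread out on both measures.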
Chebyshev's Theorem
- This theorem establishes data intervals for any data set, regardless of the shape of the distribution
- For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval [μ ± kσ] is at least $100\left[1 - \frac{1}{k^2}\right]\%$
Chebyshev's Theorem (continued)
- Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean (for k > 1)
- Examples:
  k = 1: at least $(1 - 1/1^2)$ = 0% within (μ ± 1σ)
  k = 2: at least $(1 - 1/2^2)$ = 75% within (μ ± 2σ)
  k = 3: at least $(1 - 1/3^2)$ = 89% within (μ ± 3σ)
The Empirical Rule
- If the data distribution is bell-shaped, then the interval μ ± 1σ contains about 68% of the values in the population or the sample
The Empirical Rule
- For a bell-shaped distribution, μ ± 2σ contains about 95% of the values in the population or the sample
- μ ± 3σ contains about 99.7% of the values in the population or the sample
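The empirical-rule percentages can be checked against an exactly normal distribution. This is a sketch under the assumption that "bell-shaped" is modeled as normal; real data will only match approximately. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

These shares are much larger than the Chebyshev lower bounds for the same k, because the empirical rule uses the extra assumption about shape.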
Exercise 3.18
Use Chebyshev's theorem to approximate each of the following if the mean is 250 and the standard deviation is 20. Approximately what proportion of the observations is
a) Between 190 and 310?
b) Between 210 and 290?
c) Between 230 and 270?
Ans: a) at least 88.9% (k = 3); b) at least 75% (k = 2); c) at least 0% (k = 1)
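The steps in Exercise 3.18 can be sketched in Python: convert each interval to a number of standard deviations k, then apply the bound 1 - 1/k^2. The function name `chebyshev_lower_bound` is an assumption, and the sketch assumes each interval is symmetric about the mean.

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2


mean, std_dev = 250, 20
intervals = [(190, 310), (210, 290), (230, 270)]

bounds = []
for low, high in intervals:
    k = (high - mean) / std_dev  # half-width of the interval in standard deviations
    bounds.append((k, chebyshev_lower_bound(k)))
    print(f"{low} to {high}: k = {k:g}, at least {bounds[-1][1]:.1%}")
```

The results are lower bounds, not estimates: at k = 1 the theorem guarantees nothing, which is why part c) comes out as 0%.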
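Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows standard deviation relative to mean: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$ for a sample, $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$ for a population
- Can be used to compare two or more sets of data measured in different units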
Comparing Coefficient of Variation
- Stock A:
  Average price last year = $50
  Standard deviation = $5
- Stock B:
  Average price last year = $100
  Standard deviation = $5
Both stocks have the same standard deviation, but stock B is less variable relative to its price
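The comparison can be made concrete by computing the coefficient of variation, standard deviation divided by mean and expressed as a percentage, for each stock. The function name `coefficient_of_variation` is an assumption made for this sketch.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_stock_a = coefficient_of_variation(std_dev=5, mean=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean=100)

print(f"Stock A: {cv_stock_a:.0f}%")
print(f"Stock B: {cv_stock_b:.0f}%")
```

Stock A's standard deviation is 10% of its average price and stock B's is 5%, so B is the less variable stock in relative terms.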
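The Sample Covariance
- The covariance measures the strength of the linear relationship between two variables
- The population covariance: $\mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
- The sample covariance: $\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
- Only concerned with the strength of the relationship
- No causal effect is implied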
Interpreting Covariance
- Covariance between two variables:
  Cov(x, y) > 0: x and y tend to move in the same direction
  Cov(x, y) < 0: x and y tend to move in opposite directions
  Cov(x, y) = 0: x and y have no linear relationship (zero covariance alone does not imply independence)
Correlation Coefficient
- Measures the relative strength of the linear relationship between two variables
- Population correlation coefficient: $\rho = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
- Sample correlation coefficient: $r = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$
Features of Correlation Coefficient, r
- Unit free
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker any linear relationship
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with panels labeled r = -1, r = -.6, r = 0, r = +1, r = +.3, r = 0]
Interpreting the Result
- r = .733
- There is a relatively strong positive linear relationship between test score #1 and test score #2
- Students who scored high on the first test tended to score high on the second test
Example 3.13 (Covariance and Correlation Coefficient)
Rising Hills Manufacturing Inc. wishes to study the relationship between the number of workers, X, and the number of tables produced, Y. It has obtained a random sample of 10 hours of production. The following (x, y) combinations of points were obtained:
(12, 20) (30, 60) (15, 27) (24, 50) (14, 21) (18, 30) (28, 61) (26, 54) (19, 32) (27, 57)
Compute the covariance and correlation coefficient. Briefly discuss the relationship between X and Y.
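A possible way to work Example 3.13 is to apply the sample covariance and correlation definitions directly, dividing by n - 1. The sketch below uses only the data from the slide; the variable names are assumptions.

```python
import math

pairs = [(12, 20), (30, 60), (15, 27), (24, 50), (14, 21),
         (18, 30), (28, 61), (26, 54), (19, 32), (27, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sample covariance: average cross-product of deviations, with n - 1 in the denominator.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x, _ in pairs) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for _, y in pairs) / (n - 1))

r = cov_xy / (s_x * s_y)  # sample correlation coefficient

print(round(cov_xy, 2), round(r, 3))
```

The covariance is positive and r is close to 1, which points to a strong positive linear relationship between the number of workers and the number of tables produced.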
Obtaining Linear Relationships
- An equation can be fit to show the best linear relationship between two variables: $Y = \beta_0 + \beta_1 X$
  where Y is the dependent variable and X is the independent variable
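Least Squares Regression
- Estimates for the coefficients $\beta_0$ and $\beta_1$ are found to minimize the sum of the squared residuals
- The least-squares regression line, based on sample data, is $\hat{y} = b_0 + b_1 x$
- where $b_1$ is the slope of the line and $b_0$ is the y-intercept: $b_1 = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$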
Example 3.15
Using the answers from Example 3.13 (Rising Hills Inc.), compute the sample regression coefficients b1 and b0. What is the sample regression line?
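Example 3.15 can be sketched with the standard least-squares formulas: slope = covariance of x and y divided by the variance of x, and intercept = mean of y minus slope times mean of x. The data are repeated from Example 3.13 so the block runs on its own; the variable names and the `predict` helper are assumptions.

```python
pairs = [(12, 20), (30, 60), (15, 27), (24, 50), (14, 21),
         (18, 30), (28, 61), (26, 54), (19, 32), (27, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# The n - 1 divisors of the covariance and variance cancel, so raw sums suffice.
sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
sum_sq_x = sum((x - x_bar) ** 2 for x, _ in pairs)

b1 = sum_cross / sum_sq_x   # slope
b0 = y_bar - b1 * x_bar     # y-intercept


def predict(x):
    """Fitted number of tables for x workers on the sample regression line."""
    return b0 + b1 * x


print(f"y-hat = {b0:.2f} + {b1:.3f} x")
```

The fitted line passes through the point of means, so `predict(x_bar)` returns the mean of y.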