Listed below are the test results of 2 different classes. It is human nature to compare these results. How can we compare the results of one class to those of another?

Class A: 52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82, 84, 86, 87, 88, 90, 91, 95, 96
Class B: 44, 49, 55, 57, 61, 66, 71, 74, 81, 82, 82, 83, 85, 90, 92, 94, 98, 99, 100

Perhaps the quickest measure of central tendency is to observe the most frequent mark for each class. In Class A the mark of 82% occurs twice whereas the other marks occur only once. In Class B the exact same thing happens. This measure of central tendency is called the MODE. It has the advantage of being very easy and relatively quick to determine, but it is not very helpful in many situations as it can often be very misleading. Here the mode is 82 for both classes, so it does nothing to tell them apart.

Another way to compare classes is to compare the middle value from each class. Since both classes have 19 values, we are looking for the tenth value (there are 9 values below and 9 values above).

Class A: 52, 59, 60, 62, 64, 65, 68, 72, 80 (9 values below) | 82 | 82, 84, 86, 87, 88, 90, 91, 95, 96 (9 values above)
Class B: 44, 49, 55, 57, 61, 66, 71, 74, 81 (9 values below) | 82 | 82, 83, 85, 90, 92, 94, 98, 99, 100 (9 values above)

This is called the MEDIAN. It is 82 for both classes here as well. This method is a little more time-consuming, especially if the numbers are not in order as they are here. The Median is generally a more reliable measure of central tendency than the Mode.
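As a quick check of the mode and median described above, here is a minimal Python sketch. It assumes the 19 marks per class with 82 appearing twice, as the text states; the variable names `class_a` and `class_b` are illustrative only.

```python
from statistics import median, multimode

# Marks for each class, 19 values each, with 82 occurring twice.
class_a = [52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82,
           84, 86, 87, 88, 90, 91, 95, 96]
class_b = [44, 49, 55, 57, 61, 66, 71, 74, 81, 82, 82,
           83, 85, 90, 92, 94, 98, 99, 100]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    # multimode returns every value tied for most frequent.
    modes = multimode(marks)
    # median sorts the data and picks the middle (tenth) value.
    middle = median(marks)
    print(name, "mode:", modes, "median:", middle)
```

Both classes report the same mode and the same median, which is why neither measure separates them.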
The third and perhaps most common measure of central tendency is the MEAN. This is also known as the average or the arithmetic mean. It is also the most tedious value to determine, especially with a lot of values, because you have to add them all up.

Let's assign each value in a class to the variable 'x'. So in Class A, $x_1 = 52$, $x_2 = 59$, $x_3 = 60$ and $x_{19} = 96$. When we look at the values in this way we can express the sum of the values like so:

$$x_1 + x_2 + x_3 + \dots + x_{19}$$

This is a very cumbersome way to express the sum of the data values. There is a special symbol used in statistics to represent the sum of data values:

$$\sum_{i=1}^{n} x_i$$

We can read this symbol as follows: the sum of $x_i$, where $i$ ranges from 1 to $n$, and $n$ represents the total number of data values. For Class A:

$$\sum_{i=1}^{19} x_i = 52+59+60+62+64+65+68+72+80+82+82+84+86+87+88+90+91+95+96$$

Even though that symbol may be weird and foreign to you, which would you prefer to write: the weird symbol on the left side of the '=' or the sum of the 19 numbers on the right? One more point: sometimes it can be the sum of more than 19 numbers. One of the reasons that we need this symbol is to express the formula for MEAN. By the way, since we are using the variable 'x' to represent the individual marks, we will use the following symbol to represent the MEAN of the marks: $\bar{x}$
The formula to determine MEAN from a set of values is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $n$ represents the total number of data values.

Class A: 52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82, 84, 86, 87, 88, 90, 91, 95, 96

$$\bar{x} = \frac{1463}{19} = 77$$

Class B: 44, 49, 55, 57, 61, 66, 71, 74, 81, 82, 82, 83, 85, 90, 92, 94, 98, 99, 100

$$\bar{x} = \frac{1463}{19} = 77$$

We see that the mean ends up having the same value for both classes as well. None of the 3 measures of central tendency does anything to distinguish the results of one class from the results of the other.

There are other ways to distinguish between the results of the two classes than using central tendency. We can observe the VARIANCE. This is an indication of how spread out or dispersed the values are. It is a measure of how much the data values deviate from the arithmetic mean. The larger the variance, the greater the dispersion; the smaller the variance, the more clustered the data values.
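The mean formula can be sketched in a few lines of Python. The helper name `mean` and the list names are illustrative, and the data assume 19 marks per class with 82 appearing twice, as described earlier in the text.

```python
# Marks for each class, 19 values each, with 82 occurring twice.
class_a = [52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82,
           84, 86, 87, 88, 90, 91, 95, 96]
class_b = [44, 49, 55, 57, 61, 66, 71, 74, 81, 82, 82,
           83, 85, 90, 92, 94, 98, 99, 100]


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count n."""
    return sum(values) / len(values)


for name, marks in (("Class A", class_a), ("Class B", class_b)):
    print(name, "sum:", sum(marks), "n:", len(marks), "mean:", mean(marks))
```

Both sums come to 1463, so both means are 1463 / 19 = 77.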
The symbol for variance is: $s^2$

To calculate variance, we really should do it step by step.
1. Calculate the arithmetic mean.
2. Calculate the difference between each data value and the mean.
3. Square the result from step 2.
4. Calculate the sum of all of the values from step 3.
5. Divide the result of step 4 by (n – 1).
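The five steps above can be sketched as a small Python function. The function name `sample_variance` is an illustrative choice, and the Class A marks assume 82 appears twice (19 values), as the text states.

```python
def sample_variance(values):
    """Variance computed with the five steps listed above."""
    n = len(values)
    mean = sum(values) / n                       # step 1: arithmetic mean
    differences = [x - mean for x in values]     # step 2: value minus mean
    squares = [d ** 2 for d in differences]      # step 3: square each difference
    total = sum(squares)                         # step 4: sum of the squares
    return total / (n - 1)                       # step 5: divide by (n - 1)


# Class A marks, 19 values, with 82 occurring twice.
class_a = [52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82,
           84, 86, 87, 88, 90, 91, 95, 96]

print(round(sample_variance(class_a), 2))
```

Python's standard library offers `statistics.variance`, which also divides by (n - 1), so it can serve as a cross-check.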
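Class A ($\bar{x} = 77$)

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 52 | 52 - 77 = -25 | 625 |
| 59 | 59 - 77 = -18 | 324 |
| 60 | 60 - 77 = -17 | 289 |
| 62 | 62 - 77 = -15 | 225 |
| 64 | 64 - 77 = -13 | 169 |
| 65 | 65 - 77 = -12 | 144 |
| 68 | 68 - 77 = -9 | 81 |
| 72 | 72 - 77 = -5 | 25 |
| 80 | 80 - 77 = 3 | 9 |
| 82 | 82 - 77 = 5 | 25 |
| 82 | 82 - 77 = 5 | 25 |
| 84 | 84 - 77 = 7 | 49 |
| 86 | 86 - 77 = 9 | 81 |
| 87 | 87 - 77 = 10 | 100 |
| 88 | 88 - 77 = 11 | 121 |
| 90 | 90 - 77 = 13 | 169 |
| 91 | 91 - 77 = 14 | 196 |
| 95 | 95 - 77 = 18 | 324 |
| 96 | 96 - 77 = 19 | 361 |
| | Sum | 3342 |

$$s^2 = \frac{3342}{19 - 1} = 185.67$$

Remember that the symbol for variance is $s^2$, not $s$. We do not take the square root of 185.67 to determine variance. The variance is 185.67.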
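Class B ($\bar{x} = 77$)

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 44 | 44 - 77 = -33 | 1089 |
| 49 | 49 - 77 = -28 | 784 |
| 55 | 55 - 77 = -22 | 484 |
| 57 | 57 - 77 = -20 | 400 |
| 61 | 61 - 77 = -16 | 256 |
| 66 | 66 - 77 = -11 | 121 |
| 71 | 71 - 77 = -6 | 36 |
| 74 | 74 - 77 = -3 | 9 |
| 81 | 81 - 77 = 4 | 16 |
| 82 | 82 - 77 = 5 | 25 |
| 82 | 82 - 77 = 5 | 25 |
| 83 | 83 - 77 = 6 | 36 |
| 85 | 85 - 77 = 8 | 64 |
| 90 | 90 - 77 = 13 | 169 |
| 92 | 92 - 77 = 15 | 225 |
| 94 | 94 - 77 = 17 | 289 |
| 98 | 98 - 77 = 21 | 441 |
| 99 | 99 - 77 = 22 | 484 |
| 100 | 100 - 77 = 23 | 529 |
| | Sum | 5482 |

$$s^2 = \frac{5482}{19 - 1} = 304.56$$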
The variance for Class A is 185.67, compared to 304.56 for Class B. This indicates that the values are more spread out for Class B.

You might say that the calculation we just did is unnecessary. We can determine that Class B is more spread out than Class A just by observing its range (lowest to highest value). Class A ranges from 52 to 96, a difference of 44. Class B ranges from 44 to 100, a difference of 56. Class B is more spread out. All of that is true, but the variance is more revealing because it takes all of the values into consideration, whereas range uses only 2 values from each class.

Actually, because the difference between the mean and each specific value is squared, the variance is measured in squared units, so it is not the best way to compare the dispersion of the classes. To compensate for this, we can take the square root of the variance. This allows the units measuring dispersion to be the same as the units for the class values. When we do this we get the STANDARD DEVIATION. The symbol for standard deviation is 's'.

For Class A: $s = \sqrt{185.67} \approx 13.63$

For Class B: $s = \sqrt{304.56} \approx 17.45$
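To see range and standard deviation side by side, here is a short Python sketch. It uses `statistics.stdev` from the standard library, which takes the square root of the (n - 1) variance; the list names are illustrative and the marks assume 82 appears twice in each class.

```python
from statistics import stdev

# Marks for each class, 19 values each, with 82 occurring twice.
class_a = [52, 59, 60, 62, 64, 65, 68, 72, 80, 82, 82,
           84, 86, 87, 88, 90, 91, 95, 96]
class_b = [44, 49, 55, 57, 61, 66, 71, 74, 81, 82, 82,
           83, 85, 90, 92, 94, 98, 99, 100]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    spread = max(marks) - min(marks)   # range uses only 2 of the values
    s = stdev(marks)                   # standard deviation uses all of them
    print(name, "range:", spread, "standard deviation:", round(s, 2))
```

Both measures rank Class B as more dispersed, but only the standard deviation reflects every mark and is expressed in the same units as the marks themselves.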
The owner of 2 service stations decided to record the number of litres of gasoline needed to fill the tank of each car that stops at one of his service stations. One of the stations is located along the highway and the other is located downtown.

Highway Service Station: 30, 22, 21, 28, 25, 26, 26, 24, 29, 23, 20, 27, 25, 24, 25
Downtown Service Station: 25, 23, 30, 19, 35, 27, 15, 25, 17, 31, 14, 20, 33, 25, 36

Highway Service Station ($\bar{x} = 25$)

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 30 | 30 - 25 = 5 | 25 |
| 22 | 22 - 25 = -3 | 9 |
| 21 | 21 - 25 = -4 | 16 |
| 28 | 28 - 25 = 3 | 9 |
| 25 | 25 - 25 = 0 | 0 |
| 26 | 26 - 25 = 1 | 1 |
| 26 | 26 - 25 = 1 | 1 |
| 24 | 24 - 25 = -1 | 1 |
| 29 | 29 - 25 = 4 | 16 |
| 23 | 23 - 25 = -2 | 4 |
| 20 | 20 - 25 = -5 | 25 |
| 27 | 27 - 25 = 2 | 4 |
| 25 | 25 - 25 = 0 | 0 |
| 24 | 24 - 25 = -1 | 1 |
| 25 | 25 - 25 = 0 | 0 |
| | Sum | 112 |
Downtown Service Station ($\bar{x} = 25$)

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 25 | 25 - 25 = 0 | 0 |
| 23 | 23 - 25 = -2 | 4 |
| 30 | 30 - 25 = 5 | 25 |
| 19 | 19 - 25 = -6 | 36 |
| 35 | 35 - 25 = 10 | 100 |
| 27 | 27 - 25 = 2 | 4 |
| 15 | 15 - 25 = -10 | 100 |
| 25 | 25 - 25 = 0 | 0 |
| 17 | 17 - 25 = -8 | 64 |
| 31 | 31 - 25 = 6 | 36 |
| 14 | 14 - 25 = -11 | 121 |
| 20 | 20 - 25 = -5 | 25 |
| 33 | 33 - 25 = 8 | 64 |
| 25 | 25 - 25 = 0 | 0 |
| 36 | 36 - 25 = 11 | 121 |
| | Sum | 700 |
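The two sums of squares (112 and 700) lead to a variance and standard deviation for each station. The Python sketch below is one way to check them. Note one assumption: the highway list is taken to hold 15 readings with 26 appearing twice, which is what a mean of 25 and a sum of squares of 112 require. The list names are illustrative.

```python
from statistics import mean, stdev, variance

# Assumption: 15 highway readings, with 26 appearing twice.
highway = [30, 22, 21, 28, 25, 26, 26, 24, 29, 23, 20, 27, 25, 24, 25]
downtown = [25, 23, 30, 19, 35, 27, 15, 25, 17, 31, 14, 20, 33, 25, 36]

for name, litres in (("Highway", highway), ("Downtown", downtown)):
    m = mean(litres)
    # Sum of squared differences from the mean (the table totals).
    sum_sq = sum((x - m) ** 2 for x in litres)
    s2 = variance(litres)   # sum_sq divided by (n - 1)
    s = stdev(litres)       # square root of the variance
    print(name, "mean:", m, "sum of squares:", sum_sq,
          "variance:", s2, "standard deviation:", round(s, 2))
```

Under that assumption both stations average 25 L per fill, yet the downtown fills are far more dispersed than the highway ones.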
Box and Whiskers plot

[Figure: box-and-whiskers plots for the Highway and Downtown stations. Horizontal axis: Number of litres, from 14 to 38.]

From the box-and-whiskers plot, the minimum and maximum for the highway station are 20 and 30 respectively. The minimum and maximum for the downtown station are 14 and 36 respectively. The quartiles ($Q_1$, $Q_2$ and $Q_3$) are as follows:

| | Highway | Downtown |
|---|---|---|
| $Q_1$ | 23 L | 19 L |
| $Q_2$ (Md) | 25 L | 25 L |
| $Q_3$ | 27 L | 31 L |

The interquartile range (IR) is:

Highway: IR = $Q_3 - Q_1$ = 27 – 23 = 4 L
Downtown: IR = $Q_3 - Q_1$ = 31 – 19 = 12 L

The semi-interquartile range (Q) is half of the interquartile range, and it is the mean length of a quartile. About half of the values will fall between Md - Q and Md + Q.
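The quartiles, interquartile range and semi-interquartile range can be reproduced with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the median itself when the count is odd), which matches the values quoted above. The function name `five_number_summary` is illustrative, and the highway list assumes 15 readings with 26 appearing twice.

```python
from statistics import median

# Assumption: 15 highway readings, with 26 appearing twice.
highway = [30, 22, 21, 28, 25, 26, 26, 24, 29, 23, 20, 27, 25, 24, 25]
downtown = [25, 23, 30, 19, 35, 27, 15, 25, 17, 31, 14, 20, 33, 25, 36]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


for name, litres in (("Highway", highway), ("Downtown", downtown)):
    low, q1, md, q3, high = five_number_summary(litres)
    ir = q3 - q1      # interquartile range
    q = ir / 2        # semi-interquartile range
    print(name, "min:", low, "Q1:", q1, "Md:", md, "Q3:", q3, "max:", high,
          "IR:", ir, "Q:", q)
```

Other quartile conventions exist and can give slightly different values for other data sets, so treat this as one reasonable method rather than the only one.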
There is one other measure that is used to determine the degree of dispersion of the data values in a group. We have already defined variance and standard deviation. The third measure is called MEAN DEVIATION. This calculation is similar to standard deviation in that its units are the same as those of the data values. But its formula is a little simpler than the standard deviation formula. Observe the 2 formulas:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad \text{mean deviation} = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$$

$\lvert x_i - \bar{x} \rvert$ is the absolute value of the difference. This is the value of the difference with the negative sign removed if it is present.

There are 3 differences between these 2 formulas. What are they?
1. There is no square root in the formula for mean deviation.
2. The mean deviation formula divides by n instead of (n – 1).
3. The mean deviation formula uses the absolute value of each difference instead of its square.
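| $x_i$ | $x_i - \bar{x}$ | $\lvert x_i - \bar{x} \rvert$ |
|---|---|---|
| 25 | 25 - 25 = 0 | 0 |
| 23 | 23 - 25 = -2 | 2 |
| 30 | 30 - 25 = 5 | 5 |
| 19 | 19 - 25 = -6 | 6 |
| 35 | 35 - 25 = 10 | 10 |
| 27 | 27 - 25 = 2 | 2 |
| 15 | 15 - 25 = -10 | 10 |
| 25 | 25 - 25 = 0 | 0 |
| 17 | 17 - 25 = -8 | 8 |
| 31 | 31 - 25 = 6 | 6 |
| 14 | 14 - 25 = -11 | 11 |
| 20 | 20 - 25 = -5 | 5 |
| 33 | 33 - 25 = 8 | 8 |
| 25 | 25 - 25 = 0 | 0 |
| 36 | 36 - 25 = 11 | 11 |
| | Sum | 84 |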
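The table above uses the downtown station readings. A minimal Python sketch of the mean deviation calculation, dividing the sum of absolute differences by n, could look like this; the function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Mean of the absolute differences between each value and the mean."""
    n = len(values)
    mean = sum(values) / n
    # abs() removes the negative sign from a difference when one is present.
    return sum(abs(x - mean) for x in values) / n   # divide by n, not n - 1


downtown = [25, 23, 30, 19, 35, 27, 15, 25, 17, 31, 14, 20, 33, 25, 36]

total = sum(abs(x - 25) for x in downtown)
print("sum of absolute differences:", total)
print("mean deviation:", round(mean_deviation(downtown), 1))
```

The sum of absolute differences is 84, so the mean deviation is 84 / 15 = 5.6 L.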
Mr. White had carpeting installed in each of the 10 units in his 2 apartment buildings located on Stone Street. The following table shows the surface area covered in each unit.

| Building 1 (m²) | Building 2 (m²) |
|---|---|
| 48 | 36 |
| 38 | 42 |
| 12 | 37 |
| 22 | 28 |
| 35 | 45 |
| 24 | 36 |
| 42 | 24 |
| 36 | 46 |
| 52 | 39 |
| 15 | 48 |

a) Calculate the standard deviation of the data collected for each building.
Building 1: _____
Building 2: _____

b) Using the standard deviation for each building, determine for which building the carpet area differs the most from one unit to the next. Explain.
_________________
_________________
_________________

c) Would the variance have allowed you to draw the same conclusions? Why?
_________________________________