On Average, On Spread, Frequency Distribution and Statistical Parameters
By Dr D D Basu, Advisor, CSE; Former Addl. Director, CPCB
Central Tendency and Spread: The Two Key Parameters of Data Analysis
Concept of Average
• The purpose of an average is to represent a group of individual values in a simple and concise manner.
• An average acts as a representative of the group.
• The simplest average is called the "mean", meaning "centre". All averages are known to statisticians as measures of central tendency.
• Several types of mean are:
  - the arithmetic mean
  - the weighted arithmetic mean
  - the geometric mean
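THE ARITHMETIC MEAN
The arithmetic mean, or briefly the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ (read "X bar") and is defined as

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N} \tag{1}$$

EXAMPLE. The arithmetic mean of the numbers 8, 3, 5, 12 and 10 is

$$\bar{X} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$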
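If the numbers $X_1, X_2, \ldots, X_K$ occur $f_1, f_2, \ldots, f_K$ times, respectively (i.e., occur with frequencies $f_1, f_2, \ldots, f_K$), the arithmetic mean is

$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_K X_K}{f_1 + f_2 + \cdots + f_K} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} \tag{2}$$

where $N = \sum f$ is the total frequency (i.e., the total number of cases).

EXAMPLE. If 5, 8, 6, and 2 occur with frequencies 3, 2, 4, and 1, respectively, the arithmetic mean is

$$\bar{X} = \frac{(3)(5) + (2)(8) + (4)(6) + (1)(2)}{3 + 2 + 4 + 1} = \frac{15 + 16 + 24 + 2}{10} = \frac{57}{10} = 5.7$$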
THE WEIGHTED ARITHMETIC MEAN
Sometimes we associate with the numbers $X_1, X_2, \ldots, X_K$ certain weighting factors (or weights) $w_1, w_2, \ldots, w_K$, depending on the significance or importance attached to the numbers. In this case,

$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_K X_K}{w_1 + w_2 + \cdots + w_K} = \frac{\sum wX}{\sum w}$$

is called the weighted arithmetic mean. Note the similarity to equation (2), the arithmetic mean for values with frequencies, which can be considered a weighted arithmetic mean with weights $f_1, f_2, \ldots, f_K$.

EXAMPLE. If a final examination in a course is weighted 3 times as much as a quiz and a student has a final examination grade of 85 and quiz grades of 70 and 90, the mean grade is

$$\bar{X} = \frac{(1)(70) + (1)(90) + (3)(85)}{1 + 1 + 3} = \frac{415}{5} = 83$$
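The two examples above (frequencies as weights, and the examination grades) can be checked with a short Python sketch. The function name `weighted_mean` is only an illustrative choice, not something defined in the slides.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Frequencies used as weights: 5, 8, 6, 2 occurring 3, 2, 4, 1 times.
freq_mean = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# Final examination (85) weighted 3 times as much as each quiz (70, 90).
grade_mean = weighted_mean([70, 90, 85], [1, 1, 3])

print(freq_mean, grade_mean)
```

Setting every weight to 1 reduces this to the ordinary arithmetic mean.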
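THE GEOMETRIC MEAN G
The geometric mean G of a set of N positive numbers $X_1, X_2, X_3, \ldots, X_N$ is the Nth root of the product of the numbers:

$$G = \sqrt[N]{X_1 X_2 X_3 \cdots X_N}$$

EXAMPLE. Find the arithmetic and geometric means of 6 and 54.

$$\text{Arithmetic mean} = \frac{6 + 54}{2} = 30$$

$$\text{Geometric mean} = \sqrt{6 \times 54} = \sqrt{324} = 18$$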
EXAMPLE. Find the mean of the SOx values 13, 23, 12, 44, 55 measured in a city on 5 consecutive days.

$$\text{Arithmetic mean} = \frac{13 + 23 + 12 + 44 + 55}{5} = \frac{147}{5} = 29.4$$

$$\text{Geometric mean} = \sqrt[5]{13 \times 23 \times 12 \times 44 \times 55} = 24.4$$
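As a quick check of the SOx example, the sketch below computes both means with the Python standard library. The geometric mean is taken as the Nth root of the product (here the fifth root, since there are five readings); the variable names are illustrative.

```python
import math

sox = [13, 23, 12, 44, 55]  # SOx readings from the example above

arithmetic_mean = sum(sox) / len(sox)

# Geometric mean: N-th root of the product (all values must be positive).
geometric_mean = math.prod(sox) ** (1 / len(sox))

print(round(arithmetic_mean, 1), round(geometric_mean, 1))  # 29.4 24.4
```

The geometric mean is never larger than the arithmetic mean for positive data, which is why it is less affected by the two high readings (44 and 55).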
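MEASURE OF DISPERSION
How far do the data lie from the mean?
• $X_1 - \bar{X}$ is the distance of a value $X_1$ from the mean; it is positive when $X_1$ lies above the mean.
• $X_2 - \bar{X}$ is the distance of a value $X_2$ from the mean; it is negative when $X_2$ lies below the mean.
• $|X - \bar{X}|$ ignores the sign.

$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{N}$$

$$\text{Standard deviation} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$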
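EXAMPLE OF STANDARD DEVIATION
• Find the standard deviation S of the set of numbers 12, 6, 7, 3, 15, 10, 18, 5.

$$\bar{X} = \frac{\sum X}{N} = \frac{12 + 6 + 7 + 3 + 15 + 10 + 18 + 5}{8} = \frac{76}{8} = 9.5$$

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{(12 - 9.5)^2 + (6 - 9.5)^2 + (7 - 9.5)^2 + (3 - 9.5)^2 + (15 - 9.5)^2 + (10 - 9.5)^2 + (18 - 9.5)^2 + (5 - 9.5)^2}{8}}$$

$$S = \sqrt{\frac{190}{8}} = \sqrt{23.75} = 4.87$$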
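The mean deviation and standard deviation defined above can be sketched in a few lines of Python. Both functions divide by N (the population form used in these slides), and the function names are illustrative.

```python
import math


def mean_deviation(data):
    """Average absolute distance from the mean: sum|x - mean| / N."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def std_deviation(data):
    """Population standard deviation (divisor N), as in the text."""
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))


data = [12, 6, 7, 3, 15, 10, 18, 5]
print(mean_deviation(data))            # 4.25
print(round(std_deviation(data), 2))   # 4.87
```

The standard library function `statistics.pstdev` uses the same divisor N, whereas `statistics.stdev` divides by N - 1 and would give a slightly larger value.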
Concept of Frequency
Definition
• Frequency is the rate at which something occurs over a period of time, i.e. the number of occurrences of an event per unit time. In a data set, the frequency of a value is the number of times that value occurs.
Example
• Dissolved oxygen (DO) was measured 20 times at a sampling point in a river. The values are 5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.
Frequency table of DO at sampling point X of city Y

| Sr. No. | Data Value | Frequency |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 3 | 5 |
| 3 | 4 | 3 |
| 4 | 5 | 6 |
| 5 | 6 | 2 |
| 6 | 7 | 1 |
| TOTAL | | 20 |
• Relative frequency: the fraction or proportion of times a value occurs, i.e. its frequency divided by the total frequency.
• Cumulative frequency: the accumulation of the frequencies up to and including the current value.
• Relative cumulative frequency: the accumulation of the relative frequencies up to and including the current value.
Table: Frequency, Relative Frequency, Cumulative Frequency, Relative Cumulative Frequency

| Sr. No. | Data Value | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency |
|---|---|---|---|---|---|
| 1 | 2 | 3 | 3/20 = 0.15 | 3 | 0.15 |
| 2 | 3 | 5 | 5/20 = 0.25 | 3 + 5 = 8 | 0.15 + 0.25 = 0.40 |
| 3 | 4 | 3 | 3/20 = 0.15 | 3 + 5 + 3 = 11 | 0.15 + 0.40 = 0.55 |
| 4 | 5 | 6 | 6/20 = 0.30 | 3 + 5 + 3 + 6 = 17 | 0.30 + 0.55 = 0.85 |
| 5 | 6 | 2 | 2/20 = 0.10 | 3 + 5 + 3 + 6 + 2 = 19 | 0.10 + 0.85 = 0.95 |
| 6 | 7 | 1 | 1/20 = 0.05 | 3 + 5 + 3 + 6 + 2 + 1 = 20 | 0.05 + 0.95 = 1.00 |

N = 20
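The table above can be built directly from the 20 raw DO readings. The following is a minimal Python sketch using `collections.Counter`; the function name `frequency_table` and the row layout are illustrative assumptions.

```python
from collections import Counter

# The 20 dissolved oxygen readings from the example.
do_values = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]


def frequency_table(data):
    """Return rows of (value, f, relative f, cumulative f, relative cumulative f)."""
    counts = Counter(data)
    n = len(data)
    rows, cumulative = [], 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f
        rows.append((value, f, f / n, cumulative, cumulative / n))
    return rows


for row in frequency_table(do_values):
    print(row)
```

The relative cumulative frequency is computed here as cumulative frequency divided by N rather than by adding up relative frequencies; the two are equivalent, but the division avoids accumulating floating-point rounding.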
Graphical Presentation of Cumulative Frequency
Graphical Presentation of Relative Frequency
Frequency and Mean
Frequency and Standard Deviation
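Calculation of Standard Deviation
• Standard deviation:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

• Variance:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - 2\bar{X}\,\frac{\sum X}{n} + \frac{\sum \bar{X}^2}{n}$$

But $\frac{\sum X}{n} = \bar{X}$, and $\bar{X}$ is constant, so $\frac{\sum \bar{X}^2}{n} = \bar{X}^2$. Therefore

$$S^2 = \frac{\sum X^2}{n} - 2\bar{X}^2 + \bar{X}^2 = \frac{\sum X^2}{n} - \bar{X}^2$$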
For Group Frequency

$$S^2 = \frac{\sum f (X - \bar{X})^2}{\sum f}$$

$$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$$
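Example

| Data value, x | Frequency, f | f x² |
|---|---|---|
| 2 | 3 | 12 |
| 3 | 5 | 45 |
| 4 | 3 | 48 |
| 5 | 6 | 150 |
| 6 | 2 | 72 |
| 7 | 1 | 49 |
| Total | ∑f = 20 | ∑fx² = 376 |

With $\bar{X} = \sum fx / \sum f = 82/20 = 4.1$,

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{376}{20} - (4.1)^2} = \sqrt{18.8 - 16.81} = \sqrt{1.99} = 1.41$$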
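To see that the shortcut form of the variance agrees with the definition, the sketch below evaluates both for the DO frequency table. The dictionary layout and variable names are illustrative.

```python
import math

# Frequency table of the DO example: data value -> frequency.
freq = {2: 3, 3: 5, 4: 3, 5: 6, 6: 2, 7: 1}

n = sum(freq.values())
mean = sum(f * x for x, f in freq.items()) / n

# Definition: S^2 = sum f (x - mean)^2 / sum f
var_definition = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n

# Shortcut: S^2 = sum f x^2 / sum f - mean^2
var_shortcut = sum(f * x * x for x, f in freq.items()) / n - mean ** 2

print(round(mean, 2), round(var_shortcut, 2), round(math.sqrt(var_shortcut), 2))
# 4.1 1.99 1.41
```

The shortcut needs only the running totals ∑f, ∑fx and ∑fx², which is convenient for hand calculation. With floating-point numbers it can lose precision when the mean is large compared with the spread, so the definition form is the safer choice in code.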
Group Frequency and Statistical Parameter
How to construct a frequency distribution
• Particulate Matter (PM)
Data for PM at point X in City Y
Class Interval, Tally Mark and Frequency
[Figure: PM Frequency Distribution. Histogram of frequency against PM for class intervals of width 5.]
| PM (mg/Nm³) | Frequency |
|---|---|
| 118–126 | 3 |
| 127–135 | 5 |
| 136–144 | 9 |
| 145–153 | 12 |
| 154–162 | 5 |
| 163–171 | 4 |
| 172–180 | 2 |
| Total | 40 |
[Figure: PM Frequency Distribution. Histogram of frequency against PM for the class intervals 118–126 to 172–180.]
Group Frequency and Mean

| Class Interval | Class Boundaries | Class Mid Mark (x) | Frequency (f) | f × Mid Mark (fx) |
|---|---|---|---|---|
| 118–126 | 117.5–126.5 | 122 | 3 | 366 |
| 127–135 | 126.5–135.5 | 131 | 5 | 655 |
| 136–144 | 135.5–144.5 | 140 | 9 | 1260 |
| 145–153 | 144.5–153.5 | 149 | 12 | 1788 |
| 154–162 | 153.5–162.5 | 158 | 5 | 790 |
| 163–171 | 162.5–171.5 | 167 | 4 | 668 |
| 172–180 | 171.5–180.5 | 176 | 2 | 352 |
| Total | | | ∑f = 40 | ∑fx = 5879 |

$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{5879}{40} = 146.975 \approx 147.0$$

Definition of Range and Class Intervals
• Range: The difference between the two extreme values, i.e. the maximum and the minimum, is called the range. The range of the observed particulate matter values is 176 − 119 = 57.
• Class interval: The overall range can be subdivided into a number of smaller ranges, which are called class intervals. The lengths of the class intervals are usually equal (5 in the first case, 9 in the second case).

How to choose Class Intervals
• Generally, for a large sample, 20 class intervals are chosen, i.e. the class interval width is R/20, where R is the range. For a small sample, R/12 is preferred. The width may also be decided by inspecting the data, as in the second case, where it is kept at 9. This is called tuning.

Definition of Class Mid Marks
• Class mid mark: The middle value of a class interval is called the class mid mark. For example, the mid mark of the class 118–126 is (118 + 126)/2 = 122.

Definition of Class Boundaries
• Class boundaries: A frequency distribution describes a continuous quantity. Thus a recorded value of "127" may lie anywhere between 126.5 and 127.5, which is why the class interval 127–135 implies the class boundaries 126.5–135.5. Grouped frequency tables should be developed with class boundaries so that the boundaries cover the whole range of observed values without gap or overlap.

Standard Deviation

| Class Mid Mark (x) | Frequency (f) | x² | f x² |
|---|---|---|---|
| 122 | 3 | 14884 | 44652 |
| 131 | 5 | 17161 | 85805 |
| 140 | 9 | 19600 | 176400 |
| 149 | 12 | 22201 | 266412 |
| 158 | 5 | 24964 | 124820 |
| 167 | 4 | 27889 | 111556 |
| 176 | 2 | 30976 | 61952 |
| Total | ∑f = 40 | | ∑fx² = 871597 |

$$S = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{X}^2} = \sqrt{\frac{871597}{40} - (146.975)^2} = \sqrt{21789.93 - 21601.65} = \sqrt{188.27} = 13.72$$
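The grouped mean and standard deviation can be sketched in Python starting from the class limits and frequencies alone. This sketch takes each mid mark as the midpoint of the class boundaries, for example (117.5 + 126.5)/2 = 122, and assumes the boundaries sit half a measurement unit outside the class limits; the function name `grouped_stats` and the `unit` parameter are illustrative, not part of the slides.

```python
import math

# Class limits and frequencies of the PM table: (lower, upper, frequency).
classes = [(118, 126, 3), (127, 135, 5), (136, 144, 9), (145, 153, 12),
           (154, 162, 5), (163, 171, 4), (172, 180, 2)]


def grouped_stats(classes, unit=1):
    """Return (mid marks, mean, population SD) for grouped data.

    `unit` is the measurement resolution; class boundaries are assumed to
    lie half a unit outside the class limits (e.g. 118-126 -> 117.5-126.5).
    """
    half = unit / 2
    mids = [((lo - half) + (hi + half)) / 2 for lo, hi, _ in classes]
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    # Shortcut form: S^2 = sum f x^2 / sum f - mean^2
    variance = sum(f * x * x for x, f in zip(mids, freqs)) / n - mean ** 2
    return mids, mean, math.sqrt(variance)


mids, mean, sd = grouped_stats(classes)
print(mids)
print(mean, sd)
```

Because every observation in a class is replaced by the class mid mark, the results are approximations to the mean and standard deviation of the raw data.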
The Median and Percentiles Using Interpolation
The PM values in the frequency distribution of Table X are assumed to be continuously distributed. In such a case the median is that value for which half the total frequency (40/2 = 20) lies above it and half lies below it.

Table X

| PM (mg/Nm³) | Frequency |
|---|---|
| 118–126 | 3 |
| 127–135 | 5 |
| 136–144 | 9 |
| 145–153 | 12 |
| 154–162 | 5 |
| 163–171 | 4 |
| 172–180 | 2 |
| Total | 40 |
Median: The sum of the first three class frequencies is 3 + 5 + 9 = 17. Thus, to give the desired 20, we require three more of the 12 cases in the fourth class. Since the fourth class interval, 145–153, actually corresponds to PM values 144.5 to 153.5, the median must lie 3/12 of the way between 144.5 and 153.5; that is, the median is

$$144.5 + \frac{3}{12}(153.5 - 144.5) = 144.5 + 2.25 = 146.75$$

First quartile: One quarter of the total frequency is 40/4 = 10. The sum of the first two class frequencies is 3 + 5 = 8. Thus, to give the desired 10, we require 2 more of the 9 cases in the third class. Since the third class interval, 136–144, actually corresponds to PM values 135.5 to 144.5, the first quartile must lie 2/9 of the way between 135.5 and 144.5; that is, the first quartile is

$$135.5 + \frac{2}{9}(144.5 - 135.5) = 135.5 + 2 = 137.5$$
Third quartile: Three quarters of the total frequency is 3 × 40/4 = 30. The sum of the first four class frequencies is 3 + 5 + 9 + 12 = 29. Thus, to give the desired 30, we require 1 more of the 5 cases in the fifth class. Since the fifth class interval, 154–162, actually corresponds to PM values 153.5 to 162.5, the third quartile must lie 1/5 of the way between 153.5 and 162.5; that is, the third quartile is

$$153.5 + \frac{1}{5}(162.5 - 153.5) = 153.5 + 1.8 = 155.3$$
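The interpolation used for the median and quartiles follows one pattern: find the class in which the target cumulative frequency falls, then move the required fraction of the way across that class's boundaries. A minimal Python sketch follows, assuming boundaries half a unit outside the class limits; the function name `grouped_quantile` and its parameters are illustrative.

```python
def grouped_quantile(classes, fraction, unit=1):
    """Interpolated quantile for grouped data.

    classes:  list of (lower limit, upper limit, frequency), ascending.
    fraction: 0.5 for the median, 0.25 and 0.75 for the quartiles.
    unit:     measurement resolution; boundaries are assumed to lie
              half a unit outside the class limits.
    """
    half = unit / 2
    n = sum(f for _, _, f in classes)
    target = fraction * n      # cumulative frequency to be reached
    below = 0                  # cumulative frequency of the earlier classes
    for lo, hi, f in classes:
        if f > 0 and below + f >= target:
            lower_b, upper_b = lo - half, hi + half
            return lower_b + (target - below) / f * (upper_b - lower_b)
        below += f
    raise ValueError("fraction must lie between 0 and 1")


# PM frequency distribution (Table X).
pm = [(118, 126, 3), (127, 135, 5), (136, 144, 9), (145, 153, 12),
      (154, 162, 5), (163, 171, 4), (172, 180, 2)]

median = grouped_quantile(pm, 0.50)
q1 = grouped_quantile(pm, 0.25)
q3 = grouped_quantile(pm, 0.75)
print(median, q1, q3)
```

Any other percentile is obtained the same way, for example `grouped_quantile(pm, 0.90)` for the 90th percentile.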
Mode Median Mean relation
Definition of Statistics: The science of producing unreliable facts from reliable figures. Evan Esar