- Slides: 27
STATISTICS (PART 1)
• The science of learning from data.
• Numerical facts.
• A collection of methods for planning experiments, obtaining data, and then organizing, analyzing, and interpreting the data in order to draw conclusions or make decisions.
• Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
• Sample: A subset of the population.
• Element: An entity on which data are collected.
• Observation: The value of a variable for an element.
• Data set: A collection of observations on one or more variables.
• Ungrouped data: Data that have not been organized into groups. Also called raw data.

  Data    Frequency
  2       8
  3       4
  5       6
  7       7
  8       2
  9       5

• Grouped data: Data that have been organized into groups (into a frequency distribution).

  Data     Frequency
  2-4      5
  5-7      6
  8-10     10
  11-13    8
  14-16    4
  17-19    3
VARIABLES
• QUALITATIVE
  - NOMINAL. Example: gender, color
  - ORDINAL. Example: Pass/Fail; Good, Bad
• QUANTITATIVE
  - DISCRETE. Example: counts (number of items, integers)
  - CONTINUOUS. Example: measurements (length, weight)
Identify each of the following examples as qualitative or quantitative variables.
1. The residence hall for each student in a statistics class. (qualitative)
2. The amount of gasoline pumped by the next 10 customers at the local Unimart. (quantitative)
3. The amount of radon in the basement of each of 25 homes in a new development. (quantitative)
4. The color of the baseball cap worn by each of 20 students. (qualitative)
5. The length of time to complete a mathematics homework assignment. (quantitative)
6. The state in which each truck is registered when stopped and inspected at a weigh station. (qualitative)
• Discrete data are data that can only take certain values, or that can be counted. The number of people in a room can only be 1, 2, 3, … and not 1.23, 1.57, or 10.22.
  Example:
  - Number of cars on a road
  - Number of children in a family
• Continuous data can take any value between two given values, so a recorded value is never exact. The data are acquired through the process of measuring. For example, a height of 175 cm (correct to the nearest cm) could have arisen from any value in the range 174.5 cm ≤ h < 175.5 cm.
  Example:
  - Weight of people
  - Speeds of motor boats at a particular part of a race
  - The time taken by each student to run 100 m
STATISTICS
• Descriptive: provides simple summaries about the sample and the measures.
  - Measures of central tendency
  - Measures of dispersion
• Inferential: tries to reach conclusions that extend beyond the immediate data alone.
  - Confidence intervals
  - Hypothesis testing
Descriptive Statistics
• The study of summarizing and describing a collection of data, including data organization (presenting data in a more informative way, such as with graphs, diagrams, and charts).
• In general, data presentation (display) is divided into two categories:
  - Tabular
  - Charts/graphs
Inferential Statistics
• The branch of statistics that uses a sample to draw conclusions about a population (basic tool: probability).
• Consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
• The area of statistics that deals with decision-making procedures.
• Population: consists of all subjects (human or otherwise) that are being studied.
• Sample: a group of subjects selected from a population.
Constructing Frequency Distribution
• When summarizing large quantities of raw data, it is often useful to distribute the data into classes.
• A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Weight of 100 male students in XYZ university

  Weight    Frequency
  60-62     5
  63-65     18
  66-68     42
  69-71     27
  72-74     8
  Total     100
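The tallying step described above can be sketched in Python. This is only an illustration: the helper name `frequency_distribution` and the 15 raw weights are hypothetical (the slides do not give the 100 raw observations), and the classes reuse the limits from the weight table.

```python
from collections import Counter

def frequency_distribution(values, classes):
    """Count how many values fall in each (lower_limit, upper_limit) class."""
    counts = Counter()
    for v in values:
        for lower, upper in classes:
            if lower <= v <= upper:
                counts[(lower, upper)] += 1
                break  # classes do not overlap, so stop at the first match
    return [(lower, upper, counts[(lower, upper)]) for lower, upper in classes]

# Hypothetical raw weights, NOT the data behind the table in the slides.
raw_weights = [60, 61, 63, 64, 64, 65, 66, 66, 67, 67, 68, 69, 70, 72, 74]
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

table = frequency_distribution(raw_weights, classes)
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
```

Because every value lands in exactly one class, the frequencies always sum to the number of raw observations.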
• For quantitative data, a class is an interval that includes all the values falling between two numbers, the lower and upper class limits. The classes are listed in the first column of a frequency distribution table.
• Classes always represent a variable and do not overlap; each value belongs to one and only one class.
• The numbers listed in the second column are called frequencies; they give the number of values that belong to each class. Frequencies are denoted by f.

Table 6.1: Weight of 100 male students in XYZ university

  Weight    Frequency
  60-62     5
  63-65     18
  66-68     42
  69-71     27
  72-74     8
  Total     100

In Table 6.1, Weight is the variable, 66-68 is the third class (class interval), the second column is the frequency column, and 42 is the frequency of the third class.
• The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. For example, the boundary between the classes 60-62 and 63-65 is (62 + 63)/2 = 62.5.
• The difference between the two boundaries of a class gives the class width, also called the class size.
Formula:
- Class midpoint or mark
  Class midpoint or mark = (Lower Limit + Upper Limit)/2
- Class width / class size
  Class width, c = Upper Boundary - Lower Boundary
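As a rough illustration of these definitions, the sketch below computes boundaries, midpoints, and widths for the weight classes 60-62 through 72-74. The function name `class_summary` and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, assumed here to be 1) are illustrative choices, not part of the slides. The width is taken as the difference between the two class boundaries.

```python
def class_summary(limits, gap=1):
    """Return boundaries, midpoint, and width for each (lower, upper) class.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (1 for integer-valued limits such as 62 -> 63).
    """
    half = gap / 2
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - half
        upper_boundary = upper + half
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "midpoint": (lower + upper) / 2,
            "width": upper_boundary - lower_boundary,
        })
    return rows

weight_classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
summary = class_summary(weight_classes)
for row in summary:
    print(row)
```

For the class 60-62 this gives boundaries 59.5 and 62.5, midpoint 61, and width 3.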
Cumulative Frequency Distributions
• A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
• In a cumulative frequency distribution table, each class has the same lower limit but a different upper limit.

Table 6.2: Class Limit, Class Boundaries, Class Width, Cumulative Frequency

  Weight              Number of      Class          Cumulative
  (Class Interval)    Students, f    Boundaries     Frequency
  60-62               5              59.5-62.5      5
  63-65               18             62.5-65.5      5 + 18 = 23
  66-68               42             65.5-68.5      23 + 42 = 65
  69-71               27             68.5-71.5      65 + 27 = 92
  72-74               8              71.5-74.5      92 + 8 = 100
  TOTAL               100
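The running totals in the cumulative frequency column can be reproduced with a short Python sketch using the standard library's `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the weight classes 60-62, 63-65, 66-68, 69-71, 72-74.
frequencies = [5, 18, 42, 27, 8]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.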
Example 6.9: Class boundaries from Table 6.1

  Weight              Class          Frequency
  (Class Interval)    Boundary
  60-62               59.5-62.5      5
  63-65               62.5-65.5      18
  66-68               65.5-68.5      42
  69-71               68.5-71.5      27
  72-74               71.5-74.5      8
  Total                              100
• Measures of Central Tendency
  - Mean
  - Median
  - Mode
• Measures of Dispersion
  - Variance
  - Standard deviation
Measures of average are also called measures of central tendency and include the mean, median, mode, and midrange.
After learning about the average, you must also know how the data values are dispersed, that is, whether the data values cluster around the mean.
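• Mean
The mean of a sample is the sum of the sample data divided by the total number of observations in the sample.
GROUPED DATA: When the data have been grouped into intervals and the midpoints of the intervals are denoted by $x$, with $f$ the frequency of each interval, the mean is
$$\bar{x} = \frac{\sum f x}{\sum f}$$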
• Consider a data set of the weights of 30 students. Find the mean of the grouped data.

  Weight (kg)    Frequency (f)
  20-29          1
  30-39          8
  40-49          10
  50-59          6
  60-69          5

Answer: 46.5 kg
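The stated answer can be checked with a minimal Python sketch, assuming the usual grouped-data mean (sum of frequency times class midpoint, divided by total frequency). The function name `grouped_mean` is illustrative.

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total

# Weights (kg) of 30 students, as class limits and frequencies.
weight_classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
weight_freqs = [1, 8, 10, 6, 5]

mean_weight = grouped_mean(weight_classes, weight_freqs)
print(mean_weight)  # 46.5
```

The midpoints are 24.5, 34.5, 44.5, 54.5, and 64.5, so the weighted sum is 1395 and 1395/30 = 46.5 kg.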
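• Median
The median is the middle value of a set of numbers arranged in order of magnitude and is normally denoted by $\tilde{x}$.
GROUPED DATA: The median of frequency distribution data can be described as:
$$\tilde{x} = L + \left(\frac{\frac{n}{2} - F}{f_m}\right) c$$
where $L$ is the lower boundary of the median class, $n = \sum f$ is the total frequency, $F$ is the cumulative frequency before the median class, $f_m$ is the frequency of the median class, and $c$ is the class width.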
Find the median of the following data:

1.
  Class    Frequency
  1-5      2
  6-10     4
  11-15    9
  16-20    7
  21-25    5
  26-30    3
  Total    30

  Answer: 15.5

2.
  Class interval    Frequency
  1-3               5
  4-6               3
  7-9               2
  10-12             1
  13-15             6
  16-18             4

  Answer: 11
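Both answers can be reproduced with a short Python sketch. It assumes the common interpolation formula for grouped data, median = L + ((n/2 - F) / f) × c, where L is the lower boundary of the median class, F the cumulative frequency before it, f its frequency, and c the class width. The function name `grouped_median` and the `gap` parameter (spacing between adjacent class limits, assumed to be 1) are illustrative.

```python
def grouped_median(classes, frequencies, gap=1):
    """Interpolated median of grouped data: L + ((n/2 - F) / f) * c."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative_before = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative_before + f >= half:  # first class reaching n/2
            lower_boundary = lower - gap / 2
            width = upper - lower + gap
            return lower_boundary + (half - cumulative_before) / f * width
        cumulative_before += f

classes_1 = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs_1 = [2, 4, 9, 7, 5, 3]
classes_2 = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs_2 = [5, 3, 2, 1, 6, 4]

median_1 = grouped_median(classes_1, freqs_1)
median_2 = grouped_median(classes_2, freqs_2)
print(median_1, median_2)  # 15.5 11.0
```

In the first data set the median class is 11-15 (L = 10.5, F = 6, f = 9, c = 5); in the second it is 10-12 (L = 9.5, F = 10, f = 1, c = 3).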
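• Mode
The mode of a set of numbers is the value that occurs most often and is denoted by $\hat{x}$.
GROUPED DATA: The mode of frequency distribution data can be described as:
$$\hat{x} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$$
where $L$ is the lower boundary of the modal class, $\Delta_1$ is the frequency of the modal class minus the frequency of the class before it, $\Delta_2$ is the frequency of the modal class minus the frequency of the class after it, and $c$ is the class width.
NOTE: The class with the highest frequency is called the MODAL CLASS.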
Find the mode of the following data:

1.
  Class    Frequency
  1-5      2
  6-10     4
  11-15    9
  16-20    7
  21-25    5
  26-30    3
  Total    30

  Answer: 14.07

2.
  Class interval    Frequency
  1-3               5
  4-6               3
  7-9               2
  10-12             1
  13-15             6
  16-18             4

  Answer: 14.64
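The two answers above are consistent with the usual grouped-data mode formula, mode = L + (d1 / (d1 + d2)) × c, where L is the lower boundary of the modal class, d1 and d2 are the differences between the modal frequency and the frequencies of the preceding and following classes, and c is the class width. The Python sketch below assumes that formula; the name `grouped_mode` and the `gap` parameter are illustrative, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(classes, frequencies, gap=1):
    """Mode of grouped data: L + d1 / (d1 + d2) * c for the modal class."""
    # Index of the modal class (first one if several share the highest frequency).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    f_modal = frequencies[i]
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    if d1 + d2 == 0:
        raise ValueError("mode is undefined when neighbouring frequencies are equal")
    lower, upper = classes[i]
    lower_boundary = lower - gap / 2
    width = upper - lower + gap
    return lower_boundary + d1 / (d1 + d2) * width

classes_1 = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
freqs_1 = [2, 4, 9, 7, 5, 3]
classes_2 = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs_2 = [5, 3, 2, 1, 6, 4]

mode_1 = grouped_mode(classes_1, freqs_1)
mode_2 = grouped_mode(classes_2, freqs_2)
print(round(mode_1, 2), round(mode_2, 2))  # 14.07 14.64
```

In both data sets d1 = 5 and d2 = 2; the modal classes are 11-15 (width 5) and 13-15 (width 3) respectively.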
• When the mean, median, and mode are all equal, the distribution of the data set has a bell-shaped curve. The distribution is then said to be symmetric.
• If Mode < Median < Mean, the distribution is said to be positively (right) skewed, meaning there are a few unusually large values.
• If Mean < Median < Mode, the distribution is said to be negatively (left) skewed, meaning there are some unusually small values.
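These three rules can be written as a small Python helper. This is only a sketch of the comparison rule stated above; the function name `describe_shape` and the tolerance `tol` are illustrative. The sample call uses the first median/mode exercise (median 15.5, mode 14.07), whose grouped mean works out to 480/30 = 16 from the class midpoints.

```python
def describe_shape(mean, median, mode, tol=1e-9):
    """Classify a distribution by the ordering of mean, median, and mode."""
    if abs(mean - median) <= tol and abs(median - mode) <= tol:
        return "symmetric"
    if mode < median < mean:
        return "positively (right) skewed"
    if mean < median < mode:
        return "negatively (left) skewed"
    return "no clear pattern"

shape = describe_shape(mean=16.0, median=15.5, mode=14.07)
print(shape)  # positively (right) skewed
```

Orderings that match none of the three rules are reported as "no clear pattern" rather than forced into a category.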
• The standard deviation from the mean is widely used in statistics as a measure of dispersion. A small standard deviation indicates that most of the data are close to the mean, while a large standard deviation shows that much of the data is far from the mean.
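GROUPED DATA:
$$s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$
where $x$ is the class mark, $f$ is the class frequency, and $n = \sum f$.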
Example 6.10 (Grouped data)
Find the variance and standard deviation of the sample data below:

  Weight              Frequency,    Class      fx      Cumulative      Class
  (Class Interval)    f             Mark, x            Frequency, F    Boundary
  60-62               5             61         305     5               59.5-62.5
  63-65               18            64         1152    23              62.5-65.5
  66-68               42            67         2814    65              65.5-68.5
  69-71               27            70         1890    92              68.5-71.5
  72-74               8             73         584     100             71.5-74.5
  Total               100                      6745

Answer: s² = 8.61; s = 2.93
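The stated answer matches the sample (n - 1) form of the grouped variance, s² = (Σfx² - (Σfx)²/n) / (n - 1). The Python sketch below assumes that formula and uses the class marks and frequencies from the example; the function name `grouped_variance` is illustrative.

```python
import math

def grouped_variance(class_marks, frequencies):
    """Sample variance of grouped data: (sum(f x^2) - (sum(f x))^2 / n) / (n - 1)."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]

variance = grouped_variance(class_marks, frequencies)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 8.61 2.93
```

Here Σfx = 6745 and Σfx² = 455803, so the numerator is 455803 - 454950.25 = 852.75, and dividing by 99 gives about 8.61.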
Consider a data set of the weights of 30 students. Find the standard deviation.

  Weight (kg)    Frequency (f)
  20-29          1
  30-39          8
  40-49          10
  50-59          6
  60-69          5

Answer: