Classification of Variables: Discrete Numerical Variable
A variable that produces a response that comes from a counting process.

Classification of Variables: Continuous Numerical Variable
A variable that produces a response that is the outcome of a measurement process.

Classification of Variables: Categorical Variables
Categorical variables produce responses that belong to groups (sometimes called “classes”) or categories.

Measurement Levels
Nominal and ordinal levels of measurement refer to data obtained from categorical questions.
• A nominal scale indicates assignment to groups or classes.
• Ordinal data indicate rank ordering of items.

Frequency Distributions
A frequency distribution is a table used to organize data. The left column (called classes or groups) includes numerical intervals on a variable being studied. The right column is a list of the frequencies, or number of observations, for each class. Intervals are normally of equal size, must cover the range of the sample observations, and must not overlap.

Construction of a Frequency Distribution
Rule 1: Intervals (classes) must be inclusive and non-overlapping.
Rule 2: Determine k, the number of classes.
Rule 3: Intervals should be the same width, w; the width is determined by the following:
$$w = \frac{\text{largest observation} - \text{smallest observation}}{k}$$
Both k and w should be rounded upward, possibly to the next largest integer.

Construction of a Frequency Distribution
Quick Guide to Number of Classes for a Frequency Distribution
Sample Size      Number of Classes
Fewer than 50    5 – 6 classes
50 to 100        6 – 8 classes
Over 100         8 – 10 classes
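The three construction rules can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_distribution`, the sample values, and the choice to start the first class at the smallest observation are hypothetical and not from the slides. Classes are treated as [lower, upper), with the last class also including its upper limit so that every observation is counted exactly once.

```python
import math

def frequency_distribution(data, k):
    """Group data into k equal-width classes; return (w, rows).

    Each row is (lower_limit, upper_limit, frequency). Classes are
    [lower, upper), except that the last class also includes its upper limit.
    """
    smallest, largest = min(data), max(data)
    # Rule 3: width = (largest - smallest) / k, rounded upward to an integer.
    # max(1, ...) guards against a zero width when all values are equal.
    w = max(1, math.ceil((largest - smallest) / k))
    counts = [0] * k
    for x in data:
        # Clamp the index so the largest value lands in the last class.
        index = min(int((x - smallest) // w), k - 1)
        counts[index] += 1
    rows = [(smallest + i * w, smallest + (i + 1) * w, counts[i]) for i in range(k)]
    return w, rows

# Hypothetical sample of 12 observations; fewer than 50, so 5 classes.
sample = [12, 15, 17, 21, 22, 22, 25, 28, 30, 34, 35, 41]
width, table = frequency_distribution(sample, 5)
for lower, upper, freq in table:
    print(f"{lower} to under {upper}: {freq}")
```

In practice the first lower limit is often rounded down to a convenient number; that refinement is left out here.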

Cumulative Frequency Distributions
A cumulative frequency distribution contains the number of observations whose values are less than the upper limit of each interval. It is constructed by adding the frequencies of all frequency distribution intervals up to and including the present interval.

Relative Cumulative Frequency Distributions
A relative cumulative frequency distribution converts all cumulative frequencies to cumulative percentages.
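As a small sketch of both definitions, the running totals and their percentage form can be computed from a list of class frequencies. The function name `cumulative_distributions` and the frequencies used are hypothetical.

```python
from itertools import accumulate

def cumulative_distributions(frequencies):
    """Return (cumulative frequencies, cumulative percentages)."""
    # Each entry adds the frequencies up to and including the present class.
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]
    percentages = [100 * c / total for c in cumulative]
    return cumulative, percentages

# Hypothetical class frequencies for five classes.
cum, pct = cumulative_distributions([3, 3, 2, 3, 1])
print(cum)
print([round(p, 1) for p in pct])
```

The cumulative percentages are also the heights of the points an ogive connects.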

Histograms and Ogives
A histogram is a bar graph that consists of vertical bars constructed on a horizontal line that is marked off with intervals for the variable being displayed. The intervals correspond to those in a frequency distribution table. The height of each bar is proportional to the number of observations in that interval.

Histograms and Ogives
An ogive, sometimes called a cumulative line graph, is a line that connects points that are the cumulative percentage of observations below the upper limit of each class in a cumulative frequency distribution.

[Figure: Histogram and Ogive for Example 2.1]

Stem-and-Leaf Display
A stem-and-leaf display is an exploratory data analysis graph that is an alternative to the histogram. Data are grouped according to their leading digits (called the stem) while listing the final digits (called leaves) separately for each member of a class. The leaves are displayed individually in ascending order after each of the stems.
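A minimal sketch of the grouping step, assuming non-negative integer data where the last digit is the leaf and the remaining leading digits form the stem. The function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its leaves, with leaves in ascending order."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves ascending
        stem, leaf = divmod(v, 10)    # leading digits, final digit
        groups[stem].append(leaf)
    return {stem: groups[stem] for stem in sorted(groups)}

# Hypothetical observations.
display = stem_and_leaf([23, 45, 21, 38, 45, 30])
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch skips stems that have no leaves; a fuller display would usually list them as empty rows so that gaps in the data remain visible.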

[Figure: Stem-and-Leaf Display for Gilotti’s Deli Example]

Tables - Bar and Pie Charts
[Table: Frequency and Relative Frequency Distribution for Top Company Employers Example]

Tables - Bar and Pie Charts
[Figure 2.9: Bar Chart for Top Company Employers Example]

Tables - Bar and Pie Charts
[Figure 2.10: Pie Chart for Top Company Employers Example]

Pareto Diagrams
A Pareto diagram is a bar chart that displays the frequency of defect causes. The bar at the left indicates the most frequent cause and bars to the right indicate causes in decreasing frequency. A Pareto diagram is used to separate the “vital few” from the “trivial many.”
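The ordering behind a Pareto diagram can be sketched as a sort by decreasing frequency. The cumulative percentage column is a common addition and an assumption here, not something the slide states; the function name `pareto_table` and the defect counts are hypothetical.

```python
def pareto_table(defect_counts):
    """Return rows of (cause, frequency, cumulative percent), most frequent first."""
    total = sum(defect_counts.values())
    ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ordered:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows

# Hypothetical defect counts.
counts = {"scratch": 10, "dent": 50, "misalignment": 25, "other": 15}
for cause, count, cumulative_pct in pareto_table(counts):
    print(f"{cause:<14}{count:>4}{cumulative_pct:>8}")
```

Reading down the cumulative column shows how few causes account for most of the defects, which is the "vital few" idea.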

Line Charts
A line chart, also called a time plot, is a series of data plotted at various time intervals. Measuring time along the horizontal axis and the numerical quantity of interest along the vertical axis yields a point on the graph for each observation. Joining points adjacent in time by straight lines produces a time plot.

Parameters and Statistics
A statistic is a descriptive measure computed from a sample of data. A parameter is a descriptive measure computed from an entire population of data.

Measures of Central Tendency - Arithmetic Mean
The arithmetic mean of a set of data is the sum of the data values divided by the number of observations.

Sample Mean
If the data set is from a sample, then the sample mean, $\bar{x}$, is:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population Mean
If the data set is from a population, then the population mean, $\mu$, is:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Measures of Central Tendency - Median
An ordered array is an arrangement of data in either ascending or descending order. Once the data are arranged in ascending order, the median is the value such that 50% of the observations are smaller and 50% of the observations are larger.

Measures of Central Tendency - Median
If the sample size n is an odd number, the median, Xm, is the middle observation. If the sample size n is an even number, the median, Xm, is the average of the two middle observations. The median will be located in the 0.50(n+1)th ordered position.

Measures of Central Tendency - Mode
The mode, if one exists, is the most frequently occurring observation in the sample or population.
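The three measures of central tendency can be sketched together. The median below follows the 0.50(n+1)th ordered position rule directly. Treating a data set in which no value repeats as having no mode is an assumption of this sketch; all function names and data are hypothetical.

```python
from collections import Counter

def mean(values):
    """Sum of the data values divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Value at the 0.50(n+1)th ordered position (1-based)."""
    ordered = sorted(values)
    position = 0.5 * (len(ordered) + 1)
    lower = ordered[int(position) - 1]
    if position.is_integer():         # odd n: the middle observation
        return lower
    upper = ordered[int(position)]    # even n: average the two middle observations
    return (lower + upper) / 2

def modes(values):
    """Most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

data = [3, 5, 5, 8, 9, 12]            # hypothetical sample
print(mean(data), median(data), modes(data))
```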

Shape of the Distribution
The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the mean. In a symmetric distribution the mean and median are equal.

Shape of the Distribution
A distribution is skewed if the observations are not symmetrically distributed above and below the mean. A positively skewed (or skewed to the right) distribution has a tail that extends to the right in the direction of positive values. A negatively skewed (or skewed to the left) distribution has a tail that extends to the left in the direction of negative values.

[Figure: Shapes of the Distribution]

Measures of Central Tendency - Geometric Mean
The geometric mean is the nth root of the product of n numbers:
$$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
The geometric mean is used to obtain mean growth over several periods given compounded growth from each period.
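A short sketch of the growth use case, assuming period growth is given as rates (0.10 for 10%) that are first converted to growth factors (1.10). The function names and rates are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))

def mean_growth_rate(rates):
    """Average per-period growth rate implied by compounded period rates."""
    factors = [1 + r for r in rates]
    return geometric_mean(factors) - 1

# Hypothetical growth: +50% in one period, then -50% in the next.
print(mean_growth_rate([0.5, -0.5]))
```

The arithmetic mean of +50% and -50% is zero, yet the compounded result is a loss, which is why the geometric mean is the appropriate average for growth over several periods.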

Measures of Variability - The Range
The range in a set of data is the difference between the largest and smallest observations.

Measures of Variability - Sample Variance
The sample variance, $s^2$, is the sum of the squared differences between each observation and the sample mean divided by the sample size minus 1:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

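Measures of Variability - Short-cut Formulas for $s^2$
Short-cut formulas for the sample variance, $s^2$, are:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \qquad \text{or} \qquad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$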

Measures of Variability - Population Variance
The population variance, $\sigma^2$, is the sum of the squared differences between each observation and the population mean divided by the population size, N:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Measures of Variability - Sample Standard Deviation
The sample standard deviation, s, is the positive square root of the sample variance, and is defined as:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Measures of Variability - Population Standard Deviation
The population standard deviation, $\sigma$, is
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
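The variance and standard deviation definitions can be sketched in plain Python. One commonly used short-cut form for the sample variance (sum of squares minus the squared sum over n, all divided by n - 1) is included so it can be compared with the definitional form. Function names and data are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Short-cut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical observations
print(sample_variance(data), math.sqrt(sample_variance(data)))
print(population_variance(data), math.sqrt(population_variance(data)))
```

The two sample-variance functions agree up to floating-point rounding; the short-cut form avoids computing each deviation but can lose precision when the values are large relative to their spread.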

The Empirical Rule (the 68%, 95%, or almost all rule)
For a set of data with a mound-shaped histogram, the Empirical Rule is:
• approximately 68% of the observations are contained within a distance of one standard deviation around the mean, $\mu \pm 1\sigma$;
• approximately 95% of the observations are contained within a distance of two standard deviations around the mean, $\mu \pm 2\sigma$;
• almost all of the observations are contained within a distance of three standard deviations around the mean, $\mu \pm 3\sigma$.
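One way to see the rule at work is to count the share of observations within k standard deviations of the mean. The sketch below uses simulated mound-shaped data; the function name `share_within`, the seed, and the simulation parameters are all hypothetical, and the sample mean and sample standard deviation stand in for the population values.

```python
import random
import statistics

def share_within(data, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    inside = sum(1 for x in data if abs(x - mean) <= k * sd)
    return inside / len(data)

# Simulated mound-shaped data (normal with mean 100, standard deviation 15).
rng = random.Random(42)
simulated = [rng.gauss(100, 15) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(share_within(simulated, k), 3))
```

For data that are not mound-shaped, the shares can differ noticeably from 68% and 95%, so the rule should be read as an approximation.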

Coefficient of Variation
The Coefficient of Variation, CV, is a measure of relative dispersion that expresses the standard deviation as a percentage of the mean (provided the mean is positive). The sample coefficient of variation is
$$CV = \frac{s}{\bar{x}} \times 100\%$$

Coefficient of Variation
The population coefficient of variation is
$$CV = \frac{\sigma}{\mu} \times 100\%$$
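A minimal sketch of both versions, using the standard library's `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by N). The function names and the two price series are hypothetical.

```python
import statistics

def sample_cv(data):
    """Sample coefficient of variation in percent: 100 * s / x-bar."""
    mean = statistics.fmean(data)
    if mean <= 0:
        raise ValueError("the mean must be positive")
    return 100 * statistics.stdev(data) / mean

def population_cv(data):
    """Population coefficient of variation in percent: 100 * sigma / mu."""
    mean = statistics.fmean(data)
    if mean <= 0:
        raise ValueError("the mean must be positive")
    return 100 * statistics.pstdev(data) / mean

# Two hypothetical price series with the same spread but different levels.
low_priced = [8, 10, 12]
high_priced = [98, 100, 102]
print(sample_cv(low_priced), sample_cv(high_priced))
```

Both series have the same standard deviation, but the lower-priced one has the larger CV, which is the sense in which CV measures relative rather than absolute dispersion.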

Percentiles and Quartiles
Data must first be in ascending order. Percentiles separate large ordered data sets into hundredths. The Pth percentile is a number such that P percent of all the observations are at or below that number. Quartiles are descriptive measures that separate large ordered data sets into four quarters.

Percentiles and Quartiles
The first quartile, $Q_1$, is another name for the 25th percentile. The first quartile divides the ordered data such that 25% of the observations are at or below this value. $Q_1$ is located in the 0.25(n+1)th position when the data are in ascending order. That is,
$$Q_1 = \text{the value in the } 0.25(n+1)\text{th ordered position}$$

Percentiles and Quartiles
The third quartile, $Q_3$, is another name for the 75th percentile. The third quartile divides the ordered data such that 75% of the observations are at or below this value. $Q_3$ is located in the 0.75(n+1)th position when the data are in ascending order. That is,
$$Q_3 = \text{the value in the } 0.75(n+1)\text{th ordered position}$$

Interquartile Range
The Interquartile Range (IQR) measures the spread in the middle 50% of the data; that is, the difference between the observations at the 25th and the 75th percentiles:
$$IQR = Q_3 - Q_1$$
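The position rules can be sketched as follows. When 0.25(n+1) or 0.75(n+1) is not a whole number, this sketch interpolates linearly between the two neighbouring observations; that interpolation is an assumption, since the slides only name the position. Function names and data are hypothetical, and at least three observations are assumed.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Move the given fraction of the way from the lower to the upper neighbour.
    return lower + fraction * (upper - lower)

def quartiles_and_iqr(values):
    """Return (Q1, Q3, IQR) using the 0.25(n+1) and 0.75(n+1) positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return q1, q3, q3 - q1

data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical, n = 8
print(quartiles_and_iqr(data))
```

With n = 8 the positions are 2.25 and 6.75, so each quartile falls between two observations.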

Five-Number Summary
The Five-Number Summary refers to the five descriptive measures: minimum, first quartile, median, third quartile, and the maximum.

Box-and-Whisker Plots
A Box-and-Whisker Plot is a graphical procedure that uses the Five-Number Summary. A Box-and-Whisker Plot consists of
• an inner box that shows the numbers which span the range from $Q_1$ to $Q_3$.
• a line drawn through the box at the median.
The “whiskers” are lines drawn from $Q_1$ to the minimum value, and from $Q_3$ to the maximum value.
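The five values that a box-and-whisker plot draws can be computed with Python's standard library (3.8 or later). `statistics.quantiles` with `method="exclusive"` places the quartiles at the 0.25(n+1), 0.50(n+1), and 0.75(n+1) ordered positions and interpolates linearly between neighbours. The function name `five_number_summary` and the data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, median, q3, max(data)

data = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical observations
minimum, q1, median, q3, maximum = five_number_summary(data)
print(f"box from {q1} to {q3}, median line at {median}")
print(f"whiskers reach {minimum} and {maximum}")
```

Many plotting tools cap the whiskers at 1.5 times the IQR and mark points beyond as outliers; the description here extends them to the minimum and maximum instead.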

[Figure: Box-and-Whisker Plots (Excel)]

Grouped Data Mean
For a population of N observations, the mean is
$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$$
where the data set contains observation values $m_1, m_2, \ldots, m_K$ occurring with frequencies $f_1, f_2, \ldots, f_K$, respectively.

Grouped Data Mean
For a sample of n observations, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$
where the data set contains observation values $m_1, m_2, \ldots, m_K$ occurring with frequencies $f_1, f_2, \ldots, f_K$, respectively.

Grouped Data Variance
For a population of N observations, the variance is
$$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}$$
where the data set contains observation values $m_1, m_2, \ldots, m_K$ occurring with frequencies $f_1, f_2, \ldots, f_K$, respectively.

Grouped Data Variance
For a sample of n observations, the variance is
$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$
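The grouped-data formulas can be sketched with one function that weights each value by its frequency. The function name `grouped_mean_and_variance`, the `sample` flag, and the class values used are hypothetical.

```python
def grouped_mean_and_variance(values, frequencies, sample=True):
    """Mean and variance for values m_i occurring with frequencies f_i.

    With sample=True the variance divisor is n - 1; otherwise it is N.
    """
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(values, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(values, frequencies))
    divisor = n - 1 if sample else n
    return mean, squared / divisor

# Hypothetical class midpoints and frequencies.
midpoints = [5, 15, 25]
freqs = [2, 6, 2]
print(grouped_mean_and_variance(midpoints, freqs, sample=True))
print(grouped_mean_and_variance(midpoints, freqs, sample=False))
```

When the m values are class midpoints rather than exact observations, the results are approximations to the mean and variance of the underlying raw data.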