Basic Statistics

Basic Statistics
• Measures of Location, or Central Tendency
  – Mean, Median, Mode
• Measures of Variation, or Dispersion, Spread
  – Range, Variance and Standard Deviation

Basic Statistics
Statistics is the science of collecting, analyzing, interpreting and presenting data.
• Central tendency refers to the middle point of a distribution.
• Central tendency helps to state one value that best captures and communicates the distribution.
MEAN (M)
• The mean is the simple arithmetic average.
• How to compute the mean: add all the data points and divide by the total number of points N.
  $M = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum X}{N}$
  where $\sum X$ indicates the summation of $X_1, \dots, X_N$.
MEDIAN (Md)
• The median is the point that divides the distribution into two equal parts, such that an equal number of scores fall above and below it (the 50th percentile).
• How to compute the median: arrange the data from lowest to highest; the median is the $\frac{N+1}{2}$-th item in the array, where N is the number of items in the data array.
  – If the data set contains an odd number of values, the median is the middle value.
  – If the data set contains an even number of values, the median is the average of the two middle values.
MODE (Mo)
• The mode is the most frequently occurring value.
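
The three measures of central tendency can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `scores` data are made up for the example.

```python
from collections import Counter

def mean(values):
    """Arithmetic average: sum of the data points divided by their count N."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values that occur most often (a data set can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

scores = [2, 3, 3, 5, 7, 10]  # made-up illustrative data
print(mean(scores), median(scores), modes(scores))
```

Returning a list from `modes` is a design choice for the sketch, so that data sets with ties are handled without picking one value arbitrarily.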

Basic Statistics
• Population: The total set of units (for example, people, parts, measurements, invoices, days, …) about which we would like information. (N is the population size.)
• Sample: A subset of the population, usually much smaller and drawn randomly from the population, yielding data values that we can analyze "Practically, Graphically, and Analytically." (n is the sample size.)
• Inference is the process of looking at characteristics of the sample data values, like mean or variation, to draw conclusions about the likely values of the corresponding characteristics in the population.
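
As a rough illustration of population, sample and inference, the sketch below draws a random sample of size n from a population of size N and compares the two means. The population values, the sizes and the seed are all invented for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: N = 1000 invoice amounts (made-up values)
population = [random.uniform(50, 150) for _ in range(1000)]
N = len(population)

# Draw a random sample of size n without replacement
n = 30
sample = random.sample(population, n)

population_mean = sum(population) / N
sample_mean = sum(sample) / n

# The sample mean serves as an estimate of the (usually unknown) population mean
print(f"population mean = {population_mean:.2f}, sample mean = {sample_mean:.2f}")
```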

Basic Statistics
The measures of variability:
– The Range
– Variance (already explained earlier)
– Standard Deviation
Range is the difference between the highest and lowest observed values.
– Range (R) = Value of highest observation - Value of lowest observation
– The range is easy to understand and to find.
– Its usefulness as a measure of dispersion is limited because it considers only the highest and lowest values and ignores the variation among the other observations.
– The range is more sensitive to outliers than the variance.

Basic Statistics
• The Variance ($\sigma^2$), as already explained, is the average squared deviation of each data point from the mean.
• The Standard Deviation ($\sigma$) is the square root of the variance.
  – The most common and useful measure of variation is the standard deviation.
  – The variance of a sum or difference of two independent variables is found by adding both variances.
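
The range, variance and standard deviation defined above can be computed as follows. This is a sketch under one stated assumption: the data are treated as a whole population, so the variance divides by N. For a sample, the usual estimate divides by n - 1 instead. The function names and data are illustrative.

```python
import math

def data_range(values):
    """Range: highest observation minus lowest observation."""
    return max(values) - min(values)

def variance(values):
    """Population variance: average squared deviation of each point from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
print(data_range(data), variance(data), std_dev(data))
```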

Normal Curve
[Figure: normal curve with the mean marked at its center]
Definition:
– A probability distribution where the most frequently occurring value is in the middle and other probabilities tail off symmetrically in both directions.
– Bell shaped and characterized by the mean (measure of central tendency) and standard deviation (measure of dispersion or variability).
– The area under the curve is 1, which is the probability of all outcomes.
Characteristics:
• The curve theoretically does not reach zero.
• The curve can be divided in half, with equal pieces falling on either side of the most frequently occurring value.
• A normal curve indicates random or chance variation.
• The peak of the curve represents the center of the process.
• The area under the curve represents virtually 100% of what the process is capable of producing.
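
To make the "area under the curve" idea concrete, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) to compute the share of outcomes within 1, 2 and 3 standard deviations of the mean. The process mean and standard deviation are hypothetical values chosen for the example.

```python
from statistics import NormalDist

# Hypothetical process: mean 100, standard deviation 5 (made-up values)
process = NormalDist(mu=100, sigma=5)

def area_within(dist, k):
    """Probability that an outcome falls within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {area_within(process, k):.4f}")
```

The ±3 sigma band covers about 99.73% of the area, which is why the curve is said to represent virtually 100% of the process output.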

Graphical Analysis
Some ways to analyze data graphically:
• Dot plots
• Scatter plots
• Box plots
• Histograms

Graphical Analysis
Dot Plots:
• Display variation in a process.
• Give a quick graphical comparison of two or more processes.
• Show the location and spread of the data and also identify outliers.
• To plot: draw a horizontal axis for the measurements and place a dot for each data point; when dots fall very close together, they can be stacked.
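
The plotting rule for a dot plot can be sketched as a small text renderer. It assumes integer measurements of at most two digits so that each value gets its own column; the function name and the `cycle_times` data are made up for the illustration.

```python
from collections import Counter

def dot_plot(values):
    """Return text lines: stacked dots above a horizontal axis of the measurements."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)  # one column per integer measurement
    tallest = max(counts.values())
    lines = []
    # Build rows from the top of the tallest stack down to the axis
    for level in range(tallest, 0, -1):
        row = "".join(" * " if counts[v] >= level else "   " for v in axis)
        lines.append(row.rstrip())
    lines.append("".join(f"{v:^3}" for v in axis).rstrip())
    return lines

cycle_times = [1, 2, 2, 3, 3, 3, 6]  # made-up illustrative data
print("\n".join(dot_plot(cycle_times)))
```

In the printed plot the isolated dot at 6 stands apart from the stack around 2 and 3, which is how a dot plot makes a possible outlier visible.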

Graphical Analysis
Box Plot (Tukey):
• The box plot helps in visually comparing the centers and variability of two or more data sets. It also identifies outliers.
• A box (whisker) plot is an easy way of comparing the median, quartiles, and outliers of different data sets.
• Aids in understanding the distribution of the data and gives a quick, graphical comparison of two processes.
[Figure: box plot labeled, from top to bottom: outlier (*), maximum observation, 75th percentile, median (50th percentile), 25th percentile, minimum observation]
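
The values a box plot displays can be computed directly. The sketch below uses `statistics.quantiles` (Python 3.8+) for the quartiles. The slide does not say how outliers are decided, so the 1.5 × IQR fence used here is an assumption (it is the common Tukey convention); the function names and data are also illustrative.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile, maximum."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def tukey_outliers(values):
    """Points beyond 1.5 * IQR from the quartiles (assumed outlier rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up data with one extreme value
print(five_number_summary(data))
print(tukey_outliers(data))
```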

Graphical Analysis
Scatter Plot:
• Shows the relationship between one variable and another.
• Visually indicates the strength of the correlation.
• Communicates the positive or negative sense of the correlation.
To Plot:
– Collect paired samples
– Draw the diagram: horizontal axis for the "cause" variable, vertical axis for the "effect" variable
– Plot the data
– Analyze the data to determine correlation (a straight line or tight clusters indicate a strong relationship)
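
The strength and sign of the relationship seen in a scatter plot can be summarized with a number. The slide does not name a statistic; the sketch below assumes the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive). The function name and the paired data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

cause = [1, 2, 3, 4, 5]      # horizontal axis, made-up values
effect = [2, 4, 6, 8, 10]    # vertical axis, made-up values
print(pearson_r(cause, effect))
```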

Graphical Analysis
Histogram:
• Depicts the distribution and shape of the data, including gaps and outliers.
• Has the characteristics of a bar chart and counts the number of data points in specified intervals.
To Plot:
– Count the number of data points
– Determine the range (R) for the entire set
– Divide the range into classes (K)
– Determine the class width (H): H = R/K
– Determine the class boundaries (end points): first end point = lowest data value + H
– Construct a frequency table based on the values computed in the previous steps
– Construct a histogram based on the frequency table
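
The plotting steps translate into a short frequency-table routine. This is a sketch: the number of classes K is left to the caller because the slide gives no rule for choosing it, the data are assumed not to be all equal (so H > 0), and placing the highest value in the last class is a choice made here. Names and data are illustrative.

```python
def frequency_table(values, k):
    """Return [((lower, upper), count), ...] for k equal-width classes."""
    lowest, highest = min(values), max(values)
    r = highest - lowest          # range R
    h = r / k                     # class width H = R / K
    counts = [0] * k
    for x in values:
        # The highest value would land in class k, so clamp it into the last class
        index = min(int((x - lowest) / h), k - 1)
        counts[index] += 1
    bounds = [(lowest + i * h, lowest + (i + 1) * h) for i in range(k)]
    return list(zip(bounds, counts))

data = [1, 2, 2, 3, 4, 5, 5, 6, 8, 9]  # made-up illustrative data
for (lower, upper), count in frequency_table(data, k=4):
    print(f"{lower:4.1f} to {upper:4.1f} | {'#' * count}")
```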

VOP
• Voice of the Process (VOP) is the use of control charts and subgrouping techniques to judge how a process is performing.
• Voice of the Process – Control Charts
  – Control charts allow our processes to "speak" to us by showing variability in data, a graphical representation of that variability, and how distributions are formed.
• Statistical process control (SPC) is based on numerical (variable) measurements that become a picture of the process over time.
• All processes are subject to variability.
  – Natural causes: random variations
  – Assignable causes: correctable (lack of infrastructure, skill level)
• When is a process said to be under control?
  – When the variability in the quality characteristics is due to natural causes only

SPC - Control Charts
• A control chart is a graphic record used to monitor, control and improve process performance by studying variation and its source.
• Focuses attention on detecting and monitoring variation over time
• Distinguishes special from common cause variation
• Serves as a tool for ongoing control of a process
• Helps to improve a process so that it performs consistently and predictably
• Provides a common language for discussing process performance
• Why use a control chart?
  – It is simple to use
  – Enables synthesis, visualization and interpretation of data
  – Enables prediction, to a large extent, of what is going to happen
  – Helps in taking correct decisions at the right time

SPC - Control Charts
• What are the types of Control Charts?
  – There are two main categories of Control Charts: those that display variables data and those that display attribute data.
• Variables Data:
  – This category of Control Chart displays values resulting from the measurement of a continuous variable. Examples of variables data are elapsed time, temperature, etc.
• Attribute Data:
  – This category of Control Chart displays data that result from counting the number of occurrences or items in a single category of similar items or occurrences. These "count" data may be expressed as pass/fail, yes/no, or presence/absence of a defect.

SPC - Control Charts
• Variable control charts:
  – X-Bar & R charts (Averages & Range)
  – X-Bar & S charts (Averages & Standard Deviation)
  – I & MR charts (Individuals & Moving Range)
• Attribute control charts:
  – P or NP charts (Proportion or number of non-conforming units)
  – C or U charts (Count of defects or number of defects per unit)

Selecting the Type of Control Charts
Type of Data:
• Variable data (continuous measurement)
  – Subgroup size = 1: I-MR chart
  – Subgroup size is small (usually 3 to 5): X-bar and R chart
  – Subgroup size is large (usually > 10): X-bar and S chart
• Attribute data (count or classification)
  – Defects, constant subgroup size (c > 5): C chart (number of defects)
  – Defects, variable subgroup size: U chart (number of defects per unit)
  – Defectives, constant subgroup size: NP chart (number of defective units)
  – Defectives, variable subgroup size: P chart (proportion of defective units)
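
The selection rules can be sketched as a small function. The function and parameter names are hypothetical. The slide leaves subgroup sizes of 2 and of 6 to 10 unassigned, so grouping them with the X-bar and R chart here is an assumption of this sketch.

```python
def select_control_chart(data_type, subgroup_size=1, counted="defects",
                         constant_subgroup=True):
    """Pick a control chart type from the kind of data being charted.

    data_type: "variable" or "attribute"
    subgroup_size: used for variable data only
    counted: "defects" or "defectives", used for attribute data only
    constant_subgroup: whether every subgroup has the same size (attribute data)
    """
    if data_type == "variable":
        if subgroup_size == 1:
            return "I-MR chart"
        if subgroup_size > 10:
            return "X-bar and S chart"
        # Small subgroups (usually 3 to 5); sizes 2 and 6-10 land here by assumption
        return "X-bar and R chart"
    if data_type == "attribute":
        if counted == "defects":
            return "C chart" if constant_subgroup else "U chart"
        if counted == "defectives":
            return "NP chart" if constant_subgroup else "P chart"
    raise ValueError("unknown data_type or counted value")

print(select_control_chart("variable", subgroup_size=4))
print(select_control_chart("attribute", counted="defectives", constant_subgroup=False))
```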

SPC - Control Charts
• Control chart: the type varies based on the type of data (variable or attribute data).
• UCL and LCL are the control limits ($\pm 3\sigma$ values), which lie 3 process standard deviations above and below the process average.
• If a point lies outside the control limits, the deviation could be due to special causes.
[Figure: control chart with UCL and LCL at $\pm 3\sigma$ around the process average; variation inside the limits is labeled natural (common cause) variation, and variation outside the limits is labeled assignable (special cause) variation]
• Common causes: variation that is inherent in a process over time. Common causes affect every outcome of the process and everyone working in the process.
• Special causes: process variation that is not inherent in the process. Defined by Shewhart as a fleeting event, not systemically related.
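
The control-limit rule can be illustrated with a short sketch. It makes a simplifying assumption: the process standard deviation is estimated with the ordinary sample standard deviation of a baseline data set. Real control charts typically use chart-specific estimates (for example, ones based on subgroup ranges). The function names and all data values are made up.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Return (LCL, center line, UCL) at 3 standard deviations around the average."""
    center = mean(baseline)
    sigma = stdev(baseline)  # assumed estimate of the process standard deviation
    return center - 3 * sigma, center, center + 3 * sigma

def points_outside(points, lcl, ucl):
    """Indexes and values of points beyond the limits (possible special causes)."""
    return [(i, x) for i, x in enumerate(points) if x < lcl or x > ucl]

baseline = [10, 12, 11, 9, 10, 11, 10, 9, 12, 10]  # made-up in-control history
lcl, center, ucl = control_limits(baseline)

new_points = [10, 11, 25, 9]                        # made-up new measurements
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
print(points_outside(new_points, lcl, ucl))
```

A flagged point only signals that a special cause may be present; finding and correcting the cause is a separate investigation.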