Measure Phase Six Sigma Statistics

Six Sigma Statistics

• Welcome to Measure
• Process Discovery
• Six Sigma Statistics
  – Basic Statistics
  – Descriptive Statistics
  – Normal Distribution
  – Assessing Normality
  – Special Cause / Common Cause
  – Graphing Techniques
• Measurement System Analysis
• Process Capability
• Wrap Up & Action Items

OSSS LSS Green Belt v9.1 - Measure Phase © Open Source Six Sigma, LLC

Purpose of Basic Statistics

The purpose of Basic Statistics is to:
• Provide a numerical summary of the data being analyzed.
  – Data (n):
    • Factual information organized for analysis.
    • Numerical or other information represented in a form suitable for processing by computer.
    • Values from scientific experiments.
• Provide the basis for making inferences about the future.
• Provide the foundation for assessing process capability.
• Provide a common language to be used throughout an organization to describe processes.

Relax… it won't be that bad!

Statistical Notation – Cheat Sheet

• Σ: Summation
• X: An individual value, an observation
• X₁: A particular (1st) individual value
• Xᵢ: For each, all, individual values
• X̄ (X-bar): The mean, average of sample data
• X̿ (X-double-bar): The grand mean, grand average
• μ: The mean of population data
• k: Multi-purpose notation, i.e. # of subgroups, # of classes
• p: A proportion of sample data
• P: A proportion of population data
• | |: The absolute value of some term
• >, <: Greater than, less than
• ≥, ≤: Greater than or equal to, less than or equal to
• s: The Standard Deviation of sample data
• σ: The Standard Deviation of population data
• s²: The variance of sample data
• σ²: The variance of population data
• R: The range of data
• R̄ (R-bar): The average range of data
• n: Sample size
• N: Population size

Parameters vs. Statistics

Population: All the items that have the "property of interest" under study.
Frame: An identifiable subset of the population.
Sample: A significantly smaller subset of the population used to make an inference.

Population Parameters:
– Arithmetic descriptions of a population
– µ, σ, P, σ², N

Sample Statistics:
– Arithmetic descriptions of a sample
– X-bar, s, p, s², n
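As a minimal sketch of the distinction, the Python below builds a hypothetical population of 10,000 simulated part lengths. The mean of 5.0 and SD of 0.01 are assumptions chosen for illustration only. It computes the population parameters µ and σ, then draws a sample of n = 200 and computes the sample statistics X-bar and s. The sample statistics estimate the parameters but will not match them exactly.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated part lengths (illustrative only)
random.seed(1)
population = [random.gauss(5.0, 0.01) for _ in range(10_000)]

# Population parameters: computed from every item (pstdev divides by N)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Sample statistics: computed from a much smaller random subset (stdev divides by n - 1)
sample = random.sample(population, 200)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"mu    = {mu:.4f}   sigma = {sigma:.4f}   (N = {len(population)})")
print(f"x-bar = {x_bar:.4f}   s     = {s:.4f}   (n = {len(sample)})")
```

In practice only the sample side is available. The population values are computed here solely to show how close the estimates land.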

Types of Data

Attribute Data (Qualitative)
– Is always binary; there are only two possible values (0, 1):
  • Yes/No
  • Go/No go
  • Pass/Fail

Variable Data (Quantitative)
– Discrete (Count) Data
  • Can be categorized in a classification and is based on counts.
    – Number of defects
    – Number of defective units
    – Number of customer returns
– Continuous Data
  • Can be measured on a continuum; its decimal subdivisions are meaningful.
    – Time
    – Money
    – Pressure
    – Conveyor speed
    – Material feed rate

Discrete Variables

Discrete variable: possible values for the variable
• The number of defective needles in boxes of 100 diabetic syringes: 0, 1, 2, …, 100
• The number of individuals in groups of 30 with a Type A personality: 0, 1, 2, …, 30
• The number of surveys returned out of 300 mailed in a customer satisfaction study: 0, 1, 2, …, 300
• The number of employees, out of 100, who have finished high school or obtained a GED: 0, 1, 2, …, 100
• The number of times you need to flip a coin until a head appears for the first time: 1, 2, 3, … (Note: there is no upper limit, because you might need to flip indefinitely before the first head appears.)
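The coin-flip variable above can be sketched as a small Python simulation. It assumes a fair coin and uses a fixed random seed so the run is repeatable. Every outcome is a whole number starting at 1, and no fixed maximum exists: long runs of tails are rare but always possible.

```python
import random
from collections import Counter


def flips_until_first_head(rng: random.Random) -> int:
    """Flip a fair coin until the first head; return the total number of flips."""
    flips = 1
    while rng.random() < 0.5:  # values below 0.5 count as tails, so flip again
        flips += 1
    return flips


rng = random.Random(42)  # fixed seed so the run is repeatable
results = [flips_until_first_head(rng) for _ in range(10_000)]
counts = Counter(results)

# Every outcome is a whole number >= 1 (discrete), with no fixed maximum
for k in sorted(counts)[:6]:
    print(f"{k} flips: {counts[k] / len(results):.3f}")
print("largest value in this run:", max(results))
```

With a fair coin, about half of the runs should end on the first flip, a quarter on the second, and so on.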

Continuous Variables

Continuous variable: possible values for the variable
• The length of prison time served for individuals convicted of first-degree murder: all the real numbers between a and b, where a is the smallest amount of time served and b is the largest.
• The household income for households with incomes less than or equal to $30,000: all the real numbers between a and $30,000, where a is the smallest household income in the population.
• The blood glucose reading for individuals having glucose readings equal to or greater than 200: all the real numbers between 200 and b, where b is the largest glucose reading in all such individuals.

Definitions of Scaled Data

• The nature of the data, and how it is represented, determines which types of statistical tests are possible.
• Nominal Scale – data consists of names, labels, or categories. It cannot be arranged in an ordering scheme, and no arithmetic operations are performed on nominal data.
• Ordinal Scale – data is arranged in some order, but differences between data values either cannot be determined or are meaningless.
• Interval Scale – data can be arranged in some order, and differences between data values are meaningful and can be interpreted.
• Ratio Scale – data can be ranked, and all arithmetic operations, including division, can be performed (division by zero, of course, excluded). Ratio-level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.

Nominal Scale

Qualitative variable: possible nominal-level data values for the variable
• Blood type: A, B, AB, O
• State of residence: Alabama, …, Wyoming
• Country of birth: United States, China, other

Time to weigh in!

Ordinal Scale

Qualitative variable: possible ordinal-level data values
• Automobile sizes: subcompact, intermediate, full size, luxury
• Product rating: poor, good, excellent
• Baseball team classification: Class A, Class AAA, Major League

Interval Scale

Interval variable: IQ scores of students in Black Belt Training
Possible scores: 100… (The difference between scores is measurable and meaningful, but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times as intelligent as the other.)

Ratio Scale

Ratio variable: grams of fat consumed per adult in the United States
Possible scores: 0… (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)

Converting Attribute Data to Continuous Data

• Continuous Data is always more desirable.
• In many cases Attribute Data can be converted to Continuous Data.
• Which is more useful?
  – 15 scratches, or a total scratch length of 9.25"
  – 22 foreign materials, or 2.5 fm/square inch
  – 200 defects, or 25 defects/hour

Descriptive Statistics

Measures of Location (central tendency)
– Mean
– Median
– Mode

Measures of Variation (dispersion)
– Range
– Interquartile Range
– Standard deviation
– Variance

Descriptive Statistics

Open the MINITAB™ Project "Measure Data Sets.mpj" and select the worksheet "basicstatistics.mtw".

Measures of Location

Mean is:
• Commonly referred to as the average.
• The arithmetic balance point of a distribution of data.

Sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

Stat > Basic Statistics > Display Descriptive Statistics… > Graphs… > Histogram of data, with normal curve

Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
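The "balance point" idea can be checked numerically. The basicstatistics.mtw worksheet is not reproduced in this text, so the ten values below are hypothetical. The sketch computes the mean as the sum divided by the count, then shows that the deviations above and below the mean cancel out.

```python
import statistics

# Hypothetical measurements (not the basicstatistics.mtw data)
data = [4.98, 4.99, 5.00, 5.00, 5.01, 5.02, 4.97, 5.00, 5.01, 5.00]

mean = sum(data) / len(data)  # arithmetic mean: sum of values divided by their count
assert abs(mean - statistics.fmean(data)) < 1e-12

# "Balance point": deviations above and below the mean cancel out
deviations = [x - mean for x in data]
print(f"mean = {mean:.4f}")
print(f"sum of deviations = {sum(deviations):.1e}")  # zero up to floating-point rounding
```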

Measures of Location

Median is:
• The mid-point, or 50th percentile, of a distribution of data.
• Found by arranging the data from low to high, or high to low:
  – It is the single middle value in the ordered list if there is an odd number of observations.
  – It is the average of the two middle values in the ordered list if there is an even number of observations.

For the basicstatistics.mtw data (N = 200, an even count), Minitab reports a Median of 5.0000.
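The odd/even rule above translates directly into a short Python function. The function name `median` and the two small lists are illustrative assumptions. The result is cross-checked against the standard library's `statistics.median`.

```python
import statistics


def median(values):
    """Sort, then take the middle value (odd count) or average the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:  # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values


odd = [7, 1, 5, 3, 9]        # sorted: 1, 3, 5, 7, 9
even = [7, 1, 5, 3, 9, 11]   # sorted: 1, 3, 5, 7, 9, 11

print(median(odd), median(even))  # 5 6.0
assert median(odd) == statistics.median(odd)
assert median(even) == statistics.median(even)
```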

Measures of Location

The Trimmed Mean is a compromise between the Mean and the Median.
• The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.
• It is useful for data with potential extreme values.

Stat > Basic Statistics > Display Descriptive Statistics… > Trimmed Mean

Descriptive Statistics: Data
Variable  N    N*  Mean    SE Mean   TrMean  StDev   Minimum  Q1      Median  Q3      Maximum
Data      200  0   4.9999  0.000712  4.9999  0.0101  4.9700   4.9900  5.0000  5.0100  5.0200
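The trimming procedure can be sketched in a few lines of Python. Two assumptions apply here. First, the default proportion of 5% per end is the figure commonly cited for Minitab's TrMean; check your software's documentation. Second, this sketch rounds the number of trimmed points down, and packages differ on that rounding. The ten-point data set is hypothetical and contains one extreme value. A 10% trim is used so that exactly one value is removed from each end.

```python
import statistics


def trimmed_mean(values, proportion=0.05):
    """Drop `proportion` of the observations from EACH end, then average the rest.

    k = floor(proportion * n) points are removed from each end in this sketch;
    software packages differ in how they round k.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


# Hypothetical data with one extreme value (9.50)
data = [5.00, 4.99, 5.01, 5.00, 5.02, 4.98, 5.00, 5.01, 4.99, 9.50]

print(f"mean         = {statistics.fmean(data):.3f}")
print(f"median       = {statistics.median(data):.3f}")
print(f"trimmed mean = {trimmed_mean(data, 0.10):.3f}")  # one value cut from each end
```

The extreme value pulls the ordinary mean well above 5, while the trimmed mean stays close to the median.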

Measures of Location

Mode is the most frequently occurring value in a distribution of data.

Mode = 5
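For continuous measurements, the Mode is only meaningful once values are recorded at a fixed resolution; otherwise every value may be unique. The hypothetical data below are rounded to two decimals. Python's `statistics.multimode` also reports ties, which matters when a distribution has more than one peak.

```python
import statistics

# Hypothetical measurements, already rounded to two decimals
data = [4.99, 5.00, 5.01, 5.00, 4.98, 5.00, 5.01, 5.02, 5.00, 4.99]

print(statistics.mode(data))                  # most frequent value
print(statistics.multimode(data))             # all values tied for most frequent
print(statistics.multimode([1, 1, 2, 2, 3]))  # a bimodal example: [1, 2]
```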

Measures of Variation

Range is the difference between the largest observation and the smallest observation in the data set.
• A small range indicates a small amount of variability, and a large range a large amount of variability.
• For the basicstatistics.mtw data, Range = Maximum − Minimum = 5.0200 − 4.9700 = 0.0500.

Interquartile Range is the difference between the 75th percentile and the 25th percentile.
• For the same data, IQR = Q3 − Q1 = 5.0100 − 4.9900 = 0.0200.

Use Range or Interquartile Range when the data distribution is skewed.
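Both measures can be computed with Python's standard library. The data are hypothetical. One caveat: quartile conventions differ between packages. The sketch uses `statistics.quantiles` with its default `'exclusive'` method, which may not match Minitab's Q1 and Q3 for every data set. Treat a small discrepancy as a difference in method rather than an error.

```python
import statistics

# Hypothetical measurements
data = [4.97, 4.99, 5.00, 5.00, 5.01, 4.99, 5.02, 5.00, 5.01, 4.98, 5.00]

data_range = max(data) - min(data)  # largest minus smallest observation

# With n=4, statistics.quantiles returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # 75th percentile minus 25th percentile

print(f"range = {data_range:.2f}")
print(f"Q1 = {q1:.3f}, median = {q2:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```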

Measures of Variation

Standard Deviation is:
• Roughly the typical distance of values from the Mean for a distribution of data (strictly, the square root of the average squared deviation).
• A "unit of measure" for distances from the Mean.
• Appropriate when data are symmetrical.

Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

For the basicstatistics.mtw data, Minitab reports StDev = 0.0101. This is the sample Standard Deviation s; the population Standard Deviation cannot be calculated because this is sample data.
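The sample formula can be coded term by term. The sketch below uses a hypothetical six-value sample and checks the result against `statistics.stdev`, which uses the n − 1 divisor. It also shows the value the N-divisor (population) formula would give, which is the one that applies only if these six values were the entire population.

```python
import math
import statistics

data = [4.98, 5.00, 5.01, 4.99, 5.02, 5.00]  # hypothetical sample

n = len(data)
x_bar = statistics.fmean(data)
sum_sq = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations from the mean

s = math.sqrt(sum_sq / (n - 1))              # sample Standard Deviation (n - 1 divisor)
sigma_if_population = math.sqrt(sum_sq / n)  # what the N-divisor formula would give

assert math.isclose(s, statistics.stdev(data))
assert math.isclose(sigma_if_population, statistics.pstdev(data))
print(f"s = {s:.5f}  (N-divisor value would be {sigma_if_population:.5f})")
```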

Measures of Variation

Variance is the average squared deviation of each individual data point from the Mean (the sample version divides by n − 1 rather than n).

Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
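A short Python check makes the relationship to the Standard Deviation concrete. The sample is hypothetical, and the millimetre unit is an assumption added to show that variance is expressed in squared units. `statistics.variance` uses the n − 1 divisor, and `statistics.pvariance` uses N.

```python
import math
import statistics

# Hypothetical sample of lengths in millimetres
data = [4.98, 5.00, 5.01, 4.99, 5.02, 5.00]

s2 = statistics.variance(data)       # sample variance: divisor n - 1
sigma2 = statistics.pvariance(data)  # population formula: divisor N

# Variance is the squared Standard Deviation, so it is in squared units (mm^2)
assert math.isclose(s2, statistics.stdev(data) ** 2)
assert math.isclose(sigma2, statistics.pstdev(data) ** 2)
print(f"s^2 = {s2:.6f} mm^2, sigma^2 (if this were the population) = {sigma2:.6f} mm^2")
```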

Normal Distribution

The Normal Distribution is the most recognized distribution in statistics.

What are the characteristics of a Normal Distribution?
– Only random error is present
– Process free of assignable cause
– Process free of drifts and shifts

So what is present when the data is Non-normal?

The Normal Curve

The normal curve is a smooth, symmetrical, bell-shaped curve generated by the normal probability density function. It is the most useful continuous probability model, because many naturally occurring measurements, such as heights and weights, are approximately Normally Distributed.
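The density function does not survive in the slide text. For reference, the normal probability density with mean μ and Standard Deviation σ is f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). The Python sketch below codes that formula directly and checks it against the standard library's `statistics.NormalDist.pdf`. It then confirms the symmetry described above for a hypothetical process with μ = 5.0 and σ = 0.01.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


dist = NormalDist(mu=5.0, sigma=0.01)  # hypothetical process: mean 5.0, SD 0.01
for x in (4.98, 4.99, 5.00, 5.01, 5.02):
    assert math.isclose(normal_pdf(x, 5.0, 0.01), dist.pdf(x), rel_tol=1e-9)
    print(f"f({x:.2f}) = {normal_pdf(x, 5.0, 0.01):.2f}")

# Symmetry: equal heights at equal distances on either side of the mean
assert math.isclose(normal_pdf(4.99, 5.0, 0.01), normal_pdf(5.01, 5.0, 0.01), rel_tol=1e-9)
```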

Normal Distribution

Each combination of Mean and Standard Deviation generates a unique normal curve.

"Standard" Normal Distribution
– Has μ = 0 and σ = 1.
– Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores.
– Z-scores measure how many Standard Deviations from the Mean a particular data value lies.
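Converting a raw score to a standard score is a one-line calculation. In the Python sketch below, the process mean, Standard Deviation and measurement are hypothetical values. `NormalDist.zscore` requires Python 3.9 or later. The second half standardizes a small hypothetical data set using its own mean and population-formula SD, which leaves the converted values with mean 0 and SD 1.

```python
import statistics
from statistics import NormalDist

mu, sigma = 5.0, 0.01  # hypothetical process mean and Standard Deviation
x = 5.025              # hypothetical raw measurement

z = (x - mu) / sigma   # how many Standard Deviations x lies from the mean
print(f"z = {z:.2f}")
assert abs(z - NormalDist(mu, sigma).zscore(x)) < 1e-9  # zscore() needs Python 3.9+

# Standardizing a whole (hypothetical) data set: mean becomes 0, SD becomes 1
data = [4.98, 5.00, 5.01, 4.99, 5.02, 5.00]
m, sd = statistics.fmean(data), statistics.pstdev(data)
z_scores = [(v - m) / sd for v in data]
print([round(v, 2) for v in z_scores])
```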

Normal Distribution

The area under the curve between any 2 points represents the proportion of the distribution between those points. The area between the Mean (μ) and any other point (x) depends upon the Standard Deviation.

Convert any raw score to a Z-score using the formula:

$$Z = \frac{x - \mu}{\sigma}$$

Refer to a set of Standard Normal Tables to find the proportion between μ and x.
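Instead of a printed table, the same proportion can be read from the standard normal cumulative distribution function. In the Python sketch below, the process parameters and the point x are hypothetical. It assumes a table layout that lists the area between 0 and z. Some tables instead list the cumulative area from −∞, which is `std_normal.cdf(z)` itself.

```python
from statistics import NormalDist

mu, sigma = 5.0, 0.01  # hypothetical process parameters
x = 5.02               # hypothetical point of interest

z = (x - mu) / sigma
std_normal = NormalDist()  # the Standard Normal: mean 0, SD 1

# Area between the mean and x (the quantity many Standard Normal Tables list)
between = abs(std_normal.cdf(z) - 0.5)
# Area beyond x, in the tail farther from the mean
beyond = 0.5 - between

print(f"z = {z:.2f}; between mu and x: {between:.4f}; beyond x: {beyond:.4f}")

# Same answer without standardizing, using the original mean and SD directly
assert abs(between - abs(NormalDist(mu, sigma).cdf(x) - 0.5)) < 1e-9
```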

The Empirical Rule…

• 68.27 % of the data will fall within +/- 1 standard deviation
• 95.45 % of the data will fall within +/- 2 standard deviations
• 99.73 % of the data will fall within +/- 3 standard deviations
• 99.9937 % of the data will fall within +/- 4 standard deviations
• 99.999943 % of the data will fall within +/- 5 standard deviations
• 99.9999998 % of the data will fall within +/- 6 standard deviations
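The percentages above follow directly from the standard normal cumulative distribution function. The area within ±k standard deviations is Φ(k) − Φ(−k). The short Python sketch below reproduces the list using only the standard library. The helper name `within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def within(k: float) -> float:
    """Proportion of a Normal Distribution lying within +/- k Standard Deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in range(1, 7):
    print(f"+/- {k} SD: {within(k) * 100:.7f} %")
```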