Why Statistics? Notes about behavior. Ideas and ramblings from Devore & Peck: Statistics
Why Stats? Three Reasons
• To be informed
• To understand issues and make decisions
• To be able to evaluate decisions about your life and the lives of those you may teach

Reason One: Being Informed
• Our lives are filled with data, but most of that data comes in the form of sound bites.
• We are not given sufficient detail to make our own decisions.
• We are expected to “follow.”
• The average American is data ignorant.
• Not “understanding” data takes away our control of decision making.

Reason One: Informed Consumer
To be in control of your decisions you must be able to:
1. Extract information from charts and graphs
2. Follow the logic of numerical arguments
3. Know the basic rules of how data should be gathered, summarized, and analyzed to draw valid, “truthful” statistical conclusions
Reason Two: Understanding and Making Decisions
• Be able to decide if information is adequate and sufficient to make a decision
• Know enough to challenge the data presented by virtue of “knowing” about data
• Analyze the data that is available
• Assess the assumptions inherent in (built into) the type of data collected
• Draw conclusions and make decisions about the data
• Assess the risk of an incorrect decision

Reason Three: Life Decisions
• Drug screening for work (or the Olympics): false positives and negatives
• Criteria to define financial need
• Scores on state achievement exams
• Data about teen accidents that affects the rate of payment
• Probability of an incorrect medical diagnosis

Types of Data
• Nominal – Naming
• Ordinal – Ordering
• Interval – Equal Intervals
• Ratio – True Zero
Working with Interval and Ratio Data
• Remember: interval and ratio data are the only two types of data that can be added, subtracted, multiplied, or divided.
• Note about the use of symbols to indicate operations:
• Three symbols mean “multiply”: x, ·, and ( ); thus 2 x 3, 2 · 3, and 2(3).
• Four symbols mean “divide”: / (virgule), ― (vinculum, the fraction bar), ÷ (obelus), and the long-division bracket (a closing parenthesis attached to a vinculum).

Measures of Similarity
• Measures of Central Tendency
1. Mean
2. Median
3. Mode
• Normal Curve – estimating the population from a sample
Mean (Average or x̄ (x-bar))
• x̄ = the sum of scores divided by the number of scores
• 5 + 7 + 3 + 6 + 4 = 25 total, with 5 scores
• 25/5 = x̄ = 5

Median (middle)
• "Middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it.
• If the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order.
• If the list has an even number of entries, the median is the sum of the two middle numbers (after sorting) divided by two.
• 5, 7, 3, 6, 4 sorted becomes 3, 4, 5, 6, 7; the middle is 5.
• True Middle = (5 scores + 1) ÷ 2 = 3rd score, which is 5 (for odd numbers of scores ONLY)

Mode (most common)
• For lists, the mode is the most common (frequent) value. A list can have more than one mode.
• For histograms, a mode is a relative maximum ("bump").
• In the list 5, 7, 3, 6, 4 there is no mode, but in the list 5, 7, 3, 6, 6, 4 the most common number is 6.
[Dot plot of the scores on a scale from 1 to 7]
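The three measures of central tendency can be checked with a few lines of Python. This is a minimal sketch using the slide's own score lists; the helper name `modes` is hypothetical, and it returns an empty list when no value repeats, to match the "no mode" convention used above.

```python
from collections import Counter
from statistics import mean, median

def modes(scores):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []  # every score appears once, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)

scores = [5, 7, 3, 6, 4]
print(mean(scores))                # sum 25 divided by 5 scores
print(median(scores))              # middle of 3, 4, 5, 6, 7
print(modes(scores))               # no repeated value
print(modes([5, 7, 3, 6, 6, 4]))   # 6 appears twice
```

Returning a list rather than a single number leaves room for data with more than one mode.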
Population Estimates - Normal Curve
• The normal curve (aka bell curve) is an estimate of the population of all possible instances of an object, event, or other entity.
• 68.26% of the entire population falls within 1 standard deviation of the mean, 95.44% within 2, and 99.74% within 3.
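The shares of a normal population within 1, 2, and 3 standard deviations of the mean (roughly 68%, 95%, and 99.7%) can be recomputed from the error function. This is a sketch that assumes a perfectly normal population; the function name `share_within` is hypothetical. Values read from printed tables can differ from these in the last digit because of rounding.

```python
import math

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```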
Nature and the role of variability
• If all students at PHCC were invariable, my job would be really easy!
• Sadly, the students at PHCC exhibit high variability in age, education, socioeconomic status, self-assessment, and educational expectations, so I have lots of work in planning and preparing for class, anticipating prior knowledge, anticipating levels of understanding, and predicting the speed of information delivery.

What is variability?
• Variability refers to the spread of scores: it describes how the data (plural of datum) are organized along the scale of measurement.
• Data variability describes the ways in which the data are grouped.
Range, Variance, and Deviation
• How do we see and tell about variability, the distance (or spread) of scores across the continuum of scores?
• One way is the range. In the example 3, 4, 5, 6, 7 the range is 7 - 3 = 4.
• The x̄ alone is not enough to describe the data. The following examples both have an x̄ of 5:
• 2 + 1 + 6 + 1 + 15 = 25; 25/5 = x̄ = 5
• 2 + 2 + 1 + 6 + 3 + 3 + 3 + 7 + 18 = 45; 45/9 = x̄ = 5

Data with an x̄ of 5
• 3 + 4 + 5 + 6 + 7 = 25; 25/5 = x̄ = 5; Range = 7 - 3 = 4
• 1 + 1 + 2 + 6 + 15 = 25; 25/5 = x̄ = 5; Range = 15 - 1 = 14
• 1 + 2 + 2 + 3 + 3 + 3 + 6 + 7 + 18 = 45; 45/9 = x̄ = 5; Range = 18 - 1 = 17
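The point that equal means can hide very different spreads can be verified directly. This is a small sketch over the three score lists above; the helper name `score_range` is hypothetical.

```python
def score_range(scores):
    """Range = largest score minus smallest score."""
    return max(scores) - min(scores)

datasets = [
    [3, 4, 5, 6, 7],
    [1, 1, 2, 6, 15],
    [1, 2, 2, 3, 3, 3, 6, 7, 18],
]

for scores in datasets:
    mean = sum(scores) / len(scores)
    print(f"mean = {mean}, range = {score_range(scores)}")
```

All three lists print a mean of 5.0, while the ranges are 4, 14, and 17.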
Variability I
Reason
1. Academic problems
2. Poor advising or teaching
3. Needed a break
4. Economic reasons
5. Family responsibilities
6. To attend another school
7. Personal problems
8. Other
Variance
• Another way to estimate variability is by calculating the variance. The variance is the average of the squared differences between each score and the x̄.
• Each difference is squared so that the negative differences do not cancel out the positive ones.
• First calculate the mean of the scores, then measure the amount that each score deviates from the mean, and then square that deviation (by multiplying it by itself). Numerically, the variance equals the average of the squared deviations from the mean.
[Figure: a high-variance distribution next to a low-variance distribution]

Calculating Variance
1. First calculate the mean of the scores,
2. then measure the amount that each score deviates from the mean,
3. then square that deviation (by multiplying it by itself).
4. Add up all of the squared deviations.
5. Divide by the number of scores (5).

Score   Deviation      Squared
5       5 - 5 = 0      0 x 0 = 0
7       7 - 5 = 2      2 x 2 = 4
3       3 - 5 = -2     (-2) x (-2) = 4
6       6 - 5 = 1      1 x 1 = 1
4       4 - 5 = -1     (-1) x (-1) = 1
                       Sum = 10

Variance = 10 / 5 = 2
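The five steps can be sketched in Python as follows. Dividing by the number of scores, as the slide does, gives what the standard library calls the population variance (`statistics.pvariance`); the sample variance (`statistics.variance`) divides by one less than the number of scores and would give a different answer.

```python
from statistics import pvariance

scores = [5, 7, 3, 6, 4]

mean = sum(scores) / len(scores)          # step 1: the mean
deviations = [x - mean for x in scores]   # step 2: deviation of each score
squared = [d ** 2 for d in deviations]    # step 3: square each deviation
total = sum(squared)                      # step 4: add up the squared deviations
variance = total / len(scores)            # step 5: divide by the number of scores

print(total, variance)
# Cross-check against the standard library's population variance
assert variance == pvariance(scores)
```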
Another Variance
• First calculate the mean of the scores,
• then measure the amount that each score deviates from the mean,
• then square that deviation (by multiplying it by itself).
• Add up all of the squared deviations.
• Divide by the number of scores (5).

Score   Deviation       Squared
2       2 - 5 = -3      (-3) x (-3) = 9
1       1 - 5 = -4      (-4) x (-4) = 16
6       6 - 5 = 1       1 x 1 = 1
1       1 - 5 = -4      (-4) x (-4) = 16
15      15 - 5 = 10     10 x 10 = 100
                        Sum = 142

Variance = 142 / 5 = 28.4

Standard Deviation
• The standard deviation (S.D. or s) is simply the square root of the variance.
• Problem 1: Variance = 10 / 5 = 2; S.D. or s = √2 = 1.414214
• Problem 2: Variance = 142 / 5 = 28.4; S.D. or s = √28.4 = 5.329165
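Recomputing both problems from the raw scores is a useful check on the hand arithmetic. This is a minimal sketch; the function name `population_sd` is hypothetical, and it divides by the number of scores, as the slides do.

```python
import math
from statistics import pstdev

problem_1 = [5, 7, 3, 6, 4]
problem_2 = [2, 1, 6, 1, 15]

def population_sd(scores):
    """Square root of the average squared deviation from the mean."""
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)

for scores in (problem_1, problem_2):
    sd = population_sd(scores)
    print(round(sd ** 2, 1), round(sd, 4))
    # Cross-check against the standard library
    assert math.isclose(sd, pstdev(scores))
```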
Interquartile Range
• The distance from the 75th percentile to the 25th percentile in a group of scores.
• As the median divides a data set in half, the quartiles divide the data set into fourths. Hence the second quartile, denoted Q2, is the median.
• Example: 1, 2, 2, 3, 3, 3, 6, 7, 18
• True Middle = (9 scores + 1) ÷ 2 = 5th score, so Q2 = 3
• True Middle of lower half (1, 2, 2, 3) = (4 scores + 1) ÷ 2 = position 2.5, halfway between the 2nd and 3rd scores, so Q1 = (2 + 2) ÷ 2 = 2
• True Middle of upper half (3, 6, 7, 18) = (4 scores + 1) ÷ 2 = position 2.5, halfway between the 2nd and 3rd scores, so Q3 = (6 + 7) ÷ 2 = 6.5
• Interquartile Range = Q3 - Q1 = 6.5 - 2 = 4.5
• The interquartile range ignores outliers such as the 18; we are interested only in the data above and below Q2. In the above example, Q2 is not included in either half.
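The median-of-halves method described above can be sketched as follows. The function name `quartiles` is hypothetical, and leaving the median out of both halves for an odd-length list is the convention the slide uses; other software may use a different quartile rule and return slightly different values.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median of each half of the sorted scores."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle score so Q2 is in neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 2, 2, 3, 3, 3, 6, 7, 18])
print(q1, q2, q3, q3 - q1)
```

Position 2.5 in each half means averaging the 2nd and 3rd scores of that half, which is exactly what `median` does for a list of four values.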
Converted Measures
• Scores can be converted to a common denominator to provide equated comparisons between groups.
• Z-scores (standard scores), percentiles, and stanine scores are all converted to a common base so that comparisons between groups can be made.

Percentiles
• Raw scores, or the total points a student earns on a test, are converted into percentage values. Two statistics are used for this purpose:
• The percentile rank is a number between 0 and 100 indicating the percent of cases in a norm group falling at or below that score.
• The percentile is a point on a scale of scores at or below which a given percent of the cases falls.
• For example, a child who scores at the 42nd percentile is doing as well as, or better than, 42 percent of the students who took the same test.
• Percentiles are like quartiles, except that they divide the data set into 100 equal parts instead of four equal parts.

Percentiles Explained
• The percentile for an observation x is found by dividing the number of observations less than x by the total number of observations and then multiplying this quantity by 100.
• Once you can calculate percentiles, you can also determine deciles and quartiles.
• The First Quartile = the 25th Percentile; the Second Quartile = the 50th Percentile; the Third Quartile = the 75th Percentile
• Example: 45 out of 50 students had test scores less than 80. Since 45/50 = 90%, a score of 80 puts you in the 90th percentile.
• Example: 1, 2, 2, 3, 3, 3, 6, 7, 18. Six of the nine scores are less than 6, so the percentile for a score of 6 = (6 ÷ 9) x 100 = 0.6667 x 100 = 66.67%. So a score of 6 is higher than about 67% of the scores in the list.
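The "number of observations less than x" rule translates directly into code. This is a sketch of that one definition only (other definitions count scores at or below x); the function name `percentile_of` is hypothetical.

```python
def percentile_of(score, scores):
    """Percent of observations strictly less than the given score."""
    below = sum(1 for x in scores if x < score)
    return below / len(scores) * 100

scores = [1, 2, 2, 3, 3, 3, 6, 7, 18]
print(round(percentile_of(6, scores), 2))   # six of the nine scores are below 6
```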
Stanine Scores
• The term stanine is derived from “standard nine.”
• Stanine scores range from 1 to 9 with 5 in the center. Except for 1 and 9, each stanine includes a band of scores one half of a standard deviation wide. Thus stanine scores are standard scores with a mean of 5 and a standard deviation of 2.
• Test scores are commonly expressed using these single-digit scores, which can help students and parents visualize where someone falls on the test scale.
• The National Stanine is a scale score that divides the scores of the norming sample into nine groups, ranging from a high of 9 to a low of 1. Stanines 1-3 are generally considered below average, stanines 4-6 average, and stanines 7-9 above average.
• Stanine scores have a constant relationship to percentiles; that is, a given percentile always falls into the same stanine. Stanine 5, for example, always includes percentiles 41-59.
Stanine Example
Danville Montessori Third Grade CAT Scores (Total Battery)

Year        National Stanine   Scale Score   National Percentile
1998-1999   7.5                740.1         95.0
1999-2000   7.2                725.3         89.0
2000-2001   7.3                730.4         ***
2001-2002   7.7                743.2         98.0
2002-2003   7.4                732.4         ***
2003-2004   8.6                757.0         97.0
2004-2005   9.0                759.0         ***

*** denotes fewer than ten students tested; therefore the National Percentile for the group is not computed.
Stanine Description
• The middle stanine is the fifth one; it contains the middle 20% of the scores.
• Each stanine interval, except the first and last ones, spans half of a standard deviation.
• 1, 2, or 3 = "below average"
• 4, 5, or 6 = "average"
• 7, 8, or 9 = "above average"

Stanine Calculation
• A stanine is calculated from a z-score:
• Stanine = (2 x z-score) + 5
• This gives a mean of 5 and an S.D. of 2.
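The stanine formula can be sketched in Python. The slide gives only (2 x z-score) + 5; rounding to the nearest whole number and limiting the result to the 1 to 9 band are assumptions added here so that the output is a single digit, and the function name `stanine` is hypothetical.

```python
import math

def stanine(z):
    """Convert a z-score to a stanine: (2 x z) + 5, rounded and kept within 1..9."""
    raw = 2 * z + 5
    rounded = math.floor(raw + 0.5)   # assumption: round halves upward
    return max(1, min(9, rounded))    # assumption: clamp to the 1..9 band

for z in (-2.5, -1, 0, 1, 3):
    print(z, stanine(z))
```

A z-score of 0 (exactly average) lands in stanine 5, and each full standard deviation moves the result by two stanines.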
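Standard Score
• A z-score is a score's distance from the mean measured in standard deviations: z = (score - x̄) ÷ S.D.
• When a set of scores is converted to z-scores, the scores are said to be standardized and are referred to as standard scores.
• Standard scores have a mean of 0 and a standard deviation of 1.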
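Standardizing a set of scores can be sketched as follows, using the usual definition z = (score - mean) ÷ standard deviation and the population standard deviation (dividing by the number of scores). The function name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Convert raw scores to standard scores (z-scores)."""
    m = mean(scores)
    sd = pstdev(scores)   # population S.D.: divides by the number of scores
    return [(x - m) / sd for x in scores]

z = z_scores([5, 7, 3, 6, 4])
print([round(v, 3) for v in z])
print(round(mean(z), 3), round(pstdev(z), 3))   # standardized: mean 0, S.D. 1
```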
Stats Interpretation Summary
Variability II Estimate of number of each color of M & M’s in a large bag
M & M’s Variability I
• Q: What is the percentage of each color in "M&M's" Chocolate Candies?
• A: On average, "M&M's" Plain Chocolate Candies and our new "M&M's" Mint Chocolate Candies contain 30% browns, 20% each of yellows and reds, and 10% each of oranges, greens, and blues. For "M&M's" Peanut Chocolate Candies, the ratio is 20% each of browns, yellows, reds, greens, and oranges. We use the same ratio for our "M&M's" Peanut Butter and Chocolate

Total   Brown   Yellow   Red   Orange   Green   Blue
660     198     132      132   66       66      66
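The expected count of each color in a bag of 660 plain candies follows from the quoted percentages. This is a small sketch; the dictionary name `plain_mix` is hypothetical, and the percentages are the ones quoted above for the plain candies.

```python
# Percent of each color quoted for plain candies
plain_mix = {"brown": 30, "yellow": 20, "red": 20,
             "orange": 10, "green": 10, "blue": 10}
total = 660

# Expected count = total x percent / 100 (exact here, so integer division is safe)
expected = {color: total * pct // 100 for color, pct in plain_mix.items()}

for color, count in expected.items():
    print(f"{color}: {count}")
print("sum:", sum(expected.values()))
```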
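M & M’s Variability II
Estimate of colors present in large bag
[Chart of estimated color shares; visible labels: 30%, 20%, 10%, 10%]

Count Your M & M’s
• Use Excel to graph your counts and convert them to %. Does it match what the company says?
[Blank bar chart: counts from 1 to 12 on the vertical axis; red, blue, yellow, green, brown, and orange on the horizontal axis]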
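The same conversion can be done outside Excel. The sketch below turns counts into percentages and compares them with the company's figures for plain candies; the counts in `sample_counts` are made-up illustrative numbers, not real data, and should be replaced with your own.

```python
# Hypothetical counts from one small bag (replace with your own counts)
sample_counts = {"red": 10, "blue": 4, "yellow": 12,
                 "green": 6, "brown": 11, "orange": 7}

# Company percentages for plain candies
company_pct = {"brown": 30, "yellow": 20, "red": 20,
               "orange": 10, "green": 10, "blue": 10}

total = sum(sample_counts.values())
sample_pct = {color: count / total * 100 for color, count in sample_counts.items()}

for color, pct in sample_pct.items():
    difference = pct - company_pct[color]
    print(f"{color}: {pct:.1f}% (company {company_pct[color]}%, difference {difference:+.1f})")
```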
Homework
• Find the mean, median, and mode for your M & M’s and for the total group
• Find the variance and the standard deviation of your M & M’s and of the total group
• Convert your group's M & M’s to percentiles
• Answer the question: How does your sample vary from the total group sample?

Homework Format
Results
Description of M & M population (the % estimates from Mars Company)
1. The results (central tendency)…
2. The variability (range, variance, and standard deviation)
3. Most (color) fell in the 90th percentile, while (and so forth)
* Provide charts as necessary
Description of Sample
1. Within the present sample (do the same as above, except use the sample’s statistics)
* Provide charts as necessary
Summary
1. Summarize your results by comparing your sample to the population