Statistics A Review Student Presentations 2018 Inverse Normal
Inverse Normal: Percentile to Z-Score
The inverse normal distribution takes a known probability and works back to the z-score. The calculator function for this is invNorm(). You find it by pressing the 2nd key and then the VARS key (the DISTR menu), and it should be the third option in the list. To use it, enter your percentile as a decimal as the first argument, then 0 and 1 (the mean and standard deviation of the standard normal curve). This gives you the z-score: invNorm(percentile, 0, 1)
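If you want to check invNorm() answers away from the calculator, here is a minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+). It is an alternative to the calculator, not part of the original slides, and `percentile_to_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def percentile_to_z(percentile: float) -> float:
    """Z-score for a percentile given in percent, e.g. 68 for the 68th percentile.

    Same idea as invNorm(percentile / 100, 0, 1): a normal curve with
    mean 0 and standard deviation 1.
    """
    return NormalDist(mu=0, sigma=1).inv_cdf(percentile / 100)

for p in (68, 34, 99.7):
    print(p, round(percentile_to_z(p), 4))
```

Dividing by 100 mirrors the "percentile as a decimal" step on the calculator.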
Normal CDF: Z-Score to Percentile
The normal cumulative distribution function finds the probability that a z-score falls within a certain range. The calculator function for this is normalcdf(). You find it by pressing the 2nd key and then the VARS key, and it should be the second option in the list. To use it, enter your lower bound as the first argument, your upper bound as the second, then 0 and 1. This gives you the share of the curve between the bounds in decimal form; with a very negative lower bound such as -9,999, that share is the percentile: normalcdf(lower bound, upper bound, 0, 1)
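The same range calculation can be sketched in Python, assuming the standard library's `statistics.NormalDist`: the area between two z-scores is the cumulative probability at the upper bound minus the cumulative probability at the lower bound. `area_between` is a hypothetical helper name, not a calculator function.

```python
from statistics import NormalDist

def area_between(lower: float, upper: float) -> float:
    """Probability that a standard normal value falls between lower and upper.

    Same idea as normalcdf(lower, upper, 0, 1).
    """
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(upper) - std_normal.cdf(lower)

# A very negative lower bound (like -9999) stands in for "no lower limit",
# and a very large upper bound (like 9999) stands in for "no upper limit".
print(round(area_between(-9999, 3) * 100, 3))   # percent of the curve below z = 3
print(round(area_between(1.5, 9999) * 100, 3))  # percent of the curve above z = 1.5
```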
TI-BASIC programs for calculators

PROGRAM:PTOZ
:Disp "PERCENTILE"
:Input A
:A/100→D
:invNorm(D,0,1)→E
:Disp "THE Z SCORE IS"
:Disp E

PROGRAM:ZTOP
:Disp "LOWER BOUND"
:Input A
:Disp "UPPER BOUND"
:Input B
:normalcdf(A,B,0,1)→E
:E*100→F
:Disp "IT IS IN THE"
:Disp F
:Disp "PERCENTILE"
Practice
If you are in the 68th percentile, what z-score do you have? What about the 34th percentile? The 99.7th? If the lower bound is -9,999 and the upper bound is 3, what percent of the graph is within that range? What about a lower bound of 1.5 and an upper bound of 9,999?
Answers
68th percentile: 0.4677; 34th percentile: -0.4125; 99.7th percentile: 2.7478; first range: 99.865%; second range: 6.681%
Z-Score
A z-score tells how many standard deviations a value is away from the mean. Example: if the length of a squirrel's tail has a z-score of 2.6, it is 2.6 standard deviations longer than the average squirrel's tail.
An In-Depth Example
This is a bell curve representing IQ scores in the UK (2014). If we wanted to see how many standard deviations someone with an IQ of 120 is away from the mean, we'd use the z-score: z = (X - average) / standard deviation, where X is one value. Here, (120 - 100) / 15 = 1.333 standard deviations. (Z-scores are positive for values above the mean and negative for values below the mean.)
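The z-score formula translates directly into a one-line function. This Python sketch (the name `z_score` is just an illustrative choice) makes the order of operations explicit: subtract the mean first, then divide by the standard deviation.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    # Parentheses matter: subtract first, then divide.
    return (x - mean) / sd

print(round(z_score(120, 100, 15), 3))         # IQ example
print(z_score(88, 82, 5), z_score(74, 82, 5))  # test-score questions
```

Without the parentheses, 120 - 100 / 15 would divide only the 100, which gives the wrong answer.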
Questions 1. If a student takes a test and gets 88%, what is his z-score compared to his other classmates if the average test score is 82% and the standard deviation is 5? 2. If another student in that same class had a score of 74% on the test, find his z-score. 3. Why would we need to use z-scores?
Answers 1. (88 - 82) / 5 = 1.2 standard deviations above the mean 2. (74 - 82) / 5 = -1.6, so 1.6 standard deviations below the mean 3. We can use z-scores to analyze how individuals compare to one another in a group. In the test score example, z-scores show how those students compared to the rest of the class, which could help a teacher see how much information the students are retaining.
Mean, Median, Mode
Mean: a measure of the central tendency of the data. It is found by adding all the data points and then dividing the total by the number of points.
Median: a simple measure of central tendency. To find the median, arrange the values in order from smallest to largest. If there is an odd number of values, the median is the middle value. If there is an even number of values, the median is the average of the two middle values.
Mode: the most frequently occurring number in a set of numbers. It is found by counting how often each value occurs; the value with the highest number of occurrences is the mode of the set.
Examples
Mean: (12 + 13 + 8 + 4 + 2 + 23 + 16) / 7 = 78 / 7 = 11.1
Median: 1, 1, 2, 4, 6, 8, 8, 9, 11, 13, 15, 16, 17, 19, 21 (15 values, so the median is the 8th value: 9)
Mode: 1, 6, 7, 9, 3, 1, 4, 5, 2, 6, 8, 0, 3, 1, 7, 9, 5, 2, 1, 0, 5
Frequency: 0 (2), 1 (4), 2 (2), 3 (2), 4 (1), 5 (3), 6 (2), 7 (2), 8 (1), 9 (2)
Mode = 1
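The three examples can be checked with Python's standard `statistics` module (a sketch for checking work, not a replacement for doing it by hand). `collections.Counter` builds the same frequency table used to find the mode.

```python
from collections import Counter
from statistics import mean, median, mode

mean_data = [12, 13, 8, 4, 2, 23, 16]
median_data = [1, 1, 2, 4, 6, 8, 8, 9, 11, 13, 15, 16, 17, 19, 21]
mode_data = [1, 6, 7, 9, 3, 1, 4, 5, 2, 6, 8, 0, 3, 1, 7, 9, 5, 2, 1, 0, 5]

print(round(mean(mean_data), 1))  # 78 / 7
print(median(median_data))        # middle (8th) of 15 sorted values
print(mode(mode_data))            # value that occurs most often

freq = Counter(mode_data)         # value -> number of occurrences
print(sorted(freq.items()))       # frequency table, smallest value first
```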
Sample Questions
1. Find the mean, median, and mode of: 1, 2, 3, 6, 9, 3, 1, 9, 3, 6, 4, 7, 8, 3, 8, 9, 1
2. If the data is skewed, should you use the mean or the median to describe the central tendency?
3. If there is a large number of data entries and no outliers, should you use the mean or the median to represent the central tendency?
Answers on next page
Answers 1. Mean: 4.88, Median: 4, Mode: 3 2. Median 3. Mean
Skewed Left vs. Skewed Right
When one tail of a histogram extends further out than the other, the histogram is said to be skewed to that side.
Skewed Left
● When one or more data points are much lower than the mean/median, the left side stretches further out than the right.
● The median changes little, but the mean is pulled to the left.
Skewed Right
● When one or more data points are much higher than the mean/median, the right side stretches further out than the left.
● The median changes little, but the mean is pulled to the right.
Normal Distribution: the mean and median are the same number. Each side is equal in length (the distribution is symmetrical).
Distances Traveled During Spring Break 2017
Vacation distances (miles): 937, 703, 613, 2007, 1213, 2550, 1371, 2994, 1001, 951, 1467, 1952, 628, 3796, 1478, 1614, 3796
The two outliers at 3796 miles pull the mean to the right, so the histogram is skewed right.
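One quick way to see the pull on the mean is to compare the mean with the median. The Python sketch below does this for the spring break distances. Treating "mean above median" as a sign of right skew is a rough rule of thumb assumed here, not a formal test.

```python
from statistics import mean, median

# Spring break 2017 vacation distances (miles)
distances = [937, 703, 613, 2007, 1213, 2550, 1371, 2994, 1001,
             951, 1467, 1952, 628, 3796, 1478, 1614, 3796]

m, med = mean(distances), median(distances)
print(round(m, 1), med)

# Rule of thumb: the high values drag the mean above the median.
if m > med:
    print("skewed right")
elif m < med:
    print("skewed left")
else:
    print("roughly symmetric")
```

The median (1467 miles) sits well below the mean, because the two 3796-mile trips raise the total but do not move the middle value much.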
Questions 1. Which side is this graph skewed to? 3. Which side is this histogram skewed to? 4. What affects skewness the most? A. Mean B. Mode C. Outliers / High or Low Numbers D. The Category
Answers 1. Which side is this graph skewed to? Right 2. Which side is this graph skewed to? Right 3. Which side is this histogram skewed to? Right 4. What affects skewness the most? C. Outliers / High or Low Numbers
Interquartile Range (IQR): The IQR covers the middle 50% of a set of data and is the difference between Q1 and Q3 (IQR = Q3 - Q1). It is represented as the 'box' of a box-and-whisker plot. Dominic Kares
Range: The range is the difference between the minimum and maximum of a data set, and it covers the entirety of the data. It can be shown on many graphs; on a box-and-whisker plot it is the distance between the ends of the whiskers, or between the farthest outliers if they are plotted separately. Dominic Kares
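Both spreads are single subtractions. This small Python sketch uses the numbers from the questions that follow; `data_range` and `iqr_from_summary` are hypothetical helper names.

```python
def data_range(values):
    """Range = maximum - minimum: covers all of the data."""
    return max(values) - min(values)

def iqr_from_summary(q1, q3):
    """IQR = Q3 - Q1: the width of the box in a box-and-whisker plot."""
    return q3 - q1

print(data_range([38, 27, 74, 83, 27, 26, 76, 64, 16, 72]))  # 83 - 16
print(iqr_from_summary(q1=14, q3=27))
```

Because the range uses only the two most extreme values, a single far-out point can change it a lot, while the IQR ignores the top and bottom quarters of the data.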
Questions
1) Find the range of this set of data: 38, 27, 74, 83, 27, 26, 76, 64, 16, 72
2) Find the IQR from the following five-number summary: Min: 8, Q1: 14, Median: 19, Q3: 27, Max: 36
3) When there are several far-reaching outliers, is it better to use the IQR or the range?
4) When a data set is condensed, the IQR and range are fairly (similar/different).
Answers 1) 67 2) 13 3) IQR 4) Similar
Correlation Coefficients and Regressions
● A correlation coefficient is a number between -1 and +1 calculated to represent the linear dependence of two variables.
● Correlation coefficients are symbolized by the letter 'r' (the "r-value").
● Correlation coefficients are determined from graphed data like this:
○ Correlation coefficients are positive if there is positive correlation and negative if there is negative correlation.
○ The closer the value is to 1 or -1, the stronger the correlation.
○ Correlation coefficients can only be determined for two quantitative variables.
○ A regression is a measure of the relation between the mean value of one variable and the corresponding values of other variables.
Example
Height (inches): 73, 74, 67, 68, 70
Weight (pounds): 175, 190, 155, 140, 135
This data has a positive correlation, so the r-value is more than 0. It is also highly correlated, so r is likely closer to 1 than to 0. The actual correlation coefficient is r = 0.78136.
You can find the correlation coefficient on a calculator by following these steps:
● Turn on "DiagnosticOn" under the "Catalog" section of your calculator
● Enter the values into L1 and L2, then run "LinReg" (STAT, CALC); you will now see "r" and "r^2" in the output
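To see where r comes from, here is a Python sketch of the Pearson correlation coefficient computed from its definition: the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations. `pearson_r` is a hypothetical helper; Python 3.10+ also ships `statistics.correlation`, which computes the same Pearson r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                     # spread of x
    syy = sum((y - my) ** 2 for y in ys)                     # spread of y
    return sxy / sqrt(sxx * syy)

heights = [73, 74, 67, 68, 70]
weights = [175, 190, 155, 140, 135]
r = pearson_r(heights, weights)
print(round(r, 5), round(r * r, 5))  # r and r^2
```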
Questions
Height: 67, 75, 73, 66, 72
Weight: 140, 205, 170, 125, 175
1. Is this data positively or negatively correlated? 2. Is the r-value negative or positive? 3. What is the r-value of the above data? 4. Is the r-value above or below 0.7?
Answers 1. Positively correlated 2. Positive 3. 0.97059 4. Above
What are Outliers? An outlier is a point that falls more than 1.5 times the interquartile range above the third quartile or below the first quartile.
Example Using Data With Outliers As an example of data with outliers I did a survey of one of my classes. I asked everyone how old they are, and made conclusions from it.
Data Example With Outliers Ages of People Surveyed: 16 17 18 16 17 16 16 15 18 17 43 16
IQR (How to tell there's an outlier): The IQR is calculated as Q3 - Q1, which gives us 1.
Finding The Outlier: 1 times 1.5 is equal to 1.5, so outliers lie more than 1.5 below Q1 (below 16 - 1.5 = 14.5) or more than 1.5 above Q3 (above 17 + 1.5 = 18.5). 43 is the only data value considered an outlier.
Box Plot of The Data: As you can see just from the box plot, there's definitely an outlier: the data value of 43.
Example Questions If you did a survey on the ages of people in a 3rd grade classroom, how would the age of the teacher affect the data?
Example Answer The teacher's age would be an outlier in the data. Because it is an outlier, it makes the mean larger, and it also makes the range bigger.
Example Questions Average Test Scores: 75, 64, 69, 85, 22, 56, 63, 66, 82, 75, 100, 59. For the data shown, Q1 = 61 and Q3 = 78.5. Using the data and Q1 and Q3, determine whether there are any outliers, and which data value(s) they are.
Example Answer IQR = 78.5 - 61 = 17.5, and 17.5 x 1.5 = 26.25. Lower limit: 61 - 26.25 = 34.75. Upper limit: 78.5 + 26.25 = 104.75. Therefore yes, there is an outlier, and the data value is 22 (it is below 34.75).
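The 1.5 x IQR rule from this worked answer can be sketched in Python. The function takes Q1 and Q3 as given (as the question does) rather than computing them, because quartile conventions differ between calculators and software. `outliers_by_iqr` is a hypothetical helper name.

```python
def outliers_by_iqr(values, q1, q3):
    """Return the values outside the 1.5 x IQR limits, plus the two limits."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower_limit or v > upper_limit]
    return flagged, lower_limit, upper_limit

scores = [75, 64, 69, 85, 22, 56, 63, 66, 82, 75, 100, 59]
print(outliers_by_iqr(scores, q1=61, q3=78.5))
```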
Example Questions Is the mean, median, or mode most affected by outliers?
Example Answer Outliers have little or no effect on the median and mode; however, they can have a large effect on the mean, making it larger (a high outlier) or smaller (a low outlier). The mean is the most affected.
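Using the class age survey from earlier (ages 16, 17, 18, 16, 17, 16, 16, 15, 18, 17, 43, 16), this Python sketch recomputes the three measures with and without the outlier 43, so you can see which one moves the most.

```python
from statistics import mean, median, mode

ages = [16, 17, 18, 16, 17, 16, 16, 15, 18, 17, 43, 16]  # class age survey
without_43 = [a for a in ages if a != 43]

for label, data in (("with 43", ages), ("without 43", without_43)):
    print(label, "mean:", round(mean(data), 2),
          "median:", median(data), "mode:", mode(data))
```

Here the mean drops by about 2.2 years when 43 is removed, the median moves by only half a year, and the mode does not change.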
Box and Whiskers (diagram: a box plot labeled Min, Q1, Q2 (median), Q3, and Max, with the box marking the interquartile range (IQR))
Step 1: Five-Number Summary
How to, on the calculator: press STAT, choose EDIT, and enter your data points into L1. Then press STAT, choose CALC, select 1-Var Stats, and make sure it says L1.
Example: Min: 1.8, Q1: 4.35, Median: 5.6, Q3: 7.95, Max: 14.1
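Here is a Python sketch of Step 1 that computes a five-number summary with the "median of halves" method: split the sorted data into a lower and upper half, leaving the middle value out of both halves when the count is odd, and take the median of each half. This is assumed to match the calculator's 1-Var Stats convention. Other tools, such as spreadsheet quartile functions, may use different conventions and give slightly different quartiles. The GPA data comes from the practice question that follows.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the middle position
    upper = data[(n + 1) // 2 :]    # values above the middle position
    return data[0], median(lower), median(data), median(upper), data[-1]

gpas = [3.8, 2.4, 4.1, 1.6, 2.5, 3.2, 3.7, 2.9, 3.4, 4.0, 3.2, 3.4, 3.7]
print(five_number_summary(gpas))
```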
Step 2: Plot the Box and Whiskers
How to, on the calculator: press 2nd, then Y= (STAT PLOT). Select Plot 1, turn it On, and choose the box plot type. Then press ZOOM and choose 9 (ZoomStat).
Practice Questions
1. Make a box-and-whiskers plot for the following information. The GPAs of high school seniors: 3.8, 2.4, 4.1, 1.6, 2.5, 3.2, 3.7, 2.9, 3.4, 4.0, 3.2, 3.4, 3.7
2. Looking at the following box-and-whiskers plot for the ages of students in a college class:
a. Find the IQR.
b. Is the data skewed right, skewed left, or a normal distribution?
Answers 1. (box plot) 2. a. IQR: 8 b. Normal distribution
Percentile Rank
● The percentile rank is the percentage of the scores that are equal to or less than a given score.
● To find the percentile rank of a certain person in a class:
○ Count how many scores are equal to or less than that person's score, not including the person's own score.
○ Put that count over the total number of students in the class.
○ Multiply that fraction by 100 to get the percentile rank.
● The percentile rank tells you how someone did in relation to the other scores in the class, but not whether they did well or badly.
Percentile Rank Examples
Class scores: John 50, Ella 89, Tim 77, Emma 93, Sam 84, Jake 98, Brooke 97, Izzy 63, Chris 80, Nick 82, Julie 84
What is Brooke's percentile rank?
● 9 scores are below her score
● 9/11 x 100 = 81.8 is Brooke's percentile rank
What is Julie's percentile rank?
● 6 scores are below or equal to her score
● 6/11 x 100 = 54.5 is Julie's percentile rank
What is Tim's percentile rank?
● 2 scores are below his score
● 2/11 x 100 = 18.2 is Tim's percentile rank
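The counting procedure can be sketched in Python under the slide's convention: count the other students whose scores are equal to or less than the person's score (leaving the person out), then divide by the whole class size. `percentile_rank` is a hypothetical helper name.

```python
def percentile_rank(scores: dict, name: str) -> float:
    """Percent of the class scoring at or below `name`, not counting `name`."""
    target = scores[name]
    at_or_below = sum(1 for other, s in scores.items()
                      if other != name and s <= target)
    return at_or_below / len(scores) * 100

class_scores = {"John": 50, "Ella": 89, "Tim": 77, "Emma": 93, "Sam": 84,
                "Jake": 98, "Brooke": 97, "Izzy": 63, "Chris": 80,
                "Nick": 82, "Julie": 84}

for student in ("Brooke", "Julie", "Tim"):
    print(student, round(percentile_rank(class_scores, student), 1))
```

Julie's count includes Sam, whose score of 84 ties hers, which is why the rule says "equal to or less than".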
Questions
Type of Dog / Weight (pounds): German Shepherd 67, Lab 91, Husky 74, Poodle 44, Great Dane 107, Beagle 41, Pug 36, Shih Tzu 18, Bulldog 55
1. What is a Lab's percentile rank?
2. If a Shih Tzu is at the bottom of the ranking for this group of dogs, does that mean he is at the bottom for all dogs?
3. If someone's percentile rank is high in the class, does that mean they are doing well or badly?
Answers 1) The Lab weighs 91 pounds: 8/10 x 100 = 80 2) No, because it is based only on that particular group of dogs; if it were a group of small dogs, the Shih Tzu might be in one of the highest percentiles. 3) It means they are doing well compared to that particular group, but that may just mean they are doing better than the other students in the class, who could all be doing badly.
Standard Deviation (σ): a quantity calculated to indicate the spread of the data.
Top 5 Fastest Dog Breeds (speed): Greyhound 43 mph, Vizsla 40 mph, Saluki 40 mph, Jack Russell 38 mph, Dalmatian 37 mph
By Hand: First find the mean of the data, then find the distance each number is from the mean. Next, square those distances, find the mean of the squared distances, and take the square root of that to get the standard deviation.
By Calculator: Enter the data into List 1, then go to STAT, CALC, 1-Var Stats. Make sure the list is List 1, select Calculate, and read the answer next to σx; in this case it is 2.059.
Madi Brown
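The by-hand steps can be written out in Python and checked against the standard library. `statistics.pstdev` divides by n, giving the population standard deviation that the slide reads from σx. `statistics.stdev` divides by n - 1, giving the sample standard deviation, which the calculator lists separately as Sx.

```python
from math import sqrt
from statistics import pstdev, stdev

speeds = [43, 40, 40, 38, 37]  # mph, top 5 fastest dog breeds

# By hand: distance from the mean, squared, averaged, then square-rooted.
mu = sum(speeds) / len(speeds)
by_hand = sqrt(sum((s - mu) ** 2 for s in speeds) / len(speeds))

print(round(by_hand, 3), round(pstdev(speeds), 3))  # population SD (σx)
print(round(stdev(speeds), 3))                      # sample SD (Sx), divides by n - 1
```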
Questions
1. Charlie has a standard deviation of 1.6 for goals scored per hockey game and Bob has a standard deviation of 3.8 for goals scored per hockey game. Which person is the more consistent player? Why?
2. Jenny has a standard deviation of 5.9 for cars bought per year and Jeff has a standard deviation of 9.6 for the times he visited the store per year. Does Jeff's standard deviation of 9.6 mean he has more variability?
Answers
1. Charlie is more consistent because his standard deviation is lower than Bob's. The lower the standard deviation, the closer the values tend to be to the mean. The topics are the same, so the standard deviations can be compared.
2. No, these standard deviations can't be compared because they measure different things. Jenny might work for a car dealership and sell an average of 167 cars per year, while Jeff might visit the store 1,532 times per year.
Normal Distribution
● A function that represents the distribution of many random variables as a symmetrical bell-shaped graph.
● The empirical rule tells you what percentage of your data falls within a certain number of standard deviations from the mean:
○ 68% of the data falls within one standard deviation of the mean.
○ 95% of the data falls within two standard deviations of the mean.
○ 99.7% of the data falls within three standard deviations of the mean.
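The empirical-rule percentages are rounded areas under the standard normal curve. This Python sketch recovers them with `statistics.NormalDist`. `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

The unrounded values (about 68.27%, 95.45%, and 99.73%) show why the rule is usually quoted as 68-95-99.7.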
Heights of Females
Barb 5'3" (63 in), Lisa 5'4" (64 in), Katy 5'7" (67 in), Charlie 6'1" (73 in), Jenny 5'8" (68 in), Miranda 5'2" (62 in), Nancy 5'5" (65 in), Charlotte 5'3" (63 in), Amy 5'6" (66 in), Carol 5'6" (66 in)
Mean: 65.7 Standard Deviation: 3.03
(Chart: a normal curve marked at 56.61, 59.64, 62.67, 65.7, 68.73, 71.76, and 74.79. Subtract 3.03 from the mean to move one standard deviation to the left; add 3.03 to the mean to move one standard deviation to the right.)
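The numbers on this slide can be reproduced in Python. The mean is the ordinary average. The 3.03 matches the population standard deviation (dividing by n). The chart marks come from adding and subtracting multiples of the rounded 3.03, which is assumed here to be how the slide built them.

```python
from statistics import mean, pstdev

heights_in = {"Barb": 63, "Lisa": 64, "Katy": 67, "Charlie": 73, "Jenny": 68,
              "Miranda": 62, "Nancy": 65, "Charlotte": 63, "Amy": 66, "Carol": 66}
values = list(heights_in.values())

mu = mean(values)
sigma = round(pstdev(values), 2)  # population SD, rounded as on the slide

# Marks at the mean and 1, 2, 3 standard deviations on either side
marks = [round(mu + k * sigma, 2) for k in range(-3, 4)]
print(mu, sigma)
print(marks)
```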
The Normal Distribution has:
● mean = median = mode
● symmetry about the center
● 50% of values less than the mean
● and 50% greater than the mean
Questions?
1. Assuming this data is normally distributed, can you calculate the mean and standard deviation? (The mean is halfway between 1.1 m and 1.7 m.)
2. 95% is 2 standard deviations on either side of the mean (a total of 4 standard deviations), so what would 1 standard deviation be?
3. How far is 1.85 m from the mean?
4. How many standard deviations is that? (The standard deviation is 0.15 m.)
Answers
1. Mean = (1.1 m + 1.7 m) / 2 = 1.4 m
2. 1 standard deviation = (1.7 m - 1.1 m) / 4 = 0.6 m / 4 = 0.15 m
3. It is 1.85 - 1.4 = 0.45 m from the mean
4. 0.45 m / 0.15 m = 3 standard deviations