Calculating and Interpreting the Shapiro–Wilk Statistic
Using statistics in small-scale language education research
Jean Turner
© Taylor & Francis 2014

Exploring and reporting the attributes of a set of interval-scale or interval-like data
• Calculate and report descriptive statistics.
• Create and review a histogram.*
• Calculate and interpret the Shapiro–Wilk statistic.
*a.k.a. frequency distribution

Here’s a set of 21 interval-scale scores from a test with a total of 15 points.

Student #   Score    Student #   Score
1st         4        12th        13
2nd         5        13th        13
3rd         7        14th        13
4th         8        15th        14
5th         8        16th        14
6th         9        17th        14
7th         9        18th        15
8th         10       19th        15
9th         10       20th        15
10th        10       21st        15
11th        13

The descriptive statistics are…
• Mean = 11.14286
• Median = 13
• Mode = 13 and 15
• Range = 11 points
• Standard deviation = 3.43927

The histogram

• The descriptive statistics give a sense of...
  ◦ central tendency
  ◦ dispersion
• The histogram gives a sense of...
  ◦ the general shape of the distribution
  ◦ the possibility of outlier scores

And…
• In Parametric Statistics Land...
  ◦ Researchers believe their data will match the normal distribution model.
• The hypothesis that one of these researchers would propose is:
  ◦ Null hypothesis: The data are (probably) normally distributed.

But… How likely is it that the scores are normally distributed?
The Shapiro–Wilk statistic tests that hypothesis!

Using R to check the hypothesis
• Enter the data.
> mydata = c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

• Calculate descriptive statistics. (Remember how?)
> summary(mydata)
> subset(table(mydata), table(mydata) == max(table(mydata)))
> sd(mydata)
> max(mydata) - min(mydata)
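The descriptive statistics can also be gathered in one step. The following is a minimal sketch, not part of the original slides: `describe_scores()` is a hypothetical helper name, and the data vector is typed in from the 21 scores in the score table.

```r
# The 21 scores from the score table (15-point test)
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13,
            14, 14, 14, 15, 15, 15, 15)

# describe_scores() is a hypothetical helper, not a base R function
describe_scores <- function(x) {
  counts <- table(x)  # frequency of each distinct score
  list(
    n      = length(x),
    mean   = mean(x),
    median = median(x),
    # every score tied for the highest frequency is reported as a mode
    mode   = as.numeric(names(counts)[counts == max(counts)]),
    range  = max(x) - min(x),
    sd     = sd(x)  # sample standard deviation (n - 1 in the denominator)
  )
}

stats <- describe_scores(mydata)
str(stats)
```

Returning a list keeps the two tied modes together in one element, which `summary()` alone does not report.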

• Make a histogram.
> hist(mydata, col = "orange", breaks = 10)

Calculate the Shapiro–Wilk statistic
> shapiro.test(mydata)

        Shapiro-Wilk normality test

data:  mydata
W = 0.9002, p-value = 0.03527
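The two numbers in the printed output can also be pulled out of the result for reporting. This is a small sketch under the assumption that the data vector holds the 21 scores from the score table; the variable names `result`, `w_value`, and `p_value` are illustrative, not from the slides.

```r
# The 21 scores from the score table
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13,
            14, 14, 14, 15, 15, 15, 15)

# shapiro.test() returns an object of class "htest"
result <- shapiro.test(mydata)

w_value <- unname(result$statistic)  # the observed W
p_value <- result$p.value            # the p-value that goes with W

# Print both values rounded the way the slides report them
cat("W =", round(w_value, 4), " p-value =", round(p_value, 5), "\n")
```

Storing the values makes it easy to quote them in a report without retyping them from the console.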

In the Shapiro–Wilk output…
• The observed value of the Shapiro–Wilk statistic is: W = 0.9002
• The probability of observing a W this low or lower, if the null hypothesis is true, is: p-value = 0.03527

But… What does this mean? Are the data probably normally distributed or not?

• For the Shapiro–Wilk statistic:
  ◦ If p is more than .05, we do not reject the null hypothesis at the .05 level. (In other words, the data are consistent with a normal distribution, although this does not prove they are normal.)
  ◦ If p is less than .05, we reject the null hypothesis at the .05 level. (In other words, the data are probably not normally distributed.)

• Oh, p = 0.03527 is less than .05.
  ◦ The null hypothesis is rejected at the .05 level.
  ◦ A W this low would turn up less than 5% of the time if the data really were normally distributed.
  ◦ The data are probably not normally distributed.

• Check homework practice problem #19 from Chapter Two.
The null hypothesis: The data are (probably) normally distributed.
• Enter the data.
> spanish.vocab = c(41, 33, 32, 29, 27, 26, 24, 19, 18, 17, 14)

> shapiro.test(spanish.vocab)

        Shapiro-Wilk normality test

data:  spanish.vocab
W = 0.958, p-value = 0.7225

• The observed value of the Shapiro–Wilk statistic is: W = 0.958
• The probability of observing a W this low or lower, if the null hypothesis is true, is: p-value = 0.7225

I’m reminding myself…
• For the Shapiro–Wilk statistic:
  ◦ If p is more than .05, we do not reject the null hypothesis at the .05 level. (In other words, the data are consistent with a normal distribution, although this does not prove they are normal.)
  ◦ If p is less than .05, we reject the null hypothesis at the .05 level. (That is, the data are probably not normally distributed.)

• For the Spanish data, p = .7225, which is greater than .05.
  ◦ The null hypothesis is not rejected at the .05 level.
  ◦ The data are consistent with a normal distribution.
  ◦ It is reasonable to treat the data as approximately normally distributed.
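The same decision rule can be applied to both data sets at once. The sketch below is an illustration only: `normality_decision()` is a hypothetical helper, the cutoff `alpha = 0.05` follows the slides, and `mydata` is assumed to hold the 21 scores from the score table.

```r
# Data sets from the slides
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13,
            14, 14, 14, 15, 15, 15, 15)
spanish.vocab <- c(41, 33, 32, 29, 27, 26, 24, 19, 18, 17, 14)

# normality_decision() is a hypothetical helper, not a base R function
normality_decision <- function(x, alpha = 0.05) {
  result <- shapiro.test(x)
  data.frame(
    W           = unname(result$statistic),
    p           = result$p.value,
    reject_null = result$p.value < alpha  # TRUE means "reject normality"
  )
}

# One row per data set
decisions <- rbind(normality_decision(mydata),
                   normality_decision(spanish.vocab))
rownames(decisions) <- c("mydata", "spanish.vocab")
print(decisions)
```

A `reject_null` value of `FALSE` means only that the test found no evidence against normality at the chosen cutoff, not that normality has been demonstrated.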