Calculating and Interpreting the Shapiro–Wilk Statistic
Using statistics in small-scale language education research
Jean Turner
© Taylor & Francis 2014

Exploring and reporting the attributes of a set of interval-scale or interval-like data
  • Calculate and report descriptive statistics.
  • Create and review a histogram.*
  • Calculate and interpret the Shapiro–Wilk statistic.
*a.k.a. frequency distribution

Here’s a set of 21 interval-scale scores from a test with a total of 15 points.

Student #   Score      Student #   Score
1st         4          12th        13
2nd         5          13th        13
3rd         7          14th        13
4th         8          15th        14
5th         8          16th        14
6th         9          17th        14
7th         9          18th        15
8th         10         19th        15
9th         10         20th        15
10th        10         21st        15
11th        13

The descriptive statistics are…
  • Mean = 11.14286
  • Median = 13
  • Mode = 13 and 15
  • Range = 11 points
  • Standard deviation = 3.43927

The histogram
[Figure: histogram of the 21 scores]

  • The descriptive statistics give a sense of...
    ◦ central tendency
    ◦ dispersion
  • The histogram gives a sense of...
    ◦ the general shape of the distribution
    ◦ the possibility of outlier scores

And…
  • In Parametric Statistics Land...
    ◦ Researchers believe their data will match the normal distribution model.
  • The hypothesis that one of these researchers would propose is:
    ◦ Null hypothesis: The data are (probably) normally distributed.

But…
How likely is it that the scores are normally distributed?
The Shapiro–Wilk statistic tests that hypothesis!

Using R to check the hypothesis
  • Enter the data (all 21 scores from the table).
> mydata = c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

  • Calculate descriptive statistics. (Remember how?)
> summary(mydata)
> subset(table(mydata), table(mydata) == max(table(mydata)))
> sd(mydata)
> max(mydata) - min(mydata)   # range: maximum score minus minimum score
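
The four commands above can also be collected into one short script that stores each statistic under a name. This is only a sketch: the variable names (score_mean, score_modes, and so on) are illustrative and not part of the slides, and the 21 scores are re-entered from the score table so the block runs on its own.

```r
# The 21 scores from the score table (re-entered so this block is self-contained).
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13,
            13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

# Central tendency
score_mean   <- mean(mydata)
score_median <- median(mydata)

# Mode(s): every score whose frequency equals the highest frequency.
freq        <- table(mydata)
score_modes <- as.numeric(names(freq)[freq == max(freq)])

# Dispersion
score_range <- max(mydata) - min(mydata)  # range as a single number of points
score_sd    <- sd(mydata)                 # sample SD (n - 1 in the denominator)

print(c(mean = score_mean, median = score_median,
        range = score_range, sd = score_sd))
print(score_modes)
```

Note that R's built-in range() returns the minimum and maximum as a pair, which is why the range in points is computed by subtraction here.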

  • Make a histogram.
> hist(mydata, col = "orange", breaks = 10)
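
Because a histogram is a picture of the frequency distribution, the counts behind the picture can also be inspected as plain text. A minimal sketch, assuming the 21 scores from the score table (re-entered here so the block runs on its own); the name h is illustrative.

```r
# The 21 scores from the score table.
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13,
            13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

# Frequency distribution: how many students earned each score.
print(table(mydata))

# With plot = FALSE, hist() returns the bins and their counts without drawing.
# breaks = 10 is only a suggestion; R picks "pretty" bin boundaries near it.
h <- hist(mydata, breaks = 10, plot = FALSE)
print(h$breaks)  # bin boundaries R actually used
print(h$counts)  # number of scores falling in each bin
```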

Calculate the Shapiro–Wilk statistic
  • Calculate the Shapiro–Wilk statistic.
> shapiro.test(mydata)

        Shapiro-Wilk normality test

data:  mydata
W = 0.9002, p-value = 0.03527
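
The printed output can also be stored and its two numbers pulled out by name, which is handy when reporting them. A sketch, assuming the 21 scores from the score table; the names result, w_value and p_value are illustrative.

```r
# The 21 scores from the score table.
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13,
            13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

# shapiro.test() returns a list-like "htest" object.
result <- shapiro.test(mydata)

w_value <- unname(result$statistic)  # the observed W
p_value <- result$p.value            # the p-value for that W

cat("W =", round(w_value, 4), " p-value =", round(p_value, 5), "\n")
```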

In the Shapiro–Wilk output…
  • The observed value of the Shapiro–Wilk statistic is: W = 0.9002
  • The probability of obtaining a W this low or lower, if the null hypothesis is true, is: p-value = 0.03527

But…
What does this mean? Are the data probably normally distributed or not?

  • For the Shapiro–Wilk statistic:
    ◦ If p is more than .05, we do not reject the null hypothesis. The data are consistent with a normal distribution. (This is an absence of strong evidence against normality, not proof of it.)
    ◦ If p is less than .05, we reject the null hypothesis at the .05 significance level. The data are probably not normally distributed.

  • Oh, p = 0.03527 is less than .05.
    ◦ The null hypothesis is probably not true.
    ◦ I can reject it at the .05 significance level!
    ◦ The data are probably not normally distributed.
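
The .05 comparison can be written once as a small function and reused. This is a sketch only: interpret_shapiro is a hypothetical helper, not a function that ships with R, and alpha = 0.05 is the cutoff used in these slides.

```r
# Hypothetical helper (not part of R): turns a Shapiro-Wilk p-value into a decision.
interpret_shapiro <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject the null hypothesis: the data are probably not normally distributed"
  } else {
    "do not reject the null hypothesis: the data are consistent with a normal distribution"
  }
}

# The p-value reported for the test scores in the slides.
print(interpret_shapiro(0.03527))
```

Wording the second outcome as "do not reject" keeps the decision honest: a large p-value means the test found no strong evidence against normality, which is weaker than showing the data are normal.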

  • Check homework practice problem #19 from Chapter Two.
    The null hypothesis: The data are (probably) normally distributed.
  • Enter the data.
> spanish.vocab = c(41, 33, 32, 29, 27, 26, 24, 19, 18, 17, 14)

> shapiro.test(spanish.vocab)

        Shapiro-Wilk normality test

data:  spanish.vocab
W = 0.958, p-value = 0.7225

  • The observed value of the Shapiro–Wilk statistic is: W = 0.958
  • The probability of obtaining a W this low or lower, if the null hypothesis is true, is: p-value = 0.7225

I’m reminding myself…
  • For the Shapiro–Wilk statistic:
    ◦ If p is more than .05, we do not reject the null hypothesis. The data are consistent with a normal distribution. (This is an absence of strong evidence against normality, not proof of it.)
    ◦ If p is less than .05, we reject the null hypothesis at the .05 significance level. The data are probably not normally distributed.

  • For the Spanish data, p = .7225, which is greater than .05.
    ◦ The null hypothesis is not rejected.
    ◦ There is no strong evidence that the data depart from normality.
    ◦ The data are plausibly normally distributed.
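
The whole check for the Spanish vocabulary data fits in a few lines. A sketch under the same .05 cutoff; the names result, alpha and decision are illustrative and do not come from the slides.

```r
# Spanish vocabulary scores from the homework problem.
spanish.vocab <- c(41, 33, 32, 29, 27, 26, 24, 19, 18, 17, 14)

result <- shapiro.test(spanish.vocab)
alpha  <- 0.05  # significance level used throughout the slides

# Compare the p-value with alpha and word the decision cautiously.
decision <- if (result$p.value < alpha) {
  "reject the null hypothesis of normality"
} else {
  "do not reject the null hypothesis of normality"
}

cat("W =", round(unname(result$statistic), 3),
    " p-value =", round(result$p.value, 4), "\n")
cat("Decision:", decision, "\n")
```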