Exploratory Data Analysis Frequency Distributions and Percentiles Statistics
Exploratory Data Analysis, Frequency Distributions and Percentiles Statistics 9/16/2020 Week 2 1
The Data
5 20 6 19 18 17 7 8 16 15
17 8 15 14 10 11 15 11 12 14
15 13 12 11 11 12 13 11 10 9
Exploratory Data Analysis
• A set of procedures for arranging and displaying numbers that allow the researcher to quickly organize, summarize, and interpret data collected from a research project.
• "Perhaps the most basic thing I have to say is…look at the data. Always, always plot up the data to see what the numbers say" (Robert Bolles 1988, p. 83).
Exploratory Data Analysis
• You should always look at data to see what trends and patterns are evident.
• Some statistics can be misleading.
• REMEMBER - Garbage in, garbage out!!
Goals of EDA
• Organization, summarization, and interpretation
• To know the characteristics of the data researchers collect
– Measures of central tendency - The value or values that best represent the entire data set.
– Measures of dispersion - How spread out or dispersed the data are.
– The shape of the distribution
– Relation between variables
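As a rough illustration of the first two characteristics, the sketch below computes common measures of central tendency and dispersion for the 30 scores on "The Data" slide, using Python's standard `statistics` module. The choice of the range and the sample standard deviation as dispersion measures is an assumption; the slide does not name specific statistics.

```python
import statistics

# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

# Measures of central tendency.
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the ordered scores
mode = statistics.mode(data)      # most frequent score

# Measures of dispersion (assumed choices, not named on the slide).
score_range = max(data) - min(data)
sample_sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", score_range, "sample SD:", round(sample_sd, 2))
```

For these scores the mean is 12.5, the median is 12, and the mode is 11, so the three measures of central tendency agree only approximately.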
• "Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone--as the first step" (John Tukey 1977, p. 3)
Organizing the Data I
• First, order the data
– Stem-and-Leaf Plots
• A graphical means of organizing data that allows for exploration of the data and its distribution while retaining the information of the original scores.
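A minimal sketch of a stem-and-leaf plot for the 30 scores on "The Data" slide follows. It assumes the tens digit is the stem and the units digit is the leaf; the helper name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

def stem_and_leaf(scores):
    """Group ordered scores by stem (tens digit); leaves are the units digits."""
    plot = defaultdict(list)
    for score in sorted(scores):  # ordering the data comes first
        stem, leaf = divmod(score, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem, leaves in sorted(plot.items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every original score can be read back from the plot (stem 1 with leaf 4 is the score 14), which is the sense in which the plot retains the information of the original scores.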
Organizing the Data II
• Frequency Distributions - An organized set of numbers representing the frequency of observations that fall within a specific category or class of a variable.
• Class - A mutually exclusive and nominal category that represents elements or subjects that share a common characteristic.
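One way to build an ungrouped frequency distribution for the scores on "The Data" slide is sketched below. It treats each distinct score as its own class, which is an assumption made for illustration.

```python
from collections import Counter

# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

# Count how many observations fall in each class (here, each distinct score).
freq = Counter(data)

print("score  frequency")
for score in sorted(freq):
    print(f"{score:5d}  {freq[score]:9d}")
```

The frequencies sum to the number of observations (30), a quick check that every score was assigned to exactly one class.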
Organizing the Data III
• Qualitative data - Nominal scales
• Quantitative data - Ordinal, interval, and ratio scales
Organizing the Data IV
• To know the relative position of a score (the value of a variable)
• Percentile rank - The percentage of cases in a comparison group that achieve a score at or below a given score.
• Cumulative frequency - The total number of scores in a frequency distribution that are within and below a specified class.
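The two definitions above can be sketched as follows for the scores on "The Data" slide. The function name `percentile_rank` is illustrative, and each distinct score is treated as its own class (an assumption).

```python
from collections import Counter
from itertools import accumulate

# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

# Cumulative frequency: running total of scores within and below each class.
freq = Counter(data)
values = sorted(freq)
cumulative = dict(zip(values, accumulate(freq[v] for v in values)))

def percentile_rank(scores, score):
    """Percentage of cases with a score at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

print(cumulative[15])             # 24 scores are at or below 15
print(percentile_rank(data, 15))  # 80.0
```

The percentile rank is the cumulative frequency expressed as a percentage of the total: 24 of the 30 scores are at or below 15, so a score of 15 has a percentile rank of 80.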
Organizing the Data V
• Class width - The size of a class as determined by the difference between its upper and lower real limits.
• Midpoint of a class - The number in the middle of the real limits of a class.
– This number is considered to best represent the numbers in the class
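A small sketch of these two definitions follows. It assumes whole-number scores, so that the real limits of a class extend half a unit below its lowest score and half a unit above its highest score; the function names are illustrative.

```python
def real_limits(lower, upper, unit=1):
    """Real limits of a class whose stated limits are lower..upper.

    Assumes scores are measured to the nearest `unit`, so the real
    limits reach half a unit beyond the stated limits on each side.
    """
    half = unit / 2
    return lower - half, upper + half

def class_width(lower, upper, unit=1):
    """Difference between the upper and lower real limits."""
    lo, hi = real_limits(lower, upper, unit)
    return hi - lo

def class_midpoint(lower, upper, unit=1):
    """Number in the middle of the real limits."""
    lo, hi = real_limits(lower, upper, unit)
    return (lo + hi) / 2

print(real_limits(5, 8))     # (4.5, 8.5)
print(class_width(5, 8))     # 4.0
print(class_midpoint(5, 8))  # 6.5
```

Under this assumption, a class with stated limits 5 to 8 has width 4 (not 3), because it covers four possible score values.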
Grouped Frequency Distributions
• A frequency distribution in which the values of the variable have been grouped into classes.
– Makes computation easier
– Makes the display more comprehensible
– Loses some information about the individual scores
• How many classes should I have?
– It depends on your own judgment and on how you want to interpret the data.
Creating the classes
• Find the difference between the highest and the lowest scores and add 1 to obtain the total number of possible score values.
• Divide this figure by the number of classes you want to use and round to a whole number. This is the class width, W.
• Add (W-1) to the minimum value of the lowest class to obtain the maximum score of the lowest class.
• The next higher class begins at the integer following the maximum score of the lower class.
• Assign each obtained score to the class within which it is included.
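The steps above can be sketched as follows, applied to the scores on "The Data" slide with four classes. The function names and the choice of four classes are assumptions for illustration; the lowest class is assumed to start at the lowest score.

```python
# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

def make_classes(scores, n_classes):
    """Return (lowest, highest) stated limits for each class."""
    lo, hi = min(scores), max(scores)
    possible_values = hi - lo + 1                    # step 1: range plus 1
    # Step 2: class width W. Note that Python's round() sends exact
    # halves to the nearest even integer.
    width = max(1, round(possible_values / n_classes))
    classes = []
    start = lo
    while start <= hi:                               # may differ from n_classes after rounding
        classes.append((start, start + width - 1))   # step 3: add (W - 1)
        start += width                               # step 4: next integer begins next class
    return classes

def grouped_frequencies(scores, classes):
    """Step 5: count the scores that fall within each class."""
    return {(a, b): sum(a <= s <= b for s in scores) for a, b in classes}

classes = make_classes(data, 4)
grouped = grouped_frequencies(data, classes)
for (a, b), count in grouped.items():
    print(f"{a:2d}-{b:2d}: {count}")
```

Here the range plus 1 is 16, so four classes give W = 4 and the classes 5-8, 9-12, 13-16, and 17-20, with frequencies 5, 11, 9, and 5.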
The Shape of Distribution I
• Frequency histogram - A form of bar graph used to display the frequencies of interval- or ratio-scaled data.
– Used only for interval and ratio scales
– Each bar on the graph is centered on the midpoint of its class.
• Use odd numbers to set the class width, so that each class midpoint is a whole number.
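As a rough text-only stand-in for a frequency histogram, the sketch below groups the scores on "The Data" slide into classes of width 3 and labels each bar with its class midpoint. The width of 3 and the helper name `histogram_rows` are assumptions chosen to show why an odd width is convenient.

```python
# The 30 scores from "The Data" slide.
data = [5, 20, 6, 19, 18, 17, 7, 8, 16, 15,
        17, 8, 15, 14, 10, 11, 15, 11, 12, 14,
        15, 13, 12, 11, 11, 12, 13, 11, 10, 9]

def histogram_rows(scores, width):
    """Return (midpoint, frequency) for consecutive classes of the given width."""
    lo, hi = min(scores), max(scores)
    rows = []
    start = lo
    while start <= hi:
        end = start + width - 1
        midpoint = (start + end) / 2  # same as the middle of the real limits
        count = sum(start <= s <= end for s in scores)
        rows.append((midpoint, count))
        start += width
    return rows

rows = histogram_rows(data, 3)
for midpoint, count in rows:
    print(f"{midpoint:5.1f} | {'#' * count}")
```

With an odd width of 3 the midpoints are the whole numbers 6, 9, 12, 15, 18, and 21; an even width such as 4 would give midpoints like 6.5.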
The Shape of Distribution II
• Normal distribution - A hypothetical frequency distribution with a characteristic bell-shaped form in which a large portion of scores is at or near the midpoint of the distribution.
The Shape of Distribution III
• Kurtosis - The degree to which scores are clustered about one common area of a distribution or spread throughout the distribution.
– Leptokurtic refers to distributions where the scores are clustered close to a common point.
– Platykurtic refers to a distribution where the scores are spread across the distribution.
– Mesokurtic refers to a normally distributed set of data.
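The slide describes kurtosis informally. One common way to quantify it, used here as an assumption rather than as the slide's own method, is the moment-based excess kurtosis: the fourth central moment divided by the squared variance, minus 3. Under that measure a normal distribution scores 0 (mesokurtic), positive values suggest a leptokurtic shape, and negative values a platykurtic one.

```python
def excess_kurtosis(scores):
    """Population excess kurtosis: m4 / m2**2 - 3 (assumed measure)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical toy samples, not taken from the slides.
flat = [1, 2, 3, 4, 5]           # evenly spread scores
peaked = [0] * 8 + [-10, 10]     # most scores at one point, two far out

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(peaked))  # positive: leptokurtic
```

This is a population formula; statistical packages often apply a small-sample correction, so their values may differ slightly.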
The Shape of Distribution IV
• Skew - The relative symmetry of a distribution of scores.
– Positive skew refers to a distribution where the scores are clustered at the lower end of the scale, with a tail of scores at the upper end of the distribution.
– Negative skew refers to a distribution where the scores are clustered at the upper end of the scale, with a tail of scores at the lower end of the distribution.
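Skew can be quantified in several ways. The sketch below uses the moment-based coefficient (third central moment divided by the variance raised to the power 1.5), which is an assumption and not a formula given on the slide. Its sign matches the definitions above: positive when the tail is at the upper end, negative when the tail is at the lower end, and zero for a symmetric distribution.

```python
def skewness(scores):
    """Population skewness: m3 / m2**1.5 (assumed measure)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy samples, not taken from the slides.
tail_high = [1, 2, 2, 3, 10]   # clustered low, tail at the upper end
tail_low = [1, 8, 9, 9, 10]    # clustered high, tail at the lower end
symmetric = [1, 2, 3, 4, 5]

print(skewness(tail_high))  # positive skew
print(skewness(tail_low))   # negative skew
print(skewness(symmetric))  # 0.0
```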