Introduction to Descriptive Statistics and Frequency Tables
Descriptive Statistics
Descriptive Statistics
Methods of collecting, organizing, summarizing, and presenting data in an informative way. Descriptive methods fall into two groups: numerical and graphical.
Data Distribution
• The distribution of a variable tells us what values it takes and how often it takes these values.
• Can be visualized as the shape of a table or graph.
• Can then be summarized numerically.
Visualizing Data: Goals
• Using a picture to display the data will help us see patterns.
• Some methods are more appropriate for certain types of data.
• Different visual representations capture different aspects of the data.
• The picture must show the distribution:
  ◦ Record the data values.
  ◦ Indicate the frequency (count) of the data values.
Numerically Summarizing Data
• Summarize the data with just a few measures.
• Pick the most appropriate descriptors of:
  ◦ "Typical value", central tendency, or center
  ◦ Variation or variability
• Note any observations that stick out.
Basic Data Visualization Tools
• Remember, the graph we use depends on the type of data we are displaying.
• Both types:
  ◦ (Relative) frequency tables
• Categorical data:
  ◦ Pie charts
  ◦ Bar charts
• Quantitative data:
  ◦ Stem-and-leaf plots
  ◦ Dotplots
  ◦ Histograms
  ◦ Boxplots
  ◦ Time series plots
(Relative) Frequency Tables
• A frequency distribution (table) shows how data is partitioned, or falls, among several classes (bins) by listing the categories along with the number of observations in each of them.
• Quick, easy way to organize our data.
• Can organize categorical or quantitative data.
Frequency/Relative Frequency
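The standard definitions behind this slide title can be written out as follows. The symbols f_i (count in class i), n (total number of observations), and k (number of classes) are notation assumed here for illustration; they do not appear on the slides.

```latex
% f_i = frequency (count) of observations in class i
% n   = total number of observations
\[
  \text{relative frequency}_i = \frac{f_i}{n},
  \qquad
  \sum_{i=1}^{k} \frac{f_i}{n} = 1 .
\]
% Worked instance: a class holding 16 of 50 observations
\[
  \frac{16}{50} = 0.32
\]
```

Relative frequencies are often easier to compare across data sets of different sizes than raw counts, since they always sum to 1.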
Categorical Frequency Table
• The following relative frequency table gives information on college majors at a particular university.
Bin or Class Information
Definitions:
• Lower and upper class limits: the smallest and largest values that belong to a class.
  ◦ For the class 30 – 39: 30 is the lower class limit and 39 is the upper class limit.
  ◦ For ages of adults 20 – 69, an intuitive set of bins is: 20 – 29, 30 – 39, 40 – 49, 50 – 59, 60 – 69.
• Bin width (class width): the difference between consecutive lower class limits.
  ◦ For the class 30 – 39, the class width = 40 – 30 = 10.
  ◦ Note the class width is NOT the difference between the upper and lower class limits of a single class (39 – 30 = 9).
• Class midpoint: the value in the middle of a class. Sometimes found by adding the lower limit and upper limit, then dividing by 2.
  ◦ Ex. For the class 30 – 39, the class midpoint = (30 + 39)/2 = 34.5
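The class width and midpoint definitions can be checked with a short sketch. The variable names (`classes`, `widths`, `midpoints`) are illustrative assumptions; the bins are the age classes listed above.

```python
# Age bins from the slide, as (lower class limit, upper class limit) pairs.
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

# Class width: difference between consecutive LOWER class limits.
widths = [nxt[0] - cur[0] for cur, nxt in zip(classes, classes[1:])]

# Class midpoint: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

print(widths)
print(midpoints)
```

Computing the width from consecutive lower limits gives 10 for every class, whereas upper minus lower within one class would give 9.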
Good Practices Concerning Binning
• There is no "best" way of selecting bins for quantitative data; however, in general bins should:
  ◦ Not overlap.
  ◦ Not have any gaps between them.
  ◦ Have the same width.
  ◦ Cover the range of the data.
• The class limits and width should be reasonable numbers.
• How many bins should we use?
  ◦ A good place to start is the square root of your number of observations.
  ◦ Generally 5 to 20.
  ◦ Too few bins will hide detail.
  ◦ Too many bins spread the data out too far.
Quantitative Frequency Table Example
• Using a data set of "Compression Strength (PSI)" of golf clubs, let's create the "ideal" table.
• Considerations:
  ◦ N = 80
  ◦ Range = 245 – 76 = 169
  ◦ # of bins? Sqrt(80) ≈ 8.9
  ◦ Trial class width = 169/Sqrt(80) ≈ 18.9
• Decisions:
  ◦ Number of classes = 9
  ◦ Class width = 20
  ◦ Range of classes = 20 * 9 = 180
  ◦ Starting point = 70
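The arithmetic behind these considerations and decisions can be sketched as below. Only the summary numbers from the slide are used (N, minimum, maximum, and the chosen classes); the raw golf club data is not reproduced, and the variable names are assumptions for illustration.

```python
import math

# Summary values given on the slide.
n = 80
data_min, data_max = 76, 245

data_range = data_max - data_min        # 245 - 76 = 169
trial_bins = math.sqrt(n)               # square-root rule of thumb
trial_width = data_range / trial_bins   # trial class width before rounding

# Decisions from the slide: round to "reasonable" numbers.
num_classes = 9
class_width = 20
start = 70

# Lower class limits plus the final upper boundary.
edges = [start + class_width * k for k in range(num_classes + 1)]

# The chosen classes must cover the whole range of the data.
covers_data = edges[0] <= data_min and data_max < edges[-1]

print(round(trial_bins, 1), round(trial_width, 1))
print(edges, covers_data)
```

Rounding the trial width of about 18.9 up to 20 and starting at 70 gives classes running from 70 to 250, which span 180 units and comfortably contain the observed range of 169.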
Categorical Frequency Distribution Example
• Consider the following sample of students' blood types.
• Create a frequency distribution.
• Most software can easily handle categorical data.
Blood Type: OB+ I don't remember. O+ I don't remember. AA+ O+ B+ I don't remember. O+ I don't remember. BOO+ I don't remember.
Frequency Distribution in Minitab
• Stat -> Tables -> Tally Individual Variables
• Double-click the variable.
  ◦ In this case, Blood Type.
• Choose the items you want to show.
• You can also have Minitab store the table in your spreadsheet if you choose.
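For readers without Minitab, a comparable tally can be sketched in Python with the standard library. The responses below are hypothetical and are NOT the class data from the slide; they only illustrate the mechanics of counting categories and converting counts to relative frequencies.

```python
from collections import Counter

# Hypothetical responses for illustration only (not the slide's data).
blood_types = ["O+", "A+", "B+", "O+", "A-", "O+", "I don't remember", "A+"]

counts = Counter(blood_types)   # frequency of each category
n = len(blood_types)            # total number of observations

# Print category, frequency, and relative frequency, most common first.
for category, count in counts.most_common():
    print(f"{category:<18}{count:>3}{count / n:>8.3f}")
```

`Counter` treats every distinct string as its own category, so non-answers such as "I don't remember" are tallied like any other value.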
Quantitative Frequency Distribution Example
• Consider this data set of the highest temperature recorded in each of the 50 states.
• Create a frequency distribution.
• We can do this in almost any software.
State: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri
High Temp (F): 112 100 128 120 134 114 106 110 109 112 100 118 117 116 118 121 114 105 109 107 112 115 118
State: Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming
High Temp (F): 117 118 125 106 110 122 108 110 121 113 120 119 111 104 113 120 117 107 110 118 112 114 115
Building a Quantitative Frequency Distribution: Results
• Considerations:
  ◦ N = 50
  ◦ Max = 134, Min = 100
  ◦ Range = 34
• Decisions:
  ◦ Number of classes = 7
  ◦ Class width = 5
  ◦ Starting point = 100

Frequency Table of High Temperatures (F)

Class            Frequency   Cumulative   Relative    Cumulative Relative
                             Frequency    Frequency   Frequency
100 ≤ x < 105        3            3         0.06          0.06
105 ≤ x < 110        8           11         0.16          0.22
110 ≤ x < 115       16           27         0.32          0.54
115 ≤ x < 120       13           40         0.26          0.80
120 ≤ x < 125        7           47         0.14          0.94
125 ≤ x < 130        2           49         0.04          0.98
130 ≤ x < 135        1           50         0.02          1.00
Total               50                      1.00
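The cumulative and relative columns follow directly from the class frequencies, as this sketch shows. It starts from the frequencies in the table above rather than from the raw temperatures, and the variable names are assumptions for illustration.

```python
from itertools import accumulate

# Lower class limits and class frequencies taken from the table above.
lower_limits = [100, 105, 110, 115, 120, 125, 130]
frequencies = [3, 8, 16, 13, 7, 2, 1]
class_width = 5

n = sum(frequencies)                          # total observations
cumulative = list(accumulate(frequencies))    # running total of frequencies
relative = [f / n for f in frequencies]       # frequency / n
cumulative_relative = [c / n for c in cumulative]

# Print one row per class, using half-open intervals lower <= x < upper.
for lo, f, c, r, cr in zip(lower_limits, frequencies, cumulative,
                           relative, cumulative_relative):
    print(f"{lo} <= x < {lo + class_width}: {f:>3} {c:>3} {r:>6.2f} {cr:>6.2f}")
```

The last cumulative frequency must equal n and the last cumulative relative frequency must equal 1, which makes a quick check that no observation was dropped or double counted.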