Chapter 4 Displaying and Summarizing Quantitative Data
- Slides: 95
CHAPTER OBJECTIVES
At the conclusion of this chapter you should be able to:
1) Construct graphs that appropriately describe quantitative data.
2) Calculate and interpret numerical summaries of quantitative data.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.
Displaying Quantitative Data: Histograms and Stem and Leaf Displays
[Figure: Relative Frequency Histogram of Exam Grades; x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]
Frequency Histogram
Histograms
A histogram shows three general types of information:
- It provides a visual indication of where the approximate center of the data is.
- We can gain an understanding of the degree of spread, or variation, in the data.
- We can observe the shape of the distribution.
All 200 m Races: 20.2 secs or less
Histograms Showing Different Centers
Histograms Showing Different Centers (football head coach salaries)
Histograms Same Center, Different Spread (football head coach salaries)
Excel Example: 2012-13 NFL Salaries
[Figure: Histogram; x-axis: Bin (salary); y-axis: Frequency (0 to 1000)]
StatCrunch Example: 2012-13 NFL Salaries
Grades on a statistics exam Data: 75 66 77 66 64 73 91 65 59 86 61 58 70 77 80 58 94 78 62 79 83 54 52 45 82 48 67 55
Frequency Distribution of Grades
Class Limits      Frequency
40 up to 50       2
50 up to 60       6
60 up to 70       8
70 up to 80       7
80 up to 90       5
90 up to 100      2
Total             30
Relative Frequency Distribution of Grades
Class Limits      Relative Frequency
40 up to 50       2/30 = .067
50 up to 60       6/30 = .200
60 up to 70       8/30 = .267
70 up to 80       7/30 = .233
80 up to 90       5/30 = .167
90 up to 100      2/30 = .067
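Each relative frequency is the class count divided by the total count. A minimal Python sketch of that arithmetic, assuming the class counts from the frequency distribution of grades (the names `class_counts` and `relative_frequencies` are illustrative, not from the slides):

```python
# Class counts taken from the frequency distribution of grades (total = 30).
class_counts = {
    "40 up to 50": 2,
    "50 up to 60": 6,
    "60 up to 70": 8,
    "70 up to 80": 7,
    "80 up to 90": 5,
    "90 up to 100": 2,
}


def relative_frequencies(counts):
    """Return each class count divided by the total count."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


rel_freq = relative_frequencies(class_counts)
for label, value in rel_freq.items():
    # Print each class with its relative frequency to three decimals.
    print(f"{label:<14}{value:.3f}")
```

The relative frequencies always sum to 1, which is a quick check on the table.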
[Figure: Relative Frequency Histogram of Grades; x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]
Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%
2. 5%
3. 17%
4. 30%
Stem and leaf displays
Have the following general appearance:
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
Stem and Leaf Displays
Partition each number in the data into a “stem” and a “leaf”.
Constructing a stem and leaf display:
1) Determine the stem and leaf partition (5-20 stems).
2) Write the stems in a column with the smallest stem at the top; include all stems in the range of the data.
3) Use only 1 digit in the leaves; drop digits or round off.
4) Record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps.
Example: employee ages at a small company
18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
stem: 10's digit; leaf: 1's digit
18: stem = 1; leaf = 8; 18 = 1 | 8
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
Suppose a 95 yr. old is hired
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
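The construction steps can be sketched in Python. This is a minimal illustration that assumes two-digit whole numbers with the 10's digit as stem and the 1's digit as leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines using the tens digit as stem and ones digit as leaf."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting orders the leaves within each row
        rows[v // 10].append(v % 10)
    lines = []
    # Include every stem in the range of the data, even stems with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
print("\n".join(stem_and_leaf(ages)))
print()
# Adding a 95-year-old forces the empty stems 7 and 8 to appear.
print("\n".join(stem_and_leaf(ages + [95])))
```

Keeping the empty stems is what lets the display show the gap between the 95-year-old and everyone else.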
Number of TD passes by NFL teams: 2012-2013 season (stems are 10's digit)
stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8
Pulse Rates n = 138
Advantages/Disadvantages of Stem-and-Leaf Displays
Advantages
1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)
Disadvantages
- display becomes unwieldy for large data sets
Population of 185 US cities with between 100,000 and 500,000
- Multiply stems by 100,000
Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)
Stems (top to bottom): 4 3 3 2 2 1 1 0
2012-13 leaves (top to bottom): 03, 7, 24, 6677789, 01222233444, 67889, 134, 8
1999-2000 leaves (top to bottom): 2, 6655, 43322221100, 9998887666, 421
Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? Stems are 10's digits.
1. 4
2. 6
3. 8
4. 10
5. 12
Interpreting Graphical Displays: Shape
- Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
Heights of Students in Recent Stats Class
Shape (cont.): Female heart attack patients in New York state
- Age: left-skewed
- Cost: right-skewed
Shape (cont.): Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
Example: The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population. A large gap in the distribution is typically a sign of an outlier.
Center: typical value of frozen personal pizza? ~$2.65
Spread: fuel efficiency, 4 and 8 cylinders
- 4 cylinders: more spread
- 8 cylinders: less spread
Other Graphical Methods for Economic Data
- Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.
- Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).
Unemployment Rate, by Educational Attainment
Water Use During Super Bowl
Winning Times 100 M Dash
Numerical Summaries of Quantitative Data Numerical and More Graphical Methods to Describe Univariate Data
2 characteristics of a data set to measure
- center: measures where the “middle” of the data is located
- variability: measures how “spread out” the data is
The median: a measure of center
Given a set of n measurements arranged in order of magnitude,
Median = middle value, if n is odd
       = mean of the 2 middle values, if n is even
- Ex. 2, 4, 6, 8, 10; n = 5; median = 6
- Ex. 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
Student Pulse Rates (n=62)
38, 59, 60, 62, 63, 64, 65, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 77, 77, 78, 79, 80, 80, 84, 85, 87, 90, 91, 92, 93, 94, 95, 96, 96, 98, 103
Median = (75+76)/2 = 75.5
Medians are used often
- Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)
- Median fan age: MLB 45; NFL 43; NBA 41; NHL 39
- Median existing home sales price: May 2011 $166,500; May 2010 $174,600
- Median household income (2008 dollars): 2009 $50,221; 2008 $52,029
The median splits the histogram into 2 halves of equal area
Examples
- Example: n = 7
  17.5 2.8 3.2 13.9 14.1 25.3 45.8
- Example: n = 7 (ordered)
  2.8 3.2 13.9 14.1 17.5 25.3 45.8
  m = 14.1
- Example: n = 8
  17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8
- Example: n = 8 (ordered)
  2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8
  m = (14.1+17.5)/2 = 15.8
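The median rule (middle value for odd n, mean of the 2 middle values for even n) can be written as a short Python function. This is a sketch for illustration; the function and variable names are not from the slides.

```python
def median(values):
    """Middle value if n is odd; mean of the two middle values if n is even."""
    ordered = sorted(values)          # the rule applies to the ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [17.5, 2.8, 3.2, 13.9, 14.1, 25.3, 45.8]          # n = 7
even_sample = [17.5, 2.8, 3.2, 13.9, 14.1, 25.3, 35.7, 45.8]   # n = 8

print(median(odd_sample))
print(round(median(even_sample), 1))
```

Python's standard library offers `statistics.median`, which follows the same rule, so in practice there is no need to write this by hand.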
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 4971 5245 5546 7586
1. 5245
2. 4965.5
3. 4960
4. 4971
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 5245 5546 4971 5587 7586
1. 5245
2. 4965.5
3. 5546
4. 4971
Measures of Spread
- The range and interquartile range
Ways to measure variability
range = largest - smallest
- OK sometimes; in general, too crude; sensitive to one large or small data value
- The range measures spread by examining the ends of the data
- A better way to measure spread is to examine the middle portion of the data
Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
[Figure: sorted data with Q1 = first quartile = 2.3, m = median = 3.4, Q3 = third quartile = 4.2]
Quartiles and median divide data into 4 pieces
1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4
Quartiles are common measures of spread
- http://www2.acs.ncsu.edu/UPA/admissions/fresprof.htm
- http://www2.acs.ncsu.edu/UPA/peers/current/ncsu_peers/sat.htm
- University of Southern California
Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.
Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.
Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)
- Median: m = (10+12)/2 = 22/2 = 11
- Q1: median of lower half 2 4 6 8 10; Q1 = 6
- Q3: median of upper half 12 14 16 18 20; Q3 = 16
Quartile example: odd number of data values
HRs hit by Babe Ruth in each season as a Yankee:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
- Ordered values: 22 25 34 35 41 41 46 46 46 47 49 54 54 59 60
- Median: value in ordered position 8; median = 46
- Lower half (including overall median): 22 25 34 35 41 41 46 46
- Upper half (including overall median): 46 46 47 49 54 54 59 60
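The quartile rule used in these slides (include the overall median in both halves when n is odd, in neither when n is even) can be sketched in Python as below. Other textbooks and software packages use different quartile conventions, so results from a calculator or spreadsheet may differ slightly; the function names here are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the rule stated on the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[: half + 1]   # n odd: overall median is in both halves
        upper = ordered[half:]
    else:
        lower = ordered[:half]        # n even: the halves share no value
        upper = ordered[half:]
    return median(lower), median(ordered), median(upper)


even_example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

print(quartiles(even_example))
print(quartiles(ruth))
```

For the n = 10 example this reproduces Q1 = 6, median = 11, and Q3 = 16; for the Babe Ruth data the halves printed on the slide each hold 8 values, so each quartile is the mean of the 2 middle values of its half.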
Pulse Rates n = 138
- Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70
- Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
- Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?
1. 287
2. 257.5
3. 263.5
4. 262.5
#    stem  leaf
2    22    55
4    23    57
6    24    26
7    25    7
10   26    257
12   27    59
(4)  28    1567
15   29    35599
10   30    333
7    31    45
5    32    155
2    33    6
1    34    0
Interquartile range
- lower quartile: Q1
- middle quartile: median
- upper quartile: Q3
- interquartile range (IQR): IQR = Q3 − Q1; measures spread of middle 50% of the data
Example: beginning pulse rates
- Q3 = 78; Q1 = 63
- IQR = 78 − 63 = 15
Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?
1. 23.5
2. 39.5
3. 46
4. 69.5
#    stem  leaf
2    22    55
4    23    57
6    24    26
7    25    7
10   26    257
12   27    59
(4)  28    1567
15   29    35599
10   30    333
7    31    45
5    32    155
2    33    6
1    34    0
5-number summary of data
- Minimum, Q1, median, Q3, maximum
- Pulse data: 45, 63, 70, 78, 111
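A 5-number summary and the IQR can be computed together. The sketch below assumes the quartile rule from the slides (overall median included in both halves when n is odd) and uses the small n = 10 example data, because the full pulse data set is not reproduced here; the function names are illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[: half + (n % 2)]   # n odd: lower half keeps the median
    upper = ordered[half:]              # upper half starts at the median if n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def iqr(values):
    """Interquartile range: spread of the middle 50% of the data."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(five_number_summary(data))
print(iqr(data))
```

Applied to the pulse summary on the slide, the same subtraction gives IQR = 78 − 63 = 15.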
End of General Numerical Summaries Next: Numerical Summaries of Symmetric Data
Numerical Summaries of Symmetric Data. Measure of Center: Mean Measure of Variability: Standard Deviation
Symmetric Data Body temp. of 93 adults
Recall: 2 characteristics of a data set to measure
- center: measures where the “middle” of the data is located
- variability: measures how “spread out” the data is
Measure of Center When Data Approx. Symmetric
- mean (arithmetic mean)
- notation: x̄ = (x1 + x2 + ... + xn)/n
Connection Between Mean and Histogram
- A histogram balances when supported at the mean. Mean x̄ = 140.6
Mean: balance point
Median: 50% area each half
Right histogram: mean 55.26 yrs, median 57.7 yrs
Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.
Example: class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 91 96 98 103 140
2010, 2014 baseball salaries
- 2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000
- 2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000
Disadvantage of the mean
- Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
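The class pulse rates example shows this sensitivity. The Python sketch below compares the mean and median with and without the single largest value (140); it uses the standard `statistics` module, and the variable names are illustrative.

```python
from statistics import mean, median

pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84,
          85, 85, 89, 90, 90, 91, 96, 98, 103, 140]

without_largest = pulses[:-1]  # drop the single largest value, 140

# Mean and median of all 21 values, then of the 20 values without 140.
print(round(mean(pulses), 2), median(pulses))
print(round(mean(without_largest), 2), median(without_largest))
```

Removing one observation moves the mean by nearly 3 beats per minute, while the median moves by only half a beat.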
Mean, Median, Maximum Baseball Salaries 1985-2014
[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014; x-axis: Year; left axis: Mean, Median Salary; right axis: Maximum Salary]
Skewness: comparing the mean and median
Skewed to the right (positively skewed)
- mean > median
Skewed to the left (negatively skewed)
- mean < median
- Example: mean = 78; median = 87
Symmetric data
- mean, median approx. equal
DESCRIBING VARIABILITY OF SYMMETRIC DATA
Describing Symmetric Data (cont.)
- Measure of center for symmetric data: the mean x̄
- Measure of variability for symmetric data?
Example
- 2 data sets: x1 = 49, x2 = 51, x̄ = 50; y1 = 0, y2 = 100, ȳ = 50
On average, they’re both comfortable
[Figure: the values 0 and 100 compared with the values 49 and 51]
Ways to measure variability
1. range = largest - smallest: OK sometimes; in general, too crude; sensitive to one large or small observation
Previous Example
The Sample Standard Deviation, a measure of spread around the mean
- Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations: s = sqrt( sum of (xi − x̄)² / (n − 1) )
Calculations ...
Women's height (inches)
- Mean = 63.4
- Sum of squared deviations from mean = 85.2
- (n − 1) = 13; (n − 1) is called the degrees of freedom
We’ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.
[Figure: Mean ± 1 s.d.]
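The two steps (variance first, then its square root) can be sketched in Python. The function name `sample_std` is illustrative; the data are the two small data sets from the earlier example and the summary figures from the women's height slide.

```python
import math


def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    variance = sum_sq_dev / (n - 1)   # step 1: the variance s^2
    return math.sqrt(variance)        # step 2: the standard deviation s


# Two data sets with the same mean (50) but very different spread.
print(sample_std([49, 51]))
print(sample_std([0, 100]))

# Women's heights: sum of squared deviations = 85.2 and n - 1 = 13.
heights_variance = 85.2 / 13
heights_std = math.sqrt(heights_variance)
print(round(heights_variance, 2), round(heights_std, 2))
```

In practice `statistics.stdev` from the standard library gives the same sample standard deviation.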
Population Standard Deviation
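The population version measures deviations from the population mean and divides by the number of values rather than by n − 1. A small sketch using Python's standard `statistics` module, whose `stdev` and `pstdev` functions compute the sample and population standard deviations; the five-value data set is only an illustration and is treated as a whole population in the second call.

```python
from statistics import pstdev, stdev

data = [2, 4, 6, 8, 10]

print(stdev(data))   # sample standard deviation s: divides by n - 1
print(pstdev(data))  # population standard deviation: divides by n
```

For the same numbers the sample value is always at least as large as the population value, because it divides by the smaller quantity n − 1.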
Remarks 1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement
Remarks (cont.)
2. Note that s and σ are always greater than or equal to zero.
3. The larger the value of s (or σ), the greater the spread of the data.
When does s = 0? When does σ = 0? When all data values are the same.
Remarks (cont.)
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance
- s²: sample variance
- σ²: population variance
- Units are squared units of the original data: square $, square gallons??
Remarks 6): Why divide by n-1 instead of n?
- degrees of freedom
- each observation has 1 degree of freedom
- however, when you estimate an unknown population parameter like μ, you lose 1 degree of freedom
Remarks 6) (cont.): Why divide by n-1 instead of n?
Example: Suppose we have 3 numbers whose average is 9.
- Choose ANY values for x1 and x2.
- Since the average (mean) is 9, x1 + x2 + x3 must equal 9*3 = 27, so x3 = 27 − (x1 + x2).
- Once we selected x1 and x2, x3 was determined since the average was 9.
- 3 numbers but only 2 “degrees of freedom”.
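The same argument can be checked numerically. In this Python sketch the function name `third_value` and the chosen pairs are illustrative: whatever x1 and x2 are picked, the third number is forced by the mean.

```python
def third_value(x1, x2, target_mean=9):
    """Given two freely chosen values, return the third that forces the mean."""
    return 3 * target_mean - (x1 + x2)


for x1, x2 in [(5, 10), (0, 0), (-4, 40)]:
    x3 = third_value(x1, x2)
    # The mean of the three numbers is 9 no matter which x1 and x2 were chosen.
    print(x1, x2, x3, (x1 + x2 + x3) / 3)
```

Only two of the three values are free to vary, which is why the sum of squared deviations is divided by n − 1 = 2 rather than 3.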
Computational Example
class pulse rates
Review: Properties of s and σ
- s and σ are always greater than or equal to 0; when does s = 0?
- The larger the value of s (or σ), the greater the spread of the data
- The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement
Summary of Notation
End of Chapter 4