Chapter 3 Graphical Methods for Describing Data
Graphs for categorical data
Double Bar Charts
When to Use: Categorical data
How to construct
– Constructed like bar charts, but with two (or more) groups being compared
– MUST use relative frequencies on the vertical axis
– MUST include a key to denote the different bars
Why MUST we use relative frequencies?
Each year the Princeton Review conducts a survey of students applying to college and of parents of college applicants. In 2009, 12,715 high school students responded to the question “Ideally how far from home would you like the college you attend to be?” Also, 3,007 parents of students applying to college responded to the question “How far from home would you like the college your child attends to be?” The data are displayed in the frequency table below.

Frequency
Ideal Distance           Students   Parents
Less than 250 miles        4450      1594
250 to 500 miles           3942       902
500 to 1000 miles          2416       331
More than 1000 miles       1907       180

Create a comparative bar chart with these data. What should you do first?
Relative Frequency
Ideal Distance           Students   Parents
Less than 250 miles        .35       .53
250 to 500 miles           .31       .30
500 to 1000 miles          .19       .11
More than 1000 miles       .15       .06

Each student value is found by dividing the frequency by the total number of students; each parent value is found by dividing the frequency by the total number of parents.
What does this graph show about the ideal distance college should be from home?
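The relative frequency table can be reproduced with a few lines of Python. This is a minimal sketch using only the frequencies given in the survey table; the variable and function names are illustrative. It also suggests why relative frequencies are required: the two groups differ greatly in size (12,715 students versus 3,007 parents), so raw counts would make every student bar taller regardless of opinion.

```python
# Frequencies from the Princeton Review table in the text.
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]
parents = [1594, 902, 331, 180]


def relative_frequencies(counts):
    """Divide each frequency by the total for that group."""
    total = sum(counts)
    return [count / total for count in counts]


student_rel = relative_frequencies(students)
parent_rel = relative_frequencies(parents)

# Print the two columns side by side, rounded to two decimals.
for name, s, p in zip(categories, student_rel, parent_rel):
    print(f"{name:<22} {s:.2f}  {p:.2f}")
```

Because each column is divided by its own group total, both columns sum to 1 and the bars for the two groups can be compared directly.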
Segmented (or Stacked) Bar Charts
When to Use: Categorical data
How to construct
– MUST first calculate relative frequencies
– Draw a bar representing 100% of the group
– Divide the bar into segments corresponding to the relative frequencies of the categories
Remember the Princeton survey... Create a segmented bar graph with these data.

Relative Frequency
Ideal Distance           Students   Parents
Less than 250 miles        .35       .53
250 to 500 miles           .31       .30
500 to 1000 miles          .19       .11
More than 1000 miles       .15       .06

First draw a bar that represents 100% of the students who answered the survey.
First draw a bar that represents 100% of the students who answered the survey.
Next, divide the bar into segments.
Do the same thing for parents – don’t forget a key denoting each category.
[Figure: segmented bar chart with relative frequency on the vertical axis and one bar each for Students and Parents; key: Less than 250 miles, 250 to 500 miles, 500 to 1000 miles, More than 1000 miles]
Notice that this segmented bar chart displays the same relationship between the opinions of students and parents concerning the ideal distance that college is from home as the double bar chart does.
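Dividing a bar into segments amounts to taking running totals of the relative frequencies. The sketch below is one way to compute where each segment starts and ends on a bar that runs from 0 to 1; the function name is illustrative, and the inputs are the rounded relative frequencies from the table.

```python
from itertools import accumulate

# Rounded relative frequencies from the table in the text.
student_rel = [0.35, 0.31, 0.19, 0.15]
parent_rel = [0.53, 0.30, 0.11, 0.06]


def segment_bounds(rel_freqs):
    """Return (bottom, top) of each segment on a bar of total height 1."""
    tops = list(accumulate(rel_freqs))      # running totals
    bottoms = [0.0] + tops[:-1]             # each segment starts where the last ended
    return [(round(b, 2), round(t, 2)) for b, t in zip(bottoms, tops)]


student_segments = segment_bounds(student_rel)
parent_segments = segment_bounds(parent_rel)
print(student_segments)
print(parent_segments)
```

The last segment of each bar ends at 1.0, which is the check that the bar represents 100% of the group.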
Pie (Circle) Chart
When to Use: Categorical data
How to construct
– Draw a circle to represent the entire data set
– Calculate the size of each “slice”: Relative frequency × 360°
– Using a protractor, mark off each slice
To describe – comment on which category had the largest proportion or smallest proportion
Typos on a résumé do not make a very good impression when applying for a job. Senior executives were asked how many typos in a résumé would make them not consider a job candidate. The resulting data are summarized in the table below.

Number of Typos   Frequency   Relative Frequency
1                    60            .40
2                    54            .36
3                    21            .14
4 or more            10            .07
Don’t know            5            .03

Create a pie chart for these data.
First draw a circle to represent the entire data set.
Next, calculate the size of the slice for “1 typo”: .40 × 360º = 144º. Draw that slice.
Repeat for each slice.
Here is the completed pie chart created using Minitab.
What does this pie chart tell us about the number of typos occurring in résumés before the applicant would not be considered for a job?
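The slice calculation can be repeated for every category. The sketch below computes each angle from the frequencies in the table (the dictionary and variable names are illustrative). It works from the counts rather than the rounded relative frequencies, so the angle for "4 or more" comes out as 24° rather than the 25.2° that .07 × 360° would give.

```python
# Frequencies from the résumé typo table in the text.
typo_counts = {"1": 60, "2": 54, "3": 21, "4 or more": 10, "Don't know": 5}

total = sum(typo_counts.values())

# Slice angle = relative frequency x 360 degrees.
angles = {label: count * 360 / total for label, count in typo_counts.items()}

for label, angle in angles.items():
    print(f"{label:<11} {angle:6.1f} degrees")
```

The angles add up to 360°, so the slices fill the whole circle.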
How to describe a numerical, univariate graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
1. Center
• discuss where the middle of the data falls
• three measures of central tendency – mean, median, & mode
The mean and/or median is typically reported rather than the mode.
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data is
• refers to the variability in the data
• Measures of spread are range, standard deviation, and IQR
Remember, Range = maximum value – minimum value
Standard deviation & IQR will be discussed in Chapter 4.
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal
The following slides will discuss these shapes.
Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type – has a center mound with two sloping tails
1. Collect data by rolling two dice and recording the sum of the two dice. Repeat three times.
2. Plot your sums on the dotplot on the board.
3. What shape does this distribution have?
Uniform
• refers to data in which every class has equal or approximately equal frequency
1. Collect data by rolling a single die and recording the number rolled. Repeat five times.
2. Plot your numbers on the dotplot on the board.
3. What shape does this distribution have?
To help remember the name for this shape, picture soldiers standing in straight lines. What are they wearing?
Skewed
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
The directions are positively (or right) skewed or negatively (or left) skewed.
1. Collect data by finding the age of five coins in circulation (current year minus year of coin) and recording it.
2. Plot the ages on the dotplot on the board.
3. What shape does this distribution have?
Name a variable with a distribution that is negatively skewed.
Bimodal (multi-modal)
• refers to the number of peaks in the shape of the distribution
• Bimodal would have two peaks
• Multi-modal would have more than two peaks
Suppose you collect data on the time it takes to drive from San Luis Obispo, California to Monterey, California. Some people may take the inland route (approximately 2.5 hours) while others may take the coastal route (between 3.5 and 4 hours). What shape would this distribution have?
Bimodal distributions can occur when the data set consists of observations from two different kinds of individuals or objects.
What would a distribution be called if it had ONLY one peak? Unimodal
What strikes you as the most distinctive difference among the distributions of exam scores in class J?
4. Unusual occurrences
• Outlier - value that lies away from the rest of the data
• Gaps
• Clusters
5. In context • You must write your answer in reference to the context in the problem, using correct statistical vocabulary and using complete sentences!
Graphs for numerical data
Stem-and-Leaf Displays
When to Use: Univariate numerical data
Use for small to moderate sized data sets. Doesn’t work well for large data sets.
How to construct
Each number is split into two parts:
Stem – consists of the first digit(s)
Leaf – consists of the final digit(s)
– Select one or more of the leading digits for the stem
– List the possible stem values in a vertical column. Be sure to list every stem from the smallest to the largest value.
– Record the leaf for each observation beside each corresponding stem value
– Indicate the units for stems and leaves in a key or legend
If you have a long list of leaves behind a few stems, you can split stems in order to spread out the distribution.
Can also create comparative stem-and-leaf displays.
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
Remember the data set collected in Chapter 1 – how many piercings do you have? Would a stem-and-leaf display be a good graph for this distribution? Why or why not?
The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Create a stem-and-leaf display with these data.
What would an appropriate stem be?
List the stems vertically.
For the observation of “0.32”, write the 2 behind the “3” stem.
Continue recording each leaf with the corresponding stem.

Stem   Leaf
1      7
2      1 9 8 3
3      2 6
4
5      4
Key: 3 | 2 represents $0.32 per ounce

Describe this distribution.
The median price per ounce for dandruff shampoo is $0.285, with a range of $0.37. The distribution is positively skewed with an outlier at $0.54.
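The construction steps can be sketched in Python. This is an illustrative helper, not part of the original slides: it treats the tenths digit as the stem and the hundredths digit as the leaf, records leaves in data order, and lists every stem from smallest to largest, including empty ones. It also checks the median and range quoted in the description.

```python
from collections import defaultdict
from statistics import median

# Price per ounce for the eight shampoo brands in the text.
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]


def stem_and_leaf(values):
    """Stem = tenths digit, leaf = hundredths digit, leaves in data order."""
    cents = [round(v * 100) for v in values]  # whole cents avoid float digit errors
    leaves = defaultdict(list)
    for c in cents:
        stem, leaf = divmod(c, 10)
        leaves[stem].append(leaf)
    # List every stem from the smallest to the largest, even if it has no leaves.
    first, last = min(cents) // 10, max(cents) // 10
    return {stem: leaves.get(stem, []) for stem in range(first, last + 1)}


display = stem_and_leaf(prices)
for stem, stem_leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")

price_median = median(prices)              # average of the two middle values
price_range = max(prices) - min(prices)    # maximum minus minimum
print(f"median = {price_median:.3f}, range = {price_range:.2f}")
```

The empty stem 4 is what makes the gap before the outlier at $0.54 visible in the display.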
The Census Bureau projects the median age in 2030 for the 50 states and Washington D.C. A stem-and-leaf display is shown below.
Notice that you really cannot see a distinctive shape for this distribution due to the long list of leaves.
We can split the stems in order to better see the shape of the distribution.
We use L for lower leaf values (0-4) and H for higher leaf values (5-9).
Notice that now you can see the shape of this distribution.
The following is data on the percentage of primary-school-aged children who are enrolled in school for 19 countries in Northern Africa and for 23 countries in Central Africa.

Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9 98.8 91.6 97.8 96.1 92.2 94.9 98.6 86.6 96.9 88.9

Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43.0 85.0 63.4 58.4 61.9 40.9 73.9 34.8 74.4 97.4 61.0 66.7 79.6

Create a comparative stem-and-leaf display.
What is an appropriate stem?
Let’s truncate the leaves to the unit place. “4.6” becomes “4”.
Be sure to use comparative language when describing these distributions!
The median percentage of primary-school-aged children enrolled in school is larger for countries in Northern Africa than in Central Africa, but the ranges are the same. The distribution for countries in Northern Africa is strongly negatively skewed, but the distribution for countries in Central Africa is approximately symmetrical.
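A comparative (back-to-back) display can be sketched as follows. The helper name is illustrative; it uses the tens digit as the stem and truncates each value to the units place for the leaf, as described above. Only the values that appear in the text are used, which gives 19 Northern Africa values and 22 Central Africa values.

```python
from statistics import median

# Enrollment percentages as listed in the text.
northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9, 98.8,
            91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6, 96.9, 88.9]
central = [58.3, 34.6, 35.5, 45.4, 38.6, 63.8, 53.9, 61.9, 69.9, 43.0, 85.0,
           63.4, 58.4, 61.9, 40.9, 73.9, 34.8, 74.4, 97.4, 61.0, 66.7, 79.6]


def truncated_leaves(values):
    """Stem = tens digit, leaf = units digit; the decimal part is dropped."""
    table = {}
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)  # int() truncates 54.6 to 54
        table.setdefault(stem, []).append(leaf)
    return table


north = truncated_leaves(northern)
cent = truncated_leaves(central)

# Shared stems in the middle; Northern leaves read outward to the left.
for stem in range(3, 10):
    left = "".join(str(leaf) for leaf in reversed(north.get(stem, [])))
    right = "".join(str(leaf) for leaf in cent.get(stem, []))
    print(f"{left:>10} | {stem} | {right}")

north_median = median(northern)
central_median = median(central)
print(north_median, central_median)
```

With these values the Northern Africa leaves pile up on the 8 and 9 stems while the Central Africa leaves center on the 6 stem, which matches the comparison of medians and shapes given in the description.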
Histograms
When to Use: Univariate numerical data
Constructed differently for discrete versus continuous data.
How to construct (discrete data)
– Draw a horizontal scale and mark it with the possible values for the variable
– Draw a vertical scale and mark it with frequency or relative frequency
– Above each possible value, draw a rectangle centered at that value with a height corresponding to its frequency or relative frequency
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
For comparative histograms – use two separate graphs with the same scale on the horizontal axis
Queen honey bees mate shortly after they become adults. During a mating flight, the queen usually takes several partners, collecting sperm that she will store and use throughout the rest of her life. A study on honey bees provided the following data on the number of partners for 30 queen bees. 12 8 9 2 3 7 4 5 5 6 6 4 6 7 7 7 10 4 8 1 6 7 9 7 8 11 6 10 Create a histogram for the number of partners of the queen bees.
First draw a horizontal axis, scaled with the possible values of the variable of interest.
Next draw a vertical axis, scaled with frequency or relative frequency.
Draw a rectangle above each value with a height corresponding to the frequency.
Suppose we use relative frequency instead of frequency on the vertical axis. What do you notice about the shapes of these two histograms?
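The heights of the rectangles can be tabulated before drawing. The sketch below counts each possible value for the queen bee data and prints a rough text histogram; the names are illustrative, and it uses the 28 values that actually appear in the text. Since every relative frequency is the frequency divided by the same total, the two histograms have the same shape and differ only in the vertical scale.

```python
from collections import Counter

# Number of partners as listed in the text (28 values appear).
partners = [12, 8, 9, 2, 3, 7, 4, 5, 5, 6, 6, 4, 6, 7, 7, 7, 10, 4,
            8, 1, 6, 7, 9, 7, 8, 11, 6, 10]

freq = Counter(partners)
n = len(partners)

# One row per possible value, including any value with frequency 0.
rel_freq = {}
for value in range(min(partners), max(partners) + 1):
    count = freq[value]
    rel_freq[value] = count / n
    print(f"{value:>2}  {count:>2}  {rel_freq[value]:.3f}  {'#' * count}")
```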
Histograms
When to Use: Univariate numerical data
How to construct (continuous data)
– Mark the boundaries of the class intervals on the horizontal axis
– Draw a vertical scale and mark it with frequency or relative frequency
– Draw a rectangle directly above each class interval with a height corresponding to its frequency or relative frequency
This is the type of histogram that most students are familiar with.
To describe – comment on the center, spread, and shape of the distribution and if there are any unusual features
A study examined the number of hours spent watching TV per day for a sample of children age 1 and for a sample of children age 3. Below are the comparative histograms.
[Figure: comparative histograms, panels “Children Age 1” and “Children Age 3”]
Notice the common scale on the horizontal axis.
Write a few sentences comparing the distributions.
The median number of hours spent watching TV per day was greater for the 1-year-olds than for the 3-year-olds. The distribution for 3-year-olds was more strongly skewed right than the distribution for the 1-year-olds, but the two distributions had similar ranges.
Histograms with unequal intervals
When to use - when you have a concentration of data in the middle with some extreme values
How to construct - construct similar to histograms with continuous data, but with density on the vertical axis
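Density is the relative frequency of an interval divided by the width of that interval, so that the area of each rectangle (not its height) equals the relative frequency. The sketch below illustrates the calculation; the intervals and relative frequencies are hypothetical values made up for the example, not data from the slides.

```python
# Hypothetical class intervals (lower, upper) with unequal widths,
# and hypothetical relative frequencies that sum to 1.
intervals = [(0, 10), (10, 15), (15, 20), (20, 40)]
rel_freq = [0.10, 0.35, 0.40, 0.15]

# density = relative frequency / interval width
density = [rf / (upper - lower) for (lower, upper), rf in zip(intervals, rel_freq)]

# The area of each rectangle (height x width) recovers the relative frequency.
areas = [d * (upper - lower) for (lower, upper), d in zip(intervals, density)]

for (lower, upper), d in zip(intervals, density):
    print(f"{lower:>2} to <{upper:<2}  density = {d:.4f}")
```

In this example the wide interval from 20 to 40 gets a short rectangle even though it holds 15% of the data, which is the distortion that plotting relative frequency directly would introduce.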
Cumulative Relative Frequency Plot
When to use - used to answer questions about percentiles
A percentile is a value with a given percent of observations at or below that value.
How to construct
- Mark the boundaries of the intervals on the horizontal axis
- Draw a vertical scale and mark it with relative frequency
- Plot the point corresponding to the upper end of each interval with its cumulative relative frequency, including the beginning point
- Connect the points.
The National Climatic Center has been collecting weather data for many years. The annual rainfall amounts for Albuquerque, New Mexico from 1950 to 2008 were used to create the frequency distribution below.
Find the cumulative relative frequency for each interval.

Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                            0.052                 0.052
5 to <6                            0.103                 0.155
6 to <7                            0.086                 0.241
7 to <8                            0.103
8 to <9                            0.172
9 to <10                           0.069
10 to <11                          0.207
11 to <12                          0.103
12 to <13                          0.052
13 to <14                          0.052

Continue this pattern to complete the table.
The National Climatic Center has been collecting weather data for many years. The annual rainfall amounts for Albuquerque, New Mexico from 1950 to 2008 were used to create the frequency distribution below.

Annual Rainfall (in inches)   Relative frequency   Cumulative relative frequency
4 to <5                            0.052                 0.052
5 to <6                            0.103                 0.155
6 to <7                            0.086                 0.241
7 to <8                            0.103                 0.344
8 to <9                            0.172                 0.516
9 to <10                           0.069                 0.585
10 to <11                          0.207                 0.792
11 to <12                          0.103                 0.895
12 to <13                          0.052                 0.947
13 to <14                          0.052                 0.999

In the context of this problem, explain the meaning of this value.
Why isn’t this value one (1)?
To create a cumulative relative frequency plot, graph a point for the upper value of the interval and the cumulative relative frequency.
Plot a point for each interval. Plot a starting point at (4, 0). Connect the points.
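The cumulative column is a running total of the relative frequency column, and the plotted points pair each interval's upper end with that running total. A minimal sketch using the values from the table follows (variable names are illustrative). The final total of 0.999 rather than 1 is most likely a rounding effect, since each relative frequency in the table is rounded to three decimals.

```python
from itertools import accumulate

# Upper end of each rainfall interval (inches) and its relative frequency,
# taken from the table in the text.
upper_ends = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
rel_freq = [0.052, 0.103, 0.086, 0.103, 0.172, 0.069, 0.207, 0.103, 0.052, 0.052]

# Running totals, rounded to three decimals to match the table.
cumulative = [round(total, 3) for total in accumulate(rel_freq)]

# Points for the plot, including the starting point at (4, 0).
points = [(4, 0.0)] + list(zip(upper_ends, cumulative))

for x, y in points:
    print(f"({x}, {y:.3f})")
```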
[Plot: cumulative relative frequency (vertical axis) versus rainfall (horizontal axis)]
What proportion of years had rainfall amounts that were 9.5 inches or less?
Approximately 0.55
[Plot: cumulative relative frequency (vertical axis) versus rainfall (horizontal axis)]
Approximately 30% of the years had annual rainfall less than what amount?
Approximately 7.5 inches
[Plot: cumulative relative frequency (vertical axis) versus rainfall (horizontal axis)]
Which interval of rainfall amounts had a larger proportion of years – 9 to 10 inches or 10 to 11 inches? Explain.
The interval 10 to 11 inches, because its slope is steeper, indicating that a larger proportion of years fell in that interval.
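Reading values off the plot corresponds to linear interpolation between the plotted points. The sketch below shows one way to reproduce the three answers numerically; the function names are illustrative and the points are the ones from the completed table, with the starting point (4, 0).

```python
# (rainfall in inches, cumulative relative frequency) from the completed table.
points = [(4, 0.0), (5, 0.052), (6, 0.155), (7, 0.241), (8, 0.344), (9, 0.516),
          (10, 0.585), (11, 0.792), (12, 0.895), (13, 0.947), (14, 0.999)]


def cumulative_at(rainfall):
    """Proportion of years at or below a rainfall amount (read up from the x-axis)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= rainfall <= x1:
            return y0 + (y1 - y0) * (rainfall - x0) / (x1 - x0)
    raise ValueError("rainfall is outside the plotted range")


def rainfall_at(proportion):
    """Rainfall amount for a cumulative proportion (read across from the y-axis)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= proportion <= y1:
            return x0 + (x1 - x0) * (proportion - y0) / (y1 - y0)
    raise ValueError("proportion is outside the plotted range")


def slope(lower, upper):
    """Rise over run between two interval boundaries on the plot."""
    return (cumulative_at(upper) - cumulative_at(lower)) / (upper - lower)


print(f"{cumulative_at(9.5):.2f}")   # proportion of years with 9.5 inches or less
print(f"{rainfall_at(0.30):.2f}")    # rainfall below which about 30% of years fall
print(slope(9, 10) < slope(10, 11))  # steeper segment means a larger proportion
```

The interpolated values (about 0.55 and about 7.6 inches) agree with the approximate answers read from the plot, and the slope over 10 to 11 inches equals that interval's relative frequency of 0.207, compared with 0.069 for 9 to 10 inches.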
Displaying Bivariate Numerical Data
Scatterplots
When to Use: Bivariate numerical data
How to construct
- Draw a horizontal scale and mark it with appropriate values of the independent variable
- Draw a vertical scale and mark it with appropriate values of the dependent variable
- Plot each point corresponding to the observations
To describe - comment on the relationship between the variables
Scatterplots are discussed in much greater depth in Chapter 5.
Time Series Plots
When to Use - measurements collected over time at regular intervals
Can be considered bivariate data where the y-variable is the variable measured and the x-variable is time.
How to construct
- Draw a horizontal scale and mark it with appropriate values of time
- Draw a vertical scale and mark it with appropriate values of the observed variable
- Plot each point corresponding to the observations and connect
To describe - comment on any trends or patterns over time
The accompanying time-series plot of movie box office totals (in millions of dollars) over 18 weeks in the summer for 2001 and 2002 appeared in USA Today (September 3, 2002). Describe any trends or patterns that you see.