Chapter 2 Displaying and Describing Categorical Data
Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 2, Slide 1-1
The Three Rules of Data Analysis
- The three rules of data analysis won’t be difficult to remember:
  1. Make a picture: things may be revealed that are not obvious in the raw data. These will be things to think about.
  2. Make a picture: important features of and patterns in the data will show up. You may also see things that you did not expect.
  3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
Election Results 2012
[Figure: 2012 election results, Obama vs. Romney]
Who won again?
Election Results (cont.)
This map shows the reality…
What’s Wrong With This Picture?
- You might think that a good way to show the Titanic data is with this display:
The Area Principle
- The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride.
- When we look at each ship, we see the area taken up by the ship, instead of the length of the ship.
- The ship display violates the area principle:
  - The area occupied by a part of the graph should correspond to the size of the value it represents. (Bar graphs and pie charts do this well.)
Frequency Tables: Making Piles
- To make an accurate picture, make piles of data.
- We can “pile” the data by counting the number of data values in each category of interest.
- We can organize these counts into a frequency table, which records the totals and the category names.
Frequency Tables: Making Piles (cont.)
- A relative frequency table is similar, but gives the percentages (instead of counts) for each category.
Frequency Tables: Making Piles (cont.)
- Both types of tables show how cases are distributed across the categories.
- They describe the distribution of a categorical variable because they name the possible categories and tell how frequently each occurs.
- Pro-tip: relative frequency is just a fancy name for percent (or proportion).
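The "piling" idea above can be sketched in a few lines of Python. This is only an illustration: the list `ticket_classes` is a made-up handful of cases, not the Titanic data, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one ticket class recorded per person.
ticket_classes = ["Crew", "Third", "Crew", "First", "Third", "Second", "Crew", "Third"]

# Frequency table: count the cases that fall in each category.
counts = Counter(ticket_classes)
total = sum(counts.values())

# Relative frequency table: each count expressed as a percentage of the total.
percents = {category: 100 * count / total for category, count in counts.items()}

# Print category, count, and percent side by side.
for category, count in counts.most_common():
    print(f"{category:<8}{count:>4}{percents[category]:>8.1f}%")
```

Both tables describe the same distribution; the percentages simply rescale the counts so that they add to 100.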
Bar Charts
- A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
- A bar chart stays true to the area principle.
- The bars are usually spaced apart to make the graph more readable.
Bar Charts (cont.)
- A relative frequency bar chart displays the relative proportion of counts for each category.
- A relative frequency bar chart also stays true to the area principle.
Replacing counts with percentages in the ship data: Doesn’t look much different, does it?
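A rough text-only sketch of why the two bar charts look alike: with equal-width bars, the area principle reduces to making each bar's length proportional to its value, and dividing every count by the same total does not change the proportions. The class totals below are the figures commonly reported for the Titanic data (an assumption here, since the slides show them only in a figure), and `text_bar` is a hypothetical helper.

```python
# Class totals commonly reported for the Titanic data (assumed, not from the slides).
class_counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(class_counts.values())


def text_bar(value, largest, width=40):
    """Return a bar of '#' whose length is proportional to value."""
    return "#" * round(width * value / largest)


# Bars built from counts.
count_bars = {c: text_bar(n, max(class_counts.values())) for c, n in class_counts.items()}

# Bars built from percentages of the total.
percents = {c: 100 * n / total for c, n in class_counts.items()}
percent_bars = {c: text_bar(p, max(percents.values())) for c, p in percents.items()}

for c in class_counts:
    print(f"{c:<7}{class_counts[c]:>5}   {count_bars[c]}")
for c in class_counts:
    print(f"{c:<7}{percents[c]:>5.1f}%  {percent_bars[c]}")
```

Only the axis labels change between the two charts; the bar lengths are the same.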
Pie Charts
- When you are interested in parts of the whole, a pie chart might be your display of choice.
- Pie charts show the whole group of cases as a circle.
- They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
Contingency Tables
- A contingency table allows us to look at two categorical variables together.
- It shows how individuals are distributed along each variable, contingent on the value of the other variable.
  - Example: we can examine the class of ticket and whether a person survived the Titanic:
Contingency Tables (cont.)
- The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
- Each frequency distribution is called a marginal distribution of its respective variable.
  - The marginal distribution of Survival is:
Contingency Tables (cont.)
- Each cell of the table gives the count for a combination of values of the two variables.
  - For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.
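A minimal sketch of a contingency table and its margins in Python. The cell counts are the ones commonly reported for the Titanic data and are an assumption here; the slides themselves confirm only the 673 crew deaths. The variable names are hypothetical.

```python
# Cell counts commonly reported for the Titanic data (assumed; the slides
# confirm only the 673 crew deaths). Rows: Survival, columns: ticket Class.
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}
classes = ["First", "Second", "Third", "Crew"]

# Marginal distribution of Survival: the row totals in the right margin.
survival_totals = {status: sum(row.values()) for status, row in table.items()}

# Marginal distribution of Class: the column totals in the bottom margin.
class_totals = {c: sum(table[status][c] for status in table) for c in classes}

grand_total = sum(survival_totals.values())

print("Survival margin:", survival_totals)
print("Class margin:   ", class_totals)
print("Grand total:    ", grand_total)
```

Each margin is an ordinary frequency table for one variable, obtained by summing across the other.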
Conditional Distributions
- A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
  - The following is the conditional distribution of ticket Class, conditional on having survived:
Conditional Distributions (cont.)
- The following is the conditional distribution of ticket Class, conditional on having perished:
Conditional Distributions (cont.)
- The conditional distributions tell us that there is a difference in class for those who survived and those who perished.
- This is better shown with pie charts of the two distributions:
Conditional Distributions (cont.)
- We see that the distribution of Class for the survivors is different from that of the nonsurvivors.
- This leads us to believe that Class and Survival are associated, that is, they are not independent.
- The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.
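The comparison described above can be sketched as follows. The cell counts are again the commonly reported Titanic figures (an assumption; only the 673 crew deaths appear in the slides). The helper `looks_independent` and its tolerance are hypothetical and informal, not a statistical test.

```python
# Cell counts commonly reported for the Titanic data (assumed, see lead-in).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead": {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}


def conditional_distribution(row):
    """Turn one row of counts into proportions of that row's total."""
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}


# Distribution of Class, conditional on each Survival status.
class_given_survival = {
    status: conditional_distribution(row) for status, row in table.items()
}


def looks_independent(distributions, tolerance=0.01):
    """Informal check: every conditional distribution matches the first one."""
    first, *rest = distributions.values()
    return all(abs(d[c] - first[c]) <= tolerance for d in rest for c in first)


for status, dist in class_given_survival.items():
    shares = ", ".join(f"{c} {100 * p:.1f}%" for c, p in dist.items())
    print(f"{status}: {shares}")
print("Looks independent?", looks_independent(class_given_survival))
```

Under these counts, first class makes up about 29% of survivors but only about 8% of those who perished, so the two conditional distributions clearly differ.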
Segmented Bar Charts
- A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
- Each bar is treated as the “whole” and is divided proportionally into segments corresponding to the percentage in each group.
- Here is the segmented bar chart for ticket Class by Survival status:
Segmented Bar Charts (cont.)
- Note that using percentages is crucial. It allows us to compare groups of different sample sizes.
Mosaic Plots
- A mosaic plot is a graphical display that allows you to examine the relationship among two or more categorical variables.
- The mosaic plot starts as a square with length one. The square is divided first into horizontal bars whose widths are proportional to the probabilities associated with the first categorical variable. Then each bar is split vertically into bars that are proportional to the conditional probabilities of the second categorical variable. Additional splits can be made, if desired, using a third variable, a fourth variable, and so on.
Mosaic Plots (cont.)
- If you wanted to compare the mortality rates between men and women using a mosaic plot, you would first divide the unit square according to the overall proportion of males and females. Roughly 35% of the passengers were female, so the first split of the mosaic plot is 35/65.
Mosaic Plots (cont.)
- Next, split each bar vertically according to the proportion who lived and died. Among females, 67% survived (coded as 1 on this plot) and 33% died (coded as 0), so the female bar shows a 67/33 split. Among males, only 17% survived, so this bar shows a 17/83 split. The strong disparity in mortality between men and women is demonstrated by the lack of alignment of the two vertical separating lines.
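The two splits can be worked through numerically. The sketch below uses only the proportions quoted in the slides (35/65, 67/33, 17/83); the function `mosaic_cells` and its field names are hypothetical, and the code computes cell sizes only, without drawing anything.

```python
# Proportions quoted in the slides.
sex_share = {"Female": 0.35, "Male": 0.65}
survival_given_sex = {
    "Female": {"Survived": 0.67, "Died": 0.33},
    "Male": {"Survived": 0.17, "Died": 0.83},
}


def mosaic_cells(first_split, second_split):
    """Return the size of every cell of a two-variable mosaic plot.

    The first split sets each bar's share of the unit square; the second
    split divides each bar by the conditional proportions. A cell's area
    is the product of the two, which is the joint proportion.
    """
    cells = {}
    for group, share in first_split.items():
        for outcome, conditional in second_split[group].items():
            cells[(group, outcome)] = {
                "bar_share": share,
                "segment_share": conditional,
                "area": share * conditional,
            }
    return cells


cells = mosaic_cells(sex_share, survival_given_sex)
for (group, outcome), cell in cells.items():
    print(f"{group:<7}{outcome:<9} area = {cell['area']:.4f}")
```

The four areas add to 1, the area of the unit square. If survival were independent of sex, the `segment_share` values would be the same in both bars and the separating lines would align.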
Mosaic Plots (cont.)
- If the survival rates were identical among men and women, the mosaic plot would look like this:
Mosaic Plots (cont.)
- Most implementations of the mosaic plot offer as a default a small margin around each cell to make the graph easier to read.
Mosaic Plots (cont.)
- Here is a mosaic plot looking at the relationship between passenger class and mortality.
Mosaic Plots (cont.)
- You can add a third split to examine the influence of the combination of sex and passenger class on mortality.
What Can Go Wrong?
- Don’t violate the area principle.
  - While some people might like the pie chart on the left better, it makes it harder to compare fractions of the whole, a comparison that a flat pie chart makes easy.
What Can Go Wrong? (cont.)
- Keep it honest: make sure your display shows what it says it shows.
  - This plot of the percentage of high-school students who engage in specified dangerous behaviors has a problem. Can you see it?
What Can Go Wrong? (cont.)
- Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context. For example:
  - What percentage of seniors are males?
  - What percentage of males are seniors?
- Don’t forget to look at the variables separately too: examine the marginal distributions, since it is important to know how many cases are in each category.
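The two similar-sounding questions use the same cell count but different denominators. A small sketch with made-up counts (every number and name below is hypothetical):

```python
# Hypothetical counts of students by class year and sex.
counts = {
    "Senior": {"Male": 40, "Female": 60},
    "Junior": {"Male": 80, "Female": 70},
}

male_seniors = counts["Senior"]["Male"]

# Denominator 1: all seniors (a row total).
all_seniors = sum(counts["Senior"].values())
# Denominator 2: all males (a column total).
all_males = sum(row["Male"] for row in counts.values())

pct_seniors_who_are_male = 100 * male_seniors / all_seniors
pct_males_who_are_senior = 100 * male_seniors / all_males

print(f"Percent of seniors who are male: {pct_seniors_who_are_male:.1f}%")
print(f"Percent of males who are seniors: {pct_males_who_are_senior:.1f}%")
```

The group named after "of" in the question tells you which total goes in the denominator.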
What Can Go Wrong? (cont.)
- Don’t overstate your case: don’t claim something you can’t.
- Don’t use unfair or silly averages: this could lead to Simpson’s Paradox, so be careful when you average one variable across different levels of a second variable. (Simpson’s Paradox is not something you need to know for the AP exam.)
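For the curious, here is a small illustration of Simpson’s Paradox using made-up on-time records for two pilots. All names and numbers are hypothetical and chosen only to make the reversal visible.

```python
# Hypothetical (on_time, total) flight counts for two pilots, by time of day.
flights = {
    "Moe": {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20), "Night": (75, 100)},
}

# On-time rate within each level of the second variable (time of day).
rates = {
    pilot: {period: on_time / total for period, (on_time, total) in record.items()}
    for pilot, record in flights.items()
}

# Overall on-time rate, pooling day and night flights together.
overall = {
    pilot: sum(on_time for on_time, _ in record.values())
    / sum(total for _, total in record.values())
    for pilot, record in flights.items()
}

for pilot in flights:
    print(
        f"{pilot:<5} day {rates[pilot]['Day']:.0%}  "
        f"night {rates[pilot]['Night']:.0%}  overall {overall[pilot]:.1%}"
    )
```

With these numbers Jill has the better rate both by day and by night, yet Moe has the better overall rate, because most of his flights were in the easier daytime conditions. Averaging across the levels of time of day hides that imbalance.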
What have we learned?
- We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents).
- We can display the distribution in a bar chart or pie chart.
- And, we can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables.
AP Tips
- Labels and scales on all graphs are required for full credit.
AP Tips (cont.)
- Checking for independence is a VERY important idea in this course. We will revisit this topic in Chapter 14 and again in Chapter 25!
AP Tips (cont.)
- Segmented bar graphs are powerful, but only when the categories add to a meaningful total.
- Take a look at the 2010 AP exam, problem #6. Why can this graph not be segmented? There are many different correct graphs for this problem. How many can you make?
- Slides: 39