Chapter 3: Displaying and Describing Categorical Data
Sarah Lovelace and Alison Vicary
Period 2
Vocabulary
● Frequency table: table listing the category names and the count for each category
  o gives the frequency with which each category occurs
● Relative frequency table: frequency table giving the percentage of the total number of events/items that falls in each category
  o If all possible categories are given, relative frequencies (percents) should add up to 100%.
  o Makes it easier to compare two tables with the same categories but different overall totals
● Distribution: The distribution of a variable describes the variable’s possible values and the relative frequency of each value.
  o can be seen in a relative frequency table
Vocabulary
● Area principle: The area occupied by a part of the graph should correspond to the size of the value it represents.
● Bar chart: displays the distribution of a categorical variable using bars
  o Allows visual comparison of frequencies
  o Has small spaces between the bars
  o Could be a relative frequency bar chart
● Pie chart: displays the distribution of a categorical variable with slices of a circle proportional to the categories’ relative frequencies
  o must include a slice for every category so that the relative frequencies add up to 100%
Vocabulary
● Contingency table: table with counts or percentages of individuals falling into categories of multiple variables
  o Shows number of individuals in a category, contingent on being in another category
  o Percents can be included in each cell, giving percent of row, column, or entire table
  o Marginal distribution: frequency distribution of a single variable from a contingency table
    § may be written in the margin of the contingency table with totals
  o Conditional distribution: distribution of one variable just for individuals that satisfy a condition of another variable
    § Example: distribution of grade level, conditional on being in Statistics
    § Segmented bar chart: bars for each category are divided into segments with sizes proportional to the relative frequencies of another variable conditional on being in that bar’s category
Vocabulary
● Independence: Variables are considered independent if the distribution of one variable is the same for all categories of the other
  o Check whether variables are independent by comparing the conditional distributions of one variable for each value of the other
● Simpson’s paradox: happens when averages taken within different groups seem to contradict the overall averages
  o Better to compare percentages or averages of one variable within each category of the other instead of finding the overall average
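Simpson’s paradox is easier to see with numbers. The sketch below uses hypothetical on-time flight counts for two pilots (the pilots, counts, and variable names are made up for illustration and are not from the slides). Pilot B has the higher on-time rate within both day and night flights, yet Pilot A has the higher overall rate because the two pilots fly very different mixes of day and night flights.

```python
# Hypothetical (on_time, total_flights) counts; illustrative numbers only.
records = {
    "Pilot A": {"Day": (90, 100), "Night": (10, 20)},
    "Pilot B": {"Day": (19, 20), "Night": (75, 100)},
}


def rate(on_time, flights):
    """Percentage of flights that were on time."""
    return 100 * on_time / flights


# On-time rate within each time of day (the comparison the slides recommend).
within = {
    pilot: {time: rate(*counts) for time, counts in by_time.items()}
    for pilot, by_time in records.items()
}

# Overall on-time rate, pooling day and night flights together.
overall = {
    pilot: rate(
        sum(counts[0] for counts in by_time.values()),
        sum(counts[1] for counts in by_time.values()),
    )
    for pilot, by_time in records.items()
}

for pilot in records:
    print(pilot, within[pilot], f"overall {overall[pilot]:.1f}%")
```

Comparing the pilots within each time of day gives the fair comparison; the pooled rates mostly reflect who flew more of the harder night flights.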
Problem 29 (page 41)
In July 1991 and again in April 2001 the Gallup Poll asked random samples of 1015 adults about their opinions on working parents. The table summarizes responses to this question: “Considering the needs of both parents and children, which of the following do you see as the ideal family in today’s society?” Based upon these results, do you think there was a change in people’s attitudes during the 10 years between these polls? Explain.
Problem 29, continued
                                          1991    2001
Both work full time                        142     131
One works full time, other part time       274     244
One works, other works at home             152     173
One works, other stays home for kids       396     416
No opinion                                  51      51
Problem 29, continued
Relative frequency table: percent of total responses
                                          1991    2001
Both work full time                      14.0%   12.9%
One works full time, other part time     27.0%   24.0%
One works, other works at home           15.0%   17.0%
One works, other stays home for kids     39.0%   41.0%
No opinion                                5.0%    5.0%
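The relative frequencies can be checked by dividing each count by its poll's total of 1015 responses. A minimal sketch in plain Python, using the counts from the problem; the function and variable names are illustrative choices, not part of the original slides.

```python
# Counts from the Problem 29 table as (1991, 2001) pairs.
counts = {
    "Both work full time": (142, 131),
    "One works full time, other part time": (274, 244),
    "One works, other works at home": (152, 173),
    "One works, other stays home for kids": (396, 416),
    "No opinion": (51, 51),
}


def relative_frequencies(values):
    """Return each count as a percentage of the total, rounded to 0.1."""
    total = sum(values)
    return [round(100 * v / total, 1) for v in values]


pct_1991 = relative_frequencies([pair[0] for pair in counts.values()])
pct_2001 = relative_frequencies([pair[1] for pair in counts.values()])

for label, p91, p01 in zip(counts, pct_1991, pct_2001):
    print(f"{label:<40} {p91:5.1f}% {p01:5.1f}%")
```

Because each percentage is rounded separately, a column of rounded values may total slightly more or less than 100%.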
Problem 29, continued Relative frequency bar chart
Problem 29, continued
Answer: People’s opinions about working parents and the ideal family changed very little over the 10 years from 1991 to 2001. No category’s relative frequency differs by more than 3 percentage points between the two polls, and differences this small may be attributed to sampling variability.
Problem 31 (page 41)
A company held a blood pressure screening clinic for its employees. The results are summarized in the table below by age group and blood pressure level.
a. Find the marginal distribution of blood pressure level.
b. Find the conditional distribution of blood pressure level within each group.
c. Compare these distributions with a segmented bar graph.
d. Write a brief description of the association between age and blood pressure among these employees.
e. Does this prove that people’s blood pressure increases as they age? Explain.

            Under 30    30-49    Over 50
Low            27         37        31
Normal         48         91        93
High           23         51        73
Problem 31, continued
                        Under 30     30-49    Over 50     Total
Low      Count              27         37         31         95
         % of Row        28.4%      38.9%      32.6%       100%
         % of Column     27.6%      20.7%      15.7%      20.0%
         % of Table       5.7%       7.8%       6.5%      20.0%
Normal   Count              48         91         93        232
         % of Row        20.7%      39.2%      40.1%       100%
         % of Column     49.0%      50.8%      47.2%      48.9%
         % of Table      10.1%      19.2%      19.6%      48.9%
High     Count              23         51         73        147
         % of Row        15.6%      34.7%      49.7%       100%
         % of Column     23.5%      28.5%      37.1%      31.0%
         % of Table       4.9%      10.8%      15.4%      31.0%
Total    Count              98        179        197        474
         % of Row        20.7%      37.8%      41.6%       100%
         % of Column      100%       100%       100%       100%
         % of Table      20.7%      37.8%      41.6%       100%
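Problem 31, continued
Distribution of age group within each blood pressure level (count and % of row)
            Under 30        30-49         Over 50        Total
Low         27  28.4%     37  38.9%     31  32.6%      95  100%
Normal      48  20.7%     91  39.2%     93  40.1%     232  100%
High        23  15.6%     51  34.7%     73  49.7%     147  100%
The distribution of blood pressure level within each age group, which part b asks for, is given by the % of Column entries in the full table.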
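The marginal distribution (part a) and the conditional distribution of blood pressure within each age group (part b) can be recomputed from the counts. A minimal sketch in plain Python, assuming the counts given in the problem; the variable names are illustrative. Conditioning on age group means dividing each count by its column (age group) total, which gives the "% of Column" figures.

```python
# Counts from the Problem 31 table: rows are blood pressure levels,
# columns are age groups in the order listed in `ages`.
ages = ["Under 30", "30-49", "Over 50"]
table = {
    "Low": [27, 37, 31],
    "Normal": [48, 91, 93],
    "High": [23, 51, 73],
}

grand_total = sum(sum(row) for row in table.values())

# Part a: marginal distribution of blood pressure
# (each row total as a percentage of the grand total).
marginal_bp = {
    bp: round(100 * sum(row) / grand_total, 1) for bp, row in table.items()
}

# Part b: conditional distribution of blood pressure within each age group
# (each count as a percentage of its column total).
col_totals = [sum(row[i] for row in table.values()) for i in range(len(ages))]
conditional_bp = {
    age: {bp: round(100 * row[i] / col_totals[i], 1) for bp, row in table.items()}
    for i, age in enumerate(ages)
}

print("Marginal:", marginal_bp)
for age, dist in conditional_bp.items():
    print(age, dist)
```

If blood pressure and age were independent, the three conditional distributions would look roughly like the marginal distribution; here the share with high blood pressure grows from one age group to the next.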
Problem 31, continued
c. Segmented bar chart (figure not shown)
Problem 31, continued
d. Blood pressure tends to be higher in the older age groups. Among employees under 30, the percentages with low and high blood pressure are fairly close (27.6% and 23.5%), but as age increases, the percentage with low blood pressure falls and the percentage with high blood pressure rises (15.7% and 37.1% among those over 50).
e. No, these data do not prove that blood pressure increases with age. They show an association between age and blood pressure among these employees, but a lurking variable may be responsible for the higher blood pressure seen in the older groups.