Lesson 1: 3.1 Relationships Between Two Categorical Variables
Duration: 2 Days
Lesson Objective
Students will be able to organize bivariate categorical data in two-way tables, calculate marginal and conditional distributions, and use graphical displays (side-by-side or segmented bar graphs) to determine if an association exists between two variables.
Why is it misleading to compare the "counts" of two groups if the group sizes are different?
What is the difference between knowing the "total percentage of people who like x" versus the "percentage of women who like x"?
How can we tell, just by looking at a graph, whether two variables are associated?
Two-way table
Marginal distribution
Conditional distribution
Side-by-side bar graph
Segmented bar graph
Association
Explanatory variable
Response variable
HSS-ID.B.5: Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.
The SAT frequently presents a two-way table and asks students to calculate a probability or percentage based on a specific sub-group (e.g., "Of those who preferred Option A, what fraction were in Grade 10?"). Mastering the "denominator" in conditional distributions is the key to these points.
The purpose of this section is to move students from describing a single group to comparing two groups. It introduces the concept of association, which plays the same role for categorical variables that correlation plays for quantitative variables.
DOK 1: From the provided table, what percentage of the total sample size identifies as "Male"?
DOK 2: Compare the conditional distribution of "Superpower Choice" for males vs. females. What do you notice?
DOK 3: Based on the segmented bar graph, does there appear to be an association between gender and superpower choice? Justify your answer using specific percentages.
The Problem: A study of the people aboard the Titanic categorized each person by "Class" (First, Second, Third, or Crew) and "Survival" (Survived or Perished).
Task A: Calculate the marginal distribution of Survival.
Task B: Calculate the conditional distribution of Survival for First Class passengers and for Third Class passengers.
Task C: Does the data suggest that Survival was associated with Passenger Class?
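Tasks A and B can be checked with a short script such as the sketch below. The counts are an assumption: they are the commonly cited Titanic figures (2,201 people aboard) and may not match the table in the textbook, so substitute the class table before using the output as an answer key. The function names are illustrative only.

```python
# ASSUMPTION: commonly cited Titanic counts; the table used in class may differ.
counts = {
    "First":  {"Survived": 203, "Perished": 122},
    "Second": {"Survived": 118, "Perished": 167},
    "Third":  {"Survived": 178, "Perished": 528},
    "Crew":   {"Survived": 212, "Perished": 673},
}

def marginal_survival(table):
    """Task A: each Survival total divided by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    outcomes = ["Survived", "Perished"]
    return {o: sum(row[o] for row in table.values()) / grand_total
            for o in outcomes}

def conditional_survival(table, group):
    """Task B: each count in one class divided by that class's own total."""
    row = table[group]
    row_total = sum(row.values())
    return {outcome: n / row_total for outcome, n in row.items()}

marginal = marginal_survival(counts)
first = conditional_survival(counts, "First")
third = conditional_survival(counts, "Third")

for label, dist in [("All aboard", marginal),
                    ("First Class", first),
                    ("Third Class", third)]:
    print(label, {o: f"{p:.1%}" for o, p in dist.items()})
```

With these assumed counts, about 32.3% of everyone aboard survived, compared with about 62.5% of First Class and about 25.2% of Third Class. For Task C, conditional distributions that differ this much from one class to another are the evidence students should cite for an association.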
Confusion of Denominators: Students often use the table total as the denominator for every calculation. The table total is the correct denominator only for joint and marginal relative frequencies; a conditional distribution uses the total of the single row or column being conditioned on.
Association = Causation: A lower survival rate among Third Class passengers does not by itself show that their class caused their deaths (though in this case it was a major factor). Students should practice saying "associated with" instead of "caused."
Comparing Counts: Students may say "More Third Class passengers died than Second Class," forgetting that there were many more Third Class passengers to begin with.
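The denominator and count-versus-rate issues above can be made concrete with one cell of the table. The sketch below divides the same count by three different totals, then compares two classes by count and by rate. The counts are again an assumption (the commonly cited Titanic figures), not data taken from this lesson's table.

```python
# ASSUMPTION: commonly cited Titanic counts; the table used in class may differ.
survived = {"First": 203, "Second": 118, "Third": 178, "Crew": 212}
perished = {"First": 122, "Second": 167, "Third": 528, "Crew": 673}

class_total = {c: survived[c] + perished[c] for c in survived}
survived_total = sum(survived.values())
grand_total = sum(class_total.values())

cell = survived["Third"]  # Third Class passengers who survived

joint = cell / grand_total                  # out of everyone aboard
given_class = cell / class_total["Third"]   # out of Third Class only
given_survived = cell / survived_total      # out of survivors only

print(f"Joint: {joint:.1%}")
print(f"Survived, given Third Class: {given_class:.1%}")
print(f"Third Class, given Survived: {given_survived:.1%}")

# Counts vs. rates: Third Class has more survivors than Second Class,
# yet a lower survival rate, because the group is much larger.
rate_third = survived["Third"] / class_total["Third"]
rate_second = survived["Second"] / class_total["Second"]
print(survived["Third"] > survived["Second"], rate_third < rate_second)
```

Under these assumed counts the same 178 people are about 8.1% of everyone aboard, about 25.2% of Third Class, and about 25.0% of survivors. Asking students which question each percentage answers is one way to surface the denominator confusion.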
Support (Scaffolding): Use a "Two-Way Table Highlighter" strategy. Have students highlight in yellow the row or column they are focusing on, so they can see that the "Total" at the end of that row or column is the only denominator for the conditional distribution.
Extension (Inquiry): Introduce a simple case of Simpson’s Paradox. Show a dataset where a treatment looks better for the whole group but worse for every sub-group when broken down by a third categorical variable (like "Severity of Illness").
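One possible dataset for this extension is sketched below. The numbers are the widely quoted kidney stone treatment figures often used to illustrate Simpson's Paradox; relabeling small and large stones as "Mild" and "Severe" is an assumption made here to match the "Severity of Illness" framing, and the treatment names "A" and "B" are placeholders.

```python
# (successes, patients) for each treatment within each severity level.
# ASSUMPTION: widely quoted illustrative figures, relabeled as Mild/Severe.
results = {
    "A": {"Mild": (81, 87),   "Severe": (192, 263)},
    "B": {"Mild": (234, 270), "Severe": (55, 80)},
}

def success_rate(successes, patients):
    return successes / patients

def overall_rate(treatment):
    """Pool both severity levels before dividing."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(n for _, n in results[treatment].values())
    return success_rate(successes, patients)

for severity in ("Mild", "Severe"):
    a = success_rate(*results["A"][severity])
    b = success_rate(*results["B"][severity])
    print(f"{severity}: A {a:.1%} vs. B {b:.1%}")

print(f"Overall: A {overall_rate('A'):.1%} vs. B {overall_rate('B'):.1%}")
```

In this sketch Treatment B has the higher overall success rate, yet Treatment A has the higher rate within both the Mild and the Severe groups. The reversal arises because Treatment A was given mostly to severe cases, which is a useful prompt for discussing why the third variable matters.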
Teacher assigns examples from the textbook and other resources.
Access the e-book through ClassLink.