Chapter 16 Exercises

Conceptual Exercises

Exercise 16.1 — Map vs. Territory

In the 2016 presidential election, standard county-level choropleths showed Donald Trump winning approximately 84% of U.S. counties by land area, yet Hillary Clinton won approximately 2.86 million more popular votes than Trump. Explain the disconnect between these two facts. What would a better map look like, and what encoding choices would it require?

Exercise 16.2 — Chart Type Selection

For each of the following analytical questions, identify the best chart type and justify your choice:

a) How has Republican Senate vote share changed in Sun Belt states from 2010 to 2022?
b) How does voter turnout propensity vary across combinations of age group (five categories) and party registration (four categories)?
c) What is the relationship between median household income and Garza support score across 85 counties?
d) How does the racial composition of the electorate differ between urban, suburban, and rural areas?
e) Which ten counties have the highest mean persuadability score?

Exercise 16.3 — Color Scale Design

You are designing a choropleth map of county-level partisan vote margin, a variable ranging from -40 (strongly Republican) through 0 (exact tie) to +40 (strongly Democratic).

a) Should you use a sequential or diverging color scale? Why?
b) What midpoint color would be appropriate?
c) Should the scale be symmetric? What distortion would an asymmetric scale create?
d) How would you handle a county where the margin is +42 (outside your scale range)?
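As a starting point for thinking about parts (c) and (d), the sketch below maps a margin onto the [0, 1] interval that a colormap typically expects, keeping the scale symmetric around zero. The function name `margin_to_unit` and the choice to clip out-of-range values are illustrative assumptions, not the chapter's method; clipping is only one possible way to treat a county at +42, and you should weigh it against alternatives such as widening the range.

```python
def margin_to_unit(margin, limit=40.0):
    """Map a vote margin onto [0, 1] for a symmetric diverging scale.

    0.0 is the strongest Republican color, 0.5 is the neutral midpoint,
    and 1.0 is the strongest Democratic color. Margins beyond +/-limit
    are clipped to the end colors (an assumption made for this sketch).
    """
    clipped = max(-limit, min(limit, margin))
    return (clipped + limit) / (2 * limit)


# Hypothetical margins, including one outside the nominal range.
for m in (-40, -10, 0, 25, 42):
    print(m, margin_to_unit(m))
```

Because the same `limit` is used on both sides, a tie always lands exactly on the midpoint color, and equal-sized margins in either direction get equally intense colors.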

Exercise 16.4 — The Ecological Fallacy

A county-level scatter plot shows a strong positive correlation (r = 0.72) between the percentage of Hispanic voters in a county and the mean Garza support score. A campaign staffer concludes, "Hispanic voters overwhelmingly support Garza." What is wrong with this conclusion? What additional data would you need to assess whether this individual-level inference is warranted?

Exercise 16.5 — Visualization Ethics

A campaign's communications director asks you to produce a chart showing polling trends that "makes our upward momentum look as strong as possible." Describe three specific design choices you could make that would visually amplify apparent momentum. Then describe the ethical problems with each choice and what a more honest visualization would look like.

Coding Exercises

Exercise 16.6 — Reproduce and Extend

Run the choropleth code from Section 16.4.2 on the ODA dataset. Then extend it by:

a) Adding county name labels to the five largest counties by registered voters
b) Adding a subtitle showing the date the data was generated
c) Changing the color scale to coolwarm and observing how the interpretation changes

Exercise 16.7 — Custom Grouped Bar Chart

Using the ODA dataset, build a grouped bar chart showing mean support score and mean persuadability score broken down by:

- Education level (x-axis), AND
- Income bracket (color-coded groups within each education bar group)

This requires pivoting the data on two dimensions. Write the complete code, including appropriate axis labels, title, and a sample size annotation for at least one group.

Exercise 16.8 — Turnout Gap Analysis

The "turnout gap" is the difference between a group's share of registered voters and its share of actual voters. Using the ODA vote history columns:

a) Calculate the turnout gap for each racial/ethnic group in 2018, 2020, and 2022
b) Produce a side-by-side bar chart showing these gaps across all three cycles
c) Calculate the net vote impact of the 2020 turnout gap (how many more or fewer votes did each group contribute relative to their share of registered voters?)
d) Annotate the chart with the net vote impact calculation for 2020
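To make the arithmetic concrete, here is a minimal sketch on made-up counts for a single cycle. The group names and numbers are hypothetical and are not ODA data. It assumes the gap is computed as registered share minus voter share (so a positive gap means under-representation among voters), and it reads "net vote impact" as votes actually cast minus the votes the group would have cast had its voter share equaled its registration share; confirm both conventions against the chapter before relying on them.

```python
# Hypothetical counts for one election cycle (not ODA data).
registered = {"Group A": 5000, "Group B": 3000, "Group C": 2000}
voted = {"Group A": 3500, "Group B": 1500, "Group C": 1000}


def turnout_gaps(registered, voted):
    """Return per-group registration share, voter share, gap, and net votes."""
    total_registered = sum(registered.values())
    total_voted = sum(voted.values())
    result = {}
    for group in registered:
        reg_share = registered[group] / total_registered
        vote_share = voted[group] / total_voted
        result[group] = {
            "reg_share": reg_share,
            "vote_share": vote_share,
            # Assumed sign convention: positive gap = under-represented.
            "gap": reg_share - vote_share,
            # Votes cast minus votes expected at the registration share.
            "net_votes": voted[group] - reg_share * total_voted,
        }
    return result


gaps = turnout_gaps(registered, voted)
for group, g in gaps.items():
    print(f"{group}: gap={g['gap']:+.3f}, net votes={g['net_votes']:+.0f}")
```

Under this definition the net vote impacts sum to zero across groups, which is a useful sanity check when you run the calculation on the real vote history columns.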

Exercise 16.9 — Interactive Heatmap

Extend the heatmap from Section 16.8 to use Plotly's px.imshow() instead of seaborn. Requirements:

a) The heatmap should show mean support score by race/ethnicity (rows) and income bracket (columns)
b) Hovering over each cell should show: the mean score, the cell count, the minimum score, and the maximum score in that cell
c) The color scale should run from red (low support) to blue (high support)
d) Save the output as interactive_heatmap.html
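The hover requirement in (b) means you need four statistics per cell, not just the mean. The sketch below computes them with the standard library on a handful of made-up rows; the row layout, category labels, and the helper name `cell_stats` are assumptions for illustration only. Wiring these values into the Plotly figure is left as part of the exercise.

```python
from collections import defaultdict

# Hypothetical voter rows: (race/ethnicity, income bracket, support score).
rows = [
    ("Group A", "Low", 40.0),
    ("Group A", "Low", 60.0),
    ("Group A", "High", 55.0),
    ("Group B", "Low", 70.0),
    ("Group B", "High", 30.0),
    ("Group B", "High", 50.0),
]


def cell_stats(rows):
    """Group scores by (row category, column category) and summarize each cell."""
    scores = defaultdict(list)
    for race, income, score in rows:
        scores[(race, income)].append(score)
    return {
        key: {
            "mean": sum(values) / len(values),
            "n": len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in scores.items()
    }


stats = cell_stats(rows)
for (race, income), s in sorted(stats.items()):
    print(f"{race} / {income}: mean={s['mean']:.1f}, n={s['n']}, "
          f"min={s['min']:.1f}, max={s['max']:.1f}")
```

Keeping the cell count alongside the mean also lets a reader judge how much weight a sparsely populated cell deserves.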

Exercise 16.10 — The GOTV Opportunity Map

Build a two-panel visualization that Nadia could use to identify GOTV priorities:

Panel 1 (Scatter): x-axis = mean turnout propensity (estimated from vote history), y-axis = mean Garza support score, bubble size = total voters, color = urban-rural category. Shade the "high-value zone" (low propensity, high support) with a lightly shaded rectangle.

Panel 2 (Bar chart, sorted): Calculate the expected additional Garza votes from each county if its turnout propensity increased by 5 percentage points. Formula: additional_votes = total_voters × 0.05 × (support_score / 100). Show the top 15 counties by this metric as a sorted horizontal bar chart with county names labeled.
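The Panel 2 formula can be checked on a few rows before you build the chart. The following sketch applies it to three invented counties and sorts them in descending order; the county names and figures are hypothetical, and the support score is assumed to be on a 0 to 100 scale, as the division by 100 in the formula implies.

```python
# Hypothetical county rows: (name, total_voters, mean support score, 0-100).
counties = [
    ("Alder", 40000, 62.0),
    ("Birch", 120000, 48.0),
    ("Cedar", 15000, 71.0),
]


def additional_votes(total_voters, support_score, turnout_increase=0.05):
    """additional_votes = total_voters x 0.05 x (support_score / 100)."""
    return total_voters * turnout_increase * (support_score / 100)


# Sort by expected additional votes, largest first, and keep the top 15.
ranked = sorted(
    ((name, additional_votes(voters, score)) for name, voters, score in counties),
    key=lambda pair: pair[1],
    reverse=True,
)[:15]

for name, votes in ranked:
    print(f"{name}: {votes:,.1f}")
```

Note that the largest county ranks first here despite having the lowest support score, which is exactly the kind of trade-off the bar chart is meant to surface.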

Applied Exercises

Exercise 16.11 — The Dashboard Critique

Your campaign produces the following dashboard for a county field director meeting:

- A map with each county colored one of two colors: red (mean support < 50) or blue (mean support ≥ 50)
- A table of 85 counties sorted alphabetically with their support scores
- A pie chart showing the racial composition of the overall electorate

Identify at least four specific problems with this dashboard design. Propose improved alternatives for each panel, explaining what question each improved panel answers better.

Exercise 16.12 — Audience-Specific Visualization

The same underlying ODA data needs to be visualized for three different audiences:

a) The campaign's field directors (need to know where to focus canvassing resources)
b) A journalist writing a story about the demographic coalition Garza needs to win
c) The campaign's major donors (need to understand where the race stands and why)

For each audience, describe in words (you don't need to code all three): (1) what question the visualization needs to answer, (2) what chart type would best serve that purpose, (3) what variables to include, and (4) what level of statistical complexity is appropriate.