Chapter 16 Quiz

DataField.Dev

Instructions: Select the best answer for each multiple-choice question. For short-answer questions, write 2–4 sentences. For the coding question, write working Python code.

Multiple Choice

1. The primary limitation of a standard county-level choropleth map of U.S. election results is:

a) It is too computationally expensive to generate for large states
b) Geographic area dominates the visual impression regardless of population distribution
c) The resolution is too coarse to show precinct-level variation
d) Color encoding is not effective for showing continuous variables

Answer: b

2. A diverging color scale (e.g., red-white-blue) is most appropriate for:

a) Data that ranges from zero to a maximum value with no natural reference point
b) Data that has a meaningful natural midpoint around which values fall above and below
c) Categorical data with three or more distinct groups
d) Time series data showing change over multiple election cycles

Answer: b

3. In GeoPandas, the most common problem when merging a GeoDataFrame with a voter file dataset is:

a) Memory limitations from loading large shapefiles
b) County name inconsistencies between data sources
c) Coordinate reference system (CRS) incompatibility
d) Incompatible column data types

Answer: b
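
To make the county-name problem concrete, here is a minimal sketch of normalizing names into a join key before merging. The function name normalize_county, the cleaning rules, and the sample names are all illustrative assumptions, not part of the chapter's dataset; real data usually needs more rules than these, and joining on FIPS codes avoids the issue where both sources carry them.

```python
import re

def normalize_county(name: str) -> str:
    """Return a lowercase, punctuation-free key for matching county names.

    Hypothetical helper: the rules below cover only a few common mismatches.
    """
    key = name.strip().lower()
    key = re.sub(r"\b(county|parish)\b", "", key)   # drop common suffixes
    key = re.sub(r"\bst\.?(?=\s)", "saint", key)    # "St." / "St" -> "saint"
    key = re.sub(r"[^a-z0-9 ]", "", key)            # drop punctuation
    return " ".join(key.split())                    # collapse whitespace

# Made-up names standing in for a shapefile and a voter file
shape_names = ["St. Louis County", "Prince George's County", "Adams County"]
voter_names = ["SAINT LOUIS", "Prince Georges", "Adams", "Baker"]

shape_keys = {normalize_county(n) for n in shape_names}
voter_keys = {normalize_county(n) for n in voter_names}

# Names present in the voter file that would be dropped by an inner merge
unmatched = sorted(voter_keys - shape_keys)
print(unmatched)
```

Checking the unmatched set before merging shows which rows would silently disappear from the map.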

4. The "ecological fallacy" refers to the error of:

a) Using environmental data in political analysis without appropriate controls
b) Drawing individual-level conclusions from aggregate (e.g., county-level) data
c) Treating ecological validity as equivalent to statistical validity
d) Generalizing from non-representative samples to broader populations

Answer: b

5. Which matplotlib/seaborn color palettes are described as perceptually uniform (appropriate for quantitative choropleth maps)?

a) jet, rainbow, spectral
b) red, blue, green
c) viridis, plasma, cividis
d) pastel1, set1, tab10

Answer: c

6. When building an interactive choropleth with Plotly Express, the featureidkey parameter specifies:

a) The column in your data that contains the unique identifier for each geographic unit
b) The property in the GeoJSON file that matches your data's location identifier
c) The encryption key used to protect sensitive voter data
d) The key performance indicator being visualized in the choropleth

Answer: b

7. A grouped (side-by-side) bar chart is preferred over a stacked bar chart when:

a) You want to show the total across all groups and the individual components simultaneously
b) You want to make direct comparisons between specific categories across groups easier
c) The number of categories is too large to fit side by side
d) The variables being compared are proportions that sum to 100%

Answer: b

8. What is the primary advantage of using px.choropleth() with Plotly over geopandas.plot() for campaign presentations?

a) Plotly choropleths load faster and use less memory
b) Plotly supports more color scales and projection options
c) Plotly produces interactive HTML that non-technical users can explore by hovering and zooming
d) Plotly automatically handles county name mismatches between datasets

Answer: c

9. A bar chart showing vote shares should:

a) Start the y-axis at the minimum observed value to maximize visual contrast
b) Start the y-axis at 0% to avoid exaggerating differences between bars
c) Use a logarithmic scale to handle differences in county size
d) Always display both the percentage and the absolute vote count on each bar

Answer: b
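
A short sketch of why the baseline matters: bar length encodes the distance from the axis start, so a truncated axis inflates the apparent ratio between bars. The function name drawn_height_ratio and the vote shares of 52 and 48 are hypothetical values chosen for illustration.

```python
def drawn_height_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Hypothetical vote shares of 52% and 48%
honest = drawn_height_ratio(52.0, 48.0)                     # axis starts at 0
truncated = drawn_height_ratio(52.0, 48.0, baseline=45.0)   # axis starts at 45

print(f"Axis from 0:  first bar is {honest:.2f}x the second")
print(f"Axis from 45: first bar is {truncated:.2f}x the second")
```

With the axis starting at 45, a 4-point gap is drawn as if one bar were more than twice the other.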

10. In the context of political visualization, "sample size annotation" (showing n= for each group) serves primarily to:

a) Comply with IRB requirements for displaying data from human subjects
b) Signal to readers whether statistics are based on enough observations to be reliable
c) Meet the minimum information requirements for academic publication
d) Identify which counties have incomplete voter file data

Answer: b

Short Answer

11. Explain why adding population-scaled bubbles to a county choropleth produces a more accurate visual impression of where votes come from, and describe one scenario in which the standard choropleth (without bubbles) would be the more appropriate choice.

Sample answer: A standard choropleth colors each county by its variable value, so a county's visual weight is roughly proportional to its land area rather than its population. In a presidential election map, this makes large, sparsely populated rural counties dominate the picture even though they contribute far fewer votes than small but densely populated urban counties. Population-scaled bubbles correct for this by making the visual weight of each county proportional to its actual voter count. The standard choropleth is the more appropriate choice when the question is about geographic pattern rather than vote contribution: for example, showing which areas have high concentrations of persuadable voters regardless of total population, which might guide the routing of campaign surrogates through a region.
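
One way to sketch the bubble scaling is to make each bubble's area, not its radius, proportional to the voter count. The helper name bubble_areas, the max_area default, and the voter counts are assumptions for illustration; the resulting values are meant for the s argument of matplotlib's scatter, which is specified as marker area in points squared.

```python
def bubble_areas(voter_counts, max_area=2000.0):
    """Scale voter counts to marker areas so the largest county gets max_area.

    Hypothetical helper: area (not radius) is proportional to voters,
    which keeps visual weight linear in the vote contribution.
    """
    biggest = max(voter_counts)
    if biggest <= 0:
        raise ValueError("at least one county must have voters")
    return [max_area * v / biggest for v in voter_counts]

# Made-up registered-voter counts for three counties
voters = [1_000, 250_000, 1_000_000]
areas = bubble_areas(voters)
print(areas)
```

Scaling the radius instead would make area grow with the square of the voter count and overstate the largest counties.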

12. What do the vmin and vmax parameters control in a GeoPandas choropleth, and why does the choice of these parameters matter analytically?

Sample answer: The vmin and vmax parameters set the endpoints of the color scale, that is, the data values that map to the most extreme colors. If support scores range from 30 to 70 in the data, setting vmin=0 and vmax=100 will use only the middle portion of the color scale, making all counties look roughly similar. Setting vmin=30 and vmax=70 will use the full color scale, making differences between counties visually salient. The choice matters analytically because it determines whether small differences or large differences dominate the visual impression. A choropleth where vmin=45 and vmax=55 will make a 5-point difference span half the color scale; one where vmin=0 and vmax=100 will make the same 5-point difference look negligible. Analysts should choose endpoints that reflect the analytically meaningful range of variation.
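
The effect of the endpoints can be sketched with plain arithmetic. The function below is a hypothetical stand-in for a linear color normalization (the default behavior in matplotlib, assuming no custom norm is supplied); it returns where a value lands on the color scale as a fraction from 0 to 1.

```python
def color_position(value: float, vmin: float, vmax: float) -> float:
    """Linear position of value on a color scale, clipped to the range [0, 1]."""
    frac = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, frac))

# The same two support scores under two different endpoint choices
wide_gap = color_position(55, 0, 100) - color_position(50, 0, 100)
narrow_gap = color_position(55, 45, 55) - color_position(50, 45, 55)

print(f"vmin=0,  vmax=100: scores 50 and 55 differ by {wide_gap:.2f} of the scale")
print(f"vmin=45, vmax=55:  scores 50 and 55 differ by {narrow_gap:.2f} of the scale")
```

The same 5-point difference covers a twentieth of the scale in one case and half of it in the other.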

13. Coding Question. Write a complete Python code snippet (using matplotlib or seaborn) that produces a horizontal bar chart showing the mean Garza support score for each income bracket in the ODA dataset, sorted from highest to lowest support. Include: appropriate axis labels, a title, data labels on each bar showing the exact score, and a reference line at support score = 50.

Sample answer:

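import pandas as pd
import matplotlib.pyplot as plt

# Assumes df is already loaded from the ODA dataset
# Mean support and voter count per income bracket
income_support = df.groupby('income_bracket').agg(
    mean_support=('support_score', 'mean'),
    count=('voter_id', 'count'),
).reset_index().sort_values('mean_support', ascending=True)
# barh draws the first row at the bottom, so an ascending sort
# places the highest-support bracket at the top of the chart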

fig, ax = plt.subplots(figsize=(10, 6))

colors = ['#2166ac' if v >= 50 else '#d6604d' for v in income_support['mean_support']]
bars = ax.barh(income_support['income_bracket'], income_support['mean_support'],
               color=colors, edgecolor='white', height=0.6)

# Data labels
ax.bar_label(bars, fmt='%.1f', padding=4, fontsize=10)

# Reference line at 50
ax.axvline(50, color='black', linewidth=1.2, linestyle='--', label='Toss-up (50)')

ax.set_xlabel('Mean Garza Support Score (0-100)', fontsize=12)
ax.set_ylabel('Income Bracket', fontsize=12)
ax.set_title('Garza Support Score by Income Bracket\n(Blue = Above 50, Red = Below 50)',
             fontsize=13, fontweight='bold')
ax.set_xlim(0, 105)
ax.legend(fontsize=10)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('income_support_bars.png', dpi=150, bbox_inches='tight')
plt.show()