Quiz: Choosing the Right Chart

DataField.Dev

20 questions. Aim for mastery (18+). If you score below 14, revisit the relevant sections before moving to Chapter 6.

Multiple Choice (10 questions)

1. According to the chapter's central thesis, the chart type should be determined primarily by:

(a) The type of data you have in your dataset
(b) The question you are trying to answer
(c) The tool or library you are most comfortable with
(d) What looks most visually appealing in the finished output

Answer

**(b)** The question you are trying to answer. The threshold concept of the chapter — "Question Before Chart" — explicitly states that the same dataset can and should produce completely different charts depending on whether you are asking about comparison, distribution, relationship, composition, change over time, or spatial pattern. Data type is the second input, not the first. Tools and aesthetics come later, as part of the context check.

2. Which of the following is NOT one of the five data types used in the chart selection framework?

(a) Categorical
(b) Continuous / Quantitative
(c) Temporal
(d) Hierarchical

Answer

**(d)** Hierarchical. The five data types in the framework are: categorical, continuous/quantitative, temporal, spatial, and network/relational. Hierarchical structures can often be represented as a special case of network data (a tree is a restricted graph) or as categorical data with implied ordering, but "hierarchical" is not one of the five primary types in the framework.

3. Meridian Corp's marketing team wants to know whether customers with higher contract values also have higher retention rates. Both contract value and retention rate are continuous variables. Which chart type is the best starting point?

(a) Pie chart
(b) Line chart
(c) Scatter plot
(d) Stacked bar chart

Answer

**(c)** Scatter plot. The question is a *relationship* question — "how do two continuous variables relate to each other?" — and the signal chart type for relationship questions with two continuous variables is the scatter plot. A line chart would imply temporal sequence. A pie chart would imply part-to-whole composition. A stacked bar chart would imply grouped totals. Only the scatter plot puts the two variables into direct positional encoding so the correlation (or lack of correlation) becomes visible.
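A scatter plot makes the relationship visible; a correlation coefficient is a common numeric companion to it. The following is a minimal sketch using only the standard library. The helper name `pearson_r` is hypothetical, and the account values are made up for illustration rather than taken from the Meridian Corp dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Illustrative values only (assumed, not from the chapter's dataset).
contract_value = [10_000, 25_000, 40_000, 60_000, 90_000]
retention_rate = [0.62, 0.70, 0.74, 0.81, 0.88]

r = pearson_r(contract_value, retention_rate)
print(f"r = {r:.2f}")
```

A value near +1 or -1 suggests a strong linear relationship, but the coefficient can hide clusters and outliers, which is one reason to draw the scatter plot as well.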

4. You are comparing monthly revenue for six product lines over the past year. You want the viewer to see both the trend over time and the comparison across product lines, without creating a spaghetti chart of overlapping lines. The best choice is:

(a) A single line chart with all six series overlaid and a legend
(b) A dual-axis chart
(c) Small multiples (one line chart per product line)
(d) A stacked area chart

Answer

**(c)** Small multiples (one line chart per product line). Small multiples solve the spaghetti problem by separating each series into its own panel with a shared axis. The viewer can see each trend clearly and compare across panels. The single overlaid chart would produce exactly the spaghetti effect you are trying to avoid. A dual-axis chart handles only two series, not six, and its mismatched scales can imply relationships that are not there. A stacked area chart would show composition over time, which is a different question.
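The data preparation behind small multiples can be sketched as follows: group the records into one series per panel and compute a single y-range that every panel shares. The function name `build_panels` and the revenue figures are assumptions made for illustration.

```python
from collections import defaultdict

def build_panels(records):
    """Group (series, x, y) records into one panel per series with a shared y-range."""
    panels = defaultdict(list)
    for series, x, y in records:
        panels[series].append((x, y))
    for points in panels.values():
        points.sort()  # order each panel's points by x (here: month number)
    all_y = [y for _, _, y in records]
    shared_ylim = (min(all_y), max(all_y))
    return dict(panels), shared_ylim

# Made-up monthly revenue (in thousands) for three hypothetical product lines.
records = [
    ("Enterprise", 1, 120), ("Enterprise", 2, 135), ("Enterprise", 3, 150),
    ("Starter", 1, 40), ("Starter", 2, 38), ("Starter", 3, 45),
    ("Legacy", 1, 80), ("Legacy", 2, 70), ("Legacy", 3, 61),
]

panels, shared_ylim = build_panels(records)
for name, points in panels.items():
    print(name, points, "y-range:", shared_ylim)
```

In a plotting library, each entry of `panels` would become one subplot, and `shared_ylim` would be applied to all of them so that heights remain comparable across panels.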

5. Which question type does a histogram answer?

(a) Comparison
(b) Distribution
(c) Relationship
(d) Composition

Answer

**(b)** Distribution. A histogram shows the *shape* of a single continuous variable — where values cluster, how they spread out, whether the distribution is symmetric or skewed, and whether it has one peak or several. That is precisely the distribution question. Bar charts (comparison) and histograms look similar but answer fundamentally different questions — which is exactly why Section 5.6 treats "bar chart vs. histogram" as one of the most common confusions.

6. A pie chart is rarely the right choice for composition questions because:

(a) Pie charts cannot display more than two categories
(b) Humans are poor at comparing angles and areas, which are the encodings pie charts rely on
(c) Pie charts are not supported by most modern visualization libraries
(d) Pie charts always distort the data

Answer

**(b)** Humans are poor at comparing angles and areas, which are the encodings pie charts rely on. This connects directly to Cleveland and McGill's encoding accuracy hierarchy from Chapter 2. Position and length are perceived more accurately than angle and area, which is why a bar chart of the same data is easier to read than a pie chart. Pie charts with more than about four or five slices become particularly hard to read because small angle differences are perceptually invisible. A pie chart is not always wrong, but it is almost always a worse choice than a horizontal bar chart for the same data.
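To see how small the angular differences in a pie chart can be, the sketch below converts category values into slice angles and prints them next to the ranking a sorted bar chart would show. The shares are made up for illustration, and `pie_angles` is a hypothetical helper.

```python
def pie_angles(values):
    """Convert category values to pie-slice angles in degrees."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return {name: 360 * v / total for name, v in values.items()}

# Made-up shares for illustration.
shares = {"A": 22, "B": 20, "C": 21, "D": 19, "E": 18}
angles = pie_angles(shares)

# Rank categories by value, as a sorted bar chart would.
ranked = sorted(shares, key=shares.get, reverse=True)
for name in ranked:
    print(f"{name}: value={shares[name]:>3}  angle={angles[name]:.1f} degrees")
```

With these numbers, neighboring slices differ by only a few degrees, while the sorted list makes the ranking explicit.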

7. Temporal data almost always appears on which axis in standard chart conventions?

(a) The y-axis, with time flowing top to bottom
(b) The x-axis, with time flowing left to right
(c) The radial axis of a polar plot
(d) The color channel

Answer

**(b)** The x-axis, with time flowing left to right. This is a strong visualization convention in left-to-right reading cultures. Time flows with reading direction, so the viewer's eye moves from past to future naturally. Violating the convention (time on the y-axis, time flowing right to left) is not technically wrong, but it fights the viewer's expectation and slows comprehension. There are exceptions — polar charts for cyclic data, vertical timelines for narrative use — but for standard time-series charts, time goes on the horizontal axis.

8. You want to show how global temperature varies across different regions of the world. The question is fundamentally "where are the hotspots?" The best chart type is:

(a) A bar chart of average temperatures by country
(b) A scatter plot of temperature vs. latitude
(c) A choropleth map
(d) A line chart with one series per continent

Answer

**(c)** A choropleth map. The question is explicitly a *spatial pattern* question — "where" is the key word. A choropleth map encodes geographic data in its native spatial form, letting the viewer see clusters, gradients, and hotspots that are invisible in non-spatial charts. A bar chart would preserve the values but lose the geography. A scatter plot of temperature vs. latitude captures one dimension of the spatial story but not all of it. A line chart by continent aggregates away the geographic detail.

9. Which of the following is the signal chart type for comparing 2024 values to a historical baseline across multiple categories (e.g., 2024 revenue vs. 2020 revenue for each product line)?

(a) Dual-axis line chart
(b) Slope chart
(c) Stacked bar chart
(d) Bubble chart

Answer

**(b)** Slope chart. A slope chart is specifically designed for two-time-point comparisons across multiple categories. Each category is represented by a line connecting its baseline value to its current value, so the viewer can see both the direction (up/down) and the magnitude (steep/shallow) of change for every category in a single glance. A dual-axis line chart can imply false correlations between its two scales and is not built for a two-point comparison. A stacked bar chart loses the individual category trajectories. A bubble chart would add a third variable that the question does not require.
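The data behind a slope chart is one line segment per category, from baseline to current value. A minimal sketch, assuming two dictionaries keyed by category; the function name `slope_segments` and all revenue figures are hypothetical.

```python
def slope_segments(baseline, current):
    """Pair each category's baseline and current value and describe the change."""
    segments = []
    for name in baseline:
        if name not in current:
            continue  # a slope needs both endpoints
        start, end = baseline[name], current[name]
        change = end - start
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        segments.append({"category": name, "start": start, "end": end,
                         "change": change, "direction": direction})
    # Steepest changes first, so the most notable lines are easy to find.
    return sorted(segments, key=lambda s: abs(s["change"]), reverse=True)

# Made-up revenue figures (in millions) for hypothetical product lines.
revenue_2020 = {"Enterprise": 40, "Professional": 25, "Starter": 10, "Legacy": 30}
revenue_2024 = {"Enterprise": 65, "Professional": 27, "Starter": 18, "Legacy": 12}

segments = slope_segments(revenue_2020, revenue_2024)
for s in segments:
    print(s["category"], s["start"], "->", s["end"], s["direction"])
```

Each dictionary in `segments` corresponds to one line on the chart: `start` and `end` set the endpoints, and `direction` could drive a highlight color.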

10. According to the chapter's decision tree, after you have classified your data and your question, the next step is to:

(a) Select the chart type the tool makes easiest
(b) Count your variables and check your dataset size, then apply the context check for audience, medium, and purpose
(c) Ask your manager what they want to see
(d) Look at examples in a chart gallery until one looks right

Answer

**(b)** Count your variables and check your dataset size, then apply the context check for audience, medium, and purpose. The decision tree is a four-step algorithm: (1) classify the question type, (2) classify the data type and count variables, (3) check dataset size (for overplotting and density concerns), and (4) apply the context check for audience, medium, and purpose. The context check is not optional decoration — it is the final filter that settles the choice when two or three candidates remain after the matrix narrows the field.
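The four steps can be sketched as a small function. This is an illustration under stated assumptions, not the chapter's actual matrix: the candidate lists are assembled from chart types named in this quiz, the 5,000-point threshold is a made-up placeholder, and `choose_chart` is a hypothetical name.

```python
# Candidate charts per question type. These lists are an illustrative
# assumption assembled from the quiz answers, not the chapter's full matrix.
CANDIDATES = {
    "comparison": ["bar chart", "dot plot", "lollipop chart"],
    "distribution": ["histogram", "box plot", "violin plot"],
    "relationship": ["scatter plot", "hexbin plot"],
    "composition": ["sorted bar chart", "stacked bar chart", "pie chart"],
    "change over time": ["line chart", "small multiples", "slope chart"],
    "spatial pattern": ["choropleth map"],
}

def choose_chart(question, n_points, audience_knows=()):
    """Walk the four steps: question, data (given), size, context."""
    # Steps 1-2: the question type selects the candidate list; data type and
    # variable count are assumed to have been checked by the caller.
    candidates = list(CANDIDATES[question])
    # Step 3: dataset size. The 5,000-point threshold is a made-up placeholder.
    if question == "relationship" and n_points > 5_000:
        candidates.remove("scatter plot")
    # Step 4: context check. Prefer charts the audience already reads, if any.
    familiar = [c for c in candidates if c in audience_knows]
    return (familiar or candidates)[0]

print(choose_chart("relationship", 190))
print(choose_chart("distribution", 1_000, audience_knows={"box plot"}))
```

The point of the sketch is the order of the filters: the question narrows the field first, and the context check makes the final call among the remaining candidates.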

True / False (5 questions)

11. "A dataset can be classified as a single data type (e.g., 'this is a time-series dataset') and the chart choice follows from that classification."

Answer

**False.** Every real dataset contains a *mix* of data types. The climate dataset has temporal, continuous, and spatial variables. The Meridian Corp dataset has categorical, continuous, and temporal variables. You classify each variable individually, not the dataset as a whole. The chart you choose depends on which specific variables you are plotting for a specific question — not on a single label applied to the whole dataset.
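Classifying variable by variable can be sketched with a rough heuristic based on Python value types. The column names and values below are hypothetical, loosely modeled on the Meridian Corp example, and `classify_value_type` is an assumed helper. Spatial and network variables cannot be detected from value types alone and need domain knowledge.

```python
from datetime import date, datetime

def classify_value_type(values):
    """Guess a variable's data type from its Python values (a rough heuristic)."""
    sample = [v for v in values if v is not None]
    if not sample:
        return "unknown"
    if all(isinstance(v, (date, datetime)) for v in sample):
        return "temporal"
    # bool is a subclass of int, so exclude it from the numeric check.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in sample):
        return "continuous"
    return "categorical"

# Hypothetical columns loosely modeled on the Meridian Corp example.
dataset = {
    "product_line": ["Enterprise", "Starter", "Legacy"],
    "revenue": [120.5, 40.0, 61.2],
    "quarter_end": [date(2024, 3, 31), date(2024, 6, 30), date(2024, 9, 30)],
}

types = {name: classify_value_type(col) for name, col in dataset.items()}
print(types)
```

One dataset yields three different labels here, which is why the label belongs to the variable and not to the dataset.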

12. "A bar chart is always the right choice for a comparison question, regardless of how many categories you have."

Answer

**False.** Bar charts are the signal chart type for comparison, but they become unreadable when the number of categories grows large. With 50 or 100 categories, a standard bar chart produces a forest of bars that loses individual detail. At that scale, you should consider alternatives: horizontal bar charts (more labels fit), sorted bar charts (by value, not alphabetically), dot plots (less ink per observation), lollipop charts, or grouped/binned summaries. Dataset size is one of the decision-tree inputs for exactly this reason.
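One way to produce a grouped summary for a long category list is to keep the largest few categories and fold the remainder into a single row. A minimal sketch with made-up data; `top_n_with_other` and the "Other" label are assumptions.

```python
def top_n_with_other(values, n):
    """Keep the n largest categories and fold the rest into one 'Other' row."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append(("Other", sum(v for _, v in tail)))
    return head

# 50 made-up categories with values 1..50.
values = {f"cat_{i:02d}": i for i in range(1, 51)}
rows = top_n_with_other(values, 5)
for name, value in rows:
    print(name, value)
```

The result is already sorted by value, so it can feed a horizontal bar chart directly. Whether an "Other" row is acceptable depends on the question: it hides exactly the detail that a dot plot of all 50 categories would keep.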

13. "Stacked bar charts are a good way to let viewers compare the second, third, and fourth stack segments across categories."

Answer

**False.** Stacked bar charts are good for showing *total* values and for showing the *first* segment (the one touching the baseline) across categories. Beyond the first segment, viewers cannot easily compare sizes because the segments do not share a common baseline — each segment starts wherever the previous segment ended. If comparing the middle or top segments is the point, use grouped bars, small multiples, or a dot plot.
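The baseline problem can be made concrete by computing where each segment starts and ends. A minimal sketch with made-up quarterly figures; `segment_baselines` is a hypothetical helper.

```python
def segment_baselines(stacks):
    """For each bar, return (segment, start, end) triples in stacking order."""
    layout = {}
    for bar, segments in stacks.items():
        offset = 0
        placed = []
        for name, value in segments:
            placed.append((name, offset, offset + value))
            offset += value
        layout[bar] = placed
    return layout

# Made-up quarterly revenue by product line, stacked in a fixed order.
stacks = {
    "Q1": [("Enterprise", 50), ("Professional", 30), ("Starter", 20)],
    "Q2": [("Enterprise", 70), ("Professional", 30), ("Starter", 15)],
}
layout = segment_baselines(stacks)
for bar, placed in layout.items():
    print(bar, placed)
```

In this example the Professional segment has the same value in both quarters, yet it starts at 50 in Q1 and at 70 in Q2. Only the first segment starts at zero in every bar, so only it can be compared by position.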

14. "The chart selection matrix gives you exactly one correct chart type for every combination of data type and question type."

Answer

**False.** The matrix gives you a *ranked list of candidates*, not a single answer. For most cells, two or three chart types are appropriate depending on the size of the data, the number of categories, and the context. The decision tree and the context check narrow the candidates to a final choice. Treating the matrix as a deterministic lookup table misses the role of judgment that the chapter emphasizes.

15. "Because viewers can be taught to read any chart type, audience familiarity is not an important consideration in chart selection."

Answer

**False.** Audience familiarity is one of the three contextual factors that settle the final chart choice (audience, medium, purpose). A violin plot may be the most statistically informative chart for a distribution question, but if your audience has never seen a violin plot, it will communicate less effectively than a box plot or histogram. Meeting the audience where they are is not a compromise — it is an ethical and practical obligation of communication.

Short Answer (3 questions)

16. In three to four sentences, describe the two inputs to the chart selection framework and explain why the framework requires both.

Answer

The two inputs are (1) the *type of data* you have — classified as categorical, continuous, temporal, spatial, or network — and (2) the *type of question* you are asking — classified as comparison, distribution, relationship, composition, change over time, or spatial pattern. The framework requires both because data alone does not determine the chart. The same continuous-vs-continuous dataset can support a scatter plot (relationship question), a pair of histograms (distribution question), or a pair of box plots (comparison question) depending on which question you are asking. Neither input alone is sufficient.

17. Explain why the chapter treats "bar chart vs. histogram" as one of the most common confusions in chart selection. What is the key difference, and how does it map to the question type?

Answer

Bar charts and histograms look nearly identical — both use rectangular bars of varying heights — but they answer different questions. A **bar chart** displays categorical data and answers *comparison* questions: "How does product A compare to product B?" Each bar is a separate category, and there are usually gaps between bars. A **histogram** displays continuous data and answers *distribution* questions: "How are values of a single continuous variable spread out?" The bars (properly called bins) represent intervals on a numerical scale, and they are usually drawn touching each other to emphasize continuity. Confusing them produces charts that display the data without answering the intended question — the visual form is right but the meaning is wrong.
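The difference shows up in how the bar heights are computed. The sketch below counts discrete labels for a bar chart and counts values per equal-width interval for a histogram. All data is made up, and both function names are hypothetical.

```python
from collections import Counter

def category_counts(labels):
    """Bar-chart input: one count per discrete category."""
    return dict(Counter(labels))

def histogram_bins(values, n_bins):
    """Histogram input: counts per equal-width interval over the value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are identical; bins would have zero width")
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Made-up data for illustration.
plans = ["Starter", "Enterprise", "Starter", "Legacy", "Starter"]
order_values = [0, 5, 12, 18, 22, 25, 31, 38, 40]

bars = category_counts(plans)
edges, counts = histogram_bins(order_values, 4)
print(bars)
print(edges, counts)
```

The bar-chart categories have no inherent order or width, so they can be sorted freely. The histogram bins sit on a numerical scale, so their order and edges are fixed by the data, and changing `n_bins` changes the picture.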

18. Name three common "wrong chart" mistakes discussed in the chapter and, for each, state the fix in one sentence.

Answer

(1) **Pie charts with too many slices**: fix by converting to a horizontal bar chart sorted by value, which uses position and length encodings that are more accurately perceived.

(2) **Dual-axis charts that imply false correlations**: fix by using small multiples (separate panels with independent but labeled axes) or by normalizing both series to a common scale so they can share a single axis.

(3) **3D bar charts**: fix by removing the third dimension and using a flat 2D bar chart, which eliminates the perspective distortions that undermine every comparison in the original.

Other acceptable answers include: spaghetti line charts (fix with small multiples or the highlight strategy); stacked bar charts used for segment comparison (fix with grouped bars); a chart type that does not match the question (fix by going back to the question classification).

Applied Scenarios (2 questions)

19. You are preparing a slide for Meridian Corp's executive team. The slide should show the revenue breakdown by product line (Enterprise, Professional, Starter, Growth, Legacy) for the most recent quarter, so executives can see which product lines are driving revenue and which are lagging. You have about 5 seconds of attention from each executive before they move to the next slide.

(a) Classify the data type and question type for this visualization task.
(b) Recommend a chart type and justify your choice using the framework.
(c) Explain why you would not use a pie chart for this task, even though pie charts are traditionally used for composition questions.

Answer

**(a)** The data type is categorical (product line) plus continuous (revenue). The question is a *comparison* question first ("which product lines are driving revenue and which are lagging") with a *composition* flavor ("breakdown by product line"). The comparison interpretation is stronger because the executives need to rank the product lines, not just see proportions.

**(b)** A **horizontal bar chart sorted by revenue**, with product lines on the y-axis and revenue on the x-axis. The horizontal orientation accommodates the category labels without rotation. Sorting by value lets the executive's eye travel from top (highest revenue) to bottom (lowest) in a natural reading order. With only five categories and a single continuous metric, this chart can be read and understood in well under five seconds.

**(c)** A pie chart would rely on angle and area encoding, which Cleveland and McGill's research shows are less accurately perceived than length on a common scale. With five slices, executives would struggle to tell which is larger when two slices are similar in size. A bar chart lets the comparison happen pre-attentively: the bars are sorted, the lengths are directly comparable, and the ranking is obvious at a glance. Under a 5-second constraint, a pie chart's perceptual inefficiency is a real cost, not a stylistic preference.
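The recommended layout can be previewed without a plotting library. The sketch below sorts the product lines by revenue and renders each as a row of characters scaled to the largest value. The revenue figures are made up, not Meridian Corp's, and `text_bar_chart` is a hypothetical helper that assumes positive values.

```python
def text_bar_chart(values, width=40):
    """Render a horizontal bar chart as text, sorted from largest to smallest."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    largest = ranked[0][1]
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in ranked:
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} | {bar} {value}")
    return lines

# Made-up quarterly revenue (in millions); not Meridian Corp's actual figures.
revenue = {"Enterprise": 48, "Professional": 31, "Starter": 12,
           "Growth": 24, "Legacy": 9}

chart = text_bar_chart(revenue)
print("\n".join(chart))
```

All bars start at the same left edge, so the ranking reads top to bottom and each length can be compared directly against the others.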

20. A public health analyst is investigating whether vaccination rates are correlated with GDP per capita across countries. She has data for 190 countries, including vaccination coverage (continuous), GDP per capita (continuous), region (categorical), and income group (categorical: low, lower-middle, upper-middle, high).

(a) Classify the question type and identify the primary two variables.
(b) Recommend the best base chart type for the core question.
(c) Suggest one way to incorporate the additional variables (region, income group) into the visualization to enrich the story without creating chart junk.
(d) Identify one risk of overplotting with 190 data points and suggest a mitigation.

Answer

**(a)** The question type is **relationship / correlation**. The primary two variables are vaccination coverage and GDP per capita, both continuous.

**(b)** A **scatter plot** with GDP per capita on the x-axis (often log-transformed because GDP is highly skewed) and vaccination coverage on the y-axis. The scatter plot is the signal chart type for relationship questions between two continuous variables and will make any correlation, cluster, or outlier visible.

**(c)** Use **color** to encode income group (an ordinal categorical variable, which suits a sequential palette from low to high) and optionally **shape or faceting** to encode region. Color on income group adds a third dimension without distorting the base chart. If four income groups feel too dense in a single panel, faceting by region into small multiples is a clean alternative. Avoid adding bubbles, 3D effects, or additional axes.

**(d)** With 190 points, risks include overplotting in dense regions (countries with similar GDP and vaccination rates overlapping) and the scatter losing individual country identity. Mitigations: (1) use semi-transparent markers so overlapping points are visible; (2) label outliers directly rather than relying on a legend; (3) use log-scaled axes if the data is skewed; (4) consider hex bins or density contours if the overplotting is severe. For 190 points, semi-transparent markers and direct labeling of outliers are usually sufficient.
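Two of these mitigations reduce to short formulas. The sketch below assumes standard alpha compositing, under which n identical markers with opacity alpha stack to a combined opacity of 1 - (1 - alpha)^n, and it applies a base-10 log transform to GDP. The GDP values are made up, and both function names are hypothetical.

```python
from math import log10

def stacked_opacity(alpha, n_overlapping):
    """Combined opacity of n identical semi-transparent markers drawn on top of each other."""
    return 1 - (1 - alpha) ** n_overlapping

def log_scale(values):
    """Log10-transform strictly positive values, e.g. GDP per capita."""
    if any(v <= 0 for v in values):
        raise ValueError("log scale requires strictly positive values")
    return [log10(v) for v in values]

# Made-up GDP per capita values in US dollars, for illustration only.
gdp = [1_000, 10_000, 100_000]
print(log_scale(gdp))

for n in (1, 2, 5):
    print(n, round(stacked_opacity(0.3, n), 3))
```

Because combined opacity rises with the number of overlapping markers, dense regions render darker than sparse ones, which is what makes semi-transparency a usable density cue. The log transform spaces the three GDP values evenly even though each is ten times the previous one.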

Review your results against the mastery thresholds at the top of the quiz. If you scored below 14, revisit Sections 5.2 through 5.8 before starting Chapter 6. The Data-Ink Ratio chapter that follows assumes you can already name the right chart type for a given question in under ten seconds.