Quiz: Multi-Variable Exploration

Answer all 20 questions. Answers and explanations appear below each question.


Part I: Multiple Choice (10 questions)

Q1. What does `sns.pairplot` display on the diagonal panels by default?

A) Scatter plots of each variable against itself
B) Histograms of each variable (or KDEs when `hue` is set)
C) Empty cells
D) Correlation values

Answer **B.** The diagonal shows the distribution of each variable: a histogram by default, switching to a KDE when `hue` is set. The off-diagonal panels are the scatter plots.

Q2. Which seaborn function provides lower-level control over a pair plot, allowing different functions on the upper and lower triangles?

A) `sns.pairplot`
B) `sns.PairGrid`
C) `sns.FacetGrid`
D) `sns.jointplot`

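Answer **B.** `PairGrid` exposes `map_upper`, `map_lower`, and `map_diag` methods, so each region of the grid can use a different plotting function. `pairplot` is a convenience wrapper around `PairGrid` that draws one kind of plot on every off-diagonal panel and a univariate distribution on the diagonal.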

Q3. A joint plot with `kind="hex"` is most useful when:

A) The dataset has fewer than 20 points
B) The dataset is so large that a scatter plot would suffer from overplotting
C) You want to see a regression line through the data
D) The variables are categorical

Answer **B.** Hex bins aggregate points into hexagonal cells and color by count, avoiding overplotting. This makes them valuable for large datasets where a scatter plot would become a solid cloud.

Q4. Which colormap is appropriate for a correlation heatmap?

A) `viridis`
B) `magma`
C) `coolwarm`
D) `rocket`

Answer **C.** Correlation has a meaningful midpoint (zero), so a diverging colormap like `coolwarm` is appropriate. `viridis`, `magma`, and `rocket` are sequential colormaps suitable for data without a natural midpoint.

Q5. What does the `mask` parameter of `sns.heatmap` do?

A) Changes the colormap
B) Hides specific cells where the mask is `True`
C) Rotates the axis labels
D) Annotates the cells with numeric values

Answer **B.** The `mask` parameter accepts a boolean array (or DataFrame). Cells where the mask is True are hidden from the display. This is commonly used to blank out the upper triangle of a symmetric correlation matrix.

Q6. Which NumPy function is typically used to build a mask for the upper triangle of a correlation matrix?

A) `np.ones`
B) `np.zeros`
C) `np.triu`
D) `np.diag`

Answer **C.** `np.triu` returns the upper triangle of an array; combined with `np.ones_like` and a boolean cast, it produces a mask that hides everything at or above the diagonal.

Q7. In `sns.clustermap`, the dendrogram encodes:

A) The color palette for each row
B) The hierarchical clustering of the rows/columns
C) The legend entries
D) The correlation values themselves

Answer **B.** The dendrogram is a tree diagram showing how clusters are merged by hierarchical clustering. The height of each internal node represents the distance at which two clusters were joined.

Q8. Which hierarchical clustering method minimizes within-cluster variance?

A) `single`
B) `complete`
C) `average`
D) `ward`

Answer **D.** At each step, Ward's method merges the pair of clusters whose union produces the smallest increase in total within-cluster variance. It often produces more interpretable clusters than single or complete linkage for correlation matrices.
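
A small sketch of Ward linkage in a cluster map follows. The four columns (`a1`, `a2`, `b1`, `b2`) are synthetic and hypothetical, built so that the two `a` variables and the two `b` variables move together. Note that `clustermap` here clusters the rows of the correlation matrix using Euclidean distance between correlation profiles, which is its default metric.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Two hidden signals, each observed twice with noise (hypothetical columns).
rng = np.random.default_rng(2)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
df = pd.DataFrame({
    "a1": base_a + rng.normal(scale=0.3, size=200),
    "b1": base_b + rng.normal(scale=0.3, size=200),
    "a2": base_a + rng.normal(scale=0.3, size=200),
    "b2": base_b + rng.normal(scale=0.3, size=200),
})
corr = df.corr()

# method="ward" selects Ward linkage; the remaining keywords go to the heatmap.
g = sns.clustermap(corr, method="ward", cmap="coolwarm", vmin=-1, vmax=1)

# Row order after clustering: related variables end up next to each other.
order = [corr.index[i] for i in g.dendrogram_row.reordered_ind]
```

In `order`, `a1` sits next to `a2` and `b1` next to `b2`, even though the columns were interleaved in the input.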

Q9. If a correlation heatmap shows a value of zero between two variables, what does this definitively tell you?

A) The variables are statistically independent
B) The variables have no relationship
C) The variables have no linear relationship
D) One variable causes the other

Answer **C.** Correlation (Pearson) measures the strength of the *linear* relationship. Two variables can be strongly non-linearly related (e.g., y = x² with x distributed symmetrically around zero) and still have zero correlation. Zero correlation does not imply independence or the absence of any relationship.
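
The y = x² case can be checked numerically. This sketch assumes an evenly spaced grid of x values that is symmetric around zero; the second half shows that the zero disappears once x is restricted to positive values.

```python
import numpy as np

# x is symmetric around zero, and y is completely determined by x.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
r_symmetric = np.corrcoef(x, y)[0, 1]

# Same relationship, but x covers only the positive half.
x_pos = np.linspace(0.0, 1.0, 201)
r_positive = np.corrcoef(x_pos, x_pos ** 2)[0, 1]

print(r_symmetric, r_positive)
```

`r_symmetric` is zero up to floating-point error, while `r_positive` is strongly positive: the relationship is identical, and only the sampled range of x changed.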

Q10. When would you use `sns.jointplot` instead of `sns.pairplot`?

A) When you want to see all pairs at once
B) When you want to focus on a single pair and see its marginal distributions
C) When you need a 3D plot
D) When you have only categorical variables

Answer **B.** `jointplot` focuses on a single pair and adds that pair's marginal distributions on the top and right. `pairplot` shows all pairs at once, with each variable's distribution confined to the diagonal. Use `jointplot` to zoom in, `pairplot` to zoom out.

Part II: Short Answer (10 questions)

Q11. Name three plot kinds accepted by the `kind` parameter of `sns.jointplot`.

Answer Any three of: `"scatter"`, `"kde"`, `"hist"`, `"hex"`, `"reg"`, `"resid"`. `scatter` is the default; `hex` is valuable for large datasets; `reg` adds a regression line; `resid` plots the residuals of that regression; `kde` shows bivariate density; `hist` shows a 2D histogram.

Q12. What is the purpose of setting `vmin=-1` and `vmax=1` when calling `sns.heatmap` on a correlation matrix?

Answer It fixes the colormap's range to match the mathematical range of correlation (−1 to +1). Without this, the heatmap autoscales to the range of the values actually plotted: if the strongest displayed correlation were 0.5 (for example, with the diagonal masked), that value would take the most intense color, visually exaggerating its magnitude.
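
The effect of fixing the range can be inspected by reading the color limits back from the drawn heatmap. This is a sketch on synthetic data; the columns `x`, `y`, and `z` are hypothetical, and reading limits via `ax.collections[0]` relies on the heatmap being the first artist drawn on each axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical columns) with one moderate correlation.
rng = np.random.default_rng(3)
x = rng.normal(size=300)
df = pd.DataFrame({
    "x": x,
    "y": 0.5 * x + rng.normal(size=300),
    "z": rng.normal(size=300),
})
corr = df.corr()

fig, (ax_auto, ax_fixed) = plt.subplots(1, 2, figsize=(9, 4))
sns.heatmap(corr, cmap="coolwarm", ax=ax_auto)                     # autoscaled
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_fixed)   # fixed range

# The color limits each heatmap actually used.
auto_limits = ax_auto.collections[0].get_clim()
fixed_limits = ax_fixed.collections[0].get_clim()
print(auto_limits, fixed_limits)
```

The autoscaled panel starts its color scale at the smallest correlation present in the data, while the fixed panel always spans −1 to +1, so colors stay comparable across different heatmaps.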

Q13. Describe Shneiderman's information-seeking mantra and how it applies to multi-variable exploration.

Answer "Overview first, zoom and filter, details on demand." In multi-variable work: start with a heatmap or pair plot for overview; zoom in with a joint plot or filtered pair plot; inspect specific rows of the DataFrame for details. The mantra is a workflow, not a rule — it prevents the beginner mistake of diving into details without first understanding the shape of the dataset.

Q14. Explain why `sns.pairplot` with more than about eight variables becomes impractical.

Answer With n variables, the pair plot has n² panels. At n=8 the plot has 64 panels; at n=20 it has 400. Each panel shrinks proportionally, becoming too small to reveal the shape of the relationships. The labels on the axes also collide. Beyond eight variables, a correlation heatmap is a more compact alternative.
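
The panel arithmetic is easy to tabulate. In this sketch the 12-inch figure width is an assumed value chosen only to make the shrinking panels concrete, and both helper functions are hypothetical names.

```python
def pairplot_panels(n_vars: int) -> int:
    """Number of panels in a full pair plot of n_vars variables."""
    return n_vars ** 2


def panel_width_inches(n_vars: int, figure_width: float = 12.0) -> float:
    """Width of one panel if the whole grid must fit in figure_width inches."""
    return figure_width / n_vars


for n in (4, 8, 20):
    print(n, pairplot_panels(n), round(panel_width_inches(n), 2))
```

Under that assumption, each panel is 1.5 inches wide at eight variables and 0.6 inches wide at twenty.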

Q15. What is the difference between a diverging colormap and a sequential colormap? Give an example of when to use each.

Answer A **diverging colormap** uses two hues diverging from a neutral midpoint. It is appropriate for data with a meaningful midpoint — correlation (centered at zero), deviations from a baseline, z-scores. A **sequential colormap** uses a single hue progression from low to high. It is appropriate for data without a midpoint — counts, magnitudes, density, intensity. Example: correlation → `coolwarm`; counts → `viridis`.
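
One way to see the neutral midpoint is to sample both colormaps at their halfway point and compare the RGB channels. This is an illustrative check rather than a formal definition; `channel_spread` is a hypothetical helper, where a small spread means the color is close to gray.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def channel_spread(rgba) -> float:
    """Difference between the largest and smallest RGB channel (0 = pure gray)."""
    r, g, b, _alpha = rgba
    return max(r, g, b) - min(r, g, b)


coolwarm_mid = plt.get_cmap("coolwarm")(0.5)  # midpoint of a diverging map
viridis_mid = plt.get_cmap("viridis")(0.5)    # midpoint of a sequential map

print(channel_spread(coolwarm_mid), channel_spread(viridis_mid))
```

The diverging map passes through a near-gray color at its center, which is what lets it mark a meaningful midpoint; the sequential map is fully saturated there because its center has no special meaning.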

Q16. In a cluster map, what does the height of an internal node in the dendrogram represent?

Answer The height represents the distance (or dissimilarity) at which the two child clusters were merged. Low heights mean the clusters were already similar when merged; high heights mean the merge occurred at a large dissimilarity. Groups of leaves that share a low internal node are tight clusters.
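
Merge heights can be read directly from a SciPy linkage matrix, which uses the same kind of hierarchical clustering that a cluster map draws. This sketch uses three made-up one-dimensional points and single linkage, chosen only because the distances are easy to verify by hand.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Three observations on a line: two close together, one far away.
points = np.array([[0.0], [1.0], [10.0]])

# Each row of Z describes one merge; column 2 is the merge distance,
# which is the height of the corresponding dendrogram node.
Z = linkage(points, method="single")
merge_heights = Z[:, 2]
print(merge_heights)
```

The first merge joins the points at 0 and 1 at height 1; the second joins that pair with the point at 10 at height 9, the shortest gap between the two groups.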

Q17. Write the code to build a mask that hides everything above the diagonal (including the diagonal itself) of a correlation matrix `corr`.

Answer
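`mask = np.triu(np.ones_like(corr, dtype=bool))`

This assumes `import numpy as np`. `np.ones_like(corr, dtype=bool)` builds an all-`True` matrix with the shape of `corr`, and `np.triu` keeps `True` on and above the diagonal while setting everything below it to `False`. Pass `k=1` to `np.triu` to leave the diagonal out of the mask, so that it stays visible.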
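
To make the two variants concrete, here is a sketch on a hand-written 3 × 3 matrix; the correlation values are invented for illustration.

```python
import numpy as np

# A small made-up correlation matrix.
corr = np.array([
    [1.0, 0.8, -0.3],
    [0.8, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
])

# True on and above the diagonal: hides the diagonal and the upper triangle.
mask_with_diag = np.triu(np.ones_like(corr, dtype=bool))

# True strictly above the diagonal: the diagonal stays visible.
mask_above_only = np.triu(np.ones_like(corr, dtype=bool), k=1)

print(mask_with_diag)
print(mask_above_only)
```

Either array can be passed as `mask=` to `sns.heatmap`; cells where the mask is `True` are not drawn.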

Q18. Why might a correlation heatmap mislead you about the relationship between two variables?

Answer Correlation measures only the linear component of a relationship. Non-linear relationships (quadratic, threshold, oscillating) can have zero correlation despite being deterministic. Additionally, correlation is sensitive to outliers — a few extreme points can inflate or deflate the coefficient. Always verify strong (and suspiciously weak) correlations with a scatter plot.
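
The outlier sensitivity can be shown with a handful of hand-picked points. In this sketch the four base points are arranged to have exactly zero correlation, and the outlier at (10, 10) is an arbitrary choice.

```python
import numpy as np

# Four points at the corners of a square: no linear trend at all.
x = np.array([-1.0, -1.0, 1.0, 1.0])
y = np.array([-1.0, 1.0, -1.0, 1.0])
r_clean = np.corrcoef(x, y)[0, 1]

# Add a single extreme point far from the rest.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(r_clean, r_outlier)
```

One added point moves the coefficient from 0 to 80/84, roughly 0.95. A scatter plot would show immediately that the apparent correlation rests on a single observation.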

Q19. Describe the two-stage workflow for exploring a dataset with 30 numeric variables.

Answer Stage 1: correlation heatmap (or cluster map) of all 30 variables to identify which pairs correlate strongly and which groups of variables move together. Stage 2: pick a small subset (5–8) of the most interesting variables and produce a pair plot for detailed inspection. Follow up the strongest pair with a joint plot. The heatmap does the filtering; the pair plot and joint plot do the deep inspection.
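
The filtering step of this workflow can be sketched as a small helper that ranks variable pairs by absolute correlation. Everything here is an assumption made for illustration: `strongest_pairs` is a hypothetical function, the 30 columns `v0` to `v29` are synthetic noise, and `v1` is deliberately constructed to track `v0`.

```python
import numpy as np
import pandas as pd


def strongest_pairs(df: pd.DataFrame, top: int = 5) -> list[tuple[str, str, float]]:
    """Return the `top` variable pairs with the largest absolute correlation."""
    corr = df.corr()
    # Indices strictly above the diagonal, so each pair appears once.
    rows, cols = np.triu_indices(len(corr), k=1)
    values = corr.to_numpy()[rows, cols]
    order = np.argsort(-np.abs(values))[:top]
    return [
        (corr.index[rows[i]], corr.columns[cols[i]], float(values[i]))
        for i in order
    ]


# Stage 1: 30 synthetic variables, with one strong relationship planted.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(200, 30)), columns=[f"v{i}" for i in range(30)])
df["v1"] = 2.0 * df["v0"] + rng.normal(scale=0.1, size=200)

pairs = strongest_pairs(df, top=5)

# Stage 2 input: the variables that appear in the strongest pairs.
subset = sorted({name for a, b, _ in pairs for name in (a, b)})
print(pairs[0], subset)
```

From here, `sns.pairplot(df[subset])` would give the detailed view of the shortlisted variables, and `sns.jointplot` on the two names in `pairs[0]` would follow up the strongest pair.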

Q20. What is the purpose of the `center` parameter in `sns.heatmap`, and when does it matter?

Answer `center` tells the colormap where to place its neutral color. For correlation heatmaps, `center=0` ensures that zero correlation maps to the neutral color of a diverging colormap, so positive and negative correlations appear symmetrically. Without `center=0` (and without symmetric `vmin`/`vmax`), the neutral color falls at the midpoint of the plotted data's range rather than at zero, creating visual asymmetry that does not reflect the mathematical structure of correlation.
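
The asymmetry can be worked out with plain arithmetic. The sketch below models ordinary linear color scaling between two limits; it is not seaborn's internal code, both helper functions are hypothetical, and the limits −0.2 and 1.0 are an invented example of an autoscaled correlation matrix.

```python
def neutral_value(vmin: float, vmax: float) -> float:
    """Data value that linear scaling places at the middle of the colormap."""
    return (vmin + vmax) / 2.0


def colormap_position(value: float, vmin: float, vmax: float) -> float:
    """Position of `value` along the colormap, from 0.0 (low end) to 1.0 (high end)."""
    return (value - vmin) / (vmax - vmin)


# Autoscaled limits for a matrix whose correlations run from -0.2 to 1.0.
auto_neutral = neutral_value(-0.2, 1.0)
auto_zero_position = colormap_position(0.0, -0.2, 1.0)

# Symmetric limits put zero exactly in the middle.
fixed_zero_position = colormap_position(0.0, -1.0, 1.0)

print(auto_neutral, auto_zero_position, fixed_zero_position)
```

With the autoscaled limits the neutral color lands on a correlation of 0.4, and a correlation of zero sits about one sixth of the way along the colormap, well inside the "negative" hue. Centering at zero, or using symmetric limits, moves it back to the middle.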

Scoring Rubric

| Score | Level | Meaning |
|---|---|---|
| 18–20 | Mastery | You understand the multi-variable toolkit and can pick the right tool for each question. |
| 14–17 | Proficient | You know the main tools and their basic parameters; review the pitfalls and clustering sections. |
| 10–13 | Developing | You grasp the big picture; re-read Sections 19.3–19.8 and redo Part B of the exercises. |
| < 10 | Review | Re-read the full chapter and work through all Part A and Part B exercises before moving on. |

After this quiz, move on to Chapter 20 (Plotly Express), which introduces interactive multi-variable exploration with a similar but web-native API.