Glossary

Key terms used throughout this book, listed alphabetically. Chapter references in parentheses indicate where the term is first introduced or discussed in depth.


Adjacency matrix. A square matrix representation of a graph where entry (i, j) indicates an edge between nodes i and j. Used as an alternative to node-link diagrams for dense networks. (Ch 24)
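
As a minimal sketch of the idea, assuming nodes are numbered from 0 and edges arrive as a list of (i, j) pairs, an adjacency matrix can be built in plain Python. The function name and the four-node example graph are illustrative and are not taken from Chapter 24.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Return a num_nodes x num_nodes list-of-lists adjacency matrix."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        if not directed:
            matrix[j][i] = 1  # undirected graphs give a symmetric matrix
    return matrix


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
matrix = adjacency_matrix(4, edges)
for row in matrix:
    print(row)
```

Passing the resulting matrix to a heatmap-style plot is one common way to draw it.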

Aesthetic mapping. The assignment of data variables to visual properties such as position, color, size, or shape. A core concept in the grammar of graphics. (Ch 22)

Affordance. A visual property that suggests how an element can be interacted with, such as a button-like shape indicating clickability in a dashboard. (Ch 29)

Alpha (transparency). A value between 0 (fully transparent) and 1 (fully opaque) that controls how much underlying content shows through a plotted element. (Ch 10)
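
The effect of alpha can be sketched with the standard "over" compositing formula, assuming an opaque background and RGB channels scaled to the range 0 to 1. The helper name `blend` is hypothetical.

```python
def blend(foreground, background, alpha):
    """Composite an RGB foreground over an opaque RGB background."""
    return tuple(alpha * f + (1 - alpha) * b
                 for f, b in zip(foreground, background))


red = (1.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)

half = blend(red, white, 0.5)  # half-transparent red on white reads as pink
print(half)
```

With alpha at 0 the background is returned unchanged, and with alpha at 1 the foreground fully covers it.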

Altair. A declarative statistical visualization library for Python built on Vega-Lite. Users specify what to plot rather than how to draw it. (Ch 22)

Animation. A sequence of frames showing how a visualization changes over time or across a parameter. In matplotlib, created using FuncAnimation or ArtistAnimation. (Ch 15)

Annotation. Text, arrows, or markers added to a chart to highlight or explain specific data points or regions. (Ch 7)

Area chart. A line chart with the region between the line and the baseline filled with color. Emphasizes volume or magnitude over time. (Ch 11)

Artist. In matplotlib, any object that can be drawn on a Figure: lines, text, patches, images, and Axes themselves. All visual elements are Artists. (Ch 10)

Axes. In matplotlib, the rectangular area within a Figure where data is plotted. Contains the x-axis, y-axis, title, and all plotted elements. Not to be confused with "axis." (Ch 10)

Axis. A single dimension of a plot (x-axis or y-axis) that maps data values to spatial positions via a scale. (Ch 10)

Backend. The rendering engine that matplotlib uses to produce output. Common backends include Agg (raster), PDF, SVG, and interactive backends for Jupyter. (Ch 10)

Bar chart. A chart using rectangular bars whose lengths are proportional to the values they represent. Bars can be vertical or horizontal. (Ch 5, Ch 11)

Binning. The process of grouping continuous data into discrete intervals (bins). Used in histograms and hexbin plots. (Ch 11)
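
A rough illustration of equal-width binning in plain Python follows. It assumes the caller supplies the range and the number of bins, and it places the top edge in the last bin; plotting libraries may make different edge choices.

```python
def bin_counts(values, num_bins, low, high):
    """Count how many values fall into each of num_bins equal-width bins."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v <= high:
            # The top edge belongs to the last bin, so clamp the index.
            index = min(int((v - low) / width), num_bins - 1)
            counts[index] += 1
    return counts


values = [1, 2, 2, 3, 5, 7, 8, 9, 10]
counts = bin_counts(values, num_bins=5, low=0, high=10)
print(counts)  # one count per bin: [0,2), [2,4), [4,6), [6,8), [8,10]
```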

Box plot (box-and-whisker plot). A chart summarizing a distribution by its median, its interquartile range (the box), whiskers extending to the most extreme non-outlying values, and individually plotted outliers. Useful for comparing distributions across groups. (Ch 11, Ch 17)

Bubble chart. A scatter plot where a third variable is encoded as the size (area) of each point. (Ch 14)
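
Because the value is encoded as area, the radius must grow with the square root of the value. The sketch below assumes a hypothetical helper, `bubble_radius`, that scales against the largest value in the data.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Radius such that circle area is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


r_full = bubble_radius(100, max_value=100, max_radius=20)
r_quarter = bubble_radius(25, max_value=100, max_radius=20)

# A quarter of the value gives half the radius, hence a quarter of the area.
print(r_full, r_quarter)
```

Scaling the radius linearly with the value instead would exaggerate large values, since area grows with the square of the radius.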

Callback. In Dash, a Python function that is automatically triggered when an input component changes, updating one or more output components. (Ch 30)

Cartogram. A map in which geographic regions are distorted in size to represent a data variable rather than land area. (Ch 23)

Categorical data. Data that falls into distinct groups or labels with no inherent numeric order (e.g., country, product type). (Ch 5)

Chartjunk. Non-data visual elements that clutter a chart without adding information: heavy gridlines, 3D effects, decorative images. Term coined by Edward Tufte. (Ch 6)

Choropleth. A thematic map in which geographic regions are shaded or colored to represent data values. (Ch 23)

Cividis. A perceptually uniform colormap optimized for readers with deuteranopia (red-green color blindness). (Ch 3)

Colorbar. A legend that maps colors in a colormap to data values. Typically displayed as a gradient strip beside a heatmap or scatter plot. (Ch 12)

Colorblind-safe. A palette or design that remains distinguishable to people with common forms of color vision deficiency (protanopia, deuteranopia, tritanopia). (Ch 3)

Colormap (cmap). A function that maps scalar values to colors. Sequential colormaps go from light to dark; diverging colormaps have a neutral midpoint. (Ch 3, Ch 12)
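
The "function from scalar to color" idea can be sketched as a linear blend between two RGB endpoints. This is a simplification: library colormaps are generally built from many more control points, and the name `make_colormap` is hypothetical.

```python
def make_colormap(start, end):
    """Return a function mapping t in [0, 1] to an RGB tuple between start and end."""
    def cmap(t):
        t = min(max(t, 0.0), 1.0)  # clamp out-of-range inputs
        return tuple(s + t * (e - s) for s, e in zip(start, end))
    return cmap


# Sequential example: white at the low end, dark blue at the high end.
white_to_navy = make_colormap((1.0, 1.0, 1.0), (0.0, 0.0, 0.5))

print(white_to_navy(0.0))
print(white_to_navy(0.5))
print(white_to_navy(1.0))
```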

Constrained layout. A matplotlib layout engine that automatically adjusts subplot positions to prevent overlapping labels and titles. (Ch 13)

Coordinate system. The framework that maps data values to positions in the plot area. Common systems include Cartesian (x, y), polar (r, theta), and geographic (lat, lon). (Ch 22)

Dash. A Python framework for building production-grade analytical web applications, built on Plotly, Flask, and React. (Ch 30)

Dashboard. An interactive application that combines multiple visualizations, filters, and summary statistics into a single view. (Ch 29, Ch 30)

Data-ink ratio. The proportion of a chart's ink (pixels) devoted to displaying data versus non-data elements. Higher ratios generally produce clearer charts. Concept from Edward Tufte. (Ch 6)

Datashader. A Python library for rendering very large datasets (millions to billions of points) by rasterizing data into fixed-size images. (Ch 28)

Declarative visualization. A paradigm where the user specifies the desired result (mappings, marks, encodings) and the library determines the rendering details. Altair and Plotly Express follow this approach. (Ch 22)

Density plot (KDE plot). A smoothed estimate of a variable's probability density function, produced by kernel density estimation. (Ch 17)

Diverging palette. A color scheme with two contrasting hues meeting at a neutral midpoint, used for data with a meaningful center (e.g., zero, average). (Ch 3)
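
For the neutral midpoint to line up with the meaningful center, data values are usually normalized symmetrically around that center before being passed to the palette. A minimal sketch, assuming a symmetric range and a hypothetical helper name:

```python
def centered_norm(value, center, half_range):
    """Map value to [0, 1] so that `center` lands exactly on 0.5."""
    t = 0.5 + (value - center) / (2 * half_range)
    return min(max(t, 0.0), 1.0)  # clamp values beyond the range


# Temperature anomalies around zero, with a range of plus or minus 10.
for anomaly in (-10, 0, 10, 25):
    print(anomaly, centered_norm(anomaly, center=0, half_range=10))
```

The normalized value can then be fed to any function that maps 0 to one hue, 0.5 to the neutral color, and 1 to the other hue.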

Dot plot. A chart using positioned dots to represent values. Often preferred over bar charts for comparing values because the eye judges position more accurately than length. (Ch 5)

DPI (dots per inch). A measure of resolution. Screen output typically uses 72 to 150 DPI; print output uses 300 to 600 DPI. (Ch 33)
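
The arithmetic behind DPI is simply physical size times resolution. The sketch below assumes the figure size is given in inches, as it is for matplotlib's figsize; the helper name is hypothetical.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of an image with the given physical size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)


screen = pixel_size(6, 4, dpi=100)
print_quality = pixel_size(6, 4, dpi=300)
print(screen, print_quality)
```

The same 6 by 4 inch figure therefore needs nine times as many pixels at 300 DPI as at 100 DPI.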

Dual axis. A chart with two different y-axes sharing the same x-axis. Generally discouraged because it invites misinterpretation of the relationship between the two series. (Ch 4)

Encoding. The visual property (position, length, angle, color, size, shape) used to represent a data variable. (Ch 2)

Faceting. Splitting a dataset by a categorical variable and creating one subplot per group, enabling comparison. Also called small multiples or trellis plots. (Ch 8, Ch 16)

Figure. In matplotlib, the top-level container that holds one or more Axes. Corresponds to the entire image or window. (Ch 10)

Fill between. A matplotlib technique for shading the area between two curves, commonly used to show confidence intervals or ranges. (Ch 12)

Funnel chart. A chart showing progressive reduction through stages of a process (e.g., website visitors to purchasers). (Ch 21)

Geospatial visualization. Any chart that displays data on a map or in geographic coordinates. (Ch 23)

Gestalt principles. A set of perceptual rules (proximity, similarity, closure, continuity, enclosure) describing how humans group visual elements. (Ch 2)

Grammar of graphics. A theoretical framework decomposing a chart into data, aesthetics, geometries, scales, coordinates, and facets. Originated by Leland Wilkinson. (Ch 22)

GridSpec. A matplotlib class for creating complex subplot layouts with rows and columns of varying sizes. (Ch 13)
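
One possible layout with uneven rows and columns is sketched below, using `Figure.add_gridspec` with `width_ratios` and `height_ratios`. The panel arrangement and the data are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))
grid = fig.add_gridspec(2, 2, width_ratios=[3, 1], height_ratios=[1, 3])

ax_main = fig.add_subplot(grid[1, 0])                  # large lower-left panel
ax_top = fig.add_subplot(grid[0, 0], sharex=ax_main)   # short strip above it
ax_side = fig.add_subplot(grid[1, 1], sharey=ax_main)  # narrow strip beside it

ax_main.scatter([1, 2, 3, 4], [2, 1, 4, 3])
num_axes = len(fig.axes)
plt.close(fig)
```

The upper-right cell is deliberately left empty; a GridSpec does not require every cell to hold an Axes.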

Heatmap. A matrix visualization where cell color represents magnitude. Used for correlation matrices, pivot tables, and confusion matrices. (Ch 14, Ch 18)

Hexbin plot. A scatter-plot alternative for large datasets that bins points into hexagonal cells and maps count (or another aggregate) to color. (Ch 11)

Histogram. A chart that divides a continuous variable into bins and displays the count or frequency of observations in each bin as bars. (Ch 11)

Hover tooltip. A popup that appears when a user moves the cursor over a data point in an interactive chart, displaying additional information. (Ch 20)

Hue. In color theory, the attribute that distinguishes red from blue from green. In seaborn, the hue parameter maps a categorical variable to color. (Ch 3, Ch 16)

Imperative visualization. A paradigm where the user issues step-by-step drawing commands (draw a line, add a label). matplotlib follows this approach. (Ch 10)

Interquartile range (IQR). The range between the 25th and 75th percentiles of a distribution. The "box" in a box plot spans the IQR. (Ch 17)
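
A small worked example using only the standard library follows. The data are invented, the "inclusive" quantile method is one of several conventions, and the 1.5 x IQR fences are the common Tukey rule for flagging outliers rather than part of the IQR definition itself.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey fences: points beyond them are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, outliers)
```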

Jitter. Small random displacement added to data points to reduce overplotting, especially with discrete or rounded values. (Ch 18)
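
Jitter can be sketched as uniform noise added to each value. The helper name, the noise width, and the fixed seed (used only to make the example reproducible) are all assumptions.

```python
import random


def jitter(values, amount, seed=None):
    """Return values displaced by uniform noise in [-amount, +amount]."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Survey ratings on a 1-to-3 scale: many points would otherwise coincide.
ratings = [1, 1, 1, 2, 2, 3, 3, 3, 3]
jittered = jitter(ratings, amount=0.2, seed=42)
print(jittered)
```

The amount should stay well below the spacing between discrete levels so that points remain visually attached to their true value.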

KDE (kernel density estimation). A non-parametric method for estimating the probability density function of a continuous variable. (Ch 17)
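
The estimate is an average of kernel "bumps", one centered on each observation. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth; library implementations typically choose the bandwidth automatically. The function name and sample data are illustrative.

```python
import math


def make_gaussian_kde(samples, bandwidth):
    """Return a density function: the mean of Gaussian bumps at each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


density = make_gaussian_kde([1.0, 1.5, 2.0, 4.0], bandwidth=0.5)

# Crude Riemann sum as a sanity check that the density integrates to about 1.
step = 0.01
area = sum(density(-5 + k * step) * step for k in range(1500))
print(round(area, 3), density(1.5) > density(3.0))
```

A smaller bandwidth gives a spikier curve and a larger one a smoother curve.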

Layout engine. The system responsible for positioning subplots, labels, and legends within a Figure. matplotlib offers tight_layout and constrained_layout. (Ch 13)

Legend. A chart component that maps visual encodings (color, shape, size) back to data categories or ranges. (Ch 12)

Lightness. The perceived brightness of a color, independent of hue. Sequential colormaps vary primarily in lightness. (Ch 3)

Line chart. A chart connecting data points with straight line segments, typically used for ordered sequences (time series). (Ch 5, Ch 11)

Lollipop chart. A variant of a bar chart where each bar is replaced by a thin stem and a dot at the value, reducing visual weight. (Ch 14)

Luminance. The measurable intensity of light emitted or reflected by a surface. Related to but distinct from perceived lightness. (Ch 2)
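
For colors on a screen, a normalized version of this quantity (relative luminance) can be computed from sRGB values. The sketch below uses the standard sRGB linearization and channel weights; it is offered as an illustration and is not necessarily the formulation used in Chapter 2.

```python
def relative_luminance(r, g, b):
    """Relative luminance of an sRGB color with channels in [0, 1]."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear-light values.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    return (0.2126 * linearize(r)
            + 0.7152 * linearize(g)
            + 0.0722 * linearize(b))


for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, relative_luminance(*rgb))
```

The unequal weights show why pure green looks much brighter than pure blue at the same nominal intensity.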

Mark. In Altair and Vega-Lite, the geometric element used to represent data (point, bar, line, area, text). Corresponds to a geometry in the grammar of graphics. (Ch 22)

Mosaic plot. A chart that uses nested rectangles to show proportions across two categorical variables. (Ch 14)

NetworkX. A Python library for creating, manipulating, and analyzing graphs and networks. (Ch 24)

Node-link diagram. A network visualization where entities are shown as nodes (circles) and relationships as edges (lines). (Ch 24)

Overplotting. A condition where too many points overlap, obscuring the data distribution. Solutions include transparency, jitter, binning, and aggregation. (Ch 11)

Pair plot. A matrix of scatter plots showing the relationship between every pair of variables in a dataset, with univariate distributions on the diagonal. (Ch 19)

Palette. A set of colors used together in a visualization. Palettes can be sequential, diverging, or qualitative. (Ch 3)

Parallel coordinates. A chart where each variable is represented by a vertical axis and each observation is a polyline crossing all axes. Used for high-dimensional data. (Ch 19)

Perceptually uniform. A colormap property ensuring that equal numerical differences map to equal perceived color differences. The viridis family meets this criterion. (Ch 3)

Pie chart. A circular chart divided into wedges proportional to category values. Effective only with 2 to 5 categories. (Ch 5)

Plotly. A Python graphing library that produces interactive, browser-based visualizations. Offers both Plotly Express (high-level) and Graph Objects (low-level) APIs. (Ch 20, Ch 21)

Plotly Express. The high-level API of Plotly, providing one-line functions for common chart types with automatic theming and animation support. (Ch 20)

Pre-attentive processing. The rapid, automatic visual processing that occurs before conscious attention. Certain visual attributes (color, orientation, size) are detected pre-attentively. (Ch 2)

Qualitative palette. A set of colors designed to be maximally distinguishable, used for unordered categorical data. (Ch 3)

Radar chart (spider chart). A chart using radial axes to display multiple quantitative variables, with data points connected by a polygon. (Ch 14)

Raster image. An image composed of a grid of pixels (PNG, JPEG). Raster images lose quality when scaled up, unlike vector images. (Ch 33)

rcParams. matplotlib's global configuration dictionary controlling default figure size, fonts, line widths, colors, and more. (Ch 12)
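
A short sketch of reading a default and overriding it temporarily with `plt.rc_context` follows. The chosen settings are arbitrary, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

default_width = plt.rcParams["lines.linewidth"]

# Temporary override: settings revert when the with-block exits.
with plt.rc_context({"lines.linewidth": 3.0, "axes.grid": True}):
    fig, ax = plt.subplots()
    (line,) = ax.plot([0, 1, 2], [0, 1, 4])
    width_inside = line.get_linewidth()

restored_width = plt.rcParams["lines.linewidth"]
plt.close(fig)

print(default_width, width_inside, restored_width)
```

Assigning to `plt.rcParams[...]` directly changes the default for the rest of the session instead.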

Regression plot. A scatter plot with a fitted regression line and optional confidence band. In seaborn: sns.regplot() or sns.lmplot(). (Ch 18)

Ridgeline plot (joy plot). A set of overlapping density plots, offset vertically, for comparing distributions across many groups. (Ch 17)

Sankey diagram. A flow diagram where the width of arrows is proportional to the quantity flowing between stages or nodes. (Ch 21)

Saturation. The intensity or purity of a color. Fully saturated colors are vivid; desaturated colors approach gray. (Ch 3)

Scale. A function that maps data values to visual values (e.g., a linear scale mapping the data range 0 to 100 onto 0 to 500 pixels). (Ch 22)
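
The linear scale from the example in this entry can be written out as a small function. The names `linear_scale`, `domain`, and `range_` are hypothetical and loosely follow common grammar-of-graphics vocabulary.

```python
def linear_scale(domain, range_):
    """Return a function mapping values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


to_pixels = linear_scale(domain=(0, 100), range_=(0, 500))
print(to_pixels(0), to_pixels(50), to_pixels(100))

# Screen coordinates often grow downward, so a y scale may flip its range.
to_screen_y = linear_scale(domain=(0, 100), range_=(500, 0))
print(to_screen_y(25))
```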

Scatter plot. A chart plotting individual observations as points positioned by their x and y values. Reveals relationships, clusters, and outliers. (Ch 5, Ch 11)

Seaborn. A Python visualization library built on matplotlib that provides a high-level interface for statistical graphics. (Ch 16-19)

Sequential palette. A color scheme that varies continuously from light to dark (or low to high saturation), used for ordered numeric data. (Ch 3)

Small multiples. A grid of similar charts, each showing a subset of the data (typically one category per panel), sharing the same axes and scales. (Ch 8)

Sparkline. A small, word-sized line chart embedded in text or a table cell, showing trend without axes or labels. (Ch 6)
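
A text-only approximation of a sparkline can be produced with Unicode block characters. This is a toy sketch with a hypothetical function name; it is unrelated to any plotting library's API.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render values as a one-line string of block characters."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for a constant series
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * top)] for v in values)


weekly_sales = [3, 5, 4, 8, 12, 9, 6]
spark = sparkline(weekly_sales)
print(spark)
```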

Spine. In matplotlib, the lines forming the border of the Axes area (top, bottom, left, right). Removing unnecessary spines reduces chartjunk. (Ch 12)
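
Hiding the top and right spines is a common cleanup step. A minimal sketch with invented data, using the Agg backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [2, 3, 5])

# Keep the left and bottom spines, hide the other two.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

visible = {side: spine.get_visible() for side, spine in ax.spines.items()}
plt.close(fig)
print(visible)
```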

Stacked bar chart. A bar chart where bars are divided into colored segments representing sub-categories. Shows parts-of-whole within each bar. (Ch 11)
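
Each segment starts where the previous one ended, so drawing a stacked bar amounts to keeping a running total per bar. The sketch below computes those starting offsets in plain Python; the sales figures and names are invented. In matplotlib, such offsets would typically be passed as the `bottom` argument of `Axes.bar`.

```python
def stack_bottoms(series):
    """Return (bottoms, totals) for a list of equal-length value lists."""
    bottoms = []
    running = [0] * len(series[0])
    for values in series:
        bottoms.append(list(running))  # where this layer starts
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running


# Three sub-categories (layers) across three bars.
online = [3, 5, 2]
retail = [4, 1, 6]
wholesale = [2, 2, 2]

bottoms, totals = stack_bottoms([online, retail, wholesale])
print(bottoms)
print(totals)
```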

Streamlit. A Python framework for building data apps with minimal code. Scripts run top-to-bottom; widgets are declared inline. (Ch 29)

Strip plot. A scatter plot of one continuous variable against a categorical variable, showing individual observations. (Ch 18)

Subplot. One of multiple Axes arranged within a single Figure. (Ch 13)

Sunburst chart. A radial hierarchical chart where each ring represents a level in the hierarchy, and arc length represents value. (Ch 21)

Tight layout. A matplotlib layout adjustment that automatically pads subplots to prevent overlapping labels. (Ch 13)

Time series. A sequence of data points indexed in chronological order. (Ch 25)

Title. Text at the top of a chart or Axes describing what the visualization shows. Effective titles state the insight, not just the variables. (Ch 7)

Tooltip. See hover tooltip. (Ch 20)

Treemap. A chart that uses nested rectangles to represent hierarchical data, with rectangle area proportional to value. (Ch 21)

Tufte, Edward. Information design pioneer and author of The Visual Display of Quantitative Information. Coined "data-ink ratio" and "chartjunk." (Ch 6)

Vector image. An image described by geometric shapes (paths, curves). Scales to any resolution without quality loss. Formats: SVG, PDF, EPS. (Ch 33)

Vega-Lite. A high-level grammar of interactive graphics specification in JSON. Altair generates Vega-Lite specifications from Python. (Ch 22)

Violin plot. A combination of a box plot and a mirrored KDE plot, showing both summary statistics and the full distribution shape. (Ch 17)

Viridis. The default matplotlib colormap since version 2.0. Perceptually uniform, colorblind-safe, and prints well in grayscale. (Ch 3)

Waffle chart. A grid of small squares (typically 10x10 = 100) where colored squares represent proportions. An alternative to pie charts. (Ch 14)
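
Proportions rarely round to exactly 100 squares, so some allocation rule is needed. The sketch below uses the largest-remainder method, which is one reasonable choice and not necessarily the approach taken in Chapter 14; the function name is hypothetical.

```python
def waffle_counts(values, total_squares=100):
    """Allocate squares to categories so the counts sum to total_squares."""
    total = sum(values)
    exact = [v / total * total_squares for v in values]
    counts = [int(e) for e in exact]  # round everything down first
    leftover = total_squares - sum(counts)
    # Hand the leftover squares to the largest fractional remainders.
    order = sorted(range(len(values)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(waffle_counts([50, 30, 20]))
print(waffle_counts([1, 1, 1]))
```

Rounding each share independently could instead yield 99 or 101 squares, leaving the grid with a gap or an overflow.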

Widget. In Streamlit or Dash, an interactive UI component (slider, dropdown, checkbox) that lets users control chart parameters. (Ch 29, Ch 30)

Word cloud. A visualization where word size is proportional to frequency or importance. Useful for quick exploration of text data but imprecise for comparison. (Ch 26)

Zoom. An interactive feature allowing users to magnify a region of a chart. Available in Plotly, Altair, and matplotlib's interactive backends (including the ipympl widget backend for Jupyter). (Ch 20)