Learning Objectives

  • Define pre-attentive processing and explain why it matters for chart design
  • List the pre-attentive visual attributes and rank them by encoding accuracy
  • Apply Cleveland & McGill's hierarchy of visual encoding accuracy
  • Explain the Gestalt principles and demonstrate how they organize visual information
  • Predict which visual encodings will be most accurately decoded by viewers for a given data type
  • Identify common chart designs that work against human perception

Chapter 2: How the Eye Sees — Pre-Attentive Processing and Visual Encoding

Your eye is not a camera — it is an opinion machine.

In Chapter 1, we made the case that visualization is a cognitive tool: it offloads pattern recognition from slow, serial conscious thought to the fast, parallel processing of the visual system. We argued that a good chart lets you see the answer.

But that raises an uncomfortable question: what, exactly, does "seeing" mean?

If the eye were a camera — a passive recorder of light — then any chart that contained the right data would work equally well. A pie chart, a bar chart, a table of numbers: the information is all there, so what difference does the form make?

The difference is enormous. And the reason is that the visual system is not passive. It does not simply record the photons that arrive at the retina. It interprets them. It ranks them. It groups them. It fills in gaps, suppresses irrelevant details, and highlights what it judges to be important — all in the first few hundred milliseconds, before your conscious mind has any say in the matter.

This chapter is about those first few hundred milliseconds. It is the theoretical backbone of the entire book. Every design decision you will make in later chapters — choosing a bar chart over a pie chart, positioning a legend, selecting a color palette, deciding how many series to put on one plot — traces back to the principles introduced here.

We will cover three bodies of work. First, pre-attentive processing: the automatic, involuntary visual operations that happen before conscious attention kicks in. Second, visual encoding channels: the specific visual properties (position, length, color, size, angle, shape) that you can map data to, and the experimental evidence for why some channels are dramatically more accurate than others. Third, Gestalt principles: the perceptual rules by which the eye groups visual elements into patterns, and how those rules shape chart interpretation.

If this sounds like a psychology lecture, good. It is. But it is also the most practical chapter in the book. Once you understand how the eye works, bad chart design becomes physically obvious — you can feel it fighting your perception — and good chart design becomes a matter of aligning your visual choices with the machinery that will process them.


2.1 Before You Think: Pre-Attentive Processing

The 200-Millisecond Rule

Open any chart and something happens before you read the title, before you scan the axis labels, before you engage any conscious reasoning at all. In approximately 200 milliseconds — a fifth of a second — your visual system has already extracted information. It has detected colors. It has grouped shapes. It has registered which element is largest, which is darkest, which is an outlier.

This is pre-attentive processing: the extraction of visual properties that occurs automatically, in parallel across the entire visual field, without the direction of conscious attention. The term comes from research by Anne Treisman in the 1980s, though the phenomenon had been observed decades earlier. Pre-attentive processing is fast (under 250 ms), parallel (it operates on all elements simultaneously, not one at a time), and involuntary (you cannot choose not to do it).

Consider a concrete example. Imagine a grid of 80 numbers, all printed in black ink, arranged in 8 rows of 10. Your task is to find the number 7. You have no choice but to scan the grid serially, checking each number one at a time. Depending on where the 7 falls, this might take you several seconds.

Now imagine the same grid, but the number 7 is printed in red while all other numbers remain black. You do not need to scan. The 7 pops out. It is immediately, effortlessly, unavoidably visible. You could not miss it if you tried. That is pre-attentive processing at work: the visual system detected the color difference across the entire field in parallel and flagged the outlier before your conscious attention was involved.
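You can reproduce the grid demonstration in a few lines of Python — a quick sketch, with the red produced by an ANSI escape code, so rendering depends on your terminal:

```python
import random

random.seed(42)

ROWS, COLS = 8, 10
RED, RESET = "\033[31m", "\033[0m"  # ANSI escapes; rendering is terminal-dependent

# 79 distractor digits (anything but 7), plus a single target 7, shuffled.
cells = [random.choice("012345689") for _ in range(ROWS * COLS - 1)] + ["7"]
random.shuffle(cells)

# Serial version: every digit in plain black -- the 7 must be found by scanning.
plain = "\n".join(" ".join(cells[r * COLS:(r + 1) * COLS]) for r in range(ROWS))

# Pre-attentive version: the same grid, but the 7 rendered in red pops out.
colored = "\n".join(
    " ".join(RED + c + RESET if c == "7" else c for c in cells[r * COLS:(r + 1) * COLS])
    for r in range(ROWS)
)

print(plain)
print()
print(colored)
```

Print both grids side by side and the effect is unmistakable: you scan the first, you simply see the second.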

This pop-out effect is not a curiosity. It is the fundamental mechanism that makes visualization work. When you color-code data points by category, you are relying on pre-attentive color processing to let the viewer see the groups without reading labels. When you make a bar taller than its neighbors, you are relying on pre-attentive length processing to make the difference visible at a glance. When you place a dot higher on a y-axis, you are relying on pre-attentive position processing to communicate magnitude.

Every good chart exploits pre-attentive processing. Every bad chart fights it.

What Pops Out (and What Doesn't)

Not every visual property triggers pre-attentive processing. Research by Christopher Healey, James Enns, and others has identified a specific set of pre-attentive visual attributes — properties that the visual system can detect in parallel across the field. The major ones are:

| Attribute | What It Looks Like | Chart Application |
|---|---|---|
| Color hue | Red among blue dots | Categorical encoding (groups, categories) |
| Color intensity/saturation | Dark among light | Sequential encoding (magnitude) |
| Size | Large circle among small | Magnitude encoding (bubble charts) |
| Orientation | Tilted line among vertical | Slope encoding (slopegraphs) |
| Length | Long bar among short | Magnitude encoding (bar charts) |
| Position | High dot among low dots | Magnitude encoding (scatter plots, line charts) |
| Shape | Square among circles | Categorical encoding (marker types) |
| Motion | Moving dot among still | Animation, highlighting (limited print use) |
| Enclosure | Boxed item among unboxed | Grouping, emphasis |
| Curvature | Curved line among straight | Limited chart use |

Here is the critical point: pre-attentive processing works on one attribute at a time when searching for a target. If you ask someone to find a red circle among red squares and blue circles, neither the color nor the shape will pop out — the viewer must perform a slow, serial search because the target is defined by a conjunction of two attributes. This is Treisman's feature integration theory, and it has direct implications for chart design: do not require your viewer to decode two channels simultaneously to identify a data point.

A scatter plot where points differ by both shape (circle vs. triangle) and color (blue vs. orange) to encode two categorical variables works — but only if the viewer can query one variable at a time. If the viewer needs to find a specific combination (say, orange triangles), the pre-attentive advantage vanishes. This is why experienced chart designers often encode the most important variable with color (the strongest pre-attentive categorical channel) and use shape only as a secondary, redundant, or low-priority encoding.
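To make the conjunction problem concrete, here is a toy version in Python (the mark counts are invented for illustration): no single-attribute filter isolates the target, which is why the search falls back to serial inspection.

```python
import random

random.seed(7)

# A field of marks: many blue circles, many red squares, one red circle.
marks = [("blue", "circle")] * 40 + [("red", "square")] * 39 + [("red", "circle")]
random.shuffle(marks)

# A single feature map can flag candidates by ONE attribute at a time...
reds = [m for m in marks if m[0] == "red"]        # 40 candidates -- no unique pop-out
circles = [m for m in marks if m[1] == "circle"]  # 41 candidates -- no unique pop-out

# ...but only checking BOTH attributes per mark isolates the target.
targets = [m for m in marks if m == ("red", "circle")]
```

Color alone leaves 40 candidates, shape alone leaves 41; only the conjunction narrows the field to one — and that conjunction is exactly what the parallel feature maps cannot compute.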

Asymmetry in Pop-Out

An important subtlety: pre-attentive pop-out is not always symmetric. In many cases, searching for a present feature among absent features is faster than searching for an absent feature among present features. For example, finding a tilted line among vertical lines (the tilted line "pops out") is faster than finding a vertical line among tilted lines (the vertical line does not pop out as strongly). Similarly, finding a moving dot among stationary dots is nearly instantaneous, but finding the one stationary dot among moving dots is harder.

For chart design, this asymmetry matters when you use pre-attentive attributes for emphasis. If you want an outlier to pop out, give it a present feature that the other elements lack — a bright color among muted ones, a large size among small ones, a filled shape among hollow ones. Do not rely on the absence of a feature to create pop-out; it is a weaker signal.

This also explains why the common design pattern of "gray out everything except what matters" is so effective. The highlighted element has a present feature (saturated color) that the gray context elements lack. The asymmetry of pop-out ensures the highlight is detected immediately.

The Numerosity Effect

Pre-attentive processing also enables rapid estimation of numerosity — roughly how many items of a particular type are present in a display. If a scatter plot has 200 blue dots and 50 orange dots, the viewer can immediately perceive that "there are many more blue than orange" without counting. The visual system performs this approximate enumeration in parallel.

This numerosity perception supports a particular kind of chart reading: assessing relative proportions in scatter plots, density in point clouds, and balance in categorical distributions. But it is approximate, not exact. The viewer can perceive "roughly twice as many" or "far fewer" but cannot pre-attentively count 47 vs. 53 dots. For precise counts, the viewer must switch to serial processing.

Implications for Design

Pre-attentive processing gives you a design principle with teeth:

If you want the viewer to see something instantly, encode it with a pre-attentive attribute. If you encode it with a non-pre-attentive property (like the specific number printed on a label), the viewer must search for it serially.

This is why a bar chart is faster to read than a table of numbers. The bars are processed pre-attentively (length, position); the numbers are not. Both contain the same data. But the bar chart works with the visual system, while the table works around it.


Check Your Understanding

  1. What is the approximate time window for pre-attentive processing?
  2. If you highlight an outlier data point using color in a scatter plot, which type of processing allows the viewer to spot it immediately?
  3. Why does finding a "red circle" among red squares and blue circles require serial search rather than pre-attentive pop-out?

2.2 The Visual Encoding Channels

In Chapter 1, we said that visualization maps data to visual form. Now we can be precise about what "visual form" means. The specific visual properties that you map data values to are called visual encoding channels (or simply visual channels). Jacques Bertin, the French cartographer who published Semiology of Graphics in 1967, called them retinal variables — the properties that the retina can distinguish.

There are roughly ten channels commonly used in data visualization. We will examine each in turn, noting what data types each is suited for and how accurately the eye can decode it.

Position (Common Spatial Position)

Position is the placement of a mark along a shared scale — typically an x-axis or y-axis. When you plot a dot on a scatter plot, the x- and y-coordinates are position encodings. When you draw a line chart, each point's vertical position encodes a value.

Position is the most accurate visual encoding channel. Humans are extraordinarily good at judging the relative position of marks along a common scale. This is why scatter plots, line charts, and dot plots are the workhorses of quantitative visualization: they all use position as their primary encoding.

Position works for both quantitative data (continuous and discrete) and ordinal data (categories with an inherent order). It is less effective for purely nominal categories unless an ordering convention exists.

Length

Length is the spatial extent of a mark — typically the height of a bar in a bar chart. The viewer judges how far the mark extends from a baseline.

Length is the second most accurate channel, close behind position. The key requirement is a common baseline: all bars must start from the same point for the viewer to compare their lengths accurately. This is why stacked bar charts are harder to read than grouped bar charts — the interior segments of a stacked bar do not share a common baseline, so the viewer must estimate length without an anchor.

Practical implication: bar charts should almost always start at zero. If you truncate the y-axis of a bar chart (starting at, say, 50 instead of 0), you are visually shortening the bars and distorting the perceived differences. This is one of the most common forms of chart deception, and we will return to it in Chapter 4.
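The distortion from a truncated axis is easy to quantify. A quick sketch (the values 52 and 55 are invented for illustration):

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of bar heights as drawn when the axis starts at `baseline`
    instead of zero. With baseline=0 this equals the true ratio of b to a."""
    return (b - baseline) / (a - baseline)

a, b = 52.0, 55.0

true_ratio = apparent_ratio(a, b)                  # 55/52 ≈ 1.06: b is ~6% larger
truncated = apparent_ratio(a, b, baseline=50.0)    # (55-50)/(52-50) = 2.5: b LOOKS 150% larger

print(f"true: {true_ratio:.2f}x; with axis starting at 50: {truncated:.2f}x")
```

A real 6% difference is drawn as a 2.5-to-1 difference — the length encoding now lies, because length only works from a zero baseline.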

Angle and Slope

Angle is the rotation of a mark relative to a reference direction. In a pie chart, the central angle of each slice encodes a proportion. In a slopegraph, the angle of the connecting line encodes the rate of change.

Angle is substantially less accurate than position or length. The human visual system is not well calibrated for judging angles, particularly in the 20-70 degree range. This is one of the core reasons why pie charts are problematic: they force the viewer to judge angles, which the visual system does poorly.

Consider this concrete comparison. Imagine two pie charts, each with five slices. In the first, the slices represent 18%, 19%, 20%, 21%, and 22% of the whole. In the second, the slices represent 5%, 10%, 15%, 30%, and 40%. Most viewers can correctly rank the slices in the second chart (the differences are large enough to overcome the angle-decoding penalty), but few can correctly rank the first five slices, because the angle differences are too small for the eye to discriminate. A bar chart showing the same five values (18, 19, 20, 21, 22) would make the ordering trivially visible, because it uses length and position — higher-accuracy channels.
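The arithmetic behind this comparison: converting each set of proportions to central angles shows how little the first chart gives the eye to work with.

```python
def central_angles(proportions):
    """Central angle, in degrees, of each pie slice for the given shares."""
    return [p * 360 for p in proportions]

near_equal = central_angles([0.18, 0.19, 0.20, 0.21, 0.22])
# adjacent slices differ by only 3.6 degrees -- below reliable discrimination

spread_out = central_angles([0.05, 0.10, 0.15, 0.30, 0.40])
# differences of 18 degrees or more -- large enough to rank correctly
```

In the first chart the viewer must separate 64.8° from 68.4°; in the second, 18° from 144°. The encoding is identical — only the magnitude of the differences rescues the second chart.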

Area

Area is the two-dimensional extent of a mark — the size of a circle in a bubble chart, or the area of a region in a treemap. The viewer judges how much space a mark occupies.

Area is a low-accuracy channel. The problem is well documented: humans systematically underestimate differences in area. A circle with twice the area of another circle does not look twice as large. Research following Stevens's Power Law (a psychophysical law relating physical stimulus magnitude to perceived magnitude) shows that perceived area grows as approximately the 0.7 power of actual area. In plain language: to make a circle look twice as big, you need to make its actual area roughly 2.7 times larger.
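You can work the power law through directly. Under the model perceived = area ** 0.7, doubling the perceived size requires scaling the true area by 2 ** (1 / 0.7) — slightly under 2.7:

```python
EXPONENT = 0.7  # Stevens exponent commonly reported for perceived area

def required_scale(perceived_factor, exponent=EXPONENT):
    """Factor by which the TRUE area must grow so that the PERCEIVED
    size grows by `perceived_factor`, under perceived = area ** exponent."""
    return perceived_factor ** (1 / exponent)

scale = required_scale(2)   # ~2.69: to look twice as big, nearly triple the area
perceived = 2 ** EXPONENT   # ~1.62: a doubled area only looks ~62% bigger
```

Read the second line the other way around, too: honestly doubling the area of a bubble buys you only a 62% increase in perceived size, which is exactly why area comparisons feel flatter than the data.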

This is why bubble charts should be used with caution, and why treemaps are hard to read for precise comparisons. They can show broad patterns (this category is much bigger than that one) but they fail at fine-grained comparison.

Color Hue

Color hue is the categorical dimension of color — red vs. blue vs. green, independent of lightness or saturation. It is the dominant channel for encoding categorical (nominal) data: different groups, different series, different conditions.

Color hue is excellent for categories but poor for quantitative data. The eye can distinguish perhaps 6-8 hues reliably in a single chart before the categories start to blur together. Beyond that, the viewer needs a legend, and the moment they need a legend, they are no longer processing pre-attentively — they are performing serial lookup.

Color hue has no natural ordering. Red is not "more" than blue. This is why using a rainbow color scale for quantitative data (a practice that remains disturbingly common) is perceptually wrong: it imposes a categorical channel on continuous data. We will devote all of Chapter 3 to color.

Color Intensity (Value/Luminance/Saturation)

Color intensity — the lightness or darkness of a color, or its saturation level — is the quantitative dimension of color. A sequential color scale running from pale yellow to dark brown uses intensity to encode magnitude: darker means more.

Intensity is a moderate-accuracy channel for ordered data. It works well for choropleth maps (darker regions have higher values) and heatmaps. It is less precise than position or length — viewers cannot reliably distinguish more than about 5-7 intensity steps — but it excels at conveying an overall pattern across many elements simultaneously.

Size

Size is closely related to area but can also refer to the width or diameter of a mark. It is often used for the third variable in a bubble chart (x-position, y-position, and bubble size).

Size shares area's accuracy problems, compounded by the ambiguity of what exactly is being measured: does the viewer compare diameters (linear) or areas (quadratic)? Chart designers must be explicit, and viewers must be warned. In practice, size is a rough channel suitable for showing broad magnitude differences, not fine distinctions.
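A quick sketch of the geometry makes the ambiguity concrete: when a value doubles, mapping it to diameter quadruples the drawn area, while mapping it to area grows the diameter by only √2.

```python
import math

def area_from_diameter(d):
    """Area of a circular mark with diameter d."""
    return math.pi * (d / 2) ** 2

# A data value doubles: 10 -> 20.

# Option 1: map the value to DIAMETER. The drawn area quadruples.
a_before = area_from_diameter(10)
a_after = area_from_diameter(20)

# Option 2: map the value to AREA. The diameter grows only by sqrt(2) ≈ 1.41.
d_before = 10
d_after = d_before * math.sqrt(2)
```

Neither option is wrong, but they produce visibly different charts from the same data — which is why the legend (or caption) must say which convention the designer chose.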

Shape

Shape is the form of a mark — circle, square, triangle, diamond, cross, plus sign. It is a categorical channel, used to distinguish groups within a scatter plot.

Shape is a weak pre-attentive attribute when many shapes are present. The eye can distinguish perhaps 4-6 shapes before they start to interfere with each other. Shape is best used as a secondary or redundant encoding — for instance, using both color and shape to encode the same categorical variable, so that the chart remains readable for viewers with color vision deficiency.

Texture and Pattern

Texture (hatching, stippling, crosshatching) is a legacy encoding from the era of black-and-white printing. It works for categorical distinctions but is visually noisy and generally avoided in modern digital visualization. When you see a bar chart with diagonal stripes and crosshatches, you are looking at a design choice that was pragmatic in 1985 and unnecessary today.

Motion

Motion — animation of marks over time — is a powerful pre-attentive attribute (a moving dot among still dots pops out instantly), but it is only available in interactive or animated formats. In static charts (which still make up the majority of data visualization), motion is not an option.


Real-World Application: Encoding Choices in Practice

Consider Meridian Corp's quarterly sales dashboard. The dashboard needs to show revenue by product line (4 categories) over time (12 quarters) with a comparison to target.

A naive design might use a pie chart per quarter — encoding revenue as angle, categories as color hue. The viewer must decode 12 pie charts, each requiring angle judgments, then mentally compare across the 12 to see trends. This is encoding-hostile: it uses a low-accuracy channel (angle) and forces serial comparison across separate charts.

A better design uses a grouped bar chart: time on the x-axis (position encoding for the temporal variable), revenue on the y-axis (length/position encoding for the quantitative variable), and color hue for the four product lines. The target appears as a horizontal reference line. This design uses the two most accurate channels (position and length) for the quantitative data, reserves color for the categorical distinction, and adds a reference mark for comparison — all without requiring the viewer to decode angles.

The data is the same. The encoding choice makes it either painful or effortless to read.
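Here is a minimal sketch of the better design in matplotlib (the product names, revenue numbers, and the 2.5 target line are invented placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import random

random.seed(1)

quarters = [f"Q{q}" for q in range(1, 13)]      # 12 quarters on the x-axis (position)
products = ["Alpha", "Beta", "Gamma", "Delta"]  # 4 product lines (color hue)
revenue = {p: [random.uniform(1, 4) for _ in quarters] for p in products}

fig, ax = plt.subplots(figsize=(9, 4))
width = 0.2
for i, product in enumerate(products):
    # Position + length carry the quantitative data; hue carries the category.
    x = [q + (i - 1.5) * width for q in range(len(quarters))]
    ax.bar(x, revenue[product], width=width, label=product)

ax.axhline(2.5, linestyle="--", color="gray", label="Target")  # reference line
ax.set_xticks(range(len(quarters)))
ax.set_xticklabels(quarters)
ax.set_ylabel("Revenue ($M)")
ax.legend()
```

Every judgment the dashboard asks for — trend over time, gap to target, ranking of product lines — now lands on position or length, with hue reserved for the one nominal variable.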


2.3 Not All Encodings Are Equal: The Cleveland & McGill Hierarchy

The previous section described individual channels. Now we need to rank them. Which channels let the viewer decode data most accurately?

The landmark answer came from William Cleveland and Robert McGill in 1984, in a paper titled "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods." Cleveland was a statistician at Bell Labs; McGill was at AT&T. Their work was the first rigorous experimental test of how accurately people decode different visual encodings.

The Experiments

Cleveland and McGill designed a series of experiments in which participants were shown simple charts and asked to judge the ratio of two marked values. For example: "The smaller value is what percentage of the larger?" They tested multiple encoding types: position along a common scale, position along identical but non-aligned scales, length, angle, area, volume, color saturation, and curvature.

The experimental design was elegant. By asking the same question ("what percentage is this of that?") across different encodings, they isolated the effect of the encoding channel itself, holding the judgment task constant.

The Hierarchy

Their results, replicated and extended by many researchers since, produced a ranking of visual encoding accuracy from most to least accurate:

Ranked from Most to Least Accurate:

| Rank | Encoding | Typical Chart Type | Error Rate |
|---|---|---|---|
| 1 | Position on a common scale | Scatter plot, dot plot, line chart | Lowest |
| 2 | Position on identical but non-aligned scales | Small multiples with separate y-axes | Low |
| 3 | Length | Bar chart (common baseline) | Low-Moderate |
| 4 | Angle / Slope | Pie chart, slopegraph | Moderate |
| 5 | Area | Treemap, bubble chart | Moderate-High |
| 6 | Volume / Curvature | 3D bar chart, curvature-based | High |
| 7 | Color saturation / density | Choropleth, heatmap | Highest |

This ranking is the single most important empirical result in data visualization. It is the reason this book — and every serious visualization guide — recommends bar charts over pie charts, scatter plots over bubble charts, and position-based encodings over area-based ones for precise quantitative comparisons.
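The ranking lends itself to a simple design check. A sketch (the channel names are labels for this table, not a standard API):

```python
# Cleveland & McGill's accuracy ranking, most accurate first.
ACCURACY_RANKING = [
    "position_common_scale",
    "position_nonaligned_scale",
    "length",
    "angle",
    "area",
    "volume",
    "color_saturation",
]

def more_accurate(channel_a, channel_b):
    """Return whichever channel the eye decodes more accurately."""
    rank = ACCURACY_RANKING.index
    return channel_a if rank(channel_a) < rank(channel_b) else channel_b

# A bar chart (length) beats a pie chart (angle) for the same comparison:
best = more_accurate("length", "angle")
```

The point of writing it down is the habit it encodes: when two chart types can show the same data, prefer the one whose primary channel sits higher in the list.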

What the Hierarchy Means in Practice

The hierarchy does not say "never use area or color." It says: use the most accurate encoding channel that your data type and communication goal allow. Sometimes a heatmap (color intensity) is the right choice because you have a 50x50 matrix and position encoding would require 2,500 separate marks on aligned scales — which is impractical. Sometimes a bubble chart (area) is the right choice because you need to encode a third quantitative variable and you have already used both spatial position channels.

But the hierarchy does say: do not use a low-accuracy channel when a high-accuracy channel would serve equally well. If you are comparing five values and you have the space for a bar chart, do not use a pie chart. The pie chart forces the viewer to decode angle; the bar chart lets them decode position and length. The data is the same; the perceptual load is different.

The hierarchy also explains why certain chart types persist despite being perceptually suboptimal. Pie charts survive because they are visually familiar, because they communicate the "part-of-whole" frame effectively even when the angle decoding is imprecise, and because many audiences expect them. Treemaps survive because they can show hierarchical structure in a compact space. The Cleveland-McGill hierarchy is a tool for making informed tradeoffs, not a prohibition list.

Crowdsourced Replications

The Cleveland-McGill results have been remarkably robust. In 2010, Jeffrey Heer and Michael Bostock (of D3.js fame) replicated the experiments using Amazon Mechanical Turk, testing thousands of participants instead of the original small samples. Their results closely matched the original ranking. In 2014, further replications added nuance — for instance, showing that aligned position judgments are more accurate than unaligned ones even when both are "position" — but the core hierarchy held firm.


Historical Context: The Bell Labs Legacy

Cleveland and McGill were not working in isolation. Bell Labs in the 1970s and 1980s was one of the most productive research environments in history. John Tukey (inventor of the box plot and co-developer of the FFT algorithm) was down the hall. The S programming language — ancestor of R — was being developed at Bell Labs at the same time. Cleveland himself later developed the loess smoothing algorithm and the concept of "trellis graphics" (what we now call small multiples or faceted plots). The visualization research came out of a culture that took statistical graphics as seriously as statistical methods — a culture that treated "how you look at data" as a first-class research problem.


2.4 Bertin's Retinal Variables

Before Cleveland and McGill ran their experiments, the theoretical groundwork was laid by Jacques Bertin, a French cartographer and semiotician who published Semiology of Graphics in 1967 (the English translation appeared in 1983). Bertin's work is dense, systematic, and ahead of its time by decades.

Bertin identified what he called the retinal variables: the visual properties that the retina can distinguish independently of the two spatial dimensions (x and y position) that define the plane of the page. He listed six:

  1. Size (the areal extent of a mark)
  2. Value (lightness or darkness)
  3. Texture (the pattern or grain of a mark)
  4. Color (what we now call hue)
  5. Orientation (the angle of a mark)
  6. Shape (the form of a mark)

Bertin was not an experimental psychologist — he was a practitioner and theorist. His taxonomy was based on careful observation and logical analysis, not controlled experiments. But his framework was remarkably prescient. The Cleveland-McGill hierarchy can be seen as the empirical validation of a question Bertin was already asking: which retinal variables are best suited for which types of data?

Bertin's Data Type Mapping

Bertin went further than listing variables. He classified them by the types of relationships they could express:

| Retinal Variable | Selective? | Ordered? | Quantitative? |
|---|---|---|---|
| Size | Partially | Yes | Yes (approximately) |
| Value (lightness) | Yes | Yes | No |
| Texture | Yes | Sometimes | No |
| Color (hue) | Yes | No | No |
| Orientation | Yes | No | No |
| Shape | Yes | No | No |

Selective means the variable can distinguish categories — can you pick out all the red marks from the blue ones? Ordered means the variable has a natural ranking — is dark "more" than light? Quantitative means the variable can express proportional differences — is this mark twice as much as that one?

Notice that only size is even approximately quantitative, and Bertin knew it was imprecise. Color hue is selective but not ordered — you can tell groups apart, but the groups have no inherent ranking. This is exactly the insight that Cleveland and McGill confirmed experimentally two decades later.

Why Bertin Still Matters

Bertin's taxonomy remains valuable because it provides a design vocabulary. When you are choosing how to encode a variable, you can ask: Is this variable nominal (categories with no order), ordinal (categories with order), or quantitative (continuous magnitudes)? Then consult Bertin's table: which retinal variables support the relationship I need to express?

If you need to show ordered data, do not use shape (which is not ordered). If you need to show quantitative differences, do not rely on color hue (which is not quantitative). If you need to distinguish categories, any selective variable will work, but hue is the most effective because it is strongly pre-attentive.

This is not abstract theory. It is a decision procedure you will use every time you design a chart.
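The decision procedure can be written down directly. A sketch of Bertin's table as a lookup (his "Partially" and "Sometimes" entries are rounded to booleans, as noted in the comments):

```python
# Bertin's classification as a lookup table. "Partially" (size, selective) is
# rounded up to True; "Sometimes" (texture, ordered) is rounded down to False.
RETINAL_VARIABLES = {
    #  name          (selective, ordered, quantitative)
    "size":        (True, True, True),   # quantitative only approximately
    "value":       (True, True, False),
    "texture":     (True, False, False),
    "hue":         (True, False, False),
    "orientation": (True, False, False),
    "shape":       (True, False, False),
}

def candidates(data_type):
    """Retinal variables able to express a variable of the given type.
    Nominal data needs a selective channel, ordinal an ordered one,
    quantitative an (at least approximately) quantitative one."""
    index = {"nominal": 0, "ordinal": 1, "quantitative": 2}[data_type]
    return [name for name, props in RETINAL_VARIABLES.items() if props[index]]
```

Ask it what can carry ordered data and shape disappears from the list; ask it what can carry quantitative differences and only size survives — exactly the conclusions of the table above.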

Bertin's Influence on Modern Tools

Bertin's ideas, though published in the 1960s, directly shaped the tools many of us use today. Jock Mackinlay, a researcher at Xerox PARC and later at Tableau Software, extended Bertin's framework in his 1986 doctoral thesis, adding expressiveness and effectiveness criteria to Bertin's channel classification. Mackinlay's work asked: given a data type and a communication goal, which encoding channels are expressive (capable of representing the data relationship) and which are effective (capable of representing it accurately)? This question — essentially Bertin's taxonomy plus Cleveland-McGill's accuracy ranking — became the design engine inside Tableau's "Show Me" feature, which recommends chart types based on the data you have selected.

Leland Wilkinson's The Grammar of Graphics (1999) extended Bertin's thinking into a formal algebraic system for specifying visualizations. Wilkinson's grammar decomposed a chart into layers (data, aesthetics, statistics, geometry, coordinates, facets) — with the "aesthetics" layer corresponding directly to Bertin's retinal variables. Hadley Wickham's ggplot2 (2007) implemented Wilkinson's grammar in R, and Altair (which you will use in Chapter 22 of this book) implements a similar grammar in Python via Vega-Lite.

When Altair asks you to write alt.Chart(data).mark_point().encode(x='gdp:Q', y='life_exp:Q', color='continent:N'), you are performing a Bertin-style mapping: assigning a quantitative variable to the x-axis position channel, another to the y-axis position channel, and a nominal variable to the color hue channel. The syntax is modern; the perceptual logic is from 1967.
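Stripped of the Python wrapper, that call compiles to roughly the following Vega-Lite encoding block — a simplified sketch of the generated spec, not Altair's exact output:

```python
# The Bertin-style mapping, written out as the (simplified) Vega-Lite JSON
# that an Altair chart like the one above compiles to.
spec = {
    "mark": "point",
    "encoding": {
        # quantitative variable -> x position channel
        "x": {"field": "gdp", "type": "quantitative"},
        # quantitative variable -> y position channel
        "y": {"field": "life_exp", "type": "quantitative"},
        # nominal variable -> color hue channel
        "color": {"field": "continent", "type": "nominal"},
    },
}

channels_used = sorted(spec["encoding"])  # the three channels this chart spends
```

The `:Q` and `:N` shorthand in the Altair call is just a compressed way of writing the `"type"` entries here — the grammar forces you to declare, for every variable, both the channel it occupies and the kind of data it is.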


2.5 Gestalt Principles: How the Eye Groups

So far we have discussed individual visual elements — a single bar, a single dot, a single color. But charts contain many elements, and the visual system does not process them in isolation. It automatically organizes them into groups, structures, and patterns. The rules that govern this automatic grouping were described by the Gestalt psychologists in the early 20th century (the German word Gestalt means roughly "form" or "whole pattern").

Six Gestalt principles are directly relevant to chart design. Each one describes a way that the visual system decides "these things go together."

Proximity

Elements that are close together are perceived as belonging to the same group.

This is the most powerful Gestalt grouping principle and one of the most useful tools in chart design. In a grouped bar chart, the bars for a single time period are placed close together with wider gaps between periods. The viewer automatically sees each cluster as a unit — "these bars are about Q1" — without needing a label or a box.

Consider a scatter plot with 200 data points. If the points naturally cluster into three clumps with empty space between them, the viewer sees three groups instantly. No color coding is needed; proximity alone creates the grouping. Conversely, if you color-code three categories but the points are spatially interleaved, the color coding will compete with the proximity signal, and the viewer may struggle.

Practical applications: use whitespace (proximity and its inverse, distance) to group related chart elements (a title and its subtitle, an axis and its labels, a legend and its associated series). Place annotations near the data they describe. Separate distinct chart panels with generous gutters.

Similarity

Elements that share visual properties (color, shape, size) are perceived as belonging to the same group.

When you color-code data points by category — blue for Group A, orange for Group B — you are relying on similarity. The viewer's visual system groups all the blue points together and all the orange points together, even if they are scattered across the plot. This is similarity through color hue, and it is the basis of virtually all categorical color encoding.

Similarity can also operate through shape (all triangles are perceived as one group, all circles as another), size (large marks vs. small marks), or orientation. Color hue is the strongest similarity channel for chart design because it is the most strongly pre-attentive of the categorical visual attributes.

Important nuance: similarity and proximity can conflict. If blue and orange points are randomly scattered, similarity (color) creates the grouping. But if you place all the blue points on the left and all the orange points on the right, both proximity and similarity reinforce the same grouping, making it even stronger. Good chart design aligns proximity and similarity so they reinforce each other rather than compete.

Enclosure

Elements that are enclosed within a boundary are perceived as belonging to the same group.

A bounding box, a shaded background region, or a panel border all trigger enclosure. In a faceted plot (small multiples), each panel is enclosed in its own rectangle, and the viewer instantly perceives each panel as a separate unit. In an annotated chart, a shaded band behind a region of the x-axis (say, a recession period) groups all the data points within that period together.

Enclosure is a powerful grouping mechanism but it adds visual weight (ink). Use it when proximity alone is insufficient — for instance, when you need to highlight a region of a chart that overlaps with other data.
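The recession-band example above can be sketched directly (assuming matplotlib; the data here is invented for illustration): a light shaded region behind the series encloses the observations in one period without altering the marks themselves.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

years = list(range(2000, 2015))
values = [3, 4, 3, 2, 1, 1, 2, 4, 2, 1, 2, 3, 4, 5, 5]

fig, ax = plt.subplots()
ax.plot(years, values)
# A shaded band behind 2007-2009 encloses those observations,
# grouping them as one period; zorder=0 keeps it behind the data
ax.axvspan(2007, 2009, color="lightgray", zorder=0)
ax.set_xlabel("Year")
ax.set_ylabel("Index")
```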

Connection

Elements that are connected by a line or curve are perceived as belonging to the same group.

This is why line charts work. When you draw a line through a series of data points, the viewer perceives those points as a single series — a single story unfolding over time. Without the line, the same points in a scatter plot would be perceived as individual, unrelated marks (unless proximity or similarity grouped them).

Connection is also why drawing lines between related points in a parallel coordinates plot creates the perception of a single observation flowing through multiple dimensions. And it is why legends that use lines to connect a label to a data series are more effective than legends that rely on color swatches alone when the chart is cluttered.

Connection overrides proximity. Two points far apart on a chart are perceived as related if a line connects them. Two points close together are perceived as unrelated if no connection exists and other grouping cues (color, enclosure) point elsewhere.

Continuity

The eye follows smooth, continuous paths rather than abrupt changes in direction.

When two lines cross in a chart, the viewer perceives two continuous lines intersecting, not four line segments meeting at a point. This is continuity at work: the visual system prefers the interpretation that preserves smooth paths.

In chart design, continuity means that trend lines, fitted curves, and smoothed series will be perceived as coherent wholes. It also means that jagged, noisy lines are harder to follow — the eye wants smoothness. This is one perceptual justification for moving averages and smoothing in time-series visualization: not just statistical noise reduction, but perceptual continuity support.
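The smoothing mentioned above can be as simple as a moving average. A pure-Python sketch (the function name and window choice are illustrative, not a fixed convention):

```python
def moving_average(values, window=3):
    """Trailing moving average over a fixed window. Smoothing reduces
    statistical noise and, perceptually, gives the eye one continuous
    path to follow instead of a jagged one."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

noisy = [2, 4, 3, 5, 4, 6, 5, 7]
print(moving_average(noisy))  # [3.0, 4.0, 4.0, 5.0, 5.0, 6.0]
```

The sawtooth input becomes a gently rising line, which the eye reads as a single continuous trend.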

Closure

The eye completes incomplete shapes, perceiving closed forms even when gaps exist.

If you draw three-quarters of a circle, the viewer perceives a circle, not an arc. In chart design, closure explains why gridlines can be light and partial — the viewer's visual system will complete the grid pattern without needing every line drawn in full weight. It also explains why sparklines (tiny inline charts with minimal axes and no labels) work: the viewer's eye completes the implied axis framework.

Closure is a license to remove visual clutter. You do not need to draw every border, close every box, or complete every axis line. The viewer's visual system will fill in the gaps, and the reduced ink load makes the data more prominent.
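Exploiting closure is a two-line operation in practice. A minimal matplotlib sketch: remove the top and right chart borders and the viewer still perceives a complete rectangular plotting area.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(10), [2, 3, 1, 4, 3, 5, 4, 6, 5, 7])
# Drop the top and right borders; closure ensures the viewer
# still perceives a complete rectangular chart area
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```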


Check Your Understanding

  1. A scatter plot uses blue circles for Group A and orange triangles for Group B. Which two Gestalt principles cause the viewer to perceive two distinct groups? What happens if the blue and orange points are spatially interleaved?
  2. You remove the top and right border lines from a chart frame, leaving only the x-axis and y-axis. Which Gestalt principle predicts that viewers will still perceive a rectangular chart area?
  3. In a line chart with two series, the lines cross three times. Which Gestalt principle is disrupted at each crossing?

Common Pitfall: Gestalt Conflicts

The most confusing charts are those where Gestalt principles conflict rather than reinforce each other. A common example: a line chart with five series, all drawn in similar colors, with lines that cross frequently. Similarity (similar colors) tells the eye all the lines are one group. Connection (each line is continuous) tells the eye each is a separate series. Continuity (the eye follows smooth paths) is disrupted at every crossing.

The result is a chart that the viewer must decode serially, tracing each line with their eye and checking the legend repeatedly. This kind of chart is called a "spaghetti chart" for good reason.

The fix is to align the Gestalt principles: use distinct colors (similarity separates the series), reduce the number of series (fewer crossings preserve continuity), and consider direct labeling at the end of each line (connection between the label and the line reinforces grouping without requiring a separate legend).
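The direct-labeling fix can be sketched as follows (assuming matplotlib; the series and colors are invented for illustration). Each label is placed at the end of its line, so connection binds label to series with no legend lookup.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

series = {"North": [2, 3, 4, 4, 5],
          "South": [5, 4, 4, 3, 3],
          "West":  [1, 2, 2, 3, 4]}
colors = {"North": "tab:blue", "South": "tab:orange", "West": "tab:green"}

fig, ax = plt.subplots()
for name, ys in series.items():
    ax.plot(range(5), ys, color=colors[name])
    # Direct label at the line's end: connection ties label to line,
    # so the viewer never has to consult a separate legend
    ax.text(4.1, ys[-1], name, color=colors[name], va="center")
ax.set_xlim(0, 5)  # leave room at the right edge for the labels
```

Matching the label's color to its line adds similarity on top of connection, so the two principles reinforce rather than compete.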


2.6 What the Eye Gets Wrong

The visual system is remarkably effective, but it is not infallible. Its shortcuts — the very heuristics that make pre-attentive processing fast — also produce systematic errors. Understanding these errors is essential for honest chart design.

Optical Illusions in Charts

The visual system compares elements relative to their context, not in absolute terms. This produces illusions that directly affect chart reading:

The Size-Context Illusion. A circle surrounded by larger circles appears smaller than the same circle surrounded by smaller circles (the Ebbinghaus illusion). In a bubble chart, this means that a bubble's perceived size is influenced by its neighbors. A data point in a region of large bubbles will appear smaller than the same data point in a region of small bubbles, even though its actual area is identical.

The Contrast Illusion. A gray bar on a dark background appears lighter than the same gray bar on a light background. In heatmaps and choropleth maps, adjacent cells or regions influence each other's perceived color intensity. A region with a moderate value may appear light when surrounded by dark neighbors, or dark when surrounded by light neighbors, even though its actual color is unchanged.

The Angle Bisection Illusion. In pie charts, slices near 90 degrees tend to be overestimated, while slices near 0 or 180 degrees tend to be underestimated. This is one reason why pie charts produce systematic errors in proportion judgment — the errors are not random but biased by the angle.

Change Blindness

Change blindness is the failure to notice changes in a visual scene when those changes occur during a disruption (a blink, a page turn, a transition between slides). In a data visualization context, change blindness affects comparison across charts. If you show revenue for Q1 on one slide and revenue for Q2 on the next, the viewer is unlikely to detect subtle differences — the slide transition disrupts the visual comparison.

This is why side-by-side comparison (two charts visible simultaneously) is almost always more effective than sequential comparison (one chart, then another). It is also why animated transitions between chart states must be designed carefully: if too many elements change at once during the transition, the viewer may miss the specific change that matters.
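A minimal sketch of the side-by-side pattern (assuming matplotlib; the revenue figures are made up): both quarters share one y scale and are visible simultaneously, so the comparison never has to survive a slide transition.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

labels = ["A", "B", "C", "D"]
q1 = [120, 90, 150, 80]
q2 = [125, 95, 140, 95]

# Both quarters visible at once, on a shared y scale, so subtle
# differences can be detected by direct visual comparison
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.bar(labels, q1)
ax1.set_title("Q1 revenue")
ax2.bar(labels, q2)
ax2.set_title("Q2 revenue")
```

Without `sharey=True`, each panel would autoscale independently and the bars would no longer be directly comparable across panels.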

Inattentional Blindness

Inattentional blindness is the failure to perceive a visible object when attention is directed elsewhere. The famous demonstration is the "invisible gorilla" experiment: viewers counting basketball passes fail to notice a person in a gorilla suit walking through the scene.

In charts, inattentional blindness means that a viewer focused on one aspect of the data (the trend in the blue line) may completely miss important information in another part of the chart (the diverging red line, the footnote about a data correction, the axis break). This is not a failure of vision; it is a failure of attention.

Design implication: do not rely on the viewer noticing something. If it is important, make it salient. Use pre-attentive attributes (color, size, position) to ensure that critical information is detected automatically, not left to chance.

The Limits of Working Memory

Working memory — the mental workspace where you hold and manipulate information — has a capacity of roughly 3-5 items for visual information (the old "7 plus or minus 2" figure from George Miller has been revised downward in modern research). This means that a chart with 12 differently-colored categories exceeds the viewer's ability to hold the color-category mappings in memory. They will need to consult the legend repeatedly, slowing comprehension and increasing error.

Design implication: keep the number of distinct visual categories to 5-7 or fewer. If you must show more categories, use strategies like grouping (consolidate the 5 smallest into "Other"), highlighting (show the top 3 in color and the rest in gray), or using small multiples (one panel per category, so each panel has only one series).
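The "consolidate into Other" strategy is easy to automate. A pure-Python sketch (the function name `consolidate_small` and the sample data are illustrative, not from any library):

```python
def consolidate_small(categories, keep=5):
    """Keep the `keep` largest categories; merge the rest into 'Other'.

    `categories` maps category name -> value. The result stays within
    working-memory limits (roughly 5-7 distinct visual categories).
    """
    ranked = sorted(categories.items(), key=lambda kv: kv[1], reverse=True)
    top = dict(ranked[:keep])
    rest = sum(v for _, v in ranked[keep:])
    if rest:
        top["Other"] = rest
    return top

sales = {"A": 40, "B": 25, "C": 12, "D": 8, "E": 6, "F": 4, "G": 3, "H": 2}
print(consolidate_small(sales, keep=5))
# {'A': 40, 'B': 25, 'C': 12, 'D': 8, 'E': 6, 'Other': 9}
```

Eight categories become six: the top five plus one "Other" bucket, a count the viewer can actually hold in memory.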

The Weber-Fechner Threshold

The visual system does not perceive differences in absolute terms — it perceives them in relative terms. The Weber-Fechner Law states that the just-noticeable difference (JND) between two stimuli is proportional to the magnitude of the stimulus. For length, this means that the difference between a 10 cm bar and a 10.5 cm bar is about as noticeable as the difference between a 20 cm bar and a 21 cm bar — both are 5% differences. But the absolute difference in the second case (1 cm) is twice as large.

For chart design, this means that differences between large values are harder to detect than the same absolute difference between small values. In a bar chart comparing revenues of $1 billion and $1.05 billion, the 5% difference is visible but not dramatic. The same $50 million difference between $100 million and $150 million is visually striking because it represents a 50% relative change.
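The arithmetic behind this example is worth making explicit. A small Python check (the helper name is illustrative), computing the gap relative to the smaller value, which is what perception tracks:

```python
def relative_difference(a, b):
    """Relative difference between two values. Per Weber-Fechner,
    this tracks perceived difference far better than the absolute gap."""
    return abs(a - b) / min(a, b)

# Same $50M absolute gap, very different perceived gap (values in $M):
print(relative_difference(1000, 1050))  # $1B vs $1.05B   -> 0.05
print(relative_difference(100, 150))    # $100M vs $150M  -> 0.5
```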

This is a fundamental reason why percentage change and ratio comparisons often require different charts than absolute value comparisons. A viewer looking at a bar chart naturally perceives relative differences (driven by the Weber-Fechner principle), which aligns well with percentage comparisons but can mislead when absolute differences matter.

The Attentional Bottleneck

Even with pre-attentive processing handling the first 200 milliseconds, conscious attention remains a bottleneck. After the initial parallel scan, the viewer must serially attend to individual elements to extract detailed information — reading exact values, comparing specific pairs, tracing a line through crossings. Each act of conscious attention takes time (roughly 200-500 ms per element) and displaces whatever was previously in attention.

This is why charts that are "busy" feel exhausting even when they contain no individual element that is hard to read. The problem is not any single element — it is the number of serial attention shifts required. A chart with 50 labeled data points, 5 color-coded series, a dual y-axis, and a dozen gridlines demands dozens of attention shifts to comprehend. A cleaner chart with direct labels, fewer series, and minimal gridlines demands far fewer, leaving the viewer's attentional capacity available for the actual analytical task.

Practical rule: count the number of distinct things a viewer must consciously attend to in your chart. If the count exceeds 10, look for ways to reduce it — through aggregation, filtering, highlighting, or redesign.


Thought Experiment: Decoding Different Encodings

Imagine you are given four charts, all showing the same data: the population of five countries.

Chart A: A bar chart with five vertical bars, labeled by country, with a shared y-axis. The bars are 2.0, 2.2, 2.4, 2.5, and 10.9 cm tall.

Chart B: A pie chart with five slices. The slices are 10%, 11%, 12%, 12.5%, and 54.5% of the circle.

Chart C: Five circles (a bubble chart) with areas proportional to the population.

Chart D: Five squares, each filled with a shade of gray from light to dark, where darkness encodes population.

Ask yourself: In which chart can you most quickly and accurately rank the five countries? In which can you most accurately estimate the ratio of the largest to the smallest?

Chart A (bar chart, position/length encoding) wins on both counts. Chart B (pie chart, angle encoding) lets you see that one country dominates, but ranking the four similar slices is difficult. Chart C (bubble chart, area encoding) makes the dominant country obvious but the four similar-sized bubbles are nearly impossible to rank. Chart D (shaded squares, color intensity encoding) gives only the coarsest ranking.

This is the Cleveland-McGill hierarchy in action. Same data, different encodings, wildly different accuracy.


2.7 Putting It Together: Matching Data to the Right Visual Channel

Now we synthesize. Given a dataset and a communication goal, how do you choose the right visual encoding?

The Decision Procedure

Step 1: Classify your variables. For each variable you want to encode, determine its type:

  • Quantitative (continuous or discrete numbers: revenue, temperature, count)
  • Ordinal (categories with order: low/medium/high, education level, Likert scale)
  • Nominal/Categorical (categories without order: product line, country, gender)
  • Temporal (a special form of ordinal/quantitative: date, time)

Step 2: Assign channels using the hierarchy. Map your most important quantitative variable to the most accurate available channel (position). Map secondary quantitative variables to the next available channels (length, or a second position axis). Map categorical variables to the strongest categorical channel (color hue). Use shape and size as tertiary channels only when needed.

Step 3: Check for Gestalt alignment. Ensure that the grouping signals in your chart (proximity, similarity, connection) reinforce rather than contradict each other. If you use color to encode categories, arrange the chart so that same-colored elements are somewhat proximate, or connect them with lines.

Step 4: Check for perceptual pitfalls. Will the viewer need to judge angles? Can you replace angle with length? Will the viewer need to compare areas? Can you replace area with position? Will there be more than 6-7 categories in a single color encoding? Can you consolidate, filter, or facet?
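Steps 1 and 2 can be expressed as a toy procedure. The sketch below is purely illustrative — the channel rankings come from this chapter, but the function and its channel names are hypothetical, not a real library API:

```python
# Channels ordered by decoding accuracy, per the Cleveland-McGill
# hierarchy (quantitative) and pre-attentive strength (categorical)
QUANT_CHANNELS = ["position (axis 1)", "position (axis 2)", "length", "size"]
CAT_CHANNELS = ["color hue", "shape"]

def assign_channels(variables):
    """variables: (name, kind) pairs ordered by importance, with kind
    in {'quantitative', 'temporal', 'ordinal', 'nominal'}.
    Returns a name -> channel plan following Steps 1-2."""
    quant, cat = iter(QUANT_CHANNELS), iter(CAT_CHANNELS)
    plan = {}
    for name, kind in variables:
        if kind in ("quantitative", "temporal", "ordinal"):
            plan[name] = next(quant, "text label")   # quant channels exhausted
        else:
            plan[name] = next(cat, "facet (small multiples)")
    return plan

plan = assign_channels([("temperature", "quantitative"),
                        ("year", "temporal"),
                        ("scenario", "nominal")])
print(plan)
# {'temperature': 'position (axis 1)', 'year': 'position (axis 2)',
#  'scenario': 'color hue'}
```

Steps 3 and 4 remain judgment calls — no procedure can check Gestalt alignment automatically — but Steps 1-2 really are this mechanical.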

A Worked Example: The Climate Dataset

This book's progressive project uses a climate dataset: global temperature anomalies, atmospheric CO2 concentrations, and sea-level measurements over 150 years.

When we eventually build the first version of this chart in Chapter 10, we will use a line chart. Let us trace why that choice is correct using the principles from this chapter:

  • Temperature anomaly is a quantitative variable. By the Cleveland-McGill hierarchy, we should use position encoding. We will map temperature to the y-axis (vertical position on a common scale). This is the highest-accuracy channel.

  • Year is a temporal variable with inherent order. We will map it to the x-axis (horizontal position on a common scale). Again, the highest-accuracy channel.

  • The data forms a time series — ordered observations of a single variable over time. The line connecting the points invokes the Gestalt principle of connection, telling the viewer that these are sequential observations of the same phenomenon, not unrelated data points.

  • The line's overall shape — rising, falling, flat — invokes continuity, allowing the eye to perceive the trend as a smooth path. The year-to-year noise will be visible as small deviations from the smooth path, but the overall trajectory will be immediately apparent.

  • If we later add CO2 as a second series on the same chart, we will use color hue (similarity) to distinguish the two lines. If we add sea level as a third, we may instead switch to small multiples (one panel per variable), using enclosure to separate the three series and proximity (shared x-axis alignment) to allow temporal comparison across panels.

None of these choices are aesthetic preferences. They are perceptual necessities. Position for quantitative data. Connection for sequential data. Similarity for categorical distinctions. This is what it means to design charts that work with the visual system.
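As a purely illustrative preview of those choices (the project's real first chart, with the real dataset, is built in Chapter 10; the anomaly values below are made up), the encoding plan translates to just a few lines of matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt

# Made-up anomaly values purely for illustration; the real
# climate dataset arrives in Chapter 10
years = [1880, 1920, 1960, 2000, 2020]
anomaly = [-0.2, -0.1, 0.0, 0.4, 0.9]

fig, ax = plt.subplots()
# Position encodes both variables (year -> x, anomaly -> y);
# the connecting line invokes the Gestalt principle of connection
ax.plot(years, anomaly, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
```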

Common Encoding Mistakes

| Mistake | What Goes Wrong Perceptually | Better Alternative |
| --- | --- | --- |
| Pie chart for comparing values | Angle encoding is low accuracy | Bar chart (length/position) |
| 3D bar chart | Perspective distorts length; adds volume encoding (least accurate) | 2D bar chart |
| Rainbow color scale for continuous data | Hue is categorical, not ordered | Sequential color scale (single hue, varying intensity) |
| Bubble chart for precise comparison | Area is systematically underestimated | Dot plot or bar chart |
| Too many series on one line chart | Connection and similarity conflict; spaghetti effect | Small multiples or highlight one series |
| Truncated bar chart axis | Length encoding is distorted; bars do not start at zero | Start bar axis at zero, or use dot plot if baseline is irrelevant |
| Dual y-axes | Position encoding becomes ambiguous (which scale?) | Two aligned panels with shared x-axis |

Chapter Summary

The visual system is not a passive camera. It is an active, opinionated processing engine that extracts information from a scene in under 200 milliseconds, before conscious thought begins. This pre-attentive processing detects visual attributes — color, length, position, size, orientation, shape — in parallel across the visual field, allowing certain visual properties to "pop out" from their surroundings.

Not all visual encodings are equal. The Cleveland-McGill hierarchy, established experimentally in 1984 and replicated many times since, ranks visual encoding channels by the accuracy with which viewers can decode quantitative information: position > length > angle > area > color saturation. Jacques Bertin's retinal variables provide the theoretical vocabulary for these channels, classifying them by their ability to express selective, ordered, and quantitative relationships.

The Gestalt principles — proximity, similarity, enclosure, connection, continuity, and closure — govern how the eye groups visual elements into patterns. Good chart design aligns these principles so they reinforce each other; bad chart design allows them to conflict, producing confusion.

The visual system also has systematic weaknesses: optical illusions caused by contextual comparison, change blindness during visual disruptions, inattentional blindness when attention is directed elsewhere, and working memory limits of 3-5 visual items. Honest, effective chart design accounts for these limitations rather than exploiting or ignoring them.

The practical takeaway is a design procedure: classify your variables, assign them to encoding channels using the accuracy hierarchy, ensure Gestalt alignment, and check for perceptual pitfalls. Every chart design choice in the rest of this book follows this procedure.


Spaced Review: Chapter 1 Concepts

Before moving on, retrieve these ideas from Chapter 1 without looking back:

  1. What is Anscombe's Quartet, and what does it demonstrate about the limits of summary statistics?
  2. Name three cognitive benefits of visualization over tables of numbers.
  3. What does "visualization as a cognitive tool" mean?

If any of these are fuzzy, revisit Chapter 1 before proceeding. The concepts build cumulatively.


What's Next

Chapter 3 takes the color channel — which we have treated as a single entry in the encoding hierarchy — and devotes an entire chapter to it. Color is the most frequently misused encoding in data visualization, the source of the most accessibility failures, and the subject of the most heated design debates. We will cover color spaces, perceptual uniformity, categorical vs. sequential vs. diverging palettes, and accessible design for the 8% of men and 0.5% of women with color vision deficiency.


Progressive Project Checkpoint

Where we are: We have the climate dataset in mind (temperature anomalies, CO2, sea level over 150 years) but have not written a line of code.

What we decided in this chapter: When we build the first chart, we will use position encoding for both the temporal variable (x-axis) and the quantitative variable (y-axis). We will use a line chart because the connection principle naturally conveys sequential temporal data. If we need to show multiple variables, we will use color hue for categorical distinction between series, or small multiples with enclosure for clean separation.

Why these choices are not arbitrary: Position is the highest-accuracy encoding channel (Cleveland & McGill). Connection groups sequential observations (Gestalt). Color hue distinguishes categories pre-attentively. These are perceptual necessities, not stylistic preferences.

Next milestone: In Chapter 3, we will decide what colors to use for those series — and the choice will be grounded in color science, not personal taste.


Key Terms Introduced in This Chapter

| Term | Definition |
| --- | --- |
| Pre-attentive processing | The automatic extraction of visual properties that occurs in under 200 ms, before conscious attention |
| Visual encoding | The mapping of a data variable to a visual property (position, length, color, etc.) |
| Visual channel | A specific visual property available for encoding data (synonym for visual encoding channel) |
| Cleveland & McGill hierarchy | The experimentally established ranking of visual encoding accuracy: position > length > angle > area > color |
| Position encoding | Mapping data to the spatial position of a mark along a scale |
| Length encoding | Mapping data to the spatial extent of a mark (e.g., bar height) |
| Angle encoding | Mapping data to the rotation of a mark or the central angle of a sector |
| Area encoding | Mapping data to the two-dimensional extent of a mark |
| Color encoding | Mapping data to the hue, saturation, or lightness of a mark |
| Retinal variables | Jacques Bertin's term for the visual properties distinguishable by the retina (size, value, texture, color, orientation, shape) |
| Jacques Bertin | French cartographer who published Semiology of Graphics (1967), establishing the theoretical vocabulary for visual encoding |
| Stevens's Power Law | Psychophysical law relating physical stimulus magnitude to perceived magnitude; explains area underestimation |
| Gestalt principles | Perceptual rules by which the visual system groups elements (proximity, similarity, enclosure, connection, continuity, closure) |
| Proximity | Gestalt principle: elements close together are perceived as a group |
| Similarity | Gestalt principle: elements sharing visual properties are perceived as a group |
| Enclosure | Gestalt principle: elements within a boundary are perceived as a group |
| Connection | Gestalt principle: elements linked by a line are perceived as a group |
| Continuity | Gestalt principle: the eye follows smooth, continuous paths |
| Closure | Gestalt principle: the eye completes incomplete shapes |
| Change blindness | Failure to detect visual changes during disruptions (blinks, transitions) |
| Inattentional blindness | Failure to perceive visible objects when attention is directed elsewhere |