Prerequisites

Diagnostic Self-Assessment

Before starting this book, work through the checklist below. For each item, honestly assess whether you can do it without looking anything up. You do not need a perfect score; the sections below explain what is required and what is explicitly not needed.


Required: Python Fundamentals

You should be able to do all of the following comfortably. If you cannot, work through a Python basics course before starting this book.

Variables, Types, and Expressions

  • [ ] Assign values to variables and use meaningful variable names.
  • [ ] Work with core types: int, float, str, bool, None.
  • [ ] Perform arithmetic operations and understand operator precedence.
  • [ ] Use f-strings for string formatting (e.g., f"Value: {x:.2f}").
  • [ ] Understand truthy/falsy values and boolean logic.
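
As a quick self-test for the last two items, here is a minimal sketch of f-string format specifiers and truthiness. The variable names and values are illustrative only.

```python
# Format specifiers inside f-strings control how a value is rendered.
x = 3.14159
formatted = f"Value: {x:.2f}"    # two decimal places
share = f"Share: {0.256:.1%}"    # percentage with one decimal place

# Zero, None, and empty strings or containers are falsy.
# A non-empty container such as [0] is truthy, whatever it holds.
candidates = [0, 1, "", "text", [], [0], None]
truthy = [value for value in candidates if value]

print(formatted)
print(share)
print(truthy)
```

If you can say what each print call displays before running it, these two items are covered.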

Control Flow

  • [ ] Write if / elif / else blocks.
  • [ ] Write for loops over lists, ranges, and dictionaries.
  • [ ] Write while loops with proper termination conditions.
  • [ ] Use list comprehensions for simple transformations.
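
The sketch below touches each item in this list once. The inventory data is made up for illustration.

```python
# Loop over a dictionary's key/value pairs and branch on each value.
inventory = {"apples": 3, "pears": 0, "plums": 7}
in_stock = []
for name, count in inventory.items():
    if count > 0:
        in_stock.append(name)

# A while loop needs a condition that eventually becomes false.
n = 20
halvings = 0
while n > 1:
    n //= 2          # integer division moves n toward 1 on every pass
    halvings += 1

# The same filter as the for loop, written as a list comprehension.
in_stock_short = [name for name, count in inventory.items() if count > 0]
```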

Functions

  • [ ] Define functions with positional and keyword arguments.
  • [ ] Use default parameter values.
  • [ ] Return values from functions (including multiple values via tuples).
  • [ ] Understand variable scope (local vs. global) at a basic level.
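
A short sketch of these four ideas together. The function min_max and its ignore_negative parameter are hypothetical names chosen for this example.

```python
def min_max(values, ignore_negative=False):
    """Return the smallest and largest value as a tuple."""
    if ignore_negative:
        # Rebinding the local name leaves the caller's list untouched.
        values = [v for v in values if v >= 0]
    return min(values), max(values)


readings = [4, -2, 9, 1]

# Positional argument only; the default keeps negative values.
low, high = min_max(readings)

# Keyword argument overrides the default.
low_pos, high_pos = min_max(readings, ignore_negative=True)
```

The two return values are unpacked from a single tuple, and readings is unchanged after both calls.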

Data Structures

  • [ ] Create and manipulate lists: indexing, slicing, appending, iterating.
  • [ ] Create and manipulate dictionaries: access by key, iteration over keys/values/items.
  • [ ] Understand tuples and when to use them.
  • [ ] Use sets for membership testing.
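
For reference, a compact sketch of the four structures in use. All names and values are illustrative.

```python
scores = [85, 62, 91, 58, 73]
first_two = scores[:2]        # slicing returns a new list
last = scores[-1]             # negative indices count from the end
scores.append(69)             # lists are mutable

ages = {"Ana": 31, "Ben": 27}
labels = [f"{name} is {age}" for name, age in ages.items()]

point = (2.5, 4.0)            # tuple: a fixed-size, immutable record
x, y = point                  # unpacking

seen = {"north", "south"}     # set: fast membership testing
has_east = "east" in seen
```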

Modules and Imports

  • [ ] Import standard library modules (import math, from os import path).
  • [ ] Import third-party packages (import pandas as pd).
  • [ ] Understand the "as" alias convention for common packages (e.g., pd for pandas).

Quick Check

Can you read and understand this code without running it?

def summarize_scores(scores, threshold=70):
    """Return counts of passing and failing scores."""
    results = {"pass": 0, "fail": 0}
    for score in scores:
        if score >= threshold:
            results["pass"] += 1
        else:
            results["fail"] += 1
    return results

exam_scores = [85, 62, 91, 58, 73, 69, 77, 44, 95, 70]
summary = summarize_scores(exam_scores)
passing_rate = summary["pass"] / len(exam_scores)
print(f"Passing rate: {passing_rate:.1%}")

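If you can predict the output (Passing rate: 60.0%), your Python is ready. Six of the ten scores pass, including the 70 that exactly meets the threshold, because the comparison is >=.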


Required: pandas Basics

You should be comfortable with the following pandas operations. You do not need to be a pandas expert --- we use pandas primarily for loading and preparing data before visualization.

Loading Data

  • [ ] Read CSV files with pd.read_csv().
  • [ ] Inspect DataFrames with .head(), .info(), .describe(), .shape.
  • [ ] Understand the difference between a DataFrame and a Series.

Selection and Filtering

  • [ ] Select columns by name: df["column"] and df[["col1", "col2"]].
  • [ ] Filter rows with boolean conditions: df[df["age"] > 30].
  • [ ] Use .loc[] and .iloc[] for label-based and position-based selection.
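
The difference between these selection styles is easiest to see on a tiny DataFrame. The sketch below uses made-up data and string row labels so that label-based and position-based access cannot be confused.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 27, 45]},
    index=["a", "b", "c"],
)

ages = df["age"]                 # one column name -> Series
subset = df[["name", "age"]]     # list of column names -> DataFrame
over_30 = df[df["age"] > 30]     # boolean mask keeps matching rows

by_label = df.loc["b", "age"]    # row label "b", column label "age"
by_position = df.iloc[1, 1]      # second row, second column
```

Both by_label and by_position pick out the same cell here, one by name and one by position.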

Grouping and Aggregation

  • [ ] Use .groupby() with single and multiple columns.
  • [ ] Apply aggregation functions: .sum(), .mean(), .count(), .agg().
  • [ ] Understand the structure of a grouped result.
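
To make "the structure of a grouped result" concrete, here is a small sketch with made-up sales data. With pandas' default settings, the group keys become the index of the result.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "A"],
    "revenue": [100, 200, 300, 500],
})

# Single key: the result is a Series indexed by region.
by_region = df.groupby("region")["revenue"].sum()

# Two keys: the result is indexed by a (region, product) MultiIndex.
by_both = df.groupby(["region", "product"])["revenue"].mean()

# .agg() with a list of names returns one column per aggregation.
stats = df.groupby("region")["revenue"].agg(["sum", "mean", "count"])
```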

Basic Transformation

  • [ ] Create new columns from existing ones: df["new"] = df["a"] + df["b"].
  • [ ] Use .apply() for simple row-wise or column-wise operations.
  • [ ] Sort with .sort_values().
  • [ ] Handle missing values at a basic level: .isna(), .dropna(), .fillna().
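
A minimal sketch of these operations on a made-up DataFrame with one missing value. The column names total and label are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, None], "b": [10.0, 20.0, 30.0]})

missing_per_column = df.isna().sum()    # count missing values per column
complete = df.dropna()                  # drop rows with any missing value
filled = df.fillna({"a": 0.0})          # or replace them, column by column

# New column from existing ones (vectorized, no loop needed).
filled["total"] = filled["a"] + filled["b"]

# .apply() calls the function once per value of the column.
filled["label"] = filled["total"].apply(lambda t: "high" if t > 25 else "low")

ranked = filled.sort_values("total", ascending=False)
```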

Quick Check

Can you write code to accomplish this task?

Given a CSV file sales.csv with columns region, product, and revenue, load the file, filter to rows where revenue exceeds 1000, group by region, compute the mean revenue per region, and sort the result in descending order.

If you can write that in 4--6 lines of pandas, you are ready.
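
Once you have made your own attempt, you can compare it with the sketch below. It is one possible answer, not the only correct one. Because no real sales.csv ships with this page, the example reads a few invented rows from an in-memory string instead.

```python
import io

import pandas as pd

# Stand-in for sales.csv; these rows are made up for illustration.
sales_csv = io.StringIO(
    "region,product,revenue\n"
    "East,A,1500\n"
    "East,B,800\n"
    "West,A,2500\n"
    "West,B,1200\n"
    "North,A,900\n"
)

df = pd.read_csv(sales_csv)    # with a real file: pd.read_csv("sales.csv")
high = df[df["revenue"] > 1000]
mean_by_region = (
    high.groupby("region")["revenue"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_by_region)
```

Note that regions with no rows above the threshold (North in this sample) drop out of the result entirely.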


Required: Basic Statistics Concepts

You need intuition for these concepts --- we will use them to inform chart choices, not to derive proofs.

  • [ ] Mean, median, mode --- You know what they measure and when each is appropriate.
  • [ ] Distributions --- You understand that data has a shape (normal, skewed, bimodal, uniform) and can sketch a rough histogram.
  • [ ] Variance and standard deviation --- You understand them as measures of spread.
  • [ ] Correlation --- You understand positive, negative, and zero correlation at an intuitive level.
  • [ ] Percentiles and quartiles --- You can interpret a box plot.
  • [ ] Categorical vs. numerical data --- You know the difference and can give examples of each.
  • [ ] Sample vs. population --- You understand that data is usually a sample and that summaries are estimates.
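
As an illustration of why the choice of summary matters, the sketch below uses Python's standard statistics module on a small, made-up, right-skewed dataset. Quartile conventions differ between libraries, so the cut points here may not match what a given plotting library draws for the same data.

```python
import statistics

# Made-up salaries in thousands; one large value skews the data to the right.
salaries = [42, 45, 47, 50, 52, 55, 250]

mean = statistics.mean(salaries)        # pulled upward by the outlier
median = statistics.median(salaries)    # middle value, robust to the outlier
spread = statistics.stdev(salaries)     # sample standard deviation

# Quartile cut points: the kind of values a box plot summarizes.
q1, q2, q3 = statistics.quantiles(salaries, n=4)

print(f"mean={mean:.1f}, median={median}, stdev={spread:.1f}")
print(f"quartiles: {q1}, {q2}, {q3}")
```

If it makes sense to you that the mean lands well above the median here, you have the intuition this book relies on.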

You do not need to know hypothesis testing, regression, Bayesian statistics, or any advanced statistical methods. When specialized statistical concepts appear (e.g., in Chapter 27 on statistical/scientific visualization), they are explained in context.


NOT Required

The following are explicitly not prerequisites. If you already have experience with them, you will move faster through certain chapters, but the book assumes no prior knowledge of any of the areas below.

No matplotlib or Visualization Library Experience

We start from zero. Chapter 10 introduces matplotlib's architecture from the ground up. If you have never typed import matplotlib.pyplot as plt, that is perfectly fine. If you have, Chapters 10--15 should still teach you something new, because they go far beyond the basic tutorial.

No Design Background

You do not need to have studied graphic design, typography, color theory, or visual communication. Parts I and II teach these from first principles, using perception science rather than artistic intuition. If you think "I'm not a visual person," this book was specifically written for you.

No Art or Drawing Ability

Data visualization is engineering, not art. You will never be asked to draw anything freehand. Every visual in this book is generated by code.

No Web Development

Interactive visualization chapters (Plotly, Altair, Streamlit, Dash) handle the HTML/JavaScript layer for you. You do not need to know HTML, CSS, or JavaScript. When web concepts arise (e.g., how Plotly renders in the browser), they are explained at the level needed.

No Command-Line Expertise

Basic terminal usage (running pip install, python script.py, jupyter notebook) is helpful, but nothing beyond the handful of commands in the Environment Checklist is assumed. Each chapter's Prerequisites section notes any command-line operations it requires.


Filling Gaps

If you identified gaps in the required prerequisites, here are recommended resources:

Python Fundamentals

  • Official Python Tutorial (docs.python.org/3/tutorial/) --- Free, authoritative, well-structured.
  • Automate the Boring Stuff with Python by Al Sweigart --- Excellent for practical Python fluency. Available free online.
  • Python Crash Course by Eric Matthes --- Well-paced introduction with good exercises.

pandas

  • pandas Documentation Getting Started Tutorials (pandas.pydata.org/docs/getting_started/) --- Official, well-maintained, covers all the basics.
  • Python for Data Analysis by Wes McKinney --- Written by the creator of pandas. Comprehensive and authoritative.
  • Kaggle Learn: Pandas (kaggle.com/learn/pandas) --- Free, short, exercise-driven.

Statistics Concepts

  • Naked Statistics by Charles Wheelan --- Intuitive, non-mathematical introduction to statistical thinking.
  • Khan Academy Statistics and Probability (khanacademy.org/math/statistics-probability) --- Free, video-based, self-paced.
  • OpenIntro Statistics (openintro.org/book/os/) --- Free textbook, more rigorous but still accessible.

Environment Setup

  • Real Python: Installing Python (realpython.com/installing-python/) --- Platform-specific guides.
  • JupyterLab Documentation (jupyterlab.readthedocs.io/) --- Getting started with notebooks.


Environment Checklist

Before starting Chapter 1, confirm that you can:

  1. [ ] Open a terminal and run python --version (it should report 3.11 or later).
  2. [ ] Create and activate a virtual environment.
  3. [ ] Run pip install pandas numpy matplotlib successfully.
  4. [ ] Open a Jupyter notebook and execute a cell containing import pandas as pd; print(pd.__version__).
  5. [ ] Load a CSV file in pandas and display the first five rows.
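
Checks 1 and 3 can be partly confirmed from inside Python itself. The script below is a convenience sketch, not an official setup tool: it reports the interpreter version and tries to import each package named above.

```python
import importlib
import sys

# Check 1: the interpreter version (the checklist asks for 3.11 or later).
python_ok = sys.version_info >= (3, 11)
print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'too old'}")

# Check 3: each package imports and reports its version.
for name in ("pandas", "numpy", "matplotlib"):
    try:
        module = importlib.import_module(name)
        print(f"{name} {module.__version__}")
    except ImportError:
        print(f"{name} is not installed")
```

Run it from the same virtual environment you plan to use for the book, since each environment has its own set of installed packages.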

If all five checks pass, you are ready to begin. Turn to Chapter 1.