Prerequisites
Diagnostic Self-Assessment
Before starting this book, work through the checklist below. For each item, honestly assess whether you can do it without looking anything up. You do not need a perfect score --- the categories below explain what is required, what is helpful, and what is explicitly not needed.
Required: Python Fundamentals
You should be able to do all of the following comfortably. If you cannot, work through a Python basics course before starting this book.
Variables, Types, and Expressions
- [ ] Assign values to variables and use meaningful variable names.
- [ ] Work with core types: int, float, str, bool, None.
- [ ] Perform arithmetic operations and understand operator precedence.
- [ ] Use f-strings for string formatting (e.g., f"Value: {x:.2f}").
- [ ] Understand truthy/falsy values and boolean logic.
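As a rough self-test for the items above, the sketch below touches each one. The variable names and values are made up for illustration.

```python
# Core types; the names and values here are illustrative only.
unit_price = 19.5        # float
quantity = 3             # int
label = "widget"         # str
in_stock = True          # bool
discount = None          # None marks "no value"

# Multiplication binds tighter than addition: 2 + 3 * 4 is 2 + (3 * 4).
precedence_demo = 2 + 3 * 4

total = unit_price * quantity
message = f"Value: {total:.2f}"

# Zero, empty containers, and None are falsy; most other values are truthy.
falsy_values = [0, 0.0, "", [], {}, None]
all_falsy = not any(falsy_values)

print(precedence_demo)  # 14
print(message)          # Value: 58.50
print(all_falsy)        # True
```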
Control Flow
- [ ] Write if/elif/else blocks.
- [ ] Write for loops over lists, ranges, and dictionaries.
- [ ] Write while loops with proper termination conditions.
- [ ] Use list comprehensions for simple transformations.
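The four constructs above might look like this in practice. This is a minimal sketch with hypothetical data, not a pattern the book prescribes.

```python
# Hypothetical readings used only for illustration.
readings = [12, 47, 85, 3]

# if/elif/else inside a for loop over a list
labels = []
for value in readings:
    if value >= 80:
        labels.append("high")
    elif value >= 40:
        labels.append("medium")
    else:
        labels.append("low")

# while loop whose condition is guaranteed to become false
countdown = 3
steps = 0
while countdown > 0:
    countdown -= 1
    steps += 1

# list comprehension for a simple transformation
doubled = [value * 2 for value in readings]

# for loop over a dictionary's key/value pairs
inventory = {"apples": 4, "pears": 0}
in_stock = []
for name, count in inventory.items():
    if count:  # zero is falsy
        in_stock.append(name)

print(labels)    # ['low', 'medium', 'high', 'low']
print(doubled)   # [24, 94, 170, 6]
```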
Functions
- [ ] Define functions with positional and keyword arguments.
- [ ] Use default parameter values.
- [ ] Return values from functions (including multiple values via tuples).
- [ ] Understand variable scope (local vs. global) at a basic level.
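To check yourself against the items above, the sketch below uses two hypothetical functions, describe_range and scaled, that are not part of any library.

```python
def describe_range(values, precision=1, unit=""):
    """Return the smallest and largest value as formatted strings."""
    lowest = f"{min(values):.{precision}f}{unit}"
    highest = f"{max(values):.{precision}f}{unit}"
    return lowest, highest  # two values, returned as one tuple


# Positional argument only; the defaults fill in the rest.
low, high = describe_range([3.14159, 2.71828])

# Keyword arguments override the defaults.
low_cm, high_cm = describe_range([3.14159, 2.71828], precision=2, unit=" cm")

scale = 10  # module-level (global) name


def scaled(x):
    factor = 2                 # local: exists only during this call
    return x * factor * scale  # reading a global name is allowed


print(low, high)         # 2.7 3.1
print(low_cm, high_cm)   # 2.72 cm 3.14 cm
print(scaled(1))         # 20
```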
Data Structures
- [ ] Create and manipulate lists: indexing, slicing, appending, iterating.
- [ ] Create and manipulate dictionaries: access by key, iteration over keys/values/items.
- [ ] Understand tuples and when to use them.
- [ ] Use sets for membership testing.
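A compact sketch of the four structures above, using made-up values:

```python
# List: ordered and mutable
temps = [18, 21, 25, 19, 23]
first, last = temps[0], temps[-1]
middle = temps[1:4]            # slice covering indices 1, 2, 3
temps.append(20)

# Dictionary: lookup by key, iteration over items
capitals = {"France": "Paris", "Japan": "Tokyo"}
tokyo = capitals["Japan"]
pairs = [f"{country}: {city}" for country, city in capitals.items()]

# Tuple: a fixed-size, immutable record that can be unpacked
point = (3, 4)
x, y = point

# Set: fast membership testing
seen = {"red", "green"}
has_red = "red" in seen
has_blue = "blue" in seen

print(middle)   # [21, 25, 19]
print(pairs)    # ['France: Paris', 'Japan: Tokyo']
```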
Modules and Imports
- [ ] Import standard library modules (import math, from os import path).
- [ ] Import third-party packages (import pandas as pd).
- [ ] Understand the as alias convention for common packages.
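The two standard-library import styles above can be tried without installing anything. The example values are arbitrary.

```python
import math            # access names through the module: math.hypot
from os import path    # bring one name into scope directly: path.splitext

hypotenuse = math.hypot(3, 4)
extension = path.splitext("sales.csv")[1]

# Third-party packages use the same syntax once installed, usually with a
# conventional alias, for example: import pandas as pd

print(hypotenuse)  # 5.0
print(extension)   # .csv
```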
Quick Check
Can you read and understand this code without running it?
    def summarize_scores(scores, threshold=70):
        """Return counts of passing and failing scores."""
        results = {"pass": 0, "fail": 0}
        for score in scores:
            if score >= threshold:
                results["pass"] += 1
            else:
                results["fail"] += 1
        return results

    exam_scores = [85, 62, 91, 58, 73, 69, 77, 44, 95, 70]
    summary = summarize_scores(exam_scores)
    passing_rate = summary["pass"] / len(exam_scores)
    print(f"Passing rate: {passing_rate:.1%}")
If you can predict the output (Passing rate: 60.0%), your Python is ready.
Required: pandas Basics
You should be comfortable with the following pandas operations. You do not need to be a pandas expert --- we use pandas primarily for loading and preparing data before visualization.
Loading Data
- [ ] Read CSV files with pd.read_csv().
- [ ] Inspect DataFrames with .head(), .info(), .describe(), .shape.
- [ ] Understand the difference between a DataFrame and a Series.
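One way to try the inspection methods above without a file on disk is to read CSV text from memory. The data below is hypothetical, and io.StringIO merely stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = """name,age,city
Ana,28,Lima
Ben,35,Oslo
Chen,42,Lima
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows (up to five)
print(df.shape)         # (rows, columns)
df.info()               # dtypes and non-null counts, printed directly
print(df.describe())    # summary statistics for numeric columns

ages = df["age"]        # one column of a DataFrame is a Series
print(type(df).__name__, type(ages).__name__)
```

With a real file, the only change would be passing its path to pd.read_csv().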
Selection and Filtering
- [ ] Select columns by name: df["column"] and df[["col1", "col2"]].
- [ ] Filter rows with boolean conditions: df[df["age"] > 30].
- [ ] Use .loc[] and .iloc[] for label-based and position-based selection.
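The selection forms above, applied to a small hypothetical DataFrame with string row labels so that label-based and position-based access are easy to tell apart:

```python
import pandas as pd

# Hypothetical data; the row labels "a".."d" are chosen for illustration.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chen", "Dee"],
        "age": [28, 35, 42, 31],
        "city": ["Lima", "Oslo", "Lima", "Pune"],
    },
    index=["a", "b", "c", "d"],
)

ages = df["age"]                  # single brackets: a Series
subset = df[["name", "city"]]     # double brackets: a DataFrame
over_30 = df[df["age"] > 30]      # boolean mask keeps matching rows
by_label = df.loc["b", "name"]    # row label "b", column "name"
by_position = df.iloc[0, 1]       # first row, second column

print(over_30["name"].tolist())   # ['Ben', 'Chen', 'Dee']
print(by_label, by_position)      # Ben 28
```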
Grouping and Aggregation
- [ ] Use .groupby() with single and multiple columns.
- [ ] Apply aggregation functions: .sum(), .mean(), .count(), .agg().
- [ ] Understand the structure of a grouped result.
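To see what "the structure of a grouped result" means, the sketch below groups a small hypothetical table in three ways and notes the index each result carries.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "team": ["A", "A", "B", "B", "B"],
        "year": [2023, 2024, 2023, 2024, 2024],
        "points": [10, 14, 7, 9, 11],
    }
)

# One grouping column: a Series indexed by team.
mean_by_team = df.groupby("team")["points"].mean()

# Two grouping columns: a Series with a (team, year) MultiIndex.
sum_by_team_year = df.groupby(["team", "year"])["points"].sum()

# Several aggregations at once: a DataFrame with one column per function.
summary = df.groupby("team")["points"].agg(["count", "sum", "mean"])

print(mean_by_team)
print(sum_by_team_year)
print(summary)
```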
Basic Transformation
- [ ] Create new columns from existing ones: df["new"] = df["a"] + df["b"].
- [ ] Use .apply() for simple row-wise or column-wise operations.
- [ ] Sort with .sort_values().
- [ ] Handle missing values at a basic level: .isna(), .dropna(), .fillna().
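A short sketch of the transformations above on hypothetical data. Filling the missing value with 0.0 is an arbitrary choice made for the example, not a recommendation.

```python
import pandas as pd

# Hypothetical data with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, 30.0]})

missing_count = int(df["a"].isna().sum())   # how many values are missing
dropped = df.dropna()                       # discard rows with missing values
filled = df.fillna({"a": 0.0})              # or replace them, column by column

# New column from existing ones.
filled["new"] = filled["a"] + filled["b"]

# Element-wise .apply() on a single column.
filled["size"] = filled["new"].apply(lambda v: "big" if v > 15 else "small")

ordered = filled.sort_values("new", ascending=False)

print(ordered)
```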
Quick Check
Can you write code to accomplish this task?
Given a CSV file sales.csv with columns region, product, and revenue, load the file, filter to rows where revenue exceeds 1000, group by region, compute the mean revenue per region, and sort the result in descending order.
If you can write that in 4--6 lines of pandas, you are ready.
Required: Basic Statistics Concepts
You need intuition for these concepts --- we will use them to inform chart choices, not to derive proofs.
- [ ] Mean, median, mode --- You know what they measure and when each is appropriate.
- [ ] Distributions --- You understand that data has a shape (normal, skewed, bimodal, uniform) and can sketch a rough histogram.
- [ ] Variance and standard deviation --- You understand them as measures of spread.
- [ ] Correlation --- You understand positive, negative, and zero correlation at an intuitive level.
- [ ] Percentiles and quartiles --- You can interpret a box plot.
- [ ] Categorical vs. numerical data --- You know the difference and can give examples of each.
- [ ] Sample vs. population --- You understand that data is usually a sample and that summaries are estimates.
You do not need to know hypothesis testing, regression, Bayesian statistics, or any advanced statistical methods. When specialized statistical concepts appear (e.g., in Chapter 27 on statistical/scientific visualization), they are explained in context.
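If the summary measures in the checklist feel abstract, Python's standard statistics module is enough to experiment with them. The sample below is invented and deliberately skewed, so that the mean and median disagree.

```python
import statistics

# Hypothetical right-skewed sample: one large value pulls the mean upward.
values = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
spread = statistics.stdev(values)                 # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)    # quartile cut points

print(mean, median, mode)   # 9 4 3
print(mean > median)        # True: the outlier affects the mean, not the median
```

The gap between mean and median here is the kind of cue that later informs whether a histogram or a box plot is the better chart for a variable.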
NOT Required
The following are explicitly not prerequisites. If you already have experience with them, great: you will move faster through certain chapters. The book assumes no prior knowledge in any of the areas below.
No matplotlib or Visualization Library Experience
We start from zero. Chapter 10 introduces matplotlib's architecture from the
ground up. If you have never typed import matplotlib.pyplot as plt, that is
perfectly fine. If you have, you will still learn things in Chapters 10--15
because we go far beyond the basic tutorial.
No Design Background
You do not need to have studied graphic design, typography, color theory, or visual communication. Parts I and II teach these from first principles, using perception science rather than artistic intuition. If you think "I'm not a visual person," this book was specifically written for you.
No Art or Drawing Ability
Data visualization is engineering, not art. You will never be asked to draw anything freehand. Every visual in this book is generated by code.
No Web Development
Interactive visualization chapters (Plotly, Altair, Streamlit, Dash) handle the HTML/JavaScript layer for you. You do not need to know HTML, CSS, or JavaScript. When web concepts arise (e.g., how Plotly renders in the browser), they are explained at the level needed.
No Command-Line Expertise
Basic terminal usage (running pip install, python script.py, jupyter
notebook) is helpful but not assumed. The Prerequisites for each chapter note
any command-line operations required.
Filling Gaps
If you identified gaps in the required prerequisites, here are recommended resources:
Python Fundamentals
- Official Python Tutorial (docs.python.org/3/tutorial/) --- Free, authoritative, well-structured.
- Automate the Boring Stuff with Python by Al Sweigart --- Excellent for practical Python fluency. Available free online.
- Python Crash Course by Eric Matthes --- Well-paced introduction with good exercises.
pandas
- pandas Documentation Getting Started Tutorials (pandas.pydata.org/docs/getting_started/) --- Official, well-maintained, covers all the basics.
- Python for Data Analysis by Wes McKinney --- Written by the creator of pandas. Comprehensive and authoritative.
- Kaggle Learn: Pandas (kaggle.com/learn/pandas) --- Free, short, exercise-driven.
Statistics Concepts
- Naked Statistics by Charles Wheelan --- Intuitive, non-mathematical introduction to statistical thinking.
- Khan Academy Statistics and Probability (khanacademy.org/math/statistics-probability) --- Free, video-based, self-paced.
- OpenIntro Statistics (openintro.org/book/os/) --- Free textbook, more rigorous but still accessible.
Environment Setup
- Real Python: Installing Python (realpython.com/installing-python/) --- Platform-specific guides.
- JupyterLab Documentation (jupyterlab.readthedocs.io/) --- Getting started with notebooks.
Environment Checklist
Before starting Chapter 1, confirm that you can:
- [ ] Open a terminal and run python --version (shows 3.11 or later).
- [ ] Create and activate a virtual environment.
- [ ] Run pip install pandas numpy matplotlib successfully.
- [ ] Open a Jupyter notebook and execute a cell containing import pandas as pd; print(pd.__version__).
- [ ] Load a CSV file in pandas and display the first five rows.
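Part of this checklist can be automated. The script below is a minimal sketch, not an official setup tool: it checks the interpreter version and whether the three packages can be found, using only the standard library. The constant names are illustrative.

```python
import importlib.util
import sys

# Assumed requirements, taken from the checklist above.
REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = ["pandas", "numpy", "matplotlib"]

python_ok = sys.version_info >= REQUIRED_PYTHON

# find_spec returns None when a top-level package cannot be located.
missing = [
    name for name in REQUIRED_PACKAGES
    if importlib.util.find_spec(name) is None
]

version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}: {'OK' if python_ok else 'too old'}")
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment. It does not verify that Jupyter works or that a CSV file loads, so those two items still need a manual check.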
If all five checks pass, you are ready to begin. Turn to Chapter 1.