Appendix A: Math Foundations Refresher

Who this appendix is for: If you see a formula in this book and your first reaction is mild panic, this appendix is for you. We assume you saw this material at some point in high school or early college, but it may have been a while. This is not a math course --- it is a quick refresher designed to give you just enough confidence to follow along without anxiety. If you are comfortable with basic algebra, percentages, and introductory probability, feel free to skip ahead.

A.1 Variables and Expressions

In math (and in Python), a variable is a name that stands in for a value. When we write:

$$y = 3x + 5$$

we mean: "whatever $x$ is, $y$ is three times that value plus five." If $x = 2$, then $y = 3(2) + 5 = 11$.

This is exactly the same idea as Python variables:

x = 2
y = 3 * x + 5   # y is now 11

Key rules you probably remember:

Addition and subtraction are performed left to right: $10 - 3 + 2 = 9$.
Multiplication and division are performed before addition and subtraction: $2 + 3 \times 4 = 14$, not 20.
Parentheses override everything: $(2 + 3) \times 4 = 20$.
Exponents are performed before multiplication: $2 \times 3^2 = 2 \times 9 = 18$.

The order of operations in Python follows these same rules. In data science, you will rarely need to do algebra by hand, but you do need to read formulas and understand what they mean conceptually.
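As a quick check, the four rules above can be typed straight into Python. This is only a small sketch; the variable names are made up for illustration.

```python
# Each line mirrors one of the rules listed above.
# Variable names are illustrative, not from the main text.
left_to_right = 10 - 3 + 2     # (10 - 3) + 2 -> 9
times_first = 2 + 3 * 4        # 2 + 12 -> 14
parens_first = (2 + 3) * 4     # 5 * 4 -> 20
power_first = 2 * 3 ** 2       # 2 * 9 -> 18

print(left_to_right, times_first, parens_first, power_first)  # 9 14 20 18
```

Note that Python writes exponents as ** rather than ^.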

A.2 Solving Simple Equations

Sometimes a formula gives you one variable in terms of another, and you need to rearrange it. The golden rule: whatever you do to one side of the equation, do the same to the other side.

Example: The formula for converting Celsius to Fahrenheit is:

$$F = \frac{9}{5}C + 32$$

If you know $F = 98.6$ and want to find $C$:

Subtract 32 from both sides: $98.6 - 32 = \frac{9}{5}C$ gives $66.6 = \frac{9}{5}C$.
Multiply both sides by $\frac{5}{9}$: $C = 66.6 \times \frac{5}{9} = 37.0$.

In data science, you will almost never solve equations by hand. Python does the arithmetic. But understanding the logic of rearranging formulas helps you read statistical formulas and verify that your code is doing what you expect.
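One way to see the rearrangement at work is to write both directions of the conversion as functions and confirm that they undo each other. The function names below are assumptions chosen for this sketch.

```python
def celsius_to_fahrenheit(c):
    """Apply F = (9/5) * C + 32."""
    return 9 / 5 * c + 32


def fahrenheit_to_celsius(f):
    """Rearranged form: subtract 32, then multiply by 5/9."""
    return (f - 32) * 5 / 9


c = fahrenheit_to_celsius(98.6)
print(round(c, 1))                          # 37.0
print(round(celsius_to_fahrenheit(c), 1))   # 98.6
```

The results are rounded because floating-point arithmetic can leave a tiny error in the last decimal places.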

A.3 Percentages and Proportions

A proportion is a number between 0 and 1 representing a fraction of a whole. A percentage is that proportion multiplied by 100.

Proportion	Percentage	English
0.25	25%	"one quarter"
0.50	50%	"one half"
0.01	1%	"one in a hundred"
1.00	100%	"the whole thing"

Converting between them:

Proportion to percentage: multiply by 100. So $0.73 \rightarrow 73\%$.
Percentage to proportion: divide by 100. So $85\% \rightarrow 0.85$.

Percentage change measures how much a quantity grew or shrank relative to its starting value:

$$\text{Percentage change} = \frac{\text{new value} - \text{old value}}{\text{old value}} \times 100$$

If sales went from 200 to 250:

$$\frac{250 - 200}{200} \times 100 = 25\%$$
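The same calculation can be sketched as a small Python function. The name percent_change is an assumption for this example, and the function assumes the old value is not zero.

```python
def percent_change(old, new):
    """Return the change from old to new as a percentage of old.

    Illustrative helper; assumes old is not zero.
    """
    return (new - old) / old * 100


print(round(percent_change(200, 250), 1))   # 25.0
print(round(percent_change(250, 200), 1))   # -20.0
```

The second call shows that percentage change depends on the starting value: going from 200 to 250 is a 25% increase, but going back from 250 to 200 is a 20% decrease.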

Percentage points vs. percentages: This distinction trips up many people. If a vaccination rate goes from 60% to 66%, it increased by 6 percentage points but by 10 percent (because $6/60 = 0.10$). In data science, you will encounter both, and being sloppy about the distinction can lead to misleading claims.

Rates: A rate is a proportion expressed per unit of something --- per person, per year, per 100,000 population. For instance, "4.2 cases per 100,000 people" means $4.2 / 100{,}000 = 0.000042$ as a proportion. Rates let us compare groups of different sizes fairly.
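The sketch below works through both ideas in Python: the percentage-point versus percent distinction, and rates per 100,000. The helper per_100k and the two town populations are hypothetical, invented only to show how a rate puts groups of different sizes on the same footing.

```python
old_rate, new_rate = 60.0, 66.0   # vaccination rates, in percent

point_change = new_rate - old_rate                          # percentage points
relative_change = (new_rate - old_rate) / old_rate * 100    # percent

print(point_change)                # 6.0
print(round(relative_change, 1))   # 10.0


def per_100k(cases, population):
    """Express a count as a rate per 100,000 people (illustrative helper)."""
    return cases / population * 100_000


# Hypothetical towns: very different sizes, same underlying rate.
print(round(per_100k(21, 500_000), 1))     # 4.2
print(round(per_100k(84, 2_000_000), 1))   # 4.2
```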

A.4 Ratios and Proportions in Context

A ratio compares two quantities. If a class has 12 women and 8 men, the ratio of women to men is $12:8$, which simplifies to $3:2$. You could also express this as $12/8 = 1.5$, meaning there are 1.5 women for every man.

Cross-multiplication is a shortcut for solving proportion equations. If:

$$\frac{a}{b} = \frac{c}{d}$$

then $a \times d = b \times c$.

Example: If 3 out of every 5 surveyed customers prefer Product A, how many out of 800 would you expect to prefer it?

$$\frac{3}{5} = \frac{x}{800}$$

Cross-multiply: $3 \times 800 = 5x$, so $x = 480$.
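The cross-multiplication step can be sketched in Python as follows. The function name solve_proportion is an assumption for this example; it uses the standard-library Fraction type so the answer stays exact.

```python
from fractions import Fraction


def solve_proportion(a, b, d):
    """Solve a/b = x/d for x by cross-multiplying: a * d = b * x.

    Illustrative helper; assumes b is not zero.
    """
    return Fraction(a * d, b)


x = solve_proportion(3, 5, 800)
print(x)   # 480
```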

A.5 Basic Probability Notation

Probability measures how likely an event is, on a scale from 0 (impossible) to 1 (certain).

Notation you will encounter in this book:

Symbol	Meaning	Example
$P(A)$	The probability that event A occurs	$P(\text{rain}) = 0.30$
$P(A \text{ and } B)$	Probability that both A and B occur	$P(\text{rain and cold})$
$P(A \text{ or } B)$	Probability that A or B (or both) occurs	$P(\text{rain or snow})$
$P(A \mid B)$	Probability of A given that B has occurred	$P(\text{late} \mid \text{rain})$
$P(\text{not } A)$ or $P(A^c)$	Probability that A does not occur	$P(\text{no rain}) = 1 - P(\text{rain})$

Key rules:

Complement rule: $P(\text{not } A) = 1 - P(A)$. If there is a 30% chance of rain, there is a 70% chance of no rain.
Addition rule (mutually exclusive events): If events cannot happen simultaneously, $P(A \text{ or } B) = P(A) + P(B)$. The probability of rolling a 1 or a 6 on a fair die is $1/6 + 1/6 = 2/6 = 1/3$.
General addition rule: If events can overlap, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$. You subtract the overlap to avoid counting it twice.
Multiplication rule (independent events): If knowing one event occurred tells you nothing about the other, $P(A \text{ and } B) = P(A) \times P(B)$. The probability of flipping heads twice in a row is $0.5 \times 0.5 = 0.25$.
Conditional probability: $P(A \mid B) = P(A \text{ and } B) / P(B)$, which is defined only when $P(B) > 0$. This is the foundation of many statistical methods.
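These rules can be checked by brute force on a small example. The sketch below treats one roll of a fair die as a set of six equally likely outcomes; the helper prob and the events even, high, and low are made up for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die


def prob(event):
    """P(event) when every outcome is equally likely (illustrative helper)."""
    return Fraction(len(event & outcomes), len(outcomes))


even = {2, 4, 6}
high = {4, 5, 6}
low = {1, 2}

p_even = prob(even)             # 1/2
p_high = prob(high)             # 1/2
p_both = prob(even & high)      # outcomes {4, 6} -> 1/3
p_either = prob(even | high)    # outcomes {2, 4, 5, 6} -> 2/3

# Complement rule
print(prob(outcomes - even) == 1 - p_even)         # True
# Addition rule for the mutually exclusive events {1} and {6}
print(prob({1} | {6}) == prob({1}) + prob({6}))    # True
# General addition rule
print(p_either == p_even + p_high - p_both)        # True
# Multiplication rule: "even" and "low" happen to be independent
print(prob(even & low) == p_even * prob(low))      # True
# Conditional probability: P(even | high)
p_even_given_high = p_both / p_high
print(p_even_given_high)                           # 2/3
```

The last result matches a direct count: of the three high outcomes (4, 5, 6), two are even.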

You do not need to memorize these rules to learn data science. But when you see them in Chapters 20--23, you will be glad you reviewed them here.

A.6 Summation Notation ($\Sigma$)

The Greek letter sigma ($\Sigma$) is shorthand for "add up a bunch of things." When you see:

$$\sum_{i=1}^{n} x_i$$

it means: "Start with $i = 1$, then $i = 2$, and so on up to $i = n$. For each $i$, take the value $x_i$. Add them all up."

Concrete example: If we have five test scores $x_1 = 88$, $x_2 = 92$, $x_3 = 75$, $x_4 = 95$, $x_5 = 80$:

$$\sum_{i=1}^{5} x_i = 88 + 92 + 75 + 95 + 80 = 430$$

The mean (average) is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{430}{5} = 86$$

The bar over $x$ (read "x-bar") is the standard notation for the sample mean.

In Python, you never need sigma notation. You just write sum(scores) / len(scores) or np.mean(scores). But being able to read sigma notation lets you understand formulas in statistics textbooks and documentation.

A few more common patterns:

Sum of squared values: $\sum x_i^2$ means square each value, then add them up.
Sum of squared differences from the mean: $\sum (x_i - \bar{x})^2$. This is the numerator in the variance formula. It measures how spread out the data is.
Double summation: $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$ means add up every element in a table with $m$ rows and $n$ columns.
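Each of these patterns maps onto a short Python expression. The sketch below reuses the five test scores from earlier in this section; the small table is a made-up example.

```python
scores = [88, 92, 75, 95, 80]
n = len(scores)

total = sum(scores)                                   # sum of x_i
mean = total / n                                      # x-bar
sum_sq = sum(x ** 2 for x in scores)                  # sum of x_i squared
sum_sq_diff = sum((x - mean) ** 2 for x in scores)    # sum of (x_i - x-bar) squared

# Hypothetical table with 2 rows and 3 columns.
table = [[1, 2, 3],
         [4, 5, 6]]
double_sum = sum(sum(row) for row in table)           # add every element

print(total, mean)     # 430 86.0
print(sum_sq)          # 37258
print(sum_sq_diff)     # 278.0
print(double_sum)      # 21
```

In each case the loop inside sum(...) plays the role of the index $i$ running from 1 to $n$.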

A.7 Logarithms

A logarithm answers the question: "What exponent do I need?"

$$\log_b(x) = y \quad \text{means} \quad b^y = x$$

$\log_{10}(1000) = 3$ because $10^3 = 1000$.
$\log_{10}(100) = 2$ because $10^2 = 100$.
$\log_{2}(8) = 3$ because $2^3 = 8$.

Why logarithms appear in data science:

Skewed data. Income, population, and many natural phenomena span several orders of magnitude. Plotting them on a regular scale squishes most data points into a corner. Taking the logarithm spreads them out, making patterns visible. This is why Chapter 15 introduces log-scaled axes.
Multiplicative relationships. If a quantity doubles every year, a regular plot shows an exponential curve that is hard to interpret. Plotting the logarithm of the quantity instead (or using a logarithmic vertical axis) shows a straight line, which is much easier to analyze.
Statistical models. Logistic regression (Chapter 27) uses the log-odds (also called the logit). The natural logarithm ($\ln$, or $\log_e$, where $e \approx 2.718$) appears frequently in probability and statistics.
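To see why a doubling quantity becomes a straight line after a log transform, compare the year-to-year steps before and after taking logarithms. The starting value of 100 and the six-year span are assumptions made up for this sketch.

```python
import math

# Hypothetical quantity that starts at 100 and doubles each year.
values = [100 * 2 ** year for year in range(6)]   # 100, 200, 400, ...

raw_steps = [b - a for a, b in zip(values, values[1:])]
log_values = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(raw_steps)                          # [100, 200, 400, 800, 1600]
print([round(s, 4) for s in log_steps])   # [0.301, 0.301, 0.301, 0.301, 0.301]
```

The raw steps keep growing, while the log steps are constant (each equals $\log_{10}(2)$). Equal steps are exactly what a straight line looks like.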

Key properties (you do not need to memorize these, but recognizing them helps):

Property	Formula	In words
Log of a product	$\log(a \times b) = \log(a) + \log(b)$	Multiplication becomes addition
Log of a quotient	$\log(a / b) = \log(a) - \log(b)$	Division becomes subtraction
Log of a power	$\log(a^k) = k \cdot \log(a)$	Exponents become multipliers
Log of 1	$\log(1) = 0$ (any base)	Any valid base raised to the zero power is 1

In Python:

import math
math.log10(1000)    # 3.0 (base-10 logarithm)
math.log(1000)      # 6.907... (natural logarithm, base e)
math.log2(1024)     # 10.0 (base-2 logarithm)

import numpy as np
np.log(1000)        # 6.907... (natural log, works on arrays)
np.log10(1000)      # 3.0
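The properties in the table above can also be spot-checked numerically. The values of a, b, and k below are arbitrary choices for this sketch, and math.isclose is used because floating-point results may differ in the last decimal places.

```python
import math

a, b, k = 8.0, 32.0, 3   # arbitrary positive values for illustration

# Log of a product, quotient, and power
print(math.isclose(math.log(a * b), math.log(a) + math.log(b)))   # True
print(math.isclose(math.log(a / b), math.log(a) - math.log(b)))   # True
print(math.isclose(math.log(a ** k), k * math.log(a)))            # True

# Log of 1 is 0 in every base
print(math.log(1), math.log10(1), math.log2(1))                   # 0.0 0.0 0.0
```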

A.8 Basic Set Notation

A set is a collection of unique items with no particular order. Sets appear throughout statistics and probability.

Notation you may encounter:

Symbol	Meaning	Example
$\{a, b, c\}$	A set containing $a$, $b$, and $c$	$\{1, 2, 3\}$
$\in$	"is an element of"	$2 \in \{1, 2, 3\}$
$\notin$	"is not an element of"	$4 \notin \{1, 2, 3\}$
$A \cup B$	Union: everything in A or B or both	$\{1,2\} \cup \{2,3\} = \{1,2,3\}$
$A \cap B$	Intersection: only what is in both	$\{1,2\} \cap \{2,3\} = \{2\}$
$A \setminus B$ or $A - B$	Difference: in A but not in B	$\{1,2,3\} - \{2\} = \{1,3\}$
$\emptyset$ or $\{\}$	The empty set	No elements
$|A|$	The size (cardinality) of set A	$|\{1,2,3\}| = 3$

In Python, sets work the same way:

A = {1, 2, 3}
B = {2, 3, 4}

A | B    # Union: {1, 2, 3, 4}
A & B    # Intersection: {2, 3}
A - B    # Difference: {1}
2 in A   # True
4 in A   # False
len(A)   # 3

Sets are useful in data science for finding unique values, comparing groups, and understanding probability (where events are sets of outcomes).
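As a sketch of the first two uses, the example below collapses duplicate values and then compares two groups. The customer IDs and variable names are hypothetical.

```python
# Hypothetical customer IDs seen in two months; repeats are expected.
january = [101, 102, 102, 103, 105]
february = [102, 103, 103, 104]

jan_customers = set(january)    # duplicates collapse
feb_customers = set(february)

returning = jan_customers & feb_customers   # seen in both months
lost = jan_customers - feb_customers        # January only
new = feb_customers - jan_customers         # February only

print(len(jan_customers))   # 4
print(sorted(returning))    # [102, 103]
print(sorted(lost))         # [101, 105]
print(sorted(new))          # [104]
```

The results are passed through sorted() for display because sets have no guaranteed order.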

A.9 Inequalities and Intervals

Data science often involves conditions like "values greater than 100" or "ages between 18 and 65." This is where inequalities come in.

Symbol	Meaning
$>$	Greater than
$<$	Less than
$\geq$ or $\ge$	Greater than or equal to
$\leq$ or $\le$	Less than or equal to

Interval notation compactly represents a range:

$[a, b]$ means $a \leq x \leq b$ (both endpoints included, "closed interval")
$(a, b)$ means $a < x < b$ (both endpoints excluded, "open interval")
$[a, b)$ means $a \leq x < b$ (left included, right excluded, "half-open interval")

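Half-open intervals appear surprisingly often in data science. Histogram bins are a common example: NumPy's histogram function uses half-open bins like $[0, 10)$, $[10, 20)$, $[20, 30)$, with only the last bin closed on both ends, so that every value falls into exactly one bin. Note that pandas' cut function defaults to the mirror-image convention, $(0, 10]$, $(10, 20]$ (left endpoint excluded, right included), unless you pass right=False.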
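In plain Python, the three kinds of interval correspond to chained comparisons. The sketch below also assigns a value to a left-closed bin of width 10; the function names are assumptions for this example, and bin_label is not how any particular library implements binning.

```python
def in_closed(x, a, b):
    """True if x lies in [a, b]."""
    return a <= x <= b


def in_open(x, a, b):
    """True if x lies in (a, b)."""
    return a < x < b


def in_half_open(x, a, b):
    """True if x lies in [a, b)."""
    return a <= x < b


def bin_label(x, width=10):
    """Return the half-open bin [lo, lo + width) containing x (illustrative)."""
    lo = (x // width) * width
    return f"[{lo}, {lo + width})"


print(in_closed(65, 18, 65))      # True
print(in_open(65, 18, 65))        # False
print(in_half_open(18, 18, 65))   # True
print(bin_label(9))               # [0, 10)
print(bin_label(10))              # [10, 20)
```

The last two lines show the point of half-open bins: the boundary value 10 lands in exactly one bin rather than two.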

A.10 Functions

A function takes an input and produces an output. In math:

$$f(x) = x^2 + 1$$

means "take $x$, square it, add 1." If $x = 3$, then $f(3) = 3^2 + 1 = 10$.

This is directly analogous to Python functions:

def f(x):
    return x ** 2 + 1

f(3)   # 10

Linear functions have the form $f(x) = mx + b$, where $m$ is the slope (how fast the output changes per unit change in input) and $b$ is the y-intercept (the output when $x = 0$). Linear regression (Chapter 26) is all about finding the best values of $m$ and $b$ to fit your data.
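A linear function is easy to sketch in Python. The function name linear and the values $m = 3$, $b = 5$ are assumptions for this example, picked to match $y = 3x + 5$ from Section A.1.

```python
def linear(x, m, b):
    """Evaluate f(x) = m * x + b."""
    return m * x + b


m, b = 3, 5   # illustrative slope and y-intercept

print(linear(0, m, b))                      # 5  (the y-intercept)
print(linear(2, m, b))                      # 11
print(linear(3, m, b) - linear(2, m, b))    # 3  (the slope)
```

Increasing the input by 1 always changes the output by $m$, no matter where you start.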

Key function vocabulary:

Domain: The set of valid inputs. For $f(x) = \sqrt{x}$, the domain is $x \geq 0$ (you cannot take the square root of a negative number in the real numbers).
Range: The set of possible outputs.
Monotonic: A function that only goes up (increasing) or only goes down (decreasing), never switching direction.
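Two of these terms can be explored with a few lines of Python. The helper is_monotonic is a made-up name for this sketch. It only checks a finite list of sampled outputs, so it can suggest, but not prove, that a function is monotonic.

```python
import math


def is_monotonic(values):
    """True if the sequence never switches direction (illustrative helper)."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


xs = [-2, -1, 0, 1, 2]
print(is_monotonic([3 * x + 5 for x in xs]))    # True
print(is_monotonic([x ** 2 + 1 for x in xs]))   # False

# Domain: math.sqrt rejects inputs outside x >= 0.
try:
    math.sqrt(-4)
except ValueError:
    print("-4 is outside the domain of the real square root")
```

The second check fails because $x^2 + 1$ goes down and then back up as $x$ moves from $-2$ to $2$.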

A.11 Coordinate Systems and Graphs

Every scatter plot, line chart, and bar chart in this book uses a Cartesian coordinate system: a horizontal axis (x-axis) and a vertical axis (y-axis) meeting at the origin $(0, 0)$.

A point is specified as $(x, y)$. The point $(3, 7)$ means "go 3 units right, 7 units up."

Slope measures the steepness of a line:

$$\text{slope} = \frac{\text{rise}}{\text{run}} = \frac{y_2 - y_1}{x_2 - x_1}$$

A slope of 2 means "for every 1 unit you move right, the line goes up 2 units." A slope of $-0.5$ means "for every 1 unit right, the line goes down 0.5 units."
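The rise-over-run formula translates directly into Python. The function name slope and the sample points are assumptions for this sketch. It assumes the two points have different x-values, since a vertical line would cause a division by zero.

```python
def slope(p1, p2):
    """Rise over run between two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


print(slope((1, 3), (4, 9)))   # 2.0
print(slope((0, 4), (6, 1)))   # -0.5
```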

In Chapter 26 (Linear Regression), the slope of the best-fit line tells you the relationship between your predictor variable and your outcome variable. Understanding slope is essential for interpreting regression coefficients.

A.12 Absolute Value and Distance

The absolute value of a number is its distance from zero, ignoring the sign.

$$|5| = 5 \qquad |-5| = 5 \qquad |0| = 0$$

In Python: abs(-5) returns 5.

The distance between two numbers on a number line is the absolute value of their difference:

$$\text{distance}(a, b) = |a - b|$$

This concept extends to measuring how far a data point is from the mean. Variance and standard deviation (Chapter 19) build on the same idea, using squared differences from the mean rather than absolute values.
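As a small sketch, the distance of each value from the mean can be computed with abs. The scores are the same five test scores used in Section A.6.

```python
scores = [88, 92, 75, 95, 80]
mean = sum(scores) / len(scores)    # 86.0

# Distance of each score from the mean, ignoring direction.
distances = [abs(x - mean) for x in scores]

print(distances)                    # [2.0, 6.0, 11.0, 9.0, 6.0]
print(abs(3 - 10) == abs(10 - 3))   # True
```

The last line shows that distance does not depend on the order of the two numbers.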

A.13 A Note on Mathematical Maturity

If you made it through this appendix, you have all the math you need for this book. Here is the encouraging news: data science is not about being "good at math." It is about being curious, careful, and willing to let the computer do the arithmetic while you focus on the thinking.

The formulas in this book exist to make ideas precise, not to make you perform calculations by hand. When you see a formula, ask yourself:

What does this formula calculate?
What goes up when the inputs go up? What goes down?
What Python function does this correspond to?

If you can answer those three questions, you understand the formula well enough for data science.

Return to the main text whenever you are ready. You can always come back here for a quick lookup.