Key Takeaways: Introduction to pandas
This is your reference card for Chapter 7 — the chapter where Python stopped being a general-purpose programming language and became a data analysis powerhouse. Keep this nearby whenever you're working with DataFrames.
The Five Core Concepts
- DataFrame: A two-dimensional labeled table. Rows and columns, with an index. The primary data structure in pandas.
- Series: A one-dimensional labeled array. A single column of a DataFrame. Has an index, values, and a name.
- Vectorized operations: Applying an operation to an entire column at once (df["col"] * 2) rather than looping through values one by one. Faster, safer, more readable.
- Boolean indexing: Filtering rows using a True/False mask (df[df["col"] > value]). The pandas replacement for loop-with-if-statement.
- read_csv: One-line CSV loading with automatic type detection and NaN for missing values. Replaces csv.DictReader + loop + manual type conversion.
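As a minimal sketch of how these concepts fit together, the block below loads a small CSV, applies one vectorized operation, and filters with a boolean mask. The CSV text, its column names, and the variable names are hypothetical stand-ins for a real file; `io.StringIO` is used only so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
# The coverage cell for Chad is deliberately left empty.
csv_text = """country,region,coverage_pct
Ghana,AFRO,92
Chad,AFRO,
Peru,AMRO,85
"""

# read_csv accepts a path or a file-like object; the empty cell becomes NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Vectorized arithmetic: one expression converts the whole column.
df["coverage_frac"] = df["coverage_pct"] / 100

# Boolean indexing: the comparison builds a True/False mask,
# and the mask then selects the matching rows.
mask = df["coverage_pct"] > 90
high = df[mask]

print(df.dtypes)
print(high)
```

Because one value is missing, pandas stores `coverage_pct` as floats rather than integers, and the row with NaN compares as False in the mask.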
DataFrame Operations Quick Reference
Inspection
| Method / Attribute | What It Returns |
|---|---|
| df.shape | Tuple: (rows, columns) |
| df.dtypes | Data type of each column |
| df.columns | Column names |
| df.head(n) | First n rows (default 5) |
| df.tail(n) | Last n rows (default 5) |
| df.info() | Concise summary: types, non-null counts, memory |
| df.describe() | Summary statistics for numeric columns |
| df["col"].value_counts() | Count of each unique value |
| df["col"].unique() | Array of unique values |
| df["col"].nunique() | Number of unique values |
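The inspection calls above can be tried on a small table. This is an illustrative sketch only: the DataFrame contents and variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru", "Fiji"],
    "region": ["AFRO", "AFRO", "AMRO", "WPRO"],
    "coverage_pct": [92, 58, 85, 96],
})

n_rows, n_cols = df.shape                    # shape is an attribute, not a method
first_two = df.head(2)                       # first 2 rows as a DataFrame
summary = df.describe()                      # statistics for numeric columns only
region_counts = df["region"].value_counts()  # Series: value -> count
n_regions = df["region"].nunique()           # number of distinct values

print(n_rows, n_cols)
print(summary)
print(region_counts)
```

Note that `shape`, `dtypes`, and `columns` are attributes (no parentheses), while `head`, `info`, and `describe` are methods.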
Selection
| What You Want | Syntax | Returns |
|---|---|---|
| One column | df["col"] | Series |
| Multiple columns | df[["col1", "col2"]] | DataFrame |
| Rows by position | df.iloc[start:stop] | DataFrame |
| Rows by label | df.loc[start:stop] | DataFrame |
| Rows + columns by position | df.iloc[rows, cols] | varies |
| Rows + columns by label | df.loc[rows, cols] | varies |
Key difference: iloc uses exclusive end (like Python slicing). loc uses inclusive end.
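A short sketch of the inclusive versus exclusive slice end, using hypothetical sample data. With the default integer index the same slice bounds return different numbers of rows, and the inclusive behavior of `loc` carries over to string labels.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru", "Fiji"],
    "coverage_pct": [92, 58, 85, 96],
})

by_position = df.iloc[0:2]  # positions 0 and 1 (end excluded)
by_label = df.loc[0:2]      # labels 0, 1, and 2 (end included)

# With a string index, loc slices by label and still includes the end.
named = df.set_index("country")
label_slice = named.loc["Ghana":"Peru"]

# Rows + columns together: a single cell comes back as a scalar.
cell_by_label = df.loc[1, "coverage_pct"]
cell_by_position = df.iloc[1, 1]

print(len(by_position), len(by_label))
print(label_slice)
```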
Filtering
| Pattern | Example |
|---|---|
| Single condition | df[df["col"] > value] |
| AND (both true) | df[(df["col1"] > x) & (df["col2"] == y)] |
| OR (either true) | df[(df["col1"] > x) \| (df["col2"] == y)] |
| NOT | df[~(df["col"] == value)] |
| Membership | df[df["col"].isin(["a", "b", "c"])] |
Always wrap each condition in parentheses when combining with & or |.
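The filtering patterns can be sketched on hypothetical sample data as follows. The parentheses matter because `&` and `|` bind more tightly than comparison operators such as `>` and `==` in Python, so without them the expression is grouped in a way you did not intend.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru", "Fiji"],
    "region": ["AFRO", "AFRO", "AMRO", "WPRO"],
    "coverage_pct": [92, 58, 85, 96],
})

# AND: both conditions must hold for a row to be kept.
both = df[(df["region"] == "AFRO") & (df["coverage_pct"] > 80)]

# OR: either condition is enough.
either = df[(df["region"] == "AFRO") | (df["coverage_pct"] > 90)]

# NOT: invert the mask with ~.
not_afro = df[~(df["region"] == "AFRO")]

# Membership: keep rows whose value appears in the list.
members = df[df["region"].isin(["AMRO", "WPRO"])]

print(both)
print(either)
```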
Sorting
df.sort_values("col") # Ascending (default)
df.sort_values("col", ascending=False) # Descending
df.sort_values(["col1", "col2"], ascending=[True, False]) # Multi-column
df.sort_values("col").reset_index(drop=True) # Clean index after sort
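To see what sorting does to the index, here is a small sketch on hypothetical data. `sort_values` returns a new DataFrame and leaves the original untouched; the sorted rows keep their original index labels until `reset_index(drop=True)` renumbers them.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Chad", "Fiji", "Ghana", "Peru"],
    "region": ["AFRO", "WPRO", "AFRO", "AMRO"],
    "coverage_pct": [58, 96, 92, 85],
})

# Descending sort: rows move, but each keeps its original index label.
ranked = df.sort_values("coverage_pct", ascending=False)

# reset_index(drop=True) replaces those labels with a fresh 0..n-1 range.
ranked_clean = ranked.reset_index(drop=True)

# Multi-column: region A to Z, then highest coverage first within each region.
by_region = df.sort_values(["region", "coverage_pct"], ascending=[True, False])

print(ranked)
print(ranked_clean)
print(by_region)
```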
Creating and Modifying Columns
df["new_col"] = df["col"] * 2 # Vectorized arithmetic
df["new_col"] = df["col1"] + df["col2"] # Combine columns
df["new_col"] = df["col"].apply(some_function) # Apply custom logic
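The three patterns above might look like this on concrete data. The sample values, the column names, and the helper `coverage_band` are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru"],
    "dose1_pct": [92, 58, 85],
    "dose2_pct": [88, 41, 80],
})

# Vectorized arithmetic on a single column.
df["gap_to_target"] = 100 - df["dose1_pct"]

# Combining two columns element by element.
df["dropout"] = df["dose1_pct"] - df["dose2_pct"]


def coverage_band(pct):
    """Hypothetical helper: label one percentage as 'high' or 'low'."""
    return "high" if pct >= 90 else "low"


# apply calls the function once per value and returns a new Series.
df["band"] = df["dose1_pct"].apply(coverage_band)

print(df)
```

Prefer vectorized arithmetic when an operator exists for the job; reach for `apply` when the logic needs an ordinary Python function.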
Method Chaining Pattern
result = (df[df["region"] == "AFRO"]            # Filter
          .sort_values("coverage_pct",          # Sort
                       ascending=False)
          [["country", "coverage_pct"]]         # Select columns
          .head(10))                            # Limit rows
Read chains as sentences. Break across lines using parentheses. Keep chains under 5-6 steps. Use intermediate variables when debugging.
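As a sketch of the debugging advice, the block below runs the same pipeline twice on hypothetical sample data: once as a chain and once with intermediate variables. Both produce the same result, but the second form lets you inspect each step.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru", "Kenya"],
    "region": ["AFRO", "AFRO", "AMRO", "AFRO"],
    "coverage_pct": [92, 58, 85, 89],
})

# Chained form: filter, sort, select columns, limit rows.
result = (
    df[df["region"] == "AFRO"]
    .sort_values("coverage_pct", ascending=False)
    [["country", "coverage_pct"]]
    .head(2)
)

# Same steps with intermediate variables, easier to inspect one at a time.
afro = df[df["region"] == "AFRO"]
ranked = afro.sort_values("coverage_pct", ascending=False)
trimmed = ranked[["country", "coverage_pct"]]
top = trimmed.head(2)

print(result)
```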
Common Gotchas
| Gotcha | Symptom | Fix |
|---|---|---|
| Wrong column name | KeyError | Check df.columns.tolist(); names are case-sensitive |
| and instead of & | ValueError: ambiguous truth value | Use & for AND, \| for OR, ~ for NOT |
| Single brackets for multiple columns | KeyError with a tuple | Use double brackets: df[["col1", "col2"]] |
| Modifying a filtered subset | SettingWithCopyWarning | Use .copy() or .loc |
| loc vs iloc slice end | Off-by-one errors | loc is inclusive, iloc is exclusive |
| Dot notation on special names | Wrong result or AttributeError | Always use bracket notation: df["count"] not df.count |
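Several of these gotchas can be reproduced safely in a few lines. This is an illustrative sketch on hypothetical data; the `try`/`except` blocks are there only to capture the error types so the script keeps running.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({
    "country": ["Ghana", "Chad", "Peru"],
    "region": ["AFRO", "AFRO", "AMRO"],
    "count": [3, 5, 2],
})

# Gotcha: `and` asks for the truth value of a whole Series, which is ambiguous.
try:
    df[(df["count"] > 2) and (df["region"] == "AFRO")]
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__

# Gotcha: single brackets with two names are read as one tuple-shaped key.
try:
    df["country", "region"]
    bracket_error = None
except KeyError as exc:
    bracket_error = type(exc).__name__

# Fix for modifying a filtered subset: take an explicit copy first.
subset = df[df["region"] == "AFRO"].copy()
subset["flag"] = True  # changes the copy only, not df

# Gotcha: df.count is the DataFrame method, not the "count" column.
dot_is_method = callable(df.count)
bracket_values = df["count"].tolist()

print(and_error, bracket_error, dot_is_method)
```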
Terms to Remember
| Term | Definition |
|---|---|
| pandas | Open-source Python library for data manipulation and analysis; provides DataFrame and Series |
| DataFrame | Two-dimensional labeled data structure; a table with rows and columns |
| Series | One-dimensional labeled array; a single column of data with an index |
| index | Row labels in a DataFrame or Series; default is 0-based integers |
| column | A named vertical slice of a DataFrame; each column is a Series |
| row | A horizontal slice of a DataFrame; represents one observation or record |
| loc | Label-based indexer for selecting by index values and column names |
| iloc | Integer-position-based indexer for selecting by numeric position |
| boolean indexing | Filtering rows using a Series of True/False values (a boolean mask) |
| vectorized operation | An operation applied to an entire array/column at once, without explicit looping |
| apply | Series method that calls a function on each value, returning a new Series |
| read_csv | pandas function to load a CSV file into a DataFrame with automatic type detection |
| dtypes | DataFrame attribute showing the data type of each column |
| shape | DataFrame attribute returning (rows, columns) as a tuple |
| describe | DataFrame/Series method returning summary statistics for numeric data |
| head | DataFrame/Series method returning the first n rows (default 5) |
What You Should Be Able to Do Now
Use this checklist to verify you've absorbed the chapter. If any item feels shaky, revisit the relevant section or practice with the exercises.
- [ ] Import pandas with the standard pd alias
- [ ] Create DataFrames from dictionaries, lists of dictionaries, and CSV files
- [ ] Inspect any DataFrame with shape, dtypes, head(), describe(), and info()
- [ ] Select columns using bracket notation (single and multiple)
- [ ] Select rows using iloc (by position) and loc (by label)
- [ ] Filter rows using boolean indexing with single and combined conditions
- [ ] Sort by one or multiple columns, ascending or descending
- [ ] Create new columns using vectorized arithmetic and apply()
- [ ] Load a CSV file with pd.read_csv() and understand what NaN means
- [ ] Chain methods into readable multi-step pipelines
- [ ] Diagnose common errors: KeyError, SettingWithCopyWarning, ValueError with boolean operators
- [ ] Explain why vectorized operations are preferred over loops
- [ ] Translate English questions into pandas expressions using the grammar of data manipulation
If you checked every box, you're ready for Chapter 8, where you'll learn to handle the messy reality that NaN values and dirty data bring. The tools get sharper from here.