> Chapter goal: Learn to read from and write to Excel workbooks programmatically — creating multi-sheet formatted reports, applying business-grade cell styling, and building the complete Python-to-Excel workflow that turns data analysis into...
In This Chapter
- Where You Are in the Story
- 16.1 The Python Excel Ecosystem: Choosing Your Tool
- 16.2 Reading Excel with pandas
- 16.3 Writing Excel with pandas: The Quick Path
- 16.4 Cell Formatting with openpyxl
- 16.5 Writing Charts into Excel with openpyxl
- 16.6 The Python-to-Excel Workflow: Analyze in pandas, Output to openpyxl
- 16.7 Working with Multiple Sheets: Combining and Splitting
- 16.8 CSV Edge Cases
- 16.9 xlwings: Brief Overview for Live Excel Integration
- 16.10 Best Practices: Python → Excel vs Just Excel
- Summary
Chapter 16: Excel and CSV Integration — Python Meets Spreadsheets
Chapter goal: Learn to read from and write to Excel workbooks programmatically — creating multi-sheet formatted reports, applying business-grade cell styling, and building the complete Python-to-Excel workflow that turns data analysis into deliverables your colleagues can use without touching Python.
Where You Are in the Story
Marcus Webb has been watching Priya's work since Chapter 11 with a mixture of admiration and mild professional anxiety. He's spent fourteen years maintaining Acme Corp's Excel-based reporting system: a cascade of pivot tables, VLOOKUP chains, and color-coded cells that he understands completely and that exactly two other people in the company can edit without breaking.
Every week, he manually exports four regional CSV files from the ERP system, pastes them into a master Excel workbook, updates a summary sheet, reformats the headers, and emails the result to the distribution list. It takes about two hours.
When Priya shows him the Python output from Chapter 13 — clean DataFrames with every regional metric calculated automatically — he asks a single question:
"Can Python write the Excel file too?"
Yes. And it can format it, name the sheets, bold the headers, set column widths, apply number formats, and add a chart. This chapter is about that capability.
Maya Reyes has a different but equally concrete need. She's been manually creating invoices in Word, copying in project data from her tracker CSV, formatting each one, and emailing it as a PDF. She has 12 clients. Invoice day takes most of a workday. By the end of this chapter she has a script that does it in three minutes.
16.1 The Python Excel Ecosystem: Choosing Your Tool
Three main libraries handle Excel files in Python:
| Library | Read | Write | Format Cells | Charts | Excel Formulas | Live Excel |
|---|---|---|---|---|---|---|
| pandas | Yes | Basic | No | No | No | No |
| openpyxl | Yes | Yes | Yes | Yes | Yes | No |
| xlwings | Yes | Yes | Yes | Yes | Yes | Yes (COM) |
pandas is what you've been using for reading. pd.read_excel() loads data into a DataFrame. df.to_excel() writes a DataFrame to an Excel file. Both are fast and simple. Neither produces formatted output.
openpyxl is the production tool for creating formatted Excel workbooks from Python. It can write every cell, set every format, add sheets, merge cells, draw charts, and freeze panes. If you've ever manually formatted an Excel report that you wish were automated, openpyxl is the answer.
xlwings connects Python to a running instance of Excel on your machine. It can read and write cells in a live workbook, trigger macros, and interact with Excel as if a user were typing. This is powerful for situations where you need Excel to recalculate formulas or use Excel-specific features. It requires Excel to be installed and is Windows/Mac only.
For this chapter: pandas for reading, openpyxl for writing formatted output, xlwings for the brief overview.
Install openpyxl:
pip install openpyxl
16.2 Reading Excel with pandas
16.2.1 Reading a Single Sheet
import pandas as pd
# Read the first sheet (default)
sales_df = pd.read_excel("acme_customers.xlsx")
# Read a specific sheet by name
q2_df = pd.read_excel("acme_customers.xlsx", sheet_name="Q2 Sales")
# Read a specific sheet by index (0-based)
third_sheet_df = pd.read_excel("acme_customers.xlsx", sheet_name=2)
pd.read_excel accepts the same keyword arguments as pd.read_csv for most common needs: header, index_col, usecols, skiprows, nrows, dtype, na_values.
16.2.2 Reading Multiple Sheets
Passing sheet_name=None reads all sheets and returns a dictionary where keys are sheet names and values are DataFrames:
# Read all sheets
all_sheets = pd.read_excel("acme_quarterly_report.xlsx", sheet_name=None)
print(type(all_sheets)) # dict
print(list(all_sheets.keys())) # ['Q1', 'Q2', 'Q3', 'Q4', 'Summary']
# Access individual sheets
q1_df = all_sheets["Q1"]
summary_df = all_sheets["Summary"]
# Combine all quarterly sheets into one DataFrame
quarterly_frames = [
all_sheets[quarter].assign(quarter=quarter)
for quarter in ["Q1", "Q2", "Q3", "Q4"]
]
combined_df = pd.concat(quarterly_frames, ignore_index=True)
16.2.3 Reading a Specific Range
Sometimes a worksheet has decorative headers, merged title cells, or footnotes that you want to skip:
# Skip 3 header rows, read only columns B through F
data_df = pd.read_excel(
"acme_report.xlsx",
sheet_name="Sales Detail",
skiprows=3, # Skip decorative header rows
usecols="B:F", # Read only columns B through F
nrows=200, # Read only the first 200 data rows
)
For more precise range control, use openpyxl directly:
from openpyxl import load_workbook
wb = load_workbook("acme_report.xlsx", data_only=True)
ws = wb["Sales Detail"]
# Read a specific rectangular range
data_range = ws["B4:F53"] # Returns a tuple of rows, each row a tuple of cells
for row in data_range:
for cell in row:
print(cell.value, end="\t")
print()
The data_only=True argument to load_workbook is important: it returns the cached formula result rather than the formula string itself. Without it, a cell containing =SUM(A1:A10) would return the string "=SUM(A1:A10)" rather than the calculated value.
16.2.4 Reading with openpyxl for Metadata
pandas reads data. openpyxl reads everything — including cell formatting, comments, and named ranges. Use openpyxl when you need to inspect the workbook structure:
from openpyxl import load_workbook
wb = load_workbook("acme_quarterly_report.xlsx")
# List all sheet names
print(wb.sheetnames) # ['Q1', 'Q2', 'Q3', 'Q4', 'Summary']
# Check sheet dimensions
ws = wb["Q1"]
print(ws.dimensions) # e.g., 'A1:G203'
print(ws.max_row, ws.max_column) # 203, 7
# Read a specific cell
cell_value = ws["C5"].value
print(cell_value)
16.3 Writing Excel with pandas: The Quick Path
For simple cases — writing a DataFrame to Excel with no special formatting — pandas is all you need:
import pandas as pd
summary_df = pd.DataFrame({
"region": ["Northeast", "Southeast", "Midwest", "West"],
"q4_revenue": [580000, 388000, 467000, 742000],
"q4_orders": [520, 348, 419, 665],
})
# Write to a single sheet
summary_df.to_excel("acme_q4_summary.xlsx", sheet_name="Q4 Summary", index=False)
For multiple sheets, use ExcelWriter as a context manager:
with pd.ExcelWriter("acme_quarterly.xlsx", engine="openpyxl") as writer:
q1_df.to_excel(writer, sheet_name="Q1", index=False)
q2_df.to_excel(writer, sheet_name="Q2", index=False)
q3_df.to_excel(writer, sheet_name="Q3", index=False)
q4_df.to_excel(writer, sheet_name="Q4", index=False)
summary_df.to_excel(writer, sheet_name="Summary", index=False)
The context manager ensures the workbook is properly saved and closed when the block exits. Omitting it can result in a partially written or corrupted file.
The output of this approach is functional but plain: black text, default column widths, no bold headers, no currency formatting. For a report that Marcus or Sandra will share, this needs formatting.
16.4 Cell Formatting with openpyxl
openpyxl is the tool for formatting. Its model matches how Excel itself works: each cell is an object with style properties (font, fill, border, alignment, number format), and you set those properties directly.
16.4.1 Opening and Creating Workbooks
from openpyxl import Workbook, load_workbook
# Create a new empty workbook
wb = Workbook()
ws = wb.active # The default sheet created automatically
ws.title = "Summary" # Rename it
# Add additional sheets
q1_ws = wb.create_sheet(title="Q1")
q2_ws = wb.create_sheet(title="Q2")
# Save
wb.save("acme_report.xlsx")
# Load an existing workbook for modification
wb_existing = load_workbook("acme_report.xlsx")
ws_existing = wb_existing["Summary"]
wb_existing.save("acme_report_updated.xlsx")
16.4.2 Writing Data to Cells
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
# Write a single cell — two equivalent approaches
ws["A1"] = "Region"
ws.cell(row=1, column=2).value = "Q4 Revenue"
# Write a row at once
ws.append(["Northeast", 580000, 520])
ws.append(["Southeast", 388000, 348])
# Write from a list of lists
data = [
["Midwest", 467000, 419],
["West", 742000, 665],
]
for row_data in data:
ws.append(row_data)
For bulk writing from a DataFrame, iterate over rows:
for row_idx, data_row in enumerate(summary_df.itertuples(index=False), start=2):
ws.cell(row=row_idx, column=1).value = data_row.region
ws.cell(row=row_idx, column=2).value = data_row.q4_revenue
ws.cell(row=row_idx, column=3).value = data_row.q4_orders
16.4.3 Font Formatting
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
# Bold, larger header font
header_font = Font(name="Calibri", bold=True, size=12, color="FFFFFF")
# Apply to a cell
ws["A1"].font = header_font
# Apply to a range — iterate using ws.iter_rows()
for cell in ws[1]: # First row
cell.font = Font(bold=True, size=11)
16.4.4 Fill (Background Color)
from openpyxl.styles import PatternFill
# Solid fill with a hex color
dark_blue_fill = PatternFill(
fill_type="solid",
fgColor="1F4E79", # Dark blue — hex code without '#'
)
light_gray_fill = PatternFill(
fill_type="solid",
fgColor="F2F2F2",
)
# Apply to header row
for cell in ws[1]:
cell.fill = dark_blue_fill
cell.font = Font(bold=True, color="FFFFFF")
# Alternating row shading — applied to data rows
for row_idx in range(2, ws.max_row + 1):
fill = light_gray_fill if row_idx % 2 == 0 else PatternFill()
for cell in ws[row_idx]:
cell.fill = fill
16.4.5 Number Formats
Number format strings in openpyxl use the same syntax as Excel's custom number format dialog:
from openpyxl.styles import numbers
# Apply currency format to column B (revenue values)
for row_idx in range(2, ws.max_row + 1):
ws.cell(row=row_idx, column=2).number_format = '"$"#,##0' # $580,000
ws.cell(row=row_idx, column=3).number_format = "#,##0" # 520
ws.cell(row=row_idx, column=4).number_format = '0.00%' # 38.20%
ws.cell(row=row_idx, column=5).number_format = '"$"#,##0.00' # $1,234.56
Common format strings:
| Format String | Example Output |
|---|---|
| `'"$"#,##0'` | $580,000 | |
| `'"$"#,##0.00'` | $580,000.00 | |
'#,##0' |
580,000 |
'0.00%' |
38.20% |
'0%' |
38% |
'MM/DD/YYYY' |
02/14/2023 |
'YYYY-MM-DD' |
2023-02-14 |
'#,##0.0"K"' |
580.0K |
16.4.6 Column Widths and Row Heights
# Set specific column widths (in character units)
ws.column_dimensions["A"].width = 18 # Region name column
ws.column_dimensions["B"].width = 16 # Revenue column
ws.column_dimensions["C"].width = 14 # Orders column
# Auto-fit column width based on content (manual approximation)
for col in ws.columns:
max_length = max(
len(str(cell.value)) if cell.value else 0
for cell in col
)
col_letter = col[0].column_letter
ws.column_dimensions[col_letter].width = max(12, min(max_length + 4, 50))
# Set row height
ws.row_dimensions[1].height = 22 # Header row taller
16.4.7 Merged Cells and Alignment
from openpyxl.styles import Alignment
# Merge the title cell across columns A-E
ws.merge_cells("A1:E1")
ws["A1"] = "Acme Corp — Q4 2023 Regional Summary"
ws["A1"].font = Font(bold=True, size=14, color="1F4E79")
ws["A1"].alignment = Alignment(horizontal="center", vertical="center")
ws.row_dimensions[1].height = 28
# Center-align a range
for row in ws.iter_rows(min_row=2, max_row=ws.max_row, min_col=2, max_col=4):
for cell in row:
cell.alignment = Alignment(horizontal="right", vertical="center")
16.4.8 Borders
from openpyxl.styles import Border, Side
# Define border sides
thin_side = Side(style="thin", color="CCCCCC")
thick_side = Side(style="medium", color="1F4E79")
thin_border = Border(
left=thin_side, right=thin_side,
top=thin_side, bottom=thin_side,
)
bottom_thick = Border(bottom=thick_side)
# Apply bottom border to header row
for cell in ws[2]: # Row 2 (below the title)
cell.border = bottom_thick
# Apply thin border to all data cells
for row in ws.iter_rows(min_row=3, max_row=ws.max_row):
for cell in row:
cell.border = thin_border
16.4.9 Freeze Panes
Freezing the header row keeps column labels visible when scrolling through long reports:
ws.freeze_panes = "A3" # Freeze rows above row 3 and columns left of A
# (i.e., rows 1 and 2 are frozen, no column freeze)
The freeze panes value is the first cell that is NOT frozen. "A3" means rows 1-2 and no columns are frozen.
16.5 Writing Charts into Excel with openpyxl
openpyxl can embed charts directly into a worksheet. The chart references data in the workbook — it does not embed a static image.
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.active
ws.title = "Regional Revenue"
# Write header and data
ws.append(["Region", "Q4 Revenue"])
data = [
("Northeast", 580000),
("Southeast", 388000),
("Midwest", 467000),
("West", 742000),
]
for row in data:
ws.append(row)
# Define the data reference for the chart
revenue_ref = Reference(
ws,
min_col=2, # Column B (Revenue)
min_row=1, # Include the header for the series title
max_row=5, # Through row 5 (4 data rows + 1 header)
)
# Define the category labels (Region column)
region_ref = Reference(
ws,
min_col=1,
min_row=2,
max_row=5,
)
# Build the chart
chart = BarChart()
chart.type = "col" # Vertical bars
chart.grouping = "clustered"
chart.title = "Q4 Revenue by Region"
chart.y_axis.title = "Revenue ($)"
chart.x_axis.title = "Region"
chart.style = 10 # Excel style 10 — clean professional look
chart.width = 18 # Chart width in cm
chart.height = 12 # Chart height in cm
chart.add_data(revenue_ref, titles_from_data=True)
chart.set_categories(region_ref)
# Place the chart starting at cell D2
ws.add_chart(chart, "D2")
wb.save("acme_regional_chart.xlsx")
openpyxl supports: BarChart, LineChart, PieChart, AreaChart, ScatterChart, BubbleChart. Each has configuration properties that map to Excel's chart formatting options.
16.6 The Python-to-Excel Workflow: Analyze in pandas, Output to openpyxl
The most practical pattern for business reporting combines both libraries:
- Read raw data with pandas (
pd.read_csv,pd.read_excel) - Clean and transform with pandas (the work from Chapters 12-13)
- Aggregate with pandas (groupby, pivot_table)
- Write formatted output with openpyxl
This separation of concerns is clean. pandas is excellent at data manipulation. openpyxl is excellent at Excel formatting. Together they cover the full workflow.
Here is the complete pattern:
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter
# --- Step 1: Analyze ---
raw_df = pd.read_csv("acme_sales_2023.csv", parse_dates=["order_date"])
regional_summary = (
raw_df
.groupby("region")
.agg(
total_revenue=("revenue", "sum"),
total_orders=("order_id", "count"),
avg_order_value=("revenue", "mean"),
gross_margin_pct=("gross_margin", "mean"),
)
.reset_index()
.sort_values("total_revenue", ascending=False)
)
# --- Step 2: Create workbook ---
wb = Workbook()
ws = wb.active
ws.title = "Regional Summary"
# --- Step 3: Write and format ---
# (full formatting code shown in Section 16.7 and the code file)
# --- Step 4: Save ---
wb.save("acme_q4_report.xlsx")
The key discipline: do all calculation in pandas before touching openpyxl. Don't use openpyxl cells for arithmetic — use pandas. openpyxl's job is presentation, not computation.
16.7 Working with Multiple Sheets: Combining and Splitting
16.7.1 Splitting One DataFrame into Multiple Sheets
A common request: "Give me the same report but with a separate tab for each region."
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter
raw_df = pd.read_csv("acme_sales_2023.csv", parse_dates=["order_date"])
# Write each region to its own sheet
with pd.ExcelWriter("acme_regional_sheets.xlsx", engine="openpyxl") as writer:
for region in sorted(raw_df["region"].unique()):
region_df = raw_df[raw_df["region"] == region].copy()
region_df.to_excel(
writer,
sheet_name=region,
index=False,
)
# Add a summary sheet
summary_df = (
raw_df.groupby("region")
.agg(total_revenue=("revenue", "sum"), order_count=("order_id", "count"))
.reset_index()
)
summary_df.to_excel(writer, sheet_name="Summary", index=False)
16.7.2 Combining Multiple Sheets into One DataFrame
import pandas as pd
# Read all sheets and combine
all_sheets = pd.read_excel("acme_regional_sheets.xlsx", sheet_name=None)
# Exclude the Summary sheet; add a region column from the sheet name
regional_dfs = [
df.assign(region=sheet_name)
for sheet_name, df in all_sheets.items()
if sheet_name != "Summary"
]
combined_df = pd.concat(regional_dfs, ignore_index=True)
print(combined_df.shape)
print(combined_df["region"].value_counts())
16.7.3 Post-Processing with openpyxl After pandas Write
When pd.ExcelWriter writes a workbook, you can access the underlying openpyxl workbook object through the writer to apply formatting:
with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer:
summary_df.to_excel(writer, sheet_name="Summary", index=False)
# Access the openpyxl workbook and worksheet
workbook = writer.book
worksheet = writer.sheets["Summary"]
# Apply header formatting
header_font = Font(bold=True, color="FFFFFF")
header_fill = PatternFill(fill_type="solid", fgColor="1F4E79")
for cell in worksheet[1]:
cell.font = header_font
cell.fill = header_fill
cell.alignment = Alignment(horizontal="center")
# Set column widths
for col_idx, col_name in enumerate(summary_df.columns, start=1):
worksheet.column_dimensions[get_column_letter(col_idx)].width = (
max(len(str(col_name)) + 4, 14)
)
This is the most ergonomic hybrid approach: use pandas to write the data (which handles column names and row indexing automatically), then use openpyxl to format the result.
16.8 CSV Edge Cases
Chapter 9 covered reading basic CSV files. Production CSV files frequently deviate from the happy path in ways that cause silent data corruption.
16.8.1 Non-Standard Delimiters
Not all "CSV" files use commas. European systems frequently use semicolons (because commas are decimal separators in many European locales). Tab-separated values (TSV) files from ERP exports are common:
# Semicolon-delimited
df = pd.read_csv("export.csv", sep=";")
# Tab-delimited
df = pd.read_csv("export.tsv", sep="\t")
# Auto-detect delimiter (uses Python's csv.Sniffer)
df = pd.read_csv("unknown.csv", sep=None, engine="python")
16.8.2 Quoting and Embedded Commas
A customer address like "Smith, John" contains a comma inside quotes. If the file is not properly quoted, pandas will split this into two fields. The standard solution: pandas handles properly quoted CSV correctly by default. Problems arise with inconsistent quoting:
# Handle different quoting characters
df = pd.read_csv("export.csv", quotechar='"') # Default; explicit for clarity
df = pd.read_csv("export.csv", quotechar="'") # Some systems use single quotes
# Skip rows that fail to parse (use sparingly — silent data loss)
df = pd.read_csv("messy.csv", on_bad_lines="skip")
# Raise an error on bad lines (safer for production)
df = pd.read_csv("messy.csv", on_bad_lines="error")
16.8.3 Encoding Issues
The most common source of mysterious question marks and garbled characters in data from legacy systems:
# Try common encodings in order
for encoding in ["utf-8", "utf-8-sig", "latin-1", "cp1252"]:
try:
df = pd.read_csv("export.csv", encoding=encoding)
print(f"Success with {encoding}")
break
except UnicodeDecodeError:
print(f"Failed with {encoding}")
# For files from Windows systems (BOM marker issue)
df = pd.read_csv("export.csv", encoding="utf-8-sig") # Handles Windows BOM
# For European Windows-legacy files
df = pd.read_csv("export.csv", encoding="cp1252")
The utf-8-sig encoding is specifically for files from Microsoft tools that write a byte order mark (BOM) at the start of UTF-8 files. Loading these as plain utf-8 produces a spurious \ufeff character at the start of the first column name.
16.8.4 Writing CSVs with Controlled Encoding
# Always specify encoding when writing CSVs for cross-system use
df.to_csv("output.csv", index=False, encoding="utf-8-sig") # Excel-compatible
df.to_csv("output.csv", index=False, encoding="utf-8") # Standard
16.8.5 Large CSV Files
When a CSV is too large to fit in memory, read it in chunks:
chunk_size = 50_000
chunks = []
for chunk_df in pd.read_csv("large_export.csv", chunksize=chunk_size):
# Process each chunk (e.g., filter to relevant rows)
filtered_chunk = chunk_df[chunk_df["region"] == "West"]
chunks.append(filtered_chunk)
west_df = pd.concat(chunks, ignore_index=True)
16.9 xlwings: Brief Overview for Live Excel Integration
xlwings connects Python to a running Excel application. Where openpyxl works with saved files, xlwings works with Excel itself. This is useful when:
- You need Excel to recalculate complex formula chains before reading results
- You need to trigger VBA macros from Python
- You're building Python functions that Excel can call as User Defined Functions (UDFs)
- You're automating workflows where a human will continue working in the file after Python runs
pip install xlwings
A brief xlwings example:
import xlwings as xw
# Connect to an already-open workbook
wb = xw.Book("acme_report.xlsx") # Opens or connects to the file
ws = wb.sheets["Summary"]
# Read a range
summary_data = ws.range("A1:E10").value # Returns a list of lists
# Write to a cell
ws.range("G1").value = "Updated by Python"
# Write a 2D array starting at a cell
ws.range("A15").value = [["Region", "Revenue"], ["West", 742000]]
# Call a VBA macro
wb.macro("RefreshPivotTables")()
# Save and close
wb.save()
wb.close()
xlwings also supports UDFs — Python functions that appear in Excel's function library:
import xlwings as xw
@xw.func
def calculate_commission(revenue, tier):
"""Excel-callable Python function: =calculate_commission(C2, D2)"""
rates = {"Bronze": 0.05, "Silver": 0.07, "Gold": 0.09}
return revenue * rates.get(tier, 0.05)
When not to use xlwings: For automated report generation (Marcus's use case), xlwings is overkill. It requires Excel to be installed and running, which makes scheduling on a server difficult. For producing formatted Excel files from data, openpyxl is the right tool. Use xlwings when you need two-way real-time interaction with Excel or need Excel's formula engine.
16.10 Best Practices: Python → Excel vs Just Excel
A common mistake among new Python-in-business users is reaching for Python when Excel would be faster. Here's an honest decision framework:
Use Excel directly when: - The dataset is small (under 50,000 rows) and won't grow - The analysis is one-time and won't be repeated - The output needs to be modified by non-technical users after you deliver it - The formatting requirements are complex and unique to this one report - You're building a model someone else will maintain
Use Python → Excel when: - The report is generated repeatedly (weekly, monthly) from fresh data - The data comes from a source that's awkward to work with in Excel (SQL database, API, web scrape) - The same report structure is needed for multiple outputs (one per region, one per client) - The formatting is consistent and can be codified once - The dataset is large (Excel has a 1,048,576-row limit; Python doesn't) - Calculation errors in Excel are a risk (complex formula chains with circular dependencies)
The sweet spot: Python for computation, openpyxl for formatting, Excel for delivery. The report goes to Marcus or Sandra as an Excel file. They can open it, print it, email it, and sort columns within it — all the things Excel users expect. Python handles the part they can't easily do: repeatable, accurate, fast data assembly.
Summary
This chapter covered the two sides of Python's Excel relationship: reading with pandas and openpyxl, and writing formatted workbooks with openpyxl. The complete workflow — read data, transform in pandas, output to formatted Excel — is one of the highest-value automation patterns for business analysts. It converts hours of manual reporting into seconds of script execution.
The chapter also covered CSV edge cases that trip up production systems: encoding issues, non-standard delimiters, and large file handling. These are unglamorous topics that matter a great deal when your CSV export arrives from a 15-year-old ERP system written when UTF-8 wasn't yet universal.
Maya's complete invoice generator, the milestone that closes her automation arc through Chapter 16, demonstrates the full pattern in production context. In Chapter 17 you'll extend automation further — scheduling scripts, monitoring folders, and sending email notifications automatically.
Next: Chapter 17 — Automating Repetitive Office Tasks