Part 2: Working with Business Data

Chapters 9–16


Something shifted when you finished Part 1.

You can write Python now. You understand variables and data types. You can encode decision logic with if and elif, automate repetition with for and while, package reusable logic into functions, store collections in lists and dictionaries, and write code that does not catastrophically fail the first time it encounters a null value or an unexpected format. These are real skills. You earned them.

But here is the thing you have probably noticed: everything you built in Part 1 was self-contained. The data lived inside the script. You typed the numbers directly into the code. The programs ran, produced output, and then forgot everything the moment they finished. That is fine for learning. It is not fine for business.

Business data does not live in Python scripts. It lives in files — in the CSV exports your CRM produces every Friday, in the Excel workbooks your finance team has maintained since 2011, in the JSON payloads that come back from your payment processor's API, in the folders where regional offices drop their weekly reports. The moment you learn to read those files, work with their contents, and write clean output back out, the gap between "learning Python" and "using Python for actual work" closes.

That is what Part 2 is about.


Where You Are Coming From

In Part 1, you acquired the building blocks. To be specific:

Chapter 1 made the case for why Python is worth your time. Chapter 2 got your environment set up — Python installed, VS Code configured, virtual environment ready. Chapter 3 introduced variables, data types, and operators in business terms. Chapter 4 taught you to encode decision logic: if this condition, then this action. Chapter 5 gave you loops — the ability to apply the same operation to a hundred items as easily as to one. Chapter 6 showed you how to package logic into functions that can be called repeatedly without rewriting. Chapter 7 introduced the four core data structures — lists, tuples, dictionaries, and sets — and how they map to real business concepts. Chapter 8 taught you error handling, which is the difference between code that works once in ideal conditions and code that holds up when reality fails to cooperate.

You have seen both Priya and Maya start to apply these tools. Priya refactored her first manual calculation into clean, reusable functions. Maya rebuilt her rate calculator as a proper function library — one that computes volume discounts, retainer rates, and subcontract margins for every client in a single run, instead of forcing her to edit the variables at the top and rerun the script for each one.

Neither of them is doing anything sophisticated yet. But both of them understand how Python thinks. That understanding is the foundation on which everything in Part 2 rests.


What Part 2 Covers

Eight chapters. A complete toolkit for working with business data.

Chapter 9 opens the door: file I/O. Paths, the open() function, context managers, reading and writing text files, the csv module, JSON files, and the pathlib library. Priya uses this to write her first proper file consolidation script — reading four regional CSV files from a folder, combining them, and writing a master output file. Maya builds a persistent project log that she can open, update, and query without touching a spreadsheet application. Both of them are, for the first time, building things that survive past the end of a script.
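To give a flavor of the consolidation pattern, here is a minimal sketch using pathlib, context managers, and the csv module. The folder name, filenames, and columns are invented for illustration; they are not the book's actual dataset.

```python
import csv
from pathlib import Path

# Illustrative setup: create a folder with two tiny regional CSV files
# so the sketch runs end to end. In real work, the files already exist.
data_dir = Path("regional_reports")
data_dir.mkdir(exist_ok=True)
(data_dir / "north.csv").write_text("region,sales\nNorth,1200\n")
(data_dir / "south.csv").write_text("region,sales\nSouth,950\n")

# Read every CSV in the folder and collect the rows.
rows = []
for path in sorted(data_dir.glob("*.csv")):
    with path.open(newline="") as f:      # the context manager closes the file
        rows.extend(csv.DictReader(f))

# Write one combined master file.
with open("master.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["region", "sales"])
    writer.writeheader()
    writer.writerows(rows)

print(len(rows))  # → 2
```

The shape of the script — gather files, read each one, write a single output — is the same whether there are two files or two hundred.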

Chapter 10 introduces pandas — the library that more or less defines what Python for data work means. DataFrames, Series, the pandas mindset versus the Excel mindset, basic inspection, indexing and slicing. If Chapter 3 was "Python has variables," Chapter 10 is "Python has a spreadsheet engine, and it is extraordinarily powerful."
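A small taste of that mindset, with made-up figures: a DataFrame is a labeled table you can filter without writing a loop.

```python
import pandas as pd

# A table built from a dictionary of columns. The numbers are invented.
sales = pd.DataFrame({
    "region": ["North", "South", "East"],
    "revenue": [1200, 950, 1100],
})

# Boolean filtering: keep rows where revenue exceeds 1000, no loop needed.
high = sales[sales["revenue"] > 1000]
print(high["region"].tolist())  # → ['North', 'East']
```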

Chapter 11 is where you load real datasets. read_csv(), read_excel(), reading from URLs. The Acme Corp sales dataset — the one Priya has been manually processing every Monday — makes its first appearance as a proper DataFrame. You learn to inspect data with .info(), .describe(), .head(), and .value_counts(), which are the pandas equivalent of opening a file and getting your bearings.
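The load-and-inspect rhythm looks roughly like this. To keep the sketch self-contained it reads from an in-memory string; in the book you pass read_csv() a file path or URL, and the column names here are invented.

```python
import io
import pandas as pd

# Stand-in for a real file: a tiny CSV held in a string.
csv_text = "region,tier,amount\nNorth,A,500\nSouth,B,300\nNorth,A,450\n"
df = pd.read_csv(io.StringIO(csv_text))

# Getting your bearings, the pandas way.
print(df.head(2))                    # the first rows
print(df["region"].value_counts())   # North appears twice, South once
print(df.describe())                 # summary statistics for numeric columns
```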

Chapter 12 addresses the least glamorous and most necessary part of real data work: cleaning. Missing values, duplicates, type mismatches, inconsistent string formats, outlier detection. Real business data is almost never clean. The analyst who knows how to clean it programmatically — quickly, consistently, reproducibly — is considerably more valuable than the one who cleans it by hand.
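The basic cleaning moves, sketched on deliberately messy invented data: drop exact duplicates, coerce a text column to numbers, and choose a policy for missing values.

```python
import pandas as pd

# Invented messy data: a duplicated row, a missing amount, numbers stored as text.
df = pd.DataFrame({
    "customer": ["Acme", "Acme", "Beta", "Gamma"],
    "amount": ["100", "100", None, "250"],
})

df = df.drop_duplicates()                   # remove the repeated Acme row
df["amount"] = pd.to_numeric(df["amount"])  # text → numbers (None becomes NaN)
df["amount"] = df["amount"].fillna(0)       # an explicit policy for missing values

print(df["amount"].sum())  # → 350.0
```

Each step is one line, and the same three lines work identically on four rows or four hundred thousand — that is what "quickly, consistently, reproducibly" means in practice.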

Chapter 13 is transformation and aggregation: .apply(), .map(), .groupby(), .agg(), .merge(), .join(), .pivot_table(). This is where pandas earns its reputation. Priya builds a weekly report — regional summaries, tier breakdowns, week-over-week comparisons — that would have taken two hours in Excel and now takes thirty-five minutes, most of which is thinking rather than mechanics.
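Aggregation and joining in miniature, with invented numbers: total sales per region, then a merge against a small reference table.

```python
import pandas as pd

# Invented transaction data and an invented reference table.
sales = pd.DataFrame({
    "region": ["North", "North", "South"],
    "amount": [500, 450, 300],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Sandra", "Marcus"],
})

# Group, aggregate, then join the summary against the reference data.
summary = sales.groupby("region", as_index=False).agg(total=("amount", "sum"))
report = summary.merge(managers, on="region", how="left")

print(report)
#   region  total manager
# 0  North    950  Sandra
# 1  South    300  Marcus
```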

Chapter 14 introduces data visualization with matplotlib. Line charts, bar charts, histograms, scatter plots. The mechanics of making a chart, the principles of making a good one.
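The mechanics in their smallest form: one figure, one bar chart, labeled axes, saved to a file. The data is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display, as a script would
import matplotlib.pyplot as plt

# Invented figures for a quick bar chart.
regions = ["North", "South", "East"]
revenue = [1200, 950, 1100]

fig, ax = plt.subplots()
ax.bar(regions, revenue)
ax.set_xlabel("Region")
ax.set_ylabel("Revenue")
ax.set_title("Revenue by region")
fig.savefig("revenue.png")
```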

Chapter 15 goes further with seaborn and plotly — statistical charts, interactive dashboards, multi-chart layouts. By the end of this chapter, you can build visualizations that Excel cannot produce, and you can automate their creation from fresh data.
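A glimpse of the statistical layer: seaborn draws a grouped bar chart straight from a tidy DataFrame, something that takes considerably more setup in raw matplotlib. The data is invented.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import pandas as pd
import seaborn as sns

# Invented tidy data: one row per region per quarter.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [500, 700, 450, 520],
})

# One call produces a grouped bar chart with a legend.
ax = sns.barplot(data=df, x="region", y="revenue", hue="quarter")
ax.figure.savefig("revenue_by_quarter.png")
```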

Chapter 16 closes the part with Excel and CSV integration: openpyxl for reading and writing Excel files, xlwings for live Excel integration, multi-sheet workbooks, proper formatting. Maya's invoicing system reaches its first complete version here — a Python program that reads her client data, calculates everything correctly, and writes a properly formatted Excel workbook she can send to clients.
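A sketch of the Excel-writing side with openpyxl: a workbook with a bold header row, saved to disk. The sheet name, client names, and amounts are invented, not Maya's actual invoice format.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Build a workbook in memory.
wb = Workbook()
ws = wb.active
ws.title = "Invoice"

# Header row, then bold it.
ws.append(["Client", "Amount"])
for cell in ws[1]:
    cell.font = Font(bold=True)

# Invented line items.
ws.append(["Acme Corp", 1250.00])
ws.append(["Beta LLC", 980.50])

wb.save("invoice.xlsx")
```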


The Promise

By the end of Chapter 16, you will be able to do something that most business professionals — even experienced ones — cannot:

You will be able to take a folder of messy, inconsistent, real-world data files, load them into Python, clean them, transform them, join them against reference data, aggregate them into summaries, visualize the results, and write the output into a formatted Excel workbook or set of charts. You will be able to do this in minutes, from new data, every time, with the same results.

That is not a modest capability. That is the core skill of a professional analyst — and you will have built it from the ground up.


Updates: Priya and Maya

Priya starts Part 2 with Python basics in hand and a Monday morning problem she is motivated to solve. Sandra Chen still wants that 9 AM report. Marcus still thinks Python is "another thing to support." By Chapter 13, Priya has written a script that Sandra describes, without irony, as "exactly what I've been asking for for two years."

Maya starts Part 2 with a working rate calculator and the uncomfortable knowledge that her project tracking system is still three disconnected spreadsheets. By Chapter 16, she has an automated invoicing system that reads her project data, applies all her billing rules, and writes formatted invoices without her touching Excel at all. The client who used to wait an extra week for invoices because Maya was busy has already commented on how much faster things have gotten.


A Note on Pace

Part 2 has more technical depth than Part 1. The pandas chapters in particular introduce a large surface area — a lot of methods, a lot of options, a lot of ways to accomplish the same thing. Do not try to memorize it. The goal is pattern recognition: understanding what kind of problem each tool solves, so that when you encounter that kind of problem in your own work, you know where to look.

Write the code. Change it. Break it. See what happens when you use .agg() with a different function. See what your DataFrame looks like after a .merge() with how="left" versus how="inner". The experimentation is the learning.
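That merge experiment fits in a few lines. Here it is in miniature, with invented clients and rates: the same join run with how="left" and with how="inner".

```python
import pandas as pd

# Invented data: three clients with orders, but rates on file for only two.
orders = pd.DataFrame({"client": ["Acme", "Beta", "Gamma"], "amount": [100, 200, 300]})
rates = pd.DataFrame({"client": ["Acme", "Beta"], "rate": [0.1, 0.2]})

left = orders.merge(rates, on="client", how="left")    # keeps Gamma, rate is NaN
inner = orders.merge(rates, on="client", how="inner")  # drops Gamma entirely

print(len(left), len(inner))  # → 3 2
```

Run it, swap the how= argument, and watch which rows survive — that is exactly the kind of small experiment this note is recommending.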

Let's work with some data.


Chapter 9: File I/O — Reading and Writing Business Data →