Part 3: Automation and Productivity


Chapters 17–24

This is the part of the book where the work stops feeling like learning and starts feeling like leverage.

Everything in Parts 1 and 2 was foundational. You learned the language. You learned to read files and write them back. You learned pandas, which gave you a proper data analysis engine. You learned to clean messy data, transform it into what you need, and turn it into charts and formatted Excel outputs. Those capabilities are real and they are yours.

But most of what you built in Parts 1 and 2 still required you to be at the keyboard. You ran the script, it did its work, it finished. You initiated it; you managed it. The automation was partial.

Part 3 is about taking that partial automation and making it complete. Scripts that run on a schedule without your involvement. Programs that reach out to the web and pull back data you would have spent an afternoon hunting manually. Automated emails that fire when a condition is met, without anyone deciding to send them. PDFs and Word documents generated from templates in seconds. A database that stores your organization's operational data and answers queries in milliseconds. Cloud services that run your code even when your laptop is closed.

By the end of this part, you will have automated significant chunks of your workday. Not in a theoretical sense — in the sense that there are specific hours you used to spend on specific tasks that Python will now handle without you.

What You Can Do Coming In

Before naming what Part 3 covers, it is worth appreciating what you now have from Parts 1 and 2.

You can write Python programs with clean logic, functions, and error handling that do not fall apart when the input is messy. You can read files — CSV, JSON, text — from disk and iterate over them. You can load a pandas DataFrame from a real dataset, clean it, transform it, merge it against reference data, aggregate it with .groupby(), and produce exactly the summary view you need. You can build charts with matplotlib, statistical visualizations with seaborn, interactive dashboards with plotly. You can write formatted Excel workbooks with multiple sheets.

You have, in other words, a complete analytical toolkit. What you have not yet done is turn that toolkit into infrastructure — into processes that run themselves, connect to external systems, and scale beyond what you could manage manually.

That is what Part 3 builds.

What Part 3 Covers

Chapter 17 starts where Marcus Webb starts every morning at 7:15. He spends twenty-five minutes moving files, renaming them, checking for missing ones, archiving old ones. Every weekday. For three years. Priya, watching from three desks away, has been quietly wondering whether any of that needs to be manual. This chapter teaches you to automate it: file organization scripts, bulk renaming, folder watching, report assembly with shutil and os. The automation audit — the framework for deciding what is worth automating — is here too.
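To make the idea concrete before the chapter itself, here is a minimal sketch of the "archiving old ones" step, using only pathlib and shutil from the standard library. The function name, the 30-day cutoff, and the year-month folder layout are assumptions chosen for illustration, not the approach Chapter 17 necessarily takes.

```python
from datetime import datetime, timedelta
from pathlib import Path
import shutil


def archive_old_files(inbox, archive, max_age_days=30):
    """Move files older than max_age_days from inbox into archive/YYYY-MM/.

    Hypothetical helper: the folder layout and the cutoff are assumptions.
    Returns the list of new file locations.
    """
    cutoff = datetime.now() - timedelta(days=max_age_days)
    moved = []
    for path in sorted(Path(inbox).iterdir()):
        if not path.is_file():
            continue  # leave subfolders alone
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified < cutoff:
            # Group archived files by the month they were last modified.
            target_dir = Path(archive) / modified.strftime("%Y-%m")
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Calling archive_old_files("reports/inbox", "reports/archive") would sweep anything untouched for a month into dated subfolders. The paths here are placeholders; a real version would also need to decide what happens when a file of the same name already exists in the archive.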

Chapter 18 covers PDFs and Word documents. PyPDF2 for reading and splitting PDFs. python-docx for generating Word documents from templates. If your organization produces any kind of standardized document — contracts, proposals, status reports, briefing packs — this chapter shows you how to generate them programmatically from data rather than assembling them by hand.

Chapter 19 is email automation. smtplib, HTML emails with formatting, attachments, Slack webhooks, alert systems that fire when KPIs cross a threshold. The scenario: Sandra Chen wants to be notified automatically when any region's weekly revenue drops more than fifteen percent from the prior week. By the end of this chapter, that notification sends itself.
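Sandra's rule is simple enough to sketch in a few lines. The example below assumes weekly revenue arrives as plain dictionaries keyed by region, and it only builds the message rather than sending it. The function names and the data shape are illustrative assumptions; the chapter itself may structure this differently.

```python
from email.message import EmailMessage


def regions_to_alert(prior, current, threshold=0.15):
    """Return {region: fractional drop} for regions that fell by more than threshold.

    prior and current are assumed to be dicts of region -> weekly revenue.
    """
    alerts = {}
    for region, prior_revenue in prior.items():
        if prior_revenue <= 0 or region not in current:
            continue  # no meaningful comparison possible
        drop = (prior_revenue - current[region]) / prior_revenue
        if drop > threshold:
            alerts[region] = drop
    return alerts


def build_alert_email(alerts, sender, recipient):
    """Build (but do not send) a plain-text alert listing each flagged region."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly revenue alert"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        f"{region}: down {drop:.1%} week over week"
        for region, drop in sorted(alerts.items())
    ]
    msg.set_content("\n".join(lines))
    return msg
```

Sending the result is then a matter of handing the message to smtplib, for example with SMTP.send_message, once you have server details and credentials for your own mail provider.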

Chapter 20 covers web scraping — the ability to pull structured data from websites programmatically. The requests library, BeautifulSoup, navigating HTML, handling rate limits, respecting robots.txt. Practical applications: competitor pricing, job posting intelligence, public financial data, lead generation from directory listings. This chapter also covers the ethics and legality of scraping, because those matter.

Chapter 21 is APIs and external data services. REST APIs, HTTP methods, authentication, working with JSON responses, real business APIs. Once you understand how to talk to an API, an enormous range of data sources becomes directly accessible: weather data for logistics planning, currency rates for international finance, news feeds for market intelligence, your own CRM and ERP systems.

Chapter 22 is scheduling: how to make your scripts run themselves. The schedule library, cron jobs on macOS and Linux, Task Scheduler on Windows, APScheduler for more complex pipelines. The chapter builds a scheduled reporting pipeline that runs every Monday at 6:45 AM without Priya touching it.
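Every scheduler, whichever one you choose, answers the same underlying question: when is the next run? As a rough illustration of that logic, the sketch below computes the next Monday at 6:45 using only the standard library. The function name is hypothetical, and in practice you would let cron, Task Scheduler, or a scheduling library do this for you.

```python
from datetime import datetime, time, timedelta


def next_monday_run(now, run_at=time(6, 45)):
    """Return the next Monday at run_at that is strictly after now."""
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), run_at)
    if candidate <= now:
        # It is already Monday and the run time has passed: wait a week.
        candidate += timedelta(days=7)
    return candidate
```

On a Wednesday this returns the coming Monday; on a Monday before 6:45 it returns that same morning.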

Chapter 23 introduces databases — specifically SQLite and PostgreSQL via Python. Relational database concepts, basic SQL, the sqlite3 module, SQLAlchemy ORM. This is where Acme Corp's operational data starts living somewhere more appropriate than a shared network folder full of Excel files.
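For a first taste of what that looks like, here is a small sketch using the sqlite3 module with an in-memory database. The sales table, its columns, and the function names are made up for illustration and are not Acme Corp's actual schema.

```python
import sqlite3


def build_sales_db(rows):
    """Create an in-memory SQLite database holding (region, week, revenue) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "region TEXT NOT NULL, week TEXT NOT NULL, revenue REAL NOT NULL)"
    )
    # Parameterized inserts keep data values out of the SQL text itself.
    conn.executemany(
        "INSERT INTO sales (region, week, revenue) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def revenue_by_region(conn):
    """Return total revenue per region as a list of (region, total) tuples."""
    cursor = conn.execute(
        "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()
```

The GROUP BY query does the same job as a pandas .groupby() on the region column, except that the data now lives in a database rather than in a file someone has to open.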

Chapter 24 closes Part 3 with cloud services: AWS S3 and Google Cloud Storage for reading and writing files, secrets management, environment variables, a brief introduction to serverless functions. Running Python in the cloud is not as complicated as it sounds, and the payoff — scripts that run reliably without depending on anyone's laptop being open — is substantial.

The Promise

After Part 3, you will have automated things that you used to do manually. Not one thing — multiple things. A report that generates itself. An alert that fires when the data says it should. A document that writes itself from a template. A script that pulls data from an API and stores it in a database, on a schedule, without supervision.

This is the moment when business professionals who learn Python tend to look back at their old workflows and experience something that is equal parts relief and mild frustration that they did not do this sooner. The relief is real. So is the frustration. Both are appropriate.

Updates: Priya and Maya

Priya enters Part 3 having transformed her Monday morning from a two-hour manual operation into a thirty-five-minute analytical exercise. But the script still requires her to run it. By Chapter 22, that changes: the Monday consolidation and report run automatically at 6:45 AM, hit Sandra's inbox before anyone arrives, and Priya's Monday morning is now available for the thing analysts are actually supposed to do — think about what the numbers mean.

Marcus, watching this happen, has stopped saying "another thing to maintain" and started asking Priya how she did it.

Maya enters Part 3 with an automated invoicing system that generates correct Excel invoices for each of her clients. But she still sends them manually, still follows up manually, and still has no visibility into which clients are slow to pay until she goes looking. By Chapter 19, her invoicing system emails invoices automatically upon completion, flags overdue accounts with a Slack message, and sends her a weekly receivables summary without her initiating any of it. Her administrative overhead for invoicing has gone from approximately three hours per billing cycle to approximately fifteen minutes.

A Note on What You Are Building

Part 3 introduces more external systems than any other part of the book — files, email, web, APIs, databases, cloud. Each chapter is somewhat independent; you do not need to have read Chapter 20 on web scraping before reading Chapter 21 on APIs. But the cumulative effect is more important than any individual chapter.

What you are building, over the course of these eight chapters, is the habit of asking a different question about your work. Not "how do I do this?" but "does this need to be done manually at all?" That shift in how you think about your work is, arguably, the most valuable thing Part 3 delivers.

Let's automate something.
