
Chapter 34: Building Your Portfolio: Projects That Get You Hired

"Nobody cares about your resume. They care about your work." — Austin Kleon, Show Your Work!


Chapter Overview

You've spent 33 chapters learning how to think with data. You can wrangle messy datasets, build visualizations that actually communicate, run statistical tests, train machine learning models, evaluate them honestly, and write up your findings for real audiences. That's a serious collection of skills.

But here's the uncomfortable truth: skills you can't demonstrate might as well not exist.

No hiring manager has ever said, "Well, the candidate told me they were great at pandas, and that was good enough for me." They want to see your work. They want to read your thinking. They want to look at a project you did and understand not just what you built, but how you think.

That's what a portfolio is for. Not a list of certificates. Not a collection of course assignments. A portfolio is a curated set of projects that shows a real person asking real questions, wrestling with real data, and producing real insights. It's the bridge between "I took a data science class" and "I am a data scientist."

This chapter is about building that bridge — and it's more practical than anything we've covered so far. We're going to talk about what hiring managers actually look for, what makes a project stand out, how to structure your GitHub profile, how to write a README that makes someone want to keep reading, and how to turn the vaccination rate project you've been building throughout this book into the centerpiece of your professional portfolio.

In this chapter, you will learn to:

  1. Evaluate what makes a data science portfolio project compelling versus generic (all paths)
  2. Transform an analytical notebook into a portfolio-ready narrative with clear structure and polished visuals (all paths)
  3. Write project descriptions that emphasize the question, approach, and findings rather than just the techniques used (all paths)
  4. Build a professional GitHub profile with pinned repositories, README descriptions, and contribution activity (all paths)
  5. Plan two additional portfolio projects that demonstrate range and align with career goals (all paths)

Note — Learning path annotations: Every objective in this chapter is marked (all paths) because portfolio building is essential regardless of your career direction. Whether you're aiming for data analyst, data scientist, or ML engineer roles, your portfolio is how people evaluate you before they meet you.


34.1 Why a Portfolio Matters More Than a Degree

Let's start with a question that might feel a little provocative: if you had to choose between a master's degree in data science and a portfolio of three outstanding projects, which would help you get hired faster?

The answer, according to virtually every hiring manager and data science leader I've encountered, is: it depends on the company, but the portfolio never hurts, and the degree sometimes isn't enough.

Here's why.

The Hiring Manager's Problem

Imagine you're a data science manager at a mid-sized company. You posted a job opening for a junior data scientist, and you received 347 applications. Most of the resumes look similar: Python, pandas, scikit-learn, SQL, Tableau, "experience with machine learning." About 200 of the applicants have a relevant degree. About 150 completed a data science bootcamp. At least 80 have some kind of certification.

You need to get that list down to 10 people to phone screen, and then 4 to bring in for interviews. You have maybe 90 seconds per application before you have to decide: yes, maybe, or no.

What separates "yes" from "maybe" in those 90 seconds?

It's almost never the degree. It's almost never the list of skills. It's whether, somewhere in the application, there's evidence that this person can actually do the work — not just list the tools, but ask a question, get messy data, clean it, analyze it, build something, and explain what they found.

A portfolio link in a resume is like a door. When the hiring manager clicks it and finds a well-organized GitHub profile with three or four projects that have clear READMEs, interesting questions, and clean notebooks, that application moves to the "yes" pile. When they click it and find a default GitHub profile with forked repositories and no original work, or when there's no link at all, it moves to "maybe" at best.

What a Portfolio Proves

A degree proves you can complete a structured program. That's valuable, but it's table stakes. A portfolio proves something harder and more important:

  • You can identify interesting questions. Not just answer homework problems, but notice something in the world that data could illuminate.
  • You can work with real data. Not clean textbook datasets, but the messy, incomplete, frustrating data that actual jobs involve.
  • You can make decisions under uncertainty. Every data science project involves choices — how to handle missing values, which model to try, what to include in the final report. Your portfolio shows your judgment.
  • You can communicate. The biggest complaint hiring managers have about junior data scientists is that they can build models but can't explain them. A well-written notebook or blog post proves you can.
  • You're genuinely curious. A portfolio full of projects on topics you clearly care about tells the hiring manager something a resume never can: this person loves this work.

The Good News

Here's the good news: you've already done most of the hard part. Over the course of this book, you've built a complete data science investigation — the vaccination rate analysis. You've cleaned data, built visualizations, run statistical tests, trained models, and written up findings. That project, polished and presented well, is already a strong portfolio piece.

This chapter is about the polishing and the presenting. It's about taking work you've already done (or can do with skills you already have) and making it visible, professional, and compelling.


34.2 What Makes a Portfolio Project Stand Out

Not all portfolio projects are created equal. Let me be direct about what works and what doesn't, because I've seen a lot of portfolios, and the patterns are clear.

The Generic Portfolio (What Everyone Has)

Here's what 80% of aspiring data scientists have in their portfolios:

  1. Titanic survival prediction. The Kaggle classic. You predicted who survived the Titanic using passenger class, age, and gender. So did 50,000 other people.

  2. Iris flower classification. Three species, four features, 150 samples. You got 97% accuracy. Everyone gets 97% accuracy. The dataset was designed to be easy.

  3. MNIST handwritten digit recognition. You built a neural network that classifies handwritten digits. It works. It's also every deep learning tutorial's first exercise.

  4. Housing price prediction. You used the Boston Housing dataset (or its replacement) to predict home prices with linear regression. The notebook has no written analysis — just code and a final RMSE number.

  5. Sentiment analysis of movie reviews. You downloaded the IMDB dataset and trained a classifier. The notebook ends with an accuracy score and no interpretation.

What do these projects have in common? They're all tutorials. They use pre-packaged datasets, follow well-worn paths, and require almost no original thinking. A hiring manager looking at this portfolio learns that you can follow a tutorial. They don't learn whether you can think.

I'm not saying these datasets are worthless for learning — they're excellent practice. But as portfolio pieces, they need something extra: an original question, a novel angle, or at minimum, a thoughtful written analysis that goes beyond what any tutorial provides.

What Makes a Project Compelling

A standout portfolio project has five qualities. I call them the CRISP criteria:

C — Clear question. The project starts with a specific, interesting question. Not "let me explore this dataset" but "I wanted to know whether airline delays have gotten worse over the past decade, and if so, whether certain routes are affected more than others." The question should be something a curious person would actually want to know the answer to.

R — Real (or realistic) data. The data isn't a pre-cleaned Kaggle dataset. It's something the person found, downloaded, or scraped. Maybe it came from a government open data portal, a public API, or a combination of sources that had to be merged and cleaned. The messiness of the data — and how the person dealt with it — is part of the story.

I — Independent thinking. The project shows decisions the person made and why. "I chose to drop rows with missing income data rather than imputing because the missingness appeared to be related to income level itself (see Figure 3), which would bias any imputation." That sentence tells a hiring manager more about analytical ability than 100 lines of code.

S — Story and structure. The project reads like a narrative, not a code dump. There's an introduction that sets up the question, sections that walk through the analysis step by step, visualizations with actual captions that explain what the reader should notice, and a conclusion that answers the original question (or honestly explains why the data couldn't fully answer it).

P — Polished presentation. The notebook is clean. There are no leftover debugging cells. Variable names are meaningful. Charts have proper labels and titles. The Markdown headers create a logical structure. The whole thing looks like something the person is proud of, not something they threw together at 2 AM.

The CRISP Criteria in Practice

Let's see how these criteria distinguish a generic project from a compelling one:

Generic: "Titanic Survival Prediction" — Loads the dataset, encodes features, trains a random forest, reports accuracy.

Compelling: "What the Titanic Passenger Data Reveals About Class, Gender, and Survival in Maritime Disasters" — Starts with a historical question about whether class-based evacuation protocols systematically disadvantaged certain passengers. Uses the Titanic data but goes beyond prediction: examines survival rates across intersections of class, gender, and age; contextualizes findings with historical accounts; discusses the ethical dimensions of "women and children first" policies and who actually benefited. Ends with a reflection on what the data can and cannot tell us.

Same dataset. Completely different project. The second one shows thinking.

Generic: "Housing Price Prediction" — Runs linear regression, reports RMSE, done.

Compelling: "Does a View of the Mountains Really Add $50,000? Estimating the Price Premium of Geographic Features in Colorado Real Estate" — Scrapes listing data from a real estate API, engineers a "mountain view" feature using geographic coordinates, compares model performance with and without the feature, and writes a blog-style analysis of whether the common real estate wisdom holds up.

Same technique. Completely different level of analytical curiosity.

The Three-Project Portfolio

You don't need twenty projects. You need three to five good ones. Here's a portfolio structure that demonstrates range:

Project 1: The Deep Dive. One project where you went deep — thorough data cleaning, multiple analytical approaches, careful interpretation. This is your showcase piece. Your vaccination rate analysis from this book is a natural fit here.

Project 2: The Domain Project. One project in a domain you genuinely care about. Sports? Music? Education? Climate? Healthcare? Finance? Pick a domain where your enthusiasm shows through. Hiring managers can tell when someone is genuinely interested versus going through the motions.

Project 3: The Technical Demonstration. One project that shows a specific technical skill clearly. Maybe it involves web scraping, or working with a large dataset, or building an interactive dashboard, or deploying a model. This project doesn't need to be as narratively rich — it's showing that you can handle a specific technical challenge.

Optional Project 4: The Quick Analysis. A shorter project — maybe a blog post — where you take a timely question and do a quick, clean analysis. This shows you can work efficiently and communicate concisely, not just produce 50-page notebooks.

Tip

Quality over quantity. Always. Three polished projects beat ten sloppy ones. If a project isn't something you'd be proud to show a hiring manager, don't include it in your portfolio.


34.3 Transforming Your Notebook into a Portfolio Piece

You've been building the vaccination rate analysis throughout this book. It has data cleaning, exploration, visualization, statistical testing, modeling, and interpretation. But right now, it's probably a collection of chapter exercises — useful for learning, but not organized as a coherent story.

Let's fix that.

The Portfolio Notebook Structure

A portfolio-ready Jupyter notebook follows a structure that mirrors a good data science report:

1. Title and Introduction
   - What question are you investigating?
   - Why does it matter?
   - What data are you using? (with source attribution)

2. Data Acquisition and Cleaning
   - Where did the data come from?
   - What problems did you find?
   - What decisions did you make to address them?
   - Summary statistics of the cleaned data

3. Exploratory Analysis
   - Key visualizations with written interpretations
   - Surprising findings or patterns
   - Questions that emerged from exploration

4. Formal Analysis / Modeling
   - What methods did you choose, and why?
   - Results with honest interpretation
   - Model evaluation and comparison (if applicable)

5. Findings and Conclusions
   - What did you learn?
   - What are the limitations?
   - What would you do differently with more time or data?

6. References and Acknowledgments

Step-by-Step: Polishing Your Vaccination Project

Here's exactly how to take your chapter-by-chapter vaccination analysis and transform it into a portfolio piece.

Step 1: Create a new notebook. Don't try to edit your working notebooks. Start a fresh notebook called something like vaccination-rate-disparities-analysis.ipynb. You're going to curate your best work from the chapter notebooks into this new one.

Step 2: Write the introduction first. Before any code, write three to four paragraphs of Markdown explaining:

  • What question you're investigating (e.g., "What factors explain the wide variation in COVID-19 vaccination rates across countries and regions?")
  • Why it matters (public health policy, equity, pandemic preparedness)
  • What data you're using (WHO vaccination data, World Bank economic indicators, etc.)
  • What approach you took (exploratory analysis, statistical testing, predictive modeling)

This introduction is the most important part of your portfolio notebook. It's what a hiring manager reads first. If it's clear, specific, and interesting, they'll keep reading. If it's "In this notebook I explore vaccination data," they'll close the tab.

Step 3: Show your data cleaning thoughtfully. Don't show every cleaning step — that's tedious. Show the interesting ones: the decisions that required judgment. For example:

# 47 countries had missing GDP data. Rather than dropping them entirely
# (which would eliminate most of Sub-Saharan Africa from the analysis),
# I imputed GDP using the most recent available year's data from the
# World Bank, flagging imputed values for sensitivity analysis later.

This kind of annotation shows thinking, not just coding.
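In code, that decision might look something like this minimal sketch, where the column names (gdp_per_capita, country_code), the data values, and the fallback figures are all hypothetical stand-ins for the real World Bank data:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-version of the merged country table.
df = pd.DataFrame({
    "country_code": ["AAA", "BBB", "CCC", "DDD"],
    "gdp_per_capita": [50_000.0, np.nan, 1_200.0, np.nan],
})

# Most recent available GDP from an earlier release (made-up values).
fallback_gdp = {"BBB": 3_400.0, "DDD": 800.0}

# Flag BEFORE imputing, so the sensitivity analysis can later
# re-run the models with these rows excluded.
df["gdp_imputed"] = df["gdp_per_capita"].isna()
df["gdp_per_capita"] = df["gdp_per_capita"].fillna(
    df["country_code"].map(fallback_gdp)
)
```

Keeping the flag column is the part that pays off later: it lets you show, in one extra cell, whether the imputation changed your conclusions.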

Step 4: Curate your visualizations. You probably created dozens of charts across chapters. Pick the five to eight that tell the most compelling story. Each chart should have:

  • A descriptive title (not "Figure 1" but "Vaccination Rates by World Bank Income Group, 2021-2023")
  • Axis labels with units
  • A Markdown cell immediately after it that explains what the chart shows and what the reader should notice

A common mistake is including every visualization you created. Don't. A portfolio notebook is not a lab notebook — it's an edited, curated presentation of your best analytical work.
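Here is one way such a curated chart might be set up in matplotlib. It is only a sketch: the two middle income-group values are invented (the low- and high-income figures echo the README example later in this chapter), and the commented-out save path is illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; unnecessary inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical summary values; middle groups are made up for illustration.
rates = pd.Series({
    "Low income": 14.2,
    "Lower-middle": 38.5,
    "Upper-middle": 61.0,
    "High income": 72.8,
})

fig, ax = plt.subplots(figsize=(8, 5))
rates.plot.bar(ax=ax, rot=0)

# Descriptive title and labeled axes with units -- not "Figure 1".
ax.set_title("Vaccination Rates by World Bank Income Group, 2021-2023")
ax.set_xlabel("Income group")
ax.set_ylabel("Population fully vaccinated (%)")

# fig.savefig("figures/income_group_rates.png", dpi=150, bbox_inches="tight")
```

The Markdown cell that follows the chart then does the interpretive work the title and labels set up.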

Step 5: Present your statistical analysis. When you present hypothesis tests or model results, explain them in plain language alongside the technical output:

# The Kruskal-Wallis test reveals a statistically significant difference
# in vaccination rates across income groups (H = 89.3, p < 0.001).
# Post-hoc Dunn's test shows the primary driver is the gap between
# low-income and all other groups — the difference between upper-middle
# and high-income countries is much smaller.

Step 6: Be honest about limitations. This is where many beginners fail — they present their results as if they're definitive. A portfolio piece that honestly discusses limitations is more impressive, not less:

  • "This analysis uses country-level aggregates, which mask within-country variation that may be substantial."
  • "GDP per capita is a rough proxy for economic development; healthcare spending or health system capacity might be better predictors but were not available for all countries."
  • "The models explain about 62% of the variance in vaccination rates, suggesting important factors not captured in our data."

Step 7: Write a conclusion that answers your question. Return to the question you asked in the introduction and answer it directly. What did you find? What surprised you? What would you investigate next?

Before and After: A Real Comparison

Before (raw chapter exercise):

# load data
df = pd.read_csv('vaccination_data.csv')
df.head()
df.describe()
plt.figure(figsize=(10,6))
plt.bar(df.groupby('region')['rate'].mean().index,
        df.groupby('region')['rate'].mean().values)
plt.show()

After (portfolio version):

# ============================================================
# Data Loading and Initial Inspection
# ============================================================
# Source: WHO COVID-19 Vaccination Dataset
# Downloaded: 2024-03-15 from https://covid19.who.int/data
# Coverage: 194 WHO member states, January 2021 - December 2023
# ============================================================

vaccination_df = pd.read_csv(
    'data/who_vaccination_rates_2021_2023.csv',
    parse_dates=['date_reported'],
    dtype={'country_code': 'category'}
)

print(f"Dataset contains {len(vaccination_df):,} records")
print(f"covering {vaccination_df['country_code'].nunique()} countries")
print(f"from {vaccination_df['date_reported'].min():%B %Y} "
      f"to {vaccination_df['date_reported'].max():%B %Y}")

Followed by a Markdown cell:

The WHO vaccination dataset contains daily cumulative vaccination counts for 194 member states. After aggregating to country-level completion rates (defined as the percentage of the population fully vaccinated as of the most recent reporting date), we can compare vaccination progress across regions and income groups.

The difference isn't just cosmetic — it shows that the author thinks about the reader, which is exactly what data science communication requires.
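The country-level aggregation that Markdown cell describes might be sketched like this, using a tiny made-up table in place of the real WHO file (the column names, including persons_fully_vaccinated, are assumptions):

```python
import pandas as pd

# Hypothetical mini-version of the daily cumulative data.
daily = pd.DataFrame({
    "country_code": ["AAA", "AAA", "BBB", "BBB"],
    "date_reported": pd.to_datetime(
        ["2023-11-01", "2023-12-01", "2023-11-01", "2023-12-01"]
    ),
    "persons_fully_vaccinated": [700, 750, 90, 120],
})
population = pd.Series({"AAA": 1_000, "BBB": 1_000})

# Keep each country's most recent report, then convert the cumulative
# count into a completion rate (% of population fully vaccinated).
latest = (
    daily.sort_values("date_reported")
         .groupby("country_code")
         .last()
)
completion_rate = 100 * latest["persons_fully_vaccinated"] / population
```

Sorting before `groupby(...).last()` is the judgment call worth a comment in the real notebook: it guarantees "last" means "latest report", not "last row in the file".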

Removing the Scaffolding

Your working notebooks from each chapter contain exploration, dead ends, debugging, and experiments. That's what learning looks like, and it's great. But a portfolio notebook shouldn't show your scaffolding.

Remove:

  • Cells where you were figuring out how a function works
  • Duplicate analyses where you tried something, it didn't work, and you tried something else
  • Debugging output (print(df.shape) repeated twelve times)
  • Cells with no output or cells that just say # TODO
  • Any cell that begins with "As per Chapter X..."

Keep:

  • The final, clean version of each analysis step
  • Decisions you made and why (these are gold in a portfolio)
  • Honest discussions of what didn't work and what you learned from it

Think of it this way: Your working notebooks are your kitchen. Your portfolio notebook is the plated dish you bring to the table. Nobody wants to see the pile of dirty pans — they want the finished meal, beautifully presented, with a story about where the ingredients came from.


34.4 Building Your GitHub Profile

Your GitHub profile is your professional storefront. For many hiring managers, it's the first thing they look at after your resume — sometimes before your resume. Let's make it count.

Setting Up Your Profile

If you don't have a GitHub account yet, create one. Choose a username that looks professional — your name or a clean variation of it. janesmith-data is fine. xXx_dataLord_420_xXx is not.

Profile README. GitHub lets you create a special repository with the same name as your username (e.g., janesmith/janesmith) that displays as your profile README. Use this. Keep it brief and professional:

# Hi, I'm Jane Smith

I'm an aspiring data scientist with a background in public health.
I'm passionate about using data to understand health disparities
and inform evidence-based policy.

**What I'm working on:**
- Analyzing global vaccination rate disparities using WHO data
- Building interactive dashboards with Plotly
- Learning SQL and cloud data tools

**Skills:** Python | pandas | scikit-learn | matplotlib | seaborn |
SQL | Jupyter | Git

**Find me:** [LinkedIn](link) | [Blog](link)

That's it. No ASCII art. No animated GIFs. No "visitor counter" badges. Clean, professional, human.

Profile photo. Use a real photo of yourself — or at least something that looks professional. The default gray avatar signals "I set up this account and never came back."

Pinned repositories. GitHub lets you pin up to six repositories to the top of your profile. Use this feature strategically. Pin your best projects — the ones with clear READMEs and polished work.

Repository Structure for Data Science Projects

Every portfolio project should be its own repository with a consistent structure:

vaccination-rate-analysis/
    README.md              # Project overview (this is critical)
    notebooks/
        analysis.ipynb     # The polished analysis notebook
    data/
        raw/               # Original data files (or instructions to download)
        processed/         # Cleaned data files
    src/                   # Helper scripts or functions (if any)
    figures/               # Saved versions of key charts
    requirements.txt       # Python dependencies
    .gitignore             # Exclude large data files, checkpoints

Two important details:

  1. Data files. If the data is small (under 50 MB), include it. If it's large, include a script or clear instructions for downloading it. Never commit massive data files to git — it makes your repository unwieldy and sometimes violates data licenses.

  2. The .gitignore file. At minimum, ignore .ipynb_checkpoints/, __pycache__/, .env, and any large data files. You don't want notebook checkpoints cluttering your repository.
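A minimal .gitignore along those lines might look like the following sketch; the data paths are assumptions and should match your own repository layout:

```gitignore
# Jupyter checkpoints and Python caches
.ipynb_checkpoints/
__pycache__/

# Secrets and local configuration
.env

# Large raw data stays local; see README for download instructions
data/raw/
```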

Writing a README That Gets Read

Your README is the front door of your project. A hiring manager scanning your GitHub will read the README before opening the notebook — and if the README is empty or generic, they might never open the notebook at all.

A great project README has these sections:

Title. Something descriptive and interesting. Not "Project 1" but "Vaccination Rate Disparities: What Explains the Global Divide?"

Overview. Two to three sentences describing what the project investigates and what you found. Lead with the most interesting finding.

Motivation. Why did you do this project? What question were you trying to answer? This is where your genuine curiosity should shine through.

Data Sources. Where did the data come from? How big is it? What time period does it cover? Include links to the original sources.

Key Findings. Three to five bullet points summarizing your most important results. These should be written for a general audience — no jargon. Think of them as the "headline findings" that would go in an executive summary.

Methods. A brief summary of the analytical approach: what tools you used, what techniques you applied, how you evaluated results.

Repository Structure. A quick guide to what's in each folder and file.

How to Reproduce. Instructions for running the analysis: install dependencies, download data, run the notebook.

Limitations and Future Work. What would you do differently? What questions remain? This shows intellectual honesty.

Here's a realistic example:

# Vaccination Rate Disparities: What Explains the Global Divide?

## Overview

An analysis of COVID-19 vaccination rates across 194 countries,
investigating the relationship between economic indicators, healthcare
infrastructure, and vaccination coverage. Key finding: GDP per capita
explains ~45% of vaccination rate variance, but healthcare spending
as a percentage of GDP is a stronger predictor than absolute wealth
when controlling for region.

## Motivation

During the COVID-19 pandemic, vaccination rates varied enormously
across countries — from over 90% in some high-income nations to
under 10% in parts of Sub-Saharan Africa. I wanted to understand
what drove these differences: was it purely an economic issue, or
were other factors at play?

## Key Findings

- Low-income countries had a median vaccination rate of 14.2%,
  compared to 72.8% in high-income countries
- Healthcare spending (% of GDP) was a stronger predictor than
  GDP per capita alone (R² improvement of 0.08)
- Sub-Saharan Africa showed the widest within-region variation,
  suggesting country-level factors beyond economics matter
- A random forest model identified healthcare worker density as
  the single most important feature for prediction

## Data Sources

- WHO COVID-19 Vaccination Data (2021-2023)
- World Bank Development Indicators (2021)
- WHO Global Health Expenditure Database (2020)

Notice what this README does: it tells a story. It has a question, an approach, and findings. A hiring manager reading this for 30 seconds knows exactly what you did, what you found, and how you think.

Contribution Activity and Consistency

GitHub shows your contribution history as a green heatmap. While hiring managers know that commits don't equal productivity, a completely blank activity chart raises questions. Here are some ways to maintain visible activity:

  • Commit regularly as you work on projects, not just once when you upload everything. Small, descriptive commits ("clean missing values in GDP data," "add regional comparison chart") are better than one massive commit.
  • Write meaningful commit messages. "update" and "fix stuff" tell nobody anything. "Add sensitivity analysis for imputed GDP values" tells a story.
  • Don't game the system. Hiring managers can see through hundreds of trivial commits meant to make the heatmap green. Five meaningful commits per week are worth more than fifty empty ones.

34.5 Beyond GitHub: Other Ways to Showcase Your Work

Technical Blog Posts

Writing about data science is one of the most effective ways to build visibility and demonstrate your communication skills. A blog post about a project you did, a technique you learned, or a dataset you explored reaches a wider audience than a GitHub repository.

Where to blog:

  • Medium (and specifically its data science publications like Towards Data Science) has the largest built-in audience for data science content.
  • dev.to is popular among developers and data practitioners.
  • A personal website using GitHub Pages, Jekyll, Hugo, or a simple site builder gives you full control over presentation.

What to write about:

  • Take one of your portfolio projects and write it up as a narrative blog post. Not a tutorial ("how to build a random forest") but an investigation story ("What I learned about global vaccination disparities by building three different models").
  • Write about a concept you struggled to understand. "A Beginner's Guide to Why Cross-Validation Matters (And When It Saved Me)" is exactly the kind of post hiring managers love, because it shows both humility and understanding.
  • Analyze a timely dataset. When interesting data is released — election results, sports statistics, economic indicators — do a quick, clean analysis and write it up while the topic is still relevant.

How to write well:

  • Start with the question, not the code. Your reader should understand what you're investigating in the first paragraph.
  • Use visualizations as anchors for the narrative. A good blog post alternates between text and charts, with each chart accompanied by interpretation.
  • Show your code, but not all of it. Include the important parts — the decisions, the interesting transformations, the model setup. Skip the boilerplate.
  • End with what you learned and what questions remain. The ending is what people remember.

Your Resume

Your resume should be one page (two at most for experienced professionals). For data science positions, it should include:

Skills section. Be specific and honest. List the tools you can actually use — not every tool you've heard of. Group them logically: "Languages: Python, SQL" and "Libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn" and "Tools: Jupyter, Git, GitHub."

Projects section. Include two to three projects with brief descriptions that follow the pattern: "Built [what] using [how] to answer [question], finding [result]." For example: "Analyzed WHO vaccination data for 194 countries using pandas and scikit-learn to identify predictors of vaccination coverage; found that healthcare worker density was a stronger predictor than GDP per capita."

Portfolio link. Make it prominent. If your GitHub profile is strong, link it. If you have a personal website or blog, link that too.

Education and courses. Include your degree(s) and any significant courses or certifications. But don't pad this section — three relevant courses are better than listing every course you've ever taken.

LinkedIn

LinkedIn matters more than many technologists want to admit. Recruiters use it actively, and having a complete, professional profile is worth the effort.

Headline. Don't just write "Student" or "Unemployed." Write something like "Aspiring Data Scientist | Public Health + Analytics | Python, SQL, Machine Learning" — this tells people what you do and what you're interested in, even before they click your profile.

About section. Write a brief narrative (three to four sentences) about your background, what interests you about data science, and what kind of work you're looking for. Be human — this isn't a legal document.

Featured section. LinkedIn lets you pin featured content. Pin your best blog post, a link to your GitHub, or a PDF of a project summary.

Projects and skills. Fill these sections out. They're searchable — recruiters filter by specific skills, and if you don't list "pandas" or "SQL," you won't show up in their searches.


34.6 The Technical Interview: What to Expect

Once your portfolio gets you past the resume screen, you'll face interviews. Data science interviews typically have several components, and knowing what to expect reduces anxiety enormously.

The Phone Screen

Usually 30-45 minutes with a recruiter or hiring manager. They'll ask about your background, your projects, and your interest in the role. This is where your portfolio pays off — when they ask "tell me about a project you've worked on," you have real stories to tell.

How to describe a project in an interview:

Use the STAR-D framework (an adaptation of the classic STAR method, with a Data twist):

  • Situation: What was the context? "I was exploring WHO vaccination data and noticed that GDP alone didn't explain the enormous variation in vaccination rates."
  • Task: What specifically were you trying to do? "I wanted to identify which factors beyond GDP best predicted a country's vaccination coverage."
  • Action: What did you do? "I merged data from three sources, engineered features for healthcare spending and worker density, and compared three different models."
  • Result: What did you find? "Healthcare worker density emerged as the strongest single predictor, explaining more variance than GDP per capita."
  • Decisions: What judgment calls did you make, and why? "I chose to use a random forest for feature importance analysis because the relationships appeared non-linear in my exploratory plots."

That last piece — the D for Decisions — is what separates good answers from great ones. Anyone can describe what they did. Explaining why they made specific choices shows genuine understanding.

The Technical Assessment

Many companies include a coding or analysis component. This takes several forms:

Live coding. You solve a problem on a shared screen while talking through your approach. The interviewer cares as much about your thought process as your code. Talk out loud. If you're stuck, explain what you're thinking. Ask clarifying questions.

Common topics:

  • pandas operations (filtering, grouping, merging)
  • Basic statistics (mean, median, standard deviation, interpreting p-values)
  • SQL queries (SELECT, JOIN, GROUP BY, HAVING)
  • Simple algorithm questions (nothing like LeetCode hard — usually data manipulation)
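To make that concrete, here is a sketch of the kind of pandas prompt that comes up in live coding; the toy table and column names are invented for illustration.

```python
import pandas as pd

# Toy table standing in for whatever the interviewer hands you.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "region": ["East", "East", "West", "West", "East"],
    "rate": [85.0, 90.0, 60.0, 70.0, 95.0],
})

# Filtering: keep rows above a cutoff.
high = df[df["rate"] > 65]

# Grouping and aggregating: average rate per region.
by_region = df.groupby("region")["rate"].mean()
```

Talking through each step out loud ("first I filter, then I group") matters as much as getting the syntax right.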

Take-home assessment. You receive a dataset and a question, and you have 48-72 hours to produce an analysis. This is where everything in this chapter comes together: the take-home is essentially a mini portfolio project, and the same principles apply. Start with the question, clean the data thoughtfully, create clear visualizations, interpret your results, and present everything in a polished notebook.

Take-home assessment. You receive a dataset and a question, and you have 48-72 hours to produce an analysis. This is where everything in this chapter comes together: the take-home is essentially a mini portfolio project, and the same principles apply. Start with the question, clean the data thoughtfully, create clear visualizations, interpret your results, and present everything in a polished notebook.

Tips for take-home assessments:

  • Read the prompt carefully. Answer the question they asked, not the question you wish they'd asked.
  • Don't over-engineer. If a simple linear regression answers the question, you don't need XGBoost. Using simple methods well is more impressive than using complex methods badly.
  • Write narrative. Treat it like a portfolio piece. The person reading it is evaluating your communication as much as your technical skill.
  • Include a brief summary at the top. The reviewer might have 20 take-homes to evaluate. Make their job easy.
  • State your assumptions. If you made a judgment call (like how to handle missing data), say so and explain why. This shows maturity.

Data challenge / case study. Some companies present a business problem and ask you to propose an analytical approach. "We have 5 million user sessions and we're trying to reduce churn. How would you approach this?" They're testing your problem-structuring ability, not your coding.

The Behavioral Interview

Don't neglect this. Many data science candidates prepare heavily for technical questions and ignore behavioral ones. Common behavioral questions:

  • "Tell me about a time you had to work with messy data."
  • "Describe a situation where your analysis led to an unexpected result. What did you do?"
  • "How do you explain technical concepts to non-technical stakeholders?"
  • "Tell me about a project that didn't go as planned."

Your portfolio projects are your answer bank. Every project you've done — including the vaccination analysis — contains stories about overcoming data quality issues, making difficult decisions, and learning from surprises. Practice telling those stories concisely.


34.7 Choosing Your Next Portfolio Projects

You have one strong project — the vaccination analysis. Now you need two or three more. Here's how to choose projects that demonstrate range without overextending.

Finding Project Ideas

Government open data portals. The US, UK, EU, and many other governments publish enormous amounts of data. data.gov, data.gov.uk, and the EU Open Data Portal have thousands of datasets on everything from air quality to school test scores to crime rates.

Kaggle datasets (not competitions). Kaggle's dataset section has interesting, well-documented datasets that haven't been analyzed to death. Look for datasets with fewer than 100 notebooks — these are less likely to be saturated.

APIs. Build a project that pulls data from an API — Twitter/X, Reddit, Spotify, weather services, sports statistics. This demonstrates web data acquisition skills.

Your own life. Some of the most compelling portfolio projects come from personal data. Track your own habits, analyze your music listening history (Spotify provides this), or investigate something in your community using local data.

Current events. When something interesting happens in the world, find the data behind it. Elections, natural disasters, economic shifts, sports championships — these create natural questions that data can address.

Matching Projects to Career Goals

Different career paths value different project types:

Data Analyst roles: Prioritize projects with strong visualization, clear business insights, and SQL. A project that builds a dashboard or produces a slide-deck-ready report is more relevant than a complex ML pipeline.

Data Scientist roles: Show the full lifecycle — question formulation through modeling and communication. Include at least one project with statistical testing and one with machine learning.

ML Engineer roles: Include a project where you deploy a model or build a pipeline. Even a simple Flask app that serves predictions or a streamlined scikit-learn pipeline demonstrates engineering sensibility.

Domain-specific roles: If you want to work in healthcare data science, build a healthcare project. If you want to work in finance, analyze financial data. Domain-specific projects signal serious interest and may give you domain knowledge that other candidates lack.

Project Scope: Not Too Big, Not Too Small

A common mistake is choosing a project that's too ambitious. You start with grand plans, get bogged down in data cleaning, and abandon it half-finished. A half-finished project in your portfolio is worse than no project at all.

Good scope for a portfolio project:

  • Can be completed in 15-25 hours of focused work
  • Uses one to three data sources
  • Requires meaningful cleaning but not months of it
  • Has a clear question that can be answered with the data available
  • Produces three to eight polished visualizations
  • Fits in a single notebook with clear narrative

Too big:

  • "I'm going to build a real-time stock price prediction system with a streaming data pipeline."
  • "I want to analyze every tweet ever posted about climate change."
  • "I'm building a recommendation engine for every movie on Netflix."

Too small:

  • "I loaded a dataset and made a bar chart."
  • "I calculated the mean and standard deviation of this column."

Just right:

  • "I analyzed 10 years of USDA crop yield data to investigate whether organic farming productivity has been catching up to conventional farming."
  • "I scraped 5,000 job postings for data science positions to identify the most in-demand skills by city and company size."
  • "I built a model to predict which NYC subway stations experience the most delays, using MTA performance data and weather records."


34.8 The Personal Brand Question

"Personal brand" sounds like marketing jargon, and honestly, it kind of is. But the underlying idea is practical: when someone in data science hears your name or sees your GitHub profile, what do they think?

You don't need to be famous. You don't need tens of thousands of Twitter followers. But having a consistent, visible presence — even a small one — makes a real difference in a competitive job market.

Here's what a minimal personal brand looks like:

  1. A GitHub profile with three to five polished projects and clear READMEs (you're building this now).
  2. A LinkedIn profile with a clear headline, a brief "About" section, and your projects listed.
  3. One or two blog posts about projects you've done or things you've learned.
  4. Active participation in at least one community — a local meetup, an online forum, a Slack group, a Discord server.

That's it. Four things. None of them require you to be an extrovert or a social media guru. They just require you to make your work visible instead of keeping it on your laptop.

Networking Without Being Weird

Many technically-minded people find networking uncomfortable because they associate it with schmoozing, self-promotion, and insincerity. Good news: the kind of networking that works in data science is none of those things.

Go to meetups. Most cities have data science or Python meetups (check Meetup.com). You don't have to give a talk — just show up, listen, and ask questions. After a few events, you'll recognize people, and they'll recognize you. That's networking.

Participate in online communities. Reddit's r/datascience, the dbt Community Slack, and various Discord servers have active communities where people share projects, ask questions, and help each other. Being helpful — answering questions when you can, sharing interesting resources — builds reputation organically.

Do informational interviews. Reach out to data scientists on LinkedIn and ask for 20 minutes of their time. Most people are willing to chat about their work. Come prepared with specific questions: "How did you transition from academia to industry?" or "What do you wish you'd known in your first year as a data scientist?" These conversations give you insight and expand your network.

Share your work. When you finish a project, post about it. A LinkedIn post saying "Just finished an analysis of global vaccination rate disparities — here's what I found" with a link to your blog post or GitHub repository can reach hundreds of people. You're not bragging; you're sharing something you're proud of. People respect that.


34.9 The Notebook Presentation Layer: Making Your Work Visually Professional

Beyond content and structure, there are presentation details that separate amateur-looking notebooks from professional ones. These may seem superficial, but they make a real difference in how your work is perceived.

Chart Styling

Default matplotlib charts look like defaults — and hiring managers have seen thousands of them. A few small changes make a big difference:

```python
# Set a consistent style at the top of your notebook
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
```

These five lines of configuration ensure every chart in your notebook has a clean grid background, a readable size, and consistent font sizing. They take 10 seconds to add and improve every visualization that follows.

Color palettes. Choose a palette and stick with it throughout the notebook. Consistency signals intentionality. Also consider accessibility — about 8% of men and 0.5% of women have some form of color vision deficiency. Avoid relying solely on red-green distinctions. Seaborn's colorblind palette is designed for this purpose.
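If seaborn is available, adopting its colorblind-safe palette takes one line; a minimal sketch:

```python
import seaborn as sns

# "colorblind" is one of seaborn's built-in qualitative palettes,
# designed to stay distinguishable under common color vision deficiencies.
palette = sns.color_palette("colorblind")

# Set it once; every later chart picks it up by default.
sns.set_palette(palette)
```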

Annotations. The best portfolio visualizations have annotations that draw attention to the key finding. A line chart showing vaccination rates over time is good. The same chart with an annotation pointing to the date when a particular policy was implemented, with a text label explaining the connection, is much better.
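A sketch of that annotation pattern with matplotlib; the coverage numbers, years, and the "campaign" label are all invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # only needed when running without a display
import matplotlib.pyplot as plt

# Invented yearly coverage figures, purely for illustration.
years = list(range(2010, 2021))
coverage = [72, 73, 74, 76, 79, 83, 85, 86, 86, 87, 88]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(years, coverage, marker="o")
ax.set_title("Vaccination coverage over time (illustrative data)")
ax.set_xlabel("Year")
ax.set_ylabel("Coverage (%)")

# The annotation turns a trend line into a finding: it points at the
# moment that matters and says why it matters.
ax.annotate(
    "Hypothetical national campaign launched",
    xy=(2014, 79),              # the data point being highlighted
    xytext=(2010.5, 85),        # where the label text sits
    arrowprops=dict(arrowstyle="->"),
)
```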

Markdown Formatting

Your Markdown cells should be as carefully formatted as your code:

  • Use headers hierarchically. H1 (#) for the title only. H2 (##) for major sections. H3 (###) for subsections. Don't skip levels (e.g., jumping from H2 straight to H4).
  • Use bold for key terms the first time they appear: "The CRISP criteria evaluate portfolio project quality across five dimensions."
  • Use bullet points for lists of three or more items. Avoid long paragraphs in notebooks — they're harder to read on screen than in print.
  • Use blockquotes (>) for findings, key insights, or important notes: "> Key finding: Healthcare worker density is a stronger predictor than GDP per capita."
  • Use horizontal rules (---) to separate major sections visually.

Code Formatting

Clean code signals professionalism:

  • Descriptive variable names. vaccination_rate_by_region is better than df2. income_group_summary is better than temp.
  • Section headers in code. Use comment blocks to mark code sections:

    ```python
    # ============================================================
    # Section 3: Exploratory Analysis
    # ============================================================
    ```
  • Avoid inline magic numbers. Instead of df[df['rate'] > 60], use HIGH_VAX_THRESHOLD = 60 and then df[df['rate'] > HIGH_VAX_THRESHOLD]. Named constants make your code self-documenting.
  • Hide utility code. If you have helper functions (custom color maps, formatting functions), put them in a separate .py file and import them. This keeps the notebook focused on the analysis.
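A named constant in practice might look like this sketch (the threshold value and column names are illustrative, not taken from the real WHO data):

```python
import pandas as pd

# Illustrative cutoff; a real analysis would justify this number.
HIGH_VAX_THRESHOLD = 60  # percent coverage

vaccination = pd.DataFrame({
    "country": ["A", "B", "C"],
    "rate": [85.0, 45.0, 62.0],
})

# Reads as a sentence, unlike df[df['rate'] > 60].
high_coverage = vaccination[vaccination["rate"] > HIGH_VAX_THRESHOLD]
```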

The "Would I Show This to My Boss?" Test

Before publishing any notebook, scroll through it slowly and ask one question for every cell: "If my future boss looked at this, would I be comfortable?" This test catches debugging cells you forgot to remove, charts with missing labels, and Markdown cells with placeholder text like "TODO: interpret this chart."


34.10 Progressive Project Milestone: Polish the Portfolio Piece

Your milestone for this chapter is concrete: take the vaccination rate analysis you've built throughout this book and transform it into a portfolio-ready showcase.

Checklist

Use this checklist to ensure your portfolio piece meets the CRISP criteria:

  • [ ] Clear question. The notebook starts with a specific research question, stated in the first few paragraphs.
  • [ ] Real data. Data sources are cited with links and download dates.
  • [ ] Independent thinking. At least three analytical decisions are documented and justified (e.g., how you handled missing data, why you chose a particular model, how you interpreted an ambiguous result).
  • [ ] Story and structure. The notebook follows the portfolio structure outlined in Section 34.3, with clear Markdown headers, transitions between sections, and a conclusion that answers the original question.
  • [ ] Polished presentation. All charts have titles, axis labels, and interpretive captions. No leftover debugging cells. Variable names are descriptive. Code is commented where the intent isn't obvious.

Additional Steps

  • [ ] Create a GitHub repository for the project with the folder structure described in Section 34.4.
  • [ ] Write a README following the template in Section 34.4.
  • [ ] Write meaningful commit messages as you polish the notebook.
  • [ ] Pin the repository on your GitHub profile.
  • [ ] (Optional) Write a blog post summarizing the project's key findings.
  • [ ] (Optional) Post about the project on LinkedIn with a brief summary and link.

34.11 Common Mistakes and How to Avoid Them

Let me save you from the mistakes I've seen most frequently.

Mistake 1: All Code, No Words

A notebook that's 90% code and 10% Markdown tells the viewer "I can code." A notebook that's 60% Markdown and 40% code tells the viewer "I can think and code and communicate." Guess which one hiring managers prefer?

Fix: After every significant code cell, add a Markdown cell explaining what just happened and what it means. Not what the code does (they can read code) but what the results mean.

Mistake 2: No Original Questions

Using a pre-packaged dataset to answer a pre-packaged question is a tutorial, not a project. "I predicted Titanic survival" is not interesting. "I investigated whether socioeconomic status was a stronger predictor of survival than gender, challenging the 'women and children first' narrative" is.

Fix: For every project, articulate a question that is yours — a question that came from your curiosity, not from a tutorial prompt.

Mistake 3: Abandoned Projects

Nothing looks worse than a GitHub profile full of projects that were clearly started and never finished. A repository with an empty README, a half-completed notebook, and a last commit from 18 months ago signals "I start things but don't finish them."

Fix: Only publish completed projects. If you have works in progress, keep them private until they're ready. Better to have three finished projects than seven abandoned ones.

Mistake 4: Copying Tutorials Verbatim

Hiring managers have seen every popular tutorial. If your portfolio project is a line-for-line reproduction of a Medium article or a Kaggle kernel, it will be recognized. And it will hurt your credibility.

Fix: It's fine to learn from tutorials — that's what they're for. But your portfolio project should extend, adapt, or build on what you learned. Use a different dataset. Ask a different question. Add your own analysis. Make it yours.

Mistake 5: Ignoring Aesthetics

Fair or not, appearance matters. A notebook with default chart styling, no headers, and a wall of code is harder to read and leaves a worse impression than one with custom styling, clear structure, and polished visualizations.

Fix: Invest time in presentation. Set figure sizes explicitly. Choose color palettes thoughtfully (and accessibly — remember color-blind viewers). Use consistent Markdown formatting. It doesn't have to be perfect, but it should look like you cared.

Mistake 6: Overselling Your Results

"My model achieved 94% accuracy, proving that vaccination rates can be predicted from economic indicators alone." No. Your model achieved 94% accuracy on this dataset, which doesn't "prove" anything in a general sense. Overstatement makes you look naive.

Fix: Use qualified language. "The model achieved 94% accuracy on the test set, suggesting that economic indicators are strong predictors of vaccination rates in this dataset. Further validation on additional data would be needed to confirm this relationship."


34.12 The Portfolio as a Living Document

One last thing: your portfolio is not a one-time project. It's a living, evolving collection that grows with you.

Add new projects as you learn new skills. When you learn SQL, add a project that demonstrates SQL skills. When you learn deep learning, add a project that uses neural networks. When you move into a new domain, add a domain-specific project.

Update old projects when you learn better techniques. Your first portfolio project will look rough to you in a year — that's a sign of growth. Update it with cleaner code, better visualizations, and more nuanced analysis. Or archive it and replace it with something better.

Remove weak projects as you build stronger ones. Your portfolio should always represent your current best work, not a historical record of everything you've ever done.

Track what works. If a particular project generates interest in interviews — if hiring managers ask about it, comment on it, or reference it — that's a signal. Make that project more prominent and consider building more projects in that vein.

Think of your portfolio like a garden: plant new things, water what's growing, prune what's dead, and rearrange as the seasons change.


Chapter Summary

Building a portfolio is not an afterthought — it's a core data science skill. The ability to present your work clearly, structure your projects for maximum impact, and make your thinking visible to hiring managers is just as important as your ability to write a pandas pipeline or train a scikit-learn model.

You now know:

  • Why portfolios matter more than credentials alone
  • What makes a project compelling (the CRISP criteria: Clear question, Real data, Independent thinking, Story and structure, Polished presentation)
  • How to transform a working notebook into a portfolio-ready showcase
  • How to build a professional GitHub profile with clear READMEs and logical structure
  • How to write about your work in blog posts and project descriptions
  • What to expect in technical interviews and how to prepare
  • How to choose additional portfolio projects that demonstrate range

The vaccination rate analysis you've been building throughout this book is already a strong portfolio piece — you just need to polish it, present it, and put it out into the world.

In Chapter 35, you'll take that polishing to its logical conclusion: the capstone project, where you combine all of your work into a single, coherent data science investigation.


Looking ahead: Chapter 35 is the capstone — your chance to bring everything together into a complete, end-to-end data science investigation. Think of it as the ultimate portfolio piece: a project that demonstrates every major skill you've learned in this book. It's the most demanding chapter, and also the most rewarding.