Chapter 34 Quiz: Building Your Portfolio

Instructions: This quiz tests your understanding of Chapter 34. Answer all questions before checking the solutions. For multiple choice, select the best answer. For short answer questions, aim for 2-4 clear sentences. Total points: 100.


Section 1: Multiple Choice (10 questions, 4 points each)


Question 1. What does the "C" in the CRISP portfolio criteria stand for?

  • (A) Code quality
  • (B) Clear question
  • (C) Comprehensive analysis
  • (D) Career relevance
**Answer: (B).** CRISP stands for Clear question, Real data, Independent thinking, Story and structure, and Polished presentation. The "C" emphasizes that a compelling portfolio project starts with a specific, interesting question — not just "let me explore this data" but a focused investigation that a curious person would want to know the answer to. Having clean code is important (and falls under "P" for Polished), but the question is what makes a project *interesting*.

Question 2. A hiring manager is reviewing 300+ applications and has about 90 seconds per resume. According to the chapter, what most often separates "yes" from "maybe" applications?

  • (A) A prestigious degree
  • (B) A long list of technical skills
  • (C) Evidence of actual project work (e.g., a portfolio link with polished projects)
  • (D) Multiple certifications from online platforms
**Answer: (C).** Degrees, skill lists, and certifications are table stakes — many candidates have them. What moves an application from "maybe" to "yes" is evidence that the candidate can actually *do the work*: a GitHub profile with polished projects, clear READMEs, and visible analytical thinking. The portfolio is the bridge between "I took classes" and "I can do this job."

Question 3. Which of the following would be the STRONGEST portfolio project?

  • (A) A Titanic survival prediction that achieves 85% accuracy using gradient boosting
  • (B) An MNIST digit classification notebook with a convolutional neural network
  • (C) An investigation of whether airline delays have worsened over the past decade, using DOT data merged with weather records
  • (D) A housing price prediction using the Boston Housing dataset with linear regression
**Answer: (C).** Option C has all five CRISP qualities: a clear question ("Have airline delays worsened?"), real data from government sources requiring merging, independent thinking in the analytical design, an inherent narrative structure (a question with a discoverable answer), and implies polished output. Options A, B, and D use well-worn tutorial datasets and questions, demonstrating that the author can follow instructions but not necessarily think independently.
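Merging public datasets like the ones in option C usually comes down to a keyed join. A tiny pandas sketch, with miniature invented stand-ins for DOT delay records and weather records (every column name and value here is illustrative):

```python
import pandas as pd

# Invented miniature stand-ins for DOT delay records and weather records.
delays = pd.DataFrame({
    "airport": ["ORD", "ORD", "ATL"],
    "date": ["2024-01-01", "2024-01-02", "2024-01-01"],
    "avg_delay_min": [38.0, 12.5, 9.0],
})
weather = pd.DataFrame({
    "airport": ["ORD", "ATL"],
    "date": ["2024-01-01", "2024-01-01"],
    "snowfall_in": [4.2, 0.0],
})

# A left join keeps every delay record even when weather is missing;
# validate="m:1" fails loudly if the weather keys are accidentally duplicated.
merged = delays.merge(weather, on=["airport", "date"], how="left", validate="m:1")
```

The `validate` argument is the kind of small judgment call that distinguishes a careful merge from a silent row-duplication bug.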

Question 4. According to the chapter, how many portfolio projects do you need?

  • (A) At least 10 to show breadth
  • (B) Exactly 1 very deep project
  • (C) 3 to 5 polished projects that demonstrate range
  • (D) As many as possible, even if some are unfinished
**Answer: (C).** Quality over quantity. Three to five well-polished projects showing range (a deep dive, a domain project, and a technical demonstration) are more impressive than ten mediocre ones. Critically, unfinished projects should never appear in your portfolio — they signal an inability to follow through. Better to have three completed projects than seven abandoned ones.

Question 5. When transforming a working notebook into a portfolio piece, which of these should you REMOVE?

  • (A) Markdown cells explaining analytical decisions
  • (B) Debugging cells and leftover `print(df.shape)` statements
  • (C) Written conclusions about limitations
  • (D) Captions explaining what visualizations show
**Answer: (B).** Debugging output, experimental cells that didn't lead anywhere, and leftover scaffolding code should be removed from a portfolio notebook. These are normal parts of the working process but don't belong in the final presentation. Markdown explanations (A), limitations discussions (C), and visualization captions (D) should all be *kept* — they demonstrate communication skills and analytical maturity.

Question 6. What is the recommended structure for a project's GitHub repository?

  • (A) Just the notebook file in the root directory
  • (B) A README, notebooks folder, data folder, requirements.txt, and .gitignore
  • (C) A data folder containing everything — notebooks, data, and documentation
  • (D) Multiple notebooks with no README needed because the code speaks for itself
**Answer: (B).** A well-organized repository has a clear structure: README.md at the root (the front door), a notebooks/ folder for Jupyter notebooks, a data/ folder (with raw/ and processed/ subdirectories), requirements.txt for reproducibility, and a .gitignore to exclude checkpoints and large files. Option A lacks organization; option C mixes concerns; option D ignores the fact that code does not, in fact, speak for itself — written context is essential.
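As a quick illustration, the layout in this answer can be scaffolded with Python's standard library (the project name `airline-delays` is invented for the example):

```python
from pathlib import Path

# Illustrative skeleton matching the structure described above.
root = Path("airline-delays")
for sub in ("notebooks", "data/raw", "data/processed"):
    (root / sub).mkdir(parents=True, exist_ok=True)

(root / "README.md").touch()          # the front door of the project
(root / "requirements.txt").touch()   # pinned dependencies for reproducibility
# Keep notebook checkpoints and bulky raw files out of version control.
(root / ".gitignore").write_text(".ipynb_checkpoints/\ndata/raw/\n")
```

Checking the skeleton into git on day one makes it much easier to keep the working files and the published files separate as the project grows.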

Question 7. In the STAR-D interview framework, what does the "D" stand for?

  • (A) Data — describe the dataset you used
  • (B) Delivery — explain how you delivered the results
  • (C) Decisions — explain judgment calls you made and why
  • (D) Difficulty — describe the hardest part of the project
**Answer: (C).** The "D" stands for Decisions. While STAR (Situation, Task, Action, Result) is a standard interview framework, the chapter adds "D" because in data science, the *judgment calls* you make — which model to use, how to handle missing data, what metrics to optimize — reveal more about your analytical ability than any other part of the answer. Explaining *why* you made specific choices separates thoughtful data scientists from those who just follow procedures.

Question 8. What is the biggest mistake beginners make in take-home assessments?

  • (A) Not using enough complex models
  • (B) Over-engineering the solution when a simpler approach would work
  • (C) Spending too much time on data cleaning
  • (D) Including too many visualizations
**Answer: (B).** The chapter emphasizes that using simple methods well is more impressive than using complex methods badly. If a linear regression answers the question, XGBoost isn't necessary. Take-home assessments evaluate communication, analytical reasoning, and judgment — not just technical complexity. A clean, well-communicated analysis with appropriate methods beats a messy, over-complicated one every time. Answering the question clearly is always the priority.

Question 9. Which approach to GitHub contribution activity does the chapter recommend?

  • (A) Make hundreds of small commits daily to keep the activity heatmap green
  • (B) Commit only when a project is completely finished, in one large upload
  • (C) Make regular, meaningful commits with descriptive messages as you work
  • (D) Fork popular repositories to show activity even without original work
**Answer: (C).** Regular, meaningful commits with descriptive messages show genuine work in progress. Option A games the system and hiring managers can see through it. Option B misses the benefit of showing your working process. Option D shows no original work. Descriptive commit messages like "add sensitivity analysis for imputed GDP values" tell a story that reviewers appreciate.

Question 10. According to the chapter, what should a project README lead with?

  • (A) Installation instructions
  • (B) A list of technologies used
  • (C) An overview of the project with the most interesting finding
  • (D) The author's biography
**Answer: (C).** The README should lead with the overview — what the project investigates and what was found. The most interesting finding should be front and center because it's what makes someone want to keep reading. Technical details (installation, technologies) come later. The README is the front door of your project; lead with what makes the project compelling, not with logistics.
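One way to picture that ordering is a skeleton README; the title and section names below are invented for illustration, not prescribed by the chapter:

```python
from pathlib import Path

# A README skeleton that leads with the finding; logistics come last.
readme = """\
# Have Airline Delays Worsened? A Decade of DOT Data

**Key finding:** state the single most interesting result in one sentence.

## Overview
What the project investigates and why a curious reader should care.

## Data
Sources, time span, and how the files were merged.

## Installation
Setup and reproduction steps belong at the bottom, not the top.
"""
Path("README.md").write_text(readme)
```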

Section 2: True/False (4 questions, 5 points each)


Question 11. TRUE or FALSE: Using a well-known dataset like Titanic or Iris automatically makes a portfolio project weak.

**Answer: FALSE.** Using a well-known dataset doesn't automatically make a project weak — what matters is what you *do* with it. A Titanic project that asks a novel question, provides original analysis, and includes thoughtful written interpretation can be a strong portfolio piece. The problem is when people replicate a tutorial without adding original thinking. The chapter's point is that the *question and analysis* matter more than the *dataset*.

Question 12. TRUE or FALSE: A portfolio notebook should include every visualization and analysis you performed, to show thoroughness.

**Answer: FALSE.** A portfolio notebook is a curated presentation, not a lab notebook. You should include only the five to eight visualizations that tell the most compelling story. Including every chart, debugging output, and dead-end analysis makes the notebook tedious and unfocused. Think of the portfolio notebook as a plated dish, not the pile of dirty pans in the kitchen.

Question 13. TRUE or FALSE: Discussing the limitations of your analysis in a portfolio piece makes you look less competent.

**Answer: FALSE.** Honestly discussing limitations makes you look *more* competent, not less. It shows intellectual honesty, self-awareness, and a mature understanding of what data can and cannot tell you. Hiring managers are more concerned by candidates who present results as definitive than by candidates who clearly articulate what they don't know. Overstatement is a red flag; honest limitation is a strength.

Question 14. TRUE or FALSE: A personal website with three polished project write-ups is more impressive than a GitHub profile with twenty forked repositories and no original work.

**Answer: TRUE.** Quality always beats quantity in portfolio building. Three polished projects with clear questions, real data, thoughtful analysis, and written conclusions demonstrate far more about a candidate's abilities than twenty forked repositories (which show nothing beyond the ability to click the "fork" button). Hiring managers look for evidence of original thinking and communication skill, both of which require completed, polished work.

Section 3: Short Answer (4 questions, 5 points each)


Question 15. Explain the difference between a "working notebook" and a "portfolio notebook." What stays, what goes, and why?

**Answer:** A working notebook is your exploration space — it contains experiments, debugging output, dead ends, and raw analysis as you figure things out. A portfolio notebook is a curated, edited presentation of your best work. You remove: debugging cells, duplicate analyses, leftover `print()` statements, cells with no output, and references to textbook exercises. You keep: the final clean version of each analysis step, documented analytical decisions with reasoning, polished visualizations with captions, and honest discussions of what worked and what didn't. The analogy is a kitchen (working) versus a plated dish (portfolio) — the reader wants the finished meal, not the dirty pans.
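The pruning step can even be automated. A minimal sketch using the `nbformat` library; the `"debug"` cell tag is an invented convention for this example, not something the chapter prescribes:

```python
import nbformat

def clean_notebook(path_in: str, path_out: str) -> None:
    """Drop empty code cells and cells tagged 'debug' before publishing."""
    nb = nbformat.read(path_in, as_version=4)

    def keep(cell):
        if cell.cell_type == "code" and not cell.source.strip():
            return False  # code cells with nothing left in them
        if "debug" in cell.get("metadata", {}).get("tags", []):
            return False  # scaffolding the author flagged while working
        return True

    nb.cells = [c for c in nb.cells if keep(c)]
    nbformat.write(nb, path_out)
```

A script like this only handles the mechanical cleanup; curating which analyses and charts survive is still an editorial decision.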

Question 16. A friend shows you their portfolio project. The notebook contains 150 lines of excellent code and produces great charts, but has only two Markdown cells — a title and "End." What advice would you give them?

**Answer:** The project demonstrates coding ability but fails on communication — arguably the most important portfolio criterion. I would advise them to add narrative Markdown throughout: an introduction stating the question and why it matters, interpretation after each major analysis step explaining what the results mean, captions for every visualization pointing out what the reader should notice, and a conclusion answering the original question. A notebook that's 60% Markdown and 40% code is more impressive than one that's 90% code, because it shows the person can *think and communicate*, not just code.

Question 17. Why does the chapter recommend the three-project portfolio structure (Deep Dive, Domain Project, Technical Demo) rather than just picking any three projects?

**Answer:** The three-project structure demonstrates *range*, which is crucial for a junior data science candidate. The Deep Dive shows depth — the ability to conduct thorough, multi-faceted analysis. The Domain Project shows genuine intellectual curiosity and the ability to apply data science to a field the candidate cares about. The Technical Demo shows specific technical skill (web scraping, deployment, API integration, etc.). Together, they prove the candidate can go deep, follow their interests, and handle technical challenges — a much more complete picture than three projects that all use the same techniques on similar datasets.

Question 18. In the context of take-home assessments, the chapter says "a clean, well-communicated simple analysis beats a messy, uncommunicated complex one." Explain why this is true from the hiring manager's perspective.

**Answer:** Hiring managers are evaluating judgment, communication, and analytical thinking — not just technical ability. A simple analysis that clearly answers the question, with well-labeled charts and written interpretation, shows the candidate can prioritize, communicate effectively, and deliver results. A complex analysis that's hard to follow, poorly communicated, and uses unnecessarily advanced techniques suggests the candidate prioritizes showing off over solving problems. In real data science work, stakeholders need clear answers, not impressive algorithms — and the take-home is a preview of how the candidate would perform on the job.

Section 4: Applied Scenarios (2 questions, 10 points each)


Question 19. You're preparing for a phone screen interview. The recruiter says "Tell me about a project you've worked on." Using the STAR-D framework, outline how you would describe your vaccination rate analysis in under two minutes.

**Answer:**

**Situation:** "During the COVID-19 pandemic, vaccination rates varied enormously across countries — from over 90% in some nations to under 10% in others. I wanted to understand what drove those differences."

**Task:** "I set out to identify which national-level indicators best predicted a country's vaccination coverage, using data for 194 WHO member states."

**Action:** "I merged data from three sources — WHO vaccination records, World Bank development indicators, and health expenditure data. I cleaned about 4,500 records, handling missing values for 47 countries by imputing nearest-year data. I conducted exploratory analysis, statistical testing across income groups, and trained three predictive models — linear regression, logistic regression, and random forest."

**Result:** "The random forest model achieved an R-squared of 0.78 and identified healthcare worker density as the single strongest predictor of vaccination rates — stronger than GDP per capita, which surprised me."

**Decisions:** "I chose the random forest partly because my exploratory analysis showed non-linear relationships, but also because its feature importance output was more interpretable for communicating results to a non-technical audience. I also chose to impute rather than drop missing GDP data because dropping would have eliminated most of Sub-Saharan Africa from the analysis."

Question 20. You receive a take-home assessment with this prompt: "Here is a dataset of 50,000 customer records for an e-commerce company. Analyze the data and provide recommendations for reducing customer churn." You have 48 hours. Outline your plan, identifying the three most important things to get right.

**Answer:**

**Plan:**
1. Spend the first hour on EDA — understand the data shape, distributions, and what "churn" means in this dataset (is it defined for you, or do you need to define it?).
2. Clean the data, documenting decisions about missing values and outliers.
3. Create four to five key visualizations: churn rate over time, churn by customer segment, feature distributions by churn status.
4. Build a simple model (logistic regression first, then maybe a random forest) to identify the strongest predictors of churn.
5. Write an executive summary with three to five actionable recommendations.

**Three most important things to get right:**
1. **Answer the question they asked** — they want *recommendations for reducing churn*, not just a model. End with actionable business insights, not just accuracy metrics.
2. **Communicate clearly** — write narrative Markdown throughout; include a summary at the top so the reviewer can get the gist in 60 seconds.
3. **Show judgment, not just technique** — state assumptions, explain analytical choices, be honest about limitations. A well-reasoned simple analysis beats an unexplained complex one.
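To ground the modeling step in that plan, here is a minimal logistic-regression baseline of the kind described, using scikit-learn on synthetic data (every column name, coefficient, and the churn-generating rule below are invented for the sketch):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Fabricate a small customer table with an invented churn mechanism:
# shorter tenure and more support tickets raise the odds of churning.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, n),
    "orders_last_90d": rng.poisson(3, n),
    "support_tickets": rng.poisson(1, n),
})
logit = (-1.5 - 0.04 * df["tenure_months"]
         + 0.6 * df["support_tickets"] - 0.2 * df["orders_last_90d"])
df["churned"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="churned"), df["churned"], test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Coefficients point to the strongest churn drivers for the write-up.
drivers = pd.Series(model.coef_[0], index=X_train.columns).sort_values()
```

The point of a baseline like this in a take-home is not the score; it is that the coefficient table feeds directly into the recommendation section ("customers with more support tickets churn more, so...").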