Chapter 24 Quiz: Audience Analytics with Python
Instructions: Choose the best answer for each question. Some questions require reading and interpreting short code snippets. Answer key is at the bottom.
Question 1
A creator has two years of weekly follower data as a CSV file with columns week_date and subscriber_count. They want to calculate week-over-week growth rate for each row. Which pandas code correctly accomplishes this?
A)
df['growth_rate'] = df['subscriber_count'].sum() / df['subscriber_count']
B)
df['growth_rate'] = df['subscriber_count'].pct_change() * 100
C)
df['growth_rate'] = df['subscriber_count'].diff().mean()
D)
df['growth_rate'] = df['subscriber_count'].rolling(window=2).sum()
Question 2
In growth_analysis.py, the script detects inflection points using the following logic: any week where the growth rate exceeds mean_growth + 1.5 * std_growth. What would happen to the number of detected inflection points if you changed 1.5 to 3.0?
A) More inflection points would be detected because the threshold is higher
B) Fewer inflection points would be detected because the threshold is higher and fewer weeks would exceed it
C) The number would stay the same because standard deviation already accounts for outliers
D) All weeks would be classified as inflection points
Question 3
Before running K-means clustering on audience behavioral data, audience_segmentation.py applies StandardScaler to the features. What is the primary reason for this normalization step?
A) To speed up the clustering algorithm by reducing data size
B) To prevent features with larger numeric ranges from dominating the distance calculations simply due to scale, not because they're more important
C) To convert categorical features into numerical values for the algorithm
D) To remove outlier data points before clustering begins
Question 4
After running audience_segmentation.py, the three clusters have these mean purchases_made values:
- Cluster 0: 0.02
- Cluster 1: 0.87
- Cluster 2: 3.41
The script's automatic label assignment should label these clusters as:
A) Cluster 0 = Superfan, Cluster 1 = Engager, Cluster 2 = Lurker
B) Cluster 0 = Lurker, Cluster 1 = Engager, Cluster 2 = Superfan
C) Cluster 0 = Engager, Cluster 1 = Lurker, Cluster 2 = Superfan
D) Cluster 0 = Superfan, Cluster 1 = Lurker, Cluster 2 = Engager
Question 5
A creator uses UTM parameters in their links. A subscriber clicks a link with utm_campaign=tutorial_march_credit_cards and purchases a course. Later, the creator's analytics show that content_id = tutorial_march_credit_cards has $4,200 in attributed revenue and 38,000 views. What is the revenue per 1,000 views for this content piece?
A) $0.11 per 1,000 views
B) $11.05 per 1,000 views
C) $110.53 per 1,000 views
D) $4,200.00 per 1,000 views
Question 6
In revenue_attribution.py, the script uses a "first-touch attribution" model. In which of the following scenarios would first-touch attribution MOST significantly misattribute revenue?
A) A buyer who clicked a YouTube link and immediately purchased
B) A buyer who first discovered the creator through a TikTok video, joined the email list two months later, and then purchased after receiving the 5th email in a welcome sequence
C) A buyer who found the creator through a Google search and purchased directly from the landing page
D) A buyer who purchased after clicking a link in a podcast show notes page
Question 7
A creator loads their YouTube subscriber CSV into pandas and finds this situation:
print(df.dtypes)
# date object
# subscriber_count object
# ...
print(df['subscriber_count'].head(3))
# 0 "1,204"
# 1 "1,287"
# 2 "1,345"
Which code correctly converts the subscriber_count column to integer type?
A)
df['subscriber_count'] = df['subscriber_count'].astype(int)
B)
df['subscriber_count'] = df['subscriber_count'].str.replace(',', '').astype(int)
C)
df['subscriber_count'] = int(df['subscriber_count'])
D)
df['subscriber_count'] = df['subscriber_count'].to_numeric()
Question 8
A creator's growth_analysis.py output shows that their 4-week moving average line is below their 12-week moving average line for the past 6 weeks. What does this indicate about their growth trajectory?
A) Their recent growth is performing above their long-term average
B) Their recent growth has slowed below their long-term average, a deceleration signal
C) Their channel has been penalized by the platform algorithm
D) Their follower count is decreasing (they are losing followers)
Question 9
Marcus Webb implemented UTM tracking on all his YouTube video descriptions. After 90 days, his attribution data shows:
- YouTube Tutorial videos: $3,840 attributed revenue, 290,000 total views
- YouTube "Real Talk" videos: $1,120 attributed revenue, 310,000 total views
Which content type has higher revenue per 1,000 views, and what does this suggest about Marcus's content strategy?
A) "Real Talk" videos have higher revenue efficiency and Marcus should focus on those
B) Tutorial videos have higher revenue efficiency ($13.24/1K views vs $3.61/1K views) and Marcus should create more tutorials relative to "Real Talk" content
C) Both types have approximately equal efficiency and Marcus should maintain his current balance
D) YouTube tutorials have higher efficiency, but this data is insufficient to draw strategic conclusions
Question 10
The chapter's equity callout states that learning Python well enough to use the scripts takes 15–25 hours for someone with no programming experience, and that this creates a barrier to data-driven creator analytics. Which of the following is the MOST accurate characterization of this barrier?
A) The barrier is primarily about cost, since programming tools and courses are expensive
B) The barrier is primarily about cognitive difficulty: Python is too complex for non-technical creators
C) The barrier is primarily about time, which is not equally available across all creators, and it represents a real structural inequity in who can access advanced creator analytics
D) The barrier doesn't exist in practice because AI tools can write all necessary code automatically
Answer Key
| Question | Answer | Explanation |
|---|---|---|
| 1 | B | pct_change() returns the fractional change between each element and the prior element, which is exactly the week-over-week growth rate. Multiplying by 100 converts the decimal to a percentage. The first row has no prior week, so its value is NaN. Options A, C, and D use aggregation functions (sum, mean) or a rolling sum, none of which calculate a per-row percentage change. |
| 2 | B | A higher threshold multiplier (3.0 vs 1.5) means a week's growth must be further above the mean to qualify as an inflection point. This is a stricter criterion that fewer weeks will meet, resulting in fewer detected inflection points. |
| 3 | B | StandardScaler prevents scale dominance. K-means groups points by Euclidean distance, so without normalization a feature like posts_viewed (ranging 0–50) would have much less influence on distance calculations than views or likes ranging into the hundreds, purely because of magnitude differences. Standardizing rescales every feature to zero mean and unit variance, so each feature contributes on a comparable scale. |
| 4 | B | The automatic labeling assigns Lurker to the cluster with the lowest mean purchases_made (0.02), Engager to the middle cluster (0.87), and Superfan to the highest (3.41). This ordering is determined by the engagement score ranking in assign_segment_labels(). |
| 5 | C | Revenue per 1,000 views = (attributed revenue / views) × 1,000 = ($4,200 / 38,000) × 1,000 ≈ 0.1105 × 1,000 ≈ $110.53 per 1,000 views. |
| 6 | B | First-touch attribution credits the first tracked touchpoint. For a buyer with a multi-step journey spanning months (TikTok → email list → welcome sequence purchase), the credit would go to TikTok, completely ignoring the email sequence that may have been the actual conversion mechanism. This significantly misattributes the email's contribution. |
| 7 | B | The subscriber_count column contains strings with comma separators (e.g., "1,204"). Direct .astype(int) fails because the comma is not a valid integer character. The correct approach is to first remove commas with .str.replace(',', '') and then convert to int. |
| 8 | B | When the short-term (4-week) moving average of growth falls below the long-term (12-week) moving average, recent growth is weaker than the longer-run trend: a deceleration signal. This does not mean followers are being lost; the count can still be rising, just more slowly than the historical trend. |
| 9 | B | Tutorial RPV = $3,840 / 290,000 × 1,000 = $13.24/1K views. "Real Talk" RPV = $1,120 / 310,000 × 1,000 = $3.61/1K views. Tutorials are 3.7× more revenue-efficient. This suggests Marcus should create more tutorial content, as it converts viewers to buyers at a much higher rate despite similar view counts. (Note: option D is not the best answer because 90 days is a statistically meaningful timeframe for this kind of strategic decision.) |
| 10 | C | The chapter explicitly identifies time as the primary barrier — not cost (free resources exist) and not cognitive difficulty (Python is learnable). The barrier is unequal access to discretionary time, which reflects broader socioeconomic inequities. Option D is inaccurate because while AI coding tools can help, understanding and validating the outputs still requires conceptual knowledge. |
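The growth-analysis answers (Questions 1, 2, 7, and 8) can be checked end to end on a small toy dataset. This is a minimal sketch, not the chapter's growth_analysis.py: the weekly counts are made up, the helper names (clean_counts, add_growth_rate, inflection_weeks, is_decelerating) are hypothetical, and it assumes the Question 8 moving averages are computed on the weekly growth rate rather than on the cumulative count.

```python
import pandas as pd

# Made-up weekly data; counts arrive as comma-formatted strings (Question 7).
raw = pd.DataFrame({
    "week_date": pd.date_range("2024-01-07", periods=8, freq="W"),
    "subscriber_count": ["1,000", "1,050", "1,100", "1,400",
                         "1,450", "1,500", "1,550", "1,600"],
})


def clean_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Strip thousands separators, then cast to int (Question 7, option B)."""
    out = df.copy()
    out["subscriber_count"] = (
        out["subscriber_count"].str.replace(",", "", regex=False).astype(int)
    )
    return out


def add_growth_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Week-over-week growth in percent (Question 1, option B); row 0 is NaN."""
    out = df.copy()
    out["growth_rate"] = out["subscriber_count"].pct_change() * 100
    return out


def inflection_weeks(df: pd.DataFrame, k: float) -> pd.Series:
    """Weeks whose growth exceeds mean + k * std (Question 2).

    pandas skips NaN in mean()/std(), and NaN > threshold is False,
    so the first row never qualifies.
    """
    g = df["growth_rate"]
    threshold = g.mean() + k * g.std()
    return df.loc[g > threshold, "week_date"]


def is_decelerating(df: pd.DataFrame, short: int = 4, long: int = 12) -> bool:
    """True if the latest short-window MA of growth is below the long-window MA."""
    g = df["growth_rate"]
    return bool(g.rolling(short).mean().iloc[-1] < g.rolling(long).mean().iloc[-1])


weekly = add_growth_rate(clean_counts(raw))
print(len(inflection_weeks(weekly, 1.5)))  # 1 (the jump to 1,400)
print(len(inflection_weeks(weekly, 3.0)))  # 0
# Windows shrunk to 2 and 4 because the toy data covers only 8 weeks.
print(is_decelerating(weekly, short=2, long=4))  # True
```

Raising the multiplier from 1.5 to 3.0 removes the one detected week, which is the behavior Question 2 asks about. The growth after the jump keeps shrinking, so the short moving average ends below the long one even though the subscriber count rises every week.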
Note on Question 5: option B ($11.05) is what you get by dividing by 380,000 views instead of 38,000, and option A ($0.11) is revenue per single view rather than per 1,000 views.
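The arithmetic behind Questions 5 and 9 fits in one small helper. This is a minimal sketch; revenue_per_1k_views is a hypothetical name, not a function from revenue_attribution.py.

```python
def revenue_per_1k_views(revenue: float, views: int) -> float:
    """Attributed revenue per 1,000 views, rounded to cents."""
    if views <= 0:
        raise ValueError("views must be positive")
    return round(revenue / views * 1000, 2)


print(revenue_per_1k_views(4200, 38_000))  # 110.53 (Question 5)

tutorial = revenue_per_1k_views(3840, 290_000)
real_talk = revenue_per_1k_views(1120, 310_000)
print(tutorial, real_talk)              # 13.24 3.61 (Question 9)
print(round(tutorial / real_talk, 1))   # 3.7
```

Dividing before multiplying by 1,000 keeps the formula readable; rounding happens only once, at the end, so intermediate precision is not lost.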
Scoring:
- 9–10 correct: Strong Python analytics literacy; ready to build and modify these tools for your own data
- 7–8 correct: Good foundation; review the code sections for any questions you missed
- 5–6 correct: Developing understanding; re-read sections 24.3–24.5 with the code open in a text editor
- Below 5: Consider working through the exercises in Chapter 24 before retaking the quiz
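For readers revisiting Questions 3 and 4, the following sketch shows scale dominance on made-up data and a purchases-based labeling rule. It hand-rolls standardization with pandas (population standard deviation, as scikit-learn's StandardScaler uses) instead of importing scikit-learn, and label_clusters is a hypothetical helper that ranks clusters by mean purchases_made; the chapter's assign_segment_labels() may rank by a broader engagement score.

```python
import pandas as pd

# Made-up behavioral features on very different scales.
fans = pd.DataFrame({
    "minutes_watched": [120.0, 340.0, 15.0, 900.0, 60.0, 480.0],
    "purchases_made": [0.0, 1.0, 0.0, 4.0, 0.0, 3.0],
})


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Zero mean and unit variance per column (ddof=0, like StandardScaler)."""
    return (df - df.mean()) / df.std(ddof=0)


def minutes_share_of_distance(df: pd.DataFrame, i: int, j: int) -> float:
    """Fraction of the squared Euclidean distance between rows i and j
    contributed by minutes_watched."""
    diff_sq = (df.iloc[i] - df.iloc[j]) ** 2
    return float(diff_sq["minutes_watched"] / diff_sq.sum())


def label_clusters(cluster_means: pd.Series) -> dict:
    """Hypothetical rule: lowest mean purchases -> Lurker, highest -> Superfan."""
    names = ["Lurker", "Engager", "Superfan"]
    ordered = cluster_means.sort_values().index
    return {int(cluster): name for cluster, name in zip(ordered, names)}


print(round(minutes_share_of_distance(fans, 0, 1), 4))  # 1.0: minutes dominate
print(minutes_share_of_distance(standardize(fans), 0, 1))  # about 0.57 after scaling
print(label_clusters(pd.Series({0: 0.02, 1: 0.87, 2: 3.41})))
# {0: 'Lurker', 1: 'Engager', 2: 'Superfan'}
```

On the raw data, a 220-minute gap swamps a one-purchase gap; after standardization both features carry comparable weight, which is the point of Question 3.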