Appendix F: Python Analytics Toolkit
Chapters 24 through 26 introduced a range of quantitative methods for understanding your audience, forecasting revenue, and running rigorous experiments. This appendix puts all of those methods in your hands as working, tested Python code. You will find three modules here, each self-contained and usable independently, but designed to work together as a coherent toolkit.
The goal is not to teach Python from scratch — this appendix assumes you have worked through the concepts in Part VI and have at least a passing familiarity with running Python scripts. The goal is to give you production-quality, heavily commented code that you can adapt to your own data within an afternoon.
Prerequisites and Learning Path
Before working through this appendix, you should have read:
- Chapter 24 (Audience Analytics) — growth rate calculations, inflection point detection, and audience segmentation
- Chapter 25 (Revenue Modeling) — income volatility metrics, multi-stream forecasting, and the Monte Carlo method
- Chapter 26 (Experimentation) — A/B testing, chi-square significance, and sample size planning
You should also be comfortable with:
- Running Python scripts from a terminal or command prompt
- Installing packages with pip
- Opening CSV files in a spreadsheet application (to understand the data structure before loading it)
- Basic pandas operations: reading a file, filtering rows, selecting columns
You do not need prior experience with scikit-learn, scipy, or matplotlib. All relevant functions are wrapped in clear, documented interfaces that abstract away the statistical machinery.
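If you want a quick refresher on those pandas basics, they look like this — the column names and values here are placeholders, standing in for a real export you would normally load with pd.read_csv:

```python
import pandas as pd

# Build a small DataFrame in place of a real CSV export
# (in practice: df = pd.read_csv("data/my_export.csv"))
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "views": [1200, 950, 1430],
    "likes": [80, 61, 102],
})

# Filter rows: keep only days with more than 1,000 views
busy_days = df[df["views"] > 1000]

# Select columns: just date and views
subset = df[["date", "views"]]

print(busy_days)
print(subset.dtypes)
```

If these three operations — reading, filtering, selecting — feel comfortable, you have all the pandas you need for this appendix.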
Installation and Setup
1. Verify Python Version
This toolkit requires Python 3.9 or higher. Check your version:
python --version
# Expected: Python 3.9.x, 3.10.x, 3.11.x, or 3.12.x
If you see Python 2.x or a version below 3.9, install a newer version from python.org. On Windows, make sure to check "Add Python to PATH" during installation.
2. Create a Virtual Environment (Recommended)
Isolating this toolkit's dependencies prevents conflicts with other Python projects:
# Navigate to your project directory
cd my-creator-analytics
# Create a virtual environment
python -m venv venv
# Activate it
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
Your terminal prompt should now show (venv) at the beginning.
3. Install Required Packages
pip install pandas numpy matplotlib scipy scikit-learn
This installs:
| Package | Version (minimum) | Purpose |
|---|---|---|
| pandas | 1.5+ | Data loading, cleaning, time-series operations |
| numpy | 1.23+ | Array math, random number generation |
| matplotlib | 3.6+ | Chart generation, dashboard output |
| scipy | 1.9+ | Chi-square tests, normal distribution functions |
| scikit-learn | 1.1+ | K-means clustering, StandardScaler |
To save the exact versions you install (for reproducibility):
pip freeze > requirements.txt
4. Download the Toolkit Files
Copy the three module files into your project directory:
my-creator-analytics/
analytics_toolkit.py
platform_data_fetcher.py
audience_dashboard.py
requirements.txt
venv/
data/ <-- your CSV exports go here
5. Verify Installation
Run this quick test to confirm everything is working:
# test_install.py
from analytics_toolkit import calculate_growth_rate
from platform_data_fetcher import generate_sample_youtube
import pandas as pd
df = generate_sample_youtube(n_days=30)
growth = calculate_growth_rate(df, metric_col="views", period="weekly")
print("Installation successful!")
print(growth.tail(3))
If you see a table of weekly view totals and growth rates, you are ready to proceed.
Module Overview
The toolkit consists of three files with distinct responsibilities:
analytics_toolkit.py
The core analytics library. Contains all the functions introduced in Chapters 24–26, grouped into three logical sections:
Growth Analytics (Chapter 24):
- load_platform_csv — load and standardize platform CSV exports
- calculate_growth_rate — period-over-period growth with configurable aggregation
- find_inflection_points — statistical detection of growth spikes
- segment_audience — K-means clustering of audience behavior
Revenue Forecasting (Chapter 25):
- monte_carlo_revenue — multi-stream Monte Carlo simulation
- calculate_income_volatility — coefficient of variation and volatility rating
Experimentation (Chapter 26):
- run_ab_test — chi-square test with uplift calculation and recommendation
- calculate_required_sample_size — power analysis for test planning
Visualization:
- plot_growth_chart — time-series line chart with inflection markers
- plot_revenue_forecast — histogram + monthly fan chart
- plot_ab_test_results — side-by-side bar chart with significance summary
platform_data_fetcher.py
Handles the messy work of loading data from different platform export formats. Each platform uses different column names, date formats, and CSV structures. This module normalizes everything into a consistent schema. Includes:
- load_youtube_export — YouTube Studio channel and content reports
- load_tiktok_export — TikTok Creator Center analytics
- load_instagram_export — Instagram Professional Dashboard insights
- load_email_export — ConvertKit and Mailchimp broadcast reports
- load_auto — auto-detection of platform type
- validate_dataframe — data quality checking
- Sample data generators for each platform (for testing and demos)
audience_dashboard.py
A complete dashboard generator that assembles data from all other modules into a single multi-panel matplotlib figure. Can run in demo mode (no real data required) or production mode pointing at a directory of CSV exports. Run it from the command line:
# Demo mode (synthetic data)
python audience_dashboard.py --demo --output my_dashboard.png
# Production mode
python audience_dashboard.py --data-dir ./data/ --output dashboard.png --name "My Channel"
Worked Examples
Example 1: Load YouTube Data and Plot Growth
from platform_data_fetcher import generate_sample_youtube
from analytics_toolkit import calculate_growth_rate, plot_growth_chart
import matplotlib.pyplot as plt
# In production: df = load_youtube_export("studio_export.csv")
df = generate_sample_youtube(n_days=365)
# Calculate weekly subscriber growth
weekly = calculate_growth_rate(df, metric_col="subscribers", period="weekly")
print(weekly.head())
# Plot it
fig = plot_growth_chart(
weekly,
date_col="date",
metric_col="subscribers",
title="Weekly Subscriber Count",
save_path="subscriber_growth.png"
)
plt.show()
This produces a line chart with a shaded area beneath it. The chart is saved as a PNG file at the path you specify with save_path.
Example 2: Detect Viral Moments with Inflection Points
from platform_data_fetcher import generate_sample_tiktok
from analytics_toolkit import calculate_growth_rate, find_inflection_points, plot_growth_chart
df = generate_sample_tiktok(n_days=180)
# Aggregate daily TikTok views into weekly totals to smooth day-to-day noise
weekly = calculate_growth_rate(df, metric_col="views", period="weekly")
# Find inflection points at 2x the normal growth rate
spikes = find_inflection_points(weekly, metric_col="views", threshold_multiplier=2.0)
print(f"Found {len(spikes)} viral moments:")
print(spikes[["date", "views", "growth_rate", "z_score"]])
# Plot with spike markers
fig = plot_growth_chart(
weekly,
metric_col="views",
title="TikTok Views — Viral Moments Highlighted",
inflection_points=spikes,
save_path="viral_moments.png"
)
The red triangle markers on the chart show exactly when your content spiked beyond normal variation. Correlating these dates with your content calendar reveals what triggered the growth.
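One way to do that correlation in code: if you keep a simple content calendar CSV with date and title columns (both the file and its schema are hypothetical here — adapt the names to whatever you actually track), pandas can match each spike to the most recent post before it:

```python
import pandas as pd

# Hypothetical spike dates, as found in the "date" column
# returned by find_inflection_points
spikes = pd.DataFrame({"date": pd.to_datetime(["2024-02-12", "2024-04-01"])})

# Hypothetical content calendar — in practice something like:
# calendar = pd.read_csv("content_calendar.csv", parse_dates=["date"])
calendar = pd.DataFrame({
    "date": pd.to_datetime(["2024-02-10", "2024-03-30", "2024-05-01"]),
    "title": ["Studio tour", "Q&A livestream", "Gear review"],
})

# Match each spike to the most recent post on or before the spike date
matched = pd.merge_asof(
    spikes.sort_values("date"),
    calendar.sort_values("date"),
    on="date",
    direction="backward",
)
print(matched)
```

Each row of the result pairs a spike date with the post most likely responsible for it — a useful starting point, though correlation is not proof of cause.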
Example 3: Segment Your Audience by Engagement Behavior
from platform_data_fetcher import generate_sample_instagram
from analytics_toolkit import segment_audience
import pandas as pd
df = generate_sample_instagram(n_days=180)
# Segment by engagement dimensions
segmented = segment_audience(
df,
feature_cols=["impressions", "reach", "likes", "comments", "saves"],
n_clusters=4
)
# Inspect the resulting segments
summary = segmented.groupby("segment_label")[
["impressions", "likes", "comments", "saves"]
].mean().round(1)
print(summary)
# Count members per segment
print(segmented["segment_label"].value_counts())
The output shows each segment's average behavior. "Power Fan" rows have high engagement across all dimensions; "Casual Viewer" rows show high impressions but low interaction. Understanding segment sizes helps you design content for each audience tier.
Example 4: Run a Revenue Monte Carlo Simulation
from analytics_toolkit import monte_carlo_revenue, plot_revenue_forecast
import matplotlib.pyplot as plt
# Define your revenue streams with mean and standard deviation
streams = {
"AdSense": {
"mean": 800,
"std": 200,
"growth_rate": 0.015 # 1.5% monthly growth
},
"Sponsorships": {
"mean": 2000,
"std": 800
},
"Online Course": {
"mean": 1500,
"std": 600,
"growth_rate": 0.03
},
"Memberships": {
"mean": 650,
"std": 100,
"growth_rate": 0.01
},
}
results = monte_carlo_revenue(streams, n_simulations=5000, months=12)
print("Annual Revenue Forecast:")
print(f" Conservative (P10): ${results['p10']:>10,.0f}")
print(f" Likely (P50): ${results['p50']:>10,.0f}")
print(f" Optimistic (P90): ${results['p90']:>10,.0f}")
fig = plot_revenue_forecast(results, title="12-Month Revenue Forecast", save_path="forecast.png")
plt.show()
The n_simulations=5000 argument runs 5,000 independent revenue paths. The P10 figure is your planning floor — if you can operate comfortably at that income level, your business can survive a bad year. The P90 figure is your upside scenario.
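The percentile logic behind those P10/P50/P90 figures is plain numpy over the simulated annual totals. A minimal single-stream sketch (this illustrates the idea, not the toolkit's exact internals):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Simulate 5,000 annual totals for one stream: 12 monthly normal draws each
monthly_draws = rng.normal(loc=800, scale=200, size=(5000, 12))
annual_totals = monthly_draws.sum(axis=1)

# The forecast percentiles are just empirical percentiles of those totals
p10, p50, p90 = np.percentile(annual_totals, [10, 50, 90])
print(f"P10 ${p10:,.0f}  P50 ${p50:,.0f}  P90 ${p90:,.0f}")
```

With a mean of $800/month, the annual totals cluster around $9,600; the spread between P10 and P90 comes entirely from the monthly standard deviation compounding across twelve draws.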
Example 5: Measure Income Volatility
from analytics_toolkit import calculate_income_volatility
# Replace with your actual monthly totals
my_monthly_income = [
2400, 3100, 1800, 4200, 2900, 2200,
5100, 1900, 3300, 2700, 4800, 3600
]
volatility = calculate_income_volatility(my_monthly_income)
print(f"Mean monthly income: ${volatility['mean']:,.0f}")
print(f"Std deviation: ${volatility['std']:,.0f}")
print(f"Coefficient of var: {volatility['cv']:.2f}")
print(f"Worst drawdown: {volatility['max_drawdown_pct']:.1f}%")
print(f"10th pct floor: ${volatility['p10_floor']:,.0f}")
print(f"Volatility rating: {volatility['volatility_rating']}")
A CV above 0.5 signals "High" volatility — months vary enough that financial planning becomes difficult. The 10th-percentile floor is the number to budget around: it represents a genuinely bad month, one your actual income should still exceed about 90% of the time.
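The arithmetic behind those metrics is short enough to write by hand. A minimal numpy version (a sketch of the same calculations, independent of calculate_income_volatility's exact implementation):

```python
import numpy as np

# Same sample monthly totals as above
income = np.array([2400, 3100, 1800, 4200, 2900, 2200,
                   5100, 1900, 3300, 2700, 4800, 3600])

mean = income.mean()
std = income.std(ddof=1)        # sample standard deviation
cv = std / mean                 # coefficient of variation (unitless)
p10_floor = np.percentile(income, 10)

print(f"mean ${mean:,.0f}, std ${std:,.0f}, CV {cv:.2f}, P10 floor ${p10_floor:,.0f}")
```

Because the CV divides the standard deviation by the mean, it lets you compare volatility across creators (or time periods) with very different income levels.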
Example 6: Plan an A/B Test Before You Run It
from analytics_toolkit import calculate_required_sample_size
# Your current thumbnail click-through rate
current_ctr = 0.052 # 5.2%
# You want to detect a 20% relative improvement (5.2% -> 6.24%)
sample_size = calculate_required_sample_size(
baseline_rate=current_ctr,
minimum_detectable_effect=0.20,
power=0.80,
alpha=0.05
)
print(f"Required viewers per thumbnail variant: {sample_size['n_per_group']:,}")
print(f"Total viewers needed (both variants): {sample_size['total_n']:,}")
print(f"Your baseline CTR: {sample_size['baseline_rate']*100:.1f}%")
print(f"Target CTR (if test wins): {sample_size['target_rate']*100:.2f}%")
Run this calculation before you expose your audience to any test. If the required sample size exceeds your typical monthly impression volume, the test will take too long to be practically useful — consider testing a larger change that requires a smaller sample.
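The standard two-proportion power formula behind this kind of calculation can be written directly with scipy — a sketch of the textbook formula, which may differ in small details from the toolkit's own implementation:

```python
import math
from scipy.stats import norm

def sample_size_two_proportions(p1, relative_mde, power=0.80, alpha=0.05):
    """Per-group n to detect a relative lift in a conversion rate."""
    p2 = p1 * (1 + relative_mde)        # target rate if the test wins
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

n = sample_size_two_proportions(0.052, 0.20)
print(f"Required per group: {n:,}")
```

Note how sensitive the result is to the effect size: because the denominator squares the difference between the two rates, halving the detectable effect roughly quadruples the required sample.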
Example 7: Analyze an A/B Test After It Runs
from analytics_toolkit import run_ab_test, plot_ab_test_results
import matplotlib.pyplot as plt
# Thumbnail A (control): classic talking-head thumbnail
# Thumbnail B (test): redesigned with bold text overlay
result = run_ab_test(
control_conversions=312,
control_total=6000,
test_conversions=398,
test_total=6000,
alpha=0.05
)
print(f"Control CTR: {result['control_rate']*100:.2f}%")
print(f"Test CTR: {result['test_rate']*100:.2f}%")
print(f"Uplift: {result['relative_uplift']:+.1f}%")
print(f"p-value: {result['p_value']:.4f}")
print(f"Significant: {result['significant']}")
print(f"\nRecommendation:\n{result['recommendation']}")
fig = plot_ab_test_results(
control_data={"conversions": 312, "total": 6000},
test_data={"conversions": 398, "total": 6000},
test_results=result,
title="Thumbnail A/B Test — Jan 2024",
save_path="ab_test_results.png"
)
plt.show()
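As Chapter 26 explains, the significance test here is a chi-square test of independence on the 2×2 outcome table. You can reproduce that step directly with scipy (the p-value may differ slightly from run_ab_test's depending on whether Yates' continuity correction is applied):

```python
from scipy.stats import chi2_contingency

# 2x2 table: rows = variant, columns = [converted, did not convert]
table = [
    [312, 6000 - 312],   # control thumbnail
    [398, 6000 - 398],   # test thumbnail
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Seeing the raw contingency table makes the test's logic concrete: the question is simply whether conversions are distributed across the two variants differently than chance would predict.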
Example 8: Load Real Instagram Data
from platform_data_fetcher import load_instagram_export, validate_dataframe
# Export from: Instagram Professional Dashboard > Insights > Export Data
df = load_instagram_export("instagram_insights_jan2024.csv")
# Validate data quality before analysis
report = validate_dataframe(df, required_cols=["date", "impressions", "reach"])
if report["valid"]:
print(f"Data looks good: {report['row_count']} rows, {report['date_range']}")
else:
print("Data quality issues:")
for issue in report["issues"]:
print(f" - {issue}")
print(df.head())
print(df.dtypes)
The validate_dataframe function catches common export problems — missing columns, high null rates, date gaps — before they silently corrupt your analysis. Always run validation on real data.
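The kinds of checks involved can be sketched in a few lines of pandas. This is a simplified illustration of the idea, not the module's actual implementation — the function name and null threshold here are invented:

```python
import pandas as pd

def quick_validate(df, required_cols, max_null_frac=0.1):
    """Minimal data-quality check: required columns present, nulls bounded."""
    issues = []
    for col in required_cols:
        if col not in df.columns:
            issues.append(f"missing column: {col}")
        elif df[col].isna().mean() > max_null_frac:
            issues.append(f"too many nulls in: {col}")
    return {"valid": not issues, "issues": issues, "row_count": len(df)}

# One null out of five impressions (20%) exceeds the 10% threshold,
# and the "reach" column is absent entirely
df = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=5),
                   "impressions": [100, None, 120, 90, 110]})
report = quick_validate(df, ["date", "impressions", "reach"])
print(report)
```

Even a check this crude catches the two most common export failures — a renamed column and a partially empty one — before they reach your analysis.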
Example 9: Multi-Platform Growth Comparison
from platform_data_fetcher import (
generate_sample_youtube,
generate_sample_tiktok,
generate_sample_instagram,
)
from analytics_toolkit import calculate_growth_rate
import pandas as pd
import matplotlib.pyplot as plt
yt = generate_sample_youtube(n_days=180)
tt = generate_sample_tiktok(n_days=180)
ig = generate_sample_instagram(n_days=180)
fig, ax = plt.subplots(figsize=(12, 5))
for df, col, label, color in [
(yt, "subscribers", "YouTube Subscribers", "#ff0000"),
(tt, "new_followers", "TikTok New Followers (cumulative)", "#010101"),
(ig, "new_followers", "Instagram New Followers (cumulative)", "#c13584"),
]:
if col in df.columns:
plot_df = df[["date", col]].dropna()
if col == "new_followers":
plot_df = plot_df.copy()
plot_df[col] = plot_df[col].cumsum()
# Normalize to index = 100 at start for fair comparison
start_val = plot_df[col].iloc[0]
if start_val > 0:
plot_df["indexed"] = plot_df[col] / start_val * 100
ax.plot(plot_df["date"], plot_df["indexed"],
label=label, color=color, linewidth=2)
ax.set_title("Cross-Platform Follower Growth (Indexed to 100)", fontsize=13, fontweight="bold")
ax.set_ylabel("Indexed Growth (Start = 100)")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("platform_comparison.png", dpi=150)
plt.show()
Indexing to 100 at the start date lets you compare growth rates across platforms with very different absolute follower counts. A creator with 500,000 YouTube subscribers and 8,000 TikTok followers can still see that TikTok is growing three times faster.
Example 10: Run the Full Dashboard in Demo Mode
This is the fastest way to see everything working together:
python audience_dashboard.py --demo --output demo_dashboard.png
The script:
1. Generates 365 days of synthetic YouTube data
2. Generates 180 days of synthetic TikTok and Instagram data
3. Generates 52 weeks of synthetic email analytics
4. Runs K-means segmentation on the TikTok data
5. Runs a Monte Carlo forecast for five revenue streams
6. Runs an A/B test analysis on synthetic thumbnail data
7. Assembles a five-panel dashboard PNG
8. Prints a text summary to the console
Open demo_dashboard.png to see the finished product. Then swap in your own data files when you are ready.
Function Reference Table
| Function | Module | Returns | Key Parameters |
|---|---|---|---|
| load_platform_csv | analytics_toolkit | DataFrame | filepath, platform |
| calculate_growth_rate | analytics_toolkit | DataFrame | df, date_col, metric_col, period |
| find_inflection_points | analytics_toolkit | DataFrame | df, metric_col, threshold_multiplier |
| segment_audience | analytics_toolkit | DataFrame | df, feature_cols, n_clusters |
| monte_carlo_revenue | analytics_toolkit | dict | streams, n_simulations, months |
| calculate_income_volatility | analytics_toolkit | dict | monthly_incomes |
| run_ab_test | analytics_toolkit | dict | control_conversions, control_total, test_conversions, test_total |
| calculate_required_sample_size | analytics_toolkit | dict | baseline_rate, minimum_detectable_effect, power, alpha |
| plot_growth_chart | analytics_toolkit | Figure | df, date_col, metric_col, title, inflection_points |
| plot_revenue_forecast | analytics_toolkit | Figure | results, title |
| plot_ab_test_results | analytics_toolkit | Figure | control_data, test_data, test_results |
| load_youtube_export | platform_data_fetcher | DataFrame | filepath, report_type |
| load_tiktok_export | platform_data_fetcher | DataFrame | filepath |
| load_instagram_export | platform_data_fetcher | DataFrame | filepath |
| load_email_export | platform_data_fetcher | DataFrame | filepath, provider |
| load_auto | platform_data_fetcher | (DataFrame, str) | filepath |
| validate_dataframe | platform_data_fetcher | dict | df, required_cols |
| generate_sample_youtube | platform_data_fetcher | DataFrame | n_days, start_date, seed |
| generate_sample_tiktok | platform_data_fetcher | DataFrame | n_days, start_date, seed |
| generate_sample_instagram | platform_data_fetcher | DataFrame | n_days, start_date, seed |
| generate_sample_email | platform_data_fetcher | DataFrame | n_broadcasts, start_date, seed |
| generate_dashboard | audience_dashboard | str | growth_df, revenue_results, engagement_df, segmented_df, ab_results, output_path |
| print_text_summary | audience_dashboard | None | growth_df, revenue_results, engagement_df, segmented_df, ab_results |
| run_demo | audience_dashboard | None | output_path |
Notes on Extending the Toolkit
Adding a New Platform
To add support for a new platform (say, Pinterest or LinkedIn):
- In platform_data_fetcher.py, create a column mapping dictionary following the existing pattern.
- Write a load_pinterest_export(filepath) function that reads the CSV, applies the mapping, parses dates, and adds platform='pinterest'.
- Update load_auto to detect Pinterest exports by checking for a distinctive column name.
- Write a generate_sample_pinterest function for testing.
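Put together, the loader might look something like this. The Pinterest column names below are hypothetical — check the headers in your actual export and adjust the mapping accordingly:

```python
import io
import pandas as pd

# Hypothetical mapping from Pinterest export headers to the toolkit's schema
PINTEREST_COLUMN_MAP = {
    "Date": "date",
    "Impressions": "impressions",
    "Engagements": "engagements",
    "Saves": "saves",
}

def load_pinterest_export(filepath):
    """Load a Pinterest analytics CSV into the toolkit's standard schema."""
    df = pd.read_csv(filepath)
    df = df.rename(columns=PINTEREST_COLUMN_MAP)
    df["date"] = pd.to_datetime(df["date"])
    df["platform"] = "pinterest"
    return df.sort_values("date").reset_index(drop=True)

# Quick check against an in-memory CSV standing in for a real export file
csv_text = ("Date,Impressions,Engagements,Saves\n"
            "2024-01-02,500,40,12\n"
            "2024-01-01,450,35,9\n")
df = load_pinterest_export(io.StringIO(csv_text))
print(df.head())
```

Because the function returns the toolkit's standard schema, every downstream function — growth rates, inflection points, the dashboard — works on Pinterest data with no further changes.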
Adding a New Revenue Stream Type
The monte_carlo_revenue function accepts any dict of stream definitions. To model a more complex distribution (for example, a stream that has a binary outcome — either a sponsorship deal happens or it does not), extend the function to accept a distribution key:
# Hypothetical extension
streams = {
"Sponsorship": {
"mean": 5000,
"std": 1000,
"distribution": "bernoulli", # custom extension
"probability": 0.6 # 60% chance per month
}
}
Then add a branch inside monte_carlo_revenue to handle the "bernoulli" distribution type.
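What that branch might sample can be sketched with numpy — this is a hypothetical extension, not part of the shipped toolkit:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

n_simulations, months = 5000, 12
mean, std, probability = 5000, 1000, 0.6   # the stream definition above

# Each simulated month: either the deal happens (a normal draw) or it
# does not (zero income), gated by an independent Bernoulli trial
happens = rng.random((n_simulations, months)) < probability
amounts = rng.normal(mean, std, size=(n_simulations, months))
monthly = np.where(happens, amounts, 0.0)

expected_monthly = monthly.mean()
print(f"Average monthly sponsorship income: ${expected_monthly:,.0f}")
```

The long-run average lands near probability × mean ($3,000 here), but individual simulated months are bimodal — mostly either zero or roughly $5,000 — which is exactly the lumpiness a plain normal distribution fails to capture.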
Scheduling Automated Reports
On any Linux/macOS system with cron, you can schedule the dashboard to regenerate weekly:
# Edit crontab
crontab -e
# Add this line to run every Monday at 7am
0 7 * * 1 /path/to/venv/bin/python /path/to/audience_dashboard.py \
--data-dir /path/to/data/ \
--output /path/to/reports/dashboard_$(date +\%Y\%m\%d).png
On Windows, use Task Scheduler to run the equivalent command on a schedule.
Connecting to Live APIs
The loaders in platform_data_fetcher.py are designed for CSV exports, which is the most universally available data format. However, YouTube, Instagram, and TikTok all offer official APIs. If you want to automate data collection without manual CSV exports, you can replace the file-loading functions with API calls while keeping all downstream analytics functions unchanged. The analytics toolkit is deliberately decoupled from the data source — it operates on standardized DataFrames regardless of where those DataFrames came from.
Testing Your Own Extensions
Each sample generator accepts a seed parameter for reproducibility. Use different seed values to generate multiple independent test datasets:
# Generate multiple test scenarios
test_datasets = [
generate_sample_youtube(seed=i) for i in range(5)
]
# Run your function on all five and compare results
for i, df in enumerate(test_datasets):
result = your_new_function(df)
print(f"Seed {i}: {result}")
This pattern makes it easy to check that your code behaves reasonably across a range of inputs, not just one lucky dataset.