Appendix F: Python Analytics Toolkit

Chapters 24 through 26 introduced a range of quantitative methods for understanding your audience, forecasting revenue, and running rigorous experiments. This appendix puts all of those methods in your hands as working, tested Python code. You will find three modules here, each self-contained and usable independently, but designed to work together as a coherent toolkit.

The goal is not to teach Python from scratch — the prerequisites for this appendix assume you have worked through the concepts in Part VI and have at least passing familiarity with running Python scripts. The goal is to give you production-quality, heavily commented code that you can adapt to your own data within an afternoon.


Prerequisites and Learning Path

Before working through this appendix, you should have read:

  • Chapter 24 (Audience Analytics) — growth rate calculations, inflection point detection, and audience segmentation
  • Chapter 25 (Revenue Modeling) — income volatility metrics, multi-stream forecasting, and the Monte Carlo method
  • Chapter 26 (Experimentation) — A/B testing, chi-square significance, and sample size planning

You should also be comfortable with:

  • Running Python scripts from a terminal or command prompt
  • Installing packages with pip
  • Opening CSV files in a spreadsheet application (to understand the data structure before loading it)
  • Basic pandas operations: reading a file, filtering rows, selecting columns

You do not need prior experience with scikit-learn, scipy, or matplotlib. All relevant functions are wrapped in clear, documented interfaces that abstract away the statistical machinery.
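
If those pandas basics feel rusty, here is a minimal warm-up covering all three; the in-memory frame stands in for a real CSV read:

```python
import pandas as pd

# Build a small frame in memory (stands in for pd.read_csv("data/export.csv"))
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "views": [1200, 950, 1800],
})

# Filter rows: keep only days with more than 1,000 views
busy = df[df["views"] > 1000]

# Select columns: keep just the date column from the filtered rows
dates_only = busy[["date"]]

print(len(busy))  # 2
```

If each line here makes sense, you have all the pandas you need for this appendix.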


Installation and Setup

1. Verify Python Version

This toolkit requires Python 3.9 or higher. Check your version:

python --version
# Expected: Python 3.9.x, 3.10.x, 3.11.x, or 3.12.x

If you see Python 2.x or a version below 3.9, install a newer version from python.org. On Windows, make sure to check "Add Python to PATH" during installation.

2. Create a Virtual Environment

Isolating this toolkit's dependencies prevents conflicts with other Python projects:

# Navigate to your project directory
cd my-creator-analytics

# Create a virtual environment
python -m venv venv

# Activate it
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

Your terminal prompt should now show (venv) at the beginning.

3. Install Required Packages

pip install pandas numpy matplotlib scipy scikit-learn

This installs:

Package        Minimum Version   Purpose
pandas         1.5+              Data loading, cleaning, time-series operations
numpy          1.23+             Array math, random number generation
matplotlib     3.6+              Chart generation, dashboard output
scipy          1.9+              Chi-square tests, normal distribution functions
scikit-learn   1.1+              K-means clustering, StandardScaler

To save the exact versions you install (for reproducibility):

pip freeze > requirements.txt

4. Download the Toolkit Files

Copy the three module files into your project directory:

my-creator-analytics/
    analytics_toolkit.py
    platform_data_fetcher.py
    audience_dashboard.py
    requirements.txt
    venv/
    data/           <-- your CSV exports go here

5. Verify Installation

Run this quick test to confirm everything is working:

# test_install.py
from analytics_toolkit import calculate_growth_rate
from platform_data_fetcher import generate_sample_youtube
import pandas as pd

df = generate_sample_youtube(n_days=30)
growth = calculate_growth_rate(df, metric_col="views", period="weekly")
print("Installation successful!")
print(growth.tail(3))

If you see a table of weekly view totals and growth rates, you are ready to proceed.


Module Overview

The toolkit consists of three files with distinct responsibilities:

analytics_toolkit.py

The core analytics library. Contains all the functions introduced in Chapters 24–26, grouped into three logical sections:

Growth Analytics (Chapter 24):

  • load_platform_csv — load and standardize platform CSV exports
  • calculate_growth_rate — period-over-period growth with configurable aggregation
  • find_inflection_points — statistical detection of growth spikes
  • segment_audience — K-means clustering of audience behavior

Revenue Forecasting (Chapter 25):

  • monte_carlo_revenue — multi-stream Monte Carlo simulation
  • calculate_income_volatility — coefficient of variation and volatility rating

Experimentation (Chapter 26):

  • run_ab_test — chi-square test with uplift calculation and recommendation
  • calculate_required_sample_size — power analysis for test planning

Visualization:

  • plot_growth_chart — time-series line chart with inflection markers
  • plot_revenue_forecast — histogram + monthly fan chart
  • plot_ab_test_results — side-by-side bar chart with significance summary

platform_data_fetcher.py

Handles the messy work of loading data from different platform export formats. Each platform uses different column names, date formats, and CSV structures. This module normalizes everything into a consistent schema. Includes:

  • load_youtube_export — YouTube Studio channel and content reports
  • load_tiktok_export — TikTok Creator Center analytics
  • load_instagram_export — Instagram Professional Dashboard insights
  • load_email_export — ConvertKit and Mailchimp broadcast reports
  • load_auto — auto-detection of platform type
  • validate_dataframe — data quality checking
  • Sample data generators for each platform (for testing and demos)
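
All of these loaders follow the same underlying pattern: rename columns, parse dates, tag the platform. The sketch below illustrates it with a hypothetical column map for TikTok; the actual mappings in platform_data_fetcher.py may differ:

```python
import pandas as pd

# Hypothetical mapping from a platform's export headers to the toolkit schema
TIKTOK_COLUMN_MAP = {
    "Date": "date",
    "Video Views": "views",
    "New Followers": "new_followers",
}

def normalize_export(df: pd.DataFrame, column_map: dict, platform: str) -> pd.DataFrame:
    """Rename columns, parse dates, and tag the platform of origin."""
    out = df.rename(columns=column_map)
    out["date"] = pd.to_datetime(out["date"])
    out["platform"] = platform
    return out

# A tiny stand-in for a raw TikTok export
raw = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02"],
    "Video Views": [5000, 7200],
    "New Followers": [40, 65],
})
clean = normalize_export(raw, TIKTOK_COLUMN_MAP, platform="tiktok")
print(list(clean.columns))
```

Because every loader emits the same schema, downstream functions never need to know which platform the data came from.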

audience_dashboard.py

A complete dashboard generator that assembles data from all other modules into a single multi-panel matplotlib figure. Can run in demo mode (no real data required) or production mode pointing at a directory of CSV exports. Run it from the command line:

# Demo mode (synthetic data)
python audience_dashboard.py --demo --output my_dashboard.png

# Production mode
python audience_dashboard.py --data-dir ./data/ --output dashboard.png --name "My Channel"


Worked Examples

Example 1: Load YouTube Data and Plot Growth

from platform_data_fetcher import generate_sample_youtube
from analytics_toolkit import calculate_growth_rate, plot_growth_chart
import matplotlib.pyplot as plt

# In production: df = load_youtube_export("studio_export.csv")
df = generate_sample_youtube(n_days=365)

# Calculate weekly subscriber growth
weekly = calculate_growth_rate(df, metric_col="subscribers", period="weekly")
print(weekly.head())

# Plot it
fig = plot_growth_chart(
    weekly,
    date_col="date",
    metric_col="subscribers",
    title="Weekly Subscriber Count",
    save_path="subscriber_growth.png"
)
plt.show()

This produces a line chart with a shaded area beneath it. The chart is saved as a PNG file at the path you specify with save_path.


Example 2: Detect Viral Moments with Inflection Points

from platform_data_fetcher import generate_sample_tiktok
from analytics_toolkit import calculate_growth_rate, find_inflection_points, plot_growth_chart

df = generate_sample_tiktok(n_days=180)

# Aggregate the daily TikTok data into weekly view totals with growth rates
weekly = calculate_growth_rate(df, metric_col="views", period="weekly")

# Find inflection points at 2x the normal growth rate
spikes = find_inflection_points(weekly, metric_col="views", threshold_multiplier=2.0)
print(f"Found {len(spikes)} viral moments:")
print(spikes[["date", "views", "growth_rate", "z_score"]])

# Plot with spike markers
fig = plot_growth_chart(
    weekly,
    metric_col="views",
    title="TikTok Views — Viral Moments Highlighted",
    inflection_points=spikes,
    save_path="viral_moments.png"
)

The red triangle markers on the chart show exactly when your content spiked beyond normal variation. Correlating these dates with your content calendar reveals what triggered the growth.
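
One plausible implementation of spike detection (a sketch, not necessarily how find_inflection_points works internally) is to z-score the period-over-period growth rates and flag the outliers:

```python
import pandas as pd

def flag_spikes(series: pd.Series, threshold_multiplier: float = 2.0) -> pd.Series:
    """Return a boolean mask marking periods with outlier growth rates."""
    growth = series.pct_change()                   # period-over-period growth
    z = (growth - growth.mean()) / growth.std()    # standardize the growth rates
    return z > threshold_multiplier                # flag unusually large jumps

views = pd.Series([100, 105, 110, 1000, 115, 120, 118])
mask = flag_spikes(views)
print(views[mask])  # the 1000-view spike
```

Note that a z-score can only get so large in a short series, so on small datasets a lower threshold_multiplier may be needed to flag anything at all.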


Example 3: Segment Your Audience by Engagement Behavior

from platform_data_fetcher import generate_sample_instagram
from analytics_toolkit import segment_audience
import pandas as pd

df = generate_sample_instagram(n_days=180)

# Segment by engagement dimensions
segmented = segment_audience(
    df,
    feature_cols=["impressions", "reach", "likes", "comments", "saves"],
    n_clusters=4
)

# Inspect the resulting segments
summary = segmented.groupby("segment_label")[
    ["impressions", "likes", "comments", "saves"]
].mean().round(1)
print(summary)

# Count members per segment
print(segmented["segment_label"].value_counts())

The output shows each segment's average behavior. "Power Fan" rows have high engagement across all dimensions; "Casual Viewer" rows show high impressions but low interaction. Understanding segment sizes helps you design content for each audience tier.
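
Under the hood, segmentation of this kind typically standardizes the features and runs K-means. A self-contained sketch on synthetic data (an illustration, not segment_audience's actual code):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two synthetic behavior groups: casual viewers and power fans
casual = rng.normal(loc=[1000, 10, 1], scale=[100, 2, 0.5], size=(50, 3))
power = rng.normal(loc=[1000, 200, 40], scale=[100, 20, 5], size=(50, 3))
df = pd.DataFrame(np.vstack([casual, power]),
                  columns=["impressions", "likes", "comments"])

# Scale so every feature contributes equally, then cluster
X = StandardScaler().fit_transform(df[["impressions", "likes", "comments"]])
df["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(df["segment"].value_counts())
```

Scaling matters: without it, the impressions column (thousands) would dominate the distance calculation and the likes and comments columns (tens) would barely influence the clusters.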


Example 4: Run a Revenue Monte Carlo Simulation

from analytics_toolkit import monte_carlo_revenue, plot_revenue_forecast
import matplotlib.pyplot as plt

# Define your revenue streams with mean and standard deviation
streams = {
    "AdSense": {
        "mean": 800,
        "std": 200,
        "growth_rate": 0.015  # 1.5% monthly growth
    },
    "Sponsorships": {
        "mean": 2000,
        "std": 800
    },
    "Online Course": {
        "mean": 1500,
        "std": 600,
        "growth_rate": 0.03
    },
    "Memberships": {
        "mean": 650,
        "std": 100,
        "growth_rate": 0.01
    },
}

results = monte_carlo_revenue(streams, n_simulations=5000, months=12)

print("Annual Revenue Forecast:")
print(f"  Conservative (P10): ${results['p10']:>10,.0f}")
print(f"  Likely (P50):       ${results['p50']:>10,.0f}")
print(f"  Optimistic (P90):   ${results['p90']:>10,.0f}")

fig = plot_revenue_forecast(results, title="12-Month Revenue Forecast", save_path="forecast.png")
plt.show()

The n_simulations=5000 argument runs 5,000 independent revenue paths. The P10 figure is your planning floor — if you can operate comfortably at that income level, your business can survive a bad year. The P90 figure is your upside scenario.
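
The core simulation logic can be sketched in a few lines. This is an illustration under simple assumptions (normal monthly draws, compound growth, income clipped at zero); the toolkit's implementation may differ in detail:

```python
import numpy as np

def simulate_annual_revenue(streams: dict, n_simulations: int = 5000,
                            months: int = 12, seed: int = 0) -> np.ndarray:
    """Return an array of simulated annual totals, one per simulation."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_simulations)
    for spec in streams.values():
        growth = spec.get("growth_rate", 0.0)
        for m in range(months):
            scale = (1 + growth) ** m                 # compound monthly growth
            draw = rng.normal(spec["mean"] * scale, spec["std"], n_simulations)
            totals += np.clip(draw, 0, None)          # income can't go negative
    return totals

streams = {"AdSense": {"mean": 800, "std": 200, "growth_rate": 0.015},
           "Sponsorships": {"mean": 2000, "std": 800}}
annual = simulate_annual_revenue(streams)
p10, p50, p90 = np.percentile(annual, [10, 50, 90])
print(f"P10 ${p10:,.0f}  P50 ${p50:,.0f}  P90 ${p90:,.0f}")
```

Because each simulation is one possible year, the percentiles of the resulting array are exactly the P10/P50/P90 figures the toolkit reports.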


Example 5: Measure Income Volatility

from analytics_toolkit import calculate_income_volatility

# Replace with your actual monthly totals
my_monthly_income = [
    2400, 3100, 1800, 4200, 2900, 2200,
    5100, 1900, 3300, 2700, 4800, 3600
]

volatility = calculate_income_volatility(my_monthly_income)

print(f"Mean monthly income:  ${volatility['mean']:,.0f}")
print(f"Std deviation:        ${volatility['std']:,.0f}")
print(f"Coefficient of var:   {volatility['cv']:.2f}")
print(f"Worst drawdown:       {volatility['max_drawdown_pct']:.1f}%")
print(f"10th pct floor:       ${volatility['p10_floor']:,.0f}")
print(f"Volatility rating:    {volatility['volatility_rating']}")

A CV above 0.5 signals "High" volatility — months vary enough that financial planning becomes difficult. The 10th-percentile floor is the number to budget around: roughly 90% of your months should come in above it, so it represents a genuinely bad but entirely plausible month.
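
You can check these numbers by hand; the coefficient of variation is just the standard deviation divided by the mean (this sketch uses the sample standard deviation; the toolkit's choice of estimator may differ slightly):

```python
import numpy as np

monthly = np.array([2400, 3100, 1800, 4200, 2900, 2200,
                    5100, 1900, 3300, 2700, 4800, 3600])

mean = monthly.mean()
std = monthly.std(ddof=1)            # sample standard deviation
cv = std / mean                      # coefficient of variation
p10_floor = np.percentile(monthly, 10)

print(f"CV = {cv:.2f}, 10th-percentile floor = ${p10_floor:,.0f}")
```

For this example income series the CV works out to about 0.34, comfortably below the 0.5 "High" threshold.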


Example 6: Plan an A/B Test Before You Run It

from analytics_toolkit import calculate_required_sample_size

# Your current thumbnail click-through rate
current_ctr = 0.052  # 5.2%

# You want to detect a 20% relative improvement (5.2% -> 6.24%)
sample_size = calculate_required_sample_size(
    baseline_rate=current_ctr,
    minimum_detectable_effect=0.20,
    power=0.80,
    alpha=0.05
)

print(f"Required viewers per thumbnail variant: {sample_size['n_per_group']:,}")
print(f"Total viewers needed (both variants):   {sample_size['total_n']:,}")
print(f"Your baseline CTR:                      {sample_size['baseline_rate']*100:.1f}%")
print(f"Target CTR (if test wins):              {sample_size['target_rate']*100:.2f}%")

Run this calculation before you expose your audience to any test. If the required sample size exceeds your typical monthly impression volume, the test will take too long to be practically useful — consider testing a larger change that requires a smaller sample.
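
The calculation can be reproduced with the standard two-proportion sample size formula. This is a sketch of the statistics involved; calculate_required_sample_size's internals may differ slightly:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(p1: float, relative_effect: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm to detect a relative lift in a conversion rate."""
    p2 = p1 * (1 + relative_effect)        # target rate if the test wins
    z_alpha = norm.ppf(1 - alpha / 2)      # two-sided significance threshold
    z_beta = norm.ppf(power)               # power requirement
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = n_per_group(0.052, 0.20)
print(f"Required per variant: {n:,}")
```

Notice how the required n scales with the inverse square of the absolute difference between the rates: halving the detectable effect roughly quadruples the sample you need.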


Example 7: Analyze an A/B Test After It Runs

from analytics_toolkit import run_ab_test, plot_ab_test_results
import matplotlib.pyplot as plt

# Thumbnail A (control): classic talking-head thumbnail
# Thumbnail B (test): redesigned with bold text overlay
result = run_ab_test(
    control_conversions=312,
    control_total=6000,
    test_conversions=398,
    test_total=6000,
    alpha=0.05
)

print(f"Control CTR:  {result['control_rate']*100:.2f}%")
print(f"Test CTR:     {result['test_rate']*100:.2f}%")
print(f"Uplift:       {result['relative_uplift']:+.1f}%")
print(f"p-value:      {result['p_value']:.4f}")
print(f"Significant:  {result['significant']}")
print(f"\nRecommendation:\n{result['recommendation']}")

fig = plot_ab_test_results(
    control_data={"conversions": 312, "total": 6000},
    test_data={"conversions": 398, "total": 6000},
    test_results=result,
    title="Thumbnail A/B Test — Jan 2024",
    save_path="ab_test_results.png"
)
plt.show()
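
You can cross-check the result directly with scipy, assuming run_ab_test uses a standard 2×2 chi-square test as described in Chapter 26:

```python
from scipy.stats import chi2_contingency

# Rows: control, test. Columns: converted, did not convert.
table = [[312, 6000 - 312],
         [398, 6000 - 398]]

chi2, p_value, dof, expected = chi2_contingency(table)

control_rate = 312 / 6000
test_rate = 398 / 6000
uplift = (test_rate - control_rate) / control_rate * 100

print(f"chi2={chi2:.2f}, p={p_value:.4f}, uplift={uplift:+.1f}%")
```

With these counts the uplift is about +27.6% and the p-value is well below 0.05, so the test variant's advantage is very unlikely to be noise. (Note that scipy applies Yates' continuity correction to 2×2 tables by default, so the exact p-value may differ slightly from run_ab_test's output.)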

Example 8: Load Real Instagram Data

from platform_data_fetcher import load_instagram_export, validate_dataframe

# Export from: Instagram Professional Dashboard > Insights > Export Data
df = load_instagram_export("instagram_insights_jan2024.csv")

# Validate data quality before analysis
report = validate_dataframe(df, required_cols=["date", "impressions", "reach"])
if report["valid"]:
    print(f"Data looks good: {report['row_count']} rows, {report['date_range']}")
else:
    print("Data quality issues:")
    for issue in report["issues"]:
        print(f"  - {issue}")

print(df.head())
print(df.dtypes)

The validate_dataframe function catches common export problems — missing columns, high null rates, date gaps — before they silently corrupt your analysis. Always run validation on real data.
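
The kinds of checks involved can be sketched as follows. This is a hypothetical simplification; the real validate_dataframe may check more, such as date gaps:

```python
import pandas as pd

def basic_validation(df: pd.DataFrame, required_cols: list) -> dict:
    """Collect simple data-quality issues instead of failing on the first one."""
    issues = []
    for col in required_cols:
        if col not in df.columns:
            issues.append(f"missing required column: {col}")
    for col in df.columns:
        null_pct = df[col].isna().mean() * 100
        if null_pct > 20:
            issues.append(f"{col} is {null_pct:.0f}% null")
    return {"valid": not issues, "row_count": len(df), "issues": issues}

df = pd.DataFrame({"date": ["2024-01-01", "2024-01-02"],
                   "impressions": [1000, None]})
report = basic_validation(df, required_cols=["date", "impressions", "reach"])
print(report)
```

Accumulating issues into a list, rather than raising on the first problem, lets you see every defect in an export at once.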


Example 9: Multi-Platform Growth Comparison

from platform_data_fetcher import (
    generate_sample_youtube,
    generate_sample_tiktok,
    generate_sample_instagram,
)
from analytics_toolkit import calculate_growth_rate
import pandas as pd
import matplotlib.pyplot as plt

yt = generate_sample_youtube(n_days=180)
tt = generate_sample_tiktok(n_days=180)
ig = generate_sample_instagram(n_days=180)

fig, ax = plt.subplots(figsize=(12, 5))

for df, col, label, color in [
    (yt, "subscribers", "YouTube Subscribers", "#ff0000"),
    (tt, "new_followers", "TikTok New Followers (cumulative)", "#010101"),
    (ig, "new_followers", "Instagram New Followers (cumulative)", "#c13584"),
]:
    if col in df.columns:
        plot_df = df[["date", col]].dropna()
        if col == "new_followers":
            plot_df = plot_df.copy()
            plot_df[col] = plot_df[col].cumsum()
        # Normalize to index = 100 at start for fair comparison
        start_val = plot_df[col].iloc[0]
        if start_val > 0:
            plot_df["indexed"] = plot_df[col] / start_val * 100
            ax.plot(plot_df["date"], plot_df["indexed"],
                    label=label, color=color, linewidth=2)

ax.set_title("Cross-Platform Follower Growth (Indexed to 100)", fontsize=13, fontweight="bold")
ax.set_ylabel("Indexed Growth (Start = 100)")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("platform_comparison.png", dpi=150)
plt.show()

Indexing to 100 at the start date lets you compare growth rates across platforms with very different absolute follower counts. A creator with 500,000 YouTube subscribers and 8,000 TikTok followers can still see that TikTok is growing three times faster.


Example 10: Run the Full Dashboard in Demo Mode

This is the fastest way to see everything working together:

python audience_dashboard.py --demo --output demo_dashboard.png

The script:

  1. Generates 365 days of synthetic YouTube data
  2. Generates 180 days of synthetic TikTok and Instagram data
  3. Generates 52 weeks of synthetic email analytics
  4. Runs K-means segmentation on the TikTok data
  5. Runs a Monte Carlo forecast for five revenue streams
  6. Runs an A/B test analysis on synthetic thumbnail data
  7. Assembles a five-panel dashboard PNG
  8. Prints a text summary to the console

Open demo_dashboard.png to see the finished product. Then swap in your own data files when you are ready.


Function Reference Table

Function                         Module                  Returns            Key Parameters
load_platform_csv                analytics_toolkit       DataFrame          filepath, platform
calculate_growth_rate            analytics_toolkit       DataFrame          df, date_col, metric_col, period
find_inflection_points           analytics_toolkit       DataFrame          df, metric_col, threshold_multiplier
segment_audience                 analytics_toolkit       DataFrame          df, feature_cols, n_clusters
monte_carlo_revenue              analytics_toolkit       dict               streams, n_simulations, months
calculate_income_volatility      analytics_toolkit       dict               monthly_incomes
run_ab_test                      analytics_toolkit       dict               control_conversions, control_total, test_conversions, test_total
calculate_required_sample_size   analytics_toolkit       dict               baseline_rate, minimum_detectable_effect, power, alpha
plot_growth_chart                analytics_toolkit       Figure             df, date_col, metric_col, title, inflection_points
plot_revenue_forecast            analytics_toolkit       Figure             results, title
plot_ab_test_results             analytics_toolkit       Figure             control_data, test_data, test_results
load_youtube_export              platform_data_fetcher   DataFrame          filepath, report_type
load_tiktok_export               platform_data_fetcher   DataFrame          filepath
load_instagram_export            platform_data_fetcher   DataFrame          filepath
load_email_export                platform_data_fetcher   DataFrame          filepath, provider
load_auto                        platform_data_fetcher   (DataFrame, str)   filepath
validate_dataframe               platform_data_fetcher   dict               df, required_cols
generate_sample_youtube          platform_data_fetcher   DataFrame          n_days, start_date, seed
generate_sample_tiktok           platform_data_fetcher   DataFrame          n_days, start_date, seed
generate_sample_instagram        platform_data_fetcher   DataFrame          n_days, start_date, seed
generate_sample_email            platform_data_fetcher   DataFrame          n_broadcasts, start_date, seed
generate_dashboard               audience_dashboard      str                growth_df, revenue_results, engagement_df, segmented_df, ab_results, output_path
print_text_summary               audience_dashboard      None               growth_df, revenue_results, engagement_df, segmented_df, ab_results
run_demo                         audience_dashboard      None               output_path

Notes on Extending the Toolkit

Adding a New Platform

To add support for a new platform (say, Pinterest or LinkedIn):

  1. In platform_data_fetcher.py, create a column mapping dictionary following the existing pattern.
  2. Write a load_pinterest_export(filepath) function that reads the CSV, applies the mapping, parses dates, and adds platform='pinterest'.
  3. Update load_auto to detect Pinterest exports by checking for a distinctive column name.
  4. Write a generate_sample_pinterest function for testing.
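
Putting steps 1 and 2 together, a hypothetical Pinterest loader might look like this. The export column names here are guesses; check your actual Pinterest export headers before using them:

```python
import io
import pandas as pd

# Step 1: hypothetical column mapping for a Pinterest analytics export
PINTEREST_COLUMN_MAP = {
    "Date": "date",
    "Impressions": "impressions",
    "Pin clicks": "clicks",
    "Saves": "saves",
}

# Step 2: loader that applies the mapping, parses dates, and tags the platform
def load_pinterest_export(filepath_or_buffer) -> pd.DataFrame:
    df = pd.read_csv(filepath_or_buffer)
    df = df.rename(columns=PINTEREST_COLUMN_MAP)
    df["date"] = pd.to_datetime(df["date"])
    df["platform"] = "pinterest"
    return df.sort_values("date").reset_index(drop=True)

# Quick check with an in-memory CSV standing in for a real export file
sample = io.StringIO(
    "Date,Impressions,Pin clicks,Saves\n"
    "2024-01-02,900,30,12\n"
    "2024-01-01,800,25,10\n"
)
df = load_pinterest_export(sample)
print(df[["date", "impressions", "platform"]])
```

Sorting by date on the way out means downstream time-series functions never have to worry about export row order.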

Adding a New Revenue Stream Type

The monte_carlo_revenue function accepts any dict of stream definitions. To model a more complex distribution (for example, a stream that has a binary outcome — either a sponsorship deal happens or it does not), extend the function to accept a distribution key:

# Hypothetical extension
streams = {
    "Sponsorship": {
        "mean": 5000,
        "std": 1000,
        "distribution": "bernoulli",  # custom extension
        "probability": 0.6            # 60% chance per month
    }
}

Then add a branch inside monte_carlo_revenue to handle the "bernoulli" distribution type.
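
That branch could be sketched as follows. This is a hypothetical extension; sample_stream_month is an illustrative helper, not a function in the toolkit:

```python
import numpy as np

def sample_stream_month(spec: dict, n_simulations: int, rng) -> np.ndarray:
    """Draw one month of income for a stream across all simulations."""
    draw = rng.normal(spec["mean"], spec["std"], n_simulations)
    draw = np.clip(draw, 0, None)
    if spec.get("distribution") == "bernoulli":
        # The deal either lands (with the given probability) or pays nothing
        happened = rng.random(n_simulations) < spec["probability"]
        draw = draw * happened
    return draw

rng = np.random.default_rng(7)
spec = {"mean": 5000, "std": 1000,
        "distribution": "bernoulli", "probability": 0.6}
month = sample_stream_month(spec, 10_000, rng)
print(f"zero-income months: {(month == 0).mean():.0%}")  # roughly 40%
```

Mixing a normal payout size with a Bernoulli "does it happen at all" draw captures lumpy revenue far better than a single normal distribution can.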

Scheduling Automated Reports

On any Linux/macOS system with cron, you can schedule the dashboard to regenerate weekly:

# Edit crontab
crontab -e

# Add this line to run every Monday at 7am
0 7 * * 1 /path/to/venv/bin/python /path/to/audience_dashboard.py \
    --data-dir /path/to/data/ \
    --output /path/to/reports/dashboard_$(date +\%Y\%m\%d).png

On Windows, use Task Scheduler to run the equivalent command on a schedule.

Connecting to Live APIs

The loaders in platform_data_fetcher.py are designed for CSV exports, which is the most universally available data format. However, YouTube, Instagram, and TikTok all offer official APIs. If you want to automate data collection without manual CSV exports, you can replace the file-loading functions with API calls while keeping all downstream analytics functions unchanged. The analytics toolkit is deliberately decoupled from the data source — it operates on standardized DataFrames regardless of where those DataFrames came from.
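
The key point is that an API-backed loader only has to return the same standardized DataFrame. Here is a hypothetical sketch that converts already-fetched records; the API call itself is platform-specific and omitted:

```python
import pandas as pd

def records_to_standard_df(records: list, platform: str) -> pd.DataFrame:
    """Convert a list of dicts (as returned by an API client) to the toolkit schema."""
    df = pd.DataFrame(records)
    df["date"] = pd.to_datetime(df["date"])
    df["platform"] = platform
    return df.sort_values("date").reset_index(drop=True)

# Stand-in for records fetched from an official platform API
records = [
    {"date": "2024-01-02", "views": 5400, "subscribers": 10120},
    {"date": "2024-01-01", "views": 5100, "subscribers": 10100},
]
df = records_to_standard_df(records, platform="youtube")
# Downstream functions like calculate_growth_rate accept df unchanged
print(df)
```

Whether the rows came from a CSV export or an API response, the analytics layer sees the same schema.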

Testing Your Own Extensions

Each sample generator accepts a seed parameter for reproducibility. Use different seed values to generate multiple independent test datasets:

# Generate multiple test scenarios
test_datasets = [
    generate_sample_youtube(seed=i) for i in range(5)
]

# Run your function on all five and compare results
for i, df in enumerate(test_datasets):
    result = your_new_function(df)
    print(f"Seed {i}: {result}")

This pattern makes it easy to check that your code behaves reasonably across a range of inputs, not just one lucky dataset.