Capstone Project 3: Designing and Evaluating a Media Literacy Intervention

Project Overview

Media literacy education is one of the most promising tools available for building long-term, scalable resistance to misinformation at the societal level. Unlike content moderation or fact-checking, which operate reactively on specific pieces of content, media literacy education operates proactively on the cognitive and epistemic habits of individuals, potentially building resistance that persists and generalizes across new information challenges. The evidence base for media literacy interventions has grown substantially in the past decade, offering meaningful guidance on what works, what does not, and why.

This project takes you through the complete research and design process of creating, evaluating, and refining a media literacy intervention. You will conduct a needs assessment for a specific target audience, design a five-session curriculum grounded in the theoretical and empirical literature, create an evaluation plan with pre/post measures and a control group design, analyze simulated pre/post data using Python, and produce a professional report suitable for presentation to a school district, community organization, public library system, or philanthropic foundation.

This project is simultaneously about media literacy and about applied research methodology. The skills it develops — needs assessment, theory-grounded program design, rigorous evaluation, and evidence-based reporting — transfer across the full range of educational and public health intervention design contexts.

Learning Objectives

By completing this project, you will be able to:

  1. Conduct a structured needs assessment that identifies a target audience's knowledge gaps, motivations, and barriers using both existing literature and original data collection
  2. Translate theoretical frameworks (inoculation theory, constructivist learning principles, behavioral nudge design) into specific curriculum activities
  3. Design a randomized or quasi-experimental evaluation capable of providing credible evidence about intervention effectiveness
  4. Select and justify outcome measures that are valid, reliable, and sensitive to change
  5. Analyze pre/post intervention data using appropriate statistical methods including paired t-tests, Cohen's d, and regression analysis
  6. Interpret and communicate evaluation findings honestly, including null results and effect size considerations
  7. Apply an ethical framework to the design and evaluation of epistemic interventions

Phase 1: Needs Assessment

1.1 Selecting Your Target Audience

The first and most consequential decision in intervention design is selecting your target audience. Your choice should be specific enough to support tailored design but broad enough to be realistically reachable. Strong choices for this project include:

  • Middle school students (grades 6-8): A critical developmental window for forming media habits; highly susceptible to peer-mediated misinformation on platforms like TikTok and Instagram; generally accessible through school partnerships.
  • Community college students: A diverse adult population with varying prior media literacy education; high stakes for accurate information in educational and financial decision-making.
  • Adults over 65: Research consistently identifies this group as particularly susceptible to certain types of online misinformation; motivated to learn if framing is respectful and empowering.
  • Parents of school-age children: A "bridging" audience who can influence both their children's media habits and their own; reachable through school communications.
  • New voters (ages 18-24): A critical civic moment; high smartphone use; strong opinions about political information credibility.

Once you have selected your audience, you need to know them in depth. The needs assessment phase gathers that knowledge.

1.2 Literature Review

Before collecting original data, review what is already known about your target audience's media literacy levels, common misinformation vulnerabilities, and prior intervention research. Your literature review should answer:

  1. What does research say about this population's information evaluation habits and common errors?
  2. What interventions have been tried with this population? What were their designs and results?
  3. What theoretical mechanisms are most relevant to this population's susceptibility and motivation?
  4. What barriers to engagement with media literacy education does this population face?

A strong literature review cites at least 15 peer-reviewed sources and synthesizes findings rather than simply listing them.

1.3 Theory of Change

Your intervention design should be grounded in an explicit theory of change: a logic model that connects your activities to your desired outcomes through specified mechanisms. For a media literacy intervention, a theory of change might look like this:

Inputs: Curriculum sessions, facilitator training, pre/post assessment tools
Activities: Interactive workshops on logical fallacies, source evaluation practice, inoculation exercises
Immediate outputs: Increased knowledge of media literacy concepts, increased self-efficacy for evaluating sources
Short-term outcomes (1-3 months): Changed information evaluation behavior, reduced sharing of unverified content
Long-term outcomes (6-12 months): Sustained habit change, diffusion to social networks, improved civic participation

Be specific about the mechanisms connecting each step. "Better source evaluation knowledge leads to improved source evaluation behavior" is not a mechanism — it is an assumption. The mechanism is something like: "Repeated deliberate practice with source evaluation tasks builds procedural automaticity, reducing the cognitive load required for evaluation in high-stakes real-world situations, making evaluation more likely to occur."

1.4 Original Data Collection (Needs Survey)

"""
capstone03/needs_assessment.py
Survey design and analysis for media literacy needs assessment.
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import logging
from scipy import stats

logger = logging.getLogger(__name__)
FIGURE_DIR = Path("figures")
FIGURE_DIR.mkdir(exist_ok=True)

# Survey instrument: adapt for your specific target population
# These items should be pilot-tested with 5-10 members of your target audience
# before finalizing.

NEEDS_SURVEY_ITEMS = {
    # Section 1: Current information habits
    "news_sources_count": (
        "How many distinct sources do you typically consult when you encounter "
        "a news story you are unsure about? (0 = I don't check multiple sources)"
    ),
    "verification_frequency": (
        "When you see a surprising claim on social media, how often do you "
        "verify it before sharing? (1=Never, 5=Always)"
    ),
    "lateral_reading_awareness": (
        "Are you familiar with 'lateral reading' as a technique for evaluating "
        "sources? (1=Never heard of it, 5=I use it regularly)"
    ),
    "trust_calibration_news": (
        "How much do you trust mainstream news organizations to report accurately? "
        "(1=Not at all, 5=Completely)"
    ),
    "trust_calibration_social": (
        "How much do you trust information you see on social media? "
        "(1=Not at all, 5=Completely)"
    ),
    # Section 2: Knowledge assessment (factual items)
    "knows_who_owns_media": (
        "True/False: Most major US news networks are owned by a small number of "
        "large corporations."
    ),
    "understands_algorithm": (
        "True/False: Social media platforms show you content based partly on what "
        "they predict will keep you engaged."
    ),
    "understands_confirmation_bias": (
        "True/False: People tend to evaluate information more critically when it "
        "contradicts their prior beliefs."
        # Note: This is FALSE — the correct answer is that people evaluate confirming
        # information less critically (confirmation bias). Correct answer: False.
    ),
    # Section 3: Attitudes and self-efficacy
    "media_literacy_importance": (
        "How important do you think media literacy skills are in today's world? "
        "(1=Not important, 5=Extremely important)"
    ),
    "self_efficacy_evaluation": (
        "How confident are you in your ability to identify misinformation? "
        "(1=Not confident, 5=Very confident)"
    ),
    "motivation_to_learn": (
        "How interested would you be in participating in a workshop on evaluating "
        "online information? (1=Not interested, 5=Very interested)"
    ),
    # Section 4: Barriers
    "barrier_time": (
        "How much does lack of time prevent you from verifying information before "
        "sharing? (1=Never, 5=Very often)"
    ),
    "barrier_complexity": (
        "How often do you feel that evaluating information sources is too "
        "complicated? (1=Never, 5=Very often)"
    ),
}


def analyze_needs_survey(survey_df: pd.DataFrame) -> dict:
    """
    Analyze survey responses to identify knowledge gaps and design priorities.

    Args:
        survey_df: DataFrame where each row is a survey respondent and
                   columns correspond to NEEDS_SURVEY_ITEMS keys

    Returns:
        Dict of analysis results
    """
    results = {}

    # Descriptive statistics
    numeric_cols = [
        col for col in survey_df.columns
        if col in NEEDS_SURVEY_ITEMS and pd.api.types.is_numeric_dtype(survey_df[col])
    ]

    if numeric_cols:
        results["descriptive_stats"] = survey_df[numeric_cols].describe().to_dict()

    # Identify priority gaps (low scores on knowledge/skill items)
    knowledge_cols = [
        "lateral_reading_awareness", "knows_who_owns_media",
        "understands_algorithm", "understands_confirmation_bias"
    ]
    available_knowledge_cols = [c for c in knowledge_cols if c in survey_df.columns]

    if available_knowledge_cols:
        knowledge_means = survey_df[available_knowledge_cols].mean()
        results["knowledge_gaps"] = knowledge_means.sort_values().to_dict()
        logger.info(f"Knowledge gaps identified: {results['knowledge_gaps']}")

    # Motivation and barrier analysis
    if "motivation_to_learn" in survey_df.columns:
        mean_motivation = survey_df["motivation_to_learn"].mean()
        results["mean_motivation"] = float(mean_motivation)

    barrier_cols = [c for c in ["barrier_time", "barrier_complexity"] if c in survey_df.columns]
    if barrier_cols:
        results["barrier_means"] = survey_df[barrier_cols].mean().to_dict()

    # Visualize knowledge gap profile.
    # Note: True/False items are scored 0/1, so rescale them to the 1-5 range
    # (or plot them separately) before comparing them against Likert items on
    # this axis.
    if available_knowledge_cols:
        fig, ax = plt.subplots(figsize=(9, 5))
        means = survey_df[available_knowledge_cols].mean()
        colors = ["#F44336" if v < 3 else "#4CAF50" for v in means.values]
        ax.barh(means.index, means.values, color=colors)
        ax.axvline(x=3, color="gray", linestyle="--", alpha=0.7, label="Neutral midpoint")
        ax.set_xlim(1, 5)
        ax.set_xlabel("Mean Score", fontsize=12)
        ax.set_title("Knowledge and Skill Gap Profile", fontsize=13, fontweight="bold")
        ax.legend()
        plt.tight_layout()
        plt.savefig(FIGURE_DIR / "needs_assessment_gap_profile.png", dpi=150, bbox_inches="tight")
        plt.show()

    return results
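A minimal usage sketch of the gap-ranking logic above, using three mock respondents (the item values are invented for illustration):

```python
import pandas as pd

# Three hypothetical respondents on two 1-5 Likert items (values invented).
mock = pd.DataFrame({
    "lateral_reading_awareness": [1, 2, 1],
    "understands_algorithm": [5, 4, 4],
})

# Lowest mean = biggest gap = top design priority, as in analyze_needs_survey.
gaps = mock.mean().sort_values()
print(gaps.index[0])  # lateral_reading_awareness
```

The sorted-means output directly drives curriculum prioritization: the lowest-scoring knowledge item becomes a primary session focus.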

Phase 2: Intervention Design

2.1 Theoretical Frameworks

Your curriculum design must be explicitly grounded in at least two of the following theoretical frameworks. For each framework, explain specifically how it shaped the design of particular activities.

Inoculation Theory (McGuire, 1964; van der Linden, 2021)
Inoculation works by forewarning people about manipulation attempts and then providing "weakened doses" of the manipulation, accompanied by refutational material. Applied to media literacy, this means:

  • Explaining the manipulative techniques used to produce misinformation before exposing participants to examples
  • Providing a worked refutation of the technique as applied to a real example
  • Having participants practice applying the refutation to novel examples

The key design principles are: (1) forewarning precedes exposure, (2) the "dose" is weakened enough not to overwhelm defenses, (3) participants actively generate their own refutations (active inoculation is more effective than passive).

Constructivist Learning Principles (Piaget, Vygotsky)
Knowledge is constructed through active engagement, not transmitted passively. Implications for design:

  • Activities should require participants to apply concepts to new problems, not just receive explanations
  • Learning should build progressively on prior knowledge
  • Social learning (peer discussion, collaborative tasks) supports deeper understanding
  • Metacognitive reflection ("What did you notice about your own thinking?") consolidates learning

Behavioral Nudge Design (Thaler & Sunstein; Pennycook & Rand)
Research suggests that simple nudges — prompts that redirect attention without restricting choice — can improve information evaluation behavior. Relevant findings:

  • Prompting people to consider accuracy before sharing (the accuracy nudge) significantly reduces sharing of false headlines
  • Friction nudges (adding an extra step before sharing) reduce inadvertent sharing
  • Social norm nudges ("Most people verify information before sharing") can shift behavior through conformity

Self-Determination Theory (Ryan & Deci)
For behavior change to be sustained, it must be motivated intrinsically, not just externally. Design implications:

  • Activities should support autonomy (choice in topics, personal relevance)
  • Competence should be built progressively with achievable challenges
  • Relatedness to community and social identity should be leveraged

2.2 The Five-Session Curriculum

Design a five-session curriculum where each session is 60–90 minutes. The following framework is a starting point; you should substantially customize it for your target audience, modifying vocabulary, examples, and activities to match their developmental level, cultural context, and motivational profile.


Session 1: The Information Environment — Understanding the Landscape

Learning Objectives:

  • Participants can describe how social media algorithms select and prioritize content
  • Participants recognize that their information environment is personalized and filtered
  • Participants identify at least three business-model incentives that shape online content

Opening Activity (15 min): The Personalized Feed Experiment
Ask participants to open their social media feed and, without scrolling, write down: What is the first piece of news content they see? What emotion does it evoke? Have they seen content from this source before? Brief pair discussion, then whole-group debrief. The goal is to activate metacognitive awareness of their own consumption patterns.

Core Content (25 min): How Your Feed Is Built
Explain at an accessible level how recommendation algorithms work: engagement optimization, the attention economy, the role of emotional content in maximizing clicks. Use concrete examples from platforms participants use. Reference the research finding that false news spreads faster and farther than true news on Twitter (Vosoughi et al., 2018) as a counterintuitive but important fact.

Skill Activity (20 min): The Lateral Reading Introduction
Introduce lateral reading as the skill of stepping outside a source to investigate it. Demonstrate with a worked example: a website making a health claim. Show how to: (1) open a new tab, (2) search for the source (not the claim), (3) use Wikipedia as a starting point, (4) find what credible sources say about this source. Have pairs practice with one new example.

Reflection and Homework (10 min): Ask participants to spend five minutes before the next session noting, in a simple log, where the information they share came from. Not to change behavior yet — just to observe.


Session 2: The Anatomy of Misinformation — Types, Techniques, and Tells

Learning Objectives:

  • Participants distinguish between misinformation, disinformation, and malinformation
  • Participants identify at least five common misinformation techniques
  • Participants recognize the emotional "tells" that signal manipulative content

Opening Activity (10 min): The Misinformation Gallery
Display six examples of content on a projector or printed handout: two clearly reliable pieces, two clearly unreliable, and two ambiguous. Ask participants to rate each 1-5 for trustworthiness individually, then discuss in pairs. Reveal labels and discuss disagreements.

Core Content (25 min): Inoculation — Techniques in Action
Using the inoculation framework, explain each major misinformation technique with a real example, then provide a refutation. Techniques to cover: emotional manipulation, impersonation, misleading statistics, false dichotomies, appeal to fake experts, selective use of true facts. For each technique: (1) name and explain it, (2) show a real example with the technique highlighted, (3) provide the refutation, (4) ask participants to generate the refutation themselves for a similar example.

Skill Activity (25 min): Spot the Technique
Provide twelve short content examples (headlines, social media posts, excerpts). Participants work individually to identify which technique (if any) is being used, then compare answers in small groups and discuss disagreements. Full-group debrief on the most contested examples.

Reflection (10 min): Ask: "Which technique do you think you are most vulnerable to? Why?" Brief pair discussion. This metacognitive exercise increases engagement and personalizes the learning.


Session 3: Evaluating Sources — The SIFT Method and Lateral Reading

Learning Objectives:

  • Participants can execute all four steps of the SIFT method fluently
  • Participants perform lateral reading independently on an unfamiliar source within 3 minutes
  • Participants identify credibility signals such as domain registration, funding transparency, and expert credentials

Opening Activity (15 min): Speed Lateral Reading
Give participants an unfamiliar source and three minutes to assess its credibility using any method they currently use. Compare findings and methods. This establishes a pre-practice baseline before teaching the systematic method.

Core Content (20 min): SIFT in Depth
Walk through each SIFT step with worked examples:

  • Stop: The most important step — interrupting the automatic engagement impulse. Practice identifying emotional triggers that should prompt a Stop.
  • Investigate the source: Not reading the source deeply, but looking it up externally.
  • Find better coverage: For specific claims, finding independent high-quality reporting on the same claim.
  • Trace claims: Locating the original source of a claim that has been republished or paraphrased.

Skill Activity (35 min): SIFT Lab
Structured practice with five sources of varying credibility. Participants work through each source using the SIFT protocol, documenting their process. Sources should include: one highly credible source, one "lookalike" impersonation site, one politically biased-but-not-fabricated source, one completely fabricated source, and one legitimate but niche expert source.

Reflection and Homework (10 min): Each participant commits to evaluating one piece of information they encounter in the next week using the SIFT method and bringing their experience to Session 4.


Session 4: Numbers, Graphs, and Data Literacy

Learning Objectives:

  • Participants identify at least five common techniques for misleading audiences through data visualization
  • Participants read statistical claims critically, distinguishing relative from absolute risk
  • Participants recognize when "data" is being used rhetorically rather than analytically

Opening Activity (15 min): "Which Graph Lies?"
Present three pairs of graphs showing the same data, each pair using a different presentation (truncated axis vs. full axis; percentage vs. absolute number; cherry-picked time period vs. full time period). Ask participants to notice what is different and what impression each gives. This is often striking and memorable.

Core Content (25 min): Data Literacy Fundamentals
Cover: truncated axes, cherry-picked baselines, relative vs. absolute risk (with a concrete health example: "50% more likely to get X" vs. "risk increases from 0.002% to 0.003%"), correlation vs. causation (with amusing examples), missing sample sizes and confidence intervals, and the difference between anecdote and evidence.

Skill Activity (30 min): Data Audit
Provide five claims drawn from real news stories, each accompanied by the underlying data. Participants work in pairs to: (1) identify any gap between what the data shows and what the claim states, (2) write a more accurate version of the claim, and (3) decide whether the original claim was misleading, merely imprecise, or accurate.

Reflection (10 min): Discussion: "Where do you encounter data-based claims most often? What makes you trust or distrust them?" Personalize to your target audience's context.


Session 5: Integration, Practice, and Personal Action Planning

Learning Objectives:

  • Participants integrate skills from previous sessions into a coherent practice
  • Participants identify the personal contexts where they are most vulnerable to misinformation
  • Participants commit to specific behavioral changes with a concrete implementation plan

Opening Activity (20 min): The SIFT + Data Challenge
A timed, integrated challenge combining all skills from the course. Participants receive a briefing packet containing five content items (articles, graphs, social media posts, one video transcript) and have 15 minutes to assess each using all tools covered. Group debrief on findings and disagreements.

Core Content (15 min): What Research Tells Us About Long-Term Habit Change
Brief, accessible summary of the behavior change literature as it applies to information hygiene: habit formation, implementation intentions ("If I see a surprising headline, I will open a new tab and look up the source before sharing"), social commitment effects, and the value of deliberate practice over passive awareness.

Personal Action Planning (20 min): Each participant completes a personal information hygiene plan: three specific situations where they commit to using a specific skill (e.g., "When I see a news story about health/medicine, I will check the source's funding before sharing"). Plans are shared with a partner who will serve as an accountability buddy for the next month.

Course Synthesis and Feedback (15 min): Final reflection on what was most useful, what was most challenging, and what participants wish they had learned. This is also the moment to administer the post-test assessment.


Phase 3: Content Development Details

For each session, you are required to develop:

  1. A detailed facilitator guide (script-level detail for the first three activities of each session)
  2. All participant handouts (source evaluation worksheets, data audit worksheets, SIFT protocol cards)
  3. All assessment items for that session's content
  4. Slide deck or visual aids (description is sufficient; full design is encouraged but not required)

The facilitator guide should include:

  • Timing notes for each activity
  • Discussion questions with expected responses and follow-up prompts for common misconceptions
  • Accommodation suggestions for participants with different prior knowledge levels
  • Notes on culturally sensitive examples for diverse audiences


Phase 4: Evaluation Design

4.1 Research Design

A pre/post design with a waitlist control group is the strongest feasible design for a student project. The design is:

  • Time 1: Administer pre-test to both treatment and control groups
  • Intervention: Treatment group receives the five-session curriculum; control group continues as normal (waitlist design — they receive the intervention after data collection is complete)
  • Time 2: Administer post-test to both groups immediately following the treatment group's final session
  • Time 3 (optional): Follow-up assessment 4-8 weeks later to measure retention

Randomization: If you have access to a school setting with multiple classrooms, classrooms should be randomized to treatment and control conditions (cluster randomization), not individual students within the same classroom (which creates contamination risk).
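The cluster assignment itself takes only a few lines; a reproducible sketch (the classroom IDs are placeholders):

```python
import random

classrooms = ["6A", "6B", "7A", "7B", "8A", "8B"]  # hypothetical classroom IDs
rng = random.Random(2024)  # fixed seed so the assignment is auditable

shuffled = classrooms[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2

# Whole classrooms (clusters) are assigned, never individual students.
assignment = {room: ("treatment" if i < half else "control")
              for i, room in enumerate(shuffled)}
```

Recording the seed in your evaluation report lets a reviewer reproduce and verify the assignment.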

4.2 Outcome Measures

Design a 30-item pre/post assessment covering:

Knowledge (10 items, 1 point each): Multiple choice and true/false items assessing factual knowledge about media literacy concepts covered in the curriculum. Example:

Which of the following best explains why social media algorithms tend to amplify emotionally provocative content?
(a) Platform engineers intentionally design algorithms to spread false information
(b) Emotionally provocative content generates more engagement signals, which algorithms optimize for
(c) Users prefer emotionally provocative content because they are poorly educated
(d) Government regulations require platforms to include viral content
(Correct answer: b)
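Scoring knowledge items is mechanical once an answer key exists; a minimal sketch with hypothetical item names and answers:

```python
# Hypothetical answer key and one respondent's answers (1 point per correct
# item, as in the knowledge section).
answer_key = {"q1_algorithms": "b", "q2_ownership": True, "q3_bias": True}
responses = {"q1_algorithms": "b", "q2_ownership": False, "q3_bias": True}

knowledge_score = sum(
    1 for item, correct in answer_key.items() if responses.get(item) == correct
)
print(knowledge_score)  # 2
```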

Skill: Credibility Assessment (10 items, 2 points each): Present 10 sources or claims and ask participants to rate their credibility on a 5-point scale. Score participants based on calibration against a pre-established answer key developed by domain experts. These items should be piloted and validated before use.
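One possible calibration rule (illustrative only; the rubric does not prescribe a specific rule, and the ratings below are invented): award the full 2 points when a participant's rating lands within one point of the expert key.

```python
expert_key = [5, 1, 2, 4, 3]   # hypothetical expert credibility ratings
participant = [4, 1, 4, 5, 3]  # one participant's ratings of the same items

# 2 points per item when within 1 point of the key, else 0.
calibration_score = sum(
    2 if abs(p - k) <= 1 else 0 for p, k in zip(participant, expert_key)
)
print(calibration_score)  # 8 of 10 possible on these five items
```

A distance-based rule like this rewards calibration rather than exact agreement; whatever rule you adopt should be fixed before piloting.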

Behavioral Intentions (5 items, 1 point each): Ask about likelihood of specific information evaluation behaviors using validated behavioral intention scales.

Self-Efficacy (5 items, 1 point each): Adapted from the News Literacy Self-Efficacy Scale, asking about confidence in specific evaluation tasks.

Total possible score: 40 points

4.3 Power Analysis

Before finalizing your sample size target, conduct a power analysis to ensure your study is adequately powered to detect a meaningful effect.

"""
capstone03/power_analysis.py
Statistical power analysis for the media literacy intervention evaluation.
"""

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
FIGURE_DIR = Path("figures")
FIGURE_DIR.mkdir(exist_ok=True)


def compute_required_sample_size(
    effect_size_d: float = 0.4,
    alpha: float = 0.05,
    power: float = 0.80,
    two_tailed: bool = True,
) -> int:
    """
    Compute required sample size per group for an independent samples t-test.

    Args:
        effect_size_d: Expected Cohen's d (0.2=small, 0.5=medium, 0.8=large)
        alpha: Significance threshold (Type I error rate)
        power: Desired statistical power (1 - Type II error rate)
        two_tailed: Whether to use two-tailed critical values

    Returns:
        Required sample size per group

    Note: For media literacy interventions, meta-analyses suggest typical
    effect sizes in the range d = 0.2 to 0.5. Planning around d = 0.3 is
    conservative; the default of d = 0.4 assumes a moderately strong
    intervention and is reasonable for an initial estimate.
    """
    alpha_per_tail = alpha / 2 if two_tailed else alpha
    z_alpha = stats.norm.ppf(1 - alpha_per_tail)
    z_power = stats.norm.ppf(power)

    # Analytical normal approximation: n = 2 * ((z_alpha + z_power) / d)^2
    # This slightly underestimates n for small samples, so the Monte Carlo
    # simulation below serves as a validation check.
    n_approx = int(np.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2))

    # Validate with simulation
    n_per_group = n_approx
    detected_power = simulate_power(effect_size_d, n_per_group, alpha, n_simulations=5000)

    logger.info(
        f"Power analysis: effect_size={effect_size_d}, alpha={alpha}, "
        f"target_power={power}"
    )
    logger.info(
        f"Required n per group: {n_per_group} "
        f"(simulated power: {detected_power:.3f})"
    )

    return n_per_group


def simulate_power(
    effect_size_d: float,
    n_per_group: int,
    alpha: float,
    n_simulations: int = 5000,
) -> float:
    """Estimate power through Monte Carlo simulation."""
    np.random.seed(42)
    rejections = 0

    for _ in range(n_simulations):
        control = np.random.normal(0, 1, n_per_group)
        treatment = np.random.normal(effect_size_d, 1, n_per_group)
        _, p_value = stats.ttest_ind(treatment, control)
        if p_value < alpha:
            rejections += 1

    return rejections / n_simulations


def plot_power_curves(
    effect_sizes: list = None,
    n_range: range = None,
):
    """Plot statistical power as a function of sample size for different effect sizes."""
    if effect_sizes is None:
        effect_sizes = [0.2, 0.3, 0.4, 0.5, 0.8]
    if n_range is None:
        n_range = range(20, 300, 10)

    fig, ax = plt.subplots(figsize=(10, 6))
    colors = ["#9E9E9E", "#2196F3", "#4CAF50", "#FF9800", "#F44336"]
    labels = ["Small (d=0.2)", "d=0.3", "d=0.4", "Medium (d=0.5)", "Large (d=0.8)"]

    for effect_size, color, label in zip(effect_sizes, colors, labels):
        powers = []
        for n in n_range:
            power = simulate_power(effect_size, n, alpha=0.05, n_simulations=1000)
            powers.append(power)
        ax.plot(list(n_range), powers, color=color, lw=2, label=label)

    ax.axhline(y=0.80, color="black", linestyle="--", alpha=0.5, label="80% power threshold")
    ax.set_xlabel("Sample Size Per Group (n)", fontsize=12)
    ax.set_ylabel("Statistical Power", fontsize=12)
    ax.set_title("Statistical Power by Effect Size and Sample Size", fontsize=13, fontweight="bold")
    ax.legend(fontsize=10)
    ax.set_ylim(0, 1.05)
    ax.set_xlim(min(n_range), max(n_range))

    plt.tight_layout()
    plt.savefig(FIGURE_DIR / "power_curves.png", dpi=150, bbox_inches="tight")
    plt.show()
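As a quick sanity check on the module above, the normal-approximation formula can be evaluated directly (this reproduces the arithmetic only, not the Monte Carlo validation):

```python
import math
from scipy.stats import norm

d = 0.3                           # conservative planning effect size
z_alpha = norm.ppf(1 - 0.05 / 2)  # two-tailed critical value, alpha = 0.05
z_power = norm.ppf(0.80)          # 80% power

n_per_group = math.ceil(2 * ((z_alpha + z_power) / d) ** 2)
print(n_per_group)  # 175 participants per group
```

A required n of roughly 175 per group for d = 0.3 illustrates why recruiting through whole classrooms or community cohorts, rather than individuals, is usually necessary.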

Phase 5: Pilot Analysis

"""
capstone03/pilot_analysis.py
Analysis of simulated pre/post intervention data.
This module generates and analyzes a simulated dataset for demonstration.
Replace simulated data with actual collected data when available.
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from pathlib import Path
import logging

logger = logging.getLogger(__name__)
FIGURE_DIR = Path("figures")
FIGURE_DIR.mkdir(exist_ok=True)
plt.style.use("seaborn-v0_8-whitegrid")


def generate_simulated_data(
    n_treatment: int = 45,
    n_control: int = 43,
    true_effect_d: float = 0.45,
    pre_mean: float = 22.0,
    pre_sd: float = 5.5,
    pre_post_correlation: float = 0.65,
    random_seed: int = 42,
) -> pd.DataFrame:
    """
    Generate simulated pre/post intervention data for a media literacy study.

    Parameters reflect plausible values based on the empirical literature
    on media literacy interventions (effect sizes typically d = 0.2-0.6,
    moderate pre-post correlation).

    Returns a DataFrame with columns:
    participant_id, group, pre_score, post_score, change_score
    """
    np.random.seed(random_seed)

    # Total score = 40 points (as defined in Phase 4)
    total_points = 40

    # Treatment group
    pre_treatment = np.random.normal(pre_mean, pre_sd, n_treatment)
    # Post-test includes improvement due to intervention + normal test-retest
    post_noise_treatment = np.sqrt(1 - pre_post_correlation**2) * pre_sd
    post_treatment = (
        pre_post_correlation * (pre_treatment - pre_mean)
        + (pre_mean + true_effect_d * pre_sd)
        + np.random.normal(0, post_noise_treatment, n_treatment)
    )

    # Control group (minimal change — some learning from pre-test exposure)
    pre_control = np.random.normal(pre_mean, pre_sd, n_control)
    post_noise_control = np.sqrt(1 - pre_post_correlation**2) * pre_sd
    pre_test_effect = 0.05 * pre_sd  # Small test-retest improvement
    post_control = (
        pre_post_correlation * (pre_control - pre_mean)
        + (pre_mean + pre_test_effect)
        + np.random.normal(0, post_noise_control, n_control)
    )

    # Clip to valid score range
    pre_treatment = np.clip(pre_treatment, 0, total_points)
    post_treatment = np.clip(post_treatment, 0, total_points)
    pre_control = np.clip(pre_control, 0, total_points)
    post_control = np.clip(post_control, 0, total_points)

    # Construct DataFrame
    treatment_df = pd.DataFrame({
        "participant_id": [f"T{i:03d}" for i in range(n_treatment)],
        "group": "treatment",
        "pre_score": pre_treatment,
        "post_score": post_treatment,
    })

    control_df = pd.DataFrame({
        "participant_id": [f"C{i:03d}" for i in range(n_control)],
        "group": "control",
        "pre_score": pre_control,
        "post_score": post_control,
    })

    df = pd.concat([treatment_df, control_df], ignore_index=True)
    df["change_score"] = df["post_score"] - df["pre_score"]
    df["pct_change"] = (df["change_score"] / df["pre_score"] * 100).round(2)

    logger.info(f"Generated simulated dataset: n={len(df)} ({n_treatment} treatment, {n_control} control)")
    return df


def compute_cohen_d(group1: np.ndarray, group2: np.ndarray) -> float:
    """Compute Cohen's d effect size for two independent groups."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd


def compute_paired_cohen_d(pre: np.ndarray, post: np.ndarray) -> float:
    """Compute Cohen's d for a paired pre/post design."""
    diff = post - pre
    return np.mean(diff) / np.std(diff, ddof=1)


def analyze_primary_outcome(df: pd.DataFrame) -> dict:
    """
    Primary outcome analysis: test whether the intervention increased
    media literacy scores relative to the control group.

    Uses:
    1. Independent-samples t-test on change scores (a simpler alternative
       to a full ANCOVA baseline adjustment)
    2. Paired t-test within the treatment group
    3. Effect size (Cohen's d)
    4. 95% confidence interval on the group difference
    """
    treatment = df[df["group"] == "treatment"]
    control = df[df["group"] == "control"]

    results = {}

    # --- Primary analysis: difference in change scores ---
    treatment_change = treatment["change_score"].values
    control_change = control["change_score"].values

    # Welch's t-test, consistent with the Welch degrees of freedom used for the CI below
    t_stat, p_value = stats.ttest_ind(treatment_change, control_change, equal_var=False)
    effect_d = compute_cohen_d(treatment_change, control_change)

    # 95% confidence interval on difference in means (Welch's)
    n1, n2 = len(treatment_change), len(control_change)
    se = np.sqrt(np.var(treatment_change, ddof=1)/n1 + np.var(control_change, ddof=1)/n2)
    df_welch = (np.var(treatment_change, ddof=1)/n1 + np.var(control_change, ddof=1)/n2)**2 / (
        (np.var(treatment_change, ddof=1)/n1)**2/(n1-1) +
        (np.var(control_change, ddof=1)/n2)**2/(n2-1)
    )
    t_crit = stats.t.ppf(0.975, df_welch)
    mean_diff = np.mean(treatment_change) - np.mean(control_change)
    ci_lower = mean_diff - t_crit * se
    ci_upper = mean_diff + t_crit * se

    results["primary"] = {
        "treatment_mean_change": float(np.mean(treatment_change)),
        "treatment_sd_change": float(np.std(treatment_change, ddof=1)),
        "control_mean_change": float(np.mean(control_change)),
        "control_sd_change": float(np.std(control_change, ddof=1)),
        "mean_difference": float(mean_diff),
        "ci_95_lower": float(ci_lower),
        "ci_95_upper": float(ci_upper),
        "t_statistic": float(t_stat),
        "p_value": float(p_value),
        "cohen_d": float(effect_d),
        "effect_interpretation": (
            "negligible" if abs(effect_d) < 0.2 else
            "small" if abs(effect_d) < 0.5 else
            "medium" if abs(effect_d) < 0.8 else
            "large"
        ),
    }

    # --- Within-treatment paired analysis ---
    t_stat_paired, p_value_paired = stats.ttest_rel(
        treatment["post_score"].values,
        treatment["pre_score"].values,
    )
    paired_d = compute_paired_cohen_d(
        treatment["pre_score"].values,
        treatment["post_score"].values
    )

    results["within_treatment"] = {
        "pre_mean": float(treatment["pre_score"].mean()),
        "post_mean": float(treatment["post_score"].mean()),
        "mean_change": float(treatment["change_score"].mean()),
        "t_statistic": float(t_stat_paired),
        "p_value": float(p_value_paired),
        "cohen_d_paired": float(paired_d),
    }

    # Log results
    logger.info("\n" + "="*60)
    logger.info("PRIMARY OUTCOME ANALYSIS RESULTS")
    logger.info("="*60)
    logger.info(f"Treatment group mean change: {results['primary']['treatment_mean_change']:.2f} "
                f"(SD = {results['primary']['treatment_sd_change']:.2f})")
    logger.info(f"Control group mean change:   {results['primary']['control_mean_change']:.2f} "
                f"(SD = {results['primary']['control_sd_change']:.2f})")
    logger.info(f"Mean difference: {results['primary']['mean_difference']:.2f} "
                f"[95% CI: {results['primary']['ci_95_lower']:.2f}, "
                f"{results['primary']['ci_95_upper']:.2f}]")
    logger.info(f"t({df_welch:.1f}) = {results['primary']['t_statistic']:.3f}, "
                f"p = {results['primary']['p_value']:.4f}")
    logger.info(f"Cohen's d = {results['primary']['cohen_d']:.3f} "
                f"({results['primary']['effect_interpretation']} effect)")

    return results


def plot_pre_post_change(df: pd.DataFrame, save: bool = True):
    """Visualize pre-test and post-test scores with change scores by group."""

    fig, axes = plt.subplots(1, 3, figsize=(16, 6))

    # One panel each for pre-test, post-test, and change scores
    for ax, (score_type, score_col, title) in zip(
        axes,
        [
            ("Pre-test", "pre_score", "Pre-test Scores by Group"),
            ("Post-test", "post_score", "Post-test Scores by Group"),
            ("Change", "change_score", "Change Scores by Group"),
        ]
    ):
        treatment_vals = df[df["group"] == "treatment"][score_col]
        control_vals = df[df["group"] == "control"][score_col]

        ax.violinplot(
            [treatment_vals.values, control_vals.values],
            positions=[1, 2],
            showmedians=True,
        )
        ax.scatter(
            1 + np.random.uniform(-0.05, 0.05, len(treatment_vals)),
            treatment_vals.values,
            alpha=0.4, s=20, color="#2196F3",
        )
        ax.scatter(
            2 + np.random.uniform(-0.05, 0.05, len(control_vals)),
            control_vals.values,
            alpha=0.4, s=20, color="#F44336",
        )
        ax.set_xticks([1, 2])
        ax.set_xticklabels(["Treatment", "Control"])
        ax.set_title(title, fontsize=11, fontweight="bold")
        ax.set_ylabel("Score (out of 40)" if score_type != "Change" else "Score Change")

    plt.suptitle(
        "Media Literacy Intervention: Pre/Post Score Distributions",
        fontsize=13, fontweight="bold", y=1.02
    )
    plt.tight_layout()

    if save:
        plt.savefig(FIGURE_DIR / "pre_post_distributions.png", dpi=150, bbox_inches="tight")
        logger.info("Saved pre/post distribution plot")
    plt.show()


def plot_individual_trajectories(df: pd.DataFrame, n_sample: int = 30):
    """Plot individual score trajectories for a sample of participants."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    for ax, group in zip(axes, ["treatment", "control"]):
        group_df = df[df["group"] == group].sample(
            min(n_sample, len(df[df["group"] == group])),
            random_state=42
        )

        color = "#2196F3" if group == "treatment" else "#F44336"
        improved = (group_df["change_score"] > 0).sum()
        declined = (group_df["change_score"] < 0).sum()
        n = len(group_df)

        for _, row in group_df.iterrows():
            lw = 1.5
            alpha = 0.4
            line_color = (
                "#1565C0" if row["change_score"] > 2 else
                "#EF9A9A" if row["change_score"] < -2 else
                "#90CAF9"
            )
            ax.plot([0, 1], [row["pre_score"], row["post_score"]],
                    color=line_color, alpha=alpha, lw=lw)

        ax.set_xticks([0, 1])
        ax.set_xticklabels(["Pre-test", "Post-test"])
        ax.set_ylabel("Score (out of 40)")
        ax.set_ylim(0, 42)
        ax.set_title(
            f"{group.capitalize()} Group (n={n})\n"
            f"Improved: {improved} ({improved/n*100:.0f}%), "
            f"Declined: {declined} ({declined/n*100:.0f}%)",
            fontsize=11, fontweight="bold"
        )

    plt.suptitle("Individual Score Trajectories", fontsize=13, fontweight="bold")
    plt.tight_layout()
    plt.savefig(FIGURE_DIR / "individual_trajectories.png", dpi=150, bbox_inches="tight")
    plt.show()


def subgroup_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """
    Exploratory subgroup analysis: did the intervention work equally well
    for participants with different baseline knowledge levels?

    This analysis is exploratory and should be interpreted cautiously
    (not as a primary outcome) due to multiple comparison concerns.
    """
    df_copy = df.copy()

    # Create pre-score tertiles
    df_copy["pre_score_tertile"] = pd.qcut(
        df_copy["pre_score"],
        q=3,
        labels=["Low baseline", "Medium baseline", "High baseline"]
    )

    results = []
    for group in ["treatment", "control"]:
        for tertile in ["Low baseline", "Medium baseline", "High baseline"]:
            mask = (df_copy["group"] == group) & (df_copy["pre_score_tertile"] == tertile)
            subset = df_copy[mask]
            if len(subset) > 0:
                results.append({
                    "group": group,
                    "baseline_level": tertile,
                    "n": len(subset),
                    "mean_change": subset["change_score"].mean(),
                    "sd_change": subset["change_score"].std(),
                    "pre_mean": subset["pre_score"].mean(),
                    "post_mean": subset["post_score"].mean(),
                })

    results_df = pd.DataFrame(results)

    # Visualize
    treatment_results = results_df[results_df["group"] == "treatment"]
    control_results = results_df[results_df["group"] == "control"]

    fig, ax = plt.subplots(figsize=(9, 6))
    x = np.arange(3)
    width = 0.35

    ax.bar(
        x - width/2,
        treatment_results["mean_change"].values,
        width,
        label="Treatment",
        color="#2196F3",
        alpha=0.85,
        yerr=treatment_results["sd_change"].values / np.sqrt(treatment_results["n"].values),
        capsize=5,
    )
    ax.bar(
        x + width/2,
        control_results["mean_change"].values,
        width,
        label="Control",
        color="#F44336",
        alpha=0.85,
        yerr=control_results["sd_change"].values / np.sqrt(control_results["n"].values),
        capsize=5,
    )

    ax.set_xticks(x)
    ax.set_xticklabels(["Low Baseline", "Medium Baseline", "High Baseline"])
    ax.set_ylabel("Mean Change Score (with SE)", fontsize=12)
    ax.set_title("Intervention Effect by Baseline Knowledge Level\n(Exploratory Subgroup Analysis)", fontsize=12, fontweight="bold")
    ax.legend()
    ax.axhline(y=0, color="black", linewidth=0.8)
    ax.annotate(
        "Note: Subgroup analyses are exploratory.\nInterpret with caution.",
        xy=(0.98, 0.02),
        xycoords="axes fraction",
        ha="right",
        va="bottom",
        fontsize=8,
        color="gray",
    )

    plt.tight_layout()
    plt.savefig(FIGURE_DIR / "subgroup_analysis.png", dpi=150, bbox_inches="tight")
    plt.show()

    return results_df


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    # Generate simulated data
    df = generate_simulated_data(
        n_treatment=45,
        n_control=43,
        true_effect_d=0.45,
    )

    # Run power analysis (power_analysis is the module built in an earlier phase;
    # skip gracefully if it is not on the import path)
    try:
        from power_analysis import compute_required_sample_size, plot_power_curves
        n_required = compute_required_sample_size(effect_size_d=0.40)
        plot_power_curves()
        logger.info(f"Required sample size per group (d = 0.40): {n_required}")
    except ImportError:
        logger.warning("power_analysis module not found; skipping power analysis")

    # Primary outcome analysis
    results = analyze_primary_outcome(df)

    # Visualizations
    plot_pre_post_change(df)
    plot_individual_trajectories(df)
    subgroup_df = subgroup_analysis(df)

    print("\nSubgroup Analysis Results:")
    print(subgroup_df.to_string(index=False))

Phase 6: Refinement and Reporting

Interpreting Your Results Honestly

One of the most important things this project tests is your intellectual honesty in reporting findings. Apply these standards:

Report effect sizes alongside p-values. A statistically significant result with a tiny effect size (d = 0.05) may not be educationally meaningful. A non-significant result with a moderate effect size (d = 0.35) in a small sample may be worth replicating at scale. P-values alone are not sufficient.
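
This trade-off follows directly from the relationship t ≈ d·√(n/2) for two equal-sized groups. A quick illustration with scipy (the effect sizes and sample sizes below are hypothetical, not results from this project):

```python
import numpy as np
from scipy import stats

def p_from_d(d: float, n_per_group: int) -> float:
    """Two-sided p-value implied by an observed Cohen's d with two equal groups."""
    t = d * np.sqrt(n_per_group / 2)   # t = d * sqrt(n1*n2 / (n1+n2))
    df = 2 * n_per_group - 2
    return 2 * stats.t.sf(abs(t), df)

# Negligible effect, enormous sample: "significant" but not meaningful
print(p_from_d(0.05, 20000))   # well below 0.001

# Moderate effect, small pilot: non-significant but worth replicating
print(p_from_d(0.35, 25))      # roughly 0.22
```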

Report null results without minimizing them. If your analysis finds no significant effect, that is a genuine finding. Describe it clearly, discuss plausible explanations (insufficient dosage, ceiling effect, measurement problems, sample size limitations), and recommend specific modifications based on what you observe.
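
One concrete way to report a null result without minimizing it is to lead with the confidence interval, which shows which effect sizes the data can and cannot rule out. A sketch using illustrative summary statistics (not results from this project):

```python
from scipy import stats

# Hypothetical summary statistics for a non-significant group comparison
mean_diff, se, df = 1.1, 0.9, 80

t_crit = stats.t.ppf(0.975, df)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
print(f"95% CI for the group difference: [{ci[0]:.2f}, {ci[1]:.2f}] points")
# The interval spans zero (non-significant) but also cannot rule out gains
# of almost 3 points -- "no significant effect" is not "no effect".
```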

Distinguish pre-specified from exploratory analyses. If you test five different outcomes and one is significant, the probability that this is a false positive is much higher than the stated alpha level. Label exploratory analyses clearly and interpret them with appropriate caution.
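
The inflation is easy to quantify: with k independent tests each at α = 0.05, the probability of at least one false positive is 1 − 0.95^k, and a Bonferroni correction restores the familywise rate by dividing α by k. Pure arithmetic, shown here for a few values of k:

```python
alpha = 0.05

def familywise_error(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 3, 5, 10):
    print(f"k={k:2d}: familywise error = {familywise_error(k):.3f}, "
          f"Bonferroni per-test alpha = {alpha / k:.4f}")
# At k=5 the familywise error rate is already about 0.23, not 0.05.
```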

Acknowledge limitations specifically. Not "this study has limitations" but "the primary limitation of this study is that the sample was recruited from a single undergraduate psychology course at a selective university, limiting generalizability to the intended target audience of high school students."


Deliverables Checklist

  • [ ] Needs assessment literature review (1,500–2,500 words, minimum 15 peer-reviewed sources)
  • [ ] Theory of change document with logic model diagram
  • [ ] Complete five-session curriculum with facilitator guides and participant handouts
  • [ ] Pre/post assessment instrument (30 items with answer key)
  • [ ] Evaluation design document describing research design, randomization, power analysis, and outcome measures
  • [ ] Analysis notebook (using the code in Phase 5, either with simulated data or actual collected data)
  • [ ] Figures: power curves, pre/post distributions, individual trajectories, subgroup analysis
  • [ ] Final report (3,500–5,000 words following the structure below)
  • [ ] Ethical analysis section addressing the ethics of epistemic intervention

Final Report Structure

  1. Abstract (250 words): Research question, target audience, design, key findings, implications
  2. Introduction (500–700 words): Significance of media literacy education, gap in existing work, this project's contribution
  3. Needs Assessment (500–700 words): Target audience profile, knowledge gaps identified, theory of change
  4. Curriculum Design (600–800 words): Theoretical frameworks applied, curriculum overview, design decisions and rationale
  5. Evaluation Design (400–500 words): Research design, power analysis, outcome measures, limitations
  6. Results (500–700 words): Quantitative findings with appropriate statistics, effect sizes, and confidence intervals
  7. Discussion (400–600 words): Interpretation of findings, comparison with prior literature, limitations, recommendations
  8. Ethical Considerations (300–400 words): Who benefits and who might be harmed? What assumptions does this intervention make about what constitutes good epistemic practice? How should the intervention handle politically contested empirical claims?
  9. Conclusion (200–300 words): Summary and implications for practice and future research

Grading Rubric

Each criterion is scored 0–10 points.

  1. Needs Assessment Quality: Literature review is thorough and well-synthesized. Theory of change is specific and well-grounded. Target audience is well-defined.
  2. Theoretical Grounding: Curriculum design is explicitly connected to at least two theoretical frameworks. Connections are specific, not generic.
  3. Curriculum Quality: All five sessions are fully developed. Activities are specific, relevant, and appropriately challenging. Facilitator guides are detailed and useful.
  4. Assessment Instrument: Items are well-constructed, cover the learning objectives, and are aligned with the theoretical framework. Difficulty level is appropriate.
  5. Evaluation Design Rigor: Research design adequately addresses confounds. Power analysis conducted correctly. Randomization plan is feasible.
  6. Statistical Analysis: Correct statistical tests applied. Effect sizes computed and interpreted. Confidence intervals reported.
  7. Visualization Quality: Figures are clear, accurately labeled, and support the narrative. Appropriate figure types chosen for data.
  8. Intellectual Honesty: Results reported accurately including null findings. Limitations described specifically. Exploratory analyses distinguished from confirmatory.
  9. Ethical Analysis: Substantive engagement with the ethics of epistemic intervention. Potential harms considered. Value assumptions examined.
  10. Report Quality: Report is well-organized, clearly written, and appropriate for a professional audience. Abstract accurately represents the work.

Total: 100 points


Environment Setup

pip install pandas numpy scipy matplotlib seaborn
pip install jupyter notebook ipykernel
pip install scikit-learn
pip install statsmodels  # For more advanced regression analyses
pip install pingouin    # Cleaner statistical output with effect sizes
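
As an example of the "more advanced regression analyses" that statsmodels supports, a full ANCOVA regresses post-test score on pre-test score plus a treatment indicator, which is usually more precise than the change-score t-test. A sketch on freshly simulated data (all parameter values are illustrative; column names follow the Phase 5 generator where possible):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 60  # per group (illustrative)
pre = rng.normal(22, 5.5, 2 * n)
treat = np.repeat([1, 0], n)  # 1 = treatment, 0 = control
# True model: baseline carries over with slope 0.65; treatment adds ~2.2 points
post = 22 + 0.65 * (pre - 22) + 2.2 * treat + rng.normal(0, 4.2, 2 * n)
df = pd.DataFrame({"pre_score": pre, "post_score": post, "treat": treat})

# ANCOVA: the group comparison is adjusted for baseline score
model = smf.ols("post_score ~ pre_score + treat", data=df).fit()
print(model.summary().tables[1])  # 'treat' coefficient = baseline-adjusted effect
```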

For optional advanced analyses:

pip install pymc      # Bayesian analysis (for the ambitious)
pip install rpy2      # R interface for mixed-effects models (if R is installed)