Capstone 2 Grading Rubric: The Misinformation Tracker

Total points: 100. Deliverable weights: D1: 15 pts | D2: 25 pts | D3: 25 pts | D4: 20 pts | D5: 15 pts


Deliverable 1: Claim Taxonomy and Intake Design (15 points)

Component 1A: Claim Taxonomy Decision Tree (5 points)

Score Criteria
5 Decision tree is clearly specified with four distinct, operationalizable steps. Each step has a concrete decision criterion (not just a name). Both example claims (one that clears the filter, one that does not) are specific and well-chosen, with reasoning that demonstrates understanding of what makes a claim "verifiable," "material," and "attributable."
4 Decision tree is complete and mostly clear. Examples are specific and appropriate. Minor ambiguity in one or two decision criteria.
3 Decision tree covers the essential steps but at least one step lacks a concrete decision criterion. Examples are adequate but generic or underdeveloped.
2 Decision tree is incomplete (missing one or more steps) or decision criteria are circular or vague. Examples are poorly chosen or not connected to the criteria.
1 Decision tree is present but substantially incomplete or incorrect. Little evidence of engagement with what makes claims verifiable.
0 Not submitted or entirely off-task.

Component 1B: Rating Rubric Operationalization (5 points)

Score Criteria
5 A1-A4 and I1-I3 scales are each given concrete, distinguishable decision criteria. For each accuracy level, the student specifies what type of evidence would support that rating and — critically — what distinguishes adjacent levels (A1 from A2, A2 from A3, etc.). Impact levels are given quantitative or semi-quantitative criteria (e.g., specific impression thresholds or source-type criteria) rather than subjective descriptions.
4 Rating criteria are present and mostly distinguishable. Adjacent level differentiation is addressed for most pairs. Impact levels are grounded in evidence considerations.
3 Rating criteria cover all levels but fail to adequately distinguish adjacent levels. "High impact" and "medium impact" may be defined in ways that overlap.
2 Criteria present but not operationalized — definitions describe the category name rather than providing decision rules.
1 Rating rubric is substantially underdeveloped.
0 Not present.

Component 1C: Nonpartisanship Protocol (3 points)

Score Criteria
3 At least three specific procedural mechanisms are identified (not just principles or intentions). Each procedure addresses a distinct potential source of bias (e.g., intake selection bias, rating bias, impact assessment bias). Procedures are concrete enough that an external reviewer could audit compliance.
2 Two or three procedures are identified, but at least one is vague or duplicative, or the mechanisms are stated at the level of intention rather than procedure.
1 One or two generic principles stated but no specific procedural mechanisms.
0 Not present, or consists entirely of principles with no procedural content.

Component 1D: Limitations Statement (2 points)

Score Criteria
2 Limitations statement is honest and specific: it identifies what the tracker cannot do (not just what it chooses not to do), why those limitations exist (methodological or resource constraints), and what this means for how readers should interpret the tracker's outputs. Written in accessible language suitable for a general audience. 200-300 words.
1 Limitations are present but generic or overly formal. Reads as a legal disclaimer rather than a substantive communication to readers.
0 Not present.

Deliverable 2: Spread Analysis for Two Claims (25 points)

Each of the five components below is scored out of 5 for each claim, then averaged across the two claims (5 components × 5 points = 25 points total).

Component 2A: Claim Documentation (5 points, avg. across both claims)

Score Criteria
5 Both analyses begin with exact claim text (quoted, not paraphrased), specific attribution (not just "Whitfield said" but specific venue, date, and format), and source type classification with justification. The documentation is precise enough that an independent analyst could verify the claim's origin.
4 Documentation is complete for both claims with minor gaps (e.g., date is approximate rather than exact, or source type classification lacks justification).
3 Documentation is complete for one claim and partial for the other.
2 Documentation is partial for both claims, with missing attribution details or imprecise claim text.
1 Documentation is present but inadequate — student has paraphrased rather than quoted, or attribution is vague.
0 Not present.

Component 2B: Verification Analysis (5 points, avg. across both claims)

Score Criteria
5 Primary sources relevant to the claim are identified and specifically cited (not just named). The verification analysis distinguishes between what the sources directly show and what they allow the analyst to infer. The A1-A4 rating is assigned with explicit reference to the rating criteria from Deliverable 1. A "what would change this rating" statement specifies the evidence type that would warrant a different rating — not just "new evidence" but what kind of evidence.
4 Verification is grounded in specific sources. Rating is assigned with justification. "What would change this" is present but could be more specific.
3 Verification analysis is present and reaches a reasonable conclusion, but the connection between the evidence and the rating is not fully articulated. Either sources are named without specification or the rating rationale is not connected to the rubric criteria.
2 Verification present but superficial — the student has named relevant topics but not engaged with actual evidence.
1 Verification analysis is substantially absent or the rating is assigned without any evidence-based justification.
0 Not present.

Component 2C: Spread Pathway Analysis (5 points, avg. across both claims)

Score Criteria
5 Five-stage pathway model is applied to both claims with specific, data-supported content at each stage. Student uses oda_media.csv data to identify specific articles, sources, or patterns at each pathway stage rather than describing the pathway in general terms. Where data is unavailable for a stage, the student notes this limitation explicitly.
4 Pathway model applied to both claims with some data support. One or two stages are described without specific data evidence.
3 Pathway model applied but primarily descriptively without data connection. Student describes what "probably happened" rather than what the data shows.
2 Pathway model is mentioned but not meaningfully applied to the specific claims. Generic description of how misinformation spreads.
1 Little evidence of pathway analysis.
0 Not present.

Component 2D: Correction Gap Estimate (5 points, avg. across both claims)

Score Criteria
5 Correction gap is estimated for both claims using a transparent method: original reach estimate with data source, correction reach estimate with data source, and a correction coverage ratio. The student acknowledges uncertainty in these estimates. The student draws a conclusion about the correction gap's significance for each claim rather than just reporting numbers.
4 Correction gap estimated with reasonable methodology. Uncertainty acknowledged. One claim may lack a specific correction reach estimate (acceptable if the student notes that no corrections were found in the dataset and discusses implications).
3 Correction gap is discussed but estimated imprecisely or without clear methodology. Student compares "big" vs. "small" without specific estimates.
2 Correction gap is mentioned but not estimated or analyzed.
1 Correction gap concept is not engaged with.
0 Not present.
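For graders checking Component 2D, the top band's "transparent method" amounts to simple arithmetic once the reach estimates are in hand. A minimal sketch, with placeholder numbers a student would replace with dataset-backed estimates:

```python
# Illustrative correction-gap calculation; the reach values below are
# placeholders, not figures from oda_media.csv.
original_reach = 250_000    # e.g., summed estimated impressions for articles carrying the claim
correction_reach = 30_000   # e.g., summed estimated impressions for correction coverage

coverage_ratio = correction_reach / original_reach
print(f"Correction coverage ratio: {coverage_ratio:.2f} "
      f"(corrections reached roughly {coverage_ratio:.0%} of the original audience)")
```

What earns the top band is not the ratio itself but stating where each reach number came from and drawing a conclusion about what the gap means.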

Component 2E: Impact Rating (5 points, avg. across both claims)

Score Criteria
5 I1-I3 rating assigned for both claims with explicit justification connected to the student's own rubric criteria from Deliverable 1. If the two claims receive different impact ratings (e.g., one is I1 and the other I2), the student explains the factors driving the difference.
4 Impact rating assigned with justification for both claims. Justification is connected to evidence (data on reach, source type) but not fully connected to Deliverable 1 criteria.
3 Impact rating assigned for both claims but justification is largely assertion-based.
2 Impact rating assigned without meaningful justification.
1 Impact rating is absent or entirely unjustified.
0 Not present.

Deliverable 3: Automated Detection Pipeline (25 points)

Component 3A: VADER Sentiment Flagging (6 points)

Score Criteria
6 VADER implementation is syntactically correct and produces interpretable results. The student reports: total articles flagged, precision against fact-checker-labeled ground truth (i.e., what fraction of flagged articles are labeled misinformation?), and a brief interpretation of what the precision estimate means for the pipeline's usefulness. Code is clearly commented.
5 Implementation correct, results reported, interpretation present. Minor issues with precision calculation or interpretation.
4 Implementation correct. Results reported (flag count). Precision calculation is absent or incorrect.
3 VADER implemented but threshold application or watchlist construction has errors. Results are reported but may reflect the error.
2 VADER code is present but does not run correctly or produces nonsensical results without the student noticing.
1 VADER is attempted but code is non-functional.
0 Not present.
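As a grading reference for Component 3A, a minimal sketch of a VADER flagging pass that would satisfy the top band. The text column name, the -0.5 threshold, and the "false" label value are assumptions; only oda_media.csv and factcheck_rating come from the assignment.

```python
# Minimal VADER flagging sketch. The "text" column, the -0.5 threshold, and
# the "false" label value are assumptions about the dataset.
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

df = pd.read_csv("oda_media.csv")
analyzer = SentimentIntensityAnalyzer()

# Compound score runs from -1 (most negative) to +1 (most positive).
df["compound"] = df["text"].apply(lambda t: analyzer.polarity_scores(str(t))["compound"])

# Flag strongly negative articles.
df["flagged"] = df["compound"] <= -0.5

# Precision against fact-checker labels: of the flagged articles, what
# fraction are actually labeled misinformation?
flagged = df[df["flagged"]]
precision = (flagged["factcheck_rating"] == "false").mean()
print(f"Flagged {len(flagged)} of {len(df)} articles; precision: {precision:.2f}")
```

A student's exact threshold and column handling may differ; grade on whether the flag count, the precision estimate, and its interpretation are all present and coherent.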

Component 3B: TF-IDF Classifier (10 points)

Score Criteria
10 Classifier implementation is complete and correct: binary labels are correctly created from factcheck_rating, a TF-IDF vectorizer is fit on training data only (not the full dataset), cross-validation reports F1, precision, and recall for the misinformation class (not just overall accuracy), and top 10 features for each class are listed and briefly interpreted (do the features make intuitive sense?). Class imbalance is addressed (e.g., class_weight='balanced').
9 All components present. Minor issue: one metric missing, features listed without interpretation, or class imbalance not addressed.
8 Cross-validation and features present. Metrics reported but not well interpreted. Class imbalance not addressed.
7 Classifier functional. Cross-validation present. Either feature analysis or class imbalance handling is missing.
6 Classifier functional but cross-validation methodology has a flaw (e.g., fitting the vectorizer on the full dataset before the train/test split). Results are reported.
5 Classifier partially functional. Results present but affected by methodological errors.
3-4 Classifier attempted but non-functional or substantially incorrect; methodology notes show partial understanding of the correct approach.
1-2 Classifier code is present but does not run, with little evidence of the correct approach.
0 Not present.
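As a grading reference for Component 3B, a sketch of the leakage-safe setup the top band describes: the vectorizer lives inside a Pipeline so each cross-validation fold fits TF-IDF on its training split only. Column and label names are assumptions carried over from the VADER sketch above.

```python
# TF-IDF classifier sketch; reuses df from the VADER sketch above.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

y = (df["factcheck_rating"] == "false").astype(int)  # 1 = misinformation (assumed label value)

# Vectorizer inside the pipeline -> fit on each fold's training data only.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, stop_words="english")),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Report F1, precision, and recall for the misinformation (positive) class.
scores = cross_validate(clf, df["text"], y, cv=5, scoring=["f1", "precision", "recall"])
for m in ["f1", "precision", "recall"]:
    print(f"{m}: {scores['test_' + m].mean():.3f}")

# Fit once on everything to inspect the top 10 features per class.
clf.fit(df["text"], y)
terms = np.array(clf.named_steps["tfidf"].get_feature_names_out())
coefs = clf.named_steps["model"].coef_[0]
print("Top misinformation terms:", terms[np.argsort(coefs)[-10:]][::-1])
print("Top non-misinformation terms:", terms[np.argsort(coefs)[:10]])
```

A different classifier (e.g., Naive Bayes) is acceptable; the gradable elements are the leakage-safe vectorizer placement, per-class metrics, class-imbalance handling, and feature interpretation.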

Component 3C: False Positive Analysis (5 points)

Score Criteria
5 Student manually reviews 20 high-probability classifier flags, correctly distinguishes true positives from false positives and meta-coverage (articles about misinformation, not propagating it), and reports a precision estimate with appropriate uncertainty acknowledgment. The student discusses what types of content are generating false positives and what that implies for the pipeline.
4 Manual review conducted. Precision estimate reported. False positive types discussed but without meta-coverage distinction.
3 Manual review conducted. Precision estimate reported without meaningful discussion of false positive patterns.
2 Manual review claimed but methodology unclear. No systematic coding.
1 False positive analysis is purely theoretical — student discusses the concept but does not apply it to their classifier's output.
0 Not present.
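The manual review in Component 3C is hand-coding, not computation, but pulling the sample reproducibly looks something like the sketch below (continuing from the classifier sketch above; the three-way coding scheme and file names are assumptions).

```python
# Pull the 20 highest-probability flags for manual review (reuses clf and df
# from the classifier sketch above).
df["p_misinfo"] = clf.predict_proba(df["text"])[:, 1]
review = df.nlargest(20, "p_misinfo")
review.to_csv("manual_review.csv", index=False)

# After hand-coding each row as true_positive, false_positive, or meta_coverage:
coded = pd.read_csv("manual_review_coded.csv")
precision_est = (coded["code"] == "true_positive").mean()
print(f"Manual-review precision estimate: {precision_est:.2f} (n=20, so quite uncertain)")
```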

Component 3D: Source Credibility Downweighting (4 points)

Score Criteria
4 Source credibility adjustment is implemented using source_type. Student demonstrates how the adjustment changes flag rates for different source types (e.g., showing flag rate before and after adjustment by source_type). Interpretation discusses whether the adjustment seems to be working as intended.
3 Adjustment implemented. Flag rate changes shown. Minimal interpretation.
2 Adjustment attempted but implementation has errors (e.g., adjusting in the wrong direction, or adjusting all sources equally).
1 Adjustment discussed but not implemented.
0 Not present.
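One plausible shape for the Component 3D adjustment, continuing the sketches above. The weight values and the source_type categories shown are illustrative assumptions; only the source_type column comes from the assignment.

```python
# Downweight flag probabilities by source credibility. The weights and the
# source_type categories here are illustrative assumptions.
weights = {"wire_service": 0.6, "national_outlet": 0.8, "blog": 1.0, "anonymous_social": 1.2}

# After scaling, p_adjusted is a flag score rather than a probability.
df["p_adjusted"] = df["p_misinfo"] * df["source_type"].map(weights).fillna(1.0)

# Show flag rates by source type before and after the adjustment.
before = (df["p_misinfo"] >= 0.5).groupby(df["source_type"]).mean()
after = (df["p_adjusted"] >= 0.5).groupby(df["source_type"]).mean()
print(pd.DataFrame({"flag_rate_before": before, "flag_rate_after": after}))
```

The before/after comparison by source_type is what the top band asks for; the specific weighting scheme is the student's choice, provided it moves flag rates in the intended direction.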

Deliverable 4: Draft Public Rating (20 points)

Component 4A: Rating Card (5 points)

Score Criteria
5 Rating card contains all required elements: claim text (verbatim), attribution, accuracy rating (A1-A4), impact rating (I1-I3), and plain-language explanation (150 words or fewer). The plain-language explanation is genuinely accessible — it explains the rating in terms a general reader can understand without specialized vocabulary, while still conveying the essential evidence basis for the rating.
4 All elements present. Plain-language explanation is mostly accessible but uses one or two terms that require political science background.
3 All elements present. Explanation is partially accessible but relies on analytical vocabulary not appropriate for a general audience.
2 One or more required elements missing.
1 Rating card is substantially incomplete.
0 Not present.

Component 4B: Methodology Details (8 points)

Score Criteria
8 Evidence chain is complete and transparent: all primary sources cited with specifics (not just named), all sources together form a coherent evidentiary argument for the rating, expert consultation is realistic and adds analytical value (not just confirmation of a pre-existing conclusion), campaign response is included and addressed (even if fictional, it should engage with a realistic objection), and the "what would change this rating" statement is specific about evidence type.
7 Evidence chain complete. One element (expert consultation or campaign response) is thin but present. "What would change" is present but could be more specific.
6 Evidence chain covers the essentials but is missing one component or one component is superficial.
5 Evidence chain is present but significant gaps: sources are named without specifics, or the argument from evidence to rating is not fully articulated.
4 Evidence chain is present but underdeveloped. Expert consultation or campaign response is absent.
2-3 Methodology details are substantially incomplete.
1 Minimal methodology detail present.
0 Not present.

Component 4C: Spread Data Section (4 points)

Score Criteria
4 Reach estimate is provided with a transparent method (not just a number, but how it was estimated). Correction reach is estimated or explicitly noted as unmeasured (with explanation of why). Correction gap statement is specific about what the gap means for public information about this claim.
3 Reach and correction reach estimated. Correction gap noted. Method not fully transparent.
2 Estimates present but without methodological basis.
1 Spread data section is largely absent or purely generic.
0 Not present.

Component 4D: Internal Reviewer Note (3 points)

Score Criteria
3 Internal note identifies a genuine judgment call in the rating (a point where the evidence was ambiguous, two reasonable analysts might disagree, or the rating criteria didn't cleanly apply). Student explains how they resolved the judgment call and acknowledges residual uncertainty. Note demonstrates intellectual honesty about the limits of the rating.
2 Judgment call identified but not developed. Resolution explained without acknowledgment of residual uncertainty.
1 Reviewer note present but identifies only obvious choices rather than genuine difficulty.
0 Not present.

Deliverable 5: Ethics and Equity Audit Report (15 points)

Component 5A: Accountability Structure (3 points)

Score Criteria
3 At least two distinct accountability mechanisms beyond self-review, with specific descriptions of how each mechanism works, who is involved, and how it would catch errors that the lead analyst might miss. The mechanisms are complementary (addressing different failure modes) rather than redundant.
2 Two mechanisms described but one is vague or they overlap significantly.
1 One mechanism described, or both are described in general terms without specifics.
0 Not present.

Component 5B: Asymmetric Amplification Assessment (3 points)

Score Criteria
3 Assessment correctly identifies whether the specific claim poses an amplification risk by assessing: how widely the claim was already known before the fact-check, whether the tracker's minimum-reach threshold policy applies, and what the expected net effect of the fact-check (debunking minus amplification) is likely to be. Conclusion is specific to the claim rather than generic.
2 Assessment addresses amplification risk but does not connect it specifically to the claim's reach profile or the minimum-reach threshold policy.
1 Amplification risk is mentioned but the assessment is generic (describes the concept without applying it to the specific claim).
0 Not present.

Component 5C: Free Speech Boundary (3 points)

Score Criteria
3 Example is correctly identified as falling outside the tracker's scope with a clear, specific reason (opinion vs. verifiable claim, genuine expert disagreement, values claim, etc.). The reasoning demonstrates understanding of the distinction between what can be rated as false and what cannot — and why this boundary matters for the tracker's legitimacy.
2 Example is plausible but the boundary reasoning is incomplete or slightly misdirected.
1 Example is identified but the reasoning is essentially "this is too controversial" rather than a principled boundary argument.
0 Not present.

Component 5D: Equity Audit (4 points)

Score Criteria
4 Two genuine equity gaps are identified (not just the language access and source coverage gaps described in the main text — student should identify gaps in their own tracker design). Each gap is specific: what community is affected, how the gap manifests in the methodology, and what the consequence is for that community's representation. Each remediation is concrete and feasible.
3 Two gaps identified. One is well-developed with a specific remediation; the other is present but less developed.
2 One gap with a specific remediation, or two gaps without substantive remediations.
1 Equity audit is present but addresses only the gaps explicitly identified in the main capstone text without applying the framework to the student's own design.
0 Not present.

Component 5E: Campaign Pushback Response (2 points)

Score Criteria
2 Response correctly: (a) takes the new evidence seriously rather than dismissing it; (b) assesses whether the criminologist's characterization ("genuinely complex") constitutes evidence that would change the rating (it likely does not on its own, but the student should reason through why); (c) maintains the rating if the evidence doesn't change it, with clear justification; (d) invites the campaign to provide specific documentation if they have additional evidence. Response is professional, not defensive.
1 Response is present but either overly conciliatory (suggesting the rating will be reconsidered without new evidence) or too dismissive (not engaging with the new information at all).
0 Not present or entirely non-responsive.

Grade Thresholds

Score Range Grade
90-100 A
80-89 B
70-79 C
60-69 D
Below 60 F

Notes for Graders

On the Python deliverable: Grade the code on whether it runs correctly and whether the results are interpreted accurately. Students who write clean, commented code that reaches a correct conclusion should receive full marks for the relevant components even if there are minor stylistic differences from the model solution. Students who write code that does not run correctly but demonstrates clear conceptual understanding may receive partial credit (up to 60% of component points) if their methodology notes show they understood the correct approach.

On claim ratings: There is reasonable latitude in how claims are rated at the margin. A claim rated A2 where the model answer is A1 is not automatically wrong — if the student's reasoning for A2 is coherent and grounded in the evidence, give full methodological credit. The most penalizable errors are: rating opinions or arguments as misinformation, failing to distinguish adjacent rating levels, and applying different evidence thresholds to claims from different campaigns.

On equity: The equity audit is the deliverable where students most commonly lose points by treating it as a formality. The best audits identify genuine tensions in the methodology — places where the tracker's design creates equity gaps — and propose remediations that require actual methodological change, not just policy statements.

Extra credit: Students who complete the optional LDA topic modeling component in Deliverable 3 may receive up to 3 extra credit points. Award points based on: correct implementation (1 pt), plausible topic identification (1 pt), and insightful interpretation (1 pt).
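For reference when grading the extra credit, a minimal shape of an LDA pass (sklearn shown here; gensim is equally acceptable). The parameters are illustrative, and df is reused from the Deliverable 3 sketches above.

```python
# Minimal LDA topic-modeling sketch; n_components and max_features are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

vec = CountVectorizer(max_features=5000, stop_words="english")
X = vec.fit_transform(df["text"])  # reuses df from the Deliverable 3 sketches

lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(X)

# Print the top 8 terms per topic for the plausibility check.
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-8:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```

Award the interpretation point for connecting the recovered topics to the tracker's claims, not merely for listing term clusters.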