Capstone Project 3: Social Justice Data Audit
Project Overview
You are a research analyst at a policy think tank. Your organization has been asked to investigate whether a particular system — educational, criminal justice, employment, housing, or lending — shows evidence of disparities or bias that could affect specific groups. Your job is to let the data speak, honestly and rigorously, and to present your findings in a way that informs policy without oversimplifying the picture.
This is not advocacy dressed up as analysis. It's genuine statistical investigation applied to questions of fairness and equity. You may find evidence of significant disparities, or you may find that the data tells a more nuanced story than expected. Either outcome is valuable. The point is to do the work with integrity.
This project requires every major technique from the course — descriptive statistics, visualization, inference, regression, and ethical reasoning — applied to one of the most consequential areas where statistics meets the real world.
Estimated time: 15-25 hours over 2-3 weeks
Deliverables:
1. A Jupyter notebook containing all code, analysis, and narrative (the "technical report")
2. A 3-page policy brief written for legislators or administrators
3. A 1-page ethical reflection
Step 1: Choose Your Dataset and Investigation Question
Select a publicly available dataset related to social equity from one of the following domains (or propose an alternative with your instructor's approval):
Domain 1: Education
- Civil Rights Data Collection (CRDC): U.S. Department of Education data on school discipline, access to advanced courses, teacher quality, and resource allocation — broken down by race, gender, and disability status.
- National Center for Education Statistics (NCES): Graduation rates, test scores, school funding, and demographics.
- College Scorecard: Post-graduation earnings, debt levels, and completion rates by institution.

Investigation examples:
- Do suspension rates differ significantly by race after controlling for school size and poverty level?
- Is there an association between the percentage of students of color in a school and access to AP courses?
- Do students from different income backgrounds have significantly different loan repayment rates, even at similar types of institutions?

Domain 2: Criminal Justice
- Stanford Open Policing Project: Traffic stop data from multiple states, including driver demographics and stop outcomes.
- The Sentencing Project / U.S. Sentencing Commission: Federal sentencing data with demographic variables.
- Local police department open data: Many cities publish arrest, use-of-force, or complaint data.

Investigation examples:
- Are drivers of certain racial groups searched at significantly higher rates during traffic stops, even after controlling for stop reason and location?
- Is there a significant difference in sentence length by race for similar offenses, controlling for criminal history?
- Do complaint rates against officers vary by precinct demographics?

Domain 3: Employment and Hiring
- Bureau of Labor Statistics / Current Population Survey: Employment rates, wages, and occupational data by demographics.
- EEOC charge data: Discrimination complaint data by type and basis.
- Glassdoor or PayScale salary data (publicly available subsets).

Investigation examples:
- Is there a significant gender pay gap in a specific industry after controlling for experience, education, and job level?
- Do callback rates for job applications differ by applicant name characteristics? (audit study datasets)
- Is there an association between workforce diversity and company performance metrics?

Domain 4: Housing and Lending
- Home Mortgage Disclosure Act (HMDA) data: Mortgage application outcomes by race, income, and geography.
- HUD Fair Housing complaints: Discrimination complaint data.
- Zillow / Redfin open data: Housing prices and neighborhood demographics.

Investigation examples:
- Are mortgage denial rates significantly higher for minority applicants after controlling for income, credit score, and loan-to-value ratio?
- Is there a significant correlation between neighborhood racial composition and home value appreciation over the past decade?
- Do housing code violation rates differ by neighborhood demographics?
Your investigation question must:
- Involve a clear comparison between groups defined by a protected or socially relevant characteristic (race, gender, income, disability, geography, etc.)
- Be answerable with the data available — don't claim to measure what the data doesn't contain
- Be framed neutrally: you're investigating whether a disparity exists, not assuming it does
- Require both descriptive and inferential methods
Deliverable for this step: A 300-400 word statement including:
- Your investigation question
- The dataset you've chosen and its source
- Why this question matters for policy
- Your initial hypothesis
- A brief acknowledgment of what the data can and cannot tell you (e.g., "this data can show whether disparities exist in outcomes but cannot by itself prove intentional discrimination")
Step 2: Data Acquisition and Preparation
Download your dataset and prepare it for analysis. Every step must be documented in your Jupyter notebook.
Required tasks:
- Load and inspect the data. Report dimensions, variable types, and summary statistics. Pay special attention to demographic variables: How are race, gender, income, or other group identifiers coded? Are categories granular enough, or too aggregated?
- Create a data dictionary. For each variable, record:
  - Variable name and description
  - Type (categorical or numerical)
  - Coding scheme (especially for demographic variables — note any limitations, e.g., "race is coded as five categories; multiracial individuals are classified as 'other'")
  - Source and collection method
- Assess data quality with an equity lens. In addition to standard quality checks, address:
  - Are any demographic groups systematically underrepresented in the data? (Small cell sizes can make inference unreliable for some groups.)
  - Are there variables that serve as proxies for protected characteristics (e.g., ZIP code as a proxy for race)?
  - Is the data aggregated in ways that might mask within-group variation (ecological fallacy risk)?
- Clean the data. Handle missing values, inconsistencies, and outliers. Document every decision, with special attention to:
  - Whether missing-data patterns differ by group (differential missingness can introduce bias)
  - Whether removing outliers disproportionately affects certain groups
  - Whether your cleaning decisions change the demographic composition of the analysis sample
- Create analysis-ready comparison groups. Define the groups you'll compare and report the sample size in each group. If any group has fewer than 30 observations, note this as a limitation.
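The group-size and differential-missingness checks above can be sketched in pandas. Everything here is illustrative: the column names (`race`, `searched`), the toy data, and the n = 30 threshold are stand-ins for your own dataset.

```python
import pandas as pd

# Toy stand-in for your real dataset; 'race' and 'searched' are hypothetical columns.
df = pd.DataFrame({
    "race": ["A"] * 40 + ["B"] * 25,
    "searched": [0] * 30 + [1] * 10 + [0] * 20 + [1] * 5,
})

# Check whether missingness differs by group (differential missingness):
# the share of missing values in each column, computed per group.
print(df.isna().groupby(df["race"]).mean())

# Report the size of each comparison group and flag small cells.
group_sizes = df["race"].value_counts()
print(group_sizes)
small = group_sizes[group_sizes < 30]
for name in small.index:
    print(f"Limitation: group {name!r} has only {group_sizes[name]} observations")
```

Run this immediately after cleaning so you can document how each cleaning decision changed the group sizes.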
Rubric focus for this step: Data handling, ethics.
Step 3: Exploratory Data Analysis
Explore the data with a focus on understanding group differences and the structure of potential disparities.
Required elements:
- Demographic profile. Describe the composition of your dataset:
  - Frequency tables and bar charts for demographic variables
  - Cross-tabulations showing the intersection of key demographics (e.g., race by gender, race by income category)
  - Note any groups with very small sample sizes
- Outcome variable exploration. For the main outcome you're investigating:
  - Overall distribution (histogram or bar chart)
  - Distribution broken down by group (side-by-side box plots, grouped bar charts, density plots by group)
  - Group-level descriptive statistics (mean, median, standard deviation for numerical outcomes; proportions for categorical outcomes)
- Potential confounders. Identify variables that might explain group differences through legitimate, non-discriminatory channels:
  - Visualize how potential confounders are distributed across your comparison groups
  - Create a correlation matrix or contingency table showing relationships among key variables
  - Discuss which confounders you can control for and which you can't
- Initial disparity assessment. Before running formal tests, describe what the raw data suggests:
  - What's the observed difference in outcomes between groups?
  - Does the difference look large or small?
  - Could confounders explain the difference?
- EDA narrative. Write 2-3 paragraphs summarizing the landscape of your data. Describe what patterns you observe, what concerns you have about the data, and what the EDA suggests about your research question.
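One possible sketch of the group-level summaries and side-by-side box plots, using pandas and matplotlib. The column names and values are invented for illustration; substitute your own outcome and group variables.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display (safe for "Restart and Run All")
import matplotlib.pyplot as plt

# Hypothetical outcome: sentence length in months, by comparison group.
df = pd.DataFrame({
    "group": ["A"] * 4 + ["B"] * 4,
    "sentence_months": [24, 30, 36, 30, 30, 42, 48, 60],
})

# Group-level descriptive statistics.
summary = df.groupby("group")["sentence_months"].agg(["count", "mean", "median", "std"])
print(summary)

# Side-by-side box plots of the outcome by group.
ax = df.boxplot(column="sentence_months", by="group")
ax.set_ylabel("Months")
ax.set_title("Sentence length by group (illustrative data)")
plt.suptitle("")  # remove pandas' automatic super-title
plt.savefig("outcome_by_group.png")
```

Remember the checklist in Step 7: every figure needs a title, axis labels, and a source annotation before it goes into the notebook or the policy brief.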
Rubric focus for this step: Visualization, interpretation.
Step 4: Statistical Analysis
Conduct a rigorous analysis of group disparities. You must include all four of the following analysis components:
4A: Unadjusted Group Comparison
Compare outcomes across groups without controlling for confounders. This establishes the baseline disparity.
Required elements:
- State null and alternative hypotheses
- Choose and justify the appropriate test:
  - Two-sample t-test or Wilcoxon rank-sum for numerical outcomes (two groups)
  - ANOVA or Kruskal-Wallis for numerical outcomes (three or more groups)
  - Two-proportion z-test or chi-square test for categorical outcomes
- Report the test statistic, p-value, and confidence interval for the difference
- Calculate and interpret the effect size. This is essential for disparity analysis — a statistically significant difference might be trivially small, and a non-significant result might reflect a meaningful disparity in an underpowered sample.
  - Cohen's d for mean differences
  - Cramér's V for categorical associations
  - Risk ratios or odds ratios for proportion comparisons
- Interpret the result: What is the magnitude of the unadjusted disparity?
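As one illustration of 4A for a categorical outcome, here is a chi-square test paired with a risk ratio, using invented search counts (scipy is assumed available; your own counts replace these numbers):

```python
import numpy as np
from scipy import stats

# Invented example: searches per 1,000 stops in two groups.
searched = np.array([120, 60])
stops = np.array([1000, 1000])

# Chi-square test on the 2x2 table of searched vs. not searched.
table = np.column_stack([searched, stops - searched])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")

# Effect size: risk ratio with a normal-approximation 95% CI on the log scale.
p1, p2 = searched / stops
rr = p1 / p2
se_log_rr = np.sqrt((1 - p1) / searched[0] + (1 - p2) / searched[1])
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se_log_rr)
print(f"risk ratio = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the risk ratio alongside the p-value is what lets you say "Group A was searched at twice the rate of Group B," which is the number a policy audience can use.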
4B: Adjusted Analysis (Controlling for Confounders)
Repeat the analysis while controlling for legitimate confounding variables. This is the critical step that separates a naive group comparison from a rigorous one.
Required elements:
- Build a regression model (linear or logistic, as appropriate) with the group variable and at least 2-3 potential confounders as predictors
- Interpret the coefficient on the group variable: Does the disparity persist after controlling for confounders? How much does it change?
- Report model diagnostics (r-squared or pseudo-r-squared, residual plots, VIF for multicollinearity)
- Compare the adjusted and unadjusted results explicitly: "The unadjusted difference in [outcome] between [Group A] and [Group B] was [X]. After controlling for [confounders], the difference was [Y], a [reduction/increase] of [Z%]."
- Discuss which confounders had the largest impact and why
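A minimal sketch of the adjusted analysis using statsmodels' formula API. The data below is simulated and every variable name (`group`, `income`, `denied`) is hypothetical; a real model would include your full set of confounders, not just one.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000

# Simulated data: denial probability depends on income and on group
# membership; the group coefficient is the adjusted disparity we estimate.
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n),
    "income": rng.normal(60, 15, size=n),
})
log_odds = -1.0 + 0.8 * (df["group"] == "B") - 0.02 * df["income"]
df["denied"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

# Logistic regression: disparity in denial rates, adjusted for income.
model = smf.logit("denied ~ C(group) + income", data=df).fit(disp=False)
print(model.summary())

# Adjusted odds ratio for group B vs. A, with 95% CI.
coef = model.params["C(group)[T.B]"]
lo, hi = model.conf_int().loc["C(group)[T.B]"]
print(f"adjusted OR (B vs. A): {np.exp(coef):.2f}, "
      f"95% CI [{np.exp(lo):.2f}, {np.exp(hi):.2f}]")
```

Exponentiating the group coefficient turns the log-odds into an odds ratio, which is the adjusted analogue of the unadjusted risk/odds ratio from 4A and makes the before/after comparison required above straightforward.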
4C: Effect Size and Practical Significance
Dedicate explicit attention to the magnitude of any disparities found, separate from their statistical significance.
Required elements:
- Report all effect sizes with confidence intervals
- Contextualize the effect: Is this a small, medium, or large disparity by conventional standards? What does it mean in real-world units?
- Calculate the practical impact: If the disparity were eliminated, how many people would be affected? What would change in dollar terms, percentage points, or other meaningful units?
- Discuss whether the sample size was adequate to detect the effect size you observed (brief power analysis)
- If the result is not statistically significant, discuss whether this means "no disparity" or "insufficient evidence to detect a disparity" — these are very different conclusions
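The brief power analysis can be done with statsmodels. The proportions and sample size below are invented placeholders; plug in your observed values.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Invented example: was n = 1,000 per group enough to detect a
# 12% vs. 6% difference in proportions?
h = proportion_effectsize(0.12, 0.06)  # Cohen's h for two proportions
analysis = NormalIndPower()

# Achieved power at the observed effect size and sample size.
power = analysis.solve_power(effect_size=h, nobs1=1000, alpha=0.05, ratio=1.0)
print(f"Cohen's h = {h:.3f}, achieved power = {power:.3f}")

# Sample size per group needed for 80% power at this effect size.
n_needed = analysis.solve_power(effect_size=h, power=0.8, alpha=0.05)
print(f"n per group for 80% power: {n_needed:.0f}")
```

If your achieved power is low, a non-significant result is "insufficient evidence," not "no disparity" — exactly the distinction the last required element asks you to make.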
4D: Robustness Checks
Test whether your findings hold up under different analytical choices.
Required elements (choose at least two):
- Alternative test: If you used a parametric test, also run the nonparametric equivalent (or vice versa). Do the conclusions change?
- Subgroup analysis: Does the disparity vary across subgroups? (e.g., does a racial disparity in sentencing look different for drug offenses vs. violent offenses?)
- Bootstrap analysis: Use bootstrap methods to construct confidence intervals for the disparity. Compare to your parametric results.
- Sensitivity to outliers: Remove extreme values and re-run the analysis. Are results driven by a handful of unusual cases?
- Alternative confounders: Add or remove control variables in your regression model. Is the group coefficient stable across specifications?
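The bootstrap option can be sketched as follows. The two samples are simulated stand-ins for your real group data, and the seed makes the resampling reproducible (see Step 7).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Simulated stand-ins for the outcome in each comparison group.
group_a = rng.normal(50, 10, size=200)
group_b = rng.normal(46, 10, size=200)
observed_diff = group_a.mean() - group_b.mean()

# Bootstrap the difference in group means (resample within each group).
n_boot = 5000
boot_diffs = np.empty(n_boot)
for i in range(n_boot):
    boot_diffs[i] = (rng.choice(group_a, size=group_a.size).mean()
                     - rng.choice(group_b, size=group_b.size).mean())

# Percentile 95% CI; compare this interval to the t-based one from 4A.
lo, hi = np.percentile(boot_diffs, [2.5, 97.5])
print(f"observed diff = {observed_diff:.2f}, bootstrap 95% CI [{lo:.2f}, {hi:.2f}]")
```

If the bootstrap and parametric intervals disagree substantially, that itself is a finding worth reporting: it suggests the parametric assumptions (normality, equal variances) may not hold.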
Rubric focus for this step: Statistical analysis, question formulation.
Step 5: Ethical Framework and Reflection
This project demands more ethical reflection than the others, because you're analyzing data about real disparities that affect real people's lives.
Required elements (1 page minimum):
- Framing and language. How do you describe the groups you're comparing and the disparities you've found? Language choices matter. Discuss at least one framing decision you made and why. (For example: "disparity" vs. "gap" vs. "inequality" vs. "bias" — these words carry different implications.)
- What the data can and cannot show. Clearly state whether your analysis can support causal claims. If you found that Group A has worse outcomes than Group B even after controlling for confounders, can you conclude discrimination? What alternative explanations remain? (Omitted variable bias, historical factors, measurement error in confounders, etc.)
- The ecological fallacy. If you're working with aggregate data (e.g., county-level or school-level), address whether group-level patterns can be applied to individuals. Observing that counties with more diverse schools also have higher test scores does not imply that any individual student in a diverse school scores higher.
- Potential for harm. Consider how your findings could be misused:
  - Could group-level statistics be used to stereotype individuals?
  - Could your findings be taken out of context to argue against policies that help disadvantaged groups?
  - Could your analysis inadvertently reinforce the idea that certain groups are inherently deficient, rather than highlighting systemic barriers?
- Researcher positionality. Briefly reflect on your own position relative to the communities in your data. How might your background affect what questions you ask, how you interpret ambiguous results, and what you choose to emphasize?
- Recommendations for responsible use. Given everything above, how should your findings be used? What caveats would you want policymakers to understand?
Rubric focus for this step: Ethics.
Step 6: Policy Brief
Write a 3-page (approximately 1,200-1,500 word) policy brief presenting your findings to legislators, school board members, or administrators.
Policy brief format:
- Summary box (top of page 1): A 50-word summary of the key finding and recommendation, set off in a bordered text box. This is what a busy legislator will read. Make it count.
- Background. Why does this issue matter? What is the current state of policy? (2-3 paragraphs, citing relevant context)
- What the data shows. Present your key findings in clear, non-technical language. Include 2-3 well-designed visualizations. Focus on effect sizes and practical significance, not p-values.
- What the data does not show. Explicitly state the limitations of your analysis. What questions remain unanswered? What additional data would be needed?
- Policy recommendations. Based on your findings, what actions do you recommend? Be specific and realistic. Distinguish between recommendations supported by your data and those that require additional evidence.
- Methodology note. A brief (3-4 sentence) description of your methods for readers who want to assess the rigor of the analysis.
Writing principles for policy briefs:
- Lead with the conclusion, not the methodology
- Use plain language (no jargon, no formulas, no p-values in the main text)
- Present numbers in context ("Black applicants were denied mortgages at 1.8 times the rate of White applicants with similar incomes and credit scores" — not "the chi-square test was significant with p < 0.001")
- Distinguish between "the data shows X" and "we recommend Y" — the recommendation involves values and priorities that go beyond the data
Rubric focus for this step: Communication.
Step 7: Reproducibility Check
Before submitting, verify that your work is fully reproducible.
Checklist:
- [ ] The Jupyter notebook runs from top to bottom without errors ("Restart and Run All")
- [ ] All data files are included or download instructions are provided
- [ ] All library imports are at the top of the notebook
- [ ] Data cleaning and subset creation steps are documented and justified
- [ ] Random seeds are set for bootstrap and simulation procedures
- [ ] All figures have titles, axis labels, legends, and source annotations
- [ ] Demographic groups are labeled clearly and respectfully in all outputs
- [ ] Code cells include comments explaining the analytical logic
- [ ] The notebook includes explicit statements about what can and cannot be concluded from each analysis
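For the random-seed item, one minimal pattern using NumPy's Generator API (the seed value itself is arbitrary; any fixed integer works):

```python
import numpy as np

# Set a single seed near the top of the notebook so bootstrap and
# simulation results repeat exactly on every "Restart and Run All".
SEED = 2024  # arbitrary fixed integer
rng = np.random.default_rng(SEED)

draws = rng.normal(size=5)
print(draws)  # the same five numbers on every run
```

Pass this one `rng` object to all of your resampling code rather than creating new generators in each cell; that keeps the whole notebook deterministic.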
Rubric focus for this step: Reproducibility.
Project Structure
Organize your Jupyter notebook with the following section headers:
1. Investigation Question and Context
2. Data Description and Dictionary
3. Data Cleaning and Preparation
3a. Standard quality checks
3b. Equity-specific quality assessment
4. Exploratory Data Analysis
4a. Demographic profile
4b. Outcome distributions by group
4c. Confounder assessment
5. Statistical Analysis
5a. Unadjusted group comparison
5b. Adjusted analysis (regression with controls)
5c. Effect sizes and practical significance
5d. Robustness checks
6. Discussion and Interpretation
7. Ethical Framework and Reflection
8. Conclusions
9. References
Submit the following files:
- capstone_social_justice.ipynb — your complete Jupyter notebook
- policy_brief.pdf — your 3-page policy brief
- ethical_reflection.pdf — your 1-page ethical reflection (this can be the same content as Section 7 in the notebook, formatted as a standalone document)
- Any data files needed to run the notebook
Tips for Success
- Start neutral. Frame your investigation as a genuine inquiry, not a predetermined conclusion. "Is there a disparity?" is a better starting question than "prove there's a disparity." If the data shows no significant disparity after controlling for confounders, that's a legitimate and important finding.
- Effect sizes matter more than p-values here. In social justice research, knowing that a disparity is "statistically significant" is only the beginning. Knowing that it's large or small — and what it means in human terms — is what actually matters for policy.
- Control for confounders, but explain what you're doing and why. When you add control variables to a regression, you're asking: "Does the group difference persist after accounting for these other factors?" Be thoughtful about which confounders to include. Controlling for a mediator (a variable that is itself caused by discrimination) can hide real disparities.
- Be careful with causal language. "Correlated with," "associated with," and "predicts" are not the same as "causes." Your observational data almost certainly cannot prove causation, and overclaiming causation in a social justice context can have serious consequences.
- Respect the people in your data. These are real people's lives represented in rows and columns. Use respectful language, avoid reductive characterizations, and remember that aggregate statistics never tell the full story of individual experiences.
- Use the rubric. It's included in this section. Read it before you start, check it as you work, review it before you submit.
Assessment
This project is assessed using the Capstone Rubric provided in this section. The rubric evaluates eight criteria:
| Criterion | Weight |
|---|---|
| Question Formulation | 10% |
| Data Handling | 10% |
| Statistical Analysis | 20% |
| Visualization | 10% |
| Interpretation | 15% |
| Ethics | 20% |
| Communication | 10% |
| Reproducibility | 5% |
Total: 100%
Note: The weighting for this project gives extra weight to Ethics (compared to the other capstone projects) because responsible analysis of equity data is fundamental to the project's purpose.
See the detailed rubric for performance level descriptions (Excellent / Good / Developing / Needs Improvement) for each criterion.