Appendix E: Templates and Worksheets
These templates provide structured frameworks for the core tasks in statistical analysis. Photocopy them, print them, or recreate them in your notebook. They are designed to slow you down just enough to think carefully before, during, and after each analysis.
E.1 Hypothesis Test Template
Use this template for ANY hypothesis test — proportions, means, chi-square, ANOVA, nonparametric, or regression coefficients. Fill in every field before drawing a conclusion.
HYPOTHESIS TEST WORKSHEET
1. Research Question (in plain English):
2. Hypotheses:
- H0: _______ (null: status quo, no effect, no difference)
- Ha: _______ (alternative: the claim you're testing)
- Test direction: [ ] Two-tailed [ ] Left-tailed [ ] Right-tailed
3. Significance Level:
- alpha = ___ (set BEFORE looking at results)
- Justification for this alpha: _______
4. Conditions / Assumptions Check:
| Condition | Met? | Evidence |
|---|---|---|
| Random sample or random assignment | [ ] Yes [ ] No [ ] Unclear | ______ |
| Independence (10% condition or separate groups) | [ ] Yes [ ] No [ ] N/A | ______ |
| Sample size / normality condition | [ ] Yes [ ] No | ______ |
| Equal variances (if applicable) | [ ] Yes [ ] No [ ] N/A | ______ |
- If conditions are not met, what should you do? _________
5. Test Information:
- Test name: _____
- Test statistic formula: _____
- Observed test statistic value: _____
- Degrees of freedom (if applicable): _____
- p-value: _____
6. Decision:
- [ ] Reject H0 (p-value <= alpha)
- [ ] Fail to reject H0 (p-value > alpha)
7. Conclusion (in context — use the words of the research question):
8. Effect Size and Practical Significance:
- Effect size measure: ___ Value: _____
- Is the effect practically meaningful? ___________
- 95% CI for the parameter: ( __ , __ )
9. Limitations and Caveats:
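The numeric fields of the worksheet can be checked with a few lines of code. The sketch below assumes a one-proportion z-test and made-up data (60 successes in 100 trials, null value 0.5); it fills in steps 4, 5, 6, and 8 using only Python's standard library. The function name `one_proportion_z_test` and the "at least 10 expected successes and failures" threshold are illustrative assumptions, not part of the template.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05, direction="two-tailed"):
    """Illustrative helper: fills in worksheet steps 4, 5, 6, and 8."""
    std_normal = NormalDist()
    p_hat = successes / n

    # Step 4: large-sample condition, checked with the null value p0.
    conditions_met = n * p0 >= 10 and n * (1 - p0) >= 10

    # Step 5: the standard error under H0 uses p0, not p_hat.
    se_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    if direction == "two-tailed":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif direction == "right-tailed":
        p_value = 1 - std_normal.cdf(z)
    elif direction == "left-tailed":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("direction must be two-tailed, left-tailed, or right-tailed")

    # Step 8: the 95% CI uses the standard error based on p_hat.
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    margin = std_normal.inv_cdf(0.975) * se_hat

    return {
        "conditions_met": conditions_met,
        "z": z,
        "p_value": p_value,
        "reject_h0": p_value <= alpha,  # Step 6
        "effect_size": p_hat - p0,      # difference in proportions
        "ci_95": (p_hat - margin, p_hat + margin),
    }


# Hypothetical data: 60 successes out of 100, H0: p = 0.5, Ha: p != 0.5.
result = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {result['z']:.2f}, p-value = {result['p_value']:.4f}")
print(f"Reject H0 at alpha = 0.05? {result['reject_h0']}")
print(f"95% CI: ({result['ci_95'][0]:.3f}, {result['ci_95'][1]:.3f})")
```

The code only produces the numbers. Steps 1, 2, 7, and 9 (the question, the hypotheses, the conclusion in context, and the caveats) still have to be written by hand.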
E.2 Confidence Interval Template
CONFIDENCE INTERVAL WORKSHEET
1. Parameter of Interest (in plain English):
2. Parameter Symbol: ____
3. Point Estimate:
- Symbol: ____
- Value: ____
4. Conditions Check:
| Condition | Met? | Evidence |
|---|---|---|
| Random sample | [ ] Yes [ ] No [ ] Unclear | ______ |
| Independence (10% condition) | [ ] Yes [ ] No | ______ |
| Normality / sample size condition | [ ] Yes [ ] No | ______ |
5. Confidence Level: ___ %
6. CI Formula:
- Formula: point estimate +/- (critical value) x (standard error)
- Standard error formula: _____
- Standard error value: _____
- Critical value (z or t): _____
- Degrees of freedom (if t): _____
- Margin of error: _____
7. Confidence Interval:
( __ , __ )
8. Interpretation (fill in the blanks):
"We are _% confident that the true ____ is between __ and _____."
9. What this does NOT mean (check your understanding):
- [ ] It does NOT mean there is a _____% probability the parameter is in this interval.
- [ ] It means that if we repeated the sampling process many times, approximately _____% of the resulting intervals would contain the true parameter.
10. Practical Interpretation:
- Is the interval narrow enough to be useful? _____
- What decisions can you make based on this range? _________
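The formula in step 6 and the repeated-sampling interpretation in step 9 can both be tried out in code. The sketch below assumes a confidence interval for a single proportion (the normal-approximation, or Wald, interval) and uses made-up values throughout; `wald_interval` and `simulate_coverage` are hypothetical helper names, and the true proportion of 0.30 is chosen only so the simulation has something to aim at.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Step 6: point estimate +/- (critical value) x (standard error)."""
    p_hat = successes / n
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = critical * standard_error
    return p_hat - margin, p_hat + margin


def simulate_coverage(true_p, n, level=0.95, repetitions=2000, seed=42):
    """Step 9: fraction of repeated-sample intervals that contain true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n, level)
        if low <= true_p <= high:
            hits += 1
    return hits / repetitions


# Hypothetical sample: 60 successes out of 200.
low, high = wald_interval(successes=60, n=200)
print(f"95% CI: ({low:.3f}, {high:.3f})")

# Repeat the sampling process many times with a known true proportion.
coverage = simulate_coverage(true_p=0.30, n=200)
print(f"Share of intervals containing the true proportion: {coverage:.3f}")
```

The simulated share should land near the stated confidence level, which is the claim in step 9: the percentage describes the long-run behavior of the procedure, not the probability that one particular interval is correct.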
E.3 Study Design Evaluation Checklist
Use this checklist to evaluate ANY study — whether you're reading it in a news article, a journal paper, or designing your own.
STUDY DESIGN EVALUATION
Study Title / Source: _____________
Date of Evaluation: _____
A. Basic Classification
- [ ] Observational study
- [ ] Experiment (randomized)
- [ ] Natural experiment / quasi-experiment
- [ ] Survey
B. Sampling
- How were participants selected? ____________
- Sampling method: [ ] Simple random [ ] Stratified [ ] Cluster [ ] Convenience [ ] Other
- Sample size: n = _____
- Is the sample representative of the target population? [ ] Yes [ ] No [ ] Unclear
- Potential sampling biases:
- [ ] Selection bias
- [ ] Nonresponse bias
- [ ] Survivorship bias
- [ ] Volunteer/self-selection bias
- [ ] Other: ___
C. Variables
- Explanatory variable(s): _____________
- Response variable(s): _________
- Potential confounding variables: ____________
- Were confounders controlled for? [ ] Yes (how?) [ ] No
D. Experimental Design (if applicable)
- Was there random assignment to groups? [ ] Yes [ ] No
- Was there a control group? [ ] Yes [ ] No
- Was blinding used? [ ] Single-blind [ ] Double-blind [ ] No
- Was a placebo used? [ ] Yes [ ] No [ ] N/A
E. Causal Claims
- Does the study claim a causal relationship? [ ] Yes [ ] No
- Is a causal claim justified? [ ] Yes [ ] No
- Reasoning: __________
F. Ethical Considerations
- Was informed consent obtained? [ ] Yes [ ] No [ ] Unclear [ ] N/A
- Was IRB approval mentioned? [ ] Yes [ ] No [ ] N/A
- Are there privacy concerns? [ ] Yes [ ] No
- Could the findings be used to harm the study population? [ ] Yes [ ] No
G. Overall Assessment
- Strengths of the study: ____________
- Weaknesses: _________
- Confidence in the conclusions (1-5): _____
- What additional information would strengthen the study? _______
E.4 Data Cleaning Log Template
Every data cleaning decision changes the story your data tells. Document every decision for reproducibility and transparency.
DATA CLEANING LOG
Dataset: _____ Date: ___
Raw dataset dimensions: ___ rows x ___ columns
| Step | Action | Columns Affected | Rows Changed | Justification | Decision Made By |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |
| 9 | | | | | |
| 10 | | | | | |
Common actions to log:
- Removed duplicate rows
- Dropped rows with missing values in column(s) ___
- Imputed missing values in ___ using ___ method
- Recoded variable ___ (original values -> new values)
- Created new variable ___ from ___
- Removed outliers in ___ (criteria: ___)
- Fixed inconsistent entries in ___ (e.g., "CA" and "California")
- Changed data type of ___ from ___ to ___
- Filtered to subset where ___
Final dataset dimensions: ___ rows x ___ columns
Rows removed (total): ___ ( _____% of original)
Sensitivity check: Would different cleaning decisions change the main conclusions?
- [ ] Yes (describe how: _________)
- [ ] No
- [ ] Not yet checked
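If the cleaning happens in code, the log can be filled in as each step runs rather than reconstructed afterward. The sketch below is one possible way to do that in plain Python; the `CleaningLog` class, its field names, and the five-row dataset are all hypothetical and exist only to mirror the columns of the table above.

```python
from dataclasses import dataclass, field


@dataclass
class CleaningLog:
    """Hypothetical in-code version of the cleaning log table."""
    raw_rows: int
    raw_columns: int
    steps: list = field(default_factory=list)
    rows_removed: int = 0

    def record(self, action, columns, rows_changed, justification,
               decided_by, removed=False):
        # One entry per row of the worksheet table.
        self.steps.append({
            "step": len(self.steps) + 1,
            "action": action,
            "columns_affected": columns,
            "rows_changed": rows_changed,
            "justification": justification,
            "decided_by": decided_by,
        })
        if removed:
            self.rows_removed += rows_changed

    def percent_removed(self):
        return 100 * self.rows_removed / self.raw_rows


# Made-up raw data: one exact duplicate, one missing age, one inconsistent state.
raw = [
    {"id": 1, "state": "CA", "age": 34},
    {"id": 2, "state": "California", "age": 41},
    {"id": 2, "state": "California", "age": 41},
    {"id": 3, "state": "NY", "age": None},
    {"id": 4, "state": "CA", "age": 29},
]
log = CleaningLog(raw_rows=len(raw), raw_columns=3)

# Step 1: remove exact duplicate rows.
seen, deduped = set(), []
for row in raw:
    key = tuple(row.items())
    if key not in seen:
        seen.add(key)
        deduped.append(row)
log.record("Removed duplicate rows", "all", len(raw) - len(deduped),
           "Exact duplicate of an existing record", "analyst", removed=True)

# Step 2: drop rows with a missing age.
complete = [row for row in deduped if row["age"] is not None]
log.record("Dropped rows with missing values", "age",
           len(deduped) - len(complete),
           "Age is required for the analysis", "analyst", removed=True)

# Step 3: fix inconsistent entries ("California" -> "CA").
cleaned, recoded = [], 0
for row in complete:
    if row["state"] == "California":
        row = {**row, "state": "CA"}
        recoded += 1
    cleaned.append(row)
log.record("Fixed inconsistent entries", "state", recoded,
           "Same state spelled two ways", "analyst")

print(f"Final rows: {len(cleaned)}")
print(f"Rows removed: {log.rows_removed} ({log.percent_removed():.0f}% of original)")
```

Rerunning the script with one step changed (for example, imputing the missing age instead of dropping the row) is a direct way to carry out the sensitivity check.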
E.5 Statistical Analysis Report Template
Use this structure for the Data Detective Portfolio and any formal statistical report.
STATISTICAL ANALYSIS REPORT
Title: ___________
Author(s): _____ Date: _____
1. Introduction (1/2 to 1 page)
- What question are you investigating?
- Why does this question matter?
- What dataset are you using and where did it come from?
- What is the scope of your analysis? (What are you including/excluding?)
2. Data Description (1/2 to 1 page)
- Source and collection method
- Sample size (n)
- Key variables: name, type (categorical/numerical), and brief description
- Data dictionary (table format)
3. Data Cleaning and Preparation (1/2 page + cleaning log)
- Summary of cleaning steps (attach full cleaning log as appendix)
- Missing data: how much, what patterns, how handled
- Any variables created or recoded
- Final dataset dimensions
4. Exploratory Data Analysis (1-2 pages)
- Visualizations: histograms, box plots, bar charts, scatterplots
- Summary statistics: center, spread, shape
- Notable patterns, outliers, or unexpected findings
- Each figure should have a title, axis labels, and a one-sentence interpretation
5. Statistical Analysis (2-3 pages)
- State each hypothesis test formally (H0, Ha, alpha)
- Show conditions checks
- Report test statistics, p-values, and confidence intervals
- Report effect sizes
- For regression: report the model equation, R-squared, residual diagnostics
- Interpret every result in context
6. Discussion and Conclusions (1 page)
- What did you find? (Summary of key results)
- What do the results mean in practical terms?
- What are the limitations of your analysis?
- What can you NOT conclude? (Correlation vs. causation, generalizability)
- What would you do differently with more time or data?
7. Ethical Considerations (1/2 page)
- Who collected this data and why?
- Whose voices are included/excluded?
- Could your analysis be misused? How?
- What biases might affect your conclusions?
8. References
- Cite the dataset source
- Cite any external references used
E.6 Presentation Planning Worksheet
For presenting statistical findings to a non-technical audience.
PRESENTATION PLANNING WORKSHEET
Topic: ____________
Audience: ____ Time Limit: ____
Audience Analysis
- What does my audience already know about statistics? ____
- What do they care about? ___________
- What decision will they make based on my presentation? ________
- What is the ONE thing I want them to remember? ________
Structure
Opening Hook (30 seconds to 1 minute):
Context / Why This Matters (1-2 minutes):
Key Finding 1:
- Result: _____________
- Visual: _____________
- Plain-language explanation: _______
Key Finding 2:
- Result: _____________
- Visual: _____________
- Plain-language explanation: _______
Key Finding 3 (if applicable):
- Result: _____________
- Visual: _____________
- Plain-language explanation: _______
Limitations and Caveats (1 minute):
Recommendation / Call to Action:
Visualization Checklist
For each graph or table in the presentation:
- [ ] Title is clear and descriptive
- [ ] Axes are labeled with units
- [ ] Font is large enough to read from the back of the room
- [ ] Colors are colorblind-friendly
- [ ] No 3D effects or chartjunk
- [ ] The main message is obvious within 5 seconds
- [ ] Source is cited
E.7 Ethical Review Checklist
Use this checklist BEFORE collecting data, during analysis, and before publishing results.
ETHICAL REVIEW CHECKLIST
Project: _____ Date: ___
Before Data Collection
- [ ] Is this research covered by an IRB protocol (if applicable)?
- [ ] Have participants given informed consent?
- [ ] Is participation voluntary? Can participants withdraw?
- [ ] Have you explained how the data will be used, stored, and shared?
- [ ] Are you collecting only the data you need (data minimization)?
- [ ] Could this data be used to identify individuals? If so, what protections are in place?
- [ ] Are you compensating participants fairly?
During Analysis
- [ ] Have you pre-registered your hypotheses, or are you being transparent about which analyses are exploratory?
- [ ] Are you testing only the hypotheses you planned, or are you searching for significant results (p-hacking)?
- [ ] Are you reporting ALL analyses you ran, not just the ones with significant results?
- [ ] Have you checked whether your results look different for different demographic subgroups?
- [ ] Are you using appropriate statistical methods for your data type and research question?
- [ ] Are you interpreting p-values correctly (the probability of data at least as extreme as what you observed, assuming H0 is true, NOT the probability that H0 is true)?
- [ ] Are you distinguishing between statistical significance and practical significance?
Before Reporting Results
- [ ] Does your visualization accurately represent the data (no truncated axes, misleading scales, or cherry-picked time windows)?
- [ ] Are you honest about the limitations of your analysis?
- [ ] Are you making causal claims only when the study design supports them?
- [ ] Have you considered who might be harmed by your conclusions?
- [ ] Are you using language that is precise and avoids sensationalism?
- [ ] Is your analysis reproducible? Could someone else follow your steps and get the same results?
- [ ] Have you acknowledged potential biases in the data collection and analysis process?
Special Considerations for Algorithmic / AI Applications
- [ ] Have you evaluated model performance separately for different demographic groups?
- [ ] Are there proxy variables that could introduce discrimination?
- [ ] Who bears the cost of false positives? False negatives? Is that distribution fair?
- [ ] Is there a human review mechanism for high-stakes decisions?
- [ ] Have you considered the Chouldechova impossibility result (when base rates differ between groups, an imperfect classifier cannot equalize positive predictive value, false positive rate, and false negative rate across groups all at once)?
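Evaluating performance separately by group, and asking who bears the cost of each error type, comes down to computing error rates per group. The sketch below shows one minimal way to do that, assuming binary outcomes and predictions; the function name `error_rates_by_group`, the group labels "A" and "B", and every count in the data are invented for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """records: iterable of (group, actual, predicted) with 0/1 labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual and predicted:
            counts[group]["tp"] += 1
        elif actual:
            counts[group]["fn"] += 1  # missed a true positive
        elif predicted:
            counts[group]["fp"] += 1  # flagged a true negative
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "base_rate": positives / (positives + negatives),
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


# Made-up predictions for two groups of 10 people each.
records = (
    [("A", 1, 1)] * 3 + [("A", 1, 0)] * 1 + [("A", 0, 1)] * 1 + [("A", 0, 0)] * 5
    + [("B", 1, 1)] * 2 + [("B", 1, 0)] * 2 + [("B", 0, 1)] * 3 + [("B", 0, 0)] * 3
)

rates = error_rates_by_group(records)
for group in sorted(rates):
    r = rates[group]
    print(f"Group {group}: FPR = {r['false_positive_rate']:.2f}, "
          f"FNR = {r['false_negative_rate']:.2f}")
```

In this invented example, group B has higher false positive and false negative rates than group A even though both groups have the same base rate, a gap that a single overall accuracy figure would hide.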