Index

Page references are given as chapter and section numbers: Ch.N section N.X. Primary definitions are in bold; secondary references are in regular type. Anchor example appearances are marked with [AE]. Python functions appear in code font.


A

A/B testing, Ch.4 section 4.7, Ch.13 section 13.14, Ch.16 section 16.3 [AE: Alex Rivera], Ch.26 section 26.2
accuracy (classification), Ch.24 section 24.8, Ch.26 section 26.5
accuracy paradox, Ch.24 section 24.8
addition rule, Ch.8 section 8.5
adjusted R-squared, Ch.23 section 23.5, Ch.22 section 22.8
Agresti, Alan, Ch.14 section 14.6
algorithmic bias, Ch.1 section 1.4, Ch.26 section 26.5, Ch.27 section 27.7
  - COMPAS algorithm, Ch.16 case-study-02, Ch.24 section 24.12, Ch.26 section 26.5
  - healthcare algorithm (Obermeyer), Ch.1, Ch.4, Ch.26 section 26.5
  - proxy variables, Ch.16 case-study-02, Ch.24 section 24.12, Ch.26 section 26.5
alternative hypothesis (Ha), Ch.13 section 13.3, Ch.14 section 14.3, Ch.15 section 15.2
Anaconda, Ch.3 section 3.11, Appendix C
analysis of variance, see ANOVA
Angwin, Julia, Ch.16 case-study-02, Ch.26 case-study-01
ANOVA, Ch.20 section 20.2, Ch.17 section 17.9
  - assumptions, Ch.20 section 20.9
  - eta-squared, Ch.20 section 20.10
  - F-statistic, Ch.20 section 20.5
  - one-way, Ch.20 section 20.2
  - post-hoc tests, Ch.20 section 20.8
  - table, Ch.20 section 20.7
  - Welch's, Ch.20 section 20.9
Anscombe's Quartet, Ch.5 further-reading, Ch.22 section 22.1
ASA statement on p-values (2016), Ch.13 section 13.6, Ch.17 section 17.8
ASA statement (2019), Ch.17 section 17.8
AUC (Area Under ROC Curve), Ch.24 section 24.9

B

bar chart, Ch.5 section 5.2, Ch.25 section 25.4
base rate fallacy, Ch.9 section 9.14, Ch.26 section 26.3
Bayes, Thomas, Ch.9 further-reading
Bayes' theorem, Ch.9 section 9.6, Ch.13 section 13.6, Ch.24 section 24.14, Ch.26 section 26.3
  - medical testing, Ch.9 section 9.8, Ch.26 section 26.3
  - natural frequencies, Ch.9 section 9.7
  - spam filters, Ch.1 section 1.4, Ch.9 section 9.10
Belmont Report, Ch.27 section 27.6
Bem, Daryl, Ch.13 case-study-01
between-group variability, Ch.20 section 20.3
bias, Ch.4 section 4.3
  - algorithmic, see algorithmic bias
  - nonresponse, Ch.4 section 4.3
  - response, Ch.4 section 4.3
  - selection, Ch.4 section 4.3
  - survivorship, Ch.4 section 4.3
bias-variance tradeoff, Ch.26 section 26.4
Bickel, Hammel, and O'Connell (Berkeley admissions), Ch.27 section 27.2
big data, Ch.26 section 26.6
bimodal, Ch.5 section 5.7, Ch.6 section 6.2
binary outcome, Ch.24 section 24.1
binning, Ch.7 section 7.7
binomial distribution, Ch.10 section 10.3
  - BINS conditions, Ch.10 section 10.3
  - normal approximation, Ch.10 section 10.8, Ch.14 section 14.3
  - scipy.stats.binom, Ch.10 section 10.3
birthday problem, Ch.8 section 8.1, Ch.8 section 8.9
blinding, Ch.4 section 4.7
Bonferroni correction, Ch.17 section 17.9, Ch.20 section 20.8
bootstrap, Ch.18 section 18.2
  - BCa method, Ch.18 section 18.4
  - confidence interval, Ch.18 section 18.4
  - distribution, Ch.18 section 18.3
  - np.random.choice, Ch.18 section 18.3
box plot, Ch.6 section 6.6, Ch.5 section 5.9
Box, George E. P., Ch.10 section 10.1, Ch.10 section 10.9
BRFSS (CDC), Ch.3 section 3.5, Appendix D
Brown, Cai, and DasGupta (2001), Ch.14 section 14.6

C

Cairo, Alberto, Ch.25 further-reading
care ethics, Ch.27 section 27.8
categorical variable, Ch.2 section 2.2, Ch.5 section 5.2
causal claims checklist, Ch.4 section 4.8
CCPA, Ch.27 section 27.6
CDC, see BRFSS
cell (notebook), Ch.3 section 3.2
Central Limit Theorem (CLT), Ch.11 section 11.4, Ch.12 section 12.2, Ch.14 section 14.3
  - conditions, Ch.11 section 11.11
  - for proportions, Ch.11 section 11.5
  - simulation, Ch.11 section 11.3
chartjunk, Ch.25 section 25.2
Chen, Dr. Maya [anchor example], Ch.1 section 1.5, Ch.7 section 7.1, Ch.9 section 9.8, Ch.10 section 10.7, Ch.12 section 12.5, Ch.14 section 14.4, Ch.15 section 15.4, Ch.16 section 16.7, Ch.22 case-study-01, Ch.24 section 24.6, Ch.26 section 26.3
cherry-picking, Ch.27 section 27.4
chi-square distribution, Ch.19 section 19.2, Appendix A section A.3
chi-square test, Ch.19 section 19.2
  - chi2_contingency(), Ch.19 section 19.5
  - chisquare(), Ch.19 section 19.3
  - conditions, Ch.19 section 19.4
  - Cramer's V, Ch.19 section 19.6
  - goodness-of-fit, Ch.19 section 19.3
  - standardized residuals, Ch.19 section 19.8
  - test of independence, Ch.19 section 19.5
Chouldechova impossibility result, Ch.16 case-study-02, Ch.24 case-study-02, Ch.26 section 26.5
Clark, Sally, Ch.9 case-study-02
classification, Ch.24 section 24.13, Ch.26 section 26.2
cleaning log, Ch.7 section 7.10, Ch.25 section 25.15, Appendix E section E.4
cluster sampling, Ch.4 section 4.2
codebook, see data dictionary
coefficient of determination, see R-squared
Cohen, Jacob, Ch.17 section 17.4
Cohen's d, Ch.17 section 17.4
  - benchmarks, Ch.17 section 17.4
Cohen's f, Ch.20 section 20.10
Cohen's h, Ch.17 section 17.4
collaborative filtering, Ch.26 section 26.2
College Scorecard, Appendix D
colorblind-friendly palette, Ch.25 section 25.5
Common Rule (45 CFR 46), Ch.27 section 27.6
complement, Ch.8 section 8.4
COMPAS algorithm, Ch.16 case-study-02, Ch.24 section 24.12, Ch.26 section 26.5, Ch.27 case-study-02
conditional probability, Ch.9 section 9.2, Ch.8 section 8.7
  - pd.crosstab(normalize='index'), Ch.9 section 9.4
confidence band, Ch.25 section 25.9
confidence interval, Ch.12 section 12.2
  - bootstrap, Ch.18 section 18.4
  - for a mean, Ch.12 section 12.5
  - for a proportion, Ch.12 section 12.6, Ch.14 section 14.6
  - interpretation, Ch.12 section 12.3, Appendix F Q1
  - sample size, Ch.12 section 12.9
  - stats.t.interval(), Ch.12 section 12.7
  - template, Appendix E section E.2
  - tradeoff triangle, Ch.12 section 12.8
  - Wilson interval, Ch.14 section 14.6
confidence level, Ch.12 section 12.3
confounding variable, Ch.4 section 4.6, Ch.22 section 22.5, Ch.23 section 23.4
confusion matrix, Ch.24 section 24.8, Ch.26 section 26.5
  - confusion_matrix() (sklearn), Ch.24 section 24.10
contingency table, Ch.8 section 8.7, Ch.19 section 19.5
  - pd.crosstab(), Ch.8 section 8.7, Ch.19
continuity correction, Ch.10 section 10.8
continuous variable, Ch.2 section 2.3
control chart, Ch.6 case-study-02, Ch.11 case-study-02
control group, Ch.4 section 4.7
convenience sample, Ch.4 section 4.2
correlation coefficient (r), see Pearson's r
  - correlation does not imply causation, Ch.4 section 4.6, Ch.22 section 22.5, Appendix F Q5
  - heatmap, Ch.22 section 22.3
  - spurious, Ch.22 section 22.5
correlation matrix, Ch.22 section 22.3
  - numpy.corrcoef(), Ch.22 section 22.3
CORREL (Excel), Ch.22 section 22.3
Cox, David, Ch.24 further-reading
Cramer's V, Ch.19 section 19.6
critical value, Ch.12 section 12.4, Appendix A
cross-sectional study, Ch.2 section 2.7
CSV, Ch.3 section 3.5

D

data, Ch.1 section 1.1
data cleaning, Ch.7 section 7.1, Appendix E section E.4
data dictionary, Ch.2 section 2.5
data-ink ratio, Ch.25 section 25.2
data literacy, Ch.1 section 1.4
data mining, Ch.26 section 26.6
data privacy, Ch.27 section 27.6
data quality, Ch.7 section 7.1
data science pipeline, Ch.28 section 28.4
Data Detective Portfolio, Ch.1 section 1.5, Ch.28
data sources, Appendix D
data storytelling, Ch.25 section 25.7
data visualization principles, Ch.25 section 25.2
data wrangling, Ch.7 section 7.1
DataFrame, Ch.3 section 3.4
de Moivre, Abraham, Ch.10 further-reading
deepfake, Ch.26 section 26.9
degrees of freedom, Ch.6 section 6.5, Ch.12 section 12.4, Ch.15 section 15.2, Ch.19 section 19.2, Ch.20 section 20.5
Delacre, Lakens, and Leys (2017), Ch.16 further-reading
dependent samples, Ch.16 section 16.4
descriptive statistics, Ch.1 section 1.1
.describe(), Ch.3 section 3.5, Ch.6 section 6.10
discrete variable, Ch.2 section 2.3
distribution-free, Ch.21 section 21.3
distribution shape, Ch.5 section 5.7
double-blind, Ch.4 section 4.7
.dropna(), Ch.7 section 7.3
.drop_duplicates(), Ch.7 section 7.4
.dtypes, Ch.3 section 3.5
dummy variable, see indicator variable
.duplicated(), Ch.7 section 7.4

E

ecological fallacy, Ch.27 section 27.3
effect size, Ch.17 section 17.4, Ch.22 section 22.18, Ch.25 section 25.10
  - Cohen's d, Ch.17 section 17.4
  - Cramer's V, Ch.19 section 19.6
  - eta-squared, Ch.20 section 20.10
  - R-squared, Ch.22 section 22.8
Efron, Bradley, Ch.18 section 18.2
Empirical Rule (68-95-99.7), Ch.6 section 6.7, Ch.10 section 10.2
error bars, Ch.25 section 25.9
eta-squared, Ch.20 section 20.10
ethical review checklist, Appendix E section E.7
event, Ch.8 section 8.2
Excel functions
  - AVERAGE, Ch.6 section 6.5.1
  - CHISQ.TEST, Ch.19 section 19.3
  - CONFIDENCE.T, Ch.12 section 12.7
  - CORREL, Ch.22 section 22.3
  - COUNTBLANK, Ch.7 section 7.2
  - INTERCEPT, Ch.22 section 22.8
  - MEDIAN, Ch.6 section 6.5.1
  - NORM.S.DIST, Ch.14 section 14.9
  - NORM.S.INV, Ch.12 section 12.7
  - RSQ, Ch.22 section 22.8
  - SLOPE, Ch.22 section 22.8
  - STDEV.S, Ch.6 section 6.5.1
  - T.INV.2T, Ch.12 section 12.7
  - T.TEST, Ch.16 section 16.10
executive summary, Ch.25 section 25.11
expected frequency, Ch.19 section 19.2
expected value, Ch.10 section 10.2
experiment, Ch.4 section 4.1
exploratory data analysis (EDA), Ch.5 section 5.1
extrapolation, Ch.22 section 22.10

F

F-distribution, Ch.20 section 20.6, Appendix A section A.4
F-statistic, Ch.20 section 20.5
F1 score, Ch.24 section 24.8
f_oneway(), Ch.20 section 20.7
Facebook emotional contagion study, Ch.4 section 4.7, Ch.27 section 27.6
fail to reject, Ch.13 section 13.7, Appendix F Q3
fairness impossibility theorem, see Chouldechova impossibility result
false discovery rate, Ch.13 case-study-01
false negative, Ch.9 section 9.8, Ch.13 section 13.9
false positive, Ch.9 section 9.8, Ch.13 section 13.9
family-wise error rate (FWER), Ch.17, Ch.20 section 20.2
feature engineering, Ch.7 section 7.8
file drawer problem, Ch.17 section 17.9
.fillna(), Ch.7 section 7.3
Fisher, Ronald, Ch.12 further-reading, Ch.13 further-reading, Ch.20 section 20.6
five-number summary, Ch.6 section 6.6
FiveThirtyEight data, Appendix D
frequency distribution, Ch.5 section 5.4

G

Gallup, George, Ch.4 section 4.4
Galton, Francis, Ch.11 epigraph, Ch.22 section 22.9
gambler's fallacy, Ch.8 section 8.3
Gapminder, Ch.1 case-study-02, Appendix D
garden of forking paths, Ch.13 section 13.12
Gauss, Carl Friedrich, Ch.10 further-reading, Ch.22 further-reading
GDPR, Ch.27 section 27.6
general multiplication rule, Ch.9 section 9.11
Gigerenzer, Gerd, Ch.9 section 9.7
goodness-of-fit test, Ch.19 section 19.3
Google Colab, Ch.3 section 3.2, Appendix C section C.1
Google Flu Trends, Ch.26 section 26.6
Gosset, William Sealy (Student), Ch.12 section 12.4, Ch.15
grand mean, Ch.20 section 20.4
.groupby(), Ch.3 section 3.7

H

hallucination (AI), Ch.26 section 26.8
HARKing, Ch.27 section 27.5
.head(), Ch.3 section 3.5
hedging language, Ch.25 section 25.9
histogram, Ch.5 section 5.4
hypothesis test template, Appendix E section E.1
hypothesis testing, Ch.13 section 13.2
  - five-step procedure, Ch.13 section 13.4
  - logic of, Ch.13 section 13.2

I

IDE, Ch.3 section 3.11
IMRaD, Ch.25 section 25.11
imputation, Ch.7 section 7.3
independent events, Ch.8 section 8.6
independent samples, Ch.16 section 16.3
indicator variable (dummy), Ch.23 section 23.6
inferential statistics, Ch.1 section 1.1
.info(), Ch.3 section 3.5
informed consent, Ch.4 section 4.7, Ch.27 section 27.6
interaction term, Ch.23 section 23.7
intercept, Ch.22 section 22.6
interquartile range (IQR), Ch.6 section 6.4
interval estimate, Ch.12 section 12.2
Ioannidis, John, Ch.13 case-study-01
IRB (Institutional Review Board), Ch.4 section 4.7, Ch.27 section 27.6
.isna(), Ch.7 section 7.2

J

John, Loewenstein, and Prelec (2012), Ch.27 section 27.5
joint probability, Ch.8 section 8.7
Jupyter notebook, Ch.3 section 3.2, Appendix C

K

Kaggle, Appendix D
Kahneman, Daniel, Ch.8, Ch.9
kernel, Ch.3 section 3.2
Knaflic, Cole Nussbaumer, Ch.25 further-reading
Kruskal, William, Ch.21 further-reading
Kruskal-Wallis test, Ch.21 section 21.8
  - stats.kruskal(), Ch.21 section 21.8

L

Laplace, Pierre-Simon, Ch.8 epigraph, Ch.11 further-reading
large language model (LLM), Ch.26 section 26.8
law of large numbers, Ch.8 section 8.3, Ch.11 section 11.4
law of total probability, Ch.9 section 9.6
least squares, Ch.22 section 22.6
Legendre, Adrien-Marie, Ch.22 further-reading
level of measurement, Ch.2 section 2.6
Levene's test, Ch.20 section 20.9
  - stats.levene(), Ch.20 section 20.9
library (Python), Ch.3 section 3.4
likelihood ratio, Ch.9 section 9.13
Likert scale, Ch.2 section 2.3
LINE conditions (regression), Ch.22 section 22.16
linear relationship, Ch.22 section 22.3
linregress(), Ch.22 section 22.11
listwise deletion, Ch.7 section 7.3
Literary Digest poll (1936), Ch.4 section 4.4, Ch.11 case-study-01, Ch.14 case-study-01
log-odds (logit), Ch.24 section 24.3
logistic regression, Ch.24 section 24.4
  - LogisticRegression() (sklearn), Ch.24 section 24.10
  - odds ratios, Ch.24 section 24.6
  - sm.Logit(), Ch.24 section 24.10
longitudinal study, Ch.2 section 2.7
lurking variable, Ch.22 section 22.5

M

machine learning, Ch.26 section 26.2
Mann, Henry B., Ch.21 further-reading
Mann-Whitney U test, Ch.21 section 21.6
  - stats.mannwhitneyu(), Ch.21 section 21.6
margin of error, Ch.12 section 12.2
  - for proportions, Ch.14 section 14.7
marginal probability, Ch.8 section 8.7
matched pairs, Ch.16 section 16.4
matplotlib, Ch.5 section 5.2, Appendix B section B.3
maximum likelihood estimation, Ch.24 section 24.7
MCAR, MAR, MNAR, Ch.7 section 7.2
McKinney, Wes, Ch.3 section 3.4
mean, Ch.6 section 6.1
  - .mean(), Ch.6 section 6.5.1
  - vs. median, Ch.6 section 6.2
mean square, Ch.20 section 20.5
Meadow, Roy, Ch.9 case-study-02
median, Ch.6 section 6.1
  - .median(), Ch.6 section 6.5.1
Milgram, Stanley, Ch.4, Ch.27
misinformation, Ch.26 section 26.9
misleading graphs, Ch.25 section 25.3
missing data (NA/NaN), Ch.7 section 7.2
mode, Ch.6 section 6.1
Monte Carlo simulation, Ch.18 section 18.6
Monty Hall problem, Ch.8 case-study-01
multicollinearity, Ch.23 section 23.8
multiple comparisons problem, Ch.20 section 20.2
multiple regression, Ch.23 section 23.2
  - sm.OLS(), Ch.22 section 22.11, Ch.23
multiplication rule, Ch.8 section 8.6
mutually exclusive, Ch.8 section 8.5

N

Naive Bayes classifier, Ch.9 section 9.10
Narayanan and Shmatikov, Ch.27 section 27.6
negative predictive value (NPV), Ch.9 section 9.8
Neyman, Jerzy, Ch.12 further-reading, Ch.13 further-reading
Nightingale, Florence, Ch.5 case-study-02
nominal, Ch.2 section 2.3
nonparametric test, Ch.21 section 21.3
  - decision guide, Appendix F Q13
nonresponse bias, Ch.4 section 4.3
normal distribution, Ch.10 section 10.5
  - scipy.stats.norm, Ch.10 section 10.7
  - standard normal, Ch.10 section 10.6
normality assumption, Ch.15 section 15.6
np.random.choice(), Ch.11 section 11.3, Ch.18 section 18.3
null hypothesis (H0), Ch.13 section 13.3
numerical variable, Ch.2 section 2.2

O

observation, Ch.1 section 1.1
observational study, Ch.4 section 4.1
observational unit, Ch.2 section 2.1
observed frequency, Ch.19 section 19.2
Obermeyer et al. (2019), Ch.1, Ch.26 section 26.5
odds, Ch.24 section 24.3
odds ratio, Ch.24 section 24.6
Okafor, Sam [anchor example], Ch.1 section 1.5, Ch.6 section 6.1, Ch.8 section 8.2, Ch.10 section 10.3, Ch.11 section 11.5, Ch.12 section 12.6, Ch.13 section 13.3, Ch.14 section 14.5, Ch.15 section 15.9, Ch.16 section 16.4, Ch.17 section 17.7, Ch.24 section 24.11, Ch.26 section 26.12
OLS (Ordinary Least Squares), sm.OLS(), Ch.22 section 22.11, Ch.23
one-sample t-test, Ch.15 section 15.2
  - stats.ttest_1samp(), Ch.13 section 13.13, Ch.15
one-sample z-test for proportions, Ch.14 section 14.3
  - proportions_ztest(), Ch.14 section 14.8
one-tailed test, Ch.13 section 13.8
one-way ANOVA, see ANOVA
O'Neil, Cathy, Ch.26 further-reading
open data, Ch.27 section 27.5
Open Science Collaboration (2015), Ch.1 case-study-01, Ch.17 section 17.10, Ch.27
optional stopping, Ch.27 section 27.5
ordinal, Ch.2 section 2.3
outcome, Ch.8 section 8.2
outlier, Ch.5 section 5.7, Ch.6 section 6.9, Appendix F Q9
  - IQR method, Ch.6 section 6.4
  - z-score method, Ch.6 section 6.8
overfitting, Ch.26 section 26.4

P

p-hacking, Ch.1 case-study-01, Ch.13 section 13.12, Ch.17 section 17.9, Ch.27 section 27.5
p-value, Ch.13 section 13.5
  - correct interpretation, Ch.13 section 13.5, Appendix F Q2
  - five misconceptions, Ch.13 section 13.6
paired t-test, Ch.16 section 16.4
  - stats.ttest_rel(), Ch.16 section 16.9
pandas, Ch.3 section 3.4, Appendix B section B.2
parameter, Ch.2 section 2.4
pd.crosstab(), Ch.8 section 8.7, Ch.19
pd.cut(), Ch.7 section 7.7
pd.melt(), Ch.7 section 7.9
pd.read_csv(), Ch.3 section 3.5
Pearson, Egon, Ch.13 further-reading
Pearson, Karl, Ch.19 further-reading, Ch.22 further-reading
Pearson's r, Ch.22 section 22.3
  - stats.pearsonr(), Ch.22 section 22.3
percentile, Ch.6 section 6.4
permutation test, Ch.18 section 18.6
pie chart, Ch.5 section 5.3
placebo, Ch.4 section 4.7
plus-four method, Ch.14 section 14.6
point estimate, Ch.12 section 12.2
Polya, George, Ch.11 further-reading
pooled standard error, Ch.16 section 16.3
population, Ch.1 section 1.1
population proportion, Ch.14 section 14.3
positive predictive value (PPV), Ch.9 section 9.8
posterior probability, Ch.9 section 9.13
power, see statistical power
power analysis, Ch.17 section 17.7
  - TTestIndPower, Ch.17 section 17.7
power curve, Ch.17 section 17.7
practical significance, Ch.17 section 17.11, Appendix F Q4
precision (classification), Ch.24 section 24.8
prediction vs. inference, Ch.26 section 26.7
pre-registration, Ch.13 section 13.12, Ch.27 section 27.5
prior probability, Ch.9 section 9.13
PRISM Rating, Ch.26 section 26.12, Ch.26 case-study-02
probability, Ch.8 section 8.2
  - classical, Ch.8 section 8.2
  - relative frequency, Ch.8 section 8.2
  - subjective, Ch.8 section 8.2
probability density function (PDF), Ch.10 section 10.4
probability distribution, Ch.10 section 10.2
proportion_confint(), Ch.12 section 12.7, Ch.14 section 14.8
proportions_ztest(), Ch.14 section 14.8, Ch.16 section 16.9
ProPublica, Ch.16 case-study-02, Ch.24 section 24.12, Ch.26 case-study-01
prosecutor's fallacy, Ch.9 section 9.3
proxy variable, Ch.16 case-study-02, Ch.24 section 24.12, Ch.26 section 26.5
publication bias, Ch.13 section 13.12, Ch.17 section 17.9
Python reference, Appendix B

Q

QQ-plot, Ch.10 section 10.9
  - stats.probplot(), Ch.10 section 10.9
QRPs, see questionable research practices
quartile, Ch.6 section 6.4
questionable research practices (QRPs), Ch.27 section 27.5

R

R-squared (coefficient of determination), Ch.22 section 22.8, Ch.23 section 23.5
  - adjusted, Ch.23 section 23.5
random sample, Ch.4 section 4.2
random variable, Ch.10 section 10.2
randomization, Ch.4 section 4.5
range, Ch.6 section 6.4
rank, Ch.21 section 21.4
recall, Ch.24 section 24.8
recoding, Ch.7 section 7.6
recommendation algorithm, Ch.26 section 26.2
registered reports, Ch.17 section 17.10, Ch.27 section 27.5
regression line, Ch.22 section 22.6
regression to the mean, Ch.22 section 22.9
regplot() (seaborn), Ch.22 section 22.11
re-identification, Ch.27 section 27.6
rejection region, Ch.13 section 13.7
relative frequency, Ch.5 section 5.4
replication crisis, Ch.1 case-study-01, Ch.13, Ch.17 section 17.10, Ch.27
reproducibility, Ch.7 section 7.10, Ch.25 section 25.15
resampling, Ch.18 section 18.3
residual, Ch.22 section 22.6
  - diagnostics, Ch.22 section 22.16, Ch.23
resistant measure, Ch.6 section 6.1
response bias, Ch.4 section 4.3
rights-based ethics, Ch.27 section 27.8
Rivera, Alex [anchor example], Ch.1 section 1.5, Ch.4 case-study-02, Ch.8 section 8.5, Ch.12 section 12.2, Ch.13 section 13.14, Ch.16 section 16.3, Ch.24 section 24.10, Ch.26 section 26.2
ROC curve, Ch.24 section 24.9
  - roc_curve() (sklearn), Ch.24 section 24.9
robustness, Ch.15 section 15.7
roc_auc_score() (sklearn), Ch.24 section 24.9
Rosenthal, Robert, Ch.17 section 17.9
Rosling, Hans, Ch.1 case-study-02
Rubin, Donald, Ch.7 section 7.2

S

sample, Ch.1 section 1.1
sample proportion (p-hat), Ch.14 section 14.3
sample size determination, Ch.12 section 12.9, Ch.17 section 17.7
sample space, Ch.8 section 8.2
sampling bias, Ch.4 section 4.3, Ch.26 section 26.3
sampling distribution, Ch.11 section 11.2
  - of the mean, Ch.11 section 11.2
  - of the proportion, Ch.11 section 11.5
sampling variability, Ch.11 section 11.2
SAT/ACT score distributions, Ch.10 case-study-01
scatterplot, Ch.5 section 5.9, Ch.22 section 22.2
scipy.stats, see individual function names
seaborn, Ch.5 section 5.2, Appendix B section B.3
selection bias, Ch.4 section 4.3
sensitivity (true positive rate), Ch.9 section 9.8, Ch.24 section 24.5
Shapiro-Wilk test, Ch.10 section 10.9
  - stats.shapiro(), Ch.10 section 10.9
Shewhart, Walter, Ch.6 case-study-02, Ch.11 case-study-02
sigmoid function, Ch.24 section 24.2
sign test, Ch.21 section 21.5
significance level (alpha), Ch.13 section 13.7
Simmons, Nelson, and Simonsohn (2011), Ch.13, Ch.17
simple linear regression, Ch.22 section 22.6
Simpson's paradox, Ch.27 section 27.2
simulation-based inference, Ch.18 section 18.2
skewed left, Ch.5 section 5.7
skewed right, Ch.5 section 5.7
slope, Ch.22 section 22.6
small multiples, Ch.25 section 25.2
specificity (true negative rate), Ch.9 section 9.8, Ch.24 section 24.5
Sports Illustrated Jinx, Ch.22 section 22.9
spurious correlation, Ch.22 section 22.5
standard deviation, Ch.6 section 6.5
  - .std(), Ch.6 section 6.5.1
standard error, Ch.11 section 11.6
  - .sem() (pandas), Ch.12 section 12.7
standard normal distribution, Ch.10 section 10.6
standardized residual, Ch.19 section 19.8
statistic, Ch.2 section 2.4
statistical power, Ch.17 section 17.6
statistical significance, Ch.13 section 13.7, Appendix F Q4
  - vs. practical significance, Ch.17 section 17.11
statistical tables, Appendix A
statistical thinking, Ch.1 section 1.4
statistics (definition), Ch.1 section 1.1
STATS checklist, Ch.26 section 26.9
statsmodels, see OLS, Logit, proportions_ztest, pairwise_tukeyhsd, TTestIndPower
stem-and-leaf plot, Ch.5 section 5.5
Stevens, S. S., Ch.2 section 2.6
stratified sampling, Ch.4 section 4.2
StreamVibe [fictional company], Ch.1 section 1.5, throughout [AE: Alex Rivera]
Student, see Gosset, William Sealy
study design evaluation checklist, Appendix E section E.3
success-failure condition, Ch.14 section 14.3
sum of squares, Ch.20 section 20.4
supervised learning, Ch.26 section 26.2
survivorship bias, Ch.4 section 4.3
Sweeney, Latanya, Ch.27 section 27.6
symmetric, Ch.5 section 5.7
systematic sampling, Ch.4 section 4.2

T

t-distribution, Ch.12 section 12.4, Ch.15 section 15.2, Appendix A section A.2
  - scipy.stats.t, Ch.12 section 12.7
Taleb, Nassim Nicholas, Ch.10 further-reading
technical report, Ch.25 section 25.11
10% condition, Ch.11 section 11.11
test of independence, Ch.19 section 19.5
test statistic, Ch.13 section 13.4
threshold (classification), Ch.24 section 24.8
tidy data, Ch.7 section 7.9
train_test_split() (sklearn), Ch.24 section 24.10
training data, Ch.26 section 26.3
treatment group, Ch.4 section 4.7
tree diagram, Ch.9 section 9.5
ttest_1samp(), Ch.13 section 13.13, Ch.15
ttest_ind(), Ch.16 section 16.9
ttest_rel(), Ch.16 section 16.9
Tufte, Edward, Ch.25 section 25.2
Tukey, John, Ch.6 section 6.6, Ch.20 section 20.8
Tukey's HSD, Ch.20 section 20.8
  - pairwise_tukeyhsd(), Ch.20 section 20.8
Tuskegee syphilis study, Ch.4 section 4.7, Ch.27 section 27.6
Tversky, Amos, Ch.8, Ch.9
two-proportion z-test, Ch.16 section 16.6
two-sample t-test, Ch.16 section 16.3
two-tailed test, Ch.13 section 13.8
Type I error, Ch.13 section 13.9, Appendix F Q6
Type II error, Ch.13 section 13.9, Appendix F Q6

U

UCI Machine Learning Repository, Appendix D
unbiased estimator, Ch.11 section 11.4
unimodal, Ch.5 section 5.7
unsupervised learning, Ch.26 section 26.2
utilitarian ethics, Ch.27 section 27.8

V

.value_counts(), Ch.3 section 3.6
variable, Ch.1 section 1.1
variance, Ch.6 section 6.5
  - .var(), Ch.6 section 6.5.1
variance inflation factor, see VIF
Verhulst, Pierre-Francois, Ch.24 section 24.2
VIF (Variance Inflation Factor), Ch.23 section 23.8
vos Savant, Marilyn, Ch.8 case-study-01

W

Wald, Abraham, Ch.4 section 4.3
Wald interval, Ch.12 section 12.7, Ch.14 section 14.6
Wallis, W. Allen, Ch.21 further-reading
Washington, Professor James [anchor example], Ch.1 section 1.5, Ch.8 section 8.4, Ch.9 section 9.3, Ch.10 section 10.11, Ch.13 section 13.9, Ch.16 section 16.6, Ch.19, Ch.24 section 24.12, Ch.26 section 26.5, Ch.27 section 27.7
Welch's ANOVA, Ch.20 section 20.9
Welch's t-test, Ch.16 section 16.3
Wickham, Hadley, Ch.7 section 7.9
Wilcoxon, Frank, Ch.21 further-reading
Wilcoxon rank-sum test, Ch.21 section 21.6
  - stats.mannwhitneyu(), Ch.21 section 21.6
Wilcoxon signed-rank test, Ch.21 section 21.7
  - stats.wilcoxon(), Ch.21 section 21.7
Williams, Daria [fictional], Ch.1 section 1.5, Ch.10 section 10.3, Ch.11 section 11.5, Ch.13 section 13.3, Ch.14 section 14.5, Ch.17 section 17.7
Wilson, Edwin, Ch.14 further-reading
Wilson interval, Ch.14 section 14.6
  - proportion_confint(method='wilson'), Ch.14 section 14.8
winner's curse, Ch.17 section 17.6
with replacement, Ch.18 section 18.3
within-group variability, Ch.20 section 20.3
Wong's palette, Ch.25 section 25.5
World Bank, Appendix D
World Happiness Report, Appendix D
World Health Organization (WHO), Appendix D

Z

z-score, Ch.6 section 6.8, Ch.10 section 10.6
z-table, Ch.10 section 10.6, Appendix A section A.1