Further Reading: Nonparametric Methods

Books

For Deeper Understanding

Myles Hollander, Douglas A. Wolfe, and Eric Chicken, Nonparametric Statistical Methods, 3rd edition (2014). The definitive reference on nonparametric statistics. Hollander, Wolfe, and Chicken cover every test in this chapter in exhaustive detail — exact distributions, large-sample approximations, tied observations, confidence intervals, and power calculations. The third edition adds modern computational methods. If you pursue research that relies heavily on nonparametric methods, this is the book to own. Graduate-level but clearly written.

W. J. Conover, Practical Nonparametric Statistics, 3rd edition (1999). More accessible than Hollander and Wolfe, Conover's text is organized around test selection rather than mathematical theory. Each chapter begins with the question you're trying to answer and walks through the appropriate nonparametric test. His comparison tables — showing when each test is preferred over its parametric counterpart — are particularly useful. The title word "Practical" is earned.

Erich L. Lehmann, Nonparametrics: Statistical Methods Based on Ranks, revised edition (2006). Lehmann's classic text provides a deeper mathematical treatment of rank-based methods. If you're interested in why rank-based tests work — the mathematical theory behind the Wilcoxon tests, efficiency calculations, and the asymptotic relative efficiency results mentioned in this chapter — this is the authoritative source. The 2006 revision with Howard D'Abrera added modern material. For mathematically inclined students.

For the Conceptually Curious

Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference (2004). Wasserman's graduate-level overview includes a clear chapter on nonparametric methods that places them in the broader context of statistical inference. His treatment connects nonparametric tests to permutation tests (Chapter 18 of this textbook) and shows how both arise from the same fundamental principle: testing hypotheses without specifying a parametric model. Chapter 9 is particularly relevant.

Rand R. Wilcox, Introduction to Robust Estimation and Hypothesis Testing, 4th edition (2022). Wilcox goes beyond traditional nonparametric methods to cover modern robust statistical techniques — trimmed means, M-estimators, and bootstrap methods. His central argument: classical methods (t-tests, ANOVA) can be unreliable with non-normal data even with large samples, and robust alternatives should be the default. A provocative and well-supported perspective that challenges the conventional approach.

Articles and Papers

Wilcoxon, F. (1945). "Individual Comparisons by Ranking Methods." Biometrics Bulletin, 1(6), 80-83. The original three-page paper that introduced both the rank-sum test and the signed-rank test. Wilcoxon's presentation is remarkably concise — he derives both tests in under three pages, with worked examples. It's a masterclass in clear scientific writing and is easily readable by introductory statistics students. Available free through JSTOR.

Mann, H. B., & Whitney, D. R. (1947). "On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other." Annals of Mathematical Statistics, 18(1), 50-60. Mann and Whitney's extension of Wilcoxon's rank-sum test, providing the U-statistic formulation and proving its theoretical properties. More mathematical than Wilcoxon's paper but foundational for understanding what the Mann-Whitney U test actually tests (stochastic dominance, not just median comparison).

Kruskal, W. H., & Wallis, W. A. (1952). "Use of Ranks in One-Criterion Variance Analysis." Journal of the American Statistical Association, 47(260), 583-621. The paper introducing the Kruskal-Wallis test as a nonparametric ANOVA. Kruskal and Wallis provide both the test statistic derivation and extensive tables of critical values. Their discussion of why the chi-square approximation works — connecting to the central limit theorem for rank statistics — is insightful.

Hodges, J. L., & Lehmann, E. L. (1956). "The Efficiency of Some Nonparametric Competitors of the t-Test." Annals of Mathematical Statistics, 27(2), 324-335. The paper that established the asymptotic relative efficiency (ARE) results cited in this chapter. Hodges and Lehmann proved that the Wilcoxon test is at least 86.4% as efficient as the t-test for any continuous distribution, and exactly 95.5% as efficient for normal distributions. Their work showed that the "price" of nonparametric robustness is far lower than previously assumed.
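For the mathematically curious, those figures come from the standard Pitman efficiency formula. As a sketch: for a continuous density $f$ with variance $\sigma^2$, the shift-alternative ARE of the Wilcoxon rank-sum test relative to the t-test is

$$\mathrm{ARE}(W, t) = 12\,\sigma^2 \left( \int_{-\infty}^{\infty} f^2(x)\,dx \right)^{2}.$$

For the normal density, $\int f^2 = 1/(2\sigma\sqrt{\pi})$, giving $\mathrm{ARE} = 12\sigma^2 \cdot \tfrac{1}{4\pi\sigma^2} = 3/\pi \approx 0.955$. Hodges and Lehmann's contribution was to show that this expression is never smaller than $108/125 = 0.864$ over all continuous distributions with finite variance.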

Fagerland, M. W. (2012). "t-tests, non-parametric tests, and large studies — a paradox of statistical practice?" BMC Medical Research Methodology, 12, 78. A thought-provoking paper arguing that nonparametric tests are sometimes used unnecessarily with large samples (where the CLT ensures the t-test works fine) and underused with small samples (where they're genuinely needed). Fagerland provides practical guidelines for choosing between parametric and nonparametric approaches — directly relevant to the decision framework in Section 21.9.

de Winter, J. C. F., & Dodou, D. (2010). "Five-Point Likert Items: t-test versus Mann-Whitney-Wilcoxon." Practical Assessment, Research & Evaluation, 15, Article 11. A simulation study comparing the t-test and Mann-Whitney U test specifically for 5-point Likert scale data — exactly the kind of ordinal data Maya encounters. The authors find that the t-test and Mann-Whitney give similar results for Likert data under most conditions, but the Mann-Whitney is preferred when sample sizes are small or unequal. Directly addresses the "can I use a t-test on Likert data?" question.
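A toy version of this kind of simulation is easy to run yourself. The sketch below draws two groups of 5-point Likert responses and compares the two tests' p-values; the group sizes and response probabilities are illustrative assumptions, not the paper's actual settings.

```python
# Toy comparison of the t-test and Mann-Whitney U test on 5-point Likert data.
# Group sizes and response probabilities are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
levels = np.arange(1, 6)  # the five Likert categories

# Two groups with a small shift in response probabilities (assumed values)
p_a = [0.10, 0.20, 0.30, 0.25, 0.15]
p_b = [0.05, 0.15, 0.30, 0.30, 0.20]

group_a = rng.choice(levels, size=30, p=p_a)
group_b = rng.choice(levels, size=30, p=p_b)

t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p-value:       {t_p:.3f}")
print(f"Mann-Whitney p-value: {u_p:.3f}")
```

Repeating this over many simulated datasets (and varying the sample sizes) is exactly how simulation studies like de Winter and Dodou's compare the two tests' behavior.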

Online Resources

Penn State STAT 500: "Nonparametric Tests" https://online.stat.psu.edu/stat500/lesson/11

A well-structured free tutorial covering the sign test, Wilcoxon tests, and Kruskal-Wallis test with worked examples and practice problems. The visual approach — showing the ranking procedure step by step — complements the treatment in this chapter.

StatQuest: "Wilcoxon Rank Sum Test / Mann-Whitney U Test, Clearly Explained" https://www.youtube.com/watch?v=BT1FKd1Qzjw

Josh Starmer's visual explanation of the Mann-Whitney U test uses animations to show how ranking works and why it makes the test robust to outliers. His trademark "Clearly Explained" approach is ideal for building intuition before tackling the formulas. Previously recommended for hypothesis testing (Chapter 13) and ANOVA (Chapter 20).

Real Statistics Using Excel https://real-statistics.com/non-parametric-tests/

Charles Zaiontz's website provides step-by-step Excel implementations of every nonparametric test in this chapter, along with the free Real Statistics Resource Pack add-in. If you need to perform nonparametric tests in Excel (because your workplace doesn't use Python), this is the best free resource available.

SciPy Documentation: scipy.stats https://docs.scipy.org/doc/scipy/reference/stats.html

The official documentation for the Python functions used in this chapter: mannwhitneyu(), wilcoxon(), kruskal(), and binomtest(). Each function page includes parameter descriptions, mathematical definitions, and usage examples. Bookmark this as your go-to reference for implementation details.
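For quick reference, a minimal sketch of calling those four functions on small made-up samples (the data values are illustrative, not from the chapter):

```python
# Calling the four scipy.stats functions named above on illustrative data.
from scipy import stats

# Paired measurements (e.g. before/after) plus an independent third sample
before  = [72, 68, 75, 80, 66, 71, 74, 69]
after   = [70, 65, 76, 74, 62, 66, 67, 61]
group_c = [85, 82, 88, 90, 84]

# Two independent samples: Mann-Whitney U (a.k.a. Wilcoxon rank-sum)
u = stats.mannwhitneyu(before, group_c, alternative="two-sided")

# Paired samples: Wilcoxon signed-rank test
w = stats.wilcoxon(before, after)

# Three samples: Kruskal-Wallis H test
k = stats.kruskal(before, after, group_c)

# Sign test via an exact binomial test on the direction of each paired difference
wins = sum(b > a for b, a in zip(before, after))
sign = stats.binomtest(wins, n=len(before), p=0.5)

print(f"Mann-Whitney U: U = {u.statistic:.1f}, p = {u.pvalue:.4f}")
print(f"Signed-rank:    W = {w.statistic:.1f}, p = {w.pvalue:.4f}")
print(f"Kruskal-Wallis: H = {k.statistic:.2f}, p = {k.pvalue:.4f}")
print(f"Sign test:      p = {sign.pvalue:.4f}")
```

Each result object exposes `.statistic` and `.pvalue`; the documentation pages describe the additional options (exact versus normal-approximation methods, one-sided alternatives, tie handling).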

Connections to Future Chapters

Chapter 22 (Correlation and Simple Linear Regression): The ranking logic from this chapter extends naturally to correlation. Spearman's rank correlation ($r_s$) is the nonparametric alternative to Pearson's correlation ($r$): it converts both variables to ranks and then computes Pearson's $r$ on those ranks. This makes Spearman's correlation robust to outliers and appropriate for ordinal data — exactly the same advantages that rank-based tests provide over t-tests and ANOVA. When you encounter Spearman's $r_s$ in Chapter 22, you'll recognize it as the correlation equivalent of the Wilcoxon rank-sum test.
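That equivalence is easy to verify directly. A small sketch with made-up data (the values, including the outlier-scale y, are illustrative):

```python
# Verifying that Spearman's r_s equals Pearson's r computed on ranks.
import numpy as np
from scipy import stats

x = np.array([3.1, 1.4, 9.8, 5.5, 2.2, 7.7, 4.0])
y = np.array([2.0, 1.1, 50.0, 6.3, 1.9, 5.0, 12.4])  # note the 50.0 outlier

r_s, _ = stats.spearmanr(x, y)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
r_raw, _ = stats.pearsonr(x, y)  # Pearson on the raw data, pulled by the outlier

print(f"Spearman r_s:     {r_s:.6f}")
print(f"Pearson on ranks: {r_on_ranks:.6f}")  # identical to Spearman
print(f"Pearson on raw:   {r_raw:.6f}")
```

The first two numbers agree because `spearmanr` is, by definition, Pearson's formula applied to ranks; the third differs because the raw-data correlation is sensitive to the outlier.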

Chapter 23 (Multiple Regression): Robust regression techniques address the same problem for regression that nonparametric methods address for group comparisons: what do you do when the usual assumptions (normality, homoscedasticity) fail? Techniques like robust standard errors (HC standard errors) and quantile regression extend the "robust methods for messy reality" theme to the regression setting.

Chapter 26 (Statistics and AI): Many machine learning methods are inherently nonparametric — they make no assumptions about the distribution of the data. Random forests, support vector machines, and k-nearest neighbors don't assume normality or linearity. The philosophy behind nonparametric statistics — let the data speak without imposing distributional assumptions — is exactly the philosophy behind many modern AI methods.

Historical Note: Wilcoxon and the Chemical Company

Frank Wilcoxon (1892-1965) was not a professor of statistics. He was an industrial chemist who worked at American Cyanamid Company, a chemical manufacturer. His 1945 paper introducing both the rank-sum and signed-rank tests grew from the practical needs of chemical research: comparing the effectiveness of insecticides, fungicides, and industrial processes, often with small samples and no guarantee of normality.

Wilcoxon's key insight — that replacing measurements with their ranks would produce a valid test without distributional assumptions — was born from necessity, not mathematical elegance. He needed tests that would work with the messy, small-sample data that industrial research produced, and the existing parametric tests required assumptions that his data couldn't satisfy.

What's remarkable about Wilcoxon's 1945 paper is its brevity and clarity. In just three pages, he introduced two tests that would become among the most widely used in all of statistics. He provided worked examples, critical value tables, and practical guidance — all in a format that working scientists could immediately apply.

The lesson for students: some of the most powerful statistical tools were created not by mathematicians seeking theoretical elegance, but by practitioners who needed methods that worked with real data. That's exactly the spirit of this chapter: when the elegant assumptions of parametric methods fail, practical tools step in.

Henry Mann and Donald Whitney extended Wilcoxon's rank-sum test two years later (1947), deriving the U-statistic formulation and proving the test's theoretical properties more rigorously. The dual naming — "Wilcoxon rank-sum test" and "Mann-Whitney U test" — reflects this parallel development and persists in different textbooks and software packages to this day.

William Kruskal and W. Allen Wallis extended the approach to multiple groups in 1952, completing the nonparametric toolkit that mirrors the parametric progression from two-sample t-test to one-way ANOVA.