Chapter 21: Further Reading

Exploratory Data Analysis of Market Data


Foundational EDA

  1. Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley. The seminal work that established EDA as a discipline. Tukey's emphasis on visualization, resistance to outliers, and iterative analysis remains as relevant today as when it was written. Essential for understanding the philosophy behind EDA.

  2. Cleveland, W. S. (1993). Visualizing Data. Hobart Press. A practical guide to statistical graphics, emphasizing the principles of effective data visualization. Cleveland's work on scatterplot smoothing, trellis displays, and graphical methods for data exploration is directly applicable to prediction market analysis.

  3. Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media. While written for R, the EDA framework and philosophy presented in this book are language-agnostic. The chapter on EDA provides an excellent modern framework for systematic data exploration.


Time Series Analysis

  1. Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press. The definitive graduate-level textbook on time series analysis, including extensive coverage of ARIMA models, spectral analysis, and regime-switching models. Chapter 22 on Markov switching is particularly relevant to Section 21.9.

  2. Tsay, R. S. (2010). Analysis of Financial Time Series. 3rd ed. Wiley. A comprehensive treatment of financial time series methods, including volatility modeling (GARCH), multivariate analysis, and non-linear models. The examples are from traditional finance but the methods transfer directly to prediction markets.

  3. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice. 3rd ed. OTexts. A freely available online textbook covering modern forecasting methods. The chapters on time series decomposition, autocorrelation analysis, and exponential smoothing are directly applicable. Available at https://otexts.com/fpp3/.


Volatility Modeling

  1. Bollerslev, T. (1986). "Generalized Autoregressive Conditional Heteroskedasticity." Journal of Econometrics, 31(3), 307-327. The original GARCH paper. Understanding this foundational work is essential for anyone applying volatility models to financial or prediction market data.

  2. Engle, R. F. (2001). "GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics." Journal of Economic Perspectives, 15(4), 157-168. An accessible introduction to ARCH/GARCH models by the inventor of ARCH. Explains the intuition behind volatility clustering and how these models capture it.

  3. Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). "Modeling and Forecasting Realized Volatility." Econometrica, 71(2), 579-625. Develops a framework for modeling and forecasting realized volatility, a measure built from high-frequency intraday returns. The methods are applicable to prediction markets with frequent trading, providing a model-free measure of volatility.
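
As a rough illustration of the realized volatility idea in the Andersen et al. entry, the sketch below sums squared log returns over a sampling window. The function names and the price series are hypothetical, and applying log returns to prices strictly between 0 and 1 is an assumption made for illustration, not a method taken from the paper or from this chapter.

```python
import numpy as np


def realized_variance(prices):
    """Sum of squared log returns over the sampling window.

    Assumes prices are strictly positive and equally spaced in time.
    """
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    return float(np.sum(log_returns ** 2))


def realized_volatility(prices):
    """Square root of the realized variance."""
    return float(np.sqrt(realized_variance(prices)))


# Hypothetical intraday prices for a single contract (not real data).
intraday_prices = [0.50, 0.52, 0.51, 0.55, 0.54]
rv = realized_volatility(intraday_prices)
print(f"Realized volatility over the window: {rv:.4f}")
```

The sampling interval is a design choice: sampling too finely can let bid-ask bounce and other microstructure noise dominate the estimate, so coarser intervals are often preferred in thinly traded markets.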


Regime Detection and Change-Point Analysis

  1. Rabiner, L. R. (1989). "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE, 77(2), 257-286. The classic tutorial on HMMs. Despite its focus on speech recognition, this paper provides the clearest explanation of the forward-backward algorithm, Viterbi decoding, and Baum-Welch parameter estimation.

  2. Hamilton, J. D. (1989). "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica, 57(2), 357-384. Introduces the Markov-switching model for economic time series. This paper laid the foundation for regime-switching models in finance and economics.

  3. Killick, R., Fearnhead, P., & Eckley, I. A. (2012). "Optimal Detection of Changepoints with a Linear Computational Cost." Journal of the American Statistical Association, 107(500), 1590-1598. Introduces the PELT algorithm for efficient change-point detection. PELT is one of the search methods implemented in the Python ruptures library used in this chapter.

  4. Adams, R. P., & MacKay, D. J. C. (2007). "Bayesian Online Changepoint Detection." arXiv preprint arXiv:0710.3742. Presents an elegant Bayesian approach to online change-point detection. Particularly relevant for real-time monitoring of prediction markets.

  5. Truong, C., Oudre, L., & Vayatis, N. (2020). "Selective Review of Offline Change Point Detection Methods." Signal Processing, 167, 107299. A comprehensive review of change-point detection methods, covering both parametric and non-parametric approaches, with guidance on method selection. Accompanies the ruptures Python library.


Distribution Analysis

  1. Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous Univariate Distributions. Vol. 2. 2nd ed. Wiley. The definitive reference on continuous distributions, including extensive coverage of the Beta distribution and its properties. Essential for understanding the distributional models discussed in Section 21.7.

  2. Hartigan, J. A., & Hartigan, P. M. (1985). "The Dip Test of Unimodality." The Annals of Statistics, 13(1), 70-84. Introduces the dip test for unimodality, referenced in Section 21.7.2. The test is a principled way to detect departures from unimodality, such as bimodal distributions, in prediction market price data.


Prediction Markets (General)

  1. Wolfers, J., & Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126. An excellent introduction to prediction markets from an economics perspective. Discusses calibration, market design, and the interpretation of prediction market prices as probabilities.

  2. Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., ... & Zitzewitz, E. (2008). "The Promise of Prediction Markets." Science, 320(5878), 877-878. A brief but influential manifesto on the value of prediction markets, signed by prominent economists. Provides context for why rigorous EDA of prediction market data matters.

  3. Manski, C. F. (2006). "Interpreting the Predictions of Prediction Markets." Economics Letters, 91(3), 425-429. A critical examination of whether prediction market prices can be interpreted as probabilities. Relevant to understanding what EDA of prediction market prices actually tells us about underlying beliefs.


Statistical Testing

  1. Box, G. E. P., & Pierce, D. A. (1970). "Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models." Journal of the American Statistical Association, 65(332), 1509-1526. Introduces the Box-Pierce test for serial correlation, the precursor to the Ljung-Box test used in Section 21.6.

  2. Ljung, G. M., & Box, G. E. P. (1978). "On a Measure of Lack of Fit in Time Series Models." Biometrika, 65(2), 297-303. The original Ljung-Box test paper. This remains the standard test for serial correlation in time series residuals.
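
To make the relationship between the two tests above concrete, the sketch below computes both statistics from sample autocorrelations: Box-Pierce is n times the sum of squared autocorrelations, while Ljung-Box reweights each lag by (n + 2) / (n - k). The function names and the simulated series are hypothetical, and the code covers only the statistics themselves, not the chi-square p-values.

```python
import numpy as np


def sample_autocorr(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array(
        [np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)]
    )


def box_pierce_q(x, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(x)
    rho = sample_autocorr(x, max_lag)
    return float(n * np.sum(rho ** 2))


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic: n * (n + 2) * sum of rho_k^2 / (n - k)."""
    n = len(x)
    rho = sample_autocorr(x, max_lag)
    lags = np.arange(1, max_lag + 1)
    return float(n * (n + 2) * np.sum(rho ** 2 / (n - lags)))


# Hypothetical series: simulated white noise standing in for price changes.
rng = np.random.default_rng(0)
changes = rng.normal(size=200)
print("Box-Pierce Q:", box_pierce_q(changes, max_lag=10))
print("Ljung-Box Q: ", ljung_box_q(changes, max_lag=10))
```

Because (n + 2) / (n - k) exceeds 1 at every lag, the Ljung-Box statistic is never smaller than Box-Pierce on the same data, and the gap shrinks as the sample grows. For routine use, statsmodels provides acorr_ljungbox in statsmodels.stats.diagnostic, which also returns p-values.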


Visualization and Communication

  1. Tufte, E. R. (2001). The Visual Display of Quantitative Information. 2nd ed. Graphics Press. The classic work on data visualization principles. Tufte's concepts of data-ink ratio, chartjunk, and small multiples are directly applicable to prediction market visualization.

  2. Wilke, C. O. (2019). Fundamentals of Data Visualization. O'Reilly Media. A modern, practical guide to data visualization. Covers color theory, coordinate systems, and visualization of distributions and time series. Available freely at https://clauswilke.com/dataviz/.


Python Libraries and Tools

  1. McKinney, W. (2017). Python for Data Analysis. 2nd ed. O'Reilly Media. The definitive guide to pandas, the Python library used throughout this chapter for data manipulation. Essential for anyone working with prediction market data in Python.

  2. VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly Media. Comprehensive coverage of NumPy, pandas, matplotlib, and scikit-learn. The chapters on pandas and matplotlib are particularly relevant to EDA and visualization. Available freely at https://jakevdp.github.io/PythonDataScienceHandbook/.


Software Documentation

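  • statsmodels: https://www.statsmodels.org/ --- Python library for statistical modeling, including time series analysis, autocorrelation tests, and Markov-switching models. GARCH estimation is provided by the separate arch package rather than by statsmodels itself.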
  • hmmlearn: https://hmmlearn.readthedocs.io/ --- Python library for Hidden Markov Models.
  • ruptures: https://centre-borelli.github.io/ruptures-docs/ --- Python library for change-point detection.
  • plotly: https://plotly.com/python/ --- Interactive visualization library used for the charts in Section 21.10.
  • scipy.stats: https://docs.scipy.org/doc/scipy/reference/stats.html --- Statistical functions including distribution fitting, hypothesis tests, and descriptive statistics.
  • seaborn: https://seaborn.pydata.org/ --- Statistical data visualization built on matplotlib, used for heatmaps and distribution plots.