Appendix F: Notation and Symbols Guide
This appendix provides a comprehensive reference for all mathematical, statistical, set-theoretic, graphical, information-theoretic, and machine learning notation used throughout the textbook. Symbols are organized by domain and accompanied by plain-language definitions, the mathematical form where relevant, and chapter references. This appendix is especially useful when reading Chapters 22–24 (computational methods) and Chapter 28 (Bayesian reasoning).
F.1 Probability Notation
Probability statements are fundamental to Chapters 3, 13, 16, 28, and Appendix A.
| Symbol | Name | Plain-language meaning | Example |
|---|---|---|---|
| P(A) | Probability of A | The chance that event A occurs; a number between 0 and 1 | P(article is false) = 0.08 |
| P(Aᶜ) or P(not A) or P(Ā) | Complement of A | The chance that A does NOT occur; equals 1 − P(A) | P(article is true) = 0.92 |
| P(A ∩ B) or P(A and B) | Joint probability | The chance that both A and B occur together | P(false AND shared) |
| P(A ∪ B) or P(A or B) | Union probability | The chance that at least one of A or B occurs | P(false OR sensationalized) |
| P(A | B) | Conditional probability | The chance of A occurring given that B has already occurred | P(false | emotionally charged) |
| P(B | A) | Likelihood | The probability of observing B if A is the true state | P(positive test | disease) |
Bayes' Theorem in Full Notation
P(A|B) = P(B|A) × P(A) / P(B)
Expanded form using the law of total probability:
P(A|B) = P(B|A) × P(A) / [P(B|A)×P(A) + P(B|Aᶜ)×P(Aᶜ)]
| Component | Name | Meaning |
|---|---|---|
| P(A) | Prior | Belief in A before seeing evidence B |
| P(A|B) | Posterior | Updated belief in A after seeing evidence B |
| P(B|A) | Likelihood | How probable is B if A is true? |
| P(B) | Marginal likelihood | Total probability of observing B under all scenarios |
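The components above can be checked numerically. In the Python sketch below, all numbers are illustrative: the prior echoes the P(article is false) = 0.08 example from the table in Section F.1, and both likelihoods are assumptions made for the demonstration.

```python
# Bayes' theorem via the law of total probability (illustrative numbers).
p_A = 0.08             # prior P(A): article is false
p_B_given_A = 0.70     # likelihood P(B|A): charged wording if false
p_B_given_notA = 0.20  # P(B|A^c): charged wording if true

# Marginal likelihood: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

Here the posterior rises from 0.08 to roughly 0.23: evidence that is 3.5 times likelier under A than under Aᶜ shifts belief substantially, but not decisively.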
Additional Probability Notation
| Symbol | Meaning |
|---|---|
| Ω | Sample space — the set of all possible outcomes |
| ∅ | Empty set — the impossible event; P(∅) = 0 |
| P(A ∩ B) = P(A) × P(B) | Independence condition — A and B are independent if this holds |
| E[X] or μₓ | Expected value (mean) of random variable X |
| Var(X) or σ²ₓ | Variance of random variable X |
| SD(X) or σₓ | Standard deviation of random variable X |
| Cov(X, Y) | Covariance of X and Y |
| X ~ N(μ, σ²) | X is normally distributed with mean μ and variance σ² |
| X ~ Bin(n, p) | X follows a Binomial distribution with n trials, probability p |
| X ~ Poisson(λ) | X follows a Poisson distribution with rate λ |
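E[X], Var(X), and SD(X) can be computed directly from a probability mass function. A minimal Python sketch with an illustrative three-point distribution (not data from the text):

```python
# E[X] = Σ x·p(x) and Var(X) = E[(X − μ)²] for a discrete pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative P(X = x)

mean = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X)
sd = var ** 0.5                                         # SD(X)
```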
F.2 Statistical Notation
Used throughout Chapters 27–29 and Appendix A.
Descriptive Statistics
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| x̄ (x-bar) | Sample mean | x̄ = Σxᵢ / n | Average of observed values |
| μ (mu) | Population mean | μ = Σxᵢ / N | True average of all values in the population |
| s | Sample standard deviation | s = √[Σ(xᵢ − x̄)² / (n−1)] | Spread of sample values around the sample mean |
| σ (sigma) | Population standard deviation | σ = √[Σ(xᵢ − μ)² / N] | Spread of population values around the population mean |
| s² | Sample variance | s² = Σ(xᵢ − x̄)² / (n−1) | Squared spread of sample values |
| σ² | Population variance | σ² = Σ(xᵢ − μ)² / N | Squared spread of population values |
| n | Sample size | — | Number of observations in a sample |
| N | Population size | — | Total number of units in the population |
| Mdn | Median | Middle value when sorted | Value separating the upper and lower halves |
| Mo | Mode | — | Most frequently occurring value |
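Python's standard library implements these estimators directly; note that `stdev` uses the n − 1 (sample) denominator while `pstdev` uses N (population). The data below are illustrative:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample

x_bar = statistics.mean(data)    # x̄ = Σxᵢ / n
s = statistics.stdev(data)       # sample SD (n − 1 denominator)
sigma = statistics.pstdev(data)  # population SD (N denominator)
mdn = statistics.median(data)    # Mdn
mo = statistics.mode(data)       # Mo
```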
Inference and Hypothesis Testing
| Symbol | Name | Meaning |
|---|---|---|
| H₀ | Null hypothesis | The hypothesis of no effect, no difference, or no relationship |
| H₁ or Hₐ | Alternative hypothesis | The hypothesis of an effect, difference, or relationship |
| α (alpha) | Significance level | The probability threshold for rejecting H₀ (typically 0.05) |
| β (beta) | Type II error rate | The probability of failing to reject H₀ when H₁ is true |
| 1 − β | Statistical power | The probability of correctly rejecting H₀ when H₁ is true |
| p | p-value | The probability of observing a test statistic this extreme or more extreme, if H₀ were true |
| t | t-statistic | Test statistic following a t-distribution; used in t-tests |
| F | F-statistic | Test statistic following an F-distribution; used in ANOVA |
| χ² (chi-square) | Chi-square statistic | Test statistic following a chi-square distribution; used for categorical data tests |
| df | Degrees of freedom | The number of values free to vary in computing a statistic |
| CI | Confidence interval | Range of plausible values for a parameter, e.g., 95% CI [0.32, 0.48] |
| z | Z-score or Z-statistic | Standardized value: z = (x − μ)/σ; also used as a test statistic |
| SE | Standard error | Standard deviation of a sampling distribution; for the sample mean, SE = s/√n |
Correlation and Regression
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| r | Pearson correlation | Σ[(xᵢ−x̄)(yᵢ−ȳ)] / [√Σ(xᵢ−x̄)² × √Σ(yᵢ−ȳ)²] | Linear relationship strength between X and Y; ranges −1 to +1 |
| ρ (rho) | Spearman correlation | r on rank-transformed data | Monotonic relationship; robust to outliers and non-normality |
| R² | Coefficient of determination | R² = r² (for simple regression) | Proportion of variance in Y explained by X |
| β₀ | Intercept | Predicted Y when X = 0 | Baseline level of Y independent of X |
| β₁ | Slope | Change in predicted Y per unit increase in X | Strength of linear relationship in regression |
| ε (epsilon) | Residual / error | ε = Y − Ŷ | Unexplained deviation between observed and predicted Y |
| Ŷ (Y-hat) | Predicted value | Ŷ = β₀ + β₁X | Fitted value from the regression model |
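The regression quantities above all follow from three sums of squares. A minimal Python sketch on illustrative data:

```python
# Least-squares estimates of β₀ and β₁, plus r and R² (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

beta1 = sxy / sxx                        # slope β₁
beta0 = y_bar - beta1 * x_bar            # intercept β₀
r = sxy / (sxx ** 0.5 * syy ** 0.5)      # Pearson r
r_sq = r ** 2                            # R² (simple regression)
y_hat = [beta0 + beta1 * x for x in xs]  # fitted values Ŷ
```

A useful check: the residuals ε = Y − Ŷ of a least-squares fit always sum to zero.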
Effect Sizes
| Symbol | Name | Formula | Interpretation |
|---|---|---|---|
| d | Cohen's d | (μ₁ − μ₂) / σ_pooled | Standardized mean difference; 0.2 small, 0.5 medium, 0.8 large |
| g | Hedges' g | d × correction factor | Bias-corrected d for unequal sample sizes |
| OR | Odds ratio | [P(A|E)/P(not A|E)] / [P(A|not E)/P(not A|not E)] | Comparison of odds between two groups |
| RR | Relative risk | P(outcome|exposed) / P(outcome|unexposed) | Ratio of outcome probabilities between groups |
| ARR | Absolute risk reduction | P(outcome|unexposed) − P(outcome|exposed) | Absolute difference in outcome probability |
| NNT | Number needed to treat | 1 / ARR | Number of people needing treatment to prevent one outcome |
| V | Cramér's V | √(χ²/(n × min(r−1, c−1))) | Effect size for chi-square test of independence |
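The risk-based effect sizes translate directly into code. A sketch with illustrative counts (20/100 events in the exposed group, 50/100 in the unexposed group):

```python
# RR, ARR, NNT, and OR from illustrative outcome proportions.
p_exposed = 20 / 100    # P(outcome | exposed)
p_unexposed = 50 / 100  # P(outcome | unexposed)

rr = p_exposed / p_unexposed   # relative risk
arr = p_unexposed - p_exposed  # absolute risk reduction
nnt = 1 / arr                  # number needed to treat

def odds(p):
    return p / (1 - p)

odds_ratio = odds(p_exposed) / odds(p_unexposed)
```

With these counts the exposure is protective: RR = 0.4, OR = 0.25, and NNT ≈ 3.3, i.e., treating three to four people prevents one outcome.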
F.3 Set Notation
Used in formal definitions throughout the textbook, particularly in Chapters 22–24.
| Symbol | Name | Plain-language meaning | Example |
|---|---|---|---|
| ∈ | Element of | "is a member of" the set | x ∈ {1, 2, 3} means x is one of 1, 2, or 3 |
| ∉ | Not element of | "is not a member of" the set | 4 ∉ {1, 2, 3} |
| ⊂ or ⊆ | Subset | Every element of A is also in B (⊂ is often reserved for proper subsets) | {1,2} ⊂ {1,2,3} |
| ∩ | Intersection | Elements in both A and B | {1,2,3} ∩ {2,3,4} = {2,3} |
| ∪ | Union | Elements in A, in B, or in both | {1,2} ∪ {2,3} = {1,2,3} |
| ∅ | Empty set | The set with no elements | {x : x > 5 and x < 3} = ∅ |
| |A| | Cardinality | The number of elements in set A | |{1,2,3}| = 3 |
| Aᶜ or Ā | Complement | All elements NOT in A (relative to a universal set U) | If U = {1,...,5} and A = {1,2}, then Aᶜ = {3,4,5} |
| × | Cartesian product | All ordered pairs (a, b) with a ∈ A and b ∈ B | {0,1} × {0,1} = {(0,0),(0,1),(1,0),(1,1)} |
| : or | | Such that | Used in set-builder notation | {x : x > 0} means "all x such that x > 0" |
| ∀ | For all | Universal quantifier | ∀x ∈ A, f(x) > 0 means "for every x in A, f(x) is positive" |
| ∃ | There exists | Existential quantifier | ∃x ∈ A such that f(x) = 0 means "some x in A makes f zero" |
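Python's built-in `set` type mirrors most of this notation directly; a short illustrative sketch:

```python
# Set operations mirroring the notation in the table above.
A, B = {1, 2, 3}, {2, 3, 4}
U = {1, 2, 3, 4, 5}       # universal set, needed for the complement

intersection = A & B      # A ∩ B
union = A | B             # A ∪ B
complement = U - A        # Aᶜ relative to U
cardinality = len(A)      # |A|

assert 2 in A and 6 not in A   # ∈ and ∉
assert {1, 2} <= A             # {1,2} ⊆ A
assert all(x > 0 for x in A)   # ∀x ∈ A, x > 0
assert any(x == 3 for x in A)  # ∃x ∈ A such that x = 3
```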
F.4 Graph and Network Notation
Used in Chapter 23 (Network Analysis) and Chapter 16 (Information Diffusion).
| Symbol | Name | Definition |
|---|---|---|
| G | Graph | A mathematical structure G = (V, E) consisting of a set of vertices (nodes) and edges |
| V | Vertex set | The set of all nodes in a graph; |V| = n is the number of nodes |
| E | Edge set | The set of all edges (connections) in a graph; |E| = m is the number of edges |
| (u, v) | Edge | An edge connecting node u and node v |
| G = (V, E) | Graph definition | The graph G is fully defined by its vertex set V and edge set E |
| k or d(v) | Node degree | The number of edges incident to node v; in directed graphs, separate in-degree (k_in) and out-degree (k_out) |
| A | Adjacency matrix | Square n × n matrix where A_ij = 1 if edge (i,j) exists, 0 otherwise; A_ij = w_ij for weighted graphs |
| D | Degree matrix | Diagonal n × n matrix where D_ii = k_i (degree of node i) |
| L | Laplacian matrix | L = D − A; encodes graph structure; the number of zero eigenvalues equals the number of connected components, and the smallest eigenvalues are used in spectral clustering |
| k̄ | Mean degree | Average degree across all nodes: k̄ = 2|E| / |V| |
| C(v) | Clustering coefficient | Fraction of node v's neighbors that are also connected to each other |
| d(u,v) | Shortest path length | Number of edges in the shortest path between nodes u and v |
| diam(G) | Graph diameter | Maximum shortest path length across all pairs of nodes |
| Q | Modularity | Q = Σ_c [L_c/m − (d_c/2m)²]; measures quality of community structure; Q ∈ [−1/2, 1] |
| C_B(v) | Betweenness centrality | Fraction of all-pairs shortest paths that pass through node v |
| C_C(v) | Closeness centrality | Reciprocal of the average shortest path from v to all other nodes |
| PR(v) | PageRank | Iterative measure of node importance based on quality-weighted in-links |
| β | Infection rate | In SIR model: rate at which susceptible individuals become infected per contact |
| γ | Recovery rate | In SIR model: rate at which infected individuals recover |
| R₀ | Basic reproduction number | R₀ = β/γ; average number of secondary infections per infected individual; epidemic spreads if R₀ > 1 |
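Several of these quantities follow directly from the adjacency matrix. A minimal Python sketch for a small illustrative undirected graph, using no external graph library:

```python
# Adjacency matrix A, degrees k_i, mean degree k̄ = 2|E|/|V|,
# degree matrix D, and Laplacian L = D − A (illustrative 4-node graph).
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (1, 3)]

n = len(V)
A = [[0] * n for _ in range(n)]
for u, v in E:
    A[u][v] = A[v][u] = 1    # undirected: A is symmetric

k = [sum(row) for row in A]  # degree of each node (row sums of A)
k_bar = 2 * len(E) / len(V)  # mean degree

D = [[k[i] if i == j else 0 for j in range(n)] for i in range(n)]
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]
```

Two identities worth checking: the degrees sum to 2|E| (each edge contributes to two nodes), and every row of the Laplacian sums to zero.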
F.5 Information Theory Notation
Used in Chapter 23 and Appendix A.
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| H(X) | Shannon entropy | H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) | Average uncertainty of random variable X; measured in bits |
| H(X, Y) | Joint entropy | H(X,Y) = −ΣΣ p(x,y) log₂ p(x,y) | Uncertainty of the joint distribution of X and Y |
| H(X|Y) | Conditional entropy | H(X|Y) = H(X,Y) − H(Y) | Remaining uncertainty in X after learning Y |
| I(X;Y) | Mutual information | I(X;Y) = H(X) + H(Y) − H(X,Y) | Amount of information X and Y share; I(X;Y) ≥ 0 |
| D_KL(P||Q) | KL divergence | Σᵢ P(xᵢ) log[P(xᵢ)/Q(xᵢ)] | How much P differs from reference distribution Q; not symmetric |
| D_JS(P||Q) | Jensen-Shannon divergence | [D_KL(P||M) + D_KL(Q||M)] / 2, where M = (P+Q)/2 | Symmetric, bounded version of KL divergence |
| b | Log base | Typically b = 2 (bits), b = e (nats), or b = 10 (dits) | Unit depends on base: base 2 gives bits |
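Entropy, mutual information, and KL divergence reduce to short functions over probability dictionaries. The joint distribution below is illustrative:

```python
from math import log2

def H(dist):
    """Shannon entropy in bits: −Σ p log₂ p (terms with p = 0 contribute 0)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def D_KL(p, q):
    """KL divergence Σ p(x) log₂[p(x)/q(x)]; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # p(x, y)
px = {0: 0.5, 1: 0.5}  # marginal of X under the joint above
py = {0: 0.5, 1: 0.5}  # marginal of Y

mi = H(px) + H(py) - H(joint)   # I(X;Y)
h_x_given_y = H(joint) - H(py)  # H(X|Y)
```

The chain identity H(X|Y) + I(X;Y) = H(X) holds by construction, and D_KL of a distribution against itself is zero.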
F.6 Machine Learning Notation
Used in Chapters 22–24.
Data Representation
| Symbol | Name | Meaning |
|---|---|---|
| x | Feature vector | A single data point represented as a vector of feature values; x ∈ ℝᵈ |
| xᵢ | i-th feature | The i-th component of the feature vector x |
| X | Feature matrix | The n × d matrix of all training examples, where each row is one example |
| y | Label vector | The vector of true labels for each training example |
| ŷ | Predicted label | The model's prediction; also written ŷ = f(x; θ) |
| d | Feature dimension | Number of features in each data point |
| n | Number of examples | Number of training examples (rows in X) |
| k | Number of classes | Number of distinct output categories in a classification problem |
Model Parameters and Training
| Symbol | Name | Meaning |
|---|---|---|
| w or θ | Weights / Parameters | The learnable parameters of a model |
| w₀ or b | Bias | The intercept term in a linear model |
| L(θ) or ℒ | Loss function | A scalar measure of how wrong the model's predictions are |
| ∇L or ∇θL | Gradient | The vector of partial derivatives of L with respect to all parameters θ |
| η or α | Learning rate | Step size for gradient descent: θ ← θ − η∇L |
| λ | Regularization parameter | Controls strength of L1 (Lasso) or L2 (Ridge) regularization penalty |
Evaluation Metrics
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| TP | True Positive | — | Misinformation correctly identified as misinformation |
| TN | True Negative | — | True content correctly identified as true |
| FP | False Positive | — | True content incorrectly labeled as misinformation |
| FN | False Negative | — | Misinformation incorrectly labeled as true |
| Acc | Accuracy | (TP+TN)/(TP+TN+FP+FN) | Proportion of all cases correctly classified |
| P | Precision | TP/(TP+FP) | Of all predicted positives, fraction truly positive |
| R | Recall (sensitivity) | TP/(TP+FN) | Of all true positives, fraction correctly identified |
| F₁ | F1 score | 2PR/(P+R) | Harmonic mean of precision and recall |
| AUC | Area under ROC curve | ∫ ROC curve | Probability that model ranks a positive case above a negative case |
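These metrics follow mechanically from the four confusion-matrix counts; the counts below are illustrative:

```python
# Accuracy, precision, recall, and F1 from illustrative confusion counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)  # of predicted positives, fraction correct
recall = TP / (TP + FN)     # of actual positives, fraction found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Note the tension the table implies: with these counts the detector misses 10 of 50 misinformation items (recall 0.8) even though overall accuracy looks respectable at 0.85.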
NLP Notation
| Symbol | Name | Meaning |
|---|---|---|
| V | Vocabulary | The set of all unique tokens in the training corpus |
| |V| | Vocabulary size | Number of unique tokens |
| tf(t,d) | Term frequency | Number of times term t appears in document d |
| idf(t) | Inverse document frequency | log(N / df(t)); N = total docs, df(t) = docs containing t |
| tfidf(t,d) | TF-IDF weight | tf(t,d) × idf(t); higher for distinctive terms |
| eₜ | Token embedding | Dense vector representation of token t; eₜ ∈ ℝᵈᵉ |
| d_e | Embedding dimension | Size of the token embedding vector (e.g., 768 for BERT-base) |
| h | Hidden state | Internal representation in a neural network layer |
| Attn(Q,K,V) | Attention | Scaled dot-product attention: softmax(QKᵀ/√dₖ)V |
F.7 Notation from Chapters 22–24 and 28
This section provides quick look-up for notation introduced in specific chapters.
Chapter 22: Computational Misinformation Detection
TF-IDF(t, d) = tf(t, d) × log(N / (1 + df(t))) — a smoothed variant of the idf in Section F.6; the +1 in the denominator prevents division by zero
Logistic regression: P(y=1|x) = σ(wᵀx + b) = 1 / (1 + exp(−(wᵀx + b)))
Cross-entropy loss: L = −[y log(ŷ) + (1−y) log(1−ŷ)]
BERT tokenization: input = [CLS] token₁ token₂ ... tokenₙ [SEP]
Classification head: ŷ = softmax(W · h_[CLS] + b)
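The logistic-regression and cross-entropy formulas above fit in a few lines of Python. The weights and features here are illustrative, not a trained model:

```python
from math import exp, log

def sigmoid(z):
    """σ(z) = 1 / (1 + e^(−z))."""
    return 1 / (1 + exp(-z))

def predict(w, x, b):
    """P(y=1|x) = σ(wᵀx + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(y, y_hat):
    """L = −[y log ŷ + (1−y) log(1−ŷ)]."""
    return -(y * log(y_hat) + (1 - y) * log(1 - y_hat))

# Illustrative: wᵀx + b = 0.5·2 + (−0.25)·4 + 0 = 0, so P(y=1|x) = 0.5
p = predict([0.5, -0.25], [2.0, 4.0], 0.0)
loss = cross_entropy(1, p)  # −log(0.5) = log 2 ≈ 0.693
```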
Chapter 23: Network Analysis
Degree of node v: k(v) = |{u : (u,v) ∈ E}|
Betweenness centrality: C_B(v) = Σ_{s≠v≠t} [σ_st(v) / σ_st]
where σ_st = number of shortest paths from s to t
σ_st(v) = those paths that pass through v
PageRank: PR(v) = (1−d)/N + d × Σ_{u→v} [PR(u) / k_out(u)]
where d ≈ 0.85 is the damping factor
Modularity: Q = (1/2m) Σ_{ij} [A_ij − k_i k_j / 2m] × δ(c_i, c_j)
where δ(c_i, c_j) = 1 if i and j are in the same community
SIR model differential equations:
dS/dt = −β S I / N
dI/dt = β S I / N − γ I
dR/dt = γ I
R₀ = β / γ
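The SIR system has no closed-form solution, but a forward-Euler integration makes its behavior concrete. The parameters below are illustrative: β = 0.3 and γ = 0.1 give R₀ = 3 > 1, so an epidemic occurs.

```python
# Forward-Euler integration of the SIR equations (illustrative parameters).
N = 1000.0
S, I, R = 999.0, 1.0, 0.0  # one initial infection
beta, gamma = 0.3, 0.1     # R0 = beta / gamma = 3
dt, steps = 0.1, 2000      # integrate to t = 200

for _ in range(steps):
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
```

Because dS + dI + dR = 0 at every step, S + I + R stays at N throughout; by t = 200 the epidemic has burned out and most of the population has passed through the R compartment.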
Chapter 24: Coordinated Behavior Detection
Cosine similarity: sim(A, B) = (A · B) / (|A| × |B|)
Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|
Co-retweet matrix C: C_ij = number of tweets retweeted by both account i and account j
Synchrony score for account pair (i,j):
S_ij = (number of posts by i and j falling within a time window Δt of each other) / (total posts by i and j)
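Cosine and Jaccard similarity are one-liners. The vectors and sets below are illustrative stand-ins for per-item retweet counts and retweeted-item sets:

```python
# Cosine similarity on count vectors, Jaccard similarity on item sets.
a = [3, 0, 1, 2]  # illustrative per-item counts for account i
b = [1, 1, 0, 2]  # illustrative per-item counts for account j

def norm(v):
    return sum(x * x for x in v) ** 0.5

dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (norm(a) * norm(b))  # (A · B) / (|A| × |B|)

items_i = {"t1", "t2", "t3"}
items_j = {"t2", "t3", "t4"}
jaccard = len(items_i & items_j) / len(items_i | items_j)  # |A∩B| / |A∪B|
```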
Chapter 28: Bayesian Reasoning Applied
Posterior ∝ Likelihood × Prior
P(θ|data) ∝ P(data|θ) × P(θ)
Bayes factor: BF = P(data|H₁) / P(data|H₀)
BF > 3: moderate evidence for H₁
BF > 10: strong evidence for H₁
BF > 100: decisive evidence for H₁
Credibility updating:
Odds(H|E) = Odds(H) × [P(E|H) / P(E|¬H)]
Odds(H) = P(H) / P(¬H)
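The odds-form update above is compact in code; the prior and likelihood ratio here are illustrative:

```python
# Odds-form Bayesian updating: Odds(H|E) = Odds(H) × P(E|H)/P(E|¬H).
p_H = 0.10  # illustrative prior P(H)
lr = 4.0    # illustrative likelihood ratio P(E|H) / P(E|¬H)

prior_odds = p_H / (1 - p_H)      # Odds(H) = P(H) / P(¬H)
posterior_odds = prior_odds * lr  # Odds(H|E)
p_H_given_E = posterior_odds / (1 + posterior_odds)  # back to a probability
```

Evidence four times likelier under H than under ¬H moves P(H) from 0.10 to about 0.31; equivalently, a Bayes factor of 4 sits between the "moderate" (BF > 3) and "strong" (BF > 10) thresholds above.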
F.8 Greek Alphabet Reference
| Uppercase | Lowercase | Name | Common use in this textbook |
|---|---|---|---|
| Α | α | Alpha | Significance level; learning rate |
| Β | β | Beta | Type II error rate; regression coefficient |
| Γ | γ | Gamma | Recovery rate in SIR model |
| Δ | δ | Delta | Change in a quantity; time window |
| Ε | ε | Epsilon | Error term; small positive constant |
| Θ | θ | Theta | Model parameters (generic) |
| Λ | λ | Lambda | Poisson rate; regularization strength |
| Μ | μ | Mu | Population mean |
| Ξ | ξ | Xi | — |
| Π | π | Pi | Mathematical constant 3.14159...; product notation |
| Ρ | ρ | Rho | Spearman correlation; density |
| Σ | σ | Sigma | Summation (uppercase); standard deviation (lowercase) |
| Τ | τ | Tau | Time; Kendall's rank correlation |
| Φ | φ | Phi | Cumulative standard normal distribution function |
| Χ | χ | Chi | Chi-square test statistic |
| Ψ | ψ | Psi | — |
| Ω | ω | Omega | Sample space (uppercase); angular frequency (lowercase) |
F.9 Notation Conventions
Scalars: Lowercase italic letters — a, b, x, y.
Vectors: Lowercase bold letters — x, w, θ — or lowercase letters with an arrow: x⃗.
Matrices: Uppercase bold letters — X, A, W.
Sets: Uppercase calligraphic or italic letters — V, E, S, Ω.
Random variables: Uppercase italic letters — X, Y, Z.
Realizations of random variables: Corresponding lowercase letters — x, y, z.
Estimated quantities: Hat notation — θ̂ is an estimate of θ; ŷ is a prediction.
Transpose: Superscript T — xᵀ is the transpose of x.
Norms: ‖x‖₂ is the L2 (Euclidean) norm; ‖x‖₁ is the L1 norm.
Infinity: ∞ denotes an unbounded limit.
Proportionality: ∝ means "is proportional to" (differs by a positive constant).
Approximately: ≈ means approximately equal; ~ (in probability) means "distributed as."