Appendix F: Notation and Symbols Guide

This appendix provides a comprehensive reference for all mathematical, statistical, set-theoretic, graphical, information-theoretic, and machine learning notation used throughout the textbook. Symbols are organized by domain and accompanied by plain-language definitions, the mathematical form where relevant, and chapter references. This appendix is especially useful when reading Chapters 22–24 (computational methods) and Chapter 28 (Bayesian reasoning).


F.1 Probability Notation

Probability statements are fundamental to Chapters 3, 13, 16, 28, and Appendix A.

Symbol Name Plain-language meaning Example
P(A) Probability of A The chance that event A occurs; a number between 0 and 1 P(article is false) = 0.08
P(Aᶜ) or P(not A) or P(Ā) Complement of A The chance that A does NOT occur; equals 1 − P(A) P(article is true) = 0.92
P(A ∩ B) or P(A and B) Joint probability The chance that both A and B occur together P(false AND shared)
P(A ∪ B) or P(A or B) Union probability The chance that at least one of A or B occurs P(false OR sensationalized)
P(A | B) Conditional probability The chance of A occurring given that B has already occurred P(false | emotionally charged)
P(B | A) Likelihood The probability of observing B if A is the true state P(positive test | disease)

Bayes' Theorem in Full Notation

P(A|B) = P(B|A) × P(A) / P(B)

Expanded form using the law of total probability:

P(A|B) = P(B|A) × P(A) / [P(B|A)×P(A) + P(B|Aᶜ)×P(Aᶜ)]
Component Name Meaning
P(A) Prior Belief in A before seeing evidence B
P(A|B) Posterior Updated belief in A after seeing evidence B
P(B|A) Likelihood How probable is B if A is true?
P(B) Marginal likelihood Total probability of observing B under all scenarios
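The expanded form can be traced numerically. A minimal Python sketch of the update, using illustrative probabilities (not values from any chapter):

```python
# Bayes' theorem via the law of total probability.
# Illustrative numbers: prior that an article is false, and the
# likelihood of an emotionally charged headline under each hypothesis.
p_A = 0.08             # P(A): prior P(article is false)
p_B_given_A = 0.70     # P(B|A): P(emotionally charged | false)
p_B_given_notA = 0.20  # P(B|Ac): P(emotionally charged | true)

# Marginal likelihood: P(B) = P(B|A)P(A) + P(B|Ac)P(Ac)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

Note that the evidence raises the belief from the 8% prior to roughly 23%: strong evidence, but a low prior still dominates.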

Additional Probability Notation

Symbol Meaning
Ω Sample space — the set of all possible outcomes
∅ Empty set — the event with probability 0
P(A ∩ B) = P(A) × P(B) Independence condition — A and B are independent if this holds
E[X] or μₓ Expected value (mean) of random variable X
Var(X) or σ²ₓ Variance of random variable X
SD(X) or σₓ Standard deviation of random variable X
Cov(X, Y) Covariance of X and Y
X ~ N(μ, σ²) X is normally distributed with mean μ and variance σ²
X ~ Bin(n, p) X follows a Binomial distribution with n trials, probability p
X ~ Poisson(λ) X follows a Poisson distribution with rate λ

F.2 Statistical Notation

Used throughout Chapters 27–29 and Appendix A.

Descriptive Statistics

Symbol Name Formula Meaning
x̄ (x-bar) Sample mean x̄ = Σxᵢ / n Average of observed values
μ (mu) Population mean μ = Σxᵢ / N True average of all values in the population
s Sample standard deviation s = √[Σ(xᵢ − x̄)² / (n−1)] Spread of sample values around the sample mean
σ (sigma) Population standard deviation σ = √[Σ(xᵢ − μ)² / N] Spread of population values around the population mean
s² Sample variance s² = Σ(xᵢ − x̄)² / (n−1) Squared spread of sample values
σ² Population variance σ² = Σ(xᵢ − μ)² / N Squared spread of population values
n Sample size Number of observations in a sample
N Population size Total number of units in the population
Mdn Median Middle value when sorted Value separating the upper and lower halves
Mo Mode Most frequently occurring value
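These descriptive formulas are easy to verify directly; a short Python sketch with made-up data:

```python
import math

data = [4, 8, 6, 5, 3, 7]
n = len(data)

x_bar = sum(data) / n                               # sample mean x̄
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)  # sample variance s² (n−1 denominator)
s = math.sqrt(s2)                                   # sample standard deviation s

# Median: middle value when sorted, or mean of the two middle values for even n
sorted_data = sorted(data)
mid = n // 2
mdn = (sorted_data[mid - 1] + sorted_data[mid]) / 2 if n % 2 == 0 else sorted_data[mid]
```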

Inference and Hypothesis Testing

Symbol Name Meaning
H₀ Null hypothesis The hypothesis of no effect, no difference, or no relationship
H₁ or Hₐ Alternative hypothesis The hypothesis of an effect, difference, or relationship
α (alpha) Significance level The probability threshold for rejecting H₀ (typically 0.05)
β (beta) Type II error rate The probability of failing to reject H₀ when H₁ is true
1 − β Statistical power The probability of correctly rejecting H₀ when H₁ is true
p p-value The probability of observing a test statistic this extreme or more extreme, if H₀ were true
t t-statistic Test statistic following a t-distribution; used in t-tests
F F-statistic Test statistic following an F-distribution; used in ANOVA
χ² (chi-square) Chi-square statistic Test statistic following a chi-square distribution; used for categorical data tests
df Degrees of freedom The number of values free to vary in computing a statistic
CI Confidence interval Range of plausible values for a parameter, e.g., 95% CI [0.32, 0.48]
z Z-score or Z-statistic Standardized value: z = (x − μ)/σ; also used as a test statistic
SE Standard error Standard deviation of a sampling distribution: SE = s/√n
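As a worked example of the standard error and a confidence interval, here is a small sketch assuming a large-sample (z-based) 95% interval; the data values are illustrative:

```python
import math

s = 1.87      # sample standard deviation (illustrative)
n = 100       # sample size
x_bar = 5.5   # sample mean (illustrative)

se = s / math.sqrt(n)  # standard error SE = s/√n
z_crit = 1.96          # z critical value for a 95% CI
ci = (x_bar - z_crit * se, x_bar + z_crit * se)  # 95% CI (lower, upper)
```

For small samples, a t critical value with n − 1 degrees of freedom would replace 1.96.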

Correlation and Regression

Symbol Name Formula Meaning
r Pearson correlation Σ[(xᵢ−x̄)(yᵢ−ȳ)] / [√Σ(xᵢ−x̄)² × √Σ(yᵢ−ȳ)²] Linear relationship strength between X and Y; ranges −1 to +1
ρ (rho) Spearman correlation r on rank-transformed data Monotonic relationship; robust to outliers and non-normality
R² Coefficient of determination R² = r² (for simple regression) Proportion of variance in Y explained by X
β₀ Intercept Predicted Y when X = 0 Baseline level of Y independent of X
β₁ Slope Change in predicted Y per unit increase in X Strength of linear relationship in regression
ε (epsilon) Residual / error ε = Y − Ŷ Unexplained deviation between observed and predicted Y
Ŷ (Y-hat) Predicted value Ŷ = β₀ + β₁X Fitted value from the regression model
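The least-squares estimates of β₀ and β₁ follow from the same sums used for r; a sketch with illustrative data:

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

beta1 = sxy / sxx               # slope β₁
beta0 = y_bar - beta1 * x_bar   # intercept β₀
r = sxy / math.sqrt(sxx * syy)  # Pearson correlation r
r_squared = r ** 2              # R²: proportion of variance explained

y_hat = [beta0 + beta1 * x for x in xs]  # fitted values Ŷ
```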

Effect Sizes

Symbol Name Formula Interpretation
d Cohen's d (μ₁ − μ₂) / σ_pooled Standardized mean difference; 0.2 small, 0.5 medium, 0.8 large
g Hedges' g d × correction factor Bias-corrected d for unequal sample sizes
OR Odds ratio [P(A|E)/P(not A|E)] / [P(A|not E)/P(not A|not E)] Comparison of odds between two groups
RR Relative risk P(outcome|exposed) / P(outcome|unexposed) Ratio of outcome probabilities between groups
ARR Absolute risk reduction P(outcome|unexposed) − P(outcome|exposed) Absolute difference in outcome probability
NNT Number needed to treat 1 / ARR Number of people needing treatment to prevent one outcome
V Cramér's V √(χ²/(n × min(r−1, c−1))) Effect size for chi-square test of independence (r rows, c columns in the table)
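A few of these effect sizes, computed in Python with illustrative numbers:

```python
# Illustrative two-group outcome probabilities (e.g., sharing a false claim
# with and without a warning label)
p_exposed = 0.12    # P(outcome | exposed)
p_unexposed = 0.20  # P(outcome | unexposed)

rr = p_exposed / p_unexposed   # relative risk RR
arr = p_unexposed - p_exposed  # absolute risk reduction ARR
nnt = 1 / arr                  # number needed to treat NNT

# Cohen's d from two group means and a pooled SD (illustrative values)
mu1, mu2, sd_pooled = 5.0, 4.2, 1.6
d = (mu1 - mu2) / sd_pooled    # 0.5: a "medium" effect by the rule of thumb
```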

F.3 Set Notation

Used in formal definitions throughout the textbook, particularly in Chapters 22–24.

Symbol Name Plain-language meaning Example
∈ Element of "is a member of" the set x ∈ {1, 2, 3} means x is one of 1, 2, or 3
∉ Not element of "is not a member of" the set 4 ∉ {1, 2, 3}
⊂ or ⊆ Subset Every element of A is also in B {1,2} ⊂ {1,2,3}
∩ Intersection Elements in both A and B {1,2,3} ∩ {2,3,4} = {2,3}
∪ Union Elements in A, in B, or in both {1,2} ∪ {2,3} = {1,2,3}
∅ Empty set The set with no elements {x : x > 5 and x < 3} = ∅
|A| Cardinality The number of elements in set A |{1,2,3}| = 3
Aᶜ or Ā Complement All elements NOT in A (relative to a universal set U) If U = {1,...,5} and A = {1,2}, then Aᶜ = {3,4,5}
× Cartesian product All ordered pairs (a, b) with a ∈ A and b ∈ B {0,1} × {0,1} = {(0,0),(0,1),(1,0),(1,1)}
: or | Such that Used in set-builder notation {x : x > 0} means "all x such that x > 0"
∀ For all Universal quantifier ∀x ∈ A, f(x) > 0 means "for every x in A, f(x) is positive"
∃ There exists Existential quantifier ∃x ∈ A such that f(x) = 0 means "some x in A makes f zero"
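Python's built-in set type mirrors most of this notation; a brief sketch:

```python
A = {1, 2}
B = {2, 3}
U = {1, 2, 3, 4, 5}  # universal set for the complement example

membership = 1 in A and 4 not in A        # ∈ and ∉
union = A | B                             # A ∪ B
inter = A & B                             # A ∩ B
complement = U - A                        # Aᶜ relative to U
is_subset = A <= U                        # A ⊆ U
card = len(A)                             # cardinality |A|
product = {(a, b) for a in A for b in B}  # Cartesian product A × B
```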

F.4 Graph and Network Notation

Used in Chapter 23 (Network Analysis) and Chapter 16 (Information Diffusion).

Symbol Name Definition
G Graph A mathematical structure G = (V, E) consisting of a set of vertices (nodes) and edges
V Vertex set The set of all nodes in a graph; |V| = n is the number of nodes
E Edge set The set of all edges (connections) in a graph; |E| = m is the number of edges
(u, v) Edge An edge connecting node u and node v
G = (V, E) Graph definition The graph G is fully defined by its vertex set V and edge set E
k or d(v) Node degree The number of edges incident to node v; in directed graphs, separate in-degree (k_in) and out-degree (k_out)
A Adjacency matrix Square n × n matrix where A_ij = 1 if edge (i,j) exists, 0 otherwise; A_ij = w_ij for weighted graphs
D Degree matrix Diagonal n × n matrix where D_ii = k_i (degree of node i)
L Laplacian matrix L = D − A; encodes graph structure; the multiplicity of its zero eigenvalue equals the number of connected components
k̄ Mean degree Average degree across all nodes: k̄ = 2|E| / |V|
C(v) Clustering coefficient Fraction of node v's neighbors that are also connected to each other
d(u,v) Shortest path length Number of edges in the shortest path between nodes u and v
diam(G) Graph diameter Maximum shortest path length across all pairs of nodes
Q Modularity Q = Σ_c [L_c/m − (d_c/2m)²]; measures quality of community structure; Q ∈ [−1, 1]
C_B(v) Betweenness centrality Fraction of all-pairs shortest paths that pass through node v
C_C(v) Closeness centrality Reciprocal of the average shortest path from v to all other nodes
PR(v) PageRank Iterative measure of node importance based on quality-weighted in-links
β Infection rate In SIR model: rate at which susceptible individuals become infected per contact
γ Recovery rate In SIR model: rate at which infected individuals recover
R₀ Basic reproduction number R₀ = β/γ; average number of secondary infections per infected individual; epidemic spreads if R₀ > 1
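The basic quantities (V, E, node degree, mean degree) can be computed from a plain adjacency structure; a minimal sketch without any graph library:

```python
# Undirected graph as an adjacency dict: node -> set of neighbours
G = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

V = set(G)                                        # vertex set V
E = {frozenset((u, v)) for u in G for v in G[u]}  # edge set E (undirected)
n, m = len(V), len(E)                             # |V| = n, |E| = m

degree = {v: len(G[v]) for v in G}  # k(v): edges incident to v
mean_degree = 2 * m / n             # k̄ = 2|E| / |V|
```

The factor of 2 appears because each undirected edge contributes to the degree of both endpoints.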

F.5 Information Theory Notation

Used in Chapter 23 and Appendix A.

Symbol Name Formula Meaning
H(X) Shannon entropy H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) Average uncertainty of random variable X; measured in bits
H(X, Y) Joint entropy H(X,Y) = −ΣΣ p(x,y) log₂ p(x,y) Uncertainty of the joint distribution of X and Y
H(X|Y) Conditional entropy H(X|Y) = H(X,Y) − H(Y) Remaining uncertainty in X after learning Y
I(X;Y) Mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) Amount of information X and Y share; I(X;Y) ≥ 0
D_KL(P||Q) KL divergence Σᵢ P(xᵢ) log[P(xᵢ)/Q(xᵢ)] How much P differs from reference distribution Q; not symmetric
D_JS(P||Q) Jensen-Shannon divergence [D_KL(P||M) + D_KL(Q||M)] / 2, where M = (P+Q)/2 Symmetric, bounded version of KL divergence
b Log base Typically b = 2 (bits), b = e (nats), or b = 10 (dits) Unit depends on base: base 2 gives bits
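A sketch of these quantities for a small discrete joint distribution (illustrative probabilities), using the identities in the table:

```python
import math

def H(probs):
    """Shannon entropy in bits: H = -sum p*log2(p), with 0*log(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x, y) over X in {0,1}, Y in {0,1} (illustrative)
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = [0.5, 0.5]  # marginal of X
py = [0.5, 0.5]  # marginal of Y

H_X = H(px)
H_Y = H(py)
H_XY = H(pxy.values())
I_XY = H_X + H_Y - H_XY   # mutual information I(X;Y) >= 0
H_X_given_Y = H_XY - H_Y  # conditional entropy H(X|Y)

def kl(P, Q):
    """KL divergence D_KL(P||Q) in bits; note it is not symmetric."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)
```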

F.6 Machine Learning Notation

Used in Chapters 22–24.

Data Representation

Symbol Name Meaning
x Feature vector A single data point represented as a vector of feature values; x ∈ ℝᵈ
xᵢ i-th feature The i-th component of the feature vector x
X Feature matrix The n × d matrix of all training examples, where each row is one example
y Label vector The vector of true labels for each training example
ŷ Predicted label The model's prediction; also written ŷ = f(x; θ)
d Feature dimension Number of features in each data point
n Number of examples Number of training examples (rows in X)
k Number of classes Number of distinct output categories in a classification problem

Model Parameters and Training

Symbol Name Meaning
w or θ Weights / Parameters The learnable parameters of a model
w₀ or b Bias The intercept term in a linear model
L(θ) or ℒ Loss function A scalar measure of how wrong the model's predictions are
∇L or ∇θL Gradient The vector of partial derivatives of L with respect to all parameters θ
η or α Learning rate Step size for gradient descent: θ ← θ − η∇L
λ Regularization parameter Controls strength of L1 (Lasso) or L2 (Ridge) regularization penalty

Evaluation Metrics

Symbol Name Formula Meaning
TP True Positive Misinformation correctly identified as misinformation
TN True Negative True content correctly identified as true
FP False Positive True content incorrectly labeled as misinformation
FN False Negative Misinformation incorrectly labeled as true
Acc Accuracy (TP+TN)/(TP+TN+FP+FN) Proportion of all cases correctly classified
P Precision TP/(TP+FP) Of all predicted positives, fraction truly positive
R Recall (sensitivity) TP/(TP+FN) Of all true positives, fraction correctly identified
F₁ F1 score 2PR/(P+R) Harmonic mean of precision and recall
AUC Area under ROC curve ∫ ROC curve Probability that model ranks a positive case above a negative case
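The metrics follow directly from the four confusion counts; a sketch with an illustrative confusion matrix:

```python
# Confusion counts from a hypothetical misinformation classifier
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)      # fraction correct overall
precision = TP / (TP + FP)                      # of predicted positives, fraction truly positive
recall = TP / (TP + FN)                         # of true positives, fraction found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
```

The harmonic mean in F₁ punishes imbalance: a model with high precision but low recall (or vice versa) scores poorly.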

NLP Notation

Symbol Name Meaning
V Vocabulary The set of all unique tokens in the training corpus
|V| Vocabulary size Number of unique tokens
tf(t,d) Term frequency Number of times term t appears in document d
idf(t) Inverse document frequency log(N / df(t)); N = total docs, df(t) = docs containing t
tfidf(t,d) TF-IDF weight tf(t,d) × idf(t); higher for distinctive terms
eₜ Token embedding Dense vector representation of token t; eₜ ∈ ℝᵈᵉ
d_e Embedding dimension Size of the token embedding vector (e.g., 768 for BERT-base)
h Hidden state Internal representation in a neural network layer
Attn(Q,K,V) Attention Scaled dot-product attention: softmax(QKᵀ/√dₖ)V
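A minimal TF-IDF sketch over a toy three-document corpus (natural log is used here; the choice of base is a convention):

```python
import math

docs = [
    ["fake", "news", "spreads", "fast"],
    ["news", "update", "today"],
    ["fake", "account", "network"],
]
N = len(docs)  # total number of documents

def tf(t, d):
    """Term frequency: raw count of term t in document d."""
    return d.count(t)

def idf(t):
    """Inverse document frequency: log(N / df(t))."""
    df = sum(1 for d in docs if t in d)
    return math.log(N / df)

def tfidf(t, d):
    return tf(t, d) * idf(t)
```

A term that appears in every document gets idf = log(1) = 0, so common words are down-weighted and distinctive ones stand out.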

F.7 Notation from Chapters 22–24 and 28

This section provides quick look-up for notation introduced in specific chapters.

Chapter 22: Computational Misinformation Detection

TF-IDF(t, d) = tf(t, d) × log(N / (1 + df(t)))
  (the 1 + df(t) denominator is a smoothed variant of the idf defined in F.6)

Logistic regression: P(y=1|x) = σ(wᵀx + b) = 1 / (1 + exp(−(wᵀx + b)))

Cross-entropy loss: L = −[y log(ŷ) + (1−y) log(1−ŷ)]

BERT tokenization: input = [CLS] token₁ token₂ ... tokenₙ [SEP]

Classification head: ŷ = softmax(W · h_[CLS] + b)
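The logistic regression and cross-entropy formulas above, sketched in Python with illustrative weights:

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, b):
    """Logistic regression: P(y=1|x) = sigma(w.x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(y, y_hat):
    """L = -[y log(y_hat) + (1-y) log(1-y_hat)] for a single example."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

w, b = [0.8, -0.4], 0.1     # illustrative learned weights and bias
x = [1.0, 2.0]              # one feature vector
p = predict(w, x, b)        # P(y=1|x)
loss = cross_entropy(1, p)  # loss if the true label is 1
```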

Chapter 23: Network Analysis

Degree of node v: k(v) = |{u : (u,v) ∈ E}|

Betweenness centrality: C_B(v) = Σ_{s≠v≠t} [σ_st(v) / σ_st]
  where σ_st = number of shortest paths from s to t
        σ_st(v) = those paths that pass through v

PageRank: PR(v) = (1−d)/N + d × Σ_{u→v} [PR(u) / k_out(u)]
  where d ≈ 0.85 is the damping factor

Modularity: Q = (1/2m) Σ_{ij} [A_ij − k_i k_j / 2m] × δ(c_i, c_j)
  where δ(c_i, c_j) = 1 if i and j are in the same community

SIR model differential equations:
  dS/dt = −β S I / N
  dI/dt = β S I / N − γ I
  dR/dt = γ I
  R₀ = β / γ

Chapter 24: Coordinated Behavior Detection

Cosine similarity: sim(A, B) = (A · B) / (‖A‖ × ‖B‖)

Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|

Co-retweet matrix C: C_ij = number of tweets retweeted by both account i and account j

Synchrony score for account pair (i,j):
  S_ij = number of posts within window Δt / total posts by i and j
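The two similarity measures in a few lines of Python, with illustrative account data:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity: (A . B) / (||A|| ||B||) for equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard(A, B):
    """Jaccard similarity: |A intersect B| / |A union B| for sets."""
    return len(A & B) / len(A | B)

# Illustrative data: two accounts' retweet-count vectors and retweeted-tweet sets
v1, v2 = [1, 0, 2, 3], [2, 0, 4, 6]       # v2 = 2*v1, so cosine similarity is 1
t1, t2 = {"t1", "t2", "t3"}, {"t2", "t3", "t4"}
```

Cosine similarity ignores magnitude (proportional behavior scores 1), while Jaccard compares set overlap directly; coordinated-behavior analyses often use both.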

Chapter 28: Bayesian Reasoning Applied

Posterior ∝ Likelihood × Prior
P(θ|data) ∝ P(data|θ) × P(θ)

Bayes factor: BF = P(data|H₁) / P(data|H₀)
  BF > 3: moderate evidence for H₁
  BF > 10: strong evidence for H₁
  BF > 100: decisive evidence for H₁

Credibility updating:
  Odds(H|E) = Odds(H) × [P(E|H) / P(E|¬H)]
  Odds(H) = P(H) / P(¬H)
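The odds-form update is often the quickest to compute by hand; a sketch with illustrative values, which reproduces the same posterior as the expanded Bayes form in F.1 when the same numbers are used:

```python
# Odds-form Bayesian updating: Odds(H|E) = Odds(H) x likelihood ratio
p_H = 0.08                        # prior P(H), e.g. that a claim is false
lr = 0.70 / 0.20                  # likelihood ratio P(E|H) / P(E|not H)

prior_odds = p_H / (1 - p_H)      # Odds(H)
posterior_odds = prior_odds * lr  # Odds(H|E)

# Convert posterior odds back to a probability
p_H_given_E = posterior_odds / (1 + posterior_odds)
```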

F.8 Greek Alphabet Reference

Uppercase Lowercase Name Common use in this textbook
Α α Alpha Significance level; learning rate
Β β Beta Type II error rate; regression coefficient
Γ γ Gamma Recovery rate in SIR model
Δ δ Delta Change in a quantity; time window
Ε ε Epsilon Error term; small positive constant
Θ θ Theta Model parameters (generic)
Λ λ Lambda Poisson rate; regularization strength
Μ μ Mu Population mean
Ξ ξ Xi —
Π π Pi Mathematical constant 3.14159...; product notation
Ρ ρ Rho Spearman correlation; density
Σ σ Sigma Summation (uppercase); standard deviation (lowercase)
Τ τ Tau Time; Kendall's rank correlation
Φ φ Phi Cumulative standard normal distribution function
Χ χ Chi Chi-square test statistic
Ψ ψ Psi —
Ω ω Omega Sample space (uppercase); angular frequency

F.9 Notation Conventions

Scalars: Lowercase italic letters — a, b, x, y.

Vectors: Lowercase bold letters — x, w, θ — or lowercase letters with an arrow: x⃗.

Matrices: Uppercase bold letters — X, A, W.

Sets: Uppercase calligraphic or italic letters — V, E, S, Ω.

Random variables: Uppercase italic letters — X, Y, Z.

Realizations of random variables: Corresponding lowercase letters — x, y, z.

Estimated quantities: Hat notation — θ̂ is an estimate of θ; ŷ is a prediction.

Transpose: Superscript T — xᵀ is the transpose of x.

Norms: ‖x‖₂ is the L2 (Euclidean) norm; ‖x‖₁ is the L1 norm.

Infinity: ∞ denotes an unbounded limit.

Proportionality: ∝ means "is proportional to" (differs by a positive constant).

Approximately: ≈ means approximately equal; ~ (in probability) means "distributed as."