Appendix F: Notation and Symbols Guide
This appendix provides a comprehensive reference for all mathematical, statistical, set-theoretic, graphical, information-theoretic, and machine learning notation used throughout the textbook. Symbols are organized by domain and accompanied by plain-language definitions, the mathematical form where relevant, and chapter references. This appendix is especially useful when reading Chapters 22–24 (computational methods) and Chapter 28 (Bayesian reasoning).
F.1 Probability Notation
Probability statements are fundamental to Chapters 3, 13, 16, 28, and Appendix A.
| Symbol | Name | Plain-language meaning | Example |
|---|---|---|---|
| P(A) | Probability of A | The chance that event A occurs; a number between 0 and 1 | P(article is false) = 0.08 |
| P(Aᶜ) or P(not A) or P(Ā) | Complement of A | The chance that A does NOT occur; equals 1 − P(A) | P(article is true) = 0.92 |
| P(A ∩ B) or P(A and B) | Joint probability | The chance that both A and B occur together | P(false AND shared) |
| P(A ∪ B) or P(A or B) | Union probability | The chance that at least one of A or B occurs | P(false OR sensationalized) |
| P(A | B) | Conditional probability | The chance of A occurring given that B has already occurred | P(false | emotionally charged) |
| P(B | A) | Likelihood | The probability of observing B if A is the true state | P(positive test | disease) |
Bayes' Theorem in Full Notation
P(A|B) = P(B|A) × P(A) / P(B)
Expanded form using the law of total probability:
P(A|B) = P(B|A) × P(A) / [P(B|A)×P(A) + P(B|Aᶜ)×P(Aᶜ)]
| Component | Name | Meaning |
|---|---|---|
| P(A) | Prior | Belief in A before seeing evidence B |
| P(A|B) | Posterior | Updated belief in A after seeing evidence B |
| P(B|A) | Likelihood | How probable is B if A is true? |
| P(B) | Marginal likelihood | Total probability of observing B under all scenarios |
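The components above can be checked numerically. In the Python sketch below, all numbers are illustrative: the prior echoes the P(article is false) = 0.08 example from the table in Section F.1, and both likelihoods are assumptions made for the demonstration.

```python
# Bayes' theorem via the law of total probability (illustrative numbers).
p_A = 0.08             # prior P(A): article is false
p_B_given_A = 0.70     # likelihood P(B|A): charged wording if false
p_B_given_notA = 0.20  # P(B|A^c): charged wording if true

# Marginal likelihood: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

Here the posterior rises from 0.08 to roughly 0.23: evidence that is 3.5 times likelier under A than under Aᶜ shifts belief substantially, but not decisively.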
Additional Probability Notation
| Symbol | Meaning |
|---|---|
| Ω | Sample space — the set of all possible outcomes |
| ∅ | Empty set — the impossible event; P(∅) = 0 |
| P(A ∩ B) = P(A) × P(B) | Independence condition — A and B are independent if this holds |
| E[X] or μₓ | Expected value (mean) of random variable X |
| Var(X) or σ²ₓ | Variance of random variable X |
| SD(X) or σₓ | Standard deviation of random variable X |
| Cov(X, Y) | Covariance of X and Y |
| X ~ N(μ, σ²) | X is normally distributed with mean μ and variance σ² |
| X ~ Bin(n, p) | X follows a Binomial distribution with n trials, probability p |
| X ~ Poisson(λ) | X follows a Poisson distribution with rate λ |
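E[X], Var(X), and SD(X) can be computed directly from a probability mass function. A minimal Python sketch with an illustrative three-point distribution (not data from the text):

```python
# E[X] = Σ x·p(x) and Var(X) = E[(X − μ)²] for a discrete pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative P(X = x)

mean = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X)
sd = var ** 0.5                                         # SD(X)
```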
F.2 Statistical Notation
Used throughout Chapters 27–29 and Appendix A.
Descriptive Statistics
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| x̄ (x-bar) | Sample mean | x̄ = Σxᵢ / n | Average of observed values |
| μ (mu) | Population mean | μ = Σxᵢ / N | True average of all values in the population |
| s | Sample standard deviation | s = √[Σ(xᵢ − x̄)² / (n−1)] | Spread of sample values around the sample mean |
| σ (sigma) | Population standard deviation | σ = √[Σ(xᵢ − μ)² / N] | Spread of population values around the population mean |
| s² | Sample variance | s² = Σ(xᵢ − x̄)² / (n−1) | Squared spread of sample values |
| σ² | Population variance | σ² = Σ(xᵢ − μ)² / N | Squared spread of population values |
| n | Sample size | — | Number of observations in a sample |
| N | Population size | — | Total number of units in the population |
| Mdn | Median | Middle value when sorted | Value separating the upper and lower halves |
| Mo | Mode | — | Most frequently occurring value |
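Python's standard library implements these estimators directly; note that `stdev` uses the n − 1 (sample) denominator while `pstdev` uses N (population). The data below are illustrative:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample

x_bar = statistics.mean(data)    # x̄ = Σxᵢ / n
s = statistics.stdev(data)       # sample SD (n − 1 denominator)
sigma = statistics.pstdev(data)  # population SD (N denominator)
mdn = statistics.median(data)    # Mdn
mo = statistics.mode(data)       # Mo
```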
Inference and Hypothesis Testing
| Symbol | Name | Meaning |
|---|---|---|
| H₀ | Null hypothesis | The hypothesis of no effect, no difference, or no relationship |
| H₁ or Hₐ | Alternative hypothesis | The hypothesis of an effect, difference, or relationship |
| α (alpha) | Significance level | The probability threshold for rejecting H₀ (typically 0.05) |
| β (beta) | Type II error rate | The probability of failing to reject H₀ when H₁ is true |
| 1 − β | Statistical power | The probability of correctly rejecting H₀ when H₁ is true |
| p | p-value | The probability of observing a test statistic this extreme or more extreme, if H₀ were true |
| t | t-statistic | Test statistic following a t-distribution; used in t-tests |
| F | F-statistic | Test statistic following an F-distribution; used in ANOVA |
| χ² (chi-square) | Chi-square statistic | Test statistic following a chi-square distribution; used for categorical data tests |
| df | Degrees of freedom | The number of values free to vary in computing a statistic |
| CI | Confidence interval | Range of plausible values for a parameter, e.g., 95% CI [0.32, 0.48] |
| z | Z-score or Z-statistic | Standardized value: z = (x − μ)/σ; also used as a test statistic |
| SE | Standard error | Standard deviation of a sampling distribution; for the sample mean, SE = s/√n |
Correlation and Regression
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| r | Pearson correlation | Σ[(xᵢ−x̄)(yᵢ−ȳ)] / [√Σ(xᵢ−x̄)² × √Σ(yᵢ−ȳ)²] | Linear relationship strength between X and Y; ranges −1 to +1 |
| ρ (rho) | Spearman correlation | r on rank-transformed data | Monotonic relationship; robust to outliers and non-normality |
| R² | Coefficient of determination | R² = r² (for simple regression) | Proportion of variance in Y explained by X |
| β₀ | Intercept | Predicted Y when X = 0 | Baseline level of Y independent of X |
| β₁ | Slope | Change in predicted Y per unit increase in X | Strength of linear relationship in regression |
| ε (epsilon) | Residual / error | ε = Y − Ŷ | Unexplained deviation between observed and predicted Y |
| Ŷ (Y-hat) | Predicted value | Ŷ = β₀ + β₁X | Fitted value from the regression model |
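The regression quantities above all follow from three sums of squares. A minimal Python sketch on illustrative data:

```python
# Least-squares estimates of β₀ and β₁, plus r and R² (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

beta1 = sxy / sxx                        # slope β₁
beta0 = y_bar - beta1 * x_bar            # intercept β₀
r = sxy / (sxx ** 0.5 * syy ** 0.5)      # Pearson r
r_sq = r ** 2                            # R² (simple regression)
y_hat = [beta0 + beta1 * x for x in xs]  # fitted values Ŷ
```

A useful check: the residuals ε = Y − Ŷ of a least-squares fit always sum to zero.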
Effect Sizes
| Symbol | Name | Formula | Interpretation |
|---|---|---|---|
| d | Cohen's d | (μ₁ − μ₂) / σ_pooled | Standardized mean difference; 0.2 small, 0.5 medium, 0.8 large |
| g | Hedges' g | d × correction factor | Bias-corrected d for unequal sample sizes |
| OR | Odds ratio | [P(A|E)/P(not A|E)] / [P(A|not E)/P(not A|not E)] | Comparison of odds between two groups |
| RR | Relative risk | P(outcome|exposed) / P(outcome|unexposed) | Ratio of outcome probabilities between groups |
| ARR | Absolute risk reduction | P(outcome|unexposed) − P(outcome|exposed) | Absolute difference in outcome probability |
| NNT | Number needed to treat | 1 / ARR | Number of people needing treatment to prevent one outcome |
| V | Cramér's V | √(χ²/(n × min(r−1, c−1))) | Effect size for chi-square test of independence |
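The risk-based effect sizes translate directly into code. A sketch with illustrative counts (20/100 events in the exposed group, 50/100 in the unexposed group):

```python
# RR, ARR, NNT, and OR from illustrative outcome proportions.
p_exposed = 20 / 100    # P(outcome | exposed)
p_unexposed = 50 / 100  # P(outcome | unexposed)

rr = p_exposed / p_unexposed   # relative risk
arr = p_unexposed - p_exposed  # absolute risk reduction
nnt = 1 / arr                  # number needed to treat

def odds(p):
    return p / (1 - p)

odds_ratio = odds(p_exposed) / odds(p_unexposed)
```

With these counts the exposure is protective: RR = 0.4, OR = 0.25, and NNT ≈ 3.3, i.e., treating three to four people prevents one outcome.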
F.3 Set Notation
Used in formal definitions throughout the textbook, particularly in Chapters 22–24.
| Symbol | Name | Plain-language meaning | Example |
|---|---|---|---|
| ∈ | Element of | "is a member of" the set | x ∈ {1, 2, 3} means x is one of 1, 2, or 3 |
| ∉ | Not element of | "is not a member of" the set | 4 ∉ {1, 2, 3} |
| ⊂ or ⊆ | Subset | Every element of A is also in B (⊂ is often reserved for proper subsets) | {1,2} ⊂ {1,2,3} |
| ∩ | Intersection | Elements in both A and B | {1,2,3} ∩ {2,3,4} = {2,3} |
| ∪ | Union | Elements in A, in B, or in both | {1,2} ∪ {2,3} = {1,2,3} |
| ∅ | Empty set | The set with no elements | {x : x > 5 and x < 3} = ∅ |
| |A| | Cardinality | The number of elements in set A | |{1,2,3}| = 3 |
| Aᶜ or Ā | Complement | All elements NOT in A (relative to a universal set U) | If U = {1,...,5} and A = {1,2}, then Aᶜ = {3,4,5} |
| × | Cartesian product | All ordered pairs (a, b) with a ∈ A and b ∈ B | {0,1} × {0,1} = {(0,0),(0,1),(1,0),(1,1)} |
| : or | | Such that | Used in set-builder notation | {x : x > 0} means "all x such that x > 0" |
| ∀ | For all | Universal quantifier | ∀x ∈ A, f(x) > 0 means "for every x in A, f(x) is positive" |
| ∃ | There exists | Existential quantifier | ∃x ∈ A such that f(x) = 0 means "some x in A makes f zero" |
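Python's built-in `set` type mirrors most of this notation directly; a short illustrative sketch:

```python
# Set operations mirroring the notation in the table above.
A, B = {1, 2, 3}, {2, 3, 4}
U = {1, 2, 3, 4, 5}       # universal set, needed for the complement

intersection = A & B      # A ∩ B
union = A | B             # A ∪ B
complement = U - A        # Aᶜ relative to U
cardinality = len(A)      # |A|

assert 2 in A and 6 not in A   # ∈ and ∉
assert {1, 2} <= A             # {1,2} ⊆ A
assert all(x > 0 for x in A)   # ∀x ∈ A, x > 0
assert any(x == 3 for x in A)  # ∃x ∈ A such that x = 3
```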
F.4 Graph and Network Notation
Used in Chapter 23 (Network Analysis) and Chapter 16 (Information Diffusion).
| Symbol | Name | Definition |
|---|---|---|
| G | Graph | A mathematical structure G = (V, E) consisting of a set of vertices (nodes) and edges |
| V | Vertex set | The set of all nodes in a graph; |V| = n is the number of nodes |
| E | Edge set | The set of all edges (connections) in a graph; |E| = m is the number of edges |
| (u, v) | Edge | An edge connecting node u and node v |
| G = (V, E) | Graph definition | The graph G is fully defined by its vertex set V and edge set E |
| k or d(v) | Node degree | The number of edges incident to node v; in directed graphs, separate in-degree (k_in) and out-degree (k_out) |
| A | Adjacency matrix | Square n × n matrix where A_ij = 1 if edge (i,j) exists, 0 otherwise; A_ij = w_ij for weighted graphs |
| D | Degree matrix | Diagonal n × n matrix where D_ii = k_i (degree of node i) |
| L | Laplacian matrix | L = D − A; encodes graph structure; the number of zero eigenvalues equals the number of connected components, and the smallest eigenvalues are used in spectral clustering |
| k̄ | Mean degree | Average degree across all nodes: k̄ = 2|E| / |V| |
| C(v) | Clustering coefficient | Fraction of node v's neighbors that are also connected to each other |
| d(u,v) | Shortest path length | Number of edges in the shortest path between nodes u and v |
| diam(G) | Graph diameter | Maximum shortest path length across all pairs of nodes |
| Q | Modularity | Q = Σ_c [L_c/m − (d_c/2m)²]; measures quality of community structure; Q ∈ [−1/2, 1] |
| C_B(v) | Betweenness centrality | Fraction of all-pairs shortest paths that pass through node v |
| C_C(v) | Closeness centrality | Reciprocal of the average shortest path from v to all other nodes |
| PR(v) | PageRank | Iterative measure of node importance based on quality-weighted in-links |
| β | Infection rate | In SIR model: rate at which susceptible individuals become infected per contact |
| γ | Recovery rate | In SIR model: rate at which infected individuals recover |
| R₀ | Basic reproduction number | R₀ = β/γ; average number of secondary infections per infected individual; epidemic spreads if R₀ > 1 |
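Several of these quantities follow directly from the adjacency matrix. A minimal Python sketch for a small illustrative undirected graph, using no external graph library:

```python
# Adjacency matrix A, degrees k_i, mean degree k̄ = 2|E|/|V|,
# degree matrix D, and Laplacian L = D − A (illustrative 4-node graph).
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (1, 3)]

n = len(V)
A = [[0] * n for _ in range(n)]
for u, v in E:
    A[u][v] = A[v][u] = 1    # undirected: A is symmetric

k = [sum(row) for row in A]  # degree of each node (row sums of A)
k_bar = 2 * len(E) / len(V)  # mean degree

D = [[k[i] if i == j else 0 for j in range(n)] for i in range(n)]
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]
```

Two identities worth checking: the degrees sum to 2|E| (each edge contributes to two nodes), and every row of the Laplacian sums to zero.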
F.5 Information Theory Notation
Used in Chapter 23 and Appendix A.
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| H(X) | Shannon entropy | H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) | Average uncertainty of random variable X; measured in bits |
| H(X, Y) | Joint entropy | H(X,Y) = −ΣΣ p(x,y) log₂ p(x,y) | Uncertainty of the joint distribution of X and Y |
| H(X|Y) | Conditional entropy | H(X|Y) = H(X,Y) − H(Y) | Remaining uncertainty in X after learning Y |
| I(X;Y) | Mutual information | I(X;Y) = H(X) + H(Y) − H(X,Y) | Amount of information X and Y share; I(X;Y) ≥ 0 |
| D_KL(P||Q) | KL divergence | Σᵢ P(xᵢ) log[P(xᵢ)/Q(xᵢ)] | How much P differs from reference distribution Q; not symmetric |
| D_JS(P||Q) | Jensen-Shannon divergence | [D_KL(P||M) + D_KL(Q||M)] / 2, where M = (P+Q)/2 | Symmetric, bounded version of KL divergence |
| b | Log base | Typically b = 2 (bits), b = e (nats), or b = 10 (dits) | Unit depends on base: base 2 gives bits |
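Entropy, mutual information, and KL divergence reduce to short functions over probability dictionaries. The joint distribution below is illustrative:

```python
from math import log2

def H(dist):
    """Shannon entropy in bits: −Σ p log₂ p (terms with p = 0 contribute 0)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def D_KL(p, q):
    """KL divergence Σ p(x) log₂[p(x)/q(x)]; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # p(x, y)
px = {0: 0.5, 1: 0.5}  # marginal of X under the joint above
py = {0: 0.5, 1: 0.5}  # marginal of Y

mi = H(px) + H(py) - H(joint)   # I(X;Y)
h_x_given_y = H(joint) - H(py)  # H(X|Y)
```

The chain identity H(X|Y) + I(X;Y) = H(X) holds by construction, and D_KL of a distribution against itself is zero.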
F.6 Machine Learning Notation
Used in Chapters 22–24.
Data Representation
| Symbol | Name | Meaning |
|---|---|---|
| x | Feature vector | A single data point represented as a vector of feature values; x ∈ ℝᵈ |
| xᵢ | i-th feature | The i-th component of the feature vector x |
| X | Feature matrix | The n × d matrix of all training examples, where each row is one example |
| y | Label vector | The vector of true labels for each training example |
| ŷ | Predicted label | The model's prediction; also written ŷ = f(x; θ) |
| d | Feature dimension | Number of features in each data point |
| n | Number of examples | Number of training examples (rows in X) |
| k | Number of classes | Number of distinct output categories in a classification problem |
Model Parameters and Training
| Symbol | Name | Meaning |
|---|---|---|
| w or θ | Weights / Parameters | The learnable parameters of a model |
| w₀ or b | Bias | The intercept term in a linear model |
| L(θ) or ℒ | Loss function | A scalar measure of how wrong the model's predictions are |
| ∇L or ∇θL | Gradient | The vector of partial derivatives of L with respect to all parameters θ |
| η or α | Learning rate | Step size for gradient descent: θ ← θ − η∇L |
| λ | Regularization parameter | Controls strength of L1 (Lasso) or L2 (Ridge) regularization penalty |
Evaluation Metrics
| Symbol | Name | Formula | Meaning |
|---|---|---|---|
| TP | True Positive | — | Misinformation correctly identified as misinformation |
| TN | True Negative | — | True content correctly identified as true |
| FP | False Positive | — | True content incorrectly labeled as misinformation |
| FN | False Negative | — | Misinformation incorrectly labeled as true |
| Acc | Accuracy | (TP+TN)/(TP+TN+FP+FN) | Proportion of all cases correctly classified |
| P | Precision | TP/(TP+FP) | Of all predicted positives, fraction truly positive |
| R | Recall (sensitivity) | TP/(TP+FN) | Of all true positives, fraction correctly identified |
| F₁ | F1 score | 2PR/(P+R) | Harmonic mean of precision and recall |
| AUC | Area under ROC curve | ∫ ROC curve | Probability that model ranks a positive case above a negative case |
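These metrics follow mechanically from the four confusion-matrix counts; the counts below are illustrative:

```python
# Accuracy, precision, recall, and F1 from illustrative confusion counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)  # of predicted positives, fraction correct
recall = TP / (TP + FN)     # of actual positives, fraction found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

Note the tension the table implies: with these counts the detector misses 10 of 50 misinformation items (recall 0.8) even though overall accuracy looks respectable at 0.85.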
NLP Notation
| Symbol | Name | Meaning |
|---|---|---|
| V | Vocabulary | The set of all unique tokens in the training corpus |
| |V| | Vocabulary size | Number of unique tokens |
| tf(t,d) | Term frequency | Number of times term t appears in document d |
| idf(t) | Inverse document frequency | log(N / df(t)); N = total docs, df(t) = docs containing t |
| tfidf(t,d) | TF-IDF weight | tf(t,d) × idf(t); higher for distinctive terms |
| eₜ | Token embedding | Dense vector representation of token t; eₜ ∈ ℝᵈᵉ |
| d_e | Embedding dimension | Size of the token embedding vector (e.g., 768 for BERT-base) |
| h | Hidden state | Internal representation in a neural network layer |
| Attn(Q,K,V) | Attention | Scaled dot-product attention: softmax(QKᵀ/√dₖ)V |
F.7 Notation from Chapters 22–24 and 28
This section provides quick look-up for notation introduced in specific chapters.
Chapter 22: Computational Misinformation Detection
TF-IDF(t, d) = tf(t, d) × log(N / (1 + df(t))) — a smoothed variant of the idf in Section F.6; the +1 in the denominator prevents division by zero
Logistic regression: P(y=1|x) = σ(wᵀx + b) = 1 / (1 + exp(−(wᵀx + b)))
Cross-entropy loss: L = −[y log(ŷ) + (1−y) log(1−ŷ)]
BERT tokenization: input = [CLS] token₁ token₂ ... tokenₙ [SEP]
Classification head: ŷ = softmax(W · h_[CLS] + b)
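The logistic-regression and cross-entropy formulas above fit in a few lines of Python. The weights and features here are illustrative, not a trained model:

```python
from math import exp, log

def sigmoid(z):
    """σ(z) = 1 / (1 + e^(−z))."""
    return 1 / (1 + exp(-z))

def predict(w, x, b):
    """P(y=1|x) = σ(wᵀx + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(y, y_hat):
    """L = −[y log ŷ + (1−y) log(1−ŷ)]."""
    return -(y * log(y_hat) + (1 - y) * log(1 - y_hat))

# Illustrative: wᵀx + b = 0.5·2 + (−0.25)·4 + 0 = 0, so P(y=1|x) = 0.5
p = predict([0.5, -0.25], [2.0, 4.0], 0.0)
loss = cross_entropy(1, p)  # −log(0.5) = log 2 ≈ 0.693
```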
Chapter 23: Network Analysis
Degree of node v: k(v) = |{u : (u,v) ∈ E}|
Betweenness centrality: C_B(v) = Σ_{s≠v≠t} [σ_st(v) / σ_st]
where σ_st = number of shortest paths from s to t
σ_st(v) = those paths that pass through v
PageRank: PR(v) = (1−d)/N + d × Σ_{u→v} [PR(u) / k_out(u)]
where d ≈ 0.85 is the damping factor
Modularity: Q = (1/2m) Σ_{ij} [A_ij − k_i k_j / 2m] × δ(c_i, c_j)
where δ(c_i, c_j) = 1 if i and j are in the same community
SIR model differential equations:
dS/dt = −β S I / N
dI/dt = β S I / N − γ I
dR/dt = γ I
R₀ = β / γ
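The SIR system has no closed-form solution, but a forward-Euler integration makes its behavior concrete. The parameters below are illustrative: β = 0.3 and γ = 0.1 give R₀ = 3 > 1, so an epidemic occurs.

```python
# Forward-Euler integration of the SIR equations (illustrative parameters).
N = 1000.0
S, I, R = 999.0, 1.0, 0.0  # one initial infection
beta, gamma = 0.3, 0.1     # R0 = beta / gamma = 3
dt, steps = 0.1, 2000      # integrate to t = 200

for _ in range(steps):
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    S, I, R = S + dS * dt, I + dI * dt, R + dR * dt
```

Because dS + dI + dR = 0 at every step, S + I + R stays at N throughout; by t = 200 the epidemic has burned out and most of the population has passed through the R compartment.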
Chapter 24: Coordinated Behavior Detection
Cosine similarity: sim(A, B) = (A · B) / (|A| × |B|)
Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|
Co-retweet matrix C: C_ij = number of tweets retweeted by both account i and account j
Synchrony score for account pair (i,j):
S_ij = (number of posts by i and j falling within a time window Δt of each other) / (total posts by i and j)
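Cosine and Jaccard similarity are one-liners. The vectors and sets below are illustrative stand-ins for per-item retweet counts and retweeted-item sets:

```python
# Cosine similarity on count vectors, Jaccard similarity on item sets.
a = [3, 0, 1, 2]  # illustrative per-item counts for account i
b = [1, 1, 0, 2]  # illustrative per-item counts for account j

def norm(v):
    return sum(x * x for x in v) ** 0.5

dot = sum(x * y for x, y in zip(a, b))
cosine = dot / (norm(a) * norm(b))  # (A · B) / (|A| × |B|)

items_i = {"t1", "t2", "t3"}
items_j = {"t2", "t3", "t4"}
jaccard = len(items_i & items_j) / len(items_i | items_j)  # |A∩B| / |A∪B|
```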
Chapter 28: Bayesian Reasoning Applied
Posterior ∝ Likelihood × Prior
P(θ|data) ∝ P(data|θ) × P(θ)
Bayes factor: BF = P(data|H₁) / P(data|H₀)
BF > 3: moderate evidence for H₁
BF > 10: strong evidence for H₁
BF > 100: decisive evidence for H₁
Credibility updating:
Odds(H|E) = Odds(H) × [P(E|H) / P(E|¬H)]
Odds(H) = P(H) / P(¬H)
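The odds-form update above is compact in code; the prior and likelihood ratio here are illustrative:

```python
# Odds-form Bayesian updating: Odds(H|E) = Odds(H) × P(E|H)/P(E|¬H).
p_H = 0.10  # illustrative prior P(H)
lr = 4.0    # illustrative likelihood ratio P(E|H) / P(E|¬H)

prior_odds = p_H / (1 - p_H)      # Odds(H) = P(H) / P(¬H)
posterior_odds = prior_odds * lr  # Odds(H|E)
p_H_given_E = posterior_odds / (1 + posterior_odds)  # back to a probability
```

Evidence four times likelier under H than under ¬H moves P(H) from 0.10 to about 0.31; equivalently, a Bayes factor of 4 sits between the "moderate" (BF > 3) and "strong" (BF > 10) thresholds above.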
F.8 Greek Alphabet Reference
| Uppercase | Lowercase | Name | Common use in this textbook |
|---|---|---|---|
| Α | α | Alpha | Significance level; learning rate |
| Β | β | Beta | Type II error rate; regression coefficient |
| Γ | γ | Gamma | Recovery rate in SIR model |
| Δ | δ | Delta | Change in a quantity; time window |
| Ε | ε | Epsilon | Error term; small positive constant |
| Θ | θ | Theta | Model parameters (generic) |
| Λ | λ | Lambda | Poisson rate; regularization strength |
| Μ | μ | Mu | Population mean |
| Ξ | ξ | Xi | — |
| Π | π | Pi | Mathematical constant 3.14159...; product notation |
| Ρ | ρ | Rho | Spearman correlation; density |
| Σ | σ | Sigma | Summation (uppercase); standard deviation (lowercase) |
| Τ | τ | Tau | Time; Kendall's rank correlation |
| Φ | φ | Phi | Cumulative standard normal distribution function |
| Χ | χ | Chi | Chi-square test statistic |
| Ψ | ψ | Psi | — |
| Ω | ω | Omega | Sample space (uppercase); angular frequency (lowercase) |
F.9 Notation Conventions
Scalars: Lowercase italic letters — a, b, x, y.
Vectors: Lowercase bold letters — x, w, θ — or lowercase letters with an arrow: x⃗.
Matrices: Uppercase bold letters — X, A, W.
Sets: Uppercase calligraphic or italic letters — V, E, S, Ω.
Random variables: Uppercase italic letters — X, Y, Z.
Realizations of random variables: Corresponding lowercase letters — x, y, z.
Estimated quantities: Hat notation — θ̂ is an estimate of θ; ŷ is a prediction.
Transpose: Superscript T — xᵀ is the transpose of x.
Norms: ‖x‖₂ is the L2 (Euclidean) norm; ‖x‖₁ is the L1 norm.
Infinity: ∞ denotes an unbounded limit.
Proportionality: ∝ means "is proportional to" (differs by a positive constant).
Approximately: ≈ means approximately equal; ~ (in probability) means "distributed as."