A scannable, reference-grade summary for review and the exam. Roughly 80% tables and decision aids. If
you remember one thing: a breach is usually a failure to correlate data, not to collect it — and
fidelity, not coverage, is the currency of a SOC.
Core definitions (the term ledger for this chapter)
| Term | One-line definition |
|---|---|
| SIEM | System that ingests, normalizes, correlates, and alerts on logs from many sources for real-time detection and investigation. |
| Log source | Any system/application that produces timestamped event records (AD, EDR, firewall, cloud, app…). |
| Normalization | Mapping fields from many source formats onto one common schema (consistent names/formats). |
| Parsing | Extracting the meaningful fields out of a raw log message (the step before normalization). |
| Correlation rule | Logic that fires an alert when a defined pattern occurs across events/sources/time. |
| Use case (detection) | A named threat scenario to detect, with its logic, sources, severity, response, and false-positive risk. |
| Alert fatigue | Desensitization/degraded performance from too many alerts (mostly false positives); causes missed attacks. |
| False positive | An alert with no real malicious activity. (False negative: a real attack with no alert.) |
| Log retention | How long logs are kept; driven by detection/hunting needs and compliance (PCI-DSS, GLBA). |
| Data lake | Cheap, long-term store of vast raw data; schema-on-read; no real-time correlation by itself. |
| SOAR | Security Orchestration, Automation, and Response; automates the response to alerts via playbooks. |
| Detection-as-code | Managing detection rules as version-controlled, reviewable, testable text (e.g., Sigma). |
| Dashboard | At-a-glance visual of metrics/events from SIEM queries (operational vs. executive). |
Log-source priority (collect top-down, by detection value)
| # | Source | Method | Why it ranks here |
|---|---|---|---|
| 1 | Identity / auth (AD, Entra, IdP, VPN) | agent + API | Identity is the new perimeter; catches credential abuse across the whole kill chain |
| 2 | Endpoint detection (EDR) | vendor agent | Process creation, persistence, defense evasion; code execution lives here |
| 3 | Cloud control plane (CloudTrail / Azure / GCP) | API pull | The only place cloud attacker actions are recorded |
| 4 | Network edge (firewall, proxy, DNS, IDS/IPS) | syslog (TLS) | Context + C2/exfil detection (builds on Ch.10) |
| 5 | Servers (Windows/Linux) | agent / syslog | Interactive logons, privilege changes, key services |
| 6 | Critical applications (core/online banking) | agent / app logs | Access to crown jewels; high-value custom detections |
| 7 | SaaS / email (M365, mail gateway) | API pull | Phishing, account takeover |
Rule of thumb: Easy-to-collect ≠ worth-collecting. Collect what serves a use case; send a cheaper full copy to a data lake.
Collection methods
| Method | Use for | Trade-off |
|---|---|---|
| Agent | Hosts, endpoints (rich data) | Reliable; must deploy/maintain on every machine |
| Syslog | Network devices, Unix (use TLS) | Universal and cheap; loose, inconsistent format |
| API pull | Cloud / SaaS (no agents allowed) | High value (identity/cloud); scheduled, rate-limited |
| Stream (Kafka-style) | Very high-volume sources | Decouples firehose from SIEM; more infrastructure |
Normalization at a glance

    RAW (3 sources, same event)                  NORMALIZED (common schema)
    sshd     "Failed password for jchen from X"  ┐    {timestamp, source, user, src_ip,
    4625     TargetUserName=jchen IpAddress=X    ┼──►  action, outcome, host}
    firewall "DENY TCP X:p -> Y:22"              ┘    -> one query/rule works across all

- Parse to get fields out; normalize to give them common names/formats.
- Map to a published model — ECS, OCSF, or Splunk CIM — don't invent field names.
- UTC everywhere; NTP on every source. Time-based correlation breaks under clock drift.
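The parse-then-normalize step can be sketched in Python. This is a minimal illustration, not the bluekit implementation: it only borrows the `normalize(raw, source)` signature named later in this sheet. The schema field names, the regexes, the raw-event dict layout, and the sample values are all assumptions made for the example; a real deployment would map to ECS, OCSF, or CIM instead.

```python
import re
from datetime import datetime, timedelta, timezone

# Illustrative common schema (hypothetical names; map to ECS/OCSF/CIM in practice).
SCHEMA = ("timestamp", "source", "user", "src_ip", "action", "outcome", "host")

# Parsing: regexes that pull fields out of raw text messages (assumed formats).
SSHD_FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
FW_DENY = re.compile(r"DENY TCP (\S+):(\d+) -> (\S+):(\d+)")


def normalize(raw: dict, source: str) -> dict:
    """Parse one raw event and map it onto the common schema (unknown fields stay None)."""
    event = dict.fromkeys(SCHEMA)
    event["source"] = source
    event["host"] = raw.get("host")
    # UTC everywhere: assumes raw["time"] is a timezone-aware datetime.
    event["timestamp"] = raw["time"].astimezone(timezone.utc).isoformat()

    if source == "sshd":
        m = SSHD_FAILED.search(raw["message"])
        if m:
            event.update(user=m.group(1), src_ip=m.group(2),
                         action="login", outcome="failure")
    elif source == "windows" and raw.get("EventID") == 4625:
        event.update(user=raw["TargetUserName"], src_ip=raw["IpAddress"],
                     action="login", outcome="failure")
    elif source == "firewall":
        m = FW_DENY.search(raw["message"])
        if m:
            event.update(src_ip=m.group(1), action="connection", outcome="denied")
    return event


# Three raw shapes, one normalized shape (all sample values are made up).
est = timezone(timedelta(hours=-5))
ssh_event = normalize(
    {"time": datetime(2024, 1, 1, 7, 0, tzinfo=est), "host": "web01",
     "message": "Failed password for jchen from 203.0.113.77 port 51515 ssh2"},
    "sshd")
win_event = normalize(
    {"time": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "host": "dc01",
     "EventID": 4625, "TargetUserName": "jchen", "IpAddress": "203.0.113.77"},
    "windows")
fw_event = normalize(
    {"time": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "host": "fw01",
     "message": "DENY TCP 203.0.113.77:51515 -> 10.0.0.5:22"},
    "firewall")
```

After this step a single filter such as `src_ip == "203.0.113.77"` matches all three events, which is the whole point of normalizing.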
The correlation ladder (simple → powerful)
| Rung | Type | Example | Fidelity |
|---|---|---|---|
| 1 | Single-event (atomic) | Audit log cleared (Win 1102); login from a banned country | High but narrow |
| 2 | Threshold | 50 failed logins from one source, many accounts, 5 min (spray) | Good; counting is the power |
| 3 | Sequence / temporal | Failure burst → success for same account (brute force worked) | High; reads ordered attacks |
| 4 | Cross-source | IDS exploit alert → host's outbound to new IP (exploit → C2) | High; needs normalization |
| 5 | Behavioral / baseline | Service account logs in interactively for the first time | Powerful; noisy early (→ Ch.34) |
Why correlation beats single-event alerting: attacks are processes (kill chain, Ch.2); each step looks ordinary, but the sequence/combination betrays the attacker.
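A rung-3 sequence rule (failure burst → success for the same account) might look like the sketch below. It assumes events are already normalized dicts with `timestamp` (a `datetime`), `user`, `action`, and `outcome` keys; the function name and the defaults of 5 failures in 5 minutes are illustrative choices, not values from the chapter.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def brute_force_then_success(events, min_failures=5, window=timedelta(minutes=5)):
    """Alert when a successful login follows >= min_failures failures for the same user within window."""
    recent_failures = defaultdict(deque)  # user -> timestamps of recent failed logins
    alerts = []
    # Sequence rules depend on order, so sort by (UTC) timestamp first.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["action"] != "login":
            continue
        fails = recent_failures[ev["user"]]
        # Drop failures that have aged out of the sliding window.
        while fails and ev["timestamp"] - fails[0] > window:
            fails.popleft()
        if ev["outcome"] == "failure":
            fails.append(ev["timestamp"])
        elif ev["outcome"] == "success":
            if len(fails) >= min_failures:
                alerts.append({
                    "rule": "brute_force_then_success",
                    "user": ev["user"],
                    "timestamp": ev["timestamp"],
                    "failures": len(fails),
                })
            fails.clear()  # a success resets the sequence for this user
    return alerts
```

Neither the failures nor the success is alarming alone; the alert exists only because the rule reads them in order. The sort step is also why clock drift quietly breaks this kind of rule.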
Meridian's first ten use cases (starter catalog)
| # | Use case | Rung | ATT&CK (approx.) |
|---|---|---|---|
| 1 | Brute force followed by success | sequence | T1110 → T1078 |
| 2 | Password spraying (one src, many users) | threshold | T1110.003 |
| 3 | Impossible travel | sequence/geo | T1078 |
| 4 | Security/audit log cleared | single-event | T1070.001 |
| 5 | New privileged-group membership added | single-event | T1098 / T1078 |
| 6 | Service account interactive logon (new) | behavioral | T1078.002 |
| 7 | Disabled/expired account login attempt | single-event | T1078 |
| 8 | MFA disabled or reset | single-event | T1556 |
| 9 | Outbound to known-bad / new external IP | cross-source | T1071 (C2) |
| 10 | Mass file access or deletion (ransomware) | threshold | T1486 |
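As one worked entry from the catalog, use case 2 (password spraying: one source, many users) can be sketched as a threshold rule. The event shape (normalized dicts with `timestamp`, `src_ip`, `user`, `action`, `outcome`), the function name, and the `min_users=10` floor are assumptions for illustration; only the 50-failures-in-5-minutes figure comes from the correlation ladder.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_spray(events, min_failures=50, min_users=10, window=timedelta(minutes=5)):
    """Alert once per source IP with many failed logins across many distinct users inside the window."""
    by_src = defaultdict(deque)  # src_ip -> deque of (timestamp, user) for recent failures
    alerted = set()              # sources already alerted on (simple deduplication)
    alerts = []
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["action"] != "login" or ev["outcome"] != "failure":
            continue
        recent = by_src[ev["src_ip"]]
        recent.append((ev["timestamp"], ev["user"]))
        # Slide the window: discard failures older than `window`.
        while ev["timestamp"] - recent[0][0] > window:
            recent.popleft()
        users = {user for _, user in recent}
        if (len(recent) >= min_failures and len(users) >= min_users
                and ev["src_ip"] not in alerted):
            alerted.add(ev["src_ip"])
            alerts.append({
                "rule": "password_spray",
                "src_ip": ev["src_ip"],
                "failures": len(recent),
                "distinct_users": len(users),
                "timestamp": ev["timestamp"],
            })
    return alerts
```

The distinct-user condition is what separates a spray from a plain brute force against one account, which belongs to use case 1 instead.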
Querying: the same shape in three dialects
Investigation: count logins per user from one source IP over the last hour, highest count first.
| Language | Used by | Style |
|---|---|---|
| SQL | databases, data lakes | leads with SELECT … GROUP BY … ORDER BY |
| SPL | Splunk | pipeline: search \| stats … by \| sort |
| KQL | Microsoft Sentinel/Defender | pipeline: where \| summarize … by \| sort |

    -- SQL
    SELECT user, COUNT(*) AS attempts
    FROM events
    WHERE timestamp >= NOW() - INTERVAL '1' HOUR
      AND action = 'login' AND src_ip = '203.0.113.77'
    GROUP BY user
    ORDER BY attempts DESC;

    -- SPL
    index=auth action=login src_ip="203.0.113.77" earliest=-1h
    | stats count AS attempts by user
    | sort - attempts

    // KQL
    Events
    | where timestamp >= ago(1h)
    | where action == "login" and src_ip == "203.0.113.77"
    | summarize attempts = count() by user
    | sort by attempts desc

Shape to memorize: filter (lead with time bound) → aggregate → sort → (sometimes join).
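The same filter → aggregate → sort shape, written against an in-memory list of normalized events, may help fix the pattern independently of any query dialect. The function name and event keys are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def logins_by_user(events, src_ip, now, lookback=timedelta(hours=1)):
    """Count logins per user from one source IP within the lookback window, highest count first."""
    cutoff = now - lookback
    counts = Counter(
        ev["user"]
        for ev in events
        # Filter: time bound first, then the cheap equality checks.
        if ev["timestamp"] >= cutoff
        and ev["action"] == "login"
        and ev["src_ip"] == src_ip
    )
    # Aggregate is the Counter; most_common() performs the descending sort.
    return counts.most_common()
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the time bound reproducible when replaying an investigation.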
Alert tuning techniques (fighting alert fatigue)
| Technique | What it does | Watch out for |
|---|---|---|
| Tune thresholds/conditions | Narrow the rule (raise count, require new IP, span many accounts) | Don't narrow away the real attack |
| Allowlist known-benign | Exclude scanners, backup/service accounts, monitoring | An allowlist is a documented hole; review it |
| Aggregate / deduplicate | 100 identical alerts → one "100×" | Don't merge genuinely distinct events |
| Risk-based alerting | Weak signals accumulate a score; surface high scorers | Tune weights; needs entity tracking |
| Suppress / schedule | Mute known maintenance/batch windows | Keep the window tight and documented |
Decision rule: a noisy-but-valuable rule → TUNE (narrow conditions). A noisy-and-worthless rule → consider disabling it, recorded as a documented risk decision. Disabling a valuable rule = a silent false negative, which is strictly worse than visible noise.
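Risk-based alerting is the least obvious technique here, so a minimal sketch follows. The signal names, weights, and the threshold of 70 are hypothetical values chosen for the example; real weights need tuning against your own data.

```python
from collections import defaultdict

# Hypothetical weights per weak signal; none of these would page an analyst alone.
WEIGHTS = {
    "failed_login_burst": 20,
    "new_country_login": 30,
    "mfa_reset": 40,
    "rare_process": 25,
}


def risk_scores(signals, weights=WEIGHTS, threshold=70):
    """Accumulate weak-signal weights per entity and return only entities at or above threshold."""
    scores = defaultdict(int)
    reasons = defaultdict(list)
    for sig in signals:
        entity = sig["entity"]  # entity tracking: user, host, or account
        scores[entity] += weights.get(sig["signal"], 0)
        reasons[entity].append(sig["signal"])
    return {
        entity: {"score": score, "signals": reasons[entity]}
        for entity, score in scores.items()
        if score >= threshold
    }


surfaced = risk_scores([
    {"entity": "jchen", "signal": "failed_login_burst"},
    {"entity": "jchen", "signal": "new_country_login"},
    {"entity": "jchen", "signal": "mfa_reset"},
    {"entity": "asmith", "signal": "rare_process"},
])
```

Here `jchen` reaches 90 and is surfaced as one alert carrying three reasons, while `asmith` stays at 25 and generates nothing. A production version would also age out old signals, which this sketch omits.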
The fatigue arithmetic: 800 alerts/day × 3% true ≈ 24 true positives buried among 776 false ones; a 5-analyst SOC (capacity of roughly 100 alerts/day) cannot find them. Fidelity, not coverage.
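The arithmetic can be checked in a few lines. This sketch reads the capacity figure as about 100 alerts triaged per day for the whole team, and adds one assumption of its own: alerts are triaged in no particular order, so the share of real attacks reviewed equals the share of alerts reviewed.

```python
def fatigue(alerts_per_day: int, true_rate: float, capacity_per_day: int) -> dict:
    """Split a day's alerts into true/false positives and estimate what triage actually sees."""
    true_pos = round(alerts_per_day * true_rate)
    false_pos = alerts_per_day - true_pos
    triaged = min(alerts_per_day, capacity_per_day)
    # Assumption: triage order is unrelated to whether an alert is real.
    expected_true_seen = true_pos * triaged / alerts_per_day
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "untriaged": alerts_per_day - triaged,
        "expected_true_seen": expected_true_seen,
    }


result = fatigue(alerts_per_day=800, true_rate=0.03, capacity_per_day=100)
```

Under those assumptions `result` shows 24 true positives, 776 false positives, 700 alerts never opened, and about 3 of the 24 real attacks reaching an analyst. Raising fidelity changes that last number far more than adding rules does.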
SIEM vs. Data Lake vs. SOAR
| | SIEM | Data Lake | SOAR |
|---|---|---|---|
| Job | Detect & investigate | Store cheaply, long | Respond (automate) |
| Real-time correlation? | Yes | No (by itself) | Acts on SIEM alerts |
| Schema | on write (normalized) | on read | n/a |
| Cost driver | ingest volume / storage | raw storage (cheap) | integrations |
| Used for | alerting, queries, dashboards | retention, hunting, forensics | playbooks: enrich, contain, ticket |
Modern pattern: high-value logs → SIEM (hot, real-time); cheaper full copy → data lake (cold, long retention); response orchestrated by SOAR.
Dashboards & metrics (bridge to Ch.36)
| Audience | Dashboard | Shows |
|---|---|---|
| SOC | Operational | open alerts by severity, oldest un-triaged, noisy rules, log-source health |
| Leadership | Executive | trends: MTTD, MTTR, FP rate, ATT&CK coverage, program status |
Metrics are born in the SIEM: MTTD/MTTR need timestamped activity + alert records. No logging → no measurement (developed in Ch.36).
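To show why both metrics depend on timestamped records, here is a minimal sketch that derives them from incident records. It assumes the common reading of MTTD as mean time from start of malicious activity to detection and MTTR as mean time from detection to resolution; definitions vary between organizations, and the field names below are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_delta(incidents, start_key, end_key):
    """Mean elapsed time between two timestamp fields across incidents."""
    seconds = mean((inc[end_key] - inc[start_key]).total_seconds() for inc in incidents)
    return timedelta(seconds=seconds)


# Sample incident records (made-up values): each needs three timestamps.
incidents = [
    {"activity_start": datetime(2024, 1, 1, 12, 0),
     "detected": datetime(2024, 1, 1, 12, 30),
     "resolved": datetime(2024, 1, 1, 14, 30)},
    {"activity_start": datetime(2024, 1, 2, 9, 0),
     "detected": datetime(2024, 1, 2, 10, 30),
     "resolved": datetime(2024, 1, 2, 11, 30)},
]

mttd = mean_delta(incidents, "activity_start", "detected")  # needs activity logs
mttr = mean_delta(incidents, "detected", "resolved")        # needs alert/case records
```

If the `activity_start` timestamp cannot be recovered from logs, MTTD cannot be computed at all, which is the "no logging → no measurement" point in code form.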
Recurring themes surfaced
| Theme | In this chapter |
|---|---|
| 1: Process, not product | A SIEM is a loop (detect→investigate→respond→tune); tuning is standing operations |
| 2: Asymmetry | Attacker needs one ignored alert; alert fatigue hands it over for free |
| 4: Defense in depth / assume breach | Logging assumes prevention fails; correlation catches the attacker already inside |
| 5: Compliance is the floor | PCI-DSS/GLBA mandate logging, but the goal is detection beyond the audit |
Certification crosswalk
| Concept | CompTIA Security+ | (ISC)² CISSP |
|---|---|---|
| SIEM, log aggregation/correlation | Security Operations (logging & monitoring) | Domain 7: Security Operations |
| Log sources, collection, retention | Monitoring; data sources | Domain 7; Domain 2 (asset/retention) |
| Normalization, time sync (NTP/UTC) | Logging concepts | Domain 7 |
| Correlation rules / use cases | Detection & alerting | Domain 7 (detective controls) |
| Alert fatigue, false pos/neg, tuning | Alerting & monitoring | Domain 7 |
| SOAR / automation | Automation & orchestration | Domain 7 |
| Dashboards, MTTD/MTTR | Reporting | Domain 7; Domain 1 (governance metrics) |
Common pitfalls (quick hit-list)
- ☐ "Collect everything" → expensive, noisy SIEM that gets ignored. Collect by use case.
- ☐ Easy-to-collect sources prioritized over high-value ones.
- ☐ No time bound on queries → scans terabytes, degrades the SIEM.
- ☐ Clocks not synced / not UTC → sequence correlation silently fails.
- ☐ Importing hundreds of vendor-default rules → instant alert fatigue.
- ☐ Disabling a noisy-but-valuable rule instead of tuning it → silent blind spot.
- ☐ Allowlists added without owner/justification/review → undocumented holes.
- ☐ Logs only on the host → an attacker clears them. Ship off-box.
- Program artifact: Meridian's logging & monitoring standard — prioritized source list (with owners/onboarding), normalization + UTC/NTP, ≥1-year retention with hot/cold split, first-ten use-case catalog, detections-as-code, weekly tuning review.
- bluekit module: siem.py, with normalize(raw, source) (raw event → common schema) and correlate(events, rule) (threshold/sequence rule → alerts).
- Builds on: Ch.10 (network monitoring feeds the SIEM), Ch.7 (firewall/IDS as sources). Sets up: Ch.22 (detection engineering & hunting, Sigma), Ch.24 (IR consumes alerts; SOAR orchestrates response), Ch.36 (metrics & board reporting). Spaced review: Ch.10, Ch.6.