Exercises: Security Information and Event Management (SIEM)

DataField.Dev

These exercises move from vocabulary to the daily craft of a SOC analyst: reading logs, writing correlation rules, querying, and tuning (the skill that separates working SIEMs from decorative ones). Difficulty is marked ⭐ (recall/application), ⭐⭐ (analysis), and ⭐⭐⭐ (synthesis/open-ended). A dagger (†) marks problems with a full worked solution in Appendix: Answers to Selected Exercises; attempt each problem before you read its solution.

All IPs are in documentation ranges; all logs are illustrative (Tier 3). Times are UTC unless noted. Work in your own notebook, a private repo, or — better — a free SIEM in your home lab.
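If you do not have a SIEM running yet, a throwaway SQLite table is enough to practice the SQL problems in Parts C and D. The sketch below is a practice scaffold, not a SIEM: the `events` table, its column names (which mirror the normalized fields used in these exercises), and the `make_sandbox` helper are hypothetical, and the date attached to Problem 7's times is assumed because that excerpt gives times only.

```python
import sqlite3

# Hypothetical practice schema; column names follow the normalized fields
# used in these exercises. A real SIEM's schema will differ.
SCHEMA = """
CREATE TABLE events (
    ts      TEXT NOT NULL,  -- ISO 8601 in UTC, e.g. 2025-05-14T02:14:01Z
    source  TEXT NOT NULL,
    user    TEXT,
    src_ip  TEXT,
    action  TEXT,
    outcome TEXT,
    host    TEXT
)
"""

# Problem 7's VPN excerpt; the date is an assumption (the excerpt gives times only).
VPN_EVENTS = [
    ("2025-05-14T02:14:01Z", "vpn", "swhitfield", "203.0.113.88", "login", "failure", None),
    ("2025-05-14T02:14:03Z", "vpn", "dokafor", "203.0.113.88", "login", "failure", None),
    ("2025-05-14T02:14:04Z", "vpn", "mreyes", "203.0.113.88", "login", "failure", None),
    ("2025-05-14T02:14:06Z", "vpn", "pnair", "203.0.113.88", "login", "failure", None),
    ("2025-05-14T02:14:08Z", "vpn", "evasquez", "203.0.113.88", "login", "failure", None),
    ("2025-05-14T02:14:11Z", "vpn", "tbrandt", "203.0.113.88", "login", "success", None),
]


def make_sandbox(rows=VPN_EVENTS):
    """Return an in-memory SQLite connection with an `events` table loaded."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = make_sandbox()
    for row in conn.execute("SELECT ts, user, outcome FROM events ORDER BY ts"):
        print(row)
```

Because the timestamps are fixed-width ISO 8601 strings in UTC, `ORDER BY ts` sorts them chronologically. If you port the table to PostgreSQL, note that `user` is a reserved word there and must be quoted.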

Part A — Core vocabulary ⭐

1.† In one sentence each, define log source, normalization, parsing, correlation rule, and use case. Then write one sentence that uses all five correctly in the context of detecting a VPN brute-force.

2. Classify each item as a log source, a normalization step, a correlation pattern, or an alert-tuning technique: (a) a domain controller; (b) renaming TargetUserName to user; (c) "10 failed logins in 5 minutes from one source"; (d) allowlisting the vulnerability scanner's IP; (e) AWS CloudTrail; (f) mapping a raw timestamp to UTC; (g) "IDS alert followed by an outbound connection to a new IP"; (h) raising a threshold from 5 to 20.

3. Explain the difference between a false positive and a false negative. For a brute-force detection rule, give a concrete example of each.

4.† Define alert fatigue and explain, with the arithmetic, why a SOC with excellent detection coverage can still miss a real attack. Use a five-analyst team as your example.

5. Distinguish a SIEM, a data lake, and a SOAR in one sentence each, by the job each does. Then state which one you would use to: (a) store three years of raw logs cheaply for occasional hunting; (b) automatically disable an account when a high-severity alert fires; (c) correlate identity and firewall logs in real time.

6. Why is forwarding logs off the host to a central SIEM a security control and not merely an operational convenience? Name the MITRE ATT&CK behavior it defends against.

Part B — Analyze this log ⭐⭐

7.† You are handed this excerpt from a normalized authentication log. All times UTC; source IP is in 203.0.113.0/24.

02:14:01  source=vpn  user=swhitfield  src_ip=203.0.113.88  action=login  outcome=failure
02:14:03  source=vpn  user=dokafor     src_ip=203.0.113.88  action=login  outcome=failure
02:14:04  source=vpn  user=mreyes      src_ip=203.0.113.88  action=login  outcome=failure
02:14:06  source=vpn  user=pnair       src_ip=203.0.113.88  action=login  outcome=failure
02:14:08  source=vpn  user=evasquez    src_ip=203.0.113.88  action=login  outcome=failure
02:14:11  source=vpn  user=tbrandt     src_ip=203.0.113.88  action=login  outcome=success

(a) What attack is this most likely to be, brute force or password spraying, and what field pattern tells you? (b) Which single event should escalate the alert's severity, and why? (c) Which of the chapter's first ten use cases does this match? (d) Name two tuning conditions that would keep this detection high-fidelity.
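Normalized logs in this key=value layout are easy to pull into your own scripts for analysis. A minimal parser sketch, assuming values never contain unquoted spaces: it uses `shlex.split`, which keeps a double-quoted value such as Problem 9's Message="The audit log was cleared" together, but which also treats backslashes as escape characters, so Windows paths would need a different approach. The key name `time` for the leading bare timestamp is a hypothetical choice.

```python
import shlex


def parse_kv_line(line):
    """Parse a space-separated key=value log line into a dict.

    shlex.split honours double quotes, so a quoted value stays one token.
    A bare token with no '=' (such as the leading HH:MM:SS in Problem 7)
    is stored under the hypothetical key 'time'.
    """
    record = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
        else:
            record.setdefault("time", token)
    return record


if __name__ == "__main__":
    print(parse_kv_line(
        "02:14:11  source=vpn  user=tbrandt  src_ip=203.0.113.88  "
        "action=login  outcome=success"
    ))
```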

8. Here are three raw events from three different systems, all involving the same source IP. Write the normalized form of each (fields: timestamp, source, user, src_ip, action, outcome, host).

A) May 14 09:03:11 web01 sshd[771]: Failed password for root from 198.51.100.5 port 41122 ssh2
B) EventID=4625 TimeCreated=2025-05-14T09:03:12Z TargetUserName=admin IpAddress=198.51.100.5 WorkstationName=DC01 Status=0xC000006A
C) 2025-05-14 09:03:13 DENY TCP 198.51.100.5:41200 -> 10.10.1.9:3389 rule=17 iface=outside

Then write the one-line question (in plain English) that this normalization now lets you answer but could not before.
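As a starting point for event A only, here is one way to map the sshd line onto the seven normalized fields. It is a sketch: the regular expression covers just the "Failed password for ... from ..." message shape shown, the normalized `source` label "sshd" is a hypothetical choice, and because a classic syslog timestamp carries neither a year nor a time zone, the year (2025) and UTC are assumptions you would have to confirm with the system's owner. Events B and C are left to you.

```python
import re
from datetime import datetime, timezone

# Matches only the sshd "Failed password" message shape from event A.
SSHD_FAILED = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) sshd\[\d+\]: "
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\S+) port \d+"
)


def normalize_sshd(line, assumed_year=2025):
    """Return a normalized dict for an sshd failed-password line, or None."""
    m = SSHD_FAILED.match(line)
    if m is None:
        return None
    # Syslog omits the year and zone: both are assumptions, not facts.
    ts = datetime.strptime(f"{assumed_year} {m['ts']}", "%Y %b %d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "timestamp": ts.isoformat().replace("+00:00", "Z"),
        "source": "sshd",        # hypothetical normalized source label
        "user": m["user"],
        "src_ip": m["src_ip"],
        "action": "login",
        "outcome": "failure",
        "host": m["host"],
    }


if __name__ == "__main__":
    print(normalize_sshd("May 14 09:03:11 web01 sshd[771]: Failed password "
                         "for root from 198.51.100.5 port 41122 ssh2"))
```

Notice how much of the work is deciding what the raw line does not tell you; the same question (which clock, which zone?) applies to events B and C.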

9.† This Windows event appears in the SIEM at 03:47 UTC, outside any change window:

EventID=1102  TimeCreated=2025-05-14T03:47:55Z  host=FILESRV2  SubjectUserName=svc_backup
Message="The audit log was cleared"

(a) Why is this a high-fidelity, single-event detection? (b) What does the fact that svc_backup (a service account) cleared the log suggest? (c) What two follow-up queries would you run immediately?

10. An analyst sees 312 alerts in the queue this morning, of which 300 are the same rule firing on the same benign batch job that runs nightly. Diagnose the problem in SIEM terms and name two distinct techniques that would reduce this to a single, correct alert (or none).

11.† Below is a week of a single noisy rule's activity. Decide whether to tune, allowlist, suppress, or disable it, and justify your choice in two sentences.

Rule: "Outbound connection to non-corporate IP from a server"
Fires: ~140x/day. Investigation shows ~138/day are the patch server reaching the
       vendor's update CDN, and the monitoring system reaching a status API.
       ~2/day are genuinely worth a look.

Part C — Write the correlation rule ⭐⭐

12. Write, in plain pseudocode or a SQL-style query against a normalized events table, a threshold correlation rule for password spraying: one src_ip, many distinct users, all outcome=failure, in a short window. State your threshold and window and justify them.
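If it helps to reason about the rule in code before you write the query, the sketch below applies a sliding time window per src_ip and fires when failures against enough distinct users fall inside it. The threshold (5 users) and window (10 minutes) are placeholder defaults, not recommendations; choosing and justifying them is the exercise. The event field names are hypothetical, and the reset after each alert is a deliberately crude form of suppression.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def spray_alerts(events, min_users=5, window=timedelta(minutes=10)):
    """Yield (src_ip, alert_time, users) when one src_ip has failed logins
    against at least `min_users` distinct users inside a sliding `window`.

    `events` must be sorted by time; each is a dict with hypothetical keys
    'ts' (datetime), 'src_ip', 'user', and 'outcome'.
    """
    recent = defaultdict(deque)  # src_ip -> deque of (ts, user) failures
    for ev in events:
        if ev["outcome"] != "failure":
            continue
        q = recent[ev["src_ip"]]
        q.append((ev["ts"], ev["user"]))
        # Evict failures that have fallen out of the window.
        while q and ev["ts"] - q[0][0] > window:
            q.popleft()
        users = {user for _, user in q}
        if len(users) >= min_users:
            yield ev["src_ip"], ev["ts"], sorted(users)
            # Crude suppression: reset so one burst produces one alert.
            q.clear()


if __name__ == "__main__":
    start = datetime(2025, 5, 14, 2, 14, 0)
    demo = [{"ts": start + timedelta(seconds=2 * i), "src_ip": "203.0.113.88",
             "user": f"user{i}", "outcome": "failure"} for i in range(6)]
    for alert in spray_alerts(demo):
        print(alert)
```

Counting distinct users, not raw failures, is what separates spraying from a single user mistyping a password many times.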

13.† Write a sequence correlation rule for impossible travel: the same user has two successful logins from locations far enough apart that no human could travel between them in the elapsed time. Describe the data you need (what makes "location" available), the logic, and the most likely false positive — and how you would tune it out.

14. Write a cross-source correlation rule that joins an IDS exploit alert against a host with that host subsequently making an outbound connection to a never-before-seen external IP within ten minutes. Name the two log sources, the join key, and the ATT&CK stages this spans.
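One way to see the shape of a cross-source rule is a windowed join written out by hand. The sketch below assumes hypothetical field names for the IDS alerts and the outbound-connection records, and a precomputed set of destinations already seen (the "never-before-seen" baseline). The nested loop is quadratic and only suitable for small practice datasets; a SIEM would express the same logic as a time-bounded join or sequence rule.

```python
from datetime import datetime, timedelta


def exploit_then_new_egress(ids_alerts, connections, known_dests,
                            window=timedelta(minutes=10)):
    """Return (alert, connection) pairs where the host an IDS alert names as
    the target opens an outbound connection to a destination absent from
    `known_dests` within `window` after the alert.

    Hypothetical field names:
      ids_alerts  -> 'ts' (datetime), 'dest_host' (the attacked host)
      connections -> 'ts' (datetime), 'src_host', 'dest_ip'
      known_dests -> set of external IPs already seen (the baseline)
    """
    pairs = []
    for alert in ids_alerts:
        for flow in connections:
            if flow["src_host"] != alert["dest_host"]:
                continue
            delta = flow["ts"] - alert["ts"]
            if timedelta(0) <= delta <= window and flow["dest_ip"] not in known_dests:
                pairs.append((alert, flow))
    return pairs


if __name__ == "__main__":
    t0 = datetime(2025, 5, 14, 9, 0, 0)
    alerts = [{"ts": t0, "dest_host": "APPSRV9"}]
    flows = [{"ts": t0 + timedelta(minutes=4), "src_host": "APPSRV9",
              "dest_ip": "203.0.113.200"}]
    print(exploit_then_new_egress(alerts, flows, known_dests=set()))
```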

15.† Convert this informal request into a proper use case specification (use case name, ATT&CK technique if known, log sources, trigger logic, severity, analyst response, and the main false-positive risk): "Tell me when somebody turns off multi-factor authentication for a user."

16. Write a behavioral detection (in pseudocode) for a service account logging in interactively for the first time. What state must the SIEM keep to know "first time," and why is this detection more prone to false positives in the first weeks after deployment?
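One way to picture the state this detection needs is a baseline set of service accounts already seen logging on interactively. The sketch assumes service accounts can be recognized by a hypothetical `svc_` name prefix and reuses the field names from Problem 29 (`action`, `outcome`, `logon_type`). It keeps the baseline in memory; a real SIEM would persist it (for example, as a lookup table) and would ideally seed it from historical data before the rule goes live.

```python
def first_interactive_logins(events, baseline=None, service_prefix="svc_"):
    """Yield successful interactive logins by service accounts that are not
    yet in `baseline`, the set of accounts already seen doing this.

    The baseline is updated in place, so each account alerts at most once.
    `service_prefix` is a hypothetical naming convention.
    """
    seen = baseline if baseline is not None else set()
    for ev in events:
        if not ev.get("user", "").startswith(service_prefix):
            continue
        if ev.get("action") != "login" or ev.get("outcome") != "success":
            continue
        if ev.get("logon_type") != "interactive":
            continue
        if ev["user"] not in seen:
            seen.add(ev["user"])
            yield ev
```

Starting with an empty baseline is the simplest choice, and also the reason the rule is noisy at first: every account's first legitimate interactive login looks "new" until the baseline fills in.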

Part D — Query it (SQL / SPL / KQL) ⭐⭐

17.† Translate this investigation into all three query languages (SQL, SPL, KQL) against a normalized event store: "Over the last 24 hours, for source IP 203.0.113.88, show each user attempted and the count, most-attempted first."

18. An alert fired on user mreyes. Write a single query (in your choice of SQL, SPL, or KQL) that returns everything mreyes did in the last 12 hours, sorted by time. Explain why this is usually an analyst's second query (and what the first one was).

19.† Write a query that returns, for the last hour, any src_ip with 20 or more outcome=failure login events across 5 or more distinct users — i.e., spraying. (Hint: you will need a count and a distinct-count and a HAVING/post-aggregation filter.) Give it in SQL.

20. A teammate's query has no time bound and is scanning the entire dataset, slowing the SIEM for everyone. Rewrite it to add a sensible time window and explain, in one sentence, the operational reason this matters.
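Your teammate's query is not shown, so the sketch below illustrates only the general pattern: an unbounded aggregate next to the same aggregate restricted by a parameterized time window. It assumes the hypothetical SQLite `events` table used elsewhere in these exercises, with `ts` stored as fixed-width ISO 8601 UTC strings ending in `Z`; only under that assumption does a plain string comparison order correctly, and an index on `ts` lets the engine skip older rows.

```python
from datetime import datetime, timedelta, timezone

# Unbounded: aggregates every row ever ingested.
UNBOUNDED = "SELECT src_ip, COUNT(*) AS n FROM events GROUP BY src_ip"

# Bounded: only rows at or after the cutoff are considered.
BOUNDED = """
SELECT src_ip, COUNT(*) AS n
FROM events
WHERE ts >= ?
GROUP BY src_ip
"""

# An index on the time column lets the bounded query avoid a full scan.
TS_INDEX = "CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)"


def bounded_counts(conn, now=None, hours=24):
    """Count events per src_ip over the last `hours` hours only.

    `now` must be a UTC datetime; the cutoff is formatted to match the
    assumed ISO 8601 'Z' strings in the ts column.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return dict(conn.execute(BOUNDED, (cutoff,)).fetchall())
```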

Part E — Tune the alert / reduce fatigue ⭐⭐–⭐⭐⭐

21.† A brute-force rule ("5 failed logins for one user in 10 minutes") fires ~60 times a day, almost all on legitimate users who mistyped a new password and then succeeded. Propose three distinct tuning changes that cut the false positives without creating a blind spot for a real spray-then-success attack. For each, state what benign case it excludes and what malicious case it preserves.

22. Your manager, frustrated by a noisy rule, says "just turn it off." Write a three-sentence response that explains, in SIEM terms, why disabling it is usually worse than tuning it — and what you propose instead.

23.† Design a risk-based alerting scheme (in words) for a user account: instead of every minor detection paging an analyst, low- and medium-signal events accumulate a score for the user, and only a threshold score surfaces an alert. List five contributing signals and a rough weight for each, and explain how this reduces fatigue compared with binary alerting.

24. Given a SOC that can investigate ~100 alerts/day and a current queue of 800/day at a 96% false-positive rate, compute the true positives per day and the realistic number being investigated. Then set a target false-positive rate that would let the team actually cover the true positives, and name two ways to get there.
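If you want to check your arithmetic or try other targets, the calculation can be wrapped in a small helper. It rests on one simplifying assumption, stated in the docstring: analysts cannot tell true from false positives before investigating, so the alerts they reach are effectively a random sample of the queue. The function name and return shape are illustrative.

```python
def triage_coverage(alerts_per_day, fp_rate, capacity_per_day):
    """Return (true_positives, investigated, expected_tp_investigated).

    Assumes the alerts investigated are an unbiased (random) sample of the
    queue, i.e. triage cannot tell true from false positives in advance.
    """
    true_positives = alerts_per_day * (1 - fp_rate)
    investigated = min(alerts_per_day, capacity_per_day)
    # Each investigated alert is a true positive with probability 1 - fp_rate.
    expected_tp_investigated = investigated * (1 - fp_rate)
    return true_positives, investigated, expected_tp_investigated
```

For example, a queue of 500 alerts/day at a 90% false-positive rate with capacity for 50 gives roughly 50 true positives, 50 alerts investigated, and only about 5 true positives actually examined (the values come back as floats, so expect small rounding noise).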

Part F — Design it ⭐⭐⭐

25. Design a log-source collection plan for a new 300-person company moving to the cloud (Microsoft 365, AWS, a fleet of laptops, one office firewall). List the sources in priority order, the collection method for each (agent / syslog / API / stream), and one sentence per source on its detection value. Justify why your top three are first.

26.† Design Meridian's "first ten use cases" as you would prioritize them for a bank, and defend the ordering. You may reuse, reorder, or replace the list in the chapter — but argue, for your top three, why a bank in particular should detect those first.

27. Design the SIEM-vs-data-lake split for an organization generating 2 TB of logs/day on a SIEM that licenses by ingest volume. Which categories of logs go to the real-time SIEM, which to the cheaper data lake, and how do you decide? State one risk your split accepts.

28. ⭐⭐⭐ Design a logging & monitoring standard (one page, in outline form) for a small organization: log sources and owners, normalization and time standard, retention, the first detections, and the tuning process. This mirrors the chapter's Project Checkpoint — make it something a real team could adopt.

Part G — CTF-style challenge ⭐⭐⭐

29.† The six-day foothold. Reconstruct the chapter's opening incident from these (illustrative, normalized) events, which are scattered across two log sources (win_security and edr). Put them in order, state which correlation rule would have caught the attack and at which step, and write that rule. Then explain why each event, alone, would not have alerted.

day1 14:02:10  source=win_security  user=svc_app   src_ip=10.20.5.40  action=login  outcome=success  host=APPSRV9   logon_type=interactive
day1 14:09:33  source=edr           user=svc_app   host=APPSRV9  action=process  proc="net group /domain"  outcome=success
day1 15:12:04  source=win_security  user=svc_app   host=DC01     action=group_add  target_group="Domain Admins"  outcome=success
day1 03:11:50  source=edr           user=svc_app   host=APPSRV9  action=process  proc="whoami /priv"  outcome=success

(Note: one event's timestamp is deliberately out of order in the listing — part of the challenge is ordering correctly, which is why UTC and synchronized clocks matter.)

Part H — Interleaved & forward-looking ⭐⭐

30. (Builds on Chapter 10.) Chapter 10 produced flow summaries and a beaconing score from network data. Explain how a beaconing finding from that chapter becomes a log source and a correlation input here, and write a cross-source rule that combines "host shows beaconing behavior" with an identity or endpoint event to raise fidelity.

31. (Builds on Chapter 7.) A firewall writes allow/deny logs. Explain why these are a valuable but lower-priority SIEM source than identity logs, and give one detection that requires firewall logs and one that does not need them at all.

32. (Builds on Chapter 6.) Every networked device keeps its own clock. Why does that make UTC plus NTP synchronization a prerequisite for correlation rather than a nicety? Give a concrete example of a real attack that a sequence rule would miss under three minutes of clock drift.
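To see the mechanism (though not a full answer), the sketch below takes a hypothetical two-event sequence, a VPN login failure followed 60 seconds later by a success logged by an identity provider, and applies a three-minute slow clock to the identity provider. The timestamps, source names, and the simplified "failure then success" check are all invented for illustration. Once the recorded times are sorted, the success lands before the failure and the sequence check no longer matches.

```python
from datetime import datetime, timedelta

# True order of events (hypothetical times): a failure on the VPN, then a
# success recorded by the identity provider 60 seconds later.
true_events = [
    ("vpn", datetime(2025, 5, 14, 2, 14, 0), "failure"),
    ("idp", datetime(2025, 5, 14, 2, 15, 0), "success"),
]

# The identity provider's clock runs three minutes slow (no NTP).
SKEW = {"idp": timedelta(minutes=-3)}


def recorded(events, skew):
    """Apply each source's clock error to its timestamps."""
    return [(src, ts + skew.get(src, timedelta(0)), outcome)
            for src, ts, outcome in events]


def failure_then_success(events):
    """Simplified sequence check: is a failure recorded before a success?"""
    seen_failure = False
    for _, _, outcome in sorted(events, key=lambda e: e[1]):
        if outcome == "failure":
            seen_failure = True
        elif outcome == "success" and seen_failure:
            return True
    return False


if __name__ == "__main__":
    print(failure_then_success(true_events))                   # True
    print(failure_then_success(recorded(true_events, SKEW)))   # False
```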

33. This chapter previews Sigma and detection-as-code (developed in Chapter 22). In two sentences, predict why writing detections as version-controlled text — rather than clicks in a console — matters more as a SOC grows from one analyst to twenty.

34. ⭐⭐⭐ Open reflection. The chapter claims "fidelity, not coverage, is the currency of a SOC." Argue the strongest counter-case (when might broad coverage matter more than fidelity?), then say where you ultimately land and why. Half a page.

Solutions to daggered (†) problems are in the Answers appendix. The remaining problems are deliberately open — bring them to a study group, your instructor, or your home-lab SIEM.