Exercises: Incident Response

These exercises move from recall to live decision-making under uncertainty — the core of incident response. Difficulty is marked ⭐ (recall/application), ⭐⭐ (analysis), and ⭐⭐⭐ (synthesis/open-ended judgment). A dagger (†) marks problems with a full worked solution in Appendix: Answers to Selected Exercises — attempt every problem before you read one.

Incident response is judgment under time pressure. For the scenario problems, write your answer the way you would actually run it: name the decision, the tradeoff, the action, and why. "I would look it up" is never a complete answer — say what you would do.

Part A — The lifecycle and core vocabulary ⭐

1.† List the six phases of the NIST SP 800-61 incident-response lifecycle in order, and explain in one sentence why the model is drawn as a loop rather than a straight line.

2. Distinguish, in one sentence each, a security event, a security incident, a playbook, and a runbook. Then give one concrete example of each at Meridian.

3. Explain the difference between short-term and long-term containment, and give one example of each from a ransomware response.

4.† Define eradication and recovery and explain why doing recovery before eradication is finished is dangerous. What is the mantra for a deeply compromised host, and why?

5. What is an incident commander, and why is it often a mistake for the most technically skilled responder in the room to also serve as IC?

6. Define blameless postmortem. State the one ground rule that makes it "blameless," and explain what blame does to an organization's future incident reporting.

Part B — Severity classification & triage ⭐⭐

7.† Using a SEV-1–SEV-4 matrix like Meridian's (§24.2), classify each and justify in a phrase:
- (a) A single user reports a phishing email; no one clicked; the email was blocked.
- (b) EDR shows ransomware actively encrypting files on a file server.
- (c) A non-privileged employee's laptop has commodity adware, already quarantined by EDR.
- (d) A domain-admin account logged in from an impossible-travel location 20 minutes ago and is active now.
- (e) An external researcher emails that a Meridian customer database appears to be for sale on a forum.

8. A junior analyst closes an EDR alert as a false positive because "it was just one alert and the machine looks fine." Explain what triage step they skipped, and what they should have done instead.

9.† Triage answers four questions. For the alert in 7(b), write a one-line answer to each of the four (is it real? how bad? how far? what now?). Then identify which question you most need answered before taking any containment action, and which question you should not let delay containment.

10. Explain the tension between scoping before containment and containing immediately. Give one incident type where you would scope first and one where you would contain first, and state the rule that distinguishes them.

Part C — Make the containment call ⭐⭐–⭐⭐⭐

11.† Make the call. For each incident, state your containment posture (immediate-aggressive vs. quiet-scope-first vs. proportionate) and name the single dominant concern (stop damage / preserve evidence / business continuity / avoid tipping off) that drives your choice:
- (a) Active ransomware mid-encryption, spreading via a domain-admin account.
- (b) A suspected nation-state intruder, resident ~6 weeks, quietly exfiltrating documents, no destruction.
- (c) One employee's laptop with a single piece of contained commodity malware.
- (d) A live attacker interactively typing commands on an internet-facing web server right now.

12. The evidence tradeoff. During containment, an engineer wants to immediately power off a compromised server "to be safe." Explain what is lost by powering off versus isolating, and when (if ever) an emergency power-off is the right call.

13.† Tipping off. You discover a stealthy intruder who has three known footholds and signs of additional persistence you have not fully mapped. A nervous executive demands you "shut it all down right now." Write a three-sentence response explaining the risk of premature, loud containment and what you propose instead.

14. A compromised account turns out to be a service account with domain-admin rights (the Meridian svc_backup problem). Explain why this single fact dramatically expands the scope of the incident, and list three containment/eradication actions its privilege level forces.

Part D — Write the playbook / write the policy ⭐⭐–⭐⭐⭐

15.† Write the playbook. Draft a phishing-with-credential-harvesting playbook for Meridian's SOC: 8–12 numbered steps from "email reported" through closure, written at the decision/coordination level (the playbook layer, not the runbook layer). Include the triage decision, the scoping step (who else got it / clicked), containment (account/session actions), and the lessons-learned hook.

16. Write the runbook. Pick one step from your playbook in 15 — "disable the compromised account and revoke its active sessions" — and write it as a runbook: numbered, tool-specific (you may invent plausible tool names), with the approval required and a verification step that proves it worked.

17.† Write the policy. Draft a one-paragraph ransom-payment policy for Meridian that a board would approve: the default posture, who is allowed to revisit it, what they must consult, and at least two reasons payment is disfavored. (Reference, in plain terms, the sanctions and no-guarantee problems.)

18. Design the comms. Build a communications matrix for a SEV-1 at Meridian as a small table: audience | who owns the message | when/trigger | one-line message content. Include at least: the response team, the broader workforce, legal/insurer, the federal banking regulator, and affected customers.

19. ⭐⭐⭐ Design the severity matrix. Meridian's matrix is bank-specific. Design a SEV-1–SEV-4 matrix for a different organization you choose (a hospital, a SaaS startup, a school district). Define each level on the axes that organization actually cares about, and state what each level triggers. Explain one way your matrix differs from a bank's and why.

Part E — Respond to this incident (tabletop) ⭐⭐⭐

20.† Run the tabletop. You are the incident commander. Respond to these injects in order; for each, write what you decide and do, naming the phase and the playbook/role involved.
- Inject 1 (T+0): EDR alerts on three servers: a process is deleting volume shadow copies; overnight backups failed integrity checks.
- Inject 2 (T+10 min): Scoping shows all three were reached from one compromised privileged account.
- Inject 3 (T+30 min): Files on two servers now have a .LOCKED extension; a ransom note demands payment in 72 hours and threatens to leak exfiltrated customer data.
- Inject 4 (T+2 h): Containment is holding; initial vector found (an exposed admin interface + reused, no-MFA password); no confirmed bulk data exfiltration yet.
- Inject 5 (debrief): Name three concrete action items your response should produce.

21. Facilitate. You must design a 90-minute tabletop for your team next quarter. Write (a) the one-paragraph scenario, (b) three timed injects that force hard decisions, and (c) the three questions you will ask in the debrief. Explain what makes a good inject (what each of yours is designed to test).

22. ⭐⭐⭐ The pay-or-not decision. Recovery from backups will take an estimated five days, during which the bank cannot originate loans; the attacker offers a decryptor for 75 BTC and claims it will restore service in two hours. Walk the decision: who decides, what they must consult, what the technical, legal, ethical, and sanctions considerations are, and why "it's faster" is not by itself sufficient justification to pay.

Part F — Analyze this (telemetry & artifacts) ⭐⭐

23.† Analyze the timeline. Your scribe assembled this (illustrative) timeline from the logs during an incident. All times UTC; IPs in documentation ranges.

02:14  auth   user=svc_report  src=10.4.2.9    result=SUCCESS  (first seen this host)
02:15  proc   host=APPSRV03    cmd="whoami /priv"             parent=w3wp.exe
02:18  proc   host=APPSRV03    cmd="net group 'Domain Admins' /domain"
02:31  auth   user=svc_report  dst=DC01         result=SUCCESS  (logon type 3)
02:33  proc   host=DC01        cmd="ntdsutil ... ifm"          (created a copy of the AD database)
02:40  net    host=DC01        dst=203.0.113.77  bytes_out=412309000

(a) Narrate what the attacker did, step by step, in plain language. (b) Map each line to a likely MITRE ATT&CK tactic (initial foothold use, discovery, lateral movement, credential access, exfiltration). (c) What is the scope implication of line 5 (the AD database copy)? (d) Name the two most urgent containment actions.
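Before narrating, it can help to turn the scribe's lines into structured records. The following is a minimal Python sketch, not a prescribed method: it assumes the whitespace-separated `key=value` layout shown in the timeline, drops the scribe's parenthetical notes, and uses names (`parse_line`, `events`) that are invented here for illustration. It deliberately stops short of interpreting the events, which is the point of the exercise.

```python
import re

# Timeline lines from problem 23, with the scribe's parenthetical notes removed.
TIMELINE = """\
02:14  auth   user=svc_report  src=10.4.2.9    result=SUCCESS
02:15  proc   host=APPSRV03    cmd="whoami /priv"             parent=w3wp.exe
02:18  proc   host=APPSRV03    cmd="net group 'Domain Admins' /domain"
02:31  auth   user=svc_report  dst=DC01         result=SUCCESS
02:33  proc   host=DC01        cmd="ntdsutil ... ifm"
02:40  net    host=DC01        dst=203.0.113.77  bytes_out=412309000
"""

# A field is key=value, where the value is a "quoted string" or a bare token.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Split one timeline line into time, log source, and key=value fields."""
    time, source, rest = line.split(None, 2)
    fields = {key: value.strip('"') for key, value in FIELD.findall(rest)}
    return {"time": time, "source": source, **fields}


def minutes(hhmm):
    """Convert HH:MM to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


events = [parse_line(line) for line in TIMELINE.splitlines() if line.strip()]

# Basic facts a responder might pull out before interpreting anything.
elapsed = minutes(events[-1]["time"]) - minutes(events[0]["time"])
bytes_out = sum(int(e["bytes_out"]) for e in events if "bytes_out" in e)
hosts = sorted({e["host"] for e in events if "host" in e})

print(f"{len(events)} events over {elapsed} minutes")
print(f"hosts with process or network activity: {hosts}")
print(f"outbound volume: {bytes_out / 1e6:.1f} MB")
```

Structured records make it easier to sort, filter by host or account, and merge in further log sources as scoping widens. Real log formats vary, so treat the regular expression as specific to this illustrative timeline.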

24. Spot the under-scoping. A responder reports: "Contained. We found the malware on WKS-22, quarantined it, and reset that user's password. Closing the incident." List four questions you would ask before you let them close it, each aimed at a scope gap.

25. Read the ransom note. A note claims "we have exfiltrated 2.5 million customer records and will leak them unless paid." Your egress logs show one 400 MB outbound transfer to an unknown host during the incident window. (a) Does the note's claim match the evidence? (b) What does this do to your regulatory notification analysis? (c) Why should you neither blindly believe nor blindly dismiss the claim?
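One way to start on part (a) is simple arithmetic on the two numbers given. The sketch below assumes decimal megabytes and uses hypothetical per-record sizes chosen only for illustration; it ignores compression and the possibility that the egress logs are incomplete, both of which you should address in your written answer.

```python
# Figures from problem 25.
claimed_records = 2_500_000
observed_bytes = 400 * 10**6  # one 400 MB transfer, assuming decimal megabytes

# Average size each record would need to be for the claim to fit the transfer.
bytes_per_record = observed_bytes / claimed_records


def records_that_fit(record_size_bytes):
    """How many uncompressed records of a given size fit in the observed transfer."""
    return observed_bytes // record_size_bytes


print(f"implied size per record: {bytes_per_record:.0f} bytes")

# Hypothetical record sizes, not taken from any Meridian schema.
for size in (200, 1_000, 5_000):
    print(f"{size:>5} bytes/record -> {records_that_fit(size):,} records")
```

The calculation neither confirms nor refutes the note. It only bounds what a single observed transfer could plausibly carry under stated assumptions.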

Part G — CTF-style challenge ⭐⭐⭐

26.† The 36-hour clock. It is Saturday. At 07:00 you receive an EDR alert; by 09:30 you have confirmed ransomware on internal servers and a ransom note; by 14:00 you have determined that customer data was likely, but not confirmed, exfiltrated. The federal banking rule requires regulator notification within 36 hours of determining a qualifying incident occurred. Build the timeline of notification decisions: at what point(s) does a "determination" plausibly occur, who must be in the room to make it, what notifications are triggered at each, and what is the risk of waiting for certainty about exfiltration before notifying? (There is judgment here. Defend your reading of "determination.")
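The deadline arithmetic for each candidate "determination" time can be sketched as follows. This is a small Python illustration, not legal guidance: the calendar date is hypothetical (chosen only because it falls on a Saturday), the labels are shorthand for the three timestamps in the problem, and the code takes no position on which moment actually counts as the determination.

```python
from datetime import datetime, timedelta

NOTIFY_WINDOW = timedelta(hours=36)

# Hypothetical date: 6 January 2024 is a Saturday. The problem gives only the weekday.
SATURDAY = datetime(2024, 1, 6)

# Candidate moments at which a "determination" might be argued to occur.
candidates = {
    "07:00 EDR alert": SATURDAY.replace(hour=7),
    "09:30 ransomware confirmed": SATURDAY.replace(hour=9, minute=30),
    "14:00 exfiltration likely": SATURDAY.replace(hour=14),
}


def deadline(determined_at):
    """Latest notification time if the determination happened at `determined_at`."""
    return determined_at + NOTIFY_WINDOW


for label, when in candidates.items():
    print(f"{label:<28} -> notify by {deadline(when):%A %H:%M}")
```

Notice that every candidate deadline lands on a weekend evening or in the early hours of Monday, which is worth factoring into who needs to be reachable and when.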

Part H — Interleaved & forward-looking ⭐⭐

27. (Interleave Ch. 22.) During triage you map the alerting activity to MITRE ATT&CK. Explain concretely how that mapping helps you scope — what it tells you to go look for that you would otherwise miss.

28. (Interleave Ch. 19.) The Meridian incident hinged on an over-privileged service account. Tie this back to PAM: which two PAM controls from Chapter 19, had they been in place, would most have reduced the blast radius of this incident, and why?

29. (Interleave Ch. 21.) Explain why a SOC with chronic alert fatigue has a weaker incident response capability, not just a tiring job. Connect false-positive rate to MTTD.

30. ⭐⭐⭐ (Forward to Ch. 25.) During containment you isolated but did not power off or wipe the compromised servers. Explain, in two or three sentences, what the next chapter (forensics) will need from those preserved systems, and why incident response and forensics must be planned together rather than treated as sequential, unrelated activities.

Solutions to daggered (†) problems are in the Answers appendix. The remaining problems are deliberately open — run the tabletops with a study group or your instructor, because incident response is a team skill that does not survive being learned alone.