Answers to Selected Exercises

Worked solutions to the daggered (†) exercises from each chapter. Try every problem before reading its solution.

Chapter 1

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Threat — a potential cause of harm. Vulnerability — a weakness that could be used to cause harm. Exploit — the technique that turns a vulnerability into harm. Risk — the likelihood of that harm times its impact. Combined sentence: "A laptop thief (threat) exploits an unlocked screen and unencrypted disk (vulnerability) by simply walking off with the device and reading it (exploit), and the risk is the chance of that happening times the value of the data lost."

4. Residual risk is the risk remaining after controls are applied. It can never reach zero because every control has gaps, costs, and failure modes, and reducing risk further eventually costs more than the harm prevented. A business might consciously accept a non-trivial residual risk when the cost of further mitigation exceeds the expected loss — for example, accepting a small fraud rate rather than adding friction that would drive away customers.

6. Reasonable scores (judgments vary; the justification matters): - (a) Guest WiFi shares a segment with tellers: L4 (easy to attempt, common misconfig) × I4 (path to teller systems) = 16, CRITICAL. - (b) Intern read-only to public brochures: L1 × I1 = 1, LOW — trivial. - (c) No account lockout on the portal: L4 (constantly attacked) × I5 (customer accounts/funds) = 20, CRITICAL. - (d) Untested backups: L2 (failure only manifests during recovery) × I5 (catastrophic if it fails) = 10, HIGH — and see the note that the model can understate this. Ranking: (c) 20, (a) 16, (d) 10, (b) 1.
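The scoring and ranking in answer 6 can be reproduced in a few lines. This is a minimal sketch: the `Finding` class and the band thresholds are assumptions inferred from the labels used in these answers (1 is LOW, 10 is HIGH, 15 and above is CRITICAL), not the chapter's official banding.

```python
from dataclasses import dataclass

# Assumed band floors, inferred from the labels in the worked answers.
BANDS = [(15, "CRITICAL"), (10, "HIGH"), (4, "MEDIUM"), (1, "LOW")]


@dataclass(frozen=True)
class Finding:
    label: str
    likelihood: int  # 1 (rare) to 5 (near-certain)
    impact: int      # 1 (trivial) to 5 (catastrophic)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def band(self) -> str:
        for floor, name in BANDS:
            if self.score >= floor:
                return name
        return "NONE"


findings = [
    Finding("(a) guest WiFi shares a segment with tellers", 4, 4),
    Finding("(b) intern read-only to public brochures", 1, 1),
    Finding("(c) no account lockout on the portal", 4, 5),
    Finding("(d) untested backups", 2, 5),
]

# Highest score first, matching the ranking in the answer.
ranked = sorted(findings, key=lambda f: f.score, reverse=True)
for f in ranked:
    print(f"{f.score:>2}  {f.band:<8}  {f.label}")
```

The sort reproduces the order (c), (a), (d), (b). It also shows the limitation the answer mentions: untested backups land third on score alone even though a failed restore would be catastrophic.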

9. Risk is likelihood × impact because a risk requires both a real chance of occurring and a real consequence; multiplication sends the product to zero if either factor is zero, correctly encoding "both must be present." Addition would not: a finding with likelihood 5 and impact 0 would score 5 under addition (appearing as a real risk) but 0 under multiplication (correctly, since no harm is possible). Concretely, "a guaranteed event that causes no harm" (L5, I0) and "a moderate, damaging risk" (L3, I3) score 5 and 6 under addition, which makes them look nearly equal, while multiplication ranks them 0 vs 9, the right order.
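The arithmetic in answer 9 is easy to check directly. The function and variable names below are illustrative only.

```python
def additive(likelihood: int, impact: int) -> int:
    """The rejected model: likelihood + impact."""
    return likelihood + impact


def multiplicative(likelihood: int, impact: int) -> int:
    """The model used in the chapter: likelihood x impact."""
    return likelihood * impact


harmless_certainty = (5, 0)  # guaranteed event, no harm possible
moderate_risk = (3, 3)       # moderate likelihood, real damage

for name, pair in [("harmless certainty", harmless_certainty),
                   ("moderate risk", moderate_risk)]:
    print(f"{name}: add={additive(*pair)} multiply={multiplicative(*pair)}")
```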

10. (a) A password-spraying or credential-stuffing attack — many distinct usernames tried from one source in seconds. (b) The shared src IP across many different user values in a tight time window is the strongest indicator (one source attacking many accounts). (c) A combination: the attacker is the threat, weak/guessable passwords or missing lockout is the vulnerability, and the automated submission of credentials is the exploit. (d) Account lockout or rate-limiting, and phishing-resistant MFA, would reduce the risk (Chapter 16).
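The indicator in part (b), one source trying many distinct usernames in a tight window, can be expressed as a small check over failed-login events. This is a sketch under stated assumptions: the `(timestamp, src, user)` tuple shape, the 60-second window, and the threshold of five users are all hypothetical choices, not values from the exercise.

```python
from collections import defaultdict


def spraying_sources(events, window_s=60, min_users=5):
    """Return source IPs that tried at least min_users distinct
    usernames within any window_s-second span of failed logins."""
    by_src = defaultdict(list)
    for ts, src, user in events:
        by_src[src].append((ts, user))

    flagged = set()
    for src, attempts in by_src.items():
        attempts.sort()  # order by timestamp
        start = 0
        for end in range(len(attempts)):
            # Shrink the window until it spans at most window_s seconds.
            while attempts[end][0] - attempts[start][0] > window_s:
                start += 1
            users = {user for _, user in attempts[start:end + 1]}
            if len(users) >= min_users:
                flagged.add(src)
                break
    return flagged


# Hypothetical failed-login events: (seconds, source IP, username).
events = [(t, "203.0.113.7", u)
          for t, u in zip(range(0, 10, 2), ["alice", "bob", "carol", "dave", "erin"])]
events += [(t, "198.51.100.4", "alice") for t in (1, 30, 55)]  # one user mistyping

print(spraying_sources(events))
```

The second source fails three times but only against one account, so it is not flagged; counting distinct usernames rather than raw failures is what separates spraying from a forgotten password.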

12. Example response: "We did not waste the money — we got exactly what we paid for. The attacker had a valid password and still failed, because the authentication control we invested in did its job; that is what working security looks like. The lesson is not that prevention failed but that our layered defense held when a human, as humans always eventually will, made a mistake."

14. Example first three risk-register rows (format: risk · asset · L · I · score/band · treatment): 1. Credential attack via password-only login · banking platform · 4 · 5 · 20 CRITICAL · enforce phishing-resistant MFA + lockout. 2. Orphaned/over-privileged accounts enable lateral movement · AD/Entra ID · 3 · 5 · 15 CRITICAL · access review + disable stale accounts + reduce admin rights. 3. Weak segmentation lets a foothold reach the cardholder data environment · CDE · 3 · 5 · 15 CRITICAL · segment the CDE with default-deny between zones.

17. Problems with the ticket: "the entire company" is not a threat actor (it conflates all employees with an adversary); "CRITICAL" is asserted without a likelihood or impact analysis; visibility of a calendar is a low-impact confidentiality issue, not obviously a vulnerability worth emergency remediation. Rewrite: "Finding: the CEO's calendar is visible to all staff (information-disclosure / least-privilege issue). Asset: executive schedule metadata. Threat: a curious or malicious insider, or an external attacker who has already compromised an internal account. Likelihood 2 (requires insider access or prior compromise), Impact 2 (limited sensitivity; could aid social engineering or physical targeting) → score 4, MEDIUM. Recommend restricting calendar visibility as routine least-privilege hygiene, not an emergency." Part of the skill is recognizing that the correct answer may be "this is low risk."


Chapter 2

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. The five threat-actor types, with motivation and typical capability: - Nation-state / APT — motivation: espionage, strategic advantage, sabotage; capability: very high, patient, custom tooling and zero-days. - Cybercriminal — motivation: money; capability: low to high, businesslike, scales via automation. - Hacktivist — motivation: ideology/protest/attention; capability: low to moderate, occasionally skilled. - Insider — motivation: grievance, greed, or (most often) accident; capability: starts with legitimate access. - Script kiddie — motivation: ego/curiosity; capability: low, runs others' tools, cannot improvise.

4. Most cybercriminals are economically rational: they want the best return for the least effort, so they prefer soft targets and abandon hard ones. If you are more expensive to attack than a comparable organization, the rational choice for them is to go elsewhere; you do not have to be impregnable, only costlier than your neighbor. That logic fails against a nation-state with a specific reason to want you: their objective is you in particular (your data, your access, your strategic value), not "any easy victim," so they will spend more, wait longer, and develop custom tooling rather than reprice you and move on. Against that adversary, "harder than the neighbor" is irrelevant; you must lean on defense in depth, detection, and response.

7. The seven kill-chain stages, with attacker action and a defensive opportunity each:
   1. Reconnaissance: gather target info (emails, tech, exposed services). Defense: reduce public footprint; monitor for active scanning.
   2. Weaponization: build the weapon/payload (phish, malware, fake portal). Defense: threat intel may recognize known tooling/infrastructure.
   3. Delivery: transmit the weapon (email arrives, link clicked, USB plugged, bad update downloaded). Defense: email/URL filtering, sandboxing, trained users who report.
   4. Exploitation: the weapon triggers (code runs, vuln exploited, creds used). Defense: patching, hardening, EDR, application control, phishing-resistant MFA.
   5. Installation: establish persistence (backdoor, service, task, account). Defense: EDR, baseline/autoruns monitoring, least privilege.
   6. Command and Control (C2): malware phones home for instructions. Defense: network/DNS monitoring, block known C2, beacon detection.
   7. Actions on Objectives: accomplish the goal (exfil, ransomware, fraud, destroy). Defense: DLP, anomaly on large transfers, segmentation to limit reach, backups to defeat ransomware leverage.

10. Command and Control is one of the best detection stages because the installed malware must communicate with the attacker to be useful, and that communication has detectable patterns; an attacker finds it hard to operate without generating some network signal. Two network-level signals: (a) regular beaconing — connections to an external destination at near-constant intervals (a heartbeat); (b) connections to known-bad or anomalous destinations — traffic to domains/IPs flagged by threat intel, newly registered look-alike domains, or DNS used as a covert channel (unusually long or frequent queries to one domain).

12. Tactic = the adversary's goal in a phase (e.g., for ransomware, Impact — make data unavailable). Technique = the method achieving a tactic, with a stable ID (e.g., Data Encrypted for Impact). Procedure = the specific implementation (e.g., a particular ransomware family that enumerates shares, deletes shadow copies, and encrypts with a named extension). TTPs abbreviates tactics, techniques, and procedures — the characteristic way an adversary operates. The term is useful because it names the durable, behavior-level fingerprint of an adversary (costly for them to change) rather than the ephemeral indicators (IPs, hashes) they swap easily.

15. Critique: "a detection for every technique and a fully green map" is the classic ATT&CK anti-pattern. It encourages brittle, low-quality detections written just to claim a cell, burns out the team, and produces a coverage map that looks complete while catching nothing real. ATT&CK is a prioritization and communication tool, not a compliance bingo card. The correct use: (1) identify the adversaries who realistically target you (your threat-actor profile) and the techniques they use; (2) build good detections — ones that actually fire on real behavior with acceptable false-positive rates — for those techniques first; (3) be honest about gaps. A truthful, partial map with real coverage of the relevant techniques is more valuable than a green wall, because the green wall creates false confidence: leadership believes it is covered, the team stops improving, and the brittle detections silently fail against real intrusions.

17. (a) The pattern most suggests Command and Control: five outbound connections to the same destination at near-constant ~60-second intervals with near-constant small byte counts. The single most suspicious characteristic is the regularity (the metronome-like timing), which is unnatural for human-driven traffic and characteristic of an automated beacon. (b) The term is beaconing. (c) Both: the specific destination IP is an IoC, while "beaconing to a C2 server" is a TTP (the technique Application Layer Protocol / C2 behavior). (d) Detection: a beacon detector flagging regular intervals to one destination (plus blocking/sinkholing known-bad domains). Prevention: egress filtering/default-deny outbound so internal hosts cannot freely reach arbitrary external destinations, and blocking the destination once identified.

19. Mapping each step to a kill-chain stage: 1. Scraped employee names from public sources → Reconnaissance. 2. Registered a look-alike domain and built a fake VPN page → Weaponization. 3. Employee received the "password expiry" email and entered credentials on the fake page → Delivery (the email/link reaching and being acted on by the target). 4. Logged into the real VPN and installed a reboot-surviving remote-access tool → Exploitation (using the harvested credentials) into Installation (persistence). 5. The tool beaconed to an external server every five minutes → Command and Control. 6. Copied a customer database out over two days → Actions on Objectives (Exfiltration).

21. A beaconing detection (pseudocode/prose): for each (host, external destination) pair over a time window, collect the connection timestamps; compute the gaps between consecutive connections; if there are at least N connections and the gaps are all within a tolerance of their mean (i.e., highly regular), flag the pair as a possible beacon. Parameters to tune and a false-positive cause for each: interval tolerance (how much jitter to allow: too tight misses jittered beacons, too loose flags normal periodic traffic); minimum count (how many connections before flagging: too low flags coincidental pairs); window length (how long to observe: too short cannot establish regularity, too long buries a short beacon). Legitimate sources of regular traffic that cause false positives: software update checks, NTP/time sync, monitoring/heartbeat agents, and email/calendar clients polling on a schedule, which is why a real detector pairs regularity with destination reputation and an allowlist of known-good periodic services. (See exercise-solutions.py for a minimal implementation with hand-traced output.)
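One way the prose in answer 21 could translate into code is sketched below. It is not the implementation in exercise-solutions.py: the function names, the `(timestamp, host, destination)` tuple shape, and the default values for `min_count` and `tolerance` are assumptions, and the caller is assumed to pass only connections that fall inside the chosen time window.

```python
from collections import defaultdict
from statistics import mean


def is_beacon(timestamps, min_count=5, tolerance=0.1):
    """True if there are at least min_count connections and every gap
    lies within tolerance (a fraction) of the mean gap."""
    if len(timestamps) < min_count:
        return False
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    return all(abs(gap - avg) <= tolerance * avg for gap in gaps)


def find_beacons(connections, allowlist=frozenset(), **params):
    """Group connections by (host, destination), skip allowlisted
    destinations, and return the pairs whose timing looks regular."""
    pairs = defaultdict(list)
    for ts, host, dest in connections:
        if dest not in allowlist:
            pairs[(host, dest)].append(ts)
    return sorted(pair for pair, stamps in pairs.items()
                  if is_beacon(stamps, **params))


# Hypothetical connection log: (seconds, internal host, external destination).
connections = (
    [(t, "ws-17", "203.0.113.50") for t in (0, 60, 121, 180, 241)]        # ~60 s heartbeat
    + [(t, "ws-17", "198.51.100.9") for t in (0, 5, 90, 95, 400)]         # human browsing
    + [(t, "ws-17", "time.example.net") for t in (0, 64, 128, 192, 256)]  # time sync
)

print(find_beacons(connections, allowlist={"time.example.net"}))
```

Without the allowlist the time-sync destination is flagged as well, which is the false-positive case the answer warns about.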

23. STRIDE-lite for Active Directory / Entra ID (identity), one model among several valid ones. Where a named stage is not one of the seven kill-chain stages (Privilege Escalation, Discovery, Lateral Movement, Impact), it is the corresponding ATT&CK tactic.
   - S (Spoofing): actor: criminal using stolen/sprayed credentials, or an insider reusing a session; stage/tactic: Delivery → Exploitation; defense: phishing-resistant MFA everywhere, conditional access.
   - T (Tampering): actor: an attacker on a domain controller altering group membership/GPOs; stage/tactic: Privilege Escalation; defense: tiered administration, change auditing, protected groups, monitoring.
   - R (Repudiation): threat: a privileged action with no reliable, tamper-evident trail; defense: centralized immutable logging of all privileged actions.
   - I (Information disclosure): actor: criminal/APT enumerating the directory (users, groups, trusts); stage/tactic: Discovery; defense: least-privilege read, detect mass enumeration, honeytoken accounts.
   - D (Denial of service): threat: ransomware encrypts the domain controllers, halting all authentication; stage/tactic: Actions on Objectives (Impact); defense: resilient/redundant DCs, tested offline backups.
   - E (Elevation of privilege): actor: any foothold escalating to domain admin (the crown-jewel objective); stage/tactic: Privilege Escalation → Lateral Movement; defense: least privilege, PAM, tiering, privileged access workstations, segmentation.
   Key insight (matches Case Study 1): nearly every category, for nearly every other asset, eventually routes through compromising identity, so getting identity right raises the attacker's cost across many asset models at once.

27. Tabletop, ordered steps (reasoning from the chain; full IR process is Chapter 24):
   1. Triage and confirm scope: correlate the three signals (reported phish, beaconing host, service-account login) to determine whether they are one incident; identify the affected workstation and account. (Stage being addressed: understanding where the attacker is in the chain; they have reached at least C2.)
   2. Contain the C2 channel: isolate the beaconing workstation from the network and block/sinkhole the beacon destination, severing the attacker's control before Actions on Objectives. (Break: Command and Control.)
   3. Contain the identity: disable or reset the implicated service account and force re-authentication; check what that account can reach. (Break: Privilege Escalation / Lateral Movement, denying the pivot.)
   4. Block the delivery vector: quarantine the phishing email across all 30 recipients, block the sender/URL, and check who else clicked. (Break: Delivery, preventing further footholds.)
   5. Hunt and preserve: search for the same IoCs (beacon destination, persistence artifacts) on other hosts, preserve evidence, and escalate per the IR plan. (Stage: ensure no parallel chain is running.)

29. Re-classification: this is not an APT. The evidence shows a cybercriminal / opportunistic actor of low-to-moderate capability: a year-old unpatched CVE with a public exploit is the opposite of a zero-day (no advanced capability required — just an unpatched target); the in-and-out in two days shows haste, not the patient persistence of espionage; and a file copied to cloud storage over HTTPS is ordinary exfiltration, not "advanced." Misuses of the vocabulary in the report: (1) APT — applied to an actor showing neither advanced technique nor persistence; (2) motivation — never established, but speed and the lack of long-term access point to money/opportunism, not espionage; (3) capability — overstated ("sophisticated zero-day") when the actor merely used a public exploit against a missing patch; (4) TTP — "advanced techniques" is asserted without naming any specific technique; the actual technique (exploit a known CVE on an internet-facing service; exfil over a web protocol) is unremarkable; (5) dwell time — the report conflates attacker patience with time-to-discovery: the two days were the attacker's actual operation, while the "months" were how long it took the defender to notice, which is a detection failure, not evidence of a patient adversary. Corrected summary: "An opportunistic, financially-motivated attacker exploited a known, year-old unpatched vulnerability on an internet-facing server (initial access), and within two days exfiltrated data to a cloud-storage account over HTTPS. The intrusion was not sophisticated; its long time-to-discovery (months) reflects a gap in our detection, not the attacker's patience. Priorities: patch known-exploited vulnerabilities on internet-facing assets (Chapter 23), and close the detection gap that let a two-day intrusion go unnoticed for months (Chapters 21–22)."


Chapter 3

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Confidentiality — information is disclosed only to those authorized to see it; attacked by disclosure (data theft, interception). Integrity — information and systems are accurate and unaltered except by authorized parties (and tampering is detectable); attacked by tampering (fraud, silent record changes). Availability — systems and data are reachable by authorized users when needed; attacked by denial (ransomware, DoS, deletion).

4. Authentication verifies who you are (proving an identity claim); it defends against impersonation. Authorization determines what you are allowed to do once authenticated; it defends against privilege escalation. A system that gets the first right but the second wrong: one that correctly verifies a teller's identity with a password and MFA (good authentication) but then grants every authenticated user full administrative rights (broken authorization) — identity is proven, but the permissions are catastrophically over-broad.
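The split in answer 4 can be made concrete by keeping the two checks in separate functions. This is an illustrative sketch: the user names, roles, and permission strings are hypothetical, and a real system would verify credentials rather than accept booleans.

```python
# Hypothetical role model for a branch application.
ROLE_PERMISSIONS = {
    "teller": {"view_account", "post_deposit"},
    "admin": {"view_account", "post_deposit", "manage_users"},
}
USER_ROLES = {"tina": "teller", "adam": "admin"}


def authenticate(user: str, password_ok: bool, mfa_ok: bool) -> bool:
    """Authentication: is the identity claim proven?"""
    return user in USER_ROLES and password_ok and mfa_ok


def authorize(user: str, action: str) -> bool:
    """Authorization: may this proven identity perform this action?"""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


def broken_authorize(user: str, action: str) -> bool:
    """The flaw described in the answer: every authenticated user
    is treated as a full administrator."""
    return True


if authenticate("tina", password_ok=True, mfa_ok=True):
    print("correct:", authorize("tina", "manage_users"))
    print("broken: ", broken_authorize("tina", "manage_users"))
```

Both versions authenticate the teller identically; only the second check differs, which is why strong login controls alone do not fix over-broad permissions.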

7. Classification (function / nature):
   - (a) firewall inbound block → preventive / technical
   - (b) quarterly access reviews → detective / administrative (they find over-provisioning after the fact; they do not prevent it)
   - (c) CCTV camera in the server room → detective / physical
   - (d) restore from backup → corrective / technical
   - (e) isolate + monitor an unpatchable system → compensating / technical (an alternative meeting the intent of "patch it," with a detective element from the monitoring)
   - (f) security-awareness training → preventive / administrative
   - (g) SIEM impossible-travel alert → detective / technical
   - (h) badge reader at the data-center door → preventive / physical
   - (i) documented IR plan → corrective / administrative
   - (j) full-disk encryption → preventive / technical

9. A compensating control is an alternative that satisfies the intent of a primary control you cannot implement directly (e.g., you cannot patch a legacy core-banking server, so you isolate it on its own segment and monitor it heavily — meeting the spirit of "remove the vulnerability" without patching). A corrective control restores the system after an undesirable event to limit damage (e.g., restoring the core database from backup after ransomware). The difference is timing and purpose: compensating substitutes for a missing preventive control before an incident; corrective acts after one. PCI-DSS makes room for compensating controls because real environments sometimes genuinely cannot meet a stated requirement directly; the standard allows a documented alternative provided it meets the same security objective, which keeps the requirement's intent intact rather than granting a free pass.

11. Least-privilege violations on svc_dbbackup and the fix: - Domain Admins — wildly excessive; a backup job needs nothing domain-wide. Remove. - Remote Desktop Users + interactive logon enabled — a service/machine account should not be human-logon-capable; this is an attack path (someone can log in as the service). Remove interactive logon; remove from RDP Users. - Local admin on 14 servers — far beyond reading one database. Remove. - Full read/write to ALL file shares — it only needs to write to the backup NAS path. Scope to that one path. - Password 4 years old, never rotated, shared with two humans — a service-account credential must be rotated and must not be shared (sharing destroys non-repudiation and least privilege). Rotate; vault it; remove human knowledge of it (Chapter 19/20). - Corrected permissions: a non-interactive service account with (1) read on the core database only and (2) write on the specific backup NAS path only — nothing else. Keep it out of all privileged groups.
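The audit in answer 11 is essentially a set difference between what the account holds and what the job needs. The permission strings below are a hypothetical encoding of the entitlements listed in the answer, chosen only for illustration.

```python
# What the backup job actually needs (from the corrected permissions).
REQUIRED = {"db:core:read", "nas:backup-path:write"}

# What svc_dbbackup currently holds (hypothetical encoding of the findings).
GRANTED = {
    "db:core:read",
    "nas:backup-path:write",
    "group:Domain Admins",
    "group:Remote Desktop Users",
    "logon:interactive",
    "local-admin:14-servers",
    "shares:all:read-write",
}

excess = sorted(GRANTED - REQUIRED)   # entitlements to remove
missing = sorted(REQUIRED - GRANTED)  # entitlements the job would lack

print("remove:", excess)
print("add:   ", missing)
```

Credential hygiene (password age, sharing with humans) is not an entitlement, so it falls outside this comparison and needs its own check.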

13. Example separation-of-duties policy (wire transfers over $100,000): "All outbound wire transfers exceeding $100,000 require two distinct individuals: an initiator, who enters the transfer, and an approver, who independently reviews and authorizes it. No individual may both initiate and approve the same transfer, and the approver must hold the dedicated Approver role. Transfers above $1,000,000 require a second independent approver (dual control). Any exception must be authorized in writing by the CISO or delegate, logged, alerted to the SOC, and reviewed after the fact; no standing exception is permitted." The account type most likely to silently defeat this policy is an over-powerful administrative account (a system or application superuser/admin) that can perform both the initiate and approve actions — which is exactly why privileged accounts get their own dedicated controls in Chapter 19.
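The policy in answer 13 is specific enough to check mechanically. The sketch below is one possible encoding: the `Wire` class, the people named, and the contents of `APPROVER_ROLE` are hypothetical, and the exception path (written CISO authorization) is deliberately left out.

```python
from dataclasses import dataclass, field

APPROVER_ROLE = {"ann", "bea", "carl"}  # hypothetical holders of the Approver role


@dataclass
class Wire:
    amount: int
    initiator: str
    approvers: list = field(default_factory=list)


def violations(wire: Wire) -> list:
    """Return the policy violations for one outbound wire transfer."""
    problems = []
    if wire.amount <= 100_000:
        return problems  # policy applies only above the threshold
    distinct = set(wire.approvers)
    if wire.initiator in distinct:
        problems.append("initiator approved own transfer")
    if not distinct <= APPROVER_ROLE:
        problems.append("approver lacks Approver role")
    needed = 2 if wire.amount > 1_000_000 else 1
    if len(distinct - {wire.initiator}) < needed:
        problems.append(f"needs {needed} independent approver(s)")
    return problems


print(violations(Wire(250_000, "ann", ["bea"])))
print(violations(Wire(250_000, "ann", ["ann"])))
print(violations(Wire(2_000_000, "ann", ["bea"])))
```

A check like this only constrains accounts that go through it; an administrative account that can write transfers directly bypasses it, which is the failure mode the answer singles out.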

16. (a) Across the four lines: line 2 (reading the salaries share) attacks confidentiality; line 3 (adding a member to Domain Admins) is a privilege-escalation / authorization attack threatening integrity (and enabling far more); line 4 (clearing the Security log) attacks the integrity of the audit trail and is an accounting/anti-forensics attack. Authentication is also implicated — the attacker is operating as svc_admin. (b) The shared svc_admin account is a non-repudiation problem because the log faithfully records that svc_admin did these things but cannot tie them to a specific human; with the account shared, no individual can be held responsible and the actor can deny it. (c) Least privilege enforced on svc_admin would most have limited the damage: a service account scoped to its actual function could not read HR salaries, add Domain Admins, or clear the Security log. (Separation of duties and the elimination of the shared account also help.) (d) A detective control that should have fired: a SIEM alert on modification of the Domain Admins group and on Security-log clearing — both are high-fidelity, rarely-legitimate events (Chapters 21–22).
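The detective control in part (d) amounts to a short rule table over parsed events. In this sketch the event dictionaries and field names are hypothetical; the event IDs are the Windows Security log IDs as commonly documented (1102 for audit log cleared, 4728 for a member added to a security-enabled global group) and should be verified against your own environment before use.

```python
# Rules keyed by event ID; each returns True when the event should alert.
HIGH_FIDELITY_RULES = {
    1102: lambda e: True,                               # Security audit log cleared
    4728: lambda e: e.get("group") == "Domain Admins",  # member added to a global group
}


def alerts(events):
    """Return the events that match a high-fidelity rule."""
    return [e for e in events
            if (rule := HIGH_FIDELITY_RULES.get(e["event_id"])) and rule(e)]


# Hypothetical parsed events mirroring the four log lines in the exercise.
events = [
    {"event_id": 4624, "account": "svc_admin"},                          # logon
    {"event_id": 4663, "account": "svc_admin", "object": "salaries"},    # file access
    {"event_id": 4728, "account": "svc_admin", "group": "Domain Admins"},
    {"event_id": 1102, "account": "svc_admin"},
]

for hit in alerts(events):
    print("ALERT", hit)
```

Only the last two events fire, reflecting the point that these are rarely legitimate, while logons and file reads are too common to alert on by themselves.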

18. First five steps as the on-shift analyst, with the principle/control each relies on: 1. Confirm and scope the failed login. Pull the SSO logs for the clicking user; verify the FIDO2 key blocked the login and that no session was established. (Relies on accounting — logs as ground truth — and on the preventive control, phishing-resistant MFA, having held.) 2. Reset the exposed credential. The password was entered into the attacker's page, so it is compromised: force a reset and invalidate active sessions. (Least privilege / containment — assume the secret is in the attacker's hands.) 3. Hunt for reuse / blast radius. Check whether that password (or pattern) is used elsewhere and whether the user's account has more access than their role needs. (Least privilege — limit what one compromised identity could reach.) 4. Search for other recipients/clickers. Use the malicious URL and sender to find everyone who received or visited it; check for any successful logins. (Defense in depth + detection — one reported email is a sample; assume others were targeted.) 5. Contain the infrastructure and record everything. Block the malicious domain/URL at the proxy and email gateway, document the timeline, and preserve evidence. (Corrective + preventive controls; accounting/non-repudiation for the eventual report.)

21. Critique of "just put a firewall in front of it": a single firewall is one preventive layer (§3.5) protecting an asset that cannot be patched, so any flaw that reaches the server (an allowed port, a trusted internal host that is itself compromised, an application-layer attack the firewall does not inspect) leads straight to an unpatchable, high-value system with no further obstacle and no detection. It violates defense in depth (no independent layers behind it) and offers nothing detective or corrective. A compensating-control package that meets the intent of "keep this server secure" without patching: - Network isolation / microsegmentation (preventive/technical): place the server on its own tightly controlled segment with default-deny, allowing only the specific hosts and ports it genuinely needs (Chapters 6–7). - Heavy monitoring / detection (detective/technical): log all access and alert on any anomalous connection or command — because you assume the preventive layer will eventually be bypassed. - Tightened access + least privilege (preventive/administrative + technical): restrict who and what may reach it to the absolute minimum; remove standing admin access; require privileged access through a controlled path (Chapter 19). - Backup + tested recovery (corrective/technical): ensure the server can be restored quickly, since you cannot fix the vulnerability itself. Together these are diverse in type and nature, so no single failure exposes the asset — the definition of a real compensating package rather than a single wall.

24. The principle audit. Violations, ranked by the risk each creates:
   1. "Anyone on the network can read everything (no login)" violates zero trust (implicit network-location trust) and least privilege; highest risk, because any compromised internal host or insider reads all data with no authentication at all. Fix: require authentication for all access; trust no request because of where it originates; grant read access by role/need.
   2. "If the login service is down, grant edit access to everyone" violates fail-safe default (this fails open); severe, because a simple outage (or an attacker who induces one) hands universal write access. Fix: fail closed, denying edits when the authorization decision cannot be made.
   3. "The single admin can approve its own changes" / "all editors share one account" / "no log of who changed what" violates separation of duties (self-approval) and non-repudiation and accounting (a shared editor account and no change log destroy attributability). Fix: split change-making from change-approval across distinct, named accounts; give each editor a unique identity; log who changed what, and when, for every change.
   Rewriting the top three (authentication-for-all + deny-by-default + per-user accountability with separated approval) converts the portal from "trusts its own network and fails open with shared, unlogged accounts" into a least-privilege, fail-safe, attributable design.
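The fail-closed fix for the second violation can be shown in a few lines. The exception class and the `check_permission` callable are hypothetical stand-ins for whatever the portal's login service exposes.

```python
class AuthServiceDown(Exception):
    """Raised when the authorization decision cannot be made."""


def can_edit(user: str, check_permission) -> bool:
    """Fail closed: if the decision cannot be made, deny the edit."""
    try:
        return bool(check_permission(user, "edit"))
    except AuthServiceDown:
        return False


def healthy_service(user: str, action: str) -> bool:
    return user == "editor-01" and action == "edit"


def outage(user: str, action: str) -> bool:
    raise AuthServiceDown("login service unreachable")


print(can_edit("editor-01", healthy_service))
print(can_edit("editor-01", outage))
```

During an outage even a legitimate editor is refused. That cost is accepted on purpose: the alternative hands write access to everyone whenever the service can be knocked offline.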


Chapter 4

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-oriented, or discussed in class. Reasoning matters more than a single "right" verdict; where a number is asked for, the method is the point.

1. Plaintext — the original readable data before encryption. Ciphertext — the scrambled output after encryption. Symmetric encryption — encryption using one shared secret key for both encryption and decryption. Asymmetric encryption — encryption using a public/private key pair, where what one key locks only the other unlocks. Combined sentence: "To send a file to a colleague, I encrypt the plaintext into ciphertext with a fast symmetric key, then protect that symmetric key by encrypting it with my colleague's public key using asymmetric encryption, so only their private key can recover it." (This is hybrid encryption, §4.3.)

4. Kerckhoffs's principle: a cryptosystem should remain secure even if everything about it except the key is public knowledge — i.e., the secrecy must live in the key, not the algorithm. "Proprietary encryption with a secret algorithm" is a red flag because secure algorithms (AES, SHA-2, RSA) are strong precisely because they are public and have survived decades of expert scrutiny; a hidden algorithm has not been scrutinized, often hides amateur mistakes, and substitutes "security through obscurity" for real security. The hidden algorithm is a liability, not a feature — and once discovered (as secret algorithms always are), there is nothing left protecting the data if its only defense was secrecy.

6. The key-distribution problem: symmetric encryption requires both parties to already share the same secret key, but securely getting that key to the other party — over channels that may be wiretapped — is hard; you cannot simply send the key in the clear, because an eavesdropper would capture it. Asymmetric encryption solves this by eliminating the shared secret: each party publishes a public key that anyone may use to encrypt to them, while the matching private key (never transmitted) does the decryption. An eavesdropper sees only public keys and ciphertext and can do nothing with them. Real systems still use symmetric encryption for the bulk work because asymmetric encryption is far slower and size-limited; hybrid encryption uses asymmetric crypto only to exchange a small symmetric key, then symmetric crypto (AES) to encrypt the actual data — speed where you need speed, key distribution where you need it.

8. Weakest to strongest, with prohibitions marked:
   - DES: 56-bit key, brute-forceable in hours. PROHIBITED.
   - RSA-1024: factorable by well-resourced attackers; below modern floors. PROHIBITED.
   - AES-128: secure for the foreseeable future (a 128-bit symmetric key is far stronger than a 1024-bit RSA key, because the hard problems differ).
   - ECC P-256: roughly comparable to RSA-3072; secure and efficient.
   - RSA-3072: secure; the recommended RSA size for long-lived secrets.
   - AES-256: secure; the common default floor for regulated data.
   ECC/RSA equivalence to remember: ECC-256 ≈ RSA-3072 (and ECC-384 ≈ RSA-7680). The ranking mixes symmetric and asymmetric, which is itself a teaching point: you cannot compare a symmetric and an asymmetric key size directly by the number of bits. AES-128 is "stronger" than RSA-1024 despite the smaller number, because each bit of a symmetric key contributes far more security.

11. The three properties and a failure consequence for each: - Deterministic (same input → same digest): if it failed, you could not verify integrity at all, since re-hashing the same data would not reproduce the expected digest. - One-way / preimage-resistant (cannot recover input from digest): if it failed, an attacker could reverse stored password hashes or signed digests directly into the original data. - Collision-resistant (cannot find two inputs with the same digest): if it failed (as with MD5/SHA-1), an attacker could forge integrity checks and signatures — e.g., get a benign document signed and reuse that signature on a malicious one with the same hash.
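Determinism can be observed directly with the standard library; one-wayness and collision resistance cannot be demonstrated by running code, only relied on. The messages below are made up for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = digest(b"transfer $100 to account 42")
rehashed = digest(b"transfer $100 to account 42")
tampered = digest(b"transfer $900 to account 42")

print("same input, same digest:     ", original == rehashed)
print("one character changed, match:", original == tampered)
print("digest length (hex chars):   ", len(original))
```

The integrity check in the answer is exactly the first comparison: re-hash the data and compare with the expected digest. The second comparison shows why even a one-character tamper is detected.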

13. (a) The two attacks are precomputation (rainbow tables) and fast offline brute force. (b) A salt defeats precomputation: because each user's password is hashed with a unique random salt, an attacker cannot reuse a single precomputed table across users (they would have to rebuild a table per salt, which is infeasible), and identical passwords now produce different stored hashes. The salt works even though it is stored in plaintext because its value comes from uniqueness, not secrecy. (c) Switching to Argon2 with a work factor defeats fast brute force: Argon2 is deliberately slow and memory-hard, so instead of billions of guesses per second the attacker manages only thousands, turning a feasible offline attack into an infeasible one. You need both: the salt forces per-user guessing, and the slow algorithm makes that per-user guessing impractical.
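Both defenses from answer 13 appear in the sketch below: a unique random salt per user and a deliberately slow derivation. It swaps in PBKDF2 from the standard library because Argon2 needs a third-party package; PBKDF2 is slow but not memory-hard, so treat this as an illustration of the structure, not a recommendation over Argon2. The iteration count is an illustrative value, not a tuned work factor.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative only; tune the work factor for real use


def hash_password(password: str, salt=None):
    """Return (salt, derived_key). A fresh random salt is generated per user."""
    if salt is None:
        salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


# Two users who happen to choose the same password.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")

print("identical stored hashes:", hash_a == hash_b)
print("login succeeds:         ", verify_password("correct horse", salt_a, hash_a))
```

The salt is stored next to the hash in the clear, as the answer notes; its job is to make each user's hash unique, so a single precomputed table no longer applies to everyone.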

16. Four mistakes and their fixes: - (A) random.random() is a non-cryptographic PRNG (predictable) and returns a float, not key material — bad randomness (§4.7). Fix: secrets.token_bytes(32) (a CSPRNG) for a 256-bit key. - (B) AES.MODE_ECB leaks patterns — identical plaintext blocks become identical ciphertext (§4.2). Fix: use an authenticated mode, AES.MODE_GCM, with a unique random nonce per message. - (C) SECRET_KEY = "hunter2-prod-key" is a hard-coded key that will land in version control — the classic key-management failure (§4.7). Fix: load it from a KMS or secret store at runtime; never in source. (It is also a low-entropy string, not proper key material.) - (D) hashlib.md5(pw) for password storage is fast and broken and unsalted (§4.4). Fix: use a password-hashing library (Argon2/bcrypt) with a per-user salt and a tuned work factor.
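Fixes (A) and (C) can be sketched with the standard library alone; (B) and (D) need a cryptography library and a password-hashing library, so they are not shown. The environment variable name `APP_DATA_KEY` is hypothetical, and reading from the environment stands in for a key injected at runtime by a KMS or secret store.

```python
import base64
import os
import secrets


def new_key() -> bytes:
    """Fix (A): 256 bits of key material from a CSPRNG."""
    return secrets.token_bytes(32)


def load_key(env_var: str = "APP_DATA_KEY") -> bytes:
    """Fix (C): obtain the key at runtime instead of hard-coding it.

    Expects a base64-encoded 32-byte key; refuses to fall back to a default.
    """
    encoded = os.environ.get(env_var)
    if encoded is None:
        raise RuntimeError(f"{env_var} is not set; refusing to use a built-in key")
    key = base64.b64decode(encoded)
    if len(key) != 32:
        raise ValueError("expected a 32-byte (256-bit) key")
    return key
```

Raising when the key is absent is a deliberate choice: a silent fallback to a built-in value would recreate the hard-coded key the fix is meant to remove.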

18. Things wrong or confused in "our proprietary military-grade 4096-bit hashing algorithm — even we can't decrypt it": - "proprietary" algorithm — a red flag (§4.1, Kerckhoffs); secure crypto is public and scrutinized. - "military-grade" — a marketing phrase with no technical meaning; not a property you can verify. - "4096-bit hashing" — conflates concepts: 4096-bit is a key size associated with asymmetric encryption (e.g., RSA), not with hashing (hashes have a fixed digest size like 256 bits and no "key size"). - "even we can't decrypt it" — hashing is one-way; there is nothing to "decrypt" in the first place, so the claim describes a non-property as if it were a feature, and reveals the vendor does not understand the distinction between hashing and encryption. A defender hearing this should distrust the product: each phrase signals confusion or obfuscation rather than sound cryptography.

21. Example encryption-standard snippet (one paragraph): "All sensitive data at rest shall be encrypted with AES-256 in GCM (authenticated) mode; ECB and unauthenticated modes are prohibited. Integrity hashing shall use SHA-256 or SHA-3; MD5 and SHA-1 are prohibited for security purposes. Passwords shall be stored using Argon2id (or bcrypt) with a unique per-user salt and an approved work factor — never a bare or unsalted hash. Asymmetric keys shall be RSA-3072 or larger, or ECC P-256 or larger; RSA-1024 is prohibited. All keys, IVs, nonces, and salts shall be generated from a cryptographically secure random source. Cryptographic keys shall never appear in source code, configuration, or logs; they shall be stored separately from the data they protect, access-restricted to named service identities, and rotated on a defined schedule and upon suspected compromise." (Each clause is testable — an auditor can check the configuration, the code-scan results, and the key-access policy.)

26. The identical ciphertexts. (a) ECB mode — it is the only common mode where identical plaintext blocks deterministically produce identical ciphertext blocks. (b) Without the key, you can infer the structure and repetition of the plaintext: which records share a field value (e.g., the same ZIP code or status), how many distinct values a field takes, and patterns that may reveal meaning — the "ECB penguin" leak applied to records. You learn relationships among records, and with known/guessed plaintext you can sometimes map ciphertext blocks back to values. (c) The cipher is not broken — AES is fine; the defect is the mode: ECB provides no diffusion across blocks, so it leaks plaintext patterns regardless of key strength. (d) Switch to an authenticated mode with a unique IV/nonce per encryption — AES-GCM — so identical plaintext no longer yields identical ciphertext and integrity is added as well.
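
The observation in (a) suggests a simple heuristic: split a ciphertext into 16-byte blocks and look for repeats. The sketch below assumes AES's 16-byte block size and uses placeholder bytes rather than real ciphertext. Repeated blocks strongly suggest ECB; the absence of repeats proves nothing, since the plaintext may simply have had no repeated blocks.

```python
from collections import Counter

BLOCK_SIZE = 16  # AES block size in bytes

def repeated_blocks(ciphertext: bytes) -> int:
    """Count ciphertext blocks that duplicate an earlier block."""
    blocks = [ciphertext[i:i + BLOCK_SIZE]
              for i in range(0, len(ciphertext), BLOCK_SIZE)]
    counts = Counter(blocks)
    return sum(n - 1 for n in counts.values() if n > 1)

def looks_like_ecb(ciphertext: bytes) -> bool:
    """Heuristic only: any repeated block is treated as an ECB indicator."""
    return repeated_blocks(ciphertext) > 0

# Placeholder blocks (not real ciphertext) to exercise the check.
block_a = bytes(range(0, 16))
block_b = bytes(range(16, 32))
block_c = bytes(range(32, 48))
ecb_like = block_a + block_b + block_a      # first and third blocks repeat
no_repeats = block_a + block_b + block_c
```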


Chapter 5

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Data in transit is data moving across a network between systems; its primary threat is an attacker on the path (eavesdropping or man-in-the-middle), and its primary control is a secure transport protocol (TLS, IPsec, WireGuard). Data at rest is data sitting in storage (disk, database, backup, cloud bucket); its primary threat is an attacker who gains access to the storage medium (a stolen laptop, a copied database file), and its primary control is encryption of the storage, where the decisive factor is key management (separating the key from the data).

4. Certificate pinning is an application constraining which certificate or public key it will accept for a server, rejecting even a validly CA-signed certificate that is not the expected one. At Meridian: the mobile banking app pins to the bank's own keys, so an attacker with a mis-issued-but-trusted certificate, or a malicious root CA installed on a customer's phone, still fails. Operational risk: if Meridian rotates its key without updating the pinned app, the app bricks itself — pins must be managed with a careful rotation plan. Mutual TLS (mTLS) is TLS where both client and server present and verify certificates, so each proves its identity. At Meridian: internal service-to-service calls and high-value partner APIs authenticate by certificate rather than a guessable shared token. Operational risk: it multiplies the certificate-lifecycle burden by every client, demanding automated certificate management at scale; a client whose certificate expires loses access.
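
The pinning decision itself is small, as this sketch suggests. It assumes the app pins SHA-256 fingerprints of public keys; the byte strings are placeholders for real DER-encoded keys, and the function names are hypothetical. Pinning the next key alongside the current one is one common way to reduce the rotation risk described above.

```python
import hashlib
import hmac
from typing import Iterable

def key_fingerprint(public_key_der: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded public key."""
    return hashlib.sha256(public_key_der).hexdigest()

def is_pinned(presented_key_der: bytes, pins: Iterable[str]) -> bool:
    """Accept the key only if its fingerprint matches one of the pinned values."""
    presented = key_fingerprint(presented_key_der)
    return any(hmac.compare_digest(presented, pin) for pin in pins)

# Placeholder byte strings standing in for real DER-encoded keys.
current_key = b"meridian-current-public-key"
next_key = b"meridian-next-public-key"
other_key = b"ca-signed-but-unexpected-key"

# Current key plus a pre-provisioned backup for the next rotation.
PINS = {key_fingerprint(current_key), key_fingerprint(next_key)}
```

A key that is validly CA-signed but not in PINS is still rejected, which is the whole point of pinning.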

7. TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384: - ECDHE — key exchange: ephemeral elliptic-curve Diffie–Hellman → provides forward secrecy. - RSA — authentication: the server's certificate is an RSA certificate whose private key signs the handshake. - AES_256_GCM — bulk encryption: AES with a 256-bit key in GCM, an AEAD mode (confidentiality + integrity together). - SHA384 — the hash used in the handshake's key derivation and integrity. Judgment: good — modern, forward-secret, AEAD. It belongs on the allowlist.
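
The same decomposition can be done mechanically. This is a simplified sketch that handles only TLS 1.2-style names of the form TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>; static-RSA names and TLS 1.3 names have a different shape and are rejected. The AEAD check is deliberately crude and recognizes only GCM and ChaCha20.

```python
def parse_tls12_suite(name: str) -> dict:
    """Split TLS_<KX>_<AUTH>_WITH_<CIPHER>_<HASH> into labelled parts."""
    prefix, _, rest = name.partition("_WITH_")
    parts = prefix.split("_")
    if parts[0] != "TLS" or len(parts) != 3 or not rest:
        raise ValueError(f"unsupported suite name: {name}")
    cipher, _, hash_name = rest.rpartition("_")
    return {
        "key_exchange": parts[1],
        "authentication": parts[2],
        "cipher": cipher,
        "hash": hash_name,
        # Ephemeral Diffie-Hellman variants provide forward secrecy.
        "forward_secrecy": parts[1] in {"ECDHE", "DHE"},
        # Simplified AEAD test (assumption): GCM or ChaCha20-Poly1305 only.
        "aead": cipher.endswith("_GCM") or cipher.startswith("CHACHA20"),
    }

suite = parse_tls12_suite("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384")
```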

9. Findings on legacy.meridianbank.example: (1) obsolete protocols TLS 1.0 and TLS 1.1 are offered; (2) forward secrecy is not on all suites (some non-ephemeral key exchange exists); (3) the weak cipher 3DES is offered (Sweet32). The certificate itself is fine (valid, 220 days, RSA-2048) — not a finding. Grade: F — an obsolete protocol is offered, which is an automatic F under the chapter's grading logic (an attacker can downgrade clients to TLS 1.0). Remediation, priority order: (a) disable TLS 1.0 and TLS 1.1 and remove the 3DES (and any non-ephemeral) suites, leaving only TLS 1.2/1.3 with ECDHE-AEAD — this is the exploitable weakness; (b) re-scan to confirm the endpoint now grades A, and add it to the standard cipher policy so it cannot drift back.

11. Testing in a modern browser is unsafe to conclude from because the browser negotiates the strongest mutually supported option and hides the rest: it will pick AES-256-GCM and show a padlock even if the server also still offers RC4, 3DES, or TLS 1.0 that an attacker's downgraded client would gladly select. The padlock proves the browser's own connection is encrypted, not that the server's offered configuration is safe. The analyst should instead enumerate what the server offers with a defensive scanner (testssl.sh, sslscan, or nmap ssl-enum-ciphers) — all protocols and all cipher suites, plus the certificate's validity and forward-secrecy support — and grade that. Audit what is offered, not what your browser happened to negotiate.

12. (a) The 09:31 connections stand out because they negotiated TLS 1.0 with RC4 — an obsolete protocol and a broken cipher — from a single source, repeatedly, in contrast to the modern TLS 1.3 connections elsewhere. (b) result=OK means the server accepted these weak parameters, i.e., the server still offers TLS 1.0/RC4 — a misconfiguration, because a hardened server would have refused and the connection would have failed. (c) It is both: a misconfiguration (the server should not offer these) and possible attack/recon behavior (a client deliberately speaking only the weak protocol is what a downgrade attempt or a scanner looks like). (d) Immediate (configuration): disable TLS 1.0/1.1 and RC4 on the endpoint so such connections are refused. Investigative: pivot on src=203.0.113.77 — what else has it done, is it a known scanner or an attacker, and did any sensitive session actually complete over the weak channel?

14. Most likely root cause: a TLS certificate expired at 00:00 UTC and clients now reject it — a self-inflicted outage, since nothing was deployed and certificate errors are the symptom. Preventing practice: a complete certificate inventory with automated expiry monitoring and renewal (alerting well before expiry, e.g., at 30/14/7 days, and auto-renewing via ACME where possible) so a certificate never lapses unnoticed. (This is the cert_days_left/tls_config_grade instinct from the chapter.)
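
The escalating alert schedule can be sketched as a small function. The name cert_days_left follows the chapter's field; the function name and message strings are assumptions made for illustration.

```python
from typing import Optional

ALERT_THRESHOLDS = (30, 14, 7)  # days before expiry, from the answer above

def expiry_alert(cert_days_left: int) -> Optional[str]:
    """Return an alert message for a certificate, or None if none is due."""
    if cert_days_left < 0:
        return "EXPIRED"
    # Thresholds the certificate has already crossed; report the tightest.
    crossed = [t for t in ALERT_THRESHOLDS if cert_days_left <= t]
    if not crossed:
        return None
    return f"expires within {min(crossed)} days"
```

Run against every certificate in the inventory on a schedule, a check like this turns a lapse into a ticket weeks ahead instead of an outage at midnight.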

16. Everything wrong with the policy block: - Protocols: SSLv3, TLS 1.0, and TLS 1.1 are obsolete and downgradeable — remove them; offer only TLS 1.2 and TLS 1.3. - Ciphers ALL:!aNULL: "everything except no-authentication" still permits RC4, 3DES, export, static-RSA (no forward secrecy), and CBC suites — a downgrade attacker selects the weakest. Replace with an explicit allowlist of ECDHE + AEAD (AES-GCM / ChaCha20-Poly1305) suites only. - prefer_server_cipher_order: false: lets the client choose, so an attacker's downgraded client picks the weakest mutually supported suite — set it to true so the server enforces the strongest. - HSTS not set: without HSTS a browser can be stripped down to plaintext HTTP — enable HSTS with a long max-age. Corrected intent: TLS 1.2/1.3 only; ECDHE-AEAD cipher allowlist; server cipher preference on; HSTS enabled. (Exact directive syntax is implementation-specific; intent is what is graded.)
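
As one concrete rendering of the corrected intent, here is a sketch using Python's standard ssl module. It is an illustration under assumptions, not the exercise's own syntax: the function name is hypothetical, no certificate is loaded (so the context cannot serve traffic yet), and HSTS is an HTTP response header that the web application must send, which a TLS context cannot set.

```python
import ssl

def hardened_server_context() -> ssl.SSLContext:
    """Build a server-side context matching the corrected policy intent."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Protocols: TLS 1.2 and 1.3 only.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # TLS 1.2 suites: explicit allowlist of ECDHE with AEAD ciphers.
    # (TLS 1.3 suites are managed separately by OpenSSL and are all AEAD.)
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    # Server enforces its own cipher preference order.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx

# HSTS is sent by the application as a response header; one year max-age.
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000; includeSubDomains")

ctx = hardened_server_context()
tls12_suites = [c["name"] for c in ctx.get_ciphers() if c["protocol"] == "TLSv1.2"]
```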

19. Example TLS configuration standard (public web properties): 1. Protocols: support TLS 1.2 and TLS 1.3 only; SSLv2/3 and TLS 1.0/1.1 are disabled everywhere. 2. Cipher suites: on TLS 1.2, only ECDHE key exchange (forward secrecy) with AES-GCM or ChaCha20-Poly1305 (AEAD) and SHA-256 or better; no RC4, 3DES, export, static-RSA, or CBC. TLS 1.3 uses its default AEAD suites. 3. Server preference: the server selects the strongest mutually supported suite (server cipher order on). 4. HSTS: enabled with a long max-age on all web properties. 5. Certificates: RSA ≥ 2048 bits or ECDSA P-256+; SHA-256 signatures; maximum ~1-year lifetime; renewal automated (ACME) or owned and ticketed. 6. Monitoring: every endpoint is scanned on a continuous schedule and must hold grade A; any regression alerts; certificate expiry is tracked with escalating alerts at 30/14/7 days.

21. Migration to mTLS for service-to-service auth: - Stand up (or use) an internal CA to issue short-lived client certificates to each service/workload. - Issue each service a client certificate (identity = the service), and configure each server to require and verify a client certificate on incoming calls, replacing the shared API token. - Roll out in monitor-then-enforce stages: first accept both token and certificate, verify every service presents a valid certificate, then remove the token. New lifecycle burden: every service now has a certificate that must be issued, deployed, rotated, and renewed before expiry — the certificate-lifecycle problem multiplied by the number of services; an expired client certificate breaks that service's calls. What helps (from §5.6): automated certificate issuance and renewal (e.g., ACME / an internal CA with short-lived certs and auto-rotation), a complete certificate inventory with expiry monitoring, and HSM/KMS protection for the internal CA's key. Short lifetimes plus automation make the scale manageable. (mTLS recurs as applied practice later in the book.)

24. The certificate that should not exist. (a) A trusted CA issuing a certificate for secure-login.meridianbank.example that Meridian never requested most likely indicates mis-issuance — through CA error, fraud, or a CA compromise — which is dangerous even though Meridian's own servers are untouched because the entire TLS trust model rests on every trusted CA never doing this; a third party now holds a browser-trusted certificate for a Meridian name. (b) With it, an attacker who can also get into a network path (rogue WiFi, DNS manipulation) can present a trusted certificate and MITM customers who connect to that hostname — reading or altering their session — because their browser will accept the certificate without warning. (c) Defensive actions, in order: (1) confirm via CT logs and the CA that Meridian did not request it; (2) contact the issuing CA to revoke the mis-issued certificate immediately; (3) check DNS — ensure the hostname does not resolve to anything Meridian controls and watch for malicious use; (4) consider a CAA record restricting which CAs may issue for Meridian's domains to prevent recurrence; (5) monitor for any traffic/incidents tied to the hostname. (d) Certificate pinning in the mobile banking app would have protected app customers even before the certificate was caught, because a pinned app refuses any certificate but the bank's own expected key, regardless of which CA signed it — a trusted-but-wrong certificate is still rejected. (The browser, trusting the CA system, would not be protected — which is exactly why high-value apps pin.)


Chapter 6

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-oriented, or discussed in class.

1. (a) IP source address — Layer 3 (Network); attack: IP spoofing. (b) TCP port number — Layer 4 (Transport); attack: port scanning / SYN flood. (c) MAC address and ARP — Layer 2 (Data Link); attack: ARP spoofing / MAC flooding. (d) HTTP request body — Layer 7 (Application); attack: SQL injection / XSS. (e) TLS-encrypted session — Layer 6 (Presentation) (with session at 5); attack: weak/misconfigured TLS or downgrade.

4. The 4-tuple is source IP, source port, destination IP, destination port. The destination port alone is insufficient because many clients connect to the same destination socket at once (e.g., thousands of users to a web server on 443); they are told apart by their differing source IP and source port. Only all four values together uniquely identify one connection, which is how a server tracks a thousand simultaneous conversations.
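
A toy connection table makes the point. The addresses and session labels below are invented; real stacks track far more state per connection than a label.

```python
from typing import Dict, NamedTuple

class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Three simultaneous connections to the same destination socket.
connections: Dict[FourTuple, str] = {
    FourTuple("198.51.100.7", 50112, "192.0.2.20", 443): "client A, first connection",
    FourTuple("198.51.100.7", 50113, "192.0.2.20", 443): "client A, second connection",
    FourTuple("198.51.100.8", 50112, "192.0.2.20", 443): "client B",
}

# Keyed by destination port alone, all three would collapse into one entry.
distinct_dst_ports = {key.dst_port for key in connections}
```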

7. (a) The 09:02 activity is a healthy completed connection (SYN → SYN-ACK → ACK to port 443). The 09:05 activity is a port scan: one source (203.0.113.9) sends lone SYNs to many different ports (21, 22, 23, 80, 445) in the same second, none completing. (b) The strongest indicator is the set of many distinct destination ports from one source with no completed handshakes (the dport field varying while src_ip is constant). (c) It is a combination: the scanning host/operator is the threat, any unnecessarily exposed listening service is a vulnerability, and the scan itself is reconnaissance — arguably the exploit of the reconnaissance phase, though no harm has yet occurred. Most precisely, it is threat activity probing for vulnerabilities. (d) Controls: a default-deny firewall (so the scan hits closed/denied ports and is logged), reducing the exposed attack surface, and rate-limiting/IDS to alert on the scan pattern (Chapter 7).
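
The indicator in (b) translates into a short detection sketch. The event format (source IP, destination port, TCP flag seen from that source) is an assumed simplification of the log, the client address is invented, and the threshold of four ports is an arbitrary illustration.

```python
from collections import defaultdict

def find_scanners(events, min_ports=4):
    """Map each suspected scanner to the ports it probed without completing.

    events: iterable of (src_ip, dport, flag), flag being "SYN" or "ACK"
    as sent by the source. The server's SYN-ACK is not needed here.
    """
    syn_ports = defaultdict(set)
    completed = defaultdict(set)
    for src_ip, dport, flag in events:
        if flag == "SYN":
            syn_ports[src_ip].add(dport)
        elif flag == "ACK":
            completed[src_ip].add(dport)
    suspects = {}
    for src_ip, ports in syn_ports.items():
        unanswered = ports - completed[src_ip]
        if len(unanswered) >= min_ports:
            suspects[src_ip] = sorted(unanswered)
    return suspects

events = [
    ("198.51.100.4", 443, "SYN"), ("198.51.100.4", 443, "ACK"),  # healthy client
    ("203.0.113.9", 21, "SYN"), ("203.0.113.9", 22, "SYN"),
    ("203.0.113.9", 23, "SYN"), ("203.0.113.9", 80, "SYN"),
    ("203.0.113.9", 445, "SYN"),                                  # lone SYNs
]
suspects = find_scanners(events)
```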

8. (a) A SYN flood (a protocol denial-of-service attack). (b) The diagnostic is the completion ratio: SYNs received are ~733× baseline (880,000 vs 1,200) while completed connections stay near baseline (1,050 vs 1,180), so almost none of the SYNs complete the handshake — and half-open connections balloon to ~610,000, filling the server's table. (c) You can't block the source IPs because they are spoofed and numerous (random, ~70,000 distinct), so a blocklist is futile and risks blocking legitimate users. (d) Defenses: SYN cookies on the server/load balancer (let it survive a full half-open table) and upstream rate-limiting / DDoS scrubbing at the ISP or a provider — the volumetric component must be absorbed upstream, not at your own already-saturated edge.
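
The arithmetic behind (b), as a short sketch using the figures quoted above. The 0.5 alert threshold is an arbitrary assumption; a real threshold would be tuned against the site's own baseline.

```python
baseline = {"syn": 1_200, "completed": 1_180}
attack = {"syn": 880_000, "completed": 1_050}

def completion_ratio(window: dict) -> float:
    """Fraction of received SYNs that went on to complete the handshake."""
    return window["completed"] / window["syn"]

def looks_like_syn_flood(window: dict, min_ratio: float = 0.5) -> bool:
    """Flag a window in which most SYNs never complete (threshold is assumed)."""
    return completion_ratio(window) < min_ratio

syn_multiplier = attack["syn"] / baseline["syn"]  # roughly 733
baseline_ratio = completion_ratio(baseline)       # close to 1
attack_ratio = completion_ratio(attack)           # close to 0
```

The raw SYN count alone could be a legitimate traffic spike; it is the collapse of the completion ratio that distinguishes a flood.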

10. UDP is connectionless: it has no three-way handshake, so a sender never has to receive and answer a reply before the server acts, and its source address is never validated. This makes UDP services easy to spoof: an attacker forges the victim's source IP and sends small queries to many open UDP servers (DNS, NTP, etc.), each of which sends a much larger response to the victim. The lack of a handshake enables the spoofing (the reflection), and the response being larger than the request provides the amplification; together these explain why reflection/amplification DDoS overwhelmingly uses UDP services.

11. The most serious problem is rule 3, ALLOW any -> any any, placed above the default-deny. It makes the entire policy permissive: because first-match wins (or because a broad allow shadows everything), every flow is permitted and the network is effectively flat, destroying the segmentation the other rules imply. The "to avoid breaking things" comment is the tell — it trades all security for convenience. Rewrite: delete rule 3 entirely and rely on the explicit allows plus the default-deny:

1. ALLOW  branch    -> banking-app    tcp/443
2. ALLOW  corporate -> internet(DMZ)  tcp/443,80
3. DENY   any       -> any            any        # DEFAULT-DENY (was rule 4)

Then add back, one at a time, only the specific flows that are genuinely required (each justified), rather than a blanket allow. Segmentation is the enforced default-deny; a single any-to-any allow re-flattens it.

13. Flawed assumptions: (1) "Behind the firewall" assumes the firewall sees and stops internal (east-west) traffic. It does not; a perimeter firewall only inspects north-south traffic. (2) "NAT hides our addresses" treats NAT as a security control, but it only rewrites addresses to conserve IPv4 and provides no authentication, content inspection, or protection once a foothold exists internally. (3) The implicit assumption that the internal network is trustworthy violates "assume breach." The missing internal encryption leaves open a man-in-the-middle attack: an attacker who gains any internal foothold can ARP-spoof the local segment and read or alter cleartext internal traffic (credentials, sessions). The fix is to encrypt internal traffic with validated TLS and add Layer 2 defenses and segmentation, none of which NAT or the perimeter provides.

16. Ordered, default-deny ruleset (pseudo-syntax; first match wins):

# DMZ web server 192.0.2.20; back-end app server 10.0.5.10
1. ALLOW  internet     -> 192.0.2.20   tcp/443     # public HTTPS to the web server
2. ALLOW  192.0.2.20   -> 10.0.5.10    tcp/8443    # web server reaches its back-end only
3. DENY   192.0.2.20   -> 10.0.0.0/8   any         # explicit: web server reaches nothing else internal
4. DENY   any          -> any          any         # DEFAULT-DENY (log all)

Annotations: Rule 1 permits only inbound HTTPS to the one DMZ host (no other port, no other source). Rule 2 permits exactly the one internal flow the web server needs. Rule 3 is an explicit, logged denial of any other internal reach from the (internet-exposed, therefore higher-risk) web server — redundant with rule 4 but legible and edit-proof. Rule 4 denies and logs everything else. Order matters: the specific allows must precede the denials, and the catch-all deny must be last.

18. Example network segmentation standard (program-ready): "Meridian's network is divided into trust zones — internet, DMZ, core, cardholder data environment (CDE), branch, corporate, management, and guest — separated by enforced firewall boundaries. The default posture between all zones is deny; only explicitly required flows are permitted, each documented and justified, and the most specific rules precede a catch-all default-deny. The CDE is segmented from all other zones, reachable only from named systems on named ports, to contain cardholder data and to limit PCI-DSS scope. All inter-zone traffic — permitted and denied — is logged and monitored, so that any attempt to cross a boundary generates a record the SOC can alert on. The standard is reviewed on change and at least annually."

20. Design (one valid answer). Redesign the flat credit-union network into trust zones:

                 INTERNET
                    │
            [ PERIMETER FW ]  default-deny inbound
                    │
            ┌───────┴────────┐
            │      DMZ        │   public website (only internet-reachable system)
            └───────┬────────┘
            [ INTERNAL FW ]  default-deny between zones
        ┌───────┬───────┴───────┬───────────┐
        │       │               │           │
   [CORPORATE] [TELLER/BRANCH] [CORE/        [MANAGEMENT]
    staff,      teller          CARD SYSTEMS] device admin
    email       workstations    (isolate any  (priv. only)
                                cardholder data here, strictest)

   [ GUEST WiFi ] -> internet only, isolated from all internal zones

Zones and trust: DMZ (low) holds the public website; teller/branch (medium) reaches only the banking application; corporate (medium) is general staff; core/card systems (highest) hold member financial data with any cardholder data isolated most strictly; management (high) holds device admin; guest (none) is internet-only. Place a perimeter firewall (north-south) and an internal firewall (east-west) with default-deny between every zone; permit only required flows; log inter-zone traffic. Isolate any cardholder data into its own zone reachable from the fewest possible systems (PCI-DSS).

23. (a) A man-in-the-middle attack via ARP spoofing at Layer 2 — the gateway IP now maps to a different (attacker) MAC, diverting traffic through the attacker, and the certificate warnings indicate the attacker is attempting to intercept TLS sessions (which the clients should reject). (b) First three containment actions: (1) isolate the implicated segment/port — disable the switch port presenting the rogue MAC or quarantine the suspected attacker host; (2) verify and lock the gateway's MAC binding (check switch ARP/MAC tables; apply a static ARP entry or enable dynamic ARP inspection) to stop the diversion; (3) instruct affected users to stop and not bypass certificate warnings, and begin scoping which sessions/credentials may have been exposed. (c) Earlier prevention/detection: dynamic ARP inspection and switch port security (block the spoof), static ARP for critical gateways, validated TLS on internal traffic (so even a successful MITM cannot read it), and segmentation to shrink the population of hosts an attacker can reach — plus monitoring switch logs for duplicate/changed MAC↔IP bindings.
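
The monitoring idea at the end of (c) can be sketched as follows. The input format (a time-ordered sequence of IP and MAC pairs taken from ARP or switch logs) and the sample addresses are assumptions for illustration; a legitimate hardware swap would also trigger this, so a hit is a lead to investigate, not proof of spoofing.

```python
def changed_bindings(observations):
    """Return (ip, old_mac, new_mac) each time an IP appears with a new MAC.

    observations: time-ordered iterable of (ip, mac) pairs.
    """
    last_seen = {}
    changes = []
    for ip, mac in observations:
        mac = mac.lower()  # normalize case before comparing
        previous = last_seen.get(ip)
        if previous is not None and previous != mac:
            changes.append((ip, previous, mac))
        last_seen[ip] = mac
    return changes

observations = [
    ("10.0.0.1", "00:11:22:33:44:55"),   # gateway, expected MAC
    ("10.0.0.25", "AA:BB:CC:00:00:25"),
    ("10.0.0.1", "00:11:22:33:44:55"),
    ("10.0.0.1", "de:ad:be:ef:00:01"),   # gateway IP now claims a different MAC
]
alerts = changed_bindings(observations)
```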

25. The mislabeled "firewall." False/misleading claims: (1) "Makes the internal network invisible and therefore secure" — NAT hides addresses but provides no authentication, content inspection, or access control; invisibility is not security. (2) "Attackers can't reach it because everything is on 10.0.0.0/8" — private addressing only stops unsolicited inbound connections from the internet; it does nothing once an attacker has an internal foothold (e.g., via phishing) or rides an allowed outbound connection. (3) "Internal segmentation is unnecessary" — false: without segmentation, one foothold reaches everything east-west, which NAT cannot prevent. (4) "Internal encryption is unnecessary" — false: an internal attacker can ARP-spoof and read cleartext traffic via MITM regardless of NAT. What a phished user still enables: the attacker controls an internal host behind the NAT and can then scan the flat internal network, move laterally to servers and sensitive systems, ARP-spoof to intercept internal traffic, and reach the crown jewels — exactly the Pinewood/flat-network outcome (Case Study 2). NAT is a side effect that conserves addresses, not a wall; the appliance is a router being sold as a security program.


Chapter 7

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design tasks, or discussed in class.

1. Stateless firewall — examines each packet in isolation against a rule list (IP, port, protocol), with no memory of prior packets. Stateful firewall — tracks active connections in a state table and evaluates packets in the context of the conversation they belong to. Next-generation firewall — a stateful firewall that additionally inspects application-layer content and ties flows to user identity and threat intelligence. Each adds, in order: connection-state awareness (stateful over stateless), then application/user awareness (NGFW over stateful).

4. An access control list is an ordered set of permit/deny rules applied to traffic at an interface, evaluated top-to-bottom with the first matching rule taking effect. Order matters because a broad rule placed above a narrow one shadows it. Example: rule A permit tcp any -> 10.30.0.50 dport 5432 above rule B deny tcp 10.20.5.0/24 -> 10.30.0.50 — with A first, the deny in B never fires for that subnet; swap them and the subnet is correctly blocked while others are still allowed.
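
The shadowing example can be run as a tiny first-match evaluator. This is a sketch under assumptions: the rule tuple format is invented for illustration, only source, destination, and destination port are matched, and unmatched traffic is denied.

```python
import ipaddress

def first_match(rules, src, dst, dport):
    """Return the action of the first rule matching the packet, else 'deny'."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for action, src_net, dst_net, port in rules:
        if (src_ip in ipaddress.ip_network(src_net)
                and dst_ip in ipaddress.ip_network(dst_net)
                and port in (None, dport)):   # None means any port
            return action
    return "deny"

RULE_A = ("permit", "0.0.0.0/0", "10.30.0.50/32", 5432)
RULE_B = ("deny", "10.20.5.0/24", "10.30.0.50/32", None)

# Same packet from the subnet that should be blocked, two rule orders.
a_first = first_match([RULE_A, RULE_B], "10.20.5.7", "10.30.0.50", 5432)
b_first = first_match([RULE_B, RULE_A], "10.20.5.7", "10.30.0.50", 5432)

# A host outside that subnet is still allowed once B is placed first.
other_host = first_match([RULE_B, RULE_A], "10.20.9.10", "10.30.0.50", 5432)
```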

6. Default-deny denies all traffic except what is explicitly permitted; default-allow permits all traffic except what is explicitly blocked. When a brand-new, un-anticipated service appears: under default-deny it is blocked until someone deliberately permits it (failure mode = too restrictive = fails safe); under default-allow it is reachable until someone notices and blocks it (failure mode = exposed = fails open). Default-deny fails safe; default-allow fails open.

8. Problems with the given ruleset: (1) -P FORWARD ACCEPT sets default-allow — must be DROP. (2) permit 10.20.0.0/16 -> 10.30.0.0/24 opens the entire CDE to the whole corporate network — remove it. (3) the SSH permit is acceptable but should be the only admin path (scoped to the jump host, which it is). (4) permit tcp --dport 5432 with no source allows any host to reach anything on 5432 — scope to the payment app server and the db. (5) missing a stateful return rule and an explicit deny-and-log. Corrected:

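# Default-deny: drop anything not explicitly permitted below
iptables -P FORWARD DROP
# Stateful return rule: allow replies on connections already permitted
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Payment app server -> database, port 5432 only
iptables -A FORWARD -s 10.30.0.40 -d 10.30.0.50 -p tcp --dport 5432 -j ACCEPT
# Jump host -> CDE over SSH (the only admin path)
iptables -A FORWARD -s 10.20.9.10 -d 10.30.0.0/24 -p tcp --dport 22 -j ACCEPT
# Explicit deny-and-log for all other corporate -> CDE traffic
iptables -A FORWARD -s 10.20.0.0/16 -d 10.30.0.0/24 -j LOG --log-prefix "CDE-DENY: "
iptables -A FORWARD -s 10.20.0.0/16 -d 10.30.0.0/24 -j DROP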

9. A permit ip any any near the top of a first-match-wins ruleset matches essentially every packet, so every carefully written rule below it never executes — the firewall effectively becomes default-allow regardless of the work done beneath. The chapter's worked example shows the same hazard: the same packet is allowed or denied based only on rule order. What catches it: a periodic firewall rule review that forces every permit to be re-justified from a rule register, so an orphaned broad rule with no business owner surfaces and is deleted. (PCI-DSS explicitly requires such periodic review.)

10. Many firewalls drop unmatched traffic via an implicit deny-all, but silently — generating no log. An explicit deny-and-log rule both drops the traffic and records it, so blocked attempts become security events you can investigate. The investigative capability the explicit version adds: evidence. A breach investigation may hinge on a log line showing that something tried to reach the CDE at 3 a.m. and was blocked; the implicit deny leaves no such trace.

12. Placement and authority: an IDS sits out of band (it receives a copy of traffic via a SPAN port or tap) and has authority only to alert; an IPS sits in-line (traffic flows through it) and has authority to drop malicious packets in real time. To block a critical, actively exploited vulnerability you need an IPS, because only an in-line device can stop the traffic before it arrives. The operational risk introduced: the IPS is now in the live path, so a false-positive signature can block legitimate traffic, and an IPS outage can break connectivity entirely — which is why only high-confidence signatures are set to block, and noisier ones only alert.

15. Suricata/Snort-style rule (alert):

alert tcp $HOME_NET any -> $EXTERNAL_NET 80 ( \
    msg:"POLICY Outbound HTTP request for /etc/passwd (file disclosure indicator)"; \
    flow:established,to_server; \
    content:"GET"; http_method; \
    content:"/etc/passwd"; http_uri; \
    classtype:policy-violation; sid:9000031; rev:1; )

To convert to an inline block, change alert to drop. You might hesitate because, in-line, a false positive blocks legitimate traffic in the live path, and /etc/passwd could legitimately appear in, say, a documentation request, so blocking risks an outage for a low-confidence indicator. Alert-and-investigate is often the safer posture for a policy/heuristic signature.

17. A signature-only IPS is incomplete because signatures only match known patterns. A zero-day has no signature, so nothing matches and the attack passes invisibly. A realistic low-and-slow scenario: an attacker with stolen valid credentials logs in during business hours, uses only built-in administrative tools (no malware → no signature), and exfiltrates data in small chunks over allowed HTTPS — the signature IPS sees nothing to match for weeks. To cover the gap, add anomaly-based detection (flags deviation from a baseline, catching the novel/abnormal) and correlation across data sources (Chapter 21), so behavior that no single signature describes still surfaces.

19. The three 802.1X roles: supplicant = the connecting device and its software (at Meridian, an employee laptop's network stack); authenticator = the switch or access point controlling the port (the branch access switch); authentication server = the back-end that validates credentials (Meridian's RADIUS service checking Active Directory / Entra ID). The authenticator is a gatekeeper that does not decide identity: it keeps the port closed and relays the authentication exchange to the RADIUS server, while the server performs the actual validation and returns permit/deny. The gatekeeping device therefore never needs to hold the credential database.

21. Certificate-based 802.1X resists spoofing because a certificate requires possession of a private key that is never transmitted; an attacker who observes the exchange cannot reproduce it. MAB (MAC Authentication Bypass) authenticates by MAC address alone, and a MAC is broadcast in plaintext and trivially cloned — the defeating attack is simply copying an allowed device's MAC (e.g., a printer's) to inherit its access. Cloning the public MAC gains nothing under certificate auth without the key. For devices that genuinely cannot run a supplicant (printers, cameras, badge readers), use MAB only inside tightly restricted segments, so that even a successful MAC clone reaches only a small, low-value set of systems.

23. Coarse VLAN segmentation restricts traffic between a few large zones but leaves traffic within a zone unrestricted. Microsegmentation applies access policy between individual workloads (potentially per host), so even systems on the "same" network must be explicitly permitted to communicate. It specifically defeats lateral movement (east-west spread): a foothold can reach only the few systems explicitly permitted, not the whole zone. It is a "project, not a checkbox" because the hard part is not enforcement but knowing which flows are legitimate: you must observe real traffic first, or default-deny between workloads will break production and force broad exceptions that defeat the segmentation.

26. Enabling default-deny between workloads without first mapping real traffic breaks every legitimate flow nobody documented (a reporting job, a backup agent, a monitoring poll), causing outages; teams then either roll back entirely or punch broad holes that defeat the segmentation. The disciplined sequence is observe → derive least-privilege policy from evidence → enforce. The earlier-chapter capability that provides the map is the network flow monitoring of Chapter 10 (NetFlow/Zeek-style visibility), run in observe-only mode to discover what actually talks to what before any rule is enforced.

28. With 1,000,000 events, 100 malicious, 99% detection, 0.1% false-positive rate: - True alerts = 100 × 0.99 = 99. - False alerts = (1,000,000 − 100) × 0.001 ≈ 1,000. - P(alert is real) = 99 / (99 + 1000) ≈ 9.0%. This is the base-rate problem: because real attacks are rare relative to the huge volume of benign traffic, even a tiny false-positive rate produces far more false alarms than true ones, so "99.9% accurate" is meaningless without the base rate. Implication: most alerts are false, analysts cannot chase all of them, and the remedy is aggressive tuning plus correlation (combining signals so the surviving alerts are high-confidence) rather than buying a marginally more "accurate" sensor.
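
The same calculation as a reusable sketch, so other base rates can be tried. The function name is an assumption; the inputs are the figures from the exercise.

```python
def alert_precision(total_events, malicious, detection_rate, false_positive_rate):
    """Return (true_alerts, false_alerts, probability that an alert is real)."""
    true_alerts = malicious * detection_rate
    false_alerts = (total_events - malicious) * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

true_alerts, false_alerts, precision = alert_precision(
    total_events=1_000_000, malicious=100,
    detection_rate=0.99, false_positive_rate=0.001,
)

# Same sensor, ten times as many real attacks: precision rises sharply,
# which shows the result is driven by the base rate, not the sensor.
_, _, precision_10x = alert_precision(1_000_000, 1_000, 0.99, 0.001)
```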

31. The rule that lies. The single permit (10.40.0.10 -> 10.30.0.50:5432) protects the database from direct corporate access, but it does not protect the payment app server itself, which is reachable from the corporate network on its admin port. Most likely path: the tester (1) reaches the payment app server over its exposed admin port from the corporate network, (2) compromises or pivots through it, and (3) from the app server — whose IP is permitted to reach the database on 5432 — queries the CDE database directly. The single rule was a true statement about one path while leaving the app server as an unguarded stepping-stone. Controls that would each break the chain: NAC/802.1X (the tester's access depends on reaching the corporate network; if she used a rogue device, NAC quarantines it); microsegmentation (restrict who may reach the app server's admin port — ideally only the bastion); a bastion host (force all admin access through one monitored chokepoint); IDS/anomaly detection (the unusual corp→app-admin connection and the app→db query pattern would alert). Layered rewrite: default-deny corp→app except the bastion on the admin port; app→db permitted only on 5432; microsegment the app server; alert on any corp host touching the app's admin port; require MFA + session recording at the bastion. The protection becomes a chain of barriers, not a single rule that is true about only one path.


Chapter 8

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design tasks, or discussed in class. Code-backed answers (Ex. 21, 26) are also in code/exercise-solutions.py.

1. Defining cipher/handshake and verdict for each:
- WEP — RC4 with a 24-bit initialization vector and a static key; remove (broken — key recoverable from captured traffic in minutes regardless of passphrase length).
- WPA — RC4 with TKIP per-packet key mixing; avoid/remove (deprecated stopgap on a weak cipher, treat as near-WEP).
- WPA2 — AES-CCMP with a four-way handshake (Personal/PSK mode); use if patched (AES is sound, but PSK is offline-guessable so a long random passphrase is mandatory).
- WPA3 — AES with the SAE (Dragonfly) handshake; use/prefer (resists offline guessing, adds forward secrecy and Enhanced Open, mandates Protected Management Frames).

3. Meridian branch on WPA2-Personal with passphrase MeridianBank. The threat is an offline dictionary/brute-force attack on a captured four-way handshake: an attacker in range captures the handshake passively (or forces a device to reconnect with a deauthentication frame to capture it on demand), then takes the capture away and guesses passphrases offline — with no further contact with the network — at millions of attempts per second on GPU hardware. MeridianBank is a short, guessable, dictionary-derived phrase and would fall almost immediately. Where it happens: capture is on the network (in radio range); the guessing is entirely offline, which is why it is undetectable. The two preventive controls: (1) use a long, random passphrase so offline guessing is infeasible — far better than a memorable phrase; and (2, far stronger) abandon PSK entirely and move to WPA-Enterprise (802.1X), where there is no shared secret to capture and crack at all. The second is strongly preferred for any organization, because it also fixes revocation and accountability.

6. (a)→(iii) pre-shared key = one passphrase shared by all devices; (b)→(iv) EAP = an extensible container that can carry many authentication methods; (c)→(i) 802.1X = a port-based access-control standard with three roles (supplicant/authenticator/authentication server); (d)→(ii) RADIUS = the protocol an authenticator uses to ask a central server to verify a user; (e)→(v) Enhanced Open = encryption for password-free ("open") networks.

8. PEAP credential harvest in the parking lot. (a) The clients were almost certainly configured to skip server-certificate validation. In PEAP the device should verify the RADIUS server's certificate before sending the password into the TLS tunnel; with validation off, the device builds the tunnel to the attacker's evil-twin RADIUS server and sends the password straight to the attacker. (b) The impact is worse than "they got onto our WiFi" because the wireless password is the Active Directory password — so the attacker now holds a corporate credential usable far beyond WiFi: email, VPN, internal apps, remote access. A WiFi foothold became an identity compromise. (c) Strongest fix: move to EAP-TLS (mutual certificate authentication — there is no password to steal even against a rogue server). Acceptable fallback: keep PEAP but rigorously enforce and audit server-certificate validation (pin the expected RADIUS certificate or its issuing CA) via centrally managed device policy, never leaving it to per-device defaults.

10. Three operational advantages of WPA-Enterprise over WPA2-Personal, each tied to a PSK failure: 1. Individual revocation. Under PSK, revoking a departed teller means re-keying every device, so it never happens and former staff keep working credentials for years. Enterprise disables the user in the directory once and they lose wireless instantly. (Bad outcome prevented: ex-employee retains access.) 2. Accountability/attribution. Under PSK the AP sees one shared identity, so logs cannot show who connected. Enterprise authenticates each user as themselves, producing per-user connection logs. (Bad outcome prevented: an incident where you cannot tell which account was on the network.) 3. Per-user segmentation. Under PSK every device lands on the same network. Enterprise lets RADIUS tell the AP which VLAN to place each user on, enforcing segmentation at connect time. (Bad outcome prevented: a guest or low-trust device sharing the teller segment because everyone shares one network.) (Also acceptable: no shared secret exists to be written on a sticky note / captured-and-cracked offline.)

12. Reading the WIDS alert summary. (a) Two attacks: an evil twin (lines at 09:31:50 and 09:32:30 — Meridian-Staff advertised from the unknown radio XX-AB:CD:EF, with clients then associating to it) and a deauthentication attack (09:31:51 — a burst of 847 deauth frames spoofing the authorized AP's source in 10 seconds). The relationship: the deauth attack is the lever — it kicks clients off the legitimate AP so they reconnect, and the louder evil twin catches them; the deauth makes the evil twin reliable. (b) The highest-fidelity evil-twin indicator is 09:31:50 / 09:32:30: a known-corporate SSID (Meridian-Staff) advertised from a BSSID (XX-AB:CD:EF) that is not the authorized AP (AP-00:11:22). The rule that fired: right SSID, wrong hardware (a beacon for a corporate SSID from a BSSID not on the authorized allowlist). (c) 802.11w / Protected Management Frames (PMF) — had it been enabled, the forged deauthentication frames at 09:31:51 would have been rejected as unauthenticated, and the deauth flood would not have worked. (d) First containment: locate and remove the rogue radio (walk it down by signal strength) and/or have the WIPS contain XX-AB:CD:EF, then check for and reset any credentials that may have been harvested by the evil twin.
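
The "right SSID, wrong hardware" rule from part (b) can be sketched in a few lines. The allowlist structure, the function name `is_evil_twin`, and the third sample beacon are assumptions for illustration; the two identifiers are used as opaque strings exactly as they appear in the alert summary.

```python
# Hypothetical allowlist: corporate SSID -> set of authorized BSSIDs.
AUTHORIZED = {"Meridian-Staff": {"AP-00:11:22"}}


def is_evil_twin(ssid, bssid, authorized=AUTHORIZED):
    """True when a corporate SSID is advertised from a BSSID not on its allowlist."""
    return ssid in authorized and bssid not in authorized[ssid]


beacons = [
    ("Meridian-Staff", "AP-00:11:22"),   # the authorized AP
    ("Meridian-Staff", "XX-AB:CD:EF"),   # corporate SSID from an unknown radio
    ("CoffeeShop-Free", "XX-AB:CD:EF"),  # not a corporate SSID, out of scope for this rule
]
alerts = [(ssid, bssid) for ssid, bssid in beacons if is_evil_twin(ssid, bssid)]
print(alerts)
```

The rule is only as good as the allowlist: an authorized AP that was never added to it would alert in the same way.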

14. The employee's consumer AP. (a) They have created a rogue access point — an unauthorized AP bridged onto the internal wired network. Risk: likelihood is moderate-to-high that it exists undetected (well-intentioned staff do this often and nobody reports it) and impact is high (an unauthenticated, probably weakly-secured doorway from radio range — including the parking lot — directly onto the internal network, bypassing the official wireless controls). A defensible score is L3 × I5 = 15, CRITICAL, or L4 × I4 = 16 depending on how reachable the signal is. (b) Specific dangers: it is likely WPA2-Personal with a weak/default passphrase or open; it may have WPS enabled; it provides a path from outside onto the internal LAN that bypasses the WIDS-monitored official APs; and it is unmanaged and unpatched. (c) Detective: a WIDS that correlates an unknown radio to the wired network (rogue-AP detection — §8.4). Preventive: disable unused network jacks / enforce port-based NAC (802.1X on the wire — Chapter 7) so an unauthorized device cannot get a connection even if plugged in; plus awareness training so staff request coverage fixes instead of improvising (Chapter 30).

17. Bluetooth card readers in the cardholder-data path. Three proportionate controls: (1) patch the readers' Bluetooth stacks and firmware (BlueBorne-class bugs are patchable and these devices process sensitive input); (2) inventory them as the small computers they are and monitor for tampering; (3) use modern secure pairing (Bluetooth "Secure Connections") and disable discoverability when not pairing. It is specifically a PCI-DSS concern, not merely IT, because the reader sits in the cardholder-data path: a compromised reader could capture or manipulate payment-card data at the point of entry, which falls directly under PCI-DSS's requirements to protect cardholder data and maintain secure systems — so a Bluetooth weakness here is a compliance finding with potential fines and fraud liability, not just an IT hygiene issue.

19. Example wireless security policy snippet (auditable bullets):
- Permitted/prohibited protocols: WPA3 is required on all new and upgradeable wireless; WPA2-AES is permitted only where hardware cannot support WPA3, with a documented replacement date. WEP and WPA(TKIP) are prohibited and are treated as critical findings wherever discovered.
- Staff authentication: Staff wireless must use WPA-Enterprise (802.1X) with EAP-TLS preferred; PEAP/EAP-TTLS are permitted only with enforced, audited server-certificate validation. No shared passphrase may be used for any network that reaches internal systems.
- Guest isolation: Guest wireless must be internet-only and fully isolated from all internal segments by default-deny firewall rules; client isolation must be enabled.
- Operational/IoT segmentation: Printers, cameras, signage, and other operational devices must be on a dedicated segment that can reach neither the internet nor the staff segment.
- Management-frame protection: 802.11w (Protected Management Frames) must be enforced on all wireless.
- Monitoring: A WIDS must monitor all sites for rogue access points and SSID impersonation (evil twins); alerts are triaged per the wireless runbook.
- Rogue APs: Unauthorized access points are prohibited and removed on detection; unused network jacks are disabled.

22. Guest-segment firewall rules (default-deny). The guest VLAN policy, complete:

ALLOW  guest -> internet (direct, content-filtered)        # the only thing guests may reach
DENY   guest -> staff VLAN                                 # protects teller workstations / banking apps
DENY   guest -> ops/IoT VLAN                               # no path to operational devices
DENY   guest -> any internal subnet / management network   # no internal access of any kind
DENY   guest -> guest  (client isolation)                  # guests cannot attack each other
# DEFAULT: deny anything not explicitly allowed above.
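
The default-deny logic of the rules above can be sketched as a tiny evaluator. This is an illustration only: the zone names and the `is_allowed` function are hypothetical, and a real firewall matches on addresses, ports, and rule order rather than zone labels.

```python
# Hypothetical zone names; the only explicit permit is guest -> internet.
ALLOW_RULES = {("guest", "internet")}


def is_allowed(src_zone, dst_zone, allow_rules=ALLOW_RULES):
    """Default-deny: a flow passes only if an explicit allow rule matches it."""
    return (src_zone, dst_zone) in allow_rules


for dst in ["internet", "staff", "ops_iot", "management", "guest"]:
    verdict = "ALLOW" if is_allowed("guest", dst) else "DENY"
    print(f"guest -> {dst}: {verdict}")

# Adding one extra allow rule changes the verdict for guest -> staff.
weakened = ALLOW_RULES | {("guest", "staff")}
print(is_allowed("guest", "staff", weakened))
```

Because nothing is permitted unless listed, the explicit DENY lines in the policy serve as documentation of intent; the default does the enforcing.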

"Internet only, deny all internal" is the entire security of guest WiFi because guest WiFi makes no attempt to authenticate or trust the devices on it — anyone, including an attacker or an evil twin impersonating the guest network, may be on it. Therefore the only thing that keeps guest WiFi from being a doorway inside is that the segment cannot reach anything internal. If the isolation is intact, a fully compromised guest network reaches only the public internet (which it could reach anyway); if a single allow-rule toward an internal destination is added "for convenience," the entire security model collapses because there is now a trusted path from an untrusted network. The firewall rules are the control; the "guest" label guarantees nothing.

23. The parking-lot mystery — reconstruct the chain. Most likely sequence: 1. Day 1, disconnects = a deauthentication attack. The attacker sent forged deauth frames, knocking staff tablets off WiFi repeatedly. Control that should have stopped it: 802.11w / Protected Management Frames (authenticates management frames so deauth cannot be forged). 2. Day 1–2, an evil twin. The deauth pushed devices to reconnect, and the attacker's access point advertised the same SSID with a stronger signal, so devices/users connected to it. Control: a WIDS (detects a corporate SSID from an unauthorized BSSID), plus WPA3/EAP-TLS (removes the password the attacker is after and makes luring far less rewarding). 3. Day 2, the cloned login page captures the password. The teller, reconnecting, saw a familiar-looking login page served by the evil twin and entered her credentials, which the attacker harvested. Control: phishing-resistant authentication / EAP-TLS or certificate-/MFA-backed login (no password to phish), and enforced server-certificate validation if PEAP is in use. 4. Day 3, the credential is used from elsewhere. The attacker logged in with the stolen credential. The one control that limits the damage even if everything else fails: segmentation with default-deny between segments — if the teller's wireless/credential reaches only an isolated segment and not the systems that matter, the harvested credential buys the attacker little. Telemetry that catches it earliest: a WIDS would have caught the deauth flood on Day 1 (the leading indicator) and the evil-twin beacon — the earliest possible detection; failing that, authentication logs showing a credential used from an impossible location/time (as in Case Study 2) catch the pivot on Day 3.

29. Why capturing a WPA2-Personal handshake can be neither detected nor stopped: the handshake is exchanged over the air in radio range, and an attacker only has to passively listen — they transmit nothing during capture, run no software against your equipment, and leave no entry in any log you control. (They can also force a handshake with a deauth, but they do not even need to; they can simply wait for a device to join.) Because the capture is undetectable and the subsequent guessing happens offline on the attacker's own hardware, there is nothing for a defender to detect or respond to — which forces the defense entirely into prevention: make the passphrase long and random so offline guessing is infeasible, or (better) eliminate the shared secret with WPA-Enterprise. This illustrates Chapter 1's theme that defenders must be right every time while attackers need only one success (Theme 2), and the related principle that when you cannot detect, you must prevent — you get no second chance after a silent capture.


Chapter 9

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. DNS poisoning — inserting a fraudulent record into a resolver's cache so later lookups return an attacker-chosen address. DNS tunneling — encoding non-DNS data inside DNS queries/responses (in subdomain labels) to smuggle traffic through a network that permits port 53. DNS exfiltration — using that tunnel specifically to steal data, dribbling it out in query labels. DNS sinkhole — a resolver policy that returns a controlled, safe answer for known-malicious domains, redirecting hosts away from the attacker. Distinguishing sentence: tunneling is the channel (any data smuggled over DNS, including command-and-control), while exfiltration is the data-theft use of that channel.

4. Classic DNS answers carry no cryptographic proof of origin and (historically) traveled over UDP with only a 16-bit transaction ID and source port to match a response to a query. An attacker who can inject a forged response that matches those fields, arriving before the legitimate answer, has it accepted as genuine — there is no signature to check. Caching then multiplies the damage: the resolver stores the forged record for its TTL and serves it to every client that queries during that window, so a single successful forgery poisons many victims until the TTL expires. (DNSSEC closes the first gap by making answers verifiable; source-port randomization, added after Kaminsky, narrowed the second.)

7. (a) v=spf1 ip4:203.0.113.10 ~all → WEAK: the ~all soft-fail tells receivers to accept but mark unlisted senders, doing little to stop spoofing. Fix: change to -all (hard fail) once you are confident your senders are listed. (b) v=DMARC1; p=none; ... → MONITOR-ONLY / no protection: p=none takes no action, so spoofing of the domain still succeeds. Fix: complete the listening phase, fix your own failing senders, then advance to p=quarantine and p=reject. (c) v=spf1 include:_spf.cloud-mail.example ?all → DANGER (decorative): ?all is neutral and instructs receivers to do nothing about unlisted senders, so the record provides no enforcement at all. Fix: replace ?all with -all.

10. Meridian DNS hardening plan: (1) Sign external zones with DNSSEC so other resolvers can validate Meridian's records and reject forged answers for the bank's domains. (2) Enable DNSSEC validation on internal resolvers so Meridian rejects forged answers for domains it queries. (3) Deploy a threat-intel-fed DNS sinkhole to block known-malicious domains at resolution and surface infected hosts. (4) Ship resolver query logs to the SIEM to enable tunneling/DGA/known-bad detections (§9.6). (5) Note the gap DNSSEC does not close — it does not stop typosquatting (a look-alike is a legitimately registered domain) or provide confidentiality; so user-awareness/look-alike detection (and, for privacy, DoH/DoT) remain necessary as separate controls.

12. (a) Indicators: the From: domain is a look-alike (meridi1anbank.example, with a digit 1 inserted into the name); the Reply-To is an unrelated free-mail address (cfo-urgent@gmail.example) that differs from the From; financial urgency in the subject ("URGENT wire — confidential, before 4pm"); and SPF fails for the sending IP. The display name ("Dana Okafor, CISO") impersonates an executive while the address does not match the real domain. (b) The From: field shows the display-name-vs-address mismatch: a trusted human name wrapped around an untrusted address. (c) Meridian's SPF/DKIM/DMARC on the real meridianbank.example would not block this message because it was sent from a different domain (meridi1anbank.example) that Meridian does not control — Meridian can only publish authentication records for domains it owns. The controls that would help: the secure email gateway flagging the newly registered look-alike domain's poor reputation and the display-name impersonation, and look-alike / edit-distance detection (§9.6) surfacing the cousin domain.

14. First six response steps (report-phish workflow): (1) Preserve the reported message and pull its full headers, sender, and URL/IOC. (2) Search mail logs / the SEG for every recipient of the same message (by sender, subject, URL) — find all 40. (3) Pull/quarantine the message from all 40 inboxes before more clicks occur. (4) Identify the 2 who clicked (proxy/DNS logs for the malicious URL) and check for credential submission. (5) Contain the 2 affected accounts — force password reset, revoke sessions, and (if MFA exists) confirm it held; isolate their endpoints if malware is suspected. (6) Block the IOCs — add the sender, URL, and any look-alike domain to the gateway blocklist and the DNS sinkhole, and open an incident ticket for tracking. (Then: notify affected users, watch for follow-on logins, and feed the lure to the awareness program.)

17. Enforcing record:

v=DMARC1; p=reject; rua=mailto:dmarc-reports@meridianbank.example; pct=100; adkim=s; aspf=s

Monitoring-phase record (publish first):

v=DMARC1; p=none; rua=mailto:dmarc-reports@meridianbank.example; pct=100

Difference: p=none takes no action on failing mail — it only collects aggregate reports so you can discover and fix your own legitimate senders before enforcing. p=reject (with strict alignment adkim=s / aspf=s) causes receivers to bounce mail that fails authentication and alignment for the domain. You move from the first to the second only after the listening phase confirms your own streams pass.
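
A short sketch shows how the two records differ when read mechanically. The helper names `parse_dmarc` and `dmarc_posture` are hypothetical, and the parser handles only the simple `tag=value; tag=value` shape used in this answer, not every corner of the DMARC syntax.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_posture(record):
    """Classify a record by its p= tag: monitoring only, or enforcing."""
    policy = parse_dmarc(record).get("p", "").lower()
    if policy in ("quarantine", "reject"):
        return "enforcing"
    if policy == "none":
        return "monitoring"
    return "invalid"


monitoring = "v=DMARC1; p=none; rua=mailto:dmarc-reports@meridianbank.example; pct=100"
enforcing = ("v=DMARC1; p=reject; rua=mailto:dmarc-reports@meridianbank.example; "
             "pct=100; adkim=s; aspf=s")

for record in (monitoring, enforcing):
    print(dmarc_posture(record), parse_dmarc(record))
```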

19. Alignment requires that the domain authenticated by SPF or DKIM match the domain in the visible From: header (strict = exact match; relaxed = same organizational domain). Example where SPF passes but DMARC fails: an attacker sends from a server they fully control under attacker.example, with an envelope sender of bounce@attacker.example — SPF passes for attacker.example because the sending IP is listed in that domain's SPF — but they set the visible From: ceo@meridianbank.example. DMARC checks alignment, sees the SPF-authenticated domain (attacker.example) does not match the From domain (meridianbank.example), and fails, applying Meridian's policy (reject). This is exactly the desired outcome: SPF passing for the attacker's own domain must not let them wear Meridian's From address.
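
The alignment check can be sketched as follows. Two simplifications are assumed and should be read as such: the organizational domain is approximated as the last two labels (real implementations consult the Public Suffix List), and one `strict` flag stands in for the separate adkim and aspf settings. All function names are hypothetical.

```python
def org_domain(domain):
    """Simplification: treat the last two labels as the organizational domain."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(authenticated_domain, from_domain, strict=False):
    """Strict = exact match; relaxed = same organizational domain."""
    a = authenticated_domain.lower().rstrip(".")
    f = from_domain.lower().rstrip(".")
    if strict:
        return a == f
    return org_domain(a) == org_domain(f)


def dmarc_passes(spf_pass, spf_domain, dkim_pass, dkim_domain, from_domain, strict=False):
    """DMARC passes if SPF or DKIM both passes AND aligns with the From domain."""
    spf_ok = spf_pass and aligned(spf_domain, from_domain, strict)
    dkim_ok = dkim_pass and aligned(dkim_domain, from_domain, strict)
    return bool(spf_ok or dkim_ok)


# The attacker's case: SPF passes for attacker.example, From is meridianbank.example.
spoof = dmarc_passes(True, "attacker.example", False, None, "meridianbank.example")
# A legitimate stream sent from a subdomain, under relaxed and strict alignment.
relaxed = dmarc_passes(True, "mail.meridianbank.example", False, None, "meridianbank.example")
strict = dmarc_passes(True, "mail.meridianbank.example", False, None,
                      "meridianbank.example", strict=True)
print(spoof, relaxed, strict)
```

The last case is why the listening phase matters before publishing strict alignment: a legitimate sender using a subdomain would start failing.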

22. (a) Missing security headers from the baseline: Strict-Transport-Security (HSTS), Content-Security-Policy, X-Content-Type-Options, X-Frame-Options, Referrer-Policy. (b) Missing cookie attributes: Secure (mitigates cookie theft over plain HTTP), HttpOnly (mitigates cookie theft via XSS — JavaScript cannot read it), SameSite=Strict or Lax (mitigates CSRF). (c) Corrected set:

Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Referrer-Policy: strict-origin-when-cross-origin
Set-Cookie: session=abc123; Path=/; Secure; HttpOnly; SameSite=Strict

25. Secure — the cookie is only ever sent over HTTPS, so it cannot be sniffed on an unencrypted connection (defends against interception on a downgraded/plain-HTTP link). HttpOnly — the cookie is invisible to JavaScript, so an injected script cannot read and exfiltrate it (defends against session theft via cross-site scripting). SameSite=Strict — the browser will not attach the cookie to cross-site requests, so a malicious site cannot ride the user's session (defends against cross-site request forgery, CSRF). The one that defends a session cookie against theft via an XSS flaw is HttpOnly.
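
A minimal sketch of auditing a Set-Cookie value for these three attributes follows. The function name is hypothetical, and the check looks only for the presence of each attribute; it does not judge the SameSite value.

```python
def missing_cookie_attributes(set_cookie_value):
    """Return which of Secure, HttpOnly, SameSite are absent from a Set-Cookie value."""
    # Everything after the first ';' is an attribute; names are case-insensitive.
    attributes = set_cookie_value.split(";")[1:]
    present = {attr.strip().split("=", 1)[0].lower() for attr in attributes}
    return [name for name in ("Secure", "HttpOnly", "SameSite")
            if name.lower() not in present]


weak = missing_cookie_attributes("session=abc123; Path=/")
hardened = missing_cookie_attributes(
    "session=abc123; Path=/; Secure; HttpOnly; SameSite=Strict")
print(weak)
print(hardened)
```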

27. Host 10.0.0.7 shows DGA behavior: three queries to random-looking domains, all returning NXDOMAIN (the malware is cycling through unregistered candidate domains looking for the one the attacker registered). Host 10.0.0.9 shows tunneling: queries to long, high-entropy subdomain labels under a single parent (exfil.example), all returning NOERROR (the domain resolves, to the attacker's authoritative server, and is carrying data). The single distinguishing field is the response code (rcode): a DGA produces a burst of NXDOMAIN (failed lookups while searching), whereas a tunnel produces successful NOERROR responses (an established channel in use).
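
The distinction can be sketched as a rough per-host classifier. Everything here is an assumption for illustration: the function name, the thresholds (`min_queries`, `min_label_len`), and the sample query names, which are invented stand-ins rather than the exercise's log lines.

```python
def classify_dns_host(queries, min_queries=3, min_label_len=20):
    """queries: list of (qname, rcode) for one host.
    Returns 'dga-like', 'tunnel-like', or 'unclear'."""
    if len(queries) < min_queries:
        return "unclear"
    rcodes = {rcode for _, rcode in queries}
    if rcodes == {"NXDOMAIN"}:
        return "dga-like"        # a run of failed lookups: searching for a live domain
    if rcodes == {"NOERROR"}:
        parents = {".".join(name.split(".")[-2:]) for name, _ in queries}
        long_labels = all(len(name.split(".")[0]) >= min_label_len for name, _ in queries)
        if len(parents) == 1 and long_labels:
            return "tunnel-like"  # long labels under one parent that resolves
    return "unclear"


host_7 = [("xkqzjvplw.example", "NXDOMAIN"),
          ("bnmrtyqsd.example", "NXDOMAIN"),
          ("plokmijnu.example", "NXDOMAIN")]
host_9 = [("4a6f686e20446f6520313233343536.exfil.example", "NOERROR"),
          ("37383920616363743a20393837363534.exfil.example", "NOERROR"),
          ("2d2d656e642d6f662d66696c652d2d.exfil.example", "NOERROR")]

print(classify_dns_host(host_7))
print(classify_dns_host(host_9))
```

A production detection would add rates and entropy measures; the sketch keeps only the response-code split that the answer identifies as the deciding field.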

29. BEC detection without malware/links relies on correlating weak signals, none decisive alone: (1) sender domain newly registered or a look-alike of yours or a partner's; (2) display name matches an executive/known contact while the email address does not; (3) Reply-To differs from From; (4) financial-urgency language ("wire," "urgent," "confidential," "change of bank details," "gift cards"); (5) first-time sender to this recipient or unusual sender/recipient pair; (6) mention of a change to payment details or an out-of-cycle large payment. Correlation is required because each signal in isolation is common and benign (plenty of legitimate mail is urgent), but their combination — a new-domain sender, impersonating an executive by display name, with a reply-to mismatch and a payment-change request — is high-confidence malice, and there is no payload to trigger a content-based rule. The SIEM (Chapter 21) stitches the weak signals into one strong alert.

31. (a) Look-alike domains and their techniques: meridian-bank.example (hyphen insertion); meridi1anbank.example (digit insertion, a 1 added after the second i, edit distance 1); meridianbank.example.co (TLD/subdomain swap — the real name is pushed into a subdomain of an attacker-controlled .co registration, or a different TLD); mer1d1anbank.example (multiple digit substitutions, i→1 twice). (b) Automated detection: compute edit (Levenshtein) distance of each inbound sender domain's label against your brand and known-partner brands, flagging distance 1–2 as a likely cousin; additionally screen for homoglyphs (look-alike Unicode characters) and TLD variants of your exact brand string. The code/exercise-solutions.py looks_like() function demonstrates the edit-distance approach. (c) For the two that are not look-alikes — meridianbank.example (your real domain) and title-co.example (an apparently unrelated third party) — decide safety by other means: the real domain should authenticate under your own DMARC; the third party should be evaluated on domain age/reputation, whether it authenticates, whether you have an established relationship with it, and (for any money/data request) the out-of-band verification process from Case Study 2.
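
The edit-distance screen in part (b) can be sketched as below. This is not the looks_like() function from code/exercise-solutions.py; the names `levenshtein` and `is_cousin` are hypothetical, and comparing only the first label of each domain is a simplifying assumption.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def is_cousin(domain, real_domain="meridianbank.example", max_distance=2):
    """Flag a domain whose first label is within max_distance edits of the brand.
    Distance 0 on a different domain means the exact brand under another suffix."""
    if domain == real_domain:
        return False
    brand = real_domain.split(".")[0]
    return levenshtein(domain.split(".")[0], brand) <= max_distance


candidates = ["meridian-bank.example", "meridi1anbank.example",
              "meridianbank.example.co", "mer1d1anbank.example",
              "meridianbank.example", "title-co.example"]
for domain in candidates:
    distance = levenshtein(domain.split(".")[0], "meridianbank")
    print(f"{domain:28} distance={distance} cousin={is_cousin(domain)}")
```

Homoglyph screening would need a separate character-mapping step before the distance is computed, which this sketch leaves out.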


Chapter 10

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class. Runnable helpers for the calculation problems are in code/exercise-solutions.py (hand-traced).

1. Packet capture (PCAP) — recording network traffic byte-for-byte exactly as it crosses an interface (and the .pcap file that stores it). Zeek — a network security monitoring platform that turns live traffic into structured logs (one connection-log entry per connection, plus protocol logs) instead of storing raw packets. NetFlow/IPFIX — protocols that emit compact per-conversation flow records (5-tuple + byte/packet counts + time, no payload). NDR (network detection and response) — the operational discipline of continuously monitoring network telemetry to detect, investigate, and respond to threats. In relation: full PCAP, Zeek, and flow are three altitudes of the visibility that NDR operationalizes — PCAP keeps every byte briefly, flow keeps a tiny summary of everything for a long time, and Zeek sits in between; NDR is the practice of consuming all three to find adversaries.

4. A network baseline is a model of what normal traffic looks like for an environment — which hosts talk to which, on what ports, in what volumes, at what times. Almost every network detection is a comparison against it because attacker behavior is recognized as deviation from normal (a beacon is "more regular than normal," exfiltration is "more outbound than normal," lateral movement is "more internal destinations than normal"); without a reference for "normal," every observation is just a number with no meaning. Server baseline question: does this server normally make any outbound internet connections at all? (a domain controller that suddenly does is alarming). Workstation baseline question: how much does this workstation normally send outbound per day? (so a large multiple stands out).

7. Raw packet captures are far too voluminous to ingest and retain in a SIEM at scale — a busy link produces terabytes per day of full capture, which would overwhelm storage and indexing. So the heavy ground-level data stays behind in the capture appliance (often a rolling short-retention buffer) and is retrieved on demand only when a specific investigation needs the actual bytes. What flows into the SIEM is the lightweight, structured telemetry — Zeek logs and flow records — plus the alerts/notices sensors generate; these are small enough to retain and, crucially, to correlate with logs from other sources.

8. (a) Command-and-control beaconing: one internal host (10.20.4.55) checking in with one external destination (192.0.2.80:443) on a near-constant hourly interval. (b) The signal is in two fields: the connection start times — almost exactly one hour apart (02:00, 03:00, 04:00, 05:00, 06:00) — and the byte counts, which are near-constant (~2,190 bytes) across every flow; humans browse irregularly and with varying sizes, automation does not. (c) It evaded the endpoint agent because each connection was an ordinary browser-like TLS session, and the firewall because it was allowed outbound port 443 to a destination with no bad reputation — no single event was anomalous, only the weeks-long pattern. (d) beacon_score over the grouped connections (flagging the low variance of inter-arrival times) would catch it; so would correlation/alerting in the SIEM, and ultimately blocking the destination once confirmed.

10. (a) The byte asymmetry — 7.8 GB sent versus 142 MB received — is the classic shape of exfiltration: a host pushing data out, not pulling content in (a normal web session receives far more than it sends). (b) 7.8 GB against a ~150 MB/day baseline is roughly 52× the host's normal daily outbound (7,800 / 150 ≈ 52), an enormous anomaly. (c) It is still detectable because encryption hides the payload, not the metadata — the volume, direction, destination, and timing are all visible regardless of TLS, and a clean reputation does not change the fact that this host has never behaved this way. (d) top_talkers surfaces it: ranking (src, dst) pairs by total bytes puts this conversation at the top as the outbound-volume first-look.
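
A minimal sketch of the ranking that top_talkers performs is shown below. It is not necessarily the chapter's implementation, and the flow tuples (including every address) are invented sample data shaped to match the 7.8 GB figure.

```python
from collections import defaultdict


def top_talkers(flows, n=3):
    """Rank (src, dst) pairs by total bytes sent, largest first."""
    totals = defaultdict(int)
    for src, dst, bytes_sent in flows:
        totals[(src, dst)] += bytes_sent
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical flow records: (source, destination, bytes sent).
flows = [
    ("10.20.4.77", "198.51.100.7", 2_600_000_000),
    ("10.20.4.77", "198.51.100.7", 2_600_000_000),
    ("10.20.4.77", "198.51.100.7", 2_600_000_000),
    ("10.20.4.12", "203.0.113.20", 120_000_000),
    ("10.20.4.31", "203.0.113.20", 95_000_000),
]

ranking = top_talkers(flows)
for (src, dst), total in ranking:
    print(f"{src} -> {dst}: {total / 1e6:,.0f} MB")

baseline_mb = 150                      # the host's normal daily outbound
ratio = (ranking[0][1] / 1e6) / baseline_mb
print(f"top pair is {ratio:.0f}x the daily baseline")
```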

12. (a) Host A is far more likely a beacon. The deciding statistic is the variance (or coefficient of variation) of the inter-arrival times: A's gaps [3600, 3600, 3601, 3599, 3600] have near-zero spread (a high beacon_score), while B's gaps [3200, 4100, 2900, 4500, 3300] vary widely (a low score). (b) The average does not settle it because both average ~3,600 seconds — the mean alone cannot distinguish "metronomic" from "irregular but centered on an hour"; only the spread of the gaps reveals regularity. (c) In Host B's case the attacker is adding jitter — deliberately randomizing the check-in interval to defeat simple regularity detection. The retention choice that helps catch even a heavily jittered beacon is long flow-data retention: over weeks or months, even a jittered beacon's central tendency and persistence to the same destination become statistically obvious, whereas a short window shows too few check-ins to separate jitter from noise.

14. Four reasons "full PCAP on every link, one year, into the SIEM" fails: (1) Storage — full capture on busy links is terabytes per day; a year network-wide is petabytes, financially and physically infeasible. (2) SIEM ingestion — raw packets are far too voluminous to index and correlate centrally; the SIEM would drown. (3) Lossy in practice — teams that attempt this quietly drop packets, producing incomplete captures that betray them during an investigation. (4) Wrong tool for most detection — beaconing, exfiltration, and lateral movement are detected by aggregation (flow/Zeek), not by reading payloads, so full capture everywhere is unnecessary as well as unaffordable. Rewrite: flow data (NetFlow/IPFIX) everywhere, retained 12–13 months (the cheap, wide, long census); Zeek logs from sensors on high-value and east-west links, retained ~90 days (the rich detection layer); targeted full packet capture as a short rolling buffer (e.g., 72 hours) only on the most sensitive links (the internet edge and the CDE), pulled on demand; only the structured Zeek/flow telemetry and alerts forwarded to the SIEM, with raw PCAP kept local. This matches detail to cost and ships only what the SIEM can use.

19. Triage steps for a beacon_score of 0.97, ~200 daily check-ins to 192.0.2.80:443 over three weeks: (1) Flow data — confirm the cadence and persistence across the full three weeks and check the per-flow byte counts (tiny + constant supports beaconing; large outbound would add exfiltration concern). (2) Zeek conn.log — verify conn_state (stable SF channels), durations, and that the pairing is consistent, and look for other internal hosts beaconing to the same destination (corroboration). (3) Zeek ssl.log — pull the destination's certificate and SNI; a freshly registered domain is a strong C2 tell, and the metadata is visible despite encryption. (4) Zeek dns.log — see how the host resolved the destination and when (does it line up with the beacon's start?), and rule out DNS itself being a covert channel (query volume/entropy). (5) Targeted full PCAP — if still live, capture the channel for deeper inspection and evidence, but expect the payload to be encrypted. Handoff to incident response (Chapter 24): once the beacon is confirmed and the host is judged compromised, this becomes an incident — preserve, contain the host(s), and let IR drive eradication and (with endpoint forensics, Chapter 25) identify the implant; network monitoring has done its job by scoping what, where, how long, and whether bulk data left.

22. Gaps [600, 600, 590, 610, 600]. Mean = (600+600+590+610+600)/5 = 3000/5 = 600.0. Deviations from the mean: 0, 0, −10, +10, 0 → squares: 0, 0, 100, 100, 0 → sum = 200. Population variance = 200/5 = 40.0; standard deviation = √40 ≈ 6.325. Coefficient of variation = 6.325/600 ≈ 0.01054. beacon_score = 1 − CV = 1 − 0.01054 = 0.98946 → rounded to 3 places = 0.989. Yes, this is strongly beacon-like: a score of ~0.99 means the check-ins are almost perfectly regular (gaps of ~10 minutes with only a ±10-second wobble), well above a typical 0.9 threshold.
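
The arithmetic above can be checked with a short sketch. It assumes the definition used in this answer (beacon_score = 1 − CV, computed with the population standard deviation); the function name, the input checks, and the clamp at zero are illustrative choices, not the chapter's reference implementation.

```python
from statistics import mean, pstdev


def beacon_score(gaps):
    """Return 1 - CV for a list of inter-arrival gaps in seconds."""
    if len(gaps) < 2:
        raise ValueError("need at least two gaps")
    avg = mean(gaps)
    if avg <= 0:
        raise ValueError("gaps must have a positive mean")
    cv = pstdev(gaps) / avg          # population std dev divided by the mean
    return max(0.0, 1.0 - cv)        # clamp so very irregular traffic scores 0


gaps = [600, 600, 590, 610, 600]
print(round(beacon_score(gaps), 3))  # 0.989
```

Using the sample standard deviation (dividing by n − 1) would give a slightly lower score, so a detection threshold should be tuned against whichever variant is actually deployed.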

25. (a) The host is both beaconing and exfiltrating, slowly. It contacts one external IP once per day at ~midnight (a very regular, low-frequency beacon, so a high beacon_score on the daily timestamps), and each daily transfer is ~40 MB, which over six days accumulates to ~240 MB. No single day approaches the "5 GB/hour" volume alarm, so the per-interval threshold never trips. It is low and slow. (b) Two detection strategies catch it together: a timing detection (beacon_score on the daily check-in times flags the suspicious regularity even at once-per-day frequency) and a cumulative-volume detection (summing outbound bytes per destination over a multi-day window flags ~240 MB to one non-standard destination, which the per-hour alarm misses). (c) Retaining flow data for 13 months rather than 7 days is decisive because a once-per-day beacon produces only one connection per day. In a 7-day window you see seven points, too few to establish regularity or a meaningful cumulative total, whereas over months you see the same midnight check-in repeat dozens of times to the same destination, making both the rhythm and the cumulative volume unmistakable. (d) Example finding: "Host X has beaconed to external IP Y at ~daily intervals for [N] days and cumulatively sent ~[total] to a non-standard destination, consistent with low-and-slow exfiltration; recommend containment and endpoint forensics."
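
The cumulative-volume detection in (b) can be sketched as follows. The flow record fields, the IP addresses (documentation ranges), and the 200 MB window threshold are all assumptions made for illustration; only the 40 MB/day and 5 GB/hour figures come from the exercise.

```python
from collections import defaultdict

MB = 1_000_000
GB = 1_000 * MB


def cumulative_outbound(flows):
    """Sum outbound bytes per (source, destination) pair across the window."""
    totals = defaultdict(int)
    for flow in flows:
        totals[(flow["src"], flow["dst"])] += flow["bytes_out"]
    return dict(totals)


def flag_slow_exfil(flows, window_threshold):
    """Return the pairs whose multi-day total meets the (assumed) threshold."""
    totals = cumulative_outbound(flows)
    return {pair: total for pair, total in totals.items()
            if total >= window_threshold}


# Six daily transfers of ~40 MB each from one host to one destination.
flows = [
    {"day": day, "src": "10.0.0.5", "dst": "198.51.100.7", "bytes_out": 40 * MB}
    for day in range(1, 7)
]

hourly_alarm = any(f["bytes_out"] >= 5 * GB for f in flows)   # per-interval check
flagged = flag_slow_exfil(flows, window_threshold=200 * MB)   # multi-day check
```

Here hourly_alarm stays False while flagged contains the single pair with a 240 MB total, which is the gap between the two detection styles that the answer describes.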


Chapter 11 — Answers to Selected Exercises

Full worked solutions to the daggered (†) exercises from exercises.md. Reasoning matters more than exact wording; where a problem is design-open, a strong model answer and a rubric are given.


1.† (Vocabulary) Define hardening, attack surface reduction, baseline configuration, least functionality; then one sentence using all four.

  • Hardening: deliberately configuring a system to reduce its attack surface and raise the cost of compromise (remove/disable unneeded software, services, and features; tighten accounts; enable controls; log).
  • Attack surface reduction: systematically eliminating exposed functionality (services, ports, accounts, interpreters, protocols) so there is less for an attacker to reach or abuse — the organizing goal of hardening.
  • Baseline configuration: the documented, approved set of settings for a system type (usually derived from a CIS Benchmark and adjusted), built and audited against.
  • Least functionality: running only the software and services a host's role requires.

One-sentence use: "We hardened the web servers by enforcing a baseline configuration derived from the CIS Benchmark, applying least functionality (removing every service the web role didn't need) as our main lever for attack surface reduction."


4.† (CIS levels) A CIS Benchmark is a consensus, platform-specific secure-configuration standard. Level 1 = settings that improve security with minimal impact on functionality (suitable for most systems); Level 2 = stricter settings for high-security environments that may break functionality and require testing. Deciding factor (any reasonable one): the sensitivity/exposure of the system's role — e.g., a host in the cardholder-data environment warrants Level 2 (accept operational cost for stricter settings), while a general-purpose workstation takes Level 1 so functionality isn't broken. The level choice is itself a risk decision (likelihood × impact applied to a configuration knob).


5.† (Secure Boot / TPM) Secure Boot verifies the digital signature of each component in the boot chain (firmware → bootloader → kernel) and refuses to load anything not validly signed — it stops a bootkit/rootkit from running before the OS and its defenses start. A TPM adds a hardware root of trust: it stores keys and boot measurements securely, can seal disk-encryption keys so the disk only decrypts if the machine boots an unmodified configuration, and supports hardware-backed attestation that a machine booted into a known-good state. A pre-OS control is valuable on even an otherwise hardened machine because every OS-level control (EDR, MAC, logging, allowlisting) can be undercut if an attacker runs code before the OS loads; Secure Boot + TPM close that gap beneath everything else.


8.† (Harden this Windows server) Priority order with the technique each removes/records:

  1. Deploy LAPS / disable the shared local Administrator: removes the shared-credential lateral-movement path (one recovered password opening the whole fleet). Highest priority: it converts a fleet-wide compromise into a single-host incident.
  2. Disable SMBv1 (via Group Policy): removes the deprecated lateral-movement/worm protocol (WannaCry/NotPetya class).
  3. Set Defender tamper protection ON (and enable Defender + key ASR rules): stops an attacker from silencing the endpoint defenses before acting.
  4. Enable PowerShell script-block logging (4104) + process-creation auditing (4688 w/ command line) / Sysmon: adds the telemetry that turns an invisible intrusion into a reconstructable timeline (detection layer).
  5. Disable unused roles/services (Print Spooler) and set PowerShell to AllSigned / Constrained Language: reduces attack surface and constrains living-off-the-land abuse.

(Set the host firewall to default-deny as well; the order above leads with the controls that map to the breach's worst facts.)


9. (Harden this Linux host) App needs 80/443; admins need 22. Actions:

  • Services: keep 80/443 (web) and 22 (SSH). Mask/remove the MTA on 25 and rpcbind on 111; they are not required by the web role (remove the package where possible; removal beats disabling).
  • SSH (sshd_config): PermitRootLogin no (a stolen root password is useless remotely), PasswordAuthentication no → keys only (defeats the password spraying that hammers internet-facing SSH), MaxAuthTries 3 (slows brute force), AllowGroups ssh-users (only that group may even attempt login).
  • Validate with sshd -t before reloading; keep file-system hygiene (audit setuid bins, no stray world-writable files) and ensure SELinux/AppArmor is enforcing and auditd is shipping logs off-box.
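
The four SSH settings above can be spot-checked with a small sketch like the one below. It is a simplified reader, not a full sshd_config parser: it ignores Match blocks and Include files, treats everything after a "#" as a comment, and keeps the first occurrence of each keyword. The function names and the EXPECTED table are hypothetical; on a real host, the effective configuration printed by sshd -T is the more reliable input.

```python
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "maxauthtries": "3",
    "allowgroups": "ssh-users",
}


def parse_sshd_config(text):
    """Return {keyword: value}, lowercasing keywords (they are case-insensitive)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()     # simplification: drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Keep the first value seen for each keyword.
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def audit(text):
    """Return settings that differ from EXPECTED (None means not set explicitly)."""
    found = parse_sshd_config(text)
    return {key: found.get(key) for key, want in EXPECTED.items()
            if found.get(key) != want}


sample = """\
PermitRootLogin yes
PasswordAuthentication no
# MaxAuthTries 6
AllowGroups ssh-users
"""
findings = audit(sample)
```

For the sample text, findings reports PermitRootLogin as "yes" and MaxAuthTries as None (commented out, so the host is relying on the daemon's default rather than the standard's value).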


11.† (SELinux setenforce 0) It is the wrong fix because setting SELinux to permissive disables the host's strongest containment layer in production — the very layer that would confine a compromised process (even one running as root) to its policy. The denial that prompted it is SELinux doing its job: an unexpected access was blocked. Right fix: read the denial in the audit log (ausearch -m avc), and use audit2allow to generate a targeted policy addition that grants the application the specific access it legitimately needs, leaving everything else confined. Risk introduced by setenforce 0: an exploited service is no longer boxed (it can read /etc/shadow, write outside its paths, open arbitrary connections), and because the reflex tends to repeat per-box, MAC can end up off fleet-wide — exactly the condition that let the Case Study 2 attacker entrench unconfined. Treat permissive/disabled in production as a finding.


13.† (Find the misconfig: Linux audit) Three most dangerous, ranked:

  1. PasswordAuthentication=yes (should be no): exposes SSH to credential guessing/spraying from the internet; this is the single change that, combined with a weak password, opens direct compromise (precisely the Case Study 2 vector).
  2. PermitRootLogin=yes (should be no): lets a guessed/stolen root password log in directly as root with no escalation step, handing full control immediately.
  3. SELinux=permissive (should be enforcing): the host's containment is off; a compromised process is unconfined, so any foothold escalates freely.

(The /etc/shadow perms=0640 is also wrong, since it should be far more restrictive (e.g., 0000/root-only), but the SSH + SELinux trio is what turns a guess into a full, unconfined compromise; auditd=running is fine.)


15.† (Read the harden.py report; prevention vs detection) See code/exercise-solutions.py (Exercise 15) for the programmatic version.

  • smbv1_enabled=True → prevention gap: restores a legacy lateral-movement protocol an attacker can use.
  • powershell_logging=False → detection gap: script activity goes unrecorded, so an intrusion leaves no timeline.
  • application_allowlisting=off → prevention gap: unknown/unapproved code may execute (default-deny for code is lost).

The pairing is the chapter's point: prevention controls stop the action; detection controls record what wasn't prevented. Both off together (as on the breach server) is the worst case: the attacker has an easy path and leaves no trace.
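
The classification above can be sketched in a few lines. This is not the code in code/exercise-solutions.py; the report format (a flat dictionary), the CONTROLS table, and the function name are assumptions made for illustration.

```python
# setting -> (kind of control, hardened value); a hypothetical lookup table
CONTROLS = {
    "smbv1_enabled": ("prevention", False),
    "powershell_logging": ("detection", True),
    "application_allowlisting": ("prevention", "on"),
}


def classify_gaps(report):
    """Group every setting that is not at its hardened value by control kind."""
    gaps = {"prevention": [], "detection": []}
    for setting, value in report.items():
        if setting not in CONTROLS:
            continue                      # unknown settings are skipped here
        kind, hardened = CONTROLS[setting]
        if value != hardened:
            gaps[kind].append(setting)
    return gaps


report = {
    "smbv1_enabled": True,
    "powershell_logging": False,
    "application_allowlisting": "off",
}
gaps = classify_gaps(report)
```

For this report, gaps lists two prevention gaps and one detection gap, which is the "easy path and no trace" combination the answer calls the worst case.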


17.† (Analyze this log — PowerShell) (a) Likely chain: a malicious Office document (the parent is winword.exe) spawns powershell.exe with an encoded command (-enc), which downloads and executes a remote script (IEX (New-Object Net.WebClient).DownloadString(...)) — classic initial access/execution — and then attempts to access LSASS (Get-Process lsass | Out-File ...l.dmp) to dump credentials for lateral movement. (b) The Defender ASR rule "Block all Office applications from creating child processes" would have blocked step one (Word spawning PowerShell). (Also relevant: the ASR rule that blocks credential stealing from LSASS for the dump attempt.) (c) The detecting controls are PowerShell script-block logging (EventID 4104) and process-creation auditing with command line (EventID 4688) — the §11.2 telemetry that records the encoded command and the parent→child lineage an attacker hoped would be invisible.


19.† (Standard: what's missing / inconsistent) Missing (the chapter says a complete standard must include): (1) an enforcement mechanism that re-applies/corrects drift — "configure each server at build time" is one-time and will decay; it should specify Group Policy / configuration management. (2) a verification/audit step — the standard names no way to prove a host meets it (no drift audit). (Also acceptable: no patch timelines named; no LAPS.) Internal inconsistency: "Local admin: shared image password, rotated annually" contradicts the entire point of the standard — a shared local-admin password is the breach's lateral-movement path; it must be LAPS (unique, random, rotated per host), not one shared password rotated yearly.


22.† (Write the patch policy) A defensible four-tier risk-based host patch policy for Meridian:

  • Emergency, 24–72 h: Critical severity and actively exploited in the wild (on CISA's KEV list). Why: attackers are using it now; the window is the threat.
  • 7 days: Critical/High severity on an internet-facing asset. Why: high exposure + serious flaw.
  • 30 days: High/Medium severity on an internal asset. Why: serious but lower exposure; time to test.
  • Next monthly cycle: Low severity. Why: minimal risk; batch with routine maintenance.

Rule for systems that cannot be patched on demand: "A system that cannot meet its tier is isolated on a restricted segment, hardened and monitored intensively, and covered by a documented risk acceptance with compensating controls and a review/expiry date." (Deploy in rings: pilot → broad → critical; verify installed, not merely sent.)
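
The four tiers can be expressed as a small decision function, sketched below. The function name and return strings are hypothetical. Two cases are not spelled out by the policy text and are filled in here as labeled assumptions: a Critical finding on an internal asset that is not actively exploited, and a Medium finding on an internet-facing asset, both of which this sketch places in the 30-day tier.

```python
def patch_tier(severity, actively_exploited=False, internet_facing=False):
    """Map a finding to a patch deadline following the four-tier policy."""
    severity = severity.lower()
    if severity == "critical" and actively_exploited:
        return "emergency (24-72 h)"
    if severity in {"critical", "high"} and internet_facing:
        return "7 days"
    if severity in {"critical", "high", "medium"}:
        # Assumption: critical-internal and medium-external also land here.
        return "30 days"
    return "next monthly cycle"


examples = [
    patch_tier("Critical", actively_exploited=True),
    patch_tier("High", internet_facing=True),
    patch_tier("Medium"),
    patch_tier("Low"),
]
```

Encoding the policy this way makes the unstated cases visible, which is a useful review step before the policy is published.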


28.† (CTF: The comfortable server) Each breach fact → the single control that prevents/records it:

| Fact | Control |
|---|---|
| Lateral movement via one shared local-admin password across 40 servers | LAPS (unique/rotated per host) |
| Tools transferred over SMBv1 | Disable SMBv1 (Group Policy) |
| Encoded PowerShell the team can't reconstruct | PowerShell script-block logging (4104) + 4688 / Sysmon |
| Defender disabled before acting | Defender Tamper Protection ON |
| Attacker "lived" comfortably / unconfined | Application allowlisting (servers) + the hardening baseline enforced via Group Policy |

If you could deploy only ONE fleet-wide to most reduce the blast radius: LAPS. The defining harm here was scale — one foothold became forty compromised servers — and the single fact enabling that scale was the shared local-administrator credential. LAPS makes the reused-credential technique impossible (host B won't accept host A's password), converting a fleet-wide compromise into a single-host incident. (Strong alternative, also creditable: Group Policy enforcing the full hardening baseline, because it turns off SMBv1, enables logging, and enforces tamper protection across the fleet at once — broader, but LAPS is the tightest single answer to blast radius specifically.)


Selected non-daggered answers (brief):

  • 2. Patched ≠ hardened: patching fixes known code flaws; hardening removes exposed configuration. Example patched-but-unhardened: a current Windows Server with SMBv1 on, shared local-admin, no logging. Example hardened-but-unpatched: a tightly configured box missing this month's critical update. Neither is fully "secure" — you need both; "current" and "hardened" are different properties.
  • 6. AV = signature/known-bad-file matching; EDR = behavioral/technique detection + telemetry + response. EDR catches fileless / living-off-the-land attacks (e.g., encoded PowerShell from Office) that have no malicious file for a signature to match.
  • 16. "Deployed" (sent to the fleet) ≠ "installed" (actually applied) — offline, failed, and excepted machines leave a tail. Report (a) % of fleet with the patch installed (verified by re-scan) and (b) age of the oldest missing critical patch instead of a single "deployed within 7 days" claim.
  • 20. (a) Fixed-function ATM controller: allowlisting + EDR ideal (stable known software set, high value). (b) Developer workstation: EDR (varied software makes allowlisting hard; AV alone insufficient). (c) Domain controller: EDR (crown jewel; behavioral visibility essential) — AV too. (d) Internet-facing web server: EDR, ideally allowlisting (servers run a fixed set).
  • 24. See code/exercise-solutions.py (Exercise 24): weight settings by severity, score = 100 × (1 − lost_weight / total_weight); weight shared-credential/SMBv1/allowlisting/tamper highest (prevention of high-impact techniques), logging slightly lower (still vital, but detective). Trend the per-host score over time and chase the lowest scorers.
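
The scoring formula in answer 24 can be sketched as below. This is not the code in code/exercise-solutions.py; the setting names and the specific weights (5 for the preventive controls, 4 for logging) are assumptions chosen only to match the relative ordering the answer describes.

```python
# Hypothetical weights: prevention of high-impact techniques highest,
# logging slightly lower.
WEIGHTS = {
    "laps_deployed": 5,
    "smbv1_disabled": 5,
    "application_allowlisting": 5,
    "tamper_protection": 5,
    "powershell_logging": 4,
    "process_creation_auditing": 4,
}


def hardening_score(settings, weights=WEIGHTS):
    """score = 100 * (1 - lost_weight / total_weight)."""
    total = sum(weights.values())
    lost = sum(weight for name, weight in weights.items()
               if not settings.get(name, False))   # missing counts as failed
    return 100 * (1 - lost / total)


host = {name: True for name in WEIGHTS}
host["smbv1_disabled"] = False
host["powershell_logging"] = False
print(round(hardening_score(host), 1))  # 67.9
```

A setting that is absent from the host's report is scored as failed, so an incomplete audit lowers the score rather than hiding a gap.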

Chapter 12

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class. All fixes are defensive: vulnerable patterns are shown only to secure them; no working exploit is given.

1. Secure SDLC (SSDLC) — building security activities into every phase of software development rather than as a pre-launch audit. Input validation — checking that incoming data matches the expected shape (allowlist), server-side, before acting on it. Output encoding — transforming data so the destination interpreter treats it as inert, for the specific context. Security requirement — a specific, verifiable statement of what the software must do or never do to be secure. Combined: "In a secure SDLC, we threat-model a feature into security requirements, then a developer satisfies them with server-side input validation on the way in and context-correct output encoding on the way out."

4. "We ran the OWASP Top 10 and passed" is weak because the Top 10 is a list of risk categories for awareness and prioritization, not a pass/fail test — it has no defined "pass," and a one-time check is a snapshot, not a program. The categories are also revised over time as data changes. A mature program would say instead: "We threat-model features, enforce a secure-coding standard, run SAST/SCA on every build and DAST before release, monitor dependency advisories continuously, and prioritize findings by risk" — i.e., a continuous process, not an event.

7. Weakness: Broken Access Control / insecure direct object reference (OWASP A01). The handler returns whatever doc_id the client supplies, checking only that the user is authenticated, never that they are authorized for that specific document. Abuse: a logged-in user changes doc_id in the URL to another user's document ID and reads it. Fix: authorize the authenticated identity for that specific object on every retrieval, server-side and default-deny:

def get_document(request):
    user = request.authenticated_user            # trust the session identity, not the URL
    doc_id = request.params["doc_id"]
    if not user_can_access(user, doc_id):        # per-object authorization check
        raise PermissionError("not authorized")
    return storage.fetch(doc_id)

9. Category: Security Misconfiguration (OWASP A05) — and the default credential also implicates Identification & Authentication Failures (A07). Risks: debug_mode/show_stack_traces leak internal details (file paths, library versions, even query structure) to an attacker on error — reconnaissance that aids other attacks; the default admin/admin credential is a trivial full compromise. Fix: disable debug and verbose errors in production (return generic error pages; log details server-side only), remove or change every default credential, and manage the configuration as code so the hardened settings are the only settings deployed (the Chapter 11 baseline discipline, applied to the app tier).

11. Weakness: a hard-coded secret (a live database credential) committed to source control — a cryptographic/secrets failure. Why severe even if later deleted: version control retains history, so the credential is permanently readable in past commits by anyone who has ever cloned the repo, and exists in backups and forks; deleting the line does not un-leak it. Correct pattern: keep secrets out of code entirely — inject from a secrets manager or the environment at runtime — scan commits for secrets before merge, and rotate any secret that ever touched a repository (treat it as compromised). Full discipline in Chapter 20.

import os
DB_PASSWORD = os.environ["DB_PASSWORD"]    # injected at runtime; rotate the old leaked value
conn = connect(user="svc_app", password=DB_PASSWORD)

13. Each vague item rewritten as a verifiable "SHALL" requirement + how to verify: (a) "The server SHALL accept only an allowlist of file types (PDF/JPEG/PNG) and enforce a maximum size of N MB; all other uploads SHALL be rejected." — Verify: unit test + DAST with disallowed/oversized files. (b) "On every file retrieval, the server SHALL verify the authenticated user is authorized for that specific file, keyed to identity, never to a client-supplied ID alone." — Verify: code review + abuse test changing the ID. (c) "Source code SHALL contain no secrets; secrets SHALL be injected at runtime, and a secret scan SHALL pass on every commit." — Verify: secret-scanning gate in CI. (d) "The application SHALL log every authentication event, access-control failure, and high-value action with user identity and timestamp." — Verify: log review against a test script of those actions.

15. "Export all customer transactions to CSV": three STRIDE-tied security requirements:

  • (Information disclosure) "The export SHALL include only records the authenticated user is authorized to see, enforced server-side." Prevents one user exporting another's data.
  • (Elevation of privilege / Repudiation) "Every export SHALL be authorized against the user's role and logged with identity, timestamp, and record count." Prevents an under-privileged user exporting and ensures an audit trail.
  • (Denial of service) "Exports SHALL be rate-limited and bounded in size/row count per request." Prevents a giant or repeated export exhausting resources.

(A CSV-injection/formula concern in the output is also worth a requirement: neutralize leading =,+,-,@ in cell values, an output-encoding analog.)
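
The CSV-formula concern in the closing note can be sketched as follows. Prefixing a single quote is one common neutralization approach, used here as an assumption rather than as the chapter's prescribed method; the function names are hypothetical, and only the four leading characters named above are handled.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")


def neutralize_cell(value):
    """Prefix a quote so spreadsheet software treats the cell as text."""
    text = str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text
    return text


def export_rows(rows):
    """Write rows to CSV text, neutralizing every cell on the way out."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([neutralize_cell(cell) for cell in row])
    return buffer.getvalue()


output = export_rows([["payee", "memo"], ["Alice", "=1+1"]])
```

One trade-off of this simple rule is that legitimate negative amounts such as -50.00 are also prefixed, so numeric columns may need to be written from typed values rather than from free text.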

17. (a) SCA — it inventories your (transitive) dependencies and matches versions against known-vuln databases; SAST examines your own code, not third-party CVEs, and DAST would only catch it if it happened to trigger the specific flaw. (b) DAST — it tests the running app from outside and observes the stack-trace leak in the real response; SAST might flag verbose-error config but cannot confirm runtime behavior, and SCA is about components, not behavior. (c) SAST — it traces a taint path from a source (user input) to a sink (SQL string) in code at rest; DAST might find it only if it triggers injectable behavior, and SCA is irrelevant to your own code.

19. Steps to turn 4,000 SAST findings into a defensible gate: (1) deduplicate and group by rule/CWE; (2) suppress known false-positive patterns for your codebase and frameworks (with documented justification, reviewed periodically); (3) enrich each remaining finding with severity and exploitability and the criticality of the asset/code path it sits in (risk = likelihood × impact, Chapter 1); (4) gate the release only on the high-confidence, high-severity subset (e.g., injection/secrets on internet-facing/sensitive paths) — Chapter 31's ci_gate; (5) route the rest to a backlog with SLAs. You suppress findings that are demonstrably not exploitable or not reachable; you gate on confirmed, severe, reachable issues — never on raw volume, which only trains developers to ignore the tool.

21. Add-a-new-payee feature — STRIDE threat model. (a) Trust boundary: the customer's browser (untrusted) to the bank's app server; a second boundary sits between the app and the payments/core system that will actually move money.

   CUSTOMER BROWSER  ║ trust      BANK APP SERVER         PAYMENTS / CORE
   (untrusted)       ║ boundary
     add payee ──────╫────────▶ [authn][validate] ─▶ store payee ─▶ later: send funds
                     ║                  │
                     ║          most threats cross this line ↑

(b) One threat per STRIDE letter: S — an attacker adds a payee in a victim's session (account takeover or request forgery — CSRF is dissected in Chapter 13). T — the routing/account number is altered in transit or storage. R — a disputed fraudulent payee with no record of who added it/when. I — payee list of one customer exposed to another (IDOR). D — automated mass payee additions exhaust resources or enable fraud at scale. E — adding a payee skips an authorization/step-up that high-value changes require. (c) Three requirements: "Adding a payee SHALL require an authenticated session and a re-authentication or step-up (e.g., MFA) for this sensitive change." (S/E) — "The server SHALL validate account/routing numbers against expected formats and checksums, server-side, and reject malformed input." (T/D) — "Every payee addition SHALL be logged with user identity, timestamp, and the payee details, and SHALL be visible to the customer (e.g., a confirmation notification)." (R/I)

24. (a) The attacker is probing for the Log4Shell vulnerability by placing a lookup string into many different fields (q, ua, user) across many paths, because the vulnerable code could log any of them — a spray to find where the application logs untrusted input. (b) No — finding probe attempts in logs shows you are being scanned (everyone is), not that you are vulnerable; what settles it is whether you actually run a vulnerable Log4j version in a path that logs that input. (c) SCA determines actual exposure, because it inventories your (transitive) dependencies and tells you whether a vulnerable Log4j is present and where — the "do we use it, and where?" question that detection alone cannot answer.

25. Three questions for the vendor claiming "not affected": (1) "Across all components you ship to us, including transitive dependencies, do any include a vulnerable Log4j version?" (2) "How did you determine this — did you inventory the full dependency tree, and when?" (3) "Will you commit to notifying us if that assessment changes as new information emerges?" The artifact that lets the vendor prove it is a software bill of materials (SBOM) for the product (introduced in Chapter 23, made a contractual vendor requirement in Chapter 29) — a machine-readable inventory you can check yourself against the advisory.

28. The "secure" code review. Distinct problems and their categories:

  1. SQL injection (A03): the query is built by concatenating user and pw into a string; the length check does nothing to prevent injection.
  2. Identification & Authentication Failures (A07): len(user) < 50 is not validation of credentials; there is no real input validation and no protection against automated guessing/lockout.
  3. Cryptographic Failures / secrets (A02): the password is compared in plaintext against the database (implying plaintext storage) rather than verified against a slow password hash.
  4. Broken Access Control / session management (A01/A07): the session cookie is set to the username, which is guessable and forgeable; it must be a random, server-side session identifier.
  5. Logging & Monitoring Failures (A09): only success is logged; failed and anomalous logins are not.

Corrected version (auth library + slow hash are stand-ins; full treatment in Chapter 16):

import re

# db, log, verify_password, new_random_session_id, ok, and deny are stand-ins.
def login(request):
    user = request.params["user"]
    pw   = request.params["pw"]
    if not re.fullmatch(r"[A-Za-z0-9_]{3,20}", user):        # positive validation, server-side
        log.info("login_invalid_username")                   # do not log the rejected raw value
        return deny()
    row = db.execute(
        "SELECT pw_hash FROM users WHERE name = ?", (user,)  # parameterized
    ).fetchone()                                             # fetch the row, not the cursor
    if row and verify_password(pw, row["pw_hash"]):          # compare against a stored slow hash
        sid = new_random_session_id()                        # random session id, NOT the username
        log.info(f"login_ok user={user}")
        return ok(set_cookie=f"session={sid}")
    log.info(f"login_failed user={user}")                    # log failures too (A09)
    return deny()

The point of the exercise: a confident "fully secured" note can hide five independent flaws spanning five OWASP categories — which is why secure coding relies on patterns (parameterize, hash, random sessions, validate, log), tooling, and review, not on a developer's self-assessment.
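
Two of the stand-ins in the corrected version, verify_password and new_random_session_id, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Chapter 16 treatment: PBKDF2-HMAC-SHA256 is used because it ships with Python, the "iterations$salt$digest" storage format is hypothetical, and a production system would normally rely on a vetted password-hashing library.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 600_000   # assumed work factor; tune for your hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return 'iterations$salt_hex$digest_hex' for storage."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def new_random_session_id():
    """Unguessable session identifier from the OS random source."""
    return secrets.token_urlsafe(32)


stored = hash_password("correct horse battery staple")
```

The per-user random salt means two users with the same password get different stored values, and hmac.compare_digest avoids leaking how many leading bytes matched.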


Chapter 13

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or designed for discussion/lab. All fixes are framed defensively; no working exploit payloads are provided.

1. SQL injection — attacker-controlled input is interpreted as part of a SQL command, letting the attacker alter the query's logic. Cross-site scripting (XSS) — an attacker injects script that runs in another user's browser in the trusted site's context. Cross-site request forgery (CSRF) — a malicious page makes a victim's browser send an authenticated, unintended state-changing request to a site where the victim is logged in. Server-side request forgery (SSRF) — an attacker tricks a server into making an outbound request to a destination of the attacker's choosing (often internal). Shared root cause of the first two: attacker-controlled data is allowed to be interpreted as code (the failure to keep data and code in separate lanes).

4. A WAF is "defense in depth, not a fix" because it is a pattern-matcher placed in front of a vulnerability that still exists in the code; skilled attackers bypass WAFs with encoding/obfuscation, and the bug remains exploitable if a rule is missed or weakened. Done well: it blocks the noisy majority of automated/opportunistic attacks, provides virtual patching (a rule that blocks exploitation of a specific bug while developers fix the code), and generates attack telemetry for the SOC. It must never be used to justify leaving the underlying vulnerability unfixed (the "we don't need to fix the injection, the WAF blocks it" fallacy).

7. Vulnerability: SQL injection — account_id is interpolated into the query via an f-string, so a value carrying SQL syntax could alter the statement's structure (and "it looks like an integer" is not a control; the value is still attacker-influenced text). Fix: parameterize.

def get_balance(conn, account_id):
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()

9. Vulnerability: DOM-based XSS. Client-side JS reads ?name= from location.search (attacker-controllable) and writes it into the page via innerHTML (a dangerous sink), so markup in name executes. The server never sees the payload, so access logs may be blind to it. Fix: use a safe sink that cannot introduce markup.

const name = new URLSearchParams(location.search).get("name");
document.getElementById("greeting").textContent = "Hi " + name;   // textContent, not innerHTML

Defense in depth: a strict CSP (script-src 'self', no 'unsafe-inline') with report-uri would block and report an injected script even if a sink were missed.

11. Two vulnerabilities: (1) SQL injection: username is concatenated into the lookup query; (2) session fixation: the session ID is unchanged across the anonymous→authenticated transition, so an attacker who pre-set the victim's session ID is logged in as them. Fix both:

def login(session, conn, username, password):
    row = conn.execute(
        "SELECT id, pw_hash FROM users WHERE username = ?", (username,)   # fix 1: parameterized
    ).fetchone()
    if row and verify(password, row["pw_hash"]):
        session.regenerate_id()        # fix 2: rotate session id at login -> defeats fixation
        session.user_id = row["id"]
        return True
    return False

13. Safe version (same as Ex. 7): conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)). It makes the value impossible to interpret as SQL because the database receives the command and the data separately — the command (... WHERE id = ?) is fully parsed before the bound value is ever involved, so the value can only be compared as data, never parsed as command structure.
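
The separation of command and data can be observed with an in-memory SQLite database. The table contents and the hostile-looking string below are made up for the demonstration, and sqlite3 is used only because it ships with Python; the same principle applies to any driver that supports bound parameters.

```python
import sqlite3


def get_balance(conn, account_id):
    # The statement text is fixed; account_id is bound as a value.
    return conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250)])

legit = get_balance(conn, 1)             # (100,)
hostile = get_balance(conn, "1 OR 1=1")  # None: compared as one value, no row matches
```

The second call does not return every row, because the bound text is compared against the id column as a single value and never reaches the SQL parser.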

16. Safe fetch_logo with an application-layer SSRF guard:

import ipaddress
import socket
from urllib.parse import urlparse

import requests   # third-party HTTP client

ALLOWED_HOSTS = {"images.partner.example", "cdn.assets.example"}

def fetch_logo(url):
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:                       # 1) allowlist (fail closed)
        raise ValueError("destination not allowed")
    ip = ipaddress.ip_address(socket.gethostbyname(host))   # 2) resolve, then check
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError("internal destination blocked")    # blocks 169.254.*, 127.*, 10.*, ...
    # Redirects are not followed; re-check on each redirect if you follow them.
    return requests.get(url, allow_redirects=False, timeout=5).content

Two additional defenses beyond the code, and why code alone is insufficient: (a) re-check the resolved IP after every redirect and defend against DNS rebinding — a hostname can pass the initial check yet resolve (or redirect) to an internal address later; (b) network egress filtering so the app server cannot reach the metadata endpoint or internal hosts even if the app guard is bypassed (and, in cloud, the hardened token-required metadata service). SSRF is hard to fully prevent at the app layer alone, so the network layer must back it up (defense in depth).

18. A strong CSP for Meridian's customer pages:

Content-Security-Policy:
  default-src 'self';        # fallback for any resource type not named below: own origin only
  script-src 'self';         # scripts only from our origin; NO 'unsafe-inline' -> injected inline <script> is blocked
  style-src 'self';          # styles only from our origin
  object-src 'none';         # no plugins (Flash/Java)
  base-uri 'self';           # attacker cannot repoint relative URLs via an injected <base>
  frame-ancestors 'self';    # page may not be framed by other sites -> clickjacking defense
  report-uri /csp-report     # browser POSTs violation reports here -> SOC telemetry

The load-bearing line for XSS is script-src 'self' without 'unsafe-inline'. Inline scripts/event handlers must move to served .js files (or use nonces/hashes for the few that cannot).

20. Illustrative SQL-injection probing detection:

SELECT src_ip, COUNT(*) AS hits
FROM web_access_logs
WHERE (request_uri RLIKE '(?i)(union\s+select|or\s+1=1|--|/\*|;\s*drop)' OR status_code = 500)
  AND event_time > NOW() - INTERVAL '15' MINUTE
GROUP BY src_ip
HAVING hits > 20
ORDER BY hits DESC;

What it misses, and why: a patient attacker who URL-encodes or otherwise obfuscates payloads, uses blind injection (no obvious keywords, inferring data from response timing/booleans), or stays under the threshold will evade it — the rule catches the noisy/automated majority, not a careful adversary. It is a tripwire, not a guarantee; the fix (parameterization) is what makes the app safe.
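One of those gaps, URL-encoded payloads, can be narrowed by decoding before matching. The sketch below reuses the rule's pattern in Python; `looks_like_sqli` and the two-round decoding limit are illustrative assumptions, and blind injection still passes untouched.

```python
import re
from urllib.parse import unquote_plus

SQLI_PATTERN = re.compile(r"(union\s+select|or\s+1=1|--|/\*|;\s*drop)", re.IGNORECASE)

def looks_like_sqli(request_uri, max_rounds=2):
    """Match the pattern against the URI as logged and after up to max_rounds URL-decodes."""
    candidate = request_uri
    for _ in range(max_rounds + 1):
        if SQLI_PATTERN.search(candidate):
            return True
        decoded = unquote_plus(candidate)
        if decoded == candidate:      # nothing left to decode
            break
        candidate = decoded
    return False
```

A time-based blind probe contains none of the keywords, so this remains a tripwire and parameterization remains the fix.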

23. (a) SQL-injection probing/discovery against /search. The attacker is testing the q parameter: a bare quote produces a 500 (the query broke — injectable), a boolean tautology and then two- and three-column UNION SELECT attempts probe the query's structure (column count). (b) The strongest indicators are the q parameter contents (SQL syntax: ', OR '1'='1, UNION SELECT) and the 500 status codes clustered from one source in seconds. (c) Root cause: the /search endpoint builds its query by string concatenation of q; fix: parameterize the query. (d) The 500 → 200 transition at 14:22:05 is significant because it suggests the attacker found a syntactically valid injected query (the three-null UNION matched the column count and returned 200) — i.e., probing is succeeding and exploitation is imminent; escalate and block now.

25. (a) Anomaly: the same session ID (8f3a...) is used from two different source IPs (203.0.113.10 then 203.0.113.99), and the second source immediately performs sensitive actions (add payee, transfer). This most suggests session hijacking/theft (a stolen/replayed session token — possibly via XSS, given this chapter) rather than CSRF (CSRF rides the victim's own browser/IP). (b) Two controls: HttpOnly on the session cookie (denies an XSS payload the cookie) and session binding/anomaly monitoring (alert on one session ID from two distant IPs/user-agents), plus step-up re-authentication before a transfer. (c) A WAF may not catch it because each request, on its own, looks like a valid authenticated request with a legitimate session cookie — there is no attack signature in the HTTP for the WAF to match; this is exactly why application-layer logging/detection matters.
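The session-binding alert in (b) reduces to grouping application-log events by session ID. A minimal sketch, assuming each event is a dict with hypothetical `session_id` and `src_ip` keys:

```python
from collections import defaultdict

def sessions_with_multiple_ips(events):
    """Map each session ID seen from more than one source IP to its sorted list of IPs."""
    ips_by_session = defaultdict(set)
    for event in events:
        ips_by_session[event["session_id"]].add(event["src_ip"])
    return {sid: sorted(ips) for sid, ips in ips_by_session.items() if len(ips) > 1}

events = [
    {"session_id": "8f3a...", "src_ip": "203.0.113.10", "action": "login"},
    {"session_id": "8f3a...", "src_ip": "203.0.113.99", "action": "add_payee"},
    {"session_id": "8f3a...", "src_ip": "203.0.113.99", "action": "transfer"},
]
suspicious = sessions_with_multiple_ips(events)
```

In production the comparison would usually tolerate known mobile or VPN address changes, so treat a hit as a trigger for step-up authentication rather than proof of theft.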

27. Response to stored XSS in the transfer-memo field (rendered unescaped in the staff admin console): 1. Immediate containment: stop the bleeding for the privileged surface — temporarily escape/encode or suppress memo rendering in the admin console (or take the affected view offline), and apply a strict CSP (script-src 'self') to the console so any injected inline script is blocked while you fix the code. 2. Code fix: remove the raw/unescaped render and rely on the auto-escaping template engine so the memo is rendered as inert text (context-aware output encoding); add a test asserting the field is encoded. 3. Find prior exploitation: scan stored memo values for script-like content (<script, onerror=, javascript:, <svg); review admin-console access and CSP-violation logs for evidence the payload executed in a staff session; check for follow-on actions from affected staff sessions. 4. Defense-in-depth that would have limited blast radius: the strict CSP on the console (blocks the injected script even when encoding was missing) and HttpOnly on the staff session cookie (so a script that did run could not steal the session). The deeper lesson: customer→staff stored XSS is privilege-escalating and should be ranked accordingly.

29. Web vulnerability classes this "profile photo + personal-website URL, viewed by staff" feature can introduce, with the control for each: - Stored XSS (the website URL or any text field rendered on the public profile and in the staff console): context-aware output encoding by default; strict CSP; encode the URL and render it as a safe link (and validate the scheme — allow only http(s):, never javascript:). - javascript:/data: URL injection in the "personal website" link: scheme allowlist (only http/https); encode on output. - SSRF (if the server fetches the website URL to generate a preview/thumbnail, or fetches the photo from a URL): destination allowlist + block internal ranges (re-check after DNS/redirects) + egress filtering. - Malicious file upload for the photo (out of this chapter's core but adjacent): validate type/size, store outside the web root, serve with a safe content-type and Content-Disposition, never execute. - CSRF on the "save profile" action: anti-CSRF token + SameSite cookie. - Injection wherever the new fields are stored/queried: parameterized queries. The two traps the hint points at: the URL field (XSS via javascript: + SSRF on any server-side fetch) and the staff console rendering (privilege-escalating stored XSS into a higher-privileged browser).
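The scheme allowlist for the personal-website field can be sketched in a few lines. `is_safe_profile_link` is a hypothetical helper; it only decides whether to accept the value, and the accepted URL must still be encoded on output.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_profile_link(url):
    """Accept only absolute http(s) URLs that name a host; reject everything else."""
    parsed = urlparse(url.strip())
    return parsed.scheme.lower() in ALLOWED_SCHEMES and bool(parsed.hostname)
```

Because this is an allowlist, unusual schemes are rejected by default instead of needing to be enumerated.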

32. The clever bypass. Rejecting any input containing a single quote (') is inadequate for at least two distinct reasons (described without a working exploit): (1) Many injections need no single quote — numeric contexts (e.g., ... WHERE id = <value>) let an attacker alter logic using digits, operators, and keywords with no quote at all; so a quote-blocklist does nothing there. (2) Encoding and second-order paths evade the filter — input can arrive URL-/Unicode-encoded and be decoded later, or be stored now (passing the filter) and concatenated into a query later (second-order injection), so the quote check at the front door never sees the dangerous form. (Bonus reason: it breaks legitimate input — the customer named O'Brien — pushing teams to weaken the filter.) The fix that makes the entire bypass class irrelevant is the parameterized query: because the command is parsed before the bound value is involved, no content of the value — quote or not, encoded or not, stored or fresh — can change the query's structure. You stop playing whack-a-mole with characters and remove the vulnerability by construction.
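The point about structure being fixed before the value is bound can be seen with the standard-library `sqlite3` module. The table and the `find_by_name` helper are illustrative assumptions; the tautology string is the same one quoted in the answer to exercise 23.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("O'Brien",), ("Rao",)])

def find_by_name(name):
    # The ? placeholder binds name as data; it is never parsed as SQL.
    rows = conn.execute("SELECT name FROM customers WHERE name = ?", (name,))
    return [row[0] for row in rows]
```

The legitimate quote in O'Brien works without special handling, and a value such as `x' OR '1'='1` is compared as a literal name and matches nothing.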


Chapter 14

Worked solutions to the daggered (†) exercises. The remaining exercises are open-ended or for discussion.

1. IoT — the category of everyday physical objects given computing and network connectivity. Embedded device — the special-purpose computer built inside such a product, running fixed-function software on constrained hardware. Firmware — the persistent low-level code stored on that embedded computer that controls its hardware. Combined sentence: "A network-connected security camera is an IoT device whose embedded device (the small special-purpose computer inside it) runs firmware that streams video — firmware the manufacturer last updated years ago and will never patch again."

4. Mobile app sandboxing is the OS-enforced isolation that confines each app to its own storage and memory, allowing access to anything outside (camera, contacts, network, other apps' data) only through permission-gated interfaces the OS brokers. The chain: sandboxing is what contains a malicious app on a healthy phone, so the platform's whole "one bad app ≠ disaster" guarantee rests on it → jailbreaking/rooting removes the OS restrictions that enforce the sandbox, so malware can escape confinement and reach system functions and other apps' data → therefore a jailbroken/rooted device has lost the very mechanism that made it safe to trust, so conditional access detects the jailbreak/root and blocks the device from corporate data rather than relying on protections that are no longer enforced.

7. Running the table through iotinv.py logic: default-credential offenders are cam-br04-01 (admin/admin), hvac-br04 (admin/password), and printer-br04 (root/root). atm-0107 (svc / strong unique) and the two devices with no credential interface (phone-ortiz, tablet-rao) are not flagged. Unmanaged devices are the three IoT devices (cam-br04-01, hvac-br04, printer-br04). Remediate first: cam-br04-01 — a camera with default credentials is both an emergency (anyone reaching it is admin in milliseconds) and typically exposes a reachable web/management interface, so it is the most likely to be found and abused; change its credentials immediately, then place it on the contained IoT segment. (All three default-cred devices are fix-in-minutes for the credential change and must still be segmented, because the credential fix is not durable against future firmware flaws.)
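The default-credential check can be approximated as below. This is a hypothetical sketch of the kind of logic described, not the actual `iotinv.py`; the inventory structure and the placeholder standing in for the ATM's strong password are assumptions.

```python
DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "password"), ("root", "root")}

INVENTORY = [
    {"name": "cam-br04-01", "cred": ("admin", "admin")},
    {"name": "hvac-br04", "cred": ("admin", "password")},
    {"name": "printer-br04", "cred": ("root", "root")},
    {"name": "atm-0107", "cred": ("svc", "<strong-unique>")},  # placeholder, real value not shown
    {"name": "phone-ortiz", "cred": None},                     # no credential interface
    {"name": "tablet-rao", "cred": None},
]

def default_credential_offenders(inventory):
    """Names of devices whose (user, password) pair is a known vendor default."""
    return [d["name"] for d in inventory if d["cred"] in DEFAULT_CREDENTIALS]

offenders = default_credential_offenders(INVENTORY)
```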

10. Triage actions, highest priority first: (1) M-204 (Android 13, jailbroken) — highest priority; conditional access should already have blocked it from corporate data; quarantine, and the device needs to be wiped/re-imaged or replaced and the owner spoken to, since a rooted device has lost its security model. (2) M-203 (iOS 16, encrypted=no) — serious; an unencrypted device exposes everything if lost; quarantine until full-device encryption is enforced. (3) M-202 (Android 10) — out of date and below a reasonable minimum supported version; patch-or-replace list, not an emergency. M-201 is healthy. The control that should already have limited the worst row's damage is conditional access: a jailbroken device should never have been allowed to reach mail/documents in the first place.

11. (a) The two-plus connections to 203.0.113.77 (an external host, port 443, ~42–51 KB each) should alert. It is high-signal on this device specifically because the camera's entire legitimate behavior is talking to its recorder at 192.0.2.10 on 554 — it has no business reaching the internet at all, so any external connection is glaring with essentially no false-positive risk. (b) Yes, consistent with compromise: outbound HTTPS to an unknown external host moving tens of kilobytes looks like command-and-control check-in and/or data exfiltration; an attacker who compromised the camera is staging or exfiltrating. (c) Segmentation as described would normally have prevented this — the IoT segment is supposed to deny outbound internet by default. This log implies that egress rule was missing or misconfigured for this segment; the additional rule that would have stopped it is "deny 192.0.2.0/24 (IoT) → any internet destination." (d) Detection rule: alert when any host in the IoT segment connects to a destination not on its device-class allow-list (and specifically any outbound internet) — see iot-offallowlist.sigma.yml.
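The rule in (d) amounts to a set-membership test per flow. The sketch below is an illustration in Python, not the contents of `iot-offallowlist.sigma.yml`; the camera's source address 192.0.2.51 and the flow-record keys are assumptions.

```python
import ipaddress

IOT_SEGMENT = ipaddress.ip_network("192.0.2.0/24")
CAMERA_ALLOW = {("192.0.2.10", 554)}   # the recorder is the only legitimate peer

def off_allowlist(flows, allow=CAMERA_ALLOW):
    """Return flows from IoT-segment sources whose (dst, port) is not on the allow-list."""
    alerts = []
    for flow in flows:
        in_segment = ipaddress.ip_address(flow["src"]) in IOT_SEGMENT
        if in_segment and (flow["dst"], flow["port"]) not in allow:
            alerts.append(flow)
    return alerts

flows = [
    {"src": "192.0.2.51", "dst": "192.0.2.10", "port": 554},
    {"src": "192.0.2.51", "dst": "203.0.113.77", "port": 443},
]
alerts = off_allowlist(flows)
```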

13. Questions to classify the unknown device before touching it: What is its MAC OUI / fingerprint (what kind of device is it)? What segment and switch port is it on? What is it talking to, and on what ports? When did it first appear, and does any change ticket or purchase explain it? Is it reachable from/to sensitive segments? Two likely benign explanations: a newly deployed approved device that simply wasn't added to the inventory yet (a process gap), or an employee-connected convenience device (shadow IoT — unauthorized but not malicious). Two likely malicious explanations: an attacker-planted device (e.g., a rogue access point or implant) bridging onto the network, or a previously-unknown compromised IoT device that has begun beaconing. The response is the same loop regardless: identify, isolate to a contained segment, and either bring it under management or remove it.

15. Example BYOD policy section (model answer — yours may differ in wording): 1. Eligibility: BYOD permitted for staff in roles that do not handle cardholder data or the most sensitive systems; high-sensitivity roles use corporate-owned devices. 2. Enrollment: BYOD devices must enroll a managed work profile; corporate data is accessed only inside it. 3. Minimum requirements: current supported OS, full-device or container encryption on, screen lock with passcode/biometric, and no jailbreak/root — enforced by conditional access. 4. What the company manages: only the work profile — corporate apps, corporate data, and the container's security settings. 5. What the company CANNOT see or do: the company cannot view personal apps, photos, messages, browsing, or location, and cannot wipe the personal side of the device. 6. Data handling: copy/paste, "open in," and file transfer of corporate data out of the work profile into personal apps or personal cloud storage are blocked. 7. Lost/stolen: report immediately; the company will selectively wipe the work profile only. 8. Offboarding: on departure or role change, the work profile is selectively wiped; personal data is untouched. 9. Acceptable use: corporate data accessed only for legitimate business purposes; no storing corporate data outside the managed container. 10. Support & liability: the company supports the work profile and corporate apps only; the employee is responsible for the device hardware and personal software. (The key graded elements: eligibility/scope, minimum requirements, explicit "can/cannot see and do," data-handling rules, and the selective-wipe offboarding process — in plain language.)

16. Example device-segmentation design (model answer):

BRANCH SEGMENTATION (default-deny between every segment)
  CORPORATE        10.x.10.0/24 : 6 tellers, 1 file server, 4 managed laptops
      -> allow: business apps to core (north) only.
  CARDHOLDER (CDE) 10.x.20.0/24 : 2 POS terminals (PCI scope)
      -> no path from IoT or guest; tightest segment.
  IoT/FACILITIES   10.x.30.0/24 : 3 cameras, 2 badge readers, 1 HVAC, 2 printers
      allow-list (the shortest in the branch):
        cameras      -> 10.x.30.10 (NVR)      :554
        badge readers-> 10.x.30.11 (ACS)      :443
        HVAC         -> 10.x.30.12 (BMS)      :443
        printers     -> 10.x.30.13 (print)    :9100
        INTERNET     -> DENY (all IoT devices)
  GUEST            10.x.99.0/24 : customer Wi-Fi
      -> internet-only; no path to any internal segment.

Default rule between segments: deny. Justifications: cameras/badge/HVAC/printers each reach exactly the one server they need and nothing else (a compromise inherits only that reach); the IoT segment is denied the internet (kills C2/botnet/exfil); the CDE is unreachable from IoT and guest (PCI requirement and good architecture); guest is fully isolated to the internet. Each cross-segment ALLOW exists only because a specific business need justifies it; everything else is denied.

20. The five-minute foothold — defensive reconstruction. Kill chain in chapter vocabulary: - Initial access: the attacker reached the lobby camera using documented default credentials — the camera was a vulnerability (reachable management interface + unchanged defaults); the pen-tester is the threat; logging in with the published password is the exploit. - Foothold: the camera's embedded OS shell gave the attacker a position inside the network. - Reconnaissance / lateral movement: because the network was flat (no segmentation), the camera could reach the teller workstation and the file server directly. - Objective: reach branch data via the file server. Single architectural change that stops the pivot at the camera: device segmentation — placing the camera on an isolated, default-deny IoT segment whose allow-list permits it to reach only the recording server. Even with the camera compromised, there is no path from it to the tellers or file server, so the pivot dies at step one. Two additional controls preventing/detecting the initial access: (1) change the default credentials on deployment (removes the front-door weakness), and (2) monitoring that alerts when the camera talks off its allow-list or scans the local segment (detects the foothold immediately). Bonus: an inventory/discovery process would have found the (shadow) camera before an attacker did. (No offensive steps are detailed — the analysis is entirely of the defenses that change the outcome.)

21. (Interleaves Chapter 11.) Four hardening measures for an ATM that can be hardened: (1) application allowlisting so only the ATM software and its required components may execute; (2) disable unnecessary services/ports to shrink the attack surface; (3) remove or disable default and unused accounts and enforce strong unique credentials on those that remain; (4) enable full-disk encryption and secure boot/TPM where the platform supports it, and apply vendor patches within a managed window. Application allowlisting is the single best fit for a special-purpose device because such a device runs a small, fixed, known set of programs — so allowlisting blocks all unauthorized code (including malware) with essentially no impact on legitimate function, which is exactly the situation where allowlisting's usual operational burden (maintaining the list on a general-purpose machine where software changes constantly) disappears. The same logic applies to kiosks and any single-purpose embedded system.


Chapter 15

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Provider / customer / shared, with model: - (a) Customer — IaaS; the OS is on your side of the line. - (b) Provider — PaaS; the provider patches the managed database engine and its OS. - (c) Customer — SaaS; deciding who may access data is always the customer's, in every model. - (d) Provider — IaaS (and all models); physical datacenter security is always the provider's. - (e) Shared — the provider supplies the encryption capability, but the customer must enable it and manage keys; it is frequently off until you turn it on. - (f) Provider — the durability of the storage service itself is the provider's (e.g., S3's durability guarantee). Note the contrast with (e): the service's durability is the provider's, but whether your data in it is encrypted/public is yours.

4. Rebuttal: Moving to the cloud offloads the layers the provider can secure better than you can — the physical datacenter, the hardware, and the hypervisor — but it does not offload the layers that encode your business decisions. In every model, including SaaS, data classification and identity/access remain yours: the provider cannot decide which of your data is sensitive or who in your organization should have access, and it will let you make a bucket public or grant * permissions without objection. The move did not offload (for example) IAM/access management — an over-broad policy is still entirely your breach. "Security is the provider's job now" is precisely the assumption that produces public-bucket breaches.

6. (a) Yes, the bucket is public. (b) The grant is Grantee: { URI: ".../groups/global/AllUsers" } Permission: READ. AllUsers is AWS's special group meaning everyone on the internet, unauthenticated, so READ to AllUsers makes every object listable and downloadable by anyone. (c) The attacker's "exploit" is simply an HTTP GET request to the bucket/object URL — there is no break-in, which is why nothing is detected; automated scanners enumerate public buckets continuously and will find it. (d) S3 Block Public Access enabled account-wide would have prevented this regardless of the ACL — it overrides bucket-level settings and refuses to make any bucket public. (Run cloudpost.s3_public on this ACL → True.)
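A check of this kind can be sketched as follows. It is a hypothetical stand-in, not the real `cloudpost.s3_public`, and it assumes the ACL has been parsed into a dict with a `Grants` list; the elided URI prefix is matched by suffix only.

```python
def acl_is_public(acl):
    """True if any grant in the ACL names the AllUsers group."""
    for grant in acl.get("Grants", []):
        uri = grant.get("Grantee", {}).get("URI", "")
        if uri.endswith("/groups/global/AllUsers"):
            return True
    return False

public_acl = {
    "Grants": [
        {"Grantee": {"URI": ".../groups/global/AllUsers"}, "Permission": "READ"},
    ]
}
private_acl = {
    "Grants": [
        {"Grantee": {"ID": "owner-canonical-id"}, "Permission": "FULL_CONTROL"},
    ]
}
```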

7. (a) The policy grants Action: * on Resource: * — every action on every resource, i.e., effectively administrative control of the entire account, when the job only needs to write to one bucket. (b) Precise risk: if the job's credentials leak (into a repo, a log file, a compromised host), the attacker inherits total account control — they can read every bucket, delete databases, create new admin users, and disable logging. The blast radius is the whole account; it should be a single bucket. (c) Least-privilege rewrite:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:PutObject"],
    "Resource": ["arn:aws:s3:::meridian-logs-shipping/*"],
    "Condition": { "Bool": { "aws:SecureTransport": "true" } }
  }]
}

(Run cloudpost.iam_overbroad on the original → True; on the rewrite → False.)
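The wildcard test behind that check could look like the sketch below. It is an assumption about the logic, not the real `cloudpost.iam_overbroad`, and it only catches the literal `*` on both action and resource.

```python
def _as_list(value):
    # IAM allows a single string or a list for Statement, Action, and Resource.
    return value if isinstance(value, list) else [value]

def is_overbroad(policy):
    """True if any Allow statement pairs a wildcard action with a wildcard resource."""
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions and "*" in resources:
            return True
    return False

original = {"Version": "2012-10-17",
            "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
rewrite = {"Version": "2012-10-17",
           "Statement": [{"Effect": "Allow",
                          "Action": ["s3:PutObject"],
                          "Resource": ["arn:aws:s3:::meridian-logs-shipping/*"],
                          "Condition": {"Bool": {"aws:SecureTransport": "true"}}}]}
```

A service-wide grant such as `s3:*` would pass this simple test, so a stricter reviewer would also flag per-service wildcards.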

10. Field-by-field: eventName: StopLogging on eventSource: cloudtrail.amazonaws.com means someone turned off the CloudTrail audit log; userIdentity shows the IAM user svc-deploy in account 123456789012; sourceIPAddress 198.51.100.23 is where the call came from; errorCode: null means it succeeded. One-sentence summary: the deploy service account just successfully disabled the account's audit logging at 03:11 UTC. Yes — page the on-call analyst immediately. Disabling logging is one of the highest-fidelity alerts there is: there is almost no legitimate reason to do it, and it is a classic move by an attacker who has gained access and wants to cover their tracks. The fact that a service account (not a human) did it, at 03:11, makes it more suspicious, not less — and a service account should arguably never have permission to call StopLogging at all (least privilege).
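The paging decision can be written as a small predicate over the parsed record. This is a sketch under the assumption that the event arrives as a dict with the field names quoted above; `should_page` and the one-entry event set are hypothetical.

```python
HIGH_FIDELITY_EVENTS = {("cloudtrail.amazonaws.com", "StopLogging")}

def should_page(event):
    """Page when a logging-tampering call succeeded (no errorCode on the record)."""
    key = (event.get("eventSource"), event.get("eventName"))
    return key in HIGH_FIDELITY_EVENTS and event.get("errorCode") is None

event = {
    "eventSource": "cloudtrail.amazonaws.com",
    "eventName": "StopLogging",
    "userIdentity": {"type": "IAMUser", "userName": "svc-deploy",
                     "accountId": "123456789012"},
    "sourceIPAddress": "198.51.100.23",
    "errorCode": None,
}
```

A denied attempt would not page under this predicate, though it is arguably worth a lower-severity alert because it shows someone tried.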

12. (a) The chain: the attacker exploits the SSRF bug to make the server fetch a URL of the attacker's choosing → they point it at the instance metadata service at http://169.254.169.254/latest/meta-data/iam/security-credentials/... → the metadata service returns the VM role's temporary IAM credentials → because the role grants Action: *, those credentials confer full account control → the attacker enumerates and downloads every bucket and the database (total breach). (b) Three independent controls, each of which alone breaks the chain: - IMDSv2 enforced (application/instance layer) — requires a session token and blocks the simple GET-based SSRF path to metadata. - Least-privilege instance role (identity layer) — a stolen role credential is then bounded to exactly what the app needs (e.g., read one bucket), not the whole account. - Fix the SSRF (application code layer, Chapter 13) — no attacker-controlled fetch in the first place. This is the defense-in-depth argument exactly: three layers, any one of which would have stopped a total-compromise that a single web bug plus a single over-broad role would otherwise have caused.

16. Least-privilege IAM policy for the Lambda (read meridian-statements, write meridian-statements-processed, TLS required, nothing else):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::meridian-statements",
        "arn:aws:s3:::meridian-statements/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "true" } }
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": ["arn:aws:s3:::meridian-statements-processed/*"],
      "Condition": { "Bool": { "aws:SecureTransport": "true" } }
    }
  ]
}

Notes graders should look for: s3:ListBucket is granted on the bucket ARN (no /*) while GetObject/PutObject are on the object ARN (/*) — a common point of confusion; read and write are split into separate statements scoped to different buckets (the job cannot write to the source or read the destination); and the TLS condition is present. The policy passes cloudpost.iam_overbroad as not over-broad (no wildcard action+resource).

21. The leaked key. (a) First fifteen minutes, in order: (1) deactivate/delete the exposed access key (the AKIA... key) so it can no longer be used — containment first; (2) assume the account is compromised given the key had * permissions, and engage incident response; (3) search CloudTrail for every action taken with that access key ID to scope what, if anything, was done; (4) rotate any other credentials that may have been exposed alongside it and purge the secret from the repository history (not just the latest commit). (b) Search CloudTrail filtered on the leaked access key ID in userIdentity: review the source IPs and event names. Activity from the deploy job's expected IP doing expected actions is normal; an unfamiliar source IP, a geographic region the deploy job never runs from ("impossible travel"), or actions the job never performs (e.g., CreateUser, ListBuckets across the account, GetObject on buckets unrelated to deployment) indicate the key was used by someone other than the job — a breach. (c) Three preventive controls, one per layer: code — secret scanning in CI/pre-commit (Chapter 20) so the key is never committed; identity — never issue long-lived AKIA... keys for this purpose; use a role with short-lived credentials so there is no durable secret to leak, and scope it least-privilege so even a leak is bounded; detection — alert on CreateAccessKey and on use of any access key from an unfamiliar IP/region. (The key shown is AWS's published documentation example, never a real credential.)


Chapter 16

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design tasks, or discussed in class.

1. The three factors and their defining weaknesses: - Knowledge (something you know) — passwords, PINs, security questions. Weakness: can be copied without taking anything (phished, guessed, observed, or leaked), often without the owner noticing. - Possession (something you have) — hardware keys, authenticator apps, smart cards, a SIM. Weakness: only as strong as how possession is proven (a readable code can be relayed; a hardware signature cannot). - Inherence (something you are) — fingerprint, face, iris, voice. Weakness: irrevocable after compromise, and on a network only as trustworthy as the device that measured it.

4. Assurance mapping: - (a) password only → AAL1 (single factor). - (b) password + push with number matching → AAL2 (MFA; not phishing-resistant, so not AAL3). - (c) FIDO2 hardware key only → AAL3 (phishing-resistant, hardware-based; a single FIDO2 authenticator satisfies AAL3 because it proves possession of a private key bound to the origin). - (d) password + SMS OTP → AAL2 (MFA, but the weakest form; SMS is SIM-swap- and relay-vulnerable). - (e) a synced passkey → AAL2 (MFA and phishing-resistant; typically treated as AAL2 rather than AAL3 because the private key can be synced/exported via the cloud account rather than being hardware-bound). A Meridian asset for AAL3: money movement (the wire-transfer system) or domain-administrator access.

6. Storage verdicts: - (a) plaintext — unacceptable; one breach exposes every credential, and reuse compromises other sites. - (b) MD5(password) — unacceptable; fast and (as used) unsalted, so rainbow tables and GPUs crack it trivially. MD5 is also cryptographically broken. - (c) SHA256(password) unsalted — unacceptable; fast and unsalted, so precomputation and fast brute-force both apply. - (d) SHA256(salt + password) — weak/insufficient; the salt defeats rainbow tables, but SHA-256 is still fast, so an attacker can try billions of guesses per second on a GPU. Salting alone is not enough. - (e) bcrypt with per-user salt and a real work factor — acceptable; deliberately slow. - (f) Argon2id — best; deliberately slow and memory-hard (blunts GPU/ASIC parallelism). Preferred.
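The "per-user salt plus deliberately slow, memory-hard function" pattern is sketched below. Argon2id and bcrypt need third-party packages, so this example swaps in scrypt from the standard library as a stand-in with the same properties; the cost parameters are illustrative assumptions, not a tuned recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) using scrypt with a fresh 16-byte salt per user."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse battery staple")
```

Both the salt and the digest are stored with the account; the salt is not secret, it only ensures that identical passwords produce different records.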

8. Entropy ($H = L \times \log_2 N$): - 10-char random over 94 symbols: $10 \times 6.55 = 65.5$ bits. - 4-word diceware (7,776-word list): $4 \times 12.92 = 51.7$ bits. So the 10-character random password is stronger (~65.5 vs ~51.7 bits). However, a human-chosen "P@ssw0rd1" has far less effective entropy than the $9 \times \log_2 94 \approx 59$ bits the formula would assign its nine characters, because humans do not choose uniformly at random — they cluster on a tiny, predictable subset (dictionary words, a capital at the front, a digit/symbol at the end). The formula reasons about the alphabet; the attacker reasons about what people actually pick. That gap is exactly why breached/common-password screening matters more than any composition rule.
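The two figures can be reproduced directly from the formula. `entropy_bits` is a hypothetical helper name; the calculation assumes every symbol is drawn uniformly at random, which is exactly the assumption a human-chosen password violates.

```python
import math

def entropy_bits(length, alphabet_size):
    """H = L * log2(N) for `length` symbols drawn uniformly from `alphabet_size` choices."""
    return length * math.log2(alphabet_size)

random_password = entropy_bits(10, 94)    # 10 characters over 94 printable symbols
diceware = entropy_bits(4, 7776)          # 4 words from a 7,776-word list

print(f"random: {random_password:.1f} bits, diceware: {diceware:.1f} bits")
```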

11. Phishability ranking (most → least) and resistances: - SMS OTP — most phishable AND additionally SIM-swappable. Resists: nothing the others don't; it is the weakest. Vulnerable to: SIM swap, real-time relay/phishing. - TOTP — phishable by relay, but resists SIM swap (no phone number to hijack). Vulnerable to: real-time relay/phishing. Resists: SIM swap, credential stuffing. - push with number matching — least phishable of the three (no code to type; the required number must be read off the genuine screen). Resists: push fatigue (mostly), SIM swap. Vulnerable to: sophisticated relay — it is better, not phishing-resistant. None of the three is truly phishing-resistant; only FIDO2/passkeys are.

13. A push-fatigue (MFA-fatigue) attack: the attacker, already holding the password, repeatedly initiates logins, firing a stream of "Approve sign-in?" push prompts at the victim's phone — sometimes dozens, often at inconvenient hours — until the exhausted, confused, or annoyed user taps Approve just to stop the noise. Every cryptographic check still "passes" because nothing cryptographic was broken: the password was correct and the user genuinely approved the prompt; the attack defeated the human, not the protocol. Number matching defeats it by replacing the bare Approve/Deny with a two-digit number the login screen displays and the user must type into the app. A reflexive tap no longer works: the number appears only on the screen of whoever started the login, so a victim who did not start it has no number to enter, and the attacker, who can see the number, cannot type it into the victim's app. The only person who can complete the approval is someone who both sees the genuine login screen and holds the enrolled phone, i.e., the legitimate user initiating a real login.

16. Two independent reasons a phishing-captured FIDO2 signature fails at the real site, tied to the WebAuthn flow: 1. No reusable secret (steps 4–5 of Figure 16.2): the user's response is a cryptographic signature over the server's challenge, not a code or password. A flawless fake page captures the signature, but it is single-use over that specific challenge and reveals nothing reusable — there is no shared secret to steal. 2. Origin binding (steps 3, 5, 7): the browser supplies the origin (the real address-bar domain), the authenticator signs challenge + origin, and the real site verifies that the signed origin matches its own domain. A signature produced for the phishing origin (meridian-bank-login.example) is rejected by the real site (meridianbank.example) because the origins differ. (And the authenticator holds no key registered for the look-alike domain in the first place, so for a truly different domain there is no credential to offer at all.)
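The server-side half of both reasons can be illustrated with the client-data check alone. This is a simplified sketch, not a WebAuthn implementation: signature verification against the registered public key is omitted, the `https://` scheme on the origin is an assumption, and `client_data_ok` is a hypothetical helper.

```python
import base64
import json
import secrets

EXPECTED_ORIGIN = "https://meridianbank.example"

def b64url(data):
    """Base64url without padding, the encoding used for the challenge in client data."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def new_challenge():
    # Fresh random challenge per login attempt, so a captured response cannot be replayed.
    return secrets.token_bytes(32)

def client_data_ok(client_data_json, issued_challenge):
    """Check the signed client data names this login's challenge and the real origin."""
    data = json.loads(client_data_json)
    return (data.get("challenge") == b64url(issued_challenge)
            and data.get("origin") == EXPECTED_ORIGIN)

challenge = new_challenge()
genuine = json.dumps({"type": "webauthn.get", "challenge": b64url(challenge),
                      "origin": "https://meridianbank.example"})
phished = json.dumps({"type": "webauthn.get", "challenge": b64url(challenge),
                      "origin": "https://meridian-bank-login.example"})
```

Because the origin is written by the browser and covered by the signature, the phishing page cannot alter it without invalidating the signature.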

18. Hardware key vs. synced passkey:

| | Hardware security key (device-bound) | Synced passkey |
|---|---|---|
| Private key lives | On the device only; non-exportable | In a cloud credential manager, synced across devices |
| Recovery if lost | Backup key / recovery flow | Restored from the cloud account on a new device |
| Best for | Highest-assurance accounts (admin, money) — clears AAL3 | Broad workforce/consumer rollout; usability + self-recovery |
| Principal residual risk | Provisioning + backup-key logistics; cost | Only as strong as the cloud account that syncs it — harden that account |

Recommendation: (a) general workforce → synced passkeys (usability and self-service recovery are decisive at scale; still phishing-resistant); (b) domain administrators → device-bound hardware keys (you want a private key that physically cannot leave the device and AAL3 assurance for crown-jewel access). Use both, matched to tier.

24. Auth-log analysis: - (a) Credential stuffing — many distinct usernames, each tried once, from IPs spread across the whole range, with a low (~2%) but nonzero success rate. - (b) The two strongest indicators: the enormous count of distinct usernames in a short window, and the low-but-nonzero success rate (an attacker replaying known-valid-elsewhere passwords lands a small fraction). The spread of source IPs across the /24 is a third (evasion of per-IP limits). - (c) Per-account lockout largely misses it because each account is only tried once or twice — under any reasonable lockout threshold. Lockout watches one account at a time; stuffing's signal is across many accounts. - (d) The single most decisive control: MFA (ideally phishing-resistant), which makes the success rate irrelevant — a correct stolen password alone no longer completes a login. (Breach-password screening at set-time is the complementary root-cause control.)
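The cross-account signal in (b) and (c) can be expressed as a per-window check. The thresholds and the `(username, src_ip, success)` tuple layout below are assumptions chosen for illustration, not values from the exercise.

```python
from collections import Counter

def stuffing_signal(attempts, min_users=100, max_avg_tries=2.0):
    """Flag a window with many distinct usernames, each tried only a few times.

    attempts: list of (username, src_ip, success) tuples from one time window.
    """
    tries = Counter(user for user, _, _ in attempts)
    if not tries:
        return False
    avg_tries = len(attempts) / len(tries)
    return len(tries) >= min_users and avg_tries <= max_avg_tries

# 500 different usernames, one try each, spread over a /24.
stuffing = [(f"user{i}", f"198.51.100.{i % 254 + 1}", i % 50 == 0) for i in range(500)]
# One username hammered 500 times: brute force, which per-account lockout already handles.
brute_force = [("alice", "198.51.100.7", False)] * 500
```

The check looks across accounts, which is the view a per-account lockout counter never has.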

26. Impossible-travel analysis: - (a) The detection is impossible travel — two successful authentications from geographically incompatible locations within a time too short to physically travel between them. - (b) Two underlying compromises: (i) the attacker stole a valid session token (e.g., via an AITM relay) and is replaying it from their own location; (ii) the attacker phished/relayed the credentials and is logging in directly from elsewhere while the real user is still active. - (c) Benign false-positive cause: the user is on a VPN or routed through a corporate egress / cloud exit node that geolocates to a different region, or two devices (phone on cellular, laptop on VPN). Tune by allow-listing known VPN/egress IP ranges and corporate proxies, and by weighting the alert with other signals (new device, unusual app) rather than firing on geography alone.
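The detection in (a) and the allow-list tuning in (c) reduce to a speed check. A minimal sketch, assuming each login record carries a geolocated latitude/longitude, a timestamp in hours, and a source IP. The 900 km/h ceiling, the record layout, and the sample logins are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumed ceiling, roughly airliner cruising speed


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, allowlisted_ips=frozenset()):
    """Each login is a dict with lat, lon, hours (timestamp in hours), and ip."""
    if login_a["ip"] in allowlisted_ips or login_b["ip"] in allowlisted_ips:
        return False  # known VPN/egress address: its geolocation is not meaningful
    km = distance_km(login_a["lat"], login_a["lon"], login_b["lat"], login_b["lon"])
    elapsed_h = max(abs(login_b["hours"] - login_a["hours"]), 1e-6)  # avoid dividing by zero
    return km / elapsed_h > MAX_PLAUSIBLE_KMH


# Hypothetical sample: two successful logins one hour apart.
chicago = {"lat": 41.88, "lon": -87.63, "hours": 0.0, "ip": "198.51.100.10"}
lagos = {"lat": 6.52, "lon": 3.38, "hours": 1.0, "ip": "203.0.113.77"}
```

In practice the boolean would feed a score alongside the other signals named in (c) (new device, unusual app) rather than raise an alert by itself.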

28. Example one-page authentication standard (AAL-by-asset-tier), abbreviated: - Tier 0 — public/low-value (marketing site, read-only info): AAL1; password permitted; breach-screening on. - Tier 1 — workforce productivity (email, M365, general apps): AAL2 min; password + MFA; number matching and context required on any push; phasing to passkeys. - Tier 2 — customer banking / teller systems: AAL2; migrate customers and tellers to passkeys; SMS OTP fallback only during sunset, never the sole high-value factor; smart lockout + breach-screening on. - Tier 3 — money movement, privileged/admin, core/CDE access: AAL3 — phishing-resistant FIDO2/hardware MFA required, no exceptions; phishable fallbacks disabled and audited. Storage & policy clauses: Argon2id with per-user salt and tuned work factor; no mandatory periodic expiration (force change only on evidence of compromise); length over composition; breach-corpus screening at set-time; password managers and paste explicitly allowed; smart (source/behavior-aware) throttling. Two residual risks the standard creates: (1) account recovery / the help desk becomes the new soft target (an attacker who can't phish the login attacks the reset flow) — mitigate with strong recovery verification, second-person approval for Tier 3, cool-downs, and backup keys as the primary recovery path; (2) synced-passkey cloud accounts must themselves be hardened, since a compromised cloud account can misuse the passkeys.

32. Example three rows of an authentication risk register (judgments vary; justification matters): 1. Push-fatigue approval grants admin access · AD/Entra admin accounts · L3 · I5 · 15 CRITICAL · move admins to phishing-resistant FIDO2; enable number matching meanwhile. 2. Credential stuffing succeeds against customer accounts via reused passwords · online banking · L4 · I4 · 16 CRITICAL · breach-password screening + risk-based step-up MFA + success-rate alerting. 3. SIM swap defeats SMS OTP on a high-value customer, enabling account takeover · customer accounts · L2 · I4 · 8 HIGH · migrate customers to passkeys; require step-up phishing-resistant approval for money movement; reduce reliance on SMS as the sole factor.

33. The "compliant" disaster memo (CTF). The vendor's claim — "12-char passwords, full complexity, 30-day rotation, hashed with SHA-256" — is weaker than it sounds on both axes: - Storage: "hashed with SHA-256" is the headline failure. SHA-256 is a fast, general-purpose hash; if it is unsalted, a database breach falls instantly to rainbow tables, and even salted it is brute-forced at billions of guesses per second on a GPU. There is no work factor and no memory-hardness. A breach of this vendor would likely expose a large fraction of passwords quickly — and because users reuse passwords, it would compromise their accounts elsewhere too, turning the vendor's breach into a multi-service incident. - Policy: full complexity + 30-day rotation is the old, discredited model. Composition rules push users to predictable patterns (Password1!); aggressive 30-day rotation pushes them to weak incremental changes (Spring24! → Summer24!) and to writing passwords down. Worse, the policy almost certainly does not screen against breached/common passwords — so a fully "compliant" Companyname1! sails through while sitting on every attacker's wordlist. What a breach actually costs: rapid mass password recovery → account takeovers at the vendor and at every service where users reused the password → regulatory and reputational fallout. The three highest-priority changes, in order: (1) replace SHA-256 with a salted, memory-hard hash (Argon2id/bcrypt/scrypt) — makes a breach survivable; (2) screen passwords against breached/common lists at set-time — removes the ammunition for guessing and stuffing; (3) add MFA (ideally phishing-resistant or risk-based step-up) and drop forced 30-day rotation in favor of change-on-compromise — defeats stolen-password reuse and stops training weak passwords. The lesson: "complexity + rotation + we hash" can all be literally true and still describe a badly insecure system; the right questions are which hash (slow + salted?) and do you screen what people actually pick?
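The storage point can be made concrete with the standard library. This sketch contrasts an unsalted SHA-256 digest with a salted scrypt hash, one of the memory-hard options named in change (1). The cost parameters in `SCRYPT_PARAMS` and the helper names are assumptions chosen for illustration, not recommended production settings.

```python
import hashlib
import hmac
import os

# Assumed cost settings for illustration; tune to your own hardware and latency budget.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def unsalted_sha256(password: str) -> str:
    """The vendor's approach: fast, and identical for every user with this password."""
    return hashlib.sha256(password.encode()).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt and a memory-hard function."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Two users who both pick `Companyname1!` get the same `unsalted_sha256` value, so one precomputed table entry cracks both. With `hash_password` their stored digests differ, and each guess costs the attacker the full scrypt work factor.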


Chapter 17

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Authentication — verifying a claimed identity (who you are). Authorization — determining what that verified identity may do. Accounting — recording what the identity actually did. Combined sentence: "A wire operator proves she is herself with a security key (authentication), the system confirms her role permits her to initiate but not approve the wire (authorization), and the action is written to an immutable log noting who entered it and when (accounting)."

4. Privilege creep is the gradual accumulation of access rights an individual collects as they change roles over time, without the access from prior roles being removed. The asymmetry driving it: granting access removes a blocker and has an eager champion (someone is asking, work is urgent, "yes" is frictionless), while revoking access has no champion (nobody files a ticket to lose access, and the person who changed jobs does not volunteer to give up old permissions). The biggest creep generator is the mover event (internal transfer): organizations reliably remove access on departure (leaver) but reliably forget to remove the old role's access on a role change, so old and new permissions coexist.

8. (a) No — pure RBAC keys only on role membership and cannot express the ward relationship, the shift window, or the device condition. (b) ABAC can. The conditions rely on: "assigned to that patient's ward" = a subject↔resource relationship attribute (nurse.ward == patient.ward); "during the nurse's shift" = an environment attribute (current_time within nurse.shift); "from a managed workstation" = an environment/device attribute (device.managed == true). (c) permit read(chart) if subject.role == "Nurse" AND subject.ward == resource.ward AND now ∈ subject.shift AND env.device.managed == true. (The common production form is RBAC for the role check plus ABAC conditions for ward/shift/device.)
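The rule in (c) translates almost line for line into code. A minimal sketch, assuming a shift that does not cross midnight. The `Subject` dataclass, the function name `may_read_chart`, and the sample ward label are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Subject:
    role: str
    ward: str
    shift_start: time
    shift_end: time


def may_read_chart(subject: Subject, patient_ward: str, now: time, device_managed: bool) -> bool:
    """Permit only when the role check and all three attribute conditions hold."""
    # Simplifying assumption: the shift starts and ends on the same calendar day.
    in_shift = subject.shift_start <= now <= subject.shift_end
    return (
        subject.role == "Nurse"            # the RBAC part
        and subject.ward == patient_ward   # subject-resource relationship attribute
        and in_shift                       # environment attribute: time
        and device_managed                 # environment attribute: device
    )


# Hypothetical nurse on ward 4B working 07:00 to 19:00.
nurse = Subject(role="Nurse", ward="4B", shift_start=time(7, 0), shift_end=time(19, 0))
```

Dropping any one of the three attribute conditions leaves a role-only check, which is the pure-RBAC limitation described in (a).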

10. "Strong authentication makes weak authorization worse" because strong authentication reliably attaches over-broad access to a confirmed identity: the system is certain who the user is and then lets that user reach far more than their job needs, so a compromise of that account yields exactly the broad access, with no uncertainty about identity to slow the attacker. Concrete scenario: "A support engineer's account has standing read access to all 400 customer tenants. Flawless phishing-resistant MFA guarantees the account truly belongs to that engineer — so when the attacker steals a live session, the well-authenticated, over-authorized account hands them all 400 tenants' data with no escalation needed." The strong authentication did nothing to limit the blast radius; only least-privilege authorization would have.

13. A reasonable RBAC design (judgments vary; the reasoning matters):

| Role | Inherits | Adds | Must NOT have |
|---|---|---|---|
| All_Staff | (base) | email, intranet, time-clock | any banking access |
| Cashier | All_Staff | deposit, withdrawal, read_member_balance | reversals; payment setup |
| Lead_Cashier | Cashier | reverse_txn (≤ limit) | payment setup/release |
| Member_Services | All_Staff | open/close account, update_member_info | reversals; payment setup |
| Branch_Manager | Lead_Cashier | approve_exception, read_reports | set up payments; approve own approvals |
| Payments_Setup | All_Staff | create_payee, enter_ach_payment | release_ach_payment |
| Payments_Release | All_Staff | release_ach_payment | create_payee / enter_ach_payment |

(a) Base role All_Staff factors out universal access. (b) Hierarchy: Lead_Cashier = Cashier + a reversal; Branch_Manager builds on Lead_Cashier. (c) The required SoD split is in the back-office payments function: setting up/entering a payment and releasing it must be different roles (Payments_Setup vs. Payments_Release), assigned to different people, because one person who can both create a payee and release a payment to it can commit fraud alone — exactly the toxic combination the chapter centers on. The provisioning system should refuse to assign both payment roles to one identity.
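The provisioning-time refusal in (c) can be sketched as a check on the proposed role set. This is an illustration only: `CONFLICTING_ROLES`, `SoDViolation`, and `assign_role` are hypothetical names, and a real identity-governance product would express the rule in its own policy language.

```python
# Role pairs that must never be held by the same identity (from answer (c)).
CONFLICTING_ROLES = [frozenset({"Payments_Setup", "Payments_Release"})]


class SoDViolation(Exception):
    """Raised when a role assignment would create a forbidden combination."""


def assign_role(current_roles: set, new_role: str) -> set:
    """Return the new role set, or refuse if it would contain a conflicting pair."""
    proposed = set(current_roles) | {new_role}
    for pair in CONFLICTING_ROLES:
        if pair <= proposed:  # both halves of the pair would be present
            raise SoDViolation(f"{new_role} conflicts with an existing role: {sorted(pair)}")
    return proposed
```

The check runs on the whole proposed set, so it refuses the second role of a pair regardless of which one was granted first.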

17. (a) Stale grants relative to the current Senior Teller job: Branch_Ops (6-week coverage ended in 2021), Reporting_Admin (project closed Q4 2022), Wire_Approver (3-week coverage ended). The appropriate current grants are Senior_Teller (and redundant-but-harmless Teller). (b) Toxic combination: Branch_Ops contains initiate_wire and Wire_Approver contains approve_wire, so the account holds both wire permissions — it can initiate a wire and approve its own wire, moving money out with no second person. It is dangerous because it defeats segregation of duties: one compromised or dishonest account can complete a high-risk, irreversible action alone. (c) The control that should have removed each: Branch_Ops and Wire_Approver were temporary coverage grants that should have been time-boxed (auto-expiring) or removed by a mover/coverage-end process; Reporting_Admin should have been removed by a project-offboarding / time-boxed project access control; and all three should have been caught by a periodic access review reading across the account's row and an automated SoD combination scan.

19. Example response: "Copying an existing employee's access clones whatever that employee has accumulated over their whole career — including old permissions from past roles that they should have lost but never did — so we'd be handing a brand-new hire a pile of access their job doesn't need on day one, and quietly recreating any segregation-of-duties problems that person had. Instead we'll provision the new hire from the clean role that matches their actual job, so they get exactly what the job requires and nothing more. If they turn out to need something extra, we add that one role with a justification — which is auditable, unlike an inherited mystery bundle."

21. Example segregation-of-duties policy section: - No single identity may hold both initiate_wire and approve_wire; these are enforced as mutually-exclusive roles. - The following permission combinations are likewise forbidden on any one identity: (create_vendor + approve_payment), (create_payee + release_payment), (modify_payroll + approve_payroll). - Forbidden combinations are blocked at provisioning (the access system refuses to grant the second of any conflicting pair) and at runtime (the policy decision point refuses the conflicting act). - For high-risk actions, the system enforces self-approval prevention: the approver of a transaction must not be the person who initiated it, regardless of roles held. - Transactions above defined thresholds require a second, distinct approver (dual approval). - Combined-entitlement (cross-role) access reviews are run quarterly for all roles touching money movement, cardholder data, or administrative access; each review affirmatively recertifies or revokes, and an automated SoD scan flags any new conflicting combination.

23. See code/exercise-solutions.py for the runnable solution. The function returns True only if (1) rbac_check(approver_roles, "approve_wire") is true, (2) approver_id != wire["initiated_by"], and (3) for amount > THRESHOLD, a distinct second_approver is present. Hand-traced: - Allowed: approver a.khan (holds Wire_Approver), wire initiated by j.ortiz, amount 50,000 (≤ threshold) → all checks pass → True. - Denied: approver j.ortiz equals wire["initiated_by"] → self-approval check fails → False. The key point: rule (2) is a runtime check at the PDP, so it protects the action even if a single account somehow holds both wire roles — the dynamic layer of segregation of duties.
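For readers without the code file at hand, the three rules can be sketched as follows. This is not the contents of code/exercise-solutions.py; it is an independent sketch in which the `THRESHOLD` value, the `ROLE_PERMISSIONS` mapping, the `Wire_Initiator` role name, and the function name `can_approve_wire` are assumptions.

```python
THRESHOLD = 100_000  # assumed value; the exercise only implies that 50,000 is at or below it

# Hypothetical role-to-permission mapping for the trace.
ROLE_PERMISSIONS = {
    "Wire_Approver": {"approve_wire"},
    "Wire_Initiator": {"initiate_wire"},
}


def rbac_check(roles, permission):
    """True if any held role grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def can_approve_wire(approver_id, approver_roles, wire, second_approver=None):
    # Rule 1: the approver must hold approve_wire through some role.
    if not rbac_check(approver_roles, "approve_wire"):
        return False
    # Rule 2: no self-approval, whatever roles the account holds.
    if approver_id == wire["initiated_by"]:
        return False
    # Rule 3: above the threshold, a distinct second approver is required.
    if wire["amount"] > THRESHOLD:
        if second_approver is None or second_approver in (approver_id, wire["initiated_by"]):
            return False
    return True


wire = {"initiated_by": "j.ortiz", "amount": 50_000}
```

The second hand-traced case is denied even if `j.ortiz` is given both roles, because rule 2 compares identities rather than role membership.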

25. Reading down the Wire:approve column answers "which roles can approve a wire?" — the resource-owner question. In the broken matrix it reveals that Senior_Teller and Branch_Manager can approve wires (alarming — wire approval should be confined to a dedicated, separated role, and these roles also hold Wire:init). Reading across the Branch_Manager row answers "what can a branch manager do everywhere?" — the reviewer question. It reveals that the manager role holds both Wire:init and Wire:approve (a built-in toxic combination at the role level — anyone with that role can initiate and approve a wire alone), plus reversal and reporting powers. The column read finds over-broad resource access; the row read finds the combination that violates segregation of duties.

28. First five responder steps (each exercises a Chapter 17 concept): 1. Confirm the authorization anomaly — verify from the access/role data whether this teller account even should hold approve_wire (it should not). (Authorization / least privilege.) If it does, privilege creep created the capability; if not, this is a deeper compromise. 2. Pull the accounting trail — examine the wire's full log: who initiated it, the amount, the payee, prior approvals. (Accounting.) Look first here to determine scope and whether money actually moved. 3. Check for self-approval / SoD breach — was the approver also the initiator, or did this complete a wire with no independent second party? (Segregation of duties.) 4. Evaluate the context signals — 02:14 local time and an unmanaged device are exactly the environmental attributes an ABAC policy would weight; treat the off-hours, off-device pattern as a strong indicator of a stolen session vs. legitimate work. (ABAC.) 5. Contain — disable/hold the account and any in-flight wire pending verification, and escalate per the IR plan (Chapter 24). Where to look first to distinguish creep-abuse from a stolen session: the accounting trail plus the device/session and authentication logs — a legitimate-but-creep account would show normal device/hours and a history; a stolen session shows anomalous device, off-hours, and a first-ever use of the privilege.

30. (a) The reporting service is a deputy holding broad read access to every database; by exposing an endpoint that runs arbitrary queries for any authenticated caller, it lets a low-privilege user borrow the service's wide authority — the user asks the service to read data the user's own account cannot, and the service, using its privileges rather than checking the caller's, returns it. That is the "confused deputy." (b) The violated principle is least privilege, and the deeper failure is missing authorization of the caller: the service authorizes by its own (the deputy's) privileges rather than by the caller's authority. (c) Two fixes: (i) tighten the deputy — give the reporting service only the minimum read access it actually needs, scoped per request, instead of standing access to everything; (ii) authorize the caller — before running a query, the service must check that the calling user is itself authorized for the requested data (pass and enforce the caller's identity/permissions, not just the service's), as shown in code/exercise-solutions.py. Fix (ii) is the essential one; fix (i) limits the blast radius if it is ever bypassed (defense in depth).


Chapter 18

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class. Computational solutions (8, 10, 23) are also encoded, runnable and hand-traced, in code/exercise-solutions.py.

1. Single sign-on (SSO) — the capability to authenticate once and then reach many applications without re-authenticating to each. Federation — SSO that crosses organizational boundaries via a pre-established trust, so one organization's users reach another's apps without the second managing their credentials. Directory service — a specialized database that is the authoritative source of truth for identities (e.g., Active Directory, Entra ID). Access certification — the periodic process of confirming each access grant is still appropriate or flagging it for removal. Combined sentence: "When the directory service still listed a departed contractor as an enabled user, their single sign-on identity continued to reach every federated application — until the quarterly access certification caught the orphan and revoked it."

4. An orphaned account is an account with no valid owner or business purpose — most commonly an enabled account for a person who has left. The three properties that make it valuable to an attacker out of all proportion to its count: (1) it is valid — it authenticates successfully, so it defeats the front door without any exploit; (2) it is unmonitored — nobody watches an account for a person who left, so misuse raises no alarm; and (3) it is often privileged or well-connected — it accumulated access during the owner's tenure and never lost it. Together these make an orphan a pre-positioned, trusted foothold ideal for lateral movement and privilege escalation.

5. The joiner-mover-leaver (JML) lifecycle manages an identity across its lifetime. Joiner: a new person who must be provisioned with an account and birthright access — failure mode: over-provisioning (granting more than the role needs, e.g., copying the last hire's access). Mover: a person who changes roles and whose access must be reconciled (grant new, revoke old) — failure mode: privilege creep (and the toxic segregation-of-duties combinations it can create) because the process only adds and never removes. Leaver: a departing person whose access must be fully deprovisioned everywhere — failure mode: the orphaned account left enabled because no leaver trigger fired or some system was missed.

8. Reconcile the AD-enabled list against the source of truth (HR + contractor roster). Orphans: - contractor_lee — flagged because there is no roster record (engagement ended four months ago, no leaver trigger fired). Strongest orphan signal. - svc_backup — flagged because it has no human owner listed; a service account with no owner/purpose (Chapter 20) should not exist until claimed and documented. - rgarcia — flagged because the person is terminated (HR shows they left last month) yet the account is still enabled. jdoe, asmith, mwilson, and bchen all match active people and are kept. (Encoded as ex08_find_orphans() → ['contractor_lee', 'svc_backup', 'rgarcia'].)
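The reconciliation is a membership test of each enabled account against the authoritative sources. The sketch below is not the textbook's code file; the function name `find_orphans`, the data layout, and the ordering of the AD list are assumptions made so the example runs on its own.

```python
def find_orphans(ad_enabled, hr_status, contractor_roster, owned_service_accounts):
    """Return enabled accounts with no active person or documented owner behind them."""
    orphans = []
    for account in ad_enabled:
        active_employee = hr_status.get(account) == "active"
        active_contractor = account in contractor_roster
        owned_service = account in owned_service_accounts
        if not (active_employee or active_contractor or owned_service):
            orphans.append(account)
    return orphans


# Data as described in the answer; the list order is assumed.
ad_enabled = ["jdoe", "asmith", "contractor_lee", "mwilson", "svc_backup", "bchen", "rgarcia"]
hr_status = {
    "jdoe": "active",
    "asmith": "active",
    "mwilson": "active",
    "bchen": "active",
    "rgarcia": "terminated",
}
contractor_roster = set()        # contractor_lee has no roster record
owned_service_accounts = set()   # svc_backup has no owner listed

orphans = find_orphans(ad_enabled, hr_status, contractor_roster, owned_service_accounts)
```

The three flagged accounts fail for three different reasons (no roster record, no owner, terminated in HR), which is why the check consults every source of truth rather than HR alone.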

10. (a) Privilege creep (justified by a past role, not the current Loan Officer role): teller-app and cash-drawer-recon (from Teller) and branch-ops-admin (from Branch Operations) — plus wire-APPROVE, which no current role of theirs justifies at all. The current role legitimately needs only loan-origination and wire-INITIATE. (b) The dangerous pair is wire-INITIATE + wire-APPROVE: holding both lets one person initiate and approve a wire disbursement with no second party — a segregation-of-duties violation that enables a single individual to commit wire fraud unchecked, which is exactly the toxic combination a bank's controls exist to prevent. (c) The mover transition failed: each role change added new access without reconciling away the old, so entitlements accreted across six years. (Encoded as ex10_privilege_creep().)
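Parts (a) and (b) are set operations: creep is what the account holds minus what the current role needs, and the toxic pair is a subset test. A minimal sketch using the entitlement names from the answer; `ROLE_NEEDS`, `TOXIC_PAIRS`, and the function names are hypothetical.

```python
# What the current role legitimately needs, per the answer to (a).
ROLE_NEEDS = {"Loan Officer": {"loan-origination", "wire-INITIATE"}}

# Entitlement pairs that must not sit on one identity, per the answer to (b).
TOXIC_PAIRS = [frozenset({"wire-INITIATE", "wire-APPROVE"})]


def creep(held: set, current_role: str) -> list:
    """Entitlements held that the current role does not justify."""
    return sorted(held - ROLE_NEEDS[current_role])


def toxic_pairs(held: set) -> list:
    """Forbidden pairs that are fully present in the held set."""
    return [sorted(pair) for pair in TOXIC_PAIRS if pair <= held]


held = {
    "teller-app",
    "cash-drawer-recon",
    "branch-ops-admin",
    "wire-APPROVE",
    "loan-origination",
    "wire-INITIATE",
}
```

One half of the toxic pair (wire-INITIATE) is legitimately needed, so the pair only surfaces when the review looks at the combination and not at each entitlement in isolation.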

11. The four security-control fields: - <Issuer> — states who is vouching (the IdP). If the SP ignores it, it may accept an assertion from an untrusted issuer; the SP must check the issuer is the expected, trusted IdP. - <Signature> — cryptographically proves the assertion really came from that IdP and was not altered. Defends against forgery/tampering. If the SP does not verify it (or accepts unsigned assertions), an attacker who can craft or modify an assertion can log in as anyone — the classic SAML signature-bypass breach. - <Conditions NotBefore/NotOnOrAfter> — the short validity window. Defends against replay of a captured assertion: outside the window the SP must reject it. If ignored, a stolen assertion is valid indefinitely. - <Audience> — binds the assertion to one specific application. Defends against an assertion captured by one SP being replayed against a different SP. If not enforced, an assertion for app.example is accepted by other.example. The unifying lesson: the protocol is only as strong as the service provider's validation of it.
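The four checks can be lined up as a service-provider validation routine. This sketch works on an already-parsed assertion represented as a dict and treats the cryptographic signature result as a precomputed boolean, so it shows the decision logic only. The issuer and audience URLs, the dict keys, and the function name are assumptions, and a real deployment should rely on a maintained SAML library for parsing and signature verification.

```python
from datetime import datetime, timezone

TRUSTED_ISSUER = "https://idp.example/saml"  # hypothetical IdP identifier
OUR_AUDIENCE = "https://app.example/sp"      # hypothetical identifier for this SP


def validate_assertion(assertion: dict, now: datetime) -> list:
    """Return the list of failed checks; an empty list means the assertion is accepted."""
    failures = []
    if assertion.get("issuer") != TRUSTED_ISSUER:
        failures.append("issuer")        # not vouched for by the expected IdP
    if not assertion.get("signature_valid", False):
        failures.append("signature")     # unsigned or tampered assertions are rejected
    if not (assertion["not_before"] <= now < assertion["not_on_or_after"]):
        failures.append("conditions")    # outside the validity window: treat as replay
    if assertion.get("audience") != OUR_AUDIENCE:
        failures.append("audience")      # minted for a different application
    return failures


# Hypothetical well-formed assertion with a five-minute validity window.
good = {
    "issuer": TRUSTED_ISSUER,
    "signature_valid": True,
    "not_before": datetime(2024, 1, 1, 11, 58, tzinfo=timezone.utc),
    "not_on_or_after": datetime(2024, 1, 1, 12, 3, tzinfo=timezone.utc),
    "audience": OUR_AUDIENCE,
}
```

The routine defaults to rejection: a missing `signature_valid` key counts as a failed signature, which mirrors the rule that unsigned assertions must never be accepted.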

13. SAML SSO flow (service-provider-initiated), in order: (1) the user's browser requests the application (the SP); (2) the SP sees no session and builds a SAML AuthnRequest, redirecting the browser to the IdP; (3) the browser carries the request to the IdP; (4) authentication happens here — the IdP authenticates the user with password + phishing-resistant MFA; (5) the IdP builds and signs a SAML assertion and redirects the browser back to the SP's Assertion Consumer Service; (6) the browser POSTs the signed assertion to the SP; (7) the application verifies it here — the SP checks the IdP's signature, the audience, and the timestamps, then reads the user's identity; (8) the SP establishes a logged-in session. The one credential the application never receives is the user's password — it goes only to the IdP. The advantages: the app cannot leak a credential it never holds, MFA is enforced centrally, and disabling the single IdP identity cuts off every federated app at once.

16. A contractor account policy that makes CONTRACTOR_X structurally impossible: - System of record: every contractor must have a record in an authoritative contractor roster with a mandatory engagement end date; no roster record, no account. - Account creation: contractor accounts are created only from an approved roster entry, with least-privilege birthright access for the engagement and no standing privileged access. - Mandatory technical control: every contractor account is created with an expiration date in the directory matching the engagement end date (a fail-safe — it dies on schedule even if everyone forgets). - MFA: contractor accounts are not exempt from phishing-resistant MFA (the Eastfield exemption is the anti-pattern). - Deprovisioning triggers: the roster fires a leaver trigger on the end date (or early termination) that deprovisions the identity, and SCIM fans the disable out to integrated SaaS apps. - Safety net: disable-after-90-days-inactivity catches anything the triggers miss. - Review: contractor accounts are included in the quarterly access certification of sensitive entitlements, so a human confirms what automation missed.

21. First five actions for a still-enabled departed-engineer domain-admin account showing a recent login from an unfamiliar IP (a likely active compromise of the crown jewels): 1. Treat it as an incident, not just a cleanup — invoke the IR process (Chapter 24). A domain-admin account in unexplained use is potentially a full domain compromise; the scope is "everything." 2. Preserve evidence before destroying it — capture the authentication logs, the source IP, the sessions, and the account state now. This is the tension to address directly: you must cut off access, but a careless action can erase the evidence (Chapter 25). Snapshot first, then act — and disable (do not delete) the account so its history survives. 3. Cut the active access — disable the account at the authoritative source (on-prem AD), and revoke all active sessions/tickets (a disabled account can retain valid Kerberos tickets until they expire; force their invalidation). Rotate/observe what the account could reach. 4. Scope the blast radius — because it was domain admin, assume lateral movement: hunt for what the account touched, look for newly created accounts or changed group memberships (T1098), and check other privileged accounts (this is where Chapter 19's tiering and Chapter 22's hunting feed in). 5. Eradicate and harden — rotate credentials the account could have exposed (potentially including the krbtgt account in a confirmed domain compromise), then fix the governance root cause so a domain-admin account cannot survive an offboarding: certification of all privileged accounts, and the PAM controls of Chapter 19. The order balances speed against evidence by capturing state first, then cutting access immediately after — not by delaying containment, but by sequencing a fast snapshot ahead of the disable.

23. The account that shouldn't exist. Reconcile the four lists. Orphans (enabled in AD but no active HR record) = dave (transferred/left a year ago — note he is not in HR active), svc_etl (a service account, owner team Data Eng), and ghost_admin (no HR record, no owner, no documentation). Of these, the single most dangerous is the one that is simultaneously orphan + privileged + actively used: ghost_admin — it has no record anywhere (orphan), it is a member of Domain Admins (privileged), and it logged in within the last seven days (actively used). It is the textbook pre-positioned foothold: a forgotten, undocumented, highly privileged account that something is actively using. Remediation order: (1) ghost_admin first and as an incident — orphan + domain admin + live use is a probable active compromise; preserve evidence, disable at source, revoke sessions, scope (as in Ex. 21). (2) dave next — an orphaned account for a departed employee; disable at source (lower urgency because no recent privileged use is noted, but still an ex-employee orphan). (3) svc_etl — investigate, do not blindly disable: it is a claimed service account with an owner team, so confirm with Data Eng whether it is still needed and bring it under proper service-account governance (Chapter 20) rather than treating it as an orphan to kill. (Encoded as ex23_ctf() → orphans ['dave', 'ghost_admin', 'svc_etl'], most dangerous ['ghost_admin'].)


Chapter 19

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-oriented, or discussed in class. Code-traceable solutions also appear in code/exercise-solutions.py.

1. Privileged account — an account whose elevated permissions could compromise systems if misused. Credential vaulting — storing privileged secrets in an audited vault and brokering access so humans need not know a password to use it. Just-in-time (JIT) access — granting privilege only for a bounded window, auto-removed, so no standing access exists. Tiered administration — partitioning admins/systems into sensitivity tiers and forbidding higher-tier credentials from being exposed on lower tiers. Privileged access workstation (PAW) — a dedicated, hardened machine used only for administration (no email/web). Combined sentence: "To administer a domain controller (a Tier 0 system), an engineer uses a PAW under tiered administration, checks out the privileged account from credential vaulting, and holds it only for a short just-in-time window."

4. A break-glass account is a highly privileged account reserved for emergencies — when the PAM system, identity provider, or MFA is down and you must still be able to get in. You need one (or two) because locking yourself out of your own domain during an outage is a real, serious risk. Three rules that keep it from becoming a backdoor: (1) store the credential offline, in a sealed, physically secured location (a safe), long and random, not in the spreadsheet you were trying to retire; (2) alert loudly on any use — a break-glass login should page the SOC instantly, because its use is always either a genuine emergency or an attack; (3) test and rotate it on a schedule, and never exempt it from monitoring "because it's the emergency account." It is a deliberate, monitored exception, not an unwatched bypass.

7. At least eight categories of privileged account to hunt for, and where: (1) Domain/Enterprise Admins — AD privileged groups, resolving nested-group membership; (2) Local administrators — every workstation and server (look for a shared/identical local Administrator password); (3) Service accounts — accounts running services and scheduled tasks, especially any nested into privileged groups; (4) Database admins — sa/DBA accounts on each database, and credentials embedded in apps and runbooks; (5) Hypervisor/storage admins — vCenter, SAN management consoles; (6) Backup operators — the backup software (they can read everything by design); (7) Cloud privileged identities — the AWS root user and high-privilege IAM roles, Entra ID Global/Privileged Role admins; (8) Security-tool admins — the SIEM, EDR, and the PAM vault itself (the most-forgotten and most-dangerous). Bonus: accounts that can modify Group Policy (push code to every machine) and PKI/CA admins. The discipline: resolve group nesting and count paths to admin, not just the named admins.

10. By hand: - DOMAIN\da-jones: not vaulted → NOT_VAULTED; standing → STANDING_ACCESS; shared=False → (none); 200 days → STALE_CREDENTIAL. 3 flags. - Administrator: not vaulted → NOT_VAULTED; standing → STANDING_ACCESS; shared and not recorded → SHARED_NO_RECORDING; 410 days → STALE_CREDENTIAL. 4 flags. - svc-report: vaulted → (no NOT_VAULTED); standing → STANDING_ACCESS; not shared → (none); 30 days → (no STALE). 1 flag. Return order (most flags first): Administrator (4), DOMAIN\da-jones (3), svc-report (1). For the worst account (Administrator), the flags most directly enable rung 3 (credential harvesting) — it is a standing, shared, unrecorded credential that can be sitting on many hosts — and rung 5 (escalation) — because standing access means a harvested credential is immediately powerful with no further escalation needed. (Confirmed in code/exercise-solutions.py, ex10.)
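The hand trace follows four independent rules, which can be sketched as below. This is an independent illustration and not the textbook's code file. The 90-day staleness cutoff is an assumption (the trace only shows that 200 and 410 days are stale and 30 days is not), and the `recorded` values for the two non-shared accounts are assumed because the trace does not need them.

```python
STALE_AFTER_DAYS = 90  # assumed cutoff, consistent with the trace but not stated in it


def flags_for(account: dict) -> list:
    """Apply the four audit rules in the order used by the hand trace."""
    flags = []
    if not account["vaulted"]:
        flags.append("NOT_VAULTED")
    if account["standing"]:
        flags.append("STANDING_ACCESS")
    if account["shared"] and not account["recorded"]:
        flags.append("SHARED_NO_RECORDING")
    if account["credential_age_days"] > STALE_AFTER_DAYS:
        flags.append("STALE_CREDENTIAL")
    return flags


def rank(accounts: list) -> list:
    """Account names ordered by number of flags, most first."""
    ordered = sorted(accounts, key=lambda a: len(flags_for(a)), reverse=True)
    return [a["name"] for a in ordered]


accounts = [
    {"name": r"DOMAIN\da-jones", "vaulted": False, "standing": True,
     "shared": False, "recorded": False, "credential_age_days": 200},
    {"name": "Administrator", "vaulted": False, "standing": True,
     "shared": True, "recorded": False, "credential_age_days": 410},
    {"name": "svc-report", "vaulted": True, "standing": True,
     "shared": False, "recorded": False, "credential_age_days": 30},
]
```

Counting flags is a coarse ranking: it treats every flag as equally serious, so the ordering is a starting point for review and not a risk score.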

11. Weaknesses and fixes: - Shared domain admin password in a manager, 8 months old → vault the account with brokered access and check-in rotation; make it JIT-eligible, not standing (§19.2, §19.3). - Identical local Administrator password on every machine → deploy LAPS for a unique, rotated local-admin password per machine (kills the rung-4 shared-key lateral-movement highway) (§19.2). - Admins using domain admin accounts on their own laptops for email → tiering: separate Tier 0 account used only from a PAW; enforce logon restrictions denying Tier 0 accounts logon to Tier 2 (§19.4). This is the single most important fix — it closes rung 3. - Emergency domain admin in the same shared manager, exempt from rotation, no alerting → proper break-glass design: offline/sealed storage, long/random, alert on every use, scheduled test + rotation (§19.2). - No privileged-session recording → enable session recording on privileged sessions for accountability, audit, and forensics; consider real-time monitoring on Tier 0 (§19.5).

15. Three-tier model for Meridian (one acceptable placement): - Tier 0 (control plane): domain controllers; the PAM vault. (Also AD, PKI/CA, the IdP.) - Tier 1 (servers): the core-banking app servers. (Also databases, VMware, business apps.) - Tier 2 (workstations): user laptops; help-desk workstations. - AWS IAM root + high-privilege roles are a cloud control plane — treat as Tier 0 equivalent (their own privileged tier), administered only from a PAW with JIT. - Logon rule between tiers: a higher-tier credential is never used to log on to a lower tier; enforce with OS-level/auth-policy logon restrictions that deny Tier 0 accounts the right to authenticate to Tier 1 and Tier 2. PAW requirement for Tier 0: all Tier 0 administration is performed only from a dedicated, hardened PAW (no email/web, application allowlisting). Diagram per Figure 19.1.

19. Example PAM policy statement: "Meridian manages all privileged access — any account that can administer systems, identities, or data — under this standard. The default posture is no standing privileged access: privileged rights are granted just-in-time, for a bounded window, with approval for the most sensitive tiers, and removed automatically. All privileged credentials are held in an audited vault and rotated automatically, including after each use for the highest-risk accounts; no individual administrator knows the password of a Tier 0 account. Privileged sessions on sensitive systems are recorded for accountability and audit. Administration of the control plane is performed only from dedicated privileged access workstations under a tiered model in which higher-tier credentials are never exposed on lower-tier systems. A small number of break-glass emergency accounts exist for outages; they are stored offline and any use is alerted and investigated. The objective is that the compromise of a single account cannot lead to administrative control of the bank."

23. This is the §19.1 escalation ladder in progress — almost certainly an active intrusion moving toward domain admin. Reading it: at 02:14 a domain admin account (da-smith) authenticates to a Tier 2 laptop (ws-1147) with no vault checkout in the window — both anomalies (a Tier 0 credential on Tier 2, and out-of-band privileged access). It then authenticates to a Tier 1 database server, where the EDR flags LSASS memory access (credential harvesting, rung 3), then reaches a domain controller (rung 5/6). Strongest single indicator: the LSASS memory read on a server by a privileged account (D5) — it is unambiguous credential-theft tradecraft — though the out-of-band privileged logon with no checkout (D1) and the Tier 0 credential on a Tier 2 host (D2) are nearly as strong and fire earlier. Rungs seen: 3 (harvest), 4 (lateral movement, the chain of logons), heading into 5/6 (domain admin / dominance). This should be a CRITICAL, immediate IR escalation.

25. First five response actions, in order (bridges to Chapter 24): 1. Treat the privileged account as compromised and rotate/disable it immediately via the vault — kill the credential's usefulness now; do not wait. (If JIT, revoke any active activation.) 2. Terminate the active privileged session (real-time session monitoring) on the domain controller to stop further action while you investigate. 3. Isolate the source/affected hosts (the DC and any host the account touched) from the network to contain lateral movement, balancing against the operational impact of isolating a DC (escalate the decision). 4. Pull the session recording and the vault/auth logs to scope exactly what the account did and where it went — what was accessed, what was changed. 5. Page the incident commander and begin the formal IR process (Chapter 24), assuming domain compromise until proven otherwise (check for new accounts, GPO changes, backup tampering). The PAM artifact that most accelerates the investigation is the session recording (§19.5): it turns "we think the attacker may have reached the DC" into "here is exactly what they ran and touched," giving the responder a precise blast radius instead of a guess.

27. The invisible admin (svc-legacy-sync). Full risk: this is a standing, privileged (nested-into-Domain-Admins), unowned, never-rotated, unvaulted service account running on six servers, i.e., a ready-made rung-5 prize. Walking the ladder: an attacker who reaches any of the six servers (or finds the account's password in a config file / scheduled-task definition) obtains a credential that is effectively domain admin with no escalation needed, and because it has never rotated, even a years-old captured copy still works. It is especially dangerous because it is a non-human account: its credential is typically stored in plaintext, or only weakly protected, in config files and scripts where anyone with file access can read it; it does not get MFA; nobody watches it log in; and "an application needs it" deters anyone from touching it, so it persists indefinitely. Safe remediation plan: (1) do not blindly disable it, because that could break the unknown application; first discover what it actually does: which six servers, which scheduled tasks/services run as it, what it connects to, and what it truly requires. (2) Right-size it: determine the minimum privileges the application needs and remove it from Domain Admins (almost certainly it does not need domain-admin rights). (3) Vault and rotate the credential, updating the application configuration to retrieve it from the vault (or move to a managed/group-managed service account or workload identity; see Chapter 20). (4) Monitor it (alert on any interactive logon, which a service account should never do). What you must discover before you can safely disable or vault it: its real dependencies, meaning every place the credential is used and the exact permissions the application requires, because changing or rotating it without that knowledge will break production. The lesson: the most dangerous privileged account is the one nobody owns and nobody is watching.


Chapter 20

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Secret — any confidential value (password, key, token, certificate) that grants access or proves identity to a system. Machine identity — the identity a non-human entity (script, service, container, job) uses to authenticate. Workload identity — granting a workload access based on what it provably is and where it runs, rather than a secret it carries. Service account — an account used by a program or automated process rather than a human. Secret sprawl — secrets copied across many locations with no central control, inventory, or rotation. Combined sentence: "A hard-coded API key is secret sprawl the moment it is committed to source (now in code, history, and every clone), and it becomes a secret leak the instant that repository reaches a place an attacker can read it — turning sprawl into breach."

4. Non-human identities outnumber human ones (commonly cited 10×–50× in cloud-heavy environments) because every microservice, scheduled job, function, container, pipeline, device, and vendor integration needs its own identity to authenticate, and these are created far more freely than employees are hired — often ad hoc, with no lifecycle to retire them. Three contributing categories: (1) service/application accounts (services authenticating to databases and to each other); (2) workload identities (containers, functions, CI jobs); (3) device and integration identities (IoT/devices and third-party API integrations). The count compounds because, unlike employees, machine identities are rarely retired.

6. Three ways environment variables are weaker than a vault: (1) they are readable by any process running as the same user and are inherited by child processes, so a single compromised process or a printenv in an error handler leaks all of them; (2) they routinely end up in logs, crash reports, and debug endpoints; (3) they have no per-access audit trail, no automatic rotation, and no expiry — you cannot tell who read a secret or force it to change. One way they are nonetheless an improvement over hard-coding: the secret is no longer committed to source control, so it is not preserved in git history or in every developer's clone. (Env vars are a step, not a destination; the destination is the workload holding no secret and fetching a short-lived one from a vault at runtime.)
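
The inheritance point in (1) is easy to demonstrate. In this minimal sketch the variable name DEMO_API_KEY and its value are made up for illustration; the child process is never handed the value explicitly, yet it can read it.

```python
import os
import subprocess
import sys

# Hypothetical variable name and value, used only for this demonstration.
os.environ["DEMO_API_KEY"] = "not-a-real-secret"

# The child is a separate process. It receives the variable only because
# the parent's environment is inherited by default.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('DEMO_API_KEY'))"],
    capture_output=True,
    text=True,
    check=True,
)
leaked = child.stdout.strip()
print(leaked)
```

Any helper tool, shell-out, or crash handler started by the service sits in the same position as this child process, which is why the environment is a poor long-term home for a secret.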

8. Workload-identity redesign: instead of baking an IAM access key into the container image, run the container on AWS compute (e.g., ECS/EKS/EC2) with an IAM role attached, scoped to read-only access on the one S3 bucket the service needs. The container retrieves temporary, auto-rotating credentials from the platform's instance/container metadata service at runtime, so there is no static key in the image to leak, harvest, or rotate manually. Residual risk: a vulnerability in the application (classically SSRF, Chapter 13) could trick it into fetching its own credentials from the metadata service and returning them to an attacker. Mitigations: require the session-token-protected version of the metadata service (defeats naive SSRF), keep the role least-privileged (so stolen temporary creds can do little: read one bucket, not administer the account), and rely on the short credential lifetime plus behavioral detection. The key point: the long-lived secret is eliminated; the remaining risk is smaller, shorter, and detectable.

11. With now = 2026-06-14 (UTC), cert_days_left = (notAfter − now).days: - (a) 2026-06-20 → 6 days → triggers 30-day alert (in fact CRITICAL at ≤7). - (b) 2026-09-30 → 108 days → OK, no alert. - (c) 2026-06-10 → −4 days → already EXPIRED (also "alerts", but it is too late — a self-inflicted outage in progress). - (d) 2026-07-14 → 30 days → triggers the 30-day renewal alert (renew now; do not wait for it to drop further). Items (a), (c), and (d) require action; (b) does not yet.
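
The four values can be reproduced with the standard library. This is a sketch under stated assumptions: each notAfter is taken as midnight UTC on the given date, the status function and its RENEW label are hypothetical names, and the 30-day and 7-day thresholds are the ones quoted in the answer.

```python
from datetime import datetime, timezone

NOW = datetime(2026, 6, 14, tzinfo=timezone.utc)  # the "now" fixed by the exercise


def cert_days_left(not_after: datetime, now: datetime = NOW) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    return (not_after - now).days


def status(days_left: int) -> str:
    """Map days remaining onto the alert tiers used in the answer."""
    if days_left < 0:
        return "EXPIRED"
    if days_left <= 7:
        return "CRITICAL"
    if days_left <= 30:
        return "RENEW"      # hypothetical label for the 30-day renewal alert
    return "OK"


CERTS = {
    "a": datetime(2026, 6, 20, tzinfo=timezone.utc),
    "b": datetime(2026, 9, 30, tzinfo=timezone.utc),
    "c": datetime(2026, 6, 10, tzinfo=timezone.utc),
    "d": datetime(2026, 7, 14, tzinfo=timezone.utc),
}

for label, not_after in CERTS.items():
    days = cert_days_left(not_after)
    print(label, days, status(days))
```

Item (d) sits exactly on the 30-day boundary, so whether it alerts depends on the comparison being inclusive (<= 30), as it is here.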

13. Short-lived certificates are operationally more secure for two reasons. Renewal: a multi-day (or multi-hour) lifetime makes manual renewal impossible, which forces automation, and an automated renewal process does not forget a date the way a human-maintained spreadsheet does — so the most common outage (expiry) is structurally prevented. Revocation: revocation via CRL/OCSP is unreliable because clients frequently fail open (trust the certificate) when they cannot reach the revocation endpoint, so a compromised long-lived certificate may keep working despite being "revoked"; a short-lived certificate effectively revokes itself by expiring quickly, bounding the exposure window from a stolen private key regardless of whether revocation actually reaches every client. Short lifetimes turn a fragile control (revocation) into an automatic one (expiry).

15. Findings (kind, value): - aws_access_key_id, AKIAIOSFODNN7EXAMPLE → AKIA + 16 uppercase alphanumerics (IOSFODNN7EXAMPLE). - github_pat, ghp_EXAMPLEEXAMPLEEXAMPLEEXAMPLEEXAMPL → ghp_ + 36 chars. - private_key_block, -----BEGIN EC PRIVATE KEY----- → matches the private-key header pattern. The db_host value (10.20.0.15) is a private (RFC 1918) IP address, not a credential; the greeting and note lines contain no credential-shaped token. (The note "rotate me, set in 2019, never touched" is a comment, not a secret, though it is an excellent description of the very problem the chapter addresses.) Correct response to the real findings: rotate each secret and migrate to a vault or workload identity; also remediate history, but rotation is the action that closes the exposure.

17. A reasonable pattern: \bxox[bp]-[0-9A-Za-z\-]{10,}\b (matches xoxb-/xoxp- followed by at least 10 characters of the allowed set). True-positive example (fake): xoxb-EXAMPLE-FAKE-1234567890. False-positive risk: a naive pattern such as xox\w+ would match benign words or identifiers that merely start with "xox", and even this pattern could flag a non-secret string that happens to fit the shape. To reduce false positives: require the - separator and a minimum length (done here), validate the full known token structure where the format is more specific, and/or corroborate with entropy or context (e.g., the variable name). Operational note: confirm a hit is a real secret before declaring an incident, but for any value confirmed to be a true live secret, the response is to rotate — false-positive tolerance should never become a reason to skip rotating a real leak.
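
The pattern and the naive alternative can be compared directly with Python's re module. The sample strings other than the fake token from the answer are invented for illustration.

```python
import re

SLACK_TOKEN = re.compile(r"\bxox[bp]-[0-9A-Za-z\-]{10,}\b")
NAIVE = re.compile(r"xox\w+")  # the over-broad pattern the answer warns about

samples = [
    "token = xoxb-EXAMPLE-FAKE-1234567890",  # fake token from the answer
    "the xoxo sign-off in a chat message",    # benign text that starts with "xox"
    "xoxb-short",                             # right prefix, but too short
]

for text in samples:
    strict = SLACK_TOKEN.search(text)
    naive = NAIVE.search(text)
    print(
        f"{text!r}: strict={strict.group() if strict else None}, "
        f"naive={naive.group() if naive else None}"
    )
```

Only the first sample satisfies the stricter pattern, while the naive one also fires on the benign "xoxo". That difference is what the separator and minimum-length requirements buy.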

20. (a) The pattern — a credential that for years ran one boring job (02:00, one bucket, two write actions) now being used at 03:14 from a foreign IP to call ListAllMyBuckets and ListUsers — indicates a compromised/leaked credential being used for reconnaissance by an attacker enumerating what the key can see. (b) First two containment actions: disable (rotate) the access key to immediately sever access, and preserve the credential's full audit/CloudTrail history before anything changes, to scope what was touched. (c) The single action that actually stops the attacker's access is to rotate — disable the leaked key and issue a new credential; nothing else (deleting code, blocking the IP) reliably ends access, since the attacker holds the key and can move infrastructure. (d) The long-term fix from this chapter's standard: move the job to workload identity (an IAM role → no static key exists to leak) and scope it least-privilege to the one bucket; secondarily, the behavioral detection that fired is itself the preventive monitoring control to keep.

22. Example four-rule secrets-management standard (each enforceable and auditable): 1. Storage: "No secret shall be committed to source control, container images, or configuration files; all secrets shall be retrieved at runtime from the approved secrets vault." (Auditable by secret scanning of repos/images.) 2. Rotation: "Every static secret shall rotate automatically at least every 90 days; secrets that support dynamic issuance shall be issued short-lived (TTL ≤ 24 hours)." (Auditable by vault rotation records.) 3. Service-account privilege: "Every service account and machine role shall be scoped to the minimum permissions its workload requires, reviewed quarterly, and denied interactive logon." (Auditable by reviewing entitlements and logon-rights settings.) 4. Certificate expiry: "Every certificate shall be in a central inventory with automated expiry monitoring; certificates shall be renewed before expiry, with alerts at 30 and 7 days." (Auditable by the inventory and alert records.)

25. (a) Most likely attack hypothesis: an attacker who has compromised a workload identity (e.g., via a vulnerable app or a poisoned dependency in the pod) is performing lateral movement — probing the vault for high-value secrets it does not normally use, having found that this pod's identity is permitted (over-broadly) to read legacy/partner-api-key. (b) Two anomalies that should have been detections: (1) first use of a long-dormant secret (legacy/partner-api-key, untouched for 414 days, suddenly requested) and (2) a workload requesting a secret it has never requested before (the pod normally reads only app/config). (Optionally a third: the rate — three requests in ten minutes.) (c) Rotation order: first rotate legacy/partner-api-key (the secret actually exfiltrated, and the one granting external partner access — stop that access now); then rotate/revoke the compromised workload identity's credentials and tighten its vault policy so it can no longer read secrets outside its need; then review and rotate any other secrets that identity was permitted to read, on the assumption they may also have been pulled. (d) §20.1's "machine behavior is boring" made this catchable because a workload's secret-access pattern should be perfectly regular — this pod reads one config secret, always — so a request for a never-before-touched, long-dormant partner key is a stark, high-confidence deviation. A human's access patterns vary naturally (people legitimately open new resources all the time), so the same behavior from a person would be ambiguous and likely missed.


Chapter 21

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-oriented, or discussed in class. Constructed logs and figures are illustrative (Tier 3).

1. Log source — any system/application that produces timestamped event records (AD, EDR, firewall…). Normalization — mapping fields from many source formats onto one common schema (consistent names and formats). Parsing — extracting the meaningful fields out of a raw log message (the step before normalization). Correlation rule — logic that fires an alert when a defined pattern occurs across events/sources/time. Use case — the named threat scenario a rule serves, with its sources, severity, response, and false-positive risk. Combined sentence: "To detect a VPN brute-force, the SIEM collects the VPN log source, parses each event and normalizes it to the common schema, then a correlation rule (many failures then a success for one account) implements the use case 'credential brute-force resulting in access.'"

4. Alert fatigue is the desensitization and degraded performance analysts suffer when alert volume — especially false positives — exceeds what they can meaningfully investigate. Excellent coverage can still miss attacks because coverage without fidelity floods the queue: with five analysts able to investigate ~20 alerts each (≈100/shift) but 800 alerts/day at a 97% false-positive rate, there are only ~24 true positives buried in ~776 false ones, and the team can touch only ~100 alerts total. The true positives are not missed through laziness but because they are statistically un-findable in that much noise — the attacker needs only one ignored alert, and the flood supplies it. Fidelity, not coverage, determines whether attacks are actually seen.
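
The arithmetic behind those figures, as a short sketch. One assumption is added that the answer does not state: analysts pick alerts with no ability to tell true from false in advance, so the share of true positives they reach equals the share of the queue they review. Like the answer, it sets one shift's capacity against a day's alerts.

```python
alerts_per_day = 800
false_positive_pct = 97
analysts = 5
alerts_per_analyst = 20

true_positives = alerts_per_day * (100 - false_positive_pct) // 100
false_positives = alerts_per_day - true_positives
capacity = analysts * alerts_per_analyst

# Assumed model: alerts are reviewed in effectively random order, so the
# expected number of true positives seen scales with the fraction reviewed.
expected_seen = true_positives * capacity / alerts_per_day

print(true_positives, false_positives, capacity, expected_seen)
```

Under that model the team would expect to reach about 3 of the 24 real alerts, which puts a number on "statistically un-findable".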

7. (a) Password spraying: the field pattern is one src_ip (203.0.113.88) against many distinct user values, each a single failure, in seconds. (Brute force is the opposite: one user, many failures.) (b) The outcome=success for tbrandt at 02:14:11 should escalate severity: the spray worked on one account, turning reconnaissance into access. (c) It matches use case #2 (password spraying) and, once the success lands, shades into #1/#3 (a successful unauthorized login). (d) Two tuning conditions to keep it high-fidelity: require that the source IP has not been seen before for these accounts (excludes a noisy-but-benign internal scanner), and require the failures to span several distinct users from one source within a short window (the defining spray shape). In addition, treat the success-after-spray as the high-severity trigger.

9. (a) It is high-fidelity because clearing the audit log is rarely legitimate — it is an "Indicator Removal" technique (ATT&CK T1070.001) attackers use to cover their tracks, so a single such event is meaningful on its own without needing correlation. (b) That svc_backup, a service account, cleared the log is doubly suspicious: service accounts run software and have no business clearing audit logs interactively, suggesting the account is compromised and being used by an attacker. (c) Immediate follow-up queries: (1) everything svc_backup did on FILESRV2 and elsewhere in the surrounding window (did it log in interactively, run unusual processes, change privileges?); and (2) whether logging stopped or has gaps on that host after the clear (an attacker may have disabled forwarding) — and confirm the central SIEM still has the pre-clear events, since they were forwarded off the box.

11. Allowlist (with a documented entry per benign source), not disable, tune, or suppress. The rule catches a genuinely valuable behavior (a server reaching the internet, a real C2 indicator), so disabling it would create a blind spot; the noise comes from two specific, known-legitimate destinations (the patch CDN and a status API), which is exactly the case an allowlist handles — exclude those known-good destinations while leaving the rule firing on every other outbound connection. (Tuning a threshold would not help, since the issue is which destinations, not how many; suppression by time would not help, since the noise is continuous.) Document each allowlist entry with an owner and review date — an allowlist is a deliberate hole.

13. Impossible travel sequence rule. Data needed: successful login events with a src_ip from which an approximate geolocation can be derived (a geo-IP lookup gives city/coordinates), plus accurate UTC timestamps. Logic: for each user, take consecutive successful logins; compute the geographic distance between their source locations and the time elapsed; if the implied travel speed exceeds what is physically possible (e.g., faster than a commercial flight), alert. Most likely false positive: corporate VPN or cloud egress, where a user's traffic exits from two different data centers, so they appear to be in two cities at once. Tune it out: allowlist the corporate VPN egress ranges and exclude cloud-provider IP space (so VPN-induced "travel" is ignored), and raise the implied-speed threshold so plausible same-day air travel does not trip it; optionally require that one of the two locations be genuinely novel for the user.
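
The core comparison in that logic (distance over elapsed time against a speed ceiling) can be sketched as follows. The Login type, the 1000 km/h ceiling, and the 100 km minimum distance (to absorb geo-IP imprecision) are all assumptions, and the geo-IP lookup and the allowlisting of VPN and cloud ranges are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from math import asin, cos, radians, sin, sqrt

MAX_PLAUSIBLE_KMH = 1000.0  # assumed ceiling, a little above airliner cruise speed
MIN_DISTANCE_KM = 100.0     # assumed floor; geo-IP is too coarse below this


@dataclass
class Login:
    user: str
    when: datetime  # UTC
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def is_impossible_travel(prev: Login, curr: Login) -> bool:
    """True if getting from prev to curr implies an implausible speed."""
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if distance < MIN_DISTANCE_KM:
        return False
    hours = (curr.when - prev.when).total_seconds() / 3600
    if hours <= 0:
        return True  # far apart at the same instant
    return distance / hours > MAX_PLAUSIBLE_KMH


t0 = datetime(2026, 6, 14, 9, 0, tzinfo=timezone.utc)  # hypothetical login time
new_york = Login("tbrandt", t0, 40.71, -74.01)
london_1h = Login("tbrandt", t0 + timedelta(hours=1), 51.51, -0.13)
london_10h = Login("tbrandt", t0 + timedelta(hours=10), 51.51, -0.13)

print(is_impossible_travel(new_york, london_1h))   # one hour apart
print(is_impossible_travel(new_york, london_10h))  # ten hours apart
```

Raising MAX_PLAUSIBLE_KMH is the "raise the implied-speed threshold" tuning step; the novelty requirement would be a further condition on curr.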

15. Use case specification for "tell me when someone turns off MFA for a user": - Use case name: MFA disabled or reset for a user account. - ATT&CK: T1556 (Modify Authentication Process) — weakening an account's protection. - Log sources: identity provider audit logs (Entra ID / IdP) — events for "authentication method removed," "MFA disabled," or "MFA registration reset." - Trigger logic: an administrative action that disables or resets MFA for any user account, especially if the actor is not a member of the approved identity-administration group, or the target is a privileged account. - Severity: High (an attacker who has admin access often disables a victim's MFA to maintain access; legitimate resets also happen, so triage is required). - Analyst response: confirm the change was an approved help-desk/identity action for that user; if unexpected, treat the actor's account as potentially compromised, re-enable MFA, and hunt for what the actor did before and after. - Main false-positive risk: legitimate help-desk MFA resets (a user lost their phone). Tune by allowlisting the help-desk workflow/service and by raising severity when the target is privileged or the actor is unusual.

17. Same investigation in all three languages — "last 24h, src_ip 203.0.113.88, count by user, most first":

SQL:

SELECT user, COUNT(*) AS attempts
FROM events
WHERE action = 'login'
  AND src_ip = '203.0.113.88'
  AND timestamp >= NOW() - INTERVAL '24' HOUR
GROUP BY user
ORDER BY attempts DESC;

SPL:

index=auth action=login src_ip="203.0.113.88" earliest=-24h
| stats count AS attempts by user
| sort - attempts

KQL:

Events
| where action == "login" and src_ip == "203.0.113.88"
| where timestamp >= ago(24h)
| summarize attempts = count() by user
| sort by attempts desc

All three share the shape filter (lead with time bound) → aggregate (count by user) → sort descending. SQL leads with the projection/GROUP BY; SPL and KQL read top-to-bottom as pipelines.

19. Spraying detection — src_ip with ≥20 failures across ≥5 distinct users in the last hour (SQL):

SELECT src_ip,
       COUNT(*)             AS failures,
       COUNT(DISTINCT user) AS distinct_users
FROM events
WHERE action='login' AND outcome='failure'
  AND timestamp >= NOW() - INTERVAL '1' HOUR
GROUP BY src_ip
HAVING COUNT(*) >= 20 AND COUNT(DISTINCT user) >= 5
ORDER BY failures DESC;

The COUNT(DISTINCT user) is the key: it is the distinct-user count (not the raw failure count) that distinguishes spraying (one source, many accounts) from a single account's brute force. The HAVING applies both thresholds after aggregation.
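
For readers without a SIEM at hand, the same filter, group, and HAVING logic can be sketched over a list of event dictionaries. The spraying_sources name, the event field layout, and the sample events are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def spraying_sources(events, now, min_failures=20, min_users=5,
                     window=timedelta(hours=1)):
    """Return (src_ip, failures, distinct_users) rows, most failures first."""
    failures = defaultdict(int)
    users = defaultdict(set)
    for e in events:
        # WHERE: failed logins inside the time window
        if (e["action"] == "login" and e["outcome"] == "failure"
                and e["timestamp"] >= now - window):
            failures[e["src_ip"]] += 1          # COUNT(*)
            users[e["src_ip"]].add(e["user"])   # COUNT(DISTINCT user)
    # HAVING: both thresholds are applied after aggregation
    rows = [
        (ip, n, len(users[ip]))
        for ip, n in failures.items()
        if n >= min_failures and len(users[ip]) >= min_users
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


now = datetime(2026, 6, 14, 3, 0, tzinfo=timezone.utc)  # hypothetical "now"
recent = now - timedelta(minutes=10)
events = (
    # one source, 25 accounts, one failure each: the spray shape
    [{"action": "login", "outcome": "failure", "src_ip": "203.0.113.88",
      "user": f"user{i}", "timestamp": recent} for i in range(25)]
    # one source, one account, 30 failures: brute force, not spraying
    + [{"action": "login", "outcome": "failure", "src_ip": "198.51.100.7",
        "user": "tbrandt", "timestamp": recent} for _ in range(30)]
)
print(spraying_sources(events, now))  # [('203.0.113.88', 25, 25)]
```

The brute-force source has more raw failures than the sprayer but only one distinct user, so the distinct-user threshold is what excludes it.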

21. Three tuning changes that cut false positives on a "5 failed logins in 10 minutes" rule without blinding it to a real spray-then-success: 1. Raise the threshold and shorten the window (e.g., 15 failures in 2 minutes). Excludes: a user mistyping a new password a few times over several minutes. Preserves: an automated spray, which is fast and high-volume. 2. Require the failures to span multiple distinct accounts from one source (spraying) or require a never-before-seen src_ip for that account. Excludes: one legitimate user fumbling their own password from their normal device. Preserves: an attacker hitting many accounts, or one account from a new location. 3. Require the subsequent success to come from a different src_ip than the user's baseline. Excludes: "mistyped then succeeded from my usual laptop." Preserves: "failures then a success from a new attacker IP" — the dangerous case. None of these disables the rule; each narrows it to exclude a specific benign pattern while keeping the malicious one.

23. Risk-based alerting scheme for a user account. Instead of each minor detection paging an analyst, low/medium-signal events add to a per-user risk score, and only a threshold score surfaces an alert. Five contributing signals with rough weights: - Login from a never-before-seen country/ASN: +3 - Login outside the user's normal hours: +1 - Access to a sensitive system above the user's baseline volume: +4 - MFA method changed/reset recently: +3 - A small burst of failed logins before a success: +2 Surface an alert when a user's accumulated score crosses, say, 6 within a rolling window. This reduces fatigue compared with binary alerting because no single weak signal pages anyone — a lone off-hours login (+1) is ignored — but a combination that indicates real risk (new-country login +3 and sensitive bulk access +4 = 7) rises above the threshold and gets one consolidated, high-context alert instead of several low-value ones. It turns a flood of weak signals into a ranked, manageable few (the lineage of UEBA in Chapter 34).
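
A minimal sketch of the scoring step, using the five weights above. The signal key names, the function names, and the choice to alert at a score of 6 or more are assumptions; the rolling window and per-user state are omitted.

```python
WEIGHTS = {
    "new_country_or_asn": 3,
    "off_hours_login": 1,
    "sensitive_access_above_baseline": 4,
    "mfa_changed_recently": 3,
    "failed_burst_then_success": 2,
}
ALERT_THRESHOLD = 6  # assumed to be inclusive: a score of 6 alerts


def risk_score(signals) -> int:
    """Sum the weights of the distinct signals seen for one user."""
    return sum(WEIGHTS[s] for s in set(signals))


def should_alert(signals) -> bool:
    return risk_score(signals) >= ALERT_THRESHOLD


print(risk_score(["off_hours_login"]), should_alert(["off_hours_login"]))
combo = ["new_country_or_asn", "sensitive_access_above_baseline"]
print(risk_score(combo), should_alert(combo))
```

The lone off-hours login scores 1 and stays silent, while the new-country plus sensitive-access combination scores 7 and produces a single alert, matching the worked example.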

26. Prioritized "first ten" for a bank specifically, with the top three defended (orderings vary; the justification matters): 1. Brute force / spraying resulting in successful login — a bank's online-banking and VPN front doors are constantly attacked from the internet; a successful credential attack is the most common path to customer funds and a reportable breach. First because it is both the most likely and the highest impact. 2. New privileged-group membership / service-account interactive logon — the six-day-foothold class of attack: an attacker who is already inside escalates toward the core. A bank's crown jewels (core ledger, AD) are reached by privilege escalation, so detecting escalation early is decisive. 3. Mass file access or deletion (ransomware) — ransomware against a bank threatens availability of transaction processing, a life-or-death operational risk and a regulator's concern; early detection of mass file operations buys the minutes that limit the blast radius. The remaining seven (impossible travel, log clearing, disabled-account logins, MFA disabled, outbound to new/known-bad IP, etc.) round out coverage. The argument for a bank is that #1–#3 map directly to the bank's worst outcomes — stolen customer funds, compromise of the core, and loss of transaction availability — which is exactly the prioritization-by-impact discipline from Chapter 1.

29. The six-day foothold. Correct chronological order (note the out-of-order line — the 03:11:50 whoami /priv is listed third but actually occurs before the 15:12:04 group_add, which is why UTC and synchronized clocks matter for ordering):

day1 14:02:10  win_security  svc_app  login (interactive) success  host=APPSRV9   <-- service acct, interactive: anomaly
day1 14:09:33  edr           svc_app  process "net group /domain"            host=APPSRV9   <-- discovery
day1 03:11:50  edr           svc_app  process "whoami /priv"                 host=APPSRV9   <-- (precedes the group_add)
day1 15:12:04  win_security  svc_app  group_add -> "Domain Admins"  success  host=DC01      <-- privilege escalation

Which rule would have caught it, and where: Use case #6 — service account interactive logon (never-before-seen) — would have alerted at step one, 14:02:10, the moment svc_app logged in interactively for the first time, long before the escalation. Use case #5 — new privileged-group membership — would have caught the group_add at 15:12:04 as a second, overlapping safety net. Rule (#6, pseudocode):

ALERT if user in SERVICE_ACCOUNTS
   and action=login and logon_type=interactive and outcome=success
   and (user, host) not seen interactive in prior 30 days
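
The pseudocode above translates fairly directly into Python. In this sketch the event field names, the SERVICE_ACCOUNTS inventory, and the calendar date are hypothetical. The history starts empty, so a real deployment would first seed last_seen from the prior 30 days of logs.

```python
from datetime import datetime, timedelta, timezone

SERVICE_ACCOUNTS = {"svc_app"}  # hypothetical inventory of service accounts
LOOKBACK = timedelta(days=30)


def first_interactive_service_logons(events):
    """Yield events matching use case #6. Events must be sorted by time."""
    last_seen = {}  # (user, host) -> time of the latest interactive success
    for e in events:
        if not (
            e["user"] in SERVICE_ACCOUNTS
            and e["action"] == "login"
            and e["logon_type"] == "interactive"
            and e["outcome"] == "success"
        ):
            continue
        key = (e["user"], e["host"])
        previous = last_seen.get(key)
        # Alert only if this pair has no interactive logon in the lookback.
        if previous is None or e["time"] - previous > LOOKBACK:
            yield e
        last_seen[key] = e["time"]


t0 = datetime(2026, 3, 1, 14, 2, 10, tzinfo=timezone.utc)  # stands in for "day1 14:02:10"
base = {"user": "svc_app", "host": "APPSRV9", "action": "login", "outcome": "success"}
events = [
    dict(base, time=t0 - timedelta(hours=1), logon_type="service"),      # normal
    dict(base, time=t0, logon_type="interactive"),                       # the anomaly
    dict(base, time=t0 + timedelta(hours=1), logon_type="interactive"),  # repeat
]
alerts = list(first_interactive_service_logons(events))
print(len(alerts), alerts[0]["time"])
```

Only the first interactive logon alerts; the routine service-type logon and the repeat an hour later do not.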

Why each event alone would not alert: service accounts log in (constantly, as service/network logons — the interactive type is the tell, which a single generic "login" rule ignores); administrators run net group and whoami legitimately; group memberships change during normal IT operations. Only reading the sequence — an interactive service-account logon, then discovery, then self-escalation to Domain Admins within hours — reveals the attack, which is precisely why the silos that kept these three sources apart left the bank blind for six days.


Chapter 22

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design tasks, or discussed in class/lab. Sigma rules are shown in logic form; technique IDs are real ATT&CK identifiers.

1. Threat detection — identifying malicious or unauthorized activity, automatically or by analysis. Detection engineering — the disciplined building, testing, and maintenance of the automated rules that fire alerts. Threat hunting — the proactive, human-led search for adversary activity that no alert caught. The loop: detection engineering automates what you can anticipate; hunting finds what you didn't; every hunt finding is fed back as a new automated detection, so the gap it revealed is closed permanently.

3. Indicator-based detection matches known-bad static artifacts (a hash, IP, domain, registry key) in telemetry. Behavioral detection matches adversary technique/behavior (the action they must perform). Indicator-based: advantage — cheap, fast, precise, few false positives; limitation — brittle, evaded by a free value change, only catches what you've seen before. Behavioral: advantage — durable, catches novel and stealthy adversaries because it keys on what they must do; limitation — harder to write and generates more false positives that must be tuned away.

6. Strategic — high-level (who targets your sector and why): "Financial-sector banks are being hit by supply-chain-compromise actors." Operational — campaign-level TTPs mapped to ATT&CK: "This group gains access via a trojanized update and beacons over jittered HTTPS." Tactical — specific indicators: "These 3 domains and 2 hashes belong to the campaign." Most valuable to a detection engineer is operational, because TTPs sit near the top of the pyramid of pain and yield durable behavioral detections, whereas tactical indicators are short-lived and strategic intelligence informs priorities, not specific rules.

7. (a) Row 10.20.5.40 -> 203.0.113.77 is the clearest C2 beacon, driven by the combination of high conn_count to a single destination (automated, not human), very low jitter_stddev (9s) (regular = beacon-like), and one distinct_dst (a dedicated channel). Row 10.20.5.51 -> 198.51.100.9 is also beacon-shaped (low jitter, single dest) and must be triaged, not dismissed. (b) Row 10.20.5.22 -> CDN edge is almost certainly benign human/CDN web traffic: 37 distinct destinations and a highly variable interval, which is browsing/content delivery, not a single regular beacon. (c) The fastest confirmation is enriching the destination against the asset/vendor inventory and resolving its domain/age: a server-management host talking to a recently-registered, unlisted external IP is the tell. (d) ATT&CK: tactic Command and Control, technique T1071.001 (Application Layer Protocol: Web Protocols), with T1573 (Encrypted Channel) since it is HTTPS.
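
The three fields used in (a) can be combined into a simple triage check. This is a sketch under assumptions: the function names and the min_conns and max_jitter thresholds are invented, timestamps are plain seconds, and jitter is computed as the population standard deviation of the gaps between connections, which may not be how the exercise's jitter_stddev column was derived.

```python
from statistics import pstdev


def interval_jitter(timestamps) -> float:
    """Standard deviation (seconds) of the gaps between consecutive connections."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return pstdev(gaps)


def looks_like_beacon(timestamps, distinct_dst, min_conns=50, max_jitter=30.0) -> bool:
    """Many connections, one destination, very regular spacing."""
    return (
        len(timestamps) >= min_conns
        and distinct_dst == 1
        and interval_jitter(timestamps) <= max_jitter
    )


# Synthetic data: a check-in roughly every 900 s, wobbling by a few seconds.
beacon = [i * 900 + (5 if i % 2 else -5) for i in range(100)]
# Synthetic data: bursts of browsing separated by long idle periods.
browsing = [0, 2, 3, 40, 41, 900, 905, 5000, 5001, 5030] * 6

print(looks_like_beacon(beacon, distinct_dst=1))
print(looks_like_beacon(browsing, distinct_dst=37))
```

A check like this only ranks rows for triage; confirmation still comes from the enrichment step in (c).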

9. Indicators (match-only, short-lived): the 3 file hashes, 2 IPs, 1 domain — load into the TIP/SIEM, expect them to be stale. Techniques (engineer for these): (i) "persistence via a scheduled task running a signed-but-abused binary" → T1053 (Scheduled Task/Job) + signed-binary-proxy abuse → make it a detection (scheduled-task creation is reliably logged; a clean behavioral rule is feasible). (ii) "beacons over HTTPS every 15 minutes" → T1071.001 → make it both a detection (the durable beacon rule) and, if you suspect current compromise, a hunt to find any existing beacon now. Rationale: techniques you can express as a reliable, low-false-positive rule become detections; those you cannot, or those you want to check retroactively, become hunts.

11. Sigma rule (logic form) for rundll32 launched with a URL:

title: Rundll32 Executing a Remote URL Payload
id: <any-uuid>
description: rundll32.exe launched with a URL in its command line (remote payload exec).
references: [https://attack.mitre.org/techniques/T1218/011/]
tags: [attack.defense_evasion, attack.t1218.011]
logsource: {category: process_creation, product: windows}
detection:
  selection:
    Image|endswith: '\rundll32.exe'
    CommandLine|contains:
      - 'http://'
      - 'https://'
  condition: selection
falsepositives:
  - Rare legitimate scripts that pass a URL argument to rundll32 (uncommon; review).
level: high

The key moves: select on the image (rundll32) AND a URL in the command line; tag with T1218.011 (System Binary Proxy Execution: Rundll32); document the (rare) benign case; set high.
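The selection logic of the rule can be checked against sample events before deployment. The following Python sketch mimics the matching semantics only: a list under one field is an OR, separate fields in one selection are an AND, and Sigma string matching is case-insensitive by default. The function name and the sample events are hypothetical, and this is not how any particular Sigma backend evaluates rules.

```python
def matches_rundll32_url(event: dict) -> bool:
    """Return True when the event satisfies both parts of the selection."""
    image = event.get("Image", "").lower()
    command_line = event.get("CommandLine", "").lower()
    # Image|endswith: the rundll32.exe binary, whatever its directory
    image_hit = image.endswith("\\rundll32.exe")
    # CommandLine|contains: either URL scheme (list values are OR'ed)
    url_hit = any(scheme in command_line for scheme in ("http://", "https://"))
    return image_hit and url_hit


# Hypothetical process-creation events, for illustration only.
sample_events = [
    {"Image": r"C:\Windows\System32\rundll32.exe",
     "CommandLine": "rundll32.exe url.dll,FileProtocolHandler https://example.test/a"},
    {"Image": r"C:\Windows\System32\rundll32.exe",
     "CommandLine": "rundll32.exe shell32.dll,Control_RunDLL"},
    {"Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
     "CommandLine": "powershell.exe -c iwr http://example.test/a"},
]

results = [matches_rundll32_url(e) for e in sample_events]
print(results)
```

Only the first event matches: the second has no URL and the third is not rundll32. A URL-handler invocation like the first one can be legitimate or abused, which is why the falsepositives note asks for review rather than automatic blocking.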

13. ATT&CK mappings (confidence noted): (a) account added to local Administrators → tactic Privilege Escalation / Persistence, T1098 (Account Manipulation) or the local-group-modification sub-behavior (high confidence on tactic; describe generically if unsure of exact sub-ID). (b) wmic spawning a process on a remote host → Execution / Lateral Movement, T1047 (Windows Management Instrumentation) (high). (c) clearing the Windows Security event log → Defense Evasion, T1070.001 (Indicator Removal: Clear Windows Event Logs) (high). (d) DNS query for a long high-entropy domain → Command and Control, T1568.002 (Dynamic Resolution: Domain Generation Algorithms) (high). (e) new service running from a temp directory → Persistence / Privilege Escalation, T1543.003 (Create or Modify System Process: Windows Service) (high).

16. Hypotheses (template: "If an adversary were doing [technique], we'd see [observable] in [data source]"): (a) Lateral movement — "If an adversary were moving laterally via SMB/admin shares (T1021.002), we'd see one source host authenticating to admin shares (C$/ADMIN$) on many distinct hosts in a short window — look in Windows logon (4624/4672) and SMB access logs." (b) Exfiltration — "If an adversary were exfiltrating data (T1041 / T1048), we'd see an internal host sending an unusually large outbound volume to a single external destination, especially off-hours — look in NetFlow/Zeek conn.log byte counts per host-destination pair." (c) Persistence — "If an adversary established persistence via a registry Run key (T1547.001), we'd see writes to ...\CurrentVersion\Run pointing to an executable in a user/temp profile — look in registry-modification telemetry (Sysmon EID 13)."

18. Hunt loop for LSASS credential dumping: (1) Hypothesis — "If an adversary dumped credentials from LSASS (T1003.001), we'd see a non-system process open a handle to lsass.exe with memory-read access." (2) Data — process-access telemetry (Sysmon Event ID 10 or EDR); many orgs do not collect it by default — if absent, that is the finding (a visibility gap). (3) Query/analytic — filter process-access events where TargetImage ends in lsass.exe and GrantedAccess includes read rights, excluding known-good sources. (4) Triage — Windows Defender (MsMpEng.exe) and some monitoring/EDR agents legitimately touch LSASS; suppress those, investigate the rest. (5) Conclusion — confirm (a non-allowlisted process read LSASS → likely credential theft), refute, or "inconclusive — no data." (6) Operationalize — deploy the LSASS-access Sigma rule (with the known-good filter) as a durable detection, and if data was missing, file the Sysmon/EDR deployment as a data-source gap.
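Steps (3) and (4) of the hunt loop can be sketched as a filter over process-access events. This is a minimal illustration, assuming events shaped like Sysmon Event ID 10 records with SourceImage, TargetImage, and a hexadecimal GrantedAccess string. The allowlist contents and the sample events are hypothetical; PROCESS_VM_READ (0x0010) is the Windows access right for reading another process's memory.

```python
PROCESS_VM_READ = 0x0010  # Windows access right: read another process's memory

# Hypothetical allowlist; a real one comes out of triage in your environment.
KNOWN_GOOD_SOURCES = {"msmpeng.exe"}


def file_name(path: str) -> str:
    """Lower-cased final component of a Windows path."""
    return path.rsplit("\\", 1)[-1].lower()


def is_suspicious_lsass_access(event: dict) -> bool:
    """True for a non-allowlisted process that opened lsass.exe with read rights."""
    if not event.get("TargetImage", "").lower().endswith("\\lsass.exe"):
        return False
    granted = int(event.get("GrantedAccess", "0x0"), 16)
    if not granted & PROCESS_VM_READ:
        return False
    return file_name(event.get("SourceImage", "")) not in KNOWN_GOOD_SOURCES


# Hypothetical events, for illustration only.
LSASS = r"C:\Windows\System32\lsass.exe"
events = [
    {"SourceImage": r"C:\Program Files\Windows Defender\MsMpEng.exe",
     "TargetImage": LSASS, "GrantedAccess": "0x1010"},   # allowlisted
    {"SourceImage": r"C:\Users\Public\tool.exe",
     "TargetImage": LSASS, "GrantedAccess": "0x1010"},   # read access, unknown source
    {"SourceImage": r"C:\Windows\System32\svchost.exe",
     "TargetImage": LSASS, "GrantedAccess": "0x1000"},   # no memory-read right
]

hits = [e["SourceImage"] for e in events if is_suspicious_lsass_access(e)]
print(hits)
```

Allowlisting by file name alone is easy to spoof; a production filter would more plausibly key on full path or signer, which is left out here to keep the sketch short.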

20. A hunt ends with operationalization because the hunt's lasting value is the permanent program improvement, not the single answer. The two operational outputs are: (a) if you found something, a new durable detection so the same activity is caught automatically next time (the gap is closed forever); (b) if you couldn't even look (missing/incomplete data), a documented visibility gap to fund. A hunt that produces neither has, at best, postponed the next miss: you spent analyst time, learned nothing reusable, and the same blind spot remains for the adversary to exploit again.

22. Coverage sketch for rules {phishing-click, Office→shell, brute-force, known-bad-IP}:

  Initial Access   Execution          Credential Access   Command & Control
  phishing ███     Office→shell ███   (none) ░░░          known-bad-IP ▒▒▒
                                                          (IoC only, brittle)

The two biggest blind spots: Credential Access (none) — an adversary could dump credentials or stuff them and you'd never see it; for a bank, stolen credentials are the primary path to customer funds and lateral movement. Command and Control (indicator-only) — matching known-bad IPs is bottom-of-pyramid and useless against fresh infrastructure (the SolarWinds problem); a bank needs a behavioral beacon detection. Both gaps mean a targeted adversary's core steps are invisible.

24. A red cell means either "we have no detection rule" or "we have a rule but don't collect the data it needs." The data gap is the more expensive one: writing a rule takes an afternoon, but deploying a new telemetry source (e.g., process-access logging across thousands of endpoints, some locked by vendors) is a multi-month, cross-team, budgeted project — and the rule is worthless until the data exists. Coverage mapping doubles as data-source gap analysis because, to mark a technique "covered," you must verify both that a rule exists and that the SIEM ingests the data source the rule queries; the exercise of checking each technique surfaces exactly which telemetry you are missing, which "write more rules" never reveals.
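The two meanings of a red cell can be made explicit in a small check. This is a sketch under stated assumptions: the rule inventory, the data-source names, and the function name are all hypothetical, and a real coverage map would be driven from the SIEM's actual rule and ingestion metadata.

```python
def coverage_status(technique, rules, ingested_sources):
    """Classify one ATT&CK technique as green or as one of the two kinds of red."""
    rule = rules.get(technique)
    if rule is None:
        return "red: no rule"
    missing = sorted(set(rule["needs"]) - set(ingested_sources))
    if missing:
        return "red: rule exists, data missing (" + ", ".join(missing) + ")"
    return "green"


# Hypothetical rule inventory and SIEM data sources.
rules = {
    "T1053": {"name": "scheduled task creation", "needs": ["process_creation"]},
    "T1003.001": {"name": "LSASS memory access", "needs": ["process_access"]},
}
ingested_sources = {"process_creation", "dns"}

report = {t: coverage_status(t, rules, ingested_sources)
          for t in ("T1053", "T1003.001", "T1071.001")}
for technique, status in report.items():
    print(technique, "->", status)
```

Here T1003.001 has a rule but is still red because process-access telemetry is not ingested, which is the expensive kind of gap the answer describes.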

26. (a) A fixed-interval detector keyed on "interval == 300s exactly" misses this beacon because the adversary added jitter: the per-connection gap varies, so it never equals a single fixed value, and the rule never matches. A low-variance detector (STDDEV(inter_arrival) < threshold) catches it because, despite the jitter, the timing is still tightly clustered (counts ~28–33/hour, low spread); regularity, not exact periodicity, is the real signature. (b) If the adversary randomizes the count aggressively (5–55/hour), a second signal still betrays the beacon: the destination itself, a single, long-lived, unexplained external destination that a server-class host contacts repeatedly over days regardless of per-hour count. Consistently small payload sizes, typical of check-in traffic, are a further tell. (c) This whole game is played on the lower layers of the pyramid of pain, at the network/host-artifact level and below (timing, counts, infrastructure). The higher-pyramid move that ends it is to detect the TTP/context: "a server-tier host that should never talk to the internet is talking to an unknown external destination at all," which the adversary cannot evade without abandoning C2 from that host entirely.
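The difference between the two detectors in part (a) can be shown in a few lines. This is a minimal sketch: the 15-second spread threshold, the minimum event count, and both sample traces are hypothetical values chosen for illustration, not data from the exercise.

```python
from itertools import accumulate
from statistics import pstdev


def inter_arrivals(timestamps):
    """Gaps in seconds between consecutive connection timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fixed_interval_hit(timestamps, period=300):
    """Brittle rule: fires only when every gap equals the period exactly."""
    gaps = inter_arrivals(timestamps)
    return bool(gaps) and all(gap == period for gap in gaps)


def low_variance_hit(timestamps, max_stddev=15.0, min_events=5):
    """Behavioral rule: fires when the gaps are tightly clustered."""
    if len(timestamps) < min_events:
        return False
    return pstdev(inter_arrivals(timestamps)) < max_stddev


# Hypothetical traces: a 300 s beacon with a few seconds of jitter, and human browsing.
jittered_beacon = list(accumulate([295, 308, 301, 292, 306, 299, 304], initial=0))
browsing = list(accumulate([4, 210, 17, 950, 61, 33, 1800], initial=0))

print(fixed_interval_hit(jittered_beacon), low_variance_hit(jittered_beacon))
print(fixed_interval_hit(browsing), low_variance_hit(browsing))
```

The jittered trace evades the exact-match rule but not the spread test, while the browsing trace trips neither. An adversary who widens the jitter enough defeats the spread test too, which is the point of parts (b) and (c).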


Chapter 23

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. CVE — a unique public identifier for one disclosed vulnerability (a name/catalog entry). CVSS — a 0–10 score of a vulnerability's intrinsic severity. EPSS — the probability (0–1) a vulnerability will be exploited in the wild in the next 30 days. KEV — CISA's catalog of vulnerabilities known to be actively exploited right now. Combined: "Log4Shell is the vulnerability named CVE-2021-44228; its CVSS of 10.0 said it was maximally severe, its EPSS near 0.94 said exploitation was almost certain, and its presence on the KEV catalog confirmed it was already being exploited — so it jumped every queue."

4. The skipped stage is Verify. Deploying a patch is not proof it applied successfully on every host, didn't get rolled back, and left no second vulnerable copy behind; only a re-scan (authenticated) confirms the finding is actually gone. The distinction matters because "deployed" and "remediated" are different states — a program that trusts ticket status over re-scan results ships silent gaps to production (exactly the Case Study 1 hosts with a hidden second copy of Log4j). What confirms closure: an authenticated re-scan no longer reporting the finding.

7. (a) Authenticated — only a credentialed login can read actual installed OS patch levels; an external probe at best guesses from banners and misses local state. (b) Unauthenticated — to see your true external attack surface you must look as an anonymous internet attacker does. (c) Authenticated — reading exact installed library versions on a managed fleet requires logging in; banners can't be trusted and many libraries aren't externally visible. (d) Unauthenticated, from outside your perimeter — to test whether a host you believe is internal is in fact internet-reachable, you must probe it from the outside; an internal/authenticated view would not reveal the unintended external exposure.

9. (a) The Heartbleed line shows openssl 1.0.1f-1ubuntu2.27. Many Linux distributions back-port security fixes without changing the upstream version string, so a package that looks like vulnerable 1.0.1f may have been patched by the distribution: a classic false positive. Confirm by checking the distribution's package changelog / security tracker for the fixed package revision, or by testing the specific vulnerability rather than trusting the version banner. (b) The ExampleRCE finding is CVSS 9.8 but the service is installed, not running: there is no live exploit path, so its real risk (and priority) drops sharply. Schedule it (remove or patch the unused package) rather than treating it as an emergency. (c) Likely remediation order: Log4Shell first (CVSS 10.0, and you would immediately check EPSS/KEV; it is on KEV and near-certain to be exploited, so even on an internal segmented host it warrants urgent mitigation), then validate and remediate Heartbleed (real if not a false positive; gather whether the service is internet-reachable and whether private keys could be exposed), then the not-running ExampleRCE (lowest, no live path). The non-CVSS signals to gather first: EPSS and KEV status for each CVE, whether each affected service is actually running and reachable, and the asset's exposure/value.

11. A privileged read-only scanning account can read configuration and (often) sensitive data on every host in scope, so an attacker who steals it gains an enterprise-wide reconnaissance and data-access capability — it is effectively a master key. Three controls: (1) vault the credential and rotate it frequently, never hard-coding it (Chapter 20's secrets management); (2) restrict its use — source-IP-restrict it to the scanner appliances, deny interactive logon, and scope its permissions to the minimum the scanner needs (read-only, not admin write); (3) monitor it intensely — alert on any use outside scheduled scan windows or from any host other than the scanner, since the account should have an extremely predictable usage pattern.

12. Priorities (KEV or high EPSS dominates; context breaks ties): - (a) RCE, CVSS 9.8, EPSS 0.91, KEV, internet-facing portal → P1-Emergency (max risk on every axis). - (b) Same RCE, but isolated dev sandbox, no inbound path → still serious (KEV), but exposure is far lower; P2 at most in practice — mitigate/patch on schedule, not tonight. Same CVE as (a); asset context is the entire difference. - (c) Info-disclosure, CVSS only 5.3, but EPSS 0.88 and on KEV, internet-facing API gateway → P1-Emergency despite the modest CVSS, because it is actively exploited and exposed. (The clearest illustration that CVSS is not priority.) - (d) Priv-esc, CVSS 8.4 but EPSS 0.004, not KEV, internal workstation requiring prior local access → P4-Routine; an attacker must already be on the box and nobody is exploiting it. - (e) Deserialization, CVSS 9.1 but EPSS 0.03, not KEV, internal segmented batch server → P3-High; high severity but low exploit likelihood and limited exposure — scheduled, not emergency.

14. Critique: "sort by CVSS descending" prioritizes intrinsic severity in a vacuum and ignores the two things that actually make a vulnerability dangerous to you — whether it is being exploited (EPSS/KEV) and whether your affected asset is reachable and valuable (context). The specific failure mode: the rule buries low-CVSS-but-actively-exploited findings. Concrete example: a CVSS 6.4 vulnerability that is on KEV and EPSS 0.93 on your internet-facing VPN would sort below hundreds of CVSS 8–10 findings that nobody is exploiting on isolated systems — so the one flaw attackers are actually using gets worked last. The rule to use instead: risk-based prioritization — rank by CVSS × (EPSS/KEV) × asset exposure, with KEV-listed findings on reachable assets jumping the queue regardless of CVSS.
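The replacement rule can be sketched as a sort key. This is one possible reading, not the only one: the exposure weights, the choice to treat KEV as likelihood 1.0, the definition of "reachable" as anything not isolated, and the three sample findings are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical exposure weights; tune them to your own asset model.
EXPOSURE_WEIGHT = {"internet": 1.0, "internal": 0.5, "isolated": 0.1}


@dataclass
class Finding:
    name: str
    cvss: float
    epss: float
    kev: bool
    exposure: str  # one of the EXPOSURE_WEIGHT keys


def risk_score(f: Finding) -> float:
    """CVSS x exploit likelihood x exposure; KEV is treated as likelihood 1.0."""
    likelihood = 1.0 if f.kev else f.epss
    return f.cvss * likelihood * EXPOSURE_WEIGHT[f.exposure]


def prioritize(findings):
    """KEV findings on reachable assets jump the queue; the rest sort by risk score."""
    def key(f):
        jumps_queue = f.kev and f.exposure != "isolated"
        return (not jumps_queue, -risk_score(f))
    return sorted(findings, key=key)


# Hypothetical backlog entries.
backlog = [
    Finding("isolated-rce", cvss=9.8, epss=0.02, kev=False, exposure="isolated"),
    Finding("internal-deser", cvss=9.1, epss=0.03, kev=False, exposure="internal"),
    Finding("vpn-kev", cvss=6.4, epss=0.93, kev=True, exposure="internet"),
]

by_cvss = [f.name for f in sorted(backlog, key=lambda f: -f.cvss)]
by_risk = [f.name for f in prioritize(backlog)]
print(by_cvss)
print(by_risk)
```

Sorting by CVSS alone puts the actively exploited VPN finding last; the risk-based key puts it first.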

17. Example five-tier risk-based patch-SLA table for a hospital (internet-facing vs. internal):

Tier      | Definition                                                                 | Internet-facing | Internal
----------+----------------------------------------------------------------------------+-----------------+--------------
Emergency | On KEV and on an exposed/critical asset, or active exploitation against us | 24–72 h         | 7 d
Critical  | High CVSS + high EPSS, or KEV on a less-exposed asset                      | 7 d             | 14 d
High      | High CVSS, moderate EPSS, exposed                                          | 14 d            | 30 d
Medium    | Contained exposure, low exploit likelihood                                 | 30 d            | 60 d
Low       | Minimal real risk; may be accepted                                         | 90 d            | 90 d / accept

The top tier is tighter than a typical compliance minimum because actively-exploited, internet-facing flaws are being attacked now and a hospital's exposure includes patient-safety and availability stakes; the compliance bar is a floor (Theme 5), and a floor slower than the attacker is not protection. Un-patchable clinical devices are handled by mitigation + monitoring + governed exception, not by the patch SLA (they cannot meet it), which is exactly why the policy must pair SLAs with an exception process.
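An SLA table like this is easy to encode so that every finding gets a due date at triage time. The sketch below is illustrative only: it takes the upper bound (72 h) of the 24–72 h Emergency window, does not model the "accept" option for the Low tier, and uses hypothetical function names and dates.

```python
from datetime import datetime, timedelta

# SLA windows in hours, keyed by tier and exposure.
SLA_HOURS = {
    "Emergency": {"internet": 72, "internal": 7 * 24},
    "Critical": {"internet": 7 * 24, "internal": 14 * 24},
    "High": {"internet": 14 * 24, "internal": 30 * 24},
    "Medium": {"internet": 30 * 24, "internal": 60 * 24},
    "Low": {"internet": 90 * 24, "internal": 90 * 24},
}


def remediation_due(tier: str, exposure: str, found_at: datetime) -> datetime:
    """Latest acceptable remediation time for a finding under the SLA table."""
    return found_at + timedelta(hours=SLA_HOURS[tier][exposure])


found = datetime(2025, 6, 2, 9, 0)  # hypothetical discovery time
emergency_due = remediation_due("Emergency", "internet", found)
high_internal_due = remediation_due("High", "internal", found)
print(emergency_due, high_internal_due)
```

Findings on un-patchable clinical devices would bypass this lookup and go to the exception process instead.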

19. Example exception (risk-acceptance) policy section:

Exceptions to patch SLAs. An exception may be granted only when remediation within the SLA is not feasible, and only if it includes ALL of: (1) a documented justification stating the specific blocker (e.g., the patch breaks a named business system; no vendor fix exists yet); (2) at least one compensating control that demonstrably reduces the risk in the interim (segmentation, WAF/firewall restriction, access restriction, or feature disablement), with the control documented and linked to the exception; (3) a named, accountable risk owner in the business who formally accepts the risk; (4) an expiry date (maximum 90 days) and a mandatory re-review at expiry, at which the exception is closed or explicitly re-justified and re-approved — no exception auto-renews; and (5) risk-proportional approval: low-risk exceptions may be approved by a manager, while any exception covering a KEV-listed vulnerability or an asset in the cardholder/regulated-data environment requires CISO (and, above a defined threshold, risk-committee) approval. Removing or changing a compensating control triggers immediate re-review of every exception that depends on it.

21. Red flags: - EXC-0007: internet-facing remote-access gateway with a KEV vulnerability, no compensating control, no expiry, never re-reviewed, owner is a vague group ("IT"), approved only at team-lead level for a KEV/internet-facing risk. This is a textbook landmine. Fixes: add real compensating controls (or remediate now); set an expiry and re-review immediately; assign a named senior owner; escalate approval to the CISO; given KEV + internet-facing, the right answer is almost certainly fix it now, not extend the exception. - EXC-0042: basically healthy, with a low CVSS, no KEV listing, an isolated lab, a real compensating control (network isolation), a named owner, a future expiry, and a recent re-review. Manager-level approval is appropriate for this low risk. (Minor nit: the owner approving their own exception is acceptable only at low risk; for anything higher, separate the roles.) - EXC-0051: cardholder-data interface, CVSS 9.8, EPSS 0.80, KEV, expired four years ago but still "active", never re-reviewed, compensating control added two years ago (is it still in place and effective?), approved only at team-lead level for a critical regulated-data risk. Fixes: this should trigger an emergency review; verify the compensating controls actually still exist and work; escalate ownership and approval to the CISO/risk committee; and treat the underlying flaw as a funding priority for actual remediation. A KEV vulnerability on the CDE under a long-expired exception is precisely the Northgate breach pattern.
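The red flags called out above are mechanical enough to check automatically across an exception register. The following is a sketch, assuming a simple record shape; the field names, the list of "vague" owners, the record IDs, and all dates are hypothetical and do not reproduce the exercise's data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

VAGUE_OWNERS = {"it", "security", "ops"}  # hypothetical group names, not people


@dataclass
class ExceptionRecord:
    exc_id: str
    kev: bool
    compensating_control: Optional[str]
    owner: Optional[str]
    expiry: Optional[date]
    last_review: Optional[date]
    approver_level: str  # "team-lead", "manager", or "ciso"


def red_flags(rec: ExceptionRecord, today: date) -> list:
    """List the governance problems found on one exception record."""
    flags = []
    if not rec.compensating_control:
        flags.append("no compensating control")
    if rec.expiry is None:
        flags.append("no expiry date")
    elif rec.expiry < today:
        flags.append("expired but still active")
    if rec.last_review is None:
        flags.append("never re-reviewed")
    if not rec.owner or rec.owner.strip().lower() in VAGUE_OWNERS:
        flags.append("no named owner")
    if rec.kev and rec.approver_level != "ciso":
        flags.append("KEV exception approved below CISO level")
    return flags


today = date(2025, 1, 15)  # hypothetical audit date
landmine = ExceptionRecord("EXC-DEMO-1", True, None, "IT", None, None, "team-lead")
healthy = ExceptionRecord("EXC-DEMO-2", False, "network isolation", "J. Rivera",
                          date(2025, 3, 1), date(2024, 12, 1), "manager")

landmine_flags = red_flags(landmine, today)
healthy_flags = red_flags(healthy, today)
print(landmine_flags)
print(healthy_flags)
```

A check like this finds missing fields, not stale controls: whether a compensating control recorded two years ago still exists and works has to be verified by hand.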

23. The permanent "temporary" exception (organizational drift) occurs when an exception granted for a real, time-boxed reason is silently renewed, never re-reviewed, loses its accountable owner (who leaves or reorganizes), and outlives its compensating control — so the risk persists by inertia rather than by any conscious decision. The single most important process control is a mandatory expiry date with forced re-review (ideally auto-escalating at expiry). It works by refusing to let the risk stay un-decided: at expiry the system forces a named, senior person to either close the exception or re-justify and re-sign it on the record — converting "nobody is deciding to keep this risk" back into "someone is consciously, accountably deciding to keep this risk," which is the difference between governed risk and a landmine.

25. First six actions in the first two hours of Log4Shell, on-call at Meridian (cannot patch everything): 1. Declare and staff the response — open the on-call bridge, pull in the SOC manager and an engineer; confirm the facts (CVE-2021-44228, RCE, no-auth, KEV, exploited in the wild). 2. Discover in parallel — kick off an emergency authenticated scan with the Log4Shell plugin; run a file-system search for log4j-core-*.jar on managed Java hosts; pull vendor advisories for appliances. 3. Stand up detection — deploy a SIEM/network query for outbound LDAP/RMI (JNDI) callbacks from server subnets, which catches any instance already being exploited and flags vulnerable+reachable hosts. 4. Prioritize by exposure — since every instance shares the same CVE/CVSS/EPSS/KEV, rank by asset context: internet-facing and untrusted-data-ingesting assets first, then internal, then isolated. 5. Mitigate the top tier now — for internet-facing instances, deploy WAF signatures for the exploit string and block outbound LDAP/RMI egress (the egress block works even against obfuscated payloads), plus the JndiLookup-removal/config-flag where feasible; apply vendor emergency mitigations for appliances. 6. Communicate — notify the CISO and asset owners of the exposure and actions, set a follow-up cadence, and queue tested patches for the coming days (emergency change control). Keep the detection query live.
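The file-system search in action 2 can be sketched with the standard library. This is a minimal illustration that only finds loose files named log4j-core-*.jar; it does not open WAR, EAR, or fat-JAR archives, so nested copies need separate archive inspection. The function name and the temporary directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


def find_log4j_core(root) -> list:
    """Paths of loose files named log4j-core-*.jar anywhere under root."""
    return sorted(p for p in Path(root).rglob("log4j-core-*.jar") if p.is_file())


# Demonstration on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "app" / "lib"
    lib.mkdir(parents=True)
    (lib / "log4j-core-2.14.1.jar").touch()
    (lib / "log4j-api-2.14.1.jar").touch()   # API jar only, not the core library
    (Path(tmp) / "app" / "bundle.war").touch()  # may hide a nested copy; not inspected
    found = [p.name for p in find_log4j_core(tmp)]

print(found)
```

Because archives are not inspected, a clean result from this search is not proof of absence, which is why the answer pairs it with an authenticated scan and vendor advisories.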

29. The inverted backlog. (a) The program sorts by CVSS descending only, so a finding's position reflects intrinsic severity alone; a KEV-listed, EPSS-0.93 flaw with a modest CVSS of 6.4 sinks to position 800 beneath hundreds of higher-CVSS findings that nobody is exploiting — the methodology structurally hides the most dangerous, actively-exploited finding. (b) Corrected prioritization: re-rank by risk = CVSS × (EPSS/KEV) × asset exposure. The VPN finding is on KEV (active exploitation) and internet-facing (maximum exposure) with EPSS 0.93, so it becomes P1-Emergency and surfaces at or near the top, far above the higher-CVSS-but-unexploited findings. (c) Justification to the CISO: "Our backlog is ordered by CVSS, which measures severity, not the risk that we will actually be breached — so an actively-exploited, internet-facing vulnerability is buried at position 800 while we patch things no one is attacking. We need to re-rank the entire backlog by real risk (KEV/EPSS/exposure) immediately, because the current order has us defending the wrong doors." (d) The metric that would have caught it: open KEV exposure (count of KEV-listed vulnerabilities open on internet-facing assets) — tracking it would have made the single KEV-on-VPN finding impossible to miss regardless of its CVSS.


Chapter 24

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, scenario, or tabletop problems best discussed in a group. Several solutions also appear in code form in code/exercise-solutions.py.

1. The NIST SP 800-61 phases in order: (1) Preparation; (2) Detection & Analysis; (3) Containment, Eradication & Recovery; (4) Post-Incident Activity. NIST groups the three middle actions (containment, eradication, recovery) into one phase, giving the familiar four-phase shape; counting them separately gives six steps. It is drawn as a loop because post-incident lessons feed directly back into preparation, and because detection/analysis and containment iterate within an incident (you contain what you have scoped, learn the scope is larger, and loop back). It is a cycle, not a one-pass checklist.

4. Eradication is removing the attacker, their access, tools, and persistence, and closing the initial vector; recovery is the verified restoration of systems to normal operation. Doing recovery before eradication is finished is dangerous because you may restore a system the attacker still controls, or bring systems back through a door (the unpatched vector, the un-rotated credential) that is still open — you simply restart the incident. The mantra for a deeply compromised host is "wipe and reimage, do not disinfect," because you can rarely prove you found and removed every implant or persistence mechanism; rebuilding from known-good media guarantees a clean state.

7. Severity classifications (judgments can vary slightly; justification matters): - (a) reported phishing, no click, blocked → SEV-4 (no compromise; track and trend). - (b) ransomware actively encrypting a file server → SEV-1 (active, destructive, recovery at risk). - (c) contained commodity adware on one non-priv laptop → SEV-3 (single host, contained, low impact). - (d) domain-admin account, impossible-travel, active now → SEV-2 (single privileged account; escalate to SEV-1 if it reaches customer data / core / a DC; declare-ready). - (e) Meridian customer DB appears for sale on a forum → SEV-1 (confirmed/likely customer-data breach with regulatory stakes).

9. The four triage questions for 7(b) (active ransomware on a server): - Is it real? Yes — corroborated by the EDR behavior plus failed backups; not a false positive. - How bad? SEV-1 — active, irreversible encryption on a server; recovery path may be under attack. - How far? Unknown — at least this server; presume potential spread and scope immediately. - What now? Contain immediately (fast/destructive incident) and scope in parallel; declare; assign IC. The question most urgent to drive before acting is normally "how far?" (scope) — but for an active, destructive incident you must not delay containment to finish scoping; you contain immediately and scope in parallel. So: do not let "how far?" block containment here, even though it is the question you otherwise most want answered.

11. Containment posture and the single dominant concern: - (a) active ransomware via domain admin → IMMEDIATE aggressive; dominant concern stop the damage (irreversible loss, recovery under attack; tipping off is moot). - (b) ~6-week-resident nation-state intruder, quiet exfil, no destruction → QUIET scope first, then coordinated containment everywhere at once; dominant concern avoid tipping off (loud action burns known footholds and triggers hidden ones). - (c) one laptop, single contained commodity malware → PROPORTIONATE (isolate the host, standard cleanup/rebuild); dominant concern business continuity / proportionality (no need for drama). - (d) live attacker typing on an internet-facing web server now → IMMEDIATE containment of that host (isolate, kill the session), but with awareness that a capable attacker may have other footholds — contain the active hands-on-keyboard access fast, then scope hard; dominant concern is stop the active intrusion while not assuming it is the only foothold.

13. Example response to "shut it all down right now" for a stealthy intruder with partly-mapped persistence: "Shutting everything down loudly right now will tell a patient, capable attacker that we've found them — and because we have not yet mapped all their persistence, they will burn the footholds we know about and activate the ones we don't, leaving us blind and them still inside. I propose we quietly finish scoping over the next [hours/day] — find every foothold, account, and persistence mechanism — and then contain everywhere simultaneously, so we evict them in one coordinated move rather than playing whack-a-mole. The exception is if we see them moving toward destruction or our crown jewels, in which case we contain immediately."

15. Example phishing-with-credential-harvesting playbook (decision/coordination level; 8–12 steps): 1. Intake/triage. A reported or detected phishing email; validate it is real (headers, URL, sandbox the link safely). Classify severity (SEV-3 baseline; escalate if credentials were entered or a privileged user is involved). 2. Scope the campaign. Search the mail gateway/SIEM for all recipients of the same message; identify who received, who clicked (proxy logs), and who entered credentials (auth logs on the fake-portal pattern, if observable). 3. Declare if warranted. If credentials were harvested for one or more accounts, declare an incident and assign roles (IC, scribe). 4. Contain identities. For each compromised account: force password reset, revoke active sessions and tokens, and require MFA re-enrollment; if privileged, escalate severity and notify per comms plan. 5. Contain the message. Purge the phishing email from all mailboxes; block the sender/domain/URL at the gateway, proxy, and DNS resolver; submit IOCs to detections (Ch.22). 6. Scope the foothold. If any account was used post-compromise, investigate what it accessed and whether the attacker pivoted (treat as a possible larger intrusion — pivot on the identity). 7. Eradicate. Remove any mail rules, OAuth grants, or persistence the attacker added to compromised mailboxes; close the access the harvested credentials gave. 8. Recover & verify. Confirm accounts are secured, sessions revoked, no attacker persistence remains; restore normal access. 9. Communicate. Notify affected users (and, if data was exposed, legal/GRC for any notification analysis); warn the broader workforce of the campaign and the reporting channel. 10. Lessons learned. Feed the campaign into awareness training (Ch.30) and detection tuning; capture metrics; produce action items.

16. Example runbook for "disable the compromised account and revoke its active sessions" (tool-specific; plausible tool names; the playbook says what, this says exactly how): 1. In the identity console (e.g., Entra ID / AD Users & Computers), locate the account by UPN/sAMAccountName. 2. Disable the account (set "account is disabled" / Disable-ADAccount). Approval: on-call IR lead or SOC manager; record the approver and time in the incident log. 3. Revoke active sessions and refresh tokens (e.g., "Revoke sessions" in the identity console / Revoke-AzureADUserAllRefreshToken), so existing tokens cannot continue to be used. 4. For on-prem/Kerberos exposure, reset the password twice (to invalidate cached/forgeable tickets for that principal) if warranted by scope. 5. Verify: confirm the account shows disabled; attempt (in a controlled way) to use a known session and confirm it is now rejected; check sign-in logs for any successful auth after the revocation timestamp (if there is any, escalate — the attacker may hold another credential or a non-revocable token path). 6. Record all actions, timestamps, and the verification result in the incident timeline (the scribe).

17. Example ransom-payment policy paragraph: "Meridian's default posture is not to pay ransomware demands; the bank will recover from its offline, immutable, tested backups. The decision to deviate from this default may be made only by the CEO, in consultation with the CISO, outside legal counsel, and the cyber-insurer, and only after the IR team has determined that recovery from backups is genuinely impossible. Any consideration of payment must first confirm that paying the identified actor is lawful (payments to sanctioned entities are prohibited and may themselves be a violation) and must explicitly acknowledge that payment guarantees neither a working decryptor nor the deletion of any exfiltrated data. No member of the incident-response team is authorized to make or commit to a payment decision during a response."

20. Incident-commander walkthrough of the tabletop injects: - Inject 1 (T+0): Triage (Detect & Analyze). Corroborate the EDR alerts with the failed backups — shadow-copy deletion is a ransomware precursor (ATT&CK "Inhibit System Recovery"). Classify SEV-1. Declare the incident; assign IC and scribe; open the war room and out-of-band channel; invoke the ransomware playbook. - Inject 2 (T+10): Scope + contain. A single compromised privileged account reached all three servers → presume domain-wide reach; pull the privileged inventory (Ch.19). Decision: immediate aggressive containment — isolate the three servers in EDR (powered, for evidence), disable the account and force-revoke its sessions/Kerberos tickets, block C2. Authorize in parallel. - Inject 3 (T+30): Comms + legal + the clock. Encryption confirmed; ransom note with 72-hour double-extortion threat. Run workstreams in parallel: continue technical scoping (did data leave? check egress/DLP); engage legal and the cyber-insurer (privilege; coverage); start the 36-hour determination clock and assess state breach-notification exposure if customer data left; surface the pay-or-not question but do not decide it in the room (escalate per policy). - Inject 4 (T+2h): Eradicate + recover (planning). Initial vector found (exposed admin interface + reused, no-MFA password); no confirmed bulk exfiltration. Plan eradication: rebuild the servers from known-good media, rotate all privileged creds + krbtgt double-reset, remove the interface exposure and any persistence, strip the over-privileged account's rights. Plan recovery: restore from offline, immutable backups, staged by priority, with two-week heightened monitoring. - Inject 5 (debrief — three action items): e.g., (1) remove domain-admin from the service account and enforce JIT (owner, 30 days); (2) write/validate the krbtgt-reset runbook (owner, 30 days); (3) add the out-of-band comms app and saved hunting queries to SOC onboarding (owner, 14 days).

23. (a) Narration: a service-like account (svc_report) authenticates to APPSRV03 for the first time; the foothold runs whoami /priv and enumerates Domain Admins (discovery); it then authenticates to the domain controller DC01 (lateral movement); on DC01 it uses ntdsutil ... ifm to create a copy of the Active Directory database (credential access — this dumps all domain credentials); finally it exfiltrates ~412 MB to an external host. (b) ATT&CK tactics by line: line 1 → initial foothold use; lines 2–3 → Discovery; line 4 → Lateral Movement (with Credential Access intent); line 5 → Credential Access (NTDS/AD database); line 6 → Exfiltration. (c) The AD-database copy (line 5) means every domain credential should be presumed compromised — the scope is domain-wide, and eradication must include resetting all credentials and the krbtgt account (twice). (d) Two most urgent containment actions: (1) disable svc_report, revoke its sessions, and invalidate Kerberos tickets domain-wide; (2) isolate DC01 and block the exfiltration destination — then plan the full domain credential reset.

26. The 36-hour-clock timeline (judgment required — defend your reading of "determination"): - 07:00 — an EDR alert. No "determination" yet; this is an unconfirmed signal. The clock has not started. - 09:30 — ransomware confirmed on internal servers + a ransom note. This is the most defensible point at which a "determination" that a qualifying computer-security incident has occurred is made — a confirmed compromise materially disrupting/affecting operations. Start the 36-hour federal banking notification clock here (notification due by ~21:30 the next day). In the room for this determination: the IC, the CISO, legal counsel, and GRC. Notification triggered: the primary federal regulator. - 14:00 — customer data likely but not confirmed exfiltrated. This affects the state breach-notification / customer analysis (and may sharpen the federal one), but you should not wait for certainty about exfiltration to make the banking determination — the qualifying determination was already reasonable at 09:30 based on the operational impact. Waiting for exfiltration certainty risks blowing the 36-hour window. The state/customer notifications follow their own (often longer) clocks once the data determination firms up. The key judgment: "determination" is not "complete certainty about every fact"; it is the point at which a reasonable institution concludes a qualifying incident has occurred. Treating it as "once we know everything" is the error that misses statutory deadlines.
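The deadline arithmetic is simple but worth writing down so nobody computes it by hand under pressure. In this sketch only the 09:30 and 14:00 times come from the scenario; the calendar date is hypothetical, and the naive datetimes ignore time zones and daylight-saving changes, which a real tracker would need to handle.

```python
from datetime import datetime, timedelta

NOTIFICATION_WINDOW = timedelta(hours=36)


def notification_deadline(determination: datetime) -> datetime:
    """Latest notification time, 36 hours after the determination."""
    return determination + NOTIFICATION_WINDOW


def hours_remaining(determination: datetime, now: datetime) -> float:
    """Hours left on the clock at a given moment (negative once overdue)."""
    return (notification_deadline(determination) - now).total_seconds() / 3600


determination = datetime(2025, 6, 2, 9, 30)  # hypothetical date; 09:30 per the scenario
deadline = notification_deadline(determination)
left_at_1400 = hours_remaining(determination, datetime(2025, 6, 2, 14, 0))

print(deadline.strftime("%Y-%m-%d %H:%M"))  # 2025-06-03 21:30
print(left_at_1400)                         # 31.5
```

By the time the 14:00 exfiltration question arises, 4.5 of the 36 hours are already gone, which is the practical cost of waiting for certainty.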


Chapter 25

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class. All scenarios are illustrative (Tier 3); IPs use documentation ranges.

1. Order of volatility — the rule to collect evidence from most-fragile (RAM) to least-fragile (backups) so durable collection doesn't destroy fragile evidence first. Chain of custody — the documented, unbroken record of who handled evidence, when, and why, proving it is unaltered. Write blocker — a device/mode that allows reads of an evidence drive but blocks all writes, protecting the original. Forensic artifact — data left by normal activity (registry, $MFT, logs, Prefetch) that reveals what happened. Combined: "Imaging a compromised laptop, you capture RAM first per the order of volatility, then read the disk through a write blocker so the original is never altered, analyze its forensic artifacts on the verified image, and log every handoff in the chain of custody."

4. Two reasons to analyze the image, not the original: (1) analysis can inadvertently alter data (mounting, opening files, tools writing), so you preserve the original sealed as the authoritative source that can be re-hashed at any time to prove it is untouched; (2) you may need to re-run analysis or have a second examiner reproduce it, which requires an unaltered baseline. If the only image fails its post-acquisition hash check, the image is unusable — do not analyze it; re-image from the original (still sealed), and investigate why it failed: a write-blocker failure, a failing/bad-sector drive, or a tooling error. Document the failure and the re-acquisition.

7. Ordered preservation for a live, compromised, non-destroying Windows server: 1. Confirm/establish isolation (network-contain) so it can do no further harm, but leave it powered on. 2. Capture memory first with a trusted tool to an external drive (most volatile — processes, connections, injected code, in-memory secrets). 3. Document live volatile state (running processes, network connections, logged-on users) — ideally derived from the capture. 4. Power down for disk acquisition (or image live if the situation requires, but cold imaging is cleaner here). 5. Image the disk through a write blocker, bit-for-bit. 6. Hash source and image (SHA-256); confirm match. 7. Start chain of custody; seal the original; analyze the image. Memory capture comes before disk imaging because memory is destroyed by power-off and the disk is not — the order of volatility. Disk imaging comes before any rebuild/recovery so the on-host evidence survives.
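Step 6 (hash source and image, confirm match) might look like the sketch below. It assumes both the source and the image are readable as ordinary files; in practice the source would be a device read through the write blocker, and a dedicated imaging tool would normally compute the hashes. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source: Path, image: Path) -> bool:
    """True only when the image is bit-for-bit identical to the source."""
    return sha256_of(source) == sha256_of(image)
```

A mismatch means the image is unusable for analysis; the digests themselves belong in the chain-of-custody record so the original can be re-hashed later.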

9. Pulling the power destroyed: the contents of RAM (running malware processes, decrypted payloads, injected code, credentials/keys held in memory), live network connections (active C2 sessions, exfiltration in progress), the ARP cache and routing state, and open files / temp data that existed only in the running system. All of items 1–4 in the order of volatility are gone, irreversibly. Coaching (two sentences): "Your instinct to act fast and freeze the scene is exactly right — that urgency is what good responders have. The fix is just the order: on a live machine we capture memory first and only then power off, because pulling the plug erases the most valuable evidence of an active intrusion before we've collected it."

12. (a) An attacker brute-forced/guessed the local administrator password over RDP: two failed RDP logons (4625, type 10) at 08:01–08:02 from 203.0.113.7, then a successful RDP logon (4624, type 10) from the same source at 08:02:31, followed by installation of a service "UpdateSvc" pointing at C:\Users\Public\u.exe (a suspicious path) at 08:05, and the Prefetch shows u.exe ran. (b) The pair 4625 (failed) immediately followed by 4624 (success) from the same source, same account, same logon type, within seconds: failures followed by a success are the signature of a successful password-guessing attempt. (c) Prefetch adds proof that u.exe actually executed (run count, last-run time): the event log shows only that the service was installed/configured, whereas Prefetch shows the binary ran. (d) Controls: disable/restrict RDP exposure (no internet-facing RDP), require MFA for remote access, account lockout/rate-limiting on failed logons, and a strong local-admin password (e.g., LAPS); see Chapters 16/11.
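The failures-then-success signature from part (b) can be expressed as a small detection sketch. The records below are illustrative: the calendar date, the exact seconds on the two failures, and the account name `Administrator` are assumptions, since the exercise gives only 08:01–08:02, the source IP, and the 08:02:31 success. The function name and thresholds are likewise hypothetical.

```python
from datetime import datetime, timedelta

# Assumed record shape; only the event IDs, logon type, source IP and the
# 08:02:31 success time are taken from the exercise.
events = [
    {"time": datetime(2025, 1, 1, 8, 1, 40), "id": 4625, "type": 10,
     "src": "203.0.113.7", "user": "Administrator"},
    {"time": datetime(2025, 1, 1, 8, 2, 5), "id": 4625, "type": 10,
     "src": "203.0.113.7", "user": "Administrator"},
    {"time": datetime(2025, 1, 1, 8, 2, 31), "id": 4624, "type": 10,
     "src": "203.0.113.7", "user": "Administrator"},
]


def guessed_logons(events, window=timedelta(minutes=5), min_failures=2):
    """Return (success_event, failure_count) for each success preceded by
    enough failures from the same source, account and logon type."""
    ordered = sorted(events, key=lambda e: e["time"])
    hits = []
    for i, ev in enumerate(ordered):
        if ev["id"] != 4624:
            continue
        key = (ev["src"], ev["user"], ev["type"])
        failures = [
            prior for prior in ordered[:i]
            if prior["id"] == 4625
            and (prior["src"], prior["user"], prior["type"]) == key
            and ev["time"] - prior["time"] <= window
        ]
        if len(failures) >= min_failures:
            hits.append((ev, len(failures)))
    return hits


hits = guessed_logons(events)
for success, count in hits:
    print(success["time"], success["src"], f"{count} prior failures")
```

The window and failure threshold are tuning choices; a real rule would also need to tolerate users who mistype a password once or twice.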

14. (a) An attacker authenticated over SSH as deploy from 203.0.113.7 at 08:30, escalated to a root shell via sudo at 08:31, and established persistence via a cron job. (b) The empty .bash_history on an account that clearly ran commands indicates the history was deliberately cleared or disabled (e.g., unset HISTFILE) — anti-forensics. It is not a dead end: the absence is itself evidence of intent to hide, and the auth.log (login + sudo) and the cron entry reconstruct the activity independently of the missing history; off-host sources (NetFlow, SIEM) would corroborate further. (c) The cron entry */10 * * * * /tmp/.cache/beacon.sh is a persistence/command-and-control beacon: it runs a hidden script (in /tmp/.cache, a dot-directory) every 10 minutes, representing the persistence (and likely C2) stage of the attack.
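The reasoning in part (c), a script in a hidden directory under /tmp run on a short schedule, can be turned into a rough triage heuristic. This sketch assumes user-crontab lines (five schedule fields followed by the command); the extra temp locations `/var/tmp` and `/dev/shm` and the function name are assumptions added for illustration, not part of the exercise.

```python
from pathlib import PurePosixPath

# World-writable locations; /var/tmp and /dev/shm are assumed additions.
SUSPECT_ROOTS = ("/tmp", "/var/tmp", "/dev/shm")


def is_suspicious(cron_line: str) -> bool:
    """Flag a cron entry whose command lives in a temp or hidden location."""
    fields = cron_line.split(None, 5)
    if len(fields) < 6:
        return False  # not a complete "m h dom mon dow command" line
    command = fields[5].split()[0]
    path = PurePosixPath(command)
    in_temp = any(command.startswith(root + "/") for root in SUSPECT_ROOTS)
    hidden = any(part.startswith(".") for part in path.parts)
    return in_temp or hidden


print(is_suspicious("*/10 * * * * /tmp/.cache/beacon.sh"))
print(is_suspicious("0 3 * * * /usr/bin/logrotate /etc/logrotate.conf"))
```

A heuristic like this only narrows where to look; a legitimate job can live in an odd path, and an attacker can just as easily hide in a normal-looking one.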

16. The absence of expected evidence is itself evidence because normal system operation produces predictable traces; their disappearance requires an explanation, and the most common explanation in an intrusion is deliberate destruction. Two examples: (1) Windows Event ID 1102 — the Security log being cleared, plus an abnormal gap in otherwise continuous logging, tells you an attacker was active at that host and tried to cover tracks at that time (it narrows where and when to investigate, and confirms intent). (2) A file present in Prefetch (it ran) but absent from disk (deleted) tells you a tool was used and then removed — confirming attacker tooling and a secure-deletion/anti-forensic step, and telling you to hunt for the tool elsewhere (other hosts, the SIEM, memory).

17. Merged UTC timeline:

2025-06-01 09:03:11  VPN     VPN connect user=mwong src=198.51.100.20   <- initial access
2025-06-01 09:14:50  WinEvt  network logon svc_sql
2025-06-01 09:14:58  MFT     C:\ProgramData\p.exe created
2025-06-01 09:15:22  WinEvt  service installed "Helper" -> p.exe

Probable initial access: the VPN connect by mwong at 09:03:11 — the earliest related event, in a different system from the host artifacts, and the entry from which the host activity follows.
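The merge itself is mechanical once every source is in UTC: pool the records and sort by timestamp. A minimal sketch using the four events above (the grouping into three source lists and the name `merge_timeline` are illustrative assumptions):

```python
from datetime import datetime

# Records copied from the timeline above, grouped by the system they came from.
vpn_log = [("2025-06-01 09:03:11", "VPN", "VPN connect user=mwong src=198.51.100.20")]
host_log = [
    ("2025-06-01 09:15:22", "WinEvt", 'service installed "Helper" -> p.exe'),
    ("2025-06-01 09:14:50", "WinEvt", "network logon svc_sql"),
]
mft = [("2025-06-01 09:14:58", "MFT", r"C:\ProgramData\p.exe created")]


def merge_timeline(*sources):
    """Pool records from every source and order them by timestamp.
    Assumes all timestamps are already normalized to UTC."""
    merged = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), system, message)
        for source in sources
        for ts, system, message in source
    ]
    return sorted(merged, key=lambda row: row[0])


timeline = merge_timeline(vpn_log, host_log, mft)
for ts, system, message in timeline:
    print(ts, f"{system:<7}", message)
```

The first row of the sorted result is the candidate for initial access; normalizing time zones before the merge is the step that most often goes wrong in practice.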

19. (a) Trusting only the local log is dangerous because the attacker cleared it at 03:02; the cleared log would make the activity appear to start at 03:02, understating the intrusion and risking under-scoping (you would miss the earlier compromise and any hosts touched before 03:02). (b) Weight the off-host sources most heavily — the VPN log and NetFlow — because they live in systems the attacker on the host could not reach or alter; append-only/off-host evidence is the hardest to tamper with. (c) Corrected initial-access time: 02:02 (the VPN log's earliest related event), not 03:02 — roughly an hour earlier than the on-host log suggested, with related network activity at 02:30.

21. Scoping pivots, in order, with the indicator expected at each step: 1. Pivot on the malicious file hash → search the whole fleet (EDR/SIEM) → expect to find the file on additional hosts (e.g., WS-14, WS-22, DB-07). 2. Pivot on which account placed/ran the file → identify the compromised account (e.g., svc_backup) → expect more hosts where that account was used. 3. Pivot on where that account logged on from / the entry point → trace to the initial-access vector (e.g., a VPN session by a user account) → expect the entry IP and credential. 4. Pivot on the entry IP / credential → confirm the source (stolen credential, external IP) → expect the root cause (e.g., VPN without MFA). This process converges on the root cause (the initial-access gap) while enumerating the full scope (every host, account, and data store touched).

23. (a) Root cause: a VPN account protected by a password alone, with no multi-factor authentication, allowed an attacker with a stolen credential direct internal access, from which they pivoted and spread. (b) Rebuilding FS-01 addresses only the proximate symptom (the encrypted server); it does nothing about the entry gap, so the same stolen-credential path remains open and the bank can be breached again the same way. (c) The fix at root-cause level is phishing-resistant MFA on the VPN (and review of all remote-access/service-account paths) — the control owned by Chapter 16 (Authentication), the same class of control that saved Meridian from the Chapter 1 phishing attempt. (d) The RCA finding feeds the lessons-learned phase of incident response (Chapter 24), updates the risk register (Chapter 1's first artifact), and may reshape authentication/identity controls in the program.

26. Forensic errors in the admin's sequence (at least six), with what each destroyed/contaminated: 1. Logged in as domain admin to "look around" → created new logon artifacts, altered access times, and introduced a privileged session that contaminates the scene (and risks spreading the compromise). 2. Opened and read suspicious files → changed file access timestamps, muddying the $MFT/MACB timeline and the very files that were evidence. 3. Ran a full antivirus scan → touched/quarantined/possibly deleted evidence files and altered timestamps across the disk en masse. 4. Deleted the malware → destroyed the primary malicious artifact needed for analysis, attribution, and scoping (hash to pivot on). 5. Rebooted the server twice → destroyed all volatile memory evidence (running processes, C2 connections, in-memory keys); this is irreversible. 6. Waited 90 minutes to call the SOC → delayed preservation while state continued to change; more evidence decayed. Some damage is irreversible: the memory contents and the original timestamps cannot be recovered, and the deleted malware may be unrecoverable if also wiped. Corrected first-30-minutes procedure: do not log in or touch the box beyond what's needed to preserve; isolate it from the network (contain) but leave it powered on; capture memory with a trusted tool to external media; image the disk through a write blocker and hash source and image; start the chain of custody; engage the SOC/IR (and counsel if needed) immediately; and only then analyze the verified image, never the live original.


Chapter 26

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Policy — a high-level, mandatory statement of management's intent and direction (the what and why), board/executive-approved and technology-neutral. Standard — a mandatory, specific, testable rule that implements a policy (the how much / which), naming versions and thresholds. Procedure — a mandatory, step-by-step set of instructions for a task in conformance with a standard (the exactly how). Guideline — a recommended, non-mandatory practice (advice). The single distinguishing property: a guideline is the only non-mandatory tier — it recommends ("should/consider"), while policy, standard, and procedure all mandate ("must").

4. A control owner is the named individual or role accountable for ensuring that a specific control is designed, operated, and remains effective. It is not always the performer: at Meridian the IAM analyst runs the quarterly access review (Responsible), but the IAM team lead may be the control owner (Accountable that the review happens, is complete, and is evidenced). Accountability does not delegate the way work does because accountability is the single point of answerability (the one role that cannot say "not my job"), whereas the task can be handed to anyone. You can delegate doing; you cannot delegate being answerable for the result.

7. The six NIST CSF 2.0 Functions are Govern, Identify, Protect, Detect, Respond, Recover. Govern is new in 2.0 (CSF 1.1 had only the latter five). Its addition matters for governance because NIST thereby elevated governance — organizational context, risk-management strategy, roles and responsibilities, policy, and oversight — from an implicit assumption to a top-level function, formally recognizing that the other five functions fail without it. A framework that now leads with Govern is the field codifying this chapter's thesis: a pile of Protect/Detect controls is not a program until it is governed.

10. Example rebuttal: "A GRC platform is a filing cabinet, not governance: it can store policies, track review dates, and route approvals, but it cannot decide what your policies should say, assign a human being to own each control, or enforce anything. Governance is the process of deciding, documenting, owning, reviewing, and enforcing, which only your people can perform; the platform just holds the artifacts that process produces. Buying it before doing the process gives you an empty, well-organized cabinet, which is exactly the 'security is a process, not a product' trap."

12. A defensible RACI for "approve and publish a new security standard":

Activity: Approve and publish a new security standard
  Board:        I   (informed; the board governs policy/appetite, not each standard)
  CISO:         A   (accountable — standards are approved at the CISO/committee level)
  GRC Analyst:  R   (responsible — drafts, formats, routes, publishes, logs the review date)
  Engineer:     C   (consulted — must confirm the standard is technically implementable/testable)
  System Owner: C   (consulted — bears the operational impact of the requirement)

Reasoning: the single A is the CISO, because a standard (unlike a policy) is approved at the CISO/committee level, not the board — the board is merely Informed. The Engineer and System Owner are Consulted (their input is needed before approval so the standard is feasible and its impact understood), whereas the Board is Informed (told after, as oversight) rather than consulted, because involving the board in every standard would collapse the governance/management distinction.
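The "single A" rule lends itself to a mechanical check. The sketch below validates a RACI row held as a role-to-code mapping. The exactly-one-Accountable test follows the reasoning above; the at-least-one-Responsible test is a common RACI convention assumed here, and `raci_problems` is an illustrative name.

```python
VALID_CODES = {"R", "A", "C", "I"}

# The assignment from the table above.
raci = {
    "Board": "I",
    "CISO": "A",
    "GRC Analyst": "R",
    "Engineer": "C",
    "System Owner": "C",
}


def raci_problems(assignments):
    """Return a list of structural defects in one RACI row (empty if sound)."""
    problems = []
    invalid = sorted(role for role, code in assignments.items()
                     if code not in VALID_CODES)
    if invalid:
        problems.append(f"invalid code for: {', '.join(invalid)}")
    accountable = [role for role, code in assignments.items() if code == "A"]
    if len(accountable) != 1:
        problems.append(f"expected exactly one A, found {len(accountable)}")
    if not any(code == "R" for code in assignments.values()):
        problems.append("no role is Responsible")
    return problems


print(raci_problems(raci))                    # []
print(raci_problems({"Security Team": "R"}))  # collective owner, no A
```

A check like this catches structural defects only; whether the CISO is the right Accountable role remains a governance judgment.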

14. A defensible RACI for "accept a residual-risk exception to the hardening standard":

Activity: Accept a residual-risk exception (legacy system cannot meet the baseline)
  Board:        I   (informed; exceptions surface in board reporting, board doesn't approve each)
  CISO:         A   (accountable — owns risk-acceptance within the board's risk appetite)
  GRC Analyst:  R   (responsible — documents the exception, compensating control, time-box, register)
  System Owner: R   (responsible — co-signs the residual risk; bears the business consequence)
  Engineer:     C   (consulted — proposes/validates the compensating control)

The System Owner shares Responsibility because they are requesting the deviation and bear its business consequence, so they must co-sign the residual risk being accepted on their system — accountability for the business outcome cannot sit only with security. The CISO is Accountable because risk acceptance is a security-program decision calibrated against the board's risk appetite; the CISO is the role empowered (by the charter) to accept defined residual risk on the program's behalf and answer for it.

16. Example opening clause of Meridian's Information Security Policy (technology-neutral):

Purpose. Meridian Regional Bank is committed to protecting the confidentiality, integrity, and availability of the information and systems entrusted to it by its customers, employees, and partners. This policy establishes the Bank's intent and direction for information security and authorizes the standards, procedures, and controls that implement it. Scope. It applies to all employees, contractors, and third parties who access Bank information or systems, and to all information and systems the Bank owns or operates, regardless of location or technology. Authority. Information security is a responsibility of the Board and executive management; the Board sets the Bank's risk appetite and provides oversight, and the Chief Information Security Officer is authorized to develop, maintain, and enforce the supporting standards and procedures, and to grant or deny exceptions in accordance with that risk appetite. Compliance is mandatory.

The test: no algorithm, length, product, or version appears, so nothing here will be wrong in five years; every specific lives downstairs in a standard. The "Authority" sentence doubles as the security charter grant.

19. Rewrite — durable intent stays in the policy; specifics move down:

  • Policy (keep, generalized): "Meridian protects its information assets. Workstations must lock automatically when left unattended; authentication and endpoint protection appropriate to the risk are required." (Durable intent; technology-neutral.)
  • Standard (moved): "Passwords must be at least 14 characters and checked against a breach corpus." This belongs in the Authentication Standard. (Note: a fixed-length 'exactly 12, changed every 90 days' rule is itself outdated guidance; modern standards prefer length + breach-checking over forced rotation.)
  • Standard (moved): "Endpoint protection: the approved EDR agent, current supported version." — the specific product/version (CrowdStrike Falcon v7+) belongs in an Endpoint Protection Standard, never the policy, because a vendor change must not require board re-approval.
  • Procedure (moved): "To lock a workstation, press Win+L before leaving the desk." — the keystroke is a procedure/user-instruction, not policy.

Tell that the original was mis-tiered: each moved item is something a vendor, version, or keystroke change would force you to edit — the §26.2 signal that content is too specific for the policy tier.

21. Mapping controls to CSF Functions:

  Asset inventory (Ch.1)        -> IDENTIFY
  Phishing-resistant MFA (Ch.16)-> PROTECT
  Centralized SIEM (Ch.21)      -> DETECT
  IR plan + playbooks (Ch.24)   -> RESPOND
  (no control defines roles)    -> GOVERN  <-- THE GAP

GOVERN is the Function left with no control — which is exactly the gap the chapter's opening examiner found and the policy_coverage checkpoint surfaced (the bank had Protect/Detect/Respond controls and no governance of roles). Coverage of this five-Function slice is 80% with GOVERN as the named gap; the fix is the governance structure, RACI, and policy set this chapter builds.

22. (non-daggered, but the computation:) controls cover {PR.AA, DE.CM, RS.MA}; framework [GV.RR, ID.AM, PR.AA, DE.CM, RS.MA]. Covered (in framework order) = PR.AA, DE.CM, RS.MA (3 items); gaps = GV.RR, ID.AM (2 items); pct = 100 × 3 / 5 = 60.0%.
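The same computation as a short Python sketch. The function name `coverage` and its return shape are assumptions for illustration and are not claimed to match the chapter's own checkpoint code.

```python
framework = ["GV.RR", "ID.AM", "PR.AA", "DE.CM", "RS.MA"]
controls = {"PR.AA", "DE.CM", "RS.MA"}


def coverage(framework, controls):
    """Split framework items into covered and gaps (framework order kept)
    and return the covered percentage."""
    covered = [item for item in framework if item in controls]
    gaps = [item for item in framework if item not in controls]
    pct = 100 * len(covered) / len(framework)
    return covered, gaps, pct


covered, gaps, pct = coverage(framework, controls)
print(covered)  # ['PR.AA', 'DE.CM', 'RS.MA']
print(gaps)     # ['GV.RR', 'ID.AM']
print(pct)      # 60.0
```

Iterating over the framework list rather than the control set is what preserves framework order in both outputs.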

23. Both facts are true because coverage measures whether a control is mapped to each framework item, not whether that control is effective. A program can map a written control to every framework item (100% coverage — the floor: no topic was forgotten) while those controls are misconfigured, unowned, stale, or simply not operating well (no ceiling: effectiveness). Coverage answers "did we address each required topic at all?"; it does not answer "do our controls actually work?" This is theme 5 — compliance is the floor, not the ceiling — in one metric: 100% coverage is necessary but never sufficient, which is why mature programs report coverage and, separately, a maturity/effectiveness view.

24. Three distinct governance defects in the patching standard: 1. Last reviewed 41 months ago → lifecycle stage 5 (review & maintain) failure: the document is stale; nobody re-confirmed it against current threats/law. (An auditor flags this directly.) 2. Permits a 180-day SLA for critical vulnerabilities → the content is now dangerously lax (a consequence of the stale review); critical vulns are often exploited within days. This is the defect an attacker most directly benefits from — the stale standard blesses leaving critical holes open for half a year, so an engineer "complying" is exposed exactly as long as the attacker needs. 3. Owner field reads "Security Team" → an ownership/RACI failure: collective ownership is no ownership (zero real Accountable), the orphaned-control pattern; there is no single name to drive the review or answer for the SLA. The attacker benefits most from #2 (the lax SLA), but #1 and #3 are why #2 was allowed to persist — governance defects compound.

27. The phantom control. The document is fine; the governance around it is the problem. A Vendor Offboarding Procedure that exists on paper but cannot answer the four questions is not a real control because: - (1) No named owner → orphaned control (no Accountable in the RACI); nobody is answerable that it runs. Fix: name a control owner (e.g., a Vendor Access Owner) — exactly one Accountable. - (2) No review date → lifecycle stage 5 failure; the procedure may be stale and no one is keeping it current. Fix: set and track a review cadence with a next-review date. - (3) No evidence of execution → lifecycle stage 4 (implement/enforce) failure; a procedure not executed (and not evidenced) is indistinguishable from one that doesn't exist. Fix: run it on a defined trigger (contract termination) and on a schedule, recording evidence each time. - (4) No parent standard/policy → the document doesn't trace up the hierarchy, so it has no authority and isn't tied to a mandate. Fix: link it to a Vendor Access Management Policy and a deprovisioning standard. Minimum fixes to make it real: assign a single Accountable owner; tie execution to a trigger plus schedule with retained evidence; set a review cadence and date; and connect the procedure to a parent standard and policy. Note that not one of these fixes is "rewrite the document" or "buy a tool" — the phantom is a governance vacuum around a perfectly good page, which is precisely the Case Study 2 breach in miniature.


Chapter 27

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class. Calculation results are also reproduced in code/exercise-solutions.py. All figures illustrative.

1. Risk management is the continuous, organization-wide program of identifying, analyzing, treating, and monitoring risk. A risk assessment is a point-in-time activity within that program that identifies and analyzes risks to produce a prioritized picture. Risk management is continuous; the assessment is point-in-time.

4. Inherent risk is the level of risk before any controls are applied (the raw exposure); residual risk is the risk that remains after controls. Naming both makes a treatment decision auditable because it shows how far the treatment moved the risk and what was knowingly left on the table — an examiner or board wants to see not just the end state but the magnitude of the reduction and the accepted remainder.

7. AV \$1,500; EF 1.0 (theft loses the full value, data included); ARO 8/year. $SLE = 1{,}500 \times 1.0 = \$1{,}500$. $ALE = 1{,}500 \times 8 = \$12{,}000$/year.

9. Compromise risk on a \$2,000,000 DB. Before: EF 0.30, ARO 0.5. $SLE_{before} = 2{,}000{,}000 \times 0.30 = 600{,}000$; $ALE_{before} = 600{,}000 \times 0.5 = \$300{,}000$/yr. After MDR (EF→0.10, ARO→0.1): $SLE_{after} = 2{,}000{,}000 \times 0.10 = 200{,}000$; $ALE_{after} = 200{,}000 \times 0.1 = \$20{,}000$/yr. Net value $= 300{,}000 - 20{,}000 - 120{,}000 = +\$160{,}000$/yr. Buy it — it cuts expected loss by \$280K and costs \$120K, a net annual benefit of \$160K.
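The SLE/ALE arithmetic in solutions 7 and 9 reduces to three small functions. This is a sketch with illustrative function names; it is not claimed to match the contents of code/exercise-solutions.py.

```python
def sle(asset_value: float, exposure_factor: float) -> float:
    """Single loss expectancy: asset value times the fraction lost per event."""
    return asset_value * exposure_factor


def ale(asset_value: float, exposure_factor: float, aro: float) -> float:
    """Annualized loss expectancy: SLE times expected events per year."""
    return sle(asset_value, exposure_factor) * aro


def net_value(ale_before: float, ale_after: float, annual_cost: float) -> float:
    """Annual benefit of a control: loss avoided minus what the control costs."""
    return ale_before - ale_after - annual_cost


# Solution 7: laptop theft.
laptop_ale = ale(1_500, 1.0, 8)

# Solution 9: database compromise, before and after the MDR service.
before = ale(2_000_000, 0.30, 0.5)
after = ale(2_000_000, 0.10, 0.1)
net = net_value(before, after, 120_000)

print(f"laptop ALE: ${laptop_ale:,.0f}/yr")
print(f"DB ALE before: ${before:,.0f}/yr, after: ${after:,.0f}/yr")
print(f"net value of MDR: ${net:,.0f}/yr")
```

Keeping the control's cost as an explicit argument makes it harder to report the loss reduction (\$280K here) as if it were the net benefit.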

11. Risk X: $ALE = 50{,}000 \times 6.0 = \$300{,}000$. Risk Y: $ALE = 900{,}000 \times 0.25 = \$225{,}000$. (a) X = \$300K, Y = \$225K. (b) prioritize() sorts by ALE descending, so X first, despite Y's far larger per-event loss. (c) Both were qualitatively HIGH, hiding their difference; the quantitative view shows X (frequent, small) carries \$75K more expected annual loss than Y (rare, severe). The quantitative view breaks the tie the equal bands obscured — though a prudent analyst also notes Y's single-event severity (\$900K) is a tail risk worth a separate look, since one Y event could exceed a year's worth of X.
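Part (b) refers to prioritize() sorting by ALE descending. The chapter's own implementation is not reproduced here, so the sketch below is an assumption about its shape; the `Risk` class and the second, severity-ordered view are illustrative additions that make the tail-risk caveat in (c) concrete.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    sle: float  # loss per event
    aro: float  # expected events per year

    @property
    def ale(self) -> float:
        return self.sle * self.aro


def prioritize(risks):
    """Order risks by expected annual loss, largest first (assumed behavior)."""
    return sorted(risks, key=lambda r: r.ale, reverse=True)


risks = [Risk("Y", 900_000, 0.25), Risk("X", 50_000, 6.0)]

by_ale = prioritize(risks)
by_severity = sorted(risks, key=lambda r: r.sle, reverse=True)

print([(r.name, r.ale) for r in by_ale])       # X first: higher expected loss
print([(r.name, r.sle) for r in by_severity])  # Y first: larger single event
```

The two orderings disagree, which is the point of the closing caveat: ALE ranks by average annual cost and says nothing about how bad a single bad year can be.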

13. (a) Transfer — insurance shifts the financial impact to a third party. (b) Avoid — declining the product eliminates the risk by not undertaking the activity. (c) Mitigate — MFA applies a control that lowers likelihood. (d) Accept — a documented, owned, time-bounded decision to live with a low residual risk.

16. The four things that would have made the insurer's silent acceptance defensible, each as a register sentence: 1. Explicit decision: "We accept the risk of running the unsupported claims-portal framework until its scheduled replacement on [date]." (It was never stated as a decision.) 2. Documented + owned: "Risk owner: [named executive with budget authority]; recorded in the risk register as R-PORTAL-EOL." (No record or owner existed.) 3. Compensating controls: "Interim controls: WAF in front of the portal, enhanced monitoring, virtual patching of known framework CVEs." (None were added, so residual = full inherent risk.) 4. Time-bounded review + trigger: "Review monthly, and immediately upon publication of any relevant CVE for the framework." (No review date existed, so the scanner finding three months out was never acted on.)

18. Example enterprise register rows (new risks; format follows §27.5 — judgments vary): 1. R8 — Cloud storage misconfiguration exposes data. Threat: opportunistic scanners / external attackers. Vuln: an over-permissive S3 bucket policy in the AWS footprint. Asset: customer documents in cloud storage. Inherent: L3 × I5 = 15 CRITICAL. Treatment: mitigate — CSPM guardrails + bucket policy remediation + block-public-access. Residual: L1 × I5 = 5 MEDIUM. Owner: Head of Cloud Engineering. Status: in treatment, due Q3. Review: quarterly. 2. R9 — Ransomware encrypts on-prem file servers. Threat: RaaS affiliate. Vuln: flat internal network + incomplete EDR coverage. Asset: shared file servers / operations. Inherent: L3 × I5 = 15 CRITICAL. Treatment: mitigate (segmentation, EDR, tested backups) + transfer (cyber-insurance tail). Residual: L2 × I4 = 8 HIGH. Owner: Head of IT Infrastructure. Status: in treatment. Review: quarterly. 3. R10 — Insider exfiltrates customer data before departure. Threat: departing employee. Vuln: broad standing access + no egress DLP. Asset: customer PII. Inherent: L2 × I5 = 10 HIGH. Treatment: mitigate — least-privilege + DLP + offboarding access removal. Residual: L1 × I5 = 5 MEDIUM. Accept residual. Owner: Head of HR + CISO (joint). Status: in treatment. Review: semi-annually.

21. Complete register row for the DDoS risk (§27.3 worked example): - ID: R6-ddos-outage. - Description: Extortion-driven DDoS attack takes the online-banking platform offline, causing an outage, revenue loss, and customer attrition. - Asset: Online-banking availability. - Threat/vuln: DDoS crews / limited volumetric-attack absorption at the edge. - Inherent: L4 × I4 = 16 CRITICAL; ALE \$2,000,000/yr (SLE \$1,000,000 × ARO 2.0). - Treatment: Mitigate with a managed DDoS/CDN service (net value +\$1.65M/yr) + transfer the catastrophic tail via cyber-insurance. - Residual: ALE ≈ \$100,000/yr (SLE \$200,000 × ARO 0.5), within the "≤ \$150K/yr availability" tolerance — accept the residual. - Owner: Head of Digital Banking + CIO. - Status/due: In treatment; service live by Q3. - Review: quarterly, and on any active DDoS-extortion campaign targeting regional banks.

23. Example regional-hospital risk-appetite statement (abridged; appetite + tolerance per category): - Patient-safety systems (clinical devices, monitoring): appetite Very low. Tolerance: zero tolerance for any unmitigated risk that could affect patient safety or device availability during care; any such risk is escalated to the CMO and CIO immediately. - Patient data / privacy (PHI, HIPAA): appetite Very low. Tolerance: no unmitigated PHI risk above MEDIUM; any breach-event ALE > \$250K triggers board/compliance notification. - Clinical-system availability (EHR, scheduling): appetite Low. Tolerance: planned downtime only in maintenance windows; unplanned outage risk with residual ALE > \$200K/yr requires CIO sign-off. - Research / innovation (new digital-health pilots): appetite Moderate. Tolerance: new risk accepted only with a named owner, compensating controls, and a 90-day review. Justification: patient-safety and privacy appetites are far lower than research/innovation because the harm is irreversible (safety) or severely regulated and trust-destroying (privacy), whereas an innovation pilot's downside is contained and reversible.

26. Board-level rewrites (money/customers/regulation/decision, not the technical detail): (a) "Our remediation backlog includes a small number of high-severity issues on systems that touch customer data; the most urgent, if exploited, could lead to a data breach with regulatory and reputational cost. We have prioritized the top items by risk and request approval to accelerate three fixes this quarter." (b) "An internet-facing system handling card data uses outdated encryption that puts us out of step with PCI requirements and exposes card data in transit; we recommend remediation within 30 days to avoid a compliance finding and reduce breach risk." (c) "Our monitoring currently cannot detect a key class of attacker behavior — the lateral movement an intruder uses after a breach — which means we could miss an attacker already inside. We propose closing this gap as a priority detection investment."

30. Rapid risk assessment of a Friday-afternoon KEV. Inherent risk: an internet-facing system with an actively-exploited, critical (CVSS 9.8) vulnerability — likelihood is now near-certain (it is being exploited in the wild), impact severe → inherent CRITICAL. Treatment options tonight: (1) Mitigate with compensating controls that do not require the risky patch — a WAF/virtual-patch rule for the specific exploit, tightened firewall/ACL exposure, or temporarily taking the service offline if business permits; (2) plus a documented short-term acceptance of the residual until the patch can be tested and deployed in a controlled window, with a hard deadline (e.g., patched by Monday EOD). Register entry: a new row with the inherent CRITICAL rating, the compensating control applied tonight, the residual rating after it, a named owner, and a review/patch deadline. Tell leadership in business terms: "An actively-exploited flaw affects an internet-facing system; we have applied an interim control to reduce the risk tonight and will fully patch in a controlled window by Monday; here is the residual risk in the interim." (Full vulnerability-prioritization method — CVSS + KEV + EPSS + asset context — is Chapter 23.)

32. The cooked quantitative analysis. (a) Vendor's math: $SLE = 10{,}000{,}000 \times 0.8 = 8{,}000{,}000$; $ALE = 8{,}000{,}000 \times 1.0 = \$8{,}000{,}000$/yr; claimed net = \$8M − \$0 − \$0.5M = +\$7.5M. (b) Three problems: (i) the inputs are inflated and unsupported — EF 0.8 and ARO 1.0 are suspiciously round, worst-case figures presented as fact with no basis; (ii) the "reduces ARO to 0.0" claim is false — no control eliminates a risk, so the residual ALE is treated as zero when it is not; (iii) the analysis ignores residual risk entirely, so the "savings" equal the entire inflated ALE. (c) Defensible redo: keep EF 0.8 but use a credible ARO (say 0.3) and a control that reduces ARO to 0.1, not 0: $ALE_{before} = 8{,}000{,}000 \times 0.3 = \$2{,}400{,}000$; $ALE_{after} = 8{,}000{,}000 \times 0.1 = \$800{,}000$; net $= 2{,}400{,}000 - 800{,}000 - 500{,}000 = +\$1{,}100{,}000$/yr. The product may still be worth it, but its real net value is roughly a seventh of the vendor's claim — and the inflated inputs and "reduces to zero" assumption are exactly the false-precision traps §27.2 warns about. The lesson: a confident dollar figure is the most persuasive and most dangerous kind of input; always interrogate the EF, the ARO, and the residual.
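Running both sets of inputs through the same formula makes the gap between (a) and (c) easy to see. A sketch under the solution's own figures; the scenario labels and the helper name `control_net_value` are illustrative.

```python
def control_net_value(asset_value, ef, aro_before, aro_after, annual_cost):
    """Net annual value of a control that changes only the ARO."""
    single_loss = asset_value * ef
    ale_before = single_loss * aro_before
    ale_after = single_loss * aro_after
    return ale_before - ale_after - annual_cost


scenarios = {
    # label: (asset value, EF, ARO before, ARO after, annual cost)
    "vendor claim": (10_000_000, 0.8, 1.0, 0.0, 500_000),
    "defensible redo": (10_000_000, 0.8, 0.3, 0.1, 500_000),
}

results = {label: control_net_value(*inputs) for label, inputs in scenarios.items()}
for label, value in results.items():
    print(f"{label:<16} net = ${value:,.0f}/yr")

ratio = results["defensible redo"] / results["vendor claim"]
print(f"redo is {ratio:.0%} of the vendor's figure")
```

Only the two ARO inputs differ between the scenarios, yet the result falls to roughly a seventh of the claim, which shows how sensitive the output is to the least verifiable inputs.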


Chapter 28 — Worked Solutions to Daggered (†) Exercises

Full solutions to the daggered exercises in exercises.md. The non-daggered exercises are answered against the chapter's worked models (the crosswalk in §28.4, the audit-readiness workflow in §28.5) and the code in code/exercise-solutions.py. Throughout, remember the second question: not just "does this pass?" but "does this defend?"


A2 † — Match obligation to data/activity; name an org subject to all three

  • PCI-DSS → triggered by storing, processing, or transmitting payment-card (cardholder) data.
  • HIPAA → triggered by handling U.S. protected health information (PHI) as a covered entity or business associate.
  • GDPR → triggered by processing the personal data of people in the EU.

An organization subject to all three at once: a telehealth company that takes card payments and serves EU patients — it handles PHI (HIPAA), processes EU residents' personal data (GDPR), and accepts card payments (PCI-DSS). (Other valid answers: an EU-serving pharmacy chain with online card payments; a multinational health insurer with a payment portal.)


A5 † — Certification vs. attestation, in plain language

Plain-language version for an executive: "A certification is like a driver's license — an accredited body tested us against a standard and issued a pass/fail credential; we either have ISO 27001 or we don't. An attestation is like a home inspector's report — an independent CPA examined our controls, wrote down exactly what they checked, noted anything they found, and gave their professional opinion. Our SOC 2 is that report. A customer reading our certificate just sees 'passed.' A customer reading our SOC 2 report can see the details and any exceptions and decide for themselves whether they're comfortable."

Why an attestation can be more informative: a certificate compresses everything into a binary pass/fail, hiding the detail; an attestation report exposes the scope examined, the criteria covered, the testing performed, and any noted exceptions — so a careful reader gets a richer, more honest picture and can judge whether the controls fit their risk, rather than trusting a single stamp.


B1 † — Crosswalk row for "cardholder data encrypted in transit (strong TLS)"

CONTROL: All cardholder data is encrypted in transit using strong TLS.

  Framework      Requirement AREA satisfied (describe area; no invented numbers)
  ──────────     ────────────────────────────────────────────────────────────
  NIST CSF       Protect — Data Security: data-in-transit is protected
  ISO/IEC 27001  Cryptography / secure transmission of information
  PCI-DSS        Encrypt cardholder data when transmitted across open/public nets
  SOC 2          Confidentiality (and Security): protected data in transit

EVIDENCE/ARTIFACT (one artifact for all four):
  A current TLS configuration scan report for the in-scope endpoints showing
  strong protocol/cipher settings (no weak protocols), plus the certificate
  inventory and the list of endpoints the scan covered.

Note the single artifact serves all four columns — that is the crosswalk's whole payoff. Also note the difference in the PCI cell: PCI is specifically concerned with cardholder data across open/public networks, a narrower framing than the others' general "data in transit." Reading that difference is how you catch a seam (e.g., an internal-only segment you assumed was safe).


B3 † — Find the two problems with the junior analyst's crosswalk

CONTROL: "MFA is enabled for some applications."          <-- problem 1
  ... all cells marked ✓ ...
EVIDENCE: "We sent an email telling everyone to turn on MFA."  <-- problem 2

Problem 1 — the control is partial and the crosswalk hides it. "MFA for some applications" is marked "✓" against PCI-DSS ("strong authentication into the CDE") and HIPAA. But if MFA is not enforced on the specific in-scope systems (the CDE; the systems touching PHI), the control does not actually satisfy those requirements. A crosswalk that marks a partial control "✓" produces a false sense of coverage — the §28.4 danger that a fully-✓ table can describe an insecure system. The fix: the cell should be "✓" only where MFA is enforced on the in-scope systems, and "partial/gap" otherwise.

Problem 2 — the "evidence" proves nothing. "We sent an email telling people to turn on MFA" is not evidence that MFA operates; it is evidence that a request was made. An auditor credits what you can show the control doing, not what you asked users to do. The real artifact is a configuration export showing MFA enforced (not optional) plus a sign-in-log sample showing it actually challenged users, with no bypass — design and operation.

Together these are the two classic crosswalk failures: marking a partial control complete, and confusing an intention for an artifact.


C1 † — Define scope; why "in needs evidence, out must be provably out"

Scope (compliance sense) is the precisely defined boundary of the people, processes, systems, and data to which a compliance obligation applies.

Why the operational core is "everything in scope must have evidence, and everything out must be provably out": an audit is, mechanically, a check that every in-scope requirement is met and that the in-scope set is correctly bounded. If something in scope lacks evidence, it is an automatic shortfall. If something is claimed out of scope but is actually connected or data-bearing, the claim is false and becomes a finding — and, worse, in a post-breach examination an indefensible out-of-scope claim is exactly the kind of thing that turns a bad day into a catastrophe. So scope work is two jobs at once: gathering evidence for what is in, and proving the boundary of what is out (network diagram, data-flow map, segmentation test). A boundary asserted but not proven is not a boundary.


C3 † — Is the "no card numbers" analytics database really out of scope?

No — it is in scope as described. PCI-DSS scope includes systems that store/process/transmit cardholder data and any system connected to or able to affect the security of those systems. The analytics database pulls nightly from the payment system over an open path, and the payment system trusts it — so the analytics database can reach and affect the CDE. The absence of card numbers on the analytics database does not remove it from scope; the connectivity pulls it in. An attacker who compromises the analytics database has a trusted path toward the payment system.

To legitimately put it out of scope, change the architecture so it can no longer reach or affect the CDE. Concretely: cut the direct open path; replace the nightly pull with a one-way, tokenized or de-identified feed (so no cardholder data and no inbound trust crosses the boundary); place a default-deny control between the segments; and prove the isolation with segmentation testing. Once there is no path by which the analytics database can affect the CDE, it is genuinely out of scope — and, not coincidentally, no longer a bridge an attacker can use. The scope reduction and the security fix are the same action.
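One way to reason about the before-and-after architectures is as a reachability question. The sketch below is a simplification under stated assumptions: it models only which systems can initiate connections toward the CDE, the system names other than the payment system and analytics database are hypothetical, and a real scoping exercise also weighs factors this graph does not capture (shared services, administrative access, and how an assessor treats systems the CDE connects out to).

```python
from collections import deque


def in_scope(cde: set[str], can_reach: dict[str, set[str]]) -> set[str]:
    """Return the CDE plus every system with a network path toward it.

    can_reach[a] is the set of systems that `a` can initiate connections to.
    """
    # Invert the edges so the search can walk backwards from the CDE.
    reached_by: dict[str, set[str]] = {}
    for src, dsts in can_reach.items():
        for dst in dsts:
            reached_by.setdefault(dst, set()).add(src)

    scope = set(cde)
    queue = deque(cde)
    while queue:
        node = queue.popleft()
        for src in reached_by.get(node, set()):
            if src not in scope:
                scope.add(src)
                queue.append(src)
    return scope


CDE = {"payment-system"}

# Before: the analytics database pulls nightly from the payment system.
before = {"analytics-db": {"payment-system"}, "brochure-site": set()}

# After: the payment system pushes a tokenized feed to a drop point
# ("token-feed", hypothetical) and the analytics database reads from there.
after = {
    "payment-system": {"token-feed"},
    "analytics-db": {"token-feed"},
    "brochure-site": set(),
}

scope_before = in_scope(CDE, before)
scope_after = in_scope(CDE, after)
print(sorted(scope_before))
print(sorted(scope_after))
```

The graph is only the claim; the segmentation test is what proves the edges in `after` are the real ones.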


D1 † — One "operated over time" artifact per claimed control

  • (a) "We patch critical vulnerabilities within 30 days." → A report or ticket export covering the last several months showing, for each critical vulnerability, the discovery date and the remediation date, demonstrating the 30-day SLA was met (with any exceptions documented). (A patch policy alone shows only design.)
  • (b) "We require MFA for admin access." → A sign-in / authentication log sample over a defined window showing admin sign-ins were challenged for MFA with no successful bypass, alongside the enforcement-policy export. (The policy alone shows only design.)
  • (c) "We review access quarterly." → The completed access-review records for the last several quarters, each with reviewer name, date, the accounts reviewed, and the decisions (kept/revoked). (A review procedure alone shows only design.)
  • (d) "We back up the core database nightly." → Backup job logs over a period showing successful nightly completion, plus a record of at least one successful test restore (a backup you have never restored is an untested assumption, not a proven control).

The pattern: in every case the operated-over-time artifact is a record produced by the control running repeatedly, not the document that describes the control's intent.
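For item (a), the "operated over time" check amounts to comparing two dates per ticket. The sketch below assumes a hypothetical ticket export; the ticket ids, dates, and the `sla_breaches` helper are invented for illustration and do not come from the chapter.

```python
from datetime import date

SLA_DAYS = 30

# Hypothetical ticket export: (ticket id, discovered, remediated or None if open).
tickets = [
    ("VULN-101", date(2024, 1, 3), date(2024, 1, 20)),
    ("VULN-102", date(2024, 2, 10), date(2024, 3, 25)),
    ("VULN-103", date(2024, 3, 1), None),
]


def sla_breaches(records, as_of, sla_days=SLA_DAYS):
    """Return ids of tickets remediated late, or still open past the SLA."""
    late = []
    for ticket_id, discovered, remediated in records:
        # An open ticket is measured up to the report date.
        end = remediated if remediated is not None else as_of
        if (end - discovered).days > sla_days:
            late.append(ticket_id)
    return late


breaches = sla_breaches(tickets, as_of=date(2024, 4, 15))
print(breaches)
```

Each breach returned here is one that needs a documented exception before the export can serve as evidence.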


D3 † — Designed vs. operated; why Type II demands operated

Designed evidence shows a control is set up correctly at a moment: the policy exists, the configuration is in place, the tool is enabled. Operated evidence shows the control actually ran consistently over time: a log of instances, a quarter of records, a year of tickets. A control can be perfectly designed and never run — which proves almost nothing about real protection. A SOC 2 Type II review specifically assesses operating effectiveness over a period (commonly 6–12 months), so it demands operated evidence; this is exactly why Type II is the report enterprises trust and Type I is not enough.

For "departing employees are deprovisioned within 24 hours":

  • Designed: the written offboarding procedure and a screenshot of the automated deprovisioning workflow's configuration.
  • Operated: a record of the last several months of actual departures showing, for each, the termination time and the access-revocation time, demonstrating the 24-hour target was met.


E1 † — Define gap assessment; gaps vs. findings as the whole strategy

A gap assessment is a self-run comparison of your current controls against a target framework's requirements, producing a list of shortfalls (gaps) you remediate before a formal external audit.

Why it is the highest-leverage prep activity, in gap-vs-finding terms: a gap is a shortfall you found internally — a remediation to-do you own and can fix on your own timeline, invisible to your audit record. A finding is that same shortfall discovered by the auditor and recorded against your result. The two have opposite consequences. The entire art of surviving an audit reduces to converting findings into gaps by getting there first: every shortfall you discover and fix in a self-run gap assessment is one that will never appear as a finding. You cannot control whether shortfalls exist (they always do); you can control whether you or the auditor finds them first.


E3 † — Remediation items for the E2 gaps; route them into the risk register

From E2, the gaps/partials and their remediation + routing:

| Gap/partial (from E2) | Remediation item | Routes to |
| --- | --- | --- |
| MFA: admin console password-only | Enforce MFA on the admin web console (or retire the legacy path) | Risk register: likely CRITICAL (a real bypass into a sensitive system) |
| Logs: 90 days vs. 12-month requirement | Extend log retention to meet the requirement; verify | Risk register: MEDIUM (compliance gap, lower likelihood of harm) |
| PII at rest: backups to unencrypted bucket | Enable and verify encryption on the backup bucket | Risk register: CRITICAL (a full unprotected copy of the data) |
| (Quarterly access reviews: covered) | File the signed records as evidence | No register row: it is covered |

Why route gaps into the risk register (Ch.27) rather than a separate compliance to-do list: it is the more mature design because (1) it forces each gap to be scored (likelihood × impact), so they get prioritized against all the organization's risks on the same terms rather than as an undifferentiated checklist; (2) it inherits the register's governance — an owner, a due date, and a sign-off for any accepted risk; and (3) it prevents the "compliance silo" failure where compliance gaps live in a parallel universe disconnected from the security team's actual risk priorities. The gap stops being "a thing the auditor wants" and becomes "a tracked, prioritized risk to the business" — which is both more honest and more likely to get fixed.


F1 † — Policy statement capturing floor-vs-ceiling

Compliance and Security Posture Policy (excerpt). Meridian Regional Bank treats applicable laws, regulations, and contractual security standards — including PCI-DSS, the GLBA Safeguards Rule, and others as they apply — as the minimum required baseline for the protection of its systems and its customers' information. Compliance with these obligations is mandatory and non-negotiable, but it is explicitly understood to be a floor and not a ceiling: where the bank's risk assessment determines that the residual risk after merely-compliant controls exceeds the bank's risk appetite, the bank will implement controls that exceed the compliance requirement. Security decisions are therefore driven by the bank's risk-management process, with compliance obligations as one input among several, and the effectiveness of a control against realistic threats — not merely its existence on an audit checklist — is the standard by which it is judged.

This is board-ready because it (1) commits to compliance unambiguously, (2) states plainly that compliance is the minimum, and (3) names the risk process as the engine for going higher — the exact §28.6 synthesis.


G1 † — Find at least three problems with the compliance claim

"We are fully secure because we passed our PCI-DSS assessment in January. Our scope is the payment system. We screenshot our firewall rules each January for the auditor. Our SOC 2 covers Availability only. We store full card numbers for chargeback convenience."

At least five problems, any three of which suffice:

  1. "Fully secure because we passed PCI-DSS" — the core floor-as-ceiling error. PCI-DSS compliance protects cardholder data within the CDE; it says nothing about the security of everything else, and passing an assessment is not being secure (§28.6).
  2. Scope is "the payment system" — too narrow and likely indefensible. PCI scope includes everything connected to or able to affect the payment system, not just the payment system itself. If anything else can reach it, the scope claim is false (the §28.3/C3 trap).
  3. "Screenshot firewall rules each January" — point-in-time evidence with eleven months of drift. A once-a-year screenshot proves the rules existed for one moment; it is exactly the drift problem (§28.6). PCI-DSS v4.0's emphasis on continuous, risk-based security exists precisely to counter this.
  4. "SOC 2 covers Availability only" — a misleading assurance. A SOC 2 limited to Availability tells a customer nothing about Security or Confidentiality; presenting it as general assurance is the floor-as-ceiling mistake from the vendor side (the §28.2 / Case Study 2 reading-the-report problem).
  5. "Store full card numbers for chargeback convenience" — the opposite of scope reduction, and a major liability. Storing full card numbers you do not need expands the CDE and the attack surface and likely runs against PCI-DSS's data-retention expectations. The correct move is to not store data you do not need (scope reduction), which shrinks both audit burden and breach impact.

A strong answer also notes the through-line: nearly every problem is a version of mistaking a narrow, point-in-time, existence-based compliance result for actual, continuous, effective security.


Chapter 29

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Third-party risk — risk taken on through a direct external relationship (a vendor/contractor) with access to your data, systems, or facilities, or that you depend on. Supply chain risk — risk that a component/product you incorporate is compromised, tampered with, or flawed before it reaches you. Fourth-party risk — risk from your vendors' vendors (sub-processors/dependencies), with no direct contract. Concentration risk — risk from outsized dependence on a single provider/product, so one failure is systemic. Combined sentence: "A bank that relies on one core-banking vendor (a third party) running on one cloud provider (a fourth party) faces concentration risk because that single stack is a systemic point of failure, and supply chain risk because a tampered update to the core software would import the compromise directly into the bank."

4. Software provenance is the verifiable record of where an artifact came from — what source it was built from, by what process, and that it hasn't been altered since. An SBOM is necessary but not sufficient because it enumerates what components are inside the software (answering Log4Shell's "where is the vulnerable library?") but says nothing about whether the artifact was tampered with during the build (the SolarWinds problem) — a malicious build produces an artifact whose component list looks perfectly legitimate. The framework that addresses the gap is SLSA (Supply-chain Levels for Software Artifacts), focused on build provenance.

7. Weights = 3,3,2,2,3,2,1,2 → sum 18, max possible = 18 × 4 = 72. Earned = (3·4)+(3·3)+(2·4)+(2·2)+(3·1)+(2·3)+(1·1)+(2·2) = 12+9+8+4+3+6+1+4 = 47. pct = round(100·47/72) = round(65.3) = 65%. Critical controls: Q1 (4, ok), Q2 (3, ok), Q5 SOC 2 scored 1 < 2 → FAILS. The critical-control override therefore caps the result at HIGH-RISK (critical control failed), flag = Q5 — regardless of the 65% average. Gaps to remediate before signing: Q5 (produce a current, in-scope SOC 2 Type II — this is the deal-breaker), Q4 (tighten patch SLA), Q7 (proactive sub-processor disclosure), Q8 (faster data destruction + certificate). Verdict: do not sign until the SOC 2 evidence is provided and verified; the missing independent audit on a vendor holding employee data is exactly the kind of unverified critical control the override exists to catch.
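The weighted score and the critical-control override can be reproduced in a few lines. This is a sketch using the weights, scores, critical questions (Q1, Q2, Q5), and pass mark quoted in the solution above; the function name `assess` and the shape of its return value are illustrative assumptions.

```python
WEIGHTS = [3, 3, 2, 2, 3, 2, 1, 2]   # Q1..Q8
SCORES = [4, 3, 4, 2, 1, 3, 1, 2]    # Q1..Q8
MAX_SCORE = 4                        # highest score a single question can earn
CRITICAL = [1, 2, 5]                 # question numbers treated as critical
CRITICAL_MIN = 2                     # a critical control scoring below this fails


def assess(weights, scores):
    """Return (earned, possible, pct, failed critical questions, verdict)."""
    earned = sum(w * s for w, s in zip(weights, scores))
    possible = sum(weights) * MAX_SCORE
    pct = round(100 * earned / possible)
    failed = [q for q in CRITICAL if scores[q - 1] < CRITICAL_MIN]
    # The override wins regardless of the weighted average.
    verdict = "HIGH-RISK (critical control failed)" if failed else f"{pct}%"
    return earned, possible, pct, failed, verdict


earned, possible, pct, failed, verdict = assess(WEIGHTS, SCORES)
print(earned, possible, pct, failed, verdict)
```

Keeping the override as a separate check, rather than folding it into the weights, is what stops a strong average from masking a failed critical control.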

9. Three things to require before trusting a 100% self-attestation, each motivated by "trust, but verify": (1) Independent audit evidence — a current, in-scope SOC 2 Type II or ISO 27001 certificate corroborating the controls (the questionnaire is the vendor grading its own homework; an independent auditor is a second set of eyes). (2) Direct evidence for the highest-stakes claims — e.g., a screenshot/excerpt of the MFA policy, a penetration-test executive summary, the encryption standard document (claims about critical controls deserve proof, not assertion). (3) A right-to-audit clause in the contract — the standing ability to verify over time, because a point-in-time "yes" decays and the vendor's posture drifts. The principle: a corroborated "yes" outranks an unverified one, and 100% with no evidence is a sales answer, not a security finding.

12. (a) Direct: spring-core (the root branch-portal dependsOn it). Transitive: log4j-core and commons-text (both pulled in by spring-core, not chosen by the app team). (b) log4j-core 2.17.1 is NOT vulnerable to Log4Shell: 2.17.1 ≥ 2.17.0, which is later than 2.15.0, where CVE-2021-44228 was first fixed, and includes the fixes for the follow-on Log4j CVEs that 2.15.0 left open. (c) commons-text 1.9 IS exposed to the Text4Shell-class RCE (CVE-2022-42889, fixed in 1.10.0); it was pulled in transitively via spring-core. So the app is exposed to a critical RCE through a library nobody on the team chose. (d) With the SBOM and its dependency graph, you read the exposure directly (dependsOn shows the transitive path) in seconds; without it, you would be grepping the deployed artifact and querying the vendor for days, exactly the December-2021 scramble.
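The direct/transitive split and the version checks can be sketched as a graph walk. The dictionaries below are a simplified stand-in for an SBOM's dependency section, not a real CycloneDX document, and the helper names are illustrative; the versions and thresholds are the ones used in the solution above.

```python
DEPENDS_ON = {
    "branch-portal": ["spring-core"],
    "spring-core": ["log4j-core", "commons-text"],
}
VERSIONS = {"log4j-core": "2.17.1", "commons-text": "1.9"}
# Thresholds used in the solution: at or above these, treat as fixed.
FIXED_IN = {"log4j-core": "2.17.0", "commons-text": "1.10.0"}


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def classify(root: str) -> tuple[set[str], set[str]]:
    """Return (direct, transitive) dependencies of `root`."""
    direct = set(DEPENDS_ON.get(root, []))
    seen: set[str] = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(DEPENDS_ON.get(name, []))
    return direct, seen - direct


def exposed() -> list[str]:
    """Components whose version is below the fixed threshold."""
    return sorted(
        name for name, ver in VERSIONS.items()
        if parse(ver) < parse(FIXED_IN[name])
    )


direct, transitive = classify("branch-portal")
print(sorted(direct), sorted(transitive), exposed())
```

Versions are compared as integer tuples because a plain string comparison would rank "1.9" above "1.10.0" and miss the commons-text exposure.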

14. Response to the vendor: "Your appliance's SBOM lists openssl 1.0.2k, an end-of-life branch affected by multiple known CVEs; please confirm specifically which CVEs you consider non-applicable and why — for example, whether the vulnerable code path is unreachable in your configuration — rather than a blanket 'not affected.'" The evidence that resolves it is a VEX (Vulnerability Exploitability eXchange) statement: a machine-readable assertion, per component and per CVE, of exploitability status (e.g., "not_affected" with a justification such as "vulnerable_code_not_in_execute_path"). VEX is the structured way a vendor says "yes it's in there, but here's why it can't be exploited as shipped," turning a vague verbal "not affected" into an auditable claim you can accept or challenge.
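The difference between a blanket "not affected" and an auditable claim can be illustrated with a small reviewer-side check. The record layout below is a deliberately simplified, hypothetical structure, not the schema of any real VEX format, and the CVE identifier is a placeholder; only the status and justification values are taken from the VEX vocabulary quoted above.

```python
VALID_STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}


def review_vex(statement: dict) -> list[str]:
    """Return the reasons a simplified VEX-style statement cannot be accepted as-is."""
    problems = []
    status = statement.get("status")
    if status not in VALID_STATUSES:
        problems.append(f"unknown status: {status!r}")
    if not statement.get("vulnerability"):
        problems.append("no specific CVE named")
    if status == "not_affected" and not statement.get("justification"):
        problems.append("not_affected without a justification")
    return problems


# The vendor's verbal answer, written down as data.
blanket = {"component": "openssl 1.0.2k", "status": "not_affected"}

# What the response asks for: one statement per CVE, with a reason.
specific = {
    "component": "openssl 1.0.2k",
    "vulnerability": "CVE-XXXX-NNNN",  # placeholder, not a real identifier
    "status": "not_affected",
    "justification": "vulnerable_code_not_in_execute_path",
}

print(review_vex(blanket))
print(review_vex(specific))
```

A statement that passes this check is still only a claim; it is now specific enough to accept or challenge.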

16. Example enforceable clauses (timeframes/cadences/formats make them enforceable): (a) Breach notification: "Vendor shall notify Meridian in writing within seventy-two (72) hours of confirming any security incident affecting Meridian data or the delivered software, including known scope and affected data categories, and shall provide updates no less than every 48 hours until resolved." (b) Right to audit: "Meridian may, upon 30 days' notice and not more than annually (plus after any material security event), review Vendor's security evidence, receive Vendor's current SOC 2 Type II report, and commission an independent third-party audit of controls relevant to the delivered software." (c) Data return/destruction: "Upon termination, Vendor shall return and/or securely destroy all Meridian data within thirty (30) days and provide a written certificate of destruction." (d) SBOM delivery: "Vendor shall provide a current software bill of materials in CycloneDX 1.5+ or SPDX format for each release of software delivered to Meridian, enumerating all direct and transitive components with versions and identifiers."

18. (a) Minimum-controls clause → Mitigate (reduces the likelihood/impact of a vendor-side failure by requiring defenses). (b) Cyber-insurance/indemnification clause → Transfer (shifts the financial cost of a breach to the vendor/insurer; it doesn't prevent the breach). (c) Right-to-terminate clause → Avoid (lets you exit the relationship — and thus the risk — if the vendor materially fails). (d) Documented decision to single-source the core-banking vendor despite concentration → Accept (a conscious, documented choice to bear the residual concentration risk because no viable alternative exists, ideally paired with mitigations).

21. First five actions, distinguishing your environment from vendor requests (this is the §29.5 playbook applied to a SolarWinds-class event): (1) Confirm & scope (yours): check your software/asset inventory for the affected versions — where do you run the product, with what privilege, since when? (2) Contain your side (yours): isolate/disconnect the affected product's servers from the network (they are the foothold), disable or rotate the privileged accounts the product used, and block the published C2 indicators at DNS/proxy/firewall. (3) Hunt the blast radius (yours): treat your network as potentially compromised — sweep for published IoCs, look for periodic beaconing from the product's hosts, and hunt for lateral movement and identity anomalies from the product's access path. (4) Invoke the contract (vendor): demand the breach details, affected-version confirmation, and remediation guidance the contract entitles you to. (5) Meet your obligations (yours): if your data/systems were accessed, begin your regulatory and customer notifications (the duty does not transfer to the vendor). The full IR activity — not a procurement task — is step 3, the hunt across your own environment; a privileged, in-network vendor product compromise is a full incident in your network.

23. (a) Anomalous actions: enum_admins (the monitoring account enumerating domain admins), read_share target=\\fs01\HR (reading an HR file share), and create_token (minting a token) are all far outside "read performance counters" — this is the signature of the vendor's foothold being abused for reconnaissance and lateral movement. (b) Immediate containment on your side: disable/rotate the svc-monitor service account and isolate the host at 203.0.113.10 (the monitoring product's server) from the network — sever the foothold. (c) The broader hunt is governed by threat detection and hunting (Chapter 22) — hypothesis-driven hunting, IoC matching, and behavioral detection across the environment (with network analysis from Chapter 10 for the beaconing/flow side).

25. The vendor that passed the audit — at least five gaps a thorough reviewer surfaces despite a SOC 2 and a 96% score: (1) Scope mismatch — the SOC 2 may cover the vendor's flagship product, not the specific service you're buying; confirm the report's scope matches your use. (2) Date — a SOC 2 Type II covers a past audit window and says nothing about today's posture. (3) Exceptions — the auditor's opinion may be qualified, with exceptions buried in the report body that a checklist reviewer never reads. (4) Transitive/supply chain risk — the questionnaire likely didn't probe what's inside the vendor's software (no SBOM, no provenance), so a SolarWinds- or Log4Shell-class exposure is unaddressed. (5) Sub-processor (fourth-party) risk — the vendor's own dependencies and sub-processors may be unassessed and undisclosed. (6) Concentration — even a perfect vendor can be a single point of failure; the score says nothing about replaceability or resilience. (7) What the questionnaire didn't ask — a 96% on the questions asked is silent on the questions omitted. Three conditions to attach to approval: (a) provide an SBOM per release and support provenance verification; (b) firm right-to-audit + breach notification ≤72h + sub-processor disclosure/flow-down in the contract; (c) document the concentration/exit analysis and place the vendor on annual re-assessment with continuous monitoring — approval is conditional, not unqualified.


Chapter 30

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Security awareness — the ongoing process of building a workforce's knowledge, attitudes, and behaviors to recognize and respond to threats. Security culture — the shared norms and unwritten rules that determine how people actually behave toward security. Human firewall — a trained, engaged, fast-reporting workforce that catches what automation misses. Combined sentence: "A continuous security awareness program builds the security culture in which employees become a human firewall — reporting the phishing email that the email gateway let through."

4. Just-in-time training delivers a small lesson at the exact moment of a decision or right after a risky action (it teaches in context). A nudge changes the choice environment to steer behavior without teaching or forbidding (it shapes the default path). They differ in mechanism: one provides a learning moment, the other re-engineers the decision so the safe option is the easy one. Non-chapter examples: just-in-time — a prompt that appears when a user attaches a file to an external email reminding them to confirm the recipient; nudge — defaulting "reply" to internal-only so that "reply all to external" takes a deliberate extra step.

8. received = 500, clicked = 65, submitted = 18, reported = 140. - (a) Click rate = 65/500 = 13%. - (b) Report rate = 140/500 = 28%. - (c) Submit rate = 18/500 = 3.6% (the real damage — credentials actually surrendered). - (d) Healthy: report rate (28%) exceeds click rate (13%), so detection/reporting behavior outweighs susceptibility — the direction you want. - (e) Put the report rate (or its trend) at the top, because it is the detection signal the board should care about; pair it with time-to-report. Lead with neither completion rate nor a raw click rate in isolation. (Best answer: lead with the trend of report and click rates together.)
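The three rates and the health check in (d) follow directly from the four counts. A minimal sketch, with the `rate` helper as an illustrative name:

```python
received, clicked, submitted, reported = 500, 65, 18, 140


def rate(count: int, total: int) -> float:
    """Percentage of recipients, rounded to one decimal place."""
    return round(100 * count / total, 1)


click_rate = rate(clicked, received)
report_rate = rate(reported, received)
submit_rate = rate(submitted, received)

# Healthy direction: more people report than click.
healthy = report_rate > click_rate

print(f"click {click_rate}%  report {report_rate}%  submit {submit_rate}%  healthy={healthy}")
```

All three rates share the same denominator (emails received), which is what makes the report-versus-click comparison meaningful.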

10. Time-to-report is the elapsed time (in minutes) from a phishing email landing to the first report reaching the SOC. The 15%/3-minute workforce can be safer than the 25%/40-minute one because in a real attack the campaign hits many inboxes at once; the speed of the first report determines whether the SOC can pull the message from everyone else's mailbox before the slow clickers act. A high report rate that arrives 40 minutes later may come after the damage; a fast first report — even at a lower overall rate — enables containment during the window between delivery and the first clicks. Reporting is a detection capability, and detection's value is dominated by speed once prevention has failed.

13. Reports at T+2, T+3, T+4 (campaign delivered at T+0); clicks at T+45 and T+51. - (a) Median time-to-report of the three reporters = T+3 minutes. - (b) Between 13:42 (first report, T+2) and 14:25 (first click, T+45), the SOC had ~43 minutes to act: triage the reported email, confirm it malicious, and pull it from all inboxes / block the lookalike domain and URL at the proxy and gateway, which would have prevented tellerD's and tellerE's later clicks. The lesson: a fast report is not just one person protecting themselves; it is the trigger that lets the SOC protect everyone who hasn't yet clicked. The report's value scales to the whole recipient list. - (c) The lookalike domain meridan-bank (missing an "i") exploits liking/familiarity: it reads as the trusted brand at a glance. A just-in-time external-sender banner (a nudge) and DNS/lookalike-domain monitoring would flag it; training the reflex to verify the channel/domain rather than the apparent brand is the human counterpart.
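The median in (a) and the response window in (b) can be computed from the report offsets and the two clock times quoted above. This is a sketch; the variable names are illustrative, and the times are parsed without a date because only the gap between them matters.

```python
from datetime import datetime
from statistics import median

# Minutes after delivery at which each of the three reports arrived.
report_offsets_min = [2, 3, 4]
median_ttr = median(report_offsets_min)

# Clock times quoted in (b).
first_report = datetime.strptime("13:42", "%H:%M")
first_click = datetime.strptime("14:25", "%H:%M")
window_min = (first_click - first_report).total_seconds() / 60

print(f"median time-to-report: {median_ttr} min")
print(f"SOC response window:   {window_min:.0f} min")
```

The window is measured from the first report, not the median, because containment can start as soon as one person reports.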

15. "CEO" emails finance to wire $240,000 urgently and confidentially. - (a) Three principles: authority (the CEO is directing it), urgency (today, deal closing), and fear/scarcity (the implied cost of delay / missing the acquisition window). Liking (matching the CEO's style) and social proof (a plausible real deal) are also present. - (b) The single behavior that defeats this entire class: out-of-band verification — confirm any payment or unusual wire request through a separate, known-good channel (call the CEO or finance lead at a stored number), never by replying to the email or trusting a number the email provides. - (c) Aimed at the wire-transfer / finance population. The financial impact is disproportionate because a single tricked wire can cost more than the entire awareness program for years, which is why this small population gets the most intensive tailored program (risk = impact × likelihood applied to humans).

16. Example one-page first-simulation plan (general workforce):

  • Lure: a gentle, generic "Your mailbox is over quota: click to restore service" message from a benign internal-looking sender. Obvious enough to be a fair first test; not targeted, not personal.
  • Population: the general corporate workforce (exclude, for the first run, any population requiring separate legal handling); ~700 people.
  • Metrics collected: received, opened, clicked, submitted, reported; plus time-to-first-report.
  • Teachable-moment landing page (3 cues): (1) the mismatched/lookalike sender domain; (2) the false urgency ("restore service now"); (3) the link target that doesn't match the real mail system. Tone: "This was a simulated test from the security team. You're not in trouble. Here's what gave it away."
  • Governance approvals needed before sending: written executive sponsorship defining scope; legal + HR review of data handling and the no-blame posture; SOC + GRC joint template approval.
  • No-blame handling of clickers: redirect to the teachable page only; no manager notification, no discipline; aggregate reporting only; supportive (not punitive) follow-up for any repeat clickers.
  • Lure explicitly NOT used: any fake bonus, raise, layoff, or health/benefits notice. These exploit deeply personal hope or fear, cause resentment and lawsuits, and reduce real reporting afterward.

19. Example Security Awareness Policy snippet (policy altitude): "Meridian Regional Bank shall maintain a continuous security awareness program to ensure that all workforce members can recognize and appropriately respond to information-security threats. All personnel shall complete foundational awareness training upon hire and ongoing awareness activities thereafter; personnel in higher-risk roles shall receive role-based training commensurate with their threat exposure. The Bank shall conduct authorized phishing simulations under defined governance, and shall handle results with a strict no-blame posture: reporting a suspicious message or one's own error shall never, by itself, be a basis for discipline. Every workforce member is expected to report suspected phishing or security concerns promptly through the approved reporting mechanism. The Chief Information Security Officer owns this policy and the awareness program; compliance is measured by program coverage and by behavioral metrics (including phishing-simulation click and report rates and time-to-report), reported through security governance to executive leadership. This policy shall be reviewed at least annually."

24. Incident response to four near-simultaneous HR-portal phishing reports with two early clicks:
  1. Triage the reports (human reporting culture): the four reports are the detection event — the SOC confirms the email is malicious (credential-harvesting link, lookalike domain).
  2. Scope (Part V monitoring): search proxy/email logs for all recipients and identify the two who already clicked; check whether either submitted credentials.
  3. Contain: pull the email from all inboxes; block the URL/domain at the proxy and gateway; if credentials were submitted, force password resets and revoke active sessions for the affected accounts (ties to identity controls, Part IV); DMARC/email-auth (Ch. 9) reduces future same-domain spoofs.
  4. Govern (incident process): classify severity, follow the IR plan, document, and — closing the loop — send a "you caught these" note crediting the reporters, reinforcing the reporting culture.

The human layer detected it, the technical layer scoped and contained it, and the governance layer ran the response — defense in depth across all three.

26. The metric that lies. - (a) Each input inflates the composite: training-completion rate measures attendance, so it is near 100% regardless of behavior; quiz average measures knowledge, which doesn't predict real-world action and is easily high; the click rate from deliberately easy simulations is artificially low because trivial lures get few clicks. Blending three flattering-but-hollow numbers produces a "97% secure" score that reflects none of the workforce's actual resilience. - (b) Honest set instead: the trend in click rate (against consistent/rising difficulty) and report rate; time-to-report; coverage; and high-risk-population improvement (e.g., the finance team vs. BEC). Show trends, not snapshots, and never a single composite "% secure." - (c) Reframe for the board: "The vendor's single 97% score blends attendance and quiz answers with an artificially easy test, so it tells us almost nothing about how our people behave under a real attack. What I can show you is that our report rate has risen and our time-to-report has fallen against harder simulations — meaning our workforce is becoming a faster, more reliable sensor. I'd rather give you a truer, harder number you can trust than a comfortable one that the next real incident would expose."


Chapter 31

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design problems, or discussed in class. Runnable versions of the code-flavored solutions also appear in code/exercise-solutions.py.

1. DevSecOps — integrating security into every stage of the SDLC and the DevOps toolchain, automated and shared by developers/operators rather than bolted on at the end. Shift left — moving security activities earlier in the lifecycle, toward the developer and the moment code is written. CI/CD pipeline — the automated path from a commit to running software (CI builds/tests; CD delivers/deploys). Security gate — a check embedded in the pipeline that can pass, warn, or fail the build. Combined sentence: "Under DevSecOps we shift left by adding security gates into the CI/CD pipeline — for example, an SCA scan that fails the build the moment a developer's commit introduces a vulnerable dependency, catching the flaw in minutes instead of after it ships."

4. Pipeline integrity is the property that the software a pipeline produces is exactly what the verified source defines, built by an unmodified process and unaltered from build to deployment. Three controls that protect it: (i) isolated, ephemeral build environments (no persistent foothold); (ii) artifact signing plus provenance generation and verification (prove what was built and how); (iii) least privilege + MFA on who can edit pipeline definitions and trigger builds, with vaulted, short-lived pipeline credentials. SolarWinds makes this "a first-class security boundary" because it proved that whoever controls the build controls the output, and the output is trusted by every downstream consumer — so the build server deserves the same protection as a domain controller or cloud root account.

7. Two problems with putting all six scans at the CD/deploy gate: (1) it recreates the late-stage bottleneck — by the time a deploy-stage scan fails, the developer has moved on, context is cold, and pressure to override and ship is at its peak; (2) it wastes the cheap early catches — a secret or IaC misconfiguration found at deploy has already passed through commit and build, when it could have been caught in seconds on the laptop. Correct placement: pre-commit (laptop) — secrets scan + linters, for seconds-fast local feedback; CI — SAST, SCA, secrets (backstop), IaC, container scan, because they need only source/artifact and CI is automated, controlled, and early-but-complete; deploy — DAST (needs a running app in staging) and integrity verification (signing/provenance, which only make sense on the built artifact about to ship).

9. Baseline strategy: run the scan once, record all 312 findings as a known baseline, and configure the gate to fail the build only on findings not in the baseline — i.e., new findings introduced by a fresh commit (regressions). This makes the gate protective from day one (no new mistake can slip through) without halting all work (the 312 pre-existing issues do not block deploys). The 312 are not ignored: they go on a remediation schedule (criticals within a sprint, the rest over the quarter), tracked, with the baseline shrinking as they are fixed. Baselining trades immediate comprehensiveness for adoptability — and the trade is worth it because a gate that blocks all 312 gets removed within a week, protecting nothing.
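
The gate logic described above is a set difference. The following is a minimal Python sketch, not the textbook's own implementation: the Finding fields (rule_id, path, fingerprint) and the synthetic file names are assumptions chosen for illustration, and a real scanner would supply its own stable identifiers.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    rule_id: str      # scanner rule that fired
    path: str         # file containing the finding
    fingerprint: str  # stable hash of the flagged code (hypothetical field)


def new_findings(current: set, baseline: set) -> set:
    """Findings in the current scan that the baseline does not contain."""
    return current - baseline


def gate_passes(current: set, baseline: set) -> bool:
    """Fail the build only when a commit introduces a non-baselined finding."""
    return not new_findings(current, baseline)


# 312 synthetic legacy findings stand in for the recorded baseline.
baseline = {Finding("R1", f"src/file{i}.py", f"fp{i}") for i in range(312)}
regression = Finding("R2", "src/new_feature.py", "fp-new")

print(gate_passes(baseline, baseline))                 # True: legacy findings only
print(gate_passes(baseline | {regression}, baseline))  # False: one regression
```

Because the check is a difference rather than an equality test, fixing legacy findings never breaks the build; the baseline file can simply be regenerated as the 312 shrink.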

11. Response: "The pre-commit hook and the CI gate do the same check but they are not redundant, because the pre-commit hook is advisory — it runs on your machine and can be skipped, bypassed, or simply not installed, intentionally or by accident. The CI gate runs on the server where no one can skip it, so it is the only one that actually guarantees the check happened before code ships. The hook is the fast courtesy that catches the secret in two seconds; the CI gate is the unskippable backstop that makes the guarantee real."

12. Add the gate (solution sketch). Insert gates into the build pipeline like so (placeholder commands; placement and severity are the point):

steps:
  - uses: actions/checkout@v4
  - run: secrets-scan --fail-on=verified --history .      # GATE: committed credentials (CI backstop)
  - run: sca-scan --fail-on=CRITICAL --only-fixable requirements.txt  # GATE: vulnerable deps (Log4Shell)
  - run: sast-scan --fail-on=HIGH --baseline=.sast-baseline.json .    # GATE: insecure code (baselined)
  - run: iac-scan --fail-on=HIGH terraform/               # GATE: misconfigured infra BEFORE it exists
  - run: pip install -r requirements.txt
  - run: pytest
  - run: docker build -t loanapp:$SHA .
  - run: image-scan --fail-on=CRITICAL --only-fixable loanapp:$SHA   # GATE: vulnerable image packages
  - run: docker push registry.meridianbank.example/loanapp:$SHA

Each gate fails the build only on high-confidence, high-severity findings; legacy SAST/IaC findings are baselined so only regressions break the build; run the independent scans in parallel jobs to keep the gate fast. See code/meridian-pipeline.yml for the full parallel form.

13. Add the gate (integrity). After the image is built and scanned, before pushing to production: cosign sign loanapp:$SHA (artifact signing), generate-provenance --builder=meridian-ci --source=$SHA (provenance), then a deploy step that refuses anything failing the policy: opa eval --fail --data policy/deploy_gate.rego --input image.json "data.meridian.deploy.allow" — which denies unless the image is signed and provenance.builder == "meridian-ci". This defends against the SolarWinds build-pipeline compromise: an artifact not built by our own attested pipeline is refused at deploy, even if it is signed.

14. Add the gate (speed). The single change: run the five independent scans in parallel (as separate CI jobs) instead of serially. Wall-clock time becomes the max of the scans (~3 min), not the sum (~15 min). It is safe because the five scans are independent — none consumes another's output — so concurrency changes only the scheduling, not the result. No scan is removed; the gate is just as strict, five times faster. (See code/exercise-solutions.py::ex14.)
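
The max-versus-sum arithmetic, and the claim that concurrency does not change the result, can be sketched in Python. This is an illustrative sketch only and is not the ex14 code the text references: the per-scan durations are an assumption (the text gives only the ~15 and ~3 minute totals), and run_scan is a hypothetical stand-in for invoking a real scanner.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-scan durations in minutes; only the totals come from the text.
SCAN_MINUTES = {"secrets": 3, "sca": 3, "sast": 3, "iac": 3, "image": 3}

serial_minutes = sum(SCAN_MINUTES.values())    # scans run one after another
parallel_minutes = max(SCAN_MINUTES.values())  # scans run side by side


def run_scan(name: str) -> tuple[str, str]:
    """Stand-in for a real scanner call; always reports a pass here."""
    return name, "pass"


def run_serial(names: list[str]) -> dict[str, str]:
    return dict(run_scan(n) for n in names)


def run_parallel(names: list[str]) -> dict[str, str]:
    # Independent scans share no state, so they can be scheduled concurrently.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return dict(pool.map(run_scan, names))


names = list(SCAN_MINUTES)
print(serial_minutes, parallel_minutes)           # 15 3
print(run_serial(names) == run_parallel(names))   # True: same verdicts
```

If one scan did consume another's output, it would have to stay sequenced after its input, and the wall-clock time would be the longest dependency chain rather than the single longest scan.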

15. Find the weakness — pipeline-integrity issues and their §31.4 fixes:
  - Trigger: any push to any branch by any collaborator → restrict who can trigger production builds; build only from main; require code review on protected branches.
  - Long-lived, shared, never-rebuilt build VM → use isolated, ephemeral builders destroyed after each build (denies a persistent foothold — the SolarWinds persistence point).
  - Single hard-coded AWS key with AdministratorAccess, never rotated → vaulted, short-lived, least-privilege pipeline credentials scoped to exactly what the build needs (Ch.20).
  - Unpinned dependencies → pin to specific hashed versions so the build cannot silently pull a swapped library (Ch.29).
  - Artifact pushed unsigned, no provenance → sign artifacts and generate/verify provenance at deploy.
  - All developers have admin on the CI system → least privilege + MFA on who can edit pipelines (Ch.16, 17, 19); treat CI as crown-jewel infrastructure.

17. Find the weakness — the signed-but-backdoored vendor update. This is possible because a digital signature proves origin and integrity (the artifact came from the vendor, unaltered after signing) but not safety. If the vendor's build pipeline was compromised, the malicious code was injected during the build, before the signing step, so the vendor's real key faithfully signed a backdoored artifact. This is exactly SolarWinds: source clean, malice injected at build, output correctly signed and shipped as trusted. What the vendor should have provided (and you should have verified) is provenance — verifiable attestation that the artifact was built by the vendor's expected, isolated builder from reviewed source (ideally to a SLSA build level, Ch.29). Verifying provenance, not just the signature, would have caught the artifact whose build could not be attested to clean source. Defense in depth on your side: also monitor the deployed software's behavior (Ch.22), because the next bad update will also be correctly signed.

19. Write the policy. (Rego-style; default deny.)

package deploy

import rego.v1

# Fail-safe default: anything not explicitly allowed is denied.
default allow := false

allow if {
    input.image.signed == true
    input.image.provenance.builder == "meridian-ci"
    count(fixable_criticals) == 0
}

# Set of CRITICAL vulnerabilities for which a fix is already available.
fixable_criticals contains v if {
    some v in input.image.vulnerabilities
    v.severity == "CRITICAL"
    v.fix_available == true
}

# Human-readable reasons to report alongside a denial.
deny_reasons contains "artifact is not signed" if not input.image.signed
deny_reasons contains "not built by meridian-ci" if input.image.provenance.builder != "meridian-ci"

The default-deny is fail-safe (Ch.3): anything not explicitly allowed is blocked. (Runnable Python form in code/exercise-solutions.py::ex19_20.)
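
For readers without OPA installed, the same default-deny logic can be sketched in plain Python. This is an illustrative sketch and not the contents of the referenced exercise-solutions.py; the input dictionary simply mirrors the input.image fields used by the policy above.

```python
def fixable_criticals(image: dict) -> list:
    """CRITICAL vulnerabilities that already have a fix available."""
    return [
        v for v in image.get("vulnerabilities", [])
        if v.get("severity") == "CRITICAL" and v.get("fix_available") is True
    ]


def allow(image: dict) -> bool:
    """Default deny: every condition must hold, and missing fields count as failure."""
    return (
        image.get("signed") is True
        and image.get("provenance", {}).get("builder") == "meridian-ci"
        and not fixable_criticals(image)
    )


def deny_reasons(image: dict) -> set:
    reasons = set()
    if not image.get("signed"):
        reasons.add("artifact is not signed")
    # Assumption: unlike the policy rule, a missing provenance block is also reported.
    if image.get("provenance", {}).get("builder") != "meridian-ci":
        reasons.add("not built by meridian-ci")
    return reasons


good = {"signed": True, "provenance": {"builder": "meridian-ci"}, "vulnerabilities": []}
rogue = {"signed": True, "provenance": {"builder": "laptop"}, "vulnerabilities": []}
print(allow(good), allow(rogue))  # True False
print(deny_reasons(rogue))        # {'not built by meridian-ci'}
```

Note that an empty input is denied: nothing in allow can succeed on absent data, which is the fail-safe property the policy relies on.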

21. Write the policy — security group rule. Gate form (policy check on the IaC):

deny contains msg if {
    some sg in input.resources
    sg.type == "aws_security_group"
    some rule in sg.ingress
    rule.cidr == "0.0.0.0/0"
    rule.port in {22, 3389}
    msg := sprintf("SG %s exposes port %d to the internet", [sg.name, rule.port])
}

This is better implemented as both a gate and a guardrail. The gate (the IaC scan above) gives the developer a fast, local signal at the moment they wrote the misconfiguration, in their own pipeline. The guardrail (an AWS service control policy that denies creating such an ingress rule for every account) is the unbypassable backstop that makes the mistake structurally impossible even for infrastructure created outside the scanned pipeline, or if the gate is misconfigured. Defense in depth: prefer the guardrail for the unbypassable guarantee, keep the gate for the fast developer feedback.
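
The gate half of that pairing can also be sketched in Python for experimentation. This is a sketch under stated assumptions: the resource shape (type, name, ingress entries with cidr and port) mirrors the simplified input used by the rule above, not the JSON any real IaC tool emits, and open_admin_ports is a hypothetical helper name.

```python
SENSITIVE_PORTS = {22, 3389}  # SSH and RDP, as in the rule above


def open_admin_ports(resources: list) -> list:
    """Return one message per ingress rule exposing SSH or RDP to the internet."""
    messages = []
    for sg in resources:
        if sg.get("type") != "aws_security_group":
            continue
        for rule in sg.get("ingress", []):
            if rule.get("cidr") == "0.0.0.0/0" and rule.get("port") in SENSITIVE_PORTS:
                messages.append(
                    f"SG {sg['name']} exposes port {rule['port']} to the internet"
                )
    return messages


resources = [
    {
        "type": "aws_security_group",
        "name": "web",
        "ingress": [
            {"cidr": "0.0.0.0/0", "port": 443},  # public HTTPS: not flagged
            {"cidr": "0.0.0.0/0", "port": 22},   # public SSH: flagged
            {"cidr": "10.0.0.0/8", "port": 3389},  # internal RDP: not flagged
        ],
    }
]
print(open_admin_ports(resources))  # ['SG web exposes port 22 to the internet']
```

A pipeline step would fail the build when the returned list is non-empty. The sketch inspects only what the pipeline scans, which is exactly why the guardrail remains necessary.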

23. Design it — fintech startup, 10 deploys/day, no security staff. Design priorities: maximum automation, guardrails over gates (no security team to babysit), fast feedback.
  - Commit (laptop): secrets pre-commit hook + linters.
  - CI: secrets (CI backstop), SCA (--fail-on=CRITICAL --only-fixable), SAST (--fail-on=HIGH, baselined), IaC (--fail-on=HIGH), container image scan (--fail-on=CRITICAL); all in parallel.
  - Deploy: sign artifact + generate/verify provenance; deny-by-default policy-as-code gate; DAST on staging.
  - Gating policy: fail the build on verified secrets, fixable critical CVEs (deps and image), and HIGH+ SAST/IaC; everything else warns.
  - Guardrails (instead of gates): (1) cloud org policy / SCP making public object storage and 0.0.0.0/0 ingress on sensitive ports structurally impossible; (2) only the CI identity holds the signing key and production deploy credentials, so a developer cannot manually push an unsigned/unverified artifact.
  - How velocity is preserved: the gates are fast (parallel, baselined, fail-only-on-severe) and the guardrails remove the most dangerous mistakes without any per-change inspection — the team moves at full speed because the unsafe actions are simply unavailable, and the gates that remain rarely fire.

26. CTF — the build that lies. (a) The malicious code was introduced during the build (a build compromise), not in the source. We know it was not in the source because the source passed all SAST/SCA/secrets gates and is clean in version control — yet the deployed binary contains a backdoor, so the malice entered between clean source and shipped artifact, i.e., at the build step. (b) Prevented: an isolated, ephemeral build environment with reproducible builds (denies the attacker a persistent injection foothold and lets an independent rebuild expose the tampered binary). Detected: provenance generated at build and verified at deploy — the malicious binary's provenance could not validly attest to clean source built by an untampered builder, so deploy-time verification would have refused it. (Provenance was never generated or verified here, which is why it shipped.) (c) Immediate response: treat the signing key as compromised — revoke it and reissue, since an attacker in the build environment may have abused it; this is a security incident → engage the SOC (Ch.21–22) and incident response (Ch.24), scope which artifacts and customers received the tampered build, and pull/replace the malicious artifact. Add provenance generation + deploy-time verification and isolated builders before resuming releases.


Chapter 32

Full worked solutions to the daggered (†) problems in exercises.md. Non-daggered problems are left open for study groups and instructors.


1.† The seven tenets, each with the perimeter failure it addresses.

  1. All data sources and computing services are resources. — Addresses the assumption that some things (internal services, "trusted" hosts) get a free pass; in zero trust everything is subject to policy.
  2. All communication is secured regardless of network location. — Addresses the trusted-interior failure: being on the internal network no longer earns trust or reduced security. This is the formal death of the implicit trust zone.
  3. Access is per-session and least-privilege. — Addresses "one credential reaches everything": access to one resource does not propagate, and each session gets only what it needs.
  4. Access is determined by dynamic policy (identity, device, context). — Addresses static location-based rules ("group X may reach server Y") that cannot account for device health or risk.
  5. The enterprise measures asset posture continuously. — Addresses compromised devices keeping their access: a device's right to connect is contingent on its measured health.
  6. AuthN/authZ are dynamic and strictly enforced before access (continuous verification). — Addresses the one-time front-door gate after which nothing re-checks; trust is re-earned continuously.
  7. The enterprise collects all the data it can and uses it to improve posture. — Addresses invisible lateral movement: by collecting everything, every access becomes a signal to detect and learn from.

4.† Implicit trust zone, two examples, why lateral movement lives there.

Definition: a region where an entity, once inside, is granted broad access without further per-request verification because it is presumed trustworthy by location.

Example 1 (on-prem): the classic flat corporate LAN — any host on the network can reach file servers, internal apps, and admin interfaces without re-authenticating. Example 2 (remote-access): the network a VPN deposits a user onto — after VPN authentication, the user has broad internal reach.

Why lateral movement lives there: inside the zone, systems grant access by location and do not re-verify identity, device, or context; a compromise of any one host inherits that host's broad reach; and because nothing makes a fresh authorization decision per access, the pivots generate little or no telemetry, so they are both permitted and invisible. The zone is, by design, exactly the environment lateral movement needs.


8.† Find the implicit trust in the VPN description.

Implicit trust zone: the internal network the VPN deposits users onto — "once connected, they can reach all internal applications." Authentication at the VPN is a one-time front-door gate; after it, the internal network grants access by location.

Tenets violated: Tenet 2 (communication is not secured/decided regardless of location — internal location confers reach); Tenet 3 (access is not per-session/least-privilege — one connection yields broad access); arguably Tenet 6 (no continuous, per-resource re-verification after the VPN gate).

Blast radius of one phished credential: the entire internal application estate. The attacker, once through the VPN, faces no further per-resource challenge and can enumerate and reach everything the flat network exposes — the exact pattern of the §32.1 war story and Case Study 2.


10.† The SaaS-forward startup that thinks it's "already zero trust."

Having "no internal LAN" removes one implicit trust zone but does not make an architecture zero trust. Two ways implicit trust can persist:

  1. SSO-as-a-gate without per-resource decisions. If authenticating once via SSO then grants a session that reaches many SaaS apps without re-evaluating device posture and context per app, the SSO session has become a small implicit trust zone — "logged in" is doing the work network location used to do.
  2. No device-posture signal. If any device with valid credentials can reach the SaaS apps over the internet, the architecture relies on identity alone and ignores device health and context — only one of the three signals. A phished credential on an attacker's laptop succeeds.

Other plausible gaps: over-broad OAuth scopes/tokens (no least-privilege session), flat trust inside a cloud account (no microsegmentation of workloads), and standing admin access. The lesson: zero trust is about per-request verification of all three signals, not about whether you happen to own a LAN.


12.† Evaluate access requests for the core-banking admin console.

Policy: identity in core-admins, managed device passing posture, in-country, business hours (08–18), risk < 50.

  • (a) in core-admins, managed/healthy, in-country, 14:00, risk 12 → GRANT (all three signals pass).
  • (b) in core-admins, unmanaged device, in-country, 14:00, risk 12 → DENY — device signal fails.
  • (c) in core-admins, managed/healthy, in-country, 02:30, risk 12 → STEP-UP — context (off-hours).
  • (d) not in core-admins, managed/healthy, in-country, 14:00, risk 12 → DENY — identity fails.
  • (e) in core-admins, managed/healthy, in-country, 14:00, risk 80 → STEP-UP — context (risk ≥ 50).

Deciding signal in each: (a) none fail; (b) device; (c) context/time; (d) identity; (e) context/risk. Note the order: identity and device hard-deny; context anomalies step-up (recoverable via additional authentication) rather than hard-deny. (See code/exercise-solutions.py, exercise_12.)
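
The evaluation order can be written as a short decision function. This Python sketch is not the exercise_12 code the text references. Two points are assumptions rather than statements from the policy: business hours are treated as 08:00 inclusive to 18:00 exclusive, and an out-of-country request is treated as a context anomaly (step-up), which none of the five cases exercises.

```python
from datetime import time


def decide(groups, managed, healthy, in_country, at, risk):
    """Return (decision, deciding signal) for the core-banking admin console."""
    if "core-admins" not in groups:        # identity: hard deny
        return "DENY", "identity"
    if not (managed and healthy):          # device: hard deny
        return "DENY", "device"
    in_hours = time(8, 0) <= at < time(18, 0)
    if not (in_country and in_hours and risk < 50):  # context: recoverable
        return "STEP-UP", "context"
    return "GRANT", None


ADMIN = {"core-admins"}
requests = {
    "a": (ADMIN, True, True, True, time(14, 0), 12),
    "b": (ADMIN, False, False, True, time(14, 0), 12),
    "c": (ADMIN, True, True, True, time(2, 30), 12),
    "d": (set(), True, True, True, time(14, 0), 12),
    "e": (ADMIN, True, True, True, time(14, 0), 80),
}
for label, args in requests.items():
    print(label, decide(*args))
```

Checking identity and device before context means a request that fails several signals reports the hard-deny reason, so a step-up prompt is never offered to an unauthorized user or an unmanaged device.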


14.† Three zero-trust reasons a valid-credential request is denied.

The request has genuine, phishing-resistant credentials, yet is denied — because identity is one of three signals, all of which are checked every time:

  1. Device signal: the request comes from an unmanaged or unhealthy device (no EDR, out of patch compliance, disk unencrypted) and fails the posture gate.
  2. Context signal: the circumstances are off-policy or risky — impossible-travel location, off-hours, a high computed risk score, or a resource sensitivity demanding higher assurance than the context offers.
  3. Authorization/least-privilege: the identity is authenticated but not authorized for this specific resource/action, or the requested scope exceeds what a least-privilege session for this task permits.

"The credentials were valid" does not entitle the request because zero trust never grants on identity alone and never grants permanently: a verified identity is verified against device and context for the specific resource, every time ("never trust, always verify").


16.† ZTNA access flow with signal annotations and the VPN's omitted step.

 1. client/agent contacts the BROKER (PEP/PA); app address NOT published (SDP)
 2. AUTHENTICATE identity via IdP (phishing-resistant MFA)        [IDENTITY signal]
 3. CHECK device posture via MDM/EDR (managed? patched? healthy?) [DEVICE signal]
 4. POLICY ENGINE evaluates context (location/time/risk)          [CONTEXT signal]
 5. POLICY ADMINISTRATOR establishes a session to THIS APP ONLY   <- least-privilege session
 6. CONTINUOUS VERIFICATION during the session; tear down on posture/risk change

The legacy VPN omits the per-app scoping of steps 4 and 5 entirely (and usually the posture check of step 3): after authentication it places the user on the network with broad reach, rather than brokering a least-privilege connection to a single application. (The single most important omission is the per-app least-privilege session — it is what blocks lateral movement.)


18.† Microsegmentation policy for web1/app1/db1.

Allowed flows (everything else default-deny):

allow  web1 -> app1  tcp/443
allow  app1 -> db1   tcp/5432
default-deny all other east-west flows  (log + alert denials)

Compromise of web1 under this policy reaches only app1 on 443 — nothing else. It cannot reach db1 (no rule permits web1 -> db1), the wiki, or any other host; each attempt is denied and alerted. Under a flat internal zone, a compromise of web1 reaches every host in the zone (db1, wiki, backups, DCs) over any allowed internal port, with no denial and no alert. The blast radius collapses from "the whole zone" to "one allowed flow." (See code/exercise-solutions.py, exercise_18.)
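
The blast-radius comparison can be checked with a few lines of Python. This is a sketch, not the referenced exercise_18 code: the host list for the flat zone, including the name dc1 for a domain controller, is an assumption made for illustration, and only direct reach from the compromised host is modelled.

```python
# The two explicitly allowed east-west flows; everything else is denied.
ALLOWED_FLOWS = {
    ("web1", "app1", 443),
    ("app1", "db1", 5432),
}

# Hypothetical membership of the flat internal zone.
ZONE_HOSTS = {"web1", "app1", "db1", "wiki", "backups", "dc1"}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: a flow passes only if it is listed explicitly."""
    return (src, dst, port) in ALLOWED_FLOWS


def direct_reach(src: str) -> set:
    """Destinations a compromised host can reach under microsegmentation."""
    return {(dst, port) for (s, dst, port) in ALLOWED_FLOWS if s == src}


def flat_reach(src: str) -> set:
    """Hosts a compromised host can reach in a flat zone: all the others."""
    return ZONE_HOSTS - {src}


print(direct_reach("web1"))             # {('app1', 443)}
print(is_allowed("web1", "db1", 5432))  # False: denied, and should be alerted
print(len(flat_reach("web1")))          # 5
```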


21.† Sequence the three-year program and justify each dependency.

Order: (ii) identity → (iv) device → (iii) ZTNA → (i) microsegment the CDE.

  • (ii) first — phishing-resistant MFA + entitlement cleanup: every later access decision depends on a trustworthy identity signal. With password-only accounts or stale over-privilege remaining, the whole architecture rests on a weak foundation.
  • (iv) next — device enrollment + posture pipeline: once identity is trustworthy, add the second signal so decisions can require a managed, healthy device. You cannot gate on posture before you can trust who the user is.
  • (iii) then — ZTNA replacing the VPN: with identity and device signals real, brokered per-app access is trustworthy; now you can safely stop placing users on the network. ZTNA before solid identity/device would broker access on weak signals.
  • (i) last — microsegment the CDE: the hardest step and the riskiest to production (default-deny on the data center). Do it once the foundation supports it and after mapping real flows; segment the crown jewel (CDE) first.

This also front-loads risk reduction (identity work is the biggest single win) and keeps each phase independently valuable.


23.† Pragmatic zero-trust treatment for the un-modernizable mainframe.

The core-banking mainframe cannot speak modern identity protocols or run a posture agent. Do not demand a rewrite. Treatment (Phase 4 — microsegmentation):

  • Wrap it behind a PEP/broker. No one reaches the mainframe directly; all access flows through an enforcement point that can make modern identity/device/context decisions on its behalf.
  • Microsegment tightly around it. A single explicitly-allowed path from a hardened broker; default-deny everything else; alert on any denied attempt.
  • Compensate for missing capabilities. Where the mainframe cannot enforce posture or least privilege itself, the surrounding controls substitute: the broker (identity/device/context), microsegmentation (containment), privileged-access controls (Chapter 19), and heavy monitoring.

Phase: 4 (microsegmentation of crown jewels). Substituting controls: PEP/broker for per-request decisions, microsegmentation for least privilege/containment, PAM for privileged paths, monitoring for the visibility the system cannot provide itself. (This is also the OT-security playbook of Chapter 33.)


27.† Respond to the torn-down session (continuous verification in action).

What zero trust did: it granted a least-privilege session, then — because device posture is re-evaluated during the session (continuous verification, Tenet 6) — the policy administrator tore the session down the moment the device's EDR fired a detection. The session's trust was being continually re-earned and failed mid-stream.

What the SOC should do next: treat the EDR detection as a live incident on that endpoint — triage the detection (Chapter 24), isolate/contain the device, investigate what the session reached before teardown, check the user's other sessions and recent authentications, and confirm whether other resources were touched.

How this differs from a VPN: a VPN performs a one-time front-door check; once connected, a device whose health degrades keeps its broad network access for the session's lifetime — the compromise rides the trusted tunnel undetected. Zero trust's continuous verification turned a mid-session compromise into an automatic access revocation plus a high-signal alert.


29.† Analyze the policy-decision log.

10:09  okafor  core-admin  device=managed,healthy  loc=in-country  -> DENY (not in core-admins)
10:11  okafor  core-admin  ... -> DENY (not in core-admins)
10:12  okafor  core-admin  ... -> DENY (not in core-admins)

(a) Concern: repeated denied attempts by the same user to reach a high-value resource they are not authorized for, in quick succession — a pattern consistent with privilege probing or a compromised account testing what it can reach. (The earlier GRANTs to loan-orig and wire-approve at 10:02–10:04 are normal for a loan officer; the core-admin attempts are out of role.)

(b) Why it's good these are log lines: in a flat network, an internal host attempting to reach the admin console would simply succeed or be silently blocked at the network with no access-decision record — the probing would be invisible. Here, every denied decision is logged, so the pattern is visible and alertable. Zero trust illuminates lateral movement by making every access a decision.

(c) Control + next hunt: the telemetry was produced by the policy engine/PEP making and logging a per-request authorization decision. Next: pivot on okafor — review device posture and recent authentications, look for other anomalous access attempts, confirm whether the account is compromised or this is a misconfiguration/role-creep issue, and consider a step-up or temporary disable while investigating. (See code/exercise-solutions.py, exercise_29.)
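
A detection for the pattern in (a) can be sketched directly over the log lines. This Python sketch is not the referenced exercise_29 code. It assumes the whitespace-separated layout shown in the excerpt (time, user, resource, then "->" and the decision), and the threshold of three denials within five minutes is a hypothetical tuning choice.

```python
from collections import defaultdict

LOG = """\
10:09  okafor  core-admin  device=managed,healthy  loc=in-country  -> DENY (not in core-admins)
10:11  okafor  core-admin  ... -> DENY (not in core-admins)
10:12  okafor  core-admin  ... -> DENY (not in core-admins)
"""


def parse(line: str):
    """Split one log line into (minute of day, user, resource, decision)."""
    left, right = line.split("->", 1)
    time_str, user, resource = left.split()[:3]
    hours, minutes = map(int, time_str.split(":"))
    return hours * 60 + minutes, user, resource, right.split()[0]


def repeated_denials(log: str, threshold: int = 3, window_minutes: int = 5) -> list:
    """Flag (user, resource) pairs with `threshold` denials inside the window."""
    denials = defaultdict(list)
    for line in log.splitlines():
        if "->" not in line:
            continue
        minute, user, resource, decision = parse(line)
        if decision == "DENY":
            denials[(user, resource)].append(minute)
    flagged = []
    for key, minutes in denials.items():
        minutes.sort()
        # Slide over each run of `threshold` consecutive denials.
        for i in range(len(minutes) - threshold + 1):
            if minutes[i + threshold - 1] - minutes[i] <= window_minutes:
                flagged.append(key)
                break
    return flagged


print(repeated_denials(LOG))  # [('okafor', 'core-admin')]
```

The detection is possible only because each denial is a logged decision; in the flat network of (b) there would be no lines to count.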


31.† The "zero-trust appliance" that isn't (CTF).

The product: MFA to a portal, then users land on a "secure access network" from which they reach any internal app, all traffic encrypted, sold as "zero trust in a box."

Ways it fails the seven tenets despite the marketing:

  • Tenet 2 (secure regardless of location): it creates a new trusted network — the "secure access network" — and grants reach by being on it. That is an implicit trust zone with a nicer name.
  • Tenet 3 (per-session least privilege): one authentication yields access to any internal app, not a per-resource, least-privilege session. Lateral movement is preserved.
  • Tenets 5/6 (posture + continuous verification): MFA at the front door is a one-time gate; nothing re-verifies device posture or context per resource during the session.
  • Tenets 1/7 (everything a resource; collect everything): if internal apps trust the access network and make no per-request decisions, there is little per-resource access telemetry.

The encryption and MFA are real and useful — so the product is a legitimate component (a strong front door / identity-aware gateway), just not a complete zero-trust architecture. The problem is entirely what happens after authentication.

Two follow-up questions to expose the gap:

  1. "After a user authenticates, what stops them — or a stolen-but-MFA-passing credential — from reaching a second, unrelated internal application? Is each app an independent per-request decision, or are they on a shared access network?"
  2. "Is device posture evaluated per resource and continuously during the session, and can a session be torn down mid-stream if the device's health or risk changes — or is the check only at login?"

(If the answers are "nothing/shared network" and "login only," it is a VPN with better marketing.)


Chapter 33

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. Operational technology (OT) — the hardware/software that directly monitors and controls physical processes. Industrial control system (ICS) — the general category of control systems used to operate industrial processes. SCADA — supervisory control and data acquisition; a system that centralizes monitoring and control of many distributed controllers across a wide area. PLC — a programmable logic controller; a ruggedized real-time controller that reads sensors and drives actuators. Combined sentence: "A water utility's operational technology is an industrial control system in which PLCs at each pumping station run the pumps and valves, while a SCADA system lets operators monitor and control all the stations from one control room."

4. Confidentiality sits last because OT's job is to control the physical world correctly and safely; a disclosed OT data point (a pressure reading, a pump setpoint) rarely causes direct physical harm, whereas a loss of availability or integrity can injure people or damage the environment, and an unsafe process can kill. "Last" is not "irrelevant," though: OT data whose disclosure genuinely matters includes detailed process schematics, setpoints, and safety limits that an attacker could use to plan a future physical attack (reconnaissance value), and, in some sectors, proprietary process recipes whose secrecy is itself the business. The point is that confidentiality is ranked below safety/availability/integrity, not that it is ignored.

7. (a) pressure sensor → L0, OT; (b) corporate email server → L5, IT; (c) data historian → L3, OT; (d) control-room HMI → L2, OT; (e) vendor remote-access jump host → L3.5, IDMZ; (f) PLC → L1, OT; (g) plant scheduling/ERP → L4, IT; (h) historian replica for business analysts → L3.5, IDMZ. Note that (c) and (h) are different assets: the real historian lives in OT at L3, and a replica lives in the IDMZ at L3.5 so business systems can read it without reaching into OT.
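
The level-to-domain mapping used in this answer is mechanical once the level is known, as the following Python sketch shows. The zone function and the ASSETS table are illustrative names, not part of the chapter; the levels themselves are the ones given in the answer above.

```python
def zone(level: float) -> str:
    """Map a Purdue level to its domain: OT (0-3), IDMZ (3.5), IT (4-5)."""
    if level == 3.5:
        return "IDMZ"
    return "OT" if level <= 3 else "IT"


# Asset -> level, taken from the answer above.
ASSETS = {
    "pressure sensor": 0,
    "corporate email server": 5,
    "data historian": 3,
    "control-room HMI": 2,
    "vendor remote-access jump host": 3.5,
    "PLC": 1,
    "plant scheduling/ERP": 4,
    "historian replica": 3.5,
}

for asset, level in ASSETS.items():
    print(f"{asset}: L{level}, {zone(level)}")
```

The judgment in the exercise lies in assigning the level; the historian and its replica show that the same data can sit at two levels as two distinct assets.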

9. The objection conflates "a Level-5 person needs to reach a Level-3 system" with "Level 5 and Level 3 must talk directly." They need not talk directly. The IDMZ (Level 3.5) sits between them. The vendor connects to a jump host in the IDMZ (authenticating with MFA on the jump host, since the SCADA server itself may not support it), the session is recorded, and only from the jump host does the vendor reach the Level-3 SCADA server — and only during an approved window. No connection is ever opened from the IT domain straight into the OT domain; the jump host brokers it. The rule ("IT and OT never communicate directly") is preserved because the IDMZ is the intermediary that makes the necessary access possible without a direct path.
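
The rule and its resolution can be expressed as a check on each hop of the vendor's path. This Python sketch uses hypothetical names (domain, flow_allowed) and models only the no-direct-path rule; MFA, session recording, and the approved window are outside its scope.

```python
def domain(level: float) -> str:
    """OT for levels 0-3, IDMZ for 3.5, IT for levels 4-5."""
    if level == 3.5:
        return "IDMZ"
    return "OT" if level < 3.5 else "IT"


def flow_allowed(src_level: float, dst_level: float) -> bool:
    """Permit a flow unless it joins the IT and OT domains directly."""
    return {domain(src_level), domain(dst_level)} != {"IT", "OT"}


# The vendor's brokered path: Level 5 -> jump host (3.5) -> SCADA server (3).
brokered_path = [(5, 3.5), (3.5, 3)]
print(all(flow_allowed(src, dst) for src, dst in brokered_path))  # True
print(flow_allowed(5, 3))  # False: the direct connection the rule forbids
```

Each hop terminates in or originates from the IDMZ, so the vendor reaches the Level-3 server without any single connection spanning both domains.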

12. Compensating controls for an unpatchable critical PLC, ordered by leverage: 1. Segment harder (remove reachability). Ensure the only systems that can reach the PLC are the HMI/SCADA hosts that must. Reduces: exploitability; a vulnerability you cannot reach is largely neutralized. Residual: an attacker who first compromises an authorized upstream host (the HMI) can still reach the PLC. 2. Passive monitoring. Watch for any exploitation attempt, unexpected command, new device, or out-of-window program change to the PLC. Reduces: dwell time / undetected compromise. Residual: detection is not prevention; you must be able to respond. 3. Broker + strongly authenticate all access at the IDMZ jump host, with session recording. Reduces: unauthorized human/vendor access (puts MFA where it can live). Residual: a hijacked session that bypasses MFA, or an insider with legitimate access. 4. Patch the surrounding IT-like hosts (HMI, SCADA server) in maintenance windows, tested first. Reduces: the most common real attack path (the HMI), which can often be patched. Residual: the PLC itself remains unpatched; the gap persists until a vendor fix and an outage window exist. The ordering reflects that in OT the network boundary is the strongest lever (reachability equals control), detection backstops prevention, access-brokering hardens the human path, and host patching is last because it is the most constrained.

15. A safety instrumented system (SIS) is an independent control system whose sole job is to bring a process to a safe state when conditions become dangerous, running on separate hardware and logic from the normal control system. (a) It means a compromised or failed control system need not become a physical catastrophe, because the SIS independently trips the process to a safe shutdown if the normal controls drive toward an unsafe condition. (b) It is therefore the single highest-value target: an attacker who disables the SIS removes the last barrier between an attack on the process and a real-world disaster — which is precisely what the Triton/Trisis malware attempted. The one segmentation rule never to relax: the SIS must be isolated even from the rest of the OT network, with its own monitoring; no convenience justifies giving the SIS the same reachability as ordinary controllers, and any interaction with safety-system logic is a maximum-severity event.

16. (a) The two alarming lines are 03:12:08 (L4 business host → L3 SCADA, verdict=NEW) and 03:12:09 (L4 business host → L1 PLC, verdict=NEW): both are direct IT→OT crossings, which the Purdue model forbids, and both are first-occurrences with no baseline precedent. (b) The most likely scenario is that a compromised corporate host (e.g., ransomware or a lateral-movement tool after a phished laptop) has found a path into the OT domain — possibly via a forgotten flat route or "temporary" firewall rule — and is now reaching the SCADA server and even issuing a Modbus write to a PLC. (c) The strongest indicator is the source/direction field — a Level-4 (IT-domain) host initiating a connection into the OT domain — independent of the protocol; the boundary crossing is the alarm. (d) The control that should have made the 03:12:08 flow impossible is segmentation enforcing the IT/OT boundary with default-deny (the Purdue rule), so that no Level-4 host can open a connection into the OT domain at all; all legitimate exchange would instead be brokered through the IDMZ.
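The point in 16(c), that the boundary crossing itself is the alarm regardless of protocol, can be sketched as a single rule on Purdue levels. This is a minimal illustration under stated assumptions: the record fields (`src_level`, `dst_level`, `proto`) and the sample flows are hypothetical and are not the exercise's log format.

```python
# Hypothetical flow records; field names and values are illustrative only.
OT_LEVELS = {0, 1, 2, 3}   # Purdue levels inside the OT domain
IT_LEVELS = {4, 5}         # Purdue levels inside the IT domain

def is_it_to_ot_crossing(src_level, dst_level):
    """True when an IT-domain host initiates a flow directly into OT."""
    return src_level in IT_LEVELS and dst_level in OT_LEVELS

flows = [
    {"src_level": 2, "dst_level": 1, "proto": "modbus"},    # HMI -> PLC (normal)
    {"src_level": 4, "dst_level": 3, "proto": "tcp"},       # business host -> SCADA
    {"src_level": 4, "dst_level": 1, "proto": "modbus"},    # business host -> PLC
    {"src_level": 4, "dst_level": 3.5, "proto": "https"},   # IT -> IDMZ (brokered)
]

# The rule ignores the protocol entirely: direction across the boundary decides.
alerts = [f for f in flows if is_it_to_ot_crossing(f["src_level"], f["dst_level"])]
for flow in alerts:
    print("ALERT IT->OT crossing:", flow)
```

Only the two Level-4 flows into Levels 3 and 1 are flagged; the flow that terminates in the IDMZ (3.5) is the brokered path and passes.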

18. OT networks are "gloriously predictable" because the process does the same thing repeatedly: the same HMIs talk to the same PLCs with the same commands at the same intervals. The liability: this is why patching is so disruptive — any change (a patch, a reboot, an agent) perturbs a system tuned to do one thing deterministically, and the equipment is rarely designed to absorb change. The gift: because the normal pattern is so stable, deviations are highly meaningful; learning the baseline makes anomaly detection far more effective than in IT, where human unpredictability buries the signal. Consequently, a first-occurrence alert is handled very differently than in an IT SOC: in IT, "we've never seen this before" is often benign noise to be triaged down; in OT, "this has never happened before" — a new device, a never-seen command, an IT→OT connection — is genuinely actionable and should be investigated, not tuned away.

21. Find the bridges. Unmanaged or risky IT/OT bridges and remediations: - (a) Vendor account from the internet directly to the SCADA server: BRIDGE (severe). A direct internet→OT path bypassing all boundaries; a stolen credential reaches OT (the Colonial pattern). Remediate: eliminate the direct route; broker vendor access through an IDMZ jump host with MFA and session recording, only in approved windows. - (b) Historian pushing to an IDMZ replica that BI reads: NOT a bridge (this is correct). This is the textbook brokered design: the only flow across the boundary is OT→IDMZ, and IT reads the replica, never the real historian. Leave as is. - (c) Engineering laptop on both corporate WiFi and the control network: BRIDGE (human-carried). A device that touches the hostile corporate network and then the control network can carry malware across the boundary (the Stuxnet principle). Remediate: dedicate a hardened, control-network-only device for OT work; bar general-purpose laptops from the control network (enforce with NAC where possible). - (d) RTU cellular modem with a default password, reachable from the internet: BRIDGE (severe). A direct internet-exposed OT device with default credentials (the IoT default-credential problem, aged). Remediate: remove direct internet exposure; place behind a controlled, brokered path; change/disable default credentials where the device allows; monitor the link. - (e) IDMZ jump host requiring MFA, all sessions through it: NOT a bridge (this is the control). This is the brokered access the others should be routed through. Leave as is. - (f) Two-year-old "temporary" rule letting the corporate patch server reach every OT host on any port: BRIDGE (severe, and exactly the kind that gets forgotten). A broad IT→OT path that an attacker on the patch server (or spoofing it) can ride into the whole OT network. Remediate: remove or tightly scope the rule; if OT hosts need patches, stage them through an IDMZ patch relay rather than a direct corporate path; add a passive-monitoring rule so any future such path raises an alert.

24. Path + single defensive lesson, at public-fact level: - (a) Stuxnet (2010): crossed an air-gapped network (reportedly via removable media/USB) to reach specific controllers and damage centrifuges while reporting normal readings. Lesson: an air gap is a boundary to monitor and enforce, not a guarantee; control removable media and treat isolation as porous. - (b) Ukraine grid (2015/2016): attackers phished into the IT networks of distribution companies, stole credentials, crossed into OT, and remotely opened breakers to cut power; operators recovered via manual operation. Lesson: an IT compromise becomes OT impact through the boundary; guard and monitor the IT/OT line, and value manual/degraded operability for recovery. - (c) Triton/Trisis (2017): malware at a petrochemical facility targeted the safety instrumented system, attempting to reprogram it; it was discovered when it accidentally tripped the SIS to a safe shutdown. Lesson: the safety system is a security target; isolate and monitor it above all else. - (d) Colonial Pipeline (2021): a dormant VPN account without MFA gave ransomware access to the IT/business network only; the company shut the pipeline for about 5 days because it could not bill and could not prove the IT/OT boundary held. Lesson: basic IT hygiene (MFA, account decommissioning) is OT security, and the provability of the IT/OT boundary determines your options under attack.

28. The confident air gap. Flawed assumptions in the paragraph and corrections: - "Fully air-gapped … requires no patching, no monitoring, no additional controls." — False premise: Stuxnet proved an air gap is crossable (USB), so air-gapped systems still need monitoring and compensating controls; isolation lowers but does not eliminate risk. - "Vendors service equipment on site with their own laptops." — A vendor laptop is a human-carried bridge that can ferry malware across the gap; it directly contradicts the "air gap" claim. - "Engineers move files in with USB drives as needed." — Removable media is the exact vector Stuxnet used; "as needed" with no control is an open door across the gap. - "We have never had an incident." — Absence of a detected incident is not evidence of security, especially with no monitoring in place to detect one; you cannot find what you are not watching for. Rewrite (defensible posture): "Our control network is segmented from corporate IT and the internet, with all necessary access brokered through a monitored IDMZ jump host (MFA, session recording). We treat the isolation as porous: removable media is controlled and scanned, vendor work uses dedicated control-network-only devices, and a passive sensor monitors the boundary and feeds our SIEM, with any IT→OT crossing as a top-severity alert. We apply compensating controls to unpatchable equipment and patch IT-like hosts in planned windows. We have detected no incidents and we have the monitoring in place to detect one." The real incident that most directly refutes the paragraph's central claim is Stuxnet, which compromised an air-gapped facility via removable media.


Chapter 34

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-flavored, or discussed in class. All z-scores use the population standard deviation, matching mlsec.py.

1. Supervised detection: learns a labeled function from examples already marked (e.g., "phishing"/"legitimate") and predicts labels for new inputs. Unsupervised detection: learns "normal" from unlabeled data and flags deviation, with no labels. Anomaly detection: the identification of observations that deviate enough from an established baseline to warrant attention (an unsupervised technique). UEBA: the multi-feature, per-entity generalization of anomaly detection that fuses several behavioral signals into one risk score. Threshold concept: anomalous is not the same as malicious, and malicious is not always anomalous. A backup job is anomalous and harmless; a "low and slow" attacker using valid credentials is malicious and statistically normal.

4. Explainability is the degree to which a model's individual decisions can be understood and justified by a human. It matters in (a) an incident review because responders must defend why an alert fired and reconstruct what happened — an unexplainable "the model said so" cannot be triaged, escalated, or learned from; and (b) a regulatory conversation because regulators (and, for adverse decisions, affected people) can demand a justification for an automated determination, and "the neural network decided" is not a defensible answer. Simpler models like a z-score have an explainability advantage because the decision is a transparent arithmetic statement ("9 tonight vs. baseline mean 3.0, sd 1.0, so z = 6.0") that a human can read, reproduce, and defend, whereas a complex model's decision boundary is opaque.

6. Baseline $[2,1,2,3,2,1,2,3]$: sum $=16$, $n=8$, $\mu = 16/8 = 2.0$. Deviations $0,-1,0,1,0,-1,0,1$; squares $0,1,0,1,0,1,0,1$; sum $=4$; variance $=4/8 = 0.5$; $\sigma = \sqrt{0.5} \approx 0.7071$. Today $=9$: $z = (9 - 2.0)/0.7071 \approx 9.90$. Yes — far above a threshold of 3; flag it.
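The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the standard library's population standard deviation (`statistics.pstdev`); the function name `z_score` is chosen here for illustration and is not taken from mlsec.py.

```python
from statistics import mean, pstdev

def z_score(baseline, value):
    """Population z-score of `value` against `baseline`."""
    mu = mean(baseline)
    sigma = pstdev(baseline)   # population sd: divides by n, not n - 1
    if sigma == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return (value - mu) / sigma

baseline = [2, 1, 2, 3, 2, 1, 2, 3]
z = z_score(baseline, 9)
print(round(z, 2))   # 9.9
```

Swapping `pstdev` for the sample version `stdev` would give a slightly larger sigma and a slightly smaller z, which is why the chapter fixes the convention up front.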

8. Baseline $[5,6,4,5,5,5]$: sum $=30$, $n=6$, $\mu = 5.0$. Deviations $0,1,-1,0,0,0$; squares $0,1,1,0,0,0$; sum $=2$; variance $=2/6 \approx 0.3333$; $\sigma = \sqrt{0.3333} \approx 0.5774$. (a) $\mu = 5.0$, $\sigma \approx 0.5774$. (b) New hour $=5$: $z = (5-5)/0.5774 = 0.00$ — not anomalous (a perfectly typical hour). (c) If one baseline hour had been $50$ instead of a $5$, $\mu$ and especially $\sigma$ would balloon (the single huge value dominates the sum of squared deviations), so $\sigma$ becomes large and subsequent real spikes score below threshold — the detector is desensitized by one outlier. A robust statistic — the median and the median absolute deviation (MAD) — resists this because a single extreme point barely moves the median.
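Part (c) can be made concrete with a short comparison of the classic and robust statistics on the contaminated baseline. This is a sketch under stated assumptions: the later spike of 20 is a hypothetical value chosen for illustration, and the robust score is the plain deviation from the median divided by the MAD, without the scaling constant some texts apply.

```python
from statistics import mean, median, pstdev

def mad(data):
    """Median absolute deviation: the median distance from the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

contaminated = [5, 6, 4, 5, 5, 50]   # one baseline hour replaced by 50
spike = 20                            # hypothetical later spike

# Classic statistics: dragged upward by the single outlier.
mu, sigma = mean(contaminated), pstdev(contaminated)
classic_z = (spike - mu) / sigma

# Robust statistics: the outlier barely moves the median or the MAD.
center, spread = median(contaminated), mad(contaminated)
robust_score = (spike - center) / spread

print(f"mean={mu:.2f} sigma={sigma:.2f} classic z={classic_z:.2f}")
print(f"median={center} MAD={spread} robust score={robust_score:.1f}")
```

The classic z-score of the spike falls well below 3 while the robust score stays far above it. Note that on the clean baseline the MAD is 0, because more than half the hours equal the median, so a practical detector needs a floor or fallback for the spread; that handling is omitted here.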

10. Clean baseline $[3,4,4,3,4,5,4,3]$: sum $=30$, $n=8$, $\mu = 30/8 = 3.75$. Squared deviations sum to $3.5$; variance $=3.5/8 = 0.4375$; $\sigma = \sqrt{0.4375} \approx 0.6614$. Tonight $=12$: $z = (12 - 3.75)/0.6614 \approx 12.47$ — strongly anomalous. Poisoned baseline $[3,4,4,3,4,7,8,9]$ (last three inflated to $7,8,9$): sum $=42$, $\mu = 42/8 = 5.25$; squared deviations sum to $39.5$; variance $=39.5/8 = 4.9375$; $\sigma = \sqrt{4.9375} \approx 2.2220$. Tonight $=12$: $z = (12 - 5.25)/2.2220 \approx 3.04$. The contamination lowered tonight's z-score from $\approx 12.47$ to $\approx 3.04$ — a drop of about $9.44$ — so it barely still crosses a threshold of 3, and a slightly more patient attacker would have pushed it below. This illustrates data poisoning of an unsupervised model: by inflating the baseline (the attacker being present while "normal" was learned), the attacker raised $\mu$ and $\sigma$ so that genuine attack volume now looks ordinary. Defenses: a longer/vetted baseline window, robust (median/MAD) statistics, and periodic human-reviewed baselines.
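The same comparison can be reproduced in code. This is a minimal sketch using the population standard deviation as in the worked numbers; the helper name `z_score` is illustrative and not taken from mlsec.py.

```python
from statistics import mean, pstdev

def z_score(baseline, value):
    """Population z-score of `value` against `baseline`."""
    return (value - mean(baseline)) / pstdev(baseline)

clean = [3, 4, 4, 3, 4, 5, 4, 3]
poisoned = [3, 4, 4, 3, 4, 7, 8, 9]   # last three values inflated by the attacker
tonight = 12

z_clean = z_score(clean, tonight)
z_poisoned = z_score(poisoned, tonight)

print(round(z_clean, 2))      # 12.47
print(round(z_poisoned, 2))   # 3.04
```

The same observation scores roughly four times lower after three contaminated baseline points, which is the whole attack.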

11. $500{,}000$ events, $40$ malicious, $95\%$ TP-rate, $0.8\%$ FP-rate. (a) True positives $= 0.95 \times 40 = 38$. (b) False positives $= 0.008 \times (500{,}000 - 40) = 0.008 \times 499{,}960 = 3{,}999.68 \approx 4{,}000$. (c) Queue $= 38 + 3{,}999.68 \approx 4{,}038$. (d) Precision $= 38 / 4{,}037.68 \approx 0.0094 \approx 0.94\%$. The queue is not workable as-is for a five-person SOC: ~99 of every 100 alerts are false, so the team would clear thousands of junk alerts a day and quickly mute the feed. Fixes: narrow scope (shrink the denominator), stack a second signal, enrich, and risk-rank; and tune the threshold to the volume the team can actually investigate.
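The base-rate arithmetic is easy to get wrong by hand, so a short check may help. This is a sketch of the calculation above only; the variable names are illustrative.

```python
events = 500_000
malicious = 40
tp_rate = 0.95     # share of real attacks that alert
fp_rate = 0.008    # share of benign events that alert

true_positives = tp_rate * malicious
false_positives = fp_rate * (events - malicious)
queue = true_positives + false_positives
precision = true_positives / queue

print(f"TP={true_positives:.0f} FP={false_positives:.2f} "
      f"queue={queue:.2f} precision={precision:.2%}")
```

Changing `events` while holding the rates fixed shows why narrowing scope helps: the false-positive count scales with the benign denominator, not with the number of attacks.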

13. The base rate is the underlying frequency of the thing you are detecting — here, real attacks are rare relative to the ocean of benign events. A detector that flags nothing at all is "99.99% accurate" on a stream where attacks are 1 in 10,000, because it is correct on all 9,999 benign events and wrong only on the 1 attack — accuracy rewards predicting the majority class. Accuracy is therefore the wrong headline metric for rare-event detection, because a useless model scores almost perfectly. Replace it with precision (the share of alerts that are real attacks — what analysts experience) and recall (the share of real attacks the detector catches), plus the alert volume at the operating threshold.
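The "flags nothing" detector can be scored directly from its confusion-matrix counts. This is a small sketch of the argument above on the 1-in-10,000 stream; precision is reported as `None` because a detector with no alerts has no alerts to be right about.

```python
total = 10_000
attacks = 1

# A detector that never alerts: no true positives, no false positives.
tp, fp = 0, 0
fn = attacks            # the one attack is missed
tn = total - attacks    # every benign event is "correctly" ignored

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp) if (tp + fp) > 0 else None   # undefined with zero alerts

print(f"accuracy={accuracy:.2%} recall={recall:.0%} precision={precision}")
```

Accuracy comes out at 99.99% while recall is 0%, which is the mismatch the answer describes.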

16. (a) Model evasion against a supervised classifier (the malware scanner); the attacker uses query access to find a variant that crosses the decision boundary to "clean." Defense: limit and log queries (a burst of near-identical probing inputs is itself detectable), don't rely on the classifier as the only gate, and add downstream behavioral detection. (b) Data poisoning (gradual baseline contamination) against an unsupervised anomaly detector; enabled by the moving baseline re-learning "normal" from attacker-influenced traffic. Defense: slow/vetted, human-reviewed baselines and robust statistics. (c) Data poisoning (availability/targeted) against a supervised model trained on the shared dataset; mislabeled "benign" samples teach the model to miss that family. Defense: data provenance and validation, outlier filtering of training inputs, and a clean held-out test set. (d) Backdoor (targeted) data poisoning; the model is accurate everywhere except on inputs carrying the secret trigger, which it misclassifies as benign. Defense: scrutinize training-data provenance, test for trigger-like input clusters, and monitor for inputs that produce surprising confident "benign" verdicts.

18. A layered architecture for a malware-scoring model that follows "assume the layer fails": In front of the model — cheap deterministic controls: hash allowlists/denylists of known-good and known-bad files, signature scanning, and policy rules (e.g., block unsigned executables from untrusted sources). These resolve the obvious cases without invoking the corruptible model and shrink what the model must decide, reducing the payoff of attacking it. Behind the model — human review of the model's "uncertain" or high-impact verdicts, plus downstream behavioral/EDR detection that watches what a file does after it runs. This catches anything an evaded or poisoned model waved through, because the malware still has to act and that action is anomalous. Each placement blunts a different attack: the front layer means evasion of the model alone does not reach the asset; the behind layer means even a poisoned model's false "benign" is caught by independent downstream signals. No single corruptible layer is decisive — that is defense in depth (Theme 4) applied to ML.

20. UEBA design for a compromised privileged administrator account at Meridian: - Entity: individual privileged/admin accounts (each baselined separately). - Features (≥4): (1) login hour/day-of-week profile: admins have routine windows, so an off-hours login is suspicious; (2) source geography/IP and whether it is new: takeovers often originate from new places; (3) count and sensitivity of systems accessed, especially first-time access to a system the admin never touches: a sign of lateral movement; (4) volume of privileged actions or data exported: bulk activity signals abuse; (5) (optional) use of unusual admin tools or commands. - Baseline: per-entity over ~30 days, split business-hours vs. off-hours, plus a peer-group baseline (other admins of the same tier) so that deviations from the person's own norm and from the role's norm both count; the baseline is frozen and human-reviewed (anti-poisoning). - Scoring/threshold: a weighted sum of per-feature z-scores (weight first-time sensitive access and new geography highest), with a combined threshold tuned to the volume the SOC can investigate; risk-rank rather than binary-alert. Two benign false positives and their suppression: (i) an admin legitimately working an unannounced maintenance window (off-hours + new system): suppress by enriching against the change-management/maintenance calendar before alerting; (ii) an admin traveling (new geography): suppress by enriching against an approved-travel/VPN signal or by requiring a second anomalous feature (geography alone is too weak).

24. Control requirement (policy language): "All outbound funds transfers at or above USD 250,000, and any transfer of any amount that is flagged by the requester or initiator as urgent, confidential, or exception-to-process, require out-of-band verification AND dual authorization before release. Out-of-band verification means the approving treasury officer must independently confirm the request by contacting the purported authorizing party using contact details obtained from the internal corporate directory — never contact details supplied within or alongside the request — or by exchanging the current treasury code phrase. Dual authorization means a second, independent treasury officer (not the initiator) must record completion of out-of-band verification and approve release within the treasury system, which shall technically prevent release without this second approval. This requirement may not be waived for urgency under any circumstances." Why a deepfake cannot defeat it: a synthetic voice or video can imitate the executive on the request channel, but it cannot answer a callback placed to a directory-listed number it does not control, cannot produce the rotating code phrase it does not know, and cannot supply the independent second human approver the system requires. The control's trust rests on the channel and the second human, not on judging whether the request's content looks genuine.

26. First five response steps (Chapter 24 lifecycle), in order: 1. Contain immediately: halt and recall the wire — instruct the bank to stop/reverse the transfer before it settles, and freeze the initiating session/account from releasing anything further. (This is the single most important immediate action: stop the money while it is still stoppable.) 2. Verify out of band: contact the real purported authorizer (CFO) via a directory number to confirm they authorized nothing, establishing the fraud. 3. Preserve evidence: capture the meeting invite, any recording/metadata of the call, the messages, and treasury-system logs (Chapter 25 readiness) for investigation and possible law-enforcement referral. 4. Notify and escalate: alert the incident commander, finance leadership, legal, and the bank's fraud team; engage law enforcement per policy. 5. Hunt for blast radius: check for other pending/recent transfers, other targeted employees, and any related access — assume this employee was not the only one contacted. The single most important immediate action: stop/reverse the wire before it clears. The two process controls whose absence let it get this far: (i) mandatory out-of-band callback verification for urgent/confidential high-value transfers (to a directory number, not a supplied one), and (ii) enforced dual authorization by an independent second approver. (A culture that treated verification as distrust, and the waivable-for-urgency loophole, are contributing factors.)

28. Plausible explanations for two months of near-silence from an anomaly detector: 1. A muted/broken feed: the alerts are being suppressed, filtered, or routed to a dead queue (the week-one failure mode); it looks benign but means there is no coverage. 2. A broken pipeline: the log source stopped flowing, the parser changed, or the job is failing silently, so the model sees no data (or constant data) and never fires. 3. Baseline contamination/poisoning: an attacker present during/after baseline learning has been absorbed into "normal," or a gradual ramp drifted the baseline; the detector is now blind by design. 4. Drift: the environment changed (new systems, new work patterns) and the stale baseline no longer matches reality, so either everything or nothing scores anomalous and the team retuned it into silence. 5. A genuinely quiet period (the last explanation to assume): possible but never the safe default. Investigate first: whether the detector is working at all, i.e., rule out the muted feed and broken pipeline before believing the network is clean. Test it by injecting a known synthetic anomaly into the input (in a controlled way), e.g., a test entity with a deliberately extreme value, and confirm an alert fires end to end (model → enrichment → analyst queue). If the planted anomaly does not surface, the silence means the detector is dead, not that the network is clean. Also re-baseline against a known-clean historical window and compare. The unifying lesson: absence of alerts is not evidence of absence of attacks; verify the detector before trusting its silence.
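The synthetic-anomaly test can be sketched as an end-to-end check that treats the pipeline as a black box. Everything here is hypothetical: the baseline values, the `canary-test` entity name, and the two toy pipelines stand in for whatever the real detector and queue look like.

```python
from statistics import mean, pstdev

BASELINE = [3, 4, 4, 3, 4, 5, 4, 3]   # hypothetical known-clean window
THRESHOLD = 3.0

def is_anomalous(value):
    """Population z-score test against the frozen baseline."""
    mu, sigma = mean(BASELINE), pstdev(BASELINE)
    return (value - mu) / sigma > THRESHOLD

def healthy_pipeline(events):
    """Model -> queue: returns the events that reach the analyst queue."""
    return [e for e in events if is_anomalous(e["value"])]

def muted_pipeline(events):
    """Same model, but alerts are routed to a dead queue."""
    healthy_pipeline(events)   # the model still fires...
    return []                  # ...but nothing reaches an analyst

def canary_surfaces(pipeline):
    """Inject a deliberately extreme test event and check it comes out the far end."""
    canary = {"entity": "canary-test", "value": 1000}
    return any(alert["entity"] == "canary-test" for alert in pipeline([canary]))

print(canary_surfaces(healthy_pipeline))   # True
print(canary_surfaces(muted_pipeline))     # False
```

The check deliberately asserts on what reaches the queue rather than on the model's internal score, since a muted feed and a dead model look identical from the analyst's chair.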


Chapter 35

Worked solutions to the daggered (†) exercises. Other exercises are open-ended or discussed in class.

1. The three forces: Commoditization — a skilled few package a capability (as a kit, service, or subscription) and sell it to the many, so a technique stops being a rare weapon and becomes something any mid-tier criminal can rent (example: ransomware-as-a-service). Specialization — the attack chain is split among specialists who trade with each other, each doing one thing well (example: an initial access broker who breaks in and sells the foothold to a ransomware affiliate). Adaptation to defenses — attackers evolve to bypass whatever control defenders deploy, an indefinite arms race (example: endpoint detection got good at spotting malicious executables, so attackers shifted to living-off-the-land, abusing built-in tools).

4. (a) The explosion in the number of ransomware victims → commoditization: RaaS packaged the hard part (reliable ransomware) and rented it to many affiliates, so the population of actors able to run a campaign grew without any technical breakthrough. (b) Shifting from malicious executables to PowerShell → adaptation to defenses: EDR's success at catching malware pushed attackers to abuse legitimate tools that defenders cannot delete. (c) Voice-cloning fraud reaching ordinary criminals → commoditization: a capability once limited to studios/intelligence agencies became a rentable service as the cost and skill barrier fell.

6. Ransomware-as-a-service (RaaS) is a business model in which operators build and maintain the ransomware platform (encryption code, payment portal, negotiation site, even support) and rent it to affiliates who carry out attacks, splitting the proceeds. Volume can rise without a new encryption technique because RaaS commoditizes the hard part: the difficult engineering is done once by the operators and sold to many affiliates, so the number of actors capable of running a devastating campaign grows dramatically even though the underlying ransomware is unchanged. Volume is a function of who can run it, which the business model expands, not of how novel it is.

8. "Ransomware-proof" is the wrong framing because modern ransomware cannot be reliably prevented — an affiliate economy rents industrial-grade tooling to anyone who can buy a foothold, and prevention has to be perfect while the attacker needs one success (the Ch.1 asymmetry). The right goal is resilience: assume the encryption succeeds and engineer so it is survivable. Five controls and the failure each mitigates: 1. Immutable + offline + tested backups — mitigates the loss of availability and the attacker's now-standard tactic of destroying reachable backups before detonation (an untested backup is not a control). 2. Quiet-phase detection (anomalous PowerShell/WMI, new admin accounts, internal scanning, backup-server access) — mitigates the long dwell before detonation, catching the attack before the ransom note. 3. Egress monitoring — mitigates double extortion's data theft (and, even when it cannot prevent the leak, scopes the breach so notification is precise). 4. Segmentation + least privilege — mitigates blast radius, so a foothold in one zone cannot encrypt everything. 5. A rehearsed double-extortion IR plan — mitigates the chaos of making the pay/notify/engage decisions under pressure by deciding the criteria in advance. Prevention still matters at the front door (phishing-resistant MFA stops the bought-credential foothold), but the posture's center of gravity is survival, not denial.

11. (a) This is exfiltration for double extortion: bulk outbound data transfer preceding encryption. (b) Strongest indicators: the enormous bytes values (≈8–9 GB per transfer), the repetition to a single external dst in the dead of night, and especially the account svc_backup, a backup service account, moving data outbound to the internet, which is the opposite of what a backup account should do (backups pull data internally, not push it externally). A service account behaving like an exfiltration tool is the tell. (c) Egress monitoring would have detected it (large/unusual outbound to a new destination); segmentation + least privilege (and removing the service account's unnecessary outbound internet access) would have limited it. (d) The single most urgent thing to determine is what data was taken: egress logs and the destination give the team the scope needed for an accurate breach-notification decision under GLBA/state law, rather than a worst-case guess.

13. Procedural red flags (none of them about how the video looked): - Caller came from an external/unknown number, not the CFO's usual line → defeated by an out-of-band callback to the CFO's known directory number (do not trust the inbound channel). - Request is to a NEW vendor account → triggers the change-of-payment-details verification step; new/changed payee details are a classic fraud signal requiring independent confirmation. - "Before end of day, keep it confidential" (urgency + secrecy) → defeated by procedural friction against urgency: no urgent request skips verification, and secrecy that blocks normal checks is itself the red flag. - Caller deflected when asked to confirm a detail only the real CFO would know → defeated by a pre-shared verification phrase / challenge question the fake cannot answer. The lesson: every defense here is a process that ignores how convincing the face and voice are.

15. (a) VPN ECDHE, session traffic only → MEDIUM: ECDHE is asymmetric (quantum-breakable), but the session secret is short-lived, so harvest-now-decrypt-later barely applies — migrate in time, but not first. (b) 15-year medical archive, RSA-wrapped AES → HIGH: asymmetric RSA key-wrap + very long confidentiality lifetime = prime harvest-now-decrypt-later target. (c) Argon2 password hashes → LOW: a hash, not public-key crypto; minimally affected by quantum. (d) ECDSA firmware signing, 8-year device life → HIGH: asymmetric signatures + long-lived; a future ability to forge signatures undermines the integrity of every device's updates. (e) TLS public marketing site, no sensitive data → MEDIUM by the strict rule (ECDHE is asymmetric, short-lived), and effectively bottom-of-the-list in practice because it protects no sensitive long-lived data — a fine written answer notes both the rule's output and the judgment that deprioritizes it.
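The "strict rule" applied in these answers can be written as a two-question triage. This is one reading of the rule as the answers use it, not a definitive statement of the chapter's rubric; the function name and the boolean encoding of each asset are assumptions made for illustration.

```python
def quantum_priority(asymmetric, long_lived):
    """Triage by the strict rule only; business judgment is applied afterward."""
    if not asymmetric:
        return "LOW"      # hashes and symmetric crypto: minimally affected
    return "HIGH" if long_lived else "MEDIUM"

# (asymmetric?, long-lived protection need?) for each asset in the exercise
assets = {
    "a: VPN ECDHE session traffic":        (True, False),
    "b: 15-year archive, RSA-wrapped AES": (True, True),
    "c: Argon2 password hashes":           (False, False),
    "d: ECDSA firmware signing, 8 years":  (True, True),
    "e: TLS on public marketing site":     (True, False),
}

ratings = {name: quantum_priority(*flags) for name, flags in assets.items()}
for name, rating in ratings.items():
    print(f"{rating:6} {name}")
```

The rule returns MEDIUM for both (a) and (e); separating them, as the answer to (e) does, takes the additional judgment about what data the channel actually protects.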

17. Harvest-now-decrypt-later for an executive: "An attacker does not need a quantum computer today to steal data we encrypt today — they can copy our encrypted data now and simply wait, decrypting it the day a powerful enough quantum computer exists. So for anything that must stay secret for many years, the danger is present, not future: data we protect with today's algorithms is already at risk of being read later. That is why we should migrate our fifteen-year customer-records archive to post-quantum encryption now — not 'when quantum computers arrive,' because by then the data we wrote this year will already have been sitting in an adversary's archive for over a decade, waiting to be unlocked."

21. Out-of-band verification policy (example). "Any request to transfer funds, change payment or banking details, or release sensitive data — regardless of how it arrives (email, phone, video call, or message) and regardless of the apparent seniority or urgency of the requester — must be independently verified before it is executed. Verification procedure: (1) Do not act on the request as received. (2) Independently contact the requester using a pre-established, known channel — a phone number from the corporate directory, never a number or link provided in the request itself. (3) Confirm the request, and for high-value actions confirm a pre-shared verification phrase known only to the authorized parties. (4) If the requester cannot be reached through the known channel, or cannot provide the verification phrase, stop and escalate to security — do not proceed. (5) Urgency and demands for secrecy are explicit red flags, not reasons to skip these steps; no deadline overrides verification." The verification relies on a channel the attacker does not control (the directory number) and a secret the deepfake cannot possess (the phrase); urgency is reframed from a reason to hurry into a reason to scrutinize. This defeats a synthetic CFO on a video call because the callback goes to the real CFO's known number, which the fake cannot intercept.

26. The dependency you didn't know you had. (a) Dependency confusion despite the name check: the developer's allowlist matches package names, but dependency confusion uses the same name as an internal private package — the attacker publishes a public package with that exact internal name and a higher version number, and a misconfigured build tool, resolving across both public and private registries, prefers the higher version and pulls the attacker's public package. The name check passes because the name is, by design, identical. Mitigation: explicitly scope internal packages to the private registry (namespacing / scoped packages and registry pinning) so internal names never resolve to a public source, and pin versions. (b) Signature verification does not stop the SolarWinds pattern because in that attack the malicious code is inserted into the legitimate vendor's build before the vendor signs it — so the package the developer receives is signed with the vendor's genuine, valid signature. Verifying the signature confirms it really came from the vendor (it did) and was not altered in transit (it wasn't); it cannot detect that the vendor's own build was compromised upstream. Mitigation: add behavioral detection on what the trusted software does after installation (a signed agent suddenly beaconing to a new domain or spawning unusual processes), apply least privilege to vendor agents so a compromised-but-trusted component is contained, and push toward provenance/SLSA so the build process itself carries verifiable integrity evidence beyond a final signature.
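The resolution flaw in part (a) can be sketched with a toy resolver. This does not model any real package manager: the registries, the package name `acme-billing`, and the version numbers are all hypothetical, and the "highest version across all sources wins" behavior is the misconfiguration being illustrated.

```python
def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

PRIVATE_REGISTRY = {"acme-billing": ["1.2.0", "1.3.0"]}
PUBLIC_REGISTRY = {"acme-billing": ["99.0.0"]}   # attacker-published, same name
INTERNAL_NAMES = {"acme-billing"}                # names pinned to the private registry

def candidates(name, registries):
    found = []
    for source, registry in registries:
        for version in registry.get(name, []):
            found.append((parse_version(version), source))
    return found

def naive_resolve(name):
    """Misconfigured: searches both registries and takes the highest version."""
    both = [("private", PRIVATE_REGISTRY), ("public", PUBLIC_REGISTRY)]
    return max(candidates(name, both))

def scoped_resolve(name):
    """Internal names resolve only against the private registry."""
    if name in INTERNAL_NAMES:
        registries = [("private", PRIVATE_REGISTRY)]
    else:
        registries = [("public", PUBLIC_REGISTRY)]
    return max(candidates(name, registries))

print(naive_resolve("acme-billing"))    # ((99, 0, 0), 'public')
print(scoped_resolve("acme-billing"))   # ((1, 3, 0), 'private')
```

A name-based allowlist passes in both cases, since the name is identical; only the source restriction changes which package is installed.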


Chapter 36

Worked solutions to the daggered (†) exercises. The remaining exercises are open-ended or for discussion.

1. Security metric — a measurement chosen because its value changes a decision (if it doubled or halved, someone would act differently). Vanity metric — a measurement that looks impressive but drives no decision, typically unbounded, lacking a denominator, or measuring activity rather than outcome. KPI — measures how well a process performs against its objective (output/efficiency). KRI — measures how much risk the organization is carrying, as a leading warning signal. Combined sentence: "Meridian's KPI 'median days to patch a critical vulnerability' tells us how well the patching process runs, while the KRI 'number of internet-facing systems with a known-exploited vulnerability past SLA' is the metric that warns us risk is rising — and 'total patches applied,' a vanity metric with no denominator, tells us nothing actionable at all."

4. Control coverage is the proportion of in-scope items (assets, accounts, or attacker techniques) that a control actually protects, expressed as a percentage with an explicit denominator — e.g., "EDR runs on 95% of our 220 servers." The denominator is where it lies because coverage is computed against the items you know about: the shadow server, the unmanaged IoT device, the system a recent acquisition added but inventory missed, are all silently excluded from the denominator — and those unknown items are exactly where risk concentrates, so a high coverage percentage can coexist with a wide-open blind spot. The capability that must be solid for any coverage metric to be trustworthy is a complete, accurate asset inventory (begun in Chapter 1, governed in Chapters 14/18); without it, coverage is a fraction with an unknown bottom.
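A small numeric sketch of the denominator problem, using the 95%-of-220 figure from the answer. The count of 30 uninventoried systems is a hypothetical value picked for illustration; in practice that number is unknown, which is the point.

```python
def coverage(protected: int, population: int) -> float:
    """Fraction of the population that the control actually protects."""
    return protected / population

known_servers = 220
protected = round(0.95 * known_servers)   # 209 servers running EDR
uninventoried = 30                        # hypothetical: shadow servers, missed acquisitions

reported = coverage(protected, known_servers)
actual = coverage(protected, known_servers + uninventoried)

print(f"reported coverage: {reported:.1%}")  # 95.0%
print(f"actual coverage:   {actual:.1%}")    # 83.6%
```

The numerator never changed. The reported figure drops only because the denominator finally includes systems the inventory had missed.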

6. (a) MTTD = (1.0 + 0.5 + 20.0 + 2.0 + 1.5) / 5 = 25.0 / 5 = 5.0 h. (b) Response gaps = (2.0−1.0), (1.5−0.5), (44.0−20.0), (3.0−2.0), (4.0−1.5) = 1.0, 1.0, 24.0, 1.0, 2.5 → sum 29.5; MTTR = 29.5 / 5 = 5.9 h. (c) Detection intervals sorted: [0.5, 1.0, 1.5, 2.0, 20.0]; the middle value is 1.5 h (median). (d) The mean (5.0 h) is more than triple the median (1.5 h) because incident C's 20-hour detection is an outlier that single-handedly drags the average up; put the median on the board slide (with the mean and the outlier noted), because "we usually detect within ~1.5 hours, but one case took 20" is more honest than a 5-hour average that describes none of the incidents well. (e) Incident C is the biggest improvement opportunity: 20 h to detect plus 24 h to contain is a 44-hour attacker window, dwarfing every other incident; whatever blind spot allowed that (likely missing detection coverage for its attack type) is the highest-value fix.
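The arithmetic in parts (a) through (c) can be checked with a few lines of Python. This is a sketch using the detection and containment times as they appear in the worked answer; the variable names are illustrative.

```python
from statistics import mean, median

# Hours from attacker start to detection and to containment, incidents A-E.
detected_at = [1.0, 0.5, 20.0, 2.0, 1.5]
contained_at = [2.0, 1.5, 44.0, 3.0, 4.0]

# Response gap = containment time minus detection time, per incident.
response_gaps = [c - d for d, c in zip(detected_at, contained_at)]

mttd = mean(detected_at)
mttr = mean(response_gaps)
median_detect = median(detected_at)

print(f"MTTD (mean):      {mttd:.1f} h")           # 5.0 h
print(f"MTTR (mean):      {mttr:.1f} h")           # 5.9 h
print(f"Detection median: {median_detect:.1f} h")  # 1.5 h
```

Dropping incident C from `detected_at` brings the mean down to 1.25 h, which shows how much a single outlier controls the average.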

8. Two ways to drop MTTR without responding faster or better: (i) Close tickets prematurely — mark an incident "resolved" at containment-in-name before eradication is actually verified, so the detection-to-close clock stops early while the attacker may still be present. Guardrail: a reopen rate (incidents marked resolved that recur) — premature closure makes reopens spike. (ii) Reclassify or under-count the hard cases — quietly downgrade slow, ugly incidents to "events" or exclude them from the MTTR population so only the fast ones count. Guardrail: track the incident count and classification mix alongside MTTR, and audit a sample of closed tickets. The general lesson is Goodhart's law: any single metric set as a target in isolation will be optimized — including in ways that make security worse — so high-stakes metrics must be paired with a guardrail metric that moves in the opposite direction when the primary is gamed, and never set as a lone target.

9. Measuring MTTD from "first SIEM alert" to "analyst acknowledgement" understates true MTTD because it silently redefines the start of the incident as the moment your tooling happened to notice, not the moment the attacker actually began. If the attacker operated for 18 hours before tripping any rule, that 18 hours of dwell time vanishes from the metric, and you congratulate yourself on a "6-minute detection" of an attack that had been underway since the previous day. MTTD must measure from the attacker's true first action to detection. That start time is rarely known in the moment; it is reconstructed afterward through forensics (Chapter 25) — timeline analysis of artifacts establishes when the intrusion really began. A program that does not investigate thoroughly will systematically flatter its own MTTD, which is both a measurement error and a false sense of security.
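A short sketch of how the choice of start event changes the number. The three timestamps are hypothetical, arranged to match the 18-hour dwell and 6-minute acknowledgement described above.

```python
from datetime import datetime

# Hypothetical timeline; the first value would come from forensic reconstruction.
attacker_first_action = datetime(2024, 3, 4, 8, 0)
first_siem_alert = datetime(2024, 3, 5, 2, 0)    # 18 hours after the intrusion began
analyst_ack = datetime(2024, 3, 5, 2, 6)

reported_ttd = analyst_ack - first_siem_alert      # what the flawed metric records
true_ttd = analyst_ack - attacker_first_action     # what the attacker actually had
hidden_dwell = first_siem_alert - attacker_first_action

print(f"reported time to detect: {reported_ttd}")  # 0:06:00
print(f"true time to detect:     {true_ttd}")      # 18:06:00
print(f"dwell time hidden:       {hidden_dwell}")  # 18:00:00
```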

10. (a) The KRI a board should see is "critical vulns open past SLA: 47 (up from 31)" — it is a leading indicator of risk (dangerous, exploitable exposures are lingering, so a breach is becoming more likely) and it is actionable (reprioritize remediation of the overdue criticals). "Found" and "remediated" are activity counts with no denominator. (b) The real story: the team is doing a lot of patching (remediated 1,800) and the total count is falling, which looks like progress — but the dangerous subset is going the wrong way (overdue criticals rising from 31 to 47). Reporting only the first two numbers would tell a comforting "we're on top of patching" story while the actual exposure that causes breaches is worsening; volume of patching says nothing about whether the right things were patched in time. (c) Lead sentence: "Our most dangerous exposures are growing — critical vulnerabilities left unfixed past their deadline rose from 31 to 47 this month — and closing that specific gap is our top remediation priority, even though our overall patch volume is healthy."

11. Keep: (2) "MTTD 5.5 h, down from 9 h" — actionable trend, answers "are we improving / how do we compare." (4) "security risk: 1 of 5 dimensions above appetite" — directly answers "are we exposed," framed against the board's own appetite line. (6) "MTTR 6.9 h vs. ~12 h peer" — answers "how do we compare," with an honest benchmark. Cut: (1) "12M attacks blocked" — vanity: unbounded activity, no denominator, no decision. (3) "1,400 vulnerabilities" — no denominator, no target, no trend; a bare count is not a metric. (5) "98% training completion" — activity metric (clicked a module), not an outcome (behavior change / report rate); near-meaningless as a risk signal. Headline the kept three support: "Information-security risk is within appetite on all but one dimension and trending down, and we detect and contain incidents faster than our peers — with one exposure I'll flag and a plan to close it."

13. (a) "We can now detect a far wider range of attacker behaviors than a year ago; the set of attacks that could operate inside our environment unseen has shrunk substantially." (b) "When the industry-wide Log4Shell vulnerability hit, we eliminated our exposure within six days — faster than most institutions — closing a path that caused major breaches elsewhere." (c) "Nine of our critical systems are currently blind spots where an attacker could operate without us seeing it; closing this monitoring gap is our top operational priority this quarter." (d) "The program's overall maturity rose from 2.0 to 2.5 toward our 3.0 target — concretely, this is the difference between handling security ad hoc and running it as a documented, measured discipline."

15. For the claim "our MTTD (5 h) beats the industry average (8 h)" to be fair: the 8-hour figure must come from a named, credible source (e.g., a specific year's DBIR or a reputable industry report), it must be comparable (same definition of detection, similar sector and organization size — an industry "average" that mixes Fortune 500s and startups is not comparable to a mid-size bank), and it must be labeled as approximate and directional, because cross-industry benchmark numbers are notoriously soft (Tier 2 at best). It should be sourced on the slide ("per [source], [year]") and hedged ("roughly," "peer median"). A fabricated or cherry-picked benchmark backfires badly: a director with industry contacts or a board advisor can check it, and the moment one number is exposed as invented or stacked, every number in the deck — and the CISO's credibility — comes into question. An honest "we believe we're ahead of peers, though benchmark data is soft" is far stronger than a precise figure you cannot defend.

16. A worked "Response & Coverage" slide (text):

Response & Coverage, Q1 (headline): We detect and contain serious incidents in hours, not days, and our critical controls are near-complete, with one blind spot we're closing.

• MTTD: 1.7 h median (5.5 h mean); most incidents caught within ~2 hours.
• MTTR: 6.9 h to containment.
• EDR coverage: 95% of servers · MFA on privileged accounts: 100%¹ · Critical-system logging: 85% (9 systems being onboarded).

¹ Of inventoried privileged accounts; identity-governance review ongoing.

Footnote (pre-empting "is 5.5 h good?"): the mean is pulled up by one data-egress incident (18 h to detect); tightening egress detection is the top operational item, in the appendix.

This satisfies the constraints: one risk-focused lead sentence, five numbers, median beside mean to handle the outlier, and a footnote pre-empting the obvious director question (plus the honest MFA-denominator footnote).

19. A gaming-resistant MTTR definition for Meridian:

Mean Time to Respond (MTTR). Start event: the timestamp at which an incident is confirmed by an analyst (declared a true-positive incident in the ticketing system), not merely when an alert fired. End event: the timestamp at which containment is verified by the incident handler — the threat can no longer act (account disabled and sessions revoked, host isolated, malicious access cut) — confirmed, not self-asserted. Clock: wall-clock (24×7) elapsed time, because attackers do not keep business hours; business-hours-only variants may be reported separately but are not the headline. Population: incidents of severity High and Critical only (low-severity events are tracked separately to prevent dilution). Excluded: time spent on long-term eradication and recovery after verified containment (those are separate MTT-eradicate / MTT-recover metrics). Paired guardrail: the reopen rate (incidents marked contained that recur within 30 days), reported alongside MTTR so that fast-but-false containment is visible. Two analysts handed the same incident log should compute the same MTTR from this definition.

22. Steps for Dana on discovering that the 85% coverage figure is wrong (true value ~68%): 1. Fix the number, immediately and fully. Recompute coverage with the correct denominator including the 14 acquisition systems; the honest figure (~68%) goes on the deck. There is no version of this where the wrong number stays because the slide was "done." 2. Correct the deck and re-flag the risk. 68% logging coverage is materially worse and likely moves "critical-system logging" from a quiet line to a watch/amber item; reflect that, and check whether the acquisition's systems also drag down EDR and other coverage metrics (a denominator error in one place often hides in others). 3. Say it plainly in the room. Dana should proactively tell the board that the recent acquisition expanded the critical-system population, that integrating its monitoring is incomplete (~68% coverage), and that closing the gap is now a named priority. Volunteering this is far stronger than being caught later. 4. Treat it as the integrity-of-measurement principle in action. The number is testimony; reporting a denominator she now knows is wrong would be misrepresenting the bank's risk to the people legally accountable for it. The short-term discomfort of a worse number is trivial against the long-term cost of a board that learns the CISO knowingly presented a false figure. Credibility is spent in an instant and rebuilt over years; the honest correction protects it.

24. The dashboard that hid the breach (half-page memo, key points): (a) Why "zero incidents" is alarming, not reassuring, here: (i) you can only count incidents you can detect, and nothing on this dashboard demonstrates detection capability (no MTTD, no detection coverage), so "zero" may mean "blind," not "safe"; (ii) "100% antivirus" and "98% training" are activity metrics that say nothing about whether intrusions would be seen — a breached system can show all-green on every metric listed; (iii) a mature program with real detection essentially never reports a clean zero over a full quarter — some true positives, contained, are the normal signature of a program that is actually watching, so a pristine zero suggests the sensors are off, not the attackers. (b) Conspicuously absent metrics: MTTD / dwell time (how long an intrusion would go unseen); MTTR (how fast they'd contain one if found); detection coverage vs. ATT&CK (which attacker behaviors are even visible); coverage with honest denominators (100% of what? which systems are outside the count?); risk vs. appetite (is exposure within a stated tolerance?); vulnerability-SLA adherence / third-party-component risk (were the dangerous exposures fixed in time?). (c) The one question: "What is your detection coverage, and how would you know if you'd been breached and just hadn't detected it yet?" — the question the all-green, activity-only deck was structured to prevent, and the one whose answer would have revealed the blindness before the breach.


Chapter 37

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, design-oriented, or discussed in class. For the leadership/design problems, the reasoning and the tradeoffs named matter more than reaching a single "right" answer.

1. SOC tiers — the escalating levels of analyst expertise/authority (Tier 1 triage → Tier 2 investigate → Tier 3 hunt/build) through which an alert flows. MSSP — a provider that monitors your environment and forwards alerts for you to act on. MDR — a provider that brings its own tech and analysts and takes response actions on your behalf. Build vs buy (SOC) — the choice to staff a SOC in-house, outsource it, or combine the two. Combined sentence: "Unable to staff its own SOC tiers around the clock, the mid-size company faced a build-vs-buy decision and chose an MDR provider over a classic MSSP precisely because it wanted real response, not just forwarded alerts."

4. (a) CISO → CIO: advantage — close to technology and IT operations, so security and IT stay coordinated; risk — the CIO is measured on delivery/uptime/cost, which security can work against, so the CISO's budget and warnings may be suppressed. (b) CISO → CEO: advantage — independence from the IT-delivery conflict and a signal that security is taken seriously; risk — the CEO has limited bandwidth for a technical function, so the CISO can become isolated from IT reality. (c) CISO → CFO/GC/CRO: advantage — frames security as enterprise risk/compliance, natural for a regulated bank and aligned with the regulatory relationship; risk — can starve the engineering/operations side, treating security as paperwork. Most natural for a regulated bank: reporting into the risk/compliance side (CFO/CRO/GC), because a bank's security obligations are fundamentally risk-and-regulation framed — but with the near-universal dotted line to the board's Audit/Risk Committee regardless of the solid line, to preserve independence.

7. Walking an alert through the tiers: Tier 1 — (a) primary job: monitor the queue, triage incoming alerts per runbook, close obvious false positives, escalate the rest; (b) escalates out when an alert looks real or is unclear after the runbook steps; (c) makes the next occurrence cheaper by feeding back to Tier 2/3 the alerts that should have had a runbook or been auto-closed. Tier 2 — (a) deep investigation, scope/impact, containment, decide whether to declare an incident; (b) escalates out for major or novel incidents beyond its authority/skill; (c) makes the next occurrence cheaper by writing the runbook Tier 1 will follow next time. Tier 3 — (a) proactive hunting, leading major incidents, malware/forensics, detection engineering; (b) rarely escalates (it is the top) except to leadership for business decisions; (c) makes the next occurrence cheaper by building a new detection so the same threat is caught automatically at Tier 1 or by SOAR — pushing work down the pyramid.

9. A week has $168$ hours. One analyst on a standard schedule covers ~$40$ of them, so covering a single seat continuously needs $168 / 40 = 4.2$ analysts just for the clock. That bare number assumes no vacation, no sick days, no training, and no second analyst for safety — all unrealistic. Adding a slack/coverage factor of roughly $1.2$–$1.7$ for those realities yields $4.2 \times 1.4 \approx 5.9$, i.e. the 5–7 analysts per single seat rule. (See exercise-solutions.py::ex9_analysts_per_seat, which prints 5.0 / 5.9 / 7.1 across slack factors 1.2 / 1.4 / 1.7.) This is the single fact that most shapes build-vs-buy: most mid-size organizations cannot fund 5–7 analysts for every 24/7 seat, so they buy the coverage and build a small in-house core for judgment.
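The staffing arithmetic can be reproduced in a few lines. This is an independent sketch written for illustration, not the contents of the companion file referenced above; the function name and constants are assumptions.

```python
HOURS_PER_WEEK = 7 * 24   # 168 hours the seat must be covered
SHIFT_HOURS = 40          # hours one analyst works per week

def analysts_per_seat(slack: float) -> float:
    """Analysts needed to keep one seat staffed 24/7, scaled by a slack factor
    that accounts for vacation, sick days, training, and overlap."""
    return HOURS_PER_WEEK / SHIFT_HOURS * slack

for slack in (1.2, 1.4, 1.7):
    print(f"slack {slack}: {analysts_per_seat(slack):.1f} analysts per seat")
# slack 1.2: 5.0 analysts per seat
# slack 1.4: 5.9 analysts per seat
# slack 1.7: 7.1 analysts per seat
```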

12. MidStream Credit Union build-vs-buy. Running the §37.2 factor table: size/budget → small, favors buy; complexity → standard Microsoft stack, favors buy; talent market → scarce locally, favors buy; speed → has only one analyst and no after-hours coverage now, favors buy; data sensitivity → present (a credit union) but manageable with a vetted provider, neutral-to-build; risk appetite → tight budget, favors buy. Recommendation: buy the core capability — an MDR (or co-managed MSSP) for 24/7 monitoring and first-line response — because MidStream cannot remotely staff continuous coverage (one analyst against any realistic alert volume is wildly understaffed; see ex12_midstream, ~6× over at 1,500 alerts/week). What to keep in-house regardless: the single analyst becomes the in-house owner of vendor management, incident command/decisions, and the relationships with regulators and the business — buy the watching, retain the judgment and accountability. Residual risks: dependence on the provider's tuning and context, the governance burden of granting third-party response authority (scope it tightly, log everything — Chapters 18–19, 29), and concentration risk in a single vendor.

14. The hospital's three specific failures and the controls that would have prevented each: (1) The capability lived in two irreplaceable people — a single point of failure → cross-training and a hybrid model (and a deeper team) so the capability survives any one departure. (2) No documented runbooks — when the two left, their knowledge left with them → runbook-driven operations (institutional memory that does not quit). (3) No coverage / no backup sourcing — a fully in-house 24/7 SOC with no redundancy had no fallback when staff vanished → a co-managed/MDR partner (or at minimum a retainer) carrying the clock so the queue is never simply unwatched. The meta-lesson is Theme 4: whether you build or buy, the capability must survive the loss of any single person.

17. 90-day burnout-stabilization plan (prioritized; many attack more than one driver): 1. Fix the on-call rotation first (week 1–2): expand from two people to a deeper rotation (bring in the other analysts; if numbers are too thin, this forces the sourcing conversation in step 3). Attacks: burnout, retention. 2. Triage the alert noise (week 1–4): have a senior analyst (or assign someone) tune the loudest false-positive sources and add suppression/enrichment. Attacks: alert fatigue (and thereby a cause of burnout). 3. Add capacity via build-vs-buy (week 2–8): if the team is over capacity (run the staffing math), engage an MDR/co-managed partner for overnight Tier 1 and high-volume triage. Attacks: burnout (adds capacity). 4. Write a visible career ladder (week 3–6): publish Tier 1 → Tier 2 → detection-engineering/IR/lead with skills and milestones; have growth conversations with the two flight-risk analysts. Attacks: retention. 5. Inject meaningful work (week 4–12): stand up a recurring purple-team rotation and protected learning time so the week is not only the queue. Attacks: retention, burnout (monotony/futility). 6. Start runbooks + the document→automate pipeline (ongoing): document common alerts, then hand proven runbooks to engineering for SOAR automation. Attacks: burnout, resilience. 7. Add sustainability metrics (week 2 onward): track alert-volume-per-analyst, investigation time, on-call distribution, attrition risk — so leadership sees the next collapse coming. Attacks: all (it is the early-warning system).

19. Alert fatigue (Chapter 21) is the dulling of an analyst's responses by too many low-quality alerts — a SIEM/detection-layer problem, fixed by detection engineering (higher-fidelity, enriched, suppressed rules). Analyst burnout (this chapter) is chronic exhaustion and cynicism produced by the volume, monotony, stress, and unsustainable structure of the work — an organizational-layer problem, fixed by deeper staffing, automation, a career ladder, meaningful-work rotation, sane on-call, and attentive leadership. They are causally linked (unmanaged alert fatigue is a leading cause of burnout), but fixing one does not fix the other: tuning rules reduces a cause of burnout while leaving staffing, growth, monotony, and on-call untouched, and hiring more analysts does nothing about a flood of noisy alerts. A leader must work both layers at once.

21. Escalation runbook for "EDR alert: domain-admin login from an unrecognized workstation at 02:30." Severity: SEV-2 (high) — a privileged account behaving anomalously off-hours is a classic early indicator of compromise (justify: domain-admin = keys to the kingdom; unrecognized host + odd hour = unusual; not yet confirmed malicious, so not auto-SEV-1). Acknowledgment: ≤15 minutes. Steps: 1. Acknowledge in the SIEM/EDR; do not close. 2. Confirm the facts: which admin account, which workstation, is the host enrolled/known, what is the source geo/network. 3. Check for corroborating signals: other alerts on the account/host in the last 24h; recent password changes; concurrent sessions elsewhere (impossible travel). 4. Attempt out-of-band contact: is this a known admin doing legitimate late work? (Call, do not email the possibly-compromised account.) 5. Decision: if benign-confirmed → document and close with reason. If unconfirmed or suspicious → contain: disable the account and/or isolate the host (per authority), and escalate. Escalation chain: Tier 1 (ACK ≤15) → Tier 2 on-call (ACK ≤15) → if confirmed incident, SOC Manager declares the incident → if it reaches a domain-wide or wire-transfer system, IR Lead (Priya) takes incident command → if customer data or a breach-notification clock is implicated, CISO (Dana) engages Legal, Comms, and the board channel. Off-hours: the MDR partner performs first-line investigation/containment and hands to Meridian's on-call Tier 2.

25. Purple teaming is a collaborative exercise in which a red team emulates real adversary techniques (mapped to ATT&CK) while the blue team detects and responds in real time, the two working together to turn every exposed gap into an immediate, re-tested detection improvement. The single framing change that distinguishes it from traditional red-vs-blue: red and blue are collaborators, not adversaries — the red team's goal shifts from "stay undetected and win" to "systematically exercise the blue team's detections," and the blue team watches live. That change produces a measurably better outcome because the gap-finding and gap-closing happen in the same session: a missed technique is immediately turned into a new/tuned detection and re-run to confirm it now fires, so the exercise ends with improved defenses, not merely a report of what was missed.

27. Ten techniques: 5 detected-and-alerted, 3 logged-but-not-alerted, 2 not-visible-at-all. (a) The 3 logged-but-not-alerted are fixed by detection engineering (the data exists; write/tune a rule). The 2 not-visible-at-all are fixed by adding telemetry (a log source/sensor must be added before any detection is possible). (b) Fix the 3 logged-but-not-alerted first: they are the cheapest and highest-yield — no new infrastructure, and closing them raises coverage from 5/10 (50%) to 8/10 (80%) (see ex27_purple). The 2 telemetry gaps are more fundamental and slower (deploy sensors, validate data) and should be prioritized by the risk of the techniques they hide. (c) The coverage metric (Chapter 36) lets you plot the fraction of the relevant ATT&CK matrix you can detect over successive exercises, so leadership sees a rising line — concrete evidence the SOC is getting harder to breach, and the natural companion to the staffing evidence on the board slide.
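The coverage jump in part (b) can be sketched as a tally over exercise results. The outcome labels are hypothetical names for the three categories in the answer.

```python
from collections import Counter

# One entry per emulated technique, matching the 5 / 3 / 2 split above.
results = ["alerted"] * 5 + ["logged_only"] * 3 + ["not_visible"] * 2
counts = Counter(results)
total = len(results)

coverage_now = counts["alerted"] / total
# Detection engineering turns logged-only techniques into alerts.
coverage_after_rules = (counts["alerted"] + counts["logged_only"]) / total
# Only new telemetry can close the remainder.
telemetry_gap = counts["not_visible"] / total

print(f"coverage today:        {coverage_now:.0%}")          # 50%
print(f"after rule work:       {coverage_after_rules:.0%}")  # 80%
print(f"needs new telemetry:   {telemetry_gap:.0%}")         # 20%
```

Recording one such tally per exercise gives the rising coverage line described in part (c).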

29. Marcus's instinct felt like leadership because he was visibly working hardest — diving into the deepest tickets and taking the pages himself. It actually harmed the team because, heads-down in the queue as the best individual contributor, he was (a) not building the structure — the deep rotation, runbooks, career ladder, and honest load metrics — that would let the team scale beyond his personal effort, and (b) not noticing that the structure was failing (Theo interviewing, the two-person rotation one resignation from collapse). The hardest transition in security leadership he had failed to make is from doing the work to building the system and the people that do the work — from hero to multiplier. Leadership is, in large part, the work of noticing and building, which cannot be done from inside the queue.

31. Mechanism: blaming the individual who made the visible mistake (e.g., the analyst who dismissed the alert) teaches the entire surviving team that surfacing a mistake or a near-miss is dangerous, so people hide gaps and errors rather than report them, which makes the organization progressively blinder to its own weaknesses and the next incident more likely. A blameless response teaches the opposite: that finding a gap is rewarded, so gaps surface and the system improves. Sample debrief statement: "We are here to understand and fix the system that allowed this to happen, not to find a person to punish. If you saw something, missed something, or made a call that turned out wrong, I want to hear it; that information is how we prevent the next one." Making it credible: the statement is only believable if leadership's actions match: no one is quietly fired or sidelined for the honestly reported mistake, the review's outputs are systemic fixes (runbooks, detections, staffing) rather than a named scapegoat, and leaders model it by owning their own contributing decisions first. A slogan without matching action teaches the team to distrust the slogan.

33. The green dashboard that lied. A dashboard can be all-green while the function fails because the green metrics (MTTD, MTTR, coverage, Tier 1 close rate) measure the tools and the throughput, not the team's sustainability. MTTD/MTTR can look fine on the incidents that are worked; coverage measures detection capability, not capacity to act on it; and a high Tier 1 close rate can reflect dismissal (closing without investigating) rather than resolution. At least four things the dashboard did not measure: (1) alert volume per analyst (load vs capacity); (2) investigation time per alert (is the team actually looking, or reflex-closing?); (3) on-call distribution (are two people carrying everything?); (4) attrition risk / team sustainability (are the best people about to leave?). First three actions as the new CISO: (1) talk to the people and run the staffing math to quantify the overload; (2) stabilize the rotation and add capacity (deepen on-call; make the build-vs-buy decision — MDR for the clock); (3) add the sustainability metrics to the dashboard so it stops lying — and begin runbooks + a career ladder to retain who remains. The core realization: the predecessor's metrics measured the tools, not the team, and a SOC's real output is bounded by its human sustainability.


Chapter 38

Worked solutions to the daggered (†) exercises. The remaining problems — especially the ⭐⭐⭐ leadership-judgment items — are open-ended; the boardroom has defensible answers, not single right ones.

1. A pile of artifacts is the correct, necessary raw material — a network diagram, an auth standard, an IR plan — with no relationship among the pieces. A security program adds coherence: the components reference each other, priorities are reconciled against one budget, gaps are known and owned, and the whole maps to a story leadership can fund. The three properties that turn one into the other are a spine (an organizing logic — for Meridian, risk), a structure (a small number of legible layers — the NIST CSF functions), and a strategy (a stated multi-year direction that rules choices in and out).

4. It does not follow because a program is coherence among controls, not a complete control inventory (the §38.1 threshold concept). You can own every control in the book and still have no program if no one can say in one breath what you protect, what threatens it, what you have done, and what remains — because there is no spine tying controls to risk, no structure a board can hold, and no strategy giving direction. Conversely, a modest control set, governed and narrated coherently against risk, is a program. Boards fund coherence, not components; "we have every control" is an inventory boast, not a program.

6. A correct program-on-a-page places each component under its CSF function with the chapter that built it; any reasonable compression is fine so long as all six functions appear and no control is re-derived. A model answer mirrors Figure 38.1: Govern — governance/policies (26), risk assessment/appetite (27), compliance map (28), TPRM/SBOM (29), metrics (36), org/SOC model (37); Identify — asset inventory/risk register (1), threat model (2), control framework (3), vuln-mgmt (23); Protect — crypto (4–5), network/seg (6–7), wireless/DNS/email (8–9), hardening (11), appsec (12–13), mobile/IoT (14), cloud (15), identity stack (16–20), pipeline/ZT (31–32), OT (33), awareness (30); Detect — network monitoring (10), SIEM/detection (21–22), UEBA (34); Respond — IR plan/playbooks/tabletop (24); Recover — forensics readiness (25), backup/restore + lessons (24–25); cross-cutting — emerging-threat watch/crypto-agility (35). Full credit requires every function populated and correct chapter attribution; the connection statements (Ex.7) are what demonstrate coherence.

8. The Protect layer holds the most components because defense in depth assumes each layer will fail (the recurring theme): surviving a single failure requires many independent controls, each designed as if the one in front of it has already been breached. The original phishing attack (Ch.1) is now met by at least five Protect-layer controls: email authentication SPF/DKIM/DMARC and a secure gateway (Ch.9); the security awareness program that trains the workforce to report (Ch.30); phishing-resistant MFA that defeats credential theft even on a click (Ch.16); least privilege (Ch.17) and PAM (Ch.19) that limit what a single compromised account can reach; and network segmentation (Ch.6–7) that contains lateral movement — with detection and response (Ch.21, 24) behind all of it because we assume even these can fail. One attack, met by independent layers: defense in depth assembled.

11. Ratios (risk reduced ÷ cost) and phases:
- (a) MFA: 2.0M ÷ 150K = 13.3 → Phase 1 (high ratio, no dependency).
- (b) CDE segmentation: 1.4M ÷ 200K = 7.0 → Phase 2 (good ratio and a PCI obligation, so it is pulled forward regardless; the network-refresh dependency keeps it out of Phase 1).
- (c) 24×7 SOC: 1.8M ÷ 1.5M = 1.2 → Phase 2 (solid absolute risk reduction but a weak ratio and a dependency on a mature SIEM; sequence in the middle, not first).
- (d) Disable orphaned accounts: 900K ÷ 40K = 22.5 → Phase 1 (the single best ratio; cheap, fast, no dependency: exactly the "stop the bleeding" archetype).
- (e) Zero-trust migration: 2.4M ÷ 4.0M = 0.6 → Phase 3 (large absolute risk but a weak ratio and a hard dependency on identity + segmentation being done first: a costly, blocked megaproject).

The phase that does not match pure ratio order is (b): its ratio (7.0) is lower than (a) and (d) but it shares Phase 2 placement with the lower-ratio (c), because its compliance obligation pulls it forward while its dependency keeps it out of Phase 1. The override is the lesson.
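The ratios and their pure-ratio ordering can be recomputed as follows. This is a sketch: the dollar figures are taken from the answer above, while the dictionary layout and names are illustrative. It ranks by ratio only and does not model the dependency and compliance overrides.

```python
# (annual risk reduced in dollars, cost in dollars) per initiative
initiatives = {
    "(a) MFA": (2_000_000, 150_000),
    "(b) CDE segmentation": (1_400_000, 200_000),
    "(c) 24x7 SOC": (1_800_000, 1_500_000),
    "(d) Disable orphaned accounts": (900_000, 40_000),
    "(e) Zero-trust migration": (2_400_000, 4_000_000),
}

ratios = {name: reduced / cost for name, (reduced, cost) in initiatives.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratios[name]:.1f}")
# (d) Disable orphaned accounts: 22.5
# (a) MFA: 13.3
# (b) CDE segmentation: 7.0
# (c) 24x7 SOC: 1.2
# (e) Zero-trust migration: 0.6
```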

13. The two legitimate overrides of the risk-per-cost ranking are: (1) a hard dependency — you physically cannot build B before A exists (Meridian example: the zero-trust migration cannot start before the identity stack and network segmentation foundations are in place, so it waits regardless of its ratio); and (2) a non-negotiable compliance obligation — a control the organization is legally required to have (Meridian example: full CDE segmentation is a PCI-DSS requirement, so it is funded even if a cheaper initiative had a marginally better ratio — the floor is not optional, Ch.28). No other factor — a louder stakeholder, a newer technology, a vendor relationship — legitimately overrides the ratio.

17. ALE = SLE × ARO for each top untreated risk:

    - Credential compromise → account takeover: $3.0M × 0.8 = **$2.4M/yr**.
    - Lateral movement via over-privileged account: $2.5M × 0.6 = **$1.5M/yr**.
    - CDE breach (cardholder data): $5.0M × 0.4 = **$2.0M/yr**.
    - Total "cost of doing nothing" = $2.4M + $1.5M + $2.0M = **$5.9M/yr** (rounded to ~$6M for the board).

    This total is the anchor the whole business case hangs from, and every number is traceable to a stated SLE and ARO — not plucked from the air.
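As a quick check of the arithmetic, the ALE computation can be sketched in a few lines. The SLE and ARO values are those stated in the answer; the tuple layout and the `ale` function name are illustrative.

```python
# (risk name, SLE in $M, ARO) as stated in the answer above.
# The tuple layout and function name are illustrative assumptions.
risks = [
    ("Credential compromise -> account takeover", 3.0, 0.8),
    ("Lateral movement via over-privileged account", 2.5, 0.6),
    ("CDE breach (cardholder data)", 5.0, 0.4),
]


def ale(sle_millions, aro):
    """Annualized loss expectancy = single loss expectancy x annual rate."""
    return sle_millions * aro


total = 0.0
for name, sle, aro in risks:
    value = ale(sle, aro)
    total += value
    print(f"{name}: ${value:.1f}M/yr")

print(f"Cost of doing nothing: ${total:.1f}M/yr")
```

Keeping SLE and ARO as separate inputs, rather than storing only the product, is what makes each figure traceable when a board member asks where it came from.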

19. One-sentence board-language case: "We propose investing $1.7M to remove roughly $5M of annualized expected loss, taking residual cyber risk from about $6M to about $0.9M — below the $1M appetite this committee set." (Using the rounded prose figures; the exact computation removes $5.0M from a $5.9M anchor. Either is fine if you state which.) This framing outperforms "we could get breached" because a board's entire job is allocating capital against risk: the loss-avoided framing hands them a security decision in the exact format of every other decision they make (a known cost weighed against a quantified risk), whereas fear asks them to act out of character. Return arguments get funded; fear gets questioned.

23. A board cares about four things, each answered by a deck slide (Figure 38.3): (1) Are we exposed? → slide 2, the risk story (top risks in dollars). (2) Are we handling it competently? → slide 3, what we've built (the program-on-a-page), reinforced by slide 7, the metrics/KRIs. (3) Are we compliant? → slide 6, the business case where compliance obligations are called out as the legal floor. (4) What do you need from us? → slides 1 and 8, the ask and the decision (the specific, decidable motion). Anything that answers none of the four belongs in the appendix, not the deck.

25. A model "What's left" slide names two Meridian risks still above appetite in plain language with a one-line plan each, e.g.: "(1) After-hours detection gap: our SOC runs business-hours only, so an intrusion at 2 a.m. is detected late — Phase 2 brings the SIEM to 24×7-ready and Phase 3 funds full coverage. (2) Privileged-access controls incomplete: some domain and cloud admin paths still lack just-in-time access and session recording — PAM rollout in Phase 2 closes this." Presenting weakness builds board confidence because the board knows no program is gapless; a presentation with no weaknesses signals concealment and poisons trust in everything else, whereas naming the gaps plainly proves the leader sees clearly and is managing the risk rather than managing the board — which is exactly the competence the board exists to assure itself of.

29. A model reply to the board chair (honest, mapped to controls, names the gap, ties to the ask, no panic): "Good question, and the honest answer is: better protected than the bank in the news, but not invulnerable. A ransomware intrusion like theirs typically starts with a stolen credential and spreads across a flat network — and we have closed much of that path: phishing-resistant MFA defeats the credential theft (Ch.16), network segmentation limits lateral movement (Ch.6–7), and we have a tested IR plan and forensics readiness to respond and recover if something does land (Ch.24–25). Where we are still exposed is after-hours detection — our SOC is business-hours only, so a 2 a.m. intrusion would be caught late — which is exactly the gap the 24×7 coverage in our pending roadmap closes. That is part of what next week's budget ask funds; I'd rather close it before we need it than after." Full credit requires all four elements without inducing panic and without overclaiming invulnerability.

32. The consultant's artifact fails every rubric dimension (§38.7): coherence — a flat, unordered tool list has no spine and no structure; traceability — no risk register, so nothing maps to a named risk; prioritization — no roadmap, costs, or sequence, so the board cannot judge what comes first or why; business framing — "increase spend by 20%" is a fear/round-number ask with no loss-avoided or risk-reduced argument; honesty — no gaps or residual risk stated, which (paradoxically) reads as concealment; the ask — "+20%" is not a decidable motion (for what, over what period, to what result?); audience fit — a 26-tool inventory is engine detail, not "will the plane land?" What to deliver instead, by milestone: (1) assemble the tools and controls into a program-on-a-page under the CSF functions with a risk register as the spine; (2) prioritize a costed, phased roadmap ranked by risk-per-cost, each item tied to a register row; (3) build a four-part business case anchored on ALE-based cost-of-doing-nothing with one specific ask; (4) present an 8–12-slide deck leading with the ask, telling the risk story, owning the gap, showing board KRIs, and closing with a decidable motion. The board asked for "a real plan" because the consultant delivered an inventory, not a program.


Chapter 39

Worked solutions to the daggered (†) exercises. Most other exercises are personal or open-ended (build your plan) and have no single answer — they are discussed in class or in a study group.

1. "How do I get into cybersecurity?" is poorly formed because "cybersecurity" is not one job but a family of distinct jobs (blue team, red team, GRC, cloud, AppSec, engineering) that share a goal and little else — so the question has no single answer. The better question is "which kind of security problem do I want to spend my days on — catching attackers, building defenses, governing risk, or going deep on one technology?", because answering it brings the certifications, the lab, the job titles, and the ladder all into focus (§39.1 threshold concept).

4. The red-team (offensive) neighborhood is a poor first-job target because it is small relative to the field's size and almost never entry-level: it typically requires existing deep technical and defensive knowledge, so it is a mid-career destination, not a front door. The usual path that does lead there runs through the blue team or engineering first — you defend better having attacked, and you attack better having defended — so aspiring red-teamers commonly start in a defensive or engineering role and move laterally once they have the foundation. (This defensive book does not train for it; the companion offensive volume does.)

6. "Door-opener, not a skill": a certification gets your résumé past automated filters and human recruiters (and is sometimes a hard requirement), but it does not prove you can actually do the work. The single most important practical consequence is that you must pair every credential with demonstrable, speakable experience — from a home lab, a portfolio, or a job — because the interview tests competence, not the certificate. A cert you cannot speak to is a liability, not an asset. Concretely: prepare for interviews by rehearsing stories ("walk me through an alert/incident/project you handled"), not by re-reading exam objectives.

8. Example response: "The CISSP is widely respected, but it is a management-breadth credential with a multi-year experience requirement — you can pass the exam early but you'd hold associate status, and your résumé would then signal 'manager' over little or no experience, which helps no one. Worse, you'd spend months on breadth you can't yet use when you should be deepening the thing you're actually doing. Start with CompTIA Security+ instead: it's vendor-neutral, widely accepted, foundational, and its body of knowledge maps onto the work you'll do first. Save the CISSP for when you have the years to back it, and put the freed energy into a home lab and portfolio — those will matter more for your next role."

12. The experience paradox: every job wants experience, but you need a job to get experience — a loop that stops many careers before they start. The two things that break it for a newcomer are (1) a home lab (hands-on practice on systems you own, which manufactures real skill without prior employment) and (2) a portfolio (public, demonstrable evidence of that practice — lab and CTF write-ups, detection rules, explainers — which lets a hiring manager judge your skill directly). This field lets you manufacture the missing ingredient yourself, more than almost any other, because the tools are free or cheap, the practice can be done legally in isolation, and demonstrated skill is unusually valued relative to formal credentials (which is also why the field is so open to career changers).

15. A CTF is a legal place to practice offensive-flavored skills because the organizers own or provide the targets and explicitly invite you to attack them — so you have authorization, the single property that distinguishes professional practice from a computer crime (§39.5). Attacking an arbitrary website you do not own lacks that authorization, no matter how good your intentions or how obvious the flaw, and is therefore unlawful (the same nmap command is professional in one context and a crime in the other; the only difference is authorization). The one property that makes the difference is authorization (the organizers' provision of, and invitation to attack, the targets).

18. (Self-assessment — the structure, since the content is personal.) A correct answer presents a real, current job posting; extracts 6–10 required skills; and rates each honestly as have / partial / gap without inflation (Dana's rule: a register full of confident nonsense is worse than a short, true one). It then names the single biggest gap and the cheapest first step to begin closing it (e.g., "cloud logging — gap; cheapest start: a free-tier account in my lab plus Chapter 15"). Full credit requires honesty (at least one acknowledged gap or partial) and a specific, cheap first action, not an aspirational list. See example-02-skills-gap-assessment.py for the readiness computation; Theo's example yields 58.3% readiness with cloud logging as the biggest gap.

22. Walk-through of the scenario (classmate wants to scan their old high school's public website to "confirm" a SQL-injection flaw, then email proof):

    - What is NOT authorized: running a scan against the school's website. The classmate does not own it and has no written permission; "my old school" confers no authorization, and neither does a good intention. Even a "quick" scan to "confirm how bad it is" is unauthorized access/testing.
    - Law implicated (in general terms): in the U.S., the Computer Fraud and Abuse Act (CFAA) broadly criminalizes accessing a computer "without authorization" or "exceeding authorized access"; other countries have equivalents (e.g., the UK Computer Misuse Act). This is general description, not legal advice, and the statute's exact boundaries have been litigated — but the safe rule does not depend on those nuances.
    - The ethical path: do not scan or gather proof by unauthorized access. Instead, (a) check whether the school publishes a vulnerability-disclosure contact (e.g., a security.txt or a bug-bounty program); if so, report the observation through that authorized channel. (b) If not, simply tell the appropriate person plainly — "your site may have a serious flaw; please have your web provider check it" — without probing further.

    The line is precise: observing that something looks wrong from normal, public use is fine; actively testing/scanning to confirm it is not, absent authorization. When unsure whether you have authorization, you do not, and you stop.

26. The three major transitions and the new skill each demands:

    - Analyst → engineer: proven competence and trust plus the ability to build defenses (not just operate them); rewards depth.
    - Engineer → architect: systems thinking — seeing how all the pieces fit, making cross-cutting tradeoffs, and making build-vs-buy decisions — rather than excellence at one component.
    - Architect/manager → CISO: communication and business judgment — translating security into business risk for leaders who will never read a log.

    Careers most often stall at engineer→architect because the person keeps getting better at optimizing their own piece (the skill that earned the last promotion) instead of building the new, whole-system skill the next rung requires — an instance of the threshold concept that the rewarded skill changes at each rung.

30. The résumé that doesn't add up.

    - (a) Advance the second candidate (Security+ in progress, a public repo of detection rules, three lab write-ups, an active CTF profile, a help-desk job). They have demonstrable, speakable skill and some real experience — exactly what an entry SOC role needs — whereas the first candidate (CISSP/CISM/OSCP, "expert in all domains," zero experience, no portfolio) has only door-openers with nothing behind them. More certifications is not more qualified.
    - (b) Using "door-opener, not a skill," the first résumé most likely signals a candidate who studied for exams but has not done the work — and the advanced/management credentials over an empty experience record actively mismatch (a "manager/expert" signal with no demonstrated competence), which is a red flag rather than a strength.
    - (c) A single good question: "Walk me through a specific time you investigated a real alert or incident (or built/tuned a detection) — what did you see, what did you do, and what did you find?" A candidate whose credentials are backed by real work answers fluently with specifics; one whose credentials are not falls back on definitions.

    (The exercise's deeper point: recognizing that more certifications can be a warning sign, not a qualification, when nothing demonstrable backs them.)


Chapter 40

Worked solutions to the daggered (†) exercises. Other exercises are open-ended, reflective, or discussed in class. Reasonable answers vary on the synthesis problems; the reasoning and the mapping to the book's controls matter more than exact wording.

1. The §40.1 method, in order: (1) Assume it was stoppable — start from the missing control, not fatalism. (2) Reconstruct the timeline, rigorously separating verified fact (official reports) from speculation (press/rumor). (3) Map the kill chain (ATT&CK stages) — breaking any link stops the breach. (4) Identify which controls failed, classified as absent, misconfigured, or working-but-unwatched (each needs a different fix). (5) Name the controls that would have changed the outcome, specifically and by chapter. (6) Extract the transferable lesson and ask "could this happen to us?" with evidence, not reassurance.

3. Transferable lessons: (a) SolarWinds: unverified trust is an attack surface; verify what you run (SBOM), how it was built (provenance), and who you trust (TPRM), and watch even trusted software for untrusted behavior. (b) Colonial Pipeline: identity is the perimeter, including the door you forgot; modern breaches begin with valid-account compromise, and you must also plan the decisions the incident forces on you (the shutdown was a defender's call under uncertainty). (c) Log4Shell: you can't secure what you can't see; the dependency you forgot is your largest unmanaged attack surface, so build the inventory (SBOM) before the emergency.

6. SolarWinds from the victim's seat. (a) Patch management told them to apply vendor updates promptly, and signature verification confirmed the update genuinely came from SolarWinds and was unaltered after signing — both passed correctly, because the attackers compromised the build pipeline, so the malicious code was signed with the legitimate key at the source. The controls did exactly what they were designed to do; the flawed assumption underneath them ("a signed vendor update is trustworthy") was the gap. (b) After installation: (i) C2 beaconing — behavioral/beacon detection on outbound traffic from internal servers (Ch.10 §10.5, Ch.22) would catch a trusted host suddenly beaconing to new infrastructure on a regular interval; (ii) lateral movement / credential & token theft — segmentation (Ch.6–7), PAM and identity monitoring (Ch.19, 15), and cloud identity anomaly detection would catch the foothold trying to expand and forge tokens. (c) It was not primarily a failure of the victims' controls at the initial-compromise stage (prevention was nearly impossible). It was a survivability question determined by whether the victim had defense-in-depth layers behind prevention — behavioral detection and segmentation. So: not a prevention failure, but for the worst-hit victims, a detection and containment gap.

8. Colonial, the defender's decision. (a) The ransomware reportedly hit IT systems (including business systems for billing/product tracking), and Colonial could not be certain the attack had not or would not spread across the IT/OT boundary; in critical infrastructure, when you cannot assure the safety and integrity of the physical process, you stop it. The shutdown was therefore a protective decision under uncertainty (Ch.33), not a direct malware effect. (b) A strong, monitored, well-understood IT/OT segmentation — an industrial DMZ, passive OT monitoring, and confidence in the Purdue-model boundary (Ch.33 §33.3) — would let the organization answer "is OT affected?" quickly and credibly, possibly avoiding a full shutdown. Segmentation here preserves decision-making options under attack. (c) The ransomware tabletop (Ch.24 §24.5): rehearsing the hard calls (disconnect, pay/don't-pay, communicate, when to invoke recovery) before facing them is the only way to make them well under pressure.

10. Log4Shell, buying time. Three defense-in-depth controls for internet-facing vulnerable apps before patching: (i) WAF virtual-patching (Ch.13) — block the exploit string in inbound requests so the vulnerable app cannot be triggered while the real fix is deployed; (ii) egress filtering (Ch.7) — block the outbound callback the exploit relies on (e.g., the LDAP/DNS fetch), neutering many exploitation attempts even on unpatched systems; (iii) behavioral detection use cases (Ch.21–22) — while these do not prevent exploitation, they ensure still-unpatched systems are watched (e.g., a server making an unexpected outbound request right after external input, or a Java process spawning a shell), so an exploited host is caught fast.
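Controls (i) and (ii) can be illustrated with a deliberately naive sketch. The `${jndi:` lookup prefix is the well-known Log4Shell trigger; everything else here (function names, the allowed-port set, the sample header values) is a hypothetical illustration, not a production rule. In particular, a literal match like this does not handle nested or obfuscated lookups, which is one reason virtual patching only buys time.

```python
import re

# Naive signature for the literal JNDI lookup prefix.
# Assumption: real WAF rules must also cover nested/obfuscated lookups,
# which this simple pattern does not attempt.
JNDI_PATTERN = re.compile(r"\$\{jndi:", re.IGNORECASE)


def should_block_request(header_values):
    """(i) Virtual patch: True if any inbound value contains the lookup prefix."""
    return any(JNDI_PATTERN.search(value) for value in header_values)


# Hypothetical egress policy: the only destination port this app server
# is expected to reach on the internet.
ALLOWED_EGRESS_PORTS = {443}


def egress_allowed(dest_port):
    """(ii) Egress filter: deny outbound callbacks on unexpected ports."""
    return dest_port in ALLOWED_EGRESS_PORTS


print(should_block_request(["Mozilla/5.0", "${jndi:ldap://example.invalid/a}"]))
print(egress_allowed(389))
```

The two checks are independent on purpose: if an obfuscated string slips past the inbound filter, a default-deny egress rule can still stop the outbound fetch the exploit depends on.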

12. SolarWinds controls-to-failure table (at least five rows; P=preventive, D=detective, C=corrective):

    | Kill-chain stage | Control (this book) | Chapter | Type |
    |---|---|---|---|
    | Build pipeline compromised at source | Pipeline integrity; artifact signing tied to provenance; reproducible builds | 31 (§31.4) | P |
    | No visibility into update contents | SBOM + software provenance / SLSA | 29 (§29.3) | P/D |
    | Vendor trusted without security assurance | TPRM — assess critical vendors' dev practices | 29 (§29.2) | P |
    | Trusted host beaconed to C2 undetected | Behavioral/beacon detection; baselining | 10, 22 | D |
    | Foothold moved laterally to crown jewels | Segmentation; zero-trust principles | 6–7, 32 | P |
    | Lateral movement reached identity/cloud, forged tokens | PAM; identity governance; cloud identity monitoring | 18–19, 15 | P/D |

    Highest-leverage for victims: behavioral detection + segmentation (the survivability layers).

14. Log4Shell controls-to-failure table (visibility / protection / detection categories noted):

    | Stage / failure | Control | Chapter | Category |
    |---|---|---|---|
    | No inventory of where Log4j ran (incl. transitive) | SBOM + SCA | 12, 23, 29 | visibility |
    | Couldn't prioritize thousands of findings | Risk-based vuln mgmt (CVSS+EPSS+KEV+context) | 23 (§23.3) | visibility/triage |
    | No fast disclosure→patched path | Vuln-mgmt lifecycle + risk-based SLAs | 23 | protection |
    | Internet-facing apps exploitable immediately | WAF virtual-patching; egress filtering | 13, 7 | protection |
    | Exploitation attempts needed catching | Behavioral detection use cases | 21–22 | detection |
    | Vendor software contained vulnerable Log4j | TPRM — demand SBOMs/remediation | 29 | visibility |

    All three categories were needed because the patch could not be everywhere instantly: you must find your exposure (visibility), shield what you can't yet patch (protection), and watch what's still exposed (detection). Visibility is the prerequisite — without the SBOM, the other two have no targets.

16. Reconciling "unpreventable" with "survivable." For victims, the initial compromise was essentially unpreventable because it arrived as a legitimately-signed update through the sanctioned patch process — no reasonable preventive control would have rejected it. But the attack still had to act inside the network (beacon, move laterally, reach identity), and each of those actions was detectable and containable. So prevention failed while later layers could still win. The lesson: in a mature program, prevention and detection are not substitutes but a sequence — you invest in prevention to stop most attacks cheaply, and in detection/response because prevention will sometimes fail completely, sometimes through no fault of yours. "Assume breach" (Theme 4) is the formal name for designing the detective and corrective layers as if prevention has already failed — because at SolarWinds, it had.

20. Breach stress-test (Meridian) — example for Colonial: attack path against Meridian's program: stolen/stale credential probes VPN → phishing-resistant MFA (Ch.16) stops the login (the credential alone is insufficient); had MFA somehow been bypassed, identity governance (Ch.18) should already have disabled the stale account in a quarterly review; behind that, PAM (Ch.19) limits what a privileged account can do and segmentation (Ch.6–7) limits lateral movement; if ransomware deployed anyway, the tested IR plan + ransomware tabletop + backups (Ch.24) govern containment and recovery. Single residual gap: a currently-valid, actively-used privileged account phished from an alert employee (MFA-fatigue on any non-FIDO factor, or session theft) — caught only by behavioral detection/PAM, not prevented. Board sentence: "A Colonial-style stolen-credential attack is plausible here; phishing-resistant MFA and identity governance make initial access very unlikely, and if it occurred our rehearsed ransomware response contains it — the residual risk is a freshly-compromised valid account, which our behavioral monitoring is designed to catch." (SolarWinds and Log4Shell variants are graded analogously; see §§40.2/40.4 for the control chains and named residual gaps.)

24. Write the detection (behavioral logic, product-agnostic). (a) SolarWinds-style C2: Alert when a host in the server zone initiates an outbound connection to an external destination that (i) it has never contacted before AND (ii) recurs at a regular interval (beaconing) — especially if the originating process is trusted/signed vendor software. False-positive risk: legitimate new vendor telemetry features, CDN/update endpoints, software that legitimately beacons. Tuning: maintain an allowlist of known-good vendor endpoints; require the "never seen before" + "regular interval" + "server-zone" conjunction; suppress when the same beacon appears uniformly across all hosts of a given agent version to a verifiable vendor domain (the benign-update signature). (b) Log4Shell exploitation: Alert when a server makes an unexpected outbound LDAP/RMI/DNS request, or spawns a shell/child process, shortly after receiving external input (e.g., within seconds of an inbound HTTP request to an internet-facing app). False-positive risk: applications that legitimately make outbound LDAP/DNS calls. Tuning: scope to processes/hosts that should not make such calls; correlate the inbound-input → outbound-callback timing; prioritize on internet-facing assets; pair with the WAF block events for confirmation.
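The two detections above can be sketched as plain functions over already-collected event data. This is a minimal illustration under stated assumptions, not any product's rule language: the zone label, the jitter threshold, the minimum event count, and the five-second correlation window are all hypothetical tuning values, and timestamps are taken to be seconds.

```python
from statistics import mean, pstdev


def is_regular(timestamps, max_jitter_ratio=0.1, min_events=4):
    """True when gaps between connections are nearly constant (beacon-like).

    The thresholds are assumed tuning values, not recommendations.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False
    # Low spread relative to the average gap means a steady interval.
    return pstdev(gaps) / average_gap <= max_jitter_ratio


def beacon_alert(zone, destination, timestamps, seen_before, allowlist):
    """Part (a): server zone AND never-seen destination AND regular interval."""
    return (
        zone == "server"
        and destination not in allowlist
        and destination not in seen_before
        and is_regular(timestamps)
    )


def callback_after_input(inbound_times, outbound_times, window_seconds=5):
    """Part (b): an outbound request shortly after external input arrives."""
    return any(
        0 <= outbound - inbound <= window_seconds
        for inbound in inbound_times
        for outbound in outbound_times
    )


# A host contacting a new destination every 300 seconds trips the alert.
print(beacon_alert("server", "new.example.invalid",
                   [0, 300, 600, 900, 1200], set(), set()))
```

Requiring the full conjunction in `beacon_alert` is the tuning step described in the answer: any one condition alone would fire on ordinary update checks or telemetry.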

28. Identity ↔ Colonial (Ch.16/18/19). Each plays a distinct role against the stale-account initial access: Ch.16 (Authentication / MFA) prevents the credential from working at all — strong, phishing-resistant MFA means a stolen or reused password is insufficient to authenticate. Ch.18 (Identity Governance / JML + access reviews) finds and removes the account before it can be used — the joiner-mover-leaver lifecycle and periodic access certifications disable stale/orphaned accounts so the door does not exist to be rattled. Ch.19 (PAM) limits the damage if a valid privileged account is compromised — vaulting, just-in-time access, session recording, and tiering mean that even a successful privileged login reaches less and is watched. Together: prevent the credential, remove the account, and contain the privilege — defense in depth across the identity layer.

30. Risk ↔ all three (qualitative L×I, Ch.1/27). SolarWinds: a pre-incident qualitative score would likely have understated it — the likelihood of a trusted vendor's build pipeline being weaponized looked very low to most organizations, so the score would be low despite catastrophic impact; this is the "low-likelihood/high-impact, and interacts with everything" blind spot of the simple model (Ch.1 CS1). Colonial: a qualitative score would have flagged it if the assessment honestly rated "remote access without MFA on any account" — high likelihood (credential attacks are constant) × high impact = critical; the failure was not the model but not running it against the forgotten account (you cannot score an asset you have not inventoried). Log4Shell: before disclosure, the specific CVE was unknowable, but the systemic risk — "we have no inventory of our software components" — was assessable and would have scored high for any organization honest about its lack of an SBOM; the model understates it only if you fail to frame the risk as "inability to find/patch a critical dependency quickly." Cross-cutting lesson: the simple model understates risks that are low-probability-high-impact or that interact with every other risk (untested backups, missing inventory), which is why richer methods (Ch.23 EPSS/KEV, Ch.27 ALE) and honest scoping matter.
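The blind spot described above is easy to see numerically. In the sketch below, the band cutoffs are an assumption chosen only to be consistent with the Chapter 1 examples (16 and 20 critical, 10 high, 1 low), and the likelihood and impact ratings given to each breach are hypothetical illustrations of the argument, not scores from the book.

```python
def score(likelihood, impact):
    """Qualitative risk = likelihood x impact on 1-5 scales, with a band.

    Band cutoffs are assumed; they merely agree with the Chapter 1 examples.
    """
    value = likelihood * impact
    if value >= 15:
        band = "CRITICAL"
    elif value >= 10:
        band = "HIGH"
    elif value >= 6:
        band = "MEDIUM"
    else:
        band = "LOW"
    return value, band


# Hypothetical pre-incident ratings, for illustration only.
examples = {
    "SolarWinds-style vendor pipeline compromise": (1, 5),
    "Colonial-style remote access without MFA": (4, 5),
    "Log4Shell-style missing component inventory": (4, 4),
}

for name, (likelihood, impact) in examples.items():
    value, band = score(likelihood, impact)
    print(f"{name}: L{likelihood} x I{impact} = {value}, {band}")
```

With these assumed inputs, the maximum-impact supply-chain scenario lands in the lowest band purely because its likelihood looked minimal, which is the understatement the answer warns about.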

32. CTF — the reassuring report (every claim true, still exposed). At least five blind spots:

    1. "Fully PCI-DSS compliant" → Compliance is the floor, not the ceiling (Theme 5). None of the three landmark breaches resulted from failing an audit; compliant organizations were breached anyway. Close with: real security beyond the checklist — threat-driven controls, detection, response. (Shape: all three.)
    2. "All vendor software is digitally signed" → A valid signature proves origin + post-signing integrity, not that the build was uncompromised or the software is safe. SolarWinds shape. Close with: software provenance / SLSA, SBOM, behavioral monitoring of trusted software (Ch.29, 31, 22).
    3. "Patched within 30 days" → For a Log4Shell-class zero-day, exploitation begins in hours; 30 days is an eternity, and you can't patch what you can't find. Log4Shell shape. Close with: SBOM for minutes-not-days exposure answers, WAF/egress to shield before patching, emergency SLAs for known-exploited internet-facing RCE (Ch.23, 13, 7).
    4. "Firewalls are next-generation" → The perimeter is not the supply chain or identity. Colonial walked in through a stolen credential on a VPN; a firewall does not stop a valid login. Close with: phishing-resistant MFA on all remote access, identity governance to kill stale accounts (Ch.16, 18).
    5. "No critical findings" → A clean scan can be wrong or blind (Equifax: a scan reportedly missed the still-vulnerable Struts host; an expired cert blinded detection for ~76 days). Close with: validated scanning, monitor-the-monitors, behavioral detection, egress baselining (Ch.23, 21–22, 10, 5/20).

    The deeper point: the report describes the absence of known findings, not the presence of security — and the report itself cannot see the blind spots, which is the entire risk.