Case Study 1: Building Meridian's Host Hardening Baselines

"A standard nobody audits is a suggestion. We're going to write standards we can prove." — Sam Whitfield, Security Engineer, Meridian Regional Bank (constructed)

Executive Summary

After the unhardened-file-server breach that opened this chapter — three days to scope, forty servers sharing one local-admin password, a PowerShell timeline that barely existed — CISO Dana Okafor gave Security Engineer Sam Whitfield a mandate and a constraint. The mandate: a host-hardening standard for every operating system Meridian runs, so that "build a server" stops meaning "inherit the vendor's defaults." The constraint: do not break production. Meridian runs a legacy core-banking application certified against an older platform, and a hardening project that takes that application down on a Monday morning would end Sam's credibility and the program with it.

This case study is a design-and-build exercise. It follows Sam, junior analyst Theo Brandt, and GRC analyst Elena Vasquez as they turn CIS Benchmarks into Meridian-specific baselines, choose Level 1 versus Level 2 by system role, enforce the baselines through Group Policy and configuration management, deploy LAPS and EDR, and — the part most programs skip — build the audit that proves a host actually matches its baseline, using the harden.py tool from the chapter's Project Checkpoint. You will watch the chapter's concepts (attack surface reduction, baseline configuration, CIS levels, drift detection, compensating controls for the unpatchable) become a working standard. All names, figures, and configurations are constructed for teaching (Tier 3).

Skills applied: deriving baselines from CIS Benchmarks; choosing CIS Level 1/2 by role; enforcing configuration at scale (Group Policy, config management, MDM); deploying LAPS and EDR; auditing for drift; handling systems that cannot be patched on demand with compensating controls; writing a standard that is verifiable rather than aspirational.

Background

Meridian's host estate, inventoried for the first time as part of this effort, looks like most mid-size banks':

Platform                          Count (approx.)   Role notes
Windows 10/11 workstations        ~1,500            Tellers, back office, branches; general-purpose
Windows Server                    ~200              Domain controllers, file/print, app tiers, some in the CDE
Linux (RHEL / Ubuntu)             ~60               App and data tiers; a growing share in AWS
macOS                             ~40               Executives, designers, some developers; currently unmanaged
Legacy "Atlas" interface server   3                 Core-banking gateway; certified only on an out-of-support OS

Before the project, "hardening" at Meridian was tribal knowledge — whatever the engineer who built a given box happened to know and bother to do. There was no written baseline, no enforcement, and no audit. The file-server breach was the predictable result: a box built by someone in a hurry, to no standard, that nobody ever checked. Dana's framing to the board was deliberately unsentimental: "We didn't get breached because we lacked a tool. We got breached because we had no standard and no way to prove a machine met one. We're going to fix both."

🔗 Connection: This effort sits directly behind the network architecture Sam and Theo built in Chapters 6–7 (zones, default-deny perimeter) and the network monitoring from Chapter 10. The network decides who can reach a host; this project decides what an attacker can do once they reach it. The two are the same defense-in-depth stack (Chapter 3) seen from two sides — and the unpatchable Atlas server, below, is where they have to cooperate.

The Build

Phase 1 — From CIS Benchmarks to Meridian baselines

Sam refuses to write hardening settings from a blank page. "Someone has already done the hard, detailed work of figuring out what 'hardened' means for Windows Server, for Ubuntu, for macOS," he tells Theo. "Our job isn't to reinvent it. It's to adopt it, adjust it for our environment, and enforce it." For each platform he pulls the relevant CIS Benchmark as the starting point and makes one decision per platform that everything else hangs from: which level.

The level decision is a risk decision — the same likelihood × impact judgment from Chapter 1, applied to a configuration knob. Sam reasons it out per role:

System role                   CIS level chosen          Reasoning
General Windows workstation   Level 1                   Broad functionality needed; Level 1 hardens with minimal breakage
Windows Server (general)      Level 1, trending L2      Servers run a fixed software set; push specific L2 items as tested
Windows Server in the CDE     Level 2 (target)          Cardholder data; accept operational cost for stricter settings (PCI scope)
Linux server                  Level 1 + selected L2     MAC enforcing, keys-only SSH (some L2 items adopted explicitly)
macOS                         Level 1 (MDM-enforced)    First goal is management; harden once enrolled

⚠️ Common Pitfall (avoided): Theo's first instinct was "apply Level 2 everywhere — more is better." Sam pushes back: "Level 2 on a general teller workstation will break something a branch needs by Tuesday, and then they'll hate security and route around us. Level is a tradeoff, not a score. We pick the level the role can actually bear, and we earn the right to push stricter by testing, not decreeing." This is the chapter's "harden boldly, deploy carefully" made concrete: the benchmark is a starting point, not a one-click cure.
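
The role-to-level table can be captured as data, so that the level decision is explicit and reviewable rather than remembered. The following is a minimal sketch, not Meridian's tooling: the role keys, the `ROLE_LEVELS` structure, and the `level_for` helper are all hypothetical names chosen for illustration.

```python
# Hypothetical encoding of the role -> CIS level decisions from the table.
ROLE_LEVELS = {
    "windows_workstation": {"level": 1, "notes": []},
    "windows_server": {"level": 1, "notes": ["push specific L2 items as tested"]},
    "windows_server_cde": {"level": 2, "notes": ["target; PCI scope"]},
    "linux_server": {"level": 1, "notes": ["MAC enforcing", "keys-only SSH"]},
    "macos": {"level": 1, "notes": ["MDM-enforced"]},
}


def level_for(role: str) -> int:
    """Return the CIS level decided for a role.

    An unknown role is an error rather than a silent default, because
    every role is supposed to have had an explicit risk decision.
    """
    try:
        return ROLE_LEVELS[role]["level"]
    except KeyError:
        raise ValueError(f"no baseline level decided for role {role!r}") from None


print(level_for("windows_workstation"))
print(level_for("windows_server_cde"))
```

Raising on an unknown role is a deliberate choice in this sketch: a new kind of system should trigger the risk conversation, not inherit a level by accident.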

For the Windows Server baseline, the team drafts the specific, high-value settings — the ones that map directly to the breach. A representative excerpt of the Meridian Windows Server Baseline v1:

MERIDIAN WINDOWS SERVER BASELINE v1  (source: CIS Benchmark, Level 1; CDE hosts → Level 2 targets)

  Attack surface reduction
    SMBv1 (client & server)            DISABLED      # the breach's lateral protocol
    Print Spooler (non-print servers)  DISABLED
    PowerShell execution policy        AllSigned + Constrained Language Mode where feasible
    Windows Script Host / AutoRun      DISABLED
  Accounts & credentials
    Local Administrator account        DISABLED (access via LAPS only)
    LAPS                               ENABLED (unique, rotated per host)
    LAN Manager auth level             NTLMv2 only; refuse LM & NTLM
  Endpoint protection
    Microsoft Defender                 ENABLED; Tamper Protection ENABLED
    Defender ASR rules                 key rules ENABLED (block LSASS theft, Office child procs)
    Application allowlisting (WDAC)    ENABLED (servers run a fixed software set)
  Audit & logging  (feeds the SIEM — Ch. 21)
    PowerShell script-block logging    ENABLED
    Process creation (4688) + cmdline  ENABLED
    Sysmon                             DEPLOYED (process/network/image-load telemetry)
  Deviations (documented):
    Atlas interface servers: WDAC allowlisting deferred; see compensating controls (Phase 4).

Figure CS1.1 — An excerpt of Meridian's Windows Server hardening baseline. Every line is traceable to an attacker behavior from the breach: SMBv1 off (no lateral protocol), local admin disabled + LAPS (no shared key), tamper protection on (attacker can't silence Defender), script-block logging on (the timeline Priya lacked). The single documented deviation — Atlas — is handled deliberately, not ignored.

Elena's contribution is the discipline that makes the baseline auditable later: every deviation gets a recorded reason and an owner. "If we relax a setting because Atlas needs it, that's a decision with a name and a date attached, not a quiet exception someone discovers in two years." A baseline with documented deviations is honest; a baseline that pretends there are none is fiction.
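
Elena's rule lends itself to a simple check: every relaxed setting should have a matching deviation record. The sketch below assumes a hypothetical `Deviation` record and an `undocumented` helper; the field names, the recorded date, and the second relaxed setting are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deviation:
    """One documented exception to a baseline setting (illustrative fields)."""
    host_group: str
    setting: str
    reason: str
    owner: str
    recorded: date


def undocumented(deviations, relaxed_settings):
    """Return relaxed (host_group, setting) pairs that have no deviation record."""
    documented = {(d.host_group, d.setting) for d in deviations}
    return sorted(set(relaxed_settings) - documented)


records = [
    Deviation("atlas", "application_allowlisting",
              "WDAC allowlisting deferred; see compensating controls",
              "CIO", date(2024, 1, 15)),  # date is made up for the example
]
relaxed = [("atlas", "application_allowlisting"), ("appsrv", "smbv1_enabled")]
print(undocumented(records, relaxed))
```

Here the Atlas deviation is covered by a record, while the relaxed SMBv1 setting on the hypothetical `appsrv` group is reported as a quiet exception that nobody has signed for.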

Phase 2 — Enforcing at scale (not one box at a time)

The breach taught Meridian that hardening one server by hand is theater: the breached file server was built by someone who simply didn't know the standard, because there was no standard to know. Sam's rule: nobody hardens a machine; we harden the policy, and the policy hardens the machines and keeps them hardened.

  • Windows: the baseline is implemented as Group Policy Objects linked to the appropriate organizational units — one GPO set for the Servers OU, one for Workstations, a stricter overlay for the CDE servers' OU. Microsoft's published security baselines give a hardened starting GPO to import and adjust rather than clicking through hundreds of settings. Because Group Policy re-applies on a schedule, a locally clever administrator who weakens a setting finds it reverted at the next refresh — drift correction is automatic.
  • Linux: the baseline is implemented through configuration management (Ansible, in Meridian's case) so that every server is built and continuously reconciled to the same state: SELinux/AppArmor enforcing, the SSH posture (no root login, keys only, AllowGroups), unneeded services masked, auditd running. Re-running the playbook on a schedule is the Linux equivalent of Group Policy's drift correction.
  • macOS: the baseline is implemented through MDM — which first requires enrolling the forty unmanaged Macs at all. MDM then enforces FileVault (with central key escrow), Gatekeeper, SIP, configuration profiles for the hardening baseline, and pushes the EDR agent. As the chapter argued, the most important macOS hardening decision is the decision to manage them centrally in the first place.
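
As one concrete example of a setting that enforcement has to hold in place, the Linux SSH posture (no root login, keys only, AllowGroups) can be checked from the text of an `sshd_config` file. This is a simplified sketch under stated assumptions: it ignores `Match` blocks, `Include` files, and the `Keyword=value` form, and the helper names are hypothetical. It relies on the documented sshd behavior that the first value obtained for a keyword is the one used.

```python
# Directives the baseline requires, lower-cased because keywords are case-insensitive.
REQUIRED_SSHD = {"permitrootlogin": "no", "passwordauthentication": "no"}


def parse_sshd_config(text: str) -> dict:
    """Parse 'Keyword value' lines; the first occurrence of a keyword wins."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def ssh_findings(text: str) -> list:
    """Return the directives that are missing or do not match the required posture."""
    cfg = parse_sshd_config(text)
    findings = [k for k, v in REQUIRED_SSHD.items() if cfg.get(k, "").lower() != v]
    if "allowgroups" not in cfg:  # any AllowGroups value counts as present here
        findings.append("allowgroups")
    return findings


sample = """
# illustrative sshd_config fragment
PermitRootLogin no
PasswordAuthentication yes
"""
print(ssh_findings(sample))
```

A directive that is absent is treated as a finding, since the baseline wants the posture stated explicitly rather than left to the daemon's defaults.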

🛡️ Defender's Lens: Notice that every enforcement mechanism here does double duty — it applies the baseline and it re-applies it, fighting the entropy that returned the chapter's hand-hardened server to defaults in two weeks. Attackers and careless administrators both create drift; Group Policy, Ansible, and MDM all turn "hardened once" into "hardened continuously." A standard without an enforcement engine behind it decays back to the dangerous default on its own.
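
The "apply and re-apply" behavior shared by Group Policy, Ansible, and MDM reduces to a reconcile step that is safe to run repeatedly. The following is a minimal sketch of that idea only; `reconcile` is a hypothetical function and does not model how any of those products work internally.

```python
def reconcile(desired: dict, actual: dict) -> tuple[dict, list[str]]:
    """Return (corrected_state, names_of_settings_that_were_reverted).

    Settings not named in the desired state are left untouched.
    """
    corrected = dict(actual)
    reverted = []
    for name, want in desired.items():
        if corrected.get(name) != want:
            corrected[name] = want
            reverted.append(name)
    return corrected, reverted


desired = {"smbv1_enabled": False, "powershell_logging": True}
host = {"smbv1_enabled": False, "powershell_logging": True}

host["smbv1_enabled"] = True                # an administrator weakens a setting locally
host, reverted = reconcile(desired, host)   # the next scheduled refresh
print(reverted)                             # ['smbv1_enabled']

host, reverted = reconcile(desired, host)   # a second run finds nothing to change
print(reverted)                             # []
```

The second run returning an empty list is the point: reconciliation is idempotent, so running it on a schedule costs nothing when the host is already in the desired state.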

Phase 3 — LAPS and EDR: killing the lateral path, lighting up the host

Two deployments get top priority because they map to the two worst facts of the breach.

LAPS goes out across the entire Windows fleet first, because the shared local-administrator password was the single fact that turned one foothold into forty compromised servers. After LAPS, each machine's local Administrator password is unique, random, and rotated, stored in Active Directory with retrieval restricted to authorized administrators. Theo, who scoped the original incident, puts it bluntly in the project notes: "The thing that took me three days — one password opening every box — just stops being possible. Not detected. Impossible. A credential pulled from host A doesn't work on host B." It is the chapter's most satisfying kind of control: a structural change that removes an entire technique rather than detecting its use.
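
The property Theo describes can be illustrated in a few lines. This is not how LAPS itself is implemented; it is a sketch of the underlying idea (one independently generated random secret per host), and the `vault` dictionary is a stand-in for the directory store, not a real API.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def new_local_admin_password(length: int = 24) -> str:
    """Generate one random password; each host gets its own independent call."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


hosts = ["fileserver-07", "appsrv-22"]
vault = {h: new_local_admin_password() for h in hosts}  # stand-in for the directory store

stolen = vault["fileserver-07"]        # credential pulled from host A
print(stolen == vault["appsrv-22"])    # does it open host B?
```

With independent random passwords of this length, the comparison is false except with negligible probability, which is the structural change: the stolen credential has no value on any other host.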

EDR goes out everywhere (Windows, Linux, and the newly managed Macs) because the file server kept almost no behavioral record, which is why the reconstruction was so painful. The team chooses Microsoft Defender for Endpoint for the Windows estate (it includes both AV signatures and behavioral EDR) and a cross-platform EDR for Linux and macOS, all reporting to one SOC console. Sam is explicit with the team about why EDR and not just antivirus: signature matching alone would not have produced the behavioral record the breached file server lacked.

🔗 Connection: The EDR rollout is simultaneously a prevention upgrade and the detection foundation for the whole SOC. The behavioral telemetry EDR collects — process lineage, command lines, network connections — flows into the SIEM (Chapter 21) and is the raw material for the threat detection and hunting of Chapter 22. Marcus's SOC, which got almost nothing from the breached server, will now get a recorded narrative from every endpoint. A hardening project, done well, hands the SOC its host-side eyes.

Application allowlisting (WDAC) is harder, so the team sequences it sensibly: servers first. Servers run a fixed, known software set, so the allowlist is small and stable and the operational cost of maintaining it, high elsewhere, is manageable. Servers are also exactly where the breach did its damage. General workstations, where users install varied software constantly, are deferred to a later phase with a help-desk process for exceptions. This is the chapter's guidance (apply allowlisting where the payoff is highest and the software set is most stable) turned into a rollout order.

Phase 4 — The hard case: the server you cannot patch

The three Atlas interface servers are the test of whether Meridian's standard is real or just convenient. Atlas is the gateway to the core-banking system; it runs only on an out-of-support operating system, the vendor certifies it nowhere else, and it cannot be patched or fully hardened without risking the bank's most critical application. "We can't patch it" is true. It is also, as the chapter insists, the start of a risk conversation, not the end of one.

The team does not throw up its hands; it builds a defensive wrapper from controls they already have, explicitly assuming each layer might fail (Theme 4):

ATLAS interface servers — compensating-controls wrapper (cannot patch on demand)

  Network (Ch. 6–7):   isolated in a dedicated, tightly restricted segment; default-deny
                       in/out; only the specific core-banking hosts/ports may reach it.
  Host (this chapter): host firewall locked to the exact required flows; unneeded services
                       stripped to the minimum; EDR in its most aggressive mode; intensive
                       logging shipped off-box to the SIEM (so an attacker can't erase it).
  Detection:           any unexpected process, connection, or logon on Atlas = high-severity
                       SOC alert (these boxes should be boring; deviation is signal).
  Governance (Elena):  a documented RISK ACCEPTANCE — the risk, why no patch is possible,
                       the compensating controls, the owner (CIO), and a REVIEW/EXPIRY date.
                       Renewed deliberately, not forgotten.

Figure CS1.2 — The unpatchable-system pattern. Where you cannot remove the vulnerability, you shrink who can reach it (network), shrink what runs on it and watch it closely (host + detection), and write down the accepted risk with an expiry (governance). "We can't patch it" becomes a defensible, time-bound decision — the foreshadowing of operational-technology constraints in Chapter 33.
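
The detection layer of the wrapper ("these boxes should be boring; deviation is signal") amounts to alerting on anything outside a small expected set. The sketch below assumes a hypothetical event format of `(kind, value)` pairs; the process names, address, port, and account in `EXPECTED` are invented placeholders, not Meridian's configuration.

```python
# Hypothetical description of everything an Atlas server is expected to do.
EXPECTED = {
    "process": {"atlas_gateway.exe", "edr_agent.exe"},
    "connection": {("10.20.0.5", 8443)},
    "logon": {"svc_atlas"},
}


def atlas_alerts(events):
    """Yield a high-severity alert for every event outside the expected set.

    An event kind that is not listed at all is also treated as unexpected.
    """
    for kind, value in events:
        if value not in EXPECTED.get(kind, set()):
            yield {"severity": "high", "kind": kind, "value": value}


observed = [
    ("process", "atlas_gateway.exe"),
    ("process", "powershell.exe"),
    ("connection", ("10.20.0.5", 8443)),
    ("logon", "jdoe"),
]
for alert in atlas_alerts(observed):
    print(alert)
```

This default-deny style of detection is only practical because the Atlas servers do so little; on a general-purpose workstation the same rule would drown the SOC in alerts.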

⚠️ Common Pitfall (avoided): The dangerous version of this is the undocumented exception — a box everyone vaguely knows is "special" and "can't be touched," with no record of why, no compensating controls, no owner, and no expiry, quietly accumulating risk until it is the thing that gets breached. Elena's risk-acceptance record with a review date is what separates a managed exception from a forgotten one. The expiry date forces the conversation again next year: is this still necessary? Can we migrate yet? An exception without an expiry is just a vulnerability with paperwork.
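
The expiry date only forces the conversation if something checks it. A minimal sketch of that check follows, assuming a hypothetical `RiskAcceptance` record and a `due_for_review` helper; the expiry date and the 30-day warning window are illustrative choices, not values from Meridian's process.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RiskAcceptance:
    """A documented, time-bound risk acceptance (illustrative fields)."""
    system: str
    risk: str
    compensating_controls: tuple
    owner: str
    expires: date


def due_for_review(acceptances, today, warn_days=30):
    """Return systems whose acceptance has expired or expires within warn_days."""
    return [a.system for a in acceptances if (a.expires - today).days <= warn_days]


atlas = RiskAcceptance(
    system="atlas",
    risk="out-of-support OS cannot be patched on demand",
    compensating_controls=("network isolation", "host firewall", "EDR", "off-box logging"),
    owner="CIO",
    expires=date(2025, 6, 30),  # illustrative date
)
print(due_for_review([atlas], today=date(2025, 1, 1)))   # []
print(due_for_review([atlas], today=date(2025, 6, 15)))  # ['atlas']
```

An acceptance that has already lapsed also appears in the list, because a negative number of remaining days still falls inside the warning window.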

Phase 4b — Rolling out without breaking the bank (the deployment plan)

Dana's constraint — do not break production — shapes the order and method of rollout as much as the content of the baselines. The team's worst fear is a hardening setting that takes down a branch or, worse, the core-banking path on a Monday morning; the chapter's "harden boldly, deploy carefully" is, in practice, a sequencing problem. Sam writes a deployment plan that lets the bank move fast where it is safe and slow where it is dangerous:

HOST-HARDENING DEPLOYMENT PLAN  (how the baseline reaches 1,800 machines safely)

  1. TEST     Apply each baseline to a lab that mirrors real workloads (incl. a copy
              of the core-banking client). Find what breaks BEFORE production does.
  2. PILOT    Push to a small "ring 0": IT's own machines + a volunteer branch.
              Watch for breakage and help-desk tickets for two weeks.
  3. BROAD    Roll to the general fleet in waves by OU; Group Policy/Ansible apply
              and then RE-APPLY (drift correction) automatically.
  4. CRITICAL Apply to CDE servers and core-adjacent systems LAST, with change
              windows and rollback plans, after the pattern is proven elsewhere.
  5. EXCEPT   Atlas + any system that genuinely can't take a setting -> documented
              deviation + compensating controls (Phase 4), not a silent skip.

Figure CS1.3b — The rollout as deployment rings — the same risk-managed sequencing the chapter applies to patching (§11.6), here applied to hardening. A baseline setting that would break the core application is caught in TEST or PILOT on a handful of machines, not discovered in CRITICAL on the cardholder-data hosts. The bank earns the right to harden its crown jewels by proving the baseline harmless everywhere else first.
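
The gating logic behind the rings is small enough to write down: a ring is attempted only after every earlier ring has completed cleanly. The sketch below is an assumption about how such a gate could be expressed; `RINGS` and `next_ring` are hypothetical names, and step 5 (EXCEPT) is left out because it is an exception process rather than a ring.

```python
from typing import Optional

RINGS = ["TEST", "PILOT", "BROAD", "CRITICAL"]


def next_ring(results: dict) -> Optional[str]:
    """Return the next ring to deploy to, or None if finished or blocked.

    results maps a ring name to True (completed cleanly) or False (breakage
    found). Rings that have not been attempted are simply absent.
    """
    for ring in RINGS:
        outcome = results.get(ring)
        if outcome is None:
            return ring   # first ring not yet attempted
        if outcome is False:
            return None   # breakage: stop and fix before advancing
    return None           # every ring completed cleanly


print(next_ring({}))                              # TEST
print(next_ring({"TEST": True}))                  # PILOT
print(next_ring({"TEST": True, "PILOT": False}))  # None
```

Because CRITICAL is last in the list, the cardholder-data hosts can never be reached while an earlier ring is still showing breakage.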

The plan also sequences which controls land first by where they pay off. LAPS and the logging settings go out early and broadly: they are very unlikely to break anything, and they close the two worst gaps from the breach (shared credentials, no telemetry). Application allowlisting goes to servers first and workstations much later, because the operational cost on general-purpose machines is high. The CDE Level 2 push goes last, because it is the strictest and the most likely to surface a legacy incompatibility. Sequencing by breakage risk and payoff is how a hardening project of this size avoids becoming the outage everyone remembers.

🛡️ Defender's Lens: A hardening rollout that breaks production hands the organization a reason to distrust security and route around it — which is a worse long-term security outcome than a slightly slower rollout. The deployment rings are not bureaucratic caution; they are how the security team keeps the political capital to finish the job. Sam's quiet rule: "The fastest way to harden 1,800 machines is to never cause the outage that gets the whole program paused."

Phase 5 — Proving it: the audit, with harden.py

Here is the phase most hardening programs skip, and the reason Sam's standard is different. A baseline is only real if you can prove a host meets it — otherwise it is, in his words, "a suggestion." The team uses the chapter's harden.py (audit_baseline) as the seed of a continuous audit: configuration data is collected from each host and compared against its baseline, and every drifted setting becomes a finding.

Theo runs the audit against the very file server from the breach (now rebuilt) and three others, to validate that the standard catches what it must:

$ run audit_baseline across server sample (excerpt)

  fileserver-07 (rebuilt to baseline):
    0 setting(s) drifted from baseline.        # the rebuild is clean

  appsrv-22 (built before the standard):
    3 setting(s) drifted from baseline:
      smbv1_enabled               required=False   actual=True
      powershell_logging          required=True    actual=False
      defender_tamper_protection  required=True    actual=False

  atlas-01 (documented exception):
    1 setting(s) drifted from baseline:
      application_allowlisting    required=True    actual=False   <-- KNOWN, accepted (Phase 4)

Figure CS1.4 — The audit in action. harden.py turns the written baseline into an automatic, repeatable verdict per host and names exactly what is wrong. The rebuilt file server passes; an older server still carries the dangerous defaults the standard exists to kill; and the Atlas exception shows up as drift but is reconciled against the documented risk acceptance — so the team can tell "drift we must fix" from "drift we have consciously accepted."
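
The comparison at the heart of the audit is short. The chapter's harden.py is not reproduced in this case study, so the sketch below is an assumption about what a function like `audit_baseline` might look like: the signature, the tuple format of a finding, and the `report` helper are hypothetical, and only the setting names are taken from the figure.

```python
def audit_baseline(actual: dict, baseline: dict) -> list[tuple]:
    """Return (setting, required, actual) for each setting that differs from the baseline.

    A setting missing from the collected data counts as drift (actual=None).
    """
    return [
        (name, required, actual.get(name))
        for name, required in baseline.items()
        if actual.get(name) != required
    ]


def report(host: str, findings: list[tuple]) -> str:
    """Format findings in roughly the style of the figure."""
    lines = [f"{host}:", f"  {len(findings)} setting(s) drifted from baseline."]
    lines += [f"    {n:<28}required={r!s:<7}actual={a}" for n, r, a in findings]
    return "\n".join(lines)


BASELINE = {
    "smbv1_enabled": False,
    "powershell_logging": True,
    "defender_tamper_protection": True,
}
appsrv22 = {
    "smbv1_enabled": True,
    "powershell_logging": False,
    "defender_tamper_protection": False,
}
print(report("appsrv-22", audit_baseline(appsrv22, BASELINE)))
```

Treating a missing value as drift is a conservative choice in this sketch: a host that cannot report a setting has not proven that it meets the baseline.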

The audit does three things a security team needs. It finds the stragglers — appsrv-22 was built before the standard and still carries SMBv1 and missing logging, exactly the surface that caused the breach. It confirms the rebuilds — fileserver-07 is clean, so the standard, when applied, works. And it distinguishes accepted from unaccepted drift — atlas-01's missing allowlisting is real drift, but it is the documented exception from Phase 4, not a surprise. Sam wires the audit to run on a schedule and report two numbers to the program: the percentage of hosts fully compliant with their baseline, and the count of unaccepted drift findings. Those numbers go into Meridian's metrics pack and, eventually, the board deck (Chapter 36, Chapter 38).
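
The two reported numbers can be computed directly from the audit findings once accepted drift is separated out. The sketch below makes one assumption the text does not settle: a host whose only drift is covered by a risk acceptance is counted as compliant. The `compliance_metrics` function and its input shapes are hypothetical.

```python
def compliance_metrics(findings_by_host: dict, accepted: set) -> tuple[float, int]:
    """Return (percent of hosts with no unaccepted drift, count of unaccepted findings).

    findings_by_host maps a host name to its list of drifted setting names.
    accepted is a set of (host, setting) pairs covered by a risk acceptance.
    """
    unaccepted = {
        host: [s for s in settings if (host, s) not in accepted]
        for host, settings in findings_by_host.items()
    }
    total = len(unaccepted)
    clean = sum(1 for settings in unaccepted.values() if not settings)
    percent = 100.0 * clean / total if total else 0.0
    return percent, sum(len(settings) for settings in unaccepted.values())


findings = {
    "fileserver-07": [],
    "appsrv-22": ["smbv1_enabled", "powershell_logging", "defender_tamper_protection"],
    "atlas-01": ["application_allowlisting"],
}
accepted = {("atlas-01", "application_allowlisting")}

percent, open_findings = compliance_metrics(findings, accepted)
print(f"{percent:.1f}% compliant, {open_findings} unaccepted finding(s)")
```

For the three hosts in the figure, this counts fileserver-07 and atlas-01 as compliant and reports the three findings on appsrv-22 as the work still to do. If Meridian chose to count accepted drift against compliance instead, only the first number would change.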

🔄 Check Your Understanding: appsrv-22's audit shows defender_tamper_protection = False and powershell_logging = False. One of these is primarily a prevention gap and one is primarily a detection gap, yet both are dangerous together. Explain which is which, and why the combination is exactly the condition that made the original breach hard to scope. (Hint: think about what tamper protection stops an attacker from doing, and what script-block logging records.)

Discussion Questions

  1. Sam adopted CIS Benchmarks rather than writing hardening settings from scratch. What are the advantages of starting from a published benchmark, and what is the risk of adopting one without the testing and deviation-tracking the team insisted on?
  2. The team chose CIS Level 2 (target) only for CDE servers and Level 1 elsewhere. Was that the right call? Under what circumstances would you push Level 2 onto general workstations despite the operational cost?
  3. LAPS and EDR were both top-priority deployments, but they do very different things — one removes an attack technique, the other detects and records attacker behavior. Which would you deploy first if you could only do one this quarter, and why? Does your answer change if the bank had no SOC to act on EDR alerts?
  4. The Atlas servers got compensating controls and a documented risk acceptance instead of a patch. Is a well-wrapped, well-monitored unpatchable system an acceptable long-term state, or only a bridge to migration? What would make you escalate?
  5. Where in this project did auditing (Phase 5) change the value of the standard (Phases 1–2)? Argue that a hardening standard without an audit is worth less than no standard at all — or argue against it.

Your Turn

Take an operating system and role you know (a personal server, a work laptop, a lab VM you own) and reproduce Sam's process in miniature. (1) Find the relevant CIS Benchmark (or another published baseline) and choose Level 1 or 2 with a one-sentence justification. (2) Draft a ten-line baseline of the highest-value settings for that role, each annotated with the attacker behavior it removes or the telemetry it adds. (3) For one setting you would deviate from, write the deviation record (what, why, owner, expiry). (4) Describe how you would enforce the baseline (what mechanism) and how you would audit it (what you would compare, and the two numbers you would report). Keep it to one page. If you cannot say how you would prove a setting is applied, that is the part of the program you have not built yet.

Key Takeaways

  • Adopt, adjust, enforce — don't invent. Start hardening from a published CIS Benchmark, choose the level per role as a risk decision, adjust for your environment, and document every deviation with an owner and reason. A baseline with honest exceptions beats one that pretends to have none.
  • Harden the policy, not the box. Enforce baselines through mechanisms that re-apply and correct drift — Group Policy (Windows), configuration management (Linux), MDM (macOS) — so "hardened once" becomes "hardened continuously."
  • Prioritize by what the breach taught you. LAPS first (it makes shared-credential lateral movement impossible, not merely detectable); EDR everywhere (prevention upgrade and the SOC's host-side eyes); application allowlisting on servers first (fixed software set = manageable list).
  • The unpatchable system gets a wrapper, not a shrug: isolate it on the network (Ch. 6–7), strip and watch it on the host, alert on any deviation, and record a risk acceptance with an expiry. An exception without an expiry is a vulnerability with paperwork.
  • A standard is only real if you audit it. Drift detection (harden.py / audit_baseline) turns a written baseline into a repeatable verdict, finds the stragglers built before the standard, confirms the rebuilds, and separates accepted drift from unaccepted drift — the numbers that go to the board.
  • This host-hardening standard is the layer behind the network architecture (Ch. 6–7) and feeds the SOC (Ch. 21–22), mobile/IoT and cloud hardening (Ch. 14–15), and vulnerability management (Ch. 23).