Case Study 1: The OT You Didn't Know You Had — Segmenting Meridian's Building Management System

"We're a bank. We don't have OT." — every Meridian executive, until the cooling failed. — opening line of Sam Whitfield's project brief (constructed)

Executive Summary

Meridian Regional Bank does not run a refinery, a grid, or a water plant. So when the security program roadmap reached "OT/facilities segmentation," the reflex in the room was to skip it: we have no operational technology. This case study follows security engineer Sam Whitfield as he proves otherwise — that the building management system (BMS) keeping the on-premises data center cool and powered is genuine OT, that an attacker who reached it could take the bank offline through physics rather than data, and that the system was sitting on a flat network with a vendor backdoor and no monitoring. The work is a design exercise: inventory the OT, map it to the Purdue model, find the IT/OT bridges, and produce a segmentation plan that protects the process without risking it. It is the version of OT security the largest number of defenders will actually meet — not a power station, but the unglamorous machinery in the basement that the whole business silently depends on. The scenario and all figures are constructed for teaching (Tier 3).

Skills applied: OT asset discovery (passive-first); Purdue-model zone mapping; identifying IT/OT boundary violations; designing an industrial DMZ (IDMZ) and brokered access; choosing compensating controls for unpatchable equipment; specifying passive monitoring; writing a risk-acceptance note; translating physical-availability risk into board language.

Background

Meridian's primary on-premises data center occupies part of a floor in its headquarters. It houses the legacy core-banking system, the Active Directory domain controllers, the VMware estate, and the network's heart. Like every data center, it cannot run without environmental controls: if the room gets too hot, servers throttle and then shut down to protect themselves. The cooling, power conditioning, and physical-access systems are governed by a building management system — a network of controllers and software that regulates the computer-room air conditioning (CRAC) units, the uninterruptible power supplies (UPS) and generator transfer, the dampers and airflow, and the door/badge controllers. The BMS was installed by a specialist vendor when the data center was last refurbished, eight years ago, and has been "running fine" ever since — which is to say, nobody has looked at it.

Dana Okafor, the CISO, put OT/facilities on the roadmap for a specific reason: a peer institution had suffered a multi-hour outage when an HVAC failure cooked a server room, and the post-incident reporting hinted the HVAC controllers had been reachable from the corporate network. Dana does not want to discover Meridian's equivalent during an incident. She gives Sam a deliberately modest charter: find out what facilities OT we have, where it sits on the network, and what it would take to make it defensible — without breaking anything. The "without breaking anything" is not a courtesy; it is the binding constraint of the entire engagement, because the things Sam is assessing keep the bank's servers alive.

Sam's first act is to recalibrate the room's assumption. He frames it for Dana in one sentence she can repeat to the board: "The BMS is operational technology — it controls physical equipment — and if it fails or is sabotaged, the data center goes dark, which is an availability failure with a physical cause. We have OT. We've just been calling it 'facilities.'"

🔗 Connection: This is the §33.1 lesson made concrete. The priority inversion — safety and availability over confidentiality — applies even here. Nobody will steal a meaningful secret from a CRAC controller. But an attacker who commands the CRAC controllers can deny the bank its data center, and a badge-controller compromise has physical-access and safety implications. The threat to the BMS is overwhelmingly an availability (and safety) threat, which is exactly why it belongs in this chapter and not in the data-protection chapters.

The Engagement

Phase 1 — Discovery, the OT way (passive first)

Sam's instinct from years of IT work is to run a discovery scan across the facilities subnet and see what answers. He stops himself, because this is the single most important discipline of the chapter: you do not actively scan an OT network. A malformed packet that an office printer shrugs off can hang a fragile BMS controller, and a hung controller can drop cooling. The ethics callout in §33.3 is not abstract here; an unauthorized scan that took the cooling offline would itself be the incident Sam was hired to prevent.

So Sam does it the OT way. He works with the facilities team to identify where the BMS traffic flows, and he places a passive sensor on a SPAN (mirror) port at that switch — a sensor that receives a copy of every packet but cannot transmit. He lets it watch for two weeks. The inventory it produces, with no probe ever sent, is the revelation §33.5 promised. The BMS is larger and more connected than anyone described:

Passively discovered BMS assets (2 weeks of observation, zero packets sent):

  field/sensors    14x temperature, 6x humidity, 4x airflow, 3x leak detectors
  field/actuators  4x CRAC units, 6x damper motors, 2x UPS controllers, generator xfer
  controllers      3x BMS PLC-class controllers (vendor: "BuildControl", fw v4.1, 2016)
  HMI              1x facilities workstation (Windows, ops room) — last patched 2019
  supervisory      1x BMS server + historian (Windows Server, vendor-managed)
  access/badge     1x door controller subsystem (8 doors incl. server-room mantrap)
  UNEXPECTED:
    - the BMS server has an open SMB connection FROM a corporate file server
    - the BMS server accepts an inbound RDP session from an internet IP every Tuesday
    - one "engineering laptop" appears on both the BMS switch and corporate WiFi (by MAC)

The three "UNEXPECTED" findings are the engagement. Everything above them is normal OT that simply needs to be inventoried and zoned. The three below are IT/OT bridges — paths between the corporate (or internet) world and the controllers that run the room's physics — and each is a way an IT compromise becomes a data-center outage.
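The third finding, a device seen "on both the BMS switch and corporate WiFi (by MAC)", comes from correlating sightings across two capture points. The sketch below shows one way that correlation could work. It assumes the sensor output and the wireless logs have already been reduced to (MAC, segment) pairs. The MAC addresses, the segment names, and the `dual_homed_macs` helper are all hypothetical, not Meridian data or a real product's API.

```python
from collections import defaultdict

# Hypothetical passive sightings: (source MAC, segment where it was observed).
# Nothing here is transmitted to the devices; the input is recorded observation.
OBSERVATIONS = [
    ("00:11:22:AA:BB:01", "bms-switch"),
    ("00:11:22:aa:bb:02", "bms-switch"),
    ("00:11:22:aa:bb:03", "corp-wifi"),
    ("00:11:22:aa:bb:02", "corp-wifi"),   # same MAC seen on the corporate side
    ("00:11:22:aa:bb:01", "bms-switch"),  # repeat sightings are normal
]

def dual_homed_macs(observations, ot_segment="bms-switch"):
    """Return MACs seen on the OT segment and on at least one other segment."""
    segments_by_mac = defaultdict(set)
    for mac, segment in observations:
        # Normalise case so the same address is not counted twice.
        segments_by_mac[mac.lower()].add(segment)
    return sorted(
        mac
        for mac, segments in segments_by_mac.items()
        if ot_segment in segments and len(segments) > 1
    )

print(dual_homed_macs(OBSERVATIONS))
```

A match is a lead for a conversation with facilities, not proof: it tells you which device to go and find, and who carries it.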

🛡️ Defender's Lens: Sam got a complete, trustworthy asset inventory and found three serious boundary violations, and he did it without sending a single packet at the equipment he was protecting. This is the OT defender's superpower and the reason passive monitoring is non-negotiable: the same capability that gives you visibility cannot, by construction, cause the outage you fear. An IT-style active scan would have given the same inventory at the risk of being the incident.

Phase 2 — Mapping the BMS to the Purdue model

With an inventory in hand, Sam places every asset in its Purdue level. This is the purdue_zone(asset) step from the Project Checkpoint, done first on paper. The mapping makes the boundary — and the violations — visible.

  Asset                                                          | Role                          | Purdue level | Domain
  ---------------------------------------------------------------+-------------------------------+--------------+-------
  Temperature / humidity / airflow / leak sensors                | Field measurement             | 0            | OT
  CRAC units, damper motors, UPS controllers, generator transfer | Actuators                     | 0            | OT
  BuildControl BMS controllers (×3)                              | Basic control logic           | 1            | OT
  Facilities HMI (ops-room workstation)                          | Operator interface            | 2            | OT
  Door / badge controller subsystem                              | Physical-access control logic | 1–2          | OT
  BMS server + historian                                         | Site supervisory + data       | 3            | OT
  (should exist) Facilities jump host + historian replica        | Brokered access/data          | 3.5          | IDMZ
  Corporate file server, ticketing, BI                           | Business logistics            | 4            | IT
  Corporate email, AD, internet                                  | Enterprise                    | 5            | IT
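A minimal sketch of what the purdue_zone(asset) step might look like once it moves from paper to code. The Project Checkpoint's actual signature is not shown in this case study, so the asset shape (a dict with a "role" key) and the role labels are assumptions made for illustration. The levels themselves come straight from the mapping table.

```python
# Role-to-level lookup transcribed from the mapping table.
# The role strings are hypothetical labels chosen for this sketch.
PURDUE_LEVEL = {
    "sensor": 0.0,
    "actuator": 0.0,
    "controller": 1.0,
    "door_controller": 1.0,  # the table gives 1-2; the lower bound is an assumption
    "hmi": 2.0,
    "supervisory": 3.0,
    "idmz": 3.5,
    "business": 4.0,
    "enterprise": 5.0,
}

def purdue_zone(asset):
    """Return (level, domain) for an asset dict that carries a 'role' key."""
    level = PURDUE_LEVEL[asset["role"]]  # unknown roles raise KeyError on purpose
    if level < 3.5:
        domain = "OT"
    elif level == 3.5:
        domain = "IDMZ"
    else:
        domain = "IT"
    return level, domain

for asset in (
    {"name": "BMS server + historian", "role": "supervisory"},
    {"name": "Corporate file server", "role": "business"},
):
    print(asset["name"], purdue_zone(asset))
```

Letting an unmapped role fail loudly is a deliberate choice here: an asset nobody can place in a level is itself a finding.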

The mapping immediately classifies the three unexpected findings:

  • The SMB connection from a corporate file server (Level 4) into the BMS server (Level 3) is a direct IT→OT crossing. The Purdue model forbids it absolutely; there should be no path from Level 4 into Level 3 that does not pass through the IDMZ.
  • The inbound RDP from an internet IP into the BMS server (Level 3) is a direct internet→OT crossing — worse still. This is the vendor's remote-support access, configured eight years ago for convenience, bypassing every boundary. It is the same class of finding as the missing-IDMZ vendor path in §33.3, and the same class of access that, with a stolen credential, enabled Colonial Pipeline's IT breach.
  • The engineering laptop bridging the BMS switch and corporate WiFi is a human-carried bridge. A laptop that touches both the hostile corporate network (which may carry malware from a phished employee) and the control network is, in principle, the same kind of path Stuxnet exploited: a device carried across a boundary the network diagram says is sealed.
   CURRENT (dangerous) state — three direct IT/OT bridges:

   Internet ──RDP every Tue──┐
   Corp file server ─SMB────┐│         ┌──────────────┐
   (Level 4/5)              ▼▼          │  BMS server  │ Level 3 (OT)
                       ╳ NO IDMZ ╳ ───▶ │  + historian │
   Eng laptop ─(on both nets)───────────┤              │
                                        └──────┬───────┘
                                               ▼
                                     BMS controllers (Level 1)
                                               ▼
                                     CRAC / UPS / dampers (Level 0)
                                     = the data center's cooling & power

Figure CS1.1 — Meridian's BMS as found: three direct paths from the IT/internet world reach the Level-3 supervisory server, which can command the Level-1 controllers that run cooling and power. There is no IDMZ. Any compromise of the corporate network — or theft of the vendor's RDP credential — has an unbroken path to the machinery that keeps the data center alive.
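The classification of the two network findings can be expressed as a check on source and destination levels: any flow whose endpoints sit on opposite sides of Level 3.5 has skipped the IDMZ. The sketch below is one hedged way to write that rule. The `crossing` function and its labels are hypothetical, and the engineering laptop is left out because it is a device attached to two networks rather than a flow between two endpoints.

```python
IDMZ_LEVEL = 3.5

def crossing(src_level, dst_level):
    """Label a flow by how it relates to the IT/OT boundary at Level 3.5."""
    if src_level > IDMZ_LEVEL and dst_level < IDMZ_LEVEL:
        return "VIOLATION: direct IT->OT"
    if src_level < IDMZ_LEVEL and dst_level > IDMZ_LEVEL:
        return "VIOLATION: direct OT->IT"
    return "ok"  # same side, or one endpoint is the IDMZ itself

# Levels taken from the Purdue mapping of the as-found state.
FOUND_FLOWS = [
    ("vendor RDP, internet -> BMS server", 5, 3),
    ("SMB, corporate file server -> BMS server", 4, 3),
]

for label, src, dst in FOUND_FLOWS:
    print(label, "=>", crossing(src, dst))
```

A flow that starts or ends at Level 3.5 returns "ok" in this sketch, which mirrors the design intent: the IDMZ is the only place where the two worlds are supposed to meet.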

⚠️ Common Pitfall: It would be tempting to rate this "low risk" because "it's just the air conditioning." That is the confidentiality bias the whole chapter warns against. Re-score it on the OT priorities: the availability impact of losing the data center's cooling is catastrophic (the core banking platform goes down), and the safety dimension of the badge/mantrap controllers is non-trivial. Using the Chapter 1 model on the right axis — likelihood that a corporate compromise reaches the BMS (moderate-to-high, given three open bridges) × impact of a data-center outage (5) — this is a CRITICAL finding, not a facilities footnote.

To make the re-scoring concrete and auditable, Sam builds the BMS findings into the same risk-register format the program has used since Chapter 1, deliberately scoring on the OT axis (availability and safety), not confidentiality:

  #  | Finding                                                            | Asset               | L | I | Score | Band
  ---+--------------------------------------------------------------------+---------------------+---+---+-------+---------
  B1 | Vendor RDP from internet directly into Level-3 BMS server (no MFA) | BMS supervisory     | 4 | 5 | 20    | CRITICAL
  B2 | Corporate file server reaches Level-3 BMS server over SMB          | BMS supervisory     | 4 | 5 | 20    | CRITICAL
  B3 | Engineering laptop bridges corporate WiFi and the BMS switch       | BMS zone            | 3 | 5 | 15    | CRITICAL
  B4 | BMS controllers + HMI unpatchable / EOL, no monitoring             | BMS controllers/HMI | 3 | 5 | 15    | CRITICAL
  B5 | Badge/mantrap controller on the same flat segment                  | Physical access     | 2 | 4 | 8     | HIGH

Every bridge scores CRITICAL, and the reason is the same each time: a moderate-to-high likelihood (the paths are open and the corporate network is a realistic compromise source) multiplied by a maximum impact (losing data-center cooling takes online banking down). The confidentiality-biased version of this table — "it's just facilities, impact 2" — would have buried the whole engagement under MEDIUMs and the work would never have been funded. Scoring on the right axis is what makes the case.
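The arithmetic behind the register is likelihood multiplied by impact. The short sketch below reproduces the five rows. The band cut-offs are an assumption chosen only because they match the bands printed in the table (15 and above CRITICAL, 8 and above HIGH); the Chapter 1 model defines the real thresholds, including everything below HIGH, which this sketch does not try to guess.

```python
def band(score):
    """Map a likelihood x impact score to a band (assumed cut-offs)."""
    if score >= 15:
        return "CRITICAL"
    if score >= 8:
        return "HIGH"
    return "below HIGH"  # lower bands are defined in Chapter 1, not here

# (finding id, likelihood, impact) transcribed from the register.
FINDINGS = [
    ("B1", 4, 5),
    ("B2", 4, 5),
    ("B3", 3, 5),
    ("B4", 3, 5),
    ("B5", 2, 4),
]

for finding_id, likelihood, impact in FINDINGS:
    score = likelihood * impact
    print(finding_id, score, band(score))
```

The code is trivial on purpose: the judgment lives in the impact column, and the multiplication only makes that judgment auditable.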

Phase 3 — The physical-access dimension: when OT controls a door

Before designing the network fix, Sam pauses on B5, because it carries a dimension the cooling controllers do not: the badge/door subsystem is OT that controls physical access to the data center itself, including the server-room mantrap. This is where the "safety" in the OT priority ordering stops being abstract. A compromise of the cooling controllers is an availability disaster; a compromise of the door controllers is an availability, physical-security, and safety event all at once. An attacker who can command the badge system might unlock the server-room door (defeating every physical control protecting the bank's most sensitive machines) or, conversely, lock occupants in or out in a way that interferes with emergency egress.

The defensive principle is the §33.4 lesson about the safety instrumented system, applied in miniature: the systems whose failure has physical-safety consequences deserve the most isolation. Sam flags two specific requirements for the door subsystem that go beyond the cooling controllers. First, fail-safe behavior must be verified, not assumed: in a fire, the doors must release for egress (life safety) even as they secure against intrusion — and that behavior must be a property of the door hardware and a dedicated safety circuit, not something that depends on the BMS network being up or the controller being uncompromised. A door whose safe-egress behavior can be overridden from the network is a safety defect, not just a security one. Second, the badge subsystem must be segmented even from the rest of the BMS, on the same logic that isolates a SIS: it is the OT component with the highest-consequence failure mode, so it gets the tightest boundary and its own monitoring. Sam will not let "it's all facilities, put it on one VLAN" collapse the door controllers into the same zone as the air handlers.

🔗 Connection: The badge/mantrap subsystem is Meridian's closest thing to a safety instrumented system (§33.4). The reasoning is identical: the component whose failure can harm people (blocked egress) or defeat the last physical barrier (an unlocked server room) must be the most isolated and most closely watched, and its safe behavior must not depend on the network or a controller that could be compromised. You will rarely configure a real SIS as a bank security engineer — but you will meet this exact logic in physical-access OT, and the principle transfers intact.

Phase 4 — Designing the target state

Now the design work. Sam's target is the Purdue-compliant architecture: the BMS isolated into its own OT zone, every legitimate IT/OT exchange brokered through an IDMZ, the unpatchable controllers protected by compensating controls, and the whole boundary watched passively. He works through it in the order of leverage from §33.4 — segment first, then broker access, then monitor, then patch only what is safe.

1. Segment the BMS into its own zone. The BMS controllers, HMI, and supervisory server move behind a firewall (or enforced VLAN boundary) that, by default, denies all traffic between the corporate network and the BMS zone. This single change neutralizes the SMB bridge: the corporate file server can no longer reach the BMS server because the boundary now drops that traffic. Because legacy OT protocols typically carry no authentication (the "reachability equals control" principle), removing reachability is the highest-leverage control available, and it requires no change to the fragile controllers themselves.

2. Build the IDMZ for the exchanges that must happen. Two legitimate flows survive segmentation and must be re-homed through a Level-3.5 IDMZ:

   - Vendor remote support. Replace the direct internet→Level-3 RDP with a jump host in the IDMZ. The vendor authenticates to the jump host with MFA (where modern authentication can live, even though the BMS controllers cannot enforce it), the session is recorded, and the vendor can reach the BMS only from the jump host and only during approved windows. The internet-to-OT path is eliminated.
   - Facility data to business reporting. If anyone needs BMS data for capacity planning or reporting, the Level-3 historian pushes its data up to a historian replica in the IDMZ, and the corporate BI tool reads the replica. No business system ever opens a connection into the OT zone; the only data flow across the OT boundary is OT→IDMZ, brokered and inspectable.

3. Eliminate the human-carried bridge. The engineering laptop is brought under policy: a dedicated, hardened device is provisioned for BMS work and is never connected to the corporate network, and general-purpose laptops are barred from the BMS switch (enforced by network access control where available). This closes the removable-bridge path that no firewall rule alone can see.

4. Apply compensating controls to the unpatchable core. The BuildControl controllers (firmware frozen at 2016) and the badge controllers cannot take modern patches or MFA. Their protection is the segmentation and brokering above, plus passive monitoring below. The two Windows hosts — the HMI and the BMS server — can be patched, but only in coordination with the vendor and during a planned maintenance window, tested first; Sam schedules them on a slower cadence than corporate IT and documents the gap.

5. Deploy permanent passive monitoring at the boundary. The two-week sensor becomes permanent. It takes SPAN/tap feeds from the IDMZ boundary and from inside the BMS zone, is baselined, and forwards to Meridian's SIEM. Its highest-severity rule is reserved for any IT→OT boundary crossing (the direction-based detection from §33.5), so that if any new path opens (a vendor reconfigures something, a "temporary" rule is added and forgotten), it surfaces as an alert rather than waiting to be found in the next assessment.

   TARGET (defensible) state — one brokered boundary, monitored:

   Internet/Corp (L4/5)
        │  (vendor support)              ┌─────────────────────────┐
        ▼                                │  IDMZ  (Level 3.5)       │
   ┌─────────┐   MFA + session record    │  ┌───────────────────┐  │
   │ vendor  │──────────────────────────▶│  │ jump host         │  │
   └─────────┘                           │  │ historian replica │◀─┼─ OT→IDMZ data push
                                         │  └─────────┬─────────┘  │
   [ passive sensor on SPAN at boundary ]            │             │
                                         └───────────┼─────────────┘
                                            (only path) │  default-deny elsewhere
                                                      ▼
                                          ┌──────────────────────┐
                                          │  BMS server (L3)      │  OT zone
                                          │  HMI (L2)             │  (segmented,
                                          │  controllers (L1)     │   monitored)
                                          │  CRAC/UPS/dampers (L0)│
                                          └──────────────────────┘

Figure CS1.2 — The target architecture. The BMS is segmented into its own OT zone with default-deny at the boundary. The only IT/OT exchanges are brokered through the IDMZ: vendor access via an MFA-protected, recorded jump host, and facility data via a historian replica the OT side pushes to. A passive sensor watches the boundary and feeds the SIEM, with IT→OT crossings as the top-severity alert.
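The target state reduces to a short allowlist plus a default-deny rule, with the severity of a denied flow depending on its direction. The sketch below models that logic in plain Python so it can be reasoned about before it is translated into any particular firewall's syntax. The zone names, service labels, and the `decide` function are hypothetical, and the "approved windows" restriction on vendor sessions is not modeled.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src_zone: str
    dst_zone: str
    service: str

# Brokered exchanges from the target design (labels are illustrative).
ALLOWED = {
    Flow("internet", "idmz", "jump-host-login"),  # vendor authenticates with MFA
    Flow("idmz", "ot", "jump-host-session"),      # recorded vendor session
    Flow("ot", "idmz", "historian-push"),         # OT pushes data upward
    Flow("corp", "idmz", "historian-read"),       # BI reads the replica only
}

IT_ZONES = {"corp", "internet"}

def decide(flow):
    """Return (action, alert severity) under default-deny."""
    if flow in ALLOWED:
        return ("allow", "info")
    if flow.src_zone in IT_ZONES and flow.dst_zone == "ot":
        # A direct IT->OT crossing is the top-severity event.
        return ("deny", "critical")
    return ("deny", "low")

print(decide(Flow("corp", "ot", "smb")))
print(decide(Flow("ot", "idmz", "historian-push")))
```

Note that nothing in the allowlist has an IT zone as source and "ot" as destination; that absence is the design.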

Phase 5 — Validating the design without breaking the process

A target diagram is not a defense; a deployed, verified design is. The binding constraint of the whole engagement — "without breaking anything" — applies most acutely now, because changing network boundaries around live cooling and power controllers is exactly the kind of action that, done carelessly, becomes the outage Sam was hired to prevent. He validates in three deliberately conservative steps.

Step 1 — monitor-only first. Before enforcing a single new boundary, Sam runs the passive sensor and the planned firewall in log-only mode for two weeks, recording what traffic the new default-deny rules would have dropped. This surfaces every legitimate flow he might not have known about — a nightly backup job, a license check, a time-sync source — so that "default-deny" does not accidentally sever something the process quietly depends on. In OT, you learn what to allow by watching, not by guessing, because a wrong guess can stop the plant.
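One way to turn two weeks of log-only output into something the facilities engineers can review is to count the distinct flows the planned rules would have dropped. The sketch below assumes the log has already been reduced to (source, destination, service) tuples. The host names, services, and the `review_queue` helper are hypothetical examples, not Meridian's actual traffic.

```python
from collections import Counter

# Hypothetical log-only records: one tuple per flow the planned
# default-deny rules would have dropped had they been enforced.
WOULD_DROP = [
    ("bms-server", "backup-host", "backup"),
    ("bms-server", "backup-host", "backup"),
    ("bms-server", "backup-host", "backup"),
    ("bms-server", "time-source", "ntp"),
    ("bms-server", "time-source", "ntp"),
    ("corp-fileserver", "bms-server", "smb"),
]

def review_queue(records):
    """Count would-be-dropped flows, most frequent first, for human review."""
    return Counter(records).most_common()

for flow, count in review_queue(WOULD_DROP):
    print(count, flow)
```

The output is a review queue, not an allowlist. Each entry still needs a person to decide whether it is a dependency to re-home (the backup job, the time source) or one of the bridges the project exists to close (the SMB connection).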

Step 2 — change only in a maintenance window, with the engineers in the room. The actual cutover — moving the BMS behind the enforced boundary, standing up the IDMZ jump host, killing the vendor's direct RDP — happens during a planned facilities maintenance window, with the facilities engineers present and a tested rollback ready. Sam does not touch the controllers themselves; every change is at the network layer (the high-leverage, low-risk lever of §33.4), and the controllers keep running exactly as before, now simply unreachable from the corporate network. The cooling never pauses.

Step 3 — tabletop the failure modes. With the design live, Sam runs a short tabletop with the SOC and facilities: "A corporate laptop is hit by ransomware tonight. Walk me through what the BMS sees and what we can prove." The exercise confirms the win condition — the IDMZ now stands between the corporate network and the BMS, and the passive sensor would alert on any IT→OT crossing — and, crucially, it rehearses the Colonial decision in advance: because the boundary is now segmented and monitored, Meridian could answer "is the BMS reachable from the compromised network?" with evidence, rather than guessing and pulling the plug on its own data center. The tabletop also catches a gap: the SOC had no runbook for an OT boundary alert, so Sam writes one, mapping the alert to "isolate the path, confirm the OT side is clean, engage facilities" rather than the IT reflex of "reimage the host and move on."

🛡️ Defender's Lens: Notice that not one of these validation steps involved touching a controller, and the riskiest action (changing the boundary) happened in a window with rollback and engineers present. This is the OT defender's discipline made routine: the controls that protect the process are deployed in a way that cannot itself disrupt the process. An IT engineer who "just pushes the firewall change Friday afternoon" would be committing the exact error the chapter warns against — in OT, how you deploy a control is as safety-critical as which control you deploy.

Phase 6 — The risk-acceptance note and the board translation

Two pieces of equipment will remain imperfect after the project: the BuildControl controllers and the HMI's aging operating system cannot be made current without a vendor-led project and a planned outage that is out of scope for this quarter. An honest OT program does not hide that; it writes it down. Sam drafts a short risk-acceptance note for Dana and the business owner of the data center:

RISK ACCEPTANCE — BMS legacy components (this quarter)
  Residual risk:  BMS controllers (fw 2016) and the facilities HMI (EOL OS) carry
                  known, unpatched vulnerabilities and cannot enforce MFA. If reached,
                  they could be used to disrupt data-center cooling/power (availability).
  Why accepted:   Vendor-led firmware/OS upgrade requires a planned data-center
                  maintenance window and budget not available until next FY.
  Compensating:   (1) BMS segmented to its own zone, default-deny at boundary;
                  (2) all access brokered via IDMZ jump host (MFA + session recording);
                  (3) human-carried laptop bridge eliminated;
                  (4) permanent passive monitoring at the boundary, feeding the SIEM,
                      with IT->OT crossing as top-severity alert.
  Revisit when:   the FY budget lands, OR the passive sensor flags any anomaly on these
                  hosts, OR the vendor issues a critical advisory — whichever comes first.
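The "Revisit when" line is an any-of condition, and a program that tracks many such notes may want to evaluate it mechanically. The sketch below is one hedged way to encode the three triggers. The `RevisitState` fields and the `revisit_reasons` function are hypothetical names, and how each flag gets set (budget system, SIEM alert, vendor advisory feed) is outside the sketch.

```python
from dataclasses import dataclass

@dataclass
class RevisitState:
    """Current status of the three triggers named in the note."""
    budget_landed: bool = False
    sensor_anomaly_on_legacy_hosts: bool = False
    critical_vendor_advisory: bool = False

def revisit_reasons(state):
    """Return the triggers that have fired; any non-empty result means revisit."""
    reasons = []
    if state.budget_landed:
        reasons.append("FY budget landed")
    if state.sensor_anomaly_on_legacy_hosts:
        reasons.append("passive sensor flagged an anomaly")
    if state.critical_vendor_advisory:
        reasons.append("vendor issued a critical advisory")
    return reasons

print(revisit_reasons(RevisitState(sensor_anomaly_on_legacy_hosts=True)))
```

Returning the reasons rather than a bare boolean keeps the record of why the acceptance was reopened, which is what an auditor will ask for.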

For the board, Sam and Dana compress the entire engagement into the language of risk and physics, not protocols: "We discovered that the systems running our data center's cooling and power (operational technology we hadn't been tracking) were directly reachable from the internet and the corporate network through three paths, including a vendor's remote-access account with no multi-factor protection. A compromise of any of those paths could have taken the data center, and therefore online banking, offline. We have isolated those systems, routed the one necessary vendor connection through a monitored, MFA-protected broker, and we now watch the boundary continuously. Two legacy components remain on a documented remediation plan tied to next year's facilities budget." That is the §33.6 Colonial lesson applied preventively: a missing MFA prompt on a remote-access account is not a facilities detail; it is the kind of thing that becomes a headline, and Meridian found and closed it before an attacker did.

🔄 Check Your Understanding: Sam's plan deliberately put "segment the BMS" before "patch the HMI," even though the HMI has known critical vulnerabilities. Using the §33.4 leverage ordering and the OT priorities, explain why segmentation came first. What does segmentation accomplish for the unpatchable controllers that patching the HMI never could? (Hint: a vulnerability you cannot reach is a vulnerability largely neutralized — and the controllers can never be patched at all.)

Discussion Questions

  1. Sam reframed "facilities" as "operational technology" in a single sentence that Dana could repeat to the board. Why was that relabeling the most important move of the entire engagement, and what would likely have happened to the project's priority and funding without it?
  2. The vendor's direct RDP access existed for eight years and "never caused a problem." How should a defender weigh "it has worked fine" against "it is a direct internet→OT path"? Where does the Colonial Pipeline lesson fit into that argument?
  3. Sam chose passive discovery and refused to run an active scan, accepting that passive monitoring takes two weeks to build a baseline while a scan would have produced an inventory in an hour. Was that the right trade-off here? Under what (if any) circumstances could active techniques be used on this BMS?
  4. The risk-acceptance note leaves two components unpatched on purpose. Is documented, compensated risk acceptance a sign of a mature program or a failure to remediate? Argue both sides, then state where you land for this case.
  5. Meridian is a bank. Does the fact that its OT is "only" building management make the work less important than securing a power utility's OT — or differently important? What transfers and what does not?

Your Turn

Pick a facility you have access to in concept — your office building, a campus, a small business with a back room — and run Sam's engagement on paper. (1) List the OT you can identify: HVAC, access control, elevators, fire suppression, lighting, UPS/generator, refrigeration. (2) For each, name what it controls physically and what happens if it fails or is sabotaged — emphasize availability and safety, not confidentiality. (3) Map each asset to a Purdue level. (4) Hypothesize the IT/OT bridges most likely to exist (vendor remote access, a shared laptop, a "temporary" firewall rule, a building-automation system on the corporate WiFi). (5) Sketch the target state with an IDMZ and a passive monitoring point. (6) Write a three-bullet risk-acceptance note for one component that cannot be fixed now. Keep it to two pages. If you cannot say what a system controls physically, that is a sign you have found IT, not OT — note the boundary.

Key Takeaways

  • Almost every organization has OT, even those that consider themselves pure IT: building management, power, cooling, physical access. The first and most important step is recognizing it — relabeling "facilities" as "operational technology" reorders its priority and its funding.
  • Discover OT passively. A SPAN/tap sensor produces a complete, trustworthy inventory and surfaces IT/OT bridges without sending a single packet at the fragile equipment — the one method that cannot cause the outage you fear. Active scanning is off the table.
  • The Purdue map turns an inventory into a plan. Placing each asset in a level (0–5, plus the IDMZ at 3.5) makes the IT/OT boundary — and every violation of it — visible. The bridges are the engagement.
  • Segment first, broker second, monitor third, patch (only what is safe) last. Segmentation neutralizes unpatchable controllers by removing reachability; the IDMZ re-homes the few legitimate IT/OT exchanges; passive monitoring catches any new crossing; the IT-like hosts are patched in planned windows.
  • Write down what you cannot fix. A documented, compensated risk-acceptance note for legacy equipment is the mark of a mature OT program — honest about residual risk, explicit about compensating controls, and tied to a revisit trigger.
  • Translate physical-availability risk into business language. "A vendor account with no MFA could take the data center — and online banking — offline" is a board-legible statement of exactly the Colonial Pipeline failure mode, found and closed before an attacker arrived.