> "On the morning of May 7, 2021, Colonial Pipeline learned that the cause of the shutdown was ransomware."
Prerequisites
- Chapter 6 (network segmentation)
- Chapter 11 (host hardening)
- Chapter 14 (device inventory)
Learning Objectives
- Contrast the priorities of operational technology with information technology, and explain why safety and availability come before confidentiality in an OT environment.
- Identify the core components of an industrial control system — PLCs, RTUs, HMIs, and SCADA — and the role each plays in a physical process.
- Apply the Purdue model to place an asset in the correct zone and design segmentation between IT and OT, including the demilitarized zone that separates them.
- Explain why conventional patching and active scanning are often unsafe in OT, and choose OT-appropriate compensating controls instead.
- Design passive monitoring for an OT network and recognize the IT/OT boundary violations that real incidents have exploited.
In This Chapter
- Overview
- Learning Paths
- 33.1 When downtime can kill
- 33.2 ICS, SCADA, and the components that run the physical world
- 33.3 The Purdue model and segmentation: drawing the line that saves you
- 33.4 Why you can't just patch: OT's broken IT reflexes
- 33.5 Monitoring OT passively: seeing without touching
- 33.6 Lessons from real OT attacks: what the defender takes away
- Project Checkpoint
- Summary
- Spaced Review
- What's Next
Chapter 33: Securing Operational Technology: ICS, SCADA, and Critical Infrastructure Defense
"On the morning of May 7, 2021, Colonial Pipeline learned that the cause of the shutdown was ransomware." — U.S. House Committee on Homeland Security hearing record, 2021
Overview
On the night of May 6, 2021, an attacker logged into a virtual private network at Colonial Pipeline using a single password — an account with no second factor, reportedly recovered from a credential dump. Within hours, ransomware was spreading through the company's business network: billing, scheduling, the systems that decide who owes what for a barrel of refined fuel. The pipeline itself — the pumps, the valves, the sensors that move gasoline from Houston to New York — was never directly touched by the malware. And yet, on the morning of May 7, Colonial shut the pipeline down. Forty-five percent of the fuel supply for the U.S. East Coast stopped flowing for five days. Gas stations ran dry. A regional emergency was declared. People who had never heard the words operational technology suddenly cared a great deal about them.
Why would a company halt a physical process that the malware never reached? Because Colonial could no longer bill for the fuel, and, more to the point of this chapter, because the company could not be confident the attacker had stayed on the business side of the house. When you cannot prove that the boundary between your office network and the machinery that moves a hazardous liquid under pressure has held, the safe move is to stop the machinery. That decision, made under uncertainty, is the entire subject of this chapter compressed into one morning. The breach was an ordinary IT breach: a missing multi-factor prompt, a reused password, commodity ransomware. The consequence was physical, because the boundary between the two worlds was not as solid, or as well understood, as it needed to be.
This is the chapter where cybersecurity stops being about data and starts being about physics. Everything you have learned so far — segmentation (Chapter 6), host hardening (Chapter 11), device inventory (Chapter 14) — still applies, but the priorities invert and the constraints tighten. In an office, you can reboot a server to apply a patch. In a chemical plant, the equivalent action might vent a tank or stall a turbine. In an office, the worst case is usually a data breach. In a water-treatment plant, the worst case is that someone drinks the water. We are going to study operational technology — the computers that run the physical world — from the defender's seat: what it is, why it cannot be defended like IT, and what does work. We will use the Colonial Pipeline incident as our anchor for the IT/OT boundary, and we will look (carefully, at the level of public fact) at the handful of real attacks that have actually reached physical processes, because they taught the field most of what it knows.
In this chapter, you will learn to:
- Explain the inverted priority of OT — safety first, then availability, then integrity, then confidentiality — and why every defensive decision follows from it.
- Name and place the core ICS components: the controllers (PLCs, RTUs), the operator interfaces (HMIs), and the supervisory layer (SCADA).
- Use the Purdue model to map any asset into a zone and design the segmentation — including the industrial demilitarized zone — that keeps an IT compromise from becoming an OT catastrophe.
- Choose OT-appropriate controls where the IT reflexes (patch now, scan aggressively, force MFA on a 30-year-old controller) are unavailable or unsafe.
- Build passive monitoring for an OT network and read it the way a defender reads any telemetry — to catch the boundary crossing before it becomes a shutdown.
Learning Paths
This chapter is weighted toward engineers who design segmentation and toward the governance professionals who must explain critical-infrastructure risk to a board or a regulator. Here is how to read it:
🏗️ Security Engineer: This is your chapter. Live in §33.3 (the Purdue model and the IDMZ), §33.4 (why patching breaks and what to do instead), and §33.5 (passive monitoring design). The Project Checkpoint extends your toolkit with otsec.py.
📋 GRC: Focus on §33.1 (the safety/availability inversion that reframes every risk score) and §33.6 (what real incidents imply for policy, vendor contracts, and regulatory exposure). You own the conversation that gets OT into the enterprise risk register.
🛡️ SOC Analyst: §33.5 is where OT telemetry meets your SIEM. Read it for what an OT detection looks like and why it cannot be tuned like an IT one.
📜 Certification Prep: OT/ICS appears in Security+ (the "specialized systems" and architecture objectives) and CISSP (Security Architecture & Engineering, and the safety/embedded-systems material). The key-takeaways.md file maps the terms.
33.1 When downtime can kill
Begin with the inversion, because nothing else in this chapter makes sense without it. In Chapter 1 we introduced the CIA triad and noted that availability is a genuine security property, not an afterthought. In operational technology, that point is not a nuance — it is the foundation, and it comes with a property the office world rarely has to weigh: safety.
Operational technology (OT) is the hardware and software that directly monitors and controls physical processes, devices, and infrastructure — the computers that open valves, spin motors, regulate temperature, and trip breakers. It is the counterpart to information technology (IT), which stores, processes, and moves data. An email server is IT. The programmable controller that keeps a turbine from over-spinning is OT. The distinction is not about the age or sophistication of the equipment; it is about what happens in the physical world when the system does the wrong thing.
In IT, defenders rank the CIA triad, loosely, as confidentiality first (do not leak the data), then integrity, then availability. A bank fears a data breach more than a four-hour outage. In OT, that ranking flips, and a fourth concern sits above all of them:
     IT priority                          OT priority
┌───────────────────┐               ┌───────────────────────┐
│ 1. Confidentiality│               │ 0. SAFETY (life/limb, │
│ 2. Integrity      │               │    environment)       │
│ 3. Availability   │   inverts →   │ 1. Availability       │
│                   │               │ 2. Integrity          │
│ (safety: rarely   │               │ 3. Confidentiality    │
│  a direct concern)│               │                       │
└───────────────────┘               └───────────────────────┘
Figure 33.1 — The priority inversion. In OT, keeping people and the environment safe outranks everything, availability is paramount because the physical process must keep running (or fail to a safe state), and confidentiality — the thing IT guards most jealously — comes last. A defender who imports IT instincts unexamined will make dangerous mistakes.
Read that figure as a set of operating instructions, because it dictates every control choice that follows. Safety sits at level zero: an OT system can leak every byte it holds and still be doing its job, but if it causes a boiler to over-pressurize or a brake to release, people die. Availability is paramount because in many processes, stopping is itself hazardous or hugely expensive — you cannot simply pause a blast furnace or a sewage plant the way you reboot a laptop. Integrity matters because a controller acting on a falsified sensor reading will make a correct decision about a wrong reality. And confidentiality — the property a bank spends most of its budget on — comes last, not because OT data is worthless, but because a stolen pump-pressure log harms no one directly, while an unavailable pump can flood a city.
🚪 Threshold Concept: In IT, the worst outcome of a successful attack is almost always a loss of data. In OT, the worst outcome is a loss of physical control — and physical control failures can injure or kill. This single shift reorders the entire discipline. Patching, scanning, password rotation, even rebooting — reflexes that are unambiguously good in IT — become decisions you must weigh against the possibility of stopping or destabilizing a process that should never stop. Once you internalize that "downtime can kill," you stop asking "how do I secure this like a server?" and start asking "how do I secure this without endangering the process it runs?"
This is why a chapter on OT belongs in a defensive textbook even though most readers will never administer a power plant. Critical infrastructure — the assets so vital that their incapacity would debilitate national security, the economy, or public health and safety — runs on OT. In the United States the government designates sixteen critical-infrastructure sectors: energy, water and wastewater, chemical, manufacturing, transportation, healthcare, food and agriculture, and more. They are interdependent (water plants need power; power plants need water and fuel; fuel needs pipelines), and they are increasingly connected to the same internet your laptop is. The attack surface that Chapter 1 described as "exploding" now includes the machinery of civilization. And — the lesson of Colonial Pipeline — you do not have to attack the OT directly to take it down. You only have to make its operators unable to trust that it is safe.
Even an organization that thinks of itself as pure IT has OT. Meridian Regional Bank does not run a refinery, but its data center has computer-room air conditioning (CRAC) units, uninterruptible power supplies, generators, and a fire-suppression system, all controlled by a building management system (BMS) — a network of controllers regulating temperature, power, and physical access. Its branches have HVAC, alarm panels, and badge readers. Its ATM fleet is, in a real sense, a population of small specialized machines that dispense physical cash. None of that is "critical infrastructure," but all of it is OT: if an attacker reaches the BMS and disables cooling, Meridian's core-banking servers overheat and the bank goes offline — an availability failure with physical roots. We will use exactly this facilities/physical-OT angle as Meridian's contribution to the chapter, because it is the version of OT that the largest number of defenders will actually meet.
🔗 Connection: This builds directly on the network fundamentals of Chapter 6. Segmentation, VLANs, and the death of the flat-network perimeter were introduced there for the IT world; in OT they are not best practice but a safety control. The "east-west vs north-south traffic" distinction from Chapter 6 becomes, in OT, the difference between traffic that should never cross a boundary and traffic that may — a line we will draw precisely with the Purdue model in §33.3.
🔄 Check Your Understanding:
1. State the OT priority ordering and explain, in one sentence, why confidentiality sits last.
2. Colonial Pipeline's ransomware never reached the pipeline's control systems. Give two reasons the company shut the pipeline down anyway.
Answers
1. Safety → availability → integrity → confidentiality. Confidentiality is last because a disclosed OT data point (a pressure reading, a setpoint) rarely causes direct physical harm, whereas a loss of availability or integrity can injure people or damage the environment.
2. (a) The breach hit the billing/business systems, so Colonial could not bill for delivered fuel; (b) the company could not be certain the attacker had not crossed from IT into OT, and the safe response under that uncertainty was to stop the process. Either of these, plus the general inability to trust the IT/OT boundary, is a valid reason.
33.2 ICS, SCADA, and the components that run the physical world
To defend OT you must know what the boxes are. The umbrella term is ICS — an industrial control system, the general category of control systems and instrumentation used to operate industrial processes. ICS is the genus; the species you will meet most often are SCADA systems, distributed control systems (DCS), and the individual controllers inside them. Let us build the picture from the bottom up, from the metal to the screen, because the Purdue model in §33.3 is organized along exactly that axis.
At the very bottom are the field devices: the sensors and actuators that touch the physical world. A sensor measures something — temperature, pressure, flow, level, position, voltage — and turns it into a signal. An actuator does something — opens a valve, starts a pump, moves a robotic arm, trips a breaker. These devices have no real intelligence; they are the hands and nerve endings of the system.
Above them sits the controller, and the workhorse controller of the modern factory and utility is the PLC — a programmable logic controller, a ruggedized industrial computer that reads inputs from sensors, runs a control program, and drives outputs to actuators, on a deterministic cycle measured in milliseconds. A PLC is the thing that says, in effect, "if the tank level exceeds 90 percent, close the inlet valve," and does it the same way, on time, every time, for twenty years. PLCs are built for reliability and real-time response, not for security: many speak industrial protocols that have no authentication at all, were designed when the network was assumed to be physically isolated, and cannot run an antivirus agent or accept a modern patch without a vendor's blessing and a plant shutdown. That combination — total control over a physical process, plus a near-total absence of built-in security — is the central defensive problem of this chapter.
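The read-inputs, run-logic, drive-outputs cycle described above can be sketched in a few lines. This is an illustration only: real PLCs are programmed in vendor tools and IEC 61131-3 languages such as ladder logic, not Python, and the names scan_cycle, Outputs, and HIGH_LEVEL_PCT are hypothetical. The 90 percent threshold is the one from the example in the text.

```python
from dataclasses import dataclass

# Setpoint taken from the tank example in the text.
HIGH_LEVEL_PCT = 90.0


@dataclass
class Outputs:
    """State the controller drives to its actuators (hypothetical model)."""
    inlet_valve_open: bool


def scan_cycle(tank_level_pct: float, previous: Outputs) -> Outputs:
    """One pass of the loop: take the sensor reading, apply the rule,
    and return the outputs to write to the actuators."""
    if tank_level_pct > HIGH_LEVEL_PCT:
        return Outputs(inlet_valve_open=False)
    # No rule fired, so hold the last commanded output.
    return previous


def run(levels):
    """Replay a sequence of level readings, one per scan cycle."""
    out = Outputs(inlet_valve_open=True)
    history = []
    for level in levels:
        out = scan_cycle(level, out)
        history.append(out.inlet_valve_open)
    return history


print(run([70, 85, 91, 88]))
```

In this sketch the valve stays closed once the level has crossed the setpoint, because nothing reopens it; a real control program would add a reopening rule, typically with hysteresis. The point for a defender is how little there is: the logic trusts whatever reading and whatever command it is given.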
A close cousin is the RTU — a remote terminal unit, a controller built to operate at a remote, often unstaffed site and report back over long-distance communications (radio, cellular, leased line). Where a PLC typically lives inside a factory, an RTU sits at a remote pumping station, a substation, or a wellhead, gathering readings and accepting commands from far away. The line between PLC and RTU has blurred over the years, but the defensive significance of the RTU is its remoteness: it extends the control network across geography, often over links a defender does not fully own, which is exactly the kind of path an attacker looks for.
Humans supervise all of this through an HMI — a human-machine interface, the graphical screen (and the computer behind it) that an operator uses to see the state of the process and issue commands. The HMI is where a person watches the tank levels rise and fall, acknowledges alarms, and clicks to start a pump. Crucially for defenders, the HMI is usually a Windows or Linux computer — often an old, unpatched one, because it was validated with the process years ago and "do not touch it" is the operating philosophy. The HMI is frequently the most IT-like, and therefore the most attacker-friendly, box in the whole OT environment. It is where many real intrusions have landed.
Sitting over the controllers and HMIs is SCADA — supervisory control and data acquisition, the system architecture that collects data from many distributed controllers (PLCs, RTUs) and presents centralized monitoring and control to operators, typically across a wide geographic area. SCADA is what a water utility uses to watch and run dozens of pumping stations spread across a county from one control room; it is what an electric utility uses to monitor substations across a state. SCADA is not a single box but a system: servers (the data historian that records every reading, the alarm engine, the master station) plus the communications that reach the field plus the HMIs the operators sit at. Where many controllers are tightly co-located in one plant under unified vendor control, the architecture is instead called a distributed control system (DCS); for our purposes the security principles are the same, and we will say "SCADA/DCS" or just "the supervisory layer" when the distinction does not matter.
Operators        ┌──────────────┐
& engineers ────▶│     HMI      │  Windows/Linux screen: watch & command
                 └──────┬───────┘
                        │  (supervisory layer: SCADA / DCS)
                 ┌──────┴───────┐
                 │ SCADA servers│  historian, alarms, master station
                 └──────┬───────┘
                        │  industrial protocols (often no auth)
           ┌────────────┼────────────┐
     ┌─────┴────┐  ┌────┴─────┐ ┌────┴─────┐
     │   PLC    │  │   PLC    │ │   RTU    │  controllers run the logic
     └────┬─────┘  └────┬─────┘ └────┬─────┘
          │             │            │        hardwired I/O
     ┌────┴─────┐  ┌────┴─────┐ ┌────┴─────┐
     │ sensors/ │  │ sensors/ │ │ sensors/ │  field devices touch
     │actuators │  │actuators │ │actuators │  the physical world
     └──────────┘  └──────────┘ └──────────┘
     (valves, pumps, motors, breakers, gauges)
Figure 33.2 — The ICS stack, bottom to top. Field devices (sensors/actuators) are driven by controllers (PLCs, RTUs), which are supervised by SCADA/DCS servers and watched by operators at HMIs. Each layer up is more computer-like and more attacker-friendly; each layer down is closer to physics and harder to change.
Three properties of this stack will shape every defense in the chapter, and they are worth stating as principles before we get to controls.
First, the protocols are old and trusting. Industrial protocols — Modbus, DNP3, EtherNet/IP, PROFINET, and many others — were standardized for reliability and determinism in physically isolated networks. Most have little or no authentication: a Modbus "write a value to this register" command is honored by the PLC simply because it arrived, with no check of who sent it. We do not need to study the protocols in offensive detail; the defensive takeaway is that on an OT network, the network itself is the access control. If a packet can reach the PLC, it can very likely command the PLC. That is why segmentation is the load-bearing control.
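To make "no check of who sent it" concrete from the defender's side, here is a minimal sketch that decodes a Modbus/TCP "write single register" request as a passive monitor might. The field layout (a 7-byte MBAP header followed by function code 0x06, a register address, and a value) follows the public Modbus specification; the function name and the sample frame are made up for illustration.

```python
import struct

# MBAP header: transaction id, protocol id, length, unit id (big-endian).
MBAP = struct.Struct(">HHHB")
WRITE_SINGLE_REGISTER = 0x06


def parse_write_single_register(frame: bytes) -> dict:
    """Decode a Modbus/TCP write-single-register request.

    Hypothetical helper for illustration; it handles only function 0x06.
    """
    if len(frame) < MBAP.size + 5:
        raise ValueError("frame too short for a write-single-register request")
    transaction_id, protocol_id, length, unit_id = MBAP.unpack_from(frame, 0)
    function_code, address, value = struct.unpack_from(">BHH", frame, MBAP.size)
    if protocol_id != 0:
        raise ValueError("not Modbus (protocol id must be 0)")
    if function_code != WRITE_SINGLE_REGISTER:
        raise ValueError("not a write-single-register request")
    return {
        "transaction_id": transaction_id,
        "unit_id": unit_id,
        "function_code": function_code,
        "address": address,
        "value": value,
    }


# Made-up sample: write the value 90 to register 16 on unit 1.
sample = bytes.fromhex("0001 0000 0006 01 06 0010 005a")
print(parse_write_single_register(sample))
```

Every field in the decoded request describes what to do and where; none describes who is asking. There is no username, token, or signature to validate, which is why the only practical place to enforce "who may send this" is the network path the frame travels.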
Second, the equipment lives for decades. A PLC installed in 2003 may still be running in 2033. Industrial equipment is capitalized over twenty- and thirty-year horizons; "just upgrade it" can mean a multi-million-dollar plant project with a year of planning and a scheduled outage. Defenders inherit equipment that predates modern security entirely and will outlive several generations of IT.
Third, availability and determinism are sacred. The PLC must complete its control loop on time, every cycle. Anything that introduces latency, jitter, or an unexpected reboot — including well-meant security tooling — can disrupt the process. This is why the IT reflex of "scan everything, patch everything, install an agent everywhere" is not merely inconvenient in OT; it can be the thing that causes the incident.
🛡️ Defender's Lens: When you map an OT environment for the first time, resist the urge to treat the HMIs and SCADA servers as "just more Windows boxes" to fold into your normal patching and EDR program. They are Windows boxes, but they are load-bearing in a way an office laptop is not — an unexpected reboot or a misbehaving agent on an HMI can blind or destabilize the operators running a physical process. The right first move is not to secure these hosts the IT way; it is to find them, inventory them, and wrap them in network controls while you work out, with the engineers who own the process, what host-level changes are safe.
⚠️ Common Pitfall: Assuming an "air gap" protects the OT, so the components do not need defending. We will dismantle that assumption in §33.3 and §33.5, but flag it now: the belief that "our control network is not connected to anything" is, in the overwhelming majority of real environments, false — and it is most dangerous precisely where it is most confidently held, because confidence in isolation is what justifies leaving the trusting, unpatched components undefended.
🔄 Check Your Understanding:
1. Match each to its role: PLC, RTU, HMI, SCADA. (a) the operator's screen; (b) a controller at an unstaffed remote site reporting over a long-distance link; (c) the system that centralizes monitoring of many distributed controllers; (d) the ruggedized controller running real-time logic inside a plant.
2. Why does the statement "on an OT network, the network is the access control" make segmentation a safety control rather than merely a security control?
Answers
1. (a) HMI; (b) RTU; (c) SCADA; (d) PLC.
2. Because most industrial protocols do not authenticate commands (a PLC obeys a "write" command simply because it arrived), reachability equals control. If the only thing standing between an attacker and the ability to command a physical actuator is whether a packet can reach the controller, then the segmentation that decides reachability is what prevents physical harm: it is functioning as a safety control.
33.3 The Purdue model and segmentation: drawing the line that saves you
If there is one diagram every OT defender carries in their head, it is the Purdue model — a reference architecture that organizes an industrial enterprise into hierarchical levels, from the physical process at the bottom to enterprise business systems at the top, so that each level's communications and trust can be controlled at the boundaries between them. It originated as the Purdue Enterprise Reference Architecture in the 1990s and was absorbed into the standards the field still uses. You do not deploy "a Purdue model"; you use it as a map to decide where every asset belongs and which boundaries must be defended.
Here is the model, with the levels labeled and the all-important boundary drawn in:
┌──────────────────────────────────────────────────────────────┐
│ LEVEL 5  Enterprise network (corporate IT, email, internet)  │  IT
│ LEVEL 4  Business/site logistics (ERP, scheduling, file)     │  domain
└───────────────────────────────┬──────────────────────────────┘
                                │  ◀── the boundary that matters
            ┌───────────────────┴───────────────────┐
            │ LEVEL 3.5  Industrial DMZ (IDMZ)      │  brokered exchange:
            │  jump host, data historian replica,   │  no direct IT↔OT
            │  patch/AV relay, remote-access broker │  traffic crosses
            └───────────────────┬───────────────────┘
┌───────────────────────────────┴──────────────────────────────┐
│ LEVEL 3  Site operations (SCADA servers, historian, eng. WS) │  OT
│ LEVEL 2  Area supervisory control (HMIs, area SCADA)         │  domain
│ LEVEL 1  Basic control (PLCs, RTUs, controllers)             │
│ LEVEL 0  Physical process (sensors, actuators, the metal)    │
└──────────────────────────────────────────────────────────────┘
Figure 33.3 — The Purdue model, levels 0–5, with the industrial demilitarized zone (IDMZ) at level 3.5. Levels 0–3 are the OT domain; levels 4–5 are the IT domain. The single most important defensive principle is that no traffic flows directly between the IT domain (4–5) and the OT domain (0–3); everything is brokered through the IDMZ.
Walk the levels from the bottom, because the numbering encodes how close each layer is to physics:
- Level 0 — the physical process. The sensors and actuators, the valves and motors. The metal itself.
- Level 1 — basic control. The PLCs and RTUs that read level 0 and drive it. This is where the control logic lives.
- Level 2 — area supervisory control. The HMIs and local SCADA that let operators watch and command an area or a single line.
- Level 3 — site operations. The plant-wide SCADA servers, the data historian, the engineering workstations where control programs are written and downloaded to PLCs. This is the top of the OT domain.
- Level 3.5 — the industrial DMZ (IDMZ). Not part of the original numbering, but the most important addition the security community made. A buffer zone, between OT and IT, that brokers every necessary exchange so that the two domains never talk directly.
- Level 4 — business/site logistics. Site-level IT: scheduling, manufacturing-execution systems, file servers. The bottom of the IT domain.
- Level 5 — the enterprise. Corporate IT, email, the internet, everything in the first thirty-two chapters of this book.
The defensive heart of the model is a single rule: the IT domain (levels 4–5) and the OT domain (levels 0–3) must never communicate directly. Every legitimate need to move data between them — and there are many: the historian's readings must reach business reporting, antivirus signatures and patches must reach the OT hosts that can safely take them, an engineer must occasionally connect remotely — is satisfied through the IDMZ. You will recognize the IDMZ as the same idea as the network DMZ from Chapter 6, applied to the most consequential boundary in the enterprise. In a classic IT DMZ, you place internet-facing servers in a buffer so the internet never touches your internal network directly. In the IDMZ, you place brokers — a jump host for remote access, a replica of the data historian that business systems read instead of reaching into OT, a patch-and-antivirus relay — so that the OT network never touches the IT network directly.
🔗 Connection: The IDMZ is the network DMZ of Chapter 6 with the stakes raised. There, the DMZ kept the hostile internet away from internal servers. Here, it keeps a compromised IT network away from the controllers that move physical things. The reason Colonial Pipeline's operators could not trust their boundary — and so shut the pipeline down — is precisely the failure mode this level 3.5 broker exists to prevent: a clean, auditable, choke-pointed separation that lets you say with confidence, "the malware on the business network cannot have reached the process network, because there is no path between them that does not pass through controls I can inspect."
Why does this matter so much? Because of the protocol problem from §33.2: on the OT network, reachability is control. The Purdue model is a strategy for ensuring that the only things that can reach a PLC are the things that should — the HMI and SCADA above it, not a ransomware worm spreading laterally from a phished laptop on level 5. Segmentation in IT limits a breach's blast radius; segmentation in OT can be the difference between a contained incident and a physical disaster.
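The single rule of the model, that levels 4–5 and levels 0–3 never talk directly, is simple enough to check mechanically. The sketch below assumes you already have a list of observed or configured flows with a Purdue level assigned to each endpoint; the Flow type, the host names, and the function names are hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """A connection between two assets, each tagged with its Purdue level."""
    src: str
    src_level: float
    dst: str
    dst_level: float


def crosses_directly(flow: Flow) -> bool:
    """True when one endpoint is in the OT domain (levels 0-3) and the other
    is in the IT domain (levels 4-5), so the flow skipped the IDMZ (3.5)."""
    low = min(flow.src_level, flow.dst_level)
    high = max(flow.src_level, flow.dst_level)
    return low <= 3 and high >= 4


def violations(flows):
    """Return every flow that bridges IT and OT without a broker."""
    return [f for f in flows if crosses_directly(f)]


flows = [
    Flow("historian", 3, "historian-replica", 3.5),    # OT to IDMZ: allowed
    Flow("reporting-app", 4, "historian-replica", 3.5),  # IT to IDMZ: allowed
    Flow("hmi-01", 2, "plc-07", 1),                    # inside OT: allowed
    Flow("office-laptop", 5, "plc-07", 1),             # IT straight to OT
]
for f in violations(flows):
    print(f"direct IT/OT flow: {f.src} -> {f.dst}")
```

A flow with one endpoint at level 3.5 never trips the check, which mirrors the design intent: the IDMZ is the only place where the two domains are allowed to meet. The check says nothing about whether a brokered flow is itself appropriate; that still needs review.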
A worked example: mapping Meridian's facilities to the Purdue model
Meridian is a bank, not a factory, but its data center has a building management system, and an OT defender's first job is always the same: take an inventory of assets and place each one in the correct Purdue level. Sam Whitfield, Meridian's security engineer, sits down with the facilities team and produces this mapping for the data center's environmental controls.
| Asset | What it does | Purdue level | Reasoning |
|---|---|---|---|
| Temperature/humidity sensors in the server hall | Measure conditions | Level 0 | Field devices touching the physical environment |
| Cooling/CRAC actuators, damper motors | Adjust airflow and cooling | Level 0 | Actuators in the physical process |
| BMS controllers (regulate CRAC, power, dampers) | Run the control logic | Level 1 | Programmable controllers driving level-0 devices |
| Facilities HMI (the screen in the ops room) | Operators watch/adjust the environment | Level 2 | Human-machine interface for the area |
| BMS supervisory server + historian | Centralizes and logs all facility data | Level 3 | Site-operations supervisory layer |
| Facilities remote-access jump host | Vendor logs in here to service the BMS | Level 3.5 (IDMZ) | Broker for IT-originated access into OT |
| Facilities ticketing / work-order system | Tracks maintenance, in corporate IT | Level 4 | Business logistics, IT domain |
| Corporate email, AD, internet | Everything else | Level 5 | Enterprise IT |
Having placed the assets, Sam can see the boundary that matters and the violation that is almost certainly already present. The textbook design says the BMS vendor, when it needs to service the supervisory server, should connect to the level-3.5 jump host and from there reach the BMS — never directly from the internet to the level-3 server. When Sam checks, he finds what OT assessments almost always find: the BMS vendor has a direct remote-access account into the supervisory server, configured years ago for convenience, bypassing the IDMZ entirely. That single path is an IT-to-OT bridge — exactly the class of finding §33.6 will show has enabled real incidents. The remediation is the rest of this chapter: broker that access through the IDMZ, segment the BMS off the corporate network, and watch the boundary passively. The Project Checkpoint encodes the placement step — purdue_zone(asset) — so the team can run the inventory programmatically and flag the boundary crossings.
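As a rough idea of what the placement step could look like, the sketch below maps each asset in Sam's table to a level by its role. The Project Checkpoint's actual purdue_zone(asset) is not shown in this section, so the asset record shape, the role names, and the domain helper here are assumptions made for illustration.

```python
# Role-to-level mapping derived from the Meridian table above.
# The role strings are hypothetical labels, not a standard vocabulary.
ROLE_TO_LEVEL = {
    "sensor": 0,
    "actuator": 0,
    "controller": 1,
    "hmi": 2,
    "supervisory_server": 3,
    "remote_access_broker": 3.5,
    "business_app": 4,
    "enterprise_it": 5,
}


def purdue_zone(asset: dict) -> float:
    """Return the Purdue level for an asset record with a 'role' key."""
    role = asset["role"]
    if role not in ROLE_TO_LEVEL:
        # Refuse to guess: an unplaced asset needs a human decision.
        raise ValueError(f"unknown role {role!r} for asset {asset.get('name')!r}")
    return ROLE_TO_LEVEL[role]


def domain(level: float) -> str:
    """Collapse a level into the three zones that matter at the boundary."""
    if level <= 3:
        return "OT"
    if level < 4:
        return "IDMZ"
    return "IT"


INVENTORY = [
    {"name": "server-hall temperature sensor", "role": "sensor"},
    {"name": "CRAC actuator", "role": "actuator"},
    {"name": "BMS controller", "role": "controller"},
    {"name": "facilities HMI", "role": "hmi"},
    {"name": "BMS supervisory server", "role": "supervisory_server"},
    {"name": "facilities jump host", "role": "remote_access_broker"},
    {"name": "work-order system", "role": "business_app"},
    {"name": "corporate email", "role": "enterprise_it"},
]

for asset in INVENTORY:
    level = purdue_zone(asset)
    print(f"{asset['name']}: level {level} ({domain(level)})")
```

Raising an error on an unknown role, rather than defaulting to some level, is a deliberate choice in this sketch: an asset nobody can place is itself a finding worth surfacing.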
⚖️ Authorization & Ethics: OT assessment carries a risk that IT assessment does not. A port scan that is routine on a corporate network can crash a fragile PLC or RTU, and a crashed controller can stop or destabilize a physical process. Never run active scanning, vulnerability probes, or "just a quick nmap" against an OT network without explicit authorization from the process owner and, in most cases, only during a planned maintenance window with engineers present. The default OT discovery method is passive (§33.5). In OT, an unauthorized scan is not just a policy violation — it can be a safety incident.
🔄 Check Your Understanding:
1. A data historian that business analysts want to query lives at Level 3. The textbook design does not let the Level-4 business systems reach into Level 3 to read it. How does the IDMZ solve this without breaking the rule that IT and OT never talk directly?
2. Why is "reachability equals control" the reason the Purdue model treats segmentation as the primary control, rather than, say, host hardening?
Answers
1. A replica of the historian is placed in the IDMZ (Level 3.5). The real Level-3 historian pushes its data up to the replica; the Level-4 business systems read from the replica. IT systems get the data they need, but no IT system ever opens a connection into the OT domain; the only flow across the boundary is OT→IDMZ, brokered and inspectable.
2. Because most OT protocols do not authenticate commands, any host that can reach a controller can command it; therefore controlling who can reach what (segmentation) directly controls who can command the process. Host hardening helps but cannot fix an unauthenticated protocol, and many OT hosts (PLCs) cannot be hardened or patched at all, so the network boundary is the control that actually constrains the attacker.
33.4 Why you can't just patch: OT's broken IT reflexes
A defender arriving from the IT world brings a set of reflexes that are correct in IT and dangerous in OT. The most important section of this chapter, practically speaking, is the catalog of those reflexes and what to do instead — because the gap between "what I would do to a server" and "what I can do to a PLC" is where OT security actually lives.
Reflex 1: patch promptly. In IT, an unpatched system is a liability and the answer is to patch it, fast, on a risk-based schedule. In OT, prompt patching is frequently impossible and sometimes unsafe, for several converging reasons. The vendor may not have released a patch at all (a PLC's firmware may be frozen). Applying a patch may require taking the controller — and therefore the process — offline, which on a continuous process is a scheduled outage that happens once or twice a year, if that. The patch may not be validated against the specific process; in regulated industries (pharmaceuticals, for instance) changing a validated system can require re-certification. And the act of patching or rebooting an HMI mid-process can blind the operators at exactly the wrong moment. The consequence is that OT systems routinely run with known, unpatched vulnerabilities for years — not through negligence, but because the cure can be worse than the disease.
So what do you do about a vulnerability you cannot patch? You apply compensating controls — measures that reduce the risk of a vulnerability you cannot directly remove. The OT defender's compensating-control toolkit is, in order of leverage:
- Segment harder. If the vulnerable PLC cannot be patched, ensure that the only systems that can reach it are the ones that must (the principle of §33.3). A vulnerability you cannot exploit because you cannot reach the target is a vulnerability largely neutralized.
- Monitor passively. If you cannot prevent exploitation, ensure you would detect it (§33.5). A change to a PLC's program, an unexpected command, a new device on the control network — all are visible without touching the controller.
- Restrict and broker access. Funnel all human and vendor access through the IDMZ jump host (§33.3), with strong authentication on the broker (where modern MFA can live, even though the PLC cannot enforce it) and session recording.
- Patch what you safely can, in maintenance windows. The HMIs and SCADA servers — the IT-like hosts — often can be patched during planned outages, on a slower cadence than IT, tested first. Prioritize ruthlessly using the same risk thinking from Chapter 1: which unpatched flaw is actually reachable and consequential?
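The toolkit above can be sketched as a small triage routine. This is a minimal illustration, not a prescribed method: the `Finding` fields, the 1-to-5 consequence scale, the zone labels, and the scoring formula (consequence multiplied by the number of zones that can reach the asset) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    asset: str
    consequence: int          # 1 (nuisance) to 5 (safety impact); assumed scale
    reachable_from: set       # zones that can currently open a connection to the asset
    required_peers: set       # zones that genuinely must reach it
    patch_available: bool
    patch_needs_outage: bool


def priority(f: Finding) -> int:
    """Illustrative score: flaws that are reachable AND consequential rank first."""
    return f.consequence * len(f.reachable_from)


def compensating_controls(f: Finding, in_maintenance_window: bool = False) -> list:
    """List controls in the same order of leverage as the text."""
    actions = []
    excess = f.reachable_from - f.required_peers
    if excess:  # segment harder: remove reachability nobody needs
        actions.append("segment: block " + ", ".join(sorted(excess)))
    actions.append("monitor: passive alerts on program changes and new peers")
    actions.append("broker: vendor and admin access only via the IDMZ jump host")
    if f.patch_available and (in_maintenance_window or not f.patch_needs_outage):
        actions.append("patch: apply the tested fix")
    return actions


# Hypothetical findings for two Meridian assets.
plc = Finding("bms-plc-3", 5, {"L2", "L3", "L4"}, {"L2"}, False, True)
hmi = Finding("facilities-hmi", 3, {"L3"}, {"L3"}, True, True)

for f in sorted([hmi, plc], key=priority, reverse=True):
    print(f.asset, priority(f), compensating_controls(f))
```

The unpatchable PLC ranks first because it is both consequential and over-exposed, and its first action is segmentation; the HMI only gains a patch action when a maintenance window is open.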
🔗 Connection: This is the patch-management discipline of Chapter 11 turned on its head by constraints. There, the lesson was that host patching is a non-negotiable hygiene control and the challenge is making it actually happen. Here, patching is often unavailable, and the discipline shifts to compensating controls — segmentation and monitoring doing the work that a patch would do in IT. The risk-based prioritization from Chapter 1 is what tells you which of the few patches you can apply are worth the outage to apply.
Reflex 2: scan aggressively to find your assets. In IT, you run an authenticated vulnerability scanner across the estate and accept that the occasional fragile device might hiccup. In OT, an active scan can crash a controller and stop a process — the active-scanning prohibition is so important it has its own ethics callout above. The OT alternative is passive discovery: you watch the network traffic and learn what is there from what is talking, never sending a probe of your own. We build this in §33.5.
Reflex 3: install an endpoint agent everywhere. In IT, you deploy EDR (Chapter 11) to every host for detection and response. In OT, you often cannot: the HMI may be running an operating system the agent does not support, the vendor may void support if you install unapproved software, and the agent's CPU and disk activity may introduce latency a real-time system cannot tolerate. Where agents are supported and vendor-approved, use them — but assume many OT hosts will never have one, and lean on network-based detection instead.
Reflex 4: force strong authentication and rotate credentials. In IT, you mandate MFA and rotate passwords. In OT, the devices often cannot cooperate: a 1990s PLC has no concept of a user account; a SCADA system may have a single shared operator login that cannot be changed without a vendor engagement; and hard-coded and default credentials are endemic in industrial gear (the same default-credential problem you met for IoT in Chapter 14, aged by another decade). The OT answer is to put the strong authentication where it can live, on the IDMZ jump host that brokers all access, and to compensate for the weak authentication on the devices themselves with segmentation and monitoring.
Above all of these sits a control category that has no IT equivalent and that the defender must understand even though they will rarely configure it directly: the safety instrumented system (SIS), an independent control system whose sole job is to bring a process to a safe state when conditions become dangerous, operating separately from the normal control system so that a failure (or compromise) of the latter does not disable the former. A SIS is the last line of physical defense: if the regular PLC commands a reactor toward an unsafe pressure, the SIS (a separate controller, on separate hardware, with separate logic) independently trips the process to a safe shutdown. The SIS exists for safety, not security, but it is profoundly relevant to security for two reasons. First, it is the reason that a compromised control system does not automatically equal a catastrophe: a well-designed plant assumes the control system can fail and provides an independent safety layer. Second, and chillingly, one of the most dangerous OT cyberattacks yet discovered (the Triton/Trisis malware of 2017, which we examine in §33.6) targeted the SIS itself, attempting to disable the very safety layer that exists to prevent disaster. The defensive principle is absolute: the safety instrumented system must be the most isolated, most jealously segmented system in the entire environment, because it is the control that turns an attack on the process into a safe shutdown rather than an explosion.
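The independence property can be made concrete with a toy model. The sketch below is an illustration only: the pressure values, the trip limit, and the function names are invented, and a real SIS is engineered hardware, not a Python function. The point it shows is that the safety function reads its own sensor against a fixed limit and never consults the control system's setpoint.

```python
def control_action(setpoint: float, pressure: float) -> str:
    """Normal control logic: raise pressure until the setpoint is reached."""
    return "open_feed" if pressure < setpoint else "hold"


def sis_trip(own_sensor_pressure: float, trip_limit: float) -> bool:
    """Safety function: trips on its own reading and a fixed limit.

    It takes no input from the control system, so a bad setpoint cannot move it.
    """
    return own_sensor_pressure >= trip_limit


def run(setpoint: float, trip_limit: float = 12.0, start: float = 8.0, steps: int = 10):
    """Simulate a few scan cycles; all units and values are hypothetical."""
    pressure = start
    for step in range(steps):
        if sis_trip(pressure, trip_limit):
            return ("TRIPPED", step, pressure)
        if control_action(setpoint, pressure) == "open_feed":
            pressure += 1.0
    return ("RUNNING", steps, pressure)


print(run(setpoint=10.0))   # legitimate setpoint below the trip limit
print(run(setpoint=50.0))   # setpoint an attacker might write to the controller
# ('RUNNING', 10, 10.0)
# ('TRIPPED', 4, 12.0)
```

With the manipulated setpoint the control logic keeps opening the feed, and the separate trip check stops the run at the limit. That is also why an attacker who can alter `sis_trip` itself, as Triton attempted, removes the protection entirely.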
📟 War Story: A constructed but representative composite. An enthusiastic new security hire at a water utility, told to "get our asset inventory up to date," runs a standard network discovery scan across what they believe is a normal subnet. It is the SCADA network. Two RTUs at remote pumping stations, running firmware older than the analyst, stop responding to the malformed traffic and require a manual reset by a technician who has to drive forty minutes to the site. No water was contaminated and no one was hurt — this time — but the operators lost telemetry from two stations for two hours and learned, the hard way, that the IT playbook does not transfer. The lesson the analyst should have learned on day one, and now will never forget: in OT, discovery is passive by default, and you do not touch the control network without the engineers' explicit blessing.
🧩 Try It in the Lab: You cannot (and must not) practice on real OT, but you can build the mindset safely. Take any device on your own home network — a router, a smart plug, a printer — and write a one-page "OT-style" risk note for it: What does it control physically (if anything)? Can it be patched, and what breaks if you reboot it mid-use? What is its default credential situation? If you could not patch it, what compensating controls (segment it onto a guest VLAN, monitor its traffic, restrict what can reach it) would you apply instead? This is exactly the reasoning an OT defender does, scaled down to a sandbox you own.
🔄 Check Your Understanding: 1. A critical PLC has a publicly known, unpatched vulnerability, and the vendor will not have a fix for a year. Name three compensating controls you would apply in the meantime. 2. What is a safety instrumented system, and why is it both the reason a control-system compromise need not be catastrophic and a uniquely high-value target for an attacker?
Answers
1. (a) Segment the PLC so only the HMI/SCADA that must reach it can, neutralizing the vulnerability by removing reachability; (b) deploy passive monitoring to detect any exploitation attempt or unexpected change to the controller; (c) broker and strongly authenticate all access through the IDMZ jump host, with session recording. (Patching the surrounding IT-like hosts in a maintenance window is also valid.)
2. A SIS is an independent control system that forces the process to a safe state when conditions become dangerous, on separate hardware and logic from the normal control system. It means a compromised or failed control system does not automatically cause physical disaster, because the SIS independently trips the process safe. That is exactly why it is the ultimate target: an attacker who can disable the SIS removes the last barrier between an attack on the process and a real-world catastrophe (the Triton goal, §33.6).
33.5 Monitoring OT passively: seeing without touching
If you cannot patch, cannot scan, and cannot put an agent on most hosts, how do you defend? You watch. Passive OT monitoring is the practice of detecting threats and building asset visibility purely by observing a copy of the network traffic, never by sending packets of your own, so that the fragile, real-time control systems are never touched or disturbed. It is the single most important detection capability in OT, and it is the OT counterpart to the network monitoring you met in Chapter 6 and the SIEM-fed detection program you will recognize from the security-operations chapters: same idea, adapted to a network where you must be a silent observer.
The mechanism is a network tap or a switch's SPAN/mirror port — a hardware or switch feature that copies all traffic passing a point on the network and sends the copy to a monitoring sensor. The sensor receives a perfect duplicate of every packet but is electrically incapable (with a true tap) of injecting anything back. That property is the whole point: a passive sensor cannot crash a PLC because it cannot send to a PLC. You place sensors at the boundaries that matter — above all at the IDMZ, where any IT-to-OT crossing must appear — and inside the OT network to see controller-to-controller and HMI-to-controller traffic.
What does passive monitoring give you? Three things, in increasing order of sophistication:
1. Asset inventory, free and safe. By watching traffic, the sensor learns what devices exist, what they are (it can often fingerprint a PLC's make and model from how it speaks), how they normally communicate, and with whom. This is how you get the inventory that §33.3's Purdue mapping requires without the scan that §33.4 forbids. The first time you deploy passive monitoring on a network everyone swore was air-gapped, the inventory it produces is frequently a revelation, including devices nobody knew were there and connections nobody knew existed.
2. Baseline and anomaly detection. OT networks are, compared to IT networks, gloriously predictable. The same HMIs talk to the same PLCs using the same commands at the same intervals, day after day, because the process does the same thing. This determinism — a liability for patching — is a gift for detection. You learn the normal pattern (the baseline) and then anything that deviates is worth an alert: a new device appearing, an HMI suddenly issuing a command it never issues, a connection from the IT side that should not exist, an engineering-workstation download to a PLC outside a maintenance window. In IT, anomaly detection drowns in the noise of human unpredictability; in OT, the signal-to-noise ratio is so much better that "this has never happened before" is genuinely actionable.
3. Protocol-aware threat detection. A sensor that understands industrial protocols can go further: it can flag a Modbus command to write a value outside the normal operating range, an unauthorized attempt to change a PLC's program, or traffic matching known ICS-attack signatures. This is where vendor OT-monitoring platforms and the open-source ICS detection ecosystem live, and where the MITRE ATT&CK for ICS knowledge base (a catalog of adversary techniques specific to control systems, analogous to the enterprise ATT&CK you would have met in the threat-detection chapters) becomes the map for "what should I be able to detect?"
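The three capabilities above can be sketched over already-captured flow records. This is a simplified illustration under stated assumptions: the record shape `(src, dst, protocol, function)`, the function labels, the device names, and the register limits are all hypothetical, and a real sensor would derive them by decoding mirrored traffic, not receive them as tuples.

```python
from collections import defaultdict


def learn(flows):
    """Build an inventory and a baseline purely from observed traffic."""
    inventory = defaultdict(set)      # device -> protocols it was seen speaking
    baseline = set()                  # every (src, dst, protocol, function) seen
    for src, dst, proto, func in flows:
        inventory[src].add(proto)
        inventory[dst].add(proto)
        baseline.add((src, dst, proto, func))
    return dict(inventory), baseline


def detect(flow, inventory, baseline):
    """Flag first occurrences: OT traffic is regular enough that 'new' is a signal."""
    src, dst, proto, func = flow
    alerts = [f"NEW_DEVICE {dev}" for dev in (src, dst) if dev not in inventory]
    if flow not in baseline:
        alerts.append(f"NEW_FLOW {src}->{dst} {proto}/{func}")
    return alerts


def check_write(dst, register, value, limits):
    """Protocol-aware check: a write outside the engineers' stated operating range."""
    low, high = limits[(dst, register)]
    if low <= value <= high:
        return None
    return f"OUT_OF_RANGE {dst} reg={register} value={value} allowed={low}..{high}"


# Hypothetical learning-period traffic and operating limits.
training = [
    ("hmi-1", "plc-3", "modbus", "read_holding"),
    ("hmi-1", "plc-3", "modbus", "write_single"),
]
inventory, baseline = learn(training)
limits = {("plc-3", 40001): (18, 27)}

print(detect(("hmi-1", "plc-3", "modbus", "read_holding"), inventory, baseline))
print(detect(("laptop-9", "plc-3", "modbus", "write_multiple"), inventory, baseline))
print(check_write("plc-3", 40001, 95, limits))
```

Nothing in the sketch transmits: every function only reads records that were copied off the wire, which is the property that makes the approach safe on a control network.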
A worked example: a passive detection at Meridian's IDMZ
Return to Meridian's data-center BMS. Sam has deployed a passive sensor on a SPAN port at the IDMZ boundary (Purdue level 3.5) and let it learn the baseline for two weeks. The normal pattern is simple: the BMS supervisory server (Level 3) pushes historian data up to its IDMZ replica every 60 seconds, and the facilities vendor connects to the IDMZ jump host on Tuesday mornings for routine maintenance. Nothing on the corporate network (Levels 4–5) ever opens a connection into the OT domain — that is the rule the IDMZ enforces.
One Wednesday at 02:14, the sensor emits an alert. Here is the (illustrative) detection, as it would appear in the monitoring console and forwarded to Meridian's SIEM:
ALERT  ot-sensor-idmz  severity=HIGH  rule=IT_TO_OT_BOUNDARY_VIOLATION
  time   2026-06-10T02:14:07Z
  src    10.20.5.66  (zone=LEVEL_4_BUSINESS, host=fileserver-03)
  dst    10.50.3.10  (zone=LEVEL_3_OT, host=bms-scada-01)
  proto  SMB/445
  note   New flow: a Level-4 host initiated a connection INTO the OT domain.
         No baseline precedent. Direction crosses the IT/OT boundary.
         Expected OT<->IT exchange is brokered via IDMZ replica only.
Read that the way a defender reads any indicator. The direction is the alarm: a business-network file server (Level 4) reached into the OT domain (Level 3), the one thing the Purdue model says must never happen. The source host, fileserver-03, is an ordinary corporate server — which is exactly what a ransomware worm or a lateral-movement tool would be running on after a phished laptop gave it a foothold on the IT side. The sensor did not need to know what the traffic was; it needed only to know that this direction, this crossing, has no precedent and violates the boundary. In Meridian's case the investigation finds the cause is the misconfigured vendor path Sam discovered in §33.3 — a legacy direct route that bypassed the IDMZ, now being traversed by an automated process — and the response is to kill the path, confirm the OT side is clean, and finally broker that access properly. Had this been the early move of a real intrusion, the passive sensor would have caught the boundary crossing before it reached a controller. That is the win condition: detect the crossing at the boundary, not the damage at the process.
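A direction-only rule like the one that fired here can be sketched in a few lines. This is a minimal illustration, not the sensor's actual logic: the subnet-to-zone map is an assumption (only the two addresses in the alert come from the example), and the IDMZ subnet and its addresses are invented for the demonstration.

```python
import ipaddress

# Hypothetical subnet-to-zone map for Meridian; the IDMZ range is assumed.
ZONES = [
    (ipaddress.ip_network("10.20.0.0/16"), 4.0, "LEVEL_4_BUSINESS"),
    (ipaddress.ip_network("10.50.35.0/24"), 3.5, "LEVEL_3_5_IDMZ"),
    (ipaddress.ip_network("10.50.3.0/24"), 3.0, "LEVEL_3_OT"),
]


def zone_of(ip: str):
    """Return (level, zone name) for an address, or (None, 'UNKNOWN')."""
    addr = ipaddress.ip_address(ip)
    for network, level, name in ZONES:
        if addr in network:
            return level, name
    return None, "UNKNOWN"


def boundary_violation(src_ip: str, dst_ip: str) -> bool:
    """True when an IT host (level >= 4) initiates a flow to an OT host (level <= 3).

    The payload is never inspected: the direction alone is the indicator.
    Unknown addresses return False here; they are a separate new-device finding.
    """
    src_level, _ = zone_of(src_ip)
    dst_level, _ = zone_of(dst_ip)
    if src_level is None or dst_level is None:
        return False
    return src_level >= 4 and dst_level <= 3


print(boundary_violation("10.20.5.66", "10.50.3.10"))   # the flow in the alert
print(boundary_violation("10.50.3.10", "10.50.35.20"))  # OT pushes to the IDMZ replica
print(boundary_violation("10.20.5.66", "10.50.35.20"))  # IT reads from the IDMZ replica
# True
# False
# False
```

The two brokered flows pass because each stops at the IDMZ; only the connection that skips it is flagged.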
🛡️ Defender's Lens: Notice how much this OT detection resembles, and how much it differs from, an IT one. Like an IT detection, it is built from telemetry, alerts on deviation from a baseline, and feeds a central SIEM for correlation and response. Unlike an IT detection, the baseline is trustworthy enough that a single first-occurrence is actionable — there is no sea of false positives from normal human behavior to wade through — and the highest-value rule is not "malware signature matched" but "traffic crossed a boundary it should never cross." In OT, the boundary itself is the detector. Build your highest-fidelity alerts around the Purdue boundaries, especially the IT/OT line at the IDMZ.
⚠️ Common Pitfall: Deploying passive monitoring and then tuning it like an IT IDS — suppressing "noisy" alerts to cut volume. In OT the volume is low and the noise is rare; an alert you are tempted to suppress as a nuisance ("the engineering workstation downloaded to a PLC again") may be the only signal you will ever get that someone is reprogramming a controller. Investigate OT anomalies; do not reflexively tune them away. The discipline that reduces alert fatigue in a busy IT SOC is the wrong instinct in a quiet OT network where every anomaly is meaningful.
🔄 Check Your Understanding: 1. Why can a true network tap never crash a PLC, and why does that property make it the default OT discovery and monitoring method? 2. The Meridian alert fired on the direction of a connection, not on its content. Explain why direction across the IT/OT boundary is such a high-fidelity indicator in an OT environment.
Answers
1. A true tap is a one-way copy of traffic: it duplicates packets to a sensor but is physically unable to inject packets back onto the monitored link, so it cannot send anything to a PLC and therefore cannot disturb it. Because OT controllers can be crashed by unexpected traffic and active scanning is unsafe, an observation method that cannot transmit is the only universally safe way to discover and monitor OT assets.
2. The Purdue model dictates that the IT and OT domains never communicate directly; all legitimate exchange is brokered through the IDMZ. Therefore any direct IT→OT connection is, by design, illegitimate regardless of its content, making the mere direction of the flow a near-certain indicator of either a serious misconfiguration or an active intrusion, with very few benign explanations to generate false positives.
33.6 Lessons from real OT attacks: what the defender takes away
The OT-security field is small, and much of what it knows it learned from a handful of real attacks. We study them here strictly from the defender's seat and strictly at the level of public fact — what crossed which boundary, what control would have changed the outcome — and we do not reproduce any attack detail that would help reproduce one. The point is the lesson, not the technique.
Stuxnet (discovered 2010) — the air gap is not a control. Stuxnet was a sophisticated worm, widely attributed in public reporting to a nation-state effort, that targeted specific industrial controllers at a uranium-enrichment facility and caused physical damage to centrifuges while reporting normal readings to the operators. The single most important defensive lesson is this: the facility was air-gapped — physically isolated from the internet — and it was compromised anyway, reportedly via removable media (USB) that carried the malware across the gap. The air gap is the belief that a network is safe because it has no connection to other networks; Stuxnet proved that a determined adversary defeats it through the paths an air gap does not cover — removable media, vendor laptops, the supply chain, a contractor's connection. The defensive takeaway is not "air gaps are useless"; isolation is genuinely valuable. The takeaway is that an air gap is a boundary to be monitored and enforced, not a magic absence of risk — and that defenders who treat "we're air-gapped" as the end of the security conversation are defending the one assumption the most consequential OT attack in history specifically broke. Treat every air gap as porous: monitor it (§33.5), control removable media and vendor access, and never let it justify leaving the controllers undefended.
The Ukraine power-grid attacks (2015 and 2016) — IT compromise becomes OT impact. In December 2015, attackers caused a power outage affecting roughly a quarter of a million people in Ukraine by, in public summary, first compromising the IT networks of electricity distribution companies (through phishing and credential theft), then using that foothold to reach the OT networks and operate the breakers — remotely opening them to cut power — while hampering recovery. A second, more automated attack followed in 2016. The defensive lessons are a direct endorsement of everything in this chapter. First, the path was IT-to-OT: the attackers did not start in the substation; they started in the corporate email inbox and crossed a boundary that should have stopped them — exactly the crossing the IDMZ and passive boundary monitoring exist to prevent and detect. Second, operability survived because of manual fallback: Ukrainian operators were able to restore power by switching to manual operation at the substations, a resilience that more heavily automated grids might lack — a reminder that the ability to run the process without the compromised digital control system is itself a defensive asset. The Ukraine attacks are the clearest real-world demonstration of why the IT/OT boundary is the line a defender guards most carefully: an ordinary IT intrusion, allowed to cross, became people sitting in the cold and dark.
Triton / Trisis (discovered 2017) — the attack on safety itself. Triton, also called Trisis, was malware discovered at a petrochemical facility (public reporting places it in the Middle East) that specifically targeted a safety instrumented system — the independent safety controller of §33.4. The malware sought to reprogram the SIS, and the incident came to light because it accidentally tripped the safety system into a safe shutdown rather than achieving its goal. Sit with the implication: the attackers were attempting to disable the very system whose job is to prevent a physical catastrophe when the normal controls fail. Had they succeeded and then induced an unsafe process condition, the safety layer designed to force a safe shutdown would not have acted. Triton is the reason §33.4 insists that the SIS be the most isolated system in the environment. The defensive lessons: the safety system is a security target, not just a safety device; it must be segmented even from the rest of the OT network, with its own monitoring; and any unexpected interaction with safety controllers — any attempt to change their logic, any unplanned trip — is a maximum-severity event that demands immediate investigation, never a routine acknowledgment.
Colonial Pipeline (2021) — you don't have to touch the OT to take it down. We opened with this incident; now we close the loop with its lesson stated plainly. The ransomware reached only the IT network, via a single VPN account that lacked multi-factor authentication. The pipeline's control systems were never directly compromised. Yet the pipeline stopped — because the company could not bill, and could not be confident the boundary had held, and so made the safety-and-prudence decision to shut the physical process down. The defensive lessons are threefold and they tie the chapter together. First, basic IT hygiene is OT security: a missing multi-factor-authentication prompt on one VPN account led to a national fuel emergency, because OT and IT are not separate worlds but a single attack surface with a boundary in the middle. Second, the strength and clarity of the IT/OT boundary determines your options under attack: an organization that can prove its OT is isolated from a compromised IT network may be able to keep the process running safely; one that cannot must assume the worst and stop. Third, the impact of an IT breach on an OT-dependent business includes the loss of the business processes around the OT — billing, scheduling, dispatch — even when the physical process is untouched. Colonial is the anchor for this chapter precisely because it is not an exotic OT attack. It is an ordinary IT breach whose consequences were physical, which is the far more common and more teachable shape of the threat.
Across all four, one pattern dominates and it is the thesis of this chapter: the boundary between IT and OT is where critical-infrastructure incidents are won or lost. Stuxnet crossed an air gap. Ukraine and Colonial crossed the IT/OT line from the business network. Triton, once inside, went after the safety layer. A defender who segments that boundary rigorously (the Purdue model and the IDMZ), monitors it passively for any crossing, and refuses to treat any form of isolation as a substitute for vigilance has internalized the entire field's hard-won lessons.
🔗 Connection: The ransomware-as-a-service business model behind Colonial Pipeline is dissected in Chapter 35, which this chapter sets up; Colonial also recurs in the book's capstone synthesis. Here, our lens was narrow and specific: the IT/OT boundary and what the OT defender does about it. Notice, too, that this is the same lesson as the entire book in miniature — the asymmetry introduced in Chapter 1 (one missing authentication prompt, one national emergency) and the assume-breach principle (design as if the IT side is already compromised, because at Colonial it was).
🔄 Check Your Understanding: 1. What single, specific defensive lesson does Stuxnet teach about air gaps, and how should it change a defender's behavior? 2. The Ukraine grid attacks and Colonial Pipeline are both, at root, the same story. State that shared story in one sentence, and name the control category that addresses it.
Answers
1. An air gap is not a guaranteed control (Stuxnet crossed a physically isolated network via removable media), so a defender must treat every air gap as porous: monitor and enforce it, control removable media and vendor/contractor access, and never let "we're air-gapped" justify leaving controllers otherwise undefended.
2. Both began as ordinary IT compromises (phishing/credential theft) that were allowed to cross the IT/OT boundary and reach systems that affected a physical process; the control category that addresses it is rigorous IT/OT segmentation with a brokered, monitored boundary: the Purdue model and the IDMZ, backed by passive boundary monitoring.
Project Checkpoint
Meridian is not a utility, but as §33.1 established, it has OT: the building management system that keeps its data center cool and powered. This chapter's increment adds an OT/facilities segmentation plan to the security program and the otsec.py module — beginning with purdue_zone(asset) — to bluekit.
Program increment — OT/facilities segmentation plan. Sam Whitfield documents, for Dana to present, a one-page plan with four parts. (1) Inventory and zone mapping: the table from §33.3 placing every BMS asset in its Purdue level (0 through 5), produced and maintained primarily through passive discovery — no active scans on the facilities network. (2) The boundary: a stated rule that no traffic flows directly between the corporate network (Levels 4–5) and the BMS (Levels 0–3); all necessary exchange — vendor remote access, historian data to reporting — is brokered through an IDMZ (Level 3.5) jump host and historian replica. The immediate remediation is to eliminate the vendor's direct path into the Level-3 supervisory server. (3) Compensating controls: because the BMS controllers cannot take modern patches or MFA, the plan leans on segmentation, IDMZ-brokered and strongly authenticated access, and passive monitoring rather than host-level fixes. (4) Passive monitoring: a sensor on a SPAN/tap at the IDMZ boundary, baselined and feeding Meridian's SIEM, with the highest-severity rule reserved for any IT→OT boundary crossing. This plan slots into the program between the zero-trust roadmap of the prior chapter and the analytics work that follows.
bluekit increment — otsec.py. We turn the Purdue mapping into code. The function classifies an asset into its Purdue zone from a small descriptor and flags whether it sits on the IT/OT boundary — the first programmatic step in the inventory the plan depends on. As always, the code is never executed during authoring; the hand-traced result is in the # Expected output: comment.
# bluekit/otsec.py (Chapter 33 increment)
"""Purdue-model zoning for OT asset inventory.

Classify an asset into a Purdue level (0-5, plus 3.5 IDMZ) and report
which security domain it belongs to. Reachability is control in OT, so
knowing an asset's zone is the first step in defending the boundary.
"""

# Canonical role -> Purdue level. Lower = closer to the physical process.
_ROLE_TO_LEVEL = {
    "sensor": 0, "actuator": 0,                                       # field devices (the metal)
    "plc": 1, "rtu": 1, "controller": 1,                              # basic control
    "hmi": 2, "area_scada": 2,                                        # area supervisory control
    "scada_server": 3, "historian": 3, "eng_ws": 3,                   # site operations
    "idmz_broker": 3.5, "jump_host": 3.5, "historian_replica": 3.5,   # IDMZ
    "mes": 4, "site_file": 4, "scheduling": 4,                        # business logistics (IT)
    "email": 5, "ad": 5, "internet": 5,                               # enterprise (IT)
}


def purdue_zone(asset: dict) -> dict:
    """Return {level, domain, is_boundary} for an OT/IT asset by its role."""
    role = asset.get("role", "").lower()
    if role not in _ROLE_TO_LEVEL:
        raise ValueError(f"unknown role {role!r}; add it to _ROLE_TO_LEVEL")
    level = _ROLE_TO_LEVEL[role]
    domain = "IT" if level >= 4 else ("IDMZ" if level == 3.5 else "OT")
    # The IDMZ is the only place an IT<->OT exchange may legitimately occur.
    return {"level": level, "domain": domain, "is_boundary": level == 3.5}


if __name__ == "__main__":
    assets = [
        {"name": "server-hall-temp", "role": "sensor"},
        {"name": "bms-plc-3", "role": "plc"},
        {"name": "facilities-hmi", "role": "hmi"},
        {"name": "bms-scada-01", "role": "scada_server"},
        {"name": "facilities-jump", "role": "jump_host"},
        {"name": "corp-email", "role": "email"},
    ]
    for a in assets:
        z = purdue_zone(a)
        flag = " <-- IT/OT boundary" if z["is_boundary"] else ""
        print(f"{a['name']:18s} L{z['level']:<3} {z['domain']:4s}{flag}")

# Expected output:
# server-hall-temp   L0   OT
# bms-plc-3          L1   OT
# facilities-hmi     L2   OT
# bms-scada-01       L3   OT
# facilities-jump    L3.5 IDMZ <-- IT/OT boundary
# corp-email         L5   IT
The twenty-odd lines do exactly what the segmentation plan's first step requires: every asset gets a Purdue level, a domain (OT, IDMZ, or IT), and a flag marking the boundary brokers. Run this over a passively built inventory and the assets that should be your only IT/OT bridges light up — and any asset that talks across the boundary without being one of them is a finding, in the spirit of the detection in §33.5. You have written the first line of Meridian's OT defense, and it is the same first line every OT program starts with: know which zone everything is in.
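One way to act on that last point is to run the zone results against observed flows. The sketch below is an assumption-laden illustration, not part of bluekit: `boundary_findings` is a hypothetical helper, the zone dictionaries are written out by hand in the shape `purdue_zone()` returns so the example stands alone, and `fileserver-03` with role `site_file` is an assumed inventory entry.

```python
# Zone results keyed by asset name, in the shape purdue_zone() returns.
# "fileserver-03" (role "site_file", Level 4) is an assumed inventory entry.
ZONES = {
    "bms-scada-01":    {"level": 3,   "domain": "OT",   "is_boundary": False},
    "facilities-jump": {"level": 3.5, "domain": "IDMZ", "is_boundary": True},
    "corp-email":      {"level": 5,   "domain": "IT",   "is_boundary": False},
    "fileserver-03":   {"level": 4,   "domain": "IT",   "is_boundary": False},
}


def boundary_findings(zones, flows):
    """Return observed (src, dst) pairs that join IT and OT with no IDMZ broker between."""
    findings = []
    for src, dst in flows:
        domains = {zones[src]["domain"], zones[dst]["domain"]}
        if domains == {"IT", "OT"}:   # either direction; an IDMZ endpoint never matches
            findings.append((src, dst))
    return findings


# Hypothetical flows from a passively built inventory.
observed = [
    ("bms-scada-01", "facilities-jump"),   # OT to broker: allowed
    ("corp-email", "facilities-jump"),     # IT to broker: allowed
    ("fileserver-03", "bms-scada-01"),     # IT to OT directly: a finding
]
print(boundary_findings(ZONES, observed))
# [('fileserver-03', 'bms-scada-01')]
```

Any pair returned here is a candidate for the same investigation as the §33.5 alert: a path that exists outside the brokers the plan designates.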
Summary
This chapter moved cybersecurity from data to physics — defending the computers that run the physical world.
- Operational technology (OT) controls physical processes; IT moves data. OT's priorities invert IT's: safety first, then availability, then integrity, then confidentiality. "Downtime can kill," so every control is weighed against the risk of stopping or destabilizing the process. Critical infrastructure runs on OT, and even an IT-centric organization has OT in its building management, power, and cooling.
- The ICS stack, bottom to top: field devices (sensors/actuators) → PLCs/RTUs (controllers running real-time logic) → HMIs (operator screens, usually old Windows/Linux) → SCADA/DCS (the supervisory layer). The protocols are old and mostly unauthenticated, the equipment lives for decades, and determinism is sacred — so reachability equals control.
- The Purdue model (levels 0–5, plus the IDMZ at 3.5) is the map: levels 0–3 are OT, 4–5 are IT, and they must never communicate directly — every exchange is brokered through the IDMZ. Segmentation in OT is a safety control, not merely a security one.
- IT reflexes break in OT: you often cannot patch (no fix, no outage window, no validation), cannot actively scan (it crashes controllers), cannot install agents everywhere, and cannot enforce MFA on the devices. The answers are compensating controls — segment harder, monitor passively, broker and strongly authenticate access at the IDMZ, patch only what is safe in maintenance windows. The safety instrumented system (SIS) is the independent last line of physical defense and therefore the most isolated, highest-value asset to protect.
- Passive OT monitoring — observing a tap/SPAN copy of traffic, never transmitting — provides safe asset inventory, anomaly detection against an unusually trustworthy baseline, and protocol-aware threat detection. In OT, the boundary itself is the detector: the highest-fidelity alert is any IT→OT crossing.
- Real incidents teach one lesson: the IT/OT boundary is where critical-infrastructure attacks are won or lost. Stuxnet — an air gap is a boundary to monitor, not a guarantee. Ukraine (2015/2016) — an IT compromise crossed into OT to open breakers; manual fallback saved recovery. Triton (2017) — malware targeted the safety system itself. Colonial Pipeline (2021) — an ordinary IT ransomware breach (one missing MFA prompt) stopped a physical pipeline because the boundary could not be trusted.
Spaced Review
Retrieval practice over recent and older material. Answer without scrolling up.
- (from Chapter 6) Define network segmentation and the distinction between east-west and north-south traffic. In the Purdue model, which boundary's crossing is the highest-fidelity OT indicator, and is that an east-west or a north-south concern relative to the IT/OT line?
- (from Chapter 11) Host patch management is a non-negotiable hygiene control in IT. Give two specific reasons it is frequently impossible in OT, and name the control category that substitutes for it.
- (from this chapter) State the OT priority ordering, and explain why a network tap — not a vulnerability scanner — is the default OT asset-discovery tool.
- (from this chapter) Colonial Pipeline's malware never reached the control systems, yet the pipeline stopped. What does this teach about the relationship between IT hygiene and OT outcomes?
Answers
1. Network segmentation divides a network into isolated zones so that traffic between them can be controlled (Chapter 6); north-south traffic crosses between an inside and an outside (e.g., between trust zones), while east-west traffic moves laterally within a zone. The highest-fidelity OT indicator is any *direct IT→OT crossing* at the IDMZ boundary, which is a north-south concern relative to the IT/OT line; that line is precisely the boundary the Purdue model forbids crossing directly.
2. Any two of: no vendor patch exists (frozen firmware); patching requires a process outage that occurs at most once or twice a year; the patch is not validated against the process and re-validation is required; rebooting an HMI mid-process blinds operators. The substitute is *compensating controls*: segmentation, passive monitoring, and IDMZ-brokered access.
3. Safety → availability → integrity → confidentiality. A tap is the default discovery tool because it only *copies* traffic and cannot transmit, so it can never crash a fragile controller, whereas an active scanner sends probes that can stop a process, which is unacceptable when safety and availability come first.
4. OT and IT are one attack surface with a boundary in the middle; ordinary IT hygiene failures (a missing MFA prompt) can produce physical/operational consequences, so basic IT controls *are* OT security, and the trustworthiness of the IT/OT boundary determines a defender's options under attack.
What's Next
You have now defended the most consequential systems in the book — the ones where a breach is measured in physical harm, not lost records — and you have seen that the threats reaching them are, increasingly, the same threats reaching everything else, merely with higher stakes at the boundary. Chapter 34 turns to a different frontier: the use of artificial intelligence and machine learning in defense, where anomaly detection and behavioral analytics promise to find what signatures miss — and where attackers turn the same tools against you. Then Chapter 35 gathers the threats on the horizon, including the ransomware-as-a-service economy that drove Colonial Pipeline and the post-quantum reckoning coming for the cryptography of Part I. Keep the OT lesson with you as you go: the discipline that defends a power grid — assume the boundary will be tested, watch it relentlessly, and design so that one failure is survivable — is the discipline that defends everything.