Chapter 23: Vulnerability Management: Scanning, Prioritizing, Patching, and the Vulnerability That Never Gets Fixed

Prerequisites

12
2

Learning Objectives

Run the closed-loop vulnerability-management lifecycle from asset discovery through verified remediation.
Scan authenticated and unauthenticated targets without causing outages, and read the output critically.
Prioritize remediation with CVSS, EPSS, KEV, and asset context — and explain why CVSS alone is not a priority.
Set risk-based patch SLAs, run a defensible exception/risk-acceptance process, and detect when exceptions are abused.
Report vulnerability trends and the program metrics a board actually cares about.

In This Chapter

Overview
Learning Paths
23.1 The lifecycle: why "scan and patch" is not a program
23.2 Scanning without breaking things
23.3 Why CVSS isn't priority: EPSS, KEV, and context
23.4 Patch SLAs and the art of the exception
23.5 The vulnerability that never gets fixed
23.6 Reporting and trends: proving the program works
Project Checkpoint
Summary
Spaced Review
What's Next

"There are two kinds of organizations: those that patch, and those that get patched." — security-operations folk wisdom (widely repeated; origin uncertain)

Overview

On the evening of December 9, 2021, the security world stopped what it was doing. A proof-of-concept exploit for a flaw in Apache Log4j — a logging library so common that almost no one knew everywhere it ran — was circulating publicly, and it was about as bad as a vulnerability can be. An attacker who could get a single crafted string into a log message could make the victim's server reach out and run code of the attacker's choosing. No password. No prior access. Just a string in a username field, a chat message, a user-agent header — anywhere that text eventually got logged. It was assigned CVE-2021-44228, scored CVSS 10.0, and nicknamed Log4Shell. Within hours, exploitation was global and indiscriminate.

At Meridian Regional Bank, the call came to SOC Manager Marcus Reyes at 9:40 p.m. His first question was not "how bad is the vulnerability?" — the internet had already answered that. His question was the one every defender asks in the first hour of a crisis like this, and the one this chapter is about: where do we have it, and what do we fix first? Meridian runs hundreds of applications. Some are internet-facing and some are buried three network zones deep. Some process cardholder data and some run the cafeteria menu screens. Log4j could be in any of them — in a vendor appliance no one had the source for, in a Java service Sam Whitfield wrote two years ago, in a dependency of a dependency that nobody had ever looked at. Marcus could not patch everything at once, and the things he patched in the wrong order could cost the bank either an outage or a breach. He had to prioritize under pressure, with incomplete information, while the clock ran.

That is vulnerability management. It is not "run a scanner and fix what's red." Every organization of any size has thousands — often tens or hundreds of thousands — of open vulnerabilities at any moment, and it will never get to zero. The discipline is not elimination; it is triage at scale: continuously finding weaknesses, ranking them by the risk they actually pose, fixing the ones that matter on a defensible schedule, consciously accepting the ones that don't, and proving to an auditor and a board that the whole loop is working. Done well, it is one of the highest-leverage activities in all of security, because the boring truth — confirmed in breach report after breach report — is that most successful intrusions exploit a vulnerability for which a patch had been available for months. The exotic zero-day makes the news. The unpatched known vulnerability makes the breach.

In this chapter, you will learn to:

Run the vulnerability-management lifecycle as a closed loop, not a one-time scan, and understand why the loop never closes for good.
Scan with authenticated and unauthenticated scanners deliberately, and do it without knocking fragile systems over.
Prioritize with the full signal set — CVSS severity, EPSS exploit probability, the KEV catalog of what's actually being exploited, and your own asset context — and articulate precisely why a CVSS score is not a priority.
Set risk-based patch SLAs, run an exception process that doesn't become a loophole, and detect when it is being abused.
Report trends and metrics that tell a true story about whether the program is winning or losing.

Learning Paths

Vulnerability management sits at the intersection of operations, engineering, and governance, so this chapter genuinely serves all four paths — but it serves them differently.

🛡️ SOC Analyst: Live in §23.2 (reading scan output and separating real findings from noise) and §23.3 (prioritization, the skill you'll use on every triage). When a KEV-listed vulnerability lands, you are often the first to know it matters.
🏗️ Security Engineer: Focus on §23.2 (scanning safely, authenticated coverage) and §23.4 (patch deployment, SLAs, and the operational reality of the exception backlog). You own the remediation pipeline.
📋 GRC: §23.4 (the exception/risk-acceptance process is yours to govern) and §23.6 (reporting, trends, and the metrics the board sees) are your home turf. The whole chapter is a control an auditor will test.
📜 Certification Prep: Every term here (CVE, CVSS, EPSS, KEV, the lifecycle, patch management vs. vulnerability management) appears on Security+ and CISSP. The key-takeaways.md file maps them to exam domains and gives you the CVSS-vs-EPSS-vs-KEV distinction examiners love to test.

23.1 The lifecycle: why "scan and patch" is not a program

Newcomers, and unfortunately some vendors, describe vulnerability management as a two-step activity: scan, then patch. That description fails the first time it meets reality, and the failure is instructive. A single scan of Meridian's network returns 41,000 findings. You cannot patch 41,000 things. Some have no patch. Some would cause an outage if patched carelessly. Some are false positives. And tomorrow the scanner finds 3,000 new ones, because new vulnerabilities are disclosed every single day and your environment changes every single day. "Scan and patch" is not a program; it is a way to drown.

Vulnerability management is the continuous, closed-loop process of identifying, evaluating, prioritizing, remediating, and verifying the elimination of security weaknesses across an organization's assets — and reporting on the whole loop so it can be governed and improved. Read that definition twice, because every word earns its place. Continuous and closed-loop: it is a cycle that runs forever, and "fixed" is not real until you have re-scanned and confirmed it. Prioritizing: the heart of the discipline, because you will always have more findings than capacity. Reporting: if you can't show the trend, you can't prove the program works or argue for the budget to improve it.

Here is the loop. Memorize its shape; it is the spine of everything in this chapter.

                    THE VULNERABILITY-MANAGEMENT LIFECYCLE
                    (a loop, not a line — it never stops)

      ┌──────────────────────────────────────────────────────────────┐
      │                                                                │
      ▼                                                                │
 ┌─────────┐    ┌─────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐
 │ 1.      │    │ 2.      │    │ 3.       │    │ 4.       │    │ 5.        │
 │ DISCOVER│ -> │ ASSESS  │ -> │PRIORITIZE│ -> │ REMEDIATE│ -> │ VERIFY    │
 │ assets  │    │ (scan)  │    │ (risk-   │    │ (patch / │    │ (re-scan, │
 │ + attack│    │         │    │  based)  │    │ mitigate │    │ confirm   │
 │ surface │    │         │    │          │    │ / accept)│    │ closed)   │
 └─────────┘    └─────────┘    └──────────┘    └──────────┘    └─────┬─────┘
      ▲                                                              │
      │                          ┌──────────┐                       │
      └──────────────────────────│ 6. REPORT│◄──────────────────────┘
                                 │ + improve│
                                 │ (metrics,│   feeds back: new assets appear,
                                 │  trends, │   SLAs adjust, scope expands,
                                 │  SLAs)   │   exceptions get re-reviewed
                                 └──────────┘

Figure 23.1 — The vulnerability-management lifecycle. The loop is continuous: VERIFY feeds back into DISCOVER because remediation reveals new assets and new exposure, and REPORT drives the program's improvement. "Fixed" is only true after VERIFY confirms it.

Walk the stages once, and we will spend the rest of the chapter going deep on the hard ones.

1. Discover. You cannot manage what you cannot see. This is attack surface management — the continuous discovery and inventory of all the assets an organization exposes, especially the internet-facing ones it may not even know it owns. Forgotten cloud instances, a marketing microsite a vendor spun up, an old VPN appliance: the assets nobody remembers are the ones that breach you, because nobody is patching them. (This builds directly on the asset inventory Meridian started in Chapter 1; an asset inventory you don't maintain is a lie you tell auditors.)

2. Assess. Run scanners against the discovered assets to find known weaknesses. This is the step everyone thinks of, and §23.2 is about doing it without breaking things.

3. Prioritize. Convert the flood of findings into a ranked, defensible list of what to fix first, using risk — not raw severity. This is the intellectual core of the chapter and the subject of §23.3.

4. Remediate. Actually reduce the risk. Note the three options, because beginners forget two of them: patch (apply the fix), mitigate (reduce risk without the fix — a firewall rule, a config change, taking the asset offline), or accept the risk through a governed exception (§23.4). Mitigation matters enormously when no patch exists or when patching must wait.

5. Verify. Re-scan and confirm the finding is actually gone. This is the step that separates real programs from theater. A ticket marked "done" is not a fixed vulnerability; a re-scan that no longer reports it is.

6. Report and improve. Track the metrics and trends (§23.6) that tell you and your board whether you are gaining or losing ground, and feed that back to tune SLAs, scope, and exception reviews.
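
The difference between "patch deployed" and "verified closed" in stages 4 and 5 can be made explicit in the way findings are tracked. The following is a minimal Python sketch under stated assumptions, not a description of any particular ticketing tool: the `Status` values, the `Finding` class, and the `verify` function are hypothetical names, and the re-scan result is made-up sample data.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    PATCH_DEPLOYED = "patch deployed"    # remediation attempted, not yet proven
    VERIFIED_CLOSED = "verified closed"  # a re-scan no longer reports it


@dataclass
class Finding:
    cve: str
    asset: str
    status: Status = Status.OPEN


def verify(finding: Finding, still_detected: set[tuple[str, str]]) -> Finding:
    """Close a finding only if the latest re-scan no longer reports it.

    still_detected holds the (cve, asset) pairs the re-scan still found.
    """
    if (finding.cve, finding.asset) in still_detected:
        finding.status = Status.OPEN  # the patch did not take; reopen
    else:
        finding.status = Status.VERIFIED_CLOSED
    return finding


portal = Finding("CVE-2021-44228", "portal-01", Status.PATCH_DEPLOYED)
logsrv = Finding("CVE-2021-44228", "logsrv-02", Status.PATCH_DEPLOYED)

# Hypothetical re-scan output: the flaw is still detected on logsrv-02 only.
rescan = {("CVE-2021-44228", "logsrv-02")}

for finding in (portal, logsrv):
    verify(finding, rescan)
    print(finding.asset, "->", finding.status.value)
```

The sketch assumes the re-scan actually covered both hosts. A real implementation would also need to confirm coverage, because a host the scanner never reached would otherwise look "closed".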

Hold onto the shape of that loop, because it encodes the chapter's first hard truth: vulnerability management is not a quest to reach zero vulnerabilities. That number is unreachable and, past a point, not even worth chasing. The discipline is risk reduction under permanent scarcity — you will always have more findings than capacity, so the entire game is deciding, defensibly, repeatably, and faster than the attackers, which findings actually matter and fixing those. The teams that drown are the ones still trying to empty the scanner's output; the teams that win are the ones ranking it. Internalize that and you stop counting vulnerabilities and start managing risk.

A quick but load-bearing distinction before we go on. In Chapter 11 you learned patch management as a host activity — the mechanics of testing and deploying updates to a machine. That is one tool inside one stage (Remediate) of this larger loop. Vulnerability management is the program: it decides which patches matter, how fast they must be applied, what to do when no patch exists, and how to prove the whole thing is working. Patch management answers "how do I deploy this update?"; vulnerability management answers "should I, how urgently, and what about the things I can't patch?" The exam loves this distinction, and so will your auditor.

🔄 Check Your Understanding:
1. Why is "scan and patch" an inadequate description of vulnerability management? Name three realities it ignores.
2. A team marks a critical finding "remediated" in its ticket system the moment the patch is pushed. Which lifecycle stage have they skipped, and why does it matter?

Answers

1. It ignores that (a) you have vastly more findings than capacity, so prioritization is mandatory; (b) many findings have no patch, or patching causes outages, so mitigation and acceptance are real options; and (c) the environment and the threat landscape change daily, so it is a continuous loop, not a one-time event.
2. They skipped Verify. Pushing a patch is not proof it applied successfully, on every host, without being rolled back. Only a re-scan confirms the vulnerability is actually closed; "deployed" and "remediated" are different states.

23.2 Scanning without breaking things

The Assess stage runs on vulnerability scanners: tools (Nessus, Qualys, Rapid7 InsightVM, OpenVAS, and the cloud providers' native scanners are common examples) that check assets against a constantly updated database of known weaknesses and report what they find. A scanner is not an exploit tool — it is an auditor with a giant checklist. But how it checks, and from where, changes everything about what it sees and what it risks.

Authenticated vs. unauthenticated scanning

The single most important configuration choice in scanning is whether the scanner has credentials.

An unauthenticated scan probes an asset from the outside, as an anonymous attacker would: it connects to open ports, fingerprints the services, and infers vulnerabilities from banners, behavior, and responses. Its great strength is that it sees what an external attacker sees — your true exposed attack surface. Its weakness is that it is guessing. It sees that port 443 is running a web server that looks like a vulnerable version, but it often cannot be sure, which produces both false positives (it flagged a version that was actually back-ported and patched) and false negatives (the vulnerability is in a component it can't see from outside).

An authenticated scan (also called a credentialed scan) logs into the asset with a privileged read-only account and inspects it from the inside: installed package versions, patch levels, registry settings, running services, configuration files. It is far more accurate and far more complete — it can read the exact version of every installed library, not guess it — and it finds the local vulnerabilities (a missing OS patch, a weak configuration) that no external probe could ever see. Its cost is operational: you must provision, protect, and rotate the scanning credentials (a privileged account that can read every host is itself a juicy target — secure it like the crown jewels it is), and you must accept the scanner is now inside the host.

The rule of thumb every program lands on: authenticated scans for everything you can credential (your managed servers, endpoints, databases) because accuracy and depth matter, and unauthenticated scans from outside to validate your true external exposure — to see yourself as the attacker does. They answer different questions and you need both.

⚠️ Common Pitfall: Trusting an unauthenticated scan's "clean" result. A perimeter scan that finds nothing has told you only that nothing is obviously broken from the outside — it is blind to the missing OS patches, weak local configs, and back-room vulnerabilities that authenticated scanning surfaces. Programs that scan only unauthenticated routinely believe they are far healthier than they are. Conversely, a program that scans only authenticated may miss that a host it thinks is internal is, through some misconfiguration, reachable from the internet. Coverage gaps, not findings, are what get you breached.
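
Coverage gaps of the kind described above can be found with plain set arithmetic once you have three lists: the asset inventory, the hosts that received a credentialed scan, and the hosts an external scan could reach. The sketch below is illustrative only; every hostname is made-up sample data and `coverage_gaps` is a hypothetical helper, not a feature of any scanner.

```python
# All hostnames below are made-up sample data.
inventory = {"web-01", "db-01", "app-01", "atm-ctl-07", "hr-files-01"}
credentialed_scanned = {"web-01", "db-01", "app-01"}
seen_from_internet = {"web-01", "hr-files-01", "promo-site-old"}
expected_external = {"web-01"}  # hosts that are supposed to be internet-facing


def coverage_gaps(inventory, credentialed, external_seen, external_expected):
    """Return the three gap sets implied by comparing scan coverage to inventory."""
    return {
        # Inventoried hosts that no authenticated scan has inspected from inside.
        "never_credential_scanned": inventory - credentialed,
        # Known hosts reachable from outside that were believed to be internal.
        "unexpectedly_exposed": (external_seen & inventory) - external_expected,
        # Hosts visible from outside that are not in the inventory at all.
        "unknown_external_assets": external_seen - inventory,
    }


gaps = coverage_gaps(
    inventory, credentialed_scanned, seen_from_internet, expected_external
)
for name, hosts in gaps.items():
    print(name, sorted(hosts))
```

Each non-empty set is a question to answer before trusting any "clean" scan result.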

Scanning without causing an outage

Here is the part of scanning that the brochures skip: a scan can knock a system over. Aggressive scanners send malformed packets, hammer services with connections, and probe inputs in ways fragile software was never built to survive. Legacy systems, embedded devices, medical equipment, and — exactly Meridian's problem — old core-banking interfaces and ATM controllers can crash, hang, or behave unpredictably when scanned hard. The cardinal sin of a vulnerability-management program is causing the very outage you exist to prevent.

You scan safely by being deliberate:

Inventory the fragile first. Tag the assets that are old, embedded, or business-critical-and-delicate (OT and IoT devices especially — recall Chapter 14's ATM and branch-device fleet). These get gentle treatment.
Throttle and schedule. Limit scan intensity (concurrent checks, packets per second) and run heavy scans in maintenance windows, not at the noon transaction peak.
Use passive discovery where active scanning is dangerous. Some assets should never be actively scanned. For these, passive techniques — watching network traffic to fingerprint devices and versions, or reading from a configuration database — find vulnerabilities without ever touching the device. This is the standard answer for sensitive operational technology.
Prefer authenticated (lighter) checks for fragile hosts. Reading an installed version through a credentialed login is gentler than blasting a service with probe traffic.
Test against a staging mirror first when you can, and keep the asset owners informed of the scan window so that if something does go sideways, the right people already know why.
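
The safe-scanning rules above amount to a small decision procedure per asset. As a rough sketch, assuming each asset carries a few tags from the inventory (the `Asset` fields, the `scan_plan` function, and the plan labels are all hypothetical and would map onto whatever your scanner's policy settings are actually called):

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    fragile: bool          # old, embedded, or business-critical-and-delicate
    never_active: bool     # must not be actively probed at all (e.g. some OT)
    has_credentials: bool  # a scan account is provisioned for this host


def scan_plan(asset: Asset) -> dict:
    """Pick a scan method, window, and intensity from the asset's tags."""
    if asset.never_active:
        # Passive discovery only: fingerprint from traffic or a config database.
        return {"method": "passive", "window": "any", "intensity": "none"}

    # Credentialed checks are preferred wherever an account exists.
    method = "authenticated" if asset.has_credentials else "unauthenticated"
    if asset.fragile:
        # Throttled, and only inside a maintenance window.
        return {"method": method, "window": "maintenance", "intensity": "low"}
    return {"method": method, "window": "scheduled", "intensity": "normal"}


fleet = [
    Asset("atm-controller-12", fragile=True, never_active=True, has_credentials=False),
    Asset("core-banking-if", fragile=True, never_active=False, has_credentials=True),
    Asset("web-01", fragile=False, never_active=False, has_credentials=True),
]
for asset in fleet:
    print(asset.name, scan_plan(asset))
```

The staging-mirror test and the owner notification are process steps rather than scanner settings, so they are deliberately left out of the function.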

📟 War Story: A constructed but representative case. A hospital's security team launched a default-intensity unauthenticated scan across a clinical network at mid-morning, unaware that several infusion pumps spoke a brittle protocol on an unusual port. The scanner's probes caused a batch of the pumps to reboot. No patient was harmed, but the incident set the security program back two years politically: clinical staff now treated "the security scan" as the threat. The lesson is not "don't scan." It is know your fragile assets before you scan them, throttle, schedule, and go passive where active probing is dangerous. A scan that causes an outage discredits the entire program.

🧩 Try It in the Lab: In your own lab VM only, install OpenVAS or Nessus Essentials and scan a deliberately vulnerable target you control (e.g., Metasploitable or an old, isolated VM). Run it once unauthenticated, then again with credentials, and compare the findings. Notice how many more findings the authenticated scan returns, and how much more accurate they are. Never point a scanner at a system you do not own or are not explicitly authorized to test; an unauthorized scan can itself be a crime under laws like the U.S. Computer Fraud and Abuse Act (Chapter 39).

Reading scanner output critically

A scanner's report is a starting point, not a verdict. Every finding carries a CVE identifier (or several), a severity, an affected asset, and the scanner's confidence. Your job is to interrogate it:

Is it a false positive? Authenticated scans reduce these, but they happen — especially with back-ported security fixes on Linux distributions, where the version string looks vulnerable but the distribution silently patched the flaw. Validate before you escalate; nothing burns a remediation team's trust faster than chasing ghosts.
What is the affected asset, really? A "critical" on an isolated test box is not a "critical" on the internet-facing banking portal. The scanner doesn't know that. You do.
Is there a real, exploitable path? A vulnerability in a service that is installed but not running, or unreachable behind a default-deny firewall, is genuine but not urgent. Context is everything — which is the entire subject of the next section.
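
The three questions above can be turned into a first-pass annotation step that runs before a human looks at a finding. This is only a sketch: the dictionary keys (`version_string_only`, `os_family`, `service_running`, `reachable`, `asset_role`) are hypothetical field names, not the export format of any real scanner, and the notes are prompts for an analyst rather than verdicts.

```python
def triage_notes(finding: dict) -> list[str]:
    """Return analyst prompts for one scanner finding (hypothetical fields)."""
    notes = []
    # A match on version string alone, on Linux, may be a back-ported fix.
    if finding["version_string_only"] and finding["os_family"] == "linux":
        notes.append("validate: possible back-ported fix, check the distro advisory")
    # Installed-but-stopped or unreachable services are genuine but less urgent.
    if not finding["service_running"]:
        notes.append("lower urgency: service installed but not running")
    if not finding["reachable"]:
        notes.append("lower urgency: no reachable network path")
    # The scanner does not know what the asset is for; the inventory does.
    if finding["asset_role"] == "internet-facing":
        notes.append("raise urgency: asset is internet-facing")
    return notes


example = {
    "cve": "CVE-2021-44228",
    "version_string_only": True,
    "os_family": "linux",
    "service_running": False,
    "reachable": True,
    "asset_role": "internal",
}
notes = triage_notes(example)
for note in notes:
    print(note)
```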

🔄 Check Your Understanding:
1. You need to know whether your servers are missing OS patches. Which scan type (authenticated or unauthenticated) would you use, and why?
2. Give two concrete techniques for scanning a fleet of fragile, business-critical devices without risking an outage.

Answers

1. Authenticated. Only a credentialed scan that logs in can read the actual installed patch levels and configuration; an external probe can at best guess from banners and will miss local vulnerabilities entirely.
2. Any two of: use passive discovery (traffic fingerprinting / config-database reads) instead of active probing; throttle scan intensity and run only in maintenance windows; prefer lighter authenticated checks over aggressive unauthenticated probing; test against a staging mirror first; and pre-notify asset owners of the scan window.

23.3 Why CVSS isn't priority: EPSS, KEV, and context

If you take only one thing from this chapter, take this section. It is the difference between a vulnerability-management program that reduces real risk and one that exhausts itself patching findings nobody will ever exploit while the one that gets you breached sits in the backlog because the scanner rated it "Medium."

CVE and CVSS: what they are, and what they are not

A CVE (Common Vulnerabilities and Exposures) is a unique public identifier for a specific, disclosed vulnerability — like CVE-2021-44228 for Log4Shell. CVE is just a name and a catalog entry, maintained so the whole industry can refer to the same flaw unambiguously. A CVE by itself tells you a vulnerability exists and roughly what it is; it does not tell you how severe or how exploited it is.

CVSS (the Common Vulnerability Scoring System, maintained by FIRST, the Forum of Incident Response and Security Teams) is the industry-standard method for rating the severity of a vulnerability on a 0.0–10.0 scale, derived from its intrinsic characteristics: how it's exploited (network vs. local), how complex the exploit is, whether it needs privileges or user interaction, and the impact to confidentiality, integrity, and availability if exploited. CVSS sorts roughly into bands — Low (0.1–3.9), Medium (4.0–6.9), High (7.0–8.9), and Critical (9.0–10.0). Log4Shell's base score was the maximum, 10.0: network-exploitable, low complexity, no privileges or user interaction needed, total impact. (You will also see Log4Shell quoted as 9.8 in some sources, reflecting different score versions and the metrics chosen; the headline is "as bad as it gets.")
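
The severity bands are a simple lookup on the base score. A minimal sketch follows; `cvss_band` is a hypothetical helper name, and the "None" label for a score of exactly 0.0 comes from the CVSS qualitative rating scale rather than from the band list above.

```python
def cvss_band(score: float) -> str:
    """Map a CVSS base score (0.0 to 10.0) to its qualitative severity band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


for score in (10.0, 9.8, 8.1, 7.8, 5.0):
    print(score, cvss_band(score))
```

Because CVSS scores are published to one decimal place, the band edges (3.9, 6.9, 8.9) never leave a gap in practice.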

Here is the crucial, career-defining point: CVSS measures severity, not risk, and severity is not priority. A CVSS base score is computed from the vulnerability's intrinsic nature, in a vacuum. It does not know:

whether anyone is actually exploiting this vulnerability in the wild;
whether your affected asset is internet-facing and full of cardholder data, or an isolated lab box;
whether you have compensating controls that already block the exploit path.

The catastrophic, almost universal failure mode is to treat the CVSS score as a to-do priority: "patch all the Criticals first." This sounds responsible and is quietly disastrous, because CVSS-Critical findings are abundant — a typical large environment has thousands — and most of them will never be exploited against you. Meanwhile, a "High" or even "Medium" CVSS vulnerability that is being mass-exploited right now and sits on your perimeter is a far more urgent risk than a "Critical" buried where no attacker can reach it. Severity is one input. It is not the answer.

🚪 Threshold Concept: A CVSS score answers "how bad would this be if exploited, in the abstract?" It does not answer "how likely is this to be exploited against me, and how much would I lose?", which is the actual definition of risk you learned in Chapter 1 (likelihood × impact). Prioritizing by CVSS is prioritizing by a generic estimate of impact, with no measure of likelihood and no knowledge of what the affected asset is worth to you. To prioritize by risk, you must add exploit likelihood (EPSS, KEV) and your own asset context (impact). This shift, from severity to risk, is the threshold every serious vulnerability-management program crosses.

EPSS: the probability it will actually be exploited

EPSS (the Exploit Prediction Scoring System, also from FIRST) supplies the missing likelihood signal. EPSS is a data-driven model that estimates the probability that a given vulnerability will be exploited in the wild in the next 30 days, expressed as a number from 0 to 1 (0% to 100%). It is built from real-world exploitation data and the characteristics that correlate with exploitation, and it is updated daily as the world changes.

EPSS is powerful precisely because exploitation is rare and concentrated. The large majority of CVEs are never meaningfully exploited; a small fraction account for almost all real attacks. EPSS lets you find that small fraction. A vulnerability with EPSS 0.94 is predicted highly likely to be attacked imminently; one with EPSS 0.001 almost certainly will not be, regardless of how scary its CVSS looks. Combining CVSS (how bad) with EPSS (how likely) already gets you far closer to true risk than CVSS alone.
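
One way to picture the CVSS-plus-EPSS combination is as a two-by-two grid: severe or not, likely or not. The sketch below is illustrative only. The cutoffs (CVSS 7.0, EPSS 0.1) and the quadrant labels are assumptions chosen for the example, not values recommended by FIRST, and `quadrant` is a hypothetical helper.

```python
def quadrant(cvss: float, epss: float,
             cvss_cut: float = 7.0, epss_cut: float = 0.1) -> str:
    """Place a finding in a severity-by-likelihood grid (assumed cutoffs)."""
    severe = cvss >= cvss_cut
    likely = epss >= epss_cut
    if severe and likely:
        return "severe and likely: act first"
    if likely:
        return "likely but less severe: do not ignore"
    if severe:
        return "severe but unlikely: schedule"
    return "neither: routine"


# (cvss, epss) pairs; the values are illustrative.
samples = [(10.0, 0.94), (9.8, 0.02), (5.0, 0.50), (3.0, 0.001)]
for cvss, epss in samples:
    print(cvss, epss, "->", quadrant(cvss, epss))
```

The second row is the case CVSS-only sorting gets wrong: a 9.8 that almost nobody is expected to exploit lands in "schedule", not "act first".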

KEV: the list of what's being exploited right now

KEV, the Known Exploited Vulnerabilities catalog published and continuously updated by the U.S. Cybersecurity and Infrastructure Security Agency (CISA), is the bluntest and arguably most valuable signal of all. KEV is a curated list of vulnerabilities for which CISA has reliable evidence of active exploitation in the wild. It is not a prediction like EPSS; it is a statement of fact: this is being used in real attacks right now. CISA attaches remediation due dates and, for U.S. federal civilian executive branch agencies under Binding Operational Directive 22-01, makes remediating KEV-listed vulnerabilities mandatory, which is a strong signal of seriousness for everyone else.

The operating rule writes itself: if a vulnerability is on the KEV list and it exists in your environment, it jumps the queue. Active exploitation is the strongest possible evidence that the likelihood term in your risk equation is high. Many mature programs hard-code this: KEV-listed = treat as critical priority, regardless of CVSS. Log4Shell was added to KEV essentially immediately; that, more than its 10.0 CVSS, is what told every program "this is not a drill."
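
The "KEV jumps the queue" rule is a one-line override once the catalog is available as a set of CVE IDs. A minimal sketch, assuming the catalog has already been loaded into `kev_cves` (how it is downloaded and refreshed is left out); `priority_band` is a hypothetical helper and `CVE-0000-0001` is a placeholder ID, not a real CVE.

```python
def priority_band(cve: str, cvss_band: str, kev_cves: set[str]) -> str:
    """Treat any KEV-listed CVE as critical priority, regardless of CVSS band."""
    if cve in kev_cves:
        return "Critical"
    return cvss_band


# Stand-in for the loaded catalog; Log4Shell is a real KEV entry.
kev_cves = {"CVE-2021-44228"}

print(priority_band("CVE-2021-44228", "Critical", kev_cves))
# A placeholder CVE that is not in the sample catalog keeps its CVSS band.
print(priority_band("CVE-0000-0001", "Medium", kev_cves))
```

The override only ever raises priority. A finding that is not on KEV still has to be ranked on its other signals.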

Putting it together: risk-based prioritization

Risk-based prioritization is the practice of ranking remediation by the actual risk each finding poses, by combining four signals rather than sorting on any one:

Severity (CVSS): how damaging if exploited.
Exploit likelihood (EPSS prediction + KEV fact-of-exploitation): how probable it is to be exploited.
Asset context: how exposed and how valuable your affected asset is (internet-facing? holds regulated data? business-critical?).
Compensating controls: what already reduces the exploit path (a WAF, segmentation, the service not actually running).

The conceptual formula is just Chapter 1's risk equation, instantiated for vulnerabilities:

$$\text{Vulnerability risk} \;\approx\; \underbrace{f(\text{CVSS})}_{\text{impact}} \;\times\; \underbrace{g(\text{EPSS},\,\text{KEV})}_{\text{likelihood}} \;\times\; \underbrace{h(\text{asset exposure \& value})}_{\text{your context}}$$

You do not need a precise number; you need a defensible ordering. Let us make it concrete with five findings from Meridian's scanner during the Log4Shell week, which is exactly the kind of table you will build for real.

| # | Vulnerability (illustrative) | CVSS | EPSS | KEV? | Affected asset | Priority decision |
|---|---|---|---|---|---|---|
| A | Log4Shell (CVE-2021-44228) | 10.0 | 0.94 | Yes | Internet-facing online-banking portal | P1: emergency. Max severity, near-certain exploitation, on KEV, on the crown-jewel asset reachable from the internet. Fix tonight. |
| B | Log4Shell (CVE-2021-44228) | 10.0 | 0.94 | Yes | Internal log-aggregation server, no inbound internet path | P2: urgent but not tonight. Same flaw, but the asset is internal and segmented; likelihood of reachable exploitation is much lower. Mitigate (block egress, WAF) tonight, patch in the SLA window. |
| C | Old OpenSSL flaw | 9.8 | 0.02 | No | Internet-facing marketing site | P3: scheduled. High CVSS, but EPSS says exploitation is unlikely and it is not on KEV; site holds no sensitive data. Patch on the normal critical SLA, not the emergency path. |
| D | Privilege-escalation bug | 7.8 | 0.01 | No | Internal workstation, requires local access first | P4: routine. "High" severity, but local-only, low exploit probability, not KEV; an attacker must already be on the box. Standard SLA. |
| E | Windows RCE on KEV | 8.1 | 0.90 | Yes | Internal file server reachable from user LAN | P1/P2: high. Lower CVSS than C, but it is on KEV (actively exploited) and reachable laterally from the user LAN, where users and phished attackers live. Beats the higher-CVSS finding C handily. |

Read down the CVSS column and you would work A and B (10.0), then C (9.8), then E (8.1), then D (7.8). Finding E, which is actively exploited and laterally reachable, would wait behind C, which nobody is exploiting. Read by risk and the order changes: A first, then B and E, then C, then D. That reordering is the whole point of this section. CVSS got you the wrong order; risk got you the right one. Note especially A versus B: the same vulnerability, same CVSS, same EPSS, same KEV status, but different priority, entirely because of asset context. The bank patches the internet-facing portal tonight and the segmented internal server on schedule. Context is not a tiebreaker; it is a first-class factor.
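
To see the two orderings side by side, the five findings can be scored with a toy version of the conceptual formula. This is a sketch under loud assumptions and is not the chapter's `vulnmgmt.py`: the `context` weights are invented for illustration, likelihood is taken as 1.0 for anything on KEV and as the EPSS value otherwise, and the resulting numbers mean nothing beyond the ordering they produce.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    label: str
    cvss: float
    epss: float
    kev: bool
    context: float  # assumed 0-1 weight for the asset's exposure and value


def risk_score(f: Finding) -> float:
    """Toy severity x likelihood x context product (ordering only)."""
    severity = f.cvss / 10.0
    likelihood = 1.0 if f.kev else f.epss  # KEV is fact, EPSS is prediction
    return severity * likelihood * f.context


findings = [
    Finding("A", 10.0, 0.94, True, 1.0),   # internet-facing banking portal
    Finding("B", 10.0, 0.94, True, 0.5),   # internal, segmented log server
    Finding("C", 9.8, 0.02, False, 0.6),   # internet-facing, no sensitive data
    Finding("D", 7.8, 0.01, False, 0.3),   # workstation, local access needed
    Finding("E", 8.1, 0.90, True, 0.6),    # file server reachable from user LAN
]

by_cvss = [f.label for f in sorted(findings, key=lambda f: f.cvss, reverse=True)]
by_risk = [f.label for f in sorted(findings, key=risk_score, reverse=True)]

print("by CVSS:", by_cvss)
print("by risk:", by_risk)
```

With these assumed weights, E moves ahead of C in the risk ordering while A stays first and D stays last, which matches the priorities in the table. Different but reasonable weights would shift the scores without changing that conclusion.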

🛡️ Defender's Lens: Attackers prioritize the same way you should — they go for what is reachable, exploitable, and valuable, and they weaponize KEV-listed and high-EPSS vulnerabilities first because those are the ones with working, reliable exploits. When you prioritize by EPSS and KEV, you are literally racing the attacker on the same track, fixing the things they are most likely to come for. Prioritizing by CVSS, by contrast, sends your team to defend doors the attacker was never going to use.

We encode exactly this priority logic into vulnmgmt.py in the Project Checkpoint, so the four-signal decision becomes a small reusable function instead of a judgment call you re-litigate every time.

🔄 Check Your Understanding:
1. In one sentence each, what question does CVSS answer, what question does EPSS answer, and what does KEV tell you?
2. Findings A and B above are the identical CVE with identical CVSS, EPSS, and KEV status, yet A is P1 and B is P2. What single factor explains the difference, and why is it legitimate?

Answers

1. CVSS: how severe/damaging the vulnerability is if exploited (intrinsic severity). EPSS: the estimated probability it will be exploited in the wild in the next 30 days (predicted likelihood). KEV: that it is being actively exploited right now (a fact, not a prediction).
2. Asset context (exposure). A is on an internet-facing, high-value asset directly reachable by attackers; B is on an internal, segmented system with no inbound path, so the reachable likelihood is far lower. It is legitimate because risk is likelihood × impact, and exposure is a real, defensible component of likelihood: not all instances of the same flaw carry the same risk.

23.4 Patch SLAs and the art of the exception

Prioritization tells you the order. Patch SLAs tell you the deadline. And the exception process governs what happens when you cannot meet the deadline — which, in a real organization, is constantly.

Risk-based patch SLAs

A patch SLA (service-level agreement) is a documented, policy-backed commitment specifying the maximum time allowed to remediate a vulnerability after it is discovered, tiered by the vulnerability's risk. SLAs convert good intentions into accountability: without a deadline, "we'll get to it" means never, and the auditor has nothing to measure you against. The SLA is the heartbeat of the program — it is what turns prioritization into a clock.

SLAs must be risk-based, not CVSS-based, for every reason in §23.3. A defensible Meridian SLA table looks like this:

| Risk tier | Definition (how the tier is set) | Internet-facing / KEV asset | Internal asset |
|---|---|---|---|
| Emergency | On KEV and present on an exposed/critical asset; or active exploitation against us | 24–72 hours (often an emergency change) | 7 days |
| Critical | High risk: high CVSS and high EPSS, or KEV on a less-exposed asset | 7 days | 14 days |
| High | Elevated risk: high CVSS, moderate EPSS, exposed asset | 14 days | 30 days |
| Medium | Moderate risk: contained exposure, low exploit likelihood | 30 days | 60 days |
| Low | Minimal real risk; may be accepted | 90 days | 90 days / accept |

Figure 23.2 — An illustrative risk-based patch-SLA table for Meridian (Tier 3). The clock starts at discovery. Note that the tier is set by risk — CVSS plus EPSS plus KEV plus exposure — not by CVSS alone, and that the same finding gets a tighter SLA on an internet-facing or KEV-relevant asset than on an internal one. These numbers are deliberately stricter than the regulatory floor; compliance is the floor, not the ceiling.

Two design notes that separate a thoughtful SLA from a checkbox one. First, the clock starts at discovery, not at "when we got around to it." If your discovery cadence is monthly, your real worst-case exposure is the SLA plus up to a month of blindness — which is why continuous scanning and timely KEV monitoring matter as much as the patch step. Second, regulators and frameworks (PCI-DSS, for instance, sets expectations around timely patching of critical vulnerabilities) establish a floor. A mature program sets SLAs tighter than the floor for its real risks, because — Theme 5 — compliance is the minimum, not the goal. Passing the PCI assessment does not mean you patched fast enough to beat the attacker; it means you cleared the bar the standard happened to set.
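The two design notes can be made concrete with a few lines of date arithmetic. The following is a minimal sketch, assuming the day counts of Figure 23.2 (with Emergency taken at its 72-hour upper bound and the Low tier's "90 days / accept" taken as 90 days). The names `SLA_DAYS`, `due_date`, and `worst_case_exposure_days` are hypothetical and are not part of bluekit.

```python
from datetime import date, timedelta

# Days to remediate, copied from Figure 23.2: (internet-facing/KEV asset, internal asset).
# Assumptions: Emergency uses the 72-hour upper bound; Low treats "90 days / accept" as 90.
SLA_DAYS = {
    "Emergency": (3, 7),
    "Critical": (7, 14),
    "High": (14, 30),
    "Medium": (30, 60),
    "Low": (90, 90),
}


def due_date(tier: str, discovered: date, exposed: bool) -> date:
    """Deadline for a finding: the clock starts at discovery, not at ticket creation."""
    exposed_days, internal_days = SLA_DAYS[tier]
    return discovered + timedelta(days=exposed_days if exposed else internal_days)


def worst_case_exposure_days(tier: str, exposed: bool, scan_interval_days: int) -> int:
    """SLA plus the longest time a flaw can sit unseen between two scans."""
    exposed_days, internal_days = SLA_DAYS[tier]
    return (exposed_days if exposed else internal_days) + scan_interval_days


if __name__ == "__main__":
    found = date(2024, 3, 1)
    print(due_date("Critical", found, exposed=True))    # 2024-03-08
    print(due_date("Critical", found, exposed=False))   # 2024-03-15
    # Monthly scanning turns a 7-day SLA into up to 37 days of real exposure.
    print(worst_case_exposure_days("Critical", True, scan_interval_days=30))  # 37
```

The last line is the point of the first design note: shortening the scan interval reduces real exposure just as surely as shortening the SLA does.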

Exceptions and risk acceptance — the necessary escape valve

Now the uncomfortable truth: you will not meet every SLA. A patch breaks a critical application. The vendor hasn't released a fix. The system is a 15-year-old core-banking component that cannot be touched without a six-month change project. A medical device's warranty is voided if you patch it yourself. These are not failures of will; they are the normal friction of real environments. For them, you need a governed escape valve: the exception (also called a risk acceptance or risk-acceptance request), which is a formal, documented, time-bound, and approved decision to deviate from the patch SLA for a specific finding, with a justification, a compensating control, an accountable owner, and an expiry date.

Every word in that definition is a guardrail, because the exception process is where vulnerability-management programs quietly die. Done right, it is honest risk management. Done wrong, it is a rubber stamp that lets the riskiest, most-deferred vulnerabilities accumulate forever under a veneer of paperwork. A legitimate exception has all of:

A real justification ("patch breaks the loan-origination integration; vendor fix is due in Q2"), not "too busy."
A compensating control that actually reduces the risk in the meantime — segment the asset, add a WAF rule, restrict access, increase monitoring. An exception with no compensating control is just an unaddressed risk wearing a costume.
An accountable owner — a named business owner who accepts the risk, ideally at a level of seniority proportional to the risk. The CISO does not own accepted business risk; the business does. A teller-system risk might be accepted by a branch-operations director; a risk to the core ledger goes far higher.
An expiry date and mandatory re-review. No exception is permanent. It expires, and at expiry someone must re-justify it or close it. This is the single most important guardrail, and §23.5 is about what happens when it's missing.
Risk-proportional approval. A low-risk exception can be approved by a manager; accepting the risk of an unpatched KEV vulnerability on the cardholder data environment should require the CISO and possibly the risk committee. The bigger the risk, the higher the signature.
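The five guardrails above lend themselves to a structural check at the moment an exception is filed. The following is a minimal sketch under stated assumptions: the `ExceptionRequest` record, the `missing_guardrails` function, and the numeric approver and risk scales are all hypothetical, chosen only to illustrate the idea. A check like this can confirm that each field is present and that the signature is senior enough; it cannot judge whether a justification is real or whether a control actually reduces risk, which remains a human decision.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ExceptionRequest:
    finding_id: str
    justification: str
    compensating_control: str
    owner: str                 # named business owner, not the security team
    expires: Optional[date]    # None means "no expiry", which is never acceptable
    approver_level: int        # hypothetical scale: 1 = manager, 2 = CISO, 3 = risk committee
    risk_level: int            # hypothetical scale: 1 = low, 2 = high, 3 = KEV on a critical asset


def missing_guardrails(req: ExceptionRequest, today: date) -> List[str]:
    """Return the guardrails this request fails; an empty list means it is well-formed."""
    problems = []
    if not req.justification.strip():
        problems.append("justification")
    if not req.compensating_control.strip():
        problems.append("compensating control")
    if not req.owner.strip():
        problems.append("accountable owner")
    if req.expires is None or req.expires <= today:
        problems.append("expiry date")
    if req.approver_level < req.risk_level:
        problems.append("risk-proportional approval")
    return problems
```

A request with a justification and an owner but no compensating control, no expiry, and a manager's signature on a level-3 risk would come back with three failures, and should be rejected before it ever reaches the register.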

⚠️ Common Pitfall: The permanent "temporary" exception. The most common abuse is not a single bad decision — it is drift: an exception granted for a real, time-boxed reason (the Q2 vendor fix) that is silently renewed, or never re-reviewed, long after the original justification evaporated. Five years later the asset is still unpatched, the "compensating control" was decommissioned in a network redesign nobody connected to the exception, and the business owner who accepted the risk left the company. The paperwork still says "accepted." This is how a known, exploited vulnerability lives in a production environment for years — and it is exactly the story of §23.5 and of more than one real breach.

⚖️ Authorization & Ethics: Risk acceptance is a business decision with real stakes, and the ethics of it matter. The security team's job is to make the risk legible — to state plainly, in writing, what could happen and how likely it is — so the accountable owner is making an informed choice, not a blind one. It is not ethical (or, often, legally defensible) to let an exception bury a serious risk where decision-makers and auditors can't see it. Document the risk honestly, name who accepted it, and keep the record. When a breach follows an accepted risk, the question regulators ask is "who knew, and what did they decide?" — make sure the answer is on paper and was made in good faith.

🔄 Check Your Understanding: 1. Why must patch SLAs be tiered by risk rather than by CVSS severity alone? 2. Name the five elements that make a vulnerability exception legitimate rather than a rubber stamp.

Answers

1. Because CVSS measures only intrinsic severity, not actual risk; a CVSS-Critical finding that nobody is exploiting on an isolated asset is far less urgent than a lower-CVSS finding that is on KEV and exposed. Risk-based tiers (folding in EPSS, KEV, and asset exposure) put the real deadlines on the findings that actually threaten you.
2. A real justification, a compensating control that genuinely reduces risk, a named accountable owner who accepts the risk at appropriate seniority, an expiry date with mandatory re-review, and risk-proportional approval authority.

23.5 The vulnerability that never gets fixed

Every seasoned defender knows it: the vulnerability that has been on the report for three years, four years, longer. It is "in progress." It has an exception — or it had one, and nobody can find it. It is somehow always next quarter's problem. This section is about that vulnerability, because understanding why it persists is the difference between a program that genuinely reduces risk and one that just manages a spreadsheet of permanent findings.

Vulnerabilities become un-fixable for a small set of recurring, deeply human reasons — almost none of them technical:

No patch exists, or no good one. The software is end-of-life and unsupported, or the vendor has gone out of business, or a fix exists but it is so disruptive (a major version upgrade that breaks integrations) that it triggers a project nobody will fund. The vulnerability is real; the remediation is genuinely hard.
Patching breaks something critical. The host runs a fragile, business-essential application that the patch is known (or feared) to break, and the cost of an outage exceeds, in the owner's mind, the cost of the risk. So the exception renews, and renews.
The asset is a black box. It's a vendor appliance, a medical device, an OT controller — something you contractually cannot patch yourself and the vendor won't update. (Chapter 14's IoT and Chapter 33's operational-technology realities live here. "We can't just patch it" is the defining constraint of OT security.)
Organizational drift. The owner left. The team reorganized. The exception's re-review never got scheduled because the person who scheduled re-reviews changed roles. The compensating control was removed in an unrelated network change. Nobody is deciding to keep the risk anymore; the risk is just there, unowned, because the process that was supposed to revisit it broke. This is the most dangerous category precisely because no one is responsible — the risk persists by inertia, not by choice.
Legacy debt nobody will pay down. The 20-year-old core-banking system (Meridian has one; most banks do) cannot be modernized without enormous cost and risk, so it accretes unpatchable findings that everyone has learned to stop looking at. Familiarity breeds invisibility.

The danger of the never-fixed vulnerability is not abstract. It is that attackers love exactly these. A long-known, long-unpatched vulnerability on a reachable asset is the softest target in the building — a flaw with a reliable, well-documented exploit, on a system everyone has stopped watching. A disproportionate share of real breaches trace back to a vulnerability that was known and fixable in principle but never actually fixed, sitting behind an exception that had quietly become permanent. The thing you stopped looking at is the thing they're looking for.

How a defender manages what cannot (yet) be patched

You cannot always patch. You can always manage the risk, and managing un-patchable risk well is a hallmark of a mature program. The toolkit:

Mitigate relentlessly. If you cannot fix the vulnerability, shrink the exploit path. Segment the asset onto an isolated network (Chapters 6–7). Put a WAF or strict firewall rules in front of it. Restrict who and what can reach it to the absolute minimum. Disable the vulnerable feature if the business can live without it. A well-mitigated un-patchable vulnerability can be a lower real risk than a patchable one you haven't gotten to — mitigation is not a consolation prize, it is risk reduction.
Monitor intensely. If you must keep a known-vulnerable asset alive, watch it like a hawk. Wire it tightly into the SIEM (Chapter 21) and your detection program (Chapter 22): alert on any access, any anomaly, any sign of exploitation. You are accepting that prevention may fail here, so you compensate with detection and response — Theme 4, defense in depth, in its purest form. If you can't keep them out, make sure you'll see them the instant they're in.
Force the re-decision. The fix for organizational drift is process: every exception expires, every expiry triggers a re-review, and a stale exception escalates automatically to higher authority. The goal is to convert "nobody is deciding" back into "a named, senior person is consciously deciding, on the record, to keep this risk." Sometimes that forced conversation is what finally funds the fix.
Track it as accepted risk in the open. A never-fixed vulnerability that is visible in the risk register, with a named owner and a live compensating control, is governed risk. The same vulnerability hidden in a forgotten exception is a landmine. The difference between the two is entirely whether someone is still looking.
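The "force the re-decision" rule is simple enough to automate, and automating it is what removes the dependence on any one person remembering to schedule a review. The following is a minimal sketch, assuming a hypothetical `RiskException` record and hypothetical thresholds (a 14-day grace period before escalation, and a 90-day limit on how old the last check of the compensating control may be). Neither number comes from Meridian's policy.

```python
from datetime import date
from typing import NamedTuple


class RiskException(NamedTuple):
    finding_id: str
    owner: str
    expires: date
    control_verified: date   # last date someone confirmed the compensating control still exists


def review_action(exc: RiskException, today: date,
                  grace_days: int = 14, control_max_age_days: int = 90) -> str:
    """Decide what the process should do with one exception today."""
    days_past_expiry = (today - exc.expires).days
    if days_past_expiry > grace_days:
        return "ESCALATE"         # stale: expired and nobody re-decided, so go up a level
    if days_past_expiry >= 0:
        return "RE-REVIEW"        # expired: the owner must re-justify it or close it
    if (today - exc.control_verified).days > control_max_age_days:
        return "VERIFY-CONTROL"   # still valid, but the mitigation may have drifted away
    return "OK"


if __name__ == "__main__":
    exc = RiskException("F-9", "branch-ops director", date(2024, 1, 31), date(2023, 12, 1))
    print(review_action(exc, date(2024, 1, 15)))  # OK
    print(review_action(exc, date(2024, 2, 5)))   # RE-REVIEW
    print(review_action(exc, date(2024, 3, 1)))   # ESCALATE
```

The `VERIFY-CONTROL` branch targets the quietest form of drift: an exception that is still formally valid while the control that justified it has been removed in an unrelated change.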

📟 War Story: A constructed case that mirrors a pattern seen repeatedly in real breaches. A mid-size firm ran an internet-facing remote-access appliance with a vulnerability that had been disclosed, patched by the vendor, and added to KEV. An exception had been filed two years earlier ("upgrade scheduled next quarter") and renewed three times without re-review; the compensating control — an IP allowlist — had been quietly dropped during a VPN migration nobody linked back to the exception. An attacker, scanning the internet for that exact KEV-listed flaw, found the appliance, exploited the well-documented vulnerability, and was inside in minutes. The breach was not a failure of patching technology; it was a failure of the exception process — a temporary risk decision that became permanent in the dark. Every element was preventable: re-review the exception, notice the missing compensating control, see the asset on KEV. Nobody was looking, so nobody saw.

🔄 Check Your Understanding: 1. Name three distinct reasons a real vulnerability might remain unpatched for years, and identify which one is the most dangerous and why. 2. You have a business-critical legacy system with a known vulnerability and no available patch. List three things you do to manage the risk without patching.

Answers

1. Any three of: no patch exists (end-of-life/unsupported software); patching would break a critical application; the asset is a contractually un-patchable black box (vendor appliance/medical/OT device); organizational drift (owner left, re-review never happened); unfunded legacy debt. The most dangerous is organizational drift, because no one is consciously deciding to keep the risk; it persists by inertia, unowned and unwatched, which is exactly the soft target attackers seek.
2. Any three of: mitigate (segment/isolate the asset, put a WAF or strict firewall in front, restrict access, disable the vulnerable feature); monitor intensely (wire it into the SIEM and alert on any access or anomaly); force a re-decision via an expiring exception escalated to senior ownership; and track it openly as accepted risk in the register with a named owner.

23.6 Reporting and trends: proving the program works

A vulnerability-management program produces two kinds of output. One is fixed vulnerabilities — the operational result. The other, just as important, is evidence the program is working — the management result. If you cannot show the trend, you cannot prove you are gaining ground, you cannot defend your budget, and you cannot survive an audit. This section is about the second output, and it sets up the broader metrics discipline of Chapter 36.

What a finding count is not

The first instinct — report "we have 41,000 open vulnerabilities" — is nearly useless and sometimes counterproductive. A raw count is not a measure of risk (most of those 41,000 are low-risk), not a measure of program health (the count goes up when you scan more thoroughly, which is good), and not actionable. A program that gets better at discovery looks worse by raw count, which punishes exactly the right behavior. The metrics that matter measure the loop, not the inventory.

The metrics that tell the truth

Mean time to remediate (MTTR), by risk tier. How long, on average, from discovery to verified fix — broken out by Emergency / Critical / High / etc. This is the single most important operational metric: it measures how fast the loop actually turns for the findings that matter. A falling MTTR on critical findings is a program getting healthier. (You will formalize MTTR alongside MTTD in Chapter 36.)
SLA compliance rate. What percentage of findings were remediated within their SLA, by tier. "We closed 96% of Critical findings within SLA last quarter" is a sentence a board understands and an auditor can test. The trend matters more than the absolute number.
KEV exposure and time-to-remediate KEV. How many KEV-listed (actively exploited) vulnerabilities are currently open in your environment, and how fast you close them. This is the highest-signal risk metric you have — open KEV findings are the ones being exploited right now — and boards increasingly ask for it by name.
Vulnerability age / backlog aging. The distribution of how long open findings have been open, especially the count of findings open past SLA and the oldest findings. This is your early-warning system for the §23.5 never-fixed problem: a growing tail of ancient findings is a flashing red light.
Exception inventory health. How many active exceptions, how many expired but not closed or re-reviewed, and how many sit on high-risk/KEV assets. A rising count of stale exceptions is the leading indicator of the drift that breaches you.
Remediation rate vs. discovery rate. Are you closing findings faster than new ones arrive? If discovery consistently outpaces remediation for high-risk findings, the program is losing ground no matter how busy everyone feels — and that is the argument for more capacity, made in numbers instead of complaints.
Coverage. What fraction of your known assets are actually being scanned (and authenticated where possible)? An unscanned asset has zero findings and unknown risk — the most dangerous state of all. Coverage gaps hide in the blind spots.
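Three of these metrics (MTTR by tier, SLA compliance rate, and the count of findings open past SLA) can be computed from nothing more than discovery and verified-fix dates. The following is a minimal sketch, assuming a hypothetical `Finding` record; a real program would pull these fields from its scanner or ticketing export, whose schema will differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class Finding(NamedTuple):
    tier: str
    discovered: date
    verified_fixed: Optional[date]   # None while the finding is still open
    sla_days: int


def mttr_by_tier(findings: List[Finding]) -> Dict[str, float]:
    """Mean days from discovery to verified fix, per tier, over closed findings only."""
    days = defaultdict(list)
    for f in findings:
        if f.verified_fixed is not None:
            days[f.tier].append((f.verified_fixed - f.discovered).days)
    return {tier: sum(v) / len(v) for tier, v in days.items()}


def sla_compliance(findings: List[Finding]) -> Dict[str, float]:
    """Share of closed findings per tier that were fixed within their SLA."""
    met, total = defaultdict(int), defaultdict(int)
    for f in findings:
        if f.verified_fixed is None:
            continue
        total[f.tier] += 1
        if (f.verified_fixed - f.discovered).days <= f.sla_days:
            met[f.tier] += 1
    return {tier: met[tier] / total[tier] for tier in total}


def open_past_sla(findings: List[Finding], today: date) -> int:
    """Count open findings whose SLA has already elapsed (the aging tail)."""
    return sum(1 for f in findings
               if f.verified_fixed is None and (today - f.discovered).days > f.sla_days)
```

One design caveat: both `mttr_by_tier` and `sla_compliance` look only at closed findings, so a program could score well on them while old findings pile up unfixed. That is why `open_past_sla` belongs on the same report; it counts exactly the findings the other two cannot see.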

Telling the story to different audiences

The same data, framed for the audience:

For the remediation team (operational): the prioritized worklist — what to fix this week, by risk, with SLA clocks. Granular, asset-level, actionable.
For management (tactical): MTTR and SLA-compliance trends by tier, the KEV-exposure number, the exception-health summary. Are we improving? Where are we slipping?
For the board (strategic): a small number of trend lines that answer "are we managing this risk responsibly?" — KEV exposure over time, critical-SLA compliance, and the top accepted risks with named owners. Boards do not want 41,000 findings; they want to know the program is in control and where the residual risk lives. (This is the §23.6 → Chapter 36 bridge: vulnerability metrics become part of the board's risk story.)

🛡️ Defender's Lens: A rising backlog of old, high-EPSS, or KEV-listed findings is the exact attack surface an adversary is enumerating from the outside. When you report "open KEV findings: trending down" and "critical-finding MTTR: 6 days and falling," you are reporting, in management's language, that you are shrinking the soft targets faster than attackers can find them. The trend line is the security posture. A program that only reports a finding count is hiding the one number — KEV exposure over time — that actually predicts whether it will be breached.

🔄 Check Your Understanding: 1. Why is "total number of open vulnerabilities" a poor headline metric for program health? 2. Which single metric best captures whether your program is closing the highest-risk (actively exploited) findings fast enough, and why?

Answers

1. Because the raw count doesn't measure risk (most findings are low-risk), it rises when you improve discovery (so better scanning looks worse), and it isn't actionable. Metrics that measure the loop (MTTR by tier, SLA-compliance rate, KEV exposure, backlog aging) tell the truth that a count obscures.
2. Open KEV count and time-to-remediate KEV findings, because KEV vulnerabilities are confirmed to be actively exploited in the wild, so they represent the highest, most immediate real risk; how many you have open and how fast you close them is the closest proxy for "are we beating the attacker on the things they're actually using?"

Project Checkpoint

This chapter contributes Meridian's vulnerability-management policy and SLAs to the security program, and adds the vulnmgmt.py module to bluekit.

Program increment — vulnerability-management policy + SLAs. Coming out of the Log4Shell scramble, Dana Okafor's mandate to the team was blunt: "We got lucky on the order we patched things. Luck is not a policy." Elena Vasquez and Sam Whitfield draft Meridian's vulnerability-management policy, the document that turns the lifecycle of §23.1 into a governed control. It states the scope (all assets in the inventory, internet-facing first), the scanning standard (authenticated everywhere feasible, passive for fragile OT/ATM assets, throttled and scheduled), the risk-based prioritization method (CVSS + EPSS + KEV + asset context, per §23.3), the patch-SLA table of Figure 23.2, and — the part the auditors will scrutinize — the exception process with its five guardrails and mandatory expiry/re-review. It defines the program's metrics (MTTR by tier, SLA compliance, KEV exposure, exception health) and who sees which report. This policy slots into the security-program document alongside Chapter 24's incident-response plan; together they are the operational core of how Meridian responds to threats. Templates live in Appendix I.

bluekit increment — vulnmgmt.py. We turn §23.3's four-signal prioritization and §23.4's SLA table into two small functions, so the priority decision becomes code instead of a debate. As always, the code is illustrative and never executed during authoring — the expected output is hand-traced in a comment.

# bluekit/vulnmgmt.py  — Chapter 23 increment
"""Risk-based vulnerability prioritization and patch SLAs for the defender's kit.

priority(): rank a finding by CVSS + EPSS + KEV (the §23.3 signals). KEV (active
            exploitation) dominates; high EPSS escalates; CVSS is the base. Asset
            context is folded in by the caller (it knows what's internet-facing).
patch_sla(): map a priority label to a remediation deadline in days (§23.4).
"""

def priority(cvss: float, kev: bool, epss: float) -> str:
    """Return a remediation priority. KEV or high exploit-probability beats raw CVSS."""
    if kev or epss >= 0.5:                 # actively exploited OR likely to be: jump the queue
        return "P1-EMERGENCY"
    if cvss >= 9.0 and epss >= 0.1:        # critical severity AND non-trivial exploit odds
        return "P2-CRITICAL"
    if cvss >= 7.0:                        # high severity, low exploit odds: scheduled
        return "P3-HIGH"
    return "P4-ROUTINE"                    # everything else: routine SLA


def patch_sla(sev: str) -> int:
    """Days-to-remediate for an (internet-facing) asset, by priority label."""
    return {"P1-EMERGENCY": 3, "P2-CRITICAL": 7, "P3-HIGH": 14, "P4-ROUTINE": 30}.get(sev, 30)


if __name__ == "__main__":
    # (label, cvss, kev, epss): four of the Meridian findings from the §23.3 table
    # (B is omitted: it shares A's CVSS, KEV, and EPSS, so priority() ranks it like A)
    findings = [
        ("A Log4Shell on banking portal", 10.0, True,  0.94),
        ("C OpenSSL on marketing site",    9.8, False, 0.02),
        ("D Local priv-esc on workstation",7.8, False, 0.01),
        ("E Windows RCE (on KEV)",         8.1, True,  0.90),
    ]
    for name, cvss, kev, epss in findings:
        p = priority(cvss, kev, epss)
        print(f"{p:13s}  SLA {patch_sla(p):2d}d  {name}")

# Expected output:
# P1-EMERGENCY   SLA  3d  A Log4Shell on banking portal
# P3-HIGH        SLA 14d  C OpenSSL on marketing site
# P3-HIGH        SLA 14d  D Local priv-esc on workstation
# P1-EMERGENCY   SLA  3d  E Windows RCE (on KEV)

Trace it by hand and watch the lesson of §23.3 fall out of the code. Finding C has the second-highest CVSS (9.8) yet lands at P3-HIGH, because its EPSS is 0.02 and it is not on KEV — nobody is exploiting it. Finding E has a lower CVSS (8.1) yet jumps to P1-EMERGENCY, because kev=True short-circuits the very first check. The function refuses to let raw severity set the order; KEV and exploit-probability do, exactly as a real program must. Asset context (is it internet-facing?) is supplied by the caller, who can tighten the SLA for exposed assets — which is precisely how findings A and B in §23.3, the identical CVE, ended up on different clocks. You have now turned the chapter's central judgment into a repeatable tool.
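How the caller might supply asset context is left open above. One possible shape is sketched below, as an assumption rather than as part of bluekit: a hypothetical `patch_sla_for_asset` that keeps the internet-facing deadlines of `patch_sla()` and adds the internal column of Figure 23.2. Mapping `P4-ROUTINE` onto the Medium tier is also an assumption, since the priority labels and the SLA tiers do not line up one to one.

```python
# Days by priority label: (internet-facing, internal).
# The first column matches patch_sla(); the second comes from Figure 23.2.
# Assumption: P4-ROUTINE is treated as the Medium tier.
SLA_BY_EXPOSURE = {
    "P1-EMERGENCY": (3, 7),
    "P2-CRITICAL": (7, 14),
    "P3-HIGH": (14, 30),
    "P4-ROUTINE": (30, 60),
}


def patch_sla_for_asset(label: str, internet_facing: bool) -> int:
    """Days-to-remediate once the caller says whether the asset is exposed."""
    exposed_days, internal_days = SLA_BY_EXPOSURE.get(label, (30, 60))
    return exposed_days if internet_facing else internal_days


if __name__ == "__main__":
    # Same label, different clocks, depending only on exposure.
    print(patch_sla_for_asset("P1-EMERGENCY", internet_facing=True))    # 3
    print(patch_sla_for_asset("P1-EMERGENCY", internet_facing=False))   # 7
```

Under these assumptions the same KEV-listed finding gets 3 days on an exposed portal and 7 days on a segmented internal host, which is one way the two instances of an identical CVE can end up on different clocks.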

Summary

This chapter built the discipline of finding, ranking, and fixing weaknesses at scale — and of governing the ones you cannot fix.

Vulnerability management is a continuous, closed-loop process: discover → assess → prioritize → remediate → verify → report, forever. It is risk reduction under permanent scarcity, not a quest to reach zero. "Fixed" is not real until Verify (re-scan) confirms it.
It is distinct from host patch management (Chapter 11), which is one tool inside the Remediate stage. Vulnerability management decides which patches matter, how fast, what to do without a patch, and how to prove the loop works.
Scan deliberately. Use authenticated (credentialed) scans for depth and accuracy on everything you can credential, and unauthenticated scans to see your true external exposure. Scan safely: inventory fragile assets, throttle, schedule, and go passive where active probing risks an outage. Read scanner output critically (false positives, real exposure).
CVSS is severity, not priority. Prioritize by risk: CVSS (how bad) × EPSS (probability of exploitation, 0–1) × KEV (it is being exploited now) × asset context (exposure and value). KEV-listed findings on exposed assets jump the queue; the same CVE on a segmented internal asset is a lower priority. This severity→risk shift is the chapter's core lesson.
Patch SLAs make prioritization accountable: risk-based, tiered deadlines (Emergency / Critical / High / Medium / Low), with tighter clocks for internet-facing and KEV-relevant assets, the clock starting at discovery, and set tighter than the compliance floor.
The exception (risk-acceptance) process is the necessary escape valve and the program's most common failure point. A legitimate exception needs a justification, a compensating control, an accountable owner, an expiry + mandatory re-review, and risk-proportional approval. The permanent "temporary" exception is how known, exploited vulnerabilities live for years.
The vulnerability that never gets fixed persists for human reasons (no patch, breakage fears, un-patchable black boxes, and above all organizational drift). Manage un-patchable risk by mitigating (segment, WAF, restrict, disable), monitoring intensely (SIEM/detection), forcing re-decisions (expiring exceptions), and tracking it openly as accepted risk.
Report the loop, not the inventory. A raw finding count is misleading. Track MTTR by tier, SLA-compliance rate, open KEV exposure, backlog aging, exception health, remediation-vs-discovery rate, and coverage — and frame them per audience (operational / tactical / board).
New terms to file: attack surface management (continuous discovery of exposed assets) feeds Discover; SBOM (software bill of materials) is introduced here and gets its full treatment in Chapter 29.

Spaced Review

Retrieval practice across this chapter and earlier ones. Answer before scrolling.

(This chapter) A scanner reports a CVSS 9.8 finding on an isolated lab box and a CVSS 6.5 finding that is on the KEV catalog and present on your internet-facing portal. Which do you remediate first, and which two signals beyond CVSS drove your decision?
(Chapter 12 — application security) Log4Shell was a flaw in a dependency most teams didn't know they were shipping. What practice from secure software development would have helped Meridian know it had Log4j in the first place, before the crisis?
(Chapter 2 — threat landscape) Using the kill-chain idea, at which stage does an attacker exploit an unpatched known vulnerability, and why does fast patching of KEV-listed flaws disrupt the attacker's plan so effectively?
(This chapter) Why does a raw count of "open vulnerabilities" rise when a program gets healthier at discovery, and what should you report instead?

Answers

1. Remediate the CVSS 6.5 KEV-listed, internet-facing finding first. The two extra signals are KEV (it is being actively exploited right now, so likelihood is real and immediate) and asset context/exposure (internet-facing and valuable vs. an isolated lab box that no attacker can reach). Risk = likelihood × impact beats raw severity.
2. Software composition analysis (SCA) and maintaining a software bill of materials (SBOM), that is, knowing your dependencies in advance (Chapter 12, expanded in Chapter 29), would have let Meridian instantly answer "where do we have Log4j?" instead of hunting blind during the crisis.
3. An attacker exploits an unpatched vulnerability at the exploitation stage (after reconnaissance and delivery) to gain initial access or escalate. Fast-patching KEV-listed flaws removes the specific, reliable, weaponized vulnerabilities attackers most depend on, forcing them toward harder, costlier, noisier methods; you take away their easiest, quietest path.
4. Because better/deeper scanning (especially authenticated and broader coverage) finds more of the vulnerabilities that were always there but invisible. The count goes up because your visibility went up, which is good. Report the loop instead: MTTR by tier, SLA compliance, open KEV exposure, backlog aging, and coverage.

What's Next

You now know how to find weaknesses, rank them by real risk, and fix or govern them before an attacker arrives. But prioritization assumes you have time to decide — and sometimes you don't, because the attacker is already inside. When a high-EPSS, KEV-listed vulnerability is exploited against you despite your best triage, the vulnerability-management loop hands off to the incident-response loop. Chapter 24 is that handoff: the NIST incident-response lifecycle — preparation, detection and analysis, containment, eradication, recovery, and lessons learned — run as a full ransomware tabletop at Meridian. You will see how the patch you didn't get to, the exception that drifted, or the asset you never scanned becomes the entry point an incident responder has to chase down. Vulnerability management is how you keep the incident from happening; incident response is what you do when it happens anyway.