Case Study 2: The Alert That Almost Didn't Fire — A University SOC

DataField.Dev

"We didn't catch them with the rule we were proud of. We caught them with the one we'd almost deleted for being too noisy." — SOC lead, Lakeside State University (constructed)

Executive Summary

Lakeside State University runs one of the hardest networks in the world to defend: tens of thousands of unmanaged student devices, an open research culture that resists firewalls on principle, and a small security team. When an attacker compromised a graduate student's credentials and began quietly moving through the network toward a research database, the university's signature-based intrusion detection system saw nothing — the attacker used only stolen credentials and built-in tools, so no signature matched. What finally caught the intrusion was a noisy anomaly rule the SOC had nearly suppressed for generating too many false positives. This case study reconstructs the incident from the detection telemetry, contrasts what the signature IDS missed with what the anomaly model caught, and dissects the tuning decisions that determined whether the one alert that mattered surfaced or drowned. It is a detection-and-analysis study — the analytical counterpart to Case Study 1's design work. All systems, logs, and figures are constructed for teaching (Tier 3).

Skills applied: reading IDS/IPS telemetry; distinguishing signature from anomaly detection in practice; analyzing why signatures miss credential-based intrusions; reasoning about the base-rate and false-positive problem; correlating weak signals; tuning detection without creating blind spots.

Background

A university network is a different animal from a bank's. Lakeside cannot run strict default-deny everywhere: researchers move large datasets to collaborators worldwide, students run every conceivable device and application, and the academic culture treats heavy-handed network controls as an affront. The SOC therefore leans hard on detection rather than prevention — it cannot block its way to safety, so it must see its way there. Its core tool is a network IDS sensor watching traffic at the campus border and at a few internal chokepoints, running a large library of signatures plus a handful of anomaly models.

The crown jewel worth protecting is not money but data: a research database holding years of unpublished genomics work, together with access to a high-performance computing cluster. The database lives on a segment with some access control (a firewall limits which subnets can reach it), but inside the academic network, where a compromised account can originate from anywhere, that boundary is porous by design.

It is worth being precise about why a university leans on detection where a bank leans on prevention, because the contrast sharpens the whole chapter. Meridian (Case Study 1) could enforce strict default-deny and 802.1X because its devices are managed, its users are employees, and its mission tolerates friction — a teller does not need to move a terabyte to a collaborator in another country at 3 a.m. Lakeside can assume almost none of that. Its threat surface is enormous and its ability to prevent is constrained by mission, so it shifts the weight of its defense from the firewall to the sensor: it cannot stop most traffic, so it must understand it. That shift makes Lakeside the perfect place to study what happens when detection is doing the heavy lifting — and what it costs when the detection that matters is buried in noise.

The SOC's detection stack, simplified:

   Lakeside detection architecture (detection-weighted, not prevention-weighted):

   campus border ─► [ IDS sensor: signatures + anomaly models ] ─► alerts ─► SOC queue
   internal chokepoints ─► [ IDS sensors ] ─────────────────────────────────┘
   research segment ─► firewall (coarse subnet rules; porous to internal accounts)

   No NAC on most ports (open culture). No strict default-deny (research mission).
   The sensor IS the security control. Its alerts ARE the visibility.

Figure CS2.0 — A detection-weighted defense. With prevention constrained by mission, the IDS sensor and its alert quality become the university's primary security control — which makes alert tuning a life-or-death matter for the data.

The SOC had a chronic frustration. One of its anomaly rules — "internal host opens a long-lived connection to a destination it has never contacted, sustained over hours" — fired constantly. Most of the time it was a researcher's legitimate (if unusual) data transfer to a new collaborator. The rule generated dozens of false positives a week, and a junior analyst had filed a ticket recommending it be disabled: "This rule is noise. It never catches anything real and it eats triage time." The SOC lead had marked the ticket "hold" rather than "approve." That decision is the hinge of this story.

The Analysis

Phase 1 — What the signature IDS saw: nothing

Reconstruct the intrusion from the telemetry. The attacker began with a phished credential — a graduate student's username and password, harvested by a fake library-login page (the same class of attack as Meridian's Chapter 1 near-miss, against a population with no phishing-resistant MFA). With valid credentials, the attacker logged into the campus VPN during normal hours and began to operate.

Here is the crucial point for this chapter: from the IDS's perspective, almost nothing looked wrong. The attacker did not deploy malware, so no malware signature fired. They used built-in administrative tools — the same SSH, the same database clients researchers use daily — so no exploit signature fired. They moved during business hours, blending into normal traffic volume. The signature IDS, the tool the SOC was proudest of, was looking for known-bad patterns, and a legitimate-looking session using stolen credentials and native tools contains no known-bad pattern to match.

   What the SIGNATURE IDS was watching for (and the attacker avoided):

   [malware payload]      -> none deployed        -> no match
   [exploit byte pattern] -> used native tools    -> no match
   [known-bad domain]     -> used internal hosts  -> no match
   [brute-force burst]    -> had valid creds      -> no match
   ─────────────────────────────────────────────────────────────
   Signature IDS verdict over the first six days: SILENT.

Figure CS2.1 — Why signatures missed it. A credential-based intrusion using native tooling presents no known-bad pattern. Signature detection is blind to an attacker who looks legitimate — its defining limitation from §7.3.
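The blindness in Figure CS2.1 can be sketched in a few lines of Python. This is a minimal illustration, not Lakeside's rule library: the byte patterns, the bad-domain list, the brute-force threshold, and both sample events are hypothetical values chosen for teaching.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    """One observed event, reduced to the fields a signature inspects."""
    payload: bytes
    dst_host: str
    failed_logins_last_minute: int


# Hypothetical signature library: every entry describes something known-bad.
BAD_PAYLOAD_PATTERNS = [re.compile(rb"\x90{16,}"), re.compile(rb"cmd\.exe /c")]
KNOWN_BAD_DOMAINS = {"malware-c2.example"}
BRUTE_FORCE_THRESHOLD = 20


def signature_matches(event: Event) -> list[str]:
    """Return the names of every signature the event trips."""
    hits = []
    if any(p.search(event.payload) for p in BAD_PAYLOAD_PATTERNS):
        hits.append("payload-pattern")
    if event.dst_host in KNOWN_BAD_DOMAINS:
        hits.append("known-bad-domain")
    if event.failed_logins_last_minute >= BRUTE_FORCE_THRESHOLD:
        hits.append("brute-force")
    return hits


# A stolen-credential SSH session to an internal host: valid login, native tool.
attacker_session = Event(
    payload=b"SSH-2.0-OpenSSH_9.6",
    dst_host="research-db.internal.example",
    failed_logins_last_minute=0,
)
# A noisy exploit attempt, for contrast.
noisy_exploit = Event(
    payload=b"\x90" * 32 + b"shellcode",
    dst_host="10.0.0.5",
    failed_logins_last_minute=0,
)

print(signature_matches(attacker_session))
print(signature_matches(noisy_exploit))
```

The attacker's session trips nothing because every check asks "does this look like something known-bad?" and a valid login with a native tool does not.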

🛡️ Defender's Lens: This is the single most important detection lesson in network security, and it is why the chapter insists you run anomaly detection alongside signatures. The most damaging real intrusions increasingly involve no malware at all — they are "living off the land," using stolen credentials and the target's own tools precisely because those leave no signature. A SOC that trusts only its signature IDS is, against this adversary, effectively blind. Signatures catch the noisy, known threats; they do not catch a quiet attacker who looks like one of your users.

Phase 2 — What the anomaly model caught

On the sixth day, the attacker found the research database and began copying it out — not in one obvious burst (they were careful), but as a sustained, long-lived connection from the compromised graduate student's workstation to an external host the student had never contacted, holding open for hours overnight. That tripped the noisy anomaly rule the junior analyst had wanted to delete:

03:11:07  [ANOMALY 4400] Long-lived connection to never-before-seen destination
          src=10.55.12.40 (grad-ws, user: jpark)  dst=203.0.113.210:443
          duration=4h12m (ongoing)  bytes_out=14.7GB  baseline_dst: NONE in 90d
          host_profile: first outbound to this /24 ever; nightly volume 600x median

On its own, this alert looked like every other false positive the rule had ever produced: a big overnight transfer to a new destination, exactly what a researcher's legitimate data share also looks like. The analyst on shift very nearly closed it as "probable research transfer, low priority." Two details stopped her. First, the volume: 14.7 GB and climbing, 600 times the host's median, which was extreme even for Lakeside. Second, the time and user: the connection had been running since shortly before 11 p.m. (the alert fired at 03:11 with 4h12m already elapsed) from a graduate student's workstation, and a quick check showed jpark was not a heavy-data researcher and had no funded project that would explain a 15-gigabyte overnight export. The anomaly rule had not told her it was an attack; it told her only that something was abnormal. The judgment that it was an attack came from a human who took the abnormal seriously enough to look.
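The logic behind a rule like ANOMALY 4400 can be sketched as follows. The field values come from the alert text above; everything else is an assumption made for illustration: the two-hour reading of "sustained over hours", the single baseline destination, and the median volume (back-computed from the alert's "600x median" figure).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Connection:
    src: str
    dst: str
    duration: timedelta
    bytes_out: int


# Assumption: "sustained over hours" is read here as two hours or more.
MIN_DURATION = timedelta(hours=2)


def is_anomalous(conn: Connection, baseline: dict[str, set[str]]) -> bool:
    """True when a host holds a long connection to a destination absent from its baseline."""
    known_destinations = baseline.get(conn.src, set())
    return conn.dst not in known_destinations and conn.duration >= MIN_DURATION


def volume_ratio(conn: Connection, median_nightly_bytes: int) -> float:
    """How many times larger this transfer is than the host's usual nightly volume."""
    return conn.bytes_out / median_nightly_bytes


# Values taken from the ANOMALY 4400 alert text.
alert_conn = Connection(
    src="10.55.12.40",
    dst="203.0.113.210",
    duration=timedelta(hours=4, minutes=12),
    bytes_out=14_700_000_000,
)
# Hypothetical 90-day baseline: the host has one other known external destination.
baseline = {"10.55.12.40": {"198.51.100.7"}}
# Assumption: median back-computed from the alert's "600x median" figure.
MEDIAN_NIGHTLY_BYTES = 24_500_000

print(is_anomalous(alert_conn, baseline))
print(volume_ratio(alert_conn, MEDIAN_NIGHTLY_BYTES))
```

Note what the function returns: a statement about abnormality, not about intent. A researcher's first transfer to a new collaborator would return the same `True`.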

🚪 Threshold Concept: Anomaly detection does not catch attacks; it catches abnormality, and most abnormality is benign. Its value is entirely realized in the moment a human decides to investigate rather than dismiss. This is why "the alert that mattered fired" is necessary but not sufficient — the alert that mattered also has to be believed. A SOC's real detection capability is the product of its tools' sensitivity and its analysts' justified willingness to chase the unusual. Kill enough false positives that the unusual is still worth chasing, and you have a SOC that catches what signatures miss.

It is worth slowing down on the analyst's actual decision, because it is the hinge on which the whole incident turned and it is teachable. Faced with ANOMALY 4400, she did not have a rule that said "this is an attack." She had an alert that said "this is unusual," sitting in a queue where the same rule had produced a benign result 39 times out of 40. What separated investigation from dismissal was a short, disciplined triage she could run in two minutes:

Does the magnitude exceed even the benign envelope? 14.7 GB at 600× the host's median was extreme even for a research transfer. A merely-unusual transfer she might have deferred; an extreme one earned a closer look.
Does the context fit a legitimate story? She checked the user. A funded genomics lab moving data fits; a graduate student in an unrelated department with no funded data project does not. The anomaly's plausibility as benign collapsed under one lookup.
Is the asset high-value? The destination itself was an external address, but the traffic tied back to a host on the research segment. An anomaly involving the crown jewels clears a lower bar for investigation than an anomaly involving a dorm printer.

None of these three checks is exotic, and all three are enrichment the SOC could have automated — and did, afterward, by attaching user-context and asset-criticality to the alert so the next analyst would not have to look them up by hand. The lesson is not "hire heroic analysts." It is "make the alert carry enough context that a tired analyst at 3 a.m. reaches the right judgment quickly." The human is the weakest link and the strongest asset (Theme 3) — and the way you strengthen the asset is to give it context, not just alerts.
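The three checks above could be automated along these lines. This is a sketch under stated assumptions, not the SOC's actual enrichment: the field names, the 100x cutoff for "extreme", and the two-reasons-to-investigate policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnrichedAlert:
    """An anomaly alert with user and asset context already attached."""
    volume_ratio: float                 # transfer size relative to the host's median
    user_has_funded_data_project: bool  # looked up from a (hypothetical) research registry
    touches_high_value_asset: bool      # looked up from a (hypothetical) asset inventory


# Assumption: anything at or above 100x median counts as outside the benign envelope.
EXTREME_RATIO = 100.0


def triage(alert: EnrichedAlert) -> tuple[str, list[str]]:
    """Apply the three checks and return a priority plus the reasons behind it."""
    reasons = []
    if alert.volume_ratio >= EXTREME_RATIO:
        reasons.append("magnitude exceeds the benign envelope")
    if not alert.user_has_funded_data_project:
        reasons.append("context does not fit a legitimate story")
    if alert.touches_high_value_asset:
        reasons.append("high-value asset involved")
    # Assumption: two or more reasons is enough to pull the alert out of the low-priority pile.
    priority = "investigate" if len(reasons) >= 2 else "defer"
    return priority, reasons


anomaly_4400 = EnrichedAlert(volume_ratio=600.0,
                             user_has_funded_data_project=False,
                             touches_high_value_asset=True)
funded_lab_transfer = EnrichedAlert(volume_ratio=5.0,
                                    user_has_funded_data_project=True,
                                    touches_high_value_asset=False)

print(triage(anomaly_4400))
print(triage(funded_lab_transfer))
```

Returning the reasons alongside the priority matters as much as the priority itself: the analyst sees why the alert was escalated and can disagree with it.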

Phase 3 — Correlation: from one ambiguous alert to a confirmed intrusion

A single anomaly is ambiguous. What converted suspicion into confirmation was correlation — pulling together weak signals from across the SOC's data sources that, individually, no one had flagged:

   Correlated timeline (assembled after the anomaly alert prompted a hunt):

   Day 1  09:42  VPN login: user jpark from a residential IP in another country
                 (jpark had never connected from abroad)            <- weak signal, unflagged
   Day 2  14:03  jpark account: first-ever SSH to 3 research hosts  <- weak signal, unflagged
   Day 4  11:20  jpark queried the genomics DB schema (unusual)     <- weak signal, unflagged
   Day 6  03:11  ANOMALY 4400: 14.7GB overnight to new external host <- the alert that fired
   ───────────────────────────────────────────────────────────────────────────────────────
   No single line was alarming. Together they are a textbook intrusion:
   initial access (stolen creds) -> lateral movement -> discovery -> exfiltration.

Figure CS2.2 — The correlated picture. Each weak signal alone was dismissable; assembled, they tell an unambiguous story. This is precisely the value a SIEM adds (Chapter 21) — and the reason isolated alerts are not enough.

The foreign VPN login, the first-ever SSH to research hosts, the unusual schema query — each had been recorded but none had been alerted on, because each in isolation was within the noise of a busy university. Only after the anomaly alert prompted the analyst to assemble jpark's activity did the pattern resolve into the four classic stages of an intrusion. The SOC contained it that night: disabled the account, isolated the workstation, and cut the exfiltration connection — which had moved about 15 GB, a serious but survivable loss compared with the entire database the attacker was minutes from finishing.
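A correlation engine's core idea, combining per-user signals that are each too weak to alert on, can be sketched in Python. The signal names, weights, and case threshold below are hypothetical; the jpark events mirror the timeline in Figure CS2.2, and `researcher_a` is an invented benign user for contrast.

```python
from collections import defaultdict

# Hypothetical weights: no single signal reaches the threshold on its own.
SIGNAL_WEIGHTS = {
    "vpn_new_country": 2,
    "first_ssh_research_host": 2,
    "db_schema_query": 2,
    "anomaly_4400": 4,
}
CASE_THRESHOLD = 8


def correlate(events):
    """Group (day, user, signal) events by user and open a case when the combined score is high.

    Each distinct signal counts once per user, so a rule that fires repeatedly
    cannot open a case by itself.
    """
    first_seen = defaultdict(dict)  # user -> {signal: day it first appeared in the input}
    for day, user, signal in events:
        first_seen[user].setdefault(signal, day)

    cases = {}
    for user, signals in first_seen.items():
        score = sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals)
        if score >= CASE_THRESHOLD:
            timeline = sorted((day, s) for s, day in signals.items())
            cases[user] = (score, timeline)
    return cases


events = [
    (1, "jpark", "vpn_new_country"),
    (2, "jpark", "first_ssh_research_host"),
    (4, "jpark", "db_schema_query"),
    (6, "jpark", "anomaly_4400"),
    (3, "researcher_a", "anomaly_4400"),  # invented benign transfer: one signal only
]

for user, (score, timeline) in correlate(events).items():
    print(user, score, timeline)
```

With these weights the lone anomaly from `researcher_a` stays below the threshold, while jpark's four signals together cross it.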

🔗 Connection: This is the strongest possible argument for the SIEM and correlation work of 🔗 Chapter 21. The signals that, correlated, screamed "intrusion" were already in the SOC's logs for six days. What was missing was the engine to combine them. Tuning a sensor reduces noise; correlation turns the surviving signals — and the ones too weak to alert on alone — into a single, believable story. Detection at scale is correlation.

Phase 4 — The tuning decision that saved the day, examined

Now confront the uncomfortable arithmetic, because the SOC lead's "hold" on the delete-this-rule ticket was not luck — it was a defensible judgment about base rates and asymmetric costs.

The anomaly rule was genuinely noisy. Say it fired roughly 40 times a week, and that 39 of those were benign research transfers, so 97.5% of its alerts were false alarms. By the raw numbers, the junior analyst had a point: the rule cost real triage time and almost always cried wolf. But the cost structure is wildly asymmetric. A false positive costs a few minutes of an analyst's time. A false negative (missing the one real exfiltration in the stream) costs the entire research database, years of work, and the university's reputation. When the downside of a miss is catastrophic and the downside of a false alarm is minutes, you tolerate a noisy rule, provided you make the noise survivable rather than deleting the detection.

Put numbers on the asymmetry to see why "delete the noisy rule" is the wrong instinct. Suppose, over a year, this rule produces about 2,080 firings (40/week × 52 weeks). If real exfiltration events are genuinely rare (say one or two a year), then the rule's per-alert probability of being a true positive is tiny, roughly $2 / 2080 \approx 0.1\%$. That is far lower than even the 1-in-40 figure used above, which would imply a real event every week. It is exactly the base-rate problem from §7.6: a detection watching for a rare event will, almost by definition, be mostly false alarms, no matter how good it is. The naive reading is "0.1% true, therefore worthless." The correct reading weighs the outcomes. The expected cost of those ~2,078 false positives is roughly 2,078 × (a few analyst-minutes); if each takes, say, five minutes, that is about 170 analyst-hours a year, a real but manageable amount of triage time. The expected cost of suppressing the rule and thereby missing the one real event is the loss of the crown-jewel database. No rational risk calculation deletes a detection whose false-positive cost is measured in minutes and whose false-negative cost is measured in years of research. The rule is noisy because it is doing its job: watching for something rare. The task is to make a 0.1% hit rate survivable, not to silence the rule.
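The arithmetic above is short enough to check directly. The firing rate and the two-events-a-year figure come from the text; the five minutes per false positive is an assumption standing in for "a few analyst-minutes".

```python
FIRINGS_PER_WEEK = 40
WEEKS_PER_YEAR = 52
TRUE_EVENTS_PER_YEAR = 2          # upper end of "one or two a year"
MINUTES_PER_FALSE_POSITIVE = 5    # assumption: stands in for "a few analyst-minutes"

firings_per_year = FIRINGS_PER_WEEK * WEEKS_PER_YEAR
false_positives = firings_per_year - TRUE_EVENTS_PER_YEAR

# Per-alert probability that a firing is a real exfiltration.
true_positive_share = TRUE_EVENTS_PER_YEAR / firings_per_year

# Yearly triage cost of keeping the rule alive.
triage_hours = false_positives * MINUTES_PER_FALSE_POSITIVE / 60

print(f"firings per year:    {firings_per_year}")
print(f"false positives:     {false_positives}")
print(f"true-positive share: {true_positive_share:.2%}")
print(f"triage cost:         {triage_hours:.0f} analyst-hours per year")
```

Under these assumptions the rule costs on the order of 170 analyst-hours a year. That is the price to weigh against the loss of the database, and it is the number that tuning is meant to shrink.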

So the SOC's correct move was never "keep the rule as-is" or "delete it." It was tune it so it stays alive but stops drowning analysts:

Tuning action	Effect on this rule	Risk if done wrong
Allowlist known heavy-data researchers	Removes most benign firings (the funded labs)	Allowlisting too broadly could hide a compromise of those accounts
Raise volume threshold + add "new destination" AND "off-hours"	Fires only on the genuinely unusual combination	Too high a threshold misses a careful low-volume thief
Prioritize by asset	An anomaly toward the genomics segment outranks a generic one	Mis-tagging asset criticality buries a real alert
Correlate, don't isolate	Combine the anomaly with VPN-geo and first-SSH signals (Phase 3)	Requires the SIEM investment of Ch.21

After the incident, the SOC implemented all four. The rule's weekly firings dropped from ~40 to ~4, and those four now arrived pre-correlated with the supporting signals, so an analyst could adjudicate each in seconds with high confidence. The detection that caught the intrusion survived — and became trustworthy instead of merely present.
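The first three tuning actions can be sketched as a filter in front of the original rule. All names and values here are hypothetical: the allowlisted account, the 5 GB threshold, and the definition of off-hours are illustrative choices, not Lakeside's settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Firing:
    """One raw firing of the long-lived-connection rule, with context attached."""
    user: str
    bytes_out: int
    new_destination: bool
    hour: int                       # local hour (0-23) at which the alert fired
    involves_research_segment: bool


# Hypothetical tuning values.
ALLOWLIST = {"genomics-lab-svc"}            # known heavy-data accounts
VOLUME_THRESHOLD = 5 * 10**9                # 5 GB
OFF_HOURS = set(range(0, 6)) | {22, 23}     # 22:00 through 05:59


def tuned_rule(f: Firing) -> Optional[str]:
    """Return an alert priority, or None when the firing is suppressed."""
    # Allowlisting removes benign noise but also hides a compromise of these accounts,
    # so the list should stay short and be reviewed.
    if f.user in ALLOWLIST:
        return None
    # Fire only on the combination: large AND new destination AND off-hours.
    # A careful low-volume thief slips under this threshold; that is the stated risk.
    if not (f.bytes_out >= VOLUME_THRESHOLD and f.new_destination and f.hour in OFF_HOURS):
        return None
    # Prioritize by asset.
    return "high" if f.involves_research_segment else "normal"


jpark = Firing("jpark", 14_700_000_000, True, 3, True)
lab = Firing("genomics-lab-svc", 80_000_000_000, True, 2, True)
daytime = Firing("student_b", 2_000_000_000, True, 14, False)

print(tuned_rule(jpark), tuned_rule(lab), tuned_rule(daytime))
```

Each suppression branch corresponds to a row in the table's "Risk if done wrong" column, which is why the comments name the blind spot each one creates.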

⚠️ Common Pitfall: "This rule is noisy, delete it" is one of the most dangerous sentences in a SOC, because the noisiest rules are often the ones watching for the subtlest attacks. The disciplined response to a noisy-but-valuable rule is to tune and correlate, not to suppress. The base-rate problem (§7.6) guarantees that any rule catching rare events will look noisy; the goal is to make the noise survivable, not to eliminate the detection along with it. Lakeside almost deleted the one rule that mattered.

Phase 5 — What changed: covering the gap signatures left

The incident's after-action review asked the right question: not "how do we write a signature for this attack?" but "how do we cover the entire class of credential-based intrusions that signatures structurally miss?" Writing a signature for the specific exfiltration the SOC had just seen would catch exactly that one technique and nothing else — the eternal weakness of signature-chasing. Lakeside instead made four structural changes, each aimed at the class rather than the instance.

First, phishing-resistant MFA for accounts that can reach research data. The entire intrusion began with a phished password against a population with no second factor — the same root cause as Meridian's Chapter 1 near-miss. Anomaly detection caught the consequences; MFA would have prevented the cause. This is the most important lesson and the cheapest: the best detection in the world is worse than not needing it.

Second, a small set of high-value behavioral detections explicitly designed for living-off-the-land activity, mapped to attacker techniques rather than malware:

Behavioral detection (technique-based)	What it catches that signatures don't
First-ever SSH from an account to a research host	Lateral movement using native tools
Database schema/enumeration queries from a non-analyst account	Discovery before exfiltration
Any large outbound transfer toward a new external destination from a research-segment host	Exfiltration, regardless of payload encryption
VPN login from a country an account has never used	Stolen-credential initial access

Three of these four would have fired on the intrusion days before the exfiltration alert, because they target the earlier kill-chain stages (initial access, lateral movement, discovery); the fourth covers exfiltration itself. None of them is a "signature" in the byte-matching sense; they are behavioral detections, and they are deliberately scoped to high-value assets so their false-positive load stays manageable.
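Several of these detections share one shape: "this account has never done this before." A minimal first-seen detector in Python shows the idea. The class, its method names, and the sample hosts and locations are hypothetical; a production version would also need a bounded history window and persistent storage.

```python
from typing import Optional


class FirstSeenDetector:
    """Alert the first time an (account, target) pair is observed."""

    def __init__(self, name: str):
        self.name = name
        self.seen: set[tuple[str, str]] = set()

    def learn(self, account: str, target: str) -> None:
        """Record historical activity without alerting (builds the baseline)."""
        self.seen.add((account, target))

    def observe(self, account: str, target: str) -> Optional[str]:
        """Return an alert string for a never-before-seen pair, else None."""
        key = (account, target)
        if key in self.seen:
            return None
        self.seen.add(key)
        return f"{self.name}: first time {account} -> {target}"


# One detector per behavior; targets are illustrative labels, not real hosts or countries.
vpn_geo = FirstSeenDetector("vpn-new-country")
first_ssh = FirstSeenDetector("first-ssh-research-host")

vpn_geo.learn("jpark", "home-country")

print(vpn_geo.observe("jpark", "home-country"))       # matches the baseline
print(vpn_geo.observe("jpark", "other-country"))      # Day 1 of the intrusion
print(first_ssh.observe("jpark", "research-host-1"))  # Day 2 of the intrusion
print(first_ssh.observe("jpark", "research-host-1"))  # a repeat does not re-alert
```

Scoping matters here: run against every account and every host on campus, a first-seen rule would be as noisy as the original anomaly rule, which is why the table restricts each one to research hosts or research-capable accounts.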

Third, correlation as the default, not the exception. The SOC fed VPN logs, authentication events, and IDS alerts into a correlation engine (the 🔗 Chapter 21 SIEM work) so that the weak signals which had sat unflagged for six days would, in future, assemble themselves into a single rising-confidence case automatically. The goal: never again require a human to manually reconstruct a timeline after the fact when the system could have surfaced it during.

Fourth, a detection-coverage map. Borrowing the discipline of knowing which attacker behaviors you can and cannot see, the SOC plotted its detections against the stages of an intrusion and found exactly the gap this incident had exposed: heavy coverage of delivery and exploitation (where signatures live) and almost none of lateral movement, discovery, and exfiltration (where credential-based attackers operate). Naming the gap turned an embarrassing miss into a prioritized roadmap. (The book develops this detection-coverage discipline fully in Part V, once the SIEM of 🔗 Chapter 21 is in place to feed it.)

   Lakeside detection coverage, BEFORE vs AFTER this incident:

   Intrusion stage     Before        After
   ─────────────────────────────────────────────
   Initial access      weak          MFA + VPN-geo anomaly
   Exploitation        STRONG (sigs) STRONG (sigs)
   Lateral movement    NONE          first-SSH behavioral
   Discovery           NONE          enumeration-query behavioral
   Exfiltration        anomaly only  anomaly + new-dest behavioral + correlation

Figure CS2.3 — The coverage gap, named. Signatures concentrated all detection at the exploitation stage; the credential-based attacker simply operated in the stages where Lakeside was blind. The fix was to build detection for the uncovered stages, not to chase the one technique that got through.
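A coverage map is simple enough to keep as data and query for gaps. The sketch below encodes the table in Figure CS2.3 as two Python dictionaries; the function names are hypothetical, and treating "NONE" as the only blind state is a simplification (a real map would also grade weak coverage).

```python
# Stage -> coverage, transcribed from the BEFORE and AFTER columns of Figure CS2.3.
BEFORE = {
    "Initial access": "weak",
    "Exploitation": "STRONG (sigs)",
    "Lateral movement": "NONE",
    "Discovery": "NONE",
    "Exfiltration": "anomaly only",
}
AFTER = {
    "Initial access": "MFA + VPN-geo anomaly",
    "Exploitation": "STRONG (sigs)",
    "Lateral movement": "first-SSH behavioral",
    "Discovery": "enumeration-query behavioral",
    "Exfiltration": "anomaly + new-dest behavioral + correlation",
}


def blind_stages(coverage: dict[str, str]) -> list[str]:
    """Stages with no detection at all."""
    return [stage for stage, level in coverage.items() if level == "NONE"]


def changed_stages(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Stages whose coverage differs between the two maps."""
    return [stage for stage in before if before[stage] != after.get(stage)]


print("blind before:", blind_stages(BEFORE))
print("blind after: ", blind_stages(AFTER))
print("changed:     ", changed_stages(BEFORE, AFTER))
```

Keeping the map as data means the question "where are we blind?" can be re-asked whenever a detection is added, tuned, or retired, instead of only after an incident.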

🚪 Threshold Concept: The right response to a detection miss is almost never "write a signature for what we just missed." It is "which class of attacker behavior is invisible to us, and how do we cover the class?" Signature-chasing produces a detection that catches exactly one technique and nothing adjacent; behavioral, technique-mapped detection plus a coverage map catches the family of attacks and tells you honestly where you are still blind. Detection engineering is the discipline of covering classes, not instances — and it is what Part V of this book builds on top of the foundations here.

Discussion Questions

The signature IDS was silent for six days against a credential-based, "living off the land" intrusion. Does this mean signature detection is worthless? Argue the precise role signatures should play, and what they should not be relied upon for.
The analyst nearly dismissed ANOMALY 4400 as "probable research transfer." What specific changes — technical or procedural — make an analyst more likely to investigate rather than dismiss a plausible false positive? Is there a limit to how much you can fix this with tooling alone?
The four weak signals in Phase 3 were recorded but never alerted on individually. Should the SOC have alerted on, say, "VPN login from a never-before-seen country"? What is the cost of alerting on each weak signal versus correlating them, and how does base rate inform the answer?
Lakeside cannot run strict default-deny like Meridian's bank because of its open research culture. Given that constraint, how should a university weight prevention versus detection differently from a bank, and what does it give up by doing so?
The SOC lead held the delete-this-rule ticket on judgment, not policy. Should the decision to suppress a detection rule require formal review? Draft the one-sentence policy you would put in place.

Your Turn

Find (or construct) a detection scenario where a signature system would stay silent: a stolen-credential intrusion, an insider misusing legitimate access, or a novel technique. (1) Write three weak signals it would generate that, individually, a busy SOC might dismiss. (2) Write the single anomaly condition most likely to surface it, and honestly estimate its false-positive rate. (3) Propose two tuning actions that would keep that anomaly rule alive but make its noise survivable, and (4) describe the correlated alert you would build by combining the anomaly with the weak signals. Finish with one sentence completing: "A signature IDS would have caught none of this because ______."

Key Takeaways

Signature detection is blind to credential-based, "living off the land" intrusions — no malware, no exploit, no known-bad pattern means no match. It catches known threats; it does not catch an attacker who looks like a legitimate user.
Anomaly detection catches abnormality, not attacks — its value is realized only when a human takes the abnormal seriously enough to investigate. Tooling sensitivity × analyst judgment = real detection.
Correlation turns weak signals into a confirmed intrusion. The signals that screamed "breach" were already in the logs for six days; what was missing was the engine to combine them (Chapter 21).
The base-rate problem makes any rare-event detector look noisy. A 97.5%-false-positive rule can still be the one that saves you when the cost of a miss is catastrophic and the cost of a false alarm is minutes.
"Noisy, delete it" is dangerous — the noisiest rules often watch for the subtlest attacks. Tune and correlate a valuable-but-noisy rule; do not suppress the detection along with the noise.
Detection-heavy defense suits open networks (universities, research) that cannot prevent their way to safety — but it demands disciplined tuning so the alert that matters is both fired and believed.
The right fix for a missed attack is structural, not reactive: Lakeside added phishing-resistant MFA (kill the root cause), technique-mapped behavioral detection for the uncovered kill-chain stages, default correlation, and a coverage map — not a one-off signature for the exact thing that got through.
Read alongside Case Study 1, the contrast is the lesson: a bank with managed devices and a friction-tolerant mission can lean on prevention (default-deny, NAC); an open research network must lean on detection — and the same chapter concepts (signatures vs. anomalies, base rates, correlation) govern both, just weighted differently.