Case Study 2: The Alert That Wasn't Noise

DataField.Dev

"The attack didn't look like an attack. It looked like a busy Tuesday — until you put three logs next to each other." — SOC analyst, Lakeshore Regional Health (constructed)

Executive Summary

A regional hospital network's Security Operations Center caught an active intrusion at about 02:30 UTC on a Wednesday. There was no single dramatic alarm. A correlation rule stitched two quiet, individually plausible events into one high-severity alert, and the analyst's first query added a third: a clinician's account authenticated from an impossible second location, then accessed thousands of patient records it had never touched, then began moving toward a file server. Each event alone would have died in the queue as noise. Together, read in sequence by a SIEM tuned to value rather than volume, they were an account takeover in progress, caught at the discovery stage instead of discovered months later in a breach-notification letter.

This case is the mirror image of Case Study 1. That study was about building a SIEM; this one is about using one — the detection-and-investigation craft of a SOC on a live incident, in a sector where the stakes are patient safety and regulatory exposure under health-privacy law, and where downtime can be measured in delayed care. It is analysis-heavy where the Meridian case was build-heavy. The organization, people, and all figures are constructed for teaching (Tier 3), though the pattern reflects widely-reported healthcare account-takeover incidents.

Skills applied: reading a fired correlation alert; pivoting via SIEM queries (SQL/SPL/KQL) to scope an incident; distinguishing a true positive from a false positive under time pressure; recognizing the sequence behind individually-benign events; using normalized cross-source logs (identity, application, network) together; handing off to incident response and feeding tuning back into the SIEM.

Background: Lakeshore Regional Health

Lakeshore Regional Health runs four hospitals and a dozen clinics. Its crown jewel is the electronic health record (EHR) system — the database of patient records that clinicians touch hundreds of times a shift, and that fraud groups prize because medical records sell well and enable insurance fraud and identity theft. Lakeshore's small SOC had, the year before, done exactly what Meridian did in Case Study 1: prioritized identity and application logs, normalized them to a common schema in UTC, and written a starter catalog of use cases — including two that will matter here:

Use case I:  Impossible travel — same user, two logins too far apart to be one human  [sequence/geo]
Use case II: Anomalous EHR bulk access — a user accesses far more patient records than
             their role's baseline, in a short window                                  [threshold/behavioral]

Lakeshore had also learned Case Study 1's hardest lesson the hard way: in its first month, the EHR bulk-access rule fired constantly on legitimate heavy users (a records-department clerk pulling hundreds of charts for an audit is normal), and an earlier analyst had nearly disabled it. Instead the team tuned it — baselining "normal" per role, so the rule fired on deviation from a role's own pattern rather than on a raw count. That tuning is why, on the Wednesday in question, the rule had the fidelity to mean something when it fired.

🔗 Connection: Lakeshore's choices echo the chapter directly: identity-first collection (§21.2), normalization to a common schema (§21.2), use cases across the correlation ladder (§21.3), and tuning to fidelity rather than disabling (§21.5). This case study is what happens after a team has done all of that well — the payoff of a SIEM built right.

How the compound rule came to exist

The single most important fact about the Wednesday catch is that the rule which made it was not in any vendor's default pack. It was built deliberately, three months earlier, out of a frustration that will be familiar to anyone who has run a SOC: the two detections that should have protected the EHR were each, on their own, too noisy to trust.

Consider the position Lakeshore's lead detection engineer was in. Impossible travel (use case I) is one of the most-recommended detections in healthcare — clinicians' accounts are the keys to the record kingdom — and also one of the noisiest, for exactly the reasons §21.5 names: a doctor connected through a national telehealth VPN egresses from a different city than their home, and the SIEM, reading two src_ip geolocations, declares "impossible travel." The rule fired dozens of times a day, almost always on this benign artifact. Anomalous EHR access (use case II) had the same disease from the other side: a records-department clerk legitimately pulls four hundred charts during an insurance audit, and a raw-count threshold screams "exfiltration." Both detections, run as independent binary alerts, were headed for the graveyard of disabled rules — and an earlier analyst had in fact muted use case II for a week before a supervisor caught it.

The engineer's insight — and it is the transferable lesson of this case — was to stop treating the two as separate alerts and correlate them. Neither weak signal is trustworthy alone; their coincidence on one account in a short window is. The reasoning is probabilistic: benign impossible-travel artifacts are common, and benign bulk-access events are common, but the chance that a random benign instance of each lands on the same account within minutes is small. By requiring both, the compound rule converts two low-fidelity detections into one high-fidelity one — trading a little coverage (it will miss an attacker who triggers only one of the two behaviors) for an enormous gain in precision (when it fires, it almost always means something).
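
The probabilistic argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the account count and both daily false-positive rates are hypothetical, not Lakeshore figures, and it assumes the two benign artifacts are independent and spread uniformly over the day.

```python
# All numbers are hypothetical, chosen only to illustrate the arithmetic.
ACCOUNTS = 5_000          # assumed number of clinician accounts
P_BENIGN_TRAVEL = 0.01    # assumed daily chance of a benign "impossible travel" artifact per account
P_BENIGN_BULK = 0.005     # assumed daily chance of a benign bulk-access event per account
BUCKETS_PER_DAY = (24 * 60) // 15  # 96 fixed 15-minute buckets


def expected_benign_alerts_per_day(accounts, p_travel, p_bulk, buckets):
    """Expected benign firings per day, assuming the two artifacts are independent."""
    travel_alone = accounts * p_travel
    bulk_alone = accounts * p_bulk
    # Both on the same account on the same day, and in the same 15-minute bucket.
    # The 1/buckets factor assumes event times are uniform over the day (a rough simplification).
    compound = accounts * p_travel * p_bulk / buckets
    return {"travel_alone": travel_alone, "bulk_alone": bulk_alone, "compound": compound}


rates = expected_benign_alerts_per_day(ACCOUNTS, P_BENIGN_TRAVEL, P_BENIGN_BULK, BUCKETS_PER_DAY)
for name, value in rates.items():
    print(f"{name}: {value:.4f} expected benign alerts/day")
```

Under these assumed rates each component alone would produce tens of benign alerts a day, while the conjunction would produce a benign alert roughly once a year. Real signals are rarely fully independent (a clinician on a telehealth VPN may also be the one doing an audit), so the actual gain is likely smaller, but the direction of the effect is the point.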

Design of "account_takeover_compound":
  fires when, for a single user within a 15-minute window, BOTH:
     (A) impossible_travel:  two successful logins whose src_ip geolocations imply
                             a travel speed no human could achieve, AND
     (B) anomalous_ehr_access: record_view count >= 5x the user's role-and-hour baseline
  severity:   High (each component is weak; the conjunction is strong)
  tuning:     (A) allowlists corporate/telehealth VPN egress + cloud IP space;
              (B) baseline is per ROLE and per HOUR-OF-DAY, not a global raw count
  trade-off:  misses attackers who trigger only ONE behavior -> covered by keeping
              the component rules ALSO running at LOW severity (feed a risk score),
              so a single behavior still accrues attention without paging
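
The conjunction at the heart of that design can be sketched in a few lines. This is a minimal illustration, not Lakeshore's rule: the `ComponentHit` record shape and the `audit_clerk` user are hypothetical, and it assumes the two component rules have already been evaluated and emit timestamped hits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=15)  # from the rule design


@dataclass(frozen=True)
class ComponentHit:
    """One firing of a component rule (hypothetical record shape)."""
    rule: str            # "impossible_travel" or "anomalous_ehr_access"
    user: str
    timestamp: datetime  # UTC


def compound_alerts(hits):
    """Return one High alert per user whose two components fire within WINDOW."""
    by_user = {}
    for hit in hits:
        by_user.setdefault(hit.user, {}).setdefault(hit.rule, []).append(hit.timestamp)

    alerts = []
    for user in sorted(by_user):
        travel = by_user[user].get("impossible_travel", [])
        bulk = by_user[user].get("anomalous_ehr_access", [])
        # Any travel/bulk pair close enough in time satisfies the conjunction.
        if any(abs(b - a) <= WINDOW for a in travel for b in bulk):
            alerts.append({"rule": "account_takeover_compound", "user": user, "severity": "High"})
    return alerts


def utc(hh, mm, ss):
    return datetime(2025, 5, 14, hh, mm, ss, tzinfo=timezone.utc)


hits = [
    ComponentHit("impossible_travel", "dr_okeefe", utc(2, 20, 3)),
    ComponentHit("anomalous_ehr_access", "dr_okeefe", utc(2, 29, 10)),
    ComponentHit("anomalous_ehr_access", "audit_clerk", utc(2, 25, 0)),  # hypothetical benign user
]
print(compound_alerts(hits))
```

The clerk's bulk access alone produces no page here, which is the trade-off the specification names: single-component hits need the separate low-severity path to be seen at all.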

Two design choices in that specification deserve emphasis because they are the difference between a rule that works and one that does not. First, the baseline for component B is per role and per hour of day, not a global number: an overnight ICU nurse's normal is different from a daytime billing clerk's. "2,140 records in eight minutes" is damning precisely because the count alone is more than fifty times that role's overnight baseline of ~40 per hour, and it arrived in under eight minutes, a rate roughly 400 times normal. A raw global threshold would either miss the attack (if set high to accommodate the audit clerk) or drown in false positives (if set low). Tuning to deviation from a peer-and-time baseline is what gives the detection its teeth, and the technique is the doorway to the user-and-entity behavior analytics of Chapter 34. Second, the team did not simply discard the two component rules once the compound rule existed; they kept them running at low severity, feeding a per-user risk score, so that an attacker who triggers only one behavior (say, an insider doing bulk access from their normal IP) still accrues attention without paging anyone. Coverage was preserved as a quiet signal; fidelity was purchased for the loud one.
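
Both choices are easy to express in code. The sketch below is a simplified illustration under stated assumptions: the clinician baseline of 40 per hour comes from the case, but the clerk baseline, the role names, and the risk-point values are hypothetical.

```python
from collections import defaultdict

# Baselines in record views per hour, keyed by (role, hour of day in UTC).
# The clinician figure is from the case; the clerk figure is an assumption.
BASELINES = {
    ("clinician", 2): 40,
    ("records_clerk", 14): 400,
}
DEVIATION_THRESHOLD = 5.0  # component B fires at >= 5x baseline

# Hypothetical points a low-severity component hit adds to a user's risk score.
RISK_POINTS = {"impossible_travel": 20, "anomalous_ehr_access": 30}


def deviation_ratio(role, hour, observed):
    """How many times the role-and-hour baseline the observed count represents."""
    return observed / BASELINES[(role, hour)]


def is_anomalous(role, hour, observed):
    return deviation_ratio(role, hour, observed) >= DEVIATION_THRESHOLD


risk_scores = defaultdict(int)


def record_component_hit(user, rule):
    """Accrue risk quietly instead of paging; returns the user's running score."""
    risk_scores[user] += RISK_POINTS[rule]
    return risk_scores[user]


print(deviation_ratio("clinician", 2, 2140))   # 2,140 views against ~40/hour
print(is_anomalous("records_clerk", 14, 450))  # a busy but normal audit hour
print(record_component_hit("insider_example", "anomalous_ehr_access"))
```

Keying the baseline on role and hour is what lets the same count be alarming for one account and routine for another; a single global number cannot express that.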

🚪 Threshold Concept: The most valuable detections are often not new data sources or cleverer single rules — they are combinations of signals you already have. Lakeshore did not buy anything to catch the Wednesday attack; it recombined two detections it already owned but could not trust individually. Detection engineering is frequently the art of composing weak signals into strong ones. When a rule is too noisy to keep, the question before "disable it" should be "what could I correlate it with to make it trustworthy?"

Phase 1 — The alert fires

At 02:31 UTC, the SOC's queue surfaced a single high-severity correlated alert. Not a wall of alerts — one, scored high precisely because it combined two of Lakeshore's use cases into a compound signal. The analyst on shift, call her Reza, saw this:

ALERT  sev=High  rule="account_takeover_compound"  user=dr_okeefe
  component A (impossible_travel):
     02:14:55Z  source=entra  user=dr_okeefe  src_ip=198.51.100.40  city=approx-A  outcome=success
     02:20:03Z  source=entra  user=dr_okeefe  src_ip=203.0.113.210  city=approx-B  outcome=success
     -> two successful logins ~5 min apart from locations no human could traverse
  component B (anomalous_ehr_access):
     02:21:40Z..02:29:10Z  source=ehr  user=dr_okeefe  action=record_view  count=2,140
     -> dr_okeefe's role baseline is ~40 records/hour overnight; observed 2,140 in 8 min

Two facts, each suspicious, correlated on the same account within minutes: a login from an impossible second location, immediately followed by record access two orders of magnitude above that clinician's overnight baseline. The compound rule existed because the team understood the chapter's core idea — that attacks are sequences, and a rule reading the sequence sees what single-event rules miss. Either component alone might have been tuned-down noise; together, scored as one alert, they screamed.
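
How component A decides that a pair of logins is "impossible" can be sketched with a great-circle distance and an implied speed. The case gives only approx-A and approx-B, so the coordinates below are hypothetical, as are the speed ceiling, the minimum distance, and the allowlisted range; only the two timestamps and source IPs come from the alert.

```python
import ipaddress
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

MAX_SPEED_KMH = 900.0    # assumed ceiling, roughly airliner cruising speed
MIN_DISTANCE_KM = 100.0  # assumed floor, to ignore geolocation jitter
# Hypothetical allowlist of VPN egress ranges (documentation address space).
ALLOWLISTED_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def is_allowlisted(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOWLISTED_NETS)


def is_impossible_travel(login_a, login_b):
    """Each login is (timestamp, src_ip, lat, lon)."""
    (t1, ip1, lat1, lon1), (t2, ip2, lat2, lon2) = login_a, login_b
    if is_allowlisted(ip1) or is_allowlisted(ip2):
        return False  # VPN egress geolocation says nothing about where the person is
    distance = haversine_km(lat1, lon1, lat2, lon2)
    if distance < MIN_DISTANCE_KM:
        return False
    hours = abs((t2 - t1).total_seconds()) / 3600
    if hours == 0:
        return True  # far apart at the same instant
    return distance / hours > MAX_SPEED_KMH


login_a = (datetime(2025, 5, 14, 2, 14, 55, tzinfo=timezone.utc), "198.51.100.40", 41.88, -87.63)
login_b = (datetime(2025, 5, 14, 2, 20, 3, tzinfo=timezone.utc), "203.0.113.210", 52.52, 13.40)
print(is_impossible_travel(login_a, login_b))
```

The allowlist check runs first on purpose: it is the tuning step that removes the benign VPN artifact before any speed is computed.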

🛡️ Defender's Lens: Component A alone (impossible travel) is a notoriously noisy detection: VPNs and cloud egress produce false "travel" constantly, which is why Lakeshore, like Meridian, had tuned it. Component B alone (bulk access) has a legitimate version (audits, records requests). The design insight is that correlating two individually noisy signals produces one high-fidelity alert: the probability that a benign impossible-travel artifact and a benign bulk-access event land on the same account within the rule's 15-minute window is low. Compounding weak signals into strong ones is a frontline weapon against alert fatigue (§21.5) and a preview of the risk-based alerting and analytics in Chapter 34.

Phase 2 — Triage: true positive or false positive?

Reza's job in the next five minutes was the most important judgment in security operations: is this real? She did not assume; she queried. The chapter's discipline — lead with a time bound, pivot from the alert to the account's full activity — is exactly what she did.

Her first query asked what dr_okeefe had done in the last hour, across all sources, in time order. In KQL (Lakeshore runs Microsoft Sentinel):

Events
| where timestamp >= ago(1h)
| where user == "dr_okeefe"
| project timestamp, source, action, src_ip, outcome, extra
| sort by timestamp asc

The result told the story the alert had only sketched:

01:58:12Z  entra  login        198.51.100.40  success   (normal: dr_okeefe's home IP, recent days)
02:14:55Z  entra  login        198.51.100.40  success   <-- normal location
02:20:03Z  entra  login        203.0.113.210  success   <-- IMPOSSIBLE second location, new IP
02:21:40Z  ehr    record_view  203.0.113.210  success   x2,140 in 8 min  <-- from the new IP
02:29:55Z  smb    file_access  203.0.113.210  success   \\FILESRV4\share  <-- pivot toward a file server

Three things confirmed a true positive, fast:

The bulk EHR access came from the new, impossible IP (203.0.113.210), not the clinician's normal address — ruling out the benign "doctor doing a chart review from their office" explanation.
A second use case had just tripped that was not in the original alert: an smb file-access event from the same impossible IP at 02:29, suggesting the attacker had finished harvesting records and was moving laterally (the discovery → collection → lateral-movement shape of the kill chain).
The legitimate session was still live from the home IP — meaning this was not the doctor relocating; it was a second, concurrent session from an attacker who had the credentials.
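
The three confirmations can be expressed as checks over the timeline Reza pulled. The sketch below is one possible encoding, not her actual procedure: the tuple layout is hypothetical, and the 60-minute session lifetime used to infer a still-live session is an assumption, since the case does not state one.

```python
from datetime import datetime, timedelta

# Timeline from the analyst's query result (times are UTC on the incident day).
TIMELINE = [
    ("01:58:12", "entra", "login", "198.51.100.40"),
    ("02:14:55", "entra", "login", "198.51.100.40"),
    ("02:20:03", "entra", "login", "203.0.113.210"),
    ("02:21:40", "ehr", "record_view", "203.0.113.210"),
    ("02:29:55", "smb", "file_access", "203.0.113.210"),
]
KNOWN_IPS = {"198.51.100.40"}        # the user's usual address in recent days
SESSION_TTL = timedelta(minutes=60)  # assumed session lifetime, not from the case


def parse(hms):
    return datetime.strptime(hms, "%H:%M:%S")


def triage(timeline, known_ips):
    """Return the three facts that separate a takeover from a benign explanation."""
    new_ip_events = [e for e in timeline if e[3] not in known_ips]
    ehr_ips = {ip for _, source, _, ip in timeline if source == "ehr"}
    logins = [(parse(t), ip) for t, _, action, ip in timeline if action == "login"]
    new_logins = [t for t, ip in logins if ip not in known_ips]
    known_logins = [t for t, ip in logins if ip in known_ips]
    first_new = min(new_logins) if new_logins else None
    return {
        # 1. Bulk access came only from an unfamiliar IP.
        "bulk_access_from_new_ip": bool(ehr_ips) and ehr_ips.isdisjoint(known_ips),
        # 2. The unfamiliar IP touched systems beyond identity and the EHR.
        "other_sources_from_new_ip": sorted({e[1] for e in new_ip_events} - {"entra", "ehr"}),
        # 3. A known-IP login is recent enough that its session is presumably still live.
        "concurrent_session_likely": (
            first_new is not None
            and any(timedelta(0) <= first_new - k <= SESSION_TTL for k in known_logins)
        ),
    }


print(triage(TIMELINE, KNOWN_IPS))
```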

⚠️ Common Pitfall: The fastest way to get this wrong is to dismiss the alert because "impossible travel is always noise." Reza had been burned by false positives on that rule before, and the temptation to wave it away was real — which is precisely how a desensitized SOC misses the real one. The defense against that temptation is the query: she did not trust her fatigue or the alert; she pulled the ground truth and let the logs decide. When in doubt, query — do not assume, in either direction.

Phase 3 — Scoping with cross-source queries

Confirmed true positive. Now Reza had to answer the questions incident response (Chapter 24) would need in the next ten minutes: how did they get in, how far did they get, and is anyone else affected? This is pure SIEM querying, pivoting across the normalized sources.

How did they get in? She queried the authentication history for any brute-force or spray preceding the foothold:

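-- Users with 10 or more failed logins in the last 24 hours.
-- user and timestamp are reserved words in some SQL dialects and may need quoting.
SELECT user, COUNT(*) AS fails, MIN(timestamp) AS t0, MAX(timestamp) AS t1
FROM events
WHERE timestamp >= NOW() - INTERVAL '24' HOUR
  AND action = 'login' AND outcome = 'failure'
GROUP BY user
HAVING COUNT(*) >= 10
ORDER BY fails DESC;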

The result was illuminating: dr_okeefe showed no failed-login burst — meaning the credential was not brute-forced but phished or reused (the attacker logged straight in). That ruled out one entry vector and pointed investigators toward email and credential-reuse, sharpening the response.
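
The same check, and the way an empty result for one user becomes a finding, can be sketched outside the SIEM. The event dictionaries, the `svc_kiosk` account, and the query time below are hypothetical; only the victim's successful login at 02:20:03Z comes from the case.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def failed_login_bursts(events, now, threshold=10, lookback=timedelta(hours=24)):
    """Count failed logins per user inside the lookback window; keep users at or above threshold."""
    cutoff = now - lookback
    fails = Counter(
        e["user"]
        for e in events
        if e["action"] == "login"
        and e["outcome"] == "failure"
        and cutoff <= e["timestamp"] <= now
    )
    return {user: n for user, n in fails.items() if n >= threshold}


now = datetime(2025, 5, 14, 2, 40, tzinfo=timezone.utc)  # assumed time of the query
events = [
    # Hypothetical noisy account, included so the overall result is not trivially empty.
    *[{"user": "svc_kiosk", "action": "login", "outcome": "failure",
       "timestamp": now - timedelta(minutes=m)} for m in range(12)],
    {"user": "dr_okeefe", "action": "login", "outcome": "success",
     "timestamp": datetime(2025, 5, 14, 2, 20, 3, tzinfo=timezone.utc)},
]
bursts = failed_login_bursts(events, now)
# The negative finding: no burst for the victim, so brute force is an unlikely entry vector.
print("dr_okeefe" in bursts, bursts)
```

The negative only carries weight if authentication logging is known to be complete for the window; an empty result from a source that was not collecting proves nothing.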

How far did they get? She listed every distinct resource the impossible IP touched:

index=* src_ip="203.0.113.210" earliest=-24h
| stats earliest(_time) AS first, latest(_time) AS last, values(action) AS actions by source, host
| sort first

This returned the full blast radius: the Entra logins, the 2,140 EHR record views, and the one file-server access — but, reassuringly, no successful access to backup systems, domain controllers, or other clinician accounts. The attacker had been in for fifteen minutes and reached patient records and one share.
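
The shape of that result, one row per source and host with first seen, last seen, and the actions taken, is a plain group-by. A rough Python equivalent is sketched below; the event dictionaries and the `idp` and `ehr-app` host names are hypothetical, while the IPs, times, and FILESRV4 come from the case.

```python
def blast_radius(events, src_ip):
    """Group one IP's events by (source, host): first seen, last seen, distinct actions."""
    summary = {}
    for e in events:
        if e["src_ip"] != src_ip:
            continue
        key = (e["source"], e["host"])
        row = summary.setdefault(key, {"first": e["time"], "last": e["time"], "actions": set()})
        # Same-day, same-format UTC strings compare correctly as text.
        row["first"] = min(row["first"], e["time"])
        row["last"] = max(row["last"], e["time"])
        row["actions"].add(e["action"])
    # Sort by first-seen time so the output reads as a timeline.
    return sorted(summary.items(), key=lambda item: item[1]["first"])


events = [
    {"time": "02:14:55Z", "source": "entra", "host": "idp", "action": "login", "src_ip": "198.51.100.40"},
    {"time": "02:20:03Z", "source": "entra", "host": "idp", "action": "login", "src_ip": "203.0.113.210"},
    {"time": "02:21:40Z", "source": "ehr", "host": "ehr-app", "action": "record_view", "src_ip": "203.0.113.210"},
    {"time": "02:29:10Z", "source": "ehr", "host": "ehr-app", "action": "record_view", "src_ip": "203.0.113.210"},
    {"time": "02:29:55Z", "source": "smb", "host": "FILESRV4", "action": "file_access", "src_ip": "203.0.113.210"},
]
radius = blast_radius(events, "203.0.113.210")
for (source, host), row in radius:
    print(source, host, row["first"], row["last"], sorted(row["actions"]))

# What the IP did NOT touch matters as much as what it did.
touched_sources = {source for (source, _), _ in radius}
print("backup" in touched_sources)
```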

What is the attacker's infrastructure? While querying, Reza also let the SIEM's enrichment do its work — the kind of automated context a SOAR supplies (§21.6) so an analyst does not pivot to a dozen external tools by hand. The alert arrived pre-decorated with reputation and geolocation for 203.0.113.210: a hosting-provider address (not a residential ISP a clinician would use), recently seen in threat-intel feeds, geolocating to a region where Lakeshore has no staff. None of this proved malice — reputation data is a hint, not a verdict — but it raised confidence and shaped the message to the incident commander: this looks like a deliberate account takeover from attacker-controlled infrastructure, not a clinician on an unusual network. The lesson is that querying and enrichment work together: the analyst reconstructs what happened from the logs, while automated enrichment supplies who and where without costing her the minutes that matter.

Which records, exactly? Because the EHR application logs record-level access, Reza could enumerate the specific patient identifiers the attacker viewed — the question the privacy and legal teams would ask first (Phase 5):

Events
| where timestamp between (datetime(2025-05-14T02:21:00Z) .. datetime(2025-05-14T02:30:00Z))
| where source == "ehr" and user == "dr_okeefe" and src_ip == "203.0.113.210"
| where action == "record_view"
// count_distinct() is exact; dcount() is an estimate and is not suitable for a breach count
| summarize records = count_distinct(patient_id), sample_ids = make_set(patient_id, 5)

The query returned a precise count of distinct records — the difference between telling regulators "an account was misused" (a guess) and "these 2,140 specific records were accessed in this nine-minute window" (a fact). This is "logs are the ground truth" (§21.1) doing its second job: not only detection, but the defensible, record-level scoping that breach law demands.

Is anyone else affected? She checked whether 203.0.113.210 had authenticated as any other user (credential spraying she might have missed) and whether any other account showed the impossible-travel or bulk-access pattern in the same window. Both came back clean. The incident was scoped to one account and a fifteen-minute window — information that let the incident commander make confident containment decisions instead of shutting down the whole hospital network during a night shift.

🔄 Check Your Understanding: Reza's brute-force query came back empty, and she treated that empty result as valuable information rather than a dead end. What did the absence of a failed-login burst tell her about the entry vector, and why is "the dog that didn't bark" often as informative in a SIEM investigation as a positive hit? (Hint: a SIEM lets you confirm what did not happen, which narrows the hypothesis space.)

Phase 4 — Containment, and feeding the lesson back

With the scope in hand, the SOC handed off to incident response, whose containment playbook — the kind a SOAR can automate (§21.6, and Chapter 24) — executed quickly: disable dr_okeefe, kill the active sessions, block the impossible IP at the edge, and force a credential reset. Because the SIEM had caught the intrusion at the discovery/collection stage rather than after exfiltration to the outside, the file-server access had not yet led to data leaving the network; the EHR records viewed still had to be treated as potentially compromised for breach-assessment purposes, but the lateral movement was stopped before it spread.

The final, often-skipped step was the one that made Lakeshore better: feeding the incident back into the SIEM. The post-incident review produced concrete tuning and new detections:

The compound account_takeover rule had worked; the team lowered the EHR-access deviation threshold slightly for overnight hours (when baselines are low and an attacker's bulk access stands out more cleanly) and added the SMB-file-access pivot as a third component, so a similar attack would now alert even faster.
They added a new use case: successful login from a new IP immediately followed by access to a high-value system, while a prior session from the user's normal IP is still active — the "concurrent session from a new location" pattern that had been the clincher here.
They confirmed the impossible-travel rule's tuning (corporate-VPN allowlist) had not been the reason it almost got dismissed; the real risk had been analyst fatigue, addressed by the compounding design.
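
The new "concurrent session from a new location" use case can be sketched as a single pass over time-ordered events. This is one possible specification, not the rule Lakeshore deployed: the session lifetime, the follow-up window, the high-value system list, and the event shape are all assumptions, and a production rule would use real sign-out or session logs instead of a fixed lifetime.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(minutes=60)    # assumed: how long a login is treated as a live session
FOLLOW_WINDOW = timedelta(minutes=10)  # assumed: what counts as "immediately followed by"
HIGH_VALUE_SOURCES = {"ehr"}           # assumed list of high-value systems


def concurrent_new_location(events, known_ips):
    """events: time-sorted dicts with time, user, source, action, src_ip.
    known_ips: user -> set of that user's usual IPs.
    Yields (user, new_ip, login_time) for each match."""
    last_known_login = {}  # user -> time of most recent login from a usual IP
    pending = {}           # (user, new_ip) -> time of login from an unfamiliar IP
    fired = set()
    for e in events:
        user, ip, t = e["user"], e["src_ip"], e["time"]
        usual = ip in known_ips.get(user, set())
        if e["action"] == "login":
            if usual:
                last_known_login[user] = t
            else:
                prior = last_known_login.get(user)
                # Only interesting if a usual-IP session is presumably still live.
                if prior is not None and t - prior <= SESSION_TTL:
                    pending[(user, ip)] = t
        elif e["source"] in HIGH_VALUE_SOURCES and (user, ip) in pending:
            login_time = pending[(user, ip)]
            if t - login_time <= FOLLOW_WINDOW and (user, ip) not in fired:
                fired.add((user, ip))
                yield user, ip, login_time


def at(hh, mm, ss):
    return datetime(2025, 5, 14, hh, mm, ss, tzinfo=timezone.utc)


events = [
    {"time": at(2, 14, 55), "user": "dr_okeefe", "source": "entra", "action": "login", "src_ip": "198.51.100.40"},
    {"time": at(2, 20, 3), "user": "dr_okeefe", "source": "entra", "action": "login", "src_ip": "203.0.113.210"},
    {"time": at(2, 21, 40), "user": "dr_okeefe", "source": "ehr", "action": "record_view", "src_ip": "203.0.113.210"},
]
known = {"dr_okeefe": {"198.51.100.40"}}
print(list(concurrent_new_location(events, known)))
```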

🚪 Threshold Concept: A SIEM is not a product you install; it is a loop you run. Detect → investigate → respond → tune and add detections → detect better next time. The Lakeshore SOC was effective on the Wednesday because of incidents it had learned from on every prior Tuesday. The most valuable output of an incident is not the containment — it is the improved detection that makes the next incident shorter. This is "security is a process, not a product" (Theme 1) made operational in the SOC.

Phase 5 — The breach-assessment question, answered from logs

Containment ended the intrusion; it did not end the incident. In a hospital, an account takeover that touched patient records triggers a regulatory process — the organization must determine whether a reportable breach of protected health information occurred and, if so, notify affected individuals and authorities within legally-defined windows. That determination is not a security question or a legal question alone; it is a factual one, and the facts come from exactly one place: the logs.

Here the value of the build (and of "logs are the ground truth") compounds. Because Lakeshore logged record-level EHR access and retained it — in the SIEM hot, and in the data lake for the longer window breach law may require — the privacy team could answer the questions that decide scope precisely rather than by assumption:

Breach-assessment question	Answered from	Lakeshore's answer
Was the access unauthorized?	identity + EHR logs (impossible IP, concurrent session)	Yes — attacker-controlled IP, not the clinician
Exactly which records were viewed?	EHR record-level access logs	2,140 distinct patient records, enumerated by ID
Over what window?	normalized UTC timestamps	02:21:40–02:29:10Z (under 8 minutes)
Was data exfiltrated off-network?	firewall/proxy egress logs	No large outbound transfer observed before containment
Did the attacker reach other systems/patients?	cross-source scoping queries	No — scoped to one account, one share

Notice how much of the regulatory answer is negative findings established from logs: no exfiltration, no spread to other accounts, no access to additional record sets. A team without comprehensive, retained, normalized logging cannot prove a negative. It can only say "we have no evidence of wider access," which to a regulator reads very differently from "we examined complete access logs and confirmed the scope was these 2,140 records in under eight minutes." The difference is the difference between defensible and hopeful, and it is purchased entirely by the logging discipline of §21.1–21.2 done before the incident.

⚖️ Authorization & Ethics: The records the attacker viewed must be treated as potentially compromised even though they were not provably copied off-network — viewing is access, and access to protected health information is the threshold the law cares about. The team's honest framing to leadership distinguished what the logs proved (these records were accessed from a hostile IP) from what they could not rule out (the attacker may have screen-captured or transcribed what they viewed). Overclaiming ("no data left, so no breach") would have been both wrong and a compliance failure; the logs support a precise, honest scope, not a convenient one.

🔗 Connection: This phase is where SIEM operations meet incident response (Chapter 24) and forensics (Chapter 25). The record-level enumeration Reza produced is the seed of the formal breach assessment; the timeline she reconstructed from normalized UTC events is the spine an investigator extends; and the retained data-lake copy is what makes a months-later regulatory inquiry answerable. Detection, response, forensics, and compliance all draw on the same well — the logs.

Discussion Questions

Both components of the triggering alert (impossible travel; bulk EHR access) were individually noisy detections the team had nearly tuned out. Explain, in fidelity terms, why correlating two weak signals produced one strong alert — and what this implies for how to design detections in a noisy environment.
Reza's brute-force query returned empty, and she treated that as informative. Discuss how a SIEM lets an analyst confirm negatives, and give another example where the absence of an expected log is itself a detection.
Lakeshore caught the attack at the discovery/collection stage. Trace where earlier detection might have been possible (was there any signal before the impossible-travel login?), and what additional log source or use case could have caught it sooner.
The post-incident review added and tuned detections rather than just closing the ticket. Why is this "feedback loop" step so often skipped, and what organizational habit ensures it happens?
Compare this case with Case Study 1. One is about building a SIEM, the other about using one on a live incident. Which skills are shared, and which are unique to each? Where does the build determine whether the use succeeds?

Your Turn

You are the analyst when a compound alert fires. Given a (constructed) account-takeover scenario — a login from a new IP plus anomalous access to a sensitive system — write the sequence of queries you would run to (1) confirm true vs. false positive, (2) determine the entry vector, (3) scope the blast radius, and (4) check for other affected accounts. Write at least two of the queries in a real query language (SQL, SPL, or KQL), each with a time bound. Then write the one new detection you would add in the post-incident review, fully specified. Half a page to a page. Notice how much of the work is querying, not staring at the alert.

Key Takeaways

Attacks are sequences; correlation reads sequences. The intrusion was invisible in any single event and undeniable once an impossible-travel login and anomalous bulk access were correlated on one account within minutes.
Compounding weak signals yields strong alerts. Two individually-noisy detections, correlated, produced one high-fidelity alert — a frontline tactic against alert fatigue and a bridge to risk-based alerting (Chapter 34).
When in doubt, query — don't assume. Reza beat her own fatigue not by trusting or dismissing the alert but by pulling the ground truth; the logs, not the analyst's mood, decided.
A SIEM confirms negatives too. An empty brute-force result was evidence — it ruled out one entry vector and pointed the investigation at phishing/reuse. The dog that didn't bark narrows the hypothesis.
Cross-source querying scopes the incident: entry vector, blast radius, and other-affected-accounts are all answered by pivoting across normalized identity, application, and network logs with time-bounded queries.
A SIEM is a loop, not a product. Detect → investigate → respond → tune/add detections. The most valuable output of an incident is the improved detection that shortens the next one — Theme 1 in the SOC.
The build determines the use. Lakeshore could investigate fast only because identity and application logs were collected, normalized to a common schema in UTC, and tuned to fidelity beforehand — exactly the work of Case Study 1.