Case Study 1: Auditing and Hardening Meridian's TLS Posture

"We didn't get breached. We got lucky that the broken thing was the thing nobody used. The audit's job is to find the broken thing before the customers do." — Sam Whitfield, Security Engineer, Meridian Regional Bank (constructed)

Executive Summary

The penetration-test finding from this chapter's opening — a forgotten marketing microsite still speaking TLS 1.0 with broken ciphers and an expired certificate — embarrassed Meridian's security team, but it did something more useful than embarrass them: it proved they had no idea what their own cryptographic posture was. Nobody owned a list of every TLS endpoint the bank exposed, nobody knew which protocols and ciphers those endpoints offered, and nobody knew when any given certificate expired. The microsite was not an anomaly; it was the one weak spot that happened to get scanned. This case study follows Sam Whitfield and junior analyst Theo Brandt as they turn a one-off finding into a program: a complete TLS inventory, a defensive scan of the entire external estate, a grading and prioritization pass using this chapter's tls_config_grade, a remediation wave, and — the part that prevents the next surprise — automated certificate-lifecycle management. You will watch the chapter's concepts (TLS versions, cipher suites, forward secrecy, certificate expiry, revocation, CT monitoring) stop being definitions and become a worklist. The scenario and all figures are constructed for teaching (Tier 3).

Skills applied: TLS endpoint inventory; defensive TLS scanning; reading and grading cipher suites and protocol versions; certificate expiry and chain analysis; risk-based prioritization of crypto findings; designing automated certificate-lifecycle management and CT-log monitoring; distinguishing transport, cipher, and lifecycle failures.

Background

Meridian's external attack surface had grown the way every organization's does: by accretion. Over fifteen years the bank had stood up the main banking portal, the public website, a mobile API, a partner API gateway, an investor-relations site, several campaign microsites, a careers portal, and a handful of vendor-hosted properties bearing the Meridian name. Each was launched by whoever needed it, configured to whatever was "current" that year, and then mostly forgotten. Certificates were bought individually, by different people, from two different CAs, with renewal reminders going to inboxes that in some cases belonged to employees who had since left.

CISO Dana Okafor's instruction after the pen-test finding was characteristically blunt: "I don't want the microsite fixed. I mean, fix it — but I want to know that it's the only one, and right now we can't say that. Build me the list, scan everything we own, and tell me what's bad in priority order. And then make it so a certificate never expires on us by surprise again." She gave Sam and Theo two weeks and tied the work to the program's data-protection standard (this chapter's Project Checkpoint), which was being drafted in parallel.

The constraint that shaped everything: this was a defensive exercise on Meridian's own assets. No attacking, no touching anything they did not own. Read-only enumeration, careful documentation, and a remediation plan the operations team could execute without breaking production banking during business hours.

The Audit

Phase 1 — Building the TLS inventory

You cannot scan what you have not enumerated, and the microsite proved the inventory did not exist. Theo started where the bank's knowledge actually lived — across several disconnected sources — because no single system knew every endpoint:

Sources Theo reconciled into one TLS inventory:
  • External DNS zone records (every A/AAAA/CNAME under meridianbank.example)
  • The two CAs' certificate dashboards (what certs were issued, to what names)
  • Certificate Transparency logs for *.meridianbank.example (the public record
        of every cert ever issued for the bank's names — including ones the bank
        had forgotten it owned)
  • The load balancer and CDN configurations (what listens on 443)
  • The cloud accounts' load balancers and API gateways

The CT-log search was the revelation. Querying public CT logs for every certificate issued under meridianbank.example returned more names than the team knew existed — including two more abandoned campaign sites and a long-decommissioned VPN portal whose certificate, it turned out, was still valid and still pointed at a live (if unused) host. The lesson landed immediately:

🚪 Threshold Concept: Certificate Transparency is not only a mis-issuance detector; it is often the most complete inventory of your own public-facing names that exists, because major browsers will not trust a publicly issued certificate unless it has been logged, so CAs log essentially everything they issue. If you do not know all your TLS endpoints, the CT logs very nearly do. Any TLS audit should begin there; the assets you forgot you owned are exactly the ones rotting into the microsite problem.
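As a rough illustration of using CT results as an inventory source, the sketch below pulls the unique hostnames out of a batch of CT-log records. It assumes records shaped like the JSON that public CT search services typically return, with a `name_value` field holding one or more newline-separated names; that field name, the function name, and the sample records are assumptions for illustration, not the team's actual tooling.

```python
def names_from_ct_records(records, apex):
    """Collect the unique hostnames under `apex` seen in CT-log records.

    Assumes each record is a dict with a 'name_value' string that may hold
    several newline-separated names (hypothetical shape, see the lead-in).
    """
    found = set()
    suffix = "." + apex
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry still proves the parent name
            if name == apex or name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Illustrative records only; not real CT data.
sample = [
    {"name_value": "banking.meridianbank.example\nwww.meridianbank.example"},
    {"name_value": "*.meridianbank.example"},
    {"name_value": "vpn-old.meridianbank.example"},
    {"name_value": "unrelated.example"},
]
ct_names = names_from_ct_records(sample, "meridianbank.example")

# Names the team already listed from memory (assumed for the example).
known = {"banking.meridianbank.example", "www.meridianbank.example"}
forgotten = [n for n in ct_names if n not in known and n != "meridianbank.example"]
print(forgotten)
```

The interesting output is the difference between what the logs know and what the team knows: here, the abandoned VPN portal.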

The reconciled inventory came to 23 external TLS endpoints — nearly double the dozen the team would have listed from memory. Each got a row: hostname, where it was hosted, which CA issued its certificate, the certificate's expiry date, and an owner (assigned now, even if the answer was "nobody, pending decommission").
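One way to picture the reconciliation is as a merge of hostname lists keyed by a normalized name, with each row remembering which sources knew about it. The sketch below is a minimal version under that assumption; the `InventoryRow` fields mirror the columns described above, while the class name, the `seen_in` field, and the sample source lists are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryRow:
    hostname: str
    hosted_at: str = "unknown"
    issuing_ca: str = "unknown"
    cert_not_after: str = "unknown"          # filled in by the scan
    owner: str = "nobody, pending decommission"
    seen_in: set = field(default_factory=set)  # which sources listed this host


def reconcile(sources):
    """Merge {source_name: iterable of hostnames} into one row per hostname."""
    rows = {}
    for source_name, hostnames in sources.items():
        for host in hostnames:
            key = host.strip().lower().rstrip(".")  # normalize case, trailing dot
            row = rows.setdefault(key, InventoryRow(hostname=key))
            row.seen_in.add(source_name)
    return rows


# Illustrative source lists; the real ones came from DNS, CA dashboards, etc.
inventory = reconcile({
    "dns": ["banking.meridianbank.example.", "WWW.meridianbank.example"],
    "ct_logs": ["www.meridianbank.example", "vpn-old.meridianbank.example"],
    "load_balancers": ["banking.meridianbank.example"],
})
only_in_ct = sorted(h for h, r in inventory.items() if r.seen_in == {"ct_logs"})
print(len(inventory), only_in_ct)
```

A host that appears in only one source, especially only in the CT logs, is a natural candidate for the "who owns this?" conversation.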

Phase 2 — The defensive scan

With the list in hand, Theo ran a read-only TLS scan against each endpoint, on authorized assets only, capturing the protocols offered, the cipher suites offered, forward-secrecy support, and the certificate details. The commands were the standard defensive toolkit:

# Read-only enumeration of OUR OWN endpoints. No exploitation; audit only.
# For each host in the inventory:
nmap --script ssl-enum-ciphers -p 443 <host>          # quick protocol/cipher list
testssl.sh --quiet --color 0 https://<host>           # full graded report incl. cert

# Theo also pulled each certificate's not-after date directly for the expiry tracker:
echo | openssl s_client -connect <host>:443 -servername <host> 2>/dev/null \
  | openssl x509 -noout -dates -subject -issuer
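
To feed the expiry tracker, the `notAfter=` line printed by the last command has to become a number of days. The following is a minimal sketch of that conversion, assuming the usual `notAfter=Mon DD HH:MM:SS YYYY GMT` output format and an English locale for month names; the function name and the sample dates are invented for illustration.

```python
from datetime import datetime, timezone


def cert_days_left(openssl_dates_output, now):
    """Return whole days until the notAfter date in `openssl x509 -dates` output.

    Negative values mean the certificate has already expired.
    """
    for line in openssl_dates_output.splitlines():
        if line.startswith("notAfter="):
            stamp = line[len("notAfter="):].strip()
            if stamp.endswith(" GMT"):
                stamp = stamp[:-4]  # drop the zone label; the time is UTC
            not_after = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
            not_after = not_after.replace(tzinfo=timezone.utc)
            return (not_after - now).days
    raise ValueError("no notAfter line found")


# Sample output with made-up dates, in the format openssl prints.
sample_output = (
    "notBefore=Mar  1 00:00:00 2024 GMT\n"
    "notAfter=Mar  1 00:00:00 2025 GMT\n"
    "subject=CN = mobile-api.meridianbank.example\n"
)
soon = cert_days_left(sample_output, datetime(2025, 2, 18, tzinfo=timezone.utc))
late = cert_days_left(sample_output, datetime(2025, 4, 12, tzinfo=timezone.utc))
print(soon, late)  # 11 -42
```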

The raw output was hundreds of lines per host — exactly the "wall of text" problem the chapter warned about. Theo normalized each host's result into the four parameters the grading tool consumes (min_protocol, forward_secrecy, weak_ciphers, cert_days_left) and fed the whole inventory through tls_config_grade. A representative slice of the graded results:

Endpoint	Min protocol	Forward secrecy	Weak ciphers	Cert days left	Grade	Lead finding
`banking.meridianbank.example`	TLS 1.2	yes	none	190	A	clean
`www.meridianbank.example`	TLS 1.2	yes	none	64	A	clean
`mobile-api.meridianbank.example`	TLS 1.2	yes	none	11	B	cert expires in 11 days
`partner-api.meridianbank.example`	TLS 1.2	partial	3DES	150	C	3DES + partial FS
`promo-spring.meridianbank.example`	TLS 1.0	no	RC4, 3DES	-42	F	the microsite: obsolete proto + expired
`careers.meridianbank.example`	TLS 1.1	yes	none	80	F	TLS 1.1 offered
`vpn-old.meridianbank.example`	TLS 1.0	no	3DES	25	F	decommission candidate, still live
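
The chapter's `tls_config_grade` is not reproduced in this case study, so the sketch below is a stand-in: a hypothetical set of rules chosen only because they reproduce the grades in the table above. The 30-day expiry cutoff, the caps for weak ciphers and missing forward secrecy, and the function name `tls_config_grade_sketch` are all assumptions.

```python
GRADE_ORDER = "ABCDF"  # best to worst

OBSOLETE = {"SSLv2", "SSLv3", "TLS 1.0", "TLS 1.1"}


def tls_config_grade_sketch(min_protocol, forward_secrecy, weak_ciphers, cert_days_left):
    """Hypothetical grading rules that reproduce the table's grades."""
    # Hard failures: an obsolete protocol is offered, or the cert has expired.
    if min_protocol in OBSOLETE or cert_days_left < 0:
        return "F"
    caps = ["A"]
    if weak_ciphers:
        caps.append("C")            # any weak cipher caps the grade at C
    if forward_secrecy == "partial":
        caps.append("C")
    elif forward_secrecy == "no":
        caps.append("D")
    if cert_days_left < 30:
        caps.append("B")            # near-expiry caps the grade at B
    return max(caps, key=GRADE_ORDER.index)  # the worst cap wins


scan = {
    "banking": ("TLS 1.2", "yes", [], 190),
    "www": ("TLS 1.2", "yes", [], 64),
    "mobile-api": ("TLS 1.2", "yes", [], 11),
    "partner-api": ("TLS 1.2", "partial", ["3DES"], 150),
    "promo-spring": ("TLS 1.0", "no", ["RC4", "3DES"], -42),
    "careers": ("TLS 1.1", "yes", [], 80),
    "vpn-old": ("TLS 1.0", "no", ["3DES"], 25),
}
grades = {host: tls_config_grade_sketch(*params) for host, params in scan.items()}
print(grades)
```

The design point worth noticing is that the grade is the worst of several independent checks, not an average: one obsolete protocol is enough for an F regardless of how healthy the certificate is.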

🛡️ Defender's Lens: Notice how the grading tool re-sorted the team's intuition. The microsite (promo-spring) was the known problem, but the scan surfaced two more Fs nobody had flagged: a careers site quietly offering TLS 1.1, and an abandoned VPN portal that was a decommission candidate but still live with a 3DES cipher and 25 days left on a certificate that, if it auto-renewed, would keep a bad endpoint alive even longer. The partner-api C was arguably the most sensitive finding — a partner-facing endpoint with 3DES and incomplete forward secrecy, carrying real business data — even though its grade was better than the abandoned sites. Grade is the first sort; what the endpoint actually carries is the second.

Phase 3 — Prioritizing the remediation

A grade is not a plan. Sam and Theo combined each endpoint's grade with what it carried and how exposed it was, using the likelihood × impact instinct from Chapter 1, to order the work. Their reasoning:

PRIORITY 1 (this week) — exploitable weakness on a sensitive, live endpoint:
  partner-api  (C): 3DES + partial FS on a partner-facing API carrying business
                    data. Downgradeable; real data at stake. Fix cipher policy now.
  mobile-api   (B): cert expires in 11 days on the live mobile channel. An expiry
                    here is a customer-facing outage. Renew immediately.

PRIORITY 2 (this week) — F-grade, but low-value / decommission:
  promo-spring (F): the microsite. No data, no traffic. DECOMMISSION rather than
                    fix — the most secure config for a useless asset is "gone."
  vpn-old      (F): decommission the dead VPN portal; do not let its cert renew.
  careers      (F): real (low-sensitivity) site; disable TLS 1.0/1.1, keep it.

PRIORITY 3 (this cycle) — hygiene on the good endpoints:
  Standardize ALL endpoints to TLS 1.2+1.3 only, AEAD+ECDHE only, via a single
  shared cipher policy on the load balancers so nothing drifts back.

⚠️ Common Pitfall: The team's first draft put the microsite at Priority 1 because it was the worst grade and the one that had embarrassed them. Sam pushed back: "It's an F, but it's an F nobody can reach and nothing lives behind. The partner-api is a C, but it's a partner-facing API with real data and a downgradeable cipher. Grade tells you how broken; impact tells you how much it matters. Fix the dangerous C before the harmless F." The most security-urgent item is rarely the lowest grade — it is the worst combination of weakness and value. (And note the cheapest fix of all: the right remediation for a useless asset is to delete it, which removes the attack surface entirely rather than hardening it.)
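
Sam's argument can be made concrete with a toy score: multiply how broken an endpoint is by how much it matters, then sort. The grades below come from the scan; the 1-to-5 impact values and the severity weights are assumptions invented for this sketch, not figures from the audit.

```python
SEVERITY = {"A": 0, "B": 1, "C": 2, "D": 3, "F": 4}  # assumed weights

findings = [
    # (endpoint, grade from the scan, impact 1-5: ASSUMED for illustration)
    ("promo-spring", "F", 1),   # no data, no traffic
    ("careers", "F", 1),        # real but low-sensitivity site
    ("vpn-old", "F", 1),        # abandoned, decommission candidate
    ("partner-api", "C", 5),    # partner-facing, real business data
    ("mobile-api", "B", 5),     # live customer channel, expiry means outage
]


def priority_order(rows):
    """Sort by severity x impact, highest first; ties keep their input order."""
    return sorted(rows, key=lambda r: SEVERITY[r[1]] * r[2], reverse=True)


order = [endpoint for endpoint, _, _ in priority_order(findings)]
print(order)
```

With these assumed weights the dangerous C and the near-expiry B sort ahead of all three Fs. The exact numbers matter less than the habit of writing the impact estimate down where someone can argue with it.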

Phase 4 — The cipher-policy fix and verification

The Priority-1 and Priority-3 work converged on a single technical lever: a standard cipher policy applied at the load balancers and CDN, so that every endpoint inherited the same hardened configuration rather than each carrying its own decade-old defaults. Sam wrote the policy intent to match the chapter's data-protection standard:

Meridian standard TLS policy (applied at all edges):
  Protocols:  TLS 1.2 and TLS 1.3 ONLY  (disable SSLv2/3, TLS 1.0, TLS 1.1)
  TLS 1.2 ciphers:  ECDHE key exchange only (forward secrecy);
                    AES-GCM or ChaCha20-Poly1305 only (AEAD);
                    SHA-256 or better; NO RC4, 3DES, export, static-RSA, CBC.
  TLS 1.3 ciphers:  the protocol's defaults (all AEAD + forward-secret).
  Server cipher preference: ON (server chooses strongest mutually supported).
  HSTS: enabled with a long max-age on all web properties.
  Certificates: RSA-2048+ or ECDSA P-256+; SHA-256 signatures; max ~1-year
                lifetime; automated renewal.
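
Meridian applied this policy at its load balancers and CDN, whose configuration syntax is not shown here. Purely as an illustration of what the protocol and cipher lines mean in practice, the sketch below expresses them with Python's standard `ssl` module. The function name and the exact OpenSSL cipher string are assumptions; certificate loading and HSTS (an HTTP header, not a TLS setting) are deliberately left out.

```python
import ssl


def meridian_style_server_context():
    """Server-side TLS context approximating the standard policy above."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Protocols: TLS 1.2 and 1.3 only.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # TLS 1.2 suites: ECDHE key exchange with AES-GCM or ChaCha20-Poly1305.
    # TLS 1.3 suites are not affected by this string and stay at defaults.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20:!aNULL:!PSK")
    # Server cipher preference: ON.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx


ctx = meridian_style_server_context()
# TLS 1.3 suite names start with "TLS_"; everything else is a TLS 1.2 suite.
tls12_names = [c["name"] for c in ctx.get_ciphers() if not c["name"].startswith("TLS_")]
print(tls12_names)
```

Listing the resulting suites is a quick local sanity check that nothing with RC4, 3DES, CBC, or static RSA survived the filter; it does not replace re-scanning the live endpoint.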

Crucially, the work did not end at "applied." Theo re-scanned every changed endpoint and re-graded it, because a configuration is not fixed until it is verified fixed. The partner-api moved from C to A; the careers site from F to A; the two abandoned endpoints were confirmed gone (the scan now returned no listener). The before/after grades became the evidence for the audit and the metric for the board:

Endpoint	Before	Action	After
`partner-api`	C	applied standard cipher policy	A
`careers`	F	disabled TLS 1.0/1.1	A
`mobile-api`	B	renewed certificate	A
`promo-spring`	F	decommissioned	gone
`vpn-old`	F	decommissioned, cert revoked	gone
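
The "verified fixed" rule is simple enough to automate. A minimal sketch, assuming the re-scan results are available as a mapping from endpoint to either a grade or the string `gone` (the function name and data shape are hypothetical; the values are the ones in the table above):

```python
def unverified(rescan_results):
    """Return endpoints whose re-scan result is neither grade A nor 'gone'."""
    return [host for host, after in rescan_results.items()
            if after not in {"A", "gone"}]


rescan = {
    "partner-api": "A",
    "careers": "A",
    "mobile-api": "A",
    "promo-spring": "gone",
    "vpn-old": "gone",
}
still_open = unverified(rescan)
print(still_open)  # [] means every change was verified
```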

🔄 Check Your Understanding: The team revoked the certificate for the decommissioned vpn-old portal rather than just letting it expire in 25 days. Given the chapter's point that revocation is the least reliable part of the PKI, was revoking it worth doing? (Hint: consider what could happen in those 25 days if the private key for a live-but-abandoned host were compromised, and weigh "imperfect revocation plus decommission" against "do nothing and wait.")

Phase 5 — Closing the loop: automated certificate-lifecycle management

The microsite's expired certificate and the mobile API's 11-days-to-expiry near-miss had the same root cause: nobody owned certificate renewal, and nothing watched expiry. Fixing individual certificates without fixing this would guarantee a repeat. So the final and most important deliverable was the lifecycle process, mapped to Figure 5.2:

Meridian certificate-lifecycle program:
  INVENTORY     -> the reconciled list becomes a living source of truth;
                   CT-log monitoring auto-adds any new cert for our names.
  MONITORING    -> "days_until_expiry" tracked for every cert; alerts at
                   30 / 14 / 7 days, escalating to the named owner then the team.
  RENEWAL       -> automated via ACME where the CA/endpoint supports it;
                   for the rest, a calendared, owned, ticketed process.
  REVOCATION    -> a documented runbook: who can revoke, how (CRL/OCSP via the
                   CA), and the trigger conditions (key compromise, mis-issuance,
                   decommission).
  MIS-ISSUANCE  -> standing CT-log monitoring for *.meridianbank.example; any
                   certificate the bank did not request fires an alert (see the
                   related Exercise 24).
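
The MONITORING and MIS-ISSUANCE rows reduce to two small checks, sketched below. The 30 / 14 / 7 thresholds come from the program above; which threshold notifies whom is not specified there, so the recipient mapping is an assumption, as are the function names and the sample serial numbers.

```python
ALERT_THRESHOLDS = (7, 14, 30)  # days, tightest first

# ASSUMED escalation: the owner is always told; the team joins at 7 days.
RECIPIENTS = {30: ["owner"], 14: ["owner"], 7: ["owner", "team"]}


def expiry_alert(days_until_expiry):
    """Return (threshold, recipients) for the tightest threshold crossed, or None."""
    for threshold in ALERT_THRESHOLDS:
        if days_until_expiry <= threshold:
            return threshold, RECIPIENTS[threshold]
    return None


def unrequested_certs(ct_serials, requested_serials):
    """Serials seen in CT logs that the bank has no record of requesting."""
    return sorted(set(ct_serials) - set(requested_serials))


print(expiry_alert(64))   # None: no alert yet
print(expiry_alert(25))   # (30, ['owner'])
print(expiry_alert(11))   # (14, ['owner'])
print(expiry_alert(-42))  # (7, ['owner', 'team']): already expired
print(unrequested_certs(["0a01", "0a02", "ff99"], ["0a01", "0a02"]))
```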

Theo wired the expiry tracking into the same continuous-monitoring pipeline as vulnerability scanning, so "certificate expiring in 14 days" landed in the same queue, with the same escalation, as a high-severity patch. The metric Dana would carry to the board was simple and honest: number of TLS endpoints below grade A (target: zero) and number of certificates inside 14 days of expiry without a renewal in progress (target: zero).
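
Both board metrics are plain counts over the inventory. A sketch using the pre-remediation slice from the Phase 2 table; the `renewal_in_progress` flag is a hypothetical field (set to False throughout, since nothing tracked renewals before the program existed), and the tuple layout is an assumption.

```python
# (endpoint, grade, cert_days_left, renewal_in_progress)
# Grades and days are from the Phase 2 table; the last field is ASSUMED.
endpoints = [
    ("banking", "A", 190, False),
    ("www", "A", 64, False),
    ("mobile-api", "B", 11, False),
    ("partner-api", "C", 150, False),
    ("promo-spring", "F", -42, False),
    ("careers", "F", 80, False),
    ("vpn-old", "F", 25, False),
]


def board_metrics(rows, expiry_window_days=14):
    """Count endpoints below grade A and certs near expiry with no renewal under way."""
    below_a = sum(1 for _, grade, _, _ in rows if grade != "A")
    at_risk = sum(1 for _, _, days, renewing in rows
                  if days <= expiry_window_days and not renewing)
    return {"below_grade_a": below_a, "expiring_unrenewed": at_risk}


metrics = board_metrics(endpoints)
print(metrics)  # target for both values: zero
```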

🔗 Connection: This certificate-lifecycle work is the seed of the machine-identity and secrets-management program built later in the book, where certificate lifecycle is run at the scale of thousands of internal service certificates and the keys move into vaults and HSMs. What Sam and Theo built here for 23 external endpoints is the same discipline that, scaled up, governs every machine identity at the bank. Starting small and external, then growing inward, is the right order.

Discussion Questions

1. The audit began with Certificate Transparency logs as an inventory source and found roughly twice as many endpoints as the team remembered. What does it say about an organization's security that its most complete asset list lived in a public log it did not control? How should this change how Meridian manages new endpoints going forward?
2. Sam insisted on fixing the partner-API "C" before the microsite "F." Construct the strongest argument for the opposite priority, then say why Sam's ordering is nonetheless more defensible. What single piece of information would most change the call?
3. The most secure remediation for two endpoints was decommissioning them, not hardening them. When is "delete the asset" the right security answer, and what are the risks of decommissioning something that turns out to still be in use?
4. The team's board metric was "endpoints below grade A" and "certificates near expiry without renewal." Are these good metrics? What behavior might they accidentally incentivize, and what would you add to keep them honest? (Recall Theme 5: compliance is the floor.)
5. Revocation is the least reliable part of the PKI, yet the team revoked the decommissioned VPN certificate anyway and is shortening certificate lifetimes. Reconcile these: if revocation is unreliable, why bother, and how do short lifetimes compensate?

Your Turn

Take an organization you know (or invent a small one with several web properties) and run this audit on paper. (1) Inventory: list every TLS endpoint you can think of, then note that a CT-log search would likely find more — name two endpoints an organization commonly forgets. (2) Scan (simulated): for three endpoints, invent plausible scan results (protocol, forward secrecy, weak ciphers, cert days left) and grade each A–F using this chapter's logic. (3) Prioritize: order the remediation by grade × value/exposure, not grade alone, and justify the top item in two sentences. (4) Close the loop: write the three-line certificate-lifecycle process (inventory, expiry monitoring with alert thresholds, automated renewal) that would prevent the next expiry surprise. Keep it to one page.

Key Takeaways

You cannot harden what you have not inventoried. A TLS audit starts with a complete endpoint list, and Certificate Transparency logs are often the most complete inventory you have — they reveal the forgotten assets that rot into the microsite problem.
Defensive scanning is read-only and continuous. Enumerate protocols, ciphers, forward secrecy, and certificate validity on your own assets, then grade the results to turn a wall of text into a ranked worklist (tls_config_grade).
Grade is the first sort; value and exposure are the second. The worst grade is rarely the most urgent fix — a downgradeable cipher on a sensitive, live, partner-facing API outranks an F on a useless abandoned site. And the cheapest, most secure fix for a useless asset is to delete it.
A fix is not done until it is re-scanned. Verify every change by re-grading; before/after grades become both the audit evidence and the board metric.
The root cause of expired-certificate outages is unowned lifecycle, not bad luck. Closing the loop means an inventory, expiry monitoring with escalating alerts, automated renewal (ACME where possible), a revocation runbook, and standing CT-log monitoring for mis-issuance.
Crypto findings are mostly operational. Every failure in this audit — old protocols, old ciphers, expired and forgotten certificates — was a configuration or lifecycle problem, not a broken algorithm. That is where a defender's time goes.