Exercises: Secrets and Machine Identity

DataField.Dev

These exercises move from recognizing secrets to designing the systems that govern them. Difficulty is marked ⭐ (recall/application), ⭐⭐ (analysis), and ⭐⭐⭐ (synthesis/open-ended). A dagger (†) marks problems with a full worked solution in Appendix: Answers to Selected Exercises — try every problem before you read one.

Work in your own notebook or a private repository. Several exercises involve scanning for or writing secret-shaped strings: use only obviously-fake placeholder values (such as AKIA...EXAMPLE, ghp_EXAMPLE...), never a real credential, and only ever scan code you own or are authorized to scan.

Part A — Core vocabulary ⭐

1.† Define in one sentence each: secret, machine identity, workload identity, service account, secret sprawl. Then write one sentence that uses secret sprawl and secret leak correctly to describe how a hard-coded key becomes a breach.

2. Classify each as a secret, a machine identity, a control, or none of these: (a) an AWS access key; (b) a Kubernetes pod's federated token; (c) a nightly backup job; (d) a secrets vault; (e) a database connection string with an embedded password; (f) an employee's fingerprint; (g) a -----BEGIN RSA PRIVATE KEY----- block; (h) automated certificate renewal.

3. Explain the difference between a static secret and a dynamic secret, and give one security property each has that the other lacks.

4.† Why do non-human identities typically outnumber human ones by 10×–50× in a cloud environment? Name three categories of machine identity that contribute to the count.

5. For each property in the human-vs-machine identity table from §20.1 — count, second factor, lifecycle, rotation, visibility, interactive login — state in a phrase why it makes machine identity harder or easier to defend than human identity.

Part B — Vaults, dynamic secrets, and workload identity ⭐⭐

6.† A teammate proposes "solving secrets" by moving every secret from source code into environment variables. List three concrete ways this is weaker than a vault, and one way it is nonetheless an improvement over hard-coded secrets.

7. Explain why a stolen dynamic secret is less valuable than a stolen static secret, in terms of the attacker's usable time window. Then name the one situation in which even a dynamic secret is fully dangerous despite its short life.

8.† Rewrite this design to eliminate the long-lived secret. Current state: a containerized service in AWS holds a hard-coded IAM access key in its image so it can read an S3 bucket. Describe the workload-identity design that removes the key entirely, and name the residual risk that remains (hint: SSRF and the metadata service).

9. A secrets vault is described as "concentrating risk into one place." Argue why that is nonetheless a good engineering decision, then name two specific protections the vault therefore requires that an ordinary server does not.

10. ⭐⭐⭐ Design it. Sketch a secrets architecture for a new microservice at Meridian that needs (a) a database credential, (b) an API key for a payment vendor, and (c) a certificate for mTLS to a sibling service. For each, state where the secret comes from, how long it lives, and how it rotates. Prefer "no static secret" wherever the platform allows.

Part C — Certificate lifecycle ⭐⭐

11.† Using the cert_days_left logic from the chapter, compute the days remaining for each certificate given that "now" is 2026-06-14 (UTC). State which would trigger a 30-day renewal alert.

- (a) notAfter = 2026-06-20
- (b) notAfter = 2026-09-30
- (c) notAfter = 2026-06-10
- (d) notAfter = 2026-07-14
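
To check your arithmetic after working exercise 11 by hand, a days-remaining helper can be sketched as follows. This is a minimal sketch, not the chapter's cert_days_left: the signature, the date-only notAfter format, and the needs_renewal helper are assumptions made for illustration. Whether a certificate with exactly 30 days left counts as inside the window is a policy choice; this sketch treats it as inside, which may differ from the chapter's version.

```python
from datetime import datetime, timezone

RENEWAL_WINDOW_DAYS = 30  # alert threshold named in the exercise


def cert_days_left(not_after: str, now: datetime) -> int:
    """Whole days from `now` to midnight UTC on the notAfter date.

    Assumes `not_after` is a YYYY-MM-DD string; negative means expired.
    """
    expiry = datetime.strptime(not_after, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return (expiry - now).days


def needs_renewal(not_after: str, now: datetime) -> bool:
    """Hypothetical helper: True when the certificate is expired or within the window."""
    return cert_days_left(not_after, now) <= RENEWAL_WINDOW_DAYS


if __name__ == "__main__":
    # Example dates chosen so they do not overlap with the exercise values.
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    for not_after in ("2026-01-31", "2026-03-01", "2025-12-25"):
        print(not_after, cert_days_left(not_after, now), needs_renewal(not_after, now))
```

Passing "now" as a parameter instead of reading the clock inside the function keeps the calculation repeatable, which is what makes a fixed-date exercise like this one checkable.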

12. Explain why the expired-certificate outage is called "the most preventable failure in operations," and describe the minimum control set that prevents it. Why is a spreadsheet maintained by one person an anti-pattern here?

13.† Give two reasons short-lived certificates are operationally more secure than long-lived ones. Address both renewal and revocation in your answer.

14. Revocation (CRL/OCSP) is described as "unreliable in practice." Explain the fail-open problem and why short-lived certificates are the strongest mitigation for a compromised private key.

Part D — Finding the leaked key (secret scanning) ⭐⭐

15.† Find the leaked key. Scan this (illustrative) configuration excerpt by hand using the chapter's patterns. List each finding as (kind, value). Use only these three facts: AWS keys are AKIA + 16 uppercase alphanumerics, GitHub PATs are ghp_ + 36 chars, and private keys begin with -----BEGIN ... PRIVATE KEY-----.

db_host    = "10.20.0.15"
db_pass    = "AKIAIOSFODNN7EXAMPLE"
ci_token   = "ghp_EXAMPLEEXAMPLEEXAMPLEEXAMPLEEXAMPL"
greeting   = "welcome to the build server"
tls_key    = "-----BEGIN EC PRIVATE KEY-----"
note       = "rotate me, set in 2019, never touched"
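
Once you have scanned the excerpt by hand, you can compare your list against a small scanner built from the same three facts. The sketch below is illustrative only: the names PATTERNS and scan are hypothetical, the GitHub pattern assumes the 36 characters are alphanumeric, and the private-key pattern assumes at most one algorithm word between BEGIN and PRIVATE KEY. Each (kind, value) tuple mirrors the answer format the exercise asks for.

```python
import re

# Patterns taken from the three facts stated in exercise 15.
# Word boundaries (\b) make the length requirements exact rather than minimums.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),  # assumes alphanumeric chars
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return every (kind, value) match found in `text`."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.group(0)))
    return findings


if __name__ == "__main__":
    # Obviously fake placeholder values, distinct from the exercise excerpt.
    sample = "\n".join([
        'a = "AKIAABCDEFGHIJKLMNOP"',
        'b = "ghp_' + "A" * 36 + '"',
        'c = "-----BEGIN RSA PRIVATE KEY-----"',
        'd = "hello"',
    ])
    for finding in scan(sample):
        print(finding)
```

Note that a scanner like this reports only what the patterns match; it says nothing about which variable name the value was stored under, so compare findings by value, not by key.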

16. A developer commits a secret, notices, deletes the file, and pushes a commit titled "remove secret." Explain precisely why the secret is still compromised and write the three-step correct response.

17.† Write the rule. Write a regular expression that matches a Slack bot/user token of the form xoxb- or xoxp- followed by at least 10 characters of [0-9A-Za-z-]. Then give one true-positive example (fake) it should match and one false-positive risk — a benign string a naive pattern might catch — and say how you would reduce false positives.
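
To test whatever rule you write for exercise 17, a small harness that reports missed true positives and unwanted hits may help. The sketch below is an assumption-laden illustration: evaluate_rule and its return shape are hypothetical, and the demonstration uses the AWS pattern from exercise 15 rather than a Slack pattern, so the exercise stays yours to solve.

```python
import re


def evaluate_rule(pattern: str, should_match: list[str], should_not_match: list[str]) -> dict:
    """Report strings the rule missed and benign strings it wrongly caught."""
    rule = re.compile(pattern)
    missed = [s for s in should_match if not rule.search(s)]
    false_hits = [s for s in should_not_match if rule.search(s)]
    return {"missed": missed, "false_hits": false_hits}


if __name__ == "__main__":
    positives = ["key=AKIAABCDEFGHIJKLMNOP"]          # fake key, correct shape
    negatives = ["key=AKIAshort", "MAKIAABCDEFGHIJKLMNOPQ"]  # benign look-alikes

    # A naive pattern with no boundaries catches a key-shaped run inside a longer token.
    print(evaluate_rule(r"AKIA[0-9A-Z]{16}", positives, negatives))

    # Adding word boundaries removes that false hit without losing the true positive.
    print(evaluate_rule(r"\bAKIA[0-9A-Z]{16}\b", positives, negatives))
```

Growing the two lists as you discover new edge cases gives you a regression suite for the rule, which is one practical way to reduce false positives over time.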

18. A secret scanner should run in three placements: the developer machine, the pipeline, and history/runtime. For each, state what class of leak it catches and one limitation.

19. ⭐⭐⭐ Secret scanning produces false positives (a random high-entropy string that is not a secret) and false negatives (a secret in a format the scanner does not know). Discuss the cost of each kind of error for a security team, and how the "rotate on confirmed leak" rule interacts with a high false-positive rate.
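
For exercise 19, it helps to see what "high-entropy" means in practice. One common heuristic, shown here as a minimal sketch and not as any particular scanner's implementation, is Shannon entropy over the characters of a string, measured in bits per character. The function name shannon_entropy is chosen for illustration. A scanner that flags strings above some entropy threshold will catch unknown secret formats, but it will also flag hashes, UUIDs, and other random-looking values that are not secrets.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of `s`, in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    counts = Counter(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # A repeated character carries no information; four distinct characters carry 2 bits each.
    for candidate in ("aaaa", "abab", "abcd"):
        print(candidate, shannon_entropy(candidate))
```

Entropy measures how evenly the characters are spread, not whether the string grants access to anything, which is why it produces both kinds of error discussed in the exercise.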

Part E — Respond to this incident ⭐⭐

20.† Respond to this incident. Marcus's SOC receives this alert: a Meridian AWS access key tied to the svc-statements reporting workload was used at 03:14 UTC to call s3:ListAllMyBuckets and iam:ListUsers from a source IP in a country where Meridian has no operations. The key normally runs a nightly PDF job at 02:00 against one bucket. Walk through your response: (a) what does this pattern indicate; (b) what are your first two containment actions; (c) what is the single action that actually stops the attacker's access; (d) what longer-term fix (from this chapter's standard) would have prevented it?

21. A service account svc-backup appears in the authentication logs with logon_type=interactive on a domain controller at 19:40. Explain why this single event is high-fidelity (worth investigating immediately) and what you would check next.
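
The event in exercise 21 is simple enough to express as a detection rule. The sketch below assumes authentication events arrive as dictionaries with hypothetical field names account and host alongside the logon_type field quoted in the exercise, and it assumes service accounts follow the svc- naming convention used in this chapter's examples. It shows only how such an event could be surfaced; why it matters and what to check next remain the exercise.

```python
def flag_interactive_service_logons(events: list[dict], service_prefix: str = "svc-") -> list[dict]:
    """Return events where a service account logged on interactively.

    Assumes each event has "account" and "logon_type" keys (hypothetical schema).
    """
    return [
        event
        for event in events
        if event.get("account", "").startswith(service_prefix)
        and event.get("logon_type") == "interactive"
    ]


if __name__ == "__main__":
    # Illustrative events only; host names are made up.
    events = [
        {"account": "svc-backup", "logon_type": "interactive", "host": "dc01"},
        {"account": "svc-backup", "logon_type": "service", "host": "bk01"},
        {"account": "alice", "logon_type": "interactive", "host": "ws12"},
    ]
    for hit in flag_interactive_service_logons(events):
        print(hit)
```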

Part F — Write the policy / design the architecture ⭐⭐–⭐⭐⭐

22.† Write the policy. Draft four rules of a one-page secrets-management standard for a small company, each as a single enforceable sentence (e.g., "No secret shall be committed to source control..."). Cover: storage, rotation, service-account privilege, and certificate expiry. Make each rule auditable — a reviewer must be able to check whether it is met.

23. Design the rotation. Design an automated rotation plan for a database service account that twelve microservices use. Address: how services get the new credential without downtime, how often it rotates, what happens to in-flight connections, and how you confirm rotation succeeded. Why is a vault with dynamic secrets the cleanest solution?

24. Design it. Meridian is migrating off long-lived AWS access keys. Propose a phased plan: (1) stop the bleeding, (2) inventory, (3) migrate, (4) prevent recurrence. For each phase name the control from this chapter and one success metric.

Part G — CTF-style challenge ⭐⭐⭐

25.† The dormant key awakens. A vault audit log (illustrative) shows that a secret named legacy/partner-api-key, last accessed 414 days ago, was just requested three times in ten minutes by a workload identity that has never requested it before, from a pod that normally only reads app/config. Nothing is "broken"; no alert fired from the application. (a) Construct the most likely attack hypothesis. (b) Identify the two anomalies that should have been detections. (c) State what you rotate and in what order. (d) Explain how §20.1's "machine behavior is boring" principle made this catchable where a human's behavior would not have been.

Part H — Interleaved & forward-looking ⭐⭐

26. (Interleaved — Ch.19) Privileged access management vaults human admin credentials with just-in-time issuance and session recording. Compare and contrast that with vaulting machine secrets: which two ideas transfer directly, and what does the machine case have that the human case does not (and vice versa)?

27. (Interleaved — Ch.4/5) A vault encrypts stored secrets under a master key held in an HSM, and workloads authenticate to each other with mTLS. In two sentences, connect each to the cryptography concepts (key management, certificates, the CA) you learned earlier — what is the HSM protecting, and what does mTLS prove?

28. (Interleaved — Ch.18) You hunted orphaned human accounts in identity governance. Describe the machine-identity equivalent — the orphaned service account — and why it is harder to find and retire than an orphaned employee account.

29. This chapter says workload identity and mTLS "set up zero trust" (Chapter 32) and that secret scanning belongs in the pipeline (Chapter 31). In two sentences each, predict what role machine identity will play in those two later chapters.

30. ⭐⭐⭐ Open reflection. The chapter argues the highest-leverage move is "don't have a static secret" rather than "store the secret better." Write half a page on where this principle could and could not be applied in an environment you know — legacy systems, third-party integrations, devices — and what you would do about the cases where a static secret is unavoidable.

Solutions to daggered (†) problems are in the Answers appendix. The remaining problems are deliberately open — bring them to a study group or your instructor.