Chapter 15: Cloud Security: AWS, Azure, GCP — Shared Responsibility and the New Attack Surface

DataField.Dev


> "There is no cloud. It's just someone else's computer — and you are still responsible for what you put on it."

Prerequisites

Chapter 6
Chapter 11

Learning Objectives

Apply the shared-responsibility model to decide which security tasks belong to the provider and which belong to you, across IaaS, PaaS, and SaaS.
Identify and remediate the two most damaging cloud misconfigurations — publicly exposed storage and over-broad IAM — by reading an ACL and an IAM policy.
Design least-privilege cloud IAM and explain why identity, not the network, is the cloud's real perimeter.
Use cloud security posture management (CSPM), workload protection (CWPP), and preventive guardrails to catch misconfiguration at scale rather than one bucket at a time.
Enable and reason about cloud-native logging (CloudTrail and equivalents) and build a basic detection for risky cloud activity.

In This Chapter

Overview
Learning Paths
15.1 The cloud changes the boundary
15.2 Shared responsibility, concretely
15.3 Identity is the new perimeter (cloud IAM)
15.4 The misconfiguration epidemic (public storage and security groups)
15.5 CSPM, CWPP, and guardrails
15.6 Cloud logging and detection
Project Checkpoint
Summary
Spaced Review
What's Next



Overview

The data breach that embarrassed a Fortune 500 retailer in the most ordinary way possible did not involve a zero-day, a nation-state, or a single line of malicious code. An engineer created a storage bucket to hold a backup, clicked a setting whose consequences he did not fully understand, and made it readable by the entire internet. No one attacked anything. A security researcher running an automated scanner — the kind that does nothing but list every public bucket it can find, all day, every day — stumbled onto millions of customer records sitting in the open, indexed and downloadable by anyone who knew the URL. The first the company heard of it was a journalist's email. There was no alarm, because nothing had been "broken into." The door had simply been built without a lock, and the cloud is a neighborhood where automated burglars try every door, continuously, for free.

This is the characteristic cloud breach, and it is worth sitting with how unspectacular it is. When Meridian Regional Bank ran its first cloud security review — the subject of this chapter's project work and case study — the team expected to hunt for sophisticated attacks. What they found instead was a sprawl of small, boring mistakes: a storage bucket that a contractor had opened "temporarily" eighteen months earlier and never closed; an identity-and-access policy that granted a batch job permission to do anything in the account because writing a precise policy was harder than writing "Action": "*"; a database whose network firewall allowed connections from 0.0.0.0/0 — every IP address on Earth. None of these was an attack. Each was a self-inflicted opening, waiting for the internet's automated scanners to notice. The cloud did not make Meridian less secure. It moved the security work to a new place, with new defaults and new failure modes, and for a while nobody had moved with it.

That relocation of responsibility is the whole subject of this chapter. When Meridian ran its own data center (the hardening of which you studied in Chapter 11), the bank owned every layer: the building, the power, the network cables, the hypervisor, the operating system, the application. In the cloud, the provider — Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP) — now owns some of those layers, and you own the rest, and the single most expensive mistake in cloud security is being wrong about where that line falls. The provider will secure the physical data center, the underlying hardware, and the virtualization layer to a standard most organizations could never match. But the provider will also, by design, let you make your data world-readable with two clicks, and will not stop you, because in the cloud's model that is your decision to make. Security in the cloud is not weaker than on-premises. It is differently distributed, and you have to know your half.

In this chapter, you will learn to:

Apply the shared-responsibility model concretely — to draw, for any service, the line between what the provider secures and what you must.
Distinguish IaaS, PaaS, and SaaS, and explain how your share of the security work shrinks as you move up that stack.
Read a cloud IAM policy and a security-group rule well enough to spot the over-broad grant and the wide-open port that cause most cloud incidents.
Detect and prevent the misconfiguration epidemic — public storage, permissive identity, and exposed metadata — using least privilege, CSPM, and preventive guardrails.
Turn on the cloud's audit log (AWS CloudTrail and its peers) and write a first detection for the activity that precedes a cloud breach.

Learning Paths

Cloud security sits at the intersection of engineering and operations, and it is heavily tested on certification exams. Weight this chapter as follows:

🛡️ SOC Analyst: §15.6 is your section. Cloud logging and detection are now core SOC skills, and the activity that precedes a cloud breach (a new public bucket, an IAM policy change, a login from a new region) lives in CloudTrail. Read §15.3 (cloud IAM) closely too; most cloud alerts are identity events.
🏗️ Security Engineer: This is one of your home chapters. §15.3 (IAM design), §15.4 (the misconfigurations you must prevent), and §15.5 (CSPM/CWPP and guardrails) are the heart of the cloud engineer's job. The worked IAM policy and security-group examples are templates you will reuse.
📋 GRC: Focus on §15.1–15.2 (shared responsibility, which answers the governance question "who owns this control?") and the compliance framing throughout. You map cloud controls to PCI-DSS and GLBA; the shared-responsibility split decides which controls you must evidence.
📜 Certification Prep: Shared responsibility, IaaS/PaaS/SaaS, cloud misconfiguration, and least-privilege IAM are tested on Security+, on CISSP, and on every cloud certification. The key-takeaways.md file maps these topics to the individual exams.

15.1 The cloud changes the boundary

Start with a question that sounds simple and is not: when your application runs in the cloud, who is responsible for securing it? New engineers reliably get this wrong in one of two directions. Some assume the provider — a company spending billions on security, with armies of engineers — handles everything, and that moving to the cloud is itself a security upgrade you can stop thinking about. Others assume the opposite, that "the cloud is just someone else's computer" means it is inherently insecure and the provider does nothing for you. Both are wrong, and the breaches that follow from each are predictable. The first group leaves data in public buckets because they assumed AWS would not let that happen. The second group wastes effort re-securing layers the provider already secures better than they could, while still leaving their own layer exposed.

The truth is a division of labor, and to reason about it you first need to know what kind of cloud service you are using, because the line moves depending on how much of the stack you have handed over.

Infrastructure as a Service (IaaS) is the cloud at its most raw: the provider gives you virtual machines, virtual networks, and storage, and you do almost everything else. An AWS EC2 instance, an Azure Virtual Machine, a GCP Compute Engine instance — these are IaaS. The provider runs the physical host and the hypervisor; you choose the operating system, patch it, configure its firewall, install and secure the application, and manage the data. IaaS gives you the most control and hands you the most responsibility. If you launch an EC2 instance and never patch its operating system, that unpatched OS is entirely your problem — exactly the host-hardening work of Chapter 11, now running on a machine you do not physically own.

Platform as a Service (PaaS) raises the floor. The provider manages not just the hardware and hypervisor but the operating system and the runtime, and you supply only your application code and its configuration. AWS Lambda, Azure App Service, Google App Engine, and managed databases like Amazon RDS are PaaS. You no longer patch the OS — the provider does — but you are still responsible for your code's security, your data, and crucially the access controls and configuration of the service. A managed database patched by the provider but left reachable from the entire internet with a weak password is still your breach. PaaS shrinks your share of the work; it does not eliminate it.

Software as a Service (SaaS) is the top of the stack: a finished application you simply use. Salesforce, Microsoft 365, Google Workspace, Slack — the provider runs everything, including the application itself. Your responsibility narrows to how you use it: who has accounts, what permissions they hold, how they authenticate, what data you put in, and how you configure the application's own security settings. The classic SaaS breach is not a hacked vendor; it is a customer who configured sharing too broadly, never turned on multi-factor authentication, or left a departed employee's account active. Even at the top of the stack, identity and configuration remain yours.

Here is the relationship as a diagram. The stack is the same set of layers in every model; what changes is where the line falls between the provider's responsibility and yours.

   Figure 15.1 — Who secures what, by service model
   (■ = YOU are responsible    □ = the PROVIDER is responsible)

   Layer                  On-Prem    IaaS     PaaS     SaaS
   ─────────────────────────────────────────────────────────
   Data & classification    ■         ■        ■        ■     <- ALWAYS yours
   Identity & access (IAM)  ■         ■        ■        ■     <- ALWAYS yours
   Application logic        ■         ■        ■        □
   Runtime / middleware     ■         ■        □        □
   Operating system         ■         ■        □        □
   Virtualization / host    ■         □        □        □
   Physical network         ■         □        □        □
   Physical data center     ■         □        □        □
   ─────────────────────────────────────────────────────────
   Your share SHRINKS as you move right (up the stack) ───────►
   But the TOP TWO rows are NEVER the provider's job.

Read the diagram top to bottom and one fact jumps out: no matter which model you choose, the data and the identity-and-access layers are always yours. The provider will never decide who in your organization should have access to your data, or classify which of your data is sensitive, or notice that you granted an intern administrative rights. Those are judgments only you can make. This is why, as we will see, the overwhelming majority of cloud breaches trace back to those two always-yours layers: data exposed through misconfigured storage, and access granted too broadly through misconfigured IAM. The provider hardens the parts you cannot see; you misconfigure the parts you can.
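The figure is really a lookup table, so it can be encoded as one. The Python below is a minimal sketch built on the simplified eight-layer split of Figure 15.1. Real providers publish finer-grained, per-service matrices. The names `PROVIDER_FROM`, `owner`, and `your_layers` are illustrative and do not come from any provider's API.

```python
# Figure 15.1 encoded as data: a minimal sketch using the figure's simplified
# eight layers. Real providers publish finer-grained, per-service matrices.

MODELS = ("On-Prem", "IaaS", "PaaS", "SaaS")

# For each layer, the first service model in which the PROVIDER takes it over.
# None means the layer is never the provider's job, in any model.
PROVIDER_FROM = {
    "Data & classification": None,
    "Identity & access (IAM)": None,
    "Application logic": "SaaS",
    "Runtime / middleware": "PaaS",
    "Operating system": "PaaS",
    "Virtualization / host": "IaaS",
    "Physical network": "IaaS",
    "Physical data center": "IaaS",
}


def owner(layer: str, model: str) -> str:
    """Return "YOU" or "PROVIDER" for one layer under one service model."""
    handoff = PROVIDER_FROM[layer]
    if handoff is None:
        return "YOU"
    # Once a layer is handed over, it stays handed over further up the stack.
    if MODELS.index(model) >= MODELS.index(handoff):
        return "PROVIDER"
    return "YOU"


def your_layers(model: str) -> list:
    """List the layers the customer must still secure under a given model."""
    return [layer for layer in PROVIDER_FROM if owner(layer, model) == "YOU"]


for model in MODELS:
    print(f"{model:8} you secure {len(your_layers(model))} of {len(PROVIDER_FROM)} layers")
```

Looping over the four models prints a shrinking share: 8, 5, 3, and then 2 layers. No model ever hands the data or IAM rows to the provider. The sketch stores a single hand-off point per layer instead of a full grid, so "your share only shrinks as you move right" holds by construction.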

🚪 Threshold Concept: In the cloud, security failures are almost never the provider's hypervisor being breached — that is vanishingly rare and not your job to prevent. They are your misconfigurations of the layers the provider deliberately leaves under your control. Once you accept that "the cloud got breached" almost always means "we misconfigured our cloud," you stop looking for exotic attacks and start auditing your own data and identity layers, which is where the actual risk lives.

🔗 Connection: This builds directly on Chapter 11. Hardening an operating system does not stop being your job because the OS now runs on rented hardware — in IaaS it is exactly the same job, with the same CIS Benchmarks and the same patch discipline, just on an instance you spun up with an API call. The cloud changes who owns the layers below the OS; it does not change the work at and above it.

🔄 Check Your Understanding:
1. For each, name who is responsible for patching the operating system: (a) an AWS EC2 instance running your web server; (b) an Amazon RDS managed database; (c) Microsoft 365.
2. Two layers are your responsibility in every service model, including SaaS. Which two, and why can the provider never take them over?

Answers

1. (a) You: EC2 is IaaS, so the OS is yours. (b) The provider: RDS is PaaS, so the provider patches the database engine and the OS beneath it. (c) The provider: Microsoft 365 is SaaS, so the provider runs the entire stack.
2. Data (including its classification) and identity-and-access management. The provider cannot decide which of your data is sensitive or who in your organization should have access. Those are business judgments only you can make, which is why most cloud breaches live in exactly these two layers.

15.2 Shared responsibility, concretely

The diagram in §15.1 is the idea. Now we make it operational, because "the provider secures the infrastructure and you secure your stuff" is too vague to act on when you are standing in front of an actual AWS account at 9 a.m. deciding what to fix. The shared responsibility model is the framework — published by every major provider — that divides security duties between the cloud provider and the customer. AWS phrases it as the provider being responsible for security "of" the cloud (the hardware, software, networking, and facilities that run cloud services) and the customer being responsible for security "in" the cloud (everything the customer configures and puts there). Azure and GCP publish near-identical models. The wording is corporate; the consequences are concrete and, when misunderstood, catastrophic.

Make it specific with Meridian. The bank runs its loan-document archive in Amazon S3 (Simple Storage Service, AWS's object storage). Here is how responsibility actually divides for that one workload:

Concern	Whose job	What it means in practice
The physical servers and disks storing the data	AWS	Datacenter physical security, hardware lifecycle, redundancy — Meridian never thinks about this
The S3 service software itself (durability, the API)	AWS	AWS patches and operates S3; Meridian trusts its 11-nines durability claim
Whether the loan-document bucket is public or private	Meridian	A single ACL/policy setting Meridian controls — and the one that causes breaches
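Encryption of the data at rest	Shared	AWS provides the encryption capability (for S3, new objects are encrypted by default with S3-managed keys); Meridian must confirm it on every in-scope resource, choose the key type, and manage its own keys (Ch. 5 covered the crypto)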
Who in Meridian can read the bucket (IAM)	Meridian	Entirely Meridian's identity policies — AWS enforces what Meridian writes
Encryption in transit (TLS to S3)	Shared	AWS supports TLS; Meridian must require it and reject plaintext
Detecting that the bucket became public	Meridian	Meridian must enable and watch logs/CSPM; do not count on AWS to warn you

Notice the pattern. AWS's responsibilities are the ones Meridian could not perform, or would perform far worse: datacenter security, hardware, and the durability of a globe-spanning storage service. Meridian's responsibilities are the ones AWS cannot perform because they encode Meridian's business decisions: should this bucket be public, who should have access, is this data sensitive. The "shared" rows are the trap. What the provider switches on by default varies by service and changes over time. Since January 2023, S3 has encrypted all new objects with S3-managed keys, while EBS volumes stay unencrypted unless someone enables encryption for the volume or turns on the account's encryption-by-default setting. The choice of keys is always Meridian's. An engineer who assumes "AWS encrypts everything" may leave a volume unencrypted, or leave regulated data under keys Meridian does not control, because they thought it was the provider's job. The model does not fail by giving you too little; it fails when you misread a shared duty as the provider's.

⚠️ Common Pitfall: Assuming a "secure provider" means a "secure deployment." At the infrastructure layer, AWS, Azure, and GCP are more secure than almost any organization's own data center, but that security does not extend upward into your configuration. The provider securing its hypervisor does nothing to stop you from making a bucket public, granting * permissions, or opening a database to the world. Nearly every widely reported cloud breach of the last decade traced back to the customer's side of the line, not to a provider-side compromise. The provider's excellent security is necessary; it is nowhere near sufficient.

The shared-responsibility model also reframes a governance question that GRC professionals will recognize from Chapter 11's hardening standards: for every control, who owns it, and how do we evidence it? When an auditor asks Meridian to prove that cardholder data in the cloud is encrypted at rest (a PCI-DSS requirement), "AWS handles encryption" is not an acceptable answer — Meridian must show that it enabled encryption on the specific buckets and databases in scope, and that it manages the keys. The shared-responsibility model is, in this sense, a control-ownership map. The provider hands you a "compliance inheritance" for the layers it owns (you can inherit AWS's datacenter physical-security controls in your own audit), but you must independently evidence every control on your side of the line.
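Evidencing the control means checking each in-scope resource, not the provider's capability in general. The sketch below assumes Meridian has already exported each bucket's default-encryption configuration in the shape that S3's GetBucketEncryption API returns. The second bucket name and the KMS key ARN are hypothetical placeholders. `encryption_evidence` is an illustrative helper, not an AWS tool.

```python
from typing import Optional


def encryption_evidence(config: Optional[dict]) -> str:
    """Summarize one bucket's default-encryption setting for an audit report.

    `config` is assumed to have the shape of an S3 GetBucketEncryption
    response; None means no configuration was exported for the bucket.
    """
    if not config:
        return "NO EVIDENCE: no default-encryption configuration on record"
    rules = config.get("ServerSideEncryptionConfiguration", {}).get("Rules", [])
    for rule in rules:
        default = rule.get("ApplyServerSideEncryptionByDefault", {})
        algorithm = default.get("SSEAlgorithm", "")
        if algorithm == "AES256":
            # SSE-S3: encrypted, but AWS owns and manages the keys.
            return "encrypted with S3-managed keys (SSE-S3)"
        if algorithm.startswith("aws:kms"):
            # SSE-KMS without an explicit key ID falls back to the AWS managed key.
            key = default.get("KMSMasterKeyID", "the AWS managed key aws/s3")
            return f"encrypted with KMS key {key}"
    return "NO EVIDENCE: configuration present but no default algorithm found"


# Hypothetical inventory of in-scope buckets; the key ARN is a placeholder.
in_scope = {
    "meridian-loan-docs": {
        "ServerSideEncryptionConfiguration": {
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
                }
            }]
        }
    },
    "meridian-card-exports": None,  # hypothetical bucket with nothing exported
}

for bucket, cfg in in_scope.items():
    print(f"{bucket}: {encryption_evidence(cfg)}")
```

The output is a per-bucket line an auditor can check. The "NO EVIDENCE" case is the gap the auditor will ask about.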

🛡️ Defender's Lens: The shared-responsibility line is also a detection line. You will never see the provider's hardware logs, and you do not need to — that telemetry is the provider's to watch. But everything on your side of the line generates telemetry you can and must collect: every API call that changes a configuration, every login, every permission grant, every bucket that turns public. We come back to this in §15.6, but internalize it now: your half of shared responsibility is also your half of visibility, and the cloud is unusually generous about logging it — if you turn the logging on.

🔄 Check Your Understanding:
1. An engineer says, "We don't need to worry about encrypting our cloud database; the cloud provider encrypts everything." Using the shared-responsibility model, explain where this reasoning is dangerous.
2. Why is "AWS is responsible for encryption" an inadequate answer when a PCI-DSS auditor asks Meridian to prove cardholder data is encrypted at rest?

Answers

1. Encryption at rest is typically a shared responsibility. The provider supplies the capability, but depending on the service it may be off until the customer enables it, and choosing and managing the keys is the customer's job. Assuming the provider does it automatically can leave data unencrypted.
2. The control on Meridian's side of the line must be evidenced by Meridian. The auditor needs proof that encryption was enabled on the specific in-scope resources and that Meridian manages the keys. The provider's general capability is not evidence that Meridian configured it.

15.3 Identity is the new perimeter (cloud IAM)

In Chapter 6 you learned the network perimeter — the firewall between an "inside" and an "outside" — and in the same breath learned that it is dissolving. Nowhere has it dissolved more completely than in the cloud. There is no cable to cut, no single chokepoint to firewall, no physical "inside." Instead, every action against a cloud account — reading a file, launching a server, deleting a database, creating a new user — is an authenticated API call, and what decides whether that call succeeds is not where it came from on the network but who is making it and what they are permitted to do. In the cloud, identity is the perimeter. Get identity right and an attacker with network access still cannot act; get identity wrong and an attacker needs no network cleverness at all, because a leaked credential is a key to the front door.

Cloud IAM (identity and access management) is the system every cloud provider gives you to define who (which identities) can do what (which actions) to which resources, under what conditions. (We mean specifically the cloud platform's own access-control plane here — AWS IAM, Azure RBAC and Entra ID, GCP IAM. The broader organizational discipline of running an identity program across an enterprise — joiners, movers, leavers, access reviews at scale — is identity governance, which you will study in Chapter 18. Keep the two distinct: this chapter is about getting the cloud platform's permissions right; Chapter 18 is about governing identity as an enterprise process.) Cloud IAM has a few moving parts you must know by name:

A principal (or identity) is who is acting: a human user, a group of users, or — increasingly the majority — a role or service account assumed by an application, a virtual machine, or a function. Most cloud API calls are made by non-human identities, a fact that will matter enormously in Chapter 20.
A policy is the document that grants or denies permissions. In AWS it is JSON specifying which actions (e.g., s3:GetObject) a principal may take on which resources (e.g., a specific bucket), under optional conditions (e.g., only from a certain network, only with MFA).
A role is a set of permissions that a principal can assume temporarily, rather than permissions attached permanently to a user. Roles are how you give a virtual machine or a Lambda function exactly the access it needs without embedding long-lived credentials — a pattern we will lean on hard.

The governing principle is the one you met as a security fundamental in Chapter 3 and will see applied to access at scale in Chapter 17: least privilege. A principal should hold the minimum permissions required to do its job, and no more. In the cloud this principle is both more important and more frequently violated than anywhere else, because cloud permissions are fine-grained (AWS defines thousands of distinct actions), writing a precise policy is tedious, and the path of least resistance — granting broad permissions — works immediately, fails silently, and leaves a wide-open door that nothing will flag until an attacker or an auditor finds it.

Here is what least privilege looks like in practice. Meridian has a batch job that reads loan documents from one specific S3 bucket and nothing else. Compare two IAM policies that both "make the job work":

OVER-BROAD (works, but grants far too much: an attacker's dream):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "*",
    "Resource": "*"
  }]
}

This policy says: this principal may take any action on any resource in the account. It is, functionally, an administrator. The batch job needed to read one bucket; this policy grants it the power to delete every database, create new admin users, disable logging, and exfiltrate everything. If the credentials for this job leak (into a code repository, a log file, a compromised laptop), the attacker inherits total control of the account. The job works, the engineer moves on, and an "Action": "*" time bomb sits in the account until someone finds it. Now the least-privilege version:

LEAST PRIVILEGE (grants exactly what the batch job needs, nothing more):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": [
      "arn:aws:s3:::meridian-loan-docs",
      "arn:aws:s3:::meridian-loan-docs/*"
    ],
    "Condition": {
      "Bool": { "aws:SecureTransport": "true" }
    }
  }]
}

This policy says: this principal may list and read objects in the meridian-loan-docs bucket, and only over an encrypted connection. The batch job works exactly as before. But if these credentials leak, the attacker gets read access to one bucket of loan documents — a serious problem, but a bounded one. They cannot pivot to the rest of the account, cannot destroy anything, cannot escalate. This is the entire game of cloud IAM: not "can the job do its work" (both policies pass that test) but "how much damage does this identity enable if it is compromised." You design every policy assuming the credential will eventually leak — Theme 4, assume breach, applied to identity — and you make the blast radius as small as the job allows.
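The blast-radius argument can be tested directly. What follows is a deliberately simplified, hypothetical evaluator, not AWS's policy engine. It models four things:

- default deny;
- an explicit Deny that always wins;
- `*` and `?` wildcards, via Python's `fnmatch`, which also accepts `[...]` character classes that IAM does not;
- the `Bool` condition operator, and no other.

Real IAM evaluation also weighs resource policies, service control policies, permission boundaries, and session policies. The leaked-credential requests are invented for illustration.

```python
from fnmatch import fnmatchcase

# The two policies from the text, as Python dictionaries.
OVER_BROAD = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
LEAST_PRIVILEGE = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::meridian-loan-docs",
            "arn:aws:s3:::meridian-loan-docs/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "true"}},
    }],
}


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def _conditions_met(condition: dict, context: dict) -> bool:
    """Model only the Bool operator; any other operator fails closed."""
    for operator, pairs in condition.items():
        if operator != "Bool":
            return False
        for key, expected in pairs.items():
            if str(context.get(key, "")).lower() != str(expected).lower():
                return False
    return True


def is_allowed(policy: dict, action: str, resource: str, context: dict) -> bool:
    """Simplified evaluation: default deny, and an explicit Deny always wins."""
    allowed = False
    for stmt in _as_list(policy["Statement"]):
        # Action names compare case-insensitively; resource ARNs do not.
        action_hit = any(fnmatchcase(action.lower(), a.lower())
                         for a in _as_list(stmt["Action"]))
        resource_hit = any(fnmatchcase(resource, r)
                           for r in _as_list(stmt["Resource"]))
        if not (action_hit and resource_hit):
            continue
        if not _conditions_met(stmt.get("Condition", {}), context):
            continue
        if stmt["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


# What an attacker holding each policy's leaked credentials might try.
attempts = [
    ("s3:GetObject", "arn:aws:s3:::meridian-loan-docs/loan-001.pdf"),
    ("s3:DeleteBucket", "arn:aws:s3:::meridian-loan-docs"),
    ("iam:CreateUser", "arn:aws:iam::111122223333:user/backdoor"),
]
over_tls = {"aws:SecureTransport": True}
for action, resource in attempts:
    print(f"{action:16} least-privilege={is_allowed(LEAST_PRIVILEGE, action, resource, over_tls)}"
          f"  over-broad={is_allowed(OVER_BROAD, action, resource, over_tls)}")
```

With TLS present, the least-privilege policy allows only the first attempt, the read. The over-broad policy allows all three, including creating a backdoor user. Drop the TLS flag from the context and the least-privilege policy denies even the read.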

Two more IAM patterns are worth internalizing because attackers love their absence:

Prefer roles and short-lived credentials over long-lived access keys. A long-lived AWS access key (the AKIA... kind) is a password that never expires and is trivial to leak into a Git repository. A role assumed by an EC2 instance or a workload supplies temporary credentials that rotate automatically and never need to be stored in code. The single most common serious cloud finding is a long-lived access key checked into source control; we build a scanner for exactly this in Chapter 20.
Require MFA for privileged and human identities, and add conditions. An IAM policy can demand that sensitive actions only succeed when the caller has authenticated with multi-factor authentication ("aws:MultiFactorAuthPresent": "true"), or only from known networks. These conditions turn a stolen password into a dead end — the same phishing-resistance logic that saved Meridian in Chapter 1, now expressed as a policy condition. Authentication itself is Chapter 16's subject; here, just know that IAM lets you require it as a gate on the action.
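As a taste of the Chapter 20 scanner, here is a minimal sketch that flags long-term access key IDs in text. It relies on AWS's documented key-ID format: 20 uppercase alphanumeric characters, where long-term keys begin with `AKIA` and temporary STS keys with `ASIA`. The sample file is hypothetical and uses the placeholder key ID from AWS's own documentation. A matching ID is only a signal. The real exposure is the secret access key that accompanies it, so a fuller scanner would look for that too.

```python
import re

# Long-term AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
# Temporary STS key IDs start with "ASIA" instead and are deliberately not matched.
LONG_TERM_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_long_lived_keys(text: str) -> list:
    """Return (line_number, key_id) pairs for every long-term key ID found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LONG_TERM_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = '''\
# settings.py (hypothetical file committed to a repository)
BUCKET = "meridian-loan-docs"
AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
'''

print(find_long_lived_keys(sample))  # [(3, 'AKIAIOSFODNN7EXAMPLE')]
```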

⚠️ Common Pitfall: The "Action": "*" and "Resource": "*" policy — granted "to get the thing working" and never tightened. It is the cloud equivalent of giving every employee a master key because cutting individual keys was slower. Engineers reach for the wildcard under deadline pressure, it works, and the over-privileged identity becomes a permanent liability. The fix is cultural as much as technical: treat every wildcard in a policy as a finding to be justified or removed, and use the provider's access-analyzer tooling (which observes what permissions are actually used and proposes a tighter policy) to right-size grants after the fact.
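The "every wildcard is a finding" rule is easy to automate as a first-pass lint. This sketch is an illustration under stated assumptions, not the provider's access analyzer:

- it parses a policy document;
- it skips Deny statements;
- it ignores `NotAction`/`NotResource`, which deserve their own review;
- it reports any `*` in an Allow statement's Action or Resource.

```python
import json


def wildcard_findings(policy_json: str) -> list:
    """Report wildcards in the Action/Resource of Allow statements."""
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # wildcards in a Deny narrow access, so skip them here
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*":
                    findings.append(f"statement {index}: {field} is '*' (everything)")
                elif "*" in value:
                    findings.append(f"statement {index}: {field} '{value}' has a wildcard; justify or narrow")
    return findings


over_broad = json.dumps({"Version": "2012-10-17",
                         "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]})
least_privilege = json.dumps({"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::meridian-loan-docs", "arn:aws:s3:::meridian-loan-docs/*"],
}]})

for name, doc in (("over-broad", over_broad), ("least-privilege", least_privilege)):
    print(name, wildcard_findings(doc))
```

On the over-broad policy it reports both full wildcards. On the least-privilege policy it reports one finding: the `/*` on the loan-docs object ARN. That is the kind of wildcard a reviewer can justify, because it is scoped to one bucket's objects, and then record as accepted.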

🔗 Connection: Cloud IAM is least privilege (Chapter 3) made concrete and unforgiving. On-premises, an over-privileged account is dangerous but often still behind a network perimeter. In the cloud, the IAM policy frequently is the only thing standing between a leaked credential and your data — there is no firewall behind it. This is why Part IV (Chapters 16–20) devotes five chapters to identity: in the modern world, and especially in the cloud, identity is where the real security boundary lives.

🔄 Check Your Understanding:
1. In one sentence, why is "identity is the new perimeter" especially true in the cloud, when it may be less true on a traditional on-premises network?
2. A policy grants "Action": "*" on "Resource": "*" to a job that only needs to read one bucket. The job works perfectly in testing. Why is this nonetheless a serious finding, and what is the precise risk?

Answers

1. In the cloud every action is an authenticated API call evaluated by IAM rather than gated by network location. What an identity is permitted to do therefore determines what an attacker holding it can do, not where it sits on a network, and there is often no network perimeter behind the IAM policy.
2. The policy grants full administrative control of the account. If the job's credentials ever leak (into a repo, a log, a compromised host), the attacker inherits total control: they can read everything, destroy resources, create new admins, and disable logging. Yet the job only ever needed read access to a single bucket. The blast radius is enormous and entirely unnecessary.

15.4 The misconfiguration epidemic (public storage and security groups)

Now we reach the heart of cloud security, because this is where the breaches actually happen. A cloud misconfiguration is a cloud resource left in an insecure state through a setting the customer controls — and it is, by a wide margin, the leading cause of cloud data exposure. Not exotic exploits. Settings. The cloud's power is that anyone can provision infrastructure with an API call; its danger is that the same ease lets anyone misconfigure it just as fast, and at a scale where a single bad default replicated across a thousand resources is a thousand open doors. We will dissect the two misconfigurations responsible for most public cloud breaches — exposed storage and over-permissive network rules — and then the subtler one that turns a small foothold into a total compromise: exposed instance metadata.

Public object storage — the bucket the whole internet can read

Object storage — S3 on AWS, Blob Storage on Azure, Cloud Storage on GCP — is where organizations dump data at scale: backups, logs, images, exports, and far too often, customer records that should never have left a database. Every object-storage service lets you control who can access a bucket and its contents through access control lists (ACLs) and resource policies. And every one of them lets you, with a couple of clicks or one API call, set that access to public — readable by any unauthenticated user on the internet.

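The mechanism is simple, which is exactly why it is dangerous. Since April 2023, AWS has disabled ACLs and enabled Block Public Access on new S3 buckets by default. Buckets created earlier, and any bucket where someone reversed those defaults, can still have their access governed by an ACL with grants like these: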

   Bucket ACL — the two states that matter
   ─────────────────────────────────────────────
   PRIVATE (correct for almost all data):
     Grantee: AccountOwner            -> FULL_CONTROL

   PUBLIC (the breach state):
     Grantee: AllUsers (the internet) -> READ         <-- anyone can list the contents
     Grantee: AllUsers                -> WRITE        <-- (rarer, worse: anyone can upload)

The grantee AllUsers is the special principal meaning everyone on the internet, unauthenticated. A bucket whose ACL grants READ to AllUsers lets anyone list every object it holds. Any object carrying the same public grant, or covered by a bucket policy that allows s3:GetObject to everyone, can be downloaded by anyone who finds its URL. They will find it, because automated scanners enumerate public buckets continuously, and search engines and dedicated tools index them. (A WRITE grant to AllUsers is rarer and even worse: it lets strangers upload content into your bucket, which has been used to plant malware and deface sites.) The attacker's "exploit" is nothing more than an HTTP GET request. There is no break-in to detect because nothing was broken.
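Spotting the breach state in an ACL is a mechanical check. The sketch below assumes the ACL has already been fetched in the shape that S3's GetBucketAcl API returns: a `Grants` list of `Grantee`/`Permission` pairs. The group URIs are the identifiers S3 uses for AllUsers and AuthenticatedUsers; the owner ID in the sample is a placeholder. The check also flags AuthenticatedUsers. That group means any AWS account holder anywhere, not just your own users, so it is nearly as public.

```python
# Group URIs S3 uses in ACL grants.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
PUBLIC_GROUPS = {
    ALL_USERS: "AllUsers",                      # anyone on the internet
    AUTHENTICATED_USERS: "AuthenticatedUsers",  # any AWS account, anywhere
}


def public_grants(acl: dict) -> list:
    """Return 'Group -> PERMISSION' strings for grants to the public groups."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            findings.append(f"{PUBLIC_GROUPS[grantee['URI']]} -> {grant.get('Permission')}")
    return findings


# The "breach state" from the figure; the owner ID is a placeholder.
breach_state = {
    "Owner": {"ID": "example-owner-canonical-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "example-owner-canonical-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS}, "Permission": "READ"},
    ],
}
private_state = {"Owner": breach_state["Owner"], "Grants": breach_state["Grants"][:1]}

print(public_grants(breach_state))   # ['AllUsers -> READ']
print(public_grants(private_state))  # []
```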

How an attacker abuses it: they do not target you specifically. They run (or buy access to the output of) a scanner that lists every public bucket it can find across all three clouds, all day. When your bucket appears, they download it. The first sign you have is often a researcher's email, a ransom note ("we have your data"), or a news story. This is the breach from this chapter's overview, and it is the single most common way organizations leak data from the cloud.

How you detect and prevent it — and this is where a defender earns their salary:

Block public access at the account level. AWS provides "S3 Block Public Access," an account-wide switch that overrides any bucket-level setting and refuses to make buckets public regardless of what an individual engineer configures. Azure and GCP have equivalents. This is a guardrail (§15.5): instead of hoping every engineer configures every bucket correctly forever, you make the dangerous state structurally impossible. Turn it on; very few legitimate workloads need truly public buckets, and those that do (a public website's assets) should be served through a content-delivery network with the bucket itself still private.
Default to private and encrypt. New buckets should be private by default and encrypted at rest. Make the secure state the default state, so an engineer has to actively work to create an insecure one.
Continuously audit with CSPM. A cloud security posture management tool (§15.5) scans your accounts continuously and flags any bucket that is public, the moment it becomes public — turning a silent, indefinite exposure into an alert you act on in minutes. This is the detection backstop for when prevention fails.
Log and alert on the change. The API call that makes a bucket public (PutBucketAcl, PutBucketPolicy) appears in CloudTrail (§15.6). Alerting on "a bucket just became public" catches the exposure at the moment it is created, which is the difference between a five-minute incident and an eighteen-month one.
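Alerting on the change can start as a filter over CloudTrail records. This minimal sketch assumes the JSON of a CloudTrail log file: a top-level `Records` list with fields such as `eventSource`, `eventName`, `eventTime`, `userIdentity`, and `requestParameters`. The sample events are invented. The filter flags every ACL or policy change rather than deciding whether the new setting is public, so a real pipeline would pair it with a check of the new ACL or policy. It marks calls that carry an `errorCode`, such as attempts that were refused, as denied.

```python
import json

# Management events that change who can reach a bucket (from the text).
WATCHED_EVENTS = {"PutBucketAcl", "PutBucketPolicy"}


def bucket_exposure_alerts(log_file_json: str) -> list:
    """Flag S3 ACL/policy changes in one CloudTrail log file's JSON."""
    alerts = []
    for record in json.loads(log_file_json).get("Records", []):
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        if record.get("eventName") not in WATCHED_EVENTS:
            continue
        bucket = (record.get("requestParameters") or {}).get("bucketName", "?")
        who = record.get("userIdentity", {}).get("arn", "unknown principal")
        alert = f"{record.get('eventTime')} {record['eventName']} on {bucket} by {who}"
        if "errorCode" in record:
            alert += f" (denied: {record['errorCode']})"
        alerts.append(alert)
    return alerts


# Invented sample records in CloudTrail's field layout.
sample_log = json.dumps({"Records": [
    {"eventTime": "2024-05-01T09:00:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "PutBucketAcl",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/contractor"},
     "requestParameters": {"bucketName": "meridian-loan-docs"}},
    {"eventTime": "2024-05-01T09:02:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "CreateBucket",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/contractor"},
     "requestParameters": {"bucketName": "meridian-scratch"}},
]})

for alert in bucket_exposure_alerts(sample_log):
    print(alert)
```

Only the PutBucketAcl record produces an alert. CreateBucket passes through because it does not change who can read the bucket.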

Security groups — the firewall you forgot is a firewall

In the cloud, the firewall around a virtual machine is usually a security group — a set of rules that controls inbound and outbound network traffic to cloud resources, functioning as a virtual, stateful firewall attached to instances. (Azure calls the equivalent a Network Security Group; GCP uses VPC firewall rules.) Everything you learned about firewalls and default-deny in Chapters 6 and 7 applies — a security group is a firewall — but it is configured through the cloud console or API, which means it is just as easy to misconfigure as a bucket, and a bad rule exposes a server to the entire internet just as completely.

The canonical misconfiguration is a security-group rule allowing inbound access from 0.0.0.0/0 — every IPv4 address on Earth — to a sensitive port. Compare:

   Security-group inbound rules: bad vs. good
   ──────────────────────────────────────────────────────────────
   DANGEROUS (exposes a database port and two admin logins to the whole internet):
     Protocol  Port   Source              Purpose
     TCP       3306   0.0.0.0/0           "MySQL"              <-- DB open to the world
     TCP       22     0.0.0.0/0           "SSH"                <-- admin login open to the world
     TCP       3389   0.0.0.0/0           "RDP"                <-- Windows admin open to the world

   SAFER (least privilege on the wire: restrict source, use a bastion):
     Protocol  Port   Source              Purpose
     TCP       3306   10.20.0.0/24        "app subnet only"    <-- DB reachable from app tier only
     TCP       22     10.20.255.10/32     "bastion host only"  <-- SSH only via jump host
     (RDP 3389: not exposed; admin only through the bastion / VPN)

The dangerous rules say: anyone, anywhere, may attempt to connect to this database's port, this server's SSH, this Windows machine's remote desktop. Within minutes of such a rule existing, automated scanners (the same indiscriminate tide you saw hitting a new server in Chapter 1) begin brute-forcing credentials against those ports. A database with a weak or default password and 3306 open to 0.0.0.0/0 is not a hypothetical breach; it is a breach with a countdown timer. The safer rules apply the same principle as the least-privilege IAM policy: restrict the source to exactly the networks that legitimately need access — the application subnet for the database, a single bastion host for SSH — and expose nothing to 0.0.0.0/0 that does not have to be.
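The bad-versus-good comparison translates directly into a check. This sketch assumes its input is the IpPermissions list from a parsed EC2 DescribeSecurityGroups response (the field names IpProtocol, FromPort, ToPort, IpRanges/CidrIp, and Ipv6Ranges/CidrIpv6 follow that API). The function name and the three-port SENSITIVE_PORTS map are illustrative assumptions. It reports each sensitive TCP port reachable from 0.0.0.0/0 or its IPv6 counterpart ::/0, including through port ranges and the all-traffic protocol "-1".

```python
# Hypothetical posture check: which sensitive ports does a security group expose
# to the whole internet? Input is assumed to be the "IpPermissions" list from a
# parsed EC2 DescribeSecurityGroups response; no live AWS call is made.

SENSITIVE_PORTS = {22: "SSH", 3306: "MySQL", 3389: "RDP"}  # illustrative, not exhaustive
WORLD = {"0.0.0.0/0", "::/0"}  # every IPv4 address, every IPv6 address


def world_open_rules(ip_permissions: list) -> list:
    """Return (port, service, cidr) for each sensitive TCP port open to the world."""
    findings = []
    for perm in ip_permissions:
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        open_cidrs = [c for c in cidrs if c in WORLD]
        if not open_cidrs:
            continue
        proto = str(perm.get("IpProtocol", "")).lower()
        if proto == "-1":
            lo, hi = 0, 65535  # "-1" means all protocols and all ports
        elif proto in ("tcp", "6"):
            lo, hi = perm.get("FromPort", -1), perm.get("ToPort", -1)
        else:
            continue  # UDP/ICMP rules are out of scope for this TCP-focused sketch
        for port, service in sorted(SENSITIVE_PORTS.items()):
            if lo <= port <= hi:
                findings.extend((port, service, cidr) for cidr in open_cidrs)
    return findings


if __name__ == "__main__":
    rules = [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},        # the DANGEROUS MySQL rule
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.20.255.10/32"}]},  # the SAFER bastion-only rule
    ]
    print(world_open_rules(rules))  # -> [(3306, 'MySQL', '0.0.0.0/0')]
```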

How you detect and prevent it: the same triad of guardrail, audit, and alert. A preventive guardrail (a policy-as-code rule, §15.5) can reject any security-group change that opens a sensitive port to 0.0.0.0/0 before it is ever applied. A CSPM tool flags existing wide-open rules across all accounts. And the API call that creates the rule (AuthorizeSecurityGroupIngress) is logged, so you can alert on "someone just opened port 3389 to the world" and investigate immediately. Detection at the moment of change is the recurring cloud pattern: the cloud tells you everything that happens to your configuration if you are listening.

Exposed instance metadata — the small hole that becomes total compromise

The third misconfiguration is subtler and explains how a minor web vulnerability can escalate into full account compromise — which is why it appears in the analytical case study (Case Study 2). Every cloud virtual machine can query a special internal address — the instance metadata service, reachable at the link-local IP 169.254.169.254 — to learn about itself, including, critically, the temporary IAM credentials of the role attached to the instance. This is a feature: it is how a VM gets credentials without storing them. But if an application on that VM has a server-side request forgery (SSRF) flaw — a web vulnerability you met in Chapter 13 that lets an attacker make the server fetch a URL of the attacker's choosing — the attacker can make the server fetch http://169.254.169.254/... and read the instance's IAM credentials out of the metadata service. Now the attacker holds whatever permissions that instance's role grants. If the role is over-broad (recall §15.3), a single SSRF bug just became total account compromise.
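The difference between the original metadata service and its hardened version is visible in the requests themselves. The sketch below only *builds* the two IMDSv2 requests with Python's standard urllib.request, because sending them works only from inside an EC2 instance. Step one is a PUT to /latest/api/token carrying a TTL header; step two is a GET for the role's credentials carrying the returned token. The paths and header names follow AWS's documented IMDSv2 interface; the function names, the token string, and the role name web-app-role are hypothetical.

```python
import urllib.request

IMDS = "http://169.254.169.254"  # link-local instance metadata endpoint


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1 of IMDSv2: a PUT asking for a session token (21600 s = 6 hours)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(token: str, role: str) -> urllib.request.Request:
    """Step 2: a GET for the role's temporary credentials, presenting the token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role}",
        headers={"X-aws-ec2-metadata-token": token},
    )


if __name__ == "__main__":
    # Built, not sent: these requests only succeed from inside an EC2 instance.
    step1 = token_request()
    step2 = credentials_request("example-token", "web-app-role")  # hypothetical values
    print(step1.get_method(), step1.full_url)
    # -> PUT http://169.254.169.254/latest/api/token
    print(step2.get_method(), step2.full_url)
    # -> GET http://169.254.169.254/latest/meta-data/iam/security-credentials/web-app-role
```

Under the original service (IMDSv1), the step-two GET works with no token at all, which is exactly the request an SSRF bug can trigger. Requiring the step-one PUT with a custom header is what closes that path.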

This is the chain that produced one of the most-studied cloud breaches in history, and the defenses are precise: enforce the hardened version of the metadata service (AWS's IMDSv2, which requires a session token obtained through a PUT request with a custom header; a typical SSRF bug lets the attacker choose only the URL of a GET, so it cannot complete that handshake), apply least privilege to instance roles so a stolen role credential is bounded, and fix the SSRF at the application layer (Chapter 13). Defense in depth: three independent controls, each of which alone would have broken the chain.
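Fixing the SSRF at the application layer usually means refusing to fetch anything the feature has no business fetching. The sketch below takes the allowlist approach for a hypothetical image-fetch feature: only HTTPS URLs whose host appears in ALLOWED_HOSTS pass, and literal link-local, private, or loopback addresses such as 169.254.169.254 are rejected explicitly as a second layer. The host names are placeholders, and the check looks only at the URL as written; a production fetcher would also need to refuse redirects to non-allowlisted hosts and consider what allowlisted names resolve to.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical allowlist for an image-fetch feature; the host names are placeholders.
ALLOWED_HOSTS = {"images.example.com", "cdn.example.com"}


def is_safe_fetch(url: str) -> bool:
    """Allow only HTTPS URLs whose host is explicitly allowlisted."""
    parsed = urlparse(url)
    host = parsed.hostname  # lowercased, with any "user@" prefix already stripped
    if parsed.scheme != "https" or not host:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # a name, not a literal IP address
    if addr is not None and (addr.is_link_local or addr.is_private or addr.is_loopback):
        return False  # e.g. 169.254.169.254 (metadata), 10.0.0.5, 127.0.0.1
    return host in ALLOWED_HOSTS


if __name__ == "__main__":
    for url in ["https://images.example.com/cat.png",
                "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
                "https://images.example.com@169.254.169.254/",
                "https://evil.example.net/x.png"]:
        print(is_safe_fetch(url), url)
    # -> True for the first URL, False for the other three
```

The third URL shows why the host is taken from the parsed URL rather than from a string search: the allowlisted name appears in it, but only as userinfo, and the real destination is the metadata address.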

📟 War Story: A constructed but representative composite. A mid-size retailer ran a public web application on a cloud VM whose attached role had "Action": "*" "to keep things simple." The application had an unremarkable SSRF bug in an image-fetch feature. An attacker supplied a URL pointing at 169.254.169.254, read the instance's temporary credentials, and — because the role was administrator-equivalent — used them to enumerate and download every storage bucket in the account, including a database backup with millions of records. No malware, no zero-day: one web bug plus one over-broad role plus an un-hardened metadata service. Any one of the three defenses (IMDSv2, least-privilege role, fixing the SSRF) would have stopped it. This is defense in depth's whole argument in a single incident.

🔄 Check Your Understanding: 1. A security-group rule reads: TCP / 22 / 0.0.0.0/0. What does it allow, why is it dangerous, and what is a least-privilege fix? 2. Explain the chain by which an SSRF web vulnerability can lead to full cloud-account compromise, and name one control that breaks the chain.

Answers

1. It allows anyone on the entire internet to attempt SSH connections to the instance on port 22, which exposes the admin login to continuous automated brute-forcing; the least-privilege fix restricts the source to a single bastion host or known admin network (e.g., 10.20.255.10/32) rather than 0.0.0.0/0. 2. An attacker exploits SSRF to make the server request http://169.254.169.254/..., reads the instance role's temporary IAM credentials from the metadata service, and then uses those credentials — with whatever permissions the role holds — to act against the account; if the role is over-broad, this is total compromise. Breaking controls: enforce IMDSv2 (blocks the simple SSRF path), apply least privilege to the instance role (bounds the damage), or fix the SSRF itself.

15.5 CSPM, CWPP, and guardrails

By now a pattern is obvious: cloud risk is dominated by misconfiguration, misconfiguration happens fast and at scale, and humans cannot manually check every bucket, policy, and security group across dozens of accounts forever. The answer is automation — tooling that watches the cloud's configuration continuously and, better still, prevents dangerous configurations from ever existing. Three ideas do this work, and you should know them by name and by what each is actually for.

Cloud security posture management (CSPM) is the practice and tooling that continuously scans cloud accounts for misconfigurations and compliance violations, comparing your actual configuration against a baseline of secure settings and best-practice benchmarks. A CSPM tool is, in effect, an automated auditor that never sleeps. It connects to your AWS, Azure, and GCP accounts (read-only) and continuously answers questions like: Is any bucket public? Any security group open to 0.0.0.0/0 on a sensitive port? Any IAM policy granting *? Any unencrypted database? Any account without MFA? When it finds a violation, it raises a finding, often mapped directly to a control in a framework like the CIS Benchmarks or PCI-DSS. CSPM is detective — it finds the misconfiguration that already exists — and it is the backstop that turns a silent exposure into an alert. Native examples include AWS Security Hub, Microsoft Defender for Cloud, and GCP Security Command Center; many organizations also run a third-party CSPM that spans all three clouds with one view. For Meridian, CSPM is the difference between discovering a public bucket via a CSPM alert in minutes and discovering it via a journalist's email in months.
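At its core, the "automated auditor" is a loop: every check runs against every resource, and each failure becomes a finding. The toy version below assumes a hypothetical, pre-collected inventory of plain dicts and three illustrative checks. A real CSPM gathers the inventory through read-only cloud APIs and maps each finding to a benchmark control rather than to the made-up labels used here.

```python
# Toy posture scan: every check runs against every resource in an inventory.
# The inventory format, resource names, and finding labels are hypothetical.


def public_bucket(r: dict) -> bool:
    return r.get("type") == "bucket" and r.get("public", False)


def unencrypted_database(r: dict) -> bool:
    return r.get("type") == "database" and not r.get("encrypted", False)


def user_without_mfa(r: dict) -> bool:
    return r.get("type") == "user" and not r.get("mfa", False)


CHECKS = {
    "PUBLIC_BUCKET": public_bucket,
    "UNENCRYPTED_DATABASE": unencrypted_database,
    "USER_WITHOUT_MFA": user_without_mfa,
}


def scan(inventory: list) -> list:
    """Return (finding, resource id) for every check a resource fails."""
    return [(name, r["id"])
            for r in inventory
            for name, check in CHECKS.items()
            if check(r)]


if __name__ == "__main__":
    inventory = [
        {"id": "meridian-loan-docs", "type": "bucket", "public": True},
        {"id": "ledger-db", "type": "database", "encrypted": True},
        {"id": "contractor-batch", "type": "user", "mfa": False},
    ]
    for finding in scan(inventory):
        print(finding)
    # -> ('PUBLIC_BUCKET', 'meridian-loan-docs')
    # -> ('USER_WITHOUT_MFA', 'contractor-batch')
```

Adding a new check is one function and one dict entry, which is the property that lets posture tooling grow to hundreds of rules.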

Cloud workload protection (CWPP) secures the workloads themselves — the virtual machines, containers, and serverless functions where your code actually runs — rather than the account's configuration. Where CSPM asks "is this resource configured securely?", CWPP asks "is this running workload compromised or vulnerable?" CWPP covers vulnerability scanning of running instances and container images, runtime threat detection (is this process behaving maliciously?), file-integrity monitoring, and malware detection — much of it the same endpoint and detection logic you met for servers in Chapter 11, adapted to ephemeral cloud workloads that may exist for only minutes. The two are complementary: CSPM secures the configuration (the always-yours layers from §15.1's diagram), CWPP secures the workload (the OS, runtime, and application in IaaS/PaaS). A mature cloud security program runs both.
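File-integrity monitoring, one of the CWPP capabilities listed above, rests on a simple idea: record a cryptographic hash of every file you care about, then compare later hashes with that baseline. This minimal sketch uses SHA-256 from Python's hashlib and, to stay self-contained, takes file contents as an in-memory dict of path to bytes. That input format is an assumption; a real agent reads the disk and protects its baseline from tampering.

```python
import hashlib


def snapshot(files: dict) -> dict:
    """Map each path to the SHA-256 hex digest of its contents (bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare(baseline: dict, current: dict) -> dict:
    """Report paths added, removed, or modified since the baseline snapshot."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in baseline.keys() & current.keys()
                           if baseline[p] != current[p]),
    }


if __name__ == "__main__":
    # Hypothetical file contents standing in for what an agent would read from disk.
    baseline = snapshot({"/usr/bin/app": b"release-1.4", "/etc/app.conf": b"debug=false"})
    current = snapshot({"/usr/bin/app": b"release-1.4-patched-by-attacker",
                        "/etc/app.conf": b"debug=false",
                        "/tmp/.x": b"dropper"})
    print(compare(baseline, current))
    # -> {'added': ['/tmp/.x'], 'removed': [], 'modified': ['/usr/bin/app']}
```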

Guardrails are the preventive complement to CSPM's detection, and they are the most powerful idea in this section. A guardrail is a preventive control that makes an insecure configuration structurally impossible or automatically rejected, rather than merely flagged after the fact. The distinction between a guardrail and a gate matters: a gate stops a deployment and waits for a human (which slows everyone down and gets bypassed under pressure), while a guardrail lets engineers move freely within a safe boundary and only blocks the specifically dangerous action. Examples make it concrete:

AWS S3 Block Public Access (account-wide) is a guardrail: it makes "public bucket" impossible no matter what any engineer configures.
Service control policies (SCPs) in AWS Organizations, and Azure Policy / GCP Organization Policy, are guardrails: an organization-wide rule can deny the action of disabling logging, or opening a security group to 0.0.0.0/0, or creating resources outside an approved region — for every account, regardless of the user's IAM permissions. Even an account administrator cannot override a guardrail set above them.
Policy-as-code on infrastructure-as-code (tools like Open Policy Agent, or AWS/Azure/GCP native policy engines) evaluates a proposed change before it is applied and rejects, say, any Terraform plan that would create a public bucket or a wide-open security group. This catches the misconfiguration before it ever exists — the cheapest possible place to catch it, a theme Chapter 31's DevSecOps work develops fully.
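The policy-as-code idea in the last item can be sketched in a few lines. The function below assumes a plan dict shaped like a simplified version of the JSON that terraform show -json produces (a resource_changes list whose entries carry type, address, and change.after), and it inspects two resource types, aws_s3_bucket_acl and aws_security_group_rule. The rule set is an illustrative assumption, not a complete policy; real deployments usually express such rules in a dedicated engine such as Open Policy Agent rather than in ad hoc Python.

```python
# Minimal policy-as-code gate over a Terraform-style JSON plan. The plan shape is a
# simplified assumption modeled on `terraform show -json`; the rules are illustrative.

PUBLIC_ACLS = {"public-read", "public-read-write", "authenticated-read"}
SENSITIVE_PORTS = {22, 3306, 3389}


def violations(plan: dict) -> list:
    """Return one human-readable reason per planned change the policy rejects."""
    reasons = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}  # None for deletions
        addr = rc.get("address", "?")
        kind = rc.get("type")
        if kind == "aws_s3_bucket_acl" and after.get("acl") in PUBLIC_ACLS:
            reasons.append(f"{addr}: canned ACL '{after['acl']}' would make the bucket public")
        if (kind == "aws_security_group_rule" and after.get("type") == "ingress"
                and "0.0.0.0/0" in (after.get("cidr_blocks") or [])):
            lo, hi = after.get("from_port", 0), after.get("to_port", 65535)
            exposed = sorted(p for p in SENSITIVE_PORTS if lo <= p <= hi)
            if exposed:
                reasons.append(f"{addr}: ports {exposed} open to 0.0.0.0/0")
    return reasons


if __name__ == "__main__":
    plan = {"resource_changes": [
        {"address": "aws_s3_bucket_acl.docs", "type": "aws_s3_bucket_acl",
         "change": {"after": {"acl": "public-read"}}},
        {"address": "aws_security_group_rule.db", "type": "aws_security_group_rule",
         "change": {"after": {"type": "ingress", "from_port": 3306, "to_port": 3306,
                              "cidr_blocks": ["10.20.0.0/24"]}}},
    ]}
    problems = violations(plan)
    for p in problems:
        print("REJECT", p)
    # -> REJECT aws_s3_bucket_acl.docs: canned ACL 'public-read' would make the bucket public
    print("exit code:", 1 if problems else 0)  # a CI job would fail on a non-zero code
    # -> exit code: 1
```

Only the dangerous change is rejected; the database rule scoped to the app subnet passes untouched, which is the guardrail behavior the text contrasts with a gate.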

The hierarchy to remember: prevent with guardrails, detect with CSPM, protect workloads with CWPP. Guardrails stop the dangerous configurations you can anticipate; CSPM catches the ones that slip through or that no guardrail covers; CWPP defends the running workload against the threats that are not configuration at all. Each assumes the layer before it can fail — defense in depth, applied to the cloud control plane.

🛡️ Defender's Lens: The reason guardrails beat gates in the cloud is the same asymmetry from Chapter 1: engineers create cloud resources constantly, and you cannot have a human review every one without becoming the bottleneck everyone routes around. A guardrail flips the model — instead of a human approving every safe action, automation blocks only the unsafe ones, so engineers move fast by default and the dangerous state is simply unreachable. You make the secure path the easy path. That is how security scales in an environment where infrastructure is created by API call thousands of times a day.

⚠️ Common Pitfall: Buying a CSPM tool and treating the dashboard as the work. A CSPM that produces 4,000 findings nobody triages is the cloud version of the unwatched SIEM from Chapter 1 — technology with no process and no people. The value is not the scan; it is (1) prioritizing findings by real risk (a public bucket with customer data outranks an unencrypted log bucket), (2) fixing the systemic causes with guardrails so the same finding stops recurring, and (3) wiring the highest-severity findings to alerts a human acts on now. A posture tool is a starting point for a process, not a substitute for one.
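Point (1), prioritizing by real risk, can be made concrete with a scoring function. The labels, weights, and threshold below are invented for illustration and are not a standard: each finding's score is a severity for the kind of check multiplied by a weight for the data involved, and anything at or above the threshold goes to a human immediately.

```python
# Illustrative risk ranking for posture findings. The check labels, weights,
# and paging threshold are assumptions to tune, not an industry standard.

SEVERITY = {"PUBLIC_BUCKET": 10, "LOGGING_DISABLED": 9,
            "WORLD_OPEN_ADMIN_PORT": 8, "UNENCRYPTED_STORAGE": 4}
DATA_WEIGHT = {"customer": 3, "internal": 2, "logs": 1}


def risk(finding: dict) -> int:
    """Severity of the check times the sensitivity of the data involved."""
    return SEVERITY.get(finding["check"], 1) * DATA_WEIGHT.get(finding.get("data"), 1)


def triage(findings: list, page_threshold: int = 20):
    """Return (all findings ranked by risk, the subset to send to a human now)."""
    ranked = sorted(findings, key=risk, reverse=True)
    return ranked, [f for f in ranked if risk(f) >= page_threshold]


if __name__ == "__main__":
    findings = [
        {"id": "app-logs", "check": "UNENCRYPTED_STORAGE", "data": "logs"},          # 4 x 1 = 4
        {"id": "meridian-loan-docs", "check": "PUBLIC_BUCKET", "data": "customer"},  # 10 x 3 = 30
        {"id": "sg-web", "check": "WORLD_OPEN_ADMIN_PORT", "data": "internal"},      # 8 x 2 = 16
    ]
    ranked, page_now = triage(findings)
    print([f["id"] for f in ranked])    # -> ['meridian-loan-docs', 'sg-web', 'app-logs']
    print([f["id"] for f in page_now])  # -> ['meridian-loan-docs']
```

The public bucket holding customer data outranks the unencrypted log bucket, matching the example in the pitfall; the exact numbers matter less than writing the priorities down where a machine can apply them.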

🔄 Check Your Understanding: 1. Distinguish CSPM from CWPP in one sentence each — what does each secure? 2. What is the difference between a guardrail and a gate, and why do guardrails scale better in a cloud environment where engineers provision infrastructure constantly?

Answers

1. CSPM continuously scans the cloud account's configuration for misconfigurations and compliance violations (e.g., public buckets, over-broad IAM); CWPP secures the running workloads themselves (VMs, containers, functions) against vulnerabilities and runtime threats. 2. A gate stops a deployment and waits for a human decision (a bottleneck that gets bypassed under pressure), while a guardrail makes the dangerous action automatically impossible or rejected while letting all safe actions proceed freely; guardrails scale because they do not require a human in the loop for every one of the thousands of daily resource changes — engineers move fast by default and only the specifically unsafe action is blocked.

15.6 Cloud logging and detection

Everything on your side of the shared-responsibility line generates evidence — if you turn on the logging. This is the cloud's quiet gift to defenders: where an on-premises environment requires you to instrument hosts and networks yourself (Chapters 10 and 21), the cloud control plane logs every API call natively, giving you a near-complete record of who did what, when, from where, and whether it succeeded. The catch — and it is a real one — is that this logging is often not on by default at the depth you need, and an account with logging disabled is an account where a breach leaves no trace. The first thing you do in any cloud account you are responsible for defending is confirm the audit log is on, comprehensive, and protected.
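Confirming that the audit log is on can be automated too. This sketch assumes two parsed inputs: one trail entry from CloudTrail's DescribeTrails response (fields Name, IsMultiRegionTrail, LogFileValidationEnabled) and the matching GetTrailStatus response (field IsLogging). The function name is hypothetical; it reports the three gaps this section cares about.

```python
# Hypothetical audit of one CloudTrail trail. `trail` is assumed to be one entry of
# a parsed DescribeTrails "trailList"; `status` a parsed GetTrailStatus response.


def trail_problems(trail: dict, status: dict) -> list:
    """List the gaps that would leave a breach under-recorded or deniable."""
    name = trail.get("Name", "?")
    problems = []
    if not status.get("IsLogging", False):
        problems.append(f"{name}: not currently logging")
    if not trail.get("IsMultiRegionTrail", False):
        problems.append(f"{name}: records only its home region")
    if not trail.get("LogFileValidationEnabled", False):
        problems.append(f"{name}: log file integrity validation is off")
    return problems


if __name__ == "__main__":
    org_trail = {"Name": "org-trail", "IsMultiRegionTrail": True,
                 "LogFileValidationEnabled": True}
    legacy = {"Name": "legacy-trail", "IsMultiRegionTrail": False}
    print(trail_problems(org_trail, {"IsLogging": True}))  # -> []
    for p in trail_problems(legacy, {"IsLogging": False}):
        print(p)
    # -> legacy-trail: not currently logging
    # -> legacy-trail: records only its home region
    # -> legacy-trail: log file integrity validation is off
```

An account whose trail list is empty never reaches this function, and that absence is itself the most serious finding of all.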

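On AWS, that log is CloudTrail — the service that records the API calls made in an AWS account (who, what, when, from which IP, with which credentials, and the result). Management events, such as configuration changes, are recorded by default; data events, such as reads of individual S3 objects, are recorded only if you enable them; and log file integrity validation, when enabled, makes the delivered log files tamper-evident. Azure provides Activity Log and Entra ID sign-in/audit logs; GCP provides Cloud Audit Logs. They are conceptually identical: a chronological record of every control-plane action. A single CloudTrail event looks (simplified) like this: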

{
  "eventTime": "2026-06-14T14:22:07Z",
  "eventName": "PutBucketAcl",
  "eventSource": "s3.amazonaws.com",
  "userIdentity": {
    "type": "IAMUser",
    "userName": "contractor-batch",
    "accountId": "123456789012"
  },
  "sourceIPAddress": "203.0.113.77",
  "requestParameters": {
    "bucketName": "meridian-loan-docs",
    "x-amz-acl": "public-read"
  },
  "responseElements": null,
  "errorCode": null
}

Read it like a defender. The identity contractor-batch just called PutBucketAcl on meridian-loan-docs with x-amz-acl: public-read — this is the exact moment a bucket of loan documents became readable by the entire internet, captured with the who (contractor-batch), the what (PutBucketAcl ... public-read), the when (14:22:07Z), and the from-where (203.0.113.77). With CloudTrail flowing into a SIEM (Chapter 21) or a cloud-native detection service, a rule matching "PutBucketAcl or PutBucketPolicy that results in public access" fires an alert within minutes of the bucket turning public — converting the eighteen-month silent exposure into a five-minute incident. The same log captures the IAM policy change that grants *, the security-group rule that opens 0.0.0.0/0, the login from an unfamiliar country, the disabling of logging itself, the creation of a new admin user. The events that precede every cloud breach are in the log. Detection is a matter of collecting it and writing the rules.
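The rule described above can be sketched as a small matcher over the parsed event. It assumes events shaped like the simplified example (a PutBucketAcl call carrying a canned ACL in requestParameters["x-amz-acl"]); a bucket made public through a full access-control list or through PutBucketPolicy would need the ACL or policy body inspected instead. Because the exact representation of header values in real events may vary, the sketch accepts either a string or a list of strings. The function names are hypothetical.

```python
# Hypothetical detection for the event above: a successful PutBucketAcl that
# applies a public canned ACL. Event shape assumed to match the simplified example.

PUBLIC_CANNED_ACLS = {"public-read", "public-read-write", "authenticated-read"}


def _acl_values(event: dict) -> list:
    """Normalize the canned-ACL header to a list of strings."""
    acl = (event.get("requestParameters") or {}).get("x-amz-acl")
    return [acl] if isinstance(acl, str) else list(acl or [])


def bucket_made_public(event: dict) -> bool:
    """True for a successful PutBucketAcl that applies a public canned ACL."""
    if event.get("eventName") != "PutBucketAcl" or event.get("errorCode"):
        return False
    return any(v in PUBLIC_CANNED_ACLS for v in _acl_values(event))


def alert_summary(event: dict) -> str:
    """One-line who / what / when / where summary for the SOC queue."""
    who = (event.get("userIdentity") or {}).get("userName", "unknown")
    bucket = (event.get("requestParameters") or {}).get("bucketName", "?")
    acl = ",".join(_acl_values(event))
    return (f"{event.get('eventTime')} {who} set {acl} on {bucket} "
            f"from {event.get('sourceIPAddress')}")


if __name__ == "__main__":
    event = {
        "eventTime": "2026-06-14T14:22:07Z",
        "eventName": "PutBucketAcl",
        "userIdentity": {"type": "IAMUser", "userName": "contractor-batch"},
        "sourceIPAddress": "203.0.113.77",
        "requestParameters": {"bucketName": "meridian-loan-docs",
                              "x-amz-acl": "public-read"},
        "errorCode": None,
    }
    if bucket_made_public(event):
        print("ALERT:", alert_summary(event))
    # -> ALERT: 2026-06-14T14:22:07Z contractor-batch set public-read on meridian-loan-docs from 203.0.113.77
```

Checking errorCode matters: a denied attempt to make a bucket public is worth investigating too, but it did not expose anything, so it belongs in a different, lower-priority rule.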

What to actually watch for — the cloud detections every SOC should have:

Detection	Why it matters	Key event(s)
A bucket becomes public	The #1 cloud data-exposure vector	`PutBucketAcl`, `PutBucketPolicy` (public)
Security group opened to `0.0.0.0/0`	Exposes a service to the whole internet	`AuthorizeSecurityGroupIngress`
IAM policy granting `*` created/attached	Privilege escalation; over-broad access	`PutUserPolicy`, `PutRolePolicy`, `AttachUserPolicy`, `AttachRolePolicy`, `CreatePolicyVersion`
Logging disabled	Attacker covering tracks — a top-priority alert	`StopLogging`, `DeleteTrail`
Root account used	Root should almost never act after setup	`userIdentity.type = "Root"`
New access key created for a user	Possible persistence by an attacker	`CreateAccessKey`
Login from a new region / impossible travel	Possible credential compromise	`ConsoleLogin` events, geolocated from `sourceIPAddress`
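Most rows of the table reduce to matching eventName against a set of names, plus one field check for root. The toy rule table below does exactly that, assuming parsed CloudTrail events as input; the rule labels are hypothetical. Two rows need more than name matching: "policy granting *" and "public bucket" require inspecting the request body (in the way the chapter's iam_overbroad function inspects a policy), and impossible travel needs geolocation and per-user history, so it is left out here.

```python
# Toy detection rules for the table above, evaluated against parsed CloudTrail
# events. Event names come from the table; the rule labels are hypothetical.

RULES = {
    "logging disabled (top priority)": {"StopLogging", "DeleteTrail"},
    "bucket ACL/policy changed (inspect for public access)": {"PutBucketAcl", "PutBucketPolicy"},
    "security group ingress added (inspect for 0.0.0.0/0)": {"AuthorizeSecurityGroupIngress"},
    "IAM policy changed (inspect for *)": {"PutUserPolicy", "AttachUserPolicy",
                                           "CreatePolicyVersion"},
    "new access key (possible persistence)": {"CreateAccessKey"},
}


def match(event: dict) -> list:
    """Return the label of every rule the event triggers."""
    name = event.get("eventName")
    hits = [label for label, names in RULES.items() if name in names]
    if (event.get("userIdentity") or {}).get("type") == "Root":
        hits.append("root account used")
    return hits


if __name__ == "__main__":
    print(match({"eventName": "StopLogging", "userIdentity": {"type": "IAMUser"}}))
    # -> ['logging disabled (top priority)']
    print(match({"eventName": "CreateAccessKey", "userIdentity": {"type": "Root"}}))
    # -> ['new access key (possible persistence)', 'root account used']
```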

A few of these deserve emphasis. Disabling logging (StopLogging, DeleteTrail) is one of the highest-fidelity alerts you can have, because there is almost no legitimate reason to turn off the audit log, and an attacker who has gained access frequently tries to — so an alert on it catches an intrusion in progress. Root-account usage should be near-zero after initial setup (you operate through least-privilege IAM roles, not root), so any root action is worth a look. And to make the log trustworthy as evidence, you must protect the log itself: deliver CloudTrail to a separate, locked-down account or bucket with object-lock so that an attacker who compromises the main account cannot delete the trail of what they did. A log an attacker can erase is not evidence; it is theater.

This is the same conviction you have carried since Chapter 1, that logs are the ground truth, applied to a place where the logging is unusually complete. It is also where the theme that security is a process, not a product, surfaces in the cloud: a logging service you never turn on, never make comprehensive, and never protect from deletion is no security at all. The cloud control plane records the attacker's actions as API calls. On premises, there is no record of "someone changed a firewall rule" unless you build one; in the cloud, that change is a single CloudTrail event. The mechanics of collecting and correlating these logs at scale are Chapter 21's SIEM material; here, the point is simpler and prior: turn the logging on, make it comprehensive, protect it from deletion, and you have given yourself the evidence every cloud investigation depends on.

🔄 Check Your Understanding: 1. Why is an alert on StopLogging / DeleteTrail one of the highest-value cloud detections, despite being one of the simplest? 2. You see a CloudTrail event: PutBucketAcl on a customer-data bucket with x-amz-acl: public-read, made by an unfamiliar IAM user from an unusual IP. List the three things this single event tells you and the immediate action.

Answers

1. Because there is almost no legitimate reason to disable the audit log, so the event is a high-fidelity indicator that an attacker who has gained access is trying to cover their tracks — it catches an intrusion in progress, and the alert is meaningful precisely because false positives are rare. 2. It tells you (a) what happened — a bucket just became publicly readable; (b) who/what did it — the named IAM user; and (c) from where — the source IP, which is unfamiliar, suggesting possible credential compromise. Immediate action: revert the bucket to private (or rely on Block Public Access if enabled), then investigate/disable the credential and scope what was exposed.

Project Checkpoint

This chapter adds Meridian's cloud security baseline to the program document and the cloudpost.py module to bluekit. As always, the code is illustrative and never executed during authoring — every example shows hand-traced expected output in a comment.

Program increment — the cloud security baseline. Before the review (Case Study 1), Meridian's AWS footprint had grown organically: teams created accounts and buckets ad hoc, IAM policies accreted wildcards, and nobody could answer "is anything public?" Sam Whitfield, the security engineer, drafts a one-page cloud security baseline that becomes Meridian's standard for every AWS account. Its core requirements: (1) S3 Block Public Access enabled account-wide — no public buckets, as a guardrail, not a guideline; (2) CloudTrail enabled in all regions, delivered to a locked-down logging account with object-lock; (3) least-privilege IAM — no "Action": "*" policies, MFA required for all human users, long-lived access keys forbidden in favor of roles; (4) no security-group rule may open a sensitive port to 0.0.0.0/0, enforced by a service control policy; (5) encryption at rest enabled by default on storage and databases; (6) a CSPM tool scanning all accounts continuously against the CIS AWS Benchmark, with public-exposure and logging-disabled findings wired to the SOC. This baseline plugs into the program's network and identity standards (Chapters 6–7, 16–18) and is the artifact Meridian presents to its PCI-DSS assessor to evidence the customer side of the shared-responsibility line.

bluekit increment — cloudpost.py. Two functions encode the two most important cloud checks of the chapter: is a bucket public, and is an IAM policy over-broad? They are deliberately small — the value is that a defender can run them across every bucket and policy in an account in seconds.

# bluekit/cloudpost.py  — Chapter 15 increment
"""Cloud posture checks: the two findings that cause most cloud breaches.

s3_public(acl)      -> True if a bucket ACL grants public access.
iam_overbroad(policy) -> True if an IAM policy grants wildcard action/resource.
Inputs are parsed dicts (e.g., from boto3); we never call a live cloud here.
"""

PUBLIC_GRANTEES = {"AllUsers", "AuthenticatedUsers"}  # AWS "everyone" / "any AWS account"

def s3_public(acl: dict) -> bool:
    """Return True if the bucket ACL grants any permission to a public grantee."""
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {}).get("URI", "")
        # AWS encodes public grantees as .../groups/global/AllUsers etc.
        if any(g in grantee for g in PUBLIC_GRANTEES):
            return True
    return False


def iam_overbroad(policy: dict) -> bool:
    """Return True if any Allow statement uses a wildcard action AND resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):       # a single-statement policy is a dict
        statements = [statements]
    for st in statements:
        if st.get("Effect") != "Allow":
            continue
        actions = st.get("Action", [])
        resources = st.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions and "*" in resources:
            return True
    return False


if __name__ == "__main__":
    public_acl = {"Grants": [
        {"Grantee": {"URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"}]}
    private_acl = {"Grants": [
        {"Grantee": {"ID": "ownerid"}, "Permission": "FULL_CONTROL"}]}
    admin_policy = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
    scoped_policy = {"Statement": [{"Effect": "Allow",
        "Action": ["s3:GetObject"], "Resource": ["arn:aws:s3:::meridian-loan-docs/*"]}]}

    print("public bucket ACL   ->", s3_public(public_acl))
    print("private bucket ACL  ->", s3_public(private_acl))
    print("admin (*) policy    ->", iam_overbroad(admin_policy))
    print("scoped policy       ->", iam_overbroad(scoped_policy))

# Expected output:
# public bucket ACL   -> True
# private bucket ACL  -> False
# admin (*) policy    -> True
# scoped policy       -> False

Trace it by hand to be sure. For public_acl, the single grant's grantee URI contains AllUsers, so s3_public returns True. For private_acl, the grantee has no URI (only an ID), the .get("URI", "") yields "", no public grantee matches, so it returns False. For admin_policy, the lone Allow statement has Action and Resource both equal to the string "*", each is wrapped into a one-element list, both contain "*", so iam_overbroad returns True. For scoped_policy, the action is s3:GetObject and the resource is a specific ARN — neither list contains "*" — so it returns False. Two tiny functions, run across every bucket and policy in an account, find the two misconfigurations behind most cloud breaches. Note one limit of s3_public: it reads only the bucket ACL, and a bucket policy that allows a wildcard principal can make a bucket just as public, so a complete check inspects both. You have written the cloud-posture core of Meridian's defense.

Summary

This chapter relocated security from a place you own entirely to a place you share — and showed where your half of the work actually is.

The shared responsibility model divides security between provider and customer. The provider is responsible for security of the cloud (hardware, hypervisor, facilities); you are responsible for security in the cloud (configuration, data, identity). Misreading the line — especially treating a shared duty like encryption as the provider's — is the root governance error.
IaaS → PaaS → SaaS is a ladder: as you move up, the provider takes over more layers (OS, runtime, the application itself) and your share shrinks — but data and identity are always yours, in every model, which is why most cloud breaches live in exactly those two layers.
Identity is the new perimeter. Every cloud action is an authenticated API call evaluated by cloud IAM, not gated by network location. Design every policy with least privilege, assuming the credential will leak; prefer roles and short-lived credentials over long-lived access keys; require MFA via policy conditions. The "Action": "*" / "Resource": "*" policy is the canonical, dangerous anti-pattern.
The misconfiguration epidemic is the leading cause of cloud breaches: public object storage (an ACL granting AllUsers READ), security groups open to 0.0.0.0/0 on sensitive ports, and exposed instance metadata (169.254.169.254) that turns an SSRF bug plus an over-broad role into total compromise. Prevent each with a guardrail, default to secure, and alert on the change.
CSPM detects misconfigurations continuously (the always-yours configuration layers); CWPP protects running workloads (VMs, containers, serverless); guardrails prevent dangerous configurations structurally. The hierarchy: prevent with guardrails, detect with CSPM, protect workloads with CWPP — defense in depth on the control plane. Guardrails beat gates because they let engineers move fast while making the unsafe action unreachable.
Cloud logging (AWS CloudTrail; Azure Activity/Entra logs; GCP Cloud Audit Logs) records every API call — the cloud's gift to defenders. Turn it on in all regions, make it comprehensive, and protect it from deletion. The events that precede every cloud breach (a bucket turning public, a * policy, a 0.0.0.0/0 rule, logging being disabled, root usage) are in the log; detection is collecting it and writing the rules.

Spaced Review

Test yourself on earlier material without scrolling back — these revisit Chapters 11 and 5.

(Ch. 11) In an IaaS deployment, you launch a virtual machine and never apply OS patches. Under the shared-responsibility model, whose problem is the unpatched OS, and which Chapter 11 concept (the standard you compare a system against) tells you what "patched and hardened" should mean for it?
(Ch. 11) Name two host-hardening practices from Chapter 11 that apply unchanged to a cloud IaaS instance, and one cloud-specific control from this chapter that has no on-premises equivalent.
(Ch. 5) This chapter said encryption at rest is a shared responsibility — the provider supplies the capability, you must enable it. Recall from Chapter 5 the distinction between data at rest and in transit, and name which one TLS to an S3 bucket protects.

Answers

1. The unpatched OS is entirely the *customer's* problem — in IaaS the operating system is on your side of the line. The Chapter 11 concept is the **baseline configuration / CIS Benchmark**: it defines the secure target state (services disabled, patches applied, configuration hardened) you audit the instance against — the same `audit_baseline` logic, now on a rented instance. 2. Examples: applying CIS-benchmark hardening, disabling unneeded services, patch management, host-based firewall, EDR/endpoint protection — any of these transfers unchanged. A cloud-specific control with no on-prem equivalent: S3 Block Public Access (a guardrail against public object storage), or a service control policy denying `0.0.0.0/0` security-group rules, or instance-metadata hardening (IMDSv2). 3. *At rest* is data stored on disk; *in transit* is data moving over a network. TLS to an S3 bucket protects data **in transit** (the connection between client and S3); encryption at rest is the separate setting that protects the stored objects on disk.

What's Next

You have seen that in the cloud, identity is the perimeter and a leaked credential is a key to the front door — and you have met the problem of non-human identities (the roles and service accounts that make most cloud API calls) without yet solving it. Chapter 20 takes that on directly: how to manage secrets and machine identity — the service accounts, API keys, and certificates that authenticate workloads to each other — including how to find the long-lived access key someone checked into a repository, the exact finding that turns the cloud IAM of this chapter into a breach. Before that, Part IV builds the human side of identity that the cloud assumes: authentication, authorization, and governance. And the guardrails-versus-gates idea you met in §15.5 returns in force in Chapter 31, where security moves left into the CI/CD pipeline, and in Chapter 32, where the "identity is the perimeter" conviction becomes a full zero-trust architecture. The cloud did not just move your servers; it moved your security boundary onto identity and configuration — and the rest of the book is largely about defending that boundary well.