Alert when users experience errors, not when CPU is high (CPU might be high and everything might be fine)
- **Set meaningful thresholds**: Base thresholds on historical data and SLO (Service Level Objective) requirements
- **Avoid alert fatigue**: Too many alerts lead to people ignoring them; ever