Agent Error Budget Burn Rate Alerts

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Burn-rate alerts tell agents whether a reliability problem is consuming the service's error budget quickly enough to justify escalation.

## Core Explanation

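An error budget turns an SLO into a finite allowance of failure: a 99.9% success SLO permits 0.1% of requests in the SLO period to fail. Burn rate measures how quickly that allowance is being spent, relative to the pace that would exhaust it exactly at the end of the period. A burn rate of 1 is sustainable, while a burn rate of 10 would exhaust a 30-day budget in 3 days. For agents, this signal is more actionable than a raw error count because it ties operational urgency to the SLO.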
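As a minimal sketch, burn rate over one measurement window is the observed error rate divided by the error rate the SLO allows. The function name `burn_rate` and the event-count inputs are illustrative assumptions. Real monitoring systems compute this from their own metrics.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being spent over one window.

    Hypothetical helper: 1.0 means the budget would be exhausted exactly at
    the end of the SLO period if this error rate persisted; 2.0 means twice
    as fast.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0  # no traffic in the window, so no budget was spent
    error_rate = bad_events / total_events
    error_budget = 1.0 - slo_target  # allowed fraction of bad events
    return error_rate / error_budget


# 50 failures out of 10,000 requests against a 99.9% SLO:
# approximately 5.0, i.e. budget is being spent five times faster than sustainable.
print(burn_rate(50, 10_000, 0.999))
```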

An agent that sees a high burn rate should prefer low-risk mitigations, explicit escalation, and rollback-friendly changes. A low burn rate does not prove a system is healthy, but it can keep automation from overreacting to isolated noise.
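One way an agent might map burn rates to the behavior described above is sketched below. It assumes burn rates have already been computed per window and keyed by labels such as `"1h"` and `"5m"`. The thresholds (14.4, 6, and 1) follow the multiwindow example in the Google SRE Workbook for a 30-day SLO. The `Action` names and `choose_action` function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action labels for an automation agent; not from any library.
    ESCALATE_AND_MITIGATE = "escalate_and_mitigate"  # page a human, prefer rollback
    OPEN_TICKET = "open_ticket"                      # slow burn: track, do not page
    OBSERVE = "observe"                              # likely noise: take no action


# (long window, short window, burn-rate threshold, action), most urgent first.
# Modeled on the SRE Workbook's multiwindow example for a 30-day SLO.
RULES = [
    ("1h", "5m", 14.4, Action.ESCALATE_AND_MITIGATE),
    ("6h", "30m", 6.0, Action.ESCALATE_AND_MITIGATE),
    ("3d", "6h", 1.0, Action.OPEN_TICKET),
]


def choose_action(burn_rates: dict[str, float]) -> Action:
    """Return the first rule's action whose long AND short windows both exceed its threshold."""
    for long_w, short_w, threshold, action in RULES:
        long_hot = burn_rates.get(long_w, 0.0) > threshold
        short_hot = burn_rates.get(short_w, 0.0) > threshold
        if long_hot and short_hot:
            return action
    return Action.OBSERVE


print(choose_action({"1h": 20.0, "5m": 18.0}))  # fast burn still in progress
print(choose_action({"1h": 20.0, "5m": 0.5}))   # burn has already stopped
```

Requiring both a long and a short window to exceed the threshold is the design choice that matters here. The long window shows that enough budget was actually spent, while the short window confirms the problem is still happening. Together they keep the agent from escalating over an incident that has already recovered.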

## Source-Mapped Facts

- The Google SRE Workbook describes burn rate as how fast a service consumes its error budget relative to its SLO. ([source](https://sre.google/workbook/alerting-on-slos/))
- Google Cloud documentation describes burn-rate alerting as alerting when an SLO's error budget is consumed too quickly. ([source](https://docs.cloud.google.com/stackdriver/docs/solutions/slo-monitoring/alerting-on-budget-burn-rate))
- Grafana Cloud SLO documentation describes burn-rate notifications as alerts based on how fast error budget is consumed. ([source](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/set-up/configure-burn-rate-notifications/))
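Because burn rate is measured relative to the SLO period, an alert threshold can be derived from how much budget you are willing to lose within an alert window: threshold = budget fraction × (SLO period / alert window). The sketch below assumes a 30-day SLO period and uses a hypothetical helper name. It reproduces the 14.4, 6, and 1 thresholds that appear in the SRE Workbook's examples.

```python
from datetime import timedelta


def burn_rate_threshold(budget_fraction: float, alert_window: timedelta,
                        slo_period: timedelta = timedelta(days=30)) -> float:
    """Hypothetical helper: burn rate at which `budget_fraction` of the
    error budget is consumed within `alert_window`."""
    if not 0.0 < budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be in (0, 1]")
    # timedelta / timedelta yields a float ratio, e.g. 30 days / 1 hour = 720.0
    return budget_fraction * (slo_period / alert_window)


print(burn_rate_threshold(0.02, timedelta(hours=1)))  # 2% of budget in 1 hour, about 14.4
print(burn_rate_threshold(0.05, timedelta(hours=6)))  # 5% of budget in 6 hours, about 6
print(burn_rate_threshold(0.10, timedelta(days=3)))   # 10% of budget in 3 days, about 1
```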

## Further Reading

- [Google SRE Workbook: Alerting on SLOs](https://sre.google/workbook/alerting-on-slos/)
- [Google Cloud: Alerting on Burn Rate](https://docs.cloud.google.com/stackdriver/docs/solutions/slo-monitoring/alerting-on-budget-burn-rate)
- [Grafana Cloud: Burn Rate Notifications](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/set-up/configure-burn-rate-notifications/)