Agent Runbooks and Incident Response
Status: public · Confidence: medium (0.83) · Basis: verified_sources
## TL;DR

Runbooks and incident response guides are high-frequency agent sources because they define what to inspect, who owns the response, and which actions are safe during an outage.

## Core Explanation

An agent handling an operational incident should not improvise from logs alone. Runbooks encode known checks, escalation paths, rollback steps, and service-specific constraints. Incident response guidance adds structure around roles, communication, impact assessment, and post-incident learning.

The important engineering boundary is authority. A runbook can recommend an action, but the agent should verify current service state, confirm permissions, preserve evidence, and avoid destructive remediation without approval.

## Source-Mapped Facts

- Google SRE documentation says clear role assignment during incidents helps responders avoid duplicated work and missed responsibilities. ([source](https://sre.google/sre-book/managing-incidents/))
- NIST Special Publication 800-61 Revision 2 provides guidance for computer security incident handling. ([source](https://csrc.nist.gov/pubs/sp/800/61/r2/final))
- Atlassian incident response documentation describes incident response as a process for identifying, investigating, and resolving incidents. ([source](https://www.atlassian.com/incident-management/incident-response))

## Further Reading

- [Google SRE Managing Incidents](https://sre.google/sre-book/managing-incidents/)
- [NIST Computer Security Incident Handling Guide](https://csrc.nist.gov/pubs/sp/800/61/r2/final)
- [Atlassian Incident Response](https://www.atlassian.com/incident-management/incident-response)
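
## Example Sketch: Authority Gate

The authority boundary described under Core Explanation can be illustrated with a small gate that an agent might run before acting on a runbook step. This is a minimal sketch under stated assumptions, not an implementation from any of the cited sources: `RunbookStep`, `IncidentContext`, `Decision`, and `gate` are hypothetical names, the permission string format is invented for illustration, and requiring preserved evidence only before destructive steps is one possible reading of "preserve evidence".

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    ESCALATE = "escalate"
    SKIP = "skip"


@dataclass(frozen=True)
class RunbookStep:
    # Hypothetical shape of one runbook entry; not taken from any real tool.
    name: str
    expected_state: str        # service state the step was written for
    required_permission: str   # permission the agent must hold to run it
    destructive: bool          # True if the step changes or deletes state


@dataclass
class IncidentContext:
    observed_state: str                   # state the agent has just verified
    granted_permissions: frozenset        # permissions the agent actually holds
    evidence_preserved: bool = False      # e.g. logs or snapshots captured
    approved_steps: set = field(default_factory=set)  # steps a human signed off


def gate(step: RunbookStep, ctx: IncidentContext) -> tuple[Decision, str]:
    """Decide whether a recommended runbook step may run right now."""
    # 1. Verify current service state against the runbook's precondition.
    if ctx.observed_state != step.expected_state:
        return Decision.SKIP, "observed state does not match runbook precondition"
    # 2. Confirm permissions.
    if step.required_permission not in ctx.granted_permissions:
        return Decision.ESCALATE, "agent lacks required permission"
    # 3. Preserve evidence before anything destructive (assumed policy).
    if step.destructive and not ctx.evidence_preserved:
        return Decision.ESCALATE, "evidence must be preserved first"
    # 4. Destructive remediation needs explicit approval.
    if step.destructive and step.name not in ctx.approved_steps:
        return Decision.ESCALATE, "destructive step needs human approval"
    return Decision.EXECUTE, "all checks passed"


# Usage: a destructive restart is held back until evidence and approval exist.
restart = RunbookStep("restart-worker", "degraded", "service:restart", destructive=True)
ctx = IncidentContext(
    observed_state="degraded",
    granted_permissions=frozenset({"service:restart"}),
)
decision, reason = gate(restart, ctx)
print(decision.value, "-", reason)

ctx.evidence_preserved = True
ctx.approved_steps.add("restart-worker")
decision_after, reason_after = gate(restart, ctx)
print(decision_after.value, "-", reason_after)
```

The checks run in the same order as the prose: state first, then permissions, then evidence, then approval. Returning a reason alongside the decision keeps the escalation message explicit for the human responder.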