Agent API Rate-Limit Headers and Retry-After

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Rate-limit headers and Retry-After tell agents when to slow down, when to retry, and when to plan their remaining calls around a quota.

## Core Explanation

Tool-using agents call APIs repeatedly, so they need to read rate-limit signals instead of blindly retrying. A 429 response, Retry-After value, and provider-specific quota headers can indicate whether the correct behavior is waiting, backing off, reducing concurrency, or using a different endpoint.
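
As a minimal sketch of reading these signals instead of blindly retrying, the Python below turns a response's status and headers into a wait time. RFC 9110 allows Retry-After to carry either a number of seconds or an HTTP date, so both are handled. The function names `parse_retry_after` and `wait_seconds`, the `base` and `cap` parameters, the exponential fallback, and the choice to treat 503 like 429 are assumptions made for illustration, not any provider's documented behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the seconds to wait, or None if the value cannot be parsed."""
    value = value.strip()
    if not value:
        return None
    # Form 1: delay in whole seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def wait_seconds(status: int, headers: Mapping[str, str], attempt: int,
                 base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry is suggested."""
    # Assumption: only 429 and 503 are treated as "slow down" signals here.
    if status not in (429, 503):
        return None
    lowered = {name.lower(): val for name, val in headers.items()}
    hinted = parse_retry_after(lowered.get("retry-after", ""))
    if hinted is not None:
        return hinted  # The server's own hint takes priority.
    # No usable hint: fall back to capped exponential backoff.
    return min(cap, base * (2 ** attempt))
```

An agent loop could call `wait_seconds(response_status, response_headers, attempt)` after each failed request and sleep for the returned value. Preferring the server's hint over a locally computed backoff keeps the agent aligned with whatever window the provider is actually enforcing.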

Agents should record the exact headers and endpoint involved in each rate-limited response. They should not infer global quota state from one failed request, because a provider may apply separate limits per resource or per token, or enforce secondary abuse protection independently of the primary quota.
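
One way to keep those records separate is to key observations by a scope rather than by provider. The sketch below is hypothetical: `Observation`, `RateLimitLog`, and the `(host, resource)` scope key are illustrative names, and using an `x-ratelimit-resource` header as the resource name (falling back to the URL path) is an assumption that fits providers which send such a header.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional, Tuple
from urllib.parse import urlsplit

Scope = Tuple[str, str]  # (host, resource name or URL path)


@dataclass
class Observation:
    method: str
    url: str
    status: int
    headers: Dict[str, str]  # rate-limit headers as received, names lowercased
    observed_at: float


@dataclass
class RateLimitLog:
    by_scope: Dict[Scope, List[Observation]] = field(default_factory=dict)

    def record(self, method: str, url: str, status: int,
               headers: Mapping[str, str],
               observed_at: Optional[float] = None) -> Scope:
        # Keep only the rate-limit signals; header values are stored unmodified.
        kept = {name.lower(): val for name, val in headers.items()
                if name.lower() == "retry-after" or "ratelimit" in name.lower()}
        parts = urlsplit(url)
        # Assumption: a resource header, when present, names the quota bucket.
        scope = (parts.netloc, kept.get("x-ratelimit-resource", parts.path))
        stamp = time.time() if observed_at is None else observed_at
        self.by_scope.setdefault(scope, []).append(
            Observation(method, url, status, kept, stamp))
        return scope

    def last_status(self, scope: Scope) -> Optional[int]:
        """Return the most recent status seen for this scope, if any."""
        history = self.by_scope.get(scope)
        return history[-1].status if history else None
```

With this shape, a 429 recorded for one scope says nothing about another scope on the same host, so the agent can keep using endpoints that have not signaled a limit.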

## Source-Mapped Facts

- RFC 9110 defines the Retry-After response header field for indicating how long a user agent ought to wait before making a follow-up request; the value is either an HTTP date or a number of seconds. ([source](https://datatracker.ietf.org/doc/html/rfc9110#section-10.2.3))
- RFC 6585 defines the 429 Too Many Requests status code for rate limiting and says responses may include a Retry-After header. ([source](https://datatracker.ietf.org/doc/html/rfc6585#section-4))
- GitHub REST API documentation lists rate-limit response headers such as x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset. ([source](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api))
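
The GitHub headers listed above can be read into a small snapshot, sketched below under stated assumptions. `QuotaSnapshot` and `read_quota` are hypothetical names. The code treats `x-ratelimit-reset` as UTC epoch seconds, which is how GitHub documents it; other providers may use a different unit or different header names, so check the provider's documentation before reusing this.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class QuotaSnapshot:
    limit: int        # requests allowed in the current window
    remaining: int    # requests left in the current window
    reset_epoch: int  # window reset time, assumed to be UTC epoch seconds

    def seconds_until_reset(self, now_epoch: float) -> float:
        """Return the time left until the window resets, never negative."""
        return max(0.0, float(self.reset_epoch - now_epoch))


def read_quota(headers: Mapping[str, str]) -> Optional[QuotaSnapshot]:
    """Build a snapshot from x-ratelimit-* headers, or None if any is missing or malformed."""
    lowered = {name.lower(): val for name, val in headers.items()}
    try:
        return QuotaSnapshot(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_epoch=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

An agent might check `remaining` before starting a batch of calls and, when it reaches zero, wait for `seconds_until_reset(time.time())` rather than sending requests that are likely to fail. Returning `None` for incomplete headers avoids guessing at quota state the response did not report.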

## Further Reading

- [RFC 9110 Retry-After](https://datatracker.ietf.org/doc/html/rfc9110#section-10.2.3)
- [RFC 6585 429 Too Many Requests](https://datatracker.ietf.org/doc/html/rfc6585#section-4)
- [GitHub REST API Rate Limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api)