# Agent Tool Rate Limits and Quotas

Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-02
Generation: ai_structured

## TL;DR

Agent tool rate limits and quotas define how many external operations an agent can perform before an API throttles, delays, rejects, or requires backoff.

## Core Explanation

Tool-using agents can issue many calls quickly: search, list, fetch, patch, rerun, or retry. Without quota awareness, an agent can exhaust a user token, trigger secondary abuse limits, or create noisy failures that look like task failures.

Production agents should track per-provider budgets, expose remaining quota in traces, prefer batching where supported, and treat retry-after or reset metadata as hard scheduling constraints rather than model suggestions.

## Source-Mapped Facts

- GitHub REST API documentation says GitHub limits the number of REST API requests a client can make within a specific amount of time. ([source](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api))
- Stripe documentation says a concurrency limiter restricts the number of concurrent active requests. ([source](https://docs.stripe.com/rate-limits))
- Google Cloud documentation says quotas help ensure fairness and reduce spikes in resource use and availability. ([source](https://cloud.google.com/docs/quotas/overview))

## Further Reading

- [GitHub REST API rate limits](https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api)
- [Stripe rate limits](https://docs.stripe.com/rate-limits)
- [Google Cloud quotas](https://cloud.google.com/docs/quotas/overview)
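
## Example: Honoring Retry-After and Reset Metadata

The Core Explanation recommends tracking per-provider budgets, exposing remaining quota in traces, and treating retry-after or reset metadata as hard scheduling constraints. The following is a minimal sketch of that idea, not a definitive implementation. `ProviderBudget`, `observe`, `delay`, and `trace` are hypothetical names introduced here for illustration. The header names `x-ratelimit-remaining` and `x-ratelimit-reset` follow the GitHub REST API convention and are an assumption for any other provider; `Retry-After` is handled only in its delta-seconds form.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class ProviderBudget:
    """Hypothetical per-provider quota state kept by the agent runtime."""

    remaining: Optional[int] = None  # calls left in the current window, if known
    not_before: float = 0.0          # epoch seconds; no call may start earlier

    def observe(self, status: int, headers: Mapping[str, str], now: float) -> None:
        """Update the budget from one response's status code and headers."""
        h = {k.lower(): v for k, v in headers.items()}

        # Assumed GitHub-style header; other providers may use different names.
        if "x-ratelimit-remaining" in h:
            self.remaining = int(h["x-ratelimit-remaining"])

        retry_after = h.get("retry-after")
        if retry_after is not None and retry_after.isdigit():
            # Delta-seconds form only; the HTTP-date form is ignored in this sketch.
            self.not_before = max(self.not_before, now + int(retry_after))
        elif (status in (403, 429) or self.remaining == 0) and "x-ratelimit-reset" in h:
            # Assumed to be an epoch-seconds reset time, as GitHub documents it.
            self.not_before = max(self.not_before, float(h["x-ratelimit-reset"]))

    def delay(self, now: float) -> float:
        """Seconds the scheduler must wait before the next call to this provider."""
        return max(0.0, self.not_before - now)

    def trace(self, now: float) -> Dict[str, object]:
        """Quota fields suitable for attaching to a trace span or log record."""
        return {"remaining": self.remaining, "delay_s": self.delay(now)}


# One budget per provider, keyed by a provider name chosen by the runtime.
budgets: Dict[str, ProviderBudget] = {"github": ProviderBudget()}

github = budgets["github"]
github.observe(
    200,
    {"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1700000060"},
    now=1700000000.0,
)
print(github.delay(now=1700000000.0))  # 60.0
```

The scheduler, not the model, calls `delay` before dispatching each tool call and sleeps or defers the call for that long. Keeping `not_before` as a monotonic maximum means a later, shorter hint never shortens a wait the provider already asked for.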