# Agent Token Budgeting and Context Accounting

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Token budgeting is the agent-engineering practice of estimating the size of the prompt, context, tool schemas, retrieved material, and expected output before a model call, so the agent can decide in advance what to keep, compress, or drop.

## Core Explanation

Agents need context accounting because each model call combines system prompts, conversation history, retrieved evidence, tool schemas, tool results, and generated output, and all of these consume tokens. Without a budget, an agent can silently lose important context to truncation or spend too much on low-value tokens.
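The accounting described above can be sketched as a small ledger that sums per-component estimates and compares them to a limit. This is a minimal illustration, not a provider's method: the `ContextLedger` class, the component names, and all numbers are hypothetical, and it assumes that input and output tokens share one context window, which should be checked against the target model's documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLedger:
    """Hypothetical ledger of estimated token counts per call component."""

    context_window: int   # assumed total limit shared by input and output
    reserved_output: int  # tokens held back for the generated reply
    parts: dict[str, int] = field(default_factory=dict)

    def add(self, name: str, tokens: int) -> None:
        # Accumulate so the same component can be added in several steps.
        self.parts[name] = self.parts.get(name, 0) + tokens

    @property
    def input_total(self) -> int:
        return sum(self.parts.values())

    @property
    def remaining(self) -> int:
        # Negative means the planned call is over budget.
        return self.context_window - self.reserved_output - self.input_total

    def fits(self) -> bool:
        return self.remaining >= 0


# Illustrative numbers only.
ledger = ContextLedger(context_window=8000, reserved_output=1000)
ledger.add("system_prompt", 400)
ledger.add("history", 2500)
ledger.add("retrieved_evidence", 3000)
ledger.add("tool_schemas", 600)
ledger.add("tool_results", 900)

print(ledger.input_total)  # 7400
print(ledger.remaining)    # -400
print(ledger.fits())       # False
```

Keeping the per-component breakdown, rather than a single total, makes it visible which part of the call should be compressed first when the ledger goes negative.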

Good budgeting happens before the call. The agent estimates input and output size, reserves room for tool calls or citations, compresses or ranks context, and records which material was omitted. Provider-specific token counting remains necessary because tokenization differs by model family.
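One way to picture the rank-and-record step is a greedy packer that keeps the highest-priority items that fit and lists whatever was left out. This is a sketch under stated assumptions: `Item`, `pack_context`, the priority scores, and the token counts are all hypothetical, and greedy selection is only one of several possible strategies (it does not compress anything and is not guaranteed to be optimal).

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical piece of candidate context."""

    name: str
    tokens: int      # estimated token count for this item
    priority: float  # higher means more valuable to the task


def pack_context(items: list[Item], input_budget: int):
    """Greedily keep high-priority items that fit; record the rest as omitted."""
    kept: list[Item] = []
    omitted: list[Item] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority, reverse=True):
        if used + item.tokens <= input_budget:
            kept.append(item)
            used += item.tokens
        else:
            # Recording omissions lets the agent report or log what it dropped.
            omitted.append(item)
    return kept, omitted, used


# Illustrative input budget: window minus room reserved for output and tool calls.
candidates = [
    Item("doc_a", tokens=300, priority=0.9),
    Item("doc_b", tokens=500, priority=0.8),
    Item("doc_c", tokens=400, priority=0.5),
    Item("doc_d", tokens=100, priority=0.4),
]
kept, omitted, used = pack_context(candidates, input_budget=900)

print([i.name for i in kept])     # ['doc_a', 'doc_b', 'doc_d']
print([i.name for i in omitted])  # ['doc_c']
print(used)                       # 900
```

Note that the packer skips `doc_c` but still admits the smaller, lower-priority `doc_d`; whether that behavior is desirable depends on the task, and a stricter policy could stop at the first item that does not fit.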

## Source-Mapped Facts

- OpenAI Cookbook documentation describes tiktoken as a fast BPE tokenizer for use with OpenAI models. ([source](https://developers.openai.com/cookbook/examples/how_to_count_tokens_with_tiktoken))
- Anthropic documentation provides a token counting API for estimating token usage without creating a message. ([source](https://platform.claude.com/docs/en/build-with-claude/token-counting))
- Google Gemini API documentation says tokens can be single characters or whole words depending on the language and tokenization model. ([source](https://ai.google.dev/gemini-api/docs/tokens))
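Because each provider counts tokens differently, a budgeting layer can hide the counter behind one function type and swap implementations per model family. The sketch below is illustrative only: `approx_counter`, `tiktoken_counter`, and `get_counter` are hypothetical helpers, the four-characters-per-token fallback is a rough assumption rather than any provider's rule, and `cl100k_base` is just one tiktoken encoding that may not match the model actually in use. Counters for Anthropic or Gemini would call the respective token counting APIs cited above and are not shown here.

```python
from typing import Callable

TokenCounter = Callable[[str], int]


def approx_counter(text: str) -> int:
    """Rough fallback: assumes about 4 characters per token (ceiling division)."""
    return (len(text) + 3) // 4


def tiktoken_counter(encoding_name: str = "cl100k_base") -> TokenCounter:
    """Build a counter backed by tiktoken (third-party: pip install tiktoken)."""
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() treats special-token strings as ordinary text
    # instead of raising an error when they appear in the input.
    return lambda text: len(enc.encode(text, disallowed_special=()))


def get_counter(provider: str) -> TokenCounter:
    """Pick a counter for a provider label, falling back to the estimate."""
    if provider == "openai":
        try:
            return tiktoken_counter()
        except Exception:
            # tiktoken is missing or its encoding data could not be loaded.
            return approx_counter
    return approx_counter


count = get_counter("other")
print(count("abcdefgh"))  # 2 with the fallback estimate
```

The fallback is only suitable for coarse planning; for calls near the limit, the provider's own counter is the safer source of truth.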

## Further Reading

- [OpenAI Cookbook Token Counting](https://developers.openai.com/cookbook/examples/how_to_count_tokens_with_tiktoken)
- [Anthropic Token Counting](https://platform.claude.com/docs/en/build-with-claude/token-counting)
- [Gemini API Token Guide](https://ai.google.dev/gemini-api/docs/tokens)