# Agent Trajectory Evaluation and Step-Level Traces

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Trajectory evaluation checks what an agent did step by step, not only whether the final answer looked correct.

## Core Explanation

Tool-using agents can arrive at a plausible answer through an unsafe or brittle path. Step-level traces expose model messages, tool calls, tool arguments, tool results, retries, and branch decisions so evaluators can detect process regressions.
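
As a rough illustration of what a step-level trace might hold in memory, the sketch below records the kinds of events listed above. The names `Step`, `Trace`, `kind`, and `step_id` are hypothetical and chosen for this example; they are not the schema of any particular tracing library.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One recorded event in an agent run (hypothetical schema)."""
    step_id: int
    kind: str                      # "message", "tool_call", "tool_result", "retry", or "branch"
    name: Optional[str] = None     # tool name, when the step involves a tool
    args: dict[str, Any] = field(default_factory=dict)
    output: Any = None


@dataclass
class Trace:
    """An ordered list of steps tied to one trace ID."""
    trace_id: str
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, name: Optional[str] = None, **details: Any) -> Step:
        # Step IDs follow insertion order so a failed step can be located later.
        step = Step(step_id=len(self.steps), kind=kind, name=name,
                    args=details.get("args", {}), output=details.get("output"))
        self.steps.append(step)
        return step

    def tool_calls(self) -> list[Optional[str]]:
        """Return tool names in the order they were called."""
        return [s.name for s in self.steps if s.kind == "tool_call"]


trace = Trace(trace_id="run-001")
trace.record("message", output="What is the refund policy?")
trace.record("tool_call", name="search_docs", args={"query": "refund policy"})
trace.record("tool_result", name="search_docs", output=["doc-17"])
trace.record("message", output="Refunds are accepted within 30 days.")
```

Keeping tool calls and tool results as separate steps lets an evaluator inspect the arguments that were sent independently of what came back.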

Trajectory tests are useful when a final-output metric hides failures such as skipped retrieval, wrong tool selection, stale cache use, or unnecessary write actions. Each agent run should record its trace ID and source IDs so that failed steps can be replayed or inspected.
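
Two of these failure modes, skipped retrieval and unnecessary write actions, can be turned into simple trajectory assertions. The sketch below assumes a trajectory is a list of dicts with a hypothetical `tool` key, and the tool names (`search_docs`, `update_record`, `delete_record`) are invented for illustration; a real trace will have its own schema.

```python
from typing import Any

# Hypothetical tool names; substitute the tools your agent exposes.
RETRIEVAL_TOOLS = {"search_docs"}
WRITE_TOOLS = {"update_record", "delete_record"}


def check_trajectory(steps: list[dict[str, Any]], allow_writes: bool = False) -> list[str]:
    """Return a list of process failures found in a sequence of tool calls."""
    failures = []
    tools = [s["tool"] for s in steps]

    # Skipped retrieval: the run contains no call to any retrieval tool.
    if not RETRIEVAL_TOOLS.intersection(tools):
        failures.append("skipped retrieval")

    # Unexpected writes: report each one with its step index so it can be inspected.
    if not allow_writes:
        for index, tool in enumerate(tools):
            if tool in WRITE_TOOLS:
                failures.append(f"unexpected write at step {index}: {tool}")

    return failures


good_run = [{"tool": "search_docs", "args": {"query": "refund policy"}}]
bad_run = [{"tool": "update_record", "args": {"id": 7}}]

print(check_trajectory(good_run))  # []
print(check_trajectory(bad_run))   # ['skipped retrieval', 'unexpected write at step 0: update_record']
```

A check like this can fail a test case even when the final answer of `bad_run` happens to read correctly, which is the gap a final-output metric leaves open.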

## Source-Mapped Facts

- LangChain documentation says agent evals can assess an execution trajectory, including the sequence of messages and tool calls. ([source](https://docs.langchain.com/oss/python/langchain/evals))
- LangSmith documentation describes trajectory evaluation as checking whether an agent took the expected path of tool calls to reach an answer. ([source](https://docs.langchain.com/langsmith/evaluation-approaches))
- OpenTelemetry trace API documentation says each trace contains a root span and optional sub-spans for sub-operations. ([source](https://opentelemetry.io/docs/specs/otel/trace/api))
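
To connect these facts, the sketch below models a run as a root span with sub-spans, extracts the ordered tool-call path, and compares it with an expected path. `Span`, `tool_path`, and `matches_expected` are hypothetical helpers written for this example; they are not the OpenTelemetry API or a LangSmith evaluator, and the strict versus in-order matching modes are an assumption about how one might want to compare paths.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for a trace span; not the OpenTelemetry API."""
    name: str
    kind: str = "internal"          # "tool" marks a tool-call span in this sketch
    children: list["Span"] = field(default_factory=list)


def tool_path(span: Span) -> list[str]:
    """Collect tool-call span names depth-first, in recorded order."""
    path = [span.name] if span.kind == "tool" else []
    for child in span.children:
        path.extend(tool_path(child))
    return path


def matches_expected(actual: list[str], expected: list[str], strict: bool = True) -> bool:
    """Strict: identical sequence. Otherwise: expected calls appear in order, extras allowed."""
    if strict:
        return actual == expected
    remaining = iter(actual)
    # `tool in remaining` advances the iterator, so order is enforced.
    return all(tool in remaining for tool in expected)


# Root span for the whole run, with sub-spans for its sub-operations.
root = Span("agent_run", children=[
    Span("plan"),
    Span("search_docs", kind="tool"),
    Span("retry", children=[Span("search_docs", kind="tool")]),
    Span("summarize", kind="tool"),
])

actual = tool_path(root)
expected = ["search_docs", "summarize"]
print(matches_expected(actual, expected))                # False: the retry added a call
print(matches_expected(actual, expected, strict=False))  # True: expected calls appear in order
```

Strict matching suits deterministic workflows, while the in-order mode tolerates retries and extra calls; which one is appropriate depends on how much path variation the agent is allowed.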

## Further Reading

- [LangChain Agent Evals](https://docs.langchain.com/oss/python/langchain/evals)
- [LangSmith Evaluation Approaches](https://docs.langchain.com/langsmith/evaluation-approaches)
- [OpenTelemetry Trace API](https://opentelemetry.io/docs/specs/otel/trace/api)