# LLM Evaluation Traces and Feedback Labels
Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-02
Generation: ai_structured


## TL;DR

LLM traces and feedback labels connect evaluation results to the exact prompt, tools, retrieved context, and model response that produced them.

## Core Explanation

Aggregate pass rates hide the path a request took through an LLM application. A trace records that path step by step: model calls, tool calls, retrieved documents, latency, token use, and errors. Feedback labels attach human or automated judgments to those traces.
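
As a rough illustration of how the pieces relate, the sketch below models a trace, its spans, and its feedback labels as plain Python dataclasses. All class names, field names, and span kinds here (`Span`, `FeedbackLabel`, `Trace`, `"retrieval"`, and so on) are hypothetical and chosen for this example; they are not the schema of any particular tracing tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One step inside a trace: a model call, tool call, or retrieval."""
    span_id: str
    kind: str                 # hypothetical labels: "llm", "tool", "retrieval"
    name: str
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    error: Optional[str] = None


@dataclass
class FeedbackLabel:
    """A human or automated judgment attached to a trace."""
    key: str                  # what is being judged, e.g. "correctness"
    score: float              # 0.0 to 1.0 in this sketch
    source: str               # "human" or "automated"
    comment: str = ""


@dataclass
class Trace:
    trace_id: str
    spans: list[Span] = field(default_factory=list)
    feedback: list[FeedbackLabel] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Sum input and output tokens across all spans."""
        return sum(s.input_tokens + s.output_tokens for s in self.spans)

    def first_error(self) -> Optional[Span]:
        """Return the earliest span that recorded an error, if any."""
        return next((s for s in self.spans if s.error), None)


trace = Trace(
    trace_id="trace-001",
    spans=[
        Span("s1", "retrieval", "search_docs", latency_ms=42.0),
        Span("s2", "llm", "answer", latency_ms=810.0,
             input_tokens=120, output_tokens=35),
        Span("s3", "tool", "lookup_order", latency_ms=15.0, error="timeout"),
    ],
)
# Attach a human judgment to the trace as a whole.
trace.feedback.append(
    FeedbackLabel("correctness", 0.0, "human", "Order status missing")
)
```

Keeping the label on the same object as the spans means a low score can be read next to the step that failed, here the `lookup_order` tool call that timed out.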

Agents should preserve trace identifiers when summarizing eval failures, so each reported failure can be checked against the run that produced it. Without trace-level evidence, it is hard to tell whether a failure came from retrieval, prompt construction, tool execution, model behavior, or a downstream policy check.
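
One way to keep that evidence is to group failures by the stage that failed and carry every trace identifier along with the count. The sketch below assumes a hypothetical record shape (`trace_id`, `passed`, `failed_stage`) and made-up sample data; a real eval harness would supply its own fields.

```python
from collections import defaultdict

# Hypothetical eval results: each record keeps the trace_id of the run it scored.
results = [
    {"trace_id": "t-101", "passed": True, "failed_stage": None},
    {"trace_id": "t-102", "passed": False, "failed_stage": "retrieval"},
    {"trace_id": "t-103", "passed": False, "failed_stage": "tool_execution"},
    {"trace_id": "t-104", "passed": False, "failed_stage": "retrieval"},
]


def summarize_failures(records):
    """Group failed records by stage, keeping every trace_id as evidence."""
    by_stage = defaultdict(list)
    for record in records:
        if not record["passed"]:
            stage = record["failed_stage"] or "unknown"
            by_stage[stage].append(record["trace_id"])
    return dict(by_stage)


def pass_rate(records):
    """Fraction of records that passed (the aggregate number)."""
    return sum(r["passed"] for r in records) / len(records)


summary = summarize_failures(results)
rate = pass_rate(results)

print(f"pass rate: {rate:.0%}")
for stage, trace_ids in sorted(summary.items()):
    print(f"{stage}: {len(trace_ids)} failure(s), traces: {', '.join(trace_ids)}")
```

The pass rate alone says only that three of four runs failed; the grouped summary says which stage to inspect and which traces to open.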

## Source-Mapped Facts

- OpenTelemetry documentation defines semantic conventions for generative AI operations. ([source](https://opentelemetry.io/docs/specs/semconv/gen-ai/))
- LangSmith documentation describes observability workflows for tracing LLM applications. ([source](https://docs.langchain.com/langsmith/observability-quickstart))
- Phoenix documentation describes LLM traces as a way to inspect application spans and calls. ([source](https://arize.com/docs/phoenix/tracing/llm-traces))
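
To make the idea of semantic conventions concrete, the sketch below builds a flat attribute dictionary for one model call using `gen_ai.*` style keys. It uses only the standard library and does not call any OpenTelemetry API. The helper name `gen_ai_span_attributes` and the model name are hypothetical, and the attribute keys reflect one reading of the conventions, which are still evolving, so they should be checked against the linked specification before use.

```python
def gen_ai_span_attributes(operation, model, input_tokens, output_tokens):
    """Build span attributes with gen_ai.* style keys (assumed names)."""
    return {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


# "example-model" is a placeholder, not a real model identifier.
attrs = gen_ai_span_attributes("chat", "example-model", 120, 35)

# Shared key names let any consumer compute the same metric the same way.
total_tokens = (
    attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"]
)
print(total_tokens)
```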

## Further Reading

- [OpenTelemetry Generative AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
- [LangSmith Observability Quickstart](https://docs.langchain.com/langsmith/observability-quickstart)
- [Phoenix LLM Traces](https://arize.com/docs/phoenix/tracing/llm-traces)