# LLM Evaluation Assertions and Test Cases

Status: public
Confidence: medium (0.685) (verified)
Last verified: 2026-06-03
Generation: ai_structured

## TL;DR

LLM evaluation test cases need explicit inputs, expected behavior, assertions, thresholds, and metrics so that failures can be reproduced and repaired.

## Core Explanation

An eval case is more than a prompt. It records the input variables, the expected output or rubric, the assertion type, the pass threshold, the metric name, and sometimes a custom scoring function. Deterministic assertions catch schema, substring, regex, refusal, and latency failures; model-assisted metrics can grade relevance, faithfulness, factuality, or trajectory behavior.

Agents should read the eval definitions before changing prompts or tools. A failing assertion usually identifies which contract broke, whereas an aggregate score on its own often hides the specific failing behavior.

## Source-Mapped Facts

- Promptfoo documentation says assertions compare LLM output against expected values or conditions. ([source](https://www.promptfoo.dev/docs/configuration/expected-outputs/))
- Promptfoo documentation says a test case can include an `assert` property containing an array of assertion objects. ([source](https://www.promptfoo.dev/docs/configuration/expected-outputs/))
- Ragas documentation provides a list of available metrics for evaluating LLM and RAG systems. ([source](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/))

## Further Reading

- [Promptfoo Assertions and Metrics](https://www.promptfoo.dev/docs/configuration/expected-outputs/)
- [Ragas Available Metrics](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/)
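
## Illustrative Sketch

The eval case structure described in the Core Explanation can be sketched in plain Python. This is a minimal illustration, not the Promptfoo or Ragas API: `Assertion`, `EvalCase`, `AssertionResult`, `check`, and `run_case` are hypothetical names, the assertion type strings are chosen only to mirror the deterministic checks listed above, and the sample output and latency are made up for the example. Model-assisted metrics are left out; a custom scoring function stands in for them.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """One contract on the model output (hypothetical structure)."""
    type: str                 # "contains", "regex", "is-json", "latency", "custom"
    value: object = None      # substring, pattern, or scoring callable
    threshold: float = 1.0    # minimum score, or maximum latency in ms
    metric: str = ""          # name the result is reported under


@dataclass
class EvalCase:
    """Inputs plus the assertions the output must satisfy."""
    vars: dict
    asserts: list = field(default_factory=list)
    description: str = ""


@dataclass
class AssertionResult:
    metric: str
    score: float
    passed: bool


def check(assertion: Assertion, output: str, latency_ms: float) -> AssertionResult:
    """Score one assertion and compare the score against its threshold."""
    kind = assertion.type
    if kind == "latency":
        # For latency the threshold is an upper bound, not a minimum score.
        return AssertionResult(assertion.metric, latency_ms,
                               latency_ms <= assertion.threshold)
    if kind == "contains":
        score = 1.0 if str(assertion.value) in output else 0.0
    elif kind == "regex":
        score = 1.0 if re.search(str(assertion.value), output) else 0.0
    elif kind == "is-json":
        try:
            json.loads(output)
            score = 1.0
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            score = 0.0
    elif kind == "custom":
        score = float(assertion.value(output))
    else:
        raise ValueError(f"unknown assertion type: {kind}")
    return AssertionResult(assertion.metric, score, score >= assertion.threshold)


def run_case(case: EvalCase, output: str, latency_ms: float) -> list:
    """Evaluate every assertion so each broken contract is reported by name."""
    return [check(a, output, latency_ms) for a in case.asserts]


# Example with a made-up model output and latency.
case = EvalCase(
    description="refund policy answer",
    vars={"question": "What is the refund window?"},
    asserts=[
        Assertion("contains", "30 days", metric="mentions_window"),
        Assertion("regex", r"\brefund\b", metric="mentions_refund"),
        Assertion("latency", threshold=2000, metric="latency_ms"),
        Assertion("custom", lambda out: min(len(out) / 40, 1.0),
                  threshold=0.5, metric="length_score"),
    ],
)
results = run_case(case, "You can request a refund within 14 days.", latency_ms=850)
failed = [r.metric for r in results if not r.passed]
```

Here three of the four assertions pass, so an aggregate pass rate of 0.75 would look tolerable, while `failed` names the one contract that broke (`mentions_window`). That per-assertion view is what the text above recommends reading before changing a prompt or tool.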