# LLM Evaluation Rubrics and Grading Schemas

- Status: public
- Confidence: medium (0.725) (verified)
- Last verified: 2026-06-02
- Generation: ai_structured

## TL;DR

Rubrics and grading schemas turn vague LLM quality goals into repeatable evaluation fields, criteria, and scores.

## Core Explanation

An evaluation result is only useful if the grader knows what to judge and how to encode the judgment. Rubrics define criteria such as correctness, groundedness, instruction following, safety, concision, and tool-use quality. A grading schema defines the output format, score scale, labels, and rationale fields.

Agents should treat rubric design as part of the system contract. When the rubric changes, historical scores may no longer be comparable. When the schema is too vague, the judge can return plausible but inconsistent evaluations.

## Source-Mapped Facts

- Google Vertex AI documentation describes adaptive and static rubric metrics for Gen AI evaluation. ([source](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/determine-eval))
- Google Vertex AI documentation describes model-based metric prompt templates that use criteria, score rubrics, and instructions. ([source](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/metrics-templates))
- OpenAI documentation describes evals as tests for model outputs against style and content criteria that users specify. ([source](https://developers.openai.com/api/docs/guides/evals))

## Further Reading

- [Vertex AI Define Evaluation Metrics](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/determine-eval)
- [Vertex AI Metric Prompt Templates](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/metrics-templates)
- [OpenAI Evals Guide](https://developers.openai.com/api/docs/guides/evals)
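
## Example Sketch

The split between a rubric (what to judge) and a grading schema (how to encode the judgment) described in the Core Explanation can be made concrete with a small validator. The following is a minimal sketch, not the API of Vertex AI, OpenAI Evals, or any other library: the names `Rubric`, `validate_grade`, and `comparable`, the 1 to 5 score scale, the `pass`/`fail` labels, and the `rubric_version` field are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical rubric definition: every name and default here is illustrative.
@dataclass(frozen=True)
class Rubric:
    version: str                      # bump whenever criteria or scale change
    criteria: tuple[str, ...]         # what the judge must score
    min_score: int = 1                # assumed score scale (inclusive)
    max_score: int = 5
    labels: tuple[str, ...] = ("pass", "fail")


def validate_grade(rubric: Rubric, grade: dict) -> list[str]:
    """Return schema violations; an empty list means the grade is well-formed."""
    errors = []
    if grade.get("rubric_version") != rubric.version:
        errors.append("rubric_version mismatch")
    if grade.get("label") not in rubric.labels:
        errors.append("unknown label")

    scores = grade.get("scores")
    if not isinstance(scores, dict):
        scores = {}
    for criterion in rubric.criteria:
        score = scores.get(criterion)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(score, int) and not isinstance(score, bool)
        if not is_int or not (rubric.min_score <= score <= rubric.max_score):
            errors.append(f"bad score for {criterion}")
    for extra in sorted(set(scores) - set(rubric.criteria), key=str):
        errors.append(f"unexpected criterion {extra}")

    if not str(grade.get("rationale", "")).strip():
        errors.append("missing rationale")
    return errors


def comparable(a: dict, b: dict) -> bool:
    """Two grades are treated as comparable only under the same rubric version."""
    return a.get("rubric_version") == b.get("rubric_version")


RUBRIC = Rubric(version="v1", criteria=("correctness", "groundedness", "safety"))

good_grade = {
    "rubric_version": "v1",
    "label": "pass",
    "scores": {"correctness": 5, "groundedness": 4, "safety": 5},
    "rationale": "Answer matches the cited source and follows instructions.",
}
```

Rejecting a judge output that fails `validate_grade` is one way to keep a vague schema from producing plausible but inconsistent evaluations, and checking `comparable` before aggregating scores is one way to avoid mixing results across rubric changes.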