LLM Production Quality Monitoring and Drift

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

LLM production monitoring tracks whether output quality, safety, latency, and traffic patterns drift after deployment.

## Core Explanation

Offline evals cannot cover every input a deployed system will see. Production monitoring adds ongoing checks for input distribution changes, quality regressions, safety failures, user feedback, and operational behavior such as latency.
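One way to make "distribution changes" concrete is to compare a numeric feature of production traffic (for example, prompt length in tokens) against a baseline window. The sketch below uses the Population Stability Index (PSI), a common drift statistic. It is an illustration, not a method prescribed by the sources above: the function name `psi`, the choice of feature, the bin count, and the `eps` smoothing value are all assumptions.

```python
import math
from typing import Sequence


def psi(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between two samples of one numeric feature.

    Bin edges are derived from the baseline sample (equal-width bins).
    `bins` and `eps` are illustrative defaults, not recommended values.
    """
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 when the baseline has no spread.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the log ratio stays defined.
        return [max(c / len(values), eps) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical data: prompt lengths from a baseline week and the current week.
baseline_lengths = [float(n) for n in range(100)]
current_lengths = [n + 50.0 for n in range(100)]
drift_score = psi(baseline_lengths, current_lengths)
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but that threshold is a convention rather than a guarantee and should be calibrated against the application's own history. A high score only says the inputs changed; it does not by itself say quality regressed.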

Agents should connect a production-quality alert back to examples and traces. A drift alert without sampled inputs, evaluator scores, time windows, and deployment context is not enough to justify a rollback or prompt rewrite.
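The evidence requirement above can be expressed as a simple record that an agent checks before acting. This is a minimal sketch under stated assumptions: the `DriftAlert` class, its field names, and the `missing_evidence` helper are hypothetical and do not correspond to any particular monitoring tool's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class DriftAlert:
    """Hypothetical alert record; field names are illustrative only."""

    metric: str
    sampled_inputs: List[str] = field(default_factory=list)
    evaluator_scores: Dict[str, float] = field(default_factory=dict)
    window_start: Optional[datetime] = None
    window_end: Optional[datetime] = None
    # For example: model version, prompt version, release id.
    deployment_context: Dict[str, str] = field(default_factory=dict)

    def missing_evidence(self) -> List[str]:
        """Return the names of evidence categories the alert lacks."""
        missing = []
        if not self.sampled_inputs:
            missing.append("sampled_inputs")
        if not self.evaluator_scores:
            missing.append("evaluator_scores")
        if self.window_start is None or self.window_end is None:
            missing.append("time_window")
        if not self.deployment_context:
            missing.append("deployment_context")
        return missing


# An alert carrying only a metric name is not enough to act on.
bare = DriftAlert(metric="answer_relevance")

# An alert with all four kinds of evidence attached.
complete = DriftAlert(
    metric="answer_relevance",
    sampled_inputs=["How do I reset my password?"],
    evaluator_scores={"answer_relevance": 0.41},
    window_start=datetime(2025, 1, 1),
    window_end=datetime(2025, 1, 2),
    deployment_context={"prompt_version": "v7"},
)
```

In this sketch, an agent would treat a non-empty `missing_evidence()` result as a reason to gather more data rather than to roll back or rewrite a prompt. All values in the usage example are made up for illustration.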

## Source-Mapped Facts

- Evidently documentation describes monitoring as tracking data and model quality over time. ([source](https://docs.evidentlyai.com/docs/platform/monitoring_overview))
- Azure AI Foundry documentation describes monitoring deployed generative AI applications. ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/online-evaluation))
- LangSmith documentation lists online evaluation among evaluation types for LLM applications. ([source](https://docs.langchain.com/langsmith/evaluation-types))

## Further Reading

- [Evidently Monitoring Overview](https://docs.evidentlyai.com/docs/platform/monitoring_overview)
- [Azure AI Foundry Monitor Applications](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/online-evaluation)
- [LangSmith Evaluation Types](https://docs.langchain.com/langsmith/evaluation-types)