ML Experiment Tracking
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

ML experiment tracking records runs, parameters, metrics, artifacts, code versions, and datasets so teams can compare model behavior and reproduce results.

## Core Explanation

Modern ML and LLM systems change across prompts, models, retrieval indexes, fine-tuning data, evaluation sets, and runtime settings. Experiment tracking gives those changes durable identifiers and comparable metrics.

For LLM evaluation and agent engineering, tracking is the bridge between a one-off result and a repeatable quality loop. It lets teams answer which model, prompt, dataset, retrieval setup, and grader produced a score.

## Source-Mapped Facts

- TensorFlow documentation says TensorBoard provides measurements and visualizations for machine learning workflows and enables tracking experiment metrics such as loss and accuracy. ([source](https://www.tensorflow.org/tensorboard/get_started))
- Weights and Biases documentation says experiment tracking workflows create a run, store hyperparameters in configuration, log metrics over time, and save run outputs. ([source](https://docs.wandb.ai/models/track))
- Google Cloud documentation says Vertex AI Experiments tracks experiment steps, inputs such as parameters and datasets, and outputs such as models, checkpoints, and metrics. ([source](https://docs.cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments))

## Further Reading

- [TensorBoard getting started](https://www.tensorflow.org/tensorboard/get_started)
- [Weights and Biases experiments](https://docs.wandb.ai/models/track)
- [Vertex AI Experiments](https://docs.cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments)
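
## Minimal Sketch

To make the idea in Core Explanation concrete, here is a minimal sketch in plain Python of what a tracked run might hold. It assumes a hypothetical `Run` record and a hypothetical `best_run` helper; neither corresponds to the TensorBoard, Weights and Biases, or Vertex AI APIs. The model names, prompt versions, dataset identifier, commit hash, and metric values are invented for illustration only.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class Run:
    """One tracked experiment run (hypothetical schema, not a vendor API)."""
    params: dict        # e.g. model, prompt version, grader
    dataset: str        # identifier of the evaluation set
    code_version: str   # e.g. a git commit hash
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    metrics: dict = field(default_factory=dict)    # name -> list of (step, value)
    artifacts: list = field(default_factory=list)  # paths to saved outputs

    def log_metric(self, name: str, value: float, step: int) -> None:
        """Append a value so the metric's history over steps is preserved."""
        self.metrics.setdefault(name, []).append((step, value))

    def final(self, name: str) -> float:
        """Return the most recently logged value of a metric."""
        return self.metrics[name][-1][1]

    def save(self, directory: Path) -> Path:
        """Write the run as JSON, named after its durable identifier."""
        directory.mkdir(parents=True, exist_ok=True)
        path = directory / f"{self.run_id}.json"
        path.write_text(json.dumps(asdict(self), indent=2))
        return path


def best_run(runs: list, metric: str) -> Run:
    """Pick the run with the highest final value of the given metric."""
    return max(runs, key=lambda r: r.final(metric))


# Two runs that differ only in prompt version (all values are made up).
baseline = Run(params={"model": "model-a", "prompt": "v1"},
               dataset="eval-set-1", code_version="abc123")
baseline.log_metric("accuracy", 0.71, step=1)
baseline.log_metric("accuracy", 0.78, step=2)

candidate = Run(params={"model": "model-a", "prompt": "v2"},
                dataset="eval-set-1", code_version="abc123")
candidate.log_metric("accuracy", 0.74, step=1)
candidate.log_metric("accuracy", 0.83, step=2)

winner = best_run([baseline, candidate], "accuracy")
print(winner.run_id, winner.params["prompt"], winner.final("accuracy"))
```

Because each run carries its own parameters, dataset identifier, and code version, the comparison at the end can say which prompt produced the higher score rather than only reporting the score. Calling `winner.save(Path("runs"))` would persist the record as JSON; a real tracking service would typically add storage, a UI, and artifact handling on top of this shape.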