Test Flakiness History and Quarantine for Agents
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Flaky-test history tells agents whether a failing check is a new regression, a known nondeterministic test, or a quarantined signal that should not be ignored forever.

## Core Explanation

CI failures are not all equal. A deterministic failure on the first run may point to a patch regression. A test that fails and then passes on retry may indicate timing, isolation, data, or environment instability. A quarantined or expected-failure test may already be known, but it still needs ownership and an expiry path.

Useful evidence includes test ID, file path, retry count, first-fail timestamp, pass-on-retry status, historical failure rate, runner image, random seed, quarantine marker, xfail reason, linked issue, owner, and last successful non-quarantined run. Without these fields, an agent may either overreact to a known flaky test or dismiss a real regression as "probably flaky."

Agents should avoid using retries as proof of correctness. Retry and quarantine metadata are diagnostic evidence, not a substitute for fixing nondeterminism or preserving meaningful CI gates.

## Source-Mapped Facts

- Playwright documentation classifies a test that fails initially but passes on retry as flaky. ([source](https://playwright.dev/docs/test-retries))
- GitLab documentation describes quarantining tests that are failing due to non-deterministic behavior. ([source](https://docs.gitlab.com/development/testing_guide/quarantining_tests/))
- pytest documentation describes xfail as marking tests that are expected to fail. ([source](https://docs.pytest.org/en/stable/how-to/skipping.html))

## Further Reading

- [Playwright Test Retries](https://playwright.dev/docs/test-retries)
- [GitLab Quarantining Tests](https://docs.gitlab.com/development/testing_guide/quarantining_tests/)
- [pytest Skip and xfail](https://docs.pytest.org/en/stable/how-to/skipping.html)
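
## Illustrative Sketch

The triage described under Core Explanation could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `FailureEvidence` record, the `Verdict` labels, the `classify` function, and the 5% failure-rate threshold are all hypothetical names and values chosen for the example, and only a subset of the evidence fields is modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Hypothetical triage labels for a failing check."""
    LIKELY_REGRESSION = "likely_regression"
    SUSPECTED_FLAKY = "suspected_flaky"
    QUARANTINED = "quarantined"
    QUARANTINE_NEEDS_REVIEW = "quarantine_needs_review"


@dataclass
class FailureEvidence:
    """A subset of the evidence fields an agent might collect (assumed shape)."""
    test_id: str
    passed_on_retry: bool
    historical_failure_rate: float  # fraction of recent runs that failed, 0.0 to 1.0
    quarantined: bool = False
    owner: Optional[str] = None
    linked_issue: Optional[str] = None


def classify(ev: FailureEvidence, flaky_rate_threshold: float = 0.05) -> Verdict:
    """Classify a failure from its history; the threshold is an assumed default."""
    if ev.quarantined:
        # A quarantined test still needs an owner and a tracking issue,
        # otherwise the quarantine has no expiry path.
        if ev.owner and ev.linked_issue:
            return Verdict.QUARANTINED
        return Verdict.QUARANTINE_NEEDS_REVIEW
    if ev.passed_on_retry or ev.historical_failure_rate >= flaky_rate_threshold:
        # Passing on retry is diagnostic evidence of instability,
        # not proof that the patch is correct.
        return Verdict.SUSPECTED_FLAKY
    # Fails consistently and has no meaningful failure history.
    return Verdict.LIKELY_REGRESSION


if __name__ == "__main__":
    evidence = FailureEvidence(
        test_id="tests/test_checkout.py::test_total",  # hypothetical test ID
        passed_on_retry=False,
        historical_failure_rate=0.0,
    )
    print(classify(evidence).value)
```

The quarantine branch is checked first so that a quarantined test without an owner or linked issue surfaces for review instead of being silently skipped. A `SUSPECTED_FLAKY` verdict is a prompt for follow-up, not permission to merge.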