# Agent Queue Backlog and Consumer Lag

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Queue backlog and consumer lag tell agents whether work is arriving faster than workers can safely process it.

## Core Explanation

Agents that diagnose asynchronous systems need queue depth, message age, consumer lag, and delivery capacity before they scale workers or replay jobs. A large backlog can mean normal burst absorption, a blocked consumer, a poison message loop, partition skew, or a downstream dependency failure.
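As a rough illustration of that triage, the sketch below turns one metric window into a list of hypotheses to investigate. The `QueueSnapshot` fields, the `triage` function, and both thresholds are hypothetical names and values chosen for this example, not part of any broker's API. Partition skew is left out because it needs per-partition data.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """One metric window for a single queue (all field names are hypothetical)."""
    depth: int               # messages waiting to be delivered
    oldest_age_s: float      # age of the oldest waiting message, in seconds
    arrival_rate: float      # messages published per second
    completion_rate: float   # messages acknowledged per second
    redelivery_ratio: float  # share of deliveries that were redeliveries (0..1)


def triage(snap: QueueSnapshot, max_age_s: float = 300.0,
           redelivery_limit: float = 0.5) -> list[str]:
    """Return candidate explanations to investigate, not a verdict.

    The default thresholds are placeholders; real values depend on the workload.
    """
    if snap.depth == 0:
        return ["no backlog"]
    if snap.completion_rate == 0:
        # Nothing is being acknowledged, so the consumer side is stuck.
        return ["blocked consumer or downstream dependency failure"]

    hypotheses = []
    if snap.redelivery_ratio >= redelivery_limit:
        hypotheses.append("possible poison message loop")
    if snap.arrival_rate > snap.completion_rate:
        hypotheses.append("arrivals outpacing consumers")
    if snap.oldest_age_s > max_age_s:
        hypotheses.append("oldest message older than threshold")
    # A backlog with no warning signs is treated as ordinary burst absorption.
    return hypotheses or ["normal burst absorption"]
```

For instance, `triage(QueueSnapshot(200, 20, 10, 40, 0.0))` returns `["normal burst absorption"]`: there is a backlog, but consumers are keeping up and messages are young.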

The safe response depends on the queue contract. For some systems the right action is adding consumers; for others it is pausing producers, isolating dead-letter messages, or checking idempotency before replay. Agents should cite the queue, metric window, consumer group, and retry policy that a recommendation is based on.
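One way to picture the dependence on the queue contract is a small lookup from a diagnosed cause to cautious next steps. This is a minimal sketch: `QueueContract`, `plan_response`, the cause labels, and the action strings are all hypothetical, and a real system would need its own rules.

```python
from dataclasses import dataclass


@dataclass
class QueueContract:
    """Hypothetical summary of what the queue and its consumers guarantee."""
    ordered: bool                # must messages be processed in order?
    idempotent_consumers: bool   # is reprocessing a message harmless?
    has_dead_letter_queue: bool  # is a dead-letter queue configured?


def plan_response(cause: str, contract: QueueContract) -> list[str]:
    """Suggest cautious next steps for a diagnosed cause (illustrative only)."""
    if cause == "consumers_too_slow":
        if contract.ordered:
            # Assumption: ordered queues cap useful parallelism at the number
            # of partitions or message groups, so extra consumers may sit idle.
            return ["check partition or message-group count before adding consumers"]
        return ["add consumers"]
    if cause == "downstream_failure":
        return ["pause producers", "wait for dependency recovery"]
    if cause == "poison_message":
        if contract.has_dead_letter_queue:
            return ["isolate failing messages in the dead-letter queue"]
        return ["pause consumers", "quarantine failing messages manually"]
    if cause == "replay_requested":
        if contract.idempotent_consumers:
            return ["replay jobs"]
        return ["verify idempotency before replay"]
    return ["gather more evidence"]
```

The design choice here is that an unknown cause never maps to a scaling or replay action; the fallback is always to collect more evidence first.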

## Source-Mapped Facts

- Amazon SQS documentation lists `ApproximateNumberOfMessagesVisible` as the number of messages available for retrieval from a queue. ([source](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html))
- Confluent documentation includes consumer lag among metrics used to monitor Kafka consumers. ([source](https://docs.confluent.io/platform/current/kafka/monitoring.html))
- RabbitMQ documentation describes consumer capacity as a metric that helps show whether a queue can deliver messages to consumers immediately. ([source](https://www.rabbitmq.com/docs/consumers))
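
For Kafka-style consumers, lag on a partition is commonly computed as the log end offset minus the consumer group's committed offset. The sketch below works on plain dictionaries keyed by partition number; `partition_lag` and `skew_ratio` are hypothetical helpers, and treating a missing committed offset as 0 is a simplifying assumption (real behavior depends on the log start offset and the consumer's reset policy).

```python
def partition_lag(end_offsets: dict[int, int],
                  committed: dict[int, int]) -> dict[int, int]:
    """Lag per partition: log end offset minus committed offset, floored at 0.

    Assumption: a partition with no committed offset is treated as offset 0.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def skew_ratio(lag: dict[int, int]) -> float:
    """Share of total lag held by the worst partition (1.0 means all of it)."""
    total = sum(lag.values())
    return max(lag.values()) / total if total else 0.0


# Example offsets for a three-partition topic (made-up numbers).
end_offsets = {0: 1000, 1: 1200, 2: 5000}
committed = {0: 990, 1: 1195, 2: 1000}

lag = partition_lag(end_offsets, committed)
print(lag)                         # {0: 10, 1: 5, 2: 4000}
print(round(skew_ratio(lag), 3))   # 0.996
```

A skew ratio near 1.0 suggests one partition holds most of the backlog, which points toward partition skew or a single stuck consumer instead of a general capacity shortfall.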

## Further Reading

- [Amazon SQS CloudWatch Metrics](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html)
- [Confluent Kafka Monitoring](https://docs.confluent.io/platform/current/kafka/monitoring.html)
- [RabbitMQ Consumers](https://www.rabbitmq.com/docs/consumers)