Agent Queue Backlog and Consumer Lag
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Queue backlog and consumer lag tell agents whether work is arriving faster than workers can safely process it.

## Core Explanation

Agents that diagnose asynchronous systems need queue depth, message age, consumer lag, and delivery capacity before they scale workers or replay jobs. A large backlog can mean normal burst absorption, a blocked consumer, a poison message loop, partition skew, or a downstream dependency failure.

The safe response depends on the queue contract. For some systems the right action is adding consumers; for others it is pausing producers, isolating dead-letter messages, or checking idempotency before replay. Agents should cite the queue, metric window, consumer group, and retry policy involved.

## Source-Mapped Facts

- Amazon SQS documentation lists ApproximateNumberOfMessagesVisible as the number of messages available for retrieval from a queue. ([source](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html))
- Confluent documentation includes consumer lag among metrics used to monitor Kafka consumers. ([source](https://docs.confluent.io/platform/current/kafka/monitoring.html))
- RabbitMQ documentation describes consumer capacity as a metric that helps show whether a queue can deliver messages to consumers immediately. ([source](https://www.rabbitmq.com/docs/consumers))

## Further Reading

- [Amazon SQS CloudWatch Metrics](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-available-cloudwatch-metrics.html)
- [Confluent Kafka Monitoring](https://docs.confluent.io/platform/current/kafka/monitoring.html)
- [RabbitMQ Consumers](https://www.rabbitmq.com/docs/consumers)
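
## Example Sketch

The triage reasoning in the Core Explanation can be sketched as a small decision function. This is a minimal illustration, not a reference implementation: `QueueSnapshot`, `Action`, `triage`, every field name, and every threshold are hypothetical and would need to be mapped onto the metrics a specific broker actually exposes. The mapping from each cause to a response (for example, pausing producers when a downstream dependency is unhealthy) is also an assumption, since the safe response depends on the queue contract.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MONITOR = "monitor"
    ADD_CONSUMERS = "add_consumers"
    INSPECT_CONSUMERS = "inspect_consumers"
    ISOLATE_POISON_MESSAGES = "isolate_poison_messages"
    REBALANCE_PARTITIONS = "rebalance_partitions"
    PAUSE_PRODUCERS = "pause_producers"


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical, broker-agnostic view of one queue over one metric window."""
    queue: str
    consumer_group: str
    window_seconds: int
    retry_policy: str
    backlog: int                 # messages waiting (queue depth or summed lag)
    oldest_message_age_s: float  # age of the oldest unprocessed message
    arrival_rate: float          # messages/s produced during the window
    completion_rate: float       # messages/s successfully processed
    redelivery_ratio: float      # share of deliveries that were redeliveries (0..1)
    max_partition_share: float   # largest partition's share of the backlog (0..1)
    downstream_healthy: bool

    def citation(self) -> str:
        # The context an agent should cite alongside any recommendation.
        return (f"queue={self.queue} group={self.consumer_group} "
                f"window={self.window_seconds}s retry_policy={self.retry_policy}")


def triage(s: QueueSnapshot, *, max_age_s: float = 300.0,
           redelivery_limit: float = 0.5, skew_limit: float = 0.8) -> Action:
    """Map a snapshot to a suggested next step. Thresholds are illustrative."""
    if s.backlog == 0 or s.oldest_message_age_s < max_age_s:
        return Action.MONITOR                  # likely normal burst absorption
    if not s.downstream_healthy:
        return Action.PAUSE_PRODUCERS          # more consumers would not help
    if s.redelivery_ratio >= redelivery_limit:
        return Action.ISOLATE_POISON_MESSAGES  # same messages keep failing
    if s.completion_rate == 0:
        return Action.INSPECT_CONSUMERS        # consumers appear blocked
    if s.max_partition_share >= skew_limit:
        return Action.REBALANCE_PARTITIONS     # one partition holds the backlog
    if s.arrival_rate > s.completion_rate:
        return Action.ADD_CONSUMERS            # sustained throughput deficit
    return Action.MONITOR                      # old backlog, but it is draining


snapshot = QueueSnapshot(
    queue="orders", consumer_group="billing", window_seconds=300,
    retry_policy="max 5 receives, then dead-letter queue",
    backlog=10_000, oldest_message_age_s=900.0,
    arrival_rate=50.0, completion_rate=20.0,
    redelivery_ratio=0.05, max_partition_share=0.3, downstream_healthy=True,
)
print(triage(snapshot).value, "|", snapshot.citation())
```

The checks are ordered so that causes where extra consumers would not help (a failing dependency, a poison message loop, blocked consumers, partition skew) are ruled out before scaling is suggested. Replay is deliberately left out of the sketch, because it should only follow an idempotency check.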