Streaming Watermarks and Late Data
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Watermarks and late-data policies define when a streaming pipeline believes a window is complete, and what happens when older events arrive afterward.

## Core Explanation

Data infrastructure for agents often includes fresh event streams, metrics, logs, and feature updates. Event time and processing time can diverge when devices buffer events, networks delay messages, or backfills replay old data.

Agents querying streaming data need to understand whether a metric is final, provisional, or still accepting late arrivals. Otherwise they may summarize incomplete data as if it were settled truth.

## Source-Mapped Facts

- Apache Beam documentation describes watermarks and late data as part of Beam windowing behavior. ([source](https://beam.apache.org/documentation/basics/))
- Kafka Streams documentation says a grace period controls how long Kafka Streams waits for out-of-order data records for a window. ([source](https://kafka.apache.org/30/streams/core-concepts/))
- Google Cloud Dataflow streaming documentation discusses event time, watermarks, windows, and late data for streaming pipelines. ([source](https://docs.cloud.google.com/dataflow/docs/concepts/streaming-pipelines))

## Further Reading

- [Apache Beam Basics](https://beam.apache.org/documentation/basics/)
- [Kafka Streams Core Concepts](https://kafka.apache.org/30/streams/core-concepts/)
- [Google Cloud Dataflow Streaming Pipelines](https://docs.cloud.google.com/dataflow/docs/concepts/streaming-pipelines)
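
The distinction drawn in the Core Explanation between final, provisional, and still-accepting-late windows can be sketched in a few lines. This is a simplified, engine-agnostic model, not the API of Beam, Kafka Streams, or Dataflow: the names `Window`, `WindowStatus`, `window_status`, `accept_event`, and the `allowed_lateness` parameter are hypothetical, times are plain integer seconds of event time, and the boundary conventions (exclusive window end, strict comparison against the watermark) are assumptions that real engines may handle differently.

```python
from dataclasses import dataclass
from enum import Enum


class WindowStatus(Enum):
    PROVISIONAL = "provisional"        # watermark has not yet reached the window end
    ACCEPTING_LATE = "accepting_late"  # window end passed, lateness allowance not yet used up
    FINAL = "final"                    # lateness allowance expired; results are settled


@dataclass(frozen=True)
class Window:
    start: int  # inclusive, event-time seconds (assumed unit)
    end: int    # exclusive, event-time seconds


def window_status(window: Window, watermark: int, allowed_lateness: int) -> WindowStatus:
    """Classify a window by comparing the watermark with the window end."""
    if watermark < window.end:
        return WindowStatus.PROVISIONAL
    if watermark < window.end + allowed_lateness:
        return WindowStatus.ACCEPTING_LATE
    return WindowStatus.FINAL


def accept_event(event_time: int, window: Window, watermark: int, allowed_lateness: int) -> bool:
    """Return True if an event falls in the window and the window is not yet final."""
    in_window = window.start <= event_time < window.end
    status = window_status(window, watermark, allowed_lateness)
    return in_window and status is not WindowStatus.FINAL


if __name__ == "__main__":
    minute = Window(start=0, end=60)
    for wm in (30, 65, 75):
        print(wm, window_status(minute, wm, allowed_lateness=10).value)
```

An agent reading a metric from such a pipeline could attach the status to its answer, for example reporting a count as provisional until the window reaches `FINAL`, rather than presenting it as settled.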