Data Hudi Timeline and Incremental Queries
Status: public · Confidence: medium (0.685) · Basis: verified_sources
## TL;DR

Hudi agents should treat the timeline and commit instants as first-class evidence for incremental pipeline bugs.

## Core Explanation

Apache Hudi tracks writes, compactions, clustering, cleaning, rollbacks, savepoints, and restores on a timeline. Incremental consumers use commit boundaries to read changed records, so a bad checkpoint, archived instant, or unfinished table service can explain missing or duplicated downstream rows.

Agents should collect table type, latest completed instant, requested and inflight actions, archived timeline range, incremental query start and end instants, CDC settings, compaction state, and cleaner policy before proposing reprocessing.

## Source-Mapped Facts

- Apache Hudi documentation says changes to table state are recorded as actions in the Hudi timeline. ([source](https://hudi.apache.org/docs/timeline/))
- Apache Hudi documentation says the Hudi timeline is a log of all actions performed on the table at different instants. ([source](https://hudi.apache.org/docs/timeline/))
- Apache Hudi SQL Queries documentation says incremental queries are useful for obtaining the latest values for records changed after a given commit time. ([source](https://hudi.apache.org/docs/sql_queries/))

## Further Reading

- [Apache Hudi Timeline](https://hudi.apache.org/docs/timeline/)
- [Apache Hudi SQL Queries](https://hudi.apache.org/docs/sql_queries/)
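
## Example: Timeline Triage Sketch

The evidence-gathering step described under Core Explanation can be sketched in plain Python. This is a minimal illustration, not a Hudi API: the `Instant` record, the `triage` function, and the set of write actions are hypothetical names introduced here, and the sketch assumes the active timeline has already been listed by some other means and that all instant times share the same fixed-width format, so string order matches time order. The option keys in `incremental_read_options` are the Spark datasource read options as documented for Hudi 0.x releases; verify them against the Hudi version in use before relying on them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Instant:
    """Hypothetical record for one timeline entry, however it was collected."""
    timestamp: str  # instant time, e.g. "20240101120000" (fixed width assumed)
    action: str     # e.g. "commit", "deltacommit", "compaction", "clean"
    state: str      # "requested", "inflight", or "completed"


# Assumption: these actions are the ones that produce data an incremental reader sees.
WRITE_ACTIONS = {"commit", "deltacommit", "replacecommit"}


def triage(active: List[Instant], checkpoint: str) -> Dict[str, object]:
    """Summarize timeline evidence relevant to an incremental consumer's checkpoint."""
    completed_writes = sorted(
        i.timestamp
        for i in active
        if i.state == "completed" and i.action in WRITE_ACTIONS
    )
    pending = sorted(
        (i for i in active if i.state != "completed"),
        key=lambda i: i.timestamp,
    )
    earliest_active = min((i.timestamp for i in active), default=None)
    return {
        "latest_completed_write": completed_writes[-1] if completed_writes else None,
        # Requested or inflight actions may point to an unfinished table service.
        "pending": [(i.timestamp, i.action, i.state) for i in pending],
        # A checkpoint older than the active timeline may refer to archived instants.
        "checkpoint_before_active_timeline": (
            earliest_active is not None and checkpoint < earliest_active
        ),
        # A checkpoint past the newest completed write suggests a bad checkpoint.
        "checkpoint_ahead_of_latest_write": (
            bool(completed_writes) and checkpoint > completed_writes[-1]
        ),
    }


def incremental_read_options(begin: str, end: Optional[str] = None) -> Dict[str, str]:
    """Build Spark read options for an incremental query (keys per Hudi 0.x docs)."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin,
    }
    if end is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end
    return opts
```

A report with a non-empty `pending` list or with `checkpoint_before_active_timeline` set is a reason to inspect the table further before proposing reprocessing; it does not by itself prove data loss or duplication.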