Lakehouse Table Formats
Status: public · Confidence: medium (0.865) · Basis: verified_sources
## TL;DR

Lakehouse table formats add transactional metadata, snapshots, and file-level planning semantics on top of data files in object stores or distributed file systems.

## Core Explanation

Object storage is good at storing files, but analytics systems need table semantics: what files belong to a table, which schema applies, what changed in a commit, and what snapshot a reader should see.

Iceberg, Delta Lake, and Hudi solve this through table metadata, transaction logs, timelines, snapshots, manifests, or equivalent structures.

For agents and data infrastructure, these formats matter because they expose stable units for inspection: table metadata, schemas, partitions, snapshots, commits, lineage events, and rollback or time-travel boundaries.

## Source-Mapped Facts

- Apache Iceberg documentation describes Iceberg as a table format specification for managing large, slow-changing collections of files in distributed file systems or key-value stores. ([source](https://iceberg.apache.org/spec/))
- Apache Iceberg documentation says table state is maintained in metadata files and each snapshot represents the state of a table at a point in time. ([source](https://iceberg.apache.org/spec/))
- Delta Lake protocol documentation says the Delta Transaction Protocol brings ACID properties to large collections of data stored as files in distributed file systems or object stores. ([source](https://github.com/delta-io/delta/blob/master/PROTOCOL.md))
- Apache Hudi documentation says Hudi's timeline is an event log that records table actions in ordered form. ([source](https://hudi.apache.org/docs/hudi_stack/))

## Further Reading

- [Apache Iceberg Table Specification](https://iceberg.apache.org/spec/)
- [Delta Transaction Log Protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md)
- [Apache Hudi Stack Documentation](https://hudi.apache.org/docs/hudi_stack/)
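
## Illustrative Sketch

The Core Explanation describes snapshots as the unit a reader pins to, with each commit recording which data files belong to the table. The following is a minimal in-memory sketch of that idea, not the on-disk layout or API of Iceberg, Delta Lake, or Hudi. The names `Snapshot`, `ToyTable`, `commit`, and `files_as_of`, along with the file names, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table at one commit (hypothetical structure)."""
    snapshot_id: int
    schema: tuple[str, ...]
    data_files: frozenset[str]


@dataclass
class ToyTable:
    """Append-only list of snapshots; real formats persist this as metadata files or logs."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, schema, added=(), removed=()):
        # Each commit derives a new snapshot from the previous one
        # instead of mutating it, so older readers stay consistent.
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        files = (current - frozenset(removed)) | frozenset(added)
        snap = Snapshot(len(self.snapshots) + 1, tuple(schema), files)
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, snapshot_id=None):
        # No id means "latest"; an explicit id is a time-travel read.
        if not self.snapshots:
            return frozenset()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(f"unknown snapshot {snapshot_id}")


# Three commits: two appends, then a rewrite that replaces the first file.
table = ToyTable()
table.commit(["id", "amount"], added=["part-0001.parquet"])
table.commit(["id", "amount"], added=["part-0002.parquet"])
table.commit(["id", "amount"], added=["part-0003.parquet"], removed=["part-0001.parquet"])

latest_files = table.files_as_of()      # files visible to a new reader
original_files = table.files_as_of(1)   # files visible when pinned to snapshot 1
```

Because each snapshot is immutable, a reader pinned to snapshot 1 still sees `part-0001.parquet` after the third commit removes it from the current table state. This is the property that makes snapshots usable as time-travel and rollback boundaries.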