# Data Catalogs and Metadata Lineage

Status: public
Confidence: medium (0.865) (verified)
Last verified: 2026-06-02
Generation: ai_structured

## TL;DR

Data catalogs help people and agents discover data assets, while metadata lineage records how datasets are produced, transformed, and consumed.

## Core Explanation

A useful catalog combines business metadata, technical metadata, ownership, quality signals, and lineage. Lineage adds the graph structure: jobs consume input datasets, produce output datasets, and emit metadata about runs, schemas, ownership, and versions.

For AI and analytics systems, lineage matters because it answers impact questions before a change ships: which downstream tables, dashboards, models, or agents depend on this dataset?

## Source-Mapped Facts

- OpenLineage documentation says its object model contains Jobs and Datasets and is designed to observe datasets as they move through complex pipelines. ([source](https://openlineage.io/docs/spec/object-model/))
- OpenLineage documentation says a lineage graph can be created by weaving together observations of jobs across multiple platforms. ([source](https://openlineage.io/docs/spec/object-model/))
- DataHub lineage documentation defines data lineage as a map showing how data flows through an organization, including where data originates, travels, and ends up. ([source](https://docs.datahub.com/docs/features/feature-guides/lineage))
- The W3C PROV overview defines provenance as information about entities, activities, and people involved in producing a piece of data or thing. ([source](https://www.w3.org/TR/prov-overview/))

## Further Reading

- [OpenLineage Object Model](https://openlineage.io/docs/spec/object-model/)
- [DataHub Lineage Documentation](https://docs.datahub.com/docs/features/feature-guides/lineage)
- [W3C PROV Overview](https://www.w3.org/TR/prov-overview/)
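
## Illustrative Sketch: Downstream Impact

The impact question raised in the Core Explanation (which downstream assets depend on this dataset?) can be pictured as a graph traversal. The following is a minimal sketch, not the API of OpenLineage, DataHub, or any other catalog: the `JOBS` records, the dataset names, and the helpers `build_edges` and `downstream` are all hypothetical, chosen only to show how job observations with inputs and outputs might be woven into a lineage graph and queried.

```python
from collections import defaultdict, deque

# Hypothetical job observations: each job reads input datasets and writes outputs.
JOBS = [
    {"job": "ingest_orders", "inputs": ["raw.orders"], "outputs": ["staging.orders"]},
    {"job": "ingest_customers", "inputs": ["raw.customers"], "outputs": ["staging.customers"]},
    {"job": "build_revenue", "inputs": ["staging.orders", "staging.customers"], "outputs": ["marts.revenue"]},
    {"job": "train_churn_model", "inputs": ["marts.revenue"], "outputs": ["models.churn"]},
]


def build_edges(jobs):
    """Map each input dataset to the datasets produced by jobs that read it."""
    edges = defaultdict(set)
    for job in jobs:
        for src in job["inputs"]:
            edges[src].update(job["outputs"])
    return edges


def downstream(dataset, edges):
    """Return every dataset reachable from `dataset`, using breadth-first search."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            # Skip datasets already visited and guard against cycles back to the start.
            if nxt not in seen and nxt != dataset:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


EDGES = build_edges(JOBS)
print(downstream("raw.orders", EDGES))
# ['marts.revenue', 'models.churn', 'staging.orders']
```

In this toy graph, a schema change to `raw.orders` would touch the staging table, the revenue mart, and the churn model. A real catalog would typically also track dashboards, column-level lineage, and run metadata, which this sketch leaves out.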