Data Catalog Glossaries and Column Definitions
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Glossaries and column definitions help agents distinguish business meaning from raw field names, especially in analytics and RAG over data catalogs.

## Core Explanation

Column names are often too terse for reliable agent reasoning. A field named `status`, `amount`, or `created_at` may mean different things across domains. A data catalog glossary gives the agent a controlled vocabulary for business terms, while field descriptions and semantic models connect those terms to concrete datasets.

Useful evidence includes term name, approved definition, synonyms, steward, mapped assets, mapped columns, semantic-model measures, lineage, and last review time. That metadata helps an agent answer "what does this metric mean" before it writes SQL, summarizes a dashboard, or generates a data-quality rule.

Agents should still treat glossary entries as governed context, not as proof that a dataset is correct. The definition may be stale, unmapped to the field being queried, or inconsistent with a downstream metric layer.

## Source-Mapped Facts

- OpenMetadata documentation describes glossary terms as shared definitions used for data assets. ([source](https://docs.open-metadata.org/latest/how-to-guides/data-governance/glossary))
- DataHub documentation provides a business glossary ingestion source for ingesting glossary terms. ([source](https://docs.datahub.com/docs/generated/ingestion/sources/business-glossary))
- dbt documentation says semantic models contain entities, dimensions, and measures. ([source](https://docs.getdbt.com/docs/build/semantic-models))

## Further Reading

- [OpenMetadata Glossary](https://docs.open-metadata.org/latest/how-to-guides/data-governance/glossary)
- [DataHub Business Glossary Source](https://docs.datahub.com/docs/generated/ingestion/sources/business-glossary)
- [dbt Semantic Models](https://docs.getdbt.com/docs/build/semantic-models)
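
## Illustrative Sketch

The Core Explanation lists the evidence an agent can draw on and two ways a glossary entry can mislead: a stale definition, or a term that is not mapped to the field being queried. A minimal Python sketch of that check might look like the following. The `GlossaryTerm` record, the `caveats_for` helper, and the 365-day review threshold are all hypothetical and are not taken from OpenMetadata, DataHub, or dbt; a real catalog would expose these fields through its own API.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class GlossaryTerm:
    """Hypothetical record holding a subset of the evidence fields listed above."""
    name: str
    definition: str
    steward: str
    last_reviewed: date
    synonyms: list[str] = field(default_factory=list)
    mapped_columns: list[str] = field(default_factory=list)  # e.g. "orders.status"


def caveats_for(term: GlossaryTerm, column: str, today: date,
                max_age_days: int = 365) -> list[str]:
    """Return reasons an agent should hedge before applying `term` to `column`."""
    caveats = []
    # The term may be defined but never mapped to the field being queried.
    if column not in term.mapped_columns:
        caveats.append(f"term '{term.name}' is not mapped to {column}")
    # The definition may be stale (threshold is an assumed policy, not a standard).
    if today - term.last_reviewed > timedelta(days=max_age_days):
        caveats.append(f"definition last reviewed {term.last_reviewed.isoformat()}")
    return caveats
```

An agent could call `caveats_for(term, "tickets.status", today=date.today())` before writing SQL and surface any returned caveats alongside its answer, rather than treating the glossary definition as proof of what the column contains.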