# Data Profiling and Column Statistics

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Data profiling gives agents column-level evidence about shape, completeness, and distribution before they trust or transform a dataset.

## Core Explanation

Agents often need to decide whether a table is usable for analytics, training, migration, or reporting. Schema alone is not enough: it records declared column names and types, not what the values actually look like. Profiling can surface null rates, distinct-value counts, minimum and maximum values, inferred patterns, and other column-level signals.
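As a rough illustration of these signals, the sketch below computes a null ratio, distinct count, minimum, and maximum for each column of a small in-memory table. It is not how Dataplex or BigQuery implement profiling; the function names `profile_column` and `profile_table` and the sample rows are hypothetical, and it assumes `None` marks a null and that the non-null values in a column are hashable and mutually comparable.

```python
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Compute basic statistics for one column; None is treated as null."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_ratio = (total - len(non_null)) / total if total else 0.0
    return {
        "row_count": total,
        "null_ratio": null_ratio,
        "distinct_count": len(set(non_null)),
        # min/max assume the non-null values can be compared with each other.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in at least one row."""
    columns = sorted({key for row in rows for key in row})
    # A key missing from a row counts as null for that column.
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, for illustration only.
rows = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": None},
    {"user_id": 3, "country": "DE"},
    {"user_id": 4, "country": "FR"},
]
table_profile = profile_table(rows)
```

Here the schema would only say that `country` is a string column. The profile adds that a quarter of its values are null and that it holds two distinct values, which is the kind of evidence an agent can act on.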

Useful profiling evidence includes scan time, column name, type, null ratio, distinct count, detected anomalies, sample policy, and whether the profile was generated before or after a pipeline change.
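One way to hold this evidence together is a small record per column, plus a comparison between the profiles taken before and after a pipeline change. The sketch below is a minimal illustration under assumptions: the `ColumnProfile` class, its field names, the `null_ratio_regressions` helper, and the 0.05 tolerance are all hypothetical and do not correspond to any Dataplex or BigQuery schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ColumnProfile:
    """Hypothetical evidence record for one profiled column."""
    scan_time: datetime
    column_name: str
    column_type: str
    null_ratio: float
    distinct_count: int
    sample_policy: str      # e.g. "full" or "10% random sample"
    pipeline_stage: str     # "before" or "after" the pipeline change
    anomalies: tuple[str, ...] = ()


def null_ratio_regressions(
    before: list[ColumnProfile],
    after: list[ColumnProfile],
    tolerance: float = 0.05,  # assumed threshold, tune per dataset
) -> list[str]:
    """Return names of columns whose null ratio rose by more than tolerance."""
    before_by_name = {p.column_name: p for p in before}
    flagged = []
    for profile in after:
        previous = before_by_name.get(profile.column_name)
        if previous is not None and profile.null_ratio - previous.null_ratio > tolerance:
            flagged.append(profile.column_name)
    return flagged


# Hypothetical profiles, for illustration only.
now = datetime.now(timezone.utc)
before = [
    ColumnProfile(now, "user_id", "INT64", 0.0, 1000, "full", "before"),
    ColumnProfile(now, "country", "STRING", 0.02, 40, "full", "before"),
]
after = [
    ColumnProfile(now, "user_id", "INT64", 0.0, 1000, "full", "after"),
    ColumnProfile(now, "country", "STRING", 0.30, 38, "full", "after"),
]
flagged = null_ratio_regressions(before, after)
```

Recording the sample policy and pipeline stage alongside the statistics matters because two profiles are only comparable when they were produced under the same sampling conditions.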

## Source-Mapped Facts

- Dataplex documentation says data profiling discovers common statistical characteristics of columns in BigQuery tables. ([source](https://docs.cloud.google.com/dataplex/docs/data-profiling-overview))
- BigQuery documentation says data profile scan results can appear on a source table's Data profile tab. ([source](https://docs.cloud.google.com/bigquery/docs/data-profile-scan))
- BigQuery documentation says the `INFORMATION_SCHEMA.COLUMNS` view contains one row for each column in a table. ([source](https://cloud.google.com/bigquery/docs/information-schema-columns))

## Further Reading

- [Dataplex Data Profiling Overview](https://docs.cloud.google.com/dataplex/docs/data-profiling-overview)
- [BigQuery Data Profile Scan](https://docs.cloud.google.com/bigquery/docs/data-profile-scan)
- [BigQuery INFORMATION_SCHEMA COLUMNS](https://cloud.google.com/bigquery/docs/information-schema-columns)