Data Profiling and Column Statistics
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Data profiling gives agents column-level evidence about shape, completeness, and distribution before they trust or transform a dataset.

## Core Explanation

Agents often need to decide whether a table is usable for analytics, training, migration, or reporting. A schema alone is not enough: it declares column names and types but says nothing about how the stored values are actually distributed. Profiling can surface null rates, distinct counts, minimum and maximum values, inferred patterns, and other column-level signals.

Useful profiling evidence includes:

- scan time
- column name and type
- null ratio
- distinct count
- detected anomalies
- sample policy (whether the full table or only a sample was scanned)
- whether the profile was generated before or after a pipeline change

## Source-Mapped Facts

- Dataplex documentation says data profiling discovers common statistical characteristics of columns in BigQuery tables. ([source](https://docs.cloud.google.com/dataplex/docs/data-profiling-overview))
- BigQuery documentation says data profile scan results can appear on a source table's Data profile tab. ([source](https://docs.cloud.google.com/bigquery/docs/data-profile-scan))
- The BigQuery `INFORMATION_SCHEMA.COLUMNS` view contains one row for each column in a table. ([source](https://cloud.google.com/bigquery/docs/information-schema-columns))

## Further Reading

- [Dataplex Data Profiling Overview](https://docs.cloud.google.com/dataplex/docs/data-profiling-overview)
- [BigQuery Data Profile Scan](https://docs.cloud.google.com/bigquery/docs/data-profile-scan)
- [BigQuery INFORMATION_SCHEMA COLUMNS](https://cloud.google.com/bigquery/docs/information-schema-columns)
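## Example: Profiling Columns in Memory

The column-level signals described under Core Explanation (null ratio, distinct count, min and max, inferred type, anomalies, scan time, sample policy) can be sketched in plain Python. This is a minimal illustration assuming the table fits in memory as a list of row dictionaries with hashable scalar values; it is not the Dataplex or BigQuery profiling API. `profile_rows`, `ColumnProfile`, the `max_null_ratio` threshold, and the anomaly labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ColumnProfile:
    """Hypothetical per-column profile record; field names are illustrative."""
    column: str
    inferred_type: str
    null_ratio: float
    distinct_count: int
    min_value: Optional[Any]
    max_value: Optional[Any]
    anomalies: list[str]


def profile_rows(rows: list[dict[str, Any]], max_null_ratio: float = 0.5) -> dict[str, Any]:
    """Profile every column seen in `rows` with a full scan (no sampling).

    `max_null_ratio` is a hypothetical threshold for flagging sparse columns.
    Values are assumed to be hashable scalars (str, int, float, ...).
    """
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    profiles = []
    for col in columns:
        # A key missing from a row counts the same as an explicit None.
        present = [row[col] for row in rows if row.get(col) is not None]
        types = {type(v).__name__ for v in present}
        if len(types) == 1:
            inferred = next(iter(types))
        else:
            inferred = "mixed" if types else "unknown"
        try:
            low, high = (min(present), max(present)) if present else (None, None)
        except TypeError:
            # Incomparable types (e.g. str and int) have no common ordering.
            low, high = None, None
        null_ratio = (total - len(present)) / total
        # Hypothetical anomaly rules: too many nulls, or inconsistent value types.
        anomalies = []
        if null_ratio > max_null_ratio:
            anomalies.append("high_null_ratio")
        if inferred == "mixed":
            anomalies.append("mixed_types")
        profiles.append(
            ColumnProfile(col, inferred, null_ratio, len(set(present)), low, high, anomalies)
        )
    return {
        # Record when and how the profile was produced so it can be compared
        # with profiles taken before or after a pipeline change.
        "scanned_at": datetime.now(timezone.utc).isoformat(),
        "row_count": total,
        "sample_policy": "full_scan",
        "columns": profiles,
    }


# Usage with a small illustrative table.
rows = [
    {"id": 1, "country": "DE", "amount": 10.5},
    {"id": 2, "country": None, "amount": 7.0},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": "FR", "amount": 12.25},
]
report = profile_rows(rows)
for p in report["columns"]:
    print(p.column, p.inferred_type, p.null_ratio, p.distinct_count,
          p.min_value, p.max_value, p.anomalies)
```

Storing `scanned_at` and `sample_policy` next to the per-column numbers is what makes two profiles comparable, for example before and after a pipeline change. A warehouse-scale profiler would read from the table itself and might sample large tables rather than scanning them fully.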