Data Warehouse Partition Pruning and Clustering
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Partition pruning and clustering determine whether a warehouse query scans the right slice of data or pays for unnecessary work.

## Core Explanation

Data warehouses use physical layout and metadata to reduce scan cost. Partition pruning skips partitions that cannot match a query. Clustering and sort keys organize related rows so filters can skip or read fewer blocks.

Agents should inspect actual query predicates and execution metadata before recommending partition or clustering changes. A good recommendation names the table, filter pattern, scan bytes, existing layout, and maintenance cost.

## Source-Mapped Facts

- BigQuery documentation describes partition pruning as scanning only relevant partitions when filters use the partitioning column. ([source](https://cloud.google.com/bigquery/docs/querying-partitioned-tables))
- Snowflake documentation describes clustering keys as a way to co-locate similar rows in the same micro-partitions. ([source](https://docs.snowflake.com/en/user-guide/tables-clustering-keys))
- Amazon Redshift documentation says sort keys determine the order in which rows are stored. ([source](https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html))

## Further Reading

- [BigQuery Query Partitioned Tables](https://cloud.google.com/bigquery/docs/querying-partitioned-tables)
- [Snowflake Clustering Keys](https://docs.snowflake.com/en/user-guide/tables-clustering-keys)
- [Amazon Redshift Sort Keys](https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html)
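
## Illustrative Sketch

The pruning and clustering behavior described in the Core Explanation can be sketched as a toy model in Python. This is a simplified illustration, not any vendor's actual algorithm: the `Partition` class, the `events` table, the byte sizes, and the helper names (`prune`, `scan_bytes`, `block_ranges`, `blocks_read`) are all hypothetical. The only assumption is that the engine keeps a minimum and maximum value per partition or block and skips any whose range cannot overlap the filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    """Hypothetical partition metadata: covered value range and size."""
    name: str
    min_day: date
    max_day: date
    size_bytes: int


def prune(partitions, lo, hi):
    """Keep only partitions whose [min_day, max_day] overlaps [lo, hi]."""
    return [p for p in partitions if p.max_day >= lo and p.min_day <= hi]


def scan_bytes(partitions):
    """Total bytes that would be read from the given partitions."""
    return sum(p.size_bytes for p in partitions)


def block_ranges(values, block_size):
    """Split values into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b)) for b in blocks]


def blocks_read(ranges, lo, hi):
    """Count blocks whose (min, max) range overlaps the filter [lo, hi]."""
    return sum(1 for bmin, bmax in ranges if bmax >= lo and bmin <= hi)


# Three daily partitions of a hypothetical `events` table (sizes are made up).
PARTITIONS = [
    Partition("events_20240101", date(2024, 1, 1), date(2024, 1, 1), 400),
    Partition("events_20240102", date(2024, 1, 2), date(2024, 1, 2), 250),
    Partition("events_20240103", date(2024, 1, 3), date(2024, 1, 3), 350),
]

# A filter on the partitioning column lets the first partition be skipped.
kept = prune(PARTITIONS, date(2024, 1, 2), date(2024, 1, 3))
print([p.name for p in kept])  # ['events_20240102', 'events_20240103']
print(scan_bytes(kept), "of", scan_bytes(PARTITIONS), "bytes")  # 600 of 1000 bytes

# Clustering: the same rows, stored unordered versus sorted on the filter column.
VALUES = [5, 1, 9, 12, 7, 2, 8, 10, 6, 3, 4, 11]
print(blocks_read(block_ranges(VALUES, 4), 1, 4))          # 3 (every block overlaps)
print(blocks_read(block_ranges(sorted(VALUES), 4), 1, 4))  # 1 (only the first block)
```

In this toy model the filter has the same selectivity in both clustering cases; only the row order changes how many blocks must be read. That is why a recommendation should be checked against the actual filter pattern and the scan bytes reported by the warehouse rather than assumed from the table definition.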