# Data Column Pruning and File Statistics
Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-03
Generation: ai_structured


## TL;DR

Before recommending more compute for a slow analytical query, data agents should check which columns the query projects and whether file statistics let the engine skip data.

## Core Explanation

Columnar engines can avoid reading columns that a query does not reference, a technique usually called column (or projection) pruning. Table and file statistics, such as per-column minimum and maximum values, can also let engines skip row groups, files, or partitions that cannot match a predicate. Both optimizations depend on the physical layout of the data and on whether the generated SQL preserves filter and projection opportunities.
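To make statistics-based skipping concrete, the following is a minimal sketch in plain Python. It assumes each row group carries min/max statistics for the filtered column; the `RowGroupStats` class, its fields, and the sample values are hypothetical illustrations, not the API of Parquet, Iceberg, or any engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group."""
    row_group: int
    min_value: int
    max_value: int


def row_groups_to_read(stats, lower, upper):
    """Keep only row groups whose [min, max] range overlaps [lower, upper].

    A row group with no overlap cannot contain a matching row,
    so a reader can skip it without decoding any data pages.
    """
    return [
        s.row_group
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics for a column that is sorted across row groups.
stats = [
    RowGroupStats(row_group=0, min_value=1, max_value=100),
    RowGroupStats(row_group=1, min_value=101, max_value=200),
    RowGroupStats(row_group=2, min_value=201, max_value=300),
]

# Predicate: value BETWEEN 150 AND 180
print(row_groups_to_read(stats, 150, 180))  # → [1]
```

The sketch also shows why layout matters: if the column were unsorted, every row group's range would likely span the whole domain and nothing could be skipped.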

Agents generating warehouse SQL should record the selected columns, scanned bytes, file format, table metadata age, and explain-plan evidence for each query. A query that uses `SELECT *`, or that wraps partition columns in expressions the planner cannot push down (for example, applying a function to the partition column inside the `WHERE` clause), may defeat pruning and make the data system look slower than it is.
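One way an agent might capture this evidence and flag pruning hazards is sketched below. The `QueryEvidence` record, the `audit` function, and the regex heuristics are assumptions made for illustration. They are not a SQL parser and will miss or misreport some queries, so explain-plan output should remain the primary evidence.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryEvidence:
    """Hypothetical record of what an agent logs for each generated query."""
    sql: str
    scanned_bytes: int
    file_format: str
    metadata_age_hours: float


# Heuristic: `SELECT *` or `SELECT t.*` directly after the keyword.
SELECT_STAR = re.compile(r"\bselect\s+(?:\w+\.)?\*", re.IGNORECASE)

# Keywords that can precede "(" without being a function call.
NOT_FUNCTIONS = {"and", "or", "not", "in", "where", "on", "select", "from"}


def wraps_column(sql, column):
    """Heuristic: True if `column` is the first argument of a call like DATE(col)."""
    pattern = re.compile(
        r"\b(\w+)\s*\(\s*" + re.escape(column) + r"\b", re.IGNORECASE
    )
    return any(
        m.group(1).lower() not in NOT_FUNCTIONS for m in pattern.finditer(sql)
    )


def audit(evidence, partition_columns):
    """Return warning labels for patterns that may defeat pruning."""
    warnings = []
    if SELECT_STAR.search(evidence.sql):
        warnings.append("select_star")
    for col in partition_columns:
        if wraps_column(evidence.sql, col):
            warnings.append(f"wrapped_partition_column:{col}")
    return warnings


# Illustrative values only; table and column names are made up.
risky = QueryEvidence(
    sql="SELECT * FROM events WHERE DATE(event_ts) = '2026-01-01'",
    scanned_bytes=12_000_000_000,
    file_format="parquet",
    metadata_age_hours=36.0,
)
safe = QueryEvidence(
    sql="SELECT user_id, event_ts FROM events WHERE event_ts >= '2026-01-01'",
    scanned_bytes=300_000_000,
    file_format="parquet",
    metadata_age_hours=36.0,
)

print(audit(risky, ["event_ts"]))
print(audit(safe, ["event_ts"]))
```

Keeping the warnings next to the scanned-bytes figure lets a reviewer see whether a slow query was limited by compute or by a rewrite that prevented pruning.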

## Source-Mapped Facts

- BigQuery performance documentation recommends querying only the columns needed instead of using `SELECT *`. ([source](https://docs.cloud.google.com/bigquery/docs/best-practices-performance-compute))
- Apache Parquet documentation describes file metadata that includes row groups and column chunks. ([source](https://parquet.apache.org/docs/file-format/metadata/))
- Apache Iceberg documentation says table metadata can be used to plan efficient scans and avoid reading unnecessary files. ([source](https://iceberg.apache.org/docs/latest/performance/))

## Further Reading

- [BigQuery Performance Best Practices](https://docs.cloud.google.com/bigquery/docs/best-practices-performance-compute)
- [Apache Parquet Metadata](https://parquet.apache.org/docs/file-format/metadata/)
- [Apache Iceberg Performance](https://iceberg.apache.org/docs/latest/performance/)