# Spark Adaptive Query Execution and EXPLAIN Plans

Status: public · Confidence: medium (0.685) · Basis: verified_sources
## TL;DR

Spark Adaptive Query Execution (AQE) and EXPLAIN plans help an agent tell whether a performance problem comes from the logical query itself or from factors outside it, such as runtime reoptimization, data skew, shuffle volume, or stale table statistics.

## Core Explanation

A change in Spark SQL performance can come from the plan chosen before execution, from the adaptive choices AQE makes while the query runs, or from both. Before recommending a join change, repartitioning, or caching, an agent should compare the EXPLAIN output, SQL UI metrics, any adaptive plan changes, and the file-level input data.
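A minimal sketch of one such comparison, assuming the agent has captured two plan strings as text: one printed before the query runs and one printed after it finishes. The helper names and the sample fragments in the usage note are hypothetical and heavily abbreviated; the join operator names are Spark physical operators. It also assumes that, after execution, an AQE plan that changed is printed with an `== Initial Plan ==` section after the final one, so only the text before that marker is treated as the final plan.

```python
import re

# Spark physical join operators that commonly appear in plan text.
JOIN_OPERATORS = (
    "BroadcastHashJoin",
    "SortMergeJoin",
    "ShuffledHashJoin",
    "BroadcastNestedLoopJoin",
    "CartesianProduct",
)

_JOIN_RE = re.compile(r"\b(" + "|".join(JOIN_OPERATORS) + r")\b")


def join_strategies(plan_text: str) -> list[str]:
    """Return join operator names in the order they appear in the plan text."""
    return _JOIN_RE.findall(plan_text)


def final_plan_section(plan_text: str) -> str:
    """Drop an '== Initial Plan ==' section, if present (assumed AQE layout)."""
    head, _sep, _tail = plan_text.partition("== Initial Plan ==")
    return head


def join_strategy_changes(initial_plan: str, final_plan: str) -> list[tuple[str, str]]:
    """Pair joins by position and report (before, after) where the operator changed.

    Assumes both plans list the same joins in the same order; this holds for
    simple queries but is not guaranteed once AQE rewrites the tree.
    """
    before = join_strategies(initial_plan)
    after = join_strategies(final_plan_section(final_plan))
    return [(b, a) for b, a in zip(before, after) if b != a]
```

With a hand-written pre-execution fragment such as `"AdaptiveSparkPlan isFinalPlan=false\n+- SortMergeJoin ..."` and a post-execution fragment whose final section contains `BroadcastHashJoin`, the helper reports `[("SortMergeJoin", "BroadcastHashJoin")]`. That points at a runtime switch by AQE rather than at the SQL text. Positional pairing keeps the sketch short; a real comparison would need to match joins by their keys.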

Useful evidence includes:

- the SQL text and the EXPLAIN mode used
- the Spark version and AQE flags
- runtime statistics and shuffle partition counts
- the join strategy and any skewed partitions
- table statistics
- the final adaptive physical plan
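One way to keep this evidence together is a small record that reports what is still missing before any recommendation is made. This is a hypothetical structure: `PlanEvidence`, its field names, and `missing()` are illustrative, not a Spark API. The configuration keys are real Spark SQL settings, though their defaults vary by Spark version.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

# Real Spark SQL configuration keys that affect AQE, shuffles, and join choice.
AQE_CONFIG_KEYS = (
    "spark.sql.adaptive.enabled",
    "spark.sql.adaptive.coalescePartitions.enabled",
    "spark.sql.adaptive.skewJoin.enabled",
    "spark.sql.shuffle.partitions",
    "spark.sql.autoBroadcastJoinThreshold",
)


@dataclass
class PlanEvidence:
    """Hypothetical container for evidence gathered before tuning a query."""
    sql_text: Optional[str] = None
    explain_mode: Optional[str] = None
    spark_version: Optional[str] = None
    aqe_flags: dict[str, str] = field(default_factory=dict)
    runtime_stats: Optional[dict[str, int]] = None
    shuffle_partition_counts: Optional[list[int]] = None
    join_strategy: Optional[str] = None
    skewed_partitions: Optional[list[int]] = None
    table_stats: Optional[dict[str, int]] = None
    final_adaptive_plan: Optional[str] = None

    def missing(self) -> list[str]:
        """Name every evidence field still unset, then every AQE key not captured."""
        gaps = [
            f.name
            for f in fields(self)
            if f.name != "aqe_flags" and getattr(self, f.name) is None
        ]
        gaps += [key for key in AQE_CONFIG_KEYS if key not in self.aqe_flags]
        return gaps
```

In PySpark, `spark.version` and `spark.conf.get(key)` could supply the version string and flag values. An empty `missing()` result is a precondition for a recommendation, not a guarantee that the evidence is sufficient.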

## Source-Mapped Facts

- Apache Spark documentation lists Adaptive Query Execution among Spark SQL performance tuning features. ([source](https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution))
- Apache Spark documentation describes runtime statistics as visible in the SQL UI while a query is running. ([source](https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution))
- Apache Spark SQL reference says EXPLAIN provides logical or physical plans for an input statement and defaults to physical plan information. ([source](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html))
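Because EXPLAIN returns physical plan text by default, an agent can check which stage of AQE a captured plan reflects before drawing conclusions from it. The sketch below assumes that an AQE plan's root line reads `AdaptiveSparkPlan isFinalPlan=true` or `isFinalPlan=false`, and that shuffle reads rewritten by AQE appear as `AQEShuffleRead` followed by `coalesced`, `skewed`, `coalesced and skewed`, or `local`. This node name is assumed from Spark 3.2 and later; earlier 3.x releases printed a differently named node. `AdaptiveSnapshot` and `read_snapshot` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

_ROOT_RE = re.compile(r"AdaptiveSparkPlan\s+isFinalPlan=(true|false)")
# Longest alternative first so "coalesced and skewed" is not cut to "coalesced".
_READ_RE = re.compile(r"AQEShuffleRead\s+(coalesced and skewed|coalesced|skewed|local)")


@dataclass
class AdaptiveSnapshot:
    adaptive: bool            # root node is AdaptiveSparkPlan
    final: Optional[bool]     # None when the plan is not adaptive
    shuffle_reads: list[str]  # descriptors printed on AQEShuffleRead nodes


def read_snapshot(plan_text: str) -> AdaptiveSnapshot:
    """Classify captured plan text as non-adaptive, initial adaptive, or final adaptive."""
    root = _ROOT_RE.search(plan_text)
    if root is None:
        return AdaptiveSnapshot(adaptive=False, final=None, shuffle_reads=[])
    return AdaptiveSnapshot(
        adaptive=True,
        final=root.group(1) == "true",
        shuffle_reads=_READ_RE.findall(plan_text),
    )
```

A snapshot with `final=False` describes the plan before runtime statistics were applied, so partition coalescing or skew handling will not show up in it yet. The SQL UI and a post-execution plan are the places to look for those.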

## Further Reading

- [Spark SQL Performance Tuning](https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution)
- [Spark SQL EXPLAIN](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html)