# Schema Evolution for Data Pipelines

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Schema evolution is the controlled process of changing data contracts while allowing existing producers, consumers, and stored data to keep working.

## Core Explanation

Data pipelines break when schemas change without compatibility rules. Adding a required field, reusing a Protobuf field number, changing the branches of an Avro union, or dropping a column can break downstream consumers or cause them to misread data without raising an error. Production pipelines use schema registries, compatibility checks, migration windows, and consumer upgrade plans to make schema changes explicit and testable.
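One way to make such changes testable is a pre-merge check that compares the old and new versions of a contract. The sketch below is a minimal illustration, not the algorithm of any particular registry: the `Schema` shape, the `find_breaking_changes` helper, and the deliberately conservative rule set (flag new fields without defaults, removed fields, and type changes) are all assumptions made for this example.

```python
from typing import Any

# Hypothetical, simplified schema shape: field name -> {"type": ..., "default": ...}.
# The "default" key is present only when the field declares a default value.
Schema = dict[str, dict[str, Any]]


def find_breaking_changes(old: Schema, new: Schema) -> list[str]:
    """Return human-readable problems found when moving from `old` to `new`."""
    problems: list[str] = []

    # A new field without a default cannot be filled in for data written earlier.
    for name in sorted(new.keys() - old.keys()):
        if "default" not in new[name]:
            problems.append(f"added field '{name}' has no default")

    # A removed field may still be expected by consumers on the old schema.
    for name in sorted(old.keys() - new.keys()):
        problems.append(f"field '{name}' was removed")

    # Treat every type change as suspicious; real formats allow some promotions.
    for name in sorted(old.keys() & new.keys()):
        old_type, new_type = old[name]["type"], new[name]["type"]
        if old_type != new_type:
            problems.append(
                f"field '{name}' changed type from {old_type} to {new_type}"
            )

    return problems


old_schema: Schema = {
    "order_id": {"type": "string"},
    "amount": {"type": "double"},
}
new_schema: Schema = {
    "order_id": {"type": "string"},
    "amount": {"type": "long"},
    "currency": {"type": "string"},
}

for problem in find_breaking_changes(old_schema, new_schema):
    print(problem)
```

A check like this would typically run in CI and fail the build when the list is non-empty. A real pipeline would more likely delegate the decision to its schema registry's compatibility check, which understands format-specific rules that this sketch ignores.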

## Source-Mapped Facts

- Apache Avro's specification says a reader may read data with a schema different from the writer's schema and defines how schema differences should be resolved. ([source](https://avro.apache.org/docs/1.12.0/specification/))
- Protocol Buffers documentation says deleted field numbers should be added to a reserved list so future developers do not reuse them. ([source](https://protobuf.dev/programming-guides/proto3/))
- Confluent Schema Registry documentation defines schema evolution as safely changing schemas over time while maintaining compatibility with existing producers and consumers. ([source](https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html))
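The Avro record-resolution rules referenced above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Avro library's implementation: `resolve_record` is a hypothetical helper, the datum is assumed to be already decoded into a dict using the writer's schema, and aliases, type promotion, unions, and nested records are left out.

```python
from typing import Any


def resolve_record(
    datum: dict[str, Any],
    writer_schema: dict[str, Any],
    reader_schema: dict[str, Any],
) -> dict[str, Any]:
    """Project a record decoded with the writer's schema onto the reader's schema."""
    writer_names = {field["name"] for field in writer_schema["fields"]}
    resolved: dict[str, Any] = {}

    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_names:
            # Field exists on both sides: keep the writer's value.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field is new to the reader: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            # No writer value and no default: resolution fails.
            raise ValueError(
                f"reader field '{name}' has no default and is absent from the writer schema"
            )

    # Writer fields the reader does not declare are dropped.
    return resolved


writer_v1 = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "legacy_flag", "type": "boolean"},
    ],
}
reader_v2 = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "region", "type": "string", "default": "unknown"},
    ],
}

print(resolve_record({"id": "a1", "legacy_flag": True}, writer_v1, reader_v2))
```

Here the reader ignores `legacy_flag`, which it does not declare, and fills `region` from its default. Removing the default from `region` would make the call raise, which is the kind of failure a compatibility check is meant to catch before deployment.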

## Further Reading

- [Apache Avro specification](https://avro.apache.org/docs/1.12.0/specification/)
- [Protocol Buffers proto3 guide](https://protobuf.dev/programming-guides/proto3/)
- [Confluent schema evolution](https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html)