# Schema Evolution for Data Pipelines
Status: public · Confidence: medium (0.865) · Basis: verified_sources
## TL;DR

Schema evolution is the controlled process of changing data contracts while allowing existing producers, consumers, and stored data to keep working.

## Core Explanation

Data infrastructure breaks when schemas change without compatibility rules. Adding a required field, reusing a Protobuf field number, changing an Avro union, or dropping a column can silently invalidate the assumptions that downstream consumers rely on. Production pipelines use schema registries, compatibility checks, migration windows, and consumer upgrade plans to make schema changes explicit and testable.

## Source-Mapped Facts

- Apache Avro's specification says a reader may read data with a schema different from the writer's schema and defines how schema differences should be resolved. ([source](https://avro.apache.org/docs/1.12.0/specification/))
- Protocol Buffers documentation says deleted field numbers should be added to a reserved list so future developers do not reuse them. ([source](https://protobuf.dev/programming-guides/proto3/))
- Confluent Schema Registry documentation defines schema evolution as safely changing schemas over time while maintaining compatibility with existing producers and consumers. ([source](https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html))

## Further Reading

- [Apache Avro specification](https://avro.apache.org/docs/1.12.0/specification/)
- [Protocol Buffers proto3 guide](https://protobuf.dev/programming-guides/proto3/)
- [Confluent schema evolution](https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html)
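
## Illustrative Sketch

The compatibility rules described under Core Explanation can be made concrete with a small check. The sketch below is a simplified illustration, not the algorithm used by Avro, Protobuf, or any schema registry. The `Field` class and both function names are hypothetical and exist only for this example. It assumes two rules: a reader schema can read older data only if every field it adds declares a default, and a Protobuf-style message must not assign a field number that appears in its reserved list. For simplicity it treats any type change as incompatible, even though real formats permit some type promotions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Sentinel meaning "this field declares no default value".
_NO_DEFAULT = object()


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified stand-in for one field of a record schema."""
    name: str
    type: str
    default: Any = _NO_DEFAULT

    @property
    def has_default(self) -> bool:
        return self.default is not _NO_DEFAULT


def backward_compat_problems(writer: list[Field], reader: list[Field]) -> list[str]:
    """List reasons the reader schema cannot read data written with the writer schema.

    Assumed rules (a simplification):
    - a field present only in the reader must declare a default;
    - a field present only in the writer is ignored by the reader;
    - any type change is reported, with no type promotion.
    """
    written = {f.name: f for f in writer}
    problems = []
    for f in reader:
        old = written.get(f.name)
        if old is None:
            if not f.has_default:
                problems.append(f"field '{f.name}' is new and has no default")
        elif old.type != f.type:
            problems.append(f"field '{f.name}' changed type {old.type} -> {f.type}")
    return problems


def reused_reserved_numbers(reserved: set[int], fields: dict[str, int]) -> list[str]:
    """Return the names of fields whose number appears in the reserved set."""
    return sorted(name for name, number in fields.items() if number in reserved)


if __name__ == "__main__":
    v1 = [Field("id", "long"), Field("email", "string")]
    v2 = v1 + [Field("country", "string")]  # new field without a default
    print(backward_compat_problems(v1, v2))

    # Field number 3 was deleted earlier and reserved; "nickname" reuses it.
    print(reused_reserved_numbers({3}, {"id": 1, "email": 2, "nickname": 3}))
```

A check like this would typically run in CI before a schema change is merged, so that an incompatible change fails the build rather than a downstream consumer.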