Inspired by best practices in software engineering, Write, Audit, Publish (WAP) is the data pipeline equivalent of a “blue-green deployment”. Instead of immediately overwriting production data, a typical WAP pipeline stages updates in a non-production table, runs tests against this staged data, and promotes it to production only if all tests pass. This keeps data that fails those tests out of production, avoiding unnecessary downtime and downstream impact.
Here’s a breakdown of how it works:
- Write: The transformation runs and stages its results into a temporary non-production table.
- Audit: Data quality tests run against the staged table, and any failures are investigated and fixed before moving on.
- Publish: If the tests pass and any issues are resolved, the non-production table is promoted to production, ideally without recomputing the results (for example, by renaming or swapping tables). This keeps the publish step atomic and avoids rerunning the computation.
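The three steps above can be sketched end to end with Python's built-in `sqlite3` module. This is a minimal, illustrative sketch rather than any particular tool's implementation: the table names `orders` and `orders__staging`, the two audit checks, and the rename-based publish are all assumptions chosen for the example. The publish step drops the old production table and renames the staged one inside a single explicit transaction, so production either keeps its previous contents or receives the fully audited table, and nothing is recomputed.

```python
import sqlite3

PROD = "orders"              # hypothetical production table name
STAGING = "orders__staging"  # hypothetical staging (non-production) table name


def write(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Write: materialize the new results into a non-production table."""
    conn.execute(f"DROP TABLE IF EXISTS {STAGING}")
    conn.execute(f"CREATE TABLE {STAGING} (id INTEGER, amount REAL)")
    conn.executemany(f"INSERT INTO {STAGING} VALUES (?, ?)", rows)


def audit(conn: sqlite3.Connection) -> list[str]:
    """Audit: run illustrative data quality checks against the staged table only."""
    checks = {
        "id must not be NULL": f"SELECT COUNT(*) FROM {STAGING} WHERE id IS NULL",
        "amount must be non-negative": f"SELECT COUNT(*) FROM {STAGING} WHERE amount < 0",
    }
    failures = []
    for name, sql in checks.items():
        bad_rows = conn.execute(sql).fetchone()[0]
        if bad_rows:
            failures.append(f"{name}: {bad_rows} failing row(s)")
    return failures


def publish(conn: sqlite3.Connection) -> None:
    """Publish: swap the staged table into production in one transaction, without recomputing it."""
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {PROD}")
        conn.execute(f"ALTER TABLE {STAGING} RENAME TO {PROD}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


def run_wap(conn: sqlite3.Connection, rows: list[tuple]) -> list[str]:
    """Run Write, Audit, Publish; return audit failures (empty list means published)."""
    write(conn, rows)
    failures = audit(conn)
    if failures:
        return failures  # production table is left untouched
    publish(conn)
    return []


if __name__ == "__main__":
    # isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT in publish() are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    print(run_wap(conn, [(1, 10.0), (2, 5.5)]))     # []
    print(run_wap(conn, [(3, -1.0), (None, 2.0)]))  # two failures; production not replaced
    print(conn.execute(f"SELECT COUNT(*) FROM {PROD}").fetchone()[0])  # 2
```

Because the failed run returns before `publish()`, the second batch never replaces the two rows already in `orders`. A real warehouse would typically use its own atomic swap or view-repointing mechanism in place of SQLite's `ALTER TABLE ... RENAME`.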
By decoupling testing from production updates, WAP empowers teams to build trust in their data pipelines without the constant fear of breaking downstream systems.
Important
A traditional pipeline runs its data quality tests after the model’s data has already been updated in production. Critically, this means downstream tables still run even when those tests fail, propagating faulty data all the way to the data consumer, whether that is a dashboard or an embedded application.
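To make the contrast concrete, here is a toy, in-memory sketch in plain Python (no real pipeline tool) of one upstream model, `orders`, feeding one downstream model, `revenue`. The model names, the single test, and the dictionary standing in for production are hypothetical. In the traditional run, the bad batch lands in production before the test runs and the downstream model is rebuilt from it; in the WAP run, the failed audit stops both the publish and the downstream build.

```python
Rows = list[dict[str, float]]


def amounts_are_non_negative(rows: Rows) -> bool:
    """Hypothetical data quality test for the upstream `orders` model."""
    return all(row["amount"] >= 0 for row in rows)


def build_revenue(orders: Rows) -> float:
    """Hypothetical downstream model that aggregates `orders`."""
    return sum(row["amount"] for row in orders)


def run_traditional(prod: dict, new_orders: Rows) -> bool:
    prod["orders"] = new_orders                        # production is overwritten first
    passed = amounts_are_non_negative(prod["orders"])  # test runs after the fact
    prod["revenue"] = build_revenue(prod["orders"])    # downstream runs regardless of the result
    return passed


def run_wap(prod: dict, new_orders: Rows) -> bool:
    staged = list(new_orders)                          # Write: stage outside production
    if not amounts_are_non_negative(staged):           # Audit: test the staged data
        return False                                   # stop: no publish, no downstream run
    prod["orders"] = staged                            # Publish
    prod["revenue"] = build_revenue(prod["orders"])    # downstream sees only audited data
    return True


if __name__ == "__main__":
    bad_batch = [{"amount": 10.0}, {"amount": -50.0}]
    traditional = {"orders": [{"amount": 10.0}], "revenue": 10.0}
    wap = {"orders": [{"amount": 10.0}], "revenue": 10.0}
    print(run_traditional(traditional, bad_batch), traditional["revenue"])  # False -40.0
    print(run_wap(wap, bad_batch), wap["revenue"])                          # False 10.0
```

Both runs detect the same failure; the difference is only in ordering. Because the traditional run tests after overwriting production, the negative revenue has already reached the consumer by the time the failure is known.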