Change Data Capture (CDC)
Install
See the Install guide for the full setup, including Windows PowerShell.
curl -fsSL https://install.skippr.io/install.sh | sh
Change Data Capture (CDC) replicates individual inserts, updates, and deletes from a source system into your destination in real time instead of re-extracting entire tables on every run.
Skippr reads native change logs such as PostgreSQL WAL, MySQL binlog, MongoDB change streams, DynamoDB Streams, and Kafka Debezium envelopes. For supported warehouse destinations, it applies those mutations with exactly-once final-state semantics.
How it works
Source Log
│
│ read changes ── WAL / binlog / stream / envelope
▼
Committed Change Batch
│
│ apply if newer ── business key + order token
▼
Destination Warehouse
├── _skippr_order_token column (stale-write rejection)
└── _skippr_tombstones_{table} (anti-resurrect protection)
Each change carries:
- a mutation kind such as insert, update, or delete
- an event identity used to recognize the same source event on replay
- an order token used to reject stale writes and protect delete ordering
The committed change batch is the source of truth for replay. On restart, resume state is rebuilt from the source events that committed batches already own.
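The three fields each change carries can be pictured as a small record type. The sketch below is illustrative only: the names `ChangeEvent`, `event_id`, `order_token`, and `skip_committed` are hypothetical and not part of Skippr's API, and the integer order token is an assumption (any totally ordered value would do). It shows how an event identity lets a replay skip source events that a committed batch already owns.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Optional


class MutationKind(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class ChangeEvent:
    kind: MutationKind
    event_id: str         # hypothetical: identifies the source event across replays
    order_token: int      # hypothetical: assumed to be a comparable integer
    key: tuple            # business key values
    row: Optional[dict]   # full row for insert/update, None for delete


def skip_committed(
    events: Iterable[ChangeEvent], committed_ids: set
) -> Iterator[ChangeEvent]:
    """Yield only events whose identity is not already in a committed batch."""
    for event in events:
        if event.event_id in committed_ids:
            continue  # same source event seen again on replay
        yield event
```

After a restart, the reader may re-deliver events it had already processed; filtering on the event identity keeps them out of the next batch.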
At the destination, Skippr uses these rules to guarantee correctness:
- Upsert-if-newer -- a row is only written if its order token is greater than the existing token. Stale or replayed writes are silently discarded.
- Tombstone anti-resurrect -- deletes are recorded in a per-table tombstone table. A later insert for a deleted key is only applied if its order token is newer than the delete.
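The two rules above can be sketched with an in-memory stand-in for the destination. This is a minimal model, not Skippr's implementation: the `Table` class and `apply_change` function are hypothetical, order tokens are assumed to be integers, and a real destination would express the same comparisons in SQL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    rows: dict = field(default_factory=dict)        # key -> (order_token, row)
    tombstones: dict = field(default_factory=dict)  # key -> order_token of the delete


def apply_change(
    table: Table, kind: str, key, order_token: int, row: Optional[dict] = None
) -> bool:
    """Apply one change; return True if state changed, False if discarded."""
    existing = table.rows.get(key)
    if existing is not None and order_token <= existing[0]:
        return False  # upsert-if-newer: stale or replayed write
    deleted_at = table.tombstones.get(key)
    if deleted_at is not None and order_token <= deleted_at:
        return False  # anti-resurrect: not newer than the recorded delete
    if kind == "delete":
        table.rows.pop(key, None)
        table.tombstones[key] = order_token
    else:
        table.rows[key] = (order_token, row)
    return True
```

In this model, an insert with token 15 that arrives after a delete with token 20 is dropped, while an insert with token 30 is applied. Replaying any change with a token already seen leaves the table unchanged.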
See CDC Guarantees for the contract details and CDC Operations for production expectations.
Enabling CDC
CDC requires two pieces in your skippr.yaml:
- Set cdc_enabled: true on the source
- Add a cdc: pipeline block with your business key columns
Skippr automatically determines the most complete CDC guarantee your source and destination pair supports. You do not need to set a guarantee level manually.
project: my_cdc_pipeline
source:
  kind: postgres
  host: localhost
  port: 5432
  user: replicator
  password: ${POSTGRES_PASSWORD}
  database: mydb
  cdc_enabled: true
warehouse:
  kind: snowflake
  database: ANALYTICS
  schema: RAW
  warehouse: COMPUTE_WH
cdc:
  business_key_columns:
    - id
See CDC Configuration for the full reference.
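As a quick sanity check of the two required pieces, a preflight script could inspect the parsed configuration before a run. The sketch below assumes the YAML has already been loaded into a plain dictionary by a YAML parser of your choice. The function `cdc_config_problems` is hypothetical and is not a Skippr command or API.

```python
def cdc_config_problems(config: dict) -> list:
    """Return human-readable problems with the CDC-related settings."""
    problems = []
    source = config.get("source") or {}
    if source.get("cdc_enabled") is not True:
        problems.append("source.cdc_enabled must be true")
    cdc = config.get("cdc") or {}
    if not cdc.get("business_key_columns"):
        problems.append("cdc.business_key_columns must list at least one column")
    return problems
```

An empty list means both pieces are present. It does not validate anything else in the file.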
Supported sources
| Source | CDC Mechanism | Details |
|---|---|---|
| PostgreSQL | WAL logical replication (pgoutput) | CDC Sources -- PostgreSQL |
| MySQL | Binlog replication | CDC Sources -- MySQL |
| MongoDB | Change streams | CDC Sources -- MongoDB |
| DynamoDB | DynamoDB Streams | CDC Sources -- DynamoDB |
| Kafka | Debezium envelope parsing | CDC Sources -- Kafka |
Supported destinations
All supported warehouse destinations use final-state reconciliation:
| Destination | Apply Strategy | Details |
|---|---|---|
| Snowflake | MERGE DML | CDC Destinations -- Snowflake |
| BigQuery | MERGE DML | CDC Destinations -- BigQuery |
| PostgreSQL | Staging table + INSERT ... ON CONFLICT | CDC Destinations -- PostgreSQL |
| Redshift | Staging table + MERGE | CDC Destinations -- Redshift |
| ClickHouse | ReplacingMergeTree | CDC Destinations -- ClickHouse |
| Databricks | Unity Catalog MERGE | CDC Destinations -- Databricks |
| Synapse | MERGE via Tiberius | CDC Destinations -- Synapse |
| MotherDuck | DuckDB MERGE | CDC Destinations -- MotherDuck |
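To give a feel for what a MERGE-based upsert-if-newer can look like, the sketch below builds a generic MERGE statement that only updates a row when the staged order token is greater. Only the `_skippr_order_token` column name comes from the diagram earlier on this page. The function name, the staging table, and the exact SQL shape are assumptions, not the statements Skippr generates, and delete handling is left out for brevity.

```python
def build_merge_sql(
    target: str, staging: str, key_columns: list, value_columns: list
) -> str:
    """Build a generic MERGE that applies staged rows only when they are newer.

    Identifiers are interpolated as-is; a real implementation would quote them.
    """
    token = "_skippr_order_token"
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    set_cols = value_columns + [token]
    updates = ", ".join(f"{c} = s.{c}" for c in set_cols)
    all_cols = key_columns + set_cols
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} AS t USING {staging} AS s ON {on} "
        f"WHEN MATCHED AND s.{token} > t.{token} THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

For example, `build_merge_sql("RAW.orders", "RAW.orders_stage", ["id"], ["status"])` produces a statement whose matched branch is guarded by the token comparison, so a stale staged row leaves the target untouched.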
Further reading
- CDC Guarantees -- what exactly-once final state means and where its limits are
- CDC Operations -- lag, retention, restarts, and monitoring
- CDC Sources -- prerequisites, configuration, and resume behavior for each source
- CDC Destinations -- how changes are applied at each warehouse
- CDC Configuration -- full YAML reference for the cdc: pipeline block
- Blog: Change Data Capture with Exactly-Once Guarantees -- deep dive into Skippr's CDC architecture