Change Data Capture (CDC)

Install

See the Install guide for the full setup, including Windows PowerShell.

curl -fsSL https://install.skippr.io/install.sh | sh

Change Data Capture (CDC) replicates individual inserts, updates, and deletes from a source system into your destination in real time instead of re-extracting entire tables on every run.

Skippr reads native change logs such as PostgreSQL WAL, MySQL binlog, MongoDB change streams, DynamoDB Streams, and Kafka Debezium envelopes. For supported warehouse destinations, it applies those mutations with exactly-once final-state semantics.

How it works

Source Log

  │  read changes ── WAL / binlog / stream / envelope

Committed Change Batch

  │  apply if newer ── business key + order token

Destination Warehouse
  ├── _skippr_order_token column (stale-write rejection)
  └── _skippr_tombstones_{table} (anti-resurrect protection)

Each change carries:

  • a mutation kind such as insert, update, or delete
  • an event identity used to recognize the same source event on replay
  • an order token used to reject stale writes and protect delete ordering

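The committed change batch is the source of truth for replay. On restart, resume state is rebuilt from the source events that committed batches already own, so an event that was committed once is recognized and not applied a second time.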
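The three fields above and the replay rule can be sketched in a few lines of Python. This is only an illustration of the idea, not Skippr's internal data model: the names `Change`, `MutationKind`, `committed_event_ids`, and `new_changes` are hypothetical, and the order token is assumed to be a plain integer (a real token might be a log position or a composite value).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set, Tuple


class MutationKind(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass
class Change:
    kind: MutationKind
    event_id: str               # hypothetical: stable identity of the source event
    order_token: int            # assumption: larger means newer
    key: Tuple                  # business key values
    row: Optional[dict] = None  # None for deletes


def committed_event_ids(committed_batches: Iterable[List[Change]]) -> Set[str]:
    """Rebuild resume state: every event id already owned by a committed batch."""
    return {c.event_id for batch in committed_batches for c in batch}


def new_changes(incoming: Iterable[Change],
                committed_batches: Iterable[List[Change]]) -> List[Change]:
    """Drop changes whose event id was already committed or repeats within the replay."""
    seen = committed_event_ids(committed_batches)
    fresh = []
    for change in incoming:
        if change.event_id in seen:
            continue  # same source event seen again on replay
        seen.add(change.event_id)
        fresh.append(change)
    return fresh
```

In this sketch, replaying a log segment that overlaps an already committed batch yields only the events that no committed batch owns yet.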

At the destination, Skippr uses these rules to guarantee correctness:

  • Upsert-if-newer -- a row is only written if its order token is greater than the existing token. Stale or replayed writes are silently discarded.
  • Tombstone anti-resurrect -- deletes are recorded in a per-table tombstone table. A later insert for a deleted key is only applied if its order token is newer than the delete.
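The two rules above can be modeled with an in-memory destination. The sketch below is an assumption-laden illustration rather than Skippr's implementation: `Destination`, `upsert`, and `delete` are hypothetical names, order tokens are assumed to be comparable integers, and a change whose token equals the stored token is treated as a replay and rejected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Key = Tuple[Any, ...]


@dataclass
class Destination:
    # key -> (order_token, row); stands in for the table plus its order token column
    rows: Dict[Key, Tuple[int, dict]] = field(default_factory=dict)
    # key -> order_token of the latest delete; stands in for the tombstone table
    tombstones: Dict[Key, int] = field(default_factory=dict)

    def upsert(self, key: Key, token: int, row: dict) -> bool:
        """Apply an insert or update only if it is newer than what is stored."""
        if key in self.tombstones and token <= self.tombstones[key]:
            return False  # anti-resurrect: the delete is newer than this write
        if key in self.rows and token <= self.rows[key][0]:
            return False  # stale or replayed write
        self.rows[key] = (token, row)
        return True

    def delete(self, key: Key, token: int) -> bool:
        """Apply a delete only if it is newer than the stored row and tombstone."""
        if key in self.tombstones and token <= self.tombstones[key]:
            return False  # replayed or older delete
        if key in self.rows and token <= self.rows[key][0]:
            return False  # the row was written after this delete
        self.rows.pop(key, None)
        self.tombstones[key] = token
        return True
```

Because every decision depends only on the key and the token comparison, applying the same changes again, or in a different order, converges on the same final rows in this model.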

See CDC Guarantees for the contract details and CDC Operations for production expectations.

Enabling CDC

CDC requires two pieces in your skippr.yaml:

  1. Set cdc_enabled: true on the source
  2. Add a cdc: pipeline block with your business key columns

Skippr automatically determines the most complete CDC guarantee your source and destination pair supports. You do not need to set a guarantee level manually.

```yaml
project: my_cdc_pipeline

source:
  kind: postgres
  host: localhost
  port: 5432
  user: replicator
  password: ${POSTGRES_PASSWORD}
  database: mydb
  cdc_enabled: true

warehouse:
  kind: snowflake
  database: ANALYTICS
  schema: RAW
  warehouse: COMPUTE_WH

cdc:
  business_key_columns:
    - id
```

See CDC Configuration for the full reference.
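As a quick pre-flight check, the two required pieces can be verified before running a pipeline. The helper below is a hypothetical sketch, not part of Skippr: it assumes the `skippr.yaml` file has already been parsed into a Python dictionary by a YAML parser of your choice, and it only checks the two keys named on this page.

```python
def check_cdc_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means both CDC pieces are present."""
    problems = []

    # Piece 1: cdc_enabled: true on the source
    source = cfg.get("source") or {}
    if source.get("cdc_enabled") is not True:
        problems.append("source.cdc_enabled must be true")

    # Piece 2: a top-level cdc: block with business key columns
    cdc = cfg.get("cdc")
    if not isinstance(cdc, dict):
        problems.append("missing top-level cdc block")
    else:
        keys = cdc.get("business_key_columns")
        if not isinstance(keys, list) or not keys:
            problems.append("cdc.business_key_columns must list at least one column")

    return problems


# The example configuration above, reduced to the keys this check reads.
example = {
    "source": {"kind": "postgres", "cdc_enabled": True},
    "cdc": {"business_key_columns": ["id"]},
}
print(check_cdc_config(example))
```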

Supported sources

| Source | CDC Mechanism | Details |
| --- | --- | --- |
| PostgreSQL | WAL logical replication (pgoutput) | CDC Sources -- PostgreSQL |
| MySQL | Binlog replication | CDC Sources -- MySQL |
| MongoDB | Change streams | CDC Sources -- MongoDB |
| DynamoDB | DynamoDB Streams | CDC Sources -- DynamoDB |
| Kafka | Debezium envelope parsing | CDC Sources -- Kafka |

Supported destinations

All supported warehouse destinations use final-state reconciliation:

| Destination | MERGE Strategy | Details |
| --- | --- | --- |
| Snowflake | MERGE DML | CDC Destinations -- Snowflake |
| BigQuery | MERGE DML | CDC Destinations -- BigQuery |
| PostgreSQL | Staging table + INSERT ... ON CONFLICT | CDC Destinations -- PostgreSQL |
| Redshift | Staging table + MERGE | CDC Destinations -- Redshift |
| ClickHouse | ReplacingMergeTree | CDC Destinations -- ClickHouse |
| Databricks | Unity Catalog MERGE | CDC Destinations -- Databricks |
| Synapse | MERGE via Tiberius | CDC Destinations -- Synapse |
| MotherDuck | DuckDB MERGE | CDC Destinations -- MotherDuck |
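To make the "staging table + INSERT ... ON CONFLICT" pattern concrete, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, since SQLite accepts a similar upsert syntax. It is not the SQL that Skippr generates: the `users` and `users_staging` tables and the `apply_batch` helper are hypothetical, the order token is assumed to be an integer, and only the `_skippr_order_token` column name comes from this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT,
        _skippr_order_token INTEGER NOT NULL
    );
    CREATE TABLE users_staging (
        id INTEGER PRIMARY KEY,
        email TEXT,
        _skippr_order_token INTEGER NOT NULL
    );
""")


def apply_batch(conn, batch):
    """Load a batch into staging, then upsert rows whose token is newer."""
    with conn:  # one transaction per batch
        conn.execute("DELETE FROM users_staging")
        conn.executemany("INSERT INTO users_staging VALUES (?, ?, ?)", batch)
        # "WHERE 1" is required by SQLite to disambiguate INSERT ... SELECT
        # followed by ON CONFLICT; PostgreSQL does not need it.
        conn.execute("""
            INSERT INTO users (id, email, _skippr_order_token)
            SELECT id, email, _skippr_order_token FROM users_staging WHERE 1
            ON CONFLICT (id) DO UPDATE SET
                email = excluded.email,
                _skippr_order_token = excluded._skippr_order_token
            WHERE excluded._skippr_order_token > users._skippr_order_token
        """)


apply_batch(conn, [(1, "a@example.com", 10), (2, "b@example.com", 11)])
# Row 1 arrives with an older token and is ignored; row 2 is newer and wins.
apply_batch(conn, [(1, "old@example.com", 5), (2, "b2@example.com", 12)])
rows = conn.execute(
    "SELECT id, email, _skippr_order_token FROM users ORDER BY id"
).fetchall()
```

The `WHERE` clause on the conflict action is what turns a plain upsert into upsert-if-newer: a conflicting row with an equal or older token leaves the existing row untouched.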
