
How It Works

When you run skippr run, the CLI orchestrates a multi-phase pipeline that takes you from raw source data to materialised dbt models -- without writing any SQL, YAML, or pipeline config by hand.

The pipeline

Your Source
  │  discover ── reads schemas and infers structure
  ▼
Schema Discovery
  │  map ── deterministic typed destination schemas
  ▼
Schema Mapping
  │  sync ── extracts rows, loads into warehouse
  ▼
Bronze Tables (raw data in your warehouse)
  │  dbt ── generates and runs silver/gold models
  ▼
Silver Models ── staging: cleaned, typed, renamed
  │
  ▼
Gold Models ── marts: business-ready aggregations

Each phase runs automatically. You see real-time progress in the terminal UI (or structured logs in CI).
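The phase ordering can be sketched as a chain of functions where each phase consumes the previous phase's output. This is a minimal illustration only; the function names and data shapes here are assumptions, not skippr's internal API.

```python
# Illustrative stub phases -- skippr's real internals are not public,
# so every name below is an assumption made for this sketch.
LOG = []

def discover(source):
    LOG.append("discover")           # read table/column metadata only
    return {"tables": sorted(source)}

def map_schema(schema):
    LOG.append("map")                # deterministic destination schema
    return {t: f"stg_{t}" for t in schema["tables"]}

def sync(source, mapping):
    LOG.append("sync")               # rows flow source -> warehouse

def generate_dbt(mapping):
    LOG.append("dbt")                # emit silver/gold models
    return list(mapping.values())

def validate(models):
    LOG.append("validate")           # compile and run against warehouse
    return True

def run_pipeline(source):
    schema = discover(source)
    mapping = map_schema(schema)
    sync(source, mapping)
    models = generate_dbt(mapping)
    return validate(models)

run_pipeline({"orders", "customers"})
```

The point of the sketch is the strict ordering: each phase only starts once the previous one has produced its artifact.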

What happens at each step

  1. Discover -- reads your source system's metadata: table names, column names, data types. No manual DDL or schema definitions needed.
  2. Map -- deterministic algorithms design the destination schema: clean column names, appropriate type casts, staging model structure. Only metadata is used, never your actual data.
  3. Sync -- extracts rows and files from your source, writes them directly into your warehouse's bronze schema. Data flows from the machine running skippr straight to the warehouse -- row-level data is never sent anywhere else.
  4. Model -- generates a complete dbt project: source definitions, silver staging models with type casting and renaming, and gold mart models.
  5. Validate -- compiles and runs the dbt project against your warehouse to confirm everything materialises cleanly.
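The map step's determinism can be illustrated with a toy column mapper. The type table and cleaning rules below are assumptions for the sketch, not skippr's actual mapping; the point is that only metadata (names and source types) is consumed, never row data.

```python
import re

# Assumed source-type -> warehouse-type table (illustrative, not skippr's).
SOURCE_TO_WAREHOUSE_TYPES = {
    "varchar": "TEXT",
    "int4": "INTEGER",
    "timestamptz": "TIMESTAMP_TZ",
}

def clean_column_name(name: str) -> str:
    """Lower-case, replace non-alphanumeric runs with underscores."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()

def map_column(name: str, source_type: str) -> tuple[str, str]:
    """Deterministically map one source column to a destination column."""
    return clean_column_name(name), SOURCE_TO_WAREHOUSE_TYPES.get(source_type, "TEXT")

print(map_column("Created At!", "timestamptz"))  # → ('created_at', 'TIMESTAMP_TZ')
```

Because the mapping is a pure function of metadata, running it twice on the same source schema always yields the same destination schema.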

Incremental by default

Re-running skippr run on an existing project doesn't start from scratch:

  • Data sync -- offsets are tracked internally. Only new and changed rows are extracted and loaded.
  • dbt models -- existing models are preserved. The agent updates existing models or adds new ones as the source evolves.

This means you can run the same pipeline on a schedule and it behaves like a proper incremental ETL -- no custom state management required.
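The offset tracking behind incremental sync follows the familiar watermark pattern. The state-file path and row shape below are illustrative assumptions; skippr's actual state format is internal.

```python
import json
import pathlib

# Illustrative state location -- not skippr's real layout.
STATE_FILE = pathlib.Path(".skippr_state.json")

def load_watermark(table: str) -> int:
    """Return the highest id already synced for this table (0 if none)."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text()).get(table, 0)
    return 0

def save_watermark(table: str, value: int) -> None:
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[table] = value
    STATE_FILE.write_text(json.dumps(state))

def incremental_rows(rows: list[dict], table: str) -> list[dict]:
    """Keep only rows beyond the stored watermark, then advance it."""
    wm = load_watermark(table)
    new = [r for r in rows if r["id"] > wm]
    if new:
        save_watermark(table, max(r["id"] for r in new))
    return new
```

Calling `incremental_rows` a second time with the same input returns nothing, which is exactly the behavior that makes scheduled re-runs cheap.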

Data privacy

Row-level data only ever exists in two places: the machine running skippr, and your warehouse.

  • Source data is read locally and written directly to the warehouse API (Snowflake REST, BigQuery API, Postgres wire protocol, etc.). It is never sent to Skippr or any third party.
  • AI modeling uses only metadata (table names, column names, types) by default. Data samples can optionally be sent to improve model quality but are off by default.
  • The Skippr backend receives only pipeline metadata and usage metrics (e.g. run status, table counts, credit consumption). No source data or warehouse data is sent.
  • Credentials live in environment variables, never in config files.
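The environment-variable rule can be sketched as follows. The variable names here are hypothetical, not documented skippr variables; the point is that secrets are read from the process environment at runtime and never written to project files.

```python
import os

# Hypothetical variable names, used only for this sketch.
REQUIRED_VARS = ("WAREHOUSE_USER", "WAREHOUSE_PASSWORD")

def warehouse_credentials() -> dict:
    """Read credentials from the environment, failing loudly if unset."""
    missing = [v for v in REQUIRED_VARS if v not in os.environ]
    if missing:
        raise RuntimeError(f"set {', '.join(missing)} in the environment")
    return {
        "user": os.environ["WAREHOUSE_USER"],
        "password": os.environ["WAREHOUSE_PASSWORD"],
    }
```

Failing fast on a missing variable keeps misconfigured CI runs from silently falling back to an unauthenticated connection.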

Output structure

The pipeline creates schemas in your warehouse using the project name:

| Tier   | Schema name                                 | Contents               |
|--------|---------------------------------------------|------------------------|
| Bronze | <warehouse_schema> (e.g. RAW)               | Raw extracted data     |
| Silver | <project>_silver (e.g. my_project_silver)   | Staged, cleaned, typed |
| Gold   | <project>_gold (e.g. my_project_gold)       | Business-ready models  |
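The naming convention above is simple enough to sketch directly. The helper below is illustrative (skippr derives these names internally); it just shows how the silver and gold schema names follow mechanically from the project name, while bronze is whatever warehouse schema you configured.

```python
def schema_names(project: str, bronze: str = "RAW") -> dict:
    """Derive the three-tier schema names from the project name (illustrative)."""
    return {
        "bronze": bronze,                 # configured warehouse schema
        "silver": f"{project}_silver",    # staged, cleaned, typed
        "gold": f"{project}_gold",        # business-ready models
    }

print(schema_names("my_project"))
# → {'bronze': 'RAW', 'silver': 'my_project_silver', 'gold': 'my_project_gold'}
```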

Local artifacts

Pipeline state, logs, and intermediate files are stored under .skippr/ in your working directory. See Logs and Artifacts for the full layout.