# How It Works
When you run `skippr run`, the CLI orchestrates a multi-phase pipeline that takes you from raw source data to materialised dbt models -- without writing any SQL, YAML, or pipeline config by hand.
## The pipeline
```
Your Source
   │
   │ discover ── reads schemas and infers structure
   ▼
Schema Discovery
   │
   │ map ── deterministic typed destination schemas
   ▼
Schema Mapping
   │
   │ sync ── extracts rows, loads into warehouse
   ▼
Bronze Tables (raw data in your warehouse)
   │
   │ dbt ── generates and runs silver/gold models
   ▼
Silver Models ── staging: cleaned, typed, renamed
   │
   ▼
Gold Models ── marts: business-ready aggregations
```

Each phase runs automatically. You see real-time progress in the terminal UI (or structured logs in CI).
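The phase ordering above can be sketched as a simple sequential orchestrator. This is an illustrative model only -- the function and state names are invented here, not skippr's actual internals:

```python
# Illustrative sketch of the phase ordering -- names are hypothetical,
# not skippr's real implementation.
from typing import Callable


def discover(state: dict) -> dict:
    """Read source metadata: table names, columns, types."""
    state["tables"] = ["orders", "customers"]  # placeholder metadata
    return state


def map_schemas(state: dict) -> dict:
    """Design typed destination schemas from metadata only."""
    state["schemas"] = {t: f"stg_{t}" for t in state["tables"]}
    return state


def sync(state: dict) -> dict:
    """Extract rows and load them into bronze tables."""
    state["bronze_loaded"] = True
    return state


def run_dbt(state: dict) -> dict:
    """Generate and run the silver/gold dbt models."""
    state["models_built"] = True
    return state


# Each phase consumes the previous phase's state, in a fixed order.
PHASES: list[Callable[[dict], dict]] = [discover, map_schemas, sync, run_dbt]


def run_pipeline() -> dict:
    state: dict = {}
    for phase in PHASES:
        state = phase(state)  # phases run automatically, one after another
    return state
```

The point of the sketch is the shape: every downstream phase depends only on the state produced upstream, which is what lets the CLI run the whole chain unattended.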
## What happens at each step
- Discover -- reads your source system's metadata: table names, column names, data types. No manual DDL or schema definitions needed.
- Map -- deterministic algorithms design the destination schema: clean column names, appropriate type casts, staging model structure. Only metadata is used, never your actual data.
- Sync -- extracts rows and files from your source, writes them directly into your warehouse's bronze schema. Data flows from the machine running `skippr` straight to the warehouse -- row-level data is never sent anywhere else.
- Model -- generates a complete dbt project: source definitions, silver staging models with type casting and renaming, and gold mart models.
- Validate -- compiles and runs the dbt project against your warehouse to confirm everything materialises cleanly.
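The Map step above works from metadata alone. A minimal sketch of what deterministic, metadata-only mapping could look like -- the cleaning rule and the type table here are assumptions for illustration, not skippr's actual rules:

```python
import re

# Hypothetical source-type -> warehouse-type table; skippr's real
# mapping rules are not shown in this document.
SOURCE_TO_WAREHOUSE_TYPES = {
    "varchar": "TEXT",
    "int4": "INTEGER",
    "timestamptz": "TIMESTAMP_TZ",
}


def clean_column_name(name: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^a-zA-Z0-9]+", "_", name).strip("_").lower()


def map_column(name: str, source_type: str) -> tuple[str, str]:
    """Map one source column to a (clean_name, warehouse_type) pair.

    Only column metadata is consumed -- never row data -- mirroring the
    privacy property described above.
    """
    warehouse_type = SOURCE_TO_WAREHOUSE_TYPES.get(source_type, "TEXT")
    return clean_column_name(name), warehouse_type
```

Because the mapping is a pure function of metadata, running it twice on the same source always yields the same destination schema.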
## Incremental by default
Re-running `skippr run` on an existing project doesn't start from scratch:
- Data sync -- offsets are tracked internally. Only new and changed rows are extracted and loaded.
- dbt models -- existing models are preserved. The agent updates or adds new models as the source evolves.
This means you can run the same pipeline on a schedule and it behaves like a proper incremental ETL -- no custom state management required.
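Offset-based incremental extraction can be sketched as follows. The state-file path and the use of an `id` column as the offset key are illustrative assumptions -- skippr tracks its offsets internally:

```python
import json
from pathlib import Path

# Illustrative state location; the real layout under .skippr/ is
# documented in Logs and Artifacts.
STATE_FILE = Path(".skippr/offsets.json")


def load_offsets() -> dict:
    """Load previously saved per-table offsets, if any."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}


def extract_new_rows(table: str, rows: list[dict], offsets: dict) -> list[dict]:
    """Return only rows beyond the stored offset, then advance it."""
    last_seen = offsets.get(table, 0)
    new_rows = [r for r in rows if r["id"] > last_seen]
    if new_rows:
        offsets[table] = max(r["id"] for r in new_rows)
    return new_rows
```

The first run sees an empty offset map and extracts everything; every subsequent run extracts only rows past the high-water mark, which is what makes scheduled re-runs behave like incremental ETL.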
## Data privacy
Row-level data only ever exists in two places: the machine running `skippr`, and your warehouse.
- Source data is read locally and written directly to the warehouse API (Snowflake REST, BigQuery API, Postgres wire protocol, etc.). It is never sent to Skippr or any third party.
- AI modeling uses only metadata (table names, column names, types) by default. Data samples can optionally be sent to improve model quality but are off by default.
- The Skippr backend receives only pipeline metadata and usage metrics (e.g. run status, table counts, credit consumption). No source data or warehouse data is sent.
- Credentials live in environment variables, never in config files.
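Loading credentials from the environment rather than from config files might look like this sketch. The variable names are hypothetical -- this document doesn't specify skippr's actual configuration keys:

```python
import os

# Hypothetical credential keys, for illustration only.
REQUIRED_VARS = ["WAREHOUSE_ACCOUNT", "WAREHOUSE_USER", "WAREHOUSE_PASSWORD"]


def warehouse_credentials() -> dict:
    """Read warehouse credentials from environment variables.

    Failing fast on missing variables means secrets never need to be
    written into a config file to make the pipeline run.
    """
    missing = [k for k in REQUIRED_VARS if k not in os.environ]
    if missing:
        raise RuntimeError(f"missing credentials in environment: {missing}")
    return {k.lower(): os.environ[k] for k in REQUIRED_VARS}
```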
## Output structure
The pipeline creates schemas in your warehouse using the project name:
| Tier | Schema name | Contents |
|---|---|---|
| Bronze | `<warehouse_schema>` (e.g. `RAW`) | Raw extracted data |
| Silver | `<project>_silver` (e.g. `my_project_silver`) | Staged, cleaned, typed |
| Gold | `<project>_gold` (e.g. `my_project_gold`) | Business-ready models |
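The naming pattern in the table above can be expressed directly (the `RAW` default is just the example from the table, not a guaranteed default):

```python
# Derive the medallion schema names from the project name, following
# the naming pattern shown in the table above.
def schema_names(project: str, warehouse_schema: str = "RAW") -> dict:
    return {
        "bronze": warehouse_schema,        # raw extracted data
        "silver": f"{project}_silver",     # staged, cleaned, typed
        "gold": f"{project}_gold",         # business-ready models
    }
```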
## Local artifacts
Pipeline state, logs, and intermediate files are stored under `.skippr/` in your working directory. See Logs and Artifacts for the full layout.
