# How It Works
When you run `skippr run`, the CLI orchestrates a multi-phase pipeline that takes you from raw source data to materialised dbt models -- without writing any SQL, YAML, or pipeline config by hand.
## The pipeline
```
Your Source
   │
   │ discover ── reads schemas and infers structure
   ▼
Schema Discovery
   │
   │ map ── deterministic typed destination schemas
   ▼
Schema Mapping
   │
   │ sync ── extracts rows, loads into warehouse
   ▼
Bronze Tables (raw data in your warehouse)
   │
   │ dbt ── generates and runs silver/gold models
   ▼
Silver Models ── staging: cleaned, typed, renamed
   │
   ▼
Gold Models ── marts: business-ready aggregations
```

Each phase runs automatically. You see real-time progress in the terminal UI (or structured logs in CI).
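The phase ordering above can be sketched as a simple sequential orchestrator. This is an illustrative model only -- the function and state names are invented here, not skippr's actual internals:

```python
# Illustrative sketch of the phase ordering -- names are hypothetical,
# not skippr's real implementation.
from typing import Callable


def discover(state: dict) -> dict:
    """Read source metadata: table names, columns, types."""
    state["tables"] = ["orders", "customers"]  # placeholder metadata
    return state


def map_schemas(state: dict) -> dict:
    """Design typed destination schemas from metadata only."""
    state["schemas"] = {t: f"stg_{t}" for t in state["tables"]}
    return state


def sync(state: dict) -> dict:
    """Extract rows and load them into bronze tables."""
    state["bronze_loaded"] = True
    return state


def run_dbt(state: dict) -> dict:
    """Generate and run the silver/gold dbt models."""
    state["models_built"] = True
    return state


# Each phase consumes the previous phase's state, in a fixed order.
PHASES: list[Callable[[dict], dict]] = [discover, map_schemas, sync, run_dbt]


def run_pipeline() -> dict:
    state: dict = {}
    for phase in PHASES:
        state = phase(state)  # phases run automatically, one after another
    return state
```

The point of the sketch is the shape: every downstream phase depends only on the state produced upstream, which is what lets the CLI run the whole chain unattended.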
## What happens at each step
- Discover -- reads your source system's metadata: table names, column names, data types. No manual DDL or schema definitions needed.
- Map -- deterministic algorithms design the destination schema: clean column names, appropriate type casts, staging model structure. Only metadata is used, never your actual data.
- Sync -- extracts rows and files from your source, writes them directly into your warehouse's bronze schema. Data flows from the machine running `skippr` straight to the warehouse -- row-level data is never sent anywhere else.
- Model -- generates a complete dbt project: source definitions, silver staging models with type casting and renaming, and gold mart models.
- Validate -- compiles and runs the dbt project against your warehouse to confirm everything materialises cleanly.
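The Map step above works from metadata alone. A minimal sketch of what deterministic, metadata-only mapping could look like -- the cleaning rule and the type table here are assumptions for illustration, not skippr's actual rules:

```python
import re

# Hypothetical source-type -> warehouse-type table; skippr's real
# mapping rules are not shown in this document.
SOURCE_TO_WAREHOUSE_TYPES = {
    "varchar": "TEXT",
    "int4": "INTEGER",
    "timestamptz": "TIMESTAMP_TZ",
}


def clean_column_name(name: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^a-zA-Z0-9]+", "_", name).strip("_").lower()


def map_column(name: str, source_type: str) -> tuple[str, str]:
    """Map one source column to a (clean_name, warehouse_type) pair.

    Only column metadata is consumed -- never row data -- mirroring the
    privacy property described above.
    """
    warehouse_type = SOURCE_TO_WAREHOUSE_TYPES.get(source_type, "TEXT")
    return clean_column_name(name), warehouse_type
```

Because the mapping is a pure function of metadata, running it twice on the same source always yields the same destination schema.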
## Incremental by default
Re-running `skippr run` on an existing project doesn't start from scratch:
- Data sync -- offsets are tracked internally. Only new and changed rows are extracted and loaded.
- dbt models -- existing models are preserved. The agent updates or adds new models as the source evolves.
This means you can run the same pipeline on a schedule and it behaves like a proper incremental ETL -- no custom state management required.
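Offset-based incremental extraction can be sketched as follows. The state-file path and the use of an `id` column as the offset key are illustrative assumptions -- skippr tracks its offsets internally:

```python
import json
from pathlib import Path

# Illustrative state location; the real layout under .skippr/ is
# documented in Logs and Artifacts.
STATE_FILE = Path(".skippr/offsets.json")


def load_offsets() -> dict:
    """Load previously saved per-table offsets, if any."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}


def extract_new_rows(table: str, rows: list[dict], offsets: dict) -> list[dict]:
    """Return only rows beyond the stored offset, then advance it."""
    last_seen = offsets.get(table, 0)
    new_rows = [r for r in rows if r["id"] > last_seen]
    if new_rows:
        offsets[table] = max(r["id"] for r in new_rows)
    return new_rows
```

The first run sees an empty offset map and extracts everything; every subsequent run extracts only rows past the high-water mark, which is what makes scheduled re-runs behave like incremental ETL.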
## Data privacy
Row-level data only ever exists in two places: the machine running `skippr`, and your warehouse.
- Source data is read locally and written directly to the warehouse API (Snowflake REST, BigQuery API, Postgres wire protocol, etc.). It is never sent to Skippr or any third party.
- AI modeling uses only metadata (table names, column names, types) by default. Data samples can optionally be sent to improve model quality but are off by default.
- The Skippr backend receives only pipeline metadata and usage metrics (e.g. run status, table counts, credit consumption). No source data or warehouse data is sent.
- Credentials live in environment variables, never in config files.
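Loading credentials from the environment rather than from config files might look like this sketch. The variable names are hypothetical -- this document doesn't specify skippr's actual configuration keys:

```python
import os

# Hypothetical credential keys, for illustration only.
REQUIRED_VARS = ["WAREHOUSE_ACCOUNT", "WAREHOUSE_USER", "WAREHOUSE_PASSWORD"]


def warehouse_credentials() -> dict:
    """Read warehouse credentials from environment variables.

    Failing fast on missing variables means secrets never need to be
    written into a config file to make the pipeline run.
    """
    missing = [k for k in REQUIRED_VARS if k not in os.environ]
    if missing:
        raise RuntimeError(f"missing credentials in environment: {missing}")
    return {k.lower(): os.environ[k] for k in REQUIRED_VARS}
```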
## Output structure
The pipeline creates schemas in your warehouse using the project name:
| Tier | Schema name | Contents |
|---|---|---|
| Bronze | `<warehouse_schema>` (e.g. `RAW`) | Raw extracted data |
| Silver | `<project>_silver` (e.g. `my_project_silver`) | Staged, cleaned, typed |
| Gold | `<project>_gold` (e.g. `my_project_gold`) | Business-ready models |
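The naming pattern in the table above can be expressed directly (the `RAW` default is just the example from the table, not a guaranteed default):

```python
# Derive the medallion schema names from the project name, following
# the naming pattern shown in the table above.
def schema_names(project: str, warehouse_schema: str = "RAW") -> dict:
    return {
        "bronze": warehouse_schema,        # raw extracted data
        "silver": f"{project}_silver",     # staged, cleaned, typed
        "gold": f"{project}_gold",         # business-ready models
    }
```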
## Local artifacts
Pipeline state, logs, and intermediate files are stored under `.skippr/` in your working directory. See Logs and Artifacts for the full layout.
