Config File

Skippr uses a single project config file, skippr.yaml. Both skippr and skipprd read it using the same full engine schema.

Example

```yaml
# Workspace, tenant, and internal skipprd extract/load defaults
skippr:
  workspace: mssql_migration
  tenant: _
  skipprd_el_storage_mode: local

# Named pipelines; each references a source and sink by section-qualified name
pipelines:
  mssql_to_snowflake:
    data_source: data_sources.mssql
    data_sink: data_sinks.snowflake
    cdc:
      business_key_columns: [id]

# Source plugin configuration, keyed by name
data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
      tables: ["dbo.customers", "dbo.orders"]
  postgres_cdc:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
      cdc_mode: snapshot_then_cdc

# Destination plugin configuration, keyed by name
data_sinks:
  snowflake:
    Snowflake:
      database: ANALYTICS
      schema: RAW
      warehouse: COMPUTE_WH
      role: ACCOUNTADMIN

# Catalog/schema plugins and runtime plugin manifests (none defined here)
schema_sinks: {}
runtime_plugins: {}

# Modeling provider settings used by skippr model
react:
  providers:
    warehouse:
      kind: snowflake
      database: ANALYTICS
      schema: RAW
      warehouse: COMPUTE_WH
      role: ACCOUNTADMIN
    catalog:
      enabled: true
      refresh_secs: 3600
      max_concurrency: 8
    dbt:
      enabled: true
      runner: host
      target: dev
      naming:
        target_schema: analytics
        silver_suffix: silver
        gold_suffix: gold
    vector:
      enabled: true
```

Top-Level Sections

| Section | Purpose |
| --- | --- |
| skippr | Workspace, tenant, and internal skipprd extract/load defaults |
| pipelines | Named pipelines and their source, sink, transform, CDC, and runtime settings |
| data_sources | Source plugin configuration keyed by name |
| data_sinks | Destination plugin configuration keyed by name |
| schema_sinks | Catalog/schema plugin configuration keyed by name |
| runtime_plugins | Optional explicit runtime plugin manifest paths |
| react | Modeling provider settings used by skippr model |

Storage Settings

skippr.skipprd_el_storage_mode is an internal development/testing setting that controls where skipprd stores extract/load state (local or s3). It does not control dbt project storage, React thread logs, or vector storage for skippr model. Authenticated modeling runs instead use the storage credentials returned by the Skippr API.

The equivalent environment variable for direct skipprd runs is SKIPPRD_EL_STORAGE_MODE.
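
For illustration, a skippr section that selects s3 instead of local might look like the sketch below. The workspace and tenant values are copied from the example above. Any further settings that s3 mode may need (for instance a bucket or credentials) are not described on this page, so they are left out here rather than guessed.

```yaml
skippr:
  workspace: mssql_migration
  tenant: _
  # Internal development/testing setting; the two documented values are local and s3
  skipprd_el_storage_mode: s3
```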

Pipelines

Each pipeline references registry entries by section-qualified name:

```yaml
pipelines:
  ingest_orders:
    data_source: data_sources.postgres
    data_sink: data_sinks.iceberg
```

Use skippr discover --pipeline ingest_orders to persist metadata, then skippr sync --pipeline ingest_orders --once to load data. If metadata is missing, sync runs discovery automatically before loading. Run skippr model after sync when you are ready to generate and validate dbt assets.
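
As a sketch of how the section-qualified names resolve, the fragment below places the ingest_orders pipeline next to the registry entries it points at: data_sources.postgres refers to the postgres entry under data_sources, and data_sinks.iceberg refers to the iceberg entry under data_sinks. It is assembled only from fragments shown on this page; the bucket, database, and region values are placeholders, not recommendations.

```yaml
pipelines:
  ingest_orders:
    data_source: data_sources.postgres   # section "data_sources", entry "postgres"
    data_sink: data_sinks.iceberg        # section "data_sinks", entry "iceberg"

data_sources:
  postgres:                              # entry name referenced by the pipeline
    Postgres:                            # plugin name
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]

data_sinks:
  iceberg:                               # entry name referenced by the pipeline
    Iceberg:                             # plugin name
      table_namespace: analytics
      table_location_prefix: s3://my-bucket/warehouse
      catalog:
        type: glue
        warehouse: s3://my-bucket/warehouse
        database: analytics
        region: us-east-1
```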

Plugin Entries

Plugin sections use the plugin name as the single key under each named entry:

```yaml
data_sources:
  postgres:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
```

Destination entries follow the same shape:

```yaml
data_sinks:
  iceberg:
    Iceberg:
      table_namespace: analytics
      table_location_prefix: s3://my-bucket/warehouse
      catalog:
        type: glue
        warehouse: s3://my-bucket/warehouse
        database: analytics
        region: us-east-1
```

CDC-capable source plugins use cdc_mode to choose how source reads begin:

| Value | Behavior |
| --- | --- |
| snapshot | Bounded snapshot only. |
| snapshot_then_cdc | Full initial snapshot, then native CDC stream. |
| cdc_only | Native CDC stream only, with no initial snapshot. |
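
A minimal sketch of where cdc_mode sits in a source entry follows. The entry name orders_stream and the pipeline name stream_orders are hypothetical labels chosen for this example, and the pipeline assumes a data_sinks.snowflake entry like the one in the Example section is defined elsewhere in the file.

```yaml
data_sources:
  orders_stream:                  # hypothetical entry name
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
      # Skip the initial snapshot and read only the native CDC stream.
      # Other documented values: snapshot, snapshot_then_cdc
      cdc_mode: cdc_only

pipelines:
  stream_orders:                  # hypothetical pipeline name
    data_source: data_sources.orders_stream
    data_sink: data_sinks.snowflake   # assumed to be defined under data_sinks
```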

Modeling Settings

skippr model reads modeling provider settings from react.providers. Extract and load providers are not part of the modeling workflow; use discover and sync for those steps. By default, model resumes the latest modeling thread for the project; use skippr model --no-resume to start a fresh thread.

Environment Variables

Use ${ENV_VAR} syntax for secrets and deployment-specific values:

```yaml
data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
```

Keep secure values in the environment or your secret manager, not in skippr.yaml.