Quick Start: BigQuery

Five commands to go from files in S3 to materialised dbt models in BigQuery -- bronze, silver, and gold layers, all generated and validated automatically.

Prerequisites

  • skippr on PATH (Install)
  • Python venv with dbt-core and dbt-bigquery
  • Authenticated via skippr user login (or SKIPPR_API_KEY for CI)
  • BigQuery and AWS credentials in your environment:
bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

Need help with credentials? See BigQuery and S3.
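Before running any of the commands below, it can help to confirm the variables above are actually set in the current shell. The following is a minimal sketch, not part of skippr: the function name check_credentials is hypothetical, and it only checks that each variable is non-empty, not that the credentials are valid.

```shell
# Hypothetical helper (not a skippr command): report unset credential variables.
# Checks the three variables exported above, plus any extra names passed as arguments.
check_credentials() {
  missing=""
  for name in GOOGLE_APPLICATION_CREDENTIALS AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible indirection)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

In CI, where SKIPPR_API_KEY replaces the interactive login, the same check can cover it with check_credentials SKIPPR_API_KEY.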

Build the pipeline

bash
# 1. Create the project
mkdir my-workspace && cd my-workspace
skippr init s3-analytics

# 2. Point at your warehouse
skippr connect warehouse bigquery \
  --project my-gcp-project \
  --dataset raw_data \
  --location US

# 3. Point at your source
skippr connect source s3 \
  --bucket my-data-bucket \
  --prefix raw/

# 4. Verify everything is wired up
skippr doctor

# 5. Run it
skippr run

That's it. skippr run discovers your file schemas, extracts the data, loads it into BigQuery, and generates a complete dbt project with silver and gold models -- compiled and materialised.
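For scripted use, steps 4 and 5 can be chained so the run only starts when the check passes. This is a sketch under one assumption that this page does not confirm: that skippr doctor exits with a non-zero status when a check fails. The function name run_if_healthy is hypothetical.

```shell
# Hypothetical wrapper: only run the pipeline when the health check succeeds.
# Assumption: `skippr doctor` returns a non-zero exit status on failure.
run_if_healthy() {
  if skippr doctor; then
    skippr run
  else
    echo "skippr doctor reported problems; skipping skippr run" >&2
    return 1
  fi
}
```

Call run_if_healthy from the project directory created in step 1.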

What you get

dbt models (ready to extend)

models/
├── schema.yml                   # source definitions
└── staging/
    ├── stg_raw_events.sql       # silver model
    └── stg_raw_sessions.sql     # silver model

BigQuery datasets (populated and queryable)

Dataset               Contents
raw_data              Bronze -- raw extracted data
s3_analytics_silver   Silver -- staged and cleansed
s3_analytics_gold     Gold -- mart-ready models
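One way to confirm the datasets are populated is to query them with Google's bq command-line tool. The sketch below makes several assumptions: bq is installed and authenticated, the project ID matches the example config, and the silver model stg_raw_events is materialised as a table of the same name in s3_analytics_silver. The function names are hypothetical.

```shell
# Assumed value, copied from the example commands on this page.
PROJECT="my-gcp-project"

# Build a row-count query for one table: $1 = dataset, $2 = table.
row_count_sql() {
  printf 'SELECT COUNT(*) AS row_count FROM `%s.%s.%s`\n' "$PROJECT" "$1" "$2"
}

# List the bronze tables, then count rows in one silver model.
# The table name stg_raw_events is an assumption based on the model file name.
verify_load() {
  bq ls "${PROJECT}:raw_data"
  bq query --use_legacy_sql=false "$(row_count_sql s3_analytics_silver stg_raw_events)"
}
```

Run verify_load after skippr run completes; adjust the dataset and table names to match what your run produced.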

Project config

yaml
# skippr.yaml
project: s3_analytics

warehouse:
  kind: bigquery
  project: my-gcp-project
  dataset: raw_data
  location: US

source:
  kind: s3
  s3_bucket: my-data-bucket
  s3_prefix: raw/

What this quickstart proves

  • The runner reads S3 data and writes it directly into BigQuery.
  • Skippr generates a reviewable dbt project instead of hiding the result behind a proprietary format.
  • Authentication and control-plane services are cloud-backed, but row-level source data is not routed through that cloud path.
  • For the guarantees behind these claims, see How It Works and CDC Guarantees.

What's next

  • Run skippr run again. It's incremental: only new and changed rows are synced.
  • The dbt project is yours. Add tests, snapshots, or custom gold models.
  • See How It Works for the full pipeline breakdown.
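Because the generated project is plain dbt, you can iterate on a single model with dbt itself after adding tests or editing SQL. A minimal sketch, assuming the current directory is the generated dbt project and a dbt profile for BigQuery is already configured; the function name rebuild_model is hypothetical.

```shell
# Hypothetical helper: rebuild and test one model with dbt.
# Assumes the working directory is the generated dbt project.
rebuild_model() {
  model="${1:-}"
  if [ -z "$model" ]; then
    echo "usage: rebuild_model <model_name>" >&2
    return 2
  fi
  # dbt build runs the selected model and then any tests defined on it
  dbt build --select "$model"
}
```

For example, rebuild_model stg_raw_events rebuilds the silver events model and runs its tests.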