Quick Start: BigQuery

Extract data from S3, load it into BigQuery, and generate dbt models -- in under 5 minutes.

Prerequisites

  • skippr-dbt and skippr on PATH (Install)
  • Python venv with dbt-core and dbt-bigquery installed
  • Environment variables set:
export LLM_API_KEY="sk-..."
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

See Connect: BigQuery for service account setup and Connect: S3 for AWS credential details.

1. Initialise the project

mkdir my-workspace && cd my-workspace
skippr-dbt init s3-analytics

2. Connect the warehouse

skippr-dbt connect warehouse bigquery \
  --project my-gcp-project \
  --dataset raw_data \
  --location US
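Under the hood, dbt-bigquery reads its connection details from a `profiles.yml`. If you ever need to hand-wire the same target (for example, to run dbt directly against it), an equivalent profile might look like the sketch below -- the profile name, target name, and thread count are illustrative, not values skippr-dbt is guaranteed to generate:

```yaml
# ~/.dbt/profiles.yml
s3_analytics:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      keyfile: /path/to/service-account.json  # same file as GOOGLE_APPLICATION_CREDENTIALS
      project: my-gcp-project
      dataset: raw_data
      location: US
      threads: 4
```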

3. Connect the source

skippr-dbt connect source s3 \
  --bucket my-data-bucket \
  --prefix raw/
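Before running the pipeline, it can help to sanity-check that the AWS credentials in your environment can actually see the bucket. This uses the standard AWS CLI (the bucket and prefix match the values above; substitute your own):

```shell
# List objects under the configured prefix; fails fast on bad credentials
aws s3 ls s3://my-data-bucket/raw/ --recursive --summarize
```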

4. Check prerequisites

skippr-dbt doctor

5. Run the pipeline

skippr-dbt run

The pipeline will:

  1. Discover file schemas from S3.
  2. Sync data into BigQuery bronze tables.
  3. Verify the destination tables are queryable.
  4. Plan a silver (staging) layer with one model per raw table.
  5. Author dbt SQL models with type casting and column mapping.
  6. Validate by running dbt compile and dbt run against the warehouse.
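Step 2 above syncs raw files into bronze tables, and step 3 checks they are queryable. You can repeat that check yourself with the `bq` CLI -- the table name `events` here is an assumption; use whichever tables schema discovery actually created:

```shell
# Count rows in a bronze table (standard SQL)
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `my-gcp-project.raw_data.events`'
```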

6. Verify outputs

Generated config

# skippr-dbt.yaml
project: s3_analytics

warehouse:
  kind: bigquery
  project: my-gcp-project
  dataset: raw_data
  location: US

source:
  kind: s3
  s3_bucket: my-data-bucket
  s3_prefix: raw/

dbt models

models/
├── schema.yml                   # source definitions
└── staging/
    ├── stg_raw_events.sql       # silver model
    └── stg_raw_sessions.sql     # silver model

BigQuery datasets

Dataset               Contents
raw_data              Bronze -- raw S3 data
s3_analytics_silver   Silver -- staged and cleansed
s3_analytics_gold     Gold -- mart-ready models
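To confirm all three datasets exist, you can list them with the `bq` CLI:

```shell
# Output should include raw_data, s3_analytics_silver, and s3_analytics_gold
bq ls --project_id=my-gcp-project
```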