Quick Start: BigQuery¶
Extract data from S3, load it into BigQuery, and generate dbt models -- in under 5 minutes.
Prerequisites¶
- `skippr-dbt` and `skippr` on PATH (Install)
- Python venv with `dbt-core` and `dbt-bigquery` installed
- Environment variables set:
export LLM_API_KEY="sk-..."
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
See Connect: BigQuery for service account setup and Connect: S3 for AWS credential details.
1. Initialise the project¶
mkdir my-workspace && cd my-workspace
skippr-dbt init s3-analytics
2. Connect the warehouse¶
skippr-dbt connect warehouse bigquery \
--project my-gcp-project \
--dataset raw_data \
--location US
3. Connect the source¶
skippr-dbt connect source s3 \
--bucket my-data-bucket \
--prefix raw/
4. Check prerequisites¶
skippr-dbt doctor
5. Run the pipeline¶
skippr-dbt run
The pipeline will:
- Discover file schemas from S3.
- Sync data into BigQuery bronze tables.
- Verify the destination tables are queryable.
- Plan a silver (staging) layer with one model per raw table.
- Author dbt SQL models with type casting and column mapping.
- Validate by running `dbt compile` and `dbt run` against the warehouse.
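To give a feel for the "author dbt SQL models" step, here is a minimal sketch of what a generated silver model such as `stg_raw_events.sql` might look like. The column names (`event_id`, `event_ts`, `payload`) are illustrative assumptions; the actual columns depend on the schemas discovered in S3.

```sql
-- models/staging/stg_raw_events.sql (illustrative sketch, not actual output)
-- Casts raw bronze columns to typed columns; names are hypothetical.
select
    cast(event_id as string)    as event_id,
    cast(event_ts as timestamp) as event_ts,
    cast(payload  as string)    as payload
from {{ source('raw_data', 'raw_events') }}
```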
6. Verify outputs¶
Generated config¶
# skippr-dbt.yaml
project: s3_analytics
warehouse:
kind: bigquery
project: my-gcp-project
dataset: raw_data
location: US
source:
kind: s3
s3_bucket: my-data-bucket
s3_prefix: raw/
dbt models¶
models/
├── schema.yml # source definitions
└── staging/
├── stg_raw_events.sql # silver model
└── stg_raw_sessions.sql # silver model
BigQuery datasets¶
| Dataset | Contents |
|---|---|
| `raw_data` | Bronze -- raw S3 data |
| `s3_analytics_silver` | Silver -- staged and cleansed |
| `s3_analytics_gold` | Gold -- mart-ready models |
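As a quick sanity check, you can query one of the synced bronze tables in the BigQuery console. The table name `raw_events` below is an assumption; actual table names depend on the files found under the `raw/` prefix.

```sql
-- Illustrative verification query; replace `raw_events` with a table
-- that actually appears in your raw_data dataset.
SELECT COUNT(*) AS row_count
FROM `my-gcp-project.raw_data.raw_events`;
```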