# Source Connectors
Skippr supports a wide range of source connectors for extracting data from databases, object stores, streaming platforms, APIs, and more.
## Databases
### MSSQL

Reads data from Microsoft SQL Server tables.

```yaml
source:
  kind: mssql
  connection_string: ${MSSQL_CONNECTION_STRING}
```

| Field | Default | Description |
|---|---|---|
| `connection_string` | (required) | ADO.NET connection string |
| `tables` | (auto-discover) | Optional list of tables to ingest |
| `batch_size_rows` | `10000` | Rows per ingest batch |

Namespace: `mssql.{database}.{schema}.{table}`

See Connect: MSSQL for a step-by-step setup guide.
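To ingest only specific tables, list them explicitly and tune the batch size. A sketch; the `dbo.*` names and batch size are placeholders, and the exact table-name format (schema-qualified or not) is an assumption:

```yaml
source:
  kind: mssql
  connection_string: ${MSSQL_CONNECTION_STRING}
  tables:
    - dbo.orders
    - dbo.customers
  batch_size_rows: 5000
```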
### MySQL

Reads data from MySQL tables.

```yaml
source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}
```

| Field | Default | Description |
|---|---|---|
| `connection_string` | (required) | MySQL connection string (e.g. `mysql://user:pass@host:3306/db`) |
| `tables` | (auto-discover) | Optional list of `schema.table` names to ingest |
| `batch_size_rows` | `10000` | Rows per ingest batch |

Namespace: `mysql.{database}.{schema}.{table}`
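To skip auto-discovery, list `schema.table` names under `tables`. A sketch; the `shop.*` names are placeholders:

```yaml
source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}
  tables:
    - shop.orders
    - shop.customers
```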
### PostgreSQL

Reads data from PostgreSQL tables.

```yaml
source:
  kind: postgres
  host: localhost
  port: 5432
  user: postgres
  password: ${POSTGRES_PASSWORD}
  database: mydb
```

| Field | Default | Description |
|---|---|---|
| `host` | `localhost` | Postgres host |
| `port` | `5432` | Postgres port |
| `user` | `postgres` | Username |
| `password` | | Password |
| `database` | | Database name |
| `connection_string` | | Full connection string (overrides individual fields) |
| `tables` | (auto-discover) | Optional list of tables to read |
| `query` | | Custom SQL query (overrides `tables`) |
| `batch_size_rows` | `10000` | Rows per ingest batch |

Namespace: `postgres.{table_name}`
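A full connection string can replace the individual host/port/user fields, and `query` replaces table discovery entirely. A sketch; the environment variable name and SQL are illustrative:

```yaml
source:
  kind: postgres
  connection_string: ${POSTGRES_CONNECTION_STRING}
  query: "SELECT id, total FROM orders WHERE created_at > now() - interval '1 day'"
```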
### ClickHouse

Reads from ClickHouse via the HTTP API.

```yaml
source:
  kind: clickhouse_source
  url: http://localhost:8123
  database: default
  user: default
  password: ${CLICKHOUSE_PASSWORD}
  tables:
    - events
```

| Field | Default | Description |
|---|---|---|
| `url` | `http://localhost:8123` | ClickHouse HTTP URL |
| `database` | | Database name |
| `user` | `default` | Username |
| `password` | | Password |
| `tables` | (optional) | Tables to extract |
| `query` | | Custom SQL query (overrides `tables`) |

Namespace: `clickhouse.{database}.{table}`
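A custom query can be used instead of listing tables. A sketch; the table name and SQL are placeholders:

```yaml
source:
  kind: clickhouse_source
  url: http://localhost:8123
  database: default
  query: "SELECT * FROM events WHERE event_date >= today() - 7"
```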
### MotherDuck

Reads from a MotherDuck database.

```yaml
source:
  kind: motherduck_source
  motherduck_token: ${MOTHERDUCK_TOKEN}
  database: my_database
  tables:
    - raw.events
```

| Field | Default | Description |
|---|---|---|
| `motherduck_token` | (required) | MotherDuck token (or set `MOTHERDUCK_TOKEN`) |
| `database` | | Database name |
| `tables` | (optional) | Tables to extract |
| `query` | | Custom SQL query (overrides `tables`) |
### Delta Lake

Reads from a Delta Lake table URI (for example S3, ADLS, or local).

```yaml
source:
  kind: delta_lake
  table_uri: "s3://my-bucket/path/to/table"
  storage_options:
    AWS_REGION: us-east-1
  version: 5
```

| Field | Default | Description |
|---|---|---|
| `table_uri` | (required) | Path to the Delta table (e.g. `s3://`, `abfss://`, or a local file path) |
| `storage_options` | | Key/value options for the object store (credentials, region, etc.) |
| `version` | | Optional table version to read |
| `filter` | | Optional predicate filter expression |
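Credentials can be passed through `storage_options`, and `filter` narrows the rows read. A sketch; the `AWS_*` key names follow common object-store conventions, and the SQL-like predicate syntax for `filter` is an assumption not confirmed above:

```yaml
source:
  kind: delta_lake
  table_uri: "s3://my-bucket/path/to/table"
  storage_options:
    AWS_REGION: us-east-1
    AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
    AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
  filter: "event_date >= '2024-01-01'"
```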
### Redshift

Reads data from Amazon Redshift via the Data API.

```yaml
source:
  kind: redshift
  cluster_identifier: my-cluster
  database: analytics
  db_user: admin
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `cluster_identifier` | | Redshift cluster identifier |
| `workgroup_name` | | Serverless workgroup (alternative to `cluster_identifier`) |
| `database` | (required) | Database name |
| `db_user` | | Database user (for cluster mode) |
| `tables` | | List of tables to read |
| `query` | | Custom SQL query |
| `region` | | AWS region |

Namespace: `redshift.{database}.{table_name}`
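For Redshift Serverless, use `workgroup_name` in place of `cluster_identifier` (no `db_user` needed). A sketch; the workgroup and query are placeholders:

```yaml
source:
  kind: redshift
  workgroup_name: my-workgroup
  database: analytics
  region: us-east-1
  query: "SELECT * FROM public.orders"
```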
### MongoDB

Reads documents from a MongoDB collection, converting BSON to JSON.

```yaml
source:
  kind: mongodb
  connection_string: "mongodb://localhost:27017"
  database: mydb
  collection: events
```

| Field | Default | Description |
|---|---|---|
| `connection_string` | (required) | MongoDB connection URI |
| `database` | (required) | Database name |
| `collection` | (required) | Collection name |
| `filter` | | Optional JSON filter document |
| `batch_size_rows` | | Rows per batch |

Namespace: `mongodb.{database}.{collection}`
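A `filter` restricts which documents are read. A sketch; whether the JSON filter is supplied as a string or an inline mapping is not specified above, so this assumes a JSON string:

```yaml
source:
  kind: mongodb
  connection_string: "mongodb://localhost:27017"
  database: mydb
  collection: events
  filter: '{ "status": "active" }'
```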
### DynamoDB

Reads items from an Amazon DynamoDB table.

```yaml
source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `table_name` | (required) | DynamoDB table name |
| `region` | | AWS region |
| `endpoint_url` | | Custom endpoint (e.g. LocalStack) |

Namespace: `dynamodb.{table_name}`
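For local development, point `endpoint_url` at a LocalStack instance. A sketch; `http://localhost:4566` is LocalStack's conventional default port:

```yaml
source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1
  endpoint_url: "http://localhost:4566"
```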
## Object Stores
### S3

Reads files from an Amazon S3 bucket.

```yaml
source:
  kind: s3
  s3_bucket: my-bucket
  s3_prefix: data/
```

| Field | Default | Description |
|---|---|---|
| `s3_bucket` | (required) | S3 bucket name |
| `s3_prefix` | | Key prefix for filtering objects |
| `region` | | AWS region |
| `endpoint_url` | | Custom endpoint |

See Connect: S3 for a step-by-step setup guide.
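An S3-compatible store (e.g. MinIO) can be targeted via `endpoint_url`. A sketch; the bucket, prefix, and MinIO port are placeholders:

```yaml
source:
  kind: s3
  s3_bucket: my-bucket
  s3_prefix: data/2024/
  region: us-east-1
  endpoint_url: "http://localhost:9000"
```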
### SFTP

Downloads files from an SFTP server.

```yaml
source:
  kind: sftp
  host: sftp.example.com
  username: user
  password: ${SFTP_PASSWORD}
  remote_path: "/data/*.json"
```

| Field | Default | Description |
|---|---|---|
| `host` | (required) | SFTP server hostname |
| `port` | `22` | SSH port |
| `username` | (required) | SSH username |
| `password` | | Password authentication |
| `private_key_path` | | Path to SSH private key |
| `remote_path` | (required) | Remote file path or glob |

Namespace: sftp.(unknown)
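Key-based authentication replaces `password` with `private_key_path`. A sketch; the key path is a placeholder:

```yaml
source:
  kind: sftp
  host: sftp.example.com
  username: user
  private_key_path: "/home/user/.ssh/id_ed25519"
  remote_path: "/data/*.json"
```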
## Streaming & Messaging
### Kafka

Consumes messages from a Kafka topic.

```yaml
source:
  kind: kafka
  brokers: "localhost:9092"
  topic: events
  group_id: skippr-consumer
```

| Field | Default | Description |
|---|---|---|
| `brokers` | (required) | Kafka bootstrap servers |
| `topic` | (required) | Topic to consume |
| `group_id` | (auto-generated) | Consumer group ID |
| `auto_offset_reset` | `earliest` | `earliest` or `latest` |
| `security_protocol` | | Security protocol |
| `sasl_mechanism` | | SASL mechanism |
| `sasl_username` / `sasl_password` | | SASL credentials |
| `mode` | `stream` | `stream` or `batch` |
| `idle_timeout_seconds` | `5` | Batch-mode idle timeout |

Namespace: `kafka.{topic}`
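A secured broker with batch-mode consumption might look like the following. A sketch; `SASL_SSL` and `PLAIN` follow common Kafka client conventions, but the exact values this connector accepts are not listed above, so check them against your broker:

```yaml
source:
  kind: kafka
  brokers: "broker-1.example.com:9093"
  topic: events
  security_protocol: SASL_SSL
  sasl_mechanism: PLAIN
  sasl_username: ${KAFKA_USERNAME}
  sasl_password: ${KAFKA_PASSWORD}
  mode: batch
  idle_timeout_seconds: 10
```

In `batch` mode the consumer exits after `idle_timeout_seconds` with no new messages, rather than streaming indefinitely.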
### SQS

Consumes messages from an Amazon SQS queue.

```yaml
source:
  kind: sqs
  queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-queue"
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `queue_url` | (required) | SQS queue URL |
| `region` | | AWS region |
| `endpoint_url` | | Custom endpoint |
### Kinesis

Consumes records from an Amazon Kinesis stream.

```yaml
source:
  kind: kinesis
  stream_name: my-stream
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `stream_name` | (required) | Kinesis stream name |
| `region` | | AWS region |
### AMQP (RabbitMQ)

Consumes messages from an AMQP queue.

```yaml
source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events
```

| Field | Default | Description |
|---|---|---|
| `connection_string` | (required) | AMQP connection URI |
| `queue` | (required) | Queue name |
| `exchange` | | Exchange to bind to |
| `routing_key` | | Routing key for binding |
| `consumer_tag` | (auto-generated) | Consumer tag |
| `prefetch_count` | `10` | Prefetch count |
| `mode` | `stream` | `stream` or `batch` |
| `idle_timeout_seconds` | `5` | Batch-mode idle timeout |

Namespace: `amqp.{queue}`
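Binding the queue to an exchange with a routing key might look like the following. A sketch; the exchange name and `orders.*` routing pattern are placeholders:

```yaml
source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events
  exchange: app-events
  routing_key: "orders.*"
  prefetch_count: 50
```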
### SNS

Consumes AWS SNS messages via an SQS subscription.

```yaml
source:
  kind: sns
  topic_arn: "arn:aws:sns:us-east-1:123456:my-topic"
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-sns-queue"
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `topic_arn` | (required) | SNS topic ARN |
| `sqs_queue_url` | (required) | SQS queue URL subscribed to the topic |
| `region` | | AWS region |
| `endpoint_url` | | Custom endpoint |

Namespace: `sns.{topic_name}`
### EventBridge

Consumes AWS EventBridge events via an SQS queue target.

```yaml
source:
  kind: eventbridge
  event_bus_name: my-bus
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-eb-queue"
  region: us-east-1
```

| Field | Default | Description |
|---|---|---|
| `event_bus_name` | (required) | EventBridge bus name |
| `sqs_queue_url` | (required) | SQS queue URL receiving events |
| `region` | | AWS region |
| `endpoint_url` | | Custom endpoint |

Namespace: `eventbridge.{event_bus_name}`
### MQTT

Subscribes to an MQTT topic and ingests messages.

```yaml
source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  port: 1883
  topic: "sensors/temperature"
```

| Field | Default | Description |
|---|---|---|
| `broker_url` | (required) | MQTT broker hostname |
| `port` | `1883` | Broker port |
| `topic` | (required) | Topic to subscribe to |
| `client_id` | (auto-generated) | MQTT client ID |
| `qos` | `1` | Quality of Service (0, 1, or 2) |
| `username` / `password` | | Optional broker credentials |
| `mode` | `stream` | `stream` or `batch` |
| `idle_timeout_seconds` | `5` | Batch-mode idle timeout |

Namespace: `mqtt.{topic}`
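An authenticated subscription with at-least-exactly-once delivery might look like the following. A sketch; `sensors/#` uses standard MQTT wildcard syntax, though whether this connector passes wildcards through to the broker is an assumption:

```yaml
source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  topic: "sensors/#"
  qos: 2
  username: ${MQTT_USERNAME}
  password: ${MQTT_PASSWORD}
```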
### WebSocket

Connects to a WebSocket server and ingests received messages.

```yaml
source:
  kind: websocket
  url: "ws://localhost:8080/stream"
```

| Field | Default | Description |
|---|---|---|
| `url` | (required) | WebSocket URL (`ws://` or `wss://`) |
| `headers` | | Additional request headers |
| `ping_interval_seconds` | `30` | Ping interval |
| `mode` | `stream` | `stream` or `batch` |
| `idle_timeout_seconds` | `5` | Batch-mode idle timeout |

Namespace: `websocket.{url_host}`
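A TLS connection with an authentication header might look like the following. A sketch; the URL, header name, and token variable are placeholders:

```yaml
source:
  kind: websocket
  url: "wss://stream.example.com/feed"
  headers:
    Authorization: "Bearer ${WS_TOKEN}"
  mode: batch
  idle_timeout_seconds: 10
```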
## HTTP & Network
### HTTP Client

Fetches data from an HTTP endpoint, either as a one-shot request or on a periodic polling schedule.

```yaml
source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  scrape_interval_seconds: 60
```

| Field | Default | Description |
|---|---|---|
| `url` | (required) | HTTP endpoint URL |
| `method` | `GET` | HTTP method (GET, POST, PUT) |
| `headers` | | Map of additional request headers |
| `body` | | Request body string |
| `auth.strategy` | | `basic` or `bearer` |
| `auth.user` / `auth.password` | | Credentials for basic auth |
| `auth.token` | | Token for bearer auth |
| `scrape_interval_seconds` | | Polling interval; omit for a one-shot request |
| `scrape_timeout_seconds` | `5` | Request timeout |

Namespace: `http.{url_host}`
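Polling a bearer-authenticated API every five minutes might look like the following. A sketch; the URL and token variable are placeholders, and the nested `auth` mapping mirrors the dotted `auth.*` field names in the table above:

```yaml
source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  auth:
    strategy: bearer
    token: ${API_TOKEN}
  scrape_interval_seconds: 300
  scrape_timeout_seconds: 10
```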
### HTTP Server

Listens for incoming HTTP POST requests and ingests their bodies.

```yaml
source:
  kind: http_server
  listen_address: "0.0.0.0:8080"
  path: "/"
```

| Field | Default | Description |
|---|---|---|
| `listen_address` | `0.0.0.0:8080` | Address to bind the HTTP server |
| `path` | `/` | URL path to listen on |
| `auth_token` | | Optional Bearer token for authentication |

Namespace: `http_server.{path}`
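A token-protected endpoint might look like the following. A sketch; the path and token variable are placeholders, and clients presumably supply the token as `Authorization: Bearer <token>`, per the table above:

```yaml
source:
  kind: http_server
  listen_address: "0.0.0.0:8080"
  path: "/ingest"
  auth_token: ${INGEST_TOKEN}
```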
### Socket (TCP/UDP/Unix)

Listens on a TCP, UDP, or Unix socket for incoming data.

```yaml
source:
  kind: socket
  mode: tcp
  address: "0.0.0.0:9000"
```

| Field | Default | Description |
|---|---|---|
| `mode` | (required) | `tcp`, `udp`, or `unix` |
| `address` | (required) | Bind address (`host:port` or socket path) |
| `framing` | `newline` | Frame delimiter (`newline` or `bytes`) |

Namespace: `socket.{mode}.{address}`
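A UDP listener with explicit newline framing might look like the following. A sketch; the port is a placeholder:

```yaml
source:
  kind: socket
  mode: udp
  address: "0.0.0.0:9000"
  framing: newline
```

For `mode: unix`, `address` is a filesystem socket path (e.g. `/tmp/skippr.sock`) rather than `host:port`.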
### StatsD

Listens for StatsD metrics over UDP and converts them to JSON.

```yaml
source:
  kind: statsd
  listen_address: "0.0.0.0:8125"
```

| Field | Default | Description |
|---|---|---|
| `listen_address` | `0.0.0.0:8125` | UDP address to listen on |

Namespace: `statsd`
## Other
### Local File

Reads data from local files.

```yaml
source:
  kind: file
  path: "/data/events.json"
```
### Stdin

Reads data from standard input.

```yaml
source:
  kind: stdin
```