Source Connectors

Skippr supports a wide range of source connectors for extracting data from databases, object stores, streaming platforms, APIs, and more.

Databases

MSSQL

Reads data from Microsoft SQL Server tables.

source:
  kind: mssql
  connection_string: ${MSSQL_CONNECTION_STRING}

| Field | Default | Description |
|---|---|---|
| connection_string | (required) | ADO.NET connection string |
| tables | (auto-discover) | Optional list of tables to ingest |
| batch_size_rows | 10000 | Rows per ingest batch |

Namespace: mssql.{database}.{schema}.{table}

See Connect: MSSQL for a step-by-step setup guide.


MySQL

Reads data from MySQL tables.

source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}

| Field | Default | Description |
|---|---|---|
| connection_string | (required) | MySQL connection string (e.g. mysql://user:pass@host:3306/db) |
| tables | (auto-discover) | Optional list of schema.table names to ingest |
| batch_size_rows | 10000 | Rows per ingest batch |

Namespace: mysql.{database}.{schema}.{table}
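
To limit ingestion to specific tables, tables and batch_size_rows can be set together. A minimal sketch, assuming tables takes a YAML list; the shop.orders and shop.customers names are hypothetical:

```yaml
source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}
  # Hypothetical schema.table names; omit `tables` to auto-discover.
  tables:
    - shop.orders
    - shop.customers
  # Smaller batches than the documented default of 10000.
  batch_size_rows: 5000
```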


PostgreSQL

Reads data from PostgreSQL tables.

source:
  kind: postgres
  host: localhost
  port: 5432
  user: postgres
  password: ${POSTGRES_PASSWORD}
  database: mydb

| Field | Default | Description |
|---|---|---|
| host | localhost | Postgres host |
| port | 5432 | Postgres port |
| user | postgres | Username |
| password | | Password |
| database | | Database name |
| connection_string | | Full connection string (overrides individual fields) |
| tables | (auto-discover) | Optional list of tables to read |
| query | | Custom SQL query (overrides tables) |
| batch_size_rows | 10000 | Rows per ingest batch |

Namespace: postgres.{table_name}
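
Because connection_string overrides the individual connection fields and query overrides tables, a single-query extraction might look like the sketch below. The POSTGRES_CONNECTION_STRING variable name and the SQL statement are illustrative assumptions, not values defined by Skippr:

```yaml
source:
  kind: postgres
  # Assumed environment variable holding the full connection string.
  connection_string: ${POSTGRES_CONNECTION_STRING}
  # Hypothetical query; when set, `tables` is ignored.
  query: "SELECT id, status, updated_at FROM orders WHERE status = 'open'"
```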


ClickHouse

Reads from ClickHouse via the HTTP API.

source:
  kind: clickhouse_source
  url: http://localhost:8123
  database: default
  user: default
  password: ${CLICKHOUSE_PASSWORD}
  tables:
    - events

| Field | Default | Description |
|---|---|---|
| url | http://localhost:8123 | ClickHouse HTTP URL |
| database | | Database name |
| user | default | Username |
| password | | Password |
| tables | (optional) | Tables to extract |
| query | | Custom SQL query (overrides tables) |

Namespace: clickhouse.{database}.{table}


MotherDuck

Reads from a MotherDuck database.

source:
  kind: motherduck_source
  motherduck_token: ${MOTHERDUCK_TOKEN}
  database: my_database
  tables:
    - raw.events

| Field | Default | Description |
|---|---|---|
| motherduck_token | (required) | MotherDuck token (or set MOTHERDUCK_TOKEN) |
| database | | Database name |
| tables | (optional) | Tables to extract |
| query | | Custom SQL query (overrides tables) |

Delta Lake

Reads a Delta Lake table from its URI (for example on S3, ADLS, or the local filesystem).

source:
  kind: delta_lake
  table_uri: "s3://my-bucket/path/to/table"
  storage_options:
    AWS_REGION: us-east-1
  version: 5

| Field | Default | Description |
|---|---|---|
| table_uri | (required) | Path to the Delta table (e.g. s3://, abfss://, file path) |
| storage_options | | Key/value options for the object store (credentials, region, etc.) |
| version | | Optional table version to read |
| filter | | Optional predicate filter expression |

Redshift

Reads data from Amazon Redshift via the Data API.

source:
  kind: redshift
  cluster_identifier: my-cluster
  database: analytics
  db_user: admin
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| cluster_identifier | | Redshift cluster identifier |
| workgroup_name | | Serverless workgroup (alternative to cluster) |
| database | (required) | Database name |
| db_user | | Database user (for cluster mode) |
| tables | | List of tables to read |
| query | | Custom SQL query |
| region | | AWS region |

Namespace: redshift.{database}.{table_name}
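
For Redshift Serverless, workgroup_name takes the place of cluster_identifier. The sketch below uses a hypothetical workgroup and table name, assumes tables accepts a YAML list, and leaves out db_user because the table lists it for cluster mode:

```yaml
source:
  kind: redshift
  # Hypothetical serverless workgroup; replaces cluster_identifier.
  workgroup_name: my-workgroup
  database: analytics
  region: us-east-1
  tables:
    - page_views   # hypothetical table name
```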


MongoDB

Reads documents from a MongoDB collection, converting BSON to JSON.

source:
  kind: mongodb
  connection_string: "mongodb://localhost:27017"
  database: mydb
  collection: events

| Field | Default | Description |
|---|---|---|
| connection_string | (required) | MongoDB connection URI |
| database | (required) | Database name |
| collection | (required) | Collection name |
| filter | | Optional JSON filter document |
| batch_size_rows | | Rows per batch |

Namespace: mongodb.{database}.{collection}
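
The filter field narrows which documents are read. A sketch assuming the filter is supplied as a JSON string in MongoDB query syntax; the status field and the batch size are illustrative:

```yaml
source:
  kind: mongodb
  connection_string: "mongodb://localhost:27017"
  database: mydb
  collection: events
  # Assumed to be a JSON string; selects documents whose status is "active".
  filter: '{"status": "active"}'
  batch_size_rows: 1000
```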


DynamoDB

Reads items from an Amazon DynamoDB table.

source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| table_name | (required) | DynamoDB table name |
| region | | AWS region |
| endpoint_url | | Custom endpoint (e.g. LocalStack) |

Namespace: dynamodb.{table_name}
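
For local testing, endpoint_url can point at an emulator instead of AWS. This sketch assumes LocalStack is running on its default edge port, 4566:

```yaml
source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1
  # LocalStack's default edge endpoint; adjust if your setup differs.
  endpoint_url: "http://localhost:4566"
```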


Object Stores

S3

Reads files from an Amazon S3 bucket.

source:
  kind: s3
  s3_bucket: my-bucket
  s3_prefix: data/

| Field | Default | Description |
|---|---|---|
| s3_bucket | (required) | S3 bucket name |
| s3_prefix | | Key prefix for filtering objects |
| region | | AWS region |
| endpoint_url | | Custom endpoint |

See Connect: S3 for a step-by-step setup guide.


SFTP

Downloads files from an SFTP server.

source:
  kind: sftp
  host: sftp.example.com
  username: user
  password: ${SFTP_PASSWORD}
  remote_path: "/data/*.json"

| Field | Default | Description |
|---|---|---|
| host | (required) | SFTP server hostname |
| port | 22 | SSH port |
| username | (required) | SSH username |
| password | | Password authentication |
| private_key_path | | Path to SSH private key |
| remote_path | (required) | Remote file path or glob |

Namespace: sftp.{filename}
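
Key-based authentication uses private_key_path in place of password. A sketch with a hypothetical key location and a non-default port chosen for illustration:

```yaml
source:
  kind: sftp
  host: sftp.example.com
  # Non-default SSH port, for illustration only.
  port: 2222
  username: user
  # Hypothetical path to the private key on the machine running Skippr.
  private_key_path: "/home/user/.ssh/id_ed25519"
  remote_path: "/data/*.json"
```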


Streaming & Messaging

Kafka

Consumes messages from a Kafka topic.

source:
  kind: kafka
  brokers: "localhost:9092"
  topic: events
  group_id: skippr-consumer

| Field | Default | Description |
|---|---|---|
| brokers | (required) | Kafka bootstrap servers |
| topic | (required) | Topic to consume |
| group_id | auto-generated | Consumer group ID |
| auto_offset_reset | earliest | earliest or latest |
| security_protocol | | Security protocol |
| sasl_mechanism | | SASL mechanism |
| sasl_username / sasl_password | | SASL credentials |
| mode | stream | stream or batch |
| idle_timeout_seconds | 5 | Batch mode idle timeout |

Namespace: kafka.{topic}
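
A bounded run against a secured cluster could combine batch mode with the SASL fields. The sketch below assumes the standard Kafka values SASL_SSL and PLAIN are accepted unchanged; the broker address and the KAFKA_USERNAME and KAFKA_PASSWORD variable names are hypothetical:

```yaml
source:
  kind: kafka
  brokers: "broker-1.example.com:9093"
  topic: events
  group_id: skippr-consumer
  # Start from new messages instead of the documented default (earliest).
  auto_offset_reset: latest
  security_protocol: SASL_SSL
  sasl_mechanism: PLAIN
  sasl_username: ${KAFKA_USERNAME}
  sasl_password: ${KAFKA_PASSWORD}
  # Batch mode with a longer idle timeout than the default of 5 seconds.
  mode: batch
  idle_timeout_seconds: 10
```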


SQS

Consumes messages from an Amazon SQS queue.

source:
  kind: sqs
  queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-queue"
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| queue_url | (required) | SQS queue URL |
| region | | AWS region |
| endpoint_url | | Custom endpoint |

Kinesis

Consumes records from an Amazon Kinesis stream.

source:
  kind: kinesis
  stream_name: my-stream
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| stream_name | (required) | Kinesis stream name |
| region | | AWS region |

AMQP (RabbitMQ)

Consumes messages from an AMQP queue.

source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events

| Field | Default | Description |
|---|---|---|
| connection_string | (required) | AMQP connection URI |
| queue | (required) | Queue name |
| exchange | | Exchange to bind to |
| routing_key | | Routing key for binding |
| consumer_tag | auto-generated | Consumer tag |
| prefetch_count | 10 | Prefetch count |
| mode | stream | stream or batch |
| idle_timeout_seconds | 5 | Batch mode idle timeout |

Namespace: amqp.{queue}
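
When the queue should be bound to an exchange, exchange and routing_key are set alongside queue. A sketch with a hypothetical exchange name and binding key, and a larger prefetch window than the default of 10:

```yaml
source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events
  # Hypothetical exchange and binding key.
  exchange: app.events
  routing_key: "order.*"
  prefetch_count: 50
```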


SNS

Consumes AWS SNS messages via an SQS subscription.

source:
  kind: sns
  topic_arn: "arn:aws:sns:us-east-1:123456:my-topic"
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-sns-queue"
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| topic_arn | (required) | SNS topic ARN |
| sqs_queue_url | (required) | SQS queue URL subscribed to the topic |
| region | | AWS region |
| endpoint_url | | Custom endpoint |

Namespace: sns.{topic_name}


EventBridge

Consumes AWS EventBridge events via an SQS queue target.

source:
  kind: eventbridge
  event_bus_name: my-bus
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-eb-queue"
  region: us-east-1

| Field | Default | Description |
|---|---|---|
| event_bus_name | (required) | EventBridge bus name |
| sqs_queue_url | (required) | SQS queue URL receiving events |
| region | | AWS region |
| endpoint_url | | Custom endpoint |

Namespace: eventbridge.{event_bus_name}


MQTT

Subscribes to an MQTT topic and ingests messages.

source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  port: 1883
  topic: "sensors/temperature"

| Field | Default | Description |
|---|---|---|
| broker_url | (required) | MQTT broker hostname |
| port | 1883 | Broker port |
| topic | (required) | Topic to subscribe to |
| client_id | auto-generated | MQTT client ID |
| qos | 1 | Quality of Service (0, 1, 2) |
| username / password | | Optional broker credentials |
| mode | stream | stream or batch |
| idle_timeout_seconds | 5 | Batch mode idle timeout |

Namespace: mqtt.{topic}
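
An authenticated subscription with a fixed client ID and the highest QoS level might be configured as follows. The client ID, username, and MQTT_PASSWORD variable name are hypothetical:

```yaml
source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  port: 1883
  topic: "sensors/temperature"
  # Fixed client ID instead of the auto-generated one.
  client_id: skippr-sensor-reader
  # QoS 2 instead of the documented default of 1.
  qos: 2
  username: sensor-reader
  password: ${MQTT_PASSWORD}
```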


WebSocket

Connects to a WebSocket server and ingests received messages.

source:
  kind: websocket
  url: "ws://localhost:8080/stream"

| Field | Default | Description |
|---|---|---|
| url | (required) | WebSocket URL (ws:// or wss://) |
| headers | | Additional request headers |
| ping_interval_seconds | 30 | Ping interval |
| mode | stream | stream or batch |
| idle_timeout_seconds | 5 | Batch mode idle timeout |

Namespace: websocket.{url_host}
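
A TLS connection that sends an extra header on the handshake could look like the sketch below. It assumes headers is a map of header names to values and that environment variables are interpolated inside quoted strings; the URL and the WS_TOKEN variable are hypothetical:

```yaml
source:
  kind: websocket
  url: "wss://stream.example.com/feed"
  # Assumed map form; the header value and variable name are illustrative.
  headers:
    Authorization: "Bearer ${WS_TOKEN}"
  # Ping more often than the documented default of 30 seconds.
  ping_interval_seconds: 15
```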


HTTP & Network

HTTP Client

Fetches data from an HTTP endpoint, either as a one-shot request or by polling at a fixed interval.

source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  scrape_interval_seconds: 60

| Field | Default | Description |
|---|---|---|
| url | (required) | HTTP endpoint URL |
| method | GET | HTTP method (GET, POST, PUT) |
| headers | | Map of additional request headers |
| body | | Request body string |
| auth.strategy | | basic or bearer |
| auth.user / auth.password | | Credentials for basic auth |
| auth.token | | Token for bearer auth |
| scrape_interval_seconds | | Polling interval; omit for one-shot |
| scrape_timeout_seconds | 5 | Request timeout |

Namespace: http.{url_host}
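
The dotted auth.* field names suggest a nested auth block. A sketch under that assumption, polling every five minutes with a bearer token; the API_TOKEN variable name is hypothetical:

```yaml
source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  headers:
    Accept: application/json
  # Assumed nesting for the auth.strategy and auth.token fields.
  auth:
    strategy: bearer
    token: ${API_TOKEN}
  # Poll every 300 seconds; omit this field for a one-shot fetch.
  scrape_interval_seconds: 300
  scrape_timeout_seconds: 10
```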


HTTP Server

Listens for incoming HTTP POST requests and ingests their bodies.

source:
  kind: http_server
  listen_address: "0.0.0.0:8080"
  path: "/"

| Field | Default | Description |
|---|---|---|
| listen_address | 0.0.0.0:8080 | Address to bind the HTTP server |
| path | / | URL path to listen on |
| auth_token | | Optional Bearer token for authentication |

Namespace: http_server.{path}
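
To accept authenticated requests on a dedicated path and only on the loopback interface, the three fields can be combined as sketched here. The /ingest path and the INGEST_TOKEN variable name are hypothetical:

```yaml
source:
  kind: http_server
  # Bind to loopback only instead of all interfaces.
  listen_address: "127.0.0.1:8080"
  path: "/ingest"
  auth_token: ${INGEST_TOKEN}
```

Since auth_token is described as a Bearer token, clients would presumably send it in an Authorization: Bearer header with each POST.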


Socket (TCP/UDP/Unix)

Listens on a TCP, UDP, or Unix socket for incoming data.

source:
  kind: socket
  mode: tcp
  address: "0.0.0.0:9000"

| Field | Default | Description |
|---|---|---|
| mode | (required) | tcp, udp, or unix |
| address | (required) | Bind address (host:port or socket path) |
| framing | newline | Frame delimiter (newline or bytes) |

Namespace: socket.{mode}.{address}
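
In unix mode, address is a socket path instead of host:port. A sketch with a hypothetical socket path and the default newline framing written out:

```yaml
source:
  kind: socket
  mode: unix
  # Hypothetical socket path; the process needs permission to create it.
  address: "/var/run/skippr.sock"
  framing: newline
```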


StatsD

Listens for StatsD metrics over UDP and converts them to JSON.

source:
  kind: statsd
  listen_address: "0.0.0.0:8125"

| Field | Default | Description |
|---|---|---|
| listen_address | 0.0.0.0:8125 | UDP address to listen on |

Namespace: statsd


Other

Local File

Reads data from local files.

source:
  kind: file
  path: "/data/events.json"

Stdin

Reads data from standard input.

source:
  kind: stdin