Source Connectors

Skippr provides source connectors for extracting data from databases, object stores, streaming and messaging platforms, HTTP and network endpoints, local files, and standard input.

Databases

MSSQL

Reads data from Microsoft SQL Server tables.

source:
  kind: mssql
  connection_string: ${MSSQL_CONNECTION_STRING}

Field	Default	Description
`connection_string`	(required)	ADO.NET connection string
`tables`	(auto-discover)	Optional list of tables to ingest
`batch_size_rows`	`10000`	Rows per ingest batch

Namespace: mssql.{database}.{schema}.{table}

See Connect: MSSQL for a step-by-step setup guide.

MySQL

Reads data from MySQL tables.

source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}

Field	Default	Description
`connection_string`	(required)	MySQL connection string (e.g. `mysql://user:pass@host:3306/db`)
`tables`	(auto-discover)	Optional list of `schema.table` names to ingest
`batch_size_rows`	`10000`	Rows per ingest batch

Namespace: mysql.{database}.{schema}.{table}
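As an illustration of the optional fields above, the following sketch restricts ingestion to two tables and lowers the batch size. The table names `shop.orders` and `shop.customers` are hypothetical placeholders.

```yaml
source:
  kind: mysql
  connection_string: ${MYSQL_CONNECTION_STRING}
  # Hypothetical schema.table names; omit `tables` to auto-discover
  tables:
    - shop.orders
    - shop.customers
  # Smaller than the default of 10000 rows per batch
  batch_size_rows: 5000
```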

PostgreSQL

Reads data from PostgreSQL tables.

source:
  kind: postgres
  host: localhost
  port: 5432
  user: postgres
  password: ${POSTGRES_PASSWORD}
  database: mydb

Field	Default	Description
`host`	`localhost`	Postgres host
`port`	`5432`	Postgres port
`user`	`postgres`	Username
`password`		Password
`database`		Database name
`connection_string`		Full connection string (overrides individual fields)
`tables`	(auto-discover)	Optional list of tables to read
`query`		Custom SQL query (overrides tables)
`batch_size_rows`	`10000`	Rows per ingest batch

Namespace: postgres.{table_name}
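The two override fields in the table can be combined. A minimal sketch, assuming an environment variable named `POSTGRES_CONNECTION_STRING` (a hypothetical name) holds the full connection string and that an `orders` table with these columns exists:

```yaml
source:
  kind: postgres
  # Overrides host, port, user, password, and database
  connection_string: ${POSTGRES_CONNECTION_STRING}
  # Overrides `tables`; the table and column names are placeholders
  query: "SELECT id, status, created_at FROM orders WHERE created_at >= '2024-01-01'"
```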

ClickHouse

Reads from ClickHouse via the HTTP API.

source:
  kind: clickhouse_source
  url: http://localhost:8123
  database: default
  user: default
  password: ${CLICKHOUSE_PASSWORD}
  tables:
    - events

Field	Default	Description
`url`	`http://localhost:8123`	ClickHouse HTTP URL
`database`		Database name
`user`	`default`	Username
`password`		Password
`tables`	(optional)	Tables to extract
`query`		Custom SQL query (overrides `tables`)

Namespace: clickhouse.{database}.{table}

MotherDuck

Reads from a MotherDuck database.

source:
  kind: motherduck_source
  motherduck_token: ${MOTHERDUCK_TOKEN}
  database: my_database
  tables:
    - raw.events

Field	Default	Description
`motherduck_token`	(required)	MotherDuck token; can also be supplied via the `MOTHERDUCK_TOKEN` environment variable
`database`		Database name
`tables`	(optional)	Tables to extract
`query`		Custom SQL query (overrides `tables`)

Delta Lake

Reads a Delta Lake table from a URI (for example on S3, ADLS, or the local filesystem).

source:
  kind: delta_lake
  table_uri: "s3://my-bucket/path/to/table"
  storage_options:
    AWS_REGION: us-east-1
  version: 5

Field	Default	Description
`table_uri`	(required)	Path to the Delta table (e.g. `s3://`, `abfss://`, file path)
`storage_options`		Key/value options for the object store (credentials, region, etc.)
`version`		Optional table version to read
`filter`		Optional predicate filter expression

Redshift

Reads data from Amazon Redshift via the Data API.

source:
  kind: redshift
  cluster_identifier: my-cluster
  database: analytics
  db_user: admin
  region: us-east-1

Field	Default	Description
`cluster_identifier`		Redshift cluster identifier
`workgroup_name`		Serverless workgroup (alternative to cluster)
`database`	(required)	Database name
`db_user`		Database user (for cluster mode)
`tables`		List of tables to read
`query`		Custom SQL query
`region`		AWS region

Namespace: redshift.{database}.{table_name}
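For Redshift Serverless, the table above lists `workgroup_name` as the alternative to `cluster_identifier`. A minimal sketch, with a hypothetical workgroup and table name, and assuming `db_user` can be omitted because it is described as applying to cluster mode:

```yaml
source:
  kind: redshift
  # Hypothetical workgroup; used instead of cluster_identifier
  workgroup_name: my-workgroup
  database: analytics
  region: us-east-1
  tables:
    - events   # placeholder table name
```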

MongoDB

Reads documents from a MongoDB collection, converting BSON to JSON.

source:
  kind: mongodb
  connection_string: "mongodb://localhost:27017"
  database: mydb
  collection: events

Field	Default	Description
`connection_string`	(required)	MongoDB connection URI
`database`	(required)	Database name
`collection`	(required)	Collection name
`filter`		Optional JSON filter document
`batch_size_rows`		Documents per ingest batch

Namespace: mongodb.{database}.{collection}

DynamoDB

Reads items from an Amazon DynamoDB table.

source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1

Field	Default	Description
`table_name`	(required)	DynamoDB table name
`region`		AWS region
`endpoint_url`		Custom endpoint (e.g. LocalStack)

Namespace: dynamodb.{table_name}
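For local testing, `endpoint_url` can point at an emulator. A sketch, assuming LocalStack is running on the same machine on its default edge port, 4566:

```yaml
source:
  kind: dynamodb
  table_name: my-table
  region: us-east-1
  # Assumes a local LocalStack instance; omit to use the regular AWS endpoint
  endpoint_url: "http://localhost:4566"
```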

Object Stores

S3

Reads files from an Amazon S3 bucket.

source:
  kind: s3
  s3_bucket: my-bucket
  s3_prefix: data/

Field	Default	Description
`s3_bucket`	(required)	S3 bucket name
`s3_prefix`		Key prefix for filtering objects
`region`		AWS region
`endpoint_url`		Custom endpoint

See Connect: S3 for a step-by-step setup guide.

SFTP

Downloads files from an SFTP server.

source:
  kind: sftp
  host: sftp.example.com
  username: user
  password: ${SFTP_PASSWORD}
  remote_path: "/data/*.json"

Field	Default	Description
`host`	(required)	SFTP server hostname
`port`	`22`	SSH port
`username`	(required)	SSH username
`password`		Password authentication
`private_key_path`		Path to SSH private key
`remote_path`	(required)	Remote file path or glob

Namespace: sftp.{filename}
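The table lists `private_key_path` alongside `password`, which suggests key-based authentication as an alternative. A sketch with a hypothetical key location, assuming `password` may be left out when a key is given:

```yaml
source:
  kind: sftp
  host: sftp.example.com
  port: 22
  username: user
  # Hypothetical path to an SSH private key, used here instead of `password`
  private_key_path: "/home/user/.ssh/id_ed25519"
  remote_path: "/data/*.json"
```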

Streaming & Messaging

Kafka

Consumes messages from a Kafka topic.

source:
  kind: kafka
  brokers: "localhost:9092"
  topic: events
  group_id: skippr-consumer

Field	Default	Description
`brokers`	(required)	Kafka bootstrap servers
`topic`	(required)	Topic to consume
`group_id`	auto-generated	Consumer group ID
`auto_offset_reset`	`earliest`	`earliest` or `latest`
`security_protocol`		Security protocol
`sasl_mechanism`		SASL mechanism
`sasl_username` / `sasl_password`		SASL credentials
`mode`	`stream`	`stream` or `batch`
`idle_timeout_seconds`	`5`	Batch mode idle timeout

Namespace: kafka.{topic}
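A sketch of a bounded run against a SASL-secured cluster, built only from the fields above. The values `SASL_SSL` and `PLAIN` are the standard Kafka client names and are assumed, not confirmed, to be what Skippr accepts. The broker address and environment variable names are placeholders.

```yaml
source:
  kind: kafka
  brokers: "broker-1.example.com:9093"
  topic: events
  group_id: skippr-consumer
  auto_offset_reset: earliest
  # Assumed values, taken from standard Kafka client configuration
  security_protocol: SASL_SSL
  sasl_mechanism: PLAIN
  # Hypothetical environment variable names
  sasl_username: ${KAFKA_USERNAME}
  sasl_password: ${KAFKA_PASSWORD}
  # Batch mode with a longer idle timeout than the default of 5 seconds
  mode: batch
  idle_timeout_seconds: 10
```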

SQS

Consumes messages from an Amazon SQS queue.

source:
  kind: sqs
  queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-queue"
  region: us-east-1

Field	Default	Description
`queue_url`	(required)	SQS queue URL
`region`		AWS region
`endpoint_url`		Custom endpoint

Kinesis

Consumes records from an Amazon Kinesis stream.

source:
  kind: kinesis
  stream_name: my-stream
  region: us-east-1

Field	Default	Description
`stream_name`	(required)	Kinesis stream name
`region`		AWS region

AMQP (RabbitMQ)

Consumes messages from an AMQP queue.

source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events

Field	Default	Description
`connection_string`	(required)	AMQP connection URI
`queue`	(required)	Queue name
`exchange`		Exchange to bind to
`routing_key`		Routing key for binding
`consumer_tag`	auto-generated	Consumer tag
`prefetch_count`	`10`	Prefetch count
`mode`	`stream`	`stream` or `batch`
`idle_timeout_seconds`	`5`	Batch mode idle timeout

Namespace: amqp.{queue}
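The binding fields can be illustrated as follows. The exchange name and routing key are hypothetical, and a wildcard key such as `order.*` is only meaningful when the exchange is a topic exchange:

```yaml
source:
  kind: amqp
  connection_string: "amqp://guest:guest@localhost:5672"
  queue: events
  # Hypothetical exchange and binding key
  exchange: app.events
  routing_key: "order.*"
  # Raised from the default of 10
  prefetch_count: 50
```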

SNS

Consumes AWS SNS messages via an SQS subscription.

source:
  kind: sns
  topic_arn: "arn:aws:sns:us-east-1:123456:my-topic"
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-sns-queue"
  region: us-east-1

Field	Default	Description
`topic_arn`	(required)	SNS topic ARN
`sqs_queue_url`	(required)	SQS queue URL subscribed to the topic
`region`		AWS region
`endpoint_url`		Custom endpoint

Namespace: sns.{topic_name}

EventBridge

Consumes AWS EventBridge events via an SQS queue target.

source:
  kind: eventbridge
  event_bus_name: my-bus
  sqs_queue_url: "https://sqs.us-east-1.amazonaws.com/123456/my-eb-queue"
  region: us-east-1

Field	Default	Description
`event_bus_name`	(required)	EventBridge bus name
`sqs_queue_url`	(required)	SQS queue URL receiving events
`region`		AWS region
`endpoint_url`		Custom endpoint

Namespace: eventbridge.{event_bus_name}

MQTT

Subscribes to an MQTT topic and ingests messages.

source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  port: 1883
  topic: "sensors/temperature"

Field	Default	Description
`broker_url`	(required)	MQTT broker hostname
`port`	`1883`	Broker port
`topic`	(required)	Topic to subscribe to
`client_id`	auto-generated	MQTT client ID
`qos`	`1`	Quality of Service (0, 1, 2)
`username` / `password`		Optional broker credentials
`mode`	`stream`	`stream` or `batch`
`idle_timeout_seconds`	`5`	Batch mode idle timeout

Namespace: mqtt.{topic}
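A sketch that sets the optional fields from the table: a fixed client ID, QoS 2, and broker credentials. The client ID and environment variable names are hypothetical.

```yaml
source:
  kind: mqtt
  broker_url: "mqtt.example.com"
  port: 1883
  topic: "sensors/temperature"
  # Hypothetical client ID; auto-generated if omitted
  client_id: skippr-sensors
  # Default is 1
  qos: 2
  # Hypothetical environment variable names
  username: ${MQTT_USERNAME}
  password: ${MQTT_PASSWORD}
```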

WebSocket

Connects to a WebSocket server and ingests received messages.

source:
  kind: websocket
  url: "ws://localhost:8080/stream"

Field	Default	Description
`url`	(required)	WebSocket URL (`ws://` or `wss://`)
`headers`		Additional request headers
`ping_interval_seconds`	`30`	Ping interval
`mode`	`stream`	`stream` or `batch`
`idle_timeout_seconds`	`5`	Batch mode idle timeout

Namespace: websocket.{url_host}
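A sketch of a TLS connection with a custom header, run in batch mode. It assumes `headers` takes a map of header names to values, as the HTTP Client source describes for its own `headers` field. The URL and the `WS_AUTH_HEADER` variable are placeholders.

```yaml
source:
  kind: websocket
  url: "wss://stream.example.com/feed"
  # Assumed shape: a map of header name to value
  headers:
    Authorization: ${WS_AUTH_HEADER}
  # Default is 30
  ping_interval_seconds: 15
  mode: batch
  idle_timeout_seconds: 10
```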

HTTP & Network

HTTP Client

Fetches data from an HTTP endpoint, either once or by polling at a fixed interval.

source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  scrape_interval_seconds: 60

Field	Default	Description
`url`	(required)	HTTP endpoint URL
`method`	`GET`	HTTP method (GET, POST, PUT)
`headers`		Map of additional request headers
`body`		Request body string
`auth.strategy`		`basic` or `bearer`
`auth.user` / `auth.password`		Credentials for basic auth
`auth.token`		Token for bearer auth
`scrape_interval_seconds`		Polling interval; omit for one-shot
`scrape_timeout_seconds`	`5`	Request timeout

Namespace: http.{url_host}
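A sketch of periodic polling with bearer authentication. It assumes the dotted names in the table (`auth.strategy`, `auth.token`) correspond to keys nested under an `auth` map, which the table does not state explicitly. The `API_TOKEN` variable is a hypothetical name.

```yaml
source:
  kind: http_client
  url: "https://api.example.com/data"
  method: GET
  headers:
    Accept: application/json
  # Assumed nesting for the auth.* fields
  auth:
    strategy: bearer
    token: ${API_TOKEN}
  # Poll every five minutes; omit for a one-shot fetch
  scrape_interval_seconds: 300
  # Default is 5
  scrape_timeout_seconds: 10
```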

HTTP Server

Listens for incoming HTTP POST requests and ingests their bodies.

source:
  kind: http_server
  listen_address: "0.0.0.0:8080"
  path: "/"

Field	Default	Description
`listen_address`	`0.0.0.0:8080`	Address to bind the HTTP server
`path`	`/`	URL path to listen on
`auth_token`		Optional Bearer token for authentication

Namespace: http_server.{path}
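A sketch that binds to the loopback interface only, listens on a dedicated path, and requires a token. The path and the `INGEST_TOKEN` variable are hypothetical. Clients would presumably send the token in an `Authorization: Bearer` header, though the exact check is not described here.

```yaml
source:
  kind: http_server
  # Loopback only, instead of the default 0.0.0.0:8080
  listen_address: "127.0.0.1:8080"
  # Hypothetical path
  path: "/ingest"
  auth_token: ${INGEST_TOKEN}
```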

Socket (TCP/UDP/Unix)

Listens on a TCP, UDP, or Unix socket for incoming data.

source:
  kind: socket
  mode: tcp
  address: "0.0.0.0:9000"

Field	Default	Description
`mode`	(required)	`tcp`, `udp`, or `unix`
`address`	(required)	Bind address (host:port or socket path)
`framing`	`newline`	Frame delimiter (`newline` or `bytes`)

Namespace: socket.{mode}.{address}
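For `unix` mode, `address` is a socket path rather than a host and port. A sketch with a hypothetical path:

```yaml
source:
  kind: socket
  mode: unix
  # Hypothetical socket path
  address: "/tmp/skippr.sock"
  # Default framing: one record per newline-delimited frame
  framing: newline
```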

StatsD

Listens for StatsD metrics over UDP and converts them to JSON.

source:
  kind: statsd
  listen_address: "0.0.0.0:8125"

Field	Default	Description
`listen_address`	`0.0.0.0:8125`	UDP address to listen on

Namespace: statsd

Other

Local File

Reads data from local files.

source:
  kind: file
  path: "/data/events.json"

Stdin

Reads data from standard input.

source:
  kind: stdin