skippr vector
The skippr vector command group handles vector store workflows that do not go through skippr model. Today the only subcommand is ingest-docs, which walks the file sets declared in skippr.yml, chunks the text, calls the hosted embed API, and upserts the resulting vectors into tenant Lance storage on S3 (same keyspace layout as the data-engineer suite).
Public read copies of those vectors (for example marketing-site knowledge) are a separate sync step from your tenant prefix to the public vectors bucket; that is typically done in CI with a dedicated publish role, not by this CLI command alone.
Subcommands
| Subcommand | Purpose |
|---|---|
| skippr vector ingest-docs | Chunk, embed, and upsert documentation (or similar text files) into Lance under your tenant bucket. |
Usage
skippr [--config <path>] [--log [level]] vector ingest-docs \
[--vector-source <key>] \
[--src-path <dir>] \
[--chunk-chars <n>] \
[--chunk-overlap <n>] \
[--include-glob <pattern>]... \
[--exclude-glob <pattern>]... \
[--dry-run] \
[--output text|json]

Configuration (skippr.yml)
Discovery rules live entirely in YAML — the binary does not hard-code paths or extensions.
- vector_sources: map of named sources. Each entry includes at least root (directory relative to the config file) and include (non-empty list of globs relative to root). Optional: exclude, extensions, per-source chunk_chars / chunk_overlap.
- vector_ingest (optional): defaults for default_vector_source, chunk_chars, and chunk_overlap.
- project (top-level): becomes the React project_id in the Lance URI.
- skippr.workspace: workspace segment in the Lance path (same convention as the engine config).
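
To make the keys above concrete, here is a minimal sketch of what the relevant part of skippr.yml might look like. Only the key names come from this page; the source name docs, every value, the format of the extensions entries, and the reading of skippr.workspace as a nested key are assumptions for illustration.

```yaml
# Hypothetical skippr.yml fragment; names and values are illustrative only.
project: acme-docs              # top-level project
skippr:
  workspace: default            # assumes skippr.workspace is a nested key

vector_ingest:                  # optional defaults
  default_vector_source: docs
  chunk_chars: 1200
  chunk_overlap: 200

vector_sources:
  docs:                         # hypothetical source name
    root: ./docs                # relative to the config file
    include:                    # non-empty list, relative to root
      - "**/*.md"
    exclude:                    # optional
      - "**/drafts/**"
    extensions: [md]            # optional; entry format is assumed
    chunk_chars: 800            # optional per-source override
```

With a single source defined and a default set, ingest-docs should need no --vector-source flag for this layout.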
There is no data_sink or warehouse requirement on this path; you still need Skippr authentication and a positive balance for embeddings.
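
When several vector_sources entries exist, each run has to settle on one of them. The sketch below is one plausible reading of that selection rule, not the binary's actual implementation: an explicit --vector-source value wins, then vector_ingest.default_vector_source, then the only defined source if there is exactly one. The function name resolve_vector_source and the error messages are hypothetical.

```python
from typing import Optional


def resolve_vector_source(config: dict, flag: Optional[str] = None) -> str:
    """Pick the vector_sources key for a run (assumed precedence, see lead-in)."""
    sources = config.get("vector_sources") or {}
    if not sources:
        raise ValueError("no vector_sources defined in config")

    # Explicit flag first, then the configured default.
    default = (config.get("vector_ingest") or {}).get("default_vector_source")
    key = flag or default

    if key is None:
        # No flag and no default: only unambiguous with a single source.
        if len(sources) == 1:
            return next(iter(sources))
        raise ValueError(
            "multiple vector_sources defined; pass --vector-source "
            "or set vector_ingest.default_vector_source"
        )

    if key not in sources:
        raise KeyError(f"unknown vector source: {key}")
    return key
```

For example, a config with sources docs and blog and no default would raise unless the flag is supplied, which matches the "required when multiple sources exist" rule.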
Flags
| Flag | Description |
|---|---|
| --vector-source <key> | Which vector_sources entry to use. Required when multiple sources exist and no vector_ingest.default_vector_source is set. |
| --src-path <dir> | Override the scan root for this run (default: root from the selected source). |
| --chunk-chars <n> | Override chunk size (characters). YAML / vector_ingest defaults apply when omitted. |
| --chunk-overlap <n> | Override overlap between chunks. |
| --include-glob <pattern> | Extra include glob (repeatable); merged after YAML includes. |
| --exclude-glob <pattern> | Extra exclude glob (repeatable); merged after YAML excludes. |
| --dry-run | Resolve files and count chunks only; no embed or Lance writes. |
| --output json | Structured summary (dry run or post-ingest metadata including resolved Lance prefix). |
| --output text | Human-readable progress. |
Global flags: --config, --log (same as other commands).
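
To show how --chunk-chars and --chunk-overlap interact, here is a minimal Python sketch of fixed-size character chunking with overlap. This page does not document the chunker's actual algorithm, so treat this as an assumption: a plain sliding window in which each chunk starts chunk_chars - chunk_overlap characters after the previous one. The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_chars: int, chunk_overlap: int) -> list:
    """Split text into windows of chunk_chars that share chunk_overlap characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be in [0, chunk_chars)")

    step = chunk_chars - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        # Stop once a window reaches the end, to avoid a trailing
        # chunk that only repeats the overlap.
        if start + chunk_chars >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_chars=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

Under this model a larger overlap yields more chunks for the same text, and therefore more embed calls, which is worth checking with --dry-run before a real ingest.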
Authentication
Authentication works the same as for skippr model: skippr user login, SKIPPR_API_KEY in CI, and /auth/credentials for tenant S3 and LLM access. The optional API fields knowledge_credentials and public_vectors_bucket apply to reading published public vectors in apps, not to ingest-docs writes (ingest uses the primary tenant credentials).
See also
- Config file: vector_sources and vector_ingest sections.
- skippr model: warehouse-backed modeling (separate from doc vector ingest).
