skippr vector

The skippr vector command group handles vector store workflows that do not go through skippr model. Today the only subcommand is ingest-docs, which walks the file sets declared in skippr.yml, chunks their text, calls the hosted embed API, and upserts the resulting vectors into tenant Lance storage on S3 (same keyspace layout as the data-engineer suite).

Public read copies of those vectors (for example marketing-site knowledge) are a separate sync step from your tenant prefix to the public vectors bucket; that is typically done in CI with a dedicated publish role, not by this CLI command alone.

Subcommands

| Subcommand | Purpose |
| --- | --- |
| skippr vector ingest-docs | Chunk, embed, and upsert documentation (or similar text files) into Lance under your tenant bucket. |

Usage

```bash
skippr [--config <path>] [--log [level]] vector ingest-docs \
  [--vector-source <key>] \
  [--src-path <dir>] \
  [--chunk-chars <n>] \
  [--chunk-overlap <n>] \
  [--include-glob <pattern>]... \
  [--exclude-glob <pattern>]... \
  [--dry-run] \
  [--output text|json]
```
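For scripted use (for example a CI step that runs a dry run before the real ingest), the synopsis above can be assembled programmatically. The following is a minimal Python sketch, not part of Skippr: `build_ingest_argv` and `run_ingest` are hypothetical helper names, only flags listed in the synopsis are used, and it assumes that `--output json` prints a single JSON document to stdout (the schema of that document is not specified here).

```python
import json
import subprocess
from typing import List, Optional, Sequence


def build_ingest_argv(
    vector_source: Optional[str] = None,
    chunk_chars: Optional[int] = None,
    chunk_overlap: Optional[int] = None,
    include_globs: Sequence[str] = (),
    exclude_globs: Sequence[str] = (),
    dry_run: bool = True,
    config: Optional[str] = None,
) -> List[str]:
    """Assemble argv using only flags from the usage synopsis."""
    argv = ["skippr"]
    if config is not None:
        argv += ["--config", config]  # global flag, placed before the command group
    argv += ["vector", "ingest-docs"]
    if vector_source is not None:
        argv += ["--vector-source", vector_source]
    if chunk_chars is not None:
        argv += ["--chunk-chars", str(chunk_chars)]
    if chunk_overlap is not None:
        argv += ["--chunk-overlap", str(chunk_overlap)]
    for pattern in include_globs:  # repeatable flag
        argv += ["--include-glob", pattern]
    for pattern in exclude_globs:  # repeatable flag
        argv += ["--exclude-glob", pattern]
    if dry_run:
        argv.append("--dry-run")
    argv += ["--output", "json"]
    return argv


def run_ingest(argv: List[str]) -> dict:
    """Run the CLI and parse stdout as JSON (assumption: one JSON document)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

Defaulting `dry_run` to `True` is a design choice of this sketch: a caller has to opt in explicitly before any embed calls or Lance writes happen.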

Configuration (skippr.yml)

Discovery rules live entirely in YAML — the binary does not hard-code paths or extensions.

  • vector_sources: map of named sources. Each entry includes at least root (directory relative to the config file) and include (non-empty list of globs relative to root). Optional: exclude, extensions, per-source chunk_chars / chunk_overlap.
  • vector_ingest (optional): defaults for default_vector_source, chunk_chars, and chunk_overlap.
  • project (top-level): becomes the project_id in the Lance URI.
  • skippr.workspace: workspace segment in the Lance path (same convention as the engine config).
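The required fields listed above can be checked before running an ingest. This is a minimal Python sketch, assuming the YAML has already been loaded into a dict; `validate_vector_sources` is a hypothetical helper, the example values (`acme-docs`, `main`, the globs) are invented for illustration, and the nesting of `workspace` under a `skippr` key is inferred from the dotted name. It checks only the two rules stated above (root present, include non-empty), not anything else the CLI may enforce.

```python
from typing import Any, Dict, List


def validate_vector_sources(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in vector_sources (empty if none)."""
    sources = config.get("vector_sources")
    if not isinstance(sources, dict):
        return ["vector_sources must be a map of named sources"]
    problems = []
    for name, entry in sources.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a map")
            continue
        root = entry.get("root")
        if not isinstance(root, str) or not root:
            problems.append(f"{name}: root is required")
        include = entry.get("include")
        if not isinstance(include, list) or not include:
            problems.append(f"{name}: include must be a non-empty list of globs")
    return problems


# Hypothetical config, written as the dict a YAML loader would produce.
example = {
    "project": "acme-docs",
    "skippr": {"workspace": "main"},
    "vector_sources": {
        "docs": {
            "root": "docs",                # relative to the config file
            "include": ["**/*.md"],        # relative to root
            "exclude": ["**/drafts/**"],   # optional
            "chunk_chars": 1200,           # optional per-source override
        },
    },
    "vector_ingest": {"default_vector_source": "docs", "chunk_overlap": 200},
}

print(validate_vector_sources(example))  # prints []
```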

There is no data_sink or warehouse requirement on this path; you still need Skippr authentication and a positive balance for embeddings.

Flags

| Flag | Description |
| --- | --- |
| --vector-source <key> | Which vector_sources entry to use. Required when multiple sources exist and no vector_ingest.default_vector_source is set. |
| --src-path <dir> | Override the scan root for this run (default: root from the selected source). |
| --chunk-chars <n> | Override chunk size (characters). YAML / vector_ingest defaults apply when omitted. |
| --chunk-overlap <n> | Override overlap between chunks. |
| --include-glob <pattern> | Extra include glob (repeatable); merged after YAML includes. |
| --exclude-glob <pattern> | Extra exclude glob (repeatable); merged after YAML excludes. |
| --dry-run | Resolve files and count chunks only; no embed or Lance writes. |
| --output json | Structured summary (dry run or post-ingest metadata including resolved Lance prefix). |
| --output text | Human-readable progress. |
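The interaction between the flags and the YAML defaults can be sketched as follows. This is an illustration rather than the CLI's actual code: `select_vector_source` and `resolve_setting` are hypothetical names, and the precedence used for chunk settings (flag, then per-source value, then vector_ingest default) is an assumption; the documentation only states that a flag overrides the YAML values.

```python
from typing import Any, Dict, Optional


def select_vector_source(config: Dict[str, Any], cli_key: Optional[str] = None) -> str:
    """Pick the source key: --vector-source, then the default, then a lone source."""
    sources = config.get("vector_sources") or {}
    ingest = config.get("vector_ingest") or {}
    key = cli_key or ingest.get("default_vector_source")
    if key is None:
        if len(sources) != 1:
            raise ValueError(
                "pass --vector-source or set vector_ingest.default_vector_source"
            )
        key = next(iter(sources))  # a single source is unambiguous
    if key not in sources:
        raise KeyError(f"unknown vector source: {key}")
    return key


def resolve_setting(
    name: str,
    cli_value: Optional[int],
    source: Dict[str, Any],
    ingest_defaults: Dict[str, Any],
) -> Optional[int]:
    """Return the first value that is set. Assumed order: flag, source, defaults."""
    for value in (cli_value, source.get(name), ingest_defaults.get(name)):
        if value is not None:
            return value
    return None  # the built-in fallback is not documented, so none is guessed here


# Hypothetical config with two sources and a default.
config = {
    "vector_sources": {
        "docs": {"root": "docs", "include": ["**/*.md"], "chunk_chars": 800},
        "blog": {"root": "blog", "include": ["**/*.mdx"]},
    },
    "vector_ingest": {
        "default_vector_source": "docs",
        "chunk_chars": 1000,
        "chunk_overlap": 100,
    },
}

key = select_vector_source(config)
source = config["vector_sources"][key]
print(key, resolve_setting("chunk_chars", None, source, config["vector_ingest"]))
# prints: docs 800
```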

Global flags: --config, --log (same as other commands).
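To make the effect of --chunk-chars and --chunk-overlap concrete, here is a minimal Python sketch of fixed-size character chunking with overlap. It is a generic sliding-window splitter written for illustration; how ingest-docs actually splits text (for example, whether it respects paragraph or heading boundaries) is not described here, and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_chars: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_chars that share chunk_overlap characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_chars")
    step = chunk_chars - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_chars=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

A larger overlap keeps more shared context between neighbouring chunks but produces more chunks for the same text, and therefore more embed calls.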

Authentication

Authentication works the same as for skippr model: skippr user login, SKIPPR_API_KEY in CI, and /auth/credentials for tenant S3 and LLM credentials. The optional API fields knowledge_credentials and public_vectors_bucket apply to reading published public vectors in apps, not to ingest-docs writes (ingest uses the primary tenant credentials).

See also

  • Config file: vector_sources and vector_ingest sections.
  • skippr model — warehouse-backed modeling (separate from doc vector ingest).