---
title: Astropods Spec
subtitle: Declarative YAML format for AI agent topology
slug: astropods-package-spec
---

## Abstract

The Astropods Spec defines a declarative YAML format for describing the topology of an AI agent — its container, model dependencies, knowledge stores, integrations, and data ingestion pipelines.

The spec is consumed by build tools and deployment servers; it intentionally excludes runtime, orchestration, and deployment-environment concerns. At deploy time, the platform combines this spec with runtime configuration (credentials, interfaces, schedules) to produce a resolved deployment spec, which is then translated into infrastructure manifests.

## Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

***

## 1. Introduction

An Astropods Spec file (`astropods.yml`) is a YAML document that declares:

* The agent's container image (pre-built or build-from-source).
* Components the agent depends on — models, knowledge stores, and integrations — each supplied by either a platform-managed provider or a user-managed container.
* Custom providers for external API services that require credential injection.
* Data ingestion pipelines with trigger semantics.
* Local development overrides.

A central concept is **provider binding**. Components that an agent depends on — models, knowledge stores, and integrations — are each declared as either a **provider** reference or a **container** definition:

* A **provider** is a named, platform-known service (e.g. `ollama`, `anthropic`, `qdrant`). The platform resolves each provider to one of two kinds:
  * **Self-hosted** — the platform deploys and manages a container on the agent's behalf.
  * **Cloud** — the platform injects credentials for an external API.
* A **container** gives the user full control over the image, port, and configuration.

This design lets authors mix managed and custom components freely within a single spec.

The spec does **not** cover resource limits (CPU/memory), observability, rate limits, budgets, security policies, deployment region, or interface routing (Slack, web). These are deployment-time concerns configured separately.

The document format is YAML. Implementations MUST accept files named `astropods.yml`.

***

## 2. Top-Level Structure

A conforming document contains the following top-level fields. Fields marked **REQUIRED** MUST be present; the others are OPTIONAL.

| Field          | Type                  | Required     | Description                                                         |
| -------------- | --------------------- | ------------ | ------------------------------------------------------------------- |
| `spec`         | string                | **REQUIRED** | Spec version identifier. MUST be `package/v1`.                      |
| `name`         | string                | **REQUIRED** | Unique agent name.                                                  |
| `meta`         | object                | **REQUIRED** | Agent metadata. See Section 2.1.                                    |
| `agent`        | object                | **REQUIRED** | Agent definition. See Section 3.                                    |
| `models`       | map                   | OPTIONAL     | Model entries (LLMs, embedding models, etc.). See Section 4.1.      |
| `knowledge`    | map                   | OPTIONAL     | Knowledge store entries. See Section 4.2.                           |
| `integrations` | map                   | OPTIONAL     | Integration entries. See Section 4.3.                               |
| `providers`    | map                   | OPTIONAL     | Custom provider entries. See Section 5.                             |
| `inputs`       | map\<string, Input\>  | OPTIONAL     | User-supplied inputs injected into every container at deploy time. |
| `ingestion`    | map                   | OPTIONAL     | Data ingestion pipeline entries. See Section 6.                     |
| `dev`          | object                | OPTIONAL     | Local development overrides. See Section 7.                         |

Map keys serve as entry names and are used when deriving the keys of injected environment variables (see [Section 8](#8-environment-variable-injection-model)).

### 2.1 Meta

| Field         | Type      | Required | Description                       |
| ------------- | --------- | -------- | --------------------------------- |
| `description` | string    | OPTIONAL | Human-readable agent description. |
| `tags`        | string\[] | OPTIONAL | Classification tags.              |

***

## 3. Agent

The `agent` object defines the agent's primary service — its container image or build configuration.

| Field         | Type        | Required    | Description                                                            |
| ------------- | ----------- | ----------- | ---------------------------------------------------------------------- |
| `image`       | string      | Conditional | Pre-built container image reference.                                   |
| `build`       | BuildConfig | Conditional | Build-from-source configuration.                                       |
| `distributed` | boolean     | OPTIONAL    | Whether the agent supports multi-replica deployment. Default: `false`. |
| `healthcheck` | Healthcheck | OPTIONAL    | Health check configuration.                                            |
| `inputs`      | Input\[]    | OPTIONAL    | User-supplied inputs injected into the agent container.                |

The `agent` object MUST specify exactly one of `image` or `build`. Providing both or neither is invalid.

### 3.1 BuildConfig

| Field        | Type                   | Required     | Description                                    |
| ------------ | ---------------------- | ------------ | ---------------------------------------------- |
| `context`    | string                 | **REQUIRED** | Build context path.                            |
| `dockerfile` | string                 | **REQUIRED** | Path to the Dockerfile, relative to `context`. |
| `target`     | string                 | OPTIONAL     | Multi-stage build target.                      |
| `args`       | map\<string, string\>  | OPTIONAL     | Build arguments passed to the builder.         |
| `secrets`    | BuildSecret\[]         | OPTIONAL     | Build-time secrets.                            |

#### BuildSecret

| Field | Type   | Required     | Description                                           |
| ----- | ------ | ------------ | ----------------------------------------------------- |
| `id`  | string | **REQUIRED** | Secret identifier used in `--mount=type=secret,id=`.  |
| `env` | string | OPTIONAL     | Environment variable to source the secret value from. |

### 3.2 Healthcheck

Applies to the agent definition and to any `ContainerConfig.healthcheck` in component sections.

| Field      | Type      | Required | Description                                                        |
| ---------- | --------- | -------- | ------------------------------------------------------------------ |
| `test`     | string\[] | OPTIONAL | Custom health check command (e.g. `["CMD", "redis-cli", "ping"]`). |
| `path`     | string    | OPTIONAL | HTTP path for health check (auto-generates a `curl` test command). |
| `interval` | string    | OPTIONAL | Check interval. Default: `10s`.                                    |
| `timeout`  | string    | OPTIONAL | Response timeout. Default: `5s`.                                   |
| `retries`  | integer   | OPTIONAL | Consecutive failures before unhealthy. Default: `3`.               |

Implementations SHOULD support both `test` (exec-based) and `path` (HTTP-based) health checks. When `path` is provided, the implementation SHOULD generate an equivalent `test` command.

***

## 4. Component Sections: Models, Knowledge, Integrations

Models, knowledge stores, and integrations share a unified provider binding scheme. Each entry operates in exactly one of two modes:

* **Provider mode** — the entry specifies a `provider` string. The platform resolves this to either a self-hosted provider (deploys a container from its registry) or a cloud provider (injects credentials).
* **Container mode** — the entry specifies a `container` object. The user manages the image, port, and configuration.

These modes are **mutually exclusive**: an entry MUST specify exactly one of `provider` or `container`. Providing both or neither is invalid.

### 4.1 Models

The `models` section declares AI models the agent consumes — LLMs (e.g. Claude, GPT, Llama), embedding models, or any model served behind an inference API.

Each entry in the `models` map:

| Field       | Type            | Required    | Description                                                                             |
| ----------- | --------------- | ----------- | --------------------------------------------------------------------------------------- |
| `provider`  | string          | Conditional | Platform-managed provider name (e.g. `ollama`, `anthropic`).                            |
| `model`     | string          | OPTIONAL    | Provider-specific model identifier (e.g. `llama3.2`). Only meaningful in provider mode. |
| `container` | ContainerConfig | Conditional | Custom container configuration.                                                         |
| `inputs`    | Input\[]        | OPTIONAL    | User-supplied inputs injected into the model's container.                               |
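
The snippet below sketches the three ways a `models` entry can be bound: to a cloud provider, to a self-hosted provider, or to a user-managed container. The `reranker` entry name and its image reference are hypothetical placeholders; the commented-out `broken` entry illustrates the mutual-exclusion rule and would be rejected by a conforming validator.

```yaml
models:
  # Provider mode, cloud provider: the platform injects credentials
  # (for anthropic, ANTHROPIC_API_KEY per Section 8.1).
  primary:
    provider: anthropic

  # Provider mode, self-hosted provider: the platform deploys the
  # registry container; `model` selects what it serves.
  local_llm:
    provider: ollama
    model: llama3.2

  # Container mode: the author owns the image and port.
  # Hypothetical image; the agent would receive MODEL_RERANKER_HOST,
  # MODEL_RERANKER_PORT, and MODEL_RERANKER_URL (Section 8.3).
  reranker:
    container:
      image: registry.example.com/reranker:0.1.0
      port: 8080
      healthcheck:
        path: /health

  # INVALID (left commented out): both provider and container are set.
  # broken:
  #   provider: ollama
  #   container:
  #     image: ollama/ollama:latest
```

Because `model` is only meaningful in provider mode, it is omitted from the container-mode entry.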

### 4.2 Knowledge

Each entry in the `knowledge` map:

| Field        | Type            | Required    | Description                                                         |
| ------------ | --------------- | ----------- | ------------------------------------------------------------------- |
| `provider`   | string          | Conditional | Platform-managed provider name (e.g. `qdrant`, `pinecone`).         |
| `container`  | ContainerConfig | Conditional | Custom container configuration.                                     |
| `persistent` | boolean         | OPTIONAL    | Whether data SHOULD be persisted across restarts. Default: `false`. |
| `inputs`     | Input\[]        | OPTIONAL    | User-supplied inputs injected into the knowledge container.         |

When `persistent` is `true`, the platform SHOULD provision durable storage for the entry regardless of mode.

### 4.3 Integrations

The `integrations` section declares services the agent invokes to perform actions or retrieve data. Integrations can be HTTP APIs, MCP (Model Context Protocol) servers, or any service exposed over a network port.

Each entry in the `integrations` map:

| Field       | Type            | Required    | Description                                                     |
| ----------- | --------------- | ----------- | --------------------------------------------------------------- |
| `provider`  | string          | Conditional | Platform-managed provider name (e.g. `github`, `gitlab`).       |
| `container` | ContainerConfig | Conditional | Custom container configuration.                                 |
| `inputs`    | Input\[]        | OPTIONAL    | User-supplied inputs injected into the integration's container. |

For external API services that need credentials but no platform-managed container, define a custom provider in the `providers` section (see [Section 5](#5-custom-providers)) instead.

### 4.4 ContainerConfig

Used by container-mode entries and by ingestion containers.

| Field         | Type                  | Required    | Description                                                   |
| ------------- | --------------------- | ----------- | ------------------------------------------------------------- |
| `image`       | string                | Conditional | Container image reference.                                    |
| `build`       | BuildConfig           | Conditional | Build-from-source configuration (same schema as Section 3.1). |
| `port`        | integer               | OPTIONAL    | Primary port the container listens on.                        |
| `environment` | map\<string, string\> | OPTIONAL    | Static environment variables injected into the container.     |
| `gpu`         | GPUConfig             | OPTIONAL    | GPU resource requirements.                                    |
| `persistent`  | boolean               | OPTIONAL    | Whether data SHOULD be persisted. Default: `false`.           |
| `healthcheck` | Healthcheck           | OPTIONAL    | Health check configuration (same schema as Section 3.2).      |

A ContainerConfig SHOULD specify at least one of `image` or `build`.

#### GPUConfig

| Field     | Type   | Required | Description                                                    |
| --------- | ------ | -------- | -------------------------------------------------------------- |
| `vram`    | string | OPTIONAL | GPU memory required (e.g. `24Gi`).                             |
| `runtime` | string | OPTIONAL | GPU runtime. MUST be one of `cuda` or `rocm`. Default: `cuda`. |

### 4.5 Input

An `Input` declares a user-supplied value that the platform prompts for at deploy time and injects as an environment variable into the target container. The `name` is used directly as the env var key. See [Section 8.4](#84-inputs) for injection targets.

| Field         | Type      | Required     | Description                                                                                                     |
| ------------- | --------- | ------------ | --------------------------------------------------------------------------------------------------------------- |
| `name`        | string    | **REQUIRED** | Env var key injected into the target container.                                                                 |
| `datatype`    | string    | **REQUIRED** | Value type. MUST be one of: `string`, `boolean`, `number`, `array`, `object`.                                   |
| `secret`      | boolean   | OPTIONAL     | If `true`, the platform MUST store the value securely and MUST NOT log it. Default: `false`.                    |
| `description` | string    | OPTIONAL     | Human-readable description for deploy-time prompts.                                                             |
| `display-as`  | string    | OPTIONAL     | UI rendering hint. MUST be one of: `short-text`, `long-text`, `select`.                                         |
| `options`     | string\[] | OPTIONAL     | Allowed values. When present, UIs SHOULD render a dropdown/select. MUST be present when `display-as` is `select`. |
| `default`     | string    | OPTIONAL     | Default value pre-filled in the UI.                                                                             |
| `optional`    | boolean   | OPTIONAL     | If `true`, the input MAY be omitted at deploy time. Default: `false`.                                           |

The `datatype` field controls the validation and type coercion applied before injection. `secret` is orthogonal to `datatype`: it controls storage and logging, so when `secret` is `true` the platform stores the value securely and never logs it, whatever its type. The `display-as` field controls UI rendering: `short-text` renders a single-line field, `long-text` a multi-line field, and `select` a dropdown populated from `options`.

***

## 5. Custom Providers

The `providers` section extends the platform's built-in provider registry with user-defined entries. A custom provider declares the variables it requires so that the platform can prompt for them at deploy time.

Custom providers behave like cloud providers (§8.1): they inject credentials into the agent, not connection details. Each variable's `name` is a **suffix**: the full env var key is `{UPPER(provider)}_{varName}`, following the same rule as §8.1, with the provider name sanitized as described in §8.5 (for example, provider `my-jira` with variable `API_KEY` yields `MY_JIRA_API_KEY`). Duplicate-entry handling also mirrors §8.1: when multiple entries reference the same custom provider, each entry gets a qualified key, and the primary entry also gets the bare key.

Custom providers can be referenced by name from the `models`, `knowledge`, and `integrations` sections, just like built-in providers. The `scope` field controls which sections may reference the provider; the platform MUST reject references from sections not listed in `scope`.

| Field       | Type      | Required     | Description                                                                                                    |
| ----------- | --------- | ------------ | -------------------------------------------------------------------------------------------------------------- |
| `scope`     | string\[] | **REQUIRED** | Sections that may reference this provider. MUST contain one or more of: `models`, `knowledge`, `integrations`. |
| `variables` | Input\[]  | **REQUIRED** | Variables this provider requires from the user. MUST contain at least one entry.                               |
| `config`    | map       | OPTIONAL     | Provider-specific configuration.                                                                               |

***

## 6. Ingestion

The `ingestion` section declares data ingestion pipelines. Each entry is a container that runs when its trigger fires.

| Field       | Type             | Required     | Description                                                 |
| ----------- | ---------------- | ------------ | ----------------------------------------------------------- |
| `container` | ContainerConfig  | **REQUIRED** | Container that performs the ingestion.                      |
| `trigger`   | IngestionTrigger | **REQUIRED** | When the container runs.                                    |
| `inputs`    | Input\[]         | OPTIONAL     | User-supplied inputs injected into the ingestion container. |

#### IngestionTrigger

| Field  | Type   | Required     | Description                                                 |
| ------ | ------ | ------------ | ----------------------------------------------------------- |
| `type` | string | **REQUIRED** | MUST be one of: `schedule`, `startup`, `manual`, `webhook`. |

Trigger type semantics:

* **`schedule`** — runs on a cron schedule. The cron expression is supplied at deploy time.
* **`startup`** — runs once automatically at deploy time.
* **`manual`** — runs on demand via API invocation.
* **`webhook`** — deploys as a long-running service that receives incoming HTTP requests. The container SHOULD declare a `port` when using this trigger type.

***

## 7. Dev

The `dev` section provides local development overrides consumed by `astro dev`. These fields are deployment concerns that do not belong in the normative agent topology.

| Field        | Type         | Required | Description                                                           |
| ------------ | ------------ | -------- | --------------------------------------------------------------------- |
| `interfaces` | string\[]    | OPTIONAL | Messaging interfaces to enable locally (e.g. `slack`, `web`).         |
| `command`    | string       | OPTIONAL | Custom start command for the agent. Default: `bun --watch run start`. |
| `overrides`  | DevOverrides | OPTIONAL | Image overrides for local dev services.                               |

#### DevOverrides

| Field             | Type   | Required | Description                             |
| ----------------- | ------ | -------- | --------------------------------------- |
| `messagingImage`  | string | OPTIONAL | Custom image for the messaging sidecar. |
| `playgroundImage` | string | OPTIONAL | Custom image for the playground UI.     |

***

## 8. Environment Variable Injection Model

The platform automatically injects environment variables into the agent to wire it to its dependencies. The injection model differs by entry mode: cloud providers inject credentials, self-hosted providers inject connection details, and container-mode entries inject generic, section-prefixed connection details.

### 8.1 Cloud Provider Credentials

Cloud providers (in `models`, `knowledge`, `integrations`) require user-provided credentials at deploy time. The env var key is derived from the **provider name**; the entry name appears only when several entries share a provider. In the patterns below, `suffix` is the provider's credential suffix (see Appendix A).

**Single entry for a provider:**

```
{UPPER(provider)}_{suffix}
```

Example: one `anthropic` model entry → `ANTHROPIC_API_KEY`.

**Multiple entries for the same provider (duplicate handling):**

Each entry gets a name-qualified key:

```
{UPPER(provider)}_{UPPER(entry_name)}_{suffix}
```

Additionally, one "primary" entry also receives the bare `{UPPER(provider)}_{suffix}` key for convenience. The primary entry is the one whose name matches the provider (e.g. an entry named `anthropic` using `provider: anthropic`); if no entry name matches, the alphabetically first entry is primary. When the entry name equals the provider name, the redundant qualified form (e.g. `ANTHROPIC_ANTHROPIC_API_KEY`) is omitted and only the bare key is produced.
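
As a sketch of the alphabetical fallback, consider two `anthropic` entries where neither is named `anthropic`. The entry names `sonnet` and `haiku` are hypothetical; the resulting keys follow from the rules above and the `API_KEY` suffix listed in Appendix A.1.

```yaml
models:
  sonnet:
    provider: anthropic
  haiku:
    provider: anthropic

# No entry is named "anthropic", so the alphabetically first entry is primary.
# Keys injected into the agent:
#   haiku  → ANTHROPIC_HAIKU_API_KEY, plus the bare ANTHROPIC_API_KEY (primary)
#   sonnet → ANTHROPIC_SONNET_API_KEY
```

Declaration order does not matter here: `haiku` is primary even though `sonnet` is declared first.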

Examples (single entry):

* `models.primary` with `provider: anthropic` → `ANTHROPIC_API_KEY`
* `integrations.github` with `provider: github` → `GITHUB_TOKEN`
* `knowledge.vectors` with `provider: pinecone` → `PINECONE_API_KEY`

Examples (duplicate entries, two anthropic models):

* `models.anthropic` + `models.sonnet`, both with `provider: anthropic`:
  * `anthropic` (name matches provider, so it is primary) → `ANTHROPIC_API_KEY`
  * `sonnet` → `ANTHROPIC_SONNET_API_KEY`

Cloud provider credentials are always required.

### 8.2 Self-Hosted Provider Connection Details

Self-hosted providers deploy a container. The platform injects connection env vars using the provider's env prefix:

**Single entry for a provider:**

```
{EnvPrefix}_HOST, {EnvPrefix}_PORT, {EnvPrefix}_URL
```

Example: one `qdrant` knowledge entry → `QDRANT_HOST`, `QDRANT_PORT`, `QDRANT_URL`.

**Multiple entries for the same self-hosted provider:**

Each entry gets name-qualified keys; the alphabetically first entry also gets the bare keys:

```
{EnvPrefix}_{UPPER(entry_name)}_HOST   (plus bare {EnvPrefix}_HOST for the first entry)
```

`_PORT` and `_URL` keys follow the same pattern.

Self-hosted model providers additionally inject `{EnvPrefix}_BASE_URL` (with `/api` appended) and, when `model` is specified, `{EnvPrefix}_MODEL`.

### 8.3 Container-Mode Connection Details

Container-mode entries (no provider) receive generic, section-prefixed env vars:

* **Models:** `MODEL_{UPPER(name)}_HOST`, `MODEL_{UPPER(name)}_PORT`, `MODEL_{UPPER(name)}_URL`
* **Knowledge:** `KNOWLEDGE_{UPPER(name)}_HOST`, `KNOWLEDGE_{UPPER(name)}_PORT`
* **Integrations:** `INTEGRATION_{UPPER(name)}_HOST`, `INTEGRATION_{UPPER(name)}_PORT`, `INTEGRATION_{UPPER(name)}_URL`

### 8.4 Inputs

Inputs are user-supplied values prompted at deploy time.
Each input's `name` is used directly as the env var key (no prefix) in the target container:

| Declared on             | Injected into         |
| ----------------------- | --------------------- |
| Top-level `inputs`      | All containers        |
| `agent.inputs`          | Agent container       |
| `models[].inputs`       | Model container       |
| `knowledge[].inputs`    | Knowledge container   |
| `integrations[].inputs` | Integration container |
| `ingestion[].inputs`    | Ingestion container   |

`providers[].variables` is not an injection target in this table: it is a template that declares what variables a provider requires so that the platform can prompt for them at deploy time. The resulting values are injected under provider-prefixed keys, as described in Section 5.

Example: `inputs: [{name: OPENAI_API_KEY, datatype: string, secret: true}]` → `OPENAI_API_KEY` in the target container.

### 8.5 Name Sanitization

Entry names used in env var keys are sanitized in this order:

1. Convert to lowercase.
2. Replace hyphens, underscores, and dots with underscores.
3. Remove any remaining non-alphanumeric character other than the underscore.
4. Collapse consecutive underscores into one.
5. Convert to uppercase.

For example, the entry name `my-model` becomes `my_model` after step 2 and `MY_MODEL` after step 5.

***

## 9. Validation Rules

Implementations MUST enforce the following validation rules:

1. `spec` MUST be a non-empty string.
2. `name` MUST be a non-empty string.
3. `agent` MUST specify exactly one of `image` or `build`. If `build` is present, `build.context` and `build.dockerfile` are REQUIRED.
4. For each entry in `models`: `provider` and `container` are mutually exclusive. Exactly one MUST be present.
5. For each entry in `knowledge`: `provider` and `container` are mutually exclusive. Exactly one MUST be present.
6. For each entry in `integrations`: `provider` and `container` are mutually exclusive. Exactly one MUST be present.
7. For each entry in `providers`: `scope` MUST be present and contain one or more of `models`, `knowledge`, `integrations`. `variables` MUST be present and MUST contain at least one element. Each variable MUST have a non-empty `name` and a valid `datatype`.

   7a. When a component entry references a custom provider by name, the referencing section MUST be listed in that provider's `scope`.

8. For each entry in `ingestion`: both `container` and `trigger` are REQUIRED. `trigger.type` MUST be one of `schedule`, `startup`, `manual`, `webhook`.
9. When a `BuildConfig` is provided (in `agent.build` or any `container.build`), `context` and `dockerfile` are REQUIRED.
10. When `gpu.runtime` is provided, it MUST be one of `cuda` or `rocm`.
11. When an input's `display-as` is `select`, `options` MUST be present and non-empty.
12. Each `Input` in any context MUST have a non-empty `name`, and its `datatype` MUST be one of `string`, `boolean`, `number`, `array`, `object`.

***

## Appendix A: Provider Registries (Non-Normative)

The following tables document the platform's built-in provider registries as of this specification version. Implementations MAY extend these registries.

### A.1 Model Providers

#### Self-Hosted

| Provider | Image                  | Port  | Health Check     | GPU | Default Env                                   |
| -------- | ---------------------- | ----- | ---------------- | --- | --------------------------------------------- |
| `ollama` | `ollama/ollama:latest` | 11434 | HTTP `/api/tags` | Yes | `OLLAMA_HOST=0.0.0.0`, `OLLAMA_KEEP_ALIVE=-1` |

When `model` is specified for a self-hosted provider, the platform sets `{EnvPrefix}_MODEL` (e.g. `OLLAMA_MODEL=llama3.2`).
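
For illustration, a single `ollama` entry with `model` set would, under the rules in Section 8.2, surface the keys listed below in the agent. This sketch assumes the env prefix for `ollama` is `OLLAMA`, as the `OLLAMA_MODEL` example above suggests; the actual host, port, and URL values are assigned by the platform and are not shown. The Default Env column, by contrast, appears to configure the Ollama container itself rather than the agent.

```yaml
models:
  local_llm:
    provider: ollama
    model: llama3.2

# Env vars injected into the agent (single ollama entry, Section 8.2):
#   OLLAMA_HOST, OLLAMA_PORT, OLLAMA_URL   connection details
#   OLLAMA_BASE_URL                        URL with /api appended
#   OLLAMA_MODEL=llama3.2                  set because `model` is present
```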

#### Cloud

| Provider    | Credential Suffix | Description                                           |
| ----------- | ----------------- | ----------------------------------------------------- |
| `anthropic` | `API_KEY`         | Anthropic API key for Claude models                   |
| `openai`    | `API_KEY`         | OpenAI API key for GPT models                         |
| `google`    | `API_KEY`         | Google API key for Gemini models                      |
| `gemini`    | `API_KEY`         | Google API key for Gemini models (alias for `google`) |
| `cohere`    | `API_KEY`         | Cohere API key for language models                    |

### A.2 Knowledge Providers

#### Self-Hosted

| Provider   | Image                  | Port | Extra Ports | Mount Path                 | Health Check             | Default Env       |
| ---------- | ---------------------- | ---- | ----------- | -------------------------- | ------------------------ | ----------------- |
| `qdrant`   | `qdrant/qdrant:latest` | 6333 | gRPC 6334   | `/qdrant/storage`          | HTTP `/healthz`          | —                 |
| `redis`    | `redis:7-alpine`       | 6379 | —           | `/data`                    | `redis-cli ping`         | —                 |
| `postgres` | `postgres:15-alpine`   | 5432 | —           | `/var/lib/postgresql/data` | `pg_isready -U postgres` | —                 |
| `neo4j`    | `neo4j:5-community`    | 7474 | Bolt 7687   | `/data`                    | HTTP `/`                 | `NEO4J_AUTH=none` |

#### Cloud

| Provider   | Credential Suffix | Description                          |
| ---------- | ----------------- | ------------------------------------ |
| `pinecone` | `API_KEY`         | Pinecone API key for vector database |

### A.3 Integration Providers

#### Cloud

| Provider | Credential Suffix | Description                 |
| -------- | ----------------- | --------------------------- |
| `github` | `TOKEN`           | GitHub token for API access |
| `gitlab` | `TOKEN`           | GitLab token for API access |

***

## Appendix B: JSON Schema

A machine-readable JSON Schema for this specification is maintained at `astropods.schema.json` in the `astro-spec` package. The schema is generated from the normative type definitions and MAY be used for editor autocompletion and pre-validation.

Schema ID: `https://astropods.ai/schema/package.json`

***

## Appendix C: Complete Example

```yaml
spec: package/v1
name: engineering-assistant

meta:
  description: Engineering knowledge assistant with self-hosted and cloud components
  tags: [engineering, support, internal]

agent:
  build:
    context: .
    dockerfile: Dockerfile
    secrets:
      - id: npm_token
        env: GITHUB_PACKAGES_TOKEN
  inputs:
    - name: LOG_LEVEL
      datatype: string
      default: info
      description: Agent log level

inputs:
  ALLOWED_ORIGINS:
    name: ALLOWED_ORIGINS
    datatype: string
    description: Comma-separated list of allowed CORS origins
    optional: true

models:
  local_llm:
    provider: ollama
    model: llama3.2
  primary:
    provider: anthropic
  embedder:
    container:
      build:
        context: ./embedder
        dockerfile: Dockerfile
      port: 8000
      healthcheck:
        path: /health
    inputs:
      - name: EMBEDDING_BATCH_SIZE
        datatype: number
        default: "32"
        description: Number of texts to embed per request

knowledge:
  docs:
    provider: qdrant
    persistent: true
  cache:
    provider: redis

integrations:
  github:
    provider: github
  jira:
    provider: my-jira   # references custom provider below

providers:
  my-jira:
    scope: [integrations]
    variables:
      - name: API_KEY       # → MY_JIRA_API_KEY
        datatype: string
        secret: true
        description: Jira API key
      - name: BASE_URL      # → MY_JIRA_BASE_URL
        datatype: string
        display-as: short-text
        description: Jira instance URL
      - name: PROJECT       # → MY_JIRA_PROJECT
        datatype: string
        display-as: select
        options: [ENG, PLATFORM, INFRA]
        description: Default Jira project
      - name: HMAC_SECRET   # → MY_JIRA_HMAC_SECRET
        datatype: string
        secret: true
        description: Shared secret for HMAC signing
        optional: true

ingestion:
  docs_sync:
    container:
      image: my-docs-sync:latest
      environment:
        SOURCE_REPO: company/engineering-docs
        TARGET_COLLECTION: docs
    trigger:
      type: schedule
    inputs:
      - name: SYNC_BATCH_SIZE
        datatype: number
        default: "100"
        description: Number of documents to sync per batch
  initial_load:
    container:
      image: my-bootstrap-worker:latest
    trigger:
      type: startup

dev:
  interfaces: [slack, web]
  command: bun --watch run start
```
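
For contrast with the complete example, the smallest document that satisfies Section 2 and the Section 9 rules needs only `spec`, `name`, `meta`, and `agent`. The first line is a modeline recognized by the Red Hat YAML language server (used, for example, by the VS Code YAML extension) to attach a JSON Schema for autocompletion and pre-validation; it assumes the Appendix B schema can be fetched from its schema ID, which this document does not state. The agent name and image reference are hypothetical.

```yaml
# yaml-language-server: $schema=https://astropods.ai/schema/package.json
spec: package/v1
name: hello-agent
meta:
  description: Minimal agent with a pre-built image
agent:
  # Hypothetical image; exactly one of `image` or `build` is allowed.
  image: ghcr.io/example/hello-agent:1.0.0
```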