Astropods Spec | Astro AI

Abstract

The AstroAI Spec defines a declarative YAML format for describing the topology of an AI agent — its container, model dependencies, knowledge stores, tool services, integrations, and data ingestion pipelines. The spec is consumed by build tools and deployment servers; it intentionally excludes runtime, orchestration, and deployment-environment concerns. At deploy time, the platform combines this spec with runtime configuration (credentials, interfaces, schedules) to produce a resolved deployment spec, which is then translated into infrastructure manifests.

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

1. Introduction

An AstroAI Spec file (astropods.yml) is a YAML document that declares:

The agent’s container image (pre-built or build-from-source).
Components the agent depends on — models, knowledge stores, and tools — each supplied by either a platform-managed provider or a user-managed container.
Custom providers for external API services that require credential injection.
Data ingestion pipelines with trigger semantics.
Local development overrides.

A central concept is provider binding. Components that an agent depends on — models, knowledge stores, and tools — are each declared as either a provider reference or a container definition:

A provider is a named, platform-known service (e.g. ollama, anthropic, qdrant). The platform resolves each provider to one of two kinds:
- Self-hosted — the platform deploys and manages a container on the agent’s behalf.
- Cloud — the platform injects credentials for an external API.
A container gives the user full control over the image, port, and configuration.

This design lets authors mix managed and custom components freely within a single spec.

The spec does not cover: resource limits (CPU/memory), observability, rate limits, budgets, security policies, deployment region, or interface routing (Slack, web). These are deployment-time concerns configured separately.

The document format is YAML. Implementations MUST accept files named astropods.yml.

2. Top-Level Structure

A conforming document consists of the following top-level fields. Fields marked REQUIRED MUST be present; all others MAY be omitted.

Field	Type	Required	Description
`spec`	string	REQUIRED	Spec version identifier. MUST be `package/v1`.
`name`	string	REQUIRED	Unique agent name.
`meta`	object	REQUIRED	Agent metadata.
`agent`	object	REQUIRED	Agent definition.
`models`	map<string, Model>	OPTIONAL	Model entries (LLMs, embedding models, etc).
`knowledge`	map<string, Knowledge>	OPTIONAL	Knowledge store entries.
`tools`	map<string, Tool>	OPTIONAL	Tool service entries.
`providers`	map<string, Provider>	OPTIONAL	Custom provider entries.
`inputs`	map<string, Input>	OPTIONAL	User-supplied inputs injected into every container at deploy time.
`ingestion`	map<string, Ingestion>	OPTIONAL	Data ingestion pipeline entries.
`dev`	object	OPTIONAL	Local development overrides.

Map keys serve as entry names and are used in credential injection (see Section 8).

2.1 Meta

Field	Type	Required	Description
`description`	string	OPTIONAL	Human-readable agent description.
`tags`	string[]	OPTIONAL	Classification tags.

3. Agent

The agent object defines the agent’s primary service — its container image or build configuration.

Field	Type	Required	Description
`image`	string	Conditional	Pre-built container image reference.
`build`	BuildConfig	Conditional	Build-from-source configuration.
`distributed`	boolean	OPTIONAL	Whether the agent supports multi-replica deployment. Default: `false`.
`healthcheck`	Healthcheck	OPTIONAL	Health check configuration.
`inputs`	Input[]	OPTIONAL	User-supplied inputs injected into the agent container.

An agent entry MUST specify exactly one of image or build. Providing both or neither is invalid.

3.1 BuildConfig

Field	Type	Required	Description
`context`	string	REQUIRED	Build context path.
`dockerfile`	string	REQUIRED	Path to Dockerfile relative to context.
`target`	string	OPTIONAL	Multi-stage build target.
`args`	map<string, string>	OPTIONAL	Build arguments passed to the builder.
`secrets`	BuildSecret[]	OPTIONAL	Build-time secrets.

BuildSecret

Field	Type	Required	Description
`id`	string	REQUIRED	Secret identifier used in `--mount=type=secret,id=<id>`.
`env`	string	OPTIONAL	Environment variable to source the secret value from.

3.2 Healthcheck

Applies to the agent definition and to any ContainerConfig.healthcheck in component sections.

Field	Type	Required	Description
`test`	string[]	OPTIONAL	Custom health check command (e.g. `["CMD", "redis-cli", "ping"]`).
`path`	string	OPTIONAL	HTTP path for health check (auto-generates a `curl` test command).
`interval`	string	OPTIONAL	Check interval. Default: `10s`.
`timeout`	string	OPTIONAL	Response timeout. Default: `5s`.
`retries`	integer	OPTIONAL	Consecutive failures before unhealthy. Default: `3`.

Implementations SHOULD support both test (exec-based) and path (HTTP-based) health checks. When path is provided, the implementation SHOULD generate an equivalent test command.
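The path-to-test translation could look roughly like the sketch below. The function name `healthcheck_test`, the `localhost` target, the `curl -f` flag, and the choice to let an explicit `test` take precedence over `path` are all assumptions for illustration; the spec only states that a `curl` test command is generated.

```python
from typing import Optional


def healthcheck_test(healthcheck: dict, port: Optional[int]) -> Optional[list]:
    """Return an exec-style test command for a Healthcheck object.

    An explicit `test` is returned unchanged (assumed precedence); otherwise
    `path` is turned into a curl command against the container's port.
    """
    if healthcheck.get("test"):
        return list(healthcheck["test"])
    path = healthcheck.get("path")
    if path is None:
        return None  # no health check configured
    if port is None:
        raise ValueError("an HTTP health check needs a port")
    # Assumed command shape: -f makes curl exit non-zero on HTTP errors.
    return ["CMD", "curl", "-f", f"http://localhost:{port}{path}"]
```

Under these assumptions, `healthcheck_test({"path": "/health"}, 8000)` yields `["CMD", "curl", "-f", "http://localhost:8000/health"]`.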

4. Component Sections: Models, Knowledge, Tools

Models, knowledge stores, and tools share a unified provider binding scheme. Each entry operates in exactly one of two modes:

Provider mode — the entry specifies a provider string. The platform resolves this to either a self-hosted provider (deploys a container from its registry) or a cloud provider (injects credentials).
Container mode — the entry specifies a container object. The user manages the image, port, and configuration.

These modes are mutually exclusive: an entry MUST specify exactly one of provider or container. Providing both or neither is invalid.
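As an illustration of the exactly-one-of rule, a validator might classify each entry as follows. This is a minimal sketch; the function name `entry_mode` and the error wording are hypothetical.

```python
def entry_mode(section: str, name: str, entry: dict) -> str:
    """Return "provider" or "container" for a component entry, or raise.

    Both keys present, or both absent, is invalid.
    """
    has_provider = "provider" in entry
    has_container = "container" in entry
    if has_provider == has_container:
        found = "both" if has_provider else "neither"
        raise ValueError(
            f"{section}.{name}: exactly one of provider or container "
            f"is required, found {found}"
        )
    return "provider" if has_provider else "container"
```

For example, `entry_mode("models", "primary", {"provider": "anthropic"})` returns `"provider"`, while an entry carrying both keys raises `ValueError`.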

4.1 Models

The models section declares AI models the agent consumes — LLMs (e.g. Claude, GPT, Llama), embedding models, or any model served behind an inference API. Each entry in the models map:

Field	Type	Required	Description
`provider`	string	Conditional	Platform-managed provider name (e.g. `ollama`, `anthropic`).
`model`	string	OPTIONAL	Provider-specific model identifier (e.g. `llama3.2`). Only meaningful in provider mode.
`container`	ContainerConfig	Conditional	Custom container configuration.
`inputs`	Input[]	OPTIONAL	User-supplied inputs injected into the model’s container.

4.2 Knowledge

Each entry in the knowledge map:

Field	Type	Required	Description
`provider`	string	Conditional	Platform-managed provider name (e.g. `qdrant`, `pinecone`).
`container`	ContainerConfig	Conditional	Custom container configuration.
`persistent`	boolean	OPTIONAL	Whether data SHOULD be persisted across restarts. Default: `false`.
`inputs`	Input[]	OPTIONAL	User-supplied inputs injected into the knowledge container.

When persistent is true, the platform SHOULD provision durable storage for the entry regardless of mode.

4.3 Tools

The tools section declares services the agent invokes to perform actions or retrieve data. Tools can be HTTP APIs, MCP (Model Context Protocol) servers, or any service exposed over a network port. Each entry in the tools map:

Field	Type	Required	Description
`provider`	string	Conditional	Platform-managed provider name (e.g. `github`, `gitlab`).
`container`	ContainerConfig	Conditional	Custom container configuration.
`inputs`	Input[]	OPTIONAL	User-supplied inputs injected into the tool’s container.

For external API services that need credentials but no platform-managed container, define a custom provider in the providers section (see Section 5) instead.

4.4 ContainerConfig

Used by container-mode entries and by ingestion containers.

Field	Type	Required	Description
`image`	string	Conditional	Container image reference.
`build`	BuildConfig	Conditional	Build-from-source configuration (same schema as Section 3.1).
`port`	integer	OPTIONAL	Primary port the container listens on.
`environment`	map<string, string>	OPTIONAL	Static environment variables injected into the container.
`gpu`	GPUConfig	OPTIONAL	GPU resource requirements.
`persistent`	boolean	OPTIONAL	Whether data SHOULD be persisted. Default: `false`.
`healthcheck`	Healthcheck	OPTIONAL	Health check configuration (same schema as Section 3.2).

A ContainerConfig SHOULD specify at least one of image or build.

GPUConfig

Field	Type	Required	Description
`vram`	string	OPTIONAL	GPU memory required (e.g. `24Gi`).
`runtime`	string	OPTIONAL	GPU runtime. MUST be one of `cuda` or `rocm`. Default: `cuda`.

4.5 Input

An Input declares a user-supplied value that the platform prompts for at deploy time and injects as an environment variable into the target container. The name is used directly as the env var key. See Section 8.4 for injection targets.

Field	Type	Required	Description
`name`	string	REQUIRED	Env var key injected into the target container.
`datatype`	string	REQUIRED	Value type. MUST be one of: `string`, `boolean`, `number`, `array`, `object`.
`secret`	boolean	OPTIONAL	If `true`, the platform MUST store the value securely and MUST NOT log it. Default: `false`.
`description`	string	OPTIONAL	Human-readable description for deploy-time prompts.
`display-as`	string	OPTIONAL	UI rendering hint. MUST be one of: `short-text`, `long-text`, `select`.
`options`	string[]	OPTIONAL	Allowed values. When present, UIs SHOULD render a dropdown/select. Required when `display-as` is `select`.
`default`	string	OPTIONAL	Default value pre-filled in the UI.
`optional`	boolean	OPTIONAL	If `true`, the input MAY be omitted at deploy time. Default: `false`.

The datatype field controls validation and type coercion applied before injection. secret is orthogonal to datatype — it controls storage and logging: when true, the platform stores the value securely and never logs it. The display-as field controls UI rendering: short-text renders a single-line field, long-text a multi-line field, and select a dropdown using options.
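The coercion step could be sketched as below. The spec does not define the accepted literal forms, so the choices here (case-insensitive `true`/`false` for booleans, JSON text for `array` and `object`) and the function name `coerce_input` are assumptions.

```python
import json


def coerce_input(datatype: str, raw: str):
    """Convert a raw deploy-time string to the declared Input datatype."""
    if datatype == "string":
        return raw
    if datatype == "boolean":
        lowered = raw.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a boolean: {raw!r}")
    if datatype == "number":
        value = float(raw)  # raises ValueError on non-numeric text
        return int(value) if value.is_integer() else value
    if datatype in ("array", "object"):
        value = json.loads(raw)  # assumed encoding: JSON text
        expected = list if datatype == "array" else dict
        if not isinstance(value, expected):
            raise ValueError(f"not a JSON {datatype}: {raw!r}")
        return value
    raise ValueError(f"unknown datatype: {datatype!r}")
```

With this sketch, a `number` input whose default is the string `"32"` coerces to the integer `32`.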

5. Custom Providers

The providers section extends the platform’s built-in provider registry with user-defined entries. A custom provider declares the variables it requires so the platform can prompt for them at deploy time. Custom providers behave like cloud providers (§8.1): they inject credentials into the agent, not connection details.

Each variable’s name is a suffix. The full env var key is formed as {UPPER(provider)}_{varName}, following the same rule as §8.1. Duplicate-entry handling also mirrors §8.1: when multiple entries reference the same custom provider, each entry gets a qualified key; the primary entry also gets the bare key.

Custom providers can be referenced by name from the models, knowledge, and tools sections, just like built-in providers. The scope field controls which sections are allowed to reference the provider — the platform MUST reject references from sections not listed in scope.

Field	Type	Required	Description
`scope`	string[]	REQUIRED	Sections that may reference this provider. MUST contain one or more of: `models`, `knowledge`, `tools`.
`variables`	Input[]	REQUIRED	Variables this provider requires from the user. MUST contain at least one entry.
`config`	map<string, any>	OPTIONAL	Provider-specific configuration.
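The scope rule could be enforced with a check along these lines. It is a sketch only: `check_provider_scope` is a hypothetical name, and the spec is assumed to be already parsed into a plain dictionary.

```python
def check_provider_scope(spec: dict) -> list:
    """Return error strings for entries that violate a custom provider's scope."""
    custom = spec.get("providers") or {}
    errors = []
    for section in ("models", "knowledge", "tools"):
        for name, entry in (spec.get(section) or {}).items():
            provider = entry.get("provider")
            # Built-in providers are not in `custom` and are skipped here.
            if provider in custom and section not in custom[provider].get("scope", []):
                errors.append(
                    f"{section}.{name}: provider {provider!r} is not scoped to {section}"
                )
    return errors
```

A provider declared with `scope: [tools]` passes when referenced from `tools` and produces one error when referenced from `models`.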

6. Ingestion

The ingestion section declares data ingestion pipelines. Each entry is a container that runs on a trigger.

Field	Type	Required	Description
`container`	ContainerConfig	REQUIRED	Container that performs the ingestion.
`trigger`	IngestionTrigger	REQUIRED	When the container runs.
`inputs`	Input[]	OPTIONAL	User-supplied inputs injected into the ingestion container.

IngestionTrigger

Field	Type	Required	Description
`type`	string	REQUIRED	MUST be one of: `schedule`, `startup`, `manual`, `webhook`.

Trigger type semantics:

schedule — runs on a cron schedule. The cron expression is supplied at deploy time.
startup — runs once automatically at deploy time.
manual — runs on demand via API invocation.
webhook — deploys as a long-running service that receives incoming HTTP requests. The container SHOULD declare a port when using this trigger type.
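The trigger rules above can be checked mechanically. The sketch below treats MUST-level rules as errors and the SHOULD-level port guidance for `webhook` as a warning; that split, and the name `check_ingestion`, are assumptions.

```python
TRIGGER_TYPES = ("schedule", "startup", "manual", "webhook")


def check_ingestion(name: str, entry: dict) -> tuple:
    """Return (errors, warnings) for one ingestion entry."""
    errors, warnings = [], []
    container = entry.get("container")
    trigger = entry.get("trigger")
    if not container:
        errors.append(f"ingestion.{name}: container is required")
    if not trigger:
        errors.append(f"ingestion.{name}: trigger is required")
    else:
        kind = trigger.get("type")
        if kind not in TRIGGER_TYPES:
            errors.append(f"ingestion.{name}: invalid trigger type {kind!r}")
        elif kind == "webhook" and not (container or {}).get("port"):
            # SHOULD-level guidance, so report a warning rather than an error.
            warnings.append(f"ingestion.{name}: webhook container should declare a port")
    return errors, warnings
```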

7. Dev

The dev section provides local development overrides consumed by astro dev. These fields are deployment concerns that do not belong in the normative agent topology.

Field	Type	Required	Description
`interfaces`	string[]	OPTIONAL	Messaging interfaces to enable locally (e.g. `slack`, `web`).
`command`	string	OPTIONAL	Custom start command for the agent. Default: `bun --watch run start`.
`overrides`	DevOverrides	OPTIONAL	Image overrides for local dev services.

DevOverrides

Field	Type	Required	Description
`messagingImage`	string	OPTIONAL	Custom image for the messaging sidecar.
`playgroundImage`	string	OPTIONAL	Custom image for the playground UI.

8. Environment Variable Injection Model

The platform automatically injects environment variables into the agent to wire it to its dependencies. The injection model differs by entry mode: cloud providers inject credentials, self-hosted providers inject connection details, and container-mode entries inject generic connection details.

8.1 Cloud Provider Credentials

Cloud providers (in models, knowledge, tools) require user-provided credentials at deploy time. The env var key is derived from the provider name, not the entry name:

Single entry for a provider:

{UPPER(provider)}_{suffix}

Example: one anthropic model entry → ANTHROPIC_API_KEY.

Multiple entries for the same provider (duplicate handling):

Each entry gets a name-qualified key:

{UPPER(provider)}_{UPPER(entry_name)}_{suffix}

Additionally, a “primary” entry also receives the bare {UPPER(provider)}_{suffix} key for convenience. The primary entry is the one whose name matches the provider (e.g. an entry named anthropic using provider: anthropic); if no entry name matches, the entry whose name sorts first alphabetically is primary.

When the entry name equals the provider name, the redundant qualified form (e.g. ANTHROPIC_ANTHROPIC_API_KEY) is omitted — only the bare key is produced.

Examples (single entry):

models.primary with provider: anthropic → ANTHROPIC_API_KEY
tools.github with provider: github → GITHUB_TOKEN
knowledge.vectors with provider: pinecone → PINECONE_API_KEY

Examples (duplicate entries, two anthropic models):

models.anthropic + models.sonnet both with provider: anthropic:
- anthropic (name matches provider, primary) → ANTHROPIC_API_KEY
- sonnet → ANTHROPIC_SONNET_API_KEY

Cloud provider credentials are always required.
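The key-derivation rules in this section can be summarized in a short sketch. `cloud_credential_keys` and `env_name` are hypothetical names, `env_name` is a simplified stand-in for the sanitization in Section 8.5, and "first alphabetically" is assumed to mean Python's default string sort order.

```python
def env_name(name: str) -> str:
    """Simplified stand-in for the Section 8.5 sanitization."""
    return name.replace("-", "_").replace(".", "_").upper()


def cloud_credential_keys(provider: str, suffix: str, entry_names: list) -> dict:
    """Map each entry name to the env var keys it receives for one cloud provider."""
    prefix = env_name(provider)
    bare = f"{prefix}_{suffix}"
    if len(entry_names) == 1:
        return {entry_names[0]: [bare]}  # single entry: bare key only
    # Primary: the entry named after the provider, else the first by sort order.
    primary = provider if provider in entry_names else sorted(entry_names)[0]
    keys = {}
    for name in entry_names:
        qualified = f"{prefix}_{env_name(name)}_{suffix}"
        if name == provider:
            keys[name] = [bare]  # redundant qualified form is omitted
        elif name == primary:
            keys[name] = [qualified, bare]
        else:
            keys[name] = [qualified]
    return keys
```

For the two-model example above, `cloud_credential_keys("anthropic", "API_KEY", ["anthropic", "sonnet"])` returns `{"anthropic": ["ANTHROPIC_API_KEY"], "sonnet": ["ANTHROPIC_SONNET_API_KEY"]}`.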

8.2 Self-Hosted Provider Connection Details

Self-hosted providers deploy a container. The platform injects connection env vars using the provider’s env prefix:

Single entry for a provider:

{EnvPrefix}_HOST, {EnvPrefix}_PORT, {EnvPrefix}_URL

Example: one qdrant knowledge entry → QDRANT_HOST, QDRANT_PORT, QDRANT_URL.

Multiple entries for the same self-hosted provider:

Each entry gets name-qualified keys; the first alphabetically also gets bare keys:

{EnvPrefix}_{UPPER(entry_name)}_HOST  (plus bare {EnvPrefix}_HOST for first)

Self-hosted model providers additionally inject {EnvPrefix}_BASE_URL (the connection URL with /api appended) and, when a model name is specified, {EnvPrefix}_MODEL.
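A sketch of the HOST/PORT/URL derivation for self-hosted providers follows. The function name `self_hosted_env`, the `http://host:port` URL shape, and the simplified upper-casing of entry names are assumptions; the model-specific `_BASE_URL` and `_MODEL` variables are left out to keep the example short.

```python
def self_hosted_env(env_prefix: str, endpoints: dict) -> dict:
    """Build connection env vars for all entries bound to one self-hosted provider.

    `endpoints` maps entry name -> (host, port).
    """
    def triple(prefix, host, port):
        return {
            f"{prefix}_HOST": host,
            f"{prefix}_PORT": str(port),
            f"{prefix}_URL": f"http://{host}:{port}",  # assumed URL format
        }

    names = sorted(endpoints)
    env = {}
    if len(names) > 1:
        # Multiple entries: every entry gets name-qualified keys.
        for name in names:
            qualified = f"{env_prefix}_{name.replace('-', '_').upper()}"
            env.update(triple(qualified, *endpoints[name]))
    if names:
        # The only entry, or the first alphabetically, gets the bare keys.
        env.update(triple(env_prefix, *endpoints[names[0]]))
    return env
```

With two `redis` entries named `cache` and `sessions`, the result contains `REDIS_CACHE_*` and `REDIS_SESSIONS_*` keys, plus bare `REDIS_*` keys pointing at `cache`.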

8.3 Container-Mode Connection Details

Container-mode entries (no provider) receive generic section-prefixed env vars:

Models: MODEL_{UPPER(name)}_HOST, MODEL_{UPPER(name)}_PORT, MODEL_{UPPER(name)}_URL
Knowledge: KNOWLEDGE_{UPPER(name)}_HOST, KNOWLEDGE_{UPPER(name)}_PORT
Tools: TOOL_{UPPER(name)}_HOST, TOOL_{UPPER(name)}_PORT, TOOL_{UPPER(name)}_URL

8.4 Inputs

Inputs are user-supplied values prompted at deploy time. Each input’s name is used directly as the env var key (no prefix) in the target container:

Declared on	Injected into
Top-level `inputs`	All containers
`agent.inputs`	Agent container
`models[].inputs`	Model container
`knowledge[].inputs`	Knowledge container
`tools[].inputs`	Tool container
`ingestion[].inputs`	Ingestion container

providers[].variables is a template only — it declares what variables a provider requires so the platform can prompt for them at deploy time.

Example: inputs: [{name: OPENAI_API_KEY, datatype: string, secret: true}] → OPENAI_API_KEY in the target container.
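To make the injection targets concrete, the sketch below lists the input names a given container would receive: the top-level inputs plus the inputs declared on that container's own entry. The function name `input_names_for` and the dictionary representation of the spec are assumptions.

```python
def input_names_for(spec: dict, section: str, entry: str = "") -> list:
    """List env var keys from inputs injected into one container.

    `section` is "agent" or one of models/knowledge/tools/ingestion;
    `entry` is the map key within that section (unused for "agent").
    """
    # Top-level inputs are a map and apply to every container.
    names = [inp["name"] for inp in (spec.get("inputs") or {}).values()]
    if section == "agent":
        target = spec.get("agent") or {}
    else:
        target = (spec.get(section) or {}).get(entry) or {}
    # Entry-level inputs are a list and apply only to that container.
    names += [inp["name"] for inp in target.get("inputs") or []]
    return names
```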

8.5 Name Sanitization

Entry names used in env var keys are sanitized in this order: the name is converted to lowercase; hyphens, underscores, and dots are replaced with underscores; any remaining character that is not a letter, digit, or underscore is removed; consecutive underscores are collapsed; and the result is uppercased. For example, entry name my-model sanitizes to my_model, then uppercases to MY_MODEL.
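The sanitization steps can be written as a short function. This sketch assumes that underscores survive the character-removal step (otherwise the my-model example could not produce MY_MODEL) and that only ASCII letters and digits count as alphanumeric; `sanitize_env_name` is a hypothetical name.

```python
import re


def sanitize_env_name(name: str) -> str:
    """Turn an entry name into the form used inside env var keys."""
    s = name.lower()
    s = re.sub(r"[-_.]", "_", s)      # hyphens, underscores, dots -> underscore
    s = re.sub(r"[^a-z0-9_]", "", s)  # drop every other character
    s = re.sub(r"_+", "_", s)         # collapse consecutive underscores
    return s.upper()
```

`sanitize_env_name("my-model")` returns `"MY_MODEL"`.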

9. Validation Rules

Implementations MUST enforce the following validation rules:

1. spec MUST be a non-empty string.
2. name MUST be a non-empty string.
3. agent MUST specify exactly one of image or build. If build is present, build.context and build.dockerfile are REQUIRED.
4. For each entry in models: provider and container are mutually exclusive. Exactly one MUST be present.
5. For each entry in knowledge: provider and container are mutually exclusive. Exactly one MUST be present.
6. For each entry in tools: provider and container are mutually exclusive. Exactly one MUST be present.
7. For each entry in providers: scope MUST be present and contain one or more of models, knowledge, tools. variables MUST be present and MUST contain at least one element. Each variable MUST have a non-empty name and a valid datatype.
7a. When a component entry references a custom provider by name, the referencing section MUST be listed in that provider’s scope.
8. For each entry in ingestion: both container and trigger are REQUIRED. trigger.type MUST be one of schedule, startup, manual, webhook.
9. When a BuildConfig is provided (in agent.build, container.build), context and dockerfile are REQUIRED.
10. When gpu.runtime is provided, it MUST be one of cuda or rocm.
11. When an input’s display-as is select, options MUST be present and non-empty.
12. Each Input in any context MUST have a non-empty name and datatype MUST be one of string, boolean, number, array, object.
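A partial validator covering the spec, name, agent, and Input rules above might look like the following. It is a sketch under the assumption that the document has already been parsed into a dictionary; the function names and error messages are hypothetical, and the remaining rules are omitted for brevity.

```python
DATATYPES = {"string", "boolean", "number", "array", "object"}


def validate_input(inp: dict) -> list:
    """Check one Input: name, datatype, and the display-as/options rule."""
    errors = []
    name = inp.get("name")
    if not name:
        errors.append("input name must be non-empty")
    if inp.get("datatype") not in DATATYPES:
        errors.append(f"input {name!r}: invalid datatype {inp.get('datatype')!r}")
    if inp.get("display-as") == "select" and not inp.get("options"):
        errors.append(f"input {name!r}: options required when display-as is select")
    return errors


def validate_spec(spec: dict) -> list:
    """Return human-readable errors for a subset of the validation rules."""
    errors = []
    for field in ("spec", "name"):
        value = spec.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} must be a non-empty string")
    agent = spec.get("agent") or {}
    if ("image" in agent) == ("build" in agent):
        errors.append("agent must specify exactly one of image or build")
    build = agent.get("build")
    if isinstance(build, dict):
        for field in ("context", "dockerfile"):
            if not build.get(field):
                errors.append(f"agent.build.{field} is required")
    for inp in agent.get("inputs") or []:
        errors.extend(validate_input(inp))
    return errors
```

Returning a list of messages rather than raising on the first failure lets a tool report every problem in one pass.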

Appendix A: Provider Registries (Non-Normative)

The following tables document the platform’s built-in provider registries as of this specification version. Implementations MAY extend these registries.

A.1 Model Providers

Self-Hosted

Provider	Image	Port	Health Check	GPU	Default Env
`ollama`	`ollama/ollama:latest`	11434	HTTP `/api/tags`	Yes	`OLLAMA_HOST=0.0.0.0`, `OLLAMA_KEEP_ALIVE=-1`

When model is specified for a self-hosted provider, the platform sets {EnvPrefix}_MODEL (e.g. OLLAMA_MODEL=llama3.2).

Cloud

Provider	Credential Suffix	Description
`anthropic`	`API_KEY`	Anthropic API key for Claude models
`openai`	`API_KEY`	OpenAI API key for GPT models
`google`	`API_KEY`	Google API key for Gemini models
`gemini`	`API_KEY`	Google API key for Gemini models (alias for `google`)
`cohere`	`API_KEY`	Cohere API key for language models

A.2 Knowledge Providers

Self-Hosted

Provider	Image	Port	Extra Ports	Mount Path	Health Check	Default Env
`qdrant`	`qdrant/qdrant:latest`	6333	gRPC 6334	`/qdrant/storage`	HTTP `/healthz`	—
`redis`	`redis:7-alpine`	6379	—	`/data`	`redis-cli ping`	—
`postgres`	`postgres:15-alpine`	5432	—	`/var/lib/postgresql/data`	`pg_isready -U postgres`	—
`neo4j`	`neo4j:5-community`	7474	Bolt 7687	`/data`	HTTP `/`	`NEO4J_AUTH=none`

Cloud

Provider	Credential Suffix	Description
`pinecone`	`API_KEY`	Pinecone API key for vector database

A.3 Tool Providers

Cloud

Provider	Credential Suffix	Description
`github`	`TOKEN`	GitHub token for API access
`gitlab`	`TOKEN`	GitLab token for API access

Appendix B: JSON Schema

A machine-readable JSON Schema for this specification is maintained at astropods.schema.json in the astro-spec package. The schema is generated from the normative type definitions and MAY be used for editor autocompletion and pre-validation.

Schema ID: https://astropods.ai/schema/package.json

Appendix C: Complete Example

spec: package/v1
name: engineering-assistant

meta:
  description: Engineering knowledge assistant with self-hosted and cloud components
  tags: [engineering, support, internal]

agent:
  build:
    context: .
    dockerfile: Dockerfile
    secrets:
      - id: npm_token
        env: GITHUB_PACKAGES_TOKEN
  inputs:
    - name: LOG_LEVEL
      datatype: string
      default: info
      description: Agent log level

inputs:
  ALLOWED_ORIGINS:
    name: ALLOWED_ORIGINS
    datatype: string
    description: Comma-separated list of allowed CORS origins
    optional: true

models:
  local_llm:
    provider: ollama
    model: llama3.2

  primary:
    provider: anthropic

  embedder:
    container:
      build:
        context: ./embedder
        dockerfile: Dockerfile
      port: 8000
      healthcheck:
        path: /health
    inputs:
      - name: EMBEDDING_BATCH_SIZE
        datatype: number
        default: "32"
        description: Number of texts to embed per request

knowledge:
  docs:
    provider: qdrant
    persistent: true

  cache:
    provider: redis

tools:
  github:
    provider: github

  jira:
    provider: my-jira    # references custom provider below

providers:
  my-jira:
    scope: [tools]
    variables:
      - name: API_KEY           # → MY_JIRA_API_KEY
        datatype: string
        secret: true
        description: Jira API key
      - name: BASE_URL          # → MY_JIRA_BASE_URL
        datatype: string
        display-as: short-text
        description: Jira instance URL
      - name: PROJECT           # → MY_JIRA_PROJECT
        datatype: string
        display-as: select
        options: [ENG, PLATFORM, INFRA]
        description: Default Jira project
      - name: HMAC_SECRET       # → MY_JIRA_HMAC_SECRET
        datatype: string
        secret: true
        description: Shared secret for HMAC signing
        optional: true

ingestion:
  docs_sync:
    container:
      image: my-docs-sync:latest
      environment:
        SOURCE_REPO: company/engineering-docs
        TARGET_COLLECTION: docs
    trigger:
      type: schedule
    inputs:
      - name: SYNC_BATCH_SIZE
        datatype: number
        default: "100"
        description: Number of documents to sync per batch

  initial_load:
    container:
      image: my-bootstrap-worker:latest
    trigger:
      type: startup

dev:
  interfaces: [slack, web]
  command: bun --watch run start