Astropods Spec

Declarative YAML format for AI agent topology

Abstract

The AstroAI Spec defines a declarative YAML format for describing the topology of an AI agent — its container, model dependencies, knowledge stores, tool services, integrations, and data ingestion pipelines. The spec is consumed by build tools and deployment servers; it intentionally excludes runtime, orchestration, and deployment-environment concerns. At deploy time, the platform combines this spec with runtime configuration (credentials, interfaces, schedules) to produce a resolved deployment spec, which is then translated into infrastructure manifests.

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.


1. Introduction

An AstroAI Spec file (astropods.yml) is a YAML document that declares:

  • The agent’s container image (pre-built or build-from-source).
  • Components the agent depends on — models, knowledge stores, and tools — each supplied by either a platform-managed provider or a user-managed container.
  • Custom providers for external API services that require credential injection.
  • Data ingestion pipelines with trigger semantics.
  • Local development overrides.

A central concept is provider binding. Components that an agent depends on — models, knowledge stores, and tools — are each declared as either a provider reference or a container definition:

  • A provider is a named, platform-known service (e.g. ollama, anthropic, qdrant). The platform resolves each provider to one of two kinds:
    • Self-hosted — the platform deploys and manages a container on the agent’s behalf.
    • Cloud — the platform injects credentials for an external API.
  • A container gives the user full control over the image, port, and configuration.

This design lets authors mix managed and custom components freely within a single spec.

The spec does not cover: resource limits (CPU/memory), observability, rate limits, budgets, security policies, deployment region, or interface routing (Slack, web). These are deployment-time concerns configured separately.

The document format is YAML. Implementations MUST accept files named astropods.yml.


2. Top-Level Structure

A conforming document is a YAML mapping with the top-level fields listed below. Fields marked REQUIRED MUST be present; fields marked OPTIONAL MAY be omitted.

| Field | Type | Required | Description |
|---|---|---|---|
| spec | string | REQUIRED | Spec version identifier. MUST be package/v1. |
| name | string | REQUIRED | Unique agent name. |
| meta | object | REQUIRED | Agent metadata. |
| agent | object | REQUIRED | Agent definition. |
| models | map<string, Model> | OPTIONAL | Model entries (LLMs, embedding models, etc). |
| knowledge | map<string, Knowledge> | OPTIONAL | Knowledge store entries. |
| tools | map<string, Tool> | OPTIONAL | Tool service entries. |
| providers | map<string, Provider> | OPTIONAL | Custom provider entries. |
| inputs | map<string, Input> | OPTIONAL | User-supplied inputs injected into every container at deploy time. |
| ingestion | map<string, Ingestion> | OPTIONAL | Data ingestion pipeline entries. |
| dev | object | OPTIONAL | Local development overrides. |

Map keys serve as entry names and are used in credential injection (see Section 8).

2.1 Meta

| Field | Type | Required | Description |
|---|---|---|---|
| description | string | OPTIONAL | Human-readable agent description. |
| tags | string[] | OPTIONAL | Classification tags. |
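
As a rough illustration of the REQUIRED fields above, the smallest document that satisfies this section might look like the following. The agent name and image reference are hypothetical placeholders, not values defined by the spec.

```yaml
spec: package/v1
name: hello-agent                  # hypothetical agent name
meta:
  description: Minimal example agent
agent:
  image: example/hello-agent:1.0   # hypothetical pre-built image
```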

3. Agent

The agent object defines the agent’s primary service — its container image or build configuration.

| Field | Type | Required | Description |
|---|---|---|---|
| image | string | Conditional | Pre-built container image reference. |
| build | BuildConfig | Conditional | Build-from-source configuration. |
| distributed | boolean | OPTIONAL | Whether the agent supports multi-replica deployment. Default: false. |
| healthcheck | Healthcheck | OPTIONAL | Health check configuration. |
| inputs | Input[] | OPTIONAL | User-supplied inputs injected into the agent container. |

An agent entry MUST specify exactly one of image or build. Providing both or neither is invalid.

3.1 BuildConfig

| Field | Type | Required | Description |
|---|---|---|---|
| context | string | REQUIRED | Build context path. |
| dockerfile | string | REQUIRED | Path to Dockerfile relative to context. |
| target | string | OPTIONAL | Multi-stage build target. |
| args | map<string, string> | OPTIONAL | Build arguments passed to the builder. |
| secrets | BuildSecret[] | OPTIONAL | Build-time secrets. |

BuildSecret

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | REQUIRED | Secret identifier used in --mount=type=secret,id=<id>. |
| env | string | OPTIONAL | Environment variable to source the secret value from. |
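
A build-from-source agent that uses every BuildConfig field could be sketched as follows. The stage name, build argument, and environment variable name are assumptions made for illustration only.

```yaml
agent:
  build:
    context: .
    dockerfile: docker/Dockerfile   # resolved relative to context
    target: runtime                 # hypothetical multi-stage target
    args:
      NODE_ENV: production          # hypothetical build argument
    secrets:
      - id: npm_token               # consumed via --mount=type=secret,id=npm_token
        env: NPM_TOKEN              # hypothetical variable holding the secret value
```

Because build is present, image is omitted; an agent entry specifies exactly one of the two.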

3.2 Healthcheck

Applies to the agent definition and to any ContainerConfig.healthcheck in component sections.

| Field | Type | Required | Description |
|---|---|---|---|
| test | string[] | OPTIONAL | Custom health check command (e.g. ["CMD", "redis-cli", "ping"]). |
| path | string | OPTIONAL | HTTP path for health check (auto-generates a curl test command). |
| interval | string | OPTIONAL | Check interval. Default: 10s. |
| timeout | string | OPTIONAL | Response timeout. Default: 5s. |
| retries | integer | OPTIONAL | Consecutive failures before unhealthy. Default: 3. |

Implementations SHOULD support both test (exec-based) and path (HTTP-based) health checks. When path is provided, the implementation SHOULD generate an equivalent test command.
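
The two health check styles can be sketched side by side. The agent image and the /health path are hypothetical; the redis-cli command is the one quoted in the table above.

```yaml
agent:
  image: example/agent:1.0     # hypothetical image
  healthcheck:
    path: /health              # HTTP-based check
    interval: 30s
    timeout: 3s
knowledge:
  cache:
    container:
      image: redis:7-alpine
      port: 6379
      healthcheck:
        test: ["CMD", "redis-cli", "ping"]   # exec-based check
        retries: 5
```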


4. Component Sections: Models, Knowledge, Tools

Models, knowledge stores, and tools share a unified provider binding scheme. Each entry operates in exactly one of two modes:

  • Provider mode — the entry specifies a provider string. The platform resolves this to either a self-hosted provider (deploys a container from its registry) or a cloud provider (injects credentials).
  • Container mode — the entry specifies a container object. The user manages the image, port, and configuration.

These modes are mutually exclusive: an entry MUST specify exactly one of provider or container. Providing both or neither is invalid.
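
The fragment below is a sketch of the mutual-exclusion rule, using hypothetical entry names and image references. The last two entries are shown only to illustrate what a validator would reject.

```yaml
models:
  chat:                      # valid: provider mode
    provider: anthropic
  embedder:                  # valid: container mode
    container:
      image: example/embedder:1.0
      port: 8000
  broken_both:               # INVALID: specifies both provider and container
    provider: ollama
    container:
      image: example/inference:1.0
  broken_neither: {}         # INVALID: specifies neither
```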

4.1 Models

The models section declares AI models the agent consumes — LLMs (e.g. Claude, GPT, Llama), embedding models, or any model served behind an inference API. Each entry in the models map:

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Conditional | Platform-managed provider name (e.g. ollama, anthropic). |
| model | string | OPTIONAL | Provider-specific model identifier (e.g. llama3.2). Only meaningful in provider mode. |
| container | ContainerConfig | Conditional | Custom container configuration. |
| inputs | Input[] | OPTIONAL | User-supplied inputs injected into the model’s container. |

4.2 Knowledge

Each entry in the knowledge map:

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Conditional | Platform-managed provider name (e.g. qdrant, pinecone). |
| container | ContainerConfig | Conditional | Custom container configuration. |
| persistent | boolean | OPTIONAL | Whether data SHOULD be persisted across restarts. Default: false. |
| inputs | Input[] | OPTIONAL | User-supplied inputs injected into the knowledge container. |

When persistent is true, the platform SHOULD provision durable storage for the entry regardless of mode.
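
Since persistent applies regardless of mode, a sketch with one provider-mode entry and one container-mode entry may help. The entry names and the custom image are hypothetical.

```yaml
knowledge:
  docs:
    provider: qdrant
    persistent: true                    # durable storage requested in provider mode
  notes:
    container:
      image: example/vector-store:1.0   # hypothetical image
      port: 7000
    persistent: true                    # same flag, container mode
```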

4.3 Tools

The tools section declares services the agent invokes to perform actions or retrieve data. Tools can be HTTP APIs, MCP (Model Context Protocol) servers, or any service exposed over a network port. Each entry in the tools map:

| Field | Type | Required | Description |
|---|---|---|---|
| provider | string | Conditional | Platform-managed provider name (e.g. github, gitlab). |
| container | ContainerConfig | Conditional | Custom container configuration. |
| inputs | Input[] | OPTIONAL | User-supplied inputs injected into the tool’s container. |

For external API services that need credentials but no platform-managed container, define a custom provider in the providers section (see Section 5) instead.
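
As an illustration, a tools section might combine a platform-managed provider with a user-managed service such as an MCP server. The search_mcp entry, its image, port, and input name are hypothetical.

```yaml
tools:
  github:
    provider: github                  # platform-managed provider
  search_mcp:
    container:
      image: example/search-mcp:1.0   # hypothetical MCP server image
      port: 3000
    inputs:
      - name: SEARCH_API_KEY          # hypothetical input, injected into this tool's container
        datatype: string
        secret: true
```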

4.4 ContainerConfig

Used by container-mode entries and by ingestion containers.

| Field | Type | Required | Description |
|---|---|---|---|
| image | string | Conditional | Container image reference. |
| build | BuildConfig | Conditional | Build-from-source configuration (same schema as Section 3.1). |
| port | integer | OPTIONAL | Primary port the container listens on. |
| environment | map<string, string> | OPTIONAL | Static environment variables injected into the container. |
| gpu | GPUConfig | OPTIONAL | GPU resource requirements. |
| persistent | boolean | OPTIONAL | Whether data SHOULD be persisted. Default: false. |
| healthcheck | Healthcheck | OPTIONAL | Health check configuration (same schema as Section 3.2). |

A ContainerConfig SHOULD specify at least one of image or build.

GPUConfig

| Field | Type | Required | Description |
|---|---|---|---|
| vram | string | OPTIONAL | GPU memory required (e.g. 24Gi). |
| runtime | string | OPTIONAL | GPU runtime. MUST be one of cuda or rocm. Default: cuda. |
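
A container-mode model entry that exercises most ContainerConfig fields, including GPUConfig, could look like this sketch. The entry name, image, port, and environment variable are hypothetical.

```yaml
models:
  vision:
    container:
      image: example/vision-server:1.0   # hypothetical image
      port: 9000
      environment:
        MAX_BATCH: "8"                   # values are strings (map<string, string>)
      gpu:
        vram: 24Gi
        runtime: cuda                    # or rocm
      persistent: true
      healthcheck:
        path: /healthz
```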

4.5 Input

An Input declares a user-supplied value that the platform prompts for at deploy time and injects as an environment variable into the target container. The name is used directly as the env var key. See Section 8.4 for injection targets.

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | REQUIRED | Env var key injected into the target container. |
| datatype | string | REQUIRED | Value type. MUST be one of: string, boolean, number, array, object. |
| secret | boolean | OPTIONAL | If true, the platform MUST store the value securely and MUST NOT log it. Default: false. |
| description | string | OPTIONAL | Human-readable description for deploy-time prompts. |
| display-as | string | OPTIONAL | UI rendering hint. MUST be one of: short-text, long-text, select. |
| options | string[] | OPTIONAL | Allowed values. When present, UIs SHOULD render a dropdown/select. Required when display-as is select. |
| default | string | OPTIONAL | Default value pre-filled in the UI. |
| optional | boolean | OPTIONAL | If true, the input MAY be omitted at deploy time. Default: false. |

The datatype field controls validation and type coercion applied before injection. secret is orthogonal to datatype — it controls storage and logging: when true, the platform stores the value securely and never logs it. The display-as field controls UI rendering: short-text renders a single-line field, long-text a multi-line field, and select a dropdown using options.
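
The interplay of datatype, secret, display-as, and options can be sketched with three inputs on the agent. All input names and the image are hypothetical.

```yaml
agent:
  image: example/agent:1.0        # hypothetical image
  inputs:
    - name: LOG_LEVEL
      datatype: string
      display-as: select          # select requires a non-empty options list
      options: [debug, info, warn, error]
      default: info
    - name: WEBHOOK_SECRET
      datatype: string
      secret: true                # stored securely, never logged
      optional: true              # may be omitted at deploy time
    - name: MAX_RETRIES
      datatype: number
      default: "3"                # default is declared as a string in the table above
```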


5. Custom Providers

The providers section extends the platform’s built-in provider registry with user-defined entries. A custom provider declares the variables it requires so the platform can prompt for them at deploy time. Custom providers behave like cloud providers (§8.1): they inject credentials into the agent, not connection details.

Each variable’s name is a suffix. The full env var key is formed as {UPPER(provider)}_{varName}, following the same rule as §8.1. Duplicate-entry handling also mirrors §8.1: when multiple entries reference the same custom provider, each entry gets a qualified key; the primary entry also gets the bare key.

Custom providers can be referenced by name from the models, knowledge, and tools sections, just like built-in providers. The scope field controls which sections are allowed to reference the provider — the platform MUST reject references from sections not listed in scope.

| Field | Type | Required | Description |
|---|---|---|---|
| scope | string[] | REQUIRED | Sections that may reference this provider. MUST contain one or more of: models, knowledge, tools. |
| variables | Input[] | REQUIRED | Variables this provider requires from the user. MUST contain at least one entry. |
| config | map<string, any> | OPTIONAL | Provider-specific configuration. |
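
A minimal sketch of a custom provider and a reference to it follows. The provider name acme-search and the entry name are hypothetical, and the resulting key assumes the provider name is sanitized as described in Section 8.5.

```yaml
providers:
  acme-search:
    scope: [tools]            # only the tools section may reference this provider
    variables:
      - name: API_KEY         # suffix; full key would be ACME_SEARCH_API_KEY
        datatype: string
        secret: true

tools:
  search:
    provider: acme-search     # allowed: tools is listed in scope
```

A models or knowledge entry that named acme-search as its provider would be rejected, because neither section appears in scope.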

6. Ingestion

The ingestion section declares data ingestion pipelines. Each entry is a container that runs on a trigger.

| Field | Type | Required | Description |
|---|---|---|---|
| container | ContainerConfig | REQUIRED | Container that performs the ingestion. |
| trigger | IngestionTrigger | REQUIRED | When the container runs. |
| inputs | Input[] | OPTIONAL | User-supplied inputs injected into the ingestion container. |

IngestionTrigger

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | REQUIRED | MUST be one of: schedule, startup, manual, webhook. |

Trigger type semantics:

  • schedule — runs on a cron schedule. The cron expression is supplied at deploy time.
  • startup — runs once automatically at deploy time.
  • manual — runs on demand via API invocation.
  • webhook — deploys as a long-running service that receives incoming HTTP requests. The container SHOULD declare a port when using this trigger type.
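
The webhook and manual trigger types could be declared as in the sketch below. Entry names, images, and the port are hypothetical.

```yaml
ingestion:
  events:
    container:
      image: example/event-receiver:1.0   # hypothetical image
      port: 8080                          # a port SHOULD be declared for webhook triggers
    trigger:
      type: webhook                       # long-running service receiving HTTP requests
  reindex:
    container:
      image: example/reindexer:1.0        # hypothetical image
    trigger:
      type: manual                        # runs on demand via API invocation
```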

7. Dev

The dev section provides local development overrides consumed by astro dev. These fields are deployment concerns that do not belong in the normative agent topology.

| Field | Type | Required | Description |
|---|---|---|---|
| interfaces | string[] | OPTIONAL | Messaging interfaces to enable locally (e.g. slack, web). |
| command | string | OPTIONAL | Custom start command for the agent. Default: bun --watch run start. |
| overrides | DevOverrides | OPTIONAL | Image overrides for local dev services. |

DevOverrides

| Field | Type | Required | Description |
|---|---|---|---|
| messagingImage | string | OPTIONAL | Custom image for the messaging sidecar. |
| playgroundImage | string | OPTIONAL | Custom image for the playground UI. |
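
A dev section that also sets DevOverrides might look like this sketch; both image references are hypothetical.

```yaml
dev:
  interfaces: [web]
  command: bun --watch run start                # the documented default, shown explicitly
  overrides:
    messagingImage: example/messaging:dev       # hypothetical sidecar image
    playgroundImage: example/playground:dev     # hypothetical playground image
```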

8. Environment Variable Injection Model

The platform automatically injects environment variables into the agent to wire it to its dependencies. The injection model differs by entry mode: cloud providers inject credentials, self-hosted providers inject connection details, and container-mode entries inject generic connection details.

8.1 Cloud Provider Credentials

Cloud providers (in models, knowledge, tools) require user-provided credentials at deploy time. The env var key is derived from the provider name, not the entry name:

Single entry for a provider:

{UPPER(provider)}_{suffix}

Example: one anthropic model entry → ANTHROPIC_API_KEY.

Multiple entries for the same provider (duplicate handling):

Each entry gets a name-qualified key:

{UPPER(provider)}_{UPPER(entry_name)}_{suffix}

Additionally, a “primary” entry also receives the bare {UPPER(provider)}_{suffix} key for convenience. The primary entry is the one whose name matches the provider (e.g. an entry named anthropic using provider: anthropic); if no entry name matches, the first alphabetically is primary.

When the entry name equals the provider name, the redundant qualified form (e.g. ANTHROPIC_ANTHROPIC_API_KEY) is omitted — only the bare key is produced.

Examples (single entry):

  • models.primary with provider: anthropic → ANTHROPIC_API_KEY
  • tools.github with provider: github → GITHUB_TOKEN
  • knowledge.vectors with provider: pinecone → PINECONE_API_KEY

Examples (duplicate entries, two anthropic models):

  • models.anthropic + models.sonnet both with provider: anthropic:
    • anthropic (name matches provider, primary) → ANTHROPIC_API_KEY
    • sonnet → ANTHROPIC_SONNET_API_KEY

Cloud provider credentials are always required.
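
The case where no entry name matches the provider is not covered by the examples above, so here is a sketch of it. Entry names are hypothetical; the keys in the comments follow the rules stated in this section.

```yaml
models:
  haiku:
    provider: anthropic   # ANTHROPIC_HAIKU_API_KEY, plus bare ANTHROPIC_API_KEY
                          # (primary: first alphabetically, since no entry is named anthropic)
  sonnet:
    provider: anthropic   # ANTHROPIC_SONNET_API_KEY
  gpt:
    provider: openai      # only openai entry, so just OPENAI_API_KEY
```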

8.2 Self-Hosted Provider Connection Details

Self-hosted providers deploy a container. The platform injects connection env vars using the provider’s env prefix:

Single entry for a provider:

{EnvPrefix}_HOST, {EnvPrefix}_PORT, {EnvPrefix}_URL

Example: one qdrant knowledge entry → QDRANT_HOST, QDRANT_PORT, QDRANT_URL.

Multiple entries for the same self-hosted provider:

Each entry gets name-qualified keys; the first alphabetically also gets bare keys:

{EnvPrefix}_{UPPER(entry_name)}_HOST (plus bare {EnvPrefix}_HOST for first)

Model providers additionally inject {EnvPrefix}_BASE_URL (with /api appended) and {EnvPrefix}_MODEL when a model name is specified.
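
The following sketch applies these rules to a few self-hosted entries. Entry names are hypothetical, and the REDIS prefix is an assumption (the text confirms only the QDRANT and OLLAMA prefixes).

```yaml
knowledge:
  cache:
    provider: redis     # REDIS_CACHE_HOST, plus bare REDIS_HOST (first alphabetically)
  sessions:
    provider: redis     # REDIS_SESSIONS_HOST
  docs:
    provider: qdrant    # single entry: QDRANT_HOST, QDRANT_PORT, QDRANT_URL

models:
  local:
    provider: ollama    # OLLAMA_HOST, OLLAMA_PORT, OLLAMA_URL, OLLAMA_BASE_URL
    model: llama3.2     # OLLAMA_MODEL=llama3.2
```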

8.3 Container-Mode Connection Details

Container-mode entries (no provider) receive generic section-prefixed env vars:

  • Models: MODEL_{UPPER(name)}_HOST, MODEL_{UPPER(name)}_PORT, MODEL_{UPPER(name)}_URL
  • Knowledge: KNOWLEDGE_{UPPER(name)}_HOST, KNOWLEDGE_{UPPER(name)}_PORT
  • Tools: TOOL_{UPPER(name)}_HOST, TOOL_{UPPER(name)}_PORT, TOOL_{UPPER(name)}_URL
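
For container-mode entries the key is built from the section prefix and the entry name, as in this sketch. Entry names and images are hypothetical; graph-db becomes GRAPH_DB under the sanitization rule in Section 8.5.

```yaml
models:
  embedder:
    container:
      image: example/embedder:1.0   # hypothetical image
      port: 8000
    # MODEL_EMBEDDER_HOST, MODEL_EMBEDDER_PORT, MODEL_EMBEDDER_URL

knowledge:
  graph-db:
    container:
      image: example/graph:1.0      # hypothetical image
      port: 7687
    # KNOWLEDGE_GRAPH_DB_HOST, KNOWLEDGE_GRAPH_DB_PORT
```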

8.4 Inputs

Inputs are user-supplied values prompted at deploy time. Each input’s name is used directly as the env var key (no prefix) in the target container:

| Declared on | Injected into |
|---|---|
| Top-level inputs | All containers |
| agent.inputs | Agent container |
| models[].inputs | Model container |
| knowledge[].inputs | Knowledge container |
| tools[].inputs | Tool container |
| ingestion[].inputs | Ingestion container |

providers[].variables is a template only — it declares what variables a provider requires so the platform can prompt for them at deploy time.

Example: inputs: [{name: OPENAI_API_KEY, datatype: string, secret: true}] → OPENAI_API_KEY in the target container.

8.5 Name Sanitization

Entry names used in env var keys are sanitized in the following order: converted to lowercase; hyphens and dots replaced with underscores (existing underscores are kept); all other non-alphanumeric characters removed; consecutive underscores collapsed; then uppercased. For example, the entry name my-model sanitizes to my_model, which uppercases to MY_MODEL.
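
A few more cases, worked through by hand from the steps above, are listed here as an illustrative lookup table. This mapping is not part of the spec format, and every entry name other than my-model is hypothetical.

```yaml
# entry name: fragment used in env var keys
my-model: MY_MODEL        # hyphen replaced with underscore
docs.v2: DOCS_V2          # dot replaced with underscore
web--search: WEB_SEARCH   # consecutive underscores collapsed
"llm@2": LLM2             # other non-alphanumeric character removed
```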


9. Validation Rules

Implementations MUST enforce the following validation rules:

  1. spec MUST be a non-empty string.
  2. name MUST be a non-empty string.
  3. agent MUST specify exactly one of image or build. If build is present, build.context and build.dockerfile are REQUIRED.
  4. For each entry in models: provider and container are mutually exclusive. Exactly one MUST be present.
  5. For each entry in knowledge: provider and container are mutually exclusive. Exactly one MUST be present.
  6. For each entry in tools: provider and container are mutually exclusive. Exactly one MUST be present.
  7. For each entry in providers: scope MUST be present and contain one or more of models, knowledge, tools. variables MUST be present and MUST contain at least one element. Each variable MUST have a non-empty name and a valid datatype.
     7a. When a component entry references a custom provider by name, the referencing section MUST be listed in that provider’s scope.
  8. For each entry in ingestion: both container and trigger are REQUIRED. trigger.type MUST be one of schedule, startup, manual, webhook.
  9. When a BuildConfig is provided (in agent.build, container.build), context and dockerfile are REQUIRED.
  10. When gpu.runtime is provided, it MUST be one of cuda or rocm.
  11. When an input’s display-as is select, options MUST be present and non-empty.
  12. Each Input in any context MUST have a non-empty name and datatype MUST be one of string, boolean, number, array, object.

Appendix A: Provider Registries (Non-Normative)

The following tables document the platform’s built-in provider registries as of this specification version. Implementations MAY extend these registries.

A.1 Model Providers

Self-Hosted

| Provider | Image | Port | Health Check | GPU | Default Env |
|---|---|---|---|---|---|
| ollama | ollama/ollama:latest | 11434 | HTTP /api/tags | Yes | OLLAMA_HOST=0.0.0.0, OLLAMA_KEEP_ALIVE=-1 |

When model is specified for a self-hosted provider, the platform sets {EnvPrefix}_MODEL (e.g. OLLAMA_MODEL=llama3.2).

Cloud

| Provider | Credential Suffix | Description |
|---|---|---|
| anthropic | API_KEY | Anthropic API key for Claude models |
| openai | API_KEY | OpenAI API key for GPT models |
| google | API_KEY | Google API key for Gemini models |
| gemini | API_KEY | Google API key for Gemini models (alias for google) |
| cohere | API_KEY | Cohere API key for language models |

A.2 Knowledge Providers

Self-Hosted

| Provider | Image | Port | Extra Ports | Mount Path | Health Check | Default Env |
|---|---|---|---|---|---|---|
| qdrant | qdrant/qdrant:latest | 6333 | gRPC 6334 | /qdrant/storage | HTTP /healthz | |
| redis | redis:7-alpine | 6379 | | /data | redis-cli ping | |
| postgres | postgres:15-alpine | 5432 | | /var/lib/postgresql/data | pg_isready -U postgres | |
| neo4j | neo4j:5-community | 7474 | Bolt 7687 | /data | HTTP / | NEO4J_AUTH=none |

Cloud

| Provider | Credential Suffix | Description |
|---|---|---|
| pinecone | API_KEY | Pinecone API key for vector database |

A.3 Tool Providers

Cloud

| Provider | Credential Suffix | Description |
|---|---|---|
| github | TOKEN | GitHub token for API access |
| gitlab | TOKEN | GitLab token for API access |

Appendix B: JSON Schema

A machine-readable JSON Schema for this specification is maintained at astropods.schema.json in the astro-spec package. The schema is generated from the normative type definitions and MAY be used for editor autocompletion and pre-validation.

Schema ID: https://astropods.ai/schema/package.json


Appendix C: Complete Example

spec: package/v1
name: engineering-assistant

meta:
  description: Engineering knowledge assistant with self-hosted and cloud components
  tags: [engineering, support, internal]

agent:
  build:
    context: .
    dockerfile: Dockerfile
    secrets:
      - id: npm_token
        env: GITHUB_PACKAGES_TOKEN
  inputs:
    - name: LOG_LEVEL
      datatype: string
      default: info
      description: Agent log level

inputs:
  ALLOWED_ORIGINS:
    name: ALLOWED_ORIGINS
    datatype: string
    description: Comma-separated list of allowed CORS origins
    optional: true

models:
  local_llm:
    provider: ollama
    model: llama3.2

  primary:
    provider: anthropic

  embedder:
    container:
      build:
        context: ./embedder
        dockerfile: Dockerfile
      port: 8000
      healthcheck:
        path: /health
    inputs:
      - name: EMBEDDING_BATCH_SIZE
        datatype: number
        default: "32"
        description: Number of texts to embed per request

knowledge:
  docs:
    provider: qdrant
    persistent: true

  cache:
    provider: redis

tools:
  github:
    provider: github

  jira:
    provider: my-jira # references custom provider below

providers:
  my-jira:
    scope: [tools]
    variables:
      - name: API_KEY # → MY_JIRA_API_KEY
        datatype: string
        secret: true
        description: Jira API key
      - name: BASE_URL # → MY_JIRA_BASE_URL
        datatype: string
        display-as: short-text
        description: Jira instance URL
      - name: PROJECT # → MY_JIRA_PROJECT
        datatype: string
        display-as: select
        options: [ENG, PLATFORM, INFRA]
        description: Default Jira project
      - name: HMAC_SECRET # → MY_JIRA_HMAC_SECRET
        datatype: string
        secret: true
        description: Shared secret for HMAC signing
        optional: true

ingestion:
  docs_sync:
    container:
      image: my-docs-sync:latest
      environment:
        SOURCE_REPO: company/engineering-docs
        TARGET_COLLECTION: docs
    trigger:
      type: schedule
    inputs:
      - name: SYNC_BATCH_SIZE
        datatype: number
        default: "100"
        description: Number of documents to sync per batch

  initial_load:
    container:
      image: my-bootstrap-worker:latest
    trigger:
      type: startup

dev:
  interfaces: [slack, web]
  command: bun --watch run start