Agentic Stacks

A stack is a Git repository that teaches an AI agent how to operate in a specific domain. After reading your stack, an agent should be able to act as an expert operator: deploying, managing, troubleshooting, and upgrading the target software.

Anatomy of a Stack

Every stack has these files and directories at the root:

my-stack/
├── README.md       # Repo landing page
├── CLAUDE.md       # Agent entry point — persona, rules, routing
├── stack.yaml      # Machine-readable manifest
└── skills/         # Operational knowledge, organized by phase

Scaffold one with:

agentic-stacks create my-org/my-stack

Step 1: Design the Skill Hierarchy

Skills are directories of markdown files that teach the agent specific operations. Organize by what the operator is trying to do:

| Phase | Purpose | Examples |
|---|---|---|
| Foundation | Understanding and setup | Architecture, configuration, provisioning |
| Deploy | Initial deployment | Bootstrap, networking, storage |
| Platform | Platform layer | GitOps, ingress, monitoring, security |
| Operations | Day-two management | Health checks, scaling, upgrades, backup |
| Diagnose | Troubleshooting | Symptom-based decision trees |
| Reference | Cross-cutting lookups | Known issues, compatibility, decision guides |

For complex stacks (10+ skills), use phase/domain nesting:

skills/
├── foundation/
│   ├── concepts/
│   └── infrastructure/
│       ├── README.md       # Overview + index
│       ├── aws.md          # Platform-specific
│       └── gcp.md
├── deploy/
│   ├── bootstrap/
│   ├── networking/
│   │   ├── README.md       # Decision matrix
│   │   ├── cilium.md       # Option deep dive
│   │   └── flannel.md
│   └── storage/
└── operations/
    ├── health-check/
    ├── upgrades/
    └── backup-restore/
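
A nested layout like the one above can be inventoried with a short script, which is handy when filling in a manifest or routing table. The following is a minimal sketch, not part of the agentic-stacks tooling: the function name discover_skills is hypothetical, and it assumes that any directory under skills/ containing a README.md counts as a skill.

```python
from pathlib import Path


def discover_skills(stack_root: Path) -> list[str]:
    """Return skill directories (relative to the stack root, POSIX-style).

    Assumption: a skill is any directory under skills/ that directly
    contains a README.md. Directories without one are ignored.
    """
    skills_root = stack_root / "skills"
    found = []
    for readme in skills_root.rglob("README.md"):
        found.append(readme.parent.relative_to(stack_root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    # Run from the root of a stack repository.
    for skill in discover_skills(Path(".")):
        print(skill)
```

Directories that appear in the tree but not in the output are the ones still missing a README.md.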

Step 2: Write CLAUDE.md

CLAUDE.md is the agent's brain. It sets identity, enforces safety, and routes to skills.

# [Stack Name] — Agentic Stack

## Identity
[1-2 sentences establishing the agent's expertise]

## Critical Rules
[Numbered list of hard safety guardrails]

## Routing Table
| Operator Need | Skill | Entry Point |
|---|---|---|
| Deploy the cluster | bootstrap | skills/deploy/bootstrap |
| Troubleshoot issues | troubleshooting | skills/diagnose/troubleshooting |

## Workflows
### New Deployment
[Linear path through skills for first-time setup]

### Existing Deployment
[How to jump to the right skill for ongoing operations]

Writing Critical Rules

Critical rules prevent the agent from doing damage. Good rules are:

Specific: "Never run talosctl reset without operator approval", not "be careful"
Actionable: the agent can check compliance unambiguously
Justified: explain why, for example "etcd quorum loss means cluster down"
Minimal: 5-10 rules. With too many, the agent is more likely to ignore them.

Step 3: Write stack.yaml

name: my-stack
owner: my-org
version: 0.1.0
description: >
  One paragraph describing what this stack teaches agents to operate.

repository: https://github.com/my-org/my-stack

target:
  software: target-software-name
  versions: ["1.x"]

skills:
  - name: skill-name
    entry: skills/path/to/skill
    description: One-line description

project:
  structure:
    - file-or-dir-in-operator-project

requires:
  tools:
    - name: tool-name
      description: What it's used for

depends_on: []

Tips: the entry field points to a directory, not a file; the agent starts from that directory's README.md. The description field should help an agent decide whether to read the skill.
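
The entry convention can be checked mechanically. The sketch below is an illustration only: check_skill_entries is a hypothetical helper, and it assumes stack.yaml has already been parsed into a dictionary by a YAML library of your choice (not shown here).

```python
from pathlib import Path


def check_skill_entries(manifest: dict, stack_root: Path) -> list[str]:
    """Return human-readable problems found in the manifest's skills list.

    Assumption: `manifest` is the parsed content of stack.yaml, with the
    same field names as the example above (skills, name, entry, description).
    """
    problems = []
    for skill in manifest.get("skills", []):
        name = skill.get("name", "<unnamed>")
        entry = skill.get("entry")
        if not entry:
            problems.append(f"{name}: missing entry")
        else:
            entry_dir = stack_root / entry
            if not entry_dir.is_dir():
                # entry must be a directory, not a file
                problems.append(f"{name}: entry {entry} is not a directory")
            elif not (entry_dir / "README.md").is_file():
                problems.append(f"{name}: {entry}/README.md not found")
        if not skill.get("description"):
            problems.append(f"{name}: missing description")
    return problems
```

An empty list means every listed skill resolves to a directory with a README.md and carries a description.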

Step 4: Research and Verify

A stack is only as good as its accuracy. Before writing any skill:

Fetch the target software's official documentation index (/llms.txt, /sitemap.xml, or GitHub source)
Copy exact commands from the docs — do not reconstruct from memory
Verify YAML field names, CLI flags, and config structure
Note version-specific behavior
Cross-reference with release notes and GitHub issues

Step 5: Write Skill Content

Optimize for how agents process information:

Imperative headings: "Install Cilium", "Verify Health" — not "About Cilium Installation"
Exact commands: full copy-pasteable commands with realistic example values
Decision trees: "If X fails -> check Y -> if Y is true -> do Z"
Tables for reference: comparison matrices, port requirements
Safety warnings: explicit callouts before any destructive operation
Full YAML/config examples: valid snippets, not fragments
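
Before writing a decision tree as prose, it can help to model it as data and confirm that every branch ends in a concrete action. This is a minimal sketch under that assumption; the Check and Action types and the resolve function are hypothetical names, and X, Y, and Z are the same placeholders used in the list above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    """A leaf: the step the agent should take."""
    instruction: str


@dataclass
class Check:
    """A branch: a yes/no question with a subtree for each answer."""
    question: str
    if_yes: "Node"
    if_no: "Node"


Node = Union[Action, Check]


def resolve(node: Node, answers: dict[str, bool]) -> str:
    """Walk the tree using the given answers and return the final instruction."""
    while isinstance(node, Check):
        node = node.if_yes if answers[node.question] else node.if_no
    return node.instruction


# "If X fails -> check Y -> if Y is true -> do Z"
tree = Check(
    question="X fails?",
    if_yes=Check(
        question="Y is true?",
        if_yes=Action("do Z"),
        if_no=Action("escalate to operator"),
    ),
    if_no=Action("no action needed"),
)
```

Because every Check requires both branches, a tree built this way cannot leave the agent without a next step.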

Known Issues Pattern

Version-specific bugs get their own files in skills/reference/known-issues/:

### [Short Description]

**Symptom:** What the operator sees
**Cause:** Why it happens
**Workaround:** Exact steps to fix it
**Affected versions:** x.y.z through x.y.w
**Status:** Open / Fixed in x.y.w
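
The affected-versions range is only useful if versions are compared numerically rather than as strings. A minimal sketch of that comparison follows; KnownIssue and parse_version are hypothetical names, and the code assumes plain dotted numeric versions with no pre-release suffixes.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.10.2" into (1, 10, 2) so comparisons are numeric."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class KnownIssue:
    title: str
    first_affected: str  # inclusive lower bound, e.g. "1.9.0"
    last_affected: str   # inclusive upper bound, e.g. "1.10.2"

    def affects(self, version: str) -> bool:
        """True if `version` falls inside the affected range (inclusive)."""
        low = parse_version(self.first_affected)
        high = parse_version(self.last_affected)
        return low <= parse_version(version) <= high


# Illustrative values only, not a real bug report.
issue = KnownIssue("Example issue", "1.9.0", "1.10.2")
print(issue.affects("1.10.0"))  # inside the range
print(issue.affects("1.10.3"))  # past the fixed version
```

A naive string comparison would place "1.10.0" before "1.9.0" and miss the match, which is why the versions are parsed into integer tuples first.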

Step 6: Decision Guides and Compatibility

For stacks where operators must choose between components, provide structured decision aids in skills/reference/decision-guides/:

Comparison tables with features, complexity, and performance
Recommendations by use case (production, development, cloud-native)
Migration paths — can you change this decision later?

Also provide compatibility matrices in skills/reference/compatibility/ that map which component versions work together.

Step 7: Validate Your Stack

agentic-stacks doctor

Before publishing, check:

CLAUDE.md has identity, critical rules, routing table, and workflows
stack.yaml lists all skills with correct entry paths
Every skill directory has a README.md
All commands are exact and copy-pasteable
No placeholders (TBD, TODO, FIXME) remain
Known issues are documented for supported versions
The stack has been tested by having an agent use it end-to-end
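
Some of these checks are easy to automate alongside the doctor command. The sketch below covers only the placeholder item and is not a description of what agentic-stacks doctor does internally; find_placeholders is a hypothetical helper that assumes stack content lives in .md and .yaml files.

```python
import re
from pathlib import Path

# The placeholder markers named in the checklist above.
PLACEHOLDER = re.compile(r"\b(TBD|TODO|FIXME)\b")
SCANNED_SUFFIXES = {".md", ".yaml"}


def find_placeholders(stack_root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, marker) for each leftover placeholder."""
    hits = []
    for path in sorted(stack_root.rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            match = PLACEHOLDER.search(line)
            if match:
                rel = path.relative_to(stack_root).as_posix()
                hits.append((rel, number, match.group(1)))
    return hits


if __name__ == "__main__":
    for rel, number, marker in find_placeholders(Path(".")):
        print(f"{rel}:{number}: {marker}")
```

The scan reports only the first marker on each line and will also flag documents that mention these markers on purpose, so treat the output as a list to review rather than a hard failure.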

Designing for Composition

Operators compose multiple stacks in a single project. To make your stack compose well:

Stay in your domain. A hardware stack shouldn't reimplement networking concepts that a platform stack covers.
Use depends_on to declare stacks that pair well with yours.
Avoid conflicting file outputs. Document what files your stack creates in project.structure.
Name skills distinctively. When an agent loads multiple stacks, skill names should make the domain clear.
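
Two of these guidelines, distinct skill names and non-conflicting file outputs, can be checked across a set of manifests. The following is a sketch under stated assumptions: find_conflicts is a hypothetical helper, each manifest is an already-parsed stack.yaml dictionary, and the stack, skill, and file names in the example are invented for illustration.

```python
from collections import defaultdict


def find_conflicts(manifests: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Report skill names and project.structure entries claimed by several stacks."""
    skill_owners = defaultdict(list)
    file_owners = defaultdict(list)
    for manifest in manifests:
        stack = manifest["name"]
        for skill in manifest.get("skills", []):
            skill_owners[skill["name"]].append(stack)
        for path in manifest.get("project", {}).get("structure", []):
            file_owners[path].append(stack)
    return {
        "skills": {k: v for k, v in skill_owners.items() if len(v) > 1},
        "files": {k: v for k, v in file_owners.items() if len(v) > 1},
    }


# Hypothetical manifests for two stacks used in the same project.
stack_a = {
    "name": "stack-a",
    "skills": [{"name": "networking"}, {"name": "bootstrap"}],
    "project": {"structure": ["inventory/", "secrets/"]},
}
stack_b = {
    "name": "stack-b",
    "skills": [{"name": "networking"}],
    "project": {"structure": ["inventory/"]},
}
conflicts = find_conflicts([stack_a, stack_b])
```

Here both stacks define a skill called networking and both write to inventory/, so each shows up in the report. Paths are compared as literal strings, so "inventory" and "inventory/" would not be treated as the same entry.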

Reference Implementations

| Stack | Complexity | Pattern |
|---|---|---|
| openstack-kolla | Simple | Flat phase-based (8 skills) |
| kubernetes-talos | Comprehensive | Two-layer phase/domain (20 skills) |
