ceph

v2026.0402.1

Operational knowledge for deploying and managing production Ceph storage clusters with cephadm on bare metal. Covers the RBD, CephFS, and RGW services on Ceph Reef (18.2.x) and Squid (19.2.x).

Install:

pipx install agentic-stacks  # if you haven't already
agentic-stacks pull agentic-stacks/ceph

Skills

concepts

Ceph architecture, CRUSH maps, placement groups, pools, BlueStore

hardware-planning

Disk sizing, CPU/RAM ratios, network bandwidth, node role planning

host-preparation

OS prerequisites, NTP, firewall ports, container runtime, cephadm install

bootstrap

cephadm bootstrap, initial MON/MGR/OSD deployment, dashboard setup

networking

Public vs cluster network design, VLAN, bonding, MTU configuration

services

RBD pool creation, CephFS/MDS deployment, RGW S3 gateway setup

health-check

Cluster health interpretation, OSD states, PG states, alert triage

scaling

Add/remove OSDs and hosts, expand services, rebalance

upgrades

Rolling upgrades within and across Reef/Squid versions

backup-restore

Pool snapshots, RBD mirroring, RGW multisite, disaster recovery

pool-management

CRUSH rules, erasure coding profiles, tiering, quotas

certificate-mgmt

Dashboard TLS, RGW TLS, internal messenger encryption

troubleshooting

Symptom-based diagnostic trees for common Ceph failure modes

performance

Benchmarking with rados bench and fio, slow OSD diagnosis, bottleneck identification

known-issues

Version-specific bugs and workarounds for Reef and Squid

compatibility

Ceph version × kernel × container image × client compatibility

decision-guides

Replicated vs erasure coding, BlueStore tuning, network topology
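
Example:

To give a rough sense of the ground the bootstrap, scaling, and health-check skills cover, here is a minimal sketch of the cephadm workflow they describe. It is not taken from the stack itself. The IP addresses and host name are hypothetical placeholders, and the `run` helper and `DRY_RUN` switch are assumptions added for illustration so the script only prints each command by default.

```shell
#!/bin/sh
# Dry-run sketch of a basic cephadm workflow.
# By default each command is printed, not executed; set DRY_RUN=0 to run it.
# MON_IP, NEW_HOST, and NEW_HOST_IP are hypothetical placeholder values.
DRY_RUN="${DRY_RUN:-1}"
MON_IP="${MON_IP:-192.0.2.10}"
NEW_HOST="${NEW_HOST:-ceph-node2}"
NEW_HOST_IP="${NEW_HOST_IP:-192.0.2.11}"

# Illustrative helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# bootstrap: create the first MON and MGR on this host
run cephadm bootstrap --mon-ip "$MON_IP"

# scaling: register a second host with the orchestrator
run ceph orch host add "$NEW_HOST" "$NEW_HOST_IP"

# scaling: create OSDs on every unused, eligible disk
run ceph orch apply osd --all-available-devices

# health-check: overall status, then details for any warnings
run ceph status
run ceph health detail
```

Running the script unchanged prints the five commands prefixed with `+`. On a real cluster the values and the OSD placement policy would need to match the hardware plan, and `--all-available-devices` in particular is a broad default that many production deployments replace with an explicit OSD service specification.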