SmartMur Ecosystem: Layered Architecture Model
Version: 2.0
Date: 2026-03-03
Status: Architecture Reference Document
1. Layered Stack Model
The SmartMur ecosystem is organized into six architectural layers. Each layer has a clear responsibility boundary, communicates only with its immediate neighbors (or via the Event Bus), and can be developed, tested, and deployed independently.
Layer 0: Physical & Network Infrastructure
What it is: The bare-metal and virtualized compute, storage, and networking substrate that everything else runs on.
Components:
- Proxmox VE hypervisors running K3s worker nodes and Docker hosts
- TrueNAS storage providing NFS/SMB shares for persistent data
- UniFi network fabric (switches, APs, firewall rules, VLANs)
- Cloudflare tunnels for secure external ingress without port forwarding
Repos: homelab (Docker Compose definitions for host-level services), k3s-cluster (Ansible/Terraform provisioning, Flux GitOps)
Key interfaces:
- SSH (port 22) for remote command execution from Layer 2 (SSH Fabric)
- Proxmox API (port 8006) for VM lifecycle management
- Docker socket for container orchestration on Docker hosts
- Kubernetes API (port 6443) for K3s workload management
Layer 1: Platform Services
What it is: The shared services that every higher layer depends on — message bus, persistent storage, observability, identity, and security.
Components:
- Redis (port 6379) — Pub/sub message bus, ephemeral cache, session store
- SQLite — Persistent state for cron jobs (scheduler.db), memory store (memory.db), audit log
- Encrypted Vault — age-encrypted credential store (vault.enc) with identity file auth
- Prometheus + Grafana + Loki — Metrics collection, dashboards, and log aggregation
- Wazuh + CrowdSec — Security monitoring, IDS, threat intelligence
- Keycloak — Identity provider for SSO across services (future integration)
Repos: homelab (monitoring stack, security stack), claude-superpowers (Redis in Compose, SQLite databases, vault module)
Key interfaces:
- Redis protocol (pub/sub channels, key/value ops)
- SQLite WAL-mode connections (file-level, no network)
- Vault API (superpowers.vault.Vault — get/set/delete/list)
- Prometheus /metrics endpoints scraped from all services
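The file-level SQLite interface can be illustrated with a minimal sketch that uses only Python's standard `sqlite3` module. The helper name `open_wal` and the five-second busy timeout are assumptions made for illustration, not details taken from the platform's code.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_wal(db_path: Path) -> sqlite3.Connection:
    """Open (or create) a SQLite database and switch it to WAL journaling."""
    conn = sqlite3.connect(str(db_path))
    # WAL mode is persistent: once set, it is recorded in the database file.
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumed value: wait up to 5 s for a competing writer instead of failing at once.
    conn.execute("PRAGMA busy_timeout=5000")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_wal(Path(tmp) / "memory.db")
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        conn.close()
```

WAL mode lets readers proceed while a single writer appends to the log, which suits several local processes sharing one database file without a network service in between.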
Layer 2: Core Automation Engine
What it is: The Python application platform that provides all automation primitives — scheduling, messaging, SSH command execution, browser automation, workflow orchestration, file watching, and persistent memory. This is the brain of the system.
Components:
| Module | File | Purpose |
|---|---|---|
| Cron Engine | superpowers/cron_engine.py | APScheduler + SQLite job store. Four job types: shell, claude, webhook, skill. Cron expressions, interval, and daily-at schedules. Output routing to files or messaging channels. |
| Message Gateway | msg_gateway/app.py | FastAPI service (port 8100) normalizing send/receive across 5 channels: Slack, Telegram, Discord, email (SMTP/IMAP), iMessage. |
| Channel Registry | superpowers/channels/registry.py | Lazy-loading adapter factory. Auto-detects configured channels from env vars/vault. |
| Notification Profiles | superpowers/profiles.py | Named groups (critical, info, daily-digest) mapping to channel+target combos. Skills and cron jobs reference profiles, not raw channels. |
| SSH Fabric | superpowers/ssh_fabric/ | Paramiko-based connection pool with lazy creation, max-age expiry, liveness checks. Auth via key, password (from vault), or agent. |
| Browser Engine | browser_engine/app.py | FastAPI service (port 8300) wrapping Playwright + headless Chromium. Persistent sessions with profile-based cookie/localStorage storage. Navigate, screenshot, extract, fill forms, click, evaluate JS. |
| Workflow Engine | superpowers/workflow/engine.py | YAML-defined multi-step pipelines. Step types: shell, claude-prompt, skill, http, approval-gate. Conditions, rollback, and notification on completion. |
| File Watcher | superpowers/watcher/engine.py | Watchdog-based directory monitor. Rules trigger actions: shell, skill, workflow, move, copy. |
| Memory Store | superpowers/memory/store.py | SQLite-backed structured knowledge base. Categories: fact, preference, project_context, conversation_summary. Search, decay (90-day auto-archive), access counting. |
| LLM Provider | superpowers/llm_provider.py | Abstraction layer supporting Claude CLI, Anthropic SDK, OpenAI SDK. Automatic fallback chain. Role-based model routing (chat vs job). |
| Audit Log | superpowers/audit.py | Append-only JSONL log of every skill invocation, cron execution, message sent, and SSH command run. |
| Intake Pipeline | superpowers/intake.py | Request decomposition, requirement extraction, skill auto-mapping, parallel execution with ThreadPoolExecutor, role-based routing (planner/executor/verifier). |
| Infra Fixer | superpowers/infra_fixer.py | Docker health monitor scanning all projects (40+ containers). Detects crash loops, unhealthy containers, stopped services. Auto-restart capability. |
Repo: claude-superpowers
Layer 3: Skill & Plugin System
What it is: The extensible skill layer that wraps automation primitives into reusable, discoverable, sandboxed units of work. Skills are the primary unit of composition in the platform.
Components:
- Skill Registry (superpowers/skill_registry.py) — Discovers skills via skill.yaml manifests, auto-generates Claude Code slash commands
- Skill Loader (superpowers/skill_loader.py) — Executes skills in sandboxed subprocesses with controlled env vars, vault access gating
- Skill Creator (superpowers/skill_creator.py) — Scaffolds new skills from templates (5 built-in: shell, python, claude, http, composite)
- SkillHub (superpowers/skillhub.py) — Git-based skill sharing: push/pull/list/diff between local and shared repos
- Auto-Install (superpowers/auto_install.py) — Template-based skill creation from natural language requirements
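The Skill Loader's environment isolation can be sketched roughly as follows. This is an illustration under stated assumptions rather than the loader's actual code: the allowlist `SAFE_ENV_KEYS`, the helper names `build_sandbox_env` and `run_skill`, and the `SKILL_VAULT_PATH` variable are all hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: the only variables a skill inherits from the parent.
# (On Windows, SYSTEMROOT would likely need to be added as well.)
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG")


def build_sandbox_env(extra=None, vault_allowed=False, vault_path=""):
    """Return a minimal environment for a skill subprocess."""
    env = {k: os.environ[k] for k in SAFE_ENV_KEYS if k in os.environ}
    env.update(extra or {})
    if vault_allowed:
        # Only skills whose manifest grants vault access learn where the vault is.
        env["SKILL_VAULT_PATH"] = vault_path  # hypothetical variable name
    return env


def run_skill(argv, vault_allowed=False, timeout=30.0):
    """Run a skill command in a child process with the stripped environment."""
    return subprocess.run(
        argv,
        env=build_sandbox_env(vault_allowed=vault_allowed),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a failing skill is reported through returncode, not raised
    )


if __name__ == "__main__":
    os.environ["SMARTMUR_DEMO_SECRET"] = "not-for-skills"
    probe = "import os; print(os.environ.get('SMARTMUR_DEMO_SECRET'))"
    print(run_skill([sys.executable, "-c", probe]).stdout.strip())
```

Stripping the environment only limits which secrets a skill can read. It does not restrict filesystem or network access, so a sketch like this should not be treated as a full security boundary.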
Built-in Skills:
| Skill | Type | Purpose |
|---|---|---|
| heartbeat | Shell | Pings 6 hosts, probes 3 HTTPS services, formatted status table |
| network-scan | Shell | Network discovery and port scanning |
| ssl-cert-check | Shell | TLS certificate expiry monitoring |
| docker-monitor | Shell | Container health checks across all Docker hosts |
| backup-status | Shell | Backup job verification |
| deploy | Python | Local deployment pipeline: git pull, pip install, docker build, health check, test |
| github-admin | Python | Repository audit, branch protection, security scanning |
| tunnel-setup | Python | Cloudflare tunnel token management |
| qa-guardian | Python | Code quality scanner (12 checks, 4 categories) |
| infra-fixer | Python | Docker infrastructure auto-remediation |
| cloudflared-fixer | Python | Tunnel crash-loop detection and recovery |
Repo: claude-superpowers (skills at skills/), claude-code-skill-factory (skill creation patterns)
Layer 4: Interface & Control Plane
What it is: The user-facing surfaces through which humans and external systems interact with the platform — the dashboard, CLI, Telegram bot, and MCP server.
Components:
- Dashboard (dashboard/app.py): FastAPI + HTMX SPA on port 8200. JWT-authenticated. 17 API router modules, including cron, messaging, SSH, workflows, memory, skills, audit, vault, watchers, browser, chat, notifications, jobs, settings, GitHub, and status. Static file serving for the frontend.
- CLI (claw) — Click-based CLI entry point. Subcommands: cron, msg, skill, ssh, browse, workflow, memory, vault, intake, setup, status. Installed as editable package via pyproject.toml.
- Telegram Bot (telegram-bot/entrypoint.py) — Long-polling or webhook-based inbound listener. Claude AI responses, session management (TTL, history limits), concurrency control (per-chat and global limits), admin access requests, verification flow.
- MCP Server (superpowers/mcp_server.py) — Model Context Protocol server exposing memory tools and workflow tools as MCP resources for Claude Code integration.
Repo: claude-superpowers
Layer 5: AI & Agent Orchestration
What it is: The intelligence layer that adds autonomous reasoning, multi-agent coordination, and AI-driven decision-making on top of the automation platform.
Components:
- Intake Pipeline — Decomposes natural language requests into task graphs, auto-maps to skills, dispatches parallel execution with role assignment (planner/executor/verifier)
- Role Router (superpowers/role_router.py) — Assigns specialized roles to intake tasks based on content analysis
- LLM Fallback Chain — Primary provider (Claude) with automatic failover to OpenAI. Role-based model selection (interactive chat vs background jobs)
- QA Guardian (superpowers/qa_guardian.py) — Autonomous code quality enforcement: 12 checks, 4 categories, JSON reports, Telegram notifications
- Claude Code Tresor — Agent orchestration framework for complex multi-step reasoning tasks
- Agent OS — Operating system abstractions for persistent agent processes
- Lighthouse AI — AI-powered code analysis and review
Repos: claude-superpowers, claude-code-tresor, agent-os, Lighthouse-AI
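The LLM fallback chain and role-based model selection described in this layer can be sketched in a provider-agnostic way. Providers are modeled here as plain callables. The names `invoke_with_fallback`, `chain_for_role`, and `AllProvidersFailed` are hypothetical and do not come from `superpowers/llm_provider.py`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A provider is modeled as a callable taking a prompt and returning text.
Provider = Callable[[str], str]
Chain = Sequence[Tuple[str, Provider]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an exception."""


def invoke_with_fallback(prompt: str, chain: Chain) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except Exception as exc:  # sketch only: real code would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def chain_for_role(role: str, chains: Dict[str, Chain]) -> Chain:
    """Pick the chain for a role such as 'chat' or 'job', defaulting to 'chat'."""
    return chains.get(role, chains["chat"])


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        raise TimeoutError("primary unavailable")

    def backup(prompt: str) -> str:
        return f"echo: {prompt}"

    chains = {"chat": [("claude", primary), ("openai", backup)]}
    print(invoke_with_fallback("status?", chain_for_role("job", chains)))
```

Collecting the per-provider errors before raising keeps the failure diagnosable when the whole chain is exhausted.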
2. Architecture Diagram
+===========================================================================+
| |
| LAYER 5 — AI & AGENT ORCHESTRATION |
| |
| +-------------------+ +----------------+ +-------------------------+ |
| | Intake Pipeline | | Role Router | | LLM Provider | |
| | - decompose | | - planner | | - Claude CLI/SDK | |
| | - auto-map skills | | - executor | | - OpenAI fallback | |
| | - parallel exec | | - verifier | | - role-based routing | |
| +--------+----------+ +-------+--------+ +------------+------------+ |
| | | | |
| +--------v----------+ +-------v--------+ +------------v------------+ |
| | QA Guardian | | claude-code- | | Lighthouse-AI | |
| | - 12 checks | | tresor | | - code analysis | |
| | - auto-enforce | | - agent orch. | | - review automation | |
| +-------------------+ +----------------+ +-------------------------+ |
| |
+========+============+==================+=================+================+
| | | |
| invoke | dispatch | prompt | analyze
v v v v
+===========================================================================+
| |
| LAYER 4 — INTERFACE & CONTROL PLANE |
| |
| +------------------+ +-----+ +-----------------+ +-----------------+ |
| | Dashboard :8200 | | CLI | | Telegram Bot | | MCP Server | |
| | - FastAPI+HTMX | | | | - polling/hook | | - memory tools | |
| | - JWT auth | | claw| | - AI responses | | - workflow tools| |
| | - 17 API routers | | | | - session mgmt | | - Claude Code | |
| +--------+---------+ +--+--+ +--------+--------+ +--------+--------+ |
| | | | | |
+========+===+===============+==============+=====+===============+==========+
| | | | |
| REST | subprocess | | pub/sub | MCP
v v v v v
+===========================================================================+
| |
| LAYER 3 — SKILL & PLUGIN SYSTEM |
| |
| +--------------------+ +-------------------+ +----------------------+ |
| | Skill Registry | | Skill Loader | | SkillHub | |
| | - YAML discovery | | - sandboxed exec | | - git push/pull | |
| | - slash cmd gen | | - env isolation | | - shared repo sync | |
| +--------------------+ | - vault gating | +----------------------+ |
| +-------------------+ |
| +-------------------------------------------------------------------+ |
| | Built-in Skills | |
| | heartbeat | network-scan | ssl-cert-check | docker-monitor | |
| | deploy | github-admin | tunnel-setup | qa-guardian | |
| | infra-fixer | cloudflared-fixer | backup-status | |
| +-------------------------------------------------------------------+ |
| |
+======+==========+==========+==========+===========+==========+============+
| | | | | |
| shell | skill | http | claude | workflow | watcher
v v v v v v
+===========================================================================+
| |
| LAYER 2 — CORE AUTOMATION ENGINE |
| |
| +---------------+ +------------------+ +---------------------------+ |
| | Cron Engine | | Message Gateway | | SSH Fabric | |
| | - APScheduler | | :8100 | | - paramiko pool | |
| | - SQLite jobs | | - 5 channels | | - key/password/agent auth | |
| | - 4 job types | | - profiles | | - connection reuse | |
| +-------+-------+ +--------+---------+ +-------------+-------------+ |
| | | | |
| +-------v-------+ +--------v---------+ +-------------v-------------+ |
| | Workflow Eng. | | Browser Engine | | File Watcher | |
| | - YAML steps | | :8300 | | - watchdog | |
| | - conditions | | - Playwright | | - YAML rules | |
| | - rollback | | - session mgmt | | - trigger actions | |
| | - approvals | | - DOM extraction | +---------------------------+ |
| +---------------+ +------------------+ |
| |
| +------------------+ +-------------------+ +------------------------+ |
| | Memory Store | | Audit Log | | Infra Fixer | |
| | - SQLite WAL | | - append-only | | - 40+ containers | |
| | - search/decay | | - JSONL | | - auto-restart | |
| | - categories | | - searchable | | - health reports | |
| +------------------+ +-------------------+ +------------------------+ |
| |
+======+=================+=================+================================+
| | |
| redis | sqlite | vault
v v v
+===========================================================================+
| |
| LAYER 1 — PLATFORM SERVICES |
| |
| +-----------+ +-----------+ +------------+ +------------------------+|
| | Redis | | SQLite | | Encrypted | | Monitoring ||
| | :6379 | | WAL mode | | Vault | | Prometheus + Grafana ||
| | - pub/sub | | - cron DB | | - age enc | | Loki (logs) ||
| | - cache | | - memory | | - identity | | Wazuh + CrowdSec ||
| | - session | | - audit | | file | | (IDS, threat intel) ||
| +-----------+ +-----------+ +------------+ +------------------------+|
| |
+======+=================+=================+================================+
| | |
| docker | ssh | API
v v v
+===========================================================================+
| |
| LAYER 0 — PHYSICAL & NETWORK INFRASTRUCTURE |
| |
| +--------------------+ +------------------+ +----------------------+ |
| | Proxmox VE | | Docker Hosts | | TrueNAS | |
| | - K3s nodes | | - Compose stacks | | - NFS/SMB shares | |
| | - VM lifecycle | | - 9 projects | | - ZFS snapshots | |
| +--------------------+ | - 40+ containers | +----------------------+ |
| +------------------+ |
| +--------------------+ +------------------+ |
| | UniFi Network | | Cloudflare | |
| | - VLANs | | - tunnels | |
| | - firewall rules | | - DNS | |
| | - APs + switches | | - zero-trust | |
| +--------------------+ +------------------+ |
| |
+===========================================================================+
3. Repo-to-Layer Mapping
| Repository | Layer | Role | Connects To |
|---|---|---|---|
| claude-superpowers | L2, L3, L4, L5 | Core platform. Houses the automation engine (L2), skill system (L3), all user interfaces (L4), and AI orchestration (L5). The central nervous system. | L1 via Redis, SQLite, Vault. L0 via SSH Fabric and Docker API. |
| k3s-cluster | L0 | Infrastructure-as-code for the Kubernetes substrate. Ansible playbooks for Proxmox VM provisioning, Terraform for K3s bootstrap, Flux for GitOps. | L1 receives metrics via Prometheus. L2 manages via SSH Fabric. |
| homelab | L0, L1 | Docker Compose definitions for host-level services spanning infrastructure (L0) and platform services (L1): monitoring stack, security stack, Gitea, Guacamole, Frigate, Keycloak. | L2 monitors via Infra Fixer. L4 displays in Dashboard. |
| home_media | L0 | Media automation stack: Plex, Jellyfin, Sonarr, Radarr, Prowlarr, qBittorrent. Runs as Docker Compose on a dedicated host. | L2 monitors via Infra Fixer. L2 File Watcher triggers on new downloads. |
| claude-code-tresor | L5 | Agent orchestration framework. Provides multi-agent coordination patterns for complex reasoning tasks. | L4 via MCP. L3 by invoking skills. L2 via LLM Provider. |
| claude-code-skill-factory | L3 | Skill creation factory with templates, validation, and publishing workflows. Feeds the SkillHub. | L3 SkillHub for distribution. L5 Intake for auto-installation. |
| agent-os | L5 | Agent operating system abstractions: persistent processes, memory, tool access. Provides runtime for long-lived agents. | L2 Memory Store. L1 Redis for state. L3 Skills for actions. |
| Lighthouse-AI | L5 | AI-powered code analysis, review automation, and quality scoring. | L5 QA Guardian integration. L4 Dashboard reporting. |
| dotfiles | L0 | Developer environment setup: shell configs, tool installation, SSH config, editor settings. Bootstraps new hosts. | L0 provisions the environment that all layers run in. |
| design-os | -- | Fork. Design system reference. Not integrated into the automation stack. | -- |
| Smoke | -- | Fork. Testing framework reference. Not integrated into the automation stack. | -- |
4. Data Flow
4.1 Command Flow (User-Initiated)
User
|
+--[CLI: claw cron add ...]---------> Cron Engine ----> jobs.json + SQLite (scheduler.db)
| |
+--[Dashboard: POST /api/msg/send]--> Msg Gateway ----> Channel Adapter ---> Slack/Telegram/...
| |
+--[Telegram: "!scan network"]------> Inbound -------> Intake Pipeline
| Listener |
+--[CLI: claw skill run deploy]-----> Skill Loader ---> Subprocess (sandboxed)
| |
+--[MCP: memory.remember]-----------> MCP Server -----> Memory Store ----> SQLite
4.2 Scheduled Execution Flow (Cron-Initiated)
APScheduler timer fires
|
v
CronEngine._execute_job(job_id)
|
+--[shell]----> subprocess.run() ----> stdout/stderr
|
+--[claude]---> LLMProvider.invoke() -> Claude/OpenAI API -> response text
|
+--[webhook]--> urllib POST ----------> remote HTTP endpoint -> response
|
+--[skill]----> SkillRegistry.get() --> SkillLoader.run() --> subprocess
|
v
_route_output()
|
+--[file]-----> write to ~/.claude-superpowers/cron/output/{id}/{ts}.log
|
+--[channel]--> ChannelRegistry.get() -> adapter.send() -> Slack/Telegram/...
|
+--[profile]--> ProfileManager.send() -> fan-out to N channel+target combos
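The dispatch-then-route shape of this flow can be sketched with two lookup tables, one keyed by job type and one keyed by output target. The `Job` dataclass and the `execute_job` function are hypothetical stand-ins for illustration and are not the signatures of `CronEngine._execute_job` or `_route_output`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Job:
    id: str
    type: str             # "shell", "claude", "webhook", or "skill"
    payload: str          # command, prompt, URL, or skill name, depending on type
    output: str = "file"  # "file", "channel", or "profile"


Handler = Callable[[Job], str]
Route = Callable[[Job, str], None]


def execute_job(job: Job, handlers: Dict[str, Handler], routes: Dict[str, Route]) -> str:
    """Dispatch on job type, then route the result to its output target."""
    if job.type not in handlers:
        raise ValueError(f"unknown job type: {job.type}")
    result = handlers[job.type](job)
    route = routes.get(job.output)
    if route is not None:
        try:
            route(job, result)
        except Exception:
            # Assumed policy, following the document's error-handling notes:
            # a delivery failure must not abort the job itself.
            pass
    return result


if __name__ == "__main__":
    delivered = []
    handlers = {"shell": lambda job: f"ran: {job.payload}"}
    routes = {"file": lambda job, text: delivered.append((job.id, text))}
    execute_job(Job(id="nightly", type="shell", payload="uptime"), handlers, routes)
    print(delivered)
```

Keeping handlers and routes in dictionaries means a new job type or output target is a table entry rather than another branch in the executor.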
4.3 Event-Driven Flow (Watcher/Inbound)
Filesystem event (watchdog) Inbound message (Telegram)
| |
v v
WatcherEngine._on_event() InboundListener.on_message()
| |
+-- pattern match against rules +-- pattern match (!command)
| |
v v
Action dispatch: Intake Pipeline:
shell -> subprocess extract_requirements()
skill -> SkillLoader build_plan()
move -> shutil.move auto_install() for each task
copy -> shutil.copy2 ThreadPoolExecutor.submit()
workflow -> WorkflowEngine |
v
SkillLoader.run_sandboxed()
|
v
Response -> Telegram reply
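The inbound "pattern match (!command)" step can be sketched with the standard library's `shlex` module. The function name `parse_command` and the choice to treat malformed quoting as ordinary text are assumptions for illustration, not behavior confirmed by the source.

```python
import shlex
from typing import List, Optional, Tuple


def parse_command(text: str, prefix: str = "!") -> Optional[Tuple[str, List[str]]]:
    """Return (command, args) for messages like '!scan network', else None."""
    text = text.strip()
    if not text.startswith(prefix):
        return None
    try:
        # shlex honors quotes, so '!say "hello world"' yields a single argument.
        parts = shlex.split(text[len(prefix):])
    except ValueError:
        # Unbalanced quotes: treat the message as plain text, not a command.
        return None
    if not parts:
        return None
    return parts[0].lower(), parts[1:]


if __name__ == "__main__":
    for message in ("!scan network", "good morning", '!say "hello world"'):
        print(repr(message), "->", parse_command(message))
```

Messages that return `None` would fall through to whatever non-command handling the listener provides, such as a conversational AI reply.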
4.4 Metrics & Observability Flow
All Services
|
+--[Prometheus /metrics]-----> Prometheus -----> Grafana dashboards
|
+--[Structured logs]--------> Loki ------------> Grafana log queries
|
+--[Audit log entries]------> audit.log (JSONL)
| |
| +---> Dashboard /api/audit/tail
| +---> CLI: claw audit search
|
+--[Health checks]----------> Dashboard /api/status
| |
| +---> Cron daemon status
| +---> Redis connectivity
| +---> Browser engine sessions
| +---> Message gateway channels
| +---> Memory store stats
|
+--[Security events]--------> Wazuh -----------> Alerts
| CrowdSec ---------> IP blocklists
|
+--[Infra Fixer reports]----> JSON reports -----> Telegram notifications
|
+---> Dashboard /api/status
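The audit branch of this flow (append-only JSONL, later read by a tail or search endpoint) can be sketched as follows. The function names `append_audit` and `tail_audit` and the entry fields `ts` and `action` are assumptions. The actual schema in `superpowers/audit.py` is not shown in this document.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def append_audit(log_path: Path, action: str, **fields) -> dict:
    """Append one JSON object per line; existing lines are never rewritten."""
    entry = {"ts": time.time(), "action": action, **fields}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def tail_audit(log_path: Path, limit: int = 20, action: Optional[str] = None) -> List[dict]:
    """Return the last `limit` entries, optionally filtered by action."""
    if not log_path.exists():
        return []
    entries = []
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if action is None or entry.get("action") == action:
                entries.append(entry)
    return entries[-limit:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "audit.log"
        append_audit(log, "cron.run", job_id="nightly")
        append_audit(log, "msg.send", channel="slack")
        print(tail_audit(log, action="cron.run"))
```

This sketch reads the whole file on every query, which is adequate for small logs. A larger log would call for seeking from the end of the file or an index.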
5. Integration Points
5.1 Inter-Layer Communication Matrix
| From | To | Protocol | Mechanism | Example |
|---|---|---|---|---|
| L4 (Dashboard) | L2 (Cron Engine) | In-process | Python import, direct method call | CronEngine().list_jobs() |
| L4 (Dashboard) | L2 (Browser Engine) | HTTP REST | httpx to http://browser-engine:8300 | POST /screenshot |
| L4 (Telegram Bot) | L2 (Msg Gateway) | In-process | InboundListener -> ChannelRegistry | Pattern-matched inbound dispatch |
| L4 (CLI) | L2 (All engines) | In-process | Click command -> Python module | claw cron list -> CronEngine.list_jobs() |
| L4 (MCP Server) | L2 (Memory Store) | In-process | MCP tool handler -> MemoryStore | memory.remember tool call |
| L3 (Skills) | L2 (SSH Fabric) | In-process | Skill script imports ssh_fabric | ConnectionPool.get_client("proxmox") |
| L3 (Skills) | L1 (Vault) | In-process | Vault.get() (gated by skill.yaml perms) | Retrieve SSH password for host |
| L2 (Cron Engine) | L1 (SQLite) | File I/O | SQLAlchemy job store, JSON job persistence | scheduler.db, jobs.json |
| L2 (Cron Engine) | L2 (Msg Gateway) | In-process | _send_to_channel() -> ChannelRegistry | Job output to Slack profile |
| L2 (Workflow) | L2 (All engines) | In-process | Step dispatcher calls engine methods | shell -> subprocess, skill -> SkillLoader |
| L2 (Msg Gateway) | L1 (Redis) | Redis protocol | Pub/sub for message bus | Inbound message fanout |
| L2 (SSH Fabric) | L0 (Hosts) | SSH | Paramiko TCP connection | executor.run("proxmox", "qm list") |
| L2 (Infra Fixer) | L0 (Docker) | Subprocess | docker ps, docker inspect, docker restart | Container health scan |
| L1 (Redis) | -- | TCP 6379 | Docker network (default) | All services on same Compose network |
5.2 Docker Compose Service Topology
+------------------+
| npm_default | (external network:
| network | reverse proxy)
+--------+---------+
|
+----------------------------+-----------------------------------+
| Docker Compose: claude-superpowers |
| |
| +-----------+ +--------------+ +------------------+ |
| | redis | | msg-gateway | | dashboard | |
| | :6379 |<----+ :8100 | | :8200 | |
| | | | | | +--+--> npm_default
| +-----+-----+ +--------------+ +--------+---------+ |
| | | |
| | +--------------+ | |
| +---------->| browser-eng |<-------------+ |
| | | :8300 | |
| | +--------------+ |
| | |
| | +--------------+ |
| +---------->| telegram-bot | |
| | (no port) | |
| +--------------+ |
| |
+----------------------------------------------------------------+
5.3 Authentication & Security Boundaries
| Boundary | Mechanism | Details |
|---|---|---|
| Dashboard API | JWT (HS256) | DASHBOARD_USER/DASHBOARD_PASS credentials. Token issued via /auth/login, validated on every /api/* route. Insecure default detection. |
| Telegram Bot | Chat ID allowlist | ALLOWED_CHAT_IDS env var. Unrecognized chats trigger admin approval flow. |
| Telegram Webhook | Secret token header | X-Telegram-Bot-Api-Secret-Token validated fail-closed. |
| Vault | age encryption | Identity file (age-identity.txt) required for decrypt. No network exposure. |
| SSH Fabric | Key/password/agent | Credentials from vault or SSH agent. Connection pool with max-age expiry. |
| Skill Sandbox | Env stripping | run_sandboxed() strips environment, only passes vault access if skill.yaml grants vault: true. |
| Webhooks | Signature validation | WEBHOOK_REQUIRE_SIGNATURE=true (fail-closed default). |
| Rate limiting | Per-IP and per-user | Configurable via RATE_LIMIT_PER_IP (60/min) and RATE_LIMIT_PER_USER (120/min). |
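The fail-closed behavior listed for the Telegram webhook boundary can be sketched as below. The header name is the one in the table. The function name `is_authorized` and the choice to reject requests when no secret is configured are assumptions about what "fail-closed" means here.

```python
import hmac
from typing import Mapping, Optional

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def is_authorized(headers: Mapping[str, str], expected_secret: Optional[str]) -> bool:
    """Fail closed: reject when the secret is unconfigured or the header is missing."""
    if not expected_secret:
        # No configured secret means no request can be trusted.
        return False
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(SECRET_HEADER.lower())
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_secret.encode("utf-8"))


if __name__ == "__main__":
    print(is_authorized({SECRET_HEADER: "s3cret"}, "s3cret"))
    print(is_authorized({}, "s3cret"))
    print(is_authorized({SECRET_HEADER: "s3cret"}, None))
```

The fail-open alternative, which accepts requests when no secret is set, is easier to deploy but turns a missing environment variable into an open endpoint.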
5.4 Persistent Storage Map
~/.claude-superpowers/ (SUPERPOWERS_DATA_DIR)
|
+-- vault.enc age-encrypted credential store
+-- age-identity.txt vault decryption identity
+-- memory.db SQLite memory store (WAL mode)
+-- audit.log append-only JSONL audit trail
+-- profiles.yaml notification profile definitions
+-- watchers.yaml file watcher rule definitions
|
+-- cron/
| +-- jobs.json cron job definitions (atomic write)
| +-- scheduler.db APScheduler SQLite job store
| +-- cron-daemon.pid daemon PID file
| +-- output/{job-id}/{ts}.log per-execution output logs
|
+-- browser/
| +-- profiles/{name}/ Chromium user data dirs (cookies, localStorage)
|
+-- skills/ user-installed skills
+-- workflows/ user-defined workflow YAML files
+-- runtime/ intake pipeline session state
+-- ssh/ SSH host registry
+-- logs/ application logs
+-- msg/ message queue persistence
6. Cross-Cutting Concerns
6.1 Configuration Resolution Order
Every service follows the same configuration precedence:
- Environment variable (highest priority)
- Encrypted vault (vault.enc via Vault.get())
- .env file (loaded once at startup, never overrides existing env vars)
- Code defaults (lowest priority)
This is implemented in Settings.load() via the _secret() helper function, which tries the environment variable first and then falls back to the vault.
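The four-level precedence can be sketched as a single lookup function. This is a standalone illustration, not the body of `Settings.load()` or `_secret()`: the name `resolve`, its parameters, and the idea of passing the parsed `.env` contents as a mapping are all assumptions.

```python
import os
from typing import Callable, Mapping, Optional


def resolve(
    key: str,
    vault_get: Callable[[str], Optional[str]],
    dotenv: Mapping[str, str],
    default: str = "",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve a setting: environment, then vault, then .env values, then default."""
    env = os.environ if environ is None else environ
    if env.get(key):
        return env[key]
    try:
        value = vault_get(key)
    except Exception:
        # The vault is optional: any failure is treated as "no value".
        value = ""
    if value:
        return value
    if dotenv.get(key):
        return dotenv[key]
    return default


if __name__ == "__main__":
    vault = {"SLACK_TOKEN": "from-vault"}
    print(resolve("SLACK_TOKEN", vault.get, {"SLACK_TOKEN": "from-dotenv"}, environ={}))
    print(resolve("REDIS_URL", vault.get, {}, default="redis://localhost:6379", environ={}))
```

Passing `environ` explicitly makes the precedence testable without mutating the real process environment.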
6.2 Error Handling Philosophy
- Messaging failures never break automation: _send_to_channel() catches ChannelError silently. A failed Slack notification does not abort a cron job.
- Vault is optional: Settings._vault_get() returns empty string on any exception. Services degrade gracefully without vault access.
- SSH connections are lazy: ConnectionPool.get_client() only connects on first use. Dead connections are detected and reconnected transparently.
- Skill execution is isolated: run_sandboxed() strips the environment. A skill crash does not affect the host process.
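The lazy-connection behavior described in the list above, combined with the max-age expiry and liveness checks mentioned for the SSH Fabric, can be sketched generically. `LazyPool` is a hypothetical class, not the platform's `ConnectionPool`: the connect and liveness callables stand in for Paramiko calls, and the injectable clock exists only to make expiry easy to exercise.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class LazyPool(Generic[T]):
    """Create connections on first use; replace them when stale or dead."""

    def __init__(
        self,
        connect: Callable[[str], T],
        is_alive: Callable[[T], bool],
        max_age: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._is_alive = is_alive
        self._max_age = max_age
        self._clock = clock
        self._entries: Dict[str, Tuple[T, float]] = {}

    def get_client(self, host: str) -> T:
        entry = self._entries.get(host)
        if entry is not None:
            client, created = entry
            fresh = self._clock() - created < self._max_age
            if fresh and self._is_alive(client):
                return client
            # Stale or dead: fall through and reconnect. A fuller version
            # would also close the old client here.
        client = self._connect(host)
        self._entries[host] = (client, self._clock())
        return client


if __name__ == "__main__":
    pool = LazyPool(connect=lambda host: f"conn-to-{host}", is_alive=lambda c: True)
    print(pool.get_client("proxmox"))
```

Callers never see the reconnect: they ask for a client by host name and the pool decides whether the cached one is still usable.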
6.3 Deployment Modes
| Mode | Stack | Use Case |
|---|---|---|
| Full Docker | All 5 services via docker-compose up | Production on Docker host |
| CLI-only | pip install -e . + claw commands | Developer workstation, quick automation |
| Hybrid | CLI locally + Redis/browser-engine in Docker | Development with browser automation |
| K3s | Helm charts (future) via k3s-cluster Flux | Production Kubernetes deployment |
7. Evolution Roadmap
The layered model enables targeted investment at each level:
| Layer | Current State | Next Milestone |
|---|---|---|
| L0 Infrastructure | Manual provisioning, Ansible scripts | Full IaC with Terraform state in Git, automated VM lifecycle |
| L1 Platform | Redis + SQLite, monitoring via separate stack | Unified Prometheus federation, Keycloak SSO integration |
| L2 Engine | All 8 engines operational (982 tests) | Event sourcing for audit trail, WebSocket push for real-time updates |
| L3 Skills | 11 built-in skills, SkillHub sync | Skill marketplace, versioned skill dependencies, skill composition |
| L4 Interfaces | Dashboard, CLI, Telegram, MCP | Mobile-responsive dashboard, Discord bot, webhook API for external integrations |
| L5 AI | Intake pipeline, LLM fallback, QA Guardian | Autonomous agent loops, planning with memory, self-healing infrastructure |