Benchmark Baseline: Orchestration Performance

Established: 2026-03-06

Purpose

This document records the initial performance baseline for the orchestration subsystem. All future benchmark runs are compared against these thresholds to detect regressions.

Default Thresholds

Scenario	Metric	Threshold	Direction
dag_throughput	duration_ms	10000.0	lower is better
dag_throughput	memory_peak_mb	100.0	lower is better
policy_evaluation	duration_ms	5000.0	lower is better
policy_evaluation	ops_per_sec	100.0	higher is better
orchestration_e2e	duration_ms	5000.0	lower is better
report_generation	duration_ms	5000.0	lower is better
report_generation	ops_per_sec	10.0	higher is better
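
The comparison implied by the Direction column can be sketched as follows. This is a minimal illustration, not the tool's actual checker: the Threshold class, the find_regressions helper, and the shape of the results dictionary are hypothetical, and it assumes a value exactly equal to its threshold still passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    scenario: str
    metric: str
    limit: float
    lower_is_better: bool


# Values copied from the Default Thresholds table.
THRESHOLDS = [
    Threshold("dag_throughput", "duration_ms", 10000.0, True),
    Threshold("dag_throughput", "memory_peak_mb", 100.0, True),
    Threshold("policy_evaluation", "duration_ms", 5000.0, True),
    Threshold("policy_evaluation", "ops_per_sec", 100.0, False),
    Threshold("orchestration_e2e", "duration_ms", 5000.0, True),
    Threshold("report_generation", "duration_ms", 5000.0, True),
    Threshold("report_generation", "ops_per_sec", 10.0, False),
]


def find_regressions(results):
    """Return (scenario, metric, value, limit) for every violated threshold.

    `results` is assumed to look like {scenario: {metric: value}}.
    """
    violations = []
    for t in THRESHOLDS:
        value = results.get(t.scenario, {}).get(t.metric)
        if value is None:
            continue  # metric not measured in this run
        # Assumption: a value equal to the limit counts as passing.
        ok = value <= t.limit if t.lower_is_better else value >= t.limit
        if not ok:
            violations.append((t.scenario, t.metric, value, t.limit))
    return violations
```

For example, a run that reports policy_evaluation at 6000.0 ms and 150.0 ops/sec would be flagged only for duration_ms, because the throughput figure is above its "higher is better" floor.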

Scenarios

dag_throughput

A fan-out DAG: one root node, 50 parallel tasks that each depend on the root, and a single sink node that collects their results. Measures thread pool scheduling overhead and dependency resolution speed.
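
The shape of this scenario can be approximated with the standard library alone. The sketch below is an assumption about how such a measurement might look, not the benchmark's real code: run_fan_out and the worker count of 8 are hypothetical, and tracemalloc only tracks memory allocated by Python, so the actual harness may report peak memory differently.

```python
import time
import tracemalloc
from concurrent.futures import ThreadPoolExecutor


def run_fan_out(width=50, max_workers=8):
    """Run root -> `width` parallel tasks -> sink.

    Returns (sink_result, duration_ms, peak_mb).
    """
    tracemalloc.start()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The root has to finish before any dependent task is submitted.
        root = pool.submit(lambda: 1).result()
        # Fan out: every task depends only on the root's result.
        middle = [pool.submit(lambda i=i: root + i) for i in range(width)]
        # The sink blocks until every parallel task has completed.
        sink = sum(f.result() for f in middle)
    duration_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return sink, duration_ms, peak_bytes / (1024 * 1024)
```

Because the tasks here do almost no work, the measured time is dominated by submission, scheduling, and result collection, which is the overhead this scenario is meant to expose.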

policy_evaluation

1000 iterations of PolicyEngine.check_command() across 10 different command strings (mix of allowed, denied, and approval-required).
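
A timing loop of this kind might look like the sketch below. StubPolicyEngine, its rules, its return values, and the command list are all hypothetical stand-ins, since the real PolicyEngine rules are not described here. The sketch also assumes that each of the 1000 iterations checks all 10 commands; the benchmark may instead count 1000 calls in total.

```python
import time


class StubPolicyEngine:
    """Hypothetical stand-in for PolicyEngine; real rules are not shown here."""

    DENIED_PREFIXES = ("rm -rf",)
    APPROVAL_PREFIXES = ("sudo",)

    def check_command(self, command):
        if command.startswith(self.DENIED_PREFIXES):
            return "deny"
        if command.startswith(self.APPROVAL_PREFIXES):
            return "approval_required"
        return "allow"


# Ten illustrative commands: six allowed, two denied, two needing approval.
COMMANDS = [
    "ls -la", "git status", "cat README.md", "echo hello", "pwd",
    "python --version", "rm -rf /", "rm -rf ~", "sudo apt update", "sudo reboot",
]


def benchmark_policy(engine, iterations=1000):
    """Time repeated check_command() calls and tally the verdicts."""
    counts = {}
    start = time.perf_counter()
    for _ in range(iterations):
        for command in COMMANDS:
            verdict = engine.check_command(command)
            counts[verdict] = counts.get(verdict, 0) + 1
    elapsed = time.perf_counter() - start
    total_ops = iterations * len(COMMANDS)
    return {
        "duration_ms": elapsed * 1000.0,
        "ops_per_sec": total_ops / elapsed if elapsed > 0 else float("inf"),
        "counts": counts,
    }
```

Tallying the verdicts alongside the timing is a cheap sanity check that the loop exercises all three outcomes rather than only the fast path.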

orchestration_e2e

5-step sequential DAG pipeline executed 3 times. Measures end-to-end orchestration overhead.

report_generation

50 iterations of JSON + Markdown rendering for a report with 5 sections and 50 items.
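
The rendering workload can be sketched roughly as below. The report structure, build_report, and render_markdown are hypothetical, and the sketch assumes the 50 items are spread evenly across the 5 sections (10 each); the real report layout may differ.

```python
import json
import time


def build_report(sections=5, items_per_section=10):
    """Build a synthetic report: 5 sections x 10 items = 50 items (assumed split)."""
    return {
        "title": "Benchmark report",
        "sections": [
            {
                "heading": f"Section {s + 1}",
                "items": [
                    f"item {s * items_per_section + i + 1}"
                    for i in range(items_per_section)
                ],
            }
            for s in range(sections)
        ],
    }


def render_markdown(report):
    """Render the report as a title, section headings, and bullet lists."""
    lines = [f"# {report['title']}", ""]
    for section in report["sections"]:
        lines.append(f"## {section['heading']}")
        lines.extend(f"- {item}" for item in section["items"])
        lines.append("")
    return "\n".join(lines)


def benchmark_rendering(iterations=50):
    """Render JSON and Markdown `iterations` times; return metrics and last outputs."""
    report = build_report()
    as_json = as_markdown = ""
    start = time.perf_counter()
    for _ in range(iterations):
        as_json = json.dumps(report, indent=2)
        as_markdown = render_markdown(report)
    elapsed = time.perf_counter() - start
    metrics = {
        "duration_ms": elapsed * 1000.0,
        "ops_per_sec": iterations / elapsed if elapsed > 0 else float("inf"),
    }
    return metrics, as_json, as_markdown
```

Here one operation means one JSON render plus one Markdown render, so ops_per_sec is counted per iteration rather than per output format.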

How to Run

claw benchmark run              # run all and save
claw benchmark run --scenario dag_throughput   # single scenario
claw benchmark report           # show last results
claw benchmark list             # list scenarios

Initial Baseline Results

No results have been recorded yet. To populate this section with actual numbers, run the command claw benchmark run. Results are saved to ~/.claude-superpowers/benchmarks/.

Scenario	Duration (ms)	Ops/sec	Peak Memory (MB)
(run benchmarks to fill)