oh-my-openagent/src/agents/prometheus/plan-template.ts

/**
 * Prometheus Plan Template
 *
 * The markdown template structure for work plans generated by Prometheus.
 * Includes TL;DR, context, objectives, verification strategy, TODOs, and success criteria.
 */

export const PROMETHEUS_PLAN_TEMPLATE = `## Plan Structure

Generate plan to: \`.sisyphus/plans/{name}.md\`

\`\`\`markdown
# {Plan Title}

## TL;DR

> **Quick Summary**: [1-2 sentences capturing the core objective and approach]
>
> **Deliverables**: [Bullet list of concrete outputs]
> - [Output 1]
> - [Output 2]
>
> **Estimated Effort**: [Quick | Short | Medium | Large | XL]
> **Parallel Execution**: [YES - N waves | NO - sequential]
> **Critical Path**: [Task X → Task Y → Task Z]

---

## Context

### Original Request
[User's initial description]

### Interview Summary
**Key Discussions**:
- [Point 1]: [User's decision/preference]
- [Point 2]: [Agreed approach]

**Research Findings**:
- [Finding 1]: [Implication]
- [Finding 2]: [Recommendation]

### Metis Review
**Identified Gaps** (addressed):
- [Gap 1]: [How resolved]
- [Gap 2]: [How resolved]

---

## Work Objectives

### Core Objective
[1-2 sentences: what we're achieving]

### Concrete Deliverables
- [Exact file/endpoint/feature]

### Definition of Done
- [ ] [Verifiable condition with command]

### Must Have
- [Non-negotiable requirement]

### Must NOT Have (Guardrails)
- [Explicit exclusion from Metis review]
- [AI slop pattern to avoid]
- [Scope boundary]

---

## Verification Strategy (MANDATORY)

> **ZERO HUMAN INTERVENTION** — ALL verification is agent-executed. No exceptions.
> Acceptance criteria requiring "user manually tests/confirms" are FORBIDDEN.

### Test Decision
- **Infrastructure exists**: [YES/NO]
- **Automated tests**: [TDD / Tests-after / None]
- **Framework**: [bun test / vitest / jest / pytest / none]
- **If TDD**: Each task follows RED (failing test) → GREEN (minimal impl) → REFACTOR

### QA Policy
Every task MUST include agent-executed QA scenarios (see TODO template below).
Evidence saved to \`.sisyphus/evidence/task-{N}-{scenario-slug}.{ext}\`.

- **Frontend/UI**: Use Playwright (playwright skill) — Navigate, interact, assert DOM, screenshot
- **TUI/CLI**: Use interactive_bash (tmux) — Run command, send keystrokes, validate output
- **API/Backend**: Use Bash (curl) — Send requests, assert status + response fields
- **Library/Module**: Use Bash (bun/node REPL) — Import, call functions, compare output

---

## Execution Strategy

### Parallel Execution Waves

> Maximize throughput by grouping independent tasks into parallel waves.
> Each wave completes before the next begins.
> Target: 5-8 tasks per wave. Fewer than 3 per wave (except final) = under-splitting.

\`\`\`
Wave 1 (Start Immediately — foundation + scaffolding):
├── Task 1: Project scaffolding + config [quick]
├── Task 2: Design system tokens [quick]
├── Task 3: Type definitions [quick]
├── Task 4: Schema definitions [quick]
├── Task 5: Storage interface + in-memory impl [quick]
├── Task 6: Auth middleware [quick]
└── Task 7: Client module [quick]

Wave 2 (After Wave 1 — core modules, MAX PARALLEL):
├── Task 8: Core business logic (depends: 3, 5, 7) [deep]
├── Task 9: API endpoints (depends: 4, 5) [unspecified-high]
├── Task 10: Secondary storage impl (depends: 5) [unspecified-high]
├── Task 11: Retry/fallback logic (depends: 8) [deep]
├── Task 12: UI layout + navigation (depends: 2) [visual-engineering]
├── Task 13: API client + hooks (depends: 4) [quick]
└── Task 14: Telemetry middleware (depends: 5, 10) [unspecified-high]

Wave 3 (After Wave 2 — integration + UI):
├── Task 15: Main route combining modules (depends: 6, 11, 14) [deep]
├── Task 16: UI data visualization (depends: 12, 13) [visual-engineering]
├── Task 17: Deployment config A (depends: 15) [quick]
├── Task 18: Deployment config B (depends: 15) [quick]
├── Task 19: Deployment config C (depends: 15) [quick]
└── Task 20: UI request log + build (depends: 16) [visual-engineering]

Wave 4 (After Wave 3 — verification):
├── Task 21: Integration tests (depends: 15) [deep]
├── Task 22: UI QA - Playwright (depends: 20) [unspecified-high]
├── Task 23: E2E QA (depends: 21) [deep]
└── Task 24: Git cleanup + tagging (depends: 21) [git]

Wave FINAL (After ALL tasks — independent review, 4 parallel):
├── Task F1: Plan compliance audit (oracle)
├── Task F2: Code quality review (unspecified-high)
├── Task F3: Real manual QA (unspecified-high)
└── Task F4: Scope fidelity check (deep)

Critical Path: Task 1 → Task 5 → Task 8 → Task 11 → Task 15 → Task 21 → F1-F4
Parallel Speedup: ~70% faster than sequential
Max Concurrent: 7 (Waves 1 & 2)
\`\`\`

### Dependency Matrix (abbreviated — show ALL tasks in your generated plan)

- **1-7**: — — 8-14, 1
- **8**: 3, 5, 7 — 11, 15, 2
- **11**: 8 — 15, 2
- **14**: 5, 10 — 15, 2
- **15**: 6, 11, 14 — 17-19, 21, 3
- **21**: 15 — 23, 24, 4

> This is abbreviated for reference. YOUR generated plan must include the FULL matrix for ALL tasks.

### Agent Dispatch Summary

- **1**: **7** — T1-T4 → \`quick\`, T5 → \`quick\`, T6 → \`quick\`, T7 → \`quick\`
- **2**: **7** — T8 → \`deep\`, T9 → \`unspecified-high\`, T10 → \`unspecified-high\`, T11 → \`deep\`, T12 → \`visual-engineering\`, T13 → \`quick\`, T14 → \`unspecified-high\`
- **3**: **6** — T15 → \`deep\`, T16 → \`visual-engineering\`, T17-T19 → \`quick\`, T20 → \`visual-engineering\`
- **4**: **4** — T21 → \`deep\`, T22 → \`unspecified-high\`, T23 → \`deep\`, T24 → \`git\`
- **FINAL**: **4** — F1 → \`oracle\`, F2 → \`unspecified-high\`, F3 → \`unspecified-high\`, F4 → \`deep\`

---

## TODOs

> Implementation + Test = ONE Task. Never separate.
> EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.
> **A task WITHOUT QA Scenarios is INCOMPLETE. No exceptions.**

- [ ] 1. [Task Title]

  **What to do**:
  - [Clear implementation steps]
  - [Test cases to cover]

  **Must NOT do**:
  - [Specific exclusions from guardrails]

  **Recommended Agent Profile**:
  > Select category + skills based on task domain. Justify each choice.
  - **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\`
    - Reason: [Why this category fits the task domain]
  - **Skills**: [\`skill-1\`, \`skill-2\`]
    - \`skill-1\`: [Why needed - domain overlap explanation]
    - \`skill-2\`: [Why needed - domain overlap explanation]
  - **Skills Evaluated but Omitted**:
    - \`omitted-skill\`: [Why domain doesn't overlap]

  **Parallelization**:
  - **Can Run In Parallel**: YES | NO
  - **Parallel Group**: Wave N (with Tasks X, Y) | Sequential
  - **Blocks**: [Tasks that depend on this task completing]
  - **Blocked By**: [Tasks this depends on] | None (can start immediately)

  **References** (CRITICAL - Be Exhaustive):

  > The executor has NO context from your interview. References are their ONLY guide.
  > Each reference must answer: "What should I look at and WHY?"

  **Pattern References** (existing code to follow):
  - \`src/services/auth.ts:45-78\` - Authentication flow pattern (JWT creation, refresh token handling)

  **API/Type References** (contracts to implement against):
  - \`src/types/user.ts:UserDTO\` - Response shape for user endpoints

  **Test References** (testing patterns to follow):
  - \`src/__tests__/auth.test.ts:describe("login")\` - Test structure and mocking patterns

  **External References** (libraries and frameworks):
  - Official docs: \`https://zod.dev/?id=basic-usage\` - Zod validation syntax

  **WHY Each Reference Matters** (explain the relevance):
  - Don't just list files - explain what pattern/information the executor should extract
  - Bad: \`src/utils.ts\` (vague, which utils? why?)
  - Good: \`src/utils/validation.ts:sanitizeInput()\` - Use this sanitization pattern for user input

  **Acceptance Criteria**:

  > **AGENT-EXECUTABLE VERIFICATION ONLY** — No human action permitted.
  > Every criterion MUST be verifiable by running a command or using a tool.

  **If TDD (tests enabled):**
  - [ ] Test file created: src/auth/login.test.ts
  - [ ] bun test src/auth/login.test.ts → PASS (3 tests, 0 failures)

  **QA Scenarios (MANDATORY — task is INCOMPLETE without these):**

  > **This is NOT optional. A task without QA scenarios WILL BE REJECTED.**
  >
  > Write scenario tests that verify the ACTUAL BEHAVIOR of what you built.
  > Minimum: 1 happy path + 1 failure/edge case per task.
  > Each scenario = exact tool + exact steps + exact assertions + evidence path.
  >
  > **The executing agent MUST run these scenarios after implementation.**
  > **The orchestrator WILL verify evidence files exist before marking task complete.**

  \\\`\\\`\\\`
  Scenario: [Happy path — what SHOULD work]
    Tool: [Playwright / interactive_bash / Bash (curl)]
    Preconditions: [Exact setup state]
    Steps:
      1. [Exact action — specific command/selector/endpoint, no vagueness]
      2. [Next action — with expected intermediate state]
      3. [Assertion — exact expected value, not "verify it works"]
    Expected Result: [Concrete, observable, binary pass/fail]
    Failure Indicators: [What specifically would mean this failed]
    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}

  Scenario: [Failure/edge case — what SHOULD fail gracefully]
    Tool: [same format]
    Preconditions: [Invalid input / missing dependency / error state]
    Steps:
      1. [Trigger the error condition]
      2. [Assert error is handled correctly]
    Expected Result: [Graceful failure with correct error message/code]
    Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}-error.{ext}
  \\\`\\\`\\\`

  > **Specificity requirements — every scenario MUST use:**
  > - **Selectors**: Specific CSS selectors (\`.login-button\`, not "the login button")
  > - **Data**: Concrete test data (\`"test@example.com"\`, not \`"[email]"\`)
  > - **Assertions**: Exact values (\`text contains "Welcome back"\`, not "verify it works")
  > - **Timing**: Wait conditions where relevant (\`timeout: 10s\`)
  > - **Negative**: At least ONE failure/error scenario per task
  >
  > **Anti-patterns (your scenario is INVALID if it looks like this):**
  > - ❌ "Verify it works correctly" — HOW? What does "correctly" mean?
  > - ❌ "Check the API returns data" — WHAT data? What fields? What values?
  > - ❌ "Test the component renders" — WHERE? What selector? What content?
  > - ❌ Any scenario without an evidence path

  **Evidence to Capture:**
  - [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}
  - [ ] Screenshots for UI, terminal output for CLI, response bodies for API

  **Commit**: YES | NO (groups with N)
  - Message: \`type(scope): desc\`
  - Files: \`path/to/file\`
  - Pre-commit: \`test command\`

---

## Final Verification Wave (MANDATORY — after ALL implementation tasks)

> 4 review agents run in PARALLEL. ALL must APPROVE. Rejection → fix → re-run.

- [ ] F1. **Plan Compliance Audit** — \`oracle\`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, curl endpoint, run command). For each "Must NOT Have": search codebase for forbidden patterns — reject with file:line if found. Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan.
  Output: \`Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT\`

- [ ] F2. **Code Quality Review** — \`unspecified-high\`
  Run \`tsc --noEmit\` + linter + \`bun test\`. Review all changed files for: \`as any\`/\`@ts-ignore\`, empty catches, console.log in prod, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic names (data/result/item/temp).
  Output: \`Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT\`

- [ ] F3. **Real Manual QA** — \`unspecified-high\` (+ \`playwright\` skill if UI)
  Start from clean state. Execute EVERY QA scenario from EVERY task — follow exact steps, capture evidence. Test cross-task integration (features working together, not isolation). Test edge cases: empty state, invalid input, rapid actions. Save to \`.sisyphus/evidence/final-qa/\`.
  Output: \`Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT\`

- [ ] F4. **Scope Fidelity Check** — \`deep\`
  For each task: read "What to do", read actual diff (git log/diff). Verify 1:1 — everything in spec was built (no missing), nothing beyond spec was built (no creep). Check "Must NOT do" compliance. Detect cross-task contamination: Task N touching Task M's files. Flag unaccounted changes.
  Output: \`Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT\`

---

## Commit Strategy

- **1**: \`type(scope): desc\` — file.ts, npm test

---

## Success Criteria

### Verification Commands
\`\`\`bash
command  # Expected: output
\`\`\`

### Final Checklist
- [ ] All "Must Have" present
- [ ] All "Must NOT Have" absent
- [ ] All tests pass
\`\`\`

---
`