Reduce plan-template from 541 to 335 lines by removing redundant verbose examples while recovering 3 lost context items: tool-type mapping table in QA Policy, scenario specificity requirements (selectors/data/assertions/ timing/negative) in TODO template, and structured output format hints for each Final Verification agent.
/**
 * Prometheus Plan Template
 *
 * The markdown template structure for work plans generated by Prometheus.
 * Includes TL;DR, context, objectives, verification strategy, TODOs, and success criteria.
 */

export const PROMETHEUS_PLAN_TEMPLATE = `## Plan Structure

Generate plan to: \`.sisyphus/plans/{name}.md\`

\`\`\`markdown
# {Plan Title}

## TL;DR

> **Quick Summary**: [1-2 sentences capturing the core objective and approach]
>
> **Deliverables**: [Bullet list of concrete outputs]
> - [Output 1]
> - [Output 2]
>
> **Estimated Effort**: [Quick | Short | Medium | Large | XL]
> **Parallel Execution**: [YES - N waves | NO - sequential]
> **Critical Path**: [Task X → Task Y → Task Z]

---

## Context

### Original Request
[User's initial description]

### Interview Summary
**Key Discussions**:
- [Point 1]: [User's decision/preference]
- [Point 2]: [Agreed approach]

**Research Findings**:
- [Finding 1]: [Implication]
- [Finding 2]: [Recommendation]

### Metis Review
**Identified Gaps** (addressed):
- [Gap 1]: [How resolved]
- [Gap 2]: [How resolved]

---

## Work Objectives

### Core Objective
[1-2 sentences: what we're achieving]

### Concrete Deliverables
- [Exact file/endpoint/feature]

### Definition of Done
- [ ] [Verifiable condition with command]

### Must Have
- [Non-negotiable requirement]

### Must NOT Have (Guardrails)
- [Explicit exclusion from Metis review]
- [AI slop pattern to avoid]
- [Scope boundary]

---

## Verification Strategy (MANDATORY)

> **ZERO HUMAN INTERVENTION** — ALL verification is agent-executed. No exceptions.
> Acceptance criteria requiring "user manually tests/confirms" are FORBIDDEN.

### Test Decision
- **Infrastructure exists**: [YES/NO]
- **Automated tests**: [TDD / Tests-after / None]
- **Framework**: [bun test / vitest / jest / pytest / none]
- **If TDD**: Each task follows RED (failing test) → GREEN (minimal impl) → REFACTOR

### QA Policy
Every task MUST include agent-executed QA scenarios (see TODO template below).
Evidence saved to \`.sisyphus/evidence/task-{N}-{scenario-slug}.{ext}\`.

| Deliverable Type | Verification Tool | Method |
|------------------|-------------------|--------|
| Frontend/UI | Playwright (playwright skill) | Navigate, interact, assert DOM, screenshot |
| TUI/CLI | interactive_bash (tmux) | Run command, send keystrokes, validate output |
| API/Backend | Bash (curl) | Send requests, assert status + response fields |
| Library/Module | Bash (bun/node REPL) | Import, call functions, compare output |

---

## Execution Strategy

### Parallel Execution Waves

> Maximize throughput by grouping independent tasks into parallel waves.
> Each wave completes before the next begins.
> Target: 5-8 tasks per wave. Fewer than 3 per wave (except final) = under-splitting.

\`\`\`
Wave 1 (Start Immediately — foundation + scaffolding):
├── Task 1: Project scaffolding + config [quick]
├── Task 2: Design system tokens [quick]
├── Task 3: Type definitions [quick]
├── Task 4: Schema definitions [quick]
├── Task 5: Storage interface + in-memory impl [quick]
├── Task 6: Auth middleware [quick]
└── Task 7: Client module [quick]

Wave 2 (After Wave 1 — core modules, MAX PARALLEL):
├── Task 8: Core business logic (depends: 3, 5, 7) [deep]
├── Task 9: API endpoints (depends: 4, 5) [unspecified-high]
├── Task 10: Secondary storage impl (depends: 5) [unspecified-high]
├── Task 11: Retry/fallback logic (depends: 8) [deep]
├── Task 12: UI layout + navigation (depends: 2) [visual-engineering]
├── Task 13: API client + hooks (depends: 4) [quick]
└── Task 14: Telemetry middleware (depends: 5, 10) [unspecified-high]

Wave 3 (After Wave 2 — integration + UI):
├── Task 15: Main route combining modules (depends: 6, 11, 14) [deep]
├── Task 16: UI data visualization (depends: 12, 13) [visual-engineering]
├── Task 17: Deployment config A (depends: 15) [quick]
├── Task 18: Deployment config B (depends: 15) [quick]
├── Task 19: Deployment config C (depends: 15) [quick]
└── Task 20: UI request log + build (depends: 16) [visual-engineering]

Wave 4 (After Wave 3 — verification):
├── Task 21: Integration tests (depends: 15) [deep]
├── Task 22: UI QA - Playwright (depends: 20) [unspecified-high]
├── Task 23: E2E QA (depends: 21) [deep]
└── Task 24: Git cleanup + tagging (depends: 21) [git]

Wave FINAL (After ALL tasks — independent review, 4 parallel):
├── Task F1: Plan compliance audit (oracle)
├── Task F2: Code quality review (unspecified-high)
├── Task F3: Real manual QA (unspecified-high)
└── Task F4: Scope fidelity check (deep)

Critical Path: Task 1 → Task 5 → Task 8 → Task 11 → Task 15 → Task 21 → F1-F4
Parallel Speedup: ~70% faster than sequential
Max Concurrent: 7 (Waves 1 & 2)
\`\`\`

### Dependency Matrix (abbreviated — show ALL tasks in your generated plan)

| Task | Depends On | Blocks | Wave |
|------|------------|--------|------|
| 1-7 | — | 8-14 | 1 |
| 8 | 3, 5, 7 | 11, 15 | 2 |
| 11 | 8 | 15 | 2 |
| 14 | 5, 10 | 15 | 2 |
| 15 | 6, 11, 14 | 17-19, 21 | 3 |
| 21 | 15 | 23, 24 | 4 |

> This is abbreviated for reference. YOUR generated plan must include the FULL matrix for ALL tasks.

### Agent Dispatch Summary

| Wave | # Parallel | Tasks → Agent Category |
|------|------------|----------------------|
| 1 | **7** | T1-T7 → \`quick\` |
| 2 | **7** | T8 → \`deep\`, T9 → \`unspecified-high\`, T10 → \`unspecified-high\`, T11 → \`deep\`, T12 → \`visual-engineering\`, T13 → \`quick\`, T14 → \`unspecified-high\` |
| 3 | **6** | T15 → \`deep\`, T16 → \`visual-engineering\`, T17-T19 → \`quick\`, T20 → \`visual-engineering\` |
| 4 | **4** | T21 → \`deep\`, T22 → \`unspecified-high\`, T23 → \`deep\`, T24 → \`git\` |
| FINAL | **4** | F1 → \`oracle\`, F2 → \`unspecified-high\`, F3 → \`unspecified-high\`, F4 → \`deep\` |

---

## TODOs

> Implementation + Test = ONE Task. Never separate.
> EVERY task MUST have: Recommended Agent Profile + Parallelization info + QA Scenarios.
> **A task WITHOUT QA Scenarios is INCOMPLETE. No exceptions.**

- [ ] 1. [Task Title]

**What to do**:
- [Clear implementation steps]
- [Test cases to cover]

**Must NOT do**:
- [Specific exclusions from guardrails]

**Recommended Agent Profile**:
> Select category + skills based on task domain. Justify each choice.
- **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\`
  - Reason: [Why this category fits the task domain]
- **Skills**: [\`skill-1\`, \`skill-2\`]
  - \`skill-1\`: [Why needed - domain overlap explanation]
  - \`skill-2\`: [Why needed - domain overlap explanation]
- **Skills Evaluated but Omitted**:
  - \`omitted-skill\`: [Why domain doesn't overlap]

**Parallelization**:
- **Can Run In Parallel**: YES | NO
- **Parallel Group**: Wave N (with Tasks X, Y) | Sequential
- **Blocks**: [Tasks that depend on this task completing]
- **Blocked By**: [Tasks this depends on] | None (can start immediately)

**References** (CRITICAL - Be Exhaustive):

> The executor has NO context from your interview. References are their ONLY guide.
> Each reference must answer: "What should I look at and WHY?"

**Pattern References** (existing code to follow):
- \`src/services/auth.ts:45-78\` - Authentication flow pattern (JWT creation, refresh token handling)

**API/Type References** (contracts to implement against):
- \`src/types/user.ts:UserDTO\` - Response shape for user endpoints

**Test References** (testing patterns to follow):
- \`src/__tests__/auth.test.ts:describe("login")\` - Test structure and mocking patterns

**External References** (libraries and frameworks):
- Official docs: \`https://zod.dev/?id=basic-usage\` - Zod validation syntax

**WHY Each Reference Matters** (explain the relevance):
- Don't just list files - explain what pattern/information the executor should extract
- Bad: \`src/utils.ts\` (vague, which utils? why?)
- Good: \`src/utils/validation.ts:sanitizeInput()\` - Use this sanitization pattern for user input

**Acceptance Criteria**:

> **AGENT-EXECUTABLE VERIFICATION ONLY** — No human action permitted.
> Every criterion MUST be verifiable by running a command or using a tool.

**If TDD (tests enabled):**
- [ ] Test file created: src/auth/login.test.ts
- [ ] bun test src/auth/login.test.ts → PASS (3 tests, 0 failures)

**QA Scenarios (MANDATORY — task is INCOMPLETE without these):**

> **This is NOT optional. A task without QA scenarios WILL BE REJECTED.**
>
> Write scenario tests that verify the ACTUAL BEHAVIOR of what you built.
> Minimum: 1 happy path + 1 failure/edge case per task.
> Each scenario = exact tool + exact steps + exact assertions + evidence path.
>
> **The executing agent MUST run these scenarios after implementation.**
> **The orchestrator WILL verify evidence files exist before marking task complete.**

\\\`\\\`\\\`
Scenario: [Happy path — what SHOULD work]
Tool: [Playwright / interactive_bash / Bash (curl)]
Preconditions: [Exact setup state]
Steps:
1. [Exact action — specific command/selector/endpoint, no vagueness]
2. [Next action — with expected intermediate state]
3. [Assertion — exact expected value, not "verify it works"]
Expected Result: [Concrete, observable, binary pass/fail]
Failure Indicators: [What specifically would mean this failed]
Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}.{ext}

Scenario: [Failure/edge case — what SHOULD fail gracefully]
Tool: [same format]
Preconditions: [Invalid input / missing dependency / error state]
Steps:
1. [Trigger the error condition]
2. [Assert error is handled correctly]
Expected Result: [Graceful failure with correct error message/code]
Evidence: .sisyphus/evidence/task-{N}-{scenario-slug}-error.{ext}
\\\`\\\`\\\`

> **Specificity requirements — every scenario MUST use:**
> - **Selectors**: Specific CSS selectors (\`.login-button\`, not "the login button")
> - **Data**: Concrete test data (\`"test@example.com"\`, not \`"[email]"\`)
> - **Assertions**: Exact values (\`text contains "Welcome back"\`, not "verify it works")
> - **Timing**: Wait conditions where relevant (\`timeout: 10s\`)
> - **Negative**: At least ONE failure/error scenario per task
>
> **Anti-patterns (your scenario is INVALID if it looks like this):**
> - ❌ "Verify it works correctly" — HOW? What does "correctly" mean?
> - ❌ "Check the API returns data" — WHAT data? What fields? What values?
> - ❌ "Test the component renders" — WHERE? What selector? What content?
> - ❌ Any scenario without an evidence path

**Evidence to Capture:**
- [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}
- [ ] Screenshots for UI, terminal output for CLI, response bodies for API

**Commit**: YES | NO (groups with N)
- Message: \`type(scope): desc\`
- Files: \`path/to/file\`
- Pre-commit: \`test command\`

---

## Final Verification Wave (MANDATORY — after ALL implementation tasks)

> 4 review agents run in PARALLEL. ALL must APPROVE. Rejection → fix → re-run.

- [ ] F1. **Plan Compliance Audit** — \`oracle\`
  Read the plan end-to-end. For each "Must Have": verify implementation exists (read file, curl endpoint, run command). For each "Must NOT Have": search codebase for forbidden patterns — reject with file:line if found. Check evidence files exist in .sisyphus/evidence/. Compare deliverables against plan.
  Output: \`Must Have [N/N] | Must NOT Have [N/N] | Tasks [N/N] | VERDICT: APPROVE/REJECT\`

- [ ] F2. **Code Quality Review** — \`unspecified-high\`
  Run \`tsc --noEmit\` + linter + \`bun test\`. Review all changed files for: \`as any\`/\`@ts-ignore\`, empty catches, console.log in prod, commented-out code, unused imports. Check AI slop: excessive comments, over-abstraction, generic names (data/result/item/temp).
  Output: \`Build [PASS/FAIL] | Lint [PASS/FAIL] | Tests [N pass/N fail] | Files [N clean/N issues] | VERDICT\`

- [ ] F3. **Real Manual QA** — \`unspecified-high\` (+ \`playwright\` skill if UI)
  Start from clean state. Execute EVERY QA scenario from EVERY task — follow exact steps, capture evidence. Test cross-task integration (features working together, not in isolation). Test edge cases: empty state, invalid input, rapid actions. Save to \`.sisyphus/evidence/final-qa/\`.
  Output: \`Scenarios [N/N pass] | Integration [N/N] | Edge Cases [N tested] | VERDICT\`

- [ ] F4. **Scope Fidelity Check** — \`deep\`
  For each task: read "What to do", read actual diff (git log/diff). Verify 1:1 — everything in spec was built (no missing), nothing beyond spec was built (no creep). Check "Must NOT do" compliance. Detect cross-task contamination: Task N touching Task M's files. Flag unaccounted changes.
  Output: \`Tasks [N/N compliant] | Contamination [CLEAN/N issues] | Unaccounted [CLEAN/N files] | VERDICT\`

---

## Commit Strategy

| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | \`type(scope): desc\` | file.ts | npm test |

---

## Success Criteria

### Verification Commands
\`\`\`bash
command # Expected: output
\`\`\`

### Final Checklist
- [ ] All "Must Have" present
- [ ] All "Must NOT Have" absent
- [ ] All tests pass
\`\`\`

---
`
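
The template above leaves `{name}` as a placeholder for the plan slug. A minimal sketch of how a consumer might resolve the target path from that placeholder; the `resolvePlanPath` helper and the stubbed constant are illustrative assumptions, not part of this module:

```typescript
// Stand-in for the exported PROMETHEUS_PLAN_TEMPLATE; the real constant
// holds the full markdown structure above. Only the path line matters here.
const TEMPLATE_STUB = "Generate plan to: `.sisyphus/plans/{name}.md`";

// Hypothetical helper: fill the {name} placeholder with a concrete plan slug.
function resolvePlanPath(template: string, name: string): string {
  return template.replace("{name}", name);
}

console.log(resolvePlanPath(TEMPLATE_STUB, "auth-refactor"));
// prints: Generate plan to: `.sisyphus/plans/auth-refactor.md`
```

Note that `String.prototype.replace` with a string pattern substitutes only the first occurrence, which is sufficient here because the path placeholder appears once.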