/**
 * Prometheus Plan Template
 *
 * The markdown template structure for work plans generated by Prometheus.
 * Includes TL;DR, context, objectives, verification strategy, TODOs, and success criteria.
 */

export const PROMETHEUS_PLAN_TEMPLATE = `## Plan Structure

Generate plan to: \`.sisyphus/plans/{name}.md\`

\`\`\`markdown
# {Plan Title}

## TL;DR

> **Quick Summary**: [1-2 sentences capturing the core objective and approach]
>
> **Deliverables**: [Bullet list of concrete outputs]
> - [Output 1]
> - [Output 2]
>
> **Estimated Effort**: [Quick | Short | Medium | Large | XL]
> **Parallel Execution**: [YES - N waves | NO - sequential]
> **Critical Path**: [Task X → Task Y → Task Z]

---

## Context

### Original Request
[User's initial description]

### Interview Summary
**Key Discussions**:
- [Point 1]: [User's decision/preference]
- [Point 2]: [Agreed approach]

**Research Findings**:
- [Finding 1]: [Implication]
- [Finding 2]: [Recommendation]

### Metis Review
**Identified Gaps** (addressed):
- [Gap 1]: [How resolved]
- [Gap 2]: [How resolved]

## Work Objectives

### Core Objective
[1-2 sentences: what we're achieving]

### Concrete Deliverables
- [Exact file/endpoint/feature]

### Definition of Done
- [ ] [Verifiable condition with command]

### Must Have
- [Non-negotiable requirement]

### Must NOT Have (Guardrails)
- [Explicit exclusion from Metis review]
- [AI slop pattern to avoid]
- [Scope boundary]

---

## Verification Strategy (MANDATORY)

> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
> This is NOT conditional — it applies to EVERY task, regardless of test strategy.
>
> **FORBIDDEN** — acceptance criteria that require:
> - "User manually tests..." / "사용자가 직접 테스트..."
> - "User visually confirms..." / "사용자가 눈으로 확인..."
> - "User interacts with..." / "사용자가 직접 조작..."
> - "Ask user to verify..." / "사용자에게 확인 요청..."
> - ANY step where a human must perform an action
>
> **ALL verification is executed by the agent** using tools (Playwright, interactive_bash, curl, etc.). No exceptions.

### Test Decision
- **Infrastructure exists**: [YES/NO]
- **Automated tests**: [TDD / Tests-after / None]
- **Framework**: [bun test / vitest / jest / pytest / none]

### If TDD Enabled

Each TODO follows RED-GREEN-REFACTOR:

**Task Structure:**
1. **RED**: Write failing test first
   - Test file: \`[path].test.ts\`
   - Test command: \`bun test [file]\`
   - Expected: FAIL (test exists, implementation doesn't)
2. **GREEN**: Implement minimum code to pass
   - Command: \`bun test [file]\`
   - Expected: PASS
3. **REFACTOR**: Clean up while keeping green
   - Command: \`bun test [file]\`
   - Expected: PASS (still)

**Test Setup Task (if infrastructure doesn't exist):**
- [ ] 0. Setup Test Infrastructure
  - Install: \`bun add -d [test-framework]\`
  - Config: Create \`[config-file]\`
  - Verify: \`bun test --help\` → shows help
  - Example: Create \`src/__tests__/example.test.ts\`
  - Verify: \`bun test\` → 1 test passes

### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)

> Whether TDD is enabled or not, EVERY task MUST include Agent-Executed QA Scenarios.
> - **With TDD**: QA scenarios complement unit tests at integration/E2E level
> - **Without TDD**: QA scenarios are the PRIMARY verification method
>
> These describe how the executing agent DIRECTLY verifies the deliverable
> by running it — opening browsers, executing commands, sending API requests.
> The agent performs what a human tester would do, but automated via tools.

**Verification Tool by Deliverable Type:**

| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| **Frontend/UI** | Playwright (playwright skill) | Navigate, interact, assert DOM, screenshot |
| **TUI/CLI** | interactive_bash (tmux) | Run command, send keystrokes, validate output |
| **API/Backend** | Bash (curl/httpie) | Send requests, parse responses, assert fields |
| **Library/Module** | Bash (bun/node REPL) | Import, call functions, compare output |
| **Config/Infra** | Bash (shell commands) | Apply config, run state checks, validate |

**Each Scenario MUST Follow This Format:**

\`\`\`
Scenario: [Descriptive name — what user action/flow is being verified]
Tool: [Playwright / interactive_bash / Bash]
Preconditions: [What must be true before this scenario runs]
Steps:
1. [Exact action with specific selector/command/endpoint]
2. [Next action with expected intermediate state]
3. [Assertion with exact expected value]
Expected Result: [Concrete, observable outcome]
Failure Indicators: [What would indicate failure]
Evidence: [Screenshot path / output capture / response body path]
\`\`\`

**Scenario Detail Requirements:**
- **Selectors**: Specific CSS selectors (\`.login-button\`, not "the login button")
- **Data**: Concrete test data (\`"test@example.com"\`, not \`"[email]"\`)
- **Assertions**: Exact values (\`text contains "Welcome back"\`, not "verify it works")
- **Timing**: Include wait conditions where relevant (\`Wait for .dashboard (timeout: 10s)\`)
- **Negative Scenarios**: At least ONE failure/error scenario per feature
- **Evidence Paths**: Specific file paths (\`.sisyphus/evidence/task-N-scenario-name.png\`)

**Anti-patterns (NEVER write scenarios like this):**
- ❌ "Verify the login page works correctly"
- ❌ "Check that the API returns the right data"
- ❌ "Test the form validation"
- ❌ "User opens browser and confirms..."

**Write scenarios like this instead:**
- ✅ \`Navigate to /login → Fill input[name="email"] with "test@example.com" → Fill input[name="password"] with "Pass123!" → Click button[type="submit"] → Wait for /dashboard → Assert h1 contains "Welcome"\`
- ✅ \`POST /api/users {"name":"Test","email":"new@test.com"} → Assert status 201 → Assert response.id is UUID → GET /api/users/{id} → Assert name equals "Test"\`
- ✅ \`Run ./cli --config test.yaml → Wait for "Loaded" in stdout → Send "q" → Assert exit code 0 → Assert stdout contains "Goodbye"\`

**Evidence Requirements:**
- Screenshots: \`.sisyphus/evidence/\` for all UI verifications
- Terminal output: Captured for CLI/TUI verifications
- Response bodies: Saved for API verifications
- All evidence referenced by specific file path in acceptance criteria

---

## Execution Strategy

### Parallel Execution Waves

> Maximize throughput by grouping independent tasks into parallel waves.
> Each wave completes before the next begins.

\`\`\`
Wave 1 (Start Immediately):
├── Task 1: [no dependencies]
└── Task 5: [no dependencies]

Wave 2 (After Wave 1):
├── Task 2: [depends: 1]
├── Task 3: [depends: 1]
└── Task 6: [depends: 5]

Wave 3 (After Wave 2):
└── Task 4: [depends: 2, 3]

Critical Path: Task 1 → Task 2 → Task 4
Parallel Speedup: ~40% faster than sequential
\`\`\`

### Dependency Matrix

| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3 | 5 |
| 2 | 1 | 4 | 3, 6 |
| 3 | 1 | 4 | 2, 6 |
| 4 | 2, 3 | None | None (final) |
| 5 | None | 6 | 1 |
| 6 | 5 | None | 2, 3 |

### Agent Dispatch Summary

| Wave | Tasks | Recommended Agents |
|------|-------|-------------------|
| 1 | 1, 5 | task(category="...", load_skills=[...], run_in_background=false) |
| 2 | 2, 3, 6 | dispatch parallel after Wave 1 completes |
| 3 | 4 | final integration task |

---

## TODOs

> Implementation + Test = ONE Task. Never separate.
> EVERY task MUST have: Recommended Agent Profile + Parallelization info.

- [ ] 1. [Task Title]

**What to do**:
- [Clear implementation steps]
- [Test cases to cover]

**Must NOT do**:
- [Specific exclusions from guardrails]

**Recommended Agent Profile**:
> Select category + skills based on task domain. Justify each choice.
- **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\`
  - Reason: [Why this category fits the task domain]
- **Skills**: [\`skill-1\`, \`skill-2\`]
  - \`skill-1\`: [Why needed - domain overlap explanation]
  - \`skill-2\`: [Why needed - domain overlap explanation]
- **Skills Evaluated but Omitted**:
  - \`omitted-skill\`: [Why domain doesn't overlap]

**Parallelization**:
- **Can Run In Parallel**: YES | NO
- **Parallel Group**: Wave N (with Tasks X, Y) | Sequential
- **Blocks**: [Tasks that depend on this task completing]
- **Blocked By**: [Tasks this depends on] | None (can start immediately)

**References** (CRITICAL - Be Exhaustive):

> The executor has NO context from your interview. References are their ONLY guide.
> Each reference must answer: "What should I look at and WHY?"

**Pattern References** (existing code to follow):
- \`src/services/auth.ts:45-78\` - Authentication flow pattern (JWT creation, refresh token handling)
- \`src/hooks/useForm.ts:12-34\` - Form validation pattern (Zod schema + react-hook-form integration)

**API/Type References** (contracts to implement against):
- \`src/types/user.ts:UserDTO\` - Response shape for user endpoints
- \`src/api/schema.ts:createUserSchema\` - Request validation schema

**Test References** (testing patterns to follow):
- \`src/__tests__/auth.test.ts:describe("login")\` - Test structure and mocking patterns

**Documentation References** (specs and requirements):
- \`docs/api-spec.md#authentication\` - API contract details
- \`ARCHITECTURE.md:Database Layer\` - Database access patterns

**External References** (libraries and frameworks):
- Official docs: \`https://zod.dev/?id=basic-usage\` - Zod validation syntax
- Example repo: \`github.com/example/project/src/auth\` - Reference implementation

**WHY Each Reference Matters** (explain the relevance):
- Don't just list files - explain what pattern/information the executor should extract
- Bad: \`src/utils.ts\` (vague, which utils? why?)
- Good: \`src/utils/validation.ts:sanitizeInput()\` - Use this sanitization pattern for user input

**Acceptance Criteria**:

> **AGENT-EXECUTABLE VERIFICATION ONLY** — No human action permitted.
> Every criterion MUST be verifiable by running a command or using a tool.
> REPLACE all placeholders with actual values from task context.

**If TDD (tests enabled):**
- [ ] Test file created: src/auth/login.test.ts
- [ ] Test covers: successful login returns JWT token
- [ ] bun test src/auth/login.test.ts → PASS (3 tests, 0 failures)

**Agent-Executed QA Scenarios (MANDATORY — per-scenario, ultra-detailed):**

> Write MULTIPLE named scenarios per task: happy path AND failure cases.
> Each scenario = exact tool + steps with real selectors/data + evidence path.

**Example — Frontend/UI (Playwright):**

\\\`\\\`\\\`
Scenario: Successful login redirects to dashboard
Tool: Playwright (playwright skill)
Preconditions: Dev server running on localhost:3000, test user exists
Steps:
1. Navigate to: http://localhost:3000/login
2. Wait for: input[name="email"] visible (timeout: 5s)
3. Fill: input[name="email"] → "test@example.com"
4. Fill: input[name="password"] → "ValidPass123!"
5. Click: button[type="submit"]
6. Wait for: navigation to /dashboard (timeout: 10s)
7. Assert: h1 text contains "Welcome back"
8. Assert: cookie "session_token" exists
9. Screenshot: .sisyphus/evidence/task-1-login-success.png
Expected Result: Dashboard loads with welcome message
Evidence: .sisyphus/evidence/task-1-login-success.png

Scenario: Login fails with invalid credentials
Tool: Playwright (playwright skill)
Preconditions: Dev server running, no valid user with these credentials
Steps:
1. Navigate to: http://localhost:3000/login
2. Fill: input[name="email"] → "wrong@example.com"
3. Fill: input[name="password"] → "WrongPass"
4. Click: button[type="submit"]
5. Wait for: .error-message visible (timeout: 5s)
6. Assert: .error-message text contains "Invalid credentials"
7. Assert: URL is still /login (no redirect)
8. Screenshot: .sisyphus/evidence/task-1-login-failure.png
Expected Result: Error message shown, stays on login page
Evidence: .sisyphus/evidence/task-1-login-failure.png
\\\`\\\`\\\`

**Example — API/Backend (curl):**

\\\`\\\`\\\`
Scenario: Create user returns 201 with UUID
Tool: Bash (curl)
Preconditions: Server running on localhost:8080
Steps:
1. curl -s -w "\\n%{http_code}" -X POST http://localhost:8080/api/users \\
     -H "Content-Type: application/json" \\
     -d '{"email":"new@test.com","name":"Test User"}'
2. Assert: HTTP status is 201
3. Assert: response.id matches UUID format
4. GET /api/users/{returned-id} → Assert name equals "Test User"
Expected Result: User created and retrievable
Evidence: Response bodies captured

Scenario: Duplicate email returns 409
Tool: Bash (curl)
Preconditions: User with email "new@test.com" already exists
Steps:
1. Repeat POST with same email
2. Assert: HTTP status is 409
3. Assert: response.error contains "already exists"
Expected Result: Conflict error returned
Evidence: Response body captured
\\\`\\\`\\\`

**Example — TUI/CLI (interactive_bash):**

\\\`\\\`\\\`
Scenario: CLI loads config and displays menu
Tool: interactive_bash (tmux)
Preconditions: Binary built, test config at ./test.yaml
Steps:
1. tmux new-session: ./my-cli --config test.yaml
2. Wait for: "Configuration loaded" in output (timeout: 5s)
3. Assert: Menu items visible ("1. Create", "2. List", "3. Exit")
4. Send keys: "3" then Enter
5. Assert: "Goodbye" in output
6. Assert: Process exited with code 0
Expected Result: CLI starts, shows menu, exits cleanly
Evidence: Terminal output captured

Scenario: CLI handles missing config gracefully
Tool: interactive_bash (tmux)
Preconditions: No config file at ./nonexistent.yaml
Steps:
1. tmux new-session: ./my-cli --config nonexistent.yaml
2. Wait for: output (timeout: 3s)
3. Assert: stderr contains "Config file not found"
4. Assert: Process exited with code 1
Expected Result: Meaningful error, non-zero exit
Evidence: Error output captured
\\\`\\\`\\\`

**Evidence to Capture:**
- [ ] Screenshots in .sisyphus/evidence/ for UI scenarios
- [ ] Terminal output for CLI/TUI scenarios
- [ ] Response bodies for API scenarios
- [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}

**Commit**: YES | NO (groups with N)
- Message: \`type(scope): desc\`
- Files: \`path/to/file\`
- Pre-commit: \`test command\`

---

## Commit Strategy

| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | \`type(scope): desc\` | file.ts | npm test |

---

## Success Criteria

### Verification Commands
\`\`\`bash
command # Expected: output
\`\`\`

### Final Checklist
- [ ] All "Must Have" present
- [ ] All "Must NOT Have" absent
- [ ] All tests pass
\`\`\`

---
`
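The template's "Generate plan to" line points at `.sisyphus/plans/{name}.md`. As a minimal sketch of how a caller might derive that path from a plan title, assuming hypothetical `slugify`/`planPath` helpers (they are illustrations only, not exports of this module):

```typescript
// Illustrative sketch only — slugify and planPath are assumed helpers, not part
// of this module. They mirror the ".sisyphus/plans/{name}.md" convention that
// the template's "Generate plan to" line describes.

// Lower-case a title and collapse runs of non-alphanumeric characters into
// single hyphens, trimming any leading/trailing hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Resolve the plan file path for a given plan title.
function planPath(title: string): string {
  return `.sisyphus/plans/${slugify(title)}.md`;
}

console.log(planPath("User Auth Revamp"));
// → .sisyphus/plans/user-auth-revamp.md
```

A caller would then write the rendered plan (title heading plus the sections above) to `planPath(title)` before dispatching executor agents.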