Files
oh-my-openagent/src/agents/prometheus/plan-template.ts
YeonGyu-Kim a691a3ac0a refactor: migrate delegate_task to task tool with metadata fixes
- Rename delegate_task tool to task across codebase (100 files)
- Update model references: claude-opus-4-6 → 4-5, gpt-5.3-codex → 5.2-codex
- Add tool-metadata-store to restore metadata overwritten by fromPlugin()
- Add session ID polling for BackgroundManager task sessions
- Await async ctx.metadata() calls in tool executors
- Add ses_ prefix guard to getMessageDir for performance
- Harden BackgroundManager with idle deferral and error handling
- Fix duplicate task key in sisyphus-junior test object literals
- Fix unawaited showOutputToUser in ast_grep_replace
- Fix background=true → run_in_background=true in ultrawork prompt
- Fix duplicate task/task references in docs and comments
2026-02-06 21:35:30 +09:00

424 lines
15 KiB
TypeScript

/**
* Prometheus Plan Template
*
* The markdown template structure for work plans generated by Prometheus.
* Includes TL;DR, context, objectives, verification strategy, TODOs, and success criteria.
*/
export const PROMETHEUS_PLAN_TEMPLATE = `## Plan Structure
Generate plan to: \`.sisyphus/plans/{name}.md\`
\`\`\`markdown
# {Plan Title}
## TL;DR
> **Quick Summary**: [1-2 sentences capturing the core objective and approach]
>
> **Deliverables**: [Bullet list of concrete outputs]
> - [Output 1]
> - [Output 2]
>
> **Estimated Effort**: [Quick | Short | Medium | Large | XL]
> **Parallel Execution**: [YES - N waves | NO - sequential]
> **Critical Path**: [Task X → Task Y → Task Z]
---
## Context
### Original Request
[User's initial description]
### Interview Summary
**Key Discussions**:
- [Point 1]: [User's decision/preference]
- [Point 2]: [Agreed approach]
**Research Findings**:
- [Finding 1]: [Implication]
- [Finding 2]: [Recommendation]
### Metis Review
**Identified Gaps** (addressed):
- [Gap 1]: [How resolved]
- [Gap 2]: [How resolved]
---
## Work Objectives
### Core Objective
[1-2 sentences: what we're achieving]
### Concrete Deliverables
- [Exact file/endpoint/feature]
### Definition of Done
- [ ] [Verifiable condition with command]
### Must Have
- [Non-negotiable requirement]
### Must NOT Have (Guardrails)
- [Explicit exclusion from Metis review]
- [AI slop pattern to avoid]
- [Scope boundary]
---
## Verification Strategy (MANDATORY)
> **UNIVERSAL RULE: ZERO HUMAN INTERVENTION**
>
> ALL tasks in this plan MUST be verifiable WITHOUT any human action.
> This is NOT conditional — it applies to EVERY task, regardless of test strategy.
>
> **FORBIDDEN** — acceptance criteria that require:
> - "User manually tests..." / "사용자가 직접 테스트..."
> - "User visually confirms..." / "사용자가 눈으로 확인..."
> - "User interacts with..." / "사용자가 직접 조작..."
> - "Ask user to verify..." / "사용자에게 확인 요청..."
> - ANY step where a human must perform an action
>
> **ALL verification is executed by the agent** using tools (Playwright, interactive_bash, curl, etc.). No exceptions.
### Test Decision
- **Infrastructure exists**: [YES/NO]
- **Automated tests**: [TDD / Tests-after / None]
- **Framework**: [bun test / vitest / jest / pytest / none]
### If TDD Enabled
Each TODO follows RED-GREEN-REFACTOR:
**Task Structure:**
1. **RED**: Write failing test first
- Test file: \`[path].test.ts\`
- Test command: \`bun test [file]\`
- Expected: FAIL (test exists, implementation doesn't)
2. **GREEN**: Implement minimum code to pass
- Command: \`bun test [file]\`
- Expected: PASS
3. **REFACTOR**: Clean up while keeping green
- Command: \`bun test [file]\`
- Expected: PASS (still)
**Test Setup Task (if infrastructure doesn't exist):**
- [ ] 0. Setup Test Infrastructure
- Install: \`bun add -d [test-framework]\`
- Config: Create \`[config-file]\`
- Verify: \`bun test --help\` → shows help
- Example: Create \`src/__tests__/example.test.ts\`
- Verify: \`bun test\` → 1 test passes
### Agent-Executed QA Scenarios (MANDATORY — ALL tasks)
> Whether TDD is enabled or not, EVERY task MUST include Agent-Executed QA Scenarios.
> - **With TDD**: QA scenarios complement unit tests at integration/E2E level
> - **Without TDD**: QA scenarios are the PRIMARY verification method
>
> These describe how the executing agent DIRECTLY verifies the deliverable
> by running it — opening browsers, executing commands, sending API requests.
> The agent performs what a human tester would do, but automated via tools.
**Verification Tool by Deliverable Type:**
| Type | Tool | How Agent Verifies |
|------|------|-------------------|
| **Frontend/UI** | Playwright (playwright skill) | Navigate, interact, assert DOM, screenshot |
| **TUI/CLI** | interactive_bash (tmux) | Run command, send keystrokes, validate output |
| **API/Backend** | Bash (curl/httpie) | Send requests, parse responses, assert fields |
| **Library/Module** | Bash (bun/node REPL) | Import, call functions, compare output |
| **Config/Infra** | Bash (shell commands) | Apply config, run state checks, validate |
**Each Scenario MUST Follow This Format:**
\`\`\`
Scenario: [Descriptive name — what user action/flow is being verified]
Tool: [Playwright / interactive_bash / Bash]
Preconditions: [What must be true before this scenario runs]
Steps:
1. [Exact action with specific selector/command/endpoint]
2. [Next action with expected intermediate state]
3. [Assertion with exact expected value]
Expected Result: [Concrete, observable outcome]
Failure Indicators: [What would indicate failure]
Evidence: [Screenshot path / output capture / response body path]
\`\`\`
**Scenario Detail Requirements:**
- **Selectors**: Specific CSS selectors (\`.login-button\`, not "the login button")
- **Data**: Concrete test data (\`"test@example.com"\`, not \`"[email]"\`)
- **Assertions**: Exact values (\`text contains "Welcome back"\`, not "verify it works")
- **Timing**: Include wait conditions where relevant (\`Wait for .dashboard (timeout: 10s)\`)
- **Negative Scenarios**: At least ONE failure/error scenario per feature
- **Evidence Paths**: Specific file paths (\`.sisyphus/evidence/task-N-scenario-name.png\`)
**Anti-patterns (NEVER write scenarios like this):**
- ❌ "Verify the login page works correctly"
- ❌ "Check that the API returns the right data"
- ❌ "Test the form validation"
- ❌ "User opens browser and confirms..."
**Write scenarios like this instead:**
- ✅ \`Navigate to /login → Fill input[name="email"] with "test@example.com" → Fill input[name="password"] with "Pass123!" → Click button[type="submit"] → Wait for /dashboard → Assert h1 contains "Welcome"\`
- ✅ \`POST /api/users {"name":"Test","email":"new@test.com"} → Assert status 201 → Assert response.id is UUID → GET /api/users/{id} → Assert name equals "Test"\`
- ✅ \`Run ./cli --config test.yaml → Wait for "Loaded" in stdout → Send "q" → Assert exit code 0 → Assert stdout contains "Goodbye"\`
**Evidence Requirements:**
- Screenshots: \`.sisyphus/evidence/\` for all UI verifications
- Terminal output: Captured for CLI/TUI verifications
- Response bodies: Saved for API verifications
- All evidence referenced by specific file path in acceptance criteria
---
## Execution Strategy
### Parallel Execution Waves
> Maximize throughput by grouping independent tasks into parallel waves.
> Each wave completes before the next begins.
\`\`\`
Wave 1 (Start Immediately):
├── Task 1: [no dependencies]
└── Task 5: [no dependencies]
Wave 2 (After Wave 1):
├── Task 2: [depends: 1]
├── Task 3: [depends: 1]
└── Task 6: [depends: 5]
Wave 3 (After Wave 2):
└── Task 4: [depends: 2, 3]
Critical Path: Task 1 → Task 2 → Task 4
Parallel Speedup: ~40% faster than sequential
\`\`\`
### Dependency Matrix
| Task | Depends On | Blocks | Can Parallelize With |
|------|------------|--------|---------------------|
| 1 | None | 2, 3 | 5 |
| 2 | 1 | 4 | 3, 6 |
| 3 | 1 | 4 | 2, 6 |
| 4 | 2, 3 | None | None (final) |
| 5 | None | 6 | 1 |
| 6 | 5 | None | 2, 3 |
### Agent Dispatch Summary
| Wave | Tasks | Recommended Agents |
|------|-------|-------------------|
| 1 | 1, 5 | task(category="...", load_skills=[...], run_in_background=false) |
| 2 | 2, 3, 6 | dispatch parallel after Wave 1 completes |
| 3 | 4 | final integration task |
---
## TODOs
> Implementation + Test = ONE Task. Never separate.
> EVERY task MUST have: Recommended Agent Profile + Parallelization info.
- [ ] 1. [Task Title]
**What to do**:
- [Clear implementation steps]
- [Test cases to cover]
**Must NOT do**:
- [Specific exclusions from guardrails]
**Recommended Agent Profile**:
> Select category + skills based on task domain. Justify each choice.
- **Category**: \`[visual-engineering | ultrabrain | artistry | quick | unspecified-low | unspecified-high | writing]\`
- Reason: [Why this category fits the task domain]
- **Skills**: [\`skill-1\`, \`skill-2\`]
- \`skill-1\`: [Why needed - domain overlap explanation]
- \`skill-2\`: [Why needed - domain overlap explanation]
- **Skills Evaluated but Omitted**:
- \`omitted-skill\`: [Why domain doesn't overlap]
**Parallelization**:
- **Can Run In Parallel**: YES | NO
- **Parallel Group**: Wave N (with Tasks X, Y) | Sequential
- **Blocks**: [Tasks that depend on this task completing]
- **Blocked By**: [Tasks this depends on] | None (can start immediately)
**References** (CRITICAL - Be Exhaustive):
> The executor has NO context from your interview. References are their ONLY guide.
> Each reference must answer: "What should I look at and WHY?"
**Pattern References** (existing code to follow):
- \`src/services/auth.ts:45-78\` - Authentication flow pattern (JWT creation, refresh token handling)
- \`src/hooks/useForm.ts:12-34\` - Form validation pattern (Zod schema + react-hook-form integration)
**API/Type References** (contracts to implement against):
- \`src/types/user.ts:UserDTO\` - Response shape for user endpoints
- \`src/api/schema.ts:createUserSchema\` - Request validation schema
**Test References** (testing patterns to follow):
- \`src/__tests__/auth.test.ts:describe("login")\` - Test structure and mocking patterns
**Documentation References** (specs and requirements):
- \`docs/api-spec.md#authentication\` - API contract details
- \`ARCHITECTURE.md:Database Layer\` - Database access patterns
**External References** (libraries and frameworks):
- Official docs: \`https://zod.dev/?id=basic-usage\` - Zod validation syntax
- Example repo: \`github.com/example/project/src/auth\` - Reference implementation
**WHY Each Reference Matters** (explain the relevance):
- Don't just list files - explain what pattern/information the executor should extract
- Bad: \`src/utils.ts\` (vague, which utils? why?)
- Good: \`src/utils/validation.ts:sanitizeInput()\` - Use this sanitization pattern for user input
**Acceptance Criteria**:
> **AGENT-EXECUTABLE VERIFICATION ONLY** — No human action permitted.
> Every criterion MUST be verifiable by running a command or using a tool.
> REPLACE all placeholders with actual values from task context.
**If TDD (tests enabled):**
- [ ] Test file created: src/auth/login.test.ts
- [ ] Test covers: successful login returns JWT token
- [ ] bun test src/auth/login.test.ts → PASS (3 tests, 0 failures)
**Agent-Executed QA Scenarios (MANDATORY — per-scenario, ultra-detailed):**
> Write MULTIPLE named scenarios per task: happy path AND failure cases.
> Each scenario = exact tool + steps with real selectors/data + evidence path.
**Example — Frontend/UI (Playwright):**
\\\`\\\`\\\`
Scenario: Successful login redirects to dashboard
Tool: Playwright (playwright skill)
Preconditions: Dev server running on localhost:3000, test user exists
Steps:
1. Navigate to: http://localhost:3000/login
2. Wait for: input[name="email"] visible (timeout: 5s)
3. Fill: input[name="email"] → "test@example.com"
4. Fill: input[name="password"] → "ValidPass123!"
5. Click: button[type="submit"]
6. Wait for: navigation to /dashboard (timeout: 10s)
7. Assert: h1 text contains "Welcome back"
8. Assert: cookie "session_token" exists
9. Screenshot: .sisyphus/evidence/task-1-login-success.png
Expected Result: Dashboard loads with welcome message
Evidence: .sisyphus/evidence/task-1-login-success.png
Scenario: Login fails with invalid credentials
Tool: Playwright (playwright skill)
Preconditions: Dev server running, no valid user with these credentials
Steps:
1. Navigate to: http://localhost:3000/login
2. Fill: input[name="email"] → "wrong@example.com"
3. Fill: input[name="password"] → "WrongPass"
4. Click: button[type="submit"]
5. Wait for: .error-message visible (timeout: 5s)
6. Assert: .error-message text contains "Invalid credentials"
7. Assert: URL is still /login (no redirect)
8. Screenshot: .sisyphus/evidence/task-1-login-failure.png
Expected Result: Error message shown, stays on login page
Evidence: .sisyphus/evidence/task-1-login-failure.png
\\\`\\\`\\\`
**Example — API/Backend (curl):**
\\\`\\\`\\\`
Scenario: Create user returns 201 with UUID
Tool: Bash (curl)
Preconditions: Server running on localhost:8080
Steps:
1. curl -s -w "\\n%{http_code}" -X POST http://localhost:8080/api/users \\
-H "Content-Type: application/json" \\
-d '{"email":"new@test.com","name":"Test User"}'
2. Assert: HTTP status is 201
3. Assert: response.id matches UUID format
4. GET /api/users/{returned-id} → Assert name equals "Test User"
Expected Result: User created and retrievable
Evidence: Response bodies captured
Scenario: Duplicate email returns 409
Tool: Bash (curl)
Preconditions: User with email "new@test.com" already exists
Steps:
1. Repeat POST with same email
2. Assert: HTTP status is 409
3. Assert: response.error contains "already exists"
Expected Result: Conflict error returned
Evidence: Response body captured
\\\`\\\`\\\`
**Example — TUI/CLI (interactive_bash):**
\\\`\\\`\\\`
Scenario: CLI loads config and displays menu
Tool: interactive_bash (tmux)
Preconditions: Binary built, test config at ./test.yaml
Steps:
1. tmux new-session: ./my-cli --config test.yaml
2. Wait for: "Configuration loaded" in output (timeout: 5s)
3. Assert: Menu items visible ("1. Create", "2. List", "3. Exit")
4. Send keys: "3" then Enter
5. Assert: "Goodbye" in output
6. Assert: Process exited with code 0
Expected Result: CLI starts, shows menu, exits cleanly
Evidence: Terminal output captured
Scenario: CLI handles missing config gracefully
Tool: interactive_bash (tmux)
Preconditions: No config file at ./nonexistent.yaml
Steps:
1. tmux new-session: ./my-cli --config nonexistent.yaml
2. Wait for: output (timeout: 3s)
3. Assert: stderr contains "Config file not found"
4. Assert: Process exited with code 1
Expected Result: Meaningful error, non-zero exit
Evidence: Error output captured
\\\`\\\`\\\`
**Evidence to Capture:**
- [ ] Screenshots in .sisyphus/evidence/ for UI scenarios
- [ ] Terminal output for CLI/TUI scenarios
- [ ] Response bodies for API scenarios
- [ ] Each evidence file named: task-{N}-{scenario-slug}.{ext}
**Commit**: YES | NO (groups with N)
- Message: \`type(scope): desc\`
- Files: \`path/to/file\`
- Pre-commit: \`test command\`
---
## Commit Strategy
| After Task | Message | Files | Verification |
|------------|---------|-------|--------------|
| 1 | \`type(scope): desc\` | file.ts | npm test |
---
## Success Criteria
### Verification Commands
\`\`\`bash
command # Expected: output
\`\`\`
### Final Checklist
- [ ] All "Must Have" present
- [ ] All "Must NOT Have" absent
- [ ] All tests pass
\`\`\`
---
`