feat(hooks): add mandatory hands-on verification enforcement for orchestrated tasks

- sisyphus-orchestrator: Add verification reminder with tool matrix (playwright/interactive_bash/curl)

- start-work: Inject detailed verification workflow with deliverable-specific guidance

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
YeonGyu-Kim
2026-01-06 13:16:51 +09:00
parent 7567c40a81
commit 39e92b1900
2 changed files with 66 additions and 2 deletions
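
The tool matrix this commit adds to the prompts can be read as a lookup from deliverable type to verification tool. The sketch below is only an illustration: `DeliverableType`, `VERIFICATION_MATRIX`, and `verificationHint` are hypothetical names that do not appear in the commit, which embeds the table as prompt text rather than as data.

```typescript
// Hypothetical names throughout; the commit itself only ships this table as prompt text.
type DeliverableType = "frontend" | "tui-cli" | "api" | "library"

interface VerificationStep {
  tool: string
  action: string
}

// Rows mirror the "Verification by Deliverable Type" table in the start-work prompt.
const VERIFICATION_MATRIX: Record<DeliverableType, VerificationStep> = {
  frontend: { tool: "/playwright skill", action: "Navigate, click, verify visual state, take screenshots" },
  "tui-cli": { tool: "interactive_bash (tmux)", action: "Run commands interactively, verify output" },
  api: { tool: "bash with curl/httpie", action: "Send requests, verify responses" },
  library: { tool: "REPL via interactive_bash", action: "Import, call functions, verify results" },
}

// Format one row as a single reminder line.
function verificationHint(kind: DeliverableType): string {
  const { tool, action } = VERIFICATION_MATRIX[kind]
  return `${tool}: ${action}`
}
```

Keeping the rows in one typed record would let both prompts render the same table from a single source, though the commit does not do this.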


@@ -42,7 +42,19 @@ Subagents FREQUENTLY claim completion when:
 5. Verify notepad was updated - Must have substantive content
 DO NOT TRUST THE AGENT'S SELF-REPORT.
-VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.`
+They are non-deterministic and not exceptional - they CANNOT distinguish between completed and incomplete states.
+VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.
+**HANDS-ON QA REQUIRED (after ALL tasks complete):**
+| Deliverable Type | Verification Tool | Action |
+|------------------|-------------------|--------|
+| **Frontend/UI** | \`/playwright\` skill | Navigate, interact, screenshot evidence |
+| **TUI/CLI** | \`interactive_bash\` (tmux) | Run interactively, verify output |
+| **API/Backend** | \`bash\` with curl | Send requests, verify responses |
+Static analysis CANNOT catch: visual bugs, animation issues, user flow breakages, integration problems.
+**FAILURE TO DO HANDS-ON QA = INCOMPLETE WORK.**`
 function buildOrchestratorReminder(planName: string, progress: { total: number; completed: number }): string {
 const remaining = progress.total - progress.completed


@@ -126,13 +126,65 @@ Which plan would you like to work on? Reply with the number or plan name.`
 }
 }
+const verificationEnforcement = `
+---
+## MANDATORY VERIFICATION ENFORCEMENT (NON-NEGOTIABLE)
+**CRITICAL: You MUST perform hands-on verification after completing ALL tasks. Static analysis alone is NOT sufficient.**
+### Verification by Deliverable Type
+| Type | Tool | How to Verify |
+|------|------|---------------|
+| **Frontend/UI** | \`/playwright\` skill | Navigate, click, verify visual state, take screenshots |
+| **TUI/CLI** | \`interactive_bash\` (tmux) | Run commands interactively, verify output |
+| **API/Backend** | \`bash\` with curl/httpie | Send requests, verify responses |
+| **Library/Module** | REPL via \`interactive_bash\` | Import, call functions, verify results |
+### Verification Workflow
+1. **After ALL tasks complete** (not after each task):
+- Start dev server if needed: \`bun run dev\` / \`npm run dev\`
+- Wait for server to be ready
+2. **For Frontend changes**:
+\`\`\`
+Load /playwright skill → Navigate to page → Interact with UI → Verify expected behavior → Screenshot evidence
+\`\`\`
+3. **For TUI/CLI changes**:
+\`\`\`
+interactive_bash(tmux_command="new-session -d -s qa") → send-keys with commands → capture-pane output → verify
+\`\`\`
+4. **Evidence required**:
+- Screenshots for visual changes (saved to \`.sisyphus/evidence/\`)
+- Terminal output for CLI changes
+- Response bodies for API changes
+### What Static Analysis CANNOT Catch
+- Visual rendering issues (wrong colors, broken layouts)
+- Animation/transition bugs
+- Race conditions in UI interactions
+- User flow breakages
+- Integration issues between components
+### FAILURE TO VERIFY = INCOMPLETE WORK
+**Do NOT mark tasks complete or report "done" without hands-on verification.**
+If you skip this step, the user will find bugs you could have caught.
+`
 const idx = output.parts.findIndex((p) => p.type === "text" && p.text)
 if (idx >= 0 && output.parts[idx].text) {
 output.parts[idx].text = output.parts[idx].text
 .replace(/\$SESSION_ID/g, sessionId)
 .replace(/\$TIMESTAMP/g, timestamp)
-output.parts[idx].text += `\n\n---\n${contextInfo}`
+output.parts[idx].text += `\n\n---\n${contextInfo}${verificationEnforcement}`
 }
 log(`[${HOOK_NAME}] Context injected`, {
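
The injection step in this hunk substitutes the `$SESSION_ID` and `$TIMESTAMP` placeholders, then appends the context and the verification text to the first non-empty text part. A minimal standalone sketch of that behavior follows. It assumes a simplified `Part` type and a hypothetical `injectContext` helper, neither of which appears in the commit; the real hook operates on `output.parts` in place and its part type is likely richer.

```typescript
// Hypothetical, simplified shape of a message part; the real type in the hook may differ.
type Part = { type: string; text?: string }

// Sketch of the injection step: find the first non-empty text part,
// substitute the placeholders, then append context plus enforcement text.
// Returns true when a part was modified, false when no text part was found.
function injectContext(
  parts: Part[],
  sessionId: string,
  timestamp: string,
  contextInfo: string,
  verificationEnforcement: string,
): boolean {
  const idx = parts.findIndex((p) => p.type === "text" && Boolean(p.text))
  if (idx < 0) return false

  const part = parts[idx]
  part.text = (part.text ?? "")
    .replace(/\$SESSION_ID/g, sessionId)
    .replace(/\$TIMESTAMP/g, timestamp)
  // Same concatenation order as the added line in the diff.
  part.text += `\n\n---\n${contextInfo}${verificationEnforcement}`
  return true
}

// Usage: only the first non-empty text part is touched.
const demo: Part[] = [{ type: "tool" }, { type: "text", text: "session $SESSION_ID" }]
injectContext(demo, "ses_demo", "2026-01-06T13:16:51+09:00", "CONTEXT", "\n---\nVERIFY")
```

Because `verificationEnforcement` in the commit begins with a newline and its own `---` rule, it can be concatenated directly after `contextInfo` without an extra separator.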