feat(hooks): add mandatory hands-on verification enforcement for orchestrated tasks

- sisyphus-orchestrator: Add verification reminder with tool matrix (playwright/interactive_bash/curl)
- start-work: Inject detailed verification workflow with deliverable-specific guidance

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
2026-01-06 13:16:51 +09:00
parent 7567c40a81
commit 39e92b1900
2 changed files with 66 additions and 2 deletions
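The `start-work` change works by string concatenation. The hook finds the first text part with non-empty text, substitutes `$SESSION_ID` and `$TIMESTAMP`, and appends a `---` separator, `contextInfo`, and the new `verificationEnforcement` block. The sketch below shows that flow as a standalone function. `Part` and `injectContext` are hypothetical names; the real hook mutates `output.parts` in place and its types are not shown in the diff. One deliberate deviation: the sketch passes replacer *functions* to `String.prototype.replace`, so a session ID containing `$&` or `$1` is inserted literally. The original passes plain strings, which would interpret those patterns.

```typescript
// Hypothetical minimal shape of a message part (the hook's real types are not in the diff).
interface Part {
  type: string
  text?: string
}

// Sketch of the start-work injection: fill placeholders in the first non-empty
// text part, then append the context info followed by the verification block.
// Returns false when no suitable text part exists.
function injectContext(
  parts: Part[],
  sessionId: string,
  timestamp: string,
  contextInfo: string,
  verificationEnforcement: string,
): boolean {
  const idx = parts.findIndex((p) => p.type === "text" && p.text)
  if (idx < 0) return false
  const part = parts[idx]
  // Redundant with the findIndex predicate, but narrows `text` to string for the compiler.
  if (!part.text) return false

  // Replacer functions return their value verbatim, so "$&" etc. are not expanded.
  part.text = part.text
    .replace(/\$SESSION_ID/g, () => sessionId)
    .replace(/\$TIMESTAMP/g, () => timestamp)

  part.text += `\n\n---\n${contextInfo}${verificationEnforcement}`
  return true
}
```

For example, given `[{ type: "image" }, { type: "text", text: "Session $SESSION_ID" }]`, only the second part is rewritten and extended. Empty text parts are skipped, matching the original `p.text` truthiness check.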
--- a/src/hooks/sisyphus-orchestrator/index.ts
+++ b/src/hooks/sisyphus-orchestrator/index.ts
@@ -42,7 +42,19 @@ Subagents FREQUENTLY claim completion when:
 5. Verify notepad was updated - Must have substantive content

 DO NOT TRUST THE AGENT'S SELF-REPORT.
-VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.`
+They are non-deterministic and fallible: they CANNOT reliably distinguish between completed and incomplete states.
+VERIFY EACH CLAIM WITH YOUR OWN TOOL CALLS.
+
+**HANDS-ON QA REQUIRED (after ALL tasks complete):**
+
+| Deliverable Type | Verification Tool | Action |
+|------------------|-------------------|--------|
+| **Frontend/UI** | \`/playwright\` skill | Navigate, interact, screenshot evidence |
+| **TUI/CLI** | \`interactive_bash\` (tmux) | Run interactively, verify output |
+| **API/Backend** | \`bash\` with curl | Send requests, verify responses |
+
+Static analysis CANNOT catch: visual bugs, animation issues, user flow breakages, integration problems.
+**FAILURE TO DO HANDS-ON QA = INCOMPLETE WORK.**`

 function buildOrchestratorReminder(planName: string, progress: { total: number; completed: number }): string {
  const remaining = progress.total - progress.completed
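The orchestrator's HANDS-ON QA table is plain prompt text, and the hook never parses it. As an illustration only, the same deliverable-to-tool mapping can be modeled as a typed lookup table. The model makes the matrix explicit: each deliverable type maps to exactly one verification tool. `DeliverableType`, `VERIFICATION_MATRIX`, and `requiredChecks` are hypothetical names that do not exist in the hook. The tool and action strings are copied from the table's three rows.

```typescript
// Hypothetical union of the deliverable types listed in the reminder table.
type DeliverableType = "frontend" | "tui-cli" | "api-backend"

interface VerificationStep {
  tool: string
  action: string
}

// Mirrors the three rows of the HANDS-ON QA table in the orchestrator reminder.
const VERIFICATION_MATRIX: Record<DeliverableType, VerificationStep> = {
  frontend: { tool: "/playwright skill", action: "Navigate, interact, screenshot evidence" },
  "tui-cli": { tool: "interactive_bash (tmux)", action: "Run interactively, verify output" },
  "api-backend": { tool: "bash with curl", action: "Send requests, verify responses" },
}

// Returns the checks needed for a set of deliverables, deduplicated and in
// first-seen order, so a plan touching the UI twice still gets one Playwright pass.
function requiredChecks(deliverables: DeliverableType[]): VerificationStep[] {
  const seen = new Set<DeliverableType>()
  const steps: VerificationStep[] = []
  for (const d of deliverables) {
    if (!seen.has(d)) {
      seen.add(d)
      steps.push(VERIFICATION_MATRIX[d])
    }
  }
  return steps
}
```

Using `Record<DeliverableType, ...>` makes the compiler reject a matrix that omits a deliverable type. A prose table gives no such guarantee.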
--- a/src/hooks/start-work/index.ts
+++ b/src/hooks/start-work/index.ts
@@ -126,13 +126,65 @@ Which plan would you like to work on? Reply with the number or plan name.`
        }
      }

+      const verificationEnforcement = `
+
+---
+
+## MANDATORY VERIFICATION ENFORCEMENT (NON-NEGOTIABLE)
+
+**CRITICAL: You MUST perform hands-on verification after completing ALL tasks. Static analysis alone is NOT sufficient.**
+
+### Verification by Deliverable Type
+
+| Type | Tool | How to Verify |
+|------|------|---------------|
+| **Frontend/UI** | \`/playwright\` skill | Navigate, click, verify visual state, take screenshots |
+| **TUI/CLI** | \`interactive_bash\` (tmux) | Run commands interactively, verify output |
+| **API/Backend** | \`bash\` with curl/httpie | Send requests, verify responses |
+| **Library/Module** | REPL via \`interactive_bash\` | Import, call functions, verify results |
+
+### Verification Workflow
+
+1. **After ALL tasks complete** (not after each task):
+   - Start dev server if needed: \`bun run dev\` / \`npm run dev\`
+   - Wait for server to be ready
+
+2. **For Frontend changes**:
+   \`\`\`
+   Load /playwright skill → Navigate to page → Interact with UI → Verify expected behavior → Screenshot evidence
+   \`\`\`
+
+3. **For TUI/CLI changes**:
+   \`\`\`
+   interactive_bash(tmux_command="new-session -d -s qa") → send-keys with commands → capture-pane output → verify
+   \`\`\`
+
+4. **Evidence required**:
+   - Screenshots for visual changes (saved to \`.sisyphus/evidence/\`)
+   - Terminal output for CLI changes
+   - Response bodies for API changes
+
+### What Static Analysis CANNOT Catch
+
+- Visual rendering issues (wrong colors, broken layouts)
+- Animation/transition bugs
+- Race conditions in UI interactions
+- User flow breakages
+- Integration issues between components
+
+### FAILURE TO VERIFY = INCOMPLETE WORK
+
+**Do NOT mark tasks complete or report "done" without hands-on verification.**
+If you skip this step, the user will find bugs you could have caught.
+`
+
      const idx = output.parts.findIndex((p) => p.type === "text" && p.text)
      if (idx >= 0 && output.parts[idx].text) {
        output.parts[idx].text = output.parts[idx].text
          .replace(/\$SESSION_ID/g, sessionId)
          .replace(/\$TIMESTAMP/g, timestamp)
        
-        output.parts[idx].text += `\n\n---\n${contextInfo}`
+        output.parts[idx].text += `\n\n---\n${contextInfo}${verificationEnforcement}`
      }

      log(`[${HOOK_NAME}] Context injected`, {