From 4fd9f0fd04c3c74405565128ee1995a6e17f9b14 Mon Sep 17 00:00:00 2001
From: justsisyphus
Date: Wed, 28 Jan 2026 21:03:50 +0900
Subject: [PATCH] refactor(agents): enforce zero user intervention in QA/acceptance criteria

- Prometheus: rename 'Manual QA' to 'Automated Verification Only'
- Prometheus: add explicit ZERO USER INTERVENTION principle
- Prometheus: replace placeholder examples with concrete executable commands
- Metis: add QA automation directives in output format
- Metis: strengthen CRITICAL RULES to forbid user-intervention criteria
---
 src/agents/metis.ts             |  29 +++++++
 src/agents/prometheus-prompt.ts | 139 ++++++++++++++++++++------------
 2 files changed, 115 insertions(+), 53 deletions(-)

diff --git a/src/agents/metis.ts b/src/agents/metis.ts
index 5e14e41f6..36d6110bf 100644
--- a/src/agents/metis.ts
+++ b/src/agents/metis.ts
@@ -230,6 +230,8 @@ call_omo_agent(subagent_type="librarian", prompt="Find OSS implementations of Z.
 - [Risk 2]: [Mitigation]

 ## Directives for Prometheus
+
+### Core Directives
 - MUST: [Required action]
 - MUST: [Required action]
 - MUST NOT: [Forbidden action]
@@ -237,6 +239,29 @@ call_omo_agent(subagent_type="librarian", prompt="Find OSS implementations of Z.
 - PATTERN: Follow \`[file:lines]\`
 - TOOL: Use \`[specific tool]\` for [purpose]

+### QA/Acceptance Criteria Directives (MANDATORY)
+> **ZERO USER INTERVENTION PRINCIPLE**: All acceptance criteria MUST be executable by agents.
+
+- MUST: Write acceptance criteria as executable commands (curl, bun test, playwright actions)
+- MUST: Include exact expected outputs, not vague descriptions
+- MUST: Specify verification tool for each deliverable type (playwright for UI, curl for API, etc.)
+- MUST NOT: Create criteria requiring "user manually tests..."
+- MUST NOT: Create criteria requiring "user visually confirms..."
+- MUST NOT: Create criteria requiring "user clicks/interacts..."
+- MUST NOT: Use placeholders without concrete examples (bad: "[endpoint]", good: "/api/users")
+
+Example of GOOD acceptance criteria:
+\`\`\`
+curl -s http://localhost:3000/api/health | jq '.status'
+# Assert: Output is "ok"
+\`\`\`
+
+Example of BAD acceptance criteria (FORBIDDEN):
+\`\`\`
+User opens browser and checks if the page loads correctly.
+User confirms the button works as expected.
+\`\`\`
+
 ## Recommended Approach
 [1-2 sentence summary of how to proceed]
 \`\`\`
@@ -263,12 +288,16 @@ call_omo_agent(subagent_type="librarian", prompt="Find OSS implementations of Z.
 - Ask generic questions ("What's the scope?")
 - Proceed without addressing ambiguity
 - Make assumptions about user's codebase
+- Suggest acceptance criteria requiring user intervention ("user manually tests", "user confirms", "user clicks")
+- Leave QA/acceptance criteria vague or placeholder-heavy

 **ALWAYS**:
 - Classify intent FIRST
 - Be specific ("Should this change UserService only, or also AuthService?")
 - Explore before asking (for Build/Research intents)
 - Provide actionable directives for Prometheus
+- Include QA automation directives in every output
+- Ensure acceptance criteria are agent-executable (commands, not human actions)
 `

 const metisRestrictions = createAgentToolRestrictions([
diff --git a/src/agents/prometheus-prompt.ts b/src/agents/prometheus-prompt.ts
index 7555202bf..168c03850 100644
--- a/src/agents/prometheus-prompt.ts
+++ b/src/agents/prometheus-prompt.ts
@@ -953,27 +953,37 @@ Each TODO follows RED-GREEN-REFACTOR:
 - Example: Create \`src/__tests__/example.test.ts\`
 - Verify: \`bun test\` → 1 test passes

-### If Manual QA Only
+### If Automated Verification Only (NO User Intervention)

-**CRITICAL**: Without automated tests, manual verification MUST be exhaustive.
+> **CRITICAL PRINCIPLE: ZERO USER INTERVENTION**
+>
+> **NEVER** create acceptance criteria that require:
+> - "User manually tests..." / "사용자가 직접 테스트..."
+> - "User visually confirms..." / "사용자가 눈으로 확인..."
+> - "User interacts with..." / "사용자가 직접 조작..."
+> - "Ask user to verify..." / "사용자에게 확인 요청..."
+> - ANY step that requires a human to perform an action
+>
+> **ALL verification MUST be automated and executable by the agent.**
+> If a verification cannot be automated, find an automated alternative or explicitly note it as a known limitation.

-Each TODO includes detailed verification procedures:
+Each TODO includes EXECUTABLE verification procedures that agents can run directly:

 **By Deliverable Type:**

-| Type | Verification Tool | Procedure |
-|------|------------------|-----------|
-| **Frontend/UI** | Playwright browser | Navigate, interact, screenshot |
-| **TUI/CLI** | interactive_bash (tmux) | Run command, verify output |
-| **API/Backend** | curl / httpie | Send request, verify response |
-| **Library/Module** | Node/Python REPL | Import, call, verify |
-| **Config/Infra** | Shell commands | Apply, verify state |
+| Type | Verification Tool | Automated Procedure |
+|------|------------------|---------------------|
+| **Frontend/UI** | Playwright browser via playwright skill | Agent navigates, clicks, screenshots, asserts DOM state |
+| **TUI/CLI** | interactive_bash (tmux) | Agent runs command, captures output, validates expected strings |
+| **API/Backend** | curl / httpie via Bash | Agent sends request, parses response, validates JSON fields |
+| **Library/Module** | Node/Python REPL via Bash | Agent imports, calls function, compares output |
+| **Config/Infra** | Shell commands via Bash | Agent applies config, runs state check, validates output |

-**Evidence Required:**
-- Commands run with actual output
-- Screenshots for visual changes
-- Response bodies for API changes
-- Terminal output for CLI changes
+**Evidence Requirements (Agent-Executable):**
+- Command output captured and compared against expected patterns
+- Screenshots saved to .sisyphus/evidence/ for visual verification
+- JSON response fields validated with specific assertions
+- Exit codes checked (0 = success)

 ---

@@ -1083,53 +1093,76 @@ Parallel Speedup: ~40% faster than sequential

 **Acceptance Criteria**:

- > CRITICAL: Acceptance = EXECUTION, not just "it should work".
- > The executor MUST run these commands and verify output.
+ > **CRITICAL: AGENT-EXECUTABLE VERIFICATION ONLY**
+ >
+ > - Acceptance = EXECUTION by the agent, not "user checks if it works"
+ > - Every criterion MUST be verifiable by running a command or using a tool
+ > - NO steps like "user opens browser", "user clicks", "user confirms"
+ > - If you write "[placeholder]" - REPLACE IT with actual values based on task context

 **If TDD (tests enabled):**
- - [ ] Test file created: \`[path].test.ts\`
- - [ ] Test covers: [specific scenario]
- - [ ] \`bun test [file]\` → PASS (N tests, 0 failures)
+ - [ ] Test file created: src/auth/login.test.ts
+ - [ ] Test covers: successful login returns JWT token
+ - [ ] bun test src/auth/login.test.ts → PASS (3 tests, 0 failures)

- **Manual Execution Verification (ALWAYS include, even with tests):**
+ **Automated Verification (ALWAYS include, choose by deliverable type):**

- *Choose based on deliverable type:*
+ **For Frontend/UI changes** (using playwright skill):
+ \\\`\\\`\\\`
+ # Agent executes via playwright browser automation:
+ 1. Navigate to: http://localhost:3000/login
+ 2. Fill: input[name="email"] with "test@example.com"
+ 3. Fill: input[name="password"] with "password123"
+ 4. Click: button[type="submit"]
+ 5. Wait for: selector ".dashboard-welcome" to be visible
+ 6. Assert: text "Welcome back" appears on page
+ 7. Screenshot: .sisyphus/evidence/task-1-login-success.png
+ \\\`\\\`\\\`

- **For Frontend/UI changes:**
- - [ ] Using playwright browser automation:
- - Navigate to: \`http://localhost:[port]/[path]\`
- - Action: [click X, fill Y, scroll to Z]
- - Verify: [visual element appears, animation completes, state changes]
- - Screenshot: Save evidence to \`.sisyphus/evidence/[task-id]-[step].png\`
+ **For TUI/CLI changes** (using interactive_bash):
+ \\\`\\\`\\\`
+ # Agent executes via tmux session:
+ 1. Command: ./my-cli --config test.yaml
+ 2. Wait for: "Configuration loaded" in output
+ 3. Send keys: "q" to quit
+ 4. Assert: Exit code 0
+ 5. Assert: Output contains "Goodbye"
+ \\\`\\\`\\\`

- **For TUI/CLI changes:**
- - [ ] Using interactive_bash (tmux session):
- - Command: \`[exact command to run]\`
- - Input sequence: [if interactive, list inputs]
- - Expected output contains: \`[expected string or pattern]\`
- - Exit code: [0 for success, specific code if relevant]
+ **For API/Backend changes** (using Bash curl):
+ \\\`\\\`\\\`bash
+ # Agent runs:
+ curl -s -X POST http://localhost:8080/api/users \\
+ -H "Content-Type: application/json" \\
+ -d '{"email":"new@test.com","name":"Test User"}' \\
+ | jq '.id'
+ # Assert: Returns non-empty UUID
+ # Assert: HTTP status 201
+ \\\`\\\`\\\`

- **For API/Backend changes:**
- - [ ] Request: \`curl -X [METHOD] http://localhost:[port]/[endpoint] -H "Content-Type: application/json" -d '[body]'\`
- - [ ] Response status: [200/201/etc]
- - [ ] Response body contains: \`{"key": "expected_value"}\`
+ **For Library/Module changes** (using Bash node/bun):
+ \\\`\\\`\\\`bash
+ # Agent runs:
+ bun -e "import { validateEmail } from './src/utils/validate'; console.log(validateEmail('test@example.com'))"
+ # Assert: Output is "true"
+
+ bun -e "import { validateEmail } from './src/utils/validate'; console.log(validateEmail('invalid'))"
+ # Assert: Output is "false"
+ \\\`\\\`\\\`

- **For Library/Module changes:**
- - [ ] REPL verification:
- \`\`\`
- > import { [function] } from '[module]'
- > [function]([args])
- Expected: [output]
- \`\`\`
+ **For Config/Infra changes** (using Bash):
+ \\\`\\\`\\\`bash
+ # Agent runs:
+ docker compose up -d
+ # Wait 5s for containers
+ docker compose ps --format json | jq '.[].State'
+ # Assert: All states are "running"
+ \\\`\\\`\\\`

- **For Config/Infra changes:**
- - [ ] Apply: \`[command to apply config]\`
- - [ ] Verify state: \`[command to check state]\` → \`[expected output]\`
-
- **Evidence Required:**
- - [ ] Command output captured (copy-paste actual terminal output)
- - [ ] Screenshot saved (for visual changes)
- - [ ] Response body logged (for API changes)
+ **Evidence to Capture:**
+ - [ ] Terminal output from verification commands (actual output, not expected)
+ - [ ] Screenshot files in .sisyphus/evidence/ for UI changes
+ - [ ] JSON response bodies for API changes

 **Commit**: YES | NO (groups with N)
 - Message: \`type(scope): desc\`
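
Note (not part of the patch): the MUST NOT rules added to both prompts are enforced only by instruction, so a generated plan could still contain phrases like "user confirms" or an unresolved "[endpoint]". A minimal sketch of how such criteria might be checked mechanically follows. The helper name `lintAcceptanceCriteria`, the `CriterionIssue` type, and both regular expressions are hypothetical and illustrative only; nothing in the commit defines or calls them.

```typescript
// Hypothetical helper, NOT part of this patch: flags acceptance-criteria lines
// that appear to violate the MUST NOT rules introduced by the commit.

interface CriterionIssue {
  line: number;   // 1-based line number within the criteria text
  text: string;   // the offending line, trimmed
  reason: string; // which rule the line appears to violate
}

// Phrases modelled on the forbidden examples quoted in the prompts (illustrative list).
const USER_INTERVENTION =
  /\buser\s+(manually\s+tests?|visually\s+confirms?|confirms?|clicks?|interacts?|opens?\s+browser)\b/i;

// Bracketed placeholders such as "[endpoint]" or "[port]". Checkbox markers
// ("[ ]", "[x]") and CSS attribute selectors (input[name="email"]) are not matched.
const PLACEHOLDER = /(^|\W)\[(?!x\])[a-z][^\]\n=]*\]/i;

function lintAcceptanceCriteria(criteria: string): CriterionIssue[] {
  const issues: CriterionIssue[] = [];
  criteria.split("\n").forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // skip blank lines
    if (USER_INTERVENTION.test(text)) {
      issues.push({ line: index + 1, text, reason: "requires user intervention" });
    }
    if (PLACEHOLDER.test(text)) {
      issues.push({ line: index + 1, text, reason: "unresolved placeholder" });
    }
  });
  return issues;
}

// Usage: the GOOD and BAD examples from the Metis prompt, plus one placeholder line.
const good = [
  "curl -s http://localhost:3000/api/health | jq '.status'",
  '# Assert: Output is "ok"',
].join("\n");

const bad = [
  "User opens browser and checks if the page loads correctly.",
  "User confirms the button works as expected.",
  "curl -X POST http://localhost:[port]/[endpoint]",
].join("\n");

console.log(lintAcceptanceCriteria(good).length);
console.log(lintAcceptanceCriteria(bad).map((issue) => `${issue.line}: ${issue.reason}`));
```

A phrase list like this is deliberately narrow: it catches the wording the prompts name explicitly, but rephrased human steps would pass, so it could complement the prompt rules rather than replace them.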