Compare commits

...

36 Commits

Author SHA1 Message Date
YeonGyu-Kim
d46946c85f fix(background-agent): keep stale-pruned tasks through notification cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-11 18:01:23 +09:00
YeonGyu-Kim
3b588283b1 fix(background-agent): skip terminal tasks during stale pruning
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-08 02:13:49 +09:00
YeonGyu-Kim
816e46a967 fix(background-agent): keep terminal tasks until parent notification cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-08 02:13:43 +09:00
YeonGyu-Kim
f3be710a73 release: v3.11.0 2026-03-08 01:59:20 +09:00
YeonGyu-Kim
01efda454f feat(model-requirements): set multimodal-looker primary model to gpt-5.4 medium
Change multimodal-looker's primary model from gpt-5.3-codex to gpt-5.4 medium
in both runtime and CLI fallback chains.

Changes:
- Runtime chain (src/shared/model-requirements.ts): primary now gpt-5.4
- CLI chain (src/cli/model-fallback-requirements.ts): primary now gpt-5.4
- Updated test expectations in model-requirements.test.ts
- Updated config-manager.test.ts assertion
- Updated model-fallback snapshots
2026-03-08 01:53:30 +09:00
YeonGyu-Kim
60bc9a7609 feat(model-requirements): add k2p5, kimi-k2.5, gpt-5.4 medium to Sisyphus fallback chain
Sisyphus can now fall back through Kimi and OpenAI models when Claude
is unavailable, enabling OpenAI-only users to use Sisyphus directly
instead of being redirected to Hephaestus.

Runtime chain: claude-opus-4-6 max → k2p5 → kimi-k2.5 → gpt-5.4 medium → glm-5 → big-pickle
CLI chain: claude-opus-4-6 max → k2p5 → gpt-5.4 medium → glm-5
2026-03-08 01:41:45 +09:00
YeonGyu-Kim
bf8d0ffcc0 fix(atlas): enforce checkbox completion before next task
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-08 01:41:45 +09:00
YeonGyu-Kim
532143c5f4 feat(delegate-task): use explicit high variant for unspecified-high category
- Update DEFAULT_CATEGORIES to use 'openai/gpt-5.4-high' directly instead of separate model + variant
- Add helper functions (isExplicitHighModel, getExplicitHighBaseModel) to preserve explicit high models during fuzzy matching
- Update category resolver to avoid collapsing explicit high models to base model + variant pair
- Update tests to verify explicit high model handling in both background and sync modes
- Update documentation examples to reflect new configuration

🤖 Generated with OhMyOpenCode assistance
2026-03-08 01:41:45 +09:00
github-actions[bot]
5e86b22cee @hobostay has signed the CLA in code-yeongyu/oh-my-opencode#2360 2026-03-07 13:54:05 +00:00
github-actions[bot]
6660590276 @rluisr has signed the CLA in code-yeongyu/oh-my-opencode#2352 2026-03-07 07:47:56 +00:00
YeonGyu-Kim
b3ef86c574 fix(atlas): skip compaction in last-agent recovery
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 15:39:25 +09:00
YeonGyu-Kim
e193002775 fix(plugin): ignore compaction session agent updates
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 15:39:25 +09:00
acamq
f5f996983e Merge pull request #2252 from acamq/fix/librarian-exa-name
fix: correct librarian agent tool name from websearch_exa_web_search_exa to websearch_web_search_exa
2026-03-06 22:11:42 -07:00
acamq
b717d26880 Merge pull request #2278 from MoerAI/fix/tmux-health-check-url
fix(tmux): use correct health check endpoint /global/health
2026-03-06 21:37:09 -07:00
acamq
51de6f18ee Merge pull request #2334 from devxoul/fix/flaky-background-task-test
fix(test): fix flaky late-session-id background task test
2026-03-06 20:48:50 -07:00
acamq
2ae63ca590 Merge pull request #2350 from wousp112/fix/git-plugin-prepare
fix(install): build dist for git-based plugin installs
2026-03-06 20:13:46 -07:00
github-actions[bot]
a245abe07b @wousp112 has signed the CLA in code-yeongyu/oh-my-opencode#2350 2026-03-06 23:14:57 +00:00
YeonGyu-Kim
58052984ff remove trash 2026-03-07 06:42:58 +09:00
YeonGyu-Kim
58d4f8b40a Revert "Merge pull request #2339 from JimMoen/fix/external-directory-default-ask"
This reverts commit 8a1352fc9b, reversing
changes made to d08bc04e67.
2026-03-07 06:40:19 +09:00
wousp112
f6d8d44aba fix(install): build dist for git-based plugin installs
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 21:25:51 +00:00
YeonGyu-Kim
8ec2c44615 fix(ulw-loop): retry parent session after failed verification
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 05:46:05 +09:00
YeonGyu-Kim
fade6740ae chore: update GPT-5.2 references to GPT-5.4
Align runtime defaults, tests, docs, and generated artifacts with the newer GPT-5.4 baseline. Keep think-mode and prompt-routing expectations consistent after the model version bump.
2026-03-07 05:46:05 +09:00
acamq
8a1352fc9b Merge pull request #2339 from JimMoen/fix/external-directory-default-ask
fix(tool-config): stop overriding external_directory permission
2026-03-06 13:40:56 -07:00
YeonGyu-Kim
d08bc04e67 feat(sisyphus): strengthen non-Claude parallel delegation guidance
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-07 00:47:55 +09:00
YeonGyu-Kim
fa460469f0 feat(sisyphus): rewrite GPT-5.4 prompt with 8-block architecture
Restructure from 13 scattered XML blocks to 8 dense blocks with 9
named sub-anchors, following OpenAI GPT-5.4 prompting guidance and
Oracle-reviewed context preservation strategy.

Key changes:
- Merge think_first + intent_gate + autonomy into unified <intent>
  with domain_guess classification and <ask_gate> sub-anchor
- Add <execution_loop> as central workflow: EXPLORE -> PLAN -> ROUTE ->
  EXECUTE_OR_SUPERVISE -> VERIFY -> RETRY -> DONE
- Add mandatory manual QA in <verification_loop> (conditional on
  runnable behavior)
- Move <constraints> to position #2 for GPT-5.4 attention pattern
- Add <completeness_contract> as explicit loop exit gate
- Add <output_contract> and <verbosity_controls> per GPT-5.4 guidance
- Add domain_guess (provisional) in intent, finalized in ROUTE after
  exploration -- visual domain always routes to visual-engineering
- Preserve all named sub-anchors: ask_gate, tool_persistence,
  parallel_tools, tool_method, dependency_checks, verification_loop,
  failure_recovery, completeness_contract
- Add skill loading emphasis at intent/route/delegation layers
- Rename EXECUTE to EXECUTE_OR_SUPERVISE to preserve orchestrator
  identity with non-execution exits (answer/ask/challenge)
2026-03-07 00:43:01 +09:00
YeonGyu-Kim
20b185b59f fix(task): append plan delegation prompt requirements
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:56:51 +09:00
YeonGyu-Kim
898b628d3d fix(ulw-loop): track Oracle verification sessions explicitly
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-06 22:37:41 +09:00
YeonGyu-Kim
9778cc6c98 feat(ultrawork): enforce manual QA execution and acceptance criteria workflow
Add MANUAL_QA_MANDATE sections to all three ultrawork prompts (default,
GPT, Gemini). Agents must now define acceptance criteria in TODO/Task items
before implementation, then execute manual QA themselves after completing
work. lsp_diagnostics alone is explicitly called out as insufficient since
it only catches type errors, not functional bugs.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:33:42 +09:00
YeonGyu-Kim
2e7b7c1f55 feat(prompts): enforce category domain matching and design-system-first workflow
Remove deep parallel delegation section from GPT-5.4 Sisyphus prompt since
it encouraged direct implementation over orchestration. Add zero-tolerance
category domain matching guide to all Sisyphus prompts with visual-engineering
examples. Rewrite visual-engineering category prompt with 4-phase mandatory
workflow (analyze design system, create if missing, build with system, verify)
targeting Gemini's tendency to skip foundational steps.
2026-03-06 22:19:18 +09:00
YeonGyu-Kim
c17f7215f2 test(ulw-loop): cover Oracle verification flow
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:00:21 +09:00
YeonGyu-Kim
a010de1db2 feat(ulw-loop): require Oracle verification before completion
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:00:14 +09:00
YeonGyu-Kim
c3f2198d34 feat(gpt-5.4): amplify parallel tool-calling with XML behavioral contracts
Add <parallel_tool_calling> and <tool_usage_rules> blocks that GPT-5.4
treats as first-class behavioral contracts. Add parallel-planning question
to <think_first>, strengthen Exploratory route in intent gate, and add
IN PARALLEL annotations to verification loop.
2026-03-06 21:09:30 +09:00
JimMoen
a1ca658d76 fix(tool-config): stop overriding external_directory permission
Remove the hardcoded external_directory: "allow" default from
applyToolConfig(). This was silently overriding OpenCode's built-in
default of "ask" and any user-configured external_directory permission.

With this change, external_directory permission is fully controlled by
OpenCode's defaults and user configuration, as intended.

Fixes #1973
Fixes #2194
2026-03-06 17:58:08 +08:00
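The fix above removes a hardcoded default rather than adding logic. A before/after sketch of the idea — the function and type names are hypothetical stand-ins for `applyToolConfig()`, whose real shape may differ:

```typescript
// Illustrative before/after: stop injecting a hardcoded external_directory
// value so OpenCode's built-in "ask" default (or the user's own setting)
// survives. Names here are assumptions, not the actual implementation.
type Permissions = { external_directory?: "allow" | "ask" | "deny" };

// Before: unconditionally forced "allow", silently clobbering everything.
function applyToolConfigBuggy(user: Permissions): Permissions {
  return { ...user, external_directory: "allow" };
}

// After: leave the key alone; defaults and user config control it.
function applyToolConfigFixed(user: Permissions): Permissions {
  return { ...user };
}

applyToolConfigBuggy({ external_directory: "ask" }); // → { external_directory: "allow" } (bug)
applyToolConfigFixed({ external_directory: "ask" }); // → { external_directory: "ask" }
```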
Jeon Suyeol
1429ae1505 fix(test): increase poll timeout to fix flaky late-session-id test
WAIT_FOR_SESSION_TIMEOUT_MS of 2ms was too tight for 2 poll iterations
at 1ms intervals — setTimeout precision caused the budget to expire
before the 2nd getTask call. Bumped to 50ms.
2026-03-06 12:16:49 +09:00
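The race described above — a poll budget too tight for timer jitter — can be sketched as follows. Names are illustrative; the real helper lives in the background-task test suite:

```typescript
// Illustrative poll loop: with a 2ms budget and 1ms intervals, setTimeout
// jitter can exhaust the deadline before the second poll runs. A 50ms
// budget leaves ample slack. waitForSession's shape here is hypothetical.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForSession(
  getTask: () => string | undefined,
  timeoutMs: number,
  pollIntervalMs = 1,
): Promise<string | undefined> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const sessionId = getTask();
    if (sessionId !== undefined) return sessionId; // session arrived
    if (Date.now() >= deadline) return undefined;  // budget exhausted
    await sleep(pollIntervalMs);
  }
}
```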
MoerAI
d6fe9aa123 fix(tmux): use correct health check endpoint /global/health
The server health check was using /health which returns HTTP 403 since
the endpoint doesn't exist in OpenCode. The correct endpoint is
/global/health as defined in OpenCode's server routes.

Fixes #2260
2026-03-04 10:17:12 +09:00
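A minimal sketch of the corrected check — the endpoint path comes from the commit, while the helper names and base URL are assumptions for illustration:

```typescript
// Sketch of the corrected health check: the OpenCode server exposes
// /global/health, while /health returns HTTP 403. Helper names are
// illustrative, not the plugin's actual API.
function healthCheckUrl(baseUrl: string): string {
  return new URL("/global/health", baseUrl).toString();
}

async function isServerHealthy(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(healthCheckUrl(baseUrl));
    return res.ok; // the old /health path would have yielded 403 -> false
  } catch {
    return false; // server unreachable
  }
}

healthCheckUrl("http://127.0.0.1:4096"); // → "http://127.0.0.1:4096/global/health"
```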
acamq
c69344686c fix: correct librarian agent tool name from websearch_exa_web_search_exa to websearch_web_search_exa
The librarian agent's system prompt contained incorrect example function
names for the Exa web search tool, causing the agent to call a non-existent
tool 'websearch_exa_web_search_exa' instead of the correct
'websearch_web_search_exa'.

Fixes #2242
2026-03-02 09:17:43 -07:00
135 changed files with 3243 additions and 1011 deletions

View File

@@ -1,61 +0,0 @@
[sisyphus-bot]
## Confirmed Bug
We have identified the root cause of this issue. The bug is in the config writing logic during installation.
### Root Cause
**File:** `src/cli/config-manager/write-omo-config.ts` (line 46)
```typescript
const merged = deepMergeRecord(existing, newConfig)
```
When a user runs `oh-my-opencode install` (even just to update settings), the installer:
1. Reads the existing config (with user's custom model settings)
2. Generates a **new** config based on detected provider availability
3. Calls `deepMergeRecord(existing, newConfig)`
4. Writes the result back
**The problem:** `deepMergeRecord` overwrites values in `existing` with values from `newConfig`. This means your custom `"model": "openai/gpt-5.2-codex"` gets overwritten by the generated default model (e.g., `anthropic/claude-opus-4-6` if Claude is available).
### Why This Happens
Looking at `deepMergeRecord` (line 24-25):
```typescript
} else if (sourceValue !== undefined) {
result[key] = sourceValue as TTarget[keyof TTarget]
}
```
Any defined value in the source (generated config) overwrites the target (user's config).
### Fix Approach
The merge direction should be reversed to respect user overrides:
```typescript
const merged = deepMergeRecord(newConfig, existing)
```
This ensures:
- User's explicit settings take precedence
- Only new/undefined keys get populated from generated defaults
- Custom model choices are preserved
### SEVERITY: HIGH
- **Impact:** User configuration is overwritten without consent
- **Affected Files:**
- `src/cli/config-manager/write-omo-config.ts`
- `src/cli/config-manager/deep-merge-record.ts`
- **Trigger:** Running `oh-my-opencode install` (even for unrelated updates)
### Workaround (Until Fix)
Backup your config before running install:
```bash
cp ~/.config/opencode/oh-my-opencode.jsonc ~/.config/opencode/oh-my-opencode.jsonc.backup
```
We're working on a fix that will preserve your explicit model configurations.

View File

@@ -64,8 +64,8 @@ These agents have Claude-optimized prompts — long, detailed, mechanics-driven.
| Agent | Role | Fallback Chain | Notes |
| ------------ | ----------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Sisyphus** | Main orchestrator | Claude Opus → GLM 5 → Big Pickle | Claude-family first. GPT-5.4 has dedicated support, but Claude/Kimi/GLM remain the preferred fit. |
| **Metis** | Plan gap analyzer | Claude Opus → GPT-5.2 → Gemini 3.1 Pro | Claude preferred, GPT acceptable fallback. |
| **Sisyphus** | Main orchestrator | Claude Opus → K2P5 → Kimi K2.5 → GPT-5.4 → GLM 5 → Big Pickle | Claude-family first. GPT-5.4 has dedicated prompt support. Kimi/GLM as intermediate fallbacks. |
| **Metis** | Plan gap analyzer | Claude Opus → GPT-5.4 → Gemini 3.1 Pro | Claude preferred, GPT acceptable fallback. |
### Dual-Prompt Agents → Claude preferred, GPT supported
@@ -83,7 +83,7 @@ These agents are built for GPT's principle-driven style. Their prompts assume au
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------------- | -------------------------------------- | ------------------------------------------------ |
| **Hephaestus** | Autonomous deep worker | GPT-5.3 Codex only | No fallback. Requires GPT access. The craftsman. |
| **Oracle** | Architecture consultant | GPT-5.2 → Gemini 3.1 Pro → Claude Opus | Read-only high-IQ consultation. |
| **Oracle** | Architecture consultant | GPT-5.4 → Gemini 3.1 Pro → Claude Opus | Read-only high-IQ consultation. |
| **Momus** | Ruthless reviewer | GPT-5.4 → Claude Opus → Gemini 3.1 Pro | Verification and plan review. |
### Utility Runners → Speed over Intelligence
@@ -119,7 +119,7 @@ Principle-driven, explicit reasoning, deep technical capability. Best for agents
| Model | Strengths |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **GPT-5.3 Codex** | Deep coding powerhouse. Autonomous exploration. Required for Hephaestus. |
| **GPT-5.2** | High intelligence, strategic reasoning. Default for Oracle. |
| **GPT-5.4** | High intelligence, strategic reasoning. Default for Oracle. |
| **GPT-5.4** | Strong principle-driven reasoning. Default for Momus and a key fallback for Prometheus / Atlas. |
| **GPT-5-Nano** | Ultra-cheap, fast. Good for simple utility tasks. |
@@ -149,7 +149,7 @@ When agents delegate work, they don't pick a model name — they pick a **catego
| `visual-engineering` | Frontend, UI, CSS, design | Gemini 3.1 Pro → GLM 5 → Claude Opus |
| `ultrabrain` | Maximum reasoning needed | GPT-5.3 Codex → Gemini 3.1 Pro → Claude Opus |
| `deep` | Deep coding, complex logic | GPT-5.3 Codex → Claude Opus → Gemini 3.1 Pro |
| `artistry` | Creative, novel approaches | Gemini 3.1 Pro → Claude Opus → GPT-5.2 |
| `artistry` | Creative, novel approaches | Gemini 3.1 Pro → Claude Opus → GPT-5.4 |
| `quick` | Simple, fast tasks | Claude Haiku → Gemini Flash → GPT-5-Nano |
| `unspecified-high` | General complex work | GPT-5.4 → Claude Opus → GLM 5 → K2P5 |
| `unspecified-low` | General standard work | Claude Sonnet → GPT-5.3 Codex → Gemini Flash |
@@ -179,7 +179,7 @@ See the [Orchestration System Guide](./orchestration.md) for how agents dispatch
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Prometheus inherits sisyphus model; just add prompt guidance
"prometheus": {
@@ -190,7 +190,7 @@ See the [Orchestration System Guide](./orchestration.md) for how agents dispatch
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
"unspecified-high": { "model": "openai/gpt-5.4", "variant": "high" },
"unspecified-high": { "model": "openai/gpt-5.4-high" },
"visual-engineering": {
"model": "google/gemini-3.1-pro",
"variant": "high",

View File

@@ -49,7 +49,7 @@ Ask the user these questions to determine CLI options:
- If **no** → `--claude=no`
2. **Do you have an OpenAI/ChatGPT Plus Subscription?**
- If **yes** → `--openai=yes` (GPT-5.2 for Oracle agent)
- If **yes** → `--openai=yes` (GPT-5.4 for Oracle agent)
- If **no** → `--openai=no` (default)
3. **Will you integrate Gemini models?**
@@ -200,7 +200,7 @@ When GitHub Copilot is the best available provider, oh-my-opencode uses these mo
| Agent | Model |
| ------------- | --------------------------------- |
| **Sisyphus** | `github-copilot/claude-opus-4-6` |
| **Oracle** | `github-copilot/gpt-5.2` |
| **Oracle** | `github-copilot/gpt-5.4` |
| **Explore** | `github-copilot/grok-code-fast-1` |
| **Librarian** | `github-copilot/gemini-3-flash` |
@@ -228,7 +228,7 @@ When OpenCode Zen is the best available provider (no native or Copilot), these m
| Agent | Model |
| ------------- | ---------------------------------------------------- |
| **Sisyphus** | `opencode/claude-opus-4-6` |
| **Oracle** | `opencode/gpt-5.2` |
| **Oracle** | `opencode/gpt-5.4` |
| **Explore** | `opencode/gpt-5-nano` |
| **Librarian** | `opencode/minimax-m2.5-free` / `opencode/big-pickle` |
@@ -280,7 +280,7 @@ Not all models behave the same way. Understanding which models are "similar" hel
| Model | Provider(s) | Notes |
| ----------------- | -------------------------------- | ------------------------------------------------- |
| **GPT-5.3-codex** | openai, github-copilot, opencode | Deep coding powerhouse. Required for Hephaestus. |
| **GPT-5.2** | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| **GPT-5.4** | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| **GPT-5-Nano** | opencode | Ultra-cheap, fast. Good for simple utility tasks. |
**Different-Behavior Models**:
@@ -310,7 +310,7 @@ Based on your subscriptions, here's how the agents were configured:
| Agent | Role | Default Chain | What It Does |
| ------------ | ---------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Sisyphus** | Main ultraworker | Opus (max) → Kimi K2.5 → GLM 5 → Big Pickle | Primary coding agent. Orchestrates everything. **Never use GPT — no GPT prompt exists.** |
| **Metis** | Plan review | Opus (max) → Kimi K2.5 → GPT-5.2 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
| **Metis** | Plan review | Opus (max) → Kimi K2.5 → GPT-5.4 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
**Dual-Prompt Agents** (auto-switch between Claude and GPT prompts):
@@ -320,16 +320,16 @@ Priority: **Claude > GPT > Claude-like models**
| Agent | Role | Default Chain | GPT Prompt? |
| -------------- | ----------------- | ---------------------------------------------------------- | ---------------------------------------------------------------- |
| **Prometheus** | Strategic planner | Opus (max) → **GPT-5.2 (high)** → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | **Kimi K2.5** → Sonnet → GPT-5.2 | Yes — GPT-optimized todo management |
| **Prometheus** | Strategic planner | Opus (max) → **GPT-5.4 (high)** → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | **Kimi K2.5** → Sonnet → GPT-5.4 | Yes — GPT-optimized todo management |
**GPT-Native Agents** (built for GPT, don't override to Claude):
| Agent | Role | Default Chain | Notes |
| -------------- | ---------------------- | -------------------------------------- | ------------------------------------------------------ |
| **Hephaestus** | Deep autonomous worker | GPT-5.3-codex (medium) only | "Codex on steroids." No fallback. Requires GPT access. |
| **Oracle** | Architecture/debugging | GPT-5.2 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | GPT-5.2 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |
| **Oracle** | Architecture/debugging | GPT-5.4 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | GPT-5.4 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |
**Utility Agents** (speed over intelligence):
@@ -339,7 +339,7 @@ These agents do search, grep, and retrieval. They intentionally use fast, cheap
| --------------------- | ------------------ | ---------------------------------------------------------------------- | -------------------------------------------------------------- |
| **Explore** | Fast codebase grep | MiniMax M2.5 Free → Grok Code Fast → MiniMax M2.5 → Haiku → GPT-5-Nano | Speed is everything. Grok is blazing fast for grep. |
| **Librarian** | Docs/code search | MiniMax M2.5 Free → Gemini Flash → Big Pickle | Entirely free-tier. Doc retrieval doesn't need deep reasoning. |
| **Multimodal Looker** | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.2 → GLM-4.6v | Kimi excels at multimodal understanding. |
| **Multimodal Looker** | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.4 → GLM-4.6v | Kimi excels at multimodal understanding. |
#### Why Different Models Need Different Prompts
@@ -388,8 +388,8 @@ GPT (5.3-codex, 5.2) > Claude Opus (decent fallback) > Gemini (acceptable)
**Safe** (same family):
- Sisyphus: Opus → Sonnet, Kimi K2.5, GLM 5
- Prometheus: Opus → GPT-5.2 (auto-switches prompt)
- Atlas: Kimi K2.5 → Sonnet, GPT-5.2 (auto-switches)
- Prometheus: Opus → GPT-5.4 (auto-switches prompt)
- Atlas: Kimi K2.5 → Sonnet, GPT-5.4 (auto-switches)
**Dangerous** (no prompt support):

View File

@@ -45,7 +45,7 @@ flowchart TB
subgraph Workers["Worker Layer (Specialized Agents)"]
Junior[" Sisyphus-Junior<br/>(Task Executor)<br/>Claude Sonnet 4.6"]
Oracle[" Oracle<br/>(Architecture)<br/>GPT-5.2"]
Oracle[" Oracle<br/>(Architecture)<br/>GPT-5.4"]
Explore[" Explore<br/>(Codebase Grep)<br/>Grok Code"]
Librarian[" Librarian<br/>(Docs/OSS)<br/>Gemini 3 Flash"]
Frontend[" Frontend<br/>(UI/UX)<br/>Gemini 3.1 Pro"]

View File

@@ -182,7 +182,7 @@ You can override specific agents or categories in your config:
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
},
"categories": {
@@ -215,7 +215,7 @@ You can override specific agents or categories in your config:
**GPT models** (explicit reasoning, principle-driven):
- GPT-5.3-codex — deep coding powerhouse, required for Hephaestus
- GPT-5.2 — high intelligence, default for Oracle
- GPT-5.4 — high intelligence, default for Oracle
- GPT-5-Nano — ultra-cheap, fast utility tasks
**Different-behavior models**:

View File

@@ -83,8 +83,8 @@ Here's a practical starting configuration:
"librarian": { "model": "google/gemini-3-flash" },
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT-5.2 or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
// Architecture consultation: GPT-5.4 or Claude Opus
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Prometheus inherits sisyphus model; just add prompt guidance
"prometheus": {
@@ -100,7 +100,7 @@ Here's a practical starting configuration:
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
// unspecified-high — complex work
"unspecified-high": { "model": "openai/gpt-5.4", "variant": "high" },
"unspecified-high": { "model": "openai/gpt-5.4-high" },
// writing — docs/prose
"writing": { "model": "google/gemini-3-flash" },
@@ -268,13 +268,13 @@ Disable categories: `{ "disabled_categories": ["ultrabrain"] }`
| Agent | Default Model | Provider Priority |
| --------------------- | ------------------- | ---------------------------------------------------------------------------- |
| **Sisyphus** | `claude-opus-4-6` | `claude-opus-4-6` → `glm-5` → `big-pickle` |
| **Hephaestus** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gpt-5.2` (GitHub Copilot fallback) |
| **oracle** | `gpt-5.2` | `gpt-5.2` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **Hephaestus** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gpt-5.4` (GitHub Copilot fallback) |
| **oracle** | `gpt-5.4` | `gpt-5.4` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **librarian** | `gemini-3-flash` | `gemini-3-flash` → `minimax-m2.5-free` → `big-pickle` |
| **explore** | `grok-code-fast-1` | `grok-code-fast-1` → `minimax-m2.5-free` → `claude-haiku-4-5` → `gpt-5-nano` |
| **multimodal-looker** | `gpt-5.3-codex` | `gpt-5.3-codex` → `k2p5` → `gemini-3-flash` → `glm-4.6v` → `gpt-5-nano` |
| **Prometheus** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.4` → `gemini-3.1-pro` |
| **Metis** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.2` → `gemini-3.1-pro` |
| **Metis** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.4` → `gemini-3.1-pro` |
| **Momus** | `gpt-5.4` | `gpt-5.4` → `claude-opus-4-6` → `gemini-3.1-pro` |
| **Atlas** | `claude-sonnet-4-6` | `claude-sonnet-4-6` → `gpt-5.4` |
@@ -285,7 +285,7 @@ Disable categories: `{ "disabled_categories": ["ultrabrain"] }`
| **visual-engineering** | `gemini-3.1-pro` | `gemini-3.1-pro` → `glm-5` → `claude-opus-4-6` |
| **ultrabrain** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **deep** | `gpt-5.3-codex` | `gpt-5.3-codex` → `claude-opus-4-6` → `gemini-3.1-pro` |
| **artistry** | `gemini-3.1-pro` | `gemini-3.1-pro` → `claude-opus-4-6` → `gpt-5.2` |
| **artistry** | `gemini-3.1-pro` | `gemini-3.1-pro` → `claude-opus-4-6` → `gpt-5.4` |
| **quick** | `claude-haiku-4-5` | `claude-haiku-4-5` → `gemini-3-flash` → `gpt-5-nano` |
| **unspecified-low** | `claude-sonnet-4-6` | `claude-sonnet-4-6` → `gpt-5.3-codex` → `gemini-3-flash` |
| **unspecified-high** | `gpt-5.4` | `gpt-5.4` → `claude-opus-4-6` → `glm-5` → `k2p5` → `kimi-k2.5` |

View File

@@ -9,8 +9,8 @@ Oh-My-OpenCode provides 11 specialized AI agents. Each has distinct expertise, o
| Agent | Model | Purpose |
| --------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Sisyphus** | `claude-opus-4-6` | The default orchestrator. Plans, delegates, and executes complex tasks using specialized subagents with aggressive parallel execution. Todo-driven workflow with extended thinking (32k budget). Fallback: `glm-5` → `big-pickle`. |
| **Hephaestus** | `gpt-5.3-codex` | The Legitimate Craftsman. Autonomous deep worker inspired by AmpCode's deep mode. Goal-oriented execution with thorough research before action. Explores codebase patterns, completes tasks end-to-end without premature stopping. Named after the Greek god of forge and craftsmanship. Fallback: `gpt-5.2` on GitHub Copilot. Requires a GPT-capable provider. |
| **Oracle** | `gpt-5.2` | Architecture decisions, code review, debugging. Read-only consultation with stellar logical reasoning and deep analysis. Inspired by AmpCode. Fallback: `gemini-3.1-pro` → `claude-opus-4-6`. |
| **Hephaestus** | `gpt-5.3-codex` | The Legitimate Craftsman. Autonomous deep worker inspired by AmpCode's deep mode. Goal-oriented execution with thorough research before action. Explores codebase patterns, completes tasks end-to-end without premature stopping. Named after the Greek god of forge and craftsmanship. Fallback: `gpt-5.4` on GitHub Copilot. Requires a GPT-capable provider. |
| **Oracle** | `gpt-5.4` | Architecture decisions, code review, debugging. Read-only consultation with stellar logical reasoning and deep analysis. Inspired by AmpCode. Fallback: `gemini-3.1-pro` → `claude-opus-4-6`. |
| **Librarian** | `gemini-3-flash` | Multi-repo analysis, documentation lookup, OSS implementation examples. Deep codebase understanding with evidence-based answers. Fallback: `minimax-m2.5-free` → `big-pickle`. |
| **Explore** | `grok-code-fast-1` | Fast codebase exploration and contextual grep. Fallback: `minimax-m2.5-free` → `claude-haiku-4-5` → `gpt-5-nano`. |
| **Multimodal-Looker** | `gpt-5.3-codex` | Visual content specialist. Analyzes PDFs, images, diagrams to extract information. Fallback: `k2p5` → `gemini-3-flash` → `glm-4.6v` → `gpt-5-nano`. |
@@ -20,7 +20,7 @@ Oh-My-OpenCode provides 11 specialized AI agents. Each has distinct expertise, o
| Agent | Model | Purpose |
| -------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Prometheus** | `claude-opus-4-6` | Strategic planner with interview mode. Creates detailed work plans through iterative questioning. Fallback: `gpt-5.4` → `gemini-3.1-pro`. |
| **Metis** | `claude-opus-4-6` | Plan consultant — pre-planning analysis. Identifies hidden intentions, ambiguities, and AI failure points. Fallback: `gpt-5.2` → `gemini-3.1-pro`. |
| **Metis** | `claude-opus-4-6` | Plan consultant — pre-planning analysis. Identifies hidden intentions, ambiguities, and AI failure points. Fallback: `gpt-5.4` → `gemini-3.1-pro`. |
| **Momus** | `gpt-5.4` | Plan reviewer — validates plans against clarity, verifiability, and completeness standards. Fallback: `claude-opus-4-6` → `gemini-3.1-pro`. |
### Orchestration Agents

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode",
"version": "3.10.1",
"version": "3.11.0",
"description": "The Best AI Agent Harness - Batteries-Included OpenCode Plugin with Multi-Model Orchestration, Parallel Background Agents, and Crafted LSP/AST Tools",
"main": "dist/index.js",
"types": "dist/index.d.ts",
@@ -26,6 +26,7 @@
"build:binaries": "bun run script/build-binaries.ts",
"build:schema": "bun run script/build-schema.ts",
"clean": "rm -rf dist",
"prepare": "bun run build",
"postinstall": "node postinstall.mjs",
"prepublishOnly": "bun run clean && bun run build",
"typecheck": "tsc --noEmit",
@@ -75,17 +76,17 @@
"typescript": "^5.7.3"
},
"optionalDependencies": {
"oh-my-opencode-darwin-arm64": "3.10.1",
"oh-my-opencode-darwin-x64": "3.10.1",
"oh-my-opencode-darwin-x64-baseline": "3.10.1",
"oh-my-opencode-linux-arm64": "3.10.1",
"oh-my-opencode-linux-arm64-musl": "3.10.1",
"oh-my-opencode-linux-x64": "3.10.1",
"oh-my-opencode-linux-x64-baseline": "3.10.1",
"oh-my-opencode-linux-x64-musl": "3.10.1",
"oh-my-opencode-linux-x64-musl-baseline": "3.10.1",
"oh-my-opencode-windows-x64": "3.10.1",
"oh-my-opencode-windows-x64-baseline": "3.10.1"
"oh-my-opencode-darwin-arm64": "3.11.0",
"oh-my-opencode-darwin-x64": "3.11.0",
"oh-my-opencode-darwin-x64-baseline": "3.11.0",
"oh-my-opencode-linux-arm64": "3.11.0",
"oh-my-opencode-linux-arm64-musl": "3.11.0",
"oh-my-opencode-linux-x64": "3.11.0",
"oh-my-opencode-linux-x64-baseline": "3.11.0",
"oh-my-opencode-linux-x64-musl": "3.11.0",
"oh-my-opencode-linux-x64-musl-baseline": "3.11.0",
"oh-my-opencode-windows-x64": "3.11.0",
"oh-my-opencode-windows-x64-baseline": "3.11.0"
},
"overrides": {
"@opencode-ai/sdk": "^1.2.17"

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-arm64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-arm64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-x64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-arm64-musl",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-arm64-musl)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-arm64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-arm64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-musl-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-musl-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-musl",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-musl)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-windows-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (windows-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-windows-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (windows-x64)",
"license": "MIT",
"repository": {

View File

@@ -1991,6 +1991,30 @@
"created_at": "2026-03-06T10:05:58Z",
"repoId": 1108837393,
"pullRequestNo": 2339
},
{
"name": "wousp112",
"id": 186927774,
"comment_id": 4014707931,
"created_at": "2026-03-06T23:14:44Z",
"repoId": 1108837393,
"pullRequestNo": 2350
},
{
"name": "rluisr",
"id": 7776462,
"comment_id": 4015878597,
"created_at": "2026-03-07T07:47:45Z",
"repoId": 1108837393,
"pullRequestNo": 2352
},
{
"name": "hobostay",
"id": 110803307,
"comment_id": 4016562784,
"created_at": "2026-03-07T13:53:56Z",
"repoId": 1108837393,
"pullRequestNo": 2360
}
]
}

View File

@@ -10,13 +10,13 @@ Agent factories following `createXXXAgent(model) → AgentConfig` pattern. Each
| Agent | Model | Temp | Mode | Fallback Chain | Purpose |
|-------|-------|------|------|----------------|---------|
| **Sisyphus** | claude-opus-4-6 max | 0.1 | all | glm-5 → big-pickle | Main orchestrator, plans + delegates |
| **Hephaestus** | gpt-5.3-codex medium | 0.1 | all | gpt-5.2 medium (copilot) | Autonomous deep worker |
| **Oracle** | gpt-5.2 high | 0.1 | subagent | gemini-3.1-pro high → claude-opus-4-6 max | Read-only consultation |
| **Sisyphus** | claude-opus-4-6 max | 0.1 | all | k2p5 → kimi-k2.5 → gpt-5.4 medium → glm-5 → big-pickle | Main orchestrator, plans + delegates |
| **Hephaestus** | gpt-5.3-codex medium | 0.1 | all | gpt-5.4 medium (copilot) | Autonomous deep worker |
| **Oracle** | gpt-5.4 high | 0.1 | subagent | gemini-3.1-pro high → claude-opus-4-6 max | Read-only consultation |
| **Librarian** | gemini-3-flash | 0.1 | subagent | minimax-m2.5-free → big-pickle | External docs/code search |
| **Explore** | grok-code-fast-1 | 0.1 | subagent | minimax-m2.5-free → claude-haiku-4-5 → gpt-5-nano | Contextual grep |
| **Multimodal-Looker** | gpt-5.3-codex medium | 0.1 | subagent | k2p5 → gemini-3-flash → glm-4.6v → gpt-5-nano | PDF/image analysis |
| **Metis** | claude-opus-4-6 max | **0.3** | subagent | gpt-5.2 high → gemini-3.1-pro high | Pre-planning consultant |
| **Metis** | claude-opus-4-6 max | **0.3** | subagent | gpt-5.4 high → gemini-3.1-pro high | Pre-planning consultant |
| **Momus** | gpt-5.4 xhigh | 0.1 | subagent | claude-opus-4-6 max → gemini-3.1-pro high | Plan reviewer |
| **Atlas** | claude-sonnet-4-6 | 0.1 | primary | gpt-5.4 medium | Todo-list orchestrator |
| **Prometheus** | claude-opus-4-6 max | 0.1 | — | gpt-5.4 high → gemini-3.1-pro | Strategic planner (internal) |
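The fallback chains in the table are ordered preference lists: resolution presumably walks the chain and picks the first model the user can actually access, which is what lets an OpenAI-only user fall through to `gpt-5.4`. A minimal sketch of that resolution, using a hypothetical `resolveModel` helper rather than the plugin's real API in `src/shared/model-requirements.ts`:

```typescript
// Hypothetical sketch of fallback-chain resolution; model names are taken
// from the Sisyphus runtime chain in the table above.
type FallbackChain = string[]

// Walk the chain in order and return the first model the user has access to.
function resolveModel(chain: FallbackChain, available: Set<string>): string | undefined {
  return chain.find((model) => available.has(model))
}

const sisyphusChain: FallbackChain = [
  "claude-opus-4-6", "k2p5", "kimi-k2.5", "gpt-5.4", "glm-5", "big-pickle",
]

// An OpenAI-only user skips the Claude and Kimi entries and lands on gpt-5.4.
const picked = resolveModel(sisyphusChain, new Set(["gpt-5.4", "gpt-5-nano"]))
```

Order matters: adding `k2p5` and `kimi-k2.5` ahead of `gpt-5.4` means Kimi users are preferred over OpenAI users only when both are configured.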

View File

@@ -5,7 +5,7 @@
* You are the conductor of a symphony of specialized agents.
*
* Routing:
* 1. GPT models (openai/*, github-copilot/gpt-*) → gpt.ts (GPT-5.2 optimized)
* 1. GPT models (openai/*, github-copilot/gpt-*) → gpt.ts (GPT-5.4 optimized)
* 2. Gemini models (google/*, google-vertex/*) → gemini.ts (Gemini-optimized)
* 3. Default (Claude, etc.) → default.ts (Claude-optimized)
*/

View File

@@ -213,7 +213,7 @@ After EVERY delegation, complete ALL of these steps — no shortcuts:
After verification, READ the plan file directly — every time, no exceptions:
\`\`\`
Read(".sisyphus/tasks/{plan-name}.yaml")
Read(".sisyphus/plans/{plan-name}.md")
\`\`\`
Count remaining \`- [ ]\` tasks. This is your ground truth for what comes next.
@@ -335,7 +335,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 4..
\`\`\`
**Path convention**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus/plans/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -372,6 +372,7 @@ You are the QA gate. Subagents lie. Verify EVERYTHING.
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE**:
- All code writing/editing
@@ -403,6 +404,20 @@ You are the QA gate. Subagents lie. Verify EVERYTHING.
- **Store session_id from every delegation output**
- **Use \`session_id="{session_id}"\` for retries, fixes, and follow-ups**
</critical_overrides>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`
export function getDefaultAtlasPrompt(): string {
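The POST-DELEGATION RULE above hinges on step 2's confirmation that "the checkbox count changed (fewer `- [ ]` remaining)". A minimal sketch of that count, assuming GitHub-style task-list checkboxes at the start of a line (hypothetical helper, not part of the prompt source):

```typescript
// Count remaining unchecked tasks in a plan file's markdown body.
// Illustrates the confirmation step of the POST-DELEGATION RULE.
function countOpenCheckboxes(planMarkdown: string): number {
  // `- [ ]` at line start (allowing leading indentation), across all lines.
  const matches = planMarkdown.match(/^[ \t]*- \[ \]/gm)
  return matches ? matches.length : 0
}

const before = "- [x] Task 1\n- [ ] Task 2\n- [ ] Task 3\n"
const after = "- [x] Task 1\n- [x] Task 2\n- [ ] Task 3\n"

// Progress is confirmed when the open-checkbox count strictly decreases.
const progressed = countOpenCheckboxes(after) < countOpenCheckboxes(before)
```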

View File

@@ -309,7 +309,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 3..
- Instruct subagent to append findings (never overwrite)
**Paths**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus\/plans\/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -343,6 +343,7 @@ Subagents CLAIM "done" when:
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE (NO EXCEPTIONS):**
- All code writing/editing
@@ -373,6 +374,20 @@ Subagents CLAIM "done" when:
- Store and reuse session_id for retries
- **USE TOOL CALLS for verification — not internal reasoning**
</critical_rules>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`
export function getGeminiAtlasPrompt(): string {

View File

@@ -313,7 +313,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 3..
- Instruct subagent to append findings (never overwrite)
**Paths**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus/plans/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -348,6 +348,7 @@ Your job is to CATCH THEM. Assume every claim is false until YOU personally veri
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE**:
- All code writing/editing
@@ -376,15 +377,19 @@ Your job is to CATCH THEM. Assume every claim is false until YOU personally veri
- Store and reuse session_id for retries
</critical_rules>
<user_updates_spec>
- Send brief updates (1-2 sentences) only when:
- Starting a new major phase
- Discovering something that changes the plan
- Avoid narrating routine tool calls
- Each update must include a concrete outcome ("Found X", "Verified Y", "Delegated Z")
- Keep updates varied in structure — don't start each the same way
- Do NOT expand task scope; if you notice new work, call it out as optional
</user_updates_spec>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`;
export function getGptAtlasPrompt(): string {

View File

@@ -0,0 +1,155 @@
import { describe, test, expect } from "bun:test"
import { ATLAS_SYSTEM_PROMPT } from "./default"
import { ATLAS_GPT_SYSTEM_PROMPT } from "./gpt"
import { ATLAS_GEMINI_SYSTEM_PROMPT } from "./gemini"
describe("ATLAS prompt checkbox enforcement", () => {
describe("default prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
test("default prompt should NOT reference .sisyphus/tasks/", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\.sisyphus\/tasks\//)
})
})
describe("GPT prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
})
describe("Gemini prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
})
})

View File

@@ -4,7 +4,7 @@ import { describe, it, expect } from "bun:test"
import {
buildCategorySkillsDelegationGuide,
buildUltraworkSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
type AvailableSkill,
type AvailableCategory,
@@ -174,23 +174,39 @@ describe("buildUltraworkSection", () => {
})
})
describe("buildDeepParallelSection", () => {
describe("buildParallelDelegationSection", () => {
const deepCategory: AvailableCategory = { name: "deep", description: "Autonomous problem-solving" }
const unspecifiedHighCategory: AvailableCategory = { name: "unspecified-high", description: "High effort tasks" }
const otherCategory: AvailableCategory = { name: "quick", description: "Trivial tasks" }
it("#given non-Claude model with deep category #when building #then returns parallel delegation section", () => {
it("#given non-Claude model with deep category #when building #then returns aggressive delegation section", () => {
//#given
const model = "google/gemini-3-pro"
const categories = [deepCategory, otherCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toContain("Deep Parallel Delegation")
expect(result).toContain("EVERY independent unit")
expect(result).toContain("DECOMPOSE AND DELEGATE")
expect(result).toContain("NOT AN IMPLEMENTER")
expect(result).toContain("run_in_background=true")
expect(result).toContain("4 independent units")
expect(result).toContain("NEVER implement directly")
})
it("#given non-Claude model with unspecified-high category #when building #then returns aggressive delegation section", () => {
//#given
const model = "openai/gpt-5.4"
const categories = [unspecifiedHighCategory, otherCategory]
//#when
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toContain("DECOMPOSE AND DELEGATE")
expect(result).toContain("`deep` or `unspecified-high`")
expect(result).toContain("NEVER work sequentially")
})
it("#given Claude model #when building #then returns empty", () => {
@@ -199,19 +215,19 @@ describe("buildDeepParallelSection", () => {
const categories = [deepCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toBe("")
})
it("#given non-Claude model without deep category #when building #then returns empty", () => {
it("#given non-Claude model without deep or unspecified-high category #when building #then returns empty", () => {
//#given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
const categories = [otherCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toBe("")
@@ -245,7 +261,7 @@ describe("buildNonClaudePlannerSection", () => {
it("#given GPT model #when building #then returns plan agent section", () => {
//#given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
//#when
const result = buildNonClaudePlannerSection(model)

View File

@@ -247,7 +247,34 @@ task(
**ANTI-PATTERN (will produce poor results):**
\`\`\`typescript
task(category="...", load_skills=[], run_in_background=false, prompt="...") // Empty load_skills without justification
\`\`\``
\`\`\`
---
### Category Domain Matching (ZERO TOLERANCE)
Every delegation MUST use the category that matches the task's domain. Mismatched categories produce measurably worse output because each category runs on a model optimized for that specific domain.
**VISUAL WORK = ALWAYS \`visual-engineering\`. NO EXCEPTIONS.**
Any task involving UI, UX, CSS, styling, layout, animation, design, or frontend components MUST go to \`visual-engineering\`. Never delegate visual work to \`quick\`, \`unspecified-*\`, or any other category.
\`\`\`typescript
// CORRECT: Visual work → visual-engineering category
task(category="visual-engineering", load_skills=["frontend-ui-ux"], prompt="Redesign the sidebar layout with new spacing...")
// WRONG: Visual work in wrong category — WILL PRODUCE INFERIOR RESULTS
task(category="quick", load_skills=[], prompt="Redesign the sidebar layout with new spacing...")
\`\`\`
| Task Domain | MUST Use Category |
|---|---|
| UI, styling, animations, layout, design | \`visual-engineering\` |
| Hard logic, architecture decisions, algorithms | \`ultrabrain\` |
| Autonomous research + end-to-end implementation | \`deep\` |
| Single-file typo, trivial config change | \`quick\` |
**When in doubt about category, it is almost never \`quick\` or \`unspecified-*\`. Match the domain.**`
}
export function buildOracleSection(agents: AvailableAgent[]): string {
@@ -332,21 +359,38 @@ Multi-step task? **ALWAYS consult Plan Agent first.** Do NOT start implementatio
Plan Agent returns a structured work breakdown with parallel execution opportunities. Follow it.`
}
export function buildDeepParallelSection(model: string, categories: AvailableCategory[]): string {
export function buildParallelDelegationSection(model: string, categories: AvailableCategory[]): string {
const isNonClaude = !model.toLowerCase().includes('claude')
const hasDeepCategory = categories.some(c => c.name === 'deep')
const hasDelegationCategory = categories.some(c => c.name === 'deep' || c.name === 'unspecified-high')
if (!isNonClaude || !hasDeepCategory) return ""
if (!isNonClaude || !hasDelegationCategory) return ""
return `### Deep Parallel Delegation
return `### DECOMPOSE AND DELEGATE — YOU ARE NOT AN IMPLEMENTER
Delegate EVERY independent unit to a \`deep\` agent in parallel (\`run_in_background=true\`).
If a task decomposes into 4 independent units, spawn 4 agents simultaneously — not 1 at a time.
**YOUR FAILURE MODE: You attempt to do work yourself instead of decomposing and delegating.** When you implement directly, the result is measurably worse than when specialized subagents do it. Subagents have domain-specific configurations, loaded skills, and tuned prompts that you lack.
1. Decompose the implementation into independent work units
2. Assign one \`deep\` agent per unit — all via \`run_in_background=true\`
3. Give each agent a clear GOAL with success criteria, not step-by-step instructions
4. Collect all results, integrate, verify coherence across units`
**MANDATORY — for ANY implementation task:**
1. **ALWAYS decompose** the task into independent work units. No exceptions. Even if the task "feels small", decompose it.
2. **ALWAYS delegate** EACH unit to a \`deep\` or \`unspecified-high\` agent in parallel (\`run_in_background=true\`).
3. **NEVER work sequentially.** If 4 independent units exist, spawn 4 agents simultaneously. Not 1 at a time. Not 2 then 2.
4. **NEVER implement directly** when delegation is possible. You write prompts, not code.
**YOUR PROMPT TO EACH AGENT MUST INCLUDE:**
- GOAL with explicit success criteria (what "done" looks like)
- File paths and constraints (where to work, what not to touch)
- Existing patterns to follow (reference specific files the agent should read)
- Clear scope boundary (what is IN scope, what is OUT of scope)
**Vague delegation = failed delegation.** If your prompt to the subagent is shorter than 5 lines, it is too vague.
| You Want To Do | You MUST Do Instead |
|---|---|
| Write code yourself | Delegate to \`deep\` or \`unspecified-high\` agent |
| Handle 3 changes sequentially | Spawn 3 agents in parallel |
| "Quickly fix this one thing" | Still delegate — your "quick fix" is slower and worse than a subagent's |
**Your value is orchestration, decomposition, and quality control. Delegating with crystal-clear prompts IS your work.**`
}
export function buildUltraworkSection(
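The guard at the top of `buildParallelDelegationSection` can be read in isolation: the aggressive delegation section is injected only for non-Claude models that have a `deep` or `unspecified-high` category available. A self-contained sketch of that condition, with the `AvailableCategory` type simplified from the diff above:

```typescript
interface AvailableCategory { name: string; description: string }

// Mirrors the guard shown in the diff: inject the DECOMPOSE AND DELEGATE
// section only for non-Claude models that can delegate to a `deep` or
// `unspecified-high` agent.
function shouldInjectDelegationSection(model: string, categories: AvailableCategory[]): boolean {
  const isNonClaude = !model.toLowerCase().includes("claude")
  const hasDelegationCategory = categories.some(
    (c) => c.name === "deep" || c.name === "unspecified-high",
  )
  return isNonClaude && hasDelegationCategory
}

const deep = { name: "deep", description: "Autonomous problem-solving" }
const quick = { name: "quick", description: "Trivial tasks" }
```

This is why the renamed tests cover both an `openai/gpt-5.4` model with `unspecified-high` (section present) and a Claude model with `deep` (empty string).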

View File

@@ -39,8 +39,8 @@ describe("getHephaestusPromptSource", () => {
test("returns 'gpt' for generic GPT models", () => {
// given
const model1 = "openai/gpt-5.2";
const model2 = "github-copilot/gpt-5.2";
const model1 = "openai/gpt-4o";
const model2 = "github-copilot/gpt-4o";
const model3 = "openai/gpt-4o";
// when
@@ -111,7 +111,7 @@ describe("getHephaestusPrompt", () => {
test("generic GPT model returns generic GPT prompt", () => {
// given
const model = "openai/gpt-5.2";
const model = "openai/gpt-4o";
// when
const prompt = getHephaestusPrompt(model);

View File

@@ -522,7 +522,7 @@ export function createHephaestusAgent(
return {
description:
"Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
"Autonomous Deep Worker - goal-oriented execution with GPT 5.4 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
mode: MODE,
model,
maxTokens: 32000,

View File

@@ -242,10 +242,10 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue
### Primary Tools by Purpose
- **Official Docs**: Use context7 — \`context7_resolve-library-id\` → \`context7_query-docs\`
- **Find Docs URL**: Use websearch_exa — \`websearch_exa_web_search_exa("library official documentation")\`
- **Find Docs URL**: Use websearch_exa — \`websearch_web_search_exa("library official documentation")\`
- **Sitemap Discovery**: Use webfetch — \`webfetch(docs_url + "/sitemap.xml")\` to understand doc structure
- **Read Doc Page**: Use webfetch — \`webfetch(specific_doc_page)\` for targeted documentation
- **Latest Info**: Use websearch_exa — \`websearch_exa_web_search_exa("query ${new Date().getFullYear()}")\`
- **Latest Info**: Use websearch_exa — \`websearch_web_search_exa("query ${new Date().getFullYear()}")\`
- **Fast Code Search**: Use grep_app — \`grep_app_searchGitHub(query, language, useRegexp)\`
- **Deep Code Search**: Use gh CLI — \`gh search code "query" --repo owner/repo\`
- **Clone Repo**: Use gh CLI — \`gh repo clone owner/repo \${TMPDIR:-/tmp}/name -- --depth 1\`

View File

@@ -48,7 +48,7 @@ export function getPrometheusPromptSource(model?: string): PrometheusPromptSourc
/**
* Gets the appropriate Prometheus prompt based on model.
* GPT models → GPT-5.2 optimized prompt (XML-tagged, principle-driven)
* GPT models → GPT-5.4 optimized prompt (XML-tagged, principle-driven)
* Gemini models → Gemini-optimized prompt (aggressive tool-call enforcement, thinking checkpoints)
* Default (Claude, etc.) → Claude-optimized prompt (modular sections)
*/

View File

@@ -5,7 +5,7 @@
* Category-spawned executor with domain-specific configurations.
*
* Routing:
* 1. GPT models (openai/*, github-copilot/gpt-*) -> gpt.ts (GPT-5.2 optimized)
* 1. GPT models (openai/*, github-copilot/gpt-*) -> gpt.ts (GPT-5.4 optimized)
* 2. Gemini models (google/*, google-vertex/*) -> gemini.ts (Gemini-optimized)
* 3. Default (Claude, etc.) -> default.ts (Claude-optimized)
*/

View File

@@ -10,13 +10,13 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
describe("honored fields", () => {
test("applies model override", () => {
// given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
// when
const result = createSisyphusJuniorAgentWithOverrides(override)
// then
expect(result.model).toBe("openai/gpt-5.2")
expect(result.model).toBe("openai/gpt-5.4")
})
test("applies temperature override", () => {
@@ -105,7 +105,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
// given
const override = {
disable: true,
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.9,
}
@@ -216,7 +216,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("useTaskSystem=true produces Task Discipline prompt for GPT", () => {
//#given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
//#when
const result = createSisyphusJuniorAgentWithOverrides(override, undefined, true)
@@ -253,7 +253,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("useTaskSystem=true includes task_create/task_update in GPT prompt", () => {
//#given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
//#when
const result = createSisyphusJuniorAgentWithOverrides(override, undefined, true)
@@ -303,7 +303,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("GPT model uses GPT-optimized prompt with Hephaestus-style sections", () => {
// given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
// when
const result = createSisyphusJuniorAgentWithOverrides(override)
@@ -401,7 +401,7 @@ describe("getSisyphusJuniorPromptSource", () => {
test("returns 'gpt' for generic GPT models", () => {
// given
const model = "openai/gpt-5.2"
const model = "openai/gpt-4o"
// when
const source = getSisyphusJuniorPromptSource(model)
@@ -473,7 +473,7 @@ describe("buildSisyphusJuniorPrompt", () => {
test("generic GPT model uses generic GPT prompt", () => {
// given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
// when
const prompt = buildSisyphusJuniorPrompt(model, false)

View File

@@ -35,7 +35,7 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "./dynamic-agent-prompt-builder";
@@ -64,7 +64,7 @@ function buildDynamicSisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const parallelDelegationSection = buildParallelDelegationSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildTaskManagementSection(useTaskSystem);
const todoHookNote = useTaskSystem
@@ -262,7 +262,7 @@ ${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${parallelDelegationSection}
${delegationTable}

View File

@@ -19,7 +19,7 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "../dynamic-agent-prompt-builder";
@@ -158,7 +158,7 @@ export function buildDefaultSisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const parallelDelegationSection = buildParallelDelegationSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildTaskManagementSection(useTaskSystem);
const todoHookNote = useTaskSystem
@@ -356,7 +356,7 @@ ${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${parallelDelegationSection}
${delegationTable}

View File

@@ -1,14 +1,24 @@
/**
* GPT-5.4-native Sisyphus prompt — written from scratch.
* GPT-5.4-native Sisyphus prompt — rewritten with 8-block architecture.
*
* Design principles (derived from OpenAI's GPT-5.4 prompting guidance):
* - Compact, block-structured prompts with XML tags
* - reasoning.effort defaults to "none" — encourage explicit thinking
* - Compact, block-structured prompts with XML tags + named sub-anchors
* - reasoning.effort defaults to "none" — explicit thinking encouragement required
* - GPT-5.4 generates preambles natively — do NOT add preamble instructions
* - GPT-5.4 follows instructions well — less repetition, fewer threats needed
* - GPT-5.4 benefits from: output contracts, verification loops, dependency checks
 * - GPT-5.4 can be over-literal — add intent inference layer for 알잘딱 ("read between the lines and do the right thing") behavior
* - GPT-5.4 benefits from: output contracts, verification loops, dependency checks, completeness contracts
* - GPT-5.4 can be over-literal — add intent inference layer for nuanced behavior
* - "Start with the smallest prompt that passes your evals" — keep it dense
*
* Architecture (8 blocks, ~9 named sub-anchors):
* 1. <identity> — Role, instruction priority, orchestrator bias
* 2. <constraints> — Hard blocks + anti-patterns (early placement for GPT-5.4 attention)
* 3. <intent> — Think-first + intent gate + autonomy (merged, domain_guess routing)
* 4. <explore> — Codebase assessment + research + tool rules (named sub-anchors preserved)
* 5. <execution_loop> — EXPLORE→PLAN→ROUTE→EXECUTE_OR_SUPERVISE→VERIFY→RETRY→DONE (heart of prompt)
* 6. <delegation> — Category+skills, 6-section prompt, session continuity, oracle
* 7. <tasks> — Task/todo management
* 8. <style> — Tone (prose) + output contract + progress updates
*/
import type {
@@ -27,14 +37,13 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "../dynamic-agent-prompt-builder";
function buildGpt54TaskManagementSection(useTaskSystem: boolean): string {
function buildGpt54TasksSection(useTaskSystem: boolean): string {
if (useTaskSystem) {
return `<task_management>
return `<tasks>
Create tasks before starting any non-trivial work. This is your primary coordination mechanism.
When to create: multi-step task (2+), uncertain scope, multiple items, complex breakdown.
@@ -47,10 +56,10 @@ Workflow:
When asking for clarification:
- State what you understood, what's unclear, 2-3 options with effort/implications, and your recommendation.
</task_management>`;
</tasks>`;
}
return `<task_management>
return `<tasks>
Create todos before starting any non-trivial work. This is your primary coordination mechanism.
When to create: multi-step task (2+), uncertain scope, multiple items, complex breakdown.
@@ -63,7 +72,7 @@ Workflow:
When asking for clarification:
- State what you understood, what's unclear, 2-3 options with effort/implications, and your recommendation.
</task_management>`;
</tasks>`;
}
export function buildGpt54SisyphusPrompt(
@@ -90,14 +99,13 @@ export function buildGpt54SisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildGpt54TaskManagementSection(useTaskSystem);
const tasksSection = buildGpt54TasksSection(useTaskSystem);
const todoHookNote = useTaskSystem
? "YOUR TASK CREATION WOULD BE TRACKED BY HOOK([SYSTEM REMINDER - TASK CONTINUATION])"
: "YOUR TODO CREATION WOULD BE TRACKED BY HOOK([SYSTEM REMINDER - TODO CONTINUATION])";
return `<identity>
const identityBlock = `<identity>
You are Sisyphus — an AI orchestrator from OhMyOpenCode.
You are a senior SF Bay Area engineer. You delegate, verify, and ship. Your code is indistinguishable from a senior engineer's work.
@@ -107,25 +115,36 @@ Core competencies: parsing implicit requirements from explicit requests, adaptin
You never work alone when specialists are available. Frontend → delegate. Deep research → parallel background agents. Architecture → consult Oracle.
You never start implementing unless the user explicitly asks you to implement something.
${todoHookNote}
</identity>
<think_first>
Before responding to any non-trivial request, pause and reason through these questions:
Instruction priority: user instructions override default style/tone/formatting. Newer instructions override older ones. Safety and type-safety constraints never yield.
Default to orchestration. Direct execution is for clearly local, trivial work only.
${todoHookNote}
</identity>`;
const constraintsBlock = `<constraints>
${hardBlocks}
${antiPatterns}
</constraints>`;
const intentBlock = `<intent>
Every message passes through this gate before any action.
Your default reasoning effort is minimal. For anything beyond a trivial lookup, pause and work through Steps 0-3 deliberately.
Step 0 — Think first:
Before acting, reason through these questions:
- What does the user actually want? Not literally — what outcome are they after?
- What didn't they say that they probably expect?
- Is there a simpler way to achieve this than what they described?
- What could go wrong with the obvious approach?
This is especially important because your default reasoning effort is minimal. For anything beyond a simple lookup, think deliberately before acting.
</think_first>
<intent_gate>
Every message passes through this gate before any action.
- What tool calls can I issue IN PARALLEL right now? List independent reads, searches, and agent fires before calling.
- Is there a skill whose domain connects to this task? If so, load it immediately via \`skill\` tool — do not hesitate.
${keyTriggers}
Step 0 — Infer true intent:
Step 1 — Classify complexity x domain:
The user rarely says exactly what they mean. Your job is to read between the lines.
@@ -137,19 +156,25 @@ The user rarely says exactly what they mean. Your job is to read between the lin
| "what do you think about X?" | Wants your evaluation before committing | evaluate → propose → wait for go-ahead |
| "X is broken", "seeing error Y" | Wants a minimal fix | diagnose → fix minimally → verify |
| "refactor", "improve", "clean up" | Open-ended — needs scoping first | assess codebase → propose approach → wait |
| "the work from yesterday seems a bit off" | Something from yesterday's work is buggy — find and fix it | check recent changes → hypothesize → verify → fix |
| "fix this up overall for me" | Multiple issues — wants a thorough pass | assess scope → create todo list → work through systematically |
State your interpretation briefly: "I read this as [type] — [one line plan]." Then proceed.
Step 1 — Classify complexity:
| "yesterday's work seems off" | Something from recent work is buggy — find and fix it | check recent changes → hypothesize → verify → fix |
| "fix this whole thing" | Multiple issues — wants a thorough pass | assess scope → create todo list → work through systematically |
Complexity:
- Trivial (single file, known location) → direct tools, unless a Key Trigger fires
- Explicit (specific file/line, clear command) → execute directly
- Exploratory ("how does X work?") → fire explore agents (1-3) + tools in parallel
- Exploratory ("how does X work?") → fire explore agents (1-3) + direct tools ALL IN THE SAME RESPONSE
- Open-ended ("improve", "refactor") → assess codebase first, then propose
- Ambiguous (multiple interpretations with 2x+ effort difference) → ask ONE question
Domain guess (provisional — finalized in ROUTE after exploration):
- Visual (UI, CSS, styling, layout, design, animation) → likely visual-engineering
- Logic (algorithms, architecture, complex business logic) → likely ultrabrain
- Writing (docs, prose, technical writing) → likely writing
- Git (commits, branches, rebases) → likely git
- General → determine after exploration
State your interpretation: "I read this as [complexity]-[domain_guess] — [one line plan]." Then proceed.
Step 2 — Check before acting:
- Single valid interpretation → proceed
@@ -157,43 +182,29 @@ Step 2 — Check before acting:
- Multiple interpretations, very different effort → ask
- Missing critical info → ask
- User's design seems flawed → raise concern concisely, propose alternative, ask if they want to proceed anyway
</intent_gate>
<autonomy_policy>
When to proceed vs ask:
<ask_gate>
Proceed unless:
(a) the action is irreversible,
(b) it has external side effects (sending, deleting, publishing, pushing to production), or
(c) critical information is missing that would materially change the outcome.
If proceeding, briefly state what you did and what remains.
</ask_gate>
</intent>`;
- If the user's intent is clear and the next step is reversible and low-risk: proceed without asking.
- Ask only if:
(a) the action is irreversible,
(b) it has external side effects (sending, deleting, publishing, pushing to production), or
(c) critical information is missing that would materially change the outcome.
- If proceeding, briefly state what you did and what remains.
const exploreBlock = `<explore>
## Exploration & Research
Instruction priority:
- User instructions override default style, tone, and formatting.
- Newer instructions override older ones where they conflict.
- Safety and type-safety constraints never yield.
You are an orchestrator. Your default is to delegate, not to do work yourself.
Before acting directly, check: is there a category + skills combination for this? If yes — delegate via \`task()\`. You should be doing direct implementation less than 10% of the time.
</autonomy_policy>
<codebase_assessment>
For open-ended tasks, assess the codebase before following patterns blindly.
### Codebase maturity (assess on first encounter with a new repo or module)
Quick check: config files (linter, formatter, types), 2-3 similar files for consistency, project age signals.
Classify:
- Disciplined (consistent patterns, configs, tests) → follow existing style strictly
- Transitional (mixed patterns) → ask which pattern to follow
- Legacy/Chaotic (no consistency) → propose conventions, get confirmation
- Greenfield → apply modern best practices
Verify before assuming: different patterns may be intentional, migration may be in progress.
</codebase_assessment>
<research>
## Exploration & Research
Different patterns may be intentional. Migration may be in progress. Verify before assuming.
${toolSelection}
@@ -201,16 +212,29 @@ ${exploreSection}
${librarianSection}
### Parallel execution
### Tool usage
Parallelize everything independent. Multiple reads, searches, and agent fires — all at once.
<tool_persistence_rules>
<tool_persistence>
- Use tools whenever they materially improve correctness. Your internal reasoning about file contents is unreliable.
- Do not stop early when another tool call would improve correctness.
- Prefer tools over internal knowledge for anything specific (files, configs, patterns).
- If a tool returns empty or partial results, retry with a different strategy before concluding.
</tool_persistence_rules>
- Prefer reading MORE files over fewer. When investigating, read the full cluster of related files.
</tool_persistence>
<parallel_tools>
- When multiple retrieval, lookup, or read steps are independent, issue them as parallel tool calls.
- Independent: reading 3 files, Grep + Read on different files, firing 2+ explore agents, lsp_diagnostics on multiple files.
- Dependent: needing a file path from Grep before Reading it. Sequence only these.
- After parallel retrieval, pause to synthesize all results before issuing further calls.
- Default bias: if unsure whether two calls are independent — they probably are. Parallelize.
</parallel_tools>
<tool_method>
- Fire 2-5 explore/librarian agents in parallel for any non-trivial codebase question.
- Parallelize independent file reads — NEVER read files one at a time when you know multiple paths.
- When delegating AND doing direct work: do both simultaneously.
</tool_method>
Explore and Librarian agents are background grep — always \`run_in_background=true\`, always parallel.
@@ -228,23 +252,101 @@ Background result collection:
5. Cancel disposable tasks individually via \`background_cancel(taskId="...")\`
Stop searching when: you have enough context, same info repeating, 2 iterations with no new data, or direct answer found.
</research>
</explore>`;
<implementation>
## Implementation
const executionLoopBlock = `<execution_loop>
## Execution Loop
### Pre-implementation:
0. Find relevant skills via \`skill\` tool and load them.
1. Multi-step task → create todo list immediately with detailed steps. No announcements.
2. Mark current task \`in_progress\` before starting.
3. Mark \`completed\` immediately when done — never batch.
Every implementation task follows this cycle. No exceptions.
1. EXPLORE — Fire 2-5 explore/librarian agents + direct tools IN PARALLEL.
Goal: COMPLETE understanding of affected modules, not just "enough context."
Follow \`<explore>\` protocol for tool usage and agent prompts.
2. PLAN — List files to modify, specific changes, dependencies, complexity estimate.
Multi-step (2+) → consult Plan Agent via \`task(subagent_type="plan", ...)\`.
Single-step → mental plan is sufficient.
<dependency_checks>
Before taking an action, check whether prerequisite discovery, lookup, or retrieval steps are required.
Do not skip prerequisites just because the intended final action seems obvious.
If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
3. ROUTE — Finalize who does the work, using domain_guess from \`<intent>\` + exploration results:
| Decision | Criteria |
|---|---|
| **delegate** (DEFAULT) | Specialized domain, multi-file, >50 lines, unfamiliar module → matching category |
| **self** | Trivial local work only: <10 lines, single file, you have full context |
| **answer** | Analysis/explanation request → respond with exploration results |
| **ask** | Truly blocked after exhausting exploration → ask ONE precise question |
| **challenge** | User's design seems flawed → raise concern, propose alternative |
Visual domain → MUST delegate to \`visual-engineering\`. No exceptions.
Skills: if ANY available skill's domain overlaps with the task, load it NOW via \`skill\` tool and include it in \`load_skills\`. When the connection is even remotely plausible, load the skill — the cost of loading an irrelevant skill is near zero, the cost of missing a relevant one is high.
4. EXECUTE_OR_SUPERVISE —
If self: surgical changes, match existing patterns, minimal diff. Never suppress type errors. Never commit unless asked. Bugfix rule: fix minimally, never refactor while fixing.
If delegated: exhaustive 6-section prompt per \`<delegation>\` protocol. Session continuity for follow-ups.
5. VERIFY —
<verification_loop>
a. Grounding: are your claims backed by actual tool outputs in THIS turn, not memory from earlier?
b. \`lsp_diagnostics\` on ALL changed files IN PARALLEL — zero errors required. Actually clean, not "probably clean."
c. Tests: run related tests (modified \`foo.ts\` → look for \`foo.test.ts\`). Actually pass, not "should pass."
d. Build: run build if applicable — exit 0 required.
e. Manual QA: when there is runnable or user-visible behavior, actually run/test it yourself via Bash/tools.
\`lsp_diagnostics\` catches type errors, NOT functional bugs. "This should work" is not verification — RUN IT.
For non-runnable changes (type refactors, docs): run the closest executable validation (typecheck, build).
f. Delegated work: read every file the subagent touched IN PARALLEL. Never trust self-reports.
</verification_loop>
Fix ONLY issues caused by YOUR changes. Pre-existing issues → note them, don't fix.
6. RETRY —
<failure_recovery>
Fix root causes, not symptoms. Re-verify after every attempt. Never make random changes hoping something works.
If first approach fails → try a materially different approach (different algorithm, pattern, or library).
After 3 attempts:
1. Stop all edits.
2. Revert to last known working state.
3. Document what was attempted.
4. Consult Oracle with full failure context.
5. If Oracle can't resolve → ask the user.
Never leave code in a broken state. Never delete failing tests to "pass."
</failure_recovery>
7. DONE —
<completeness_contract>
Exit the loop ONLY when ALL of:
- Every planned task/todo item is marked completed
- Diagnostics are clean on all changed files
- Build passes (if applicable)
- User's original request is FULLY addressed — not partially, not "you can extend later"
- Any blocked items are explicitly marked [blocked] with what is missing
</completeness_contract>
Progress: report at phase transitions — before exploration, after discovery, before large edits, on blockers.
1-2 sentences each, outcome-based. Include one specific detail. Not upfront narration or scripted preambles.
</execution_loop>`;
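The VERIFY step above requires diagnostics on all changed files in parallel with zero errors. A minimal sketch of that contract (hypothetical helper — `verifyChangedFiles` and the injected `lspDiagnostics` callback are stand-ins, not the plugin's actual API):

```typescript
// Hypothetical sketch: run diagnostics for every changed file concurrently
// and require zero errors, mirroring the VERIFY step's contract.
async function verifyChangedFiles(
  changedFiles: string[],
  lspDiagnostics: (file: string) => Promise<string[]>, // stand-in for the real tool
): Promise<void> {
  // One diagnostics call per file, fired in parallel, then flattened.
  const results = await Promise.all(changedFiles.map((file) => lspDiagnostics(file)));
  const errors = results.flat();
  if (errors.length > 0) {
    throw new Error(`verification failed: ${errors.length} diagnostic error(s)`);
  }
}
```

Note the rejection path: a single error in any file fails the whole step, matching "actually clean, not 'probably clean.'"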
const delegationBlock = `<delegation>
## Delegation System
### Pre-delegation:
0. Find relevant skills via \`skill\` tool and load them. If the task context connects to ANY available skill — even loosely — load it without hesitation. Err on the side of inclusion.
${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${delegationTable}
### Delegation prompt structure (all 6 sections required):
@@ -258,16 +360,7 @@ ${delegationTable}
6. CONTEXT: File paths, existing patterns, constraints
\`\`\`
<dependency_checks>
Before taking an action, check whether prerequisite discovery, lookup, or retrieval steps are required.
Do not skip prerequisites just because the intended final action seems obvious.
If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
After delegation completes, verify:
- Does the result work as expected?
- Does it follow existing codebase patterns?
- Did the agent follow MUST DO and MUST NOT DO?
Post-delegation: delegation never substitutes for verification. Always run \`<verification_loop>\` on delegated results.
### Session continuity
@@ -278,76 +371,55 @@ Every \`task()\` returns a session_id. Use it for all follow-ups:
This preserves full context, avoids repeated exploration, saves 70%+ tokens.
### Code changes:
- Match existing patterns in disciplined codebases
- Propose approach first in chaotic codebases
- Never suppress type errors (\`as any\`, \`@ts-ignore\`, \`@ts-expect-error\`)
- Never commit unless explicitly requested
- Bugfix rule: fix minimally. Never refactor while fixing.
</implementation>
${oracleSection ? `### Oracle
<verification_loop>
Before finalizing any task:
- Correctness: does the output satisfy every requirement?
- Grounding: are claims backed by actual file contents or tool outputs, not memory?
- Evidence: run \`lsp_diagnostics\` on all changed files. Actually clean, not "probably clean."
- Tests: if they exist, run them. Actually pass, not "should pass."
- Delegation: if you delegated, read every file the subagent touched. Don't trust claims.
${oracleSection}` : ""}
</delegation>`;
A task is complete when:
- All planned todo items are marked done
- Diagnostics are clean on changed files
- Build passes (if applicable)
- User's original request is fully addressed
const styleBlock = `<style>
## Tone
If verification fails: fix issues caused by your changes. Do not fix pre-existing issues unless asked.
</verification_loop>
<failure_recovery>
When fixes fail:
1. Fix root causes, not symptoms.
2. Re-verify after every attempt.
3. Never make random changes hoping something works.
After 3 consecutive failures:
1. Stop all edits.
2. Revert to last known working state.
3. Document what was attempted.
4. Consult Oracle with full failure context.
5. If Oracle can't resolve → ask the user.
Never leave code in a broken state. Never delete failing tests to "pass."
</failure_recovery>
${oracleSection}
${taskManagementSection}
<style>
Write in complete, natural sentences. Avoid sentence fragments, bullet-only responses, and terse shorthand.
Before taking action on a non-trivial request, briefly explain how you plan to deliver the result. This gives the user a chance to course-correct early and builds trust in your approach. Keep this explanation to two or three sentences — enough to be clear, not so much that it delays progress.
Technical explanations should feel like a knowledgeable colleague walking you through something, not a spec sheet. Use plain language where possible, and when technical terms are necessary, make the surrounding context do the explanatory work.
When you encounter something worth commenting on — a tradeoff, a pattern choice, a potential issue — explain it clearly rather than suggesting alternatives. Instead of "You could try X" or "Should I do Y?", explain why something works the way it does and what the implications are. The user benefits more from understanding than from a menu of options.
When you encounter something worth commenting on — a tradeoff, a pattern choice, a potential issue — explain why something works the way it does and what the implications are. The user benefits more from understanding than from a menu of options.
Stay kind and approachable. Technical explanations should feel like a knowledgeable colleague walking you through something, not a spec sheet. Use plain language where possible, and when technical terms are necessary, make the surrounding context do the explanatory work.
Stay kind and approachable. Be concise in volume but generous in clarity. Every sentence should carry meaning. Skip empty preambles ("Great question!", "Sure thing!"), but do not skip context that helps the user follow your reasoning.
Be concise in volume but generous in clarity. Every sentence should carry meaning. Skip empty preambles ("Great question!", "Sure thing!"), but do not skip context that helps the user follow your reasoning.
If the user's approach has a problem, explain the concern directly and clearly, then describe the alternative you recommend and why it is better. Frame it as an explanation of what you found, not as a suggestion.
If the user's approach has a problem, explain the concern directly and clearly, then describe the alternative you recommend and why it is better. Do not frame this as a suggestion — frame it as an explanation of what you found.
</style>
## Output
<constraints>
${hardBlocks}
<output_contract>
- Default: 3-6 sentences or ≤5 bullets
- Simple yes/no: ≤2 sentences
- Complex multi-file: 1 overview paragraph + ≤5 tagged bullets (What, Where, Risks, Next, Open)
- Before taking action on a non-trivial request, briefly explain your plan in 2-3 sentences.
</output_contract>
${antiPatterns}
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request back to them.
- Do not shorten so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
</style>`;
Soft guidelines:
- Prefer existing libraries over new dependencies
- Prefer small, focused changes over large refactors
- When uncertain about scope, ask
</constraints>
`;
return `${identityBlock}
${constraintsBlock}
${intentBlock}
${exploreBlock}
${executionLoopBlock}
${delegationBlock}
${tasksSection}
${styleBlock}`;
}
export { categorizeTools };
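The refactor above replaces one monolithic template literal with named block constants joined in the final return. A minimal sketch of that composition pattern (placeholder block bodies only — not the real prompt text):

```typescript
// Minimal sketch of the named-block composition pattern; content is placeholder.
const identityBlock = `<identity>
Role and orchestrator bias go here.
</identity>`;

const constraintsBlock = `<constraints>
Hard blocks and anti-patterns go here.
</constraints>`;

// Joining named blocks with newlines mirrors the file's final return statement.
function composePrompt(...blocks: string[]): string {
  return blocks.join("\n");
}

const prompt = composePrompt(identityBlock, constraintsBlock);
```

Extracting each XML-tagged block into its own constant keeps every block independently editable while the return statement fixes the ordering.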


@@ -12,9 +12,9 @@ describe("isGpt5_4Model", () => {
test("does not match other GPT models", () => {
expect(isGpt5_4Model("openai/gpt-5.3-codex")).toBe(false);
expect(isGpt5_4Model("openai/gpt-5.2")).toBe(false);
expect(isGpt5_4Model("openai/gpt-5.1")).toBe(false);
expect(isGpt5_4Model("openai/gpt-4o")).toBe(false);
expect(isGpt5_4Model("github-copilot/gpt-5.2")).toBe(false);
expect(isGpt5_4Model("github-copilot/gpt-4o")).toBe(false);
});
test("does not match non-GPT models", () => {
@@ -26,7 +26,7 @@ describe("isGpt5_4Model", () => {
describe("isGptModel", () => {
test("standard openai provider gpt models", () => {
expect(isGptModel("openai/gpt-5.2")).toBe(true);
expect(isGptModel("openai/gpt-5.4")).toBe(true);
expect(isGptModel("openai/gpt-4o")).toBe(true);
});
@@ -39,22 +39,22 @@ describe("isGptModel", () => {
});
test("github copilot gpt models", () => {
expect(isGptModel("github-copilot/gpt-5.2")).toBe(true);
expect(isGptModel("github-copilot/gpt-5.4")).toBe(true);
expect(isGptModel("github-copilot/gpt-4o")).toBe(true);
});
test("litellm proxied gpt models", () => {
expect(isGptModel("litellm/gpt-5.2")).toBe(true);
expect(isGptModel("litellm/gpt-5.4")).toBe(true);
expect(isGptModel("litellm/gpt-4o")).toBe(true);
});
test("other proxied gpt models", () => {
expect(isGptModel("ollama/gpt-4o")).toBe(true);
expect(isGptModel("custom-provider/gpt-5.2")).toBe(true);
expect(isGptModel("custom-provider/gpt-5.4")).toBe(true);
});
test("venice provider gpt models", () => {
expect(isGptModel("venice/gpt-5.2")).toBe(true);
expect(isGptModel("venice/gpt-5.4")).toBe(true);
expect(isGptModel("venice/gpt-4o")).toBe(true);
});
@@ -108,7 +108,7 @@ describe("isGeminiModel", () => {
});
test("#given gpt models #then returns false", () => {
expect(isGeminiModel("openai/gpt-5.2")).toBe(false);
expect(isGeminiModel("openai/gpt-5.4")).toBe(false);
expect(isGeminiModel("openai/o3-mini")).toBe(false);
expect(isGeminiModel("litellm/gpt-4o")).toBe(false);
});
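The expectations above pin down the detection behavior: any `provider/gpt-*` id counts as a GPT model, while the 5.4 check must accept `gpt-5.4` under any provider prefix and reject `gpt-5.2` and `gpt-5.3-codex`. A hypothetical reimplementation consistent with those tests (inferred from the assertions, not the actual source):

```typescript
// Hypothetical predicates inferred from the test expectations above.
function isGptModel(model: string): boolean {
  // Any provider prefix (openai/, litellm/, venice/, ...) followed by a gpt-* id.
  return /(^|\/)gpt-/.test(model);
}

function isGpt5_4Model(model: string): boolean {
  // Match exactly the 5.4 minor version; reject 5.2, 5.3-codex, 4o, etc.
  return /(^|\/)gpt-5\.4($|[^0-9])/.test(model);
}
```

Anchoring on `(^|\/)` is what makes the checks provider-agnostic, which is what the litellm/venice/custom-provider cases exercise.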


@@ -39,14 +39,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("Sisyphus with GPT model override has reasoningEffort, no thinking", async () => {
// #given
const overrides = {
sisyphus: { model: "github-copilot/gpt-5.2" },
sisyphus: { model: "github-copilot/gpt-5.4" },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2")
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.4")
expect(agents.sisyphus.reasoningEffort).toBe("medium")
expect(agents.sisyphus.thinking).toBeUndefined()
})
@@ -54,9 +54,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("Atlas uses uiSelectedModel", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
try {
// #when
@@ -75,7 +75,7 @@ describe("createBuiltinAgents with model overrides", () => {
// #then
expect(agents.atlas).toBeDefined()
expect(agents.atlas.model).toBe("openai/gpt-5.2")
expect(agents.atlas.model).toBe("openai/gpt-5.4")
} finally {
fetchSpy.mockRestore()
}
@@ -84,9 +84,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("user config model takes priority over uiSelectedModel for sisyphus", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
const overrides = {
sisyphus: { model: "google/antigravity-claude-opus-4-5-thinking" },
}
@@ -117,9 +117,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("user config model takes priority over uiSelectedModel for atlas", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
const overrides = {
atlas: { model: "google/antigravity-claude-opus-4-5-thinking" },
}
@@ -173,8 +173,8 @@ describe("createBuiltinAgents with model overrides", () => {
// #when
const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then - oracle resolves via connected cache fallback to openai/gpt-5.2 (not system default)
expect(agents.oracle.model).toBe("openai/gpt-5.2")
// #then - oracle resolves via connected cache fallback to openai/gpt-5.4 (not system default)
expect(agents.oracle.model).toBe("openai/gpt-5.4")
expect(agents.oracle.reasoningEffort).toBe("medium")
expect(agents.oracle.thinking).toBeUndefined()
cacheSpy.mockRestore?.()
@@ -196,14 +196,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("Oracle with GPT model override has reasoningEffort, no thinking", async () => {
// #given
const overrides = {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.oracle.model).toBe("openai/gpt-5.2")
expect(agents.oracle.model).toBe("openai/gpt-5.4")
expect(agents.oracle.reasoningEffort).toBe("medium")
expect(agents.oracle.textVerbosity).toBe("high")
expect(agents.oracle.thinking).toBeUndefined()
@@ -228,14 +228,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("non-model overrides are still applied after factory rebuild", async () => {
// #given
const overrides = {
sisyphus: { model: "github-copilot/gpt-5.2", temperature: 0.5 },
sisyphus: { model: "github-copilot/gpt-5.4", temperature: 0.5 },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2")
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.4")
expect(agents.sisyphus.temperature).toBe(0.5)
})
@@ -261,7 +261,7 @@ describe("createBuiltinAgents with model overrides", () => {
"opencode/kimi-k2.5-free",
"zai-coding-plan/glm-5",
"opencode/big-pickle",
"openai/gpt-5.2",
"openai/gpt-5.4",
])
)
@@ -298,7 +298,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes hidden custom agents from orchestrator prompts", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -334,7 +334,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes disabled custom agents from orchestrator prompts", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -370,7 +370,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes custom agents when disabledAgents contains their name (case-insensitive)", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const disabledAgents = ["ReSeArChEr"]
@@ -406,7 +406,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("deduplicates custom agents case-insensitively", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -438,7 +438,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("sanitizes custom agent strings for markdown tables", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -479,7 +479,7 @@ describe("createBuiltinAgents without systemDefaultModel", () => {
// #then - connected cache enables model resolution despite no systemDefaultModel
expect(agents.oracle).toBeDefined()
expect(agents.oracle.model).toBe("openai/gpt-5.2")
expect(agents.oracle.model).toBe("openai/gpt-5.4")
cacheSpy.mockRestore?.()
})
@@ -787,7 +787,7 @@ describe("Atlas is unaffected by environment context toggle", () => {
beforeEach(() => {
fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
})
@@ -891,9 +891,9 @@ describe("createBuiltinAgents with requiresAnyModel gating (sisyphus)", () => {
})
test("sisyphus is not created when no fallback model is available and provider not connected", async () => {
// #given - only openai/gpt-5.2 available, not in sisyphus fallback chain
// #given - only venice/deepseek-v3.2 available, not in sisyphus fallback chain
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2"])
new Set(["venice/deepseek-v3.2"])
)
const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue([])
@@ -913,7 +913,7 @@ describe("createBuiltinAgents with requiresAnyModel gating (sisyphus)", () => {
// #given - user configures a model from a plugin provider (like antigravity)
// that is NOT in the availableModels cache and NOT in the fallback chain
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2"])
new Set(["openai/gpt-5.4"])
)
const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(
["openai"]
@@ -1021,7 +1021,7 @@ describe("buildAgent with category and skills", () => {
const categories = {
"custom-category": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "xhigh",
},
}
@@ -1030,7 +1030,7 @@ describe("buildAgent with category and skills", () => {
const agent = buildAgent(source["test-agent"], TEST_MODEL, categories)
// #then
expect(agent.model).toBe("openai/gpt-5.2")
expect(agent.model).toBe("openai/gpt-5.4")
expect(agent.variant).toBe("xhigh")
})
@@ -1247,7 +1247,7 @@ describe("override.category expansion in createBuiltinAgents", () => {
// #given - custom category has reasoningEffort=xhigh, direct override says "low"
const categories = {
"test-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "xhigh" as const,
},
}
@@ -1267,7 +1267,7 @@ describe("override.category expansion in createBuiltinAgents", () => {
// #given - custom category has reasoningEffort, no direct reasoningEffort in override
const categories = {
"reasoning-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "high" as const,
},
}


@@ -205,7 +205,7 @@ exports[`generateModelConfig single native provider uses OpenAI models when only
"model": "opencode/glm-4.7-free",
},
"metis": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"momus": {
@@ -213,17 +213,21 @@ exports[`generateModelConfig single native provider uses OpenAI models when only
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
"model": "openai/gpt-5.4",
"variant": "high",
},
"sisyphus": {
"model": "openai/gpt-5.4",
"variant": "medium",
},
},
"categories": {
"deep": {
@@ -274,7 +278,7 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa
"model": "opencode/glm-4.7-free",
},
"metis": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"momus": {
@@ -282,17 +286,21 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
"model": "openai/gpt-5.4",
"variant": "high",
},
"sisyphus": {
"model": "openai/gpt-5.4",
"variant": "medium",
},
},
"categories": {
"deep": {
@@ -472,11 +480,11 @@ exports[`generateModelConfig all native providers uses preferred models from fal
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -547,11 +555,11 @@ exports[`generateModelConfig all native providers uses preferred models with isM
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -623,11 +631,11 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models when on
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -698,11 +706,11 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models with is
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -773,7 +781,7 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models when
"model": "github-copilot/gemini-3-flash-preview",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -839,7 +847,7 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models with
"model": "github-copilot/gemini-3-flash-preview",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1017,11 +1025,11 @@ exports[`generateModelConfig mixed provider scenarios uses Claude + OpenCode Zen
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1092,11 +1100,11 @@ exports[`generateModelConfig mixed provider scenarios uses OpenAI + Copilot comb
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1294,11 +1302,11 @@ exports[`generateModelConfig mixed provider scenarios uses all fallback provider
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1369,11 +1377,11 @@ exports[`generateModelConfig mixed provider scenarios uses all providers togethe
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1444,11 +1452,11 @@ exports[`generateModelConfig mixed provider scenarios uses all providers with is
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {


@@ -40,7 +40,7 @@ Examples:
Model Providers (Priority: Native > Copilot > OpenCode Zen > Z.ai > Kimi):
Claude Native anthropic/ models (Opus, Sonnet, Haiku)
OpenAI Native openai/ models (GPT-5.2 for Oracle)
OpenAI Native openai/ models (GPT-5.4 for Oracle)
Gemini Native google/ models (Gemini 3 Pro, Flash)
Copilot github-copilot/ models (fallback)
OpenCode Zen opencode/ models (opencode/claude-opus-4-6, etc.)


@@ -249,12 +249,13 @@ describe("generateOmoConfig - model fallback system", () => {
// #when generating config
const result = generateOmoConfig(config)
// #then Sisyphus is omitted (requires all fallback providers)
expect((result.agents as Record<string, { model: string }>).sisyphus).toBeUndefined()
// #then Sisyphus resolves to gpt-5.4 medium (openai is now in sisyphus chain)
expect((result.agents as Record<string, { model: string; variant?: string }>).sisyphus.model).toBe("openai/gpt-5.4")
expect((result.agents as Record<string, { model: string; variant?: string }>).sisyphus.variant).toBe("medium")
// #then Oracle should use native OpenAI (first fallback entry)
expect((result.agents as Record<string, { model: string }>).oracle.model).toBe("openai/gpt-5.2")
// #then multimodal-looker should use native OpenAI (first fallback entry is gpt-5.3-codex)
expect((result.agents as Record<string, { model: string }>)["multimodal-looker"].model).toBe("openai/gpt-5.3-codex")
expect((result.agents as Record<string, { model: string }>).oracle.model).toBe("openai/gpt-5.4")
// #then multimodal-looker should use native OpenAI (first fallback entry is gpt-5.4)
expect((result.agents as Record<string, { model: string }>)["multimodal-looker"].model).toBe("openai/gpt-5.4")
})
test("uses haiku for explore when Claude max20", () => {


@@ -61,7 +61,7 @@ describe("model-resolution check", () => {
// given: User has override for visual-engineering category
const mockConfig = {
categories: {
"visual-engineering": { model: "openai/gpt-5.2" },
"visual-engineering": { model: "openai/gpt-5.4" },
},
}
@@ -70,8 +70,8 @@ describe("model-resolution check", () => {
// then: visual-engineering should show the override
const visual = info.categories.find((c) => c.name === "visual-engineering")
expect(visual).toBeDefined()
expect(visual!.userOverride).toBe("openai/gpt-5.2")
expect(visual!.effectiveResolution).toBe("User override: openai/gpt-5.2")
expect(visual!.userOverride).toBe("openai/gpt-5.4")
expect(visual!.effectiveResolution).toBe("User override: openai/gpt-5.4")
})
it("shows provider fallback when no override exists", async () => {
@@ -96,7 +96,7 @@ describe("model-resolution check", () => {
//#given User has model with variant override for oracle agent
const mockConfig = {
agents: {
oracle: { model: "openai/gpt-5.2", variant: "xhigh" },
oracle: { model: "openai/gpt-5.4", variant: "xhigh" },
},
}
@@ -106,7 +106,7 @@ describe("model-resolution check", () => {
//#then Oracle should have userVariant set
const oracle = info.agents.find((a) => a.name === "oracle")
expect(oracle).toBeDefined()
expect(oracle!.userOverride).toBe("openai/gpt-5.2")
expect(oracle!.userOverride).toBe("openai/gpt-5.4")
expect(oracle!.userVariant).toBe("xhigh")
})


@@ -32,7 +32,7 @@ export function formatConfigSummary(config: InstallConfig): string {
const claudeDetail = config.hasClaude ? (config.isMax20 ? "max20" : "standard") : undefined
lines.push(formatProvider("Claude", config.hasClaude, claudeDetail))
lines.push(formatProvider("OpenAI/ChatGPT", config.hasOpenAI, "GPT-5.2 for Oracle"))
lines.push(formatProvider("OpenAI/ChatGPT", config.hasOpenAI, "GPT-5.4 for Oracle"))
lines.push(formatProvider("Gemini", config.hasGemini))
lines.push(formatProvider("GitHub Copilot", config.hasCopilot, "fallback"))
lines.push(formatProvider("OpenCode Zen", config.hasOpencodeZen, "opencode/ models"))


@@ -13,6 +13,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
variant: "max",
},
{ providers: ["kimi-for-coding"], model: "k2p5" },
{ providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.4", variant: "medium" },
{ providers: ["zai-coding-plan", "opencode"], model: "glm-5" },
],
requiresAnyModel: true,
@@ -31,7 +32,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
fallbackChain: [
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
variant: "high",
},
{
@@ -67,7 +68,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
fallbackChain: [
{
providers: ["openai", "opencode"],
model: "gpt-5.3-codex",
model: "gpt-5.4",
variant: "medium",
},
{ providers: ["kimi-for-coding"], model: "k2p5" },
@@ -108,7 +109,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
{ providers: ["kimi-for-coding"], model: "k2p5" },
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
variant: "high",
},
{
@@ -224,7 +225,7 @@ export const CLI_CATEGORY_MODEL_REQUIREMENTS: Record<string, ModelRequirement> =
},
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
},
],
requiresModel: "gemini-3.1-pro",
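The new chain entry above is what lets OpenAI-only users keep Sisyphus: resolution walks the fallback chain in priority order and takes the first entry backed by a connected provider. A minimal standalone sketch of that selection logic (the types, function name, and chain literal here are illustrative, not the actual implementation):

```typescript
interface FallbackEntry {
  providers: string[]
  model: string
  variant?: string
}

// Walk the chain in priority order and return the first entry whose
// provider is connected, prefixed "provider/model" as agent configs expect.
function resolveModel(
  chain: FallbackEntry[],
  connected: Set<string>,
): { model: string; variant?: string } | undefined {
  for (const entry of chain) {
    const provider = entry.providers.find((p) => connected.has(p))
    if (provider) {
      return { model: `${provider}/${entry.model}`, variant: entry.variant }
    }
  }
  return undefined // agent is omitted when nothing in the chain matches
}

// Sketch of the sisyphus CLI chain after this change.
const sisyphusChain: FallbackEntry[] = [
  { providers: ["anthropic"], model: "claude-opus-4-6", variant: "max" },
  { providers: ["kimi-for-coding"], model: "k2p5" },
  { providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.4", variant: "medium" },
  { providers: ["zai-coding-plan", "opencode"], model: "glm-5" },
]

// An OpenAI-only user now lands on gpt-5.4 medium instead of no agent at all.
resolveModel(sisyphusChain, new Set(["openai"]))
// → { model: "openai/gpt-5.4", variant: "medium" }
```

This also explains the updated tests: a provider outside every entry (e.g. `venice`) still yields no agent, which is why the "not in sisyphus fallback chain" fixture had to move off `openai/gpt-5.2`.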


@@ -396,7 +396,7 @@ describe("generateModelConfig", () => {
expect(result.agents?.sisyphus?.model).toBe("anthropic/claude-opus-4-6")
})
test("Sisyphus is omitted when no fallback provider is available (OpenAI not in chain)", () => {
test("Sisyphus resolves to gpt-5.4 medium when only OpenAI is available", () => {
// #given
const config = createConfig({ hasOpenAI: true })
@@ -404,7 +404,8 @@ describe("generateModelConfig", () => {
const result = generateModelConfig(config)
// #then
expect(result.agents?.sisyphus).toBeUndefined()
expect(result.agents?.sisyphus?.model).toBe("openai/gpt-5.4")
expect(result.agents?.sisyphus?.variant).toBe("medium")
})
})


@@ -44,7 +44,7 @@ export async function promptInstallConfig(detected: DetectedConfig): Promise<Ins
message: "Do you have an OpenAI/ChatGPT Plus subscription?",
options: [
{ value: "no", label: "No", hint: "Oracle will use fallback models" },
{ value: "yes", label: "Yes", hint: "GPT-5.2 for Oracle (high-IQ debugging)" },
{ value: "yes", label: "Yes", hint: "GPT-5.4 for Oracle (high-IQ debugging)" },
],
initialValue: initial.openai,
})
@@ -74,7 +74,7 @@ export async function promptInstallConfig(detected: DetectedConfig): Promise<Ins
message: "Do you have access to OpenCode Zen (opencode/ models)?",
options: [
{ value: "no", label: "No", hint: "Will use other configured providers" },
{ value: "yes", label: "Yes", hint: "opencode/claude-opus-4-6, opencode/gpt-5.2, etc." },
{ value: "yes", label: "Yes", hint: "opencode/claude-opus-4-6, opencode/gpt-5.4, etc." },
],
initialValue: initial.opencodeZen,
})


@@ -266,7 +266,7 @@ describe("AgentOverrideConfigSchema", () => {
describe("backward compatibility", () => {
test("still accepts model field (deprecated)", () => {
// given
const config = { model: "openai/gpt-5.2" }
const config = { model: "openai/gpt-5.4" }
// when
const result = AgentOverrideConfigSchema.safeParse(config)
@@ -274,14 +274,14 @@ describe("AgentOverrideConfigSchema", () => {
// then
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.model).toBe("openai/gpt-5.2")
expect(result.data.model).toBe("openai/gpt-5.4")
}
})
test("accepts both model and category (deprecated usage)", () => {
// given - category should take precedence at runtime, but both should validate
const config = {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
category: "ultrabrain"
}
@@ -291,7 +291,7 @@ describe("AgentOverrideConfigSchema", () => {
// then
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.model).toBe("openai/gpt-5.2")
expect(result.data.model).toBe("openai/gpt-5.4")
expect(result.data.category).toBe("ultrabrain")
}
})
@@ -343,7 +343,7 @@ describe("AgentOverrideConfigSchema", () => {
describe("CategoryConfigSchema", () => {
test("accepts variant as optional string", () => {
// given
const config = { model: "openai/gpt-5.2", variant: "xhigh" }
const config = { model: "openai/gpt-5.4", variant: "xhigh" }
// when
const result = CategoryConfigSchema.safeParse(config)
@@ -371,7 +371,7 @@ describe("CategoryConfigSchema", () => {
test("rejects non-string variant", () => {
// given
const config = { model: "openai/gpt-5.2", variant: 123 }
const config = { model: "openai/gpt-5.4", variant: 123 }
// when
const result = CategoryConfigSchema.safeParse(config)
@@ -413,7 +413,7 @@ describe("Sisyphus-Junior agent override", () => {
const config = {
agents: {
"sisyphus-junior": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.2,
},
},
@@ -426,7 +426,7 @@ describe("Sisyphus-Junior agent override", () => {
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.agents?.["sisyphus-junior"]).toBeDefined()
expect(result.data.agents?.["sisyphus-junior"]?.model).toBe("openai/gpt-5.2")
expect(result.data.agents?.["sisyphus-junior"]?.model).toBe("openai/gpt-5.4")
expect(result.data.agents?.["sisyphus-junior"]?.temperature).toBe(0.2)
}
})


@@ -224,6 +224,12 @@ function stubNotifyParentSession(manager: BackgroundManager): void {
;(manager as unknown as { notifyParentSession: () => Promise<void> }).notifyParentSession = async () => {}
}
async function flushBackgroundNotifications(): Promise<void> {
for (let i = 0; i < 6; i++) {
await Promise.resolve()
}
}
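The helper drains queued promise chains without real timers: each `await Promise.resolve()` yields one microtask turn, so six turns is enough to settle the fire-and-forget notification chains the tests create. A self-contained sketch of the same idea (names here are illustrative):

```typescript
// Await a fixed number of microtask turns so promise chains queued by
// fire-and-forget calls settle before assertions run. The count only
// needs to exceed the longest .then() chain the code under test builds.
async function flushMicrotasks(turns = 6): Promise<void> {
  for (let i = 0; i < turns; i++) {
    await Promise.resolve()
  }
}

// Demo: a two-step .then() chain settles within the flushed turns.
async function demo(): Promise<boolean> {
  let settled = false
  void Promise.resolve()
    .then(() => {})
    .then(() => {
      settled = true
    })
  await flushMicrotasks()
  return settled
}
```

Flushing microtasks rather than using `setTimeout` keeps the tests deterministic and fast, at the cost of the hard-coded turn count having to track the deepest chain.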
function createToastRemoveTaskTracker(): { removeTaskCalls: string[]; resetToastManager: () => void } {
_resetTaskToastManagerForTesting()
const toastManager = initTaskToastManager({
@@ -1306,11 +1312,20 @@ describe("BackgroundManager.tryCompleteTask", () => {
expect(abortedSessionIDs).toEqual(["session-1"])
})
test("should clean pendingByParent even when notifyParentSession throws", async () => {
test("should clean pendingByParent even when promptAsync notification fails", async () => {
// given
;(manager as unknown as { notifyParentSession: () => Promise<void> }).notifyParentSession = async () => {
throw new Error("notify failed")
const client = {
session: {
prompt: async () => ({}),
promptAsync: async () => {
throw new Error("notify failed")
},
abort: async () => ({}),
messages: async () => ({ data: [] }),
},
}
manager.shutdown()
manager = new BackgroundManager({ client, directory: tmpdir() } as unknown as PluginInput)
const task: BackgroundTask = {
id: "task-pending-cleanup",
@@ -1424,7 +1439,7 @@ describe("BackgroundManager.tryCompleteTask", () => {
// then
expect(rejectedCount).toBe(0)
expect(promptBodies.length).toBe(2)
expect(promptBodies.some((b) => b.noReply === false)).toBe(true)
expect(promptBodies.filter((body) => body.noReply === false)).toHaveLength(1)
})
})
@@ -1932,7 +1947,6 @@ describe("BackgroundManager - Non-blocking Queue Integration", () => {
test("should cancel running task and release concurrency", async () => {
// given
const manager = createBackgroundManager()
stubNotifyParentSession(manager)
const concurrencyManager = getConcurrencyManager(manager)
const concurrencyKey = "test-provider/test-model"
@@ -2078,7 +2092,7 @@ describe("BackgroundManager - Non-blocking Queue Integration", () => {
description: "Task 2",
prompt: "Do something else",
agent: "test-agent",
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-5.4" },
parentSessionID: "parent-session",
parentMessageID: "parent-message",
}
@@ -2890,7 +2904,7 @@ describe("BackgroundManager.shutdown session abort", () => {
})
describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
test("should cancel descendant tasks when parent session is deleted", () => {
test("should cancel descendant tasks and keep them until delayed cleanup", async () => {
// given
const manager = createBackgroundManager()
const parentSessionID = "session-parent"
@@ -2937,21 +2951,26 @@ describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
properties: { info: { id: parentSessionID } },
})
await flushBackgroundNotifications()
// then
expect(taskMap.has(childTask.id)).toBe(false)
expect(taskMap.has(siblingTask.id)).toBe(false)
expect(taskMap.has(grandchildTask.id)).toBe(false)
expect(taskMap.has(childTask.id)).toBe(true)
expect(taskMap.has(siblingTask.id)).toBe(true)
expect(taskMap.has(grandchildTask.id)).toBe(true)
expect(taskMap.has(unrelatedTask.id)).toBe(true)
expect(childTask.status).toBe("cancelled")
expect(siblingTask.status).toBe("cancelled")
expect(grandchildTask.status).toBe("cancelled")
expect(pendingByParent.get(parentSessionID)).toBeUndefined()
expect(pendingByParent.get("session-child")).toBeUndefined()
expect(getCompletionTimers(manager).has(childTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(siblingTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(grandchildTask.id)).toBe(true)
manager.shutdown()
})
test("should remove tasks from toast manager when session is deleted", () => {
test("should remove cancelled tasks from toast manager while preserving delayed cleanup", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -2980,9 +2999,13 @@ describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
properties: { info: { id: parentSessionID } },
})
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(childTask.id)
expect(removeTaskCalls).toContain(grandchildTask.id)
expect(getCompletionTimers(manager).has(childTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(grandchildTask.id)).toBe(true)
manager.shutdown()
resetToastManager()
@@ -3045,7 +3068,7 @@ describe("BackgroundManager.handleEvent - session.error", () => {
return task
}
test("sets task to error, releases concurrency, and cleans up", async () => {
test("sets task to error, releases concurrency, and keeps it until delayed cleanup", async () => {
//#given
const manager = createBackgroundManager()
const concurrencyManager = getConcurrencyManager(manager)
@@ -3078,18 +3101,21 @@ describe("BackgroundManager.handleEvent - session.error", () => {
},
})
await flushBackgroundNotifications()
//#then
expect(task.status).toBe("error")
expect(task.error).toBe("Model not found: kimi-for-coding/k2p5.")
expect(task.completedAt).toBeInstanceOf(Date)
expect(concurrencyManager.getCount(concurrencyKey)).toBe(0)
expect(getTaskMap(manager).has(task.id)).toBe(false)
expect(getTaskMap(manager).has(task.id)).toBe(true)
expect(getPendingByParent(manager).get(task.parentSessionID)).toBeUndefined()
expect(getCompletionTimers(manager).has(task.id)).toBe(true)
manager.shutdown()
})
test("removes errored task from toast manager", () => {
test("should remove errored task from toast manager while preserving delayed cleanup", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -3111,8 +3137,11 @@ describe("BackgroundManager.handleEvent - session.error", () => {
},
})
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(task.id)
expect(getCompletionTimers(manager).has(task.id)).toBe(true)
manager.shutdown()
resetToastManager()
@@ -3393,7 +3422,7 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
manager.shutdown()
})
test("removes stale task from toast manager", () => {
test("removes stale task from toast manager", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -3408,6 +3437,7 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
//#when
pruneStaleTasksAndNotificationsForTest(manager)
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(staleTask.id)
@@ -3415,6 +3445,53 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
manager.shutdown()
resetToastManager()
})
test("keeps stale task until notification cleanup after notifying parent", async () => {
//#given
const notifications: string[] = []
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const client = {
session: {
prompt: async () => ({}),
promptAsync: async (args: { path: { id: string }; body: Record<string, unknown> & { noReply?: boolean; parts?: unknown[] } }) => {
const firstPart = args.body.parts?.[0]
if (firstPart && typeof firstPart === "object" && "text" in firstPart && typeof firstPart.text === "string") {
notifications.push(firstPart.text)
}
return {}
},
abort: async () => ({}),
messages: async () => ({ data: [] }),
},
}
const manager = new BackgroundManager({ client, directory: tmpdir() } as unknown as PluginInput)
const staleTask = createMockTask({
id: "task-stale-notify-cleanup",
sessionID: "session-stale-notify-cleanup",
parentSessionID: "parent-stale-notify-cleanup",
status: "running",
startedAt: new Date(Date.now() - 31 * 60 * 1000),
})
getTaskMap(manager).set(staleTask.id, staleTask)
getPendingByParent(manager).set(staleTask.parentSessionID, new Set([staleTask.id]))
//#when
pruneStaleTasksAndNotificationsForTest(manager)
await flushBackgroundNotifications()
//#then
const retainedTask = getTaskMap(manager).get(staleTask.id)
expect(retainedTask?.status).toBe("error")
expect(getTaskMap(manager).has(staleTask.id)).toBe(true)
expect(notifications).toHaveLength(1)
expect(notifications[0]).toContain("[ALL BACKGROUND TASKS COMPLETE]")
expect(notifications[0]).toContain(staleTask.description)
expect(getCompletionTimers(manager).has(staleTask.id)).toBe(true)
expect(removeTaskCalls).toContain(staleTask.id)
manager.shutdown()
resetToastManager()
})
})
describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
@@ -3518,7 +3595,7 @@ describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
expect(completionTimers.size).toBe(0)
})
test("should cancel timer when task is deleted via session.deleted", () => {
test("should preserve cleanup timer when terminal task session is deleted", () => {
// given
const manager = createBackgroundManager()
const task: BackgroundTask = {
@@ -3547,7 +3624,7 @@ describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
})
// then
expect(completionTimers.has(task.id)).toBe(false)
expect(completionTimers.has(task.id)).toBe(true)
manager.shutdown()
})


@@ -390,7 +390,6 @@ export class BackgroundManager {
}).catch(() => {})
this.markForNotification(existingTask)
this.cleanupPendingByParent(existingTask)
this.enqueueNotificationForParent(existingTask.parentSessionID, () => this.notifyParentSession(existingTask)).catch(err => {
log("[background-agent] Failed to notify on error:", err)
})
@@ -661,7 +660,6 @@ export class BackgroundManager {
}
this.markForNotification(existingTask)
this.cleanupPendingByParent(existingTask)
this.enqueueNotificationForParent(existingTask.parentSessionID, () => this.notifyParentSession(existingTask)).catch(err => {
log("[background-agent] Failed to notify on resume error:", err)
})
@@ -804,16 +802,14 @@ export class BackgroundManager {
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
this.tasks.delete(task.id)
this.clearNotificationsForTask(task.id)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(task.id)
}
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
SessionCategoryRegistry.remove(task.sessionID)
}
this.markForNotification(task)
this.enqueueNotificationForParent(task.parentSessionID, () => this.notifyParentSession(task)).catch(err => {
log("[background-agent] Error in notifyParentSession for errored task:", { taskId: task.id, error: err })
})
}
if (event.type === "session.deleted") {
@@ -834,47 +830,30 @@ export class BackgroundManager {
if (tasksToCancel.size === 0) return
const deletedSessionIDs = new Set<string>([sessionID])
for (const task of tasksToCancel.values()) {
if (task.sessionID) {
deletedSessionIDs.add(task.sessionID)
}
}
for (const task of tasksToCancel.values()) {
if (task.status === "running" || task.status === "pending") {
void this.cancelTask(task.id, {
source: "session.deleted",
reason: "Session deleted",
skipNotification: true,
}).then(() => {
if (deletedSessionIDs.has(task.parentSessionID)) {
this.pendingNotifications.delete(task.parentSessionID)
}
}).catch(err => {
if (deletedSessionIDs.has(task.parentSessionID)) {
this.pendingNotifications.delete(task.parentSessionID)
}
log("[background-agent] Failed to cancel task on session.deleted:", { taskId: task.id, error: err })
})
}
const existingTimer = this.completionTimers.get(task.id)
if (existingTimer) {
clearTimeout(existingTimer)
this.completionTimers.delete(task.id)
}
const idleTimer = this.idleDeferralTimers.get(task.id)
if (idleTimer) {
clearTimeout(idleTimer)
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
this.tasks.delete(task.id)
this.clearNotificationsForTask(task.id)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(task.id)
}
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
}
}
for (const task of tasksToCancel.values()) {
if (task.parentSessionID) {
this.pendingNotifications.delete(task.parentSessionID)
}
}
SessionCategoryRegistry.remove(sessionID)
}
@@ -1094,8 +1073,6 @@ export class BackgroundManager {
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
if (abortSession && task.sessionID) {
this.client.session.abort({
path: { id: task.sessionID },
@@ -1202,9 +1179,6 @@ export class BackgroundManager {
this.markForNotification(task)
// Ensure pending tracking is cleaned up even if notification fails
this.cleanupPendingByParent(task)
const idleTimer = this.idleDeferralTimers.get(task.id)
if (idleTimer) {
clearTimeout(idleTimer)
@@ -1260,7 +1234,10 @@ export class BackgroundManager {
this.pendingByParent.delete(task.parentSessionID)
}
} else {
allComplete = true
remainingCount = Array.from(this.tasks.values())
.filter(t => t.parentSessionID === task.parentSessionID && t.id !== task.id && (t.status === "running" || t.status === "pending"))
.length
allComplete = remainingCount === 0
}
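Completion is now decided by counting live siblings instead of assuming the current task was the last one, which is what makes the "[ALL BACKGROUND TASKS COMPLETE]" notification accurate when terminal tasks linger in the map. A hypothetical standalone form of that count:

```typescript
interface Sibling {
  id: string
  parentSessionID: string
  status: string
}

// A parent is "all complete" when no OTHER task of the same parent is
// still pending or running; terminal tasks retained for delayed cleanup
// no longer block the notification.
function remainingActive(tasks: Sibling[], current: Sibling): number {
  return tasks.filter(
    (t) =>
      t.parentSessionID === current.parentSessionID &&
      t.id !== current.id &&
      (t.status === "running" || t.status === "pending"),
  ).length
}
```

With this, `allComplete` is simply `remainingActive(...) === 0`.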
const completedTasks = allComplete
@@ -1268,7 +1245,13 @@ export class BackgroundManager {
.filter(t => t.parentSessionID === task.parentSessionID && t.status !== "running" && t.status !== "pending")
: []
const statusText = task.status === "completed" ? "COMPLETED" : task.status === "interrupt" ? "INTERRUPTED" : "CANCELLED"
const statusText = task.status === "completed"
? "COMPLETED"
: task.status === "interrupt"
? "INTERRUPTED"
: task.status === "error"
? "ERROR"
: "CANCELLED"
const errorInfo = task.error ? `\n**Error:** ${task.error}` : ""
let notification: string
@@ -1399,8 +1382,13 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
}
const timer = setTimeout(() => {
this.completionTimers.delete(taskId)
if (this.tasks.has(taskId)) {
const taskToRemove = this.tasks.get(taskId)
if (taskToRemove) {
this.clearNotificationsForTask(taskId)
if (taskToRemove.sessionID) {
subagentSessions.delete(taskToRemove.sessionID)
SessionCategoryRegistry.remove(taskToRemove.sessionID)
}
this.tasks.delete(taskId)
log("[background-agent] Removed completed task from memory:", taskId)
}
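The timer callback above now clears the session registries before dropping the task, so a terminal task stays queryable until its delayed cleanup fires. A simplified sketch of the pattern (map names and the delay are illustrative):

```typescript
// Deferred cleanup: a terminal task stays in the map until its timer
// fires, so the parent session can still fetch its output; the timer
// handle is tracked so shutdown can cancel it.
const tasks = new Map<string, { id: string; sessionID?: string }>()
const completionTimers = new Map<string, ReturnType<typeof setTimeout>>()

function scheduleCleanup(taskId: string, delayMs: number): void {
  const timer = setTimeout(() => {
    completionTimers.delete(taskId)
    const task = tasks.get(taskId)
    if (task) {
      // release per-session registries here before dropping the task itself
      tasks.delete(taskId)
    }
  }, delayMs)
  completionTimers.set(taskId, timer)
}
```

This is why the session.deleted tests now expect `completionTimers.has(task.id)` to be `true`: cancelling no longer tears the timer down eagerly.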
@@ -1435,11 +1423,21 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
task.status = "error"
task.error = errorMessage
task.completedAt = new Date()
this.taskHistory.record(task.parentSessionID, { id: task.id, sessionID: task.sessionID, agent: task.agent, description: task.description, status: "error", category: task.category, startedAt: task.startedAt, completedAt: task.completedAt })
if (task.concurrencyKey) {
this.concurrencyManager.release(task.concurrencyKey)
task.concurrencyKey = undefined
}
this.cleanupPendingByParent(task)
const existingTimer = this.completionTimers.get(taskId)
if (existingTimer) {
clearTimeout(existingTimer)
this.completionTimers.delete(taskId)
}
const idleTimer = this.idleDeferralTimers.get(taskId)
if (idleTimer) {
clearTimeout(idleTimer)
this.idleDeferralTimers.delete(taskId)
}
if (wasPending) {
const key = task.model
? `${task.model.providerID}/${task.model.modelID}`
@@ -1455,16 +1453,10 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
}
}
}
this.clearNotificationsForTask(taskId)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(taskId)
}
this.tasks.delete(taskId)
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
SessionCategoryRegistry.remove(task.sessionID)
}
this.markForNotification(task)
this.enqueueNotificationForParent(task.parentSessionID, () => this.notifyParentSession(task)).catch(err => {
log("[background-agent] Error in notifyParentSession for stale-pruned task:", { taskId: task.id, error: err })
})
},
})
}


@@ -422,4 +422,38 @@ describe("pruneStaleTasksAndNotifications", () => {
//#then
expect(pruned).toContain("old-task")
})
it("should skip terminal tasks even when they exceeded TTL", () => {
//#given
const tasks = new Map<string, BackgroundTask>()
const oldStartedAt = new Date(Date.now() - 31 * 60 * 1000)
const terminalStatuses: BackgroundTask["status"][] = ["completed", "error", "cancelled", "interrupt"]
for (const status of terminalStatuses) {
tasks.set(status, {
id: status,
parentSessionID: "parent",
parentMessageID: "msg",
description: status,
prompt: status,
agent: "explore",
status,
startedAt: oldStartedAt,
completedAt: new Date(),
})
}
const pruned: string[] = []
//#when
pruneStaleTasksAndNotifications({
tasks,
notifications: new Map<string, BackgroundTask[]>(),
onTaskPruned: (taskId) => pruned.push(taskId),
})
//#then
expect(pruned).toEqual([])
expect(Array.from(tasks.keys())).toEqual(terminalStatuses)
})
})

View File

@@ -12,6 +12,13 @@ import {
TASK_TTL_MS,
} from "./constants"
const TERMINAL_TASK_STATUSES = new Set<BackgroundTask["status"]>([
"completed",
"error",
"cancelled",
"interrupt",
])
export function pruneStaleTasksAndNotifications(args: {
tasks: Map<string, BackgroundTask>
notifications: Map<string, BackgroundTask[]>
@@ -21,6 +28,8 @@ export function pruneStaleTasksAndNotifications(args: {
const now = Date.now()
for (const [taskId, task] of tasks.entries()) {
if (TERMINAL_TASK_STATUSES.has(task.status)) continue
const timestamp = task.status === "pending"
? task.queuedAt?.getTime()
: task.startedAt?.getTime()
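Condensed into a self-contained sketch (the task shape and the 30-minute TTL here are simplified assumptions, not the real `BackgroundTask` type or `TASK_TTL_MS` constant), the guard behaves like this:

```typescript
// Simplified, hypothetical model of the stale-pruning guard shown above.
type TaskStatus = "pending" | "running" | "completed" | "error" | "cancelled" | "interrupt"
interface Task { id: string; status: TaskStatus; queuedAt?: Date; startedAt?: Date }

const TERMINAL_TASK_STATUSES = new Set<TaskStatus>(["completed", "error", "cancelled", "interrupt"])
const TTL_MS = 30 * 60 * 1000 // assumed 30-minute TTL for illustration

function pruneStale(tasks: Map<string, Task>, now: number): string[] {
  const pruned: string[] = []
  for (const [taskId, task] of tasks.entries()) {
    // Terminal tasks are never TTL-pruned; they wait for parent notification cleanup
    if (TERMINAL_TASK_STATUSES.has(task.status)) continue
    // Pending tasks age from queue time, everything else from start time
    const timestamp = task.status === "pending" ? task.queuedAt?.getTime() : task.startedAt?.getTime()
    if (timestamp !== undefined && now - timestamp > TTL_MS) pruned.push(taskId)
  }
  return pruned
}
```

With this model, a completed task older than the TTL survives pruning while a running task of the same age does not, which is exactly what the new test below asserts.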

View File

@@ -1,7 +1,7 @@
import type { CommandDefinition } from "../claude-code-command-loader"
import type { BuiltinCommandName, BuiltinCommands } from "./types"
import { INIT_DEEP_TEMPLATE } from "./templates/init-deep"
import { RALPH_LOOP_TEMPLATE, CANCEL_RALPH_TEMPLATE } from "./templates/ralph-loop"
import { RALPH_LOOP_TEMPLATE, ULW_LOOP_TEMPLATE, CANCEL_RALPH_TEMPLATE } from "./templates/ralph-loop"
import { STOP_CONTINUATION_TEMPLATE } from "./templates/stop-continuation"
import { REFACTOR_TEMPLATE } from "./templates/refactor"
import { START_WORK_TEMPLATE } from "./templates/start-work"
@@ -31,16 +31,16 @@ $ARGUMENTS
argumentHint: '"task description" [--completion-promise=TEXT] [--max-iterations=N] [--strategy=reset|continue]',
},
"ulw-loop": {
description: "(builtin) Start ultrawork loop - continues until completion with ultrawork mode",
template: `<command-instruction>
${RALPH_LOOP_TEMPLATE}
description: "(builtin) Start ultrawork loop - continues until completion with ultrawork mode",
template: `<command-instruction>
${ULW_LOOP_TEMPLATE}
</command-instruction>
<user-task>
$ARGUMENTS
</user-task>`,
argumentHint: '"task description" [--completion-promise=TEXT] [--max-iterations=N] [--strategy=reset|continue]',
},
argumentHint: '"task description" [--completion-promise=TEXT] [--strategy=reset|continue]',
},
"cancel-ralph": {
description: "(builtin) Cancel active Ralph Loop",
template: `<command-instruction>

View File

@@ -28,6 +28,34 @@ Parse the arguments below and begin working on the task. The format is:
Default completion promise is "DONE" and default max iterations is 100.`
export const ULW_LOOP_TEMPLATE = `You are starting an ULTRAWORK Loop - a self-referential development loop that runs until verified completion.
## How ULTRAWORK Loop Works
1. You will work on the task continuously
2. When you believe the work is complete, output: \`<promise>{{COMPLETION_PROMISE}}</promise>\`
3. That does NOT finish the loop yet. The system will require Oracle verification
4. The loop only ends after the system confirms Oracle verified the result
5. There is no iteration limit
## Rules
- Focus on finishing the task completely
- After you emit the completion promise, run Oracle verification when instructed
- Do not treat DONE as final completion until Oracle verifies it
## Exit Conditions
1. **Verified Completion**: Oracle verifies the result and the system confirms it
2. **Cancel**: User runs \`/cancel-ralph\`
## Your Task
Parse the arguments below and begin working on the task. The format is:
\`"task description" [--completion-promise=TEXT] [--strategy=reset|continue]\`
Default completion promise is "DONE".`
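The exit flow the ULTRAWORK template describes can be modeled as a small two-stage state machine. This is a sketch under the assumption that Oracle's verdict arrives as a boolean; the real loop tracks this through `RalphLoopState` fields:

```typescript
// Hypothetical two-stage completion model: emitting the promise only moves the
// loop into verification; only Oracle approval ends it.
type LoopPhase = "working" | "verification_pending" | "complete"

function advance(phase: LoopPhase, promiseEmitted: boolean, oracleVerified: boolean): LoopPhase {
  if (phase === "working" && promiseEmitted) return "verification_pending" // DONE alone is not final
  if (phase === "verification_pending" && oracleVerified) return "complete"
  return phase
}
```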
export const CANCEL_RALPH_TEMPLATE = `Cancel the currently active Ralph Loop.
This will:

View File

@@ -162,7 +162,7 @@ describe("createAnthropicEffortHook", () => {
const hook = createAnthropicEffortHook()
const { input, output } = createMockParams({
providerID: "openai",
modelID: "gpt-5.2",
modelID: "gpt-5.4",
})
//#when chat.params hook is called

View File

@@ -0,0 +1,108 @@
declare const require: (name: string) => any
const { afterEach, beforeEach, describe, expect, mock, test } = require("bun:test")
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"
import { randomUUID } from "node:crypto"
import { clearBoulderState, writeBoulderState } from "../../features/boulder-state"
import { _resetForTesting } from "../../features/claude-code-session-state"
import type { BoulderState } from "../../features/boulder-state"
const TEST_STORAGE_ROOT = join(tmpdir(), `atlas-compaction-storage-${randomUUID()}`)
const TEST_MESSAGE_STORAGE = join(TEST_STORAGE_ROOT, "message")
const TEST_PART_STORAGE = join(TEST_STORAGE_ROOT, "part")
mock.module("../../features/hook-message-injector/constants", () => ({
OPENCODE_STORAGE: TEST_STORAGE_ROOT,
MESSAGE_STORAGE: TEST_MESSAGE_STORAGE,
PART_STORAGE: TEST_PART_STORAGE,
}))
mock.module("../../shared/opencode-message-dir", () => ({
getMessageDir: (sessionID: string) => {
const directory = join(TEST_MESSAGE_STORAGE, sessionID)
return existsSync(directory) ? directory : null
},
}))
mock.module("../../shared/opencode-storage-detection", () => ({
isSqliteBackend: () => false,
}))
const { createAtlasHook } = await import("./index")
describe("atlas hook compaction agent filtering", () => {
let testDirectory: string
function createMockPluginInput() {
const promptMock = mock(() => Promise.resolve())
return {
directory: testDirectory,
client: {
session: {
prompt: promptMock,
promptAsync: promptMock,
},
},
_promptMock: promptMock,
} as Parameters<typeof createAtlasHook>[0] & { _promptMock: ReturnType<typeof mock> }
}
function writeMessage(sessionID: string, fileName: string, agent: string): void {
const messageDir = join(TEST_MESSAGE_STORAGE, sessionID)
mkdirSync(messageDir, { recursive: true })
writeFileSync(
join(messageDir, fileName),
JSON.stringify({
agent,
model: { providerID: "anthropic", modelID: "claude-opus-4-6" },
}),
)
}
beforeEach(() => {
testDirectory = join(tmpdir(), `atlas-compaction-test-${randomUUID()}`)
mkdirSync(testDirectory, { recursive: true })
clearBoulderState(testDirectory)
_resetForTesting()
})
afterEach(() => {
clearBoulderState(testDirectory)
rmSync(testDirectory, { recursive: true, force: true })
_resetForTesting()
})
test("should inject continuation when the latest message is compaction but the previous agent matches atlas", async () => {
// given
const sessionID = "main-session-after-compaction"
const planPath = join(testDirectory, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [ ] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [sessionID],
plan_name: "test-plan",
agent: "atlas",
}
writeBoulderState(testDirectory, state)
writeMessage(sessionID, "msg_001.json", "atlas")
writeMessage(sessionID, "msg_002.json", "compaction")
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID },
},
})
// then
expect(mockInput._promptMock).toHaveBeenCalledTimes(1)
})
})

View File

@@ -409,6 +409,123 @@ describe("atlas hook", () => {
cleanupMessageStorage(sessionID)
})
describe("completion gate output ordering", () => {
const COMPLETION_GATE_SESSION = "completion-gate-order-test"
beforeEach(() => {
setupMessageStorage(COMPLETION_GATE_SESSION, "atlas")
})
afterEach(() => {
cleanupMessageStorage(COMPLETION_GATE_SESSION)
})
test("should include completion gate before Subagent Response in transformed boulder output", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - completion gate should appear BEFORE Subagent Response
const subagentResponseIndex = output.output.indexOf("**Subagent Response:**")
const completionGateIndex = output.output.indexOf("COMPLETION GATE")
expect(completionGateIndex).toBeGreaterThanOrEqual(0)
expect(subagentResponseIndex).toBeGreaterThanOrEqual(0)
expect(completionGateIndex).toBeLessThan(subagentResponseIndex)
})
test("should include completion gate before verification phase text", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - completion gate should appear BEFORE verification phase text
const completionGateIndex = output.output.indexOf("COMPLETION GATE")
const lyingIndex = output.output.indexOf("LYING")
const phase1Index = output.output.indexOf("PHASE 1")
expect(completionGateIndex).toBeGreaterThanOrEqual(0)
expect(lyingIndex).toBeGreaterThanOrEqual(0)
expect(completionGateIndex).toBeLessThan(lyingIndex)
if (phase1Index !== -1) {
expect(completionGateIndex).toBeLessThan(phase1Index)
}
})
test("should not contain old STEP 7 MARK COMPLETION IN PLAN FILE text", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - old STEP 7 MARK COMPLETION IN PLAN FILE should be absent
expect(output.output).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
expect(output.output).not.toContain("MARK COMPLETION IN PLAN FILE")
})
})
describe("Write/Edit tool direct work reminder", () => {
const ORCHESTRATOR_SESSION = "orchestrator-write-test"

View File

@@ -0,0 +1,46 @@
const { describe, expect, mock, test } = require("bun:test")
mock.module("../../shared", () => ({
getMessageDir: () => null,
isSqliteBackend: () => true,
normalizeSDKResponse: <TData>(response: { data?: TData }, fallback: TData, _options?: { preferResponseOnMissingData?: boolean }): TData => response.data ?? fallback,
}))
const { getLastAgentFromSession } = await import("./session-last-agent")
function createMockClient(messages: Array<{ info?: { agent?: string } }>) {
return {
session: {
messages: async () => ({ data: messages }),
},
}
}
describe("getLastAgentFromSession sqlite branch", () => {
test("should skip compaction and return the previous real agent from sqlite messages", async () => {
// given
const client = createMockClient([
{ info: { agent: "atlas" } },
{ info: { agent: "compaction" } },
])
// when
const result = await getLastAgentFromSession("ses_sqlite_compaction", client)
// then
expect(result).toBe("atlas")
})
test("should return null when sqlite history contains only compaction", async () => {
// given
const client = createMockClient([{ info: { agent: "compaction" } }])
// when
const result = await getLastAgentFromSession("ses_sqlite_only_compaction", client)
// then
expect(result).toBeNull()
})
})
export {}

View File

@@ -1,24 +1,65 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { readFileSync, readdirSync } from "node:fs"
import { join } from "node:path"
import { findNearestMessageWithFields } from "../../features/hook-message-injector"
import { findNearestMessageWithFieldsFromSDK } from "../../features/hook-message-injector"
import { getMessageDir, isSqliteBackend } from "../../shared"
import { getMessageDir, isSqliteBackend, normalizeSDKResponse } from "../../shared"
type OpencodeClient = PluginInput["client"]
type SessionMessagesClient = {
session: {
messages: (input: { path: { id: string } }) => Promise<unknown>
}
}
function isCompactionAgent(agent: unknown): boolean {
return typeof agent === "string" && agent.toLowerCase() === "compaction"
}
function getLastAgentFromMessageDir(messageDir: string): string | null {
try {
const files = readdirSync(messageDir)
.filter((fileName) => fileName.endsWith(".json"))
.sort()
.reverse()
for (const fileName of files) {
try {
const content = readFileSync(join(messageDir, fileName), "utf-8")
const parsed = JSON.parse(content) as { agent?: unknown }
if (typeof parsed.agent === "string" && !isCompactionAgent(parsed.agent)) {
return parsed.agent.toLowerCase()
}
} catch {
continue
}
}
} catch {
return null
}
return null
}
export async function getLastAgentFromSession(
sessionID: string,
client?: OpencodeClient
client?: SessionMessagesClient
): Promise<string | null> {
let nearest = null
if (isSqliteBackend() && client) {
nearest = await findNearestMessageWithFieldsFromSDK(client, sessionID)
} else {
const messageDir = getMessageDir(sessionID)
if (!messageDir) return null
nearest = findNearestMessageWithFields(messageDir)
const response = await client.session.messages({ path: { id: sessionID } })
const messages = normalizeSDKResponse(response, [] as Array<{ info?: { agent?: string } }>, {
preferResponseOnMissingData: true,
})
for (let i = messages.length - 1; i >= 0; i--) {
const agent = messages[i].info?.agent
if (typeof agent === "string" && !isCompactionAgent(agent)) {
return agent.toLowerCase()
}
}
return null
}
return nearest?.agent?.toLowerCase() ?? null
const messageDir = getMessageDir(sessionID)
if (!messageDir) return null
return getLastAgentFromMessageDir(messageDir)
}
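Both branches reduce to the same newest-first scan that skips synthetic `compaction` entries; a minimal sketch over the SDK message shape (the shape here mirrors the test mocks, not a verified SDK type):

```typescript
// Hypothetical condensed form of the compaction-skipping scan used by both backends
type SdkMessage = { info?: { agent?: string } }

function lastRealAgent(messages: SdkMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const agent = messages[i].info?.agent
    // "compaction" messages are synthetic; keep walking back to the real agent
    if (typeof agent === "string" && agent.toLowerCase() !== "compaction") {
      return agent.toLowerCase()
    }
  }
  return null
}
```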

View File

@@ -0,0 +1,37 @@
import { describe, it, expect } from "bun:test"
import { BOULDER_CONTINUATION_PROMPT } from "./system-reminder-templates"
describe("BOULDER_CONTINUATION_PROMPT", () => {
describe("checkbox-first priority rules", () => {
it("first rule after RULES: mentions both reading the plan AND marking a still-unchecked completed task", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const firstRule = rulesSection.split("\n")[1]!.trim()
expect(firstRule).toContain("Read the plan")
expect(firstRule).toContain("mark")
expect(firstRule).toContain("completed")
})
it("first rule includes IMMEDIATELY keyword", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const firstRule = rulesSection.split("\n")[1]!.trim()
expect(firstRule).toContain("IMMEDIATELY")
})
it("checkbox-marking guidance appears BEFORE Proceed without asking for permission", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const checkboxMarkingMatch = rulesSection.match(/- \[x\]/i)
const proceedMatch = rulesSection.match(/Proceed without asking for permission/)
expect(checkboxMarkingMatch).not.toBeNull()
expect(proceedMatch).not.toBeNull()
const checkboxPosition = checkboxMarkingMatch!.index
const proceedPosition = proceedMatch!.index
expect(checkboxPosition).toBeLessThan(proceedPosition)
})
})
})

View File

@@ -33,9 +33,8 @@ export const BOULDER_CONTINUATION_PROMPT = `${createSystemDirective(SystemDirect
You have an active work plan with incomplete tasks. Continue working.
RULES:
- **FIRST**: Read the plan file NOW to check exact current progress — count remaining \`- [ ]\` tasks
- **FIRST**: Read the plan file NOW. If the last completed task is still unchecked, mark it \`- [x]\` IMMEDIATELY before anything else
- Proceed without asking for permission
- Change \`- [ ]\` to \`- [x]\` in the plan file when done
- Use the notepad at .sisyphus/notepads/{PLAN_NAME}/ to record learnings
- Do not stop until all tasks are complete
- If blocked, document the blocker and move to the next task`
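The checkbox-marking the first rule demands is a plain text substitution in the plan file. A hypothetical helper (the real flow uses the `Edit` tool rather than a function like this):

```typescript
// Hypothetical sketch: flip the first matching unchecked task to checked, as the
// continuation prompt's first rule requires before any other work.
function markTaskDone(plan: string, taskText: string): string {
  return plan.replace(`- [ ] ${taskText}`, `- [x] ${taskText}`)
}
```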

View File

@@ -7,7 +7,7 @@ import { HOOK_NAME } from "./hook-name"
import { DIRECT_WORK_REMINDER } from "./system-reminder-templates"
import { isSisyphusPath } from "./sisyphus-path"
import { extractSessionIdFromOutput } from "./subagent-session-id"
import { buildOrchestratorReminder, buildStandaloneVerificationReminder } from "./verification-reminders"
import { buildCompletionGate, buildOrchestratorReminder, buildStandaloneVerificationReminder } from "./verification-reminders"
import { isWriteOrEditToolName } from "./write-edit-tool-policy"
import type { ToolExecuteAfterInput, ToolExecuteAfterOutput } from "./types"
@@ -76,7 +76,11 @@ export function createToolExecuteAfterHandler(input: {
// Preserve original subagent response - critical for debugging failed tasks
const originalResponse = toolOutput.output
toolOutput.output = `
toolOutput.output = `
<system-reminder>
${buildCompletionGate(boulderState.plan_name, subagentSessionId)}
</system-reminder>
## SUBAGENT WORK COMPLETED
${fileChanges}
@@ -88,7 +92,7 @@ ${fileChanges}
${originalResponse}
<system-reminder>
${buildOrchestratorReminder(boulderState.plan_name, progress, subagentSessionId, autoCommit)}
${buildOrchestratorReminder(boulderState.plan_name, progress, subagentSessionId, autoCommit, false)}
</system-reminder>`
log(`[${HOOK_NAME}] Output transformed for orchestrator mode (boulder)`, {
plan: boulderState.plan_name,

View File

@@ -0,0 +1,94 @@
import { describe, expect, it } from "bun:test"
import { buildOrchestratorReminder, buildCompletionGate } from "./verification-reminders"
// Test helpers for given/when/then pattern
const given = describe
const when = describe
const then = it
describe("buildCompletionGate", () => {
given("a plan name and session id", () => {
const planName = "test-plan"
const sessionId = "test-session-123"
when("buildCompletionGate is called", () => {
const gate = buildCompletionGate(planName, sessionId)
then("completion gate text is present", () => {
expect(gate).toContain("COMPLETION GATE")
})
then("gate appears before verification phase text", () => {
const gateIndex = gate.indexOf("COMPLETION GATE")
const verificationIndex = gate.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeLessThan(verificationIndex)
})
then("gate interpolates the plan name path", () => {
expect(gate).toContain(planName)
expect(gate).toContain(`.sisyphus/plans/${planName}.md`)
})
then("gate includes Edit instructions", () => {
expect(gate.toLowerCase()).toContain("edit")
})
then("gate includes Read instructions", () => {
expect(gate.toLowerCase()).toContain("read")
})
then("old STEP 7 MARK COMPLETION text is absent", () => {
expect(gate).not.toContain("STEP 7")
expect(gate).not.toContain("MARK COMPLETION IN PLAN FILE")
})
then("step numbering remains consecutive after removal", () => {
const stepMatches = gate.match(/STEP \d+:/g) ?? []
if (stepMatches.length > 1) {
const numbers = stepMatches.map((s: string) => parseInt(s.match(/\d+/)?.[0] ?? "0"))
for (let i = 1; i < numbers.length; i++) {
expect(numbers[i]).toBe(numbers[i - 1] + 1)
}
}
})
})
})
})
describe("buildOrchestratorReminder", () => {
given("progress with completed tasks", () => {
const planName = "my-test-plan"
const sessionId = "session-abc"
const progress = { total: 10, completed: 3 }
when("buildOrchestratorReminder is called with autoCommit true", () => {
const reminder = buildOrchestratorReminder(planName, progress, sessionId, true)
then("old STEP 7 MARK COMPLETION IN PLAN FILE text is absent", () => {
expect(reminder).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
})
then("completion gate appears before verification reminder", () => {
const gateIndex = reminder.indexOf("COMPLETION GATE")
const verificationIndex = reminder.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeGreaterThanOrEqual(0)
expect(gateIndex).toBeLessThan(verificationIndex)
})
})
when("buildOrchestratorReminder is called with autoCommit false", () => {
const reminder = buildOrchestratorReminder(planName, progress, sessionId, false)
then("old STEP 7 MARK COMPLETION IN PLAN FILE text is absent", () => {
expect(reminder).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
})
then("completion gate appears before verification reminder", () => {
const gateIndex = reminder.indexOf("COMPLETION GATE")
const verificationIndex = reminder.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeGreaterThanOrEqual(0)
expect(gateIndex).toBeLessThan(verificationIndex)
})
})
})
})

View File

@@ -1,7 +1,37 @@
import { VERIFICATION_REMINDER } from "./system-reminder-templates"
export function buildCompletionGate(planName: string, sessionId: string): string {
return `
**COMPLETION GATE — DO NOT PROCEED UNTIL THIS IS DONE**
Your completion will NOT be recorded until you complete ALL of the following:
1. **Edit** the plan file \`.sisyphus/plans/${planName}.md\`:
- Change \`- [ ]\` to \`- [x]\` for the completed task
- Use \`Edit\` tool to modify the checkbox
2. **Read** the plan file AGAIN:
\`\`\`
Read(".sisyphus/plans/${planName}.md")
\`\`\`
- Verify the checkbox count changed (more \`- [x]\` than before)
3. **DO NOT call \`task()\` again** until you have completed steps 1 and 2 above.
If anything fails while closing this out, resume the same session immediately:
\`\`\`typescript
task(session_id="${sessionId}", prompt="fix: checkbox not recorded correctly")
\`\`\`
**Your completion is NOT tracked until the checkbox is marked in the plan file.**
**VERIFICATION_REMINDER**`
}
function buildVerificationReminder(sessionId: string): string {
return `${VERIFICATION_REMINDER}
return `**VERIFICATION_REMINDER**
${VERIFICATION_REMINDER}
---
@@ -15,20 +45,21 @@ export function buildOrchestratorReminder(
planName: string,
progress: { total: number; completed: number },
sessionId: string,
autoCommit: boolean = true
autoCommit: boolean = true,
includeCompletionGate: boolean = true
): string {
const remaining = progress.total - progress.completed
const commitStep = autoCommit
? `
**STEP 8: COMMIT ATOMIC UNIT**
**STEP 7: COMMIT ATOMIC UNIT**
- Stage ONLY the verified changes
- Commit with clear message describing what was done
`
: ""
const nextStepNumber = autoCommit ? 9 : 8
const nextStepNumber = autoCommit ? 8 : 7
return `
---
@@ -37,7 +68,9 @@ export function buildOrchestratorReminder(
---
${buildVerificationReminder(sessionId)}
${includeCompletionGate ? `${buildCompletionGate(planName, sessionId)}
` : ""}${buildVerificationReminder(sessionId)}
**STEP 5: READ SUBAGENT NOTEPAD (LEARNINGS, ISSUES, PROBLEMS)**
@@ -64,22 +97,13 @@ Read(".sisyphus/plans/${planName}.md")
Count exactly: how many \`- [ ]\` remain? How many \`- [x]\` completed?
This is YOUR ground truth. Use it to decide what comes next.
**STEP 7: MARK COMPLETION IN PLAN FILE (IMMEDIATELY)**
RIGHT NOW - Do not delay. Verification passed → Mark IMMEDIATELY.
Update the plan file \`.sisyphus/plans/${planName}.md\`:
- Change \`- [ ]\` to \`- [x]\` for the completed task
- Use \`Edit\` tool to modify the checkbox
**DO THIS BEFORE ANYTHING ELSE. Unmarked = Untracked = Lost progress.**
${commitStep}
**STEP ${nextStepNumber}: PROCEED TO NEXT TASK**
- Read the plan file AGAIN to identify the next \`- [ ]\` task
- Start immediately - DO NOT STOP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**${remaining} tasks remain. Keep bouldering.**`
}
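The renumbering in this hunk follows one rule: with the old "STEP 7: MARK COMPLETION" block gone, the commit step takes its number and the "proceed" step shifts down accordingly. A sketch of that layout:

```typescript
// Hypothetical summary of the step numbering after removing the old STEP 7 block:
// with autoCommit the commit is STEP 7 and "proceed" is STEP 8; otherwise "proceed" is STEP 7.
function stepLayout(autoCommit: boolean): { commitStep?: number; nextStepNumber: number } {
  return autoCommit ? { commitStep: 7, nextStepNumber: 8 } : { nextStepNumber: 7 }
}
```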

View File

@@ -202,7 +202,7 @@ BEFORE writing ANY code, you MUST define:
| **Observable** | What can be measured/seen | "Console shows 'success', no errors" |
| **Pass/Fail** | Binary, no ambiguity | "Returns 200 OK" not "should work" |
Write these criteria explicitly. Share with user if scope is non-trivial.
Write these criteria explicitly. **Record them in your TODO/Task items.** Each task MUST include a "QA: [how to verify]" field. These criteria are your CONTRACT — work toward them, verify against them.
### Test Plan Template (MANDATORY for non-trivial tasks)
@@ -228,6 +228,32 @@ Write these criteria explicitly. Share with user if scope is non-trivial.
**WITHOUT evidence = NOT verified = NOT done.**
<MANUAL_QA_MANDATE>
### YOU MUST EXECUTE MANUAL QA YOURSELF. THIS IS NOT OPTIONAL.
**YOUR FAILURE MODE**: You finish coding, run lsp_diagnostics, and declare "done" without actually TESTING the feature. lsp_diagnostics catches type errors, NOT functional bugs. Your work is NOT verified until you MANUALLY test it.
**WHAT MANUAL QA MEANS — execute ALL that apply:**
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify the output files exist and are correct. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Changes UI rendering | Describe what renders. Use a browser tool if available. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**UNACCEPTABLE QA CLAIMS:**
- "This should work" — RUN IT.
- "The types check out" — Types don't catch logic bugs. RUN IT.
- "lsp_diagnostics is clean" — That's a TYPE check, not a FUNCTIONAL check. RUN IT.
- "Tests pass" — Tests cover known cases. Does the ACTUAL FEATURE work as the user expects? RUN IT.
**You have Bash, you have tools. There is ZERO excuse for not running manual QA.**
**Manual QA is the FINAL gate before reporting completion. Skip it and your work is INCOMPLETE.**
</MANUAL_QA_MANDATE>
### TDD Workflow (when test infrastructure exists)
1. **SPEC**: Define what "working" means (success criteria above)

View File

@@ -236,6 +236,33 @@ task(subagent_type="plan", load_skills=[], prompt="<gathered context + user requ
If ANY answer is no → GO BACK AND DO IT. Do not claim completion.
</ANTI_OPTIMISM_CHECKPOINT>
<MANUAL_QA_MANDATE>
### YOU MUST EXECUTE MANUAL QA. THIS IS NOT OPTIONAL. DO NOT SKIP THIS.
**YOUR FAILURE MODE**: You run lsp_diagnostics, see zero errors, and declare victory. lsp_diagnostics catches TYPE errors. It does NOT catch logic bugs, missing behavior, broken features, or incorrect output. Your work is NOT verified until you MANUALLY TEST the actual feature.
**AFTER every implementation, you MUST:**
1. **Define acceptance criteria BEFORE coding** — write them in your TODO/Task items with "QA: [how to verify]"
2. **Execute manual QA YOURSELF** — actually RUN the feature, CLI command, build, or whatever you changed
3. **Report what you observed** — show actual output, not claims
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify output files exist and are correct. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**UNACCEPTABLE (WILL BE REJECTED):**
- "This should work" — DID YOU RUN IT? NO? THEN RUN IT.
- "lsp_diagnostics is clean" — That is a TYPE check, not a FUNCTIONAL check. RUN THE FEATURE.
- "Tests pass" — Tests cover known cases. Does the ACTUAL feature work? VERIFY IT MANUALLY.
**You have Bash, you have tools. There is ZERO excuse for skipping manual QA.**
</MANUAL_QA_MANDATE>
**WITHOUT evidence = NOT verified = NOT done.**
## ZERO TOLERANCE FAILURES

View File

@@ -118,6 +118,14 @@ deep_context = background_output(task_id=...)
- \`lsp_diagnostics\` on modified files
- Run tests if available
## ACCEPTANCE CRITERIA WORKFLOW
**BEFORE implementation**, define what "done" means in concrete, binary terms:
1. Write acceptance criteria as pass/fail conditions (not "should work" — specific observable outcomes)
2. Record them in your TODO/Task items with a "QA: [how to verify]" field
3. Work toward those criteria, not just "finishing code"
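The workflow above implies each TODO item carries its own verification hook. A hypothetical shape for such an item (the field names are illustrative, not a real type from this codebase):

```typescript
// Hypothetical TODO item carrying the "QA: [how to verify]" field the workflow requires
interface TodoItem {
  description: string
  qa: string // a concrete, binary pass/fail check, not "should work"
  done: boolean
}

const item: TodoItem = {
  description: "Add --json flag to the export command",
  qa: "Run the command with --json and confirm the output parses with JSON.parse",
  done: false,
}
```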
## QUALITY STANDARDS
| Phase | Action | Required Evidence |
@@ -125,6 +133,25 @@ deep_context = background_output(task_id=...)
| Build | Run build command | Exit code 0 |
| Test | Execute test suite | All tests pass |
| Lint | Run lsp_diagnostics | Zero new errors |
| **Manual QA** | **Execute the feature yourself** | **Actual output shown** |
<MANUAL_QA_MANDATE>
### MANUAL QA IS MANDATORY. lsp_diagnostics IS NOT ENOUGH.
lsp_diagnostics catches type errors. It does NOT catch logic bugs, missing behavior, or broken features. After EVERY implementation, you MUST manually test the actual feature.
**Execute ALL that apply:**
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify output files. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**"This should work" is NOT evidence. RUN IT. Show what happened. That is evidence.**
</MANUAL_QA_MANDATE>
## COMPLETION CRITERIA
@@ -133,6 +160,7 @@ A task is complete when:
2. lsp_diagnostics shows zero errors on modified files
3. Tests pass (or pre-existing failures documented)
4. Code matches existing codebase patterns
5. **Manual QA executed — actual feature tested, output observed and reported**
**Deliver exactly what was asked. No more, no less.**

View File

@@ -3,7 +3,7 @@
*
* Routing logic:
* 1. Planner agents (prometheus, plan) → planner.ts
* 2. GPT 5.2 models → gpt5.2.ts
* 2. GPT 5.4 models → gpt5.4.ts
* 3. Gemini models → gemini.ts
* 4. Everything else (Claude, etc.) → default.ts
*/

View File

@@ -134,8 +134,8 @@ describe("model fallback hook", () => {
//#then - chain should progress to entry[1], not repeat entry[0]
expect(secondOutput.message["model"]).toEqual({
providerID: "zai-coding-plan",
modelID: "glm-5",
providerID: "kimi-for-coding",
modelID: "k2p5",
})
expect(secondOutput.message["variant"]).toBeUndefined()
})

View File

@@ -104,7 +104,7 @@ describe("no-sisyphus-gpt hook", () => {
await hook["chat.message"]?.({
sessionID: "ses_3",
agent: HEPHAESTUS_DISPLAY,
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-5.4" },
}, output)
// then - no toast
@@ -126,7 +126,7 @@ describe("no-sisyphus-gpt hook", () => {
// when - chat.message runs without input.agent
await hook["chat.message"]?.({
sessionID: "ses_4",
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-4o" },
}, output)
// then - toast shown via session-agent fallback

View File

@@ -0,0 +1,61 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import { buildContinuationPrompt } from "./continuation-prompt-builder"
import { HOOK_NAME } from "./constants"
import { injectContinuationPrompt } from "./continuation-prompt-injector"
import type { RalphLoopState } from "./types"
type LoopStateController = {
clear: () => boolean
markVerificationPending: (sessionID: string) => RalphLoopState | null
}
export async function handleDetectedCompletion(
ctx: PluginInput,
input: {
sessionID: string
state: RalphLoopState
loopState: LoopStateController
directory: string
apiTimeoutMs: number
},
): Promise<void> {
const { sessionID, state, loopState, directory, apiTimeoutMs } = input
if (state.ultrawork && !state.verification_pending) {
const verificationState = loopState.markVerificationPending(sessionID)
if (!verificationState) {
log(`[${HOOK_NAME}] Failed to transition ultrawork loop to verification`, {
sessionID,
})
return
}
await injectContinuationPrompt(ctx, {
sessionID,
prompt: buildContinuationPrompt(verificationState),
directory,
apiTimeoutMs,
})
await ctx.client.tui?.showToast?.({
body: {
title: "ULTRAWORK LOOP",
message: "DONE detected. Oracle verification is now required.",
variant: "info",
duration: 5000,
},
}).catch(() => {})
return
}
loopState.clear()
const title = state.ultrawork ? "ULTRAWORK LOOP COMPLETE!" : "Ralph Loop Complete!"
const message = state.ultrawork
? `JUST ULW ULW! Task completed after ${state.iteration} iteration(s)`
: `Task completed after ${state.iteration} iteration(s)`
await ctx.client.tui?.showToast?.({
body: { title, message, variant: "success", duration: 5000 },
}).catch(() => {})
}

View File

@@ -20,6 +20,7 @@ function buildPromisePattern(promise: string): RegExp {
export function detectCompletionInTranscript(
transcriptPath: string | undefined,
promise: string,
startedAt?: string,
): boolean {
if (!transcriptPath) return false
@@ -32,8 +33,9 @@ export function detectCompletionInTranscript(
for (const line of lines) {
try {
-const entry = JSON.parse(line) as { type?: string }
+const entry = JSON.parse(line) as { type?: string; timestamp?: string }
if (entry.type === "user") continue
if (startedAt && entry.timestamp && entry.timestamp < startedAt) continue
if (pattern.test(line)) return true
} catch {
continue
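The `startedAt` guard added in this hunk relies on ISO-8601 UTC timestamps comparing chronologically as plain strings. A self-contained sketch of that filter (the entry shape is taken from the cast in the diff; `isStaleEntry` itself is an illustrative helper, not part of the plugin):

```typescript
// ISO-8601 timestamps in the same zone sort chronologically as strings,
// so a lexicographic comparison is enough to skip pre-loop entries.
type TranscriptEntry = { type?: string; timestamp?: string }

function isStaleEntry(entry: TranscriptEntry, startedAt?: string): boolean {
  // Stale only when both a cutoff and an entry timestamp exist and the
  // entry predates the cutoff; missing data never suppresses detection.
  return Boolean(startedAt && entry.timestamp && entry.timestamp < startedAt)
}

const startedAt = "2026-03-08T00:00:00.000Z"
isStaleEntry({ timestamp: "2000-01-01T00:00:00.000Z" }, startedAt) // true: from an older run
isStaleEntry({ timestamp: "2026-03-08T01:00:00.000Z" }, startedAt) // false: current run
isStaleEntry({ timestamp: "2000-01-01T00:00:00.000Z" }, undefined) // false: no cutoff given
```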

View File

@@ -3,3 +3,4 @@ export const DEFAULT_STATE_FILE = ".sisyphus/ralph-loop.local.md"
export const COMPLETION_TAG_PATTERN = /<promise>(.*?)<\/promise>/is
export const DEFAULT_MAX_ITERATIONS = 100
export const DEFAULT_COMPLETION_PROMISE = "DONE"
export const ULTRAWORK_VERIFICATION_PROMISE = "VERIFIED"
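These constants drive a two-phase promise protocol: the assistant first emits the loop's completion promise (`DONE` by default), then, in ultrawork mode, Oracle must emit `VERIFIED`. A minimal sketch of extracting a promised token with the same tag pattern (constant values copied from this file; `extractPromise` is a hypothetical helper, not part of the plugin):

```typescript
// Values mirror the constants above; extractPromise is illustrative only.
const COMPLETION_TAG_PATTERN = /<promise>(.*?)<\/promise>/is
const ULTRAWORK_VERIFICATION_PROMISE = "VERIFIED"

// Pull the promised token out of a chunk of assistant output, if present.
function extractPromise(output: string): string | undefined {
  const match = COMPLETION_TAG_PATTERN.exec(output)
  return match?.[1].trim()
}

extractPromise("work finished <promise>DONE</promise>") // "DONE"
extractPromise(`ok <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>`) // "VERIFIED"
extractPromise("no tag here") // undefined
```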

View File

@@ -1,6 +1,10 @@
import { SYSTEM_DIRECTIVE_PREFIX } from "../../shared/system-directive"
import type { RalphLoopState } from "./types"
function getMaxIterationsLabel(state: RalphLoopState): string {
return typeof state.max_iterations === "number" ? String(state.max_iterations) : "unbounded"
}
const CONTINUATION_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - RALPH LOOP {{ITERATION}}/{{MAX}}]
Your previous attempt did not output the completion promise. Continue working on the task.
@@ -14,12 +18,55 @@ IMPORTANT:
Original task:
{{PROMPT}}`
const ULTRAWORK_VERIFICATION_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - ULTRAWORK LOOP VERIFICATION {{ITERATION}}/{{MAX}}]
You already emitted <promise>{{INITIAL_PROMISE}}</promise>. This does NOT finish the loop yet.
REQUIRED NOW:
- Call Oracle using task(subagent_type="oracle", load_skills=[], run_in_background=false, ...)
- Ask Oracle to verify whether the original task is actually complete
- The system will inspect the Oracle session directly for the verification result
- If Oracle does not verify, continue fixing the task and do not consider it complete
Original task:
{{PROMPT}}`
const ULTRAWORK_VERIFICATION_FAILED_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - ULTRAWORK LOOP VERIFICATION FAILED {{ITERATION}}/{{MAX}}]
Oracle did not emit <promise>VERIFIED</promise>. Verification failed.
REQUIRED NOW:
- Verification failed. Fix the task until Oracle's review is satisfied
- Oracle does not lie. Treat the verification result as ground truth
- Do not claim completion early or argue with the failed verification
- After fixing the remaining issues, request Oracle review again using task(subagent_type="oracle", load_skills=[], run_in_background=false, ...)
- Only when the work is ready for review again, output: <promise>{{PROMISE}}</promise>
Original task:
{{PROMPT}}`
export function buildContinuationPrompt(state: RalphLoopState): string {
-const continuationPrompt = CONTINUATION_PROMPT.replace(
+const template = state.verification_pending
+? ULTRAWORK_VERIFICATION_PROMPT
+: CONTINUATION_PROMPT
+const continuationPrompt = template.replace(
"{{ITERATION}}",
String(state.iteration),
)
-.replace("{{MAX}}", String(state.max_iterations))
+.replace("{{MAX}}", getMaxIterationsLabel(state))
.replace("{{INITIAL_PROMISE}}", state.initial_completion_promise ?? state.completion_promise)
.replace("{{PROMISE}}", state.completion_promise)
.replace("{{PROMPT}}", state.prompt)
return state.ultrawork ? `ultrawork ${continuationPrompt}` : continuationPrompt
}
export function buildVerificationFailurePrompt(state: RalphLoopState): string {
const continuationPrompt = ULTRAWORK_VERIFICATION_FAILED_PROMPT.replace(
"{{ITERATION}}",
String(state.iteration),
)
.replace("{{MAX}}", getMaxIterationsLabel(state))
.replace("{{PROMISE}}", state.completion_promise)
.replace("{{PROMPT}}", state.prompt)

View File

@@ -1078,7 +1078,7 @@ Original task: Build something`
expect(messagesCalls.length).toBe(1)
})
-test("should show ultrawork completion toast", async () => {
+test("should require oracle verification toast for ultrawork completion promise", async () => {
// given - hook with ultrawork mode and completion in transcript
const transcriptPath = join(TEST_DIR, "transcript.jsonl")
const hook = createRalphLoopHook(createMockPluginInput(), {
@@ -1090,10 +1090,9 @@ Original task: Build something`
// when - idle event triggered
await hook.event({ event: { type: "session.idle", properties: { sessionID: "test-id" } } })
// then - ultrawork toast shown
-const completionToast = toastCalls.find(t => t.title === "ULTRAWORK LOOP COMPLETE!")
-expect(completionToast).toBeDefined()
-expect(completionToast!.message).toMatch(/JUST ULW ULW!/)
+const verificationToast = toastCalls.find(t => t.title === "ULTRAWORK LOOP")
+expect(verificationToast).toBeDefined()
+expect(verificationToast!.message).toMatch(/Oracle verification is now required/)
})
test("should show regular completion toast when ultrawork disabled", async () => {

View File

@@ -3,6 +3,7 @@ import {
DEFAULT_COMPLETION_PROMISE,
DEFAULT_MAX_ITERATIONS,
HOOK_NAME,
ULTRAWORK_VERIFICATION_PROMISE,
} from "./constants"
import { clearState, incrementIteration, readState, writeState } from "./storage"
import { log } from "../../shared/logger"
@@ -28,18 +29,24 @@ export function createLoopStateController(options: {
strategy?: "reset" | "continue"
},
): boolean {
const initialCompletionPromise =
loopOptions?.completionPromise ??
DEFAULT_COMPLETION_PROMISE
const state: RalphLoopState = {
active: true,
iteration: 1,
-max_iterations:
-loopOptions?.maxIterations ??
-config?.default_max_iterations ??
-DEFAULT_MAX_ITERATIONS,
+max_iterations: loopOptions?.ultrawork
+? undefined
+: loopOptions?.maxIterations ??
+config?.default_max_iterations ??
+DEFAULT_MAX_ITERATIONS,
message_count_at_start: loopOptions?.messageCountAtStart,
-completion_promise:
-loopOptions?.completionPromise ??
-DEFAULT_COMPLETION_PROMISE,
+completion_promise: initialCompletionPromise,
initial_completion_promise: initialCompletionPromise,
verification_attempt_id: undefined,
verification_session_id: undefined,
ultrawork: loopOptions?.ultrawork,
verification_pending: undefined,
strategy: loopOptions?.strategy ?? config?.default_strategy ?? "continue",
started_at: new Date().toISOString(),
prompt,
@@ -109,5 +116,62 @@ export function createLoopStateController(options: {
return state
},
markVerificationPending(sessionID: string): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork) {
return null
}
state.verification_pending = true
state.completion_promise = ULTRAWORK_VERIFICATION_PROMISE
state.verification_attempt_id = undefined
state.verification_session_id = undefined
state.initial_completion_promise ??= DEFAULT_COMPLETION_PROMISE
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
setVerificationSessionID(sessionID: string, verificationSessionID: string): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork || !state.verification_pending) {
return null
}
state.verification_session_id = verificationSessionID
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
restartAfterFailedVerification(sessionID: string, messageCountAtStart?: number): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork || !state.verification_pending) {
return null
}
state.iteration += 1
state.started_at = new Date().toISOString()
state.completion_promise = state.initial_completion_promise ?? DEFAULT_COMPLETION_PROMISE
state.verification_pending = undefined
state.verification_attempt_id = undefined
state.verification_session_id = undefined
if (typeof messageCountAtStart === "number") {
state.message_count_at_start = messageCountAtStart
}
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
}
}

View File

@@ -2,11 +2,14 @@ import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import type { RalphLoopOptions, RalphLoopState } from "./types"
import { HOOK_NAME } from "./constants"
import { handleDetectedCompletion } from "./completion-handler"
import {
detectCompletionInSessionMessages,
detectCompletionInTranscript,
} from "./completion-promise-detector"
import { continueIteration } from "./iteration-continuation"
import { handleDeletedLoopSession, handleErroredLoopSession } from "./session-event-handler"
import { handleFailedVerification } from "./verification-failure-handler"
type SessionRecovery = {
isRecovering: (sessionID: string) => boolean
@@ -18,6 +21,9 @@ type LoopStateController = {
clear: () => boolean
incrementIteration: () => RalphLoopState | null
setSessionID: (sessionID: string) => RalphLoopState | null
markVerificationPending: (sessionID: string) => RalphLoopState | null
setVerificationSessionID: (sessionID: string, verificationSessionID: string) => RalphLoopState | null
restartAfterFailedVerification: (sessionID: string, messageCountAtStart?: number) => RalphLoopState | null
}
type RalphLoopEventHandlerOptions = { directory: string; apiTimeoutMs: number; getTranscriptPath: (sessionID: string) => string | undefined; checkSessionExists?: RalphLoopOptions["checkSessionExists"]; sessionRecovery: SessionRecovery; loopState: LoopStateController }
@@ -53,7 +59,13 @@ export function createRalphLoopEventHandler(
return
}
if (state.session_id && state.session_id !== sessionID) {
const verificationSessionID = state.verification_pending
? state.verification_session_id
: undefined
const matchesParentSession = state.session_id === undefined || state.session_id === sessionID
const matchesVerificationSession = verificationSessionID === sessionID
if (!matchesParentSession && !matchesVerificationSession && state.session_id) {
if (options.checkSessionExists) {
try {
const exists = await options.checkSessionExists(state.session_id)
@@ -75,10 +87,27 @@ export function createRalphLoopEventHandler(
return
}
-const transcriptPath = options.getTranscriptPath(sessionID)
-const completionViaTranscript = detectCompletionInTranscript(transcriptPath, state.completion_promise)
+const completionSessionID = verificationSessionID ?? (state.verification_pending ? undefined : sessionID)
+const transcriptPath = completionSessionID ? options.getTranscriptPath(completionSessionID) : undefined
+const completionViaTranscript = completionSessionID
+? detectCompletionInTranscript(
+transcriptPath,
+state.completion_promise,
+state.started_at,
+)
+: false
const completionViaApi = completionViaTranscript
? false
: verificationSessionID
? await detectCompletionInSessionMessages(ctx, {
sessionID: verificationSessionID,
promise: state.completion_promise,
apiTimeoutMs: options.apiTimeoutMs,
directory: options.directory,
sinceMessageIndex: undefined,
})
: state.verification_pending
? false
: await detectCompletionInSessionMessages(ctx, {
sessionID,
promise: state.completion_promise,
@@ -96,15 +125,41 @@ export function createRalphLoopEventHandler(
? "transcript_file"
: "session_messages_api",
})
-options.loopState.clear()
-const title = state.ultrawork ? "ULTRAWORK LOOP COMPLETE!" : "Ralph Loop Complete!"
-const message = state.ultrawork ? `JUST ULW ULW! Task completed after ${state.iteration} iteration(s)` : `Task completed after ${state.iteration} iteration(s)`
-await ctx.client.tui?.showToast?.({ body: { title, message, variant: "success", duration: 5000 } }).catch(() => {})
+await handleDetectedCompletion(ctx, {
+sessionID,
+state,
+loopState: options.loopState,
+directory: options.directory,
+apiTimeoutMs: options.apiTimeoutMs,
+})
return
}
-if (state.iteration >= state.max_iterations) {
if (state.verification_pending) {
if (verificationSessionID && matchesVerificationSession) {
const restarted = await handleFailedVerification(ctx, {
state,
loopState: options.loopState,
directory: options.directory,
apiTimeoutMs: options.apiTimeoutMs,
})
if (restarted) {
return
}
}
log(`[${HOOK_NAME}] Waiting for oracle verification`, {
sessionID,
verificationSessionID,
iteration: state.iteration,
})
return
}
+if (
+typeof state.max_iterations === "number"
+&& state.iteration >= state.max_iterations
+) {
log(`[${HOOK_NAME}] Max iterations reached`, {
sessionID,
iteration: state.iteration,
@@ -133,7 +188,7 @@ export function createRalphLoopEventHandler(
await ctx.client.tui?.showToast?.({
body: {
title: "Ralph Loop",
-message: `Iteration ${newState.iteration}/${newState.max_iterations}`,
+message: `Iteration ${newState.iteration}/${typeof newState.max_iterations === "number" ? newState.max_iterations : "unbounded"}`,
variant: "info",
duration: 2000,
},
@@ -159,36 +214,12 @@ export function createRalphLoopEventHandler(
}
if (event.type === "session.deleted") {
-const sessionInfo = props?.info as { id?: string } | undefined
-if (!sessionInfo?.id) return
-const state = options.loopState.getState()
-if (state?.session_id === sessionInfo.id) {
-options.loopState.clear()
-log(`[${HOOK_NAME}] Session deleted, loop cleared`, { sessionID: sessionInfo.id })
-}
-options.sessionRecovery.clear(sessionInfo.id)
+if (!handleDeletedLoopSession(props, options.loopState, options.sessionRecovery)) return
return
}
if (event.type === "session.error") {
-const sessionID = props?.sessionID as string | undefined
-const error = props?.error as { name?: string } | undefined
-if (error?.name === "MessageAbortedError") {
-if (sessionID) {
-const state = options.loopState.getState()
-if (state?.session_id === sessionID) {
-options.loopState.clear()
-log(`[${HOOK_NAME}] User aborted, loop cleared`, { sessionID })
-}
-options.sessionRecovery.clear(sessionID)
-}
-return
-}
-if (sessionID) {
-options.sessionRecovery.markRecovering(sessionID)
-}
+handleErroredLoopSession(props, options.loopState, options.sessionRecovery)
}
}
}
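The `completionSessionID` selection in the handler above can be condensed: once verification is pending, only the tracked Oracle session may produce the completion promise; before that, the parent session itself is checked. A simplified sketch (the function name and input shape are illustrative, not the handler's actual API):

```typescript
// Returns the session whose transcript may satisfy the completion promise,
// or undefined when the loop must keep waiting (verification pending but
// no oracle session tracked yet).
function pickCompletionSessionID(input: {
  sessionID: string
  verificationPending?: boolean
  verificationSessionID?: string
}): string | undefined {
  if (input.verificationPending) return input.verificationSessionID
  return input.sessionID
}

pickCompletionSessionID({ sessionID: "ses-1" }) // "ses-1"
pickCompletionSessionID({ sessionID: "ses-1", verificationPending: true }) // undefined: keep waiting
pickCompletionSessionID({
  sessionID: "ses-1",
  verificationPending: true,
  verificationSessionID: "ses-oracle",
}) // "ses-oracle"
```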

View File

@@ -0,0 +1,56 @@
import { log } from "../../shared/logger"
import { HOOK_NAME } from "./constants"
import type { RalphLoopState } from "./types"
type LoopStateController = {
getState: () => RalphLoopState | null
clear: () => boolean
}
type SessionRecovery = {
clear: (sessionID: string) => void
markRecovering: (sessionID: string) => void
}
export function handleDeletedLoopSession(
props: Record<string, unknown> | undefined,
loopState: LoopStateController,
sessionRecovery: SessionRecovery,
): boolean {
const sessionInfo = props?.info as { id?: string } | undefined
if (!sessionInfo?.id) return false
const state = loopState.getState()
if (state?.session_id === sessionInfo.id) {
loopState.clear()
log(`[${HOOK_NAME}] Session deleted, loop cleared`, { sessionID: sessionInfo.id })
}
sessionRecovery.clear(sessionInfo.id)
return true
}
export function handleErroredLoopSession(
props: Record<string, unknown> | undefined,
loopState: LoopStateController,
sessionRecovery: SessionRecovery,
): boolean {
const sessionID = props?.sessionID as string | undefined
const error = props?.error as { name?: string } | undefined
if (error?.name === "MessageAbortedError") {
if (sessionID) {
const state = loopState.getState()
if (state?.session_id === sessionID) {
loopState.clear()
log(`[${HOOK_NAME}] User aborted, loop cleared`, { sessionID })
}
sessionRecovery.clear(sessionID)
}
return true
}
if (sessionID) {
sessionRecovery.markRecovering(sessionID)
}
return true
}
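The two extracted handlers implement a simple triage: a `MessageAbortedError` (user abort) tears the loop down, while any other session error only flags the session for recovery. The decision table can be sketched under assumed names (`triageSessionError` is illustrative, not part of this module):

```typescript
type Action = "clear_loop" | "mark_recovering" | "ignore"

// Condensed decision table of handleErroredLoopSession (illustrative):
// abort => clear, other error => recover, no session ID => do nothing.
function triageSessionError(sessionID?: string, errorName?: string): Action {
  if (errorName === "MessageAbortedError") {
    return sessionID ? "clear_loop" : "ignore"
  }
  return sessionID ? "mark_recovering" : "ignore"
}

triageSessionError("ses-1", "MessageAbortedError") // "clear_loop"
triageSessionError("ses-1", "ProviderError") // "mark_recovering"
triageSessionError(undefined, "ProviderError") // "ignore"
```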

View File

@@ -40,10 +40,18 @@ export function readState(directory: string, customPath?: string): RalphLoopStat
return str.replace(/^["']|["']$/g, "")
}
const ultrawork = data.ultrawork === true || data.ultrawork === "true" ? true : undefined
const maxIterations =
data.max_iterations === undefined || data.max_iterations === ""
? ultrawork
? undefined
: DEFAULT_MAX_ITERATIONS
: Number(data.max_iterations) || DEFAULT_MAX_ITERATIONS
return {
active: isActive,
iteration: iterationNum,
-max_iterations: Number(data.max_iterations) || DEFAULT_MAX_ITERATIONS,
+max_iterations: maxIterations,
message_count_at_start:
typeof data.message_count_at_start === "number"
? data.message_count_at_start
@@ -51,10 +59,23 @@ export function readState(directory: string, customPath?: string): RalphLoopStat
? Number(data.message_count_at_start)
: undefined,
completion_promise: stripQuotes(data.completion_promise) || DEFAULT_COMPLETION_PROMISE,
initial_completion_promise: data.initial_completion_promise
? stripQuotes(data.initial_completion_promise)
: undefined,
verification_attempt_id: data.verification_attempt_id
? stripQuotes(data.verification_attempt_id)
: undefined,
verification_session_id: data.verification_session_id
? stripQuotes(data.verification_session_id)
: undefined,
started_at: stripQuotes(data.started_at) || new Date().toISOString(),
prompt: body.trim(),
session_id: data.session_id ? stripQuotes(data.session_id) : undefined,
-ultrawork: data.ultrawork === true || data.ultrawork === "true" ? true : undefined,
+ultrawork,
verification_pending:
data.verification_pending === true || data.verification_pending === "true"
? true
: undefined,
strategy: data.strategy === "reset" || data.strategy === "continue" ? data.strategy : undefined,
}
} catch {
@@ -77,18 +98,34 @@ export function writeState(
const sessionIdLine = state.session_id ? `session_id: "${state.session_id}"\n` : ""
const ultraworkLine = state.ultrawork !== undefined ? `ultrawork: ${state.ultrawork}\n` : ""
const verificationPendingLine =
state.verification_pending !== undefined
? `verification_pending: ${state.verification_pending}\n`
: ""
const strategyLine = state.strategy ? `strategy: "${state.strategy}"\n` : ""
const initialCompletionPromiseLine = state.initial_completion_promise
? `initial_completion_promise: "${state.initial_completion_promise}"\n`
: ""
const verificationAttemptLine = state.verification_attempt_id
? `verification_attempt_id: "${state.verification_attempt_id}"\n`
: ""
const verificationSessionLine = state.verification_session_id
? `verification_session_id: "${state.verification_session_id}"\n`
: ""
const messageCountAtStartLine =
typeof state.message_count_at_start === "number"
? `message_count_at_start: ${state.message_count_at_start}\n`
: ""
const maxIterationsLine =
typeof state.max_iterations === "number"
? `max_iterations: ${state.max_iterations}\n`
: ""
const content = `---
active: ${state.active}
iteration: ${state.iteration}
-max_iterations: ${state.max_iterations}
-completion_promise: "${state.completion_promise}"
-started_at: "${state.started_at}"
-${sessionIdLine}${ultraworkLine}${strategyLine}${messageCountAtStartLine}---
+${maxIterationsLine}completion_promise: "${state.completion_promise}"
+${initialCompletionPromiseLine}${verificationAttemptLine}${verificationSessionLine}started_at: "${state.started_at}"
+${sessionIdLine}${ultraworkLine}${verificationPendingLine}${strategyLine}${messageCountAtStartLine}---
${state.prompt}
`
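`writeState` serializes optional fields by emitting either a full `key: value` line (with trailing newline) or an empty string, so absent fields disappear from the frontmatter instead of being written as `undefined`. A sketch of that pattern over a subset of the real state shape (`optionalFrontmatterLines` is an illustrative helper):

```typescript
type StateSubset = { max_iterations?: number; verification_pending?: boolean }

// Each optional field contributes a whole frontmatter line or nothing at all,
// so older state files that never had these keys still parse unchanged.
function optionalFrontmatterLines(state: StateSubset): string {
  const maxIterationsLine =
    typeof state.max_iterations === "number"
      ? `max_iterations: ${state.max_iterations}\n`
      : ""
  const verificationPendingLine =
    state.verification_pending !== undefined
      ? `verification_pending: ${state.verification_pending}\n`
      : ""
  return `${maxIterationsLine}${verificationPendingLine}`
}

optionalFrontmatterLines({ max_iterations: 100 }) // "max_iterations: 100\n"
optionalFrontmatterLines({ verification_pending: true }) // "verification_pending: true\n"
optionalFrontmatterLines({}) // "": unbounded ultrawork loop, no verification yet
```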

View File

@@ -3,13 +3,17 @@ import type { RalphLoopConfig } from "../../config"
export interface RalphLoopState {
active: boolean
iteration: number
-max_iterations: number
+max_iterations?: number
message_count_at_start?: number
completion_promise: string
initial_completion_promise?: string
verification_attempt_id?: string
verification_session_id?: string
started_at: string
prompt: string
session_id?: string
ultrawork?: boolean
verification_pending?: boolean
strategy?: "reset" | "continue"
}

View File

@@ -0,0 +1,297 @@
import { afterEach, beforeEach, describe, expect, test } from "bun:test"
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"
import { createRalphLoopHook } from "./index"
import { ULTRAWORK_VERIFICATION_PROMISE } from "./constants"
import { clearState, writeState } from "./storage"
describe("ulw-loop verification", () => {
const testDir = join(tmpdir(), `ulw-loop-verification-${Date.now()}`)
let promptCalls: Array<{ sessionID: string; text: string }>
let toastCalls: Array<{ title: string; message: string; variant: string }>
let parentTranscriptPath: string
let oracleTranscriptPath: string
function createMockPluginInput() {
return {
client: {
session: {
promptAsync: async (opts: { path: { id: string }; body: { parts: Array<{ type: string; text: string }> } }) => {
promptCalls.push({
sessionID: opts.path.id,
text: opts.body.parts[0].text,
})
return {}
},
messages: async () => ({ data: [] }),
},
tui: {
showToast: async (opts: { body: { title: string; message: string; variant: string } }) => {
toastCalls.push(opts.body)
return {}
},
},
},
directory: testDir,
} as unknown as Parameters<typeof createRalphLoopHook>[0]
}
beforeEach(() => {
promptCalls = []
toastCalls = []
parentTranscriptPath = join(testDir, "transcript-parent.jsonl")
oracleTranscriptPath = join(testDir, "transcript-oracle.jsonl")
if (!existsSync(testDir)) {
mkdirSync(testDir, { recursive: true })
}
clearState(testDir)
})
afterEach(() => {
clearState(testDir)
if (existsSync(testDir)) {
rmSync(testDir, { recursive: true, force: true })
}
})
test("#given ulw loop emits DONE #when idle fires #then verification phase starts instead of completing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.verification_pending).toBe(true)
expect(hook.getState()?.completion_promise).toBe(ULTRAWORK_VERIFICATION_PROMISE)
expect(hook.getState()?.verification_session_id).toBeUndefined()
expect(promptCalls).toHaveLength(1)
expect(promptCalls[0].text).toContain('task(subagent_type="oracle"')
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(false)
})
test("#given ulw loop is awaiting verification #when VERIFIED appears in oracle session #then loop completes", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `verified <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()).toBeNull()
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(true)
})
test("#given ulw loop is awaiting verification #when oracle session idles with VERIFIED #then loop completes without parent idle", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `verified <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "ses-oracle" } } })
expect(hook.getState()).toBeNull()
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(true)
})
test("#given ulw loop is awaiting verification without oracle session #when idle fires again #then loop waits instead of continuing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
const stateAfterDone = hook.getState()
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(stateAfterDone?.iteration)
expect(promptCalls).toHaveLength(1)
expect(hook.getState()?.verification_pending).toBe(true)
})
test("#given ulw loop is awaiting oracle verification #when oracle has not verified yet #then loop waits instead of continuing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "still checking" } })}\n`,
)
const stateBeforeWait = hook.getState()
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(stateBeforeWait?.iteration)
expect(promptCalls).toHaveLength(1)
expect(hook.getState()?.verification_session_id).toBe("ses-oracle")
})
test("#given oracle verification fails #when oracle session idles #then main session receives retry instructions", async () => {
const sessionMessages: Record<string, unknown[]> = {
"session-123": [{}, {}, {}],
}
const hook = createRalphLoopHook({
...createMockPluginInput(),
client: {
...createMockPluginInput().client,
session: {
...createMockPluginInput().client.session,
messages: async (opts: { path: { id: string } }) => ({
data: sessionMessages[opts.path.id] ?? [],
}),
},
},
} as Parameters<typeof createRalphLoopHook>[0], {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "verification failed: missing tests" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "ses-oracle" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.completion_promise).toBe("DONE")
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(hook.getState()?.verification_session_id).toBeUndefined()
expect(hook.getState()?.message_count_at_start).toBe(3)
expect(promptCalls).toHaveLength(2)
expect(promptCalls[1]?.sessionID).toBe("session-123")
expect(promptCalls[1]?.text).toContain("Verification failed")
expect(promptCalls[1]?.text).toContain("Oracle does not lie")
expect(promptCalls[1]?.text).toContain('task(subagent_type="oracle"')
})
test("#given ulw loop without max iterations #when it continues #then it stays unbounded", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.max_iterations).toBeUndefined()
expect(promptCalls[0].text).toContain("2/unbounded")
})
test("#given prior transcript completion from older run #when new ulw loop starts #then old completion is ignored", async () => {
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: "2000-01-01T00:00:00.000Z", tool_output: { output: "old <promise>DONE</promise>" } })}\n`,
)
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(promptCalls).toHaveLength(1)
})
test("#given ulw loop was awaiting verification #when same session starts again #then verification state is overwritten", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
hook.startLoop("session-123", "Restarted task", { ultrawork: true })
expect(hook.getState()?.prompt).toBe("Restarted task")
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(hook.getState()?.completion_promise).toBe("DONE")
})
test("#given parent session emits VERIFIED #when oracle session is not tracked #then ulw loop does not complete", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `bad parent leak <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()).not.toBeNull()
expect(hook.getState()?.verification_pending).toBe(true)
})
})

View File

@@ -0,0 +1,99 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import { buildVerificationFailurePrompt } from "./continuation-prompt-builder"
import { HOOK_NAME } from "./constants"
import { injectContinuationPrompt } from "./continuation-prompt-injector"
import type { RalphLoopState } from "./types"
type LoopStateController = {
restartAfterFailedVerification: (
sessionID: string,
messageCountAtStart?: number,
) => RalphLoopState | null
}
function getMessageCountFromResponse(messagesResponse: unknown): number {
if (Array.isArray(messagesResponse)) {
return messagesResponse.length
}
if (
typeof messagesResponse === "object"
&& messagesResponse !== null
&& "data" in messagesResponse
) {
const data = (messagesResponse as { data?: unknown }).data
return Array.isArray(data) ? data.length : 0
}
return 0
}
async function getSessionMessageCount(
ctx: PluginInput,
sessionID: string,
directory: string,
): Promise<number> {
const messagesResponse = await ctx.client.session.messages({
path: { id: sessionID },
query: { directory },
})
return getMessageCountFromResponse(messagesResponse)
}
export async function handleFailedVerification(
ctx: PluginInput,
input: {
state: RalphLoopState
directory: string
apiTimeoutMs: number
loopState: LoopStateController
},
): Promise<boolean> {
const { state, directory, apiTimeoutMs, loopState } = input
const parentSessionID = state.session_id
if (!parentSessionID) {
return false
}
let messageCountAtStart: number
try {
messageCountAtStart = await getSessionMessageCount(ctx, parentSessionID, directory)
} catch (error) {
log(`[${HOOK_NAME}] Failed to read parent session before verification retry`, {
parentSessionID,
error: String(error),
})
return false
}
const resumedState = loopState.restartAfterFailedVerification(
parentSessionID,
messageCountAtStart,
)
if (!resumedState) {
log(`[${HOOK_NAME}] Failed to restart loop after verification failure`, {
parentSessionID,
})
return false
}
await injectContinuationPrompt(ctx, {
sessionID: parentSessionID,
prompt: buildVerificationFailurePrompt(resumedState),
directory,
apiTimeoutMs,
})
await ctx.client.tui?.showToast?.({
body: {
title: "ULTRAWORK LOOP",
message: "Oracle verification failed. Continuing ULTRAWORK loop.",
variant: "warning",
duration: 5000,
},
}).catch(() => {})
return true
}
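The new handler above tolerates two shapes of `session.messages` response before deciding whether a verification retry is safe. As a standalone illustration (the function body is copied from the diff; the sample inputs are hypothetical), `getMessageCountFromResponse` accepts either a bare array of messages or an envelope object carrying a `data` array, and degrades to `0` for anything else:

```typescript
// Mirrors getMessageCountFromResponse from the diff above: a bare array
// of messages or an envelope with a `data` array both yield a count;
// any other shape falls back to 0 rather than throwing.
function getMessageCountFromResponse(messagesResponse: unknown): number {
  if (Array.isArray(messagesResponse)) {
    return messagesResponse.length
  }
  if (
    typeof messagesResponse === "object"
    && messagesResponse !== null
    && "data" in messagesResponse
  ) {
    const data = (messagesResponse as { data?: unknown }).data
    return Array.isArray(data) ? data.length : 0
  }
  return 0
}

// Sample calls with hypothetical message objects:
console.log(getMessageCountFromResponse([{ id: "m1" }, { id: "m2" }])) // 2
console.log(getMessageCountFromResponse({ data: [{ id: "m1" }] }))     // 1
console.log(getMessageCountFromResponse({ data: "oops" }))             // 0
console.log(getMessageCountFromResponse(null))                         // 0
```

This defensive fallback matters because `handleFailedVerification` returns `false` (and skips the restart) only when the messages call itself throws; a malformed-but-successful response simply counts as zero messages.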

View File

@@ -103,7 +103,7 @@ describe("runtime-fallback", () => {
await hook.event({
event: {
type: "session.created",
properties: { info: { id: sessionID, model: "openai/gpt-5.2" } },
properties: { info: { id: sessionID, model: "openai/gpt-5.4" } },
},
})
@@ -202,7 +202,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback for missing API key errors when fallback models are configured", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-missing-api-key-fallback"
SessionCategoryRegistry.register(sessionID, "test")
@@ -230,7 +230,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.4" })
})
test("should detect retryable error from message pattern 'rate limit'", async () => {
@@ -260,7 +260,7 @@ describe("runtime-fallback", () => {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback([
"anthropic/claude-opus-4.6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
})
const sessionID = "test-session-model-not-found"
@@ -302,7 +302,7 @@ describe("runtime-fallback", () => {
const fallbackLogs = logCalls.filter((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLogs.length).toBeGreaterThanOrEqual(2)
expect(fallbackLogs[1]?.data).toMatchObject({ from: "anthropic/claude-opus-4.6", to: "openai/gpt-5.2" })
expect(fallbackLogs[1]?.data).toMatchObject({ from: "anthropic/claude-opus-4.6", to: "openai/gpt-5.4" })
const nonRetryLog = logCalls.find(
(c) => c.msg.includes("Error not retryable") && (c.data as { sessionID?: string } | undefined)?.sessionID === sessionID
@@ -313,7 +313,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback on Copilot auto-retry signal in message.updated", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-copilot-auto-retry"
@@ -346,7 +346,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "github-copilot/claude-opus-4.6", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "github-copilot/claude-opus-4.6", to: "openai/gpt-5.4" })
})
test("should trigger fallback on OpenAI auto-retry signal in message.updated", async () => {
@@ -658,7 +658,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback when message.updated has missing API key error without model", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-message-updated-missing-model"
SessionCategoryRegistry.register(sessionID, "test")
@@ -689,7 +689,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.4" })
})
test("should not advance fallback state from message.updated while retry is already in flight", async () => {
@@ -709,7 +709,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
}
)
@@ -799,7 +799,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
}
)
@@ -883,7 +883,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -949,7 +949,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1034,7 +1034,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1099,7 +1099,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1637,7 +1637,7 @@ describe("runtime-fallback", () => {
}),
{
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
}
)
@@ -1665,7 +1665,7 @@ describe("runtime-fallback", () => {
},
})
expect(retriedModels).toContain("openai/gpt-5.2")
expect(retriedModels).toContain("openai/gpt-5.4")
})
test("triggers fallback when message has mixed text and error parts", async () => {
@@ -1745,7 +1745,7 @@ describe("runtime-fallback", () => {
}),
{
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
}
)
@@ -1841,7 +1841,7 @@ describe("runtime-fallback", () => {
test("should apply fallback model on next chat.message after error", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2", "google/gemini-3.1-pro"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4", "google/gemini-3.1-pro"]),
})
const sessionID = "test-session-switch"
SessionCategoryRegistry.register(sessionID, "test")
@@ -1871,13 +1871,13 @@ describe("runtime-fallback", () => {
output
)
expect(output.message.model).toEqual({ providerID: "openai", modelID: "gpt-5.2" })
expect(output.message.model).toEqual({ providerID: "openai", modelID: "gpt-5.4" })
})
test("should notify when fallback occurs", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: true }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-notify"
SessionCategoryRegistry.register(sessionID, "test")
@@ -1897,7 +1897,7 @@ describe("runtime-fallback", () => {
})
expect(toastCalls.length).toBe(1)
expect(toastCalls[0]?.message.includes("gpt-5.2")).toBe(true)
expect(toastCalls[0]?.message.includes("gpt-5.4")).toBe(true)
})
})
@@ -1916,7 +1916,7 @@ describe("runtime-fallback", () => {
const input = createMockPluginInput()
const hook = createRuntimeFallbackHook(input, {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithAgentFallback("oracle", ["openai/gpt-5.2", "google/gemini-3.1-pro"]),
pluginConfig: createMockPluginConfigWithAgentFallback("oracle", ["openai/gpt-5.4", "google/gemini-3.1-pro"]),
})
const sessionID = "test-agent-fallback"
@@ -1936,16 +1936,16 @@ describe("runtime-fallback", () => {
},
})
//#then - should prepare fallback to openai/gpt-5.2
//#then - should prepare fallback to openai/gpt-5.4
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "anthropic/claude-opus-4-5", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "anthropic/claude-opus-4-5", to: "openai/gpt-5.4" })
})
test("should detect agent from sessionID pattern", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithAgentFallback("sisyphus", ["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithAgentFallback("sisyphus", ["openai/gpt-5.4"]),
})
const sessionID = "sisyphus-session-123"
@@ -1966,7 +1966,7 @@ describe("runtime-fallback", () => {
//#then - should detect sisyphus from sessionID and use its fallback
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ to: "openai/gpt-5.4" })
})
test("should preserve resolved agent during auto-retry", async () => {
@@ -2019,7 +2019,7 @@ describe("runtime-fallback", () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ cooldown_seconds: 60, notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback([
"openai/gpt-5.2",
"openai/gpt-5.4",
"anthropic/claude-opus-4-5",
]),
})

View File

@@ -70,7 +70,7 @@ describe("createThinkModeHook", () => {
const input = createHookInput({
sessionID,
providerID: "github-copilot",
modelID: "gpt-5.2",
modelID: "gpt-5.4",
})
const output = createHookOutput("ultrathink about this")
@@ -81,7 +81,7 @@ describe("createThinkModeHook", () => {
expect(output.message.variant).toBe("high")
expect(output.message.model).toEqual({
providerID: "github-copilot",
modelID: "gpt-5-2-high",
modelID: "gpt-5-4-high",
})
})

View File

@@ -32,11 +32,11 @@ describe("think-mode switcher", () => {
})
it("should handle dots in GPT version numbers", () => {
// given a GPT model ID with dot format (gpt-5.2)
const variant = getHighVariant("gpt-5.2")
// given a GPT model ID with dot format (gpt-5.4)
const variant = getHighVariant("gpt-5.4")
// then should return high variant
expect(variant).toBe("gpt-5-2-high")
expect(variant).toBe("gpt-5-4-high")
})
it("should handle dots in GPT-5.1 codex variants", () => {
@@ -60,7 +60,7 @@ describe("think-mode switcher", () => {
it("should return null for already-high variants", () => {
// given model IDs that are already high variants
expect(getHighVariant("claude-opus-4-6-high")).toBeNull()
expect(getHighVariant("gpt-5-2-high")).toBeNull()
expect(getHighVariant("gpt-5-4-high")).toBeNull()
expect(getHighVariant("gemini-3-1-pro-high")).toBeNull()
})
@@ -76,20 +76,20 @@ describe("think-mode switcher", () => {
it("should detect -high suffix", () => {
// given model IDs with -high suffix
expect(isAlreadyHighVariant("claude-opus-4-6-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5-2-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5-4-high")).toBe(true)
expect(isAlreadyHighVariant("gemini-3.1-pro-high")).toBe(true)
})
it("should detect -high suffix after normalization", () => {
// given model IDs with dots that end in -high
expect(isAlreadyHighVariant("gpt-5.2-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5.4-high")).toBe(true)
})
it("should return false for base models", () => {
// given base model IDs without -high suffix
expect(isAlreadyHighVariant("claude-opus-4-6")).toBe(false)
expect(isAlreadyHighVariant("claude-opus-4.6")).toBe(false)
expect(isAlreadyHighVariant("gpt-5.2")).toBe(false)
expect(isAlreadyHighVariant("gpt-5.4")).toBe(false)
expect(isAlreadyHighVariant("gemini-3.1-pro")).toBe(false)
})
@@ -111,10 +111,10 @@ describe("think-mode switcher", () => {
it("should preserve openai/ prefix when getting high variant", () => {
// given a model ID with openai/ prefix
const variant = getHighVariant("openai/gpt-5-2")
const variant = getHighVariant("openai/gpt-5-4")
// then should return high variant with prefix preserved
expect(variant).toBe("openai/gpt-5-2-high")
expect(variant).toBe("openai/gpt-5-4-high")
})
it("should handle prefixes with dots in version numbers", () => {
@@ -141,7 +141,7 @@ describe("think-mode switcher", () => {
it("should return null for already-high prefixed models", () => {
// given prefixed model IDs that are already high
expect(getHighVariant("vertex_ai/claude-opus-4-6-high")).toBeNull()
expect(getHighVariant("openai/gpt-5-2-high")).toBeNull()
expect(getHighVariant("openai/gpt-5-4-high")).toBeNull()
})
})
@@ -149,20 +149,20 @@ describe("think-mode switcher", () => {
it("should detect -high suffix in prefixed models", () => {
// given prefixed model IDs with -high suffix
expect(isAlreadyHighVariant("vertex_ai/claude-opus-4-6-high")).toBe(true)
expect(isAlreadyHighVariant("openai/gpt-5-2-high")).toBe(true)
expect(isAlreadyHighVariant("openai/gpt-5-4-high")).toBe(true)
expect(isAlreadyHighVariant("custom/gemini-3.1-pro-high")).toBe(true)
})
it("should return false for prefixed base models", () => {
// given prefixed base model IDs without -high suffix
expect(isAlreadyHighVariant("vertex_ai/claude-opus-4-6")).toBe(false)
expect(isAlreadyHighVariant("openai/gpt-5-2")).toBe(false)
expect(isAlreadyHighVariant("openai/gpt-5-4")).toBe(false)
})
it("should handle prefixed models with dots", () => {
// given prefixed model IDs with dots
expect(isAlreadyHighVariant("vertex_ai/gpt-5.2")).toBe(false)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.2-high")).toBe(true)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.4")).toBe(false)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.4-high")).toBe(true)
})
})
})

View File

@@ -25,7 +25,7 @@ import { normalizeModelID } from "../../shared"
* @example
* extractModelPrefix("vertex_ai/claude-sonnet-4-6") // { prefix: "vertex_ai/", base: "claude-sonnet-4-6" }
* extractModelPrefix("claude-sonnet-4-6") // { prefix: "", base: "claude-sonnet-4-6" }
* extractModelPrefix("openai/gpt-5.2") // { prefix: "openai/", base: "gpt-5.2" }
* extractModelPrefix("openai/gpt-5.4") // { prefix: "openai/", base: "gpt-5.4" }
*/
function extractModelPrefix(modelID: string): { prefix: string; base: string } {
const slashIndex = modelID.indexOf("/")
@@ -61,10 +61,10 @@ const HIGH_VARIANT_MAP: Record<string, string> = {
"gpt-5-1-codex": "gpt-5-1-codex-high",
"gpt-5-1-codex-mini": "gpt-5-1-codex-mini-high",
"gpt-5-1-codex-max": "gpt-5-1-codex-max-high",
// GPT-5.2
"gpt-5-2": "gpt-5-2-high",
"gpt-5-2-chat-latest": "gpt-5-2-chat-latest-high",
"gpt-5-2-pro": "gpt-5-2-pro-high",
// GPT-5.4
"gpt-5-4": "gpt-5-4-high",
"gpt-5-4-chat-latest": "gpt-5-4-chat-latest-high",
"gpt-5-4-pro": "gpt-5-4-pro-high",
// Antigravity (Google)
"antigravity-gemini-3-1-pro": "antigravity-gemini-3-1-pro-high",
"antigravity-gemini-3-flash": "antigravity-gemini-3-flash-high",
@@ -97,4 +97,3 @@ export function isAlreadyHighVariant(modelID: string): boolean {
const { base } = extractModelPrefix(normalized)
return ALREADY_HIGH.has(base) || base.endsWith("-high")
}

View File

@@ -1345,8 +1345,8 @@ describe("todo-continuation-enforcer", () => {
// OpenCode returns assistant messages with flat modelID/providerID, not nested model object
const mockMessagesWithAssistant = [
{ info: { id: "msg-1", role: "user", agent: "sisyphus", model: { providerID: "openai", modelID: "gpt-5.2" } } },
{ info: { id: "msg-2", role: "assistant", agent: "sisyphus", modelID: "gpt-5.2", providerID: "openai" } },
{ info: { id: "msg-1", role: "user", agent: "sisyphus", model: { providerID: "openai", modelID: "gpt-5.4" } } },
{ info: { id: "msg-2", role: "assistant", agent: "sisyphus", modelID: "gpt-5.4", providerID: "openai" } },
]
const mockInput = {
@@ -1390,7 +1390,7 @@ describe("todo-continuation-enforcer", () => {
// then - model should be extracted from assistant message's flat modelID/providerID
expect(promptCalls.length).toBe(1)
expect(promptCalls[0].model).toEqual({ providerID: "openai", modelID: "gpt-5.2" })
expect(promptCalls[0].model).toEqual({ providerID: "openai", modelID: "gpt-5.4" })
})
// ============================================================

View File

@@ -12,7 +12,7 @@ describe("mergeConfigs", () => {
const base = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.5,
},
quick: {
@@ -35,7 +35,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
// then general.model should be preserved from base
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
// then general.temperature should be overridden
expect(result.categories?.general?.temperature).toBe(0.3);
// then quick should be preserved from base
@@ -48,7 +48,7 @@ describe("mergeConfigs", () => {
const base: OhMyOpenCodeConfig = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
},
},
};
@@ -57,7 +57,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
});
it("should use override categories when base has no categories", () => {
@@ -66,14 +66,14 @@ describe("mergeConfigs", () => {
const override: OhMyOpenCodeConfig = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
},
},
};
const result = mergeConfigs(base, override);
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
});
});
@@ -81,7 +81,7 @@ describe("mergeConfigs", () => {
it("should deep merge agents", () => {
const base: OhMyOpenCodeConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
};
@@ -94,7 +94,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
expect(result.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result.agents?.oracle?.temperature).toBe(0.5);
expect(result.agents?.explore?.model).toBe("anthropic/claude-haiku-4-5");
});
@@ -127,8 +127,8 @@ describe("parseConfigPartially", () => {
it("should return the full config when everything is valid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
momus: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
momus: { model: "openai/gpt-5.4" },
},
disabled_hooks: ["comment-checker"],
};
@@ -136,8 +136,8 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.momus?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result!.agents?.momus?.model).toBe("openai/gpt-5.4");
expect(result!.disabled_hooks).toEqual(["comment-checker"]);
});
});
@@ -150,8 +150,8 @@ describe("parseConfigPartially", () => {
it("should preserve valid agent overrides when another section is invalid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
momus: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
momus: { model: "openai/gpt-5.4" },
prometheus: {
permission: {
edit: { "*": "ask", ".sisyphus/**": "allow" },
@@ -171,7 +171,7 @@ describe("parseConfigPartially", () => {
it("should preserve valid agents when a non-agent section is invalid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
disabled_hooks: ["not-a-real-hook"],
};
@@ -179,7 +179,7 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result!.disabled_hooks).toEqual(["not-a-real-hook"]);
});
});
@@ -224,7 +224,7 @@ describe("parseConfigPartially", () => {
it("should ignore unknown keys and return valid sections", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
some_future_key: { foo: "bar" },
};
@@ -232,7 +232,7 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect((result as Record<string, unknown>)["some_future_key"]).toBeUndefined();
});
});

View File

@@ -656,7 +656,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"test-planning": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "xhigh",
},
},
@@ -698,7 +698,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"reasoning-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "high",
},
},
@@ -739,7 +739,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"temp-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.8,
},
},
@@ -860,7 +860,7 @@ describe("Plan agent model inheritance from prometheus", () => {
test("plan agent inherits temperature, reasoningEffort, and other model settings from prometheus", async () => {
//#given - prometheus configured with category that has temperature and reasoningEffort
spyOn(shared, "resolveModelPipeline" as any).mockReturnValue({
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
provenance: "override",
variant: "high",
})
@@ -871,7 +871,7 @@ describe("Plan agent model inheritance from prometheus", () => {
},
agents: {
prometheus: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "high",
temperature: 0.3,
top_p: 0.9,
@@ -902,7 +902,7 @@ describe("Plan agent model inheritance from prometheus", () => {
const agents = config.agent as Record<string, Record<string, unknown>>
expect(agents.plan).toBeDefined()
expect(agents.plan.mode).toBe("subagent")
expect(agents.plan.model).toBe("openai/gpt-5.2")
expect(agents.plan.model).toBe("openai/gpt-5.4")
expect(agents.plan.variant).toBe("high")
expect(agents.plan.temperature).toBe(0.3)
expect(agents.plan.top_p).toBe(0.9)
@@ -913,7 +913,7 @@ describe("Plan agent model inheritance from prometheus", () => {
})
test("plan agent user override takes priority over prometheus inherited settings", async () => {
//#given - prometheus resolves to opus, but user has plan override for gpt-5.2
//#given - prometheus resolves to opus, but user has plan override for gpt-5.4
spyOn(shared, "resolveModelPipeline" as any).mockReturnValue({
model: "anthropic/claude-opus-4-6",
provenance: "provider-fallback",
@@ -926,7 +926,7 @@ describe("Plan agent model inheritance from prometheus", () => {
},
agents: {
plan: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "high",
temperature: 0.5,
},
@@ -950,7 +950,7 @@ describe("Plan agent model inheritance from prometheus", () => {
//#then - plan uses its own override, not prometheus settings
const agents = config.agent as Record<string, Record<string, unknown>>
expect(agents.plan.model).toBe("openai/gpt-5.2")
expect(agents.plan.model).toBe("openai/gpt-5.4")
expect(agents.plan.variant).toBe("high")
expect(agents.plan.temperature).toBe(0.5)
})

Some files were not shown because too many files have changed in this diff.