Compare commits

...

36 Commits

Author SHA1 Message Date
YeonGyu-Kim
d46946c85f fix(background-agent): keep stale-pruned tasks through notification cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-11 18:01:23 +09:00
YeonGyu-Kim
3b588283b1 fix(background-agent): skip terminal tasks during stale pruning
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-08 02:13:49 +09:00
YeonGyu-Kim
816e46a967 fix(background-agent): keep terminal tasks until parent notification cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-08 02:13:43 +09:00
YeonGyu-Kim
f3be710a73 release: v3.11.0 2026-03-08 01:59:20 +09:00
YeonGyu-Kim
01efda454f feat(model-requirements): set multimodal-looker primary model to gpt-5.4 medium
Change multimodal-looker's primary model from gpt-5.3-codex to gpt-5.4 medium
in both runtime and CLI fallback chains.

Changes:
- Runtime chain (src/shared/model-requirements.ts): primary now gpt-5.4
- CLI chain (src/cli/model-fallback-requirements.ts): primary now gpt-5.4
- Updated test expectations in model-requirements.test.ts
- Updated config-manager.test.ts assertion
- Updated model-fallback snapshots
2026-03-08 01:53:30 +09:00
YeonGyu-Kim
60bc9a7609 feat(model-requirements): add k2p5, kimi-k2.5, gpt-5.4 medium to Sisyphus fallback chain
Sisyphus can now fall back through Kimi and OpenAI models when Claude
is unavailable, enabling OpenAI-only users to use Sisyphus directly
instead of being redirected to Hephaestus.

Runtime chain: claude-opus-4-6 max → k2p5 → kimi-k2.5 → gpt-5.4 medium → glm-5 → big-pickle
CLI chain: claude-opus-4-6 max → k2p5 → gpt-5.4 medium → glm-5
2026-03-08 01:41:45 +09:00
YeonGyu-Kim
bf8d0ffcc0 fix(atlas): enforce checkbox completion before next task
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-08 01:41:45 +09:00
YeonGyu-Kim
532143c5f4 feat(delegate-task): use explicit high variant for unspecified-high category
- Update DEFAULT_CATEGORIES to use 'openai/gpt-5.4-high' directly instead of separate model + variant
- Add helper functions (isExplicitHighModel, getExplicitHighBaseModel) to preserve explicit high models during fuzzy matching
- Update category resolver to avoid collapsing explicit high models to base model + variant pair
- Update tests to verify explicit high model handling in both background and sync modes
- Update documentation examples to reflect new configuration

🤖 Generated with OhMyOpenCode assistance
2026-03-08 01:41:45 +09:00
github-actions[bot]
5e86b22cee @hobostay has signed the CLA in code-yeongyu/oh-my-opencode#2360 2026-03-07 13:54:05 +00:00
github-actions[bot]
6660590276 @rluisr has signed the CLA in code-yeongyu/oh-my-opencode#2352 2026-03-07 07:47:56 +00:00
YeonGyu-Kim
b3ef86c574 fix(atlas): skip compaction in last-agent recovery
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 15:39:25 +09:00
YeonGyu-Kim
e193002775 fix(plugin): ignore compaction session agent updates
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 15:39:25 +09:00
acamq
f5f996983e Merge pull request #2252 from acamq/fix/librarian-exa-name
fix: correct librarian agent tool name from websearch_exa_web_search_exa to websearch_web_search_exa
2026-03-06 22:11:42 -07:00
acamq
b717d26880 Merge pull request #2278 from MoerAI/fix/tmux-health-check-url
fix(tmux): use correct health check endpoint /global/health
2026-03-06 21:37:09 -07:00
acamq
51de6f18ee Merge pull request #2334 from devxoul/fix/flaky-background-task-test
fix(test): fix flaky late-session-id background task test
2026-03-06 20:48:50 -07:00
acamq
2ae63ca590 Merge pull request #2350 from wousp112/fix/git-plugin-prepare
fix(install): build dist for git-based plugin installs
2026-03-06 20:13:46 -07:00
github-actions[bot]
a245abe07b @wousp112 has signed the CLA in code-yeongyu/oh-my-opencode#2350 2026-03-06 23:14:57 +00:00
YeonGyu-Kim
58052984ff remove trash 2026-03-07 06:42:58 +09:00
YeonGyu-Kim
58d4f8b40a Revert "Merge pull request #2339 from JimMoen/fix/external-directory-default-ask"
This reverts commit 8a1352fc9b, reversing
changes made to d08bc04e67.
2026-03-07 06:40:19 +09:00
wousp112
f6d8d44aba fix(install): build dist for git-based plugin installs
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 21:25:51 +00:00
YeonGyu-Kim
8ec2c44615 fix(ulw-loop): retry parent session after failed verification
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-07 05:46:05 +09:00
YeonGyu-Kim
fade6740ae chore: update GPT-5.2 references to GPT-5.4
Align runtime defaults, tests, docs, and generated artifacts with the newer GPT-5.4 baseline. Keep think-mode and prompt-routing expectations consistent after the model version bump.
2026-03-07 05:46:05 +09:00
acamq
8a1352fc9b Merge pull request #2339 from JimMoen/fix/external-directory-default-ask
fix(tool-config): stop overriding external_directory permission
2026-03-06 13:40:56 -07:00
YeonGyu-Kim
d08bc04e67 feat(sisyphus): strengthen non-Claude parallel delegation guidance
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-07 00:47:55 +09:00
YeonGyu-Kim
fa460469f0 feat(sisyphus): rewrite GPT-5.4 prompt with 8-block architecture
Restructure from 13 scattered XML blocks to 8 dense blocks with 9
named sub-anchors, following OpenAI GPT-5.4 prompting guidance and
Oracle-reviewed context preservation strategy.

Key changes:
- Merge think_first + intent_gate + autonomy into unified <intent>
  with domain_guess classification and <ask_gate> sub-anchor
- Add <execution_loop> as central workflow: EXPLORE -> PLAN -> ROUTE ->
  EXECUTE_OR_SUPERVISE -> VERIFY -> RETRY -> DONE
- Add mandatory manual QA in <verification_loop> (conditional on
  runnable behavior)
- Move <constraints> to position #2 for GPT-5.4 attention pattern
- Add <completeness_contract> as explicit loop exit gate
- Add <output_contract> and <verbosity_controls> per GPT-5.4 guidance
- Add domain_guess (provisional) in intent, finalized in ROUTE after
  exploration -- visual domain always routes to visual-engineering
- Preserve all named sub-anchors: ask_gate, tool_persistence,
  parallel_tools, tool_method, dependency_checks, verification_loop,
  failure_recovery, completeness_contract
- Add skill loading emphasis at intent/route/delegation layers
- Rename EXECUTE to EXECUTE_OR_SUPERVISE to preserve orchestrator
  identity with non-execution exits (answer/ask/challenge)
2026-03-07 00:43:01 +09:00
YeonGyu-Kim
20b185b59f fix(task): append plan delegation prompt requirements
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:56:51 +09:00
YeonGyu-Kim
898b628d3d fix(ulw-loop): track Oracle verification sessions explicitly
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-06 22:37:41 +09:00
YeonGyu-Kim
9778cc6c98 feat(ultrawork): enforce manual QA execution and acceptance criteria workflow
Add MANUAL_QA_MANDATE sections to all three ultrawork prompts (default,
GPT, Gemini). Agents must now define acceptance criteria in TODO/Task items
before implementation, then execute manual QA themselves after completing
work. lsp_diagnostics alone is explicitly called out as insufficient since
it only catches type errors, not functional bugs.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:33:42 +09:00
YeonGyu-Kim
2e7b7c1f55 feat(prompts): enforce category domain matching and design-system-first workflow
Remove deep parallel delegation section from GPT-5.4 Sisyphus prompt since
it encouraged direct implementation over orchestration. Add zero-tolerance
category domain matching guide to all Sisyphus prompts with visual-engineering
examples. Rewrite visual-engineering category prompt with 4-phase mandatory
workflow (analyze design system, create if missing, build with system, verify)
targeting Gemini's tendency to skip foundational steps.
2026-03-06 22:19:18 +09:00
YeonGyu-Kim
c17f7215f2 test(ulw-loop): cover Oracle verification flow
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:00:21 +09:00
YeonGyu-Kim
a010de1db2 feat(ulw-loop): require Oracle verification before completion
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-06 22:00:14 +09:00
YeonGyu-Kim
c3f2198d34 feat(gpt-5.4): amplify parallel tool-calling with XML behavioral contracts
Add <parallel_tool_calling> and <tool_usage_rules> blocks that GPT-5.4
treats as first-class behavioral contracts. Add parallel-planning question
to <think_first>, strengthen Exploratory route in intent gate, and add
IN PARALLEL annotations to verification loop.
2026-03-06 21:09:30 +09:00
JimMoen
a1ca658d76 fix(tool-config): stop overriding external_directory permission
Remove the hardcoded external_directory: "allow" default from
applyToolConfig(). This was silently overriding OpenCode's built-in
default of "ask" and any user-configured external_directory permission.

With this change, external_directory permission is fully controlled by
OpenCode's defaults and user configuration, as intended.

Fixes #1973
Fixes #2194
2026-03-06 17:58:08 +08:00
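The fix above removes a hardcoded default rather than adding logic. A before/after sketch of the idea — the function and type names are hypothetical stand-ins for `applyToolConfig()`, whose real shape may differ:

```typescript
// Illustrative before/after: stop injecting a hardcoded external_directory
// value so OpenCode's built-in "ask" default (or the user's own setting)
// survives. Names here are assumptions, not the actual implementation.
type Permissions = { external_directory?: "allow" | "ask" | "deny" };

// Before: unconditionally forced "allow", silently clobbering everything.
function applyToolConfigBuggy(user: Permissions): Permissions {
  return { ...user, external_directory: "allow" };
}

// After: leave the key alone; defaults and user config control it.
function applyToolConfigFixed(user: Permissions): Permissions {
  return { ...user };
}

applyToolConfigBuggy({ external_directory: "ask" }); // → { external_directory: "allow" } (bug)
applyToolConfigFixed({ external_directory: "ask" }); // → { external_directory: "ask" }
```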
Jeon Suyeol
1429ae1505 fix(test): increase poll timeout to fix flaky late-session-id test
WAIT_FOR_SESSION_TIMEOUT_MS of 2ms was too tight for 2 poll iterations
at 1ms intervals — setTimeout precision caused the budget to expire
before the 2nd getTask call. Bumped to 50ms.
2026-03-06 12:16:49 +09:00
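The race described above — a poll budget too tight for timer jitter — can be sketched as follows. Names are illustrative; the real helper lives in the background-task test suite:

```typescript
// Illustrative poll loop: with a 2ms budget and 1ms intervals, setTimeout
// jitter can exhaust the deadline before the second poll runs. A 50ms
// budget leaves ample slack. waitForSession's shape here is hypothetical.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForSession(
  getTask: () => string | undefined,
  timeoutMs: number,
  pollIntervalMs = 1,
): Promise<string | undefined> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const sessionId = getTask();
    if (sessionId !== undefined) return sessionId; // session arrived
    if (Date.now() >= deadline) return undefined;  // budget exhausted
    await sleep(pollIntervalMs);
  }
}
```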
MoerAI
d6fe9aa123 fix(tmux): use correct health check endpoint /global/health
The server health check was using /health which returns HTTP 403 since
the endpoint doesn't exist in OpenCode. The correct endpoint is
/global/health as defined in OpenCode's server routes.

Fixes #2260
2026-03-04 10:17:12 +09:00
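A minimal sketch of the corrected check — the endpoint path comes from the commit, while the helper names and base URL are assumptions for illustration:

```typescript
// Sketch of the corrected health check: the OpenCode server exposes
// /global/health, while /health returns HTTP 403. Helper names are
// illustrative, not the plugin's actual API.
function healthCheckUrl(baseUrl: string): string {
  return new URL("/global/health", baseUrl).toString();
}

async function isServerHealthy(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(healthCheckUrl(baseUrl));
    return res.ok; // the old /health path would have yielded 403 -> false
  } catch {
    return false; // server unreachable
  }
}

healthCheckUrl("http://127.0.0.1:4096"); // → "http://127.0.0.1:4096/global/health"
```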
acamq
c69344686c fix: correct librarian agent tool name from websearch_exa_web_search_exa to websearch_web_search_exa
The librarian agent's system prompt contained incorrect example function
names for the Exa web search tool, causing the agent to call a non-existent
tool 'websearch_exa_web_search_exa' instead of the correct
'websearch_web_search_exa'.

Fixes #2242
2026-03-02 09:17:43 -07:00
135 changed files with 3243 additions and 1011 deletions

View File

@@ -1,61 +0,0 @@
[sisyphus-bot]
## Confirmed Bug
We have identified the root cause of this issue. The bug is in the config writing logic during installation.
### Root Cause
**File:** `src/cli/config-manager/write-omo-config.ts` (line 46)
```typescript
const merged = deepMergeRecord(existing, newConfig)
```
When a user runs `oh-my-opencode install` (even just to update settings), the installer:
1. Reads the existing config (with user's custom model settings)
2. Generates a **new** config based on detected provider availability
3. Calls `deepMergeRecord(existing, newConfig)`
4. Writes the result back
**The problem:** `deepMergeRecord` overwrites values in `existing` with values from `newConfig`. This means your custom `"model": "openai/gpt-5.2-codex"` gets overwritten by the generated default model (e.g., `anthropic/claude-opus-4-6` if Claude is available).
### Why This Happens
Looking at `deepMergeRecord` (line 24-25):
```typescript
} else if (sourceValue !== undefined) {
result[key] = sourceValue as TTarget[keyof TTarget]
}
```
Any defined value in the source (generated config) overwrites the target (user's config).
### Fix Approach
The merge direction should be reversed to respect user overrides:
```typescript
const merged = deepMergeRecord(newConfig, existing)
```
This ensures:
- User's explicit settings take precedence
- Only new/undefined keys get populated from generated defaults
- Custom model choices are preserved
### SEVERITY: HIGH
- **Impact:** User configuration is overwritten without consent
- **Affected Files:**
- `src/cli/config-manager/write-omo-config.ts`
- `src/cli/config-manager/deep-merge-record.ts`
- **Trigger:** Running `oh-my-opencode install` (even for unrelated updates)
### Workaround (Until Fix)
Backup your config before running install:
```bash
cp ~/.config/opencode/oh-my-opencode.jsonc ~/.config/opencode/oh-my-opencode.jsonc.backup
```
We're working on a fix that will preserve your explicit model configurations.

View File

@@ -64,8 +64,8 @@ These agents have Claude-optimized prompts — long, detailed, mechanics-driven.
| Agent | Role | Fallback Chain | Notes |
| ------------ | ----------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Sisyphus** | Main orchestrator | Claude Opus → GLM 5 → Big Pickle | Claude-family first. GPT-5.4 has dedicated support, but Claude/Kimi/GLM remain the preferred fit. |
| **Metis** | Plan gap analyzer | Claude Opus → GPT-5.2 → Gemini 3.1 Pro | Claude preferred, GPT acceptable fallback. |
| **Sisyphus** | Main orchestrator | Claude Opus → K2P5 → Kimi K2.5 → GPT-5.4 → GLM 5 → Big Pickle | Claude-family first. GPT-5.4 has dedicated prompt support. Kimi/GLM as intermediate fallbacks. |
| **Metis** | Plan gap analyzer | Claude Opus → GPT-5.4 → Gemini 3.1 Pro | Claude preferred, GPT acceptable fallback. |
### Dual-Prompt Agents → Claude preferred, GPT supported
@@ -83,7 +83,7 @@ These agents are built for GPT's principle-driven style. Their prompts assume au
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------------- | -------------------------------------- | ------------------------------------------------ |
| **Hephaestus** | Autonomous deep worker | GPT-5.3 Codex only | No fallback. Requires GPT access. The craftsman. |
| **Oracle** | Architecture consultant | GPT-5.2 → Gemini 3.1 Pro → Claude Opus | Read-only high-IQ consultation. |
| **Oracle** | Architecture consultant | GPT-5.4 → Gemini 3.1 Pro → Claude Opus | Read-only high-IQ consultation. |
| **Momus** | Ruthless reviewer | GPT-5.4 → Claude Opus → Gemini 3.1 Pro | Verification and plan review. |
### Utility Runners → Speed over Intelligence
@@ -119,7 +119,7 @@ Principle-driven, explicit reasoning, deep technical capability. Best for agents
| Model | Strengths |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **GPT-5.3 Codex** | Deep coding powerhouse. Autonomous exploration. Required for Hephaestus. |
| **GPT-5.2** | High intelligence, strategic reasoning. Default for Oracle. |
| **GPT-5.4** | High intelligence, strategic reasoning. Default for Oracle. |
| **GPT-5.4** | Strong principle-driven reasoning. Default for Momus and a key fallback for Prometheus / Atlas. |
| **GPT-5-Nano** | Ultra-cheap, fast. Good for simple utility tasks. |
@@ -149,7 +149,7 @@ When agents delegate work, they don't pick a model name — they pick a **catego
| `visual-engineering` | Frontend, UI, CSS, design | Gemini 3.1 Pro → GLM 5 → Claude Opus |
| `ultrabrain` | Maximum reasoning needed | GPT-5.3 Codex → Gemini 3.1 Pro → Claude Opus |
| `deep` | Deep coding, complex logic | GPT-5.3 Codex → Claude Opus → Gemini 3.1 Pro |
| `artistry` | Creative, novel approaches | Gemini 3.1 Pro → Claude Opus → GPT-5.2 |
| `artistry` | Creative, novel approaches | Gemini 3.1 Pro → Claude Opus → GPT-5.4 |
| `quick` | Simple, fast tasks | Claude Haiku → Gemini Flash → GPT-5-Nano |
| `unspecified-high` | General complex work | GPT-5.4 → Claude Opus → GLM 5 → K2P5 |
| `unspecified-low` | General standard work | Claude Sonnet → GPT-5.3 Codex → Gemini Flash |
@@ -179,7 +179,7 @@ See the [Orchestration System Guide](./orchestration.md) for how agents dispatch
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Prometheus inherits sisyphus model; just add prompt guidance
"prometheus": {
@@ -190,7 +190,7 @@ See the [Orchestration System Guide](./orchestration.md) for how agents dispatch
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
"unspecified-high": { "model": "openai/gpt-5.4", "variant": "high" },
"unspecified-high": { "model": "openai/gpt-5.4-high" },
"visual-engineering": {
"model": "google/gemini-3.1-pro",
"variant": "high",

View File

@@ -49,7 +49,7 @@ Ask the user these questions to determine CLI options:
- If **no** → `--claude=no`
2. **Do you have an OpenAI/ChatGPT Plus Subscription?**
- If **yes** → `--openai=yes` (GPT-5.2 for Oracle agent)
- If **yes** → `--openai=yes` (GPT-5.4 for Oracle agent)
- If **no** → `--openai=no` (default)
3. **Will you integrate Gemini models?**
@@ -200,7 +200,7 @@ When GitHub Copilot is the best available provider, oh-my-opencode uses these mo
| Agent | Model |
| ------------- | --------------------------------- |
| **Sisyphus** | `github-copilot/claude-opus-4-6` |
| **Oracle** | `github-copilot/gpt-5.2` |
| **Oracle** | `github-copilot/gpt-5.4` |
| **Explore** | `github-copilot/grok-code-fast-1` |
| **Librarian** | `github-copilot/gemini-3-flash` |
@@ -228,7 +228,7 @@ When OpenCode Zen is the best available provider (no native or Copilot), these m
| Agent | Model |
| ------------- | ---------------------------------------------------- |
| **Sisyphus** | `opencode/claude-opus-4-6` |
| **Oracle** | `opencode/gpt-5.2` |
| **Oracle** | `opencode/gpt-5.4` |
| **Explore** | `opencode/gpt-5-nano` |
| **Librarian** | `opencode/minimax-m2.5-free` / `opencode/big-pickle` |
@@ -280,7 +280,7 @@ Not all models behave the same way. Understanding which models are "similar" hel
| Model | Provider(s) | Notes |
| ----------------- | -------------------------------- | ------------------------------------------------- |
| **GPT-5.3-codex** | openai, github-copilot, opencode | Deep coding powerhouse. Required for Hephaestus. |
| **GPT-5.2** | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| **GPT-5.4** | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| **GPT-5-Nano** | opencode | Ultra-cheap, fast. Good for simple utility tasks. |
**Different-Behavior Models**:
@@ -310,7 +310,7 @@ Based on your subscriptions, here's how the agents were configured:
| Agent | Role | Default Chain | What It Does |
| ------------ | ---------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Sisyphus** | Main ultraworker | Opus (max) → Kimi K2.5 → GLM 5 → Big Pickle | Primary coding agent. Orchestrates everything. **Never use GPT — no GPT prompt exists.** |
| **Metis** | Plan review | Opus (max) → Kimi K2.5 → GPT-5.2 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
| **Metis** | Plan review | Opus (max) → Kimi K2.5 → GPT-5.4 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
**Dual-Prompt Agents** (auto-switch between Claude and GPT prompts):
@@ -320,16 +320,16 @@ Priority: **Claude > GPT > Claude-like models**
| Agent | Role | Default Chain | GPT Prompt? |
| -------------- | ----------------- | ---------------------------------------------------------- | ---------------------------------------------------------------- |
| **Prometheus** | Strategic planner | Opus (max) → **GPT-5.2 (high)** → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | **Kimi K2.5** → Sonnet → GPT-5.2 | Yes — GPT-optimized todo management |
| **Prometheus** | Strategic planner | Opus (max) → **GPT-5.4 (high)** → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | **Kimi K2.5** → Sonnet → GPT-5.4 | Yes — GPT-optimized todo management |
**GPT-Native Agents** (built for GPT, don't override to Claude):
| Agent | Role | Default Chain | Notes |
| -------------- | ---------------------- | -------------------------------------- | ------------------------------------------------------ |
| **Hephaestus** | Deep autonomous worker | GPT-5.3-codex (medium) only | "Codex on steroids." No fallback. Requires GPT access. |
| **Oracle** | Architecture/debugging | GPT-5.2 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | GPT-5.2 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |
| **Oracle** | Architecture/debugging | GPT-5.4 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | GPT-5.4 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |
**Utility Agents** (speed over intelligence):
@@ -339,7 +339,7 @@ These agents do search, grep, and retrieval. They intentionally use fast, cheap
| --------------------- | ------------------ | ---------------------------------------------------------------------- | -------------------------------------------------------------- |
| **Explore** | Fast codebase grep | MiniMax M2.5 Free → Grok Code Fast → MiniMax M2.5 → Haiku → GPT-5-Nano | Speed is everything. Grok is blazing fast for grep. |
| **Librarian** | Docs/code search | MiniMax M2.5 Free → Gemini Flash → Big Pickle | Entirely free-tier. Doc retrieval doesn't need deep reasoning. |
| **Multimodal Looker** | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.2 → GLM-4.6v | Kimi excels at multimodal understanding. |
| **Multimodal Looker** | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.4 → GLM-4.6v | Kimi excels at multimodal understanding. |
#### Why Different Models Need Different Prompts
@@ -388,8 +388,8 @@ GPT (5.3-codex, 5.2) > Claude Opus (decent fallback) > Gemini (acceptable)
**Safe** (same family):
- Sisyphus: Opus → Sonnet, Kimi K2.5, GLM 5
- Prometheus: Opus → GPT-5.2 (auto-switches prompt)
- Atlas: Kimi K2.5 → Sonnet, GPT-5.2 (auto-switches)
- Prometheus: Opus → GPT-5.4 (auto-switches prompt)
- Atlas: Kimi K2.5 → Sonnet, GPT-5.4 (auto-switches)
**Dangerous** (no prompt support):

View File

@@ -45,7 +45,7 @@ flowchart TB
subgraph Workers["Worker Layer (Specialized Agents)"]
Junior[" Sisyphus-Junior<br/>(Task Executor)<br/>Claude Sonnet 4.6"]
Oracle[" Oracle<br/>(Architecture)<br/>GPT-5.2"]
Oracle[" Oracle<br/>(Architecture)<br/>GPT-5.4"]
Explore[" Explore<br/>(Codebase Grep)<br/>Grok Code"]
Librarian[" Librarian<br/>(Docs/OSS)<br/>Gemini 3 Flash"]
Frontend[" Frontend<br/>(UI/UX)<br/>Gemini 3.1 Pro"]

View File

@@ -182,7 +182,7 @@ You can override specific agents or categories in your config:
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
},
"categories": {
@@ -215,7 +215,7 @@ You can override specific agents or categories in your config:
**GPT models** (explicit reasoning, principle-driven):
- GPT-5.3-codex — deep coding powerhouse, required for Hephaestus
- GPT-5.2 — high intelligence, default for Oracle
- GPT-5.4 — high intelligence, default for Oracle
- GPT-5-Nano — ultra-cheap, fast utility tasks
**Different-behavior models**:

View File

@@ -83,8 +83,8 @@ Here's a practical starting configuration:
"librarian": { "model": "google/gemini-3-flash" },
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT-5.2 or Claude Opus
"oracle": { "model": "openai/gpt-5.2", "variant": "high" },
// Architecture consultation: GPT-5.4 or Claude Opus
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Prometheus inherits sisyphus model; just add prompt guidance
"prometheus": {
@@ -100,7 +100,7 @@ Here's a practical starting configuration:
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
// unspecified-high — complex work
"unspecified-high": { "model": "openai/gpt-5.4", "variant": "high" },
"unspecified-high": { "model": "openai/gpt-5.4-high" },
// writing — docs/prose
"writing": { "model": "google/gemini-3-flash" },
@@ -268,13 +268,13 @@ Disable categories: `{ "disabled_categories": ["ultrabrain"] }`
| Agent | Default Model | Provider Priority |
| --------------------- | ------------------- | ---------------------------------------------------------------------------- |
| **Sisyphus** | `claude-opus-4-6` | `claude-opus-4-6` → `glm-5` → `big-pickle` |
| **Hephaestus** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gpt-5.2` (GitHub Copilot fallback) |
| **oracle** | `gpt-5.2` | `gpt-5.2` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **Hephaestus** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gpt-5.4` (GitHub Copilot fallback) |
| **oracle** | `gpt-5.4` | `gpt-5.4` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **librarian** | `gemini-3-flash` | `gemini-3-flash` → `minimax-m2.5-free` → `big-pickle` |
| **explore** | `grok-code-fast-1` | `grok-code-fast-1` → `minimax-m2.5-free` → `claude-haiku-4-5` → `gpt-5-nano` |
| **multimodal-looker** | `gpt-5.3-codex` | `gpt-5.3-codex` → `k2p5` → `gemini-3-flash` → `glm-4.6v` → `gpt-5-nano` |
| **Prometheus** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.4` → `gemini-3.1-pro` |
| **Metis** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.2` → `gemini-3.1-pro` |
| **Metis** | `claude-opus-4-6` | `claude-opus-4-6` → `gpt-5.4` → `gemini-3.1-pro` |
| **Momus** | `gpt-5.4` | `gpt-5.4` → `claude-opus-4-6` → `gemini-3.1-pro` |
| **Atlas** | `claude-sonnet-4-6` | `claude-sonnet-4-6` → `gpt-5.4` |
@@ -285,7 +285,7 @@ Disable categories: `{ "disabled_categories": ["ultrabrain"] }`
| **visual-engineering** | `gemini-3.1-pro` | `gemini-3.1-pro` → `glm-5` → `claude-opus-4-6` |
| **ultrabrain** | `gpt-5.3-codex` | `gpt-5.3-codex` → `gemini-3.1-pro` → `claude-opus-4-6` |
| **deep** | `gpt-5.3-codex` | `gpt-5.3-codex` → `claude-opus-4-6` → `gemini-3.1-pro` |
| **artistry** | `gemini-3.1-pro` | `gemini-3.1-pro` → `claude-opus-4-6` → `gpt-5.2` |
| **artistry** | `gemini-3.1-pro` | `gemini-3.1-pro` → `claude-opus-4-6` → `gpt-5.4` |
| **quick** | `claude-haiku-4-5` | `claude-haiku-4-5` → `gemini-3-flash` → `gpt-5-nano` |
| **unspecified-low** | `claude-sonnet-4-6` | `claude-sonnet-4-6` → `gpt-5.3-codex` → `gemini-3-flash` |
| **unspecified-high** | `gpt-5.4` | `gpt-5.4` → `claude-opus-4-6` → `glm-5` → `k2p5` → `kimi-k2.5` |

View File

@@ -9,8 +9,8 @@ Oh-My-OpenCode provides 11 specialized AI agents. Each has distinct expertise, o
| Agent | Model | Purpose |
| --------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Sisyphus** | `claude-opus-4-6` | The default orchestrator. Plans, delegates, and executes complex tasks using specialized subagents with aggressive parallel execution. Todo-driven workflow with extended thinking (32k budget). Fallback: `glm-5` → `big-pickle`. |
| **Hephaestus** | `gpt-5.3-codex` | The Legitimate Craftsman. Autonomous deep worker inspired by AmpCode's deep mode. Goal-oriented execution with thorough research before action. Explores codebase patterns, completes tasks end-to-end without premature stopping. Named after the Greek god of forge and craftsmanship. Fallback: `gpt-5.2` on GitHub Copilot. Requires a GPT-capable provider. |
| **Oracle** | `gpt-5.2` | Architecture decisions, code review, debugging. Read-only consultation with stellar logical reasoning and deep analysis. Inspired by AmpCode. Fallback: `gemini-3.1-pro` → `claude-opus-4-6`. |
| **Hephaestus** | `gpt-5.3-codex` | The Legitimate Craftsman. Autonomous deep worker inspired by AmpCode's deep mode. Goal-oriented execution with thorough research before action. Explores codebase patterns, completes tasks end-to-end without premature stopping. Named after the Greek god of forge and craftsmanship. Fallback: `gpt-5.4` on GitHub Copilot. Requires a GPT-capable provider. |
| **Oracle** | `gpt-5.4` | Architecture decisions, code review, debugging. Read-only consultation with stellar logical reasoning and deep analysis. Inspired by AmpCode. Fallback: `gemini-3.1-pro` → `claude-opus-4-6`. |
| **Librarian** | `gemini-3-flash` | Multi-repo analysis, documentation lookup, OSS implementation examples. Deep codebase understanding with evidence-based answers. Fallback: `minimax-m2.5-free` → `big-pickle`. |
| **Explore** | `grok-code-fast-1` | Fast codebase exploration and contextual grep. Fallback: `minimax-m2.5-free` → `claude-haiku-4-5` → `gpt-5-nano`. |
| **Multimodal-Looker** | `gpt-5.3-codex` | Visual content specialist. Analyzes PDFs, images, diagrams to extract information. Fallback: `k2p5` → `gemini-3-flash` → `glm-4.6v` → `gpt-5-nano`. |
@@ -20,7 +20,7 @@ Oh-My-OpenCode provides 11 specialized AI agents. Each has distinct expertise, o
| Agent | Model | Purpose |
| -------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Prometheus** | `claude-opus-4-6` | Strategic planner with interview mode. Creates detailed work plans through iterative questioning. Fallback: `gpt-5.4` → `gemini-3.1-pro`. |
| **Metis** | `claude-opus-4-6` | Plan consultant — pre-planning analysis. Identifies hidden intentions, ambiguities, and AI failure points. Fallback: `gpt-5.2` → `gemini-3.1-pro`. |
| **Metis** | `claude-opus-4-6` | Plan consultant — pre-planning analysis. Identifies hidden intentions, ambiguities, and AI failure points. Fallback: `gpt-5.4` → `gemini-3.1-pro`. |
| **Momus** | `gpt-5.4` | Plan reviewer — validates plans against clarity, verifiability, and completeness standards. Fallback: `claude-opus-4-6` → `gemini-3.1-pro`. |
### Orchestration Agents

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode",
"version": "3.10.1",
"version": "3.11.0",
"description": "The Best AI Agent Harness - Batteries-Included OpenCode Plugin with Multi-Model Orchestration, Parallel Background Agents, and Crafted LSP/AST Tools",
"main": "dist/index.js",
"types": "dist/index.d.ts",
@@ -26,6 +26,7 @@
"build:binaries": "bun run script/build-binaries.ts",
"build:schema": "bun run script/build-schema.ts",
"clean": "rm -rf dist",
"prepare": "bun run build",
"postinstall": "node postinstall.mjs",
"prepublishOnly": "bun run clean && bun run build",
"typecheck": "tsc --noEmit",
@@ -75,17 +76,17 @@
"typescript": "^5.7.3"
},
"optionalDependencies": {
"oh-my-opencode-darwin-arm64": "3.10.1",
"oh-my-opencode-darwin-x64": "3.10.1",
"oh-my-opencode-darwin-x64-baseline": "3.10.1",
"oh-my-opencode-linux-arm64": "3.10.1",
"oh-my-opencode-linux-arm64-musl": "3.10.1",
"oh-my-opencode-linux-x64": "3.10.1",
"oh-my-opencode-linux-x64-baseline": "3.10.1",
"oh-my-opencode-linux-x64-musl": "3.10.1",
"oh-my-opencode-linux-x64-musl-baseline": "3.10.1",
"oh-my-opencode-windows-x64": "3.10.1",
"oh-my-opencode-windows-x64-baseline": "3.10.1"
"oh-my-opencode-darwin-arm64": "3.11.0",
"oh-my-opencode-darwin-x64": "3.11.0",
"oh-my-opencode-darwin-x64-baseline": "3.11.0",
"oh-my-opencode-linux-arm64": "3.11.0",
"oh-my-opencode-linux-arm64-musl": "3.11.0",
"oh-my-opencode-linux-x64": "3.11.0",
"oh-my-opencode-linux-x64-baseline": "3.11.0",
"oh-my-opencode-linux-x64-musl": "3.11.0",
"oh-my-opencode-linux-x64-musl-baseline": "3.11.0",
"oh-my-opencode-windows-x64": "3.11.0",
"oh-my-opencode-windows-x64-baseline": "3.11.0"
},
"overrides": {
"@opencode-ai/sdk": "^1.2.17"

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-arm64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-arm64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-darwin-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (darwin-x64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-arm64-musl",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-arm64-musl)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-arm64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-arm64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-musl-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-musl-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64-musl",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64-musl)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-linux-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (linux-x64)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-windows-x64-baseline",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (windows-x64-baseline, no AVX2)",
"license": "MIT",
"repository": {

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode-windows-x64",
"version": "3.10.1",
"version": "3.11.0",
"description": "Platform-specific binary for oh-my-opencode (windows-x64)",
"license": "MIT",
"repository": {

View File

@@ -1991,6 +1991,30 @@
"created_at": "2026-03-06T10:05:58Z",
"repoId": 1108837393,
"pullRequestNo": 2339
},
{
"name": "wousp112",
"id": 186927774,
"comment_id": 4014707931,
"created_at": "2026-03-06T23:14:44Z",
"repoId": 1108837393,
"pullRequestNo": 2350
},
{
"name": "rluisr",
"id": 7776462,
"comment_id": 4015878597,
"created_at": "2026-03-07T07:47:45Z",
"repoId": 1108837393,
"pullRequestNo": 2352
},
{
"name": "hobostay",
"id": 110803307,
"comment_id": 4016562784,
"created_at": "2026-03-07T13:53:56Z",
"repoId": 1108837393,
"pullRequestNo": 2360
}
]
}

View File

@@ -10,13 +10,13 @@ Agent factories following `createXXXAgent(model) → AgentConfig` pattern. Each
| Agent | Model | Temp | Mode | Fallback Chain | Purpose |
|-------|-------|------|------|----------------|---------|
| **Sisyphus** | claude-opus-4-6 max | 0.1 | all | glm-5 → big-pickle | Main orchestrator, plans + delegates |
| **Hephaestus** | gpt-5.3-codex medium | 0.1 | all | gpt-5.2 medium (copilot) | Autonomous deep worker |
| **Oracle** | gpt-5.2 high | 0.1 | subagent | gemini-3.1-pro high → claude-opus-4-6 max | Read-only consultation |
| **Sisyphus** | claude-opus-4-6 max | 0.1 | all | k2p5 → kimi-k2.5 → gpt-5.4 medium → glm-5 → big-pickle | Main orchestrator, plans + delegates |
| **Hephaestus** | gpt-5.3-codex medium | 0.1 | all | gpt-5.4 medium (copilot) | Autonomous deep worker |
| **Oracle** | gpt-5.4 high | 0.1 | subagent | gemini-3.1-pro high → claude-opus-4-6 max | Read-only consultation |
| **Librarian** | gemini-3-flash | 0.1 | subagent | minimax-m2.5-free → big-pickle | External docs/code search |
| **Explore** | grok-code-fast-1 | 0.1 | subagent | minimax-m2.5-free → claude-haiku-4-5 → gpt-5-nano | Contextual grep |
| **Multimodal-Looker** | gpt-5.3-codex medium | 0.1 | subagent | k2p5 → gemini-3-flash → glm-4.6v → gpt-5-nano | PDF/image analysis |
| **Metis** | claude-opus-4-6 max | **0.3** | subagent | gpt-5.2 high → gemini-3.1-pro high | Pre-planning consultant |
| **Metis** | claude-opus-4-6 max | **0.3** | subagent | gpt-5.4 high → gemini-3.1-pro high | Pre-planning consultant |
| **Momus** | gpt-5.4 xhigh | 0.1 | subagent | claude-opus-4-6 max → gemini-3.1-pro high | Plan reviewer |
| **Atlas** | claude-sonnet-4-6 | 0.1 | primary | gpt-5.4 medium | Todo-list orchestrator |
| **Prometheus** | claude-opus-4-6 max | 0.1 | — | gpt-5.4 high → gemini-3.1-pro | Strategic planner (internal) |
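The fallback chains in the table are ordered preference lists: resolution presumably walks the chain and picks the first model the user can actually access, which is what lets an OpenAI-only user fall through to `gpt-5.4`. A minimal sketch of that resolution, using a hypothetical `resolveModel` helper rather than the plugin's real API in `src/shared/model-requirements.ts`:

```typescript
// Hypothetical sketch of fallback-chain resolution; model names are taken
// from the Sisyphus runtime chain in the table above.
type FallbackChain = string[]

// Walk the chain in order and return the first model the user has access to.
function resolveModel(chain: FallbackChain, available: Set<string>): string | undefined {
  return chain.find((model) => available.has(model))
}

const sisyphusChain: FallbackChain = [
  "claude-opus-4-6", "k2p5", "kimi-k2.5", "gpt-5.4", "glm-5", "big-pickle",
]

// An OpenAI-only user skips the Claude and Kimi entries and lands on gpt-5.4.
const picked = resolveModel(sisyphusChain, new Set(["gpt-5.4", "gpt-5-nano"]))
```

Order matters: adding `k2p5` and `kimi-k2.5` ahead of `gpt-5.4` means Kimi users are preferred over OpenAI users only when both are configured.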

View File

@@ -5,7 +5,7 @@
* You are the conductor of a symphony of specialized agents.
*
* Routing:
* 1. GPT models (openai/*, github-copilot/gpt-*) → gpt.ts (GPT-5.2 optimized)
* 1. GPT models (openai/*, github-copilot/gpt-*) → gpt.ts (GPT-5.4 optimized)
* 2. Gemini models (google/*, google-vertex/*) → gemini.ts (Gemini-optimized)
* 3. Default (Claude, etc.) → default.ts (Claude-optimized)
*/

View File

@@ -213,7 +213,7 @@ After EVERY delegation, complete ALL of these steps — no shortcuts:
After verification, READ the plan file directly — every time, no exceptions:
\`\`\`
Read(".sisyphus/tasks/{plan-name}.yaml")
Read(".sisyphus/plans/{plan-name}.md")
\`\`\`
Count remaining \`- [ ]\` tasks. This is your ground truth for what comes next.
@@ -335,7 +335,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 4..
\`\`\`
**Path convention**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus/plans/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -372,6 +372,7 @@ You are the QA gate. Subagents lie. Verify EVERYTHING.
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE**:
- All code writing/editing
@@ -403,6 +404,20 @@ You are the QA gate. Subagents lie. Verify EVERYTHING.
- **Store session_id from every delegation output**
- **Use \`session_id="{session_id}"\` for retries, fixes, and follow-ups**
</critical_overrides>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`
export function getDefaultAtlasPrompt(): string {
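The POST-DELEGATION RULE above hinges on step 2's confirmation that "the checkbox count changed (fewer `- [ ]` remaining)". A minimal sketch of that count, assuming GitHub-style task-list checkboxes at the start of a line (hypothetical helper, not part of the prompt source):

```typescript
// Count remaining unchecked tasks in a plan file's markdown body.
// Illustrates the confirmation step of the POST-DELEGATION RULE.
function countOpenCheckboxes(planMarkdown: string): number {
  // `- [ ]` at line start (allowing leading indentation), across all lines.
  const matches = planMarkdown.match(/^[ \t]*- \[ \]/gm)
  return matches ? matches.length : 0
}

const before = "- [x] Task 1\n- [ ] Task 2\n- [ ] Task 3\n"
const after = "- [x] Task 1\n- [x] Task 2\n- [ ] Task 3\n"

// Progress is confirmed when the open-checkbox count strictly decreases.
const progressed = countOpenCheckboxes(after) < countOpenCheckboxes(before)
```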

View File

@@ -309,7 +309,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 3..
- Instruct subagent to append findings (never overwrite)
**Paths**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus\/plans\/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -343,6 +343,7 @@ Subagents CLAIM "done" when:
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE (NO EXCEPTIONS):**
- All code writing/editing
@@ -373,6 +374,20 @@ Subagents CLAIM "done" when:
- Store and reuse session_id for retries
- **USE TOOL CALLS for verification — not internal reasoning**
</critical_rules>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`
export function getGeminiAtlasPrompt(): string {

View File

@@ -313,7 +313,7 @@ task(category="quick", load_skills=[], run_in_background=false, prompt="Task 3..
- Instruct subagent to append findings (never overwrite)
**Paths**:
- Plan: \`.sisyphus/plans/{name}.md\` (READ ONLY)
- Plan: \`.sisyphus/plans/{name}.md\` (you may EDIT to mark checkboxes)
- Notepad: \`.sisyphus/notepads/{name}/\` (READ/APPEND)
</notepad_protocol>
@@ -348,6 +348,7 @@ Your job is to CATCH THEM. Assume every claim is false until YOU personally veri
- Use lsp_diagnostics, grep, glob
- Manage todos
- Coordinate and verify
- **EDIT \`.sisyphus\/plans\/*.md\` to change \`- [ ]\` to \`- [x]\` after verified task completion**
**YOU DELEGATE**:
- All code writing/editing
@@ -376,15 +377,19 @@ Your job is to CATCH THEM. Assume every claim is false until YOU personally veri
- Store and reuse session_id for retries
</critical_rules>
<user_updates_spec>
- Send brief updates (1-2 sentences) only when:
- Starting a new major phase
- Discovering something that changes the plan
- Avoid narrating routine tool calls
- Each update must include a concrete outcome ("Found X", "Verified Y", "Delegated Z")
- Keep updates varied in structure — don't start each the same way
- Do NOT expand task scope; if you notice new work, call it out as optional
</user_updates_spec>
<post_delegation_rule>
## POST-DELEGATION RULE (MANDATORY)
After EVERY verified task() completion, you MUST:
1. **EDIT the plan checkbox**: Change \`- [ ]\` to \`- [x]\` for the completed task in \`.sisyphus/plans/{plan-name}.md\`
2. **READ the plan to confirm**: Read \`.sisyphus/plans/{plan-name}.md\` and verify the checkbox count changed (fewer \`- [ ]\` remaining)
3. **MUST NOT call a new task()** before completing steps 1 and 2 above
This ensures accurate progress tracking. Skip this and you lose visibility into what remains.
</post_delegation_rule>
`;
export function getGptAtlasPrompt(): string {

View File

@@ -0,0 +1,155 @@
import { describe, test, expect } from "bun:test"
import { ATLAS_SYSTEM_PROMPT } from "./default"
import { ATLAS_GPT_SYSTEM_PROMPT } from "./gpt"
import { ATLAS_GEMINI_SYSTEM_PROMPT } from "./gemini"
describe("ATLAS prompt checkbox enforcement", () => {
describe("default prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
test("default prompt should NOT reference .sisyphus/tasks/", () => {
// given
const prompt = ATLAS_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\.sisyphus\/tasks\//)
})
})
describe("GPT prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_GPT_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
})
describe("Gemini prompt", () => {
test("plan should NOT be marked (READ ONLY)", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
// when / then
expect(prompt).not.toMatch(/\(READ ONLY\)/)
})
test("plan description should include EDIT for checkboxes", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/edit.*checkbox|checkbox.*edit/)
})
test("boundaries should include exception for editing .sisyphus/plans/*.md checkboxes", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/\.sisyphus\/plans\/\*\.md/)
expect(lowerPrompt).toMatch(/checkbox/)
})
test("prompt should include POST-DELEGATION RULE", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/post-delegation/)
})
test("prompt should include MUST NOT call a new task() before", () => {
// given
const prompt = ATLAS_GEMINI_SYSTEM_PROMPT
const lowerPrompt = prompt.toLowerCase()
// when / then
expect(lowerPrompt).toMatch(/must not.*call.*new.*task/)
})
})
})

View File

@@ -4,7 +4,7 @@ import { describe, it, expect } from "bun:test"
import {
buildCategorySkillsDelegationGuide,
buildUltraworkSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
type AvailableSkill,
type AvailableCategory,
@@ -174,23 +174,39 @@ describe("buildUltraworkSection", () => {
})
})
describe("buildDeepParallelSection", () => {
describe("buildParallelDelegationSection", () => {
const deepCategory: AvailableCategory = { name: "deep", description: "Autonomous problem-solving" }
const unspecifiedHighCategory: AvailableCategory = { name: "unspecified-high", description: "High effort tasks" }
const otherCategory: AvailableCategory = { name: "quick", description: "Trivial tasks" }
it("#given non-Claude model with deep category #when building #then returns parallel delegation section", () => {
it("#given non-Claude model with deep category #when building #then returns aggressive delegation section", () => {
//#given
const model = "google/gemini-3-pro"
const categories = [deepCategory, otherCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toContain("Deep Parallel Delegation")
expect(result).toContain("EVERY independent unit")
expect(result).toContain("DECOMPOSE AND DELEGATE")
expect(result).toContain("NOT AN IMPLEMENTER")
expect(result).toContain("run_in_background=true")
expect(result).toContain("4 independent units")
expect(result).toContain("NEVER implement directly")
})
it("#given non-Claude model with unspecified-high category #when building #then returns aggressive delegation section", () => {
//#given
const model = "openai/gpt-5.4"
const categories = [unspecifiedHighCategory, otherCategory]
//#when
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toContain("DECOMPOSE AND DELEGATE")
expect(result).toContain("`deep` or `unspecified-high`")
expect(result).toContain("NEVER work sequentially")
})
it("#given Claude model #when building #then returns empty", () => {
@@ -199,19 +215,19 @@ describe("buildDeepParallelSection", () => {
const categories = [deepCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toBe("")
})
it("#given non-Claude model without deep category #when building #then returns empty", () => {
it("#given non-Claude model without deep or unspecified-high category #when building #then returns empty", () => {
//#given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
const categories = [otherCategory]
//#when
const result = buildDeepParallelSection(model, categories)
const result = buildParallelDelegationSection(model, categories)
//#then
expect(result).toBe("")
@@ -245,7 +261,7 @@ describe("buildNonClaudePlannerSection", () => {
it("#given GPT model #when building #then returns plan agent section", () => {
//#given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
//#when
const result = buildNonClaudePlannerSection(model)

View File

@@ -247,7 +247,34 @@ task(
**ANTI-PATTERN (will produce poor results):**
\`\`\`typescript
task(category="...", load_skills=[], run_in_background=false, prompt="...") // Empty load_skills without justification
\`\`\``
\`\`\`
---
### Category Domain Matching (ZERO TOLERANCE)
Every delegation MUST use the category that matches the task's domain. Mismatched categories produce measurably worse output because each category runs on a model optimized for that specific domain.
**VISUAL WORK = ALWAYS \`visual-engineering\`. NO EXCEPTIONS.**
Any task involving UI, UX, CSS, styling, layout, animation, design, or frontend components MUST go to \`visual-engineering\`. Never delegate visual work to \`quick\`, \`unspecified-*\`, or any other category.
\`\`\`typescript
// CORRECT: Visual work → visual-engineering category
task(category="visual-engineering", load_skills=["frontend-ui-ux"], prompt="Redesign the sidebar layout with new spacing...")
// WRONG: Visual work in wrong category — WILL PRODUCE INFERIOR RESULTS
task(category="quick", load_skills=[], prompt="Redesign the sidebar layout with new spacing...")
\`\`\`
| Task Domain | MUST Use Category |
|---|---|
| UI, styling, animations, layout, design | \`visual-engineering\` |
| Hard logic, architecture decisions, algorithms | \`ultrabrain\` |
| Autonomous research + end-to-end implementation | \`deep\` |
| Single-file typo, trivial config change | \`quick\` |
**When in doubt about category, it is almost never \`quick\` or \`unspecified-*\`. Match the domain.**`
}
export function buildOracleSection(agents: AvailableAgent[]): string {
@@ -332,21 +359,38 @@ Multi-step task? **ALWAYS consult Plan Agent first.** Do NOT start implementatio
Plan Agent returns a structured work breakdown with parallel execution opportunities. Follow it.`
}
export function buildDeepParallelSection(model: string, categories: AvailableCategory[]): string {
export function buildParallelDelegationSection(model: string, categories: AvailableCategory[]): string {
const isNonClaude = !model.toLowerCase().includes('claude')
const hasDeepCategory = categories.some(c => c.name === 'deep')
const hasDelegationCategory = categories.some(c => c.name === 'deep' || c.name === 'unspecified-high')
if (!isNonClaude || !hasDeepCategory) return ""
if (!isNonClaude || !hasDelegationCategory) return ""
return `### Deep Parallel Delegation
return `### DECOMPOSE AND DELEGATE — YOU ARE NOT AN IMPLEMENTER
Delegate EVERY independent unit to a \`deep\` agent in parallel (\`run_in_background=true\`).
If a task decomposes into 4 independent units, spawn 4 agents simultaneously — not 1 at a time.
**YOUR FAILURE MODE: You attempt to do work yourself instead of decomposing and delegating.** When you implement directly, the result is measurably worse than when specialized subagents do it. Subagents have domain-specific configurations, loaded skills, and tuned prompts that you lack.
1. Decompose the implementation into independent work units
2. Assign one \`deep\` agent per unit — all via \`run_in_background=true\`
3. Give each agent a clear GOAL with success criteria, not step-by-step instructions
4. Collect all results, integrate, verify coherence across units`
**MANDATORY — for ANY implementation task:**
1. **ALWAYS decompose** the task into independent work units. No exceptions. Even if the task "feels small", decompose it.
2. **ALWAYS delegate** EACH unit to a \`deep\` or \`unspecified-high\` agent in parallel (\`run_in_background=true\`).
3. **NEVER work sequentially.** If 4 independent units exist, spawn 4 agents simultaneously. Not 1 at a time. Not 2 then 2.
4. **NEVER implement directly** when delegation is possible. You write prompts, not code.
**YOUR PROMPT TO EACH AGENT MUST INCLUDE:**
- GOAL with explicit success criteria (what "done" looks like)
- File paths and constraints (where to work, what not to touch)
- Existing patterns to follow (reference specific files the agent should read)
- Clear scope boundary (what is IN scope, what is OUT of scope)
**Vague delegation = failed delegation.** If your prompt to the subagent is shorter than 5 lines, it is too vague.
| You Want To Do | You MUST Do Instead |
|---|---|
| Write code yourself | Delegate to \`deep\` or \`unspecified-high\` agent |
| Handle 3 changes sequentially | Spawn 3 agents in parallel |
| "Quickly fix this one thing" | Still delegate — your "quick fix" is slower and worse than a subagent's |
**Your value is orchestration, decomposition, and quality control. Delegating with crystal-clear prompts IS your work.**`
}
export function buildUltraworkSection(
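The guard at the top of `buildParallelDelegationSection` can be read in isolation: the aggressive delegation section is injected only for non-Claude models that have a `deep` or `unspecified-high` category available. A self-contained sketch of that condition, with the `AvailableCategory` type simplified from the diff above:

```typescript
interface AvailableCategory { name: string; description: string }

// Mirrors the guard shown in the diff: inject the DECOMPOSE AND DELEGATE
// section only for non-Claude models that can delegate to a `deep` or
// `unspecified-high` agent.
function shouldInjectDelegationSection(model: string, categories: AvailableCategory[]): boolean {
  const isNonClaude = !model.toLowerCase().includes("claude")
  const hasDelegationCategory = categories.some(
    (c) => c.name === "deep" || c.name === "unspecified-high",
  )
  return isNonClaude && hasDelegationCategory
}

const deep = { name: "deep", description: "Autonomous problem-solving" }
const quick = { name: "quick", description: "Trivial tasks" }
```

This is why the renamed tests cover both an `openai/gpt-5.4` model with `unspecified-high` (section present) and a Claude model with `deep` (empty string).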

View File

@@ -39,8 +39,8 @@ describe("getHephaestusPromptSource", () => {
test("returns 'gpt' for generic GPT models", () => {
// given
const model1 = "openai/gpt-5.2";
const model2 = "github-copilot/gpt-5.2";
const model1 = "openai/gpt-4o";
const model2 = "github-copilot/gpt-4o";
const model3 = "openai/gpt-4o";
// when
@@ -111,7 +111,7 @@ describe("getHephaestusPrompt", () => {
test("generic GPT model returns generic GPT prompt", () => {
// given
const model = "openai/gpt-5.2";
const model = "openai/gpt-4o";
// when
const prompt = getHephaestusPrompt(model);

View File

@@ -522,7 +522,7 @@ export function createHephaestusAgent(
return {
description:
"Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
"Autonomous Deep Worker - goal-oriented execution with GPT 5.4 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)",
mode: MODE,
model,
maxTokens: 32000,

View File

@@ -242,10 +242,10 @@ https://github.com/tanstack/query/blob/abc123def/packages/react-query/src/useQue
### Primary Tools by Purpose
- **Official Docs**: Use context7 — \`context7_resolve-library-id\` → \`context7_query-docs\`
- **Find Docs URL**: Use websearch_exa — \`websearch_exa_web_search_exa("library official documentation")\`
- **Find Docs URL**: Use websearch_exa — \`websearch_web_search_exa("library official documentation")\`
- **Sitemap Discovery**: Use webfetch — \`webfetch(docs_url + "/sitemap.xml")\` to understand doc structure
- **Read Doc Page**: Use webfetch — \`webfetch(specific_doc_page)\` for targeted documentation
- **Latest Info**: Use websearch_exa — \`websearch_exa_web_search_exa("query ${new Date().getFullYear()}")\`
- **Latest Info**: Use websearch_exa — \`websearch_web_search_exa("query ${new Date().getFullYear()}")\`
- **Fast Code Search**: Use grep_app — \`grep_app_searchGitHub(query, language, useRegexp)\`
- **Deep Code Search**: Use gh CLI — \`gh search code "query" --repo owner/repo\`
- **Clone Repo**: Use gh CLI — \`gh repo clone owner/repo \${TMPDIR:-/tmp}/name -- --depth 1\`

View File

@@ -48,7 +48,7 @@ export function getPrometheusPromptSource(model?: string): PrometheusPromptSourc
/**
* Gets the appropriate Prometheus prompt based on model.
* GPT models → GPT-5.2 optimized prompt (XML-tagged, principle-driven)
* GPT models → GPT-5.4 optimized prompt (XML-tagged, principle-driven)
* Gemini models → Gemini-optimized prompt (aggressive tool-call enforcement, thinking checkpoints)
* Default (Claude, etc.) → Claude-optimized prompt (modular sections)
*/

View File

@@ -5,7 +5,7 @@
* Category-spawned executor with domain-specific configurations.
*
* Routing:
* 1. GPT models (openai/*, github-copilot/gpt-*) -> gpt.ts (GPT-5.2 optimized)
* 1. GPT models (openai/*, github-copilot/gpt-*) -> gpt.ts (GPT-5.4 optimized)
* 2. Gemini models (google/*, google-vertex/*) -> gemini.ts (Gemini-optimized)
* 3. Default (Claude, etc.) -> default.ts (Claude-optimized)
*/

View File

@@ -10,13 +10,13 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
describe("honored fields", () => {
test("applies model override", () => {
// given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
// when
const result = createSisyphusJuniorAgentWithOverrides(override)
// then
expect(result.model).toBe("openai/gpt-5.2")
expect(result.model).toBe("openai/gpt-5.4")
})
test("applies temperature override", () => {
@@ -105,7 +105,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
// given
const override = {
disable: true,
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.9,
}
@@ -216,7 +216,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("useTaskSystem=true produces Task Discipline prompt for GPT", () => {
//#given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
//#when
const result = createSisyphusJuniorAgentWithOverrides(override, undefined, true)
@@ -253,7 +253,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("useTaskSystem=true includes task_create/task_update in GPT prompt", () => {
//#given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
//#when
const result = createSisyphusJuniorAgentWithOverrides(override, undefined, true)
@@ -303,7 +303,7 @@ describe("createSisyphusJuniorAgentWithOverrides", () => {
test("GPT model uses GPT-optimized prompt with Hephaestus-style sections", () => {
// given
const override = { model: "openai/gpt-5.2" }
const override = { model: "openai/gpt-5.4" }
// when
const result = createSisyphusJuniorAgentWithOverrides(override)
@@ -401,7 +401,7 @@ describe("getSisyphusJuniorPromptSource", () => {
test("returns 'gpt' for generic GPT models", () => {
// given
const model = "openai/gpt-5.2"
const model = "openai/gpt-4o"
// when
const source = getSisyphusJuniorPromptSource(model)
@@ -473,7 +473,7 @@ describe("buildSisyphusJuniorPrompt", () => {
test("generic GPT model uses generic GPT prompt", () => {
// given
const model = "openai/gpt-5.2"
const model = "openai/gpt-5.4"
// when
const prompt = buildSisyphusJuniorPrompt(model, false)

View File

@@ -35,7 +35,7 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "./dynamic-agent-prompt-builder";
@@ -64,7 +64,7 @@ function buildDynamicSisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const parallelDelegationSection = buildParallelDelegationSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildTaskManagementSection(useTaskSystem);
const todoHookNote = useTaskSystem
@@ -262,7 +262,7 @@ ${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${parallelDelegationSection}
${delegationTable}

View File

@@ -19,7 +19,7 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildParallelDelegationSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "../dynamic-agent-prompt-builder";
@@ -158,7 +158,7 @@ export function buildDefaultSisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const parallelDelegationSection = buildParallelDelegationSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildTaskManagementSection(useTaskSystem);
const todoHookNote = useTaskSystem
@@ -356,7 +356,7 @@ ${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${parallelDelegationSection}
${delegationTable}

View File

@@ -1,14 +1,24 @@
/**
* GPT-5.4-native Sisyphus prompt — written from scratch.
* GPT-5.4-native Sisyphus prompt — rewritten with 8-block architecture.
*
* Design principles (derived from OpenAI's GPT-5.4 prompting guidance):
* - Compact, block-structured prompts with XML tags
* - reasoning.effort defaults to "none" — encourage explicit thinking
* - Compact, block-structured prompts with XML tags + named sub-anchors
* - reasoning.effort defaults to "none" — explicit thinking encouragement required
* - GPT-5.4 generates preambles natively — do NOT add preamble instructions
* - GPT-5.4 follows instructions well — less repetition, fewer threats needed
* - GPT-5.4 benefits from: output contracts, verification loops, dependency checks
 * - GPT-5.4 can be over-literal — add intent inference layer for 알잘딱 ("read between the lines and do the right thing") behavior
* - GPT-5.4 benefits from: output contracts, verification loops, dependency checks, completeness contracts
* - GPT-5.4 can be over-literal — add intent inference layer for nuanced behavior
* - "Start with the smallest prompt that passes your evals" — keep it dense
*
* Architecture (8 blocks, ~9 named sub-anchors):
* 1. <identity> — Role, instruction priority, orchestrator bias
* 2. <constraints> — Hard blocks + anti-patterns (early placement for GPT-5.4 attention)
* 3. <intent> — Think-first + intent gate + autonomy (merged, domain_guess routing)
* 4. <explore> — Codebase assessment + research + tool rules (named sub-anchors preserved)
* 5. <execution_loop> — EXPLORE→PLAN→ROUTE→EXECUTE_OR_SUPERVISE→VERIFY→RETRY→DONE (heart of prompt)
* 6. <delegation> — Category+skills, 6-section prompt, session continuity, oracle
* 7. <tasks> — Task/todo management
* 8. <style> — Tone (prose) + output contract + progress updates
*/
import type {
@@ -27,14 +37,13 @@ import {
buildOracleSection,
buildHardBlocksSection,
buildAntiPatternsSection,
buildDeepParallelSection,
buildNonClaudePlannerSection,
categorizeTools,
} from "../dynamic-agent-prompt-builder";
function buildGpt54TaskManagementSection(useTaskSystem: boolean): string {
function buildGpt54TasksSection(useTaskSystem: boolean): string {
if (useTaskSystem) {
return `<task_management>
return `<tasks>
Create tasks before starting any non-trivial work. This is your primary coordination mechanism.
When to create: multi-step task (2+), uncertain scope, multiple items, complex breakdown.
@@ -47,10 +56,10 @@ Workflow:
When asking for clarification:
- State what you understood, what's unclear, 2-3 options with effort/implications, and your recommendation.
</task_management>`;
</tasks>`;
}
return `<task_management>
return `<tasks>
Create todos before starting any non-trivial work. This is your primary coordination mechanism.
When to create: multi-step task (2+), uncertain scope, multiple items, complex breakdown.
@@ -63,7 +72,7 @@ Workflow:
When asking for clarification:
- State what you understood, what's unclear, 2-3 options with effort/implications, and your recommendation.
</task_management>`;
</tasks>`;
}
export function buildGpt54SisyphusPrompt(
@@ -90,14 +99,13 @@ export function buildGpt54SisyphusPrompt(
const oracleSection = buildOracleSection(availableAgents);
const hardBlocks = buildHardBlocksSection();
const antiPatterns = buildAntiPatternsSection();
const deepParallelSection = buildDeepParallelSection(model, availableCategories);
const nonClaudePlannerSection = buildNonClaudePlannerSection(model);
const taskManagementSection = buildGpt54TaskManagementSection(useTaskSystem);
const tasksSection = buildGpt54TasksSection(useTaskSystem);
const todoHookNote = useTaskSystem
? "YOUR TASK CREATION WOULD BE TRACKED BY HOOK([SYSTEM REMINDER - TASK CONTINUATION])"
: "YOUR TODO CREATION WOULD BE TRACKED BY HOOK([SYSTEM REMINDER - TODO CONTINUATION])";
return `<identity>
const identityBlock = `<identity>
You are Sisyphus — an AI orchestrator from OhMyOpenCode.
You are a senior SF Bay Area engineer. You delegate, verify, and ship. Your code is indistinguishable from a senior engineer's work.
@@ -107,25 +115,36 @@ Core competencies: parsing implicit requirements from explicit requests, adaptin
You never work alone when specialists are available. Frontend → delegate. Deep research → parallel background agents. Architecture → consult Oracle.
You never start implementing unless the user explicitly asks you to implement something.
${todoHookNote}
</identity>
<think_first>
Before responding to any non-trivial request, pause and reason through these questions:
Instruction priority: user instructions override default style/tone/formatting. Newer instructions override older ones. Safety and type-safety constraints never yield.
Default to orchestration. Direct execution is for clearly local, trivial work only.
${todoHookNote}
</identity>`;
const constraintsBlock = `<constraints>
${hardBlocks}
${antiPatterns}
</constraints>`;
const intentBlock = `<intent>
Every message passes through this gate before any action.
Your default reasoning effort is minimal. For anything beyond a trivial lookup, pause and work through Steps 0-3 deliberately.
Step 0 — Think first:
Before acting, reason through these questions:
- What does the user actually want? Not literally — what outcome are they after?
- What didn't they say that they probably expect?
- Is there a simpler way to achieve this than what they described?
- What could go wrong with the obvious approach?
This is especially important because your default reasoning effort is minimal. For anything beyond a simple lookup, think deliberately before acting.
</think_first>
<intent_gate>
Every message passes through this gate before any action.
- What tool calls can I issue IN PARALLEL right now? List independent reads, searches, and agent fires before calling.
- Is there a skill whose domain connects to this task? If so, load it immediately via \`skill\` tool — do not hesitate.
${keyTriggers}
Step 0 — Infer true intent:
Step 1 — Classify complexity x domain:
The user rarely says exactly what they mean. Your job is to read between the lines.
@@ -137,19 +156,25 @@ The user rarely says exactly what they mean. Your job is to read between the lin
| "what do you think about X?" | Wants your evaluation before committing | evaluate → propose → wait for go-ahead |
| "X is broken", "seeing error Y" | Wants a minimal fix | diagnose → fix minimally → verify |
| "refactor", "improve", "clean up" | Open-ended — needs scoping first | assess codebase → propose approach → wait |
| "the work from yesterday seems a bit off" | Something from yesterday's work is buggy — find and fix it | check recent changes → hypothesize → verify → fix |
| "fix this up overall for me" | Multiple issues — wants a thorough pass | assess scope → create todo list → work through systematically |
State your interpretation briefly: "I read this as [type] — [one line plan]." Then proceed.
Step 1 — Classify complexity:
| "yesterday's work seems off" | Something from recent work is buggy — find and fix it | check recent changes → hypothesize → verify → fix |
| "fix this whole thing" | Multiple issues — wants a thorough pass | assess scope → create todo list → work through systematically |
Complexity:
- Trivial (single file, known location) → direct tools, unless a Key Trigger fires
- Explicit (specific file/line, clear command) → execute directly
- Exploratory ("how does X work?") → fire explore agents (1-3) + tools in parallel
- Exploratory ("how does X work?") → fire explore agents (1-3) + direct tools ALL IN THE SAME RESPONSE
- Open-ended ("improve", "refactor") → assess codebase first, then propose
- Ambiguous (multiple interpretations with 2x+ effort difference) → ask ONE question
Domain guess (provisional — finalized in ROUTE after exploration):
- Visual (UI, CSS, styling, layout, design, animation) → likely visual-engineering
- Logic (algorithms, architecture, complex business logic) → likely ultrabrain
- Writing (docs, prose, technical writing) → likely writing
- Git (commits, branches, rebases) → likely git
- General → determine after exploration
State your interpretation: "I read this as [complexity]-[domain_guess] — [one line plan]." Then proceed.
Step 2 — Check before acting:
- Single valid interpretation → proceed
@@ -157,43 +182,29 @@ Step 2 — Check before acting:
- Multiple interpretations, very different effort → ask
- Missing critical info → ask
- User's design seems flawed → raise concern concisely, propose alternative, ask if they want to proceed anyway
</intent_gate>
<autonomy_policy>
When to proceed vs ask:
<ask_gate>
Proceed unless:
(a) the action is irreversible,
(b) it has external side effects (sending, deleting, publishing, pushing to production), or
(c) critical information is missing that would materially change the outcome.
If proceeding, briefly state what you did and what remains.
</ask_gate>
</intent>`;
- If the user's intent is clear and the next step is reversible and low-risk: proceed without asking.
- Ask only if:
(a) the action is irreversible,
(b) it has external side effects (sending, deleting, publishing, pushing to production), or
(c) critical information is missing that would materially change the outcome.
- If proceeding, briefly state what you did and what remains.
const exploreBlock = `<explore>
## Exploration & Research
Instruction priority:
- User instructions override default style, tone, and formatting.
- Newer instructions override older ones where they conflict.
- Safety and type-safety constraints never yield.
You are an orchestrator. Your default is to delegate, not to do work yourself.
Before acting directly, check: is there a category + skills combination for this? If yes — delegate via \`task()\`. You should be doing direct implementation less than 10% of the time.
</autonomy_policy>
<codebase_assessment>
For open-ended tasks, assess the codebase before following patterns blindly.
### Codebase maturity (assess on first encounter with a new repo or module)
Quick check: config files (linter, formatter, types), 2-3 similar files for consistency, project age signals.
Classify:
- Disciplined (consistent patterns, configs, tests) → follow existing style strictly
- Transitional (mixed patterns) → ask which pattern to follow
- Legacy/Chaotic (no consistency) → propose conventions, get confirmation
- Greenfield → apply modern best practices
Verify before assuming: different patterns may be intentional, migration may be in progress.
</codebase_assessment>
<research>
## Exploration & Research
Different patterns may be intentional. Migration may be in progress. Verify before assuming.
${toolSelection}
@@ -201,16 +212,29 @@ ${exploreSection}
${librarianSection}
### Parallel execution
### Tool usage
Parallelize everything independent. Multiple reads, searches, and agent fires — all at once.
<tool_persistence_rules>
<tool_persistence>
- Use tools whenever they materially improve correctness. Your internal reasoning about file contents is unreliable.
- Do not stop early when another tool call would improve correctness.
- Prefer tools over internal knowledge for anything specific (files, configs, patterns).
- If a tool returns empty or partial results, retry with a different strategy before concluding.
</tool_persistence_rules>
- Prefer reading MORE files over fewer. When investigating, read the full cluster of related files.
</tool_persistence>
<parallel_tools>
- When multiple retrieval, lookup, or read steps are independent, issue them as parallel tool calls.
- Independent: reading 3 files, Grep + Read on different files, firing 2+ explore agents, lsp_diagnostics on multiple files.
- Dependent: needing a file path from Grep before Reading it. Sequence only these.
- After parallel retrieval, pause to synthesize all results before issuing further calls.
- Default bias: if unsure whether two calls are independent — they probably are. Parallelize.
</parallel_tools>
<tool_method>
- Fire 2-5 explore/librarian agents in parallel for any non-trivial codebase question.
- Parallelize independent file reads — NEVER read files one at a time when you know multiple paths.
- When delegating AND doing direct work: do both simultaneously.
</tool_method>
Explore and Librarian agents are background grep — always \`run_in_background=true\`, always parallel.
@@ -228,23 +252,101 @@ Background result collection:
5. Cancel disposable tasks individually via \`background_cancel(taskId="...")\`
Stop searching when: you have enough context, same info repeating, 2 iterations with no new data, or direct answer found.
</research>
</explore>`;
<implementation>
## Implementation
const executionLoopBlock = `<execution_loop>
## Execution Loop
### Pre-implementation:
0. Find relevant skills via \`skill\` tool and load them.
1. Multi-step task → create todo list immediately with detailed steps. No announcements.
2. Mark current task \`in_progress\` before starting.
3. Mark \`completed\` immediately when done — never batch.
Every implementation task follows this cycle. No exceptions.
1. EXPLORE — Fire 2-5 explore/librarian agents + direct tools IN PARALLEL.
Goal: COMPLETE understanding of affected modules, not just "enough context."
Follow \`<explore>\` protocol for tool usage and agent prompts.
2. PLAN — List files to modify, specific changes, dependencies, complexity estimate.
Multi-step (2+) → consult Plan Agent via \`task(subagent_type="plan", ...)\`.
Single-step → mental plan is sufficient.
<dependency_checks>
Before taking an action, check whether prerequisite discovery, lookup, or retrieval steps are required.
Do not skip prerequisites just because the intended final action seems obvious.
If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
3. ROUTE — Finalize who does the work, using domain_guess from \`<intent>\` + exploration results:
| Decision | Criteria |
|---|---|
| **delegate** (DEFAULT) | Specialized domain, multi-file, >50 lines, unfamiliar module → matching category |
| **self** | Trivial local work only: <10 lines, single file, you have full context |
| **answer** | Analysis/explanation request → respond with exploration results |
| **ask** | Truly blocked after exhausting exploration → ask ONE precise question |
| **challenge** | User's design seems flawed → raise concern, propose alternative |
Visual domain → MUST delegate to \`visual-engineering\`. No exceptions.
Skills: if ANY available skill's domain overlaps with the task, load it NOW via \`skill\` tool and include it in \`load_skills\`. When the connection is even remotely plausible, load the skill — the cost of loading an irrelevant skill is near zero, the cost of missing a relevant one is high.
4. EXECUTE_OR_SUPERVISE —
If self: surgical changes, match existing patterns, minimal diff. Never suppress type errors. Never commit unless asked. Bugfix rule: fix minimally, never refactor while fixing.
If delegated: exhaustive 6-section prompt per \`<delegation>\` protocol. Session continuity for follow-ups.
5. VERIFY —
<verification_loop>
a. Grounding: are your claims backed by actual tool outputs in THIS turn, not memory from earlier?
b. \`lsp_diagnostics\` on ALL changed files IN PARALLEL — zero errors required. Actually clean, not "probably clean."
c. Tests: run related tests (modified \`foo.ts\` → look for \`foo.test.ts\`). Actually pass, not "should pass."
d. Build: run build if applicable — exit 0 required.
e. Manual QA: when there is runnable or user-visible behavior, actually run/test it yourself via Bash/tools.
\`lsp_diagnostics\` catches type errors, NOT functional bugs. "This should work" is not verification — RUN IT.
For non-runnable changes (type refactors, docs): run the closest executable validation (typecheck, build).
f. Delegated work: read every file the subagent touched IN PARALLEL. Never trust self-reports.
</verification_loop>
Fix ONLY issues caused by YOUR changes. Pre-existing issues → note them, don't fix.
6. RETRY —
<failure_recovery>
Fix root causes, not symptoms. Re-verify after every attempt. Never make random changes hoping something works.
If first approach fails → try a materially different approach (different algorithm, pattern, or library).
After 3 attempts:
1. Stop all edits.
2. Revert to last known working state.
3. Document what was attempted.
4. Consult Oracle with full failure context.
5. If Oracle can't resolve → ask the user.
Never leave code in a broken state. Never delete failing tests to "pass."
</failure_recovery>
7. DONE —
<completeness_contract>
Exit the loop ONLY when ALL of:
- Every planned task/todo item is marked completed
- Diagnostics are clean on all changed files
- Build passes (if applicable)
- User's original request is FULLY addressed — not partially, not "you can extend later"
- Any blocked items are explicitly marked [blocked] with what is missing
</completeness_contract>
Progress: report at phase transitions — before exploration, after discovery, before large edits, on blockers.
1-2 sentences each, outcome-based. Include one specific detail. Not upfront narration or scripted preambles.
</execution_loop>`;
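The VERIFY step above requires diagnostics on all changed files in parallel with zero errors. A minimal sketch of that contract (hypothetical helper — `verifyChangedFiles` and the injected `lspDiagnostics` callback are stand-ins, not the plugin's actual API):

```typescript
// Hypothetical sketch: run diagnostics for every changed file concurrently
// and require zero errors, mirroring the VERIFY step's contract.
async function verifyChangedFiles(
  changedFiles: string[],
  lspDiagnostics: (file: string) => Promise<string[]>, // stand-in for the real tool
): Promise<void> {
  // One diagnostics call per file, fired in parallel, then flattened.
  const results = await Promise.all(changedFiles.map((file) => lspDiagnostics(file)));
  const errors = results.flat();
  if (errors.length > 0) {
    throw new Error(`verification failed: ${errors.length} diagnostic error(s)`);
  }
}
```

Note the rejection path: a single error in any file fails the whole step, matching "actually clean, not 'probably clean.'"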
const delegationBlock = `<delegation>
## Delegation System
### Pre-delegation:
0. Find relevant skills via \`skill\` tool and load them. If the task context connects to ANY available skill — even loosely — load it without hesitation. Err on the side of inclusion.
${categorySkillsGuide}
${nonClaudePlannerSection}
${deepParallelSection}
${delegationTable}
### Delegation prompt structure (all 6 sections required):
@@ -258,16 +360,7 @@ ${delegationTable}
6. CONTEXT: File paths, existing patterns, constraints
\`\`\`
<dependency_checks>
Before taking an action, check whether prerequisite discovery, lookup, or retrieval steps are required.
Do not skip prerequisites just because the intended final action seems obvious.
If the task depends on the output of a prior step, resolve that dependency first.
</dependency_checks>
After delegation completes, verify:
- Does the result work as expected?
- Does it follow existing codebase patterns?
- Did the agent follow MUST DO and MUST NOT DO?
Post-delegation: delegation never substitutes for verification. Always run \`<verification_loop>\` on delegated results.
### Session continuity
@@ -278,76 +371,55 @@ Every \`task()\` returns a session_id. Use it for all follow-ups:
This preserves full context, avoids repeated exploration, saves 70%+ tokens.
### Code changes:
- Match existing patterns in disciplined codebases
- Propose approach first in chaotic codebases
- Never suppress type errors (\`as any\`, \`@ts-ignore\`, \`@ts-expect-error\`)
- Never commit unless explicitly requested
- Bugfix rule: fix minimally. Never refactor while fixing.
</implementation>
${oracleSection ? `### Oracle
<verification_loop>
Before finalizing any task:
- Correctness: does the output satisfy every requirement?
- Grounding: are claims backed by actual file contents or tool outputs, not memory?
- Evidence: run \`lsp_diagnostics\` on all changed files. Actually clean, not "probably clean."
- Tests: if they exist, run them. Actually pass, not "should pass."
- Delegation: if you delegated, read every file the subagent touched. Don't trust claims.
${oracleSection}` : ""}
</delegation>`;
A task is complete when:
- All planned todo items are marked done
- Diagnostics are clean on changed files
- Build passes (if applicable)
- User's original request is fully addressed
const styleBlock = `<style>
## Tone
If verification fails: fix issues caused by your changes. Do not fix pre-existing issues unless asked.
</verification_loop>
<failure_recovery>
When fixes fail:
1. Fix root causes, not symptoms.
2. Re-verify after every attempt.
3. Never make random changes hoping something works.
After 3 consecutive failures:
1. Stop all edits.
2. Revert to last known working state.
3. Document what was attempted.
4. Consult Oracle with full failure context.
5. If Oracle can't resolve → ask the user.
Never leave code in a broken state. Never delete failing tests to "pass."
</failure_recovery>
${oracleSection}
${taskManagementSection}
<style>
Write in complete, natural sentences. Avoid sentence fragments, bullet-only responses, and terse shorthand.
Before taking action on a non-trivial request, briefly explain how you plan to deliver the result. This gives the user a chance to course-correct early and builds trust in your approach. Keep this explanation to two or three sentences — enough to be clear, not so much that it delays progress.
Technical explanations should feel like a knowledgeable colleague walking you through something, not a spec sheet. Use plain language where possible, and when technical terms are necessary, make the surrounding context do the explanatory work.
When you encounter something worth commenting on — a tradeoff, a pattern choice, a potential issue — explain it clearly rather than suggesting alternatives. Instead of "You could try X" or "Should I do Y?", explain why something works the way it does and what the implications are. The user benefits more from understanding than from a menu of options.
When you encounter something worth commenting on — a tradeoff, a pattern choice, a potential issue — explain why something works the way it does and what the implications are. The user benefits more from understanding than from a menu of options.
Stay kind and approachable. Technical explanations should feel like a knowledgeable colleague walking you through something, not a spec sheet. Use plain language where possible, and when technical terms are necessary, make the surrounding context do the explanatory work.
Stay kind and approachable. Be concise in volume but generous in clarity. Every sentence should carry meaning. Skip empty preambles ("Great question!", "Sure thing!"), but do not skip context that helps the user follow your reasoning.
Be concise in volume but generous in clarity. Every sentence should carry meaning. Skip empty preambles ("Great question!", "Sure thing!"), but do not skip context that helps the user follow your reasoning.
If the user's approach has a problem, explain the concern directly and clearly, then describe the alternative you recommend and why it is better. Frame it as an explanation of what you found, not as a suggestion.
If the user's approach has a problem, explain the concern directly and clearly, then describe the alternative you recommend and why it is better. Do not frame this as a suggestion — frame it as an explanation of what you found.
</style>
## Output
<constraints>
${hardBlocks}
<output_contract>
- Default: 3-6 sentences or ≤5 bullets
- Simple yes/no: ≤2 sentences
- Complex multi-file: 1 overview paragraph + ≤5 tagged bullets (What, Where, Risks, Next, Open)
- Before taking action on a non-trivial request, briefly explain your plan in 2-3 sentences.
</output_contract>
${antiPatterns}
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request back to them.
- Do not shorten so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
</style>`;
Soft guidelines:
- Prefer existing libraries over new dependencies
- Prefer small, focused changes over large refactors
- When uncertain about scope, ask
</constraints>
`;
return `${identityBlock}
${constraintsBlock}
${intentBlock}
${exploreBlock}
${executionLoopBlock}
${delegationBlock}
${tasksSection}
${styleBlock}`;
}
export { categorizeTools };
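The refactor above replaces one monolithic template literal with named block constants joined in the final return. A minimal sketch of that composition pattern (placeholder block bodies only — not the real prompt text):

```typescript
// Minimal sketch of the named-block composition pattern; content is placeholder.
const identityBlock = `<identity>
Role and orchestrator bias go here.
</identity>`;

const constraintsBlock = `<constraints>
Hard blocks and anti-patterns go here.
</constraints>`;

// Joining named blocks with newlines mirrors the file's final return statement.
function composePrompt(...blocks: string[]): string {
  return blocks.join("\n");
}

const prompt = composePrompt(identityBlock, constraintsBlock);
```

Extracting each XML-tagged block into its own constant keeps every block independently editable while the return statement fixes the ordering.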


@@ -12,9 +12,9 @@ describe("isGpt5_4Model", () => {
test("does not match other GPT models", () => {
expect(isGpt5_4Model("openai/gpt-5.3-codex")).toBe(false);
expect(isGpt5_4Model("openai/gpt-5.2")).toBe(false);
expect(isGpt5_4Model("openai/gpt-5.1")).toBe(false);
expect(isGpt5_4Model("openai/gpt-4o")).toBe(false);
expect(isGpt5_4Model("github-copilot/gpt-5.2")).toBe(false);
expect(isGpt5_4Model("github-copilot/gpt-4o")).toBe(false);
});
test("does not match non-GPT models", () => {
@@ -26,7 +26,7 @@ describe("isGpt5_4Model", () => {
describe("isGptModel", () => {
test("standard openai provider gpt models", () => {
expect(isGptModel("openai/gpt-5.2")).toBe(true);
expect(isGptModel("openai/gpt-5.4")).toBe(true);
expect(isGptModel("openai/gpt-4o")).toBe(true);
});
@@ -39,22 +39,22 @@ describe("isGptModel", () => {
});
test("github copilot gpt models", () => {
expect(isGptModel("github-copilot/gpt-5.2")).toBe(true);
expect(isGptModel("github-copilot/gpt-5.4")).toBe(true);
expect(isGptModel("github-copilot/gpt-4o")).toBe(true);
});
test("litellm proxied gpt models", () => {
expect(isGptModel("litellm/gpt-5.2")).toBe(true);
expect(isGptModel("litellm/gpt-5.4")).toBe(true);
expect(isGptModel("litellm/gpt-4o")).toBe(true);
});
test("other proxied gpt models", () => {
expect(isGptModel("ollama/gpt-4o")).toBe(true);
expect(isGptModel("custom-provider/gpt-5.2")).toBe(true);
expect(isGptModel("custom-provider/gpt-5.4")).toBe(true);
});
test("venice provider gpt models", () => {
expect(isGptModel("venice/gpt-5.2")).toBe(true);
expect(isGptModel("venice/gpt-5.4")).toBe(true);
expect(isGptModel("venice/gpt-4o")).toBe(true);
});
@@ -108,7 +108,7 @@ describe("isGeminiModel", () => {
});
test("#given gpt models #then returns false", () => {
expect(isGeminiModel("openai/gpt-5.2")).toBe(false);
expect(isGeminiModel("openai/gpt-5.4")).toBe(false);
expect(isGeminiModel("openai/o3-mini")).toBe(false);
expect(isGeminiModel("litellm/gpt-4o")).toBe(false);
});
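The expectations above pin down the detection behavior: any `provider/gpt-*` id counts as a GPT model, while the 5.4 check must accept `gpt-5.4` under any provider prefix and reject `gpt-5.2` and `gpt-5.3-codex`. A hypothetical reimplementation consistent with those tests (inferred from the assertions, not the actual source):

```typescript
// Hypothetical predicates inferred from the test expectations above.
function isGptModel(model: string): boolean {
  // Any provider prefix (openai/, litellm/, venice/, ...) followed by a gpt-* id.
  return /(^|\/)gpt-/.test(model);
}

function isGpt5_4Model(model: string): boolean {
  // Match exactly the 5.4 minor version; reject 5.2, 5.3-codex, 4o, etc.
  return /(^|\/)gpt-5\.4($|[^0-9])/.test(model);
}
```

Anchoring on `(^|\/)` is what makes the checks provider-agnostic, which is what the litellm/venice/custom-provider cases exercise.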


@@ -39,14 +39,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("Sisyphus with GPT model override has reasoningEffort, no thinking", async () => {
// #given
const overrides = {
sisyphus: { model: "github-copilot/gpt-5.2" },
sisyphus: { model: "github-copilot/gpt-5.4" },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2")
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.4")
expect(agents.sisyphus.reasoningEffort).toBe("medium")
expect(agents.sisyphus.thinking).toBeUndefined()
})
@@ -54,9 +54,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("Atlas uses uiSelectedModel", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
try {
// #when
@@ -75,7 +75,7 @@ describe("createBuiltinAgents with model overrides", () => {
// #then
expect(agents.atlas).toBeDefined()
expect(agents.atlas.model).toBe("openai/gpt-5.2")
expect(agents.atlas.model).toBe("openai/gpt-5.4")
} finally {
fetchSpy.mockRestore()
}
@@ -84,9 +84,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("user config model takes priority over uiSelectedModel for sisyphus", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
const overrides = {
sisyphus: { model: "google/antigravity-claude-opus-4-5-thinking" },
}
@@ -117,9 +117,9 @@ describe("createBuiltinAgents with model overrides", () => {
test("user config model takes priority over uiSelectedModel for atlas", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2", "anthropic/claude-sonnet-4-6"])
new Set(["openai/gpt-5.4", "anthropic/claude-sonnet-4-6"])
)
const uiSelectedModel = "openai/gpt-5.2"
const uiSelectedModel = "openai/gpt-5.4"
const overrides = {
atlas: { model: "google/antigravity-claude-opus-4-5-thinking" },
}
@@ -173,8 +173,8 @@ describe("createBuiltinAgents with model overrides", () => {
// #when
const agents = await createBuiltinAgents([], {}, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then - oracle resolves via connected cache fallback to openai/gpt-5.2 (not system default)
expect(agents.oracle.model).toBe("openai/gpt-5.2")
// #then - oracle resolves via connected cache fallback to openai/gpt-5.4 (not system default)
expect(agents.oracle.model).toBe("openai/gpt-5.4")
expect(agents.oracle.reasoningEffort).toBe("medium")
expect(agents.oracle.thinking).toBeUndefined()
cacheSpy.mockRestore?.()
@@ -196,14 +196,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("Oracle with GPT model override has reasoningEffort, no thinking", async () => {
// #given
const overrides = {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.oracle.model).toBe("openai/gpt-5.2")
expect(agents.oracle.model).toBe("openai/gpt-5.4")
expect(agents.oracle.reasoningEffort).toBe("medium")
expect(agents.oracle.textVerbosity).toBe("high")
expect(agents.oracle.thinking).toBeUndefined()
@@ -228,14 +228,14 @@ describe("createBuiltinAgents with model overrides", () => {
test("non-model overrides are still applied after factory rebuild", async () => {
// #given
const overrides = {
sisyphus: { model: "github-copilot/gpt-5.2", temperature: 0.5 },
sisyphus: { model: "github-copilot/gpt-5.4", temperature: 0.5 },
}
// #when
const agents = await createBuiltinAgents([], overrides, undefined, TEST_DEFAULT_MODEL, undefined, undefined, [], undefined, undefined)
// #then
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.2")
expect(agents.sisyphus.model).toBe("github-copilot/gpt-5.4")
expect(agents.sisyphus.temperature).toBe(0.5)
})
@@ -261,7 +261,7 @@ describe("createBuiltinAgents with model overrides", () => {
"opencode/kimi-k2.5-free",
"zai-coding-plan/glm-5",
"opencode/big-pickle",
"openai/gpt-5.2",
"openai/gpt-5.4",
])
)
@@ -298,7 +298,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes hidden custom agents from orchestrator prompts", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -334,7 +334,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes disabled custom agents from orchestrator prompts", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -370,7 +370,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("excludes custom agents when disabledAgents contains their name (case-insensitive)", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const disabledAgents = ["ReSeArChEr"]
@@ -406,7 +406,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("deduplicates custom agents case-insensitively", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -438,7 +438,7 @@ describe("createBuiltinAgents with model overrides", () => {
test("sanitizes custom agent strings for markdown tables", async () => {
// #given
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
const customAgentSummaries = [
@@ -479,7 +479,7 @@ describe("createBuiltinAgents without systemDefaultModel", () => {
// #then - connected cache enables model resolution despite no systemDefaultModel
expect(agents.oracle).toBeDefined()
expect(agents.oracle.model).toBe("openai/gpt-5.2")
expect(agents.oracle.model).toBe("openai/gpt-5.4")
cacheSpy.mockRestore?.()
})
@@ -787,7 +787,7 @@ describe("Atlas is unaffected by environment context toggle", () => {
beforeEach(() => {
fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.2"])
new Set(["anthropic/claude-opus-4-6", "openai/gpt-5.4"])
)
})
@@ -891,9 +891,9 @@ describe("createBuiltinAgents with requiresAnyModel gating (sisyphus)", () => {
})
test("sisyphus is not created when no fallback model is available and provider not connected", async () => {
// #given - only openai/gpt-5.2 available, not in sisyphus fallback chain
// #given - only venice/deepseek-v3.2 available, not in sisyphus fallback chain
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2"])
new Set(["venice/deepseek-v3.2"])
)
const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue([])
@@ -913,7 +913,7 @@ describe("createBuiltinAgents with requiresAnyModel gating (sisyphus)", () => {
// #given - user configures a model from a plugin provider (like antigravity)
// that is NOT in the availableModels cache and NOT in the fallback chain
const fetchSpy = spyOn(shared, "fetchAvailableModels").mockResolvedValue(
new Set(["openai/gpt-5.2"])
new Set(["openai/gpt-5.4"])
)
const cacheSpy = spyOn(connectedProvidersCache, "readConnectedProvidersCache").mockReturnValue(
["openai"]
@@ -1021,7 +1021,7 @@ describe("buildAgent with category and skills", () => {
const categories = {
"custom-category": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "xhigh",
},
}
@@ -1030,7 +1030,7 @@ describe("buildAgent with category and skills", () => {
const agent = buildAgent(source["test-agent"], TEST_MODEL, categories)
// #then
expect(agent.model).toBe("openai/gpt-5.2")
expect(agent.model).toBe("openai/gpt-5.4")
expect(agent.variant).toBe("xhigh")
})
@@ -1247,7 +1247,7 @@ describe("override.category expansion in createBuiltinAgents", () => {
// #given - custom category has reasoningEffort=xhigh, direct override says "low"
const categories = {
"test-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "xhigh" as const,
},
}
@@ -1267,7 +1267,7 @@ describe("override.category expansion in createBuiltinAgents", () => {
// #given - custom category has reasoningEffort, no direct reasoningEffort in override
const categories = {
"reasoning-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "high" as const,
},
}


@@ -205,7 +205,7 @@ exports[`generateModelConfig single native provider uses OpenAI models when only
"model": "opencode/glm-4.7-free",
},
"metis": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"momus": {
@@ -213,17 +213,21 @@ exports[`generateModelConfig single native provider uses OpenAI models when only
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
"model": "openai/gpt-5.4",
"variant": "high",
},
"sisyphus": {
"model": "openai/gpt-5.4",
"variant": "medium",
},
},
"categories": {
"deep": {
@@ -274,7 +278,7 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa
"model": "opencode/glm-4.7-free",
},
"metis": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"momus": {
@@ -282,17 +286,21 @@ exports[`generateModelConfig single native provider uses OpenAI models with isMa
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
"model": "openai/gpt-5.4",
"variant": "high",
},
"sisyphus": {
"model": "openai/gpt-5.4",
"variant": "medium",
},
},
"categories": {
"deep": {
@@ -472,11 +480,11 @@ exports[`generateModelConfig all native providers uses preferred models from fal
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -547,11 +555,11 @@ exports[`generateModelConfig all native providers uses preferred models with isM
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -623,11 +631,11 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models when on
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -698,11 +706,11 @@ exports[`generateModelConfig fallback providers uses OpenCode Zen models with is
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -773,7 +781,7 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models when
"model": "github-copilot/gemini-3-flash-preview",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -839,7 +847,7 @@ exports[`generateModelConfig fallback providers uses GitHub Copilot models with
"model": "github-copilot/gemini-3-flash-preview",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1017,11 +1025,11 @@ exports[`generateModelConfig mixed provider scenarios uses Claude + OpenCode Zen
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "opencode/gpt-5.2",
"model": "opencode/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1092,11 +1100,11 @@ exports[`generateModelConfig mixed provider scenarios uses OpenAI + Copilot comb
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1294,11 +1302,11 @@ exports[`generateModelConfig mixed provider scenarios uses all fallback provider
"variant": "xhigh",
},
"multimodal-looker": {
"model": "opencode/gpt-5.3-codex",
"model": "opencode/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "github-copilot/gpt-5.2",
"model": "github-copilot/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1369,11 +1377,11 @@ exports[`generateModelConfig mixed provider scenarios uses all providers togethe
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {
@@ -1444,11 +1452,11 @@ exports[`generateModelConfig mixed provider scenarios uses all providers with is
"variant": "xhigh",
},
"multimodal-looker": {
"model": "openai/gpt-5.3-codex",
"model": "openai/gpt-5.4",
"variant": "medium",
},
"oracle": {
"model": "openai/gpt-5.2",
"model": "openai/gpt-5.4",
"variant": "high",
},
"prometheus": {


@@ -40,7 +40,7 @@ Examples:
Model Providers (Priority: Native > Copilot > OpenCode Zen > Z.ai > Kimi):
Claude Native anthropic/ models (Opus, Sonnet, Haiku)
OpenAI Native openai/ models (GPT-5.2 for Oracle)
OpenAI Native openai/ models (GPT-5.4 for Oracle)
Gemini Native google/ models (Gemini 3 Pro, Flash)
Copilot github-copilot/ models (fallback)
OpenCode Zen opencode/ models (opencode/claude-opus-4-6, etc.)


@@ -249,12 +249,13 @@ describe("generateOmoConfig - model fallback system", () => {
// #when generating config
const result = generateOmoConfig(config)
// #then Sisyphus is omitted (requires all fallback providers)
expect((result.agents as Record<string, { model: string }>).sisyphus).toBeUndefined()
// #then Sisyphus resolves to gpt-5.4 medium (openai is now in sisyphus chain)
expect((result.agents as Record<string, { model: string; variant?: string }>).sisyphus.model).toBe("openai/gpt-5.4")
expect((result.agents as Record<string, { model: string; variant?: string }>).sisyphus.variant).toBe("medium")
// #then Oracle should use native OpenAI (first fallback entry)
expect((result.agents as Record<string, { model: string }>).oracle.model).toBe("openai/gpt-5.2")
// #then multimodal-looker should use native OpenAI (first fallback entry is gpt-5.3-codex)
expect((result.agents as Record<string, { model: string }>)["multimodal-looker"].model).toBe("openai/gpt-5.3-codex")
expect((result.agents as Record<string, { model: string }>).oracle.model).toBe("openai/gpt-5.4")
// #then multimodal-looker should use native OpenAI (first fallback entry is gpt-5.4)
expect((result.agents as Record<string, { model: string }>)["multimodal-looker"].model).toBe("openai/gpt-5.4")
})
test("uses haiku for explore when Claude max20", () => {


@@ -61,7 +61,7 @@ describe("model-resolution check", () => {
// given: User has override for visual-engineering category
const mockConfig = {
categories: {
"visual-engineering": { model: "openai/gpt-5.2" },
"visual-engineering": { model: "openai/gpt-5.4" },
},
}
@@ -70,8 +70,8 @@ describe("model-resolution check", () => {
// then: visual-engineering should show the override
const visual = info.categories.find((c) => c.name === "visual-engineering")
expect(visual).toBeDefined()
expect(visual!.userOverride).toBe("openai/gpt-5.2")
expect(visual!.effectiveResolution).toBe("User override: openai/gpt-5.2")
expect(visual!.userOverride).toBe("openai/gpt-5.4")
expect(visual!.effectiveResolution).toBe("User override: openai/gpt-5.4")
})
it("shows provider fallback when no override exists", async () => {
@@ -96,7 +96,7 @@ describe("model-resolution check", () => {
//#given User has model with variant override for oracle agent
const mockConfig = {
agents: {
oracle: { model: "openai/gpt-5.2", variant: "xhigh" },
oracle: { model: "openai/gpt-5.4", variant: "xhigh" },
},
}
@@ -106,7 +106,7 @@ describe("model-resolution check", () => {
//#then Oracle should have userVariant set
const oracle = info.agents.find((a) => a.name === "oracle")
expect(oracle).toBeDefined()
expect(oracle!.userOverride).toBe("openai/gpt-5.2")
expect(oracle!.userOverride).toBe("openai/gpt-5.4")
expect(oracle!.userVariant).toBe("xhigh")
})


@@ -32,7 +32,7 @@ export function formatConfigSummary(config: InstallConfig): string {
const claudeDetail = config.hasClaude ? (config.isMax20 ? "max20" : "standard") : undefined
lines.push(formatProvider("Claude", config.hasClaude, claudeDetail))
lines.push(formatProvider("OpenAI/ChatGPT", config.hasOpenAI, "GPT-5.2 for Oracle"))
lines.push(formatProvider("OpenAI/ChatGPT", config.hasOpenAI, "GPT-5.4 for Oracle"))
lines.push(formatProvider("Gemini", config.hasGemini))
lines.push(formatProvider("GitHub Copilot", config.hasCopilot, "fallback"))
lines.push(formatProvider("OpenCode Zen", config.hasOpencodeZen, "opencode/ models"))


@@ -13,6 +13,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
variant: "max",
},
{ providers: ["kimi-for-coding"], model: "k2p5" },
{ providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.4", variant: "medium" },
{ providers: ["zai-coding-plan", "opencode"], model: "glm-5" },
],
requiresAnyModel: true,
@@ -31,7 +32,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
fallbackChain: [
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
variant: "high",
},
{
@@ -67,7 +68,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
fallbackChain: [
{
providers: ["openai", "opencode"],
model: "gpt-5.3-codex",
model: "gpt-5.4",
variant: "medium",
},
{ providers: ["kimi-for-coding"], model: "k2p5" },
@@ -108,7 +109,7 @@ export const CLI_AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
{ providers: ["kimi-for-coding"], model: "k2p5" },
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
variant: "high",
},
{
@@ -224,7 +225,7 @@ export const CLI_CATEGORY_MODEL_REQUIREMENTS: Record<string, ModelRequirement> =
},
{
providers: ["openai", "github-copilot", "opencode"],
model: "gpt-5.2",
model: "gpt-5.4",
},
],
requiresModel: "gemini-3.1-pro",
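The new chain entry above is what lets OpenAI-only users keep Sisyphus: resolution walks the fallback chain in priority order and takes the first entry backed by a connected provider. A minimal standalone sketch of that selection logic (the types, function name, and chain literal here are illustrative, not the actual implementation):

```typescript
interface FallbackEntry {
  providers: string[]
  model: string
  variant?: string
}

// Walk the chain in priority order and return the first entry whose
// provider is connected, prefixed "provider/model" as agent configs expect.
function resolveModel(
  chain: FallbackEntry[],
  connected: Set<string>,
): { model: string; variant?: string } | undefined {
  for (const entry of chain) {
    const provider = entry.providers.find((p) => connected.has(p))
    if (provider) {
      return { model: `${provider}/${entry.model}`, variant: entry.variant }
    }
  }
  return undefined // agent is omitted when nothing in the chain matches
}

// Sketch of the sisyphus CLI chain after this change.
const sisyphusChain: FallbackEntry[] = [
  { providers: ["anthropic"], model: "claude-opus-4-6", variant: "max" },
  { providers: ["kimi-for-coding"], model: "k2p5" },
  { providers: ["openai", "github-copilot", "opencode"], model: "gpt-5.4", variant: "medium" },
  { providers: ["zai-coding-plan", "opencode"], model: "glm-5" },
]

// An OpenAI-only user now lands on gpt-5.4 medium instead of no agent at all.
resolveModel(sisyphusChain, new Set(["openai"]))
// → { model: "openai/gpt-5.4", variant: "medium" }
```

This also explains the updated tests: a provider outside every entry (e.g. `venice`) still yields no agent, which is why the "not in sisyphus fallback chain" fixture had to move off `openai/gpt-5.2`.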


@@ -396,7 +396,7 @@ describe("generateModelConfig", () => {
expect(result.agents?.sisyphus?.model).toBe("anthropic/claude-opus-4-6")
})
test("Sisyphus is omitted when no fallback provider is available (OpenAI not in chain)", () => {
test("Sisyphus resolves to gpt-5.4 medium when only OpenAI is available", () => {
// #given
const config = createConfig({ hasOpenAI: true })
@@ -404,7 +404,8 @@ describe("generateModelConfig", () => {
const result = generateModelConfig(config)
// #then
expect(result.agents?.sisyphus).toBeUndefined()
expect(result.agents?.sisyphus?.model).toBe("openai/gpt-5.4")
expect(result.agents?.sisyphus?.variant).toBe("medium")
})
})


@@ -44,7 +44,7 @@ export async function promptInstallConfig(detected: DetectedConfig): Promise<Ins
message: "Do you have an OpenAI/ChatGPT Plus subscription?",
options: [
{ value: "no", label: "No", hint: "Oracle will use fallback models" },
{ value: "yes", label: "Yes", hint: "GPT-5.2 for Oracle (high-IQ debugging)" },
{ value: "yes", label: "Yes", hint: "GPT-5.4 for Oracle (high-IQ debugging)" },
],
initialValue: initial.openai,
})
@@ -74,7 +74,7 @@ export async function promptInstallConfig(detected: DetectedConfig): Promise<Ins
message: "Do you have access to OpenCode Zen (opencode/ models)?",
options: [
{ value: "no", label: "No", hint: "Will use other configured providers" },
{ value: "yes", label: "Yes", hint: "opencode/claude-opus-4-6, opencode/gpt-5.2, etc." },
{ value: "yes", label: "Yes", hint: "opencode/claude-opus-4-6, opencode/gpt-5.4, etc." },
],
initialValue: initial.opencodeZen,
})


@@ -266,7 +266,7 @@ describe("AgentOverrideConfigSchema", () => {
describe("backward compatibility", () => {
test("still accepts model field (deprecated)", () => {
// given
const config = { model: "openai/gpt-5.2" }
const config = { model: "openai/gpt-5.4" }
// when
const result = AgentOverrideConfigSchema.safeParse(config)
@@ -274,14 +274,14 @@ describe("AgentOverrideConfigSchema", () => {
// then
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.model).toBe("openai/gpt-5.2")
expect(result.data.model).toBe("openai/gpt-5.4")
}
})
test("accepts both model and category (deprecated usage)", () => {
// given - category should take precedence at runtime, but both should validate
const config = {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
category: "ultrabrain"
}
@@ -291,7 +291,7 @@ describe("AgentOverrideConfigSchema", () => {
// then
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.model).toBe("openai/gpt-5.2")
expect(result.data.model).toBe("openai/gpt-5.4")
expect(result.data.category).toBe("ultrabrain")
}
})
@@ -343,7 +343,7 @@ describe("AgentOverrideConfigSchema", () => {
describe("CategoryConfigSchema", () => {
test("accepts variant as optional string", () => {
// given
const config = { model: "openai/gpt-5.2", variant: "xhigh" }
const config = { model: "openai/gpt-5.4", variant: "xhigh" }
// when
const result = CategoryConfigSchema.safeParse(config)
@@ -371,7 +371,7 @@ describe("CategoryConfigSchema", () => {
test("rejects non-string variant", () => {
// given
const config = { model: "openai/gpt-5.2", variant: 123 }
const config = { model: "openai/gpt-5.4", variant: 123 }
// when
const result = CategoryConfigSchema.safeParse(config)
@@ -413,7 +413,7 @@ describe("Sisyphus-Junior agent override", () => {
const config = {
agents: {
"sisyphus-junior": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.2,
},
},
@@ -426,7 +426,7 @@ describe("Sisyphus-Junior agent override", () => {
expect(result.success).toBe(true)
if (result.success) {
expect(result.data.agents?.["sisyphus-junior"]).toBeDefined()
expect(result.data.agents?.["sisyphus-junior"]?.model).toBe("openai/gpt-5.2")
expect(result.data.agents?.["sisyphus-junior"]?.model).toBe("openai/gpt-5.4")
expect(result.data.agents?.["sisyphus-junior"]?.temperature).toBe(0.2)
}
})


@@ -224,6 +224,12 @@ function stubNotifyParentSession(manager: BackgroundManager): void {
;(manager as unknown as { notifyParentSession: () => Promise<void> }).notifyParentSession = async () => {}
}
async function flushBackgroundNotifications(): Promise<void> {
for (let i = 0; i < 6; i++) {
await Promise.resolve()
}
}
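The helper drains queued promise chains without real timers: each `await Promise.resolve()` yields one microtask turn, so six turns is enough to settle the fire-and-forget notification chains the tests create. A self-contained sketch of the same idea (names here are illustrative):

```typescript
// Await a fixed number of microtask turns so promise chains queued by
// fire-and-forget calls settle before assertions run. The count only
// needs to exceed the longest .then() chain the code under test builds.
async function flushMicrotasks(turns = 6): Promise<void> {
  for (let i = 0; i < turns; i++) {
    await Promise.resolve()
  }
}

// Demo: a two-step .then() chain settles within the flushed turns.
async function demo(): Promise<boolean> {
  let settled = false
  void Promise.resolve()
    .then(() => {})
    .then(() => {
      settled = true
    })
  await flushMicrotasks()
  return settled
}
```

Flushing microtasks rather than using `setTimeout` keeps the tests deterministic and fast, at the cost of the hard-coded turn count having to track the deepest chain.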
function createToastRemoveTaskTracker(): { removeTaskCalls: string[]; resetToastManager: () => void } {
_resetTaskToastManagerForTesting()
const toastManager = initTaskToastManager({
@@ -1306,11 +1312,20 @@ describe("BackgroundManager.tryCompleteTask", () => {
expect(abortedSessionIDs).toEqual(["session-1"])
})
test("should clean pendingByParent even when notifyParentSession throws", async () => {
test("should clean pendingByParent even when promptAsync notification fails", async () => {
// given
;(manager as unknown as { notifyParentSession: () => Promise<void> }).notifyParentSession = async () => {
throw new Error("notify failed")
const client = {
session: {
prompt: async () => ({}),
promptAsync: async () => {
throw new Error("notify failed")
},
abort: async () => ({}),
messages: async () => ({ data: [] }),
},
}
manager.shutdown()
manager = new BackgroundManager({ client, directory: tmpdir() } as unknown as PluginInput)
const task: BackgroundTask = {
id: "task-pending-cleanup",
@@ -1424,7 +1439,7 @@ describe("BackgroundManager.tryCompleteTask", () => {
// then
expect(rejectedCount).toBe(0)
expect(promptBodies.length).toBe(2)
expect(promptBodies.some((b) => b.noReply === false)).toBe(true)
expect(promptBodies.filter((body) => body.noReply === false)).toHaveLength(1)
})
})
@@ -1932,7 +1947,6 @@ describe("BackgroundManager - Non-blocking Queue Integration", () => {
test("should cancel running task and release concurrency", async () => {
// given
const manager = createBackgroundManager()
stubNotifyParentSession(manager)
const concurrencyManager = getConcurrencyManager(manager)
const concurrencyKey = "test-provider/test-model"
@@ -2078,7 +2092,7 @@ describe("BackgroundManager - Non-blocking Queue Integration", () => {
description: "Task 2",
prompt: "Do something else",
agent: "test-agent",
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-5.4" },
parentSessionID: "parent-session",
parentMessageID: "parent-message",
}
@@ -2890,7 +2904,7 @@ describe("BackgroundManager.shutdown session abort", () => {
})
describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
test("should cancel descendant tasks when parent session is deleted", () => {
test("should cancel descendant tasks and keep them until delayed cleanup", async () => {
// given
const manager = createBackgroundManager()
const parentSessionID = "session-parent"
@@ -2937,21 +2951,26 @@ describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
properties: { info: { id: parentSessionID } },
})
await flushBackgroundNotifications()
// then
expect(taskMap.has(childTask.id)).toBe(false)
expect(taskMap.has(siblingTask.id)).toBe(false)
expect(taskMap.has(grandchildTask.id)).toBe(false)
expect(taskMap.has(childTask.id)).toBe(true)
expect(taskMap.has(siblingTask.id)).toBe(true)
expect(taskMap.has(grandchildTask.id)).toBe(true)
expect(taskMap.has(unrelatedTask.id)).toBe(true)
expect(childTask.status).toBe("cancelled")
expect(siblingTask.status).toBe("cancelled")
expect(grandchildTask.status).toBe("cancelled")
expect(pendingByParent.get(parentSessionID)).toBeUndefined()
expect(pendingByParent.get("session-child")).toBeUndefined()
expect(getCompletionTimers(manager).has(childTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(siblingTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(grandchildTask.id)).toBe(true)
manager.shutdown()
})
test("should remove tasks from toast manager when session is deleted", () => {
test("should remove cancelled tasks from toast manager while preserving delayed cleanup", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -2980,9 +2999,13 @@ describe("BackgroundManager.handleEvent - session.deleted cascade", () => {
properties: { info: { id: parentSessionID } },
})
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(childTask.id)
expect(removeTaskCalls).toContain(grandchildTask.id)
expect(getCompletionTimers(manager).has(childTask.id)).toBe(true)
expect(getCompletionTimers(manager).has(grandchildTask.id)).toBe(true)
manager.shutdown()
resetToastManager()
@@ -3045,7 +3068,7 @@ describe("BackgroundManager.handleEvent - session.error", () => {
return task
}
test("sets task to error, releases concurrency, and cleans up", async () => {
test("sets task to error, releases concurrency, and keeps it until delayed cleanup", async () => {
//#given
const manager = createBackgroundManager()
const concurrencyManager = getConcurrencyManager(manager)
@@ -3078,18 +3101,21 @@ describe("BackgroundManager.handleEvent - session.error", () => {
},
})
await flushBackgroundNotifications()
//#then
expect(task.status).toBe("error")
expect(task.error).toBe("Model not found: kimi-for-coding/k2p5.")
expect(task.completedAt).toBeInstanceOf(Date)
expect(concurrencyManager.getCount(concurrencyKey)).toBe(0)
expect(getTaskMap(manager).has(task.id)).toBe(false)
expect(getTaskMap(manager).has(task.id)).toBe(true)
expect(getPendingByParent(manager).get(task.parentSessionID)).toBeUndefined()
expect(getCompletionTimers(manager).has(task.id)).toBe(true)
manager.shutdown()
})
test("removes errored task from toast manager", () => {
test("should remove errored task from toast manager while preserving delayed cleanup", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -3111,8 +3137,11 @@ describe("BackgroundManager.handleEvent - session.error", () => {
},
})
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(task.id)
expect(getCompletionTimers(manager).has(task.id)).toBe(true)
manager.shutdown()
resetToastManager()
@@ -3393,7 +3422,7 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
manager.shutdown()
})
test("removes stale task from toast manager", () => {
test("removes stale task from toast manager", async () => {
//#given
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const manager = createBackgroundManager()
@@ -3408,6 +3437,7 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
//#when
pruneStaleTasksAndNotificationsForTest(manager)
await flushBackgroundNotifications()
//#then
expect(removeTaskCalls).toContain(staleTask.id)
@@ -3415,6 +3445,53 @@ describe("BackgroundManager.pruneStaleTasksAndNotifications - removes pruned tas
manager.shutdown()
resetToastManager()
})
test("keeps stale task until notification cleanup after notifying parent", async () => {
//#given
const notifications: string[] = []
const { removeTaskCalls, resetToastManager } = createToastRemoveTaskTracker()
const client = {
session: {
prompt: async () => ({}),
promptAsync: async (args: { path: { id: string }; body: Record<string, unknown> & { noReply?: boolean; parts?: unknown[] } }) => {
const firstPart = args.body.parts?.[0]
if (firstPart && typeof firstPart === "object" && "text" in firstPart && typeof firstPart.text === "string") {
notifications.push(firstPart.text)
}
return {}
},
abort: async () => ({}),
messages: async () => ({ data: [] }),
},
}
const manager = new BackgroundManager({ client, directory: tmpdir() } as unknown as PluginInput)
const staleTask = createMockTask({
id: "task-stale-notify-cleanup",
sessionID: "session-stale-notify-cleanup",
parentSessionID: "parent-stale-notify-cleanup",
status: "running",
startedAt: new Date(Date.now() - 31 * 60 * 1000),
})
getTaskMap(manager).set(staleTask.id, staleTask)
getPendingByParent(manager).set(staleTask.parentSessionID, new Set([staleTask.id]))
//#when
pruneStaleTasksAndNotificationsForTest(manager)
await flushBackgroundNotifications()
//#then
const retainedTask = getTaskMap(manager).get(staleTask.id)
expect(retainedTask?.status).toBe("error")
expect(getTaskMap(manager).has(staleTask.id)).toBe(true)
expect(notifications).toHaveLength(1)
expect(notifications[0]).toContain("[ALL BACKGROUND TASKS COMPLETE]")
expect(notifications[0]).toContain(staleTask.description)
expect(getCompletionTimers(manager).has(staleTask.id)).toBe(true)
expect(removeTaskCalls).toContain(staleTask.id)
manager.shutdown()
resetToastManager()
})
})
describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
@@ -3518,7 +3595,7 @@ describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
expect(completionTimers.size).toBe(0)
})
test("should cancel timer when task is deleted via session.deleted", () => {
test("should preserve cleanup timer when terminal task session is deleted", () => {
// given
const manager = createBackgroundManager()
const task: BackgroundTask = {
@@ -3547,7 +3624,7 @@ describe("BackgroundManager.completionTimers - Memory Leak Fix", () => {
})
// then
expect(completionTimers.has(task.id)).toBe(false)
expect(completionTimers.has(task.id)).toBe(true)
manager.shutdown()
})


@@ -390,7 +390,6 @@ export class BackgroundManager {
}).catch(() => {})
this.markForNotification(existingTask)
this.cleanupPendingByParent(existingTask)
this.enqueueNotificationForParent(existingTask.parentSessionID, () => this.notifyParentSession(existingTask)).catch(err => {
log("[background-agent] Failed to notify on error:", err)
})
@@ -661,7 +660,6 @@ export class BackgroundManager {
}
this.markForNotification(existingTask)
this.cleanupPendingByParent(existingTask)
this.enqueueNotificationForParent(existingTask.parentSessionID, () => this.notifyParentSession(existingTask)).catch(err => {
log("[background-agent] Failed to notify on resume error:", err)
})
@@ -804,16 +802,14 @@ export class BackgroundManager {
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
this.tasks.delete(task.id)
this.clearNotificationsForTask(task.id)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(task.id)
}
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
SessionCategoryRegistry.remove(task.sessionID)
}
this.markForNotification(task)
this.enqueueNotificationForParent(task.parentSessionID, () => this.notifyParentSession(task)).catch(err => {
log("[background-agent] Error in notifyParentSession for errored task:", { taskId: task.id, error: err })
})
}
if (event.type === "session.deleted") {
@@ -834,47 +830,30 @@ export class BackgroundManager {
if (tasksToCancel.size === 0) return
const deletedSessionIDs = new Set<string>([sessionID])
for (const task of tasksToCancel.values()) {
if (task.sessionID) {
deletedSessionIDs.add(task.sessionID)
}
}
for (const task of tasksToCancel.values()) {
if (task.status === "running" || task.status === "pending") {
void this.cancelTask(task.id, {
source: "session.deleted",
reason: "Session deleted",
skipNotification: true,
}).then(() => {
if (deletedSessionIDs.has(task.parentSessionID)) {
this.pendingNotifications.delete(task.parentSessionID)
}
}).catch(err => {
if (deletedSessionIDs.has(task.parentSessionID)) {
this.pendingNotifications.delete(task.parentSessionID)
}
log("[background-agent] Failed to cancel task on session.deleted:", { taskId: task.id, error: err })
})
}
const existingTimer = this.completionTimers.get(task.id)
if (existingTimer) {
clearTimeout(existingTimer)
this.completionTimers.delete(task.id)
}
const idleTimer = this.idleDeferralTimers.get(task.id)
if (idleTimer) {
clearTimeout(idleTimer)
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
this.tasks.delete(task.id)
this.clearNotificationsForTask(task.id)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(task.id)
}
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
}
}
for (const task of tasksToCancel.values()) {
if (task.parentSessionID) {
this.pendingNotifications.delete(task.parentSessionID)
}
}
SessionCategoryRegistry.remove(sessionID)
}
@@ -1094,8 +1073,6 @@ export class BackgroundManager {
this.idleDeferralTimers.delete(task.id)
}
this.cleanupPendingByParent(task)
if (abortSession && task.sessionID) {
this.client.session.abort({
path: { id: task.sessionID },
@@ -1202,9 +1179,6 @@ export class BackgroundManager {
this.markForNotification(task)
// Ensure pending tracking is cleaned up even if notification fails
this.cleanupPendingByParent(task)
const idleTimer = this.idleDeferralTimers.get(task.id)
if (idleTimer) {
clearTimeout(idleTimer)
@@ -1260,7 +1234,10 @@ export class BackgroundManager {
this.pendingByParent.delete(task.parentSessionID)
}
} else {
allComplete = true
remainingCount = Array.from(this.tasks.values())
.filter(t => t.parentSessionID === task.parentSessionID && t.id !== task.id && (t.status === "running" || t.status === "pending"))
.length
allComplete = remainingCount === 0
}
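Completion is now decided by counting live siblings instead of assuming the current task was the last one, which is what makes the "[ALL BACKGROUND TASKS COMPLETE]" notification accurate when terminal tasks linger in the map. A hypothetical standalone form of that count:

```typescript
interface Sibling {
  id: string
  parentSessionID: string
  status: string
}

// A parent is "all complete" when no OTHER task of the same parent is
// still pending or running; terminal tasks retained for delayed cleanup
// no longer block the notification.
function remainingActive(tasks: Sibling[], current: Sibling): number {
  return tasks.filter(
    (t) =>
      t.parentSessionID === current.parentSessionID &&
      t.id !== current.id &&
      (t.status === "running" || t.status === "pending"),
  ).length
}
```

With this, `allComplete` is simply `remainingActive(...) === 0`.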
const completedTasks = allComplete
@@ -1268,7 +1245,13 @@ export class BackgroundManager {
.filter(t => t.parentSessionID === task.parentSessionID && t.status !== "running" && t.status !== "pending")
: []
const statusText = task.status === "completed" ? "COMPLETED" : task.status === "interrupt" ? "INTERRUPTED" : "CANCELLED"
const statusText = task.status === "completed"
? "COMPLETED"
: task.status === "interrupt"
? "INTERRUPTED"
: task.status === "error"
? "ERROR"
: "CANCELLED"
const errorInfo = task.error ? `\n**Error:** ${task.error}` : ""
let notification: string
@@ -1399,8 +1382,13 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
}
const timer = setTimeout(() => {
this.completionTimers.delete(taskId)
if (this.tasks.has(taskId)) {
const taskToRemove = this.tasks.get(taskId)
if (taskToRemove) {
this.clearNotificationsForTask(taskId)
if (taskToRemove.sessionID) {
subagentSessions.delete(taskToRemove.sessionID)
SessionCategoryRegistry.remove(taskToRemove.sessionID)
}
this.tasks.delete(taskId)
log("[background-agent] Removed completed task from memory:", taskId)
}
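The timer callback above now clears the session registries before dropping the task, so a terminal task stays queryable until its delayed cleanup fires. A simplified sketch of the pattern (map names and the delay are illustrative):

```typescript
// Deferred cleanup: a terminal task stays in the map until its timer
// fires, so the parent session can still fetch its output; the timer
// handle is tracked so shutdown can cancel it.
const tasks = new Map<string, { id: string; sessionID?: string }>()
const completionTimers = new Map<string, ReturnType<typeof setTimeout>>()

function scheduleCleanup(taskId: string, delayMs: number): void {
  const timer = setTimeout(() => {
    completionTimers.delete(taskId)
    const task = tasks.get(taskId)
    if (task) {
      // release per-session registries here before dropping the task itself
      tasks.delete(taskId)
    }
  }, delayMs)
  completionTimers.set(taskId, timer)
}
```

This is why the session.deleted tests now expect `completionTimers.has(task.id)` to be `true`: cancelling no longer tears the timer down eagerly.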
@@ -1435,11 +1423,21 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
task.status = "error"
task.error = errorMessage
task.completedAt = new Date()
this.taskHistory.record(task.parentSessionID, { id: task.id, sessionID: task.sessionID, agent: task.agent, description: task.description, status: "error", category: task.category, startedAt: task.startedAt, completedAt: task.completedAt })
if (task.concurrencyKey) {
this.concurrencyManager.release(task.concurrencyKey)
task.concurrencyKey = undefined
}
this.cleanupPendingByParent(task)
const existingTimer = this.completionTimers.get(taskId)
if (existingTimer) {
clearTimeout(existingTimer)
this.completionTimers.delete(taskId)
}
const idleTimer = this.idleDeferralTimers.get(taskId)
if (idleTimer) {
clearTimeout(idleTimer)
this.idleDeferralTimers.delete(taskId)
}
if (wasPending) {
const key = task.model
? `${task.model.providerID}/${task.model.modelID}`
@@ -1455,16 +1453,10 @@ Use \`background_output(task_id="${task.id}")\` to retrieve this result when rea
}
}
}
this.clearNotificationsForTask(taskId)
const toastManager = getTaskToastManager()
if (toastManager) {
toastManager.removeTask(taskId)
}
this.tasks.delete(taskId)
if (task.sessionID) {
subagentSessions.delete(task.sessionID)
SessionCategoryRegistry.remove(task.sessionID)
}
this.markForNotification(task)
this.enqueueNotificationForParent(task.parentSessionID, () => this.notifyParentSession(task)).catch(err => {
log("[background-agent] Error in notifyParentSession for stale-pruned task:", { taskId: task.id, error: err })
})
},
})
}


@@ -422,4 +422,38 @@ describe("pruneStaleTasksAndNotifications", () => {
//#then
expect(pruned).toContain("old-task")
})
it("should skip terminal tasks even when they exceeded TTL", () => {
//#given
const tasks = new Map<string, BackgroundTask>()
const oldStartedAt = new Date(Date.now() - 31 * 60 * 1000)
const terminalStatuses: BackgroundTask["status"][] = ["completed", "error", "cancelled", "interrupt"]
for (const status of terminalStatuses) {
tasks.set(status, {
id: status,
parentSessionID: "parent",
parentMessageID: "msg",
description: status,
prompt: status,
agent: "explore",
status,
startedAt: oldStartedAt,
completedAt: new Date(),
})
}
const pruned: string[] = []
//#when
pruneStaleTasksAndNotifications({
tasks,
notifications: new Map<string, BackgroundTask[]>(),
onTaskPruned: (taskId) => pruned.push(taskId),
})
//#then
expect(pruned).toEqual([])
expect(Array.from(tasks.keys())).toEqual(terminalStatuses)
})
})

View File

@@ -12,6 +12,13 @@ import {
TASK_TTL_MS,
} from "./constants"
const TERMINAL_TASK_STATUSES = new Set<BackgroundTask["status"]>([
"completed",
"error",
"cancelled",
"interrupt",
])
export function pruneStaleTasksAndNotifications(args: {
tasks: Map<string, BackgroundTask>
notifications: Map<string, BackgroundTask[]>
@@ -21,6 +28,8 @@ export function pruneStaleTasksAndNotifications(args: {
const now = Date.now()
for (const [taskId, task] of tasks.entries()) {
if (TERMINAL_TASK_STATUSES.has(task.status)) continue
const timestamp = task.status === "pending"
? task.queuedAt?.getTime()
: task.startedAt?.getTime()
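Condensed into a self-contained sketch (the task shape and the 30-minute TTL here are simplified assumptions, not the real `BackgroundTask` type or `TASK_TTL_MS` constant), the guard behaves like this:

```typescript
// Simplified, hypothetical model of the stale-pruning guard shown above.
type TaskStatus = "pending" | "running" | "completed" | "error" | "cancelled" | "interrupt"
interface Task { id: string; status: TaskStatus; queuedAt?: Date; startedAt?: Date }

const TERMINAL_TASK_STATUSES = new Set<TaskStatus>(["completed", "error", "cancelled", "interrupt"])
const TTL_MS = 30 * 60 * 1000 // assumed 30-minute TTL for illustration

function pruneStale(tasks: Map<string, Task>, now: number): string[] {
  const pruned: string[] = []
  for (const [taskId, task] of tasks.entries()) {
    // Terminal tasks are never TTL-pruned; they wait for parent notification cleanup
    if (TERMINAL_TASK_STATUSES.has(task.status)) continue
    // Pending tasks age from queue time, everything else from start time
    const timestamp = task.status === "pending" ? task.queuedAt?.getTime() : task.startedAt?.getTime()
    if (timestamp !== undefined && now - timestamp > TTL_MS) pruned.push(taskId)
  }
  return pruned
}
```

With this model, a completed task older than the TTL survives pruning while a running task of the same age does not, which is exactly what the new test below asserts.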

View File

@@ -1,7 +1,7 @@
import type { CommandDefinition } from "../claude-code-command-loader"
import type { BuiltinCommandName, BuiltinCommands } from "./types"
import { INIT_DEEP_TEMPLATE } from "./templates/init-deep"
import { RALPH_LOOP_TEMPLATE, CANCEL_RALPH_TEMPLATE } from "./templates/ralph-loop"
import { RALPH_LOOP_TEMPLATE, ULW_LOOP_TEMPLATE, CANCEL_RALPH_TEMPLATE } from "./templates/ralph-loop"
import { STOP_CONTINUATION_TEMPLATE } from "./templates/stop-continuation"
import { REFACTOR_TEMPLATE } from "./templates/refactor"
import { START_WORK_TEMPLATE } from "./templates/start-work"
@@ -31,16 +31,16 @@ $ARGUMENTS
argumentHint: '"task description" [--completion-promise=TEXT] [--max-iterations=N] [--strategy=reset|continue]',
},
"ulw-loop": {
description: "(builtin) Start ultrawork loop - continues until completion with ultrawork mode",
template: `<command-instruction>
${RALPH_LOOP_TEMPLATE}
description: "(builtin) Start ultrawork loop - continues until completion with ultrawork mode",
template: `<command-instruction>
${ULW_LOOP_TEMPLATE}
</command-instruction>
<user-task>
$ARGUMENTS
</user-task>`,
argumentHint: '"task description" [--completion-promise=TEXT] [--max-iterations=N] [--strategy=reset|continue]',
},
argumentHint: '"task description" [--completion-promise=TEXT] [--strategy=reset|continue]',
},
"cancel-ralph": {
description: "(builtin) Cancel active Ralph Loop",
template: `<command-instruction>

View File

@@ -28,6 +28,34 @@ Parse the arguments below and begin working on the task. The format is:
Default completion promise is "DONE" and default max iterations is 100.`
export const ULW_LOOP_TEMPLATE = `You are starting an ULTRAWORK Loop - a self-referential development loop that runs until verified completion.
## How ULTRAWORK Loop Works
1. You will work on the task continuously
2. When you believe the work is complete, output: \`<promise>{{COMPLETION_PROMISE}}</promise>\`
3. That does NOT finish the loop yet. The system will require Oracle verification
4. The loop only ends after the system confirms Oracle verified the result
5. There is no iteration limit
## Rules
- Focus on finishing the task completely
- After you emit the completion promise, run Oracle verification when instructed
- Do not treat DONE as final completion until Oracle verifies it
## Exit Conditions
1. **Verified Completion**: Oracle verifies the result and the system confirms it
2. **Cancel**: User runs \`/cancel-ralph\`
## Your Task
Parse the arguments below and begin working on the task. The format is:
\`"task description" [--completion-promise=TEXT] [--strategy=reset|continue]\`
Default completion promise is "DONE".`
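The exit flow the ULTRAWORK template describes can be modeled as a small two-stage state machine. This is a sketch under the assumption that Oracle's verdict arrives as a boolean; the real loop tracks this through `RalphLoopState` fields:

```typescript
// Hypothetical two-stage completion model: emitting the promise only moves the
// loop into verification; only Oracle approval ends it.
type LoopPhase = "working" | "verification_pending" | "complete"

function advance(phase: LoopPhase, promiseEmitted: boolean, oracleVerified: boolean): LoopPhase {
  if (phase === "working" && promiseEmitted) return "verification_pending" // DONE alone is not final
  if (phase === "verification_pending" && oracleVerified) return "complete"
  return phase
}
```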
export const CANCEL_RALPH_TEMPLATE = `Cancel the currently active Ralph Loop.
This will:

View File

@@ -162,7 +162,7 @@ describe("createAnthropicEffortHook", () => {
const hook = createAnthropicEffortHook()
const { input, output } = createMockParams({
providerID: "openai",
modelID: "gpt-5.2",
modelID: "gpt-5.4",
})
//#when chat.params hook is called

View File

@@ -0,0 +1,108 @@
declare const require: (name: string) => any
const { afterEach, beforeEach, describe, expect, mock, test } = require("bun:test")
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"
import { randomUUID } from "node:crypto"
import { clearBoulderState, writeBoulderState } from "../../features/boulder-state"
import { _resetForTesting } from "../../features/claude-code-session-state"
import type { BoulderState } from "../../features/boulder-state"
const TEST_STORAGE_ROOT = join(tmpdir(), `atlas-compaction-storage-${randomUUID()}`)
const TEST_MESSAGE_STORAGE = join(TEST_STORAGE_ROOT, "message")
const TEST_PART_STORAGE = join(TEST_STORAGE_ROOT, "part")
mock.module("../../features/hook-message-injector/constants", () => ({
OPENCODE_STORAGE: TEST_STORAGE_ROOT,
MESSAGE_STORAGE: TEST_MESSAGE_STORAGE,
PART_STORAGE: TEST_PART_STORAGE,
}))
mock.module("../../shared/opencode-message-dir", () => ({
getMessageDir: (sessionID: string) => {
const directory = join(TEST_MESSAGE_STORAGE, sessionID)
return existsSync(directory) ? directory : null
},
}))
mock.module("../../shared/opencode-storage-detection", () => ({
isSqliteBackend: () => false,
}))
const { createAtlasHook } = await import("./index")
describe("atlas hook compaction agent filtering", () => {
let testDirectory: string
function createMockPluginInput() {
const promptMock = mock(() => Promise.resolve())
return {
directory: testDirectory,
client: {
session: {
prompt: promptMock,
promptAsync: promptMock,
},
},
_promptMock: promptMock,
} as Parameters<typeof createAtlasHook>[0] & { _promptMock: ReturnType<typeof mock> }
}
function writeMessage(sessionID: string, fileName: string, agent: string): void {
const messageDir = join(TEST_MESSAGE_STORAGE, sessionID)
mkdirSync(messageDir, { recursive: true })
writeFileSync(
join(messageDir, fileName),
JSON.stringify({
agent,
model: { providerID: "anthropic", modelID: "claude-opus-4-6" },
}),
)
}
beforeEach(() => {
testDirectory = join(tmpdir(), `atlas-compaction-test-${randomUUID()}`)
mkdirSync(testDirectory, { recursive: true })
clearBoulderState(testDirectory)
_resetForTesting()
})
afterEach(() => {
clearBoulderState(testDirectory)
rmSync(testDirectory, { recursive: true, force: true })
_resetForTesting()
})
test("should inject continuation when the latest message is compaction but the previous agent matches atlas", async () => {
// given
const sessionID = "main-session-after-compaction"
const planPath = join(testDirectory, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [ ] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [sessionID],
plan_name: "test-plan",
agent: "atlas",
}
writeBoulderState(testDirectory, state)
writeMessage(sessionID, "msg_001.json", "atlas")
writeMessage(sessionID, "msg_002.json", "compaction")
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID },
},
})
// then
expect(mockInput._promptMock).toHaveBeenCalledTimes(1)
})
})

View File

@@ -409,6 +409,123 @@ describe("atlas hook", () => {
cleanupMessageStorage(sessionID)
})
describe("completion gate output ordering", () => {
const COMPLETION_GATE_SESSION = "completion-gate-order-test"
beforeEach(() => {
setupMessageStorage(COMPLETION_GATE_SESSION, "atlas")
})
afterEach(() => {
cleanupMessageStorage(COMPLETION_GATE_SESSION)
})
test("should include completion gate before Subagent Response in transformed boulder output", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - completion gate should appear BEFORE Subagent Response
const subagentResponseIndex = output.output.indexOf("**Subagent Response:**")
const completionGateIndex = output.output.indexOf("COMPLETION GATE")
expect(completionGateIndex).toBeGreaterThanOrEqual(0)
expect(subagentResponseIndex).toBeGreaterThanOrEqual(0)
expect(completionGateIndex).toBeLessThan(subagentResponseIndex)
})
test("should include completion gate before verification phase text", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - completion gate should appear BEFORE verification phase text
const completionGateIndex = output.output.indexOf("COMPLETION GATE")
const lyingIndex = output.output.indexOf("LYING")
const phase1Index = output.output.indexOf("PHASE 1")
expect(completionGateIndex).toBeGreaterThanOrEqual(0)
expect(lyingIndex).toBeGreaterThanOrEqual(0)
expect(completionGateIndex).toBeLessThan(lyingIndex)
if (phase1Index !== -1) {
expect(completionGateIndex).toBeLessThan(phase1Index)
}
})
test("should not contain old STEP 7 MARK COMPLETION IN PLAN FILE text", async () => {
// given - Atlas caller with boulder state
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const hook = createAtlasHook(createMockPluginInput())
const output = {
title: "Sisyphus Task",
output: "Task completed successfully",
metadata: {},
}
// when
await hook["tool.execute.after"](
{ tool: "task", sessionID: COMPLETION_GATE_SESSION },
output
)
// then - old STEP 7 MARK COMPLETION IN PLAN FILE should be absent
expect(output.output).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
expect(output.output).not.toContain("MARK COMPLETION IN PLAN FILE")
})
})
describe("Write/Edit tool direct work reminder", () => {
const ORCHESTRATOR_SESSION = "orchestrator-write-test"

View File

@@ -0,0 +1,46 @@
const { describe, expect, mock, test } = require("bun:test")
mock.module("../../shared", () => ({
getMessageDir: () => null,
isSqliteBackend: () => true,
normalizeSDKResponse: <TData>(response: { data?: TData }, fallback: TData, _options?: { preferResponseOnMissingData?: boolean }): TData => response.data ?? fallback,
}))
const { getLastAgentFromSession } = await import("./session-last-agent")
function createMockClient(messages: Array<{ info?: { agent?: string } }>) {
return {
session: {
messages: async () => ({ data: messages }),
},
}
}
describe("getLastAgentFromSession sqlite branch", () => {
test("should skip compaction and return the previous real agent from sqlite messages", async () => {
// given
const client = createMockClient([
{ info: { agent: "atlas" } },
{ info: { agent: "compaction" } },
])
// when
const result = await getLastAgentFromSession("ses_sqlite_compaction", client)
// then
expect(result).toBe("atlas")
})
test("should return null when sqlite history contains only compaction", async () => {
// given
const client = createMockClient([{ info: { agent: "compaction" } }])
// when
const result = await getLastAgentFromSession("ses_sqlite_only_compaction", client)
// then
expect(result).toBeNull()
})
})
export {}

View File

@@ -1,24 +1,65 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { readFileSync, readdirSync } from "node:fs"
import { join } from "node:path"
import { findNearestMessageWithFields } from "../../features/hook-message-injector"
import { findNearestMessageWithFieldsFromSDK } from "../../features/hook-message-injector"
import { getMessageDir, isSqliteBackend } from "../../shared"
import { getMessageDir, isSqliteBackend, normalizeSDKResponse } from "../../shared"
type OpencodeClient = PluginInput["client"]
type SessionMessagesClient = {
session: {
messages: (input: { path: { id: string } }) => Promise<unknown>
}
}
function isCompactionAgent(agent: unknown): boolean {
return typeof agent === "string" && agent.toLowerCase() === "compaction"
}
function getLastAgentFromMessageDir(messageDir: string): string | null {
try {
const files = readdirSync(messageDir)
.filter((fileName) => fileName.endsWith(".json"))
.sort()
.reverse()
for (const fileName of files) {
try {
const content = readFileSync(join(messageDir, fileName), "utf-8")
const parsed = JSON.parse(content) as { agent?: unknown }
if (typeof parsed.agent === "string" && !isCompactionAgent(parsed.agent)) {
return parsed.agent.toLowerCase()
}
} catch {
continue
}
}
} catch {
return null
}
return null
}
export async function getLastAgentFromSession(
sessionID: string,
client?: OpencodeClient
client?: SessionMessagesClient
): Promise<string | null> {
let nearest = null
if (isSqliteBackend() && client) {
nearest = await findNearestMessageWithFieldsFromSDK(client, sessionID)
} else {
const messageDir = getMessageDir(sessionID)
if (!messageDir) return null
nearest = findNearestMessageWithFields(messageDir)
const response = await client.session.messages({ path: { id: sessionID } })
const messages = normalizeSDKResponse(response, [] as Array<{ info?: { agent?: string } }>, {
preferResponseOnMissingData: true,
})
for (let i = messages.length - 1; i >= 0; i--) {
const agent = messages[i].info?.agent
if (typeof agent === "string" && !isCompactionAgent(agent)) {
return agent.toLowerCase()
}
}
return null
}
return nearest?.agent?.toLowerCase() ?? null
const messageDir = getMessageDir(sessionID)
if (!messageDir) return null
return getLastAgentFromMessageDir(messageDir)
}
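Both branches reduce to the same newest-first scan that skips synthetic `compaction` entries; a minimal sketch over the SDK message shape (the shape here mirrors the test mocks, not a verified SDK type):

```typescript
// Hypothetical condensed form of the compaction-skipping scan used by both backends
type SdkMessage = { info?: { agent?: string } }

function lastRealAgent(messages: SdkMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const agent = messages[i].info?.agent
    // "compaction" messages are synthetic; keep walking back to the real agent
    if (typeof agent === "string" && agent.toLowerCase() !== "compaction") {
      return agent.toLowerCase()
    }
  }
  return null
}
```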

View File

@@ -0,0 +1,37 @@
import { describe, it, expect } from "bun:test"
import { BOULDER_CONTINUATION_PROMPT } from "./system-reminder-templates"
describe("BOULDER_CONTINUATION_PROMPT", () => {
describe("checkbox-first priority rules", () => {
it("first rule after RULES: mentions both reading the plan AND marking a still-unchecked completed task", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const firstRule = rulesSection.split("\n")[1]!.trim()
expect(firstRule).toContain("Read the plan")
expect(firstRule).toContain("mark")
expect(firstRule).toContain("completed")
})
it("first rule includes IMMEDIATELY keyword", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const firstRule = rulesSection.split("\n")[1]!.trim()
expect(firstRule).toContain("IMMEDIATELY")
})
it("checkbox-marking guidance appears BEFORE Proceed without asking for permission", () => {
const rulesSection = BOULDER_CONTINUATION_PROMPT.split("RULES:")[1]!
const checkboxMarkingMatch = rulesSection.match(/- \[x\]/i)
const proceedMatch = rulesSection.match(/Proceed without asking for permission/)
expect(checkboxMarkingMatch).not.toBeNull()
expect(proceedMatch).not.toBeNull()
const checkboxPosition = checkboxMarkingMatch!.index
const proceedPosition = proceedMatch!.index
expect(checkboxPosition).toBeLessThan(proceedPosition)
})
})
})

View File

@@ -33,9 +33,8 @@ export const BOULDER_CONTINUATION_PROMPT = `${createSystemDirective(SystemDirect
You have an active work plan with incomplete tasks. Continue working.
RULES:
- **FIRST**: Read the plan file NOW to check exact current progress — count remaining \`- [ ]\` tasks
- **FIRST**: Read the plan file NOW. If the last completed task is still unchecked, mark it \`- [x]\` IMMEDIATELY before anything else
- Proceed without asking for permission
- Change \`- [ ]\` to \`- [x]\` in the plan file when done
- Use the notepad at .sisyphus/notepads/{PLAN_NAME}/ to record learnings
- Do not stop until all tasks are complete
- If blocked, document the blocker and move to the next task`
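The checkbox-marking the first rule demands is a plain text substitution in the plan file. A hypothetical helper (the real flow uses the `Edit` tool rather than a function like this):

```typescript
// Hypothetical sketch: flip the first matching unchecked task to checked, as the
// continuation prompt's first rule requires before any other work.
function markTaskDone(plan: string, taskText: string): string {
  return plan.replace(`- [ ] ${taskText}`, `- [x] ${taskText}`)
}
```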

View File

@@ -7,7 +7,7 @@ import { HOOK_NAME } from "./hook-name"
import { DIRECT_WORK_REMINDER } from "./system-reminder-templates"
import { isSisyphusPath } from "./sisyphus-path"
import { extractSessionIdFromOutput } from "./subagent-session-id"
import { buildOrchestratorReminder, buildStandaloneVerificationReminder } from "./verification-reminders"
import { buildCompletionGate, buildOrchestratorReminder, buildStandaloneVerificationReminder } from "./verification-reminders"
import { isWriteOrEditToolName } from "./write-edit-tool-policy"
import type { ToolExecuteAfterInput, ToolExecuteAfterOutput } from "./types"
@@ -76,7 +76,11 @@ export function createToolExecuteAfterHandler(input: {
// Preserve original subagent response - critical for debugging failed tasks
const originalResponse = toolOutput.output
toolOutput.output = `
toolOutput.output = `
<system-reminder>
${buildCompletionGate(boulderState.plan_name, subagentSessionId)}
</system-reminder>
## SUBAGENT WORK COMPLETED
${fileChanges}
@@ -88,7 +92,7 @@ ${fileChanges}
${originalResponse}
<system-reminder>
${buildOrchestratorReminder(boulderState.plan_name, progress, subagentSessionId, autoCommit)}
${buildOrchestratorReminder(boulderState.plan_name, progress, subagentSessionId, autoCommit, false)}
</system-reminder>`
log(`[${HOOK_NAME}] Output transformed for orchestrator mode (boulder)`, {
plan: boulderState.plan_name,

View File

@@ -0,0 +1,94 @@
import { describe, expect, it } from "bun:test"
import { buildOrchestratorReminder, buildCompletionGate } from "./verification-reminders"
// Test helpers for given/when/then pattern
const given = describe
const when = describe
const then = it
describe("buildCompletionGate", () => {
given("a plan name and session id", () => {
const planName = "test-plan"
const sessionId = "test-session-123"
when("buildCompletionGate is called", () => {
const gate = buildCompletionGate(planName, sessionId)
then("completion gate text is present", () => {
expect(gate).toContain("COMPLETION GATE")
})
then("gate appears before verification phase text", () => {
const gateIndex = gate.indexOf("COMPLETION GATE")
const verificationIndex = gate.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeLessThan(verificationIndex)
})
then("gate interpolates the plan name path", () => {
expect(gate).toContain(planName)
expect(gate).toContain(`.sisyphus/plans/${planName}.md`)
})
then("gate includes Edit instructions", () => {
expect(gate.toLowerCase()).toContain("edit")
})
then("gate includes Read instructions", () => {
expect(gate.toLowerCase()).toContain("read")
})
then("old STEP 7 MARK COMPLETION text is absent", () => {
expect(gate).not.toContain("STEP 7")
expect(gate).not.toContain("MARK COMPLETION IN PLAN FILE")
})
then("step numbering remains consecutive after removal", () => {
const stepMatches = gate.match(/STEP \d+:/g) ?? []
if (stepMatches.length > 1) {
const numbers = stepMatches.map((s: string) => parseInt(s.match(/\d+/)?.[0] ?? "0"))
for (let i = 1; i < numbers.length; i++) {
expect(numbers[i]).toBe(numbers[i - 1] + 1)
}
}
})
})
})
})
describe("buildOrchestratorReminder", () => {
given("progress with completed tasks", () => {
const planName = "my-test-plan"
const sessionId = "session-abc"
const progress = { total: 10, completed: 3 }
when("buildOrchestratorReminder is called with autoCommit true", () => {
const reminder = buildOrchestratorReminder(planName, progress, sessionId, true)
then("old STEP 7 MARK COMPLETION IN PLAN FILE text is absent", () => {
expect(reminder).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
})
then("completion gate appears before verification reminder", () => {
const gateIndex = reminder.indexOf("COMPLETION GATE")
const verificationIndex = reminder.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeGreaterThanOrEqual(0)
expect(gateIndex).toBeLessThan(verificationIndex)
})
})
when("buildOrchestratorReminder is called with autoCommit false", () => {
const reminder = buildOrchestratorReminder(planName, progress, sessionId, false)
then("old STEP 7 MARK COMPLETION IN PLAN FILE text is absent", () => {
expect(reminder).not.toContain("STEP 7: MARK COMPLETION IN PLAN FILE")
})
then("completion gate appears before verification reminder", () => {
const gateIndex = reminder.indexOf("COMPLETION GATE")
const verificationIndex = reminder.indexOf("VERIFICATION_REMINDER")
expect(gateIndex).toBeGreaterThanOrEqual(0)
expect(gateIndex).toBeLessThan(verificationIndex)
})
})
})
})

View File

@@ -1,7 +1,37 @@
import { VERIFICATION_REMINDER } from "./system-reminder-templates"
export function buildCompletionGate(planName: string, sessionId: string): string {
return `
**COMPLETION GATE — DO NOT PROCEED UNTIL THIS IS DONE**
Your completion will NOT be recorded until you complete ALL of the following:
1. **Edit** the plan file \`.sisyphus/plans/${planName}.md\`:
- Change \`- [ ]\` to \`- [x]\` for the completed task
- Use \`Edit\` tool to modify the checkbox
2. **Read** the plan file AGAIN:
\`\`\`
Read(".sisyphus/plans/${planName}.md")
\`\`\`
- Verify the checkbox count changed (more \`- [x]\` than before)
3. **DO NOT call \`task()\` again** until you have completed steps 1 and 2 above.
If anything fails while closing this out, resume the same session immediately:
\`\`\`typescript
task(session_id="${sessionId}", prompt="fix: checkbox not recorded correctly")
\`\`\`
**Your completion is NOT tracked until the checkbox is marked in the plan file.**
**VERIFICATION_REMINDER**`
}
function buildVerificationReminder(sessionId: string): string {
return `${VERIFICATION_REMINDER}
return `**VERIFICATION_REMINDER**
${VERIFICATION_REMINDER}
---
@@ -15,20 +45,21 @@ export function buildOrchestratorReminder(
planName: string,
progress: { total: number; completed: number },
sessionId: string,
autoCommit: boolean = true
autoCommit: boolean = true,
includeCompletionGate: boolean = true
): string {
const remaining = progress.total - progress.completed
const commitStep = autoCommit
? `
**STEP 8: COMMIT ATOMIC UNIT**
**STEP 7: COMMIT ATOMIC UNIT**
- Stage ONLY the verified changes
- Commit with clear message describing what was done
`
: ""
const nextStepNumber = autoCommit ? 9 : 8
const nextStepNumber = autoCommit ? 8 : 7
return `
---
@@ -37,7 +68,9 @@ export function buildOrchestratorReminder(
---
${buildVerificationReminder(sessionId)}
${includeCompletionGate ? `${buildCompletionGate(planName, sessionId)}
` : ""}${buildVerificationReminder(sessionId)}
**STEP 5: READ SUBAGENT NOTEPAD (LEARNINGS, ISSUES, PROBLEMS)**
@@ -64,22 +97,13 @@ Read(".sisyphus/plans/${planName}.md")
Count exactly: how many \`- [ ]\` remain? How many \`- [x]\` completed?
This is YOUR ground truth. Use it to decide what comes next.
**STEP 7: MARK COMPLETION IN PLAN FILE (IMMEDIATELY)**
RIGHT NOW - Do not delay. Verification passed → Mark IMMEDIATELY.
Update the plan file \`.sisyphus/plans/${planName}.md\`:
- Change \`- [ ]\` to \`- [x]\` for the completed task
- Use \`Edit\` tool to modify the checkbox
**DO THIS BEFORE ANYTHING ELSE. Unmarked = Untracked = Lost progress.**
${commitStep}
**STEP ${nextStepNumber}: PROCEED TO NEXT TASK**
- Read the plan file AGAIN to identify the next \`- [ ]\` task
- Start immediately - DO NOT STOP
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**${remaining} tasks remain. Keep bouldering.**`
}
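The renumbering in this hunk follows one rule: with the old "STEP 7: MARK COMPLETION" block gone, the commit step takes its number and the "proceed" step shifts down accordingly. A sketch of that layout:

```typescript
// Hypothetical summary of the step numbering after removing the old STEP 7 block:
// with autoCommit the commit is STEP 7 and "proceed" is STEP 8; otherwise "proceed" is STEP 7.
function stepLayout(autoCommit: boolean): { commitStep?: number; nextStepNumber: number } {
  return autoCommit ? { commitStep: 7, nextStepNumber: 8 } : { nextStepNumber: 7 }
}
```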

View File

@@ -202,7 +202,7 @@ BEFORE writing ANY code, you MUST define:
| **Observable** | What can be measured/seen | "Console shows 'success', no errors" |
| **Pass/Fail** | Binary, no ambiguity | "Returns 200 OK" not "should work" |
Write these criteria explicitly. Share with user if scope is non-trivial.
Write these criteria explicitly. **Record them in your TODO/Task items.** Each task MUST include a "QA: [how to verify]" field. These criteria are your CONTRACT — work toward them, verify against them.
### Test Plan Template (MANDATORY for non-trivial tasks)
@@ -228,6 +228,32 @@ Write these criteria explicitly. Share with user if scope is non-trivial.
**WITHOUT evidence = NOT verified = NOT done.**
<MANUAL_QA_MANDATE>
### YOU MUST EXECUTE MANUAL QA YOURSELF. THIS IS NOT OPTIONAL.
**YOUR FAILURE MODE**: You finish coding, run lsp_diagnostics, and declare "done" without actually TESTING the feature. lsp_diagnostics catches type errors, NOT functional bugs. Your work is NOT verified until you MANUALLY test it.
**WHAT MANUAL QA MEANS — execute ALL that apply:**
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify the output files exist and are correct. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Changes UI rendering | Describe what renders. Use a browser tool if available. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**UNACCEPTABLE QA CLAIMS:**
- "This should work" — RUN IT.
- "The types check out" — Types don't catch logic bugs. RUN IT.
- "lsp_diagnostics is clean" — That's a TYPE check, not a FUNCTIONAL check. RUN IT.
- "Tests pass" — Tests cover known cases. Does the ACTUAL FEATURE work as the user expects? RUN IT.
**You have Bash, you have tools. There is ZERO excuse for not running manual QA.**
**Manual QA is the FINAL gate before reporting completion. Skip it and your work is INCOMPLETE.**
</MANUAL_QA_MANDATE>
### TDD Workflow (when test infrastructure exists)
1. **SPEC**: Define what "working" means (success criteria above)

View File

@@ -236,6 +236,33 @@ task(subagent_type="plan", load_skills=[], prompt="<gathered context + user requ
If ANY answer is no → GO BACK AND DO IT. Do not claim completion.
</ANTI_OPTIMISM_CHECKPOINT>
<MANUAL_QA_MANDATE>
### YOU MUST EXECUTE MANUAL QA. THIS IS NOT OPTIONAL. DO NOT SKIP THIS.
**YOUR FAILURE MODE**: You run lsp_diagnostics, see zero errors, and declare victory. lsp_diagnostics catches TYPE errors. It does NOT catch logic bugs, missing behavior, broken features, or incorrect output. Your work is NOT verified until you MANUALLY TEST the actual feature.
**AFTER every implementation, you MUST:**
1. **Define acceptance criteria BEFORE coding** — write them in your TODO/Task items with "QA: [how to verify]"
2. **Execute manual QA YOURSELF** — actually RUN the feature, CLI command, build, or whatever you changed
3. **Report what you observed** — show actual output, not claims
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify output files exist and are correct. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**UNACCEPTABLE (WILL BE REJECTED):**
- "This should work" — DID YOU RUN IT? NO? THEN RUN IT.
- "lsp_diagnostics is clean" — That is a TYPE check, not a FUNCTIONAL check. RUN THE FEATURE.
- "Tests pass" — Tests cover known cases. Does the ACTUAL feature work? VERIFY IT MANUALLY.
**You have Bash, you have tools. There is ZERO excuse for skipping manual QA.**
</MANUAL_QA_MANDATE>
**WITHOUT evidence = NOT verified = NOT done.**
## ZERO TOLERANCE FAILURES

View File

@@ -118,6 +118,14 @@ deep_context = background_output(task_id=...)
- \`lsp_diagnostics\` on modified files
- Run tests if available
## ACCEPTANCE CRITERIA WORKFLOW
**BEFORE implementation**, define what "done" means in concrete, binary terms:
1. Write acceptance criteria as pass/fail conditions (not "should work" — specific observable outcomes)
2. Record them in your TODO/Task items with a "QA: [how to verify]" field
3. Work toward those criteria, not just "finishing code"
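The workflow above implies each TODO item carries its own verification hook. A hypothetical shape for such an item (the field names are illustrative, not a real type from this codebase):

```typescript
// Hypothetical TODO item carrying the "QA: [how to verify]" field the workflow requires
interface TodoItem {
  description: string
  qa: string // a concrete, binary pass/fail check, not "should work"
  done: boolean
}

const item: TodoItem = {
  description: "Add --json flag to the export command",
  qa: "Run the command with --json and confirm the output parses with JSON.parse",
  done: false,
}
```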
## QUALITY STANDARDS
| Phase | Action | Required Evidence |
@@ -125,6 +133,25 @@ deep_context = background_output(task_id=...)
| Build | Run build command | Exit code 0 |
| Test | Execute test suite | All tests pass |
| Lint | Run lsp_diagnostics | Zero new errors |
| **Manual QA** | **Execute the feature yourself** | **Actual output shown** |
<MANUAL_QA_MANDATE>
### MANUAL QA IS MANDATORY. lsp_diagnostics IS NOT ENOUGH.
lsp_diagnostics catches type errors. It does NOT catch logic bugs, missing behavior, or broken features. After EVERY implementation, you MUST manually test the actual feature.
**Execute ALL that apply:**
| If your change... | YOU MUST... |
|---|---|
| Adds/modifies a CLI command | Run the command with Bash. Show the output. |
| Changes build output | Run the build. Verify output files. |
| Modifies API behavior | Call the endpoint. Show the response. |
| Adds a new tool/hook/feature | Test it end-to-end in a real scenario. |
| Modifies config handling | Load the config. Verify it parses correctly. |
**"This should work" is NOT evidence. RUN IT. Show what happened. That is evidence.**
</MANUAL_QA_MANDATE>
## COMPLETION CRITERIA
@@ -133,6 +160,7 @@ A task is complete when:
2. lsp_diagnostics shows zero errors on modified files
3. Tests pass (or pre-existing failures documented)
4. Code matches existing codebase patterns
5. **Manual QA executed — actual feature tested, output observed and reported**
**Deliver exactly what was asked. No more, no less.**

View File

@@ -3,7 +3,7 @@
*
* Routing logic:
* 1. Planner agents (prometheus, plan) → planner.ts
* 2. GPT 5.2 models → gpt5.2.ts
* 2. GPT 5.4 models → gpt5.4.ts
* 3. Gemini models → gemini.ts
* 4. Everything else (Claude, etc.) → default.ts
*/

View File

@@ -134,8 +134,8 @@ describe("model fallback hook", () => {
//#then - chain should progress to entry[1], not repeat entry[0]
expect(secondOutput.message["model"]).toEqual({
providerID: "zai-coding-plan",
modelID: "glm-5",
providerID: "kimi-for-coding",
modelID: "k2p5",
})
expect(secondOutput.message["variant"]).toBeUndefined()
})

View File

@@ -104,7 +104,7 @@ describe("no-sisyphus-gpt hook", () => {
await hook["chat.message"]?.({
sessionID: "ses_3",
agent: HEPHAESTUS_DISPLAY,
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-5.4" },
}, output)
// then - no toast
@@ -126,7 +126,7 @@ describe("no-sisyphus-gpt hook", () => {
// when - chat.message runs without input.agent
await hook["chat.message"]?.({
sessionID: "ses_4",
model: { providerID: "openai", modelID: "gpt-5.2" },
model: { providerID: "openai", modelID: "gpt-4o" },
}, output)
// then - toast shown via session-agent fallback

View File

@@ -0,0 +1,61 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import { buildContinuationPrompt } from "./continuation-prompt-builder"
import { HOOK_NAME } from "./constants"
import { injectContinuationPrompt } from "./continuation-prompt-injector"
import type { RalphLoopState } from "./types"
type LoopStateController = {
clear: () => boolean
markVerificationPending: (sessionID: string) => RalphLoopState | null
}
export async function handleDetectedCompletion(
ctx: PluginInput,
input: {
sessionID: string
state: RalphLoopState
loopState: LoopStateController
directory: string
apiTimeoutMs: number
},
): Promise<void> {
const { sessionID, state, loopState, directory, apiTimeoutMs } = input
if (state.ultrawork && !state.verification_pending) {
const verificationState = loopState.markVerificationPending(sessionID)
if (!verificationState) {
log(`[${HOOK_NAME}] Failed to transition ultrawork loop to verification`, {
sessionID,
})
return
}
await injectContinuationPrompt(ctx, {
sessionID,
prompt: buildContinuationPrompt(verificationState),
directory,
apiTimeoutMs,
})
await ctx.client.tui?.showToast?.({
body: {
title: "ULTRAWORK LOOP",
message: "DONE detected. Oracle verification is now required.",
variant: "info",
duration: 5000,
},
}).catch(() => {})
return
}
loopState.clear()
const title = state.ultrawork ? "ULTRAWORK LOOP COMPLETE!" : "Ralph Loop Complete!"
const message = state.ultrawork
? `JUST ULW ULW! Task completed after ${state.iteration} iteration(s)`
: `Task completed after ${state.iteration} iteration(s)`
await ctx.client.tui?.showToast?.({
body: { title, message, variant: "success", duration: 5000 },
}).catch(() => {})
}

View File

@@ -20,6 +20,7 @@ function buildPromisePattern(promise: string): RegExp {
export function detectCompletionInTranscript(
transcriptPath: string | undefined,
promise: string,
startedAt?: string,
): boolean {
if (!transcriptPath) return false
@@ -32,8 +33,9 @@ export function detectCompletionInTranscript(
for (const line of lines) {
try {
-const entry = JSON.parse(line) as { type?: string }
+const entry = JSON.parse(line) as { type?: string; timestamp?: string }
if (entry.type === "user") continue
if (startedAt && entry.timestamp && entry.timestamp < startedAt) continue
if (pattern.test(line)) return true
} catch {
continue
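The `startedAt` guard added in this hunk relies on ISO-8601 UTC timestamps comparing chronologically as plain strings. A self-contained sketch of that filter (the entry shape is taken from the cast in the diff; `isStaleEntry` itself is an illustrative helper, not part of the plugin):

```typescript
// ISO-8601 timestamps in the same zone sort chronologically as strings,
// so a lexicographic comparison is enough to skip pre-loop entries.
type TranscriptEntry = { type?: string; timestamp?: string }

function isStaleEntry(entry: TranscriptEntry, startedAt?: string): boolean {
  // Stale only when both a cutoff and an entry timestamp exist and the
  // entry predates the cutoff; missing data never suppresses detection.
  return Boolean(startedAt && entry.timestamp && entry.timestamp < startedAt)
}

const startedAt = "2026-03-08T00:00:00.000Z"
isStaleEntry({ timestamp: "2000-01-01T00:00:00.000Z" }, startedAt) // true: from an older run
isStaleEntry({ timestamp: "2026-03-08T01:00:00.000Z" }, startedAt) // false: current run
isStaleEntry({ timestamp: "2000-01-01T00:00:00.000Z" }, undefined) // false: no cutoff given
```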

View File

@@ -3,3 +3,4 @@ export const DEFAULT_STATE_FILE = ".sisyphus/ralph-loop.local.md"
export const COMPLETION_TAG_PATTERN = /<promise>(.*?)<\/promise>/is
export const DEFAULT_MAX_ITERATIONS = 100
export const DEFAULT_COMPLETION_PROMISE = "DONE"
export const ULTRAWORK_VERIFICATION_PROMISE = "VERIFIED"
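These constants drive a two-phase promise protocol: the assistant first emits the loop's completion promise (`DONE` by default), then, in ultrawork mode, Oracle must emit `VERIFIED`. A minimal sketch of extracting a promised token with the same tag pattern (constant values copied from this file; `extractPromise` is a hypothetical helper, not part of the plugin):

```typescript
// Values mirror the constants above; extractPromise is illustrative only.
const COMPLETION_TAG_PATTERN = /<promise>(.*?)<\/promise>/is
const ULTRAWORK_VERIFICATION_PROMISE = "VERIFIED"

// Pull the promised token out of a chunk of assistant output, if present.
function extractPromise(output: string): string | undefined {
  const match = COMPLETION_TAG_PATTERN.exec(output)
  return match?.[1].trim()
}

extractPromise("work finished <promise>DONE</promise>") // "DONE"
extractPromise(`ok <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>`) // "VERIFIED"
extractPromise("no tag here") // undefined
```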

View File

@@ -1,6 +1,10 @@
import { SYSTEM_DIRECTIVE_PREFIX } from "../../shared/system-directive"
import type { RalphLoopState } from "./types"
function getMaxIterationsLabel(state: RalphLoopState): string {
return typeof state.max_iterations === "number" ? String(state.max_iterations) : "unbounded"
}
const CONTINUATION_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - RALPH LOOP {{ITERATION}}/{{MAX}}]
Your previous attempt did not output the completion promise. Continue working on the task.
@@ -14,12 +18,55 @@ IMPORTANT:
Original task:
{{PROMPT}}`
const ULTRAWORK_VERIFICATION_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - ULTRAWORK LOOP VERIFICATION {{ITERATION}}/{{MAX}}]
You already emitted <promise>{{INITIAL_PROMISE}}</promise>. This does NOT finish the loop yet.
REQUIRED NOW:
- Call Oracle using task(subagent_type="oracle", load_skills=[], run_in_background=false, ...)
- Ask Oracle to verify whether the original task is actually complete
- The system will inspect the Oracle session directly for the verification result
- If Oracle does not verify, continue fixing the task and do not consider it complete
Original task:
{{PROMPT}}`
const ULTRAWORK_VERIFICATION_FAILED_PROMPT = `${SYSTEM_DIRECTIVE_PREFIX} - ULTRAWORK LOOP VERIFICATION FAILED {{ITERATION}}/{{MAX}}]
Oracle did not emit <promise>VERIFIED</promise>. Verification failed.
REQUIRED NOW:
- Verification failed. Fix the task until Oracle's review is satisfied
- Oracle does not lie. Treat the verification result as ground truth
- Do not claim completion early or argue with the failed verification
- After fixing the remaining issues, request Oracle review again using task(subagent_type="oracle", load_skills=[], run_in_background=false, ...)
- Only when the work is ready for review again, output: <promise>{{PROMISE}}</promise>
Original task:
{{PROMPT}}`
export function buildContinuationPrompt(state: RalphLoopState): string {
-const continuationPrompt = CONTINUATION_PROMPT.replace(
+const template = state.verification_pending
+? ULTRAWORK_VERIFICATION_PROMPT
+: CONTINUATION_PROMPT
+const continuationPrompt = template.replace(
"{{ITERATION}}",
String(state.iteration),
)
-.replace("{{MAX}}", String(state.max_iterations))
+.replace("{{MAX}}", getMaxIterationsLabel(state))
.replace("{{INITIAL_PROMISE}}", state.initial_completion_promise ?? state.completion_promise)
.replace("{{PROMISE}}", state.completion_promise)
.replace("{{PROMPT}}", state.prompt)
return state.ultrawork ? `ultrawork ${continuationPrompt}` : continuationPrompt
}
export function buildVerificationFailurePrompt(state: RalphLoopState): string {
const continuationPrompt = ULTRAWORK_VERIFICATION_FAILED_PROMPT.replace(
"{{ITERATION}}",
String(state.iteration),
)
.replace("{{MAX}}", getMaxIterationsLabel(state))
.replace("{{PROMISE}}", state.completion_promise)
.replace("{{PROMPT}}", state.prompt)

View File

@@ -1078,7 +1078,7 @@ Original task: Build something`
expect(messagesCalls.length).toBe(1)
})
-test("should show ultrawork completion toast", async () => {
+test("should require oracle verification toast for ultrawork completion promise", async () => {
// given - hook with ultrawork mode and completion in transcript
const transcriptPath = join(TEST_DIR, "transcript.jsonl")
const hook = createRalphLoopHook(createMockPluginInput(), {
@@ -1090,10 +1090,9 @@ Original task: Build something`
// when - idle event triggered
await hook.event({ event: { type: "session.idle", properties: { sessionID: "test-id" } } })
// then - ultrawork toast shown
-const completionToast = toastCalls.find(t => t.title === "ULTRAWORK LOOP COMPLETE!")
-expect(completionToast).toBeDefined()
-expect(completionToast!.message).toMatch(/JUST ULW ULW!/)
+const verificationToast = toastCalls.find(t => t.title === "ULTRAWORK LOOP")
+expect(verificationToast).toBeDefined()
+expect(verificationToast!.message).toMatch(/Oracle verification is now required/)
})
test("should show regular completion toast when ultrawork disabled", async () => {

View File

@@ -3,6 +3,7 @@ import {
DEFAULT_COMPLETION_PROMISE,
DEFAULT_MAX_ITERATIONS,
HOOK_NAME,
ULTRAWORK_VERIFICATION_PROMISE,
} from "./constants"
import { clearState, incrementIteration, readState, writeState } from "./storage"
import { log } from "../../shared/logger"
@@ -28,18 +29,24 @@ export function createLoopStateController(options: {
strategy?: "reset" | "continue"
},
): boolean {
const initialCompletionPromise =
loopOptions?.completionPromise ??
DEFAULT_COMPLETION_PROMISE
const state: RalphLoopState = {
active: true,
iteration: 1,
-max_iterations:
-loopOptions?.maxIterations ??
-config?.default_max_iterations ??
-DEFAULT_MAX_ITERATIONS,
+max_iterations: loopOptions?.ultrawork
+? undefined
+: loopOptions?.maxIterations ??
+config?.default_max_iterations ??
+DEFAULT_MAX_ITERATIONS,
message_count_at_start: loopOptions?.messageCountAtStart,
-completion_promise:
-loopOptions?.completionPromise ??
-DEFAULT_COMPLETION_PROMISE,
+completion_promise: initialCompletionPromise,
initial_completion_promise: initialCompletionPromise,
verification_attempt_id: undefined,
verification_session_id: undefined,
ultrawork: loopOptions?.ultrawork,
verification_pending: undefined,
strategy: loopOptions?.strategy ?? config?.default_strategy ?? "continue",
started_at: new Date().toISOString(),
prompt,
@@ -109,5 +116,62 @@ export function createLoopStateController(options: {
return state
},
markVerificationPending(sessionID: string): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork) {
return null
}
state.verification_pending = true
state.completion_promise = ULTRAWORK_VERIFICATION_PROMISE
state.verification_attempt_id = undefined
state.verification_session_id = undefined
state.initial_completion_promise ??= DEFAULT_COMPLETION_PROMISE
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
setVerificationSessionID(sessionID: string, verificationSessionID: string): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork || !state.verification_pending) {
return null
}
state.verification_session_id = verificationSessionID
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
restartAfterFailedVerification(sessionID: string, messageCountAtStart?: number): RalphLoopState | null {
const state = readState(directory, stateDir)
if (!state || state.session_id !== sessionID || !state.ultrawork || !state.verification_pending) {
return null
}
state.iteration += 1
state.started_at = new Date().toISOString()
state.completion_promise = state.initial_completion_promise ?? DEFAULT_COMPLETION_PROMISE
state.verification_pending = undefined
state.verification_attempt_id = undefined
state.verification_session_id = undefined
if (typeof messageCountAtStart === "number") {
state.message_count_at_start = messageCountAtStart
}
if (!writeState(directory, state, stateDir)) {
return null
}
return state
},
}
}

View File

@@ -2,11 +2,14 @@ import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import type { RalphLoopOptions, RalphLoopState } from "./types"
import { HOOK_NAME } from "./constants"
import { handleDetectedCompletion } from "./completion-handler"
import {
detectCompletionInSessionMessages,
detectCompletionInTranscript,
} from "./completion-promise-detector"
import { continueIteration } from "./iteration-continuation"
import { handleDeletedLoopSession, handleErroredLoopSession } from "./session-event-handler"
import { handleFailedVerification } from "./verification-failure-handler"
type SessionRecovery = {
isRecovering: (sessionID: string) => boolean
@@ -18,6 +21,9 @@ type LoopStateController = {
clear: () => boolean
incrementIteration: () => RalphLoopState | null
setSessionID: (sessionID: string) => RalphLoopState | null
markVerificationPending: (sessionID: string) => RalphLoopState | null
setVerificationSessionID: (sessionID: string, verificationSessionID: string) => RalphLoopState | null
restartAfterFailedVerification: (sessionID: string, messageCountAtStart?: number) => RalphLoopState | null
}
type RalphLoopEventHandlerOptions = { directory: string; apiTimeoutMs: number; getTranscriptPath: (sessionID: string) => string | undefined; checkSessionExists?: RalphLoopOptions["checkSessionExists"]; sessionRecovery: SessionRecovery; loopState: LoopStateController }
@@ -53,7 +59,13 @@ export function createRalphLoopEventHandler(
return
}
if (state.session_id && state.session_id !== sessionID) {
const verificationSessionID = state.verification_pending
? state.verification_session_id
: undefined
const matchesParentSession = state.session_id === undefined || state.session_id === sessionID
const matchesVerificationSession = verificationSessionID === sessionID
if (!matchesParentSession && !matchesVerificationSession && state.session_id) {
if (options.checkSessionExists) {
try {
const exists = await options.checkSessionExists(state.session_id)
@@ -75,10 +87,27 @@ export function createRalphLoopEventHandler(
return
}
-const transcriptPath = options.getTranscriptPath(sessionID)
-const completionViaTranscript = detectCompletionInTranscript(transcriptPath, state.completion_promise)
+const completionSessionID = verificationSessionID ?? (state.verification_pending ? undefined : sessionID)
+const transcriptPath = completionSessionID ? options.getTranscriptPath(completionSessionID) : undefined
+const completionViaTranscript = completionSessionID
+? detectCompletionInTranscript(
+transcriptPath,
+state.completion_promise,
+state.started_at,
+)
+: false
const completionViaApi = completionViaTranscript
? false
: verificationSessionID
? await detectCompletionInSessionMessages(ctx, {
sessionID: verificationSessionID,
promise: state.completion_promise,
apiTimeoutMs: options.apiTimeoutMs,
directory: options.directory,
sinceMessageIndex: undefined,
})
: state.verification_pending
? false
: await detectCompletionInSessionMessages(ctx, {
sessionID,
promise: state.completion_promise,
@@ -96,15 +125,41 @@ export function createRalphLoopEventHandler(
? "transcript_file"
: "session_messages_api",
})
-options.loopState.clear()
-const title = state.ultrawork ? "ULTRAWORK LOOP COMPLETE!" : "Ralph Loop Complete!"
-const message = state.ultrawork ? `JUST ULW ULW! Task completed after ${state.iteration} iteration(s)` : `Task completed after ${state.iteration} iteration(s)`
-await ctx.client.tui?.showToast?.({ body: { title, message, variant: "success", duration: 5000 } }).catch(() => {})
+await handleDetectedCompletion(ctx, {
+sessionID,
+state,
+loopState: options.loopState,
+directory: options.directory,
+apiTimeoutMs: options.apiTimeoutMs,
+})
return
}
-if (state.iteration >= state.max_iterations) {
if (state.verification_pending) {
if (verificationSessionID && matchesVerificationSession) {
const restarted = await handleFailedVerification(ctx, {
state,
loopState: options.loopState,
directory: options.directory,
apiTimeoutMs: options.apiTimeoutMs,
})
if (restarted) {
return
}
}
log(`[${HOOK_NAME}] Waiting for oracle verification`, {
sessionID,
verificationSessionID,
iteration: state.iteration,
})
return
}
+if (
+typeof state.max_iterations === "number"
+&& state.iteration >= state.max_iterations
+) {
log(`[${HOOK_NAME}] Max iterations reached`, {
sessionID,
iteration: state.iteration,
@@ -133,7 +188,7 @@ export function createRalphLoopEventHandler(
await ctx.client.tui?.showToast?.({
body: {
title: "Ralph Loop",
-message: `Iteration ${newState.iteration}/${newState.max_iterations}`,
+message: `Iteration ${newState.iteration}/${typeof newState.max_iterations === "number" ? newState.max_iterations : "unbounded"}`,
variant: "info",
duration: 2000,
},
@@ -159,36 +214,12 @@ export function createRalphLoopEventHandler(
}
if (event.type === "session.deleted") {
-const sessionInfo = props?.info as { id?: string } | undefined
-if (!sessionInfo?.id) return
-const state = options.loopState.getState()
-if (state?.session_id === sessionInfo.id) {
-options.loopState.clear()
-log(`[${HOOK_NAME}] Session deleted, loop cleared`, { sessionID: sessionInfo.id })
-}
-options.sessionRecovery.clear(sessionInfo.id)
+if (!handleDeletedLoopSession(props, options.loopState, options.sessionRecovery)) return
return
}
if (event.type === "session.error") {
-const sessionID = props?.sessionID as string | undefined
-const error = props?.error as { name?: string } | undefined
-if (error?.name === "MessageAbortedError") {
-if (sessionID) {
-const state = options.loopState.getState()
-if (state?.session_id === sessionID) {
-options.loopState.clear()
-log(`[${HOOK_NAME}] User aborted, loop cleared`, { sessionID })
-}
-options.sessionRecovery.clear(sessionID)
-}
-return
-}
-if (sessionID) {
-options.sessionRecovery.markRecovering(sessionID)
-}
+handleErroredLoopSession(props, options.loopState, options.sessionRecovery)
}
}
}
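The `completionSessionID` selection in the handler above can be condensed: once verification is pending, only the tracked Oracle session may produce the completion promise; before that, the parent session itself is checked. A simplified sketch (the function name and input shape are illustrative, not the handler's actual API):

```typescript
// Returns the session whose transcript may satisfy the completion promise,
// or undefined when the loop must keep waiting (verification pending but
// no oracle session tracked yet).
function pickCompletionSessionID(input: {
  sessionID: string
  verificationPending?: boolean
  verificationSessionID?: string
}): string | undefined {
  if (input.verificationPending) return input.verificationSessionID
  return input.sessionID
}

pickCompletionSessionID({ sessionID: "ses-1" }) // "ses-1"
pickCompletionSessionID({ sessionID: "ses-1", verificationPending: true }) // undefined: keep waiting
pickCompletionSessionID({
  sessionID: "ses-1",
  verificationPending: true,
  verificationSessionID: "ses-oracle",
}) // "ses-oracle"
```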

View File

@@ -0,0 +1,56 @@
import { log } from "../../shared/logger"
import { HOOK_NAME } from "./constants"
import type { RalphLoopState } from "./types"
type LoopStateController = {
getState: () => RalphLoopState | null
clear: () => boolean
}
type SessionRecovery = {
clear: (sessionID: string) => void
markRecovering: (sessionID: string) => void
}
export function handleDeletedLoopSession(
props: Record<string, unknown> | undefined,
loopState: LoopStateController,
sessionRecovery: SessionRecovery,
): boolean {
const sessionInfo = props?.info as { id?: string } | undefined
if (!sessionInfo?.id) return false
const state = loopState.getState()
if (state?.session_id === sessionInfo.id) {
loopState.clear()
log(`[${HOOK_NAME}] Session deleted, loop cleared`, { sessionID: sessionInfo.id })
}
sessionRecovery.clear(sessionInfo.id)
return true
}
export function handleErroredLoopSession(
props: Record<string, unknown> | undefined,
loopState: LoopStateController,
sessionRecovery: SessionRecovery,
): boolean {
const sessionID = props?.sessionID as string | undefined
const error = props?.error as { name?: string } | undefined
if (error?.name === "MessageAbortedError") {
if (sessionID) {
const state = loopState.getState()
if (state?.session_id === sessionID) {
loopState.clear()
log(`[${HOOK_NAME}] User aborted, loop cleared`, { sessionID })
}
sessionRecovery.clear(sessionID)
}
return true
}
if (sessionID) {
sessionRecovery.markRecovering(sessionID)
}
return true
}
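The two extracted handlers implement a simple triage: a `MessageAbortedError` (user abort) tears the loop down, while any other session error only flags the session for recovery. The decision table can be sketched under assumed names (`triageSessionError` is illustrative, not part of this module):

```typescript
type Action = "clear_loop" | "mark_recovering" | "ignore"

// Condensed decision table of handleErroredLoopSession (illustrative):
// abort => clear, other error => recover, no session ID => do nothing.
function triageSessionError(sessionID?: string, errorName?: string): Action {
  if (errorName === "MessageAbortedError") {
    return sessionID ? "clear_loop" : "ignore"
  }
  return sessionID ? "mark_recovering" : "ignore"
}

triageSessionError("ses-1", "MessageAbortedError") // "clear_loop"
triageSessionError("ses-1", "ProviderError") // "mark_recovering"
triageSessionError(undefined, "ProviderError") // "ignore"
```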

View File

@@ -40,10 +40,18 @@ export function readState(directory: string, customPath?: string): RalphLoopStat
return str.replace(/^["']|["']$/g, "")
}
const ultrawork = data.ultrawork === true || data.ultrawork === "true" ? true : undefined
const maxIterations =
data.max_iterations === undefined || data.max_iterations === ""
? ultrawork
? undefined
: DEFAULT_MAX_ITERATIONS
: Number(data.max_iterations) || DEFAULT_MAX_ITERATIONS
return {
active: isActive,
iteration: iterationNum,
-max_iterations: Number(data.max_iterations) || DEFAULT_MAX_ITERATIONS,
+max_iterations: maxIterations,
message_count_at_start:
typeof data.message_count_at_start === "number"
? data.message_count_at_start
@@ -51,10 +59,23 @@ export function readState(directory: string, customPath?: string): RalphLoopStat
? Number(data.message_count_at_start)
: undefined,
completion_promise: stripQuotes(data.completion_promise) || DEFAULT_COMPLETION_PROMISE,
initial_completion_promise: data.initial_completion_promise
? stripQuotes(data.initial_completion_promise)
: undefined,
verification_attempt_id: data.verification_attempt_id
? stripQuotes(data.verification_attempt_id)
: undefined,
verification_session_id: data.verification_session_id
? stripQuotes(data.verification_session_id)
: undefined,
started_at: stripQuotes(data.started_at) || new Date().toISOString(),
prompt: body.trim(),
session_id: data.session_id ? stripQuotes(data.session_id) : undefined,
-ultrawork: data.ultrawork === true || data.ultrawork === "true" ? true : undefined,
+ultrawork,
verification_pending:
data.verification_pending === true || data.verification_pending === "true"
? true
: undefined,
strategy: data.strategy === "reset" || data.strategy === "continue" ? data.strategy : undefined,
}
} catch {
@@ -77,18 +98,34 @@ export function writeState(
const sessionIdLine = state.session_id ? `session_id: "${state.session_id}"\n` : ""
const ultraworkLine = state.ultrawork !== undefined ? `ultrawork: ${state.ultrawork}\n` : ""
const verificationPendingLine =
state.verification_pending !== undefined
? `verification_pending: ${state.verification_pending}\n`
: ""
const strategyLine = state.strategy ? `strategy: "${state.strategy}"\n` : ""
const initialCompletionPromiseLine = state.initial_completion_promise
? `initial_completion_promise: "${state.initial_completion_promise}"\n`
: ""
const verificationAttemptLine = state.verification_attempt_id
? `verification_attempt_id: "${state.verification_attempt_id}"\n`
: ""
const verificationSessionLine = state.verification_session_id
? `verification_session_id: "${state.verification_session_id}"\n`
: ""
const messageCountAtStartLine =
typeof state.message_count_at_start === "number"
? `message_count_at_start: ${state.message_count_at_start}\n`
: ""
const maxIterationsLine =
typeof state.max_iterations === "number"
? `max_iterations: ${state.max_iterations}\n`
: ""
const content = `---
active: ${state.active}
iteration: ${state.iteration}
-max_iterations: ${state.max_iterations}
-completion_promise: "${state.completion_promise}"
-started_at: "${state.started_at}"
-${sessionIdLine}${ultraworkLine}${strategyLine}${messageCountAtStartLine}---
+${maxIterationsLine}completion_promise: "${state.completion_promise}"
+${initialCompletionPromiseLine}${verificationAttemptLine}${verificationSessionLine}started_at: "${state.started_at}"
+${sessionIdLine}${ultraworkLine}${verificationPendingLine}${strategyLine}${messageCountAtStartLine}---
${state.prompt}
`
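`writeState` serializes optional fields by emitting either a full `key: value` line (with trailing newline) or an empty string, so absent fields disappear from the frontmatter instead of being written as `undefined`. A sketch of that pattern over a subset of the real state shape (`optionalFrontmatterLines` is an illustrative helper):

```typescript
type StateSubset = { max_iterations?: number; verification_pending?: boolean }

// Each optional field contributes a whole frontmatter line or nothing at all,
// so older state files that never had these keys still parse unchanged.
function optionalFrontmatterLines(state: StateSubset): string {
  const maxIterationsLine =
    typeof state.max_iterations === "number"
      ? `max_iterations: ${state.max_iterations}\n`
      : ""
  const verificationPendingLine =
    state.verification_pending !== undefined
      ? `verification_pending: ${state.verification_pending}\n`
      : ""
  return `${maxIterationsLine}${verificationPendingLine}`
}

optionalFrontmatterLines({ max_iterations: 100 }) // "max_iterations: 100\n"
optionalFrontmatterLines({ verification_pending: true }) // "verification_pending: true\n"
optionalFrontmatterLines({}) // "": unbounded ultrawork loop, no verification yet
```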

View File

@@ -3,13 +3,17 @@ import type { RalphLoopConfig } from "../../config"
export interface RalphLoopState {
active: boolean
iteration: number
-max_iterations: number
+max_iterations?: number
message_count_at_start?: number
completion_promise: string
initial_completion_promise?: string
verification_attempt_id?: string
verification_session_id?: string
started_at: string
prompt: string
session_id?: string
ultrawork?: boolean
verification_pending?: boolean
strategy?: "reset" | "continue"
}

View File

@@ -0,0 +1,297 @@
import { afterEach, beforeEach, describe, expect, test } from "bun:test"
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"
import { tmpdir } from "node:os"
import { join } from "node:path"
import { createRalphLoopHook } from "./index"
import { ULTRAWORK_VERIFICATION_PROMISE } from "./constants"
import { clearState, writeState } from "./storage"
describe("ulw-loop verification", () => {
const testDir = join(tmpdir(), `ulw-loop-verification-${Date.now()}`)
let promptCalls: Array<{ sessionID: string; text: string }>
let toastCalls: Array<{ title: string; message: string; variant: string }>
let parentTranscriptPath: string
let oracleTranscriptPath: string
function createMockPluginInput() {
return {
client: {
session: {
promptAsync: async (opts: { path: { id: string }; body: { parts: Array<{ type: string; text: string }> } }) => {
promptCalls.push({
sessionID: opts.path.id,
text: opts.body.parts[0].text,
})
return {}
},
messages: async () => ({ data: [] }),
},
tui: {
showToast: async (opts: { body: { title: string; message: string; variant: string } }) => {
toastCalls.push(opts.body)
return {}
},
},
},
directory: testDir,
} as unknown as Parameters<typeof createRalphLoopHook>[0]
}
beforeEach(() => {
promptCalls = []
toastCalls = []
parentTranscriptPath = join(testDir, "transcript-parent.jsonl")
oracleTranscriptPath = join(testDir, "transcript-oracle.jsonl")
if (!existsSync(testDir)) {
mkdirSync(testDir, { recursive: true })
}
clearState(testDir)
})
afterEach(() => {
clearState(testDir)
if (existsSync(testDir)) {
rmSync(testDir, { recursive: true, force: true })
}
})
test("#given ulw loop emits DONE #when idle fires #then verification phase starts instead of completing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.verification_pending).toBe(true)
expect(hook.getState()?.completion_promise).toBe(ULTRAWORK_VERIFICATION_PROMISE)
expect(hook.getState()?.verification_session_id).toBeUndefined()
expect(promptCalls).toHaveLength(1)
expect(promptCalls[0].text).toContain('task(subagent_type="oracle"')
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(false)
})
test("#given ulw loop is awaiting verification #when VERIFIED appears in oracle session #then loop completes", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `verified <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()).toBeNull()
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(true)
})
test("#given ulw loop is awaiting verification #when oracle session idles with VERIFIED #then loop completes without parent idle", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `verified <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "ses-oracle" } } })
expect(hook.getState()).toBeNull()
expect(toastCalls.some((toast) => toast.title === "ULTRAWORK LOOP COMPLETE!")).toBe(true)
})
test("#given ulw loop is awaiting verification without oracle session #when idle fires again #then loop waits instead of continuing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
const stateAfterDone = hook.getState()
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(stateAfterDone?.iteration)
expect(promptCalls).toHaveLength(1)
expect(hook.getState()?.verification_pending).toBe(true)
})
test("#given ulw loop is awaiting oracle verification #when oracle has not verified yet #then loop waits instead of continuing", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "still checking" } })}\n`,
)
const stateBeforeWait = hook.getState()
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(stateBeforeWait?.iteration)
expect(promptCalls).toHaveLength(1)
expect(hook.getState()?.verification_session_id).toBe("ses-oracle")
})
test("#given oracle verification fails #when oracle session idles #then main session receives retry instructions", async () => {
const sessionMessages: Record<string, unknown[]> = {
"session-123": [{}, {}, {}],
}
const hook = createRalphLoopHook({
...createMockPluginInput(),
client: {
...createMockPluginInput().client,
session: {
...createMockPluginInput().client.session,
messages: async (opts: { path: { id: string } }) => ({
data: sessionMessages[opts.path.id] ?? [],
}),
},
},
} as Parameters<typeof createRalphLoopHook>[0], {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeState(testDir, {
...hook.getState()!,
verification_session_id: "ses-oracle",
})
writeFileSync(
oracleTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "verification failed: missing tests" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "ses-oracle" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.completion_promise).toBe("DONE")
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(hook.getState()?.verification_session_id).toBeUndefined()
expect(hook.getState()?.message_count_at_start).toBe(3)
expect(promptCalls).toHaveLength(2)
expect(promptCalls[1]?.sessionID).toBe("session-123")
expect(promptCalls[1]?.text).toContain("Verification failed")
expect(promptCalls[1]?.text).toContain("Oracle does not lie")
expect(promptCalls[1]?.text).toContain('task(subagent_type="oracle"')
})
test("#given ulw loop without max iterations #when it continues #then it stays unbounded", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.max_iterations).toBeUndefined()
expect(promptCalls[0].text).toContain("2/unbounded")
})
test("#given prior transcript completion from older run #when new ulw loop starts #then old completion is ignored", async () => {
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: "2000-01-01T00:00:00.000Z", tool_output: { output: "old <promise>DONE</promise>" } })}\n`,
)
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()?.iteration).toBe(2)
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(promptCalls).toHaveLength(1)
})
test("#given ulw loop was awaiting verification #when same session starts again #then verification state is overwritten", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
hook.startLoop("session-123", "Restarted task", { ultrawork: true })
expect(hook.getState()?.prompt).toBe("Restarted task")
expect(hook.getState()?.verification_pending).toBeUndefined()
expect(hook.getState()?.completion_promise).toBe("DONE")
})
test("#given parent session emits VERIFIED #when oracle session is not tracked #then ulw loop does not complete", async () => {
const hook = createRalphLoopHook(createMockPluginInput(), {
getTranscriptPath: (sessionID) => sessionID === "ses-oracle" ? oracleTranscriptPath : parentTranscriptPath,
})
hook.startLoop("session-123", "Build API", { ultrawork: true })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
writeFileSync(
parentTranscriptPath,
`${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: "done <promise>DONE</promise>" } })}\n${JSON.stringify({ type: "tool_result", timestamp: new Date().toISOString(), tool_output: { output: `bad parent leak <promise>${ULTRAWORK_VERIFICATION_PROMISE}</promise>` } })}\n`,
)
await hook.event({ event: { type: "session.idle", properties: { sessionID: "session-123" } } })
expect(hook.getState()).not.toBeNull()
expect(hook.getState()?.verification_pending).toBe(true)
})
})

View File

@@ -0,0 +1,99 @@
import type { PluginInput } from "@opencode-ai/plugin"
import { log } from "../../shared/logger"
import { buildVerificationFailurePrompt } from "./continuation-prompt-builder"
import { HOOK_NAME } from "./constants"
import { injectContinuationPrompt } from "./continuation-prompt-injector"
import type { RalphLoopState } from "./types"
type LoopStateController = {
restartAfterFailedVerification: (
sessionID: string,
messageCountAtStart?: number,
) => RalphLoopState | null
}
function getMessageCountFromResponse(messagesResponse: unknown): number {
if (Array.isArray(messagesResponse)) {
return messagesResponse.length
}
if (
typeof messagesResponse === "object"
&& messagesResponse !== null
&& "data" in messagesResponse
) {
const data = (messagesResponse as { data?: unknown }).data
return Array.isArray(data) ? data.length : 0
}
return 0
}
async function getSessionMessageCount(
ctx: PluginInput,
sessionID: string,
directory: string,
): Promise<number> {
const messagesResponse = await ctx.client.session.messages({
path: { id: sessionID },
query: { directory },
})
return getMessageCountFromResponse(messagesResponse)
}
export async function handleFailedVerification(
ctx: PluginInput,
input: {
state: RalphLoopState
directory: string
apiTimeoutMs: number
loopState: LoopStateController
},
): Promise<boolean> {
const { state, directory, apiTimeoutMs, loopState } = input
const parentSessionID = state.session_id
if (!parentSessionID) {
return false
}
let messageCountAtStart: number
try {
messageCountAtStart = await getSessionMessageCount(ctx, parentSessionID, directory)
} catch (error) {
log(`[${HOOK_NAME}] Failed to read parent session before verification retry`, {
parentSessionID,
error: String(error),
})
return false
}
const resumedState = loopState.restartAfterFailedVerification(
parentSessionID,
messageCountAtStart,
)
if (!resumedState) {
log(`[${HOOK_NAME}] Failed to restart loop after verification failure`, {
parentSessionID,
})
return false
}
await injectContinuationPrompt(ctx, {
sessionID: parentSessionID,
prompt: buildVerificationFailurePrompt(resumedState),
directory,
apiTimeoutMs,
})
await ctx.client.tui?.showToast?.({
body: {
title: "ULTRAWORK LOOP",
message: "Oracle verification failed. Continuing ULTRAWORK loop.",
variant: "warning",
duration: 5000,
},
}).catch(() => {})
return true
}
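The new handler above tolerates two shapes of `session.messages` response before deciding whether a verification retry is safe. As a standalone illustration (the function body is copied from the diff; the sample inputs are hypothetical), `getMessageCountFromResponse` accepts either a bare array of messages or an envelope object carrying a `data` array, and degrades to `0` for anything else:

```typescript
// Mirrors getMessageCountFromResponse from the diff above: a bare array
// of messages or an envelope with a `data` array both yield a count;
// any other shape falls back to 0 rather than throwing.
function getMessageCountFromResponse(messagesResponse: unknown): number {
  if (Array.isArray(messagesResponse)) {
    return messagesResponse.length
  }
  if (
    typeof messagesResponse === "object"
    && messagesResponse !== null
    && "data" in messagesResponse
  ) {
    const data = (messagesResponse as { data?: unknown }).data
    return Array.isArray(data) ? data.length : 0
  }
  return 0
}

// Sample calls with hypothetical message objects:
console.log(getMessageCountFromResponse([{ id: "m1" }, { id: "m2" }])) // 2
console.log(getMessageCountFromResponse({ data: [{ id: "m1" }] }))     // 1
console.log(getMessageCountFromResponse({ data: "oops" }))             // 0
console.log(getMessageCountFromResponse(null))                         // 0
```

This defensive fallback matters because `handleFailedVerification` returns `false` (and skips the restart) only when the messages call itself throws; a malformed-but-successful response simply counts as zero messages.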

View File

@@ -103,7 +103,7 @@ describe("runtime-fallback", () => {
await hook.event({
event: {
type: "session.created",
properties: { info: { id: sessionID, model: "openai/gpt-5.2" } },
properties: { info: { id: sessionID, model: "openai/gpt-5.4" } },
},
})
@@ -202,7 +202,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback for missing API key errors when fallback models are configured", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-missing-api-key-fallback"
SessionCategoryRegistry.register(sessionID, "test")
@@ -230,7 +230,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.4" })
})
test("should detect retryable error from message pattern 'rate limit'", async () => {
@@ -260,7 +260,7 @@ describe("runtime-fallback", () => {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback([
"anthropic/claude-opus-4.6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
})
const sessionID = "test-session-model-not-found"
@@ -302,7 +302,7 @@ describe("runtime-fallback", () => {
const fallbackLogs = logCalls.filter((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLogs.length).toBeGreaterThanOrEqual(2)
expect(fallbackLogs[1]?.data).toMatchObject({ from: "anthropic/claude-opus-4.6", to: "openai/gpt-5.2" })
expect(fallbackLogs[1]?.data).toMatchObject({ from: "anthropic/claude-opus-4.6", to: "openai/gpt-5.4" })
const nonRetryLog = logCalls.find(
(c) => c.msg.includes("Error not retryable") && (c.data as { sessionID?: string } | undefined)?.sessionID === sessionID
@@ -313,7 +313,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback on Copilot auto-retry signal in message.updated", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-copilot-auto-retry"
@@ -346,7 +346,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "github-copilot/claude-opus-4.6", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "github-copilot/claude-opus-4.6", to: "openai/gpt-5.4" })
})
test("should trigger fallback on OpenAI auto-retry signal in message.updated", async () => {
@@ -658,7 +658,7 @@ describe("runtime-fallback", () => {
test("should trigger fallback when message.updated has missing API key error without model", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-message-updated-missing-model"
SessionCategoryRegistry.register(sessionID, "test")
@@ -689,7 +689,7 @@ describe("runtime-fallback", () => {
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "google/gemini-2.5-pro", to: "openai/gpt-5.4" })
})
test("should not advance fallback state from message.updated while retry is already in flight", async () => {
@@ -709,7 +709,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
}
)
@@ -799,7 +799,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
}
)
@@ -883,7 +883,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -949,7 +949,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1034,7 +1034,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1099,7 +1099,7 @@ describe("runtime-fallback", () => {
pluginConfig: createMockPluginConfigWithCategoryFallback([
"github-copilot/claude-opus-4.6",
"anthropic/claude-opus-4-6",
"openai/gpt-5.2",
"openai/gpt-5.4",
]),
session_timeout_ms: 20,
}
@@ -1637,7 +1637,7 @@ describe("runtime-fallback", () => {
}),
{
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
}
)
@@ -1665,7 +1665,7 @@ describe("runtime-fallback", () => {
},
})
expect(retriedModels).toContain("openai/gpt-5.2")
expect(retriedModels).toContain("openai/gpt-5.4")
})
test("triggers fallback when message has mixed text and error parts", async () => {
@@ -1745,7 +1745,7 @@ describe("runtime-fallback", () => {
}),
{
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
}
)
@@ -1841,7 +1841,7 @@ describe("runtime-fallback", () => {
test("should apply fallback model on next chat.message after error", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2", "google/gemini-3.1-pro"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4", "google/gemini-3.1-pro"]),
})
const sessionID = "test-session-switch"
SessionCategoryRegistry.register(sessionID, "test")
@@ -1871,13 +1871,13 @@ describe("runtime-fallback", () => {
output
)
expect(output.message.model).toEqual({ providerID: "openai", modelID: "gpt-5.2" })
expect(output.message.model).toEqual({ providerID: "openai", modelID: "gpt-5.4" })
})
test("should notify when fallback occurs", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: true }),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithCategoryFallback(["openai/gpt-5.4"]),
})
const sessionID = "test-session-notify"
SessionCategoryRegistry.register(sessionID, "test")
@@ -1897,7 +1897,7 @@ describe("runtime-fallback", () => {
})
expect(toastCalls.length).toBe(1)
expect(toastCalls[0]?.message.includes("gpt-5.2")).toBe(true)
expect(toastCalls[0]?.message.includes("gpt-5.4")).toBe(true)
})
})
@@ -1916,7 +1916,7 @@ describe("runtime-fallback", () => {
const input = createMockPluginInput()
const hook = createRuntimeFallbackHook(input, {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithAgentFallback("oracle", ["openai/gpt-5.2", "google/gemini-3.1-pro"]),
pluginConfig: createMockPluginConfigWithAgentFallback("oracle", ["openai/gpt-5.4", "google/gemini-3.1-pro"]),
})
const sessionID = "test-agent-fallback"
@@ -1936,16 +1936,16 @@ describe("runtime-fallback", () => {
},
})
//#then - should prepare fallback to openai/gpt-5.2
//#then - should prepare fallback to openai/gpt-5.4
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ from: "anthropic/claude-opus-4-5", to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ from: "anthropic/claude-opus-4-5", to: "openai/gpt-5.4" })
})
test("should detect agent from sessionID pattern", async () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithAgentFallback("sisyphus", ["openai/gpt-5.2"]),
pluginConfig: createMockPluginConfigWithAgentFallback("sisyphus", ["openai/gpt-5.4"]),
})
const sessionID = "sisyphus-session-123"
@@ -1966,7 +1966,7 @@ describe("runtime-fallback", () => {
//#then - should detect sisyphus from sessionID and use its fallback
const fallbackLog = logCalls.find((c) => c.msg.includes("Preparing fallback"))
expect(fallbackLog).toBeDefined()
expect(fallbackLog?.data).toMatchObject({ to: "openai/gpt-5.2" })
expect(fallbackLog?.data).toMatchObject({ to: "openai/gpt-5.4" })
})
test("should preserve resolved agent during auto-retry", async () => {
@@ -2019,7 +2019,7 @@ describe("runtime-fallback", () => {
const hook = createRuntimeFallbackHook(createMockPluginInput(), {
config: createMockConfig({ cooldown_seconds: 60, notify_on_fallback: false }),
pluginConfig: createMockPluginConfigWithCategoryFallback([
"openai/gpt-5.2",
"openai/gpt-5.4",
"anthropic/claude-opus-4-5",
]),
})

View File

@@ -70,7 +70,7 @@ describe("createThinkModeHook", () => {
const input = createHookInput({
sessionID,
providerID: "github-copilot",
modelID: "gpt-5.2",
modelID: "gpt-5.4",
})
const output = createHookOutput("ultrathink about this")
@@ -81,7 +81,7 @@ describe("createThinkModeHook", () => {
expect(output.message.variant).toBe("high")
expect(output.message.model).toEqual({
providerID: "github-copilot",
modelID: "gpt-5-2-high",
modelID: "gpt-5-4-high",
})
})

View File

@@ -32,11 +32,11 @@ describe("think-mode switcher", () => {
})
it("should handle dots in GPT version numbers", () => {
// given a GPT model ID with dot format (gpt-5.2)
const variant = getHighVariant("gpt-5.2")
// given a GPT model ID with dot format (gpt-5.4)
const variant = getHighVariant("gpt-5.4")
// then should return high variant
expect(variant).toBe("gpt-5-2-high")
expect(variant).toBe("gpt-5-4-high")
})
it("should handle dots in GPT-5.1 codex variants", () => {
@@ -60,7 +60,7 @@ describe("think-mode switcher", () => {
it("should return null for already-high variants", () => {
// given model IDs that are already high variants
expect(getHighVariant("claude-opus-4-6-high")).toBeNull()
expect(getHighVariant("gpt-5-2-high")).toBeNull()
expect(getHighVariant("gpt-5-4-high")).toBeNull()
expect(getHighVariant("gemini-3-1-pro-high")).toBeNull()
})
@@ -76,20 +76,20 @@ describe("think-mode switcher", () => {
it("should detect -high suffix", () => {
// given model IDs with -high suffix
expect(isAlreadyHighVariant("claude-opus-4-6-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5-2-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5-4-high")).toBe(true)
expect(isAlreadyHighVariant("gemini-3.1-pro-high")).toBe(true)
})
it("should detect -high suffix after normalization", () => {
// given model IDs with dots that end in -high
expect(isAlreadyHighVariant("gpt-5.2-high")).toBe(true)
expect(isAlreadyHighVariant("gpt-5.4-high")).toBe(true)
})
it("should return false for base models", () => {
// given base model IDs without -high suffix
expect(isAlreadyHighVariant("claude-opus-4-6")).toBe(false)
expect(isAlreadyHighVariant("claude-opus-4.6")).toBe(false)
expect(isAlreadyHighVariant("gpt-5.2")).toBe(false)
expect(isAlreadyHighVariant("gpt-5.4")).toBe(false)
expect(isAlreadyHighVariant("gemini-3.1-pro")).toBe(false)
})
@@ -111,10 +111,10 @@ describe("think-mode switcher", () => {
it("should preserve openai/ prefix when getting high variant", () => {
// given a model ID with openai/ prefix
const variant = getHighVariant("openai/gpt-5-2")
const variant = getHighVariant("openai/gpt-5-4")
// then should return high variant with prefix preserved
expect(variant).toBe("openai/gpt-5-2-high")
expect(variant).toBe("openai/gpt-5-4-high")
})
it("should handle prefixes with dots in version numbers", () => {
@@ -141,7 +141,7 @@ describe("think-mode switcher", () => {
it("should return null for already-high prefixed models", () => {
// given prefixed model IDs that are already high
expect(getHighVariant("vertex_ai/claude-opus-4-6-high")).toBeNull()
expect(getHighVariant("openai/gpt-5-2-high")).toBeNull()
expect(getHighVariant("openai/gpt-5-4-high")).toBeNull()
})
})
@@ -149,20 +149,20 @@ describe("think-mode switcher", () => {
it("should detect -high suffix in prefixed models", () => {
// given prefixed model IDs with -high suffix
expect(isAlreadyHighVariant("vertex_ai/claude-opus-4-6-high")).toBe(true)
expect(isAlreadyHighVariant("openai/gpt-5-2-high")).toBe(true)
expect(isAlreadyHighVariant("openai/gpt-5-4-high")).toBe(true)
expect(isAlreadyHighVariant("custom/gemini-3.1-pro-high")).toBe(true)
})
it("should return false for prefixed base models", () => {
// given prefixed base model IDs without -high suffix
expect(isAlreadyHighVariant("vertex_ai/claude-opus-4-6")).toBe(false)
expect(isAlreadyHighVariant("openai/gpt-5-2")).toBe(false)
expect(isAlreadyHighVariant("openai/gpt-5-4")).toBe(false)
})
it("should handle prefixed models with dots", () => {
// given prefixed model IDs with dots
expect(isAlreadyHighVariant("vertex_ai/gpt-5.2")).toBe(false)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.2-high")).toBe(true)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.4")).toBe(false)
expect(isAlreadyHighVariant("vertex_ai/gpt-5.4-high")).toBe(true)
})
})
})

View File

@@ -25,7 +25,7 @@ import { normalizeModelID } from "../../shared"
* @example
* extractModelPrefix("vertex_ai/claude-sonnet-4-6") // { prefix: "vertex_ai/", base: "claude-sonnet-4-6" }
* extractModelPrefix("claude-sonnet-4-6") // { prefix: "", base: "claude-sonnet-4-6" }
* extractModelPrefix("openai/gpt-5.2") // { prefix: "openai/", base: "gpt-5.2" }
* extractModelPrefix("openai/gpt-5.4") // { prefix: "openai/", base: "gpt-5.4" }
*/
function extractModelPrefix(modelID: string): { prefix: string; base: string } {
const slashIndex = modelID.indexOf("/")
@@ -61,10 +61,10 @@ const HIGH_VARIANT_MAP: Record<string, string> = {
"gpt-5-1-codex": "gpt-5-1-codex-high",
"gpt-5-1-codex-mini": "gpt-5-1-codex-mini-high",
"gpt-5-1-codex-max": "gpt-5-1-codex-max-high",
// GPT-5.2
"gpt-5-2": "gpt-5-2-high",
"gpt-5-2-chat-latest": "gpt-5-2-chat-latest-high",
"gpt-5-2-pro": "gpt-5-2-pro-high",
// GPT-5.4
"gpt-5-4": "gpt-5-4-high",
"gpt-5-4-chat-latest": "gpt-5-4-chat-latest-high",
"gpt-5-4-pro": "gpt-5-4-pro-high",
// Antigravity (Google)
"antigravity-gemini-3-1-pro": "antigravity-gemini-3-1-pro-high",
"antigravity-gemini-3-flash": "antigravity-gemini-3-flash-high",
@@ -97,4 +97,3 @@ export function isAlreadyHighVariant(modelID: string): boolean {
const { base } = extractModelPrefix(normalized)
return ALREADY_HIGH.has(base) || base.endsWith("-high")
}

View File

@@ -1345,8 +1345,8 @@ describe("todo-continuation-enforcer", () => {
// OpenCode returns assistant messages with flat modelID/providerID, not nested model object
const mockMessagesWithAssistant = [
{ info: { id: "msg-1", role: "user", agent: "sisyphus", model: { providerID: "openai", modelID: "gpt-5.2" } } },
{ info: { id: "msg-2", role: "assistant", agent: "sisyphus", modelID: "gpt-5.2", providerID: "openai" } },
{ info: { id: "msg-1", role: "user", agent: "sisyphus", model: { providerID: "openai", modelID: "gpt-5.4" } } },
{ info: { id: "msg-2", role: "assistant", agent: "sisyphus", modelID: "gpt-5.4", providerID: "openai" } },
]
const mockInput = {
@@ -1390,7 +1390,7 @@ describe("todo-continuation-enforcer", () => {
// then - model should be extracted from assistant message's flat modelID/providerID
expect(promptCalls.length).toBe(1)
expect(promptCalls[0].model).toEqual({ providerID: "openai", modelID: "gpt-5.2" })
expect(promptCalls[0].model).toEqual({ providerID: "openai", modelID: "gpt-5.4" })
})
// ============================================================

View File

@@ -12,7 +12,7 @@ describe("mergeConfigs", () => {
const base = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.5,
},
quick: {
@@ -35,7 +35,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
// then general.model should be preserved from base
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
// then general.temperature should be overridden
expect(result.categories?.general?.temperature).toBe(0.3);
// then quick should be preserved from base
@@ -48,7 +48,7 @@ describe("mergeConfigs", () => {
const base: OhMyOpenCodeConfig = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
},
},
};
@@ -57,7 +57,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
});
it("should use override categories when base has no categories", () => {
@@ -66,14 +66,14 @@ describe("mergeConfigs", () => {
const override: OhMyOpenCodeConfig = {
categories: {
general: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
},
},
};
const result = mergeConfigs(base, override);
expect(result.categories?.general?.model).toBe("openai/gpt-5.2");
expect(result.categories?.general?.model).toBe("openai/gpt-5.4");
});
});
@@ -81,7 +81,7 @@ describe("mergeConfigs", () => {
it("should deep merge agents", () => {
const base: OhMyOpenCodeConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
};
@@ -94,7 +94,7 @@ describe("mergeConfigs", () => {
const result = mergeConfigs(base, override);
expect(result.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result.agents?.oracle?.temperature).toBe(0.5);
expect(result.agents?.explore?.model).toBe("anthropic/claude-haiku-4-5");
});
@@ -127,8 +127,8 @@ describe("parseConfigPartially", () => {
it("should return the full config when everything is valid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
momus: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
momus: { model: "openai/gpt-5.4" },
},
disabled_hooks: ["comment-checker"],
};
@@ -136,8 +136,8 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.momus?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result!.agents?.momus?.model).toBe("openai/gpt-5.4");
expect(result!.disabled_hooks).toEqual(["comment-checker"]);
});
});
@@ -150,8 +150,8 @@ describe("parseConfigPartially", () => {
it("should preserve valid agent overrides when another section is invalid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
momus: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
momus: { model: "openai/gpt-5.4" },
prometheus: {
permission: {
edit: { "*": "ask", ".sisyphus/**": "allow" },
@@ -171,7 +171,7 @@ describe("parseConfigPartially", () => {
it("should preserve valid agents when a non-agent section is invalid", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
disabled_hooks: ["not-a-real-hook"],
};
@@ -179,7 +179,7 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect(result!.disabled_hooks).toEqual(["not-a-real-hook"]);
});
});
@@ -224,7 +224,7 @@ describe("parseConfigPartially", () => {
it("should ignore unknown keys and return valid sections", () => {
const rawConfig = {
agents: {
oracle: { model: "openai/gpt-5.2" },
oracle: { model: "openai/gpt-5.4" },
},
some_future_key: { foo: "bar" },
};
@@ -232,7 +232,7 @@ describe("parseConfigPartially", () => {
const result = parseConfigPartially(rawConfig);
expect(result).not.toBeNull();
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.2");
expect(result!.agents?.oracle?.model).toBe("openai/gpt-5.4");
expect((result as Record<string, unknown>)["some_future_key"]).toBeUndefined();
});
});

View File

@@ -656,7 +656,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"test-planning": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "xhigh",
},
},
@@ -698,7 +698,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"reasoning-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
reasoningEffort: "high",
},
},
@@ -739,7 +739,7 @@ describe("Prometheus direct override priority over category", () => {
},
categories: {
"temp-cat": {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
temperature: 0.8,
},
},
@@ -860,7 +860,7 @@ describe("Plan agent model inheritance from prometheus", () => {
test("plan agent inherits temperature, reasoningEffort, and other model settings from prometheus", async () => {
//#given - prometheus configured with category that has temperature and reasoningEffort
spyOn(shared, "resolveModelPipeline" as any).mockReturnValue({
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
provenance: "override",
variant: "high",
})
@@ -871,7 +871,7 @@ describe("Plan agent model inheritance from prometheus", () => {
},
agents: {
prometheus: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "high",
temperature: 0.3,
top_p: 0.9,
@@ -902,7 +902,7 @@ describe("Plan agent model inheritance from prometheus", () => {
const agents = config.agent as Record<string, Record<string, unknown>>
expect(agents.plan).toBeDefined()
expect(agents.plan.mode).toBe("subagent")
expect(agents.plan.model).toBe("openai/gpt-5.2")
expect(agents.plan.model).toBe("openai/gpt-5.4")
expect(agents.plan.variant).toBe("high")
expect(agents.plan.temperature).toBe(0.3)
expect(agents.plan.top_p).toBe(0.9)
@@ -913,7 +913,7 @@ describe("Plan agent model inheritance from prometheus", () => {
})
test("plan agent user override takes priority over prometheus inherited settings", async () => {
//#given - prometheus resolves to opus, but user has plan override for gpt-5.2
//#given - prometheus resolves to opus, but user has plan override for gpt-5.4
spyOn(shared, "resolveModelPipeline" as any).mockReturnValue({
model: "anthropic/claude-opus-4-6",
provenance: "provider-fallback",
@@ -926,7 +926,7 @@ describe("Plan agent model inheritance from prometheus", () => {
},
agents: {
plan: {
model: "openai/gpt-5.2",
model: "openai/gpt-5.4",
variant: "high",
temperature: 0.5,
},
@@ -950,7 +950,7 @@ describe("Plan agent model inheritance from prometheus", () => {
//#then - plan uses its own override, not prometheus settings
const agents = config.agent as Record<string, Record<string, unknown>>
expect(agents.plan.model).toBe("openai/gpt-5.2")
expect(agents.plan.model).toBe("openai/gpt-5.4")
expect(agents.plan.variant).toBe("high")
expect(agents.plan.temperature).toBe(0.5)
})

Some files were not shown because too many files have changed in this diff.