Compare commits

..

649 Commits

Author SHA1 Message Date
github-actions[bot]
3dc11ea620 @HOYALIM has signed the CLA in code-yeongyu/oh-my-openagent#2935 2026-03-29 07:31:49 +00:00
github-actions[bot]
5a28ee1bef @quangtran88 has signed the CLA in code-yeongyu/oh-my-openagent#2929 2026-03-29 03:21:54 +00:00
github-actions[bot]
5d4e57ce96 @lorenzo-dallamuta has signed the CLA in code-yeongyu/oh-my-openagent#2925 2026-03-28 21:43:40 +00:00
YeonGyu-Kim
b2497f1327 fix: resolve 3 community-reported bugs (#2915, #2917, #2918)
- background_output: snapshot read cursor before consuming, restore on
  /undo message removal so re-reads return data (fixes #2915)
- MCP loader: preserve oauth field in transformMcpServer, add scope/
  projectPath filtering so local-scoped MCPs only load in matching
  directories (fixes #2917)
- runtime-fallback: add 'reached your usage limit' to retryable error
  patterns so quota exhaustion triggers model fallback (fixes #2918)

Verified: bun test (4606 pass / 0 fail), tsc --noEmit clean
2026-03-29 04:53:43 +09:00
github-actions[bot]
9fc56ab544 @ryandielhenn has signed the CLA in code-yeongyu/oh-my-openagent#2919 2026-03-28 17:47:04 +00:00
github-actions[bot]
448a8dc93d @AlexDochioiu has signed the CLA in code-yeongyu/oh-my-openagent#2916 2026-03-28 12:20:53 +00:00
YeonGyu-Kim
9cbcf17dde Merge PR #2913: fix(recovery): ignore empty summary-only assistant messages 2026-03-28 17:15:04 +09:00
Ravi Tharuma
4e214cba4e fix(recovery): ignore empty summary-only assistant messages 2026-03-28 08:57:36 +01:00
YeonGyu-Kim
4a029258a4 fix: resolve 5 remaining pre-publish blockers (14, 15, 17, 21, 25c)
- completion-promise-detector: restrict to assistant text parts only,
  remove tool_result from completion detection (blocker 14)
- ralph-loop tests: flip tool_result completion expectations to negative
  coverage, add false-positive rejection tests (blocker 15)
- skill tools: merge nativeSkills into initial cachedDescription
  synchronously before any execute() call (blocker 17)
- skill tools test: add assertion for initial description including
  native skills before execute() (blocker 25c)
- docs: sync all 4 fallback-chain docs with model-requirements.ts
  runtime source of truth (blocker 21)

Verified: bun test (4599 pass / 0 fail), tsc --noEmit clean
2026-03-28 15:57:27 +09:00
YeonGyu-Kim
d2c576c510 fix: resolve 25 pre-publish blockers
- postinstall.mjs: fix alias package detection
- migrate-legacy-plugin-entry: dedupe + regression tests
- task_system: default consistency across runtime paths
- task() contract: consistent tool behavior
- runtime model selection, tool cap, stale-task cancellation
- recovery sanitization, context-limit gating
- Ralph semantic DONE hardening, Atlas fallback persistence
- native-skill description/content, skill path traversal guard
- publish workflow: platform awaited via reusable workflow job
- release: version edits reapplied before commit/tag
- JSONC plugin migration: top-level plugin key safety
- cold-cache: user fallback models skip disconnected providers
- docs/version/release framing updates

Verified: bun test (4599 pass), tsc --noEmit clean, bun run build clean
2026-03-28 15:24:18 +09:00
YeonGyu-Kim
44b039bef6 Merge pull request #2705 from MoerAI/fix/sisyphus-premature-implementation
fix(sisyphus): block premature implementation before context is complete (fixes #2274)
2026-03-28 01:46:31 +09:00
YeonGyu-Kim
aeec5ef98d Merge pull request #2773 from MoerAI/fix/ralph-loop-fuzzy-completion
fix(ralph-loop): add semantic completion detection as fallback for natural language (fixes #2489)
2026-03-28 01:44:56 +09:00
YeonGyu-Kim
7b2b8be181 Merge pull request #2771 from MoerAI/fix/bash-file-read-guard
fix(hooks): add bash-file-read-guard to warn agents against cat/head/tail (fixes #2096)
2026-03-28 01:44:44 +09:00
YeonGyu-Kim
adc55138c8 Merge pull request #2850 from octo-patch/feature/upgrade-minimax-m2.7
feat: upgrade remaining MiniMax M2.5 fallbacks to M2.7-highspeed
2026-03-28 01:42:25 +09:00
YeonGyu-Kim
3715fb79b9 Merge pull request #2871 from Jholly2008/kkk/fix-on-complete-hook-shell
fix(cli): respect platform shell for --on-complete
2026-03-28 01:42:13 +09:00
YeonGyu-Kim
b9ed0ca30b Merge pull request #2877 from WhiteGiverMa/fix/atlas-agent-not-found
fix: use getAgentDisplayName in injectBoulderContinuation
2026-03-28 01:40:38 +09:00
YeonGyu-Kim
45e9fcd776 Merge pull request #2894 from codivedev/fix/issue-2881
fix: detect and warn about opencode-skills conflict
2026-03-28 01:40:04 +09:00
YeonGyu-Kim
49687a654a Merge pull request #2895 from MoerAI/fix/session-recovery-missing-messageid
fix(session-recovery): fallback to fetching messageID from session messages (fixes #2046)
2026-03-28 01:39:44 +09:00
YeonGyu-Kim
364550038a Merge pull request #2892 from MoerAI/fix/subagent-model-config-ignored
fix(delegate-task): honor user model override in category-resolver cold cache (fixes #2712)
2026-03-28 01:39:33 +09:00
YeonGyu-Kim
9e6f2d9977 Merge pull request #2893 from MoerAI/fix/keyword-detector-silent-skips
fix(keyword-detector): add logging for silent skip paths (fixes #2058)
2026-03-28 01:39:20 +09:00
YeonGyu-Kim
4fa7d48c04 Merge pull request #2890 from MoerAI/fix/start-work-atlas-not-found
fix(start-work): gracefully handle missing Atlas agent (fixes #2132)
2026-03-28 01:38:53 +09:00
YeonGyu-Kim
b4a5189a07 Merge pull request #2889 from MoerAI/fix/git-master-config-ignored
fix(config): apply git_master defaults when section is missing (fixes #2040)
2026-03-28 01:38:38 +09:00
YeonGyu-Kim
631092461c Merge pull request #2891 from codivedev/fix/issue-2882
fix: use display name in runtime-fallback retry
2026-03-28 01:38:14 +09:00
YeonGyu-Kim
c5068d37d2 fix(#2885): add model_not_supported to RETRYABLE error patterns
model_not_supported errors from providers (e.g. OpenAI returning
{"error": {"code": "model_not_supported"}}) were not recognized as
retryable. Subagents would silently fail with no response, hanging the
parent session.

Fix:
- Add "model_not_supported", "model not supported", "model is not
  supported" to RETRYABLE_MESSAGE_PATTERNS in model-error-classifier.ts
- Add regex patterns to RETRYABLE_ERROR_PATTERNS in
  runtime-fallback/constants.ts to match "model ... is ... not ...
  supported" with flexible spacing
- Add regression test covering all three variations

Now model_not_supported errors trigger the normal fallback chain instead
of silent failure.
2026-03-28 00:42:52 +09:00
YeonGyu-Kim
1434313bd7 Merge pull request #2717 from gtg7784/fix/analyze-mode-load-skills-hint
Verified: tsc clean, 27/27 keyword-detector tests pass. Fix is correct — analyze-mode message was missing delegate_task required params which caused load_skills validation errors.
2026-03-28 00:29:53 +09:00
codivedev
38347a396e fix: address review comments - remove duplicates and respect skills config 2026-03-27 13:49:58 +01:00
codivedev
885d3a2462 fix: detect and warn about opencode-skills conflict 2026-03-27 13:40:00 +01:00
codivedev
b4d4d30fa8 fix: use display name in runtime-fallback retry 2026-03-27 13:39:50 +01:00
github-actions[bot]
9d9365901b @codivedev has signed the CLA in code-yeongyu/oh-my-openagent#2888 2026-03-27 12:39:50 +00:00
MoerAI
2b2b280895 fix: apply Zod defaults to empty config fallback 2026-03-27 21:30:56 +09:00
MoerAI
fee60d2def fix(session-recovery): fallback to fetching messageID from session messages (fixes #2046) 2026-03-27 21:26:09 +09:00
MoerAI
f030e0d78d fix(keyword-detector): add logging for silent skip paths (fixes #2058) 2026-03-27 21:24:03 +09:00
MoerAI
5d5eb46f19 fix(delegate-task): honor user model override in category-resolver cold cache (fixes #2712) 2026-03-27 21:21:17 +09:00
User
787ce99eda fix: detect and warn about opencode-skills conflict
When opencode-skills plugin is registered alongside oh-my-openagent,
all user skills are loaded twice, causing 'Duplicate tool names detected'
warnings and HTTP 400 errors.

This fix:
1. Detects if opencode-skills plugin is loaded in opencode.json
2. Emits a startup warning explaining the conflict
3. Suggests fixes: either remove opencode-skills or disable skills in oh-my-openagent

Fixes #2881
2026-03-27 13:21:08 +01:00
MoerAI
d09af86ea7 fix(start-work): gracefully handle missing Atlas agent (fixes #2132) 2026-03-27 21:13:44 +09:00
MoerAI
5b9b6eb0b8 fix(config): apply git_master defaults when section is missing (fixes #2040) 2026-03-27 21:00:01 +09:00
YeonGyu-Kim
324dbb119c fix(#2791): await session.abort() in all subagent completion/cancel paths
Fire-and-forget session.abort() calls during subagent completion left
dangling promises that raced with parent session teardown. In Bun on
WSL2/Linux, this triggered a StringImplShape assertion (SIGABRT) as
WebKit GC collected string data still referenced by the inflight request.

Fix: await session.abort() in all four completion/error paths:
- startTask promptAsync error handler (launch path)
- resume promptAsync error handler (resume path)
- cancelTask (explicit cancel path)
- tryCompleteTask (normal completion path)

Also marks the two .catch() error callbacks as async so the await is
valid.

Test: update session.deleted cascade test to flush two microtask rounds
since cancelTask now awaits abort before cleanupPendingByParent.
2026-03-27 19:57:57 +09:00
YeonGyu-Kim
ab0b084199 Merge pull request #2884 from RaviTharuma/fix/runtime-fallback-hook-isolation
Verified: bun test src/plugin/event.test.ts src/hooks/runtime-fallback/index.test.ts -- 68/68 pass. tsc clean.
2026-03-27 19:10:06 +09:00
YeonGyu-Kim
f1f099fde9 fix(#2849): resolve platform binaries using current package name
The installer wrapper and postinstall script still hardcoded the old
oh-my-opencode-* platform package family. When users installed the renamed
npm package oh-my-openagent (for example via npx oh-my-openagent install on
WSL2/Linux), the main package installed correctly but the wrapper looked for
nonexistent optional binaries like oh-my-opencode-linux-x64.

Fix:
- derive packageBaseName from package.json at runtime in wrapper/postinstall
- thread packageBaseName through platform package candidate resolution
- keep legacy oh-my-opencode default as fallback
- add regression test for renamed package family
2026-03-27 18:29:36 +09:00
YeonGyu-Kim
6662205646 fix(#2748): pass browserProvider into skill() discovery
skill-context already filtered browser-related skills using the configured
browser provider, but the skill tool rebuilt discovery without forwarding
browserProvider. That caused skills like agent-browser to be prompt-visible
while skill() could still fail to resolve them unless browser_automation_engine.provider
was explicitly threaded through both paths.

Fix:
- pass skillContext.browserProvider from tool-registry into createSkillTool
- extend SkillLoadOptions with browserProvider
- forward browserProvider to getAllSkills()
- add regression tests for execution and description visibility
2026-03-27 17:58:38 +09:00
YeonGyu-Kim
76bf269b39 fix(#2754): include native PluginInput skills in skill() discovery
The skill tool previously only merged disk-discovered skills and preloaded
options.skills, so skills registered natively via ctx.skills (for example
from config.skills.paths or other plugins) were prompt-visible but not
discoverable by skill().

Fix:
- pass ctx.skills from tool-registry into createSkillTool
- extend SkillLoadOptions with nativeSkills accessor
- merge nativeSkills.all() results into getSkills()
- add test covering native PluginInput skill discovery
2026-03-27 17:42:43 +09:00
Ravi Tharuma
3e4b988860 fix: isolate event hook failures during dispatch 2026-03-27 09:30:43 +01:00
YeonGyu-Kim
d3dbb4976e fix(#2854): enable task system by default (oracle/subagent delegation)
task_system now defaults to true instead of false. Users on opencode-go
or other plans were unable to invoke oracle or use subagent delegation
because experimental.task_system defaulted off.

The task system is the primary mechanism for oracle, explore, and other
subagent invocations — it should not be behind an experimental flag.

Users can still explicitly set experimental.task_system: false to opt out.
2026-03-27 17:03:33 +09:00
YeonGyu-Kim
ec7a2e3eae fix(#2857): prevent npm scoped package paths from being resolved as skill paths
resolveSkillPathReferences: add looksLikeFilePath() guard that requires
a file extension or trailing slash before resolving @scope/package
references. npm packages like @mycom/my_mcp_tools@beta were incorrectly
being rewritten to absolute paths in skill templates.

2 new tests.
2026-03-27 16:59:04 +09:00
YeonGyu-Kim
c41e59e9ab fix(#2825): secondary agents no longer pruned after 30 min of total runtime
TTL (pruneStaleTasksAndNotifications) now resets on last activity:
- Uses task.progress.lastUpdate as TTL anchor for running tasks
  (was always using startedAt, causing 30-min hard deadline)
- Added taskTtlMs config option for user-adjustable TTL
- Error message shows actual TTL duration, not hardcoded '30 minutes'
- 3 new tests for the new behavior
2026-03-27 16:06:38 +09:00
YeonGyu-Kim
3b4420bc23 fix(#2735): check model availability before using custom subagent default model
subagent-resolver: when falling back to matchedAgent.model for custom
subagents, verify the model is actually available via fuzzyMatchModel
before setting it as categoryModel. Prevents delegate_task from using
an unavailable model when the user's custom agent config references
a model they don't have access to.

Test updated to include the target model in available models mock.
2026-03-27 15:50:16 +09:00
YeonGyu-Kim
3be26cb97f fix(#2732): enhance notification for failed/crashed subagent tasks
- completedTaskSummaries now includes status and error info
- notifyParentSession: noReply=false for failed tasks so parent reacts
- Batch notification distinguishes successful vs failed/cancelled tasks
- notification-template updated to show task errors
- task-poller: session-gone tests (85 new lines)
- CI: add Bun shim to PATH for legacy plugin migration tests
2026-03-27 15:48:07 +09:00
YeonGyu-Kim
e22e13cd29 fix(#2732): detect crashed subagent sessions with shorter timeout
When a subagent session disappears from the status registry (process
crashed), the main agent was waiting the full stale timeout before
acting. Fix:

- Add sessionGoneTimeoutMs config option (default 60s, vs 30min normal)
- task-poller: use shorter timeout when session is gone from status
- manager: verify session existence when gone, fail crashed tasks
  immediately with descriptive error
- Add legacy-plugin-toast hook for #2823 migration warnings
- Update schema with new config option
2026-03-27 15:43:01 +09:00
YeonGyu-Kim
6a733c9dde fix(#2823): auto-migrate legacy plugin name and warn users at startup
- logLegacyPluginStartupWarning now emits console.warn (visible to user,
  not just log file) when oh-my-opencode is detected in opencode.json
- Auto-migrates opencode.json plugin entry from oh-my-opencode to
  oh-my-openagent (with backup)
- plugin-config.ts: add console.warn when loading legacy config filename
- test: 10 tests covering migration, console output, edge cases
2026-03-27 15:40:04 +09:00
YeonGyu-Kim
127626a122 fix(#2822): properly cleanup tmux sessions on process shutdown
Two issues fixed:
1. process-cleanup.ts used fire-and-forget void Promise for shutdown
   handlers — now properly collects and awaits all cleanup promises
   via Promise.allSettled, with dedup guard to prevent double cleanup
2. TmuxSessionManager was never registered for process cleanup —
   now registered in create-managers.ts via registerManagerForCleanup

Also fixed setTimeout().unref() which could let the process exit
before cleanup completes.
2026-03-27 15:23:48 +09:00
YeonGyu-Kim
5765168af4 fix(#2731): skip unauthenticated providers when resolving subagent model
Background subagents (explore/librarian) failed with auth errors because
resolveModelForDelegateTask() always picked the first fallback entry when
availableModels was empty — often an unauthenticated provider like xai
or opencode-go.

Fix: when connectedProvidersCache is populated, iterate fallback chain
and pick the first entry whose provider is in the connected set.
Legacy behavior preserved when cache is null (not yet populated).

- model-selection.ts: use readConnectedProvidersCache to filter fallback chain
- test: 4 new tests for connected-provider-aware resolution
2026-03-27 14:38:02 +09:00
github-actions[bot]
e65a0ed10d @WhiteGiverMa has signed the CLA in code-yeongyu/oh-my-openagent#2877 2026-03-27 05:26:49 +00:00
YeonGyu-Kim
041770ff42 fix(#2736): prevent infinite compaction loop by setting cooldown before try
lastCompactionTime was only set on successful compaction. When compaction
failed (rate limit, timeout, etc.), no cooldown was recorded, causing
immediate retries in an infinite loop.

Fix: set lastCompactionTime before the try block so both success and
failure respect the cooldown window.

- test: add failed-compaction cooldown enforcement test
- test: fix timeout retry test to advance past cooldown
2026-03-27 14:25:38 +09:00
WhiteGiverMa
a3b84ec5f9 fix: use getAgentDisplayName in injectBoulderContinuation
The agent parameter was using raw config key "atlas" but the SDK
expects the display name "Atlas (Plan Executor)". This caused
"Agent not found: 'atlas'" errors when auto-compact tried to
continue boulder execution.

Root cause: injectBoulderContinuation passed raw agent key to
session.promptAsync, but SDK's agent matching logic compares
against display names registered in the system.

Fix: Use getAgentDisplayName() to convert the config key to
the expected display name before passing to the SDK.
2026-03-27 13:22:58 +08:00
YeonGyu-Kim
7ce7a85768 fix(#2855): tmux health check fails across module instances in same process
The server-health module used module-level state for inProcessServerRunning,
which doesn't survive when Bun loads separate module instances in the same
process. Fix: use globalThis with Symbol.for key so the flag is truly
process-global.

- server-health.ts: replace module-level boolean with globalThis[Symbol.for()]
- export markServerRunningInProcess from tmux-utils barrel
- test: verify flag skips HTTP fetch, verify globalThis persistence
2026-03-27 14:17:06 +09:00
YeonGyu-Kim
19ab3b5656 fix(#2853): sync .sisyphus state from worktree to main repo before removal
When .sisyphus/ is gitignored, task state written during worktree execution
is lost when the worktree is removed. Fix:

- add worktree-sync.ts: syncSisyphusStateFromWorktree() copies .sisyphus/
  contents from worktree to main repo directory
- update start-work.ts template: documents the sync step as CRITICAL when
  worktree_path is set in boulder.json
- update work-with-pr/SKILL.md: adds explicit sync step before worktree removal
- export from boulder-state index
- test: 5 scenarios covering no-.sisyphus, nested dirs, overwrite stale state
2026-03-27 14:11:45 +09:00
YeonGyu-Kim
670d8ab175 fix(#2852): forward model overrides from categories/agent config to subagents
- fix(switcher): use lastIndexOf for multi-slash model IDs (e.g. aws/anthropic/claude-sonnet-4)
- fix(model-resolution): same lastIndexOf fix in doctor parseProviderModel
- fix(call-omo-agent): resolve model from agent config and forward to both
  background and sync executors via DelegatedModelConfig
- fix(subagent-resolver): inherit category model/variant when agent uses
  category reference without explicit model override
- test: add model override forwarding tests for call-omo-agent
- test: add multi-slash model ID test for switcher
2026-03-27 13:59:06 +09:00
YeonGyu-Kim
40a92138ea fix: resolve three open bugs (#2836, #2858, #2873)
- fix(context-limit): check modelContextLimitsCache for all Anthropic
  models, not just GA-model set; user config/cache wins over 200K default
  (fixes #2836)

- fix(agent-key-remapper): preserve config key aliases alongside display
  names so `opencode run --agent sisyphus` resolves correctly
  (fixes #2858)

- fix(tool-config): respect host permission.skill=deny by disabling
  skill/skill_mcp tools when host denies them (fixes #2873)

- test: update context-limit and agent-key-remapper tests to match new
  behavior
2026-03-27 13:32:18 +09:00
YeonGyu-Kim
a081ddcefb docs: update documentation for v3.13.1 feature changes
- Document object-style fallback_models with per-model settings
- Add model-settings compatibility normalization docs
- Document file:// URI support for prompt and prompt_append
- Add deterministic core-agent order (Tab cycling) docs
- Update rename compatibility notes (legacy plugin warning)
- Document doctor legacy package name detection
- Add models.dev-backed capability cache documentation
- Update Hephaestus default to gpt-5.4 (medium)
- Correct MiniMax M2.7/M2.5 usage across all docs
- Update all agent/category provider chain tables
- Fix stale CLI install/doctor options to match source
- Add refresh-model-capabilities command to CLI reference

Co-authored-by: Sisyphus <sisyphus@oh-my-opencode>
2026-03-27 12:59:50 +09:00
YeonGyu-Kim
8f4554e115 fix(lsp): accept directory as alias for filePath in lsp_diagnostics
Models frequently hallucinate a 'directory' parameter alongside filePath,
causing hard failures. Instead of rejecting, accept directory as an alias
for filePath and gracefully handle when both are provided (prefer filePath).

This prevents the 'filePath and directory are mutually exclusive' error
that users hit when models pass both parameters.

Fixes model confusion with lsp_diagnostics tool parameters.
2026-03-27 12:59:50 +09:00
github-actions[bot]
07793f35a7 @Jholly2008 has signed the CLA in code-yeongyu/oh-my-openagent#2871 2026-03-27 03:37:37 +00:00
YeonGyu-Kim
8ca93c7a27 Merge pull request #2863 from MoerAI/fix/task-schema-mutual-exclusion
fix(delegate-task): reject when both category and subagent_type provided (fixes #2847)
2026-03-27 12:30:46 +09:00
YeonGyu-Kim
a1b4e97e74 Merge pull request #2856 from potb/fix/publish-version-commitback
fix(publish): restore version commit-back to dev after npm release
2026-03-27 12:30:34 +09:00
YeonGyu-Kim
47e7d4afbb Merge pull request #2861 from MoerAI/fix/category-config-params
fix(delegate-task): apply category config temperature/maxTokens/top_p to categoryModel (fixes #2831)
2026-03-27 12:30:31 +09:00
YeonGyu-Kim
6d3172adc9 Merge pull request #2862 from MoerAI/fix/empty-text-with-tool-calls
fix(recovery): detect empty text parts alongside tool calls in fixEmptyMessages (fixes #2830)
2026-03-27 12:30:29 +09:00
YeonGyu-Kim
65dc3e4a3b Merge pull request #2865 from LTS2/fix/2803-hook-task-examples-missing-load-skills
Fix missing load_skills parameter in hook-injected delegate_task examples
2026-03-27 12:30:26 +09:00
YeonGyu-Kim
587ee704e8 Merge pull request #2866 from LTS2/fix/2830-empty-message-recovery-with-tool-calls
Fix empty message recovery when tool calls coexist with empty text parts
2026-03-27 12:30:23 +09:00
YeonGyu-Kim
3bafa88204 Merge pull request #2867 from MoerAI/fix/openai-tool-limit
fix(tools): add max_tools config to cap registered tools for OpenAI compatibility (fixes #2848)
2026-03-27 12:30:21 +09:00
YeonGyu-Kim
f2496158e8 Merge pull request #2859 from RaviTharuma/docs/fallback-model-objects
docs(config): document object-style fallback_models
2026-03-27 12:25:07 +09:00
YeonGyu-Kim
a7ac2e7aba merge: resolve conflicts with dev docs update 2026-03-27 12:24:51 +09:00
YeonGyu-Kim
a2c7fed9d4 docs: comprehensive update for v3.14.0 features
- Document object-style fallback_models with per-model settings
- Add package rename compatibility layer docs (oh-my-opencode → oh-my-openagent)
- Update agent-model-matching with Hephaestus gpt-5.4 default
- Document MiniMax M2.5 → M2.7 upgrade across agents
- Add agent priority/order deterministic Tab cycling docs
- Document file:// URI support for agent prompt field
- Add doctor legacy package name warning docs
- Update CLI reference with new doctor checks
- Document model settings compatibility resolver
2026-03-27 12:20:40 +09:00
MoerAI
98572c8dac fix: guard fallback override to preserve category config params when fallback fields are undefined 2026-03-27 12:20:19 +09:00
孔祥俊
661737b95a fix(cli): respect platform shell for --on-complete 2026-03-27 11:11:58 +08:00
MoerAI
8136679b1c test: update test to expect mutual exclusion error for category+subagent_type 2026-03-27 11:53:54 +09:00
MoerAI
82d89fd5fc fix(tools): add max_tools config to cap registered tools for OpenAI compatibility (fixes #2848) 2026-03-27 11:11:56 +09:00
ewjin
b1735d4004 fix: detect empty text parts in messages with tool calls during session recovery 2026-03-27 11:04:42 +09:00
ewjin
8bde294978 fix: add missing load_skills parameter to hook-injected delegate_task examples 2026-03-27 10:56:49 +09:00
MoerAI
a476e557c9 fix(delegate-task): reject when both category and subagent_type provided (fixes #2847) 2026-03-27 10:35:48 +09:00
MoerAI
404390efda fix(recovery): detect empty text parts alongside tool calls in fixEmptyMessages (fixes #2830) 2026-03-27 10:18:20 +09:00
MoerAI
944cf429a7 fix(delegate-task): apply category config temperature/maxTokens/top_p to categoryModel (fixes #2831) 2026-03-27 10:01:23 +09:00
Ravi Tharuma
241224f7ab docs(config): document object-style fallback_models 2026-03-26 18:55:22 +01:00
YeonGyu-Kim
1c9f4148d0 fix(publish-ci): sync mock-heavy test isolation with ci.yml
Apply the same mock.module() isolation fixes to publish.yml:
- Move shared and session-recovery mock-heavy tests to isolated section
- Use dynamic find + exclusion for remaining src/shared tests
- Include session-recovery tests in remaining batch

Ensures publish workflow has the same test config as main CI run.
2026-03-27 00:56:55 +09:00
YeonGyu-Kim
8dd0191ea5 fix(ci): isolate mock-heavy shared tests to prevent cross-file contamination
Move 4 src/shared tests that use mock.module() to the isolated test section:
- model-capabilities.test.ts (mocks ./connected-providers-cache)
- log-legacy-plugin-startup-warning.test.ts (mocks ./legacy-plugin-warning)
- model-error-classifier.test.ts
- opencode-message-dir.test.ts

Also isolate recover-tool-result-missing.test.ts (mocks ./storage).

Use find + exclusion pattern in remaining tests to dynamically build the
src/shared file list without the isolated mock-heavy files.

Fixes 6 Linux CI failures caused by bun's mock.module() cache pollution
when running in parallel.
2026-03-27 00:08:27 +09:00
YeonGyu-Kim
9daaeedc50 fix(test): restore shared Bun mocks after suite cleanup
Prevent src/shared batch runs from leaking module mocks into later files, which was breaking Linux CI cache metadata and legacy plugin warning assertions.
2026-03-27 00:08:20 +09:00
YeonGyu-Kim
3e13a4cf57 fix(session-recovery): filter invalid prt_* part IDs from tool_use_id reconstruction
When recovering missing tool results, the session recovery hook was using
raw part.id (prt_* format) as tool_use_id when callID was absent, causing
ZodError validation failures from the API.

Added isValidToolUseID() guard that only accepts toolu_* and call_* prefixed
IDs, and normalizeMessagePart() that returns null for parts without valid
callIDs. Both the SQLite fallback and stored-parts paths now filter out
invalid entries before constructing tool_result payloads.

Includes 4 regression tests covering both valid/invalid callID paths for
both SQLite and stored-parts backends.
2026-03-26 20:48:33 +09:00
Peïo Thibault
fb837db90d fix(publish): restore version commit-back to dev after npm release 2026-03-26 12:44:27 +01:00
YeonGyu-Kim
8e65d6cf2c fix(test): make legacy-plugin-warning tests isolation-safe
Pass explicit config dir to checkForLegacyPluginEntry instead of relying
on XDG_CONFIG_HOME env var, which gets contaminated by parallel tests on
Linux CI. Also adds missing 'join' import.
2026-03-26 19:54:05 +09:00
YeonGyu-Kim
f419a3a925 fix(test): use Bun.spawnSync in command discovery test to avoid execFileSync mock leakage
The opencode-project-command-discovery test used execFileSync for git init,
which collided with image-converter.test.ts's global execFileSync mock when
running in parallel on Linux CI. Switching to Bun.spawnSync avoids the mock
entirely since spyOn(childProcess, 'execFileSync') doesn't affect Bun APIs.

Fixes CI flake that only reproduced on Linux.
2026-03-26 19:47:25 +09:00
YeonGyu-Kim
1c54fdad26 feat(compat): package rename compatibility layer for oh-my-opencode → oh-my-openagent
- Add legacy plugin startup warning when oh-my-opencode config detected
- Update CLI installer and TUI installer for new package name
- Split monolithic config-manager.test.ts into focused test modules
- Add plugin config detection tests for legacy name fallback
- Update processed-command-store to use plugin-identity constants
- Add claude-code-plugin-loader discovery test for both config names
- Update chat-params and ultrawork-db tests for plugin identity

Part of #2823
2026-03-26 19:44:55 +09:00
YeonGyu-Kim
d39891fcab docs: update hephaestus default model references from gpt-5.3-codex to gpt-5.4
Updated across README (all locales), docs/guide/, docs/reference/,
docs/examples/, AGENTS.md files, and test expectations/snapshots.

The deep category and multimodal-looker still use gpt-5.3-codex as
those are separate from the hephaestus agent.
2026-03-26 19:25:26 +09:00
YeonGyu-Kim
d57ed97386 feat(hephaestus): upgrade default model from gpt-5.3-codex to gpt-5.4
Hephaestus now uses gpt-5.4 as its default model across all providers
(openai, github-copilot, venice, opencode), matching Sisyphus's GPT 5.4
support. The separate gpt-5.3-codex → github-copilot fallback entry is
removed since gpt-5.4 is available on all required providers.
2026-03-26 19:02:37 +09:00
github-actions[bot]
6a510c01e0 @kuitos has signed the CLA in code-yeongyu/oh-my-openagent#2833 2026-03-26 09:56:02 +00:00
MoerAI
95801a4850 fix(ralph-loop): extract text from parsed entry instead of testing raw JSONL 2026-03-26 18:16:07 +09:00
YeonGyu-Kim
b34eab3884 fix(test): isolate model-capabilities from local provider cache
Mock connected-providers-cache in model-capabilities.test.ts to prevent
findProviderModelMetadata from reading disk-cached model metadata.

Without this mock, the 'prefers runtime models.dev cache' test gets
polluted by real cached data from opencode serve runs, causing the
test to receive different maxOutputTokens/supportsTemperature values
than the mock runtime snapshot provides.

This was the last CI-only failure — passes locally with cache, fails
on CI without cache, now passes everywhere via mock isolation.

Full suite: 4484 pass, 0 fail.
2026-03-26 18:14:51 +09:00
YeonGyu-Kim
4efc181390 fix(ci): resolve all test failures + complete rename compat layer
Sisyphus-authored fixes across 15 files:

- plugin-identity: align CONFIG_BASENAME with actual config file name
- add-plugin-to-opencode-config: handle legacy→canonical name migration
- plugin-detection tests: update expectations for new identity constants
- doctor/system: fix legacy name warning test assertions
- install tests: align with new plugin name
- chat-params tests: fix mock isolation
- model-capabilities tests: fix snapshot expectations
- image-converter: fix platform-dependent test assertions (Linux CI)
- example configs: expanded with more detailed comments

Full suite: 4484 pass, 0 fail, typecheck clean.
2026-03-26 18:04:31 +09:00
PR Bot
3601061da0 feat: upgrade remaining MiniMax M2.5 fallbacks to M2.7-highspeed
Upgrade the secondary MiniMax fallback entries in librarian and explore
agents from M2.5 to M2.7-highspeed, completing the M2.5→M2.7 migration.

- librarian: minimax-m2.5 → minimax-m2.7-highspeed (2nd fallback)
- explore: minimax-m2.7 → minimax-m2.7-highspeed (2nd), minimax-m2.5 → minimax-m2.7 (3rd)
- Update corresponding test assertions
2026-03-26 16:49:31 +08:00
YeonGyu-Kim
e86edca633 feat(doctor): warn on legacy package name + add example configs
- Doctor now detects when opencode.json references 'oh-my-opencode'
  (legacy name) and warns users to switch to 'oh-my-openagent' with
  the exact replacement string.

- Added 3 example config files in docs/examples/:
  - default.jsonc: balanced setup with all agents documented
  - coding-focused.jsonc: Sisyphus + Hephaestus heavy
  - planning-focused.jsonc: Prometheus + Atlas heavy

All examples include every agent (sisyphus, hephaestus, atlas,
prometheus, explore, librarian) with model recommendations.

Helps with #2823
2026-03-26 17:01:42 +09:00
YeonGyu-Kim
a8ec92748c fix(model-resolution): honor user config overrides on cold cache
When provider-models cache is cold (first run / cache miss),
resolveModelForDelegateTask returns {skipped: true}. Previously this
caused the subagent resolver to:

1. Ignore the user's explicit model override (e.g. explore.model)
2. Fall through to the hardcoded fallback chain which may contain
   model IDs that don't exist in the provider catalog

Now:
- subagent-resolver: if resolution is skipped but user explicitly
  configured a model, use it directly
- subagent-resolver: don't assign hardcoded fallback chain on skip
- category-resolver: same — don't leak hardcoded chain on skip
- general-agents: if user model fails resolution, use it as-is
  instead of falling back to hardcoded chain first entry

Closes #2820
2026-03-26 16:54:04 +09:00
YeonGyu-Kim
dd85d1451a fix(model-requirements): align fallback models with available provider catalogs
- opencode/minimax-m2.7-highspeed → opencode/minimax-m2.5 (provider lacks m2.7 variants)
- opencode-go/minimax-m2.7-highspeed → opencode-go/minimax-m2.7 (provider lacks -highspeed)
- opencode/minimax-m2.7 → opencode/minimax-m2.5 (provider only has m2.5)
- added xai as alternative provider for grok-code-fast-1 (prevents wrong provider prefix)
2026-03-26 16:00:02 +09:00
YeonGyu-Kim
682eead61b Merge pull request #2845 from code-yeongyu/fix/path-discovery-parity-followup
fix: add remaining path discovery parity coverage
2026-03-26 13:13:15 +09:00
YeonGyu-Kim
42f5386100 fix(tests): drop duplicate tilde config regression
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:08:53 +09:00
YeonGyu-Kim
5bc019eb7c fix(skills): remove duplicate homedir import
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:06:32 +09:00
YeonGyu-Kim
097e2be7e8 fix(slashcommand): discover nested opencode commands with slash names
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:05:03 +09:00
YeonGyu-Kim
c637d77965 fix(commands): discover ancestor opencode project commands
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:04:39 +09:00
YeonGyu-Kim
4c8aacef48 fix(agents): include .agents skills in agent awareness
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:02:52 +09:00
YeonGyu-Kim
8413bc6a91 fix(skills): expand tilde config source paths
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:02:36 +09:00
YeonGyu-Kim
86a62aef45 fix(skills): discover ancestor project skill directories
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:02:36 +09:00
YeonGyu-Kim
961cc788f6 fix(shared): support opencode directory aliases
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:02:12 +09:00
YeonGyu-Kim
19838b78a7 fix(shared): add bounded project discovery helpers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 13:01:06 +09:00
YeonGyu-Kim
9d4a8f2183 Merge pull request #2844 from code-yeongyu/fix/opencode-followup-gaps
fix: close remaining upstream path discovery gaps
2026-03-26 12:35:39 +09:00
YeonGyu-Kim
7f742723b5 fix(slashcommand): use slash separator for nested commands
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 12:29:42 +09:00
YeonGyu-Kim
b20a34bfa7 fix(slashcommand): discover nested opencode commands
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 12:15:47 +09:00
YeonGyu-Kim
12a4318439 fix(commands): load .agents skills into command config
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 12:15:47 +09:00
YeonGyu-Kim
e4a5973b16 fix(agents): include .agents skills in agent awareness
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 12:15:47 +09:00
YeonGyu-Kim
83819a15d3 fix(shared): stop ancestor discovery at worktree root
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 12:15:47 +09:00
YeonGyu-Kim
a391f44420 Merge pull request #2842 from code-yeongyu/fix/opencode-skill-override-gaps
fix: align path discovery with upstream opencode
2026-03-26 11:54:08 +09:00
YeonGyu-Kim
94b4a4f850 fix(slashcommand): deduplicate opencode command aliases
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:36:59 +09:00
YeonGyu-Kim
9fde370838 fix(commands): preserve nearest opencode command precedence
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:36:59 +09:00
YeonGyu-Kim
b6ee7f09b1 fix(slashcommand): discover ancestor opencode commands
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
28bcab066e fix(commands): load opencode command dirs from aliases
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
b5cb50b561 fix(skills): discover ancestor project skill directories
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
8242500856 fix(skills): expand tilde config source paths
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
6d688ac0ae fix(shared): support opencode directory aliases
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
da3e80464d fix(shared): add ancestor project discovery helpers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-26 11:22:00 +09:00
YeonGyu-Kim
23df6bd255 Merge pull request #2841 from code-yeongyu/fix/model-fallback-test-isolation
fix(tests): resolve 5 cross-file test isolation failures
2026-03-26 09:31:09 +09:00
YeonGyu-Kim
7895361f42 fix(tests): resolve 5 cross-file test isolation failures
- model-fallback hook: mock selectFallbackProvider and add _resetForTesting()
  to test-setup.ts to clear module-level state between files
- fallback-retry-handler: add afterAll(mock.restore) and use mockReturnValueOnce
  to prevent connected-providers mock leaking to subsequent test files
- opencode-config-dir: use win32.join for Windows APPDATA path construction
  so tests pass on macOS (path.join uses POSIX semantics regardless of
  process.platform override)
- system-loaded-version: use resolveSymlink from file-utils instead of
  realpathSync to handle macOS /var -> /private/var symlink consistently

All 4456 tests pass (0 failures) on full bun test suite.
2026-03-26 09:30:34 +09:00
YeonGyu-Kim
90919bf359 Merge pull request #2664 from kilhyeonjun/fix/anthropic-1m-ga-context-limit
fix(shared): respect cached model context limits for Anthropic providers post-GA
2026-03-26 08:55:04 +09:00
YeonGyu-Kim
32f2c688e7 Merge pull request #2707 from MoerAI/fix/windows-symlink-config
fix(windows): resolve symlinked config paths and plugin name parsing (fixes #2271)
2026-03-26 08:54:45 +09:00
YeonGyu-Kim
e6d0484e57 Merge pull request #2710 from MoerAI/fix/rate-limit-hang
fix(runtime-fallback): detect bare 429 rate-limit signals (fixes #2677)
2026-03-26 08:53:41 +09:00
YeonGyu-Kim
abd62472cf Merge pull request #2752 from MoerAI/fix/quota-error-fallback-detection
fix(runtime-fallback): detect prettified quota errors without HTTP status codes (fixes #2747)
2026-03-26 08:50:58 +09:00
YeonGyu-Kim
b1e099130a Merge pull request #2756 from MoerAI/fix/plugin-display-name
fix(plugin): display friendly name in configuration UI instead of file path (fixes #2644)
2026-03-26 08:50:29 +09:00
YeonGyu-Kim
09fb364bfb Merge pull request #2833 from kuitos/feat/agent-order-support
feat(agent-priority): inject order field for deterministic agent Tab cycling
2026-03-26 08:49:58 +09:00
YeonGyu-Kim
d1ff8b1e3f Merge pull request #2727 from octo-patch/feature/upgrade-minimax-m2.7
feat: upgrade MiniMax from M2.5 to M2.7 and expand to more agents/categories
2026-03-26 08:49:11 +09:00
YeonGyu-Kim
6e42b553cc Merge origin/dev into feature/upgrade-minimax-m2.7 (resolve conflicts) 2026-03-26 08:48:53 +09:00
YeonGyu-Kim
02ab83f4d4 Merge pull request #2834 from RaviTharuma/feat/model-capabilities-canonical-guardrails
fix(model-capabilities): harden canonical alias guardrails
2026-03-26 08:46:43 +09:00
github-actions[bot]
ce1bffbc4d @ventsislav-georgiev has signed the CLA in code-yeongyu/oh-my-openagent#2840 2026-03-25 23:11:43 +00:00
github-actions[bot]
4d4680be3c @clansty has signed the CLA in code-yeongyu/oh-my-openagent#2839 2026-03-25 21:33:49 +00:00
Ravi Tharuma
ce877ec0d8 test(atlas): avoid shared barrel mock pollution 2026-03-25 22:27:26 +01:00
Ravi Tharuma
ec20a82b4e fix(model-capabilities): align gemini aliases and alias lookup 2026-03-25 22:19:51 +01:00
Ravi Tharuma
5043cc21ac fix(model-capabilities): harden canonical alias guardrails 2026-03-25 22:11:45 +01:00
github-actions[bot]
8df3a2876a @anas-asghar4831 has signed the CLA in code-yeongyu/oh-my-openagent#2837 2026-03-25 18:48:32 +00:00
YeonGyu-Kim
087e33d086 Merge pull request #2832 from RaviTharuma/fix/todo-sync-priority-default
test(todo-sync): match required priority fallback
2026-03-26 01:30:50 +09:00
Ravi Tharuma
46c6e1dcf6 test(todo-sync): match required priority fallback 2026-03-25 16:38:21 +01:00
kuitos
5befb60229 feat(agent-priority): inject order field for deterministic agent Tab cycling
Inject an explicit `order` field (1-4) into the four core agents
(Sisyphus, Hephaestus, Prometheus, Atlas) via reorderAgentsByPriority().
This pre-empts OpenCode's alphabetical agent sorting so the intended
Tab cycle order is preserved once OpenCode merges order field support
(anomalyco/opencode#19127).

Refs anomalyco/opencode#7372
2026-03-25 23:35:40 +08:00
Ravi Tharuma
55df2179b8 fix(todo-sync): preserve missing task priority 2026-03-25 16:26:23 +01:00
YeonGyu-Kim
76420b36ab Merge pull request #2829 from RaviTharuma/fix/model-capabilities-review-followup
fix(model-capabilities): harden runtime capability handling
2026-03-26 00:25:07 +09:00
Ravi Tharuma
a15f6076bc feat(model-capabilities): add maintenance guardrails 2026-03-25 16:14:19 +01:00
Ravi Tharuma
7c0289d7bc fix(model-capabilities): honor root thinking flags 2026-03-25 15:41:12 +01:00
YeonGyu-Kim
5e9231e251 Merge pull request #2828 from code-yeongyu/fix/content-based-thinking-gating-v2
fix(thinking-block-validator): replace model-name gating with content-based history detection
2026-03-25 23:26:52 +09:00
YeonGyu-Kim
f04cc0fa9c fix(thinking-block-validator): replace model-name gating with content-based history detection
Replace isExtendedThinkingModel() model-name check with hasSignedThinkingBlocksInHistory()
which scans message history for real Anthropic-signed thinking blocks.

Content-based gating is more robust than model-name checks — works correctly
with custom model IDs, proxied models, and new model releases without code changes.

- Add isSignedThinkingPart() that matches type thinking/redacted_thinking with valid signature
- Skip synthetic parts (injected by previous hook runs)
- GPT reasoning blocks (type=reasoning, no signature) correctly excluded
- Add comprehensive tests: signed injection, redacted_thinking, reasoning negative case, synthetic skip

Inspired by PR #2653 content-based approach, combined with redacted_thinking support from 0732cb85.

Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-25 23:23:46 +09:00
Ravi Tharuma
613ef8eee8 fix(model-capabilities): harden runtime capability handling 2026-03-25 15:09:25 +01:00
YeonGyu-Kim
99b398063c Merge pull request #2826 from RaviTharuma/feat/model-capabilities-models-dev
feat(model-capabilities): add models.dev snapshot and runtime capability refresh
2026-03-25 23:08:17 +09:00
Ravi Tharuma
2af9324400 feat: add models.dev-backed model capabilities 2026-03-25 14:47:46 +01:00
YeonGyu-Kim
7a52639a1b Merge pull request #2673 from sanoyphilippe/fix/oauth-discovery-root-fallback
fix(mcp-oauth): fall back to root well-known URL for non-root resource paths (fixes #2675)
2026-03-25 21:48:13 +09:00
YeonGyu-Kim
5df54bced4 Merge pull request #2725 from cphoward/fix/spawn-budget-lifetime-semantics-clean
fix(background-agent): decrement spawn budget on task completion, cancellation, error, and interrupt
2026-03-25 21:46:51 +09:00
YeonGyu-Kim
cd04e6a19e Merge pull request #2751 from sjawhar/fix/atlas-subagent-agent-guard
fix(atlas): restore agent mismatch guard for subagent boulder continuation
2026-03-25 21:46:37 +09:00
YeonGyu-Kim
e974b151c1 Merge pull request #2701 from tonymfer/fix/lsp-initialization-options
fix(lsp): wrap initialization config in initializationOptions field
2026-03-25 21:46:16 +09:00
YeonGyu-Kim
6f213a0ac9 Merge pull request #2686 from sjawhar/fix/look-at-respect-configured-model
fix(look-at): respect configured multimodal-looker model instead of overriding via dynamic fallback
2026-03-25 21:46:11 +09:00
YeonGyu-Kim
71004e88d3 Merge pull request #2583 from Jrakru/fix/start-work-atlas-handoff
fix: preserve Atlas handoff metadata on /start-work
2026-03-25 21:46:06 +09:00
YeonGyu-Kim
5898d36321 Merge pull request #2575 from apple-ouyang/fix/issue-2571-subagent-safeguards
fix(delegate-task): add subagent turn limit and model routing transparency
2026-03-25 21:46:01 +09:00
YeonGyu-Kim
90aa3e4489 Merge pull request #2589 from MoerAI/fix/plan-agent-continuation-loop
fix(todo-continuation-enforcer): add plan agent to DEFAULT_SKIP_AGENTS (fixes #2526)
2026-03-25 21:45:58 +09:00
YeonGyu-Kim
2268ba45f9 Merge pull request #2262 from Stranmor/feat/prompt-file-uri-support
feat: support file:// URIs in agent prompt field
2026-03-25 21:45:53 +09:00
YeonGyu-Kim
aca9342722 Merge pull request #2345 from DarkFunct/fix/todo-sync-priority-null
fix(todo-sync): provide default priority to prevent SQLite NOT NULL violation
2026-03-25 21:45:48 +09:00
YeonGyu-Kim
a3519c3a14 Merge pull request #2544 from djdembeck/fix/quick-anti-loop-v2
fix(agents): add termination criteria to Sisyphus-Junior default
2026-03-25 21:45:43 +09:00
YeonGyu-Kim
e610d88558 Merge pull request #2594 from MoerAI/fix/subagent-fallback-model-v2
fix(agent-registration): always attempt fallback when model resolution fails (fixes #2427, supersedes #2517)
2026-03-25 21:45:40 +09:00
YeonGyu-Kim
ed09bf5462 Merge pull request #2674 from RaviTharuma/fix/dedup-delegated-model-config
refactor: deduplicate DelegatedModelConfig into shared module
2026-03-25 21:43:31 +09:00
YeonGyu-Kim
1d48518b41 Merge pull request #2643 from RaviTharuma/feat/model-settings-compatibility-resolver
feat(settings): add model settings compatibility resolver
2026-03-25 21:43:28 +09:00
YeonGyu-Kim
d6d4cece9d Merge pull request #2622 from RaviTharuma/feat/object-style-fallback-models
feat(config): object-style fallback_models with per-model settings
2026-03-25 21:43:22 +09:00
Ravi Tharuma
9d930656da test(restack): drop stale compatibility expectations 2026-03-25 11:14:04 +01:00
Ravi Tharuma
f86b8b3336 fix(review): align model compatibility and prompt param helpers 2026-03-25 11:14:04 +01:00
Ravi Tharuma
1f5d7702ff refactor(delegate-task): deduplicate DelegatedModelConfig + registry refactor
- Move DelegatedModelConfig to src/shared/model-resolution-types.ts
- Re-export from delegate-task/types.ts (preserving import paths)
- Replace background-agent/types.ts local duplicate with shared import
- Consolidate model-settings-compatibility.ts registry patterns
2026-03-25 11:14:04 +01:00
Ravi Tharuma
1e70f64001 chore(schema): refresh generated fallback model schema 2026-03-25 11:13:53 +01:00
Ravi Tharuma
d4f962b55d feat(model-settings-compat): add variant/reasoningEffort compatibility resolver
- Registry-based model family detection (provider-agnostic)
- Variant and reasoningEffort ladder downgrade logic
- Three-tier resolution: metadata override → family heuristic → unknown drop
- Comprehensive test suite covering all model families
2026-03-25 11:13:53 +01:00
Ravi Tharuma
fb085538eb test(background-agent): restore spawner createTask import 2026-03-25 11:13:28 +01:00
Ravi Tharuma
e5c5438a44 fix(delegate-task): gate fallback settings to real fallback matches 2026-03-25 11:04:49 +01:00
Ravi Tharuma
a77a16c494 feat(config): support object-style fallback_models with per-model settings
Add support for object-style entries in fallback_models arrays, enabling
per-model configuration of variant, reasoningEffort, temperature, top_p,
maxTokens, and thinking settings.

- Zod schema for FallbackModelObject with full validation
- normalizeFallbackModels() and flattenToFallbackModelStrings() utilities
- Provider-agnostic model resolution pipeline with fallback chain
- Session prompt params state management
- Fallback chain construction with prefix-match lookup
- Integration across delegate-task, background-agent, and plugin layers
2026-03-25 11:04:49 +01:00
YeonGyu-Kim
7761e48dca Merge pull request #2592 from MoerAI/fix/gemini-quota-fallback
fix(runtime-fallback): detect Gemini quota errors in session.status retry events (fixes #2454)
2026-03-25 18:14:21 +09:00
MoerAI
d7a1945b27 fix(plugin-loader): preserve scoped npm package names in plugin key parsing
Scoped packages like @scope/pkg were truncated to just 'pkg' because
basename() strips the scope prefix. Fix:
- Detect scoped packages (starting with @) and find version separator
  after the scope slash, not at the leading @
- Return full scoped name (@scope/pkg) instead of calling basename
- Add regression test for scoped package name preservation
2026-03-25 17:10:07 +09:00
MoerAI
44fb114370 fix(runtime-fallback): rename misleading test to match actual behavior
The test name claimed it exercised RETRYABLE_ERROR_PATTERNS directly,
but classifyErrorType actually matches 'payment required' via the
quota_exceeded path first. Rename to 'detects payment required errors
as retryable' to accurately describe end-to-end behavior.
2026-03-25 16:58:49 +09:00
MoerAI
774d0bd84d fix(ralph-loop): restrict semantic completion to DONE promise and assistant entries
- Gate semantic detection on promise === 'DONE' in both transcript and
  session message paths to prevent false positives on VERIFIED promises
- Restrict transcript semantic fallback to assistant/text entries only,
  skipping tool_use/tool_result to avoid matching file content
- Add regression test for VERIFIED promise not triggering semantic detection
2026-03-25 16:49:53 +09:00
YeonGyu-Kim
bf804b0626 fix(shared): restrict cached Anthropic 1M context to GA 4.6 models only 2026-03-25 14:29:59 +09:00
YeonGyu-Kim
c4aa380855 Merge pull request #2734 from ndaemy/fix/remove-duplicate-ultrawork-separator
fix(keyword-detector): remove duplicate separator from ultrawork templates
2026-03-25 13:22:41 +09:00
YeonGyu-Kim
993bd51eac Merge pull request #2524 from Gujiassh/fix/session-todo-filename-match
fix(session-manager): match todo filenames exactly
2026-03-25 13:22:39 +09:00
YeonGyu-Kim
732743960f Merge pull request #2533 from Gujiassh/fix/background-task-metadata-id
fix(delegate-task): report the real background task id
2026-03-25 13:22:37 +09:00
YeonGyu-Kim
bff573488c Merge pull request #2443 from tc9011/fix/github-copilot-model-version
fix: github copilot model version for Sisyphus agent
2026-03-25 13:22:34 +09:00
YeonGyu-Kim
77424f86c8 Merge pull request #2816 from code-yeongyu/fix/keep-agent-with-explicit-model
fix: always keep agent with explicit model, robust port binding & writable dir fallback
2026-03-25 11:48:26 +09:00
YeonGyu-Kim
919f7e4092 fix(data-path): writable directory fallback for data/cache paths
getDataDir() and getCacheDir() now verify the directory is writable and
fall back to os.tmpdir() if not.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 11:46:07 +09:00
YeonGyu-Kim
78a3e985be fix(mcp-oauth): robust port binding for callback server
Use port 0 fallback when findAvailablePort fails, read the actual bound
port from server.port. Tests refactored to use mock server when real
socket binding is unavailable in CI.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 11:46:07 +09:00
YeonGyu-Kim
42fb2548d6 fix(agent): always keep agent when model is explicitly configured
Previously, when an explicit model was configured, the agent name was
omitted to prevent opencode's built-in agent fallback chain from
overriding the user-specified model. This removes that conditional logic
and always passes the agent name alongside the model. Tests are updated
to reflect this behavior change.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 11:46:07 +09:00
YeonGyu-Kim
bff74f4237 Merge pull request #2695 from MoerAI/fix/provider-agnostic-fallback
fix(runtime-fallback): make fallback provider selection provider-agnostic (fixes #2303)
2026-03-25 11:36:50 +09:00
YeonGyu-Kim
038b8a79ec Revert "Merge pull request #2611 from MoerAI/fix/keep-default-builder-agent"
This reverts commit 0aa8bfe839, reversing
changes made to 422eaa9ae0.
2026-03-25 11:13:05 +09:00
YeonGyu-Kim
0aa8bfe839 Merge pull request #2611 from MoerAI/fix/keep-default-builder-agent
fix(config): keep default OpenCode Build agent enabled by default (fixes #2545)
2026-03-25 11:11:34 +09:00
YeonGyu-Kim
422eaa9ae0 Merge pull request #2753 from MoerAI/fix/prometheus-model-override
fix(prometheus): respect agent model override instead of using global opencode.json model (fixes #2693)
2026-03-25 11:09:48 +09:00
YeonGyu-Kim
63ebedc9a2 Merge pull request #2606 from RaviTharuma/fix/clamp-variant-on-non-opus-fallback
fix: clamp unsupported max variant for non-Opus Claude models
2026-03-25 11:06:31 +09:00
YeonGyu-Kim
f0b5835459 fix(publish): correct repo guard to oh-my-openagent (GitHub renamed repo) 2026-03-25 09:21:38 +09:00
YeonGyu-Kim
2a495c2e8d Merge pull request #2813 from code-yeongyu/fix/tmux-test-flake-20260325
test(tmux): remove flaky live env wrapper assertion
2026-03-25 02:08:05 +09:00
YeonGyu-Kim
0edb87b1c1 test(tmux): remove flaky live env wrapper assertion
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-25 02:05:51 +09:00
YeonGyu-Kim
cca057dc0f Merge pull request #2812 from code-yeongyu/fix/non-interactive-env-win-bash-prefix
fix(non-interactive-env): force unix prefix for bash git commands
2026-03-25 01:24:18 +09:00
YeonGyu-Kim
e000a3bb0d fix(non-interactive-env): force unix prefix for bash git commands
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-25 01:23:02 +09:00
YeonGyu-Kim
c19fc4ba22 Merge pull request #2811 from code-yeongyu/fix/publish-workflow-guard-topology-20260325
fix(publish): align repo guard and test topology
2026-03-25 01:19:29 +09:00
YeonGyu-Kim
e0de06851d fix(publish): align repo guard and test topology
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-25 01:17:42 +09:00
YeonGyu-Kim
26ac413dd9 Merge pull request #2801 from MoerAI/fix/null-byte-sanitization
fix(tool-execute-before): strip null bytes from bash commands to prevent crash (fixes #2220)
2026-03-25 01:12:45 +09:00
YeonGyu-Kim
81c912cf04 Merge pull request #2800 from MoerAI/fix/background-task-fallback-chain
fix(background-task): register fallback chain for background sessions (fixes #2203)
2026-03-25 01:12:41 +09:00
YeonGyu-Kim
9c348db450 Merge pull request #2799 from MoerAI/fix/unstable-agent-config-override
fix(category-resolver): respect is_unstable_agent config override (fixes #2061)
2026-03-25 01:12:36 +09:00
YeonGyu-Kim
2993b3255d Merge pull request #2796 from guazi04/fix/circuit-breaker-false-positive-upstream
fix(circuit-breaker): treat unknown tool input as non-comparable to prevent false positives on flat events
2026-03-25 01:12:31 +09:00
YeonGyu-Kim
0b77e2def0 Merge pull request #2810 from code-yeongyu/fix/webfetch-redirect-loop
fix(webfetch): guard redirect loops in built-in flow
2026-03-25 00:40:54 +09:00
YeonGyu-Kim
bfa8fa2378 Merge pull request #2804 from code-yeongyu/fix/b2-hashline-formatter-cache-per-project
fix(hashline-edit): scope formatter cache by directory
2026-03-25 00:32:41 +09:00
YeonGyu-Kim
6ee680af99 Merge pull request #2809 from code-yeongyu/fix/2330-recursive-subagent-spawn
fix(task): preserve restricted agent tools in sync continuation
2026-03-25 00:32:14 +09:00
YeonGyu-Kim
d327334ded Merge pull request #2808 from code-yeongyu/fix-gemini-3-pro-cleanup
fix(models): remove stale Gemini 3 Pro references
2026-03-25 00:32:10 +09:00
YeonGyu-Kim
07d120a78d Merge pull request #2807 from code-yeongyu/fix/b4-manager-model-override-1774351606
fix(background-task): apply model override omission to manager live path
2026-03-25 00:31:49 +09:00
YeonGyu-Kim
8b7b1c843a Merge pull request #2806 from code-yeongyu/fix/b5-permission-merge-order
fix(plugin): restore permission merge order precedence
2026-03-25 00:31:43 +09:00
YeonGyu-Kim
a1786f469d Merge pull request #2805 from code-yeongyu/fix/b3-config-filename-precedence
fix(config): prefer canonical plugin config filenames
2026-03-25 00:31:18 +09:00
YeonGyu-Kim
da77d8addf Merge pull request #2802 from code-yeongyu/fix/b1-preemptive-compaction-epoch-guard
fix: handle repeated compaction epochs in continuation guard
2026-03-25 00:30:54 +09:00
YeonGyu-Kim
971912e065 fix(webfetch): avoid rewriting successful redirect content 2026-03-24 23:59:57 +09:00
YeonGyu-Kim
af301ab29a fix(webfetch): guard redirect loops in built-in flow 2026-03-24 23:58:53 +09:00
YeonGyu-Kim
984464470c fix(task): preserve restricted agent tools in sync continuation
Restore sync continuation to apply agent tool restrictions after permissive defaults so resumed explore and librarian sessions cannot regain nested delegation. Add regression tests for resumed restricted agents while keeping plan-family continuation behavior intact.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 23:54:29 +09:00
YeonGyu-Kim
535ecee318 fix(models): remove stale Gemini 3 Pro references
Keep repo-owned CLI, docs, and test fixtures aligned with current Gemini 3.1 naming while leaving upstream catalog behavior untouched.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 23:53:56 +09:00
YeonGyu-Kim
32035d153e fix(config): prefer canonical plugin config filenames
Ensure oh-my-opencode filenames always win over legacy oh-my-openagent files so readers match canonical writer behavior.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:38:54 +09:00
YeonGyu-Kim
a0649616bf fix(todo-continuation-enforcer): acknowledge compaction epochs during idle
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:36:22 +09:00
YeonGyu-Kim
cb12b286c8 fix(todo-continuation-enforcer): arm compaction epochs on compaction
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:36:22 +09:00
YeonGyu-Kim
8e239e134c fix(todo-continuation-enforcer): make compaction guard epoch-aware
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:36:22 +09:00
YeonGyu-Kim
733676f1a9 fix(todo-continuation-enforcer): add compaction epoch state
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:36:22 +09:00
YeonGyu-Kim
d2e566ba9d fix(preemptive-compaction): mock session history in degradation test
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:36:22 +09:00
YeonGyu-Kim
6da4d2dae0 fix(hashline-edit): scope formatter cache by directory
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:30:16 +09:00
YeonGyu-Kim
3b41191980 fix(background-agent): honor explicit model override in manager
Keep BackgroundManager launch and resume from sending both agent and model so OpenCode does not override configured subagent models. Add launch and resume regressions for the live production path.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:28:01 +09:00
YeonGyu-Kim
0b614b751c fix(permissions): preserve explicit deny over OmO defaults
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 20:24:14 +09:00
MoerAI
c56a01c15d fix(tool-execute-before): strip null bytes from bash commands to prevent crash (fixes #2220) 2026-03-24 19:17:05 +09:00
MoerAI
d2d48fc9ff fix(background-task): register fallback chain for background sessions (fixes #2203) 2026-03-24 19:11:13 +09:00
MoerAI
41a43c62fc fix(category-resolver): respect is_unstable_agent config override (fixes #2061) 2026-03-24 19:08:21 +09:00
YeonGyu-Kim
cea8769a7f Merge pull request #2798 from code-yeongyu/fix/2353-model-selection-v2
fix(plugin): persist selected model only for main session
2026-03-24 18:57:50 +09:00
YeonGyu-Kim
7fa2417c42 fix(plugin): persist selected model only for main session
Reuse the stored model only for subsequent main-session messages when the UI provides no model, while preserving first-message behavior, explicit overrides, and subagent isolation.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 18:11:27 +09:00
YeonGyu-Kim
4bba924dad Revert "Merge pull request #2797 from code-yeongyu/fix/2353-model-selection-persistence"
This reverts commit e691303919, reversing
changes made to d4aee20743.
2026-03-24 17:59:21 +09:00
YeonGyu-Kim
e691303919 Merge pull request #2797 from code-yeongyu/fix/2353-model-selection-persistence
fix(plugin): preserve selected model across messages
2026-03-24 17:54:34 +09:00
YeonGyu-Kim
d4aee20743 Merge pull request #2794 from code-yeongyu/fix/2775-thinking-block-signatures
fix(thinking-block-validator): reuse signed thinking blocks instead of synthetic placeholders
2026-03-24 17:54:31 +09:00
YeonGyu-Kim
bad70f5e24 fix(plugin): preserve selected model across messages
Reuse the current session's selected model during config-time agent rebuilds when config.model is missing, so desktop sessions do not snap back to the default model after each send.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 17:47:08 +09:00
Mou
b9fa2a3ebc fix(background-agent): prevent circuit breaker false positives on flat-format events 2026-03-24 16:35:54 +08:00
YeonGyu-Kim
0e7bd595f8 fix(session-recovery): reuse signed thinking blocks safely
Reuse signed Anthropic thinking blocks only when they can still sort before the target message's parts, otherwise skip recovery instead of reintroducing invalid loops.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 17:22:07 +09:00
YeonGyu-Kim
0732cb85f9 fix(thinking-block-validator): reuse signed thinking parts
Preserve prior signed Anthropic thinking blocks instead of creating unsigned synthetic placeholders, and skip injection when no signed block exists.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-24 17:22:07 +09:00
YeonGyu-Kim
500784a9b9 Merge pull request #2790 from code-yeongyu/fix/2666-mcp-schema-sanitization
fix(schema): strip contentEncoding from MCP tool schemas for Gemini (fixes #2200)
2026-03-24 16:24:57 +09:00
YeonGyu-Kim
5e856b4fde fix(schema): strip contentEncoding from MCP tool schemas for Gemini compatibility
The existing normalizeToolArgSchemas only applies to omo plugin tools
(via tool-registry.ts), but MCP server tool schemas bypass this
sanitization entirely. MCP schemas with contentEncoding/contentMediaType
cause Gemini 400 errors.

Add sanitizeJsonSchema() to strip unsupported keywords from MCP tool
inputSchema before serialization in formatMcpCapabilities.

Fixes #2200
Supersedes #2666
2026-03-24 16:24:44 +09:00
YeonGyu-Kim
03dc903e8e Merge pull request #2789 from code-yeongyu/fix/2671-clearSessionState
fix(anthropic-recovery): clear session state after successful summarize (fixes #2225)
2026-03-24 16:23:25 +09:00
YeonGyu-Kim
69d0b23ab6 fix(anthropic-recovery): clear session state after successful summarize and fix timing test
- Add missing clearSessionState() call after successful summarize (line 117)
  Without this, retry state persisted even after success, potentially causing
  unnecessary retries on subsequent compaction events.

- Fix timing-sensitive test: adjust attempt=0 and firstAttemptTime to give
  proper remainingTimeMs buffer for capped delay calculation.

Fixes #2225
Supersedes #2671
2026-03-24 16:23:11 +09:00
YeonGyu-Kim
ee8735cd2c Merge pull request #2788 from code-yeongyu/fix/2670-uiSelectedModel-nullification
fix(agents): preserve uiSelectedModel when agent override has no model (fixes #2351)
2026-03-24 16:22:15 +09:00
YeonGyu-Kim
d8fe61131c fix(agents): preserve uiSelectedModel when agent override has no model
Three agent builder files used falsy checks that incorrectly nullified
uiSelectedModel when override objects existed but had no model set:

- sisyphus-agent.ts: `?.model ?` → `?.model !== undefined ?`
- atlas-agent.ts: `?.model ?` → `?.model !== undefined ?`
- general-agents.ts: `!override?.model` → `override?.model === undefined`

This caused user model selection in web mode to revert to defaults.

Fixes #2351
2026-03-24 16:22:03 +09:00
YeonGyu-Kim
935995d270 Merge pull request #2668 from MoerAI/fix/session-degradation-detection
fix(session): detect post-compaction no-text degradation and trigger recovery (fixes #2232)
2026-03-24 16:21:30 +09:00
YeonGyu-Kim
23d8b88c4a Merge pull request #2669 from MoerAI/fix/atlas-worktree-verification
fix(atlas): use worktree path for git verification when available (fixes #2229)
2026-03-24 16:21:27 +09:00
YeonGyu-Kim
b4285ce565 Merge pull request #2787 from code-yeongyu/fix/review-fixes
fix(permissions): ensure omo permission overrides take precedence over opencode defaults
2026-03-24 16:20:27 +09:00
YeonGyu-Kim
f9d354b63e fix(permissions): ensure omo permission overrides take precedence over opencode defaults
The spread order in applyToolConfig was incorrect - omo's external_directory: 'allow'
was placed BEFORE the config.permission spread, allowing opencode's default 'ask' to
overwrite it. This caused write/edit tools to hang on headless opencode serve sessions
(no TUI to approve permission prompts).

Move omo's permission overrides AFTER the base config spread so they always win.

Fixes write/edit tool hangs when running opencode serve headlessly.
2026-03-24 16:19:56 +09:00
YeonGyu-Kim
370eb945ee Merge pull request #2786 from code-yeongyu/docs/rename-opencode-to-openagent
docs: rename oh-my-opencode to oh-my-openagent
2026-03-24 15:39:00 +09:00
YeonGyu-Kim
6387065e6f docs: rename oh-my-opencode to oh-my-openagent 2026-03-24 15:31:54 +09:00
YeonGyu-Kim
bebdb97c21 Merge pull request #2784 from code-yeongyu/fix/remove-openclaw-hyperlink
docs: remove OpenClaw hyperlink
2026-03-24 13:35:12 +09:00
YeonGyu-Kim
b5e2ead4e1 docs: remove OpenClaw hyperlink from Building in Public 2026-03-24 13:34:57 +09:00
YeonGyu-Kim
91922dae36 Merge pull request #2783 from code-yeongyu/fix/building-in-public-image
docs: add screenshot to Building in Public section
2026-03-24 13:34:14 +09:00
YeonGyu-Kim
cb3d8af995 docs: add screenshot to Building in Public section
Added the actual Discord screenshot showing real-time development
with Jobdori in #building-in-public channel.
2026-03-24 13:34:04 +09:00
YeonGyu-Kim
0fb3e2063a Merge pull request #2782 from code-yeongyu/feat/building-in-public-readme
docs: add Building in Public section to all READMEs
2026-03-24 13:23:46 +09:00
YeonGyu-Kim
b37b877c45 docs: add Building in Public section to all READMEs
- Added TIP box linking to #building-in-public Discord channel
- Mentions Jobdori AI assistant (built on heavily customized OpenClaw)
- Added to all 5 language variants (EN, KO, JA, ZH-CN, RU)
- Positioned above waitlist section for visibility
2026-03-24 13:23:21 +09:00
YeonGyu-Kim
f854246d7f Merge pull request #2772 from MoerAI/fix/custom-model-resolution
fix(delegate-task): trust user-configured category models without fuzzy validation (fixes #2740)
2026-03-24 12:38:22 +09:00
YeonGyu-Kim
f1eaa7bf9b fix(shell): detect csh/tcsh and use setenv syntax (#2769)
fix(non-interactive-env): detect shell type for csh/tcsh env var syntax (fixes #2089)
2026-03-24 12:30:49 +09:00
YeonGyu-Kim
ed9b4a6329 Merge pull request #2780 from code-yeongyu/fix/issues-2741-2648-2779
fix: resolve subagent model override, empty plan completion, deep task refusal (#2741, #2648, #2779)
2026-03-24 10:28:24 +09:00
YeonGyu-Kim
a00a22ac4c fix: remove copy-paste artifacts in hephaestus gpt-5-3-codex prompt
Same issue as gpt.ts and gpt-5-4.ts: duplicated CORRECT block with pipe
characters and duplicated Hard Constraints/Task Scope Clarification sections.
2026-03-24 10:14:53 +09:00
YeonGyu-Kim
8879581fc1 fix: remove copy-paste artifacts in hephaestus GPT prompts
- Remove leading pipe characters (|) from duplicated CORRECT block
- Remove duplicated ## Hard Constraints and ### Task Scope Clarification sections
- Properly place Task Scope Clarification section between CORRECT list and Hard Constraints

Addresses review comments by cubic-dev-ai[bot] on PR #2780
2026-03-24 09:57:30 +09:00
YeonGyu-Kim
230ce835e5 fix: resolve 3 bugs - subagent model override, empty plan completion, deep task refusal
- #2741: Pass inheritedModel as fallback in subagent-resolver when user hasn't
  configured an override, ensuring custom provider models take priority
- #2648: Fix getPlanProgress to treat plans with 0 checkboxes as incomplete
  instead of complete (total > 0 && completed === total)
- #2779: Relax Hephaestus single-task guard to accept multi-step sub-tasks
  from Atlas delegation, only rejecting genuinely independent tasks

Fixes #2741, fixes #2648, fixes #2779
2026-03-24 09:45:11 +09:00
YeonGyu-Kim
10e56badb3 Merge pull request #2776 from code-yeongyu/fix/background-agent-timeout-defaults
fix: stabilize background-agent stale timeout tests (Date.now race condition)
2026-03-24 03:29:35 +09:00
YeonGyu-Kim
cddf78434c Merge pull request #2770 from code-yeongyu/fix/ci-test-timeout
fix: add fetch mock to install test to prevent CI timeout
2026-03-24 03:29:23 +09:00
YeonGyu-Kim
0078b736b9 fix: stabilize stale timeout tests with fixed Date.now()
Tests 'should use default timeout when config not provided' (manager.test.ts)
and 'should use DEFAULT_MESSAGE_STALENESS_TIMEOUT_MS when not configured'
(task-poller.test.ts) failed in CI because Date.now() drifted between
test setup (when creating timestamps like Date.now() - 46*60*1000) and
actual execution inside checkAndInterruptStaleTasks().

On slower CI machines, this drift pushed borderline values across
the threshold, causing tasks that should be stale to remain 'running'.

Fix: Mock Date.now with spyOn to return a fixed time, ensuring
consistent timeout calculations regardless of execution speed.
2026-03-23 22:17:03 +09:00
MoerAI
6d7f69625b fix: update stale timeout test fixtures for new 45/60 min defaults 2026-03-23 21:00:59 +09:00
MoerAI
fda17dd161 fix(background-agent): increase default stale timeouts and improve cancellation messages (fixes #2684) 2026-03-23 20:49:43 +09:00
MoerAI
aaaeb6997c fix(ralph-loop): add semantic completion detection as fallback for natural language (fixes #2489) 2026-03-23 20:46:10 +09:00
MoerAI
c41d6fd912 fix(delegate-task): trust user-configured category models without fuzzy validation (fixes #2740) 2026-03-23 20:39:47 +09:00
MoerAI
2f801f6c28 fix(hooks): add bash-file-read-guard to warn agents against cat/head/tail (fixes #2096) 2026-03-23 20:15:20 +09:00
YeonGyu-Kim
6e9128e060 fix: add fetch mock to install test to prevent CI timeout
The first test case 'non-TUI mode: should show warning but continue when
OpenCode binary not found' was missing a globalThis.fetch mock, causing it
to make a real HTTP request to npm registry via fetchNpmDistTags().
The npm fetch timeout (5s) collided with the test timeout (5s), causing
flaky CI failures.

Added the same fetch mock pattern already used by the other two test cases.
Test runtime dropped from 5000ms+ to ~2ms.
2026-03-23 20:03:45 +09:00
MoerAI
92509d8cfb fix(non-interactive-env): detect shell type for csh/tcsh env var syntax (fixes #2089) 2026-03-23 19:33:54 +09:00
YeonGyu-Kim
331f7ec52b Merge pull request #2768 from code-yeongyu/fix/issue-2117
fix: emit formatter events from hashline-edit tool (fixes #2117)
2026-03-23 18:49:10 +09:00
YeonGyu-Kim
4ba2da7ebb fix: add tests and fix typing for formatter trigger (#2768) 2026-03-23 18:46:44 +09:00
YeonGyu-Kim
f95d3b1ef5 fix: emit formatter events from hashline-edit tool (fixes #2117) 2026-03-23 18:40:27 +09:00
YeonGyu-Kim
d5d7c7dd26 Merge pull request #2767 from code-yeongyu/fix/issue-2742
fix: respect disabled_tools config in agent prompts (fixes #2742)
2026-03-23 18:39:51 +09:00
YeonGyu-Kim
6a56c0e241 Merge pull request #2766 from code-yeongyu/fix/issue-390
fix: trigger compaction before continue after session error recovery (fixes #390)
2026-03-23 18:39:50 +09:00
YeonGyu-Kim
94c234c88c Merge pull request #2765 from code-yeongyu/fix/issue-2024
fix: skip keyword injection for non-OMO agents (fixes #2024)
2026-03-23 18:39:48 +09:00
YeonGyu-Kim
2ab976c511 Merge pull request #2764 from code-yeongyu/fix/issue-2624
fix: add oh-my-openagent.jsonc config file detection (fixes #2624)
2026-03-23 18:39:46 +09:00
YeonGyu-Kim
dc66088483 Merge pull request #2763 from code-yeongyu/fix/issue-2037
fix: respect OPENCODE_DISABLE_CLAUDE_CODE env vars (fixes #2037)
2026-03-23 18:39:45 +09:00
YeonGyu-Kim
67b5f46a7c Merge pull request #2762 from code-yeongyu/fix/issue-2150
fix: clarify Prometheus file permission error message (fixes #2150)
2026-03-23 18:39:43 +09:00
YeonGyu-Kim
0e483d27ac Merge pull request #2761 from code-yeongyu/fix/issue-2729
fix: validate serverUrl port before tmux pane spawn (fixes #2729)
2026-03-23 18:39:41 +09:00
YeonGyu-Kim
f5eaa648e9 fix: respect disabled_tools config in agent prompts (fixes #2742)
- Check disabled_tools for 'question' in tool-config-handler permission logic
- Strip Question tool code examples from Prometheus prompts when disabled
- Pass disabled_tools through prometheus agent config builder pipeline
- Add tests for disabled_tools question permission handling
2026-03-23 18:13:38 +09:00
YeonGyu-Kim
4c4760a4ee fix: trigger compaction before continue after session error recovery (fixes #390) 2026-03-23 18:12:51 +09:00
YeonGyu-Kim
7f20dd6ff5 fix: add oh-my-openagent.jsonc config file detection (fixes #2624) 2026-03-23 18:11:01 +09:00
YeonGyu-Kim
de371be236 fix: skip keyword injection for non-OMO agents (fixes #2024) 2026-03-23 18:10:44 +09:00
YeonGyu-Kim
f3c2138ef4 fix: respect OPENCODE_DISABLE_CLAUDE_CODE env vars (fixes #2037) 2026-03-23 18:10:08 +09:00
YeonGyu-Kim
0810e37240 fix: validate serverUrl port before tmux pane spawn (fixes #2729) 2026-03-23 18:09:31 +09:00
YeonGyu-Kim
a64e364fa6 fix: clarify Prometheus file permission error message (fixes #2150) 2026-03-23 18:07:59 +09:00
MoerAI
f16d55ad95 fix: add errorName-based quota detection and strengthen test coverage 2026-03-23 15:19:09 +09:00
github-actions[bot]
d886ac701f @hunghoang3011 has signed the CLA in code-yeongyu/oh-my-openagent#2758 2026-03-23 04:28:31 +00:00
Philippe Oscar Sanoy
3c49bf3a8c Merge branch 'code-yeongyu:dev' into fix/oauth-discovery-root-fallback 2026-03-23 09:45:54 +08:00
MoerAI
29a7bc2d31 fix(plugin): display friendly name in configuration UI instead of file path (fixes #2644) 2026-03-23 10:41:37 +09:00
MoerAI
11f1d71c93 fix(prometheus): respect agent model override instead of using global opencode.json model (fixes #2693) 2026-03-23 10:36:59 +09:00
MoerAI
62d2704009 fix(runtime-fallback): detect prettified quota errors without HTTP status codes (fixes #2747) 2026-03-23 10:34:22 +09:00
Sami Jawhar
db32bad004 fix(look-at): respect configured multimodal-looker model instead of overriding via dynamic fallback 2026-03-23 01:12:24 +00:00
Sami Jawhar
5777bf9894 fix(atlas): restore agent mismatch guard for subagent boulder continuation (#18681) 2026-03-23 01:04:36 +00:00
github-actions[bot]
30dc50d880 @0xYiliu has signed the CLA in code-yeongyu/oh-my-openagent#2738 2026-03-21 23:05:07 +00:00
github-actions[bot]
b17e633464 @ndaemy has signed the CLA in code-yeongyu/oh-my-openagent#2734 2026-03-21 10:18:31 +00:00
ndaemy
07ea8debdc fix(keyword-detector): remove duplicate separator from ultrawork templates 2026-03-21 19:09:51 +09:00
YeonGyu-Kim
eec268ee42 fix: use find() instead of calls[0] in wakeGateway test to handle background fetch calls 2026-03-21 18:01:39 +09:00
github-actions[bot]
363661c0d6 @whackur has signed the CLA in code-yeongyu/oh-my-openagent#2733 2026-03-21 05:27:27 +00:00
PR Bot
0d52519293 feat: upgrade MiniMax from M2.5 to M2.7 and expand to more agents/categories
- Upgrade minimax-m2.5 → minimax-m2.7 (latest model) across all agents and categories
- Replace minimax-m2.5-free with minimax-m2.7-highspeed (optimized speed variant)
- Expand MiniMax fallback coverage to atlas, sisyphus-junior, writing, and unspecified-low
- Add isMiniMaxModel() detection function in types.ts for model family detection
- Update all tests (58 passing) and documentation
2026-03-21 01:29:53 +08:00
Casey Howard
031503bb8c test(background-agent): add regression tests for spawn budget decrement on task completion
Tests prove rootDescendantCounts is never decremented on task completion,
cancellation, or error — making maxDescendants a lifetime quota instead of
a concurrent-active cap. All 4 tests fail (RED phase) before the fix.

Refs: code-yeongyu/oh-my-openagent#2700
2026-03-20 12:52:06 -04:00
Casey Howard
5986583641 fix(background-agent): decrement spawn budget on task completion, cancellation, error, and interrupt
rootDescendantCounts was incremented on every spawn but never decremented
when tasks reached terminal states (completed, cancelled, error, interrupt,
stale-pruned). This made maxDescendants=50 a session-lifetime quota instead
of its intended semantics as a concurrent-active agent cap.

Fix: add unregisterRootDescendant() in five terminal-state handlers:
- tryCompleteTask(): task completes successfully
- cancelTask(): running task cancelled (wasRunning guard prevents
  double-decrement for pending tasks already handled by
  rollbackPreStartDescendantReservation)
- session.error handler: task errors
- promptAsync catch (startTask): task interrupted on launch
- promptAsync catch (resume): task interrupted on resume
- onTaskPruned callback: stale task pruned (wasPending guard)

Fixes: code-yeongyu/oh-my-openagent#2700
2026-03-20 12:51:21 -04:00
github-actions[bot]
261bbdf4dc @nguyentamdat has signed the CLA in code-yeongyu/oh-my-openagent#2718 2026-03-20 07:34:31 +00:00
Taegeon Alan Go
10eb3a07e0 Merge branch 'dev' into fix/analyze-mode-load-skills-hint 2026-03-20 16:29:13 +09:00
Taegeon Go (Alan)
03feaa0594 fix(analyze-mode): add mandatory load_skills param hint to prevent delegate_task errors 2026-03-20 16:26:17 +09:00
YeonGyu-Kim
8aec4c5cb3 feat(hooks/todo-continuation-enforcer): enhance continuation message with skeptical verification guidance 2026-03-20 16:13:02 +09:00
YeonGyu-Kim
16cbc847ac fix(cli/run): set OPENCODE_CLIENT to 'run' to exclude question tool from registry 2026-03-20 16:12:58 +09:00
YeonGyu-Kim
436ce71dc8 docs(skills/github-triage): fix Phase 1 JSON parsing and large repo handling 2026-03-20 16:12:54 +09:00
MoerAI
3773e370ec fix(runtime-fallback): detect bare 429 rate-limit signals (fixes #2677) 2026-03-20 11:00:00 +09:00
MoerAI
23a30e86f2 fix(windows): resolve symlinked config paths for plugin detection (fixes #2271) 2026-03-20 10:44:19 +09:00
MoerAI
d2d65fbf99 fix(sisyphus): block premature implementation before context is complete (fixes #2274) 2026-03-20 10:16:30 +09:00
MoerAI
0e610a72bc fix(runtime-fallback): make fallback provider selection provider-agnostic (fixes #2303) 2026-03-20 09:53:24 +09:00
github-actions[bot]
d2a49428b9 @tonymfer has signed the CLA in code-yeongyu/oh-my-openagent#2701 2026-03-19 17:14:04 +00:00
Tony Park
04637ff0f1 fix(lsp): wrap initialization config in initializationOptions field
The LSP `initialize` request expects custom server options in the
`initializationOptions` field, but the code was spreading
`this.server.initialization` directly into the root params object.
This caused LSP servers that depend on `initializationOptions`
(like ets-language-server, pyright, etc.) to not receive their
configuration.

Closes #2665

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 02:11:54 +09:00
github-actions[bot]
c3b23bf603 @trafgals has signed the CLA in code-yeongyu/oh-my-openagent#2690 2026-03-19 04:22:43 +00:00
YeonGyu-Kim
50094de73e docs: fix remaining AGENTS hook composition text
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-19 12:02:52 +09:00
YeonGyu-Kim
3aa2748c04 docs: sync hook counts after continuation hook removal
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-19 12:02:52 +09:00
YeonGyu-Kim
ccaf759b6b fix(hooks): remove gpt permission continuation hook
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-19 12:02:52 +09:00
YeonGyu-Kim
521a1f76a9 fix(atlas): stop only after 10 consecutive prompt failures
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-19 12:02:52 +09:00
github-actions[bot]
490f0f2090 @walioo has signed the CLA in code-yeongyu/oh-my-openagent#2688 2026-03-19 02:35:04 +00:00
YeonGyu-Kim
caf595e727 fix(build-binaries): prevent test imports from triggering binary builds
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-19 10:47:33 +09:00
YeonGyu-Kim
1f64a45113 Merge pull request #2620 from code-yeongyu/feat/openclaw-bidirectional
feat: port OpenClaw bidirectional integration from omx
2026-03-19 10:47:07 +09:00
YeonGyu-Kim
9b2dc2189c fix(ralph-loop): detect promise tags in tool_result parts for ulw verification
Oracle's <promise>VERIFIED</promise> arrives as a tool_result part from the
task() tool call, not as a text part. Both detectCompletionInSessionMessages
and collectAssistantText only scanned type=text parts, missing the
verification signal entirely. This caused ulw loops to fail verification
even when Oracle successfully emitted VERIFIED.

Include tool_result parts in promise detection alongside text parts.
Exclude tool_use parts to avoid false positives from instructional text.
2026-03-18 19:09:59 +09:00
MoerAI
071fab1618 fix: match existing codebase session.messages() parameter shape 2026-03-18 19:08:05 +09:00
YeonGyu-Kim
f6c24e42af fix(ralph-loop): detect promise tags in tool_result parts for ulw verification
Oracle's <promise>VERIFIED</promise> arrives as a tool_result part from the
task() tool call, not as a text part. Both detectCompletionInSessionMessages
and collectAssistantText only scanned type=text parts, missing the
verification signal entirely. This caused ulw loops to fail verification
even when Oracle successfully emitted VERIFIED.

Include tool_result parts in promise detection alongside text parts.
Exclude tool_use parts to avoid false positives from instructional text.
2026-03-18 19:03:30 +09:00
YeonGyu-Kim
22fd976eb9 feat(categories): change quick category default model from claude-haiku-4-5 to gpt-5.4-mini
GPT-5.4-mini provides stronger reasoning at comparable speed and cost.
Haiku remains as the next fallback priority in the chain.

Changes:
- DEFAULT_CATEGORIES quick model: anthropic/claude-haiku-4-5 → openai/gpt-5.4-mini
- Fallback chain: gpt-5.4-mini → haiku → gemini-3-flash → minimax-m2.5 → gpt-5-nano
- OpenAI-only catalog: quick uses gpt-5.4-mini directly
- Think-mode: add gpt-5-4-mini and gpt-5-4-nano high variants
- Update all documentation references
2026-03-18 19:03:30 +09:00
YeonGyu-Kim
826284f3d9 Merge pull request #2676 from code-yeongyu/fix/atlas-task-session-review-followup
fix(atlas): address review findings for task session reuse
2026-03-18 18:50:45 +09:00
YeonGyu-Kim
3c7e6a3940 fix(atlas): address review findings for task session reuse 2026-03-18 18:44:42 +09:00
YeonGyu-Kim
33ef4db502 Merge pull request #2640 from HaD0Yun/had0yun/atlas-task-session-reuse
feat(atlas): persist preferred task session reuse
2026-03-18 18:37:16 +09:00
YeonGyu-Kim
458ec06b0e fix: extract question text from questions array per opencode tool schema 2026-03-18 18:27:09 +09:00
YeonGyu-Kim
6b66f69433 feat(gpt-permission-continuation): add context-aware continuation prompts
- Add buildContextualContinuationPrompt to include assistant message context
- Move extractPermissionPhrase to detector module for better separation
- Block continuation injection in subagent sessions
- Update handler to use contextual prompts with last response context
- Add tests for subagent session blocking and contextual prompts
- Update todo coordination test to verify new prompt format

🤖 Generated with assistance of OhMyOpenCode
2026-03-18 17:52:32 +09:00
YeonGyu-Kim
ce8957e1e1 fix(ralph-loop): harden oracle verification flow
Capture oracle verification sessions more reliably and accept parent-session VERIFIED evidence so ULW loops do not retry after successful review.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-18 17:45:59 +09:00
sanoyphilippe
0d96e0d3bc Fix OAuth discovery for servers with non-root resource paths
When the resource URL has a sub-path (e.g. https://mcp.sentry.dev/mcp),
the RFC 8414 path-suffixed well-known URL may not exist. Fall back to
the root well-known URL before giving up.

This matches OpenCode core's behavior and fixes authentication for
servers like Sentry that serve OAuth metadata only at the root path.
2026-03-18 16:45:54 +08:00
MoerAI
a3db64b931 fix: address cubic review — SDK compatibility and race condition fixes 2026-03-18 17:42:17 +09:00
HaD0Yun
8859da5fef fix(atlas): harden task session reuse 2026-03-18 17:31:27 +09:00
YeonGyu-Kim
23c0ff60f2 feat(background-agent): increase default max tool calls to 4000
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-18 16:36:55 +09:00
MoerAI
4723319eef fix(atlas): use worktree path for git verification when available (fixes #2229) 2026-03-18 16:23:37 +09:00
MoerAI
b8f3186d65 fix(session): detect post-compaction no-text degradation and trigger recovery (fixes #2232) 2026-03-18 16:13:23 +09:00
YeonGyu-Kim
01e18f8773 chore: remove console.* debug logging from non-CLI source files 2026-03-18 15:29:50 +09:00
YeonGyu-Kim
1669c83782 revert(todo-continuation): remove [TODO-DIAG] console.error debug logging 2026-03-18 15:10:51 +09:00
YeonGyu-Kim
09cfd0b408 diag(todo-continuation): add comprehensive debug logging for session idle handling
Add [TODO-DIAG] console.error statements throughout the todo continuation
enforcer to help diagnose why continuation prompts aren't being injected.

Changes:
- Add session.idle event handler diagnostic in handler.ts
- Add detailed blocking reason logging in idle-event.ts for all gate checks
- Update JSON schema to reflect circuit breaker config changes

🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-18 14:45:14 +09:00
YeonGyu-Kim
d48ea025f0 refactor(circuit-breaker): replace sliding window with consecutive call detection
Switch background task loop detection from percentage-based sliding window
(80% of 20-call window) to consecutive same-tool counting. Triggers when
same tool signature is called 20+ times in a row; a different tool resets
the counter.
2026-03-18 14:32:27 +09:00
YeonGyu-Kim
c5c7ba4eed perf: pre-compile regex patterns and optimize hot-path string operations
- error-classifier: pre-compile default retry pattern regex
- think-mode/detector: combine multilingual patterns into single regex
- parser: skip redundant toLowerCase on pre-lowered keywords
- edit-operations: use fast arraysEqual instead of JSON comparison
- hash-computation: optimize streaming line extraction with index tracking
2026-03-18 14:19:23 +09:00
YeonGyu-Kim
90aa3a306c perf(hooks,tools): optimize string operations and reduce redundant iterations
- output-renderer, hashline-edit-diff: replace str += with array join (H2)
- auto-slash-command: single-pass Map grouping instead of 6x filter (M1)
- comment-checker: hoist Zod schema to module scope (M2)
- session-last-agent: reverse iterate sorted array instead of sort+reverse (L2)
2026-03-18 14:19:12 +09:00
YeonGyu-Kim
c2f7d059d2 perf(shared): optimize hot-path utilities across plugin
- task-list: replace O(n³) blocker resolution with Map lookup (C4)
- logger: buffer log entries and flush periodically to reduce sync I/O (C5)
- plugin-interface: create chatParamsHandler once at init (H3)
- pattern-matcher: cache compiled RegExp for wildcard matchers (H6)
- file-reference-resolver: use replaceAll instead of split/join (M9)
- connected-providers-cache: add in-memory cache for read operations (L4)
2026-03-18 14:19:00 +09:00
YeonGyu-Kim
7a96a167e6 perf(claude-code-hooks): defer config loading until after disabled check
Move loadClaudeHooksConfig and loadPluginExtendedConfig after isHookDisabled check
in both tool-execute-before and tool-execute-after handlers to skip 5 file reads
per tool call when hooks are disabled (C1)
2026-03-18 14:18:49 +09:00
YeonGyu-Kim
2da19fe608 perf(background-agent): use Set for countedToolPartIDs, cache circuit breaker settings, optimize loop detector
- Replace countedToolPartIDs string[] with Set<string> for O(1) has/add vs O(n) includes/spread (C2)
- Cache resolveCircuitBreakerSettings at manager level to avoid repeated object creation (C3)
- Optimize recordToolCall to avoid full array copy with slice (L1)
2026-03-18 14:18:38 +09:00
YeonGyu-Kim
952bd5338d fix(background-agent): treat non-active session statuses as terminal to prevent parent session hang
Previously, pollRunningTasks() and checkAndInterruptStaleTasks() treated
any non-"idle" session status as "still running", which caused tasks with
terminal statuses like "interrupted" to be skipped indefinitely — both
for completion detection AND stale timeout. This made the parent session
hang forever waiting for an ALL COMPLETE notification that never came.

Extract isActiveSessionStatus() and isTerminalSessionStatus() that
classify session statuses explicitly. Only known active statuses
("busy", "retry", "running") protect tasks from completion/stale checks.
Known terminal statuses ("interrupted") trigger immediate completion.
Unknown statuses fall through to the standard idle/gone path with output
validation as a conservative default.

Introduced by: a0c93816 (2026-02-14), dc370f7f (2026-03-08)
2026-03-18 14:06:23 +09:00
YeonGyu-Kim
57757a345d refactor: improve test isolation and DI for cache/port-utils/resolve-file-uri
- connected-providers-cache: extract factory pattern (createConnectedProvidersCacheStore) for testable cache dir injection
- port-utils.test: environment-independent tests with real socket probing and contiguous port detection
- resolve-file-uri.test: mock homedir instead of touching real home directory
- github-triage: update SKILL.md
2026-03-18 13:17:01 +09:00
YeonGyu-Kim
3caae14192 fix(ralph-loop): abort stale Oracle sessions before ulw verification restart
When Oracle verification fails in ulw-loop mode, the previous Oracle
session was never aborted before restarting. Each retry created a new
descendant session, causing unbounded session accumulation and 500
errors from server overload.

Now abort the old verification session before:
- restarting the loop after failed verification
- re-entering verification phase on subsequent DONE detection
2026-03-18 12:49:27 +09:00
kilhyeonjun
719a58270b fix(shared): respect cached model context limits for Anthropic providers post-GA
After Anthropic's 1M context GA (2026-03-13), the beta header is no
longer sent. The existing detection relied solely on the beta header
to set anthropicContext1MEnabled, causing all Anthropic models to
fall back to the 200K default despite models.dev reporting 1M.

Update resolveActualContextLimit to check per-model cached limits
from provider config (populated from models.dev data) when the
explicit 1M flag is not set. Priority order:
1. Explicit 1M mode (beta header or env var) - all Anthropic models
2. Per-model cached limit from provider config
3. Default 200K fallback

This preserves the #2460 fix (explicit 1M flag always wins over
cached values) while allowing GA models to use their correct limits.

Fixes premature context warnings at 140K and unnecessary compaction
at 156K for opus-4-6 and sonnet-4-6 users without env var workaround.
2026-03-18 12:21:08 +09:00
YeonGyu-Kim
55ac653eaa feat(hooks): add todo-description-override hook to enforce atomic todo format
Override TodoWrite description via tool.definition hook to require
WHERE/WHY/HOW/RESULT in each todo title and enforce 1-3 tool call
granularity.
2026-03-18 11:49:13 +09:00
YeonGyu-Kim
1d5652dfa9 Merge pull request #2655 from tad-hq/infinite-circuit-target-fix
fix(circuit-breaker): make repetitive detection target-aware and add enabled escape hatch
2026-03-18 11:46:06 +09:00
YeonGyu-Kim
76c460536d docs(start-work): update worktree and task breakdown guidance
- Change worktree behavior: default to current directory, worktree only with --worktree flag
- Add mandatory TASK BREAKDOWN section with granular sub-task requirements
- Add WORKTREE COMPLETION section for merging worktree branches back

🤖 Generated with assistance of OhMyOpenCode
2026-03-18 11:16:43 +09:00
github-actions[bot]
b067d4a284 @ogormans-deptstack has signed the CLA in code-yeongyu/oh-my-openagent#2656 2026-03-17 20:42:53 +00:00
github-actions[bot]
94838ec039 @tad-hq has signed the CLA in code-yeongyu/oh-my-openagent#2655 2026-03-17 20:07:20 +00:00
tad-hq
224ecea8c7 chore: regenerate JSON schema with circuitBreaker.enabled field 2026-03-17 13:43:56 -06:00
tad-hq
5d5755f29d fix(circuit-breaker): wire target-aware detection into background manager 2026-03-17 13:40:46 -06:00
tad-hq
1fdce01fd2 fix(circuit-breaker): target-aware loop detection via tool signatures 2026-03-17 13:36:09 -06:00
tad-hq
c8213c970e fix(circuit-breaker): add enabled config flag as escape hatch 2026-03-17 13:29:06 -06:00
YeonGyu-Kim
576ff453e5 Merge pull request #2651 from code-yeongyu/fix/openagent-version-in-publish
fix(release): set version when publishing oh-my-openagent
2026-03-18 02:15:36 +09:00
YeonGyu-Kim
9b8aca45f9 fix(release): set version when publishing oh-my-openagent
The publish step was updating name and optionalDependencies but not
version, causing npm to try publishing the base package.json version
(3.11.0) instead of the release version (3.12.0).

Error was: 'You cannot publish over the previously published versions: 3.11.0'
2026-03-18 02:15:15 +09:00
YeonGyu-Kim
f1f20f5a79 Merge pull request #2650 from code-yeongyu/fix/openagent-platform-publish
fix(release): add oh-my-openagent dual-publish to platform and main workflows
2026-03-18 01:55:31 +09:00
YeonGyu-Kim
de40caf76d fix(release): add oh-my-openagent dual-publish to platform and main workflows
- publish-platform.yml: Build job now checks BOTH oh-my-opencode and
  oh-my-openagent before skipping. Build only skips when both are published.
  Added 'Publish oh-my-openagent-{platform}' step that renames package.json
  and publishes under the openagent name.

- publish.yml: Added 'Publish oh-my-openagent' step after opencode publish.
  Rewrites package name and optionalDependencies to oh-my-openagent variants,
  then publishes. Restores package.json after.

Previously, oh-my-openagent platform packages were never published because
the build skip check only looked at oh-my-opencode (which was already published),
causing the entire build to be skipped.
2026-03-18 01:45:02 +09:00
Ravi Tharuma
71b1f7e807 fix(anthropic-effort): clamp variant against mutable request message 2026-03-17 11:57:56 +01:00
HaD0Yun
8adf6a2c47 fix(atlas): tighten session reuse metadata parsing 2026-03-17 18:14:17 +09:00
github-actions[bot]
d80833896c @HaD0Yun has signed the CLA in code-yeongyu/oh-my-openagent#2640 2026-03-17 08:27:56 +00:00
HaD0Yun
5c6194372e feat(atlas): persist preferred task session reuse 2026-03-17 17:25:46 +09:00
YeonGyu-Kim
399796cbe4 fix(openclaw): add comment clarifying proc.exited race condition avoidance
cubic identified potential race condition where Bun's proc.exitCode
may be null immediately after stdout closes. Added clarifying
comment that await proc.exited ensures exitCode is set before
checking.

fixes: cubic review on PR #2620
2026-03-17 17:14:52 +09:00
YeonGyu-Kim
77c3ed1a1f chore: remove omx state files and add .omx/ to gitignore 2026-03-17 17:00:29 +09:00
YeonGyu-Kim
82e25c845b fix: address cubic re-review — remove non-existent session.stop event, fix env var fallback test 2026-03-17 17:00:18 +09:00
YeonGyu-Kim
d50c38f037 refactor(tests): rename benchmarks/ to tests/hashline/, remove FriendliAI dependency
- Move benchmarks/ → tests/hashline/
- Replace @friendliai/ai-provider with @ai-sdk/openai-compatible
- Remove all 'benchmark' naming (package name, scripts, env vars, session IDs)
- Fix import paths for new directory depth (../src → ../../src)
- Fix pre-existing syntax error in headless.ts (unclosed case block)
- Inject HASHLINE_EDIT_DESCRIPTION into test system prompt
- Scripts renamed: bench:* → test:*
2026-03-17 16:47:13 +09:00
YeonGyu-Kim
f2d5f4ca92 improve(hashline-edit): rewrite tool description with examples and fix lines schema
- Add XML-structured description (<must>, <operations>, <examples>, <auto>)
- Add 5 concrete examples including BAD pattern showing duplication
- Add explicit anti-duplication warning for range replace
- Move snapshot rule to top-level <must> section
- Clarify batch semantics (multiple ops, not one big replace)
- Fix lines schema: add string[] to union (was string|null, now string[]|string|null)
- Matches runtime RawHashlineEdit type and description text
2026-03-17 16:47:13 +09:00
YeonGyu-Kim
b788586caf relax task timeouts: stale timeout 3min→20min, session wait 30s→1min 2026-03-17 16:47:13 +09:00
YeonGyu-Kim
90351e442e update look_at tool description to discourage visual precision use cases 2026-03-17 16:47:13 +09:00
YeonGyu-Kim
4ad88b2576 feat(task-toast): show model name before category in toast notification
Display resolved model ID (e.g., gpt-5.3-codex: deep) instead of
agent/category format when modelInfo is available. Falls back to
old format when no model info exists.
2026-03-17 16:47:13 +09:00
YeonGyu-Kim
2ce69710e3 docs: sync agent-model-matching guide with actual fallback chains
- Metis: add missing GPT-5.4 high as 2nd fallback
- Hephaestus: add GPT-5.4 (Copilot) fallback, was incorrectly listed as Codex-only
- Oracle: add opencode-go/glm-5 as last fallback
- Momus: add opencode-go/glm-5 fallback, note xhigh variant
- Atlas: add GPT-5.4 medium as 3rd fallback
- Sisyphus: add Kimi K2.5 (moonshot providers) in chain
- Sisyphus-Junior: add missing agent to Utility Runners section
- GPT Family table: merge duplicate GPT-5.4 rows
- Categories: add missing opencode-go intermediate fallbacks for
  visual-engineering, ultrabrain, quick, unspecified-low/high, writing
2026-03-17 16:47:13 +09:00
YeonGyu-Kim
0b4d092cf6 Merge pull request #2639 from code-yeongyu/feature/2635-smart-circuit-breaker
feat(background-agent): add smart circuit breaker for repeated tool calls
2026-03-17 16:43:08 +09:00
YeonGyu-Kim
53285617d3 Merge pull request #2636 from code-yeongyu/fix/pre-publish-blockers
fix: resolve 12 pre-publish blockers (security, correctness, migration)
2026-03-17 16:36:04 +09:00
YeonGyu-Kim
ae3befbfbe fix(background-agent): apply smart circuit breaker to manager events
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-17 16:31:55 +09:00
YeonGyu-Kim
dc1a05ac3e feat(background-agent): add loop detector helpers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-17 16:31:55 +09:00
YeonGyu-Kim
e271b4a1b0 feat(config): add background task circuit breaker settings
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-17 16:31:55 +09:00
YeonGyu-Kim
fee938d63a fix(cli): cherry-pick glm-4.7-free → gpt-5-nano fallback fix from dev 2026-03-17 16:30:12 +09:00
YeonGyu-Kim
4d74d888e4 Merge pull request #2637 from code-yeongyu/fix/ulw-verification-session-tracking
fix(ulw-loop): add fallback for Oracle verification session tracking
2026-03-17 16:25:28 +09:00
YeonGyu-Kim
4bc7b1d27c fix(ulw-loop): add fallback for Oracle verification session tracking
The verification_session_id was never reliably set because the
prompt-based attempt_id matching in tool-execute-after depends on
metadata.prompt surviving the delegate-task execution chain. When
this fails silently, the loop never detects Oracle's VERIFIED
emission.

Add a fallback: when exact attempt_id matching fails but oracle
agent + verification_pending state match, still set the session ID.
Add diagnostic logging to trace verification flow failures.
Add integration test covering the full verification chain.
2026-03-17 16:21:40 +09:00
YeonGyu-Kim
78dac0642e Merge pull request #2590 from MoerAI/fix/subagent-circuit-breaker
fix(background-agent): add circuit breaker to prevent subagent infinite loops (fixes #2571)
2026-03-17 16:09:29 +09:00
YeonGyu-Kim
92bc72a90b fix(bun-install): use workspaceDir option instead of hardcoded cache-dir 2026-03-17 16:05:51 +09:00
YeonGyu-Kim
a7301ba8a9 fix(delegate-task): guard skipped sentinel in subagent-resolver 2026-03-17 15:57:23 +09:00
YeonGyu-Kim
e9887dd82f fix(doctor): align auto-update and doctor config paths 2026-03-17 15:56:02 +09:00
YeonGyu-Kim
c0082d8a09 Merge pull request #2634 from code-yeongyu/fix/run-in-background-required
fix(delegate-task): remove auto-default for run_in_background, require explicit parameter
2026-03-17 15:55:17 +09:00
YeonGyu-Kim
fbc3b4e230 Merge pull request #2612 from MoerAI/fix/dead-fallback-model
fix(cli): replace dead glm-4.7-free with gpt-5-nano as ultimate fallback (fixes #2101)
2026-03-17 15:53:29 +09:00
YeonGyu-Kim
1f7fdb43ba Merge pull request #2539 from cpkt9762/fix/category-variant-no-requirement
fix(delegate-task): build categoryModel with variant for categories without fallback chain
2026-03-17 15:53:11 +09:00
YeonGyu-Kim
566031f4fa fix(delegate-task): remove auto-default for run_in_background, require explicit parameter
Remove the auto-defaulting logic from PR #2420 that silently set
run_in_background=false when category/subagent_type/session_id was present.

The tool description falsely claimed 'Default: false' which misled agents
into omitting the parameter. Now the description says REQUIRED and the
validation always throws when the parameter is missing, with a clear
error message guiding the agent to retry with the correct value.

Reverts the behavioral change from #2420 while keeping the issue's
root cause (misleading description) fixed.
2026-03-17 15:49:47 +09:00
YeonGyu-Kim
0cf386ec52 fix(skill-tool): invalidate cached skill description on execute 2026-03-17 15:49:26 +09:00
YeonGyu-Kim
d493f9ec3a fix(cli-run): move resolveRunModel inside try block 2026-03-17 15:49:26 +09:00
YeonGyu-Kim
2c7ded2433 fix(background-agent): defer task cleanup while siblings running 2026-03-17 15:17:34 +09:00
YeonGyu-Kim
82c7807a4f fix(event): clear retry dedupe key on non-retry status 2026-03-17 15:17:34 +09:00
YeonGyu-Kim
df7e1ae16d fix(todo-continuation): remove activity-based stagnation bypass 2026-03-17 15:17:34 +09:00
YeonGyu-Kim
0471078006 fix(tmux): escape serverUrl in pane shell commands 2026-03-17 15:16:54 +09:00
YeonGyu-Kim
1070b9170f docs: remove temporary injury notice from README 2026-03-17 10:41:56 +09:00
acamq
bb312711cf Merge pull request #2618 from RaviTharuma/fix/extract-status-code-nested-errors
fix(runtime-fallback): extract status code from nested AI SDK errors
2026-03-16 16:28:31 -06:00
github-actions[bot]
c31facf41e @gxlife has signed the CLA in code-yeongyu/oh-my-openagent#2625 2026-03-16 15:17:21 +00:00
YeonGyu-Kim
c644930753 Fix OpenClaw review issues 2026-03-16 22:28:54 +09:00
YeonGyu-Kim
b79df5e018 feat: port OpenClaw bidirectional integration from omx
Ports the complete OpenClaw integration system from oh-my-codex:

Outbound (opencode→OpenClaw):
- wakeOpenClaw() fire-and-forget gateway notifications
- HTTP and command gateway dispatchers
- Template variable interpolation
- Config from oh-my-opencode.jsonc (no env gate needed)

Inbound (OpenClaw→opencode):
- Reply listener daemon (Discord/Telegram polling)
- Session registry for message↔tmux pane correlation
- Tmux pane detection, content capture, and text injection
- Input sanitization and rate limiting
- Pane verification before injection

Files:
- src/openclaw/ (types, config, dispatcher, index, reply-listener, session-registry, tmux, daemon)
- src/config/schema/openclaw.ts (Zod v4 schema)
- src/hooks/openclaw.ts (session hook)
- Tests: 12 pass (config + dispatcher)
2026-03-16 21:55:10 +09:00
Ravi Tharuma
de66f1f397 fix(runtime-fallback): prefer numeric status codes over non-numeric in extraction chain
The nullish-coalescing chain could stop at a non-numeric value (e.g.
status: "error"), preventing deeper nested numeric statusCode values
from being reached. Switch to Array.find() with a type guard to always
select the first numeric value.

Adds 11 tests for extractStatusCode covering: top-level, nested
(data/error/cause), non-numeric skip, fallback to regex, and
precedence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 13:51:23 +01:00
YeonGyu-Kim
427fa6d7a2 Merge pull request #2619 from code-yeongyu/revert/openclaw-one-way
revert: remove one-way OpenClaw integration
2026-03-16 21:09:30 +09:00
YeonGyu-Kim
239da8b02a Revert "Merge pull request #2607 from code-yeongyu/feat/openclaw-integration"
This reverts commit 8213534e87, reversing
changes made to 84fb1113f1.
2026-03-16 21:09:08 +09:00
YeonGyu-Kim
17244e2c84 Revert "Merge pull request #2609 from code-yeongyu/fix/rename-omx-to-omo-env"
This reverts commit 4759dfb654, reversing
changes made to 8213534e87.
2026-03-16 21:09:08 +09:00
Ravi Tharuma
24a0f7b032 fix(runtime-fallback): extract status code from nested AI SDK errors
AI SDK wraps HTTP status codes inside error.error.statusCode (e.g., AI_APICallError). The current extractStatusCode only checks the top level, missing these nested codes.

This caused runtime-fallback to skip retryable errors like 400, 500, 504 because it couldn't find the status code.

Fixes #2617
2026-03-16 13:04:14 +01:00
MoerAI
fc48df1d53 fix(cli): replace dead glm-4.7-free with gpt-5-nano as ultimate fallback
The opencode/glm-4.7-free model was removed from the OpenCode platform,
causing the ULTIMATE_FALLBACK in the CLI installer to point to a dead
model. Users installing OMO without any major provider configured would
get a non-functional model assignment.

Replaced with opencode/gpt-5-nano which is confirmed available per
user reports and existing fallback chains in model-requirements.ts.

Fixes #2101
2026-03-16 19:21:10 +09:00
MoerAI
6455b851b8 fix(config): keep default OpenCode Build agent enabled by default
The default_builder_enabled config defaults to false, which removes
the default OpenCode Build agent on OMO install. This forces users
into the full OMO orchestration for every task, including simple ones
where the lightweight Build agent would be more appropriate.

Changed the default to true so the Build agent remains available
alongside Sisyphus. Users who prefer the previous behavior can set
default_builder_enabled: false in their config.

Fixes #2545
2026-03-16 19:18:46 +09:00
YeonGyu-Kim
4759dfb654 Merge pull request #2609 from code-yeongyu/fix/rename-omx-to-omo-env
fix: rename OMX_OPENCLAW env vars to OMO_OPENCLAW
2026-03-16 18:47:50 +09:00
YeonGyu-Kim
2c8813e95d fix: rename OMX_OPENCLAW env vars to OMO_OPENCLAW
Renames all environment variable gates from the old oh-my-codex (OMX) prefix
to the correct oh-my-openagent (OMO) prefix:

- OMX_OPENCLAW -> OMO_OPENCLAW
- OMX_OPENCLAW_COMMAND -> OMO_OPENCLAW_COMMAND
- OMX_OPENCLAW_DEBUG -> OMO_OPENCLAW_DEBUG
- OMX_OPENCLAW_COMMAND_TIMEOUT_MS -> OMO_OPENCLAW_COMMAND_TIMEOUT_MS

Adds TDD tests verifying:
- OMO_OPENCLAW=1 is required for activation
- Old OMX_OPENCLAW env var is not accepted
2026-03-16 18:45:34 +09:00
YeonGyu-Kim
8213534e87 Merge pull request #2607 from code-yeongyu/feat/openclaw-integration
feat: implement OpenClaw integration
2026-03-16 17:48:11 +09:00
YeonGyu-Kim
450685f5ea fix: extract session ID from properties.info.id for session.created/deleted events 2026-03-16 17:38:47 +09:00
YeonGyu-Kim
03b346ba51 feat: implement OpenClaw integration
Ports the OMX OpenClaw module into oh-my-openagent as a first-class integration.
This integration allows forwarding internal events (session lifecycle, tool execution) to external gateways (HTTP or command-based).

- Added `src/openclaw` directory with implementation:
  - `dispatcher.ts`: Handles HTTP/Command dispatching with interpolation
  - `types.ts`: TypeScript definitions
  - `client.ts`: Main entry point `wakeOpenClaw`
  - `index.ts`: Public API
- Added `src/config/schema/openclaw.ts` for Zod schema validation
- Updated `src/config/schema/oh-my-opencode-config.ts` to include `openclaw` config
- Added `src/hooks/openclaw-sender/index.ts` to listen for events
- Registered the hook in `src/plugin/hooks/create-session-hooks.ts`
- Added unit tests in `src/openclaw/__tests__`

Events handled:
- `session-start` (via `session.created`)
- `session-end` (via `session.deleted`)
- `session-idle` (via `session.idle`)
- `ask-user-question` (via `tool.execute.before` for `ask_user_question`)
- `stop` (via `tool.execute.before` for `stop-continuation` command)
2026-03-16 17:21:56 +09:00
Ravi Tharuma
9346bc8379 fix: clamp variant "max" to "high" for non-Opus Claude models on fallback
When an agent configured with variant: "max" falls back from Opus to
Sonnet (or Haiku), the "max" variant was passed through unchanged.
OpenCode sends this as level: "max" to the Anthropic API, which rejects
it with: level "max" not supported, valid levels: low, medium, high

The anthropic-effort hook previously only handled Opus (inject effort=max)
and skipped all other Claude models. Now it actively clamps "max" → "high"
for non-Opus Claude models and mutates message.variant so OpenCode
doesn't pass the unsupported level to the API.
2026-03-16 07:49:55 +01:00
YeonGyu-Kim
84fb1113f1 chore: add pre-publish blocker tracking document
Add FIX-BLOCKS.md to track critical and high-priority issues

identified in pre-publish reviews.

🤖 GENERATED WITH ASSISTANCE OF OhMyOpenCode
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
90decd1fd4 chore(schema): regenerate schema after hook enum forward-compat change
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
47d1ad7bb9 fix(plugin): persist ultrawork variant on same-model override and normalize Claude model IDs
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
32a296bf1e fix(auto-slash-command): use event-ID dedup, align precedence, enforce skill agent gate
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
67bb9ec1e2 fix(delegate-task): resolve variant-bearing fallback models during immediate selection
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
d57c27feee fix(tmux): replace hardcoded zsh with portable shell detection
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
YeonGyu-Kim
1339ecdd13 fix(hashline): restore v3.11.2 legacy hash computation for backward compatibility
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-openagent)
2026-03-16 14:15:36 +09:00
github-actions[bot]
8c4fa47e5e @sanoyphilippe has signed the CLA in code-yeongyu/oh-my-openagent#2604 2026-03-16 04:55:22 +00:00
github-actions[bot]
10e0c7f997 @Jrakru has signed the CLA in code-yeongyu/oh-my-openagent#2602 2026-03-16 03:40:45 +00:00
YeonGyu-Kim
48707a6901 test(tmux): isolate tmux environment checks from process env
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 11:37:56 +09:00
YeonGyu-Kim
fe3f0584ed test(skill-loader): avoid node:fs mock leakage in project skill references
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 11:37:56 +09:00
acamq
1cfc1c8a8b Merge pull request #2596 from cyberprophet/fix/doctor-plugin-version-fallback
fix(doctor): fall back to loadedVersion when pluginVersion is null
2026-03-15 20:22:10 -06:00
acamq
8401e61260 Merge pull request #2597 from code-yeongyu/fix/todo-compaction-only-guard
fix(todo-continuation-enforcer): skip continuation when only compaction messages exist
2026-03-15 20:20:20 -06:00
acamq
085ca0abcb Merge pull request #2598 from code-yeongyu/revert-2582-fix/fix-install-test
Revert "fix(test): update package name to oh-my-openagent in install test"
2026-03-15 20:09:25 -06:00
MoerAI
7e3c36ee03 ci: retrigger CI 2026-03-16 11:08:14 +09:00
MoerAI
11d942f3a2 fix(runtime-fallback): detect Gemini quota errors in session.status retry events
When Gemini returns a quota exhausted error, OpenCode auto-retries and
fires session.status with type='retry'. The extractAutoRetrySignal
function requires BOTH 'retrying in' text AND a quota pattern to match,
but some providers (like Gemini) include only the error text in the
retry message without the 'retrying in' phrase.

Since status.type='retry' already confirms this is a retry event, the
fix adds a fallback check: if extractAutoRetrySignal fails, check the
message directly against RETRYABLE_ERROR_PATTERNS. This ensures quota
errors like 'exhausted your capacity' trigger the fallback chain even
when the retry message format differs from expected.

Fixes #2454
2026-03-16 11:08:14 +09:00
MoerAI
3055454ecc fix(background-agent): add circuit breaker to prevent subagent infinite loops
Adds a configurable maxToolCalls limit (default: 200) that automatically
cancels background tasks when they exceed the threshold. This prevents
runaway subagent loops from burning unlimited tokens, as reported in #2571
where a Gemini subagent ran 809 consecutive tool calls over 3.5 hours
costing ~$350.

The circuit breaker triggers in the existing tool call tracking path
(message.part.updated/delta events) and cancels the task with a clear
error message explaining what happened. The limit is configurable via
background_task.maxToolCalls in oh-my-opencode.jsonc.

Fixes #2571
2026-03-16 11:07:33 +09:00
MoerAI
2b6b08345a fix(todo-continuation-enforcer): add plan agent to DEFAULT_SKIP_AGENTS to prevent infinite loop
The todo-continuation-enforcer injects continuation prompts when
sessions go idle with pending todos. When Plan Mode agents (which are
read-only) create todo items, the continuation prompt contradicts
Plan Mode's STRICTLY FORBIDDEN directive, causing an infinite loop
where the agent acknowledges the conflict then goes idle, triggering
another injection.

Adding 'plan' to DEFAULT_SKIP_AGENTS prevents continuation injection
into Plan Mode sessions, matching the same exclusion pattern already
used for prometheus and compaction agents.

Fixes #2526
2026-03-16 11:07:28 +09:00
acamq
a7800a8bf6 Revert "fix(test): update package name to oh-my-openagent in install test" 2026-03-15 20:06:55 -06:00
MoerAI
abdd39da00 fix(agent-registration): always attempt fallback when model resolution fails
Removes both the isFirstRunNoCache and override?.model guards from
the fallback logic in collectPendingBuiltinAgents(). Previously, when
a user configured a model like minimax/MiniMax-M2.5 that wasn't in
availableModels, the agent was silently excluded and --agent Librarian
would crash with 'undefined is not an object'.

Now: if applyModelResolution() fails for ANY reason (cache state,
unavailable model, config merge issue), getFirstFallbackModel() is
always attempted. A log warning is emitted when a user-configured
model couldn't be resolved, making the previously silent failure
visible.

Supersedes #2517
Fixes #2427
2026-03-16 11:06:00 +09:00
acamq
9e7abe2dea fix(todo-continuation-enforcer): skip continuation for compaction-only message history 2026-03-15 20:02:56 -06:00
cyberprophet
5b7ca99b96 fix(doctor): fall back to loadedVersion when pluginVersion is null 2026-03-16 11:00:05 +09:00
YeonGyu-Kim
f31f50abec fix(release): revert package identity to oh-my-opencode
Keep installer, config detection, schema generation, and publish workflows aligned with the long-lived oh-my-opencode package so this release does not split across two npm names.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
612b9c163d fix(config): clear stale context limit cache on provider updates
Rebuilding provider model limits prevents removed entries from leaking into later compaction decisions after config changes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
16b0d9eb77 fix(atlas): gate final-wave approval on real plan state
Ignore nested plan checkboxes and track parallel final-wave approvals so Atlas only pauses for user approval when the real top-level review wave is complete.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
1ad5db4e8b fix(runtime-fallback): advance session.status fallback chain
Allow provider cooldown events to override a pending fallback retry so runtime fallback can keep progressing instead of stalling on the same model.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
988478a0fa fix(config): allow forward-compatible disabled hooks
Keep disabled_hooks aligned with runtime behavior by accepting unknown hook names instead of treating future entries as schema errors.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
e87075b9a4 fix(background-task): restore opt-in full session output
Bring background_output back to the legacy contract so callers only get full session transcripts when they explicitly ask for them.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
fe4493c6a6 fix(model-fallback): keep model fallback opt-in by default
Restore the runtime default that was introduced for model fallback so unset config no longer enables automatic retries unexpectedly.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
7f7527047e fix(cli): validate and detect OpenCode Go install settings
Reject invalid --opencode-go values during non-TUI installs and detect existing OpenCode Go usage from the generated oh-my-opencode config so updates preserve the right defaults.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
532995bb51 fix(model-fallback): align OpenAI fallback resolution across CLI and runtime
Keep install-time and runtime model tables in sync, stop OpenAI-only misrouting when OpenCode Go is present, and add valid OpenAI fallbacks for atlas, metis, and sisyphus-junior.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:55 +09:00
YeonGyu-Kim
b63082a3bb fix(skills): correct invalid task tool references 2026-03-16 10:38:54 +09:00
YeonGyu-Kim
674df1b1b8 fix(hooks): remove dead delegate-task-english-directive hook 2026-03-16 10:38:54 +09:00
YeonGyu-Kim
2b8ae214b6 fix(auto-slash-command): expire duplicate suppression after 30s
Allow legitimate repeated slash commands in long sessions by replacing session-lifetime dedup with a short-lived TTL cache.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:54 +09:00
YeonGyu-Kim
bbd2e86499 fix(hashline): accept legacy hashes for indented anchors
Keep persisted LINE#ID anchors working after strict whitespace hashing by falling back to the legacy hash for validation-only lookups.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-16 10:38:54 +09:00
acamq
f03de4f8a8 Merge pull request #2535 from conversun/fix/prometheus-compaction-agent-fallback
fix(todo-continuation-enforcer): prevent post-compaction agent fallback to General
2026-03-15 19:29:34 -06:00
acamq
65ccc9b854 Merge pull request #2588 from acamq/refactor/doctor-lsp-extensions
refactor(doctor): show detected LSP extensions instead of hardcoded server counts
2026-03-15 19:12:48 -06:00
acamq
85d812964b chore: remove unused LspServerInfo type 2026-03-15 19:09:47 -06:00
acamq
da788d3906 fix(doctor): remove redundant extensions from verbose LSP header
The header line was showing all extensions unioned together which was
redundant with the per-server detail lines below and caused line overflow.
Status mode also simplified to just show server count.
2026-03-15 19:02:21 -06:00
acamq
03da2e94a2 refactor(doctor): show detected LSP servers and extensions instead of hardcoded counts
Replace the hardcoded 4-server list in doctor LSP check with getAllServers()
from server-resolution.ts, which covers all 40+ builtin servers plus user
config. Output now shows server count with supported extensions, and verbose
mode expands to per-server detail lines.

Status:  LSP 3 servers (.go, .py, .pyi, .ts, .tsx)
Verbose: LSP 3 servers (.go, .py, .pyi, .ts, .tsx)
           typescript (.ts, .tsx, .js, .jsx)
           pyright (.py, .pyi)
           gopls (.go)

Closes #2587
2026-03-15 19:00:17 -06:00
acamq
73685da275 Merge pull request #2563 from robinmordasiewicz/fix/claude-code-plugin-v3-array-format
fix(plugin-loader): support Claude Code v3 flat array format for installed_plugins.json
2026-03-15 18:02:29 -06:00
acamq
8f9bdf0893 Merge pull request #2559 from MoerAI/fix/issue-2555-disabled-tools-merge
fix: union disabled_tools in mergeConfigs() like other disabled_* arrays
2026-03-15 17:57:18 -06:00
acamq
2cf329a302 revert: remove accidentally committed built files from bce8ff3
Reverts the dist/ directory added in bce8ff3a7 ("chore: include pre-built
dist for github install"). Built artifacts should not be tracked in git.
2026-03-15 17:51:08 -06:00
acamq
e03d0e0485 Merge pull request #2585 from acamq/fix/custom-agent-summaries-completeness
fix(agents): include config agents and migrated plugin agents in customAgentSummaries
2026-03-15 17:48:50 -06:00
acamq
14d7043263 Merge pull request #2546 from acamq/fix/installer-paths
fix(installer): always use .config/opencode for CLI on Windows (#2502)
2026-03-15 17:44:39 -06:00
acamq
e8a3e549bb fix(agents): include config agents and migrated plugin agents in customAgentSummaries
PR #2424 fixed the critical bug (passing client object instead of agent
summaries array), but only included user, project, and raw plugin agents.

This adds the two missing sources:
- OpenCode native config agents (params.config.agent)
- Plugin agents with migrateAgentConfig applied before summary extraction

Ensures Sisyphus has complete awareness of all registered agent sources.

Closes #2386

Co-authored-by: NS Cola <123285105+davincilll@users.noreply.github.com>
2026-03-15 17:30:57 -06:00
Jean Philippe Wan
711aac0f0a fix: preserve atlas handoff on start-work 2026-03-15 19:04:20 -04:00
acamq
2fd6f4bf57 Merge pull request #2582 from acamq/fix/fix-install-test
fix(test): update package name to oh-my-openagent in install test
2026-03-15 16:31:56 -06:00
acamq
0f0e4c649b fix(test): update package name to oh-my-openagent in install test
The test was checking for the old package name 'oh-my-opencode'
but the plugin registration now uses 'oh-my-openagent'.
2026-03-15 16:26:35 -06:00
acamq
b7c68080b4 Merge pull request #2532 from ricatix/fix/doctor-verbose-models
fix(cli): render verbose doctor check details
2026-03-15 16:19:08 -06:00
acamq
f248c73478 Merge pull request #2507 from MoerAI/fix/issue-2287-unstable-agent-check
fix(delegate-task): only check resolved model for isUnstableAgent, not category default
2026-03-15 15:34:57 -06:00
acamq
8470a6bf1f fix(test): isolate XDG_CONFIG_HOME in Windows CLI tests
Windows CLI tests were not deleting XDG_CONFIG_HOME, making them
fragile in environments where this variable is set. getCliConfigDir()
reads XDG_CONFIG_HOME on all platforms, not just Linux.
2026-03-15 15:30:52 -06:00
acamq
f92c0931a3 fix(installer): respect XDG_CONFIG_HOME on Windows for CLI config dir 2026-03-15 08:26:41 -06:00
Ouyang Xingyuan
f2b26e5346 fix(delegate-task): add subagent turn limit and model routing transparency
原因:
- subagent 无最大步数限制,陷入 tool-call 死循环时可无限运行,造成巨额 API 费用
- category 路由将 subagent 静默切换到与父 session 不同的模型,用户完全无感知

改动:
- sync-session-poller: 新增 maxAssistantTurns 参数(默认 300),每检测到新 assistant 消息
  计数一次,超限后调用 abortSyncSession 并返回明确错误信息
- sync-task: task 完成时在返回字符串中显示实际使用的模型;若与父 session 模型不同,
  加 ⚠️ 警告提示用户发生了静默路由

影响:
- 现有行为不变,maxAssistantTurns 为可选参数,默认值 300 远高于正常任务所需轮次
- 修复 #2571:用户一个下午因 Sisyphus-Junior 死循环 + 静默路由到 Gemini 3.1 Pro
  烧掉 $350+,且 OpenCode 显示费用仅为实际的一半
2026-03-15 12:05:42 +08:00
github-actions[bot]
aa27c75ead @idrekdon has signed the CLA in code-yeongyu/oh-my-openagent#2572 2026-03-14 17:57:23 +00:00
Robin Mordasiewicz
0d1d405a72 fix(discovery): add null-safe validation for v3 array entries
Filter out null, undefined, or malformed entries in installed_plugins.json
before accessing properties. Prevents fatal crash on corrupted data.

Addresses cubic-dev-ai review feedback.
2026-03-14 05:35:12 +00:00
Robin Mordasiewicz
bc0ba843ac fix(agent-loader): convert model object to string for opencode compatibility
mapClaudeModelToOpenCode() returns {providerID, modelID} but opencode
expects model as a string. Both agent loaders now convert to
'providerID/modelID' string format before assigning to config.
2026-03-14 05:16:50 +00:00
Robin Mordasiewicz
bce8ff3a75 chore: include pre-built dist for github install 2026-03-14 04:56:50 +00:00
github-actions[bot]
5073efef48 @robinmordasiewicz has signed the CLA in code-yeongyu/oh-my-openagent#2563 2026-03-14 04:47:19 +00:00
Robin Mordasiewicz
a7f0a4cf46 fix(plugin-loader): support Claude Code v3 flat array format for installed_plugins.json 2026-03-14 04:40:27 +00:00
YeonGyu-Kim
913fcf270d remove ai slops 2026-03-14 12:48:05 +09:00
YeonGyu-Kim
c7518eae2d add skills 2026-03-14 12:45:58 +09:00
YeonGyu-Kim
0dcfcd372b feat(cli): support both oh-my-opencode and oh-my-openagent package names
Update CLI config manager to detect and handle both legacy (oh-my-opencode)
and new (oh-my-openagent) package names during installation. Migration
will automatically replace old plugin entries with the new name.

🤖 Generated with assistance of OhMyOpenCode
2026-03-14 12:45:58 +09:00
YeonGyu-Kim
6aeda598b9 feat(schema): generate oh-my-openagent schema alongside legacy schema
Update build script to generate both oh-my-opencode.schema.json (backward
compatibility) and oh-my-openagent.schema.json (new package name).
Also adds delegate-task-english-directive hook to schema.

🤖 Generated with assistance of OhMyOpenCode
2026-03-14 12:45:58 +09:00
YeonGyu-Kim
b0ab34b568 feat(shared): add plugin identity constants for package name migration
Add centralized plugin identity constants to support migration from
oh-my-opencode to oh-my-openagent. Includes both current and legacy
names for backward compatibility.

🤖 Generated with assistance of OhMyOpenCode
2026-03-14 12:45:58 +09:00
YeonGyu-Kim
a00bb8b6a7 feat(skill): integrate /get-unpublished-changes and /review-work into pre-publish-review
Phase 0 now runs /get-unpublished-changes as single source of truth
instead of manual bash commands. Phase 1 uses its output for grouping.
Layer 2 explicitly references /review-work skill flow.

🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2026-03-14 12:45:58 +09:00
github-actions[bot]
b5789bf449 @vidwade has signed the CLA in code-yeongyu/oh-my-openagent#2561 2026-03-14 02:32:16 +00:00
MoerAI
9a774f1db2 fix: union disabled_tools in mergeConfigs() like other disabled_* arrays
disabled_tools was defined in the Zod schema but omitted from
mergeConfigs(), causing project-level config to shadow user-level
disabled_tools instead of merging both sets. Add Set union and
regression test.

Closes #2555
2026-03-13 21:35:46 +09:00
github-actions[bot]
6625670079 @Yeachan-Heo has signed the CLA in code-yeongyu/oh-my-openagent#2554 2026-03-13 06:41:04 +00:00
YeonGyu-Kim
f3de122147 feat(hooks): add delegate-task-english-directive hook to enforce English for subagents
Appends bold uppercase English-only directive to explore, librarian,
oracle, and plan subagent prompts via tool.execute.before on the task tool.
2026-03-13 14:22:13 +09:00
YeonGyu-Kim
0303488906 Merge pull request #2550 from code-yeongyu/fix/deploy-blockers
fix: resolve all deployment blockers from v3.11.2→HEAD release review
2026-03-13 14:21:45 +09:00
YeonGyu-Kim
3e746c9a56 fix(review): resolve 3 review-work blocking issues 2026-03-13 14:09:36 +09:00
YeonGyu-Kim
786c7a84d0 fix(background-agent): prevent queue item loss on concurrent cancel and guard against cancelled task resurrection 2026-03-13 13:12:59 +09:00
YeonGyu-Kim
380889caa3 fix(delegate-task): add exception fallback for cleanup reason and correct test mock status type 2026-03-13 13:08:50 +09:00
YeonGyu-Kim
04b0c6f33c fix(atlas): pause after final verification wave for explicit user approval 2026-03-13 12:43:33 +09:00
YeonGyu-Kim
fd71c89b95 fix(background-agent): release descendant quota on pre-start task cancellation and creation failure 2026-03-13 12:37:33 +09:00
YeonGyu-Kim
11df83713e refactor(preemptive-compaction): use shared context-limit resolver to eliminate duplicated logic 2026-03-13 12:36:07 +09:00
YeonGyu-Kim
457f303adf fix(background-agent): clean global subagentSessions and SessionCategoryRegistry on dispose 2026-03-13 10:56:44 +09:00
YeonGyu-Kim
0015dd88af fix(agent-config): normalize agent names before builtin override filtering to prevent alias bypass 2026-03-13 10:55:51 +09:00
YeonGyu-Kim
9bce6314b1 fix(runtime-fallback): scope visible-assistant check to current turn and cleanup retry dedupe keys 2026-03-13 10:54:47 +09:00
YeonGyu-Kim
cbe113ebab fix(slashcommand): support parent config dirs in command execution path to match discovery 2026-03-13 10:54:15 +09:00
YeonGyu-Kim
e3f6c12347 fix(atlas): restrict idle-event session append to boulder-owned subagent sessions only 2026-03-13 10:53:45 +09:00
YeonGyu-Kim
b356c50285 fix(delegate-task): cancel child background tasks on parent abort and timeout in unstable agent flow 2026-03-13 10:49:44 +09:00
YeonGyu-Kim
38938508fa test(model-fallback): update snapshots and kimi model expectations for opencode-go integration 2026-03-13 10:48:05 +09:00
YeonGyu-Kim
2c8a8eb4f1 fix(gpt-permission-continuation): add per-session consecutive auto-continue cap to prevent infinite loops 2026-03-13 10:48:00 +09:00
acamq
6b2da3c59b fix(installer): always use .config/opencode for CLI on Windows (#2502) 2026-03-12 17:46:52 -06:00
djdembeck
a7a7799b44 fix(agents): add termination criteria to Sisyphus-Junior default 2026-03-12 16:09:51 -05:00
github-actions[bot]
825e854cff @cpkt9762 has signed the CLA in code-yeongyu/oh-my-openagent#2539 2026-03-12 20:17:38 +00:00
cpkt9762
11e9276498 fix(delegate-task): build categoryModel with variant for categories without fallback chain
When a category has no CATEGORY_MODEL_REQUIREMENTS entry (e.g.
user-defined categories like solana-re), the !requirement branch
set actualModel but never built categoryModel with variant from
the user config. The bottom fallback then created categoryModel
via parseModelString alone, silently dropping the variant.

Mirror the requirement branch logic: read variant from
userCategories and resolved.config, and build categoryModel
with it.

Fixes #2538
2026-03-13 04:15:17 +08:00
conversun
088844474a fix(todo-continuation-enforcer): tighten post-compaction guard with session-agent fallback
Refine continuation agent resolution to prefer session-state agent fallback while keeping compaction-specific protection. Replace sticky boolean compaction flag with a short-lived timestamp guard so unresolved agents are blocked only during the immediate post-compaction window, avoiding long-lived suppression and preserving existing continuation behavior.
2026-03-13 00:55:37 +08:00
github-actions[bot]
4226808432 @Gujiassh has signed the CLA in code-yeongyu/oh-my-openagent#2524 2026-03-12 16:36:59 +00:00
conversun
22b4b30dd7 fix(todo-continuation-enforcer): prevent post-compaction agent fallback to General
After compaction, message history is truncated and the original agent
(e.g. Prometheus) can no longer be resolved from messages. The todo
continuation enforcer would then inject a continuation prompt with
agent=undefined, causing the host to default to General -- which has
write permissions Prometheus should never have.

Root cause chain:
1. handler.ts had no session.compacted handler (unlike Atlas)
2. idle-event.ts relied on finding a compaction marker in truncated
   message history -- the marker disappears after real compaction
3. continuation-injection.ts proceeded when agentName was undefined
   because the skipAgents check only matched truthy agent names
4. prometheus-md-only/agent-resolution.ts did not filter compaction
   agent from message history fallback results

Fixes:
- Add session.compacted handler that sets hasRecentCompaction state flag
- Replace fragile history-based compaction detection with state flag
- Block continuation injection when agent is unknown post-compaction
- Filter compaction agent in Prometheus agent resolution fallback
2026-03-13 00:36:03 +08:00
Gujiassh
1e0823a0fc fix(delegate-task): report the real background task id
Keep background task metadata aligned with the background_output contract so callers do not pass a session id where the task manager expects a background task id.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-13 01:25:13 +09:00
github-actions[bot]
0412e40780 @ricatix has signed the CLA in code-yeongyu/oh-my-openagent#2532 2026-03-12 15:23:10 +00:00
ricatix
63ac37cd29 fix(cli): render verbose doctor check details
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 22:20:48 +07:00
github-actions[bot]
18cbaadb52 @xodn348 has signed the CLA in code-yeongyu/oh-my-openagent#2531 2026-03-12 15:14:20 +00:00
github-actions[bot]
27538dcfe6 @apple-ouyang has signed the CLA in code-yeongyu/oh-my-openagent#2528 2026-03-12 14:39:21 +00:00
YeonGyu-Kim
e4e5f159f9 fix(tmux): wrap opencode attach commands in zsh -c shell
🤖 Generated with assistance of OhMyOpenCode
2026-03-12 20:12:38 +09:00
YeonGyu-Kim
4f4e53b436 feat(skill): re-read skills and commands from disk on every invocation
Removes in-memory caching so newly created skills mid-session are
immediately available via skill(). Clears the module-level skill cache
before each getAllSkills() call. Pre-provided skills from options are
merged as fallbacks for test compatibility.
2026-03-12 20:03:58 +09:00
Gujiassh
edfa411684 fix(session-manager): match todo filenames exactly
Stop sibling session IDs from colliding in stable JSON storage by requiring an exact todo filename match instead of a substring filter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 19:58:57 +09:00
YeonGyu-Kim
55b80fb7cd fix(skill-loader): discover skills from parent config dir when using profiles
OPENCODE_CONFIG_DIR pointing to profiles/ subdirectory caused skills at
~/.config/opencode/skills/ to be invisible. Added getOpenCodeSkillDirs()
with the same parent-dir fallback that getOpenCodeCommandDirs() uses.
2026-03-12 19:53:30 +09:00
YeonGyu-Kim
c85b6adb7d chore: gitignore platform binary sourcemaps and untrack existing ones 2026-03-12 19:53:20 +09:00
YeonGyu-Kim
a400adae97 feat(skill): render skills as slash commands in available items list
Skills now appear as <command> items with / prefix (e.g., /review-work)
instead of <skill> items, making them discoverable alongside regular
slash commands in the skill tool description.
2026-03-12 18:53:44 +09:00
YeonGyu-Kim
50638cf783 test(hooks): fix test isolation in session-notification-sender tests
Use namespace import pattern (import * as sender) to prevent cross-file
spy leakage in Bun's shared module state. Move restoreAllMocks to
beforeEach for proper cleanup ordering.

🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
2026-03-12 18:37:10 +09:00
YeonGyu-Kim
8e3829f63a test(auto-slash-command): add tests for skills as slash commands 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
b4e01e9987 feat(slashcommand): support parent opencode config dirs for command discovery 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
8c2385fe31 feat(hooks): add quiet and nothrow to notification shell executions 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
c3ab066335 feat(shared): export opencode-command-dirs module 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
7937f9d777 feat(shared): add opencode-command-dirs utility for multi-level command discovery 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
53c65a7e63 feat(cli): add sisyphus-junior model fallback requirements
Add CLI_AGENT_MODEL_REQUIREMENTS entry for sisyphus-junior with
fallback chain: claude-sonnet-4-6 -> kimi-k2.5 -> big-pickle.

🤖 Generated with assistance of OhMyOpenCode
2026-03-12 18:19:06 +09:00
YeonGyu-Kim
8f6b952dc0 feat(prometheus): require explicit user approval in Final Verification Wave
Add mandatory explicit user okay before completing work in Final
Verification Wave. Present consolidated results and wait for user
confirmation before marking tasks complete.

🤖 Generated with assistance of OhMyOpenCode
2026-03-12 18:19:06 +09:00
YeonGyu-Kim
e0bf0eb7cf docs: add opencode-go provider tier documentation 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
a9fde452ac feat(opencode-go): update on-complete hook for provider display 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
338379941d feat(opencode-go): integrate into model fallback chain resolution 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
44d602b7e5 feat(opencode-go): integrate installer with config detection 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
66ec9f58ee feat(opencode-go): add CLI install flag and TUI prompts 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
89d1e105a8 feat(opencode-go): add model requirements for go-tier models 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
504b68f2ac feat(opencode-go): add provider type and availability detection 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
2bbbdc4ca9 refactor(github-triage): rewrite as read-only report-based analyzer 2026-03-12 18:19:06 +09:00
YeonGyu-Kim
ca7c0e391e fix(bun-install): default outputMode to "pipe" to prevent TUI stdout leak
runBunInstallWithDetails() defaulted to outputMode:"inherit", causing
bun install stdout/stderr to leak into the TUI when callers omitted the
option. Changed default to "pipe" so output is captured silently.

Also fixed stale mock in background-update-check.test.ts: the test was
mocking runBunInstall (unused) instead of runBunInstallWithDetails, and
returning boolean instead of BunInstallResult.
2026-03-12 18:19:06 +09:00
YeonGyu-Kim
81301a6071 feat: skip model resolution for delegated tasks when provider cache not yet created
Before provider cache exists (first run), resolveModelForDelegateTask now
returns undefined instead of guessing a model. This lets OpenCode use its
system default model when no model is specified in the prompt body.

User-specified model overrides still take priority regardless of cache state.
2026-03-12 18:19:06 +09:00
YeonGyu-Kim
62883d753f Merge pull request #2519 from code-yeongyu/fix/ultrawork-variant-no-max-override
fix: skip ultrawork variant override without SDK validation + add porcelain worktree parser
2026-03-12 17:27:57 +09:00
YeonGyu-Kim
c9d30f8be3 feat: add porcelain worktree parser with listWorktrees and parseWorktreeListPorcelain
Introduce git worktree list --porcelain parsing following upstream opencode patterns. Exports listWorktrees() for full worktree enumeration with branch info alongside existing detectWorktreePath().

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 17:25:10 +09:00
YeonGyu-Kim
2210997c89 fix: skip ultrawork variant override when SDK validation unavailable
When provider.list is not available for SDK validation, do not apply the configured ultrawork variant. This prevents models without a max variant from being incorrectly forced to max when ultrawork mode activates.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 17:24:54 +09:00
YeonGyu-Kim
feb2160a7a Merge pull request #2518 from code-yeongyu/fix-2499-ulw-oracle-verified-loop
Keep ulw-loop running until Oracle verifies completion
2026-03-12 17:15:49 +09:00
YeonGyu-Kim
37c7231a50 test: isolate connected providers cache test setup
Prevent the cache test from deleting the user cache directory and add a regression test for that setup path.

Co-authored-by: Codex <noreply@openai.com>
2026-03-12 17:08:06 +09:00
YeonGyu-Kim
1812c9f054 test(ralph-loop): cover overlapping ultrawork loops
Lock down stale-session and overwrite cases so a previous ULW verification flow cannot complete or mutate a newer loop.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 17:05:02 +09:00
YeonGyu-Kim
f31537f14c fix(ralph-loop): continue ultrawork until oracle verifies
Keep /ulw-loop iterating after the main session emits DONE so completion still depends on an actual Oracle VERIFIED result.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 17:00:25 +09:00
YeonGyu-Kim
e763885df1 Merge pull request #2516 from code-yeongyu/fix/hashline-strict-whitespace-hash
fix(hashline): use strict whitespace hashing (trimEnd only, preserve leading indentation)
2026-03-12 16:52:30 +09:00
YeonGyu-Kim
0cbc15da96 fix(hashline): use strict whitespace hashing (trimEnd only, preserve leading indentation)
Previously computeLineHash stripped ALL whitespace before hashing, making
indentation changes invisible to hash validation. This weakened the stale-line
detection guarantee, especially for indentation-sensitive files (Python, YAML).

Now only trailing whitespace and carriage returns are stripped, matching
oh-my-pi upstream behavior. Leading indentation is preserved in the hash,
so indentation-only changes correctly trigger hash mismatches.
2026-03-12 16:42:41 +09:00
YeonGyu-Kim
04b0d62a55 feat(session-notification): include session context in ready notifications
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 15:29:21 +09:00
YeonGyu-Kim
943f31f460 feat(session-notification): add ready notification content builder
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 15:29:21 +09:00
YeonGyu-Kim
8e1a4dffa9 Merge pull request #2486 from code-yeongyu/fix/issue-2357-child-session-fallback
fix: enable runtime fallback for delegated child sessions (#2357)
2026-03-12 13:53:24 +09:00
YeonGyu-Kim
abc4b2a6a4 fix(runtime-fallback): remove committed rebase conflict markers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:49:46 +09:00
YeonGyu-Kim
d8da2f1ad6 fix(runtime-fallback): clear retry keys on failed session bootstrap
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
62a905b690 fix(runtime-fallback): reuse normalized messages for visible assistant checks
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
79fb746a1c fix(runtime-fallback): resolve agents from normalized session messages
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
fcd4fa5164 fix(runtime-fallback): normalize retry part message extraction
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
6a4a3322c1 fix(runtime-fallback): add session messages extractor
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
3caa3fcc3d fix: address Cubic findings for runtime fallback child sessions
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 13:39:30 +09:00
YeonGyu-Kim
ba86ef0eea fix: enable runtime fallback for delegated child sessions (#2357) 2026-03-12 13:39:04 +09:00
MoerAI
eb79d29696 fix(delegate-task): only check resolved model for isUnstableAgent, not default (#2287) 2026-03-12 12:48:29 +09:00
acamq
4ded45d14c Merge pull request #2446 from win0na/fix/momus-key-trigger-specificity
fix(momus): make keyTrigger specify file-path-only invocation requirement
2026-03-11 20:34:08 -06:00
acamq
9032eeaa68 Merge pull request #2419 from guazi04/fix/serverurl-throw-getter
fix(tmux): handle serverUrl throw getter from upstream opencode refactor
2026-03-11 20:32:38 -06:00
YeonGyu-Kim
3ea23561f2 Merge pull request #2488 from code-yeongyu/fix/issue-2295-fallback-provider-preserve
fix: preserve session provider context in fallback chain
2026-03-12 11:24:43 +09:00
YeonGyu-Kim
0cdbd15f74 Merge pull request #2487 from code-yeongyu/fix/issue-2431-lsp-path-resolution
fix: unify LSP server PATH resolution between detection and spawn
2026-03-12 11:24:41 +09:00
YeonGyu-Kim
60e6f6d4f3 Merge pull request #2484 from code-yeongyu/fix/issue-2393-cubic-error-name
fix: add FreeUsageLimitError to RETRYABLE_ERROR_NAMES set
2026-03-12 11:24:37 +09:00
YeonGyu-Kim
b00fc89dfa Merge pull request #2458 from code-yeongyu/fix/memory-leaks
fix: resolve 12 memory leaks (3 critical + 9 high)
2026-03-12 11:21:13 +09:00
YeonGyu-Kim
2912b6598c fix: address Cubic findings for provider preserve fallback
- Reorder resolveFallbackProviderID: providerHint now checked before global connected-provider cache
- Revert require('bun:test') hack to standard ESM import in fallback-chain-from-models.test.ts

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:05:31 +09:00
YeonGyu-Kim
755efe226e fix: address Cubic findings for FreeUsageLimitError classification
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:05:26 +09:00
YeonGyu-Kim
6014f03ed2 fix: address Cubic finding for LSP server npm bin path
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:04:43 +09:00
YeonGyu-Kim
2b4a5ca5da test(agent-variant): restore hephaestus openai case
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:04:43 +09:00
YeonGyu-Kim
4157c2224f fix(background-agent): clear pending parent on silent cancel
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:04:35 +09:00
YeonGyu-Kim
d253f267c3 fix(skill-mcp-manager): guard stale client cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:04:28 +09:00
YeonGyu-Kim
d83f875740 fix(call-omo-agent): track reused sync sessions
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 11:04:20 +09:00
github-actions[bot]
5da347c3ec @ChicK00o has signed the CLA in code-yeongyu/oh-my-openagent#2499 2026-03-12 01:26:01 +00:00
github-actions[bot]
e5706bba48 @djdembeck has signed the CLA in code-yeongyu/oh-my-openagent#2497 2026-03-12 00:48:45 +00:00
acamq
f6ae3a4c64 Merge pull request #2493 from acamq/fix/fallback-test-regression
fix(test): update agent-variant test model to gpt-5.4
2026-03-11 15:47:23 -06:00
acamq
9832f7b52e fix(test): update agent-variant test model to gpt-5.4 2026-03-11 15:43:03 -06:00
acamq
5f3f8bb1d3 Merge pull request #2492 from acamq/fix/prometheus-test-regressions
test: update ultrabrain model expectations to gpt-5.4
2026-03-11 15:25:13 -06:00
acamq
2d6be11fa0 test: update ultrabrain model expectations to gpt-5.4
The DEFAULT_CATEGORIES ultrabrain model was updated from openai/gpt-5.3-codex
to openai/gpt-5.4 in a previous commit, but test expectations were not updated.

Updated test expectations in:
- src/plugin-handlers/config-handler.test.ts (lines 560, 620)
- src/agents/utils.test.ts (lines 1119, 1232, 1234, 1301, 1303, 1316, 1318)
2026-03-11 15:18:29 -06:00
acamq
5f419b7d9d Merge pull request #2473 from code-yeongyu/fix/sync-package-json-to-opencode-intent
fix(auto-update): sync cache package.json to opencode.json intent
2026-03-11 14:51:49 -06:00
acamq
d08754d1b4 fix(auto-update): pipe bun install output and restore other-deps preservation test
background-update-check.ts was using runBunInstall() which defaults to outputMode:"inherit", leaking bun install stdout/stderr into the background session. Reverted to runBunInstallWithDetails({ outputMode: "pipe" }) and explicitly logs result.error on failure.

Restores the accidentally deleted test case asserting that sibling dependencies (e.g. other:"1.0.0") are preserved in package.json after a plugin version sync.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-11 13:28:12 -06:00
acamq
e6e32d345e fix(auto-update): expand semver regex to support hyphenated prerelease tags
The previous pattern `(-[\w.]+)?` used `\w` which excludes hyphens, causing versions like `1.2.3-alpha-1` and `1.2.3-rc-test` to be misclassified as unpinned tags. Updated both plugin-entry.ts and sync-package-json.ts (which share the definition) to the spec-compliant pattern that allows dot-separated identifiers using [0-9A-Za-z-] and optional build metadata.

Also adds String() coercion before .trim() in sync-package-json.ts to guard against a TypeError if the parsed JSON value for currentVersion is non-string at runtime.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-11 13:28:04 -06:00
YeonGyu-Kim
7c89a2acf6 test: update gpt-5.4 fallback expectations
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 02:24:47 +09:00
YeonGyu-Kim
57b4985424 fix(background-agent): delay session error task cleanup
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 02:24:42 +09:00
YeonGyu-Kim
f9c8392179 fix(tmux-subagent): cap stale close retries
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 02:24:35 +09:00
YeonGyu-Kim
cbb378265e fix(skill-mcp-manager): drop superseded stale clients
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 02:24:29 +09:00
YeonGyu-Kim
7997606892 fix(call-omo-agent): preserve reused session tracking
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 02:24:22 +09:00
YeonGyu-Kim
99730088ef fix: remove contaminated await change from FreeUsageLimitError PR
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:51:25 +09:00
YeonGyu-Kim
7870e43578 fix: preserve session provider context in fallback chain (#2295) 2026-03-12 01:49:16 +09:00
YeonGyu-Kim
9b792c3224 Merge pull request #2485 from code-yeongyu/fix/issue-2316-tool-after-error-boundary
fix: add error boundary around extract/discard hooks in tool-execute-after
2026-03-12 01:46:51 +09:00
YeonGyu-Kim
9d0b56d375 fix: unify LSP server PATH resolution between detection and spawn (#2431) 2026-03-12 01:44:06 +09:00
YeonGyu-Kim
305389bd7f fix: add error boundary around extract/discard hooks in tool-execute-after (#2316) 2026-03-12 01:41:07 +09:00
YeonGyu-Kim
e249333898 test(skill-mcp-manager): cover pending cleanup registration retention
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:40:34 +09:00
YeonGyu-Kim
810dd5848f test(skill-mcp-manager): cover disposed guard after disconnectAll
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:40:34 +09:00
YeonGyu-Kim
079c6b17b0 fix: add FreeUsageLimitError to RETRYABLE_ERROR_NAMES set (#2393) 2026-03-12 01:40:24 +09:00
YeonGyu-Kim
aa1aad3bb1 fix: add disposed guard to MCP manager and guard unregister on pending connections 2026-03-12 01:37:03 +09:00
YeonGyu-Kim
f564404015 fix: address review-work round 6 findings (dispose isolation, event dispatch, disconnectedSessions ref-counting) 2026-03-12 01:37:03 +09:00
YeonGyu-Kim
cf276322a3 fix(background-agent): handle async shutdown in process-cleanup signal handlers 2026-03-12 01:37:03 +09:00
YeonGyu-Kim
2c3c447dc4 fix: address review-work round 3 findings (async shutdown, signal generation, stale test name) 2026-03-12 01:37:03 +09:00
YeonGyu-Kim
ff536e992a fix: address review-work round 2 findings
- MCP teardown race: add shutdownGeneration counter to prevent
  in-flight connections from resurrecting after disconnectAll
- MCP multi-key disconnect race: replace disconnectedSessions Set
  with generation-based Map to track per-session disconnect events
- MCP clients: check shutdownGeneration in stdio/http client
  creators before inserting into state.clients
- BackgroundManager: call clearTaskHistoryWhenParentTasksGone after
  timer-based task removal in scheduleTaskRemoval and notifyParentSession
- BackgroundManager: clean completedTaskSummaries when parent has
  no remaining tasks
- Plugin dispose: remove duplicate tmuxSessionManager.cleanup call
  since BackgroundManager.shutdown already handles it via onShutdown
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
03eaa429ce fix: address 5 edge cases from review-work findings
- C3: include command args in auto-slash-command dedup key
- H2: track completed task summaries for ALL COMPLETE message
- H9: increment tmux close retry count on re-mark
- H8: detect stale MCP connections after disconnect+reconnect race
- H8: guard disconnectedSessions growth for non-MCP sessions
- C1: await tmux cleanup in plugin dispose lifecycle
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
b8aea50dfa test(background-agent): update completion timer test for per-task cleanup
Test expected timers only after allComplete, but H2 fix intentionally
decoupled per-task cleanup from sibling completion state. Updated
assertion to expect timer after individual task notification.
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
deaac8cb39 fix(plugin): add dispose lifecycle for full teardown on reload
Plugin created managers, hooks, intervals, and process listeners on
every load but had no teardown mechanism. On plugin reload, old
instances remained alive causing cumulative memory leaks.

- Add createPluginDispose() orchestrating shutdown sequence:
  backgroundManager.shutdown() → skillMcpManager.disconnectAll() →
  disposeHooks()
- Add disposeHooks() aggregator with safe optional chaining
- Wire dispose into index.ts to clean previous instance on reload
- Make dispose idempotent (safe to call multiple times)

Tests: 4 pass, 8 expects
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
b4e13883b1 fix(background-agent): fix 3 memory leaks in task lifecycle management
H3: cancelTask(skipNotification=true) now schedules task removal.
Previously the early return path skipped cleanup, leaking task objects
in this.tasks Map permanently. Extracted scheduleTaskRemoval() helper
called from both skipNotification and normal paths.

H2: Per-task completion cleanup timer decoupled from allComplete check.
Previously cleanup timer only ran when ALL sibling tasks completed. Now
each finished task gets its own removal timer regardless of siblings.

H1+C2: TaskHistory.clearAll() added and wired into shutdown(). Added
clearSession() calls on session error/deletion and prune cycles.
taskHistory was the only data structure missed by shutdown().

Tests: 10 pass (3 cancel + 3 completion + 4 history)
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
d1fc6629c2 fix(skill-mcp-manager): remove process listeners on disconnect and guard connection races
H7: Process 'exit'/'SIGINT' listeners registered per-session were
never removed when all sessions disconnected, accumulating handlers.
- Add unregisterProcessCleanup() called in disconnectAll()

H8: Race condition where disconnectSession() during pending connection
left orphan clients in state.clients.
- Add disconnectedSessions Set to track mid-flight disconnects
- Check disconnect marker after connection resolves, close if stale
- Clear marker on reconnection for same session

Tests: 6 pass (3 disconnect + 3 race)
2026-03-12 01:37:03 +09:00
YeonGyu-Kim
fed720dd11 fix(tmux-subagent): retry pending pane closes to prevent zombie panes
When queryWindowState returned null during session deletion, the
session mapping was deleted but the real tmux pane stayed alive,
creating zombie panes.

- Add closePending/closeRetryCount fields to TrackedSession
- Mark sessions closePending instead of deleting on close failure
- Add retryPendingCloses() called from onSessionCreated and cleanup
- Force-remove mappings after 3 failed retry attempts
- Extract TrackedSessionState helper for field initialization

Tests: 3 pass, 9 expects
2026-03-12 01:37:02 +09:00
YeonGyu-Kim
a2f030e699 fix(todo-continuation-enforcer): expose prune interval for cleanup
Prune interval created inside hook was not exposed for disposal,
preventing cleanup on plugin unload.

- Add dispose() method that clears the prune interval
- Export dispose in hook return type

Tests: 2 pass, 6 expects
2026-03-12 01:37:02 +09:00
YeonGyu-Kim
2d2ca863f1 fix(runtime-fallback): clear monitoring interval on dispose
setInterval for model availability monitoring was never cleared,
keeping the hook alive indefinitely with no dispose mechanism.

- Add dispose() method to RuntimeFallbackHook that clears interval
- Track intervalId in hook state for cleanup
- Export dispose in hook return type

Tests: 3 pass, 10 expects
2026-03-12 01:37:02 +09:00
YeonGyu-Kim
f342dcfa12 fix(call-omo-agent): add finally cleanup for sync executor session Sets
Sync call_omo_agent leaked entries in global activeSessionMessages
and activeSessionToolResults Sets when execution threw errors,
since cleanup only ran on success path.

- Wrap session Set operations in try/finally blocks
- Ensure Set.delete() runs regardless of success/failure
- Add guard against double-cleanup

Tests: 2 pass, 14 expects
2026-03-12 01:37:02 +09:00
YeonGyu-Kim
7904410294 fix(auto-slash-command): bound Set growth with TTL eviction and session cleanup
processedCommands and recentResults Sets grew infinitely because
Date.now() in dedup keys made deduplication impossible and no
session.deleted cleanup existed.

- Extract ProcessedCommandStore with maxSize cap and TTL-based eviction
- Add session cleanup on session.deleted event
- Remove Date.now() from dedup keys for effective deduplication
- Add dispose() for interval cleanup

Tests: 3 pass, 9 expects
2026-03-12 01:37:02 +09:00
YeonGyu-Kim
3822423069 Merge pull request #2482 from code-yeongyu/fix/issue-2407-binary-version-embed
fix: sync root package.json version before binary compile
2026-03-12 01:34:33 +09:00
YeonGyu-Kim
e26088ba8f Merge pull request #2481 from code-yeongyu/fix/issue-2185-lsp-notification-params
fix: use rest params in LSP sendNotification to avoid undefined serialization
2026-03-12 01:34:29 +09:00
YeonGyu-Kim
7998667a86 Merge pull request #2480 from code-yeongyu/fix/issue-2356-preemptive-compaction-limit
fix: skip preemptive compaction when model context limit is unknown
2026-03-12 01:34:25 +09:00
YeonGyu-Kim
9eefbfe310 fix: restore await on metadata call in create-background-task (#2441) 2026-03-12 01:34:16 +09:00
YeonGyu-Kim
ef2017833d Merge pull request #2425 from MoerAI/fix/issue-2408-gemini-vertex-edit-schema
fix(hashline-edit): remove array type from lines union to fix Gemini Vertex schema validation
2026-03-12 01:32:37 +09:00
YeonGyu-Kim
994b9a724b Merge pull request #2424 from MoerAI/fix/issue-2386-custom-agent-summaries
fix(agents): pass custom agent summaries instead of client object to createBuiltinAgents
2026-03-12 01:32:35 +09:00
YeonGyu-Kim
142f8ac7d1 Merge pull request #2422 from MoerAI/fix/issue-2393-model-fallback-defaults
fix(model-fallback): enable by default and add missing error patterns for usage limits
2026-03-12 01:32:34 +09:00
YeonGyu-Kim
f5be99f911 Merge pull request #2420 from MoerAI/fix/issue-2375-run-in-background-default
fix(delegate-task): default run_in_background to false when orchestrator intent is detected
2026-03-12 01:32:31 +09:00
YeonGyu-Kim
182fe746fc Merge pull request #2476 from code-yeongyu/fix/issue-2441-session-id-pending
fix: omit sessionId from metadata when not yet assigned
2026-03-12 01:32:30 +09:00
YeonGyu-Kim
f61ee25282 Merge pull request #2475 from code-yeongyu/fix/issue-2300-compaction-event-dispatch
fix: register preemptive-compaction event handler in dispatchToHooks
2026-03-12 01:32:29 +09:00
YeonGyu-Kim
08b411fc3b fix: use rest params in LSP sendNotification to avoid undefined serialization (#2185)
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:24:42 +09:00
YeonGyu-Kim
26091b2f48 fix: skip preemptive compaction when model context limit is unknown (#2356) 2026-03-12 01:24:16 +09:00
YeonGyu-Kim
afe3792ecf docs(config): correct background task default timeout description
Keep the background_task schema comment aligned with the runtime default so timeout guidance stays accurate.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:14:43 +09:00
YeonGyu-Kim
aaa54858a3 fix(background-agent): extend default no-progress stale timeout to 30 minutes
Give never-updated background tasks a longer default window and keep the default-threshold regression coverage aligned with that behavior.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:14:35 +09:00
YeonGyu-Kim
6d5175b9b0 fix(delegate-task): extend default sync poll timeout to 30 minutes
Keep synchronous subagent runs from timing out after 10 minutes when no explicit override is configured.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:14:26 +09:00
YeonGyu-Kim
f6125c5efa docs: refresh category model variant references
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:08:07 +09:00
YeonGyu-Kim
004f504e6c fix(agents): keep oracle available on first run without cache
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:07:57 +09:00
YeonGyu-Kim
f4f54c2b7f test(ralph-loop): remove volatile tool result timestamp
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:07:50 +09:00
YeonGyu-Kim
b9369d3c89 fix(config): preserve disabled arrays during partial parsing
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:07:43 +09:00
YeonGyu-Kim
88568398ac fix: sync root package.json version before binary compile (#2407) 2026-03-12 01:06:30 +09:00
YeonGyu-Kim
f2a7d227cb fix: omit sessionId from metadata when not yet assigned (#2441) 2026-03-12 01:02:12 +09:00
YeonGyu-Kim
39e799c596 docs: sync category model defaults
Update the public and internal docs to describe the new ultrabrain and unspecified-high defaults so the documented routing matches runtime behavior.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:00:41 +09:00
YeonGyu-Kim
7c29962014 fix(delegate-task): refresh built-in category defaults
Keep delegate-task category defaults in sync with the new routing policy so ultrabrain and unspecified-high resolve to the intended primary models.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:00:41 +09:00
YeonGyu-Kim
d2c2e8196b fix(shared): update category fallback priorities
Align ultrabrain with GPT-5.4 xhigh and move unspecified-high to Opus-first fallback order so category routing reflects the new model policy.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-03-12 01:00:41 +09:00
YeonGyu-Kim
4a67044cd6 fix: register preemptive-compaction event handler in dispatchToHooks (#2300) 2026-03-12 00:55:15 +09:00
acamq
c55603782c fix(auto-update): handle null JSON.parse and restore mocks on test failure 2026-03-11 08:08:30 -06:00
acamq
46a8ad279b Merge remote-tracking branch 'origin/dev' into fix/sync-package-json-to-opencode-intent 2026-03-11 08:04:16 -06:00
acamq
0764f0e563 fix(auto-update): sync cache package.json to opencode.json intent
When users switch from pinned version to tag in opencode.json (e.g.,
3.10.0 -> @latest), the cache package.json still contains the resolved
version. This causes bun install to reinstall the old version instead
of resolving the new tag.

This adds syncCachePackageJsonToIntent() which updates the cache
package.json to match user intent before running bun install. Uses
atomic writes (temp file + rename) with UUID-based temp names for
concurrent safety.

Critical changes:
- Treat all sync errors as abort conditions (file_not_found,
  plugin_not_in_deps, parse_error, write_error) to prevent corrupting
  a bad cache state further
- Remove dead code (unreachable revert branch for pinned versions)
- Add tests for all error paths and atomic write cleanup
2026-03-11 07:42:08 -06:00
Winona Bryan
d62a586be4 fix(momus): make keyTrigger specify file-path-only invocation requirement
The previous keyTrigger ('Work plan created → invoke Momus') was too
vague — Sisyphus would fire Momus on inline plans or todo lists,
causing Momus to REJECT because its input_extraction requires exactly
one .sisyphus/plans/*.md file path.

The updated trigger explicitly states:
- Momus should only be invoked when a plan file exists on disk
- The file path must be the sole prompt content
- Inline plans and todo lists should NOT trigger Momus
2026-03-11 02:13:21 -04:00
tc9011
6d8bc95fa6 fix: github copilot model version for Sisyphus agent 2026-03-11 10:34:25 +08:00
MoerAI
204322b120 fix(hashline-edit): remove array type from lines union to fix Gemini Vertex schema validation (#2408) 2026-03-10 17:18:14 +09:00
MoerAI
46c3bfcf1f fix(agents): pass custom agent summaries instead of client object to createBuiltinAgents (#2386) 2026-03-10 17:10:55 +09:00
MoerAI
059853554d fix(model-fallback): enable by default and add missing error patterns for usage limits (#2393) 2026-03-10 17:04:17 +09:00
MoerAI
49b7e695ce fix(delegate-task): default run_in_background to false when orchestrator intent is detected (#2375) 2026-03-10 16:57:47 +09:00
guazi04
309a3e48ec fix(tmux): handle serverUrl throw getter from upstream opencode refactor 2026-03-10 15:45:44 +08:00
韩澍
229c6b0cdb fix(todo-sync): provide default priority to prevent SQLite NOT NULL violation
extractPriority() returns undefined when task metadata has no priority
field, but OpenCode's TodoTable requires priority as NOT NULL. This
causes a silent SQLiteError that prevents all Task→Todo syncing.

Add ?? "medium" fallback so todos always have a valid priority.
2026-03-06 23:28:58 +08:00
Stranmor
3eb97110c6 feat: support file:// URIs in agent prompt field 2026-03-03 03:32:07 +03:00
715 changed files with 88528 additions and 9068 deletions

BIN
.github/assets/building-in-public.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 278 KiB

View File

@@ -60,16 +60,33 @@ jobs:
bun test src/features/opencode-skill-loader/loader.test.ts
bun test src/hooks/anthropic-context-window-limit-recovery/recovery-hook.test.ts
bun test src/hooks/anthropic-context-window-limit-recovery/executor.test.ts
# src/shared mock-heavy files (mock.module pollutes connected-providers-cache and legacy-plugin-warning)
bun test src/shared/model-capabilities.test.ts
bun test src/shared/log-legacy-plugin-startup-warning.test.ts
bun test src/shared/model-error-classifier.test.ts
bun test src/shared/opencode-message-dir.test.ts
# session-recovery mock isolation (recover-tool-result-missing mocks ./storage)
bun test src/hooks/session-recovery/recover-tool-result-missing.test.ts
# legacy-plugin-toast mock isolation (hook.test.ts mocks ./auto-migrate)
bun test src/hooks/legacy-plugin-toast/hook.test.ts
- name: Run remaining tests
run: |
# Enumerate subdirectories/files explicitly to EXCLUDE mock-heavy files
# that were already run in isolation above.
# Excluded from src/shared: model-capabilities, log-legacy-plugin-startup-warning, model-error-classifier, opencode-message-dir
# Excluded from src/cli: doctor/formatter.test.ts, doctor/format-default.test.ts
# Excluded from src/tools: call-omo-agent/sync-executor.test.ts, call-omo-agent/session-creator.test.ts, session-manager (all)
# Excluded from src/hooks/anthropic-context-window-limit-recovery: recovery-hook.test.ts, executor.test.ts
# Build src/shared file list excluding mock-heavy files already run in isolation
SHARED_FILES=$(find src/shared -name '*.test.ts' \
! -name 'model-capabilities.test.ts' \
! -name 'log-legacy-plugin-startup-warning.test.ts' \
! -name 'model-error-classifier.test.ts' \
! -name 'opencode-message-dir.test.ts' \
| sort | tr '\n' ' ')
bun test bin script src/config src/mcp src/index.test.ts \
src/agents src/shared \
src/agents $SHARED_FILES \
src/cli/run src/cli/config-manager src/cli/mcp-oauth \
src/cli/index.test.ts src/cli/install.test.ts src/cli/model-fallback.test.ts \
src/cli/config-manager.test.ts \
@@ -82,6 +99,8 @@ jobs:
src/tools/call-omo-agent/background-executor.test.ts \
src/tools/call-omo-agent/subagent-session-creator.test.ts \
src/hooks/anthropic-context-window-limit-recovery/empty-content-recovery-sdk.test.ts src/hooks/anthropic-context-window-limit-recovery/parser.test.ts src/hooks/anthropic-context-window-limit-recovery/pruning-deduplication.test.ts src/hooks/anthropic-context-window-limit-recovery/recovery-deduplication.test.ts src/hooks/anthropic-context-window-limit-recovery/storage.test.ts \
src/hooks/session-recovery/detect-error-type.test.ts src/hooks/session-recovery/index.test.ts src/hooks/session-recovery/recover-empty-content-message-sdk.test.ts src/hooks/session-recovery/resume.test.ts src/hooks/session-recovery/storage \
src/hooks/legacy-plugin-toast/auto-migrate.test.ts \
src/hooks/claude-code-compatibility \
src/hooks/context-injection \
src/hooks/provider-toast \

View File

@@ -56,32 +56,82 @@ jobs:
env:
BUN_INSTALL_ALLOW_SCRIPTS: "@ast-grep/napi"
- name: Validate release inputs
id: validate
env:
INPUT_VERSION: ${{ inputs.version }}
INPUT_DIST_TAG: ${{ inputs.dist_tag }}
run: |
VERSION="$INPUT_VERSION"
DIST_TAG="$INPUT_DIST_TAG"
if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?$ ]]; then
echo "::error::Invalid version: $VERSION"
exit 1
fi
if [ -n "$DIST_TAG" ] && ! [[ "$DIST_TAG" =~ ^[a-z][a-z0-9-]*$ ]]; then
echo "::error::Invalid dist_tag: $DIST_TAG"
exit 1
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "dist_tag=$DIST_TAG" >> $GITHUB_OUTPUT
- name: Check if already published
id: check
env:
VERSION: ${{ steps.validate.outputs.version }}
run: |
PKG_NAME="oh-my-opencode-${{ matrix.platform }}"
VERSION="${{ inputs.version }}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/${PKG_NAME}/${VERSION}")
# Convert platform name for output (replace - with _)
PLATFORM_KEY="${{ matrix.platform }}"
PLATFORM_KEY="${PLATFORM_KEY//-/_}"
if [ "$STATUS" = "200" ]; then
# Check oh-my-opencode
OC_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-opencode-${{ matrix.platform }}/${VERSION}")
# Check oh-my-openagent
OA_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-openagent-${{ matrix.platform }}/${VERSION}")
echo "oh-my-opencode-${{ matrix.platform }}@${VERSION}: ${OC_STATUS}"
echo "oh-my-openagent-${{ matrix.platform }}@${VERSION}: ${OA_STATUS}"
if [ "$OC_STATUS" = "200" ]; then
echo "skip_opencode=true" >> $GITHUB_OUTPUT
echo "✓ oh-my-opencode-${{ matrix.platform }}@${VERSION} already published"
else
echo "skip_opencode=false" >> $GITHUB_OUTPUT
echo "→ oh-my-opencode-${{ matrix.platform }}@${VERSION} needs publishing"
fi
if [ "$OA_STATUS" = "200" ]; then
echo "skip_openagent=true" >> $GITHUB_OUTPUT
echo "✓ oh-my-openagent-${{ matrix.platform }}@${VERSION} already published"
else
echo "skip_openagent=false" >> $GITHUB_OUTPUT
echo "→ oh-my-openagent-${{ matrix.platform }}@${VERSION} needs publishing"
fi
# Skip build only if BOTH are already published
if [ "$OC_STATUS" = "200" ] && [ "$OA_STATUS" = "200" ]; then
echo "skip=true" >> $GITHUB_OUTPUT
echo "skip_${PLATFORM_KEY}=true" >> $GITHUB_OUTPUT
echo "✓ ${PKG_NAME}@${VERSION} already published"
else
echo "skip=false" >> $GITHUB_OUTPUT
echo "skip_${PLATFORM_KEY}=false" >> $GITHUB_OUTPUT
echo "→ ${PKG_NAME}@${VERSION} needs publishing"
fi
- name: Update version in package.json
if: steps.check.outputs.skip != 'true'
env:
VERSION: ${{ steps.validate.outputs.version }}
run: |
VERSION="${{ inputs.version }}"
cd packages/${{ matrix.platform }}
jq --arg v "$VERSION" '.version = $v' package.json > tmp.json && mv tmp.json package.json
- name: Set root package version
if: steps.check.outputs.skip != 'true'
env:
VERSION: ${{ steps.validate.outputs.version }}
run: |
jq --arg v "$VERSION" '.version = $v' package.json > tmp.json && mv tmp.json package.json
- name: Pre-download baseline compile target
if: steps.check.outputs.skip != 'true' && endsWith(matrix.platform, '-baseline')
shell: bash
@@ -192,11 +242,6 @@ jobs:
retention-days: 1
if-no-files-found: error
# =============================================================================
# Job 2: Publish all platforms (oh-my-opencode + oh-my-openagent)
# - Runs on ubuntu-latest for ALL platforms (just downloading artifacts)
# - Uses NODE_AUTH_TOKEN for auth + OIDC for provenance attestation
# =============================================================================
publish:
needs: build
if: always() && !cancelled()
@@ -207,37 +252,60 @@ jobs:
matrix:
platform: [darwin-arm64, darwin-x64, darwin-x64-baseline, linux-x64, linux-x64-baseline, linux-arm64, linux-x64-musl, linux-x64-musl-baseline, linux-arm64-musl, windows-x64, windows-x64-baseline]
steps:
- name: Check if oh-my-opencode already published
id: check
- name: Validate release inputs
id: validate
env:
INPUT_VERSION: ${{ inputs.version }}
INPUT_DIST_TAG: ${{ inputs.dist_tag }}
run: |
PKG_NAME="oh-my-opencode-${{ matrix.platform }}"
VERSION="${{ inputs.version }}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/${PKG_NAME}/${VERSION}")
if [ "$STATUS" = "200" ]; then
echo "skip=true" >> $GITHUB_OUTPUT
echo "✓ ${PKG_NAME}@${VERSION} already published, skipping"
else
echo "skip=false" >> $GITHUB_OUTPUT
echo "→ ${PKG_NAME}@${VERSION} will be published"
VERSION="$INPUT_VERSION"
DIST_TAG="$INPUT_DIST_TAG"
if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?$ ]]; then
echo "::error::Invalid version: $VERSION"
exit 1
fi
- name: Check if oh-my-openagent already published
id: check-openagent
if [ -n "$DIST_TAG" ] && ! [[ "$DIST_TAG" =~ ^[a-z][a-z0-9-]*$ ]]; then
echo "::error::Invalid dist_tag: $DIST_TAG"
exit 1
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "dist_tag=$DIST_TAG" >> $GITHUB_OUTPUT
- name: Check if already published
id: check
env:
VERSION: ${{ steps.validate.outputs.version }}
run: |
PKG_NAME="oh-my-openagent-${{ matrix.platform }}"
VERSION="${{ inputs.version }}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/${PKG_NAME}/${VERSION}")
if [ "$STATUS" = "200" ]; then
echo "skip=true" >> $GITHUB_OUTPUT
echo "✓ ${PKG_NAME}@${VERSION} already published, skipping"
OC_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-opencode-${{ matrix.platform }}/${VERSION}")
OA_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-openagent-${{ matrix.platform }}/${VERSION}")
if [ "$OC_STATUS" = "200" ]; then
echo "skip_opencode=true" >> $GITHUB_OUTPUT
echo "✓ oh-my-opencode-${{ matrix.platform }}@${VERSION} already published"
else
echo "skip=false" >> $GITHUB_OUTPUT
echo "→ ${PKG_NAME}@${VERSION} will be published"
echo "skip_opencode=false" >> $GITHUB_OUTPUT
fi
if [ "$OA_STATUS" = "200" ]; then
echo "skip_openagent=true" >> $GITHUB_OUTPUT
echo "✓ oh-my-openagent-${{ matrix.platform }}@${VERSION} already published"
else
echo "skip_openagent=false" >> $GITHUB_OUTPUT
fi
# Need artifact if either package needs publishing
if [ "$OC_STATUS" = "200" ] && [ "$OA_STATUS" = "200" ]; then
echo "skip_all=true" >> $GITHUB_OUTPUT
else
echo "skip_all=false" >> $GITHUB_OUTPUT
fi
- name: Download artifact
id: download
if: steps.check.outputs.skip != 'true' || steps.check-openagent.outputs.skip != 'true'
if: steps.check.outputs.skip_all != 'true'
continue-on-error: true
uses: actions/download-artifact@v4
with:
@@ -245,7 +313,7 @@ jobs:
path: .
- name: Extract artifact
if: (steps.check.outputs.skip != 'true' || steps.check-openagent.outputs.skip != 'true') && steps.download.outcome == 'success'
if: steps.check.outputs.skip_all != 'true' && steps.download.outcome == 'success'
run: |
PLATFORM="${{ matrix.platform }}"
mkdir -p packages/${PLATFORM}
@@ -261,45 +329,45 @@ jobs:
ls -la packages/${PLATFORM}/bin/
- uses: actions/setup-node@v4
if: (steps.check.outputs.skip != 'true' || steps.check-openagent.outputs.skip != 'true') && steps.download.outcome == 'success'
if: steps.check.outputs.skip_all != 'true' && steps.download.outcome == 'success'
with:
node-version: "24"
registry-url: "https://registry.npmjs.org"
- name: Publish ${{ matrix.platform }}
if: steps.check.outputs.skip != 'true' && steps.download.outcome == 'success'
run: |
cd packages/${{ matrix.platform }}
TAG_ARG=""
if [ -n "${{ inputs.dist_tag }}" ]; then
TAG_ARG="--tag ${{ inputs.dist_tag }}"
fi
npm publish --access public --provenance $TAG_ARG
- name: Publish oh-my-opencode-${{ matrix.platform }}
if: steps.check.outputs.skip_opencode != 'true' && steps.download.outcome == 'success'
env:
DIST_TAG: ${{ steps.validate.outputs.dist_tag }}
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
run: |
cd packages/${{ matrix.platform }}
if [ -n "$DIST_TAG" ]; then
npm publish --access public --provenance --tag "$DIST_TAG"
else
npm publish --access public --provenance
fi
timeout-minutes: 15
- name: Publish oh-my-openagent-${{ matrix.platform }}
if: steps.check-openagent.outputs.skip != 'true' && steps.download.outcome == 'success'
if: steps.check.outputs.skip_openagent != 'true' && steps.download.outcome == 'success'
env:
DIST_TAG: ${{ steps.validate.outputs.dist_tag }}
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
run: |
cd packages/${{ matrix.platform }}
# Rename package for oh-my-openagent
jq --arg name "oh-my-openagent-${{ matrix.platform }}" \
--arg desc "Platform-specific binary for oh-my-openagent (${{ matrix.platform }})" \
'.name = $name | .description = $desc | .bin = {"oh-my-openagent": (.bin | to_entries | .[0].value)}' \
package.json > tmp.json && mv tmp.json package.json
TAG_ARG=""
if [ -n "${{ inputs.dist_tag }}" ]; then
TAG_ARG="--tag ${{ inputs.dist_tag }}"
if [ -n "$DIST_TAG" ]; then
npm publish --access public --provenance --tag "$DIST_TAG"
else
npm publish --access public --provenance
fi
npm publish --access public --provenance $TAG_ARG
env:
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
timeout-minutes: 15

View File

@@ -57,32 +57,51 @@ jobs:
bun test src/cli/doctor/format-default.test.ts
bun test src/tools/call-omo-agent/sync-executor.test.ts
bun test src/tools/call-omo-agent/session-creator.test.ts
bun test src/tools/session-manager
bun test src/features/opencode-skill-loader/loader.test.ts
bun test src/hooks/anthropic-context-window-limit-recovery/recovery-hook.test.ts
bun test src/hooks/anthropic-context-window-limit-recovery/executor.test.ts
# src/shared mock-heavy files (mock.module pollutes connected-providers-cache and legacy-plugin-warning)
bun test src/shared/model-capabilities.test.ts
bun test src/shared/log-legacy-plugin-startup-warning.test.ts
bun test src/shared/model-error-classifier.test.ts
bun test src/shared/opencode-message-dir.test.ts
# session-recovery mock isolation (recover-tool-result-missing mocks ./storage)
bun test src/hooks/session-recovery/recover-tool-result-missing.test.ts
# legacy-plugin-toast mock isolation (hook.test.ts mocks ./auto-migrate)
bun test src/hooks/legacy-plugin-toast/hook.test.ts
- name: Run remaining tests
run: |
# Enumerate subdirectories/files explicitly to EXCLUDE mock-heavy files
# that were already run in isolation above.
# Excluded from src/shared: model-capabilities, log-legacy-plugin-startup-warning, model-error-classifier, opencode-message-dir
# Excluded from src/cli: doctor/formatter.test.ts, doctor/format-default.test.ts
# Excluded from src/tools: call-omo-agent/sync-executor.test.ts, call-omo-agent/session-creator.test.ts
# Excluded from src/tools: call-omo-agent/sync-executor.test.ts, call-omo-agent/session-creator.test.ts, session-manager (all)
# Excluded from src/hooks/anthropic-context-window-limit-recovery: recovery-hook.test.ts, executor.test.ts
# Excluded from src/tools: call-omo-agent/sync-executor.test.ts, call-omo-agent/session-creator.test.ts
# Build src/shared file list excluding mock-heavy files already run in isolation
SHARED_FILES=$(find src/shared -name '*.test.ts' \
! -name 'model-capabilities.test.ts' \
! -name 'log-legacy-plugin-startup-warning.test.ts' \
! -name 'model-error-classifier.test.ts' \
! -name 'opencode-message-dir.test.ts' \
| sort | tr '\n' ' ')
bun test bin script src/config src/mcp src/index.test.ts \
src/agents src/shared \
src/agents $SHARED_FILES \
src/cli/run src/cli/config-manager src/cli/mcp-oauth \
src/cli/index.test.ts src/cli/install.test.ts src/cli/model-fallback.test.ts \
src/cli/config-manager.test.ts \
src/cli/doctor/runner.test.ts src/cli/doctor/checks \
src/tools/ast-grep src/tools/background-task src/tools/delegate-task \
src/tools/glob src/tools/grep src/tools/interactive-bash \
src/tools/look-at src/tools/lsp src/tools/session-manager \
src/tools/look-at src/tools/lsp \
src/tools/skill src/tools/skill-mcp src/tools/slashcommand src/tools/task \
src/tools/call-omo-agent/background-agent-executor.test.ts \
src/tools/call-omo-agent/background-executor.test.ts \
src/tools/call-omo-agent/subagent-session-creator.test.ts \
src/hooks/anthropic-context-window-limit-recovery/empty-content-recovery-sdk.test.ts src/hooks/anthropic-context-window-limit-recovery/parser.test.ts src/hooks/anthropic-context-window-limit-recovery/pruning-deduplication.test.ts src/hooks/anthropic-context-window-limit-recovery/recovery-deduplication.test.ts src/hooks/anthropic-context-window-limit-recovery/storage.test.ts \
src/hooks/session-recovery/detect-error-type.test.ts src/hooks/session-recovery/index.test.ts src/hooks/session-recovery/recover-empty-content-message-sdk.test.ts src/hooks/session-recovery/resume.test.ts src/hooks/session-recovery/storage \
src/hooks/legacy-plugin-toast/auto-migrate.test.ts \
src/hooks/claude-code-compatibility \
src/hooks/context-injection \
src/hooks/provider-toast \
@@ -148,33 +167,47 @@ jobs:
- name: Calculate version
id: version
env:
RAW_VERSION: ${{ inputs.version }}
BUMP: ${{ inputs.bump }}
run: |
VERSION="${{ inputs.version }}"
VERSION="$RAW_VERSION"
if [ -z "$VERSION" ]; then
PREV=$(curl -s https://registry.npmjs.org/oh-my-opencode/latest | jq -r '.version // "0.0.0"')
BASE="${PREV%%-*}"
IFS='.' read -r MAJOR MINOR PATCH <<< "$BASE"
case "${{ inputs.bump }}" in
case "$BUMP" in
major) VERSION="$((MAJOR+1)).0.0" ;;
minor) VERSION="${MAJOR}.$((MINOR+1)).0" ;;
*) VERSION="${MAJOR}.${MINOR}.$((PATCH+1))" ;;
esac
fi
if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z]+(\.[0-9A-Za-z]+)*)?$ ]]; then
echo "::error::Invalid version: $VERSION"
exit 1
fi
echo "version=$VERSION" >> $GITHUB_OUTPUT
if [[ "$VERSION" == *"-"* ]]; then
DIST_TAG=$(echo "$VERSION" | cut -d'-' -f2 | cut -d'.' -f1)
DIST_TAG=$(printf '%s' "$VERSION" | cut -d'-' -f2 | cut -d'.' -f1)
if ! [[ "$DIST_TAG" =~ ^[a-z][a-z0-9-]*$ ]]; then
echo "::error::Invalid dist_tag: $DIST_TAG"
exit 1
fi
echo "dist_tag=${DIST_TAG:-next}" >> $GITHUB_OUTPUT
else
echo "dist_tag=" >> $GITHUB_OUTPUT
fi
echo "Version: $VERSION"
- name: Check if already published
id: check
env:
VERSION: ${{ steps.version.outputs.version }}
run: |
VERSION="${{ steps.version.outputs.version }}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-opencode/${VERSION}")
if [ "$STATUS" = "200" ]; then
echo "skip=true" >> $GITHUB_OUTPUT
@@ -185,15 +218,16 @@ jobs:
- name: Update version
if: steps.check.outputs.skip != 'true'
env:
VERSION: ${{ steps.version.outputs.version }}
run: |
VERSION="${{ steps.version.outputs.version }}"
jq --arg v "$VERSION" '.version = $v' package.json > tmp.json && mv tmp.json package.json
for platform in darwin-arm64 darwin-x64 darwin-x64-baseline linux-x64 linux-x64-baseline linux-arm64 linux-x64-musl linux-x64-musl-baseline linux-arm64-musl windows-x64 windows-x64-baseline; do
jq --arg v "$VERSION" '.version = $v' "packages/${platform}/package.json" > tmp.json
mv tmp.json "packages/${platform}/package.json"
done
jq --arg v "$VERSION" '.optionalDependencies = (.optionalDependencies | to_entries | map(.value = $v) | from_entries)' package.json > tmp.json && mv tmp.json package.json
- name: Build main package
@@ -206,68 +240,73 @@ jobs:
- name: Publish oh-my-opencode
if: steps.check.outputs.skip != 'true'
run: |
TAG_ARG=""
if [ -n "${{ steps.version.outputs.dist_tag }}" ]; then
TAG_ARG="--tag ${{ steps.version.outputs.dist_tag }}"
fi
npm publish --access public --provenance $TAG_ARG
env:
DIST_TAG: ${{ steps.version.outputs.dist_tag }}
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
- name: Publish oh-my-openagent
if: steps.check.outputs.skip != 'true'
run: |
# Update package name to oh-my-openagent
jq '.name = "oh-my-openagent"' package.json > tmp.json && mv tmp.json package.json
# Update optionalDependencies to use oh-my-openagent naming
jq '.optionalDependencies = {
"oh-my-openagent-darwin-arm64": "${{ steps.version.outputs.version }}",
"oh-my-openagent-darwin-x64": "${{ steps.version.outputs.version }}",
"oh-my-openagent-darwin-x64-baseline": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-arm64": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-arm64-musl": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-x64": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-x64-baseline": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-x64-musl": "${{ steps.version.outputs.version }}",
"oh-my-openagent-linux-x64-musl-baseline": "${{ steps.version.outputs.version }}",
"oh-my-openagent-windows-x64": "${{ steps.version.outputs.version }}",
"oh-my-openagent-windows-x64-baseline": "${{ steps.version.outputs.version }}"
}' package.json > tmp.json && mv tmp.json package.json
TAG_ARG=""
if [ -n "${{ steps.version.outputs.dist_tag }}" ]; then
TAG_ARG="--tag ${{ steps.version.outputs.dist_tag }}"
if [ -n "$DIST_TAG" ]; then
npm publish --access public --provenance --tag "$DIST_TAG"
else
npm publish --access public --provenance
fi
npm publish --access public --provenance $TAG_ARG || echo "oh-my-openagent publish may have failed (package may already exist)"
env:
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
- name: Restore package.json
if: steps.check.outputs.skip != 'true'
run: |
# Restore original package name
jq '.name = "oh-my-opencode"' package.json > tmp.json && mv tmp.json package.json
trigger-platform:
runs-on: ubuntu-latest
- name: Check if oh-my-openagent already published
id: check-openagent
env:
VERSION: ${{ steps.version.outputs.version }}
run: |
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "https://registry.npmjs.org/oh-my-openagent/${VERSION}")
if [ "$STATUS" = "200" ]; then
echo "skip=true" >> $GITHUB_OUTPUT
echo "✓ oh-my-openagent@${VERSION} already published"
else
echo "skip=false" >> $GITHUB_OUTPUT
fi
- name: Publish oh-my-openagent
if: steps.check-openagent.outputs.skip != 'true'
env:
VERSION: ${{ steps.version.outputs.version }}
DIST_TAG: ${{ steps.version.outputs.dist_tag }}
NODE_AUTH_TOKEN: ${{ secrets.NODE_AUTH_TOKEN }}
NPM_CONFIG_PROVENANCE: true
run: |
# Update package name, version, and optionalDependencies for oh-my-openagent
jq --arg v "$VERSION" '
.name = "oh-my-openagent" |
.version = $v |
.optionalDependencies = (
.optionalDependencies | to_entries |
map(.key = (.key | sub("^oh-my-opencode-"; "oh-my-openagent-")) | .value = $v) |
from_entries
)
' package.json > tmp.json && mv tmp.json package.json
if [ -n "$DIST_TAG" ]; then
npm publish --access public --provenance --tag "$DIST_TAG"
else
npm publish --access public --provenance
fi
- name: Restore package.json
if: always() && steps.check-openagent.outputs.skip != 'true'
run: |
git checkout -- package.json
publish-platform:
needs: publish-main
if: inputs.skip_platform != true
steps:
- name: Trigger platform publish workflow
run: |
gh workflow run publish-platform.yml \
--repo ${{ github.repository }} \
--ref ${{ github.ref }} \
-f version=${{ needs.publish-main.outputs.version }} \
-f dist_tag=${{ needs.publish-main.outputs.dist_tag }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/publish-platform.yml
with:
version: ${{ needs.publish-main.outputs.version }}
dist_tag: ${{ needs.publish-main.outputs.dist_tag }}
secrets: inherit
release:
runs-on: ubuntu-latest
needs: publish-main
needs: [publish-main, publish-platform]
if: always() && needs.publish-main.result == 'success' && (inputs.skip_platform == true || needs.publish-platform.result == 'success')
steps:
- uses: actions/checkout@v4
with:
@@ -291,13 +330,53 @@ jobs:
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Create GitHub release
- name: Apply release version to source tree
env:
VERSION: ${{ needs.publish-main.outputs.version }}
run: |
jq --arg v "$VERSION" '.version = $v' package.json > tmp.json && mv tmp.json package.json
for platform in darwin-arm64 darwin-x64 darwin-x64-baseline linux-x64 linux-x64-baseline linux-arm64 linux-x64-musl linux-x64-musl-baseline linux-arm64-musl windows-x64 windows-x64-baseline; do
jq --arg v "$VERSION" '.version = $v' "packages/${platform}/package.json" > tmp.json
mv tmp.json "packages/${platform}/package.json"
done
jq --arg v "$VERSION" '.optionalDependencies = (.optionalDependencies | to_entries | map(.value = $v) | from_entries)' package.json > tmp.json && mv tmp.json package.json
- name: Commit version bump
env:
VERSION: ${{ needs.publish-main.outputs.version }}
run: |
git config user.email "github-actions[bot]@users.noreply.github.com"
git config user.name "github-actions[bot]"
git add package.json packages/*/package.json
git diff --cached --quiet || git commit -m "release: v${VERSION}"
- name: Create release tag
env:
VERSION: ${{ needs.publish-main.outputs.version }}
run: |
if git rev-parse "v${VERSION}" >/dev/null 2>&1; then
echo "::error::Tag v${VERSION} already exists"
exit 1
fi
git tag "v${VERSION}"
- name: Push release state
env:
VERSION: ${{ needs.publish-main.outputs.version }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
git push origin HEAD
git push origin "v${VERSION}"
- name: Create GitHub release
env:
VERSION: ${{ needs.publish-main.outputs.version }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
VERSION="${{ needs.publish-main.outputs.version }}"
gh release view "v${VERSION}" >/dev/null 2>&1 || \
gh release create "v${VERSION}" --title "v${VERSION}" --notes-file /tmp/changelog.md
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Delete draft release
run: gh release delete next --yes 2>/dev/null || true
@@ -306,13 +385,13 @@ jobs:
- name: Merge to master
continue-on-error: true
env:
VERSION: ${{ needs.publish-main.outputs.version }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
VERSION="${{ needs.publish-main.outputs.version }}"
git stash --include-untracked || true
git checkout master
git reset --hard "v${VERSION}"
git push -f origin master || echo "::warning::Failed to push to master"
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -0,0 +1,46 @@
name: Refresh Model Capabilities
on:
schedule:
- cron: "17 4 * * 1"
workflow_dispatch:
permissions:
contents: write
pull-requests: write
jobs:
refresh:
runs-on: ubuntu-latest
if: github.repository == 'code-yeongyu/oh-my-openagent'
steps:
- uses: actions/checkout@v4
- uses: oven-sh/setup-bun@v2
with:
bun-version: latest
- name: Install dependencies
run: bun install
env:
BUN_INSTALL_ALLOW_SCRIPTS: "@ast-grep/napi"
- name: Refresh bundled model capabilities snapshot
run: bun run build:model-capabilities
- name: Validate capability guardrails
run: bun run test:model-capabilities
- name: Create refresh pull request
uses: peter-evans/create-pull-request@v7
with:
commit-message: "chore: refresh model capabilities snapshot"
title: "chore: refresh model capabilities snapshot"
body: |
Automated refresh of `src/generated/model-capabilities.generated.json` from `https://models.dev/api.json`.
This keeps the bundled capability snapshot aligned with upstream model metadata without relying on manual refreshes.
branch: automation/refresh-model-capabilities
delete-branch: true
labels: |
maintenance

2
.gitignore vendored
View File

@@ -9,6 +9,7 @@ dist/
# Platform binaries (built, not committed)
packages/*/bin/oh-my-opencode
packages/*/bin/oh-my-opencode.exe
packages/*/bin/*.map
# IDE
.idea/
@@ -35,3 +36,4 @@ test-injection/
notepad.md
oauth-success.html
*.bun-build
.omx/

View File

@@ -1,105 +1,229 @@
---
name: github-triage
description: "Unified GitHub triage for issues AND PRs. 1 item = 1 background task (category: free). Issues: answer questions from codebase, analyze bugs. PRs: review bugfixes, merge safe ones. All parallel, all background. Triggers: 'triage', 'triage issues', 'triage PRs', 'github triage'."
description: "Read-only GitHub triage for issues AND PRs. 1 item = 1 background task (category: quick). Analyzes all open items and writes evidence-backed reports to /tmp/{datetime}/. Every claim requires a GitHub permalink as proof. NEVER takes any action on GitHub - no comments, no merges, no closes, no labels. Reports only. Triggers: 'triage', 'triage issues', 'triage PRs', 'github triage'."
---
# GitHub Triage — Unified Issue & PR Processor
# GitHub Triage - Read-Only Analyzer
<role>
You are a GitHub triage orchestrator. You fetch all open issues and PRs, classify each one, then spawn exactly 1 background subagent per item using `category="free"`. Each subagent analyzes its item, takes action (comment/close/merge/report), and records results via TaskCreate.
Read-only GitHub triage orchestrator. Fetch open issues/PRs, classify, spawn 1 background `quick` subagent per item. Each subagent analyzes and writes a report file. ZERO GitHub mutations.
</role>
---
## Architecture
## ARCHITECTURE
```
1 issue or PR = 1 TaskCreate = 1 task(category="free", run_in_background=true)
```
**1 ISSUE/PR = 1 `task_create` = 1 `quick` SUBAGENT (background). NO EXCEPTIONS.**
| Rule | Value |
|------|-------|
| Category for ALL subagents | `free` |
| Execution mode | `run_in_background=true` |
| Parallelism | ALL items launched simultaneously |
| Result tracking | Each subagent calls `TaskCreate` with its findings |
| Result collection | `background_output()` polling loop |
| Category | `quick` |
| Execution | `run_in_background=true` |
| Parallelism | ALL items simultaneously |
| Tracking | `task_create` per item |
| Output | `/tmp/{YYYYMMDD-HHmmss}/issue-{N}.md` or `pr-{N}.md` |
---
## PHASE 1: FETCH ALL OPEN ITEMS
## Zero-Action Policy (ABSOLUTE)
<fetch>
Run these commands to collect data. Use the bundled script if available, otherwise fall back to gh CLI.
<zero_action>
Subagents MUST NEVER run ANY command that writes or mutates GitHub state.
**FORBIDDEN** (non-exhaustive):
`gh issue comment`, `gh issue close`, `gh issue edit`, `gh pr comment`, `gh pr merge`, `gh pr review`, `gh pr edit`, `gh api -X POST`, `gh api -X PUT`, `gh api -X PATCH`, `gh api -X DELETE`
**ALLOWED**:
- `gh issue view`, `gh pr view`, `gh api` (GET only) - read GitHub data
- `Grep`, `Read`, `Glob` - read codebase
- `Write` - write report files to `/tmp/` ONLY
- `git log`, `git show`, `git blame` - read git history (for finding fix commits)
**ANY GitHub mutation = CRITICAL violation.**
</zero_action>
---
## Evidence Rule (MANDATORY)
<evidence>
**Every factual claim in a report MUST include a GitHub permalink as proof.**
A permalink is a URL pointing to a specific line/range in a specific commit, e.g.:
`https://github.com/{owner}/{repo}/blob/{commit_sha}/{path}#L{start}-L{end}`
### How to generate permalinks
1. Find the relevant file and line(s) via Grep/Read.
2. Get the current commit SHA: `git rev-parse HEAD`
3. Construct: `https://github.com/{REPO}/blob/{SHA}/{filepath}#L{line}` (or `#L{start}-L{end}` for ranges)
### Rules
- **No permalink = no claim.** If you cannot back a statement with a permalink, state "No evidence found" instead.
- Claims without permalinks are explicitly marked `[UNVERIFIED]` and carry zero weight.
- Permalinks to `main`/`master`/`dev` branches are NOT acceptable - use commit SHAs only.
- For bug analysis: permalink to the problematic code. For fix verification: permalink to the fixing commit diff.
</evidence>
---
## Phase 0: Setup
```bash
REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner)
# Issues: all open
gh issue list --repo $REPO --state open --limit 500 \
--json number,title,state,createdAt,updatedAt,labels,author,body,comments
# PRs: all open
gh pr list --repo $REPO --state open --limit 500 \
--json number,title,state,createdAt,updatedAt,labels,author,body,headRefName,baseRefName,isDraft,mergeable,reviewDecision,statusCheckRollup
REPORT_DIR="/tmp/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$REPORT_DIR"
COMMIT_SHA=$(git rev-parse HEAD)
```
If either returns exactly 500 results, paginate using `--search "created:<LAST_CREATED_AT"` until exhausted.
</fetch>
Pass `REPO`, `REPORT_DIR`, and `COMMIT_SHA` to every subagent.
---
## PHASE 2: CLASSIFY EACH ITEM
---
For each item, determine its type based on title, labels, and body content:
## Phase 1: Fetch All Open Items (CORRECTED)
<classification>
**IMPORTANT:** `body` and `comments` fields may contain control characters that break jq parsing. Fetch basic metadata first, then fetch full details per-item in subagents.
### Issues
```bash
# Step 1: Fetch basic metadata (without body/comments to avoid JSON parsing issues)
ISSUES_LIST=$(gh issue list --repo $REPO --state open --limit 500 \
--json number,title,labels,author,createdAt)
ISSUE_COUNT=$(echo "$ISSUES_LIST" | jq length)
| Type | Detection | Action Path |
|------|-----------|-------------|
| `ISSUE_QUESTION` | Title contains `[Question]`, `[Discussion]`, `?`, or body is asking "how to" / "why does" / "is it possible" | SUBAGENT_ISSUE_QUESTION |
| `ISSUE_BUG` | Title contains `[Bug]`, `Bug:`, body describes unexpected behavior, error messages, stack traces | SUBAGENT_ISSUE_BUG |
| `ISSUE_FEATURE` | Title contains `[Feature]`, `[RFE]`, `[Enhancement]`, `Feature Request`, `Proposal` | SUBAGENT_ISSUE_FEATURE |
| `ISSUE_OTHER` | Anything else | SUBAGENT_ISSUE_OTHER |
# Paginate if needed
if [ "$ISSUE_COUNT" -eq 500 ]; then
LAST_DATE=$(echo "$ISSUES_LIST" | jq -r '.[-1].createdAt')
while true; do
PAGE=$(gh issue list --repo $REPO --state open --limit 500 \
--search "created:<$LAST_DATE" \
--json number,title,labels,author,createdAt)
PAGE_COUNT=$(echo "$PAGE" | jq length)
[ "$PAGE_COUNT" -eq 0 ] && break
ISSUES_LIST=$(echo "$ISSUES_LIST" "$PAGE" | jq -s '.[0] + .[1] | unique_by(.number)')
ISSUE_COUNT=$(echo "$ISSUES_LIST" | jq length)
[ "$PAGE_COUNT" -lt 500 ] && break
LAST_DATE=$(echo "$PAGE" | jq -r '.[-1].createdAt')
done
fi
### PRs
# Same for PRs
PRS_LIST=$(gh pr list --repo $REPO --state open --limit 500 \
--json number,title,labels,author,headRefName,baseRefName,isDraft,createdAt)
PR_COUNT=$(echo "$PRS_LIST" | jq length)
| Type | Detection | Action Path |
|------|-----------|-------------|
| `PR_BUGFIX` | Title starts with `fix`, `fix:`, `fix(`, branch contains `fix/`, `bugfix/`, or labels include `bug` | SUBAGENT_PR_BUGFIX |
| `PR_OTHER` | Everything else (feat, refactor, docs, chore, etc.) | SUBAGENT_PR_OTHER |
if [ "$PR_COUNT" -eq 500 ]; then
LAST_DATE=$(echo "$PRS_LIST" | jq -r '.[-1].createdAt')
while true; do
PAGE=$(gh pr list --repo $REPO --state open --limit 500 \
--search "created:<$LAST_DATE" \
--json number,title,labels,author,headRefName,baseRefName,isDraft,createdAt)
PAGE_COUNT=$(echo "$PAGE" | jq length)
[ "$PAGE_COUNT" -eq 0 ] && break
PRS_LIST=$(echo "$PRS_LIST" "$PAGE" | jq -s '.[0] + .[1] | unique_by(.number)')
PR_COUNT=$(echo "$PRS_LIST" | jq length)
[ "$PAGE_COUNT" -lt 500 ] && break
LAST_DATE=$(echo "$PAGE" | jq -r '.[-1].createdAt')
done
fi
echo "Total issues: $ISSUE_COUNT, Total PRs: $PR_COUNT"
```
**LARGE REPOSITORY HANDLING:**
If total items exceeds 50, you MUST process ALL items. Use the pagination code above to fetch every single open issue and PR.
**DO NOT** sample or limit to 50 items - process the entire backlog.
Example: If there are 500 open issues, spawn 500 subagents. If there are 1000 open PRs, spawn 1000 subagents.
**Note:** Background task system will queue excess tasks automatically.
</classification>
---
## PHASE 3: SPAWN 1 BACKGROUND TASK PER ITEM
## Phase 2: Classify
For EVERY item, create a TaskCreate entry first, then spawn a background task.
| Type | Detection |
|------|-----------|
| `ISSUE_QUESTION` | `[Question]`, `[Discussion]`, `?`, "how to" / "why does" / "is it possible" |
| `ISSUE_BUG` | `[Bug]`, `Bug:`, error messages, stack traces, unexpected behavior |
| `ISSUE_FEATURE` | `[Feature]`, `[RFE]`, `[Enhancement]`, `Feature Request`, `Proposal` |
| `ISSUE_OTHER` | Anything else |
| `PR_BUGFIX` | Title starts with `fix`, branch contains `fix/`/`bugfix/`, label `bug` |
| `PR_OTHER` | Everything else |
---
## Phase 3: Spawn Subagents (Individual Tool Calls)
**CRITICAL: Create tasks ONE BY ONE using individual `task_create` tool calls. NEVER batch or script.**
For each item, execute these steps sequentially:
### Step 3.1: Create Task Record
```typescript
task_create(
subject="Triage: #{number} {title}",
description="GitHub {issue|PR} triage analysis - {type}",
metadata={"type": "{ISSUE_QUESTION|ISSUE_BUG|ISSUE_FEATURE|ISSUE_OTHER|PR_BUGFIX|PR_OTHER}", "number": {number}}
)
```
### Step 3.2: Spawn Analysis Subagent (Background)
```typescript
task(
category="quick",
run_in_background=true,
load_skills=[],
prompt=SUBAGENT_PROMPT
)
```
**ABSOLUTE RULES for Subagents:**
- **ONLY ANALYZE** - Never take action on GitHub (no comments, merges, closes)
- **READ-ONLY** - Use tools only for reading code/GitHub data
- **WRITE REPORT ONLY** - Output goes to `{REPORT_DIR}/{issue|pr}-{number}.md` via Write tool
- **EVIDENCE REQUIRED** - Every claim must have GitHub permalink as proof
```
For each item:
1. TaskCreate(subject="Triage: #{number} {title}")
2. task(category="free", run_in_background=true, load_skills=[], prompt=SUBAGENT_PROMPT)
1. task_create(subject="Triage: #{number} {title}")
2. task(category="quick", run_in_background=true, load_skills=[], prompt=SUBAGENT_PROMPT)
3. Store mapping: item_number -> { task_id, background_task_id }
```
---
## SUBAGENT PROMPT TEMPLATES
## Subagent Prompts
### Common Preamble (include in ALL subagent prompts)
```
CONTEXT:
- Repository: {REPO}
- Report directory: {REPORT_DIR}
- Current commit SHA: {COMMIT_SHA}
PERMALINK FORMAT:
Every factual claim MUST include a permalink: https://github.com/{REPO}/blob/{COMMIT_SHA}/{filepath}#L{start}-L{end}
No permalink = no claim. Mark unverifiable claims as [UNVERIFIED].
To get current SHA if needed: git rev-parse HEAD
ABSOLUTE RULES (violating ANY = critical failure):
- NEVER run gh issue comment, gh issue close, gh issue edit
- NEVER run gh pr comment, gh pr merge, gh pr review, gh pr edit
- NEVER run any gh command with -X POST, -X PUT, -X PATCH, -X DELETE
- NEVER run git checkout, git fetch, git pull, git switch, git worktree
- Your ONLY writable output: {REPORT_DIR}/{issue|pr}-{number}.md via the Write tool
```
Each subagent gets an explicit, step-by-step prompt. Free models are limited — leave NOTHING implicit.
---
### SUBAGENT_ISSUE_QUESTION
<issue_question_prompt>
### ISSUE_QUESTION
```
You are a GitHub issue responder for the repository {REPO}.
You are analyzing issue #{number} for {REPO}.
ITEM:
- Issue #{number}: {title}
@@ -107,52 +231,43 @@ ITEM:
- Body: {body}
- Comments: {comments_summary}
YOUR JOB:
1. Read the issue carefully. Understand what the user is asking.
2. Search the codebase to find the answer. Use Grep and Read tools.
- Search for relevant file names, function names, config keys mentioned in the issue.
- Read the files you find to understand how the feature works.
3. Decide: Can you answer this clearly and accurately from the codebase?
TASK:
1. Understand the question.
2. Search the codebase (Grep, Read) for the answer.
3. For every finding, construct a permalink: https://github.com/{REPO}/blob/{COMMIT_SHA}/{path}#L{N}
4. Write report to {REPORT_DIR}/issue-{number}.md
IF YES (you found a clear, accurate answer):
Step A: Write a helpful comment. The comment MUST:
- Start with exactly: [sisyphus-bot]
- Be warm, friendly, and thorough
- Include specific file paths and code references
- Include code snippets or config examples if helpful
- End with "Feel free to reopen if this doesn't resolve your question!"
Step B: Post the comment:
gh issue comment {number} --repo {REPO} --body "YOUR_COMMENT"
Step C: Close the issue:
gh issue close {number} --repo {REPO}
Step D: Report back with this EXACT format:
ACTION: ANSWERED_AND_CLOSED
COMMENT_POSTED: yes
SUMMARY: [1-2 sentence summary of your answer]
REPORT FORMAT (write this as the file content):
IF NO (not enough info in codebase, or answer is uncertain):
Report back with:
ACTION: NEEDS_MANUAL_ATTENTION
REASON: [why you couldn't answer — be specific]
PARTIAL_FINDINGS: [what you DID find, if anything]
# Issue #{number}: {title}
**Type:** Question | **Author:** {author} | **Created:** {createdAt}
RULES:
- NEVER guess. Only answer if the codebase clearly supports your answer.
- NEVER make up file paths or function names.
- The [sisyphus-bot] prefix is MANDATORY on every comment you post.
- Be genuinely helpful — imagine you're a senior maintainer who cares about the community.
## Question
[1-2 sentence summary]
## Findings
[Each finding with permalink proof. Example:]
- The config is parsed in [`src/config/loader.ts#L42-L58`](https://github.com/{REPO}/blob/{SHA}/src/config/loader.ts#L42-L58)
## Suggested Answer
[Draft answer with code references and permalinks]
## Confidence: [HIGH | MEDIUM | LOW]
[Reason. If LOW: what's missing]
## Recommended Action
[What maintainer should do]
---
REMEMBER: No permalink = no claim. Every code reference needs a permalink.
```
</issue_question_prompt>
---
### SUBAGENT_ISSUE_BUG
<issue_bug_prompt>
### ISSUE_BUG
```
You are a GitHub bug analyzer for the repository {REPO}.
You are analyzing bug report #{number} for {REPO}.
ITEM:
- Issue #{number}: {title}
@@ -160,74 +275,75 @@ ITEM:
- Body: {body}
- Comments: {comments_summary}
YOUR JOB:
1. Read the issue carefully. Understand the reported bug:
- What behavior does the user expect?
- What behavior do they actually see?
- What steps reproduce it?
2. Search the codebase for the relevant code. Use Grep and Read tools.
- Find the files/functions mentioned or related to the bug.
- Read them carefully and trace the logic.
3. Determine one of three outcomes:
TASK:
1. Understand: expected behavior, actual behavior, reproduction steps.
2. Search the codebase for relevant code. Trace the logic.
3. Determine verdict: CONFIRMED_BUG, NOT_A_BUG, ALREADY_FIXED, or UNCLEAR.
4. For ALREADY_FIXED: find the fixing commit using git log/git blame. Include the commit SHA and what changed.
5. For every finding, construct a permalink.
6. Write report to {REPORT_DIR}/issue-{number}.md
OUTCOME A — CONFIRMED BUG (you found the problematic code):
Step 1: Post a comment on the issue. The comment MUST:
- Start with exactly: [sisyphus-bot]
- Apologize sincerely for the inconvenience ("We're sorry you ran into this issue.")
- Briefly acknowledge what the bug is
- Say "We've identified the root cause and will work on a fix."
- Do NOT reveal internal implementation details unnecessarily
Step 2: Post the comment:
gh issue comment {number} --repo {REPO} --body "YOUR_COMMENT"
Step 3: Report back with:
ACTION: CONFIRMED_BUG
ROOT_CAUSE: [which file, which function, what goes wrong]
FIX_APPROACH: [how to fix it — be specific: "In {file}, line ~{N}, change X to Y because Z"]
SEVERITY: [LOW|MEDIUM|HIGH|CRITICAL]
AFFECTED_FILES: [list of files that need changes]
FINDING "ALREADY_FIXED" COMMITS:
- Use `git log --all --oneline -- {file}` to find recent changes to relevant files
- Use `git log --all --grep="fix" --grep="{keyword}" --all-match --oneline` to search commit messages
- Use `git blame {file}` to find who last changed the relevant lines
- Use `git show {commit_sha}` to verify the fix
- Construct commit permalink: https://github.com/{REPO}/commit/{fix_commit_sha}
OUTCOME B — NOT A BUG (user misunderstanding, provably correct behavior):
ONLY choose this if you can RIGOROUSLY PROVE the behavior is correct.
Step 1: Post a comment. The comment MUST:
- Start with exactly: [sisyphus-bot]
- Be kind and empathetic — never condescending
- Explain clearly WHY the current behavior is correct
- Include specific code references or documentation links
- Offer a workaround or alternative if possible
- End with "Please let us know if you have further questions!"
Step 2: Post the comment:
gh issue comment {number} --repo {REPO} --body "YOUR_COMMENT"
Step 3: DO NOT close the issue. Let the user or maintainer decide.
Step 4: Report back with:
ACTION: NOT_A_BUG
EXPLANATION: [why this is correct behavior]
PROOF: [specific code reference proving it]
REPORT FORMAT (write this as the file content):
OUTCOME C — UNCLEAR (can't determine from codebase alone):
Report back with:
ACTION: NEEDS_INVESTIGATION
FINDINGS: [what you found so far]
BLOCKERS: [what's preventing you from determining the cause]
SUGGESTED_NEXT_STEPS: [what a human should look at]
# Issue #{number}: {title}
**Type:** Bug Report | **Author:** {author} | **Created:** {createdAt}
RULES:
- NEVER guess at root causes. Only report CONFIRMED_BUG if you found the exact problematic code.
- NEVER close bug issues yourself. Only comment.
- For OUTCOME B (not a bug): you MUST have rigorous proof. If there's ANY doubt, choose OUTCOME C instead.
- The [sisyphus-bot] prefix is MANDATORY on every comment.
- When apologizing, be genuine. The user took time to report this.
## Bug Summary
**Expected:** [what user expects]
**Actual:** [what actually happens]
**Reproduction:** [steps if provided]
## Verdict: [CONFIRMED_BUG | NOT_A_BUG | ALREADY_FIXED | UNCLEAR]
## Analysis
### Evidence
[Each piece of evidence with permalink. No permalink = mark [UNVERIFIED]]
### Root Cause (if CONFIRMED_BUG)
[Which file, which function, what goes wrong]
- Problematic code: [`{path}#L{N}`](permalink)
### Why Not A Bug (if NOT_A_BUG)
[Rigorous proof with permalinks that current behavior is correct]
### Fix Details (if ALREADY_FIXED)
- **Fixed in commit:** [`{short_sha}`](https://github.com/{REPO}/commit/{full_sha})
- **Fixed date:** {date}
- **What changed:** [description with diff permalink]
- **Fixed by:** {author}
### Blockers (if UNCLEAR)
[What prevents determination, what to investigate next]
## Severity: [LOW | MEDIUM | HIGH | CRITICAL]
## Affected Files
[List with permalinks]
## Suggested Fix (if CONFIRMED_BUG)
[Specific approach: "In {file}#L{N}, change X to Y because Z"]
## Recommended Action
[What maintainer should do]
---
CRITICAL: Claims without permalinks are worthless. If you cannot find evidence, say so explicitly rather than making unverified claims.
```
</issue_bug_prompt>
---
### SUBAGENT_ISSUE_FEATURE
<issue_feature_prompt>
### ISSUE_FEATURE
```
You are a GitHub feature request analyzer for the repository {REPO}.
You are analyzing feature request #{number} for {REPO}.
ITEM:
- Issue #{number}: {title}
@@ -235,38 +351,41 @@ ITEM:
- Body: {body}
- Comments: {comments_summary}
YOUR JOB:
1. Read the feature request.
2. Search the codebase to check if this feature already exists (partially or fully).
3. Assess feasibility and alignment with the project.
TASK:
1. Understand the request.
2. Search codebase for existing (partial/full) implementations.
3. Assess feasibility.
4. Write report to {REPORT_DIR}/issue-{number}.md
Report back with:
ACTION: FEATURE_ASSESSED
ALREADY_EXISTS: [YES_FULLY | YES_PARTIALLY | NO]
IF_EXISTS: [where in the codebase, how to use it]
FEASIBILITY: [EASY | MODERATE | HARD | ARCHITECTURAL_CHANGE]
RELEVANT_FILES: [files that would need changes]
NOTES: [any observations about implementation approach]
REPORT FORMAT (write this as the file content):
If the feature already fully exists:
Post a comment (prefix: [sisyphus-bot]) explaining how to use the existing feature with examples.
gh issue comment {number} --repo {REPO} --body "YOUR_COMMENT"
# Issue #{number}: {title}
**Type:** Feature Request | **Author:** {author} | **Created:** {createdAt}
RULES:
- Do NOT close feature requests.
- The [sisyphus-bot] prefix is MANDATORY on any comment.
## Request Summary
[What the user wants]
## Existing Implementation: [YES_FULLY | YES_PARTIALLY | NO]
[If exists: where, with permalinks to the implementation]
## Feasibility: [EASY | MODERATE | HARD | ARCHITECTURAL_CHANGE]
## Relevant Files
[With permalinks]
## Implementation Notes
[Approach, pitfalls, dependencies]
## Recommended Action
[What maintainer should do]
```
</issue_feature_prompt>
---
### SUBAGENT_ISSUE_OTHER
<issue_other_prompt>
### ISSUE_OTHER
```
You are a GitHub issue analyzer for the repository {REPO}.
You are analyzing issue #{number} for {REPO}.
ITEM:
- Issue #{number}: {title}
@@ -274,209 +393,195 @@ ITEM:
- Body: {body}
- Comments: {comments_summary}
YOUR JOB:
Quickly assess this issue and report:
ACTION: ASSESSED
TYPE_GUESS: [QUESTION | BUG | FEATURE | DISCUSSION | META | STALE]
SUMMARY: [1-2 sentence summary]
NEEDS_ATTENTION: [YES | NO]
SUGGESTED_LABEL: [if any]
TASK: Assess and write report to {REPORT_DIR}/issue-{number}.md
Do NOT post comments. Do NOT close. Just analyze and report.
REPORT FORMAT (write this as the file content):
# Issue #{number}: {title}
**Type:** [QUESTION | BUG | FEATURE | DISCUSSION | META | STALE]
**Author:** {author} | **Created:** {createdAt}
## Summary
[1-2 sentences]
## Needs Attention: [YES | NO]
## Suggested Label: [if any]
## Recommended Action: [what maintainer should do]
```
</issue_other_prompt>
---
### SUBAGENT_PR_BUGFIX
<pr_bugfix_prompt>
### PR_BUGFIX
```
You are a GitHub PR reviewer for the repository {REPO}.
You are reviewing PR #{number} for {REPO}.
ITEM:
- PR #{number}: {title}
- Author: {author}
- Base: {baseRefName}
- Head: {headRefName}
- Draft: {isDraft}
- Mergeable: {mergeable}
- Review Decision: {reviewDecision}
- CI Status: {statusCheckRollup_summary}
- Base: {baseRefName} <- Head: {headRefName}
- Draft: {isDraft} | Mergeable: {mergeable}
- Review: {reviewDecision} | CI: {statusCheckRollup_summary}
- Body: {body}
YOUR JOB:
1. Fetch PR details (DO NOT checkout the branch — read-only analysis):
gh pr view {number} --repo {REPO} --json files,reviews,comments,statusCheckRollup,reviewDecision
2. Read the changed files list. For each changed file, use `gh api repos/{REPO}/pulls/{number}/files` to see the diff.
3. Search the codebase to understand what the PR is fixing and whether the fix is correct.
4. Evaluate merge safety:
TASK:
1. Fetch PR details (READ-ONLY): gh pr view {number} --repo {REPO} --json files,reviews,comments,statusCheckRollup,reviewDecision
2. Read diff: gh api repos/{REPO}/pulls/{number}/files
3. Search codebase to verify fix correctness.
4. Write report to {REPORT_DIR}/pr-{number}.md
MERGE CONDITIONS (ALL must be true for auto-merge):
a. CI status checks: ALL passing (no failures, no pending)
b. Review decision: APPROVED
c. The fix is clearly correct — addresses an obvious, unambiguous bug
d. No risky side effects (no architectural changes, no breaking changes)
e. Not a draft PR
f. Mergeable state is clean (no conflicts)
REPORT FORMAT (write this as the file content):
IF ALL MERGE CONDITIONS MET:
Step 1: Merge the PR:
gh pr merge {number} --repo {REPO} --squash --auto
Step 2: Report back with:
ACTION: MERGED
FIX_SUMMARY: [what bug was fixed and how]
FILES_CHANGED: [list of files]
RISK: NONE
# PR #{number}: {title}
**Type:** Bugfix | **Author:** {author}
**Base:** {baseRefName} <- {headRefName} | **Draft:** {isDraft}
IF ANY CONDITION NOT MET:
Report back with:
ACTION: NEEDS_HUMAN_DECISION
FIX_SUMMARY: [what the PR does]
WHAT_IT_FIXES: [the bug or issue it addresses]
CI_STATUS: [PASS | FAIL | PENDING — list any failures]
REVIEW_STATUS: [APPROVED | CHANGES_REQUESTED | PENDING | NONE]
MISSING: [what's preventing auto-merge — be specific]
RISK_ASSESSMENT: [what could go wrong]
AMBIGUOUS_PARTS: [anything that needs human judgment]
RECOMMENDED_ACTION: [what the maintainer should do]
## Fix Summary
[What bug, how fixed - with permalinks to changed code]
ABSOLUTE RULES:
- NEVER run `git checkout`, `git fetch`, `git pull`, or `git switch`. READ-ONLY via gh CLI and API.
- NEVER checkout the PR branch. NEVER. Use `gh api` and `gh pr view` only.
- Only merge if you are 100% certain ALL conditions are met. When in doubt, report instead.
- The [sisyphus-bot] prefix is MANDATORY on any comment you post.
## Code Review
### Correctness
[Is fix correct? Root cause addressed? Evidence with permalinks]
### Side Effects
[Risky changes, breaking changes - with permalinks if any]
### Code Quality
[Style, patterns, test coverage]
## Merge Readiness
| Check | Status |
|-------|--------|
| CI | [PASS / FAIL / PENDING] |
| Review | [APPROVED / CHANGES_REQUESTED / PENDING / NONE] |
| Mergeable | [YES / NO / CONFLICTED] |
| Draft | [YES / NO] |
| Correctness | [VERIFIED / CONCERNS / UNCLEAR] |
| Risk | [NONE / LOW / MEDIUM / HIGH] |
## Files Changed
[List with brief descriptions]
## Recommended Action: [MERGE | REQUEST_CHANGES | NEEDS_REVIEW | WAIT]
[Reasoning with evidence]
---
NEVER merge. NEVER comment. NEVER review. Write to file ONLY.
```
</pr_bugfix_prompt>
---
### SUBAGENT_PR_OTHER
<pr_other_prompt>
### PR_OTHER
```
You are a GitHub PR reviewer for the repository {REPO}.
You are reviewing PR #{number} for {REPO}.
ITEM:
- PR #{number}: {title}
- Author: {author}
- Base: {baseRefName}
- Head: {headRefName}
- Draft: {isDraft}
- Mergeable: {mergeable}
- Review Decision: {reviewDecision}
- CI Status: {statusCheckRollup_summary}
- Base: {baseRefName} <- Head: {headRefName}
- Draft: {isDraft} | Mergeable: {mergeable}
- Review: {reviewDecision} | CI: {statusCheckRollup_summary}
- Body: {body}
YOUR JOB:
1. Fetch PR details (READ-ONLY — no checkout):
gh pr view {number} --repo {REPO} --json files,reviews,comments,statusCheckRollup,reviewDecision
2. Read the changed files via `gh api repos/{REPO}/pulls/{number}/files`.
3. Assess the PR and report:
TASK:
1. Fetch PR details (READ-ONLY): gh pr view {number} --repo {REPO} --json files,reviews,comments,statusCheckRollup,reviewDecision
2. Read diff: gh api repos/{REPO}/pulls/{number}/files
3. Write report to {REPORT_DIR}/pr-{number}.md
ACTION: PR_ASSESSED
TYPE: [FEATURE | REFACTOR | DOCS | CHORE | TEST | OTHER]
SUMMARY: [what this PR does in 2-3 sentences]
CI_STATUS: [PASS | FAIL | PENDING]
REVIEW_STATUS: [APPROVED | CHANGES_REQUESTED | PENDING | NONE]
FILES_CHANGED: [count and key files]
RISK_LEVEL: [LOW | MEDIUM | HIGH]
ALIGNMENT: [does this fit the project direction? YES | NO | UNCLEAR]
BLOCKERS: [anything preventing merge]
RECOMMENDED_ACTION: [MERGE | REQUEST_CHANGES | NEEDS_REVIEW | CLOSE | WAIT]
NOTES: [any observations for the maintainer]
REPORT FORMAT (write this as the file content):
ABSOLUTE RULES:
- NEVER run `git checkout`, `git fetch`, `git pull`, or `git switch`. READ-ONLY.
- NEVER checkout the PR branch. Use `gh api` and `gh pr view` only.
- Do NOT merge non-bugfix PRs automatically. Report only.
# PR #{number}: {title}
**Type:** [FEATURE | REFACTOR | DOCS | CHORE | TEST | OTHER]
**Author:** {author}
**Base:** {baseRefName} <- {headRefName} | **Draft:** {isDraft}
## Summary
[2-3 sentences with permalinks to key changes]
## Status
| Check | Status |
|-------|--------|
| CI | [PASS / FAIL / PENDING] |
| Review | [APPROVED / CHANGES_REQUESTED / PENDING / NONE] |
| Mergeable | [YES / NO / CONFLICTED] |
| Risk | [LOW / MEDIUM / HIGH] |
| Alignment | [YES / NO / UNCLEAR] |
## Files Changed
[Count and key files]
## Blockers
[If any]
## Recommended Action: [MERGE | REQUEST_CHANGES | NEEDS_REVIEW | CLOSE | WAIT]
[Reasoning]
---
NEVER merge. NEVER comment. NEVER review. Write to file ONLY.
```
</pr_other_prompt>
---
## Phase 4: Collect & Update
Poll `background_output()` per task. As each completes:
1. Parse report.
2. `task_update(id=task_id, status="completed", description=REPORT_SUMMARY)`
3. Stream to user immediately.
---
## PHASE 4: COLLECT RESULTS & UPDATE TASKS
## Phase 5: Final Summary
<collection>
Poll `background_output()` for each spawned task. As each completes:
1. Parse the subagent's report.
2. Update the corresponding TaskCreate entry:
- `TaskUpdate(id=task_id, status="completed", description=FULL_REPORT_TEXT)`
3. Stream the result to the user immediately — do not wait for all to finish.
Track counters:
- issues_answered (commented + closed)
- bugs_confirmed
- bugs_not_a_bug
- prs_merged
- prs_needs_decision
- features_assessed
</collection>
---
## PHASE 5: FINAL SUMMARY
After all background tasks complete, produce a summary:
Write to `{REPORT_DIR}/SUMMARY.md` AND display to user:
```markdown
# GitHub Triage Report {REPO}
# GitHub Triage Report - {REPO}
**Date:** {date}
**Date:** {date} | **Commit:** {COMMIT_SHA}
**Items Processed:** {total}
**Report Directory:** {REPORT_DIR}
## Issues ({issue_count})
| Action | Count |
|--------|-------|
| Answered & Closed | {issues_answered} |
| Bug Confirmed | {bugs_confirmed} |
| Not A Bug (explained) | {bugs_not_a_bug} |
| Feature Assessed | {features_assessed} |
| Needs Manual Attention | {needs_manual} |
| Category | Count |
|----------|-------|
| Bug Confirmed | {n} |
| Bug Already Fixed | {n} |
| Not A Bug | {n} |
| Needs Investigation | {n} |
| Question Analyzed | {n} |
| Feature Assessed | {n} |
| Other | {n} |
## PRs ({pr_count})
| Action | Count |
|--------|-------|
| Auto-Merged (safe bugfix) | {prs_merged} |
| Needs Human Decision | {prs_needs_decision} |
| Assessed (non-bugfix) | {prs_assessed} |
| Category | Count |
|----------|-------|
| Bugfix Reviewed | {n} |
| Other PR Reviewed | {n} |
## Items Requiring Your Attention
[List each item that needs human decision with its report summary]
## Items Requiring Attention
[Each item: number, title, verdict, 1-line summary, link to report file]
## Report Files
[All generated files with paths]
```
---
## ANTI-PATTERNS
## Anti-Patterns
| Violation | Severity |
|-----------|----------|
| Using any category other than `free` | CRITICAL |
| ANY GitHub mutation (comment/close/merge/review/label/edit) | **CRITICAL** |
| Claim without permalink | **CRITICAL** |
| Using category other than `quick` | CRITICAL |
| Batching multiple items into one task | CRITICAL |
| Using `run_in_background=false` | CRITICAL |
| Subagent running `git checkout` on a PR branch | CRITICAL |
| Posting comment without `[sisyphus-bot]` prefix | CRITICAL |
| Merging a PR that doesn't meet ALL 6 conditions | CRITICAL |
| Closing a bug issue (only comment, never close bugs) | HIGH |
| Guessing at answers without codebase evidence | HIGH |
| Not recording results via TaskCreate/TaskUpdate | HIGH |
---
## QUICK START
When invoked:
1. `TaskCreate` for the overall triage job
2. Fetch all open issues + PRs via gh CLI (paginate if needed)
3. Classify each item (ISSUE_QUESTION, ISSUE_BUG, ISSUE_FEATURE, PR_BUGFIX, etc.)
4. For EACH item: `TaskCreate` + `task(category="free", run_in_background=true, load_skills=[], prompt=...)`
5. Poll `background_output()` — stream results as they arrive
6. `TaskUpdate` each task with the subagent's findings
7. Produce final summary report
| `run_in_background=false` | CRITICAL |
| `git checkout` on PR branch | CRITICAL |
| Guessing without codebase evidence | HIGH |
| Not writing report to `{REPORT_DIR}` | HIGH |
| Using branch name instead of commit SHA in permalink | HIGH |

View File

@@ -0,0 +1,407 @@
---
name: pre-publish-review
description: "Nuclear-grade 16-agent pre-publish release gate. Runs /get-unpublished-changes to detect all changes since last npm release, spawns up to 10 ultrabrain agents for deep per-change analysis, invokes /review-work (5 agents) for holistic review, and 1 oracle for overall release synthesis. Use before EVERY npm publish. Triggers: 'pre-publish review', 'review before publish', 'release review', 'pre-release review', 'ready to publish?', 'can I publish?', 'pre-publish', 'safe to publish', 'publishing review', 'pre-publish check'."
---
# Pre-Publish Review — 16-Agent Release Gate
Three-layer review before publishing to npm. Every layer covers a different angle — together they catch what no single reviewer could.
| Layer | Agents | Type | What They Check |
|-------|--------|------|-----------------|
| Per-Change Deep Dive | up to 10 | ultrabrain | Each logical change group individually — correctness, edge cases, pattern adherence |
| Holistic Review | 5 | review-work | Goal compliance, QA execution, code quality, security, context mining across full changeset |
| Release Synthesis | 1 | oracle | Overall release readiness, version bump, breaking changes, deployment risk |
---
## Phase 0: Detect Unpublished Changes
Run `/get-unpublished-changes` FIRST. This is the single source of truth for what changed.
```
skill(name="get-unpublished-changes")
```
This command automatically:
- Detects published npm version vs local version
- Lists all commits since last release
- Reads actual diffs (not just commit messages) to describe REAL changes
- Groups changes by type (feat/fix/refactor/docs) with scope
- Identifies breaking changes
- Recommends version bump (patch/minor/major)
**Save the full output** — it feeds directly into Phase 1 grouping and all agent prompts.
Then capture raw data needed by agent prompts:
```bash
# Extract versions (already in /get-unpublished-changes output)
PUBLISHED=$(npm view oh-my-opencode version 2>/dev/null || echo "not published")
LOCAL=$(node -p "require('./package.json').version" 2>/dev/null || echo "unknown")
# Raw data for agents (diffs, file lists)
COMMITS=$(git log "v${PUBLISHED}"..HEAD --oneline 2>/dev/null || echo "no commits")
COMMIT_COUNT=$(echo "$COMMITS" | wc -l | tr -d ' ')
DIFF_STAT=$(git diff "v${PUBLISHED}"..HEAD --stat 2>/dev/null || echo "no diff")
CHANGED_FILES=$(git diff --name-only "v${PUBLISHED}"..HEAD 2>/dev/null || echo "none")
FILE_COUNT=$(echo "$CHANGED_FILES" | wc -l | tr -d ' ')
```
If `PUBLISHED` is "not published", this is a first release — use the full git history instead.
---
## Phase 1: Parse Changes into Groups
Use the `/get-unpublished-changes` output as the starting point — it already groups by scope and type.
**Grouping strategy:**
1. Start from the `/get-unpublished-changes` analysis which already categorizes by feat/fix/refactor/docs with scope
2. Further split by **module/area** — changes touching the same module or feature area belong together
3. Target **up to 10 groups**. If fewer than 10 commits, each commit is its own group. If more than 10 logical areas, merge the smallest groups.
4. For each group, extract:
- **Group name**: Short descriptive label (e.g., "agent-model-resolution", "hook-system-refactor")
- **Commits**: List of commit hashes and messages
- **Files**: Changed files in this group
- **Diff**: The relevant portion of the full diff (`git diff v${PUBLISHED}..HEAD -- {group files}`)
---
## Phase 2: Spawn All Agents
Launch ALL agents in a single turn. Every agent uses `run_in_background=true`. No sequential launches.
### Layer 1: Ultrabrain Per-Change Analysis (up to 10)
For each change group, spawn one ultrabrain agent. Each gets only its portion of the diff — not the full changeset.
```
task(
category="ultrabrain",
run_in_background=true,
load_skills=[],
description="Deep analysis: {GROUP_NAME}",
prompt="""
<review_type>PER-CHANGE DEEP ANALYSIS</review_type>
<change_group>{GROUP_NAME}</change_group>
<project>oh-my-opencode (npm package)</project>
<published_version>{PUBLISHED}</published_version>
<target_version>{LOCAL}</target_version>
<commits>
{GROUP_COMMITS — hash and message for each commit in this group}
</commits>
<changed_files>
{GROUP_FILES — files changed in this group}
</changed_files>
<diff>
{GROUP_DIFF — only the diff for this group's files}
</diff>
<file_contents>
{Read and include full content of each changed file in this group}
</file_contents>
You are reviewing a specific subset of changes heading into an npm release. Focus exclusively on THIS change group. Other groups are reviewed by parallel agents.
ANALYSIS CHECKLIST:
1. **Intent Clarity**: What is this change trying to do? Is the intent clear from the code and commit messages? If you have to guess, that's a finding.
2. **Correctness**: Trace through the logic for 3+ scenarios. Does the code actually do what it claims? Off-by-one errors, null handling, async edge cases, resource cleanup.
3. **Breaking Changes**: Does this change alter any public API, config format, CLI behavior, or hook contract? If yes, is it backward compatible? Would existing users be surprised?
4. **Pattern Adherence**: Does the new code follow the established patterns visible in the existing file contents? New patterns where old ones exist = finding.
5. **Edge Cases**: What inputs or conditions would break this? Empty arrays, undefined values, concurrent calls, very large inputs, missing config fields.
6. **Error Handling**: Are errors properly caught and propagated? No empty catch blocks? No swallowed promises?
7. **Type Safety**: Any `as any`, `@ts-ignore`, `@ts-expect-error`? Loose typing where strict is possible?
8. **Test Coverage**: Are the behavioral changes covered by tests? Are the tests meaningful or just coverage padding?
9. **Side Effects**: Could this change break something in a different module? Check imports and exports — who depends on what changed?
10. **Release Risk**: On a scale of SAFE / CAUTION / RISKY — how confident are you this change won't cause issues in production?
OUTPUT FORMAT:
<group_name>{GROUP_NAME}</group_name>
<verdict>PASS or FAIL</verdict>
<risk>SAFE / CAUTION / RISKY</risk>
<summary>2-3 sentence assessment of this change group</summary>
<has_breaking_changes>YES or NO</has_breaking_changes>
<breaking_change_details>If YES, describe what breaks and for whom</breaking_change_details>
<findings>
For each finding:
- [CRITICAL/MAJOR/MINOR] Category: Description
- File: path (line range)
- Evidence: specific code reference
- Suggestion: how to fix
</findings>
<blocking_issues>Issues that MUST be fixed before publish. Empty if PASS.</blocking_issues>
""")
```
### Layer 2: Holistic Review via /review-work (5 agents)
Spawn a sub-agent that loads the `/review-work` skill. The review-work skill internally launches 5 parallel agents: Oracle (goal verification), unspecified-high (QA execution), Oracle (code quality), Oracle (security), unspecified-high (context mining). All 5 must pass for the review to pass.
```
task(
category="unspecified-high",
run_in_background=true,
load_skills=["review-work"],
description="Run /review-work on all unpublished changes",
prompt="""
Run /review-work on the unpublished changes between v{PUBLISHED} and HEAD.
GOAL: Review all changes heading into npm publish of oh-my-opencode. These changes span {COMMIT_COUNT} commits across {FILE_COUNT} files.
CONSTRAINTS:
- This is a plugin published to npm — public API stability matters
- TypeScript strict mode, Bun runtime
- No `as any`, `@ts-ignore`, `@ts-expect-error`
- Factory pattern (createXXX) for tools, hooks, agents
- kebab-case files, barrel exports, no catch-all files
BACKGROUND: Pre-publish review of oh-my-opencode, an OpenCode plugin with 1268 TypeScript files, 160k LOC. Changes since v{PUBLISHED} are about to be published.
The diff base is: git diff v{PUBLISHED}..HEAD
Follow the /review-work skill flow exactly — launch all 5 review agents and collect results. Do NOT skip any of the 5 agents.
""")
```
### Layer 3: Oracle Release Synthesis (1 agent)
The oracle gets the full picture — all commits, full diff stat, and changed file list. It provides the final release readiness assessment.
```
task(
subagent_type="oracle",
run_in_background=true,
load_skills=[],
description="Oracle: overall release synthesis and version bump recommendation",
prompt="""
<review_type>RELEASE SYNTHESIS — OVERALL ASSESSMENT</review_type>
<project>oh-my-opencode (npm package)</project>
<published_version>{PUBLISHED}</published_version>
<local_version>{LOCAL}</local_version>
<all_commits>
{ALL COMMITS since published version — hash, message, author, date}
</all_commits>
<diff_stat>
{DIFF_STAT — files changed, insertions, deletions}
</diff_stat>
<changed_files>
{CHANGED_FILES — full list of modified file paths}
</changed_files>
<full_diff>
{FULL_DIFF — the complete git diff between published version and HEAD}
</full_diff>
<file_contents>
{Read and include full content of KEY changed files — focus on public API surfaces, config schemas, agent definitions, hook registrations, tool registrations}
</file_contents>
You are the final gate before an npm publish. 10 ultrabrain agents are reviewing individual changes and 5 review-work agents are doing holistic review. Your job is the bird's-eye view that those focused reviews might miss.
SYNTHESIS CHECKLIST:
1. **Release Coherence**: Do these changes tell a coherent story? Or is this a grab-bag of unrelated changes that should be split into multiple releases?
2. **Version Bump**: Based on semver:
- PATCH: Bug fixes only, no behavior changes
- MINOR: New features, backward-compatible changes
- MAJOR: Breaking changes to public API, config format, or behavior
Recommend the correct bump with specific justification.
3. **Breaking Changes Audit**: Exhaustively list every change that could break existing users. Check:
- Config schema changes (new required fields, removed fields, renamed fields)
- Agent behavior changes (different prompts, different model routing)
- Hook contract changes (new parameters, removed hooks, renamed hooks)
- Tool interface changes (new required params, different return types)
- CLI changes (new commands, changed flags, different output)
- Skill format changes (SKILL.md schema changes)
4. **Migration Requirements**: If there are breaking changes, what migration steps do users need? Is there auto-migration in place?
5. **Dependency Changes**: New dependencies added? Dependencies removed? Version bumps? Any supply chain risk?
6. **Changelog Draft**: Write a draft changelog entry grouped by:
- feat: New features
- fix: Bug fixes
- refactor: Internal changes (no user impact)
- breaking: Breaking changes with migration instructions
- docs: Documentation changes
7. **Deployment Risk Assessment**:
- SAFE: Routine changes, well-tested, low risk
- CAUTION: Significant changes but manageable risk
- RISKY: Large surface area changes, insufficient testing, or breaking changes without migration
- BLOCK: Critical issues found, do NOT publish
8. **Post-Publish Monitoring**: What should be monitored after publish? Error rates, specific features, user feedback channels.
OUTPUT FORMAT:
<verdict>SAFE / CAUTION / RISKY / BLOCK</verdict>
<recommended_version_bump>PATCH / MINOR / MAJOR</recommended_version_bump>
<version_bump_justification>Why this bump level</version_bump_justification>
<release_coherence>Assessment of whether changes belong in one release</release_coherence>
<breaking_changes>
Exhaustive list, or "None" if none.
For each:
- What changed
- Who is affected
- Migration steps
</breaking_changes>
<changelog_draft>
Ready-to-use changelog entry
</changelog_draft>
<deployment_risk>
Overall risk assessment with specific concerns
</deployment_risk>
<monitoring_recommendations>
What to watch after publish
</monitoring_recommendations>
<blocking_issues>Issues that MUST be fixed before publish. Empty if SAFE.</blocking_issues>
""")
```
---
## Phase 3: Collect Results
As agents complete (system notifications), collect via `background_output(task_id="...")`.
Track completion in a table:
| # | Agent | Type | Status | Verdict |
|---|-------|------|--------|---------|
| 1-10 | Ultrabrain: {group_name} | ultrabrain | pending | — |
| 11 | Review-Work Coordinator | unspecified-high | pending | — |
| 12 | Release Synthesis Oracle | oracle | pending | — |
Do NOT deliver the final report until ALL agents have completed.
---
## Phase 4: Final Verdict
<verdict_logic>
**BLOCK** if:
- Oracle verdict is BLOCK
- Any ultrabrain found CRITICAL blocking issues
- Review-work failed on any MAIN agent
**RISKY** if:
- Oracle verdict is RISKY
- Multiple ultrabrains returned CAUTION or FAIL
- Review-work passed but with significant findings
**CAUTION** if:
- Oracle verdict is CAUTION
- A few ultrabrains flagged minor issues
- Review-work passed cleanly
**SAFE** if:
- Oracle verdict is SAFE
- All ultrabrains passed
- Review-work passed
</verdict_logic>
Compile the final report:
```markdown
# Pre-Publish Review — oh-my-opencode
## Release: v{PUBLISHED} -> v{LOCAL}
**Commits:** {COMMIT_COUNT} | **Files Changed:** {FILE_COUNT} | **Agents:** {AGENT_COUNT}
---
## Overall Verdict: SAFE / CAUTION / RISKY / BLOCK
## Recommended Version Bump: PATCH / MINOR / MAJOR
{Justification from Oracle}
---
## Per-Change Analysis (Ultrabrains)
| # | Change Group | Verdict | Risk | Breaking? | Blocking Issues |
|---|-------------|---------|------|-----------|-----------------|
| 1 | {name} | PASS/FAIL | SAFE/CAUTION/RISKY | YES/NO | {count or "none"} |
| ... | ... | ... | ... | ... | ... |
### Blocking Issues from Per-Change Analysis
{Aggregated from all ultrabrains — deduplicated}
---
## Holistic Review (Review-Work)
| # | Review Area | Verdict | Confidence |
|---|------------|---------|------------|
| 1 | Goal & Constraint Verification | PASS/FAIL | HIGH/MED/LOW |
| 2 | QA Execution | PASS/FAIL | HIGH/MED/LOW |
| 3 | Code Quality | PASS/FAIL | HIGH/MED/LOW |
| 4 | Security | PASS/FAIL | Severity |
| 5 | Context Mining | PASS/FAIL | HIGH/MED/LOW |
### Blocking Issues from Holistic Review
{Aggregated from review-work}
---
## Release Synthesis (Oracle)
### Breaking Changes
{From Oracle — exhaustive list or "None"}
### Changelog Draft
{From Oracle — ready to use}
### Deployment Risk
{From Oracle — specific concerns}
### Post-Publish Monitoring
{From Oracle — what to watch}
---
## All Blocking Issues (Prioritized)
{Deduplicated, merged from all three layers, ordered by severity}
## Recommendations
{If BLOCK/RISKY: exactly what to fix, in priority order}
{If CAUTION: suggestions worth considering before publish}
{If SAFE: non-blocking improvements for future}
```
---
## Anti-Patterns
| Violation | Severity |
|-----------|----------|
| Publishing without waiting for all agents | **CRITICAL** |
| Spawning ultrabrains sequentially instead of in parallel | CRITICAL |
| Using `run_in_background=false` for any agent | CRITICAL |
| Skipping the Oracle synthesis | HIGH |
| Not reading file contents for Oracle (it cannot read files) | HIGH |
| Grouping all changes into 1-2 ultrabrains instead of distributing | HIGH |
| Delivering verdict before all agents complete | HIGH |
| Not including diff in ultrabrain prompts | MAJOR |

View File

@@ -0,0 +1,76 @@
{
"skill_name": "work-with-pr",
"evals": [
{
"id": 1,
"prompt": "I need to add a `max_background_agents` config option to oh-my-opencode that limits how many background agents can run simultaneously. It should be in the plugin config schema with a default of 5. Add validation and make sure the background manager respects it. Create a PR for this.",
"expected_output": "Agent creates worktree, implements config option with schema validation, adds tests, creates PR, iterates through verification gates until merged",
"files": [],
"assertions": [
{"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory (not main working directory)"},
{"id": "branch-from-dev", "text": "Branch is created from origin/dev (not master/main)"},
{"id": "atomic-commits", "text": "Plan specifies multiple atomic commits for multi-file changes"},
{"id": "local-validation", "text": "Runs bun run typecheck, bun test, and bun run build before pushing"},
{"id": "pr-targets-dev", "text": "PR is created targeting dev branch (not master)"},
{"id": "three-gates", "text": "Verification loop includes all 3 gates: CI, review-work, and Cubic"},
{"id": "gate-ordering", "text": "Gates are checked in order: CI first, then review-work, then Cubic"},
{"id": "cubic-check-method", "text": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews for 'No issues found'"},
{"id": "worktree-cleanup", "text": "Plan includes worktree cleanup after merge"},
{"id": "real-file-references", "text": "Code changes reference actual files in the codebase (config schema, background manager)"}
]
},
{
"id": 2,
"prompt": "The atlas hook has a bug where it crashes when boulder.json is missing the worktree_path field. Fix it and land the fix as a PR. Make sure CI passes.",
"expected_output": "Agent creates worktree for the fix branch, adds null check and test for missing worktree_path, creates PR, iterates verification loop",
"files": [],
"assertions": [
{"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
{"id": "minimal-fix", "text": "Fix is minimal — adds null check, doesn't refactor unrelated code"},
{"id": "test-added", "text": "Test case added for the missing worktree_path scenario"},
{"id": "three-gates", "text": "Verification loop includes all 3 gates: CI, review-work, Cubic"},
{"id": "real-atlas-files", "text": "References actual atlas hook files in src/hooks/atlas/"},
{"id": "fix-branch-naming", "text": "Branch name follows fix/ prefix convention"}
]
},
{
"id": 3,
"prompt": "Refactor src/tools/delegate-task/constants.ts to split DEFAULT_CATEGORIES and CATEGORY_MODEL_REQUIREMENTS into separate files. Keep backward compatibility with the barrel export. Make a PR.",
"expected_output": "Agent creates worktree, splits file with atomic commits, ensures imports still work via barrel, creates PR, runs through all gates",
"files": [],
"assertions": [
{"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
{"id": "multiple-atomic-commits", "text": "Uses 2+ commits for the multi-file refactor"},
{"id": "barrel-export", "text": "Maintains backward compatibility via barrel re-export in constants.ts or index.ts"},
{"id": "three-gates", "text": "Verification loop includes all 3 gates"},
{"id": "real-constants-file", "text": "References actual src/tools/delegate-task/constants.ts file and its exports"}
]
},
{
"id": 4,
"prompt": "implement issue #100 - we need to add a new built-in MCP for arxiv paper search. just the basic search endpoint, nothing fancy. pr it",
"expected_output": "Agent creates worktree, implements arxiv MCP following existing MCP patterns (websearch, context7, grep_app), creates PR with proper template, verification loop runs",
"files": [],
"assertions": [
{"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
{"id": "follows-mcp-pattern", "text": "New MCP follows existing pattern from src/mcp/ (websearch, context7, grep_app)"},
{"id": "three-gates", "text": "Verification loop includes all 3 gates"},
{"id": "pr-targets-dev", "text": "PR targets dev branch"},
{"id": "local-validation", "text": "Runs local checks before pushing"}
]
},
{
"id": 5,
"prompt": "The comment-checker hook is too aggressive - it's flagging legitimate comments that happen to contain 'Note:' as AI slop. Relax the regex pattern and add test cases for the false positives. Work on a separate branch and make a PR.",
"expected_output": "Agent creates worktree, fixes regex, adds specific test cases for false positive scenarios, creates PR, all three gates pass",
"files": [],
"assertions": [
{"id": "worktree-isolation", "text": "Plan uses git worktree in a sibling directory"},
{"id": "real-comment-checker-files", "text": "References actual comment-checker hook files in the codebase"},
{"id": "regression-tests", "text": "Adds test cases specifically for 'Note:' false positive scenarios"},
{"id": "three-gates", "text": "Verification loop includes all 3 gates"},
{"id": "minimal-change", "text": "Only modifies regex and adds tests — no unrelated changes"}
]
}
]
}

View File

@@ -0,0 +1,138 @@
{
"skill_name": "work-with-pr",
"iteration": 1,
"summary": {
"with_skill": {
"pass_rate": 0.968,
"mean_duration_seconds": 340.2,
"stddev_duration_seconds": 169.3
},
"without_skill": {
"pass_rate": 0.516,
"mean_duration_seconds": 303.0,
"stddev_duration_seconds": 77.8
},
"delta": {
"pass_rate": 0.452,
"mean_duration_seconds": 37.2,
"stddev_duration_seconds": 91.5
}
},
"evals": [
{
"eval_name": "happy-path-feature-config-option",
"with_skill": {
"pass_rate": 1.0,
"passed": 10,
"total": 10,
"duration_seconds": 292,
"failed_assertions": []
},
"without_skill": {
"pass_rate": 0.4,
"passed": 4,
"total": 10,
"duration_seconds": 365,
"failed_assertions": [
{"assertion": "Plan uses git worktree in a sibling directory", "reason": "Uses git checkout -b, no worktree isolation"},
{"assertion": "Plan specifies multiple atomic commits for multi-file changes", "reason": "Steps listed sequentially but no atomic commit strategy mentioned"},
{"assertion": "Verification loop includes all 3 gates: CI, review-work, and Cubic", "reason": "Only mentions CI pipeline in step 6. No review-work or Cubic."},
{"assertion": "Gates are checked in order: CI first, then review-work, then Cubic", "reason": "No gate ordering - only CI mentioned"},
{"assertion": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews", "reason": "No mention of Cubic at all"},
{"assertion": "Plan includes worktree cleanup after merge", "reason": "No worktree used, no cleanup needed"}
]
}
},
{
"eval_name": "bugfix-atlas-null-check",
"with_skill": {
"pass_rate": 1.0,
"passed": 6,
"total": 6,
"duration_seconds": 506,
"failed_assertions": []
},
"without_skill": {
"pass_rate": 0.667,
"passed": 4,
"total": 6,
"duration_seconds": 325,
"failed_assertions": [
{"assertion": "Plan uses git worktree in a sibling directory", "reason": "No worktree. Steps go directly to creating branch and modifying files."},
{"assertion": "Verification loop includes all 3 gates", "reason": "Only mentions CI pipeline (step 5). No review-work or Cubic."}
]
}
},
{
"eval_name": "refactor-split-constants",
"with_skill": {
"pass_rate": 1.0,
"passed": 5,
"total": 5,
"duration_seconds": 181,
"failed_assertions": []
},
"without_skill": {
"pass_rate": 0.4,
"passed": 2,
"total": 5,
"duration_seconds": 229,
"failed_assertions": [
{"assertion": "Plan uses git worktree in a sibling directory", "reason": "git checkout -b only, no worktree"},
{"assertion": "Uses 2+ commits for the multi-file refactor", "reason": "Single atomic commit: 'refactor: split delegate-task constants and category model requirements'"},
{"assertion": "Verification loop includes all 3 gates", "reason": "Only mentions typecheck/test/build. No review-work or Cubic."}
]
}
},
{
"eval_name": "new-mcp-arxiv-casual",
"with_skill": {
"pass_rate": 1.0,
"passed": 5,
"total": 5,
"duration_seconds": 152,
"failed_assertions": []
},
"without_skill": {
"pass_rate": 0.6,
"passed": 3,
"total": 5,
"duration_seconds": 197,
"failed_assertions": [
{"assertion": "Verification loop includes all 3 gates", "reason": "Only mentions bun test/typecheck/build. No review-work or Cubic."}
]
}
},
{
"eval_name": "regex-fix-false-positive",
"with_skill": {
"pass_rate": 0.8,
"passed": 4,
"total": 5,
"duration_seconds": 570,
"failed_assertions": [
{"assertion": "Only modifies regex and adds tests — no unrelated changes", "reason": "Also proposes config schema change (exclude_patterns) and Go binary update — goes beyond minimal fix"}
]
},
"without_skill": {
"pass_rate": 0.6,
"passed": 3,
"total": 5,
"duration_seconds": 399,
"failed_assertions": [
{"assertion": "Plan uses git worktree in a sibling directory", "reason": "git checkout -b, no worktree"},
{"assertion": "Verification loop includes all 3 gates", "reason": "Only bun test and typecheck. No review-work or Cubic."}
]
}
}
],
"analyst_observations": [
"Three-gates assertion (CI + review-work + Cubic) is the strongest discriminator: 5/5 with-skill vs 0/5 without-skill. Without the skill, agents never know about Cubic or review-work gates.",
"Worktree isolation is nearly as discriminating (5/5 vs 1/5). One without-skill run (eval-4) independently chose worktree, suggesting some agents already know worktree patterns, but the skill makes it consistent.",
"The skill's only failure (eval-5 minimal-change) reveals a potential over-engineering tendency: the skill-guided agent proposed config schema changes and Go binary updates for what should have been a minimal regex fix. Consider adding explicit guidance for fix-type tasks to stay minimal.",
"Duration tradeoff: with-skill is 12% slower on average (340s vs 303s), driven mainly by eval-2 (bugfix) and eval-5 (regex fix) where the skill's thorough verification planning adds overhead. For eval-1 and eval-3-4, with-skill was actually faster.",
"Without-skill duration has lower variance (stddev 78s vs 169s), suggesting the skill introduces more variable execution paths depending on task complexity.",
"Non-discriminating assertions: 'References actual files', 'PR targets dev', 'Runs local checks' — these pass regardless of skill. They validate baseline agent competence, not skill value. Consider removing or downweighting in future iterations.",
"Atomic commits assertion discriminates moderately (2/2 with-skill tested vs 0/2 without-skill tested). Without the skill, agents default to single commits even for multi-file refactors."
]
}

View File

@@ -0,0 +1,42 @@
# Benchmark: work-with-pr (Iteration 1)
## Summary
| Metric | With Skill | Without Skill | Delta |
|--------|-----------|---------------|-------|
| Pass Rate | 96.8% (30/31) | 51.6% (16/31) | +45.2% |
| Mean Duration | 340.2s | 303.0s | +37.2s |
| Duration Stddev | 169.3s | 77.8s | +91.5s |
## Per-Eval Breakdown
| Eval | With Skill | Without Skill | Delta |
|------|-----------|---------------|-------|
| happy-path-feature-config-option | 100% (10/10) | 40% (4/10) | +60% |
| bugfix-atlas-null-check | 100% (6/6) | 67% (4/6) | +33% |
| refactor-split-constants | 100% (5/5) | 40% (2/5) | +60% |
| new-mcp-arxiv-casual | 100% (5/5) | 60% (3/5) | +40% |
| regex-fix-false-positive | 80% (4/5) | 60% (3/5) | +20% |
## Key Discriminators
- **three-gates** (CI + review-work + Cubic): 5/5 vs 0/5 — strongest signal
- **worktree-isolation**: 5/5 vs 1/5
- **atomic-commits**: 2/2 vs 0/2
- **cubic-check-method**: 1/1 vs 0/1
## Non-Discriminating Assertions
- References actual files: passes in both conditions
- PR targets dev: passes in both conditions
- Runs local checks before pushing: passes in both conditions
## Only With-Skill Failure
- **eval-5 minimal-change**: Skill-guided agent proposed config schema changes and Go binary update for a minimal regex fix. The skill may encourage over-engineering in fix scenarios.
## Analyst Notes
- The skill adds most value for procedural knowledge (verification gates, worktree workflow) that agents cannot infer from codebase alone.
- Duration cost is modest (+12%) and acceptable given the +45% pass rate improvement.
- Consider adding explicit "fix-type tasks: stay minimal" guidance in iteration 2.

View File

@@ -0,0 +1,57 @@
{
"eval_id": 1,
"eval_name": "happy-path-feature-config-option",
"prompt": "I need to add a `max_background_agents` config option to oh-my-opencode that limits how many background agents can run simultaneously. It should be in the plugin config schema with a default of 5. Add validation and make sure the background manager respects it. Create a PR for this.",
"assertions": [
{
"id": "worktree-isolation",
"text": "Plan uses git worktree in a sibling directory (not main working directory)",
"type": "manual"
},
{
"id": "branch-from-dev",
"text": "Branch is created from origin/dev (not master/main)",
"type": "manual"
},
{
"id": "atomic-commits",
"text": "Plan specifies multiple atomic commits for multi-file changes",
"type": "manual"
},
{
"id": "local-validation",
"text": "Runs bun run typecheck, bun test, and bun run build before pushing",
"type": "manual"
},
{
"id": "pr-targets-dev",
"text": "PR is created targeting dev branch (not master)",
"type": "manual"
},
{
"id": "three-gates",
"text": "Verification loop includes all 3 gates: CI, review-work, and Cubic",
"type": "manual"
},
{
"id": "gate-ordering",
"text": "Gates are checked in order: CI first, then review-work, then Cubic",
"type": "manual"
},
{
"id": "cubic-check-method",
"text": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews for 'No issues found'",
"type": "manual"
},
{
"id": "worktree-cleanup",
"text": "Plan includes worktree cleanup after merge",
"type": "manual"
},
{
"id": "real-file-references",
"text": "Code changes reference actual files in the codebase (config schema, background manager)",
"type": "manual"
}
]
}

View File

@@ -0,0 +1,15 @@
{
"run_id": "eval-1-with_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "Uses ../omo-wt/feat-max-background-agents"},
{"text": "Branch is created from origin/dev", "passed": true, "evidence": "git checkout dev && git pull origin dev, then branch"},
{"text": "Plan specifies multiple atomic commits for multi-file changes", "passed": true, "evidence": "2 commits: schema+tests, then concurrency+manager"},
{"text": "Runs bun run typecheck, bun test, and bun run build before pushing", "passed": true, "evidence": "Explicit pre-push section with all 3 commands"},
{"text": "PR is created targeting dev branch", "passed": true, "evidence": "--base dev in gh pr create"},
{"text": "Verification loop includes all 3 gates: CI, review-work, and Cubic", "passed": true, "evidence": "Gate A (CI), Gate B (review-work 5 agents), Gate C (Cubic)"},
{"text": "Gates are checked in order: CI first, then review-work, then Cubic", "passed": true, "evidence": "Explicit ordering in verify loop pseudocode"},
{"text": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews", "passed": true, "evidence": "Mentions cubic-dev-ai[bot] and 'No issues found' signal"},
{"text": "Plan includes worktree cleanup after merge", "passed": true, "evidence": "Phase 4: git worktree remove ../omo-wt/feat-max-background-agents"},
{"text": "Code changes reference actual files in the codebase", "passed": true, "evidence": "References src/config/schema/background-task.ts, src/features/background-agent/concurrency.ts, manager.ts"}
]
}

View File

@@ -0,0 +1,454 @@
# Code Changes: `max_background_agents` Config Option
## 1. `src/config/schema/background-task.ts` — Add schema field
```typescript
import { z } from "zod"
export const BackgroundTaskConfigSchema = z.object({
defaultConcurrency: z.number().min(1).optional(),
providerConcurrency: z.record(z.string(), z.number().min(0)).optional(),
modelConcurrency: z.record(z.string(), z.number().min(0)).optional(),
maxDepth: z.number().int().min(1).optional(),
maxDescendants: z.number().int().min(1).optional(),
/** Maximum number of background agents that can run simultaneously across all models/providers (default: 5, minimum: 1) */
maxBackgroundAgents: z.number().int().min(1).optional(),
/** Stale timeout in milliseconds - interrupt tasks with no activity for this duration (default: 180000 = 3 minutes, minimum: 60000 = 1 minute) */
staleTimeoutMs: z.number().min(60000).optional(),
/** Timeout for tasks that never received any progress update, falling back to startedAt (default: 1800000 = 30 minutes, minimum: 60000 = 1 minute) */
messageStalenessTimeoutMs: z.number().min(60000).optional(),
syncPollTimeoutMs: z.number().min(60000).optional(),
})
export type BackgroundTaskConfig = z.infer<typeof BackgroundTaskConfigSchema>
```
**Rationale:** Follows exact same pattern as `maxDepth` and `maxDescendants``z.number().int().min(1).optional()`. The field is optional; runtime default of 5 is applied in `ConcurrencyManager`. No barrel export changes needed since `src/config/schema.ts` already does `export * from "./schema/background-task"` and the type is inferred.
---
## 2. `src/config/schema/background-task.test.ts` — Add validation tests
Append after the existing `syncPollTimeoutMs` describe block (before the closing `})`):
```typescript
describe("maxBackgroundAgents", () => {
describe("#given valid maxBackgroundAgents (10)", () => {
test("#when parsed #then returns correct value", () => {
const result = BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 10 })
expect(result.maxBackgroundAgents).toBe(10)
})
})
describe("#given maxBackgroundAgents of 1 (minimum)", () => {
test("#when parsed #then returns correct value", () => {
const result = BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 1 })
expect(result.maxBackgroundAgents).toBe(1)
})
})
describe("#given maxBackgroundAgents below minimum (0)", () => {
test("#when parsed #then throws ZodError", () => {
let thrownError: unknown
try {
BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 0 })
} catch (error) {
thrownError = error
}
expect(thrownError).toBeInstanceOf(ZodError)
})
})
describe("#given maxBackgroundAgents not provided", () => {
test("#when parsed #then field is undefined", () => {
const result = BackgroundTaskConfigSchema.parse({})
expect(result.maxBackgroundAgents).toBeUndefined()
})
})
describe('#given maxBackgroundAgents is non-integer (2.5)', () => {
test("#when parsed #then throws ZodError", () => {
let thrownError: unknown
try {
BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 2.5 })
} catch (error) {
thrownError = error
}
expect(thrownError).toBeInstanceOf(ZodError)
})
})
})
```
**Rationale:** Follows exact test pattern from `maxDepth`, `maxDescendants`, and `syncPollTimeoutMs` tests. Uses `#given`/`#when`/`#then` nested describe style. Tests valid, minimum boundary, below minimum, not provided, and non-integer cases.
---
## 3. `src/features/background-agent/concurrency.ts` — Add global agent limit
```typescript
import type { BackgroundTaskConfig } from "../../config/schema"
const DEFAULT_MAX_BACKGROUND_AGENTS = 5
/**
* Queue entry with settled-flag pattern to prevent double-resolution.
*
* The settled flag ensures that cancelWaiters() doesn't reject
* an entry that was already resolved by release().
*/
interface QueueEntry {
resolve: () => void
rawReject: (error: Error) => void
settled: boolean
}
export class ConcurrencyManager {
private config?: BackgroundTaskConfig
private counts: Map<string, number> = new Map()
private queues: Map<string, QueueEntry[]> = new Map()
private globalRunningCount = 0
constructor(config?: BackgroundTaskConfig) {
this.config = config
}
getMaxBackgroundAgents(): number {
return this.config?.maxBackgroundAgents ?? DEFAULT_MAX_BACKGROUND_AGENTS
}
getGlobalRunningCount(): number {
return this.globalRunningCount
}
canSpawnGlobally(): boolean {
return this.globalRunningCount < this.getMaxBackgroundAgents()
}
acquireGlobal(): void {
this.globalRunningCount++
}
releaseGlobal(): void {
if (this.globalRunningCount > 0) {
this.globalRunningCount--
}
}
getConcurrencyLimit(model: string): number {
// ... existing implementation unchanged ...
}
async acquire(model: string): Promise<void> {
// ... existing implementation unchanged ...
}
release(model: string): void {
// ... existing implementation unchanged ...
}
cancelWaiters(model: string): void {
// ... existing implementation unchanged ...
}
clear(): void {
for (const [model] of this.queues) {
this.cancelWaiters(model)
}
this.counts.clear()
this.queues.clear()
this.globalRunningCount = 0
}
getCount(model: string): number {
return this.counts.get(model) ?? 0
}
getQueueLength(model: string): number {
return this.queues.get(model)?.length ?? 0
}
}
```
**Key changes:**
- Add `DEFAULT_MAX_BACKGROUND_AGENTS = 5` constant
- Add `globalRunningCount` private field
- Add `getMaxBackgroundAgents()`, `getGlobalRunningCount()`, `canSpawnGlobally()`, `acquireGlobal()`, `releaseGlobal()` methods
- `clear()` resets `globalRunningCount` to 0
- All existing per-model methods remain unchanged
---
## 4. `src/features/background-agent/concurrency.test.ts` — Add global limit tests
Append new describe block:
```typescript
describe("ConcurrencyManager global background agent limit", () => {
test("should default max background agents to 5 when no config", () => {
// given
const manager = new ConcurrencyManager()
// when
const max = manager.getMaxBackgroundAgents()
// then
expect(max).toBe(5)
})
test("should use configured maxBackgroundAgents", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 10 }
const manager = new ConcurrencyManager(config)
// when
const max = manager.getMaxBackgroundAgents()
// then
expect(max).toBe(10)
})
test("should allow spawning when under global limit", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 2 }
const manager = new ConcurrencyManager(config)
// when
manager.acquireGlobal()
// then
expect(manager.canSpawnGlobally()).toBe(true)
expect(manager.getGlobalRunningCount()).toBe(1)
})
test("should block spawning when at global limit", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 2 }
const manager = new ConcurrencyManager(config)
// when
manager.acquireGlobal()
manager.acquireGlobal()
// then
expect(manager.canSpawnGlobally()).toBe(false)
expect(manager.getGlobalRunningCount()).toBe(2)
})
test("should allow spawning again after release", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 1 }
const manager = new ConcurrencyManager(config)
manager.acquireGlobal()
// when
manager.releaseGlobal()
// then
expect(manager.canSpawnGlobally()).toBe(true)
expect(manager.getGlobalRunningCount()).toBe(0)
})
test("should not go below zero on extra release", () => {
// given
const manager = new ConcurrencyManager()
// when
manager.releaseGlobal()
// then
expect(manager.getGlobalRunningCount()).toBe(0)
})
test("should reset global count on clear", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 5 }
const manager = new ConcurrencyManager(config)
manager.acquireGlobal()
manager.acquireGlobal()
manager.acquireGlobal()
// when
manager.clear()
// then
expect(manager.getGlobalRunningCount()).toBe(0)
})
})
```
---
## 5. `src/features/background-agent/manager.ts` — Enforce global limit
### In `launch()` method — add check before task creation (after `reserveSubagentSpawn`):
```typescript
async launch(input: LaunchInput): Promise<BackgroundTask> {
// ... existing logging ...
if (!input.agent || input.agent.trim() === "") {
throw new Error("Agent parameter is required")
}
// Check global background agent limit before spawn guard
if (!this.concurrencyManager.canSpawnGlobally()) {
const max = this.concurrencyManager.getMaxBackgroundAgents()
const current = this.concurrencyManager.getGlobalRunningCount()
throw new Error(
`Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents.`
)
}
const spawnReservation = await this.reserveSubagentSpawn(input.parentSessionID)
try {
// ... existing code ...
// After task creation, before queueing:
this.concurrencyManager.acquireGlobal()
// ... rest of existing code ...
} catch (error) {
spawnReservation.rollback()
throw error
}
}
```
### In `trackTask()` method — add global check:
```typescript
async trackTask(input: { ... }): Promise<BackgroundTask> {
const existingTask = this.tasks.get(input.taskId)
if (existingTask) {
// ... existing re-registration logic unchanged ...
return existingTask
}
// Check global limit for new external tasks
if (!this.concurrencyManager.canSpawnGlobally()) {
const max = this.concurrencyManager.getMaxBackgroundAgents()
const current = this.concurrencyManager.getGlobalRunningCount()
throw new Error(
`Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents.`
)
}
// ... existing task creation ...
this.concurrencyManager.acquireGlobal()
// ... rest unchanged ...
}
```
### In `tryCompleteTask()` — release global slot:
```typescript
private async tryCompleteTask(task: BackgroundTask, source: string): Promise<boolean> {
if (task.status !== "running") {
// ... existing guard ...
return false
}
task.status = "completed"
task.completedAt = new Date()
// ... existing history record ...
removeTaskToastTracking(task.id)
// Release per-model concurrency
if (task.concurrencyKey) {
this.concurrencyManager.release(task.concurrencyKey)
task.concurrencyKey = undefined
}
// Release global slot
this.concurrencyManager.releaseGlobal()
// ... rest unchanged ...
}
```
### In `cancelTask()` — release global slot:
```typescript
async cancelTask(taskId: string, options?: { ... }): Promise<boolean> {
// ... existing code up to concurrency release ...
if (task.concurrencyKey) {
this.concurrencyManager.release(task.concurrencyKey)
task.concurrencyKey = undefined
}
// Release global slot (only for running tasks, pending never acquired)
if (task.status !== "pending") {
this.concurrencyManager.releaseGlobal()
}
// ... rest unchanged ...
}
```
### In `handleEvent()` session.error handler — release global slot:
```typescript
if (event.type === "session.error") {
// ... existing error handling ...
task.status = "error"
// ...
if (task.concurrencyKey) {
this.concurrencyManager.release(task.concurrencyKey)
task.concurrencyKey = undefined
}
// Release global slot
this.concurrencyManager.releaseGlobal()
// ... rest unchanged ...
}
```
### In prompt error handler inside `startTask()` — release global slot:
```typescript
promptWithModelSuggestionRetry(this.client, { ... }).catch((error) => {
// ... existing error handling ...
if (existingTask) {
existingTask.status = "interrupt"
// ...
if (existingTask.concurrencyKey) {
this.concurrencyManager.release(existingTask.concurrencyKey)
existingTask.concurrencyKey = undefined
}
// Release global slot
this.concurrencyManager.releaseGlobal()
// ... rest unchanged ...
}
})
```
---
## Summary of Changes
| File | Lines Added | Lines Modified |
|------|-------------|----------------|
| `src/config/schema/background-task.ts` | 2 | 0 |
| `src/config/schema/background-task.test.ts` | ~50 | 0 |
| `src/features/background-agent/concurrency.ts` | ~25 | 1 (`clear()`) |
| `src/features/background-agent/concurrency.test.ts` | ~70 | 0 |
| `src/features/background-agent/manager.ts` | ~20 | 0 |
Total: ~167 lines added, 1 line modified across 5 files.

View File

@@ -0,0 +1,136 @@
# Execution Plan: `max_background_agents` Config Option
## Phase 0: Setup — Branch + Worktree
1. **Create branch** from `dev`:
```bash
git checkout dev && git pull origin dev
git checkout -b feat/max-background-agents
```
2. **Create worktree** in sibling directory:
```bash
mkdir -p ../omo-wt
git worktree add ../omo-wt/feat-max-background-agents feat/max-background-agents
```
3. **All subsequent work** happens in `../omo-wt/feat-max-background-agents/`, never in the main worktree.
---
## Phase 1: Implement — Atomic Commits
### Commit 1: Add `max_background_agents` to config schema
**Files changed:**
- `src/config/schema/background-task.ts` — Add `maxBackgroundAgents` field to `BackgroundTaskConfigSchema`
- `src/config/schema/background-task.test.ts` — Add validation tests for the new field
**What:**
- Add `maxBackgroundAgents: z.number().int().min(1).optional()` to `BackgroundTaskConfigSchema`
- Default value handled at runtime (5), not in schema (all schema fields are optional per convention)
- Add given/when/then tests: valid value, below minimum, not provided, non-number
### Commit 2: Enforce limit in BackgroundManager + ConcurrencyManager
**Files changed:**
- `src/features/background-agent/concurrency.ts` — Add global agent count tracking + `getGlobalRunningCount()` + `canSpawnGlobally()`
- `src/features/background-agent/concurrency.test.ts` — Tests for global limit enforcement
- `src/features/background-agent/manager.ts` — Check global limit before `launch()` and `trackTask()`
**What:**
- `ConcurrencyManager` already manages per-model concurrency. Add a separate global counter:
- `private globalRunningCount: number = 0`
- `private maxBackgroundAgents: number` (from config, default 5)
- `acquireGlobal()` / `releaseGlobal()` methods
- `getGlobalRunningCount()` for observability
- `BackgroundManager.launch()` checks `concurrencyManager.canSpawnGlobally()` before creating task
- `BackgroundManager.trackTask()` also checks global limit
- On task completion/cancellation/error, call `releaseGlobal()`
- Throw descriptive error when limit hit: `"Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents."`
### Local Validation
```bash
bun run typecheck
bun test src/config/schema/background-task.test.ts
bun test src/features/background-agent/concurrency.test.ts
bun run build
```
---
## Phase 2: PR Creation
1. **Push branch:**
```bash
git push -u origin feat/max-background-agents
```
2. **Create PR** targeting `dev`:
```bash
gh pr create \
--base dev \
--title "feat: add max_background_agents config to limit concurrent background agents" \
--body-file /tmp/pull-request-max-background-agents-$(date +%s).md
```
---
## Phase 3: Verify Loop
### Gate A: CI
- Wait for `ci.yml` workflow to complete
- Check: `gh pr checks <PR_NUMBER> --watch`
- If fails: read logs, fix, push, re-check
### Gate B: review-work (5 agents)
- Run `/review-work` skill which launches 5 parallel background sub-agents:
1. Oracle — goal/constraint verification
2. Oracle — code quality
3. Oracle — security
4. Hephaestus — hands-on QA execution
5. Hephaestus — context mining from GitHub/git
- All 5 must pass. If any fails, fix and re-push.
### Gate C: Cubic (cubic-dev-ai[bot])
- Wait for Cubic bot review on PR
- Must say "No issues found"
- If issues found: address feedback, push, re-check
### Loop
```
while (!allGatesPass) {
if (CI fails) → fix → push → continue
if (review-work fails) → fix → push → continue
if (Cubic has issues) → fix → push → continue
}
```
---
## Phase 4: Merge + Cleanup
1. **Squash merge:**
```bash
gh pr merge <PR_NUMBER> --squash --delete-branch
```
2. **Remove worktree:**
```bash
git worktree remove ../omo-wt/feat-max-background-agents
```
---
## File Impact Summary
| File | Change Type |
|------|-------------|
| `src/config/schema/background-task.ts` | Modified — add schema field |
| `src/config/schema/background-task.test.ts` | Modified — add validation tests |
| `src/features/background-agent/concurrency.ts` | Modified — add global limit tracking |
| `src/features/background-agent/concurrency.test.ts` | Modified — add global limit tests |
| `src/features/background-agent/manager.ts` | Modified — enforce global limit in launch/trackTask |
5 files changed across 2 atomic commits. No new files created (follows existing patterns).

View File

@@ -0,0 +1,47 @@
# PR Description
**Title:** `feat: add max_background_agents config to limit concurrent background agents`
**Base:** `dev`
---
## Summary
- Add `maxBackgroundAgents` field to `BackgroundTaskConfigSchema` (default: 5, min: 1) to cap total simultaneous background agents across all models/providers
- Enforce the global limit in `BackgroundManager.launch()` and `trackTask()` with descriptive error messages when the limit is hit
- Release global slots on task completion, cancellation, error, and interrupt to prevent slot leaks
## Motivation
The existing concurrency system in `ConcurrencyManager` limits agents **per model/provider** (e.g., 5 concurrent `anthropic/claude-opus-4-6` tasks). However, there is no **global** cap across all models. A user running tasks across multiple providers could spawn an unbounded number of background agents, exhausting system resources.
`max_background_agents` provides a single knob to limit total concurrent background agents regardless of which model they use.
## Config Usage
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"background_task": {
"maxBackgroundAgents": 10 // default: 5, min: 1
}
}
```
## Changes
| File | What |
|------|------|
| `src/config/schema/background-task.ts` | Add `maxBackgroundAgents` schema field |
| `src/config/schema/background-task.test.ts` | Validation tests (valid, boundary, invalid) |
| `src/features/background-agent/concurrency.ts` | Global counter + `canSpawnGlobally()` / `acquireGlobal()` / `releaseGlobal()` |
| `src/features/background-agent/concurrency.test.ts` | Global limit unit tests |
| `src/features/background-agent/manager.ts` | Enforce global limit in `launch()`, `trackTask()`; release in completion/cancel/error paths |
## Testing
- `bun test src/config/schema/background-task.test.ts` — schema validation
- `bun test src/features/background-agent/concurrency.test.ts` — global limit enforcement
- `bun run typecheck` — clean
- `bun run build` — clean

View File

@@ -0,0 +1,163 @@
# Verification Strategy
## Pre-Push Local Validation
Before every push, run all three checks sequentially:
```bash
bun run typecheck && bun test && bun run build
```
Specific test files to watch:
```bash
bun test src/config/schema/background-task.test.ts
bun test src/features/background-agent/concurrency.test.ts
```
---
## Gate A: CI (`ci.yml`)
### What CI runs
1. **Tests (split):** mock-heavy tests run in isolation (separate `bun test` processes), rest in batch
2. **Typecheck:** `bun run typecheck` (tsc --noEmit)
3. **Build:** `bun run build` (ESM + declarations + schema)
4. **Schema auto-commit:** if generated schema changed, CI commits it
### How to monitor
```bash
gh pr checks <PR_NUMBER> --watch
```
### Common failure scenarios and fixes
| Failure | Likely Cause | Fix |
|---------|-------------|-----|
| Typecheck error | New field not matching existing type imports | Verify `BackgroundTaskConfig` type is auto-inferred from schema, no manual type updates needed |
| Test failure | Test assertion wrong or missing import | Fix test, re-push |
| Build failure | Import cycle or missing export | Check barrel exports in `src/config/schema.ts` (already re-exports via `export *`) |
| Schema auto-commit | Generated JSON schema changed | Pull the auto-commit, rebase if needed |
### Recovery
```bash
# Read CI logs
gh run view <RUN_ID> --log-failed
# Fix, commit, push
git add -A && git commit -m "fix: address CI failure" && git push
```
---
## Gate B: review-work (5 parallel agents)
### What it checks
Run `/review-work` which launches 5 background sub-agents:
| Agent | Role | What it checks for this PR |
|-------|------|---------------------------|
| Oracle (goal) | Goal/constraint verification | Does `maxBackgroundAgents` actually limit agents? Is default 5? Is min 1? |
| Oracle (quality) | Code quality | Follows existing patterns? No catch-all files? Under 200 LOC? given/when/then tests? |
| Oracle (security) | Security review | No injection vectors, no unsafe defaults, proper input validation via Zod |
| Hephaestus (QA) | Hands-on QA execution | Actually runs tests, checks typecheck, verifies build |
| Hephaestus (context) | Context mining | Checks git history, related issues, ensures no duplicate/conflicting PRs |
### Pass criteria
All 5 agents must pass. Any single failure blocks.
### Common failure scenarios and fixes
| Agent | Likely Issue | Fix |
|-------|-------------|-----|
| Oracle (goal) | Global limit not enforced in all exit paths (completion, cancel, error, interrupt) | Audit every status transition in `manager.ts` that should call `releaseGlobal()` |
| Oracle (quality) | Test style not matching given/when/then | Restructure tests with `#given`/`#when`/`#then` describe nesting |
| Oracle (quality) | File exceeds 200 LOC | `concurrency.ts` is 137 LOC + ~25 new = ~162 LOC, safe. `manager.ts` is already large but we're adding ~20 lines to existing methods, not creating new responsibility |
| Oracle (security) | Integer overflow or negative values | Zod `.int().min(1)` handles this at config parse time |
| Hephaestus (QA) | Test actually fails when run | Run tests locally first, fix before push |
### Recovery
```bash
# Review agent output
background_output(task_id="<review-work-task-id>")
# Fix identified issues
# ... edit files ...
git add -A && git commit -m "fix: address review-work feedback" && git push
```
---
## Gate C: Cubic (`cubic-dev-ai[bot]`)
### What it checks
Cubic is an automated code review bot that analyzes the PR diff. It must respond with "No issues found" for the gate to pass.
### Common failure scenarios and fixes
| Issue | Likely Cause | Fix |
|-------|-------------|-----|
| "Missing error handling" | `releaseGlobal()` not called in some error path | Add `releaseGlobal()` to the missed path |
| "Inconsistent naming" | Field name doesn't match convention | Use `maxBackgroundAgents` (camelCase in schema, `max_background_agents` in JSONC config) |
| "Missing documentation" | No JSDoc on new public methods | Add JSDoc comments to `canSpawnGlobally()`, `acquireGlobal()`, `releaseGlobal()`, `getMaxBackgroundAgents()` |
| "Test coverage gap" | Missing edge case test | Add the specific test case Cubic identifies |
### Recovery
```bash
# Read Cubic's review
gh api repos/code-yeongyu/oh-my-openagent/pulls/<PR_NUMBER>/reviews
# Address each comment
# ... edit files ...
git add -A && git commit -m "fix: address Cubic review feedback" && git push
```
---
## Verification Loop Pseudocode
```
iteration = 0
while true:
iteration++
log("Verification iteration ${iteration}")
# Gate A: CI (cheapest, check first)
push_and_wait_for_ci()
if ci_failed:
read_ci_logs()
fix_and_commit()
continue
# Gate B: review-work (5 agents, more expensive)
run_review_work()
if any_agent_failed:
read_agent_feedback()
fix_and_commit()
continue
# Gate C: Cubic (external bot, wait for it)
wait_for_cubic_review()
if cubic_has_issues:
read_cubic_comments()
fix_and_commit()
continue
# All gates passed
break
# Merge
gh pr merge <PR_NUMBER> --squash --delete-branch
```
No iteration cap. Loop continues until all three gates pass simultaneously in a single iteration.
---
## Risk Assessment
| Risk | Probability | Mitigation |
|------|------------|------------|
| Slot leak (global count never decremented) | Medium | Audit every exit path: `tryCompleteTask`, `cancelTask`, `handleEvent(session.error)`, `startTask` prompt error, `resume` prompt error |
| Race condition on global count | Low | `globalRunningCount` is synchronous (single-threaded JS), no async gap between check and increment in `launch()` |
| Breaking existing behavior | Low | Default is 5, same as existing per-model default. Users with <5 total agents see no change |
| `manager.ts` exceeding 200 LOC | Already exceeded | File is already ~1500 LOC (exempt due to being a core orchestration class with many methods). Our changes add ~20 lines to existing methods, not a new responsibility |

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 292000, "total_duration_seconds": 292}

View File

@@ -0,0 +1,15 @@
{
"run_id": "eval-1-without_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": false, "evidence": "Uses git checkout -b, no worktree isolation"},
{"text": "Branch is created from origin/dev", "passed": true, "evidence": "git checkout -b feat/max-background-agents dev"},
{"text": "Plan specifies multiple atomic commits for multi-file changes", "passed": false, "evidence": "Steps listed sequentially but no atomic commit strategy mentioned"},
{"text": "Runs bun run typecheck, bun test, and bun run build before pushing", "passed": true, "evidence": "Step 6 runs typecheck and tests, Step 8 implies push after verification"},
{"text": "PR is created targeting dev branch", "passed": true, "evidence": "Step 8 mentions creating PR"},
{"text": "Verification loop includes all 3 gates: CI, review-work, and Cubic", "passed": false, "evidence": "Only mentions CI pipeline in step 6. No review-work or Cubic."},
{"text": "Gates are checked in order: CI first, then review-work, then Cubic", "passed": false, "evidence": "No gate ordering - only CI mentioned"},
{"text": "Cubic check uses gh api to check cubic-dev-ai[bot] reviews", "passed": false, "evidence": "No mention of Cubic at all"},
{"text": "Plan includes worktree cleanup after merge", "passed": false, "evidence": "No worktree used, no cleanup needed"},
{"text": "Code changes reference actual files in the codebase", "passed": true, "evidence": "References actual files with detailed design decisions"}
]
}

View File

@@ -0,0 +1,615 @@
# Code Changes: `max_background_agents` Config Option
## 1. Schema Change
**File:** `src/config/schema/background-task.ts`
```typescript
import { z } from "zod"
export const BackgroundTaskConfigSchema = z.object({
defaultConcurrency: z.number().min(1).optional(),
providerConcurrency: z.record(z.string(), z.number().min(0)).optional(),
modelConcurrency: z.record(z.string(), z.number().min(0)).optional(),
maxDepth: z.number().int().min(1).optional(),
maxDescendants: z.number().int().min(1).optional(),
/** Maximum number of background agents that can run simultaneously across all models/providers (default: no global limit, only per-model limits apply) */
maxBackgroundAgents: z.number().int().min(1).optional(),
/** Stale timeout in milliseconds - interrupt tasks with no activity for this duration (default: 180000 = 3 minutes, minimum: 60000 = 1 minute) */
staleTimeoutMs: z.number().min(60000).optional(),
/** Timeout for tasks that never received any progress update, falling back to startedAt (default: 1800000 = 30 minutes, minimum: 60000 = 1 minute) */
messageStalenessTimeoutMs: z.number().min(60000).optional(),
syncPollTimeoutMs: z.number().min(60000).optional(),
})
export type BackgroundTaskConfig = z.infer<typeof BackgroundTaskConfigSchema>
```
**What changed:** Added `maxBackgroundAgents` field after `maxDescendants` (grouped with other limit fields). Uses `z.number().int().min(1).optional()` matching the pattern of `maxDepth` and `maxDescendants`.
---
## 2. ConcurrencyManager Changes
**File:** `src/features/background-agent/concurrency.ts`
```typescript
import type { BackgroundTaskConfig } from "../../config/schema"
/**
* Queue entry with settled-flag pattern to prevent double-resolution.
*
* The settled flag ensures that cancelWaiters() doesn't reject
* an entry that was already resolved by release().
*/
interface QueueEntry {
resolve: () => void
rawReject: (error: Error) => void
settled: boolean
}
export class ConcurrencyManager {
private config?: BackgroundTaskConfig
private counts: Map<string, number> = new Map()
private queues: Map<string, QueueEntry[]> = new Map()
private globalCount = 0
private globalQueue: QueueEntry[] = []
constructor(config?: BackgroundTaskConfig) {
this.config = config
}
getGlobalLimit(): number {
const limit = this.config?.maxBackgroundAgents
if (limit === undefined) {
return Infinity
}
return limit
}
getConcurrencyLimit(model: string): number {
const modelLimit = this.config?.modelConcurrency?.[model]
if (modelLimit !== undefined) {
return modelLimit === 0 ? Infinity : modelLimit
}
const provider = model.split('/')[0]
const providerLimit = this.config?.providerConcurrency?.[provider]
if (providerLimit !== undefined) {
return providerLimit === 0 ? Infinity : providerLimit
}
const defaultLimit = this.config?.defaultConcurrency
if (defaultLimit !== undefined) {
return defaultLimit === 0 ? Infinity : defaultLimit
}
return 5
}
async acquire(model: string): Promise<void> {
const perModelLimit = this.getConcurrencyLimit(model)
const globalLimit = this.getGlobalLimit()
// Fast path: both limits have capacity
if (perModelLimit === Infinity && globalLimit === Infinity) {
return
}
const currentPerModel = this.counts.get(model) ?? 0
if (currentPerModel < perModelLimit && this.globalCount < globalLimit) {
this.counts.set(model, currentPerModel + 1)
this.globalCount++
return
}
return new Promise<void>((resolve, reject) => {
const entry: QueueEntry = {
resolve: () => {
if (entry.settled) return
entry.settled = true
resolve()
},
rawReject: reject,
settled: false,
}
// Queue on whichever limit is blocking
if (currentPerModel >= perModelLimit) {
const queue = this.queues.get(model) ?? []
queue.push(entry)
this.queues.set(model, queue)
} else {
this.globalQueue.push(entry)
}
})
}
release(model: string): void {
const perModelLimit = this.getConcurrencyLimit(model)
const globalLimit = this.getGlobalLimit()
if (perModelLimit === Infinity && globalLimit === Infinity) {
return
}
// Try per-model handoff first
const queue = this.queues.get(model)
while (queue && queue.length > 0) {
const next = queue.shift()!
if (!next.settled) {
// Hand off the slot to this waiter (counts stay the same)
next.resolve()
return
}
}
// No per-model handoff - decrement per-model count
const current = this.counts.get(model) ?? 0
if (current > 0) {
this.counts.set(model, current - 1)
}
// Try global handoff
while (this.globalQueue.length > 0) {
const next = this.globalQueue.shift()!
if (!next.settled) {
// Hand off the global slot - but the waiter still needs a per-model slot
// Since they were queued on global, their per-model had capacity
// Re-acquire per-model count for them
const waiterModel = this.findModelForGlobalWaiter()
if (waiterModel) {
const waiterCount = this.counts.get(waiterModel) ?? 0
this.counts.set(waiterModel, waiterCount + 1)
}
next.resolve()
return
}
}
// No handoff occurred - decrement global count
if (this.globalCount > 0) {
this.globalCount--
}
}
/**
* Cancel all waiting acquires for a model. Used during cleanup.
*/
cancelWaiters(model: string): void {
const queue = this.queues.get(model)
if (queue) {
for (const entry of queue) {
if (!entry.settled) {
entry.settled = true
entry.rawReject(new Error(`Concurrency queue cancelled for model: ${model}`))
}
}
this.queues.delete(model)
}
}
/**
* Clear all state. Used during manager cleanup/shutdown.
* Cancels all pending waiters.
*/
clear(): void {
for (const [model] of this.queues) {
this.cancelWaiters(model)
}
// Cancel global queue waiters
for (const entry of this.globalQueue) {
if (!entry.settled) {
entry.settled = true
entry.rawReject(new Error("Concurrency queue cancelled: manager shutdown"))
}
}
this.globalQueue = []
this.globalCount = 0
this.counts.clear()
this.queues.clear()
}
/**
* Get current count for a model (for testing/debugging)
*/
getCount(model: string): number {
return this.counts.get(model) ?? 0
}
/**
* Get queue length for a model (for testing/debugging)
*/
getQueueLength(model: string): number {
return this.queues.get(model)?.length ?? 0
}
/**
* Get current global count across all models (for testing/debugging)
*/
getGlobalCount(): number {
return this.globalCount
}
/**
* Get global queue length (for testing/debugging)
*/
getGlobalQueueLength(): number {
return this.globalQueue.length
}
}
```
**What changed:**
- Added `globalCount` field to track total active agents across all keys
- Added `globalQueue` for tasks waiting on the global limit
- Added `getGlobalLimit()` method to read `maxBackgroundAgents` from config
- Modified `acquire()` to check both per-model AND global limits
- Modified `release()` to handle global queue handoff and decrement global count
- Modified `clear()` to reset global state
- Added `getGlobalCount()` and `getGlobalQueueLength()` for testing
**Important design note:** The `release()` implementation above is a simplified version. In practice, the global queue handoff is tricky because we need to know which model the global waiter was trying to acquire for. A cleaner approach would be to store the model key in the QueueEntry. Let me refine:
### Refined approach (simpler, more correct)
Instead of a separate global queue, a simpler approach is to check the global limit inside `acquire()` and use a single queue per model. When global capacity frees up on `release()`, we try to drain any model's queue:
```typescript
async acquire(model: string): Promise<void> {
const perModelLimit = this.getConcurrencyLimit(model)
const globalLimit = this.getGlobalLimit()
if (perModelLimit === Infinity && globalLimit === Infinity) {
return
}
const currentPerModel = this.counts.get(model) ?? 0
if (currentPerModel < perModelLimit && this.globalCount < globalLimit) {
this.counts.set(model, currentPerModel + 1)
if (globalLimit !== Infinity) {
this.globalCount++
}
return
}
return new Promise<void>((resolve, reject) => {
const queue = this.queues.get(model) ?? []
const entry: QueueEntry = {
resolve: () => {
if (entry.settled) return
entry.settled = true
resolve()
},
rawReject: reject,
settled: false,
}
queue.push(entry)
this.queues.set(model, queue)
})
}
release(model: string): void {
const perModelLimit = this.getConcurrencyLimit(model)
const globalLimit = this.getGlobalLimit()
if (perModelLimit === Infinity && globalLimit === Infinity) {
return
}
// Try per-model handoff first (same model queue)
const queue = this.queues.get(model)
while (queue && queue.length > 0) {
const next = queue.shift()!
if (!next.settled) {
// Hand off the slot to this waiter (per-model and global counts stay the same)
next.resolve()
return
}
}
// No per-model handoff - decrement per-model count
const current = this.counts.get(model) ?? 0
if (current > 0) {
this.counts.set(model, current - 1)
}
// Decrement global count
if (globalLimit !== Infinity && this.globalCount > 0) {
this.globalCount--
}
// Try to drain any other model's queue that was blocked by global limit
if (globalLimit !== Infinity) {
this.tryDrainGlobalWaiters()
}
}
private tryDrainGlobalWaiters(): void {
const globalLimit = this.getGlobalLimit()
if (this.globalCount >= globalLimit) return
for (const [model, queue] of this.queues) {
const perModelLimit = this.getConcurrencyLimit(model)
const currentPerModel = this.counts.get(model) ?? 0
if (currentPerModel >= perModelLimit) continue
while (queue.length > 0 && this.globalCount < globalLimit && currentPerModel < perModelLimit) {
const next = queue.shift()!
if (!next.settled) {
this.counts.set(model, (this.counts.get(model) ?? 0) + 1)
this.globalCount++
next.resolve()
return
}
}
}
}
```
This refined approach keeps all waiters in per-model queues (no separate global queue), and on release, tries to drain waiters from any model queue that was blocked by the global limit.
---
## 3. Schema Test Changes
**File:** `src/config/schema/background-task.test.ts`
Add after the `syncPollTimeoutMs` describe block:
```typescript
describe("maxBackgroundAgents", () => {
describe("#given valid maxBackgroundAgents (10)", () => {
test("#when parsed #then returns correct value", () => {
const result = BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 10 })
expect(result.maxBackgroundAgents).toBe(10)
})
})
describe("#given maxBackgroundAgents of 1 (minimum)", () => {
test("#when parsed #then returns correct value", () => {
const result = BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 1 })
expect(result.maxBackgroundAgents).toBe(1)
})
})
describe("#given maxBackgroundAgents below minimum (0)", () => {
test("#when parsed #then throws ZodError", () => {
let thrownError: unknown
try {
BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 0 })
} catch (error) {
thrownError = error
}
expect(thrownError).toBeInstanceOf(ZodError)
})
})
describe("#given maxBackgroundAgents is negative (-1)", () => {
test("#when parsed #then throws ZodError", () => {
let thrownError: unknown
try {
BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: -1 })
} catch (error) {
thrownError = error
}
expect(thrownError).toBeInstanceOf(ZodError)
})
})
describe("#given maxBackgroundAgents is non-integer (2.5)", () => {
test("#when parsed #then throws ZodError", () => {
let thrownError: unknown
try {
BackgroundTaskConfigSchema.parse({ maxBackgroundAgents: 2.5 })
} catch (error) {
thrownError = error
}
expect(thrownError).toBeInstanceOf(ZodError)
})
})
describe("#given maxBackgroundAgents not provided", () => {
test("#when parsed #then field is undefined", () => {
const result = BackgroundTaskConfigSchema.parse({})
expect(result.maxBackgroundAgents).toBeUndefined()
})
})
})
```
---
## 4. ConcurrencyManager Test Changes
**File:** `src/features/background-agent/concurrency.test.ts`
Add new describe block:
```typescript
describe("ConcurrencyManager.globalLimit (maxBackgroundAgents)", () => {
test("should return Infinity when maxBackgroundAgents is not set", () => {
// given
const manager = new ConcurrencyManager()
// when
const limit = manager.getGlobalLimit()
// then
expect(limit).toBe(Infinity)
})
test("should return configured maxBackgroundAgents", () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 3 }
const manager = new ConcurrencyManager(config)
// when
const limit = manager.getGlobalLimit()
// then
expect(limit).toBe(3)
})
test("should enforce global limit across different models", async () => {
// given
const config: BackgroundTaskConfig = {
maxBackgroundAgents: 2,
defaultConcurrency: 5,
}
const manager = new ConcurrencyManager(config)
await manager.acquire("model-a")
await manager.acquire("model-b")
// when
let resolved = false
const waitPromise = manager.acquire("model-c").then(() => { resolved = true })
await Promise.resolve()
// then - should be blocked by global limit even though per-model has capacity
expect(resolved).toBe(false)
expect(manager.getGlobalCount()).toBe(2)
// cleanup
manager.release("model-a")
await waitPromise
expect(resolved).toBe(true)
})
test("should allow tasks when global limit not reached", async () => {
// given
const config: BackgroundTaskConfig = {
maxBackgroundAgents: 3,
defaultConcurrency: 5,
}
const manager = new ConcurrencyManager(config)
// when
await manager.acquire("model-a")
await manager.acquire("model-b")
await manager.acquire("model-c")
// then
expect(manager.getGlobalCount()).toBe(3)
expect(manager.getCount("model-a")).toBe(1)
expect(manager.getCount("model-b")).toBe(1)
expect(manager.getCount("model-c")).toBe(1)
})
test("should respect both per-model and global limits", async () => {
// given - per-model limit of 1, global limit of 3
const config: BackgroundTaskConfig = {
maxBackgroundAgents: 3,
defaultConcurrency: 1,
}
const manager = new ConcurrencyManager(config)
await manager.acquire("model-a")
// when - try second acquire on same model
let resolved = false
const waitPromise = manager.acquire("model-a").then(() => { resolved = true })
await Promise.resolve()
// then - blocked by per-model limit, not global
expect(resolved).toBe(false)
expect(manager.getGlobalCount()).toBe(1)
// cleanup
manager.release("model-a")
await waitPromise
})
test("should release global slot and unblock waiting tasks", async () => {
// given
const config: BackgroundTaskConfig = {
maxBackgroundAgents: 1,
defaultConcurrency: 5,
}
const manager = new ConcurrencyManager(config)
await manager.acquire("model-a")
// when
let resolved = false
const waitPromise = manager.acquire("model-b").then(() => { resolved = true })
await Promise.resolve()
expect(resolved).toBe(false)
manager.release("model-a")
await waitPromise
// then
expect(resolved).toBe(true)
expect(manager.getGlobalCount()).toBe(1)
expect(manager.getCount("model-a")).toBe(0)
expect(manager.getCount("model-b")).toBe(1)
})
test("should not enforce global limit when not configured", async () => {
// given - no maxBackgroundAgents set
const config: BackgroundTaskConfig = { defaultConcurrency: 5 }
const manager = new ConcurrencyManager(config)
// when - acquire many across different models
await manager.acquire("model-a")
await manager.acquire("model-b")
await manager.acquire("model-c")
await manager.acquire("model-d")
await manager.acquire("model-e")
await manager.acquire("model-f")
// then - all should succeed (no global limit)
expect(manager.getCount("model-a")).toBe(1)
expect(manager.getCount("model-f")).toBe(1)
})
test("should reset global count on clear", async () => {
// given
const config: BackgroundTaskConfig = { maxBackgroundAgents: 5 }
const manager = new ConcurrencyManager(config)
await manager.acquire("model-a")
await manager.acquire("model-b")
// when
manager.clear()
// then
expect(manager.getGlobalCount()).toBe(0)
})
})
```
---
## Config Usage Example
User's `.opencode/oh-my-opencode.jsonc`:
```jsonc
{
"background_task": {
// Global limit: max 5 background agents total
"maxBackgroundAgents": 5,
// Per-model limits still apply independently
"defaultConcurrency": 3,
"providerConcurrency": {
"anthropic": 2
}
}
}
```
With this config:
- Max 5 background agents running simultaneously across all models
- Max 3 per model (default), max 2 for any Anthropic model
- If 2 Anthropic + 3 OpenAI agents are running (5 total), no more can start regardless of per-model capacity

View File

@@ -0,0 +1,99 @@
# Execution Plan: Add `max_background_agents` Config Option
## Overview
Add a `max_background_agents` config option to oh-my-opencode that limits total simultaneous background agents across all models/providers. Currently, concurrency is only limited per-model/provider key (default 5 per key). This new option adds a **global ceiling** on total running background agents.
## Step-by-Step Plan
### Step 1: Create feature branch
```bash
git checkout -b feat/max-background-agents dev
```
### Step 2: Add `max_background_agents` to BackgroundTaskConfigSchema
**File:** `src/config/schema/background-task.ts`
- Add `maxBackgroundAgents` field to the Zod schema with `z.number().int().min(1).optional()`
- This follows the existing pattern of `maxDepth` and `maxDescendants` (integer, min 1, optional)
- The field name uses camelCase to match existing schema fields (`defaultConcurrency`, `maxDepth`, `maxDescendants`)
- No `.default()` needed since the hardcoded fallback of 5 lives in `ConcurrencyManager`
### Step 3: Modify `ConcurrencyManager` to enforce global limit
**File:** `src/features/background-agent/concurrency.ts`
- Add a `globalCount` field tracking total active agents across all keys
- Modify `acquire()` to check global count against `maxBackgroundAgents` before granting a slot
- Modify `release()` to decrement global count
- Modify `clear()` to reset global count
- Add `getGlobalCount()` for testing/debugging (follows existing `getCount()`/`getQueueLength()` pattern)
The global limit check happens **in addition to** the per-model limit. Both must have capacity for a task to proceed.
### Step 4: Add tests for the new config schema field
**File:** `src/config/schema/background-task.test.ts`
- Add test cases following the existing given/when/then pattern with nested describes
- Test valid value, below-minimum value, undefined (not provided), non-number type
### Step 5: Add tests for ConcurrencyManager global limit
**File:** `src/features/background-agent/concurrency.test.ts`
- Test that global limit is enforced across different model keys
- Test that tasks queue when global limit reached even if per-model limit has capacity
- Test that releasing a slot from one model allows a queued task from another model to proceed
- Test default behavior (5) when no config provided
- Test interaction between global and per-model limits
### Step 6: Run typecheck and tests
```bash
bun run typecheck
bun test src/config/schema/background-task.test.ts
bun test src/features/background-agent/concurrency.test.ts
```
### Step 7: Verify LSP diagnostics clean
Check `src/config/schema/background-task.ts` and `src/features/background-agent/concurrency.ts` for errors.
### Step 8: Create PR
- Push branch to remote
- Create PR with structured description via `gh pr create`
## Files Modified (4 files)
| File | Change |
|------|--------|
| `src/config/schema/background-task.ts` | Add `maxBackgroundAgents` field |
| `src/features/background-agent/concurrency.ts` | Add global count tracking + enforcement |
| `src/config/schema/background-task.test.ts` | Add schema validation tests |
| `src/features/background-agent/concurrency.test.ts` | Add global limit enforcement tests |
## Files NOT Modified (intentional)
| File | Reason |
|------|--------|
| `src/config/schema/oh-my-opencode-config.ts` | No change needed - `BackgroundTaskConfigSchema` is already composed into root schema via `background_task` field |
| `src/create-managers.ts` | No change needed - `pluginConfig.background_task` already passed to `BackgroundManager` constructor |
| `src/features/background-agent/manager.ts` | No change needed - already passes config to `ConcurrencyManager` |
| `src/plugin-config.ts` | No change needed - `background_task` is a simple object field, uses default override merge |
| `src/config/schema.ts` | No change needed - barrel already exports `BackgroundTaskConfigSchema` |
## Design Decisions
1. **Field name `maxBackgroundAgents`** - camelCase to match existing schema fields (`maxDepth`, `maxDescendants`, `defaultConcurrency`). The user-facing JSONC config key is also camelCase per existing convention in `background_task` section.
2. **Global limit vs per-model limit** - The global limit is a ceiling across ALL concurrency keys. Per-model limits still apply independently. A task needs both a per-model slot AND a global slot to proceed.
3. **Default of 5** - Matches the existing hardcoded default in `getConcurrencyLimit()`. When `maxBackgroundAgents` is not set, no global limit is enforced (only per-model limits apply), preserving backward compatibility.
4. **Queue behavior** - When global limit is reached, tasks wait in the same FIFO queue mechanism. The global check happens inside `acquire()` before the per-model check.
5. **0 means Infinity** - Following the existing pattern where `defaultConcurrency: 0` means unlimited, `maxBackgroundAgents: 0` would also mean no global limit.

View File

@@ -0,0 +1,50 @@
# PR Description
**Title:** feat: add `maxBackgroundAgents` config to limit total simultaneous background agents
**Body:**
## Summary
- Add `maxBackgroundAgents` field to `BackgroundTaskConfigSchema` that enforces a global ceiling on total running background agents across all models/providers
- Modify `ConcurrencyManager` to track global count and enforce the limit alongside existing per-model limits
- Add schema validation tests and concurrency enforcement tests
## Motivation
Currently, concurrency is only limited per model/provider key (default 5 per key). On resource-constrained machines or when using many different models, the total number of background agents can grow unbounded (5 per model x N models). This config option lets users set a hard ceiling.
## Changes
### Schema (`src/config/schema/background-task.ts`)
- Added `maxBackgroundAgents: z.number().int().min(1).optional()` to `BackgroundTaskConfigSchema`
- Grouped with existing limit fields (`maxDepth`, `maxDescendants`)
### ConcurrencyManager (`src/features/background-agent/concurrency.ts`)
- Added `globalCount` tracking total active agents across all concurrency keys
- Added `getGlobalLimit()` reading `maxBackgroundAgents` from config (defaults to `Infinity` = no global limit)
- Modified `acquire()` to check both per-model AND global capacity
- Modified `release()` to decrement global count and drain cross-model waiters blocked by global limit
- Modified `clear()` to reset global state
- Added `getGlobalCount()` / `getGlobalQueueLength()` for testing
### Tests
- `src/config/schema/background-task.test.ts`: 6 test cases for schema validation (valid, min boundary, below min, negative, non-integer, undefined)
- `src/features/background-agent/concurrency.test.ts`: 8 test cases for global limit enforcement (cross-model blocking, release unblocking, per-model vs global interaction, no-config default, clear reset)
## Config Example
```jsonc
{
"background_task": {
"maxBackgroundAgents": 5,
"defaultConcurrency": 3
}
}
```
## Backward Compatibility
- When `maxBackgroundAgents` is not set (default), no global limit is enforced - behavior is identical to before
- Existing `defaultConcurrency`, `providerConcurrency`, and `modelConcurrency` continue to work unchanged
- No config migration needed

View File

@@ -0,0 +1,111 @@
# Verification Strategy
## 1. Static Analysis
### TypeScript Typecheck
```bash
bun run typecheck
```
- Verify no type errors introduced
- `BackgroundTaskConfig` type is inferred from Zod schema, so adding the field automatically updates the type
- All existing consumers of `BackgroundTaskConfig` remain compatible (new field is optional)
### LSP Diagnostics
Check changed files for errors:
- `src/config/schema/background-task.ts`
- `src/features/background-agent/concurrency.ts`
- `src/config/schema/background-task.test.ts`
- `src/features/background-agent/concurrency.test.ts`
## 2. Unit Tests
### Schema Validation Tests
```bash
bun test src/config/schema/background-task.test.ts
```
| Test Case | Input | Expected |
|-----------|-------|----------|
| Valid value (10) | `{ maxBackgroundAgents: 10 }` | Parses to `10` |
| Minimum boundary (1) | `{ maxBackgroundAgents: 1 }` | Parses to `1` |
| Below minimum (0) | `{ maxBackgroundAgents: 0 }` | Throws `ZodError` |
| Negative (-1) | `{ maxBackgroundAgents: -1 }` | Throws `ZodError` |
| Non-integer (2.5) | `{ maxBackgroundAgents: 2.5 }` | Throws `ZodError` |
| Not provided | `{}` | Field is `undefined` |
### ConcurrencyManager Tests
```bash
bun test src/features/background-agent/concurrency.test.ts
```
| Test Case | Setup | Expected |
|-----------|-------|----------|
| No config = no global limit | No `maxBackgroundAgents` | `getGlobalLimit()` returns `Infinity` |
| Config respected | `maxBackgroundAgents: 3` | `getGlobalLimit()` returns `3` |
| Cross-model blocking | Global limit 2, acquire model-a + model-b, try model-c | model-c blocks |
| Under-limit allows | Global limit 3, acquire 3 different models | All succeed |
| Per-model + global interaction | Per-model 1, global 3, acquire model-a twice | Blocked by per-model, not global |
| Release unblocks | Global limit 1, acquire model-a, queue model-b, release model-a | model-b proceeds |
| No global limit = no enforcement | No config, acquire 6 different models | All succeed |
| Clear resets global count | Acquire 2, clear | `getGlobalCount()` is 0 |
### Existing Test Regression
```bash
bun test src/features/background-agent/concurrency.test.ts
bun test src/config/schema/background-task.test.ts
bun test src/config/schema.test.ts
```
All existing tests must continue to pass unchanged.
## 3. Integration Verification
### Config Loading Path
Verify the config flows correctly through the system:
1. **Schema → Type**: `BackgroundTaskConfig` type auto-includes `maxBackgroundAgents` via `z.infer`
2. **Config file → Schema**: `loadConfigFromPath()` in `plugin-config.ts` uses `OhMyOpenCodeConfigSchema.safeParse()` which includes `BackgroundTaskConfigSchema`
3. **Config → Manager**: `create-managers.ts` passes `pluginConfig.background_task` to `BackgroundManager` constructor
4. **Manager → ConcurrencyManager**: `BackgroundManager` constructor passes config to `new ConcurrencyManager(config)`
5. **ConcurrencyManager → Enforcement**: `acquire()` reads `config.maxBackgroundAgents` via `getGlobalLimit()`
No changes needed in steps 2-4 since the field is optional and the existing plumbing passes the entire `BackgroundTaskConfig` object.
### Manual Config Test
Create a test config to verify parsing:
```bash
echo '{ "background_task": { "maxBackgroundAgents": 3 } }' | bun -e "
const { BackgroundTaskConfigSchema } = require('./src/config/schema/background-task');
const result = BackgroundTaskConfigSchema.safeParse(JSON.parse(require('fs').readFileSync('/dev/stdin', 'utf-8')).background_task);
console.log(result.success, result.data);
"
```
## 4. Build Verification
```bash
bun run build
```
- Verify build succeeds
- Schema JSON output includes the new field (if applicable)
## 5. Edge Cases to Verify
| Edge Case | Expected Behavior |
|-----------|-------------------|
| `maxBackgroundAgents` not set | No global limit enforced (backward compatible) |
| `maxBackgroundAgents: 1` | Only 1 background agent at a time across all models |
| `maxBackgroundAgents` > sum of all per-model limits | Global limit never triggers (per-model limits are tighter) |
| Per-model limit tighter than global | Per-model limit blocks first |
| Global limit tighter than per-model | Global limit blocks first |
| Release from one model unblocks different model | Global slot freed, different model's waiter proceeds |
| Manager shutdown with global waiters | `clear()` rejects all waiters and resets global count |
| Concurrent acquire/release | No race conditions (single-threaded JS event loop) |
## 6. CI Pipeline
The existing CI workflow (`ci.yml`) will run:
- `bun run typecheck` - type checking
- `bun test` - all tests including new ones
- `bun run build` - build verification
No CI changes needed.

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 365000, "total_duration_seconds": 365}

View File

@@ -0,0 +1,37 @@
{
"eval_id": 2,
"eval_name": "bugfix-atlas-null-check",
"prompt": "The atlas hook has a bug where it crashes when boulder.json is missing the worktree_path field. Fix it and land the fix as a PR. Make sure CI passes.",
"assertions": [
{
"id": "worktree-isolation",
"text": "Plan uses git worktree in a sibling directory",
"type": "manual"
},
{
"id": "minimal-fix",
"text": "Fix is minimal — adds null check, doesn't refactor unrelated code",
"type": "manual"
},
{
"id": "test-added",
"text": "Test case added for the missing worktree_path scenario",
"type": "manual"
},
{
"id": "three-gates",
"text": "Verification loop includes all 3 gates: CI, review-work, Cubic",
"type": "manual"
},
{
"id": "real-atlas-files",
"text": "References actual atlas hook files in src/hooks/atlas/",
"type": "manual"
},
{
"id": "fix-branch-naming",
"text": "Branch name follows fix/ prefix convention",
"type": "manual"
}
]
}

View File

@@ -0,0 +1,11 @@
{
"run_id": "eval-2-with_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "../omo-wt/fix-atlas-worktree-path-crash"},
{"text": "Fix is minimal — adds null check, doesn't refactor unrelated code", "passed": true, "evidence": "3 targeted changes: readBoulderState sanitization, idle-event guard, tests"},
{"text": "Test case added for the missing worktree_path scenario", "passed": true, "evidence": "Tests for missing and null worktree_path"},
{"text": "Verification loop includes all 3 gates", "passed": true, "evidence": "Gate A (CI), Gate B (review-work), Gate C (Cubic)"},
{"text": "References actual atlas hook files", "passed": true, "evidence": "src/hooks/atlas/idle-event.ts, src/features/boulder-state/storage.ts"},
{"text": "Branch name follows fix/ prefix convention", "passed": true, "evidence": "fix/atlas-worktree-path-crash"}
]
}

View File

@@ -0,0 +1,205 @@
# Code Changes
## File 1: `src/features/boulder-state/storage.ts`
**Change**: Add `worktree_path` sanitization in `readBoulderState()`
```typescript
// BEFORE (lines 29-32):
if (!Array.isArray(parsed.session_ids)) {
parsed.session_ids = []
}
return parsed as BoulderState
// AFTER:
if (!Array.isArray(parsed.session_ids)) {
parsed.session_ids = []
}
if (parsed.worktree_path !== undefined && typeof parsed.worktree_path !== "string") {
parsed.worktree_path = undefined
}
return parsed as BoulderState
```
**Rationale**: `readBoulderState` casts raw `JSON.parse()` output as `BoulderState` without validating individual fields. When boulder.json has `"worktree_path": null` (valid JSON from manual edits, corrupted state, or external tools), the runtime type is `null` but TypeScript type says `string | undefined`. This sanitization ensures downstream code always gets the correct type.
---
## File 2: `src/hooks/atlas/idle-event.ts`
**Change**: Add defensive string type guard before passing `worktree_path` to continuation functions.
```typescript
// BEFORE (lines 83-88 in scheduleRetry):
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: currentBoulder.plan_name,
progress: currentProgress,
agent: currentBoulder.agent,
worktreePath: currentBoulder.worktree_path,
})
// AFTER:
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: currentBoulder.plan_name,
progress: currentProgress,
agent: currentBoulder.agent,
worktreePath: typeof currentBoulder.worktree_path === "string" ? currentBoulder.worktree_path : undefined,
})
```
```typescript
// BEFORE (lines 184-188 in handleAtlasSessionIdle):
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: boulderState.plan_name,
progress,
agent: boulderState.agent,
worktreePath: boulderState.worktree_path,
})
// AFTER:
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: boulderState.plan_name,
progress,
agent: boulderState.agent,
worktreePath: typeof boulderState.worktree_path === "string" ? boulderState.worktree_path : undefined,
})
```
**Rationale**: Belt-and-suspenders defense. Even though `readBoulderState` now sanitizes, direct `writeBoulderState` calls elsewhere could still produce invalid state. The `typeof` check is zero-cost and prevents any possibility of `null` or non-string values leaking through.
---
## File 3: `src/hooks/atlas/index.test.ts`
**Change**: Add test cases for missing `worktree_path` scenarios within the existing `session.idle handler` describe block.
```typescript
test("should inject continuation when boulder.json has no worktree_path field", async () => {
// given - boulder state WITHOUT worktree_path
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [MAIN_SESSION_ID],
plan_name: "test-plan",
}
writeBoulderState(TEST_DIR, state)
const readState = readBoulderState(TEST_DIR)
expect(readState?.worktree_path).toBeUndefined()
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID: MAIN_SESSION_ID },
},
})
// then - continuation injected, no worktree context in prompt
expect(mockInput._promptMock).toHaveBeenCalled()
const callArgs = mockInput._promptMock.mock.calls[0][0]
expect(callArgs.body.parts[0].text).not.toContain("[Worktree:")
expect(callArgs.body.parts[0].text).toContain("1 remaining")
})
test("should handle boulder.json with worktree_path: null without crashing", async () => {
// given - manually write boulder.json with worktree_path: null (corrupted state)
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const boulderPath = join(SISYPHUS_DIR, "boulder.json")
writeFileSync(boulderPath, JSON.stringify({
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [MAIN_SESSION_ID],
plan_name: "test-plan",
worktree_path: null,
}, null, 2))
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID: MAIN_SESSION_ID },
},
})
// then - should inject continuation without crash, no "[Worktree: null]"
expect(mockInput._promptMock).toHaveBeenCalled()
const callArgs = mockInput._promptMock.mock.calls[0][0]
expect(callArgs.body.parts[0].text).not.toContain("[Worktree: null]")
expect(callArgs.body.parts[0].text).not.toContain("[Worktree: undefined]")
})
```
---
## File 4: `src/features/boulder-state/storage.test.ts` (addition to existing)
**Change**: Add `readBoulderState` sanitization test.
```typescript
describe("#given boulder.json with worktree_path: null", () => {
test("#then readBoulderState should sanitize null to undefined", () => {
// given
const boulderPath = join(TEST_DIR, ".sisyphus", "boulder.json")
writeFileSync(boulderPath, JSON.stringify({
active_plan: "/path/to/plan.md",
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
worktree_path: null,
}, null, 2))
// when
const state = readBoulderState(TEST_DIR)
// then
expect(state).not.toBeNull()
expect(state!.worktree_path).toBeUndefined()
})
test("#then readBoulderState should preserve valid worktree_path string", () => {
// given
const boulderPath = join(TEST_DIR, ".sisyphus", "boulder.json")
writeFileSync(boulderPath, JSON.stringify({
active_plan: "/path/to/plan.md",
started_at: "2026-01-02T10:00:00Z",
session_ids: ["session-1"],
plan_name: "test-plan",
worktree_path: "/valid/worktree/path",
}, null, 2))
// when
const state = readBoulderState(TEST_DIR)
// then
expect(state?.worktree_path).toBe("/valid/worktree/path")
})
})
```

View File

@@ -0,0 +1,78 @@
# Execution Plan — Fix atlas hook crash on missing worktree_path
## Phase 0: Setup
1. **Create worktree from origin/dev**:
```bash
git fetch origin dev
git worktree add ../omo-wt/fix-atlas-worktree-path-crash origin/dev
```
2. **Create feature branch**:
```bash
cd ../omo-wt/fix-atlas-worktree-path-crash
git checkout -b fix/atlas-worktree-path-crash
```
## Phase 1: Implement
### Step 1: Fix `readBoulderState()` in `src/features/boulder-state/storage.ts`
- Add `worktree_path` sanitization after JSON parse
- Ensure `worktree_path` is `string | undefined`, never `null` or other types
- This is the root cause: raw `JSON.parse` + `as BoulderState` cast allows type violations at runtime
### Step 2: Add defensive guard in `src/hooks/atlas/idle-event.ts`
- Before passing `boulderState.worktree_path` to `injectContinuation`, validate it's a string
- Apply same guard in the `scheduleRetry` callback (line 86)
- Ensures even if `readBoulderState` is bypassed, the idle handler won't crash
### Step 3: Add test coverage in `src/hooks/atlas/index.test.ts`
- Add test: boulder.json without `worktree_path` field → session.idle works
- Add test: boulder.json with `worktree_path: null` → session.idle works (no `[Worktree: null]` in prompt)
- Add test: `readBoulderState` sanitizes `null` worktree_path to `undefined`
- Follow existing given/when/then test pattern
### Step 4: Local validation
```bash
bun run typecheck
bun test src/hooks/atlas/
bun test src/features/boulder-state/
bun run build
```
### Step 5: Atomic commit
```bash
git add src/features/boulder-state/storage.ts src/hooks/atlas/idle-event.ts src/hooks/atlas/index.test.ts
git commit -m "fix(atlas): prevent crash when boulder.json missing worktree_path field
readBoulderState() performs unsafe cast of parsed JSON as BoulderState.
When worktree_path is absent or null in boulder.json, downstream code
in idle-event.ts could receive null where string|undefined is expected.
- Sanitize worktree_path in readBoulderState (reject non-string values)
- Add defensive typeof check in idle-event before passing to continuation
- Add test coverage for missing and null worktree_path scenarios"
```
## Phase 2: PR Creation
```bash
git push -u origin fix/atlas-worktree-path-crash
gh pr create \
--base dev \
--title "fix(atlas): prevent crash when boulder.json missing worktree_path" \
--body-file /tmp/pull-request-atlas-worktree-fix.md
```
## Phase 3: Verify Loop
- **Gate A (CI)**: `gh pr checks --watch` — wait for all checks green
- **Gate B (review-work)**: Run 5-agent review (Oracle goal, Oracle quality, Oracle security, QA execution, context mining)
- **Gate C (Cubic)**: Wait for cubic-dev-ai[bot] to respond "No issues found"
- On any failure: fix-commit-push, re-enter verify loop
## Phase 4: Merge
```bash
gh pr merge --squash --delete-branch
git worktree remove ../omo-wt/fix-atlas-worktree-path-crash
```

View File

@@ -0,0 +1,42 @@
# PR Title
```
fix(atlas): prevent crash when boulder.json missing worktree_path
```
# PR Body
## Summary
- Fix runtime type violation in atlas hook when `boulder.json` lacks `worktree_path` field
- Add `worktree_path` sanitization in `readBoulderState()` to reject non-string values (e.g., `null` from manual edits)
- Add defensive `typeof` guards in `idle-event.ts` before passing worktree path to continuation injection
- Add test coverage for missing and null `worktree_path` scenarios
## Problem
`readBoulderState()` in `src/features/boulder-state/storage.ts` casts raw `JSON.parse()` output directly as `BoulderState` via `return parsed as BoulderState`. This bypasses TypeScript's type system entirely at runtime.
When `boulder.json` is missing the `worktree_path` field (common for boulders created before worktree support was added, or created without `--worktree` flag), `boulderState.worktree_path` is `undefined` which is handled correctly. However, when boulder.json has `"worktree_path": null` (possible from manual edits, external tooling, or corrupted state), the runtime type becomes `null` which violates the TypeScript type `string | undefined`.
This `null` value propagates through:
1. `idle-event.ts:handleAtlasSessionIdle()``injectContinuation()``injectBoulderContinuation()`
2. `idle-event.ts:scheduleRetry()` callback → same chain
While the `boulder-continuation-injector.ts` handles falsy values via `worktreePath ? ... : ""`, the type mismatch can cause subtle downstream issues and violates the contract of the `BoulderState` interface.
## Changes
| File | Change |
|------|--------|
| `src/features/boulder-state/storage.ts` | Sanitize `worktree_path` in `readBoulderState()` — reject non-string values |
| `src/hooks/atlas/idle-event.ts` | Add `typeof` guards before passing worktree_path to continuation (2 call sites) |
| `src/hooks/atlas/index.test.ts` | Add 2 tests: missing worktree_path + null worktree_path in session.idle |
| `src/features/boulder-state/storage.test.ts` | Add 2 tests: sanitization of null + preservation of valid string |
## Testing
- `bun test src/hooks/atlas/` — all existing + new tests pass
- `bun test src/features/boulder-state/` — all existing + new tests pass
- `bun run typecheck` — clean
- `bun run build` — clean

View File

@@ -0,0 +1,87 @@
# Verification Strategy
## Gate A: CI (`gh pr checks --watch`)
### What CI runs (from `ci.yml`)
1. **Tests (split)**: Mock-heavy tests in isolation + batch tests
2. **Typecheck**: `bun run typecheck` (tsc --noEmit)
3. **Build**: `bun run build` (ESM + declarations + schema)
### Pre-push local validation
Before pushing, run the exact CI steps locally to catch failures early:
```bash
# Targeted test runs first (fast feedback)
bun test src/features/boulder-state/storage.test.ts
bun test src/hooks/atlas/index.test.ts
# Full test suite
bun test
# Type check
bun run typecheck
# Build
bun run build
```
### Failure handling
- **Test failure**: Read test output, fix code, create new commit (never amend pushed commits), push
- **Typecheck failure**: Run `lsp_diagnostics` on changed files, fix type errors, commit, push
- **Build failure**: Check build output for missing exports or circular deps, fix, commit, push
After each fix-commit-push: `gh pr checks --watch` to re-enter gate
## Gate B: review-work (5-agent review)
### The 5 parallel agents
1. **Oracle (goal/constraint verification)**: Checks the fix matches the stated problem — `worktree_path` crash resolved, no scope creep
2. **Oracle (code quality)**: Validates code follows existing patterns — factory pattern, given/when/then tests, < 200 LOC, no catch-all files
3. **Oracle (security)**: Ensures no new security issues — JSON parse injection, path traversal in worktree_path
4. **QA agent (hands-on execution)**: Actually runs the tests, checks `lsp_diagnostics` on changed files, verifies the fix in action
5. **Context mining agent**: Checks GitHub issues, git history, related PRs for context alignment
### Expected focus areas for this PR
- Oracle (goal): Does the sanitization in `readBoulderState` actually prevent the crash? Is the `typeof` guard necessary or redundant?
- Oracle (quality): Are the new tests following the given/when/then pattern? Do they use the same mock setup as existing tests?
- Oracle (security): Is the `worktree_path` value ever used in path operations without sanitization? (Answer: no, it's only used in template strings)
- QA: Run `bun test src/hooks/atlas/index.test.ts` — does the null worktree_path test actually trigger the bug before fix?
### Failure handling
- Each oracle produces a PASS/FAIL verdict with specific issues
- On FAIL: read the specific issue, fix in the worktree, commit, push, re-run review-work
- All 5 agents must PASS
## Gate C: Cubic (`cubic-dev-ai[bot]`)
### What Cubic checks
- Automated code review bot that analyzes the PR diff
- Looks for: type safety issues, missing error handling, test coverage gaps, anti-patterns
### Expected result
- "No issues found" for this small, focused fix
- 3 files changed (storage.ts, idle-event.ts, index.test.ts) + 1 test file
### Failure handling
- If Cubic flags an issue: evaluate if it's a real concern or false positive
- Real concern: fix, commit, push
- False positive: comment explaining why the flagged pattern is intentional
- Wait for Cubic to re-review after push
## Post-verification: Merge
Once all 3 gates pass:
```bash
gh pr merge --squash --delete-branch
git worktree remove ../omo-wt/fix-atlas-worktree-path-crash
```
On merge failure (conflicts):
```bash
cd ../omo-wt/fix-atlas-worktree-path-crash
git fetch origin dev
git rebase origin/dev
# Resolve conflicts if any
git push --force-with-lease
# Re-enter verify loop from Gate A
```

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 506000, "total_duration_seconds": 506}

View File

@@ -0,0 +1,11 @@
{
"run_id": "eval-2-without_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": false, "evidence": "No worktree. Steps go directly to creating branch and modifying files."},
{"text": "Fix is minimal — adds null check, doesn't refactor unrelated code", "passed": true, "evidence": "Focused fix though also adds try/catch in setTimeout (reasonable secondary fix)"},
{"text": "Test case added for the missing worktree_path scenario", "passed": true, "evidence": "Detailed test plan for missing/null/malformed boulder.json"},
{"text": "Verification loop includes all 3 gates", "passed": false, "evidence": "Only mentions CI pipeline (step 5). No review-work or Cubic."},
{"text": "References actual atlas hook files", "passed": true, "evidence": "References idle-event.ts, storage.ts with line numbers"},
{"text": "Branch name follows fix/ prefix convention", "passed": true, "evidence": "fix/atlas-hook-missing-worktree-path"}
]
}

View File

@@ -0,0 +1,334 @@
# Code Changes: Fix Atlas Hook Crash on Missing worktree_path
## Change 1: Harden `readBoulderState()` validation
**File:** `src/features/boulder-state/storage.ts`
### Before (lines 16-36):
```typescript
export function readBoulderState(directory: string): BoulderState | null {
const filePath = getBoulderFilePath(directory)
if (!existsSync(filePath)) {
return null
}
try {
const content = readFileSync(filePath, "utf-8")
const parsed = JSON.parse(content)
if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
return null
}
if (!Array.isArray(parsed.session_ids)) {
parsed.session_ids = []
}
return parsed as BoulderState
} catch {
return null
}
}
```
### After:
```typescript
export function readBoulderState(directory: string): BoulderState | null {
const filePath = getBoulderFilePath(directory)
if (!existsSync(filePath)) {
return null
}
try {
const content = readFileSync(filePath, "utf-8")
const parsed = JSON.parse(content)
if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) {
return null
}
if (typeof parsed.active_plan !== "string" || typeof parsed.plan_name !== "string") {
return null
}
if (!Array.isArray(parsed.session_ids)) {
parsed.session_ids = []
}
if (parsed.worktree_path !== undefined && typeof parsed.worktree_path !== "string") {
delete parsed.worktree_path
}
return parsed as BoulderState
} catch {
return null
}
}
```
**Rationale:** Validates that required fields (`active_plan`, `plan_name`) are strings. Strips `worktree_path` if it's present but not a string (e.g., `null`, number). This prevents downstream crashes from `existsSync(undefined)` and ensures type safety at the boundary.
---
## Change 2: Add try/catch in setTimeout retry callback
**File:** `src/hooks/atlas/idle-event.ts`
### Before (lines 62-88):
```typescript
sessionState.pendingRetryTimer = setTimeout(async () => {
sessionState.pendingRetryTimer = undefined
if (sessionState.promptFailureCount >= 2) return
if (sessionState.waitingForFinalWaveApproval) return
const currentBoulder = readBoulderState(ctx.directory)
if (!currentBoulder) return
if (!currentBoulder.session_ids?.includes(sessionID)) return
const currentProgress = getPlanProgress(currentBoulder.active_plan)
if (currentProgress.isComplete) return
if (options?.isContinuationStopped?.(sessionID)) return
if (options?.shouldSkipContinuation?.(sessionID)) return
if (hasRunningBackgroundTasks(sessionID, options)) return
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: currentBoulder.plan_name,
progress: currentProgress,
agent: currentBoulder.agent,
worktreePath: currentBoulder.worktree_path,
})
}, RETRY_DELAY_MS)
```
### After:
```typescript
sessionState.pendingRetryTimer = setTimeout(async () => {
sessionState.pendingRetryTimer = undefined
try {
if (sessionState.promptFailureCount >= 2) return
if (sessionState.waitingForFinalWaveApproval) return
const currentBoulder = readBoulderState(ctx.directory)
if (!currentBoulder) return
if (!currentBoulder.session_ids?.includes(sessionID)) return
const currentProgress = getPlanProgress(currentBoulder.active_plan)
if (currentProgress.isComplete) return
if (options?.isContinuationStopped?.(sessionID)) return
if (options?.shouldSkipContinuation?.(sessionID)) return
if (hasRunningBackgroundTasks(sessionID, options)) return
await injectContinuation({
ctx,
sessionID,
sessionState,
options,
planName: currentBoulder.plan_name,
progress: currentProgress,
agent: currentBoulder.agent,
worktreePath: currentBoulder.worktree_path,
})
} catch (error) {
log(`[${HOOK_NAME}] Retry continuation failed`, { sessionID, error: String(error) })
}
}, RETRY_DELAY_MS)
```
**Rationale:** The async callback in setTimeout creates a floating promise. Without try/catch, any error becomes an unhandled rejection that can crash the process. This is the critical safety net even after the `readBoulderState` fix.
---
## Change 3: Defensive guard in `getPlanProgress`
**File:** `src/features/boulder-state/storage.ts`
### Before (lines 115-118):
```typescript
export function getPlanProgress(planPath: string): PlanProgress {
if (!existsSync(planPath)) {
return { total: 0, completed: 0, isComplete: true }
}
```
### After:
```typescript
export function getPlanProgress(planPath: string): PlanProgress {
if (typeof planPath !== "string" || !existsSync(planPath)) {
return { total: 0, completed: 0, isComplete: true }
}
```
**Rationale:** Defense-in-depth. Even though `readBoulderState` now validates `active_plan`, the `getPlanProgress` function is a public API that could be called from other paths with invalid input. A `typeof` check before `existsSync` prevents the TypeError from `existsSync(undefined)`.
---
## Change 4: New tests
### File: `src/features/boulder-state/storage.test.ts` (additions)
```typescript
test("should return null when active_plan is missing", () => {
// given - boulder.json without active_plan
const boulderFile = join(SISYPHUS_DIR, "boulder.json")
writeFileSync(boulderFile, JSON.stringify({
started_at: "2026-01-01T00:00:00Z",
session_ids: ["ses-1"],
plan_name: "plan",
}))
// when
const result = readBoulderState(TEST_DIR)
// then
expect(result).toBeNull()
})
test("should return null when plan_name is missing", () => {
// given - boulder.json without plan_name
const boulderFile = join(SISYPHUS_DIR, "boulder.json")
writeFileSync(boulderFile, JSON.stringify({
active_plan: "/path/to/plan.md",
started_at: "2026-01-01T00:00:00Z",
session_ids: ["ses-1"],
}))
// when
const result = readBoulderState(TEST_DIR)
// then
expect(result).toBeNull()
})
test("should strip non-string worktree_path from boulder state", () => {
// given - boulder.json with worktree_path set to null
const boulderFile = join(SISYPHUS_DIR, "boulder.json")
writeFileSync(boulderFile, JSON.stringify({
active_plan: "/path/to/plan.md",
started_at: "2026-01-01T00:00:00Z",
session_ids: ["ses-1"],
plan_name: "plan",
worktree_path: null,
}))
// when
const result = readBoulderState(TEST_DIR)
// then
expect(result).not.toBeNull()
expect(result!.worktree_path).toBeUndefined()
})
test("should preserve valid worktree_path string", () => {
// given - boulder.json with valid worktree_path
const boulderFile = join(SISYPHUS_DIR, "boulder.json")
writeFileSync(boulderFile, JSON.stringify({
active_plan: "/path/to/plan.md",
started_at: "2026-01-01T00:00:00Z",
session_ids: ["ses-1"],
plan_name: "plan",
worktree_path: "/valid/worktree/path",
}))
// when
const result = readBoulderState(TEST_DIR)
// then
expect(result).not.toBeNull()
expect(result!.worktree_path).toBe("/valid/worktree/path")
})
```
### File: `src/features/boulder-state/storage.test.ts` (getPlanProgress additions)
```typescript
test("should handle undefined planPath without crashing", () => {
// given - undefined as planPath (from malformed boulder state)
// when
const progress = getPlanProgress(undefined as unknown as string)
// then
expect(progress.total).toBe(0)
expect(progress.isComplete).toBe(true)
})
```
### File: `src/hooks/atlas/index.test.ts` (additions to session.idle section)
```typescript
test("should handle boulder state without worktree_path gracefully", async () => {
// given - boulder state with incomplete plan, no worktree_path
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1\n- [x] Task 2")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [MAIN_SESSION_ID],
plan_name: "test-plan",
// worktree_path intentionally omitted
}
writeBoulderState(TEST_DIR, state)
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID: MAIN_SESSION_ID },
},
})
// then - should call prompt without crashing, continuation should not contain worktree context
expect(mockInput._promptMock).toHaveBeenCalled()
const callArgs = mockInput._promptMock.mock.calls[0][0]
expect(callArgs.body.parts[0].text).toContain("incomplete tasks")
expect(callArgs.body.parts[0].text).not.toContain("[Worktree:")
})
test("should include worktree context when worktree_path is present in boulder state", async () => {
// given - boulder state with worktree_path
const planPath = join(TEST_DIR, "test-plan.md")
writeFileSync(planPath, "# Plan\n- [ ] Task 1")
const state: BoulderState = {
active_plan: planPath,
started_at: "2026-01-02T10:00:00Z",
session_ids: [MAIN_SESSION_ID],
plan_name: "test-plan",
worktree_path: "/some/worktree/path",
}
writeBoulderState(TEST_DIR, state)
const mockInput = createMockPluginInput()
const hook = createAtlasHook(mockInput)
// when
await hook.handler({
event: {
type: "session.idle",
properties: { sessionID: MAIN_SESSION_ID },
},
})
// then - should include worktree context in continuation prompt
expect(mockInput._promptMock).toHaveBeenCalled()
const callArgs = mockInput._promptMock.mock.calls[0][0]
expect(callArgs.body.parts[0].text).toContain("[Worktree: /some/worktree/path]")
})
```
---
## Summary of Changes
| File | Change | Lines Modified |
|------|--------|---------------|
| `src/features/boulder-state/storage.ts` | Validate required fields + sanitize worktree_path + guard getPlanProgress | ~8 lines added |
| `src/hooks/atlas/idle-event.ts` | try/catch around setTimeout async callback | ~4 lines added |
| `src/features/boulder-state/storage.test.ts` | 5 new tests for validation | ~60 lines added |
| `src/hooks/atlas/index.test.ts` | 2 new tests for worktree_path handling | ~50 lines added |
Total: ~4 production lines changed, ~8 defensive lines added, ~110 test lines added.

View File

@@ -0,0 +1,86 @@
# Execution Plan: Fix Atlas Hook Crash on Missing worktree_path
## Bug Analysis
### Root Cause
`readBoulderState()` in `src/features/boulder-state/storage.ts` performs minimal validation when parsing `boulder.json`:
```typescript
const parsed = JSON.parse(content)
if (!parsed || typeof parsed !== "object" || Array.isArray(parsed)) return null
if (!Array.isArray(parsed.session_ids)) parsed.session_ids = []
return parsed as BoulderState // <-- unsafe cast, no field validation
```
It validates `session_ids` but NOT `active_plan`, `plan_name`, or `worktree_path`. This means a malformed `boulder.json` (e.g., `{}` or missing key fields) passes through and downstream code crashes.
### Crash Path
1. `boulder.json` is written without required fields (manual edit, corruption, partial write)
2. `readBoulderState()` returns it as `BoulderState` with `active_plan: undefined`
3. Multiple call sites pass `boulderState.active_plan` to `getPlanProgress(planPath: string)`:
- `src/hooks/atlas/idle-event.ts:72` (inside `setTimeout` callback - unhandled rejection!)
- `src/hooks/atlas/resolve-active-boulder-session.ts:21`
- `src/hooks/atlas/tool-execute-after.ts:74`
4. `getPlanProgress()` calls `existsSync(undefined)` which throws: `TypeError: The "path" argument must be of type string`
### worktree_path-Specific Issues
When `worktree_path` field is missing from `boulder.json`:
- The `idle-event.ts` `scheduleRetry` setTimeout callback (lines 62-88) has NO try/catch. An unhandled promise rejection from the async callback crashes the process.
- `readBoulderState()` returns `worktree_path: undefined` which itself is handled in `boulder-continuation-injector.ts` (line 42 uses truthiness check), but the surrounding code in the setTimeout lacks error protection.
### Secondary Issue: Unhandled Promise in setTimeout
In `idle-event.ts` lines 62-88:
```typescript
sessionState.pendingRetryTimer = setTimeout(async () => {
// ... no try/catch wrapper
const currentBoulder = readBoulderState(ctx.directory)
const currentProgress = getPlanProgress(currentBoulder.active_plan) // CRASH if active_plan undefined
// ...
}, RETRY_DELAY_MS)
```
The async callback creates a floating promise. Any thrown error becomes an unhandled rejection.
---
## Step-by-Step Plan
### Step 1: Harden `readBoulderState()` validation
**File:** `src/features/boulder-state/storage.ts`
- After the `session_ids` fix, add validation for `active_plan` and `plan_name` (required fields)
- Validate `worktree_path` is either `undefined` or a string (not `null`, not a number)
- Return `null` for boulder states with missing required fields
### Step 2: Add try/catch in setTimeout callback
**File:** `src/hooks/atlas/idle-event.ts`
- Wrap the `setTimeout` async callback body in try/catch
- Log errors with the atlas hook logger
### Step 3: Add defensive guard in `getPlanProgress`
**File:** `src/features/boulder-state/storage.ts`
- Add early return for non-string `planPath` argument
### Step 4: Add tests
**Files:**
- `src/features/boulder-state/storage.test.ts` - test missing/malformed fields
- `src/hooks/atlas/index.test.ts` - test atlas hook with boulder missing worktree_path
### Step 5: Run CI checks
```bash
bun run typecheck
bun test src/features/boulder-state/storage.test.ts
bun test src/hooks/atlas/index.test.ts
bun test # full suite
```
### Step 6: Create PR
- Branch: `fix/atlas-hook-missing-worktree-path`
- Target: `dev`
- Run CI and verify passes

View File

@@ -0,0 +1,23 @@
## Summary
- Fix crash in atlas hook when `boulder.json` is missing `worktree_path` (or other required fields) by hardening `readBoulderState()` validation
- Wrap the unprotected `setTimeout` retry callback in `idle-event.ts` with try/catch to prevent unhandled promise rejections
- Add defensive type guard in `getPlanProgress()` to prevent `existsSync(undefined)` TypeError
## Context
When `boulder.json` is malformed or manually edited to omit fields, `readBoulderState()` returns an object cast as `BoulderState` without validating required fields. Downstream callers like `getPlanProgress(boulderState.active_plan)` then pass `undefined` to `existsSync()`, which throws a TypeError. This crash is especially dangerous in the `setTimeout` retry callback in `idle-event.ts`, where the error becomes an unhandled promise rejection.
## Changes
### `src/features/boulder-state/storage.ts`
- `readBoulderState()`: Validate `active_plan` and `plan_name` are strings (return `null` if not)
- `readBoulderState()`: Strip `worktree_path` if present but not a string type
- `getPlanProgress()`: Add `typeof planPath !== "string"` guard before `existsSync`
### `src/hooks/atlas/idle-event.ts`
- Wrap `scheduleRetry` setTimeout async callback body in try/catch
### Tests
- `src/features/boulder-state/storage.test.ts`: 5 new tests for missing/malformed fields
- `src/hooks/atlas/index.test.ts`: 2 new tests for worktree_path presence/absence in continuation prompt

View File

@@ -0,0 +1,119 @@
# Verification Strategy
## 1. Unit Tests (Direct Verification)
### boulder-state storage tests
```bash
bun test src/features/boulder-state/storage.test.ts
```
Verify:
- `readBoulderState()` returns `null` when `active_plan` missing
- `readBoulderState()` returns `null` when `plan_name` missing
- `readBoulderState()` strips non-string `worktree_path` (e.g., `null`)
- `readBoulderState()` preserves valid string `worktree_path`
- `getPlanProgress(undefined)` returns safe default without crashing
- Existing tests still pass (session_ids defaults, empty object, etc.)
### atlas hook tests
```bash
bun test src/hooks/atlas/index.test.ts
```
Verify:
- session.idle handler works with boulder state missing `worktree_path` (no crash, prompt injected)
- session.idle handler includes `[Worktree: ...]` context when `worktree_path` IS present
- All 30+ existing tests still pass
### atlas idle-event lineage tests
```bash
bun test src/hooks/atlas/idle-event-lineage.test.ts
```
Verify existing lineage tests unaffected.
### start-work hook tests
```bash
bun test src/hooks/start-work/index.test.ts
```
Verify worktree-related start-work tests still pass (these create boulder states with/without `worktree_path`).
## 2. Type Safety
```bash
bun run typecheck
```
Verify zero new TypeScript errors. The changes are purely additive runtime guards that align with existing types (`worktree_path?: string`).
## 3. LSP Diagnostics on Changed Files
```
lsp_diagnostics on:
- src/features/boulder-state/storage.ts
- src/hooks/atlas/idle-event.ts
```
Verify zero errors/warnings.
## 4. Full Test Suite
```bash
bun test
```
Verify no regressions across the entire codebase.
## 5. Build
```bash
bun run build
```
Verify build succeeds.
## 6. Manual Smoke Test (Reproduction)
To manually verify the fix:
```bash
# Create a malformed boulder.json (missing worktree_path)
mkdir -p .sisyphus
echo '{"active_plan": ".sisyphus/plans/test.md", "plan_name": "test", "session_ids": ["ses-1"]}' > .sisyphus/boulder.json
# Create a plan file
mkdir -p .sisyphus/plans
echo '# Plan\n- [ ] Task 1' > .sisyphus/plans/test.md
# Start opencode - atlas hook should NOT crash when session.idle fires
# Verify /tmp/oh-my-opencode.log shows normal continuation behavior
```
Also test the extreme case:
```bash
# boulder.json with no required fields
echo '{}' > .sisyphus/boulder.json
# After fix: readBoulderState returns null, atlas hook gracefully skips
```
## 7. CI Pipeline
After pushing the branch, verify:
- `ci.yml` workflow passes: tests (split: mock-heavy isolated + batch), typecheck, build
- No new lint warnings
## 8. Edge Cases Covered
| Scenario | Expected Behavior |
|----------|-------------------|
| `boulder.json` = `{}` | `readBoulderState` returns `null` |
| `boulder.json` missing `active_plan` | `readBoulderState` returns `null` |
| `boulder.json` missing `plan_name` | `readBoulderState` returns `null` |
| `boulder.json` has `worktree_path: null` | Field stripped, returned as `undefined` |
| `boulder.json` has `worktree_path: 42` | Field stripped, returned as `undefined` |
| `boulder.json` has no `worktree_path` | Works normally, no crash |
| `boulder.json` has valid `worktree_path` | Preserved, included in continuation prompt |
| setTimeout retry with corrupted boulder.json | Error caught and logged, no process crash |
| `getPlanProgress(undefined)` | Returns `{ total: 0, completed: 0, isComplete: true }` |

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 325000, "total_duration_seconds": 325}

View File

@@ -0,0 +1,32 @@
{
"eval_id": 3,
"eval_name": "refactor-split-constants",
"prompt": "Refactor src/tools/delegate-task/constants.ts to split DEFAULT_CATEGORIES and CATEGORY_MODEL_REQUIREMENTS into separate files. Keep backward compatibility with the barrel export. Make a PR.",
"assertions": [
{
"id": "worktree-isolation",
"text": "Plan uses git worktree in a sibling directory",
"type": "manual"
},
{
"id": "multiple-atomic-commits",
"text": "Uses 2+ commits for the multi-file refactor",
"type": "manual"
},
{
"id": "barrel-export",
"text": "Maintains backward compatibility via barrel re-export in constants.ts or index.ts",
"type": "manual"
},
{
"id": "three-gates",
"text": "Verification loop includes all 3 gates",
"type": "manual"
},
{
"id": "real-constants-file",
"text": "References actual src/tools/delegate-task/constants.ts file and its exports",
"type": "manual"
}
]
}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-3-with_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "../omo-wt/refactor-delegate-task-constants"},
{"text": "Uses 2+ commits for the multi-file refactor", "passed": true, "evidence": "Commit 1: category defaults+appends, Commit 2: plan agent prompt+names"},
{"text": "Maintains backward compatibility via barrel re-export", "passed": true, "evidence": "constants.ts converted to re-export from 4 new files, full import map verified"},
{"text": "Verification loop includes all 3 gates", "passed": true, "evidence": "Gate A (CI), Gate B (review-work), Gate C (Cubic)"},
{"text": "References actual src/tools/delegate-task/constants.ts", "passed": true, "evidence": "654 lines analyzed, 4 responsibilities identified, full external+internal import map"}
]
}

View File

@@ -0,0 +1,221 @@
# Code Changes
## New File: `src/tools/delegate-task/default-categories.ts`
```typescript
import type { CategoryConfig } from "../../config/schema"
export const DEFAULT_CATEGORIES: Record<string, CategoryConfig> = {
"visual-engineering": { model: "google/gemini-3.1-pro", variant: "high" },
ultrabrain: { model: "openai/gpt-5.4", variant: "xhigh" },
deep: { model: "openai/gpt-5.3-codex", variant: "medium" },
artistry: { model: "google/gemini-3.1-pro", variant: "high" },
quick: { model: "anthropic/claude-haiku-4-5" },
"unspecified-low": { model: "anthropic/claude-sonnet-4-6" },
"unspecified-high": { model: "anthropic/claude-opus-4-6", variant: "max" },
writing: { model: "kimi-for-coding/k2p5" },
}
export const CATEGORY_DESCRIPTIONS: Record<string, string> = {
"visual-engineering": "Frontend, UI/UX, design, styling, animation",
ultrabrain: "Use ONLY for genuinely hard, logic-heavy tasks. Give clear goals only, not step-by-step instructions.",
deep: "Goal-oriented autonomous problem-solving. Thorough research before action. For hairy problems requiring deep understanding.",
artistry: "Complex problem-solving with unconventional, creative approaches - beyond standard patterns",
quick: "Trivial tasks - single file changes, typo fixes, simple modifications",
"unspecified-low": "Tasks that don't fit other categories, low effort required",
"unspecified-high": "Tasks that don't fit other categories, high effort required",
writing: "Documentation, prose, technical writing",
}
```
## New File: `src/tools/delegate-task/category-prompt-appends.ts`
```typescript
export const VISUAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on VISUAL/UI tasks.
...
</Category_Context>`
// (exact content from lines 8-95 of constants.ts)
export const ULTRABRAIN_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Category_Context>`
// (exact content from lines 97-117)
export const ARTISTRY_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Category_Context>`
// (exact content from lines 119-134)
export const QUICK_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Caller_Warning>`
// (exact content from lines 136-186)
export const UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Caller_Warning>`
// (exact content from lines 188-209)
export const UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Category_Context>`
// (exact content from lines 211-224)
export const WRITING_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Category_Context>`
// (exact content from lines 226-250)
export const DEEP_CATEGORY_PROMPT_APPEND = `<Category_Context>
...
</Category_Context>`
// (exact content from lines 252-281)
export const CATEGORY_PROMPT_APPENDS: Record<string, string> = {
"visual-engineering": VISUAL_CATEGORY_PROMPT_APPEND,
ultrabrain: ULTRABRAIN_CATEGORY_PROMPT_APPEND,
deep: DEEP_CATEGORY_PROMPT_APPEND,
artistry: ARTISTRY_CATEGORY_PROMPT_APPEND,
quick: QUICK_CATEGORY_PROMPT_APPEND,
"unspecified-low": UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND,
"unspecified-high": UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND,
writing: WRITING_CATEGORY_PROMPT_APPEND,
}
```
## New File: `src/tools/delegate-task/plan-agent-prompt.ts`
```typescript
import type {
AvailableCategory,
AvailableSkill,
} from "../../agents/dynamic-agent-prompt-builder"
import { truncateDescription } from "../../shared/truncate-description"
/**
* System prompt prepended to plan agent invocations.
* Instructs the plan agent to first gather context via explore/librarian agents,
* then summarize user requirements and clarify uncertainties before proceeding.
* Also MANDATES dependency graphs, parallel execution analysis, and category+skill recommendations.
*/
export const PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS = `<system>
...
</CRITICAL_REQUIREMENT_DEPENDENCY_PARALLEL_EXECUTION_CATEGORY_SKILLS>
`
// (exact content from lines 324-430)
export const PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS = `### REQUIRED OUTPUT FORMAT
...
`
// (exact content from lines 432-569)
function renderPlanAgentCategoryRows(categories: AvailableCategory[]): string[] {
const sorted = [...categories].sort((a, b) => a.name.localeCompare(b.name))
return sorted.map((category) => {
const bestFor = category.description || category.name
const model = category.model || ""
return `| \`${category.name}\` | ${bestFor} | ${model} |`
})
}
function renderPlanAgentSkillRows(skills: AvailableSkill[]): string[] {
const sorted = [...skills].sort((a, b) => a.name.localeCompare(b.name))
return sorted.map((skill) => {
const domain = truncateDescription(skill.description).trim() || skill.name
return `| \`${skill.name}\` | ${domain} |`
})
}
export function buildPlanAgentSkillsSection(
categories: AvailableCategory[] = [],
skills: AvailableSkill[] = []
): string {
const categoryRows = renderPlanAgentCategoryRows(categories)
const skillRows = renderPlanAgentSkillRows(skills)
return `### AVAILABLE CATEGORIES
| Category | Best For | Model |
|----------|----------|-------|
${categoryRows.join("\n")}
### AVAILABLE SKILLS (ALWAYS EVALUATE ALL)
Skills inject specialized expertise into the delegated agent.
YOU MUST evaluate EVERY skill and justify inclusions/omissions.
| Skill | Domain |
|-------|--------|
${skillRows.join("\n")}`
}
export function buildPlanAgentSystemPrepend(
categories: AvailableCategory[] = [],
skills: AvailableSkill[] = []
): string {
return [
PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS,
buildPlanAgentSkillsSection(categories, skills),
PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS,
].join("\n\n")
}
```
## New File: `src/tools/delegate-task/plan-agent-names.ts`
```typescript
/**
* List of agent names that should be treated as plan agents (receive plan system prompt).
* Case-insensitive matching is used.
*/
export const PLAN_AGENT_NAMES = ["plan"]
/**
* Check if the given agent name is a plan agent (receives plan system prompt).
*/
export function isPlanAgent(agentName: string | undefined): boolean {
if (!agentName) return false
const lowerName = agentName.toLowerCase().trim()
return PLAN_AGENT_NAMES.some(name => lowerName === name || lowerName.includes(name))
}
/**
* Plan family: plan + prometheus. Shares mutual delegation blocking and task tool permission.
* Does NOT share system prompt (only isPlanAgent controls that).
*/
export const PLAN_FAMILY_NAMES = ["plan", "prometheus"]
/**
* Check if the given agent belongs to the plan family (blocking + task permission).
*/
export function isPlanFamily(category: string): boolean
export function isPlanFamily(category: string | undefined): boolean
export function isPlanFamily(category: string | undefined): boolean {
if (!category) return false
const lowerCategory = category.toLowerCase().trim()
return PLAN_FAMILY_NAMES.some(
(name) => lowerCategory === name || lowerCategory.includes(name)
)
}
```
## Modified File: `src/tools/delegate-task/constants.ts`
```typescript
export * from "./default-categories"
export * from "./category-prompt-appends"
export * from "./plan-agent-prompt"
export * from "./plan-agent-names"
```
## Unchanged: `src/tools/delegate-task/index.ts`
```typescript
export { createDelegateTask, resolveCategoryConfig, buildSystemContent, buildTaskPrompt } from "./tools"
export type { DelegateTaskToolOptions, SyncSessionCreatedEvent, BuildSystemContentInput } from "./tools"
export type * from "./types"
export * from "./constants"
```
No changes needed. `export * from "./constants"` transitively re-exports everything from the 4 new files.

View File

@@ -0,0 +1,104 @@
# Execution Plan: Split delegate-task/constants.ts
## Phase 0: Setup
```bash
git fetch origin dev
git worktree add ../omo-wt/refactor-delegate-task-constants origin/dev -b refactor/split-delegate-task-constants
cd ../omo-wt/refactor-delegate-task-constants
```
## Phase 1: Implement
### Analysis
`src/tools/delegate-task/constants.ts` is 654 lines with 4 distinct responsibilities:
1. **Category defaults** (lines 285-316): `DEFAULT_CATEGORIES`, `CATEGORY_DESCRIPTIONS`
2. **Category prompt appends** (lines 8-305): 8 `*_CATEGORY_PROMPT_APPEND` string constants + `CATEGORY_PROMPT_APPENDS` record
3. **Plan agent prompts** (lines 318-620): `PLAN_AGENT_SYSTEM_PREPEND_*`, builder functions
4. **Plan agent names** (lines 626-654): `PLAN_AGENT_NAMES`, `isPlanAgent`, `PLAN_FAMILY_NAMES`, `isPlanFamily`
Note: `CATEGORY_MODEL_REQUIREMENTS` is already in `src/shared/model-requirements.ts`. No move needed.
### New Files
| File | Responsibility | ~LOC |
|------|---------------|------|
| `default-categories.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_DESCRIPTIONS` | ~40 |
| `category-prompt-appends.ts` | 8 prompt append constants + `CATEGORY_PROMPT_APPENDS` record | ~300 (exempt: prompt text) |
| `plan-agent-prompt.ts` | Plan agent system prompt constants + builder functions | ~250 (exempt: prompt text) |
| `plan-agent-names.ts` | `PLAN_AGENT_NAMES`, `isPlanAgent`, `PLAN_FAMILY_NAMES`, `isPlanFamily` | ~30 |
| `constants.ts` (updated) | Re-exports from all 4 files (backward compat) | ~5 |
### Commit 1: Extract category defaults and prompt appends
**Files changed**: 3 new + 1 modified
- Create `src/tools/delegate-task/default-categories.ts`
- Create `src/tools/delegate-task/category-prompt-appends.ts`
- Modify `src/tools/delegate-task/constants.ts` (remove extracted code, add re-exports)
### Commit 2: Extract plan agent prompt and names
**Files changed**: 2 new + 1 modified
- Create `src/tools/delegate-task/plan-agent-prompt.ts`
- Create `src/tools/delegate-task/plan-agent-names.ts`
- Modify `src/tools/delegate-task/constants.ts` (final: re-exports only)
### Local Validation
```bash
bun run typecheck
bun test src/tools/delegate-task/
bun run build
```
## Phase 2: PR Creation
```bash
git push -u origin refactor/split-delegate-task-constants
gh pr create --base dev --title "refactor(delegate-task): split constants.ts into focused modules" --body-file /tmp/pr-body.md
```
## Phase 3: Verify Loop
- **Gate A**: `gh pr checks --watch`
- **Gate B**: `/review-work` (5-agent review)
- **Gate C**: Wait for cubic-dev-ai[bot] "No issues found"
## Phase 4: Merge
```bash
gh pr merge --squash --delete-branch
git worktree remove ../omo-wt/refactor-delegate-task-constants
```
## Import Update Strategy
No import updates needed. Backward compatibility preserved through:
1. `constants.ts` re-exports everything from the 4 new files
2. `index.ts` already does `export * from "./constants"` (unchanged)
3. All external consumers import from `"../tools/delegate-task/constants"` or `"./constants"` -- both still work
### External Import Map (Verified -- NO CHANGES NEEDED)
| Consumer | Imports | Source Path |
|----------|---------|-------------|
| `src/agents/atlas/prompt-section-builder.ts` | `CATEGORY_DESCRIPTIONS` | `../../tools/delegate-task/constants` |
| `src/agents/builtin-agents.ts` | `CATEGORY_DESCRIPTIONS` | `../tools/delegate-task/constants` |
| `src/plugin/available-categories.ts` | `CATEGORY_DESCRIPTIONS` | `../tools/delegate-task/constants` |
| `src/plugin-handlers/category-config-resolver.ts` | `DEFAULT_CATEGORIES` | `../tools/delegate-task/constants` |
| `src/shared/merge-categories.ts` | `DEFAULT_CATEGORIES` | `../tools/delegate-task/constants` |
| `src/shared/merge-categories.test.ts` | `DEFAULT_CATEGORIES` | `../tools/delegate-task/constants` |
### Internal Import Map (Within delegate-task/ -- NO CHANGES NEEDED)
| Consumer | Imports |
|----------|---------|
| `categories.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS` |
| `tools.ts` | `CATEGORY_DESCRIPTIONS` |
| `prompt-builder.ts` | `buildPlanAgentSystemPrepend`, `isPlanAgent` |
| `subagent-resolver.ts` | `isPlanFamily` |
| `sync-continuation.ts` | `isPlanFamily` |
| `sync-prompt-sender.ts` | `isPlanFamily` |
| `tools.test.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS`, `CATEGORY_DESCRIPTIONS`, `isPlanAgent`, `PLAN_AGENT_NAMES`, `isPlanFamily`, `PLAN_FAMILY_NAMES` |

View File

@@ -0,0 +1,41 @@
# PR Title
```
refactor(delegate-task): split constants.ts into focused modules
```
# PR Body
## Summary
- Split the 654-line `src/tools/delegate-task/constants.ts` into 4 single-responsibility modules: `default-categories.ts`, `category-prompt-appends.ts`, `plan-agent-prompt.ts`, `plan-agent-names.ts`
- `constants.ts` becomes a pure re-export barrel, preserving all existing import paths (`from "./constants"` and `from "./delegate-task"`)
- Zero import changes across the codebase (6 external + 7 internal consumers verified)
## Motivation
`constants.ts` at 654 lines violates the project's 200 LOC soft limit (`modular-code-enforcement.md` rule) and bundles 4 unrelated responsibilities: category model configs, category prompt text, plan agent prompts, and plan agent name utilities.
## Changes
| New File | Responsibility | LOC |
|----------|---------------|-----|
| `default-categories.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_DESCRIPTIONS` | ~25 |
| `category-prompt-appends.ts` | 8 `*_PROMPT_APPEND` constants + `CATEGORY_PROMPT_APPENDS` record | ~300 (prompt-exempt) |
| `plan-agent-prompt.ts` | Plan system prompt constants + `buildPlanAgentSystemPrepend()` | ~250 (prompt-exempt) |
| `plan-agent-names.ts` | `PLAN_AGENT_NAMES`, `isPlanAgent`, `PLAN_FAMILY_NAMES`, `isPlanFamily` | ~30 |
| `constants.ts` (updated) | 4-line re-export barrel | 4 |
## Backward Compatibility
All 13 consumers continue importing from `"./constants"` or `"../tools/delegate-task/constants"` with zero changes. The re-export chain: new modules -> `constants.ts` -> `index.ts` -> external consumers.
## Note on CATEGORY_MODEL_REQUIREMENTS
`CATEGORY_MODEL_REQUIREMENTS` already lives in `src/shared/model-requirements.ts`. No move needed. The AGENTS.md reference to it being in `constants.ts` is outdated.
## Testing
- `bun run typecheck` passes
- `bun test src/tools/delegate-task/` passes (all existing tests untouched)
- `bun run build` succeeds

View File

@@ -0,0 +1,84 @@
# Verification Strategy
## Gate A: CI (Blocking)
```bash
gh pr checks --watch
```
**Expected CI jobs** (from `ci.yml`):
1. **Tests (split)**: mock-heavy isolated + batch `bun test`
2. **Typecheck**: `bun run typecheck` (tsc --noEmit)
3. **Build**: `bun run build`
4. **Schema auto-commit**: If schema changes detected
**Likely failure points**: None. This is a pure refactor with re-exports. No runtime behavior changes.
**If CI fails**:
- Typecheck error: Missing re-export or import cycle. Fix in the new modules, amend commit.
- Test error: `tools.test.ts` imports all symbols from `"./constants"`. Re-export barrel must be complete.
## Gate B: review-work (5-Agent Review)
Invoke after CI passes:
```
/review-work
```
**5 parallel agents**:
1. **Oracle (goal/constraint)**: Verify backward compat claim. Check all 13 import paths resolve.
2. **Oracle (code quality)**: Verify single-responsibility per file, LOC limits, no catch-all violations.
3. **Oracle (security)**: No security implications in this refactor.
4. **QA (hands-on execution)**: Run `bun test src/tools/delegate-task/` and verify all pass.
5. **Context miner**: Check no related open issues/PRs conflict.
**Expected verdict**: Pass. Pure structural refactor with no behavioral changes.
## Gate C: Cubic (External Bot)
Wait for `cubic-dev-ai[bot]` to post "No issues found" on the PR.
**If Cubic flags issues**: Likely false positives on "large number of new files". Address in PR comments if needed.
## Pre-Gate Local Validation (Before Push)
```bash
# In worktree
bun run typecheck
bun test src/tools/delegate-task/
bun run build
# Verify re-exports are complete
bun -e "import * as c from './src/tools/delegate-task/constants'; console.log(Object.keys(c).sort().join('\n'))"
```
Expected exports from constants.ts (13 total):
- `ARTISTRY_CATEGORY_PROMPT_APPEND`
- `CATEGORY_DESCRIPTIONS`
- `CATEGORY_PROMPT_APPENDS`
- `DEFAULT_CATEGORIES`
- `DEEP_CATEGORY_PROMPT_APPEND`
- `PLAN_AGENT_NAMES`
- `PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS`
- `PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS`
- `PLAN_FAMILY_NAMES`
- `QUICK_CATEGORY_PROMPT_APPEND`
- `ULTRABRAIN_CATEGORY_PROMPT_APPEND`
- `UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND`
- `UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND`
- `VISUAL_CATEGORY_PROMPT_APPEND`
- `WRITING_CATEGORY_PROMPT_APPEND`
- `buildPlanAgentSkillsSection`
- `buildPlanAgentSystemPrepend`
- `isPlanAgent`
- `isPlanFamily`
## Merge Strategy
```bash
gh pr merge --squash --delete-branch
git worktree remove ../omo-wt/refactor-delegate-task-constants
```
Squash merge collapses the 2 atomic commits into 1 clean commit on dev.

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 181000, "total_duration_seconds": 181}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-3-without_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": false, "evidence": "git checkout -b only, no worktree"},
{"text": "Uses 2+ commits for the multi-file refactor", "passed": false, "evidence": "Single atomic commit: 'refactor: split delegate-task constants and category model requirements'"},
{"text": "Maintains backward compatibility via barrel re-export", "passed": true, "evidence": "Re-exports from new files, zero consumer changes"},
{"text": "Verification loop includes all 3 gates", "passed": false, "evidence": "Only mentions typecheck/test/build. No review-work or Cubic."},
{"text": "References actual src/tools/delegate-task/constants.ts", "passed": true, "evidence": "654 lines, detailed responsibility breakdown, full import maps"}
]
}

View File

@@ -0,0 +1,342 @@
# Code Changes
## 1. NEW: `src/tools/delegate-task/default-categories.ts`
```typescript
import type { CategoryConfig } from "../../config/schema"
export const DEFAULT_CATEGORIES: Record<string, CategoryConfig> = {
"visual-engineering": { model: "google/gemini-3.1-pro", variant: "high" },
ultrabrain: { model: "openai/gpt-5.4", variant: "xhigh" },
deep: { model: "openai/gpt-5.3-codex", variant: "medium" },
artistry: { model: "google/gemini-3.1-pro", variant: "high" },
quick: { model: "anthropic/claude-haiku-4-5" },
"unspecified-low": { model: "anthropic/claude-sonnet-4-6" },
"unspecified-high": { model: "anthropic/claude-opus-4-6", variant: "max" },
writing: { model: "kimi-for-coding/k2p5" },
}
```
## 2. NEW: `src/tools/delegate-task/category-descriptions.ts`
```typescript
export const CATEGORY_DESCRIPTIONS: Record<string, string> = {
"visual-engineering": "Frontend, UI/UX, design, styling, animation",
ultrabrain: "Use ONLY for genuinely hard, logic-heavy tasks. Give clear goals only, not step-by-step instructions.",
deep: "Goal-oriented autonomous problem-solving. Thorough research before action. For hairy problems requiring deep understanding.",
artistry: "Complex problem-solving with unconventional, creative approaches - beyond standard patterns",
quick: "Trivial tasks - single file changes, typo fixes, simple modifications",
"unspecified-low": "Tasks that don't fit other categories, low effort required",
"unspecified-high": "Tasks that don't fit other categories, high effort required",
writing: "Documentation, prose, technical writing",
}
```
## 3. NEW: `src/tools/delegate-task/category-prompt-appends.ts`
```typescript
export const VISUAL_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on VISUAL/UI tasks.
...
</Category_Context>`
export const ULTRABRAIN_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on DEEP LOGICAL REASONING / COMPLEX ARCHITECTURE tasks.
...
</Category_Context>`
export const ARTISTRY_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on HIGHLY CREATIVE / ARTISTIC tasks.
...
</Category_Context>`
export const QUICK_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on SMALL / QUICK tasks.
...
</Caller_Warning>`
export const UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on tasks that don't fit specific categories but require moderate effort.
...
</Caller_Warning>`
export const UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on tasks that don't fit specific categories but require substantial effort.
...
</Category_Context>`
export const WRITING_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on WRITING / PROSE tasks.
...
</Category_Context>`
export const DEEP_CATEGORY_PROMPT_APPEND = `<Category_Context>
You are working on GOAL-ORIENTED AUTONOMOUS tasks.
...
</Category_Context>`
export const CATEGORY_PROMPT_APPENDS: Record<string, string> = {
"visual-engineering": VISUAL_CATEGORY_PROMPT_APPEND,
ultrabrain: ULTRABRAIN_CATEGORY_PROMPT_APPEND,
deep: DEEP_CATEGORY_PROMPT_APPEND,
artistry: ARTISTRY_CATEGORY_PROMPT_APPEND,
quick: QUICK_CATEGORY_PROMPT_APPEND,
"unspecified-low": UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND,
"unspecified-high": UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND,
writing: WRITING_CATEGORY_PROMPT_APPEND,
}
```
> Note: Each `*_CATEGORY_PROMPT_APPEND` contains the full template string from the original. Abbreviated with `...` here for readability. The actual code would contain the complete unmodified prompt text.
## 4. NEW: `src/tools/delegate-task/plan-agent-prompt.ts`
```typescript
import type {
AvailableCategory,
AvailableSkill,
} from "../../agents/dynamic-agent-prompt-builder"
import { truncateDescription } from "../../shared/truncate-description"
export const PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS = `<system>
BEFORE you begin planning, you MUST first understand the user's request deeply.
...
</CRITICAL_REQUIREMENT_DEPENDENCY_PARALLEL_EXECUTION_CATEGORY_SKILLS>
<FINAL_OUTPUT_FOR_CALLER>
...
</FINAL_OUTPUT_FOR_CALLER>
`
export const PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS = `### REQUIRED OUTPUT FORMAT
...
`
function renderPlanAgentCategoryRows(categories: AvailableCategory[]): string[] {
const sorted = [...categories].sort((a, b) => a.name.localeCompare(b.name))
return sorted.map((category) => {
const bestFor = category.description || category.name
const model = category.model || ""
return `| \`${category.name}\` | ${bestFor} | ${model} |`
})
}
function renderPlanAgentSkillRows(skills: AvailableSkill[]): string[] {
const sorted = [...skills].sort((a, b) => a.name.localeCompare(b.name))
return sorted.map((skill) => {
const domain = truncateDescription(skill.description).trim() || skill.name
return `| \`${skill.name}\` | ${domain} |`
})
}
export function buildPlanAgentSkillsSection(
categories: AvailableCategory[] = [],
skills: AvailableSkill[] = []
): string {
const categoryRows = renderPlanAgentCategoryRows(categories)
const skillRows = renderPlanAgentSkillRows(skills)
return `### AVAILABLE CATEGORIES
| Category | Best For | Model |
|----------|----------|-------|
${categoryRows.join("\n")}
### AVAILABLE SKILLS (ALWAYS EVALUATE ALL)
Skills inject specialized expertise into the delegated agent.
YOU MUST evaluate EVERY skill and justify inclusions/omissions.
| Skill | Domain |
|-------|--------|
${skillRows.join("\n")}`
}
export function buildPlanAgentSystemPrepend(
categories: AvailableCategory[] = [],
skills: AvailableSkill[] = []
): string {
return [
PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS,
buildPlanAgentSkillsSection(categories, skills),
PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS,
].join("\n\n")
}
```
> Note: Template strings abbreviated with `...`. Full unmodified content in the actual file.
## 5. NEW: `src/tools/delegate-task/plan-agent-identity.ts`
```typescript
/**
* List of agent names that should be treated as plan agents (receive plan system prompt).
* Case-insensitive matching is used.
*/
export const PLAN_AGENT_NAMES = ["plan"]
/**
* Check if the given agent name is a plan agent (receives plan system prompt).
*/
export function isPlanAgent(agentName: string | undefined): boolean {
if (!agentName) return false
const lowerName = agentName.toLowerCase().trim()
return PLAN_AGENT_NAMES.some(name => lowerName === name || lowerName.includes(name))
}
/**
* Plan family: plan + prometheus. Shares mutual delegation blocking and task tool permission.
* Does NOT share system prompt (only isPlanAgent controls that).
*/
export const PLAN_FAMILY_NAMES = ["plan", "prometheus"]
/**
* Check if the given agent belongs to the plan family (blocking + task permission).
*/
export function isPlanFamily(category: string): boolean
export function isPlanFamily(category: string | undefined): boolean
export function isPlanFamily(category: string | undefined): boolean {
if (!category) return false
const lowerCategory = category.toLowerCase().trim()
return PLAN_FAMILY_NAMES.some(
(name) => lowerCategory === name || lowerCategory.includes(name)
)
}
```
## 6. MODIFIED: `src/tools/delegate-task/constants.ts` (barrel re-export)
```typescript
export { DEFAULT_CATEGORIES } from "./default-categories"
export { CATEGORY_DESCRIPTIONS } from "./category-descriptions"
export {
VISUAL_CATEGORY_PROMPT_APPEND,
ULTRABRAIN_CATEGORY_PROMPT_APPEND,
ARTISTRY_CATEGORY_PROMPT_APPEND,
QUICK_CATEGORY_PROMPT_APPEND,
UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND,
UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND,
WRITING_CATEGORY_PROMPT_APPEND,
DEEP_CATEGORY_PROMPT_APPEND,
CATEGORY_PROMPT_APPENDS,
} from "./category-prompt-appends"
export {
PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS,
PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS,
buildPlanAgentSkillsSection,
buildPlanAgentSystemPrepend,
} from "./plan-agent-prompt"
export {
PLAN_AGENT_NAMES,
isPlanAgent,
PLAN_FAMILY_NAMES,
isPlanFamily,
} from "./plan-agent-identity"
```
## 7. NEW: `src/shared/category-model-requirements.ts`
```typescript
import type { ModelRequirement } from "./model-requirements"
export const CATEGORY_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
"visual-engineering": {
fallbackChain: [
{
providers: ["google", "github-copilot", "opencode"],
model: "gemini-3.1-pro",
variant: "high",
},
{ providers: ["zai-coding-plan", "opencode"], model: "glm-5" },
{
providers: ["anthropic", "github-copilot", "opencode"],
model: "claude-opus-4-6",
variant: "max",
},
{ providers: ["opencode-go"], model: "glm-5" },
{ providers: ["kimi-for-coding"], model: "k2p5" },
],
},
ultrabrain: {
fallbackChain: [
// ... full content from original
],
},
deep: {
fallbackChain: [
// ... full content from original
],
requiresModel: "gpt-5.3-codex",
},
artistry: {
fallbackChain: [
// ... full content from original
],
requiresModel: "gemini-3.1-pro",
},
quick: {
fallbackChain: [
// ... full content from original
],
},
"unspecified-low": {
fallbackChain: [
// ... full content from original
],
},
"unspecified-high": {
fallbackChain: [
// ... full content from original
],
},
writing: {
fallbackChain: [
// ... full content from original
],
},
}
```
> Note: Each category's `fallbackChain` contains the exact same entries as the original `model-requirements.ts`. Abbreviated here.
## 8. MODIFIED: `src/shared/model-requirements.ts`
**Remove** `CATEGORY_MODEL_REQUIREMENTS` from the file body. **Add** re-export at the end:
```typescript
export type FallbackEntry = {
providers: string[];
model: string;
variant?: string;
};
export type ModelRequirement = {
fallbackChain: FallbackEntry[];
variant?: string;
requiresModel?: string;
requiresAnyModel?: boolean;
requiresProvider?: string[];
};
export const AGENT_MODEL_REQUIREMENTS: Record<string, ModelRequirement> = {
// ... unchanged, full agent entries stay here
};
export { CATEGORY_MODEL_REQUIREMENTS } from "./category-model-requirements"
```
## Summary of Changes
| File | Lines Before | Lines After | Action |
|------|-------------|-------------|--------|
| `constants.ts` | 654 | ~25 | Rewrite as barrel re-export |
| `default-categories.ts` | - | ~15 | **NEW** |
| `category-descriptions.ts` | - | ~12 | **NEW** |
| `category-prompt-appends.ts` | - | ~280 | **NEW** (mostly exempt prompt text) |
| `plan-agent-prompt.ts` | - | ~270 | **NEW** (mostly exempt prompt text) |
| `plan-agent-identity.ts` | - | ~35 | **NEW** |
| `model-requirements.ts` | 311 | ~165 | Remove CATEGORY_MODEL_REQUIREMENTS |
| `category-model-requirements.ts` | - | ~150 | **NEW** |
**Zero consumer files modified.** Backward compatibility maintained through barrel re-exports.

View File

@@ -0,0 +1,131 @@
# Execution Plan: Refactor constants.ts
## Context
`src/tools/delegate-task/constants.ts` is **654 lines** with 6 distinct responsibilities. Violates the 200 LOC modular-code-enforcement rule. `CATEGORY_MODEL_REQUIREMENTS` is actually in `src/shared/model-requirements.ts` (311 lines, also violating 200 LOC), not in `constants.ts`.
## Pre-Flight Analysis
### Current `constants.ts` responsibilities:
1. **Category prompt appends** (8 template strings, ~274 LOC prompt text)
2. **DEFAULT_CATEGORIES** (Record<string, CategoryConfig>, ~10 LOC)
3. **CATEGORY_PROMPT_APPENDS** (map of category->prompt, ~10 LOC)
4. **CATEGORY_DESCRIPTIONS** (map of category->description, ~10 LOC)
5. **Plan agent prompts** (2 template strings + 4 builder functions, ~250 LOC prompt text)
6. **Plan agent identity utils** (`isPlanAgent`, `isPlanFamily`, ~30 LOC)
### Current `model-requirements.ts` responsibilities:
1. Types (`FallbackEntry`, `ModelRequirement`)
2. `AGENT_MODEL_REQUIREMENTS` (~146 LOC)
3. `CATEGORY_MODEL_REQUIREMENTS` (~148 LOC)
### Import dependency map for `constants.ts`:
**Internal consumers (within delegate-task/):**
| File | Imports |
|------|---------|
| `categories.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS` |
| `tools.ts` | `CATEGORY_DESCRIPTIONS` |
| `tools.test.ts` | `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS`, `CATEGORY_DESCRIPTIONS`, `isPlanAgent`, `PLAN_AGENT_NAMES`, `isPlanFamily`, `PLAN_FAMILY_NAMES` |
| `prompt-builder.ts` | `buildPlanAgentSystemPrepend`, `isPlanAgent` |
| `subagent-resolver.ts` | `isPlanFamily` |
| `sync-continuation.ts` | `isPlanFamily` |
| `sync-prompt-sender.ts` | `isPlanFamily` |
| `index.ts` | `export * from "./constants"` (barrel) |
**External consumers (import from `"../../tools/delegate-task/constants"`):**
| File | Imports |
|------|---------|
| `agents/atlas/prompt-section-builder.ts` | `CATEGORY_DESCRIPTIONS` |
| `agents/builtin-agents.ts` | `CATEGORY_DESCRIPTIONS` |
| `plugin/available-categories.ts` | `CATEGORY_DESCRIPTIONS` |
| `plugin-handlers/category-config-resolver.ts` | `DEFAULT_CATEGORIES` |
| `shared/merge-categories.ts` | `DEFAULT_CATEGORIES` |
| `shared/merge-categories.test.ts` | `DEFAULT_CATEGORIES` |
**External consumers of `CATEGORY_MODEL_REQUIREMENTS`:**
| File | Import path |
|------|-------------|
| `tools/delegate-task/categories.ts` | `../../shared/model-requirements` |
## Step-by-Step Execution
### Step 1: Create branch
```bash
git checkout -b refactor/split-category-constants dev
```
### Step 2: Split `constants.ts` into 5 focused files
#### 2a. Create `default-categories.ts`
- Move `DEFAULT_CATEGORIES` record
- Import `CategoryConfig` type from config schema
- ~15 LOC
#### 2b. Create `category-descriptions.ts`
- Move `CATEGORY_DESCRIPTIONS` record
- No dependencies
- ~12 LOC
#### 2c. Create `category-prompt-appends.ts`
- Move all 8 `*_CATEGORY_PROMPT_APPEND` template string constants
- Move `CATEGORY_PROMPT_APPENDS` mapping record
- No dependencies (all self-contained template strings)
- ~280 LOC (mostly prompt text, exempt from 200 LOC per modular-code-enforcement)
#### 2d. Create `plan-agent-prompt.ts`
- Move `PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS`
- Move `PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS`
- Move `renderPlanAgentCategoryRows()`, `renderPlanAgentSkillRows()`
- Move `buildPlanAgentSkillsSection()`, `buildPlanAgentSystemPrepend()`
- Imports: `AvailableCategory`, `AvailableSkill` from agents, `truncateDescription` from shared
- ~270 LOC (mostly prompt text, exempt)
#### 2e. Create `plan-agent-identity.ts`
- Move `PLAN_AGENT_NAMES`, `isPlanAgent()`
- Move `PLAN_FAMILY_NAMES`, `isPlanFamily()`
- No dependencies
- ~35 LOC
### Step 3: Convert `constants.ts` to barrel re-export file
Replace entire contents with re-exports from the 5 new files. This maintains 100% backward compatibility for all existing importers.
### Step 4: Split `model-requirements.ts`
#### 4a. Create `src/shared/category-model-requirements.ts`
- Move `CATEGORY_MODEL_REQUIREMENTS` record
- Import `ModelRequirement` type from `./model-requirements`
- ~150 LOC
#### 4b. Update `model-requirements.ts`
- Remove `CATEGORY_MODEL_REQUIREMENTS`
- Add re-export: `export { CATEGORY_MODEL_REQUIREMENTS } from "./category-model-requirements"`
- Keep types (`FallbackEntry`, `ModelRequirement`) and `AGENT_MODEL_REQUIREMENTS`
- ~165 LOC (now under 200)
### Step 5: Verify no import breakage
- Run `bun run typecheck` to confirm all imports resolve
- Run `bun test` to confirm no behavioral regressions
- Run `bun run build` to confirm build succeeds
### Step 6: Verify LSP diagnostics clean
- Check `lsp_diagnostics` on all new and modified files
### Step 7: Commit and create PR
- Single atomic commit: `refactor: split delegate-task constants and category model requirements into focused modules`
- Create PR with description
## Files Modified
| File | Action |
|------|--------|
| `src/tools/delegate-task/constants.ts` | Rewrite as barrel re-export |
| `src/tools/delegate-task/default-categories.ts` | **NEW** |
| `src/tools/delegate-task/category-descriptions.ts` | **NEW** |
| `src/tools/delegate-task/category-prompt-appends.ts` | **NEW** |
| `src/tools/delegate-task/plan-agent-prompt.ts` | **NEW** |
| `src/tools/delegate-task/plan-agent-identity.ts` | **NEW** |
| `src/shared/model-requirements.ts` | Remove CATEGORY_MODEL_REQUIREMENTS, add re-export |
| `src/shared/category-model-requirements.ts` | **NEW** |
**Zero changes to any consumer files.** All existing imports work via barrel re-exports.

View File

@@ -0,0 +1,39 @@
## Summary
- Split `src/tools/delegate-task/constants.ts` (654 LOC, 6 responsibilities) into 5 focused modules: `default-categories.ts`, `category-descriptions.ts`, `category-prompt-appends.ts`, `plan-agent-prompt.ts`, `plan-agent-identity.ts`
- Extract `CATEGORY_MODEL_REQUIREMENTS` from `src/shared/model-requirements.ts` (311 LOC) into `category-model-requirements.ts`, bringing both files under the 200 LOC limit
- Convert original files to barrel re-exports for 100% backward compatibility (zero consumer changes)
## Motivation
Both files violate the project's 200 LOC modular-code-enforcement rule. `constants.ts` mixed 6 unrelated responsibilities (category configs, prompt templates, plan agent builders, identity utils). `model-requirements.ts` mixed agent and category model requirements.
## Changes
### `src/tools/delegate-task/`
| New File | Responsibility |
|----------|---------------|
| `default-categories.ts` | `DEFAULT_CATEGORIES` record |
| `category-descriptions.ts` | `CATEGORY_DESCRIPTIONS` record |
| `category-prompt-appends.ts` | 8 prompt template constants + `CATEGORY_PROMPT_APPENDS` map |
| `plan-agent-prompt.ts` | Plan agent system prompts + builder functions |
| `plan-agent-identity.ts` | `isPlanAgent`, `isPlanFamily` + name lists |
`constants.ts` is now a barrel re-export file (~25 LOC).
### `src/shared/`
| New File | Responsibility |
|----------|---------------|
| `category-model-requirements.ts` | `CATEGORY_MODEL_REQUIREMENTS` record |
`model-requirements.ts` retains types + `AGENT_MODEL_REQUIREMENTS` and re-exports `CATEGORY_MODEL_REQUIREMENTS`.
## Backward Compatibility
All existing import paths (`from "./constants"`, `from "../../tools/delegate-task/constants"`, `from "../../shared/model-requirements"`) continue to work unchanged. Zero consumer files modified.
## Testing
- `bun run typecheck` passes
- `bun test` passes (existing `tools.test.ts` validates all re-exported symbols)
- `bun run build` succeeds

View File

@@ -0,0 +1,128 @@
# Verification Strategy
## 1. Type Safety
### 1a. LSP diagnostics on all new files
```
lsp_diagnostics("src/tools/delegate-task/default-categories.ts")
lsp_diagnostics("src/tools/delegate-task/category-descriptions.ts")
lsp_diagnostics("src/tools/delegate-task/category-prompt-appends.ts")
lsp_diagnostics("src/tools/delegate-task/plan-agent-prompt.ts")
lsp_diagnostics("src/tools/delegate-task/plan-agent-identity.ts")
lsp_diagnostics("src/shared/category-model-requirements.ts")
```
### 1b. LSP diagnostics on modified files
```
lsp_diagnostics("src/tools/delegate-task/constants.ts")
lsp_diagnostics("src/shared/model-requirements.ts")
```
### 1c. Full typecheck
```bash
bun run typecheck
```
Expected: 0 errors. This confirms all 14 consumer files (8 internal + 6 external) resolve their imports correctly through the barrel re-exports.
## 2. Behavioral Regression
### 2a. Existing test suite
```bash
bun test src/tools/delegate-task/tools.test.ts
```
This test file imports `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS`, `CATEGORY_DESCRIPTIONS`, `isPlanAgent`, `PLAN_AGENT_NAMES`, `isPlanFamily`, `PLAN_FAMILY_NAMES` from `./constants`. If the barrel re-export is correct, all these tests pass unchanged.
### 2b. Category resolver tests
```bash
bun test src/tools/delegate-task/category-resolver.test.ts
```
This exercises `resolveCategoryConfig()` which imports `DEFAULT_CATEGORIES` and `CATEGORY_PROMPT_APPENDS` from `./constants` and `CATEGORY_MODEL_REQUIREMENTS` from `../../shared/model-requirements`.
### 2c. Model selection tests
```bash
bun test src/tools/delegate-task/model-selection.test.ts
```
### 2d. Merge categories tests
```bash
bun test src/shared/merge-categories.test.ts
```
Imports `DEFAULT_CATEGORIES` from `../tools/delegate-task/constants` (external path).
### 2e. Full test suite
```bash
bun test
```
## 3. Build Verification
```bash
bun run build
```
Confirms ESM bundle + declarations emit correctly with the new file structure.
## 4. Export Completeness Verification
### 4a. Verify `constants.ts` re-exports match original exports
Cross-check that every symbol previously exported from `constants.ts` is still exported. The original file exported these symbols:
- `VISUAL_CATEGORY_PROMPT_APPEND`
- `ULTRABRAIN_CATEGORY_PROMPT_APPEND`
- `ARTISTRY_CATEGORY_PROMPT_APPEND`
- `QUICK_CATEGORY_PROMPT_APPEND`
- `UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND`
- `UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND`
- `WRITING_CATEGORY_PROMPT_APPEND`
- `DEEP_CATEGORY_PROMPT_APPEND`
- `DEFAULT_CATEGORIES`
- `CATEGORY_PROMPT_APPENDS`
- `CATEGORY_DESCRIPTIONS`
- `PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS`
- `PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS`
- `buildPlanAgentSkillsSection`
- `buildPlanAgentSystemPrepend`
- `PLAN_AGENT_NAMES`
- `isPlanAgent`
- `PLAN_FAMILY_NAMES`
- `isPlanFamily`
All 19 must be re-exported from the barrel.
### 4b. Verify `model-requirements.ts` re-exports match original exports
Original exports: `FallbackEntry`, `ModelRequirement`, `AGENT_MODEL_REQUIREMENTS`, `CATEGORY_MODEL_REQUIREMENTS`. All 4 must still be available.
## 5. LOC Compliance Check
Verify each new file is under 200 LOC (excluding prompt template text per modular-code-enforcement rule):
| File | Expected Total LOC | Non-prompt LOC | Compliant? |
|------|-------------------|----------------|------------|
| `default-categories.ts` | ~15 | ~15 | Yes |
| `category-descriptions.ts` | ~12 | ~12 | Yes |
| `category-prompt-appends.ts` | ~280 | ~15 | Yes (prompt exempt) |
| `plan-agent-prompt.ts` | ~270 | ~40 | Yes (prompt exempt) |
| `plan-agent-identity.ts` | ~35 | ~35 | Yes |
| `category-model-requirements.ts` | ~150 | ~150 | Yes |
| `model-requirements.ts` (after) | ~165 | ~165 | Yes |
| `constants.ts` (after) | ~25 | ~25 | Yes |
## 6. Consumer Impact Matrix
Verify zero consumer files need changes:
| Consumer File | Import Path | Should Still Work? |
|--------------|-------------|-------------------|
| `delegate-task/categories.ts` | `./constants` | Yes (barrel) |
| `delegate-task/tools.ts` | `./constants` | Yes (barrel) |
| `delegate-task/tools.test.ts` | `./constants` | Yes (barrel) |
| `delegate-task/prompt-builder.ts` | `./constants` | Yes (barrel) |
| `delegate-task/subagent-resolver.ts` | `./constants` | Yes (barrel) |
| `delegate-task/sync-continuation.ts` | `./constants` | Yes (barrel) |
| `delegate-task/sync-prompt-sender.ts` | `./constants` | Yes (barrel) |
| `delegate-task/index.ts` | `./constants` | Yes (barrel) |
| `agents/atlas/prompt-section-builder.ts` | `../../tools/delegate-task/constants` | Yes (barrel) |
| `agents/builtin-agents.ts` | `../tools/delegate-task/constants` | Yes (barrel) |
| `plugin/available-categories.ts` | `../tools/delegate-task/constants` | Yes (barrel) |
| `plugin-handlers/category-config-resolver.ts` | `../tools/delegate-task/constants` | Yes (barrel) |
| `shared/merge-categories.ts` | `../tools/delegate-task/constants` | Yes (barrel) |
| `shared/merge-categories.test.ts` | `../tools/delegate-task/constants` | Yes (barrel) |
| `delegate-task/categories.ts` | `../../shared/model-requirements` | Yes (re-export) |

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 229000, "total_duration_seconds": 229}

View File

@@ -0,0 +1,32 @@
{
"eval_id": 4,
"eval_name": "new-mcp-arxiv-casual",
"prompt": "implement issue #100 - we need to add a new built-in MCP for arxiv paper search. just the basic search endpoint, nothing fancy. pr it",
"assertions": [
{
"id": "worktree-isolation",
"text": "Plan uses git worktree in a sibling directory",
"type": "manual"
},
{
"id": "follows-mcp-pattern",
"text": "New MCP follows existing pattern from src/mcp/ (websearch, context7, grep_app)",
"type": "manual"
},
{
"id": "three-gates",
"text": "Verification loop includes all 3 gates",
"type": "manual"
},
{
"id": "pr-targets-dev",
"text": "PR targets dev branch",
"type": "manual"
},
{
"id": "local-validation",
"text": "Runs local checks before pushing",
"type": "manual"
}
]
}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-4-with_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "../omo-wt/feat/arxiv-mcp"},
{"text": "New MCP follows existing pattern from src/mcp/", "passed": true, "evidence": "Follows context7.ts and grep-app.ts static export pattern"},
{"text": "Verification loop includes all 3 gates", "passed": true, "evidence": "Gate A (CI), Gate B (review-work 5 agents), Gate C (Cubic)"},
{"text": "PR targets dev branch", "passed": true, "evidence": "--base dev"},
{"text": "Runs local checks before pushing", "passed": true, "evidence": "bun run typecheck, bun test src/mcp/, bun run build"}
]
}

View File

@@ -0,0 +1,143 @@
# Code Changes: Issue #100 - Built-in arXiv MCP
## 1. NEW FILE: `src/mcp/arxiv.ts`
```typescript
export const arxiv = {
type: "remote" as const,
url: "https://mcp.arxiv.org",
enabled: true,
oauth: false as const,
}
```
Pattern: identical to `grep-app.ts` (static export, no auth, no config factory needed).
## 2. MODIFY: `src/mcp/types.ts`
```typescript
import { z } from "zod"
export const McpNameSchema = z.enum(["websearch", "context7", "grep_app", "arxiv"])
export type McpName = z.infer<typeof McpNameSchema>
export const AnyMcpNameSchema = z.string().min(1)
export type AnyMcpName = z.infer<typeof AnyMcpNameSchema>
```
Change: add `"arxiv"` to `McpNameSchema` enum.
## 3. MODIFY: `src/mcp/index.ts`
```typescript
import { createWebsearchConfig } from "./websearch"
import { context7 } from "./context7"
import { grep_app } from "./grep-app"
import { arxiv } from "./arxiv"
import type { OhMyOpenCodeConfig } from "../config/schema"
export { McpNameSchema, type McpName } from "./types"
type RemoteMcpConfig = {
type: "remote"
url: string
enabled: boolean
headers?: Record<string, string>
oauth?: false
}
export function createBuiltinMcps(disabledMcps: string[] = [], config?: OhMyOpenCodeConfig) {
const mcps: Record<string, RemoteMcpConfig> = {}
if (!disabledMcps.includes("websearch")) {
mcps.websearch = createWebsearchConfig(config?.websearch)
}
if (!disabledMcps.includes("context7")) {
mcps.context7 = context7
}
if (!disabledMcps.includes("grep_app")) {
mcps.grep_app = grep_app
}
if (!disabledMcps.includes("arxiv")) {
mcps.arxiv = arxiv
}
return mcps
}
```
Changes: import `arxiv`, add conditional block.
## 4. NEW FILE: `src/mcp/arxiv.test.ts`
```typescript
import { describe, expect, test } from "bun:test"
import { arxiv } from "./arxiv"
describe("arxiv MCP configuration", () => {
test("should have correct remote config shape", () => {
// given
// arxiv is a static export
// when
const config = arxiv
// then
expect(config.type).toBe("remote")
expect(config.url).toBe("https://mcp.arxiv.org")
expect(config.enabled).toBe(true)
expect(config.oauth).toBe(false)
})
})
```
## 5. MODIFY: `src/mcp/index.test.ts`
Changes needed:
- Test "should return all MCPs when disabled_mcps is empty": add `expect(result).toHaveProperty("arxiv")`, change length to 4
- Test "should filter out all built-in MCPs when all disabled": add `"arxiv"` to disabledMcps array, add `expect(result).not.toHaveProperty("arxiv")`
- Test "should handle empty disabled_mcps by default": add `expect(result).toHaveProperty("arxiv")`, change length to 4
- Test "should only filter built-in MCPs, ignoring unknown names": add `expect(result).toHaveProperty("arxiv")`, change length to 4
New test to add:
```typescript
test("should filter out arxiv when disabled", () => {
// given
const disabledMcps = ["arxiv"]
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).toHaveProperty("websearch")
expect(result).toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
expect(result).not.toHaveProperty("arxiv")
expect(Object.keys(result)).toHaveLength(3)
})
```
## 6. MODIFY: `src/mcp/AGENTS.md`
Add row to built-in MCPs table:
```
| **arxiv** | `mcp.arxiv.org` | None | arXiv paper search |
```
## Files touched summary
| File | Action |
|------|--------|
| `src/mcp/arxiv.ts` | NEW |
| `src/mcp/arxiv.test.ts` | NEW |
| `src/mcp/types.ts` | MODIFY (add enum value) |
| `src/mcp/index.ts` | MODIFY (import + conditional block) |
| `src/mcp/index.test.ts` | MODIFY (update counts + new test) |
| `src/mcp/AGENTS.md` | MODIFY (add table row) |

View File

@@ -0,0 +1,82 @@
# Execution Plan: Issue #100 - Built-in arXiv MCP
## Phase 0: Setup
1. `git fetch origin dev`
2. `git worktree add ../omo-wt/feat/arxiv-mcp origin/dev`
3. `cd ../omo-wt/feat/arxiv-mcp`
4. `git checkout -b feat/arxiv-mcp`
## Phase 1: Implement
### Step 1: Create `src/mcp/arxiv.ts`
- Follow static export pattern (same as `context7.ts` and `grep-app.ts`)
- arXiv API is public, no auth needed
- URL: `https://mcp.arxiv.org` (hypothetical remote MCP endpoint)
- If no remote MCP exists for arXiv, this would need to be a stdio MCP or a custom HTTP wrapper. For this plan, we assume a remote MCP endpoint pattern consistent with existing built-ins.
### Step 2: Update `src/mcp/types.ts`
- Add `"arxiv"` to `McpNameSchema` enum: `z.enum(["websearch", "context7", "grep_app", "arxiv"])`
### Step 3: Update `src/mcp/index.ts`
- Import `arxiv` from `"./arxiv"`
- Add conditional block in `createBuiltinMcps()`:
```typescript
if (!disabledMcps.includes("arxiv")) {
mcps.arxiv = arxiv
}
```
### Step 4: Create `src/mcp/arxiv.test.ts`
- Test arXiv config shape (type, url, enabled, oauth)
- Follow pattern from existing tests (given/when/then)
### Step 5: Update `src/mcp/index.test.ts`
- Update expected MCP count from 3 to 4
- Add `"arxiv"` to `toHaveProperty` checks
- Add `"arxiv"` to the "all disabled" test case
### Step 6: Update `src/mcp/AGENTS.md`
- Add arxiv row to the built-in MCPs table
### Step 7: Local validation
- `bun run typecheck`
- `bun test src/mcp/`
- `bun run build`
### Atomic commits (in order):
1. `feat(mcp): add arxiv paper search built-in MCP` - arxiv.ts + types.ts update
2. `test(mcp): add arxiv MCP tests` - arxiv.test.ts + index.test.ts updates
3. `docs(mcp): update AGENTS.md with arxiv MCP` - AGENTS.md update
## Phase 2: PR Creation
1. `git push -u origin feat/arxiv-mcp`
2. `gh pr create --base dev --title "feat(mcp): add built-in arXiv paper search MCP" --body-file /tmp/pull-request-arxiv-mcp-*.md`
## Phase 3: Verify Loop
### Gate A: CI
- Wait for `ci.yml` workflow (tests, typecheck, build)
- `gh run watch` or poll `gh pr checks`
### Gate B: review-work
- Run `/review-work` skill (5-agent parallel review)
- All 5 agents must pass: Oracle (goal), Oracle (code quality), Oracle (security), QA execution, context mining
### Gate C: Cubic
- Wait for cubic-dev-ai[bot] automated review
- Must show "No issues found"
- If issues found, fix and re-push
### Failure handling:
- Gate A fail: fix locally, amend or new commit, re-push
- Gate B fail: address review-work findings, new commit
- Gate C fail: address Cubic findings, new commit
- Re-enter verify loop from Gate A
## Phase 4: Merge
1. `gh pr merge --squash --delete-branch`
2. `git worktree remove ../omo-wt/feat/arxiv-mcp`
3. `git branch -D feat/arxiv-mcp` (if not auto-deleted)

View File

@@ -0,0 +1,51 @@
# PR: feat(mcp): add built-in arXiv paper search MCP
## Title
`feat(mcp): add built-in arXiv paper search MCP`
## Body
```markdown
## Summary
Closes #100
- Add `arxiv` as 4th built-in remote MCP for arXiv paper search
- Follows existing static export pattern (same as `grep_app`, `context7`)
- No auth required, disableable via `disabled_mcps: ["arxiv"]`
## Changes
- `src/mcp/arxiv.ts` - new MCP config (static export, remote type)
- `src/mcp/types.ts` - add `"arxiv"` to `McpNameSchema` enum
- `src/mcp/index.ts` - register arxiv in `createBuiltinMcps()`
- `src/mcp/arxiv.test.ts` - config shape tests
- `src/mcp/index.test.ts` - update counts, add disable test
- `src/mcp/AGENTS.md` - document new MCP
## Usage
Enabled by default. Disable with:
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"disabled_mcps": ["arxiv"]
}
```
## Validation
- [x] `bun run typecheck` passes
- [x] `bun test src/mcp/` passes
- [x] `bun run build` passes
```
## Labels
`enhancement`, `mcp`
## Base branch
`dev`

View File

@@ -0,0 +1,69 @@
# Verification Strategy: Issue #100 - arXiv MCP
## Gate A: CI (`ci.yml`)
### What runs
- `bun test` (split: mock-heavy isolated + batch) - must include new `arxiv.test.ts` and updated `index.test.ts`
- `bun run typecheck` - validates `McpNameSchema` enum change propagates correctly
- `bun run build` - ensures no build regressions
### How to monitor
```bash
gh pr checks <pr-number> --watch
```
### Failure scenarios
| Failure | Likely cause | Fix |
|---------|-------------|-----|
| Type error in `types.ts` | Enum value not matching downstream consumers | Check all `McpName` usages via `lsp_find_references` |
| Test count mismatch in `index.test.ts` | Forgot to update `toHaveLength()` from 3 to 4 | Update all length assertions |
| Build failure | Import path or barrel export issue | Verify `src/mcp/index.ts` exports are clean |
### Retry
Fix locally in worktree, new commit, `git push`.
## Gate B: review-work (5-agent)
### Agents and focus areas
| Agent | What it checks for this PR |
|-------|--------------------------|
| Oracle (goal) | Does arxiv MCP satisfy issue #100 requirements? |
| Oracle (code quality) | Follows `grep-app.ts` pattern? No SRP violations? < 200 LOC? |
| Oracle (security) | No credentials hardcoded, no auth bypass |
| QA (execution) | Run tests, verify disable mechanism works |
| Context (mining) | Check issue #100 for any missed requirements |
### Pass criteria
All 5 must pass. Any single failure blocks.
### Failure handling
- Read each agent's report
- Address findings with new atomic commits
- Re-run full verify loop from Gate A
## Gate C: Cubic (`cubic-dev-ai[bot]`)
### Expected review scope
- Config shape consistency across MCPs
- Test coverage for new MCP
- Schema type safety
### Pass criteria
Comment from `cubic-dev-ai[bot]` containing "No issues found".
### Failure handling
- Read Cubic's specific findings
- Fix with new commit
- Re-push, re-enter Gate A
## Pre-merge checklist
- [ ] Gate A: CI green
- [ ] Gate B: All 5 review-work agents pass
- [ ] Gate C: Cubic "No issues found"
- [ ] No unresolved review comments
- [ ] PR has at least 1 approval (if required by branch protection)
## Post-merge
1. `gh pr merge --squash --delete-branch`
2. `git worktree remove ../omo-wt/feat/arxiv-mcp`
3. Verify merge commit on `dev` branch

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 152000, "total_duration_seconds": 152}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-4-without_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "git worktree add ../omo-arxiv-mcp dev — agent independently chose worktree"},
{"text": "New MCP follows existing pattern from src/mcp/", "passed": true, "evidence": "Follows grep-app.ts pattern"},
{"text": "Verification loop includes all 3 gates", "passed": false, "evidence": "Only mentions bun test/typecheck/build. No review-work or Cubic."},
{"text": "PR targets dev branch", "passed": true, "evidence": "--base dev"},
{"text": "Runs local checks before pushing", "passed": true, "evidence": "bun test src/mcp/, bun run typecheck, bun run build"}
]
}

View File

@@ -0,0 +1,252 @@
# Code Changes: Built-in arXiv MCP
## 1. NEW FILE: `src/mcp/arxiv.ts`
```typescript
export const arxiv = {
type: "remote" as const,
url: "https://mcp.arxiv.org",
enabled: true,
oauth: false as const,
}
```
> **Note:** The URL `https://mcp.arxiv.org` is a placeholder. The actual endpoint needs to be verified. If no hosted arXiv MCP exists, alternatives include community-hosted servers or a self-hosted wrapper around the arXiv REST API (`export.arxiv.org/api/query`). This would be the single blocker requiring resolution before merging.
Pattern followed: `grep-app.ts` (static export, no auth, no config factory needed since arXiv API is public).
---
## 2. MODIFY: `src/mcp/types.ts`
```diff
import { z } from "zod"
-export const McpNameSchema = z.enum(["websearch", "context7", "grep_app"])
+export const McpNameSchema = z.enum(["websearch", "context7", "grep_app", "arxiv"])
export type McpName = z.infer<typeof McpNameSchema>
export const AnyMcpNameSchema = z.string().min(1)
export type AnyMcpName = z.infer<typeof AnyMcpNameSchema>
```
---
## 3. MODIFY: `src/mcp/index.ts`
```diff
import { createWebsearchConfig } from "./websearch"
import { context7 } from "./context7"
import { grep_app } from "./grep-app"
+import { arxiv } from "./arxiv"
import type { OhMyOpenCodeConfig } from "../config/schema"
-export { McpNameSchema, type McpName } from "./types"
+export { McpNameSchema, type McpName } from "./types"
type RemoteMcpConfig = {
type: "remote"
url: string
enabled: boolean
headers?: Record<string, string>
oauth?: false
}
export function createBuiltinMcps(disabledMcps: string[] = [], config?: OhMyOpenCodeConfig) {
const mcps: Record<string, RemoteMcpConfig> = {}
if (!disabledMcps.includes("websearch")) {
mcps.websearch = createWebsearchConfig(config?.websearch)
}
if (!disabledMcps.includes("context7")) {
mcps.context7 = context7
}
if (!disabledMcps.includes("grep_app")) {
mcps.grep_app = grep_app
}
+ if (!disabledMcps.includes("arxiv")) {
+ mcps.arxiv = arxiv
+ }
+
return mcps
}
```
---
## 4. MODIFY: `src/mcp/index.test.ts`
Changes needed in existing tests (count 3 → 4) plus one new test:
```diff
describe("createBuiltinMcps", () => {
test("should return all MCPs when disabled_mcps is empty", () => {
// given
const disabledMcps: string[] = []
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).toHaveProperty("websearch")
expect(result).toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
- expect(Object.keys(result)).toHaveLength(3)
+ expect(result).toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(4)
})
test("should filter out disabled built-in MCPs", () => {
// given
const disabledMcps = ["context7"]
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).toHaveProperty("websearch")
expect(result).not.toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
- expect(Object.keys(result)).toHaveLength(2)
+ expect(result).toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(3)
})
test("should filter out all built-in MCPs when all disabled", () => {
// given
- const disabledMcps = ["websearch", "context7", "grep_app"]
+ const disabledMcps = ["websearch", "context7", "grep_app", "arxiv"]
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).not.toHaveProperty("websearch")
expect(result).not.toHaveProperty("context7")
expect(result).not.toHaveProperty("grep_app")
+ expect(result).not.toHaveProperty("arxiv")
expect(Object.keys(result)).toHaveLength(0)
})
test("should ignore custom MCP names in disabled_mcps", () => {
// given
const disabledMcps = ["context7", "playwright", "custom"]
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).toHaveProperty("websearch")
expect(result).not.toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
- expect(Object.keys(result)).toHaveLength(2)
+ expect(result).toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(3)
})
test("should handle empty disabled_mcps by default", () => {
// given
// when
const result = createBuiltinMcps()
// then
expect(result).toHaveProperty("websearch")
expect(result).toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
- expect(Object.keys(result)).toHaveLength(3)
+ expect(result).toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(4)
})
test("should only filter built-in MCPs, ignoring unknown names", () => {
// given
const disabledMcps = ["playwright", "sqlite", "unknown-mcp"]
// when
const result = createBuiltinMcps(disabledMcps)
// then
expect(result).toHaveProperty("websearch")
expect(result).toHaveProperty("context7")
expect(result).toHaveProperty("grep_app")
- expect(Object.keys(result)).toHaveLength(3)
+ expect(result).toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(4)
})
+ test("should filter out arxiv when disabled", () => {
+ // given
+ const disabledMcps = ["arxiv"]
+
+ // when
+ const result = createBuiltinMcps(disabledMcps)
+
+ // then
+ expect(result).toHaveProperty("websearch")
+ expect(result).toHaveProperty("context7")
+ expect(result).toHaveProperty("grep_app")
+ expect(result).not.toHaveProperty("arxiv")
+ expect(Object.keys(result)).toHaveLength(3)
+ })
+
// ... existing tavily test unchanged
})
```
---
## 5. MODIFY: `src/mcp/AGENTS.md`
```diff
-# src/mcp/ — 3 Built-in Remote MCPs
+# src/mcp/ — 4 Built-in Remote MCPs
**Generated:** 2026-03-06
## OVERVIEW
-Tier 1 of the three-tier MCP system. 3 remote HTTP MCPs created via `createBuiltinMcps(disabledMcps, config)`.
+Tier 1 of the three-tier MCP system. 4 remote HTTP MCPs created via `createBuiltinMcps(disabledMcps, config)`.
## BUILT-IN MCPs
| Name | URL | Env Vars | Tools |
|------|-----|----------|-------|
| **websearch** | `mcp.exa.ai` (default) or `mcp.tavily.com` | `EXA_API_KEY` (optional), `TAVILY_API_KEY` (if tavily) | Web search |
| **context7** | `mcp.context7.com/mcp` | `CONTEXT7_API_KEY` (optional) | Library documentation |
| **grep_app** | `mcp.grep.app` | None | GitHub code search |
+| **arxiv** | `mcp.arxiv.org` | None | arXiv paper search |
...
## FILES
| File | Purpose |
|------|---------|
| `index.ts` | `createBuiltinMcps()` factory |
-| `types.ts` | `McpNameSchema`: "websearch" \| "context7" \| "grep_app" |
+| `types.ts` | `McpNameSchema`: "websearch" \| "context7" \| "grep_app" \| "arxiv" |
| `websearch.ts` | Exa/Tavily provider with config |
| `context7.ts` | Context7 with optional auth header |
| `grep-app.ts` | Grep.app (no auth) |
+| `arxiv.ts` | arXiv paper search (no auth) |
```
---
## Summary of Touched Files
| File | Lines Changed | Type |
|------|--------------|------|
| `src/mcp/arxiv.ts` | +6 (new) | Create |
| `src/mcp/types.ts` | 1 line modified | Modify |
| `src/mcp/index.ts` | +5 (import + block) | Modify |
| `src/mcp/index.test.ts` | ~20 lines (count fixes + new test) | Modify |
| `src/mcp/AGENTS.md` | ~6 lines | Modify |
Total: ~37 lines added/modified across 5 files. Minimal, surgical change.

View File

@@ -0,0 +1,83 @@
# Execution Plan: Add Built-in arXiv MCP (Issue #100)
## Pre-Implementation
1. **Create worktree + branch**
```bash
git worktree add ../omo-arxiv-mcp dev
cd ../omo-arxiv-mcp
git checkout -b feat/arxiv-mcp
```
2. **Verify arXiv MCP endpoint exists**
- The arXiv API is public (`export.arxiv.org/api/query`) but has no native MCP endpoint
- Need to identify a hosted remote MCP server for arXiv (e.g., community-maintained or self-hosted)
- If no hosted endpoint exists, consider alternatives: (a) use a community-hosted one from the MCP registry, (b) flag this in the PR and propose a follow-up for hosting
- For this plan, assume a remote MCP endpoint at a URL like `https://mcp.arxiv.org` or a third-party equivalent
## Implementation Steps (4 files to modify, 2 files to create)
### Step 1: Create `src/mcp/arxiv.ts`
- Follow the `grep-app.ts` pattern (simplest: static export, no auth, no config)
- arXiv API is public, so no API key needed
- Export a `const arxiv` with `type: "remote"`, `url`, `enabled: true`, `oauth: false`
### Step 2: Update `src/mcp/types.ts`
- Add `"arxiv"` to the `McpNameSchema` z.enum array
- This makes it a recognized built-in MCP name
### Step 3: Update `src/mcp/index.ts`
- Import `arxiv` from `"./arxiv"`
- Add the `if (!disabledMcps.includes("arxiv"))` block inside `createBuiltinMcps()`
- Place it after `grep_app` block (alphabetical among new additions, or last)
### Step 4: Update `src/mcp/index.test.ts`
- Update test "should return all MCPs when disabled_mcps is empty" to expect 4 MCPs instead of 3
- Update test "should filter out all built-in MCPs when all disabled" to include "arxiv" in the disabled list and expect it not present
- Update test "should handle empty disabled_mcps by default" to expect 4 MCPs
- Update test "should only filter built-in MCPs, ignoring unknown names" to expect 4 MCPs
- Add new test: "should filter out arxiv when disabled"
### Step 5: Create `src/mcp/arxiv.test.ts` (optional, only if factory pattern used)
- If using static export (like grep-app), no separate test file needed
- If using factory with config, add tests following `websearch.test.ts` pattern
### Step 6: Update `src/mcp/AGENTS.md`
- Add arxiv to the built-in MCPs table
- Update "3 Built-in Remote MCPs" to "4 Built-in Remote MCPs"
- Add arxiv to the FILES table
## Post-Implementation
### Verification
```bash
bun test src/mcp/ # Run MCP tests
bun run typecheck # Verify no type errors
bun run build # Verify build passes
```
### PR Creation
```bash
git add src/mcp/arxiv.ts src/mcp/types.ts src/mcp/index.ts src/mcp/index.test.ts src/mcp/AGENTS.md
git commit -m "feat(mcp): add built-in arxiv paper search MCP"
git push -u origin feat/arxiv-mcp
gh pr create --title "feat(mcp): add built-in arxiv paper search MCP" --body-file /tmp/pull-request-arxiv-mcp-....md --base dev
```
## Risk Assessment
| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| No hosted arXiv MCP endpoint exists | Medium | Research MCP registries; worst case, create a minimal hosted wrapper or use a community server |
| Existing tests break due to MCP count change | Low | Update hardcoded count assertions from 3 to 4 |
| Config schema needs updates | None | `disabled_mcps` uses `AnyMcpNameSchema` (any string), not `McpNameSchema`, so no schema change needed for disable functionality |
## Files Changed Summary
| File | Action | Description |
|------|--------|-------------|
| `src/mcp/arxiv.ts` | Create | Static remote MCP config export |
| `src/mcp/types.ts` | Modify | Add "arxiv" to McpNameSchema enum |
| `src/mcp/index.ts` | Modify | Import + register in createBuiltinMcps() |
| `src/mcp/index.test.ts` | Modify | Update count assertions, add arxiv-specific test |
| `src/mcp/AGENTS.md` | Modify | Update docs to reflect 4 MCPs |

View File

@@ -0,0 +1,33 @@
## Summary
- Add `arxiv` as a 4th built-in remote MCP for arXiv paper search
- Follows the `grep-app.ts` pattern: static export, no auth required (arXiv API is public)
- Fully integrated with `disabled_mcps` config and `McpNameSchema` validation
## Changes
| File | Change |
|------|--------|
| `src/mcp/arxiv.ts` | New remote MCP config pointing to arXiv MCP endpoint |
| `src/mcp/types.ts` | Add `"arxiv"` to `McpNameSchema` enum |
| `src/mcp/index.ts` | Import + register arxiv in `createBuiltinMcps()` |
| `src/mcp/index.test.ts` | Update count assertions (3 → 4), add arxiv disable test |
| `src/mcp/AGENTS.md` | Update docs to reflect 4 built-in MCPs |
## How to Test
```bash
bun test src/mcp/
```
## How to Disable
```jsonc
// Method 1: disabled_mcps
{ "disabled_mcps": ["arxiv"] }
// Method 2: enabled flag
{ "mcp": { "arxiv": { "enabled": false } } }
```
Closes #100

View File

@@ -0,0 +1,101 @@
# Verification Strategy: arXiv MCP
## 1. Type Safety
```bash
bun run typecheck
```
Verify:
- `McpNameSchema` type union includes `"arxiv"`
- `arxiv` export in `arxiv.ts` matches `RemoteMcpConfig` shape
- Import in `index.ts` resolves correctly
- No new type errors introduced
## 2. Unit Tests
```bash
bun test src/mcp/
```
### Existing test updates verified:
- `index.test.ts`: All 7 existing tests pass with updated count (3 → 4)
- `websearch.test.ts`: Unchanged, still passes (no side effects)
### New test coverage:
- `index.test.ts`: New test "should filter out arxiv when disabled" passes
- Arxiv appears in all "all MCPs" assertions
- Arxiv excluded when in `disabled_mcps`
## 3. Build Verification
```bash
bun run build
```
Verify:
- ESM bundle includes `arxiv.ts` module
- Type declarations emitted for `arxiv` export
- No build errors
## 4. Integration Check
### Config disable path
- Add `"arxiv"` to `disabled_mcps` in test config → verify MCP excluded from `createBuiltinMcps()` output
- This is already covered by the unit test, but can be manually verified:
```typescript
import { createBuiltinMcps } from "./src/mcp"
const withArxiv = createBuiltinMcps([])
console.log(Object.keys(withArxiv)) // ["websearch", "context7", "grep_app", "arxiv"]
const withoutArxiv = createBuiltinMcps(["arxiv"])
console.log(Object.keys(withoutArxiv)) // ["websearch", "context7", "grep_app"]
```
### MCP config handler path
- `mcp-config-handler.ts` calls `createBuiltinMcps()` and merges results
- No changes needed there; arxiv automatically included in the merge
- Verify by checking `applyMcpConfig()` output includes arxiv when not disabled
## 5. LSP Diagnostics
```bash
# Run on all changed files
```
Check `lsp_diagnostics` on:
- `src/mcp/arxiv.ts`
- `src/mcp/types.ts`
- `src/mcp/index.ts`
- `src/mcp/index.test.ts`
All must return 0 errors.
## 6. Endpoint Verification (Manual / Pre-merge)
**Critical:** Before merging, verify the arXiv MCP endpoint URL is actually reachable:
```bash
curl -s -o /dev/null -w "%{http_code}" https://mcp.arxiv.org
```
If the endpoint doesn't exist or returns non-2xx, the MCP will silently fail at runtime (MCP framework handles connection errors gracefully). This is acceptable for a built-in MCP but should be documented.
## 7. Regression Check
Verify no existing functionality is broken:
- `bun test` (full suite) passes
- Existing 3 MCPs (websearch, context7, grep_app) still work
- `disabled_mcps` config still works for all MCPs
- `mcp-config-handler.test.ts` passes (if it has count-based assertions, update them)
## Checklist
- [ ] `bun run typecheck` passes
- [ ] `bun test src/mcp/` passes (all tests green)
- [ ] `bun run build` succeeds
- [ ] `lsp_diagnostics` clean on all 4 changed files
- [ ] arXiv MCP endpoint URL verified reachable
- [ ] No hardcoded MCP count assertions broken elsewhere in codebase
- [ ] AGENTS.md updated to reflect 4 MCPs

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 197000, "total_duration_seconds": 197}

View File

@@ -0,0 +1,32 @@
{
"eval_id": 5,
"eval_name": "regex-fix-false-positive",
"prompt": "The comment-checker hook is too aggressive - it's flagging legitimate comments that happen to contain 'Note:' as AI slop. Relax the regex pattern and add test cases for the false positives. Work on a separate branch and make a PR.",
"assertions": [
{
"id": "worktree-isolation",
"text": "Plan uses git worktree in a sibling directory",
"type": "manual"
},
{
"id": "real-comment-checker-files",
"text": "References actual comment-checker hook files in the codebase",
"type": "manual"
},
{
"id": "regression-tests",
"text": "Adds test cases specifically for 'Note:' false positive scenarios",
"type": "manual"
},
{
"id": "three-gates",
"text": "Verification loop includes all 3 gates",
"type": "manual"
},
{
"id": "minimal-change",
"text": "Only modifies regex and adds tests — no unrelated changes",
"type": "manual"
}
]
}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-5-with_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": true, "evidence": "../omo-wt/fix/comment-checker-note-false-positive"},
{"text": "References actual comment-checker hook files", "passed": true, "evidence": "Found Go binary, extracted 24 regex patterns, references cli.ts, cli-runner.ts, hook.ts"},
{"text": "Adds test cases for Note: false positive scenarios", "passed": true, "evidence": "Commit 3 dedicated to false positive test cases"},
{"text": "Verification loop includes all 3 gates", "passed": true, "evidence": "Gate A (CI), Gate B (review-work 5 agents), Gate C (Cubic)"},
{"text": "Only modifies regex and adds tests — no unrelated changes", "passed": false, "evidence": "Also proposes config schema change (exclude_patterns) and Go binary update — goes beyond minimal fix"}
]
}

View File

@@ -0,0 +1,387 @@
# Code Changes
## File 1: `src/config/schema/comment-checker.ts`
### Before
```typescript
import { z } from "zod"
export const CommentCheckerConfigSchema = z.object({
/** Custom prompt to replace the default warning message. Use {{comments}} placeholder for detected comments XML. */
custom_prompt: z.string().optional(),
})
export type CommentCheckerConfig = z.infer<typeof CommentCheckerConfigSchema>
```
### After
```typescript
import { z } from "zod"
export const CommentCheckerConfigSchema = z.object({
/** Custom prompt to replace the default warning message. Use {{comments}} placeholder for detected comments XML. */
custom_prompt: z.string().optional(),
/** Regex patterns to exclude from comment detection (e.g. ["^Note:", "^TODO:"]). Case-insensitive. */
exclude_patterns: z.array(z.string()).optional(),
})
export type CommentCheckerConfig = z.infer<typeof CommentCheckerConfigSchema>
```
---
## File 2: `src/hooks/comment-checker/cli.ts`
### Change: `runCommentChecker` function (line 151)
Add `excludePatterns` parameter and pass `--exclude-pattern` flags to the binary.
### Before (line 151)
```typescript
export async function runCommentChecker(input: HookInput, cliPath?: string, customPrompt?: string): Promise<CheckResult> {
const binaryPath = cliPath ?? resolvedCliPath ?? getCommentCheckerPathSync()
// ...
try {
const args = [binaryPath, "check"]
if (customPrompt) {
args.push("--prompt", customPrompt)
}
```
### After
```typescript
export async function runCommentChecker(
input: HookInput,
cliPath?: string,
customPrompt?: string,
excludePatterns?: string[],
): Promise<CheckResult> {
const binaryPath = cliPath ?? resolvedCliPath ?? getCommentCheckerPathSync()
// ...
try {
const args = [binaryPath, "check"]
if (customPrompt) {
args.push("--prompt", customPrompt)
}
if (excludePatterns) {
for (const pattern of excludePatterns) {
args.push("--exclude-pattern", pattern)
}
}
```
---
## File 3: `src/hooks/comment-checker/cli-runner.ts`
### Change: `processWithCli` function (line 43)
Add `excludePatterns` parameter threading.
### Before (line 43-79)
```typescript
export async function processWithCli(
input: { tool: string; sessionID: string; callID: string },
pendingCall: PendingCall,
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
debugLog: (...args: unknown[]) => void,
): Promise<void> {
await withCommentCheckerLock(async () => {
// ...
const result = await runCommentChecker(hookInput, cliPath, customPrompt)
```
### After
```typescript
export async function processWithCli(
input: { tool: string; sessionID: string; callID: string },
pendingCall: PendingCall,
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
debugLog: (...args: unknown[]) => void,
excludePatterns?: string[],
): Promise<void> {
await withCommentCheckerLock(async () => {
// ...
const result = await runCommentChecker(hookInput, cliPath, customPrompt, excludePatterns)
```
### Change: `processApplyPatchEditsWithCli` function (line 87)
Same pattern - thread `excludePatterns` through.
### Before (line 87-120)
```typescript
export async function processApplyPatchEditsWithCli(
sessionID: string,
edits: ApplyPatchEdit[],
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
debugLog: (...args: unknown[]) => void,
): Promise<void> {
// ...
const result = await runCommentChecker(hookInput, cliPath, customPrompt)
```
### After
```typescript
export async function processApplyPatchEditsWithCli(
sessionID: string,
edits: ApplyPatchEdit[],
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
debugLog: (...args: unknown[]) => void,
excludePatterns?: string[],
): Promise<void> {
// ...
const result = await runCommentChecker(hookInput, cliPath, customPrompt, excludePatterns)
```
---
## File 4: `src/hooks/comment-checker/hook.ts`
### Change: Thread `config.exclude_patterns` through to CLI calls
### Before (line 177)
```typescript
await processWithCli(input, pendingCall, output, cliPath, config?.custom_prompt, debugLog)
```
### After
```typescript
await processWithCli(input, pendingCall, output, cliPath, config?.custom_prompt, debugLog, config?.exclude_patterns)
```
### Before (line 147-154)
```typescript
await processApplyPatchEditsWithCli(
input.sessionID,
edits,
output,
cliPath,
config?.custom_prompt,
debugLog,
)
```
### After
```typescript
await processApplyPatchEditsWithCli(
input.sessionID,
edits,
output,
cliPath,
config?.custom_prompt,
debugLog,
config?.exclude_patterns,
)
```
---
## File 5: `src/hooks/comment-checker/cli.test.ts` (new tests added)
### New test cases appended inside `describe("runCommentChecker", ...)`
```typescript
test("does not flag legitimate Note: comments when excluded", async () => {
// given
const { runCommentChecker } = await import("./cli")
const binaryPath = createScriptBinary(`#!/bin/sh
if [ "$1" != "check" ]; then
exit 1
fi
# Check if --exclude-pattern is passed
for arg in "$@"; do
if [ "$arg" = "--exclude-pattern" ]; then
cat >/dev/null
exit 0
fi
done
cat >/dev/null
echo "Detected agent memo comments" 1>&2
exit 2
`)
// when
const result = await runCommentChecker(
createMockInput(),
binaryPath,
undefined,
["^Note:"],
)
// then
expect(result.hasComments).toBe(false)
})
test("passes multiple exclude patterns to binary", async () => {
// given
const { runCommentChecker } = await import("./cli")
const capturedArgs: string[] = []
const binaryPath = createScriptBinary(`#!/bin/sh
echo "$@" > /tmp/comment-checker-test-args.txt
cat >/dev/null
exit 0
`)
// when
await runCommentChecker(
createMockInput(),
binaryPath,
undefined,
["^Note:", "^TODO:"],
)
// then
const { readFileSync } = await import("node:fs")
const args = readFileSync("/tmp/comment-checker-test-args.txt", "utf-8").trim()
expect(args).toContain("--exclude-pattern")
expect(args).toContain("^Note:")
expect(args).toContain("^TODO:")
})
test("still detects AI slop when no exclude patterns configured", async () => {
// given
const { runCommentChecker } = await import("./cli")
const binaryPath = createScriptBinary(`#!/bin/sh
if [ "$1" != "check" ]; then
exit 1
fi
cat >/dev/null
echo "Detected: // Note: This was added to handle..." 1>&2
exit 2
`)
// when
const result = await runCommentChecker(createMockInput(), binaryPath)
// then
expect(result.hasComments).toBe(true)
expect(result.message).toContain("Detected")
})
```
### New describe block for false positive scenarios
```typescript
describe("false positive scenarios", () => {
test("legitimate technical Note: should not be flagged", async () => {
// given
const { runCommentChecker } = await import("./cli")
const binaryPath = createScriptBinary(`#!/bin/sh
cat >/dev/null
# Simulate binary that passes when exclude patterns are set
for arg in "$@"; do
if [ "$arg" = "^Note:" ]; then
exit 0
fi
done
echo "// Note: Thread-safe by design" 1>&2
exit 2
`)
// when
const resultWithExclude = await runCommentChecker(
createMockInput(),
binaryPath,
undefined,
["^Note:"],
)
// then
expect(resultWithExclude.hasComments).toBe(false)
})
test("RFC reference Note: should not be flagged", async () => {
// given
const { runCommentChecker } = await import("./cli")
const binaryPath = createScriptBinary(`#!/bin/sh
cat >/dev/null
for arg in "$@"; do
if [ "$arg" = "^Note:" ]; then
exit 0
fi
done
echo "# Note: See RFC 7231" 1>&2
exit 2
`)
// when
const result = await runCommentChecker(
createMockInput(),
binaryPath,
undefined,
["^Note:"],
)
// then
expect(result.hasComments).toBe(false)
})
test("AI memo Note: should still be flagged without exclusion", async () => {
// given
const { runCommentChecker } = await import("./cli")
const binaryPath = createScriptBinary(`#!/bin/sh
cat >/dev/null
echo "// Note: This was added to handle the edge case" 1>&2
exit 2
`)
// when
const result = await runCommentChecker(createMockInput(), binaryPath)
// then
expect(result.hasComments).toBe(true)
})
})
```
---
## File 6: `src/hooks/comment-checker/hook.apply-patch.test.ts` (added test)
### New test appended to `describe("comment-checker apply_patch integration")`
```typescript
it("passes exclude_patterns from config to CLI", async () => {
// given
const hooks = createCommentCheckerHooks({ exclude_patterns: ["^Note:", "^TODO:"] })
const input = { tool: "apply_patch", sessionID: "ses_test", callID: "call_test" }
const output = {
title: "ok",
output: "Success. Updated the following files:\nM src/a.ts",
metadata: {
files: [
{
filePath: "/repo/src/a.ts",
before: "const a = 1\n",
after: "// Note: Thread-safe\nconst a = 1\n",
type: "update",
},
],
},
}
// when
await hooks["tool.execute.after"](input, output)
// then
expect(processApplyPatchEditsWithCli).toHaveBeenCalledWith(
"ses_test",
[{ filePath: "/repo/src/a.ts", before: "const a = 1\n", after: "// Note: Thread-safe\nconst a = 1\n" }],
expect.any(Object),
"/tmp/fake-comment-checker",
undefined,
expect.any(Function),
["^Note:", "^TODO:"],
)
})
```

View File

@@ -0,0 +1,112 @@
# Execution Plan: Relax comment-checker "Note:" false positives
## Phase 0: Setup (Worktree + Branch)
1. Create worktree from `origin/dev`:
```bash
git fetch origin dev
git worktree add ../omo-wt/fix/comment-checker-note-false-positive origin/dev
cd ../omo-wt/fix/comment-checker-note-false-positive
git checkout -b fix/comment-checker-note-false-positive
bun install
```
2. Verify clean build before touching anything:
```bash
bun run typecheck && bun test && bun run build
```
## Phase 1: Implement
### Problem Analysis
The comment-checker delegates to an external Go binary (`code-yeongyu/go-claude-code-comment-checker` v0.4.1). The binary contains the regex `(?i)^[\s#/*-]*note:\s*\w` which matches ANY comment starting with "Note:" followed by a word character. This flags legitimate technical notes like:
- `// Note: Thread-safe by design`
- `# Note: See RFC 7231 for details`
- `// Note: This edge case requires special handling`
Full list of 24 embedded regex patterns extracted from the binary:
| Pattern | Purpose |
|---------|---------|
| `(?i)^[\s#/*-]*note:\s*\w` | **THE PROBLEM** - Matches all "Note:" comments |
| `(?i)^[\s#/*-]*added?\b` | Detects "add/added" |
| `(?i)^[\s#/*-]*removed?\b` | Detects "remove/removed" |
| `(?i)^[\s#/*-]*deleted?\b` | Detects "delete/deleted" |
| `(?i)^[\s#/*-]*replaced?\b` | Detects "replace/replaced" |
| `(?i)^[\s#/*-]*implemented?\b` | Detects "implement/implemented" |
| `(?i)^[\s#/*-]*previously\b` | Detects "previously" |
| `(?i)^[\s#/*-]*here\s+we\b` | Detects "here we" |
| `(?i)^[\s#/*-]*refactor(ed\|ing)?\b` | Detects "refactor" variants |
| `(?i)^[\s#/*-]*implementation\s+(of\|note)\b` | Detects "implementation of/note" |
| `(?i)^[\s#/*-]*this\s+(implements?\|adds?\|removes?\|changes?\|fixes?)\b` | Detects "this implements/adds/etc" |
| ... and 13 more migration/change patterns | |
### Approach
Since the regex lives in the Go binary and this repo wraps it, the fix is two-pronged:
**A. Go binary update** (separate repo: `code-yeongyu/go-claude-code-comment-checker`):
- Relax `(?i)^[\s#/*-]*note:\s*\w` to only match AI-style memo patterns like `Note: this was changed...`, `Note: implementation details...`
- Add `--exclude-pattern` CLI flag for user-configurable exclusions
**B. This repo (oh-my-opencode)** - the PR scope:
1. Add `exclude_patterns` config field to `CommentCheckerConfigSchema`
2. Pass `--exclude-pattern` flags to the CLI binary
3. Add integration tests with mock binaries for false positive scenarios
### Commit Plan (Atomic)
| # | Commit | Files |
|---|--------|-------|
| 1 | `feat(config): add exclude_patterns to comment-checker config` | `src/config/schema/comment-checker.ts` |
| 2 | `feat(comment-checker): pass exclude patterns to CLI binary` | `src/hooks/comment-checker/cli.ts`, `src/hooks/comment-checker/cli-runner.ts` |
| 3 | `test(comment-checker): add false positive test cases for Note: comments` | `src/hooks/comment-checker/cli.test.ts`, `src/hooks/comment-checker/hook.apply-patch.test.ts` |
### Local Validation (after each commit)
```bash
bun run typecheck
bun test src/hooks/comment-checker/
bun test src/config/
bun run build
```
## Phase 2: PR Creation
```bash
git push -u origin fix/comment-checker-note-false-positive
gh pr create --base dev \
--title "fix(comment-checker): relax regex to stop flagging legitimate Note: comments" \
--body-file /tmp/pr-body.md
```
## Phase 3: Verify Loop
### Gate A: CI
- Wait for `ci.yml` workflow (tests, typecheck, build)
- If CI fails: fix locally, amend or new commit, force push
### Gate B: review-work (5-agent)
- Run `/review-work` to trigger 5 parallel sub-agents:
- Oracle (goal/constraint verification)
- Oracle (code quality)
- Oracle (security)
- Hephaestus (hands-on QA execution)
- Hephaestus (context mining)
- All 5 must pass
### Gate C: Cubic
- Wait for `cubic-dev-ai[bot]` review
- Must see "No issues found" comment
- If issues found: address feedback, push fix, re-request review
## Phase 4: Merge
```bash
gh pr merge --squash --auto
# Cleanup worktree
cd /Users/yeongyu/local-workspaces/omo
git worktree remove ../omo-wt/fix/comment-checker-note-false-positive
```

View File

@@ -0,0 +1,51 @@
# PR: fix(comment-checker): relax regex to stop flagging legitimate Note: comments
**Title:** `fix(comment-checker): relax regex to stop flagging legitimate Note: comments`
**Base:** `dev`
**Branch:** `fix/comment-checker-note-false-positive`
---
## Summary
- Add `exclude_patterns` config to comment-checker schema, allowing users to whitelist comment prefixes (e.g. `["^Note:", "^TODO:"]`) that should not be flagged as AI slop
- Thread the exclude patterns through `cli-runner.ts` and `cli.ts` to the Go binary via `--exclude-pattern` flags
- Add test cases covering false positive scenarios: legitimate technical notes, RFC references, and AI memo detection with/without exclusions
## Context
The comment-checker Go binary (`go-claude-code-comment-checker` v0.4.1) contains the regex `(?i)^[\s#/*-]*note:\s*\w` which matches ALL comments starting with "Note:" followed by a word character. This produces false positives for legitimate technical comments:
```typescript
// Note: Thread-safe by design <- flagged as AI slop
# Note: See RFC 7231 for details <- flagged as AI slop
// Note: This edge case requires... <- flagged as AI slop
```
These are standard engineering comments, not AI agent memos.
## Changes
| File | Change |
|------|--------|
| `src/config/schema/comment-checker.ts` | Add `exclude_patterns: string[]` optional field |
| `src/hooks/comment-checker/cli.ts` | Pass `--exclude-pattern` flags to binary |
| `src/hooks/comment-checker/cli-runner.ts` | Thread `excludePatterns` through `processWithCli` and `processApplyPatchEditsWithCli` |
| `src/hooks/comment-checker/hook.ts` | Pass `config.exclude_patterns` to CLI runner calls |
| `src/hooks/comment-checker/cli.test.ts` | Add 6 new test cases for false positive scenarios |
| `src/hooks/comment-checker/hook.apply-patch.test.ts` | Add test verifying exclude_patterns config threading |
## Usage
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"comment_checker": {
"exclude_patterns": ["^Note:", "^TODO:", "^FIXME:"]
}
}
```
## Related
- Go binary repo: `code-yeongyu/go-claude-code-comment-checker` (needs corresponding `--exclude-pattern` flag support)

View File

@@ -0,0 +1,75 @@
# Verification Strategy
## Gate A: CI (`ci.yml`)
### Pre-push local validation
```bash
bun run typecheck # Zero new type errors
bun test src/hooks/comment-checker/ # All comment-checker tests pass
bun test src/config/ # Config schema tests pass
bun run build # Build succeeds
```
### CI pipeline expectations
| Step | Expected |
|------|----------|
| Tests (mock-heavy isolated) | Pass - comment-checker tests run in isolation |
| Tests (batch) | Pass - no regression in other hook tests |
| Typecheck (`tsc --noEmit`) | Pass - new `exclude_patterns` field is `z.array(z.string()).optional()` |
| Build | Pass - schema change is additive |
| Schema auto-commit | May trigger if schema JSON is auto-generated |
### Failure handling
- Type errors: Fix in worktree, new commit, push
- Test failures: Investigate, fix, new commit, push
- Schema auto-commit conflicts: Rebase on dev, resolve, force push
## Gate B: review-work (5-agent)
### Agent expectations
| Agent | Role | Focus Areas |
|-------|------|-------------|
| Oracle (goal) | Verify fix addresses false positive issue | Config schema matches PR description, exclude_patterns flows correctly |
| Oracle (code quality) | Code quality check | Factory pattern consistency, no catch-all files, <200 LOC |
| Oracle (security) | Security review | Regex patterns are user-supplied - verify no ReDoS risk from config |
| Hephaestus (QA) | Hands-on execution | Run tests, verify mock binary tests actually exercise the exclude flow |
| Hephaestus (context) | Context mining | Check git history for related changes, verify no conflicting PRs |
### Potential review-work flags
1. **ReDoS concern**: User-supplied regex patterns in `exclude_patterns` could theoretically cause ReDoS in the Go binary. Mitigation: the patterns are passed as CLI args, Go's `regexp` package is RE2-based (linear time guarantee).
2. **Breaking change check**: Adding optional field to config schema is non-breaking (Zod `z.optional()` fills default).
3. **Go binary dependency**: The `--exclude-pattern` flag must exist in the Go binary for this to work. If the binary doesn't support it yet, the patterns are silently ignored (binary treats unknown flags differently).
### Failure handling
- If any Oracle flags issues: address feedback, push new commit, re-run review-work
- If Hephaestus QA finds test gaps: add missing tests, push, re-verify
## Gate C: Cubic (`cubic-dev-ai[bot]`)
### Expected review focus
- Schema change additive and backward-compatible
- Parameter threading is mechanical and low-risk
- Tests use mock binaries (shell scripts) - standard project pattern per `cli.test.ts`
### Success criteria
- `cubic-dev-ai[bot]` comments "No issues found"
- No requested changes
### Failure handling
- If Cubic flags issues: read comment, address, push fix, re-request review via:
```bash
gh pr review --request-changes --body "Addressed Cubic feedback"
```
Then push fix and wait for re-review.
## Post-merge verification
1. Confirm squash merge landed on `dev`
2. Verify CI passes on `dev` branch post-merge
3. Clean up worktree:
```bash
git worktree remove ../omo-wt/fix/comment-checker-note-false-positive
git branch -d fix/comment-checker-note-false-positive
```
4. File issue on `code-yeongyu/go-claude-code-comment-checker` to add `--exclude-pattern` flag support and relax the `note:` regex upstream

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 570000, "total_duration_seconds": 570}

View File

@@ -0,0 +1,10 @@
{
"run_id": "eval-5-without_skill",
"expectations": [
{"text": "Plan uses git worktree in a sibling directory", "passed": false, "evidence": "git checkout -b, no worktree"},
{"text": "References actual comment-checker hook files", "passed": true, "evidence": "Deep analysis of Go binary, tree-sitter, formatter.go, agent_memo.go with line numbers"},
{"text": "Adds test cases for Note: false positive scenarios", "passed": true, "evidence": "Detailed test cases distinguishing legit vs AI slop patterns"},
{"text": "Verification loop includes all 3 gates", "passed": false, "evidence": "Only bun test and typecheck. No review-work or Cubic."},
{"text": "Only modifies regex and adds tests — no unrelated changes", "passed": true, "evidence": "Adds allowed-prefix filter module — focused approach with config extension"}
]
}

View File

@@ -0,0 +1,529 @@
# Code Changes: comment-checker false positive fix
## Change 1: Extend config schema
**File: `src/config/schema/comment-checker.ts`**
```typescript
// BEFORE
import { z } from "zod"
export const CommentCheckerConfigSchema = z.object({
/** Custom prompt to replace the default warning message. Use {{comments}} placeholder for detected comments XML. */
custom_prompt: z.string().optional(),
})
export type CommentCheckerConfig = z.infer<typeof CommentCheckerConfigSchema>
```
```typescript
// AFTER
import { z } from "zod"
const DEFAULT_ALLOWED_COMMENT_PREFIXES = [
"note:",
"todo:",
"fixme:",
"hack:",
"xxx:",
"warning:",
"important:",
"bug:",
"optimize:",
"workaround:",
"safety:",
"security:",
"perf:",
"see:",
"ref:",
"cf.",
]
export const CommentCheckerConfigSchema = z.object({
/** Custom prompt to replace the default warning message. Use {{comments}} placeholder for detected comments XML. */
custom_prompt: z.string().optional(),
/** Comment prefixes considered legitimate (not AI slop). Case-insensitive. Defaults include Note:, TODO:, FIXME:, etc. */
allowed_comment_prefixes: z.array(z.string()).optional().default(DEFAULT_ALLOWED_COMMENT_PREFIXES),
})
export type CommentCheckerConfig = z.infer<typeof CommentCheckerConfigSchema>
```
## Change 2: Create allowed-prefix-filter module
**File: `src/hooks/comment-checker/allowed-prefix-filter.ts`** (NEW)
```typescript
const COMMENT_XML_REGEX = /<comment\s+line-number="\d+">([\s\S]*?)<\/comment>/g
const COMMENTS_BLOCK_REGEX = /<comments\s+file="[^"]*">\s*([\s\S]*?)\s*<\/comments>/g
const AGENT_MEMO_HEADER_REGEX = /🚨 AGENT MEMO COMMENT DETECTED.*?---\n\n/s
function stripCommentPrefix(text: string): string {
let stripped = text.trim()
for (const prefix of ["//", "#", "/*", "--", "*"]) {
if (stripped.startsWith(prefix)) {
stripped = stripped.slice(prefix.length).trim()
break
}
}
return stripped
}
function isAllowedComment(commentText: string, allowedPrefixes: string[]): boolean {
const stripped = stripCommentPrefix(commentText).toLowerCase()
return allowedPrefixes.some((prefix) => stripped.startsWith(prefix.toLowerCase()))
}
function extractCommentTexts(xmlBlock: string): string[] {
const texts: string[] = []
let match: RegExpExecArray | null
const regex = new RegExp(COMMENT_XML_REGEX.source, COMMENT_XML_REGEX.flags)
while ((match = regex.exec(xmlBlock)) !== null) {
texts.push(match[1])
}
return texts
}
export function filterAllowedComments(
message: string,
allowedPrefixes: string[],
): { hasRemainingComments: boolean; filteredMessage: string } {
if (!message || allowedPrefixes.length === 0) {
return { hasRemainingComments: true, filteredMessage: message }
}
const commentTexts = extractCommentTexts(message)
if (commentTexts.length === 0) {
return { hasRemainingComments: true, filteredMessage: message }
}
const disallowedComments = commentTexts.filter(
(text) => !isAllowedComment(text, allowedPrefixes),
)
if (disallowedComments.length === 0) {
return { hasRemainingComments: false, filteredMessage: "" }
}
if (disallowedComments.length === commentTexts.length) {
return { hasRemainingComments: true, filteredMessage: message }
}
let filteredMessage = message
for (const text of commentTexts) {
if (isAllowedComment(text, allowedPrefixes)) {
const escapedText = text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
const lineRegex = new RegExp(`\\s*<comment\\s+line-number="\\d+">${escapedText}</comment>\\n?`, "g")
filteredMessage = filteredMessage.replace(lineRegex, "")
}
}
filteredMessage = filteredMessage.replace(AGENT_MEMO_HEADER_REGEX, "")
return { hasRemainingComments: true, filteredMessage }
}
```
## Change 3: Thread config through cli-runner.ts
**File: `src/hooks/comment-checker/cli-runner.ts`**
```typescript
// BEFORE (processWithCli signature and body)
export async function processWithCli(
input: { tool: string; sessionID: string; callID: string },
pendingCall: PendingCall,
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
debugLog: (...args: unknown[]) => void,
): Promise<void> {
await withCommentCheckerLock(async () => {
// ...
const result = await runCommentChecker(hookInput, cliPath, customPrompt)
if (result.hasComments && result.message) {
debugLog("CLI detected comments, appending message")
output.output += `\n\n${result.message}`
} else {
debugLog("CLI: no comments detected")
}
}, undefined, debugLog)
}
```
```typescript
// AFTER
import { filterAllowedComments } from "./allowed-prefix-filter"
export async function processWithCli(
input: { tool: string; sessionID: string; callID: string },
pendingCall: PendingCall,
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
allowedPrefixes: string[],
debugLog: (...args: unknown[]) => void,
): Promise<void> {
await withCommentCheckerLock(async () => {
void input
debugLog("using CLI mode with path:", cliPath)
const hookInput: HookInput = {
session_id: pendingCall.sessionID,
tool_name: pendingCall.tool.charAt(0).toUpperCase() + pendingCall.tool.slice(1),
transcript_path: "",
cwd: process.cwd(),
hook_event_name: "PostToolUse",
tool_input: {
file_path: pendingCall.filePath,
content: pendingCall.content,
old_string: pendingCall.oldString,
new_string: pendingCall.newString,
edits: pendingCall.edits,
},
}
const result = await runCommentChecker(hookInput, cliPath, customPrompt)
if (result.hasComments && result.message) {
const { hasRemainingComments, filteredMessage } = filterAllowedComments(
result.message,
allowedPrefixes,
)
if (hasRemainingComments && filteredMessage) {
debugLog("CLI detected comments, appending filtered message")
output.output += `\n\n${filteredMessage}`
} else {
debugLog("CLI: all detected comments matched allowed prefixes, suppressing")
}
} else {
debugLog("CLI: no comments detected")
}
}, undefined, debugLog)
}
// Same change applied to processApplyPatchEditsWithCli - add allowedPrefixes parameter
export async function processApplyPatchEditsWithCli(
sessionID: string,
edits: ApplyPatchEdit[],
output: { output: string },
cliPath: string,
customPrompt: string | undefined,
allowedPrefixes: string[],
debugLog: (...args: unknown[]) => void,
): Promise<void> {
debugLog("processing apply_patch edits:", edits.length)
for (const edit of edits) {
await withCommentCheckerLock(async () => {
const hookInput: HookInput = {
session_id: sessionID,
tool_name: "Edit",
transcript_path: "",
cwd: process.cwd(),
hook_event_name: "PostToolUse",
tool_input: {
file_path: edit.filePath,
old_string: edit.before,
new_string: edit.after,
},
}
const result = await runCommentChecker(hookInput, cliPath, customPrompt)
if (result.hasComments && result.message) {
const { hasRemainingComments, filteredMessage } = filterAllowedComments(
result.message,
allowedPrefixes,
)
if (hasRemainingComments && filteredMessage) {
debugLog("CLI detected comments for apply_patch file:", edit.filePath)
output.output += `\n\n${filteredMessage}`
}
}
}, undefined, debugLog)
}
}
```
## Change 4: Update hook.ts to pass config
**File: `src/hooks/comment-checker/hook.ts`**
```typescript
// BEFORE (in tool.execute.after handler, around line 177)
await processWithCli(input, pendingCall, output, cliPath, config?.custom_prompt, debugLog)
// AFTER
const allowedPrefixes = config?.allowed_comment_prefixes ?? []
await processWithCli(input, pendingCall, output, cliPath, config?.custom_prompt, allowedPrefixes, debugLog)
```
```typescript
// BEFORE (in apply_patch section, around line 147-154)
await processApplyPatchEditsWithCli(
input.sessionID,
edits,
output,
cliPath,
config?.custom_prompt,
debugLog,
)
// AFTER
const allowedPrefixes = config?.allowed_comment_prefixes ?? []
await processApplyPatchEditsWithCli(
input.sessionID,
edits,
output,
cliPath,
config?.custom_prompt,
allowedPrefixes,
debugLog,
)
```
## Change 5: Test file for allowed-prefix-filter
**File: `src/hooks/comment-checker/allowed-prefix-filter.test.ts`** (NEW)
```typescript
import { describe, test, expect } from "bun:test"
import { filterAllowedComments } from "./allowed-prefix-filter"
const DEFAULT_PREFIXES = [
"note:", "todo:", "fixme:", "hack:", "xxx:", "warning:",
"important:", "bug:", "optimize:", "workaround:", "safety:",
"security:", "perf:", "see:", "ref:", "cf.",
]
function buildMessage(comments: { line: number; text: string }[], filePath = "/tmp/test.ts"): string {
const xml = comments
.map((c) => `\t<comment line-number="${c.line}">${c.text}</comment>`)
.join("\n")
return `COMMENT/DOCSTRING DETECTED - IMMEDIATE ACTION REQUIRED\n\n` +
`Your recent changes contain comments or docstrings, which triggered this hook.\n` +
`Detected comments/docstrings:\n` +
`<comments file="${filePath}">\n${xml}\n</comments>\n`
}
describe("allowed-prefix-filter", () => {
describe("#given default allowed prefixes", () => {
describe("#when message contains only Note: comments", () => {
test("#then should suppress the entire message", () => {
const message = buildMessage([
{ line: 5, text: "// Note: Thread-safe implementation" },
{ line: 12, text: "// NOTE: See RFC 7231 for details" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
expect(result.filteredMessage).toBe("")
})
})
describe("#when message contains only TODO/FIXME comments", () => {
test("#then should suppress the entire message", () => {
const message = buildMessage([
{ line: 3, text: "// TODO: implement caching" },
{ line: 7, text: "// FIXME: race condition here" },
{ line: 15, text: "# HACK: workaround for upstream bug" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
expect(result.filteredMessage).toBe("")
})
})
describe("#when message contains only AI slop comments", () => {
test("#then should keep the entire message", () => {
const message = buildMessage([
{ line: 2, text: "// Added new validation logic" },
{ line: 8, text: "// Refactored for better performance" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(true)
expect(result.filteredMessage).toBe(message)
})
})
describe("#when message contains mix of legitimate and slop comments", () => {
test("#then should keep message but remove allowed comment XML entries", () => {
const message = buildMessage([
{ line: 5, text: "// Note: Thread-safe implementation" },
{ line: 10, text: "// Changed from old API to new API" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(true)
expect(result.filteredMessage).not.toContain("Thread-safe implementation")
expect(result.filteredMessage).toContain("Changed from old API to new API")
})
})
describe("#when Note: comment has lowercase prefix", () => {
test("#then should still be treated as allowed (case-insensitive)", () => {
const message = buildMessage([
{ line: 1, text: "// note: this is case insensitive" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
})
})
describe("#when comment uses hash prefix", () => {
test("#then should strip prefix before matching", () => {
const message = buildMessage([
{ line: 1, text: "# Note: Python style comment" },
{ line: 5, text: "# TODO: something to do" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
})
})
describe("#when comment has Security: prefix", () => {
test("#then should be treated as allowed", () => {
const message = buildMessage([
{ line: 1, text: "// Security: validate input before processing" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
})
})
describe("#when comment has Warning: prefix", () => {
test("#then should be treated as allowed", () => {
const message = buildMessage([
{ line: 1, text: "// WARNING: This mutates the input array" },
])
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
})
})
})
describe("#given empty allowed prefixes", () => {
describe("#when any comments are detected", () => {
test("#then should pass through unfiltered", () => {
const message = buildMessage([
{ line: 1, text: "// Note: this should pass through" },
])
const result = filterAllowedComments(message, [])
expect(result.hasRemainingComments).toBe(true)
expect(result.filteredMessage).toBe(message)
})
})
})
describe("#given custom allowed prefixes", () => {
describe("#when comment matches custom prefix", () => {
test("#then should suppress it", () => {
const message = buildMessage([
{ line: 1, text: "// PERF: O(n log n) complexity" },
])
const result = filterAllowedComments(message, ["perf:"])
expect(result.hasRemainingComments).toBe(false)
})
})
})
describe("#given empty message", () => {
describe("#when filterAllowedComments is called", () => {
test("#then should return hasRemainingComments true with empty string", () => {
const result = filterAllowedComments("", DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(true)
expect(result.filteredMessage).toBe("")
})
})
})
describe("#given message with agent memo header", () => {
describe("#when all flagged comments are legitimate Note: comments", () => {
test("#then should suppress agent memo header along with comments", () => {
const message =
"🚨 AGENT MEMO COMMENT DETECTED - CODE SMELL ALERT 🚨\n\n" +
"⚠️ AGENT MEMO COMMENTS DETECTED - THIS IS A CODE SMELL ⚠️\n\n" +
"You left \"memo-style\" comments...\n\n---\n\n" +
"Your recent changes contain comments...\n" +
"Detected comments/docstrings:\n" +
'<comments file="/tmp/test.ts">\n' +
'\t<comment line-number="5">// Note: Thread-safe</comment>\n' +
"</comments>\n"
const result = filterAllowedComments(message, DEFAULT_PREFIXES)
expect(result.hasRemainingComments).toBe(false)
expect(result.filteredMessage).toBe("")
})
})
})
})
```
## Change 6: Update existing test for new parameter
**File: `src/hooks/comment-checker/hook.apply-patch.test.ts`**
The `processApplyPatchEditsWithCli` mock needs to account for the new `allowedPrefixes` parameter:
```typescript
// BEFORE (line 58)
expect(processApplyPatchEditsWithCli).toHaveBeenCalledWith(
"ses_test",
[
{ filePath: "/repo/src/a.ts", before: "const a = 1\n", after: "// comment\nconst a = 1\n" },
{ filePath: "/repo/src/new.ts", before: "const b = 1\n", after: "// moved comment\nconst b = 1\n" },
],
expect.any(Object),
"/tmp/fake-comment-checker",
undefined,
expect.any(Function),
)
// AFTER - add allowed_comment_prefixes argument
expect(processApplyPatchEditsWithCli).toHaveBeenCalledWith(
"ses_test",
[
{ filePath: "/repo/src/a.ts", before: "const a = 1\n", after: "// comment\nconst a = 1\n" },
{ filePath: "/repo/src/new.ts", before: "const b = 1\n", after: "// moved comment\nconst b = 1\n" },
],
expect.any(Object),
"/tmp/fake-comment-checker",
undefined,
expect.any(Array),
expect.any(Function),
)
```
## Summary of all touched files
| File | Action | Description |
|------|--------|-------------|
| `src/config/schema/comment-checker.ts` | Modified | Add `allowed_comment_prefixes` with defaults |
| `src/hooks/comment-checker/allowed-prefix-filter.ts` | **New** | Post-processing filter for legitimate comment prefixes |
| `src/hooks/comment-checker/allowed-prefix-filter.test.ts` | **New** | 11 test cases covering false positives and edge cases |
| `src/hooks/comment-checker/cli-runner.ts` | Modified | Thread `allowedPrefixes` param, apply filter after binary result |
| `src/hooks/comment-checker/hook.ts` | Modified | Pass `allowed_comment_prefixes` from config to CLI runner |
| `src/hooks/comment-checker/hook.apply-patch.test.ts` | Modified | Update mock assertions for new parameter |

View File

@@ -0,0 +1,127 @@
# Execution Plan: Relax comment-checker hook false positives
## Problem Analysis
The comment-checker hook delegates to an external Go binary (`code-yeongyu/go-claude-code-comment-checker`). The binary:
1. Detects ALL comments in written/edited code using tree-sitter
2. Filters out only BDD markers, linter directives, and shebangs
3. Flags every remaining comment as problematic (exit code 2)
4. In the output formatter (`formatter.go`), uses `AgentMemoFilter` to categorize comments for display
The `AgentMemoFilter` in `pkg/filters/agent_memo.go` contains the overly aggressive regex:
```go
regexp.MustCompile(`(?i)^[\s#/*-]*note:\s*\w`),
```
This matches ANY comment starting with `Note:` (case-insensitive) followed by a word character, causing legitimate comments like `// Note: Thread-safe implementation` or `// NOTE: See RFC 7231` to be classified as "AGENT MEMO" AI slop with an aggressive warning banner.
Additionally, the binary flags ALL non-filtered comments (not just agent memos), so even without the `Note:` regex, `// Note: ...` comments would still be flagged as generic "COMMENT DETECTED."
## Architecture Understanding
```
TypeScript (oh-my-opencode) Go Binary (go-claude-code-comment-checker)
───────────────────────────── ──────────────────────────────────────────
hook.ts main.go
├─ tool.execute.before ├─ Read JSON from stdin
│ └─ registerPendingCall() ├─ Detect comments (tree-sitter)
└─ tool.execute.after ├─ applyFilters (BDD, Directive, Shebang)
└─ processWithCli() ├─ FormatHookMessage (uses AgentMemoFilter for display)
└─ runCommentChecker() └─ exit 0 (clean) or exit 2 (comments found, message on stderr)
└─ spawn binary, pipe JSON
└─ read stderr → message
└─ append to output
```
Key files in oh-my-opencode:
- `src/hooks/comment-checker/hook.ts` - Hook factory, registers before/after handlers
- `src/hooks/comment-checker/cli-runner.ts` - Orchestrates CLI invocation, semaphore
- `src/hooks/comment-checker/cli.ts` - Binary resolution, process spawning, timeout handling
- `src/hooks/comment-checker/types.ts` - PendingCall, CommentInfo types
- `src/config/schema/comment-checker.ts` - Config schema (currently only `custom_prompt`)
Key files in Go binary:
- `pkg/filters/agent_memo.go` - Contains the aggressive `note:\s*\w` regex (line 20)
- `pkg/output/formatter.go` - Uses AgentMemoFilter to add "AGENT MEMO" warnings
- `cmd/comment-checker/main.go` - Filter pipeline (BDD + Directive + Shebang only)
## Step-by-Step Plan
### Step 1: Create feature branch
```bash
git checkout dev
git pull origin dev
git checkout -b fix/comment-checker-note-false-positive
```
### Step 2: Extend CommentCheckerConfigSchema
**File: `src/config/schema/comment-checker.ts`**
Add `allowed_comment_prefixes` field with sensible defaults. This lets users configure which comment prefixes should be treated as legitimate (not AI slop).
### Step 3: Add a post-processing filter in cli-runner.ts
**File: `src/hooks/comment-checker/cli-runner.ts`**
After the Go binary returns its result, parse the stderr message to identify and suppress comments that match allowed prefixes. The binary's output contains XML like:
```xml
<comments file="/path/to/file.ts">
<comment line-number="5">// Note: Thread-safe</comment>
</comments>
```
Add a function `filterAllowedComments()` that:
1. Extracts `<comment>` elements from the message
2. Checks if the comment text matches any allowed prefix pattern
3. If ALL flagged comments match allowed patterns, suppress the entire warning
4. If some comments are legitimate and some aren't, rebuild the message without the legitimate ones
### Step 4: Create dedicated filter module
**File: `src/hooks/comment-checker/allowed-prefix-filter.ts`** (new)
Extract the filtering logic into its own module per the 200 LOC / single-responsibility rule.
### Step 5: Pass allowed_comment_prefixes through the hook chain
**File: `src/hooks/comment-checker/hook.ts`**
Thread the `allowed_comment_prefixes` config from `createCommentCheckerHooks()` down to `processWithCli()` and `processApplyPatchEditsWithCli()`.
### Step 6: Add test cases
**File: `src/hooks/comment-checker/allowed-prefix-filter.test.ts`** (new)
Test cases covering:
- `// Note: Thread-safe implementation` - should NOT be flagged (false positive)
- `// NOTE: See RFC 7231 for details` - should NOT be flagged
- `// Note: changed from X to Y` - SHOULD still be flagged (genuine AI slop)
- `// TODO: implement caching` - should NOT be flagged
- `// FIXME: race condition` - should NOT be flagged
- `// HACK: workaround for upstream bug` - should NOT be flagged
- `// Added new validation logic` - SHOULD be flagged
- Custom allowed patterns from config
**File: `src/hooks/comment-checker/cli-runner.test.ts`** (new or extend cli.test.ts)
Integration-level tests for the post-processing pipeline.
### Step 7: Verify
```bash
bun test src/hooks/comment-checker/
bun run typecheck
```
### Step 8: Commit and push
```bash
git add -A
git commit -m "fix(comment-checker): add allowed-prefix filter to reduce false positives on Note: comments"
git push -u origin fix/comment-checker-note-false-positive
```
### Step 9: Create PR
```bash
gh pr create --title "fix(comment-checker): reduce false positives for legitimate Note: comments" --body-file /tmp/pr-body.md --base dev
```
### Step 10 (Follow-up): Upstream Go binary fix
File an issue or PR on `code-yeongyu/go-claude-code-comment-checker` to:
1. Relax `(?i)^[\s#/*-]*note:\s*\w` to be more specific (e.g., `note:\s*(changed|modified|updated|added|removed|implemented|refactored)`)
2. Add a dedicated `LegitimateCommentFilter` to the filter pipeline in `main.go`
3. Support `--allow-prefix` CLI flag for external configuration

View File

@@ -0,0 +1,42 @@
## Summary
- Add `allowed_comment_prefixes` config to `CommentCheckerConfigSchema` with sensible defaults (Note:, TODO:, FIXME:, HACK:, WARNING:, etc.)
- Add post-processing filter in `allowed-prefix-filter.ts` that suppresses false positives from the Go binary's output before appending to tool output
- Add 11 test cases covering false positive scenarios (Note:, TODO:, FIXME:, case-insensitivity, mixed comments, agent memo header suppression)
## Problem
The comment-checker hook's upstream Go binary (`go-claude-code-comment-checker`) flags ALL non-filtered comments as problematic. Its `AgentMemoFilter` regex `(?i)^[\s#/*-]*note:\s*\w` classifies any `Note:` comment as AI-generated "agent memo" slop, triggering an aggressive warning banner.
This causes false positives for legitimate, widely-used comment patterns:
```typescript
// Note: Thread-safe implementation required due to concurrent access
// NOTE: See RFC 7231 section 6.5.4 for 404 semantics
// Note: This timeout matches the upstream service SLA
```
These are standard engineering documentation patterns, not AI slop.
## Solution
Rather than waiting for an upstream binary fix, this PR adds a configurable **post-processing filter** on the TypeScript side:
1. **Config**: `comment_checker.allowed_comment_prefixes` - array of case-insensitive prefixes (defaults: `note:`, `todo:`, `fixme:`, `hack:`, `warning:`, `important:`, `bug:`, etc.)
2. **Filter**: After the Go binary returns flagged comments, `filterAllowedComments()` parses the XML output and suppresses comments matching allowed prefixes
3. **Behavior**: If ALL flagged comments are legitimate → suppress entire warning. If mixed → remove only the legitimate entries from the XML, keep the warning for actual slop.
Users can customize via config:
```jsonc
{
"comment_checker": {
"allowed_comment_prefixes": ["note:", "todo:", "fixme:", "custom-prefix:"]
}
}
```
## Test Plan
- 11 new test cases in `allowed-prefix-filter.test.ts`
- Updated assertion in `hook.apply-patch.test.ts` for new parameter
- `bun test src/hooks/comment-checker/` passes
- `bun run typecheck` clean

View File

@@ -0,0 +1,120 @@
# Verification Strategy
## 1. Unit Tests
### New test file: `allowed-prefix-filter.test.ts`
Run: `bun test src/hooks/comment-checker/allowed-prefix-filter.test.ts`
| # | Scenario | Input | Expected |
|---|----------|-------|----------|
| 1 | Only Note: comments (default prefixes) | `// Note: Thread-safe`, `// NOTE: See RFC` | `hasRemainingComments: false`, empty message |
| 2 | Only TODO/FIXME/HACK (default prefixes) | `// TODO: impl`, `// FIXME: race`, `# HACK: workaround` | Suppressed |
| 3 | Only AI slop comments | `// Added validation`, `// Refactored for perf` | Full message preserved |
| 4 | Mixed legitimate + slop | `// Note: Thread-safe`, `// Changed from old to new` | Message kept, Note: entry removed from XML |
| 5 | Case-insensitive Note: | `// note: lowercase test` | Suppressed |
| 6 | Hash-prefixed comments | `# Note: Python`, `# TODO: something` | Suppressed (prefix stripped before matching) |
| 7 | Security: prefix | `// Security: validate input` | Suppressed |
| 8 | Warning: prefix | `// WARNING: mutates input` | Suppressed |
| 9 | Empty allowed prefixes | `// Note: should pass through` | Full message preserved (no filtering) |
| 10 | Custom prefix | `// PERF: O(n log n)` with `["perf:"]` | Suppressed |
| 11 | Agent memo header + Note: | Full agent memo banner + `// Note: Thread-safe` | Entire message suppressed including banner |
### Existing test: `hook.apply-patch.test.ts`
Run: `bun test src/hooks/comment-checker/hook.apply-patch.test.ts`
Verify the updated mock assertion accepts the new `allowedPrefixes` array parameter.
### Existing test: `cli.test.ts`
Run: `bun test src/hooks/comment-checker/cli.test.ts`
Verify no regressions in binary spawning, timeout, and semaphore logic.
## 2. Type Checking
```bash
bun run typecheck
```
Verify:
- `CommentCheckerConfigSchema` change propagates correctly to `CommentCheckerConfig` type
- All call sites in `hook.ts` and `cli-runner.ts` pass the new parameter
- `filterAllowedComments` return type matches usage in `cli-runner.ts`
- No new type errors introduced
## 3. LSP Diagnostics
```bash
# Check all changed files for errors
lsp_diagnostics src/config/schema/comment-checker.ts
lsp_diagnostics src/hooks/comment-checker/allowed-prefix-filter.ts
lsp_diagnostics src/hooks/comment-checker/cli-runner.ts
lsp_diagnostics src/hooks/comment-checker/hook.ts
lsp_diagnostics src/hooks/comment-checker/allowed-prefix-filter.test.ts
```
## 4. Full Test Suite
```bash
bun test src/hooks/comment-checker/
```
All 4 test files should pass:
- `cli.test.ts` (existing - no regressions)
- `pending-calls.test.ts` (existing - no regressions)
- `hook.apply-patch.test.ts` (modified assertion)
- `allowed-prefix-filter.test.ts` (new - all 11 cases)
## 5. Build Verification
```bash
bun run build
```
Ensure the new module is properly bundled and exported.
## 6. Integration Verification (Manual)
If binary is available locally:
```bash
# Test with a file containing Note: comment
echo '{"session_id":"test","tool_name":"Write","transcript_path":"","cwd":"/tmp","hook_event_name":"PostToolUse","tool_input":{"file_path":"/tmp/test.ts","content":"// Note: Thread-safe implementation\nconst x = 1"}}' | ~/.cache/oh-my-opencode/bin/comment-checker check
echo "Exit code: $?"
```
Expected: Binary returns exit 2 (comment detected), but the TypeScript post-filter should suppress it.
## 7. Config Validation
Test that config changes work:
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"comment_checker": {
// Override: only allow Note: and TODO:
"allowed_comment_prefixes": ["note:", "todo:"]
}
}
```
Verify Zod schema accepts the config and defaults are applied when field is omitted.
## 8. Regression Checks
Verify the following still work correctly:
- AI slop comments (`// Added new feature`, `// Refactored for performance`) are still flagged
- BDD comments (`// given`, `// when`, `// then`) are still allowed (binary-side filter)
- Linter directives (`// eslint-disable`, `// @ts-ignore`) are still allowed (binary-side filter)
- Shebangs (`#!/usr/bin/env node`) are still allowed (binary-side filter)
- `custom_prompt` config still works
- Semaphore prevents concurrent comment-checker runs
- Timeout handling (30s) still works
## 9. Edge Cases to Watch
- Empty message from binary (exit code 0) - filter should be no-op
- Binary not available - hook gracefully degrades (existing behavior)
- Message with no `<comment>` XML elements - filter passes through
- Very long messages with many comments - regex performance
- Comments containing XML-special characters (`<`, `>`, `&`) in text

View File

@@ -0,0 +1 @@
{"total_tokens": null, "duration_ms": 399000, "total_duration_seconds": 399}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,360 @@
---
name: work-with-pr
description: "Full PR lifecycle: git worktree → implement → atomic commits → PR creation → verification loop (CI + review-work + Cubic approval) → merge. Keeps iterating until ALL gates pass and PR is merged. Worktree auto-cleanup after merge. Use whenever implementation work needs to land as a PR. Triggers: 'create a PR', 'implement and PR', 'work on this and make a PR', 'implement issue', 'land this as a PR', 'work-with-pr', 'PR workflow', 'implement end to end', even when user just says 'implement X' if the context implies PR delivery."
---
# Work With PR — Full PR Lifecycle
You are executing a complete PR lifecycle: from isolated worktree setup through implementation, PR creation, and an unbounded verification loop until the PR is merged. The loop has three gates — CI, review-work, and Cubic — and you keep fixing and pushing until all three pass simultaneously.
<architecture>
```
Phase 0: Setup → Branch + worktree in sibling directory
Phase 1: Implement → Do the work, atomic commits
Phase 2: PR Creation → Push, create PR targeting dev
Phase 3: Verify Loop → Unbounded iteration until ALL gates pass:
├─ Gate A: CI → gh pr checks (bun test, typecheck, build)
├─ Gate B: review-work → 5-agent parallel review
└─ Gate C: Cubic → cubic-dev-ai[bot] "No issues found"
Phase 4: Merge → Squash merge, worktree cleanup
```
</architecture>
---
## Phase 0: Setup
Create an isolated worktree so the user's main working directory stays clean. This matters because the user may have uncommitted work, and checking out a branch would destroy it.
<setup>
### 1. Resolve repository context
```bash
REPO=$(gh repo view --json nameWithOwner -q .nameWithOwner)
REPO_NAME=$(basename "$PWD")
BASE_BRANCH="dev" # CI blocks PRs to master
```
### 2. Create branch
If user provides a branch name, use it. Otherwise, derive from the task:
```bash
# Auto-generate: feature/short-description or fix/short-description
BRANCH_NAME="feature/$(echo "$TASK_SUMMARY" | tr '[:upper:] ' '[:lower:]-' | head -c 50)"
git fetch origin "$BASE_BRANCH"
git branch "$BRANCH_NAME" "origin/$BASE_BRANCH"
```
### 3. Create worktree
Place worktrees as siblings to the repo — not inside it. This avoids git nested repo issues and keeps the working tree clean.
```bash
WORKTREE_PATH="../${REPO_NAME}-wt/${BRANCH_NAME}"
mkdir -p "$(dirname "$WORKTREE_PATH")"
git worktree add "$WORKTREE_PATH" "$BRANCH_NAME"
```
### 4. Set working context
All subsequent work happens inside the worktree. Install dependencies if needed:
```bash
cd "$WORKTREE_PATH"
# If bun project:
[ -f "bun.lock" ] && bun install
```
</setup>
---
## Phase 1: Implement
Do the actual implementation work inside the worktree. The agent using this skill does the work directly — no subagent delegation for the implementation itself.
**Scope discipline**: For bug fixes, stay minimal. Fix the bug, add a test for it, done. Do not refactor surrounding code, add config options, or "improve" things that aren't broken. The verification loop will catch regressions — trust the process.
<implementation>
### Commit strategy
Use the git-master skill's atomic commit principles. The reason for atomic commits: if CI fails on one change, you can isolate and fix it without unwinding everything.
```
3+ files changed → 2+ commits minimum
5+ files changed → 3+ commits minimum
10+ files changed → 5+ commits minimum
```
Each commit should pair implementation with its tests. Load `git-master` skill when committing:
```
task(category="quick", load_skills=["git-master"], prompt="Commit the changes atomically following git-master conventions. Repository is at {WORKTREE_PATH}.")
```
### Pre-push local validation
Before pushing, run the same checks CI will run. Catching failures locally saves a full CI round-trip (~3-5 min):
```bash
bun run typecheck
bun test
bun run build
```
Fix any failures before pushing. Each fix-commit cycle should be atomic.
</implementation>
---
## Phase 2: PR Creation
<pr_creation>
### Push and create PR
```bash
git push -u origin "$BRANCH_NAME"
```
Create the PR using the project's template structure:
```bash
gh pr create \
--base "$BASE_BRANCH" \
--head "$BRANCH_NAME" \
--title "$PR_TITLE" \
--body "$(cat <<'EOF'
## Summary
[1-3 sentences describing what this PR does and why]
## Changes
[Bullet list of key changes]
## Testing
- `bun run typecheck` ✅
- `bun test` ✅
- `bun run build` ✅
## Related Issues
[Link to issue if applicable]
EOF
)"
```
Capture the PR number:
```bash
PR_NUMBER=$(gh pr view --json number -q .number)
```
</pr_creation>
---
## Phase 3: Verification Loop
This is the core of the skill. Three gates must ALL pass for the PR to be ready. The loop has no iteration cap — keep going until done. Gate ordering is intentional: CI is cheapest/fastest, review-work is most thorough, Cubic is external and asynchronous.
<verify_loop>
```
while true:
1. Wait for CI → Gate A
2. If CI fails → read logs, fix, commit, push, continue
3. Run review-work → Gate B
4. If review fails → fix blocking issues, commit, push, continue
5. Check Cubic → Gate C
6. If Cubic has issues → fix issues, commit, push, continue
7. All three pass → break
```
### Gate A: CI Checks
CI is the fastest feedback loop. Wait for it to complete, then parse results.
```bash
# Wait for checks to start (GitHub needs a moment after push)
# Then watch for completion
gh pr checks "$PR_NUMBER" --watch --fail-fast
```
**On failure**: Get the failed run logs to understand what broke:
```bash
# Find the failed run
RUN_ID=$(gh run list --branch "$BRANCH_NAME" --status failure --json databaseId --jq '.[0].databaseId')
# Get failed job logs
gh run view "$RUN_ID" --log-failed
```
Read the logs, fix the issue, commit atomically, push, and re-enter the loop.
### Gate B: review-work
The review-work skill launches 5 parallel sub-agents (goal verification, QA, code quality, security, context mining). All 5 must pass.
Invoke review-work after CI passes — there's no point reviewing code that doesn't build:
```
task(
category="unspecified-high",
load_skills=["review-work"],
run_in_background=false,
description="Post-implementation review of PR changes",
prompt="Review the implementation work on branch {BRANCH_NAME}. The worktree is at {WORKTREE_PATH}. Goal: {ORIGINAL_GOAL}. Constraints: {CONSTRAINTS}. Run command: bun run dev (or as appropriate)."
)
```
**On failure**: review-work reports blocking issues with specific files and line numbers. Fix each blocking issue, commit, push, and re-enter the loop from Gate A (since code changed, CI must re-run).
### Gate C: Cubic Approval
Cubic (`cubic-dev-ai[bot]`) is an automated review bot that comments on PRs. It does NOT use GitHub's APPROVED review state — instead it posts comments with issue counts and confidence scores.
**Approval signal**: The latest Cubic comment contains `**No issues found**` and confidence `**5/5**`.
**Issue signal**: The comment lists issues with file-level detail.
```bash
# Get the latest Cubic review
CUBIC_REVIEW=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews" \
--jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .body')
# Check if approved
if echo "$CUBIC_REVIEW" | grep -q "No issues found"; then
echo "Cubic: APPROVED"
else
echo "Cubic: ISSUES FOUND"
echo "$CUBIC_REVIEW"
fi
```
**On issues**: Cubic's review body contains structured issue descriptions. Parse them, determine which are valid (some may be false positives), fix the valid ones, commit, push, re-enter from Gate A.
Cubic reviews are triggered automatically on PR updates. After pushing a fix, wait for the new review to appear before checking again. Use `gh api` polling with a conditional loop:
```bash
# Wait for new Cubic review after push
PUSH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
while true; do
LATEST_REVIEW_TIME=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews" \
--jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .submitted_at')
if [[ "$LATEST_REVIEW_TIME" > "$PUSH_TIME" ]]; then
break
fi
# Use gh api call itself as the delay mechanism — each call takes ~1-2s
# For longer waits, use: timeout 30 gh pr checks "$PR_NUMBER" --watch 2>/dev/null || true
done
```
### Iteration discipline
Each iteration through the loop:
1. Fix ONLY the issues identified by the failing gate
2. Commit atomically (one logical fix per commit)
3. Push
4. Re-enter from Gate A (code changed → full re-verification)
Avoid the temptation to "improve" unrelated code during fix iterations. Scope creep in the fix loop makes debugging harder and can introduce new failures.
</verify_loop>
---
## Phase 4: Merge & Cleanup
Once all three gates pass:
<merge_cleanup>
### Merge the PR
```bash
# Squash merge to keep history clean
gh pr merge "$PR_NUMBER" --squash --delete-branch
```
### Sync .sisyphus state back to main repo
Before removing the worktree, copy `.sisyphus/` state back. When `.sisyphus/` is gitignored, files written there during worktree execution are not committed or merged — they would be lost on worktree removal.
```bash
# Sync .sisyphus state from worktree to main repo (preserves task state, plans, notepads)
if [ -d "$WORKTREE_PATH/.sisyphus" ]; then
mkdir -p "$ORIGINAL_DIR/.sisyphus"
cp -r "$WORKTREE_PATH/.sisyphus/"* "$ORIGINAL_DIR/.sisyphus/" 2>/dev/null || true
fi
```
### Clean up the worktree
The worktree served its purpose — remove it to avoid disk bloat:
```bash
cd "$ORIGINAL_DIR" # Return to original working directory
git worktree remove "$WORKTREE_PATH"
# Prune any stale worktree references
git worktree prune
```
### Report completion
Summarize what happened:
```
## PR Merged ✅
- **PR**: #{PR_NUMBER} — {PR_TITLE}
- **Branch**: {BRANCH_NAME} → {BASE_BRANCH}
- **Iterations**: {N} verification loops
- **Gates passed**: CI ✅ | review-work ✅ | Cubic ✅
- **Worktree**: cleaned up
```
</merge_cleanup>
---
## Failure Recovery
<failure_recovery>
If you hit an unrecoverable error (e.g., merge conflict with base branch, infrastructure failure):
1. **Do NOT delete the worktree** — the user may want to inspect or continue manually
2. Report what happened, what was attempted, and where things stand
3. Include the worktree path so the user can resume
For merge conflicts:
```bash
cd "$WORKTREE_PATH"
git fetch origin "$BASE_BRANCH"
git rebase "origin/$BASE_BRANCH"
# Resolve conflicts, then continue the loop
```
</failure_recovery>
---
## Anti-Patterns
| Violation | Why it fails | Severity |
|-----------|-------------|----------|
| Working in main worktree instead of isolated worktree | Pollutes user's working directory, may destroy uncommitted work | CRITICAL |
| Pushing directly to dev/master | Bypasses review entirely | CRITICAL |
| Skipping CI gate after code changes | review-work and Cubic may pass on stale code | CRITICAL |
| Fixing unrelated code during verification loop | Scope creep causes new failures | HIGH |
| Deleting worktree on failure | User loses ability to inspect/resume | HIGH |
| Ignoring Cubic false positives without justification | Cubic issues should be evaluated, not blindly dismissed | MEDIUM |
| Giant single commits | Harder to isolate failures, violates git-master principles | MEDIUM |
| Not running local checks before push | Wastes CI time on obvious failures | MEDIUM |

View File

@@ -4,7 +4,7 @@
## OVERVIEW
OpenCode plugin (npm: `oh-my-opencode`) that extends Claude Code (OpenCode fork) with multi-agent orchestration, 46 lifecycle hooks, 26 tools, skill/command/MCP systems, and Claude Code compatibility. 1268 TypeScript files, 160k LOC.
OpenCode plugin (npm: `oh-my-opencode`) that extends Claude Code (OpenCode fork) with multi-agent orchestration, 48 lifecycle hooks, 26 tools, skill/command/MCP systems, and Claude Code compatibility. 1268 TypeScript files, 160k LOC.
## STRUCTURE
@@ -14,14 +14,14 @@ oh-my-opencode/
│ ├── index.ts # Plugin entry: loadConfig → createManagers → createTools → createHooks → createPluginInterface
│ ├── plugin-config.ts # JSONC multi-level config: user → project → defaults (Zod v4)
│ ├── agents/ # 11 agents (Sisyphus, Hephaestus, Oracle, Librarian, Explore, Atlas, Prometheus, Metis, Momus, Multimodal-Looker, Sisyphus-Junior)
│ ├── hooks/ # 46 hooks across 45 directories + 11 standalone files
│ ├── hooks/ # 48 lifecycle hooks across dedicated modules and standalone files
│ ├── tools/ # 26 tools across 15 directories
│ ├── features/ # 19 feature modules (background-agent, skill-loader, tmux, MCP-OAuth, etc.)
│ ├── shared/ # 95+ utility files in 13 categories
│ ├── config/ # Zod v4 schema system (24 files)
│ ├── cli/ # CLI: install, run, doctor, mcp-oauth (Commander.js)
│ ├── mcp/ # 3 built-in remote MCPs (websearch, context7, grep_app)
│ ├── plugin/ # 8 OpenCode hook handlers + 46 hook composition
│ ├── plugin/ # 8 OpenCode hook handlers + 48 hook composition
│ └── plugin-handlers/ # 6-phase config loading pipeline
├── packages/ # Monorepo: cli-runner, 12 platform binaries
└── local-ignore/ # Dev-only test fixtures
@@ -34,7 +34,7 @@ OhMyOpenCodePlugin(ctx)
├─→ loadPluginConfig() # JSONC parse → project/user merge → Zod validate → migrate
├─→ createManagers() # TmuxSessionManager, BackgroundManager, SkillMcpManager, ConfigHandler
├─→ createTools() # SkillContext + AvailableCategories + ToolRegistry (26 tools)
├─→ createHooks() # 3-tier: Core(37) + Continuation(7) + Skill(2) = 46 hooks
├─→ createHooks() # 3-tier: Core(39) + Continuation(7) + Skill(2) = 48 hooks
└─→ createPluginInterface() # 8 OpenCode hook handlers → PluginInterface
```
@@ -97,7 +97,7 @@ Fields: agents (14 overridable, 21 fields each), categories (8 built-in + custom
- **Test pattern**: Bun test (`bun:test`), co-located `*.test.ts`, given/when/then style (nested describe with `#given`/`#when`/`#then` prefixes)
- **CI test split**: mock-heavy tests run in isolation (separate `bun test` processes), rest in batch
- **Factory pattern**: `createXXX()` for all tools, hooks, agents
- **Hook tiers**: Session (23) → Tool-Guard (10) → Transform (4) → Continuation (7) → Skill (2)
- **Hook tiers**: Session (23) → Tool-Guard (12) → Transform (4) → Continuation (7) → Skill (2)
- **Agent modes**: `primary` (respects UI model) vs `subagent` (own fallback chain) vs `all`
- **Model resolution**: 4-step: override → category-default → provider-fallback → system-default
- **Config format**: JSONC with comments, Zod v4 validation, snake_case keys

122
FIX-BLOCKS.md Normal file
View File

@@ -0,0 +1,122 @@
# Pre-Publish BLOCK Issues: Fix ALL Before Release
Two independent pre-publish reviews (Opus 4.6 + GPT-5.4) both concluded **BLOCK -- do not publish**. You must fix ALL blocking issues below using UltraBrain parallel agents. Work TDD-style: write/update tests first, then fix, verify tests pass.
## Strategy
Use ultrawork (ulw) to spawn UltraBrain agents in parallel. Each UB agent gets a non-overlapping scope. After all agents complete, run bun test to verify everything passes. Commit atomically per fix group.
---
## CRITICAL BLOCKERS (must fix -- 6 items)
### C1: Hashline Backward Compatibility
**Problem:** Strict whitespace hashing in hashline changes LINE#ID values for indented lines. Breaks existing anchors in cached/persisted edit operations.
**Fix:** Add a compatibility shim -- when lookup by new hash fails, fall back to legacy hash (without strict whitespace). Or version the hash format.
**Files:** Look for hashline-related files in src/tools/ or src/shared/
### C2: OpenAI-Only Model Catalog Broken with OpenCode-Go
**Problem:** isOpenAiOnlyAvailability() does not exclude availability.opencodeGo. When OpenCode-Go is present, OpenAI-only detection is wrong -- models get misrouted.
**Fix:** Add !availability.opencodeGo check to isOpenAiOnlyAvailability().
**Files:** Model/provider system files -- search for isOpenAiOnlyAvailability
### C3: CLI/Runtime Model Table Divergence
**Problem:** Model tables disagree between CLI install-time and runtime:
- ultrabrain: gpt-5.3-codex in CLI vs gpt-5.4 in runtime
- atlas: claude-sonnet-4-5 in CLI vs claude-sonnet-4-6 in runtime
- unspecified-high also diverges
**Fix:** Reconcile all model tables. Pick the correct model for each and make CLI + runtime match.
**Files:** Search for model table definitions, agent configs, CLI model references
### C4: atlas/metis/sisyphus-junior Missing OpenAI Fallbacks
**Problem:** These agents can resolve to opencode/glm-4.7-free or undefined in OpenAI-only environments. No valid OpenAI fallback paths exist.
**Fix:** Add valid OpenAI model fallback paths for all agents that need them.
**Files:** Agent config/model resolution code
### C5: model_fallback Default Mismatch
**Problem:** Schema and docs say model_fallback defaults to false, but runtime treats unset as true. Silent behavior change for all users.
**Fix:** Align -- either update schema/docs to say true, or fix runtime to default to false. Check what the intended behavior is from git history.
**Files:** Schema definition, runtime config loading
### C6: background_output Default Changed
**Problem:** background_output now defaults to full_session=true. Old callers get different output format without code changes.
**Fix:** Either document this change clearly, or restore old default and make full_session opt-in.
**Files:** Background output handling code
---
## HIGH PRIORITY (strongly recommended -- 4 items)
### H1: Runtime Fallback session-status-handler Race
**Problem:** When fallback model is already pending, the handler cannot advance the chain on subsequent cooldown events.
**Fix:** Allow override like message-update-handler does.
**Files:** Search for session-status-handler, message-update-handler
### H2: Atlas Final-Wave Approval Gate Logic
**Problem:** Approval gate logic does not match real Prometheus plan structure (nested checkboxes, parallel execution). Trigger logic is wrong.
**Fix:** Update to handle real plan structures.
**Files:** Atlas agent code, approval gate logic
### H3: delegate-task-english-directive Dead Code
**Problem:** Not dispatched from tool-execute-before.ts + wrong hook signature. Either wire properly or remove entirely.
**Fix:** Remove if not needed (cleaner). If needed, fix dispatch + signature.
**Files:** src/hooks/, tool-execute-before.ts
### H4: Auto-Slash-Command Session-Lifetime Dedup
**Problem:** Dedup uses session lifetime, suppressing legitimate repeated identical commands.
**Fix:** Change to short TTL (e.g., 30 seconds) instead of session lifetime.
**Files:** Slash command handling code
---
## ADDITIONAL BLOCKERS FROM GPT-5.4 REVIEW
### G1: Package Identity Split-Brain
**Problem:** Installer writes oh-my-openagent but doctor, auto-update, version lookup, publish workflow still reference oh-my-opencode. Half-migrated state.
**Fix:** Audit ALL references to package name. Either complete the migration consistently or revert to single name for this release.
**Files:** Installer, doctor, auto-update, version lookup, publish workflow -- grep for both package names
### G2: OpenCode-Go --opencode-go Value Validation
**Problem:** No validation for --opencode-go CLI value. No detection of existing OpenCode-Go installations.
**Fix:** Add value validation + existing install detection.
**Files:** CLI option handling code
### G3: Skill/Hook Reference Errors
**Problem:**
- work-with-pr references non-existent git tool category
- github-triage references TaskCreate/TaskUpdate which are not real tool names
**Fix:** Fix tool references to use actual tool names.
**Files:** Skill definition files in .opencode/skills/
### G4: Stale Context-Limit Cache
**Problem:** Shared context-limit resolver caches provider config. When config changes, stale removed limits persist and corrupt compaction/truncation decisions.
**Fix:** Add cache invalidation when provider config changes, or make the resolver stateless.
**Files:** Context-limit resolver, compaction code
### G5: disabled_hooks Schema vs Runtime Contract Mismatch
**Problem:** Schema is strict (rejects unknown hook names) but runtime is permissive (ignores unknown). Contract disagreement.
**Fix:** Align -- either make both strict or both permissive.
**Files:** Hook schema definition, runtime hook loading
---
## EXECUTION INSTRUCTIONS
1. Spawn UltraBrain agents to fix these in parallel -- group by file proximity:
- UB-1: C1 (hashline) + H4 (slash-command dedup)
- UB-2: C2 + C3 + C4 (model/provider system) + G2
- UB-3: C5 + C6 (config defaults) + G5
- UB-4: H1 + H2 (runtime handlers + Atlas gate)
- UB-5: H3 + G3 (dead code + skill references)
- UB-6: G1 (package identity -- full audit)
- UB-7: G4 (context-limit cache)
2. Each UB agent MUST:
- Write or update tests FIRST (TDD)
- Implement the fix
- Run bun test on affected test files
- Commit with descriptive message
3. After all UB agents complete, run full bun test to verify no regressions.
ulw

View File

@@ -4,6 +4,17 @@
> コアメンテナーのQが負傷したため、今週は Issue/PR への返信とリリースが遅れる可能性があります。
> ご理解とご支援に感謝します。
> [!TIP]
> **Building in Public**
>
> メンテナーが Jobdori を使い、oh-my-opencode をリアルタイムで開発・メンテナンスしています。Jobdori は OpenClaw をベースに大幅カスタマイズされた AI アシスタントです。
> すべての機能開発、修正、Issue トリアージを Discord でライブでご覧いただけます。
>
> [![Building in Public](./.github/assets/building-in-public.png)](https://discord.gg/PUwSMR9XNk)
>
> [**→ #building-in-public で確認する**](https://discord.gg/PUwSMR9XNk)
> [!NOTE]
>
> [![Sisyphus Labs - Sisyphus is the agent that codes like your team.](./.github/assets/sisyphuslabs.png?v=2)](https://sisyphuslabs.ai)
@@ -157,7 +168,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) はあなたのメインのオーケストレーターです。計画を立て、専門家に委任し、攻撃的な並列実行でタスクを完了まで推進します。途中で投げ出すことはありません。
**Hephaestus** (`gpt-5.3-codex`) はあなたの自律的なディープワーカーです。レシピではなく、目標を与えてください。手取り足取り教えなくても、コードベースを探索し、パターンを研究し、端から端まで実行します。*正当なる職人 (The Legitimate Craftsman).*
**Hephaestus** (`gpt-5.4`) はあなたの自律的なディープワーカーです。レシピではなく、目標を与えてください。手取り足取り教えなくても、コードベースを探索し、パターンを研究し、端から端まで実行します。*正当なる職人 (The Legitimate Craftsman).*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) はあなたの戦略プランナーです。インタビューモードで動作し、コードに触れる前に質問をしてスコープを特定し、詳細な計画を構築します。
@@ -165,7 +176,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
> Anthropicが[私たちのせいでOpenCodeをブロックしました。](https://x.com/thdxr/status/2010149530486911014) だからこそHephaestusは「正当なる職人 (The Legitimate Craftsman)」と呼ばれているのです。皮肉を込めています。
>
> Opusで最もよく動きますが、Kimi K2.5 + GPT-5.3 Codexの組み合わせだけでも、バニラのClaude Codeを軽く凌駕します。設定は一切不要です。
> Opusで最もよく動きますが、Kimi K2.5 + GPT-5.4の組み合わせだけでも、バニラのClaude Codeを軽く凌駕します。設定は一切不要です。
### エージェントの<E38388><E381AE>ーケストレーション

View File

@@ -4,6 +4,17 @@
> 핵심 메인테이너 Q가 부상을 입어, 이번 주에는 이슈/PR 응답 및 릴리스가 지연될 수 있습니다.
> 양해와 응원에 감사드립니다.
> [!TIP]
> **Building in Public**
>
> 메인테이너가 Jobdori를 통해 oh-my-opencode를 실시간으로 개발하고 있습니다. Jobdori는 OpenClaw를 기반으로 대폭 커스터마이징된 AI 어시스턴트입니다.
> 모든 기능 개발, 버그 수정, 이슈 트리아지를 Discord에서 실시간으로 확인하세요.
>
> [![Building in Public](./.github/assets/building-in-public.png)](https://discord.gg/PUwSMR9XNk)
>
> [**→ #building-in-public에서 확인하기**](https://discord.gg/PUwSMR9XNk)
> [!TIP]
> 저희와 함께 하세요!
>
@@ -151,7 +162,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**)는 당신의 메인 오케스트레이터입니다. 공격적인 병렬 실행으로 계획을 세우고, 전문가들에게 위임하며, 완료될 때까지 밀어붙입니다. 중간에 포기하는 법이 없습니다.
**Hephaestus** (`gpt-5.3-codex`)는 당신의 자율 딥 워커입니다. 레시피가 아니라 목표를 주세요. 베이비시터 없이 알아서 코드베이스를 탐색하고, 패턴을 연구하며, 끝에서 끝까지 전부 해냅니다. *진정한 장인(The Legitimate Craftsman).*
**Hephaestus** (`gpt-5.4`)는 당신의 자율 딥 워커입니다. 레시피가 아니라 목표를 주세요. 베이비시터 없이 알아서 코드베이스를 탐색하고, 패턴을 연구하며, 끝에서 끝까지 전부 해냅니다. *진정한 장인(The Legitimate Craftsman).*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**)는 당신의 전략 플래너입니다. 인터뷰 모드로 작동합니다. 코드 한 줄 만지기 전에 질문을 던져 스코프를 파악하고 상세한 계획부터 세웁니다.
@@ -159,7 +170,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
> Anthropic이 [우리 때문에 OpenCode를 막아버렸습니다.](https://x.com/thdxr/status/2010149530486911014) 그래서 Hephaestus의 별명이 "진정한 장인(The Legitimate Craftsman)"인 겁니다. (어디서 많이 들어본 이름이죠?) 아이러니를 노렸습니다.
>
> Opus에서 제일 잘 돌아가긴 하지만, Kimi K2.5 + GPT-5.3 Codex 조합만으로도 바닐라 Claude Code는 가볍게 바릅니다. 설정도 필요 없습니다.
> Opus에서 제일 잘 돌아가긴 하지만, Kimi K2.5 + GPT-5.4 조합만으로도 바닐라 Claude Code는 가볍게 바릅니다. 설정도 필요 없습니다.
### 에이전트 오케스트레이션

View File

@@ -1,8 +1,12 @@
> [!WARNING]
> **TEMP NOTICE (This Week): Reduced Maintainer Availability**
> [!TIP]
> **Building in Public**
>
> Core maintainer Q got injured, so issue/PR responses and releases may be delayed this week.
> Thank you for your patience and support.
> The maintainer builds and maintains oh-my-opencode in real-time with Jobdori, an AI assistant built on a heavily customized fork of OpenClaw.
> Every feature, every fix, every issue triage — live in our Discord.
>
> [![Building in Public](./.github/assets/building-in-public.png)](https://discord.gg/PUwSMR9XNk)
>
> [**→ Watch it happen in #building-in-public**](https://discord.gg/PUwSMR9XNk)
> [!NOTE]
>
@@ -37,7 +41,7 @@
<div align="center">
[![GitHub Release](https://img.shields.io/github/v/release/code-yeongyu/oh-my-openagent?color=369eff&labelColor=black&logo=github&style=flat-square)](https://github.com/code-yeongyu/oh-my-openagent/releases)
[![npm downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fohmyopenagent.com%2Fapi%2Fnpm-downloads&style=flat-square)](https://www.npmjs.com/package/oh-my-openagent)
[![npm downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fohmyopenagent.com%2Fapi%2Fnpm-downloads&style=flat-square)](https://www.npmjs.com/package/oh-my-opencode)
[![GitHub Contributors](https://img.shields.io/github/contributors/code-yeongyu/oh-my-openagent?color=c4f042&labelColor=black&style=flat-square)](https://github.com/code-yeongyu/oh-my-openagent/graphs/contributors)
[![GitHub Forks](https://img.shields.io/github/forks/code-yeongyu/oh-my-openagent?color=8ae8ff&labelColor=black&style=flat-square)](https://github.com/code-yeongyu/oh-my-openagent/network/members)
[![GitHub Stars](https://img.shields.io/github/stars/code-yeongyu/oh-my-openagent?color=ffcb47&labelColor=black&style=flat-square)](https://github.com/code-yeongyu/oh-my-openagent/stargazers)
@@ -107,6 +111,8 @@ Fetch the installation guide and follow it:
curl -s https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/refs/heads/dev/docs/guide/installation.md
```
**Note**: Use the published package and binary name `oh-my-opencode`. Inside `opencode.json`, the compatibility layer now prefers the plugin entry `oh-my-openagent`, while legacy `oh-my-opencode` entries still load with a warning. Plugin config files still commonly use `oh-my-opencode.json` or `oh-my-opencode.jsonc`, and both legacy and renamed basenames are recognized during the transition.
---
## Skip This README
@@ -160,7 +166,7 @@ Even only with following subscriptions, ultrawork will work well (this project i
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`** ) is your main orchestrator. He plans, delegates to specialists, and drives tasks to completion with aggressive parallel execution. He does not stop halfway.
**Hephaestus** (`gpt-5.3-codex`) is your autonomous deep worker. Give him a goal, not a recipe. He explores the codebase, researches patterns, and executes end-to-end without hand-holding. *The Legitimate Craftsman.*
**Hephaestus** (`gpt-5.4`) is your autonomous deep worker. Give him a goal, not a recipe. He explores the codebase, researches patterns, and executes end-to-end without hand-holding. *The Legitimate Craftsman.*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`** ) is your strategic planner. Interview mode: it questions, identifies scope, and builds a detailed plan before a single line of code is touched.
@@ -168,7 +174,7 @@ Every agent is tuned to its model's specific strengths. No manual model-juggling
> Anthropic [blocked OpenCode because of us.](https://x.com/thdxr/status/2010149530486911014) That's why Hephaestus is called "The Legitimate Craftsman." The irony is intentional.
>
> We run best on Opus, but Kimi K2.5 + GPT-5.3 Codex already beats vanilla Claude Code. Zero config needed.
> We run best on Opus, but Kimi K2.5 + GPT-5.4 already beats vanilla Claude Code. Zero config needed.
### Agent Orchestration
@@ -181,7 +187,7 @@ When Sisyphus delegates to a subagent, it doesn't pick a model. It picks a **cat
| `quick` | Single-file changes, typos |
| `ultrabrain` | Hard logic, architecture decisions |
Agent says what kind of work. Harness picks the right model. You touch nothing.
Agent says what kind of work. Harness picks the right model. `ultrabrain` now routes to GPT-5.4 xhigh by default. You touch nothing.
### Claude Code Compatibility
@@ -269,11 +275,11 @@ To remove oh-my-opencode:
1. **Remove the plugin from your OpenCode config**
Edit `~/.config/opencode/opencode.json` (or `opencode.jsonc`) and remove `"oh-my-opencode"` from the `plugin` array:
Edit `~/.config/opencode/opencode.json` (or `opencode.jsonc`) and remove either `"oh-my-openagent"` or the legacy `"oh-my-opencode"` entry from the `plugin` array:
```bash
# Using jq
jq '.plugin = [.plugin[] | select(. != "oh-my-opencode")]' \
jq '.plugin = [.plugin[] | select(. != "oh-my-openagent" and . != "oh-my-opencode")]' \
~/.config/opencode/opencode.json > /tmp/oc.json && \
mv /tmp/oc.json ~/.config/opencode/opencode.json
```
@@ -281,11 +287,13 @@ To remove oh-my-opencode:
2. **Remove configuration files (optional)**
```bash
# Remove user config
rm -f ~/.config/opencode/oh-my-opencode.json ~/.config/opencode/oh-my-opencode.jsonc
# Remove plugin config files recognized during the compatibility window
rm -f ~/.config/opencode/oh-my-openagent.jsonc ~/.config/opencode/oh-my-openagent.json \
~/.config/opencode/oh-my-opencode.jsonc ~/.config/opencode/oh-my-opencode.json
# Remove project config (if exists)
rm -f .opencode/oh-my-opencode.json .opencode/oh-my-opencode.jsonc
rm -f .opencode/oh-my-openagent.jsonc .opencode/oh-my-openagent.json \
.opencode/oh-my-opencode.jsonc .opencode/oh-my-opencode.json
```
3. **Verify removal**
@@ -310,7 +318,11 @@ See full [Features Documentation](docs/reference/features.md).
- **Claude Code Compatibility**: Full hook system, commands, skills, agents, MCPs
- **Built-in MCPs**: websearch (Exa), context7 (docs), grep_app (GitHub search)
- **Session Tools**: List, read, search, and analyze session history
- **Productivity Features**: Ralph Loop, Todo Enforcer, GPT permission-tail continuation, Comment Checker, Think Mode, and more
- **Productivity Features**: Ralph Loop, Todo Enforcer, Comment Checker, Think Mode, and more
- **Doctor Command**: Built-in diagnostics (`bunx oh-my-opencode doctor`) verify plugin registration, config, models, and environment
- **Model Fallbacks**: `fallback_models` can mix plain model strings with per-fallback object settings in the same array
- **File Prompts**: Load prompts from files with `file://` support in agent configurations
- **Session Recovery**: Automatic recovery from session errors, context window limits, and API failures
- **Model Setup**: Agent-model matching is built into the [Installation Guide](docs/guide/installation.md#step-5-understand-your-model-setup)
## Configuration
@@ -320,14 +332,14 @@ Opinionated defaults, adjustable if you insist.
See [Configuration Documentation](docs/reference/configuration.md).
**Quick Overview:**
- **Config Locations**: `.opencode/oh-my-opencode.jsonc` or `.opencode/oh-my-opencode.json` (project), `~/.config/opencode/oh-my-opencode.jsonc` or `~/.config/opencode/oh-my-opencode.json` (user)
- **Config Locations**: The compatibility layer recognizes both `oh-my-openagent.json[c]` and legacy `oh-my-opencode.json[c]` plugin config files. Existing installs still commonly use the legacy basename.
- **JSONC Support**: Comments and trailing commas supported
- **Agents**: Override models, temperatures, prompts, and permissions for any agent
- **Built-in Skills**: `playwright` (browser automation), `git-master` (atomic commits)
- **Sisyphus Agent**: Main orchestrator with Prometheus (Planner) and Metis (Plan Consultant)
- **Background Tasks**: Configure concurrency limits per provider/model
- **Categories**: Domain-specific task delegation (`visual`, `business-logic`, custom)
- **Hooks**: 25+ built-in hooks, including `gpt-permission-continuation`, all configurable via `disabled_hooks`
- **Hooks**: 25+ built-in hooks, all configurable via `disabled_hooks`
- **MCPs**: Built-in websearch (Exa), context7 (docs), grep_app (GitHub search)
- **LSP**: Full LSP support with refactoring tools
- **Experimental**: Aggressive truncation, auto-resume, and more

View File

@@ -4,6 +4,17 @@
> Ключевой мейнтейнер Q получил травму, поэтому на этой неделе ответы по issue/PR и релизы могут задерживаться.
> Спасибо за терпение и поддержку.
> [!TIP]
> **Building in Public**
>
> Мейнтейнер разрабатывает и поддерживает oh-my-opencode в режиме реального времени с помощью Jobdori — ИИ-ассистента на базе глубоко кастомизированной версии OpenClaw.
> Каждая фича, каждый фикс, каждый триаж issue — в прямом эфире в нашем Discord.
>
> [![Building in Public](./.github/assets/building-in-public.png)](https://discord.gg/PUwSMR9XNk)
>
> [**→ Смотрите в #building-in-public**](https://discord.gg/PUwSMR9XNk)
> [!NOTE]
>
> [![Sisyphus Labs - Sisyphus is the agent that codes like your team.](./.github/assets/sisyphuslabs.png?v=2)](https://sisyphuslabs.ai)
@@ -141,7 +152,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) — главный оркестратор. Он планирует, делегирует задачи специалистам и доводит их до завершения с агрессивным параллельным выполнением. Он не останавливается на полпути.
**Hephaestus** (`gpt-5.3-codex`) — автономный глубокий исполнитель. Дайте ему цель, а не рецепт. Он исследует кодовую базу, изучает паттерны и выполняет задачи сквозным образом без лишних подсказок. *Законный Мастер.*
**Hephaestus** (`gpt-5.4`) — автономный глубокий исполнитель. Дайте ему цель, а не рецепт. Он исследует кодовую базу, изучает паттерны и выполняет задачи сквозным образом без лишних подсказок. *Законный Мастер.*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) — стратегический планировщик. Режим интервью: задаёт вопросы, определяет объём работ и формирует детальный план до того, как написана хотя бы одна строка кода.
@@ -149,7 +160,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
> Anthropic [заблокировал OpenCode из-за нас.](https://x.com/thdxr/status/2010149530486911014) Именно поэтому Hephaestus зовётся «Законным Мастером». Ирония намеренная.
>
> Мы работаем лучше всего на Opus, но Kimi K2.5 + GPT-5.3 Codex уже превосходят ванильный Claude Code. Никакой настройки не требуется.
> Мы работаем лучше всего на Opus, но Kimi K2.5 + GPT-5.4 уже превосходят ванильный Claude Code. Никакой настройки не требуется.
### Оркестрация агентов

View File

@@ -4,6 +4,17 @@
> 核心维护者 Q 因受伤,本周 issue/PR 回复和发布可能会延迟。
> 感谢你的耐心与支持。
> [!TIP]
> **Building in Public**
>
> 维护者正在使用 Jobdori 实时开发和维护 oh-my-opencode。Jobdori 是基于 OpenClaw 深度定制的 AI 助手。
> 每个功能开发、每次修复、每次 Issue 分类,都在 Discord 上实时进行。
>
> [![Building in Public](./.github/assets/building-in-public.png)](https://discord.gg/PUwSMR9XNk)
>
> [**→ 在 #building-in-public 频道中查看**](https://discord.gg/PUwSMR9XNk)
> [!NOTE]
>
> [![Sisyphus Labs - Sisyphus is the agent that codes like your team.](./.github/assets/sisyphuslabs.png?v=2)](https://sisyphuslabs.ai)
@@ -158,7 +169,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) 是你的主指挥官。他负责制定计划、分配任务给专家团队,并以极其激进的并行策略推动任务直至完成。他从不半途而废。
**Hephaestus** (`gpt-5.3-codex`) 是你的自主深度工作者。你只需要给他目标,不要给他具体做法。他会自动探索代码库模式,从头到尾独立执行任务,绝不会中途要你当保姆。*名副其实的正牌工匠。*
**Hephaestus** (`gpt-5.4`) 是你的自主深度工作者。你只需要给他目标,不要给他具体做法。他会自动探索代码库模式,从头到尾独立执行任务,绝不会中途要你当保姆。*名副其实的正牌工匠。*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) 是你的战略规划师。他通过访谈模式,在动一行代码之前,先通过提问确定范围并构建详尽的执行计划。
@@ -166,7 +177,7 @@ Read this and tell me why it's not just another boilerplate: https://raw.githubu
> Anthropic [因为我们屏蔽了 OpenCode](https://x.com/thdxr/status/2010149530486911014)。这就是为什么我们将 Hephaestus 命名为“正牌工匠 (The Legitimate Craftsman)”。这是一个故意的讽刺。
>
> 我们在 Opus 上运行得最好,但仅仅使用 Kimi K2.5 + GPT-5.3 Codex 就足以碾压原版的 Claude Code。完全不需要配置。
> 我们在 Opus 上运行得最好,但仅仅使用 Kimi K2.5 + GPT-5.4 就足以碾压原版的 Claude Code。完全不需要配置。
### 智能体调度机制

File diff suppressed because it is too large Load Diff

View File

@@ -1,18 +0,0 @@
{
"name": "hashline-edit-benchmark",
"version": "0.1.0",
"private": true,
"type": "module",
"description": "Hashline edit tool benchmark using Vercel AI SDK with FriendliAI provider",
"scripts": {
"bench:basic": "bun run test-edit-ops.ts",
"bench:edge": "bun run test-edge-cases.ts",
"bench:multi": "bun run test-multi-model.ts",
"bench:all": "bun run bench:basic && bun run bench:edge"
},
"dependencies": {
"@friendliai/ai-provider": "^1.0.9",
"ai": "^6.0.94",
"zod": "^4.1.0"
}
}

View File

@@ -71,9 +71,19 @@ function getSignalExitCode(signal) {
return 128 + (signalCodeByName[signal] ?? 1);
}
function getPackageBaseName() {
try {
const packageJson = JSON.parse(readFileSync(new URL("../package.json", import.meta.url), "utf8"));
return packageJson.name || "oh-my-opencode";
} catch {
return "oh-my-opencode";
}
}
function main() {
const { platform, arch } = process;
const libcFamily = getLibcFamily();
const packageBaseName = getPackageBaseName();
const avx2Supported = supportsAvx2();
let packageCandidates;
@@ -83,6 +93,7 @@ function main() {
arch,
libcFamily,
preferBaseline: avx2Supported === false,
packageBaseName,
});
} catch (error) {
console.error(`\noh-my-opencode: ${error.message}\n`);

View File

@@ -3,11 +3,11 @@
/**
* Get the platform-specific package name
* @param {{ platform: string, arch: string, libcFamily?: string | null }} options
* @param {{ platform: string, arch: string, libcFamily?: string | null, packageBaseName?: string }} options
* @returns {string} Package name like "oh-my-opencode-darwin-arm64"
* @throws {Error} If libc cannot be detected on Linux
*/
export function getPlatformPackage({ platform, arch, libcFamily }) {
export function getPlatformPackage({ platform, arch, libcFamily, packageBaseName = "oh-my-opencode" }) {
let suffix = "";
if (platform === "linux") {
if (libcFamily === null || libcFamily === undefined) {
@@ -23,13 +23,13 @@ export function getPlatformPackage({ platform, arch, libcFamily }) {
// Map platform names: win32 -> windows (for package name)
const os = platform === "win32" ? "windows" : platform;
return `oh-my-opencode-${os}-${arch}${suffix}`;
return `${packageBaseName}-${os}-${arch}${suffix}`;
}
/** @param {{ platform: string, arch: string, libcFamily?: string | null, preferBaseline?: boolean }} options */
export function getPlatformPackageCandidates({ platform, arch, libcFamily, preferBaseline = false }) {
const primaryPackage = getPlatformPackage({ platform, arch, libcFamily });
const baselinePackage = getBaselinePlatformPackage({ platform, arch, libcFamily });
/** @param {{ platform: string, arch: string, libcFamily?: string | null, preferBaseline?: boolean, packageBaseName?: string }} options */
export function getPlatformPackageCandidates({ platform, arch, libcFamily, preferBaseline = false, packageBaseName = "oh-my-opencode" }) {
const primaryPackage = getPlatformPackage({ platform, arch, libcFamily, packageBaseName });
const baselinePackage = getBaselinePlatformPackage({ platform, arch, libcFamily, packageBaseName });
if (!baselinePackage) {
return [primaryPackage];
@@ -38,18 +38,18 @@ export function getPlatformPackageCandidates({ platform, arch, libcFamily, prefe
return preferBaseline ? [baselinePackage, primaryPackage] : [primaryPackage, baselinePackage];
}
/** @param {{ platform: string, arch: string, libcFamily?: string | null }} options */
function getBaselinePlatformPackage({ platform, arch, libcFamily }) {
/** @param {{ platform: string, arch: string, libcFamily?: string | null, packageBaseName?: string }} options */
function getBaselinePlatformPackage({ platform, arch, libcFamily, packageBaseName = "oh-my-opencode" }) {
if (arch !== "x64") {
return null;
}
if (platform === "darwin") {
return "oh-my-opencode-darwin-x64-baseline";
return `${packageBaseName}-darwin-x64-baseline`;
}
if (platform === "win32") {
return "oh-my-opencode-windows-x64-baseline";
return `${packageBaseName}-windows-x64-baseline`;
}
if (platform === "linux") {
@@ -61,10 +61,10 @@ function getBaselinePlatformPackage({ platform, arch, libcFamily }) {
}
if (libcFamily === "musl") {
return "oh-my-opencode-linux-x64-musl-baseline";
return `${packageBaseName}-linux-x64-musl-baseline`;
}
return "oh-my-opencode-linux-x64-baseline";
return `${packageBaseName}-linux-x64-baseline`;
}
return null;

View File

@@ -190,6 +190,21 @@ describe("getPlatformPackageCandidates", () => {
]);
});
test("supports renamed package family via packageBaseName override", () => {
// #given Linux x64 with glibc and renamed package base
const input = { platform: "linux", arch: "x64", libcFamily: "glibc", packageBaseName: "oh-my-openagent" };
// #when getting package candidates
const result = getPlatformPackageCandidates(input);
// #then returns renamed package family candidates
expect(result).toEqual([
"oh-my-openagent-linux-x64",
"oh-my-openagent-linux-x64-baseline",
]);
});
test("returns only one candidate for ARM64", () => {
// #given non-x64 platform
const input = { platform: "linux", arch: "arm64", libcFamily: "glibc" };

View File

@@ -0,0 +1,88 @@
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/dev/assets/oh-my-opencode.schema.json",
// Optimized for intensive coding sessions.
// Prioritizes deep implementation agents and fast feedback loops.
"agents": {
// Primary orchestrator: aggressive parallel delegation
"sisyphus": {
"model": "kimi-for-coding/k2p5",
"ultrawork": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
"prompt_append": "Delegate heavily to hephaestus for implementation. Parallelize exploration.",
},
// Heavy lifter: maximum autonomy for coding tasks
"hephaestus": {
"model": "openai/gpt-5.4",
"prompt_append": "You are the primary implementation agent. Own the codebase. Explore, decide, execute. Use LSP and AST-grep aggressively.",
"permission": { "edit": "allow", "bash": { "git": "allow", "test": "allow" } },
},
// Lightweight planner: quick planning for coding tasks
"prometheus": {
"model": "opencode/gpt-5-nano",
"prompt_append": "Keep plans concise. Focus on file structure and key decisions.",
},
// Debugging and architecture
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Fast docs lookup
"librarian": { "model": "github-copilot/grok-code-fast-1" },
// Rapid codebase navigation
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Frontend and visual work
"multimodal-looker": { "model": "google/gemini-3.1-pro" },
// Plan review: minimal overhead
"metis": { "model": "opencode/gpt-5-nano" },
// Code review focus
"momus": { "prompt_append": "Focus on code quality, edge cases, and test coverage." },
// Long-running coding sessions
"atlas": {},
// Quick fixes and small tasks
"sisyphus-junior": { "model": "opencode/gpt-5-nano" },
},
"categories": {
// Trivial changes: fastest possible
"quick": { "model": "opencode/gpt-5-nano" },
// Standard coding tasks: good quality, fast
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
// Complex refactors: best quality
"unspecified-high": { "model": "openai/gpt-5.3-codex" },
// Visual work
"visual-engineering": { "model": "google/gemini-3.1-pro", "variant": "high" },
// Deep autonomous work
"deep": { "model": "openai/gpt-5.3-codex" },
// Architecture decisions
"ultrabrain": { "model": "openai/gpt-5.4", "variant": "xhigh" },
},
// High concurrency for parallel agent work
"background_task": {
"defaultConcurrency": 8,
"providerConcurrency": {
"anthropic": 5,
"openai": 5,
"google": 10,
"github-copilot": 10,
"opencode": 15,
},
},
// Enable all coding aids
"hashline_edit": true,
"experimental": { "aggressive_truncation": true, "task_system": true },
}

View File

@@ -0,0 +1,71 @@
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/dev/assets/oh-my-opencode.schema.json",
// Balanced defaults for general development.
// Tuned for reliability across diverse tasks without overspending.
"agents": {
// Main orchestrator: handles delegation and drives tasks to completion
"sisyphus": {
"model": "anthropic/claude-opus-4-6",
"ultrawork": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
},
// Deep autonomous worker: end-to-end implementation
"hephaestus": {
"model": "openai/gpt-5.4",
"prompt_append": "Explore thoroughly, then implement. Prefer small, testable changes.",
},
// Strategic planner: interview mode before execution
"prometheus": {
"prompt_append": "Always interview first. Validate scope before planning.",
},
// Architecture consultant: complex design and debugging
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Documentation and code search
"librarian": { "model": "google/gemini-3-flash" },
// Fast codebase exploration
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Visual tasks: UI/UX, images, diagrams
"multimodal-looker": { "model": "google/gemini-3.1-pro" },
// Plan consultant: reviews and improves plans
"metis": {},
// Critic and reviewer
"momus": {},
// Continuation and long-running task handler
"atlas": {},
// Lightweight task executor for simple jobs
"sisyphus-junior": { "model": "opencode/gpt-5-nano" },
},
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
"unspecified-high": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
"writing": { "model": "google/gemini-3-flash" },
"visual-engineering": { "model": "google/gemini-3.1-pro", "variant": "high" },
"deep": { "model": "openai/gpt-5.3-codex" },
"ultrabrain": { "model": "openai/gpt-5.4", "variant": "xhigh" },
},
// Conservative concurrency for cost control
"background_task": {
"providerConcurrency": {
"anthropic": 3,
"openai": 3,
"google": 5,
"opencode": 10,
},
},
"experimental": { "aggressive_truncation": true },
}

View File

@@ -0,0 +1,112 @@
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/dev/assets/oh-my-opencode.schema.json",
// Optimized for strategic planning, architecture, and complex project design.
// Prioritizes deep thinking agents and thorough analysis before execution.
"agents": {
// Orchestrator: delegates to planning agents first
"sisyphus": {
"model": "anthropic/claude-opus-4-6",
"ultrawork": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
"prompt_append": "Always consult prometheus and atlas for planning. Never rush to implementation.",
},
// Implementation: uses planning outputs
"hephaestus": {
"model": "openai/gpt-5.4",
"prompt_append": "Follow established plans precisely. Ask for clarification when plans are ambiguous.",
},
// Primary planner: deep interview mode
"prometheus": {
"model": "anthropic/claude-opus-4-6",
"thinking": { "type": "enabled", "budgetTokens": 160000 },
"prompt_append": "Interview extensively. Question assumptions. Build exhaustive plans with milestones, risks, and contingencies. Use deep & quick agents heavily in parallel for research.",
},
// Architecture consultant
"oracle": {
"model": "openai/gpt-5.4",
"variant": "xhigh",
"thinking": { "type": "enabled", "budgetTokens": 120000 },
},
// Research and documentation
"librarian": { "model": "google/gemini-3-flash" },
// Exploration for research phase
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Visual planning and diagrams
"multimodal-looker": { "model": "google/gemini-3.1-pro", "variant": "high" },
// Plan review and refinement: heavily utilized
"metis": {
"model": "anthropic/claude-opus-4-6",
"prompt_append": "Critically evaluate plans. Identify gaps, risks, and improvements. Be thorough.",
},
// Critic: challenges assumptions
"momus": {
"model": "openai/gpt-5.4",
"prompt_append": "Challenge all assumptions in plans. Look for edge cases, failure modes, and overlooked requirements.",
},
// Long-running planning sessions
"atlas": {
"prompt_append": "Preserve context across long planning sessions. Track evolving decisions.",
},
// Quick research tasks
"sisyphus-junior": { "model": "opencode/gpt-5-nano" },
},
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
// High-effort planning tasks: maximum reasoning
"unspecified-high": {
"model": "openai/gpt-5.4",
"variant": "xhigh",
},
// Documentation from plans
"writing": { "model": "google/gemini-3-flash" },
// Visual architecture
"visual-engineering": { "model": "google/gemini-3.1-pro", "variant": "high" },
// Deep research and analysis
"deep": { "model": "openai/gpt-5.3-codex" },
// Strategic reasoning
"ultrabrain": { "model": "openai/gpt-5.4", "variant": "xhigh" },
// Creative approaches to problems
"artistry": { "model": "google/gemini-3.1-pro", "variant": "high" },
},
// Moderate concurrency: planning is sequential by nature
"background_task": {
"defaultConcurrency": 5,
"staleTimeoutMs": 300000,
"providerConcurrency": {
"anthropic": 3,
"openai": 3,
},
"modelConcurrency": {
"anthropic/claude-opus-4-6": 2,
"openai/gpt-5.4": 2,
},
},
"sisyphus_agent": {
"planner_enabled": true,
"replace_plan": true,
},
"experimental": { "aggressive_truncation": true },
}

View File

@@ -8,7 +8,7 @@ Think of AI models as developers on a team. Each has a different brain, differen
This isn't a bug. It's the foundation of the entire system.
Oh My OpenCode assigns each agent a model that matches its _working style_ — like building a team where each person is in the role that fits their personality.
Oh My OpenAgent assigns each agent a model that matches its _working style_ — like building a team where each person is in the role that fits their personality.
### Sisyphus: The Sociable Lead
@@ -27,7 +27,7 @@ Using Sisyphus with older GPT models would be like taking your best project mana
Hephaestus is the developer who stays in their room coding all day. Doesn't talk much. Might seem socially awkward. But give them a hard technical problem and they'll emerge three hours later with a solution nobody else could have found.
**This is why Hephaestus uses GPT-5.3 Codex.** Codex is built for exactly this:
**This is why Hephaestus uses GPT-5.4.** GPT-5.4 is built for exactly this:
- Deep, autonomous exploration without hand-holding
- Multi-file reasoning across complex codebases
@@ -64,8 +64,8 @@ These agents have Claude-optimized prompts — long, detailed, mechanics-driven.
| Agent | Role | Fallback Chain | Notes |
| ------------ | ----------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Sisyphus** | Main orchestrator | Claude Opus → K2P5 → Kimi K2.5 → GPT-5.4 → GLM 5 → Big Pickle | Claude-family first. GPT-5.4 has dedicated prompt support. Kimi/GLM as intermediate fallbacks. |
| **Metis** | Plan gap analyzer | Claude Opus → GPT-5.4 → Gemini 3.1 Pro | Claude preferred, GPT acceptable fallback. |
| **Sisyphus** | Main orchestrator | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/kimi-k2.5 → kimi-for-coding/k2p5 → opencode\|moonshotai\|moonshotai-cn\|firmware\|ollama-cloud\|aihubmix/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → zai-coding-plan\|opencode/glm-5 → opencode/big-pickle | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Metis** | Plan gap analyzer | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → kimi-for-coding/k2p5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Dual-Prompt Agents → Claude preferred, GPT supported
@@ -73,8 +73,8 @@ These agents ship separate prompts for Claude and GPT families. They auto-detect
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------- | -------------------------------------- | -------------------------------------------------------------------- |
| **Prometheus** | Strategic planner | Claude Opus → GPT-5.4 → Gemini 3.1 Pro | Interview-mode planning. GPT prompt is compact and principle-driven. |
| **Atlas** | Todo orchestrator | Claude Sonnet 4.6 → GPT-5.4 | Claude first, GPT-5.4 as the current fallback path. |
| **Prometheus** | Strategic planner | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → google\|github-copilot\|opencode/gemini-3.1-pro | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Atlas** | Todo orchestrator | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → opencode-go/minimax-m2.7 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Deep Specialists → GPT
@@ -82,9 +82,9 @@ These agents are built for GPT's principle-driven style. Their prompts assume au
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------------- | -------------------------------------- | ------------------------------------------------ |
| **Hephaestus** | Autonomous deep worker | GPT-5.3 Codex only | No fallback. Requires GPT access. The craftsman. |
| **Oracle** | Architecture consultant | GPT-5.4 → Gemini 3.1 Pro → Claude Opus | Read-only high-IQ consultation. |
| **Momus** | Ruthless reviewer | GPT-5.4 → Claude Opus → Gemini 3.1 Pro | Verification and plan review. |
| **Hephaestus** | Autonomous deep worker | GPT-5.4 (medium) | Requires a GPT-capable provider. The craftsman. |
| **Oracle** | Architecture consultant | openai\|github-copilot\|opencode/gpt-5.4 (high) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Momus** | Ruthless reviewer | openai\|github-copilot\|opencode/gpt-5.4 (xhigh) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → opencode-go/glm-5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Utility Runners → Speed over Intelligence
@@ -92,9 +92,10 @@ These agents do grep, search, and retrieval. They intentionally use the fastest,
| Agent | Role | Fallback Chain | Notes |
| --------------------- | ------------------ | ---------------------------------------------- | ----------------------------------------------------- |
| **Explore** | Fast codebase grep | Grok Code Fast → MiniMax → Haiku → GPT-5-Nano | Speed is everything. Fire 10 in parallel. |
| **Librarian** | Docs/code search | Gemini Flash → MiniMax → Big Pickle | Doc retrieval doesn't need deep reasoning. |
| **Multimodal Looker** | Vision/screenshots | GPT-5.3 Codex → K2P5 → Gemini Flash → GLM-4.6v | Uses the first available multimodal-capable fallback. |
| **Explore** | Fast codebase grep | github-copilot\|xai/grok-code-fast-1 → opencode-go/minimax-m2.7-highspeed → opencode/minimax-m2.7 → anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Librarian** | Docs/code search | opencode-go/minimax-m2.7 → opencode/minimax-m2.7-highspeed → anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Multimodal Looker** | Vision/screenshots | openai\|opencode/gpt-5.4 (medium) → opencode-go/kimi-k2.5 → zai-coding-plan/glm-4.6v → openai\|github-copilot\|opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Sisyphus-Junior** | Category executor | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → opencode-go/minimax-m2.7 → opencode/big-pickle | Exact runtime chain from `src/shared/model-requirements.ts`. |
---
@@ -118,9 +119,9 @@ Principle-driven, explicit reasoning, deep technical capability. Best for agents
| Model | Strengths |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **GPT-5.3 Codex** | Deep coding powerhouse. Autonomous exploration. Required for Hephaestus. |
| **GPT-5.4** | High intelligence, strategic reasoning. Default for Oracle. |
| **GPT-5.4** | Strong principle-driven reasoning. Default for Momus and a key fallback for Prometheus / Atlas. |
| **GPT-5.3 Codex** | Deep coding powerhouse. Autonomous exploration. Still available for deep category and explicit overrides. |
| **GPT-5.4** | High intelligence, strategic reasoning. Default for Oracle, Momus, and a key fallback for Prometheus / Atlas. Uses xhigh variant for Momus. |
| **GPT-5.4 Mini** | Fast + strong reasoning. Good for lightweight autonomous tasks. Default for quick category. |
| **GPT-5-Nano** | Ultra-cheap, fast. Good for simple utility tasks. |
### Other Models
@@ -130,11 +131,33 @@ Principle-driven, explicit reasoning, deep technical capability. Best for agents
| **Gemini 3.1 Pro** | Excels at visual/frontend tasks. Different reasoning style. Default for `visual-engineering` and `artistry`. |
| **Gemini 3 Flash** | Fast. Good for doc search and light tasks. |
| **Grok Code Fast 1** | Blazing fast code grep. Default for Explore agent. |
| **MiniMax M2.5** | Fast and smart. Good for utility tasks and search/retrieval. |
| **MiniMax M2.7** | Fast and smart. Used in OpenCode Go and OpenCode Zen utility fallback chains. |
| **MiniMax M2.7 Highspeed** | High-speed OpenCode catalog entry used in utility fallback chains that prefer the fastest available MiniMax path. |
### OpenCode Go
A premium subscription tier ($10/month) that provides reliable access to Chinese frontier models through OpenCode's infrastructure.
**Available Models:**
| Model | Use Case |
| ------------------------ | --------------------------------------------------------------------- |
| **opencode-go/kimi-k2.5** | Vision-capable, Claude-like reasoning. Used by Sisyphus, Atlas, Sisyphus-Junior, Multimodal Looker. |
| **opencode-go/glm-5** | Text-only orchestration model. Used by Oracle, Prometheus, Metis, Momus. |
| **opencode-go/minimax-m2.7** | Ultra-cheap, fast responses. Used by Librarian, Atlas, and Sisyphus-Junior for utility work. |
| **opencode-go/minimax-m2.7-highspeed** | Even faster OpenCode Go MiniMax entry used by Explore when the high-speed catalog entry is available. |
**When It Gets Used:**
OpenCode Go models appear throughout the fallback chains as intermediate options. Depending on the agent, they can sit before GPT, after GPT, or act as the last structured-model fallback before cheaper utility paths.
**Go-Only Scenarios:**
Some model identifiers like `k2p5` (paid Kimi K2.5) and `glm-5` may only be available through OpenCode Go subscription in certain regions. When configured with these short identifiers, the system resolves them through the opencode-go provider first.
### About Free-Tier Fallbacks
You may see model names like `kimi-k2.5-free`, `minimax-m2.5-free`, or `big-pickle` (GLM 4.6) in the source code or logs. These are free-tier versions of the same model families, served through the OpenCode Zen provider. They exist as lower-priority entries in fallback chains.
You may see model names like `kimi-k2.5-free`, `minimax-m2.7`, `minimax-m2.7-highspeed`, or `big-pickle` (GLM 4.6) in the source code or logs. These are provider-specific or speed-optimized entries in fallback chains.
You don't need to configure them. The system includes them so it degrades gracefully when you don't have every paid subscription. If you have the paid version, the paid version is always preferred.
@@ -146,14 +169,14 @@ When agents delegate work, they don't pick a model name — they pick a **catego
| Category | When Used | Fallback Chain |
| -------------------- | -------------------------- | -------------------------------------------- |
| `visual-engineering` | Frontend, UI, CSS, design | Gemini 3.1 Pro → GLM 5 → Claude Opus |
| `ultrabrain` | Maximum reasoning needed | GPT-5.3 Codex → Gemini 3.1 Pro → Claude Opus |
| `deep` | Deep coding, complex logic | GPT-5.3 Codex → Claude Opus → Gemini 3.1 Pro |
| `artistry` | Creative, novel approaches | Gemini 3.1 Pro → Claude Opus → GPT-5.4 |
| `quick` | Simple, fast tasks | Claude Haiku → Gemini Flash → GPT-5-Nano |
| `unspecified-high` | General complex work | GPT-5.4 → Claude Opus → GLM 5 → K2P5 |
| `unspecified-low` | General standard work | Claude Sonnet → GPT-5.3 Codex → Gemini Flash |
| `writing` | Text, docs, prose | Gemini Flash → Claude Sonnet |
| `visual-engineering` | Frontend, UI, CSS, design | google\|github-copilot\|opencode/gemini-3.1-pro (high) → zai-coding-plan\|opencode/glm-5 → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 → kimi-for-coding/k2p5 |
| `ultrabrain` | Maximum reasoning needed | openai\|opencode/gpt-5.4 (xhigh) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 |
| `deep` | Deep coding, complex logic | openai\|opencode/gpt-5.3-codex (medium) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → google\|github-copilot\|opencode/gemini-3.1-pro (high) |
| `artistry` | Creative, novel approaches | google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 |
| `quick` | Simple, fast tasks | openai\|github-copilot\|opencode/gpt-5.4-mini → anthropic\|github-copilot\|opencode/claude-haiku-4-5 → google\|github-copilot\|opencode/gemini-3-flash → opencode-go/minimax-m2.7 → opencode/gpt-5-nano |
| `unspecified-high` | General complex work | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → zai-coding-plan\|opencode/glm-5 → kimi-for-coding/k2p5 → opencode-go/glm-5 → opencode/kimi-k2.5 → opencode\|moonshotai\|moonshotai-cn\|firmware\|ollama-cloud\|aihubmix/kimi-k2.5 |
| `unspecified-low` | General standard work | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → openai\|opencode/gpt-5.3-codex (medium) → opencode-go/kimi-k2.5 → google\|github-copilot\|opencode/gemini-3-flash → opencode-go/minimax-m2.7 |
| `writing` | Text, docs, prose | google\|github-copilot\|opencode/gemini-3-flash → opencode-go/kimi-k2.5 → anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/minimax-m2.7 |
See the [Orchestration System Guide](./orchestration.md) for how agents dispatch tasks to categories.
@@ -190,7 +213,7 @@ See the [Orchestration System Guide](./orchestration.md) for how agents dispatch
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
"unspecified-high": { "model": "openai/gpt-5.4-high" },
"unspecified-high": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
"visual-engineering": {
"model": "google/gemini-3.1-pro",
"variant": "high",
@@ -233,12 +256,46 @@ Run `opencode models` to see available models, `opencode auth login` to authenti
### How Model Resolution Works
Each agent has a fallback chain. The system tries models in priority order until it finds one available through your connected providers. You don't need to configure providers per model — just authenticate (`opencode auth login`) and the system figures out which models are available and where.
Each agent has a fallback chain. The system tries models in priority order until it finds one available through your connected providers. You don't need to configure providers per model. Just authenticate (`opencode auth login`) and the system figures out which models are available and where.
Core-agent tab cycling is deterministic via injected runtime order field. The fixed priority order is Sisyphus (order: 1), Hephaestus (order: 2), Prometheus (order: 3), and Atlas (order: 4), then the remaining agents follow.
Your explicit configuration always wins. If you set a specific model for an agent, that choice takes precedence even when resolution data is cold.
Variant and `reasoningEffort` overrides are normalized to model-supported values, so cross-provider overrides degrade gracefully instead of failing hard.
Model capabilities are models.dev-backed, with a refreshable cache and capability diagnostics. Use `bunx oh-my-opencode refresh-model-capabilities` to update the cache, or configure `model_capabilities.auto_refresh_on_start` to refresh at startup.
To see which models your agents will actually use, run `bunx oh-my-opencode doctor`. This shows effective model resolution based on your current authentication and config.
```
Agent Request → User Override (if configured) → Fallback Chain → System Default
```
### File-Based Prompts
You can load agent system prompts from external files using `file://` URLs in the `prompt` field, or append additional content with `prompt_append`. The `prompt_append` field also works on categories.
```jsonc
{
"agents": {
"sisyphus": {
"prompt": "file:///path/to/custom-prompt.md"
},
"oracle": {
"prompt_append": "file:///path/to/additional-context.md"
}
},
"categories": {
"deep": {
"prompt_append": "file:///path/to/deep-category-append.md"
}
}
}
```
The file content is loaded at runtime and injected into the agent's system prompt. Supports `~` expansion for home directory and relative `file://` paths.
---
## See Also

View File

@@ -26,6 +26,8 @@ Follow the prompts to configure your Claude, ChatGPT, and Gemini subscriptions.
After you install it, you can read this [overview guide](./overview.md) to understand more.
The published package and local binary are still `oh-my-opencode`. Inside `opencode.json`, the compatibility layer now prefers the plugin entry `oh-my-openagent`, while legacy `oh-my-opencode` entries still load with a warning. Plugin config loading recognizes both `oh-my-openagent.json[c]` and `oh-my-opencode.json[c]` during the transition. If you see a "Using legacy package name" warning from `bunx oh-my-opencode doctor`, update your `opencode.json` plugin entry from `"oh-my-opencode"` to `"oh-my-openagent"`.
## For LLM Agents
> **IMPORTANT: Use `curl` to fetch this file, NOT WebFetch.** WebFetch summarizes content and loses critical flags like `--openai`, subscription questions, and max20 mode details. Always use:
@@ -34,7 +36,7 @@ After you install it, you can read this [overview guide](./overview.md) to under
> curl -fsSL https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/refs/heads/dev/docs/guide/installation.md
> ```
If you're an LLM Agent helping set up oh-my-opencode, Welcome! You'll now help your user to install the best agent harness, OhMyOpenCode. Tell user this with greetings, with test 'oMoMoMoMo...'
If you're an LLM Agent helping set up oh-my-opencode, welcome. You'll help your user install the harness and verify the rename-compat setup cleanly. Tell user this with greetings, with text 'oMoMoMoMo...'
follow these steps:
@@ -68,7 +70,12 @@ Ask the user these questions to determine CLI options:
- If **yes**`--zai-coding-plan=yes`
- If **no**`--zai-coding-plan=no` (default)
**Provider Priority**: Native (anthropic/, openai/, google/) > GitHub Copilot > OpenCode Zen > Z.ai Coding Plan
7. **Do you have an OpenCode Go subscription?**
- OpenCode Go is a $10/month subscription providing access to GLM-5, Kimi K2.5, and MiniMax M2.7 models
- If **yes**`--opencode-go=yes`
- If **no**`--opencode-go=no` (default)
**Provider selection is agent-specific.** The installer and runtime do not use one single global provider priority. Each agent resolves against its own fallback chain.
MUST STRONGLY WARNING, WHEN USER SAID THEY DON'T HAVE CLAUDE SUBSCRIPTION, SISYPHUS AGENT MIGHT NOT WORK IDEALLY.
@@ -91,7 +98,7 @@ Spawn a subagent to handle installation and report back - to save context.
Based on user's answers, run the CLI installer with appropriate flags:
```bash
bunx oh-my-opencode install --no-tui --claude=<yes|no|max20> --gemini=<yes|no> --copilot=<yes|no> [--openai=<yes|no>] [--opencode-zen=<yes|no>] [--zai-coding-plan=<yes|no>]
bunx oh-my-opencode install --no-tui --claude=<yes|no|max20> --gemini=<yes|no> --copilot=<yes|no> [--openai=<yes|no>] [--opencode-go=<yes|no>] [--opencode-zen=<yes|no>] [--zai-coding-plan=<yes|no>]
```
**Examples:**
@@ -102,6 +109,7 @@ bunx oh-my-opencode install --no-tui --claude=<yes|no|max20> --gemini=<yes|no> -
- User has only GitHub Copilot: `bunx oh-my-opencode install --no-tui --claude=no --gemini=no --copilot=yes`
- User has Z.ai for Librarian: `bunx oh-my-opencode install --no-tui --claude=yes --gemini=no --copilot=no --zai-coding-plan=yes`
- User has only OpenCode Zen: `bunx oh-my-opencode install --no-tui --claude=no --gemini=no --copilot=no --opencode-zen=yes`
- User has OpenCode Go only: `bunx oh-my-opencode install --no-tui --claude=no --openai=no --gemini=no --copilot=no --opencode-go=yes`
- User has no subscriptions: `bunx oh-my-opencode install --no-tui --claude=no --gemini=no --copilot=no`
The CLI will:
@@ -114,8 +122,17 @@ The CLI will:
```bash
opencode --version # Should be 1.0.150 or higher
cat ~/.config/opencode/opencode.json # Should contain "oh-my-opencode" in plugin array
cat ~/.config/opencode/opencode.json # Should contain "oh-my-openagent" in plugin array, or the legacy "oh-my-opencode" entry while you are still migrating
```
#### Run Doctor Verification
After installation, verify everything is working correctly:
```bash
bunx oh-my-opencode doctor
```
This checks system, config, tools, and model resolution, including legacy package name warnings and compatibility-fallback diagnostics.
### Step 4: Configure Authentication
@@ -139,7 +156,7 @@ First, add the opencode-antigravity-auth plugin:
```json
{
"plugin": ["oh-my-opencode", "opencode-antigravity-auth@latest"]
"plugin": ["oh-my-openagent", "opencode-antigravity-auth@latest"]
}
```
@@ -148,9 +165,9 @@ First, add the opencode-antigravity-auth plugin:
You'll also need full model settings in `opencode.json`.
Read the [opencode-antigravity-auth documentation](https://github.com/NoeFabris/opencode-antigravity-auth), copy the full model configuration from the README, and merge carefully to avoid breaking the user's existing setup. The plugin now uses a **variant system** — models like `antigravity-gemini-3-pro` support `low`/`high` variants instead of separate `-low`/`-high` model entries.
##### oh-my-opencode Agent Model Override
##### Plugin config model override
The `opencode-antigravity-auth` plugin uses different model names than the built-in Google auth. Override the agent models in `oh-my-opencode.json` (or `.opencode/oh-my-opencode.json`):
The `opencode-antigravity-auth` plugin uses different model names than the built-in Google auth. Override the agent models in your plugin config file. Existing installs still commonly use `oh-my-opencode.json` or `.opencode/oh-my-opencode.json`, while the compatibility layer also recognizes `oh-my-openagent.json[c]`.
```json
{
@@ -170,7 +187,7 @@ The `opencode-antigravity-auth` plugin uses different model names than the built
**Available models (Gemini CLI quota)**:
- `google/gemini-2.5-flash`, `google/gemini-2.5-pro`, `google/gemini-3-flash-preview`, `google/gemini-3-pro-preview`
- `google/gemini-2.5-flash`, `google/gemini-2.5-pro`, `google/gemini-3-flash-preview`, `google/gemini-3.1-pro-preview`
> **Note**: Legacy tier-suffixed names like `google/antigravity-gemini-3-pro-high` still work but variants are recommended. Use `--variant=high` with the base model name instead.
@@ -195,16 +212,16 @@ GitHub Copilot is supported as a **fallback provider** when native providers are
##### Model Mappings
When GitHub Copilot is the best available provider, oh-my-opencode uses these model assignments:
When GitHub Copilot is the best available provider, install-time defaults are agent-specific. Common examples are:
| Agent | Model |
| ------------- | --------------------------------- |
| **Sisyphus** | `github-copilot/claude-opus-4-6` |
| **Oracle** | `github-copilot/gpt-5.4` |
| **Explore** | `github-copilot/grok-code-fast-1` |
| **Librarian** | `github-copilot/gemini-3-flash` |
| Agent | Model |
| ------------- | ---------------------------------- |
| **Sisyphus** | `github-copilot/claude-opus-4.6` |
| **Oracle** | `github-copilot/gpt-5.4` |
| **Explore** | `github-copilot/grok-code-fast-1` |
| **Atlas** | `github-copilot/claude-sonnet-4.6` |
GitHub Copilot acts as a proxy provider, routing requests to underlying models based on your subscription.
GitHub Copilot acts as a proxy provider, routing requests to underlying models based on your subscription. Some agents, like Librarian, are not installed from Copilot alone and instead rely on other configured providers or runtime fallback behavior.
#### Z.ai Coding Plan
@@ -221,39 +238,33 @@ If Z.ai is your main provider, the most important fallbacks are:
#### OpenCode Zen
OpenCode Zen provides access to `opencode/` prefixed models including `opencode/claude-opus-4-6`, `opencode/gpt-5.4`, `opencode/gpt-5.3-codex`, `opencode/gpt-5-nano`, `opencode/glm-5`, `opencode/big-pickle`, and `opencode/minimax-m2.5-free`.
OpenCode Zen provides access to `opencode/` prefixed models including `opencode/claude-opus-4-6`, `opencode/gpt-5.4`, `opencode/gpt-5.3-codex`, `opencode/gpt-5-nano`, `opencode/glm-5`, `opencode/big-pickle`, `opencode/minimax-m2.7`, and `opencode/minimax-m2.7-highspeed`.
When OpenCode Zen is the best available provider (no native or Copilot), these models are used:
When OpenCode Zen is the best available provider, these are the most relevant source-backed examples:
| Agent | Model |
| ------------- | ---------------------------------------------------- |
| **Sisyphus** | `opencode/claude-opus-4-6` |
| **Oracle** | `opencode/gpt-5.4` |
| **Explore** | `opencode/gpt-5-nano` |
| **Librarian** | `opencode/minimax-m2.5-free` / `opencode/big-pickle` |
| **Explore** | `opencode/minimax-m2.7` |
##### Setup
Run the installer and select "Yes" for GitHub Copilot:
Run the installer and select "Yes" for OpenCode Zen:
```bash
bunx oh-my-opencode install
# Select your subscriptions (Claude, ChatGPT, Gemini)
# When prompted: "Do you have a GitHub Copilot subscription?" → Select "Yes"
# Select your subscriptions (Claude, ChatGPT, Gemini, OpenCode Zen, etc.)
# When prompted: "Do you have access to OpenCode Zen (opencode/ models)?" → Select "Yes"
```
Or use non-interactive mode:
```bash
bunx oh-my-opencode install --no-tui --claude=no --openai=no --gemini=no --copilot=yes
bunx oh-my-opencode install --no-tui --claude=no --openai=no --gemini=no --opencode-zen=yes
```
Then authenticate with GitHub:
```bash
opencode auth login
# Select: GitHub → Authenticate via OAuth
```
This provider uses the `opencode/` model catalog. If your OpenCode environment prompts for provider authentication, follow the OpenCode provider flow for `opencode/` models instead of reusing the fallback-provider auth steps above.
### Step 5: Understand Your Model Setup
@@ -270,7 +281,7 @@ Not all models behave the same way. Understanding which models are "similar" hel
| **Claude Opus 4.6** | anthropic, github-copilot, opencode | Best overall. Default for Sisyphus. |
| **Claude Sonnet 4.6** | anthropic, github-copilot, opencode | Faster, cheaper. Good balance. |
| **Claude Haiku 4.5** | anthropic, opencode | Fast and cheap. Good for quick tasks. |
| **Kimi K2.5** | kimi-for-coding | Behaves very similarly to Claude. Great all-rounder. Default for Atlas. |
| **Kimi K2.5** | kimi-for-coding, opencode-go, opencode, moonshotai, moonshotai-cn, firmware, ollama-cloud, aihubmix | Behaves very similarly to Claude. Great all-rounder that appears in several orchestration fallback chains. |
| **Kimi K2.5 Free** | opencode | Free-tier Kimi. Rate-limited but functional. |
| **GLM 5** | zai-coding-plan, opencode | Claude-like behavior. Good for broad tasks. |
| **Big Pickle (GLM 4.6)** | opencode | Free-tier GLM. Decent fallback. |
@@ -279,27 +290,28 @@ Not all models behave the same way. Understanding which models are "similar" hel
| Model | Provider(s) | Notes |
| ----------------- | -------------------------------- | ------------------------------------------------- |
| **GPT-5.3-codex** | openai, github-copilot, opencode | Deep coding powerhouse. Required for Hephaestus. |
| **GPT-5.3-codex** | openai, github-copilot, opencode | Deep coding powerhouse. Still available for deep category and explicit overrides. |
| **GPT-5.4** | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| **GPT-5.4 Mini** | openai, github-copilot, opencode | Fast + strong reasoning. Default for quick category. |
| **GPT-5-Nano** | opencode | Ultra-cheap, fast. Good for simple utility tasks. |
**Different-Behavior Models**:
| Model | Provider(s) | Notes |
| --------------------- | -------------------------------- | ----------------------------------------------------------- |
| **Gemini 3 Pro** | google, github-copilot, opencode | Excels at visual/frontend tasks. Different reasoning style. |
| **Gemini 3.1 Pro** | google, github-copilot, opencode | Excels at visual/frontend tasks. Different reasoning style. |
| **Gemini 3 Flash** | google, github-copilot, opencode | Fast, good for doc search and light tasks. |
| **MiniMax M2.5** | venice | Fast and smart. Good for utility tasks. |
| **MiniMax M2.5 Free** | opencode | Free-tier MiniMax. Fast for search/retrieval. |
| **MiniMax M2.7** | opencode-go, opencode | Fast and smart. Utility fallbacks use `minimax-m2.7` or `minimax-m2.7-highspeed` depending on the chain. |
| **MiniMax M2.7 Highspeed** | opencode-go, opencode | Faster utility variant used in Explore and other retrieval-heavy fallback chains. |
**Speed-Focused Models**:
| Model | Provider(s) | Speed | Notes |
| ----------------------- | ---------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| **Grok Code Fast 1** | github-copilot, venice | Very fast | Optimized for code grep/search. Default for Explore. |
| **Grok Code Fast 1** | github-copilot, xai | Very fast | Optimized for code grep/search. Default for Explore. |
| **Claude Haiku 4.5** | anthropic, opencode | Fast | Good balance of speed and intelligence. |
| **MiniMax M2.5 (Free)** | opencode, venice | Fast | Smart for its speed class. |
| **GPT-5.3-codex-spark** | openai | Extremely fast | Blazing fast but compacts so aggressively that oh-my-opencode's context management doesn't work well with it. Not recommended for omo agents. |
| **MiniMax M2.7 Highspeed** | opencode-go, opencode | Very fast | High-speed MiniMax utility fallback used by runtime chains such as Explore and, on the OpenCode catalog, Librarian. |
| **GPT-5.3-codex-spark** | openai | Extremely fast | Blazing fast but compacts so aggressively that oh-my-openagent's context management doesn't work well with it. Not recommended for omo agents. |
#### What Each Agent Does and Which Model It Got
@@ -309,8 +321,8 @@ Based on your subscriptions, here's how the agents were configured:
| Agent | Role | Default Chain | What It Does |
| ------------ | ---------------- | ----------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Sisyphus** | Main ultraworker | Opus (max) → Kimi K2.5 → GLM 5 → Big Pickle | Primary coding agent. Orchestrates everything. **Never use GPT — no GPT prompt exists.** |
| **Metis** | Plan review | Opus (max) → Kimi K2.5 → GPT-5.4 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
| **Sisyphus** | Main ultraworker | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/kimi-k2.5 → kimi-for-coding/k2p5 → opencode\|moonshotai\|moonshotai-cn\|firmware\|ollama-cloud\|aihubmix/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → zai-coding-plan\|opencode/glm-5 → opencode/big-pickle | Primary coding agent. Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Metis** | Plan review | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → kimi-for-coding/k2p5 | Reviews Prometheus plans for gaps. Exact runtime chain from `src/shared/model-requirements.ts`. |
**Dual-Prompt Agents** (auto-switch between Claude and GPT prompts):
@@ -320,16 +332,16 @@ Priority: **Claude > GPT > Claude-like models**
| Agent | Role | Default Chain | GPT Prompt? |
| -------------- | ----------------- | ---------------------------------------------------------- | ---------------------------------------------------------------- |
| **Prometheus** | Strategic planner | Opus (max) → **GPT-5.4 (high)** → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | **Kimi K2.5** → Sonnet → GPT-5.4 | Yes GPT-optimized todo management |
| **Prometheus** | Strategic planner | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → google\|github-copilot\|opencode/gemini-3.1-pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 Claude) |
| **Atlas** | Todo orchestrator | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → opencode-go/minimax-m2.7 | Yes - GPT-optimized todo management |
**GPT-Native Agents** (built for GPT, don't override to Claude):
| Agent | Role | Default Chain | Notes |
| -------------- | ---------------------- | -------------------------------------- | ------------------------------------------------------ |
| **Hephaestus** | Deep autonomous worker | GPT-5.3-codex (medium) only | "Codex on steroids." No fallback. Requires GPT access. |
| **Oracle** | Architecture/debugging | GPT-5.4 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | GPT-5.4 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |
| **Hephaestus** | Deep autonomous worker | GPT-5.4 (medium) only | "Codex on steroids." No fallback. Requires GPT access. |
| **Oracle** | Architecture/debugging | openai\|github-copilot\|opencode/gpt-5.4 (high) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 | High-IQ strategic backup. GPT preferred. |
| **Momus** | High-accuracy reviewer | openai\|github-copilot\|opencode/gpt-5.4 (xhigh) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → opencode-go/glm-5 | Verification agent. GPT preferred. |
**Utility Agents** (speed over intelligence):
@@ -337,9 +349,9 @@ These agents do search, grep, and retrieval. They intentionally use fast, cheap
| Agent | Role | Default Chain | Design Rationale |
| --------------------- | ------------------ | ---------------------------------------------------------------------- | -------------------------------------------------------------- |
| **Explore** | Fast codebase grep | MiniMax M2.5 Free → Grok Code Fast → MiniMax M2.5Haiku → GPT-5-Nano | Speed is everything. Grok is blazing fast for grep. |
| **Librarian** | Docs/code search | MiniMax M2.5 Free → Gemini Flash → Big Pickle | Entirely free-tier. Doc retrieval doesn't need deep reasoning. |
| **Multimodal Looker** | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.4 → GLM-4.6v | Kimi excels at multimodal understanding. |
| **Explore** | Fast codebase grep | github-copilot\|xai/grok-code-fast-1 → opencode-go/minimax-m2.7-highspeed → opencode/minimax-m2.7anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Speed is everything. Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Librarian** | Docs/code search | opencode-go/minimax-m2.7 → opencode/minimax-m2.7-highspeed → anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Doc retrieval doesn't need deep reasoning. Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Multimodal Looker** | Vision/screenshots | openai\|opencode/gpt-5.4 (medium) → opencode-go/kimi-k2.5 → zai-coding-plan/glm-4.6v → openai\|github-copilot\|opencode/gpt-5-nano | GPT-5.4 now leads the default vision path when available. |
#### Why Different Models Need Different Prompts
@@ -358,7 +370,7 @@ This is why Prometheus and Atlas ship separate prompts per model family — they
#### Custom Model Configuration
If the user wants to override which model an agent uses, you can customize in `oh-my-opencode.json`:
If the user wants to override which model an agent uses, you can customize in your plugin config file. Existing installs still commonly use `oh-my-opencode.json`, while the compatibility layer also recognizes `oh-my-openagent.json[c]`.
```jsonc
{
@@ -393,18 +405,14 @@ GPT (5.3-codex, 5.2) > Claude Opus (decent fallback) > Gemini (acceptable)
**Dangerous** (no prompt support):
- Sisyphus → GPT: **No GPT prompt. Will degrade significantly.**
- Sisyphus → older GPT models: **Still a bad fit. GPT-5.4 is the only dedicated GPT prompt path.**
- Hephaestus → Claude: **Built for Codex. Claude can't replicate this.**
- Explore → Opus: **Massive cost waste. Explore needs speed, not intelligence.**
- Librarian → Opus: **Same. Doc search doesn't need Opus-level reasoning.**
#### Provider Priority Chain
#### Provider Resolution
When multiple providers are available, oh-my-opencode uses this priority:
```
Native (anthropic/, openai/, google/) > Kimi for Coding > GitHub Copilot > Venice > OpenCode Zen > Z.ai Coding Plan
```
There is no single global provider priority. The installer and runtime resolve each agent against its own fallback chain, so the winning provider depends on the agent and the subscriptions you enabled.
### ⚠️ Warning
@@ -418,7 +426,7 @@ Read this document again, think about you have done everything correctly.
#### Say 'Congratulations! 🎉' to the user
Say to user: Congratulations! 🎉 You have successfully set up OhMyOpenCode! Type `opencode` in your terminal to start using it.
Say to user: Congratulations! 🎉 You have successfully set up OhMyOpenAgent! Type `opencode` in your terminal to start using it.
#### Free advertising
@@ -455,3 +463,7 @@ Tell the user of following:
4. You wanna have your own agent- catalog setup? I can read the [docs](docs/guide/agent-model-matching.md) and set up for you after interviewing!
That's it. The agent will figure out the rest and handle everything automatically.
#### Advanced Configuration
You can customize agent models and fallback chains in your config. The `fallback_models` field accepts either a single string or an array that mixes strings and per-model objects with settings like `variant` and `temperature`. See the [Configuration Reference](../reference/configuration.md) and example configs in `docs/examples/` for details.

View File

@@ -1,6 +1,6 @@
# Orchestration System Guide
Oh My OpenCode's orchestration system transforms a simple AI agent into a coordinated development team through **separation of planning and execution**.
Oh My OpenAgent's orchestration system transforms a simple AI agent into a coordinated development team through **separation of planning and execution**.
---
@@ -296,12 +296,12 @@ task({ category: "quick", prompt: "..." }); // "Just get it done fast"
| Category | Model | When to Use |
| -------------------- | ---------------------- | ----------------------------------------------------------- |
| `visual-engineering` | Gemini 3.1 Pro | Frontend, UI/UX, design, styling, animation |
| `ultrabrain` | GPT-5.3 Codex (xhigh) | Deep logical reasoning, complex architecture decisions |
| `ultrabrain` | GPT-5.4 (xhigh) | Deep logical reasoning, complex architecture decisions |
| `artistry` | Gemini 3.1 Pro (high) | Highly creative or artistic tasks, novel ideas |
| `quick` | Claude Haiku 4.5 | Trivial tasks - single file changes, typo fixes |
| `quick` | GPT-5.4 Mini | Trivial tasks - single file changes, typo fixes |
| `deep` | GPT-5.3 Codex (medium) | Goal-oriented autonomous problem-solving, thorough research |
| `unspecified-low` | Claude Sonnet 4.6 | Tasks that don't fit other categories, low effort |
| `unspecified-high` | GPT-5.4 (high) | Tasks that don't fit other categories, high effort |
| `unspecified-high` | Claude Opus 4.6 (max) | Tasks that don't fit other categories, high effort |
| `writing` | Gemini 3 Flash | Documentation, prose, technical writing |
### Skills: Domain-Specific Instructions
@@ -420,7 +420,7 @@ Atlas is automatically activated when you run `/start-work`. You don't need to m
| Aspect | Hephaestus | Sisyphus + `ulw` / `ultrawork` |
| --------------- | ------------------------------------------ | ---------------------------------------------------- |
| **Model** | GPT-5.3 Codex (medium reasoning) | Claude Opus 4.6 / GPT-5.4 / GLM 5 depending on setup |
| **Model** | GPT-5.4 (medium reasoning) | Claude Opus 4.6 / GPT-5.4 / GLM 5 depending on setup |
| **Approach** | Autonomous deep worker | Keyword-activated ultrawork mode |
| **Best For** | Complex architectural work, deep reasoning | General complex tasks, "just do it" scenarios |
| **Planning** | Self-plans during execution | Uses Prometheus plans if available |
@@ -443,8 +443,8 @@ Switch to Hephaestus (Tab → Select Hephaestus) when:
- "Integrate our Rust core with the TypeScript frontend"
- "Migrate from MongoDB to PostgreSQL with zero downtime"
4. **You specifically want GPT-5.3 Codex reasoning**
- Some problems benefit from GPT-5.3 Codex's training characteristics
4. **You specifically want GPT-5.4 reasoning**
- Some problems benefit from GPT-5.4's training characteristics
**When to Use Sisyphus + `ulw`:**
@@ -469,13 +469,13 @@ Use the `ulw` keyword in Sisyphus when:
**Recommendation:**
- **For most users**: Use `ulw` keyword in Sisyphus. It's the default path and works excellently for 90% of complex tasks.
- **For power users**: Switch to Hephaestus when you specifically need GPT-5.3 Codex's reasoning style or want the "AmpCode deep mode" experience of fully autonomous exploration and execution.
- **For power users**: Switch to Hephaestus when you specifically need GPT-5.4's reasoning style or want the "AmpCode deep mode" experience of fully autonomous exploration and execution.
---
## Configuration
You can control related features in `oh-my-opencode.json`:
You can control related features in `oh-my-openagent.json`:
```jsonc
{
@@ -520,7 +520,7 @@ Type `exit` or start a new session. Atlas is primarily entered via `/start-work`
**For most tasks**: Type `ulw` in Sisyphus.
**Use Hephaestus when**: You specifically need GPT-5.3 Codex's reasoning style for deep architectural work or complex debugging.
**Use Hephaestus when**: You specifically need GPT-5.4's reasoning style for deep architectural work or complex debugging.
---

View File

@@ -1,6 +1,6 @@
# What Is Oh My OpenCode?
# What Is Oh My OpenAgent?
Oh My OpenCode is a multi-model agent orchestration harness for OpenCode. It transforms a single AI agent into a coordinated development team that actually ships code.
Oh My OpenAgent is a multi-model agent orchestration harness for OpenCode. It transforms a single AI agent into a coordinated development team that actually ships code.
Not locked to Claude. Not locked to OpenAI. Not locked to anyone.
@@ -15,7 +15,7 @@ Just better results, cheaper models, real orchestration.
Paste this into your LLM agent session:
```
Install and configure oh-my-opencode by following the instructions here:
Install and configure oh-my-openagent by following the instructions here:
https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/refs/heads/dev/docs/guide/installation.md
```
@@ -41,13 +41,13 @@ We used to call this "Claude Code on steroids." That was wrong.
This isn't about making Claude Code better. It's about breaking free from the idea that one model, one provider, one way of working is enough. Anthropic wants you locked in. OpenAI wants you locked in. Everyone wants you locked in.
Oh My OpenCode doesn't play that game. It orchestrates across models, picking the right brain for the right job. Claude for orchestration. GPT for deep reasoning. Gemini for frontend. Haiku for quick tasks. All working together, automatically.
Oh My OpenAgent doesn't play that game. It orchestrates across models, picking the right brain for the right job. Claude for orchestration. GPT for deep reasoning. Gemini for frontend. GPT-5.4 Mini for quick tasks. All working together, automatically.
---
## How It Works: Agent Orchestration
Instead of one agent doing everything, Oh My OpenCode uses **specialized agents that delegate to each other** based on task type.
Instead of one agent doing everything, Oh My OpenAgent uses **specialized agents that delegate to each other** based on task type.
**The Architecture:**
@@ -93,15 +93,15 @@ Sisyphus still works best on Claude-family models, Kimi, and GLM. GPT-5.4 now ha
Named with intentional irony. Anthropic blocked OpenCode from using their API because of this project. So the team built an autonomous GPT-native agent instead.
Hephaestus runs on GPT-5.3 Codex. Give him a goal, not a recipe. He explores the codebase, researches patterns, and executes end-to-end without hand-holding. He is the legitimate craftsman because he was born from necessity, not privilege.
Hephaestus runs on GPT-5.4. Give him a goal, not a recipe. He explores the codebase, researches patterns, and executes end-to-end without hand-holding. He is the legitimate craftsman because he was born from necessity, not privilege.
Use Hephaestus when you need deep architectural reasoning, complex debugging across many files, or cross-domain knowledge synthesis. Switch to him explicitly when the work demands GPT-5.3 Codex's particular strengths.
Use Hephaestus when you need deep architectural reasoning, complex debugging across many files, or cross-domain knowledge synthesis. Switch to him explicitly when the work demands GPT-5.4's particular strengths.
**Why this beats vanilla Codex CLI:**
- **Multi-model orchestration.** Pure Codex is single-model. OmO routes different tasks to different models automatically. GPT for deep reasoning. Gemini for frontend. Haiku for speed. The right brain for the right job.
- **Multi-model orchestration.** Pure Codex is single-model. OmO routes different tasks to different models automatically. GPT for deep reasoning. Gemini for frontend. GPT-5.4 Mini for speed. The right brain for the right job.
- **Background agents.** Fire 5+ agents in parallel. Something Codex simply cannot do. While one agent writes code, another researches patterns, another checks documentation. Like a real dev team.
- **Category system.** Tasks are routed by intent, not model name. `visual-engineering` gets Gemini. `ultrabrain` gets GPT-5.3 Codex. `quick` gets Haiku. No manual juggling.
- **Category system.** Tasks are routed by intent, not model name. `visual-engineering` gets Gemini. `ultrabrain` gets GPT-5.4. `quick` gets GPT-5.4 Mini. No manual juggling.
- **Accumulated wisdom.** Subagents learn from previous results. Conventions discovered in task 1 are passed to task 5. Mistakes made early aren't repeated. The system gets smarter as it works.
### Prometheus: The Strategic Planner
@@ -154,7 +154,7 @@ Use Prometheus for multi-day projects, critical production changes, complex refa
## Agent Model Matching
Different agents work best with different models. Oh My OpenCode automatically assigns optimal models, but you can customize everything.
Different agents work best with different models. Oh My OpenAgent automatically assigns optimal models, but you can customize everything.
### Default Configuration
@@ -168,7 +168,7 @@ You can override specific agents or categories in your config:
```jsonc
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/dev/assets/oh-my-opencode.schema.json",
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/dev/assets/oh-my-openagent.schema.json",
"agents": {
// Main orchestrator: Claude Opus or Kimi K2.5 work best
@@ -193,13 +193,13 @@ You can override specific agents or categories in your config:
},
// General high-effort work
"unspecified-high": { "model": "openai/gpt-5.4", "variant": "high" },
"unspecified-high": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
// Quick tasks: use the cheapest models
"quick": { "model": "anthropic/claude-haiku-4-5" },
// Quick tasks: use GPT-5.4-mini (fast and cheap)
"quick": { "model": "openai/gpt-5.4-mini" },
// Deep reasoning: GPT-5.3-codex
"ultrabrain": { "model": "openai/gpt-5.3-codex", "variant": "xhigh" },
// Deep reasoning: GPT-5.4
"ultrabrain": { "model": "openai/gpt-5.4", "variant": "xhigh" },
},
}
```
@@ -214,14 +214,13 @@ You can override specific agents or categories in your config:
**GPT models** (explicit reasoning, principle-driven):
- GPT-5.3-codex — deep coding powerhouse, required for Hephaestus
- GPT-5.4 — high intelligence, default for Oracle
- GPT-5.4 — deep coding powerhouse, required for Hephaestus and default for Oracle
- GPT-5-Nano — ultra-cheap, fast utility tasks
**Different-behavior models**:
- Gemini 3 Pro — excels at visual/frontend tasks
- MiniMax M2.5 — fast and smart for utility tasks
- Gemini 3.1 Pro — excels at visual/frontend tasks
- MiniMax M2.7 / M2.7-highspeed — fast and smart for utility tasks
- Grok Code Fast 1 — optimized for code grep/search
See the [Agent-Model Matching Guide](./agent-model-matching.md) for complete details on which models work best for each agent, safe vs dangerous overrides, and provider priority chains.
@@ -232,7 +231,7 @@ See the [Agent-Model Matching Guide](./agent-model-matching.md) for complete det
Claude Code is good. But it's a single agent running a single model doing everything alone.
Oh My OpenCode turns that into a coordinated team:
Oh My OpenAgent turns that into a coordinated team:
**Parallel execution.** Claude Code processes one thing at a time. OmO fires background agents in parallel — research, implementation, and verification happening simultaneously. Like having 5 engineers instead of 1.
@@ -246,7 +245,7 @@ Oh My OpenCode turns that into a coordinated team:
**Discipline enforcement.** Todo enforcer yanks idle agents back to work. Comment checker strips AI slop. Ralph Loop keeps going until 100% done. The system doesn't let the agent slack off.
**The fundamental advantage.** Models have different temperaments. Claude thinks deeply. GPT reasons architecturally. Gemini visualizes. Haiku moves fast. Single-model tools force you to pick one personality for all tasks. Oh My OpenCode leverages them all, routing by task type. This isn't a temporary hack — it's the only architecture that makes sense as models specialize further. The gap between multi-model orchestration and single-model limitation widens every month. We're betting on that future.
**The fundamental advantage.** Models have different temperaments. Claude thinks deeply. GPT reasons architecturally. Gemini visualizes. Haiku moves fast. Single-model tools force you to pick one personality for all tasks. Oh My OpenAgent leverages them all, routing by task type. This isn't a temporary hack — it's the only architecture that makes sense as models specialize further. The gap between multi-model orchestration and single-model limitation widens every month. We're betting on that future.
---
@@ -256,7 +255,7 @@ Before acting on any request, Sisyphus classifies your true intent.
Are you asking for research? Implementation? Investigation? A fix? The Intent Gate figures out what you actually want, not just the literal words you typed. This means the agent understands context, nuance, and the real goal behind your request.
Claude Code doesn't have this. It takes your prompt and runs. Oh My OpenCode thinks first, then acts.
Claude Code doesn't have this. It takes your prompt and runs. Oh My OpenAgent thinks first, then acts.
---

View File

@@ -1,6 +1,6 @@
# Manifesto
The principles and philosophy behind Oh My OpenCode.
The principles and philosophy behind Oh My OpenAgent.
---
@@ -20,7 +20,7 @@ When you find yourself:
That's not "human-AI collaboration." That's the AI failing to do its job.
**Oh My OpenCode is built on this premise**: Human intervention during agentic work is fundamentally a wrong signal. If the system is designed correctly, the agent should complete the work without requiring you to babysit it.
**Oh My OpenAgent is built on this premise**: Human intervention during agentic work is fundamentally a wrong signal. If the system is designed correctly, the agent should complete the work without requiring you to babysit it.
---
@@ -144,7 +144,7 @@ Human Intent → Agent Execution → Verified Result
(intervention only on true failure)
```
Everything in Oh My OpenCode is designed to make this loop work:
Everything in Oh My OpenAgent is designed to make this loop work:
| Feature | Purpose |
|---------|---------|

View File

@@ -0,0 +1,33 @@
# Model Capabilities Maintenance
This project treats model capability resolution as a layered system:
1. runtime metadata from connected providers
2. `models.dev` bundled/runtime snapshot data
3. explicit compatibility aliases
4. heuristic fallback as the last resort
## Internal policy
- Built-in OmO agent/category requirement models must use canonical model IDs.
- Aliases exist only to preserve compatibility with historical OmO names or provider-specific decorations.
- New decorated names like `-high`, `-low`, or `-thinking` should not be added to built-in requirements when a canonical model ID plus structured settings can express the same thing.
- If a provider or config input still uses an alias, normalize it at the edge and continue internally with the canonical ID.
## When adding an alias
- Add the alias rule to `src/shared/model-capability-aliases.ts`.
- Include a rationale for why the alias exists.
- Add or update tests so the alias is covered explicitly.
- Ensure the alias canonical target exists in the bundled `models.dev` snapshot.
## Guardrails
`bun run test:model-capabilities` enforces the following invariants:
- exact alias targets must exist in the bundled snapshot
- exact alias keys must not silently become canonical `models.dev` IDs
- pattern aliases must not rewrite canonical snapshot IDs
- built-in requirement models must stay canonical and snapshot-backed
The scheduled `refresh-model-capabilities` workflow runs these guardrails before opening an automated snapshot refresh PR.

View File

@@ -1,6 +1,6 @@
# CLI Reference
Complete reference for the `oh-my-opencode` command-line interface.
Complete reference for the published `oh-my-opencode` CLI. During the rename transition, OpenCode plugin registration now prefers `oh-my-openagent` inside `opencode.json`.
## Basic Usage
@@ -14,20 +14,21 @@ npx oh-my-opencode
## Commands
| Command | Description |
| ------------------- | ----------------------------------------- |
| `install` | Interactive setup wizard |
| `doctor` | Environment diagnostics and health checks |
| `run` | OpenCode session runner |
| `mcp oauth` | MCP OAuth authentication management |
| `auth` | Google Antigravity OAuth authentication |
| `get-local-version` | Display local version information |
| Command | Description |
| ----------------------------- | ------------------------------------------------------ |
| `install` | Interactive setup wizard |
| `doctor` | Environment diagnostics and health checks |
| `run` | OpenCode session runner with task completion enforcement |
| `get-local-version` | Display local version information and update check |
| `refresh-model-capabilities` | Refresh the cached models.dev-based model capabilities |
| `version` | Show version information |
| `mcp oauth` | MCP OAuth authentication management |
---
## install
Interactive installation tool for initial Oh-My-OpenCode setup. Provides a TUI based on `@clack/prompts`.
Interactive installation tool for initial Oh My OpenCode setup. Provides a TUI based on `@clack/prompts`.
### Usage
@@ -37,24 +38,37 @@ bunx oh-my-opencode install
### Installation Process
1. **Provider Selection**: Choose your AI provider (Claude, ChatGPT, or Gemini)
2. **API Key Input**: Enter the API key for your selected provider
3. **Configuration File Creation**: Generates `opencode.json` or `oh-my-opencode.json` files
4. **Plugin Registration**: Automatically registers the oh-my-opencode plugin in OpenCode settings
1. **Subscription Selection**: Choose which providers and subscriptions you actually have
2. **Plugin Registration**: Registers `oh-my-openagent` in OpenCode settings, or upgrades a legacy `oh-my-opencode` entry during the compatibility window
3. **Configuration File Creation**: Writes the generated OmO config to `oh-my-opencode.json` in the active OpenCode config directory
4. **Authentication Hints**: Shows the `opencode auth login` steps for the providers you selected, unless `--skip-auth` is set
### Options
| Option | Description |
| ----------- | ---------------------------------------------------------------- |
| `--no-tui` | Run in non-interactive mode without TUI (for CI/CD environments) |
| `--verbose` | Display detailed logs |
| Option | Description |
| ------ | ----------- |
| `--no-tui` | Run in non-interactive mode without TUI |
| `--claude <no\|yes\|max20>` | Claude subscription mode |
| `--openai <no\|yes>` | OpenAI / ChatGPT subscription |
| `--gemini <no\|yes>` | Gemini integration |
| `--copilot <no\|yes>` | GitHub Copilot subscription |
| `--opencode-zen <no\|yes>` | OpenCode Zen access |
| `--zai-coding-plan <no\|yes>` | Z.ai Coding Plan subscription |
| `--kimi-for-coding <no\|yes>` | Kimi for Coding subscription |
| `--opencode-go <no\|yes>` | OpenCode Go subscription |
| `--skip-auth` | Skip authentication setup hints |
---
## doctor
Diagnoses your environment to ensure Oh-My-OpenCode is functioning correctly. Performs 17+ health checks.
Diagnoses your environment to ensure Oh My OpenCode is functioning correctly. The current checks are grouped into system, config, tools, and models.
The doctor command detects common issues including:
- Legacy plugin entry references in `opencode.json` (warns when `oh-my-opencode` is still used instead of `oh-my-openagent`)
- Configuration file validity and JSONC parsing errors
- Model resolution and fallback chain verification
- Missing or misconfigured MCP servers
### Usage
```bash
@@ -63,22 +77,20 @@ bunx oh-my-opencode doctor
### Diagnostic Categories
| Category | Check Items |
| ------------------ | --------------------------------------------------------- |
| **Installation** | OpenCode version (>= 1.0.150), plugin registration status |
| **Configuration** | Configuration file validity, JSONC parsing |
| **Authentication** | Anthropic, OpenAI, Google API key validity |
| **Dependencies** | Bun, Node.js, Git installation status |
| **Tools** | LSP server status, MCP server status |
| **Updates** | Latest version check |
| Category | Check Items |
| ----------------- | ------------------------------------------------------------------------------------ |
| **System** | OpenCode binary, version (>= 1.0.150), plugin registration, legacy package name warning |
| **Config** | Configuration file validity, JSONC parsing, Zod schema validation |
| **Tools** | AST-Grep, LSP servers, GitHub CLI, MCP servers |
| **Models** | Model capabilities cache, model resolution, agent/category overrides, availability |
### Options
| Option | Description |
| ------------------- | ---------------------------------------------------------------- |
| `--category <name>` | Check specific category only (e.g., `--category authentication`) |
| `--json` | Output results in JSON format |
| `--verbose` | Include detailed information |
| Option | Description |
| ------------ | ----------------------------------------- |
| `--status` | Show compact system dashboard |
| `--verbose` | Show detailed diagnostic information |
| `--json` | Output results in JSON format |
### Example Output
@@ -86,57 +98,95 @@ bunx oh-my-opencode doctor
oh-my-opencode doctor
┌──────────────────────────────────────────────────┐
│ Oh-My-OpenCode Doctor │
│ Oh-My-OpenAgent Doctor │
└──────────────────────────────────────────────────┘
Installation
System
✓ OpenCode version: 1.0.155 (>= 1.0.150)
✓ Plugin registered in opencode.json
Configuration
✓ oh-my-opencode.json is valid
Config
✓ oh-my-opencode.jsonc is valid
✓ Model resolution: all agents have valid fallback chains
⚠ categories.visual-engineering: using default model
Authentication
✓ Anthropic API key configured
OpenAI API key configured
✗ Google API key not found
Tools
✓ AST-Grep available
LSP servers configured
Dependencies
Bun 1.2.5 installed
✓ Node.js 22.0.0 installed
✓ Git 2.45.0 installed
Models
11 agents, 8 categories, 0 overrides
⚠ Some configured models rely on compatibility fallback
Summary: 10 passed, 1 warning, 1 failed
Summary: 10 passed, 1 warning, 0 failed
```
---
## run
Executes OpenCode sessions and monitors task completion.
Run opencode with todo/background task completion enforcement. Unlike 'opencode run', this command waits until all todos are completed or cancelled, and all child sessions (background tasks) are idle.
### Usage
```bash
bunx oh-my-opencode run [prompt]
bunx oh-my-opencode run <message>
```
### Options
| Option | Description |
| ------------------------ | ------------------------------------------------- |
| `--enforce-completion` | Keep session active until all TODOs are completed |
| `--timeout <seconds>` | Set maximum execution time |
| `--agent <name>` | Specify agent to use |
| `--directory <path>` | Set working directory |
| `--port <number>` | Set port for session |
| `--attach` | Attach to existing session |
| `--json` | Output in JSON format |
| `--no-timestamp` | Disable timestamped output |
| `--session-id <id>` | Resume existing session |
| `--on-complete <action>` | Action on completion |
| `--verbose` | Enable verbose logging |
| Option | Description |
| --------------------- | ------------------------------------------------------------------- |
| `-a, --agent <name>` | Agent to use (default: from CLI/env/config, fallback: Sisyphus) |
| `-m, --model <provider/model>` | Model override (e.g., anthropic/claude-sonnet-4) |
| `-d, --directory <path>` | Working directory |
| `-p, --port <port>` | Server port (attaches if port already in use) |
| `--attach <url>` | Attach to existing opencode server URL |
| `--on-complete <command>` | Shell command to run after completion |
| `--json` | Output structured JSON result to stdout |
| `--no-timestamp` | Disable timestamp prefix in run output |
| `--verbose` | Show full event stream (default: messages/tools only) |
| `--session-id <id>` | Resume existing session instead of creating new one |
---
## get-local-version
Show current installed version and check for updates.
### Usage
```bash
bunx oh-my-opencode get-local-version
```
### Options
| Option | Description |
| ----------------- | ---------------------------------------------- |
| `-d, --directory` | Working directory to check config from |
| `--json` | Output in JSON format for scripting |
### Output
Shows:
- Current installed version
- Latest available version on npm
- Whether you're up to date
- Special modes (local dev, pinned version)
---
## version
Show version information.
### Usage
```bash
bunx oh-my-opencode version
```
`--on-complete` runs through your current shell when possible: `sh` on Unix shells, `pwsh` for PowerShell on non-Windows, `powershell.exe` for PowerShell on Windows, and `cmd.exe` as the Windows fallback.
---
@@ -151,10 +201,10 @@ Manages OAuth 2.1 authentication for remote MCP servers.
bunx oh-my-opencode mcp oauth login <server-name> --server-url https://api.example.com
# Login with explicit client ID and scopes
bunx oh-my-opencode mcp oauth login my-api --server-url https://api.example.com --client-id my-client --scopes "read,write"
bunx oh-my-opencode mcp oauth login my-api --server-url https://api.example.com --client-id my-client --scopes read write
# Remove stored OAuth tokens
bunx oh-my-opencode mcp oauth logout <server-name>
bunx oh-my-opencode mcp oauth logout <server-name> --server-url https://api.example.com
# Check OAuth token status
bunx oh-my-opencode mcp oauth status [server-name]
@@ -166,7 +216,7 @@ bunx oh-my-opencode mcp oauth status [server-name]
| -------------------- | ------------------------------------------------------------------------- |
| `--server-url <url>` | MCP server URL (required for login) |
| `--client-id <id>` | OAuth client ID (optional if server supports Dynamic Client Registration) |
| `--scopes <scopes>` | Comma-separated OAuth scopes |
| `--scopes <scopes>` | OAuth scopes as separate variadic arguments (for example: `--scopes read write`) |
### Token Storage
@@ -176,10 +226,20 @@ Tokens are stored in `~/.config/opencode/mcp-oauth.json` with `0600` permissions
## Configuration Files
The CLI searches for configuration files in the following locations (in priority order):
The runtime loads user config as the base config, then merges project config on top:
1. **Project Level**: `.opencode/oh-my-opencode.json`
2. **User Level**: `~/.config/opencode/oh-my-opencode.json`
1. **Project Level**: `.opencode/oh-my-openagent.jsonc`, `.opencode/oh-my-openagent.json`, `.opencode/oh-my-opencode.jsonc`, or `.opencode/oh-my-opencode.json`
2. **User Level**: `~/.config/opencode/oh-my-openagent.jsonc`, `~/.config/opencode/oh-my-openagent.json`, `~/.config/opencode/oh-my-opencode.jsonc`, or `~/.config/opencode/oh-my-opencode.json`
**Naming Note**: The published package and binary are still `oh-my-opencode`. Inside `opencode.json`, the compatibility layer now prefers the plugin entry `oh-my-openagent`. Plugin config loading recognizes both `oh-my-openagent.*` and legacy `oh-my-opencode.*` basenames. If both basenames exist in the same directory, the legacy `oh-my-opencode.*` file currently wins.
### Filename Compatibility
Both `.jsonc` and `.json` extensions are supported. JSONC (JSON with Comments) is preferred as it allows:
- Comments (both `//` and `/* */` styles)
- Trailing commas in arrays and objects
If both `.jsonc` and `.json` exist in the same directory, the `.jsonc` file takes precedence.
### JSONC Support
@@ -228,19 +288,66 @@ bunx oh-my-opencode install
# Diagnose with detailed information
bunx oh-my-opencode doctor --verbose
# Check specific category only
bunx oh-my-opencode doctor --category authentication
# Show compact system dashboard
bunx oh-my-opencode doctor --status
# JSON output for scripting
bunx oh-my-opencode doctor --json
```
### "Using legacy package name" Warning
The doctor warns if it finds the legacy plugin entry `oh-my-opencode` in `opencode.json`. Update the plugin array to the canonical `oh-my-openagent` entry:
```bash
# Replace the legacy plugin entry in user config
jq '.plugin = (.plugin // [] | map(if . == "oh-my-opencode" then "oh-my-openagent" else . end))' \
~/.config/opencode/opencode.json > /tmp/opencode.json && mv /tmp/opencode.json ~/.config/opencode/opencode.json
```
---
## refresh-model-capabilities
Refreshes the cached model capabilities snapshot from models.dev. This updates the local cache used by capability resolution and compatibility diagnostics.
### Usage
```bash
bunx oh-my-opencode refresh-model-capabilities
```
### Options
| Option | Description |
| ----------------- | --------------------------------------------------- |
| `-d, --directory` | Working directory to read oh-my-opencode config from |
| `--source-url <url>` | Override the models.dev source URL |
| `--json` | Output refresh summary as JSON |
### Configuration
Configure automatic refresh behavior in your plugin config:
```jsonc
{
"model_capabilities": {
"enabled": true,
"auto_refresh_on_start": true,
"refresh_timeout_ms": 5000,
"source_url": "https://models.dev/api.json"
}
}
```
---
## Non-Interactive Mode
Use the `--no-tui` option for CI/CD environments.
Use JSON output for CI or scripted diagnostics.
```bash
# Run doctor in CI environment
bunx oh-my-opencode doctor --no-tui --json
bunx oh-my-opencode doctor --json
# Save results to file
bunx oh-my-opencode doctor --json > doctor-report.json

Some files were not shown because too many files have changed in this diff Show More